Total SMD Holdings
150 PB
→ 530 PB projected 2030
31.4 PB/yr current ingest
Annual Unique Users
53 M
Unique IP addresses/yr
ESD = 53% of total
Total Datasets
54,532
10 repositories · 5 divisions
PSD largest count: 27,460
Annual Ingest Rate
31.4 PB/yr
ESD drives 89% of growth
NISAR will accelerate further
Data Volume by Division — Current & 2030
// ESD projected at 400,000 TB by 2030 — roughly a 3× increase from FY2024's 127,665 TB
User Reach by Division (Unique IPs/yr)
// GIBS/Worldview accounts for 20M of ESD's 28.4M — visualization drives reach
Cloud vs On-Premises Transition — FY2024 → 2030
// 42% → 84% cloud over six years · ESD targets 395 PB cloud by 2030 · BPS is already 100% cloud-native
FY2024 — Current
On-Prem 58% · 87 PB
Cloud 42% · 90 PB
⟶
MIGRATION
+375 PB
cloud growth
2030 — Projected
On-Prem 16% · 83 PB
Cloud 84% · 458 PB
Data Access: Cloud Dominance
// ESD cloud access alone: 147,654 PB/yr — vs all SMD on-prem: 14,292 PB/yr
Annual Growth Rate by Division
// Proportional bars · NISAR will redefine ESD's ingest trajectory post-2025
ESD — Earth Science · 28,000 TB/yr
APD — Astrophysics · 2,100 TB/yr
PSD — Planetary Science · 785 TB/yr
HPD — Heliophysics · 470 TB/yr
BPS — Bio & Physical Sci · 92 TB/yr
Division Data Assets — FY2024 Cross-Comparison
// All volumes in terabytes
| Division | Users (IP/yr) | Datasets | Volume (TB) | Growth (TB/yr) | 2030 Proj (TB) | Cloud Posture | Div DMP |
|---|---|---|---|---|---|---|---|
| ESD Earth Science | 28,400,000 | 13,500 | 127,665 | 28,000 | 400,000 | 82% cloud | ✔ Yes |
| APD Astrophysics | 8,800,000 | 46 | 15,700 | 2,100 | 70,000 | 26% cloud | ✗ Mission-level |
| PSD Planetary Science | 7,990,000 | 27,460 | 3,860 | 785 | 24,000 | 54% hybrid | ✔ Yes |
| HPD Heliophysics | 8,000,000 | 11,500 | 3,151 | 470 | 35,000 | 40% hybrid | ✔ Yes |
| BPS Bio & Physical Sci | 50,000 | 1,068 | 240 | 92 | 750 | 100% cloud | ✗ Repo-level |
Dataset Count by Division
// PSD's federation creates highest count; APD's per-mission model produces fewest
ESD Cloud Access vs All Other SMD
// ESD cloud: 147,654 PB/yr · All other divisions cloud combined: 4.2 PB/yr
Division Technology Profiles
// Key architectural, policy, and platform characteristics
ESD
✔ Division DMP
✔ Login required
✔ 100% API (cloud)
✔ MAAP platform
✔ Openscapes/SAR Lab
✔ 228× access ratio
APD
✗ No division DMP
✔ Embargoed login only
✔ Fornax/TIKE/RRN
✔ Free egress (ODR)
✗ 74% still on-prem
→ Developing cloud
PSD
✔ Division DMP
✗ No login required
✔ AWS us-west-2
✗ Download-dominant
✗ Low DOI coverage
→ Cloud developing
HPD
✔ Division DMP
✗ No login required
✔ 57% data on NCCS
✔ HelioCloud GPU
✔ Free egress (ODR)
✔ 100% API cloud
BPS
✗ No division DMP
✗ No login required
✔ 100% cloud-native
✔ SMCE tools
✔ GeneLab on HEC
✔ Smallest footprint
On-Premises Total
87 PB
58% of holdings · ↓ 83 PB by 2030
Cloud Total (AWS)
90 PB
42% of holdings · ↑ 458 PB by 2030
Cloud Access Rate
147,654 PB/yr
>99.9% driven by ESD · 228× ratio
On-Premises Storage Distribution
// ESD 77% · APD 17% · HPD 3.5% · PSD 2%
Cloud Storage Distribution
// ESD 91% · BPS 100% cloud despite smallest volume (240 TB)
API & DOI Compliance (Cloud Datasets)
// Datasets without APIs cannot support AI/ML pipelines · Without DOIs they cannot be cited
Computing Architecture by Division
// Platforms, HEC access, egress cost, and AWS region for each division
| Division | Cloud Platforms | HEC Access | Free Egress | AWS Region | Key Platforms |
|---|---|---|---|---|---|
| ESD | ✔ Yes | Case-by-case | ✗ Egress costs | us-west-2 | MAAP · Openscapes · Open SAR Lab · CryoCloud |
| APD | ↗ Developing | Via ROSES/NAS | ✔ Free (ODR) | us-east-1 | Fornax · NEXUS · TIKE · Roman Research Nexus |
| PSD | ↗ Developing | Via NAS | ↗ Developing | us-west-2 | SMD-provided services (in development) |
| HPD | ✔ HelioCloud | ✔ NCCS 57% data | ✔ Free (ODR) | AWS ODR | HelioCloud (CPU + GPU) |
| BPS | ✔ SMCE | GeneLab on HEC | ✔ Public S3 | AWS S3 | SMCE · RadLab · Environmental Data App |
SMD Repository Ecosystem — All 10 Repositories
// Each serves a distinct scientific domain
High Energy Astrophysics Science Archive Research Center
HEASARC
📍 Goddard Space Flight Center
Astrophysics
Mikulski Archive for Space Telescopes
MAST
📍 Space Telescope Science Institute
Astrophysics
IPAC Infrared Science Archive
IRSA
📍 IPAC · Caltech
Astrophysics
Open Science Data Repository
OSDR
📍 Ames Research Center
Bio & Physical Sciences
Physical Sciences Informatics
PSI
📍 Marshall Space Flight Center
Bio & Physical Sciences
Earth Observing System Data and Information System
EOSDIS ★ LARGEST
📍 Goddard Space Flight Center
Earth Science
Solar Data Analysis Center
SDAC
📍 Goddard Space Flight Center
Heliophysics
Space Physics Data Facility
SPDF
📍 Goddard Space Flight Center
Heliophysics
Astromaterials Data System
Astromat
📍 Columbia University
Planetary Science
Planetary Data System
PDS ★ MOST DATASETS
📍 6 science nodes nationwide
Planetary Science
EOSDIS — Earth Science Infrastructure Detail
// 11 DAACs + 11 SIPSs · 228× cloud access-to-storage ratio · What drove cloud adoption
Key ESD FY2024 Metrics
Cloud users: 21M/yr
On-prem users: 7.4M/yr
Cloud access: 147,654 PB/yr
Cloud egress: 649 PB/yr
Cloud storage: 82,044 TB
On-prem storage: 67,121 TB
Cloud datasets: 5,700 (100% API)
On-prem datasets: 7,800 (100% API)
User Platforms
▸ MAAP (multi-mission)
▸ Openscapes (Jupyter)
▸ CryoCloud
▸ Open SAR Lab
▸ GIBS / Worldview
▸ GIS Transform Services
Adoption Drivers
▸ COG + Zarr partial access
▸ Hackathons + workshops
▸ Managed Jupyter envs
▸ Traditional download kept
▸ Community engagement
▸ Login friction: known issue
Technical Glossary
// All acronyms from NASA/SP-20260001965
DAAC
Distributed Active Archive Center
One of 11 ESD facilities managing ingest, archival, and distribution of Earth science data by domain. Examples: LP DAAC (land processes, USGS), GES DISC (atmosphere, GSFC), NSIDC DAAC (cryosphere), ASF DAAC (SAR/NISAR, Alaska). Each DAAC is operationally independent but unified under a single EOSDIS discovery portal at Earthdata.gov.
EOSDIS
Earth Observing System Data and Information System
NASA's unified Earth science data infrastructure managed at GSFC. Spans 11 DAACs + 11 SIPSs providing ingest, preservation, access, and distribution. Single discovery portal at Earthdata.gov. Current holdings: 127,665 TB growing to 400,000 TB by 2030. Serves 21M cloud + 7.4M on-prem users annually.
DOI
Digital Object Identifier
Persistent identifier enabling datasets to be cited in peer-reviewed publications and tracked for usage impact. Key gap: PSD on-prem has only 11% DOI coverage (1,565 of 13,620 datasets). APD cloud has only 12 of 46. Without DOIs, datasets cannot participate in citation networks or AI-driven data discovery pipelines.
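A minimal sketch of what DOI persistence buys in practice: doi.org acts as a resolver, redirecting an HTTP request to the dataset's current landing page, so citations remain stable even when repository URLs change. The DOI below is a placeholder, not one from the report.

```python
import requests

# Hypothetical dataset DOI used for illustration only.
doi = "10.5067/EXAMPLE-DOI"

# doi.org resolves a registered DOI with an HTTP redirect to the
# dataset's current landing page; following that redirect is how
# citation links survive repository reorganizations.
resp = requests.get(f"https://doi.org/{doi}", allow_redirects=False)
print(resp.status_code)              # 302 for a registered DOI
print(resp.headers.get("Location"))  # the current landing-page URL
```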
API
Application Programming Interface
Machine-readable interface for programmatic data access. Datasets without APIs cannot participate in automated workflows or AI/ML pipelines. ESD and BPS achieve near-100% API coverage across cloud holdings. Critical gap: PSD on-prem has only 640 of 13,620 datasets API-accessible (4.7%), limiting its integration with modern science platforms.
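A sketch of what API-accessible means for a workflow, using NASA's Common Metadata Repository (CMR) granule search endpoint; the collection short name is a hypothetical placeholder, and the Atom-style JSON response shape is assumed from CMR's documented behavior.

```python
import requests

# Programmatic dataset discovery against the CMR search API:
# a machine-readable query that an AI/ML pipeline could issue,
# versus a human clicking through a download portal.
CMR = "https://cmr.earthdata.nasa.gov/search/granules.json"
params = {
    "short_name": "EXAMPLE_COLLECTION",  # hypothetical dataset short name
    "temporal": "2024-01-01T00:00:00Z,2024-01-31T23:59:59Z",
    "page_size": 10,
}
feed = requests.get(CMR, params=params, timeout=30).json()["feed"]
for granule in feed["entry"]:
    print(granule["title"])  # one line per matching granule
```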
COG
Cloud-Optimized GeoTIFF
Geospatial raster format enabling HTTP range requests — partial file reads without full download. Key enabler of ESD's 228× access-to-storage ratio. Users retrieve only the spatial extent or time step needed, dramatically reducing egress costs and latency. Used alongside Zarr for multidimensional EO data across EOSDIS DAACs.
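A minimal sketch of a partial COG read, assuming a hypothetical bucket URL: rasterio (through GDAL's /vsicurl driver) issues HTTP range requests, so only the file header and the requested window cross the network, never the full scene.

```python
import rasterio
from rasterio.windows import Window

# Hypothetical COG on S3; any HTTPS-reachable COG behaves the same.
url = "https://example-bucket.s3.us-west-2.amazonaws.com/scene.tif"

with rasterio.open(url) as src:
    # Read one 512x512 spatial tile of band 1; rasterio fetches only
    # the byte ranges backing these blocks via HTTP range requests.
    window = Window(col_off=0, row_off=0, width=512, height=512)
    block = src.read(1, window=window)
    print(block.shape, src.crs)
```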
Zarr
Zarr — Cloud-Native Array Format
Chunked, compressed, multidimensional array format for cloud object storage (S3). Enables dimension-sliced access (e.g., a single pixel's complete time series) without reading full files. Critical for AI/ML workflows on large EO datacubes. Increasingly used alongside COG across EOSDIS DAACs as cloud-native analysis replaces bulk downloads.
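A sketch of the dimension-sliced access the format enables, assuming a hypothetical public Zarr store on S3 (s3fs installed) and assumed variable/coordinate names: selecting one pixel's time series pulls only the chunks covering that lat/lon column, not the whole datacube.

```python
import xarray as xr

# Hypothetical public Zarr store; variable "sst" and coords
# "lat"/"lon" are assumptions for illustration.
ds = xr.open_zarr(
    "s3://example-bucket/datacube.zarr",
    consolidated=True,                  # one metadata read up front
    storage_options={"anon": True},     # public bucket, no credentials
)

# Lazy selection: nothing is downloaded until .load().
series = ds["sst"].sel(lat=36.6, lon=-122.0, method="nearest")
print(series.load())  # reads just the chunks along the time axis
```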
MAAP
Multi-Mission Algorithm and Analysis Platform
ESD's flagship cloud analysis environment enabling computation co-located with data — eliminating egress costs and download latency. Designed for cross-mission workflows (e.g., combining GEDI + ICESat-2 + NISAR for biomass estimation). Key driver of ESD's 21M cloud user base and 228× access-to-storage ratio.
NCCS
NASA Center for Climate Simulation
NASA HEC facility at GSFC housing the Discover supercomputer. HPD uniquely copies 57% of its data (1,800 TB) to NCCS for HEC access — a "bring data to computation" strategy. Also supports GMAO weather/climate modeling, where GES DISC observational data feeds operational model runs and outputs are archived back to EOSDIS.
CSDA
Commercial Smallsat Data Acquisition Program
NASA ESD program purchasing commercial satellite data (Planet, Maxar, Capella, ICEYE, GHGSat, Satellogic, Tomorrow.io) via a $476M IDIQ contract. Vendor data evaluated by PI teams (scientific utility across all 6 ESD R&A focus areas) and SME teams (radiometric/geometric quality) before sustained purchase decisions. On-ramp #4 evaluated PlanetiQ and Umbra.
SIPS
Science Investigator-led Processing Systems
ESD's 11 PI-operated processing facilities producing higher-level data products from raw mission telemetry before submission to DAACs. Examples: MODAPS (MODIS/VIIRS processing), OBDAAC (ocean biology), LANCE (near-real-time products for FIRMS/VIIRS fire alerts). Distinct from DAACs, which focus on preservation and user access rather than production.
HEC
High-End Computing
Advanced computational resources provided through NASA Advanced Supercomputing (NAS) at Ames (Pleiades) and NCCS at GSFC (Discover). Most SMD divisions provide limited adjacent compute resources. HPD is the only division with a systematic data-co-location-with-HEC strategy (57% of data at NCCS). APD and PSD route HEC access via ROSES proposal allocations.
GIBS
Global Imagery Browse Services
ESD tile-based satellite imagery service via OGC WMTS/WMS protocols, powering NASA Worldview and third-party GIS applications. Accounts for 20M of ESD's 28.4M unique users per year — making it the largest single ESD access pathway. Distinct from Earthdata.gov's 7.4–8.4M download-oriented users. Critical for rapid EO data browsing and situational awareness.
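A sketch of one WMTS tile fetch following GIBS's documented REST template ({layer}/default/{time}/{matrix-set}/{z}/{row}/{col}); the layer name, date, and tile indices here are illustrative, but this is the kind of request Worldview and GIS clients issue millions of times per day.

```python
import requests

# One true-color MODIS tile from the GIBS WMTS REST endpoint;
# zoom level 3, tile row 2, column 4 (illustrative values).
tile = (
    "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/"
    "MODIS_Terra_CorrectedReflectance_TrueColor/default/"
    "2024-06-01/250m/3/2/4.jpg"
)
resp = requests.get(tile, timeout=30)
resp.raise_for_status()
with open("tile.jpg", "wb") as f:
    f.write(resp.content)  # a ready-to-display JPEG tile
```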
ODR
AWS Open Data Registry
AWS program waiving data egress charges for qualifying public datasets. APD (us-east-1) and HPD store data under ODR, enabling free access for all researchers. ESD stores in standard AWS us-west-2 (not ODR), so egress costs apply — a friction point explicitly noted in the report. ESD's 649 PB/yr egress at AWS commercial rates would represent substantial costs without institutional arrangements.
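A sketch of what ODR hosting means for a user, with a hypothetical bucket name: objects are public S3 data, so access needs no AWS account or credentials (unsigned requests) and no egress charge to the requester.

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous (unsigned) S3 client: no AWS credentials required
# for Open Data Registry buckets. Bucket name is hypothetical.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(Bucket="example-odr-bucket", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])  # first few objects and byte sizes
```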
ROSES
Research Opportunities in Space and Earth Sciences
NASA SMD's omnibus competitive grant solicitation, published annually. Used by CSDA to recruit PI evaluation teams (e.g., ROSES-2023 A.51 for PlanetiQ and Umbra, 70-day proposal window). Also the mechanism allocating NAS HEC resources to researchers with selected proposals. New program elements for commercial data evaluation appear as ROSES amendments.
SMCE
Science Managed Cloud Environment
BPS-managed cloud platform hosting analysis tools — Multi-Study Data Visualization, Environmental Data Application, and RadLab. Provides CPU compute adjacent to BPS S3 data. A model for other divisions considering how to provide analysis infrastructure without requiring users to independently manage cloud architecture. GeneLab workflows also run on HEC via SMCE coordination.
DMP
Data Management Plan
Formal plan specifying how a division or mission manages data through its lifecycle — from creation through long-term archiving. Three divisions (ESD, HPD, PSD) have division-level standardized DMPs. APD and BPS use repository- or mission-specific approaches — trading cross-mission consistency for flexibility. The report flags inconsistent DMP coverage as a cross-division interoperability gap.
HDRL
Heliophysics Data Resource Library
HPD's unified metadata and discovery layer federating SDAC and SPDF into a single search interface. Provides standardized high-level metadata across both heliophysics repositories. HelioCloud extends HDRL into the cloud with GPU and CPU compute, enabling scalable heliophysics analysis without local data download. HPD is the only division providing GPU cloud compute adjacent to data.
PDS
Planetary Data System
PSD's federated archive: 6 science discipline nodes (Atmospheres, Cartography & Imaging Sciences, Geosciences, Planetary Plasma Interactions, Ring-Moon Systems, Small Bodies) plus the NAIF ancillary data facility. Each node independently curates and serves its domain. Largest dataset count in SMD (27,460) due to per-mission granular archiving. Archives all NASA planetary mission data from Voyager and Pioneer through Perseverance and New Horizons.
OCSDO
Office of the Chief Science Data Officer
SMD office responsible for making NASA science data accessible, preserved, and ready for future discovery. Manages cross-division data policy, DOI/API/DMP standards, cloud transition strategy, and the annual repository metrics reporting. Kevin Murphy (CSDO) authored this report; Chelle Gentemann chairs the Steering Team. Also governs the CSDA commercial data acquisition policy framework.
ChatGSFC
ChatGSFC — GSFC AI Assistant
NASA Goddard Space Flight Center's internal AI assistant platform. Used in preparing this FY2024 report to combine, clarify, and summarize division-contributed metrics. Disclosed transparently in the document's AI Usage Disclosure per agency reporting standards (first NASA SP-series report to include such a disclosure). Content was reviewed and edited by the human author team led by Kevin Murphy.
Sci-X
Science Explorer
NASA's next-generation publication discovery and search platform, planned to succeed ADS (Astrophysics Data System) agency-wide. APD, HPD, and PSD identify Sci-X as their primary publication discovery mechanism. Aims to increase interlinking between repository data DOIs and related publications, enabling provenance tracking and AI-assisted cross-mission literature mining.