Abstract
Stratified microbial communities are central to ocean biogeochemical cycles, yet their vertical structure and functional potential remain under-characterized in oligotrophic regions. We present a metagenomic dataset from the surface and deep chlorophyll maximum (DCM) layers of the stratified Western Pacific Ocean, sampled at four stations spanning approximately 800 kilometres. Each of the eight samples yielded more than 22.9 Gb of high-quality Illumina HiSeq 2500 paired-end reads (Q20 > 95%, Q30 > 90%). De novo assemblies produced 1.3–1.9 million contigs per sample, with total assembly sizes of 948 Mb to 1.33 Gb and N50 values of 632–749 bp. Gene prediction identified ~5.26 million non-redundant genes across all samples, reflecting substantial microbial diversity and depth-specific variation. Assembly statistics, taxonomic profiles, and gene functional annotations are provided for technical validation and demonstrate the completeness of the dataset. This dataset offers annotated sequence data and environmental metadata suitable for benchmarking, method development, and comparative studies of marine metagenomes.
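The read-quality (Q20/Q30) and assembly (N50) figures above are standard summary metrics. The following is a minimal Python sketch of how they are conventionally defined, assuming contig lengths are already available as integers and base qualities are Phred+33 encoded (the usual Illumina 1.8+ FASTQ convention). The function names are hypothetical illustrations, not part of the authors' workflow, which relied on established tools.

```python
from collections.abc import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    total = sum(lengths)
    if total <= 0:
        raise ValueError("contig_lengths must contain assembled bases")
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # cumulative length has reached 50% of the assembly
            return length
    return lengths[-1]  # not reached when total > 0; kept so every path returns


def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 (Illumina 1.8+) encoding."""
    return [ord(ch) - offset for ch in quality_line]


def fraction_at_or_above(scores: Iterable[int], threshold: int) -> float:
    """Fraction of bases with Phred quality >= threshold (20 for Q20, 30 for Q30)."""
    values = list(scores)
    if not values:
        raise ValueError("no base qualities supplied")
    return sum(q >= threshold for q in values) / len(values)


# Toy usage with made-up values (not data from this study).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
quals = phred_scores("I5+")  # [40, 20, 10]
print(fraction_at_or_above(quals, 20), fraction_at_or_above(quals, 30))
```

In practice these metrics are computed over millions of reads and contigs by dedicated QC and assembly-statistics tools; the sketch only pins down the definitions behind the reported numbers.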
Data availability
All data supporting this study are publicly available. Raw metagenomic sequencing reads are available from the NCBI Sequence Read Archive under BioProject PRJNA1311452 (SRA accession SRP613464). Metagenome assemblies and metagenome-assembled genomes are available from NCBI GenBank under the same BioProject. Assembly FASTA files and associated annotation datasets are available via Figshare at https://doi.org/10.6084/m9.figshare.30060526.
Code availability
No custom scripts were used to generate this dataset. All bioinformatic analyses were performed with established tools; their software versions and parameters are detailed in the Methods section.
References
Xie, Z.-X. et al. Dissecting microbial community structure and metabolic activities at an oceanic deep chlorophyll maximum layer by size-fractionated metaproteomics. Prog. Oceanogr. 188, 102439 (2020).
Gong, F. et al. Response of microbial community of surface and deep chlorophyll maximum to nutrients and light in South China Sea. Front. Mar. Sci. 10, 1122765 (2023).
Giner, C. R. et al. Marked changes in diversity and relative activity of picoeukaryotes with depth in the world ocean. ISME J. 14, 437–449 (2020).
Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 1261359 (2015).
Thangaraj, S. et al. Unraveling prokaryotic diversity distribution and functional pattern on nitrogen and methane cycling in the subtropical Western North Pacific Ocean. Mar. Pollut. Bull. 196, 115569 (2023).
Venter, J. C. et al. Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74 (2004).
Walsh, E. A. et al. Bacterial diversity and community composition from seasurface to subseafloor. ISME J. 10, 979–989 (2016).
Beman, J. M. & Carolan, M. T. Deoxygenation alters bacterial diversity and community composition in the ocean’s largest oxygen minimum zone. Nat. Commun. 4, 2705 (2013).
Thangaraj, S. et al. Water mass structure determine the prokaryotic community and metabolic pattern in the Korea Strait during fall 2018 and 2019. Front. Mar. Sci. 10, 1215251 (2023).
Bengtsson‐Palme, J. et al. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data. Mol. Ecol. Resour. 15, 1403–1414 (2015).
Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2012).
Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
Hammer, O. PAST: Paleontological statistics software package for education and data analysis. Palaeontol. Electron. 4, 9 (2001).
Peng, Y., Leung, H. C., Yiu, S. M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428, https://doi.org/10.1093/bioinformatics/bts174 (2012).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 40, W445–W451 (2012).
Tatusov, R. L. et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28 (2001).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP613464 (2025).
NCBI GenBank https://identifiers.org/ncbi/bioproject:PRJNA1311452 (2025).
Thangaraj, S. Gene Catalog and Associated Functional Data for BioProject PRJNA1311452. figshare. Dataset https://doi.org/10.6084/m9.figshare.30060526 (2025).
Acknowledgements
The research and publication of this article were supported by funding from the National Key R&D Program of China (2019YFC1407800), the National Natural Science Foundation of China (41876134), and the Changjiang Scholar Program of the Chinese Ministry of Education (T2014253) to J.S.
Author information
Contributions
Satheeswaran Thangaraj (S.T.) and Jun Sun (J.S.) jointly conceived and designed the study. S.T. conducted the field sampling, performed the DNA extraction, and coordinated sample processing and all analyses. J.S. supervised the project and secured funding. Both authors contributed to analysing and interpreting the results and to drafting and revising the manuscript, and both approved the final version for publication.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Thangaraj, S., Sun, J. Depth Resolved Metagenomic Dataset from Surface and Deep Chlorophyll Maximum Layers in the Western Pacific Ocean.
Sci Data (2026). https://doi.org/10.1038/s41597-026-06706-5
DOI: https://doi.org/10.1038/s41597-026-06706-5
