Operational taxonomic unit (OTU) table characterizing water track and adjacent soil microbial communities in Taylor Valley, Antarctica during the 2012-13 austral summer

Summary

Abstract:

This data package includes the abundance of microbial operational taxonomic units (OTUs) for samples collected during the austral summer of 2012-2013 in the Lake Hoare and Goldman Glacier Basins of Taylor Valley, Antarctica. A total of twenty samples from on- and off-water track soils were collected and analyzed. Samples were collected from the Lake Hoare Basin on 27 December 2012 and from the Goldman Glacier Basin on 4 January 2013. The aim of the study was to identify how variation in the measured physical and chemical environment of water tracks within the two water track systems influenced soil microbial community structure and diversity. Soil bacterial biodiversity was assessed using cultivation-independent 16S rRNA gene sequencing.

Date Range:

December 27, 2012 to January 4, 2013

Data Citation

George, S., Fierer, N., Levy, J.S., Adams, B.J. 2020. Operational taxonomic unit (OTU) table characterizing water track and adjacent soil microbial communities in Taylor Valley, Antarctica during the 2012-13 austral summer. Environmental Data Initiative. DOI: 10.6073/pasta/a98c5ce00cc51d5424b07aebcfcf9f74. Dataset accessed 21 August 2025.

Dataset(s)

Description:

OTU Table

Variables:

DATASET_CODE

Label: Dataset code
Definition: Internal dataset code
Type: Nominal
Missing values: None specified

OTU_ID

Label: Operational Taxonomic Unit ID
Definition: The operational taxonomic unit ID (e.g., “OTU_137”).
Type: Nominal
Missing values: None specified

A1SLH

Label: Sample A1SLH
Definition: Sample A1SLH, where "A1" represents sample 1, collected off the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A2SLH

Label: Sample A2SLH
Definition: Sample A2SLH, where "A2" represents sample 2, collected off the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A3SGG

Label: Sample A3SGG
Definition: Sample A3SGG, where "A3" represents sample 3, collected off the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A4WTGG

Label: Sample A4WTGG
Definition: Sample A4WTGG, where "A4" represents sample 4, collected on the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A5SLH

Label: Sample A5SLH
Definition: Sample A5SLH, where "A5" represents sample 5, collected off the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A6WTLH

Label: Sample A6WTLH
Definition: Sample A6WTLH, where "A6" represents sample 6, collected on the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A7WTLH

Label: Sample A7WTLH
Definition: Sample A7WTLH, where "A7" represents sample 7, collected on the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A8SGG

Label: Sample A8SGG
Definition: Sample A8SGG, where "A8" represents sample 8, collected off the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A9WTLH

Label: Sample A9WTLH
Definition: Sample A9WTLH, where "A9" represents sample 9, collected on the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A10WTLH

Label: Sample A10WTLH
Definition: Sample A10WTLH, where "A10" represents sample 10, collected on the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A11WTGG

Label: Sample A11WTGG
Definition: Sample A11WTGG, where "A11" represents sample 11, collected on the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A12SLH

Label: Sample A12SLH
Definition: Sample A12SLH, where "A12" represents sample 12, collected off the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A13WTLH

Label: Sample A13WTLH
Definition: Sample A13WTLH, where "A13" represents sample 13, collected on the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A14SLH

Label: Sample A14SLH
Definition: Sample A14SLH, where "A14" represents sample 14, collected off the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A15WTGG

Label: Sample A15WTGG
Definition: Sample A15WTGG, where "A15" represents sample 15, collected on the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A16WTGG

Label: Sample A16WTGG
Definition: Sample A16WTGG, where "A16" represents sample 16, collected on the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A17WTLH

Label: Sample A17WTLH
Definition: Sample A17WTLH, where "A17" represents sample 17, collected on the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A18SLH

Label: Sample A18SLH
Definition: Sample A18SLH, where "A18" represents sample 18, collected off the water track in Lake Hoare Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A19SGG

Label: Sample A19SGG
Definition: Sample A19SGG, where "A19" represents sample 19, collected off the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

A20SGG

Label: Sample A20SGG
Definition: Sample A20SGG, where "A20" represents sample 20, collected off the water track in Goldman Glacier Basin. Values represent the number of raw (i.e., not rarefied) sequence reads for each OTU within this sample.
Type: Nominal
Missing values: None specified

TAXA_DOMAIN

Label: Taxonomic Domain
Definition: Taxonomic Domain of the OTU (e.g., “k_Bacteria”). Unclassified values indicated by “Unclassified”.
Type: Nominal
Missing values: None specified

TAXA_PHYLUM

Label: Taxonomic Phylum
Definition: Taxonomic Phylum of the OTU (e.g., “p_Proteobacteria”). Missing values indicated by blank cell.
Type: Nominal
Missing values: None specified

TAXA_CLASS

Label: Taxonomic Class
Definition: Taxonomic Class of the OTU (e.g., “c_Gammaproteobacteria”). Missing values indicated by blank cell, or “c__” if class could not be identified.
Type: Nominal
Missing values: None specified

TAXA_ORDER

Label: Taxonomic Order
Definition: Taxonomic Order of the OTU (e.g., “o_Pseudomonadales”). Missing values indicated by blank cell, or “o__” if order could not be identified.
Type: Nominal
Missing values: None specified

TAXA_FAMILY

Label: Taxonomic Family
Definition: Taxonomic Family of the OTU (e.g., “f_Pseudomonadaceae”). Missing values indicated by blank cell, or “f__” if family could not be identified.
Type: Nominal
Missing values: None specified

TAXA_GENUS

Label: Taxonomic Genus
Definition: Taxonomic Genus of the OTU (e.g., “g_Pseudomonas”). Missing values indicated by blank cell, or “g__” if genus could not be identified.
Type: Nominal
Missing values: None specified

TAXA_SPECIES

Label: Taxonomic Species
Definition: Taxonomic Species of the OTU (e.g., “s_viridiflava”). Missing values indicated by blank cell, or “s__” if species could not be identified.
Type: Nominal
Missing values: None specified
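
The sample column names defined above appear to follow one pattern: "A" plus the sample number, then "WT" for on-track or "S" for off-track samples, then "LH" (Lake Hoare Basin) or "GG" (Goldman Glacier Basin). The sketch below decodes that pattern. The regular expression and the names SampleInfo and decode_sample are illustrative assumptions inferred from the definitions, not part of the data package.

```python
import re
from typing import NamedTuple

# Pattern inferred from the variable definitions (an assumption):
# "A" + sample number, "WT" (on the water track) or "S" (off the
# water track), then a two-letter basin code.
SAMPLE_RE = re.compile(r"^A(?P<number>\d+)(?P<position>WT|S)(?P<basin>LH|GG)$")

POSITIONS = {"WT": "on-track", "S": "off-track"}
BASINS = {"LH": "Lake Hoare Basin", "GG": "Goldman Glacier Basin"}

# The twenty sample columns listed in the variable definitions.
SAMPLE_COLUMNS = [
    "A1SLH", "A2SLH", "A3SGG", "A4WTGG", "A5SLH", "A6WTLH", "A7WTLH",
    "A8SGG", "A9WTLH", "A10WTLH", "A11WTGG", "A12SLH", "A13WTLH",
    "A14SLH", "A15WTGG", "A16WTGG", "A17WTLH", "A18SLH", "A19SGG",
    "A20SGG",
]


class SampleInfo(NamedTuple):
    number: int
    position: str
    basin: str


def decode_sample(column_name: str) -> SampleInfo:
    """Split a sample column name such as 'A4WTGG' into its parts."""
    match = SAMPLE_RE.match(column_name)
    if match is None:
        raise ValueError(f"not a sample column: {column_name!r}")
    return SampleInfo(
        number=int(match["number"]),
        position=POSITIONS[match["position"]],
        basin=BASINS[match["basin"]],
    )
```

Applying decode_sample to every entry in SAMPLE_COLUMNS gives a small sample sheet that can be used to group columns by basin or by on-track versus off-track position.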

File:

SOILS_WT_OTU.csv (214.48 KB)

Short name:

SOILS_WT_OTU

Dataset ID:

264
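
A minimal sketch of reading the table follows. It assumes the file is comma-separated with a header row whose column names match the variable list. The rows in EXAMPLE, including the DATASET_CODE value, are invented for illustration and use two sample columns instead of twenty. The function names load_otu_table and read_depth are hypothetical helpers.

```python
import csv
import io

TAXON_COLUMNS = [
    "TAXA_DOMAIN", "TAXA_PHYLUM", "TAXA_CLASS", "TAXA_ORDER",
    "TAXA_FAMILY", "TAXA_GENUS", "TAXA_SPECIES",
]
# Columns that hold neither read counts nor taxonomy.
ID_COLUMNS = ["DATASET_CODE", "OTU_ID"]
# Markers that the variable definitions give for unidentified ranks.
UNASSIGNED = {"", "Unclassified", "c__", "o__", "f__", "g__", "s__"}


def load_otu_table(handle):
    """Return (counts, taxonomy) dictionaries keyed by OTU_ID."""
    counts, taxonomy = {}, {}
    for row in csv.DictReader(handle):
        otu = row["OTU_ID"]
        # Every remaining column is treated as a sample column of raw reads.
        counts[otu] = {
            name: int(value)
            for name, value in row.items()
            if name not in ID_COLUMNS and name not in TAXON_COLUMNS
        }
        # Map blank cells and placeholder markers to None.
        taxonomy[otu] = {
            rank: None if row[rank].strip() in UNASSIGNED else row[rank].strip()
            for rank in TAXON_COLUMNS
        }
    return counts, taxonomy


def read_depth(counts):
    """Sum raw reads per sample across all OTUs."""
    totals = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            totals[sample] = totals.get(sample, 0) + n
    return totals


# Invented rows for illustration only (not real data).
EXAMPLE = (
    "DATASET_CODE,OTU_ID,A1SLH,A4WTGG,TAXA_DOMAIN,TAXA_PHYLUM,TAXA_CLASS,"
    "TAXA_ORDER,TAXA_FAMILY,TAXA_GENUS,TAXA_SPECIES\n"
    "SOILS_WT_OTU,OTU_1,10,0,k_Bacteria,p_Proteobacteria,"
    "c_Gammaproteobacteria,o_Pseudomonadales,f_Pseudomonadaceae,"
    "g_Pseudomonas,s__\n"
    "SOILS_WT_OTU,OTU_2,5,7,k_Bacteria,,,,,,\n"
)

counts, taxonomy = load_otu_table(io.StringIO(EXAMPLE))
depth = read_depth(counts)
```

For the real file, the same function could be given a handle from open("SOILS_WT_OTU.csv", newline=""). Because the counts are raw reads, comparing per-sample totals from read_depth is a sensible first check before any between-sample comparison.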

People

Principal Investigator(s):

Contact:

McMurdo Dry Valleys LTER Information Manager

Associated Personnel:

Lab Technician

Data Manager

Methodology

Methods:

As described in the associated manuscript, soil samples and pore water were collected from the upper 10 cm of the soil horizon using aseptic techniques, and were stored in sterile Whirl-Pak bags at -20°C until processing. Sediment and pore water collected from the darkened portions of water tracks were designated “on-track,” and samples from adjacent lighter soils were classified as “off-track.” Off-track samples were located at least 5 m from the current edge of the water tracks. Wet, on-track soils have a typical albedo of 0.15, while off-track soil albedo is generally 0.22, making them readily distinguishable in the field.

DNA extraction and microbial community analyses were conducted using the cultivation-independent 16S rRNA gene sequencing approach described in Prober et al., 2015 (doi: 10.1111/ele.12381). Total genomic DNA was extracted from each sample using the MoBio PowerSoil DNA Isolation Kit. For microbial analyses, the V4 hypervariable region of the 16S rRNA gene was PCR amplified using the 515f and 806r primer pair, which captures both Bacteria and Archaea. Three PCRs were run per sample, and the amplicons from the replicate reactions were pooled. Each primer pair included Illumina adapters and 12-bp error-correcting barcodes unique to each sample, as described in the Earth Microbiome Project protocol (Thompson et al., 2017; doi: 10.1038/nature24621). After gel visualization to confirm amplification, the PicoGreen dsDNA assay was used to quantify amplicon yields. The amplicons were then pooled in equimolar concentrations for sequencing on the Illumina MiSeq instrument. DNA sequencing was completed at the University of Colorado Next Generation Sequencing Facility using the 2 x 150 bp paired-end sequencing chemistry. Four DNA extraction and four no-template PCR ‘blanks’ were included in the run to check for potential contamination.
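
The equimolar pooling step can be illustrated with simple arithmetic. Because all amplicons come from the same gene region and are roughly the same length, equal mass is approximately equal molarity, so each sample contributes a volume of target mass divided by its measured concentration. The sketch below assumes that simplification. The function name, the concentrations, and the 50 ng target are hypothetical and are not values from the study.

```python
def pooling_volumes(conc_ng_per_ul, target_ng):
    """Volume (uL) of each amplicon needed to contribute target_ng of DNA.

    Assumes amplicons of near-equal length, so equal mass approximates
    equal molarity (a simplification, not the authors' stated procedure).
    """
    volumes = {}
    for sample, conc in conc_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"non-positive concentration for {sample}")
        volumes[sample] = target_ng / conc
    return volumes


# Hypothetical PicoGreen readings in ng/uL, for illustration only.
example = pooling_volumes({"A1SLH": 10.0, "A4WTGG": 25.0}, target_ng=50.0)
```

A sample with a lower amplicon yield contributes a proportionally larger volume, which is how the pool ends up with a similar amount of DNA from every sample.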

Sequences were demultiplexed using a custom Python script (‘prep_fastq_for_uparse.py’, at: https://github.com/leffj/helper-code-for-uparse), and the UPARSE pipeline was used for quality filtering and phylotype (i.e., operational taxonomic unit) clustering. Quality filtering was conducted using a maximum expected error (maxee) value of 0.5, with paired-end sequences merged prior to downstream processing. Representative sequences of phylotypes that were less than 75% similar to sequences in the Greengenes database were removed; afterwards, the raw sequences were mapped to phylotypes at a 97% similarity cutoff. Taxonomic classification of each phylotype was determined using the Ribosomal Database Project classifier against the Greengenes database with a confidence threshold of 0.5. The OTU table that this metadata describes is the result.
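
To make the quality-filtering threshold concrete, the sketch below assumes that the value of 0.5 is a maximum expected error setting of the kind used in UPARSE-style filtering. The expected number of errors in a read is the sum of the per-base error probabilities implied by its Phred scores, 10^(-Q/10). The function names and the Phred+33 encoding are assumptions for illustration. This is not the pipeline's own code.

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities implied by Phred scores.

    Each character encodes Q = ord(char) - phred_offset, and the
    probability that the base is wrong is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_filter(quality_string, max_expected_errors=0.5):
    """Keep a read only if its expected error count is within the limit."""
    return expected_errors(quality_string) <= max_expected_errors


# "I" encodes Q40 (error probability 0.0001); "5" encodes Q20 (0.01).
high_quality = "I" * 100   # expected errors about 0.01
low_quality = "5" * 60     # expected errors about 0.6
```

Under these assumptions, the 100-base Q40 read passes and the 60-base Q20 read is discarded, since 60 bases at a 1% error rate already exceed 0.5 expected errors.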

The OTU table and eDNA sequences used for this study were from a legacy project, from which only the OTU table remained.
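
Because only the raw count table is available, any normalization for sequencing depth has to start from the counts themselves. One common option is rarefaction, which randomly subsamples each sample to a common number of reads. The sketch below is a generic illustration and not a step described for this dataset. The function name, the seed, and the example counts are hypothetical.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    `counts` maps OTU ID to raw read count for a single sample.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample total {total}")
    rng = random.Random(seed)
    # Expand counts into one label per read, then draw without replacement.
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    rarefied = {otu: 0 for otu in counts}
    for otu in drawn:
        rarefied[otu] += 1
    return rarefied


# Invented counts for illustration only.
sample = {"OTU_1": 30, "OTU_2": 15, "OTU_3": 5}
subsample = rarefy(sample, depth=20)
```

Samples whose total read count falls below the chosen depth raise an error here; in practice they would be dropped or the depth lowered, and that choice is left to the analyst.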

Additional information:

Funding for this study was provided by the National Science Foundation (NSF) as follows:

NSF OPP-1637708, LTER: Ecosystem Response to Amplified Landscape Connectivity in the McMurdo Dry Valleys, Antarctica, PI: Michael N. Gooseff
NSF OPP-1341629, Collaborative Research: The Role of Glacial History on the Structure and Functioning of Ecological Communities in the Shackleton Glacier Region of the Transantarctic Mountains, PI: Noah Fierer
NSF OPP-1847067, CAREER: Linking Antarctic Cold Desert Groundwater to Thermokarst & Chemical Weathering in Partnership with the Geoscience UAV Academy, PI: Joseph S. Levy

Research Section:

LTER Core Areas:

population dynamics

MCM Keywords:

Keywords:
