Bioinformatics Pipeline for Zostera marina (Eelgrass)

About the Project

In the lower Chesapeake Bay, Virginia, warmer water temperatures in recent years have caused large-scale diebacks of eelgrass (Zostera marina) meadows. In contrast, many eelgrass populations in Back Sound, North Carolina appear to be more resilient to warming water temperatures. Understanding the drivers of these regional differences in resilience could help restore eelgrass meadows more effectively in a changing climate. Working with a network of intended users from reserves, state agencies, and Chesapeake Bay nonprofits, this project compared resilience traits of eelgrass populations in Virginia and North Carolina by conducting reciprocal restoration trials and genomic sequencing. The results indicate that seed source, in addition to site selection, is an important consideration for future eelgrass restoration.

About this Resource

This tool provides a complete, step-by-step workflow for processing and analyzing whole-genome resequencing data from Zostera marina (eelgrass). It includes shell scripts and SLURM job submission commands for each major stage: adapter trimming, sequence concatenation, read mapping, variant calling, SNP filtering, and genotype imputation. The pipeline also covers downstream processing steps such as depth and missingness calculations, population-level VCF filtering, and genotype matrix generation for population genetic analyses. Designed for use on high-performance computing clusters, this pipeline supports reproducible genomic data analysis for ecological, evolutionary, and restoration genetics applications.
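To make the missingness step concrete, the following is a minimal Python sketch of a per-site missingness calculation on VCF records. It is an illustration only and is not taken from the pipeline's own scripts, which rely on shell commands and tools such as vcftools. The function names (`site_missingness`, `keep_sites`), the threshold parameter `max_missing_fraction`, the sample names, and the example records are all hypothetical.

```python
def site_missingness(vcf_lines):
    """Return {(chrom, pos): fraction of samples with a missing genotype}."""
    result = {}
    for line in vcf_lines:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        # Column 9 (index 8) is FORMAT; locate the GT subfield within it.
        gt_index = fields[8].split(":").index("GT")
        samples = fields[9:]
        missing = 0
        for sample in samples:
            genotype = sample.split(":")[gt_index]
            alleles = genotype.replace("|", "/").split("/")
            # Count a genotype as missing if any allele is ".".
            if any(allele == "." for allele in alleles):
                missing += 1
        result[(chrom, pos)] = missing / len(samples)
    return result


def keep_sites(vcf_lines, max_missing_fraction=0.2):
    """Return the sites whose missing fraction is at or below the threshold."""
    missingness = site_missingness(vcf_lines)
    return [site for site, frac in missingness.items()
            if frac <= max_missing_fraction]


# Hypothetical records: two Virginia and two North Carolina samples.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tVA_01\tVA_02\tNC_01\tNC_02",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:12\t./.:0\t1/1:9\t0/0:15",
    "chr1\t250\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/0:10\t0/1:8\t0/1:11\t1/1:7",
]

if __name__ == "__main__":
    print(site_missingness(example_vcf))
    print(keep_sites(example_vcf))
```

In this sketch the threshold is the maximum allowed fraction of missing genotypes per site. Filtering tools define their missingness options in their own ways, so check the tool's documentation before carrying a threshold over from this example.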

This resource is intended for bioinformaticians, population geneticists, and ecological genomics researchers working with Zostera marina or other marine species. It is designed for users with intermediate to advanced experience in Linux-based high-performance computing environments, shell scripting, and genomics software such as GATK, vcftools, and samtools. The pipeline suits whole-genome resequencing projects that require a scalable, reproducible workflow for variant discovery, filtering, and population-level genotype analysis.