Thursday, August 13, 2015

Frequently Asked Questions


  • How should I cite discoveries made using ExAC data?

    We anticipate the publication of a flagship paper describing this data set in early 2015. Until that is published, please cite: Exome Aggregation Consortium (ExAC), Cambridge, MA (URL: http://exac.broadinstitute.org) [date (month, year) accessed].
  • I have identified a rare variant in ExAC that I believe is associated with a specific clinical phenotype. What phenotype data are available for these individuals?

    Many of the individuals who have contributed data to ExAC were not fully consented for phenotype data sharing, and unfortunately at this time we are typically unable to provide any information about the clinical status of variant carriers. We have made every effort to exclude individuals with severe pediatric diseases from the ExAC data set, and certainly do not expect our data set to be enriched for such individuals, but we typically cannot rule out the possibility that some of our participants do actually suffer from your disease of interest. 

    In the future it may be possible to obtain phenotype data for a subset of ExAC samples. If you are interested in using such data in your studies, please email us.
  • What genome build is the ExAC data based on?

    All data are based on GRCh37/hg19.
  • What version of dbSNP was used to annotate variants?

    dbSNP 135. The browser supports rsIDs up to dbSNP 141, but rsIDs added after version 135 are not included in the VCF.
  • What version of Gencode was used to annotate variants?

    Version 19 (annotated with VEP version 77).
  • Why does the browser seem to disagree with the ExAC VCF at this multiallelic site?

    Because of the limitations of the VCF format, all alleles at a multi-allelic site are combined on a single VCF line. This adds complexity to otherwise simple variants, so when loading variants into the browser we apply a minimal representation script. For instance, at a site where REF is GC and the ALT alleles are TC,G, the first ALT allele (GC->TC) is actually a SNP and is represented in the browser as G->T.
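
    The trimming step described above can be sketched as follows. This is a minimal Python illustration of a common minimal-representation approach, not necessarily the exact script the browser uses; the function names and the position 100 are hypothetical.

```python
def minimal_representation(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Illustrative sketch only; pos is a 1-based coordinate as in VCF.
    """
    # Strip identical trailing bases (position is unaffected).
    while min(len(ref), len(alt)) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Strip identical leading bases, shifting the position right each time.
    while min(len(ref), len(alt)) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_multiallelic(pos, ref, alts):
    """Split a comma-separated ALT field and minimize each allele separately."""
    return [minimal_representation(pos, ref, alt) for alt in alts.split(",")]


if __name__ == "__main__":
    # The example from the text: REF=GC, ALT=TC,G at a made-up position.
    for record in split_multiallelic(100, "GC", "TC,G"):
        print(record)
```

    With the example site, the first allele reduces to the SNP G->T, while the second (GC->G) is already minimal and remains a one-base deletion.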
  • What are the restrictions on data usage?

    All data here are released under a Fort Lauderdale Agreement for the benefit of the wider biomedical community. You can freely download and search the data, and use them for publications focused on specific sets of variants (for instance, assessing the frequency of a set of candidate causal variants observed in a collection of rare disease patients). However, we ask that you not publish global (genome-wide) analyses of these data until after the ExAC flagship paper has been published, estimated to be in early 2015. If you’re uncertain which category your analyses fall into, please email us.

    The data are available under the ODC Open Database License (ODbL) (summary available here): you are free to share and modify the ExAC data so long as you attribute any public use of the database, or works produced from the database; keep the resulting data sets open; and offer your shared or adapted version of the data set under the same ODbL license.
  • Are all the individuals in the Exome Variant Server included?

    No. We were not given permission from dbGaP to include individuals from several of the cohorts included in the NHLBI's Exome Sequencing Project. As a result, genuine rare variants that are present in the EVS may not be observed in ExAC.
  • What populations are represented in the ExAC data?

    Population                      Male Samples  Female Samples   Total
    African/African American (AFR)         1,888           3,315   5,203
    Latino (AMR)                           2,254           3,535   5,789
    East Asian (EAS)                       2,016           2,311   4,327
    Finnish (FIN)                          2,084           1,223   3,307
    Non-Finnish European (NFE)            18,740          14,630  33,370
    South Asian (SAS)                      6,387           1,869   8,256
    Other (OTH)                              275             179     454
    Total                                 33,644          27,062  60,706
  • What cohorts are represented in the ExAC data?

    Consortium/Cohort                                                             Samples
    1000 Genomes                                                                    1,851
    Bulgarian Trios                                                                   461
    GoT2D                                                                           2,502
    Inflammatory Bowel Disease                                                      1,675
    Myocardial Infarction Genetics Consortium                                      14,622
    NHLBI-GO Exome Sequencing Project (ESP)                                         3,936
    National Institute of Mental Health (NIMH) Controls                               364
    SIGMA-T2D                                                                       3,845
    Sequencing in Suomi (SISu)                                                        948
    Swedish Schizophrenia & Bipolar Studies                                        12,119
    T2D-GENES                                                                       8,980
    Schizophrenia Trios from Taiwan                                                 1,505
    The Cancer Genome Atlas (TCGA)                                                  7,601
    Tourette Syndrome Association International Consortium for Genomics (TSAICG)      297
    Total                                                                          60,706
