APOB, TTC39B, ATF3, RGS1, LIPG,I am trying to get the genomic coordinates (as in bp) with +/-5KB position of these genes via UCSC TableBrowser / MySQL server using NCBI 36/hg18 build. I have tried to get this information via TableBrowser, but I can't find the chromStart and chromEnd field in the given table. Am I missing something ?
Also, please share your favorite tutorial / docs that explain the schema/tables in UCSC MySQL server.
Thanks in advance.
Answer:
from the table browser http://genome.ucsc.edu/cgi-bin/hgTables?command=start , select group= Genes , track= UCSC gene , table=knownGene and then 'describe table schema'
You'll see that knownGene is linked to kgXref :
hg18.kgXref.kgID (via knownGene.name)
and kgXref contains a column named 'geneSymbol'.all in one, you can get the positions of the transcripts for those genes:
mysql -h genome-mysql.cse.ucsc.edu -A -u genome -D hg18 \
-e 'select distinct X.geneSymbol,G.chrom,G.txStart-5000,G.txEnd+5000 \
from knownGene as G, kgXref as X \
where X.geneSymbol in ("APOB", "TTC39B", "ATF3", "RGS1", "LIPG") \
and X.kgId=G.name'
+------------+-------+----------------+--------------+
| geneSymbol | chrom | G.txStart-5000 | G.txEnd+5000 |
+------------+-------+----------------+--------------+
| APOB | chr2 | 21072805 | 21125450 |
| ATF3 | chr1 | 210843616 | 210865739 |
| ATF3 | chr1 | 210800319 | 210865739 |
| ATF3 | chr1 | 210843616 | 210865704 |
| ATF3 | chr1 | 210849982 | 210864212 |
| LIPG | chr18 | 45337424 | 45378276 |
| LIPG | chr18 | 45337424 | 45367217 |
| RGS1 | chr1 | 190806479 | 190820782 |
| TTC39B | chr9 | 15156560 | 15302244 |
| TTC39B | chr9 | 15156560 | 15227442 |
| TTC39B | chr9 | 15171584 | 15302244 |
| TTC39B | chr9 | 15172968 | 15268702 |
| TTC39B | chr9 | 15172968 | 15227442 |
+------------+-------+----------------+--------------+
I agree that the information is hard to find: I only knew where to find it because I use to play regularly with those tables.The UCSC mailing list is a good place to find this kind of information. I also did a lot of reverse engineering by just 'greping' the flat files available from the UCSC.
The same script works for hg19 as well.
No comments:
Post a Comment