Gene Sorter Test Protocol

Config files

Check contents of config files in /usr/local/apache/cgi-bin/hgNearData *and* hgGeneData (genome.ra).
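When reviewing the config files, it can help to list their stanzas programmatically. The following is a minimal sketch, assuming the usual .ra layout of "field value" lines grouped into stanzas separated by blank lines. The function name parse_ra and the sample stanzas are hypothetical, and the sketch ignores any line-continuation or include handling the real reader may support.

```python
def parse_ra(text):
    """Parse .ra text into a list of stanzas (dicts of field -> value)."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current stanza, if one is open.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # skip comment lines
        # First word is the field name; the rest of the line is its value.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


# Made-up sample for illustration only; real stanzas live in files
# such as genome.ra under hgNearData and hgGeneData.
SAMPLE = """\
# example only
name exampleColumn
shortLabel Example
visibility on

name otherColumn
visibility off
"""

for stanza in parse_ra(SAMPLE):
    print(stanza.get("name"), stanza.get("visibility", "(unset)"))
```

To review a real file, read its contents with open(path).read() and pass the text to parse_ra.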

Core tables

ID | Test Name | Action | Expected Result | RT
1 | joinerCheck | Review all.joiner rules. | Rules contain all tables.
1.1 | joinerCheck | Run joinerCheck -keys -identifier=knownGeneId. (For worm, use identifier=wormBaseId. For yeast, use identifier=sgdCodingId. For fly, use identifier=bdgpTranscriptId.) | No errors.
1.2 | joinerCheck | Run joinerCheck -keys -identifier=knownIsoformCluster. (For fly, use identifier=bdgpIsoformCluster. For yeast, use identifier=sgdIsoformCluster.) | No errors.
2 | Data integrity | Select a region in knownGenes with overlapping alignments, and view the cluster assignments. | Overlaps are in the same cluster.
3 | Core tables | Push knownIsoforms and knownCanonical to hgwbeta. Update the hgNearOk column in hgcentralbeta.dbDb. | Gene Sorter is available on hgwbeta.
3.1 | Sort options | View the sort menu. | Options available should be:
  • Gene Distance
  • Chromosome
  • Name similarity
  • Alphabetical
  • GO similarity
A note on sort orders: some are absolute, some are relative. The absolute sort orders are Chromosome and Alphabetical; these query the full set of known genes. Relative orders always keep the gene of interest in the center of the display set.
3.2 | Configuration | Click on the configure button. | Columns available should be:
  • #
  • name
  • UniProtKB
  • SP acc
  • genbank
  • 5' UTR Fold
  • 3' UTR Fold
  • genome position
  • exon count
  • pfam domains
  • PDB
  • gene ontology
  • description
Keep number, name, genome position and description selected. (This is the default.) Note that genome position is a midpoint.
4 | Alphabetical order | Set the sort order to alphabetical. Select any gene to search on. For example: TLR1 (one of a group of highly conserved genes). | An ordered list appears. It will probably not contain the gene you searched on. The values displayed come from the geneSymbol column in the kgXref table. Duplicates will occur if a gene aligns in more than one place. The value in the search box will switch to the kgId from knownIsoforms.transcript. Known issue: lower-case IDs are pushed to the end of the sort list. (RT #1127)
4.1 | Name filter | Restrict the name list using the filter option. Filters can be constructed for any column. | An ordered list appears based on the filter.
5 | Chromosome order | Set the sort order to chromosome. Set the display to 25. Click on the go button. | The list that appears contains 25 elements, sorted by genome position. Again, it will probably not contain the gene you searched on. In the case of multiple alignments, they will be sorted by chromosome, in alphabetical order of chrom name (RT #1036). Alphabetical order is chr1, chr10, chr11..chr2, chr20. Numerical order would be better.
5.1 | Chromosome filter | Restrict the chromosome. Your input for chromosome name must match the chromInfo table exactly, and the start and end cannot include commas. (RT #998) | An ordered list appears based on the filter. Note the item that appears in the 13th position.
6 | Gene distance | Open another instance of the Gene Sorter. Make sure the display is set to 25. Search on the item that appears in the 13th position in the first Gene Sorter. Set the sort order to gene distance. Click on the go button. | The item that you searched on appears first in the list. The list continues based on the absolute values of the median distances.
6.1 | Name link | Use the links in the name column. | The selected gene will move to the top of the list. (This has no effect when the sort order is alphabetical or chromosome.)
7 | Name similarity | Set the sort order to name similarity. Click on the go button. | The item that you searched on appears first in the list. The list continues based on similar names, regardless of chromosome location.
8 | GO similarity | Use the configure function to add the "Gene Ontology" column. Set the sort order to GO similarity. Click on the go button. | The item that you searched on appears first in the list. The list continues based on shared GO values, regardless of chromosome location.
8.1 | GO similarity n/a | Select a gene that has "n/a" in the gene ontology column. | Only that gene appears in the list.
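The ordering notes in tests 4, 5 and 6 can be illustrated with a short Python sketch. It assumes that the lower-case issue (RT #1127) and the chr1, chr10, chr11 ordering (RT #1036) both come from plain byte-order string comparison, which the protocol does not confirm. It also assumes gene distance is measured between transcript midpoints on the same chromosome. All function names, gene names and coordinates are hypothetical.

```python
import re


def alphabetical_key(symbol):
    """Case-insensitive key, so lower-case IDs are not pushed to the end."""
    return (symbol.lower(), symbol)


def chrom_key(chrom):
    """Sort chr1, chr2, ..., chr10 numerically; other names (chrX) follow."""
    match = re.fullmatch(r"chr(\d+)", chrom)
    if match:
        return (0, int(match.group(1)), "")
    return (1, 0, chrom)


def by_gene_distance(genes, selected):
    """Order genes on the selected gene's chromosome by midpoint distance.

    genes maps a gene name to (chrom, start, end). The selected gene has
    distance 0, so it always comes first.
    """
    sel_chrom, sel_start, sel_end = genes[selected]
    sel_mid = (sel_start + sel_end) // 2
    distances = []
    for name, (chrom, start, end) in genes.items():
        if chrom != sel_chrom:
            continue  # distance is only defined within one chromosome
        distances.append((abs((start + end) // 2 - sel_mid), name))
    return [name for _, name in sorted(distances)]


symbols = ["TLR1", "abc1", "BRCA2"]
print(sorted(symbols))                        # byte order: lower case last
print(sorted(symbols, key=alphabetical_key))  # case-insensitive order

chroms = ["chr1", "chr10", "chr11", "chr2", "chr20", "chrX"]
print(sorted(chroms))                 # alphabetical, as in test 5
print(sorted(chroms, key=chrom_key))  # numerical order

example_genes = {
    "geneA": ("chr4", 100, 200),
    "geneB": ("chr4", 400, 600),
    "geneC": ("chr4", 0, 100),
    "geneD": ("chr5", 100, 200),
}
print(by_gene_distance(example_genes, "geneA"))
```

Comparing the first and second output of each pair shows what a tester should expect to see today versus what a numerical or case-insensitive ordering would look like.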

Protein Homology

table | sort option | configuration column
knownBlastTab | Protein Homology - BLASTP | Bits, E-Value, %ID
rankProp | Protein Homology - rankProp | Rankprop score
spPsiBlast | Protein Homology - PSI-BLAST | PSI-BLAST E-Value

knownBlastTab

spPsiBlast

rankProp

Pfam

Gene Maps

table | column(s) in gene sorter | source table
knownToRefSeq | RefSeq | refGene
knownToLocusLink (will be going away) | LocusLink | refGene
knownToGnfAtlas2, knownToGnf1h (human) | GNF Atlas ID 2, GNF Atlas 2, Max GNF Atlas 2 | gnfAtlas2, (affyGnf1h)
knownToGnfAtlas2, knownToGnf1m (mouse) | GNF1M ID, GNF Atlas2, Max GNF Atlas2 | gnfAtlas2, (affyGnf1m)
knownToU95 (human only; moderately sparse) | U95 ID, GNF U95 (expression data), Max GNF U95 | affyU95
knownToU133 (human only) | U133 id only? | affyU133
knownToU74 (mouse only) | U74 ID, GNF U74a, GNF U74b, GNF U74c | affyU74
knownToMOE430 (mouse only) | MOE430 ID | affyMOE430
knownToMOE430A (mouse only) | Rinn Sex Exp, Max Rinn Sex | also affyMOE430

Expression data

table | sort option | configuration column
knownExpDistance (human only; large; not published) | Expression (UCLA) | UCLA Exp Delta
gnfU95Distance (human only; large) | Expression (GNF Atlas 1) | GNF Atlas 1 Delta (sparse)
gnfAtlas2Distance (fairly large) | Expression (GNF Atlas 2) | GNF Atlas2 Delta (sparse)
rankProp | Expression (GNF Atlas 2) | GNF Atlas2 Delta (sparse)
affyGnfU74ADistance, affyGnfU74BDistance, affyGnfU74CDistance (mouse only) | Expression (GNF U74A) | none

To test this part of the gene sorter:

BLAST data (orthologs in other species)

Counts where eValue = 0

release | human | mouse | rat | zebrafish | drosophila | c. elegans | yeast
hg17 | n/a | 14,328 | 5,188 | 4,493 | 1,734 | 994 | 261
mm5 | 12,536 | n/a | 5,326 | 4,629 | 1,467 | 821 | 222
mm6 | 9,283 | n/a | 4,178 | 3,664 | 1,049 | 582 | 155

Counts where identity = 100

release | human | mouse | rat | zebrafish | drosophila | c. elegans | yeast
hg17 | n/a | 529 | 239 | 54 | 21 | 1 | 0
mm5 | 898 | n/a | 740 | 81 | 30 | 0 | 0
mm6 | 648 | n/a | 547 | 52 | 21 | 1 | 0
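Counts like those in the two tables above could be gathered with one query per ortholog table. The sketch below shows the shape of such a query. It is not the procedure used to produce the published numbers. It uses an in-memory SQLite database as a stand-in for the real server, and the table name exampleBlastTab, the helper count_where and the sample rows are all hypothetical; only the eValue and identity column names are taken from the headings above.

```python
import sqlite3


def count_where(conn, table, condition):
    """Return the number of rows in table that satisfy condition.

    table and condition are interpolated into the SQL text, so pass
    only trusted, hard-coded values.
    """
    cur = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {condition}")
    return cur.fetchone()[0]


# Stand-in database with a hypothetical ortholog table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exampleBlastTab "
    "(query TEXT, target TEXT, identity REAL, eValue REAL)"
)
conn.executemany(
    "INSERT INTO exampleBlastTab VALUES (?, ?, ?, ?)",
    [
        ("q1", "t1", 100.0, 0.0),
        ("q2", "t2", 98.5, 0.0),
        ("q3", "t3", 100.0, 1e-50),
        ("q4", "t4", 75.0, 1e-5),
    ],
)

zero_evalue = count_where(conn, "exampleBlastTab", "eValue = 0")
full_identity = count_where(conn, "exampleBlastTab", "identity = 100")
print("eValue = 0:", zero_evalue)
print("identity = 100:", full_identity)
```

Running the same two conditions against each species table for a release, and comparing the results with the previous release, gives a quick sanity check on newly built BLAST data.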