4b3dd9e844b2a6c502e5405e70f7976e30df7225
braney
  Thu Apr 25 13:13:51 2024 -0700
knownGeneV45lift37 on hg19

diff --git src/hg/makeDb/trackDb/human/hg19/knownGeneV45lift37.html src/hg/makeDb/trackDb/human/hg19/knownGeneV45lift37.html
new file mode 100644
index 0000000..a18605f
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg19/knownGeneV45lift37.html
@@ -0,0 +1,178 @@
+<h2>Description</h2>
+<p>
+The GENCODE Genes track (version 45, January 2024) shows high-quality manual
+annotations merged with evidence-based automated annotations across the entire
+human genome generated by the
+<a href="https://www.gencodegenes.org/" target="_blank">GENCODE project</a>.
+By default, only the basic gene set is
+displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts
+that GENCODE believes will be useful to the majority of users.</p>
+
+<p>
+The track includes protein-coding genes, non-coding RNA genes, and pseudo-genes, though pseudo-genes
+are not displayed by default. It contains annotations on the reference chromosomes as well as
+assembly patches and alternative loci (haplotypes).</p>
+
+<p>
+The following table provides statistics for the v45 release derived from the GTF file that contains
+annotations only on the main chromosomes. More information on how they were generated can be found
+on the <a target="_blank" href="https://www.gencodegenes.org/human/stats.html">GENCODE site</a>.</p>
+
+<p>
+<blockquote><table class="stdTbl">
+<tr><th COLSPAN=4>GENCODE v45 Release Stats</th></tr>
+<tr align=left><th>Genes</th><th>Observed</th><th>Transcripts</th><th>Observed</th></tr>
+<tr align=left><td>Protein-coding genes</td><td>19,395</td><td>Protein-coding transcripts</td><td>89,110</td></tr>
+<tr align=left><td>Long non-coding RNA genes</td><td>20,424</td><td><font size="-1">- full length protein-coding</font></td><td>64,028</td></tr>
+<tr align=left><td>Small non-coding RNA genes</td><td>7,565</td><td><font size="-1">- partial length protein-coding</font></td><td>25,082</td></tr>
+<tr align=left><td>Pseudogenes</td><td>14,719</td><td>Nonsense mediated decay transcripts</td><td>21,427</td></tr>
+<tr align=left><td>Immunoglobulin/T-cell receptor gene segments</td><td>648</td><td>Long non-coding RNA loci transcripts</td><td>59,719</td></tr>
+<tr align=left><td>Total number of distinct translations</td><td>65,357</td><td>Genes that have more than one distinct translation</td><td>13,600</td></tr>
+</table><BR>
+</blockquote></p>
+
+<p>
+For more information on the different gene tracks, see our <a target="_blank"
+href="/FAQ/FAQgenes.html">Genes FAQ</a>.</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes.
+It includes options to display the entire GENCODE set and pseudogenes. To customize these
+options, the respective boxes can be checked or unchecked at the top of this description page.</p>
+
+<p>
+This track also includes a variety of labels which identify the transcripts when visibility is set
+to &quot;full&quot; or &quot;pack&quot;. Gene symbols (e.g. NIPA1) are displayed by default, but
+additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID
+(uc001yve.4), and UniProt Display ID (Q7RTP0). Additional information about gene
+and transcript names can be found in our
+<a target="_blank" href="/FAQ/FAQgenes.html#genename">FAQ</a>.</p>
+
+<p>
+This track, in general, follows the display conventions for <a target="_blank"
+href="../goldenPath/help/hgTracksHelp.html#GeneDisplay">gene prediction tracks</a>. The exons for
+putative non-coding genes and untranslated regions are represented by relatively thin blocks, while
+those for coding open reading frames are thicker.</p>
+<p><b>Coloring</b> for the gene annotations is based on the annotation type: </p>
+<ul>
+  <li><font color="#0c0c78"><b>coding</b></font>: protein-coding transcripts, including polymorphic
+       pseudogenes</li>
+  <li><font color="#006400"><b>non-coding</b></font>: non-protein-coding transcripts</li>
+  <li><font color="#ff33ff"><b>pseudogene</b></font>: pseudogene transcript annotations</li>
+  <li><font color="#fe0000"><b>problem</b></font>: problem transcripts (biotypes of
+       retained_intron, TEC, or disrupted_domain)</li>
+</ul>
+
+<p>
+This track contains an optional <a target="_blank"
+href="../goldenPath/help/hgCodonColoring.html">codon coloring feature</a> that allows users to
+quickly validate and compare gene predictions. There is also an option to display the data as
+a <a target="_blank" href="../goldenPath/help/hgWiggleTrackHelp.html">density graph</a>, which
+can be helpful for visualizing the distribution of items over a region.</p>
+
+<a name="squishyPack"></a>
+<h3>Squishy-pack Display</h3>
+<p>
+Within a gene using the <b>pack</b> display mode, transcripts below a specified rank will be
+condensed into a view similar to <b>squish</b> mode. The <b>transcript ranking</b> approach is
+preliminary and will change in future releases. The transcript rankings are defined by the
+following criteria for protein-coding and non-coding genes:</p>
+<b>Protein_coding genes</b>
+<ol>
+  <li>MANE or Ensembl canonical
+    <ul>
+      <li>1st: MANE Select / Ensembl canonical</li>
+      <li>2nd: MANE Plus Clinical</li>
+    </ul>
+  </li>
+  <li>Coding biotypes
+    <ul>
+      <li>1st: protein_coding and protein_coding_LoF</li>
+      <li>2nd: NMDs and NSDs</li>
+      <li>3rd: retained intron and protein_coding_CDS_not_defined</li>
+    </ul>
+  </li>
+  <li>Completeness
+    <ul>
+      <li>1st: full length</li>
+      <li>2nd: CDS start/end not found</li>
+    </ul>
+  </li>
+  <li>CARS score (only for coding transcripts)</li>
+  <li>Transcript genomic span and length (only for non-coding transcripts)</li>
+</ol>
+<b>Non-coding genes</b>
+<ol>
+  <li> Transcript biotype
+    <ul>
+      <li>1st: transcript biotype identical to gene biotype</li>
+    </ul>
+  </li>
+  <li>Ensembl canonical</li>
+  <li>GENCODE basic</li>
+  <li>Transcript genomic span</li>
+  <li>Transcript length</li>
+</ol>
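The protein-coding criteria above can be read as a lexicographic sort key, with each criterion breaking ties left by the one before it. The following Python sketch illustrates that reading only; it is not the pipeline used to build the track. The `Transcript` record, its field names, the tag and biotype strings, and the assumption that a higher CARS score ranks first are all hypothetical. The fifth criterion (genomic span and length for non-coding transcripts) is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record; field names are illustrative, not UCSC table columns."""
    name: str
    tag: str            # e.g. "MANE_Select", "Ensembl_canonical", "MANE_Plus_Clinical", or ""
    biotype: str        # e.g. "protein_coding", "retained_intron"
    full_length: bool   # False when the CDS start or end was not found
    cars_score: float = 0.0


# Criterion 1: MANE Select / Ensembl canonical first, MANE Plus Clinical second.
TAG_TIER = {"MANE_Select": 0, "Ensembl_canonical": 0, "MANE_Plus_Clinical": 1}

# Criterion 2: coding biotype tiers (the biotype strings are assumed spellings).
BIOTYPE_TIER = {
    "protein_coding": 0,
    "protein_coding_LoF": 0,
    "nonsense_mediated_decay": 1,
    "non_stop_decay": 1,
    "retained_intron": 2,
    "protein_coding_CDS_not_defined": 2,
}


def rank_key(t: Transcript) -> tuple:
    """Build a tuple that sorts better-ranked transcripts first."""
    return (
        TAG_TIER.get(t.tag, 2),          # untagged transcripts sort after tagged ones
        BIOTYPE_TIER.get(t.biotype, 3),  # unlisted biotypes sort last
        0 if t.full_length else 1,       # criterion 3: completeness
        -t.cars_score,                   # criterion 4; ASSUMPTION: higher score is better
    )


def rank_transcripts(transcripts: list) -> list:
    """Return the transcripts ordered from highest to lowest rank."""
    return sorted(transcripts, key=rank_key)
```

Under this reading, transcripts that fall below a chosen position in the returned list would be the ones drawn in the condensed, squish-like style.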
+
+
+<h2>Methods</h2>
+<p>
+The GENCODE v45 track was built from the <a href="https://www.gencodegenes.org/human/"
+target="_blank">GENCODE downloads</a> file 
+<code>gencode.v45.chr_patch_hapl_scaff.annotation.gff3.gz</code>. Data from other sources
+were correlated with the GENCODE data to build association tables.</p>
+
+<h2>Related Data</h2>
+<p>
+The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a
+<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/" target="_blank">downloadable
+file</a>.</p>
+
+<p>
+One can see a full list of the associated tables in the <a href="/cgi-bin/hgTables"
+target="_blank">Table Browser</a> by selecting GENCODE Genes from the <b>track</b> menu; this list
+is then available on the <b>table</b> menu.
+</p>
+
+<h2>Data access</h2>
+<p>
+GENCODE Genes and its associated tables can be explored interactively using the
+<a href="../goldenPath/help/api.html" target="_blank">REST API</a>, the
+<a href="/cgi-bin/hgTables" target="_blank">Table Browser</a> or the
+<a href="/cgi-bin/hgIntegrator" target="_blank">Data Integrator</a>. 
+The genePred format files for hg38 are available from our 
+<a target="_blank" href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/">
+downloads directory</a> or in our
+<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/" target="_blank">
+GTF download directory</a>. 
+All the tables can also be queried directly from our public MySQL
+servers, with more information available on our
+<a target="_blank" href="/goldenPath/help/mysql.html">help page</a> as well as on
+<a target="_blank" href="http://genome.ucsc.edu/blog/tag/mysql/">our blog</a>.</p>
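As a rough illustration of the REST API route mentioned above, the Python sketch below builds a request URL for the API's `/getData/track` endpoint without sending it. The track name `knownGeneV45lift37` is an assumption taken from this page's file name, and the coordinates are arbitrary; check the Table Browser or the API's track listing for the actual name before relying on it.

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu"


def track_data_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a /getData/track URL with semicolon-separated parameters."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # Percent-encode each value, then join the pairs with ";".
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}/getData/track?{query}"


# Track name and region are illustrative assumptions, not confirmed values.
url = track_data_url("hg19", "knownGeneV45lift37", "chr15", 23000000, 23100000)
print(url)
```

The resulting URL can be fetched with any HTTP client; the API returns JSON.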
+
+<h2>Credits</h2>
+<p>
+The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a
+computational pipeline developed by Jim Kent and Brian Raney.  This version of the track was
+generated by Jonathan Casper.</p>
+
+<h2>References</h2>
+
+<p>
+Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland JE, Mudge JM, Sisu C, Wright JC,
+Arnan C, Barnes I <em>et al</em>.
+<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkac1071" target="_blank">
+GENCODE: reference annotation for the human and mouse genomes in 2023</a>.
+<em>Nucleic Acids Res</em>. 2023 Jan 6;51(D1):D942-D949.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36420896" target="_blank">36420896</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825462/" target="_blank">PMC9825462</a>
+</p>
+
+<p>A full list of GENCODE publications is available
+at <a href="https://www.gencodegenes.org/pages/publications.html" target="_blank">The GENCODE
+Project web site</a>.
+</p>
+
+<h2>Data Release Policy</h2>
+<p>GENCODE data are available for use without restrictions.</p>