Archiving Case 3

-- Links remain to archived assembly from other assemblies on RR. These links may be from chain/nets, multiz, otherOrgs in hgGene, Human (Fly, Worm) Proteins and liftOver. LiftOvers are treated the same way in all Cases.

Rationale

Databases that have links remaining to them on RR need special handling. Browsers on both RR and archive will not draw links if the db to be linked to is not present; but if it is, we want to provide the link. In some cases, the link will not appear, in others, the coordinates will be displayed without a link to a browser.

/gbdb/[db] files will be kept available for mirrors and will not be added to the mirror exclude list. The mirrors can decide for themselves if they want to be able to get DNA from archived assemblies.

For some databases with links on archive, we want to provide them. In some cases, this has in the past meant copying a db to archive that is not itself archived. That is why hg17 is in both places. As long as they are there, we are linking to them. This may require updating the tables to the latest version. In general, we will not be pushing any more dbs just to satisfy links. Existing dbs with more than chromInfo that are not actually archived are danRer3, hg16, hg17 and sacCer1.

Set up database on hgarchive1:

  • Push and prune hgcentral table. The blatServers, sessionDb, and userDb tables should exist on hgcentarch[DbName], but should be empty. (No BLAT servers are available for archived dbs.) The remaining tables should be pruned so that only the relevant information is retained. For Case 3 (links on RR to archived db):
    On hgwbeta:
     hgsqldump --all -d -c hgcentralbeta blatServers sessionDb userDb > ~/hgcentral.sql
     hgsqldump --all -c hgcentralbeta clade dbDb defaultDb gdbPdb genomeClade liftOverChain >> ~/hgcentral.sql
    
    (if using hgcentral, add '-h genome-centdb' after the '-c' above)

    mysql on hgarchive1
    --------------------

     hgarchive1> mysql -A --user=hgcat -p  anydb
    
     CREATE DATABASE hgcentarchMm6;
     USE hgcentarchMm6;
     SOURCE ~/hgcentral.sql
    

    genomeClade

    get list for genomeClade from findDbPartners.csh and keep anything needed for otherOrgs (excluding liftOver).

  • method 1 - empty existing table from the sql load of entire hgcentral and reload (table signature remains)
     hgsql -h hgwbeta -e "SELECT * FROM genomeClade" hgcentralbeta > ~/mm6.genomeClade
    
    Then on hgarchive1, edit out orgs not needed: Keep
     Mouse 
     Tetraodon
     Zebrafish
     X. tropicalis
     Chicken
     Dog
     Opossum
     Cow
     Rat
     Human
     Chimp 
     Fugu
    
     mysql -A --user=hgcat -p hgcentarchMm6
     LOAD DATA LOCAL INFILE "~/mm6.genomeClade" INTO TABLE genomeClade;
    
    
  • method 2 - drop existing table and reload from sql file from hgwbeta
     hgsqldump --all -c hgcentralbeta genomeClade >> ~/mm6.genomeClade.sql
    
    or hgwdev:
     hgsqldump -h hgwbeta --all -c hgcentralbeta genomeClade >> ~/mm6.genomeClade.sql
    
    then on hgarchive1:
     mysql -A --user=hgcat -p hgcentarchMm6
     DROP TABLE genomeClade;
     SOURCE ~/mm6.genomeClade.sql
    
  • method 3 - simply prune exisiting table
     DELETE FROM genomeClade WHERE genome != 'Mouse'
      AND genome != 'Cow' 
      AND genome != 'Human' 
      AND genome != 'Tetraodon' 
      AND genome != 'Zebrafish' 
      AND genome != 'Rat'
      AND genome != 'X. tropicalis'
      AND genome != 'Chicken'
      AND genome != 'Dog'
      AND genome != 'Opossum';
    
    SELECT * FROM genomeClade;
    
  • dbDb

    get list for dDb from findDbPartners.csh and keep anything in there (excluding liftOver).

  • step 1 - remove dbDb rows for assemblies not needed for chain/net or Conservation track details-page links
    RENAME TABLE dbDb TO dbDbBackup;
    
    CREATE TABLE dbDb SELECT * FROM dbDbBackup WHERE
         name = "mm6"
      OR name = "rn3"
      OR name = "hg17"
      OR name = "panTro1"
      OR name = "canFam1"
      OR name = "canFam2"
      OR name = "bosTau1"
      OR name = "monDom1"
      OR name = "galGal2"
      OR name = "xenTro1"
      OR name = "danRer2"
      OR name = "tetNig1"
      OR name = "fr1";
    
    UPDATE dbDb SET orderKey = 1 WHERE name = "mm6";
    DROP TABLE dbDbBackup
    
  • step 2 - set dbDb to active to enable outlinks for any assembly on hgarchive.
    set active = 0 for everything else.
    hgGateway dropdown will have the otherOrgs, but they wil not stick.
    UPDATE dbDb SET active = 0;
    UPDATE dbDb SET active = 1 WHERE name = "mm6"
      OR name = "hg17"
      OR name = "danRer2"
      OR name = "bosTau1";
    
    SELECT name, active, orderKey FROM dbDb;
    
  • defaultDb

  • remove all rows from defaultDb except this db and set this one to default.
     UPDATE defaultDb SET name= 'mm6' WHERE genome = 'Mouse';
     DELETE FROM defaultDb WHERE name != 'mm6';
     SELECT * FROM defaultDb;
    
  • clade

  • trim clade table to one line.
     DELETE FROM clade WHERE name != 'vertebrate';
     SELECT * FROM clade;
    
  • liftOverChain

  • keep all liftOverChain rows relevant to this assembly.
     DELETE FROM liftOverChain WHERE fromDb != 'mm6' AND toDb != 'mm6';
    

    check all hgcentral tables

     SELECT COUNT(*) FROM blatServers;
     SELECT COUNT(*) FROM userDb;
     SELECT COUNT(*) FROM sessionDb;
    
     SELECT * FROM genomeClade;
     SELECT * FROM clade;
     SELECT * FROM liftOverChain;
     SELECT name, active, priority FROM dbDb ORDER BY active, name;
     SELECT * FROM defaultDb;