Date

Attendees

Name | Relevant Expertise (for this effort) | Institution | Country
Nicholas Provart | community resource, sequence analysis and visualization tools | BAR/University of Toronto | Canada
Yuling Jiao | genome sequencing and assembly | Peking University | China
Bo Wang, Xiaofei Yang, Kai Ye | genome sequencing and assembly, centromere genetics | Xi'an Jiaotong University | China
Korbinian Schneeberger, Raúl Wijfjes, Xiao Dong | genome sequencing and assembly | Ludwig Maximilian University of Munich, Max Planck Institute for Plant Breeding Research | Germany
Fernando Rabanal | genome sequencing and assembly, pancentromere characterization | Max Planck Institute for Biology | Germany
Alexandros Bousios | transposable element annotation | University of Sussex | UK
Klaas van Wijk | peptide atlas | Cornell University | USA
Craig Pikaard | rRNAs, NOR sequencing and assembly | Indiana University | USA
Michael Schatz | genome sequencing, assembly, and annotation | Johns Hopkins University | USA
Terence Murphy, Françoise Thibaud-Nissen, Anjana Raina | genome annotation pipelines | NCBI | USA
Andrew Farmer | comparative genomics and visualization | NCGR | USA
Shujun Ou | transposable element annotation | Ohio State University | USA
Todd Michael | genome sequencing, assembling plant genomes | Salk Institute | USA
Tanya Berardini, Leonore Reiser | community resource, genome annotation | TAIR/Phoenix Bioinformatics | USA


Goals

  • Get all participants on the same page, provide background and impetus for this project

Agenda

    1. Introductions: name, institution, interest in this effort, relevant expertise (15 mins)
    2. Tanya - very brief history, overview of current motivation, TAIR's efforts since Araport11 release (10 mins)
    3. Françoise/NCBI team member - overview of NCBI Eukaryotic Genome Annotation pipeline using the initial run with Naish T2T genome as example (15 mins)
    4. Korbinian - overview of Col-CC (community consensus) assembly progress so far (15 mins)
    5. General discussion, aiming to answer the following questions (rest of time):
      • Should we use the Col-CC assembly as the basis for the v12 annotation?
        • If yes, is there anyone else, not currently included, who should be aware of or included in this process?
        • When is a reasonable date of completion?
      • Can NCBI perform the automated annotation with their eukaryotic pipeline with that consensus assembly?
      • Who can commit to participating in the manual review and update of the automated pass?
        • Tool/s to use? Deployed where?
        • Create list of participants; who else could we reach out to and involve in this part?
        • Dataset-specific expertise? lncRNAs, TEs, protein-coding genes, etc.
        • TAIR can help in coordinating work to minimize overlap
      • Who would handle submission to GenBank and how can we best prepare for a smooth submission?
      • Schedule follow-up meetings for subgroups (assembly, manual review, other)

Summary

General enthusiasm for the need and utility of a reannotation.

Proposed timeline: 12 calendar months to set up the framework, process, and teams to get V12 released.

Funding: No dedicated, separately-sourced funding for any particular group at this time. Interested groups will contribute expertise and/or infrastructure.

  1. Assembly
    1. K. Schneeberger's group's work on the Col-Community Consensus (Col-CC) assembly is likely to finish by the end of 2022; it will incorporate C. Pikaard's group's data on NOR2 and NOR4 and 4 Col-0 MA lines from F. Rabanal/D. Weigel
    2. Col-CC should be submitted to NCBI as an independent assembly
    3. need to work out details of tracking the metadata on BioSample provenance for the individual pieces
    4. Idea to visualize the multiple individual assemblies that were combined to make Col-CC as a patchwork (GCV? other visualization tool?)
  2. Automated Annotation
    1. once the Col-CC assembly has been accepted by NCBI and is available, NCBI will run it through their eukaryotic annotation pipeline
    2. need to resolve details on whether or not to include the Araport11 proteins as evidence
    3. add isoSeq from PRJNA755474 from this paper to next run
    4. please send more recent isoSeq/RNA-seq/CAGE experimental data in GenBank to include in the next run
  3. Manual Review
    1. WebApollo as tool
      1. TAIR to investigate hosting requirements/existing training tools, ease of output of information needed for NCBI submission even before manual review begins
      2. used by many MODs to maintain their genomes, concurrent editing possible, community-maintained code
    2. Community experts
      1. TAIR as coordinator
      2. Klaas van Wijk: anything to do with proteins (including small peptides - sORFs, etc.) and protein isoforms (AS, etc.)
      3. Kai Ye (XJTU team): centromeres and microsatellite sites
      4. Shujun Ou, Alex Bousios: TEs, ATHILAs
      5. Craig Pikaard: NOR2 and NOR4, rDNAs
  4. Submission to NCBI/GenBank
    1. begin working on release early, no need to wait till manual review is done, can be done with dummy data to work out format issues
  5. Dissemination
    1. broad support for authorship on V12 paper for ALL who were involved in effort, in any stage of the process
    2. V12 release to be incorporated into TAIR, BAR, etc as soon as possible after NCBI RefSeq is updated to this version

Action items

  • We'll check in by email in mid-December to get an update from Korbinian and from TAIR on the assembly progress and WebApollo.