VI. MAIZE SEQUENCING STATUS REPORTS

THE MAIZE SEQUENCING PROJECTS

<http://www.maizegdb.org/sequencing_project.php>

This is a summary of Rick Wilson's talk at the 2006 Maize Meeting, posted at MaizeGDB and updated May 2006.

On November 15, 2005, the NSF, USDA, and DOE announced an award of $32 million to the Genome Sequencing Center (GSC; Washington University), Cold Spring Harbor Laboratory (CSHL), the Arizona Genomics Institute (AGI), Iowa State University (ISU), the University of California-Berkeley, the DOE Joint Genome Institute (JGI), the University of Georgia, and Stanford University for sequencing the maize genome. See also: <http://www.nsf.gov/news/news_summ.jsp?cntn_id=104608&org=BIO&from=news>

Project Descriptions

B73: A BAC-by-BAC approach

This effort - expected to require three years of work - will use a minimal tiling path of approximately 19,000 mapped BAC clones and will focus on producing high-quality sequence coverage of all identifiable gene-containing regions of the maize genome. These regions will be ordered, oriented, and, along with all of the intergenic sequences, anchored to the extant physical and genetic maps of the maize genome. Important features of the project include immediate release of preliminary and high-quality sequence assemblies and the development of a genome browser that will facilitate user interaction with sequence and map data.

Mo17 Chromosome 10 by shotgun sequencing (JGI)

A whole genome shotgun (WGS) strategy is expected to capture ~90% of the maize genome. The WGS strategy is to be assessed using flow-sorted chromosome 10 material from Mo17 as a test case.

B73 Project Input Data Descriptions

The Physical Map <http://www.genome.arizona.edu/fpc/maize/>

Total Assembled Contigs: 721
Equal to 2,150 Mb; 93.5% coverage of the 2,300 Mb genome
Anchored: 421 ctgs; 86.1% of the genome
Average anchored contig size: 4.7 Mb
Unanchored: 300 ctgs; 7.4% of the genome
Average unanchored contig size: 0.56 Mb
189 of the 300 unanchored contigs contain fewer than 10 clones
Largest anchored contig: 22.9 Mb, on Chr 9
Largest unanchored contig: 6.7 Mb

Total FPC Markers: 25,000
STS markers: ~9,000
Overgo Markers: 14,825
Anchored markers: 1,918
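
The contig figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the average contig sizes are recomputed from the rounded percentages in the summary, so small differences from the reported averages are to be expected.

```python
# Cross-check of the physical map summary, using only the figures reported above.
GENOME_MB = 2300       # estimated genome size used in the summary
ASSEMBLED_MB = 2150    # total length of assembled contigs
ANCHORED_CONTIGS, ANCHORED_PCT = 421, 86.1
UNANCHORED_CONTIGS, UNANCHORED_PCT = 300, 7.4

total_contigs = ANCHORED_CONTIGS + UNANCHORED_CONTIGS
coverage_pct = 100 * ASSEMBLED_MB / GENOME_MB

# Average contig size implied by each (rounded) coverage percentage.
avg_anchored_mb = GENOME_MB * ANCHORED_PCT / 100 / ANCHORED_CONTIGS
avg_unanchored_mb = GENOME_MB * UNANCHORED_PCT / 100 / UNANCHORED_CONTIGS

print(f"contigs: {total_contigs}")
print(f"coverage: {coverage_pct:.1f}%")
print(f"average anchored contig: {avg_anchored_mb:.2f} Mb")
print(f"average unanchored contig: {avg_unanchored_mb:.2f} Mb")
```

The anchored and unanchored percentages sum to the overall 93.5% coverage, and the implied averages agree with the reported 4.7 Mb and 0.56 Mb to within rounding.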

The Tiling Path

Using the physical map, ~3,200 seed BACs are being chosen with an average spacing of 800 kb. These seeds are required to have:

1) at least one end sequenced,

2) both agarose and HICF fingerprints,

3) at least average insert size (~150 kb),

4) at least one overgo match.

Subsequently, BAC end sequences and fingerprint data are being used to extend the seed BACs into tiling path contigs for sequencing.
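
The four seed requirements listed above amount to a simple filter over clone records. The following is a minimal sketch under stated assumptions, not the project's actual selection pipeline: the `BacClone` record, its field names, and the example clones are all hypothetical, and the 150 kb threshold is the approximate average insert size quoted in the list.

```python
from dataclasses import dataclass

AVERAGE_INSERT_KB = 150  # approximate average insert size quoted in the text


@dataclass
class BacClone:
    """Hypothetical record for one BAC clone; field names are illustrative."""
    name: str
    sequenced_ends: int      # number of BAC ends with sequence (0, 1, or 2)
    has_agarose_fp: bool     # agarose fingerprint available
    has_hicf_fp: bool        # HICF fingerprint available
    insert_kb: float         # estimated insert size in kb
    overgo_matches: int      # number of overgo probe matches


def is_seed_candidate(bac: BacClone) -> bool:
    """Return True if the clone meets all four seed criteria."""
    return (
        bac.sequenced_ends >= 1
        and bac.has_agarose_fp
        and bac.has_hicf_fp
        and bac.insert_kb >= AVERAGE_INSERT_KB
        and bac.overgo_matches >= 1
    )


# Made-up example clones, for illustration only.
clones = [
    BacClone("clone_A", 2, True, True, 162.0, 3),
    BacClone("clone_B", 1, True, False, 155.0, 1),  # lacks an HICF fingerprint
    BacClone("clone_C", 0, True, True, 170.0, 2),   # lacks an end sequence
]
seeds = [c.name for c in clones if is_seed_candidate(c)]
print(seeds)
```

This sketch covers only the per-clone criteria; choosing seeds at an average spacing of 800 kb along the physical map would be a separate step that is not modeled here.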

B73 Project Output Data Descriptions

Sequence traces:
Automatically deposited to the Trace Archive at NCBI within 24 hours of production (includes fosmid ends).

BAC clone assemblies:
Phase 1 HTGS_FULLTOP: 2 x 384 paired-end attempts. Completed shotgun phase.
Phase 1 HTGS_PREFIN: Completed automated improvement phase.
Phase 1 HTGS_ACTIVEFIN: Active work being done by a finisher.
Phase 1 HTGS_IMPROVED: Finished sequence in gene regions. Improved regions will be indicated. Once order and orientation of improved segments are confirmed, a comment will be added to indicate this.

B73 Project Timeline

Year 1:

Production sequencing for ~7,000 BAC clones (GSC).

Sequence 0.55M (0.3X coverage) fosmid end pairs (GSC).

Begin pre-finishing and finishing (GSC, AGI, CSHL).

Finish ~4,500 BACs (GSC, AGI, CSHL).

Begin genome assembly & annotation efforts (CSHL, ISU).

Year 2:

Production sequencing for ~10,000 BAC clones (GSC).

Finish ~10,000 BACs (GSC, AGI, CSHL).

Continue genome assembly & annotation efforts (CSHL, ISU).

Year 3:

Production sequencing for remaining BACs (GSC).

Finish remaining (~4,500) BACs (GSC, AGI, CSHL).

Continue genome assembly & annotation efforts (CSHL, ISU).
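
As a quick consistency check on the timeline, the yearly targets can be tallied against the ~19,000-clone tiling path. This is only a sketch: the dictionary names are ours, all counts are the approximate figures given above, and the Year 3 production number is inferred by subtraction rather than stated in the timeline.

```python
# Finishing targets per year, as listed in the timeline (approximate counts).
finish_targets = {"Year 1": 4500, "Year 2": 10000, "Year 3": 4500}

# Production sequencing targets; Year 3 is given only as "remaining BACs".
production_targets = {"Year 1": 7000, "Year 2": 10000}

TILING_PATH_BACS = 19000  # approximate size of the minimal tiling path

total_finished = sum(finish_targets.values())

# Inferred, not stated in the source: clones left for Year 3 production.
year3_production = TILING_PATH_BACS - sum(production_targets.values())

print(f"BACs finished over three years: {total_finished}")
print(f"Implied Year 3 production: {year3_production}")
```

The three finishing targets add up to the full tiling path, which suggests that roughly 2,000 clones would remain for production sequencing in Year 3.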

Accessing the B73 data:

At NCBI <http://www.ncbi.nlm.nih.gov/entrez/>
Use the nucleotide search:
Zea mays[ORGN] AND HTG[KYWD] AND WUGSC[CTR]
to pull the clone assemblies currently available.
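
The same query can also be submitted programmatically. The sketch below only builds a request URL for NCBI's E-utilities `esearch` service and does not contact the server. The search term is copied verbatim from the text; the choice of `db=nucleotide` and the `retmax` value are our assumptions, and we have not verified that the field tags still resolve as they did when this report was written.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Search term exactly as given in the text.
term = "Zea mays[ORGN] AND HTG[KYWD] AND WUGSC[CTR]"

# db and retmax are illustrative assumptions, not values from the report.
params = {"db": "nucleotide", "term": term, "retmax": 20}

url = f"{ESEARCH}?{urlencode(params)}"
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return an XML list of matching record identifiers if the service accepts the query.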

<http://www.maizesequence.org> (available by late summer 2006)

Genome assemblies:

Annotated BAC clones assembled in the context of mapping and other data, displayed in Gramene. Updated dynamically as new data become available. No built-in delays; new builds, annotation, and data will be made available as processing queues allow. See also p. 74, this volume.

Further information on the project can be accessed through the following links:

Maize Genome Sequencing Information Portal <http://www.maizegdb.org/genome/>

(reviews of sequencing methods, the Request for Proposals, etc.)

The Genome Sequencing Center's Maize Page for B73

<http://genome.wustl.edu/genome.cgi?GENOME=Zea%20mays%20mays%20cv.%20B73&GROUP=7>

B73 and Mo17 FISH image shows repetitive sequences (including knobs; courtesy of Jim Birchler)

<http://www.maizegdb.org/genome/B73Mo17FISH.php>

MGSC: Gramene and MaizeGDB cooperate to provide access to sequences and related data

--Lawrence, CJ; Ware, D

The NSF, USDA, and DOE announced on November 15, 2005 that together they had funded the sequencing of the genome of inbred line B73 as well as chromosome 10 of Mo17 (a project that aims simultaneously to evaluate shotgun sequencing strategies for large genomes and to investigate maize diversity). In addition, the USDA-ARS contributed the MaizeGDB project resources. Because Gramene will be the primary portal to the maize B73 sequences (which are to be annotated by the Ware group), a description of past and present interactions between MaizeGDB and Gramene is presented here. This contribution also explains current and planned access points and portals to the maize sequence data. For a description of the maize sequencing project's deliverables and timelines, see pp. 71-72 in this volume of the Maize Newsletter.

MaizeGDB and Gramene personnel began collaborating early on and, for the last three years, have been involved in developing shared resources such as the Plant Ontologies (http://www.plantontology.org), a set of terms that describe plant anatomy and developmental stages. This hierarchical vocabulary allows divergent datasets, such as EST collections, mutant strains, and stocks, to be described with common terms across different databases, so that they can be searched and analyzed simultaneously. This set of terms is currently in place at both MaizeGDB and Gramene, enabling the annotation of various data types at both repositories, and is a resource upon which many connections can be built (between MaizeGDB and Gramene, and also with other resources such as TAIR, the Solanaceae Genomics Network, the Virtual Plant Information Network, and other plant databases).

In addition to working together, members of the MaizeGDB and Gramene teams have been apprised of and involved in the development of both resources. For instance, Gramene PI L. Stein contributed to guiding MaizeGDB's development by serving on the MaizeDB to MaizeGDB Transition Steering Committee, and Gramene co-PI D.W. currently serves as a member of the MaizeGDB Working Group. Similarly, MaizeGDB director C.J.L. has participated in Gramene Scientific Advisory Board meetings during the past two years. Curators from Gramene attended the MaizeGDB curation tools workshop in Ames, Iowa in the fall of 2004, and a working meeting to integrate maps and molecular markers was co-organized by MaizeGDB and Gramene personnel and was conducted one evening at the 2005 Maize Genetics Conference. Ideas and data are exchanged between the two groups on a regular basis.

The first of a number of sequence data meetings between the Ware maize sequence analysis group and the MaizeGDB team is slated to take place in June of 2006 at the Cold Spring Harbor Laboratory. During this meeting, we will work to identify means to synchronize data release and make accessing maize sequence data easier for researchers, irrespective of data storage location. We also will explore methods for addressing feedback from maize geneticists that is relevant to both projects. We expect that a joint feedback mechanism may be in order, but the logistics and implementation of such a mechanism will require serious consideration and discussion. It is expected that outcomes from the June meeting will serve to guide both groups' development strategies to maximize accessibility to sequence data while minimizing duplication of effort.

At present, the Gramene and MaizeGDB websites are linked throughout by way of shared data, common nomenclature, and a standard set of linking rules. New linkages and entry points to data will be made available at both sites as they are identified. For a list of some existing linkages, see Tables 1 and 2. Datasets shared by both groups include sequences, BACs, loci, markers, maps, and ontology terms. These datasets will serve as the basis for creating new linkages to increase the interconnectedness of the two resources. We solicit ideas you might have for how to improve both MaizeGDB and Gramene. Please send all comments and suggestions to both MaizeGDB and Gramene by way of our groups' shared email address: [email protected]. Your help, guidance, and continued support are greatly appreciated!

Table 1. Links from MaizeGDB to Gramene that are already in place.

MaizeGDB Data Type	<Example Entry URL> and Link Placement to Gramene	Purpose
Sequences	<http://www.maizegdb.org/cgi-bin/displayseqrecord.cgi?id=AC149813> Right green bar, under "Search Tools".	Jump from MaizeGDB BAC data to the Gramene Finger Print Contig viewer
BACs	<http://www.maizegdb.org/cgi-bin/displaybacrecord.cgi?id=507533> Top of the page, in bold font.	Jump from MaizeGDB BAC data to the Gramene Finger Print Contig viewer
Loci	<http://www.maizegdb.org/cgi-bin/displaylocusrecord.cgi?id=12098> Right green bar, under "Search Tools".	View the locus within the context of its map location using CMap
Maps	<http://www.maizegdb.org/cgi-bin/displaymaprecord.cgi?id=143439> Right green bar, under "Other Map Views".	View the map visually using CMap

Table 2. Links from Gramene to MaizeGDB that are already in place.

Gramene Data Type	<Example Entry URL> and Link Placement to MaizeGDB	Purpose
BACs	<http://www.gramene.org/Zea_mays/cytoview?mapfrag=AC149813> Context menu for BAC on "Acc Clones" track.	Show associated marker data on MaizeGDB
Maps	<http://www.gramene.org/Zea_mays/cytoview?mapfrag=c0148C07> Context menu for clone on "FPC Map" track.	Show associated marker data on MaizeGDB
Markers	<http://www.gramene.org/Zea_mays/cytoview?contig=ctg129> Context menu for individual markers on "Markers" track.	Jump to marker info on MaizeGDB
Diversity	<http://www.gramene.org/db/cmap/feature?feature_acc=cmf1104a-ctg251-10> Cross-reference to MaizeGDB.	Jump to locus info on MaizeGDB