Curation of the EcoCyc Database: The EcoCyc Update Project
Martha Arnaud for the EcoCyc Database, SRI International, 333 Ravenswood Avenue,
Menlo Park, CA 94025, martha.arnaud@sri.com
The EcoCyc database is a model-organism database for Escherichia coli. EcoCyc is best known for its coverage
of metabolic pathways and enzymes; however, EcoCyc is now evolving into a
curated encyclopedia of E. coli molecular biology. Curated descriptions of transporters, transcription factors,
operons, and regulatory elements have been added to EcoCyc. Recently, we have initiated a project
to systematically curate every gene product in E. coli. Over the next few years, we will add paragraph-style
text comments to describe every characterized gene product as well as comprehensive
reference lists. This talk will
focus on our goals, plan, priorities, and our progress to date.
EcoCyc is freely available to researchers at academic and non-profit institutes.
An Overview of Eukaryotic Annotation at TIGR
Roger Smith
The Institute for Genomic Research (TIGR) is involved in the annotation of a number
of eukaryotic genomes, and a systematic approach is used for the primary
annotation of these genomes. Newly sequenced genomes are assembled into scaffolds,
and directed sequencing then fills in the gaps in these scaffolds, a process
referred to as closure. Typically, once a genome is near completion or is fully
closed, the sequence data are passed through a centralized data management system
known as Eukaryotic Genome Control (EGC). This fully automated yet customizable
system first identifies biological features such as genes; then, for each gene,
data are gathered from a number of sources using various methods to
facilitate the assignment of function and structure.
Once these data are assembled in an automated fashion, they are manually reviewed and
annotated by a team of scientists to produce a high quality, thorough, complete
and consistent annotation of the proteome. A custom software interface, Annotation
Station, is used by annotators to manually inspect the evidence aligned to the
genes and to edit the genes. A web-based interface developed at TIGR, MANATEE
(available at http://manatee.sourceforge.net/), allows annotators to easily access
the computationally derived data and add functional information to gene products
such as names, aliases, symbols, E.C. numbers, GO assignments, and comments. This
comprehensive tool provides annotators with the best possible information to
curate gene products based on functionally characterized protein matches. The
overall strategy, tools, and software used at TIGR in eukaryotic annotation
will be discussed in more detail with an emphasis on the manual annotation
methodologies employed.
Organization and presentation of biological information in the Saccharomyces Genome Database
Maria C. Costanzo (a), Rama Balakrishnan (a), Karen R. Christie (a),
Kara Dolinski (b), Selina S. Dwight (a), Stacia R. Engel (a),
Becket Feierbach (b), Dianna G. Fisk (a), Jodi Hirschman (a),
Eurie L. Hong (a), Laurie Issel-Tarver (a), Robert S. Nash (a),
Anand Sethuraman (a), Barry Starr (a), Chandra L. Theesfeld (a),
Rey Andrada (a), Gail Binkley (a), Qing Dong (a),
Christopher Lane (a), Mark Schroeder (b), Shuai Weng (a),
David Botstein (b), J. Michael Cherry (a)
(a) Department of Genetics, School of Medicine, Stanford University, Stanford, CA 94305-5120,
USA
(b) Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton
University, Washington Road, Princeton, NJ 08544, USA
The Saccharomyces Genome Database
(SGD) collects and organizes information about genes and gene products of the
model organism S. cerevisiae, baker's or budding yeast. As the repository of the
official S. cerevisiae genome sequence, SGD provides tools to access and analyze the genomic
sequence and to view the overall organization of the genome. SGD's primary
focus, however, is on the biology of the yeast cell and its molecular
components. Reflecting this, SGD is organized around a basic unit, the Locus
page, that collects information for a single gene or chromosomal feature. The
layout of the Locus page is intended to be as user-friendly as possible, with simple
presentation, clear organization, intuitive use of links, logical paths of
navigation, and thorough help documentation. The Locus page provides links to
literature, sequence, Gene Ontology (GO) annotations, expression data,
functional analysis studies, and other types of information specific to the
gene of the page. For instance, the "Functional Analysis" pull-down menu on a
Locus page takes the user directly to expression data for that locus, in the
selected dataset. Because the Locus pages are the
organizational center for the database, they can be accessed and searched from
a variety of other pages and tools in SGD. Recent additions to the Locus
page, and projects currently in progress for its improvement, will be
discussed.
GeneDB: A Prokaryotic and Eukaryotic Genome Resource
Christiane Hertz-Fowler & The Pathogen Sequencing Unit
The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
The Pathogen Sequencing Unit (PSU) at the Wellcome Trust Sanger Institute is
involved in the sequencing and annotation of a diverse range of prokaryotic and
eukaryotic organisms; in some instances these projects are part of
collaborations between sequencing centres. The GeneDB database
(http://www.genedb.org) was developed to house genome datasets from these
projects. Currently, datasets from 16 species, including 5 bacterial, 3 fungal,
4 Apicomplexan and 3 Kinetoplastid species, are represented within GeneDB. The
major emphasis in development so far has been to make the sequence and annotation
of finished as well as ongoing genome projects available via a user-friendly
resource.
All data
can be easily accessed using browsable catalogues, text and/or sequence searches
as well as via a query interface which allows a wide range of annotation to be
interrogated. Genes or feature
predictions are displayed on their own pages, containing location information,
neighbourhood maps, results of predictive software packages and additional
curated annotation in a graphical and text based format. Queries can be
extended to include datasets from multiple genomes and can be viewed, refined
and downloaded in a variety of file formats. Extensive cross-referencing allows
retrieval of related information not only across species within GeneDB but also
from external resources.
Datasets from six organisms are curated by biologists, who aim to integrate
sequence data with the vast array of available functional, expression and
phenotypic data that are accessible through public databases and the literature
or fed back by the research communities. With an increasing emphasis on
comparative sequencing projects, curators are also involved in maintaining
datasets of related organisms.
The
presentation will briefly introduce GeneDB before focusing on the challenges
faced by the GeneDB curators.
RegulonDB: Curation, Literature Search, Notation and
Evidence about Transcriptional Regulation and Transcription Unit Organization
in E. coli K-12
Gama-Castro S., Peralta-Gil M., Martínez-Antonio A.,
Santos-Zavaleta A., Salgado H., Jimenez V. and Collado-Vides J., Program of
Computational Genomics, CIFN, UNAM, A.P. 565-A, Cuernavaca, Morelos 62100,
México.
RegulonDB is a database of experimental
knowledge about the elements of transcriptional regulation in Escherichia coli
K-12 (Salgado et al., Nucleic Acids Res. 2001 29:72-4). It contains information
on transcription units, promoters, terminators, regulatory proteins, binding
sites for regulatory proteins, and conditions and the associated affected genes.
All of these objects are supported by references and evidence that document the
validity of the data. We have defined a specific notation to describe several
objects in a unique and unambiguous way in the database. These include
promoters, transcription units and regulatory proteins.
The curation process covers all articles that contain information about
transcriptional regulation. The first step of the literature search is to gather
abstracts from the PubMed database using a set of pertinent keywords. The
abstracts are then read, and the relevant ones are selected so that the complete
articles can be obtained and read. Finally, the extracted data are added to
RegulonDB through several capture forms; there is a capture form for each object.
The quality of the added data is monitored automatically through reports of
inconsistencies in the data. We will give a brief overview of the database,
emphasizing the complete curation process.
Integration of New Data into RGD: Quality Control and
Data Submission Tools
Dean Pasko, Susan Bromberg, Wenhua Wu, Chunyu Fan,
Chin-Fu Chen, Gopal Gopinathrao, Rajni Nigam, Cindy Foote, Dorothy Reilly,
Angela Zuniga-Meyer, Jiali Chen, Norberto de la Cruz, Mary Shimoyama, Simon
Twigger, Aubrey Hughes, Jed Mathis, Nataliya Nenasheva, Victoria Petri, Weiye
Wang, Lan Zhao, Peter Tonellato, Howard Jacob
Human and Molecular Genetics Center,
Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, Wisconsin
53226, USA
One goal of RGD is to integrate data from multiple sources, including the
literature, research laboratories, and other comprehensive databases such as
Swiss-Prot and LocusLink. This requires that we accurately match incoming data
with our data. Because there are many different symbols for any given gene,
quality control measures have been developed to ensure that data records are
associated correctly before they are loaded. This is done using a combination of
gene symbol and aliases, GenBank accession numbers, and sequence. Data that do
not meet the established criteria are separated according to the conflict type
and resolved by manual curation. Data curated from the literature also go
through this pipeline, but in addition quality control measures have been built
into the data submission forms curators use. The methods used to align the data
and resolve conflicts, as well as the data entry forms, will be described.
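The matching step described above can be illustrated with a minimal Python sketch. It is not RGD's pipeline: the Gene record, the match_record function, the conflict labels and the decision rules are all hypothetical, and the sequence comparison mentioned in the abstract is left out for brevity. The sketch only shows the general idea of accepting a record when symbol or alias evidence and accession evidence agree, and routing everything else to a named conflict class for manual curation.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """An existing gene record (hypothetical structure, not the RGD schema)."""
    gene_id: str
    symbol: str
    aliases: set = field(default_factory=set)
    accessions: set = field(default_factory=set)

    def names(self):
        # Symbol and aliases, lower-cased for case-insensitive comparison.
        return {self.symbol.lower()} | {a.lower() for a in self.aliases}


def match_record(incoming, genes):
    """Return (status, gene_ids) for one incoming record.

    incoming is a dict with optional 'symbol' and 'accession' keys.
    The status strings are invented labels for this sketch.
    """
    symbol = incoming.get("symbol", "").lower()
    accession = incoming.get("accession")
    by_name = {g.gene_id for g in genes if symbol and symbol in g.names()}
    by_acc = {g.gene_id for g in genes if accession in g.accessions}

    if not by_name and not by_acc:
        return "conflict:no_match", []
    if by_name and by_acc:
        candidates = by_name & by_acc
        if not candidates:
            # The two kinds of evidence point at different genes.
            return "conflict:symbol_vs_accession", sorted(by_name | by_acc)
    else:
        candidates = by_name or by_acc
    if len(candidates) > 1:
        return "conflict:ambiguous", sorted(candidates)
    return "matched", sorted(candidates)


# Made-up identifiers, for illustration only.
GENES = [
    Gene("G1", "Ace", {"Dcp1"}, {"X00001"}),
    Gene("G2", "Ace2", set(), {"X00002"}),
]
print(match_record({"symbol": "dcp1", "accession": "X00001"}, GENES))
print(match_record({"symbol": "Ace", "accession": "X00002"}, GENES))
```

Records that come back with a conflict status would be grouped by that status and handed to a curator, mirroring the separation by conflict type described in the abstract.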
Comparative Map Curation in Gramene Using CMap
Immanuel Yap, Ken Y. Clark, and the Gramene Team
CMap is a visualization tool for comparative mapping. It is being developed as a
module for the Generic Model Organism Database Project (GMOD) and is available
for download at <http://www.gmod.org/cmap/>. CMap defines a map_set as a
collection of maps, and a map as a linear array of features. Correspondences may
be generated between different features on different maps. Since CMap strictly
acts as a display tool, the curator is free to redefine the meaning of maps,
features, and correspondences as well as how they are drawn. For instance, the
curator may define a map as being genetic, physical, sequence-based, etc. A
feature on a map may represent a locus, gene, molecular marker, EST, phenotype,
QTL, etc. Features may be portrayed as a point or an interval of different color.
Correspondences may be generated automatically, based on feature names, or
specified by the curator. Data may be loaded in batch from a Perl command-line
script. CGI scripts allow manual addition, modification, and editing of data via
a web browser.
Gramene <http://www.gramene.org/> is using CMap to display and compare genetic,
physical, and sequence-based maps of rice and other grasses. Issues faced by
Gramene curators when using CMap to store and display maps and generate
correspondences will be discussed.
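The data model outlined above (a map set holding maps, a map holding a linear array of features, and correspondences linking features on different maps) can be sketched in a few lines. The sketch below is written in Python for illustration only; CMap itself is Perl-based, and none of the class names, fields or the matching rule here are taken from its schema or code. The feature names are invented.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Feature:
    name: str
    start: float
    stop: float                 # equal to start for a point feature
    feature_type: str = "marker"


@dataclass
class Map:
    name: str
    features: list              # linear array of Feature objects


@dataclass
class MapSet:
    name: str
    maps: list                  # collection of Map objects


def name_based_correspondences(maps):
    """Pair features on different maps whose names match, ignoring case."""
    pairs = []
    for map_a, map_b in combinations(maps, 2):
        index = {}
        for feat in map_b.features:
            index.setdefault(feat.name.lower(), []).append(feat)
        for feat in map_a.features:
            for other in index.get(feat.name.lower(), []):
                pairs.append((map_a.name, feat.name, map_b.name, other.name))
    return pairs


genetic = Map("genetic-1", [Feature("RM1", 0.0, 0.0), Feature("RZ2", 12.5, 12.5)])
physical = Map("physical-1", [Feature("rm1", 1000, 1500), Feature("C3", 9000, 9000)])
print(name_based_correspondences([genetic, physical]))
```

Name matching of this kind is cheap but depends entirely on consistent marker naming across maps, which is one reason curator-specified correspondences are also supported.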
Victoria Carollo (a), Gerard Lazo (a),
David Matthews (b), Olin Anderson (a)
(a) USDA-ARS-WRRC, 800 Buchanan Street,
Albany, CA 94710
(b) Cornell University, Dept. of Plant
Breeding, Ithaca, NY 14853
The
GrainGenes project is a USDA-supported compilation of molecular and phenotypic
information on wheat, barley, rye and oats. The curation and delivery of web-based genetic and physical
maps on GrainGenes has been a mainstay of the database since its inception in
1991. GrainGenes currently serves
90 Map_Data sets of entire genomes, 22 Linkage_Data sets of single chromosome
studies, and 581 records of 2_point_data studies. Mapped loci are linked to text records containing associated
data such as marker type, mapping scores, images of autoradiograms, band size,
linked QTL, references, and mapping probes. Probe records associated with loci contain links to external
databases, clone and source information, PCR conditions, primer sequences,
etc. QTL maps on GrainGenes link
the loci to supporting statistics and descriptions of trait studies. Maps selected to add to the GrainGenes
map collection are usually identified via the literature reference stream into
the database, but very often curators work closely with colleagues in the
Triticeae research community to publish maps on GrainGenes simultaneously with
publication in scientific journals and newsletters. GrainGenes is planning a major move from an object-oriented
ACEDB database to a relational database.
New map viewing graphical interfaces will also be implemented. The GBrowse viewer, developed by the
GMOD group, will provide a user-friendly interface for physical maps, and allow
users to annotate map data and curators to include more features than currently
available on the ACEDB map viewer.
The CMap viewer, developed by Gramene, will facilitate map integration
and comparative genomics. Current map accessions, planned additions and new
features will be discussed.
Sequence Curation in dictyBase
P. Fey, E. M. Just, P. Gaudet, P. A. Dyck, S. Merchant, W. A. Kibbe, R. L. Chisholm
Northwestern University, Feinberg School
of Medicine, Center for Genetic Medicine, 303 E Chicago Ave, Chicago, IL 60611
dictyBase (http://dictybase.org) provides the scientific
community with a database that aims to integrate all currently available Dictyostelium genome sequence, literature and, as
much as possible, biological knowledge associated with specific genes. dictyBase is based on the Saccharomyces
Genome Database (SGD), which has worked closely with us in developing dictyBase.
In dictyBase we wish to present information collected from a variety of different
sources, including sequencing data from the genome sequencing centers, GenBank
records submitted by research laboratories, data from cDNA sequencing projects,
and other gene predictions (not from sequencing centers). Our goal is to present all of these
diverse and occasionally conflicting data so that individual users can
evaluate them and make their own judgments about their validity. To accomplish this we display all the
different sequence types on separate tracks in our Genome browser (Gbrowse; http://www.gmod.org/ggb/index.shtml):
'Verified Genes', 'Gene
Predictions from Sequencing Centers', 'EST Alignments', 'HMM Gene Predictions',
'Contigs', and 'Unanchored Genes'. This approach may also be valuable
for other databases with incomplete genome sequences.
Since the most consistent source of sequence data is the genome sequencing
centers, our current strategy is to use genome sequence as the primary sequence. However, to date, Dictyostelium genome
center gene predictions (geneID) are derived in an entirely automated fashion,
as are a separate set of gene predictions derived by an independent effort and
distinct software package--the HMM gene predictions. Therefore we created the
'Verified Genes' track, whose coordinates come from manual entries by dictyBase
curators based on all available information (GenBank records and ESTs). The
coding sequences are aligned by BLAST against the chromosomal DNA from the sequencing
centers to compare gene coordinates. In case of a discrepancy between the
experimentally validated gene model and the sequencing center gene model, the
curators assign new chromosomal coordinates according to the experimentally
derived sequence. In this way
dictyBase can contribute to the improvement of gene models and present users
with manually annotated gene models when there is additional supporting
data. We hope to stimulate
discussion in the group about how other databases are handling these issues.
Apollo: a genome annotation tool
Lynn Crosby, FlyBase, Department of Molecular and Cellular Biology,
Harvard University, Cambridge, MA 02138-2020, USA
Apollo,
developed jointly by members of the Berkeley Drosophila Genome Project and the
Sanger Institute (UK), is a powerful and versatile tool for the annotation of
genomic sequence. It is a Java
application, and can be downloaded and run on Windows, Mac OS X, or any
Unix-type system (including Linux).
Apollo is freely available; see http://www.fruitfly.org/annot/apollo/
and http://www.fruitfly.org/annot/apollo/install.html. User documentation is remarkably
detailed and complete. The BDGP
version, which has been used for Drosophila genome annotation, is slightly
different from the Sanger version, which is used for viewing the human genome. Documentation and software for
developers are also freely available from SourceForge (which can be accessed
from the page above).
Apollo is
both a viewer and an interactive annotating tool. The view is divided into an annotation zone and an evidence
zone; one or both strands may be viewed.
The tool allows the user to view very large amounts of aligned data
quickly and effectively. Innumerable aspects of the view can be customized, and
moving around within the region being viewed is fast and intuitive. One may zoom in and out; at the highest
resolution DNA and protein sequence data appear, superimposed and aligned on
the genomic sequence base line, on the annotated objects, and on the evidence
objects. Annotation may be done at
varying levels of detail, using alternative views and interactive manipulations
between the evidence and the annotations.
Apollo is a cooperative work in progress. Currently under development are the
ability to add evidence dynamically (rather than having it pre-loaded) and a
version of Apollo that will simultaneously present two aligned genomes. In
addition, a number of user groups have developed applications and extensions of
Apollo.
Developers at Berkeley and Sanger provide extensive technical support
for such efforts; these communications may also be accessed via SourceForge.
Clustering MeSH Representations of Medical Literature
Craig Struble, Department of Mathematics and Computer Science, Marquette
University, Milwaukee WI
Clustering documents is an important problem with applications in concept
formation, knowledge extraction, and classification. Clustering papers in
Medline, a collection of abstracts from medicine-related publications, has been
previously investigated, with many successful applications. Many approaches for
document representation and clustering are based on a full-text analysis of the
abstracts and bodies of each paper in a document collection.
We have been investigating an alternative approach for representing documents
based on Medical Subject Headings (MeSH), an ontology for indexing papers in
Medline and PubMed, an online database of publication abstracts that encompasses
Medline. Using MeSH-based representations, we have been able to visualize and
identify structure not readily seen with full-text-based representations. We
present our results of clustering documents contained in the Rat Genome Database
(RGD), comparing full-text and MeSH-based representations.
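To make the idea of a MeSH-based document representation concrete, here is a small Python sketch. The abstract does not say which similarity measure or clustering algorithm was used, so both choices below are assumptions made purely for illustration: each document is treated as a set of MeSH headings, similarity is the Jaccard index, and documents are grouped by a simple single-link threshold rule. The document identifiers are invented.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of MeSH headings."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_threshold(docs, threshold=0.5):
    """Single-link grouping of documents.

    docs maps a document id to its MeSH headings. A document joins (and
    merges) every existing cluster that contains at least one member whose
    similarity to it reaches the threshold.
    """
    clusters = []
    for doc_id in sorted(docs):
        linked = [
            c for c in clusters
            if any(jaccard(docs[doc_id], docs[m]) >= threshold for m in c)
        ]
        merged = [doc_id]
        for c in linked:
            merged.extend(c)
            clusters.remove(c)
        clusters.append(sorted(merged))
    return sorted(clusters)


docs = {
    "doc1": ["Rats", "Hypertension", "Kidney"],
    "doc2": ["Rats", "Hypertension", "Blood Pressure"],
    "doc3": ["Zebrafish", "Embryonic Development"],
}
print(cluster_by_threshold(docs))
```

Because every document carries only a handful of controlled headings, this representation is far more compact than a full-text term vector, which is part of what makes comparing the two representations interesting.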
Textpresso: An Information Retrieval and Extraction System for C. elegans Literature
Eimear Kenny, Hans-Michael Mueller and Paul Sternberg
A major challenge facing researchers in the biomedical sciences is extracting the
vast amount of information available only in the biological literature, most of
it contained in individual papers. Manual extraction of information from
scientific papers is tedious and slow. We have therefore designed Textpresso, a
web-based system that aids the C. elegans researcher and professional curator in
retrieving and efficiently extracting information from papers and abstracts.
Textpresso recognizes words and phrases in text as belonging to word categories
in the Textpresso Ontology. The Textpresso search corpus is automatically
preprocessed so that these words and phrases are annotated with their
corresponding ontology category. For example, the ontology category "Regulation"
would be assigned to words such as "enhance", "repress", and "regulate". The
ontology includes all terms from the Gene Ontology (GO) Consortium. The
semantically marked-up text is presented in XML format, making it available to
XML-processing software tools. A web-based user interface
(http://www.textpresso.org/) offers two ways to search this corpus using keywords
and/or categories: the easy-to-use "Simple Retrieval" and the more complex and
powerful "Advanced Retrieval". The Textpresso corpus comprises ~18,000 abstracts
and ~2,700 full-text papers rich in information on C. elegans biology.
Researchers can view sentences that match their query, as well as paragraphs,
whole articles and citation information. The project currently focuses on the
C. elegans literature; however, an expansion to the literature of S. cerevisiae
is planned, and extension to other model organisms should be straightforward.
The project is part of WormBase (http://www.wormbase.org/) and GMOD
(http://www.gmod.org/).
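The category annotation and the combined keyword/category search described above can be sketched as follows. This is a toy illustration in Python, not Textpresso's implementation: the three-word lexicon simply reuses the "Regulation" example from the abstract, the XML element naming is an assumption, and the function names and the example sentence (with placeholder gene names) are invented.

```python
import re
from xml.sax.saxutils import escape

# Toy lexicon built from the example in the abstract; the real ontology
# is far larger and also covers multi-word phrases.
LEXICON = {
    "enhance": "Regulation",
    "repress": "Regulation",
    "regulate": "Regulation",
}


def mark_up(sentence, lexicon=LEXICON):
    """Wrap each known word in an XML element named after its category."""
    parts = []
    # Alternate between runs of word characters and runs of everything else,
    # so that spacing and punctuation are preserved.
    for token in re.findall(r"\w+|\W+", sentence):
        category = lexicon.get(token.lower())
        text = escape(token)
        parts.append(f"<{category}>{text}</{category}>" if category else text)
    return "".join(parts)


def matches(sentence, keyword=None, category=None, lexicon=LEXICON):
    """True if the sentence contains the keyword and a word of the category."""
    words = [w.lower() for w in re.findall(r"\w+", sentence)]
    has_keyword = keyword is None or keyword.lower() in words
    has_category = category is None or any(
        lexicon.get(w) == category for w in words
    )
    return has_keyword and has_category


sentence = "Does gene-1 repress gene-2?"
print(mark_up(sentence))
print(matches(sentence, keyword="gene", category="Regulation"))
```

Searching on a category rather than a literal word lets one query retrieve sentences that use any of the words assigned to that category.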
PubFetch: Collecting literature from multiple data sources
Vijay Narayanasamy & Simon Twigger
Rat Genome Database, Medical College of Wisconsin, Milwaukee WI 53226
Scientific
literature curation to extract information forms an essential component of any
model organism database (MOD). The literature data is available from multiple
sources. Some of the publicly available electronic sources include PubMed and
Agricola. There are several subscription based literature data sources as well.
PubFetch provides a generic way of searching and retrieving literature data
from online literature data sources so that the downstream applications don't
have to deal with the idiosyncrasies of the individual literature databases.
Apart from fetching the documents, PubFetch has functionalities to filter
duplicate documents and to present the documents in the desired format. The
current version of PubFetch retrieves documents from PubMed and Agricola and
formats them into MEDLINE Display format. PubFetch is available as a
stand-alone command line Java application, web application, and also as a
service in the BioMOBY webservices framework.
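Duplicate filtering across sources, mentioned above, might look roughly like the following. This is a hedged Python sketch rather than PubFetch's own logic (PubFetch is a Java application and the abstract does not describe its duplicate criteria): the record layout, the dedup_key and filter_duplicates names, and the rule of comparing normalised title plus year are all assumptions.

```python
import re


def dedup_key(record):
    """Normalise title and year so trivially different copies compare equal."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, str(record.get("year", "")))


def filter_duplicates(records):
    """Keep the first record seen for each key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Invented records standing in for results from two sources.
records = [
    {"source": "PubMed", "title": "A Rat Genome Map.", "year": 2003},
    {"source": "Agricola", "title": "a rat genome map", "year": "2003"},
    {"source": "Agricola", "title": "Another paper", "year": 2002},
]
for rec in filter_duplicates(records):
    print(rec["source"], "-", rec["title"])
```

Keeping the first occurrence means the order in which sources are queried decides which copy survives, so a preferred source should be listed first.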
BioCreAtIvE: Critical Assessment of Information Extraction
Marc Colosimo, Alexander Yeh, Alexander Morgan, Lynette Hirschman
MITRE
This talk will describe MITRE's work in evaluation for text data mining applied to
biology. We first summarize Task 1
for the KDD (Knowledge Discovery and Data Mining) Challenge Cup 2002, run by
Alex Yeh at MITRE. The challenge was to develop an automated system for a task
early in the FlyBase Harvard curation pipeline: identifying which articles to curate, based on whether they
contained experimental evidence for Drosophila gene products. The rest of the
talk will focus on the ongoing BioCreAtIvE evaluation which MITRE is running in
conjunction with Christian Blaschke and Alfonso Valencia from CNB-Madrid. BioCreAtIvE includes a task on listing
genes mentioned in abstracts (using data provided by the fly, mouse and yeast
databases, as well as data provided by NCBI). A second task focuses on functional annotation of full
text articles: specifically, automatic assignment of Gene Ontology terms to
proteins (with data provided by SWISS-PROT).
Curatorial procedures at Mouse Genome Informatics, with an emphasis on expression data
Constance M. Smith for the Gene Expression Database at Mouse Genome Informatics, The
Jackson Laboratory, Bar Harbor ME 04609
Mouse
Genome Informatics (MGI) provides integrated access to data on the genetics,
genomics, and biology of the laboratory mouse. Information in MGI is obtained via manual curation of the
published literature, from electronic downloads, and from electronic
submission. The Gene Expression
Database (GXD), one of the databases comprising MGI, collects and integrates
information about gene expression in the developing mouse. GXD is designed to integrate data
obtained from many different kinds of assay types, including RNA in situ hybridization,
immunohistochemistry, in situ reporter (knock in), Northern and Western blots,
and RT-PCR. In GXD, as well as the
rest of MGI, controlled vocabularies are extensively used to ensure uniform
data coding and to enable complex queries of the data. For instance, expression patterns are
described using an extensive dictionary of standardized anatomical terms,
enabling the recording of expression results from assays with different spatial
resolution in a consistent manner.
Whenever possible, text annotations are complemented by digitized images
of original expression data to aid interpretation of the primary data. GXD is available at the Mouse Genome
Informatics site at www.informatics.jax.org.
Gene Expression Curation in WormBase
Wen J. Chen, Igor Antoshechkin, WormBase Consortium
WormBase gene expression curation focuses on three major areas: descriptive
analysis of individual genes, microarray data, and gene regulation.
We have screened all of the C. elegans publications (~7,000) accumulated to date
and manually extracted spatial and temporal gene expression data. Our first-pass
curation pipeline continues to flag new papers for extraction. New expression
data are released fortnightly. WS112 contains experimental results from 2,354
experiments that studied ~1,700 genes. Thus, we consider ourselves complete and
up-to-date on the curation of this type of data.
We have
developed database models for microarray data that are based on MIAME (Minimum Information About a
Microarray Experiment) recommendations
and established a pipeline for paper curation and data entry into
the database. The majority of data
processing is carried out via Perl scripts and is automated, although some script modification is
required due to differences in
primary data file formats, which are usually obtained directly from authors.
There are currently twenty-one papers containing microarray data in the
C. elegans literature, describing developmental expression profiles as well as
gene expression under certain conditions such as genetic mutations or drug
treatment. Seven of them have been curated and the data have been entered into
WormBase. These data contain 595,451 individual expression level data points and
175 clusters. We estimate that we will process all data available in the
literature by the end of this year. We are also working on developing database
models for SAGE (Serial Analysis of Gene Expression) data. SAGE is another
high-throughput method of gene expression analysis, which is gaining popularity.
There are currently two SAGE papers in the C. elegans literature.
We have just started curation of gene regulation, which includes all experimental
studies of how a gene or an environmental condition regulates the expression of
other genes. This kind of data is also curated manually. Current curation of gene
regulation follows our first-pass curation pipeline. Newly published articles are
curated first so that there will soon be a complete collection of gene regulation
data published after 2003. Curation of earlier articles will be done later.
To ensure consistency and to facilitate data analysis, we have been using a
developmental (life stage) ontology to code temporal data. In the near future, we
will begin to apply a cell and anatomy ontology to describe spatial data.
Biological Interaction Curation In FlyBase
Chihiro Yamada
FlyBase Cambridge, Department of Genetics,
University of Cambridge, CB1 3NY UK.
With a
large number of genome sequencing projects having been completed, and more on
the way, there has been a growth in interest in not only what genes are present
in the genome, but how those genes interact. There are various ways to study this and computational
studies of interactions can be facilitated by some of the types of data that
Model Organism Databases curate. In FlyBase there are a number of classes of
curated data that could be used in these studies and I shall be discussing them
in my talk.
Interactions between mutant alleles of different genes have been curated in FlyBase for
three years and now constitute a large body of data that bioinformaticists are
starting to examine. I will briefly discuss mutant allele curation in FlyBase,
and then go on to explain how it is extended to cover interactions between
mutant alleles of different genes.
I'll finish
by talking about other ways our curation captures studies on interactions, including describing interactions
with GO terms, and some initial work being carried out by Harvard with the aim
to describe molecular interactions.
Mutant Manifests: toward a zebrafish phenotype ontology
David Fashena 1, Erik Segerdell 1, Melissa Haendel 1, Judy Sprague 1, Monte
Westerfield 1,2
1 Zebrafish Information Network, 5291 University of Oregon, Eugene, OR USA 97403-5291
http://zfin.org
2 Institute of Neuroscience, 1254 University of Oregon, Eugene, OR USA 97403-1254
Abstract:
The
function of genes during embryonic development is illustrated by the phenotypes
resulting from mutated genes. Phenotype data from zebrafish mutants and gene
knock-downs are being generated at an increasing rate. To accommodate the flood
of new data in a way that will facilitate the phenotypic analysis of gene
function, we are investigating the use of a zebrafish phenotype ontology. The
phenotype ontology would complement our existing ontology of anatomical
structures and behavioral and physiological processes. The structure of any
phenotype would be:
Phenotype = observable + attribute + value + qualifier
The "observables"
are species-specific and come from the zebrafish anatomical ontology. To
facilitate cross-species comparisons, we ensure that as many anatomical terms
as possible are shared between zebrafish and mouse. Homologous structures like
fins and limbs are cross-listed. The "attributes, values and qualifiers" come
from the cross-species phenotype ontology being developed in collaboration with
the Phenotype Ontology Consortium. The zebrafish phenotype ontology will allow
the annotation of zebrafish phenotypes in a format that enables ready
comparison to mutant phenotypes in other organisms. The system is sufficiently
flexible to accommodate mutants, morpholinos and environmental factors. This is
a significant expansion in the way ZFIN curates mutant phenotypes and will
entail new challenges for our curators.
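The four-part phenotype structure given above maps naturally onto a small record type. The Python sketch below is only an illustration of that structure: the class and function names, the example terms, and the cross-listing table are hypothetical and are not taken from ZFIN or from the phenotype ontology under development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phenotype:
    observable: str   # term from a species-specific anatomical ontology
    attribute: str    # the three remaining parts come from a shared,
    value: str        # cross-species vocabulary
    qualifier: str

    def describe(self):
        return " + ".join(
            [self.observable, self.attribute, self.value, self.qualifier]
        )


# Hypothetical cross-listing of homologous structures (fish term -> mouse term).
CROSS_LISTED = {"pectoral fin": "forelimb"}


def comparable(fish, mouse, cross_listed=CROSS_LISTED):
    """True if two annotations describe the same change in matching structures."""
    same_structure = (
        cross_listed.get(fish.observable, fish.observable) == mouse.observable
    )
    same_change = (fish.attribute, fish.value) == (mouse.attribute, mouse.value)
    return same_structure and same_change


fish = Phenotype("pectoral fin", "size", "small", "mild")
mouse = Phenotype("forelimb", "size", "small", "severe")
print(fish.describe())
print(comparable(fish, mouse))
```

Keeping the observable separate from the shared attribute, value and qualifier terms is what allows annotations from different organisms to be lined up once homologous structures are cross-listed.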
Community Curation at MaizeGDB
Carolyn J. Lawrence, Mary L. Polacco, Trent Seigfried, and Volker Brendel
The
Maize Genetics and Genomics Database (MaizeGDB) is a central repository for
maize sequence, stock, phenotype, genotypic and karyotypic variation, and
chromosomal mapping data. The MaizeGDB team endeavors to make use of the maize community's
expertise and willingness to provide expert annotation. To this end,
community curation tools have been created and are available for public
use. How community curators can use the curation tools to contribute data
directly to the database will be presented, and some of the protocols
implemented to ensure that new records added to the database by community
curators are of the highest quality will be discussed.
Community Interactions: Feedback, Support and Curation
Eva Huala
TAIR serves the needs of a large
community of plant biology researchers ranging from professors to
undergraduates along with teachers, students, and others. Our mandate to serve the needs of the
plant biology community requires that we be responsive to user input of all
kinds, ranging from simple questions about where to find information or how to
use tools, to suggestions for improvement of datasets and tools or requests for
specialized datasets. To make TAIR
more accessible to all users we provide several avenues for gathering data,
hearing community feedback and keeping in touch with the needs and desires of
our community. We accomplish this by allowing users to post comments
directly on object detail pages, using open source tracking software
(Jitterbug) to assign, respond to and archive user questions to TAIR, making
custom datasets available in our User Requests ftp directory (ftp://tairpub:tairpub@ftp.arabidopsis.org/home/tair/User_Requests),
and giving workshops on how to use TAIR.