This page is the default entry point to the Sybil web interface. It displays summarized information on the
contents of the comparative database and can be customized with links to the various comparative tools
and displays. Note that certain links and/or summary tables will only appear if the necessary supporting data
have been loaded into the comparative database.
|sybil: strepneumo: home: Search for genes/proteins
The gene/protein search tool is a simple query interface that allows one to search the comparative
database for all genes whose protein product description matches one or more keywords. There are
a few things to note when using this tool:
- the search is case-sensitive (i.e., in the sample database a search for "histone" or "Histone" will succeed--and give different answers--but a search for "HISTONE" will retrieve nothing)
- multiple keywords must be separated by spaces (e.g., "kinase receptor")
- the search tool will NOT interpret the words "and" and "or" as logical operators (i.e., they are treated like any other keyword)
- any protein matching at least one of the keywords will be retrieved
- proteins are ranked according to the number of keyword occurrences in each
A more sophisticated gene/protein search tool is planned; visit the Sybil project site for software updates.
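The matching rules listed above can be illustrated with a minimal Python sketch. This is not Sybil's actual implementation: the function name `keyword_search`, the in-memory dictionary of product descriptions, and the choice to score by counting substring occurrences are all assumptions made for illustration.

```python
def keyword_search(descriptions, query):
    """Rank proteins by the number of case-sensitive keyword occurrences.

    descriptions: dict mapping a (hypothetical) protein ID to its product
                  description.
    query: keywords separated by spaces; "and" and "or" are ordinary keywords.
    """
    keywords = query.split()
    hits = []
    for protein_id, description in descriptions.items():
        # str.count is case-sensitive and counts non-overlapping occurrences.
        score = sum(description.count(keyword) for keyword in keywords)
        if score > 0:  # a protein is retrieved if at least one keyword matches
            hits.append((protein_id, score))
    # Most keyword occurrences first; ties are broken by ID for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


# Made-up example data, not taken from the sample database.
proteins = {
    "p1": "histone H3",
    "p2": "Histone deacetylase",
    "p3": "serine/threonine kinase receptor",
    "p4": "receptor kinase, kinase domain",
}
```

With this data, "histone" and "Histone" each retrieve a different single protein, "HISTONE" retrieves nothing, and "kinase receptor" ranks p4 above p3 because p4 contains more keyword occurrences.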
|sybil: strepneumo: home: List protein clusters
The "List protein clusters" tool enumerates all of the protein clusters that were generated by a particular
protein clustering analysis. Simply choose the desired analysis and output format (either HTML or tab-delimited
plain text) and then select/click on "list clusters." For more information about the protein clustering analyses available
in Sybil, see the relevant descriptions on the Sybil project web site:
Note that the parameters used for each of the analyses in the current database should be summarized in
the "description" column in the list of "Analyses/computes" that appears on the database home page. Also note
that the sample database does not necessarily contain data from all of the above clustering analyses.
|sybil: strepneumo: home: Organisms/genomes
This is a list of all the organisms and/or genomes for which sequence and comparative data have been loaded
into the comparative database.
|sybil: strepneumo: home: pseudomolecules
The contigs obtained from the whole genome assembly of the draft genomes have been ordered using the TIGR4 genome as a reference for
alignments. Once ordered, the contigs have been linked together into a pseudomolecule for the purpose of annotation and whole genome
displays. Contigs that did not align to the reference were linked together in random order in a second pseudomolecule that is typically
much shorter than the first one. You can select either one of the pseudomolecules for analyses offered throughout the
Sybil system. A spacer sequence called a "pmark" (NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN) introduces translational start and stop codons
in all 6 frames and is placed between the contigs in the pseudomolecules to avoid predicting genes across contig boundaries and to
predict gene fragments at contig edges. Pmarks can be viewed in some of the Sybil displays such that potential breaks in synteny can be
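The role of the pmark spacer can be checked with a short Python sketch. The pmark sequence is the one quoted above; everything else (the function names, and the choice to count ATG, GTG and TTG as start codons) is an assumption made for illustration and not taken from the Sybil code base.

```python
PMARK = "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def codons(seq, frame):
    """Split seq into complete codons, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def frames_containing(seq, codon_set):
    """Return the reading frames ("+0".."+2", "-0".."-2") that contain a codon
    from codon_set."""
    found = set()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            if any(codon in codon_set for codon in codons(strand_seq, frame)):
                found.add(f"{strand}{frame}")
    return found


def build_pseudomolecule(ordered_contigs):
    """Link already-ordered contigs into one sequence, with a pmark between
    each adjacent pair."""
    return PMARK.join(ordered_contigs)


ALL_FRAMES = {"+0", "+1", "+2", "-0", "-1", "-2"}
```

Because the pmark is its own reverse complement, the same stop and start codons are presented to a gene finder on both strands.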
|sybil: strepneumo: home: Analyses/computes
This table lists all of the analyses/computes that have been loaded into the comparative database. The Sybil
project site contains a list of the various analyses
supported by the system.
Note that the "description" column in this list mentions some of the parameter settings that were chosen when the analysis
was run (since it is possible to run/load the same analysis multiple times, but with different parameter settings). The Jaccard
cluster analysis, for example, is typically run at least once for each genome in the database, followed by one or more
cross-genome clustering analyses, such as the Sybil "COG" analysis or Jaccard-filtered COG analysis.
|sybil: strepneumo: protein keyword search
This page displays the results of a gene/protein keyword search. All of the proteins whose product description
contains at least one of the query keywords will be listed here, with those that have the most keyword matches
appearing first. Following a link on this page will launch the individual gene/protein summary page for the
selected gene/protein.
If the search returned no results, use the browser's "back" button to return to the previous page and try the search
again with a different keyword, or a substring of the original keyword.
Note that the search is currently case-sensitive, so to find occurrences of both "Histone" and "histone" it will
be necessary to search either with both keywords (i.e., "Histone histone") or to use a substring that is likely to
match both (and only) keywords (e.g., "istone").
|sybil: strepneumo: list protein clusters
This page displays all the protein clusters that were generated by a single protein clustering analysis.
Following a link on this page will launch the protein cluster display page for the selected cluster. For
more information about the protein clustering analyses available in Sybil, see the relevant
descriptions on the Sybil project web site:
|sybil: strepneumo: protein display
This page displays summarized information pertaining to a single gene/protein in the
comparative database.
|sybil: strepneumo: protein display: protein properties
A list of protein properties. In the future this section may be expanded to include additional stored
and/or computed physical properties of the protein. Currently, however, only the following properties
are displayed:
- organism - organism/genome to which this protein or gene belongs
- product name - a description of the protein product
- sequence length - length of the protein's predicted amino acid sequence
- created - date on which the protein was created in the (original) database
- last modified - date on which the protein was last modified
|sybil: strepneumo: protein display: database references
A list of external database references and/or alternate accession numbers and IDs for the protein.
In the sample database each of the genes should have at least one reference back to the annotation
database from which it was loaded. Whenever possible these external database accession
numbers should be hyperlinked to the appropriate database's web site.
|sybil: strepneumo: protein display: protein clusters
A list of all the protein clusters to which this protein/gene belongs. Each cluster in this list is hyperlinked to
the protein cluster display page for that cluster. For more information about the protein clustering analyses available
in Sybil, see the relevant descriptions on the Sybil project web site:
|sybil: strepneumo: protein display: genomic context
A graphical representation of the region(s) of the genome to which this gene maps. The name of the
gene itself appears in black and the names of neighboring genes are shown in grey. Click on any gene
in the image to switch to the gene/protein summary page for that gene. Note that genes are color-coded
according to the organism from which they are derived, using the site-wide color scheme that has been
defined in the Sybil site configuration file.
|sybil: strepneumo: protein display: blastp hits
A graphical representation of the top few BLASTP matches for the protein. The current protein appears
as a thick colored rectangle at the top of the image, labeled with a sequence axis that reflects the
length of its predicted protein sequence. Matches (i.e., BLAST HSPs/GSPs) to other proteins
in the database are indicated by the thinner colored rectangles below. Each match is annotated with its
P-value and percent identity score and matches to the same target protein are grouped together. Color
is used to signify the organism/genome to which each protein belongs. Clicking on any protein except
the query protein at the top will switch to the gene/protein summary page for that protein.
|sybil: strepneumo: protein display: amino acid sequence
The predicted amino acid sequence of the protein in FASTA format. Note that certain frameshifted "ORFs"
or pseudogenes may not have an amino acid sequence stored in the comparative database.
|sybil: strepneumo: protein cluster display
|sybil: strepneumo: protein cluster display: cluster summary
This section displays summarized information about the current protein cluster. This information includes:
- algorithm - an abbreviation for the protein clustering algorithm used to generate this cluster
- description - a more verbose description of the protein clustering algorithm used to generate this cluster, often including the values of one or two key parameters
- number of proteins - the number of proteins that should appear in the list of "clustered proteins"
- avg. blastp identity - an approximate percentage measuring how well conserved the proteins in this cluster are, computed by averaging the percent identities of all high-scoring pairwise BLASTP HSPs/GSPs
- avg. blastp coverage - an approximate percentage that measures how "well-covered" (on average) each protein is by BLASTP HSPs/GSPs. If this value is relatively low but the avg. blastp identity is high, then the cluster may have been formed as the result of one or two very well-conserved motifs appearing in each of the proteins. If this value is higher, on the other hand, it indicates that the member proteins have some similarity over all or most of their respective lengths (assuming that all of the proteins are similar in size, since this value will drop if a single short protein is aligned with several longer ones).
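To make the two averages concrete, here is a hedged Python sketch. The exact formulas Sybil uses are not given on this page, so the code assumes that coverage is computed per HSP and per protein as the aligned span divided by the protein length, and that both statistics are plain unweighted means. The `Hsp` record and `cluster_summary` function are hypothetical names, and the data are made up.

```python
from typing import NamedTuple


class Hsp(NamedTuple):
    """One pairwise BLASTP match; coordinates are 1-based and inclusive."""
    query: str
    subject: str
    identity: float  # percent identity reported for this HSP
    q_start: int
    q_end: int
    s_start: int
    s_end: int


def cluster_summary(hsps, lengths):
    """Return (avg. identity, avg. coverage) as percentages for one cluster.

    Assumption: coverage is measured per HSP and per protein as the aligned
    span divided by the protein length, then averaged over all values.
    """
    identities = [h.identity for h in hsps]
    coverages = []
    for h in hsps:
        coverages.append(100.0 * (h.q_end - h.q_start + 1) / lengths[h.query])
        coverages.append(100.0 * (h.s_end - h.s_start + 1) / lengths[h.subject])
    return sum(identities) / len(identities), sum(coverages) / len(coverages)


# Made-up cluster: two 200-residue proteins that share only a 40-residue motif.
motif_only = cluster_summary(
    [Hsp("A", "B", 95.0, 81, 120, 61, 100)],
    {"A": 200, "B": 200},
)

# Made-up cluster: a short protein (S) aligned against a longer one (A).
short_member = cluster_summary(
    [Hsp("A", "B", 90.0, 1, 200, 1, 200), Hsp("A", "S", 80.0, 1, 50, 1, 50)],
    {"A": 200, "B": 200, "S": 50},
)
```

Under these assumptions the motif-only cluster shows high identity with low coverage (95% and 20%), while the second cluster shows how one short member pulls the average coverage down (to 81.25%) even though every alignment spans the whole of the shorter sequence.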
|sybil: strepneumo: protein cluster display: clustered proteins
A list of all the proteins that have been clustered together by the protein clustering algorithm.
If the protein cluster has been edited subsequently by a curator (i.e., adding or removing proteins
from the cluster based on manual inspection and/or evidence from the scientific literature), any
newly-added proteins will be highlighted in this list. Proteins in the list are color-coded
according to their respective source organisms, and each protein name is hyperlinked to the
protein/gene display for that protein. Additionally, the dropdown menus to the left of each
protein entry indicate the position of the protein in the
Genome Context Image. These positions can be changed either individually using the dropdown, or by clicking the
organism name just to the right of the dropdown and utilizing the species selection menu.
Changes made to the protein ordering will take effect when the redraw button is clicked. The
'hide all' and 'show all' buttons change all of the dropdown values to make protein selection
|sybil: strepneumo: protein cluster display: genomic context
This is a graphical display of the genes whose protein products have been clustered; the clustered genes
will appear in the center of the image connected by regions shaded in red/pink. Nearby genes on each of
the relevant genomic sequences will also be shown, and those that were also clustered together (by the
same clustering analysis) are connected by regions that are shaded in grey.
The names of genes that belong to a cluster (from the same cluster analysis) are displayed in black, whereas the
names of genes that do not belong to a cluster appear in light grey. Clicking on any gene launches the protein/gene
display for that gene's protein and clicking on any of the red or grey shaded regions will switch to the protein cluster
display for the corresponding cluster.
|sybil: strepneumo: protein cluster display: clustal alignment
A multiple sequence alignment is precomputed and stored in the comparative database for each and every protein
cluster generated by a protein clustering algorithm (with the possible exception of extremely large clusters;
a configurable parameter allows the clustering analysis to omit the alignment calculation for clusters over a
specified size). The gene/protein names are color-coded by source organism, and a line immediately below the
multiple alignment displays one measure of the relatedness of the aligned sequences, based on average distance
to the consensus sequence.
|sybil: strepneumo: sequence selection
The sequence selection form is common to several Sybil pages. It allows you to select genomic sequences
for display/listing. Here are the details:
- Add organisms to the display by clicking the 'add' link on the left. This will add the organism
to the next available position (as denoted by the dropdown).
- By default the longest sequence associated with a particular organism will be selected. To select a different sequence, click the '[change]'
link at the right side of the list. This will bring up a popup in which you can add/remove sequences as needed by clicking the checkboxes.
Clicking 'save & close' will confirm your selection and close the popup box.
- Remove organisms by clicking 'remove'.
- Clicking on the organism name of any of the organisms in the list will allow you to add groups of organisms to the display.
At this point you can add all organisms of a particular species by clicking the 'Show G. species first' link in the popup.
This will add all of the organisms of that species in the first positions.
The synteny gradient display is a unique representation of conserved gene content/order
between several genomes. The image generated can be used to identify rearrangements as well
as large scale insertions/deletions. The view is based on a reference sequence that is drawn
in the bottom panel. Genes in this genome are colored from yellow to blue, left to right. White denotes a region with
no gene annotation. Matching genes in each of the query sequences are drawn atop their reference
match (i.e., not in the order in which they appear in their respective genomes). These genes are colored based
on their relative position in their respective genomes (yellow for the beginning and blue for the end).
The drawing routine can be summarized with the following steps:
- Draw reference genome with genes colored from yellow to blue.
- Draw query gene matches above the reference gene they hit.
- Color the query gene hits based on where they appear in their native genome.
- Optional: Color query genes with multiple copies (paralogs) black (as the color cannot be
determined based on position).
Note that the color gradient for each genome is scaled to that genome's own
length, so yellow always marks the start and blue always marks the end of the sequence. Color therefore indicates only
the relative position of the matching gene in its native genome: 'blue' in one genome should not be
interpreted as the exact same basepair location as 'blue' in the reference.
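The coloring rule can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the actual drawing routine: the RGB values, the linear interpolation, and the names `gradient_color` and `color_query_hits` are all assumptions.

```python
YELLOW = (255, 255, 0)
BLUE = (0, 0, 255)
BLACK = (0, 0, 0)


def gradient_color(position, genome_length):
    """Interpolate linearly from yellow (start of genome) to blue (end)."""
    fraction = position / genome_length
    return tuple(round(y + (b - y) * fraction) for y, b in zip(YELLOW, BLUE))


def color_query_hits(hits, query_length, mark_paralogs=True):
    """Assign a color to every query gene that matches a reference gene.

    hits: dict mapping a reference gene name to the list of positions (in the
          query's own genome) of the query genes that match it.
    query_length: length of the query genome, used to scale the gradient.
    """
    colors = {}
    for ref_gene, positions in hits.items():
        if mark_paralogs and len(positions) > 1:
            # Several copies: no single position, so no meaningful color.
            colors[ref_gene] = [BLACK] * len(positions)
        else:
            colors[ref_gene] = [gradient_color(p, query_length) for p in positions]
    return colors
```

Because the gradient is scaled by each genome's own length, position 100 in a 1,000 bp sequence and position 200 in a 2,000 bp sequence receive the same color even though their basepair coordinates differ.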
Currently if multiple sequences are selected for a particular organism and the '1 sequence per line'
option is not checked then the sequences will be ordered based on length. The gradient will then be assigned based on this false 'assembly'. In this
case it is only useful to see what type of genomic coverage you have between the genomes. The order may be confounded because multiple assemblies are