sybil: strepneumo: help [db=strepneumo_v15]

Sybil help page

This is the help page for the Sybil web interface. The help is organized by page; for each of the following pages documentation is available that describes each of the major features of the page. The help may be accessed directly via one of the following links or by scrolling through this document. Or, one may follow the "help" links that appear throughout the web site in order to go directly to the relevant help entry.

sybil: strepneumo: home

This page is the default entry point to the Sybil web interface. It displays summarized information on the contents of the comparative database and can be customized with links to the various comparative tools and displays. Note that certain links and/or summary tables will only appear if the necessary supporting data have been loaded into the comparative database.

sybil: strepneumo: home: Search for genes/proteins

The gene/protein search tool is a simple query interface that allows one to search the comparative database for all genes whose protein product description matches one or more keywords. There are a few things to note when using this tool:

  • the search is case-sensitive (i.e., in the sample database a search for "histone" or "Histone" will succeed--and give different answers--but a search for "HISTONE" will retrieve nothing)
  • multiple keywords must be separated by spaces (e.g., "kinase receptor")
  • the search tool will NOT interpret the words "and" and "or" as logical operators (i.e., they are treated like any other keyword)
  • any protein matching at least one of the keywords will be retrieved
  • proteins are ranked according to the number of keyword occurrences in each

A more sophisticated gene/protein search tool is planned; visit the Sybil project site for software updates.

sybil: strepneumo: home: List protein clusters

The "List protein clusters" tool enumerates all of the protein clusters that were generated by a particular protein clustering analysis. Simply choose the desired analysis and output format (either HTML or tab-delimited plain text) and then select/click on "list clusters." For more information about the protein clustering analyses available in Sybil, see the relevant descriptions on the Sybil project web site:

Note that the parameters used for each of the analyses in the current database should be summarized in the "description" column in the list of "Analyses/computes" that appears on the database home page. Also note that the sample database does not necessarily contain data from all of the above clustering analyses.

sybil: strepneumo: home: Organisms/genomes

This is a list of all the organisms and/or genomes for which sequence and comparative data have been loaded into the comparative database.

sybil: strepneumo: home: pseudomolecules

The contigs obtained from the whole-genome assembly of the draft genomes have been ordered using the TIGR4 genome as a reference for alignments. Once ordered, the contigs were linked together into a pseudomolecule for the purpose of annotation and whole-genome displays. Contigs that did not align to the reference were linked together in random order in a second pseudomolecule that is typically much shorter than the first one. You can select either pseudomolecule for the analyses offered throughout the Sybil system. A spacer sequence called a "pmark" (NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN), which introduces translational start and stop codons in all six frames, is placed between the contigs in each pseudomolecule to avoid predicting genes across contig boundaries and to allow gene fragments at contig edges to be predicted. Pmarks can be viewed in some of the Sybil displays so that potential breaks in synteny can be identified.
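
The role of the pmark spacer can be checked with a short Python sketch. It joins contigs with the pmark sequence quoted above and reports which of the six reading frames of the spacer contain a stop codon. The function names and the choice of the standard stop codons (TAA, TAG, TGA) are assumptions made for illustration; the sketch does not check start codons and is not the code used to build the Sybil pseudomolecules.

```python
PMARK = "NNNNNCACACACTTAATTAATTAAGTGTGTGNNNNN"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumed)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def frames_with_stop(seq):
    """Return the set of (strand, frame) pairs that contain a stop codon."""
    found = set()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            codons = (s[i:i + 3] for i in range(frame, len(s) - 2, 3))
            if any(codon in STOP_CODONS for codon in codons):
                found.add((strand, frame))
    return found


def build_pseudomolecule(contigs):
    """Link ordered contigs into one sequence, with a pmark between each pair."""
    return PMARK.join(contigs)


print(sorted(frames_with_stop(PMARK)))
print(build_pseudomolecule(["ATGAAA", "CCCGGG"]))
```

Because the spacer is its own reverse complement, the reverse strand carries the same stop codons as the forward strand, so a gene prediction cannot read through a contig boundary in either direction.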

sybil: strepneumo: home: Analyses/computes

This table lists all of the analyses/computes that have been loaded into the comparative database. The Sybil project site contains a list of the various analyses supported by the system. Note that the "description" column in this list mentions some of the parameter settings that were chosen when the analysis was run, since it is possible to run/load the same analysis multiple times with different parameter settings. The Jaccard cluster analysis, for example, is typically run at least once for each genome in the database, followed by one or more cross-genome clustering analyses, such as the Sybil "COG" analysis or Jaccard-filtered COG analysis.

sybil: strepneumo: protein keyword search

This page displays the results of a gene/protein keyword search. All of the proteins whose product description contains at least one of the query keywords will be listed here, with those that have the most keyword matches appearing first. Following a link on this page will launch the individual gene/protein summary page for the selected gene/protein.

If the search returned no results, use the browser's "back" button to return to the previous page and try the search again with a different keyword or with a substring of the original keyword. Note that the search is currently case-sensitive, so to find occurrences of both "Histone" and "histone" it is necessary either to search with both keywords (i.e., "Histone histone") or to use a substring that is likely to match both keywords and nothing else (e.g., "istone").

sybil: strepneumo: list protein clusters

This page displays all the protein clusters that were generated by a single protein clustering analysis. Following a link on this page will launch the protein cluster display page for the selected cluster. For more information about the protein clustering analyses available in Sybil, see the relevant descriptions on the Sybil project web site.

sybil: strepneumo: protein display

This page displays summarized information pertaining to a single gene/protein in the comparative database.

sybil: strepneumo: protein display: protein properties

A list of protein properties. In the future this section may be expanded to include additional stored and/or computed physical properties of the protein. Currently, however, only the following properties are reported:

  • organism - organism/genome to which this protein or gene belongs
  • product name - a description of the protein product
  • sequence length - length of the protein's predicted amino acid sequence
  • created - date on which the protein was created in the (original) database
  • last modified - date on which the protein was last modified

sybil: strepneumo: protein display: database references

A list of external database references and/or alternate accession numbers and IDs for the protein. In the sample database each of the genes should have at least one reference back to the annotation database from which it was loaded. Whenever possible these external database accession numbers should be hyperlinked to the appropriate database's web site.

sybil: strepneumo: protein display: protein clusters

A list of all the protein clusters to which this protein/gene belongs. Each cluster in this list is hyperlinked to the protein cluster display page for that cluster. For more information about the protein clustering analyses available in Sybil, see the relevant descriptions on the Sybil project web site.

sybil: strepneumo: protein display: genomic context

A graphical representation of the region(s) of the genome to which this gene maps. The name of the gene itself appears in black and the names of neighboring genes are shown in grey. Click on any gene in the image to switch to the gene/protein summary page for that gene. Note that genes are color-coded according to the organism from which they are derived, using the site-wide color scheme that has been defined in the Sybil site configuration file.

sybil: strepneumo: protein display: blastp hits

A graphical representation of the top few BLASTP matches for the protein. The current protein appears as a thick colored rectangle at the top of the image, labeled with a sequence axis that reflects the length of its predicted protein sequence. Matches (i.e., BLAST HSPs/GSPs) to other proteins in the database are indicated by the thinner colored rectangles below. Each match is annotated with its P-value and percent identity score and matches to the same target protein are grouped together. Color is used to signify the organism/genome to which each protein belongs. Clicking on any protein except the query protein at the top will switch to the gene/protein summary page for that protein.

sybil: strepneumo: protein display: amino acid sequence

The predicted amino acid sequence of the protein in FASTA format. Note that certain frameshifted "ORFs" or pseudogenes may not have an amino acid sequence stored in the comparative database.

sybil: strepneumo: protein cluster display

This page displays information pertaining to a single protein cluster. For more information about the protein clustering analyses available in Sybil, see the relevant descriptions on the Sybil project web site.

sybil: strepneumo: protein cluster display: cluster summary

This section displays summarized information about the current protein cluster. This information includes:

  • algorithm - an abbreviation for the protein clustering algorithm used to generate this cluster
  • description - a more verbose description of the protein clustering algorithm used to generate this cluster, often including the values of one or two key parameters
  • number of proteins - the number of proteins that should appear in the list of "clustered proteins"
  • avg. blastp identity - an approximate percentage measuring how well conserved the proteins in this cluster are, computed by averaging the percent identities of all high-scoring pairwise BLASTP HSPs/GSPs
  • avg. blastp coverage - an approximate percentage that measures how "well-covered" (on average) each protein is by BLASTP HSPs/GSPs. If this value is relatively low but the avg. blastp identity is high, the cluster may have been formed as the result of one or two very well-conserved motifs appearing in each of the proteins. If this value is higher, on the other hand, it indicates that the member proteins have some similarity over all or most of their respective lengths. This assumes that all of the proteins are similar in size, since the value will drop if a single short protein is aligned with several longer ones.

sybil: strepneumo: protein cluster display: clustered proteins

A list of all the proteins that have been clustered together by the protein clustering algorithm. If the protein cluster has been edited subsequently by a curator (i.e., by adding or removing proteins based on manual inspection and/or evidence from the scientific literature), any newly added proteins will be highlighted in this list. Proteins in the list are color-coded according to their respective source organisms, and each protein name is hyperlinked to the protein/gene display for that protein. Additionally, the dropdown menus to the left of each protein entry indicate the position of the protein in the Genome Context Image. These positions can be changed either individually using the dropdown, or by clicking the organism name just to the right of the dropdown and using the species selection menu. Changes made to the protein ordering will take effect when the redraw button is clicked. The 'hide all' and 'show all' buttons change all of the dropdown values to make protein selection easier.

sybil: strepneumo: protein cluster display: genomic context

This is a graphical display of the genes whose protein products have been clustered; the clustered genes will appear in the center of the image, connected by regions shaded in red/pink. Nearby genes on each of the relevant genomic sequences will also be shown, and those that were also clustered together (by the same clustering analysis) are connected by regions that are shaded in grey.

The names of genes that belong to a cluster (from the same cluster analysis) are displayed in black, whereas the names of genes that do not belong to a cluster appear in light grey. Clicking on any gene launches the protein/gene display for that gene's protein and clicking on any of the red or grey shaded regions will switch to the protein cluster display for the corresponding cluster.

sybil: strepneumo: protein cluster display: clustal alignment

A multiple sequence alignment is precomputed and stored in the comparative database for every protein cluster generated by a protein clustering algorithm, with the possible exception of extremely large clusters: a configurable parameter allows the clustering analysis to omit the alignment calculation for clusters over a specified size. The gene/protein names are color-coded by source organism, and a line immediately below the multiple alignment displays one measure of the relatedness of the aligned sequences, based on average distance to the consensus sequence.

sybil: strepneumo: sequence selection

The sequence selection form is common to several Sybil pages. It allows you to select genomic sequences for display/listing. Here are the details:

  • Add organisms to the display by clicking the 'add' link on the left. This will add the organism to the next available position (as denoted by the dropdown).
  • By default the longest sequence associated with a particular organism will be selected. To select a different sequence, click on the '[change]' link at the right side of the list. This will bring up a popup in which you can add/remove sequences as needed by clicking the checkboxes. Clicking 'save & close' will confirm your selection and close the popup box.
  • Remove organisms by clicking 'remove'.
  • Clicking on the organism name of any of the organisms in the list will allow you to add groups of organisms to the display. At this point you can add all organisms of a particular species by clicking the 'Show G. species first' link in the popup. This will add all of the organisms of that species in the first positions.

sybil: strepneumo: gradient

The synteny gradient display is a unique representation of conserved gene content/order between several genomes. The image generated can be used to identify rearrangements as well as large-scale insertions/deletions. The view is based on a reference sequence that is drawn in the bottom panel. Genes in this genome are colored on a gradient from yellow to blue, from left to right. White denotes a region with no gene annotation. Matching genes in each of the query sequences are drawn atop their reference match (i.e., not in the order in which they appear in their respective genomes). These genes are colored based on their relative position in their respective genomes (yellow for the beginning and blue for the end).

The drawing routine can be summarized with the following steps:

  • Draw reference genome with genes colored from yellow to blue.
  • Draw query gene matches above the reference gene they hit.
  • Color the query gene hits based on where they appear in their native genome.
  • Optional: Color query genes with multiple copies (paralogs) black (as the color cannot be determined based on position).

Note that the color gradient for each genome is based on that genome's length, so yellow is always the start and blue is always the end of the sequence. The color therefore indicates only the relative position of the matching gene in its native genome: 'blue' in one genome should not be interpreted as the exact same basepair location as 'blue' in the reference.
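
The drawing steps and the note on per-genome gradients can be sketched as follows. This is a minimal illustration, not Sybil's actual drawing code: the RGB endpoints, the linear interpolation, the function names gradient_color and color_query_hits, and the (reference gene, query position) input shape are all assumptions. Here a reference gene with more than one hit in the same query genome is treated as having paralogs.

```python
from collections import Counter

YELLOW = (255, 255, 0)  # assumed RGB for the start of a sequence
BLUE = (0, 0, 255)      # assumed RGB for the end of a sequence
BLACK = (0, 0, 0)       # optional color for multi-copy (paralogous) hits


def gradient_color(position, genome_length):
    """Map a basepair position to a yellow-to-blue color by relative position."""
    t = min(max(position / genome_length, 0.0), 1.0)
    return tuple(round(a + (b - a) * t) for a, b in zip(YELLOW, BLUE))


def color_query_hits(hits, query_length, mark_paralogs=True):
    """Color the query genes drawn above the reference genes they match.

    hits: list of (reference_gene, query_position) pairs for one query genome.
    Returns a list of (reference_gene, color) pairs in the same order.
    """
    copies = Counter(ref for ref, _ in hits)
    colored = []
    for ref, position in hits:
        if mark_paralogs and copies[ref] > 1:
            colored.append((ref, BLACK))  # position is ambiguous
        else:
            colored.append((ref, gradient_color(position, query_length)))
    return colored


# The end of a 1 Mb genome and the end of a 2 Mb genome get the same color.
print(gradient_color(1_000_000, 1_000_000) == gradient_color(2_000_000, 2_000_000))
print(color_query_hits([("geneA", 0), ("geneB", 500), ("geneB", 900)], 1000))
```

Because the position is normalized by each genome's own length, equal colors in two genomes indicate equal relative positions, not equal basepair coordinates.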

Currently, if multiple sequences are selected for a particular organism and the '1 sequence per line' option is not checked, the sequences will be ordered by length and the gradient will be assigned based on this false 'assembly'. In this case the display is useful only for seeing what type of genomic coverage you have between the genomes; the gene order may be confounded because multiple assemblies are present.
