This is the help page for the Sybil web interface. The help is organized by page; for each of the following pages,
documentation is available that describes the page's major features. The help may be accessed directly
via one of the following links or by scrolling through this document. Alternatively, follow the "help" links that
appear throughout the web site to go directly to the relevant help entry.
|
|
This page is the default entry point to the Sybil web interface. It displays summarized information on the
contents of the comparative database and can be customized with links to the various comparative tools
and displays. Note that certain links and/or summary tables will only appear if the necessary supporting data
have been loaded into the comparative database.
|
| sybil: pneumo: home: Search for genes/proteins |
|
The gene/protein search tool is a simple query interface that allows one to search the comparative
database for all genes whose protein product description matches one or more keywords. There are
a few things to note when using this tool:
- the search is case-sensitive (e.g., in the sample database a search for "histone" or "Histone" will succeed, with different results for each, but a search for "HISTONE" will retrieve nothing)
- multiple keywords must be separated by spaces (e.g., "kinase receptor")
- the search tool will NOT interpret the words "and" and "or" as logical operators (i.e., they are treated like any other keyword)
- any protein matching at least one of the keywords will be retrieved
- proteins are ranked by the number of keyword occurrences in each description, with the most matches listed first
A more sophisticated gene/protein search tool is planned; visit the Sybil project site for software updates.
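The matching and ranking rules above can be summarized in a short sketch. This is not Sybil's actual implementation; it is a minimal illustration, assuming that each keyword is matched as a case-sensitive substring of the product description (consistent with the "istone" example on the search results page) and that a protein's score is its total number of keyword occurrences. The function name `keyword_search` and the protein IDs used below are hypothetical.

```python
from typing import Dict, List, Tuple


def keyword_search(descriptions: Dict[str, str], query: str) -> List[Tuple[str, int]]:
    """Return (protein_id, score) pairs for proteins matching at least one keyword.

    Keywords are whitespace-separated and matched as case-sensitive substrings;
    "and" and "or" are treated as ordinary keywords, not logical operators.
    """
    keywords = query.split()
    results = []
    for protein_id, description in descriptions.items():
        # Score = total (case-sensitive) occurrences of all keywords in the description.
        score = sum(description.count(kw) for kw in keywords)
        if score > 0:  # at least one keyword matched
            results.append((protein_id, score))
    # Most keyword occurrences first; ties broken by protein ID for a stable order.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results
```

For example, with hypothetical descriptions `{"p1": "histone H2A", "p2": "Histone deacetylase"}`, the query "histone" returns only `p1`, "HISTONE" returns nothing, and "istone" returns both.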
|
| sybil: pneumo: home: List protein clusters |
|
The "List protein clusters" tool enumerates all of the protein clusters that were generated by a particular
protein clustering analysis. Simply choose the desired analysis and output format (either HTML or tab-delimited
plain text) and then click "list clusters". For more information about the protein clustering analyses available
in Sybil, see the relevant descriptions on the Sybil project web site.
Note that the parameters used for each of the analyses in the current database should be summarized in
the "description" column in the list of "Analyses/computes" that appears on the database home page. Also note
that the sample database does not necessarily contain data from all of these clustering analyses.
|
| sybil: pneumo: home: Organisms/genomes |
|
This is a list of all the organisms and/or genomes for which sequence and/or comparative data have been loaded
into the comparative database. In the sample database all of the genomes (P. falciparum,
P. y. yoelii, and C. parvum) were loaded from the corresponding annotation databases at TIGR, hence
the database identifiers of the form "TIGR_euk:pfa1" in the "source database" column. The "description" column
gives a brief summary of the genomic sequence data loaded for each organism.
|
| sybil: pneumo: home: Analyses/computes |
|
This table lists all of the analyses/computes that have been loaded into the comparative database. The Sybil
project site contains a list of the various analyses
supported by the system.
Note that the "description" column in this list mentions some of the parameter settings that were chosen when each analysis
was run, since the same analysis may be run and loaded multiple times with different parameter settings. The Jaccard
cluster analysis, for example, is typically run at least once for each genome in the database, followed by one or more
cross-genome clustering analyses, such as the Sybil "COG" analysis or the Jaccard-filtered COG analysis.
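The Jaccard cluster analysis is not described in detail on this page. As a rough illustration only, here is a minimal sketch of one way a Jaccard-based clustering step can work. Everything about it is an assumption rather than Sybil's documented algorithm: each protein is represented by the set of proteins it hits with BLASTP (including itself), two proteins are linked when the Jaccard coefficient |A ∩ B| / |A ∪ B| of their hit sets reaches a threshold, and clusters are the connected components of the resulting graph. The function names, the input layout, and the threshold values are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; defined as 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_clusters(hits: Dict[str, Set[str]], threshold: float) -> List[Set[str]]:
    """Link protein pairs whose hit sets have Jaccard >= threshold; return connected components."""
    ids = sorted(hits)
    parent = {p: p for p in ids}  # union-find forest

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for i, p in enumerate(ids):
        for q in ids[i + 1:]:
            if jaccard(hits[p], hits[q]) >= threshold:
                parent[find(p)] = find(q)  # merge the two components

    groups: Dict[str, Set[str]] = {}
    for p in ids:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=sorted)
```

Because the threshold changes which pairs are linked, running the same analysis with different thresholds yields different clusterings, which is one reason the parameter settings are worth recording per analysis.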
|
| sybil: pneumo: protein keyword search |
|
This page displays the results of a gene/protein keyword search. All of the proteins whose product description
contains at least one of the query keywords will be listed here, with those that have the most keyword matches
appearing first. Following a link on this page will launch the individual gene/protein summary page for the
selected gene/protein.
If the search returned no results, use the browser's "back" button to return to the previous page and try the search
again with a different keyword, or a substring of the original keyword.
Note that the search is currently case-sensitive, so to find occurrences of both "Histone" and "histone" it is
necessary either to search with both keywords (e.g., "Histone histone") or to use a substring that is likely to
match both forms and few other words (e.g., "istone").
|
| sybil: pneumo: list protein clusters |
|
This page displays all the protein clusters that were generated by a single protein clustering analysis.
Following a link on this page will launch the protein cluster display page for the selected cluster. For
more information about the protein clustering analyses available in Sybil, see the relevant
descriptions on the Sybil project web site.
|
| sybil: pneumo: protein display |
|
This page displays summarized information pertaining to a single gene/protein in the
comparative database.
|
| sybil: pneumo: protein display: protein properties |
|
A list of protein properties. In the future this section may be expanded to include additional stored
and/or computed physical properties of the protein. Currently, however, only the following properties
are reported:
- organism - organism/genome to which this protein or gene belongs
- product name - a description of the protein product
- sequence length - length of the protein's predicted amino acid sequence
- created - date on which the protein was created in the (original) database
- last modified - date on which the protein was last modified
|
| sybil: pneumo: protein display: database references |
|
A list of external database references and/or alternate accession numbers and IDs for the protein.
In the sample database each of the genes should have at least one reference back to the annotation
database from which it was loaded. Whenever possible, these external database accession
numbers should be hyperlinked to the appropriate database's web site.
|
| sybil: pneumo: protein display: protein clusters |
|
A list of all the protein clusters to which this protein/gene belongs. Each cluster in this list is hyperlinked to
the protein cluster display page for that cluster. For more information about the protein clustering analyses available
in Sybil, see the relevant descriptions on the Sybil project web site.
|
| sybil: pneumo: protein display: genomic context |
|
A graphical representation of the region(s) of the genome to which this gene maps. The name of the
gene itself appears in black and the names of neighboring genes are shown in grey. Click on any gene
in the image to switch to the gene/protein summary page for that gene. Note that genes are color-coded
according to the organism from which they are derived, using the site-wide color scheme that has been
defined in the Sybil site configuration file.
|
| sybil: pneumo: protein display: blastp hits |
|
A graphical representation of the top few BLASTP matches for the protein. The current protein appears
as a thick colored rectangle at the top of the image, labeled with a sequence axis that reflects the
length of its predicted protein sequence. Matches (i.e., BLAST HSPs/GSPs) to other proteins
in the database are indicated by the thinner colored rectangles below. Each match is annotated with its
P-value and percent identity score, and matches to the same target protein are grouped together. Color
is used to signify the organism/genome to which each protein belongs. Clicking on any protein except
the query protein at the top will switch to the gene/protein summary page for that protein.
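The grouping of matches by target protein can be sketched as follows. This is a hypothetical illustration, not Sybil's drawing code: it assumes each HSP records its target protein, P-value, percent identity, and coordinates on the query, and that the "top" targets are those whose best HSP has the lowest P-value. The `Hsp` class, the `group_top_hits` function, and the `max_targets` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hsp:
    target: str              # ID of the matched (target) protein
    p_value: float
    percent_identity: float
    query_start: int         # coordinates on the query protein (1-based)
    query_end: int


def group_top_hits(hsps: List[Hsp], max_targets: int) -> Dict[str, List[Hsp]]:
    """Group HSPs by target protein and keep the targets whose best HSP has the lowest P-value."""
    by_target: Dict[str, List[Hsp]] = {}
    for hsp in hsps:
        by_target.setdefault(hsp.target, []).append(hsp)
    # Rank targets by their best (smallest) P-value.
    ranked = sorted(by_target, key=lambda t: min(h.p_value for h in by_target[t]))
    # Within each kept target, order HSPs left to right along the query axis.
    return {t: sorted(by_target[t], key=lambda h: h.query_start) for t in ranked[:max_targets]}
```

Each returned group corresponds to one row of thinner rectangles below the query, with the HSP coordinates placing each rectangle along the query's sequence axis.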
|
| sybil: pneumo: protein display: amino acid sequence |
|
The predicted amino acid sequence of the protein in FASTA format. Note that certain frameshifted "ORFs"
or pseudogenes may not have an amino acid sequence stored in the comparative database.
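FASTA format consists of a header line beginning with `>` followed by the sequence, conventionally wrapped at a fixed line width. The sketch below assumes a 60-residue line width and a header made of an ID plus a description; Sybil's actual header contents and line width may differ, and `to_fasta` is a hypothetical helper.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record: a '>' header line, then the sequence in lines of `width` residues."""
    lines = [">" + header]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"
```

For a 90-residue sequence this yields the header line, one full 60-residue line, and a final 30-residue line.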
|
| sybil: pneumo: protein cluster display |
|
This page displays information pertaining to a single protein cluster. For more information about the protein clustering analyses available
in Sybil, see the relevant descriptions on the Sybil project web site.
|
| sybil: pneumo: protein cluster display: cluster summary |
|
This section displays summarized information about the current protein cluster. This information includes:
- algorithm - an abbreviation for the protein clustering algorithm used to generate this cluster
- description - a more verbose description of the protein clustering algorithm used to generate this cluster, often including the values of one or two key parameters
- number of proteins - the number of proteins that should appear in the list of "clustered proteins"
- avg. blastp identity - an approximate percentage measuring how well conserved the proteins in this cluster are, computed by averaging the percent identities of all high-scoring pairwise BLASTP HSPs/GSPs
- avg. blastp coverage - an approximate percentage that measures how "well-covered" (on average) each protein is by BLASTP HSPs/GSPs. If this value is relatively low but the avg. blastp identity is high, then the cluster may have been formed as the result of one or two very well-conserved motifs appearing in each of the proteins. If this value is high, on the other hand, it indicates that the member proteins have some similarity over all or most of their respective lengths (assuming that all of the proteins are similar in size, since this value will drop if a single short protein is aligned with several longer ones).
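The two averages can be illustrated with a minimal sketch. The exact formulas Sybil uses are not given here, so the following are assumptions: avg. blastp identity is the mean percent identity over all HSPs between cluster members, and avg. blastp coverage is, for each protein, the percentage of its residues covered by at least one HSP (overlapping HSPs counted once), averaged over the cluster's proteins. The `PairHsp` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # 1-based inclusive (start, end) coordinates


@dataclass
class PairHsp:
    query: str
    target: str
    percent_identity: float
    query_span: Span
    target_span: Span


def covered_length(spans: List[Span]) -> int:
    """Number of residues covered by at least one span (overlaps counted once)."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start + 1  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping span extends the current run
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def cluster_summary(lengths: Dict[str, int], hsps: List[PairHsp]) -> Tuple[float, float]:
    """Return (avg. percent identity over HSPs, avg. percent coverage over proteins)."""
    avg_identity = sum(h.percent_identity for h in hsps) / len(hsps) if hsps else 0.0
    spans: Dict[str, List[Span]] = {p: [] for p in lengths}
    for h in hsps:
        # Each HSP covers a region on both of the proteins it aligns.
        spans[h.query].append(h.query_span)
        spans[h.target].append(h.target_span)
    coverages = [100.0 * covered_length(spans[p]) / lengths[p] for p in lengths]
    avg_coverage = sum(coverages) / len(coverages) if coverages else 0.0
    return avg_identity, avg_coverage
```

With one 100-residue protein aligned over its full length to two 400-residue proteins, the short protein is 100% covered but each long protein only 25%, so the average coverage falls to 50% even though every HSP has 90% identity. This is the size effect described above.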
|
| sybil: pneumo: protein cluster display: clustered proteins |
|
A list of all the proteins that have been clustered together by the protein clustering algorithm.
If the protein cluster has been edited subsequently by a curator (i.e., adding or removing proteins
from the cluster based on manual inspection and/or evidence from the scientific literature), any
newly-added proteins will be highlighted in this list. Proteins in the list are color-coded
according to their respective source organisms, and each protein name is hyperlinked to the
protein/gene display for that protein. Additionally, the dropdown menus to the left of each
protein entry indicate the position of the protein in the
genomic context image. These positions can be changed either individually using the dropdown, or by clicking the
organism name just to the right of the dropdown and using the species selection menu.
Changes made to the protein ordering take effect when the redraw button is clicked. The
'hide all' and 'show all' buttons set all of the dropdown values at once, making protein selection
easier.
|
| sybil: pneumo: protein cluster display: genomic context |
|
This is a graphical display of the genes whose protein products have been clustered; the clustered genes
will appear in the center of the image connected by regions shaded in red/pink. Nearby genes on each of
the relevant genomic sequences will also be shown, and those that were also clustered together (by the
same clustering analysis) are connected by regions that are shaded in grey.
The names of genes that belong to a cluster (from the same cluster analysis) are displayed in black, whereas the
names of genes that do not belong to a cluster appear in light grey. Clicking on any gene launches the protein/gene
display for that gene's protein and clicking on any of the red or grey shaded regions will switch to the protein cluster
display for the corresponding cluster.
|
| sybil: pneumo: protein cluster display: clustal alignment |
|
A multiple sequence alignment is precomputed and stored in the comparative database for every protein
cluster generated by a protein clustering algorithm (with the possible exception of extremely large clusters;
a configurable parameter allows the clustering analysis to omit the alignment calculation for clusters over a
specified size). The gene/protein names are color-coded by source organism, and a line immediately below the
multiple alignment displays one measure of the relatedness of the aligned sequences, based on average distance
to the consensus sequence.
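One way to compute a per-column relatedness line of this kind is sketched below. This is an assumption, not necessarily Sybil's exact measure: the consensus residue of each column is taken to be its most frequent non-gap residue, and the column's average distance is the fraction of sequences (gaps included) that differ from it, so 0.0 marks a fully conserved column. `consensus_distances` is a hypothetical name.

```python
from collections import Counter
from typing import List, Tuple


def consensus_distances(alignment: List[str]) -> Tuple[str, List[float]]:
    """Majority-rule consensus and, per column, the fraction of sequences differing from it."""
    if not alignment or len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment must be a non-empty list of equal-length rows")
    consensus: List[str] = []
    distances: List[float] = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c != "-")  # ignore gaps when voting
        residue = counts.most_common(1)[0][0] if counts else "-"  # all-gap column
        consensus.append(residue)
        # Gaps and other residues both count as differences from the consensus.
        distances.append(sum(c != residue for c in column) / len(column))
    return "".join(consensus), distances
```

For the three aligned rows "MKV-L", "MKIAL", and "MRVAL", this gives the consensus "MKVAL" with distances 0, 1/3, 1/3, 1/3, 0 across the five columns.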