HitPredict

Help and documentation

New in this version

 More proteins! More interactions! Two new evidence types. Interactions from multiple experiments and from small-scale binary experiments are now scored and marked as high confidence. See the details below.


Introduction

    HitPredict is a resource of high-confidence protein-protein interactions (PPIs) in 9 species. HitPredict was compiled in the following manner:

  • All physical interactions for the selected 9 species (listed below) were downloaded from IntAct, BioGRID and HPRD and combined to form a non-redundant dataset of 239584 PPIs. We chose these 3 data sources because of their extensive data coverage and comprehensive annotations. Genetic interactions were excluded. Interactions in which one or both of the proteins were no longer present in UniProt were also removed.

  • The interacting proteins were annotated with Pfam domains, Gene Ontology terms, PDB ids, sequence homologs and other genomic identifiers.

  • In order to assess the reliability of the interactions, we divided them into the following categories:

    • Small-scale (SS) interactions - These interactions are identified in small-scale experiments that focus on the study of a few proteins. For the purposes of this analysis, experiments with fewer than 100 interactions were considered small-scale. Small-scale interactions are further differentiated into:

      1. Small-scale (SS) binary interactions - These interactions are directly observed in binary form through experimental techniques such as yeast two-hybrid, FRET, X-ray crystallography, etc. These interactions are considered to be high confidence.

      2. Small-scale (SS) derived interactions - These interactions are derived from protein co-complex data obtained using Co-IP or pull-down experiments. The binary interactions between the bait and prey proteins are estimated using the spoke model. A reliability score is calculated for these interactions to predict their confidence levels.

    • High-throughput (HTP) binary and derived interactions - These interactions are determined in high-throughput, or large-scale, experiments. They may be either directly observed or derived from protein co-complex data. For the purposes of this analysis, experiments with more than 100 interactions are considered to be high-throughput. A reliability score is calculated for these interactions to predict their confidence levels.
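
The categories above can be sketched as a small classification routine. This is only an illustration, not HitPredict's actual code: the names `Experiment`, `spoke_pairs` and `categorize` are hypothetical, and the cutoff is applied as "fewer than 100 is small-scale", which is an assumption because the text does not say how an experiment with exactly 100 interactions is treated. Evidence from multiple experiments is not modelled here.

```python
from dataclasses import dataclass

# Assumed cutoff: experiments with fewer than 100 interactions are
# small-scale. Treating exactly 100 as high-throughput is a guess
# made for this sketch only.
SMALL_SCALE_LIMIT = 100


@dataclass(frozen=True)
class Experiment:
    """Hypothetical record describing one experiment."""
    n_interactions: int  # number of interactions the experiment reports
    binary: bool         # True if pairs were observed directly (e.g. yeast two-hybrid)


def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each prey, never prey with prey."""
    return [(bait, prey) for prey in preys if prey != bait]


def categorize(exp):
    """Return (category, needs_score) for an interaction reported by `exp`."""
    scale = "SS" if exp.n_interactions < SMALL_SCALE_LIMIT else "HTP"
    kind = "binary" if exp.binary else "derived"
    # Small-scale binary interactions are taken as high confidence;
    # SS derived and all HTP interactions get a reliability score.
    needs_score = not (scale == "SS" and exp.binary)
    return f"{scale} {kind}", needs_score
```

For example, `spoke_pairs("bait", ["p1", "p2"])` yields the two bait-prey pairs and no `p1`-`p2` pair, and `categorize(Experiment(12, False))` places a Co-IP result from a 12-interaction study in the SS derived category, which requires a reliability score.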

A reliability score in the form of a likelihood ratio is calculated for SS derived and HTP interactions. The likelihood ratio is computed using naive Bayesian networks and is based on the following properties of the interacting proteins:
  1. Structurally known interacting Pfam domains - 3DID
  2. Gene Ontology (GO) annotations of the interacting proteins - GO
  3. Homologous interactions - HINTdb

Search

    Interactions can be searched for by selecting the following options:
  1. Species - Interactions for one of the 9 species can be searched by selecting the appropriate term from the combo box. Data is available for human, mouse, rat, fly, worm, E. coli, budding yeast, fission yeast and Arabidopsis.

  2. Protein - Proteins can be searched for using their UniProt ID, Entrez gene ID, RefSeq ID, name or description. This search gives a list of proteins.


Proteins

The Proteins page gives a list of the proteins that were found in the database for the given query.

The results provide the UniProt ID of the protein, its name and description and the number of interactions found in the database for this protein. Clicking on the UniProt ID or the number of interactions leads to the Interactions page for the protein.

 If the search results are not what you expected, you can try another search from the top right-hand corner of this page.


Interactions

The Interactions page lists all the interactions of a protein, their type and their predicted confidence level, along with any supporting evidence.

A graph shows the interaction network of the protein (drawn in orange) with up to 15 interactors. The interactions with the highest confidence are displayed. The links are color coded to indicate the type of the interaction and its confidence. The nodes in the graph are linked to the Interactions pages of the respective proteins.

This is followed by a tabular list of all the interactions for the protein sorted by confidence.

Brief descriptions of the columns in the table can be seen by moving the cursor over them. Detailed descriptions of the columns follow:

  1. Interaction ID: The interaction ID in HitPredict. Clicking on it leads to the Interaction Details page which gives further information about the interaction and any evidence that supports its confidence prediction.

  2. Interactor: Name and description of the interacting protein. Clicking on the name takes the user to the Interactions page of this protein.

  3. Confidence: Shows the confidence level predicted by HitPredict for this interaction.

  4. Likelihood: Likelihood ratio of the interaction based on the evidence found to support it. The likelihood for each type of evidence is calculated as the ratio of the true positive rate to the false positive rate with which it can identify interactions in a gold standard. The combined likelihood for an interaction is obtained by multiplying the likelihood ratios of the different sources of evidence that support it. Hence, the likelihood ratio corresponds to the reliability of the interaction. All interactions with a likelihood > 1 are predicted to be true. The likelihood increases with the number of genomic features that support an interaction.

    • S: The interaction was determined in binary form in a small-scale experiment.

    • M: The interaction was determined in binary form in multiple experiments.

    • D: The interacting proteins each contain one of the two Pfam domains that are known to interact structurally.

    • G: The interacting proteins have at least one GO term in common.

    • H: There are one or more similar interactions known in the same or other species.
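
The likelihood calculation described for the Likelihood column can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not HitPredict's implementation: the true and false positive rates are invented for illustration, the function names are hypothetical, and only the D, G and H evidence types are included.

```python
from math import prod

# Hypothetical (true positive rate, false positive rate) per evidence
# type, as would be measured against a gold standard. These numbers
# are invented for illustration and are not HitPredict's values.
EXAMPLE_RATES = {
    "D": (0.20, 0.01),  # interacting Pfam domains
    "G": (0.60, 0.20),  # shared GO term
    "H": (0.30, 0.05),  # homologous interaction
}


def evidence_likelihood(tpr, fpr):
    """Likelihood ratio of one evidence type: TPR divided by FPR."""
    if fpr <= 0:
        raise ValueError("false positive rate must be positive")
    return tpr / fpr


def combined_likelihood(evidence, rates=EXAMPLE_RATES):
    """Multiply the likelihood ratios of the evidence supporting an interaction."""
    return prod(evidence_likelihood(*rates[code]) for code in evidence)


def predicted_true(likelihood):
    """An interaction with a likelihood above 1 is predicted to be true."""
    return likelihood > 1
```

With these made-up rates, an interaction supported by both D and G gets a combined likelihood of about 20 x 3 = 60. In this sketch an interaction with no supporting evidence has the empty product, 1, and so is not predicted to be true.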


Interaction details

The Interaction details page shows detailed information about the interaction selected from the Interactions page.

This page shows the following information:

  1. Interacting Proteins - Annotations of the interacting proteins.

    • G: Lists the GO terms that the interacting proteins share. Clicking on a term leads to more information about it.


Reference

   Ashwini Patil, Kenta Nakai and Haruki Nakamura; HitPredict: a database of quality-assessed protein-protein interactions in nine species, Nucl. Acids Res. 2011 Database Issue:D744-9

   Ashwini Patil and Haruki Nakamura; Filtering high-throughput protein-protein interaction data using a combination of genomic features, BMC Bioinformatics, 2005, 6:100

Human Genome Centre, Institute of Medical Science, University of Tokyo

Last updated: 1 May 2012