HitPredict
PHYSICAL PROTEIN-PROTEIN INTERACTIONS WITH RELIABILITY SCORES

Help and documentation

New in HitPredict Version 4

  • 398696 experimentally identified physical protein-protein interactions from 5 databases
  • 70808 proteins
  • 105 species
  • Interaction score combining annotation-based and method-based information
  • Extensive manual curation

Contents

    HitPredict is a resource of experimentally identified physical protein-protein interactions. It was compiled in the following manner:

  • All physical interactions were downloaded from IntAct, BioGRID, HPRD, DIP and MINT. These were combined to form a non-redundant dataset. Species with fewer than 10 interactions were removed.


  • All non-physical interactions, such as genetic interactions, were excluded.


  • Interactions of proteins from different species were also excluded.


  • All proteins were assigned valid UniProt IDs. In cases where UniProt IDs could not be directly assigned, protein sequences were aligned to UniProtKB using BLAST and the ID of the longest hit with 99% sequence identity was used.


  • Interactions in which one or both of the proteins were no longer present in UniProt were removed.


  • Interactions of proteins that did not map to valid UniProt IDs were removed. Interactions were excluded only after manual confirmation.


  • The interacting proteins were annotated with Entrez gene IDs, Ensembl IDs, Pfam domains and Gene Ontology terms.



  • ** The orange boxes in the flowchart show the processes that involved manual curation.
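
The automated filtering steps listed above can be sketched roughly as follows. This is only an illustration, not the HitPredict pipeline itself: the record keys (`a`, `b`, `taxid_a`, `taxid_b`, `physical`), the function name and the order in which the filters run are assumptions, and the UniProt mapping and manual confirmation steps are not modelled.

```python
from collections import Counter

def build_dataset(records, min_per_species=10):
    """Filter raw interaction records into a non-redundant, single-species set.

    `records` is an iterable of dicts with the hypothetical keys
    'a', 'b' (UniProt IDs), 'taxid_a', 'taxid_b' and 'physical' (bool).
    """
    unique = {}
    for rec in records:
        if not rec["physical"]:
            continue  # drop genetic and other non-physical interactions
        if rec["taxid_a"] != rec["taxid_b"]:
            continue  # drop interactions between proteins of different species
        # Unordered pair, so A-B and B-A from different databases count once.
        key = (frozenset((rec["a"], rec["b"])), rec["taxid_a"])
        unique.setdefault(key, rec)

    # Remove species that are left with fewer than `min_per_species` interactions.
    per_species = Counter(taxid for _pair, taxid in unique)
    return [rec for (_pair, taxid), rec in unique.items()
            if per_species[taxid] >= min_per_species]
```

Keying on an unordered pair is one simple way to obtain a non-redundant set when several source databases report the same interaction in different orientations.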

Scoring

    From version 4, all interactions in HitPredict (small-scale or high-throughput) are assigned an Interaction score. The Interaction score denotes the reliability of the interaction and is the geometric mean of the following two scores:

  • Annotation-based Score
    This score is calculated in the form of a likelihood ratio using naive Bayesian networks and is based on the following properties of the interacting proteins:

    1. Structurally known interacting Pfam domains - 3DID

    2. Gene Ontology (GO) annotations of the interacting proteins - GO

    3. Homologous interactions - HINTdb

  • Method-based Score
    This score is based on the experimental information available for the interactions and is calculated as the mean of the following three scores:

    1. Publication score: Score based on the number of unique publications or experiments supporting the interaction

    2. Method score: Score based on the following methods of interaction identification - biophysical, protein complementation assay, post transcriptional interference, biochemical, imaging technique and their subtypes. The default scores for each method are used as specified by the HUPO PSI-MI consortium.

    3. Type score: Score based on the following interaction types and their subtypes - association, physical association and direct interaction. The default scores for each type are used as specified by the HUPO PSI-MI consortium.

    These scores are calculated and combined into a single score using the method shown in Villaveces et al., Database, 2015. A method score >= 0.485 is considered to indicate high confidence. This cut-off is suggested by Villaveces et al.

  • Combined Interaction Score
    This score is the geometric mean of the Annotation-based score and the Method-based score. As such, it takes into account the experimental support for the interaction as well as the genomic features of the interacting proteins. It has been shown to perform better than either of the two individual scores, and better than the score used by the Mentha database (the same as that used by MINT). The ROC curve for this evaluation can be seen here.


  • ** The orange boxes in the flowchart show the processes that involved manual curation.
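
As a rough illustration of how the scores described above fit together, the sketch below takes the three sub-scores as given inputs and combines them. The function names are hypothetical, and the plain unweighted mean is an assumption based on the description above; the exact formulas for the publication, method and type scores are those of Villaveces et al. and are not reproduced here.

```python
from math import sqrt

# Cut-off for a high-confidence method score, as quoted above.
HIGH_CONFIDENCE_METHOD_SCORE = 0.485

def method_based_score(publication, method, interaction_type):
    """Mean of the publication, method and type scores (assumed unweighted)."""
    return (publication + method + interaction_type) / 3

def interaction_score(annotation_score, method_score):
    """Geometric mean of the annotation-based and method-based scores."""
    return sqrt(annotation_score * method_score)

def is_high_confidence_method(method_score):
    """True if the method score reaches the suggested cut-off."""
    return method_score >= HIGH_CONFIDENCE_METHOD_SCORE
```

Because the combined score is a geometric mean, an interaction needs support from both the annotation side and the experimental side to score well; a value near zero in either component pulls the combined score towards zero.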

The method and type scores take into account the number of times an interaction was identified with a specific method or type. Since multiple databases can have the same method and type term, this can result in inflation of scores. To prevent this, we followed the method shown in the adjacent flowchart.


Briefly, a unique list of PubMed IDs supporting each interaction was created from all databases. Each PubMed ID was then associated with an interaction type and experimental method. These unique lists of method and type terms were used to calculate the Method and Type scores, respectively.


This process required significant manual confirmation since the terms used to describe interaction types and methods were sometimes not within the PSI-MI controlled vocabulary. Additionally, different databases often annotated the same PubMed ID with distinct type and method descriptions. These discrepancies had to be manually resolved before calculation of the method-based score.
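
The de-duplication idea can be sketched as follows for the evidence of a single interaction. This is a minimal illustration under assumptions: the record layout `(source_db, pubmed_id, method_term, type_term)` and the function name are hypothetical, and the manual resolution step is represented only by a list of PubMed IDs flagged for review.

```python
from collections import defaultdict

def collapse_evidence(records):
    """Group evidence by PubMed ID so repeated database entries count once.

    `records` is an iterable of (source_db, pubmed_id, method_term, type_term)
    tuples for one interaction. Returns the per-publication term sets and the
    PubMed IDs whose databases disagree on the method or type term.
    """
    evidence = defaultdict(lambda: {"methods": set(), "types": set()})
    for _source_db, pmid, method, itype in records:
        # Sets absorb identical terms reported by several databases.
        evidence[pmid]["methods"].add(method)
        evidence[pmid]["types"].add(itype)

    # More than one term for a publication means the databases disagree.
    needs_review = sorted(
        pmid for pmid, terms in evidence.items()
        if len(terms["methods"]) > 1 or len(terms["types"]) > 1
    )
    return dict(evidence), needs_review
```

For example, if two databases both report the same publication with the term "two hybrid", that method is counted once; if they report different terms for the same publication, the publication is flagged instead of being counted twice.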


Protein Search

To view the interactions of a protein, search for it using a keyword, name or identifier.


By default, a protein is searched for in all 105 species in the database. To limit the search results, select one of the model organisms from the combo box.


Proteins

The Proteins page gives a list of the proteins that were found in the database for the given query.

The results provide the UniProt ID of the protein, species, its name and description and the number of interactions found in the database for this protein. Clicking on the UniProt ID or the number of interactions leads to the Interactions page for the protein.

If the search results are not what you expected, you can try another search from the top right-hand corner of this page.


Interactions

The Interactions page lists all the interactions of a protein. For each interaction it shows the number of experiments supporting it, whether it is found only in high-throughput experiments, the method score, the annotation score, the final interaction score and the interaction confidence.

A graph shows the interaction network of the protein (drawn in red) with up to 15 interactors. When there are more, the interactions with the highest scores are displayed. The links are color-coded to indicate the interaction score. The nodes in the graph link to the Interactions pages of the respective proteins, and the edges link to a page showing the details of the interaction.
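
Selecting the interactors for the graph amounts to a top-N selection by score. A minimal sketch, assuming each interaction is available as a hypothetical `(partner_id, interaction_score)` pair:

```python
from heapq import nlargest

def graph_neighbours(interactions, limit=15):
    """Return up to `limit` interactions with the highest interaction scores.

    `interactions` is a list of (partner_id, interaction_score) pairs.
    The result is ordered from highest to lowest score.
    """
    return nlargest(limit, interactions, key=lambda item: item[1])
```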

This is followed by a tabular list of all the interactions for the protein sorted by confidence. At the top of the table is a Download interactions link that allows the user to download the interactions from the table.

The Interaction ID shown in the table is generated by HitPredict. Clicking on the interaction ID takes the user to a page showing the various attributes of the interaction that were used to calculate the Method Score and the Annotation Score.


Interaction details

The Interaction details page shows detailed information about the interaction selected from the Interactions page.

This page shows the following information:

  • Details of the interacting proteins, the organism in which the interaction has been observed, the database from which this interaction was obtained and the three interaction scores.

  • Experimental evidence in the form of PubMed IDs supporting the interaction, the various identification methods and the interaction types associated with the interaction. This information is used to calculate the method score. Each publication supporting the interaction is denoted as an "Experiment".

  • Features of interacting proteins used to calculate the Annotation score in the form of the possible interacting Pfam domains present in the interactors, common GO terms between the interactors and the number of homologs found for this interaction within the HitPredict database.

    The interacting proteins are shown in yellow with their homologs aligned below them. Adjacent homologs interact with each other. The homologs are positioned according to the location of the alignment, and the color indicates the similarity score. Clicking on the '+' at the far left shows the homologous interaction ID, information about the homologous proteins, and the e-value, score and percentage of similarity with the query proteins. This is followed by a two-letter abbreviation denoting the species in which the homologous interaction is found.


Download interactions

All interactions from HitPredict are available for download. Interactions can be downloaded in two formats:

  • Simple text format giving the UniProt ID, Name, Entrez Gene ID and Ensembl ID of the two interacting proteins along with the method, annotation and total scores for each interaction.

  • PSI-MITAB 2.5 format giving the details of the interacting proteins, the source database and interaction scores as specified by the HUPO PSI-MI standards.

Interactions of model organisms can be downloaded individually. Interactions of all other organisms can be obtained from the HitPredict_interactions.txt.tgz and HitPredict_interactions_MITAB-2.5.tgz files.
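
Once a PSI-MITAB 2.5 file has been extracted from the archive, each line can be split into the 15 tab-separated columns defined by the PSI-MI standard. The sketch below is a simplified reader, not a full MITAB parser: the field names are chosen here for readability, and it assumes no quoted value contains a '|' character. The exact label HitPredict uses in the confidence column is not stated in this documentation and should be checked against the downloaded file.

```python
# Column order of the PSI-MITAB 2.5 format (names chosen for this sketch).
MITAB25_COLUMNS = (
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
)

def parse_mitab25_line(line):
    """Split one MITAB 2.5 line into a dict of lists, one list per column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected 15 columns, found {len(fields)}")
    row = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        # '-' marks an empty column; '|' separates multiple values.
        row[name] = [] if value == "-" else value.split("|")
    return row
```

Header or comment lines, if present in the file, would need to be skipped before calling this function.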


References

Yosvany Lopez, Kenta Nakai and Ashwini Patil; HitPredict version 4 - comprehensive reliability scoring of physical protein-protein interactions from more than 100 species, Database, 2015, accepted.

Ashwini Patil, Kenta Nakai and Haruki Nakamura; HitPredict: a database of quality-assessed protein-protein interactions in nine species, Nucl. Acids Res., 2011, Database Issue:D744-9

Ashwini Patil and Haruki Nakamura; Filtering high-throughput protein-protein interaction data using a combination of genomic features, BMC Bioinformatics, 2005, 6:100




Human Genome Center, The Institute of Medical Science, The University of Tokyo
Last updated: 10 September 2015