Quality Assessment Method


HitPredict uses the following three features to calculate the reliability of an interaction:
  1. Structurally known interacting Pfam domains obtained from 3DID - high quality evidence but sparse due to lack of complex structures.
  2. Gene Ontology (GO) annotations of the interacting proteins from GO - frequently available but require that the proteins be annotated.
  3. Homologous interactions obtained from HINTdb - does not require structure or annotation information.
The features were selected to make maximum use of the sequence, structure and functional annotations of the interacting proteins. Each feature has its own advantages and limitations, as noted above.

The block diagram below gives a brief overview of the method used to calculate the reliability of the interactions.

High confidence interactions were predicted in a set of standard positive (known true) and standard negative (known false) interactions in S. cerevisiae, based on the presence of one or more of the features listed above. The predictions were then assessed by counting the true positive and false positive predictions, and the reliability of each feature was calculated as a likelihood ratio using the equation shown above. The accuracy of the method was determined by calculating the specificity and sensitivity using 10-fold cross-validation.
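As a minimal sketch of the likelihood ratio step, the Python snippet below assumes the standard definition: the fraction of standard positive interactions that carry a feature divided by the fraction of standard negative interactions that carry it. The function name, its arguments and the counts are hypothetical and are not taken from HitPredict.

```python
def likelihood_ratio(tp, fp, n_pos, n_neg):
    """Likelihood ratio of one feature (assumed standard definition).

    LR = P(feature | true interaction) / P(feature | false interaction)
       = (tp / n_pos) / (fp / n_neg)

    tp    -- standard positive interactions that have the feature
    fp    -- standard negative interactions that have the feature
    n_pos -- size of the standard positive set
    n_neg -- size of the standard negative set
    """
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both standard sets must be non-empty")
    if fp == 0:
        # The feature never occurs in the negative set, so the ratio is unbounded.
        return float("inf")
    # Algebraically the same as (tp / n_pos) / (fp / n_neg).
    return (tp * n_neg) / (fp * n_pos)


# Made-up counts, for illustration only.
lr_example = likelihood_ratio(tp=300, fp=10, n_pos=1000, n_neg=2000)
print(lr_example)
```

A ratio above 1 means the feature is seen more often among known true interactions than among known false ones.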

Bayesian networks can be used to combine evidence from different sources. They estimate the posterior odds of an event from its prior odds and the available evidence. In this case, Bayesian networks were used to combine the likelihood ratios of the three features and estimate the odds of an interaction being true given that it had one or more of the three features, as illustrated by the equations below.
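The combination step can be sketched in Python under one strong assumption that the text does not state: that the features are conditionally independent (a naive Bayes network), so the combined likelihood ratio is the product of the likelihood ratios of the features that are present. All names, the prior odds and the numeric values are hypothetical.

```python
import math


def combined_likelihood_ratio(feature_lrs):
    """Product of per-feature likelihood ratios (assumes conditional independence)."""
    return math.prod(feature_lrs)


def posterior_odds(prior_odds, feature_lrs):
    """Posterior odds = prior odds x combined likelihood ratio."""
    return prior_odds * combined_likelihood_ratio(feature_lrs)


def odds_to_probability(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)


# Made-up values: an interaction supported by two of the three features.
lrs_present = [60.0, 5.0]
odds = posterior_odds(prior_odds=0.01, feature_lrs=lrs_present)
print(combined_likelihood_ratio(lrs_present), odds, odds_to_probability(odds))
```

If the features are correlated, the product overstates the evidence; in that case a likelihood ratio would have to be estimated for each combination of features instead.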

As seen in the graph below, a combination of genomic features gives a high likelihood ratio, indicating a high probability that an interaction supported by these features is true. Note also that different features provide different levels of confidence in an interaction.

The performance of the method is assessed using the Receiver Operating Characteristic curve shown below.
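For readers unfamiliar with the curve, the sketch below shows how ROC points can be derived from scored interactions: each distinct score is used as a threshold, and the sensitivity and the false positive rate (1 - specificity) are computed at that threshold. The scores, labels and function names are made up for illustration and are not HitPredict data.

```python
def roc_points(scores, labels):
    """Return (false positive rate, sensitivity) pairs, one per score threshold.

    scores -- one score per interaction (for example a likelihood ratio)
    labels -- 1 for a standard positive, 0 for a standard negative
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; everything at or above it is predicted true.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up scores and labels.
example_scores = [60.0, 12.0, 5.0, 0.8, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]
example_points = roc_points(example_scores, example_labels)
print(example_points, area_under_curve(example_points))
```

An area close to 1 indicates that true interactions are consistently scored above false ones, while an area of 0.5 corresponds to random ordering.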

The Likelihood ratio can then be used to estimate the reliability of other interactions in various species.


   Ashwini Patil, Kenta Nakai and Haruki Nakamura; HitPredict: a database of quality-assessed protein-protein interactions in nine species, Nucl. Acids Res. 2011 Database Issue:D744-9

   Ashwini Patil and Haruki Nakamura; Filtering high-throughput protein-protein interaction data using a combination of genomic features, BMC Bioinformatics, 2005, 6:100

Human Genome Center, The Institute of Medical Science, The University of Tokyo
Last updated: 10 September 2015