About


Prediction Method

We designed and developed P2SL, a system for predicting protein subcellular localization. P2SL uses local subsequence features together with various amino acid similarity schemes. A self-organizing map (SOM) is used to extract prototype features and to capture the distribution of implicit protein sorting signals (motifs). A set of support vector machines (SVMs) then classifies the features extracted by the SOM. P2SL is therefore a hybrid computational system that predicts four localization classes: ER-Targeted (all ER-mediated membrane-enclosed proteins), cytosolic, mitochondrial and nuclear.

MEP2SL Search Methods

The MEP2SL database can be queried in four categories:

Data File Format

All downloadable data files are gzipped, tab-separated plain text files with five columns. The first column is the UniRef100 ID, the second is the predicted subcellular localization distribution of the sequence, the third is the sequence description, the fourth is the sequence, and the fifth is the annotated subcellular localization from the UniProt Knowledgebase.
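A minimal sketch of how such a file could be read in Python, using only the standard library. The names Mep2slRecord, read_records and read_gzipped_file are hypothetical and not part of MEP2SL. The sketch assumes UTF-8 text, no header row, and exactly five tab-separated fields per line; adjust it if the actual files differ.

```python
import csv
import gzip
from typing import Iterable, Iterator, NamedTuple


class Mep2slRecord(NamedTuple):
    """One line of a data file (hypothetical field names, documented column order)."""
    uniref100_id: str
    predicted_localization: str
    description: str
    sequence: str
    annotated_localization: str


def read_records(lines: Iterable[str]) -> Iterator[Mep2slRecord]:
    """Yield one record per non-empty line of tab-separated text."""
    # QUOTE_NONE: treat quote characters in descriptions as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 5:
            raise ValueError(
                f"line {line_number}: expected 5 columns, found {len(row)}"
            )
        yield Mep2slRecord(*row)


def read_gzipped_file(path: str) -> Iterator[Mep2slRecord]:
    """Open a gzipped data file in text mode and yield its records."""
    # Assumption: the file is UTF-8 encoded and has no header row.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from read_records(handle)
```

Because read_gzipped_file is a generator, records are produced one at a time, so a large file does not have to be held in memory.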

Localization Distribution Interpretation

Our approach finds the frequency distribution of protein subsequences for each subcellular localization class and then uses this distribution as a feature for classification. The ER-Targeted, cytosolic, mitochondrial and nuclear class probability distributions are represented by samples of subsequence distributions over the SOM. Six binary SVM classifiers are used: ER-Targeted versus Cytosolic, ER-Targeted versus Mitochondrial, ER-Targeted versus Nuclear, Mitochondrial versus Cytosolic, Mitochondrial versus Nuclear, and Nuclear versus Cytosolic. Each class takes part in three of the six classifiers and can therefore receive at most three votes. Considering only the localization classes that receive 2 or 3 votes, we represent the localization distributions as 26 different sets:

3/3 Nuclear
3/3 Cytosolic
3/3 ER-Targeted
3/3 Mitochondrial
3/3 Nuclear and 2/3 Cytosolic
3/3 Nuclear and 2/3 ER-Targeted
3/3 Nuclear and 2/3 Mitochondrial
3/3 Cytosolic and 2/3 ER-Targeted
3/3 Cytosolic and 2/3 Mitochondrial
3/3 Cytosolic and 2/3 Nuclear
3/3 ER-Targeted and 2/3 Cytosolic
3/3 ER-Targeted and 2/3 Mitochondrial
3/3 ER-Targeted and 2/3 Nuclear
3/3 Mitochondrial and 2/3 Cytosolic
3/3 Mitochondrial and 2/3 ER-Targeted
3/3 Mitochondrial and 2/3 Nuclear
2/3 Cytosolic and 2/3 Nuclear
2/3 Cytosolic and 2/3 Mitochondrial
2/3 ER-Targeted and 2/3 Cytosolic
2/3 ER-Targeted and 2/3 Mitochondrial
2/3 ER-Targeted and 2/3 Nuclear
2/3 Mitochondrial and 2/3 Nuclear
2/3 Cytosolic and 2/3 Mitochondrial and 2/3 Nuclear
2/3 ER-Targeted and 2/3 Cytosolic and 2/3 Mitochondrial
2/3 ER-Targeted and 2/3 Cytosolic and 2/3 Nuclear
2/3 ER-Targeted and 2/3 Mitochondrial and 2/3 Nuclear
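The voting scheme behind these sets can be sketched as follows. This is an illustration under stated assumptions, not the P2SL implementation: the function names and the dictionary layout for the classifier outcomes are hypothetical, and each binary classifier is assumed to cast exactly one vote for one of its two classes.

```python
from itertools import combinations, product

# Class order chosen to match the order used in the list of sets above.
CLASSES = ("ER-Targeted", "Cytosolic", "Mitochondrial", "Nuclear")
# The six pairwise classifiers, keyed by an (a, b) tuple (hypothetical layout).
PAIRS = list(combinations(CLASSES, 2))


def tally_votes(winners):
    """Count votes per class; winners maps each pair to its winning class."""
    votes = {name: 0 for name in CLASSES}
    for pair in PAIRS:
        winner = winners[pair]
        if winner not in pair:
            raise ValueError(f"{winner!r} is not a member of {pair!r}")
        votes[winner] += 1
    return votes


def distribution_label(votes):
    """Keep classes with 2 or 3 votes and list the 3-vote class first."""
    kept = [(name, count) for name, count in votes.items() if count >= 2]
    # sorted() is stable, so ties keep the CLASSES order.
    kept = sorted(kept, key=lambda item: -item[1])
    return " and ".join(f"{count}/3 {name}" for name, count in kept)


def all_labels():
    """Enumerate every combination of classifier outcomes and collect labels."""
    labels = set()
    for outcome in product(*PAIRS):  # pick one winner per pair: 2**6 cases
        labels.add(distribution_label(tally_votes(dict(zip(PAIRS, outcome)))))
    return labels


if __name__ == "__main__":
    example = {
        ("ER-Targeted", "Cytosolic"): "Cytosolic",
        ("ER-Targeted", "Mitochondrial"): "ER-Targeted",
        ("ER-Targeted", "Nuclear"): "Nuclear",
        ("Cytosolic", "Mitochondrial"): "Cytosolic",
        ("Cytosolic", "Nuclear"): "Nuclear",
        ("Mitochondrial", "Nuclear"): "Nuclear",
    }
    print(distribution_label(tally_votes(example)))  # 3/3 Nuclear and 2/3 Cytosolic
    print(len(all_labels()))  # 26
```

Enumerating all 64 possible outcomes of the six classifiers gives exactly the 26 sets listed above. Six votes are shared among four classes, so at least one class always receives 2 or more votes and no sequence is left without a set.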

Diagram Interpretation for Localization Distribution

The twenty-six columns represent the localization distribution sets. The number of sequences in each set is written at the top of each column. Each localization class is color coded (ER-Targeted: yellow, Cytosolic: blue, Nuclear: red, Mitochondrial: green). The width of the color band in each column indicates the number of prediction votes: a thinner band means 2 votes and a thicker band means 3 votes.
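As a rough illustration of this encoding, the sketch below turns a set label such as "3/3 Nuclear and 2/3 Cytosolic" into the bands drawn in one column. The function name and the "thin"/"thick" strings are hypothetical; only the colors and the 2-vote and 3-vote rule come from the description above.

```python
# Color coding as described in the text.
CLASS_COLORS = {
    "ER-Targeted": "yellow",
    "Cytosolic": "blue",
    "Nuclear": "red",
    "Mitochondrial": "green",
}


def bands_for_label(label):
    """Return (class name, color, band width) for each class in a set label."""
    bands = []
    for part in label.split(" and "):
        fraction, class_name = part.split(" ", 1)  # e.g. "3/3", "Nuclear"
        votes = int(fraction.split("/")[0])
        if votes not in (2, 3):
            raise ValueError(f"unexpected vote count in {part!r}")
        if class_name not in CLASS_COLORS:
            raise ValueError(f"unknown localization class in {part!r}")
        width = "thick" if votes == 3 else "thin"
        bands.append((class_name, CLASS_COLORS[class_name], width))
    return bands


if __name__ == "__main__":
    for band in bands_for_label("3/3 Nuclear and 2/3 Cytosolic"):
        print(band)
```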