Abstract

Fisher kernels have been successfully applied to many problems in bioinformatics. However, their success depends on the quality of the generative model upon which they are built. For Fisher kernel techniques to be used on novel problems, a mechanism for creating accurate generative models is required. A novel framework is presented for automatically creating domain-specific generative models that can be used to produce Fisher kernels for support vector machines (SVMs) and other kernel methods. The framework enables the capture of prior knowledge and addresses the issue of domain-specific kernels, both of which are currently lacking in many kernel-based methods. To obtain the generative model, genetic algorithms are used to evolve the structure of hidden Markov models (HMMs). A Fisher kernel is subsequently created from the HMM and used in conjunction with an SVM to improve the discriminative power. This paper investigates the effectiveness of the proposed method, named GA-SVM. We show that its performance in classifying secretory protein sequences of malaria is comparable to, if not better than, that of other state-of-the-art methods. More interestingly, in protein enzyme family classification it gave better results than the sequence-similarity-based approach without the need for additional homologous sequence information. The experiments clearly demonstrate that the GA-SVM is a novel way to find well-performing features from biological sequences that does not require extensive tuning of a complex model.

1.  Introduction

Support vector machines (SVMs) have taken a dominant position in many machine learning applications due to their excellent generalization performance. Yet, this performance owes much of its success to the (seemingly unreasonable) effectiveness of standard kernel functions at capturing prior information about the application domain. In the domain of biological sequence analysis, SVMs using standard kernels are frequently used (see, e.g., Liao and Noble, 2003; and Yang, 2004). In an attempt to tailor SVMs to particular problem areas, considerable effort has been made to construct kernel functions with built-in prior knowledge of the problem domain. For example, string kernels have been successfully applied to protein sequence classification. These include the spectrum kernel, which builds on a sequence-similarity kernel (Leslie et al., 2002), and mismatch kernels, which measure similarity by matching approximate common substrings (Leslie et al., 2004). String kernels have also been applied to large-scale problems such as splice site classification (Rätsch et al., 2006).

A more general and flexible approach to constructing kernels that capture prior domain knowledge is the use of so-called Fisher kernels. In this method, positive definite kernels are constructed using Fisher score vectors that are based on an underlying generative model (Jaakkola et al., 1999, 2000). Of particular interest for this paper are Fisher kernels based on hidden Markov models (HMMs; Rabiner, 1989; Durbin et al., 1998). This approach, however, suffers from the limitation of requiring a good HMM for the problem a priori. For example, Jaakkola et al. used the SAM-T98 HMM (Karplus et al., 1998), which was designed for the protein homologies they were studying (Jaakkola et al., 1999, 2000). Although Fisher kernels have been used to solve further problems in bioinformatics, such as protein classification (Tsuda et al., 2002) and splice-site recognition (Sonnenburg et al., 2002; Ratsch and Sonnenburg, 2004), their usage and performance are highly dependent on the HMM used. This raises the practical question: How can one construct a suitable HMM to use as a basis for a Fisher kernel that is adapted for a particular task? The solution we propose is to evolve the structure of an HMM using a genetic algorithm (GA), which is then used to obtain a kernel function using the standard Fisher kernel framework. This combination of automated generation and interpretability of the kernel mapping offers practitioners very desirable properties for using SVMs in this setting.

HMMs are easy to train from sequence data using the well-known Baum-Welch algorithm. Before training the model, however, its structure has to be determined; this task has traditionally been left to the domain specialist. There have been some attempts at automating this process. Iterative improvement of HMM structures has been proposed (Stolcke, 1994; Fujiwara et al., 1995), as have GA-based approaches (Yada et al., 1994), although these methods do not seem to have been widely adopted. One of the difficulties of evolving the HMM architecture is to retain prior information that may be available, while simultaneously allowing sufficient flexibility to learn from data. The approach taken in this paper builds on previous work evolving block-structured HMMs using a GA (Won et al., 2006, 2007, 2008). For simple problems, these HMMs show no significant difference in performance from carefully constructed hand-designed HMMs (Won et al., 2004, 2006). Interestingly, for complex problems such as protein structure prediction, evolved models have been shown to significantly outperform their hand-designed counterparts (Won et al., 2007). One explanation for this observation is that hand-designing the structure of a large HMM is probably beyond the ability of most humans, a not entirely surprising conclusion.

The purpose of this paper is to investigate the effectiveness of using evolved HMMs, which are then used to create Fisher kernels for use in SVMs. This is an important question, as it would allow Fisher kernels to be used in new domains where an HMM has not already been designed by a domain expert. We investigate a number of issues including the advantage of using Fisher kernels and SVMs over just using the HMM for classification, the performance of the GA-SVM compared to other SVM-based algorithms, and the use of an evolved HMM developed for a different but related problem.

The rest of this paper is organized as follows. In the next section, we introduce a framework for creating a GA-SVM. We briefly describe how we evolve an HMM using a GA and give a review of HMMs and Fisher kernels. In Section 3, we describe the experimental evaluation of GA-SVM and compare the performance with other SVM-based methods. We study the performance of a GA-SVM with a toy HMM, an evolved HMM, and an HMM evolved to model protein secondary structure. Section 4 provides our analysis of how changing the structure of the evolution affects the performance of the final classifier. We conclude in Section 5.

2.  Framework to Design a Sequence-Specific Kernel

The success of a Fisher kernel depends on a well-tailored generative model. To obtain a sequence-specific Fisher kernel without expert knowledge, we designed a two-stage framework. First, the structure of an HMM is evolved using a GA that optimizes the likelihood of the HMM to generate a positive training set of sequences. Next, the Fisher kernel is constructed from the evolved HMM using both the positive and negative training set.

An HMM is a probabilistic finite-state machine consisting of transition probabilities between states and state-specific probabilities for the emission of symbols. It can be viewed as a generative model that randomly generates sequences of symbols. It starts in some initial state and, at each step, it moves from the current state to the next state with a probability given by the transition probabilities. It then emits a symbol according to the emission probabilities of the state that it is in. The HMM can be used as a classifier by computing the likelihood of a sequence being generated by the HMM. The HMM is trained on a set of sequences, so that the likelihood of generating any of those sequences is high compared to the likelihood of generating a random sequence. Details of training the HMM and turning the HMM into a Fisher kernel are given in Section 2.2. In the next section, we discuss the procedure used for evolving the structure of the HMMs.
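As an illustration of this generative view, the following minimal sketch (not the authors' code) samples from, and computes the likelihood of, a hypothetical two-state HMM over a two-letter alphabet; the non-emitting start state is folded into the initial distribution for simplicity.

```python
# Minimal sketch of an HMM as a generative model and likelihood-based
# classifier.  The two-state model and two-symbol alphabet are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],          # transition probabilities a_ij
              [0.2, 0.8]])
E = np.array([[0.7, 0.3],          # emission probabilities e_i(l), symbols {0, 1}
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])          # distribution of the first emitting state

def sample(length):
    """Generate a sequence by emitting a symbol, then moving to the next state."""
    q, seq = rng.choice(2, p=pi), []
    for _ in range(length):
        seq.append(rng.choice(2, p=E[q]))
        q = rng.choice(2, p=A[q])
    return seq

def log_likelihood(seq):
    """Scaled forward algorithm: log P(x | model), summed over all state paths."""
    alpha = pi * E[:, seq[0]]
    log_p = np.log(alpha.sum())
    alpha /= alpha.sum()
    for x in seq[1:]:
        alpha = (alpha @ A) * E[:, x]
        log_p += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_p

x = sample(20)
print(x, log_likelihood(x))
```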

2.1.  Evolving HMMs

Designing the structure of an HMM is key to its performance. Once the structure of an HMM is decided, the other parameters (transition and emission probabilities) are easily updated using the well-known EM algorithm (Dempster et al., 1977). The decisions that need to be made in designing the HMM structure are the number of hidden states and the connections between states (i.e., which transitions are allowed). These decisions are necessary to control the capacity of the learning machine. The more states and connections that an HMM has available, the more capable it will be to learn the training data, but also, the more susceptible it will be to over-fitting the training data and giving poor generalization behavior. As with any machine learning task, the generalization performance will depend on matching the capacity of the machine with the amount of training data. Our goal is to find an HMM structure that achieves this balance. For some simple problems, there are reasonably well tested procedures for designing HMM structures (Durbin et al., 1998). However, in many cases biological information is limited, and it is hard to know how to design a good HMM architecture.

Previous attempts at evolving HMMs using GAs have allowed the genetic operators to add and delete a state in any position (Yada et al., 1994; Chau et al., 1997). In contrast, we constrain the evolution so that the HMM retains a block structure (Won et al., 2006). Although this appears to be a rather modest change, it builds in sufficient domain knowledge that the evolved HMMs substantially outperform other HMMs that do not use the block structures. This makes the evolved HMMs competitive with, if not better than, hand-designed ones.

2.1.1.  Block Structure to Form an HMM

The block structure HMM uses four types of blocks: linear, self-loop, forward jump, and zero state blocks (see Figure 1). These structures are commonly used by domain specialists when designing HMMs for biological sequence analysis. The blocks are interconnected to build a complete HMM. A linear block models a sequence of exactly $n$ residues, as each state has only one transition, to the next state. A self-loop block models a sequence with at least $n$ residues. States in self-loop blocks have transitions to themselves and to the next state. A state with a transition probability to itself of $a_{ii}$ will, on average, be visited $1/(1-a_{ii})$ times. A forward jump block models sequences whose length $m$ can vary between a minimum determined by the forward transitions and the number of states in the block (the number of forward transitions from the first state to states at the end of the block is a parameter, which is also allowed to evolve). Zero blocks do not emit a sequence. During the genetic procedure, zero blocks change the effective number of blocks in the individual HMMs, which regulates the size and complexity of the HMMs. The blocks are fully linked to each other, meaning the state at the end of each block has transitions to the states at the beginning of all the blocks. The block structure has been successfully applied to modeling biological sequences. For example, a two-state linear block captured the periodic hydrophilic and hydrophobic amino acids (Won et al., 2007), and forward jump blocks captured the spacer and the periodic signals of the promoters in C. jejuni (Petersen et al., 2003).
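The following sketch shows one way the four block types could be encoded as within-block transition masks; the representation (boolean adjacency matrices, and the n_forward parameter controlling how many forward jumps the first state has) is our own illustration rather than the authors' implementation.

```python
# Sketch of within-block transition structure for the four block types.
import numpy as np

def block_mask(kind, n, n_forward=2):
    """Return an n x n boolean matrix of allowed transitions inside one block."""
    if kind == "zero":
        return np.zeros((0, 0), dtype=bool)     # a zero block has no emitting states
    T = np.zeros((n, n), dtype=bool)
    for i in range(n - 1):
        T[i, i + 1] = True                       # linear chain i -> i+1
    if kind == "self":
        T |= np.eye(n, dtype=bool)               # self-loops i -> i
    elif kind == "forward":
        for j in range(max(1, n - n_forward), n):
            T[0, j] = True                       # first state may jump ahead
    return T

print(block_mask("forward", 5).astype(int))
```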

Figure 1:

Blocks that are used to construct a full HMM: (a) linear block, (b) self-loop block, (c) forward jump block, and (d) zero block.


2.1.2.  Genetic Operators for Block Structure: Mutation and Crossover

A GA is used to search for a structural model, while the standard parameter estimation method (i.e., Baum-Welch) is used for learning transition and emission probabilities. The HMM structure is changed by mutating the blocks and combining pairs of HMMs using a crossover operator. Mutations occur within the blocks, which allows the HMM structure to adapt to the problem by adding or deleting a component (a state or a transition) of a block. Figure 2 shows a mutation on the second block that adds one state to an HMM (or deletes a state from an HMM if we interpret the diagram in the opposite direction). When adding or deleting a state in a block, the transitions that link the blocks do not change. During mutation, the transitions between states are updated to maintain the type of block.
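A sketch of the state add/delete mutation described above is given below; representing an individual simply as a list of (block_type, n_states) pairs is our own simplification of the authors' data structure.

```python
# Sketch of the mutation operator: resize a block by one state, or change its type.
import random

BLOCK_TYPES = ("linear", "self", "forward", "zero")

def mutate(blocks, p_resize=0.5):
    blocks = list(blocks)
    i = random.randrange(len(blocks))
    kind, n = blocks[i]
    if kind != "zero" and random.random() < p_resize:
        n = max(1, n + random.choice((-1, +1)))   # add or delete one state
    else:
        kind = random.choice(BLOCK_TYPES)          # block may change type (incl. zero)
        n = 0 if kind == "zero" else max(1, n)
    blocks[i] = (kind, n)
    return blocks

print(mutate([("self", 3), ("linear", 4), ("forward", 5)]))
```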

Figure 2:

A mutation operation occurring in one of the states of a self-loop block. A state is added to the first HMM (or deleted from the second HMM).


Note that the complexity of the structure is varied in two ways. Mutations can change the number of states in each block. In addition, a block can mutate to another type of block which might have a different number of transitions in it. Most dramatically, a block could mutate to a zero block or vice versa. This effectively removes (or adds) a whole block, thus adapting the learning capacity of the HMM.

Crossover passes well-adapted blocks from one generation to the next by exchanging one or several blocks between two HMMs. States within a block are crossed over along with the block. Crossover exchanges the same number of blocks, but it may exchange a different number of HMM states between the two HMMs. Figure 3 shows a crossover where blocks are exchanged and the internal states are renumbered. Internal transitions within a block (e.g., $a_{44}$, $a_{45}$) move with the block. Transitions that link blocks change their indices; for instance, the transition $a_{51}$ becomes $a_{i1}$, but its probability is not changed.
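A block-level crossover of this kind can be sketched as follows, reusing the list-of-blocks representation from the mutation sketch; picking the block positions independently in each parent is an assumption on our part.

```python
# Sketch of block-level crossover: swap one block (with its internal states)
# between two parents, so the number of blocks in each HMM is preserved.
import random

def crossover(parent_a, parent_b):
    a, b = list(parent_a), list(parent_b)
    i = random.randrange(len(a))        # block position chosen in parent A
    j = random.randrange(len(b))        # block position chosen in parent B
    a[i], b[j] = b[j], a[i]             # exchange whole blocks
    return a, b

child_a, child_b = crossover([("self", 3), ("linear", 4)],
                             [("forward", 5), ("zero", 0)])
print(child_a, child_b)
```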

Figure 3:

A crossover operation that exchanges blocks is shown. The second block of the first HMM is crossed over with the third block of the second HMM. This generates two new HMM structures. During crossover, the number of blocks does not change.


2.1.3.  Learning HMM Structures from Biological Sequences

The initial population of P HMMs is generated randomly. Blocks are selected from among the four types of HMM blocks (Algorithm 1). The length of a block is also assigned randomly, from 1 to 10. The number of blocks is kept fixed, while the type of a block can be changed by mutation; the zero block, however, has the effect of changing the number of functional blocks in the HMM. Each member is evaluated by calculating its fitness. A number of members are chosen through the selection procedure based on each member's fitness. The structure of the selected members is evolved using the genetic operators (crossover and mutation). The whole population is then parameter optimized using the Baum-Welch training method (Rabiner, 1989). After a number of iterations, most of the initial transitions converge to zero and yield simpler structures.
Algorithm 1: The evolutionary procedure for learning HMM structure described in this section.

In order to calculate fitness, we use the Akaike information criterion (Akaike, 1994) to balance the likelihood (ability to model sequences) against the complexity of each model structure (susceptibility to over-fitting):

$$F_p = \frac{1}{\sum_i l_i} \sum_i \log P(x_i \mid \theta_p) - \alpha f_p, \qquad (1)$$

where $l_i$ is the length of a sequence $x_i$, and $p$ labels the HMMs (with parameters $\theta_p$). The symbol $f_p$ denotes the number of free parameters in the HMM, and $\alpha$ balances the likelihood and the complexity of the HMM. At each generation, the members of the population are selected with a Boltzmann probability

$$P_{\mathrm{sel}}(p) = \frac{\exp\!\big(F_p / (s\,\sigma)\big)}{\sum_{p'} \exp\!\big(F_{p'} / (s\,\sigma)\big)}, \qquad (2)$$

where $\sigma$ is the standard deviation of the fitnesses of the members of the population. The term $s$ controls the strength of the selection. We used $s = 0.3$, $\alpha = 0.5$, and $P = 30$ for the simulation; these values were chosen after initial experimentation, although the final results were found to be reasonably robust to modest changes of these parameters.
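A sketch of how this evolutionary loop could be organized is shown below; the callables passed in (random_hmm, baum_welch, log_likelihood, num_free_params, mutate, crossover) stand for the routines described in this section, and the normalization of Equation (1) by the total sequence length is our reading of the role of $l_i$, not a detail confirmed by the text.

```python
# Sketch of the GA around Equations (1) and (2); s, alpha, and P follow the
# values quoted in the text.
import numpy as np

def fitness(hmm, seqs, log_likelihood, num_free_params, alpha=0.5):
    total_len = sum(len(x) for x in seqs)
    loglik = sum(log_likelihood(hmm, x) for x in seqs)
    return loglik / total_len - alpha * num_free_params(hmm)        # Eq. (1)

def boltzmann_select(population, fits, s=0.3, rng=None):
    rng = rng or np.random.default_rng()
    f = np.asarray(fits, dtype=float)
    sigma = f.std() if f.std() > 0 else 1.0
    w = np.exp((f - f.max()) / (s * sigma))                         # Eq. (2)
    w /= w.sum()
    idx = rng.choice(len(population), size=len(population), p=w)
    return [population[i] for i in idx]

def evolve(seqs, random_hmm, baum_welch, log_likelihood, num_free_params,
           mutate, crossover, P=30, generations=200):
    pop = [random_hmm() for _ in range(P)]
    for _ in range(generations):
        pop = [baum_welch(h, seqs) for h in pop]                    # re-estimate a_ij, e_i(l)
        fits = [fitness(h, seqs, log_likelihood, num_free_params) for h in pop]
        parents = boltzmann_select(pop, fits)
        pop = []
        for a, b in zip(parents[0::2], parents[1::2]):              # pair parents up
            c, d = crossover(a, b)
            pop += [mutate(c), mutate(d)]
    fits = [fitness(h, seqs, log_likelihood, num_free_params) for h in pop]
    return pop[int(np.argmax(fits))]
```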

2.2.  Fisher Kernels from HMMs

Although the theory of HMMs is well known, we briefly describe them in enough detail to allow us to explain how to create Fisher kernels. An HMM is a learning machine that assigns a probability to an observed sequence of symbols, $x = (x_1, x_2, \ldots, x_T)$. Each symbol comes from a finite alphabet (in the problems discussed in this paper this is the set of 20 amino acids). The HMM consists of a set of states $\{1, 2, \ldots, N\}$. At each step, the HMM will emit a symbol and make a transition between states. The state of the model at time step $t$ is denoted $q_t$. The transition probability for moving from state $i$ to state $j$ is denoted by $a_{ij}$, while the emission probability for emitting a symbol $l$, given that we are in state $i$, is denoted by $e_i(l)$. We denote the full set of parameters for the HMM by $\theta = \{a_{ij}, e_i(l)\}$. For any given state sequence $q = (q_0, q_1, \ldots, q_T)$, the joint probability for the HMM visiting that state sequence and emitting a symbol sequence $x$ is given by

$$P(x, q \mid \theta) = \prod_{t=1}^{T} a_{q_{t-1} q_t}\, e_{q_t}(x_t).$$

The initial state ($q_0$) is assumed not to emit a symbol. To compute the likelihood of a sequence $x$, we sum over all possible paths through the hidden states

$$P(x \mid \theta) = \sum_{q} P(x, q \mid \theta). \qquad (3)$$

Note that the actual state of the system at any time step is hidden (since the states are marginalized over).
Naively summing over all possible state paths appears computationally expensive, since there are an exponential number of paths. However, all quantities of interest can be efficiently computed from the forward and backward variables ($\alpha_t(i)$ and $\beta_t(i)$), which are defined as

$$\alpha_t(i) = P(x_1, \ldots, x_t, q_t = i \mid \theta), \qquad (4)$$

$$\beta_t(i) = P(x_{t+1}, \ldots, x_T \mid q_t = i, \theta). \qquad (5)$$

These are calculated using the forward and backward dynamic programming algorithms. We can also use these quantities to obtain the posterior probability

$$P(q_t = i \mid x, \theta) = \frac{\alpha_t(i)\, \beta_t(i)}{P(x \mid \theta)}. \qquad (6)$$

For a detailed description of these algorithms, see Rabiner (1989) and Durbin et al. (1998).
Often, we need to calculate how well a model discriminates a set of sequences from other sequences. One approach to achieving this with HMMs is to train two HMMs, one on positive instances and the other on negative instances. We then take the log-odds score to decide which model a sequence $x$ belongs to:

$$\mathrm{score}(x) = \log \frac{P(x \mid \theta^{+})}{P(x \mid \theta^{-})}, \qquad (7)$$

where $\theta^{+}$ are the parameters obtained by training an HMM on positive examples, and $\theta^{-}$ are the parameters obtained by training an HMM on negative (or background) sequences. Below, we use this approach to test the discriminative ability of the HMM alone.
A Fisher kernel is derived from a generative model. The Fisher kernel is defined as

$$K(x, x') = U_x^{\top} F^{-1} U_{x'}, \qquad (8)$$

where $F$ is the Fisher information matrix and $U_x = \nabla_\theta \log P(x \mid \theta)$ is the Fisher score vector (Jaakkola et al., 2000). The Fisher information matrix is defined as the expectation of the correlation matrix of the Fisher score vectors. The calculation of the Fisher information matrix, $F$, requires second-order derivatives of the log-likelihood function, which is computationally expensive and sometimes simply intractable (Spall, 2005). Furthermore, the calculation would lead to a whitening of the data. Therefore, $F$ is often replaced by the identity matrix; we follow the same methodology here.
The Fisher score vector is the derivative (gradient) of the log-likelihood of a sequence, $\log P(x \mid \theta)$, with respect to the parameters of the HMM. If we denote a generic parameter by $\theta_k$, then the component of the Fisher score vector for this parameter is given by

$$U_x(\theta_k) = \frac{\partial}{\partial \theta_k} \log P(x \mid \theta). \qquad (9)$$

The derivatives with respect to the transition probabilities are

$$\frac{\partial}{\partial a_{ij}} \log P(x \mid \theta) = \frac{m_{ij}}{a_{ij}} - \sum_{j'} m_{ij'}, \qquad (10)$$

where $m_{ij}$ is the expected number of times a transition from state $i$ to state $j$ occurs in generating the sequence $x$. That is,

$$m_{ij} = \frac{1}{P(x \mid \theta)} \sum_{t} \alpha_t(i)\, a_{ij}\, e_j(x_{t+1})\, \beta_{t+1}(j). \qquad (11)$$

The right-most term in Equation (10) gives the number of times a state $i$ is visited, which is equivalent to the sum of the posterior probabilities in Equation (6) at state $i$ over the whole process.

Similarly, the derivatives with respect to the emission probabilities are

$$\frac{\partial}{\partial e_i(l)} \log P(x \mid \theta) = \frac{m_i(l)}{e_i(l)} - \sum_{l'} m_i(l'), \qquad (12)$$

where $m_i(l)$ is the number of times state $i$ emits a symbol $l$ over the whole process. This can again be computed from the forward and backward variables

$$m_i(l) = \frac{1}{P(x \mid \theta)} \sum_{t\,:\,x_t = l} \alpha_t(i)\, \beta_t(i). \qquad (13)$$

The update rules of the Baum-Welch algorithm for the transition and emission probabilities are

$$\hat{a}_{ij} = \frac{m_{ij}}{\sum_{j'} m_{ij'}} \quad \text{and} \quad \hat{e}_i(l) = \frac{m_i(l)}{\sum_{l'} m_i(l')}, \qquad (14)$$

respectively. Thus, the Fisher score uses the same set of statistics that is used by the Baum-Welch algorithm. These are found by running the standard forward-backward algorithm (Rabiner and Juang, 1986).
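The sketch below (not the authors' code) computes these expected counts with a scaled forward-backward pass and assembles the Fisher score vector of Equations (10) and (12), reusing the toy parameters A, E, pi from the earlier sketch; it assumes all transition and emission probabilities are nonzero.

```python
# Sketch of Equations (9)-(13): expected counts from forward-backward,
# then the Fisher score vector for transition and emission parameters.
import numpy as np

def fisher_score(A, E, pi, seq):
    n, T = A.shape[0], len(seq)
    alpha = np.zeros((T, n)); beta = np.zeros((T, n)); c = np.zeros(T)
    alpha[0] = pi * E[:, seq[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * E[:, seq[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (E[:, seq[t + 1]] * beta[t + 1]) / c[t + 1]
    m_trans = np.zeros_like(A)                      # expected m_ij, Eq. (11)
    for t in range(T - 1):
        m_trans += np.outer(alpha[t], E[:, seq[t + 1]] * beta[t + 1]) * A / c[t + 1]
    gamma = alpha * beta                            # posterior P(q_t = i | x), Eq. (6)
    m_emit = np.zeros_like(E)                       # expected m_i(l), Eq. (13)
    for t in range(T):
        m_emit[:, seq[t]] += gamma[t]
    u_trans = m_trans / A - m_trans.sum(axis=1, keepdims=True)      # Eq. (10)
    u_emit = m_emit / E - m_emit.sum(axis=1, keepdims=True)         # Eq. (12)
    return np.concatenate([u_trans.ravel(), u_emit.ravel()])
```

With $F$ replaced by the identity matrix, the Fisher kernel of Equation (8) then reduces to a plain dot product between such score vectors.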

2.3.  Testing the GA-SVM Classifier

Figure 4 illustrates the procedure used to train an SVM from a set of sequences. As with conventional SVMs, the GA-SVM requires both positive and negative datasets. In evolving the structure of the HMM, the GA-SVM only uses the positive dataset. The evolved HMM is then used to obtain the Fisher score vectors for the positive and the negative datasets by applying Equations (10) and (12). The resulting Fisher score vectors are used to train the SVM. We used the SVMlight (Joachims, 1994) package with a linear kernel for the entire set of tests.
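A sketch of steps (3) and (4) of this procedure is given below; scikit-learn's LinearSVC is used here only as a stand-in for the SVMlight package, and fisher_score and the trained HMM parameters (A, E, pi) come from the earlier sketches.

```python
# Sketch of training a linear SVM on Fisher score vectors.
import numpy as np
from sklearn.svm import LinearSVC

def train_gasvm(A, E, pi, pos_seqs, neg_seqs):
    X = np.array([fisher_score(A, E, pi, s) for s in pos_seqs + neg_seqs])
    y = np.array([1] * len(pos_seqs) + [0] * len(neg_seqs))
    return LinearSVC(C=1.0).fit(X, y)     # linear kernel on the score vectors
```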

Figure 4:

A schematic of the training procedure used for GA-SVM. (1) We start from the datasets consisting of positive and negative examples. (2) The positive dataset is used to train the HMM. Alternatively we used a single-state HMM, a fully-connected HMM, and P. S. HMM instead of the evolved HMM for testing purposes. (3) For the HMM, we derive the Fisher kernel using both the positive and negative training set. (4) Finally, the SVM is trained again using the positive and negative training set.


Fivefold cross-validation was used, where we divide the positive and negative dataset into five partitions. In this scenario, four of the partitions are used to evolve the HMM and train the SVM. The trained SVM is then tested on the remaining partition that has not been seen in training. This is repeated five times, each time retraining on a different subset of the data.

To obtain an understanding of the performance of GA-SVM, we have compared the SVM with a Fisher kernel obtained from four different sources

  1. An evolved HMM using the same set of data as the discrimination task

  2. A single-state HMM

  3. A fully connected HMM

  4. An HMM evolved for protein secondary structure prediction, which we name P. S. HMM (Won et al., 2007).

The evolved HMM has better performance than the single-state HMM and the fully connected HMM, as we would expect; however, interestingly, it has worse performance than P. S. HMM, despite the fact that P. S. HMM was evolved for solving a different problem. Using P. S. HMM to construct the Fisher kernel provides state of the art performance on the datasets we have studied.

The P. S. HMM was evolved for prediction of the secondary structure of proteins (Won et al., 2007). It is composed of 26 blocks and 52 HMM states and was evolved using 2,230 high quality structures partitioned into 236 folds. The HMM is composed of 11 self-loop blocks, six forward jump blocks, and nine linear blocks. Thus, the P. S. HMM has the general fold information of proteins, even though it did not use homologous information during its training. We used P. S. HMM to construct the Fisher kernel using the positive and negative training set as we did for the GA-SVM.

3.  Evaluation

In this section, we describe our evaluation procedure, and present data showing how well our approach performed. Details of how the parameters of the GA affected the performance of the GA-SVM are given in Section 4.

3.1.  Dataset and Metrics

To test the proposed method, we used two test sets. These were chosen because they are publicly available, and have recently published results against which we can compare the performance of the GA-SVM. The two datasets are

  1. A set of secretory protein sequences of the malaria parasite. For the negative training set, we used a set of nonsecretory protein sequences of the malaria parasite.

  2. Eighteen protein families from the ENZYME database (Bairoch, 2000), used to classify protein function (Faria et al., 2009).

Details of the datasets are given below.

To measure the classification performance, we consider four metrics: the correlation coefficient (CC), sensitivity, specificity, and positive predictive value (PPV). Denoting the number of true positive, false positive, true negative, and false negative predictions by $T_+$, $F_+$, $T_-$, and $F_-$, respectively, these four measures are defined as

$$\mathrm{CC} = \frac{T_+ T_- - F_+ F_-}{\sqrt{(T_+ + F_+)\,(T_+ + F_-)\,(T_- + F_+)\,(T_- + F_-)}},$$

$$\mathrm{Sensitivity} = \frac{T_+}{T_+ + F_-}, \qquad \mathrm{Specificity} = \frac{T_-}{T_- + F_+}, \qquad \mathrm{PPV} = \frac{T_+}{T_+ + F_+}.$$
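A small sketch of these metrics, computed from the four confusion counts (the CC is taken here to be the Matthews form written above, which is an assumption on our part):

```python
# Sketch of the four evaluation metrics from the confusion counts.
import math

def metrics(tp, fp, tn, fn):
    sens = tp / (tp + fn)                                   # sensitivity
    spec = tn / (tn + fp)                                   # specificity
    ppv = tp / (tp + fp)                                    # positive predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0      # correlation coefficient
    return cc, sens, spec, ppv

print(metrics(tp=90, fp=10, tn=85, fn=15))
```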

3.2.  Secretory Protein Sequences of Malaria Parasite

Human malaria is caused by the parasite Plasmodium falciparum. The parasites survive in their host cells by secreting various proteins. These are of considerable interest, as they are potential targets for vaccines. There have been a number of computational studies aimed at identifying secretory proteins from sequence information alone. TargetP predicts secretory proteins using the signal sequence at the N-terminus and a subcellular localization method (Emanuelsson et al., 2000). Other approaches use specific signal (PEXEL) or motif (VTS) information (Marti et al., 2004; Hiller et al., 2004), but such motifs are not always found. Recently, SVMs have been applied to this task by Verma et al. (2008). They used amino acid, split amino acid, and dipeptide composition to build an SVM classifier. They found that sequence composition from the N and C termini of a protein sequence is more informative for classifying secretory proteins than overall amino acid composition. They divided each sequence into three parts, the N terminus, the C terminus, and the remaining middle part, and called this representation split amino acid composition. Finally, to improve the performance, they used homologous sequence information.

Classification for this task is improved using homologous sequence information produced by PSI-BLAST (Altschul et al., 1997). PSI-BLAST produces a position-specific scoring matrix (PSSM) from multiple alignment of protein sequences. Though clearly beneficial, obtaining PSSM information is often costly in time, and, in many cases, homologous sequences may not be available. In our setup, we did not use PSSM information. As we will see, the use of PSSM information gives the best performance on this problem; however, using an SVM with a Fisher kernel generated from P. S. HMM gives almost equivalent performance without the need for PSSM information.

The dataset consists of 252 positive and negative examples. The average lengths of the sequences in the positive and negative set are 937.5 and 575.6, respectively.

3.2.1.  Single-State HMM

To demonstrate the discriminative power of a Fisher kernel SVM, we compared the prediction results of a single-state HMM and a Fisher kernel SVM derived from it. To use the HMM as a classifier, we trained two single-state HMMs, one on positive examples and one on negative examples. The log-odds score was calculated for classifying the data. We varied the log-odds threshold to draw the receiver operating characteristic (ROC) curve shown in Figure 5. We compared this with a Fisher kernel SVM derived from the single-state HMM, whose performance is shown in the same figure.
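The ROC curve and its area can be obtained from the log-odds scores of Equation (7) in the usual way; the sketch below uses scikit-learn purely for illustration, with scores and labels as placeholders for the per-sequence log-odds values and true classes.

```python
# Sketch: ROC curve and AUC from log-odds classifier scores.
from sklearn.metrics import roc_curve, roc_auc_score

def roc_from_scores(labels, scores):
    fpr, tpr, thresholds = roc_curve(labels, scores)   # sweep the decision threshold
    return fpr, tpr, roc_auc_score(labels, scores)
```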

Figure 5:

Comparison of a simple HMM classifier and a derived Fisher kernel SVM classifier.


The Fisher kernel SVM classifier significantly improved sensitivity as well as specificity in this toy example. The maximum CC of the Fisher kernel SVM classifier is 0.675, while that of the HMM classifier is below 0.4. The area under the curve (AUC) of the Fisher kernel SVM and the simple HMM was 0.86 and 0.73, respectively. Note that the HMM-based classifiers use the parameters of the HMM to calculate how well a sequence fits the given model. In contrast, the Fisher kernel SVM method uses the variation of the likelihood with respect to the parameters as a set of features for discriminating a given sequence.

3.3.  A GA-SVM Classifier

We now consider evolving an HMM as the basis of a Fisher kernel SVM. We started with a random population of 30 models, each consisting of 15 blocks. The GA was run for 200 iterations. The best HMM was then used to derive a Fisher kernel.

One of the evolved HMMs is illustrated in Figure 6. Each node in this figure indicates a block composed of a number of HMM states, and an arrow indicates a transition (with probability greater than 0.1) that links a block to another block or the block itself. As shown in this figure, most of the initial random transitions shrink to zero during the training process. To compare the performance of the GA-SVM with other state of the art classification methods, we calculated the sensitivity and specificity, which we show in the ROC curve in Figure 7. The AUC was 0.93, 0.90, 0.89, 0.87, and 0.90 for GA-SVM, evolved HMM, SVM(amino acid), SVM(dipeptides), SVM(split amino acid), and string kernels, respectively.

Figure 6:

One of the evolved HMMs. Each node represents a block. It is composed of 73 states (excluding the begin state) grouped into 13 blocks. Self-loop blocks, forward jump blocks, and linear blocks are illustrated with solid elliptical blocks, rectangular, and white round blocks, respectively. Each block is labeled with a block number and the number of states in the block (inside the parenthesis). Transition probabilities including the transition to the block itself are shown with the arrows. For simplicity, only transition probabilities over 0.1 are shown with an arrow. Graphviz software was used to generate this figure (Gansner and North, 2000).


Figure 7:

Performance comparison of several algorithms to classify secretory protein sequences.


We compared the performance of the GA-SVM with an HMM-based classifier, string kernels, and other SVM-based methods using amino acid, dipeptide, and split amino acid composition (Verma et al., 2008). The evolved HMM outperforms the SVM of Verma et al. (2008) despite the fact that it learned only from the data with no input of expert knowledge.

Also shown in Figure 7 is the performance of using the evolved HMM as a classifier. We see that the Fisher kernel SVM has a significant advantage. To demonstrate that the good performance of the GA-SVM is due to a combination of the evolved HMM and the SVM, rather than the SVM itself, we have compared against a number of different SVMs. These include a number of SVM-based methods using amino acid, dipeptide, and split amino acid composition (Verma et al., 2008). Finally, we have compared against string kernels, which are widely regarded as powerful kernels for classifying sequence data. The shogun toolbox (Sonnenburg et al., 2006) was used for implementing this test. The performance of the string kernel in this application is highly dependent on the order and gap parameters and less so on the C and degree parameters. The SVM using the Fisher kernel outperforms the string kernels.

Finally, we tested the Fisher kernel SVM generated from the P. S. HMM which was evolved to solve an altogether different problem, namely, the prediction of secondary protein structure. Surprisingly, this SVM gave a substantial increase in performance over the GA-SVM, which used an HMM evolved to solve this particular problem. An ROC curve for these two classifiers is shown in Figure 8. A possible explanation of this is that the more demanding task of secondary protein structure prediction produced a richer HMM, and, consequently, a more expressive Fisher kernel which the SVM could exploit. This suggests that in evolving HMMs for a problem, we might, in general, do better to train them on a harder task than the discrimination problem we are trying to solve.

Figure 8:

Performance comparison of (1) SVM using PSSM, (2) GA-SVM using an evolved HMM, and (3) GA-SVM using P. S. HMM.


Also shown in Figure 8 is the performance of the SVM developed in Verma et al. (2008), which uses external homologous sequence information. As discussed above, this uses additional position specific scoring matrix (PSSM) information obtained by running PSI-BLAST (Altschul et al., 1997). This approach has slightly better performance than the evolved kernel methods. Nevertheless, using P. S. HMM as the basis of a Fisher kernel SVM gives a similar level of performance without the need of PSSM information.

3.4.  Protein Function Classification

We now describe the second dataset we tested. Classically, alignment-based algorithms using BLAST (Altschul et al., 1997) have been used to classify protein function. However, potentially inaccurate sequence alignments, and propagation errors that occur when the results of earlier misannotations are inherited, cause problems for these alignment-based algorithms (Tian and Skolnick, 2003; Devos and Valencia, 2001). Recently, an SVM-based algorithm, called the peptide program, was proposed for classifying protein function by Faria et al. (2009). Even though its overall performance was worse than that of the BLAST classifier, the peptide program showed better performance on families with a small amount of data in the ENZYME database.

We used the same positive and negative datasets of 18 Enzyme Commission (EC) families that Faria et al. used, and tested the performance of the GA-SVM. For each test, we defined a positive set and a negative set. For a family (e.g., EC 1.1.1.1), the positives are the proteins in that family and the negatives are the proteins from all the other families in the same upper class (e.g., EC 1.1.1.2, EC 1.1.1.3). Statistics of the dataset are given in Table 1. Note that we did not generally use the negative families as positive examples, so they are not necessarily shown in Table 1.

Table 1:
Statistics of the protein families in the ENZYME database used as the second dataset in our evaluation.
Dataset | Positive set (Number, Average length) | Negative set (Number, Average length)
Enzyme 1.1.1.1 210 346.1 4,504 379.4 
1.1.1.25 217 306.0 4,497 381.4 
1.8.4.11 204 208.1 196 191.2 
2.1.2.10 216 369.3 1,625 379.1 
2.3.2.6 201 232.0 201 371.6 
2.5.1.55 202 281.6 3,300 324.0 
2.7.1.11 203 420.3 4,425 342.9 
2.7.1.21 207 226.2 4,221 352.3 
2.7.2.1 217 400.1 1,178 367.2 
2.7.7.27 209 422.2 6,919 738.7 
3.1.26.11 210 334.9 1,387 356.2 
3.5.4.19 219 169.7 1,937 331.1 
4.1.1.31 202 873.9 3,252 347.7 
4.2.3.4 215 389.1 611 333.7 
5.1.1.1 219 371.6 484 319.4 
5.1.1.3 204 294.6 499 352.4 
5.3.1.24 217 264.8 2,175 336.2 
6.3.4.2 206 579.5 1,338 479.6 

3.5.  Using P. S. HMM in a Fisher Kernel

We performed 18 independent classification tests, one on each EC family (Table 2). We first used the P. S. HMM (Won et al., 2007) to check the performance of using an SVM with Fisher kernels. We compare the performance with string kernels from the shogun toolbox (Sonnenburg et al., 2006). Table 2 compares the performance of the classifiers on the EC family dataset. In this example, the performance of the GA-SVM is higher than that of the peptide program and the BLAST classifier (Faria et al., 2009). This result is impressive, considering that the GA-SVM did not use additional sequence alignment information. Compared to the peptide program, whose performance correlated with the size of the dataset, the GA-SVM showed better performance in every family except for EC 5.1.1.3. Compared to the BLAST classifier, the GA-SVM showed better performance except in EC 1.1.1.1. The overall performance outranked the other classifiers in sensitivity and CC. Due to its low sensitivity, the CC of the string kernel classifier is lower than that of the other classifiers, although it showed the best PPV in these 18 experiments. The Friedman test (Friedman, 1937) showed significance, with p values of 1.4 , 3.7 , and 5.9 for PPV, sensitivity, and CC, respectively. The post hoc test showed that the pairs involving P. S. HMM were the main cause of the statistical significance (Tables 3, 4, and 5). The paired t-test for the CC between the BLAST classifier and the GA-SVM showed a significant p value of 0.06.
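The sketch below illustrates this kind of significance testing with SciPy; the per-family score arrays are placeholders, the Friedman test matches the one used in the paper, and the paired t-test stands in for the final comparison (SciPy does not provide the Nemenyi-Damico-Wolfe-Dunn post hoc procedure used for Tables 3-5).

```python
# Sketch of the significance tests across the 18 EC families.
from scipy.stats import friedmanchisquare, ttest_rel

def compare_classifiers(cc_peptide, cc_blast, cc_string, cc_pshmm):
    stat, p = friedmanchisquare(cc_peptide, cc_blast, cc_string, cc_pshmm)
    print("Friedman p value (CC):", p)
    # paired t-test between two classifiers, as in the BLAST vs. GA-SVM comparison
    print("paired t-test p value:", ttest_rel(cc_blast, cc_pshmm).pvalue)
```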

Table 2:
Performance comparison of several classifiers for classifying enzyme families. The metrics given are positive predictive value (PPV), sensitivity (Sen), and correlation coefficient (CC). We also report the average ranks.
EC family | Peptide program (PPV, Sen, CC) | BLAST classifier (PPV, Sen, CC) | String kernel (PPV, Sen, CC) | P. S. HMM (PPV, Sen, CC)
1.1.1.1 94% 81% 0.87 95% 98% 0.96 97% 67% 0.80 95% 95% 0.95 
1.1.1.25 93% 84% 0.88 100% 98% 0.99 100% 50% 0.70 100% 98% 0.99 
1.8.4.11 100% 100% 1.00 98% 98% 0.95 100% 100% 1.00 100% 100% 1.00 
2.1.2.10 98% 100% 0.99 100% 98% 0.99 100% 98% 0.99 100% 100% 1.00 
2.3.2.6 100% 100% 1.00 100% 100% 1.00 100% 100% 1.00 100% 100% 1.00 
2.5.1.55 98% 98% 0.97 100% 100% 1.00 100% 100% 1.00 100% 100% 1.00 
2.7.1.11 100% 90% 0.95 100% 98% 0.99 100% 88% 0.93 100% 98% 0.99 
2.7.1.21 97% 79% 0.87 97% 90% 0.94 100% 55% 0.73 100% 93% 0.96 
2.7.2.1 100% 100% 1.00 100% 100% 1.00 98% 100% 0.99 100% 100% 1.00 
2.7.7.27 93% 88% 0.90 100% 100% 1.00 100% 93% 0.96 100% 100% 1.00 
3.1.26.11 93% 100% 0.96 100% 98% 0.99 100% 64% 0.78 100% 100% 1.00 
3.5.4.19 100% 91% 0.95 100% 100% 1.00 100% 95% 0.97 100% 100% 1.00 
4.1.1.31 98% 100% 0.99 100% 100% 1.00 100% 66% 0.80 100% 100% 1.00 
4.2.3.4 100% 98% 0.98 100% 95% 0.97 100% 98% 0.98 100% 100% 1.00 
5.1.1.1 100% 100% 1.00 100% 89% 0.92 100% 100% 1.00 100% 100% 1.00 
5.1.1.3 100% 100% 1.00 98% 100% 0.98 100% 100.0% 1.00 98% 100% 0.98 
5.3.1.24 95% 82% 0.87 96% 100% 0.98 100% 64% 0.78 98% 100% 0.99 
6.3.4.2 98% 98% 0.97 100% 100% 1.00 100% 98% 0.99 100% 100% 1.00 
Average rank 2.61 2.22 2.56 1.56 1.78 1.83 1.17 2.67 2.78 1.22 1.06 1.17 
Table 3:
The p values from the post hoc analysis on PPV using the Nemenyi-Damico-Wolfe-Dunn test (Hollander and Wolfe, 1999).
Peptide program | BLAST classifier | String kernel | P. S. HMM
Peptide program  0.09 1.3  7.2  
BLAST classifier   0.51 0.79 
String kernel    0.96 
Table 4:
The p values from the post hoc analysis on sensitivity using the Nemenyi-Damico-Wolfe-Dunn test (Hollander and Wolfe, 1999).
Peptide program | BLAST classifier | String kernel | P. S. HMM
Peptide program  0.64 0.64 0.025 
BLAST classifier   0.08 0.35 
String kernel    5.6  
Table 5:
The p values from the post hoc analysis on CC using the Nemenyi-Damico-Wolfe-Dunn test (Hollander and Wolfe, 1999).
Peptide program | BLAST classifier | String kernel | P. S. HMM
Peptide program  0.33 0.93 5.4  
BLAST classifier   0.10 0.37 
String kernel    6.8  

We have also tested the performance of a GA-SVM using an HMM evolved on the same dataset we use for training the SVM. Once again the performance, although satisfactory, is not as good as using the P. S. HMM. We show the correlation coefficients for the Fisher kernel SVMs using P. S. HMM and the evolved HMM in Table 6 (columns 2 and 3).

Table 6:
Performance (CC) comparison of a Fisher kernel HMM using: the P. S. HMM, an evolved HMM, and the P. S. HMM with random emission parameters. Note that column 2 reports the same data as given in the last column of Table 2.
EC family | P. S. HMM | Evolved HMM | P. S. HMM random
1.1.1.1 0.95 0.88 0.86 
1.1.1.25 0.99 0.94 0.86 
1.8.4.11 1.00 1.00 0.98 
2.1.2.10 1.00 1.00 0.85 
2.3.2.6 1.00 0.98 0.93 
2.5.1.55 1.00 1.00 0.85 
2.7.1.11 0.99 0.97 0.89 
2.7.1.21 0.96 0.96 0.97 
2.7.2.1 1.00 1.00 0.86 
2.7.7.27 1.00 1.00 0.33 
3.1.26.11 1.00 0.97 0.92 
3.5.4.19 1.00 0.96 0.87 
4.1.1.31 1.00 1.00 0.97 
4.2.3.4 1.00 0.98 0.94 
5.1.1.1 1.00 0.98 0.77 
5.1.1.3 0.98 1.00 0.79 
5.3.1.24 0.99 1.00 0.60 
6.3.4.2 1.00 1.00 0.93 
Average rank 1.17 1.50 2.89 

In addition, to show that the parameters learned by the HMMs are important, we created a Fisher kernel using the P. S. HMM structure but with the emission probabilities randomly set to values between 0 and 1. The results are shown in the last column of Table 6. Clearly, the HMM with randomized emission probabilities yielded significantly worse results, demonstrating that the parameters that are learned are as important as the structure of the HMM. The differences in performance are significant (Friedman's p value is 5.3 ), and the post hoc analysis (Hollander and Wolfe, 1999) showed that the differences between P. S. HMM random and the evolved HMM (p value: 5.7 ) and between P. S. HMM random and P. S. HMM (p value: 5.1 ) were significant.

4.  Evaluating the Evolutionary Procedure

In this section, we study how the performance obtained by the GA-SVM depends on the GA. To begin with, we investigated the optimal number of blocks for training on the secretory protein dataset. We tested the performance while increasing the number of blocks from 3 to 15. For each number of blocks, we performed 30 independent tests. The tests showed significant differences when the number of blocks is greater than or equal to six (Figure 9), suggesting that the minimum number of blocks required to obtain good classification performance for the secretory protein dataset is six. To test the significance of changing the number of blocks, we performed Student's t-tests in which we paired the data in an ordered way (e.g., comparing two blocks with three blocks, three blocks with four blocks, etc.) and checked their p values. The Friedman test (Friedman, 1937) showed a significant p value of 4.7 . The post hoc test showed that statistical significance was observed from six blocks (three blocks versus six blocks has a p value of 8.3 ). We also compared the performance with the Fisher kernel obtained from a fully connected HMM. The HMM parameters were obtained using a conventional evolutionary algorithm representing the set of HMM parameters as a string. The evolutionary algorithm was run with a population size of 40 and with the number of states equal to 15. As shown in Figure 9, the performance using the fully connected model was significantly worse (the adjusted p value using Bonferroni correction, Holm, 1979, is 1.4 compared to the test using three blocks). The performance was not improved with more states (data not shown here). These results show that guided training, such as using the block scheme, is necessary to evolve a good generative model.
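The block-number study could be organized as in the sketch below; run_gasvm_cc is a placeholder for one full evolve/train/test cycle returning a correlation coefficient, and the Holm (1979) step-down adjustment is applied to the adjacent-pair t-tests as described in the text.

```python
# Sketch of the block-number experiment with paired t-tests and a Holm correction.
import numpy as np
from scipy.stats import ttest_rel

def block_number_study(run_gasvm_cc, block_counts=range(3, 16), repeats=30):
    counts = list(block_counts)
    results = {b: [run_gasvm_cc(num_blocks=b) for _ in range(repeats)] for b in counts}
    raw_p = [ttest_rel(results[b], results[b + 1]).pvalue for b in counts[:-1]]
    # Holm (1979) step-down adjustment of the ordered p values
    order = np.argsort(raw_p)
    adjusted = np.empty(len(raw_p))
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (len(raw_p) - rank) * raw_p[idx])
        adjusted[idx] = min(1.0, running_max)
    return results, adjusted
```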

Figure 9:

The performance measured for different numbers of blocks. We also compared the performance with the Fisher kernel from a fully connected HMM whose parameters were obtained by the evolutionary algorithm (labeled Ev).


Additionally, we investigated whether the choice of the initial population changed the performance. We did this, for example, by limiting the initial population to contain only self-loop blocks or only forward jump blocks. We found that this did not affect the performance significantly (data not shown). This suggests that the evolutionary process is sufficiently robust to find a good solution irrespective of the starting population.

To further study whether the evolved HMM (shown in Figure 10) learned any biological meaning, we investigated the distribution of amino acids in each block. The average emission probabilities for each block are shown in Figure 11. Interestingly, the distribution of block 4 showed K(Lys) as the most enriched amino acid, followed by N(Asn) and E(Glu), as well as depletion of H(His), M(Met), and W(Trp). This strikingly resembles the amino acid distribution in the secretory protein sequences (the profile is shown in Verma et al., 2008). Block 4 is the block used most in the decoded results (25%), suggesting that it acts as a background that captures the general properties of secretory protein sequences. Block 1 is similar to block 4, but the other blocks showed distinct distribution patterns, which may contribute to classifying the sequences. The amino acids N(Asn) and E(Glu) have particularly high emission probabilities in blocks 3 and 6, which may reflect properties of the sequences that are not well represented in the averaged profile.

Figure 10:

An example of an evolved HMM with six blocks (transitions between blocks are not shown). Self-loop, forward jump, and linear blocks are illustrated with shaded elliptical, rectangular, and white elliptical shapes, respectively. We also show the block number and the number of states in each block (given in parenthesis).


Figure 11:

The distribution of amino acids for the trained six-block HMM shown in Figure 10.


5.  Conclusions

This paper investigates the use of evolved HMMs as the basis of Fisher kernels. Using evolved kernels removes a limitation of the Fisher kernel method, namely the need to obtain a suitable generative model of the problem being investigated. We have demonstrated that SVMs using kernels from evolved HMMs have better performance than other SVMs designed specifically to solve the same problem. The classifiers were constructed without the need for extensive tuning of a complex model by an expert. Domain knowledge can, however, be automatically captured in the model through the specification of the block structure used to evolve the topology of the HMM. This is a more intuitive way to capture expert knowledge, leaving the complex relationships between these elements to be discovered from the data by the evolutionary process. Exactly the same block structure has been successfully used in a variety of different bioinformatics problems (Won et al., 2006, 2008). Although it was not specially tailored for this application, using this block structure clearly captures elements of biological knowledge, as it provides a substantial improvement in performance compared to using other HMMs.

Evolutionary strategies have been widely used for classifying various types of data (Hruschka et al., 2009), including biological data (Pal et al., 2006). Recently, ant colony optimization (ACO) algorithms have been used for protein function prediction (Otero et al., 2010). Our approach suggests a possible way to effectively merge evolutionary strategies into sequence analysis. As demonstrated in the case studies, the proposed hybridized model not only showed good performance but also revealed the biological grammar it learned from the data, which is hard to achieve when evolving an entire kernel for classification, as in Sullivan and Luke (2007).

We have demonstrated the effectiveness of this approach on two datasets. An advantage of using the Fisher kernel in an SVM is that we are not forced to consider only short subsequences, as is necessary when using some string kernels. This approach can, therefore, be used more flexibly than most string kernels. In comparison with traditional HMM-based classifiers, we gain the improvement in generalization performance provided by the SVM. The proposed scheme thus combines a principled way of selecting the generative model with a means of enhancing its discriminative power.

We note that our approach has some drawbacks. In particular, it takes more time to run than string kernels, predominantly because of the time needed to evolve an HMM. The high dimensionality of the features also makes it slower than a string kernel: for a fully connected HMM, the dimension of the Fisher score vector is $N \times N + N \times M$, where $N$ is the number of states and $M$ is the number of symbols. Nevertheless, the evolutionary process converges quickly, usually within 20 evolutionary iterations. We believe that the GA-SVM could also be successfully applied in other application areas where domain knowledge about partial relationships between elements of the data is available. It is interesting to note that the observed distribution of amino acids in parts of the trained HMM model is close to that found in nature. This demonstrates that the evolutionary method learns domain-specific knowledge during its evolution from the raw data alone.

The toy example using a single-state HMM showed the effectiveness of using the Fisher score for discriminating biological sequences, compared with using the HMM alone. Also, the GA-SVM generated from an HMM with randomized emission probabilities showed the importance of the learned parameters in the generative model. We believe that both learning an HMM using a GA and turning it into a Fisher kernel are essential to the success of the proposed approach.

As shown in our evaluation, the Fisher kernel obtained from the P. S. HMM (Won et al., 2007) showed improved performance over a kernel using an HMM evolved on the datasets of interest. In the enzyme family classification task, it beat the performance of the BLAST classifier, which uses sequence alignment. We believe that the success of P. S. HMM comes from the fact that it includes general protein family/fold information. This general prior knowledge about proteins is then utilized by the SVM for a specific classification task. This suggests that a possible future direction of research is to discover wider classes of problems that can be used to train the HMM so that it contains a richer set of features, which can then be exploited in the Fisher kernel. However, the results presented here demonstrate that, even without any further enhancement, we have a powerful method for obtaining good classification performance without the need for specific domain knowledge or time-consuming sequence alignment information.

References

Akaike, H. (1994). Information theory and extension of the maximum likelihood principle. In Second International Symposium on Information Theory, pp. 267-281.

Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 24:3389-3402.

Bairoch, A. (2000). The ENZYME database in 2000. Nucleic Acids Research, 28:304-305.

Chau, C. W., Kwong, S., Diu, C. K., and Fahrner, W. R. (1997). Optimization of HMM by a genetic algorithm. In International Conference on Acoustics, Speech and Signal Processing, pp. 1727-1730.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Royal Statistical Society B, 39:1-38.

Devos, D., and Valencia, A. (2001). Intrinsic errors in genome annotation. Trends in Genetics, 17(8):429-431.

Durbin, R. M., Eddy, S. R., Krogh, A., and Mitchison, G. (1998). Biological sequence analysis. Cambridge, UK: Cambridge University Press.

Emanuelsson, O., Nielsen, H., Brunak, S., and Heijne, G. (2000). Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. Journal of Molecular Biology, 300:1005-1016.

Faria, D., Ferreira, A. E., and Falcao, A. O. (2009). Enzyme classification with peptide programs: A comparative study. BMC Bioinformatics, 10(231), available at http://www.biomedcentral.com/1471-2105/10/231

Friedman, M. (1937). The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association, 32(200):675-701.

Fujiwara, Y., Asogawa, M., and Konagaya, A. (1995). Motif extraction using an improved iterative duplication method for HMM topology learning. In Pacific Symposium on Biocomputing '96, pp. 713-714.

Gansner, E. R., and North, S. C. (2000). An open graph visualization system and its applications to software engineering. Software—Practice and Experience, 30(11):1203-1233.

Hiller, N., Bhattacharjee, S., van Ooij, C., Liolios, K., Harrison, T., Lopez-Estraño, C., and Haldar, K. (2004). A host-targeting signal in virulence proteins reveals a secretome in malarial infection. Science, 306:1934-1937.

Hollander, M., and Wolfe, D. A. (1999). Nonparametric statistical methods, 2nd ed. New York: Wiley-Interscience.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65-70.

Hruschka, E. R., Campello, R. J. G. B., Freitas, A. A., and de Carvalho, A. C. P. L. F. (2009). A survey of evolutionary algorithms for clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 39(2):133-155.

Jaakkola, T., Diekhans, M., and Haussler, D. (1999). Using the Fisher kernel method to detect remote protein homologies. In Proceedings of the International Conference on Intelligent Systems for Molecular Biology, volume 7, pp. 149-158.

Jaakkola, T., Diekhans, M., and Haussler, D. (2000). A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, 7:95-114.

Joachims, T. (1994). Making large-scale support vector machine learning practical. Cambridge, MA: MIT Press.

Karplus, K., Barret, C., and Hughey, R. (1998). Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14(10):846-856.

Leslie, C. S., Eskin, E., Cohen, A., Weston, J., and Noble, W. S. (2004). Mismatch string kernels for discriminative protein classification. Bioinformatics, 20:467-476.

Leslie, C. S., Eskin, E., and Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Pacific Symposium on Biocomputing, pp. 566-575.

Liao, L., and Noble, W. S. (2003). Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. Journal of Computational Biology, 10:857-868.

Marti, M., Good, R., Rug, M., Knuepfer, E., and Cowman, A. F. (2004). Targeting malaria virulence and remodeling proteins to the host erythrocyte. Science, 306:1930-1933.

Otero, F. E. B., Freitas, A. A., and Johnson, C. G. (2010). A hierarchical multi-label classification ant colony algorithm for protein function prediction. Memetic Computing, pp. 165-181.

Pal, S. K., Bandyopadhyay, S., and Ray, S. S. (2006). Evolutionary computation in bioinformatics: A review. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 36(5):601-615.

Petersen, L., Larsen, T., Ussery, D., On, S., and Krogh, A. (2003). RpoD promoters in Campylobacter jejuni exhibit a strong periodic signal instead of a -35 box. Journal of Molecular Biology, 326:1361-1372.

Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286.

Rabiner, L. R., and Juang, B. H. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4-16.

Ratsch, G., and Sonnenburg, S. (2004). Accurate splice site detection for Caenorhabditis elegans. Cambridge, MA: MIT Press.

Rätsch, G., Sonnenburg, S., and Schäfer, C. (2006). Learning interpretable SVMs for biological sequence classification. BMC Bioinformatics, 7(Suppl.):S9.

Sonnenburg, S., Rätsch, G., Jagota, A., and Müller, K.-R. (2002). New methods for splice site recognition. Lecture Notes in Computer Science, 2415:329-336.

Sonnenburg, S., Rätsch, G., Schaefer, C., and Schoelkopf, B. (2006). Large scale multiple kernel learning. Journal of Machine Learning Research, 7:1531-1565.

Spall, C. (2005). Monte Carlo computation of the Fisher information matrix in nonstandard settings. Journal of Computational and Graphical Statistics, 14:889-909.

Stolcke, A. (1994). Bayesian learning of probabilistic language models. PhD thesis, University of California at Berkeley.

Sullivan, K., and Luke, S. (2007). Evolving kernels for support vector machine classification. In GECCO, pp. 1702-1707.

Tian, W., and Skolnick, J. (2003). How well is enzyme function conserved as a function of pairwise sequence identity? Journal of Molecular Biology, 333:863-882.

Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., and Müller, K.-R. (2002). A new discriminative kernel from probabilistic models. Neural Computation, 14:2397-2414.

Verma, R., Tiwari, A., Kaur, S., Varshney, G., and Raghava, G. (2008). Identification of proteins secreted by malaria parasite into erythrocyte using SVM and PSSM profiles. BMC Bioinformatics, 9(210), available at http://www.biomedcentral.com/1471-2105/9/201

Won, K.-J., Hamelryck, T., Prügel-Bennett, A., and Krogh, A. (2007). An evolving method for learning HMM structure: Prediction of protein secondary structure. BMC Bioinformatics, 8:357, available at http://www.biomedcentral.com/1471-2105/8/357

Won, K.-J., Prügel-Bennett, A., and Krogh, A. (2004). Training HMM structure with genetic algorithm for biological sequence analysis. Bioinformatics, 20:3613-3619.

Won, K.-J., Prügel-Bennett, A., and Krogh, A. (2006). Evolving the structure of hidden Markov models. IEEE Transactions on Evolutionary Computation, 10(1):39-49.

Won, K.-J., Sandelin, A., Marstrand, T. T., and Krogh, A. (2008). Modeling promoter grammars with evolving hidden Markov models. Bioinformatics, 24(15):1669-1675.

Yada, T., Ishikawa, M., Tanaka, H., and Asai, K. (1994). DNA sequence analysis using hidden Markov model and genetic algorithm. Genome Informatics, 5:178-179.

Yang, Z. R. (2004). Biological applications of support vector machines. Briefings in Bioinformatics, 5(4):328-338.
