Table 5 
Feature set for our axiom alignment model. The features are based on content and typography.
Content
Unigram, Bigram, Dependency and Entity Overlap: Real-valued features that compute the proportion of common unigrams, bigrams, dependencies, and geometry entities (constants, predicates, and functions) across the two axioms. When comparing geometric entities, we include geometric entities derived from the associated diagrams when available.
Longest Common Subsequence: Real-valued feature that computes the length of the longest common subsequence of words between two axiom mentions, normalized by the total number of words in the two mentions.
Number of Discourse Elements: Real-valued feature that computes the absolute difference in the number of discourse elements in the two mentions.
Alignment Scores: We use an off-the-shelf monolingual word aligner, JACANA (Yao et al. 2013), pretrained on PPDB, and use the alignment score between axiom mentions as the feature.
MT Metrics: We use two common MT evaluation metrics, METEOR (Denkowski and Lavie 2010) and MAXSIM (Chan and Ng 2008), and use their evaluation scores as features. METEOR computes n-gram overlaps, controlling for precision and recall, whereas MAXSIM performs bipartite graph matching and maps each word in one axiom to at most one word in the other.
Summarization Metrics: We also use ROUGE-S (Lin 2004), a text summarization metric based on skip-grams, and use the evaluation score as a feature.
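Several of the content features above are simple enough to sketch directly. The following is a minimal illustration, not the authors' implementation: it assumes mentions are already tokenized into word lists, it takes "proportion of common n-grams" to mean shared distinct n-grams over all distinct n-grams (the table does not fix the denominator), and the skip-bigram score is a simplified stand-in for ROUGE-S rather than the official toolkit. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def overlap_proportion(a, b, n=1):
    """Shared distinct n-grams over all distinct n-grams.

    Assumption: the table does not state the denominator; this uses the union.
    """
    grams_a, grams_b = set(ngrams(a, n)), set(ngrams(b, n))
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-wise DP)."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length divided by the total number of words in the two mentions."""
    total = len(a) + len(b)
    return lcs_length(a, b) / total if total else 0.0


def skip_bigram_f1(a, b):
    """F1 over matching skip-bigrams (ordered word pairs with any gap).

    Simplified: no gap limit and equal weight on precision and recall.
    """
    skips_a = Counter(combinations(a, 2))
    skips_b = Counter(combinations(b, 2))
    matches = sum((skips_a & skips_b).values())
    if matches == 0:
        return 0.0
    precision = matches / sum(skips_b.values())
    recall = matches / sum(skips_a.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    m1 = "the tangent is perpendicular to the radius".split()
    m2 = "a tangent is perpendicular to the radius".split()
    print(overlap_proportion(m1, m2, n=1), overlap_proportion(m1, m2, n=2))
    print(normalized_lcs(m1, m2), skip_bigram_f1(m1, m2))
```

Each function returns a value in [0, 1], so the scores could be concatenated into a feature vector for a pair of mentions. Dependency and entity overlap would follow the same pattern once a parser and an entity extractor supply the items to compare.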
 
Discourse (Typography)
JSON Structure: Indicator matching the current (and parent) node of axiom mentions in their respective JSON hierarchies; i.e., are both nodes mentioned as axioms, diagrams, or bounding boxes?
Equation Template: Indicator feature that matches templates of equations detected in the axiom mentions. The template matcher is designed to identify various rewritings of the same axiom equation; e.g., $PA \times PB = PT^2$ and $PA \times PB = PC^2$ could refer to the same axiom, with point T in one axiom mention being point C in the other.
Image Caption: Proportion of common unigrams in the image captions of the diagrams associated with the axiom mentions. If both mentions do not have associated diagrams, this feature does not fire.
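The table does not say how the equation template matcher works, so the following is only one plausible sketch of the idea. It assumes that point labels are single uppercase letters and that two equations share a template when they become identical after each label is renamed by its order of first appearance. The function names are hypothetical.

```python
import re


def equation_template(equation):
    """Rename point labels (single uppercase letters) by order of first appearance.

    Assumption: every uppercase letter is a point label; whitespace is ignored.
    """
    names = {}

    def rename(match):
        label = match.group(0)
        # The first new label becomes p0, the next p1, and so on.
        names.setdefault(label, f"p{len(names)}")
        return names[label]

    compact = re.sub(r"\s+", "", equation)
    return re.sub(r"[A-Z]", rename, compact)


def same_template(eq1, eq2):
    """Indicator: 1 if both equations reduce to the same template, else 0."""
    return int(equation_template(eq1) == equation_template(eq2))


if __name__ == "__main__":
    print(equation_template("PA × PB = PT^2"))
    print(same_template("PA × PB = PT^2", "PA × PB = PC^2"))
```

On the example from the table, both equations reduce to the same string, so the indicator fires. This sketch does not handle reordered operands (for instance, PB × PA) or algebraically equivalent rewritings, which a fuller matcher would need to normalize first.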