Table 5.

Citation purpose and polarity classification schemes

Paper | Classification scheme | Data set | Important findings
Garzone and Mercer (2000)
Scheme: (1) Negational—7 classes; (2) Affirmational—5 classes; (3) Assumptive—4 classes; (4) Tentative—1 class; (5) Methodological—5 classes; (6) Interpretational/Developmental—3 classes; (7) Future Research—1 class; (8) Use of Conceptual Material—2 classes; (9) Contrastive—2 classes; (10) Reader Alert—4 classes
Data set: 14 journal articles from Physics (8) and Biochemistry (6)
Findings: • Poor performance of the classifier on unseen Physics articles (less well structured) compared with Biochemistry articles (more well structured)

Nanba et al. (2000)
Scheme: (1) Type B—Basis; (2) Type C—Comparison or Contrast; (3) Type O—Other
Data set: 395 papers in Computational Linguistics (e-print archive)
Findings: • Classifier performance depends entirely on cue phrases; when they are absent, the prediction is wrong

Pham and Hoffmann (2003)
Scheme: (1) Basis; (2) Support; (3) Limitation; (4) Comparison
Data set: 482 citation contexts and 150 unseen citation contexts
Findings: • Incremental knowledge acquisition using the KAFTAN tool for citation classification

Teufel et al. (2006a, b)
Scheme: (1) Weakness of cited approach—Weak—3.1%; (2) Contrast/Comparison in Goals/Methods (neutral)—CoCoGM—3.9%; (3) Contrast/Comparison in Results (neutral)—CoCoR0—0.8%; (4) Unfavorable Contrast/Comparison—CoCo—1.0%; (5) Contrast between two cited methods—CoCoXY—2.9%; (6) Author uses cited work as starting point—PBas—1.5%; (7) Author uses tools/algorithms/data—PUse—15.8%; (8) Author adapts or modifies tools/algorithms/data—PModi—1.6%; (9) Citation is positive about approach or problem addressed—PMot—2.2%; (10) Author's work and cited work are similar—PSim—3.8%; (11) Author's work and cited work are compatible/provide support for each other—PSup—1.1%; (12) Neutral description/not enough textual evidence/unlisted citation function—Neut—62.7%
Data set: 116 articles and 2,829 citation instances from Computational Linguistics (e-print archive)
Findings: • 60% of instances belong to the neutral class • Low frequency of negative citations

Le et al. (2006)
Scheme: (1) Paper is based on the cited work; (2) Paper is a part of the cited work; (3) Cited work supports this work; (4) Paper points out problems or gaps in the cited work; (5) Cited work is compared with the current work; (6) Other citations
Data set: 811 citing areas in 9,000 papers from the ACM Digital Library and ScienceDirect
Findings: • Finite-state machines for citation type recognition require neither domain experts nor knowledge of cue phrases

Agarwal et al. (2010)
Scheme: (1) Background/Perfunctory; (2) Contemporary; (3) Contrast/Conflict; (4) Evaluation; (5) Explanation; (6) Method; (7) Modality; (8) Similarity/Consistency
Data set: 1,710 sentences from 43 open-access full-text biomedical articles
Findings: • Model performed worse on the classes Evaluation, Explanation, and Similarity/Consistency • Infrequent keywords were not recognized by the model

Shotton (2010)
Scheme: Factual: (1) cites, (2) citesAsAuthority, (3) isCitedBy, (4) citesAsMetadataDocument, (5) citesAsSourceDocument, (6) citesForInformation, (7) obtainsBackgroundFrom, (8) sharesAuthorsWith, (9) usesDataFrom, (10) usesMethodIn; RhetoricalPositive: (1) confirms, (2) credits, (3) updates, (4) extends, (5) obtainsSupportFrom, (6) supports; RhetoricalNegative: (1) corrects, (2) critiques, (3) disagreesWith, (4) qualifies, (5) refutes; RhetoricalNeutral: (1) discusses, (2) reviews
Data set: Ontology developed for biomedical articles
Findings: • CiTO, an OWL-based ontology for characterizing the nature of citations

Dong and Schäfer (2011)
Scheme: (1) Background—65.04%; (2) Fundamental idea—23.80%; (3) Technical basis—7.18%; (4) Comparison—3.95%
Data set: 1,768 instances from 122 papers in the ACL Anthology (2007 and 2008)
Findings: • Ensemble-style self-training reduces manual annotation work

Jochim and Schütze (2012)
Scheme: (1) Conceptual—89.2% vs. Operational—10.8%; (2) Organic—10.1% vs. Perfunctory—89.9%; (3) Evolutionary—89.8% vs. Juxtapositional—10.2%; (4) Confirmative—91.4% vs. Negational—8.6%
Data set: 84 papers and 2,008 citations from the 2004 ACL Proceedings (ARC)
Findings: • Annotation of four facets using Moravcsik's scheme instead of a single label

Abu-Jbara et al. (2013)
Scheme: Purpose: (1) Criticizing—14.7%; (2) Comparison—8.5%; (3) Use—17.7%; (4) Substantiating—7%; (5) Basis—5%; (6) Neutral—47%. Polarity: (1) Positive—30%; (2) Negative—12%; (3) Neutral—58%
Data set: 3,271 instances from 30 papers in the ACL Anthology Network (AAN)
Findings: • 47% of citations belong to the class Neutral • Citation purpose classification macro F-score: 58.0%

Xu et al. (2013)
Scheme: (1) Functional—48.4%; (2) Perfunctory—50%; (3) Fallback—1.6%
Data set: ACL Anthology Network corpus (AAN)
Findings: • Self-citations are skewed toward the class Functional • Authors who cite more have more functional citations

Li et al. (2013)
Scheme: (1) Based on—2.8%; (2) Corroboration—3.6%; (3) Discover—12.3%; (4) Positive—0.1%; (5) Practical—1%; (6) Significant—0.6%; (7) Standard—0.2%; (8) Supply—1.2%; (9) Contrast—0.6%; (10) Cocitation—33.3%; (11) Neutral and (12) Negative (both categories omitted)
Data set: 91 biomedical articles and 6,355 citation instances from biomedical articles (PubMed)
Findings: • Coarse-grained sentiment classification performs only slightly better than fine-grained citation function classification

Hernández-Álvarez et al. (2016)
Scheme: Purpose: (1) Use—(a) Based on, Supply—16.1%, (b) Useful—33.7%; (2) Background—(c) Acknowledge/Corroboration/Debate—37.4%; (3) Comparison—(d) Contrast—5.3%; (4) Critique—(e) Weakness—6%, (f) Hedges—1.8%. Polarity: (1) Positive—28.7%; (2) Negative—9.7%; (3) Neutral—64.7%
Data set: 2,092 citations in 85 papers from the ACL Anthology Network (AAN)
Findings: • The classes Acknowledge and Useful dominate the data distribution for purpose classification • The Neutral class holds more than 50% of instances

Munkhdalai, Lalor, and Yu (2016)
Scheme (percentages given for Data 1, Data 2): Function: (1) Background—30.5%, 20.5%; (2) Method—23.9%, 18.2%; (3) Results/findings—45.3%, 38.3%; (4) Don't know—0.1%, 0.06%. Polarity: (1) Negational—4.8%, 2.6%; (2) Confirmative—75%, 59.8%; (3) Neutral—19.8%, 19%; (4) Don't know—0.2%, 0.1%
Data set: Data 1—3,422 (Function) and 3,624 (Polarity) citations; Data 2—4,426 (Function) and 4,423 (Polarity) citations from 2,500 randomly selected PubMed Central articles
Findings: • The majority of citations are annotated as results and findings • Citations are biased toward positive statements

Fisas et al. (2016)
Scheme: (1) Criticism—23%: (a) Weakness, (b) Strength, (c) Evaluation, (d) Other; (2) Comparison—9%: (a) Similarity, (b) Difference; (3) Use—11%: (a) Method, (b) Data, (c) Tool, (d) Other; (4) Substantiation—1%; (5) Basis—5%: (a) Previous own work, (b) Others' work, (c) Future work; (6) Neutral—53%: (a) Description, (b) Reference for more information, (c) Common practices, (d) Other
Data set: 10,780 sentences from 40 papers in Computer Graphics
Findings: • A multilayered corpus with sentences annotated for (1) citation purpose, (2) features to detect scientific discourse, and (3) relevance for summarization

Jha et al. (2017)
Scheme: Same as Abu-Jbara et al. (2013)
Data set: 3,500 citations in 30 papers from the ACL Anthology Network (AAN)
Findings: • Developed data sets for reference scope detection and citation context detection • Comprehensive study aimed at applications of citation classification

Lauscher et al. (2017)
Scheme: Same as Abu-Jbara et al. (2013)
Data set: Data sets from Abu-Jbara et al. (2013) and Jha et al. (2017)
Findings: • Both schemes are heavily skewed toward less informative classes • Domain-specific embeddings do not improve results

Jurgens et al. (2018)
Scheme: (1) Background—51.8%; (2) Uses—18.5%; (3) Compares or Contrasts—17.5%; (4) Motivation—4.9%; (5) Continuation—3.7%; (6) Future—3.6%
Data set: 1,969 instances from the ACL Anthology Reference Corpus (ACL-ARC)
Findings: • The majority of instances belong to the class Background • Error analysis shows the importance of citation context identification for improving results

Su, Prasad et al. (2019)
Scheme: (1) Weakness—2.2%; (2) Compare and Contrast—6.6%; (3) Positive—20.6%; (4) Neutral—70.6%
Data set: ACL-ARC (Computational Linguistics)
Findings: • Highly skewed data set, with the majority of instances in the Neutral class • Multitask learning for citation function and provenance detection

Cohan et al. (2019)
Scheme: (1) Background—58%; (2) Method—29%; (3) Result Comparison—13%
Data set: 6,627 papers and 11,020 instances from Semantic Scholar (Computer Science and Medicine)
Findings: • Introduced a new data set, SciCite • Best state-of-the-art macro F-score obtained using a BiLSTM with attention, ELMo vectors, and structural scaffolds

Pride, Knoth, and Harag (2019)
Scheme: (1) Background—54.61%; (2) Uses—15.51%; (3) Compares/Contrasts—12.05%; (4) Motivation—9.92%; (5) Extension—6.22%; (6) Future—1.7%
Data set: Multidisciplinary data set of 11,233 instances from CORE
Findings: • Largest multidisciplinary author-annotated data set
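
Several of the early schemes above (notably Nanba et al., 2000) classified citations by matching cue phrases in the citation context, and the table notes that such classifiers fail whenever a cue phrase is absent. The sketch below illustrates that approach and its failure mode; the cue-phrase lists and the function name are invented for illustration and are not the phrases used in any of the cited papers.

```python
# Illustrative cue-phrase citation classifier in the spirit of the
# B/C/O scheme of Nanba et al. (2000). Cue lists are hypothetical.
CUE_PHRASES = {
    "B": ["based on", "we use", "following", "extends"],        # Basis
    "C": ["in contrast", "however", "unlike", "compared with"],  # Comparison/Contrast
}

def classify_citation(context: str) -> str:
    """Return 'B', 'C', or 'O' for a citation-context sentence."""
    text = context.lower()
    for label, cues in CUE_PHRASES.items():
        if any(cue in text for cue in cues):
            return label
    # No cue phrase found: everything falls back to 'O' (Other),
    # the failure mode the table notes for cue-phrase classifiers.
    return "O"

print(classify_citation("Our parser is based on the model of Collins (1997)."))  # B
print(classify_citation("In contrast to Smith (2001), our method needs no training data."))  # C
print(classify_citation("See Smith (2001) for details."))  # O
```

The last call shows the weakness reported in the table: a citation that clearly provides background is labeled Other simply because no cue phrase matches, which is why later work moved to learned features and contextual embeddings.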