Skip Nav Destination
Close Modal
Update search
NARROW
Format
Journal
Date
Availability
1-1 of 1
Gaurav Verma
Close
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Sort by
Journal Articles
Publisher: Journals Gateway
Quantitative Science Studies (2024) 5 (1): 153–169.
Published: 01 March 2024
FIGURES
Abstract
View article
PDF
We put forward a novel approach using a generative language model (GPT-4) to produce labels and rationales for large-scale text analysis. The approach is used to discover public value expressions in patents. Using text (5.4 million sentences) for 154,934 US AI patent documents from the United States Patent and Trademark Office (USPTO), we design a semi-automated, human-supervised framework for identifying and labeling public value expressions in these sentences. A GPT-4 prompt is developed that includes definitions, guidelines, examples, and rationales for text classification. We evaluate the labels and rationales produced by GPT-4 using BLEU scores and topic modeling, finding that they are accurate, diverse, and faithful. GPT-4 achieved an advanced recognition of public value expressions from our framework, which it also uses to discover unseen public value expressions. The GPT-produced labels are used to train BERT-based classifiers and predict sentences on the entire database, achieving high F1 scores for the 3-class (0.85) and 2-class classification (0.91) tasks. We discuss the implications of our approach for conducting large-scale text analyses with complex and abstract concepts. With careful framework design and interactive human oversight, we suggest that generative language models can offer significant assistance in producing labels and rationales.
Includes: Supplementary data