Skip Nav Destination
1-3 of 3
Follow your search
Access your saved searches in your account
Would you like to receive an alert when new items match your search?
Data Intelligence 1–18.
Published: 09 February 2023
AbstractView article PDF
Text-to-SQL aims at translating textual questions into the corresponding SQL queries. Aggregate tables are widely created for high-frequent queries. Although text-to-SQL has emerged as an important task, recent studies paid little attention to the task over aggregate tables. The increased aggregate tables bring two challenges: (1) mapping of natural language questions and relational databases will suffer from more ambiguity, (2) modern models usually adopt self-attention mechanism to encode database schema and question. The mechanism is of quadratic time complexity, which will make inferring more time-consuming as input sequence length grows. In this paper, we introduce a novel approach named WAGG for text-to-SQL over aggregate tables. To effectively select among ambiguous items, we propose a relation selection mechanism for relation computing. To deal with high computation costs, we introduce a dynamical pruning strategy to discard unrelated items that are common for aggregate tables. We also construct a new large-scale dataset SpiderwAGG extended from Spider dataset for validation, where extensive experiments show the effectiveness and efficiency of our proposed method with 4% increase of accuracy and 15% decrease of inference time w.r.t a strong baseline RAT-SQL.
Data Intelligence (2022) 4 (3): 471–492.
Published: 01 July 2022
FIGURES | View All (5)
AbstractView article PDF
COVID-19 evolves rapidly and an enormous number of people worldwide desire instant access to COVID-19 information such as the overview, clinic knowledge, vaccine, prevention measures, and COVID-19 mutation. Question answering (QA) has become the mainstream interaction way for users to consume the ever-growing information by posing natural language questions. Therefore, it is urgent and necessary to develop a QA system to offer consulting services all the time to relieve the stress of health services. In particular, people increasingly pay more attention to complex multi-hop questions rather than simple ones during the lasting pandemic, but the existing COVID-19 QA systems fail to meet their complex information needs. In this paper, we introduce a novel multi-hop QA system called COKG-QA, which reasons over multiple relations over large-scale COVID-19 Knowledge Graphs to return answers given a question. In the field of question answering over knowledge graph, current methods usually represent entities and schemas based on some knowledge embedding models and represent questions using pre-trained models. While it is convenient to represent different knowledge (i.e., entities and questions) based on specified embeddings, an issue raises that these separate representations come from heterogeneous vector spaces. We align question embeddings with knowledge embeddings in a common semantic space by a simple but effective embedding projection mechanism. Furthermore, we propose combining entity embeddings with their corresponding schema embeddings which served as important prior knowledge, to help search for the correct answer entity of specified types. In addition, we derive a large multi-hop Chinese COVID-19 dataset (called COKG-DATA for remembering) for COKG-QA based on the linked knowledge graph OpenKG-COVID19 launched by OpenKG ① , including comprehensive and representative information about COVID-19. COKG-QA achieves quite competitive performance in the 1-hop and 2-hop data while obtaining the best result with significant improvements in the 3-hop. And it is more efficient to be used in the QA system for users. Moreover, the user study shows that the system not only provides accurate and interpretable answers but also is easy to use and comes with smart tips and suggestions.
Huajun Chen, Ning Hu, Guilin Qi, Haofen Wang, Zhen Bi ...
Data Intelligence (2021) 3 (2): 205–227.
Published: 02 June 2021
FIGURES | View All (11)
AbstractView article PDF
The early concept of knowledge graph originates from the idea of the semantic Web, which aims at using structured graphs to model the knowledge of the world and record the relationships that exist between things. Currently publishing knowledge bases as open data on the Web has gained significant attention. In China, Chinese Information Processing Society of China (CIPS) launched the OpenKG in 2015 to foster the development of Chinese Open Knowledge Graphs. Unlike existing open knowledge-based programs, OpenKG chain is envisioned as a blockchain-based open knowledge infrastructure. This article introduces the first attempt at the implementation of sharing knowledge graphs on OpenKG chain, a blockchain-based trust network. We have completed the test of the underlying blockchain platform, and the on-chain test of OpenKG's data set and tool set sharing as well as fine-grained knowledge crowdsourcing at the triple level. We have also proposed novel definitions: K-Point and OpenKG Token, which can be considered to be a measurement of knowledge value and user value. 1,033 knowledge contributors have been involved in two months of testing on the blockchain, and the cumulative number of on-chain recordings triggered by real knowledge consumers has reached 550,000 with an average daily peak value of more than 10,000. For the first time, we have tested and realized on-chain sharing of knowledge at entity/triple granularity level. At present, all operations on the data sets and tool sets at OpenKG.CN, as well as the triplets at OpenBase, are recorded on the chain, and corresponding value will also be generated and assigned in a trusted mode. Via this effort, OpenKG chain looks forward to providing a more credible and traceable knowledge-sharing platform for the knowledge graph community.