ABSTRACT
Recently, artificial intelligence (AI) and machine learning (ML) models have demonstrated remarkable progress, with applications developed in various domains. It is also increasingly discussed that AI and ML models and applications should be transparent, explainable, and trustworthy. Accordingly, the field of Explainable AI (XAI) is expanding rapidly. XAI holds substantial promise for improving trust and transparency in AI-based systems by explaining how complex models such as deep neural networks (DNNs) produce their outcomes. Moreover, many researchers and practitioners consider that using provenance to explain these complex models will help improve transparency in AI-based systems. In this paper, we conduct a systematic literature review of provenance, XAI, and trustworthy AI (TAI) to explain the fundamental concepts and illustrate the potential of using provenance as a medium to help accomplish explainability in AI-based systems. We also discuss the patterns of recent developments in this area and offer a vision for research in the near future. We hope this literature review will serve as a starting point for scholars and practitioners interested in learning about essential components of provenance, XAI, and TAI.
1. INTRODUCTION
Over the past decade, the rapid rise of applications in artificial intelligence (AI) has spurred discussion of explainable AI (XAI) and trustworthy AI (TAI) among data science practitioners [1]. We have seen remarkable progress in AI algorithms and facilities for high-performance computation, and applications of AI are thriving in various domains, such as virtual assistants, healthcare, autonomous vehicles, criminal justice, human resources, and environmental science. In many applications, the results generated by AI/ML models have a huge impact on human decision-making. However, existing models provide little assurance of how and why their results were obtained, which leads to growing concerns that these AI/ML models are unfair, opaque, or non-intuitive [2]. For example, ML and deep learning (DL) are the most representative technologies in AI and are widely used by data science practitioners. ML is a powerful tool that can identify patterns and examine correlations in large datasets. DL is a subset of ML that achieves great power and flexibility [3]. It uses a vast amount of labeled data and multiple layers of algorithms to imitate the neural network in our brain, with the aim of achieving human-like cognitive abilities. The most representative DL technology is the artificial neural network (ANN) or deep neural network (DNN). A DNN comprises a large number of neurons, or nodes, in each layer. These nodes are interconnected in a complex manner and activate in multiple combinations at each layer. However, it is hard to discern how this complex network works and derives its output, which leads to the “black-box” problem [4]. Although these models perform complex computational tasks with high predictive accuracy, we need to ensure that the steps, workflows, and results of these models are transparent, interpretable, unbiased, and trustworthy. One approach to increasing transparency is to explain these complex models through XAI, which in turn is a feasible step toward building TAI [5].
The goal of XAI is to provide algorithmic transparency that can be understood by the average human being [6]. XAI helps to answer questions such as how the system made certain predictions, why the system fails, or what biases are present in the system or data [7, 8]. However, not all AI applications need explanation. Some practitioners and academics have argued that explaining a black-box model is difficult to achieve or perhaps unnecessary; instead, they suggested that these models should be designed to be inherently interpretable [9, 10, 11]. This approach is highly debatable, as in many applications the most accurate predictions are provided by complex ML models. Some ML models, such as rule-based learning, K-nearest neighbor, and linear regression, have high interpretability, and their workflows are easy to understand. However, many other AI models, such as DNNs, support vector machines (SVMs), and Bayesian models, have complex structures and workflows, which are mysterious to outside observers. On some occasions, even the programmers of these models cannot explain why a model behaves in a certain way and generates a specific output. With the growing use of AI applications in every aspect of our modern life, there is also an increased risk of unanticipated behavior. The danger lies in making and acting on decisions that are not justifiable or legitimate, or that simply do not allow detailed explanations of their behavior. In that sense, XAI and TAI can reveal the strengths, limitations, and/or weaknesses of AI/ML models. They are also an important means of establishing user engagement and trust in AI applications.
The technical approaches for XAI and TAI are under rapid development, and some researchers have highlighted provenance as an evolving means of explaining AI-based systems [12, 13, 14]. Provenance answers the questions of who, what, when, and where by documenting the entities, agents, and activities involved at each step of a process. By providing transparency, documented provenance helps trace back the origin of data, demonstrate the steps of data processing, and determine the trustworthiness of results [15, 16, 17]. Given the non-intuitive nature of many AI/ML algorithms, tracking provenance in AI/ML workflows is helpful, since it is an effective technique for highlighting significant components in the process and allows scientists to understand how a result was obtained [18]. To achieve repeatability and comparability in AI/ML experiments, one must first understand the metadata and, most importantly, the provenance of the artifacts in the ML process [19]. Very recently, [20] also suggested that data provenance supports fairness, accountability, transparency, and explainability (FATE) in AI/ML algorithms and enables trust. Several other researchers [12, 13, 14] suggested that provenance documentation is an emerging approach toward XAI and TAI. Nevertheless, the work in this field is still limited, and there is no systematic discussion or road map for those topics in multi-disciplinary data science.
We anticipate that provenance documentation is an important factor in building XAI and TAI, as it not only provides metadata of a workflow but also confirms the authenticity and reproducibility of results. This paper aims to conduct a literature review of existing research on XAI, TAI, and provenance, with a focus on their applications in data science. We started our literature search by scrutinizing academic papers from Scopus, as it is one of the largest and most reliable literature databases for scientific research. The search was conducted based on keywords to select papers. To broaden the results, we used generic search strings such as “explainable ai”, “trustworthy ai”, “artificial intelligence”, “explainable artificial intelligence”, “machine learning”, and “provenance”. Our objective was to focus on recent advances; therefore, we restricted our search to the period from 2010 to 2020. We followed the standard systematic literature review method with backward and forward snowballing strategies [21]. The snowballing strategy uses a paper's reference list or the papers citing it to identify additional papers. The gathered papers were then scanned based on title, abstract, and keywords to verify whether the reported work covers XAI, TAI, and provenance. We did not aim to survey all research papers. Instead, we selected papers based on two criteria: 1) a higher level of citation, and 2) high quality, including good coverage and technical depth in the field. Irrelevant articles were excluded, and the remaining articles were examined in detail to understand whether they provide enough information about the proposed methodology, technical approaches, and results. In addition to the literature found on Scopus, in the review and discussion we also incorporated a number of other publications that deliver good definitions of fundamental concepts and illustrate successful applications.
The structure of this article is organized as follows. Section 2 introduces the concepts of TAI and XAI. Section 3 uses bibliometric analysis to illustrate the latest work in the fields of provenance, XAI, and TAI and demonstrates the interconnections between them. Section 4 explains how provenance documentation plays a fundamental role in TAI and XAI by analyzing their relationships on a more detailed level. Section 5 discusses a few potential research directions of provenance, TAI, and XAI in the next decade. Finally, Section 6 concludes the paper.
2. FUNDAMENTAL CONCEPTS OF XAI AND TAI
2.1 Background of Explainability and Trustworthiness in AI
AI/ML models have achieved rapid progress and worldwide adoption, and many of them can be seen on our streets and in our homes. However, despite the successful AI applications, we still lack a scientific understanding of their workflows. To gain more benefit from these AI-based systems, they first need to explain to humans why they made a certain decision and which important features they considered in the process [5, 22, 23]. There are numerous reasons why these systems should be understandable, interpretable, and explainable. Explanation will not only build human trust but also give confidence that the system works well. In recent years there have been several controversies where the outcomes generated by AI/ML models were biased or discriminatory [24, 25]. These models have become so dominant that they raise concerns about the future of humanity and demand an explanation. For example, in 2016 Microsoft launched a Twitter bot called “Tay”, which was designed to entertain and engage people. In less than 24 hours, Tay's talk devolved into racist and offensive comments, forcing Microsoft to take it offline [26, 27]. There were even life-threatening incidents caused by AI. In 2015, a self-driving Tesla was involved in a deadly accident in China when it was in autopilot mode and failed to identify a road-sweeping truck [28]. In another incident reported in 2018, a self-driving Uber killed a woman in Arizona. It turned out that the car's software had no capability to classify an object as a pedestrian unless that object was near a crosswalk [29, 30]. The IBM Watson system once failed to recommend correct treatments for cancer patients [31]. Also, Amazon's AI recruiting tool displayed a gender bias. The tool was trained to screen applicants by looking for patterns in applications submitted to the company. The majority of the submissions were from male candidates, reflecting male dominance in the tech industry. Accordingly, the AI recruiting tool trained itself that male candidates were preferable, which eventually led to gender inequality in its recommendations [32]. There are several more examples in the literature where AI-based systems malfunctioned (e.g., [5, 33]). Accordingly, there is a growing need for tools to check vulnerabilities and flaws in AI-based systems, as well as to help developers and users understand why the machine makes a certain decision.
The basic principle of TAI is to build AI-based systems that are lawful, ethical, and robust, to ensure that humans can rely on them [34, 35, 36]. A key to establishing TAI is XAI, which refers to the series of frameworks and techniques used to ensure that the results generated by AI-based systems are easily understandable and interpretable by humans [37]. Explainability plays a crucial role in achieving trust and transparency in AI algorithms. To improve explainability, data science practitioners have developed many approaches and strategic plans for XAI. For example, the National Academies of Sciences and the Royal Society organized a forum in 2017, which reported that trust, transparency, interpretability, and fairness are the most significant societal challenges in AI-based systems [38]. Simultaneously, the Defense Advanced Research Projects Agency (DARPA) funded the “Explainable AI (XAI) Program” to improve the explainability of AI results [39]. Also, in July 2017, “The New Generation Artificial Intelligence Development Plan” was sanctioned by China's State Council to encourage explainability and extensibility [40]. In May 2018, the European Parliament enacted the General Data Protection Regulation (GDPR), awarding citizens a “Right to Explanation” in cases where their activities are affected by AI [41]. Soon after, in June 2018, a High-Level Expert Group on AI (HLEG) was set up by the European Commission to design the guidelines for TAI [42]. The government of Finland published a final report on Finland's artificial intelligence programs in June 2019 in order to position Finland as a leader in the application of AI [43]. To encourage public trust and promote the use of AI in the federal government, the White House signed an executive order on TAI in December 2020 [44]. Along with those efforts, the topics of XAI and TAI have received great attention in the academic, industrial, and governmental sectors.
Very recently, [45] outlined research agendas that combine the concepts of trustworthy computing, AI, and formal methods for ensuring trustworthiness. In her view, the previous discussion on trustworthy computing covers a set of topics: reliability, safety, security, privacy, availability, and usability. AI/ML systems, especially DL models, add a dimension of complexity to traditional computing systems and raise more topics of interest, such as accuracy, robustness, fairness, accountability, transparency, interpretability/explainability, ethics, and more. She also pointed out that although the ML community takes accuracy as a gold standard, XAI and TAI will require trade-offs among the topics mentioned above. In recent years, XAI and TAI topics have also been increasingly discussed in workshops and conferences. For instance, the Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) conference series is a unique venue for those topics [46]. The records of search queries and publications also reflect the increasing attention to XAI and TAI. The graph in Figure 1 shows the popularity of keywords on Google Trends from 01/2017 to 12/2020. For the same period, we found 772 publications on Scopus whose title, abstract, or keywords refer to XAI or TAI. Figure 2 shows the distribution of those publications in each year.
Figure 1. Interest over time (01/01/2017 - 12/31/2020) for the terms “Explainable AI” and “Trustworthy AI” in search queries as shown in Google Trends. The value on the vertical axis is a normalized measure of a topic's popularity among all searches on all topics.
Figure 2. Distribution of publications (01/01/2017 - 12/31/2020) whose title, abstract, or keywords include “Explainable AI” or “Trustworthy AI”. This query was used to extract the results from Scopus: (TITLE-ABS-KEY (“Explainable AI”) OR TITLE-ABS-KEY (“Trustworthy AI”)) AND PUBYEAR > 2016 AND PUBYEAR < 2021. The query was conducted on Aug 1st, 2021.
2.2 Technical Approaches for XAI and TAI
There have been several advances in explanation methods and strategies to make AI-based systems more ethical, transparent, and explainable [47]. In particular, there have been many discussions on technical approaches to enable XAI and TAI in ML models. ML models can be classified into two types: transparent and opaque [48]. Transparent ML models are recognized as understandable and capable of explaining themselves to some degree, such as logistic/linear regression, decision trees, k-nearest neighbors, and Bayesian models [8, 49]. These models fit well when the primary dataset is not complex. In contrast, opaque ML models are “black-box” in nature, making them complex and tricky to understand. Despite obtaining high predictive accuracy, they lack explainability or interpretability of how the results are generated [5, 22]. Convolutional neural networks (CNNs), recurrent neural networks (RNNs), support vector machines (SVMs), and random forests (RFs) are algorithms that fall under opaque models. For instance, RF was introduced as a technique to improve the accuracy of a single decision tree, which by itself can be treated as a ‘transparent’ model but often suffers from overfitting and poor generalization. To address this issue, RF combines multiple trees, in which each individual tree is trained on a different part of the training dataset and captures different characteristics, to calculate the final outcome. This whole process is far more challenging to explain and less interpretable than a single tree, forcing the user to apply post-hoc explainability approaches to gain more insights from it [48, 50]. A post-hoc explainability approach is often employed to extract information about what the model has learned [7]. That is, when an ML model is unable to explain its intricate method, a separate model is applied to provide an explanation. Post-hoc explainability is categorized into two different techniques: model-agnostic and model-specific [23]. A model-agnostic technique can be applied to any type of ML model, no matter how complex. For instance, model-agnostic techniques such as Local Interpretable Model-agnostic Explanations (LIME) [6] and SHapley Additive exPlanations (SHAP) [51] are widely used to explain DL models. A model-specific technique, in contrast, is only applicable to a single model or a class of models; Tree SHAP (TSHAP) [52] and Integrated Gradients (IG) [53] are some of the popular techniques of this kind. Compared with model-specific techniques, model-agnostic techniques are more flexible [6]. Figure 3 depicts the classification of ML models and the corresponding XAI approaches, for which we have taken motivation from [48, 50] but adapted the organizational structure to better match the topics discussed here. A minimal code sketch of a model-agnostic explanation follows the figure.
Figure 3. Classification of ML models and XAI approaches.
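To make the model-agnostic idea concrete, below is a minimal sketch of a post-hoc explanation with LIME, assuming the scikit-learn and lime Python packages are installed; the random forest classifier and the Iris dataset are illustrative choices, not tied to any specific study reviewed here.

# A hedged sketch: explain one prediction of an opaque model with LIME.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

# LIME treats the model as a black box: it only needs a prediction function.
explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode="classification",
)

# Fit a simple, interpretable surrogate model around a single instance.
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=4
)
print(explanation.as_list())  # (feature, weight) pairs for this instance

Because LIME only queries the prediction function, the same sketch would work unchanged for an SVM or a DNN, which illustrates why model-agnostic techniques are considered more flexible than model-specific ones.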
Although these XAI approaches can generate results to explain an ML model, much metadata and context information is still missing. To increase transparency and explainability in AI-based systems, provenance documentation can serve as a complementary technology to the existing XAI approaches [13, 47]. Provenance documentation shows promise in increasing transparency as it can be used for many purposes, such as understanding how data were collected, determining ownership and rights, tracing steps in data analysis, and making judgments about resources to use. Section 3 presents a detailed bibliometric analysis to demonstrate how provenance, XAI, and TAI are interconnected.
3. PROVENANCE, XAI, AND TAI: BIBLIOMETRIC ANALYSIS FROM DIFFERENT ASPECTS
Bibliometric analysis is an effective way to measure the influence of publications in a research area. Our objective for the bibliometric analysis is to demonstrate evidence of how provenance, XAI, and TAI are interconnected in the literature. To collect the appropriate literature, we compared several databases, such as Google Scholar, PubMed, Web of Science, and Scopus. Although Google Scholar can provide diversified literature, it lacks quality control, which makes it inefficient for publication search and analysis. In our work, we decided to focus only on the Scopus database, as it provides wide coverage of literature from all major disciplines and all records are organized with good quality measures. A number of terms were used to query the title, abstract, or keywords of publications. As the query script (see below) shows, besides “provenance”, we required at least one of the other search terms to be present in the title, abstract, or keywords of a publication. The query was executed in Scopus on August 30th, 2021, and a total of 426 publications between 01/2010 and 12/2020 were found.
Query:
(TITLE-ABS-KEY (machine AND learning)
OR TITLE-ABS-KEY (explainable AND ai)
OR TITLE-ABS-KEY (trustworthy AND ai)
OR TITLE-ABS-KEY (artificial AND intelligence)
OR TITLE-ABS-KEY (explainable AND artificial AND intelligence)
AND TITLE-ABS-KEY (provenance))
AND PUBYEAR > 2009 AND PUBYEAR < 2021
To analyze the results, we used two tools: Bibliometrix and VOS Viewer. Bibliometrix is an open-source tool developed in the R environment for quantitative research, including all the key bibliometric methods of analysis. It allows importing bibliographic data directly from Scopus and other databases. Besides the general bibliometric analysis functions, other measures such as co-citation, coupling, and co-word analysis are also enabled [54]. VOS Viewer is a software tool for constructing and visualizing bibliometric networks of, for example, authors, journals, and/or individual publications. More sophisticated conditions, such as co-occurrences of words or co-citation based on authors, can also be used in the network construction [55]. Below are the results generated in our analysis of the 426 publications found on Scopus.
Figure 4. Annual number of publications among the 426 records retrieved from Scopus.
Figure 5. Line graph representing the cumulative appearance of authors’ keywords among the 426 publications.
Analysis by timeline of publications: The line graph in Figure 4 shows the number of publications per year from 2010 to 2020. An interesting pattern is the exponential growth in publications since 2016, showing that studies related to XAI, TAI, and provenance have received increasing attention in recent years. Figure 5 is a word growth graph, which shows the cumulative appearance of authors’ keywords (i.e., keywords given by authors in a publication) over time among the 426 publications. While it overall shows a trend similar to Figure 4, it is noteworthy that artificial intelligence, machine learning, learning systems, and provenance stand out as the most predominant among all the authors’ keywords.
Analysis by subject keywords of references: The references cited by a publication are also a good way to reflect the subject of the publication itself. Keyword Plus collects words or phrases from the titles of a publication's references, which provides greater depth and variety for bibliometric analysis [56]. With the Keyword Plus data of the 426 publications retrieved from Scopus, we created a word cloud to visualize the frequency of keywords (Figure 6). The bigger a word or phrase appears in the word cloud, the more often it appears in the Keyword Plus data. Machine learning, learning systems, provenance, data provenance, semantics, and metadata are the most prominent terms standing out in the figure.
Figure 6. A word cloud illustrating the most frequent keywords in the Keyword Plus data of the 426 publications.
Analysis by subject area and document type: Another advantage of Scopus data is that it shows the disciplinary background of the publications. The pie chart in Figure 7 illustrates the proportions of different disciplines among the 426 publications. It is clear that most publications are in the fields of computer science and mathematics. Also, it is interesting to see that about a quarter of the publications have a background in other disciplines, such as engineering, decision science, and Earth and planetary sciences, which means XAI, TAI, and provenance have also received attention in those disciplines. The donut chart in Figure 8 represents the proportions of document types: conference papers account for more than half of the 426 publications and journal articles for about a quarter.
Analysis by co-relationship of authors’ keywords: The co-occurrence of authors’ keywords shows how different research topics are related to each other in a publication. For all the authors’ keywords in the 426 publications from Scopus, we first ranked them by frequency of appearance. Then, we took the top 15 keywords in the list and used VOS Viewer to draw a co-occurrence graph (Figure 9). In the figure, the size of each node represents the frequency of appearance of the corresponding keyword. The 15 keywords are divided into four clusters based on their interconnections, and their frequency of co-occurrence is reflected in the thickness of the lines between the nodes. Among the 15 keywords and four clusters, provenance and machine learning have the highest appearances. They are closely interconnected with each other and also co-occur with a large number of other keywords.
Figure 7. Proportions of disciplines among the 426 publications.
Figure 8. Document types among the 426 publications.
Figure 9. Co-occurrence of authors’ keywords among the 426 publications. Here only the top 15 keywords with the highest frequency of appearance are shown.
4. A REFLECTION ON THE RELATIONSHIP BETWEEN PROVENANCE, XAI, AND TAI
4.1 Increasing Attention and Community Works on Standards for Provenance Documentation
The bibliometric analysis in the above section shows an increasing trend of research on provenance, XAI, and TAI. This subsection incorporates the review of a number of other publications to demonstrate their inter-relationships at a finer scale. Experts and researchers are interested in capturing provenance for several reasons, among which the most important is that well-documented provenance confirms the authenticity of scientific outputs [57]. In its literal meaning, provenance is the origin or history of something [58]. Some researchers [13] noted that provenance can be understood as a subset of metadata. We would add that provenance not only presents the metadata of various objects in a workflow but also the interrelationships between them, showing the history of derivation [59]. According to the PROV Family of Documents of the World Wide Web Consortium (W3C), provenance is described as “information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness” [60, 61]. As such, provenance can answer questions such as what the quality of the data is, what the data source is, when the data were created, what steps were involved in creating a result, what steps a model used in the data analysis, and who developed and/or ran the workflow [62, 63].
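As a minimal illustration of these concepts, the following sketch builds a small PROV document with the Python prov package (the ProvPy library discussed in Section 4.2); the namespace and the machine-learning workflow names are our own illustrative assumptions, not drawn from a specific system.

from datetime import datetime
from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

# Who: the agent responsible for the work.
scientist = doc.agent("ex:alice")
# What: the entities consumed and produced.
dataset = doc.entity("ex:training-dataset")
model = doc.entity("ex:trained-model")
# When: the activity that links them, with its time span.
training = doc.activity("ex:model-training",
                        startTime=datetime(2020, 6, 1, 9, 0),
                        endTime=datetime(2020, 6, 1, 11, 30))

doc.used(training, dataset)                 # the activity used the dataset
doc.wasGeneratedBy(model, training)         # the model came from the activity
doc.wasAssociatedWith(training, scientist)  # the agent ran the activity
doc.wasAttributedTo(model, scientist)       # and is credited for the model

print(doc.get_provn())  # human-readable PROV-N serialization

Each statement corresponds to one of the questions above, and the resulting record can later be queried to assess the quality and trustworthiness of the trained model.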
As AI continues to expand with more diverse information, the need for documenting provenance also increases. AI systems need to include provenance as it enables trust and provides users with tools that allow them to access, record, and further investigate resources and steps in a workflow [64]. The Association for Computing Machinery (ACM) Policy Council set out principles for transparency and accountability, in which data provenance is one of the key principles [65]. Although their comments address generic transparency and accountability, their approaches and methods are also insightful for the work of XAI and TAI. [66] stated that regular supervision is necessary for AI-based systems as they can cause harm to many people by generating biased or discriminatory results. Even if the predictions generated by AI/ML models deliver high accuracy, it is crucial to know their very roots before concluding any decision, especially in critical domains involving human activities [67, 68]. In [13] it was stated that adopting the established methods from the field of provenance to describe ML models will lead to more transparent AI-based systems. A few other researchers also discussed how provenance can increase the reproducibility of ML models [69, 70, 71]. Recently, the research in [72] described how blockchain allows users to trace the provenance of training models, resulting in more transparent and fair AI-based systems. For example, users will be able to discover biases or unclear sourcing of data and see what exactly led to an action or decision made by an AI-based system. Several other researchers have also proposed that provenance is essential to hold AI-based systems to the same standards of accountability as humans [2, 73]. Based on a review of those publications, in Figure 10 we present the research topics involved in provenance, XAI, and TAI, and illustrate the overlapping parts.
Figure 10. The similarity of topics involved in provenance, XAI, and TAI.
There are many existing models, languages, and tools designed and developed by researchers to enable provenance documentation, and some are developed specifically for AI/ML models. The W3C PROV Ontology (PROV-O) is a representation of the PROV Data Model (PROV-DM) using the Web Ontology Language 2 (OWL2) [74]. It allows creating new classes and properties and exchanging provenance information generated by different systems. ProvStore is the first online public provenance repository supporting the standards of W3C PROV. It allows users to store, access, integrate, share, organize, visualize, and export provenance documents in various formats, such as PROV-N, JSON, Turtle, and XML [75]. There are also tools supporting the validation and browsing of provenance documents. ProvValidator is an online tool for validating provenance documents, ensuring that the documents have consistent history and are safe to use for analysis [76]. Prov Viewer is a visualization tool that allows users to explore provenance data through zooming, collapsing, and filtering, providing different levels of granularity in the analysis [77]. For workflow platforms and AI/ML models, there are also ongoing activities on specific standards and tools for provenance documentation. The Common Workflow Language (CWL) is a standard designed to provide specifications and semantics for workflows and tools in data-intensive science. The goal is to make scientific results portable and scalable across software and hardware environments, and thus support reproducibility [78]. OpenML is an online platform that allows machine learning researchers to share code and results (e.g., models, predictions, and evaluations) and organize them in an effective way for easy access [79]. ModelDB is an open-source end-to-end system for the management of ML models and has libraries available for Scikit-Learn and Spark ML. It allows data scientists to perform experiments and build ML models while metadata such as pre-processing steps, hyperparameters, quality metrics, and training artifacts are automatically captured in the background. ModelDB uses a relational database to store all the extracted metadata and a branching model to track each model's history over time [80].
4.2 Real-world Practices of Provenance Documentation and the Support to XAI and TAI
In real-world practice, the scope of provenance differs from user to user and also depends on the research needs and technologies used [58, 81, 82]. To formalize provenance documentation, [83] organized the characteristics of provenance models into several categories, such as content, management, and use. The purpose is to support engineers in categorizing the components and dimensions according to the functionality they are involved in. The W3C PROV is a set of documents that defines various aspects necessary to achieve, exchange, and make use of provenance information amongst diverse environments [60]. For example, the PROV-DM is structured in six components: 1) entities and activities, and the time at which they were created, used, or ended; 2) derivations of entities from other entities; 3) agents bearing responsibility for entities that were generated and activities that happened; 4) a notion of the bundle as a mechanism to support the provenance of provenance; 5) properties to link entities that refer to the same thing; and 6) collections forming a logical structure for their members [84]. Those models, categories, and guidelines are further adapted to match needs in real-world applications. For instance, [85] attempted to build a large-scale provenance model for an eScience experiment, enabling provenance to be made available as metadata. [86] presented a unique approach for analyzing and tracking provenance collected from scripts; this tool helps scientists record, reproduce, and compare all information and supports decision-making. [87] proposed a provenance network analysis method that applies ML techniques to network metrics to generate provenance information automatically from application data/logs. To provide end-users with sufficient information on the decisions made by AI-based systems, [17] proposed a six-W framework (which, what, who, where, when, and why).
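As a small hedged sketch of component 4 above, the prov package (introduced in the example in Section 4.1) also supports bundles, so a provenance record can itself be given provenance; the identifiers here are hypothetical.

from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

# A bundle is a named set of provenance statements.
bundle = doc.bundle("ex:run-42-provenance")
bundle.entity("ex:trained-model")
bundle.wasAttributedTo("ex:trained-model", "ex:alice")

# The bundle itself can have provenance: who recorded these statements.
doc.agent("ex:provenance-recorder")
doc.wasAttributedTo("ex:run-42-provenance", "ex:provenance-recorder")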
There have been many successful applications of provenance documentation in recent years, and some of them show good performance with AI/ML models in workflow platforms. Renku is an open online platform that can track every version of data, code, and results, and help researchers evaluate, reproduce, and reuse data and algorithms [88]. WholeTale is a similar platform that enables reproducibility by allowing researchers to capture and share data, code, and the workflow environment in research [89]. The work reported in [90, 91] adapted PROV-O in an ontology to capture provenance of workflows in global change research. Based on those earlier works of provenance documentation, [92] developed an experiment to capture fine-granular provenance of workflows in Jupyter. The work in [93] proposed a lightweight system that allows storage, extraction, and management of provenance and metadata from ML experiments. Datasets, models, predictions, evaluations, hyperparameters of the models, schemas of the dataset, and the layout of a deep neural network are some of the common artifacts that can be captured. The research of [94] designed a visual analytics system named “explAIner”, which allows users to understand all steps of an ML model, diagnose its limitations using XAI methods, and then refine and optimize the model. In [95] a guideline provenance ontology (G-Prov) was developed, with the intent to represent provenance of treatments at different granularity levels and share the information with healthcare practitioners. Provenance of scientific workflows has been a long-term concern in research [70]. Recently, with the wide usage of Jupyter and RMarkdown in different scientific disciplines, there has also been solid progress on provenance documentation in workflow platforms. For instance, in [96] a tool named ProvBook was designed, which captures and stores the provenance of a notebook in Jupyter and allows users to compare results. ProvPy is a Python library with an implementation of the W3C PROV-DM. It allows importing and exporting provenance information in different formats, such as PROV-JSON and PROV-XML [97].
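Continuing the hedged sketch from Section 4.1, exchanging a PROV document between systems with ProvPy can look like the following; the file names are illustrative.

# Serialize the document `doc` built earlier to PROV-JSON and PROV-XML.
with open("training-provenance.json", "w") as f:
    doc.serialize(f)                # PROV-JSON is the default format
with open("training-provenance.xml", "w") as f:
    doc.serialize(f, format="xml")  # PROV-XML

# Round-trip: a document serialized on one system can be restored on another.
from prov.model import ProvDocument
with open("training-provenance.json") as f:
    restored = ProvDocument.deserialize(f)
print(restored == doc)              # True: both hold the same records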
Some recent projects also leverage the technical advances in semantics, data visualization, and cloud computing. For example, MetaClip (METAdata for CLImate Products) [98] develops vocabularies and an R package to capture the provenance of climate research in PROV-O format. The provenance is recorded in JSON-LD format and appended inside the image file of a climate research output. An interactive web portal can then load the image, read the provenance information, and visualize it as a graph. The nodes and edges in the graph are interactive, allowing an end user to click and browse the detailed attributes. Another example is Geoweaver [99, 100], an open-source and cloud-based application that allows AI practitioners in earth science to integrate, write, and share workflows. In the cloud-based environment, other users can easily find and trace shared workflows of interest and replicate the code in their own work.
5. A VISION ON THE TRENDS OF PROVENANCE, XAI, AND TAI IN THE NEXT DECADE
It is evident that provenance can help us address issues associated with transparency, explainability, accountability, and authenticity in XAI and TAI. The above bibliometric analysis and reflection highlighted many existing studies, and we believe there will be more advancement in the joint research of XAI, TAI, and provenance in the coming years. Below is a list of our thoughts on future work.
Although AI/ML models have made profound advances, many of them are still deficient in preventing biased and discriminatory results. Biases might be caused by many factors, such as incomplete data, poor data labeling, adversarial manipulation, missed steps in an ML model, or a workflow guided by a bad hypothesis. Adopting provenance methods will lead to more traceability and transparency in AI applications. A comprehensive description of methods, models, algorithms, and data should be recorded so that they can be further reviewed. Rigorous validation and testing should be done on AI/ML models, and those test results should also be well documented. These steps in provenance documentation can help researchers build explainable and trustworthy systems. Even though documented provenance cannot immediately determine the cause of a bias or error, the complete information can support researchers in tracing all components in the workflow to find the likely cause.
As data are the primary source of any results generated by an AI-based system, studies of XAI and TAI can benefit from many existing mature technologies of metadata and data provenance. Data are suspect when their origin cannot be verified. If a company bases an important decision on data that are not traceable, then this decision is not reliable and will raise concerns among users. Provenance provides the flexibility of documenting data at every single step in a data science workflow, from data collection, data cleansing, and data analysis, through derived data, to the final result. The documented data provenance will be a solid component for XAI and TAI in AI-based systems.
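As a brief hedged sketch of such step-by-step data provenance, again with the prov package and hypothetical stage names:

from prov.model import ProvDocument

doc = ProvDocument()
doc.add_namespace("ex", "http://example.org/")

raw = doc.entity("ex:raw-data")
cleaned = doc.entity("ex:cleaned-data")
result = doc.entity("ex:final-result")

# Record each stage as a derivation, so the final result can be traced
# back through every intermediate product to its origin.
doc.wasDerivedFrom(cleaned, raw, activity=doc.activity("ex:data-cleansing"))
doc.wasDerivedFrom(result, cleaned, activity=doc.activity("ex:data-analysis"))

A result whose chain of derivations is broken, or whose origin entity cannot be verified, would then be flagged as suspect before it informs a decision.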
The granularity of provenance (i.e., level of detail) depends on real-world needs. It is crucial to understand that different stakeholders have different requirements for the detail of provenance in AI/ML models. Not all people are interested in detailed workflow documentation, while some critical domains such as healthcare, government, and criminal justice require diligent information, as the results generated by AI/ML models can have a serious impact on human life, the environment, and/or policy making. For AI-based systems, there should be a detailed user survey to clarify the needs of stakeholders before the functions for provenance documentation are developed.
More automated technologies and tools should be developed for recording and sharing provenance information of AI-based systems. We need efficient tools to document provenance and a better digitized environment to archive, share, and distribute the provenance information to a broad community. Those tools should document provenance in standard structures and make the information accessible and queryable. In particular, we hope packages can be developed for popular workflow platforms such as Jupyter and RMarkdown to automatically document provenance. Several recent studies mentioned in Section 4 have already made solid progress in that direction. Once those packages are in place, there can be many adoptions and adaptations in various scientific domains.
Moreover, we need to understand XAI and TAI as a socio-technical issue, and we need a comprehensive approach to tackle the issue from both social and technical aspects. The GDPR released by the European Parliament is a good example for understanding this topic. The GDPR introduces a standardized data protection law, aiming to create consistent protection of users’ data, and states that data cannot be used without user consent. To assist the implementation of this regulation, provenance information can be used to track all the activities, which can help clarify whether the data are used in the right way or not. In the world of AI, more work is required to increase awareness and fully establish users’ rights and obligations regarding their data.
6. CONCLUSIONS
The need for explainability in AI/ML models has attracted great attention in recent years. However, it is not sufficient to explain AI/ML models using post-hoc explanations alone. Provenance documentation is one of the means to accomplish transparency, traceability, explainability, and reproducibility in AI-based systems. This study presented a systematic literature review of recent work and advances in the fields of XAI, TAI, and provenance. First, we introduced the fundamental concepts of XAI and TAI and listed the latest discussions on these topics. Second, we analyzed the inter-relationships between XAI, TAI, and provenance through a bibliometric analysis. We specified how provenance documentation plays a crucial role in building explainability and trustworthiness in AI-based systems, and briefly introduced a few tools and platforms such as Renku, WholeTale, MetaClip, and Geoweaver. Third, we presented a vision of the trends of research on XAI, TAI, and provenance in the next decade. We hope this literature analysis highlights the importance of provenance in AI-based systems and encourages AI practitioners/researchers to start documenting provenance. We expect to see more AI/ML models become explainable, providing enough detail to the end-user, and we believe that provenance documentation will be one of the significant approaches to accomplishing that.
ACKNOWLEDGMENTS
The work was supported by the National Science Foundation under Grant No. 2019609 and the National Aeronautics and Space Administration under Grant No. 80NSSC21M0028. We thank three anonymous reviewers for their constructive comments and suggestions on an earlier version of this paper.
AUTHOR CONTRIBUTION STATEMENT
Kale, Amruta ([email protected]) and Ma, Xiaogang ([email protected]) proposed the topic for the literature review and designed the framework. Kale, Amruta ([email protected]) conducted the literature review and wrote the first draft. All co-authors contributed to the discussion and revision of the manuscript.