A deep neural network is a good task solver, but it is difficult to make sense of its operation. People have different ideas about how to interpret its operation. We look at this problem from a new perspective where the interpretation of task solving is synthesized by quantifying how much and what previously unused information is exploited in addition to the information used to solve previous tasks. First, after learning several tasks, the network acquires several information partitions related to each task. We propose that the network then learns the minimal information partition that supplements previously learned information partitions to more accurately represent the input. This extra partition is associated with unconceptualized information that has not been used in previous tasks. We identify what unconceptualized information is used and quantify its amount. To interpret how the network solves a new task, we quantify as meta-information how much information from each partition is extracted. We implement this framework with the variational information bottleneck technique. We test the framework with the MNIST and the CLEVR data sets. The framework is shown to be able to compose information partitions and synthesize experience-dependent interpretation in the form of meta-information. This system progressively improves the resolution of interpretation upon new experience by converting a part of the unconceptualized information partition to a task-related partition. It can also provide a visual interpretation by imaging the part of the previously unconceptualized information that is needed to solve a new task.
Deep neural networks (DNNs) have achieved great success in fields such as image recognition (Krizhevsky, Sutskever, & Hinton, 2017), speech recognition (Hinton et al., 2012), natural language processing (Vaswani et al., 2017), and game playing beyond human-level performance (Silver et al., 2016). DNNs, however, are notoriously black-box models. Their failure under certain circumstances, such as adversarial attacks (Goodfellow, Shlens, & Szegedy, 2014), motivates increasing research into understanding how DNNs solve tasks, that is, into model interpretation. More recent research also suggests that better model interpretation can be useful for explaining model behavior, knowledge mining, ethics, and trust (Doshi-Velez & Kim, 2017; Lipton, 2018).
Researchers have proposed different approaches to proceed with model interpretation; for example, concerning the interpretation style, the post hoc style tries to separate the model training step and model interpretation step, and the concurrent style aims simultaneously for task performance as well as interpretation (Lipton, 2018). As for the applicability of interpretation methods, the model-specific type targets a certain class of models, and with the model-agnostic type, the interpretation method does not depend on the model (Arrieta et al., 2020). Considering the scope of interpretation, global interpretation gives information about how the task is solved from a broader view, and local interpretation is more focused on certain examples or parts of the model (Doshi-Velez & Kim, 2017). There are also diverse forms of interpretation, such as the information feature (Chen, Song, Wainwright, & Jordan, 2018), the relevance feature (Bach et al., 2015), a hot spot of attention (Hudson & Manning, 2018), or gradient information (Sundararajan, Taly, & Yan, 2017). Another stream of research proposes that interpretable models are usually simple ones: for example, discrete-state models (Hou & Zhou, 2018), shallower decision trees (Freitas, 2014; Wu et al., 2017), graph models (Zhang, Cao, Shi, Wu, & Zhu, 2017), or a small number of neurons (Lechner et al., 2020). (See Arrieta et al., 2020, for a more detailed overview.)
One particular dimension of model interpretation related to our letter is how much preestablished human knowledge is needed. Methods that require high human involvement, such as interpretation with human-predefined concepts (Koh et al., 2020; Chen, Bei, & Rudin, 2020) or with large human-annotated data sets (Kim, Tapaswi, & Fidler, 2018), implicitly assume the background knowledge of an average human to make sense of the interpretation, which is hard to define rigorously. In contrast, existing human-agnostic methods translate interpretation into some measurable form, such as the depth of a decision tree (Freitas, 2014; Wu et al., 2017). However, how well this kind of measure relates to human-style interpretation is under debate.
Within the human-agnostic dimension of interpretation, we extend the discussion with two new perspectives. The first perspective starts with the simple idea that interpretation should be experience dependent. Motivated by this idea, we focus on the situation where the model learns a sequence of tasks, assuming that later tasks can be explained using earlier experiences. In other words, model interpretation in our framework is defined as meta-information describing how the information used to solve the new task is related to previous ones. The second perspective is motivated by the idea that interpretation should be able to handle out-of-experience situations. When a new task cannot be fully solved by experience, the model interpretation method should be able to report new knowledge, mimicking a human explaining what is newly learned. Using the MNIST and CLEVR data sets (Johnson et al., 2017), we demonstrate that this framework can provide insight into how later tasks are solved based on previous experience and can express ignorance when experience is not applicable.
Our work is related to the concept bottleneck model (CBM) and the concept whitening model (CWM; Koh et al., 2020; Chen et al., 2020) in the sense that meaningful interpretation of the current task depends on previously learned knowledge. However, these methods do not provide a reasonable interpretation when the human-defined concepts alone are insufficient to solve downstream tasks (Margeloiu et al., 2021). In our framework, we add an unconceptualized region to take care of information not yet associated with tasks. Moreover, a recent study shows that contamination of the predefined feature space by concept-irrelevant information can hamper interpretation (Mahinpei et al., 2021). We use the information bottleneck (IB; Tishby, Pereira, & Bialek, 2000) as a remedy for this information leak problem. Our method also shares similarities with the variational information bottleneck for interpretation (VIBI) method (Bang, Xie, Lee, Wu, & Xing, 2019) and the multiview information bottleneck method (Wang, Boudreau, Luo, Tan, & Zhou, 2019) in that these methods use IB to obtain a minimal latent representation from previously given representations. However, unlike the multiview IB method for problem solving, the goal of our framework is to synthesize interpretation. Furthermore, our framework does so using macroscopic task-level representations, which differ from the microscopic input-level representations used in VIBI.
2 Insight into Interpretation
This section discusses the intuition behind our framework for model interpretation.
2.1 Interpretation as Meta-Information
To quantify how a new task is solved using the experience of previous tasks, we evaluate meta-information. We define meta-information as a vector of mutual information, where each element of the vector describes how much the corresponding information partition is used for the new task.
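As a minimal sketch of this definition, assume that each partition and the part distilled from it for the new task are available as discrete codes sampled on the same examples. Each entry of the meta-information vector can then be estimated with a plug-in (histogram) mutual-information estimator. The estimator choice, the function names, and the toy codes are assumptions for illustration, not the implementation used in this letter.

```python
from collections import Counter
import math


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats from paired discrete samples."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in Counter(zip(a, b)).items():
        # p(x, y) * log[p(x, y) / (p(x) p(y))], written with raw counts.
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def meta_information(partitions):
    """Return {partition name: I(distilled part; partition)}.

    `partitions` maps a hypothetical partition name to a pair
    (distilled_codes, partition_codes) sampled on the same examples.
    """
    return {name: mutual_information(d, z) for name, (d, z) in partitions.items()}


# Toy usage: the new task keeps only "is color 0" from a 4-valued color code
# and nothing from a binary material code.
z_color = [0, 1, 2, 3] * 25
z_material = [0, 1] * 50
meta = meta_information({
    "color": ([int(c == 0) for c in z_color], z_color),
    "material": ([0] * 100, z_material),
})
# meta ≈ {'color': 0.562, 'material': 0.0}
```

Because the distilled color code is a deterministic function of the partition, its entry equals the entropy of the kept indicator, while a constant distilled code contributes nothing.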
2.1.1 Interpreting Using the Right Level
In this work, a machine learns a series of different tasks. The aim is to ascribe an interpretation of how the model solves the new task based on previous experience. If we did this using low-level features, such as the intensity and color of each pixel, the task description would become complicated. Instead, we aim to give an interpretation at a more abstract level—for example, “This new task is solved by combining the knowledge about tasks 2 and 4.” To achieve this goal, information about the input is partitioned at the task level. We therefore prepare information partitions that encode useful features for each task.
2.1.2 Inducing Independence
These partitions have to satisfy certain conditions. If these information partitions are redundant, assigning meta-information becomes arbitrary, since a task can equally be solved using different partitions (Wibral, Priesemann, Kay, Lizier, & Phillips, 2017). Therefore, inducing independence among partitions is preferred for obtaining unambiguous meta-information. Useful methods for this are widely available in machine learning, such as independent component analysis (Bell & Sejnowski, 1995; Hyvärinen & Oja, 2000) and variational autoencoders (Kingma & Welling, 2013).
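A toy example with made-up binary variables illustrates the arbitrariness caused by redundancy. When two hypothetical partitions hold copies of the same bit, the label is explained equally well by either one alone or by both together, and the independence requirement is violated because their mutual information is positive.

```python
from collections import Counter
import math
import random


def entropy(xs):
    """Plug-in Shannon entropy in nats."""
    n = len(xs)
    return -sum(c / n * math.log(c / n) for c in Counter(xs).values())


def mutual_information(a, b):
    """I(A; B) = H(A) + H(B) - H(A, B), plug-in estimate in nats."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


random.seed(0)
z_a = [random.randint(0, 1) for _ in range(1000)]  # hypothetical partition A
z_b = list(z_a)   # partition B: a redundant copy of A
y = list(z_a)     # new-task label that depends only on the shared bit

redundancy = mutual_information(z_a, z_b)  # I(Z_a; Z_b) = H(Z_a) > 0
info_from_a = mutual_information(y, z_a)
info_from_b = mutual_information(y, z_b)
info_from_both = mutual_information(y, list(zip(z_a, z_b)))
# All three task informations are equal: the label can be attributed to A,
# to B, or to any mixture of the two, so the meta-information is ambiguous.
```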
2.1.3 Meaning Assignment
We have defined meta-information as a vector of Shannon information measured in bits (i.e., how much each information partition is used). Although the number of bits itself has no meaning, each entry of the vector is linked to a corresponding task. Hence, the meta-information can be mapped to the relevance of previous tasks.
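Since the CLEVR results in section 5.2 are reported in nats rather than bits, a small helper that converts the units and keeps each entry attached to its task may be handy. The numbers below are the color and material entries reported for the "green metal" question in section 5.2.1; the dictionary layout and helper name are illustrative assumptions.

```python
import math

NATS_PER_BIT = math.log(2)  # 1 bit = ln 2 nats, about 0.693 nats


def nats_to_bits(nats):
    """Convert an information quantity from nats to bits."""
    return nats / NATS_PER_BIT


# Meta-information for "choose the picture with green metal" (section 5.2.1, in nats).
meta_nats = {"color": 0.345, "material": 0.686}

# Each entry stays linked to the task whose partition it measures.
meta_bits = {task: round(nats_to_bits(v), 3) for task, v in meta_nats.items()}
# meta_bits == {'color': 0.498, 'material': 0.99}
```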
2.2 Progressive Nature of Interpretation
2.2.1 Progressive Interpretation
One important but usually ignored property of interpretation is that we interpret based on experience (National Research Council, 2002; Bada & Olusegun, 2015). Progressively learning multiple tasks is not a rare setting in machine learning (Andreas, Rohrbach, Darrell, & Klein, 2016; Rusu et al., 2016; Parisi, Kemker, Part, Kanan, & Wermter, 2019), where it is usually referred to as “lifelong learning,” “sequential learning,” or “incremental learning.” However, these studies usually focus on avoiding catastrophic forgetting and do not investigate how progressiveness contributes to interpretation. Kim et al. (2018), for example, point out that interpretability emerges when lower-level modules are progressively made use of. We propose that interpretation should be synthesized in a progressive manner, where the model behavior is interpreted by how much the current task is related to previously experienced tasks.
2.2.2 Knowing You Don't Know
An experience-based progressive interpretation framework may inevitably encounter situations in which its previous experience does not help interpret the current task. To handle this, we introduce an unconceptualized partition that stores information not yet included in the existing information partitions. We noticed that this unconceptualized partition generates a “knowing you don't know” type of interpretation, a metacognitive ability that allows a person to reflect on their knowledge, including what they don't know (Glucksberg & McCloskey, 1981). Accordingly, the framework should be able to express “knowing you don't know” when faced with out-of-experience tasks.
We now formalize our insights in the language of information theory in the following sections.
3 The Progressive Interpretation Framework
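Assume we have a model with stochastic input $X$, which is statistically the same regardless of the task. Task $i$ is defined as predicting a series of stochastic labels $Y_i$, and its corresponding internal representation is $Z_i$. The progressive interpretation framework is formalized iteratively as follows:
1. Assume that after task $n$, a model has a minimal internal representation $Z = \{Z_1, Z_2, \ldots, Z_n, Z_{\text{else}}\}$ that encodes input $X$. $Z_i$ describes the internal representation learned to solve task $i$. $Z_{\text{else}}$ describes the internal representation encoding $X$ that is not yet used to solve any task. The optimization in the ideal case yields independence among the previous task-relevant partitions:
   $$I(Z_i; Z_j) = 0, \quad i \neq j, \; i, j \in [1, n] \cup \{\text{else}\}.$$
   Here, we define the notation $[1, n]$ to be $\{1, 2, \ldots, n\}$.
2. Then the model is faced with the new task $n+1$ and learns to predict $Y_{n+1}$. After learning $Y_{n+1}$, the model distills the necessary part $Z_{(i \cap n+1)}$ from each partition $Z_i$, $i \in [1, n] \cup \{\text{else}\}$, for solving task $n+1$. This is achieved by minimizing
   $$\sum_{i \in [1, n] \cup \{\text{else}\}} I\big(Z_{(i \cap n+1)}; Z_i\big)$$
   while maintaining the best task performance, that is, by maintaining ideally all task-relevant information:
   $$I\Big(\bigcup_{i \in [1, n] \cup \{\text{else}\}} Z_i;\, Y_{n+1}\Big) = I\Big(\bigcup_{i \in [1, n] \cup \{\text{else}\}} Z_{(i \cap n+1)};\, Y_{n+1}\Big).$$
3. The interpretation is defined as the meta-information of how much the individual partitions $\{Z_i\}$ for previous tasks $i \in [1, n] \cup \{\text{else}\}$ are used to solve task $n+1$. Namely, the composition of the mutual information $I(Z_{(i \cap n+1)}; Z_i)$ over the different partitions $i = 1, \ldots, n, \text{else}$ is the meta-information we use to interpret the global operation of the neural network. The local interpretation for each example is then available from $\{Z_{(i \cap n+1)}\}$.
4. After task $n+1$, the model updates the representation partition by splitting $Z_{\text{else}}$ into the newly added representation $Z_{(\text{else} \cap n+1)}$ and its complement, $Z_{\text{else}} \setminus Z_{(\text{else} \cap n+1)}$. The former is then denoted as $Z_{n+1}$ and the latter as the new $Z_{\text{else}}$. The model would continue for further iteration and interpretation of the tasks.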
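The bookkeeping of the iteration above can be illustrated with a toy abstraction in which information consists of independent, named latent factors. A partition is then just a set of factors, and distilling the needed part is a set intersection. This is a hypothetical model for intuition only (the factor names are borrowed from the CLEVR experiments in section 5.2); in the actual framework, partitions are learned representations, and the amounts are mutual informations rather than factor lists.

```python
def solve_new_task(partitions, needed, task_name):
    """One pass of the progressive loop on a toy model of independent named factors.

    partitions: dict mapping partition name -> set of factor names (must include "else").
    needed: set of factors the new task depends on (hypothetical ground truth).
    task_name: hypothetical name for the partition split off from "else".
    Returns (interpretation, updated_partitions).
    """
    # Step 2: distill from each partition only the factors the new task needs.
    distilled = {name: factors & needed for name, factors in partitions.items()}
    # Step 3: interpretation = which partitions contribute, and with what.
    interpretation = {name: sorted(f) for name, f in distilled.items() if f}
    # Step 4: split "else" into the newly used part (a new partition) and its complement.
    updated = {name: set(f) for name, f in partitions.items() if name != "else"}
    if distilled["else"]:
        updated[task_name] = distilled["else"]
    updated["else"] = partitions["else"] - distilled["else"]
    return interpretation, updated


# Step 1: after task group 1, position, color, and material have their own partitions.
parts = {"position": {"position"}, "color": {"color"}, "material": {"material"},
         "else": {"shape", "size"}}
interp, parts = solve_new_task(parts, {"color", "size"}, "small_yellow")
# interp == {'color': ['color'], 'else': ['size']}
# parts["small_yellow"] == {'size'}; parts["else"] == {'shape'}
```

A nonempty "else" entry in the interpretation is the toy counterpart of "knowing you don't know": part of the task needs information that no earlier task has conceptualized.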
We are particularly interested in systems involving neural networks. Since our framework is information-theoretic, all types of neural networks are treated equally as segments of information-processing pipelines. Which type of neural network to choose depends on the specific problem.
4.1 Information Bottleneck
4.2 Task Training and Information Partition Splitting
Suppose a new model with task input $X$ learns its first task to predict label $Y_1$. It is not difficult to train a neural network for this task by optimization: $\min_{\theta_1} D\big(f_{\theta_1}(X), Y_1\big)$, where $D$ is a distance function, such as KL divergence or mean-square error, which is decided by the problem. $f_{\theta_1}$ is an encoder network parameterized by $\theta_1$. After training, we will be able to obtain the representation of task 1 as $Z_1 = f_{\theta_1^*}(X)$, where $f_{\theta_1^*}$ indicates the neural network after optimizing $\theta_1$.
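The first-task optimization can be sketched in NumPy under several assumptions: a linear-softmax model stands in for the encoder $f_{\theta_1}$, synthetic Gaussian clusters replace real images, and cross-entropy is used as $D$ (for one-hot labels it equals the KL divergence from the label distribution to the prediction). The data, learning rate, and step count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for (X, Y_1): three Gaussian clusters in 2-D (hypothetical data).
n_per_class, n_classes = 200, 3
centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
X = np.concatenate([c + rng.normal(size=(n_per_class, 2)) for c in centers])
Y1 = np.repeat(np.arange(n_classes), n_per_class)


def f(theta, x):
    """Linear-softmax stand-in for the encoder f_theta; returns class probabilities."""
    W, b = theta
    logits = x @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


# Minimize the mean cross-entropy D(f_theta(X), Y_1) by plain gradient descent.
W, b = np.zeros((2, n_classes)), np.zeros(n_classes)
onehot = np.eye(n_classes)[Y1]
lr = 0.1
for _ in range(500):
    p = f((W, b), X)
    grad_logits = (p - onehot) / len(X)  # gradient of the mean cross-entropy w.r.t. logits
    W -= lr * X.T @ grad_logits
    b -= lr * grad_logits.sum(axis=0)

Z1 = f((W, b), X)  # representation of task 1 after optimization
accuracy = (Z1.argmax(axis=1) == Y1).mean()
```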
4.3 New Task Interpretation
After getting $Z_{(i \cap n+1)}$, we can derive an interpretation as the meta-information $I(Z_{(i \cap n+1)}; Z_i)$ needed from each partition $i \in [1, n] \cup \{\text{else}\}$, as defined in section 3. We can also look into the representations $Z_{(i \cap n+1)}$ to gain insight into how task $n+1$ is solved for each example.
$Z_{(\text{else} \cap n+1)}$ is the information needed from the unconceptualized partition to solve task $n+1$. We can rewrite it as $Z_{n+1}$ and define the new unconceptualized partition as $Z_{\text{else}} \setminus Z_{n+1}$. We can then go back to step 1 and continue the iteration for task $n+2$.
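If each distilled part $Z_{(i \cap n+1)}$ is produced by a Gaussian encoder $\mathcal{N}(\mu(z_i), \sigma^2(z_i))$ with a standard normal prior, as in the standard variational information bottleneck, then the average KL divergence to the prior upper-bounds $I(Z_{(i \cap n+1)}; Z_i)$ and gives a natural per-partition readout. The sketch below assumes this setup; the encoder outputs are made-up arrays, `beta` is a hypothetical value, and this section does not state that exactly this bound is what is reported.

```python
import numpy as np


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) per example, in nats."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def vib_meta_information(encoder_outputs):
    """Variational upper bound on I(Z_(i∩n+1); Z_i) for each partition i.

    encoder_outputs: dict name -> (mu, log_var), arrays of shape (batch, dim).
    """
    return {name: float(gaussian_kl_to_standard_normal(mu, lv).mean())
            for name, (mu, lv) in encoder_outputs.items()}


def vib_loss(task_nll, encoder_outputs, beta=1e-3):
    """Task loss plus beta times the summed per-partition KL terms."""
    return task_nll + beta * sum(vib_meta_information(encoder_outputs).values())


rng = np.random.default_rng(0)
outputs = {
    # A distilled code that collapses to the prior carries 0 nats.
    "material": (np.zeros((256, 4)), np.zeros((256, 4))),
    # A distilled code that varies across examples carries information.
    "color": (rng.normal(size=(256, 4)), np.full((256, 4), -2.0)),
}
meta = vib_meta_information(outputs)
```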
5.1 MNIST Data Set
We first illustrate our progressive interpretation framework on the MNIST data set (60,000/10,000 train/test splits). We set task 1 as digit recognition. For task 2, we propose three kinds of tasks: determining whether a number is even or odd (parity task), predicting the sum of pixel intensities (ink task), or a task that involves both digit information and pixel intensity information at a certain resolution (see below). First, we train a network to perform digit recognition, and then we train an autoencoder with IB to obtain a digit-independent partition. Then we extend the network to train on a second task and obtain the interpretation from the information flow. We use a continuous latent representation in this section. (See appendix sections 1 and 2 for implementation details.)
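The parity and ink labels are simple functions of the digit label and the image. A minimal sketch, assuming images are arrays of shape (N, 28, 28) with intensities in [0, 1]; random stand-ins replace the real MNIST arrays, and the combined digit-and-intensity task is omitted because its resolution is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.random((8, 28, 28))        # stand-in for MNIST images in [0, 1]
digits = rng.integers(0, 10, size=8)    # stand-in for task-1 digit labels

parity = digits % 2                                  # parity task: 0 = even, 1 = odd
ink = images.reshape(len(images), -1).sum(axis=1)    # ink task: sum of pixel intensities
```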
5.1.1 IB Removes Task-Relevant Information from the Unconceptualized Region
5.1.2 The Framework Explains How a New Task is Solved
5.1.3 Experience-Dependence of the ELSE Partition
After learning the digit and the ink tasks, we can update the autoencoder to exclude the ink-task-related information. On the one hand, $Z_2$ (the first row of Figure 3b) represents the average pixel intensity. On the other hand, this information is suppressed in $Z_{\text{else}}$ (rows 2–5). The suppression can be measured by the feature correlation between $Z_{\text{else}}$ and the ink label $Y_2$. Before the ink task, the correlations are (0.295, 0.414, 0.080, 0.492, 0.100) for the five units visualized, but after the ink task, they become (0.030, 0.194, 0.019, 0.028, 0.001). We also present the average ink intensity versus the latent code of the five units. It can clearly be seen that before the ink task, the knowledge of average intensity is distributed across all five units. However, after the ink task, the knowledge of average intensity is extracted as $Z_2$ and removed from $Z_{\text{else}}$ (see Figure 3c). The result indicates that the unconceptualized region is experience dependent and that information about an already learned task is excluded from it. Unlike other frameworks such as the variational autoencoder (Kingma & Welling, 2013) and infoGAN (Chen et al., 2016), which usually have no explicit control over how the latent representation is partitioned, our framework allows the latent representation to be reorganized through progressive tasks.
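A per-unit correlation check of this kind can be sketched as follows, assuming "feature correlation" means the absolute Pearson correlation between each latent unit and the ink label. The latent codes are synthetic: before the ink task, the label is made to leak into all five units; afterward, it is absent.

```python
import numpy as np


def unit_label_correlation(z, y):
    """Absolute Pearson correlation between each column of z and the label y."""
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    y = (y - y.mean()) / y.std()
    return np.abs(z.T @ y) / len(y)


rng = np.random.default_rng(0)
ink = rng.normal(size=1000)                                   # hypothetical ink label
z_before = 0.5 * ink[:, None] + rng.normal(size=(1000, 5))    # ink leaks into all 5 units
z_after = rng.normal(size=(1000, 5))                          # ink removed from Z_else

corr_before = unit_label_correlation(z_before, ink)
corr_after = unit_label_correlation(z_after, ink)
```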
5.1.4 Quantitative Benchmark of Interpretation
5.2 CLEVR Data Set
In this section, we demonstrate the progressive interpretation framework on the CLEVR data set (Johnson et al., 2017), a large collection of 3D-rendered scenes (70,000/15,000 train/test splits) containing multiple objects with compositionally different properties. The CLEVR data set was originally designed for a visual question-answering task, but we train the model without using natural language. For example, we train the model to classify the color of an object or to conduct a multiple-choice (MC) task using only pictures. For the MC task, the model is trained on a large set of four-picture groups (100,000/20,000 train/test splits) and learns to choose the one picture in each group that includes a target object.
In this section, we divide the tasks into two groups. In task group 1, the model, pretrained to tell objects apart, learns to recognize three important properties (position, color, and material) out of the five properties shape, size, color, material, and position. In task group 2, the model is asked to perform an MC task selecting a picture according to a specific context, for example, “Choose the picture with red cubes,” which needs information that was or was not learned in task group 1. For task group 1, we first use convolutional neural networks (CNNs) to report the image properties by supervised learning and then obtain the unconceptualized region via autoencoding. After that, task group 2 is performed, and the interpretation is synthesized. We use a discrete latent representation in this section. (See appendix sections 1 and 2 for implementation details.)
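To make the multiple-choice format concrete, here is a hypothetical sketch of how the label of one MC example could be derived: four scenes, each a list of objects with CLEVR-style attributes, and a label giving the index of the scene that contains the target object. The scene encoding, helper names, and the requirement that exactly one scene matches are assumptions; the model itself sees only the rendered pictures.

```python
def contains_target(scene, target):
    """True if any object in the scene matches every attribute in target."""
    return any(all(obj.get(k) == v for k, v in target.items()) for obj in scene)


def mc_label(scenes, target):
    """Index of the single scene (out of four) that contains the target object."""
    matches = [i for i, s in enumerate(scenes) if contains_target(s, target)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one matching scene, got {matches}")
    return matches[0]


scenes = [
    [{"color": "blue", "material": "rubber", "shape": "cube", "size": "large"}],
    [{"color": "green", "material": "metal", "shape": "sphere", "size": "small"}],
    [{"color": "green", "material": "rubber", "shape": "cylinder", "size": "large"}],
    [{"color": "red", "material": "metal", "shape": "cube", "size": "small"}],
]
label = mc_label(scenes, {"color": "green", "material": "metal"})  # "green metal" question
```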
5.2.1 Interpretation by Information Flow
The result of interpretation by information flow is shown in Table 1. The mutual information $I(Z_{(i \cap \text{MC})}; Z_i)$ is measured in Nat per object, where MC denotes the multiple-choice task. Different rows represent different question types. We sample five random initializations of the networks for each task and report the mean and standard deviation. The theoretical amount of information required for each feature is shown in parentheses. We can interpret how the model solves the task by calculating the mutual information coming from each information partition. For example, the task to “choose the picture with green metal” needs 0.345 Nat of information from the color domain and 0.686 Nat from the material domain. As expected, information coming from other domains is judged as irrelevant to this task. If the task is to “choose the picture with a small yellow object,” the model needs 0.343 Nat from the color domain, plus 0.70 Nat of information from the unconceptualized region, since the model has not yet explicitly learned about object size. If the task is “choose the picture with a large sphere,” the model finds that all previously learned properties are useless and has to pick 0.31 Nat of information from the unconceptualized region. This is because neither size nor shape information has been used in previous tasks.
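As a hedged illustration of where a theoretical value could come from, assume that property values are uniformly distributed and that a question only needs the binary indicator "the object has value v" for each property it mentions. The information required per object from that property is then the entropy of the indicator. CLEVR has 8 colors, 2 materials, 2 sizes, and 3 shapes; the derivation behind the parenthesized values in Table 1 is not given in this section, so this is an assumption, not the authors' formula.

```python
import math


def indicator_information(n_values):
    """Entropy (nats) of the indicator 'property == v' for a property uniform over n_values."""
    p = 1.0 / n_values
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


# CLEVR property cardinalities.
CARDINALITY = {"color": 8, "material": 2, "size": 2, "shape": 3}

theory = {prop: round(indicator_information(k), 3) for prop, k in CARDINALITY.items()}
# theory == {'color': 0.377, 'material': 0.693, 'size': 0.693, 'shape': 0.637}
```

Under these assumptions, the values can be compared against the color and material entries measured above.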
Table 1: Interpretation by information flow (columns: Question Type, Position, Color, Material, Unknown, Correct rate).
Note: Information is measured in Nat/object; values inside parentheses are theoretical values.
5.2.2 Single-Example Interpretation and Unconceptualized Representation
We examine the correctness of the unconceptualized representation by comparing it with the true label. For example, if the task is “choose the small yellow object,” the unconceptualized region should represent the size “small.” We can cross-check by calculating their mutual information, which is 0.662 Nat per object. For the case “choosing a red cube,” the mutual information with the label “cube” is 0.432 Nat per object. For the case “choosing cylinder on the right side,” the mutual information with the label “cylinder” is 0.408 Nat per object. All of these numbers exceed the 90th percentile of the chance distribution, and the size value also exceeds the 99th percentile (the 99th, 95th, and 90th percentiles by chance are 0.637, 0.495, and 0.368 Nat, respectively, for balanced binary random variables such as size, and 0.583, 0.449, and 0.332 Nat for variables with three alternatives such as shape).
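Chance levels of this kind depend on the sample size and the mutual-information estimator, neither of which is specified in this section. A minimal sketch, assuming a plug-in estimator on discrete codes, estimates chance percentiles with a permutation test: shuffling the label destroys any dependence, so the shuffled mutual information shows what chance alone produces. The sample size, noise level, and shuffle count are hypothetical, so the resulting numbers will not reproduce those above.

```python
from collections import Counter
import math
import random


def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    return sum(c / n * math.log(c * n / (ca[x] * cb[y]))
               for (x, y), c in Counter(zip(a, b)).items())


def chance_percentiles(codes, labels, q=(90, 95, 99), n_shuffles=1000, seed=0):
    """Empirical percentiles of I(codes; shuffled labels), the MI expected by chance."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null.append(mutual_information(codes, shuffled))
    null.sort()
    return {p: null[min(len(null) - 1, int(p / 100 * len(null)))] for p in q}


rng = random.Random(1)
labels = [i % 2 for i in range(200)]  # balanced binary label, e.g. size
codes = [y if rng.random() < 0.9 else 1 - y for y in labels]  # hypothetical unconceptualized code
observed = mutual_information(codes, labels)
pct = chance_percentiles(codes, labels)
```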
5.2.3 Visualizing the Unconceptualized Representation
After getting the unconceptualized representation useful for the new task, we can continue the framework by splitting that representation into the learned useful part and its complement. Separating this new useful representation is nontrivial because labels of the MC task jointly depend on multiple image properties. While previous methods (Koh et al., 2020; Chen et al., 2020) need feature-specific labels to learn a new property, the proposed framework automatically segregates a new, useful representation from previously learned representations. Furthermore, the proposed system can visualize what new representation has just been learned.
Information about other studies on the CLEVR data set can be found in appendix sections 4 to 8. We also offer more discussion about our method in appendix section 9 and discuss limitations of our method in appendix section 10. The source code of this project can be found at https://github.com/hezq06/progressive_interpretation.
This letter proposes a progressive framework based on information theory to synthesize interpretation. We show that interpretation involves independence, is progressive, and can be given at a macroscopic level using meta-information. Changing the receiver of the interpretation from a human to a target model helps define interpretation clearly. Our interpretation framework divides the input representations into independent partitions by tasks and synthesizes interpretation for the next task. This framework can also visualize what the conceptualized and unconceptualized partitions code by generating images. The framework is implemented with the variational information bottleneck (VIB) technique and is tested on the MNIST and the CLEVR data sets. The framework can solve the task and synthesize nontrivial interpretation in the form of meta-information. The framework is also able to progressively form meaningful new representation partitions. Our information-theoretic framework, capable of forming quantifiable interpretations, is expected to inspire future understanding-driven deep learning.
We thank Ho Ka Chan, Yuri Kinoshita, and Qian-Yuan Tang for useful discussions about the work. This study was supported by Brain/MINDS from the Japan Agency for Medical Research and Development (AMED) under grant JP15dm0207001, Japan Society for the Promotion of Science (JSPS) under KAKENHI grant JP18H05432, and the RIKEN Center for Brain Science.