Abstract
When appreciating a painting, people often classify it into a style. The extent to which the painting is regarded as a typical example of the style is called its style prototypicality. The authors propose a method of quantifying style prototypicalities and conduct an experiment to validate this method using the drift rate parameter of the drift diffusion model. This parameter is calculated using participants’ decision-making response times. The authors find a positive correlation (r = .88, p < .001) between the drift rates and the quantified prototypicalities for the paintings used in this study, confirming the psychological appropriateness of the proposed quantification method.
Background Project
The larger project that our study fits into aims to clarify, empirically, how the categorization of paintings into styles influences the aesthetic preference for paintings. Aesthetic preference is defined as the pleasure that people experience when looking at a painting. This research subject is a developing and important topic in the fields of empirical aesthetics, art education, and the psychology of art. Studies have been conducted on how the categorization of image features, for example, color [1,2] and object patterns [3], influences aesthetic preference for those features. However, because the preference for a painting cannot be reduced to preferences for individual image features, and people look at paintings and simple image patterns with different aesthetic anticipations [4], it was necessary to study paintings as a whole to achieve our research objective.
Styles of paintings have two important characteristics. The first characteristic is that styles are perceptually recognizable. Meyer Schapiro [5] pointed out the visibility of the qualitative content of styles in his definition of a style as “a system of forms with a quality and a meaningful expression through which the personality of the artist and the broad outlook of a group is visible.” Schapiro further raised three aspects of artworks that compose a style: form elements or motives, form relationships, and qualities. All these aspects are accessible through visual observation. Similarly, Ernst Gombrich [6] described a style as a “distinctive and therefore recognizable” way of making art. The perceptual recognizability of styles is also supported by the success of machine learning in modeling and classifying styles using image features [7,8]. Psychological studies by M. Dorothee Augustine et al. define styles as “combinations of low-level features” and point out that image-processing algorithms that can identify styles embody this definition [9,10]. This perceptual recognizability implies that laypeople can learn to differentiate varied styles of paintings by visually inspecting sample paintings of those styles, which is a task suitable for lab experiments.
The second characteristic is that paintings belonging to a style differ in terms of prototypicality. The prototypicality of a painting refers to the extent to which the painting is regarded as a typical example or a prototype of its style [11,12]. Paul Hekkert’s research [13] was the first experimental study that investigated how the prototypicality of paintings influences aesthetic preferences for paintings. In this experiment, participants were asked to rate 40 Cubist paintings on two Likert scales, one ranging from ugly to beautiful and the other ranging from typical to atypical in terms of Cubism. “Typical” paintings were rated as more beautiful than “atypical” paintings. Psychologists have put forward several explanations for this typicality effect. Rolf Reber et al. [14] argue that the human brain processes typical objects more fluently than atypical ones and thus tends to prefer typical objects. Claudia Muth and Claus-Christian Carbon [15] propose that a viewer’s liking of a typical painting might result from an “aesthetic aha,” which refers to the moment the viewer detects a gestalt among the elements of a painting. It is also possible that typical paintings are simply better in terms of artistic quality or more famous than atypical ones.
Later, Andras Farkas’s study [16], which used 40 surrealist paintings, also examined the influence of prototypicality on people’s preference for paintings. Farkas’s experiments showed that participants’ preference for the 10 most typical paintings was significantly higher than their preference for the remaining 30 paintings (ps < .05) in all three series of experiments. The 10 least typical paintings showed no statistically significant difference in preference from the other 30 paintings in two of the three series of experiments. All prior studies on aesthetic influences of painting categorization that we reviewed, exemplified by Hekkert’s and Farkas’s studies, considered prototypicality as a predictor of preference. As these studies showed inconsistent results, our experiment also measured the prototypicality of paintings and analyzed how this prototypicality influences aesthetic responses to paintings.
To clarify how the prototypicalities of paintings influence aesthetic responses to paintings, we compared the aesthetic responses of people who have some knowledge of styles, and thus have the prototypicality of paintings in their minds (i.e. aesthetic responses in the with-learning condition), with the responses of people who have not learned about styles and thus are not aware of such prototypicality (i.e. aesthetic responses in the without-learning condition). Following this logic, we designed and conducted two experiments, each corresponding to one condition. We used digital images of two styles of paintings as experimental stimuli: landscape paintings by Vincent van Gogh and Paul Gauguin. The reasons for choosing the two styles are described in Supplemental Material 1.
In the with-learning experiment, participants were first taught how to distinguish the two art styles. Then we displayed the two styles of paintings and elicited and quantified the participants’ perception of the paintings’ prototypicalities. Finally, the participants reported their aesthetic preference for the paintings. In the without-learning experiment, participants performed the aesthetic evaluations of the paintings without learning how to differentiate the two styles. Supplemental Material 1 provides additional details about the without-learning experiment.
The aim of performing the two experiments was to compare the aesthetic evaluations between the with-learning and without-learning conditions and examine whether the differences related to the prototypicalities of the paintings.
Current Study
Objective of the Current Study and Its Role in the Project
The main methodological challenge to this experimental paradigm was to find a psychologically appropriate method of eliciting and quantifying the prototypicalities of paintings according to participants. Each of the previous studies that investigated aesthetic influences of prototypicality for paintings explored only one style [17]. In those studies, participants did not need to choose among multiple styles to find one that best fit a painting. However, in real-world appreciation of a painting, people often consider which style the painting should be classified into. Hence, it is of both theoretical and practical importance to propose a psychologically suitable method of quantifying the prototypicalities of paintings in such decision-making situations.
We proposed and implemented a method of prototypicality quantification in our with-learning experiment and validated the psychological appropriateness of this method using a cognitive model called the drift diffusion model (DDM). In cognitive science, DDM is presently the best-established model of human decision-making in categorization tasks. The following section of this article introduces the quantification method and its DDM-based validation.
Figure 1 illustrates the composition of the background project and the position of the current study, which is in the with-learning experiment space. The with-learning experiment can be regarded as the combination of a cognitive phase and an affective phase. The cognitive phase includes learning the styles, the quantitative measurement of the prototypicalities of the paintings, and the validation of the measurement using DDM. The affective phase is the participants’ reporting of their aesthetic preferences. The current study completed the cognitive phase. Since a method of prototypicality measurement that has a firm scientific foundation is a prerequisite for research on the relationship between prototypicality and aesthetic preference, the current study, although not directly investigating aesthetic preference, plays a critical role in the overall project.
Details of the cognitive phase of the experiment are described in the section “Cognitive Phase in the With-Learning Experiment.”
Figure 2 illustrates the procedure of the DDM-based validation of the prototypicality measurement.
In the step “measurement of prototypicalities of paintings,” participants completed two tasks for each painting: (1) they decided whether it was painted by van Gogh or Gauguin, and (2) they reported the extent to which they thought the painting was a work by each artist. Participants’ responses in the second task were then used to quantify the prototypicalities of the paintings. In the next step, “validation of the prototypicality measurement using DDM,” the participants’ responses in the first task were used to run a DDM estimation for each painting. The DDM contained a parameter called drift rate, whose absolute value denoted how easily the participants could classify the painting. Since it can be assumed that a painting with high prototypicality should be easier to classify than a painting with low prototypicality, if the participants’ quantification of the prototypicalities were psychologically appropriate, we should find a positive correlation between the prototypicalities and the absolute values of the estimated drift rates for the paintings. This was confirmed in our data analysis.
The sections that follow describe the method of prototypicality quantification and the specifics of its DDM-based validation.
Prior Research on the Application of DDM in Empirical Aesthetics
In the area of empirical aesthetics, Xin Jin et al. [18] predicted the aesthetic scores of photo images using a convolutional neural network (CNN) and a DDM to construct an algorithm called a “deep drift-diffusion model.” For an input image, the CNN module predicted the number of positive attractors, referring to the image features that positively affected aesthetic preference for the image, and the number of negative attractors, signifying the image features that negatively affected the aesthetic preference. The DDM module took the two numbers as the input information and computed an aesthetic score for the image.
Although Jin et al.’s idea of combining deep learning and DDM is intriguing, their DDM module assumes that when people evaluate an image aesthetically, they bisect the continuum of aesthetic preference into high- and low-preference categories and then decide which category the image lies in. This assumption lacks support in psychology. In addition, the algorithm provides no information on what the positive and negative attractors are, implying that we cannot use the algorithm to study concrete factors that influence the aesthetic preference of images.
In comparison, our study aims to clarify the influence of a concrete factor that may affect the aesthetic preference of paintings—the paintings’ categorization. Therefore our study used DDM to model participants’ behaviors in an experimental categorization task, which is the best-supported way of using DDM in cognitive psychology. These distinctions make our study unique in terms of applying DDM in empirical aesthetics research.
Cognitive Phase in the With-Learning Experiment
Participants, Platform, and Stimuli
There were 22 participants in our study—all nonexperts in the field of art. We obtained informed consent from all the participants. The Office of Research Ethics of Waseda University approved this experiment (approval no. 2018-HN023).
We performed the experiment using PsychoPy (version 1.90.2) on a MacBook Air. Participants used the computer touchpad to click the buttons and scales displayed on screen to complete the tasks. The experiment was conducted in Japanese.
The experimental stimuli consisted of a “training set” that contained eight paintings for each style (see Supplemental Material 2) and an “evaluation set” that contained 15 paintings for each style (see Supplemental Material 3). Interviews with 13 art experts determined how typical each painting was for its style. The training-set paintings were typical examples of the two styles. The evaluation-set paintings ranged from atypical to typical for each style. The screen background was set to medium gray (CIE L* = 50).
To prevent participants’ selections from being affected by prior knowledge about either artist, van Gogh was labeled “Painter A,” and Gauguin was labeled “Painter B.”
Supplemental Material 1 describes the process of selecting paintings for the two sets and gives details about the participants and the preprocessing of the images.
Procedure
This experiment consisted of two sessions. In session 1, the participants learned how to differentiate between van Gogh’s and Gauguin’s styles by examining the training-set paintings. Each painting was displayed three times in random order, each time for 5 seconds. As each painting was displayed, the label of its painter (Painter A or B) was shown below the image. The participants were required to examine the paintings carefully to learn to differentiate between the two styles. This style-learning protocol was adapted from Jean Rush’s passive label-only training [19] and supported by Madeleine Ransom’s theory of style learning [20].
In session 2, the participants viewed the evaluation-set paintings in random order. For each painting, participants completed two tasks.
A two-alternative forced choice (2 AFC) task: Participants judged whether the displayed painting was painted by Painter A or B by clicking a button. Response times (RTs) were recorded in milliseconds.
Visual analog scale (VAS) rating: Participants first rated the extent to which they thought the painting was painted by Painter A on a 0%–100% VAS (Question 1). Then they rated the extent to which they thought the painting was painted by Painter B, using the same VAS (Question 2).
Method of Prototypicality Quantification
The participants’ categorization accuracy for each style in the 2 AFC task showed that 20 of the participants (12 men and eight women; mean age 23.8 years, SD = 7.9) successfully learned the style differentiation. Supplemental Material 1 provides details of the evaluation of participants’ style-learning performance.
Then we categorized the evaluation-set paintings and quantified their prototypicality using these participants’ ratings on the two VAS questions. For both questions, Cronbach’s α was .95, which indicates a high degree of inter-rater reliability.
For each painting, we defined “van-Gogh-like score” as the mean of the participants’ rating scores for the painting on Question 1 and defined “Gauguin-like score” as the mean of the participants’ rating scores on Question 2. If the van-Gogh-like score was found to be larger than the Gauguin-like score, we classified the painting in the category “cognitive van-Gogh style” and defined the prototypicality score (PTS) of the painting as the value of its van-Gogh-like score. If the van-Gogh-like score was smaller than the Gauguin-like score, we classified the painting into the category “cognitive Gauguin style” and defined its PTS as the value of its Gauguin-like score.
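The classification rule and PTS definition above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis script: the function name and the example ratings are hypothetical, and because the text does not state how a tie between the two scores would be handled, the sketch only flags that case.

```python
from statistics import mean

def prototypicality_score(q1_ratings, q2_ratings):
    """Return (cognitive category, PTS) for one painting.

    q1_ratings: VAS ratings (0-100) on Question 1 ("painted by Painter A", van Gogh).
    q2_ratings: VAS ratings (0-100) on Question 2 ("painted by Painter B", Gauguin).
    """
    van_gogh_like = mean(q1_ratings)   # van-Gogh-like score
    gauguin_like = mean(q2_ratings)    # Gauguin-like score
    if van_gogh_like > gauguin_like:
        return "cognitive van-Gogh style", van_gogh_like
    if van_gogh_like < gauguin_like:
        return "cognitive Gauguin style", gauguin_like
    # Ties are not covered by the definition in the text (assumption: flag them)
    return "tie (not defined in the text)", van_gogh_like

# Hypothetical ratings from four participants for a single painting
example = prototypicality_score([80, 90, 70, 85], [20, 15, 35, 10])
print(example)
```

Note that the PTS is the larger of the two mean scores, so the two VAS questions are not assumed to sum to 100%.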
The two cognitive categories that resulted from these categorizations corresponded closely to the actual authorship of the paintings (see Supplemental Material 3). The “cognitive van-Gogh style” had 12 paintings, of which 11 were correctly identified as van Gogh paintings. The “cognitive Gauguin style” had 18 paintings, of which 14 were correctly identified as Gauguin paintings. A chi-square test of independence performed on these counts demonstrated a strong association between the paintings’ membership in the two cognitive categories and the authorship of the paintings, χ²(1, N = 30) = 13.9, p < .001, Cramér’s V = .68. This further substantiated the participants’ success in style learning.
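The reported test statistic can be checked from the counts given in the paragraph. The sketch below builds the 2 × 2 table (assuming, as follows from the two-artist design, that the remaining painting in each cognitive category is by the other artist) and computes an uncorrected Pearson chi-square; the function name is hypothetical, and the absence of a continuity correction is an assumption that is consistent with the reported value.

```python
import math

def chi_square_2x2(table):
    """Uncorrected Pearson chi-square for a 2x2 table, with p value and Cramer's V."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # For a 2x2 table, Cramer's V reduces to sqrt(chi2 / n)
    cramers_v = math.sqrt(chi2 / n)
    return chi2, p_value, cramers_v

# Rows: cognitive van-Gogh style, cognitive Gauguin style
# Columns: painted by van Gogh, painted by Gauguin
counts = [[11, 1],
          [4, 14]]
chi2, p_value, cramers_v = chi_square_2x2(counts)
print(f"chi2 = {chi2:.1f}, p = {p_value:.5f}, V = {cramers_v:.2f}")
```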
Validation of the Psychological Appropriateness of Prototypicality Quantification
Composition of DDM
We used DDM to validate the psychological appropriateness of our method of quantifying prototypicality, namely, our definition of the PTS. First we used the participants’ responses in the 2 AFC task to estimate a DDM for each painting. Then we investigated the correlational relationship between the PTS and the absolute value of the estimated drift rate, which is a DDM parameter that represents the cognitive ease of classification across the paintings. The logic of this validation was that the PTS should display a positive correlation with the absolute value of the drift rate if the PTS truly represented prototypicality. Below, we describe the composition of DDM and the validation procedure.
DDM was initially proposed by Roger Ratcliff [21] in 1978, and subsequent studies [22–24] substantiated correspondences between parameters in DDM and components of human cognitive processing in categorization. These findings suggest that DDM is a widely accepted model for explaining the cognitive mechanism underlying stimulus-by-stimulus variations in response times in categorization tasks.
For a participant’s response to a 2 AFC task, the DDM distinguishes between the decision-making process and the neural or muscle activities unrelated to the decision-making process; for example, the perceptual encoding of the stimulus, formation of the neural instructions in the motor cortex, and their muscular execution. The DDM uses a parameter (t0) to encompass the time durations of these non-decisional components.
Regarding the decision-making process, the DDM models it as a one-dimensional Wiener process, which is a Gaussian stochastic process that progresses continuously over time. The decision-making process is governed by three parameters: decision threshold separation a, decision bias z/a, and drift rate v. These parameters determine the response time distribution of participants’ selections of the upper response by Equation 1 and that of the lower response by Equation 2 (both adapted from Equation 2 in Rainer Alexandrowicz’s paper [25] with S2 = 1), as illustrated in Fig. 3 (generated using Alexandrowicz’s diffusion model visualizer).
During the decision-making process, the participant continuously accumulates information from the percept of the stimulus, which gradually strengthens their inclination toward one of the two candidate responses. When the amount of accumulated information reaches the threshold for one candidate response, the decision process terminates, and the participant executes this response. However, Gaussian noise exists during the whole information accumulation process, which can cause the participant’s inclination to oscillate stochastically. This oscillation could lead the participant to make an erroneous decision [26].
In Fig. 3, the horizontal axis represents response time, and the vertical axis represents the amount of accumulated information. On the vertical axis, the threshold of one candidate response is placed at zero, and this response is called the lower response. The threshold of the other candidate response is placed at the value of the decision threshold separation parameter a, which is positive in sign, and this response is called the upper response. It is arbitrary to designate which candidate response is upper and which is lower. For our 2 AFC task, we set the response “Painter A” as the upper response and “Painter B” as the lower one. Parameter a is an indicator of the accuracy-speed trade-off when the responding speed is emphasized. Our experiment did not require participants to respond quickly; therefore, this parameter was not important for our analysis.
If a participant is predisposed to choose one candidate response over the other before the categorization task begins, the participant is said to exhibit a bias at the onset of the task. This bias is denoted by a fraction: z/a ∈ (0, 1). In Fig. 3, this is represented by a point z, situated between zero and point a. Because the upper threshold lies at a and the lower threshold at zero, the fraction z/a takes a value in (0.5, 1) if the bias is toward the upper response (the starting point is closer to a), a value in (0, 0.5) if the bias is toward the lower response (the starting point is closer to zero), and the value 0.5 if no bias exists. As the participants in our task were presented with the same number of paintings for the two candidate responses (that is, van Gogh’s and Gauguin’s paintings) during the style-learning session, we had no reason to assume a decision bias and therefore set z/a = 0.5.
Each participant’s decision process starts at point z and drifts gradually toward the threshold of the candidate response that represents the category to which the stimulus belongs. The speed of this drift is represented by the absolute value of the drift rate v (“abv”). The drift rate v carries a positive sign if the stimulus belongs to the category denoted by the upper response and a negative sign if the lower response matches the stimulus. The abv is determined mainly by the ambiguity of the stimulus, or, in other words, the difficulty of deciding which candidate response matches the stimulus. Abvs are larger for stimuli that are easier to classify than for those that are difficult to classify. In our study, the prototypicality of a painting means how typical and unambiguous the painting is perceived to be as a member of the cognitive category that contains the painting. A painting that has an unambiguous categorical membership (a high prototypicality) should be easier to classify than a painting that has an ambiguous categorical membership (a low prototypicality). Thus, abvs for paintings with high prototypicalities should be larger than those for paintings with low prototypicalities. As a corollary, if the PTS defined in our study truly represents prototypicality, we would likely find a positive correlation between the PTSs and the abvs of the paintings.
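The link between drift rate and ease of classification can be illustrated by simulating the accumulation process described above. The following is a rough sketch under stated assumptions, not part of the study's analysis: it uses a simple Euler discretization of a Wiener process with unit diffusion variance, and the parameter values (a = 2, t0 = 0.3 s, z/a = 0.5, the step size, and the two drift rates) are arbitrary illustrative choices.

```python
import random

def simulate_trial(v, a, w=0.5, t0=0.3, dt=0.002, rng=random):
    """Simulate one 2 AFC decision; return (response, response time in seconds).

    v: drift rate, a: threshold separation, w: relative start point z/a,
    t0: non-decisional duration. All values here are illustrative assumptions.
    """
    x = w * a              # accumulated information starts at z = w * a
    t = 0.0
    noise_sd = dt ** 0.5   # standard deviation of the Gaussian noise per step
    while 0.0 < x < a:
        x += v * dt + noise_sd * rng.gauss(0.0, 1.0)
        t += dt
    response = "upper" if x >= a else "lower"
    return response, t0 + t

def summarize(v, a=2.0, n_trials=500, seed=1):
    """Return (proportion of upper responses, mean response time) for drift rate v."""
    rng = random.Random(seed)
    trials = [simulate_trial(v, a, rng=rng) for _ in range(n_trials)]
    p_upper = sum(resp == "upper" for resp, _ in trials) / n_trials
    mean_rt = sum(rt for _, rt in trials) / n_trials
    return p_upper, mean_rt

# A stimulus that is hard to classify (small abv) versus an easy one (large abv)
for drift in (0.5, 2.0):
    p_upper, mean_rt = summarize(drift)
    print(f"v = {drift}: P(upper) = {p_upper:.2f}, mean RT = {mean_rt:.2f} s")
```

With a positive drift rate, the larger value should yield both a higher proportion of upper responses and shorter response times, which is the pattern the validation logic relies on.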
Data Analysis
For each painting, we estimated the non-decisional duration t0, decision threshold separation a, and drift rate v from participants’ response times and responses for the painting using a function of maximum likelihood estimation provided by the R package RWiener (version 1.3.3). Supplemental Material 3 summarizes the estimates, and Supplemental Material 1 describes the preprocessing of response times.
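To make the idea of maximum likelihood estimation concrete, the sketch below evaluates a Wiener first-passage-time density and the negative log-likelihood that such an estimator minimizes. It is an illustration only and not the RWiener implementation: it uses a standard large-time series representation of the density with unit diffusion variance, a fixed number of series terms, hypothetical function names, and made-up response data.

```python
import math

def density_lower(t, v, a, w, terms=100):
    """Density of decision time t for absorption at the lower threshold (zero).

    Large-time series representation, unit diffusion variance (assumption).
    """
    if t <= 0:
        return 0.0
    series = sum(
        k * math.exp(-(k * math.pi) ** 2 * t / (2 * a * a)) * math.sin(k * math.pi * w)
        for k in range(1, terms + 1)
    )
    value = math.pi / (a * a) * math.exp(-v * a * w - v * v * t / 2) * series
    return max(value, 0.0)  # clamp tiny negative values caused by truncation

def wiener_density(rt, response, v, a, t0, w=0.5):
    """Density of an observed response time for the given response."""
    t = rt - t0  # remove the non-decisional duration
    if response == "upper":
        # Reflect the process: the upper threshold becomes the lower one
        return density_lower(t, -v, a, 1.0 - w)
    return density_lower(t, v, a, w)

def neg_log_likelihood(data, v, a, t0, w=0.5):
    """Sum of -log density over (response time, response) pairs."""
    return -sum(
        math.log(max(wiener_density(rt, resp, v, a, t0, w), 1e-300))
        for rt, resp in data
    )

# Made-up data for one painting: mostly fast "upper" (Painter A) responses
data = [(0.9, "upper"), (1.1, "upper"), (0.8, "upper"), (1.6, "lower"), (1.0, "upper")]
for v in (0.2, 1.0):
    print(f"v = {v}: negative log-likelihood = {neg_log_likelihood(data, v, 2.0, 0.3):.3f}")
```

An estimator would search over t0, a, and v for the values that minimize this quantity; the search itself is omitted here.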
For each painting, we created a pair of plots to observe whether the distribution of response times predicted by the estimated DDM matched the scattering of the experimentally measured response times well. The first plot juxtaposes the scattering of 50 data points sampled from the predicted distribution and the scattering of the experimentally measured response times for the upper response (i.e. cognitive van-Gogh style) and the lower response (i.e. cognitive Gauguin style) respectively. We set the sample size to be 50 because it is both close to the number of the experimentally measured response times and large enough to show the patterns of the predicted distributions. The second plot compares the curve of the predicted distribution generated by the 50 sample points and the experimentally measured response times, shown as rug plots.
Figure 4 shows the plots of a high-PTS painting and those of a low-PTS painting for each cognitive category. Supplemental Material 1 provides the plots for all 30 paintings. These plots show that the predicted distribution of response times has a pattern similar to the scattering of the experimentally measured response times for all paintings. This indicates that the estimated DDMs fit the experimental response time data for the paintings well.
We examined the relationship between PTS and abv for the paintings using Pearson’s correlation analysis. For the entire set of paintings, we detected a positive correlation between PTS and abv, r(28) = .88, p < .001. The significance of this correlation was maintained in both cognitive categories. For the cognitive van-Gogh style, r(10) = .95, p < .001. For the cognitive Gauguin style, r(16) = .79, p < .001. These results (plotted in Fig. 5) matched the prediction raised by the DDM, and therefore indicated that it was psychologically suitable to use the PTS defined in our study to measure the prototypicality of paintings.
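The correlation analysis follows the usual Pearson procedure, in which r is reported with n − 2 degrees of freedom (hence r(28) for 30 paintings). The sketch below shows the computation on made-up numbers; the function names and data are hypothetical, and the per-painting PTS and abv values from Supplemental Material 3 are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Return (r, degrees of freedom) for paired observations x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy), n - 2

def t_from_r(r, df):
    """t statistic used to test whether a correlation differs from zero (|r| < 1)."""
    return r * math.sqrt(df / (1.0 - r * r))

# Made-up PTS and abv values for five paintings (illustration only)
pts = [1, 2, 3, 4, 5]
abv = [2, 4, 5, 4, 5]
r, df = pearson_r(pts, abv)
print(f"r({df}) = {r:.2f}, t = {t_from_r(r, df):.2f}")

# The same conversion applied to the correlation reported for all 30 paintings
t_reported = t_from_r(0.88, 28)
print(f"t for r(28) = .88: {t_reported:.2f}")
```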
Summary, Limitations, and Future Work
Our study proposed an experimental method of eliciting and quantifying the style prototypicality of paintings and validated its psychological appropriateness using DDM. We believe this method has extensive application value for empirical studies on the prototypicality of paintings. As styles are art-historical concepts, this method can facilitate explorations of how art-historic knowledge influences cognitive and emotional impressions of art.
The subsequent steps in our project elicited participants’ affective evaluations of paintings in both the with-learning and without-learning experiments. We are conducting data analyses to establish how affective evaluations of paintings differ between the two conditions.
A major limitation of this study is that it was conducted only within a Japanese cultural context, which may have influenced the results of the experiments and the DDM estimations. Another limitation is the low number of participants (22) and their lack of diversity (all participants were Japanese). These limitations suggest that caution should be exercised in generalizing the results of this study. Our future research will recruit diverse participants and conduct studies internationally to validate our experimental paradigm and evaluate how cultural factors affect the experimental results (e.g. painting categorization and DDM parameters).
In addition, it would be insightful to conduct experiments using paintings by artists with similar styles, experiments that train nonexperts on certain aspects of painting to evaluate how they then perceive and categorize paintings, and experiments that compare replica paintings with original paintings with respect to the relationship between prototypicality and aesthetic evaluations.
Acknowledgments
This project was supported by Waseda University Grant for Special Research Projects (Project numbers: 2017S-207 and 2019E-111).