Since it is often impractical to ask agents to provide linear orders over all alternatives, preference completion is necessary for such partial rankings. Specifically, the personalized preference of each agent over all the alternatives can be estimated from neighboring agents' partial rankings over subsets of alternatives. However, since agents' rankings are nondeterministic and may be provided with noise, it is necessary and important to conduct certainty-based preference completion. Hence, in this paper, first, for alternative pairs with the obtained ranking set, a bijection is built from the ranking space to the preference space, and the certainty and conflict of alternative pairs are evaluated with a well-built statistical measurement, the Probability-Certainty Density Function, on subjective probability. Then, a certainty-based voting algorithm built on the certainty and conflict is adopted to conduct certainty-based preference completion. Moreover, the properties of the proposed certainty and conflict are studied empirically, and the proposed approach to certainty-based preference completion for partial rankings is experimentally validated against state-of-the-art approaches on several datasets.

In a preference completion problem, with a set of agents (users) and a set of alternatives (items), each agent has a partial ranking over a subset of alternatives, and the goal is to infer each agent's personalized ranking or preference over all the alternatives, including those the agent has not yet handled. Obviously, it is often impractical to ask agents to provide linear orders over all alternatives, especially in big data environments [1]. For example, an agent may not know the status of some alternatives because there are too many of them, which makes it hard to rank them all; or some alternatives may be incomparable for a certain agent. All these situations result in partial rankings, and it is therefore necessary to introduce preference completion.

The preference completion problem has applications in many areas, such as social choice and recommender systems [2], and can be very useful in community detection [3, 4] or graph anomaly detection [5]. For example, in social choice, each voter (agent) casts a ballot as a ranking over all candidates (alternatives), or a partial ranking over some of them; for such partial rankings, a ranking over all candidates must be formed by a certain voting rule. In a recommender system, each user rates some items, and the task is to predict the ratings of the items he/she has not yet rated. To satisfy this requirement, two common approaches are introduced to handle preference completion: the matrix factorization approach and the neighborhood-based approach. Traditional algorithms in these two families are usually rating-oriented, while a recent line of work focuses on ranking-oriented algorithms [6, 7] due to the drawbacks of the rating-oriented ones. In this paper, we focus on the ranking-oriented neighborhood-based approach.

Traditionally, neighborhood-based preference completion first finds the near neighbors of each agent and then aggregates these neighbors' rankings to produce the predicted preference by a certain voting rule [6]. However, this task has some inevitable issues. For example, an agent may exhibit irrational behaviors or provide rankings in a noisy setting. To address this issue, many rating-oriented trust-based approaches have been proposed that rely on additional contextual information, while the ranking-oriented approach has left much room for further research. Liu et al. [8] proposed an anchor-based algorithm that leverages the ranking information of many other agents to mitigate the effect of randomness.

In this paper, a certainty-based preference completion algorithm is proposed on the basis of Liu's work [8]. More precisely, after finding the k-nearest neighbors by the anchor-kNN algorithm Liu proposed, we use the certainty-based voting algorithm introduced in this paper to complete the preference (ranking), instead of the traditional majority voting rule. The traditional majority voting rule tends to cause wrong judgments, especially when both sides have close votes; in this case, even slight randomness can change the outcome under the majority voting rule. For this reason, this paper introduces a certainty-based voting algorithm to deal with this problem. Importantly, when we take a vote on two alternatives, we should first consider the certainty, which measures the degree to which the two alternatives can be preferred or compared. Only when the certainty value satisfies a defined threshold do we go further and make a three-way preference decision, instead of simply assigning 0 or 1 to the two alternatives. Hence, the certainty-based voting algorithm avoids wrong judgments when both sides have close scores or when rankings are made in a noisy setting. Before formulating the certainty and presenting the certainty-based preference completion algorithm, we first consider the certainty and preference space to introduce the three-way preference between two alternatives.

Technically, in a ranking pool gathered from agents, the rankings involving the alternative pair A and B can be aggregated to form the preference between A and B. Mathematically, a bijection can be built from the ranking space to the preference space for the alternative pair A and B. Here, the ranking space consists of all the partial rankings on A and B from agents, while the preference space consists of the three-way preference between A and B, which includes

• preference (prefer A to B, denoted as $P_{AB}^+$),

• dispreference (prefer B to A, denoted as $P_{AB}^-$), and

• uncertainty (no preference between A and B, denoted as $C_{AB}^-$),

according to the trisecting and acting models of human cognitive behaviors [1, 9]. Thus, the following three situations are distinguished:

• The agents prefer alternative A to alternative B, which can be confirmed by high preference $P_{AB}^+$, low dispreference $P_{AB}^-$, and low uncertainty $C_{AB}^-$.

• The agents prefer alternative B to alternative A, which can be confirmed by low $P_{AB}^+$, high $P_{AB}^-$, and low $C_{AB}^-$.

• The agents are uncertain about the preference between the alternative pair A and B, i.e., A and B are unpreferred, which can be confirmed by low $P_{AB}^+$, low $P_{AB}^-$, and high $C_{AB}^-$.

It is obvious that when $C_{AB}^-$ is low, the preference between A and B can be determined, i.e., A and B are comparable. Hence, the certainty of preference, denoted as $C_{AB}^+$ and calculated as $C_{AB}^+ = 1 - C_{AB}^-$, can be introduced to describe the trustworthiness of the preference. The certainty of preference can be taken as the subjective probability of the preference, following the proposition that certainty is the degree of belief that an individual has in the preference [10]. Hence, in this paper, the certainty is evaluated based on a well-built statistical measurement, which defines a bijection from the ranking space to the preference space, enabling the estimation of the pairwise preference from neighbors' partial rankings via mapping them to

(preference $P_{AB}^+$, dispreference $P_{AB}^-$, uncertainty $C_{AB}^-$).

Our definition of certainty should capture the following key properties:

• Property 1: Certainty $C_{AB}^+$ increases as the number of rankings on the alternative pair A and B increases, for a fixed ratio of rankings from A to B and rankings from B to A.

• Property 2: Certainty $C_{AB}^+$ decreases as the extent of conflict in the partial rankings on the alternative pair A and B increases.

Our main contributions in this paper can be summarized as follows:

• As pointed out in [11], it is necessary and important to introduce the certainty and conflict of the preference between alternative pairs, and sometimes the certainty and conflict of the preference are more important than the preference itself. In this paper, probability-based certainty and conflict are introduced under Properties 1 & 2 to describe the trustworthiness of the preference.

• A certainty-based voting algorithm using the certainty and conflict is proposed for conducting the certainty-based preference completion in nondeterministic settings.

• We empirically study the properties of the proposed approach, and experimentally validate the proposed approach compared to the state-of-the-art approaches with several datasets.

This paper is organized as follows. Section 2 reviews existing work on the Plackett-Luce model, the Kendall-Tau distance and the anchor-kNN algorithm. In Section 3, a bijection is built from the ranking space to the preference space, and the certainty and conflict of alternative pairs are evaluated based on a well-built statistical measurement. In Section 4, a certainty-based voting algorithm is adopted to conduct the preference completion with the certainty and conflict. Section 5 empirically studies the properties of the proposed certainty and conflict. Section 6 experimentally validates the proposed approach against state-of-the-art approaches on several datasets. Finally, Section 7 summarizes this paper and presents future work.

### 2.1 Plackett-Luce Model

Given a set of m alternatives and a set of n agents, let $y = (y_1, y_2, \ldots, y_m)$ denote the latent features of the alternatives and $x = (x_1, x_2, \ldots, x_n)$ denote the latent features of the agents. Agent i's ranking $R_i$ is determined by a statistical model for ranking data. As a widely used statistical model, the Plackett-Luce model [12, 13] is adopted to generate the rankings of agents. In this model, each alternative is assigned a positive value named utility; the greater this utility is, the more likely its corresponding alternative is ranked at a higher position [14]. In [14], the realized utility of every alternative j for agent i is determined by

$u_{i,j}(x_i, y_j) = \theta(x_i, y_j) + \varepsilon_{i,j},$
(1)

where $\theta(x_i, y_j)$ is agent i's expected utility on alternative j, determined by the closeness of the latent features $x_i$ and $y_j$ and measured by $\theta(x_i, y_j) = \exp(-\|x_i - y_j\|^2)$, and $\varepsilon_{i,j}$ is a zero-mean independent random variable that follows a Gumbel distribution. When the realized utilities $u_i = (u_{i,1}, u_{i,2}, \ldots, u_{i,m})$ of agent i are obtained, agent i ranks the alternatives in decreasing order of realized utility. After repeating this n times, synthetic datasets of all the agents can be generated for experiments. For more details, please refer to Algorithm 1.

Algorithm 1. Sampling from Plackett-Luce Model.
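The sampler described above can be sketched in a few lines of Python. This is an illustrative implementation, not the paper's pseudocode: it draws one ranking per agent by adding centered Gumbel noise to the expected utilities $\theta(x_i, y_j) = \exp(-\|x_i - y_j\|^2)$ and sorting; the function and variable names are our own.

```python
import numpy as np

def sample_plackett_luce(x_agents, y_alts, rng=None):
    """Sample one ranking per agent from the Plackett-Luce model:
    u_ij = theta(x_i, y_j) + eps_ij, with theta(x_i, y_j) = exp(-||x_i - y_j||^2)
    and eps_ij zero-mean Gumbel noise; rank alternatives by decreasing u_ij."""
    if rng is None:
        rng = np.random.default_rng(0)
    rankings = []
    for x_i in x_agents:
        # expected utility of each alternative for this agent
        theta = np.exp(-np.sum((y_alts - x_i) ** 2, axis=1))
        # standard Gumbel has mean euler_gamma, so shift it to be zero-mean
        eps = rng.gumbel(loc=-np.euler_gamma, scale=1.0, size=len(y_alts))
        u = theta + eps
        rankings.append(np.argsort(-u))  # alternative indices, best first
    return rankings
```

Repeating the draw for every agent yields a synthetic dataset of rankings, as used in the experiments section.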

### 2.2 Kendall-Tau Distance

Given two agents' rankings $R_1$ and $R_2$ over the same alternatives, the Kendall-Tau distance can be introduced to measure the dissimilarity of $R_1$ and $R_2$: it is the total number of disagreements in pairwise comparisons between alternatives in the linear rankings. For alternative j in $R_i$, $R_i(j)$ represents its position in $R_i$; for example, if j is the top-ranked alternative in $R_i$, then $R_i(j) = 1$. The normalized Kendall-Tau distance between $R_1$ and $R_2$ is

$NK(R_1, R_2) = \frac{\sum_{j_1 \neq j_2 \in R_1} I\big(\prod_{k=1,2} (R_k(j_1) - R_k(j_2)) < 0\big)}{\binom{|R_1|}{2}}$
(2)

where I(v) is an indicator that is set to be 1 if the argument v is true; otherwise, it is set to be 0.

Moreover, if the two rankings do not share exactly the same alternatives, the intersection of the two alternative sets can be taken for computing the normalized Kendall-Tau distance.
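Equation (2), restricted to the shared alternatives as described above, translates directly into code. In this sketch (names are illustrative), each ranking is represented as a map from alternative to position, with 1 for the top rank:

```python
from itertools import combinations

def normalized_kendall_tau(pos1, pos2):
    """Normalized Kendall-Tau distance between two rankings.

    pos1 and pos2 map each alternative to its position (1 = top-ranked).
    Only alternatives appearing in both rankings are compared."""
    common = sorted(set(pos1) & set(pos2))
    # a pair disagrees when the two rankings order it in opposite directions
    disagreements = sum(
        1
        for j1, j2 in combinations(common, 2)
        if (pos1[j1] - pos1[j2]) * (pos2[j1] - pos2[j2]) < 0
    )
    n = len(common)
    return disagreements / (n * (n - 1) / 2)
```

Identical rankings give distance 0 and fully reversed rankings give distance 1.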

### 2.3 Anchor-kNN Algorithm

Before introducing the anchor-kNN algorithm proposed in [8], we first present the idea of KT-kNN, which simply uses the Kendall-Tau distance to find an agent's neighbors. If the Kendall-Tau distance between two rankings $R_i$ and $R_j$ is small, the latent features $x_i$ and $x_j$ of the agents should be close, i.e., the two agents have similar opinions on the alternatives.

As the KT-kNN algorithm does not consider that agents' preferences may be nondeterministic or that agents' rankings are made in a noisy setting, anchor-kNN, unlike KT-kNN, uses the ranking data of other agents (named anchors) to determine the closeness of two agents, rather than considering the two agents' rankings only. Anchor-kNN develops a feature $F_{i,j}$ for agents i and j to represent the Kendall-Tau distance between $R_i$ and $R_j$, i.e., $F_{i,j} = NK(R_i, R_j)$. Then, to measure the closeness $D_{i,j}$ of two agents, we use the sum of the differences between $F_{i,t}$ and $F_{j,t}$ to find the k-nearest neighbors, where t ranges over all the other agents except agents i and j.
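The anchor-based closeness can be sketched as follows. We assume the aggregation is the sum of absolute differences $|F_{i,t} - F_{j,t}|$, which is one natural reading of the description above; the function names are illustrative.

```python
def anchor_distance(i, j, F):
    """Closeness D_ij of agents i and j through anchors: the sum over all
    other agents t of |F[i][t] - F[j][t]|, where F[a][b] is the normalized
    Kendall-Tau distance between the rankings of agents a and b."""
    return sum(abs(F[i][t] - F[j][t]) for t in range(len(F)) if t not in (i, j))

def anchor_knn(i, F, k):
    """Return the k agents closest to agent i under the anchor distance."""
    others = [j for j in range(len(F)) if j != i]
    return sorted(others, key=lambda j: anchor_distance(i, j, F))[:k]
```

Two agents who are judged similarly by all anchors get a small distance even if their own rankings are noisy, which is exactly the robustness anchor-kNN aims for.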

### 3. CERTAINTY AND PREFERENCE SPACE

In this section, let us present some preliminary definitions first. For an arbitrary alternative pair A and B, the certainty can be adopted to describe the trustworthiness of the preference between A and B. Technically, following [15], a Probability-Certainty Density Function (PCDF) can be introduced to capture the subjective probability of the ranking. However, unlike [15], and following [16] and [17], in this paper certainty is defined based on the PCDF so as to satisfy Properties 1 & 2.

#### 3.1 Ranking Space

The ranking space consists of all the weighted partial rankings on the alternative pair A and B from agents, including

• the rankings $\{O_{AB}^{(i)}\}$ where A is ranked ahead of B, with weight $w_{AB}^{(i)}$ for the ranking $O_{AB}^{(i)}$; $n_{AB}$ denotes the accumulated weight of the rankings $\{O_{AB}^{(i)}\}$, i.e., $n_{AB} = \sum_i w_{AB}^{(i)}$,

• the rankings $\{O_{BA}^{(j)}\}$ where B is ranked ahead of A, with weight $w_{BA}^{(j)}$ for the ranking $O_{BA}^{(j)}$; $n_{BA}$ denotes the accumulated weight of the rankings $\{O_{BA}^{(j)}\}$, i.e., $n_{BA} = \sum_j w_{BA}^{(j)}$, and

• the unordered ones $\{O_{\overline{AB}}^{(k)}\}$ where A and B are not comparable, with weight $w_{\overline{AB}}^{(k)}$ for the ranking $O_{\overline{AB}}^{(k)}$; $n_{\overline{AB}}$ denotes the accumulated weight of the rankings $\{O_{\overline{AB}}^{(k)}\}$, i.e., $n_{\overline{AB}} = \sum_k w_{\overline{AB}}^{(k)}$. Obviously, we have $w_{\overline{AB}}^{(k)} = w_{\overline{BA}}^{(k)}$ and $O_{\overline{AB}}^{(k)} = O_{\overline{BA}}^{(k)}$.

Moreover, the weight $w_{AB}^{(i)}$ of $O_{AB}^{(i)}$ represents the quality of the ranking $O_{AB}^{(i)}$. Without additional knowledge, we assign $w_{AB}^{(i)}$ to be 1.

Definition 1. Ranking space

$O = \{\langle n_{AB}, n_{BA}, n_{\overline{AB}} \rangle \mid \min\{n_{AB}, n_{BA}, n_{\overline{AB}}\} \geq 0\}.$

#### 3.2 Preference Space

Traditionally, the uncertainty is usually ignored, and sometimes the dispreference is not taken into account either, which leads to some disturbing results, as shown in the empirical study section. According to the trisecting and acting models of human cognitive behaviors [9, 18], the preference space consists of the three-way preference between alternatives, which includes

• preference $P_{AB}^+$ (prefer A to B),

• dispreference $P_{AB}^-$ (prefer B to A), and

• uncertainty $C_{AB}^-$ (no preference between A and B).

Definition 2. Preference space

$P = \{\langle P_{AB}^+, P_{AB}^-, C_{AB}^- \rangle \mid P_{AB}^+ + P_{AB}^- + C_{AB}^- = 1,\ \min\{P_{AB}^+, P_{AB}^-, C_{AB}^-\} \geq 0\}.$

#### 3.3 Certainty of Rankings in Alternative Pairs

Bayesian inference [19, 20] is adopted here to update the probability with the available contextual information about the rankings of alternative pairs, i.e., to update the prior distribution to the posterior distribution [21, 22]. In this paper, offline Bayesian inference is utilized; Bayesian inference can also be applied to online/streaming scenarios [23, 24].

Let $x_{AB}$, $x_{BA}$ and $x_{\overline{AB}}$ be the probabilities of the rankings $\{O_{AB}^{(i)}\}$, $\{O_{BA}^{(j)}\}$ and $\{O_{\overline{AB}}^{(k)}\}$, respectively, where $x_{\overline{AB}} = 1 - x_{AB} - x_{BA}$ and $X = \langle x_{AB}, x_{BA}, x_{\overline{AB}} \rangle$. In addition, $x_{AB} \in [0, 1]$, $x_{BA} \in [0, 1]$ and $x_{\overline{AB}} \geq 0$, and thus $x_{AB} + x_{BA} \leq 1$.

Without any additional information, the prior distribution $f(X)$ is a uniform distribution. As the cumulative probability of a distribution within [0, 1] equals 1, the density of a PCDF has mean value 1 within [0, 1], which makes $f(X) = 1$.

As the ranking sample O conforms to a multinomial distribution [16, 22], the likelihood is

$f(O \mid X) = \frac{(n_{AB}+n_{BA}+n_{\overline{AB}})!}{n_{AB}!\, n_{BA}!\, n_{\overline{AB}}!}\, (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\overline{AB}})^{n_{\overline{AB}}}$
(3)

The posterior distribution $f(X \mid O)$ can then be estimated as [16, 22]:

$f(X \mid O) = \frac{f(O \mid X)\, f(X)}{\int_0^1 f(O \mid X)\, f(X)\, dX} = \frac{(x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\overline{AB}})^{n_{\overline{AB}}}}{\int_0^1 (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\overline{AB}})^{n_{\overline{AB}}}\, dX}$
(4)

Then, the certainty can be determined by the deviation of the posterior distribution from the prior distribution, i.e., the uniform distribution. Hence, we have the following definition of certainty.

Definition 3. The certainty $C_{AB}^+$ of the rankings $\langle n_{AB}, n_{BA}, n_{\overline{AB}} \rangle$ can be estimated as

$C_{AB}^+ = \frac{1}{2}\int_0^1 \left| f(X \mid O) - f(X) \right| dX = \frac{1}{2}\int_0^1 \left| \frac{(x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\overline{AB}})^{n_{\overline{AB}}}}{\int_0^1 (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\overline{AB}})^{n_{\overline{AB}}}\, dX} - 1 \right| dX$
(5)

where the factor $\frac{1}{2}$ removes the double counting of the deviations.

From this definition, we have $C_{AB}^+ = C_{BA}^+$.
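Definition 3 can be evaluated numerically. The sketch below approximates the integral on a grid over the simplex $x_{AB}, x_{BA} \geq 0$, $x_{AB} + x_{BA} \leq 1$; the grid resolution and the choice to normalize the uniform prior over the simplex (so that both densities integrate to 1) are our implementation assumptions, not part of the definition, but they preserve the qualitative behavior required by Properties 1 & 2.

```python
import numpy as np

def certainty(n_ab, n_ba, n_ab_bar, grid=400):
    """Numerical sketch of Definition 3: half the integrated absolute
    deviation of the posterior density from the uniform prior, computed
    on a grid over the simplex x_AB, x_BA >= 0, x_AB + x_BA <= 1."""
    h = 1.0 / grid
    xs = (np.arange(grid) + 0.5) * h                  # cell centers
    x_ab, x_ba = np.meshgrid(xs, xs, indexing="ij")
    mask = x_ab + x_ba < 1.0                          # points inside the simplex
    like = np.zeros_like(x_ab)
    like[mask] = (x_ab[mask] ** n_ab) * (x_ba[mask] ** n_ba) \
        * ((1.0 - x_ab[mask] - x_ba[mask]) ** n_ab_bar)
    cell = h * h
    posterior = like / (like.sum() * cell)            # integrates to 1
    prior = mask / (mask.sum() * cell)                # uniform on the simplex
    return 0.5 * np.abs(posterior - prior).sum() * cell
```

With more rankings at a fixed ratio the posterior peaks more sharply, so the certainty grows (Property 1); with a balanced split $n_{AB} = n_{BA}$ the posterior is least extreme, so the certainty is smallest (Property 2).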

#### 3.4 Conflict of Rankings in Alternative Pairs

The conflict can be determined by the relative difference between the weighted rankings $n_{AB}$ and $n_{BA}$, as in [17]. More specifically,

• the conflict is largest when $n_{AB} = n_{BA}$;

• the conflict is smallest when $n_{AB} = 0$ or $n_{BA} = 0$.

Hence, we have the following definition about conflict.

Definition 4. The conflict $c_{AB}$ of the rankings $\langle n_{AB}, n_{BA}, n_{\overline{AB}} \rangle$ can be estimated as

$c_{AB} = \min\left\{\frac{n_{AB}}{n_{AB}+n_{BA}},\ \frac{n_{BA}}{n_{AB}+n_{BA}}\right\}$
(6)

From this definition, we have $c_{AB} = c_{BA}$.

#### 3.5 Bijection from Ranking Space to Preference Space

With Definitions 1, 2, 3 and 4, the following definition can be introduced.

Definition 5. The bijection from the ranking space $O$ to the preference space $P$ can be estimated as

$P_{AB}^+ = \frac{n_{AB}}{n_{AB}+n_{BA}+n_{\overline{AB}}}\, C_{AB}^+$
(7)
$P_{AB}^- = \frac{n_{BA}}{n_{AB}+n_{BA}+n_{\overline{AB}}}\, C_{AB}^+$
(8)
$C_{AB}^- = 1 - C_{AB}^+$
(9)

### 4. CERTAINTY-BASED PREFERENCE COMPLETION

This section proposes the certainty-based preference completion approach. The framework of our approach, shown in Figure 1, includes two processes. The first is to find the k-nearest neighbors of user i with the anchor-kNN algorithm proposed by Liu [8]. The second, on which this section focuses, is to produce a linear ranking for user i over all alternatives: with the neighbors' partial rankings, a certainty-based voting algorithm is introduced to estimate the pairwise preference for all alternative pairs, and these pairwise preferences then form a linear ranking for user i.

Figure 1. Certainty-based preference completion process.

### 4.1 Certainty-based Voting Algorithm

First, let us introduce a definition.

Definition 6. With the preference space $\langle P_{AB}^+, P_{AB}^-, C_{AB}^- \rangle$, the following conclusions can be obtained:

• if uncertainty $C_{AB}^- \geq \varepsilon_1$, alternatives A and B are unpreferred;

• if $C_{AB}^- < \varepsilon_1$,

• - if $P_{AB}^+ - P_{AB}^- \geq \varepsilon_2$, user i prefers A to B;

• - if $P_{AB}^- - P_{AB}^+ \geq \varepsilon_2$, user i prefers B to A;

• - otherwise, A and B are unpreferred;

where $\varepsilon_1$ and $\varepsilon_2$ are thresholds to rule out the fuzziness of comparison.

In the existing work, with the rankings of neighbors obtained by k-nearest neighbors algorithm, common voting rules, such as majority voting, can be taken to estimate pairwise preference for conducting the preference completion.

In contrast, in this paper, we use a certainty-based voting rule with certainty and conflict to obtain pairwise preferences. The certainty and conflict measure the trustworthiness that the alternative pair can be preferred or comparable. If the certainty satisfies a defined threshold, we can then evaluate the degrees $P_{AB}^+$ and $P_{AB}^-$ to which user i prefers one alternative to the other. Then, only if the difference between the two-way preferences reaches a threshold can we make a preference decision on the two alternatives. Technically, for the alternative pair A and B with $C_{AB}^- < \varepsilon_1$ and $|P_{AB}^+ - P_{AB}^-| \geq \varepsilon_2$, a preference decision between A and B can be made. The process for estimating pairwise preference is also shown in Algorithm 2. We apply this algorithm to all alternative pairs to obtain all the pairwise preferences.
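The decision rule of Definition 6, as used inside the certainty-based voting step, can be sketched as follows. The threshold values for $\varepsilon_1$ and $\varepsilon_2$ below are placeholders, since the paper leaves them as tunable parameters.

```python
def three_way_decision(p_pos, p_neg, c_neg, eps1=0.5, eps2=0.1):
    """Three-way preference decision of Definition 6.

    p_pos = P_AB^+, p_neg = P_AB^-, c_neg = C_AB^-.
    eps1 and eps2 are illustrative threshold values, not from the paper.
    Returns 'A>B', 'B>A', or 'unpreferred'."""
    if c_neg >= eps1:
        return "unpreferred"        # too uncertain to compare A and B
    if p_pos - p_neg >= eps2:
        return "A>B"
    if p_neg - p_pos >= eps2:
        return "B>A"
    return "unpreferred"            # votes too close to call
```

Note how a close vote (a small gap between `p_pos` and `p_neg`) is deliberately left undecided rather than forced to 0 or 1, which is exactly what distinguishes this rule from majority voting.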

Algorithm 2. Certainty-based voting algorithm for estimating pairwise preference.

### 4.2 Greedy Order Algorithm

Next, let us combine all the pairwise preferences to form a linear ranking over all alternatives. One possible approach is the greedy order algorithm [25]. This algorithm follows a greedy idea: it always picks the alternative that currently has the maximum potential value in the alternatives pool $I$ and ranks it above all the remaining items. Here, for item $i$, the potential value $v_i$ is equal to $\sum_{j \in I} \psi_{i,j} - \sum_{j \in I} \psi_{j,i}$. This value aggregates all the pairwise preferences obtained in the previous subsection and represents the overall preference for item $i$ among the neighbors' rankings. The algorithm then deletes the picked item from the alternatives pool and updates the potential values of the remaining items by removing the effect of the picked one. The picking process is repeated until the alternatives pool is empty, yielding a linear ranking for user $i$. See Algorithm 3.

Algorithm 3. Greedy order algorithm.

### 5. PROPERTIES OF CERTAINTY AND CONFLICT

In this section, we study the properties of certainty and conflict in our proposed model.

### 5.1 Increasing Rankings with Fixed Conflict

Figure 2 plots how the certainty $C_{AB}^+$ varies with the weighted rankings $n_{AB}$ and $n_{BA}$ under a fixed conflict $c_{AB}$.

Figure 2. Certainty increases with $n_{AB} + n_{BA}$ when $\frac{n_{AB}}{n_{AB}+n_{BA}}$ and $n_{\overline{AB}}$ are fixed.

This should confirm Property 1.

Theorem 1. For fixed $\frac{n_{AB}}{n_{AB}+n_{BA}}$ and $n_{\overline{AB}}$, the certainty $C_{AB}^+$ increases with $n_{AB} + n_{BA}$.

Proof: Let $\frac{n_{AB}}{n_{AB}+n_{BA}} = \alpha$, $n_{AB}+n_{BA} = \beta$, and

$f(\cdot) = \frac{(x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (1-x_{AB}-x_{BA})^{n_{\overline{AB}}}}{\int_0^1 (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (1-x_{AB}-x_{BA})^{n_{\overline{AB}}}\, dX}$
(10)

Then we have

$C_{AB}^+ = \frac{1}{2}\int_0^1 |f(\cdot) - 1|\, dX$
(11)

As in [17], $x_1, x_2, x_3, x_4$ can be defined such that $f(x_1) = f(x_2) = f(x_3) = f(x_4) = 1$ and

$C_{AB}^+ = \int_{x_1}^{x_2}\int_{x_3}^{x_4} [f(\cdot) - 1]\, dx_{AB}\, dx_{BA}$
(12)

where x1, x2, x3, and x4 are functions of β. Then

$\frac{\partial C_{AB}^+}{\partial \beta} = \frac{\partial x_2}{\partial \beta}\int_{x_3}^{x_4}[f(x_2)-1]\, dx_{AB} - \frac{\partial x_1}{\partial \beta}\int_{x_3}^{x_4}[f(x_1)-1]\, dx_{AB} + \int_{x_1}^{x_2}\frac{\partial}{\partial \beta}\int_{x_3}^{x_4}[f(\cdot)-1]\, dx_{AB}\, dx_{BA} = \int_{x_1}^{x_2}\frac{\partial}{\partial \beta}\int_{x_3}^{x_4}[f(\cdot)-1]\, dx_{AB}\, dx_{BA}$
(13)

where

$\frac{\partial}{\partial \beta}\int_{x_3}^{x_4}[f(\cdot)-1]\, dx_{AB} = \frac{\partial x_4}{\partial \beta}[f(x_4)-1] - \frac{\partial x_3}{\partial \beta}[f(x_3)-1] + \int_{x_3}^{x_4}\frac{\partial}{\partial \beta}[f(\cdot)-1]\, dx_{AB} = \int_{x_3}^{x_4}\frac{\partial}{\partial \beta}[f(\cdot)-1]\, dx_{AB}$
(14)

Following Lemma 9 in [17], we have

$\frac{\partial}{\partial \beta}\int_{x_3}^{x_4}[f(\cdot)-1]\, dx_{AB} > 0$
(15)

With Equation (13), we have

$\frac{\partial C_{AB}^+}{\partial \beta} > 0$
(16)

This confirms the results of Theorem 1.

### 5.2 Increasing Conflict with Fixed Rankings

Figure 3 plots how the certainty $C_{AB}^+$ varies with the weighted rankings $n_{AB}$ and $n_{BA}$ under a fixed sum $n_{AB} + n_{BA}$ and fixed $n_{\overline{AB}}$. This should confirm Property 2.

Figure 3. Certainty with fixed $n_{AB}+n_{BA}+n_{\overline{AB}}$ and $n_{\overline{AB}}$; the minimum occurs at $n_{AB} = n_{BA}$.

Theorem 2. For fixed $n_{AB} + n_{BA}$ and $n_{\overline{AB}}$, the certainty $C_{AB}^+$ is decreasing in $n_{AB}$ for $n_{AB} \leq n_{BA}$, and increasing in $n_{AB}$ for $n_{AB} \geq n_{BA}$.

Proof: The details are omitted here, as the argument is similar to the proof of Theorem 1. More specifically, by removing the absolute sign and then differentiating, it can be shown that the derivative is negative for $n_{AB} \leq n_{BA}$ and positive for $n_{AB} \geq n_{BA}$.

### 6. EXPERIMENTS

In this section, we examine the empirical performance of the certainty-based preference completion algorithm. In the experiments, we compare our certainty-based preference completion algorithm with the common majority voting algorithm [8] and the classic collaborative filtering algorithm (CF) [26]. Both our certainty-based algorithm and the majority voting algorithm use the anchor-kNN algorithm to find the k-nearest neighbors' rankings and utilize these rankings to conduct the preference completion of the target user, while the collaborative filtering algorithm is a rating-oriented algorithm: it computes user similarity to find a user's neighbors and uses their ratings to generate item predictions.

### 6.1 Datasets

The experiments adopt two forms of datasets to evaluate algorithms' performance.

• One type of dataset is a synthetic one created by the sampler using the Plackett-Luce model in Algorithm 1. The produced synthetic dataset has over 20,000 rankings from agents on a set of 20 alternatives, where the noise term of each realized utility follows a Gumbel distribution.

• The other type of dataset is the Flixster dataset that collects the movie ratings by users with social trust. It has over 8,000,000 ratings on over 2,000 movies. For the experiments, we convert the ratings to rankings, and select over 9,000 rankings on over 50 movies.

### 6.2 Evaluation Metrics

We evaluate the performance on three metrics: (a) prediction error, (b) Spearman correlation coefficient, and (c) Kendall rank correlation coefficient. The first measures the quality of the predicted ranking, and the other two measure the degree of correlation between the predicted ranking and the original one. Please refer to Pearson [27] and Liu et al. [2] for more details.

• Evaluation Metric 1: This evaluation metric estimates the accuracy of the predicted ranking against the original true one,

where M is the maximum of the pairwise error, $Y_{i,j,k} = 1$ means that user i prefers alternative j to alternative k in the predicted ranking, $X_{i,j,k} = 1$ means that user i prefers alternative j to alternative k in the original ranking, and $I^-(v)$ equals 1 when $v < 0$, and 0 otherwise.

• Evaluation Metric 2: The Spearman correlation coefficient measures the difference in position of every alternative between the predicted ranking and the original one, to evaluate the similarity between them. The greater its value, the more precise our predicted ranking. To simplify, we have

where $d_i$ represents the difference in position of alternative i between the predicted ranking and the original one.

• Evaluation Metric 3: The Kendall rank correlation coefficient is very similar to evaluation Metric 2 above, except that it uses the Kendall distance to measure the correlation:

where the symbols in Equation (20) have the same meaning as in evaluation Metric 1, $I_x$ represents the alternative set of the original ranking, and $I_y$ represents the alternative set of the predicted ranking.

$\Phi_{\text{Prediction Error}} = \frac{1}{M} \sum_{i,j,k} I^-(X_{i,j,k}\, Y_{i,j,k})$
(17)
$\Phi_{\text{Spearman CC}} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\, \sqrt{\sum_i (y_i - \bar{y})^2}}$
(18)
$\Phi_{\text{Spearman CC}} = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$
(19)
$\Phi_{\text{Kendall CC}} = 1 - \frac{4 \sum_{i,j,k} I^-(X_{i,j,k}\, Y_{i,j,k})}{|I_x \cap I_y| \cdot (|I_x \cap I_y| - 1)}$
(20)
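The simplified Spearman coefficient of Equation (19) and a Kendall-style coefficient over the shared alternatives can be sketched as follows. We assume 1-based position maps as in Section 2.2 and read the Kendall denominator as $n(n-1)$ over the shared alternative set; the function names are illustrative.

```python
def spearman_cc(pred_pos, orig_pos):
    """Simplified Spearman coefficient, Equation (19):
    1 - 6 * sum(d_i^2) / (n(n^2 - 1)), where d_i is the position
    difference of alternative i between the two rankings."""
    n = len(orig_pos)
    d2 = sum((pred_pos[i] - orig_pos[i]) ** 2 for i in orig_pos)
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_cc(pred_pos, orig_pos):
    """Kendall rank correlation over the shared alternatives:
    1 - 4 * (#discordant pairs) / (n(n - 1))."""
    common = sorted(set(pred_pos) & set(orig_pos))
    n = len(common)
    discordant = sum(
        1
        for a in range(n) for b in range(a + 1, n)
        if (pred_pos[common[a]] - pred_pos[common[b]])
        * (orig_pos[common[a]] - orig_pos[common[b]]) < 0
    )
    return 1 - 4 * discordant / (n * (n - 1))
```

Both coefficients equal 1 for identical rankings and -1 for fully reversed ones, matching the intuition that higher values mean a better prediction.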

### 6.3 Experimental Results on Synthetic Dataset and Flixster Dataset

In this section, we conduct experiments on a synthetic dataset and the Flixster dataset, and present the comparison results of the different approaches under each evaluation metric. The prediction error measures the difference in pairwise preference between the predicted ranking and the original ranking, and the goal is to make it as small as possible, while the Spearman correlation coefficient and the Kendall rank correlation coefficient measure the similarity between the predicted ranking and the original one, so we expect their values to be as high as possible.

(a) Synthetic dataset

• As shown in Figure 4, it is very clear that the prediction error tends to be smaller with the certainty-based algorithm than with the CF algorithm or the majority voting algorithm. In addition, the two ranking-oriented approaches outperform the rating-oriented approach. For one thing, a ranking contains more preference relation information over alternatives than a rating score, and thus it may be easier and more accurate to find the user's neighbors and complete the preference; as a result, the ranking-oriented approaches have a lower prediction error. For another, the comparison between the certainty-based voting algorithm and the majority voting algorithm shows the superiority of the certainty-based one: the preference completion algorithm with certainty considered does reduce the effect of randomness.

• Figure 5(a) shows the performance on the Spearman correlation coefficient. On this evaluation metric, the certainty-based voting algorithm performs better than the other two algorithms. This is because our approach, with the preference space and certainty considered, can filter out those pairwise preferences which have close votes and low certainty, which makes the predicted ranking much more trustworthy.

• Figure 5(b) shows the performance on the Kendall rank correlation coefficient. We can draw a similar conclusion as for the Spearman correlation coefficient in Figure 5(a), so we do not repeat the explanation here.

Figure 4. Prediction error on the synthetic dataset: the x-axis denotes the number of neighbors. For this evaluation metric, smaller values are better.

Figure 5. Performance on the synthetic dataset: the x-axis denotes the number of neighbors. Plots show the Spearman correlation coefficient (Spearman CC) and the Kendall rank correlation coefficient (Kendall CC). For both evaluation metrics, higher values are better.

Roughly speaking, from the experiments on the synthetic dataset, we verify the effectiveness of our proposed certainty-based preference completion algorithm.

(b) Flixster dataset

The performance of the three approaches is examined on a real-world dataset, the Flixster dataset, which contains rating information. Because the proposed algorithm and the majority voting algorithm both use the anchor-kNN algorithm, which needs ranking data instead of rating data, we need to convert the rating data to ranking data first.

• As shown in Figure 6, when the number of neighbors k > 300, our approach outperforms the other two, and the ranking-oriented methods still perform better than the rating-oriented method. When k < 300, however, the result is not as expected. A possible reason is that the process of converting rating data to ranking data inevitably introduces errors in the pairwise preferences; with more neighbors considered, our proposed algorithm shows its superiority, and thus the prediction error decreases as the number of neighbors grows.

• In Figure 7(a), as we can observe, the certainty-based approach outperforms the other two approaches significantly. This shows a consistent result with the experiments on the synthetic dataset.

• Figure 7(b) shows a similar performance to Figure 7(a).

Figure 6. Prediction error on the Flixster dataset: the x-axis denotes the number of neighbors. For this evaluation metric, smaller values are better.

Figure 7. Performance on Flixster dataset: x-axis denotes the number of neighbors. Plots show the Spearman correlation coefficient (Spearman CC) and Kendall rank correlation coefficient (Kendall CC). For both evaluation metrics, higher values are better.

In general, the experiments on both the synthetic dataset and the Flixster dataset validate our proposed certainty-based preference completion algorithm.

Because agents' rankings are nondeterministic, i.e., they may provide their rankings under noisy environments, it is necessary and important to conduct certainty-based preference completion. Hence, in this paper, a bijection has first been built from the ranking space to the preference space for alternative pairs, and the certainty and conflict of each pair have been evaluated with a well-founded statistical measurement, the Probability-Certainty Density Function. Then, a certainty-based voting algorithm built on this certainty and conflict has been adopted to conduct the preference completion; more specifically, the proposed algorithm selects rankings with high certainty and low conflict to complete the preferences. Moreover, the properties of the proposed certainty and conflict have been studied empirically, and the proposed approach has been experimentally validated against state-of-the-art approaches on several datasets.

In real applications, data is usually unbalanced [28]: some alternative pairs have many rankings, while others have only a few. In our future work, we will propose algorithms that handle unbalanced preference completion both effectively and efficiently.

All authors, including L. Li (lilei@hfut.edu.cn), M.H. Xue (18856337539@163.com), Z. Zhang (zanzhang@hfut.edu.cn), H.H. Chen (hchen@ustc.edu.cn), and X.D. Wu (xwu@hfut.edu.cn), took part in writing the paper. In addition, L. Li designed the algorithm and experiments, and provided the funding; M.H. Xue designed and conducted the experiments, and analyzed the data; Z. Zhang analyzed the data.

This work has been supported by the National Natural Science Foundation of China (No. 62076087, No. 61906059 & No. 62120106008) and the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education of China under grant IRT17R32.

The first author would like to thank his wife Jun Zhang, his parents and friends during his fight with lung adenocarcinoma. “I leave no trace of wings in the air, but I am glad I have had my flight.”

Common voting rules include positional scoring rules, maximin, and Bucklin. For more details, please refer to [21].
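As an illustration of the positional scoring rules mentioned in the note, a minimal sketch of the Borda count (the alternative names and votes below are hypothetical):

```python
def borda_winner(rankings):
    """Borda count, a positional scoring rule: with m alternatives, the
    alternative at position i (0-indexed, best first) scores m - 1 - i;
    the alternative with the highest total score wins."""
    scores = {}
    for ranking in rankings:
        m = len(ranking)
        for pos, alt in enumerate(ranking):
            scores[alt] = scores.get(alt, 0) + (m - 1 - pos)
    return max(scores, key=scores.get)

# Three voters over alternatives a, b, c:
votes = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(borda_winner(votes))  # a  (scores: a=5, b=3, c=1)
```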

[1] Li, L., et al.: Weighted partial order oriented three-way decisions under score-based common voting rules. International Journal of Approximate Reasoning 123, 41–54 (2020)
[2] Liu, T.: Learning to rank for information retrieval. Springer, Berlin (2011)
[3] Liu, F., et al.: Deep learning for community detection: Progress, challenges and opportunities. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI 2020), pp. 4981–4987 (2020)
[4] Su, X., et al.: A comprehensive survey on community detection with deep learning. arXiv preprint arXiv:2105.12584 (2021)
[5] Ma, X., et al.: A comprehensive survey on graph anomaly detection with deep learning. arXiv preprint arXiv:2106.07178 (2021)
[6] Katz-Samuels, J., Scott, C.: Nonparametric preference completion. In: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2018), pp. 632–641 (2018)
[7] Liu, N., Yang, Q.: EigenRank: A ranking-oriented approach to collaborative filtering. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 83–90 (2008)
[8] Liu, A., et al.: Near-neighbor methods in random preference completion. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 4336–4343 (2019)
[9] Yao, J.: Three-way granular computing, rough sets, and formal concept analysis. International Journal of Approximate Reasoning 116, 106–125 (2020)
[10] Li, L., Wang, Y.: Context based trust normalization in service-oriented environments. In: Proceedings of the IEEE Conference on Autonomic and Trusted Computing, pp. 122–138 (2010)
[11] Hallinan, J.T.: Why we make mistakes. Portland (2010)
[12] Luce, R.: Individual choice behavior: A theoretical analysis. Dover Publications, New York (1959)
[13] Plackett, R.: The analysis of permutations. Applied Statistics 24, 193–202 (1975)
[14] Liu, A., et al.: Learning Plackett-Luce mixtures from partial preferences. In: Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2019), pp. 4328–4335 (2019)
[15] Jøsang, A.: A subjective metric of authentication. In: Proceedings of the 5th European Symposium on Research in Computer Security (ESORICS 98), pp. 329–344 (1998)
[16] Li, L., Wang, Y.: Subjective trust inference in composite services. In: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2010), pp. 1377–1384 (2010)
[17] Wang, Y., Singh, M.P.: Evidence-based trust: A mathematical model geared for multiagent systems. ACM Transactions on Autonomous and Adaptive Systems 5(4), Article No. 14 (2010)
[18] Yao, Y.: Three-way decision: An interpretation of rules in rough set theory. In: Proceedings of the 4th International Conference on Rough Sets and Knowledge Technology (RSKT 2009), pp. 642–649 (2009)
[19] Chen, H., Tiño, P., Yao, X.: Probabilistic classification vector machines. IEEE Transactions on Neural Networks 20(6), 901–914 (2009)
[20] Chen, H., Tiño, P., Yao, X.: Predictive ensemble pruning by expectation propagation. IEEE Transactions on Knowledge and Data Engineering 21(7), 999–1013 (2009)
[21] M.S., et al.: Bayesian reliability. Springer, Berlin (2008)
[22] Hines, W.W., et al.: Probability and statistics in engineering. John Wiley & Sons, Hoboken (2003)
[23] Chen, H., Tiño, P., Yao, X.: Efficient probabilistic classification vector machine with incremental basis function selection. IEEE Transactions on Neural Networks and Learning Systems 25(2), 356–369 (2014)
[24] Jiang, B., et al.: Scalable graph-based semi-supervised learning through sparse Bayesian model. IEEE Transactions on Knowledge and Data Engineering 29(12), 2758–2771 (2017)
[25] Cohen, W.W., Schapire, R.E., Singer, Y.: Learning to order things. Journal of Artificial Intelligence Research 5, 243–270 (1999)
[26] Goldberg, D., et al.: Using collaborative filtering to weave an information tapestry. Communications of the ACM 35(12), 61–70 (1992)
[27] Fieller, E., Herman, H., Pearson, E.: Tests for rank correlation coefficients. I. Biometrika 44(3/4), 470–481 (1957)
[28] Gong, Z., Chen, H.: Model-based oversampling for imbalanced sequence classification. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM'16), pp. 1009–1018 (2016)
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.