Certainty-based Preference Completion

Abstract As it is often impractical to ask agents to provide linear orders over all alternatives, preference completion is necessary for the resulting partial rankings. Specifically, the personalized preference of each agent over all the alternatives can be estimated from the partial rankings of neighboring agents over subsets of alternatives. However, since the agents' rankings are nondeterministic and may be provided with noise, it is necessary and important to conduct certainty-based preference completion. Hence, in this paper, for alternative pairs with the obtained ranking set, a bijection is first built from the ranking space to the preference space, and the certainty and conflict of alternative pairs are evaluated with a well-built statistical measurement, the Probability-Certainty Density Function, on subjective probability. Then, a certainty-based voting algorithm based on certainty and conflict is used to conduct the certainty-based preference completion. Moreover, the properties of the proposed certainty and conflict are studied empirically, and the proposed approach to certainty-based preference completion for partial rankings is experimentally validated against state-of-the-art approaches on several datasets.


INTRODUCTION
In a preference completion problem, with a set of agents (users) and a set of alternatives (items), each agent has a partial ranking over a subset of alternatives, and the goal is to infer each agent's personalized ranking or preference over all the alternatives, including those alternatives the agent has not yet handled. It is often impractical to ask agents to provide linear orders over all alternatives, especially in big data environments [1]. For example, the agent may not know the status of some alternatives because there are too many of them, which makes it hard for the agent to rank all of them. Or some alternatives may be incomparable for a certain agent. All these situations result in partial rankings, and it is necessary to introduce preference completion.
Preference completion has been applied in many areas, such as social choice and recommender systems [2], and can be very useful in community detection [3,4] or graph anomaly detection [5]. For example, in social choice, each voter (agent) can cast a ballot as a ranking over all candidates (alternatives), or as a partial ranking over some candidates. For these partial rankings, it is necessary to form a ranking over all candidates by a certain voting rule. In a recommender system, each user can rate some items, and the task of the system is to predict the ratings of the items that the user has not rated. To satisfy this requirement, two common approaches, the matrix factorization approach and the neighborhood-based approach, have been introduced to handle preference completion. The traditional algorithms in these two approaches are usually rating-oriented, while a recent line of work focuses on ranking-oriented algorithms [6,7] due to the drawbacks of the rating-oriented algorithms. In this paper, we focus on the ranking-oriented neighborhood-based approach.
Traditionally, neighborhood-based preference completion first finds the near neighbors of each agent and then aggregates these neighbors' rankings to produce the predicted preference by a certain voting rule [6]. However, this task has some inevitable issues. For example, an agent may exhibit irrational behaviors or provide rankings in a noisy setting. To address this issue, many rating-oriented trust-based approaches have been proposed with additional contextual information. Meanwhile, the ranking-oriented approach has left much room for further research. Liu et al. [8] proposed an anchor-based algorithm that leverages many other agents' ranking information to reduce the effect of randomness.
In this paper, a certainty-based preference completion algorithm is proposed on the basis of the work of Liu et al. [8]. More precisely, after finding the k-nearest neighbors by the anchor-kNN algorithm of [8], we use the certainty-based voting algorithm introduced in this paper to complete the preference (ranking) instead of using the traditional majority voting rule. The traditional majority voting rule tends to cause wrong judgments, especially when both sides have close votes. In this case, even slight randomness can change the outcome of the majority voting rule. For this reason, this paper introduces a certainty-based voting algorithm to deal with this problem. Importantly, when we take a vote on two alternatives, the certainty, which measures the degree to which the two alternatives can be preferred or compared, should be introduced. Only when the certainty value satisfies a defined threshold do we go further and make a three-way preference decision instead of simply assigning 0 or 1 to the two alternatives. Hence, the certainty-based voting algorithm avoids wrong judgments when both sides have close scores or when rankings are made in a noisy setting. In this paper, before formulating the certainty and presenting the certainty-based preference completion algorithm, we first consider the certainty and preference space to introduce the three-way preference between two alternatives.

Technically, in a ranking pool gathered from agents, the rankings that include alternative pair A and B can be aggregated to form the preference between A and B. Mathematically, a bijection can be built from the ranking space to the preference space for alternative pair A and B. Here, the ranking space consists of all the partial rankings on A and B from agents, while the preference space consists of the three-way preference between A and B, which includes preference ($P^{+}_{AB}$), dispreference ($P^{-}_{AB}$), and uncertainty ($C^{-}_{AB}$). It is obvious that when the uncertainty $C^{-}_{AB}$ is low, the preference between A and B can be determined, i.e., A and B are preferable. Hence, the certainty of preference can be introduced to describe the trustworthiness of the preference, which is denoted as $C^{+}_{AB}$, and it can be calculated as $C^{+}_{AB} = 1 - C^{-}_{AB}$. The certainty of preference can be taken as the subjective probability of the preference, following the proposition that the certainty is the degree of belief that an individual has in the preference [10]. Hence, in this paper, the certainty is evaluated based on a well-built statistical measurement, which defines a bijection from the ranking space to the preference space, enabling the estimation of the pairwise preference from neighbors' partial rankings by mapping them to (preference, dispreference, uncertainty).

Our main contributions in this paper can be summarized as follows:
• As pointed out in [11], it is necessary and important to introduce the certainty and conflict of the preference between alternative pairs, and at times the certainty and conflict of the preference are more important than the preference itself. In this paper, a probability-based certainty and conflict are introduced under Properties 1 & 2 to describe the trustworthiness of the preference.
• A certainty-based voting algorithm using the certainty and conflict is proposed for conducting the certainty-based preference completion in nondeterministic settings.
• We empirically study the properties of the proposed approach, and experimentally validate it against state-of-the-art approaches on several datasets.
This paper is organized as follows. Section 2 reviews existing works on the Plackett-Luce model, the Kendall-Tau distance and the anchor-kNN algorithm. Section 3 builds a bijection from the ranking space to the preference space, and evaluates the certainty and conflict of alternative pairs based on a well-built statistical measurement. Section 4 presents a certainty-based voting algorithm that conducts the preference completion with the certainty and conflict. Section 5 empirically studies the properties of the proposed certainty and conflict. Section 6 experimentally validates the proposed approach against state-of-the-art approaches on several datasets. Finally, Section 7 summarizes this paper and presents future work.

Plackett-Luce Model
Given a set of m alternatives and a set of n agents, let $y = (y_1, y_2, \ldots, y_m)$ denote the latent features of the alternatives and $x = (x_1, x_2, \ldots, x_n)$ denote the latent features of the agents. Agent i's ranking $R_i$ is determined by a statistical model for ranking data. As a widely used statistical model, the Plackett-Luce model [12,13] is adopted to generate the rankings of agents. In this paper, each alternative is assigned a positive value named utility. The greater this utility is, the more likely its corresponding alternative is ranked at a higher position [14]. In [14], the realized utility of alternative j for agent i is determined by $u_{i,j} = h(x_i, y_j) + e_{i,j}$, where $h(x_i, y_j)$ is agent i's expected utility on alternative j, determined by the closeness of the latent features $x_i$ and $y_j$, and $e_{i,j}$ is a zero-mean independent random variable that follows a Gumbel distribution. Once the realized utilities $u_i = (u_{i,1}, u_{i,2}, \ldots, u_{i,m})$ of agent i are obtained, agent i ranks the alternatives in decreasing order of realized utility. After repeating this n times, synthetic datasets covering all the agents can be generated for experiments. For more details, please refer to Algorithm 1.
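The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1: the closeness function `expected_utility` (here, the exponential of the negative Euclidean distance) and all function names are assumptions introduced for this sketch.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of the standard Gumbel distribution


def gumbel_noise(rng):
    """Zero-mean Gumbel sample: a standard Gumbel draw shifted by its mean."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u)) - EULER_GAMMA


def expected_utility(x, y):
    # ASSUMPTION: closeness is exp(-Euclidean distance); the source formula is not available.
    return math.exp(-math.dist(x, y))


def sample_ranking(x_i, alternatives, rng):
    """Return alternative indices sorted by decreasing realized utility."""
    utilities = [expected_utility(x_i, y_j) + gumbel_noise(rng) for y_j in alternatives]
    return sorted(range(len(alternatives)), key=lambda j: utilities[j], reverse=True)


def sample_dataset(agents, alternatives, seed=0):
    """One full ranking per agent, reproducible for a given seed."""
    rng = random.Random(seed)
    return [sample_ranking(x_i, alternatives, rng) for x_i in agents]
```

Partial rankings can then be obtained by keeping only a subset of each sampled ranking.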

Kendall-Tau Distance
Given two agents' rankings $R_1$ and $R_2$ over the same alternatives, the Kendall-Tau distance can be introduced to measure the similarity of $R_1$ and $R_2$; it is the total number of disagreements in pairwise comparisons between alternatives in the linear rankings. For alternative j in $R_i$, $R_i(j)$ represents its position in $R_i$. For example, if j is the top-ranked alternative in $R_i$, then $R_i(j) = 1$. The normalized Kendall-Tau distance between $R_1$ and $R_2$ is
$NK(R_1, R_2) = \frac{1}{\binom{m}{2}} \sum_{j < k} I\big((R_1(j) - R_1(k))(R_2(j) - R_2(k)) < 0\big)$,
where m is the number of alternatives and $I(v)$ is an indicator that equals 1 if the argument v is true and 0 otherwise.
Moreover, if the rankings do not share exactly the same alternatives, the intersection of the two alternative sets can be taken for computing the normalized Kendall-Tau distance.
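As an illustration, the normalized Kendall-Tau distance, including the restriction to shared alternatives, might be computed as below. The function name and the list-based representation of a ranking (best alternative first) are assumptions of this sketch, as is the convention of returning 0 when fewer than two alternatives are shared.

```python
from itertools import combinations


def normalized_kendall_tau(r1, r2):
    """Fraction of shared alternative pairs on which r1 and r2 disagree.

    r1, r2: lists of alternatives, best first. Only alternatives present in
    both rankings are compared.
    """
    pos1 = {a: p for p, a in enumerate(r1, start=1)}
    pos2 = {a: p for p, a in enumerate(r2, start=1)}
    shared = [a for a in r1 if a in pos2]
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0  # ASSUMPTION: nothing to compare counts as distance 0
    disagreements = sum(
        1 for j, k in pairs
        if (pos1[j] - pos1[k]) * (pos2[j] - pos2[k]) < 0
    )
    return disagreements / len(pairs)
```

For example, `normalized_kendall_tau(["a", "b", "c"], ["a", "c", "b"])` compares three pairs and finds one disagreement (b versus c).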

Anchor-kNN Algorithm
Before introducing the anchor-kNN algorithm proposed in [8], we first present the idea of KT-kNN, which simply uses the Kendall-Tau distance to find the agent's neighbors. If the Kendall-Tau distance between two rankings $R_i$ and $R_j$ is small, the latent features $x_i$ and $x_j$ of the agents should be close, i.e., the two agents have a similar opinion on alternatives.
As the KT-kNN algorithm does not consider that agents' preferences may be nondeterministic or that agents' rankings may be made in a noisy setting, anchor-kNN, unlike KT-kNN, uses other agents' (named anchors) ranking data to determine the closeness of two agents rather than considering the two agents' rankings only. Anchor-kNN develops a feature $F_{i,j}$ for agents i and j to represent the Kendall-Tau distance between $R_i$ and $R_j$, i.e., $F_{i,j} = NK(R_i, R_j)$. Then, to measure the closeness of two agents, denoted as $D_{i,j}$, we use the sum of the differences between $F_{i,t}$ and $F_{j,t}$ to find the k-nearest neighbors, where t ranges over all the other agents except agents i and j.
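The anchor-based neighbor search can be sketched as follows, assuming the feature matrix F of pairwise normalized Kendall-Tau distances has already been computed. Taking the absolute value of each difference is an assumption of this sketch, and the function names are hypothetical.

```python
def closeness(i, j, F):
    """D_{i,j}: sum over anchors t (t != i, j) of |F[i][t] - F[j][t]|.

    ASSUMPTION: differences are taken in absolute value.
    """
    return sum(abs(F[i][t] - F[j][t]) for t in range(len(F)) if t != i and t != j)


def anchor_knn(i, F, k):
    """Indices of the k agents closest to agent i under the anchor-based closeness."""
    others = [j for j in range(len(F)) if j != i]
    return sorted(others, key=lambda j: closeness(i, j, F))[:k]
```

Two agents are considered close when they are at similar Kendall-Tau distances from the same anchors, even if their own two rankings are noisy.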

CERTAINTY AND PREFERENCE SPACE
In this section, let us present some preliminary definitions first. For an arbitrary alternative pair A and B, the certainty can be adopted to describe the trustworthiness of the preference between A and B. Technically, following [15], a Probability-Certainty Density Function (PCDF) can be introduced to capture the subjective probability of the ranking. However, unlike [15], and following [16] and [17], in this paper certainty is defined based on the PCDF to satisfy Properties 1 & 2.

Ranking Space
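The ranking space consists of all the weighted partial rankings on the alternative pair A and B from agents, including the rankings where A is ranked ahead of B, the rankings where B is ranked ahead of A, and the rankings where A and B are not comparable.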

Preference Space
Traditionally, the uncertainty is usually ignored, and sometimes dispreference is not taken into account either, which leads to some disturbing results shown in the empirical study section. According to the trisecting and acting models of human cognitive behaviors [9,18], the preference space consists of the three-way preference between alternatives, which includes preference, dispreference, and uncertainty.

Certainty of Rankings in Alternative Pairs
Bayesian inference [19,20] is adopted here to update the probability with the available contextual information about the rankings in alternative pairs, i.e., to update the prior distribution to the posterior distribution [21,22]. Offline Bayesian inference is used in this paper, although Bayesian inference can also be applied to online/streaming scenarios [23,24]. Let $x_{AB}$, $x_{BA}$ and $x_{\overline{AB}}$ be the probabilities of the three kinds of rankings on A and B, and let $X = \langle x_{AB}, x_{BA} \rangle$. In addition, $x_{AB} + x_{BA} + x_{\overline{AB}} = 1$. Without any additional information, the prior distribution f(X|O) is a uniform distribution. As the cumulative probability of a distribution within [0,1] equals 1, the density of a PCDF has the mean value 1 within [0,1], and this makes f(X|O) = 1.
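As the ranking sample O conforms to a multinomial distribution [16,22], the probability of O given X is proportional to $x_{AB}^{n_{AB}} \, x_{BA}^{n_{BA}} \, x_{\overline{AB}}^{n_{\overline{AB}}}$. As for the posterior distribution f(O|X), it can be estimated as [16,22]
$f(O|X) = \frac{x_{AB}^{n_{AB}} \, x_{BA}^{n_{BA}} \, x_{\overline{AB}}^{n_{\overline{AB}}}}{\int x_{AB}^{n_{AB}} \, x_{BA}^{n_{BA}} \, x_{\overline{AB}}^{n_{\overline{AB}}} \, dX}$.
Then, the certainty can be determined by the deviation of the posterior distribution from the prior distribution, i.e., the uniform distribution. Hence, we have the following definition about certainty. The certainty $C^{+}_{AB}$ based on $\langle n_{AB}, n_{BA}, n_{\overline{AB}} \rangle$ can be estimated as
$C^{+}_{AB} = \frac{1}{2} \int \left| f(O|X) - 1 \right| dX$,
where the factor $\frac{1}{2}$ removes the double counting of the deviations.
From this definition, we have $C^{+}_{AB} = C^{+}_{BA}$.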
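To make the idea of certainty as deviation from a uniform prior concrete, the following sketch evaluates it numerically in a simplified two-outcome setting. It is an assumption made for illustration only: incomparable rankings are ignored, so the posterior reduces to a Beta density over a single probability, which is not the three-outcome definition used in this paper. The function name and the midpoint integration rule are likewise choices of this sketch.

```python
import math


def certainty(n_ab, n_ba, steps=20000):
    """Half the integrated absolute deviation of the posterior from the uniform prior.

    ASSUMPTION: two-outcome simplification (A ahead of B, or B ahead of A);
    rankings where A and B are not comparable are ignored.
    """
    # log of the normalizing constant B(n_ab + 1, n_ba + 1)
    log_norm = (
        math.lgamma(n_ab + 1) + math.lgamma(n_ba + 1) - math.lgamma(n_ab + n_ba + 2)
    )
    total = 0.0
    for s in range(steps):
        x = (s + 0.5) / steps  # midpoint rule keeps x strictly inside (0, 1)
        density = math.exp(n_ab * math.log(x) + n_ba * math.log(1.0 - x) - log_norm)
        total += abs(density - 1.0)
    return 0.5 * total / steps
```

In this simplified setting the value grows when more rankings arrive at a fixed ratio (compare `certainty(2, 2)` with `certainty(1, 1)`) and shrinks when the same number of rankings is split more evenly (compare `certainty(1, 1)` with `certainty(2, 0)`), which mirrors Properties 1 & 2.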

Conflict of Rankings in Alternative Pairs
The conflict can be determined by the relative difference between the weighted rankings $n_{AB}$ and $n_{BA}$, as in [17]. More specifically,
• the conflict is largest when $n_{AB} = n_{BA}$;
• the conflict is smallest when $n_{AB} = 0$ or $n_{BA} = 0$.
Hence, we have the following definition about conflict. The conflict $c_{AB}$ based on the weighted rankings can be estimated as
$c_{AB} = \min\left(\frac{n_{AB}}{n_{AB} + n_{BA}}, \frac{n_{BA}}{n_{AB} + n_{BA}}\right)$.
From this definition, we have $c_{AB} = c_{BA}$.
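A measure with the stated behavior (largest when the two weighted counts are equal, smallest when one of them is zero, and symmetric in A and B) can be sketched as below. The specific form, the smaller of the two shares of comparable rankings, is an assumption of this sketch, as is the convention for the case with no comparable rankings.

```python
def conflict(n_ab, n_ba):
    """Smaller of the two shares of comparable rankings; ranges from 0 to 0.5.

    ASSUMPTION: this min-of-shares form is used for illustration only.
    """
    total = n_ab + n_ba
    if total == 0:
        return 0.0  # ASSUMPTION: no comparable rankings means no conflict
    return min(n_ab, n_ba) / total
```

With this form, `conflict(5, 5)` is the maximum 0.5 and `conflict(0, 7)` is 0.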

Bijection from Ranking Space to Preference Space
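With Definitions 1, 2, 3 and 4, the following definition can be introduced, which maps the ranking space for alternative pair A and B to the preference space (preference, dispreference, uncertainty).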

CERTAINTY-BASED PREFERENCE COMPLETION
This section proposes the certainty-based preference completion approach. The framework of our approach is shown in Figure 1. It includes two processes. One is to find the k-nearest neighbors for user i with the anchor-kNN algorithm proposed by Liu et al. [8]. The other is to produce a linear ranking for user i over all alternatives. In this section, we focus on the latter. With the neighbors' partial rankings, a certainty-based voting algorithm is introduced to estimate the pairwise preference for all alternative pairs, and then these pairwise preferences form a linear ranking for user i. In this voting algorithm, $e_1$ and $e_2$ are thresholds to rule out the fuzziness of comparison.

Certainty-based Voting Algorithm
In existing work, with the rankings of neighbors obtained by the k-nearest neighbors algorithm, common voting rules, such as majority voting, can be used to estimate pairwise preference for conducting the preference completion. Common voting rules also include positional scoring rules, maximin, and Bucklin. For more details, please refer to [21].

In contrast, in this paper, we use a certainty-based voting rule with certainty and conflict to obtain pairwise preference. The certainty and conflict measure the trustworthiness with which the pair of alternatives can be preferred or compared. If the certainty satisfies a defined threshold, we can then evaluate the degree to which user i prefers one to another, denoted by $P^{+}_{AB}$ and $P^{-}_{AB}$. Then, only if the difference between the two-way preferences reaches a given value do we make a preference decision on the two alternatives. Technically, for the alternative pair A and B with thresholds $e_1$ and $e_2$, a preference decision between A and B can be made. The process for estimating pairwise preference is also shown in Algorithm 2. We apply this algorithm to all alternative pairs to obtain all the pairwise preferences.
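The three-way decision described above can be sketched as follows, taking the triple (preference, dispreference, uncertainty) for a pair as given. The function name, the return encoding, and the exact comparison operators are assumptions of this sketch rather than a transcription of Algorithm 2.

```python
def pairwise_decision(p_plus, p_minus, c_minus, e1, e2):
    """Three-way decision for an alternative pair (A, B).

    Returns 1 if A is preferred, -1 if B is preferred, 0 if the pair is
    left unpreferred. ASSUMPTION: thresholds are applied with >=.
    """
    if c_minus >= e1:  # too uncertain to decide at all
        return 0
    if p_plus - p_minus >= e2:  # clear margin in favor of A
        return 1
    if p_minus - p_plus >= e2:  # clear margin in favor of B
        return -1
    return 0  # close vote: refuse to decide
```

Unlike plain majority voting, a close vote such as `pairwise_decision(0.45, 0.40, 0.15, e1=0.5, e2=0.2)` yields 0 rather than a forced winner.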

Greedy Order Algorithm
Next, let us combine all the pairwise preferences to form a linear ranking over all alternatives. One possible approach is the greedy order algorithm [25]. This algorithm follows a greedy idea: it always picks the alternative that currently has the maximum potential value in the alternatives pool I and ranks it above all the other remaining items. Here, for item i, the potential value $v_i$ aggregates the pairwise preferences that involve item i.
Algorithm 3. Greedy order algorithm.
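A greedy ordering of this kind can be sketched as below. The potential value used here, the sum over remaining items of pref(i, j) minus pref(j, i), is an assumption of this sketch, as are the dictionary representation of pairwise preferences and the alphabetical tie-breaking (which requires sortable item labels).

```python
def greedy_order(items, pref):
    """Order items by repeatedly picking the one with the largest potential value.

    pref[(a, b)] > 0 means a is preferred to b; missing pairs count as 0.
    ASSUMPTION: v_i = sum_j pref(i, j) - pref(j, i) over items still in the pool.
    """
    pool = set(items)
    value = {
        i: sum(pref.get((i, j), 0.0) - pref.get((j, i), 0.0) for j in pool if j != i)
        for i in pool
    }
    ranking = []
    while pool:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(pool), key=lambda i: value[i])
        ranking.append(best)
        pool.remove(best)
        for i in pool:  # remove the picked item's contribution
            value[i] -= pref.get((i, best), 0.0) - pref.get((best, i), 0.0)
    return ranking
```

For instance, with `pref = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1}` the items are picked in the order a, b, c.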

EMPIRICAL STUDIES ON PROPERTIES OF CERTAINTY
In this section, we study the properties of certainty and conflict in our proposed model.

Increasing Rankings with Fixed Conflict
Then, as in [17], the resulting expression involves $x_1$, $x_2$, $x_3$, and $x_4$, which are functions of b. Following Lemma 9 in [17], and with Equation (13), this confirms the results of Theorem 1.

THEOREM 2. For fixed $n_{\overline{AB}}$, the certainty $C^{+}_{AB}$ is decreasing in $n_{AB}$ when $n_{AB} \le n_{BA}$, and increasing when $n_{AB} \ge n_{BA}$.
Proof: The details of the validation process are omitted here, as they are similar to those in the proof of Theorem 1. More specifically, by removing the absolute value sign and then differentiating, it can be proved that the derivative is negative for $n_{AB} \le n_{BA}$, and positive for $n_{AB} \ge n_{BA}$.

EXPERIMENTS
In this section, we examine the empirical performance of the certainty-based preference completion algorithm. In the experiments, we compare our certainty-based preference completion algorithm with the common majority voting algorithm [8] and the classic collaborative filtering algorithm (CF) [26]. Both our certainty-based preference completion algorithm and the majority voting algorithm use the anchor-kNN algorithm to find the k-nearest neighbors' rankings and use these rankings to conduct the preference completion of the target user. The collaborative filtering algorithm, in contrast, is a rating-oriented algorithm. It computes user similarity to find a user's neighbors, and uses their ratings to generate item predictions.

Datasets
The experiments adopt two forms of datasets to evaluate the algorithms' performance.
• One type of dataset is the synthetic one created by the sampler using a Plackett-Luce model with Algorithm 1. The produced synthetic dataset has over 20,000 rankings from agents on the set of 20 alternatives. The noise in each ranking follows a Gumbel distribution.
• The other type of dataset is the Flixster dataset, which collects movie ratings by users with social trust. It has over 8,000,000 ratings on over 2,000 movies. For the experiments, we convert the ratings to rankings, and select over 9,000 rankings on over 50 movies.

Evaluation Metrics
We evaluate the performance on three metrics: (a) prediction error, (b) Spearman correlation coefficient, and (c) Kendall rank correlation coefficient. The first one measures the quality of the predicted ranking, and the others measure the degree of correlation between the predicted ranking and the original one. Please refer to Pearson [27] and Liu et al. [2] for more details.
• Evaluation Metric 1: This evaluation metric estimates the accuracy of the predicted ranking against the original true one. In this metric, M is the maximum of the pairwise error, $Y_{i,j,k} = 1$ means that user i prefers alternative j to alternative k in the predicted ranking, and $X_{i,j,k} = 1$ means that user i prefers alternative j to alternative k in the original ranking. $I^{-}(v)$ equals 1 when v < 0, and equals 0 otherwise.
• Evaluation Metric 2: The Spearman correlation coefficient measures the difference in position of every alternative between the predicted ranking and the original one to evaluate the similarity between the two rankings. The greater its value, the more precise our predicted ranking. In simplified form, we have $\rho = 1 - \frac{6 \sum_{i} d_i^2}{m(m^2 - 1)}$, where m is the number of alternatives and $d_i$ represents the difference in position of alternative i between the predicted ranking and the original one.
• Evaluation Metric 3: The Kendall rank correlation coefficient is very similar to Evaluation Metric 2, except that it uses the Kendall distance to measure the correlation. The symbols in Equation (20) have the same meaning as in Evaluation Metric 1, $I_x$ represents the set of alternatives in the original ranking, and $I_y$ represents the set of alternatives in the predicted ranking.
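The two correlation metrics can be sketched as follows for a pair of complete rankings over the same alternatives. These are the textbook Spearman coefficient (without ties) and Kendall tau-a; they are given as an illustration and are not claimed to match the exact form of the paper's equations. The function names and the list representation (best alternative first, at least two alternatives) are assumptions.

```python
from itertools import combinations


def spearman(original, predicted):
    """Spearman correlation for two tie-free rankings of the same m >= 2 alternatives."""
    m = len(original)
    pos = {a: p for p, a in enumerate(predicted)}
    d2 = sum((p - pos[a]) ** 2 for p, a in enumerate(original))
    return 1.0 - 6.0 * d2 / (m * (m * m - 1))


def kendall_tau(original, predicted):
    """(concordant pairs - discordant pairs) / total pairs, i.e. Kendall tau-a."""
    pos_o = {a: p for p, a in enumerate(original)}
    pos_p = {a: p for p, a in enumerate(predicted)}
    pairs = list(combinations(original, 2))
    score = sum(
        1 if (pos_o[j] - pos_o[k]) * (pos_p[j] - pos_p[k]) > 0 else -1
        for j, k in pairs
    )
    return score / len(pairs)
```

Both functions return 1 for identical rankings and -1 for fully reversed rankings.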

Experimental Results on Synthetic Dataset and Flixster Dataset
In this section, we conduct the experiments on a synthetic dataset and the Flixster dataset, and present the comparison results of the different approaches under each evaluation metric. The prediction error measures the difference in pairwise preference between the predicted ranking and the original ranking; the goal is to reduce it as far as possible. The Spearman correlation coefficient and the Kendall rank correlation coefficient measure the similarity between the predicted ranking and the original ranking; higher values on these two metrics are better.

(a) Synthetic dataset
• As shown in Figure 4, the prediction error tends to be smaller with the certainty-based algorithm than with the CF algorithm and the majority voting algorithm. In addition, the two ranking-oriented approaches outperform the rating-oriented approach. For one thing, a ranking contains more preference relation information over alternatives than a rating score, and thus it may be easier and more accurate for finding the user's neighbors and completing the preference. As a result, the ranking-oriented approach has a lower prediction error. For another, the comparison between the certainty-based voting algorithm and the majority voting algorithm shows the superiority of the certainty-based one. The preference completion algorithm with certainty considered does reduce the effect of randomness.
• Figure 5(a) shows the performance on the Spearman correlation coefficient. On this evaluation metric, the certainty-based voting algorithm performs better than the other two algorithms. This is because our approach, with preference space and certainty considered, can filter out those pairwise preferences which have close votes and lower certainty. This behavior makes the predicted ranking much more trustworthy.
• Figure 5(b) shows the performance on the Kendall rank correlation coefficient. The conclusion is similar to that for the Spearman correlation coefficient in Figure 5(a), so we do not repeat the explanation here.
Overall, the experiments on the synthetic dataset verify the effectiveness of our proposed certainty-based preference completion algorithm.

CONCLUSION AND FUTURE WORK
Because the agents' rankings are nondeterministic and may be provided under noisy environments, it is necessary and important to conduct certainty-based preference completion. Hence, in this paper, a bijection is first built for alternative pairs from the ranking space to the preference space, and the certainty and conflict are evaluated based on a well-built statistical measurement, the Probability-Certainty Density Function. Then, a certainty-based voting algorithm based on the certainty and conflict is used to conduct the preference completion. More specifically, the proposed algorithm obtains rankings with high certainty and low conflict for the preference completion. Moreover, the properties of the proposed certainty and conflict are studied empirically, and the proposed approach is experimentally validated against state-of-the-art approaches on several datasets.
In real applications, the data is usually unbalanced [28], i.e., some alternative pairs have a lot of rankings, while others only have a few. In our future work, we will propose algorithms to handle unbalanced preference completion both effectively and efficiently.

• preference (prefer A to B, denoted as $P^{+}_{AB}$),
• dispreference (prefer B to A, denoted as $P^{-}_{AB}$), and
• uncertainty (no preference between A and B, denoted as $C^{-}_{AB}$),
according to the trisecting and acting models of human cognitive behaviors [1, 9]. Thus, the following three situations are distinguished:
• The agents prefer alternative A to alternative B, which can be confirmed by high preference $P^{+}_{AB}$, low dispreference $P^{-}_{AB}$, and low uncertainty $C^{-}_{AB}$.
• The agents prefer alternative B to alternative A, which can be confirmed by low $P^{+}_{AB}$, high $P^{-}_{AB}$, and low $C^{-}_{AB}$.
• The agents are uncertain about the preference between alternative pair A and B, i.e., A and B are unpreferred, which can be confirmed by low $P^{+}_{AB}$, low $P^{-}_{AB}$, and high $C^{-}_{AB}$.

Our definition of certainty should capture the following key properties:
− Property 1: Certainty $C^{+}_{AB}$ increases as the number of rankings between alternative pair A and B increases for a fixed ratio of rankings from A to B and rankings from B to A.
− Property 2: Certainty $C^{+}_{AB}$ decreases as the extent of conflict increases in the partial rankings between alternative pair A and B.

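• the rankings $O^{(k)}_{AB}$ where A is ranked ahead of B with weight $w^{(k)}_{AB}$,
• the rankings $O^{(k)}_{BA}$ where B is ranked ahead of A with weight $w^{(k)}_{BA}$, and
• the rankings $O^{(k)}_{\overline{AB}}$ where A and B are not comparable with weight $w^{(k)}_{\overline{AB}}$,
where $n_{AB}$, $n_{BA}$ and $n_{\overline{AB}}$ denote the accumulated weights of the respective rankings.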

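First, the following conclusions can be obtained:
• if the uncertainty $C^{-}_{AB} \ge e_1$, alternatives A and B are unpreferred;
• otherwise,
- if $P^{+}_{AB} - P^{-}_{AB} \ge e_2$, user i prefers A to B;
- if $P^{-}_{AB} - P^{+}_{AB} \ge e_2$, user i prefers B to A;
- otherwise, A and B are unpreferred.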

Algorithm 2. Certainty-based voting algorithm for estimating pairwise preference.
The potential value aggregates all the pairwise preferences obtained in the previous subsection and represents the preference for item i among all the neighbors' rankings. The algorithm then deletes the picked item from the alternatives pool and updates the potential values of the remaining items by removing the effects of the picked one. The picking process is repeated until the alternatives pool is empty, at which point a linear ranking for user i is produced. See Algorithm 3.

Figure 2 plots how certainty $C^{+}_{AB}$ varies with the weighted rankings $n_{AB}$ and $n_{\overline{AB}}$ under fixed conflict $c_{AB}$.

Figure 3 plots how certainty $C^{+}_{AB}$ varies with the weighted rankings $n_{AB}$ and $n_{\overline{AB}}$ under the fixed summation of $n_{AB} + n_{BA}$ and the fixed $n_{\overline{AB}}$. This should confirm Property 2.

Figure 3. Certainty is concave when $n_{AB} + n_{BA} + n_{\overline{AB}}$ and $n_{\overline{AB}}$ are fixed, and the minimum occurs at $n_{AB} = n_{BA}$.

Figure 4. Prediction error on the synthetic dataset: the x-axis denotes the number of neighbors. Plots show the prediction error. For this evaluation metric, smaller values are better.

Figure 6. Prediction error on the Flixster dataset: the x-axis denotes the number of neighbors. Plots show the prediction error. For this evaluation metric, smaller values are better.

Figure 7. Performance on the Flixster dataset: the x-axis denotes the number of neighbors. Plots show the Spearman correlation coefficient (Spearman CC) and the Kendall rank correlation coefficient (Kendall CC). For both evaluation metrics, higher values are better.