Solo disponible en BuenasTareas
  • Páginas : 30 (7310 palabras )
  • Descarga(s) : 0
  • Publicado : 29 de septiembre de 2010
Leer documento completo
Vista previa del texto
Enhancing Link-Based Similarity Through the Use of Non-Numerical Labels and Prior Information
Christian Desrosiers
Software Eng. and IT Department Ecole de Technologie Superieure 1100 Notre-Dame W. Montreal (QC) H3C1K3, Canada

George Karypis
Computer Science & Eng. Department University of Minnesota, Twin Cities 200 Union Street SE Minneapolis (MN) 55455, ABSTRACT
Several key applications like recommender systems require to compute similarities between the nodes (objects or entities) of a bipartite network. These similarities serve many important purposes, such as finding users sharing common interests or items with similar characteristics, as well as the automated recommendation and categorization of items. While a broad range of methods havebeen proposed to compute similarities in networks, such methods have two limitations: (1) they require the link values to be in the form of numerical weights representing the strength of the corresponding relation, and (2) they do not take into account prior information on the similarities. This paper presents a novel approach, based on the SimRank algorithm, to compute similarities between thenodes of a bipartite network. Unlike current methods, this approach allows one to model the agreement between link values using any desired function, and provides a simple way to integrate prior information on the similarity values directly in the computations. To evaluate its usefulness, we test this approach on the problem of predicting the ratings of users for movies and
has several valuable uses, among which are the recommendation of new items, the discovery of groups of like-minded individuals, and the automated categorization of items. A popular method to compute the similarity between users or items, found in many recommender systems, is based on the correlation between the ratings made by users on common items. As recognized by several recent works onthis topic, such as [7, 12, 28], this method is very sensitive to sparse data. For instance, while two users can be similar if they have rated different items, this method is unable to evaluate their similarity in such cases. Moreover, although efficient approaches based on dimensionality reduction and graph theory have been proposed for this problem, they generally suffer from two limitations: • They donot allow categorical ratings or other nonnumerical rating types; • They do not provide an easy way to integrate prior information on the similarity values. To illustrate these limitations, consider the example of Figure 1. This example shows the categorical evaluation (good, bad, ok and N/A) of two unknown criteria, given by three users (John, Mary and Simon) to items i and j. Since the ratingsare non-numerical, traditional methods based on correlation, dimensionality reduction or graph theory cannot be used to compute the similarity between these users or items. Furthermore, prior information on the similarity between users or items, obtained from user profiles or item contents, is often available in recommender systems. Although not considered by current methods, this information canbe used to enhance the computation of similarities.
{good ,bad}

Networks, link-based similarity, SimRank, item recommendation

Computing significant similarities between objects or entities of a network is an important problem that has several key applications such as image processing, classifying web pages, classifying protein interaction and gene expressiondata, part-of-speech tagging, detecting malicious or fraudulent activities, and recommending new and interesting items to consumers. For instance, in recommender systems, one of the most crucial tasks is to find users that share common interests, or items with similar characteristics. This task




Item i


,ok} } ok , } {ok {bad,N/A


Item j

tracking img