Recommender System Classification

I had a discussion with my secondary supervisor and it went pretty badly, because I wasn’t fully prepared and he was rushing off to another meeting. So I am jotting down a brief summary of my readings here (read: heavily based on personal, subjective opinions) to help organize things before the follow-up meeting next week.

For some reason, my secondary supervisor seems very skeptical of everything and keeps arguing with my primary supervisor about a lot of things. It is probably a good idea to make sure everyone is on the same page before I begin.

Folksonomy / Collaborative Tagging

Folksonomy is becoming the primary way web users organize their content. While rigid categories may still be offered by some web applications, the flexibility of folksonomy continues to attract more users to tag their resources with keywords ^[1]. Besides that, a particular piece of content may be described by different sets of keywords over time, and folksonomy works well at capturing this change in semantics.

We cannot deny that this flexibility may lead to abuse of the system, but this is not a new problem and there are people working on it. Besides that, since tags are just a list of keywords describing a resource, the tags of a resource can be treated as a document and modeled in the same ways IR people model documents. If IR people face problems like polysemy and synonymy, the same problems apply to tags. Solutions include dimensionality reduction techniques such as LSI, pLSI or LDA ^[2], or clustering tags to avoid ambiguity ^[4].
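To make the "tags as documents" idea concrete, here is a minimal sketch, assuming each resource's tags are stored as a plain Python list. It builds TF-IDF vectors over tags (with a smoothed IDF) and compares resources with cosine similarity. This is the vector-space representation that techniques like LSI start from; the dimensionality reduction step itself is not shown. The function names and sample photos are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(tag_docs):
    """Turn {resource: [tags]} into {resource: {tag: tf-idf weight}}."""
    n = len(tag_docs)
    # Document frequency: how many resources carry each tag.
    df = Counter(tag for tags in tag_docs.values() for tag in set(tags))
    vectors = {}
    for resource, tags in tag_docs.items():
        tf = Counter(tags)
        vectors[resource] = {
            # Smoothed IDF so a tag used on every resource still gets weight 1.
            tag: (count / len(tags)) * (math.log((1 + n) / (1 + df[tag])) + 1)
            for tag, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {tag: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical tagged photos.
photos = {
    "p1": ["beach", "sunset", "sea"],
    "p2": ["sunset", "sea", "boat"],
    "p3": ["city", "night", "lights"],
}
vecs = tfidf_vectors(photos)
print(cosine(vecs["p1"], vecs["p2"]), cosine(vecs["p1"], vecs["p3"]))
```

Here p1 and p2 share `sunset` and `sea`, so they score higher than p1 and p3, which share no tags. Synonyms such as `sea` and `ocean` would still count as unrelated in this representation, which is the gap LSI/pLSI/LDA or tag clustering try to close.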

Encouraging people to tag their resources properly is not actually within the scope of my project, although collaborative tagging systems are known to suffer from the ramp-up problem (which arises when a new user or item is added to the system, or when the ratings/tags for an item are sparse). I actually have no idea why this was raised in the previous discussion. However, my secondary supervisor was kind enough to point me to the ESP Game, a game that helps get pictures tagged ^[3].

Conclusion: Tagging is a means of organizing information that is gaining popularity. It may be subject to various problems, but these can be fixed (to a certain extent).

Web 2.0 Applications

Again, my two supervisors don’t seem to agree on the usefulness of Web 2.0 applications. But as Web 2.0 applications gain popularity, the amount of information they generate increases drastically, and there may be a need for an alternative approach that helps users discover it. It is not logical to say that Web 2.0 applications are built on flawed principles and that their users therefore deserve no further innovation.

Recommender System

A recommender system is a system that recommends content to users, helping them discover new content, and predicts the degree of interest a user may have in an item they have not rated ^[5]. Depending on the classification used, recommenders can be broadly divided into 2 major types.

Content-based Recommender

A content-based recommender works by comparing a user’s profile with unseen content to predict whether he/she would like that content. In short, it recommends content based on the statement ‘you liked A, B and C in the past, so you should like D, which is similar to A, B and C’ ^[5, 7].
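The ‘you liked A, B and C’ rule can be sketched in a few lines, assuming items are described only by their tags and that a user's profile is simply the sum of the tags of the items they liked. Both are simplifying assumptions of this sketch, not details taken from ^[5] or ^[7]. Unseen items are ranked by cosine similarity to that profile; the function names and data are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse {tag: count} vectors."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_profile(liked_items, item_tags):
    """User profile = summed tag counts of everything the user liked."""
    profile = Counter()
    for item in liked_items:
        profile.update(item_tags[item])
    return profile

def recommend_content_based(liked_items, item_tags, top_n=3):
    """Rank items the user has not seen by similarity to their profile."""
    profile = build_profile(liked_items, item_tags)
    candidates = [i for i in item_tags if i not in liked_items]
    return sorted(
        candidates,
        key=lambda i: cosine(profile, Counter(item_tags[i])),
        reverse=True,
    )[:top_n]

# Hypothetical items described by tags.
item_tags = {
    "A": ["beach", "sea"],
    "B": ["sea", "boat"],
    "C": ["beach", "sunset"],
    "D": ["sea", "sunset", "beach"],
    "E": ["city", "night"],
}
print(recommend_content_based(["A", "B", "C"], item_tags, top_n=2))  # ['D', 'E']
```

D shares `sea`, `sunset` and `beach` with the profile built from A, B and C, so it comes first; E shares nothing and scores zero.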

Collaborative-filtering

Unlike a content-based recommender, a collaborative-filtering system compares an unseen item against the profiles of other users with similar interests, then recommends it if those users expressed interest in it in the past. In short, it is a system that follows the rule ‘Alice, Bob and Charlie liked A, therefore you should also like A’ (assuming Alice, Bob and Charlie share the current user’s interests) ^[5].
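The Alice/Bob/Charlie rule can be sketched as a user-based neighbourhood method. This is a minimal, hypothetical example, assuming ratings are kept in a nested dict `{user: {item: rating}}` and that similarity is plain cosine over co-rated items. Mean-centred (Pearson) similarity is a common alternative, because raw cosine on positive ratings makes almost everyone look similar. Since it reads the full rating table at prediction time, it is also memory-based in style.

```python
import math

def user_similarity(ratings, u, v):
    """Cosine similarity over the items both users have rated."""
    common = ratings[u].keys() & ratings[v].keys()
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no overlap (or all-zero ratings): treat as unrelated
    return dot / (norm_u * norm_v)

def predict(ratings, user, item, k=3):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    neighbours = sorted(
        ((user_similarity(ratings, user, other), other)
         for other in ratings
         if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # nobody comparable has rated the item
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "you": {"X": 5, "Y": 4},
    "alice": {"X": 5, "Y": 5, "A": 5},
    "bob": {"X": 4, "Y": 4, "A": 4},
    "charlie": {"X": 5, "Y": 3, "A": 5},
    "dave": {"X": 1, "Y": 5, "A": 1},
}
print(predict(ratings, "you", "A"))
```

With `k=3`, Dave (whose tastes differ most) is left out, so the prediction for A is a similarity-weighted average of Alice's, Bob's and Charlie's ratings.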

Collaborative-filtering systems can be further divided into 2 types: memory-based and model-based ^[6].

Memory-based

To me, this is what a recommender should ideally do. Everything about the content and users is stored and used directly by the recommender algorithm. However, this means it depends heavily on content that may not necessarily be annotated or tagged appropriately.

Model-based

Model-based collaborative filtering takes a different approach: a set of training data is fed to a learning algorithm to produce a mathematical model ^[6]. Depending on the implementation, an intermediate layer (possibly of lower dimension, depending on the model) is created to compensate for the incompleteness of the annotated data.
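As an illustration of the model-based idea, the sketch below learns a low-dimensional latent vector for every user and item with stochastic gradient descent (plain matrix factorization). The latent vectors play the role of the lower-dimensional intermediate layer. This is a generic technique chosen for illustration, not the model of Ma et al. ^[6]; the hyperparameters are arbitrary assumptions and the sample ratings are made up.

```python
import random

def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})  # sorted so the seed gives stable results
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(n_factors):
                pu_f = pu[f]  # keep the old value for the item update
                pu[f] += lr * (err * qi[f] - reg * pu_f)
                qi[f] += lr * (err * pu_f - reg * qi[f])
    return P, Q

def predict(P, Q, user, item):
    """Predicted rating = dot product of the user and item latent vectors."""
    return sum(a * b for a, b in zip(P[user], Q[item]))

# Hypothetical sparse ratings: bob has not rated C, carol has not rated A.
ratings = [
    ("alice", "A", 5), ("alice", "B", 4), ("alice", "C", 1),
    ("bob", "A", 4), ("bob", "B", 5),
    ("carol", "B", 1), ("carol", "C", 5),
]
P, Q = train_mf(ratings)
print(predict(P, Q, "bob", "C"), predict(P, Q, "carol", "A"))
```

The missing entries (bob/C, carol/A) get predictions from the learned factors even though no one supplied them, which is the point of building a model rather than reading the raw table.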

Most current models assume that users are independent and identically distributed, without considering the relationships between them ^[6]. However, in reality, we often turn to people with the same interests for recommendations ^[7]. Therefore, some researchers have started incorporating social relationships and trust networks into recommendation models ^[8].
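One simple way to bring a trust network into prediction is to average only the ratings of people the user trusts, weighted by how much they are trusted. This is a deliberately minimal sketch, assuming direct trust values in [0, 1] stored as `{user: {friend: trust}}`; it is not the model from ^[8], and it ignores trust propagated along longer paths. Names and data are hypothetical.

```python
def trust_weighted_prediction(ratings, trust, user, item):
    """Average trusted friends' ratings of `item`, weighted by trust in [0, 1]."""
    num = den = 0.0
    for friend, weight in trust.get(user, {}).items():
        if item in ratings.get(friend, {}):
            num += weight * ratings[friend][item]
            den += weight
    return num / den if den else None  # None: no trusted friend rated the item

# Hypothetical trust network and ratings.
trust = {"you": {"alice": 0.9, "bob": 0.2}}
ratings = {"alice": {"A": 5}, "bob": {"A": 1}}
print(trust_weighted_prediction(ratings, trust, "you", "A"))
```

Because Alice is trusted far more than Bob, the prediction lands much closer to her rating of 5 than to his rating of 1.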

Besides a user’s social network, other relationships that may be observed are user-tag, user-content and content-tag relationships. My research project will be more or less based on the work of Ma et al. ^[6]. So if I go down this route, my first problem will be identifying which relationship(s) should be incorporated into the model.
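Folksonomy data is commonly represented as (user, item, tag) assignments. Assuming that representation, the sketch below counts the three pairwise relationships mentioned above so they can be inspected, or later fed into a model as matrices. The function name and data are hypothetical, and which of these relationships to actually use remains the open question.

```python
from collections import Counter, defaultdict

def relationships(assignments):
    """Count user-tag, user-item and item-tag co-occurrences from (user, item, tag) triples."""
    user_tag = defaultdict(Counter)
    user_item = defaultdict(Counter)
    item_tag = defaultdict(Counter)
    for user, item, tag in assignments:
        user_tag[user][tag] += 1
        user_item[user][item] += 1  # number of tags this user put on this item
        item_tag[item][tag] += 1
    return user_tag, user_item, item_tag

# Hypothetical tagging assignments.
assignments = [
    ("alice", "p1", "beach"),
    ("alice", "p1", "sunset"),
    ("bob", "p1", "beach"),
    ("bob", "p2", "city"),
]
user_tag, user_item, item_tag = relationships(assignments)
print(dict(item_tag["p1"]), dict(user_tag["bob"]))
```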

Data Sources and Domain

I wanted to work on something that would help with travel itinerary planning. But as I may not have enough time to do that (with only roughly 7 months left), I am scaling the scope down to a system that recommends images. Data will most probably come from Flickr through their API, as suggested by fellow Stack Overflow users.
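For the data collection step, here is a minimal sketch of querying Flickr's REST endpoint with the `flickr.photos.search` method, using only the Python standard library. It assumes you have a Flickr API key (`YOUR_API_KEY` is a placeholder) and requests the `tags` extra so each photo comes back with its tags. Check the Flickr API documentation for paging limits and terms of use before relying on it.

```python
import json
import urllib.parse
import urllib.request

FLICKR_REST = "https://api.flickr.com/services/rest/"

def search_url(api_key, tags, per_page=50):
    """Build a flickr.photos.search request URL returning plain JSON."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": ",".join(tags),
        "extras": "tags",        # include each photo's tags in the response
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,     # plain JSON instead of a JSONP wrapper
    }
    return FLICKR_REST + "?" + urllib.parse.urlencode(params)

def fetch_photos(api_key, tags, per_page=50):
    """Fetch one page of search results (requires network access and a valid key)."""
    with urllib.request.urlopen(search_url(api_key, tags, per_page)) as resp:
        data = json.load(resp)
    if data.get("stat") != "ok":
        raise RuntimeError(data.get("message", "Flickr API error"))
    return data["photos"]["photo"]

print(search_url("YOUR_API_KEY", ["beach", "sunset"]))
```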

What do I do next?

As my primary supervisor suggested, I am going to start drafting the initial design while continuing to read more about this area.
