Notes on codes, projects and everything
Back then, when I was still working on my postgraduate degree research, I used RDF, which was the preferred format in the world of Semantic Web to represent data. I eventually dropped the degree, and stopped following the development of the related technology and standards. Until I volunteered to update the import script for popit when I was looking for the next job/project.
(more…)This post is purely based on my own speculation as there’s no experiment on real-life data to actually back the arguments. I am currently trying to document down a plan for my experiment(s) on recommender system (this reminds me that I have not release the Flickr data collection tool :/) and my supervisor advised to write a paragraph or two on some of the key things. Since he is not going to read it, so I might as well just post it here as a note.
After being frustrated of not getting consistent and accurate result via standard DOM methods especially html_element.getAttribute('key'); and html_element.setAttribute('key', 'value');, I came across some YUI library components that provides abstractions to various DOM methods. Some interesting DOM-related tools covered in this post are YAHOO.util.Element, YAHOO.util.DOM and YAHOO.util.Selector.
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)The making of this plugin was completely a random act of hand-itchiness. A friend of mine (@cornguo) published a fun app online. There is a name for this kind of app, but I can’t recall at the moment. It typically displays some buttons (usually in a grid), and clicking them causes some sound to be played. The interesting part in cornguo’s app is that there’s a text-input field where the name of the buttons can be typed-in for replaying.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.