Notes on codes, projects and everything
Back when I was attending a job interview, I was asked to write a Fizz Buzz program to prove my coding ability. There was only a pen and a piece of paper, which basically meant there was no way I could refer to the documentation for API syntax. Fortunately I somehow managed to remember it and did not screw up.
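For readers who have not met the exercise, here is a minimal sketch of the classic Fizz Buzz rules in Python. The exact variant asked in the interview is not recorded, so the 1 to 15 range and the function name `fizz_buzz` are assumptions made for illustration.

```python
def fizz_buzz(n: int) -> str:
    """Return the Fizz Buzz word for n (classic rules, assumed here)."""
    # Check the combined case first, otherwise 15 would match "Fizz" alone.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


if __name__ == "__main__":
    # The range 1..15 is an arbitrary choice for the demo.
    for i in range(1, 16):
        print(fizz_buzz(i))
```

Nothing here needs a library call, which is probably why it works as a pen-and-paper question.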
While working on a text classification task, I spent quite some time preparing the training set for a given document collection. The project is supposed to be a pure golang implementation, so after some quick searching I found some libraries that are either wrappers around libsvm or re-implementations of it. So I happily started to prepare my training set in the libsvm format.
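As a rough illustration of what the libsvm format looks like, each training example is one line of the form `<label> <index>:<value> ...`, with feature indices in ascending order and zero-valued features left out. The sketch below is in Python rather than Go purely for brevity; the helper name `to_libsvm_line` and the sample data are hypothetical and not taken from the project.

```python
def to_libsvm_line(label, features):
    """Format one example as a libsvm line.

    `features` is assumed to be a dict mapping a 1-based feature index
    to a numeric value (a hypothetical in-memory representation).
    """
    parts = [str(label)]
    # libsvm expects indices in ascending order.
    for index in sorted(features):
        value = features[index]
        # The format is sparse, so zero-valued features are skipped.
        if value != 0:
            parts.append(f"{index}:{value}")
    return " ".join(parts)


if __name__ == "__main__":
    # Made-up document vectors, just to show the shape of the output.
    examples = [
        (1, {3: 0.5, 1: 2.0, 7: 0.0}),
        (-1, {2: 1.0, 5: 0.25}),
    ]
    for label, features in examples:
        print(to_libsvm_line(label, features))
```

For a text classification task the indices would typically be term ids from the dictionary and the values term weights, but that mapping is an assumption here.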
Although my supervisor strongly recommended using JENA for RDF-related work, I really don’t like Java (just personal preference) and didn’t want to install a JRE/JVM (whatever it is called) on my shared server account, so I went looking for an alternative. After spending some time searching, I found a library called Redland that provides bindings for my current favorite language, PHP, so I decided to use it for my RDF work.
The making of this plugin was a completely random act of hand-itchiness. A friend of mine (@cornguo) published a fun app online. There is a name for this kind of app, but I can’t recall it at the moment. It typically displays some buttons (usually in a grid), and clicking them causes a sound to be played. The interesting part of cornguo’s app is that there’s a text input field where the names of the buttons can be typed in for replaying.
Implementing an Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). My first few attempts didn’t really end well (they mostly used just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with, and was very surprised by all the options available. So I started with the Pandas and Scikit-learn combo.
Recently I switched my search code to Annoy because the input dataset is huge (7.5 million records with a 20k dictionary count). It wasn’t without issues, though I will probably talk about them next time. In order to figure out what each parameter meant, I spent some time watching the talk given by the author, @fulhack.