Notes on codes, projects and everything
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.
In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
There are a lot of things I want to post to both here and my personal blogs. However I was sucked into sanctuary for the most of last month. I guess after a month of playing, it is probably time to slowly resume my personal projects.
Not sure about the others, but the obsession to my coding tools is probably more than I would admit. I have just managed to do a dirty quick hack to manage my VIM configuration settings. While I am sure there are other people doing this, I would like to show my reinvented wheels.
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
While working on a text classification task, I spent quite some time preparing the training set for a given document collection. The project is supposed to be a pure golang implementation, so after some quick searching I found some libraries that are either a wrapper to libsvm, or a re-implementation. So I happily started to prepare my training set in the libsvm format.
While my static pages and site theme is still under construction, I went to CodeIgniter.com to see how to work with it and played the tutorial screencasts. The reason behind in considering CodeIgniter (CI) to be used in my next project is because I don’t feel like re-inventing the wheel. However to port my current project to use CI may cause some problems as there are differences in how we implement MVC structure.