Notes on codes, projects and everything
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.
In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
Back then in college, we were given a lot of programming practices. These questions usually shows a desired output format, and we were required to write a program to print out the exact thing. Usually it involves printing a matrix of numbers, or symbols etc. For these problems, usually a loop structure or two should solve the problem.
This is basically a small incremental update to my script published here. For some reason, the previous version of the script didn’t really work, so this release should fix the problem. Besides fixing the problem where the daemon did not actually launched at start up, I have added a settings applet for this script as well.
The Sports Tracker app for my awesome Nokia N9 is not receiving any updates and doesn’t look like things are going to change any time soon. Recently the development team at Sports Tracker published a status update post and sadly there’s no mention of N9 port at all. It’s really sad considering how incomplete the N9 port is at the moment (horrible GPS positioning, no pedometer to name a few).
To do node selection for DOM operations, one typically uses CSS selectors as (probably) popularized by jQuery. However, there is another alternative that is as powerful if not better known as XPath. XPath may be able to do a lot more than just selecting node (which I have no time to find out for now) but I will just focus on how to do node selection in this blog post.
After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)