Notes on codes, projects and everything
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
I didn’t realize that I have been working for 3 weeks until the Labour Day which was a public holiday. Many things happened in these few weeks and I am still struggling to catch up with it. My superior and colleagues have been very helpful and offered me some helpful tutorials and books. I was instructed to build a event scheduler application using codeigniter in the first week and then work on a side project that extends a form using DOM methods and properties.
I don’t quite remember when did I first heard about Category Theory, but the term stuck in my head for quite a while. Eventually I attempted to start looking for tutorials on the topic, but it is hard to find one that I actually understand. Most of them are either leaning too much to the Mathematics side, or too much to the Programming side.
(more…)Just recently I volunteered to do a pre-101 kinda workshop for people wanting to learn programming. I had done this a few times in the past, but in different settings and goals in mind. The whole structure predates the sessions but I can’t remember when I first created them.
(more…)Back then in college, we were given a lot of programming practices. These questions usually shows a desired output format, and we were required to write a program to print out the exact thing. Usually it involves printing a matrix of numbers, or symbols etc. For these problems, usually a loop structure or two should solve the problem.
I often struggle to get my Javascript code organized, and have tried numerous ways to do so. I have tried putting relevant code into classes and instantiate as needed, then abuse jQuery’s data()
method to store everything (from scalar values to functions and callbacks). Recently, after knowing (briefly) how a jQuery plugin should be written, it does greatly simplify my code.