Notes on codes, projects and everything
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
While JSON is a fine data-interchange format, however it does have some limitations. It is well-known for its simplicity, that even a non-programmer can easily compose a JSON file (but humanity will surprise you IRL). Therefore, it is found almost everywhere, from numerous web APIs, to geospatial data (GeoJSON), and even semantic web (RDF/JSON).
A friend of mine recently posted a screenshot containing a code snippet for a fairly straight forward problem. So after reading the solution I suddenly had the itch to propose another solution that I initially thought would be better (SPOILER: Turns out it isn’t). Then mysteriously I stuck myself to my seat and started coding an alternative solution to it instead of playing Diablo 3 just now.
I am not going to waste time telling stories that inspire this post, as most people would have already heard something similar constantly. This is not a mythbuster kinda post, so don’t expect a scientific proof to the answer of the question. Instead, through this post, I hope to break the impression that claims composing a HTML document is difficult.
I didn’t realize that I have been working for 3 weeks until the Labour Day which was a public holiday. Many things happened in these few weeks and I am still struggling to catch up with it. My superior and colleagues have been very helpful and offered me some helpful tutorials and books. I was instructed to build a event scheduler application using codeigniter in the first week and then work on a side project that extends a form using DOM methods and properties.
I am currently doing some organization to my blogs. Few weeks ago, after spending months struggling to work in Ubuntu 7.10, I learned about symbolic links. Then I thought this would be good for my project file management. Therefore I started to re-organize my project file structure to utilize symbolic links. One of the projects that uses symbolic link is the current wordpress theme.