Notes on codes, projects and everything
So apparently Annoy now splits points using the centroids of a 2-means clustering. It is claimed that this gives better results for ANN search; however, how does it affect regression? Purely out of curiosity, I plugged in a new point-splitting function and generated a new set of points.
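A minimal sketch of what a 2-means split could look like, in Python with NumPy. It assumes a node holds its points as a NumPy array, seeds two centroids with two random points, runs a few Lloyd iterations, and returns the hyperplane lying halfway between the two centroids. The names `two_means_split` and `side` are hypothetical, and this is not Annoy's actual C++ implementation, which differs in its details.

```python
import numpy as np


def two_means_split(points, n_iter=10, rng=None):
    """Split `points` with a hyperplane derived from 2-means centroids.

    Sketch only: seeds two centroids with random points, runs a few
    Lloyd iterations, then returns the (normal, offset) of the plane
    that bisects the segment between the two centroids.
    """
    rng = np.random.default_rng(rng)
    # Seed the two centroids with two distinct random points.
    i, j = rng.choice(len(points), size=2, replace=False)
    c0, c1 = points[i].astype(float), points[j].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        d0 = np.linalg.norm(points - c0, axis=1)
        d1 = np.linalg.norm(points - c1, axis=1)
        to_c1 = d1 < d0
        # Recompute centroids, keeping the old one if a cluster empties.
        if (~to_c1).any():
            c0 = points[~to_c1].mean(axis=0)
        if to_c1.any():
            c1 = points[to_c1].mean(axis=0)
    # The plane is perpendicular to c1 - c0 and passes through the midpoint.
    normal = c1 - c0
    offset = -normal.dot((c0 + c1) / 2)
    return normal, offset


def side(point, normal, offset):
    """True if `point` lies on the c1 side, i.e. is closer to c1 than c0."""
    return point.dot(normal) + offset > 0
```

Points for which `side(...)` is true go to one child and the rest to the other; because the plane bisects the two centroids, each point lands on the side of the centroid it is closer to. If all points in a node are identical, `normal` is zero and the split degenerates, which a real implementation would have to handle.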
After a year and a half, a lot of things have changed, and Annoy has changed its splitting strategy too. I had also always wanted to write a proper follow-up to the original post, where I compared boosting to Annoy. I still remember that I started that (flawed) experiment because I found boosting easy.
While following the Statistical Learning course, I came across a section on regression with boosting. Reading through the material made me wonder whether the same method could be adapted to Erik Bernhardsson’s Annoy algorithm.
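For reference, here is a small sketch of boosting for regression with decision stumps, in Python with NumPy: start from a zero model, repeatedly fit a one-split tree to the current residuals, and add a shrunken copy of it to the model. The function names and the defaults (`n_rounds=200`, `shrinkage=0.1`) are my own choices for illustration, not taken from the course, and the adaptation to Annoy is not shown.

```python
import numpy as np


def fit_stump(X, r):
    """Find the single (feature, threshold) split minimising squared error on r."""
    best = None
    for f in range(X.shape[1]):
        # Every unique value except the largest keeps both sides non-empty.
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, f, t, lv, rv)
    if best is None:  # all features constant: predict the mean residual
        m = r.mean()
        return lambda Z: np.full(len(Z), m)
    _, f, t, lv, rv = best
    return lambda Z: np.where(Z[:, f] <= t, lv, rv)


def boost(X, y, n_rounds=200, shrinkage=0.1):
    """Boosting for regression: repeatedly fit a stump to the residuals."""
    r = y.astype(float).copy()  # model starts at zero, so residuals = targets
    stumps = []
    for _ in range(n_rounds):
        g = fit_stump(X, r)
        r -= shrinkage * g(X)  # r <- r - lambda * f_b(x)
        stumps.append(g)
    # Final model: f(x) = sum over b of lambda * f_b(x)
    return lambda Z: shrinkage * sum(g(Z) for g in stumps)
```

The shrinkage keeps each stump's contribution small, so the model is built up slowly from many weak fits rather than from one greedy step.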
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this: CPython caps recursion depth (1000 frames by default) and does not eliminate tail calls. So I started flattening the tree into a key-value dictionary. Logically it is still a tree, but physically it is stored as a dictionary, which makes it easy to traverse with a simple loop.
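A sketch of the idea, assuming nested nodes shaped like `{"value": ..., "children": [...]}` (my own layout, not necessarily the one used in the post). `flatten` gives each node an integer key, and `depth_first` walks the flat dictionary with an explicit stack instead of recursion.

```python
def flatten(tree):
    """Flatten a nested tree into {node_id: {"value": ..., "children": [ids]}}."""
    nodes = {}
    stack = [(tree, 0)]  # the root gets key 0
    next_id = 1
    while stack:
        node, node_id = stack.pop()
        child_ids = []
        for child in node.get("children", []):
            child_ids.append(next_id)
            stack.append((child, next_id))
            next_id += 1
        nodes[node_id] = {"value": node["value"], "children": child_ids}
    return nodes


def depth_first(nodes, root=0):
    """Pre-order traversal of the flat tree using a loop, not recursion."""
    order = []
    stack = [root]
    while stack:
        node_id = stack.pop()
        order.append(nodes[node_id]["value"])
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(nodes[node_id]["children"]))
    return order
```

Because neither function recurses, a chain thousands of nodes deep is handled without hitting the recursion limit.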
Just survived a job interview, so I should probably celebrate this despite the outcome. Well, considering I had been off the job market for a couple of years, I probably had every reason to be nervous. Anyway, like most serious geeky job interviews, this one included a test that the company gave to the candidates.
This update took me quite a bit more time than I initially expected. Anyway, I have done some refactoring of the original code and thought it would be nice to document the changes. Most of the changes involved renaming functions. I am not sure whether the new names will stick, but I am quite satisfied for now.
So my cheat with dask worked fine and dandy, until I started inspecting the output (which was to be used as input for another script). The script itself seemed to work fine, but when I started parsing each line I was hit with some funny syntax errors. After a quick inspection I found that some of the lines were not printed completely.
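The excerpt does not say what cut the lines short. Assuming it was several parallel workers writing to the same output at once, one common fix is to let the workers only build their lines and have a single writer emit them. The sketch below uses the standard library's `concurrent.futures` as a stand-in for dask; `process` and `run` are hypothetical names.

```python
import io
import json
from concurrent.futures import ThreadPoolExecutor


def process(record):
    """Hypothetical per-record task: build the output line, but don't print it."""
    return json.dumps({"id": record, "square": record * record})


def run(records, out):
    # Workers only compute; the calling thread is the single writer, so one
    # worker's output can never interleave with or cut off another's line.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in input order.
        for line in pool.map(process, records):
            out.write(line + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    run(range(5), buf)
    print(buf.getvalue(), end="")
```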
Implementing an Information Retrieval system is fun. Doing it efficiently is not (at least for me). My first few attempts didn’t really end well (mostly using just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with, and was very surprised by all the options available. So I started with the Pandas and Scikit-learn combo.
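As a rough illustration of the scikit-learn side, a minimal TF-IDF search could look like this. `TfidfVectorizer` L2-normalises its rows by default, so `linear_kernel` on its output gives cosine similarity. The `build_index`/`search` helpers and the sample documents are hypothetical; in practice the documents might come from a pandas column.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def build_index(docs):
    """Fit a TF-IDF vocabulary and return (vectorizer, document-term matrix)."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(docs)
    return vectorizer, matrix


def search(query, vectorizer, matrix, k=3):
    """Return the top-k (doc_index, score) pairs by cosine similarity."""
    q = vectorizer.transform([query])
    # Rows are L2-normalised, so the dot product equals cosine similarity.
    scores = linear_kernel(q, matrix).ravel()
    top = scores.argsort()[::-1][:k]
    return [(int(i), float(scores[i])) for i in top]


if __name__ == "__main__":
    docs = [
        "dask parallel output lines",
        "annoy nearest neighbour trees",
        "boosting regression with trees",
    ]
    vectorizer, matrix = build_index(docs)
    print(search("nearest neighbour", vectorizer, matrix))
```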
Recently the term “Semantic Web” has become so popular that Sitepoint blogs keep posting articles on the topic (1, 2). In my college days I learned about semantic networks, and I wonder whether the two are related. I’m not sure I have grasped the concept correctly, but in this article I would like to revisit semantic networks before moving on to the Semantic Web. Please correct me if I’m wrong.
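To make the term concrete, here is a toy semantic network stored as (subject, relation, object) triples, with a lookup that inherits properties along `is_a` links. The canary/bird/animal facts are the usual textbook illustration; the triple layout and the `lookup` function are my own simplification for illustration.

```python
# Each edge of the network is a (subject, relation, object) triple.
NETWORK = {
    ("canary", "is_a", "bird"),
    ("bird", "is_a", "animal"),
    ("bird", "can", "fly"),
    ("animal", "has", "skin"),
    ("canary", "color", "yellow"),
}


def lookup(network, concept, relation):
    """Collect values of `relation` for `concept`, inheriting along is_a links."""
    found = set()
    seen = set()
    frontier = [concept]
    while frontier:
        node = frontier.pop()
        if node in seen:  # guard against cycles in is_a links
            continue
        seen.add(node)
        for subject, rel, obj in network:
            if subject != node:
                continue
            if rel == relation:
                found.add(obj)
            if rel == "is_a":
                # Properties of a more general concept also apply here.
                frontier.append(obj)
    return found
```

For example, `lookup(NETWORK, "canary", "can")` finds `fly` even though that fact is only attached to `bird`.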