Notes on codes, projects and everything
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
Being new to asyncio, after publishing the previous post on running multiple applications in one event loop, I also cross posted it to the discussion board for feedback. So apparently instead of cramming everything to the same event loop, it would be better if each application run on a separate thread. That makes sense, considering all the code that was written for that.
(more…)Just managed to migrate all my blog sites to one centralized multi-site, so no more half-baked solution and hopefully this brings better plugin compatibility. I have not check with other related services (like Google Webmaster Tools) whether this cause any breakage though. Well, the main purpose of this blog post is actually a draft of what I did for the past two months for my postgraduate programme. Yea, I should have posted more stuff to this blog (just realized that my last post here is already like half a year ago).
It is very much expected that there will be endless stream of new (and often times better) tools introduced to solve the same set of problems. While I am slowly resuming my programming work, and in the process of reviving my very much dead postgrad project, I found some alternative to the tools I had used in the past. I suppose I shall just jot them down here so that there’s a reference for later use.
A really sweet new feature in the recently released update is the ability to change lockscreen shortcut. Unfortunately there is no easy way to change connection with my Jolla unlike my old Nokia N9 (no pun intended). As I have not been using my N9 for quite some time, I was only reminded when I came across this thread on TMO.
I am currently doing some organization to my blogs. Few weeks ago, after spending months struggling to work in Ubuntu 7.10, I learned about symbolic links. Then I thought this would be good for my project file management. Therefore I started to re-organize my project file structure to utilize symbolic links. One of the projects that uses symbolic link is the current wordpress theme.