## (note (code cslai))

Notes on codes, projects and everything

# annoy

• ### Quick experiment, boosting vs annoy

While following the Statistical Learning course, I came across a part on doing regression with boosting. Reading and working through the material made me wonder whether the same method could be adapted to Erik Bernhardsson’s Annoy algorithm.

• ### Approximate Neighbour Search in Multiple Dimensions

In the previous post, I re-implemented Annoy in 2D with some linear algebra. Then I spent some time going through a tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding the gradient of the perpendicular line through the midpoint of two points, I construct a plane, and use the distance between it and each point to build the tree.
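To make the idea concrete, here is a minimal sketch of one split step, not the code from the post itself. It assumes the plane is the one equidistant from two chosen anchor points, and that points are plain tuples of floats; the names `hyperplane_between`, `signed_distance` and `split` are hypothetical and exist only for this illustration.

```python
import math


def hyperplane_between(a, b):
    """Return (unit_normal, offset) for the plane equidistant from a and b.

    The normal points from b towards a, and the plane passes through
    the midpoint of the two anchor points.
    """
    normal = [x - y for x, y in zip(a, b)]
    length = math.sqrt(sum(n * n for n in normal))
    if length == 0:
        raise ValueError("the two anchor points must differ")
    unit = [n / length for n in normal]
    midpoint = [(x + y) / 2 for x, y in zip(a, b)]
    # offset is the dot product of the unit normal and the midpoint
    offset = sum(u * m for u, m in zip(unit, midpoint))
    return unit, offset


def signed_distance(point, unit, offset):
    """Signed distance from point to the plane (positive on a's side)."""
    return sum(u * p for u, p in zip(unit, point)) - offset


def split(points, a, b):
    """Partition points by which side of the plane they fall on."""
    unit, offset = hyperplane_between(a, b)
    side_a, side_b = [], []
    for p in points:
        # Points lying exactly on the plane are assigned to a's side.
        if signed_distance(p, unit, offset) >= 0:
            side_a.append(p)
        else:
            side_b.append(p)
    return side_a, side_b
```

Applying `split` recursively to each half, with a fresh pair of anchor points each time, would give a tree along the lines described above. Because everything is expressed as dot products, the same functions work unchanged for 2D, 3D or higher dimensions.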

• ### Re-implementing Approximate Nearest Neighbour Search

Recently I switched my search code to Annoy because the input dataset is huge (7.5 million records with a dictionary count of 20k). It wasn’t without issues, but I will probably talk about those next time. To figure out what each parameter meant, I spent some time watching the talk given by the author, @fulhack.

• ### Information Retrieving with ….. a lot of libraries

Implementing an Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly using just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with, and was very surprised by all the options available. So I started with the Pandas and Scikit-learn combo.

## Random Posts

• ### Creating a new daemon script for Meego 1.2 Harmattan

Call me a cheapskate, as I still have not subscribed to a mobile data plan after purchasing my second smartphone, a Nokia N9. There’s this ‘allow background connections’ option, but it doesn’t care whether the connected network is a WLAN network or a mobile data network. After finding out that Nokia has no interest in creating a separate option so that each type of network has its own ‘allow background connections’ switch, I decided to make one of my own.

• ### MVC: Kohana vs. Zend Framework

After comparing my own implementation of MVC with CodeIgniter’s, now I’m comparing Kohana’s and Zend’s. I recently shifted from CodeIgniter to Kohana at work and am currently learning how to use Zend Framework to build my web app. As everybody knows, Zend Framework is more like a collection of library classes than a framework a la Ruby on Rails, so using MVC in Zend Framework requires one to begin from the bootstrapping stage. However, in Kohana, just like other frameworks, bootstrapping is done by the framework itself, so the developer gets an installation that almost just works (after a little bit of configuration).

• ### Spiraling Number

Back in college, we were given a lot of programming exercises. These questions usually showed a desired output format, and we were required to write a program to print out the exact thing. Usually it involved printing a matrix of numbers, symbols, etc. For these problems, a loop structure or two would usually solve the problem.

• ### My evil evil form and database library

Writing a usable form and database library has always been a painful experience. So why bother re-inventing the wheel when there are so many to choose from already? I am writing one mostly for learning purposes. After numerous attempts, I finally got my form and database library in shape. It is nowhere near complete, nor is it perfect, but it is currently the implementation that is closest to my original design. I will keep working on it so it can be used in my personal projects in the future.

• ### #nand2tetris is fun

Nand2Tetris Part I on Coursera is very much the first course I have completed. It was so fun to actually work through the material, and it feels amazing to know how simple it is to actually build a computer from scratch. While it is simple, that doesn’t mean the course itself is easy. I struggled so much to get the CPU wired up properly that I spent two to three days just getting it to work.