## (note (code cslai))

Notes on codes, projects and everything

# Information Retrieval

• ### Further Hack on the Multi-Dimensional Approximate Neighbour Search

Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.

• ### Approximate Neighbour Search in Multiple Dimensions

In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.

• ### Re-implementing Approximate Nearest Neighbour Search

Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.

• ### Information Retrieving with ….. a lot of libraries

Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.

## Random Posts

• ### 4 Fundamental Things in Programming (pre-101)

Just recently I volunteered to do a pre-101 kinda workshop for people wanting to learn programming. I had done this a few times in the past, but in different settings and goals in mind. The whole structure predates the sessions but I can’t remember when I first created them.

(more…)
• ### Building problock – my personal experience

A few years ago, I was asked to build a game or simulation (alongside 2048) as a part of a job application. Being very impressed with Explorable Explanations, I implemented Conway’s Game of life with Javascript and jQuery (that was before ES6 became popular). Then I made a very simple grid maker jQuery plugin to dynamically generate a grid of divs. If you check the source code, you may realize I rely on Underscore.js a lot back then.

(more…)
• ### Quick experiment, boosting vs annoy

While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.

(more…)
• ### Weekend Code Quiz – Easy Template Parser

A friend of mine recently posted a screenshot containing a code snippet for a fairly straight forward problem. So after reading the solution I suddenly had the itch to propose another solution that I initially thought would be better (SPOILER: Turns out it isn’t). Then mysteriously I stuck myself to my seat and started coding an alternative solution to it instead of playing Diablo 3 just now.

• ### Overdoing a task part 2: csv-to-json

This is the second part of the golang learning rant log. Previously on (note (code cslai)) I managed to make each line in the CSV into a hash map. So today I am going to make it into JSON Lines.