Notes on codes, projects and everything
It is useful to have the terminal around whenever I code. However, while real screen estate is finite, having a terminal further limiting the amount of information that can be displayed at the same time. The problem with the terminal is that I don’t really need it all the time, so I usually find it buried under a group of windows.
After publishing the previous note on setting up my development environment, I find myself spending more time in the CLI (usually via SSH from host). Then I find myself not needing all the GUI apps in a standard Ubuntu desktop environment so I went ahead and set up a new environment based on Ubuntu Quantal server edition beta-1. For some reason my network stopped working and didn’t really want to spend time finding out the cause, so I reinstalled everything again today using the final installer, as well as the updated Virtualbox 4.2.6.
I have been following this excellent guide written by Benjamin Thomas to set up my virtual machine for development purpose. However, when I am starting to configure a Ubuntu Quantal alpha machine, parts of the guide became inapplicable. Hence, this post is written as a small revision to the previously mentioned guide.
Been trying my best to stick to the well-known UNIX Philosophy – “Do one thing and do it well”, so I have been breaking down my projects into numerous pieces of small tasks and rely on existing tools whenever possible. One of the existing tool that I use a lot is the GNU sort tool. Generally sort utility is really doing fine and dandy without having to configure anything, at least not until I realize the problem that leads to this post.
Just a quick update to the previous post, the virtuoso storage engine works with redland provided the required packages are properly installed (yes, yes, yes, I know I haven’t release my PHP OO wrapper for Redland). Now that the package is installed, we need to do some configuration so that Redland can use it.
I wanted to try using virtuoso as the storage engine for Redland but unfortunately there is no librdf-storage-virtuoso package for Ubuntu. After getting some help from @dajobe, I attempted to build the packages myself. Although it takes quite some time to build packages, but not too difficult it seems.
Another day, another programming assessment test. This time I was asked to generate some random data, then examine them to get their data type. Practically it is not a very difficult thing to do and I could probably complete it in fewer lines. I am pretty sure there are better ways to do this, as usual though.
After shifting all my instant messaging accounts to my Nokia N9, I stopped getting email alerts via Adium. Therefore, when I finally remember to check my mailboxes, they are already loaded with exploding amount of mails (mostly junk and newsletter though). I don’t fancy doing my email stuff with my device, and don’t feel like installing a webmail checker to my browser, hence this simple little script is written for my phone.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.