Notes on codes, projects and everything
Traversing a tree structure often involves writing a recursive function. However, Python isn't the best language for recursion. Therefore I started flattening the tree into a key-value dictionary structure. Logically it is still a tree, but it is physically stored as a dictionary, which makes it easier to traverse with a simple loop.
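To make the idea concrete, here is a minimal sketch of flattening a nested tree into a dictionary and walking it with a plain loop. The node layout (`"value"` and `"children"` keys), the integer ids, and the function names `flatten` and `traverse` are assumptions made for illustration, not the structure used in the original post.

```python
from collections import deque


def flatten(root):
    """Flatten a nested tree into {node_id: {"value": ..., "children": [ids]}}.

    Assumes each node is a dict with a "value" key and an optional
    "children" list (a hypothetical layout). Ids are integers handed out
    in breadth-first order, with the root at id 0.
    """
    flat = {}
    queue = deque([(0, root)])
    next_id = 1
    while queue:
        node_id, node = queue.popleft()
        child_ids = []
        for child in node.get("children", []):
            child_ids.append(next_id)
            queue.append((next_id, child))
            next_id += 1
        flat[node_id] = {"value": node["value"], "children": child_ids}
    return flat


def traverse(flat, root_id=0):
    """Depth-first, pre-order traversal using an explicit stack, no recursion."""
    order = []
    stack = [root_id]
    while stack:
        node_id = stack.pop()
        order.append(flat[node_id]["value"])
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(flat[node_id]["children"]))
    return order


nested = {
    "value": "a",
    "children": [
        {"value": "b", "children": [{"value": "d", "children": []}]},
        {"value": "c", "children": []},
    ],
}

flat = flatten(nested)
print(traverse(flat))  # ['a', 'b', 'd', 'c']
```

Because every node is reachable by key, the traversal needs neither recursion nor references to nested objects, and the flat dictionary is straightforward to serialise.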
In the previous post, I re-implemented Annoy in 2D with some linear algebra. Then I spent some time going through some tutorials on vectors, and expanded the script to handle data in 3D and more. So instead of finding the gradient and the perpendicular line through the midpoint of two points, I construct a plane, then use the distance between it and each point to build the tree.
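The plane-based split can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script from the post: the plane is taken to pass through the midpoint of two distinct points `a` and `b` with normal `b - a`, and the helper names `hyperplane`, `signed_distance` and `split` are hypothetical.

```python
import math


def hyperplane(a, b):
    """Return (normal, midpoint) of the plane equidistant from a and b.

    Assumes a and b are distinct points of the same dimension.
    """
    normal = [bi - ai for ai, bi in zip(a, b)]
    midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return normal, midpoint


def signed_distance(point, normal, midpoint):
    """Signed distance from point to the plane; positive on b's side."""
    norm = math.sqrt(sum(n * n for n in normal))
    dot = sum(n * (p - m) for n, p, m in zip(normal, point, midpoint))
    return dot / norm


def split(points, a, b):
    """Partition points by the side of the plane they fall on."""
    normal, midpoint = hyperplane(a, b)
    left, right = [], []
    for p in points:
        if signed_distance(p, normal, midpoint) >= 0:
            right.append(p)
        else:
            left.append(p)
    return left, right


a, b = (0.0, 0.0, 0.0), (2.0, 0.0, 0.0)
points = [(3.0, 5.0, 5.0), (0.0, 1.0, 1.0), (1.5, -2.0, 0.0)]
left, right = split(points, a, b)
print(left)   # [(0.0, 1.0, 1.0)]
print(right)  # [(3.0, 5.0, 5.0), (1.5, -2.0, 0.0)]
```

Applying `split` recursively, or iteratively with a work list, to each half until the groups are small enough gives the tree. The same code works in any number of dimensions because it only relies on element-wise arithmetic.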
Recently I switched my search code to Annoy because the input dataset is huge (7.5 million records with a dictionary count of 20k). It wasn't without issues, but I will probably talk about those next time. To figure out what each parameter meant, I spent some time watching the talk given by the author, @fulhack.
Implementing an information retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). My first few attempts didn't end well (they mostly used just Go with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with, and was very surprised by all the options available. So I started with a Pandas and Scikit-learn combo.
Although my supervisor strongly recommended using JENA for RDF-related work, I really don't like Java (just personal preference) and didn't want to install the JRE/JVM (whatever it is called) on my shared server account, so I went looking for an alternative. After spending some time searching, I found a library called Redland, which provides bindings for my current favorite language, PHP, so I decided to use it for my RDF work.
Back when I was still working on my postgraduate research, I used RDF, which was the preferred format for representing data in the world of the Semantic Web. I eventually dropped the degree and stopped following the development of the related technologies and standards, until I volunteered to update the import script for popit while I was looking for my next job/project.
There are a lot of things I want to post both here and on my personal blogs. However, I was sucked into sanctuary for most of last month. I guess after a month of playing, it is probably time to slowly resume my personal projects.
I came across a video on YouTube on Pi Day. Coincidentally, it was about estimating the value of Pi, produced by Matt Parker, aka standupmaths. While I am not particularly interested in the best way to estimate Pi, I am interested in the algorithm he showed in the video. Specifically, I want to find out how easy it is to implement in Python.
When one starts writing JavaScript in patterns like the module pattern, sooner or later one will want to maintain state when an event handler is called. The reason I still use YUI for my event handling code is that I like how state can be maintained.