Notes on codes, projects and everything
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
Recently the term “Semantic Web” becomes extremely popular that Sitepoint blogs keep posting articles on this topic (1, 2). In my college days, I learned about Semantic Network and I wonder if there is some relationship between them. I’m not sure whether I get the concept correctly but in this article I would like to revise a bit on semantic network before going to semantic web. Please correct me if I’m wrong.
I just failed a programming assessment test miserably yesterday and thought I should at least document it down. However, the problem with this is that the questions are copyrighted, so I guess I would write it from another point of view. So the main reason I failed was because I chose the wrong strategy to the problem, thinking it should be solution but as I put in time to that I ended up creating more problems.
Although my supervisor strongly recommend using JENA for RDF related work, but as I really don’t like Java (just personal preference), and wouldn’t want to install JRE/JVM (whatever it is called) at my shared server account, so I went to look for an alternative. After spending some time searching, I found this library called Redland and it provides binding for my current favorite language — PHP, so I decided to use this for my RDF work.