Notes on codes, projects and everything
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)This post is purely based on my own speculation as there’s no experiment on real-life data to actually back the arguments. I am currently trying to document down a plan for my experiment(s) on recommender system (this reminds me that I have not release the Flickr data collection tool :/) and my supervisor advised to write a paragraph or two on some of the key things. Since he is not going to read it, so I might as well just post it here as a note.
I was asked to evaluate fuzzy c-means to find out whether it is a good clustering algorithm for my MPhil project. So I spent the whole afternoon reading through some tutorial to get some basic understanding. Then I thought why not implement it in Clojure because it doesn’t look too complicated (I was so wrong…).
Just a quick update to the previous post, the virtuoso storage engine works with redland provided the required packages are properly installed (yes, yes, yes, I know I haven’t release my PHP OO wrapper for Redland). Now that the package is installed, we need to do some configuration so that Redland can use it.
I used to develop a bot, partly for work, that fetches current latest petrol retail price in Malaysia. The bot was really an experiment, but at the time it worked well. Then a few years later, out of boredom, I revisited the project after finding the telegram bot library is moving towards asyncio. It was great (at least a lot of people rave about it), but also at the same time intimidating, I learned about coroutines and used gevent in the past, but not asyncio itself.
(more…)Recently I am involved in developing some small modules for a enterprise class website using CodeIgniter (CI). There was no restriction given on which framework should I use for the development and I chose CI as I learned a bit on it (when I was considering whether to shift my personal development project). Of course there are other reasons why I chose to learn CI, for example the superior documentation and screencasts available.
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)Although my supervisor strongly recommend using JENA for RDF related work, but as I really don’t like Java (just personal preference), and wouldn’t want to install JRE/JVM (whatever it is called) at my shared server account, so I went to look for an alternative. After spending some time searching, I found this library called Redland and it provides binding for my current favorite language — PHP, so I decided to use this for my RDF work.