Notes on codes, projects and everything
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)Everyone knows folksonomy is (or was) cool and useful, however, when it is applied in real life, then problem arises. The idea of blogging this came while I am struggling to get my literature review report done (been doing it for months, I am being so ridiculous, I know). As a matter of fact, as I am dying to get it done, there are a couple of things that I found to be blog-worthy. So, I will be publishing a couple of brief overview to some of the topics involved in the coming days in a really casual (read: lazy, and full of personal speculations) way to this very humble little blog of mine.
Sometimes, letting a piece of code evolving by itself without much planning does not usually end well. However I was quite pleased with a by-product of it and I am currently formalizing it. So the by-product is some sort of DSL for a rule engine that I implemented to process records. It started as some lambda functions in Python but eventually becomes something else.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Recently I find some of my pet projects share a common pattern, they all are based on some kind of grids. So I find myself writing similar piece of code over and over again. While re-inventing wheels is quite fun, especially when you learn new way of getting things done with every iteration, it is actually quite tedious after a while.