Notes on codes, projects and everything
With most of my stuff more or less set, I guess it is time to start documenting the steps before I forget. I have heard a lot of good things about Docker for quite some time, but haven’t really had the time to try it due to laziness (plus my relative n00b-ness in the field of dev-ops). Just a few months ago, I finally decided to migrate away from WebFaction (thanks for all the superb support) to a VPS so I can run more things on it.
Everyone knows folksonomy is (or was) cool and useful; however, when it is applied in real life, problems arise. The idea of blogging this came while I was struggling to get my literature review report done (been doing it for months, I am being so ridiculous, I know). As a matter of fact, as I am dying to get it done, there are a couple of things that I found to be blog-worthy. So, I will be publishing a couple of brief overviews of some of the topics involved in the coming days, in a really casual (read: lazy, and full of personal speculation) way, on this very humble little blog of mine.
So apparently Annoy now splits points using the centroids of 2-means clustering. It is claimed that this provides better results for ANN search; however, how does it impact regression? Purely out of curiosity, I plugged in a new point-splitting function and generated a new set of points.
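To make the idea a bit more concrete, here is a rough sketch of what splitting by the centroids of 2-means clustering could look like in plain Python. This is not Annoy's actual code; the function names (`two_means`, `split_plane`, `side`) and the `iterations` and `seed` parameters are made up for illustration, and I am assuming plain Euclidean distance with a fixed number of refinement rounds.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def two_means(points, iterations=20, seed=0):
    """Run k-means with k=2 and return the two centroids.

    Hypothetical helper: starts from two randomly sampled points and
    refines them for a fixed number of rounds.
    """
    rng = random.Random(seed)
    c1, c2 = (list(p) for p in rng.sample(points, 2))
    for _ in range(iterations):
        group1, group2 = [], []
        for p in points:
            closer_to_c1 = squared_distance(p, c1) <= squared_distance(p, c2)
            (group1 if closer_to_c1 else group2).append(p)
        if not group1 or not group2:
            break  # degenerate split, keep the current centroids
        c1, c2 = mean(group1), mean(group2)
    return c1, c2


def split_plane(c1, c2):
    """Hyperplane halfway between the centroids, as (normal, offset)."""
    normal = [a - b for a, b in zip(c1, c2)]
    midpoint = [(a + b) / 2 for a, b in zip(c1, c2)]
    offset = -sum(n * m for n, m in zip(normal, midpoint))
    return normal, offset


def side(point, normal, offset):
    """True if the point falls on the c1 side of the hyperplane."""
    return sum(n * x for n, x in zip(normal, point)) + offset > 0


# Toy data: two obvious blobs in 2D.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
normal, offset = split_plane(*two_means(points))
sides = [side(p, normal, offset) for p in points]
```

The split is simply the hyperplane sitting halfway between the two centroids, so each point ends up on the side of the centroid it is closer to. Which blob counts as `True` depends on which centroid happens to come out first.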
Just managed to migrate all my blog sites to one centralized multi-site, so no more half-baked solutions, and hopefully this brings better plugin compatibility. I have not checked with other related services (like Google Webmaster Tools) whether this causes any breakage though. Well, the main purpose of this blog post is actually a draft of what I did for the past two months for my postgraduate programme. Yea, I should have posted more stuff to this blog (just realized that my last post here was already like half a year ago).
Recently I switched my search code to Annoy because the input dataset is huge (7.5 million records with a dictionary count of 20k). It wasn’t without issues though; I will probably talk about those next time. In order to figure out what each parameter meant, I spent some time watching the talk given by the author, @fulhack.
Had a discussion with my secondary supervisor and it turned out pretty badly because I wasn’t fully prepared and he was rushing off to a meeting somewhere else. So I am jotting down a brief summary (read: heavily based on personal/subjective feelings/opinions) of my readings here to help organize things before the follow-up meeting that is taking place next week.