Notes on codes, projects and everything
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.(more…)
After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.(more…)
Usually I take about a week to learn a new language so I can start doing some
real work with it. After all a programming language (at least the high level and dynamic ones) is just assignment, calculation, branching, looping and reuse (and in certain cases, concurrency/parallelism, not gonna dive deep in defining the difference though). Well, that was true until I started learning Rust, partly for my own leisure.
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.(more…)
I finally put in some time and effort learning myself a bit of Rust. Though I am still struggling with ownership and lifetimes (which is essentially everything about the language, to be honest), I find it more interesting compared to Golang, which is relatively boring, though being functional (no pun intended). While learning the language, the one thing I came across often is the
Option enum, then I remembered that I read something about Monad.
Recently I volunteered in building a site that reports whether certain websites are blocked locally (please don’t ask why that is happening). As it is a very simple app reporting status I wanted it to be easily scrape-able. One of the decision made was I want it to have things to see on first load, this practically removes the possibility of using react, which is my current favorite.
One of my recent tasks involving crawling a lot of geo-tagged data from a given service. The most recent one is crawling files containing a point cloud for a given location. So I began by observing the behavior in the browser. After exporting the list of HTTP requests involved in loading the application, I noticed there are a lot of requests fetching resources with a common
While JSON is a fine data-interchange format, however it does have some limitations. It is well-known for its simplicity, that even a non-programmer can easily compose a JSON file
(but humanity will surprise you IRL). Therefore, it is found almost everywhere, from numerous web APIs, to geospatial data (GeoJSON), and even semantic web (RDF/JSON).
I was asked to evaluate fuzzy c-means to find out whether it is a good clustering algorithm for my MPhil project. So I spent the whole afternoon reading through some tutorial to get some basic understanding. Then I thought why not implement it in Clojure because it doesn’t look too complicated (I was so wrong…).
Long long time ago when I was working with Prolog, I was introduced to list. Unlike arrays in most popular programming languages, we weren’t really able to access to a particular member directly. Every list is constructed in a chain-like structure.
Just managed to migrate all my blog sites to one centralized multi-site, so no more
half-baked solution and hopefully this brings better plugin compatibility. I have not check with other related services (like Google Webmaster Tools) whether this cause any breakage though. Well, the main purpose of this blog post is actually a draft of what I did for the past two months for my postgraduate programme. Yea, I should have posted more stuff to this blog (just realized that my last post here is already like half a year ago).