Notes on codes, projects and everything
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.
My cloud storage is nearly exploding, and I am not in a good position to start subscribing for more storage (Yes, I am still #opentowork). Considering I just moved my domain settings to Cloudflare and started using Cloudflare tunnel, I figure I probably should just back up some of my photos, and host it on my workstation.
(more…)I was trying to learn scala and clojure to find one that I may want to use in my postgraduate project. After trying to learn scala for a couple of days, I gave up because I really don’t like the syntax (too OO for my liking). Then I continued with clojure and found myself liking the syntax better.
Often times, I am dealing with JSONL files, though panda’s DataFrame is great (and blaze to certain extend), however it is offering too much for the job. Most of the received data is in the form of structured text and I do all sorts of work with them. For example checking for consistency, doing replace based on values of other columns, stripping whitespace etc.
I really don’t know how to start explaining what is a Dragon Curve. However, I find it is interesting enough after finding out that there’s actually a fixed pattern of occurrence. Therefore I spent some time writing a series of scripts to plot the generated fractal into a graph. What I didn’t expect is, the series get really complicated after a while.
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.