Notes on codes, projects and everything
In the previous post, I re-implemented Annoy in 2D with some linear algebra maths. Then I spent some time going through some tutorial on vectors, and expanded the script to handle data in 3D and more. So instead of finding gradient, the perpendicular line in the middle of two points, I construct a plane, and find the distance between it and points to construct the tree.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
Sometimes I really doubt about the advantage of recycling old stuff to fund for new units beyond goodwill. Sure you get to convince yourself that you are saving the environment by doing so, and it also saves money in the long run. However, I didn’t realize how much it generates it may be after trying to work out an answer for a fictional IQ question.
So I first heard about Panda probably a year ago when I was in my previous job. It looked nice, but I didn’t really get the chance to use it. So practically it is a library that makes data looks like a mix of relational database table and excel sheet. It is easy to do query with it, and provides a way to process it fast if you know how to do it properly (no, I don’t, so I cheated).
With most of my stuff more or less set, I guess it is time to start documenting the steps before I forget. So I heard a lot of good things about docker for quite some time, but haven’t really have the time to do it due to laziness (plus my relatively n00b-ness in the field of dev-ops). Just a few months ago, I decided to finally migrate away from webfaction (thanks for all the superb support) to a VPS so I can run more things on it.
While working on a text classification task, I spent quite some time preparing the training set for a given document collection. The project is supposed to be a pure golang implementation, so after some quick searching I found some libraries that are either a wrapper to libsvm, or a re-implementation. So I happily started to prepare my training set in the libsvm format.
This is the second part of the golang learning rant log. Previously on (note (code cslai)) I managed to make each line in the CSV into a hash map. So today I am going to make it into JSON Lines.
I was invited to try Go (the programming language, not that board game) a few months ago, however I didn’t complete back then. The main reason was because it felt raw, compared to other languages that I know a fair bit better (for example Ruby). There was no much syntatic sugar around, and getting some work done with it feels “dirty”.
A new day, and a new post on job application. So this time instead of asking a snippet, I was actually asked to deliver some sort of a full application. Not sure why this was required, but I had fun creating them nonetheless. Though I would say I am not really a fan of creating visual stuff though (oh the crappy animation nearly killed me).
Just recently I volunteered to do a pre-101 kinda workshop for people wanting to learn programming. I had done this a few times in the past, but in different settings and goals in mind. The whole structure predates the sessions but I can’t remember when I first created them.
(more…)While JSON is a fine data-interchange format, however it does have some limitations. It is well-known for its simplicity, that even a non-programmer can easily compose a JSON file (but humanity will surprise you IRL). Therefore, it is found almost everywhere, from numerous web APIs, to geospatial data (GeoJSON), and even semantic web (RDF/JSON).
A few years ago, I was asked to build a game or simulation (alongside 2048) as a part of a job application. Being very impressed with Explorable Explanations, I implemented Conway’s Game of life with Javascript and jQuery (that was before ES6 became popular). Then I made a very simple grid maker jQuery plugin to dynamically generate a grid of divs. If you check the source code, you may realize I rely on Underscore.js a lot back then.
(more…)A friend of mine recently posted a screenshot containing a code snippet for a fairly straight forward problem. So after reading the solution I suddenly had the itch to propose another solution that I initially thought would be better (SPOILER: Turns out it isn’t). Then mysteriously I stuck myself to my seat and started coding an alternative solution to it instead of playing Diablo 3 just now.
I’m not sure why Windows Media Player 11 doesn’t like mp4/aac format but not being able to transfer my tracks in mp4 through MTP to my N81 really piss me off. At first I installed Songbird 1.0.0 to try synchronizing the tracks through the MTP addon but it doesn’t even play *.mp4 tracks out of the box under Windows XP SP3 (although it claimed otherwise). Weird thing happens when I was trying to synchronize anyway by having the files renamed to *.mp4 extension and it succeeded but without any meta-tag information.