Notes on codes, projects and everything
Although my supervisor strongly recommend using JENA for RDF related work, but as I really don’t like Java (just personal preference), and wouldn’t want to install JRE/JVM (whatever it is called) at my shared server account, so I went to look for an alternative. After spending some time searching, I found this library called Redland and it provides binding for my current favorite language — PHP, so I decided to use this for my RDF work.
With most of my stuff more or less set, I guess it is time to start documenting the steps before I forget. So I heard a lot of good things about docker for quite some time, but haven’t really have the time to do it due to laziness (plus my relatively n00b-ness in the field of dev-ops). Just a few months ago, I decided to finally migrate away from webfaction (thanks for all the superb support) to a VPS so I can run more things on it.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
While working on a text classification task, I spent quite some time preparing the training set for a given document collection. The project is supposed to be a pure golang implementation, so after some quick searching I found some libraries that are either a wrapper to libsvm, or a re-implementation. So I happily started to prepare my training set in the libsvm format.
It is useful to have the terminal around whenever I code. However, while real screen estate is finite, having a terminal further limiting the amount of information that can be displayed at the same time. The problem with the terminal is that I don’t really need it all the time, so I usually find it buried under a group of windows.
This is the second part of the golang learning rant log. Previously on (note (code cslai)) I managed to make each line in the CSV into a hash map. So today I am going to make it into JSON Lines.