Notes on codes, projects and everything
Recently I volunteered to build a site that reports whether certain websites are blocked locally (please don’t ask why that is happening). As it is a very simple app that reports status, I wanted it to be easily scrapable. One of the decisions I made was that it should have something to see on first load, which practically rules out React, my current favorite.
Oftentimes I deal with JSONL files. Although pandas’ DataFrame is great (and Blaze, to a certain extent), it offers too much for the job. Most of the data I receive is structured text, and I do all sorts of work with it: for example, checking for consistency, replacing values based on the values of other columns, stripping whitespace, etc.
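To give a rough idea of the kind of work meant here, below is a minimal sketch using only the standard library. The field names (`username`, `display_name`), the fill-in rule and the sample data are all made up for illustration; they are assumptions, not taken from any real dataset.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-blank line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def strip_whitespace(record):
    """Strip leading/trailing whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def fill_from_other_column(record):
    """Hypothetical rule: an empty display_name falls back to username."""
    if not record.get("display_name"):
        return {**record, "display_name": record.get("username", "")}
    return record


def is_consistent(record, required=("username", "display_name")):
    """A record is considered consistent if every required field is non-empty."""
    return all(record.get(field) for field in required)


def clean(stream):
    """Strip, fill, then keep only the consistent records."""
    for record in read_jsonl(stream):
        record = fill_from_other_column(strip_whitespace(record))
        if is_consistent(record):
            yield record


# Made-up sample input; a real file object from open() works the same way.
sample = io.StringIO(
    '{"username": " alice ", "display_name": ""}\n'
    '{"username": "", "display_name": "Nobody"}\n'
    '{"username": "bob", "display_name": " Bob  "}\n'
)
cleaned = list(clean(sample))
```

Because everything is a generator over plain dicts, the file is processed one line at a time and each step stays a small function that is easy to swap out.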
After shifting all my instant messaging accounts to my Nokia N9, I stopped getting email alerts via Adium. As a result, by the time I finally remember to check my mailboxes, they are already overflowing with mail (mostly junk and newsletters, though). I don’t fancy handling my email on my device, and I don’t feel like installing a webmail checker in my browser, hence this simple little script written for my phone.
After reading through the documentation, I found that the role-based ACL and the workflow can be more tightly integrated. Therefore I turned all the transactions into FSMs, and my workflow component now consists of one workflow library and one workflow management model. I am going for a more normalized design (I use a denormalized design at work, as it deals with a lot of documents; however, for a small project like mine, a normalized design should do well).
Recently I switched my search code to Annoy because the input dataset is huge (7.5 million records with a dictionary of 20k entries). It wasn’t without issues, though I will probably talk about those next time. To figure out what each parameter means, I spent some time watching the talk given by the author, @fulhack.
After a miserable trip back to the academic world, I finally regained the courage to get back into the job market. During my time at university, I spent quite some time reading about the Semantic Web and RDF. I thought then that I should publish more in this format in the future. However, that didn’t really happen, mostly because I am too lazy.