Notes on codes, projects and everything
Everyone knows folksonomy is (or was) cool and useful; however, when it is applied in real life, problems arise. The idea of blogging this came while I was struggling to get my literature review report done (been doing it for months, I am being so ridiculous, I know). As a matter of fact, as I am dying to get it done, there are a couple of things that I found to be blog-worthy. So, I will be publishing a couple of brief overviews of some of the topics involved in the coming days, in a really casual (read: lazy, and full of personal speculation) way, on this very humble little blog of mine.
This post is purely based on my own speculation, as there is no experiment on real-life data to actually back the arguments. I am currently trying to write down a plan for my experiment(s) on recommender systems (this reminds me that I have not released the Flickr data collection tool :/) and my supervisor advised me to write a paragraph or two on some of the key things. Since he is not going to read it, I might as well just post it here as a note.
Had a discussion with my secondary supervisor and it turned out pretty badly, because I wasn’t fully prepared and he was rushing off to a meeting somewhere else. So I am jotting down a brief summary (read: highly based on personal/subjective feelings/opinions) of my readings here to help organize things before the follow-up meeting that is taking place next week.
Just survived a job interview, so I should probably celebrate this regardless of the outcome. Well, considering I was off the job market for a couple of years, I probably had every reason to be nervous. Anyway, like most geeky, serious job interviews, there was a test given by the company to the attendees.
So I first heard about pandas probably a year ago when I was in my previous job. It looked nice, but I didn’t really get the chance to use it. Practically, it is a library that makes data look like a mix of a relational database table and an Excel sheet. It is easy to query data with it, and it provides a way to process data fast if you know how to do it properly (no, I don’t, so I cheated).
Oftentimes I am dealing with JSONL files, and though pandas’ DataFrame is great (and Blaze too, to a certain extent), it offers too much for the job. Most of the data I receive is in the form of structured text, and I do all sorts of work with it: for example, checking for consistency, replacing values based on the values of other columns, stripping whitespace, etc.
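To give a rough idea of what that kind of work looks like without a DataFrame, here is a minimal sketch using only the standard library's `json` module. Everything specific in it is made up for illustration: the field names (`name`, `city`, `country`), the `CITY_TO_COUNTRY` lookup, and the helper function names are all assumptions, not something from a real dataset.

```python
import json

# Hypothetical schema and lookup table, purely for illustration.
REQUIRED_KEYS = {"name", "city", "country"}
CITY_TO_COUNTRY = {"Kuala Lumpur": "Malaysia"}


def strip_strings(record):
    """Strip surrounding whitespace from every string value."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def fill_country(record):
    """Replace an empty "country" based on the value of "city"."""
    if not record.get("country") and record.get("city") in CITY_TO_COUNTRY:
        return {**record, "country": CITY_TO_COUNTRY[record["city"]]}
    return record


def is_consistent(record):
    """A record is consistent here if it has all the required keys."""
    return REQUIRED_KEYS.issubset(record)


def process_lines(lines):
    """Parse JSONL lines one at a time, clean them, drop inconsistent ones."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = fill_country(strip_strings(json.loads(line)))
        if is_consistent(record):
            yield record


sample = [
    '{"name": "  Alice ", "city": "Kuala Lumpur", "country": ""}',
    '{"name": "Bob", "city": "Penang"}',
]
cleaned = list(process_lines(sample))
```

Since `process_lines` is a generator, it accepts an open file object just as well as a list, so a large JSONL file is processed one line at a time instead of being loaded into memory all at once.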