Notes on codes, projects and everything
Often times, I am dealing with JSONL files, though panda’s DataFrame is great (and blaze to certain extend), however it is offering too much for the job. Most of the received data is in the form of structured text and I do all sorts of work with them. For example checking for consistency, doing replace based on values of other columns, stripping whitespace etc.
Just survived a job interview, so I should probably celebrate this despite the outcome. Well, considering I was off the job market for a couple of years, I probably has all the reason to be nervous. Anyway, like most
geeky serious job interview, there are a test given by the company to the attendees.
It is very difficult to like the way vim handle plugins by default, so I was really thrilled to find out about pathogen when a geek I followed tweeted about it. It took me some time to actually re-organize my current configuration to this new format. Then I thought why not reorganize my .vimrc as well, as my current version looks a bit cryptic after a while.
Just a quick update to the previous post, the virtuoso storage engine works with redland provided the required packages are properly installed (yes, yes, yes, I know I haven’t release my PHP OO wrapper for Redland). Now that the package is installed, we need to do some configuration so that Redland can use it.
So my cheat with dask worked fine and dandy, until I started inspecting the output (which was to be used as an input for another script). While the script seemed to work fine, however when I started to parse each line I was hit with some funny syntax errors. After some quick inspection I found some of the lines was not printed completely.
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.