Notes on codes, projects and everything
Often times, I am dealing with JSONL files, though panda’s DataFrame is great (and blaze to certain extend), however it is offering too much for the job. Most of the received data is in the form of structured text and I do all sorts of work with them. For example checking for consistency, doing replace based on values of other columns, stripping whitespace etc.
Usually I take about a week to learn a new language so I can start doing some
real work with it. After all a programming language (at least the high level and dynamic ones) is just assignment, calculation, branching, looping and reuse (and in certain cases, concurrency/parallelism, not gonna dive deep in defining the difference though). Well, that was true until I started learning Rust, partly for my own leisure.
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.
Recently I switched my search code to Annoy because the input dataset is huge (7.5mil records with 20k dictionary count). It wasn’t without issues though, however I would probably talk about it next time. In order to figure out what each parameters meant, I spent some time watching through the talk given by the author @fulhack.
This post continued from this post. Finally I have found some time to start developing my pet project using Zend Framework. After getting the controller to work the way I am more familiar (comparing to Kohana which I used at work) with, the next step is to get it to output some data.
I didn’t realize that I have been working for 3 weeks until the Labour Day which was a public holiday. Many things happened in these few weeks and I am still struggling to catch up with it. My superior and colleagues have been very helpful and offered me some helpful tutorials and books. I was instructed to build a event scheduler application using codeigniter in the first week and then work on a side project that extends a form using DOM methods and properties.