Notes on codes, projects and everything
So my cheat with dask worked fine and dandy, until I started inspecting the output (which was to be used as an input for another script). While the script seemed to work fine, however when I started to parse each line I was hit with some funny syntax errors. After some quick inspection I found some of the lines was not printed completely.
Often times, I am dealing with JSONL files, though panda’s DataFrame is great (and blaze to certain extend), however it is offering too much for the job. Most of the received data is in the form of structured text and I do all sorts of work with them. For example checking for consistency, doing replace based on values of other columns, stripping whitespace etc.
I came across a video on Youtube on Pi day. Coincidently it was about estimating the value of Pi produced by Matt Parker aka standupmaths. While I am not quite interested in knowing the best way to estimate Pi, I am quite interested in the algorithm he showed in the video however. Specifically, I am interested to find out how easy it is to implement in Python.
After delaying for quite some time, I think I should start the project before I get bored with it. The project will be either hosted on this current domain (coolsilon.com) at least for now and will probably move to another domain if needed. The site will be either a blog aggregator or just a simple article submission site that works kinda like digg / reddit, however, to be promoted to the frontpage the submission would have to impress the opposite group.
After shifting all my instant messaging accounts to my Nokia N9, I stopped getting email alerts via Adium. Therefore, when I finally remember to check my mailboxes, they are already loaded with exploding amount of mails (mostly junk and newsletter though). I don’t fancy doing my email stuff with my device, and don’t feel like installing a webmail checker to my browser, hence this simple little script is written for my phone.
After the last post, I found that it may be fun to write a wrapper for YUI in order to make it behave like jQuery. Therefore, the code below is clearly mainly for self-amusement and is not intended to be used in production projects. However, through coding this, I found that although the difference in design, but YUI is obviously capable to do what jQuery offers (if not more). I will not continue working on this so whoever interested may just copy and paste the code to further developing it.
Traversing a tree structure often involves writing a recursive function. However, Python isn’t the best language for this purpose. Therefore I started flattening the tree into a key-value dictonary structure. Logically it is still a tree, but it is physically stored as a dictionary. Therefore it is now easier to write a simple loop to traverse it.
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.(more…)