(note (code cslai))

Notes on codes, projects and everything

Python

Protobuf and dask

While JSON is a fine data-interchange format, however it does have some limitations. It is well-known for its simplicity, that even a non-programmer can easily compose a JSON file ~~(but humanity will surprise you IRL)~~. Therefore, it is found almost everywhere, from numerous web APIs, to geospatial data (GeoJSON), and even semantic web (RDF/JSON).

(more…)
The Little Pythoner, Part 3

Previously, I started practising recursions by implementing a type check on lat (list of atoms), and ismember (whether an atom is a member of a given lat). Then in the third chapter, named “Cons the Magnificent”, more list manipulation methods are being introduced.

(more…)
The Little Pythoner, Part 2

In the last part, I implemented a couple of primitive functions so that they can be applied in the following chapters. The second chapter of the book, is titled “Do it again, and again, and again…”. The title already hints that readers will deal with repetitions throughout the chapter.

(more…)
The Little Pythoner, Part I

I saw this article from alistapart, which is about Javascript’s prototypal object orientation. So the article mentioned Douglas Crawford, and I was immediately reminded about my struggle in understanding the language itself. Back then I used to also refer to his site for a lot of notes in Javascript. So I went back to have a quick read, and found this article that discusses the similarity between Javascript and Lisp.

(more…)
Oh NO! My print output is chopped!

So my cheat with dask worked fine and dandy, until I started inspecting the output (which was to be used as an input for another script). While the script seemed to work fine, however when I started to parse each line I was hit with some funny syntax errors. After some quick inspection I found some of the lines was not printed completely.

(more…)
Processing JSON with dask.bag

Often times, I am dealing with JSONL files, though panda’s DataFrame is great (and blaze to certain extend), however it is offering too much for the job. Most of the received data is in the form of structured text and I do all sorts of work with them. For example checking for consistency, doing replace based on values of other columns, stripping whitespace etc.

(more…)
Estimating the value of pi with a line of Python

I came across a video on Youtube on Pi day. Coincidently it was about estimating the value of Pi produced by Matt Parker aka standupmaths. While I am not quite interested in knowing the best way to estimate Pi, I am quite interested in the algorithm he showed in the video however. Specifically, I am interested to find out how easy it is to implement in Python.

(more…)
Drink All You Can

Sometimes I really doubt about the advantage of recycling old stuff to fund for new units beyond goodwill. Sure you get to convince yourself that you are saving the environment by doing so, and it also saves money in the long run. However, I didn’t realize how much it generates it may be after trying to work out an answer for a fictional IQ question.

(more…)
Linting a libsvm-formatted data

While working on a text classification task, I spent quite some time preparing the training set for a given document collection. The project is supposed to be a pure golang implementation, so after some quick searching I found some libraries that are either a wrapper to libsvm, or a re-implementation. So I happily started to prepare my training set in the libsvm format.

(more…)

Random Posts

Evaluating logical statements

Often times one would have to write code to evaluate logical statements. For example, given statement p and q, what is p implies q? As there’s no operator for implication in PHP, one would have to rewrite the statement that consists only in NOT (!), AND (&&) and OR (||) operators. When there are a huge load of these statements, code can be difficult to read.

(more…)
Hitting a nail with the wrong hammer (my writing tools)

I just failed a programming assessment test miserably yesterday and thought I should at least document it down. However, the problem with this is that the questions are copyrighted, so I guess I would write it from another point of view. So the main reason I failed was because I chose the wrong strategy to the problem, thinking it should be solution but as I put in time to that I ended up creating more problems.

(more…)
Mixing Non-array and Array in array_map()

array_map function is a function that I use the most in my php scripts recently. However, there are times where I want to pass some non-array into it, therefore often times I have code like the snippet shown below:

$result = array_map( 'some_callback', array_fill(0, count($some_array), 'some_string'), array_fill(0, count($some_array), 'some_other_string'), $some_array )

It doesn’t look good IMO, as it makes the code looks complicated. Hence, after seeing how the code may vary in all different scenarios, I created some functions to clean up the array_map call as seen above. Code snippet after the jump

(more…)
Fuzzy C-Means and Content Recommendation

This post is purely based on my own speculation as there’s no experiment on real-life data to actually back the arguments. I am currently trying to document down a plan for my experiment(s) on recommender system (this reminds me that I have not release the Flickr data collection tool :/) and my supervisor advised to write a paragraph or two on some of the key things. Since he is not going to read it, so I might as well just post it here as a note.

(more…)
Using YUI for DOM Manipulation

After being frustrated of not getting consistent and accurate result via standard DOM methods especially html_element.getAttribute('key'); and html_element.setAttribute('key', 'value');, I came across some YUI library components that provides abstractions to various DOM methods. Some interesting DOM-related tools covered in this post are YAHOO.util.Element, YAHOO.util.DOM and YAHOO.util.Selector.

(more…)