(note (code cslai))

Notes on codes, projects and everything

dask

Processing JSON with dask.bag

I often deal with JSONL files. Although pandas’ DataFrame is great (and blaze too, to a certain extent), it offers too much for the job. Most of the data I receive is structured text, and I do all sorts of work on it: checking for consistency, replacing values based on other columns, stripping whitespace, and so on.
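The kind of per-record cleanup described above can be sketched in plain Python. This is only an illustration under assumptions: the field names `id` and `name`, and the helpers `clean_record`, `is_consistent` and `process_jsonl`, are hypothetical and not taken from the post. With dask.bag, the same functions would typically be passed to the bag's `map` and `filter` after loading the lines with `read_text` and parsing them with `json.loads`.

```python
import json


def clean_record(record):
    """Strip surrounding whitespace from every string value in one record."""
    return {
        key: value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


def is_consistent(record, required=("id", "name")):
    """Return True when every (hypothetical) required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required)


def process_jsonl(lines):
    """Parse JSONL lines, clean each record, and keep only consistent ones."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = clean_record(json.loads(line))
        if is_consistent(record):
            yield record


# Made-up sample data: one record per line, as in a JSONL file.
sample = [
    '{"id": 1, "name": "  Alice "}',
    '{"id": 2, "name": "   "}',
    '',
    '{"id": 3, "name": "Bob"}',
]
cleaned = list(process_jsonl(sample))
```

Keeping each step as a small function that takes one record and returns one record (or a boolean) is what makes it easy to move between a plain loop and a bag pipeline.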


Random Posts

Some status updates

There are a lot of things I want to post both here and on my personal blogs. However, I was sucked into sanctuary for most of last month. I guess after a month of playing, it is probably time to slowly resume my personal projects.

Random notes on Pandas and Scikit-learn

I first heard about Pandas probably a year ago, at my previous job. It looked nice, but I didn’t really get the chance to use it. In practice, it is a library that makes data look like a mix of a relational database table and an Excel sheet. It is easy to query, and it provides a way to process data fast if you know how to do it properly (no, I don’t, so I cheated).

Database transactions involving prepared statements

I was wondering whether it is possible to avoid exposing PDO and PDOStatement objects to the users of my database library (mainly just me). While working on my project, I noticed that there is an almost fixed pattern whenever I work with the database. With this in mind, I added some new functions to the library and decided to make a quick release.
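To illustrate what such a fixed pattern might look like, here is a minimal sketch in Python using the standard `sqlite3` module rather than PHP's PDO. It is an analogue, not the library from the post: the helper `run_in_transaction` and the `account` table are hypothetical. The idea is that callers hand over parameterized statements and never touch the cursor or the commit/rollback logic themselves.

```python
import sqlite3


def run_in_transaction(conn, statements):
    """Run (sql, params) pairs atomically: commit them all or roll them all back."""
    cursor = conn.cursor()
    try:
        for sql, params in statements:
            # Parameters are bound by the driver, never interpolated into the SQL.
            cursor.execute(sql, params)
        conn.commit()
    except sqlite3.Error:
        conn.rollback()
        raise
    finally:
        cursor.close()


# Hypothetical schema, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.commit()

run_in_transaction(conn, [
    ("INSERT INTO account (name, balance) VALUES (?, ?)", ("alice", 100)),
    ("INSERT INTO account (name, balance) VALUES (?, ?)", ("bob", 50)),
])
```

If any statement in the list fails, the helper rolls back the whole batch and re-raises the error, so the caller only has to decide what to do about the failure.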

Regression with Annoy (cont’d)

After a year and a half, a lot of things have changed, and Annoy has changed its splitting strategy too. However, I have always wanted to do a proper follow-up to the original post, where I compared boosting to Annoy. I still remember that the reason I started that (flawed) experiment was that I found boosting easy.
More Interviewing Questions

Just survived a job interview, so I should probably celebrate this despite the outcome. Well, considering I had been off the job market for a couple of years, I probably had every reason to be nervous. Anyway, as with most ~~geeky~~ serious job interviews, there was a test given by the company to the attendees.

