Notes on codes, projects and everything
The Internet Censorship Dashboard is a project that aggregates data fetched from the OONI API, to provide an overview of the current state of Internet Censorship experienced by users mainly in Southeast Asia. The current form was built a couple of years ago, and recently got funded to get it updated to work better with new APIs.
(more…)Back then, when I was still working on my postgraduate degree research, I used RDF, which was the preferred format in the world of Semantic Web to represent data. I eventually dropped the degree, and stopped following the development of the related technology and standards. Until I volunteered to update the import script for popit when I was looking for the next job/project.
(more…)In recent years, I start to make my development environment decouple from the tools delivered by the package manager used by the operating system. The tools (compiler, interpreters, libraries etc) are usually best left unmodified so other system packages that rely on them keeps working as intended. Also another reason for the setup is I wanted to follow the latest release as much as possible, which cannot be done unless I enroll myself to a rolling release distro.
(more…)Just recently I volunteered to do a pre-101 kinda workshop for people wanting to learn programming. I had done this a few times in the past, but in different settings and goals in mind. The whole structure predates the sessions but I can’t remember when I first created them.
(more…)I don’t quite remember when did I first heard about Category Theory, but the term stuck in my head for quite a while. Eventually I attempted to start looking for tutorials on the topic, but it is hard to find one that I actually understand. Most of them are either leaning too much to the Mathematics side, or too much to the Programming side.
(more…)This is the year I kept digging my old undergraduate notes on Statistics for work. First was my brief attempt wearing the Data Scientist performing ANOVA test to see if there’s correlation between pairs of variables. Then just recently I was tasked to analyze a survey result for a social audit project.
(more…)So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)After a year and half, a lot of things changed, and annoy also changed the splitting strategy too. However, I always wanted to do a proper follow up to the original post, where I compared boosting to Annoy. I still remember the reason I started that (flawed) experiment was because I found boosting easy.
(more…)A few years ago, I was asked to build a game or simulation (alongside 2048) as a part of a job application. Being very impressed with Explorable Explanations, I implemented Conway’s Game of life with Javascript and jQuery (that was before ES6 became popular). Then I made a very simple grid maker jQuery plugin to dynamically generate a grid of divs. If you check the source code, you may realize I rely on Underscore.js a lot back then.
(more…)A friend of mine recently posted a screenshot containing a code snippet for a fairly straight forward problem. So after reading the solution I suddenly had the itch to propose another solution that I initially thought would be better (SPOILER: Turns out it isn’t). Then mysteriously I stuck myself to my seat and started coding an alternative solution to it instead of playing Diablo 3 just now.
My cloud storage is nearly exploding, and I am not in a good position to start subscribing for more storage (Yes, I am still #opentowork). Considering I just moved my domain settings to Cloudflare and started using Cloudflare tunnel, I figure I probably should just back up some of my photos, and host it on my workstation.
(more…)Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
After being frustrated of not getting consistent and accurate result via standard DOM methods especially html_element.getAttribute('key');
and html_element.setAttribute('key', 'value');
, I came across some YUI library components that provides abstractions to various DOM methods. Some interesting DOM-related tools covered in this post are YAHOO.util.Element
, YAHOO.util.DOM
and YAHOO.util.Selector
.
Another day, another programming assessment test. This time I was asked to generate some random data, then examine them to get their data type. Practically it is not a very difficult thing to do and I could probably complete it in fewer lines. I am pretty sure there are better ways to do this, as usual though.