Notes on codes, projects and everything
This is the year I kept digging my old undergraduate notes on Statistics for work. First was my brief attempt wearing the Data Scientist performing ANOVA test to see if there’s correlation between pairs of variables. Then just recently I was tasked to analyze a survey result for a social audit project.
(more…)One of my recent tasks involving crawling a lot of geo-tagged data from a given service. The most recent one is crawling files containing a point cloud for a given location. So I began by observing the behavior in the browser. After exporting the list of HTTP requests involved in loading the application, I noticed there are a lot of requests fetching resources with a common rXXX
pattern.
It is very difficult to like the way vim handle plugins by default, so I was really thrilled to find out about pathogen when a geek I followed tweeted about it. It took me some time to actually re-organize my current configuration to this new format. Then I thought why not reorganize my .vimrc as well, as my current version looks a bit cryptic after a while.
I was trying to learn scala and clojure to find one that I may want to use in my postgraduate project. After trying to learn scala for a couple of days, I gave up because I really don’t like the syntax (too OO for my liking). Then I continued with clojure and found myself liking the syntax better.
Recently I am being assigned to work on a FSM based workflow system to enable my other colleagues to use it in their application. I am rather impressed with the simplicity of the workflow system (initially designed by my Technical Director) and decided to post some notes here. The workflow system was developed for CodeIgniter PHP framework and Drupal CMS.
Back then in college, we were given a lot of programming practices. These questions usually shows a desired output format, and we were required to write a program to print out the exact thing. Usually it involves printing a matrix of numbers, or symbols etc. For these problems, usually a loop structure or two should solve the problem.
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.
(more…)So I first heard about Panda probably a year ago when I was in my previous job. It looked nice, but I didn’t really get the chance to use it. So practically it is a library that makes data looks like a mix of relational database table and excel sheet. It is easy to do query with it, and provides a way to process it fast if you know how to do it properly (no, I don’t, so I cheated).
So apparently Annoy is now splitting points by using the centroids of 2 means clustering. It is claimed that it provides better results for ANN search, however, how does this impact regression? Purely out of curiosity, I plugged a new point splitting function and generated a new set of points.
(more…)I used to develop a bot, partly for work, that fetches current latest petrol retail price in Malaysia. The bot was really an experiment, but at the time it worked well. Then a few years later, out of boredom, I revisited the project after finding the telegram bot library is moving towards asyncio. It was great (at least a lot of people rave about it), but also at the same time intimidating, I learned about coroutines and used gevent in the past, but not asyncio itself.
(more…)