Notes on codes, projects and everything
I haven’t got much time lately, so didn’t write about this new phone that I recently imported. For some reason, this new phone of mine do not act as mass storage device like its predecessors (to certain extend). Thankfully I can still ssh in the phone and this makes it possible to mount it as a sshfs volume.
I often struggle to get my Javascript code organized, and have tried numerous ways to do so. I have tried putting relevant code into classes and instantiate as needed, then abuse jQuery’s data()
method to store everything (from scalar values to functions and callbacks). Recently, after knowing (briefly) how a jQuery plugin should be written, it does greatly simplify my code.
One of my recent tasks involving crawling a lot of geo-tagged data from a given service. The most recent one is crawling files containing a point cloud for a given location. So I began by observing the behavior in the browser. After exporting the list of HTTP requests involved in loading the application, I noticed there are a lot of requests fetching resources with a common rXXX
pattern.
So my cheat with dask worked fine and dandy, until I started inspecting the output (which was to be used as an input for another script). While the script seemed to work fine, however when I started to parse each line I was hit with some funny syntax errors. After some quick inspection I found some of the lines was not printed completely.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
Back then, when I was still working on my postgraduate degree research, I used RDF, which was the preferred format in the world of Semantic Web to represent data. I eventually dropped the degree, and stopped following the development of the related technology and standards. Until I volunteered to update the import script for popit when I was looking for the next job/project.
(more…)