Notes on codes, projects and everything
As the name implies, Resource Definition Framework, or RDF in short, is a language to represent information about resources in world wide web. Information that can be represented is mostly metadata like title (assuming the resource is a web-page), author, last modified date etc. Besides representing resource that is network-accessible, it can be used to represent things that cannot be accessed through the network, as long as it can be identified using a URI.
After reading through the documentation, I find that the role based ACL and work flow can be more tightly integrated. Therefore I made all the transaction into many FSMs and my work flow component now consists of one work flow library and one work flow management model. As I am going a more normalized design (I use denormalized design in work as it deals with a lot of documents, however for a small project like mine, a denormalized design should do well).
After delaying for quite some time, I think I should start the project before I get bored with it. The project will be either hosted on this current domain (coolsilon.com) at least for now and will probably move to another domain if needed. The site will be either a blog aggregator or just a simple article submission site that works kinda like digg / reddit, however, to be promoted to the frontpage the submission would have to impress the opposite group.
Implementing a Information Retrieval system is a fun thing to do. However, doing it efficiently is not (at least to me). So my first few attempts didn’t really end well (mostly uses just Go/golang with some bash tricks here and there, with or without a database). Then I jumped back to Python, which I am more familiar with and was very surprised with all the options available. So I started with Pandas and Scikit-learn combo.
While following through the Statistical Learning course, I came across this part on doing regression with boosting. Then reading through the material, and going through it makes me wonder, the same method may be adapted to Erik Bernhardsson‘s annoy algorithm.(more…)