As the name implies, the Resource Description Framework, or RDF for short, is a language for representing information about resources on the World Wide Web. The information represented is mostly metadata such as the title (assuming the resource is a web page), author, last modified date and so on. Besides network-accessible resources, RDF can also describe things that cannot be accessed through the network, as long as they can be identified by a URI.
The main objective of RDF is to produce information that applications can process, by defining a standardized way to represent resources. Using a standardized language also lets applications exchange information without loss of meaning. Third-party applications can therefore consume information created elsewhere, and because the format is standardized, tools to manipulate the information are readily available.
As mentioned earlier, anything that can be identified by a Uniform Resource Identifier (URI) can be described by RDF. URIs identify not only network-accessible things but also things that are not network-accessible, such as an arbitrary human being, a corporation, or a book in a library, as well as abstract concepts that do not exist physically, like creator, author or modified date. URLs (Uniform Resource Locators) are a subset of URIs.
RDF is a simple language that deals only with binary relationships, each involving a subject, a predicate and an object. Given the example “web(cslai) is Jeffrey04’s blog”, we can restructure the statement into subject = web(cslai), predicate = owner, object = Jeffrey04. We can then draw this as a graph (which kind of reminds me of a semantic network), as follows (not an RDF graph):
From the graph, we can see that the relationship between a subject and an object is described by the predicate. Put another way, the subject ( web(cslai) ) has a property, the predicate (owner), whose value is the object ( Jeffrey04 ).
Before constructing an RDF graph, it is important to know that, because RDF is used to provide information about a resource, there is a set of basic rules to follow when constructing an RDF statement. The subject must always be a URI or a blank node (discussed later) that denotes a resource, the predicate must always be a URI, and the object can be another resource (a URI), a blank node, or a constant value represented by a character string (a literal).
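The rules above can be sketched as a small check in Python. This is only an illustration: the class names `URIRef`, `BlankNode` and `Literal`, the function `is_valid_statement` and the predicate URI are assumptions made up for this example, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """A node or predicate identified by a URI."""
    value: str


@dataclass(frozen=True)
class BlankNode:
    """A node that has no URI of its own."""
    label: str


@dataclass(frozen=True)
class Literal:
    """A constant value represented by a character string."""
    text: str


def is_valid_statement(subject, predicate, obj) -> bool:
    """Check the three positional rules of an RDF statement."""
    subject_ok = isinstance(subject, (URIRef, BlankNode))
    predicate_ok = isinstance(predicate, URIRef)
    object_ok = isinstance(obj, (URIRef, BlankNode, Literal))
    return subject_ok and predicate_ok and object_ok


blog = URIRef("http://cslai.coolsilon.com/")
owner = URIRef("http://example.org/terms/owner")  # hypothetical predicate URI

print(is_valid_statement(blog, owner, Literal("Jeffrey04")))  # subject is a URI
print(is_valid_statement(Literal("Jeffrey04"), owner, blog))  # subject is a literal
```

The first call satisfies all three rules, while the second is rejected because a literal may only appear in the object position.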
Besides being serialized into an XML file, an RDF graph can also be written as triples. Each statement in the graph is written as a simple triple of subject, predicate and object, in exactly that order. Note also that the graph is the primary way of representing statements; any other representation is considered secondary.
In the basic triple syntax, a URI is either enclosed in angle brackets or abbreviated as a QName (a prefix:name pair, much like an XML namespace prefix), and literals are enclosed in double quotes. For example:
<http://cslai.coolsilon.com/> csterms:owner "Jeffrey04" .
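To make the QName idea concrete, here is a minimal Python sketch that expands a term into its full URI. The namespace URI bound to `csterms` is a made-up placeholder, and `expand_term` is a hypothetical helper, since the post never says what `csterms` actually stands for.

```python
# Hypothetical prefix table; the real namespace URI is not given in this post.
PREFIXES = {"csterms": "http://example.org/csterms/"}


def expand_term(term, prefixes=PREFIXES):
    """Return the full URI for a <...> reference or a prefix:name QName."""
    if term.startswith("<") and term.endswith(">"):
        # Already a full URI, just strip the angle brackets.
        return term[1:-1]
    prefix, separator, local_name = term.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local_name
    raise ValueError(f"unknown prefix or malformed term: {term!r}")


print(expand_term("<http://cslai.coolsilon.com/>"))
print(expand_term("csterms:owner"))
```

A QName is therefore only a shorthand: once the prefix is looked up, the predicate is an ordinary URI like any other.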
A blank node comes into play when the object value is structured data. For example, given the following triple:
exstaff:85740 exterms:address "1501 Grant Avenue, Bedford, Massachusetts 01730" .
The address can instead be broken into street, city, state and postal code, each attached to a separate address node. Before discussing the graph, it is worth pointing out that each URI node is drawn as an ellipse and each literal as a box. As the graph shows, every node is either a subject or an object, while the arcs are predicates. The corresponding RDF triples for the graph are as follows:
exstaff:85740 exterms:address exaddressid:85740 .
exaddressid:85740 exterms:street "1501 Grant Avenue" .
exaddressid:85740 exterms:city "Bedford" .
exaddressid:85740 exterms:state "Massachusetts" .
exaddressid:85740 exterms:postalCode "01730" .
As seen in both the graph and the RDF triples, a new node with its own URI is created just to describe the concept of an address. To represent the same information without having to create a URI for that node, a blank node can be used, as follows:
In RDF triples form
exstaff:85740 exterms:address ??? .
??? exterms:street "1501 Grant Avenue" .
??? exterms:city "Bedford" .
??? exterms:state "Massachusetts" .
??? exterms:postalCode "01730" .
As the graph shows, the address node changes from a node with a URI into a node without one, which is called a blank node. In the triples it is written as ‘???’ instead of the full URI used in the previous example. However, a graph may contain many blank nodes that represent different things, and ‘???’ cannot tell them apart. In that case each blank node can be given a label of the form _:name. Reusing the same example, the triples can be rewritten as follows:
exstaff:85740 exterms:address _:johnaddress .
_:johnaddress exterms:street "1501 Grant Avenue" .
_:johnaddress exterms:city "Bedford" .
_:johnaddress exterms:state "Massachusetts" .
_:johnaddress exterms:postalCode "01730" .
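As a rough illustration of how an application might read such triples, the Python sketch below stores them as plain tuples and follows the blank node label to collect the address parts. The tuple layout and the helper names `properties_of` and `address_of` are my own assumptions, not an RDF API.

```python
# The example triples as (subject, predicate, object) tuples.
# Literals are kept as plain Python strings for simplicity.
triples = [
    ("exstaff:85740", "exterms:address", "_:johnaddress"),
    ("_:johnaddress", "exterms:street", "1501 Grant Avenue"),
    ("_:johnaddress", "exterms:city", "Bedford"),
    ("_:johnaddress", "exterms:state", "Massachusetts"),
    ("_:johnaddress", "exterms:postalCode", "01730"),
]


def properties_of(node, triples):
    """Collect every predicate/object pair whose subject is the given node."""
    return {p: o for s, p, o in triples if s == node}


def address_of(staff, triples):
    """Follow exterms:address from a staff node to the address node's properties."""
    for s, p, o in triples:
        if s == staff and p == "exterms:address":
            return properties_of(o, triples)
    return {}


address = address_of("exstaff:85740", triples)
print(address["exterms:city"])
print(address["exterms:postalCode"])
```

The label `_:johnaddress` only needs to be consistent within this one set of triples; it plays the role the address URI played in the earlier version.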
Breaking the address into smaller structured parts lets external applications manipulate the information in a more standardized way and produce more predictable results. However, because RDF deals only with binary relationships, an n-ary relationship has to be broken into a set of binary relationships with the help of blank nodes. This reminds me of something similar in Prolog, where you can use a kind of variable that the programmer does not need to name explicitly.
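The decomposition of an n-ary relationship can be sketched in Python as follows. This is a simplified illustration under my own assumptions: the helpers `new_blank_node` and `describe`, the generated `_:b1`-style labels and the dictionary input are all hypothetical.

```python
from itertools import count

_blank_ids = count(1)


def new_blank_node():
    """Generate a fresh blank node label such as _:b1, _:b2, ..."""
    return f"_:b{next(_blank_ids)}"


def describe(subject, predicate, record):
    """Break one structured value into binary triples hanging off a blank node."""
    node = new_blank_node()
    triples = [(subject, predicate, node)]
    for key, value in record.items():
        triples.append((node, f"exterms:{key}", value))
    return triples


address_triples = describe(
    "exstaff:85740",
    "exterms:address",
    {
        "street": "1501 Grant Avenue",
        "city": "Bedford",
        "state": "Massachusetts",
        "postalCode": "01730",
    },
)
for triple in address_triples:
    print(triple)
```

One relationship with four parts becomes five binary statements: one that links the staff member to the blank node, and one per address part.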
Besides the situation above, a blank node is also often used where there is no other way to describe a resource properly and accurately. For example, consider a person with the email address email@example.com. Although mailto:firstname.lastname@example.org is a valid URI, the mailbox address is better treated as an attribute of the person than as the person’s identifier, so the better way of representing John Doe is a blank node with mailto:email@example.com as the object of a mailbox property, as follows:
_:john exterms:mailbox <mailto:firstname.lastname@example.org> .
As an example of combining RDF that is scattered around the internet, assume there is a book whose author uses email@example.com. We can make an inference by comparing the triple above with the following one:
ex2terms:book78354 exterms:mailbox <mailto:firstname.lastname@example.org> .
We can then deduce that the book is written by John Doe, who has the email address email@example.com.
However, because an object can by default be any literal, applications that consume the information may run into trouble. For example, look at the triple below:
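A very rough Python sketch of this kind of matching is shown below. It simply groups subjects that share the same mailbox value, on the assumption that a mailbox belongs to exactly one person; the function name `subjects_by_mailbox` and the tuple representation are mine, and a real application would need more care when merging data from different sources.

```python
from collections import defaultdict

# The two mailbox triples from the text, as (subject, predicate, object) tuples.
triples = [
    ("_:john", "exterms:mailbox", "mailto:firstname.lastname@example.org"),
    ("ex2terms:book78354", "exterms:mailbox", "mailto:firstname.lastname@example.org"),
]


def subjects_by_mailbox(triples):
    """Group subjects that share a mailbox, keeping only mailboxes seen more than once."""
    groups = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "exterms:mailbox":
            groups[obj].add(subject)
    return {mailbox: subjects for mailbox, subjects in groups.items() if len(subjects) > 1}


for mailbox, subjects in subjects_by_mailbox(triples).items():
    print(mailbox, "links", sorted(subjects))
```

The shared mailbox URI is the only thing the two descriptions have in common, and that is enough to connect the book with John Doe.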
_:jeff exterms:age "24" .
There is no way to tell whether that 24 is a decimal (base 10) number or an octal number. Things only get worse if the application is actually expecting a float. Typed literals were introduced to solve this problem by allowing a literal to specify the datatype to be used. Back to the previous example, to properly state my age, I should write the triple as follows:
_:jeff exterms:age "24"^^xsd:integer .
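To show what a consuming application gains from the datatype, here is a small Python sketch that turns a literal token into a native value. The regular expression, the `CONVERTERS` table and `parse_literal` are assumptions for illustration only, and cover just a handful of XML Schema datatypes.

```python
import re
from decimal import Decimal

# Matches "lexical form"^^datatype, e.g. "24"^^xsd:integer
TYPED_LITERAL = re.compile(r'^"(?P<lexical>.*)"\^\^(?P<datatype>\S+)$')

# A deliberately small, hand-picked mapping from datatype to Python converter.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:double": float,
    "xsd:string": str,
}


def parse_literal(token):
    """Convert a typed literal to a Python value; plain literals stay strings."""
    match = TYPED_LITERAL.match(token)
    if match is None:
        # Plain literal: there is no datatype to interpret it with.
        return token.strip('"')
    convert = CONVERTERS.get(match["datatype"])
    if convert is None:
        raise ValueError(f"unsupported datatype: {match['datatype']}")
    return convert(match["lexical"])


print(repr(parse_literal('"24"')))               # still just a string
print(repr(parse_literal('"24"^^xsd:integer')))  # a base 10 integer
```

Without the datatype the application can only guess; with it, the string 24 is unambiguously the integer twenty-four.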
Content is summarized / oversimplified from the RDF Primer.