The Semantic Web is not just about putting data on the web, but also about making links so that both people and machines can explore the web of data. Links in the web of data connect arbitrary things, as described by RDF, whereas links in the web of hypertext connect only web resources. Linking arbitrary things in this way allows related things to be found when performing a search.
It is also a principle of linking data between systems and entities that allows rich, self-describing interrelations among the data available on the web across the globe. The Web of Data also marks a shift from publishing data in human-readable HTML documents to publishing machine-readable documents that allow machines to make inferences from the published data.
To link entities together effectively, Sir Tim Berners-Lee proposed four rules, or expectations:
- Use URIs as names for things.
- Use HTTP URIs so that people can look up those names.
- When someone looks up a URI, return useful information using standard technologies and formats such as RDF and SPARQL.
- Include links to other URIs so that users can discover more things.
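As a rough illustration of these rules, the sketch below models a tiny description as (subject, predicate, object) triples in plain Python. The `example.org` URIs and the helper names `is_http_uri` and `linked_uris` are assumptions made for this example only; the two FOAF property URIs are used purely as familiar names for predicates.

```python
from urllib.parse import urlparse

# Hypothetical example data: the example.org URIs are illustrative only.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (ALICE, FOAF_NAME, '"Alice"'),  # a literal value, not a link
    (ALICE, FOAF_KNOWS, BOB),       # a link to another URI
]


def is_http_uri(term: str) -> bool:
    """Return True if the term is an HTTP(S) URI, i.e. a name that can be looked up."""
    parsed = urlparse(term)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def linked_uris(triples, subject):
    """Return the URIs that the description of `subject` links to."""
    return [o for s, _p, o in triples if s == subject and is_http_uri(o)]
```

Calling `linked_uris(TRIPLES, ALICE)` yields the URI for Bob, which a person or a program could then look up in turn to discover more.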
However, it is important to know that breaking these rules is not necessarily destructive. It only reduces interconnectivity, which in turn discourages reuse and makes the resources less valuable.
To use URIs as names for things, the universal URI set of symbols should always be used, so that other parties can process the data and arrive at consistent results. This also reduces the risk of losing meaning in the process.
It is also important to actually serve information on the web for a given URI. This makes data, as well as metadata, accessible in standard formats such as RDF or OWL. Publishing information about a resource enables others, especially applications and machines, to properly understand the document.
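One common way for a client to ask for machine-readable data when looking up a URI is HTTP content negotiation through the `Accept` header. The sketch below only builds such a request with Python's standard library and does not send it; the URI and the helper name `build_rdf_request` are hypothetical, and whether a given server honours the header is an assumption.

```python
from urllib.request import Request


def build_rdf_request(uri: str) -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # Prefer Turtle, and accept RDF/XML as a fallback.
    accept = "text/turtle, application/rdf+xml;q=0.9"
    return Request(uri, headers={"Accept": accept})


# Hypothetical URI used for illustration only.
request = build_rdf_request("http://example.org/people/alice")
# Passing `request` to urllib.request.urlopen would perform the actual lookup.
```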
One way to enable users to discover things is to return both incoming and outgoing link information to the user. By definition, a graph G is browsable if, for the URI of any node in G, looking up that URI returns information that describes the node and satisfies the following conditions:
- Returns all statements where the node is either a subject or an object.
- Describes all blank nodes attached to the node by one arc.
In short, this allows data to be represented as a graph that can be traversed. It is important for the query service to return the RDF statements that involve the specified node, regardless of whether it is a subject or an object. Note that the subgraph returned has been referred to as a Minimum Spanning Graph (MSG) or an RDF molecule.
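The two conditions for a browsable graph can be sketched as a small function over a list of triples. This is a simplified illustration, not a full MSG or RDF molecule computation: it follows blank nodes only one arc away from the node, it assumes blank nodes are written with a `_:` prefix, and the `example.org` URIs and the function name `describe` are hypothetical.

```python
# Hypothetical example vocabulary and nodes.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"
CAROL = "http://example.org/people/carol"
KNOWS = "http://example.org/vocab/knows"
ADDRESS = "http://example.org/vocab/address"
CITY = "http://example.org/vocab/city"

GRAPH = [
    (ALICE, KNOWS, BOB),
    (CAROL, KNOWS, ALICE),
    (ALICE, ADDRESS, "_:addr"),     # arc to a blank node
    ("_:addr", CITY, '"Oxford"'),   # statement describing the blank node
    (BOB, KNOWS, CAROL),            # does not involve ALICE
]


def is_blank(term: str) -> bool:
    """Blank nodes are assumed to be written with a '_:' prefix."""
    return term.startswith("_:")


def describe(graph, node):
    """Return the statements that a lookup of `node` should return."""
    # Condition 1: statements where the node is the subject or the object.
    result = [t for t in graph if t[0] == node or t[2] == node]
    # Condition 2: describe blank nodes attached to the node by one arc.
    blanks = {term for s, _p, o in result for term in (s, o) if is_blank(term)}
    for triple in graph:
        if (triple[0] in blanks or triple[2] in blanks) and triple not in result:
            result.append(triple)
    return result
```

Here `describe(GRAPH, ALICE)` returns the first four statements and leaves out the one that links only Bob and Carol.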
However, if a statement relates multiple entities, the statement must then be repeated for each of those entities. This violates the rule that the same data must not be stored in more than one place, a rule that exists mainly for consistency. However, this becomes less of a concern if the statements are generated automatically.
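To illustrate why automatic generation eases the consistency concern, the sketch below keeps a single list of triples as the source of truth and derives one document per URI from it, so a statement relating two entities is repeated in both documents without being maintained by hand. The URIs and the function name `build_documents` are hypothetical.

```python
from collections import defaultdict

# Hypothetical example data.
DOC_A = "http://example.org/docs/a"
DOC_B = "http://example.org/docs/b"
CITES = "http://example.org/vocab/cites"
TITLE = "http://example.org/vocab/title"

TRIPLES = [
    (DOC_A, CITES, DOC_B),            # relates two entities
    (DOC_A, TITLE, '"Document A"'),   # relates one entity to a literal
]


def build_documents(triples):
    """Group statements into one document per URI appearing as subject or object."""
    documents = defaultdict(list)
    for triple in triples:
        subject, _predicate, obj = triple
        documents[subject].append(triple)
        # Repeat the statement in the object's document when the object is a URI.
        if obj.startswith(("http://", "https://")) and obj != subject:
            documents[obj].append(triple)
    return dict(documents)


docs = build_documents(TRIPLES)
```

Because both documents are regenerated from the same list, editing the single source statement updates every copy of it.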
In addition, there may be situations where the author of document A claims that it relates to document B, but the author of B thinks otherwise. One possible reason is that document A did not exist when B was published.
Repeated or expired data may also be undesirable at times, for example page-visitor statistics within a site's introduction document. One proposed solution is to move the statistics statements out of the introduction document into a separate document.
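A minimal sketch of that separation is shown below. It moves statements with selected predicates into a second document and, as an assumption not stated in the text above, links the two documents with `rdfs:seeAlso` so the statistics remain discoverable. The `example.org` URIs and the function name `split_out` are hypothetical.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical example data.
INTRO = "http://example.org/site/intro"
STATS = "http://example.org/site/intro-stats"
TITLE = "http://example.org/vocab/title"
VISITS = "http://example.org/vocab/visitCount"

TRIPLES = [
    (INTRO, TITLE, '"Welcome"'),
    (INTRO, VISITS, '"1024"'),  # changes often, so it is kept out of the main document
]


def split_out(triples, volatile_predicates, main_doc, side_doc):
    """Move statements with volatile predicates into a separate document."""
    main = [t for t in triples if t[1] not in volatile_predicates]
    side = [t for t in triples if t[1] in volatile_predicates]
    if side:
        # Assumption: point from the main document to the side document.
        main.append((main_doc, RDFS_SEE_ALSO, side_doc))
    return {main_doc: main, side_doc: side}


docs = split_out(TRIPLES, {VISITS}, INTRO, STATS)
```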
In the end, links open up the web of data not only to human beings but also to AI processes, allowing them to make inferences about entities. They also encourage all parties to publish data freely in open standard formats.
Data summarized and greatly simplified from the following sources: