Tuesday, December 15, 2009

Semantic Publishing

For the most part, searching for information on the web means relying on the search engine to match the words we type in the search box. This will certainly bring back a large quantity of documents, but whether those documents are useful is another matter. It has been this way since the earliest days of the online information age, when we librarians went to great lengths to craft search strategies that gave the user as many relevant documents as possible.


Just because we retrieve documents containing a certain set of search terms does not mean the meaning of those documents will be anywhere close to what we intended. Semantic publishing refers to an emerging practice of enriching documents with anything that makes their meaning clearer to the search engine. And once a document's meaning is more discoverable, it can be integrated with similar documents, providing reliably linked information from a single search.


A number of STM publishers are working in several areas to create online journals that are semantically enriched. An excellent example is the New England Journal of Medicine (NEJM). To make articles easier to discover, NEJM routinely adds semantic XML markup to the text so that machines can better understand the underlying meaning. NEJM also explores new methods for semantic publishing at the Journal's Beta Site, including supplementary material with the articles such as audio, video, images, and creative ways to link related material.
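To give a concrete sense of what this kind of enrichment looks like, here is a small sketch in Python. The markup fragment is purely illustrative; the tag names, attributes, and codes are my own invented examples, not NEJM's actual schema. The point is that a tagged concept carries a machine-readable identifier alongside the human-readable text, so an indexer can work from coded meanings rather than raw word matches.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment of a semantically enriched article. The <concept>
# element and its scheme/code attributes are illustrative inventions here,
# loosely modeled on vocabulary tagging (e.g., MeSH codes in medicine).
article_xml = """
<article>
  <title>Statin Therapy and Cardiovascular Outcomes</title>
  <body>
    Patients receiving
    <concept scheme="MeSH" code="D019161">HMG-CoA reductase inhibitors</concept>
    showed a reduced incidence of
    <concept scheme="MeSH" code="D009203">myocardial infarction</concept>.
  </body>
</article>
"""

root = ET.fromstring(article_xml)

# A markup-aware search engine can index the coded concepts rather than
# relying only on the literal words in the running text.
concepts = [(c.get("scheme"), c.get("code"), c.text)
            for c in root.iter("concept")]
for scheme, code, label in concepts:
    print(f"{scheme} {code}: {label}")
```

Because the codes come from a controlled vocabulary, a query for "heart attack" could be resolved to the same code as "myocardial infarction," linking documents that share no surface wording.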


The benefits of semantic publishing will help bring about the Semantic Web, a vision Tim Berners-Lee described in 1999 as follows:


I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

With efforts such as those at the New England Journal of Medicine, this vision is a step closer to reality.

1 comment:

Gouri said...

The search results are further worsened by black-hat SEO techniques like keyword stuffing and link trading :) Though search engines are constantly refining their algorithms (Google, for instance, has stopped considering meta keywords), they still have a long way to go.

The day when search engines start returning results not just on the basis of word matches but after thorough analysis of the data will indeed be a dream come true for all netizens.

Nice to know that the Journal's Beta Site has made significant progress in this direction. Thank you for such an informative article.