Friday, December 18, 2009

2010 Trends

Over the past few months I have been reading with great interest blog posts predicting the future of publishing, even its demise. While the dire predictions may be over the top, the industry is definitely changing. As I look forward into 2010, there are certainly conversations in publishing that are driving these changes.


Digital publishing is making a large impact on the industry. In the academic and STM space, electronic sales are now surpassing print sales. Both the ARL and ACRL report that this crossing of the electronic and print paths has been happening for the past few years. As academic institutions have been faced with decreased funding and tight budgets, the best alternative has been to move towards electronic resources that have been generally underpriced. Now, as the demand for these eResources has increased, there will be price implications moving forward. In an excellent post, Kent Anderson points out that we are in the middle of a revolution:

And there is no going back. We’re in the midst of a revolution of distribution, manufacturing, and information presentation and utilization. It’s a digital revolution. It’s a revolution that now dominates the purchasing and strategic frameworks for demand and supply.

As this digital revolution continues in the STM space, both authors and publishers now have the capability to add value to digital content, especially by making its meaning more discoverable through Semantic Publishing. Enhancing the meaning of content through semantics was simply not possible in the print world.


In 2010 we will see the huge growth in eBook sales continue. While these sales represent a small percentage of overall book sales, growth in this area is steep, and eBooks are increasingly taking the place of print books. As for the demise of publishing, Steve Haber, the president of Sony's Digital Reading Business, has a post titled The Death of Print Doesn't Have to Mean the Death of Publishing. He points out:

There are some similarities between where the publishing industry is today and where the music industry was when it entered the digital age. When we transitioned from LPs and cassette tapes to CDs and MP3s, music did not die - vinyl and magnetic tape formats did.

He concludes that the shift from print to eBook doesn't mean that publishing will go away just as when we went from film to digital, pictures did not go away.

Tuesday, December 15, 2009

Semantic Publishing

For the most part, searching for information on the web consists of relying on the search engine to find occurrences of the words we type in the search box. This will certainly bring back a large quantity of documents, but whether or not the documents are useful is another matter. This has been the case since very early in the online information age, when we librarians went to great lengths to craft search strategies that gave the user as many relevant documents as possible.


Just because we search for documents that contain a certain set of search terms doesn't at all mean that the meaning of the retrieved documents will be anywhere close to what we intended. Semantic Publishing refers to the emerging practice of enriching documents with anything that makes their meaning clearer to the search engine. Additionally, if a document's meaning is more discoverable, it can be integrated with similar documents, providing reliably linked information from a search.


A number of STM publishers are making efforts in several areas to create online journals that are semantically enriched. An excellent example of this is the New England Journal of Medicine (NEJM). To facilitate the discovery of an article, NEJM routinely adds semantic XML markup to the text to make the underlying meaning more explicit. Additionally, NEJM routinely explores methods for semantic publishing at their Journal's Beta Site, including supplementary material such as audio, video, and images, along with creative ways to link material.
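
To make this concrete, here is a minimal sketch of what semantic enrichment inside an article's text might look like. The element names and identifiers are my own hypothetical illustration, not NEJM's actual markup:

    <p>Treatment with <drug scheme="RxNorm" code="1191">aspirin</drug>
    reduced the rate of <condition scheme="MeSH" code="D009203">myocardial
    infarction</condition> in the trial population.</p>

A search engine that understands this tagging can distinguish an article that is actually about myocardial infarction from one that merely mentions the phrase in passing.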


The benefits of semantic publishing will help enable the new Semantic Web. This vision of the Semantic Web was described by Tim Berners-Lee in 1999:


I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

With efforts such as those at the New England Journal of Medicine, this vision is closer to reality.

Friday, December 4, 2009

eMagazine Next Year

In my last post I suggested that magazine and newspaper content isn't the best fit for the current generation of eContent readers such as the Kindle. The structure of a book and its intended use suggest a much different user interface than a magazine or newspaper, which are typically browsed rather than read linearly. Time, Inc. this week released a demonstration of what Sports Illustrated will look like in full color, with an interactive user experience. Some predict that the device shown is the rumored Apple tablet.


Tuesday, December 1, 2009

eContent Readers

For the past 20 years, I have witnessed the evolution and growth of the distribution of electronic content. For the first ten years this centered primarily around the aggregation of full-text scholarly journal content, along with a few early efforts at creating electronic editions of books. Scholarly journal content has continued to grow through the monumental efforts of primary and secondary publishers, as well as major universities and consortia. Offshore data conversion vendors have made these digitization efforts affordable and have now partnered with publishers to digitize most of the major newspaper back files. Research in libraries today looks very different from 20 years ago, aided by the ability to access all of this content on the web.


Unfortunately, access to electronic book content has not progressed in the same manner. Ten years ago the first attempts at eBooks began, and we have all seen the starts and stops along the way with countless readers. It appears that the latest generation of readers is now catching on and will continue to grow, offering better functionality and more content. This generation of eReaders attempts to handle magazine and newspaper content in addition to books, but nobody is satisfied with the way this serial content works today. Efforts are now underway to create standards for displaying newspaper and magazine content along with advertising.


I look forward to the day when I can read my books, search the journal literature, read my magazines, and browse the morning paper all on one device. Given the recent explosion in eContent adoption, it is likely this day will come.

Tuesday, November 24, 2009

Collaborative Publishing

In my last post (XML First…What about eBook First?) I discussed the emerging trend of publishing an eBook before the print version. An interesting implementation of this process is the electronic release of The Complete Guide to Google Wave, a technical manual for Google's new web application, Google Wave. The manual was released one month after the initial preview release of Wave. Since Google Wave is a hard-to-understand concept and application, this manual has been very useful for early adopters of Wave.


The fact that the guide was released first in electronic form, rather than print, is only part of what is intriguing about this publishing experiment. The work is a collaborative effort between two authors who now use Google Wave itself to update their guide and receive public input on future releases. The authors' intention is to "release early and often": the guide will be updated and refined in public as Google Wave is changed and improved before a full public release of the product. The softcover print version is coming in early 2010.

Thursday, November 19, 2009

XML First...What about eBook First?

The benefits for publishers of moving to "XML First" workflows have been discussed for a while now (see my post from November 12th). This type of workflow supports the easy and efficient creation of alternatives to the print book, such as the eBook. It was only a matter of time before an "eBook First" discussion started.


According to the International Digital Publishing Forum (IDPF), eBook sales have grown 300% year over year. This certainly raises the interest of any publishing CFO, who now sees the importance of electronic revenues. But it will surely also catch the attention of authors and editors, who until now have built content around the printed book rather than the eBook. The shift to creating an eBook first, taking advantage of its creative possibilities, has begun. No longer does the content have to fit within the constraints of the printed book.


Mike Shatzkin writes a very thought-provoking post on his blog titled What it will mean when the ebook comes first. He predicts a huge upheaval for editors and authors when they start thinking about eBook First.


I've recognized for years that the prevalent thinking treats the eBook as merely an electronic version of the printed book. This is an artifact of the workflows that created the eBook from the print edition. With eBook First, the content will surely be different from what is possible in print.

Tuesday, November 17, 2009

The Content Package

Publishing is as much about the content as it is about the packaging of the content. Many initiatives are underway to re-invent the package and how the user experiences the content on the web.

Google is experimenting with the user interface for news with their Fast Flip project. Users can very rapidly 'flip' through online news articles, much as they would browse a newspaper or magazine, until they find something of interest. Whether or not this new user experience becomes the standard for online news reading isn't as important as the experiment itself, as Google tries to push the envelope on the content package.


In an effort to promote innovation in the way information is accessed in the Life Sciences, Elsevier sponsored the Elsevier Grand Challenge 2009, looking for specific tools to improve the interpretation of online journals. Specific objectives for the project were to:

  • improve the process/methods/results of creating, reviewing and editing scientific content
  • interpret, visualize or connect the knowledge more effectively, and/or
  • provide tools/ideas for measuring the impact of these improvements

The winner of the challenge was a prototype tool that links the internal content of a journal article with external scientific content. A pilot of the tool can be seen in the November 12th issue of the journal Cell, published by Cell Press. In this issue, mentions of proteins, genes, and small molecules are highlighted, and links give the user pop-up windows with relevant contextual information. This is accomplished through rich semantic tagging of the content, which can be ignored in the XML for print, turned on for presentation in the online version, and turned off by online users who don't want to see the highlighting. An example of the pilot can be seen at:


Dissociation of EphB2 Signaling Pathways Mediating Progenitor Cell Proliferation and Tumor Suppression p679
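
To make the mechanics concrete, here is a rough XQuery sketch of how such switchable tagging might work. The element names and rendering approach are my own illustration, not Cell Press's actual markup or code:

    (: Render semantically tagged entities either as pop-up links (online)
       or as plain text (print), from one tagged source. Illustrative only;
       runnable in an XQuery 3.0 processor such as BaseX. :)
    declare function local:render($node as node(), $online as xs:boolean) {
      typeswitch ($node)
        case element(protein) return
          if ($online)
          then <a class="entity-popup" href="/protein/{$node/@id}">{string($node)}</a>
          else text {string($node)}  (: for print, the tagging simply disappears :)
        case element() return
          element {node-name($node)} {$node/@*, $node/node() ! local:render(., $online)}
        default return $node
    };

    local:render(
      <p>The receptor <protein id="P29323">EphB2</protein> suppresses tumor formation.</p>,
      true())

Running the same function with false() for the print pipeline yields plain text, which is exactly the turn-it-on, turn-it-off behavior described above.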


While innovation in the user interface for news and journal articles is moving ahead with promising prototypes, it seems that similar innovation in eBook interfaces has a ways to go. Most electronic readers attempt to replicate the printed book experience, with widely varying success. One experiment that is underway is the new so-called 'Vook', which combines the text of a book with video. The format shows promise, but the examples so far don't demonstrate a truly close link between the text and the video.

Thursday, November 12, 2009

XML-First, Please

The introduction of XML-First into the publishing workflow allows publishers to move from a print-centered workflow to a content-centered workflow. Quickly and efficiently transitioning the author's manuscript (usually in Word) to XML prior to composition gives the publisher the ability to publish faster and deliver content to customers in many more creative ways, including XHTML and ePub. The move to XML-First has been shown to decrease costs, increase ROI, and raise the quality of the end product. A typical workflow can be seen in a slideshow from Taylor & Francis Books, "What impact does XML-First have on your costs". Mark Majurey shows in the presentation that a 30% reduction in budgeted copyediting time, combined with the ability to outsource typesetting, results in dramatic cost savings to the publisher and ultimately a higher ROI.
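
As a toy illustration of the single-source idea (with invented element names, not the Taylor & Francis pipeline), a short XQuery can render an XML manuscript chapter as XHTML; an ePub or print rendition would start from the very same source:

    (: One XML source, many outputs: render a chapter as XHTML.
       The manuscript markup here is invented for illustration. :)
    declare variable $chapter :=
      <chapter id="ch1">
        <title>Getting Started</title>
        <para>Publishing from a single XML source lets one manuscript
        feed print, XHTML, and ePub outputs.</para>
      </chapter>;

    <html>
      <head><title>{string($chapter/title)}</title></head>
      <body>
        <h1>{string($chapter/title)}</h1>
        {for $p in $chapter/para return <p>{string($p)}</p>}
      </body>
    </html>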


The benefits of XML-First are not restricted to traditional publishing. Last month, the US Government began releasing the Federal Register in XML prior to composition of the printed register. The Federal Register publishes approximately 80,000 pages per year and is the de facto news agency of the executive branch. The Washington Post reported this development on October 5th, 2009 in the article "A More Web-Friendly Register". The raw XML data can be accessed at Data.gov, making this voluminous information accessible, customizable, and reusable in a variety of formats. This development has made greater transparency in government a reality.



Tuesday, November 10, 2009

POD is Taking Over

Traditional production methods for books are in a steady decline. According to Publishers Weekly, 275,232 new and revised titles were produced by traditional methods in 2008, a 3% decline from the previous year. In contrast, POD (print-on-demand) titles rose at an astounding rate of 132%, to 285,394 titles, in the same period. Is the traditional book printing age over?


Clearly, Hewlett-Packard has seen these numbers and is launching a web-hosted print-on-demand (POD) service named BookPrep. As reported in PCWorld's article titled HP Bets on Print-on-Demand Services, BookPrep will include, among other content, 500,000 out-of-print titles from the University of Michigan that were digitized by Google. BookPrep has partnered with Amazon to sell and distribute the books. According to Andrew Bolwell at HP, "There's a fundamental shift taking place in the publishing industry. Print-on-demand is the future."

Friday, November 6, 2009

"Intelligent" Content

For years now, publishers have created metadata, which is nothing more than "data about data": the data that describes an artifact or piece of data. Typical metadata for a journal article would be the title, author, volume, and issue.
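
A minimal sketch of such a metadata record, with invented element names rather than any particular DTD, might look like this:

    <article-meta>
      <journal-title>Journal of Examples</journal-title>
      <article-title>A Study of Sample Data</article-title>
      <author>Jane Smith</author>
      <volume>12</volume>
      <issue>3</issue>
    </article-meta>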


Taking traditional metadata one step further is Intelligent Content: the emerging practice of enriching content with information or metadata that allows it to adapt to varying users, technologies, output formats, and purposes. This adaptability is often managed automatically by the publishing systems. For example, if content is published both electronically and in print, intelligent content is able to tell the publishing systems to include video, audio, and other rich media in the online edition. But intelligent content is not just about output format. Through the use of this metadata, the content can be customized to fit the intended audience. Scott Abel in his Content Wrangler blog describes it this way:

"By adding intelligence to the content, you can have it do the formatting work for you, on-demand, only when it’s needed. That’s the smart way of providing the right content, to the right people, in the right format, at the right time, in the right language."
The possibilities are endless.

Monday, November 2, 2009

Staying ahead of Data Rot

Whenever the subject of eBooks comes up in a discussion, inevitably someone mentions their love for the printed book and the hope that it will never go away. Every day more and more books are created in eBook form, but will the digital copies of these books be around in 10 years? If history is any predictor, the answer to that question depends on how well the digital copies are maintained over time and transferred to the latest and greatest storage formats to avoid being lost for all time.


I was reminded of this "data rot" this weekend as I ran across a reel-to-reel tape of my senior recital at music school some 30 years ago. I no longer have the equipment to play it, nor is such equipment readily at hand, so I have no idea whether the recording is even salvageable at this point. But it is certain that if I don't take care of it soon, it will be lost for good. Not a great loss, mind you, but one I don't want to think about.


Storage formats come and go, and in the electronic age those formats last about 10 years, so efforts to keep eBooks around will become challenging. How is it that the books I have owned all of my life still survive? I remember a lecture from Library School that put forth the idea that the printed book has proved to be the best storage format of all time. Books can even survive fire: the edges of the pages might be charred, but the text on the whole remains. Can this be said of any digital storage today? The printed book format has endured for hundreds of years. That's one format, compared to the more than 10 eBook formats, and their accompanying eBook devices, I have owned over the years. Just as alarming, the only eBooks I still have are the Kindle copies I purchased recently; all the others are lost forever.


I have always been excited and supportive of the digital transformation of books, but I don't think the printed book lovers have anything to worry about for awhile.

Wednesday, October 28, 2009

Getting into the (xml)Flow of Things

By now there is widespread acceptance that XML's tagging and indexing capability is a powerful tool for leveraging a publisher's valuable content assets. Just as important is implementing a publishing workflow built on a content management system that stores documents in native XML format. The goal is a workflow where data is not only created and tagged in XML, but also stored as native XML, making it possible to repurpose the data as needed without data transforms in and out of the CMS.


Consider the challenges presented in the following typical workflow. Even when content is tagged in a rich XML schema, if it is stored in a relational database the first step we face is transforming the data out of XML so that it can be stored in relational tables. Once it is stored, if we want to repurpose this data for publication, say on the web, another conversion must take place to recreate the XML. This laborious back-and-forth of transforms never results in a timely or high-quality production process.
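
By contrast, when the content lives natively in an XML repository, repurposing is a single query, with no shredding into tables and no reassembly step. A rough sketch (the collection name and markup are invented for illustration):

    (: Build a topic-based reading list for the web straight from the
       stored XML; no relational round trip is required. :)
    for $article in collection("journals")//article
    where $article/meta/subject = "oncology"
    order by $article/meta/pubdate descending
    return
      <item>
        <title>{string($article/meta/title)}</title>
        <doi>{string($article/meta/doi)}</doi>
      </item>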


Certainly, just getting the data into the relational database can be a long process to begin with. But consider the challenge of receiving XML data from multiple, even hundreds of, sources on a daily basis. The process then involves standardizing the data, which is a huge undertaking. In his post The First Step's a Doozy, Dave Kellogg (CEO of MarkLogic) considers step one, loading content into the relational database system, to be a daunting challenge.


To realize the full potential of an end-to-end publishing workflow, it must be built around content management that doesn't merely "handle" XML as another data type but employs a central native XML repository. Once the XML flows in this manner, content that was once cumbersome to repurpose becomes an asset that is easy to assemble in any form desired.

Monday, October 26, 2009

Practical Application of XQuery

End-to-end XML-based publishing workflows, teamed with XML content management systems, have made it possible for publishers to distribute custom-published college course materials to university students in a variety of formats. Applications utilizing XQuery, a programming language designed to query repositories of XML data, allow college professors to search, manipulate, and assemble content into custom-published course materials for distribution to their students. Custom coursepack printing, previously a cumbersome process, is now straightforward, with course material available in eBook or print formats and the ability to include local content (PDF, Word, etc.) for a truly customized package.


At a recent XML-in-Practice conference, a joint presentation by representatives from John Wiley & Sons and McGraw-Hill demonstrated their implementations of web-based custom publishing solutions utilizing XQuery on MarkLogic XML Server systems. Wiley's product, Custom Select, allows the user to search and select Wiley content at a section or chapter level and then customize the output with a cover, arrange the order of the content, and upload local content. The resulting custom course material can then be previewed and submitted for printing or for the creation of eBooks. McGraw-Hill's implementation will provide the same level of functionality.
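
In the spirit of such a system (though not Wiley's actual code), a coursepack assembly query can be surprisingly small. The collection name, markup, and chapter IDs below are invented:

    (: Assemble selected chapters, in the professor's chosen order,
       into a single coursepack document ready for eBook or print
       composition. :)
    declare variable $picks := ("ch04", "ch01", "ch09");

    <coursepack title="ECON 101 Readings">
    {
      for $id in $picks
      return collection("textbooks")//chapter[@id = $id]
    }
    </coursepack>

Because the for clause iterates over $picks in sequence, the chapters come out in the professor's chosen order rather than the repository's.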


XQuery is particularly well suited to this application as it provides the capability to search, extract, and manipulate XML data from documents across many types of data sources. For more information on XQuery see XQuery 1.0: An XML Query Language by the World Wide Web Consortium (W3C) or the XQuery Wikipedia entry.

Friday, October 23, 2009

Innovative OCR Correction

The National Library of Australia has implemented an innovative approach that balances the cost of OCR correction against users' need to search the full text of historical newspapers, aided by the efforts of the users themselves. When undertaking a large historical digitization project, publishers are often faced with decisions about how much full-text OCR correction to undertake. With projects such as historical newspaper collections, it is highly desirable for the user to be able to search the full text of the archive for people, places, or other factual information. The user's success is largely influenced by the accuracy of the underlying text extracted by the OCR engine, and the success of that extraction is ultimately dependent on the quality of the original source, which varies widely across the centuries.


The National Library of Australia, along with the Australian state and territory libraries, has created the Australian Newspapers project. Over 4 million newspaper articles are currently available in the archive and are full-text searchable. To overcome the high cost of OCR correction, the project allows users to correct the underlying text themselves. This approach has already resulted in an impressive 3.4 million lines of electronic text corrected across over 150,000 articles. This community effort will surely benefit searchers for ages to come.

Wednesday, October 21, 2009

ePub Supported eReader Introduced This Week

The ePub digital book standard gets a big boost this week with the introduction of Barnes & Noble's new eReader. The BN Nook supports the ePub standard, which instantly makes available over 500,000 free books from Google Books; these are already showing up on BN.com shelves. The Google books are not available on the rival Amazon Kindle, which utilizes a proprietary format. In addition to the free books, BN has over 500,000 more books available for their new reader.

ePub is an XML-based format composed of open standards from the IDPF (International Digital Publishing Forum), the trade and standards association for the digital publishing industry. It allows publishers to produce and distribute their content in a single format and provides consumers with interoperability across a number of devices, including the new Nook and the Sony Portable Readers.
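
Under the hood, an ePub file is simply a zip archive of XML and XHTML. A trimmed sketch of the two control files that hold it together (content abbreviated, identifiers invented):

    META-INF/container.xml -- points the reader at the package file:

    <?xml version="1.0"?>
    <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
      <rootfiles>
        <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
      </rootfiles>
    </container>

    OEBPS/content.opf -- lists the book's metadata, files, and reading order:

    <package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
      <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An Example Book</dc:title>
        <dc:language>en</dc:language>
        <dc:identifier id="bookid">urn:isbn:0000000000000</dc:identifier>
      </metadata>
      <manifest>
        <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
        <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
      </manifest>
      <spine toc="ncx">
        <itemref idref="ch1"/>
      </spine>
    </package>

Because every conforming reader understands these same files, a publisher can produce one ePub and have it work on the Nook, the Sony Readers, and desktop software alike.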