Usability Notes - by Chris Baker

Notes on usability and related things by a project manager who manages electronic publishing projects.

    The NLM DTD

    Yesterday I was at the ALPSP Technology Update meeting "A Standard XML Document Format: the case for the adoption of NLM DTD". I gave a talk entitled "NLM DTD in Archiving - a case study". It is the story of how we used the NLM DTD to produce the IMechE Proceedings Archive. You can download my PowerPoint presentation by clicking this link:
    Download NLM_DTD_archiving_IMechEProceedingsArchive2007.ppt


    All the speaker presentations are also available for download from the ALPSP website.
    It was an interesting meeting with a lively discussion, so I thought I would write up some notes.

    By the way, I shall assume that readers have a reasonable idea of what a DTD is, and only give a brief definition here (from the W3C tutorial on DTDs):

    The purpose of a DTD (Document Type Definition) is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes.

    Meanwhile, if you are a bit hazy on what XML is or how it differs from HTML or SGML, you are welcome to try my short paper, intended for publishers, called "What the Hell is XML?"
    Download what_the_hell_is_xml.pdf

    So now back to yesterday. I will try to pick out the themes that I found interesting. That will mean dodging about chronologically between what the speakers said, the panel discussion and discussions over lunch. Impediments such as lunch plates and my own pre- and post-speech hormone levels meant that my notes were not up to much - I welcome any corrections and additions!

    Theme 1 - one reason why the NLM DTD is good

    Bruce Rosenblum (Inera, and one of the authors of the DTD) kicked off with a description of how it came to be. The first of my themes soon emerged: one of the reasons the DTD is such a success is that it was designed by reviewing the publishers' DTDs that were available at the time. It therefore incorporated and built on what publishers actually were doing and wanted to do, rather than some theoretical model of what they should do. The authors examined a wide range of journals in many subjects and so were able to allow for many of the practical issues that journal publishers need to deal with. Also, although it is "the NLM DTD", it is suitable for subject matter very different from the National Library of Medicine's interests. As Geoff Bilder (CrossRef) said in his Chair's remarks, there is a lot of publishing wisdom in the DTD. When Eamonn Neylon (British Standards Institution) came to speak, he commented that the NLM DTD had followed the ideal route to being a standard: it had built on best practice, then been adopted widely by its community, before going down the route to having a NISO number.

    Theme 2 - whether and when to adapt it

    Given the "publishing wisdom" already included, the second theme to emerge was whether and how publishers should modify the DTD. Geoff Bilder went so far as to say that if you needed to modify the DTD, you should look at your methods (as you might well be doing something wrong). Bruce partially disagreed: his talk included material on how the DTD had been made easy to modify or extend if necessary, and some examples of people producing extensions for it. My own contribution was that I've found it important to avoid uncontrolled tinkering with DTDs - sometimes people are very tempted just to "fix" the DTD as a workaround for whatever problem they have on their desk today. During the IMechE Archive project, we did sometimes encounter unexpected oddities in the content we were processing. Each time I found that I could go back to the NLM DTD and its excellent documentation and find a way (or several possible ways) in which the problem could be accommodated WITHOUT bending the DTD out of shape. The DTD is beautifully built to anticipate these kinds of problems (in my experience). One example is the <custom-meta> element, which enables you to include your own, self-defined metadata in your documents without changing the DTD. Obviously you then need to document what you have done, or take other steps to ensure that you adopt the same approach the next time the problem is encountered. But look to see what the DTD can already do before wading in and editing it.
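
    As a rough illustration of the <custom-meta> approach, the sketch below uses Python's standard library to attach name/value pairs to an <article-meta> element. The metadata names and values ("scan-quality", "source-format") are made up for the example, and the wrapper element name is an assumption: it has varied between versions of the DTD, so check the documentation for the version you use.

```python
import xml.etree.ElementTree as ET


def add_custom_meta(article_meta, name, value, wrapper_tag="custom-meta-wrap"):
    """Append a <custom-meta> name/value pair, creating the wrapper if needed.

    wrapper_tag is an assumption: the wrapper's name differs between
    versions of the DTD, so confirm it against the version you use.
    """
    wrapper = article_meta.find(wrapper_tag)
    if wrapper is None:
        wrapper = ET.SubElement(article_meta, wrapper_tag)
    item = ET.SubElement(wrapper, "custom-meta")
    ET.SubElement(item, "meta-name").text = name
    ET.SubElement(item, "meta-value").text = value
    return item


# Hypothetical, self-defined metadata for an oddity found in scanned content.
article_meta = ET.Element("article-meta")
add_custom_meta(article_meta, "scan-quality", "poor")
add_custom_meta(article_meta, "source-format", "paper")

print(ET.tostring(article_meta, encoding="unicode"))
```

    Routing every such case through one small helper is also a cheap way of making sure the same approach is used the next time the problem turns up.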

    Theme 3 - QA

    A third theme that emerged was that of QA. You should certainly check that your content will "parse" (that is, check that it obeys all the rules in the DTD), but it would be a misconception to believe that this in itself means that all is well. This is because parsing makes a very specific set of checks - it is possible to write a document that parses perfectly, but is full of errors that the DTD cannot catch. Geoff Bilder had a (rather shocking) anecdote about how some people behaved as if parsing the data was a game - he worked with a supplier whose data didn't parse. When one day it did, he was suspicious and inspected the XML. This revealed the reason - the supplier had faked things by "commenting out" the bits of the document that wouldn't parse.
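
    To make the "commenting out" trick concrete, here is a minimal sketch of a check that flags XML comments which look like hidden markup. It uses only Python's standard library, which tests well-formedness rather than validity against a DTD. The sample document and the rule itself (a comment containing "<" is suspicious) are assumptions made for illustration, not a description of what Geoff actually did.

```python
from xml.dom import minidom, Node


def iter_nodes(node):
    """Yield a node and all of its descendants in document order."""
    yield node
    for child in node.childNodes:
        yield from iter_nodes(child)


def commented_out_markup(xml_text):
    """Return the text of every comment that appears to contain markup."""
    doc = minidom.parseString(xml_text)
    return [
        node.data.strip()
        for node in iter_nodes(doc)
        if node.nodeType == Node.COMMENT_NODE and "<" in node.data
    ]


# Hypothetical supplier file: it parses, but a figure has been hidden.
SAMPLE = """<article>
  <front><article-title>Gear wear</article-title></front>
  <!-- <fig id="f1"><graphic href="f1.tif"/></fig> -->
  <body><p>Text.</p></body>
</article>"""

hidden = commented_out_markup(SAMPLE)
for text in hidden:
    print("Suspicious comment:", text)
```

    A document can pass the parser and still fail a check like this, which is exactly the gap the anecdote illustrates.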

    The idea that the DTD is not a complete validation tool is somewhat counter-intuitive. During the panel discussion I was looking for an analogy that might be helpful, but it was not until later that I thought of one. Here it is. When considering contracting a company to do important work, you might undertake some due diligence checks. For example, you might find out whether the company has any legal cases pending, and you might do a Dun and Bradstreet credit check. These checks are of course useful - you'd think carefully before getting involved with a company that had big money or legal problems - but they are not likely to be all that you would want to know. For example, you'd surely want to know about other things, including the competence and capacity of the company to do the work.

    Eamonn Neylon's talk covered some of the automated tools available to supplement a check for parsing - he suggested that publishers should use rule-based QA where possible, and employ tools such as these to do it. Of course some manual inspection would always have its place too (automated tools are good at fixing predictable errors - people are good at finding unexpected errors).
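
    As a sketch of what rule-based QA can look like, the example below runs a few house rules over a document that a parser would accept. The three rules and the sample record are hypothetical, chosen only to show the idea; they are not the tools or rules that Eamonn described.

```python
import re
import xml.etree.ElementTree as ET


def check_article(root):
    """Run a few hypothetical house rules and return a list of problems."""
    problems = []

    # Rule 1: the article must have a non-empty title.
    title = root.findtext(".//article-title")
    if not (title or "").strip():
        problems.append("article-title is missing or empty")

    # Rule 2: when both pages are numeric, the last page must not precede the first.
    fpage, lpage = root.findtext(".//fpage"), root.findtext(".//lpage")
    if fpage and lpage and fpage.isdigit() and lpage.isdigit():
        if int(lpage) < int(fpage):
            problems.append(f"lpage {lpage} is before fpage {fpage}")

    # Rule 3: years must look like plausible publication years.
    for year in root.iter("year"):
        if not re.fullmatch(r"(18|19|20)\d\d", (year.text or "").strip()):
            problems.append(f"implausible year: {year.text!r}")

    return problems


# Well-formed XML that still breaks all three rules.
SAMPLE = """<article><front><article-meta>
  <title-group><article-title> </article-title></title-group>
  <pub-date><year>2970</year></pub-date>
  <fpage>215</fpage><lpage>201</lpage>
</article-meta></front></article>"""

problems = check_article(ET.fromstring(SAMPLE))
for problem in problems:
    print(problem)
```

    Each rule encodes something a person would otherwise have to spot by eye, which leaves manual inspection free to concentrate on the unexpected.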

    Theme 4 - how should publishers adopt it?

    A fourth theme, emerging in the panel discussion, was how publishers should adopt the NLM DTD (and possibly XML too, if they have not done so already). Adopting the NLM DTD, rather than making your own, was of course already a big time and money saver. Both Bruce Rosenblum and Bill Kasdorf (Apex CoVantage) had found that trying to get authors to provide tagged content (e.g. by giving them MS Word templates) was an uncertain venture - success in getting the authors to comply was by no means guaranteed, even in the case of a very prestigious journal or when working with very technophilic authors.

    The panel agreed that talking to your suppliers - especially the typesetters - was a good first step. Typesetters might be able to offer XML at little or no additional page cost and were likely to be willing to spend time with a publisher to discuss and explain (in the interests of winning business, of course). Both Geoff Bilder and Eamonn Neylon sounded a note of caution though - a lot of XML work these days is with databases, and therefore many an XML expert knows about databases and may be unfamiliar with the different issues of working with text content. So check the expert's expertise.

    What kind of workers did a publisher need? We felt that a publisher didn't necessarily need a mega-technical person. It could be very helpful to have an enthusiastic and interested project manager or project leader to make sure things happen [indeed, this is the role I took in the IMechE Archive project, while it was clear that Professional Engineering Publishing staff should deal with ongoing XML issues, so as not to be permanently reliant on me for expertise]. Publishers obviously needed to make sure they understood the new part of their business, however. Mick Spencer (Professional Engineering Publishing) commented that he had been on exactly this journey - learning about XML largely by talking and working with suppliers - and had found it possible to learn perfectly adequately in this way.

    Nick Evans (ALPSP) asked what publishers should do with their legacy PDFs. The panel discussion drifted away from part of this - later I spoke to Nick and suggested that you could certainly use the NLM DTD to capture XML metadata for these older papers - you don't necessarily have to convert the full text into XML. In a way this was a bit like the position in my project with the IMechE Proceedings Archive - except that we started with paper, not even with PDF!
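
    A metadata-only record of this kind might be built along the following lines. This is a minimal sketch using Python's standard library: the title, volume, pages and year are invented for the example, and a real record would need the other elements that the DTD requires (journal-level metadata, for instance), so treat the structure as an assumption to check against the DTD documentation.

```python
import xml.etree.ElementTree as ET


def metadata_only_article(title, volume, fpage, lpage, year):
    """Build an <article> that carries header metadata and no <body>."""
    article = ET.Element("article")
    meta = ET.SubElement(ET.SubElement(article, "front"), "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    ET.SubElement(ET.SubElement(meta, "pub-date"), "year").text = str(year)
    ET.SubElement(meta, "volume").text = str(volume)
    ET.SubElement(meta, "fpage").text = str(fpage)
    ET.SubElement(meta, "lpage").text = str(lpage)
    return article


# Invented values standing in for a legacy paper that exists only as a PDF.
record = metadata_only_article(
    title="An example legacy paper",
    volume=12,
    fpage=101,
    lpage=118,
    year=1962,
)

print(ET.tostring(record, encoding="unicode"))
```

    The full text stays in the PDF; only the searchable, linkable header information is captured as XML.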

    Tunneling under Disney

    In an interesting set of closing remarks, Geoff Bilder quoted Yuri Rubinsky (a well-known promoter of SGML, the predecessor to XML) as likening publishing to the "tunnels under Disneyland" (i.e. the support infrastructure, invisible to the user, which makes the whole enterprise work). I believe Geoff was quoting this passage:

    "I saw a revealing photograph of Disneyland in a United Airlines magazine, a shot of Mickey Mouse -- who is enormous in real life -- talking to a street cleaning person in a very tall, very wide tunnel underneath Disney World. A complex network of tunnels is what lets the Peaceful Kingdom function as well as it does and why you never see Mickey or Minnie or Goofy or Donald ducking into a washroom or eating lunch. The analogy [with the systems needed by publishers and libraries] is pretty rich. The architecture of the tunnels is the same no matter what public facility they support. The services they provide are constant, and silent. They keep complications -- like transport vehicles and emergency personnel -- out of the visitors' way, while providing an underpinning to the whole operation.

    On one level, publishing is like those tunnels, making available the attractions above ground with subterranean structures. But for me the most interesting aspects of the Disneyland tunnels are their dimensions and their materials and their layout. Why? Because they are completely consistent wherever they go. They're the same beneath a pirate ship and beneath a hotdog stand, providing the consistent system services below which support and enable the mad variety of extravaganza above."

    Yuri Rubinsky, Electronic Texts The Day After Tomorrow

    Geoff's point (I think) is that publishers (as opposed to XML fans) need to dive into the tunnels of XML DTDs, Near & Far diagrams, Schemas etc. etc. only so far as is needed to keep their Magic Kingdoms running well.

    Er Geoff - does that make me Mickey Mouse :-) ?

    [Note added 12 December 07: Geoff is planning another foray into the tunnels under Disneyland in the ALPSP update series - the details that the ALPSP have circulated so far are:

    ALPSP Half Day Technology Update: All Your Documents Belong to Us.  The Next Wave in Publishing Infrastructure

    Date: Early July

    Venue: To be confirmed

    Chair: Geoff Bilder, CrossRef

    In this Technology Update you will hear how as publishers consider revamping their online publishing infrastructures, they are increasingly looking to new technologies like XML databases, RDF triple stores and XML-aware full text indexing engines.

    ALPSP will put more details on their website in 2008 - meanwhile they suggest that anyone interested either contacts them (info@alpsp.org) or monitors the ALPSP events guide for updates.]

    December 04, 2007 in Tools, XML
