a jaundiced eye: e. t.
for sunday, september 28, 1997.

XML: SGMLers' Revenge

When I first got into this business, I didn't even know about the World Wide Web. We were taught SGML, the Standard Generalized Markup Language, in order to perform document conversion on millions of pages of defense subcontractors' technical manuals. The tagsets were large and unwieldy, and it took us weeks just to convert a single paper manual to the required format. We used big software packages like SoftQuad's Author/Editor, where I learned the joys of regular expressions. We used Perl on the backend, to take proprietary conversion formats and turn them into the ASCII text markup files we then finished tagging manually. It wasn't until months later that we got to see the fruits of our labor, as we were but one step in a long chain of sub-sub-sub-contractors. I remember when we first got a chance to see an IETM (Interactive Electronic Technical Manual) for which we had done some conversion work. It was miraculous. Somehow, some very clever folks had managed to finish up the work we had started months earlier, and put it all on CD-ROM, with scalable graphics and a nifty Windows interface. We had never seen our documents in their final intended output format before.

When the Web hit, and a few of us took the time to learn the tiny markup language it used, we were amazed. Not at the usefulness of HTML, for to us it seemed pretty paltry, a little DTD for doing scientific reports. It mixed formatting tags with logical markup, and seemed to have trouble making up its mind as to whether it wanted to be generic or specific. Of course, we had no idea how big it was to become. Few of us suspected the tremendous change it would bring to our favorite haunt - the Internet - with its arcane and bizarre elitist culture. But some of us learned HTML, and started using it to present a little bit of ourselves to the company (we didn't have an external Web server, and wouldn't for another year).

As our lives changed radically (due to layoffs, corporate buyouts, and other annoyances) we got caught up in the rocket ride that was the Netscape phenomenon. We struggled with frames and CGI and new tags that seemed to come out of nowhere, with no documentation. The best bet was to surf and see how HotWired was doing their thing, and imitate for dear life. The glacial pace of the old days of SGML, with its relatively stable DTDs and year-long turnaround on conversion projects, gave way to hourly updates to our intranet, and a puzzled sense that there wasn't really a single DTD for HTML, that the whole thing was in the hands of browser manufacturers like Netscape. The anger of the old hands on the www-html mailing list, and the frustration of those around us who realized that cross-browser compatibility was a myth, were palpable. And rightly so.

Fortunately, the tag-a-minute trend is slowly being replaced by a resurgence of the old SGML mindset: keep structure and logical markup separate from output format, styles, and display issues. First it was Cascading Style Sheets, an elegant solution to the problems of embedded font tags. Now, looming on the horizon, we have XML. And things are going to get a lot better when it gets here.

Oddly enough, things will get better by moving backwards a bit. XML will be fully compatible with HTML 4.0, and presumably any later versions, but will finally allow for the creation of our own tagsets and will work together with CSS to allow the separation of style and structure. It will be possible to provide your information in one format, and have multiple output paths without having to go back to some proprietary format like MS Word, RTF, or PostScript. Best of all, we won't be bound to what the browser vendors do and don't support with regard to extensions to HTML.
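
To make the "one format, multiple output paths" idea concrete, here is a minimal sketch in Python, using only its standard library. Everything in it is an assumption for illustration: the little manual tagset (manual, title, step, warning) is invented, not taken from any real DTD, and the two renderers are just one way the split between structure and presentation might look.

```python
import xml.etree.ElementTree as ET
from html import escape

# A made-up tagset for a technical manual. These element names are
# hypothetical and do not come from any real DTD.
MANUAL = """\
<manual>
  <title>Pump Maintenance</title>
  <step>Close the intake valve.</step>
  <warning>Relieve pressure before opening the housing.</warning>
  <step>Remove the housing bolts.</step>
</manual>
"""


def to_text(root):
    """Render the manual as plain text, numbering the steps."""
    lines = []
    step_number = 0
    for child in root:
        text = (child.text or "").strip()
        if child.tag == "title":
            lines.append(text.upper())
        elif child.tag == "step":
            step_number += 1
            lines.append(f"{step_number}. {text}")
        elif child.tag == "warning":
            lines.append(f"WARNING: {text}")
    return "\n".join(lines)


def to_html(root):
    """Render the same manual as HTML; appearance is left to CSS classes."""
    parts = []
    for child in root:
        text = escape((child.text or "").strip())
        if child.tag == "title":
            parts.append(f"<h1>{text}</h1>")
        elif child.tag == "step":
            parts.append(f'<p class="step">{text}</p>')
        elif child.tag == "warning":
            parts.append(f'<p class="warning">{text}</p>')
    return "\n".join(parts)


root = ET.fromstring(MANUAL)
print(to_text(root))
print(to_html(root))
```

The source document says only what each piece of content is. How a warning looks on screen is decided later, by whichever renderer (or style sheet) handles it, so adding a third output path would not require touching the document itself.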

The browser wars aren't over, but the end is coming. Keep your head down.

Steven Champeon

Permanently archived at: http://www.jaundicedeye.com/browse/et/092897/

© 1997-2001 Steven Champeon. All rights reserved.
All slights reversed.