“Hey, Herb. I’m sending you a copy of my speech for the website. Should I send you the Word file, or should I save it in .HTML format?”
Anyone who’s ever tried to go from a Word file to an acceptable HTML file will attest to what a frustrating exercise it is, especially if you want to produce clean, junk-free HTML that loads quickly and predictably. The basic problem is that Word is to clean HTML what Exxon is to clean arctic seashores.
Consider one of the simplest of files:
Hello world
For at least 30 years, writing a computer program to spit out that phrase has been an exercise that gets the novice over their very first hurdle. Alas, it’s a hurdle that Word seems not able to clear easily. Not because it can’t do it. Rather, it’s because Word adds so much baggage that the horse collapses under the sheer weight before it even makes it out of the gate.
Feed that simple phrase to FrontPage, and the HTML file it produces is:
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/HTML; charset=windows-1252">
Mind you, these 224 characters are not the minimum it could be. But keep that number in mind when you see what Word does to the same phrase. You won’t complain about FrontPage’s file size in the least.
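To get a rough feel for how far below 224 characters a page could go, here is a small Python sketch that counts the characters in a hand-written page. The markup in `MINIMAL_PAGE` is my own assumption of a bare-bones page carrying a bold Hello and an italic world. It is not what FrontPage actually emits, and the name `char_count` is purely illustrative.

```python
# Hypothetical hand-written page: bold "Hello", italic "world".
# This is an assumed minimal markup, NOT FrontPage's actual output.
MINIMAL_PAGE = (
    "<html><head><title>Hello</title></head>"
    "<body><p><b>Hello</b> <i>world</i></p></body></html>"
)

# Character count the article reports for FrontPage's version.
FRONTPAGE_CHARS = 224


def char_count(markup: str) -> int:
    """Return the number of characters in a markup string."""
    return len(markup)


if __name__ == "__main__":
    size = char_count(MINIMAL_PAGE)
    print(f"hand-written page: {size} characters")
    print(f"FrontPage version: {FRONTPAGE_CHARS} characters")
    print(f"difference:        {FRONTPAGE_CHARS - size} characters")
```

The same `char_count` call works on the text of any saved HTML file, which is all the measuring in this article amounts to.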
Word offers two options: filtered and regular. So, I create a basic Word document, a bold Hello followed by an italicized world, and choose File – Save As – Web Page. Then, I close the file and reopen it, telling Word to treat it as plain text so I can see it in all its glory. I’m not going to post the gory results; TechTrax isn’t big enough. The grand total: 9,538 characters. That’s right: more than 42 times larger than FrontPage’s offering!
Now, let’s try filtered. I type the phrase into a new blank document and choose File – Save As – Web Page, Filtered. Then I close, reopen as plain text, and tada! Now, it’s a mere 2,722 characters, only 12 times larger than the FrontPage version.
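If you want to check the multiples yourself, the arithmetic can be sketched in a few lines of Python. The character counts are the ones reported in this article. The function name `size_ratio` is just an illustrative choice.

```python
# Character counts reported in the article.
FRONTPAGE_CHARS = 224
WORD_REGULAR_CHARS = 9538    # File - Save As - Web Page
WORD_FILTERED_CHARS = 2722   # File - Save As - Web Page, Filtered


def size_ratio(size: int, baseline: int) -> float:
    """Return how many times larger `size` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive character count")
    return size / baseline


if __name__ == "__main__":
    regular = size_ratio(WORD_REGULAR_CHARS, FRONTPAGE_CHARS)
    filtered = size_ratio(WORD_FILTERED_CHARS, FRONTPAGE_CHARS)
    print(f"regular Word HTML:  {regular:.1f}x the FrontPage version")
    print(f"filtered Word HTML: {filtered:.1f}x the FrontPage version")
```

Both results land just above the rounded figures quoted here, so "more than 42 times" and "12 times" hold up.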
What’s wrong with this picture? Imagine a world filled with Word-produced HTML. Imagine the extra bytes, clogging the arteries of the information superhighway. Alas, there’s apparently no drug you can apply to rid Word HTML files of this bandwidth-choking glut of unneeded bytes.
Other Laundering Techniques
What about copy & paste, you ask? Okay. As it turns out, this appears to be the best option available. I do my little Hello world thing in Word, copy it to the clipboard, then paste it into FrontPage. Ah. Now, we’re down to a mere 491 characters. Still, it’s over twice as large as FrontPage’s unencumbered version. Actually, however, since the new page FrontPage creates before the paste already carries 138 characters of overhead, we can only charge Word for 353 characters.
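The bookkeeping above is easy to sketch in Python. The three character counts come from this article. The last comparison rests on an assumption of mine: that FrontPage's own 224-character version was built on the same 138-character empty page, which the measurements here do not confirm. The function name `content_chars` is illustrative.

```python
# Character counts reported in the article.
FRONTPAGE_CHARS = 224     # Hello world typed directly into FrontPage
PASTED_PAGE_CHARS = 491   # Hello world pasted into FrontPage from Word
EMPTY_PAGE_CHARS = 138    # new, empty FrontPage page before pasting


def content_chars(page_chars: int, overhead_chars: int) -> int:
    """Return the characters attributable to content, excluding page overhead."""
    return page_chars - overhead_chars


if __name__ == "__main__":
    pasted = content_chars(PASTED_PAGE_CHARS, EMPTY_PAGE_CHARS)
    print(f"charged to Word's paste: {pasted} characters")

    # ASSUMPTION: the 224-character page carries the same 138-character
    # overhead. If so, this is what typing the phrase directly costs.
    typed = content_chars(FRONTPAGE_CHARS, EMPTY_PAGE_CHARS)
    print(f"typed directly (assumed same overhead): {typed} characters")
    print(f"whole-page ratio: {PASTED_PAGE_CHARS / FRONTPAGE_CHARS:.2f}x")
```

If that assumption holds, the pasted content is several times the size of the typed content, even though the whole pages differ by only a little over a factor of two.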
What do you get for the extra characters? Nothing most of us really need. All the formatting I really wanted was the bold and italics. But, when I paste in from Word, I also get a style definition that carries the underlying font and point size, neither of which I really wanted. Nonetheless, it’s still much better than the 8000+ character battery of cascading style sheet information I get when I go with raw, undiluted Word HTML.
Okay. One more laundering experiment, just to prove that when you’re looking for something, it’s not always the last place you look. What happens if I first paste my formatted Hello world into WordPad (which comes with Windows, by the way), then recopy it to the clipboard there, and finally paste it into FrontPage?
Alas… all formatting is lost. WordPad can display and retain formatting, even formatting pasted in from the clipboard, but when you copy from WordPad back to the clipboard, the formatting isn’t retained. This makes WordPad a useless intermediary if you want to keep basic formatting but lose the unnecessary baggage. With WordPad, all of the baggage is lost, not just the unnecessary bits.
The bottom line is that if you want clean, quick-loading HTML, use FrontPage, or something designed for HTML. If someone sends you something in Word format, however, your best bet often is going to be to open it in Word, copy it to the clipboard, and then paste into FrontPage.