DOS ain't dead

PDF DOC DOCX ODT RTFM - document formats please help to comp (Users)

posted by ron(R), Australia, 15.03.2011, 21:05

> > And what a truly bloated thing docx is!
>
> Hah! Interesting observation. I'm curious whether you ever compared
> DOCX to the OASIS OpenDocument Format (ODF) text document (.ODT
> extension) that OpenOffice.org Writer uses.

I have one .odt file to look at, and it is superficially similar to .docx.
I don't have OpenOffice Writer, and I don't think there is a DOS port of it anyway.

> Just for larfs, using Writer, I saved both a blank page and a
> "Hello World!" page using defaults, then unzipped them and did a
> DIR:
>
> Total for BLANK.ODT: 26,291 bytes in 9 files and 38 dirs
> Total for HELLO.ODT: 26,641 bytes in 9 files and 38 dirs

Yeah, that would be right! Even a .docx file is divided up into many parts that refer to each other, spread across several directories.
But what struck me is how many tags there are in a single .xml component, compared to the actual payload of text.
The more I learn about this stuff, the more I wonder how this format became a "standard".
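To put a rough number on the tag-to-text ratio, here is a minimal Python sketch (not part of docx2htm; the function name `markup_overhead` is made up for illustration). It lists the parts stored inside a zipped document and counts XML elements against characters of actual text in one part. It assumes the usual part names: `word/document.xml` for .docx and `content.xml` for .odt. The text count is only an approximation, because it skips whitespace-only text nodes.

```python
import zipfile
import xml.etree.ElementTree as ET


def markup_overhead(path, member="word/document.xml"):
    """Summarise one XML part of a zipped document (.docx, .odt, ...).

    Hypothetical helper. Returns (part_names, element_count, text_chars, xml_bytes).
    """
    with zipfile.ZipFile(path) as zf:
        part_names = zf.namelist()   # every file stored in the package
        raw = zf.read(member)        # raw bytes of the XML part to measure
    root = ET.fromstring(raw)
    # Each element counts once, the root included.
    element_count = sum(1 for _ in root.iter())
    # Approximate the text payload: skip whitespace-only nodes, which are
    # usually just indentation between tags.
    text_chars = sum(len(t) for t in root.itertext() if t.strip())
    return part_names, element_count, text_chars, len(raw)
```

For an .odt, pass the other part name, e.g. `markup_overhead("HELLO.ODT", "content.xml")`, and compare `text_chars` with `xml_bytes` to see how much of the part is markup.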

Much of the effort I am putting into docx2htm goes into removing tags that are irrelevant to HTML, or into replacing groups of tags with single HTML tags.
At the present state of play, the main output file is half the size of the main source file, and I still have a way to go.
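As an illustration of that tag-collapsing idea, here is a Python sketch. It is not how docx2htm itself works (its code isn't shown here), and the name `docx_to_html` is hypothetical. It reads `word/document.xml`, turns each `<w:p>` paragraph into `<p>`, keeps only the bold and italic run properties, and merges neighbouring `<w:r>` runs with identical formatting, so a string of runs collapses into one HTML span. Tabs, line breaks, tables, styles and text boxes are all ignored.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _flag(rpr, name):
    """True if a toggle such as <w:b/> is present and not switched off."""
    if rpr is None:
        return False
    el = rpr.find(W + name)
    if el is None:
        return False
    # <w:b/> alone means on; w:val="0"/"false"/"off" means off.
    return el.get(W + "val", "true").lower() not in ("0", "false", "off")


def docx_to_html(docx):
    """Hypothetical sketch: very reduced .docx body -> HTML paragraphs."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    out = []
    for para in root.iter(W + "p"):
        runs = []  # list of [(bold, italic), text]
        for run in para.iter(W + "r"):
            rpr = run.find(W + "rPr")
            fmt = (_flag(rpr, "b"), _flag(rpr, "i"))
            text = "".join(t.text or "" for t in run.iter(W + "t"))
            if not text:
                continue
            if runs and runs[-1][0] == fmt:
                runs[-1][1] += text  # same formatting: merge into previous run
            else:
                runs.append([fmt, text])
        parts = []
        for (bold, italic), text in runs:
            piece = html.escape(text)
            if italic:
                piece = "<i>" + piece + "</i>"
            if bold:
                piece = "<b>" + piece + "</b>"
            parts.append(piece)
        out.append("<p>" + "".join(parts) + "</p>")
    return "\n".join(out)
```

Merging runs before emitting tags is what keeps the output small: Word often splits one word or phrase across several runs carrying the same properties, and each of those would otherwise become its own `<b>...</b>` pair.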

---
AUSREG Consultancy http://www.ausreg.com
Tadpole Tunes http://www.tadpoletunes.com
Sna Keo Il http://www.tadpoletunes.com/sna_keo_il/
