DOS ain't dead

ecm


Düsseldorf, Germany,
21.12.2024, 13:49
 

lDebug release 9 (Announce)

(Copied from freedos-user mailing list announcement.)

Hello,

Today I'm pushing out release 9 of lDebug, the 86-DOS debugger with the small L [1]! Just in time for the next FreeDOS interim release. It has collected a few changes [2] since March's release 8, including several bugfixes and a number of other improvements, such as:

Assembling PUSH or POP with memory operands will now default to a size if none is given, to make the debugger more compatible with MS-DOS Debug. When a 16-bit unsigned number at or above FF80h is given to the assembler, it can in some cases encode it as a sign-extended 8-bit immediate. Further, the assembler accepts JMP FAR with a single number as implying a destination code segment equal to the current assembly segment, a feature copied from MS-DOS Debug. Assembling a MOVZX instruction that writes to a 32-bit register from a memory operand lacking a size is rejected. (If the destination is a 16-bit register instead, a byte size is assumed.)

The new MODRM keyword allows selecting or depicting encodings that the assembler wouldn't choose by default. The xchg operand order in both the assembler and disassembler now matches NASM's and NDISASM's. With a new DAO (Debugger Assembler Options) flag, the disassembler can emit the NASM form of LOOP with an ECX or CX operand when an ASIZE prefix is found, rather than suffixing a D or W.

Three patch areas have been added, currently used only by the TSC Extension for lDebug (ELD), to run small code snippets close to the points where the debugger is exited or re-entered while running a debuggee.

The instsect.com application now ships with the improved FSIBOOT5 protocol for the two-stage FAT32 loader. The application itself was improved somewhat, as well.

Heatshrink compression of the online help pages is now enabled, shrinking the total resident size of the debugger as compared to using the uncompressed online help.

The immediate assembler is enabled as well. This feature is inspired by D86. It allows you to enter a debugger command that consists of a dot followed by an assembly instruction, which will be assembled into a small buffer and run immediately.

Bugs fixed:

* DPMI exit call to int 21h with 32-bit stack would crash. Reported by Japheth.

* Switching modes from DPMI PM to Real/Virtual 86 Mode with non-86-mode selector bases for one of the address variables (such as ADS, "Address D command Segment/Selector") would crash.

* Linear address specification to G or B commands using "@(" didn't parse closing parens correctly.

* BL command display fixed for DPMI with segments not matching current CS.

* Allow a comma between a numeric size expression (after L specifier) and a subsequent size keyword.

* Correctly parse device mode command line if it ends in a Line Feed (LF).

* Avoid some memory corruption during device mode init.

* The LBA check of the booted debugger works around a bug in the Xi8088 ROM-BIOS [3] by setting DS to 40h while calling int 13h function 41h.

* Flat binary file load disabled by default in device mode or resident debugger. It'd corrupt memory if no address was specified.

For the exact log of changesets you can consult the Mercurial repo [4]. I also write about lDebug changes regularly on my tech blog, the pushbx blog [5]. As usual the default, FreeDOS, and SvarDOS packages can be downloaded from our server [6].

My suggested blurb for the FreeDOS news:

Today marks another release of the DOS debugger with the small L, lDebug. This line-oriented debugger is a fork originally based on the 2008 version 1.13 of FreeDOS Debug/X, with a number of additions. Release 9 brings some bugfixes, a new revision of the lDebug/lDOS boot protocol for FAT32 drives, and the immediate assembler feature, now enabled at build time. There's [a longer announcement on the freedos-user list]. Get lDebug from [ecm's website] [7] or from the mirror at ibiblio's FreeDOS Files Archive.

Regards,
ecm


[1]: https://pushbx.org/ecm/web/#projects-ldebug
[2]: https://pushbx.org/ecm/doc/ldebug.htm#news-r9
[3]: https://www.bttr-software.de/forum/forum_entry.php?id=21275
[4]: https://hg.pushbx.org/ecm/ldebug/shortlog/release9
[5]: https://pushbx.org/ecm/dokuwiki/doku.php?id=blog:pushbx
[6]: https://pushbx.org/ecm/download/ldebug/
[7]: https://pushbx.org/ecm/web/#projects-ldebug

---
l

mceric

Germany,
21.12.2024, 16:00

@ ecm
 

lDebug release 9 - Heatshrink compressed drives?

Hello :-)

Thanks for all those updates!

> Heatshrink compression of the online help pages is now enabled, shrinking
> the total resident size of the debugger as compared to using the
> uncompressed online help.

I had not known about that before; it seems to be a variant of LZSS or LZ4:

https://github.com/atomicobject/heatshrink

https://www.cnx-software.com/2021/09/29/heatshrink...eight-compression-library-for-embedded-systems/

And for ESP32, there already is another, smaller variant, ESP32-Tamp:

https://esp32.com/viewtopic.php?t=39798

If you ever feel bored, maybe you could consider building a heatshrinked version of SHSUCDHD? A possible implementation could be to keep a table of where compressed sectors start.

This could even evolve into compressed RAMDISKs, partitions and maybe also EMS or XMS memory: in those cases, the drivers could claim that you have N times more space than you actually have, dynamically moving compressed sectors/blocks/clusters to 1:1, 1:2 or 1:4 compressed areas. If N turns out to be too optimistic because entropy went up, they can just claim that the remaining clusters (or memory) are now in use. With too little safety margin, it could even happen that they have to report write errors, but that case should be avoided.

Just dreaming of a free DOUBLESPACE, DRIVESPACE or STACKER here :-) Those used rather large compression blocks internally, which meant that they used a lot of RAM.

---
FreeDOS / DOSEMU2 / ...

ecm


Düsseldorf, Germany,
21.12.2024, 18:41

@ mceric
 

lDebug release 9 - Heatshrink compressed drives?

> If you ever feel bored, maybe you could consider building a heatshrinked
> version of SHSUCDHD? A possible implementation could be to keep a table of
> where compressed sectors start.
>
> This could even evolve to compressed RAMDISKs, partitions and maybe also
> EMS or XMS memory: In those cases, the drivers could claim that you have N
> times more space than you actually have, dynamically moving compressed
> sectors/blocks/clusters to 1:1, 1:2 or 1:4 compressed areas. In case N was
> too optimistic because entropy went up, they can just claim remaining
> clusters (or memory) are used now. With too little safety margin, it could
> even happen that they have to report write errors, but that case should be
> avoided.
>
> Just dreaming of a free DOUBLESPACE, DRIVESPACE or STACKER here :-) Those
> used rather large compression blocks internally, which meant that they used
> a lot of RAM.

SHSUCDHD is a program that "Simulates a CD-ROM using an image file", right? Any particular reason you want it to support compression?

Other than that there's a number of different compression formats supported by inicomp and the scripts that do the packing (in lDebug's and kernwrap's scripts).

I did stick with heatshrink for the help pages and for packing extpak.eld, the packed library of Extensions for lDebug, though.

---
l

mceric

Germany,
21.12.2024, 22:55

@ ecm
 

lDebug release 9 - Heatshrink compressed drives?

> SHSUCDHD is a program that "Simulates a CD-ROM using an image file", right?
> Any particular reason you want it to support compression?

Yes. Because the image is read-only, it will be easier to support compressed static images of a CD (with a nice sector size of 2kb, too) than it will be to implement compressed writeable drives.

For ISO images, one could for example have a simple table of offsets where each compressed sector starts, as a header for the compressed image. With an algorithm like Heatshrink which uses very little RAM to decompress, the whole driver could have a very small RAM footprint :-)
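
For illustration, such a header might look roughly like this in C (a hypothetical layout, not any existing driver's format; all names are made up):

#include <stdint.h>

/* Hypothetical header for a sector-wise compressed ISO image. */
struct ciso_header {
    uint32_t magic;          /* identifies the format                     */
    uint32_t sector_count;   /* number of 2048-byte sectors in the image  */
    /* Followed by sector_count + 1 file offsets: entry i is where the
     * compressed data of sector i starts, and entry i+1 minus entry i is
     * its compressed length, so no separate length table is needed.      */
    uint32_t offset[];       /* flexible array member (C99)               */
};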

---
FreeDOS / DOSEMU2 / ...

DosWorld

02.01.2025, 18:24
(edited by DosWorld, 02.01.2025, 18:53)

@ mceric
 

lDebug release 9 - Heatshrink compressed drives?

> > Heatshrink compression
> compressed RAMDISKs, partitions

https://www.ietf.org/rfc/rfc1978.txt
https://github.com/openzfs/zfs/blob/master/module/zfs/lzjb.c

---
Make DOS great again!
Make Russia small again!

jadoxa


Queensland, Australia,
29.01.2025, 08:46

@ mceric
 

lDebug release 9 - Heatshrink compressed drives?

> If you ever feel bored, maybe you could consider building a heatshrinked
> version of SHSUCDHD?

If you've got the RAM I think you'd be better off just using gzip and SHSUCDRD. But anyway, here's a demonstration compressor (Windows binary), using Tamp (with a modified compressor.c to remove the header byte). It's not exactly quick, but a Windows (or Linux) version could parallelize. Haven't looked into decompressing, but it seems a straight-forward implementation would add about 5K (2k code, 2Ki buffer, 1Ki window). Not that I'll probably ever write one...

mceric

Germany,
30.01.2025, 02:21

@ jadoxa
 

Heatshrink compressed drives? - Tamp ISO compression test

> > If you ever feel bored, maybe you could consider building a heatshrinked
> > version of SHSUCDHD?
>
> If you've got the RAM I think you'd be better off just using gzip and
> SHSUCDRD. But anyway, here's a
> demonstration
> compressor (Windows binary), using Tamp...

Interestingly short :-)

So your image file format consists of a 4 byte header and an array of 32-bit compressed sector offsets? I fail to see where the total number of sectors in the image is stored, though?

If you already have test-compressed a few CD images you had around, could you share only the resulting compressed-image headers WITHOUT the compressed sector contents themselves, as a small file on your page? I am curious about the distribution of compressed sector sizes for typical ISOs :-)

Of course it would be possible to "compress" the table of file offsets of compressed sectors, but that would complicate access later and it is not necessary for the evaluation of compressors like TAMP.

A simple scheme would be to store only the absolute offset of every 16th compressed sector, followed by just the 16 lower bits for the next 15 offsets. That saves roughly half of the header size. To "decompress" the offset of sector x, one reads the full offset F of sector (x AND NOT 15); then, if (x AND 15) is non-zero, one reads the stored low word L for sector x and combines the high 16 bits of F with L, adding 65536 if L is below the low 16 bits of F. This should work for compressed sector sizes of up to 4096 bytes, such as ISO sectors of 2048 bytes.
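
A minimal C sketch of that lookup (assuming little-endian storage and one 32-bit offset per 16-sector group followed by fifteen 16-bit low words; all names are made up):

#include <stdint.h>

/* Hypothetical packed offset table: for every group of 16 sectors, one
 * absolute 32-bit offset (for the first sector of the group) followed by
 * fifteen 16-bit low words (for the remaining sectors).  Works as long
 * as a group spans less than 64 KiB of compressed data, which holds for
 * compressed sectors of up to 4096 bytes each.                          */
uint32_t sector_offset(const uint8_t *table, uint32_t sector)
{
    const uint8_t *group = table + (sector >> 4) * 34u; /* 4 + 15*2 bytes per group */
    uint32_t full = (uint32_t)group[0]         | ((uint32_t)group[1] << 8)
                  | ((uint32_t)group[2] << 16) | ((uint32_t)group[3] << 24);
    uint32_t idx = sector & 15u;
    if (idx == 0)
        return full;
    const uint8_t *lo = group + 4 + (idx - 1) * 2;
    uint32_t low = (uint32_t)lo[0] | ((uint32_t)lo[1] << 8);
    uint32_t off = (full & 0xFFFF0000u) | low;
    if (low < (full & 0xFFFFu))   /* the low word wrapped past a 64 KiB boundary */
        off += 0x10000u;
    return off;
}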

But I am getting distracted :-)

I am quite curious what the compressed sector size distributions are in your experience, also in relation to overall compression factors for the images as a whole.

Thank you!

---
FreeDOS / DOSEMU2 / ...

tom


Germany (West),
30.01.2025, 17:36
(edited by tom, 30.01.2025, 20:23)

@ mceric
 

Heatshrink compressed drives? - Tamp ISO compression test

> I am quite curious what the compressed sector size distributions are in
> your experience, also in relation to overall compression factors for the
> images as a whole.

Now if you find this so interesting, why don't you sit down yourself and code at least your own test suite, where you can easily change the compression method, compression unit size, and offset table storage method yourself, rather than always suggesting ways others should spend their time coding your projects.

<edited to add> Hint: the compression ratio won't be impressive for a compression unit of 2048 bytes. A bigger unit of, say, 8 KB would help, but it would also require a bigger memory footprint.

Actually, I don't think .ISO files will compress well. They tend to have a high portion of media files like .WAV, .MP3, .JPG and similar, which do not compress at all.

So I suggest starting such a project by compressing your .ISO collection (with any compressor of your choice), rather than discussing compression methods.

jadoxa


Queensland, Australia,
31.01.2025, 09:24

@ mceric
 

Heatshrink compressed drives? - Tamp ISO compression test

> I fail to see where the total number of
> sectors in the image is stored, though?

I didn't, as I thought the driver tested that, but it turns out it just uses the file size. That would mean either moving the offsets to the end, adding a pointer to that, and testing EOF; or storing the number of sectors and testing it; or leaving it and hoping. :)

> I am curious about the distribution of compressed sector sizes for typical ISOs

As tom said (it doesn't matter what I have, it matters what you have). Having said that, I did another test with LZ4 and here's what I got.


                    LZ4        Tamp        LZ4         gzip
                Ratio %     Ratio %     Ratio %     Ratio %
250GAMES        1.36  73.4  1.41  70.8  1.51  66.2  1.61  62.3
APC4101         1.03  97    1.03  96.9  1.05  95.4  1.06  94.7
EXTREME         1.49  67.2  1.68  59.6  1.8   55.6  2.09  47.9
fdosgame135b    1.69  59.3  1.81  55.1  2.01  49.7  2.25  44.4
FUZZYCD         1.18  84.8  1.22  81.9  1.43  70.2  1.45  69.1
Holmes2         1.13  88.2  1.19  83.7  1.3   76.7  1.42  70.5
NFSPCCD         1.12  89.6  1.14  87.7  1.16  85.8  1.24  80.4
osborne         1.62  61.6  1.7   58.7  1.94  51.6  2.11  47.5
PCU_OCT99_1     1.06  94.7  1.06  94.4  1.08  92.9  1.08  92.2
PCU_OCT99_2     1.01  99.2  1.01  99    1.01  98.5  1.02  98.2
PCW             1.29  77.7  1.44  69.5  1.55  64.3  1.77  56.5
RTZ-CD          1.1   91    1.15  86.7  1.33  75    1.44  69.5
Stellar7        1.08  92.8  1.13  88.3  1.14  87.4  1.3   76.7
Titus           1.13  88.1  1.16  86.3  1.2   83.6  1.24  80.8


APC, PCU & PCW are magazine cover discs; osborne is from my first PC (PC-DOS 6.3 & WfWg 3.11); the rest are games.
LZ4 (1.10.0) & Tamp are sector compression, the other LZ4 (1.3b) & gzip are file; ratio is the size of the original compared to the compressed (original_size / compressed_size); % is the opposite (compressed_size / original_size * 100).

Whilst LZ4 did save half a gig, and Tamp another 100 meg on top of that, I still don't think it's worth the effort.

> A simple scheme would be to only store the absolute offset of every 16th
> compressed sector, followed by only the 16 lower bits for the next 15
> offsets.

A less simple scheme that saves even more would be to store only the length, allowing packing two sectors into three bytes (a la FAT12); doing that for 32 sectors would be 52 bytes (dword address plus 48 bytes for the 32 lengths). Not sure how practical it would end up being, though.
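
For illustration, unpacking one such pair of 12-bit lengths could look like this (a sketch of the idea only; the exact bit layout is made up):

#include <stdint.h>

/* Two 12-bit compressed-sector lengths packed into three bytes,
 * FAT12-style: the first length in the low 12 bits of the 24-bit group,
 * the second length in the high 12 bits. */
static void unpack_length_pair(const uint8_t b[3],
                               uint16_t *len0, uint16_t *len1)
{
    *len0 = (uint16_t)(b[0] | ((b[1] & 0x0F) << 8));
    *len1 = (uint16_t)((b[1] >> 4) | (b[2] << 4));
}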

mceric

Germany,
31.01.2025, 15:15

@ jadoxa
 

Heatshrink compressed drives? - Tamp ISO compression test

Hi! Thank you for sharing your results :-)

Let me sort those by TAMP ratio, omitting the magazine CDs which did not compress well (saving less than 8%, except for PCW), leaving the games and your Osborne (DOS/WfW) data, and expressing the percentages as space saved instead of space remaining:


.                  LZ4        Tamp           LZ4         gzip
.               Ratio %     Ratio %        Ratio %     Ratio %
fdosgame135b    1.69  40.7  1.81  44.9     2.01  50.3  2.25  55.6
osborne         1.62  38.4  1.70  41.3     1.94  48.4  2.11  52.5
EXTREME         1.49  32.8  1.68  40.4     1.80  44.4  2.09  52.1
250GAMES        1.36  26.6  1.41  29.2     1.51  33.8  1.61  37.7
FUZZYCD         1.18  15.2  1.22  18.1     1.43  29.8  1.45  30.9
Holmes2         1.13  11.8  1.19  16.3     1.30  23.3  1.42  29.5
Titus           1.13  11.9  1.16  13.7     1.20  16.4  1.24  19.2
NFSPCCD         1.12  10.4  1.14  12.3     1.16  14.2  1.24  19.6
RTZ-CD          1.10   9.0  1.15  13.3     1.33  25.0  1.44  30.5
Stellar7        1.08   7.2  1.13  11.7     1.14  12.6  1.30  23.3


If I understand you correctly, the left 2 columns are per-sector compressed, the right are whole-ISO solid compression? And TAMP always compressed better than LZ4? :-)

One could say that solid compression typically only squeezes out an extra 10%.

GZIP usually does not compress your ISOs much better than the faster LZ4.

Note that TAMP is fast in decompression, which is important for use in compressed disk images, but slow in compression, while ZLIB is balanced there.

> > A simple scheme would be to only store the absolute offset of every 16th
> > compressed sector, followed by only the 16 lower bits for the next 15
> > offsets.

My idea was to avoid having to run a loop of additions to reconstruct the offset of an arbitrary sector.

By storing only the lengths, you would need a loop. However, you could force all lengths and offsets to be multiples of 8 or 16, to be able to encode the compressed length of a 2k or 4k sector in a single byte.
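
A sketch of that single-byte encoding for 2 KiB sectors with a granularity of 8 (hypothetical; the special codes for empty and incompressible sectors are my own assumption so the 0..2048 range fits one byte):

/* One length byte per 2048-byte sector, lengths rounded up to multiples
 * of 8.  Code 0 = empty sector, code 255 = stored uncompressed (a sector
 * that compresses to more than 2032 bytes is stored raw instead),
 * otherwise the stored size is code * 8 bytes. */
static unsigned length_to_code(unsigned compressed_len)
{
    unsigned code = (compressed_len + 7) / 8;
    return code >= 255 ? 255 : code;
}

static unsigned code_to_stored_length(unsigned code)
{
    return code == 255 ? 2048 : code * 8;
}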

---
FreeDOS / DOSEMU2 / ...

mceric

Germany,
31.01.2025, 23:33

@ mceric
 

Heatshrink compressed drives? - Tamp ISO compression test

Data news, everybody :-)

I made a little histogram counter for CDH files and ran it on some ISOs I had around, after "tampisoing" them. Enjoy, or something ;-)


File fdbasecd_2007-09-06.cdh compressed 8333312 to 6699576 bytes
4069 sectors, 19.6% saved, 80.4% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         4.7%  4.1%  2.1%  3.1%  1.8%  1.7%  1.8%  1.6%   3.1%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  4.5%   2.6%   1.3%   1.6%   1.6%   1.5%   1.0%   1.4%  69.6%

File fdbootcd_0.9.BETA.cdh compressed 10291200 to 8758719 bytes
5025 sectors, 14.9% saved, 85.1% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         3.9%  2.5%  1.9%  2.8%  1.9%  1.3%  1.5%  1.9%   2.1%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  3.2%   1.8%   1.1%   1.2%   1.1%   1.3%   1.2%   1.0%  77.3%

File fdbootcd_0.9rc5.BETA.cdh compressed 11599872 to 9563353 bytes
5664 sectors, 17.6% saved, 82.4% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         3.5%  2.3%  1.3%  3.3%  2.0%  1.5%  1.6%  4.1%   3.9%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  3.9%   2.6%   1.1%   1.4%   1.1%   1.3%   1.3%   1.4%  71.3%

File fdoslite_0.9pre.cdh compressed 36026368 to 22294142 bytes
17591 sectors, 38.1% saved, 61.9% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         2.2%  2.3%  2.5%  3.3%  2.7%  3.3%  4.0%  5.5%   7.8%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
 11.3%  13.4%   9.8%   8.6%   6.4%   3.6%   2.2%   1.8%  18.2%

File freedos_1.0_fdfullcd.cdh compressed 160184320 to 147371512 bytes
78215 sectors, 8.0% saved, 92.0% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         0.8%  1.1%  1.0%  1.2%  1.1%  1.2%  1.6%  1.8%   2.5%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  2.9%   2.2%   1.3%   1.6%   1.3%   1.0%   0.8%   1.1%  84.4%

File kramers_nederlandse_taal.cdh compressed 147781632 to 73337302 bytes
72159 sectors, 50.4% saved, 49.6% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         0.9%  0.6%  0.5%  0.8%  1.1%  2.7% 13.7% 28.7%  21.8%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
 14.3%   7.9%   3.0%   2.4%   1.4%   0.9%   1.0%   0.9%   6.5%

File hp_windows_print_drivers.cdh compressed 478834688 to 430721071 bytes
233806 sectors, 10.0% saved, 90.0% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         1.0%  0.8%  0.7%  0.6%  1.7%  1.5%  1.7%  1.9%   2.3%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  3.0%   3.4%   4.6%   3.3%   1.5%   0.9%   1.0%   3.3%  75.8%

File logox_3.5.cdh compressed 514738176 to 476102193 bytes
251337 sectors, 7.5% saved, 92.5% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         0.6%  0.7%  1.2%  1.5%  1.2%  1.3%  1.7%  1.8%   1.5%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  1.3%   1.5%   1.4%   1.7%   2.0%   2.8%   5.6%  11.2%  70.1%

File encyclopedie.cdh compressed 580093952 to 517457387 bytes
283249 sectors, 10.8% saved, 89.2% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         0.8%  0.8%  0.9%  3.7%  2.0%  1.6%  1.9%  1.5%   1.4%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  2.0%   3.8%   2.0%   1.8%   1.4%   1.6%   1.6%   3.9%  76.5%

File thesis.cdh compressed 735221760 to 513439278 bytes
358995 sectors, 30.2% saved, 69.8% remaining
Histo:  Empty ..128 ..256 ..384 ..512 ..640 ..768 ..896 ..1024
         0.6%  0.8%  6.8%  0.9%  1.7%  3.1%  2.1%  3.5%   5.7%
..1152 ..1280 ..1408 ..1536 ..1664 ..1792 ..1920 ..2047 ==2048
  7.9%  10.5%  11.2%  10.5%   6.1%   2.4%   1.9%   1.9%  31.4%


I also considered doing some Linux install/live CD, but those already do not compress well in solid compression, so trying to TAMP them seems a bit pointless. Example: Ubuntu 6.10 still takes 98.1% of the original space after BZIP2 of the ISO as a whole. For a classic Debian 3.0, more than two decades old, BZIP2 compresses the first CD ISO to 95.5%.

Even a 2016 Svarog386 DOS ISO only BZIP2 compresses to 99.0% of the original size. I guess I should have tested some DOS live CD with a lot of pre-installed stuff, but still UPX etc. will mean that many files already are compressed anyway.

It is interesting that quite a few sectors either do not compress at all and get stored as-is, or do not contain any data at all. I am not sure what the latter means. Does TAMP compress zero-filled arrays to zero-sized results, no matter how long they are?

---
FreeDOS / DOSEMU2 / ...

tom


Germany (West),
01.02.2025, 11:19

@ mceric
 

Heatshrink compressed drives? - Tamp ISO compression test

> File fdbasecd_2007-09-06.cdh compressed 8333312 to 6699576 bytes

> File fdbootcd_0.9.BETA.cdh compressed 10291200 to 8758719 bytes

> File fdbootcd_0.9rc5.BETA.cdh compressed 11599872 to 9563353 bytes

> File fdoslite_0.9pre.cdh compressed 36026368 to 22294142 bytes

> File freedos_1.0_fdfullcd.cdh compressed 160184320 to 147371512 bytes

This might be more of a possible target for deduplication than compression.

As Linux has file systems that support deduplication: how about testing this on BTRFS or ZFS?

mceric

Germany,
01.02.2025, 13:54
(edited by mceric, 01.02.2025, 14:07)

@ tom
 

Heatshrink compressed drives? - Tamp ISO compression test

> This might be more of a possible target for deduplication than
> compression.
>
> As Linux has file systems that support deduplication: how about testing
> this on BTRFS or ZFS?

I have tried to find out whether RLE would be useful for the ISOs, assuming that the encoding is "prefix + N bytes to copy" or "postfix to repeat last byte/word/dword N times". The (imaginary) byte/word/dword just before the start of each sector is assumed to be 0.

What bothers me is that, in the thesis ISO for example, I find only 1/9 of the expected number of all-zero (or all-same) sectors.

For the 2007-09-06 fdbasecd, the difference is smaller: I find 4.2% of the sectors to be all-zero (and none all-same-some-other-value, by the way) but TAMP compresses 4.7% of the sectors to zero compressed size.

Still, the difference might be 0.5% "sector is same as previous sector" which should not compress to zero size in a system where you expect each sector to be decompressible without needing other sectors as context.

So you might be right that the compressor state did not get reset between sectors, as there are unrealistically many sectors compressing to a size of zero?

---
FreeDOS / DOSEMU2 / ...

jadoxa


Queensland, Australia,
01.02.2025, 15:33

@ mceric
 

Heatshrink compressed drives? - Tamp ISO compression test

> However, you could force all lengths and offsets to be multiples of 8 or 16,
> to be able to encode the compressed length of a 2k or 4k sector in a single
> byte.

Hmm, so we pad compressed data to a multiple of eight, presumably that means adding an average of four bytes, then we have a byte length, for an average of five bytes per sector. Great! Well, if we make each sector 2048 bytes, then we can remove the index altogether!

> > As Linux has file systems that support deduplication: how about testing
> > this on BTRFS or ZFS?

Considering this is for a DOS driver, what a Linux FS does is irrelevant. At least, what I'm doing is for a DOS driver (SHSUCDHD).

> So you might be right that the compressor state did not get reset between
> sectors, as there are unrealistically many sectors compressing to a size of
> zero?

The compressor is reset before compressing each sector; trailing zeros are removed before compression (hence all-zero sectors being zero-length).
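
In other words, the length handed to the compressor is something like this (a sketch of what is described above, not the actual tool's code):

#include <stddef.h>
#include <stdint.h>

/* Drop trailing zero bytes of a sector before compressing it, so that an
 * all-zero sector ends up with a compressed length of zero. */
static size_t trim_trailing_zeros(const uint8_t *sector, size_t len)
{
    while (len > 0 && sector[len - 1] == 0)
        len--;
    return len;
}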

mceric

Germany,
06.02.2025, 02:42

@ mceric
 

Heatshrink compressed drives? - Tamp ISO compression test

> Data news, everybody :-)
>
> I made a little histogram counter for CDH files and ran it on some ISOs I
> had around, after "tampisoing" them. Enjoy, or something ;-)

In case you were wondering about the RLE outcomes: I computed some ESTIMATES for how much you could save using a cleverly chosen set of command bytes each of which can express "repeat previous byte/word/dword N times, then copy next M bytes as-is, then read next command byte" for popular values of N and M.

I assumed that the command bytes have to be command words if N is large, but that they would always be either bytes or words. Just RLE, no copying of data from further back etc.
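
For concreteness, a decoder for such a byte-RLE command scheme could look like this (a rough sketch matching the estimate's assumptions only; the 4-bit repeat / 4-bit literal split is made up):

#include <stddef.h>
#include <stdint.h>

/* Byte-RLE decoder sketch: each command byte packs a repeat count N (high
 * nibble) and a literal count M (low nibble); the previous output byte
 * (assumed 0 before the first) is repeated N times, then M bytes are
 * copied verbatim from the input.  Returns the number of bytes written.
 * The caller must provide an output buffer of sufficient size.          */
size_t rle_decode(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t ip = 0, op = 0;
    uint8_t prev = 0;
    while (ip < in_len) {
        uint8_t cmd = in[ip++];
        unsigned repeat  = cmd >> 4;
        unsigned literal = cmd & 0x0F;
        while (repeat--)
            out[op++] = prev;
        while (literal-- && ip < in_len)
            prev = out[op++] = in[ip++];
    }
    return op;
}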

Classic compressors typically used Lempel-Ziv and similar algorithms and command bit strings of variable length, frequently used values expressed as shorter constants (Huffman coding), drawing from a pipeline of command bits which get refilled 1 byte at a time. Commands usually meant something like "copy N upcoming bytes as-is", "repeat N bytes M times", "copy N bytes from M bytes ago (and possibly: then copy X bytes as-is)".

>
> File fdbasecd_2007-09-06.cdh compressed 8333312 to 6699576 bytes
> 4069 sectors, 19.6% saved, 80.4% remaining

This had 5% empty sectors and 70% non-compressible ones. Byte-RLE: min. 85%

> File fdbootcd_0.9.BETA.cdh compressed 10291200 to 8758719 bytes
> 5025 sectors, 14.9% saved, 85.1% remaining

Circa 4% empty and 77% non-compressible sectors. Byte-RLE: min. 88%

> File fdbootcd_0.9rc5.BETA.cdh compressed 11599872 to 9563353 bytes
> 5664 sectors, 17.6% saved, 82.4% remaining

Circa 4% empty and 71% non-compressible sectors. Byte-RLE: min. 86%

> File fdoslite_0.9pre.cdh compressed 36026368 to 22294142 bytes
> 17591 sectors, 38.1% saved, 61.9% remaining

Circa 2% empty and 18% non-compressible sectors. Byte-RLE: min. 82%

> File freedos_1.0_fdfullcd.cdh compressed 160184320 to 147371512 bytes
> 78215 sectors, 8.0% saved, 92.0% remaining

Circa 1% empty and 84% non-compressible sectors. Byte-RLE: min. 95%

> File kramers_nederlandse_taal.cdh compressed 147781632 to 73337302 bytes
> 72159 sectors, 50.4% saved, 49.6% remaining

Circa 1% empty, but only 7% non-compressible sectors. Byte-RLE: min. 92%
Many sectors TAMP-compress a lot. RLE works surprisingly badly for this.

> File hp_windows_print_drivers.cdh compressed 478834688 to 430721071 bytes
> 233806 sectors, 10.0% saved, 90.0% remaining

Circa 1% empty and 76% non-compressible sectors. Byte-RLE: min. 95%

> File logox_3.5.cdh compressed 514738176 to 476102193 bytes
> 251337 sectors, 7.5% saved, 92.5% remaining

Circa 1% empty and 70% non-compressible sectors. Byte-RLE: min. 89%, which is clearly too optimistic. See above.

> File encyclopedie.cdh compressed 580093952 to 517457387 bytes
> 283249 sectors, 10.8% saved, 89.2% remaining

Circa 1% empty and 77% non-compressible sectors. WORD (!) RLE: min. 93%
Here, word-RLE is predicted to save 20% more space than byte-RLE.

> File thesis.cdh compressed 735221760 to 513439278 bytes
> 358995 sectors, 30.2% saved, 69.8% remaining

Circa 1% empty and 31% non-compressible sectors, many in between. WORD-RLE: min. 82%, predicted to save twice as much space as byte-RLE.

>


Conclusion: In some cases, where ISOs contain mostly already compressed data mixed with some empty areas such as empty or only partially filled sectors, RLE might be useful. In specific cases, RLE compression might work better when using words instead of bytes as the repeatable units.

Of course, that is only a very rough estimate, because I optimistically assumed that the repeat-postfix bytes magically double as copy-as-is prefix bytes. In reality, you often need more than 1 byte for that.

TAMP always works a lot better than simple RLE schemes, so it is worth the extra computation and complexity. Still, even with TAMP most test candidates above compress only to 80-93% of their original size, so (sector-wise) compressed ISOs are of limited use for those. In particular, installer ISOs do not compress well.

PS: https://en.wikipedia.org/wiki/842_(compression_algorithm) is yet another LZ variant - for fast RAM compression, while https://en.wikipedia.org/wiki/Snappy_(compression) is a fast LZ style algorithm without bitstrings.

---
FreeDOS / DOSEMU2 / ...

ecm


Düsseldorf, Germany,
06.02.2025, 04:26

@ mceric
 

Snappy compression

> PS: https://en.wikipedia.org/wiki/842_(compression_algorithm) is yet
> another LZ variant - for fast RAM compression, while
> https://en.wikipedia.org/wiki/Snappy_(compression) is a fast LZ style
> algorithm without bitstrings.

I have a Snappy depacker in inicomp already: https://hg.pushbx.org/ecm/inicomp/file/536526f7c49c/snappy.asm

---
l

mceric

Germany,
08.02.2025, 01:04

@ ecm
 

Snappy compression

Hi ECM,

> > https://en.wikipedia.org/wiki/Snappy_(compression) is a fast LZ style
> > algorithm without bitstrings.
>
> I have a Snappy depacker in inicomp already:
> https://hg.pushbx.org/ecm/inicomp/file/536526f7c49c/snappy.asm

I was curious about the binary size of that, but neither NASM nor the TASM mode of YASM wants to assemble it, so I just have to ask you how big your Snappy depacker is :-)

The unpacker also seems to rely on normalise_dssi_pointer, ..._both_pointers, ...pointer_with_displacement_bxcx, pointer_to_linear, check_pointers_not_overlapping, disp_al, disp_al_counter and disp_al_for_progress to be provided elsewhere.

And while I am at it, how much RAM does it need? Apparently just the output buffer and less than 40 bytes of stack "lvars"?

---
FreeDOS / DOSEMU2 / ...

ecm


Düsseldorf, Germany,
08.02.2025, 18:02

@ mceric
 

Snappy compression - script to build a compressed executable

> Hi ECM,
>
> > > https://en.wikipedia.org/wiki/Snappy_(compression) is a fast LZ style
> > > algorithm without bitstrings.
> >
> > I have a Snappy depacker in inicomp already:
> > https://hg.pushbx.org/ecm/inicomp/file/536526f7c49c/snappy.asm
>
> I was curious about the binary size of that, but neither NASM nor the TASM
> mode of YASM want to assemble it, so I just have to ask you how big your
> Snappy depacker is :-)

In an lDebug source directory, running a command like

INICOMP_METHOD=snappy use_build_compress_only=1 build_name=cdebugx ./mak.sh -D_PM=1 -D_DEBUG -D_DEBUG_COND -D_DEFAULTSHOWSIZE

will pack the (existing) .big executable image using the Snappy method. (Drop "use_build_compress_only" or set it to zero to actually build the debugger first.) The final inicomp and iniload stages display messages like these:

    148 progress dots, method           snappy
../../inicomp/inicomp.asm:2040: warning: inisz: 1952 bytes used for depacker [-w+user]
../../ldosboot/iniload.asm:864: warning: 1 bytes in front of ms7_entry [-w+user]
../../ldosboot/iniload.asm:1271: warning: 9 bytes in front of ldos_entry [-w+user]
../../ldosboot/iniload.asm:1687: warning: 10 bytes in front of end [-w+user]
../../ldosboot/iniload.asm:1750: warning: 428 bytes in front of end2 [-w+user]
  130560 bytes ( 90.10%), method           snappy
Note: Method snappy selected.
-rw-r--r-- 1 130560  ../bin/lcdebugx.com


So the entire triple-mode (bootloader, DOS device driver, DOS executable) inicomp depacker stage, including progress display, takes 1952 bytes.

> The unpacker also seems to rely on normalise_dssi_pointer,
> ..._both_pointers, ...pointer_with_displacement_bxcx, pointer_to_linear,
> check_pointers_not_overlapping, disp_al, disp_al_counter and
> disp_al_for_progress to be provided elsewhere.

Yes, these are all defined in inicomp.asm, the main assembly source that includes the method-specific source text file and contains the code to call the depacker. The canonical source of truth on how to build is the mak.sh script, either the one from lDebug or the one from the kernwrap repo as used by lDOS/Enhanced DR-DOS and lDOS/MS-DOS.

I'll walk you through the steps it takes to assemble the Snappy depacker:

* https://hg.pushbx.org/ecm/ldebug/file/81124309228f/source/mak.sh#l586 this sets some variables for the Snappy method, importantly it tells you inicomp.asm requires the -D_SNAPPY switch to choose the correct depacker. It also tells us the packer executable snzip is used (by default).
* https://github.com/kubo/snzip this is the repo that I use to pack.
* https://hg.pushbx.org/ecm/ldebug/file/81124309228f/source/mak.sh#l680 here we actually call the packer, as in snzip -ck file.big > file.sz
* https://hg.pushbx.org/ecm/ldebug/file/81124309228f/source/mak.sh#l739 this builds the "test program" executable image, which is the MZ executable stage used in an iniload wrapper stage to create the test program. The test program can tell us three things: Whether the depacker works at all and produces the exact expected output, how much memory the depacker needs to succeed, and how many progress units (dots) there are for this specific compressed image. As a fourth choice (not used by default) it can also run a benchmark (repeatedly depacking) which can be used as a speed test, usually with at least 16 runs and up to as many as 1024 to reduce the fraction of the time spent in startup cost. _PAYLOAD_FILE must refer to the compressed payload file.
* https://hg.pushbx.org/ecm/ldebug/file/81124309228f/source/mak.sh#l936 here we build the actual final inicomp stage. Again -D_SNAPPY must be passed and _PAYLOAD_FILE specifies the compressed payload file. Some more options are used as well. $inicomp_def_progress $inicomp_def_progressamount are used to pass the amount of progress dots determined from running the test program.
* After inicomp.asm has been assembled, it is used as the executable image payload to lDOS boot's iniload.asm, which also supports triple-mode execution.

> And while I am at it, how much RAM does it need? Apparently just the output
> buffer and less than 40 bytes of stack "lvars"?

Yes, there's the lframe variables (these are specified using macros that are defined in lmacros2.mac of my lDOS macro collection) and normal stack usage during the run. The inicomp stage usually supports overlapping buffers. The output buffer starts in memory below the packed payload. While depacking, it is valid for the depacked image to grow into the space used by the packed payload, but it must never cross into parts of the payload not yet consumed by the depacker. The exact boundary of how much memory is needed is determined by the test program, and is encoded into the MZ executable's minimum allocation field by the scripting. (The tellsize program is also used on the debugger .big image to determine how much memory is needed for the debugger init, and the bigger of the two sizes is encoded into the final executable.)

---
l

ecm


Düsseldorf, Germany,
09.02.2025, 21:38

@ ecm
 

Snappy compression - script to build a compressed executable

I added the few needed changes to allow using snappy.asm's depacker for the test file mode of the inicomp repo. This is a small application that loads a packed image from a file, depacks it entirely in memory, and writes the depacked image to another file. This is how to build it and the size of the executable:

$ nasm testfile.asm -o testfile.com -I ../lmacros/ -D_SNAPPY
$ ls -lgG testfile.com
-rw-r--r-- 1 2245 Feb  9 21:29 testfile.com


You use it like testfile input.sz output.dat and it emits the message "Done." if it had enough memory and the depacking seems to have succeeded. Unlike the test program, it doesn't take the original uncompressed data as an input and therefore doesn't compare its output to check that it is exact. So you should compare the output to the original file yourself.

---
l

tom


Germany (West),
31.01.2025, 17:42
(edited by tom, 31.01.2025, 18:29)

@ jadoxa
 

Heatshrink compressed drives? - Tamp ISO compression test

> > A simple scheme would be to only store the absolute offset of every 16th
> > compressed sector, followed by only the 16 lower bits for the next 15
> > offsets.
>
> A less simple scheme that saves even more would be to store only the
> length, allowing packing two sectors into three bytes (a la FAT12); doing
> that for 32 sectors would be 52 bytes (dword address plus 48 bytes for the
> 32 lengths). Not sure how practical it would end up being, though.

Given that the compression unit of 2K will be compressed to (on average) ~1.5K, it's fairly irrelevant whether the offset is stored in 2, 3, or 4 bytes.

Also: in the published results for TAMP, does TAMP always start fresh for the 2K unit, or does it keep the previous compression engine state?
For the discussed purpose, the compressor would need a reset for each 2K unit.
I read the TAMP instructions, but it wasn't clear to me.

Oso2k

30.01.2025, 05:12

@ jadoxa
 

lDebug release 9 - Heatshrink compressed drives?

> > If you ever feel bored, maybe you could consider building a heatshrinked
> > version of SHSUCDHD?
>
> If you've got the RAM I think you'd be better off just using gzip and
> SHSUCDRD. But anyway, here's a
> demonstration
> compressor (Windows binary), using Tamp (with a modified
> compressor.c to remove the header byte). It's not exactly quick, but a
> Windows (or Linux) version could parallelize. Haven't looked into
> decompressing, but it seems a straight-forward implementation would add
> about 5K (2k code, 2Ki buffer, 1Ki window). Not that I'll probably ever
> write one...

What about LZ4 compression? It uses very little memory and Trixter made a version for the 8088. http://www.oldskool.org/pc/lz4_8088/
