DOS ain't dead

bencollver

04.02.2026, 17:02
 

trip report: Tiny C Compiler 0.9.27 (Miscellaneous)

First Attempt:

I bootstrapped the Tiny C Compiler 0.9.27 using Watcom 1.9 and HXRT on SvarDOS. After some work, Watcom successfully built it. But the bootstrapped TCC.EXE couldn't build its own runtime, libtcc1.a.

I got the following error message:

Exception 0D
EAX=00000287 EBX=0000547F ECX=00000008 EDX=00000274 ESI=00000100
EDI=00000100 EBP=00005878 ESP=000056E2 EFL=00013246 EIP=000009E9
CS=0097 (00111000,000058AF,00FB) SS=008F (00111000,000058AF,40F3)
DS=009F (000132A0,0000010F,00F3) ES=00D7 (00012E30,000000FF,00F3)
FS=0000 (********,********,****) GS=0000 (********,********,****)
LDTR=0038 (FF80F000,00000FFF,0082) TR=0030 (FF80AD70,00000067,008B)
ERRC=0000 (********,********,****) PTE 1. Page LDT=01FCF467
GDTR=07FF:FF80E0000 IDTR=07FF:FF80E800 PTE CR2=00000027
CR0=80000011 CR2=00000000 CR3=01FDC000 CR4=00000200 TSS:ESP0=00002290
DR0-3=00000000 00000000 00000000 00000000 DR6=FFFF0FF0 DR7=00000400
LPMS=0087(01) RMS=1B51:0200 cRMCB=0000 IRQ=000000 ISR=0000
   [EIP]=F3 67 A4 F7 C7 00 FF 75 03 B0 0D AA
   [ESP]=4A3F 0000 473C 0040 5878 0000 5702 0000
000056F2=547F 0000 0274 0000 005B 0000 0204 0000
00005702=008F 008F 289D 00C7 5726 0000 0040 0040
00005712=5878 0000 572A 0000 R000 0000 570A 0000
00005722=0001 0000 0507 0002 00C7 4550 0000 014C
00005732=0004 3D9F 6981 0000 0000 0000 0000 00E0

Syntax Error


The command was:

.\tcc -m32 -ar cr lib/32/libtcc1.a libtcc1.o crt1.o crt1w.o wincrt1.o wincrt1w.o dllcrt1.o dllmain.o chkstk.o bcheck.o alloca86.o oca86-bt.o

It succeeded when removing oca86-bt.o from the end.

It also succeeded when removing all .o files EXCEPT for oca86-bt.o.

Try as i might, i could not resolve this issue on SvarDOS.

It worked just fine on FreeDOS where it successfully bootstrapped the compiler and built its own runtime.

From HXRT.TXT:

HX has been tested to run with MS-DOS v5/6/7 and FreeDOS v1.0. It might work with other DOS versions, but this is untested and not recommended.

Second Attempt:

I bootstrapped the Tiny C Compiler 0.9.27 using Watcom 1.9 and HXRT on FreeDOS. The bootstrapped TCC.EXE then successfully compiled TCC itself, but the resulting self-hosted TCC.EXE failed to run.

I got the following error message:

dkrnl32: exception C0000005, flags=0 occurred at BF:3AACE4
        ax=718024 bx=4034EC cx=717FE4 dx=41000C
        si=400000 di=113B2D bp=226EFC sp=226EEC
        exception caused by access to memory address 718024
        ip = Module 'libtcc.dll'+ACE4
        [eip] = 89 08 8B 85 F8 FF FF FF 89 85 F4 FF
        [esp] = 000002B3 00717FE4 0071800C 00711414 00226F08 003DC8B5
dkrnl32: fatal exit!


This error happened when i ran tcc.exe regardless of the arguments.

When i ran the static linked tcc64.exe, i got a similar error, but without the libtcc.dll reference.

I couldn't resolve this issue on FreeDOS.

It worked just fine on Windows XP, where it successfully self-hosted the compiler. This self-hosted compiler worked on Windows XP, but crashed on FreeDOS with HXRT.

I ran hx\test\dostest.exe and all tests passed.

I tried various settings in the DPMILDR and HDPMI environment variables. None helped.

I tried using MS-DOS 6.22 and that didn't help either.

tkchia

04.02.2026, 17:55
(edited by tkchia, 04.02.2026, 18:18)

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

Hello bencollver,

For the first attempt: the instructions at cs:eip disassemble to

000009E9  F367A4            rep a32 movsb
000009EC  F7C700FF          test di,0xff00
000009F0  7503              jnz 0x9f5
000009F2  B00D              mov al,0xd
000009F4  AA                stosb


when interpreted as 16-bit code...

Apparently the rep a32 movsb caused an exception because it was trying to write to es:[0x100], and the upper limit of es was 0xff.
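
The limit check behind that fault can be sketched in C. This is only an illustration, assuming an expand-up, byte-granular segment like the ES in the register dump (limit 0xFF); the function name is made up for this example.

```c
#include <stdint.h>

/* Returns 1 if every byte in [offset, offset + len - 1] lies within a
 * segment whose highest valid offset is `limit`, otherwise 0.
 * Assumption: expand-up, byte-granular segment. */
int access_within_limit(uint32_t offset, uint32_t len, uint32_t limit)
{
    if (len == 0)
        return 1;               /* nothing is accessed */
    if (offset > limit)
        return 0;               /* the first byte is already out of range */
    return len - 1 <= limit - offset;
}
```

With the values from the dump, access_within_limit(0x100, 1, 0xFF) is 0: the very first byte written at es:[0x100] already lies past the limit.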

My wild guess is that the command line arguments as seen by tcc were too long, and some part(s) of the runtime (maybe the bootstrapping C library, or maybe HXRT, etc. ...) did not handle over-long command line arguments properly. I suppose it might be useful to see where the above code instructions were coming from...

Thank you!

---
https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI"

bencollver

04.02.2026, 18:48
(edited by bencollver, 04.02.2026, 19:08)

@ tkchia
 

trip report: Tiny C Compiler 0.9.27

> Hello bencollver,
>
> For the first attempt:
>
> I suppose it might be useful to see where
> the above code instructions were coming from...

Hi Tkchia,

Thanks for that information! I vote HXRT. I found those hex bytes in DPMILD32.EXE and here is a link to the code corresponding to your disassembly:

https://github.com/Baron-von-Riedesel/HX/blob/master/Src/DPMILDR/DPMILDR.ASM#L2864

Interesting that this behaves differently in FreeDOS than in SvarDOS.

Note that i can compile and run hello world with the bootstrapped TCC.EXE. It's not until i try to build TCC with the bootstrapped TCC.EXE that i run into problems on FreeDOS.

p.s. I could probably work around this first issue by using TCC's @file syntax for the overlong command line options.

Japheth

Germany (South),
04.02.2026, 21:12

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

> https://github.com/Baron-von-Riedesel/HX/blob/master/Src/DPMILDR/DPMILDR.ASM#L2864
>
> Interesting that this behaves differently in FreeDOS than in SvarDOS.

It's perhaps a COMMAND.COM issue.

I cannot remember all details currently, but since the cmdline in the PSP is restricted to 126 bytes, MS has extended this in MS-DOS 7.x : there's supposed to exist an environment variable CMDLINE, which may hold a cmdline of "any" size. HX's Win32 kernel emulation dkrnl32 uses this environment variable if possible.

---
MS-DOS forever!

bencollver

04.02.2026, 21:44

@ Japheth
 

trip report: Tiny C Compiler 0.9.27

> It's perhaps a COMMAND.COM issue.
>
> I cannot remember all details currently, but since the cmdline in the PSP
> is restricted to 126 bytes, MS has extended this in MS-DOS 7.x : there's
> supposed to exist an environment variable CMDLINE, which may hold a cmdline
> of "any" size. HX's Win32 kernel emulation dkrnl32 uses this environment
> variable if possible.

Thanks for this detail!

Coincidentally, i've used an environment variable to pass command line arguments into QBASIC.EXE from a .bat file because QBASIC.EXE does not directly support using command line arguments from BASIC.

bretjohn

Rio Rancho, NM,
05.02.2026, 22:24

@ Japheth
 

trip report: Tiny C Compiler 0.9.27

> I cannot remember all details currently, but since the cmdline in the PSP
> is restricted to 126 bytes, MS has extended this in MS-DOS 7.x : there's
> supposed to exist an environment variable CMDLINE, which may hold a cmdline
> of "any" size. HX's Win32 kernel emulation dkrnl32 uses this environment
> variable if possible.

FWIW, the details of the command-line tail are as follows.

There are 128 bytes set aside in the PSP for the command-tail. The first byte (at PSP:[80h]) is the size (number of bytes) of the command-tail, and the last byte of the command-tail is always an ASCII 13 (Carriage Return or CR). That's why the command-tail size is limited to 126 bytes. In my programs (usually written in ASM), I normally ignore the size and just look for the CR.

The way the CMDLINE environment variable works is that if the command-tail is more than 126 bytes, the PSP still contains the first part of the tail and the size is shown as maximum (7Eh or 126). There is no direct indication anywhere that what's in the PSP is incomplete.

What I do in my programs is check the size byte (at PSP:[80h]) and if it's maxed out (7Eh) I look for a CMDLINE environment variable. If CMDLINE exists, I use it. If not, I revert back to what's in the PSP.

Others have adopted what MS did with Windows (e.g., when working at an NT command prompt there is a CMDLINE variable) so there is no way to test ahead of time whether it's supported or not. Also, a user can set an environment variable called CMDLINE to whatever they want so the mere presence of a CMDLINE variable doesn't guarantee anything about the command-tail.

Also, some environments/programs may assume that the command-tail is never more than 126 bytes, so they may not be able to process a bigger one properly (they may end up processing memory that isn't actually part of the command-tail).
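
The check described above can be sketched in C. This is a minimal illustration under stated assumptions, not code from HX or from any of the programs mentioned: the function names are hypothetical, psp_tail stands for a copy of the 128 bytes at PSP:[80h], and cmdline_env is whatever getenv("CMDLINE") returned (possibly NULL). The thread does not say whether CMDLINE also holds the program name, so the sketch copies its value unchanged.

```c
#include <stddef.h>
#include <string.h>

#define PSP_TAIL_MAX 126   /* 128 bytes minus the length byte and the CR */

/* Copy at most out_size - 1 bytes and always NUL-terminate. */
static size_t copy_bounded(char *out, size_t out_size,
                           const char *src, size_t n)
{
    if (out_size == 0)
        return 0;
    if (n > out_size - 1)
        n = out_size - 1;
    memcpy(out, src, n);
    out[n] = '\0';
    return n;
}

/* Pick the command tail: use CMDLINE only when the PSP length byte is
 * maxed out (>= 7Eh) and the variable exists; otherwise use the PSP
 * copy, scanning for the CR instead of trusting the length byte. */
size_t pick_command_tail(const unsigned char *psp_tail,
                         const char *cmdline_env,
                         char *out, size_t out_size)
{
    size_t n = 0;

    /* Never scan past the 126 bytes the PSP can hold. */
    while (n < PSP_TAIL_MAX && psp_tail[1 + n] != '\r')
        n++;

    if (psp_tail[0] >= PSP_TAIL_MAX && cmdline_env != NULL)
        return copy_bounded(out, out_size, cmdline_env, strlen(cmdline_env));

    return copy_bounded(out, out_size, (const char *)psp_tail + 1, n);
}
```

Both copies are bounded by the destination size, which is the kind of guard that would avoid running past a segment limit with an over-long tail.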

Rugxulo

Usono,
05.02.2026, 23:09

@ bretjohn
 

trip report: Tiny C Compiler 0.9.27

> There are 128 bytes set aside in the PSP for the command-tail. The first
> byte (at PSP:[80h]) is the size (number of bytes) in the command-tail, and
> the last byte of the command-tail is always an ASCII 13 (Carriage Return or
> CR). That's why the command-tail size is limited to 126 bytes. In my
> programs (usually written in ASM), I normally ignore the size and just look
> for the CR.

Not sure the exact limit. Maybe you mean minus the length byte and actual ending CR. I know the DOS4GW extender couldn't handle longer than 126 or so. I had thought I read %CMDLINE% was only used when 7Fh or 80h (Borland??) length was found. But yeah, it's a Win9x feature, and FreeCOM (FreeDOS) supports it, so does DJGPP.

FYI, the core OpenWatcom tools should support longer cmdlines via env. var. if you put a '*' (asterisk) before them in your makefiles (using Wmake).

bretjohn

Rio Rancho, NM,
06.02.2026, 01:59

@ Rugxulo
 

trip report: Tiny C Compiler 0.9.27

> Not sure the exact limit. Maybe you mean minus the length byte and actual
> ending CR.

That's exactly what I meant.

> I know the DOS4GW extender couldn't handle longer than 126 or
> so. I had thought I read %CMDLINE% was only used when 7Fh or 80h
> (Borland??) length was found.

I don't think you can depend on that in all cases (basically, an illegal value in the length byte). E.g., I know some programs that perform an EXEC call don't set up the command-tail portion of the call correctly (in particular, they don't always set the size byte correctly). So if your program gets EXEC'd from another program you can't always depend on the size byte being correct (that's part of the reason I ignore it except to detect the potential presence of %CMDLINE%). In my programs I test for the length byte being >=7Eh, not exactly 7Eh, before looking for %CMDLINE%. But I also don't assume that a big value means I'll necessarily find %CMDLINE%. And as I stated earlier, the user (or another program) can set %CMDLINE%, so just because it exists doesn't mean it's associated with the command-tail.

If you're writing in a high-level language you usually depend on the compiler to handle all those details for you, and a lot of them (especially older ones) don't do it correctly with modern versions of DOS. At least in some cases you can bypass what the compiler does and look at the PSP and environment space yourself, but that sort of defeats the purpose of using a high-level language (it's supposed to handle those kinds of details automatically). When MS (or somebody else) changes the rules you can end up with messy crashes and bugs.

bencollver

04.02.2026, 21:38

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

> p.s. I could probably work around this first issue by using TCC's @file
> syntax for the overlong command line options.

I verified that i can use TCC's @file syntax to work around this issue on SvarDOS.

Taking a cue from tkchia, i disassembled the hex bytes from the second error message, and this time i found the hex bytes in TCC using mingw's objdump command.

C:\TCC0927>objdump -D libtcc.o
...
00009c51: <__sym_malloc>:
...
    9ce4:       89 08                   mov %ecx,(%eax)
    9ce6:       8b 85 f8 ff ff ff       mov -0x8(%ebp),%eax
    9cec:       89 85 f4 ff ff ff       mov %eax,-0xc(%ebp)


I found __sym_malloc() in tccgen.c

static Sym *__sym_malloc(void)
{
    Sym *sym_pool, *sym, *last_sym;
    int i;

    sym_pool = tcc_malloc(SYM_POOL_NB * sizeof(Sym));
    dynarray_add(&sym_pools, &nb_sym_pools, sym_pool);

    last_sym = sym_free_first;
    sym = sym_pool;
    for(i = 0; i < SYM_POOL_NB; i++) {
        sym->next = last_sym;
        last_sym = sym;
        sym++;
    }
    sym_free_first = last_sym;
    return last_sym;
}


I believe the disassembly corresponds to the for(;;) loop body, probably at "sym->next = last_sym". I am not sure i am ready to debug this one.

tkchia

05.02.2026, 16:44

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

Hello bencollver,

> I believe the disassembly corresponds to the for(;;) loop body, probably at
> "sym->next = last_sym". I am not sure i am ready to debug this one.

__sym_malloc seems to be used to extend a pool of free struct Sym thingies (TCC can then allocate new struct Sym objects from this free pool). It is odd that the routine is crashing.
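
For readers unfamiliar with the pattern, here is a minimal free-list pool in the same spirit. It is a simplified sketch, not TCC's code: Node, POOL_NB and the function names are hypothetical, and the pools are never released.

```c
#include <stdlib.h>

#define POOL_NB 4                 /* hypothetical; tiny on purpose */

typedef struct Node {
    int value;
    struct Node *next;
} Node;

static Node *free_first;          /* head of the free list */

/* Allocate POOL_NB nodes at once and push each onto the free list.
 * Returns the new head, or NULL if malloc fails. */
Node *pool_grow(void)
{
    Node *pool = malloc(POOL_NB * sizeof(Node));
    Node *last = free_first;
    int i;

    if (pool == NULL)
        return NULL;
    for (i = 0; i < POOL_NB; i++) {   /* the bound must match the allocation */
        pool[i].next = last;
        last = &pool[i];
    }
    free_first = last;
    return last;
}

/* Pop a node from the free list, growing the pool when it is empty. */
Node *pool_alloc(void)
{
    Node *n = free_first ? free_first : pool_grow();

    if (n != NULL)
        free_first = n->next;
    return n;
}

/* Push a node back onto the free list. */
void pool_free(Node *n)
{
    n->next = free_first;
    free_first = n;
}
```

The loop only stays inside the allocation if its bound equals the number of nodes that were actually allocated.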

Anyway, here is the machine code output that I got for this routine after compiling tccgen.c on my end (on a Linux machine):

00000a18 <__sym_malloc>:
     a18:       55                      push   %ebp
     a19:       89 e5                   mov    %esp,%ebp
     a1b:       81 ec 10 00 00 00       sub    $0x10,%esp
     a21:       b8 ec 1f 00 00          mov    $0x1fec,%eax
     a26:       50                      push   %eax
     a27:       e8 fc ff ff ff          call   a28 <__sym_malloc+0x10>
                        a28: R_386_PC32 tcc_malloc
     a2c:       83 c4 04                add    $0x4,%esp
     a2f:       89 45 fc                mov    %eax,-0x4(%ebp)
     a32:       8b 45 fc                mov    -0x4(%ebp),%eax
     a35:       50                      push   %eax
     a36:       b8 00 00 00 00          mov    $0x0,%eax
                        a37: R_386_32   nb_sym_pools
     a3b:       50                      push   %eax
     a3c:       b8 00 00 00 00          mov    $0x0,%eax
                        a3d: R_386_32   sym_pools
     a41:       50                      push   %eax
     a42:       e8 fc ff ff ff          call   a43 <__sym_malloc+0x2b>
                        a43: R_386_PC32 dynarray_add
     a47:       83 c4 0c                add    $0xc,%esp
     a4a:       8b 05 00 00 00 00       mov    0x0,%eax
                        a4c: R_386_32   sym_free_first
     a50:       89 45 f4                mov    %eax,-0xc(%ebp)
     a53:       8b 45 fc                mov    -0x4(%ebp),%eax
     a56:       89 45 f8                mov    %eax,-0x8(%ebp)
     a59:       b8 00 00 00 00          mov    $0x0,%eax
     a5e:       89 45 f0                mov    %eax,-0x10(%ebp)
     a61:       8b 45 f0                mov    -0x10(%ebp),%eax
     a64:       81 f8 e3 00 00 00       cmp    $0xe3,%eax
     a6a:       0f 83 2e 00 00 00       jae    a9e <__sym_malloc+0x86>
     a70:       e9 0b 00 00 00          jmp    a80 <__sym_malloc+0x68>
     a75:       8b 45 f0                mov    -0x10(%ebp),%eax
     a78:       89 c1                   mov    %eax,%ecx
     a7a:       40                      inc    %eax
     a7b:       89 45 f0                mov    %eax,-0x10(%ebp)
     a7e:       eb e1                   jmp    a61 <__sym_malloc+0x49>
     a80:       8b 45 f8                mov    -0x8(%ebp),%eax
     a83:       83 c0 18                add    $0x18,%eax
     a86:       8b 4d f4                mov    -0xc(%ebp),%ecx
     a89:       89 08                   mov    %ecx,(%eax)
     a8b:       8b 45 f8                mov    -0x8(%ebp),%eax
     a8e:       89 45 f4                mov    %eax,-0xc(%ebp)
     a91:       8b 45 f8                mov    -0x8(%ebp),%eax
     a94:       89 c1                   mov    %eax,%ecx
     a96:       83 c0 24                add    $0x24,%eax
     a99:       89 45 f8                mov    %eax,-0x8(%ebp)
     a9c:       eb d7                   jmp    a75 <__sym_malloc+0x5d>
     a9e:       8b 45 f4                mov    -0xc(%ebp),%eax
     aa1:       89 05 00 00 00 00       mov    %eax,0x0
                        aa3: R_386_32   sym_free_first
     aa7:       8b 45 f4                mov    -0xc(%ebp),%eax
     aaa:       c9                      leave
     aab:       c3                      ret


(The R_386_PC32 and R_386_32 are ELF relocations.) If the compiler output something different on your end -- modulo different instruction encodings -- then it might be a case of miscompilation.

Thank you!

---
https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI"

bencollver

05.02.2026, 18:34
(edited by bencollver, 05.02.2026, 19:01)

@ tkchia
 

trip report: Tiny C Compiler 0.9.27

> Hello bencollver,
>
> > I believe the disassembly corresponds to the for(;;) loop body, probably at
> > "sym->next = last_sym". I am not sure i am ready to debug this one.
>
> __sym_malloc seems to be used to extend a pool of free
> struct Sym thingies (TCC can then allocate new struct
> Sym objects from this free pool). It is odd that the routine is
> crashing.

I also thought it was odd. tcc_malloc() should either return the requested memory, or it should end the program with an error.

> Anyway, here is the machine code output that I got for this routine after
> compiling tccgen.c on my end (on a Linux machine):
> ...
> (The R_386_PC32 and R_386_32 are ELF
> relocations.) If the compiler output something different on your end --
> modulo different instruction encodings -- then it might be a case of
> miscompilation.

Thank you for your help!

I compiled tccgen.c, used mingw objdump -D tccgen.o, and then zeroed out the address labels before comparing them to get this diff:

--- a   2026-02-05 09:17:52.125688241 -0800
+++ b   2026-02-05 09:20:17.973684566 -0800
@@ -2,53 +2,48 @@
      000:       55                      push   %ebp
      000:       89 e5                   mov    %esp,%ebp
      000:       81 ec 10 00 00 00       sub    $0x10,%esp
-     000:       b8 ec 1f 00 00          mov    $0x1fec,%eax
+     000:       90                      nop
+     000:       b8 e0 1f 00 00          mov    $0x1fe0,%eax
      000:       50                      push   %eax
-     000:       e8 fc ff ff ff          call   a28 <__sym_malloc+0x10>
-                        a28: R_386_PC32 tcc_malloc
+     000:       e8 fc ff ff ff          call   c5f <__sym_malloc+0x11>
      000:       83 c4 04                add    $0x4,%esp
-     000:       89 45 fc                mov    %eax,-0x4(%ebp)
-     000:       8b 45 fc                mov    -0x4(%ebp),%eax
+     000:       89 85 fc ff ff ff       mov    %eax,-0x4(%ebp)
+     000:       8b 85 fc ff ff ff       mov    -0x4(%ebp),%eax
      000:       50                      push   %eax
      000:       b8 00 00 00 00          mov    $0x0,%eax
-                        a37: R_386_32   nb_sym_pools
      000:       50                      push   %eax
      000:       b8 00 00 00 00          mov    $0x0,%eax
-                        a3d: R_386_32   sym_pools
      000:       50                      push   %eax
-     000:       e8 fc ff ff ff          call   a43 <__sym_malloc+0x2b>
-                        a43: R_386_PC32 dynarray_add
+     000:       e8 fc ff ff ff          call   c80 <__sym_malloc+0x32>
      000:       83 c4 0c                add    $0xc,%esp
      000:       8b 05 00 00 00 00       mov    0x0,%eax
-                        a4c: R_386_32   sym_free_first
-     000:       89 45 f4                mov    %eax,-0xc(%ebp)
-     000:       8b 45 fc                mov    -0x4(%ebp),%eax
-     000:       89 45 f8                mov    %eax,-0x8(%ebp)
+     000:       89 85 f4 ff ff ff       mov    %eax,-0xc(%ebp)
+     000:       8b 85 fc ff ff ff       mov    -0x4(%ebp),%eax
+     000:       89 85 f8 ff ff ff       mov    %eax,-0x8(%ebp)
      000:       b8 00 00 00 00          mov    $0x0,%eax
-     000:       89 45 f0                mov    %eax,-0x10(%ebp)
-     000:       8b 45 f0                mov    -0x10(%ebp),%eax
-     000:       81 f8 e3 00 00 00       cmp    $0xe3,%eax
-     000:       0f 83 2e 00 00 00       jae    a9e <__sym_malloc+0x86>
-     000:       e9 0b 00 00 00          jmp    a80 <__sym_malloc+0x68>
-     000:       8b 45 f0                mov    -0x10(%ebp),%eax
+     000:       89 85 f0 ff ff ff       mov    %eax,-0x10(%ebp)
+     000:       8b 85 f0 ff ff ff       mov    -0x10(%ebp),%eax
+     000:       83 f8 cc                cmp    $0xffffffcc,%eax
+     000:       0f 83 4c 00 00 00       jae    d05 <__sym_malloc+0xb7>
+     000:       e9 14 00 00 00          jmp    cd2 <__sym_malloc+0x84>
+     000:       8b 85 f0 ff ff ff       mov    -0x10(%ebp),%eax
      000:       89 c1                   mov    %eax,%ecx
      000:       40                      inc    %eax
-     000:       89 45 f0                mov    %eax,-0x10(%ebp)
-     000:       eb e1                   jmp    a61 <__sym_malloc+0x49>
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
+     000:       89 85 f0 ff ff ff       mov    %eax,-0x10(%ebp)
+     000:       e9 d8 ff ff ff          jmp    caa <__sym_malloc+0x5c>
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax
      000:       83 c0 18                add    $0x18,%eax
-     000:       8b 4d f4                mov    -0xc(%ebp),%ecx
+     000:       8b 8d f4 ff ff ff       mov    -0xc(%ebp),%ecx
      000:       89 08                   mov    %ecx,(%eax)
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
-     000:       89 45 f4                mov    %eax,-0xc(%ebp)
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax
+     000:       89 85 f4 ff ff ff       mov    %eax,-0xc(%ebp)
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax
      000:       89 c1                   mov    %eax,%ecx
-     000:       83 c0 24                add    $0x24,%eax
-     000:       89 45 f8                mov    %eax,-0x8(%ebp)
-     000:       eb d7                   jmp    a75 <__sym_malloc+0x5d>
-     000:       8b 45 f4                mov    -0xc(%ebp),%eax
+     000:       83 c0 28                add    $0x28,%eax
+     000:       89 85 f8 ff ff ff       mov    %eax,-0x8(%ebp)
+     000:       e9 b9 ff ff ff          jmp    cbe <__sym_malloc+0x70>
+     000:       8b 85 f4 ff ff ff       mov    -0xc(%ebp),%eax
      000:       89 05 00 00 00 00       mov    %eax,0x0
-                        aa3: R_386_32   sym_free_first
-     000:       8b 45 f4                mov    -0xc(%ebp),%eax
-     000:       c9                      leave
-     000:       c3                      ret
+     000:       8b 85 f4 ff ff ff       mov    -0xc(%ebp),%eax
+     000:       c9                      leave 
+     000:       c3                      ret


I don't see the ELF relocations in mingw's objdump output.

For the three instructions in the crash error message, the only difference is that on my win32 version the instructions have "ff ff ff" appended to them:

      000:       89 08                   mov    %ecx,(%eax)
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
-     000:       89 45 f4                mov    %eax,-0xc(%ebp)
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax
+     000:       89 85 f4 ff ff ff       mov    %eax,-0xc(%ebp)
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax


p.s.

I compared the output of the official win32 build of tcc 0.9.27 to my watcom bootstrapped build:

      000:       89 08                   mov    %ecx,(%eax)
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
-     000:       89 45 f4                mov    %eax,-0xc(%ebp)
-     000:       8b 45 f8                mov    -0x8(%ebp),%eax
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax
+     000:       89 85 f4 ff ff ff       mov    %eax,-0xc(%ebp)
+     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax


The only place i saw ELF relocations was in your disassembly.

I guess i could try and figure out where those extra "ff ff ff" bytes are coming from.

bencollver

05.02.2026, 22:22
(edited by bencollver, 05.02.2026, 22:49)

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

>       000:       89 08                   mov    %ecx,(%eax)
> -     000:       8b 45 f8                mov    -0x8(%ebp),%eax
> -     000:       89 45 f4                mov    %eax,-0xc(%ebp)
> -     000:       8b 45 f8                mov    -0x8(%ebp),%eax
> +     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax
> +     000:       89 85 f4 ff ff ff       mov    %eax,-0xc(%ebp)
> +     000:       8b 85 f8 ff ff ff       mov    -0x8(%ebp),%eax

>
> I guess i could try and figure out where those extra "ff ff ff" bytes are
> coming from.

The extra "ff ff ff" bytes are coming from i386-gen.c in the gen_modrm() function.

static void gen_modrm(int op_reg, int r, Sym *sym, int c)
{
...
        if (c == (char)c) {
            /* short reference */
            o(0x45 | op_reg);
            g(c);
        } else {
            oad(0x85 | op_reg, c);
        }


When c = 0xfffffff8, TCC takes the "short reference" branch while Watcom takes the "else" branch. What is this if statement actually testing?

Here is the commit where the code in question came from:

https://repo.or.cz/tinycc.git?a=commit;h=21c35b94437178b4a9ee50e6688f259a6bcc26da


I can't say i completely understand it, but i've changed my sources to the following:


#ifdef __WATCOMC__
        if (c == (signed char)c) {
#else
        if (c == (char)c) {
#endif
            /* short reference */
            o(0x45 | op_reg);
            g(c);
        } else {
            oad(0x85 | op_reg, c);
        }


And now the bootstrapped compiler emits the same code as the reference compiler and it can self-host on DOS.

I'll continue testing it... Thanks for the help!

tkchia

05.02.2026, 23:13

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

Hello bencollver,

Glad you are making progress. :-)

> When c = 0xfffffff8, TCC takes the "short reference" branch while Watcom
> takes the "else" branch. What is this if statement actually
> testing?

TCC directly generates machine code, and this is apparently where it emits a ModR/M byte and an address displacement for certain operand pairs (e.g. -0x10(%ebp), %eax).

The conditional is supposed to test if the displacement (here -0x10) is in the range -128 to +127. If it is, then TCC can opt to encode a short 1-byte displacement rather than a 4-byte one.
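
A small sketch of that range test, for illustration only. The helper names are hypothetical and not TCC's. The two cast-based variants show how the answer depends on whether the cast goes to signed char or unsigned char. Converting an out-of-range int to signed char is implementation-defined in C, so the explicit comparison is the portable form.

```c
/* Portable form: does c fit in a sign-extended 8-bit displacement? */
int fits_disp8(int c)
{
    return c >= -128 && c <= 127;
}

/* Cast-based form with an explicitly signed char. */
int fits_disp8_signed_cast(int c)
{
    return c == (signed char)c;
}

/* What the cast-based form computes when char is unsigned. */
int fits_disp8_unsigned_cast(int c)
{
    return c == (unsigned char)c;
}
```

For c = -8 the unsigned variant answers "no", which only costs three extra bytes per instruction. For c = 204 it answers "yes", and the emitted byte 0xCC is then read back by the CPU as -52.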

Thank you!

---
https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI"

Rugxulo

Usono,
05.02.2026, 23:16

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

> The extra "ff ff ff" bytes are coming from i386-gen.c in the gen_modrm()
> function.

I assume this is just slightly smaller assembly code using signed byte instead of full dword offset.

> static void gen_modrm(int op_reg, int r, Sym *sym, int c)
> {
> ...
> if (c == (char)c) {
> /* short reference */
> o(0x45 | op_reg);
> g(c);
> } else {
> oad(0x85 | op_reg, c);
> }

>
> When c = 0xfffffff8, TCC takes the "short reference" branch while Watcom
> takes the "else" branch. What is this if statement actually
> testing?

In C, char literals (and char arguments to string functions) are usually "int" (to also allow EOF, aka (int)-1).

Here it's probably checking whether the byte (char) is the same as the int, that the low byte of the int is the same as the char byte.

> I can't say i completely understand it, but i've changed my sources to the
> following:
>
> #ifdef __WATCOMC__
> if (c == (signed char)c) {

I forget what OpenWatcom and DJGPP use by default for char. It might be "signed char". (I think K&R only introduced "unsigned" for int or long first.)

There should be a cmdline switch to toggle char signedness by default. I would assume using that (and avoiding a patch) is cleaner (assuming that doesn't break anything).
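
One way to find out what a given compiler does with its current options is to ask <limits.h>. A minimal sketch (the function name is hypothetical):

```c
#include <limits.h>

/* Returns 1 if plain char is signed with this compiler and these
 * options, 0 if it is unsigned. CHAR_MIN is 0 for an unsigned char. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Building this once with and once without the signedness switch would show whether the switch took effect.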

> And now the bootstrapped compiler emits the same code as the reference
> compiler and it can self-host on DOS.

I only barely run TCC under HXRT when needed (with old ReactOS MSVCRT.DLL).

tkchia

05.02.2026, 23:18

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

Hello bencollver,

In addition, it seems that the miscompilation that ultimately caused the crash was this:

     000:       83 f8 cc                cmp    $0xffffffcc,%eax

This is apparently the i < SYM_POOL_NB part, and SYM_POOL_NB (= (8192 / sizeof(Sym))) should definitely not be 0xffffffcc.
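
Reading the diff above, one possible explanation (an inference, not something confirmed in this thread) is that 0xCC is the intended loop bound and only its encoding is wrong. The Watcom-built compiler advances the pointer by 0x28 bytes per Sym, and 8192 / 0x28 = 204 = 0xCC. Opcode 83 with an 8-bit immediate is sign-extended by the CPU, so the byte 0xCC becomes 0xffffffcc. The sketch below only reproduces that arithmetic; the function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend an 8-bit immediate to 32 bits, as the CPU does for the
 * short (opcode 0x83) immediate forms. */
uint32_t sign_extend_imm8(uint8_t imm)
{
    return (imm & 0x80u) ? (0xFFFFFF00u | imm) : imm;
}

/* Number of Sym objects per pool for a given sizeof(Sym),
 * following SYM_POOL_NB = 8192 / sizeof(Sym). */
unsigned pool_count(unsigned sym_size)
{
    return 8192u / sym_size;
}
```

pool_count(0x28) is 204 (0xCC) and pool_count(0x24) is 227 (0xE3), which matches the two allocation sizes 0x1fe0 and 0x1fec in the diff, and sign_extend_imm8(0xCC) is 0xffffffcc.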

Thank you!

---
https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI"

tkchia

06.02.2026, 23:45

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

Hello bencollver,

I think c == (signed char)c here will be correct even under compilers other than Watcom. (And, c == (unsigned char)c here is definitely always wrong.) Maybe I should file a bug report with the upstream TCC project.

Thank you!

---
https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI"

bencollver

11.02.2026, 00:44

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

I was chugging along replacing TCC's MSVCRT.DLL dependency with Watcom's libc. I got the compiler to self-host. That was fun!

Then i compiled mawk and the floating point math gave bad results. I traced it to parse_number() in tccpp.c:

            } else {
                tok = TOK_CDOUBLE;
                tokc.d = strtod(token_buf, NULL);
            }


I can import strtod() from Watcom's mt7s19.dll, but Watcom's stack calling convention isn't strictly cdecl. The arguments pass in fine, but results are returned in EDX:EAX where TCC expected them in ST0.

From: i386-gen.c

#define REG_FRET TREG_ST0 /* float return register */

From cguide.pdf:

10.5.2 Returning Values in 80x87-based Applications

When using the stack-based calling conventions with "fpi" or "fpi87", floating-point values are returned in registers. Single precision values are returned in EAX, and double precision values are returned in EDX:EAX
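
To make the mismatch concrete: if a callee leaves a double's bit pattern in EDX:EAX, a caller that reads ST0 never sees it. The sketch below, which assumes IEEE 754 doubles and uses a hypothetical function name, shows how the two 32-bit halves map back to a double.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from its high (EDX) and low (EAX) 32-bit halves.
 * Assumption: double is a 64-bit IEEE 754 value. */
double double_from_edx_eax(uint32_t edx, uint32_t eax)
{
    uint64_t bits = ((uint64_t)edx << 32) | eax;
    double d;

    memcpy(&d, &bits, sizeof d);   /* reinterpret the bits, no conversion */
    return d;
}
```

For example, EDX = 0x3FF00000 with EAX = 0 is the bit pattern of 1.0.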


I tried to change TCC to use Watcom's calling convention, but i didn't get it right.

If i read the documentation correctly, it could be easy to write a wrapper DLL. With Watcom's -ecc option, the following wrapper function should automagically translate between cdecl and Watcom's calling convention.

double wtfmath_strtod(const char *s, char **r) {
    return strtod(s, r);
}


That's all i have brain juice for today.

Rugxulo

Usono,
11.02.2026, 01:42

@ bencollver
 

trip report: Tiny C Compiler 0.9.27

> I tried to change TCC to use Watcom's calling convention, but i didn't get
> it right.
>
> If i read the documentation correctly, it could be easy to write a wrapper
> DLL. With Watcom's -ecc option, the following wrapper function should
> automagically translate between cdecl and Watcom's calling convention.
>
> double wtfmath_strtod(const char *s, char **r) {
> return strtod(s, r);
> }

>
> That's all i have brain juice for today.

Do you mean -3s doesn't work? Default is something like -3 (register calling convention). I just assume you knew -3s also existed and tried that.

bencollver

11.02.2026, 02:06

@ Rugxulo
 

trip report: Tiny C Compiler 0.9.27

> Do you mean -3s doesn't work? Default is something like -3 (register
> calling convention). I just assume you knew -3s also existed and tried
> that.

That's right, i am using -3s and it doesn't work for floating point functions when i import them from mt7s19.dll. With Watcom's -3s stack calling convention, a double result is returned in EDX:EAX (and a float in EAX).

Ironically, with Watcom's -3 register calling convention, a floating point result is returned in ST0 as TCC expects, but then the function arguments don't match up.

I'll probably write a short script to generate wrapper functions and keep my fingers crossed that it works as expected.
