bencollver

04.02.2026, 17:02 |
trip report: Tiny C Compiler 0.9.27 (Miscellaneous) |
First Attempt:
I bootstrapped the Tiny C Compiler 0.9.27 using Watcom 1.9 and HXRT on SvarDOS. After some work, Watcom successfully built it. But the bootstrapped TCC.EXE couldn't build its own runtime, libtcc1.a.
I got the following error message:
Exception 0D
EAX=00000287 EBX=0000547F ECX=00000008 EDX=00000274 ESI=00000100
EDI=00000100 EBP=00005878 ESP=000056E2 EFL=00013246 EIP=000009E9
CS=0097 (00111000,000058AF,00FB) SS=008F (00111000,000058AF,40F3)
DS=009F (000132A0,0000010F,00F3) ES=00D7 (00012E30,000000FF,00F3)
FS=0000 (********,********,****) GS=0000 (********,********,****)
LDTR=0038 (FF80F000,00000FFF,0082) TR=0030 (FF80AD70,00000067,008B)
ERRC=0000 (********,********,****) PTE 1. Page LDT=01FCF467
GDTR=07FF:FF80E0000 IDTR=07FF:FF80E800 PTE CR2=00000027
CR0=80000011 CR2=00000000 CR3=01FDC000 CR4=00000200 TSS:ESP0=00002290
DR0-3=00000000 00000000 00000000 00000000 DR6=FFFF0FF0 DR7=00000400
LPMS=0087(01) RMS=1B51:0200 cRMCB=0000 IRQ=000000 ISR=0000
[EIP]=F3 67 A4 F7 C7 00 FF 75 03 B0 0D AA
[ESP]=4A3F 0000 473C 0040 5878 0000 5702 0000
000056F2=547F 0000 0274 0000 005B 0000 0204 0000
00005702=008F 008F 289D 00C7 5726 0000 0040 0040
00005712=5878 0000 572A 0000 R000 0000 570A 0000
00005722=0001 0000 0507 0002 00C7 4550 0000 014C
00005732=0004 3D9F 6981 0000 0000 0000 0000 00E0
Syntax Error
The command was:
.\tcc -m32 -ar cr lib/32/libtcc1.a libtcc1.o crt1.o crt1w.o wincrt1.o wincrt1w.o dllcrt1.o dllmain.o chkstk.o bcheck.o alloca86.o oca86-bt.o
It succeeded when removing oca86-bt.o from the end.
It also succeeded when removing all .o files EXCEPT for oca86-bt.o.
Try as i might, i could not resolve this issue on SvarDOS.
It worked just fine on FreeDOS where it successfully bootstrapped the compiler and built its own runtime.
From HXRT.TXT:
HX has been tested to run with MS-DOS v5/6/7 and FreeDOS v1.0. It might work with other DOS versions, but this is untested and not recommended.
Second Attempt:
I bootstrapped the Tiny C Compiler 0.9.27 using Watcom 1.9 and HXRT on FreeDOS. It successfully self-hosted using the bootstrapped TCC.EXE. But the self-hosted TCC.EXE failed to run.
I got the following error message:
dkrnl32: exception C0000005, flags=0 occurred at BF:3AACE4
ax=718024 bx=4034EC cx=717FE4 dx=41000C
si=400000 di=113B2D bp=226EFC sp=226EEC
exception caused by access to memory address 718024
ip = Module 'libtcc.dll'+ACE4
[eip] = 89 08 8B 85 F8 FF FF FF 89 85 F4 FF
[esp] = 000002B3 00717FE4 0071800C 00711414 00226F08 003DC8B5
dkrnl32: fatal exit!
This error happened when i ran tcc.exe regardless of the arguments.
When i ran the static linked tcc64.exe, i got a similar error, but without the libtcc.dll reference.
I couldn't resolve this issue on FreeDOS.
It worked just fine on Windows XP, where it successfully self-hosted the compiler. This self-hosted compiler worked on Windows XP, but crashed on FreeDOS with HXRT.
I ran hx\test\dostest.exe and all tests passed.
I tried various settings in the DPMILDR and HDPMI environment variables. None helped.
I tried using MS-DOS 6.22 and that didn't help either. |
tkchia

04.02.2026, 17:55 (edited by tkchia, 04.02.2026, 18:18)
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
Hello bencollver,
For the first attempt: the instructions at cs:eip disassemble to
000009E9 F367A4 rep a32 movsb
000009EC F7C700FF test di,0xff00
000009F0 7503 jnz 0x9f5
000009F2 B00D mov al,0xd
000009F4 AA stosb
when interpreted as 16-bit code...
Apparently the rep a32 movsb caused an exception because it was trying to write to es:[0x100], and the upper limit of es was 0xff.
My wild guess is that the command line arguments as seen by tcc were too long, and some part(s) of the runtime (maybe the bootstrapping C library, or maybe HXRT, etc. ...) did not handle over-long command line arguments properly. I suppose it might be useful to see where the above code instructions were coming from...
Thank you! --- https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI" |
bencollver

04.02.2026, 18:48 (edited by bencollver, 04.02.2026, 19:08)
@ tkchia
|
trip report: Tiny C Compiler 0.9.27 |
> Hello bencollver,
>
> For the first attempt:
>
> I suppose it might be useful to see where
> the above code instructions were coming from...
Hi Tkchia,
Thanks for that information! I vote HXRT. I found those hex bytes in DPMILD32.EXE and here is a link to the code corresponding to your disassembly:
https://github.com/Baron-von-Riedesel/HX/blob/master/Src/DPMILDR/DPMILDR.ASM#L2864
Interesting that this behaves differently in FreeDOS than in SvarDOS.
Note that i can compile and run hello world with the bootstrapped TCC.EXE. It's not until i try to build TCC with the bootstrapped TCC.EXE that i run into problems on FreeDOS.
p.s. I could probably work around this first issue by using TCC's @file syntax for the overlong command line options. |
Japheth

Germany (South), 04.02.2026, 21:12
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
> https://github.com/Baron-von-Riedesel/HX/blob/master/Src/DPMILDR/DPMILDR.ASM#L2864
>
> Interesting that this behaves differently in FreeDOS than in SvarDOS.
It's perhaps a COMMAND.COM issue.
I cannot remember all details currently, but since the cmdline in the PSP is restricted to 126 bytes, MS has extended this in MS-DOS 7.x : there's supposed to exist an environment variable CMDLINE, which may hold a cmdline of "any" size. HX's Win32 kernel emulation dkrnl32 uses this environment variable if possible. --- MS-DOS forever! |
bencollver

04.02.2026, 21:38
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
> p.s. I could probably work around this first issue by using TCC's @file
> syntax for the overlong command line options.
I verified that i can use TCC's @file syntax to work around this issue on SvarDOS.
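A minimal sketch of how such a response file could be generated, in case the object list is built by a tool rather than typed by hand. The helper name write_response_file and the one-argument-per-line layout are assumptions for illustration; it assumes TCC splits the file contents on whitespace.

```c
#include <stdio.h>

/* Hypothetical helper: write each argument on its own line of a response
   file, so the real command line only needs to carry "@path".
   Returns 0 on success, -1 on any I/O error. */
int write_response_file(const char *path, const char *const *args, int count)
{
    FILE *f = fopen(path, "w");
    int i;

    if (f == NULL)
        return -1;

    for (i = 0; i < count; i++) {
        if (fprintf(f, "%s\n", args[i]) < 0) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The archive step would then be invoked with the short form, for example .\tcc @objs.rsp, where objs.rsp is a made-up file name.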
Taking a cue from tkchia, i disassembled the hex bytes from the second error message, and this time i found the hex bytes in TCC using mingw's objdump command.
C:\TCC0927>objdump -D libtcc.o
...
00009c51: <__sym_malloc>:
...
9ce4: 89 08 mov %ecx,(%eax)
9ce6: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
9cec: 89 85 f4 ff ff ff mov %eax,-0xc(%ebp)
I found __sym_malloc() in tccgen.c
static Sym *__sym_malloc(void)
{
Sym *sym_pool, *sym, *last_sym;
int i;
sym_pool = tcc_malloc(SYM_POOL_NB * sizeof(Sym));
dynarray_add(&sym_pools, &nb_sym_pools, sym_pool);
last_sym = sym_free_first;
sym = sym_pool;
for(i = 0; i < SYM_POOL_NB; i++) {
sym->next = last_sym;
last_sym = sym;
sym++;
}
sym_free_first = last_sym;
return last_sym;
}
I believe the disassembly corresponds to the for(;;) loop body, probably at "sym->next = last_sym". I am not sure i am ready to debug this one. |
bencollver

04.02.2026, 21:44
@ Japheth
|
trip report: Tiny C Compiler 0.9.27 |
> It's perhaps a COMMAND.COM issue.
>
> I cannot remember all details currently, but since the cmdline in the PSP
> is restricted to 126 bytes, MS has extended this in MS-DOS 7.x : there's
> supposed to exist an environment variable CMDLINE, which may hold a cmdline
> of "any" size. HX's Win32 kernel emulation dkrnl32 uses this environment
> variable if possible.
Thanks for this detail!
Coincidentally, i've used an environment variable to pass command line arguments into QBASIC.EXE from a .bat file because QBASIC.EXE does not directly support using command line arguments from BASIC. |
tkchia

05.02.2026, 16:44
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
Hello bencollver,
> I believe the disassembly corresponds to the for(;;) loop body, probably at
> "sym->next = last_sym". I am not sure i am ready to debug this one.
__sym_malloc seems to be used to extend a pool of free struct Sym thingies (TCC can then allocate new struct Sym objects from this free pool). It is odd that the routine is crashing.
Anyway, here is the machine code output that I got for this routine after compiling tccgen.c on my end (on a Linux machine):
00000a18 <__sym_malloc>:
a18: 55 push %ebp
a19: 89 e5 mov %esp,%ebp
a1b: 81 ec 10 00 00 00 sub $0x10,%esp
a21: b8 ec 1f 00 00 mov $0x1fec,%eax
a26: 50 push %eax
a27: e8 fc ff ff ff call a28 <__sym_malloc+0x10>
a28: R_386_PC32 tcc_malloc
a2c: 83 c4 04 add $0x4,%esp
a2f: 89 45 fc mov %eax,-0x4(%ebp)
a32: 8b 45 fc mov -0x4(%ebp),%eax
a35: 50 push %eax
a36: b8 00 00 00 00 mov $0x0,%eax
a37: R_386_32 nb_sym_pools
a3b: 50 push %eax
a3c: b8 00 00 00 00 mov $0x0,%eax
a3d: R_386_32 sym_pools
a41: 50 push %eax
a42: e8 fc ff ff ff call a43 <__sym_malloc+0x2b>
a43: R_386_PC32 dynarray_add
a47: 83 c4 0c add $0xc,%esp
a4a: 8b 05 00 00 00 00 mov 0x0,%eax
a4c: R_386_32 sym_free_first
a50: 89 45 f4 mov %eax,-0xc(%ebp)
a53: 8b 45 fc mov -0x4(%ebp),%eax
a56: 89 45 f8 mov %eax,-0x8(%ebp)
a59: b8 00 00 00 00 mov $0x0,%eax
a5e: 89 45 f0 mov %eax,-0x10(%ebp)
a61: 8b 45 f0 mov -0x10(%ebp),%eax
a64: 81 f8 e3 00 00 00 cmp $0xe3,%eax
a6a: 0f 83 2e 00 00 00 jae a9e <__sym_malloc+0x86>
a70: e9 0b 00 00 00 jmp a80 <__sym_malloc+0x68>
a75: 8b 45 f0 mov -0x10(%ebp),%eax
a78: 89 c1 mov %eax,%ecx
a7a: 40 inc %eax
a7b: 89 45 f0 mov %eax,-0x10(%ebp)
a7e: eb e1 jmp a61 <__sym_malloc+0x49>
a80: 8b 45 f8 mov -0x8(%ebp),%eax
a83: 83 c0 18 add $0x18,%eax
a86: 8b 4d f4 mov -0xc(%ebp),%ecx
a89: 89 08 mov %ecx,(%eax)
a8b: 8b 45 f8 mov -0x8(%ebp),%eax
a8e: 89 45 f4 mov %eax,-0xc(%ebp)
a91: 8b 45 f8 mov -0x8(%ebp),%eax
a94: 89 c1 mov %eax,%ecx
a96: 83 c0 24 add $0x24,%eax
a99: 89 45 f8 mov %eax,-0x8(%ebp)
a9c: eb d7 jmp a75 <__sym_malloc+0x5d>
a9e: 8b 45 f4 mov -0xc(%ebp),%eax
aa1: 89 05 00 00 00 00 mov %eax,0x0
aa3: R_386_32 sym_free_first
aa7: 8b 45 f4 mov -0xc(%ebp),%eax
aaa: c9 leave
aab: c3 ret
(The R_386_PC32 and R_386_32 are ELF relocations.) If the compiler output something different on your end -- modulo different instruction encodings -- then it might be a case of miscompilation.
Thank you! --- https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI" |
bencollver

05.02.2026, 18:34 (edited by bencollver, 05.02.2026, 19:01)
@ tkchia
|
trip report: Tiny C Compiler 0.9.27 |
> Hello bencollver,
>
> > I believe the disassembly corresponds to the for(;;) loop body, probably at
> > "sym->next = last_sym". I am not sure i am ready to debug this one.
>
> __sym_malloc seems to be used to extend a pool of free
> struct Sym thingies (TCC can then allocate new struct
> Sym objects from this free pool). It is odd that the routine is
> crashing.
I also thought it was odd. tcc_malloc() should either return the requested memory, or it should end the program with an error.
> Anyway, here is the machine code output that I got for this routine after
> compiling tccgen.c on my end (on a Linux machine):
> ...
> (The R_386_PC32 and R_386_32 are ELF
> relocations.) If the compiler output something different on your end --
> modulo different instruction encodings -- then it might be a case of
> miscompilation.
Thank you for your help!
I compiled tccgen.c, used mingw objdump -D tccgen.o, and then zeroed out the address labels before comparing them to get this diff:
--- a 2026-02-05 09:17:52.125688241 -0800
+++ b 2026-02-05 09:20:17.973684566 -0800
@@ -2,53 +2,48 @@
000: 55 push %ebp
000: 89 e5 mov %esp,%ebp
000: 81 ec 10 00 00 00 sub $0x10,%esp
- 000: b8 ec 1f 00 00 mov $0x1fec,%eax
+ 000: 90 nop
+ 000: b8 e0 1f 00 00 mov $0x1fe0,%eax
000: 50 push %eax
- 000: e8 fc ff ff ff call a28 <__sym_malloc+0x10>
- a28: R_386_PC32 tcc_malloc
+ 000: e8 fc ff ff ff call c5f <__sym_malloc+0x11>
000: 83 c4 04 add $0x4,%esp
- 000: 89 45 fc mov %eax,-0x4(%ebp)
- 000: 8b 45 fc mov -0x4(%ebp),%eax
+ 000: 89 85 fc ff ff ff mov %eax,-0x4(%ebp)
+ 000: 8b 85 fc ff ff ff mov -0x4(%ebp),%eax
000: 50 push %eax
000: b8 00 00 00 00 mov $0x0,%eax
- a37: R_386_32 nb_sym_pools
000: 50 push %eax
000: b8 00 00 00 00 mov $0x0,%eax
- a3d: R_386_32 sym_pools
000: 50 push %eax
- 000: e8 fc ff ff ff call a43 <__sym_malloc+0x2b>
- a43: R_386_PC32 dynarray_add
+ 000: e8 fc ff ff ff call c80 <__sym_malloc+0x32>
000: 83 c4 0c add $0xc,%esp
000: 8b 05 00 00 00 00 mov 0x0,%eax
- a4c: R_386_32 sym_free_first
- 000: 89 45 f4 mov %eax,-0xc(%ebp)
- 000: 8b 45 fc mov -0x4(%ebp),%eax
- 000: 89 45 f8 mov %eax,-0x8(%ebp)
+ 000: 89 85 f4 ff ff ff mov %eax,-0xc(%ebp)
+ 000: 8b 85 fc ff ff ff mov -0x4(%ebp),%eax
+ 000: 89 85 f8 ff ff ff mov %eax,-0x8(%ebp)
000: b8 00 00 00 00 mov $0x0,%eax
- 000: 89 45 f0 mov %eax,-0x10(%ebp)
- 000: 8b 45 f0 mov -0x10(%ebp),%eax
- 000: 81 f8 e3 00 00 00 cmp $0xe3,%eax
- 000: 0f 83 2e 00 00 00 jae a9e <__sym_malloc+0x86>
- 000: e9 0b 00 00 00 jmp a80 <__sym_malloc+0x68>
- 000: 8b 45 f0 mov -0x10(%ebp),%eax
+ 000: 89 85 f0 ff ff ff mov %eax,-0x10(%ebp)
+ 000: 8b 85 f0 ff ff ff mov -0x10(%ebp),%eax
+ 000: 83 f8 cc cmp $0xffffffcc,%eax
+ 000: 0f 83 4c 00 00 00 jae d05 <__sym_malloc+0xb7>
+ 000: e9 14 00 00 00 jmp cd2 <__sym_malloc+0x84>
+ 000: 8b 85 f0 ff ff ff mov -0x10(%ebp),%eax
000: 89 c1 mov %eax,%ecx
000: 40 inc %eax
- 000: 89 45 f0 mov %eax,-0x10(%ebp)
- 000: eb e1 jmp a61 <__sym_malloc+0x49>
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
+ 000: 89 85 f0 ff ff ff mov %eax,-0x10(%ebp)
+ 000: e9 d8 ff ff ff jmp caa <__sym_malloc+0x5c>
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
000: 83 c0 18 add $0x18,%eax
- 000: 8b 4d f4 mov -0xc(%ebp),%ecx
+ 000: 8b 8d f4 ff ff ff mov -0xc(%ebp),%ecx
000: 89 08 mov %ecx,(%eax)
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
- 000: 89 45 f4 mov %eax,-0xc(%ebp)
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
+ 000: 89 85 f4 ff ff ff mov %eax,-0xc(%ebp)
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
000: 89 c1 mov %eax,%ecx
- 000: 83 c0 24 add $0x24,%eax
- 000: 89 45 f8 mov %eax,-0x8(%ebp)
- 000: eb d7 jmp a75 <__sym_malloc+0x5d>
- 000: 8b 45 f4 mov -0xc(%ebp),%eax
+ 000: 83 c0 28 add $0x28,%eax
+ 000: 89 85 f8 ff ff ff mov %eax,-0x8(%ebp)
+ 000: e9 b9 ff ff ff jmp cbe <__sym_malloc+0x70>
+ 000: 8b 85 f4 ff ff ff mov -0xc(%ebp),%eax
000: 89 05 00 00 00 00 mov %eax,0x0
- aa3: R_386_32 sym_free_first
- 000: 8b 45 f4 mov -0xc(%ebp),%eax
- 000: c9 leave
- 000: c3 ret
+ 000: 8b 85 f4 ff ff ff mov -0xc(%ebp),%eax
+ 000: c9 leave
+ 000: c3 ret
I don't see the elf relocations in mingw's objdump output.
For the three instructions in the crash error message, the only difference is that on my win32 version the instructions have "ff ff ff" appended to them:
000: 89 08 mov %ecx,(%eax)
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
- 000: 89 45 f4 mov %eax,-0xc(%ebp)
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
+ 000: 89 85 f4 ff ff ff mov %eax,-0xc(%ebp)
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
p.s.
I compared the output of the official win32 build of tcc 0.9.27 to my watcom bootstrapped build:
000: 89 08 mov %ecx,(%eax)
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
- 000: 89 45 f4 mov %eax,-0xc(%ebp)
- 000: 8b 45 f8 mov -0x8(%ebp),%eax
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
+ 000: 89 85 f4 ff ff ff mov %eax,-0xc(%ebp)
+ 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
The only place i saw ELF relocations was in your disassembly.
I guess i could try and figure out where those extra "ff ff ff" bytes are coming from. |
bencollver

05.02.2026, 22:22 (edited by bencollver, 05.02.2026, 22:49)
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
> 000: 89 08 mov %ecx,(%eax)
> - 000: 8b 45 f8 mov -0x8(%ebp),%eax
> - 000: 89 45 f4 mov %eax,-0xc(%ebp)
> - 000: 8b 45 f8 mov -0x8(%ebp),%eax
> + 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
> + 000: 89 85 f4 ff ff ff mov %eax,-0xc(%ebp)
> + 000: 8b 85 f8 ff ff ff mov -0x8(%ebp),%eax
>
> I guess i could try and figure out where those extra "ff ff ff" bytes are
> coming from.
The extra "ff ff ff" bytes are coming from i386-gen.c in the gen_modrm() function.
static void gen_modrm(int op_reg, int r, Sym *sym, int c)
{
...
if (c == (char)c) {
/* short reference */
o(0x45 | op_reg);
g(c);
} else {
oad(0x85 | op_reg, c);
}
When c = 0xfffffff8, the reference TCC build takes the "short reference" branch while my Watcom-built TCC takes the "else" branch. What is this if statement actually testing?
Here is the commit where the code in question came from:
https://repo.or.cz/tinycc.git?a=commit;h=21c35b94437178b4a9ee50e6688f259a6bcc26da
I can't say i completely understand it, but i've changed my sources to the following:
#ifdef __WATCOMC__
if (c == (signed char)c) {
#else
if (c == (char)c) {
#endif
/* short reference */
o(0x45 | op_reg);
g(c);
} else {
oad(0x85 | op_reg, c);
}
And now the bootstrapped compiler emits the same code as the reference compiler and it can self-host on DOS.
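The difference comes down to whether plain char is signed. Open Watcom treats plain char as unsigned by default (its -j switch makes it signed), while the compilers TCC is usually built with on x86 treat it as signed. A small sketch of the three variants follows; the function names are made up for illustration, and the (signed char) conversion of out-of-range values is implementation-defined, though it wraps on the common x86 compilers.

```c
/* The test as it behaves when plain char is signed. */
int fits_with_signed_char(int c)
{
    return c == (signed char)c;
}

/* The same source line as it behaves when plain char is unsigned:
   it accepts 0..255 instead of -128..127. */
int fits_with_unsigned_char(int c)
{
    return c == (unsigned char)c;
}

/* A spelling that does not depend on char signedness at all. */
int fits_disp8(int c)
{
    return c >= -128 && c <= 127;
}
```

For c = -8 (0xfffffff8) the signed test accepts and the unsigned test rejects, which only costs code size. For c = 204 it is the other way around: the unsigned test accepts a value that does not fit in a sign-extended byte, and that case produces wrong code.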
I'll continue testing it... Thanks for the help! |
bretjohn

Rio Rancho, NM, 05.02.2026, 22:24
@ Japheth
|
trip report: Tiny C Compiler 0.9.27 |
> I cannot remember all details currently, but since the cmdline in the PSP
> is restricted to 126 bytes, MS has extended this in MS-DOS 7.x : there's
> supposed to exist an environment variable CMDLINE, which may hold a cmdline
> of "any" size. HX's Win32 kernel emulation dkrnl32 uses this environment
> variable if possible.
FWIW, the details of the command-line tail are as follows.
There are 128 bytes set aside in the PSP for the command-tail. The first byte (at PSP:[80h]) is the size (number of bytes) in the command-tail, and the last byte of the command-tail is always an ASCII 13 (Carriage Return or CR). That's why the command-tail size is limited to 126 bytes. In my programs (usually written in ASM), I normally ignore the size and just look for the CR.
The way the CMDLINE environment variable works is that if the command-tail is more than 126 bytes, the PSP still contains the first part of the tail and the size is shown as maximum (7Eh or 126). There is no direct indication anywhere that what's in the PSP is incomplete.
What I do in my programs is check the size byte (at PSP:[80h]) and if it's maxed out (7Eh) I look for a CMDLINE environment variable. If CMDLINE exists, I use it. If not, I revert back to what's in the PSP.
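As a rough C sketch of that decision (the function name and buffer handling are made up for illustration, and the 128 PSP bytes are passed in as a plain array rather than read from a real PSP). The thread doesn't cover whether CMDLINE includes the program name, so the sketch copies it unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the effective command tail into out as a
   NUL-terminated string. psp_tail points at the 128 bytes at PSP:[80h];
   cmdline_env is the value of the CMDLINE variable, or NULL if unset.
   Returns 1 if CMDLINE was used, 0 if the PSP copy was used. */
int pick_command_tail(const unsigned char *psp_tail, const char *cmdline_env,
                      char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;

    /* A length byte of 7Eh or more hints that the PSP copy may be truncated. */
    if (psp_tail[0] >= 0x7E && cmdline_env != NULL) {
        strncpy(out, cmdline_env, out_size - 1);
        out[out_size - 1] = '\0';
        return 1;
    }

    /* Otherwise ignore the length byte and scan for the CR terminator,
       never reading past the 127 bytes that follow the length byte. */
    while (n < 127 && n + 1 < out_size && psp_tail[1 + n] != '\r') {
        out[n] = (char)psp_tail[1 + n];
        n++;
    }
    out[n] = '\0';
    return 0;
}
```

Note that this still cannot tell a genuine CMDLINE from one the user happened to set, which is the caveat raised in the next paragraph.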
Others have adopted what MS did with Windows (e.g., when working at an NT command prompt there is a CMDLINE variable) so there is no way to test ahead of time whether it's supported or not. Also, a user can set an environment variable called CMDLINE to whatever they want so the mere presence of a CMDLINE variable doesn't guarantee anything about the command-tail.
Also, some environments/programs may assume that the command-tail is never more than 126 bytes, so may not properly be able to process a bigger one (you'll try to process a part of memory that isn't actually part of the command-tail). |
Rugxulo

Usono, 05.02.2026, 23:09
@ bretjohn
|
trip report: Tiny C Compiler 0.9.27 |
> There are 128 bytes set aside in the PSP for the command-tail. The first
> byte (at PSP:[80h]) is the size (number of bytes) in the command-tail, and
> the last byte of the command-tail is always an ASCII 13 (Carriage Return or
> CR). That's why the command-tail size is limited to 126 bytes. In my
> programs (usually written in ASM), I normally ignore the size and just look
> for the CR.
Not sure the exact limit. Maybe you mean minus the length byte and actual ending CR. I know the DOS4GW extender couldn't handle longer than 126 or so. I had thought I read %CMDLINE% was only used when 7Fh or 80h (Borland??) length was found. But yeah, it's a Win9x feature, and FreeCOM (FreeDOS) supports it, so does DJGPP.
FYI, the core OpenWatcom tools should support longer cmdlines via env. var. if you put a '*' (asterisk) before them in your makefiles (using Wmake). |
tkchia

05.02.2026, 23:13
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
Hello bencollver,
Glad you are making progress.
> When c = 0xfffffff8, TCC takes the "short reference" branch while Watcom
> takes the "else" branch. What is this if statement actually
> testing?
TCC directly generates machine code, and this is apparently where it emits a ModR/M byte and an address displacement for certain operand pairs (e.g. -0x10(%ebp), %eax).
The conditional is supposed to test if the displacement (here -0x10) is in the range -128 to +127. If it is, then TCC can opt to encode a short 1-byte displacement rather than a 4-byte one.
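To make the two encodings concrete, here is a sketch that encodes only "mov disp(%ebp),%eax". It is not TCC's gen_modrm(); the function name is made up, and the range check is written out explicitly so it does not depend on char signedness.

```c
#include <stddef.h>

/* Hypothetical helper: encode "mov disp(%ebp),%eax" into buf and return
   the number of bytes written. buf must have room for 6 bytes. */
size_t encode_load_from_ebp(int disp, unsigned char *buf)
{
    size_t n = 0;
    unsigned int u = (unsigned int)disp;

    buf[n++] = 0x8B;                    /* opcode: mov r/m32 to r32 */
    if (disp >= -128 && disp <= 127) {
        buf[n++] = 0x45;                /* ModR/M: mod=01, reg=eax, rm=ebp */
        buf[n++] = (unsigned char)(u & 0xFF);          /* disp8 */
    } else {
        buf[n++] = 0x85;                /* ModR/M: mod=10, reg=eax, rm=ebp */
        buf[n++] = (unsigned char)(u & 0xFF);          /* disp32, low byte first */
        buf[n++] = (unsigned char)((u >> 8) & 0xFF);
        buf[n++] = (unsigned char)((u >> 16) & 0xFF);
        buf[n++] = (unsigned char)((u >> 24) & 0xFF);
    }
    return n;
}
```

With disp = -8 this yields 8b 45 f8, the short form from the Linux listing. The long form for the same displacement would be 8b 85 f8 ff ff ff, which is what the Watcom-built compiler emitted.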
Thank you! --- https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI" |
Rugxulo

Usono, 05.02.2026, 23:16
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
> The extra "ff ff ff" bytes are coming from i386-gen.c in the gen_modrm()
> function.
I assume this is just slightly smaller assembly code using signed byte instead of full dword offset.
> static void gen_modrm(int op_reg, int r, Sym *sym, int c)
> {
> ...
> if (c == (char)c) {
> /* short reference */
> o(0x45 | op_reg);
> g(c);
> } else {
> oad(0x85 | op_reg, c);
> }
>
> When c = 0xfffffff8, TCC takes the "short reference" branch while Watcom
> takes the "else" branch. What is this if statement actually
> testing?
In C, char literals (and char arguments to string functions) are usually "int" (to also allow EOF, aka (int)-1).
Here it's probably checking whether the byte (char) is the same as the int, that the low byte of the int is the same as the char byte.
> I can't say i completely understand it, but i've changed my sources to the
> following:
>
> #ifdef __WATCOMC__
> if (c == (signed char)c) {
I forget what OpenWatcom and DJGPP use by default for char. It might be "signed char". (I think K&R only introduced "unsigned" for int or long first.)
There should be a cmdline switch to toggle char signedness by default. I would assume using that (and avoiding a patch) is cleaner (assuming that doesn't break anything).
> And now the bootstrapped compiler emits the same code as the reference
> compiler and it can self-host on DOS.
I only barely run TCC under HXRT when needed (with old ReactOS MSVCRT.DLL). |
tkchia

05.02.2026, 23:18
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
Hello bencollver,
In addition, it seems that the miscompilation that ultimately caused the crash was this:
000: 83 f8 cc cmp $0xffffffcc,%eax
This is apparently the i < SYM_POOL_NB part, and SYM_POOL_NB (= (8192 / sizeof(Sym))) should definitely not be 0xffffffcc.
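One way to read the number: 0xcc is 204, and 8192 / 40 = 204, where 40 (0x28) is the per-Sym stride visible in the "add $0x28,%eax" line of the diff earlier in the thread. The opcode 83 f8 takes an 8-bit immediate that the CPU sign-extends, so an encoded 0xcc is compared as 0xffffffcc. That suggests the constant was right and only the short encoding was wrong, probably through the same char-signedness test, although the thread does not confirm which function emitted it. The arithmetic as a sketch, with made-up function names:

```c
#include <stdint.h>

/* What the CPU does with the imm8 operand of "83 /7 ib" (cmp r/m32, imm8):
   the byte is sign-extended to 32 bits before the comparison. */
uint32_t sign_extend_imm8(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (0xFFFFFF00u | imm8) : (uint32_t)imm8;
}

/* Pool size for a given sizeof(Sym), mirroring 8192 / sizeof(Sym). */
unsigned pool_entries(unsigned sym_size)
{
    return 8192u / sym_size;
}
```

pool_entries(40) is 204 and pool_entries(36) is 227 (0xe3, the bound in the Linux listing). sign_extend_imm8(0xCC) is 0xFFFFFFCC, so with the unsigned jae the loop would keep writing far past the 204 entries that were allocated.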
Thank you! --- https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI" |
bretjohn

Rio Rancho, NM, 06.02.2026, 01:59
@ Rugxulo
|
trip report: Tiny C Compiler 0.9.27 |
> Not sure the exact limit. Maybe you mean minus the length byte and actual
> ending CR.
That's exactly what I meant.
> I know the DOS4GW extender couldn't handle longer than 126 or
> so. I had thought I read %CMDLINE% was only used when 7Fh or 80h
> (Borland??) length was found.
I don't think you can depend on that in all cases (basically, an illegal value in the length byte). E.g., I know some programs that perform an EXEC call don't set up the command-tail portion of the call correctly (in particular, they don't always set the size byte correctly). So if your program gets EXEC'd from another program you can't always depend on the size byte being correct (that's part of the reason I ignore it except to detect the potential presence of %CMDLINE%). In my programs I test for the length byte being >=7Eh, not exactly 7Eh, before looking for %CMDLINE%. But I also don't assume that a big value means I'll necessarily find %CMDLINE%. And as I stated earlier, the user (or another program) can set %CMDLINE%, so just because it exists doesn't mean it's associated with the command-tail.
If you're writing in a high-level language you usually depend on the compiler to handle all those details for you, and a lot of them (especially older ones) don't do it correctly with modern versions of DOS. At least in some cases you can bypass what the compiler does and look at the PSP and environment space yourself, but that sort of defeats the purpose of using a high-level language (it's supposed to handle those kinds of details automatically). When MS (or somebody else) changes the rules you can end up with messy crashes and bugs. |
tkchia

06.02.2026, 23:45
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
Hello bencollver,
I think c == (signed char)c here will be correct even under compilers other than Watcom. (And, c == (unsigned char)c here is definitely always wrong.) Maybe I should file a bug report with the upstream TCC project.
Thank you! --- https://codeberg.org/tkchia · https://disroot.org/tkchia · 😴 "MOV AX,0D500H+CMOS_REG_D+NMI" |
bencollver

11.02.2026, 00:44
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
I was chugging along replacing TCC's MSVCRT.DLL dependency with Watcom's libc. I got the compiler to self-host. That was fun!
Then i compiled mawk and the floating point math gave bad results. I traced it to parse_number() in tccpp.c:
} else {
tok = TOK_CDOUBLE;
tokc.d = strtod(token_buf, NULL);
}
I can import strtod() from Watcom's mt7s19.dll, but Watcom's stack calling convention isn't strictly cdecl. The arguments pass in fine, but results are returned in EDX:EAX where TCC expected them in ST0.
From: i386-gen.c
#define REG_FRET TREG_ST0 /* float return register */
From cguide.pdf:
10.5.2 Returning Values in 80x87-based Applications
When using the stack-based calling conventions with "fpi" or "fpi87", floating-point values are returned in registers. Single precision values are returned in EAX, and double precision values are returned in EDX:EAX
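To illustrate what "returned in EDX:EAX" means for a double, here is a sketch that rebuilds the value from the two 32-bit halves. It only shows the bit layout and is not a fix for TCC; the function name is made up, and it assumes IEEE 754 doubles with EDX holding the high half, as on x86.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a double from the two 32-bit halves a callee left in EDX:EAX.
   eax carries the low 32 bits, edx the high 32 bits. */
double double_from_edx_eax(uint32_t eax, uint32_t edx)
{
    uint64_t bits = ((uint64_t)edx << 32) | eax;
    double d;

    /* memcpy avoids aliasing problems when reinterpreting the bits. */
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, edx = 0x3FF80000 with eax = 0 is the bit pattern of 1.5. A caller that reads ST0 instead never looks at these registers, which fits the bad results seen in mawk.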
I tried to change TCC to use Watcom's calling convention, but i didn't get it right.
If i read the documentation correctly, it could be easy to write a wrapper DLL. With Watcom's -ecc option, the following wrapper function should automagically translate between cdecl and Watcom's calling convention.
double wtfmath_strtod(const char *s, char **r) {
return strtod(s, r);
}
That's all i have brain juice for today. |
Rugxulo

Usono, 11.02.2026, 01:42
@ bencollver
|
trip report: Tiny C Compiler 0.9.27 |
> I tried to change TCC to use Watcom's calling convention, but i didn't get
> it right.
>
> If i read the documentation correctly, it could be easy to write a wrapper
> DLL. With Watcom's -ecc option, the following wrapper function should
> automagically translate between cdecl and Watcom's calling convention.
>
> double wtfmath_strtod(const char *s, char **r) {
> return strtod(s, r);
> }
>
> That's all i have brain juice for today.
Do you mean -3s doesn't work? Default is something like -3 (register calling convention). I just assume you knew -3s also existed and tried that. |
bencollver

11.02.2026, 02:06
@ Rugxulo
|
trip report: Tiny C Compiler 0.9.27 |
> Do you mean -3s doesn't work? Default is something like -3 (register
> calling convention). I just assume you knew -3s also existed and tried
> that.
That's right, i am using -3s and it doesn't work for floating point functions when i import them from mt7s19.dll. With Watcom's -3s stack calling convention, a double-precision result is returned in EDX:EAX.
Ironically, with Watcom's -3 register calling convention, a floating point result is returned in ST0 as TCC expects, but then the function arguments don't match up.
I'll probably write a short script to generate wrapper functions and keep my fingers crossed that it works as expected. |