DOS ain't dead

Laaca(R)

Czech republic,
02.01.2019, 00:07
 

SSE instructions in DOS programs? (Developers)

Hi!
What is the recommended way to find out whether I can use the SSE instructions?
First step is clear - use the CPUID.
But what about test whether the SSE are supported by the OS?
Sure, I can just include some DPMI server which enables (supports) the SSE (cwsdpmi r7) but what about situation when already other DPMI server is running? Or somebody uses my program under Win95? (which does not have the SSE support)

The Internet says that it is enough to test bit 9 and bit 10 of the CR4 register. But "MOV EAX,CR4" is a privileged instruction. So - what to do?

---
DOS-u-akbar!

alexfru(R)

USA,
02.01.2019, 00:31

@ Laaca
 

SSE instructions in DOS programs?

> What is the recomended way how to obtain whether I can use the SSE
> instructions?
> First step is clear - use the CPUID.
> But what about test whether the SSE are supported by the OS?
> Sure, I can just include some DPMI server which enables (supports) the SSE
> (cwsdpmi r7) but what about situation when already other DPMI server is
> running? Or somebody uses my program under Win95? (which does not have the
> SSE support)
>
> Internet says that it is enough to test the 9.bit and 10.bit in the CR4
> register. But "MOV CR4,AX" is a privileged instruction. So - what to do?

Try executing an instruction of interest and catch #UD.
It might be easier to do in real or virtual 8086 mode (I avoided interrupt/exception handling in protected mode under DPMI).
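
A minimal sketch of the "execute it and catch #UD" idea in C. It assumes the runtime delivers the invalid-opcode exception as SIGILL and that jumping out of the handler with longjmp is acceptable there (true on common POSIX systems; for a given DOS extender and libc this is an assumption to verify). The names `sse_probe_by_trap`, `on_sigill` and `probe_env` are made up for illustration.

```c
#include <setjmp.h>
#include <signal.h>

static jmp_buf probe_env;          /* where to resume if the probe faults */

/* SIGILL handler: abandon the faulting instruction and unwind to the probe. */
static void on_sigill(int sig)
{
    (void)sig;
    longjmp(probe_env, 1);
}

/* Returns 1 if an SSE instruction executed, 0 if it raised SIGILL
   (or if this is not an x86 build, where nothing is attempted). */
int sse_probe_by_trap(void)
{
    void (*old_handler)(int) = signal(SIGILL, on_sigill);
    volatile int ok = 0;

    if (setjmp(probe_env) == 0) {
#if defined(__i386__) || defined(__x86_64__)
        __asm__ volatile ("xorps %%xmm0, %%xmm0" : : : "xmm0");
        ok = 1;                    /* reached only if no #UD was raised */
#endif
    }
    signal(SIGILL, old_handler);   /* restore the previous handler */
    return ok;
}
```

The probe only says whether the instruction ran; it does not tell you whether anybody saves the XMM registers across task switches.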

Rugxulo(R)

Usono,
02.01.2019, 10:08

@ Laaca
 

SSE instructions in DOS programs?

> Hi!

Felichan Novjaron! (Happy New Year!)

> What is the recomended way how to obtain whether I can use the SSE
> instructions?
> First step is clear - use the CPUID.

But check if CPUID is supported first. (What clone cpu was it that needed it manually enabled?) But to do that assumes 386, so you have to check for that, too. And even that is messy, so check for 286 first. No, I'm not kidding.
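
A rough sketch of the CPUID part in C, assuming GCC or Clang and their `<cpuid.h>` helper `__get_cpuid`. The function names are hypothetical. The bit positions are the documented ones: EFLAGS bit 21 (ID) can only be toggled when CPUID exists, and CPUID leaf 1 reports FXSR, SSE and SSE2 in EDX bits 24, 25 and 26. The 286/386 detection mentioned above is not covered here.

```c
#include <stdint.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>                 /* GCC/Clang helper header */
#endif

#define EFLAGS_ID    (UINT32_C(1) << 21)  /* toggleable only if CPUID exists */
#define CPUID_FXSR   (UINT32_C(1) << 24)  /* CPUID.1:EDX, FXSAVE/FXRSTOR */
#define CPUID_SSE    (UINT32_C(1) << 25)  /* CPUID.1:EDX, SSE */
#define CPUID_SSE2   (UINT32_C(1) << 26)  /* CPUID.1:EDX, SSE2 */

/* Given EFLAGS before and after an attempt to flip bit 21, report
   whether the flip stuck, i.e. whether CPUID is available. */
int cpuid_available(uint32_t eflags_before, uint32_t eflags_after)
{
    return ((eflags_before ^ eflags_after) & EFLAGS_ID) != 0;
}

/* Decode the EDX value returned by CPUID leaf 1. */
int cpu_has_sse(uint32_t edx)
{
    return (edx & CPUID_FXSR) && (edx & CPUID_SSE);
}

int cpu_has_sse2(uint32_t edx)
{
    return cpu_has_sse(edx) && (edx & CPUID_SSE2);
}

/* Query the running CPU; returns 0 when CPUID is missing or on non-x86. */
uint32_t cpuid_leaf1_edx(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx;
#endif
    return 0;
}
```

Typical use would be `cpu_has_sse(cpuid_leaf1_edx())`. This only answers the CPU half of the question, not the OS half.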

> But what about test whether the SSE are supported by the OS?

http://board.flatassembler.net/topic.php?t=10708


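_sse_supported:                # returns EAX=1 if XMM0 survives FXRSTOR/FXSAVE, else 0
  fxsave [xmm_save]            # xmm_save: 512-byte buffer, 16-byte aligned; CPU must report FXSR
  mov dword ptr [xmm_save+160],0x45443043   # 'C0DE' into the XMM0 slot of the image
  fxrstor [xmm_save]           # XMM0 is loaded from the image when OSFXSR is on
  mov dword ptr [xmm_save+160],0x44303044   # 'D00D' overwrites the memory copy
  fxsave [xmm_save]            # XMM0 is written back when OSFXSR is on
  cmp dword ptr [xmm_save+160],0x45443043   # 'C0DE'
  mov eax,0                    # MOV leaves the flags from CMP intact
  jnz _sse_supported_bye
  inc al                     # if OSFXSR not turned on, XMM* are not saved
_sse_supported_bye:
  ret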
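
Where the magic offset 160 comes from: a sketch of the 32-bit FXSAVE image as a C struct. The struct and field names are made up; the offsets follow the documented legacy FXSAVE layout, where XMM0 starts at byte 160 of the 512-byte area.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy (non-64-bit) FXSAVE image: 512 bytes, must be 16-byte aligned
   when handed to FXSAVE/FXRSTOR. */
struct fxsave_area {
    uint16_t fcw, fsw;             /* x87 control and status words */
    uint8_t  ftw, reserved1;       /* abridged tag word */
    uint16_t fop;                  /* last x87 opcode */
    uint32_t fip;                  /* last instruction pointer offset */
    uint16_t fcs, reserved2;       /* ... and its selector */
    uint32_t fdp;                  /* last data pointer offset */
    uint16_t fds, reserved3;       /* ... and its selector */
    uint32_t mxcsr, mxcsr_mask;    /* SSE control/status register and mask */
    uint8_t  st_mm[8][16];         /* ST0-ST7 / MM0-MM7, 16 bytes each */
    uint8_t  xmm[8][16];           /* XMM0-XMM7 (32-bit mode) */
    uint8_t  reserved4[224];       /* rest of the 512-byte area */
};

/* Compile-time checks that the struct matches the layout described above. */
_Static_assert(offsetof(struct fxsave_area, xmm) == 160, "XMM0 at offset 160");
_Static_assert(sizeof(struct fxsave_area) == 512, "FXSAVE area is 512 bytes");
```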


> Sure, I can just include some DPMI server which enables (supports) the SSE
> (cwsdpmi r7) but what about situation when already other DPMI server is
> running?

Causeway (CWSTUB) 4.x, DOS/32A 9.1.2, HDPMI32, and CWSDPMI (r5 2008 or later) all enable it for you.

It's because you're not supposed to use it if the OS can't handle it. OSFXSR means the OS saves the extended FPU state across task switches. FXSAVE/FXRSTOR actually first appeared in late model P2s, and it's faster than the old way.

> Or somebody uses my program under Win95? (which does not have the
> SSE support)

Win95 predates it (SSE1 was 1999/2001, and SSE2 was 2000/2003), but I forget exactly when (Win95 OSR2?) support for it was added later. I'm sure you can just run some simple tool (in DOS, via AUTOEXEC) before Win95 starts up, no?

> Internet says that it is enough to test the 9.bit and 10.bit in the CR4
> register. But "MOV CR4,AX" is a privileged instruction. So - what to do?

Yes, that was the problem (for me), CWSDPMI r5 (2000 or 2002) was ring 3 by default, so I couldn't manually enable it (easily). Plus, even the weaker ring 0 version didn't support swapping.

Obviously you have to enable it before switching away from ring 0. I'd almost wonder why this kind of thing isn't in the FreeDOS kernel proper, but sadly it's too rare for most DOS programmers to care about. I'm glad CWS was wise enough to add support for it.

Here are a few other links I posted on FASM's forum a while back. (I'm no expert, of course.)

http://board.flatassembler.net/topic.php?t=20556

P.S. It's all for naught. Most mainstream software has, for years, blindly assumed and forced SSE2 on everyone. "Most" people have it already. Many "obsolete" 32-bit (IA-32) projects have been abandoned as well in favor of AMD64. And AVX-512 will assimilate us all. Resistance is futile.

RayeR(R)

CZ,
02.01.2019, 22:56

@ Laaca
 

SSE instructions in DOS programs?

> Internet says that it is enough to test the 9.bit and 10.bit in the CR4
> register. But "MOV CR4,AX" is a privileged instruction. So - what to do?

Is it necessary under pure DOS? If only one program that uses SSE is running, there should be no need to save the context, as nothing else could corrupt it. It would probably work under Win95 too: tasks will be switched, but the SSE registers stay intact if no other app uses them. But of course it would be correct to use context saving. BTW you will need extra alignment for SSE data (and maybe instructions?), I think on a 16-byte boundary. There was a problem that the SSE code of FFMPEG didn't work under DOS/DJGPP, so I disabled all inline asm code for SSE, but I never went deep enough into it to see whether there's some workaround.

---
DOS gives me freedom to unlimited HW access.

Laaca(R)

Czech republic,
03.01.2019, 12:51

@ RayeR
 

SSE instructions in DOS programs?

In theory - SSE instructions use a new set of registers which have to be preserved by the OS when switching tasks.
So there is a safety mechanism whereby an SSE-capable OS must report "I know these SSE registers". It is done by setting bits in the CR4 register.

When these bits are not set and an SSE instruction occurs, it should raise a CPU exception even in a single-task OS.

However, today I tried to provoke this exception and I was not able to do it. I used a clean MS-DOS 7.1 without any drivers and CWSDPMI r3 from 1995, and no exception was raised.

It looks like bits 9 and 10 in CR4 are set by my BIOS.

UPDATE:
I loaded CWSDPR0 and read the CR4 register. Bit 9 is set, bit 10 is not.

I am not sure what that means; Wikipedia says this: control registers


Anyway, I give up; the result for me is that the CPUID test for SSE is OK for real life.

---
DOS-u-akbar!

RayeR(R)

CZ,
03.01.2019, 14:32

@ Laaca
 

SSE instructions in DOS programs?

I guess that SSE should work without setting those 2 bits in CR4 - if you don't need to turn on exceptions (the code should work without them too). I'm not sure what exactly bit 9 does. The FXSAVE instruction saves the FPU, MMX and SSE register context to a given location in memory. But as I said, for a single-task OS it's not necessary if no other code will use these regs. Here is a better description with some sample code:
https://wiki.osdev.org/SSE
They even set some bits in CR0:
"In order to allow SSE instructions to be executed without generating a #UD, we need to alter the CR0 and CR4 registers."
I don't know what they mean by "#UD", but from this description it seems that the bits must be enabled to allow SSE instructions even on a single-task OS. But you said it didn't crash. I'm confused...
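
For reference, a small C sketch of which bits that enabling sequence touches: clear CR0.EM (bit 2), set CR0.MP (bit 1), set CR4.OSFXSR (bit 9) and CR4.OSXMMEXCPT (bit 10). It only computes the new register values; actually writing CR0/CR4 needs ring 0 and is left out. The function names are hypothetical.

```c
#include <stdint.h>

#define CR0_MP          (UINT32_C(1) << 1)   /* monitor coprocessor */
#define CR0_EM          (UINT32_C(1) << 2)   /* x87 emulation */
#define CR4_OSFXSR      (UINT32_C(1) << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (UINT32_C(1) << 10)  /* OS handles SIMD FP exceptions */

/* CR0 value to write back: emulation off, coprocessor monitoring on. */
uint32_t cr0_for_sse(uint32_t cr0)
{
    return (cr0 & ~CR0_EM) | CR0_MP;
}

/* CR4 value to write back: both SSE-related OS bits set. */
uint32_t cr4_for_sse(uint32_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```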

---
DOS gives me freedom to unlimited HW access.

RayeR(R)

CZ,
03.01.2019, 14:39

@ RayeR
 

SSE instructions in DOS programs?

Aha, here is a better description:
https://wiki.osdev.org/FPU
CR4.OSFXSR (bit 9)

Enables 128-bit SSE support. When clear, most SSE instructions will cause an invalid opcode, and FXSAVE and FXRSTOR will only include the legacy FPU state. When set, SSE is allowed and the XMM and MXCSR registers are accessible, which also means that your OS should maintain those additional registers. Trying to set this bit on a CPU without SSE will cause an exception, so you should check for SSE (or long mode) support first.

CR4.OSXMMEXCPT (bit 10)

Enables the #XF exception. When clear, SSE will work until an exception is generated, after which all SSE instructions will fail with an invalid opcode. When set, the exception handler is called instead and the problem may be diagnosed and reported. Again, you can't set this bit without ensuring SSE support is present.

So you MUST enable bit 9 but need not enable bit 10. Of course check CPUID first before touching this (and check for CPUID itself on a 386 :).
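
To make the three cases concrete, here is a sketch in C that classifies a CR4 value that has already been read somehow (for example from a ring 0 DPMI host). The enum and function names are invented for illustration.

```c
#include <stdint.h>

enum sse_os_state {
    SSE_OS_DISABLED,        /* OSFXSR clear: SSE instructions raise #UD */
    SSE_OS_ENABLED_NO_XF,   /* OSFXSR set, OSXMMEXCPT clear */
    SSE_OS_ENABLED_WITH_XF  /* both set: SIMD FP faults delivered as #XF */
};

/* Classify a CR4 value by bits 9 (OSFXSR) and 10 (OSXMMEXCPT). */
enum sse_os_state classify_cr4(uint32_t cr4)
{
    if (!(cr4 & (UINT32_C(1) << 9)))
        return SSE_OS_DISABLED;
    return (cr4 & (UINT32_C(1) << 10)) ? SSE_OS_ENABLED_WITH_XF
                                        : SSE_OS_ENABLED_NO_XF;
}
```

A reading with bit 9 set and bit 10 clear, as reported earlier in the thread, comes out as `SSE_OS_ENABLED_NO_XF`.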

---
DOS gives me freedom to unlimited HW access.

marcov(R)

04.01.2019, 23:37

@ RayeR
 

SSE instructions in DOS programs?

> BTW you will need extra alignment for SSE data (and
> maybe instructions?) I think 16B boundary.

(I do some SSE2/3 and avx2 at work for image processing routines, not under dos. I don't know all the system level details (like the CR4 bits, since we deliver the machines the code runs on), but I do know how to craft simple routines like image rotation, colour conversion and kernel routines)

Whether you get an alignment exception or just a penalty depends on the exact CPU and instruction set (SSE3 has unaligned-load instructions that, afaik, are aliases for the normal instructions on the Core 4xxx series and newer (?)). But indeed it is better to align naturally (iow, x-byte operations on addresses aligned to x rounded up to a power of two), so that an operation never crosses a 4/8k page.

This also means that for local variables the stack must be aligned accordingly; sometimes that is a problem for 32-bit compilers (for 64-bit it is part of the standard ABI, so there the problem shifts to 32-byte alignment with AVX2).
(But that probably needs 16-byte datatype support too, since a 16-byte byte array aligns to 1 byte, not 16.)
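
One generic way around the "byte array aligns to 1" problem, sketched in C: over-allocate by 15 bytes and round the address up to the next 16-byte boundary. The helper names are hypothetical, and this is just one option next to compiler alignment attributes.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Return the first 16-byte-aligned address inside a raw buffer; the
   buffer must be at least 15 bytes larger than the space needed. */
void *carve_aligned16(void *raw)
{
    return (void *)align_up((uintptr_t)raw, 16);
}

/* Non-zero when p sits on a 16-byte boundary. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}
```

For example, `unsigned char raw[64 + 15];` followed by `carve_aligned16(raw)` yields 64 usable bytes on a 16-byte boundary.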

The AVX-512 rollout seems to have stalled, and it is not widely available, even on newly bought Intel machines. Some of the high-core-count server CPUs clocked back heavily when using AVX2 on all processors. Possibly it is too expensive at the current node, and the whole rollout is stalled because of Intel's process problems.

Ryzen has AVX2, but some inefficiencies (using 2 128-bit execution units to handle a 256-bit instruction), which is rumoured to be fixed in the Ryzen 3000 series this summer. That said you usually get more cores per buck, so effectively it is still pretty ok.

> There was a problem that SSE
> code of FFMPEG didn't work under DOS/DJGPP so I disabled all inline asm
> code for SSE but I never went deep into it if there's some workaround.

If e.g. the ffmpeg library expects the stack to be aligned, but some code calling into ffmpeg doesn't align it, then you get trouble.

Rugxulo(R)

Usono,
12.01.2019, 23:54

@ marcov
 

SSE instructions in DOS programs?

> > There was a problem that SSE
> > code of FFMPEG didn't work under DOS/DJGPP so I disabled all inline asm
> > code for SSE but I never went deep into it if there's some workaround.
>
> If e.g. the library for ffmpeg expects the stack aligned, but some code
> calling into ffmeg doesn't then you get trouble

There is a GCC option called "-mstackrealign" which should help with that. At least, it will do "and esp, -16" (maybe a bit more extra stuff beyond just that) which should help you.

IIRC, DJGPP's linkable COFF .o format is a bit less flexible in alignment (always four bytes??) versus MS COFF or other formats. You may instead wish to manually align the stack only for certain functions via attribute force_align_arg_pointer.
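
A small sketch of the per-function variant in C, assuming GCC or Clang on x86. `REALIGN_STACK` and `local_buffer_is_aligned` are made-up names; the attribute itself is the documented `force_align_arg_pointer`. The function just reports whether a 16-byte-aligned local really landed on a 16-byte boundary.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK   /* attribute is x86-specific; no-op elsewhere */
#endif

/* Entry point that may be reached with a stack aligned to only 4 bytes;
   the attribute asks the compiler to realign the stack in the prologue. */
REALIGN_STACK int local_buffer_is_aligned(void)
{
    _Alignas(16) volatile unsigned char buf[16] = {0};
    return ((uintptr_t)buf & 15u) == 0;
}
```

This only covers code the compiler generates for that function; hand-written assembly that assumes an aligned stack still has to be checked separately.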

RayeR(R)

CZ,
13.01.2019, 16:44
(edited by RayeR, 13.01.2019, 23:10)

@ Rugxulo
 

SSE instructions in DOS programs?

> There is a GCC option called "-mstackrealign" which should help with that.
> At least, it will do "and esp, -16" (maybe a bit more extra stuff beyond
> just that) which should help you.

I'm not sure if this helps in the case of inline assembly and linked external assembler files (FFMPEG uses YASM for some modules); for pure C code it should work.

---
DOS gives me freedom to unlimited HW access.
