DOS ain't dead

bretjohn(R)

Rio Rancho, NM,
12.09.2011, 02:15
 

Using Multiple CPU Cores in DOS? (Miscellaneous)

I had a thought this morning that I thought I would "throw against the wall to see if it sticks or not". I figured I would start on this forum since it seems to be where a lot of technical knowledge and "outside the box" thinkers hang out.

As many of you know, I have been working on updating my USB drivers, which is moving along slowly. One problem I have been running into is speed. More specifically, the major problem I am having is the fact that DOS runs in RM/V86, but I need to switch to PM in order to access and manipulate MMIO space of the EHCI and OHCI USB host controllers. There may be many of these mode switches that must occur during each IRQ, which very noticeably slows the entire system down. If the system can simply stay in one mode or the other (which it can with UHCI, since UHCI uses PIO instead of MMIO), the system speed issues are not so bad. But switching modes is EXTREMELY slow.

Up until now, most of my efforts to alleviate this problem have been to try and minimize the number of mode switches required (and I am going to continue to do that). However, this morning I thought of another approach that might prove beneficial to almost everyone needing PM access from DOS (VCPI, DPMI, DPMS, INT 15.87, etc.), not just my USB drivers. Instead of having a single CPU/Core switch between RM/V86 and PM, why not have one CPU/Core running in RM/V86 and another running in PM, and switch between the CPU's/Cores? I think switching CPU's would be almost instantaneous, much faster than switching modes on a single CPU.

This does not need to involve threads or anything like that (which I believe is beyond the scope of where DOS should go, anyway), but would still allow DOS to take advantage of the now-prevalent multi-CPU/multi-core technologies to increase speed. Because the system would still be single-threaded, though, the implementation should be simpler than what Windows or *nix requires.

I have no idea what the technical obstacles of such an approach might involve, though I'm sure they would be significant. E.g., I don't know if it's even possible to have one CPU/Core running in RM/V86 and another in PM at the same time. It seems like it should be possible, but the CPU manufacturers may not have even considered this as a possibility, much less provided for or tested it.

I'm just throwing this out there for now to get some opinions of people who are far more knowledgeable about some of these things than I am. Feel free to throw darts if it is a dumb idea. I should also state right here that I do not intend to work on anything like this myself (at least in the near future), since I already have too many irons in the fire. If this is even practical, though, I thought it might pique somebody's interest who is looking for a practical project to work on.

RayeR(R)

CZ,
12.09.2011, 15:26

@ bretjohn

Hehe, this is challenging. First you may have to look at the ACPI spec to learn how to enumerate the available CPU cores and obtain their APIC IDs (I'm only at the beginning of this first step). Then learn how to program and enable the APIC to be able to switch cores. But I think it's not possible to run each core in a different mode. It would be nice if in 64-bit long mode the second core could run in V86 so NTVDM support for old 16-bit apps would be possible, but I think it can't.
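
As a rough illustration of that first step: ACPI lists the processors in its MADT table (signature "APIC"). After a 44-byte header come variable-length entries, and each type 0 entry describes one local APIC. Below is a minimal C sketch of walking such a table. It assumes the table has already been located and copied into a buffer (finding it via the RSDP/RSDT is not shown), and the function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and constants for the ACPI MADT ("APIC") table. */
#define MADT_ENTRIES_OFFSET  44u  /* 36-byte SDT header + LAPIC address + flags */
#define MADT_TYPE_LOCAL_APIC 0u
#define LAPIC_FLAG_ENABLED   1u

/*
 * Hypothetical helper: collect the APIC IDs of all enabled processors
 * from a MADT that is already in memory. Returns the number of IDs
 * stored (never more than max_ids).
 */
size_t madt_list_apic_ids(const uint8_t *madt, size_t madt_len,
                          uint8_t *ids, size_t max_ids)
{
    size_t pos = MADT_ENTRIES_OFFSET;
    size_t count = 0;

    assert(madt != NULL && ids != NULL);

    while (pos + 2 <= madt_len) {
        uint8_t type = madt[pos];
        uint8_t len  = madt[pos + 1];

        if (len < 2 || pos + len > madt_len)
            break;                      /* malformed entry: stop walking */

        /* Type 0: type, length, ACPI processor ID, APIC ID, flags (4 bytes) */
        if (type == MADT_TYPE_LOCAL_APIC && len >= 8) {
            uint8_t apic_id = madt[pos + 3];
            uint8_t flags   = madt[pos + 4];  /* low byte of the 32-bit flags */

            if ((flags & LAPIC_FLAG_ENABLED) && count < max_ids)
                ids[count++] = apic_id;
        }
        pos += len;
    }
    return count;
}
```

This only yields the IDs. Actually starting another core still needs the local APIC programming mentioned above.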

The problem is obsolete DOS itself running in RM. If you run under some 32-bit DOS (there were some experimental versions) then there's no need to switch modes. BTW in some text about CWSDPMI I read that switching between VM/PM is faster than RM/PM, so it helps if you install some memory manager (Jemm/Qemm/Emm) before loading drivers.

---
DOS gives me freedom to unlimited HW access.

Laaca(R)

Czech republic,
12.09.2011, 17:26

@ RayeR

If you really want to use two CPU cores you could find some help on the forums at http://www.osdev.org

But you will still have to solve the speed problem on single-core CPUs.
I believe that moving data from base memory into extended memory via INT 15h.87 is slow in real mode, but are you sure it is slow in VM86 mode too?

Or you could maybe require a loaded DPMI server, if it would help, of course.

---
DOS-u-akbar!

RayeR(R)

CZ,
12.09.2011, 20:13

@ Laaca

> Or you could maybe require a loaded DPMI server, if it would help, of
> course.

I read that (CWS)DPMI checks whether it was started from VM or RM and uses a different method for switching to PM accordingly. And switching from VM should be faster.

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Rio Rancho, NM,
12.09.2011, 22:36

@ Laaca

> But you still will have to solve the speed on single core CPUs.

Indeed, and there may be no way to solve that. Things just may need to run more slowly.

> I believe that moving data from base memory into extended memory via INT
> 15h.87 is slow in realmode but are you sure it is slow in vm86 mode too?

It's much faster in VM than RM, but still too slow when you need to do a lot of them. Here's some info from the two computers I use today:

To do an INT 15.87 memory transfer on a Sony laptop (approx 5 years old):
from Real Mode: ~ 15 ms
w/ HIMEM: ~ 7.5 ms
w/ HIMEM + EMM386: ~ 300 us

To do the same thing on a Dell desktop (approx 2 years old):
from Real Mode: ~ 1 ms
w/ HIMEM: ~ 1 ms
w/ HIMEM + EMM386: ~ 40 us

> Or you could maybe require a loaded DPMI server, if it would help, of course.

The problem is, DPMI isn't designed to be used with TSR's, and really doesn't help. That's why I'm investigating DPMS and JLM's.

If something like my suggestion is possible, it could increase system speed for nearly all applications that need PM (DPMI, EMS, XMS, etc.), not just USB or TSR's. Even when you're in DPMI/PM, e.g., the system "stops" every time there is an IRQ, and at least part of the IRQ processing is usually done in RM/VM. In a "normal" system, the IRQ's usually only happen a few dozen times a second, so it usually isn't perceived as a big deal. With USB, though, the problem is amplified because the IRQ's happen hundreds of times a second, and it becomes VERY noticeable.

Also in DPMI/PM, most INT calls need to be reflected to real mode. In addition, I/O requests should usually be reflected to real mode (though they normally aren't by default), in case there is any I/O virtualization going on. Even in a "normal" system there may be a lot of mode switching going on, far more than I think most people realize.

Requiring a specific version of DOS (like a 32-bit one), or a specific version of EMM, or even requiring an EMM at all, isn't a viable alternative, IMO. USB needs to work no matter what, even if it works slowly.

RayeR(R)

CZ,
13.09.2011, 00:33

@ bretjohn

> Also in DPMI/PM, most INT calls need to be reflected to real mode. In
> addition, I/O requests should usually be reflected to real mode (though
> they normally aren't by default), in case there is any I/O virtualization
> going on. Even in a "normal" system there may be a lot of mode switching
> going on, far more than I think most people realize.

This thing is still not clear to me. Some years ago I wrote an MP3 player with Covox output on LPT (using the SciTech MP3 lib) in DJGPP. This program changes the timer speed and installs a timer ISR for sending decoded samples to the LPT port. I use this function to install it:
_go32_dpmi_chain_protected_mode_interrupt_vector(INT_TIMER,&timer_isr_new);
I also tried to do it via
_go32_dpmi_set_protected_mode_interrupt_vector(INT_TIMER,&timer_isr_new);
but it froze immediately.
I think there's also a 3rd way: installing a real-mode ISR (via the transfer buffer). The question is what the order of calling the ISRs in RM and PM is, and how best to get smooth playback. In this case I think the timer IRQ is also hooked by the real-mode BIOS. If I use the chain function, does it mean the PM ISR is executed first and then calls the RM ISR? If I use the set function, will it never call the RM ISR, so the BIOS messes up and freezes? Maybe it would be best if I allowed the RM ISR to be called only on, say, every 100th tick.

The current result is that it works at a 22 kHz sample rate, but you can hear some short periodic drops during playback...

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Rio Rancho, NM,
13.09.2011, 04:25

@ RayeR

> This thing is still not clear to me. Some years ago I wrote mp3 player with
> Covox output on LPT (using scitech mp3 lib) in DJGPP. This program change
> timer speed and install timer ISR for sending decoded samples to LPT port.

Let's do a quick calculation, using my Sony Laptop as an example. We'll assume it's running in RM (without an EMM), so it will take around 15 ms to do a pair of mode switches. Let's say you reprogram the timer interrupt to occur 100 times a second instead of the normal 18.2. Because your program is running in PM but your handler is running in RM, there must be a pair of mode switches, which take 15 ms, and they must occur once every 10 ms. Guess what? The system will crash since this is impossible. Your computer is fast enough to work, but not all computers will be.
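
The same arithmetic as a small C sketch. The function name is only illustrative; the inputs are the measured time for one pair of mode switches and the timer rate.

```c
#include <assert.h>

/*
 * Share of CPU time (in percent) spent on mode switching when every
 * IRQ needs one pair of switches. A result of 100 or more means the
 * next IRQ arrives before the previous one has been handled.
 */
double switch_load_percent(double switch_pair_us, double irq_per_second)
{
    assert(switch_pair_us >= 0.0 && irq_per_second >= 0.0);
    /* microseconds used per second, as a fraction of 1,000,000 us */
    return switch_pair_us * irq_per_second / 1e6 * 100.0;
}
```

With 15 ms (15000 us) per pair and 100 IRQs per second this gives 150%, i.e. more time than exists. With the 300 us measured under EMM386 it gives 3%.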

> Question is what's the order of calling ISRs in RM and PM and how to do it
> best way to have smooth playback.

If the CPU is in PM, the hardware will call the PM handler. It is up to the PM handler to decide if and when it will call the RM handler. In the case of the timer interrupt, it absolutely SHOULD call the RM handler, which requires a pair of mode switches.

> Maybe it would be best if I allow call RM ISR only say on every 100th tick.

You need to call the original RM handler at the same rate it was called before you messed with it (approximately 18.2 times a second). If you don't, at a minimum the system clock will be incorrect, but you will probably also screw up some TSR's or Device Drivers that depend on the interrupt for timing (which could crash the system).
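
One common way to keep the original rate, sketched here in C under the assumption that the PIT has been reprogrammed with a divisor smaller than the default 65536: add the new divisor to an accumulator on every fast tick, and call the old handler each time the accumulator passes 65536. The names are made up, and the actual chaining to the RM handler is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_DEFAULT_DIVISOR 65536UL   /* gives the standard ~18.2 Hz tick */

/* State for one reprogrammed timer; names are illustrative only. */
struct tick_chain {
    uint32_t divisor;      /* PIT divisor currently programmed (1..65536) */
    uint32_t accumulator;  /* PIT counts since the old handler last ran */
};

void tick_chain_init(struct tick_chain *tc, uint32_t divisor)
{
    assert(tc != 0 && divisor >= 1 && divisor <= PIT_DEFAULT_DIVISOR);
    tc->divisor = divisor;
    tc->accumulator = 0;
}

/*
 * Call once per fast timer interrupt. Returns 1 when the original
 * handler is due (so it still runs about 18.2 times a second) and
 * 0 when the new handler should just acknowledge the IRQ itself.
 */
int tick_chain_due(struct tick_chain *tc)
{
    tc->accumulator += tc->divisor;
    if (tc->accumulator >= PIT_DEFAULT_DIVISOR) {
        tc->accumulator -= PIT_DEFAULT_DIVISOR;
        return 1;
    }
    return 0;
}
```

With a divisor of 16384 (four times the normal rate), for example, the old handler is due on every fourth tick.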

RayeR(R)

CZ,
13.09.2011, 19:32

@ bretjohn

> Let's do a quick calculation, using my Sony Laptop as an example. We'll
> assume it's running in RM (without an EMM), so it will take around 15 ms to
> do a pair of mode switches. Let's say you reprogram the timer interrupt to
> occur 100 times a second instead of the normal 18.2. Because your program
> is running in PM but your handler is running in RM, there must be a pair of
> mode switches, which take 15 ms, and they must occur once every 10 ms.
> Guess what? The system will crash since this is impossible. Your computer
> is fast enough to work, but not all computers will be.

Yes, I understand that an IRQ rate higher than the mode switching can keep up with will cause failure. From the description of _go32_dpmi_chain_protected_mode_interrupt_vector it seems it installs my ISR as a PM ISR and then chains to the next ISR, but it's not clear whether that is the next PM ISR or the next RM ISR. If it chained to the RM ISR and that were as slow as you say, then it couldn't work. I tested it on a P166 and it ran at 22 kHz. I don't remember if it was started from VM or RM.

> You need to call the original RM handler at the same rate it was called
> before you messed with it (approximately 18.2 times a second). If you
> don't, at a minimum the system clock will be incorrect, but you will
> probably also screw up some TSR's or Device Drivers that depend on the
> interrupt for timing (which could crash the system).

Yes, I think that would be the right way, but I cannot achieve this with _go32_dpmi_chain_protected_mode_interrupt_vector if it calls the RM handler automatically. I would need to use _go32_dpmi_set_protected_mode_interrupt_vector, which doesn't call anything else, and call the RM handler manually.

There's also a note:
The DPMI spec says that 3 software interrupts are special, in that they also get reflected to a protected-mode handler. These interrupts are: 1Ch (the timer tick interrupt), 23h (Keyboard Break interrupt), and 24h (Critical Error interrupt). This means that, to catch these interrupts, you need to install a protected-mode handler only.

but what exactly does this reflection mean, and what is the order of calling ISRs in this case...

---
DOS gives me freedom to unlimited HW access.

Laaca(R)

Czech republic,
13.09.2011, 21:12

@ RayeR

Bret, can you somehow reduce the frequency of the IRQ events?
I mean set up the USB host to generate this IRQ less often.

---
DOS-u-akbar!

bretjohn(R)

Rio Rancho, NM,
13.09.2011, 23:02

@ Laaca

> Bret, can you somehow reduce the frequency of the IRQ events?
> I mean set up the USB host to generate this IRQ less often.

This is possible to a certain extent, and I'm already doing it when it's appropriate. I already test the computer to see how long the mode switches take, and adjust the IRQ rate to the extent the hardware allows. The problem is, slowing down the IRQ rate also slows down access to everything on the USB bus. And, with all of the slow mode switching going on in the background, the speed of all foreground applications is adversely affected as well.

Just as an aside, USB is really designed to run with the possibility of IRQ's being generated at least once every millisecond. With EHCI that can be increased to once every 125 microseconds.
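
To put those periods next to the measurements earlier in the thread, here is a small C sketch (function name hypothetical) that counts how many complete pairs of mode switches fit into one IRQ period, ignoring all the other work the handler has to do.

```c
#include <assert.h>

#define USB_FRAME_US      1000UL  /* one IRQ per millisecond */
#define USB_MICROFRAME_US  125UL  /* EHCI: one IRQ per 125 microseconds */

/* Whole pairs of mode switches that fit into one IRQ period. */
unsigned long switch_pairs_per_period(unsigned long period_us,
                                      unsigned long switch_pair_us)
{
    assert(switch_pair_us > 0);
    return period_us / switch_pair_us;
}
```

With the ~40 us measured on the Dell under EMM386 that is 25 pairs per 1 ms period but only 3 per 125 us period. With the Sony's ~300 us it is 3 per 1 ms period and none at all per 125 us period.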

I'm doing what I can to mitigate this in the drivers, but it would sure be nice if someone else could maybe work on a way to decrease the switching times, if it's even possible.

RayeR(R)

CZ,
14.09.2011, 03:30

@ bretjohn

BTW how do you measure the CPU mode switch time? Do you have some small util for this? I could test how fast it is on my C2D E8400...

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Rio Rancho, NM,
14.09.2011, 18:24

@ RayeR

> BTW how do you measure CPU mode switch? Do you have some small util for
> this? I could test it on my C2D E8400 how fast will it be...

What I do is measure how long it takes to do a small INT 15.87 memory transfer. The only parts that should be involved in an INT 15.87 transfer outside of the pair of time-consuming mode switches are some initialization, a REP MOVSx, and some clean-up, which are all minimal in time consumption.
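
For reference, the caller's part of an INT 15.87 transfer is mostly filling in a 48-byte descriptor table that ES:SI points to, with CX holding the word count. Below is a C sketch of building that table, following the commonly documented layout (source descriptor at offset 10h, destination at 18h, access byte 93h). The helper names are made up, and issuing the interrupt and timing it are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INT1587_TABLE_SIZE  48u
#define INT1587_SRC_OFFSET  0x10u
#define INT1587_DST_OFFSET  0x18u
#define DESC_ACCESS_DATA_RW 0x93u   /* present, ring 0, writable data */

/* Fill one 8-byte segment descriptor for a given linear base address. */
static void fill_descriptor(uint8_t *d, uint32_t base, uint16_t limit)
{
    d[0] = (uint8_t)(limit & 0xFF);
    d[1] = (uint8_t)(limit >> 8);
    d[2] = (uint8_t)(base & 0xFF);
    d[3] = (uint8_t)((base >> 8) & 0xFF);
    d[4] = (uint8_t)((base >> 16) & 0xFF);
    d[5] = DESC_ACCESS_DATA_RW;
    d[6] = 0;                       /* limit bits 19:16 and flags */
    d[7] = (uint8_t)(base >> 24);   /* base bits 31:24 (386 and later) */
}

/*
 * Build the table for copying `words` 16-bit words from src to dst
 * (both linear addresses). The other four descriptors stay zero; the
 * BIOS fills them in. Returns the value to load into CX.
 */
uint16_t int1587_build_table(uint8_t table[INT1587_TABLE_SIZE],
                             uint32_t src, uint32_t dst, uint16_t words)
{
    uint16_t limit;

    assert(table != 0 && words >= 1 && words <= 0x8000u);
    limit = (uint16_t)(2u * words - 1u);   /* segment limit in bytes */

    memset(table, 0, INT1587_TABLE_SIZE);
    fill_descriptor(table + INT1587_SRC_OFFSET, src, limit);
    fill_descriptor(table + INT1587_DST_OFFSET, dst, limit);
    return words;
}
```

Everything else (entering PM, the REP MOVSx, returning) happens inside the BIOS or whatever has hooked INT 15h, which is why the call works as a rough stopwatch for the mode switches.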

I don't have a separate utility for that -- right now it's just integrated into the programs that need to know how long it takes.

RayeR(R)

CZ,
15.09.2011, 01:15

@ bretjohn

> What I do is measure how long it takes to do a small INT 15.87 memory
> transfer. The only parts that should be involved in an INT 15.87 transfer
> outside of the pair of time-consuming mode switches are some
> initialization, a REP MOVSx, and some clean-up, which are all minimal in
> time consumption.

But you don't know how the BIOS implements this service. If the BIOS writer made it crappy then you get a bad result, and it may differ from the pure time needed for a mode switch. Maybe I would get a better result if I installed a dummy RM ISR (just an IRET) on some unused INT vector, then called it via __dpmi_int(myINT, &regs); and measured the time. But that would also count the DPMI server overhead...

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Rio Rancho, NM,
15.09.2011, 19:11

@ RayeR

> But you don't know how BIOS implements this service. If BIOS writer make it
> crappy then you got bad result and it may differ from pure time needed for
> mode switch. Maybe I would get better result if I install a dummy RM ISR
> (just IRET) for some unused INT vector and then call it via
> __dpmi_int(myINT, &regs); and measure the time. But it will also counts
> DPMI server overhead...

You can go ahead and do such a test -- just make sure you're comparing apples to apples.

Even if the BIOS writer did a crappy implementation, it still wouldn't take 25 or 50 times as long as it needs to. Even a REALLY bad implementation would only be double, and in reality probably even far less than that.

The problem we need to resolve is not just a few percent, either. It's very significant -- I think we need a multiplier of at least 10 even from V86 mode.

Rugxulo(R)

Usono,
13.09.2011, 02:27

@ bretjohn

(CAVEAT: this was not meant to be a long rant, sorry, and I hope it's not too off-topic or unhelpful, that wasn't my goal. Please keep this in mind.)

> > Or you could maybe require a loaded DPMI server, if it would help, of
> > course.
>
> The problem is, DPMI isn't designed to be used with TSR's, and really
> doesn't help. That's why I'm investigating DPMS and JLM's.

Does DPMIONE have TSR support? (Hmmm, maybe not, I might be thinking of Flashtek X32.) But I guess you don't want to rely on a specific (abandoned?) implementation or two. I agree, of course, just saying ....

> If something like my suggestion is possible, it could increase system speed
> for nearly all applications that need PM (DPMI, EMS, XMS, etc.), not just
> USB or TSR's.

I'm vaguely thinking that some (semi-obscure) computers have supported this for years. Not home PCs, mind you, but others. I'd be very surprised if no one had tried this yet. I'm pretty sure somebody somewhere has a computer that can run two different chips running two different OSes at the same time.

It's an interesting idea, at least, but I suspect most people will say something (dumb? reasonable?) like, "Just use virtualization (VirtualPC, QEMU, VirtualBox, VMware)." Especially with things like AMD SVM (paged real mode) or VT-3x or whatever the hell they call the latest (Westmere? "unrestricted guest execution", real mode, big real mode), nested page tables, etc.

Okay, maybe that wouldn't help your USB for DOS cause specifically, but just in general, I mean, I think people (e.g. MS including Hyper-V in Win8 by default, even for home use) are starting to want people to use virtualization for any legacy concerns. Hence trying to run two OSes on the actual cores without an OS is probably less of a goal for them than running under modern virtualization. (Perhaps you can somehow write a very limited hypervisor and somehow use that to implement faster mode switching, I dunno. That's the only "real" reason, no pun intended, to bring this up, heh.)

> Also in DPMI/PM, most INT calls need to be reflected to real mode. In
> addition, I/O requests should usually be reflected to real mode (though
> they normally aren't by default), in case there is any I/O virtualization
> going on. Even in a "normal" system there may be a lot of mode switching
> going on, far more than I think most people realize.

It's odd that on the one hand, everybody wants ultra speed, but on the other hand they want ultra compatibility. I don't know if you can really have both. At least, the PC market has gotten incredibly fractured trying to chase these two goals (and fast hardware advancements, of course). I'm not complaining, but it does seem like we're suffering from the weight of it all. (And obviously I think throwing everything away and starting from scratch isn't even feasible, fun, or recommended. But no one can agree which part to preserve as everybody's usage is different.)

> Requiring a specific version of DOS (like a 32-bit one), or a specific
> version of EMM, or even requiring an EMM at all, isn't a viable
> alternative, IMO. USB needs to work no matter what, even if it works
> slowly.

As opposed to requiring latest Windows (which most developers do) or latest Linux .DEB or .RPM distro (if even) or latest Mac OS X upgrade? That's the way the world works: find one (obscure) solution, and force it down everyone's throat, esp. for money. It's annoying. I mean, I get it, it simplifies some things, but it's not a universal solution. That's why compatibility is important.

Honestly, it's kinda dopey that we don't have portable drivers or binaries for all these various x86 systems. Am I the only one who thinks it's dumb that x86 OS #1 can't run code from x86 OS #2 despite being written for the same language (e.g. C89)? At least FreeBSD partially got that right by emulating others. Maybe that's why Java is so popular for modern developers (allegedly #1 or #2 most popular programming language by far). But that seems too high-end, difficult, non-standard, and of course high requirements for my tastes. Meh, there is no good solution.

EDIT: Would the Rosetta OS group's framework sound interesting here?

I wonder sometimes if we'd be better off splitting the IBM PC line into "legacy" and "(modern) multimedia" and "server". Well, I guess that's what we did, almost, with Windows (legacy), Mac (multimedia), Linux (server). Unfortunately, Win64 isn't legacy at all anymore (unless you're one of the crazies who thinks 32-bit is deprecated and obsolete [yes, some people think that!]). It seems everybody nowadays wants to support multimedia, which needs tons of RAM, which needs 64-bit, which needs modern drivers, which needs money. And modern gaming needs 3D acceleration, video, sound, etc., basically high-end multimedia, which needs all of the above. And Windows is big in the gaming world. So even if servers and businesses and home users don't want it, they're stuck with the effects of that support and the lack of 16-bit (DOS) backwards compatibility due to 64-bit due to gaming and multimedia.

I guess in theory we should all switch to Linux/DOSEMU (somewhat buggy), eCS (OS/2, too expensive) or work on the FreeDOS kernel more (too difficult). And before you whine, DOS386, :-P I'm just saying, doing everything in pure DOS is infeasible, not the least because of lack of drivers and developers. Dual booting is fine (I'm doing it now), but it's not a perfect solution, nor is virtualization, nor is NTVDM or MDOS or DOSEMU or whatever, nor buying old (prone to failure) hardware.

RayeR(R)

CZ,
14.09.2011, 03:52

@ Rugxulo

> Honestly, it's kinda dopey that we don't have portable drivers or binaries
> for all these various x86 systems. Am I the only one who thinks it's dumb
> that x86 OS #1 can't run code from x86 OS #2 despite being written for the
> same language (e.g. C89)? At least FreeBSD partially got that right by

SciTech already tried to make such drivers but they're gone... Also I read somewhere that NVIDIA has a unified driver that shares a lot of code between the Windows and Linux versions, but of course there are many differences in the kernel APIs so there must be some layer in between... I think the problem is that there was no such open driver standard in the past, so the groups developing OSes set up their own closed standards.

BTW as I studied the EFI spec, there should be a graphics API available to EFI apps. So when it spreads it may become quite a good standard. It should also provide other functions/drivers. But I'm not sure if this EFI API can be used only during the boot stage or later, all the time. It was mentioned that EFI allows you to load EFI apps from an EFI (FAT32) partition on the HDD (like vendor-specific diag/config tools, and there are already some games too!) so I guess it will be available. But EFI should be running completely in PM, so probably when you boot DOS and switch to RM you will be lost. The question is how long there will be legacy support emulating BIOS services in EFI BIOSes...
So in short, there will be a new API available for using modern HW, but it will be incompatible with old 16-bit DOS.

---
DOS gives me freedom to unlimited HW access.

tom(R)

Germany,
14.09.2011, 13:24

@ bretjohn

> To do an INT 15.87 memory transfer on a Sony laptop (approx 5 years old):
> from Real Mode: ~ 15 ms
> w/ HIMEM: ~ 7.5 ms
> w/ HIMEM + EMM386: ~ 300 us
>
> To do the same thing on a Dell desktop (approx 2 years old):
> from Real Mode: ~ 1 ms
> w/ HIMEM: ~ 1 ms
> w/ HIMEM + EMM386: ~ 40 us

these numbers seem to be somewhat slow but Real Mode INT 15.87 is known to be slow.

you might look at how HIMEM copies extended memory:

if in protected mode, rely on an efficient implementation

if in real mode,
    if FreeDOS
        switch to protected mode, copy data, switch back to RM
    if MSDOS
        use unreal mode and copy data

no need to use an extra CPU core to accelerate (OHCI?) USB

bretjohn(R)

Rio Rancho, NM,
14.09.2011, 19:03

@ tom

> if in real mode,
>     if FreeDOS
>         switch to protected mode, copy data, switch back to RM
>     if MSDOS
>         use unreal mode and copy data

How do you figure there's going to be a huge difference between switching modes yourself and having INT 15.87 do it for you? Other than the mode switches, INT 15.87 doesn't need to do anything that takes a lot of time.

> no need to use an extra CPU core to accelerate (OHCI?) USB

Absolutely not true. Whether the majority of the IRQ code is in RM or PM, there are still going to be several mode transitions required.

If in RM, the only required transitions to PM are to access the MMIO space. The problem is, there will always need to be at least one, and usually several, of these required during each IRQ. It may be possible to "consolidate" some (but by no means all) of them, which is one of the things I will be working through.

If in PM, all INT xx calls and all PIO access to ports >= 100h require a transition to RM and back again. I will try and minimize the number of those required as well, though it may not be possible to eliminate all of them. The larger issue is that when there is some data transferred across the bus, the host driver needs to notify the individual device driver (USBKEYB, e.g.) through a RM call-back address. It may also be possible to queue/consolidate these to reduce the number of transitions required.

There is also the problem of IRQ sharing, where a lot of USB host controllers, and sometimes other PCI devices as well, use the same IRQ. That can also increase the number of mode switches required.

Bottom line is that mode switches don't just happen once each time there is an IRQ, and the IRQ's happen VERY often, so the time they take must be minimized. I was just trying to brainstorm ideas that could help with this, since it is actually a much larger issue than just my USB drivers, but seems to have always been ignored.

tom(R)

Germany,
15.09.2011, 14:43
(edited by tom, 15.09.2011, 18:34)

@ bretjohn

> > if in real mode,
> >     if FreeDOS
> >         switch to protected mode, copy data, switch back to RM
> >     if MSDOS
> >         use unreal mode and copy data
>
> How do you figure there's going to be a huge difference between switching
> modes yourself and having INT 15.87 do it for you? Other than the mode
> switches, INT 15.87 doesn't need to do anything that takes a lot of time.

maybe 'other than mode switches' is the answer, and it's fairly well known that BIOS int15.87 is SLOOOOW.

anyway, on a 500 MHz K6, I'm able to call XMSMove() 590000 times per second in real mode (for blocksize 4..64). that should be fast enough for a keyboard handler (and even EHCI USB should you ever implement this)

in protected mode, this very much depends on the DPMI host, as you are accessing 'memory' that does not exist, so the DPMI host has not necessarily page table descriptors for this memory allocated, and must somehow emulate that. I remember that FD-EMM386 was modified to handle this, but forgot the details

P.S: my I7 920 in real mode can execute 1,900,000 XMSMoves per second

P.P.S: use www.drivesnapshot.de/freedos/himem.exe and run
HIMEM /TEST
to measure these numbers;

bretjohn(R)

Rio Rancho, NM,
15.09.2011, 18:24

@ tom

> in protected mode, this very much depends on the DPMI host, as you are
> accessing 'memory' that does not exist, so the DPMI host has not
> necessarily page table descriptors for this memory allocated, and must
> somehow emulate that. I remember that FD-EMM386 was modified to handle
> this, but forgot the details

I'm not using DPMI, and MMIO is "real" memory -- it's just not RAM. Also AFAIK, you can't assign an XMS handle to a specific known physical memory address above 1 MB, but maybe you know something I don't. XMS moves won't work in this situation unless you can do that. Plus, this needs to work even if XMS isn't available.
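
To illustrate what I mean, the XMS "move extended memory block" call (function 0Bh) takes a structure along the lines of the C sketch below. Each side is named by a handle plus an offset into that handle's block (or, with handle 0, a real-mode segment:offset pointer), so there is no field that could carry an arbitrary physical address above 1 MB. The field and function names are made up for this example, and the packed attribute assumes GCC/DJGPP.

```c
#include <assert.h>
#include <stdint.h>

/* Parameter block for XMS function 0Bh (DS:SI points to it). */
struct xms_move {
    uint32_t length;      /* bytes to move (must be even) */
    uint16_t src_handle;  /* 0 = src_offset is a real-mode seg:off pointer */
    uint32_t src_offset;  /* offset into the source block */
    uint16_t dst_handle;  /* 0 = dst_offset is a real-mode seg:off pointer */
    uint32_t dst_offset;  /* offset into the destination block */
} __attribute__((packed));

/*
 * Hypothetical helper: describe a copy from conventional memory
 * (segment:offset) into an allocated XMS block.
 */
void xms_move_from_conventional(struct xms_move *m,
                                uint16_t segment, uint16_t offset,
                                uint16_t dst_handle, uint32_t dst_offset,
                                uint32_t length)
{
    assert(m != 0 && (length % 2u) == 0);
    m->length = length;
    m->src_handle = 0;   /* handle 0 selects a seg:off pointer */
    m->src_offset = ((uint32_t)segment << 16) | offset;
    m->dst_handle = dst_handle;
    m->dst_offset = dst_offset;
}
```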

> P.P.S: use ftp://www.drivesnapshot.de/freedos/himem.exe and run
> HIMEM /TEST
> to measure these numbers;

The site seems to need a non-anonymous login and password.

tom(R)

Germany,
15.09.2011, 18:40

@ bretjohn

> > in protected mode, this very much depends on the DPMI host, as you are
> > accessing 'memory' that does not exist, so the DPMI host has not
> > necessarily page table descriptors for this memory allocated, and must
> > somehow emulate that. I remember that FD-EMM386 was modified to handle
> > this, but forgot the details
>
> I'm not using DPMI, and MMIO is "real" memory -- it's just not RAM.
right. still a self respecting DPMI host has only page tables for existing memory, not for the full 4GB address space

> Also
> AFAIK, you can't assign an XMS handle to a specific known physical memory
> address above 1 MB,
right. you have to take the code out of HIMEM and put it into your driver.
still a bit easier than 'using multiple CPU cores for USB'
I referred to XMSMove as this is exactly what you need - including mode switching ....

please use www.drivesnapshot.de/freedos/himem.exe and run
HIMEM /TEST
to measure these numbers;

bretjohn(R)

Rio Rancho, NM,
16.09.2011, 01:33
(edited by bretjohn, 16.09.2011, 02:20)

@ tom

> please use www.drivesnapshot.de/freedos/himem.exe and run
> HIMEM /TEST
> to measure these numbers;

Where's the source code?

EDIT: I found it on Japheth's site -- never mind.

RayeR(R)

CZ,
16.09.2011, 02:26

@ tom

Here are my test results with various memory managers (DOS 6.22, C2D E8400)

QEMM386.SYS (speed kick ass but limited to 256MB XMS)
blocksize    4:3016520 iterations in 1 second
blocksize   16:3012128 iterations in 1 second     
blocksize   64:3001442 iterations in 1 second     
blocksize  256:2747853 iterations in 1 second     
blocksize 1024:1995045 iterations in 1 second

JEMMEX.EXE
blocksize    4:1687545 iterations in 1 second
blocksize   16:1695841 iterations in 1 second
blocksize   64:1693863 iterations in 1 second
blocksize  256:1602186 iterations in 1 second
blocksize 1024:1179619 iterations in 1 second

HIMEM.SYS
blocksize    4:445851 iterations in 1 second
blocksize   16:445485 iterations in 1 second
blocksize   64:446652 iterations in 1 second
blocksize  256:440279 iterations in 1 second
blocksize 1024:417670 iterations in 1 second

HIMEM.SYS+UMBPCI.SYS
blocksize    4:439380 iterations in 1 second
blocksize   16:439380 iterations in 1 second
blocksize   64:439377 iterations in 1 second
blocksize  256:433588 iterations in 1 second
blocksize 1024:411454 iterations in 1 second
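
To compare these with the INT 15.87 timings earlier in the thread, iterations per second can be turned into time per call and bytes per second. A small C sketch (function names made up):

```c
#include <assert.h>

/* Average time of one call in nanoseconds, from calls per second. */
double ns_per_call(unsigned long iterations_per_second)
{
    assert(iterations_per_second > 0);
    return 1e9 / (double)iterations_per_second;
}

/* Payload throughput in MB/s (1 MB = 1,000,000 bytes). */
double megabytes_per_second(unsigned long iterations_per_second,
                            unsigned long block_size)
{
    return (double)iterations_per_second * (double)block_size / 1e6;
}
```

For example, 445851 iterations per second (HIMEM.SYS, blocksize 4) is about 2.2 us per call, and 3016520 (QEMM386.SYS, blocksize 4) is about 0.33 us per call.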

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Rio Rancho, NM,
16.09.2011, 18:06

@ tom

I will have some time to mess with this over the weekend, and will let you know what happens.

I have looked over the code a little bit in the meantime. Based on what I see in the code and comments, it looks like the speed issues with INT 15.87 from RM may be more related to fiddling with the A20 line, and not the time it takes to switch CPU modes like I though it was. Is that correct?

If so, would it be possible to "patch" this with HIMEM? 95% of what you need is already there. HIMEM even already traps INT 15.87, but just messes with A20 and lets the BIOS do the actual copying. If neither of the two addresses is in the HMA (or maybe even if they are), couldn't HIMEM just do the copy (hopefully much quicker than the BIOS does) and leave the BIOS out of it completely?

RayeR(R)

Homepage

CZ,
16.09.2011, 18:22

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> Based on what I
> see in the code and comments, it looks like the speed issues with INT 15.87
> from RM may be more related to fiddling with the A20 line, and not the time
> it takes to switch CPU modes like I thought it was. Is that correct?

I think that A20 is fiddled only when accessing HMA so it's useless for reaching MMIO at higher address. And when himem accesses XMS it shouldn't do extra settings of A20. It seems to me like 2 different things without any relation.

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
17.09.2011, 02:03

@ RayeR
 

Using Multiple CPU Cores in DOS?

> I think that A20 is fiddled only when accessing HMA so it's useless for
> reaching MMIO at higher address. And when himem accesses XMS it shouldn't do
> extra settings of A20. It seems to me like 2 different things without any
> relation.

That seems logical to me also, but there must be something going on in the BIOS that makes it so slow. It seems like all it would need to do is a variation on what HIMEMX does, which should be very fast (haven't done my testing yet, though).

I did notice in the code for the latest HIMEMX that basically all the INT 15.87 trap does is keep track of the A20 state, so the BIOS must be doing something with A20. And, in the comments at the top of the source it talks about how slow this is. I'm just guessing that maybe the BIOS messes with A20 every time, even if it doesn't actually need to. Maybe Tom/Japheth/whoever (somebody who's worked on HIMEMX or similar so has had to do some research already) can shed some light on it.

Laaca(R)

Homepage

Czech republic,
17.09.2011, 07:47

@ bretjohn
 

Using Multiple CPU Cores in DOS?

The discussion about multicore under DOS has turned in another direction; however, yesterday this entry appeared on the FreeDOS discussion board:

the intel processor manuals are at:
http://www.intel.com/design/corei7ee/documentation.htm
programming 3a section 7.6 is of specific interest, as it contains stuff about multicore and multithreaded programming for IA-32 procs.

---
DOS-u-akbar!

tom(R)

Homepage

Germany,
17.09.2011, 14:09

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> > I think that A20 is fiddled only when accessing HMA so it's useless for
> > reaching MMIO at higher address.
wrong. A20 must be fiddled with for every access above 1M.

in real life, A20 is disabled on program start, is enabled by the next DOS interrupt, and stays enabled until the next program is started.

but still the int15.87 handler (and XMSMove) must check for the state.

> That seems logical to me also, but there must be something going on in the
> BIOS that makes it so slow.
measurements show that something is making it slow - whatever it is.

> I did notice in the code for the latest HIMEMX that basically all the INT
> 15.87 trap does is keep track of the A20 state, so the BIOS must be doing
> something with A20.
the BIOS doesn't care about A20. unexpected things would happen if int15.87 is called with A20 disabled.


it might have been a good idea if HIMEM had implemented int15.87 to make it much faster and exported it to the rest of the world. but:

nobody was supposed to call int15.87. memory above 1 M is abstracted by the XMS handler; nobody else should use physical addresses. no need to export a faster version of a 'forbidden' function

so far no HIMEM (but all EMM386) implement int15.87, so this would make your USB driver dependent on a particular HIMEM implementation

it was you who insisted that the USB driver should also work without HIMEM

it's probably easiest if you copy the HIMEM code to your driver

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
18.09.2011, 04:27

@ tom
 

Using Multiple CPU Cores in DOS?

> > ... but there must be something going on in the BIOS that makes it so
> > slow.
> measurements show that something is making it slow - whatever it is.

I did some experimenting today on one of my computers, and it looks like toggling the A20 line is the sole culprit. I can detail the exact procedures I took if someone wants. Anyway, I can get the timing down to approximately the same as V86 mode by taking A20 out of the equation.

> the BIOS doesn't care about A20. unexpected things would happen if int15.87
> is called with A20 disabled.

That may be true for some BIOS's, but not all. At least on the system I was testing today, the BIOS messes with the A20 line even if it doesn't need to (that is, it takes the time to enable A20 even if it's already enabled).

Anyway, I think I have a way around this problem figured out. Unfortunately, in order to work, it's going to need the A20 line on all the time. I need to get some opinions on how "appropriate" this is, though.

I've seen statements that vary from (paraphrasing), "There are a lot of programs that depend on A20 being disabled so you shouldn't EVER leave it enabled," to "There are only a few poorly written 8086-era programs that depend on A20 being disabled, and nobody uses them any more." I think the latter statement is more accurate, but just wondered what others feel/know about the real situation.

Thanks for all your help Tom, BTW.

tom(R)

Homepage

Germany,
19.09.2011, 15:03

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> I've seen statements that vary from (paraphrasing), "There are a lot of
> programs that depend on A20 being disabled so you shouldn't EVER leave it
> enabled," to "There are only a few poorly written 8086-era programs that
> depend on A20 being disabled, and nobody uses them any more." I think the
> latter statement is more accurate, but just wondered what others feel/know
> about the real situation.

it's somewhere in the middle.
almost no program ever relied on A20 being enabled or disabled.

BUT early versions of Microsoft LINK /EXEPACK and early versions of PKLITE (pre 1991) had a bug, and relied on A20 being disabled.

for this reason MSDOS first invented LOADFIX, then started programs with A20 disabled, and reenabled A20 on the first int 21, and left it enabled.

in my own humble opinion, it's pointless as programs that haven't been updated since ~1991 are probably obsolete, deserve LOADFIX, and my HIMEM's behaviour was 'enable A20 and leave it in this state'.

Bart thought different, and persuaded me to behave like MSDOS, and disable A20 on program start.

anyway, checking for A20 status is easy, like

status = memcmp(0000:80, FFFF:90, 16)

only if status is disabled, you should enable, then copy, then disable A20
status will be enabled most of the time
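
The one-line check above relies on the fact that, with A20 masked, FFFF:0090 wraps around to the same physical bytes as 0000:0080. A minimal Python model of that wrap-around follows; it simulates memory in a bytearray rather than touching real hardware, and the helper names (`physical_address`, `a20_looks_enabled`, `make_memory`) are invented for this sketch.

```python
A20_BIT = 1 << 20


def physical_address(segment, offset, a20_enabled):
    """Real-mode segment:offset to physical address, honoring the A20 gate."""
    linear = (segment << 4) + offset
    return linear if a20_enabled else linear & ~A20_BIT


def a20_looks_enabled(memory, a20_enabled, length=16):
    """Model of memcmp(0000:80, FFFF:90, 16): differing bytes mean A20 is on."""
    low = [memory[physical_address(0x0000, 0x80 + i, a20_enabled)]
           for i in range(length)]
    high = [memory[physical_address(0xFFFF, 0x90 + i, a20_enabled)]
            for i in range(length)]
    return low != high


def make_memory():
    """1 MiB + 64 KiB of simulated RAM with different data at both locations."""
    memory = bytearray(0x110000)
    memory[0x80:0x90] = bytes(range(16))               # seen via 0000:0080
    memory[0x100080:0x100090] = bytes(range(16, 32))   # seen via FFFF:0090, A20 on
    return memory


if __name__ == "__main__":
    mem = make_memory()
    print("A20 on  ->", a20_looks_enabled(mem, True))
    print("A20 off ->", a20_looks_enabled(mem, False))
```

One caveat the model makes visible: if the two areas happen to hold identical bytes, the comparison reports "disabled" even with A20 on, so a robust test would presumably also change one copy and compare again.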

> Thanks for all your help Tom, BTW.
:-D

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
19.09.2011, 19:50

@ tom
 

Using Multiple CPU Cores in DOS?

> BUT early versions of Microsoft LINK /EXEPACK and early versions of
> PKLITE (pre 1991) had a bug, and relied on A20 being disabled.
>
> for this reason MSDOS invented first LOADFIX, then started programs with
> A20 disabled, and reenabled A20 on the first int 21, and left it enabled.

Interesting. I never knew that's what LOADFIX was for -- it never made any sense to me before now why a program would have a problem being loaded in the first 64k.

> anyway, checking for A20 status is easy, like
>
> status = memcmp(0000:80, FFFF:90, 16)

Yes, the HIMEMX source has an ASM implementation of the same basic test.

> only if status is disabled, you should enable, then copy, then disable A20
> status will be enabled most of the time

I don't think I want that to be the default behavior, though, since it makes things INCREDIBLY slow -- too slow to depend on in an IRQ handler if there's a way to avoid it. I think my default is going to be to enable A20 and leave it on, but have a user-selectable option to do it "the slow way". Like you, I don't see a lot of point in making that the default behavior given the limited number of programs affected and the availability of LOADFIX.

***

I still think the possibility of using a second core to make the mode switches faster may still be worth investigating, though. Even when you don't have to mess with the A20 line, the mode switches are still noticeably slow when you need to do a lot of them in the background.

tom(R)

Homepage

Germany,
19.09.2011, 22:00

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> I don't think I want that to be the default behavior, though, since it
> makes things INCREDIBLY slow
wrong. A20 will be enabled almost always, and only a few microseconds after program has started; you won't notice a difference

> -- too slow to depend on in an IRQ handler if
> there's a way to avoid it. I think my default is going to be to enable
> A20 and leave it on
changing system state at some random interrupt time might be one of the worst ideas in operating system design ever - even worse than 'my driver needs a second core'

> but have a user-selectable option to do it "the slow
> way".
have a user selectable option to do it the fast way

> Like you, I don't see a lot of point in making that the default
> behavior given the limited number of programs affected and the availability
> of LOADFIX.
there's a tiny difference.

with these old programs, without LOADFIX, they would print a predictable error message, and exit.

now randomly enabling A20 at some random interrupt time removes the predictability (of failure), and instead introduces the feature 'works most of the time'

> I still think the possibility of using a second core to make the mode
> switches faster may still be worth investigating, though.
as said above, mode switches are a few hundred thousand per second.

> Even when you
> don't have to mess with the A20 line, the mode switches are still
> noticeably slow when you need to do a lot of them in the background.

do you really need an entire core for this USB driver ?
one core exactly and only for your driver ?
or do you intend to run a real operating system on the second core, with
an API how multiple drivers can share this core, how they synchronize with the real mode part, ...

I don't expect to see this implemented this century ;)

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
19.09.2011, 23:19

@ tom
 

Using Multiple CPU Cores in DOS?

> wrong. A20 will be enabled almost always, and only a few microseconds after
> program has started; you won't notice a difference

IRQ's don't stop just because a "real" program starts running. Unless I'm missing something, it doesn't make sense to have IRQ's execute quickly at a command prompt and slowly while a program is running. At least that seems to me to be what you're suggesting.

> changing system state at some random interrupt time might be one of the
> worst ideas in operating system design ever - even worse than 'my driver
> needs a second core'

That's not what I'm suggesting. I'm saying to leave A20 on all the time in case background processes (TSR's and Device Drivers) need access to extended memory (not necessarily XMS memory). I think what you're saying is to leave A20 disabled while the system is not at a command prompt, and for TSR's that need it to be turned on to repeatedly (and slowly) toggle it on and off while processing their IRQ's when a "real" program is running.

> have a user selectable option to do it the fast way

I think the default should be the fast way, since almost nobody needs it to be the slow way.

> now randomly enabling A20 at some random interrupt time removes the
> predictability (of failure), and instead introduces the feature 'works most
> of the time'

Again, that's not what I'm suggesting -- though it seems to be what you're suggesting. We're apparently not communicating very effectively.

> as said above, mode switches are a few hundred thousand per second.

It could be, but I still think it might be possible to improve it considerably.

> do you really need an entire core for this USB driver ?
> one core exactly and only for your driver ?

No, that's not what I'm suggesting at all.

> or do you intend to run a real operating system on the second core, with
> an API how multiple drivers can share this core, how they synchronize with
> the real mode part, ...

I wasn't making myself clear. I was thinking of something like implementing this with INT 15.89 and/or VCPI, which would in turn affect DPMI, DPMS, and other "high level" environments that rely on those services for PM. I think this could be done transparently so that the "high level" environments wouldn't even need to know which core they were running on -- everything would (at least hopefully) just be faster. This wouldn't benefit programs that manipulate CR0 themselves.

tom(R)

Homepage

Germany,
20.09.2011, 12:02

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> > wrong. A20 will be enabled almost always, and only a few microseconds
> after
> > program has started; you won't notice a difference
>
> IRQ's don't stop just because a "real" program starts running. Unless I'm
> missing something, it doesn't make sense to have IRQ's execute quickly at a
> command prompt and slowly while a program is running. At least that seems
> to me to be what you're suggesting.

seems my english writing skills need refinement. retrying:


A20 will be enabled by each INT 21, (usually DOS lives in memory above 1M), and left enabled.
A20 is disabled when a program is started.
as most programs call int 21 sooner or later, A20 is reenabled again by int 21, and left enabled.

therefore it's fast, almost all time.
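
The sequence described above can be modeled as a tiny state machine, which shows how often an IRQ handler using the "check, and toggle only if disabled" approach would actually hit the slow path. This is only a sketch of the behaviour as stated in this post; the class, the event names and the toggle counter are made up for illustration.

```python
class A20Model:
    """Toy model: EXEC disables A20, any INT 21 re-enables it and leaves it on."""

    def __init__(self):
        self.enabled = True
        self.slow_toggles = 0

    def exec_program(self):
        self.enabled = False          # A20 is disabled when a program is started

    def int21(self):
        self.enabled = True           # re-enabled by INT 21 and left enabled

    def irq_copy(self):
        # Check the state first; only if A20 is off do we enable it,
        # copy, and disable it again (two slow toggles in this model).
        if not self.enabled:
            self.slow_toggles += 2
        # the actual copy to or from memory above 1 MB would happen here


def run(events):
    """Feed a list of 'exec' / 'int21' / 'irq' events; return slow toggles."""
    model = A20Model()
    handlers = {
        "exec": model.exec_program,
        "int21": model.int21,
        "irq": model.irq_copy,
    }
    for event in events:
        handlers[event]()
    return model.slow_toggles


if __name__ == "__main__":
    # One IRQ lands in the window between EXEC and the first INT 21.
    print(run(["exec", "irq", "int21", "irq", "irq", "irq"]))
```

In this model only IRQs that arrive between program start and the program's first INT 21 pay for toggling; whether that window is short in practice depends on the program.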

DOS386(R)

20.09.2011, 12:24

@ tom
 

Using Multiple CPU Cores in DOS?

tom wrote:

> therefore it's fast, almost all time.

What is fast ? INT $21 ?

tom wrote:

> the BIOS doesn't care about A20. unexpected things would
> happen if int15.87 is called with A20 disabled.

but

http://www.ctyme.com/intr/rb-1527.htm wrote:

> 03h address line 20 gating failed

:confused:

What about accessing MMIO (all addresses having A20 bit = ZERO) without "fiddling" A20 ?

---
This is a LOGITECH mouse driver, but some software expect here
the following string:*** This is Copyright 1983 Microsoft ***

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
20.09.2011, 18:23

@ tom
 

Using Multiple CPU Cores in DOS?

> A20 will be enabled by each INT 21, (usually DOS lives in memory above 1M),
> and left enabled.
> A20 is disabled when a program is started.
> as most programs call int 21 sooner or later, A20 is reenabled again by int
> 21, and left enabled.
>
> therefore it's fast, almost all time.

I can see where somebody may want this as an option, but it definitely shouldn't be the default. This is potentially penalizing (performance-wise) the VAST majority of programs that work like they're supposed to, and catering to a few rare, buggy programs that have another option (LOADFIX) available to them anyway.

tom(R)

Homepage

Germany,
20.09.2011, 20:17
(edited by tom, 21.09.2011, 12:43)

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> > A20 will be enabled by each INT 21, (usually DOS lives in memory above
> 1M),
> > and left enabled.
> > A20 is disabled when a program is started.
> > as most programs call int 21 sooner or later, A20 is reenabled again by
> int
> > 21, and left enabled.
> >
> > therefore it's fast, almost all time.
>
> I can see where somebody may want this as an option, but it definitely
> shouldn't be the default.
it IS the default, and it costs NOTHING.

A20 is enabled ALL THE TIME except a few microseconds..


btw: in protected mode, A20 switching is implemented by leaving A20 enabled all the time, but on Enable/DisableA20 the page tables are modified so that
the 'visible' memory at ffff:10.. ffff:ffff is either physical memory 000000 or 00100000
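
A rough model of the page-table trick just described: the 16 pages covering ffff:10..ffff:ffff (linear 100000h to 10FFEFh) are mapped either to themselves or to the first 64 KiB of physical memory. This is a sketch using a plain dictionary as the "page table"; the names are invented and no real memory manager's code is reproduced here.

```python
PAGE_SIZE = 4096
HMA_FIRST_PAGE = 0x100000 // PAGE_SIZE   # first page above 1 MiB
HMA_PAGES = 16                           # 64 KiB reachable via FFFF:xxxx


def build_page_table(a20_enabled):
    """Identity-map the first 1 MiB + 64 KiB; alias the HMA pages if A20 is 'off'."""
    table = {page: page for page in range(HMA_FIRST_PAGE + HMA_PAGES)}
    if not a20_enabled:
        for i in range(HMA_PAGES):
            table[HMA_FIRST_PAGE + i] = i   # wrap around to physical 000000h
    return table


def translate(table, linear):
    """Translate a linear address to a physical address through the table."""
    page, offset = divmod(linear, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    linear = (0xFFFF << 4) + 0x0010          # FFFF:0010
    print(hex(translate(build_page_table(True), linear)))
    print(hex(translate(build_page_table(False), linear)))
```

The physical A20 gate never changes in this scheme; only 16 page-table entries do.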

> This is potentially penalizing
> (performance-wise) the VAST majority of programs that work like they're
> supposed to, and catering to a few rare, buggy programs that have another
> option (LOADFIX) available to them anyway.
btw: this is penalizing the poor *users*, not the programs. the programs don't care

RayeR(R)

Homepage

CZ,
19.09.2011, 15:01

@ tom
 

Using Multiple CPU Cores in DOS?

> > > I think that A20 is fiddled only when accessing HMA so it's useless for
> > > reaching MMIO at higher address.
> wrong. A20 must be fiddled with for every access above 1M.

Yes, but I thought that when accessing any memory above 1 MB in PM, the CPU uses some native mode where all CPU address pins go directly to memory and the A20 hack logic is disabled. I believed this is used only in real mode to access the HMA. But I may be wrong.

---
DOS gives me freedom to unlimited HW access.

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
19.09.2011, 20:25

@ RayeR
 

Using Multiple CPU Cores in DOS?

> Yes but I thought that when accessing all mem >1MB in PM it use some native
> mode where all CPU address pins go directly to memory and A20 hack logic is
> disabled. I believed this is used only in real mode to access HMA. But I may
> be wrong.

Since I hadn't messed directly with PM before (all I had ever done was set up some Descriptor Table entries), I thought that as well. But now that I've done some testing, I know it's not true. A20 has to be enabled to access memory above 1 MB, even if the CPU is in PM. I tried it with A20 disabled -- it didn't crash, but the data was garbage.

RayeR(R)

Homepage

CZ,
20.09.2011, 03:14

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> Since I hadn't messed directly with PM before (all I had ever done was set
> up some Descriptor Table entries), I thought that as well. But now that
> I've done some testing, I know it's not true. A20 has to be enabled to
> access memory above 1 MB, even if the CPU is in PM. I tried it with A20
> disabled -- it didn't crash, but the data was garbage.

Aha, thanks for the explanation. So when going to PM, the A20 logic needs to be enabled, and it is left enabled for the whole time the system runs in PM, so it doesn't cause a performance loss. But when switching back to RM, the A20 logic is disabled (though it seems that's not necessary for newer programs), and that takes extra time (enabling does too). If I remember well, on old PCs A20 was controlled via the KBC chip through some I/O port, so it was really slow (the I/O cycle went to the KBC through the slow ISA bus). On a modern PC it should be faster, but it's legacy stuff, so maybe little attention is paid to good performance now...

---
DOS gives me freedom to unlimited HW access.

DOS386(R)

13.09.2011, 07:11
(edited by DOS386, 13.09.2011, 07:23)

@ bretjohn
 

Using Multiple CPU Cores in DOS?

> why not have one CPU/Core running in RM/V86 and another running in PM

IIRC the problem we discussed 2 years ago was that you were using INT $15 / $87 incorrectly, resulting in a 16-byte limit instead of 64 KiB, plus redundant mapping code. This problem is not fixed in the latest version 2010-Jan-30.

Above in many posts I see many ideas (use multiple cores, use some special DPMI host, use DPMS, use virtualization) ... but sorry I don't like any of them. Why? Because they just introduce unnecessary requirements. One day we could see:

> Great USB DOS driver by Bret
> Supports UHCI OHCI EHCI XHCI
> needs virtualization capable CPU with at least 2 cores otherwise only UHCI will work

So my suggestions:

* Try to reduce the number of mode switches
* Get it working now, get it working fast later

PS: mode switches are inherently evil ... see RayeR's DJGPP issues or FreeBASIC graphics issues :-(

PPSS: Georg's DOSUSB can be pretty fast with EHCI ... unfortunately there are stability issues, probably related also to interrupt hazards and mode switches :-(

> just saying, doing everything in pure DOS is infeasible

So the final DEATH of DOS ain't dead is imminent ???

You don't have any alternative BTW (I don't need several 10 GiB of Loonix/Win8/VirtualBloat/OSama2/... crap).

PS: A HACK :

* switch to PM (quasi permanently)
* implement USB driver + HD driver + FAT(crap) + file manager
* copying occurs in PM only
* when done, fix BIOS clock and return to RM and DOS

This is not a driver, but allows to use USB storage devices from DOS.

---
This is a LOGITECH mouse driver, but some software expect here
the following string:*** This is Copyright 1983 Microsoft ***

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
13.09.2011, 19:14

@ DOS386
 

Using Multiple CPU Cores in DOS?

> IIRC the problem we discussed 2 years ago was that you were using INT $15 /
> $87 incorrectly, resulting in a 16-byte limit instead of 64 KiB,
> plus redundant mapping code. This problem is not fixed in the latest
> version 2010-Jan-30.

This is fixed in what I'm testing with now. Even if it weren't, though, it doesn't affect the HUGE relative disparity in switching speed between RM and V86 (a factor of 25~50 on my test machines), nor the fact that it's still relatively slow even from V86 mode. The only way to circumvent this is to not switch, or to increase the switching speed.

> Above in many posts I see many ideas (use multiple cores, use some special
> DPMI host, use DPMS, use virtualization) ... but sorry I don't like any of
> them. Why? Because they just introduce unnecessary requirements.

These things aren't necessary for things to work at all, they just may be necessary for things to work fast enough to be very useful.

> * Try to reduce the number of mode switches
> * Get it working now, get it working fast later

Exactly what I'm doing, and why I already stated that I would not be looking into the multi-CPU/Core option myself, at least any time soon. If it's possible to do it, though, it could help a LOT of programs, not just mine.

> PS: mode switches are inherently evil ... see RayeR's DJGPP issues or
> FreeBASIC graphics issues :-(

Perhaps "evil", but unfortunately necessary with today's systems. Modern hardware specs (ACPI, SATA, MMIO, fast USB, ...) are designed to be run from PM (or SMM, in some cases), not RM. It takes a lot of "hacking" and compromises to get it to work from RM.

Life would certainly be a lot easier if we just fell in line with the rest of the sheep, blew off DOS altogether, and just went exclusively to Windows or Linux. But I'm not going there.

> This is not a driver, but allows to use USB storage devices from DOS.

There's a lot more to USB and DOS than storage devices, unfortunately.

FFK(R)

Homepage

18.09.2011, 04:44

@ DOS386
 

Using Multiple CPU Cores in DOS?

>
> Above in many posts I see many ideas (use multiple cores, use some special
> DPMI host, use DPMS, use virtualization) ...

I guess that the best choice is a standardised DPMI 1.1 multi-core, multi-thread extension.

> but sorry I don't like any of
> them. Why? Because they just introduce unnecessary requirements.

Multiple cores should be very useful for a modern graphics library.
For example, I can split the video screen in two, then make two instances of DUGL, each one rendering its half of the screen on a separate CPU core. This means that we could render up to twice as fast as on a single core!
I could also, for example, do all the 3D transformation on one CPU core and do the rendering on another.

RayeR(R)

Homepage

CZ,
19.09.2011, 14:50

@ FFK
 

Using Multiple CPU Cores in DOS?

> I guess that the best choice is a standardised DPMI 1.1 Multi-cores,
> Multi-thread extension,

DPMI 1.1? Where? Or do you mean to create a DPMI 1.1 standard yourself...

---
DOS gives me freedom to unlimited HW access.

FFK(R)

Homepage

21.09.2011, 01:30

@ RayeR
 

Using Multiple CPU Cores in DOS?

> > I guess that the best choice is a standardised DPMI 1.1 Multi-cores,
> > Multi-thread extension,
>
> DPMI 1.1? Where? Or do you mean to create DPMI 1.1 standard by yourself...

I mean that the DOS community could specify and implement a new 1.1 extension of DPMI, adding support for multi-core / multi-thread / hyper-threading / PAE ...
I don't have enough knowledge to implement this by myself, but I would be a happy user if, for example, Japheth implemented this for us :-)

DOS386(R)

20.09.2011, 09:10

@ bretjohn
 

USBDOS | A20-BUG | no need for 1'000'000'000 cores

> I did some experimenting today on one of my computers, and it
> looks like toggling the A20 line is the sole culprit.

Voila. Problem gone. No need for multi-core "technology" :-)

> statements that vary from (paraphrasing), "There are a lot of programs that depend
> on A20 being disabled so you shouldn't EVER leave it enabled,"

No right to exist for those. ;-)

> "There are only a few poorly written 8086-era programs that depend on A20 being
> disabled, and nobody uses them any more."

Right.

> but just wondered what others feel/know about the real situation

See above.

> I guess that the best choice is a standardised DPMI 1.1 Multi-cores, Multi-thread extension

Right. Considering that not a single useful DOS standard has been established during last 20 years. Whenever the need for a standard is mentioned in some forum, people start digging inside >= 20 years old problems private to Macro$oft and soon also making exciting "discoveries" :clap:

> > > I think that A20 is fiddled only when accessing HMA so it's useless for
> > > reaching MMIO at higher address.
> wrong. A20 must be fiddled with for every access above 1M.

Are you sure ??? Are all lines >= A20 disabled or only A20 ??? Because if only A20 is dead, then you can perfectly use PM and access MMIO ... as long as you access only addresses having the a20 bit ZERO. This should be true even for a VESA LFB 640x480x24bpp, as the VESA LFB is always aligned to an integer multiple of 64 MiB and for this mode smaller than 1 MiB. This should be also possible for OHCI/EHCI, assuming the MMIO size is below 1 MiB and also sufficiently aligned (2 MiB at least). Untested.
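
The alignment argument can be checked with a few lines of arithmetic. The sketch below assumes, as this post does, that a disabled A20 gate masks only address bit 20 and leaves the higher address lines alone; that assumption is exactly the open question here, and the example base addresses are hypothetical.

```python
A20_BIT = 1 << 20
TWO_MIB_SHIFT = 21


def untouched_by_a20_mask(base, size):
    """True if no address in [base, base + size) has bit 20 set."""
    if size <= 0:
        raise ValueError("size must be positive")
    last = base + size - 1
    same_window = (base >> TWO_MIB_SHIFT) == (last >> TWO_MIB_SHIFT)
    # Inside one 2 MiB window, bit 20 is clear exactly in the lower half,
    # so it is enough to look at the last address of the range.
    return same_window and (last & A20_BIT) == 0


if __name__ == "__main__":
    lfb_size = 640 * 480 * 3                     # 640x480x24bpp frame buffer
    print(untouched_by_a20_mask(0xE0000000, lfb_size))   # hypothetical LFB base
    print(untouched_by_a20_mask(0xFEA00000, 4096))       # hypothetical MMIO page
    print(untouched_by_a20_mask(0xFEB00000, 4096))       # bit 20 set in the base
```

A range passes only if it sits entirely in the lower 1 MiB of a 2 MiB-aligned window, which matches the "below 1 MiB and aligned to at least 2 MiB" condition stated above.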

> BUT early versions of Microsoft LINK /EXEPACK and early versions of PKLITE
> (pre 1991) had a bug, and relied on A20 being disabled.

:confused:

> and my HIMEM's behaviour was 'enable A20 and leave it in this state'.

:-)

> I still think the possibility of using a second core to make the mode switches faster

I don't think so. Huge cost for probably no benefit. And I ASS'ume the A20-BUG affects both cores the same way ?

> changing system state at some random interrupt time might be one of the worst
> ideas in operating system design ever - even worse than 'my driver needs a second core'

Really ??? AFAIK mode switches inside IRQ are a "standard" DPMI approach. NOT amused, either.

> now randomly enabling A20 at some random interrupt time removes the predictability

Leave A20-line ON and A20-BUG OFF, point.

> If I remember well A20 was controlled via KBC chip through some IO port on old PC

most PC ... and old KBC (+PIT) ports are HELL SLOW (cca 1000 cycles per access) ... not to talk about "wait until KBC ready" loops :-(

Solution ideas:

- Georg's driver has the /R option to completely avoid PCI MMIO and INT $15 / $87.

- Try to ignore A20 and access addresses with a20 line ZERO (sufficient?).

- Execute SMSW, if V86, then INT $15 / $87, if RM, then INC CR0 - completely avoid BIOS.

- Ensure A20-line is always ON while your driver is active (how?).

---
This is a LOGITECH mouse driver, but some software expect here
the following string:*** This is Copyright 1983 Microsoft ***

bretjohn(R)

Homepage E-mail

Rio Rancho, NM,
20.09.2011, 19:34

@ DOS386
 

USBDOS | A20-BUG | no need for 1'000'000'000 cores

> Voila. Problem gone. No need for multi-core "technology" :-)

Problem improved, but not gone. Mode switches are still too slow, IMO. What I'm suggesting only needs two cores, not a billion, and I think could possibly improve mode switching times by a factor of at least 10 (and likely way more than that). It doesn't involve any changes to the single-threaded nature of DOS, either. It just involves modifications to implementations of things like VCPI.

> I don't think so. Huge cost for probably no benefit.

It may not even be possible, or may make things worse instead of better, or may provide no benefit at all. Or it may in fact decrease switching times to virtually zero. You don't know until you actually try, now, do you?

> most PC ... and old KBC (+PIT) ports are HELL SLOW (cca 1000 cycles per
> access) ... not to talk about "wait until KBC ready" loops :-(

The "Fast" A20 switch (Port 92h) seems to be MUCH faster than using the KBC, but is still pretty slow and not universally available. Best not to mess with A20 at all if you can avoid it.

> - Georg's driver has the /R option to completely avoid PCI MMIO and INT $15
> / $87.

It's impossible to avoid PCI MMIO -- that's the only way OHCI and EHCI can work. BTW, one of the first things I tried to do (a long time ago when I first started working on OHCI) was to try and move the MMIO space down into the first 1 MB. I couldn't get it to work on my test system at the time, so abandoned the idea.

MMIO is also HORRIBLY inefficient from a memory resource perspective, so moving it down into the first MB is not desirable. As a comparison, each UHCI controller reserves 32 bytes of PIO space, and only in fact uses 17 of those bytes. OHCI and EHCI use MMIO, and MMIO must be allocated in multiples of 4k memory pages, even if only a few bytes are actually used. Unnecessarily allocating 4k memory pages out of the first MB is not a good use of limited resources.

I may re-visit this again later, though, as a way to speed things up.

> - Try to ignore A20 and access addresses with a20 line ZERO (sufficient?).

Have you experimented with this? It seems to me that A20 really affects A20+, not just A20. Moving MMIO space around after the BIOS has already configured it (if that's required to align it properly) is not very practical, for a number of reasons.

> - Execute SMSW, if V86, then INT $15 / $87, if RM, then INC CR0 -
> completely avoid BIOS.

That's what HIMEMX does, and what I do now, since I know now that I can't trust the BIOS. That doesn't eliminate the A20 issue, though.

> - Ensure A20-line is always ON while your driver is active (how?).

Best you can do is turn it back on when you see it turned off. Anyway, I think the only programs that would even try to turn it off would be Memory Managers / Kernels that work like Tom is suggesting (Disable on an INT 21h EXEC and Enable on any other INT 21h), or a buggy BIOS (INT 15.87 or similar).

RayeR(R)

Homepage

CZ,
23.09.2011, 02:27

@ bretjohn
 

Using Multiple CPU Cores in DOS?

Today I was out for a beer with my friend Rudolf, a coreboot developer, and I asked him about multicore usage. He told me it's not hard to use. The system starts with one core enabled and the other cores halted (but they are initialized during POST, e.g. MTRRs are set). You need to program the APIC to send a message to the second core to start it up. You specify an address where the 2nd core will start executing code. It will start in real mode, so the address must be <1MB. The APIC is controlled via MMIO by accessing some FExxxxxx registers near the top of the address range, so you need to be in pmode or use an INT service to access this high address. I don't know anything about how the APIC works, so we didn't discuss it further. He pointed me to the sources of SeaBIOS, one of the possible coreboot payloads, which contain an example of probing multiple cores and programming the APIC. Executing an ISR on the 2nd core is possible, but it's more complex. It needs the legacy PIC to be disabled and the APIC/LAPIC reprogrammed; I don't remember exactly...
So if anybody has the time and is interested, go ahead and start studying how to deal with the APIC...
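
As a starting point for such a study, here is a sketch that only computes the values involved in waking a second core: the start-up vector derived from the real-mode start address, and the local APIC Interrupt Command Register writes for the usual INIT, SIPI, SIPI sequence. It assumes the local APIC sits at its default base of FEE00000h and has not been relocated; it performs no hardware access, the function names are invented, and the constants should be checked against the Intel manual before being used in a real driver.

```python
LAPIC_DEFAULT_BASE = 0xFEE00000   # assumed default; the base can be relocated
ICR_LOW = 0x300                   # Interrupt Command Register, bits 0-31
ICR_HIGH = 0x310                  # Interrupt Command Register, bits 32-63


def sipi_vector(start_address):
    """Start-up IPI vector: the 4 KiB page number of the real-mode entry point."""
    if start_address % 4096 != 0 or not 0 <= start_address < 0x100000:
        raise ValueError("start address must be 4 KiB aligned and below 1 MB")
    return start_address >> 12


def icr_high(apic_id):
    """Destination APIC ID goes into bits 24-31 of the high ICR dword."""
    return (apic_id & 0xFF) << 24


def init_ipi():
    """INIT IPI: delivery mode 101b (bits 8-10), level assert (bit 14)."""
    return 0x00004500


def startup_ipi(start_address):
    """Start-up IPI: delivery mode 110b (bits 8-10) plus the vector."""
    return 0x00004600 | sipi_vector(start_address)


def ap_startup_writes(apic_id, start_address):
    """(address, value) pairs in write order; the required delays are omitted."""
    high = (LAPIC_DEFAULT_BASE + ICR_HIGH, icr_high(apic_id))
    init = (LAPIC_DEFAULT_BASE + ICR_LOW, init_ipi())
    sipi = (LAPIC_DEFAULT_BASE + ICR_LOW, startup_ipi(start_address))
    return [high, init, high, sipi, high, sipi]


if __name__ == "__main__":
    for address, value in ap_startup_writes(apic_id=1, start_address=0x8000):
        print(f"write {value:08X}h to {address:08X}h")
```

The writes still have to reach physical addresses far above 1 MB, so from DOS this runs into the same mode-switch question as the rest of the thread.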

---
DOS gives me freedom to unlimited HW access.
