DOS ain't dead

mTCP: A Unicode enabled IRCjr is available for testing (Announce)

posted by bretjohn, Rio Rancho, NM, 06.02.2023, 17:23

> Unfortunately, I'm looking for something very different with Telnet and
> IRC. Strict mappings from the code page characters to Unicode are
> published, and those table files require a strict mapping. I need
> something more relaxed ... a lot of Unicode code points have reasonably
> close substitutes available. For example, there are at least two Unicode
> "black diamond" characters that I know of, with just slightly different
> shapes. I map both of those to the character 0x04, which is a diamond, and
> that's close enough for display purposes. There are a lot of variations of
> line drawing characters with different line weights that can be represented
> by the standard line drawing characters, so I map those too. I think for
> 128 different possible code points I have over 300 Unicode characters
> mapping to them.
>
> Technically what I'm doing is not correct, but I'd rather see a black
> diamond or a line drawing character of some sort rather than the standard
> "I can't display this glyph" tofu character. Which is also why I went with
> a text file to allow users to define their own mappings; they can be as
> strict or as sloppy as they want.
>
> I can use the published tables as a starting point, but I suspect for
> display purposes people will want to see the additional mappings.

FWIW, I have a similar philosophy. I wrote a program called UNI2ASCI, which is included with my USB drivers. Strings downloaded from USB devices are stored as Unicode, and I wanted a way to display them even when they weren't "legitimate" ASCII. It took me quite a while, but I went through all the Unicode characters and mapped as many as I reasonably could to _something_ that can be displayed on a "normal" DOS screen. For example, I found 20 different Unicode characters that (in my opinion) looked enough like a "2" that I mapped them that way. Some Unicode characters map to multiple ASCII characters (e.g., the Copyright symbol becomes "(C)"). I treat UNI2ASCI as a sort of "subroutine" that the other USB programs call when they want to display a Unicode string.
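To make the idea concrete, here is a minimal sketch of that kind of lookup, written in Python rather than A86 for readability. It is not the UNI2ASCI code: the table name `FALLBACKS`, the function `to_ascii`, and the handful of entries are all assumptions made up for illustration. It just shows many code points collapsing to one ASCII character, and one code point expanding to several.

```python
# Hypothetical fallback table: Unicode code point -> ASCII replacement string.
# These entries are illustrative only; they are NOT taken from UNI2ASCI.
FALLBACKS = {
    0x00A9: "(C)",  # COPYRIGHT SIGN expands to three ASCII characters
    0x00AE: "(R)",  # REGISTERED SIGN
    0x2026: "...",  # HORIZONTAL ELLIPSIS
    0x00B2: "2",    # SUPERSCRIPT TWO
    0x2082: "2",    # SUBSCRIPT TWO
    0x2461: "2",    # CIRCLED DIGIT TWO
    0xFF12: "2",    # FULLWIDTH DIGIT TWO
}


def to_ascii(text, unknown="?"):
    """Return a printable-ASCII approximation of a Unicode string."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            out.append(ch)                          # already printable ASCII
        else:
            out.append(FALLBACKS.get(cp, unknown))  # look-alike or placeholder
    return "".join(out)


print(to_ascii("\u00a9 2023 Widget\u2026 rev \u2461"))
```

Anything without an entry falls back to the `unknown` placeholder, so the output is always safe to put on a plain DOS text screen.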

I see two major differences between what I did with UNI2ASCI and what I think you're trying to do, though. The first is that UNI2ASCI currently only supports Code Page 437. It takes a LOT of work to do a fairly "complete" Unicode mapping of the upper half of a Code Page, so I only did one. The other major difference is that I did not map anything to the control characters (ASCII < 32), like the diamond character you mention. While those characters _can_ be displayed on the screen, you can have all kinds of problems when you try to copy them from the screen into a file, print them, or send them across a serial link.
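The difference between the two approaches can be sketched like this. Again, this is only an illustration under stated assumptions: the `RELAXED` table, the function `to_cp437`, and the `allow_control_glyphs` switch are hypothetical names, and the four entries are examples rather than anyone's real mapping. The strict part uses Python's built-in `cp437` codec as a stand-in for the published table; the relaxed part adds look-alikes on top, and refuses targets below 0x20 unless told otherwise.

```python
# Hypothetical relaxed table: Unicode code point -> CP437 byte.
# Illustrative entries only, not copied from mTCP or UNI2ASCI.
RELAXED = {
    0x2501: 0xC4,  # BOX DRAWINGS HEAVY HORIZONTAL -> light horizontal line
    0x2503: 0xB3,  # BOX DRAWINGS HEAVY VERTICAL   -> light vertical line
    0x25C6: 0x04,  # BLACK DIAMOND      -> diamond glyph in the control range
    0x2666: 0x04,  # BLACK DIAMOND SUIT -> diamond glyph in the control range
}


def to_cp437(ch, allow_control_glyphs=False, unknown=ord("?")):
    """Map one Unicode character to a CP437 byte value."""
    try:
        # Strict mapping first (Python's codec stands in for the published table).
        return ch.encode("cp437")[0]
    except UnicodeEncodeError:
        pass
    byte = RELAXED.get(ord(ch))
    if byte is None:
        return unknown               # no look-alike known
    if byte < 0x20 and not allow_control_glyphs:
        return unknown               # fine on screen, risky in files/printers/serial
    return byte


print(hex(to_cp437("\u2501")))                             # heavy line -> light line
print(hex(to_cp437("\u2666")))                             # diamond refused by default
print(hex(to_cp437("\u2666", allow_control_glyphs=True)))  # diamond allowed for display
```

With the switch off you get the conservative behavior I described; with it on you get the display-only behavior where the diamond shows up as character 0x04.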

You can download the source code for UNI2ASCI (it's included in the USB Source Code) from my web site:

http://brejohnson.us

It's in A86 format. Eventually, in my copious ;-) spare time I hope to convert all my programs to NASM format in addition to updating them with various items.

 
