|Free Software for DOS|
Text Utilities 4
Spellers, Dictionaries, Text Analysis, Characters
|21 Aug 2006|
|This page:||ASCII TEXT SPELLCHECKERS||WORD LISTS AND DICTIONARIES||WORD COUNT & TEXT ANALYSIS||ASCII CHARTS||CHARACTER TRANSLATION AND STRIPPING|
|ASCII TEXT SPELLCHECKERS|
International Ispell (1) Interactive text and HTML spell checker.
[added 1998-07-03, updated 2005-12-09]
Ispell, an interactive spell checker developed for Unix platforms, can be used as a standalone program or as an external checker for many power editors. This version includes English dictionaries (UK & US), and runs in text or HTML mode. 32-bit DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other).
From the program help:
Whenever a word is found that is not in the dictionary, it is printed on the first line of the screen. If the dictionary contains any similar words, they are listed with a number next to each one. You have the option of replacing the word completely, or choosing one of the suggested words. Commands are:
R       Replace the misspelled word completely.
Space   Accept the word this time only.
A       Accept the word for the rest of this session.
I       Accept the word, and put it in your private dictionary.
U       Accept and add lowercase version to private dictionary.
0-n     Replace with one of the suggested words.
L       Look up words in system dictionary.
X       Write the rest of this file, ignoring misspellings, and start next file.
Q       Quit immediately. Asks for confirmation. Leaves file unchanged.
!       Shell escape.
^L      Redraw screen.
^Z      Suspend program.
?       Show this help screen.
Authors: Geoff Kuenning et al. Port by Eli Zaretskii, Israel (2001).
Geoff Kuenning's International Ispell Home Page.
International Ispell (2) Interactive spell checker, supports 8-bit characters.
[added 1998-04-06, updated 2005-04-16]
This EMX/gcc-compiled DOS & OS/2 port requires at least a 386 PC, but I'd recommend a fast 486 or Pentium with at least 8MB RAM and a disk cache. The package is a very large download, containing executables, source, and multiple language dictionaries (Dutch, English, French and German). The compiled English dictionary requires about 4.7MB disk space (contains at least 210,000 unique words including many technical and scientific terms). Supports 8-bit characters. Supports maintenance of a user ("private") dictionary, which by default is stored in the root directory with the filename _english. All in all, I like the comprehensiveness and "intelligence" of this ISPELL. The program itself loads slowly on a Pentium 60 (w/ 8MB RAM), and is much too slow on a 386/20 (8MB). Requires ANSI.SYS or equivalent, and a DOS extender (included). I wouldn't waste time downloading this package unless you're willing to invest a _little_ time with setup. Package includes C source code.
Core commands are the same as for v3.3, above.
Authors: Geoff Kuenning et al. (1983-1997). Port by Piet Tutelaers, Netherlands (1997).
Download ispellw32.zip (2.5MB).
Geoff Kuenning's International Ispell Home Page.
GNU ispell Interactive spell checker, runs well on older PCs.
[added 1998-04-06, updated 2005-04-16]
This old, but widely distributed 16-bit ispell includes only an English dictionary (38,000 words / 156K on disk). Run the program without parameters to check a single word, or pass it a filespec and it will enter a line-by-line interactive check / correction mode. It can check multiple files in sequence if you pass it a wildcarded filespec. The package lacks usage documentation (but see Downloads, below) and unless you're familiar with ispell, you could end up frustrated. Just hit the "?" key when inside the program (or start with ispell ?) to get the list of navigation commands. Easy to use. I'm sure there are additional hidden features, but I haven't used it much. Runs briskly enough on a 386/20.
Commands are:
R       Replace the misspelled word completely.
Space   Accept the word this time only.
A       Accept the word for the rest of this file.
I       Accept the word, and put it in your private dictionary.
0-9     Replace with one of the suggested words.
<NL>    Recompute near misses. Use this if you interrupted the near miss generator, and you want it to return to this word.
Q       Write the rest of this file, ignoring misspellings, and start next file.
X       Exit immediately. Asks for confirmation y/n. Leaves file unchanged.
!       Shell escape.
^L      Redraw screen.
To exit single-word mode, type ^C. Package includes the Look utility.
Capabilities absent in GNU ispell compared with International Ispell: GNU ispell is not case sensitive, its suffix handling is more primitive, and it won't allow non-alphabetic characters into the dictionary.
Authors: Pace Willisson (1988). Port by Pavel Ganelin (1993).
1993-10-26: v4.0 (despite the version number, this is older than the Unix-based versions 3.x).
Source, full docs: ispell-4.0.tar.gz (379K).
JSPELL Excellent interactive spell checker (English dictionary).
* * * * *
[added 1998-09-17, updated 1998-10-25]
When considering both ease-of-use and versatility, you won't find a better choice than JSPELL. Note: JSPELL may not run on some faster Pentiums (divide overflow error); use SLOWDOWN to avoid the error.
Author: Joohee Jeong (1998). Suggested by Robert Bull, Scott Nesbitt.
Download jspel211.zip (209K).
SpellTest Spell checker for plain or html text; interactive mode or file report (English dictionary).
[added 1999-04-18, updated 2006-03-14]
This speller could be particularly useful to web authors because it ignores HTML codes in documents during a spell check. SpellTest can run in two modes: 1. A simple interactive mode allows manual replacement of unknown terms but has no features like "ignore all" or "add to custom dic"; 2. SpellTest probably functions best as a report-to-file speller. Reported terms are referenced by original document line numbers. No limit on text file sizes. Includes a large 2MB dictionary and user dictionaries are supported. Requires a fast 80386 (80486-100MHz recommended), and about 2MB RAM (4MB recommended).
Usage: spelltst.exe <file> <options>
Options:
-r:<report name>   Report name; by default report.txt.
-n                 Don't load additional dictionaries.
-o                 Online error fixing (ASCII text files only).
-nr                Don't create a report file.
Author: Oleg Stepanyuk / Oddin Software, Russia (1999).
Download spelltst.zip (972K).
GDSPELL Interactive spell checker handles big files. (English dictionary)
* * *
GDSPELL is an easy to use standalone spell checker from the developers of the freeware NE editor (also included here). Both programs use the same dictionary, so you don't need to clutter your hard disk with different dictionaries. Unlike NE, GDSPELL can check big files, and create and use a custom dictionary. Spell checking dialog is similar to those found in popular word processors.
Limitations:
EXE size: 55K; Dictionary size: 370K
(Thanks to Yves Bellefeuille's freeware list for pointing me to this one).
Author: G.D. Davis (1995); distributed by GDSoft.
Download gdsp300b.zip (414K).
Tschek Spell checker outputs list of all misspelled words to screen or file.
* * *
Most of us use word processors or stand-alone dialog spell checkers to perform "on-the-fly" spell checking and correction (e.g., GDSpell). But sometimes these spell checkers can be cumbersome and time consuming because they prompt word by word. If you are spell checking an HTML or technical document with a "dumb" spell checker, this can be tedious. Of course, you could add all those strange words or tags to a user dictionary, but that's no fun either. Or, you could use a spell checker that simply outputs a list of unrecognized words to a file without any prompting or correction. You can browse the output file, quickly locate words that are obvious typos, and manually correct the original document (e.g., using a search / replace tool).
Features:
SPELL DictionaryFile InputFile [OutputFile]
BIGSPELL.BAT
rem USAGE: bigspell any.txt
@echo off
spell 1.dic %1 1.tmp /b
spell 2.dic 1.tmp 2.tmp /b
spell 3.dic 2.tmp 3.tmp /b
spell 4.dic 3.tmp 4.tmp /b
spell 5.dic 4.tmp 5.tmp /b
spell 6.dic 5.tmp misspell.txt /b
del *.tmp
echo Spell check complete. See MISSPELL.TXT
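The report-only approach described above (list every unrecognized word, with no prompting) can be sketched in a few lines of Python. This is an illustration of the idea, not Tschek's actual algorithm; the function names and the one-word-per-line dictionary format are assumptions. Chaining several dictionary files, as the batch file does, is modeled here by merging them into one set.

```python
import re

def load_word_list(lines):
    """Build a lowercase set from an iterable of one-word-per-line entries
    (assumed format; real dictionaries may differ)."""
    return {line.strip().lower() for line in lines if line.strip()}

def unknown_words(text, known):
    """Return the sorted, de-duplicated words of `text` not found in `known`."""
    words = re.findall(r"[A-Za-z']+", text)
    missing = {w.lower().strip("'") for w in words} - known
    missing.discard("")  # a bare apostrophe strips down to an empty string
    return sorted(missing)

if __name__ == "__main__":
    # Several word lists can be merged with set union before checking.
    dictionary = load_word_list(["the", "cat", "sat"]) | load_word_list(["on", "mat"])
    print(unknown_words("The cat szat on the matt.", dictionary))
```

The output list can then be browsed for obvious typos, and the original document corrected by hand.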
Author: Timo Salmi, Finland (1996).
Download tschek15.zip (68K).
More in these pages from Timo Salmi.
Look Look up words (from a word list) to verify spelling.
[added 2001-10-21, updated 2005-04-16]
Look is not a spell checker but rather lists words from a word list file that most closely match a string (i.e., useful for looking up an uncertain spelling). Look is included in some ISPELL distributions but here it is listed separately to bring more attention to it.
Look.exe appears to work a lot like grep (in fact, it requires grep/egrep/fgrep for the -r option). However, it has certain conveniences for looking up words in a spelling list. With no options, look searches a word list file for all words that start with the first characters of the string you give it. Options allow it to ignore caps or small letters, use bona fide regular expression wildcards, and use dictionary order. Look is meant to be used within editors like vi that allow you to run external programs. It can also be used on the command line.
Look appears to be happy using any ASCII spelling list, such as SIL's Word List, or Moby Words (users can add, remove, or modify words in such lists with any text editor). By default, look uses a word list named ISPELL.WOR (included), but you can supply a different file as an option.
usage: look [-dfr] string [file]
-d   dictionary order: consider only letters, digits, and spaces
-f   fold upper case to lower
-r   string is a regular expression
Note: Use of regular expression switch -r requires the programs grep/egrep/fgrep (not included) in path.
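Look's default behavior (list every word that starts with the given string) is easy to picture as a binary search on a sorted word list. The sketch below is only an illustration of that idea in Python, not Look's source code; the function name and the `fold` parameter (loosely modeled on the -f switch) are assumptions.

```python
import bisect

def look(prefix, words, fold=False):
    """Return all entries of `words` that start with `prefix`.

    `words` must already be sorted, unless fold=True, in which case
    a lowercased and re-sorted copy is searched instead.
    """
    if fold:
        prefix = prefix.lower()
        words = sorted(w.lower() for w in words)
    # Every word sharing the prefix sits in one contiguous run that
    # begins at the first entry >= prefix.
    start = bisect.bisect_left(words, prefix)
    matches = []
    for word in words[start:]:
        if not word.startswith(prefix):
            break
        matches.append(word)
    return matches

if __name__ == "__main__":
    word_list = sorted(["receipt", "receive", "recipe", "red"])
    print(look("rece", word_list))
```

Because the list is sorted, the search stops at the first non-matching word instead of scanning the whole file.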
Suggested by Howard Schwartz.
Look.exe is part of the GNU ispell binary and source packages, above.
|WORD LISTS AND DICTIONARIES|
Moby Words English word, name, and phrase lists; 610,000+ entries (ASCII).
* * * * *
[added 2001-10-21, updated 2004-06-29]
Moby Words is part of the Moby Project, a large collection of lists of words and phrases, and works of literature (contents are now in the public domain).
Partial contents of Moby Words:
Author: Grady Ward (1996).
Get more info at the Moby Words page.
If you don't like a 26MB download, go to the Moby Project page for smaller pieces.
SIL's Word List ASCII English, 110,000 words, can function as dictionary for some spellers.
Four text files contain approximately 110,000 English words total. The set can be used as a large dictionary for spellers that can use ASCII-only dictionaries. See Tschek for an example.
From the doc:
This word list includes inflected forms, such as plural nouns and the -s, -ed and -ing forms of verbs. Thus the number of lexical stems represented in the list is considerably smaller than the total number of words.
Author: Evan Antworth / SIL International (1991).
Jorj English dictionary program.
* * *
Jorj is a stand-alone dictionary program, with two EXE variants (compiled for different memory usage) in one package. Jorj can be run in memory resident (pop-up) or non-memory resident modes. One of the provided executables ("Omega") will use XMS memory when the program is run as a TSR.
One unique feature of Jorj is its ability to search for entries even when your spelling is incorrect. Jorj also has a "word scan" feature that will list all entries containing a given search string. The lexicon has some significant drawbacks. The word list is small but adequate (larger in registered version) and definitions are brief and not authoritative. Words are syllabified, but parts of speech are lacking. Even with these shortcomings, JORJ still serves as a handy reference.
EXE size: 35K (alpha) or 64K (omega). Dictionary size: 1.2MB.
Author: George Fredal / Jorj Software (1997).
Download jorj97.zip (652K).
|WORD COUNT & TEXT ANALYSIS|
Also see UXUTL or the GNU Textutils for UNIXish WC.
WCNT Count and analyze word frequency in text and HTML documents.
* * * *
One of the more comprehensive "word count" programs I've encountered. It includes a host of options: Can analyze HTML documents (ignores tags in word counts). Count of lines, characters, non-whitespace characters, words, distinct words and unique words. Average length of words, distinct words and unique words. Sorted word lists with frequencies. Word length distribution histograms. Configurable word sets. DOS code page awareness. Multiple filespecs with wildcards: Outputs combined statistics of all files when passed a filespec with wildcards. Donationware.
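A rough Python sketch of the kind of counting described above follows. It is not WCNT's implementation. In particular, the reading of "distinct" as the number of different words and "unique" as the number of words that occur exactly once is an assumption, as are the function name and the simple tag-stripping pattern.

```python
import re
from collections import Counter

TAG = re.compile(r"<[^>]*>")                      # naive HTML tag matcher
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")    # letters, optional apostrophe part

def word_stats(text, html=False):
    """Count words, distinct words, and words used only once."""
    if html:
        text = TAG.sub(" ", text)  # ignore tags in word counts
    words = [w.lower() for w in WORD.findall(text)]
    freq = Counter(words)
    return {
        "words": len(words),
        "distinct": len(freq),
        "unique": sum(1 for n in freq.values() if n == 1),
        "frequencies": freq.most_common(),  # sorted by descending count
    }

if __name__ == "__main__":
    stats = word_stats("<p>The cat and the <b>dog</b></p>", html=True)
    print(stats["words"], stats["distinct"], stats["unique"])
```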
Author: Branko Radovanovic, Croatia (1997).
Download wcnt120.zip (20K).
wc Simple word count program, from Unix.
* * *
A DOS clone of the Unix wc utility with some added features. Unlike WCNT (above), wc: 1) lists individual file stats when passed a filespec with wildcards; 2) can read from standard input as well as from files. wc also generates error level values for use in batch files.
Author: Roman Nurilov (1997).
Download wc_11.zip (10K).
Word Count (WC) Word counter also counts sentences, calculates readability index.
* * *
Another word count program that can optionally count sentences and generate a rough and ready "readability" index based on a combination of word length and sentence length. Can read from standard input as well as from files.
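The entry does not say which formula this program uses. As a stand-in, the sketch below computes the Automated Readability Index, a published formula that likewise combines word length (characters per word) and sentence length (words per sentence). The tokenizing rules are simplifying assumptions.

```python
import re

def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Treat any run of ., ! or ? as a sentence boundary (a simplification).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

if __name__ == "__main__":
    print(round(automated_readability_index("The cat sat. The dog ran."), 2))
```

Higher scores indicate longer words and sentences, so very simple text can score below zero.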
Author: Bob Ferguson, Netherlands (2000).
Download wc24.zip (15K).
More in these pages from Bob Ferguson.
Text Information (TI) Comprehensive text file statistics generator.
This program generates a wealth of statistics about a text file including size of file, whitespace, lines, blank lines, shortest line, longest line, average line length, average with blanks, number of pages, lines per page. Expects a single filename.
Options:
-A#       Display how many times each letter is used. The '#' represents an optional character to be counted specifically.
-C#       Same as '-A' switch, except # is the numerical ASCII value of the character to be counted specifically (A = 65 or 97, @ = 64).
-F#       Treat words > # characters as long when calculating fog index. # must be included in -F switch (default 9 if no -F switch).
-L#       Display length of each line in the file. # is an optional number specifying how long a 'long' line is, in characters.
-O:name   Prints output to both the screen and an output file. The name of the output file is optional; the input name with .ti extension will be used if no name is given. 'ti input.txt > output.txt' will not work.
-P#       The number of lines per page for calculating number of pages.
-T        Assume non-text file, changes handling of ASCII values 128-255.
-W        Display a list of word lengths used in the file.
-?        This help screen.
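Several of the line statistics listed above are simple to compute. The Python sketch below shows one plausible way to do it. It is not TI's code: the function name, the default of 60 lines per page, and the choice to skip blank lines when finding the shortest line are all assumptions.

```python
def line_stats(text, lines_per_page=60):
    """Basic line statistics for a text, loosely modeled on the list above."""
    lines = text.splitlines()
    lengths = [len(line) for line in lines]
    non_blank = [len(line) for line in lines if line.strip()]
    return {
        "lines": len(lines),
        "blank_lines": len(lines) - len(non_blank),
        "shortest": min(non_blank, default=0),   # assumption: blanks ignored
        "longest": max(lengths, default=0),
        "average": sum(non_blank) / len(non_blank) if non_blank else 0.0,
        "average_with_blanks": sum(lengths) / len(lines) if lines else 0.0,
        "pages": -(-len(lines) // lines_per_page),  # ceiling division
    }

if __name__ == "__main__":
    for name, value in line_stats("abc\n\nabcde\n", lines_per_page=2).items():
        print(name, value)
```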
Author: Quentin J. Christensen, Australia (1999).
Download ti12.zip (30K).
|ASCII CHARTS|
CTRLALT TSR pops up ASCII charts, Hex table, ANSI codes; Mark/paste; more.
* * * *
This may be the oldest DOS program I still use. CtrlAlt was released in 1986, but still serves a useful purpose if you're a DOS lover. Using a variety of mnemonic key combinations, you can pop up all sorts of charts from which you can paste special characters into a document. Includes ASCII, Hex, and ANSI code charts, key scan codes, line drawing characters. Can also mark and copy screen text. Includes more exotic stuff as well. Note: CTRLALT cannot unload itself from memory. This is one program that requires a thorough reading of the documentation; there are no help screens.
Authors: Barry Simon and Richard M. Wilson (1986).
Download ctrlalt.zip (53K).
ASCIITable TSR, mouse compatible ASCII chart.
[added 1998-04-28, updated 2005-07-01]
A nicely enhanced TSR ASCII chart symbol picker; especially useful in editors. Allows creation of a multiple-character string on the chart's editing line (up to 255 symbols). Requires about 4.1K RAM with a default 128 character buffer. README.EXE displays in English and Russian (has the Borland Pascal 7 CRT bug; see fixes).
Other features:
Also in download package: FontGrab v2.1b (font dumper utility), and Int9 v2.0b (reads and displays keyboard scancodes).
Author: Dmitry B. Afanasiev, Russia (1996).
Download ascii350.zip (34K).
Terminate Character Selector (T-CHAR) Non-TSR ASCII chart.
[added 1998-04-28, updated 2005-07-01]
A slick non-TSR ASCII chart which displays ASCII, HEX, and BIN values of selected characters. Returns all values for selected character. Originally part of the Terminate communications software package.
Author: Bo Bendtsen, Denmark (1994). Suggested by Robert Bull.
Download t-char.zip (16K).
|CHARACTER TRANSLATION & STRIPPING|
Also see CONVERT UNIX < > DOS FORMATS.
FIXTEXT Character translation, DOS / MAC / UNIX conversion & more.
* * *
FIXTEXT is a command line utility that performs two general functions. It can convert among DOS, UNIX and MAC text formats. It can also translate (replace) characters or strings within a text. For example, it can convert uppercase letters to lower case letters or it can convert ASCII characters to their ANSI (Windows) equivalents. There is only one hitch to translation: The user must write and maintain separate translation tables. While this permits a great deal of flexibility and customization, it also requires effort to create and edit the translation tables (not difficult, but time consuming). A few specific operations such as trimming leading / trailing line spaces and expanding tabs are hard coded as command line switches.
Author: Bruce Guthrie (2002).
Download FIXT0208.ZIP (79K).
More in these pages from Bruce Guthrie.
XLAT Create custom character translation programs.
* * * *
XLAT is an old but good alternative to FIXTEXT. One feature unique to XLAT is its ability to clone itself into multiple programs, each containing a custom translation table. You won't need to keep track of separate translation tables. In addition, a memory resident version can perform "on the fly" translation when sending strings to a printer. Fast.
From the docs...
It is often useful to have a little utility that translates certain characters within a file to certain others; e.g., if you have received an EBCDIC file, or if you have to deal with ISO-646 or ISO-8859 representations of national characters, which typically differ from the PC's...[I]t would be nice if it were easily customizable... One solution to this would be to have the programme read a translation table at run time; but then, you have to remember about these tables, their format etc., ...and you have to remember to take them along when you move between different machines.
The Xlat package is different: there is just one programme for each type of conversion. Additional flexibility is gained by providing two flavours: a filter flavour, which can be used for disk files and for inter-programme data exchange, and a resident flavour, which is specifically designed for serving a printer...For customizing...there is a companion programme, ConfXlat, meaning 'configure xlat'. You feed it one version of an Xlat file, and you'll have a full screen menu that allows you to change the mappings and create a new incarnation of an Xlat programme...Your versions may have names like 'EBC2ASC', or 'GERM-646', or whatever.
Included sample filters perform these translations: replace non-ASCII characters by near-equivalents; replace non-ASCII characters by blanks; convert EBCDIC to ASCII; convert ROT13 Usenet-style encryption; replace German umlauts (IBM style) by ISO-646 equivalents; and vice versa.
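The common thread in these filters is a fixed 256-entry table that maps each input byte to an output byte. The Python sketch below builds two such tables, one for ROT13 and one that blanks out non-ASCII bytes, to show the principle. It is only an illustration and says nothing about how Xlat stores its tables; all names here are made up for the example.

```python
import string

LOWER = string.ascii_lowercase.encode("ascii")
UPPER = string.ascii_uppercase.encode("ascii")

# ROT13: each letter maps to the one 13 places further on, wrapping around.
ROT13 = bytes.maketrans(
    LOWER + UPPER,
    LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13],
)

# Bytes 0-127 map to themselves; bytes 128-255 become a blank.
NON_ASCII_TO_BLANK = bytes(range(128)) + b" " * 128

def translate(data, table):
    """Apply a 256-byte translation table to a bytes object."""
    return data.translate(table)

if __name__ == "__main__":
    print(translate(b"Hello, World!", ROT13))
    print(translate(b"caf\x82", NON_ASCII_TO_BLANK))
```

Since the table is just data, one small filter routine serves every conversion, and only the table changes.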
Author: Gisbert W. Selke (1990). Suggested by Robert Bull.
Download xlat11.zip (58K).
CharWatch TSR character translator with 4 built-in translation tables.
CharWatch is a TSR program that translates characters "on the fly." It monitors the keyboard, screen and printer. Notable features include four built-in, commonly-used translation tables. Supported languages are English (active by default), French and Spanish. You can also create your own external tables using a very simple table format.
CWATCH [options]
-? or -h        [h]elp
-u              [u]ninstall
-t<tables>      select tables to use (from 0 to 3)
-n<d>           [N]o monitoring for this device
-f <file>       Get translation table in this [f]ile
-l <language>   select [l]anguage
Internal translation tables include:
0: All accentuated characters become non-accentuated.
1: All graphic box characters become text characters (+,-,|).
2: Simple border characters become double.
3: Double border characters become simple.
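Tables 1 to 3 can be pictured as plain character-to-character mappings. The Python sketch below works on Unicode box-drawing characters rather than on raw DOS code page bytes, and covers only the pure single-line and double-line sets (mixed single/double characters are left out). It is an illustration under those assumptions, not CharWatch's actual tables.

```python
SINGLE = "─│┌┐└┘├┤┬┴┼"
DOUBLE = "═║╔╗╚╝╠╣╦╩╬"

# Table 1 analogue: horizontals become '-', verticals '|', all joints '+'.
BOX_TO_TEXT = str.maketrans(SINGLE + DOUBLE, ("-|" + "+" * 9) * 2)
# Table 2 analogue: simple borders become double.
SINGLE_TO_DOUBLE = str.maketrans(SINGLE, DOUBLE)
# Table 3 analogue: double borders become simple.
DOUBLE_TO_SINGLE = str.maketrans(DOUBLE, SINGLE)

def apply_table(text, table):
    """Translate every character of `text` that appears in `table`."""
    return text.translate(table)

if __name__ == "__main__":
    box = "┌─┐\n│x│\n└─┘"
    print(apply_table(box, BOX_TO_TEXT))
    print(apply_table(box, SINGLE_TO_DOUBLE))
```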
Author: Vincent Penquerc'h, France (1997); Suggested by Marianna Van Erp.
Download charw110.zip (10K).
REMOVE Remove carriage returns from ASCII documents.
* * *
Many documents on the Web are ASCII text, and lines are often broken by carriage returns at 60-80 columns. If you import this text into many word processors the carriage returns are retained and interpreted as paragraph marks. This usually fouls your attempts to apply special paragraph styles in your word processor, because each line is now considered a paragraph. To get rid of these paragraph marks you need to run the text file through a filter prior to importing into your word processor. Some word processors may include such a filter, but some don't.
The REMOVE package contains two executables, for DOS and for Windows 3.1. It strips single carriage returns / line feeds while preserving actual paragraph boundaries. It can also optionally preserve tab formatting. Although you can use nearly any search / replace tool to remove CRs and LFs, REMOVE is a friendlier alternative.
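The basic unwrapping step can be sketched in Python as follows. This is a minimal illustration, not REMOVE's algorithm: it assumes that paragraphs are separated by at least one blank line, and it makes no attempt to preserve tab formatting. The function name is hypothetical.

```python
import re

def unwrap(text):
    """Join hard-wrapped lines into paragraphs, keeping paragraph breaks."""
    # Normalize DOS (CR LF) and Mac (CR) line endings to LF first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A blank line (possibly containing spaces) marks a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Within a paragraph, replace each line break with a single space.
    return "\n\n".join(
        " ".join(line.strip() for line in p.split("\n")) for p in paragraphs
    )

if __name__ == "__main__":
    print(unwrap("I am\r\nhere.\r\n\r\nNext\r\npara.\r\n"))
```

Joining with an explicit space keeps the last word of one line from running into the first word of the next.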
Bug: Seems to merge the first two words of paragraphs that start with the word "I" ("I am" becomes "Iam").
This package also contains TextConvert, which can detect and convert among DOS, Mac, and Unix text formats.
Author: Iceman, Finland (1995).
Download remove30.zip (159K).
©1994-2004, Richard L. Green.