$Id: README,v 1.7 1997/04/13 02:59:59 dps Exp $

What is new in version 0.005 of word2x

Update version to 0.005
Fix version number bug
Update config.guess and config.sub
Re-generate configure.in with newer autoconf
Fix various ANSI violations that g++ 2.95 disliked

What is new in version 0.004 of word2x

A stupid bug in word2x_junk_filter::filter_junk, which ignored the last
character read, has been squashed.
Added German support from word2x port EX2.

What was new in version 0.003 of word2x

word2x-0.003 is word2x-0.002 with a major bug in strip.cc eliminated.

word2x-0.002 was 0.001 retro-fitted with some quite new junk filtering
code with lots of tunable parameters (i.e. all of tune.h). This code is
extracted from the evolving, and currently incomplete, source tree of the
next major release. (When that release happens I will stop supporting or
maintaining any 0.00x versions.) The major change is much better junk
filtering, which loses less text and throws out more junk; Unicode
documents should now work.

More and more problem documents are appearing that have OLE junk in
places that break the code. Splitting the document with lls (from the
LAOLA package) and attacking the WordDocument stream works; some time I
will have a usable library that can do this automagically. (In
word2x-0.002 you have a very good chance of tickling the strip.cc bug and
its (buggy) bug trap.)

Please send documents that still cause problems after the suggested
work-around to word2x@duncan.telstar.net. The immediate fix is to try one
of the other two programs. (Free software people are prepared to
co-operate with the "competition".) There are links to all the
"competition" I know of on the word2x home page at
http://word2x.alcom.co.uk (hosted by alcom.co.uk free of charge, despite
the fact that charges normally apply).

Installing word2x

You need a C++ compiler and a version of make that understands how to
make .o files from .cc files, for example GNU make.
If your make does not know how, add a rule. In GNU make syntax the rule is

    %.o: %.cc
        $(CPP) $(CPPFLAGS) -c -o $@ $<

(in the Makefile itself the second line must start with a tab).

Ideally you already have getopt_long in your C library, but you might
not. If you do not, set GETOPT to gopt.o in the Makefile. The getopt_long
expected is the version supplied by the Free Software Foundation in
glibc-1.09.

Please note that a warning about a contravariance violation is normal.

As this program has only recently escaped, YMMV. The main reason for its
escape was the incessant irritation that comp.os.linux.misc posters
express about Word .doc files (IMHO this is justified). I wrote this
program for myself and my Word problem; I let it run wild in the hope
that it is useful for others. [I now know it is helping some people.]

Further information on other converters is available in the list of
converters available via (word2x seems to have a monopoly on converters
from Word to LaTeX that do not require Word and are available on non-MS
platforms).

The program has been compiled on (the first two by me personally):

    Linux 2.1.30 (Unix)
    SunOS (Unix)
    DEC Alpha AXP under OSF/1 (Unix)
    IBM SP/2 (RS6000) under AIX (Unix) [SP/2s are heavy computing power...]

It is known not to compile with Borland C++ 3.1 (PC version). If anyone
manages to compile it on a PC, please tell Duncan Simpson and W.Hennings.
Limited flat (linear) memory might be lethal, especially if your system
lacks alloca.

If you have not learned to steal what is free, then you can send money
(preferably UK funds), postcards, etc. to the author at Frax House,
Kingston Bagpuize, OXON OX13 5AW or, for the next couple of years,
Flat 6, 93 Westridge Road, Southampton. I neither suggest that you donate
nor that you do not donate.

SunOS

SunOS can be problematic. Setting LD to ./sunos_link and defining add
produced a binary that worked for me, with one warning about strncasecmp.
I guess Sun's ld is incompatible with g++ or something; using ar and
ranlib, aka the sunos_link shell script, works. The configuration script
hopefully does this stuff for you.
Reported bugs

On some platforms it misses the first 3/4 of a page. If you are
afflicted, get out your copy of hexdump and adjust the start offset in
word2x to the correct value. This should be fixed now.

Copyright

This program is (c) D.P. Simpson 1997. The program is licensed under the
GPL version 2, or any later version (at your option). This means DOS
people must distribute source as per the GPL.

The stuff I did not write is:

    config.guess and config.sub come from GNU autoconf and are thus
    (c) The Free Software Foundation.
    getopt.c, getopt1.c and getopt.h are (c) The Free Software Foundation.
    I am fairly sure the LGPL requires these files to be distributed as
    well.
    alloca.c is almost certainly also (c) The Free Software Foundation.
    install-sh is probably (c) The X Consortium.

Introductory propaganda

Despite the fact that open formats like RTF are good and widely
available, far too many idiots seem to insist on using Word .doc format.
This program is an attempt to limit the damage this causes users of
non-Microsoft systems and text processing systems, for example LaTeX. It
is designed to be retargetable and to avoid some of the travesties of
proper typesetting committed by Word, which is hobbled by the lack of
ligatures in TrueType fonts (and, to some extent, the lack of different
design sizes).

There is quite a large amount of guesswork from context to reduce the
impact of my not understanding a document the way Word does. One even
sees interesting things like

    550* \F(foo, bar) * 42 * (pixels per em)

which is not too good! There may be multiple bits of alternating roman
and equation, multiple items of text in brackets, etc. etc. Fortunately
the reader converts these, in two stages, into a single maths insert.
Maths inserts with embedded newlines are rendered as eqnarray* in LaTeX
mode. All maths is simply deleted in text mode (would someone like to
add this support?).
LaTeX mode sees the equation example above as

    550 * \F(foo, bar * baz) * 42 * (pixels per em)

and renders it as

    % Some comments omitted for brevity
    $$550 \times {\text{foo} \over \text{bar} \times \text{baz} } \times 42 \times \text{(pixels per em)}$$

which looks a lot better than Word's own version, which uses awful stars
instead of proper times signs.

Text mode implements tables with real columns, unlike catdoc. Long
entries are folded automatically and there is some semi-intelligent width
reduction. Hyphenation is not supported, so if someone insists on using
supercalifragilistic... then an overlong line might result (anyone care
to fix this? I thought it was overkill to implement the hyphenation
algorithm along with all the rest).

Apart from the pictures and a little trailing junk, the code does a good
job on the TrueType documents. The readme generates some error messages
about extra ^Us, among other things, due to a lack of understanding of
some of the inserts used in some documents. Anyone who can decode more
types of insert, please tell me about it and preferably send a patch so I
can avoid extra programming (I have too much real work to be doing).

If someone wishes to contribute *roff output I would include it. Extra
understanding of equations is also gratefully received, as the examples
in the TrueType docs are rather limited. Information on bibliographies,
and anything else you can tell me about, will also be gratefully
listened to.

Duncan (-: