UTF8 support

Hello to the list.

I wrote my own hack to support UTF8 encoding in GNU Prolog. It peeks
ahead on the stream and makes each byte of a multi-byte UTF8 sequence
appear to have the same character type as the whole 'wide' character.
The classification uses the functions declared in <wctype.h>.

This logic has a problem with pushback *BUT* it is all internal to the
scanner (the scanner just treats non-ASCII characters specially).

Invalid UTF8 is read the old-fashioned way, as it was before the patch.
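
For readers who want a picture of the peek-ahead idea, here is a minimal
sketch in C. It is not the code from the patch: the names CharClass,
utf8_decode, classify_wide and classify_sequence are hypothetical, the
mapping from <wctype.h> tests to scanner classes is a guess, and treating
"the old-fashioned way" as "one byte at a time" is an assumption. It also
assumes wint_t holds Unicode code points and that a suitable locale has
been selected with setlocale(), since the isw* results depend on it.

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical scanner classes; the patch's real names are not shown. */
typedef enum { CLASS_LETTER, CLASS_DIGIT, CLASS_SPACE,
               CLASS_SYMBOL, CLASS_OTHER } CharClass;

/* Decode one UTF-8 sequence from s (n bytes available).
 * Returns the number of bytes consumed (1..4) and stores the code point
 * in *cp, or returns 0 if the sequence is invalid or truncated. */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, i;
    unsigned long c, min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                       /* stray continuation or bad lead byte */

    if (n < len) return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min) return 0;                     /* overlong encoding */
    if (c > 0x10FFFF) return 0;                /* beyond Unicode range */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;  /* surrogate code point */
    *cp = c;
    return len;
}

/* Map a code point to a scanner class using <wctype.h> (locale dependent). */
CharClass classify_wide(unsigned long cp)
{
    wint_t wc = (wint_t) cp;

    if (iswalpha(wc)) return CLASS_LETTER;
    if (iswdigit(wc)) return CLASS_DIGIT;
    if (iswspace(wc)) return CLASS_SPACE;
    if (iswpunct(wc)) return CLASS_SYMBOL;
    return CLASS_OTHER;
}

/* Peek at the sequence starting at s and give every byte in it the class
 * of the whole character. Returns the number of bytes classified.
 * Assumption: an invalid sequence falls back to a single byte. */
size_t classify_sequence(const unsigned char *s, size_t n, CharClass *classes)
{
    unsigned long cp;
    size_t i, len = utf8_decode(s, n, &cp);
    CharClass c;

    if (len == 0) {
        if (n == 0) return 0;
        classes[0] = CLASS_OTHER;
        return 1;
    }
    c = classify_wide(cp);
    for (i = 0; i < len; i++)
        classes[i] = c;
    return len;
}
```

A scanner built this way can keep consuming one byte at a time, because
every byte of a sequence already carries the class of the full character.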

Long '\XXXXX\' escape sequences can be both read and written.
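
As an illustration of reading and writing such escapes, here is a small
sketch in C. It assumes the ISO Prolog hexadecimal form, a backslash, 'x',
hex digits and a closing backslash; the exact format used by the patch is
not stated in this message. The function names format_hex_escape and
parse_hex_escape are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Write cp as \x<hex>\ into buf. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small. */
int format_hex_escape(unsigned long cp, char *buf, size_t size)
{
    int n = snprintf(buf, size, "\\x%lx\\", cp);

    return (n < 0 || (size_t) n >= size) ? -1 : n;
}

/* Parse an escape of the form \x<hex>\ at the start of s.
 * Returns the number of characters consumed and stores the value in *cp,
 * or returns 0 if s does not start with a well-formed escape. */
size_t parse_hex_escape(const char *s, unsigned long *cp)
{
    unsigned long c = 0;
    size_t i = 2;

    if (s[0] != '\\' || s[1] != 'x') return 0;
    while (isxdigit((unsigned char) s[i])) {
        unsigned char ch = (unsigned char) s[i];
        int digit = isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;

        c = c * 16 + (unsigned long) digit;
        if (c > 0x10FFFF) return 0;     /* outside the Unicode range */
        i++;
    }
    if (i == 2 || s[i] != '\\') return 0;   /* no digits or no closing '\' */
    *cp = c;
    return i + 1;
}
```

The two functions round-trip: formatting a code point and parsing the
result gives back the same value.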

I also did everything I could for atom_chars, atom_codes, etc., but I
don't know gprolog's internals very well. They all seem to work OK.

There are #ifdef blocks and a new option in configure.in (untested).

        *Char* and *Code* functions updated
        I had to add some fields to StmInf for the scanner to use
        iswprint test
        UTF8_Hack_Classify_Char, Is_Valid_Code

I hope you will find all this useful.

Stamatis Mitrofanis

Users-prolog mailing list

Attachment: gprolog-1.2.16-utf8.patch (22K)