UTF8 support

stamit
Hello to the list.

I wrote my own hack to support UTF8 encoding in GNU Prolog. It peeks
ahead on the stream and makes each byte of a multi-byte UTF8 sequence
appear to be of the same type as the entire 'wide' character. The
classification uses the functions declared in <wctype.h>.
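
To illustrate the peek-ahead idea, here is a minimal sketch, not the code from the patch. It decodes one UTF8 sequence from a buffer and classifies the resulting character with <wctype.h>. The names char_class, utf8_decode and classify_at are hypothetical, and the sketch assumes wchar_t is wide enough to hold a Unicode code point (true on most Unix systems, not on Windows). Results of the isw* functions for non-ASCII characters depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical character classes; gprolog's scanner has its own set. */
enum char_class {
    CLASS_LETTER, CLASS_DIGIT, CLASS_SPACE, CLASS_SYMBOL, CLASS_RAW_BYTE
};

/*
 * Decode one UTF8 sequence starting at s, with n bytes available.
 * Returns the sequence length (1..4) and stores the code point in *cp,
 * or returns 0 if the bytes are not valid UTF8.
 */
static int utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    /* Smallest code point allowed for each length (rejects overlong forms). */
    static const unsigned long min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned long c;
    int len, i;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;

    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;               /* stray continuation or invalid lead byte */

    if ((size_t)len > n)
        return 0;               /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (c < min_cp[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;

    *cp = c;
    return len;
}

/*
 * Classify the character starting at s (n must be at least 1).
 * *len receives the number of bytes that share this class. Invalid
 * UTF8 falls back to a single raw byte.
 */
static enum char_class classify_at(const unsigned char *s, size_t n, int *len)
{
    unsigned long cp;
    wint_t wc;

    assert(n > 0 && len != NULL);
    *len = utf8_decode(s, n, &cp);
    if (*len == 0) {
        *len = 1;
        return CLASS_RAW_BYTE;
    }

    wc = (wint_t)cp;            /* assumes wchar_t can hold the code point */
    if (iswalpha(wc)) return CLASS_LETTER;
    if (iswdigit(wc)) return CLASS_DIGIT;
    if (iswspace(wc)) return CLASS_SPACE;
    return CLASS_SYMBOL;
}
```

A scanner built this way would call classify_at once at the start of each sequence and reuse the returned class for the following len - 1 bytes. For non-ASCII letters to be recognised, the program would normally have to call setlocale(LC_CTYPE, "") first.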

The peek-ahead logic has a problem with pushback, *BUT* it is all
internal to the scanner (the scanner just treats non-ASCII characters
specially).

Invalid UTF8 is read in the old-fashioned way.

Long '\XXXXX\' escape sequences can be both read and written.
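
As a rough illustration of reading and writing such escapes, the sketch below formats a character code as a hexadecimal escape and parses it back. It assumes the ISO Prolog form, a backslash and 'x' followed by hex digits and a closing backslash; the exact form the patch emits may differ. The function names format_hex_escape and parse_hex_escape are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write cp into buf as a hex escape: backslash, 'x', hex digits, backslash.
 * Returns the number of characters written (excluding the NUL),
 * or 0 if the buffer is too small.
 */
static int format_hex_escape(unsigned long cp, char *buf, size_t size)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, size, "\\x%lx\\", cp);
    if (n < 0 || (size_t)n >= size)
        return 0;               /* error or truncated output */
    return n;
}

/*
 * Parse a hex escape at the start of s.
 * Returns the number of characters consumed and stores the code in *cp,
 * or returns 0 if s does not start with a well-formed escape.
 */
static int parse_hex_escape(const char *s, unsigned long *cp)
{
    unsigned long c = 0;
    int i = 2;

    assert(s != NULL && cp != NULL);
    if (s[0] != '\\' || s[1] != 'x')
        return 0;
    if (!isxdigit((unsigned char)s[i]))
        return 0;               /* at least one hex digit is required */

    while (isxdigit((unsigned char)s[i])) {
        int ch = (unsigned char)s[i];
        int digit = isdigit(ch) ? ch - '0' : tolower(ch) - 'a' + 10;

        c = c * 16 + (unsigned long)digit;
        if (c > 0x10FFFF)
            return 0;           /* beyond the Unicode range */
        i++;
    }

    if (s[i] != '\\')
        return 0;               /* missing closing backslash */

    *cp = c;
    return i + 1;
}
```

A writer could fall back to format_hex_escape for any character that fails an iswprint test, which would keep the output readable back by the scanner.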

I also did everything I could for atom_chars, atom_codes, etc., but I
don't know gprolog's internals very well. They all seem to work OK,
though.
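
For the direction where atom_codes builds an atom from a list of codes, each code has to be validated and turned back into bytes. The following is only a sketch of that step under my own assumptions, not the code from the patch. The lowercase names is_valid_code, utf8_encode and codes_to_utf8 are hypothetical and are not the functions in atom.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A code is accepted if it is a Unicode scalar value (no surrogates). */
static int is_valid_code(unsigned long cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/*
 * Encode cp as UTF8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 for an invalid code.
 */
static size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (!is_valid_code(cp))
        return 0;

    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Build a NUL-terminated UTF8 string from n character codes.
 * Returns the byte length (excluding the NUL), or -1 if a code is
 * invalid or the buffer is too small.
 */
static long codes_to_utf8(const unsigned long *codes, size_t n,
                          char *buf, size_t size)
{
    size_t used = 0, i;

    assert(buf != NULL && (codes != NULL || n == 0));
    for (i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t len = utf8_encode(codes[i], tmp);

        if (len == 0 || used + len + 1 > size)
            return -1;
        memcpy(buf + used, tmp, len);
        used += len;
    }
    if (used + 1 > size)
        return -1;              /* no room for the terminating NUL */
    buf[used] = '\0';
    return (long)used;
}
```

Note that the byte length and the character count differ here: three codes can produce a six-byte atom name, so a predicate such as atom_length would have to count decoded characters rather than bytes.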

The changes are wrapped in #ifdef blocks, and there is a new option in
configure.in (untested).


src/BipsPl/c_supp.[ch]
        *Char* and *Code* functions updated
src/BipsPl/scan_supp.c
        UTF8_Hack_Peek_Next_Char
src/BipsPl/stream_supp.[ch]
        I had to add some fields to StmInf for the scanner to use
src/BipsPl/write_supp.c
        iswprint test
src/EnginePl/atom.[ch]
        UTF8_Hack_Classify_Char, Is_Valid_Code


I hope you will find all this useful.


Stamatis Mitrofanis

_______________________________________________
Users-prolog mailing list
http://lists.gnu.org/mailman/listinfo/users-prolog

Attachment: gprolog-1.2.16-utf8.patch (22K)