Normalization of UNICODE3 accented characters for Virtuoso free-text indexing

Normalization of UNICODE3 accented characters in a free-text index is controlled by the XAnyNormalization configuration parameter in the [I18N] section of the Virtuoso configuration file, virtuoso.ini. This parameter determines whether accented Unicode characters are converted to their unaccented base variants when a free-text index is built and when a free-text query string is parsed. Its value is an integer bitmask in which only two bits are currently used:

XAnyNormalization value | Bit equivalent | Description
0 | 00 | Default. Nothing is normalized, so "Jose" and "José" are two distinct words.
1 | 01 | To be done.
2 | 10 | Any "combining character sequence" (a combination of a base character and one or more combining characters) is converted to its (smallest known) base. For example, "é" loses its accent and becomes a plain ASCII "e".
3 | 11 | Combines 1 and 2, so both conversions are applied: any pair of base character and combining character loses the second character, and characters with accents lose their accents.
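
The bitmask semantics above can be modeled outside Virtuoso with a short Python sketch. This is not Virtuoso's implementation; it only imitates the described effect using the standard unicodedata module. The function name normalize and the two constants are hypothetical, and the behavior of bit 1 is an assumption taken from the description of value 3, since value 1 itself is not yet documented.

```python
import unicodedata

# Hypothetical names for the two bits described in the table above.
DROP_COMBINING = 0b01  # bit 1 (assumed): drop combining characters after a base
STRIP_ACCENTS = 0b10   # bit 2: map an accented character to its base character


def normalize(text: str, mask: int) -> str:
    """Imitate the described effect of an XAnyNormalization-style bitmask."""
    out = []
    for ch in text:
        if unicodedata.combining(ch):
            # Standalone combining mark, e.g. U+0301 COMBINING ACUTE ACCENT.
            if not mask & DROP_COMBINING:
                out.append(ch)
        elif mask & STRIP_ACCENTS:
            # Decompose the character; keep only the base if everything
            # after it is a combining mark (so "é" becomes "e").
            parts = unicodedata.normalize("NFD", ch)
            if all(unicodedata.combining(c) for c in parts[1:]):
                out.append(parts[0])
            else:
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


precomposed = "Jos\u00e9"   # "José" written with a single accented character
combining = "Jose\u0301"    # "José" written as "e" plus a combining accent

print(normalize(precomposed, 0) == precomposed)  # True: nothing is normalized
print(normalize(precomposed, 2))                 # Jose
print(normalize(combining, 1))                   # Jose
print(normalize(combining, 3))                   # Jose
```

With a mask of 3, both spellings of "José" collapse to the same ASCII word, which is the property the free-text example further down relies on.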

To enable both conversions, the relevant fragment of virtuoso.ini would look like:


...

[I18N]
XAnyNormalization = 3

...
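
As a quick way to check which bits a given setting enables, the fragment above can be read with Python's standard configparser module. This is only a sketch: the helper name read_xany_normalization is hypothetical, the sample text contains just the [I18N] section, and a complete virtuoso.ini may contain constructs that configparser does not accept. The fallback of 0 follows the default stated in the table.

```python
import configparser

# Minimal stand-in for the relevant part of virtuoso.ini.
SAMPLE_INI = """
[I18N]
XAnyNormalization = 3
"""


def read_xany_normalization(ini_text: str) -> int:
    """Return the XAnyNormalization bitmask, or 0 (the default) if unset."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive, as written
    parser.read_string(ini_text)
    return parser.getint("I18N", "XAnyNormalization", fallback=0)


mask = read_xany_normalization(SAMPLE_INI)
print(mask)             # 3
print(bool(mask & 1))   # True: bit 1 is set
print(bool(mask & 2))   # True: bit 2 is set
```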

Example

With XAnyNormalization=3, one can get the following:


SQL> SPARQL 
     INSERT 
       IN <http://InternationalNSMs/>
         {
           <s>  <sp>  "Índio João Macapá Júnior Tôrres Luís Araújo José"  ; 
                <ru>  "Он добавил картошки, посолил и поставил аквариум на огонь"   
         }
       ;

INSERT INTO <http://InternationalNSMs/>, 2 (or less) triples -- done


SQL> DB.DBA.RDF_OBJ_FT_RULE_ADD (NULL, NULL, 'InternationalNSMs.wb');

Done. -- 0 msec.

SQL> VT_INDEX_DB_DBA_RDF_OBJ(0);

Done. -- 26 msec.

SQL> SPARQL 
     SELECT * 
       FROM <http://InternationalNSMs/> 
       WHERE 
         {
           ?s  ?p  ?o 
         }
       ORDER BY ASC (str(?o))
       ;

s  sp  Índio João Macapá Júnior Tôrres Luís Araújo José
s  ru  Он добавил картошки, посолил и поставил аквариум на огонь

2 Rows. -- 2 msec.

SQL> SPARQL 
     SELECT * 
       FROM <http://InternationalNSMs/> 
       WHERE 
         { 
           ?s  ?p            ?o                                                    . 
           ?o  bif:contains  "'Índio João Macapá Júnior Tôrres Luís Araújo José'"  
         }
       ;

s  sp  Índio João Macapá Júnior Tôrres Luís Araújo José

1 Rows. -- 2 msec.

SQL> SPARQL 
     SELECT * 
       FROM <http://InternationalNSMs/> 
       WHERE
         { 
           ?s  ?p            ?o                                                    . 
           ?o  bif:contains  "'Indio Joao Macapa Junior Torres Luis Araujo Jose'" 
         }
       ;

s  sp  Índio João Macapá Júnior Tôrres Luís Araújo José

1 Rows. -- 1 msec.

SQL> SPARQL 
     SELECT * 
       FROM <http://InternationalNSMs/> 
       WHERE 
         { 
           ?s  ?p            ?o                         . 
           ?o  bif:contains  "'поставил аквариум на огонь'" 
         }
       ;

s  ru  Он добавил картошки, посолил и поставил аквариум на огонь
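
The reason the accented and the plain ASCII queries return the same row is that the same normalization is applied both when the index is built and when the query string is parsed. The toy Python index below illustrates that idea only. It is not how Virtuoso stores its free-text index, the names fold, build_index and search are hypothetical, and it matches documents containing all query words rather than reproducing the phrase semantics of bif:contains.

```python
import unicodedata
from collections import defaultdict


def fold(word: str) -> str:
    """Drop accents from a word (the effect described for value 3)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_index(docs: dict) -> dict:
    """Map each folded word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[fold(word)].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Return ids of documents that contain every (folded) query word."""
    hits = [index.get(fold(word), set()) for word in query.split()]
    return set.intersection(*hits) if hits else set()


# "sp" is an arbitrary document id chosen for this illustration.
docs = {"sp": "Índio João Macapá Júnior Tôrres Luís Araújo José"}
index = build_index(docs)

print(search(index, "Índio João José"))   # {'sp'}
print(search(index, "Indio Joao Jose"))   # {'sp'}
print(search(index, "Josefa"))            # set()
```

Because fold is applied on both sides, the stored text keeps its accents while lookups become accent-insensitive, which matches what the query results above show.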

Related