    VOS.VirtGeoSPARQLEnhancementDocs (last edited by owiki, 2024-10-24 10:18:59)

    Virtuoso Geospatial Enhancements

    Introduction

    As of Virtuoso 7.1, in both Open Source and Commercial/Enterprise Editions, a number of major enhancements have been made to Geospatial support, improving the Geometry data types and functions supported, and increasing compliance with the emerging GeoSPARQL and OGC standards.

    Virtuoso Geospatial Geometry data types and sample queries

    The table below outlines the common WKT (Well Known Text) representations for several types of geometric objects used in RDF:

    [Graphic: table of WKT representations]

    Each of the following queries counts the number of items of each type whose geometry intersects a given shape, with one query for each of the RDF geometry data types now supported by Virtuoso. The links are to live examples of the query running against the OpenLink LOD Cloud Cache instance.

    BOX


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?p .
        FILTER
         ( bif:st_intersects
            ( bif:st_geomfromtext
                ( "BOX(0.3412 43.5141, 9.3412 48.0141)" )
            , ?p 
            ) 
         )
      } 
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    POLYGON


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
        FILTER 
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "POLYGON((1 2, 6 1, 9 3, 8 5, 3 6, 1 2))" )
              , ?p 
              ) 
          )
      }  
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    POLYGON WITH HOLE


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
        FILTER 
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "POLYGON((1 2, 6 1, 9 3, 8 5, 3 6, 1 2), (3 3, 5 5, 6 2, 3 3))" )
                , ?p 
              ) 
          )
      } 
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    MULTIPOLYGON


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
        FILTER 
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "MULTIPOLYGON(((1 2, 6 1, 9 3, 3 6, 1 2)), ((4 9, 7 6, 9 8, 4 9)))" )
                , ?p 
              ) 
          )
      }  
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    GEOMETRY COLLECTION


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry> ?p .
        FILTER 
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "GEOMETRYCOLLECTION( POINT(4 5), POINT(7 4), POINT(6 2), LINESTRING(4 5, 6 7, 7 4, 6 2), POLYGON((1 2, 6 1, 9 3, 8 5, 3 6, 1 2)) )" )
                , ?p
              ) 
          )
      }  
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    MULTI POINT


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
        FILTER
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "MULTIPOINT(3 7, 4 2, 8 6)" )
                , ?p 
              ) 
          )
      }  
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    LINE STRING


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
        FILTER
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "LINESTRING(1 2, 3 6, 9 4)" )
                , ?p
              )
          )
      }  
    GROUP BY ?f  
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    MULTI LINE STRING


    SELECT (?f AS ?facet)
           (COUNT(?s) AS ?cnt)
    FROM <http://linkedgeodata.org/>
    WHERE 
      { 
        ?s  <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   ?f .
        ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
        FILTER
          ( bif:st_intersects
              ( bif:st_geomfromtext
                  ( "MULTILINESTRING((1 8, 4 4), (4 9, 8 5, 6 2, 1 4))" )
                , ?p 
              ) 
          )
      } 
    GROUP BY ?f 
    ORDER BY DESC(?cnt) 
    LIMIT 10
    

    Supported shape types


    BOX, BOX2D, BOX3D, BOXM, BOXZ, BOXZM
    CIRCULARSTRING
    COMPOUNDCURVE
    CURVEPOLYGON
    EMPTY
    GEOMETRYCOLLECTION, GEOMETRYCOLLECTIONM, GEOMETRYCOLLECTIONZ, GEOMETRYCOLLECTIONZM
    LINESTRING, LINESTRINGM, LINESTRINGZ, LINESTRINGZM
    MULTICURVE
    MULTILINESTRING, MULTILINESTRINGM, MULTILINESTRINGZ, MULTILINESTRINGZM
    MULTIPOINT, MULTIPOINTM, MULTIPOINTZ, MULTIPOINTZM
    MULTIPOLYGON, MULTIPOLYGONM, MULTIPOLYGONZ, MULTIPOLYGONZM
    POINT, POINTM, POINTZ, POINTZM
    POLYGON, POLYGONM, POLYGONZ, POLYGONZM
    POLYLINE, POLYLINEZ
    RING, RINGM, RINGZ, RINGZM
    

    Not yet supported shape types


    CIRCULARSTRINGM, CIRCULARSTRINGZ, CIRCULARSTRINGZM
    COMPOUNDCURVEM, COMPOUNDCURVEZ, COMPOUNDCURVEZM
    CURVE, CURVEM, CURVEZ, CURVEZM
    CURVEPOLYGONM, CURVEPOLYGONZ, CURVEPOLYGONZM
    GEOMETRY, GEOMETRYZ, GEOMETRYZM
    MULTICURVEM, MULTICURVEZ, MULTICURVEZM
    MULTIPATCH
    MULTISURFACE, MULTISURFACEM, MULTISURFACEZ, MULTISURFACEZM
    POLYHEDRALSURFACE, POLYHEDRALSURFACEM, POLYHEDRALSURFACEZ, POLYHEDRALSURFACEZM
    POLYLINEM
    SURFACE, SURFACEM, SURFACEZ, SURFACEZM
    TIN, TINM, TINZ, TINZM
    

    Virtuoso Geospatial geometry functions

    The following Virtuoso Geospatial geometry functions are available for use in both SQL and RDF Geospatial queries. The listed functions are built-in SQL functions. Like all built-in functions of Virtuoso, the geo-specific functions can be called from SPARQL with the prefix bif: (e.g., bif:earth_radius() or <bif:earth_radius>()).
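
    As a minimal sketch of the two calling conventions, the statements below invoke the same built-in function first from SQL and then from SPARQL (entered through the SQL interface with the SPARQL keyword). The point coordinates are illustrative values, not data from this page; the box is the one used in the BOX query above.

```sql
-- Plain SQL: the built-in functions are called by name.
-- POINT(2.35 46.5) is an arbitrary illustrative location.
SELECT st_intersects
  ( st_geomfromtext ('BOX(0.3412 43.5141, 9.3412 48.0141)')
  , st_geomfromtext ('POINT(2.35 46.5)')
  );

-- SPARQL: the same built-ins are reached through the bif: prefix.
SPARQL
SELECT ?s
FROM <http://linkedgeodata.org/>
WHERE
  {
    ?s  <http://www.w3.org/2003/01/geo/wgs84_pos#geometry>  ?p .
    FILTER
      ( bif:st_intersects
          ( bif:st_geomfromtext ( "BOX(0.3412 43.5141, 9.3412 48.0141)" )
          , ?p
          )
      )
  }
LIMIT 10;
```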

    Open Source proj4 Plug-in

    The Virtuoso proj4 Hosted Plugin Module is required for transforming between different coordinate systems with the ST_Transform() function. The plugin is based on Frank Warmerdam's proj4 library. It is practical to install the proj4 package on every box of a Virtuoso cluster, even if the build is performed on a single machine, which may be outside the cluster. The reason is that the plugin must load data about coordinate systems in order to work, and the simplest way to get the right data from a high-quality source is to use the package.

    Compiling proj4 Plug-in

    The proj4 plugin is currently available in the default develop/7 branch of the Virtuoso Open Source git repository, and can be built with the following command sequence.

    Note: The proj.4 library (available from the proj.4 download area) must first be installed on the system. The configure script will then detect it and enable the proj4 plugin library to be built in ~/libsrc/plugin/.libs.

    git clone https://github.com/openlink/virtuoso-opensource.git
    cd virtuoso-opensource 
    ./autogen.sh
    export CFLAGS="-msse4.2 -DSSE42"
    ./configure 
    make -j 24
    make install
    


    bash-3.2$ ls libsrc/plugin/.libs/proj4*
    libsrc/plugin/.libs/proj4.a
    libsrc/plugin/.libs/proj4.la
    libsrc/plugin/.libs/proj4.lai
    libsrc/plugin/.libs/proj4_la-import_gate_virtuoso.o
    libsrc/plugin/.libs/proj4_la-sql_proj4.o
    libsrc/plugin/.libs/proj4_la-proj4_plugin.o
    libsrc/plugin/.libs/proj4.so
    libsrc/plugin/.libs/proj4.ver
    

    Installation and Configuration of proj4 Plug-in

    After the plugin (proj4.so) is built, it must be added to the [Plugins] section of the Virtuoso configuration file (virtuoso.ini or the like). This must be done on every node, if running in a cluster.


    [Plugins]
    LoadPath = ./plugins
    Load2    = plain, proj4
    

    If everything is fine, the virtuoso.log file will contain something like the following lines after the next startup:


    21:30:10 { Loading plugin 1: Type `plain', file `shapefileio' in `.'
    21:30:10   ShapefileIO version 0.1virt71 from OpenLink Software
    21:30:10   Shapefile support based on Frank Warmerdam's Shapelib
    21:30:10   SUCCESS plugin 1: loaded from ./plugins/shapefileio.so }
    21:30:10 { Loading plugin 2: Type `plain', file `proj4' in `.'
    21:30:11   plain version 3208 from OpenLink Software
    21:30:11   Cartographic Projections support based on Frank Warmerdam's proj4 library
    21:30:11   SUCCESS plugin 2: loaded from ./plugins/proj4.so }
    21:30:11 OpenLink Virtuoso Universal Server
    21:30:11 Version 07.10.3208-pthreads for Linux as of Mar 31 2014
    ...
    21:30:28 PL LOG: Initial setup of DB.DBA.SYS_PROJ4_SRIDS data from files in "/usr/share/proj"
    21:30:30 PL LOG: DB.DBA.SYS_PROJ4_SRIDS now contains 6930 spatial reference systems
    ...
    21:30:32 Server online at 1720 (pid 9654)
    

    To store descriptions of coordinate systems, the plugin creates a table:


    CREATE TABLE  DB.DBA.SYS_PROJ4_SRIDS 
      (
        SR_ID            INTEGER,
        SR_FAMILY        VARCHAR NOT NULL,
        SR_TAG           VARCHAR,
        SR_ORIGIN        VARCHAR NOT NULL,
        SR_IRI           IRI_ID_8,
        SR_PROJ4_STRING  VARCHAR NOT NULL,
        SR_WKT           VARCHAR,
        SR_COMMENT       VARCHAR,
        SR_PROJ4_XML     ANY,
        PRIMARY KEY (SR_ID, SR_FAMILY) 
      );
    

    This table is filled with data from the files epsg, esri, esri.extra, nad83, and nad27 in the directory /usr/share/proj. Note that these files must exist in the /usr/share/proj directory; otherwise, a message is written to the log file indicating that the file could not be found. Every row of the table is identified by the name of the "family" of coordinate systems and an integer SRID. Different sources may assign the same SRID to different reference systems; however, descriptions of well-known systems will match exactly or with differences that are not noticeable for any practical application.
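
    One way to inspect what was loaded, sketched here using only the columns of the table defined above, is to count the rows per family and to list the SRIDs that more than one family defines:

```sql
-- Number of coordinate system definitions loaded per family.
SELECT SR_FAMILY, COUNT (*) AS cnt
  FROM DB.DBA.SYS_PROJ4_SRIDS
 GROUP BY SR_FAMILY
 ORDER BY SR_FAMILY;

-- SRIDs that are defined by more than one family.
SELECT SR_ID, COUNT (*) AS families
  FROM DB.DBA.SYS_PROJ4_SRIDS
 GROUP BY SR_ID
HAVING COUNT (*) > 1;
```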

    The loading process uses the family names 'EPSG', 'ESRI', 'NAD83', and 'NAD27'. When ST_Transform() searches for a coordinate system that corresponds to a given SRID, it returns the first record found while checking the families in the following order: 'PG', 'EPSG', 'ESRI', 'NAD83', 'NAD27'. It is therefore generally practical to put all custom definitions in the 'PG' family, giving them the highest priority.

    A sample EPSG file containing the mapping for the proj.4 EPSG:4326 coordinate system is:


    $ cat /usr/share/proj/epsg 
    <4326>+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs<>
    $
    

    SQL> SELECT * FROM DB.DBA.SYS_PROJ4_SRIDS;
    SR_ID              SR_FAMILY          SR_TAG    SR_ORIGIN              SR_IRI    SR_PROJ4_STRING                                    SR_WKT    SR_COMMENT   SR_PROJ4_XML
    INTEGER NOT NULL   VARCHAR NOT NULL   VARCHAR   VARCHAR NOT NULL       VARCHAR   VARCHAR NOT NULL                                   VARCHAR   VARCHAR      VARCHAR
    ________________   ________________   _______   ____________________   _______   ________________________________________________   _______   __________   ____________
    
    4326               EPSG               4326      /usr/share/proj/epsg   NULL      +datum=WGS84 +ellps=WGS84 +no_defs +proj=longlat             NULL         NULL
    
    1 Rows. -- 0 msec.
    SQL> 
    

    There are two procedures available for loading more coordinate systems:

    • DB.DBA.PROJ4_LOAD_SYS_SRIDS is called at server startup, if the proj4 plugin is loaded:

      DB.DBA.PROJ4_LOAD_SYS_SRIDS ( IN projdir VARCHAR := '/usr/share/proj', IN only_if_empty_table INTEGER := 0 )

    • DB.DBA.PROJ4_LOAD_INIT_FILE is a lower-level procedure:

      DB.DBA.PROJ4_LOAD_INIT_FILE ( IN path VARCHAR, IN _sr_family VARCHAR )

    The main part of DB.DBA.PROJ4_LOAD_SYS_SRIDS() is a sequence of:


     DB.DBA.PROJ4_LOAD_INIT_FILE (projdir || '/epsg', 'EPSG');
     DB.DBA.PROJ4_LOAD_INIT_FILE (projdir || '/esri', 'ESRI');
     DB.DBA.PROJ4_LOAD_INIT_FILE (projdir || '/esri.extra', 'ESRI');
     DB.DBA.PROJ4_LOAD_INIT_FILE (projdir || '/nad83', 'NAD83');
     DB.DBA.PROJ4_LOAD_INIT_FILE (projdir || '/nad27', 'NAD27');
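
    Following the same pattern, a file of custom definitions could be loaded into the 'PG' family with the lower-level procedure. This is only a sketch: the path /opt/proj-custom/local_srids is a hypothetical file, assumed to use the same format as the files in /usr/share/proj.

```sql
-- Hypothetical file of custom definitions, loaded into the 'PG' family.
DB.DBA.PROJ4_LOAD_INIT_FILE ('/opt/proj-custom/local_srids', 'PG');

-- Check which rows now belong to the 'PG' family.
SELECT SR_ID, SR_ORIGIN, SR_PROJ4_STRING
  FROM DB.DBA.SYS_PROJ4_SRIDS
 WHERE SR_FAMILY = 'PG';
```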
    

    Rows with the same SRID but different SR_FAMILY values may exist in the table; however, only one projection per SRID is used, and SR_FAMILY defines the priority. The internal search query for projection by SRID is:


    SELECT COALESCE
       (
         ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS WHERE SR_ID= :0 AND SR_FAMILY='PG' ), 
         ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS WHERE SR_ID= :0 AND SR_FAMILY='EPSG' ), 
         ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS WHERE SR_ID= :0 AND SR_FAMILY='ESRI' ), 
         ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS WHERE SR_ID= :0 AND SR_FAMILY='NAD83' ), 
         ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS WHERE SR_ID= :0 AND SR_FAMILY='NAD27' ) 
       );
    

    This means that, for the ST_Transform() function, 'PG' overrides everything else; 'EPSG' has the next highest priority, followed by 'ESRI', 'NAD83', and 'NAD27'. However, custom queries and procedures may select whatever they please (including SR_FAMILY values not listed here, strings from other tables, etc.), and may feed projection strings directly to ST_Transform().
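
    As a hedged illustration of feeding projection strings directly, the query below picks the strings explicitly from the 'EPSG' family and passes them as the 3rd and 4th arguments of ST_Transform(), following the four-argument form used in the examples later on this page. It assumes that rows for SRIDs 2249 and 4326 are present in the table, that the 3rd argument describes the source system and the 4th the destination, and that scalar subqueries are acceptable as function arguments.

```sql
-- Source and destination projection strings are chosen explicitly
-- from the 'EPSG' family instead of relying on the default search order.
SELECT ST_AsText
  ( ST_Transform
      ( ST_GeomFromText ('POINT(743238 2967416)', 2249)
      , 4326
      , ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS
           WHERE SR_ID = 2249 AND SR_FAMILY = 'EPSG' )
      , ( SELECT SR_PROJ4_STRING FROM DB.DBA.SYS_PROJ4_SRIDS
           WHERE SR_ID = 4326 AND SR_FAMILY = 'EPSG' )
      )
  ) AS wgs_geom;
```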

    The coordinate systems can also be updated by directly manipulating the DB.DBA.SYS_PROJ4_SRIDS table. (This table is readable by public, and writable only by DBA.) After editing the table, the "Proj4 cache_reset()" function should be called to prevent the SQL runtime from using previously-prepared projections that might now be obsolete. Note that proj4 projections are for normalized data in radians, while Virtuoso stores shapes using numbers that come from WKT; i.e., they're latitudes and longitudes in degrees, for almost all cases.
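
    A minimal sketch of such a direct edit follows. The SRID 900001 and the SR_TAG and SR_ORIGIN values are hypothetical; the projection string is the Mercator definition used in the ST_Transform() examples on this page. Only columns from the table definition above are used.

```sql
-- Register a custom definition in the highest-priority 'PG' family.
-- 900001 is a made-up SRID used only for illustration.
INSERT INTO DB.DBA.SYS_PROJ4_SRIDS
  (SR_ID, SR_FAMILY, SR_TAG, SR_ORIGIN, SR_PROJ4_STRING)
VALUES
  (900001, 'PG', '900001', 'manual entry', '+proj=merc +ellps=clrk66 +lat_ts=33');

-- Confirm the row before resetting the projection cache as described above.
SELECT SR_ID, SR_FAMILY, SR_PROJ4_STRING
  FROM DB.DBA.SYS_PROJ4_SRIDS
 WHERE SR_ID = 900001;
```

    The statements must be run as DBA, since the table is writable only by DBA.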

    The proj4 plugin automatically applies the DEG_TO_RAD multiplier before conversion and/or the RAD_TO_DEG multiplier after conversion, when the source and/or destination coordinate systems are latitude-longitude or geocentric. Even though this conversion is done automatically, you should remember that it happens, because many "how-to" instructions for spatial data sets contain paragraphs like "how to convert these data to WGS-84," and much sample C/C++ code contains transformations like { x *= RAD_TO_DEG; y *= RAD_TO_DEG; }. These transformations will probably be redundant in the corresponding Virtuoso/PL code, while proj4 strings can be used unchanged and simply passed as the 3rd and 4th arguments of the ST_Transform() function. If the degrees-to-radians conversion is made twice, the data may be calculated as if the shape were located in a totally different place on the ellipsoid. If the post-transformation radians-to-degrees conversion is also made twice, the resulting shape may look like the real one, but its coordinates may be tens of kilometers away from the correct values.

    Example usage of ST_Transform()

    Below are some example uses of the ST_Transform() function to transform geometries between some of the sample coordinate systems loaded into Virtuoso:


    SQL> SELECT * FROM DB.DBA.SYS_PROJ4_SRIDS;
    SR_ID              SR_FAMILY          SR_TAG    SR_ORIGIN              SR_IRI    SR_PROJ4_STRING                                                                                                                                                                         SR_WKT    SR_COMMENT   SR_PROJ4_XML
    INTEGER NOT NULL   VARCHAR NOT NULL   VARCHAR   VARCHAR NOT NULL       VARCHAR   VARCHAR NOT NULL                                                                                                                                                                        VARCHAR   VARCHAR      VARCHAR
    ________________   ________________   _______   ____________________   _______   _____________________________________________________________________________________________________________________________________________________________________________________   _______   __________   ____________
    
    2005               EPSG               2005      /usr/share/proj/epsg   NULL      +ellps=clrk80 +k=0.9995000000000001 +lat_0=0 +lon_0=-62 +no_defs +proj=tmerc +units=m +x_0=400000 +y_0=0                                                                                          NULL         NULL
    2249               EPSG               2249      /usr/share/proj/epsg   NULL      +datum=NAD83 +ellps=GRS80 +lat_0=41 +lat_1=42.68333333333333 +lat_2=41.71666666666667 +lon_0=-71.5 +no_defs +proj=lcc +to_meter=0.3048006096012192 +x_0=200000.0001016002 +y_0=750000             NULL         NULL
    4326               EPSG               4326      /usr/share/proj/epsg   NULL      +datum=WGS84 +ellps=WGS84 +no_defs +proj=longlat                                                                                                                                                  NULL         NULL
    
    3 Rows. -- 1 msec.
    SQL> SELECT st_transform (st_geomfromtext ('POLYGON((-16 20.25,-16.1 20.35,-15.9 20.35,-16 20.25))'), 1, '+proj=latlong +ellps=clrk66', '+proj=merc +ellps=clrk66 +lat_ts=33');
    unnamed
    VARCHAR NOT NULL
    _____________________________________________________________________________________________________________________________________________
    
    SRID=1;POLYGON((-1495284.211473 1920596.789917,-1504629.737795 1930501.842961,-1485938.685152 1930501.842961,-1495284.211473 1920596.789917))
    
    1 Rows. -- 0 msec.
    SQL> SELECT ST_AsText(ST_Transform(ST_GeomFromText('POLYGON((743238 2967416,743238 2967450, 743265 2967450,743265.625 2967416,743238 2967416))',2249),4326)) AS wgs_geom;
    wgs_geom
    VARCHAR NOT NULL
    ___________________________________________________________________________________________________________________
    
    POLYGON((-71.177685 42.390290,-71.177684 42.390383,-71.177584 42.390383,-71.177583 42.390289,-71.177685 42.390290))
    
    1 Rows. -- 1 msec.
    SQL> 
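
    The reverse direction can be sketched in the same way, assuming the same SRIDs 2249 and 4326 are loaded. The input point is the first vertex of the result above; no output is shown because this statement was not part of the original session.

```sql
-- Transform a WGS 84 point (SRID 4326) back to SRID 2249.
SELECT ST_AsText
  ( ST_Transform
      ( ST_GeomFromText ('POINT(-71.177685 42.390290)', 4326)
      , 2249
      )
  ) AS local_geom;
```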
    

    Future Plans

    • Full support for all DE9-IM based operations.
    • Full support for GeoSPARQL.
    • Additional functions for splitting compound geometries into parts and for constructing geometries (except operations that take shapes as arguments and return other shapes).

    Related