Friday, December 14, 2012

Calling GDAL/OGR commands from Python

key words: python, GDAL/OGR, subprocess.check_call, os.walk, external commands, environment variables

If you plan to call GDAL/OGR commands from Python code, you might need to set up environment variables first. When they are already defined at the system level you don't need to do anything, but when they are not, your script has to take care of them.

Calling external commands from Python can be accomplished with the subprocess module. It offers several functions that can be used for this task; in this post I don't go through all the differences between them. The main thing I want to show is how to set up the environment variables required by an external program called through the subprocess module. In this particular case the external program is any of the GDAL/OGR commands.

If you want to write a program that calls GDAL/OGR, I can assume that you need it to process multiple files, and that is why this sample program uses the os.walk function. The walk function allows you to process multiple files in multiple folders. This sample filters all tif files that can be found in the grid_data folder.

Using the subprocess.check_call function I demonstrate invoking the gdalinfo command for all TIFF files and, just as an example, one OGR command (ogrinfo). The same principles can be applied to file processing using any of the available GDAL/OGR commands.

Here is the source code.

.... enjoy ...

'''
Created on 10. okt. 2012
@author: Stipica Pavicic
'''
import os, fnmatch, subprocess

for (root, dirs, files) in os.walk(r'L:\sp\python\grid_data'):
    #print (root)    
    #print (os.listdir(root)) # list of files and folders
    #print (fnmatch.filter(os.listdir(root),".")) # filter the list of files and folders
    
    for filename in fnmatch.filter(files,"*.tif"):
      
        fbasename, extension = os.path.splitext(filename)
        
        gdal_root = r'C:\release-1310-gdal-1-9-mapserver-6-0'
        pathv=''
        pathv= pathv + gdal_root  + r'\bin;' 
        pathv= pathv + gdal_root  + r'\bin\gdal\python\osgeo;' 
        pathv= pathv + gdal_root  + r'\bin\proj\apps;' 
        pathv= pathv + gdal_root  + r'\bin\gdal\apps;' 
        pathv= pathv + gdal_root  + r'\bin\ms\apps;' 
        pathv= pathv + gdal_root  + r'\bin\gdal\csharp;'  
        pathv= pathv + gdal_root  + r'\bin\ms\csharp;' 
        pathv= pathv + gdal_root  + r'\bin\curl;'
        pathv= pathv + gdal_root  + r'\bin\gdal\apps'      
        
        
        #if you don't work with Oracle or you don't need the Oracle plugin, just rename/remove the \bin\gdal\plugins\gdal_GEOR.dll & ogr_OCI.dll files to avoid unnecessary error messages
        #note: env= replaces the whole environment of the child process, so every variable the commands need has to be listed here
        env_cust={"GDAL_DATA":        gdal_root + r'\bin\gdal-data', \
                  "GDAL_DRIVER_PATH": gdal_root + r'\bin\gdal\plugins',  \
                  "PYTHONPATH":       gdal_root + r'\bin\gdal\python\osgeo;' +  
                                      gdal_root + r'\bin\ms\python', \
                  "PROJ_LIB":         gdal_root + r'\bin\proj\SHARE', \
                  "PATH": pathv}
        
        os.chdir(root)
        #print (root, filename, fbasename, extension)
        #subprocess.check_call(['echo' , filename], env=env_cust, shell=True)
        subprocess.check_call(['gdalinfo', filename], env=env_cust, shell=True)
        subprocess.check_call(['ogrinfo', '--help-general'], env=env_cust, shell=True)
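A short remark on the subprocess functions mentioned above: call() just returns the exit code, check_call() raises CalledProcessError when the exit code is non-zero, and check_output() (available from Python 2.7) additionally captures stdout. Here is a minimal sketch; gdalinfo --version is used only as a harmless example command:

import subprocess

ret = subprocess.call(['gdalinfo', '--version'])          # returns the exit code
subprocess.check_call(['gdalinfo', '--version'])          # raises CalledProcessError on failure
out = subprocess.check_output(['gdalinfo', '--version'])  # also captures stdout
print(out)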

Monday, December 3, 2012

Automation of Raster/grid data processing using ArcGIS and python

key words: automation, processing, python, ArcGIS, arcpy, xml, minidom

The main idea of this post is to define a way to process datasets that are stored as files and organised in multiple folders. To be more specific, I will describe one particular case.

Task: create GeoTIFF files from delivered ASCII files (x, y, z). The ASCII files are organised in folders: the root folder is “grid_data” and there are several subfolders grouping the data according to resolution (5 m grid, 2 m grid). There is no coordinate system definition attached to the files; the coordinate system definition, as well as the grid resolution, is delivered in a separate pdf document.

Before processing you always have to make some preparations, especially if you want to make your processing autonomous and not dependent on an operator. In this case I've decided to create an xml file (info.xml) that holds some basic information about the delivered data as well as the desired parameters for processing. Here is a sample file.
<?xml version="1.0" encoding="UTF-8"?>
<geoprocessing> 
 <input_data>
  <!-- dataset resolution in m --> 
  <res>5</res>
  <!-- name of ESRI prj file -->
  <srs>C:\sp\python\grid_data\srs\ED_1950_UTM_Zone_31N.prj</srs>
 </input_data>
 <results>
  <!-- Root folder for processed files. Leave empty if you want to use same folder as input data -->
  <root_folder></root_folder>
  <!-- name of ESRI prj file. Leave empty if you want to use same srs as input data -->
  <srs></srs>
 </results>
</geoprocessing>
I have decided to use xml for this purpose because it is more flexible when you need to make changes (if you want to use a similar approach to solve some other task). To parse this xml file I have created the following Python class:
'''
Created on 23. nov. 2012

@author: Stipica Pavicic
'''
from xml.dom.minidom import parseString


'''
configuration sample file m:\python\info.xml
'''       

class ConfigObj(object):
    '''
    Parsing of an xml file that holds configuration parameters for processing.
    Parameters are separated in two sections <input_data> and <results>.
    '''
    _dom=''
    _result_obj=''
    _input_obj=''
    
    class InputObj(object):
        '''
        parsing of <input_data> segment of info.xml file
        '''
        _res = 'not_set'
        _srs = 'not_set'
    
        def __init__(self, dom_obj):
            
            node = dom_obj.getElementsByTagName('input_data')[0]
            
            dom_elem = node.getElementsByTagName('srs')[0].firstChild
            if dom_elem:
                self._srs = dom_elem.data

    
            dom_elem = node.getElementsByTagName('res')[0].firstChild
            if dom_elem:
                self._res = dom_elem.data
         
        
        def getSrs(self):
            return self._srs
        
        def getRes(self):
            return self._res 
    
    class ResultObj(object):
        '''
        parsing of <results> segment of info.xml file
        '''
        _root_folder='not_set'
        _srs='not_set'
    
        def __init__(self, dom_obj):
            
            node = dom_obj.getElementsByTagName('results')[0]
            
            dom_elem = node.getElementsByTagName('srs')[0].firstChild
            if dom_elem:
                self._srs = dom_elem.data
            else:
                # if the result srs is not defined in the xml file, use the input data srs
                nodei = dom_obj.getElementsByTagName('input_data')[0]
                dom_elemi = nodei.getElementsByTagName('srs')[0].firstChild
                self._srs = dom_elemi.data                
    
            dom_elem = node.getElementsByTagName('root_folder')[0].firstChild
            if dom_elem:
                self._root_folder = dom_elem.data
   
        
        def getSrs(self):
            return self._srs
        
        def getRootFolder(self):
            return self._root_folder 
        

    def __init__(self, info_xml_path):

        try:
            myfile = open(info_xml_path,'r')
            #convert to string:
            mydata = myfile.read()
            #close the file because we don't need it anymore:
            myfile.close()
            #parse the xml you got from the file
            self._dom = parseString(mydata)
            self._result_obj = self.ResultObj(self._dom)
            self._input_obj = self.InputObj(self._dom)
        except IOError:
            print("IO Error")
        except:
            print("General Error!")
        
        
    def getDom(self):
        return self._dom
    
    def getGetResultParam(self):
        '''
        access  parameters
        '''
        return self._result_obj
    
    def getGetInputParam(self):
        '''
        access  parameters
        '''
        return self._input_obj
To use this class you need to create an xml file with the configuration parameters (e.g. c:\test\info.xml). Here is sample code showing how to access the parameters:
from my.xml.configurator import ConfigObj

conf = ConfigObj(r'c:\test\info.xml')
print conf.getGetInputParam().getSrs()
print conf.getGetInputParam().getRes()
print conf.getGetResultParam().getSrs()
print conf.getGetResultParam().getRootFolder()
As you can see, it is very easy to adjust the Python code to a differently structured xml file. Maybe you need to add additional parameters inside the <input_data> section, or maybe you want to add another section for some other group of parameters (something like <global>). All this depends on the problem you want to solve with this approach. For this particular case I'll stick with the parameters that already exist in the presented xml file... I'm calling it info.xml, but it can be any other name :-)

So... for now you know how to create a configuration file, how to define some parameters and values, and how to access those parameters from Python code (class ConfigObj). The next step is to create a script that processes all files in all subfolders, starting from some arbitrary root folder. The script will search for configuration files (info.xml) and processing will be performed only for the folders that contain one. Here is the code for that. The processing is straightforward, but I will make some comments just to explain the idea; some comments are also included in the source code.
'''
Created on 23. nov. 2012
@author: Stipica Pavicic
'''

from my.xml.configurator import ConfigObj
from my.process_timer import Timer

import os, fnmatch
import arcpy
import random
import shutil
from arcpy.conversion import PointToRaster

arcpy.CheckOutExtension("3D")

def createTmpGDB(folder):
    '''creates temporary file geodatabase'''
    #temp geodatabase name
    tmp_gdb_name = 'tempXXX.gdb'
    
    #redefine the name with a random number if it already exists (note: generate a fresh name on every retry, the 'XXX' placeholder can only be replaced once)
    while os.path.exists(folder + os.sep + tmp_gdb_name):
        tmp_gdb_name = 'temp' + str(random.randint(1000000, 9000000)) + '.gdb'
    
    arcpy.CreateFileGDB_management(folder, tmp_gdb_name )
        
    return tmp_gdb_name

def deleteTmpGDB(tmp_gdb):
    '''removes temporary file geodatabase'''    
    shutil.rmtree(tmp_gdb)
    return 0 
    

def proces_grid_file(in_file_folder, out_file_folder, resolution, in_srs, out_srs, filename, temporery_gdb):
    try:
        fbasename, extension = os.path.splitext(filename)
        in_csv = in_file_folder + os.sep + filename
        #point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + fbasename 
        #the basename can contain characters that are illegal for arcpy, which is why a random name is used (for this particular case the name of the intermediate result is not important)
        point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + str(random.randint(1000000, 9000000))
        while os.path.exists(point_data):
            #redefine point_data file name if already exists
            point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + str(random.randint(1000000, 9000000))
            
        out_tif = out_file_folder + os.sep + fbasename + ".tif"

        arcpy.ASCII3DToFeatureClass_3d(in_csv, "XYZ", point_data, "POINT", "", out_srs, "", "", "DECIMAL_POINT")
        arcpy.PointToRaster_conversion(point_data , "Shape.Z", out_tif, "MOST_FREQUENT", "NONE", str(resolution))
        #process finished without exception
        return 1
    except Exception as e:
        print e
        #process finished with exception
        return 0
    

for (root, dirs, files) in os.walk(r'C:\sp\python\grid_data'):

    measure = Timer()   
    config_file_name='info.xml'
    filter='*.csv'
    
    # get parameters from xml file (conf) and define configuration for processing the files in folder (root)
    if config_file_name in files:
        print '*****************************************************************************'
        print '>>>> Root    : ', root
        print 'Nr. of files : ' , len(fnmatch.filter(files,filter))  # number of files in the folder that match the filter expression 
        print 'Config file  : ' , root + os.sep + config_file_name
        conf = ConfigObj(root + os.sep + config_file_name)
        
        #open prj files
        inputSrsFile   = open(conf.getGetInputParam().getSrs(),'r')
        resultSrsFile  = open(conf.getGetResultParam().getSrs(),'r')
        
        #read files
        inputSrs  = inputSrsFile.read()
        resultSrs = resultSrsFile.read()  
        
        #close files 
        inputSrsFile.close()
        resultSrsFile.close()
        
        print 'spatial reference system definition'
        print 'input data srs :' , inputSrs
        print 'results    srs :'  , resultSrs   
        
        
        # create temporary file geodatabase
        tmp_gdb = createTmpGDB(root)
        
        
        # loop though list of files that match filter criteria (filter) & process files
        cnt = 0
        for filename in fnmatch.filter(files,filter):
            print '-----------------------------------------------------------------------------'
            cnt = cnt + 1    
            print cnt, ' -> processing file: ', filename
            fbasename, extension = os.path.splitext(filename)        
            #print root, '||', filename, '||', fbasename, '||', extension
            r=proces_grid_file(root, root, conf.getGetInputParam().getRes(), inputSrs, resultSrs, filename, tmp_gdb)
            if r:
                # r=1: processing finished without exception
                print filename, 'processing finished!'
            else:
                # r=0: processing finished with exception
                print filename, ': file is not processed!'
            
            print filename, measure.getTimeDifference()
        
        #delete temporary file geodatabase
        deleteTmpGDB(root + os.sep + tmp_gdb)
        
        # total time for processing
        print measure.getElapsedlTime()
The main part of the script starts with the os.walk loop, but before that I have included some helper functions:
  • createTmpGDB – creates a temporary file geodatabase
  • deleteTmpGDB – removes the temporary file geodatabase
  • proces_grid_file – processes a single grid file


... how to use this code?

Firstly, you need to look at how the files are organised. If they are grouped into folders according to some shared property, there is a good chance that this code can be useful to you, whether or not you need to perform the same processing as in this post. In this particular case the files were stored in folders according to resolution and spatial reference system, i.e. files with the same resolution and the same spatial reference system should be in the same folder. You can split files with the same properties across multiple folders, but you cannot put files with different properties in the same folder.

Secondly, for every folder that you want to process, create one xml file with the properties. Use the same name for all xml files (see the config_file_name variable in the code; info.xml). Folders without a configuration file won't be processed.

Thirdly, change the filter to whatever suits your case (the filter variable in the code). The files I had to process were delivered as csv files.

And finally, run the script. To be able to run it you will have to establish a working environment (all imports must be accessible to the script). To measure the processing time I have also used the class described in one of my previous posts. If you don't want to use it, just comment out all lines related to the Timer class (the import, the measure = Timer() initialisation and the two measure calls).

Feel free to use this code and modify it according to your needs. I hope to make a new post showing how to modify the existing code and reuse it following this concept: walking through the folders and processing files based on the xml configuration file.

.....

... enjoy!

Thursday, October 11, 2012

How to measure processing time in Python

keywords: python, process time, python class

...if you need to measure time during your Python processing, here is one helper class
'''
Created on 11. okt. 2012
@author: Stipica Pavicic
'''

import time
class Timer(object):
    '''
    timer starts with class initialisation
    '''

    def __init__(self):
        self.t1= time.time()
        self.t2= time.time()
        
    def getElapsedlTime(self):
        # gets the total time elapsed since class initialisation
        self.delta=time.time()-self.t1
        if self.delta > 60:
            self.delta = "Time elapsed:    %6.3f  min" % (self.delta/60) 
        else:
            self.delta = "Time elapsed:    %6.3f  sec" % (self.delta)
        return self.delta
        
    def getTimeDifference(self):
        # gets the time elapsed since the previous reading (for the first reading this equals the total elapsed time from getElapsedlTime())
        self.delta=time.time()-self.t2
        self.t2 = time.time()
        if self.delta > 60:
            self.delta = "Time difference: %6.3f  min" % (self.delta/60) 
        else:
            self.delta = "Time difference: %6.3f  sec" % (self.delta)
        return self.delta        
Using this class it is possible to get the total time from class initialisation to some desired moment with getElapsedlTime(), or the time difference from the previous reading with getTimeDifference().

Just make sure that the Timer class is available in your script and you can use it just like in the next example.
'''
Created on 11. okt. 2012
@author: Stipica Pavicic
'''

import time
from my.process_timer import Timer

print " ... initialise your timer"
measure=Timer()

print " ... begin some processing"
time.sleep(1.1)

# ... get elapsed time
print measure.getTimeDifference()
print measure.getElapsedlTime() 

print " ... some processing"
time.sleep(1.2)

# ... get elapsed time
print measure.getTimeDifference()
print measure.getElapsedlTime()

print " ... some processing"
time.sleep(1.3)

# ... get elapsed time
print measure.getTimeDifference()
print measure.getElapsedlTime()
Here is output:
 ... initialise your timer
 ... begin some processing
Time difference:  1.100  sec
Time elapsed:     1.100  sec
 ... some processing
Time difference:  1.200  sec
Time elapsed:     2.300  sec
 ... some processing
Time difference:  1.300  sec
Time elapsed:     3.600  sec
... enjoy

Friday, September 21, 2012

PyDev Installation

keywords: Eclipse, python, pydev, installation

I have just installed Eclipse (Juno Release, Build id: 20120614-1722) and wanted to install the Python IDE PyDev. When I started the installation procedure I ran into a problem.... here are the error log and the screenshot ....

HTTP Proxy Authentication Required: http://pydev.org/updates/content.xml
HTTP Proxy Authentication Required: http://pydev.org/updates/content.xml
Proxy Authentication Required




Luckily the solution was simple. It was only necessary to add some configuration options to the eclipse.ini file. The instructions for solving this issue that I found on the internet were not so precise and could cause some confusion ... so I've decided to make one short post on this topic.

.... locate your eclipse.ini file and add these lines at the end:
-Dorg.eclipse.ecf.provider.filetransfer.excludeContributors=org.eclipse.ecf.provider.filetransfer.httpclient
-Dhttp.proxyPort=8080
-Dhttp.proxyHost=myproxy
-Dhttp.proxyUser=mydomain\myusername
-Dhttp.proxyPassword=mypassword
-Dhttp.nonProxyHosts=localhost|127.0.0.1
It is important to put these lines in the last block, after -vmargs. Here is the whole ini file (the added lines are the last six).
-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_1.1.200.v20120522-1813
-product
org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
--launcher.XXMaxPermSize
256M
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
-vmargs
-Dosgi.requiredJavaVersion=1.5
-Dhelp.lucene.tokenizer=standard
-Xms40m
-Xmx512m
-Dorg.eclipse.ecf.provider.filetransfer.excludeContributors=org.eclipse.ecf.provider.filetransfer.httpclient
-Dhttp.proxyPort=8080
-Dhttp.proxyHost=myproxy
-Dhttp.proxyUser=mydomain\myusername
-Dhttp.proxyPassword=mypassword
-Dhttp.nonProxyHosts=localhost|127.0.0.1

...and don’t forget to restart Eclipse :-)


....here are more details.

... enjoy ...

Wednesday, September 19, 2012

Create grid/raster from ASCII file using GDAL, FME and ArcGIS

keywords: ASCII file, raster, grid, GDAL, OGR, FME, ArcGIS, python

... to extend a little bit the GDAL functionality presented in my previous post, I've decided to show you how to create a raster file from a list of coordinates stored in an ASCII file. The intention is to create a raster whose pixel values are taken from the third coordinate. This can be anything: height, temperature, moisture, wind speed, ...

I will show you how to perform that with the three tools I'm currently using: GDAL, FME and ArcGIS. Unfortunately, this time I will limit myself to these three tools, but maybe in some future post I'll include PostGIS, Oracle...

This task is not demanding for GIS professionals, but many people with limited GIS knowledge need it, and that is why I have decided to write something about it, or better said, demonstrate how to perform this task using three well known GIS tools.

The source ASCII file contains three columns with coordinates: easting, northing and height.

This is a segment of the sample dataset (dtm_predicted_tides.csv):
445058.00, 6707614.00, 114.20
445060.00, 6707614.00, 114.22
445062.00, 6707614.00, 114.24
445064.00, 6707614.00, 114.25
445022.00, 6707616.00, 113.94
445024.00, 6707616.00, 113.96

... let's start with GDAL.

In my previous post I used GDAL to generate a concave hull.

This procedure is very similar. First we need to create a vrt file (dem.vrt).
<OGRVRTDataSource>
   <OGRVRTLayer name="dtm_predicted_tides">
       <LayerSRS>EPSG:23031</LayerSRS>
       <SrcDataSource>dtm_predicted_tides.csv</SrcDataSource>
       <GeometryType>wkbPoint</GeometryType>
       <GeometryField encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
   </OGRVRTLayer>                                                                                                       
</OGRVRTDataSource> 
To get the extent use the ogrinfo command. The parameters -al and -so will give you summary information for all layers (we have only one layer defined in the vrt).
M:\__gdal_test>ogrinfo -al -so dem.vrt
INFO: Open of `dem.vrt'
     using driver `VRT' successful.

Layer name: dtm_predicted_tides
Geometry: Point
Feature Count: 433752
Extent: (444448.000000, 6707606.000000) - (447314.000000, 6709738.000000)
Layer SRS WKT:
PROJCS["ED50 / UTM zone 31N",
    GEOGCS["ED50",
       DATUM["European_Datum_1950",
           SPHEROID["International 1924",6378388,297,
               AUTHORITY["EPSG","7022"]],
           TOWGS84[-87,-98,-121,0,0,0,0],
           AUTHORITY["EPSG","6230"]],
       PRIMEM["Greenwich",0,
           AUTHORITY["EPSG","8901"]],
       UNIT["degree",0.0174532925199433,
           AUTHORITY["EPSG","9122"]],
       AUTHORITY["EPSG","4230"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",3],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
       AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","23031"]]
field_1: String (0.0)
field_2: String (0.0)
field_3: String (0.0)
From the data extent and the resolution of the data stored in the ASCII file, calculate the dimensions and extent of the raster file you are about to generate. To get the points centred within the pixels you should expand the raster area by half of the resolution value. In this case the resolution is the same along the easting and northing axes. In this simple calculation do not forget that the number of pixels (i.e. the raster dimensions) can only be an integer value.
rmaxx=maxx+resx/2
rmaxy=maxy+resy/2
rminx=minx-resx/2
rminy=miny-resy/2

dimx=(rmaxx-rminx)/resx
dimy=(rmaxy-rminy)/resy

txe: rminx rmaxx
tye: rminy rmaxy
outsize: dimx dimy
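Just to make the arithmetic concrete, here is a small Python sketch with the values from the ogrinfo output above (the extent and the known 2 m resolution); the variable names follow the formulas:

# extent from ogrinfo and the known resolution (2 m in both directions)
minx, miny = 444448.0, 6707606.0
maxx, maxy = 447314.0, 6709738.0
resx = resy = 2.0

# expand the area by half a pixel so the points end up in the pixel centres
rminx, rmaxx = minx - resx/2, maxx + resx/2
rminy, rmaxy = miny - resy/2, maxy + resy/2

# raster dimensions must be integer values
dimx = int(round((rmaxx - rminx) / resx))
dimy = int(round((rmaxy - rminy) / resy))

print('txe: %d %d   tye: %d %d' % (rminx, rmaxx, rminy, rmaxy))   # txe: 444447 447315   tye: 6707605 6709739
print('outsize: %d %d' % (dimx, dimy))                            # outsize: 1434 1067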
The gdal_grid command will produce the desired grid file (dem.tiff). The search ellipse radii radius1 and radius2 should be smaller than the data resolution.
gdal_grid -a nearest:radius1=1:radius2=1:nodata=-999
          -txe 444447 447315   -tye 6707605 6709739  
          -outsize 1434 1067  
          -of GTiff -ot Float32 
          -l dtm_predicted_tides dem.vrt dem.tiff
To check the results you can use the gdalinfo command.
M:\__gdal_test>gdalinfo dem.tiff
Driver: GTiff/GeoTIFF
Files: dem.tiff
Size is 1434, 1067
Coordinate System is:
PROJCS["ED50 / UTM zone 31N",
    GEOGCS["ED50",
       DATUM["European_Datum_1950",
           SPHEROID["International 1924",6378388,297.0000000000014,
               AUTHORITY["EPSG","7022"]],
           TOWGS84[-87,-98,-121,0,0,0,0],
           AUTHORITY["EPSG","6230"]],
       PRIMEM["Greenwich",0],
       UNIT["degree",0.0174532925199433],
       AUTHORITY["EPSG","4230"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",3],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
       AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","23031"]]
Origin = (444447.000000000000000,6707605.000000000000000)
Pixel Size = (2.000000000000000,2.000000000000000)
Metadata:
 AREA_OR_POINT=Area
Image Structure Metadata:
 INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  444447.000, 6707605.000) (  1d59'19.59"E, 60d29'57.52"N)
Lower Left  (  444447.000, 6709739.000) (  1d59'17.44"E, 60d31' 6.48"N)
Upper Right (  447315.000, 6707605.000) (  2d 2'27.50"E, 60d29'58.91"N)
Lower Right (  447315.000, 6709739.000) (  2d 2'25.46"E, 60d31' 7.87"N)
Center      (  445881.000, 6708672.000) (  2d 0'52.50"E, 60d30'32.70"N)
Band 1 Block=1434x1 Type=Float32, ColorInterp=Gray


... here is FME example ...

This is even simpler. The first thing you have to define is the CSV data reader; in a way, it plays the same role as the vrt file. For large datasets do not forget to limit the number of scanned lines.


After that, create geometry features from the attributes using the 3DPointReplacer


Use the NumericRasterizer to create the raster. You can define the size of the raster either by entering the number of cells or by entering the resolution (pixel size), just like shown in the next figure.



At this moment everything is ready to be written to some raster/grid format. In this example I've used GeoTIFF. On the writer you can add the coordinate system definition.

Here is the workflow for the whole process


...here is ArcGIS example...

In the Catalog tree navigate to the 3D Analyst toolbox by expanding Toolboxes / System Toolboxes / 3D Analyst Tools. Double-click the ASCII 3D to Feature Class tool.



Enter all the necessary values. That will create point features from the ASCII file. After that just start the second tool (see the next figure) and the task is finished... for large datasets consider drinking a coffee or taking a power nap :)



Although I consider myself an FME addict :) I just love the ModelBuilder available in ArcGIS. Let me demonstrate how to create the previous process using ModelBuilder. Just open the ModelBuilder window from the Geoprocessing dropdown menu, drag the tools from the toolbox into ModelBuilder and configure them just like it was demonstrated in the previous steps. The process can be started from the Model menu. Here is the final look of the process and the created raster...



...and one more interesting functionality from ESRI... you can export this process to a Python script, and then you have the basic process coded in Python that you can modify and make more powerful. It is a very good start if you want to automate some processes. Here is the generated Python code.
# ---------------------------------------------------------------------------
# process.py
# Created on: 2012-09-19 13:25:05.00000
#   (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Check out any necessary licenses
arcpy.CheckOutExtension("3D")


# Local variables:
dtm_predicted_tides_csv__2_ = "M:\\__gdal_test\\dtm_predicted_tides.csv"
dtm_predicted_tides_MB = "\\\\...\\...\\gis_data.gdb\\dtm_predicted_tides_MB"
dtm_predicted_tides_MBr = "\\\\...\\...\\gis_data.gdb\\dtm_predicted_tides_MBr"

# Process: ASCII 3D to Feature Class
arcpy.ASCII3DToFeatureClass_3d("M:\\__gdal_test\\dtm_predicted_tides.csv", "XYZ", dtm_predicted_tides_MB, "POINT", "", "PROJCS['ED_1950_UTM_Zone_31N',GEOGCS['GCS_European_1950',DATUM['D_European_1950',SPHEROID['International_1924',6378388.0,297.0]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',3.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]]", "", "", "DECIMAL_POINT")

# Process: Point to Raster
arcpy.PointToRaster_conversion(dtm_predicted_tides_MB, "Shape.Z", dtm_predicted_tides_MBr, "MAXIMUM", "NONE", "2")
Now just imagine how simple it would be to automate this process for multiple files stored in some folder on the filesystem... with a little bit of Python code you can walk through the folders, get the file names, build a list of them, wrap the generated code in a loop and ... voilà ... you have a new automated process :-)
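Here is a rough sketch of that idea, reusing the two generated calls inside a loop. The folder, the file geodatabase and the .prj file are only placeholders, so treat this as an illustration of the concept rather than a finished script:

import os, fnmatch
import arcpy

arcpy.CheckOutExtension("3D")

# placeholders: folder with the csv files, an existing file geodatabase and a .prj file
in_folder = r'M:\__gdal_test'
gdb = r'M:\__gdal_test\gis_data.gdb'
srs = r'M:\__gdal_test\ED_1950_UTM_Zone_31N.prj'

for filename in fnmatch.filter(os.listdir(in_folder), '*.csv'):
    basename = os.path.splitext(filename)[0]        # note: must be a valid feature class name
    in_csv = os.path.join(in_folder, filename)
    points = os.path.join(gdb, basename + '_MB')
    raster = os.path.join(in_folder, basename + '.tif')

    arcpy.ASCII3DToFeatureClass_3d(in_csv, "XYZ", points, "POINT", "", srs, "", "", "DECIMAL_POINT")
    arcpy.PointToRaster_conversion(points, "Shape.Z", raster, "MAXIMUM", "NONE", "2")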

....... I just want to mention here that FME and GDAL are also well integrated with Python.

I hope this post helps some GIS beginners or people who use GIS only occasionally....

....enjoy!

Friday, September 14, 2012

How to generate Concave Hull from regularly spaced point data using GDAL

keywords: GDAL, OGR, vrt, gdal_grid, gdal_contour, concave hull, grid/raster processing

If you have to process data delivered in plain ASCII files you have probably run into the concave hull problem. Contrary to the convex hull problem, GIS tools usually don't have a built-in function that can help you with this issue. There is a good demonstration of the PostGIS ST_ConcaveHull function

This is just to illustrate that there are multiple possible solutions to the concave hull problem.

The scenario presented in this post is not generic; it works best for regularly spaced or almost regularly spaced points. With some adjustments it can be applied to irregularly scattered points, but for now I'll describe the solution for regularly spaced points. Although this procedure is not general, I hope it will eventually help somebody.

The idea is to create a raster/grid file from the point data and to identify the areas (pixels) that don't contain any point. The easiest way is to produce the grid so that the value of each pixel represents the number of points it contains. For regularly spaced data, if everything is set up well, we should get only the values 0 and 1.

From the created grid just generate the contour with value 1. It represents the concave hull of the regularly spaced data. If you have almost regularly spaced data you can apply the same principle and just generate the contour with value 0.1 (this also works for regular spacing). To illustrate the difference, here is the figure with three contours with values 0.1, 0.5 and 1.

I'll start with a simple ASCII file with data organised into three columns: x, y, z. That is a very common format that many professionals use to share point data. Here are just a few lines from the file dtm_predicted_tides.csv (in this scenario the file does not contain headers).
446034.00, 6707606.00, 113.92
445036.00, 6707606.00, 113.95
445038.00, 6707606.00, 113.98
445040.00, 6707606.00, 114.00
...
First we have to create a VRT (virtual format) file because GDAL cannot directly read and “understand” the ASCII file. VRT file details you can find here.

For this application (no headers in the ASCII file) the VRT file will be (dem.vrt):
<OGRVRTDataSource>
    <OGRVRTLayer name="dtm_predicted_tides">
       <SrcDataSource>dtm_predicted_tides.csv</SrcDataSource>
               <GeometryType>wkbPoint</GeometryType>
               <GeometryField encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
To test it use the ogrinfo command:
M:\__gdal_test>ogrinfo -al -so dem.vrt
INFO: Open of `dem.vrt'
     using driver `VRT' successful.

Layer name: dtm_predicted_tides
Geometry: Point
Feature Count: 433752
Extent: (444448.000000, 6707606.000000) - (447314.000000, 6709738.000000)
Layer SRS WKT:
(unknown)
field_1: String (0.0)
field_2: String (0.0)
field_3: String (0.0)
From this info you will need the extent to calculate some parameters used in the next step. To assure that the generated contour line is closed it is necessary to extend the area by at least one pixel. The pixel size is determined by the resolution of the original data: for regularly spaced points it is equal to the distance between two neighbouring points along the easting and northing axes. For almost regularly spaced data you have to estimate this value, bearing in mind the proposed scenario for the concave hull estimation.

Taking into account the reported extent and the resolution of 2 m, it is easy to calculate that a grid covering just the data would be sized 1433 x 1066. To extend the area and to assure that each point is located in the middle of its pixel, the area has to be extended by one and a half pixels on each side, which in this case is 3 m.
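Just to make that calculation explicit, a few lines of Python reproduce these numbers (extent taken from the ogrinfo output above, 2 m resolution, expansion by one and a half pixels):

minx, miny = 444448.0, 6707606.0
maxx, maxy = 447314.0, 6709738.0
res = 2.0
pad = 1.5 * res   # one and a half pixels = 3 m

txe = (minx - pad, maxx + pad)   # (444445.0, 447317.0)
tye = (miny - pad, maxy + pad)   # (6707603.0, 6709741.0)
outsize = (int((txe[1] - txe[0]) / res), int((tye[1] - tye[0]) / res))   # (1436, 1069)
print(txe, tye, outsize)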

Now we have all the values necessary to run the command. Notice that the size of the grid has also changed because the original extent was changed.
gdal_grid -a count:radius1=1:radius2=1
          -txe 444445 447317   -tye 6707603 6709741
          -outsize 1436 1069  
          -of GTiff -ot Float32 
          -l dtm_predicted_tides dem.vrt  concavehull_ext2.tiff
Usage: gdal_grid [--help-general] [--formats]
    [-ot {Byte/Int16/UInt16/UInt32/Int32/Float32/Float64/
         CInt16/CInt32/CFloat32/CFloat64}]
    [-of format] [-co "NAME=VALUE"]
    [-zfield field_name]
    [-a_srs srs_def] [-spat xmin ymin xmax ymax]
    [-clipsrc <geom>|WKT|datasource|spat_extent]
    [-clipsrcsql sql_statement] [-clipsrclayer layer]
    [-clipsrcwhere expression]
    [-l layername]* [-where expression] [-sql select_statement]
    [-txe xmin xmax] [-tye ymin ymax] [-outsize xsize ysize]
    [-a algorithm[:parameter1=value1]*]    [-q]
    <src_datasource> <dst_filename>

Available algorithms and parameters with their's defaults:
    Inverse distance to a power (default)
        invdist:power=2.0:smoothing=0.0:radius1=0.0:radius2=0.0:angle=0.0:max_points=0:min_points=0:nodata=0.0
    Moving average
       average:radius1=0.0:radius2=0.0:angle=0.0:min_points=0:nodata=0.0
    Nearest neighbor
       nearest:radius1=0.0:radius2=0.0:angle=0.0:nodata=0.0
    Various data metrics
        <metric name>:radius1=0.0:radius2=0.0:angle=0.0:min_points=0:nodata=0.0
       possible metrics are:
           minimum
           maximum
           range
           count
           average_distance
           average_distance_pts
Notice that we have used the count metric with both search ellipse radii set to 1. The intention was to get one point per pixel. With data that are not regularly spaced this is not possible, so in that case you should play a little, first with the pixel size and then with the search ellipse radii. At the end just interpolate... let's say the 0.1 contour, and you can get satisfactory results. .... Use gdalinfo to check the generated raster
M:\__gdal_test>gdalinfo concavehull_ext2.tiff
Driver: GTiff/GeoTIFF
Files: concavehull_ext2.tiff
      concavehull_ext2.tiff.aux.xml
Size is 1436, 1069
Coordinate System is:
PROJCS["ED_1950_UTM_Zone_31N",
    GEOGCS["GCS_European_1950",
       DATUM["European_Datum_1950",
           SPHEROID["International_1924",6378388.0,297.0]],
       PRIMEM["Greenwich",0.0],
       UNIT["Degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",0.0],
    PARAMETER["Central_Meridian",3.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0],
    AUTHORITY["EPSG","23031"]]
Origin = (444445.000000000000000,6707603.000000000000000)
Pixel Size = (2.000000000000000,2.000000000000000)
Image Structure Metadata:
 INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  444445.000, 6707603.000) (  1d59'19.46"E, 60d29'57.45"N)
Lower Left  (  444445.000, 6709741.000) (  1d59'17.31"E, 60d31' 6.55"N)
Upper Right (  447317.000, 6707603.000) (  2d 2'27.63"E, 60d29'58.84"N)
Lower Right (  447317.000, 6709741.000) (  2d 2'25.59"E, 60d31' 7.94"N)
Center      (  445881.000, 6708672.000) (  2d 0'52.50"E, 60d30'32.70"N)
Band 1 Block=1436x1 Type=Float32, ColorInterp=Gray
 Min=0.000 Max=1.000
 Minimum=0.000, Maximum=1.000, Mean=0.283, StdDev=0.450
 Metadata:
    STATISTICS_MAXIMUM=1
    STATISTICS_MEAN=0,28255913031469
    STATISTICS_MINIMUM=0
    STATISTICS_STDDEV=0,45024393416031
The pixel size and corner coordinates should be as previously calculated. You might not get the info about statistics and the coordinate system, but for now that is not important.

At this moment we are ready to create the concave hull, or to be more precise, to generate contours. The following command will generate a shapefile with the 0.1 contour and an attribute elev that holds the contour value.
gdal_contour -a elev concavehull_ext2.tiff concavehull_ext2_z.shp -fl 0.1
Usage: gdal_contour [-b <band>] [-a <attribute_name>] [-3d] [-inodata]
                    [-snodata n] [-f <formatname>] [-i <interval>]
                    [-off <offset>] [-fl <level> <level>...]
                    [-nln <outlayername>] [-q]
                    <src_filename> <dst_filename>
Here is the generated concave hull for my case. Gray pixels contain no points, and for the other pixels the count value is equal to 1 (that was the intention from the beginning). If you need to, you can generate multiple contours at once; in that case you can use the elev attribute to extract the one you intend to use for the concave hull.
gdal_contour -a elev concavehull_ext2.tiff contours.shp -fl 0.1 0.5 0.7
If you need to visualise the points you can use ogr2ogr to create a shapefile from the vrt.
ogr2ogr dem.shp dem.vrt
Here is a segment with the metric raster, the points and the concave hull (contour=1). It is good practice to use the coordinate system definition from the beginning of the process. For that you should add one subelement to the OGRVRTLayer element of the vrt:
<LayerSRS>EPSG:23031</LayerSRS>
That is sufficient to propagate the coordinate system definition to all generated files.

The whole process is very simple and can be summarised in a few lines:
1. create the vrt file and check it (ogrinfo); optionally create a shapefile (ogr2ogr)
2. create the metric raster (gdal_grid) and check it with gdalinfo
3. generate the concave hull (gdal_contour)

... just play with your data and enjoy! :-)

Monday, April 30, 2012

Geomedia and Oracle: how to speed up initial connection

keywords: oracle, geomedia, geomedia objects, oracle log, sql hint, sql plus, oracle enterprise manager, spool, timing

If you are using an Oracle database with Geomedia Professional, or an application based on Geomedia Objects, you will probably want to speed up the initial connection, which can be very slow. In this post I will present a solution to that specific issue that might significantly improve the user experience. You can apply the same approach in many other situations where you cannot control or change the SQL generated by some tool or software.

To start the optimisation process you need to find out which statement causes the trouble. That can be done using Oracle tools such as Oracle Enterprise Manager. You can log in as the same user you are using from Geomedia; after that go to Performance / Top Activity and find the select statements you are looking for.

To help you a little, here are two screenshots: an arbitrary query executed using SQL Developer


and here is the same query found by Oracle Enterprise Manager (Performance/Top activity).


You can read more on this topic here.

You can accomplish the same thing using a tool such as TOAD, or with SQL queries, etc...

In the case you want to optimise the queries generated by Geomedia, you can find all of them in a log file. To get the log you have to create the file c:\temp\GOracle.log and then start Geomedia. Don't be creative: the file name and location should be exactly as written, you cannot use a different location or file name :)

After you connect Geomedia to the Oracle database, the log file will contain the SQL queries generated by Geomedia. That is the starting point for your optimisation.

Let me show you one query generated by Geomedia at the moment of the initial connection (open SQL*Plus and try it yourself). Here is the SQL entered in SQL*Plus and the content of my spool file (sql-log.txt).
spool sql-log.txt;
set timing on;
SELECT COUNT(1)
FROM ALL_OBJECTS
WHERE OBJECT_TYPE = 'TABLE'
 AND SECONDARY = 'N'
 AND (OWNER, OBJECT_NAME) IN
    (SELECT OWNER, TABLE_NAME
            FROM ALL_TABLES
            WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));
SELECT /*+ STAR_TRANSFORMATION */  COUNT(1)
FROM ALL_OBJECTS
WHERE OBJECT_TYPE = 'TABLE'
 AND SECONDARY = 'N'
 AND (OWNER, OBJECT_NAME) IN
    (SELECT OWNER, TABLE_NAME
            FROM ALL_TABLES
            WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));
spool off;

(Notice the optimiser hint used to alter the execution plan: /*+ STAR_TRANSFORMATION */ )

Here is the content of the spool file:
SQL> set timing on;
SQL> SELECT COUNT(1)
2   FROM ALL_OBJECTS
3   WHERE OBJECT_TYPE = 'TABLE'
4     AND SECONDARY = 'N'
5     AND (OWNER, OBJECT_NAME) IN
6      (SELECT OWNER, TABLE_NAME
7              FROM ALL_TABLES
8               WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));

COUNT(1)                                                                      
----------                                                                      
   27664                                                                      

Elapsed: 00:01:12.36
SQL> SELECT /*+ STAR_TRANSFORMATION */  COUNT(1)
2   FROM ALL_OBJECTS
3   WHERE OBJECT_TYPE = 'TABLE'
4     AND SECONDARY = 'N'
5     AND (OWNER, OBJECT_NAME) IN
6      (SELECT OWNER, TABLE_NAME
7              FROM ALL_TABLES
8               WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));

COUNT(1)                                                                      
----------                                                                      
   27664                                                                      

Elapsed: 00:00:01.23
SQL> spool off;

You can see that the elapsed time changed dramatically, from 1:12.36 minutes to 1.23 seconds!

...just one brief comment.... every time Geomedia connects to the Oracle database, the presented query is executed without any optimisation and your users will be very unsatisfied… and not only the users of your system: you, as a software developer, system architect or GIS analyst, can also experience great disappointment waiting more than a minute just to establish a connection!!!! … and one more thing, this SQL is not the only one that makes trouble (examine your log file!). In this post I present principles you can apply to any SQL.

Now it is time to “tell” Oracle that when it receives this query from Geomedia it should run it in the optimised form. Here is the PL/SQL code to do that. This solution was proposed by Oracle guru Damir Vadas (thanks Damir!).
set serveroutput on size 123456;
 
DECLARE
  l_sql            VARCHAR2(2048 CHAR);
  l_sql_tune_task_id  VARCHAR2(100);
BEGIN
  l_sql :=  'select OWNER, OBJECT_NAME, TO_CHAR(CREATED, ''MM/DD/YYYY HH24:MI:SS''), TO_CHAR(LAST_DDL_TIME, ''MM/DD/YYYY HH24:MI:SS''), ''TABLE'' from ALL_OBJECTS where OBJECT_TYPE = ''TABLE'' and SECONDARY = ''N'' and (OWNER,OBJECT_NAME) in (select OWNER,TABLE_NAME from ALL_TABLES where OWNER not in (''SCOTT'',''MGMT_VIEW'',''WKPROXY'',''WKSYS'',''MDDATA'',''SYSMAN'',''ANONYMOUS'',''XDB'',''WK_TEST'',''OLAPSYS'',''CTXSYS'',''MDSYS'',''SI_INFORMTN_SCHEMA'',''ORDPLUGINS'',''ORDSYS'',''EXFSYS'',''WMSYS'',''DBSNMP'',''DMSYS'',''DIP'',''OUTLN'',''SYSTEM'',''SYS'',''SH'')) order by OWNER, OBJECT_NAME';
  l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
                          sql_text    => l_sql,
                          user_name   => null,
                          scope       => DBMS_SQLTUNE.scope_comprehensive,
                          time_limit  => 600,
                          task_name   => 'g3_tuning_task',
                          description => 'Tuning task for GEOMEDIA.');
  DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
/
 
EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => 'g3_tuning_task');
 
SELECT task_name, status FROM dba_advisor_log WHERE TASK_NAME='g3_tuning_task';
 
SET LONG 10000;
SET PAGESIZE 1000
SET LINESIZE 200
SELECT DBMS_SQLTUNE.report_tuning_task('g3_tuning_task') AS recommendations FROM dual;
 
execute dbms_sqltune.accept_sql_profile(task_name => 'g3_tuning_task', replace => TRUE);    

Take into consideration that you cannot change the query (variable l_sql): take it exactly as it was written in the log file. You only have to take care of the quoting (double your quotes or use q'[..]'); do not change upper case to lower case or anything else. The limitation of this procedure is that you cannot optimise queries whose parameters change on every execution.

Examine the reports you get from the proposed procedure (pay attention to the estimated benefit).

When you work with Oracle and Geomedia (read/write) you will find the GDOSYS schema in your database. There are several tables, views, triggers and sequences in that schema. One table (GEXCLUSIONS) is used to speed up the connection. Here is the description of that table from Intergraph's documentation: “The GEXCLUSIONS metadata table is specific to the Oracle Object Model Data Server and is used to exclude schemas from the initial connection scan. When establishing an Oracle connection, any schema that the connected user has privileges to see will be scanned for compatibility. The more schemas that are available to the connected user, the longer the connection takes. This is one reason it is not recommended to connect as a user with the DBA role.”

From my experience you can leave the content of this table as it is and just perform the proposed optimisation. If you change the content of the GEXCLUSIONS table you will have to perform the optimisation one more time, because every new entry in the GEXCLUSIONS table changes the query (variable l_sql).

… so ... play a little with this and very soon you will notice the benefit … be careful, try it first in a test environment, not in production :)

… enjoy ...

Sunday, April 22, 2012

a little bit about Oracle Quote Operator and Substitution Variables

keywords: oracle quote operator, substitution variables, set define, string, ampersand, global temporary table

I would like to share in this post one Oracle feature that might save you some time. If you write PL/SQL code, or build SQL statements from some other programming language, you will most probably have to write textual data containing quotes to the database, or create SQL statements that insert varchar values. In these and similar cases you end up struggling with a lot of quote signs. To circumvent all that trouble, try the Oracle quote operator.
q'<delimiter><string><delimiter>';
Here is how it works:
declare
   l_str varchar2(30) := q'[I’ll quote!]';
begin
   dbms_output.put_line(l_str);
end;
or just a simple select
select q'[i’ll quote!]' from dual;
The last time I used this Oracle feature, introduced in Oracle 10g, I had to store in the database many predefined CQL filters and GetLegendGraphic http requests (the intention was to store the configuration of a web GIS application in the database; GeoServer was the GIS engine used on the server side). Besides the CQL and legend elements I had to store some other configuration elements in the database, but for this post I would like to present one more thing that can be helpful when working with strings... especially if you are storing urls in a database table.
Here is an example of a url I had to store in the database.
../geoserver/wms?REQUEST=GetLegendGraphic&VERSION=1.0.0&FORMAT=image/png&WIDTH=20
&HEIGHT=20&LAYER=PP:COUNTY
It is a standard GET request that contains ampersand signs. To demonstrate what is so special about a GET request I have prepared several SQL statements.... so.... open SQL*Plus and try to run them.... First create the table for testing, then run script 1 and then script 2... after testing, remove the table....
--to create table:
create global temporary table my_test_table (
 test_string  varchar2(1000)
) on commit preserve rows;
--to remove table:
truncate table my_test_table;
drop table my_test_table;
--script 1
set define off;
insert into  my_test_table
values (q'[../geoserver/wms?REQUEST=GetLegendGraphic&VERSION=1.0.0&FORMAT=image/png&WIDTH=20&HEIGHT=20&LAYER=PP:COUNTY]');
commit;
Try to execute the same statement, but now with the option set define on.
--script 2
set define on;
insert into  my_test_table
values (q'[../geoserver/wms?REQUEST=GetLegendGraphic&VERSION=1.0.0&FORMAT=image/png&WIDTH=20&HEIGHT=20&LAYER=PP:COUNTY]');
commit;
You will notice the difference immediately :-) With define on, SQL*Plus treats every ampersand as the start of a substitution variable, so instead of inserting the url as it is it stops and prompts you for values such as VERSION, FORMAT and WIDTH; with define off the string is stored untouched.
… and if you didn't know about Oracle substitution variables, maybe now you can find some interesting application for that functionality …..

On this link you can read more about it.

… enjoy …

Saturday, April 21, 2012

GOIDGenerator & CRCCalculator

keywords: Feature Manipulation Engine (FME), GOIDGenerator, CRCCalculator, Coordinate System, identifier based on location and/or geometry

...reading only the title of this post, some people will immediately recognise that it is about FME...

In the case you want to create an identifier based on the point location (coordinates), you can use the GOIDGenerator or CRCCalculator transformer. If you are using GOID you should create your ID based on the first 16 characters of the generated GOID string. Later in this post I refer to GOID in this shortened meaning (the first 16 characters).


If you are using CRC you should take into account only the geometry, not the attributes (the “Attributes to use in CRC calculation” parameter should be empty).



Solutions using GOID and CRC should generate the same result, but in some cases they don't. To demonstrate this I have created a sample workbench file.



At the beginning of the translation process I create a geometry object (point) and generate GOID and CRC attributes based only on the position, i.e. the geometry (attributes goid_1 and _crc_1). After that I assign a coordinate system to the geometry object and repeat the generation of GOID and CRC, this time storing the values in differently named attributes (goid_2 and _crc_2). At the end of the process two Testers check the equality of the generated GOIDs and the generated CRCs. In the case of GOID the attribute value changed after the CoordinateSystemSetter transformer, which wasn't the case for the CRC values.

Take that into account when you compare data coming from two different sources: sometimes the coordinates can be exactly the same, but an omitted coordinate system definition can mislead you if you are relying on GOID.

Let me describe the case that drove me to explore the GOIDGenerator transformer in more detail. Here is the conceptual schema.



The data in the database do not require reprojection because they are already in the official coordinate system. New data are imported from a csv file and, after geometry creation (2DPointReplacer), transformed into the official coordinate system. The FeatureMerger should detect the points that didn't change coordinates. Detection wasn't possible because the branch with data coming from Oracle Spatial didn't have a coordinate system attached. The solution in this case was simple: attach the CS to the Oracle Spatial reader... but sometimes, when you are debugging a workbench process, such a solution is not so obvious and you have to make a number of tests to find out what the issue is.

To avoid the coordinate system influencing GOID generation you can use the following workflow:
  • store CS to attribute
  • remove CS
  • generate GOID
  • set CS based on attribute value


… or you can use CRCCalculator...

There are always many solutions, but keep in mind that the coordinate system information is in some way stored in the GOID and not in the CRC... sometimes you need that information and sometimes you don't.

... enjoy ...

Check Your Oracle Spatial Tables

keywords: oracle, spatial tables, spatial index, metadata, validity check

Here are a few helpful SQL statements. You can use them when you migrate or create a new Oracle database with spatial tables.

If a select statement returns some rows, there is something wrong with your spatial tables. The first statement checks for missing metadata, the second checks for missing spatial indexes and the third checks index validity.

Keep in mind that these queries are sometimes not sufficient to prove that everything is OK with your spatial database or spatial data definition. For example, you can have invalid entries in the tables related to the definition of the spatial reference system (if you are using a custom SRID for your spatial data).

Anyway, these queries can help you in many cases to detect errors related to the migration or creation of spatial tables. The check is performed for all database users (schemas). If you run into restrictions, run the queries as the system or sys user... or some other user that has privileges to execute select statements on all the required tables.

… enjoy ...
-- to get all spatial tables without entry in user_sdo_metadata view
select geomet.sdo_owner, geomet.sdo_table_name, geomet.sdo_column_name,
    atbl.owner, atbl.table_name, atbl.column_name
    from mdsys.sdo_geom_metadata_table geomet, all_tab_cols atbl   
        where geomet.sdo_owner       (+)  = atbl.owner
        and geomet.sdo_table_name    (+)  = atbl.table_name
        and geomet.sdo_column_name   (+)  = atbl.column_name
        and geomet.sdo_owner              is null
        and geomet.sdo_table_name         is null
        and geomet.sdo_column_name        is null
        and atbl.data_type_owner       =  'MDSYS'  
        and atbl.data_type             =  'SDO_GEOMETRY'
        and atbl.owner                != 'MDSYS';
-- to get all missing spatial indexes
select alic.index_owner, alic.table_name, alic.column_name, 
    atbl.owner, atbl.table_name, atbl.column_name
    from  all_ind_columns alic, all_tab_cols atbl
        where alic.index_owner     (+)  = atbl.owner
        and   alic.table_name      (+)  = atbl.table_name
        and   alic.column_name     (+)  = atbl.column_name
        and   alic.index_owner          is null
        and   alic.table_name           is null
        and   alic.column_name          is null
        and   atbl.data_type_owner   =  'MDSYS'  
        and   atbl.data_type         =  'SDO_GEOMETRY'
        and   atbl.owner            !=  'MDSYS';
--to check spatial index validity
select alin.owner, alin.table_name, alin.status, 
    alin.domidx_status, alin.domidx_opstatus
    from  all_indexes alin
        where (alin.index_type        = 'DOMAIN'  
           and alin.ityp_name         = 'SPATIAL_INDEX' 
           and alin.table_name not like '%$%')
        and   (alin.status           != 'VALID' 
           or  alin.domidx_status    != 'VALID' 
           or  alin.domidx_opstatus  != 'VALID');
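If you want to run these checks from Python (for example as part of a migration or deployment script), here is a minimal sketch using the cx_Oracle module; the connection string is a placeholder and only the first query is shown, the other two work the same way.

import cx_Oracle

# placeholder credentials; use an account that can select from the dictionary views
conn = cx_Oracle.connect('system/password@mydb')
cur = conn.cursor()

# first query from above (no trailing semicolon, cx_Oracle does not accept it)
missing_metadata_sql = """
select geomet.sdo_owner, geomet.sdo_table_name, geomet.sdo_column_name,
    atbl.owner, atbl.table_name, atbl.column_name
    from mdsys.sdo_geom_metadata_table geomet, all_tab_cols atbl
        where geomet.sdo_owner       (+)  = atbl.owner
        and geomet.sdo_table_name    (+)  = atbl.table_name
        and geomet.sdo_column_name   (+)  = atbl.column_name
        and geomet.sdo_owner              is null
        and geomet.sdo_table_name         is null
        and geomet.sdo_column_name        is null
        and atbl.data_type_owner       =  'MDSYS'
        and atbl.data_type             =  'SDO_GEOMETRY'
        and atbl.owner                != 'MDSYS'
"""

cur.execute(missing_metadata_sql)
rows = cur.fetchall()
print('%d spatial column(s) without metadata' % len(rows))
for row in rows:
    print(row)

cur.close()
conn.close()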

Friday, April 13, 2012

Reserved Characters in HTML: google-code-prettify

keywords: html reserved characters, google-code-prettify, xml

...just a small tip if you use google-code-prettify...

If you want to publish some code that includes xml tags, or generally speaking if your code contains some characters reserved in HTML such as the less-than and greater-than operators, you have to make a replacement: instead of < you should put &lt; and instead of > put &gt; …

For example, if you want to prettify code:

    private  String getStyleTemplate(){
        String xmlTemplate ="";
        xmlTemplate = xmlTemplate + "<style>\n";
        xmlTemplate = xmlTemplate + "  <id>:style_id:</id>\n";
        xmlTemplate = xmlTemplate + "  <name>:style_name:</name>\n";
        xmlTemplate = xmlTemplate + "  <sldVersion>\n";
        xmlTemplate = xmlTemplate + "    <version>1.0.0</version>\n";
        xmlTemplate = xmlTemplate + "  </sldVersion>\n";
        xmlTemplate = xmlTemplate + "  <filename>:style_filename:</filename>\n";
        xmlTemplate = xmlTemplate + "</style>";
        
        return xmlTemplate;
    }

you should enter this in the HTML page:
<pre class="prettyprint linenums lang-java">

    private  String getStyleTemplate(){
        String xmlTemplate ="";
        xmlTemplate = xmlTemplate + "&lt;style&gt;\n";
        xmlTemplate = xmlTemplate + "  &lt;id&gt;:style_id:&lt;/id&gt;\n";
        xmlTemplate = xmlTemplate + "  &lt;name&gt;:style_name:&lt;/name&gt;\n";
        xmlTemplate = xmlTemplate + "  &lt;sldVersion&gt;\n";
        xmlTemplate = xmlTemplate + "    &lt;version&gt;1.0.0&lt;/version&gt;\n";
        xmlTemplate = xmlTemplate + "  &lt;/sldVersion&gt;\n";
        xmlTemplate = xmlTemplate + "  &lt;filename&gt;:style_filename:&lt;/filename&gt;\n";
        xmlTemplate = xmlTemplate + "&lt;/style&gt;";        

        return xmlTemplate;
    }
</pre>
and the result should look like this:
    private  String getStyleTemplate(){
        String xmlTemplate ="";
        xmlTemplate = xmlTemplate + "<style>\n";
        xmlTemplate = xmlTemplate + "  <id>:style_id:</id>\n";
        xmlTemplate = xmlTemplate + "  <name>:style_name:</name>\n";
        xmlTemplate = xmlTemplate + "  <sldVersion>\n";
        xmlTemplate = xmlTemplate + "    <version>1.0.0</version>\n";
        xmlTemplate = xmlTemplate + "  </sldVersion>\n";
        xmlTemplate = xmlTemplate + "  <filename>:style_filename:</filename>\n";
        xmlTemplate = xmlTemplate + "</style>";        

        return xmlTemplate;
    }
I hope this helps someone.
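
By the way, if you have many snippets to prepare, the replacement can be automated. Here is a minimal Python sketch using the standard html module (the file name is just an example); it replaces only &, < and >, exactly as described above.

import html

# read the snippet you want to publish (file name is just an example)
with open("MyClass.java", encoding="utf-8") as f:
    code = f.read()

# quote=False: replace only &, < and >, leave quotes untouched
print('<pre class="prettyprint linenums lang-java">')
print(html.escape(code, quote=False))
print('</pre>')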

To get exactly the same result as on this page, add this custom CSS to your template in Blogger and use the default prettify theme.
li.L0, li.L1, li.L2, li.L3, li.L4, li.L5, li.L6, li.L7, li.L8, li.L9
{
    color: #555;
    list-style-type: decimal;
}
...enjoy

Tuesday, April 10, 2012

Overcome performance issues using FME command line and Java

keywords: command line, FME, large datasets, database, SQL, Java

In this post I’ve described a procedure that uses the FME command line functionality to overcome some performance issues with large datasets.

… just a few words about FME command line ….

An FME workbench can be run from the FME Workbench application itself or simply from the command line. You can see the exact command in the translation log window.


fme.exe my_workbench.fmw
         --SourceDataset_CSV C:\my_data.txt
         --DestDataset_XLS_ADO C:\my_export.xls
The first parameter is the command itself (it starts the FME engine), the second is the workbench name, and after that there can be an arbitrary number of parameters, as defined in the workbench file (user published parameters); some of them, like the source or destination dataset, are published by default (check the FME documentation for more details).
In this sample command the name of the source dataset parameter for the CSV file is SourceDataset_CSV and the name of the destination dataset parameter for the XLS file is DestDataset_XLS_ADO.

Create your workbench file and run it. Examine your own translation log window and see how the parameters are expressed in the command line. That is going to be your starting point for optimisation.

I don’t want to say that the proposed procedure solves all performance issues, but you will surely overcome some of them, especially if your task is similar to my case.

… the case …

I had to calculate the mean, median and mode height for all Croatian LPIS (land parcel information system) parcels. The source data were:
  • height data: DGN files for the whole country (DTM: points and breaklines; accuracy: +/- 2 m). To get a better feeling for the data size, imagine a 25 m x 25 m height grid covering an area of 56,542 square km.
  • LPIS polygons (number of polygons > 1 000 000)

In the workbench file I had to overlay all LPIS polygons with the DTM and then calculate the mean, median and mode for every parcel, as well as the accuracy. The procedure to solve this task is simple, but doing the calculation for the whole country takes a lot of time. The conceptual workflow is presented in the next figure.
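
Just to be explicit about the statistics part: the expensive step is the overlay, while the statistics themselves are trivial once you have the list of heights that fall inside one parcel. A tiny Python illustration with made-up values:

import statistics

# heights of the DTM points/vertices inside one LPIS parcel (made-up values)
heights = [112.4, 113.0, 112.4, 114.8, 113.5, 112.4, 115.1]

print("mean  :", statistics.mean(heights))
print("median:", statistics.median(heights))
print("mode  :", statistics.mode(heights))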

From my point of view, if you are working with large datasets and a large number of files, it is always a good idea to upload your data to a database. In fact, this post assumes that you have access to some database system.

I always work that way; from experience I can say that a database will save you time and that the conversion from a file-based to a database system is worth the effort. In my work I mostly use the Oracle database (sometimes PostGIS). For the sake of convenience, if you have to refer to the original files later in the process, you may consider adding the format attributes (fme_basename, fme_dataset, fme_feature_type) during the conversion.

Conversion from a file-based system to a database system can be described by the following schema.



You may consider some other approach depending on the source data format and data model. The proposed workflow is sufficient for CAD files (sometimes you have to expose color, style and weight and store them as attributes, just like the other format attributes in step 2).

Let me go back to the first schema. Reading the DTM from files or even from the database can be time consuming. In the case of the Croatian DTM I had to wait many hours (I don’t remember exactly, but I’m speaking of a period longer than 10 hours). Reading the LPIS polygons wasn’t an issue (step 2). Try to imagine how long it would take to overlay all LPIS polygons with the DTM, and how much RAM you would need to perform such a task. It is not hard to see that such a task cannot be easily solved. In some cases it can be impossible to find a solution without redesigning the process.

… redesign …

The goal is to do the calculation (mean, median and mode of the height) at the parcel level (one polygon). The parcel defines the area of interest and the DTM data provide the height information. The workflow that successfully solves the problem can be described as follows.


The posted Java application creates database views that play the role of the source dataset in the workbench file, which is later executed by the same application using the command line approach. To be honest, this application could be written in many programming languages, but for this particular task I decided to use Java. You just have to create a workbench file that connects to the database views. Here is the Java code... (to run this application you need the Oracle JDBC driver).


package my.util.fme;

import java.sql.*;
import java.io.*;

public class CreateBatch {
 
  public static void main(String[] args) 
      throws ClassNotFoundException, SQLException, IOException
  {
    
 String fmeDat, datasource, username, password, fileName;
 String q1, q2, q3;
 
 Runtime rt;
 Process pr;
 int exitVal;
 String command;
 
 Process prcs;
 InputStreamReader isr ;
 BufferedReader br ;
 String line; 
 
 try{
  if(args.length==5){
   
   fmeDat = args[0];
   datasource = args[1];
   username = args[2];
   password = args[3];
   fileName = args[4];
   
   FileWriter fstream; 
   BufferedWriter out;
   
   Class.forName("oracle.jdbc.driver.OracleDriver");
   
      String url = "jdbc:oracle:thin:@oracle1.geofoto.lan:1521:" + datasource;
             
      Connection conn = 
           DriverManager.getConnection(url,username, password);
   
      conn.setAutoCommit(false);
   
      Statement stmt = conn.createStatement();
   Statement stmt1 = conn.createStatement();
   
      //step 1 in proposed workflow
   ResultSet rset =  stmt.executeQuery("select lp.id from lpis_parcels lp order by id");
   
   
      while (rset.next()) {
   
    fstream = new FileWriter(fileName);
    out = new BufferedWriter(fstream);
        
    //create view with one LPIS parcel
    //step 1 in proposed workflow
    q1="create or replace view v_lpis_parcels as " +
    "select  lp.* from lpis_parcels lp where lp.id=" + rset.getString(1);
    
    //create view with DTM points that have anyinteract topological relation with current parcel -> rset.getString(1)
    //step 2 in proposed workflow
    q2="create or replace view v_dtm_points as " +
    "select dtm.* from dtm_points dtm where sdo_anyinteract(dtm.geom, (select lp.geom from v_lpis_parcels)) = 'TRUE'";
    
    //create view with DTM lines that have anyinteract topological relation with current parcel -> rset.getString(1)
    //step 2 in proposed workflow
    q3="create or replace view v_dtm_lines as " +
    "select dtm.* from dtm_lines dtm where sdo_anyinteract(dtm.geom, (select lp.geom from v_lpis_parcels)) = 'TRUE'";
    
    //these are DDL statements, so execute() is used instead of executeQuery()
    stmt1.execute(q1);
    stmt1.execute(q2);
    stmt1.execute(q3);
    
    //step 3 & 4: in proposed workflow these steps are performed inside fme workbench file
    command = "fme.exe " + 
         fmeDat + 
         " --SourceDataset_ORACLE8I " + 
         datasource +
         " --DestDataset_ORACLE8I " +
         datasource;  
    
    //create batch file: contains only one command; it is recreated in every iteration
    //probably it can be solved without file creation, but this works and I didn't find another way to execute fme.exe (command)
    //step 3 & 4: in proposed workflow these steps are performed inside fme workbench file
    out.write(command);
    out.write('\n');
    out.close();
    
    //execute the batch file created in the previous step
    //step 3 & 4: in proposed workflow these steps are performed inside fme workbench file
    rt = Runtime.getRuntime();     
    prcs = rt.exec(fileName);   
    isr =  new InputStreamReader( prcs.getInputStream() );         
    br = new BufferedReader(  isr ); 
       
    while  ((line = br.readLine()) != null)
      System.out.println(line); 
      }
   
      stmt.close();
   stmt1.close();

  }else{
   System.out.println ("");
   System.out.println ("java CreateBatch <fme workbench file> <oracle service name> <username> <password> <batch file name>");
   System.out.println ("");
   System.out.println ("   FME workbench file: *.fmw");  
   System.out.println ("   Oracle Service Name: Ask Database Administrator!");  
   System.out.println ("   Username: Ask Database Administrator!");  
   System.out.println ("   Password: Ask Database Administrator!");  
   System.out.println ("   File Name: Name of batch file created during process");  
  }
 }
 catch(SQLException e){
 
  System.out.println("------------------------------");
  System.out.println("SQLException");
  System.out.println("------------------------------");
  System.out.println(e);
 }
 catch(ClassNotFoundException e){
  System.out.println("------------------------------");
  System.out.println("ClassNotFoundException");
  System.out.println("------------------------------");
  System.out.println(e);  
 }
 catch(IOException e){
  System.out.println("------------------------------");
  System.out.println("IOException");
  System.out.println("------------------------------");
  System.out.println(e);  
 }
  }
}


In a nutshell:
  • prepare an FME workbench that connects to the database view(s) (keep in mind that the view(s) are created on the fly by the application)
  • using a starter application (Java, Python, C++, …) create the database view(s) with the minimum data needed for the FME translation process; a short Python sketch of such a starter is given after this list
  • invoke FME from the starter application (in the same run as the previous step)
  • loop the process until the task is solved for the whole dataset
  • store the result of every iteration for later reporting and analysis
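
For completeness, here is a minimal sketch of the same starter idea written in Python instead of Java. The table and view names and the sdo_anyinteract filters are taken from the post above; the connection details, the workbench path and cx_Oracle itself are assumptions about your environment, so treat this only as an illustration of the loop, not as a drop-in replacement.

import subprocess
import cx_Oracle  # assumes the Oracle client libraries and cx_Oracle are installed

# connection details are placeholders (the Java code above uses a SID-style JDBC URL)
conn = cx_Oracle.connect("username", "password", "oracle1.geofoto.lan:1521/orcl")
cur = conn.cursor()

cur.execute("select lp.id from lpis_parcels lp order by id")
parcel_ids = [row[0] for row in cur.fetchall()]

for parcel_id in parcel_ids:
    # step 1: view with one LPIS parcel (DDL cannot use bind variables)
    cur.execute("create or replace view v_lpis_parcels as "
                "select lp.* from lpis_parcels lp where lp.id = " + str(parcel_id))
    # step 2: views with only the DTM data that interacts with the current parcel
    for view, table in (("v_dtm_points", "dtm_points"), ("v_dtm_lines", "dtm_lines")):
        cur.execute("create or replace view " + view + " as "
                    "select dtm.* from " + table + " dtm "
                    "where sdo_anyinteract(dtm.geom, "
                    "(select lp.geom from v_lpis_parcels)) = 'TRUE'")
    # steps 3 & 4: run the workbench that reads the views
    # (the workbench path is a placeholder, fme.exe is assumed to be on the PATH)
    subprocess.check_call(["fme.exe", r"C:\fme\my_workbench.fmw",
                           "--SourceDataset_ORACLE8I", "orcl",
                           "--DestDataset_ORACLE8I", "orcl"])

conn.close()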

… to conclude …

Optimisation of the described case is achieved by breaking the process into smaller units (working with one parcel at a time) and by the fact that the FME workbench doesn’t need to read the whole dataset at once, because it reads data from views that are created on the fly by the posted Java application. The Java application glues together two separate processes: data preparation (SQL) and execution of the FME workbench (FME command line), as presented in the workflow diagram.

enjoy!

Thursday, April 5, 2012

Catalog (Inspect all attributes)

keywords: PL/SQL, Oracle, data catalog

this post is just a generalisation of an already posted procedure to create a catalog from implemented database tables

…. The previously posted procedure inspects only one attribute (VRSTA), but I found out later that there is valuable information stored in other attributes... The attribute names, descriptions and domains were unknown. This time I had to make a list of attributes and their values in the spatial tables, so that I can easily decide, based on the attribute values, which attributes might be interesting to customers and include them in reporting or in the info-click functionality of a web GIS application... yes, I know … one should know in advance what attributes to include in an application or report; unfortunately this is not my case :-) … based on the report produced by the posted PL/SQL procedure I just have to guess …

After you review the report (created by the posted procedure) you might find a number of columns with garbage data. Just remove them using this SQL statement:
alter table table_name drop column column_name;
to drop multiple columns you can use:
alter table table_name drop (column_name1,column_name2);

 … have fun!

declare
 cursor c_tablice is 
    select table_name t_name, column_name c_name from  all_tab_cols
       where owner = 'PPSV'  
       and data_type_owner is null -- to skip object types such as sdo_geometry
       and table_name not like '%$%'  -- to skip system values
       and column_name not like '%$%' -- to skip system values
       and column_name not like 'ID%'; -- to skip fields named ID%
       /*
         in my case all fields that follow the pattern ID% are automatically generated (sequence) and they are not interesting for the catalog;
         you can exclude some additional fields using the same principle
       */

       type vrsta_type is ref cursor;
       c_vrsta vrsta_type;
       vrsta varchar2(1000);
       v_sql varchar2(1000);
begin
for  r_tablice in c_tablice
loop
  dbms_output.put_line('> TABLE: '||r_tablice.t_name);
  v_sql := 'select distinct(' || r_tablice.c_name || ' ) d_vrsta from '|| r_tablice.t_name;
      open c_vrsta for v_sql ;
          loop
       fetch c_vrsta into vrsta ;
             exit when c_vrsta%NOTFOUND;
             dbms_output.put_line(r_tablice.c_name || ': '|| vrsta);
          end loop;
      dbms_output.put_line('');
      close  c_vrsta;
end loop;     
end;
/

Wednesday, April 4, 2012

Feature Manipulation Engine & Linear Referencing

keywords: FME, LRS (linear referencing system)

… a little something for FME addicts … my people :-)

Recently I had to solve one simple task regarding linear referencing. It is a common task in all systems where the coordinate system is related to linear elements: each feature location is expressed as a distance from the beginning of some line segment. As examples of such systems I can point out road systems, pipeline systems and similar.

In the case that I’m writing about I had GPS data in CSV format and the road axis in an Oracle database. The GPS data represent the locations of the images (taken by a specially equipped vehicle for road measurements) along the road. To import the images into an application that animates the ride down the road, I had to express the GPS coordinates of every image as a distance from the beginning of the road segment, i.e. I had to establish an LRS. For this particular task I chose FME.

For better understanding I’ll break down the task into smaller activities (steps):
1. create point geometry from GPS data in csv format (if needed change the coordinate system)
2. prepare line geometry of selected road segment
3. move (map) measured GPS points to the nearest point on the road segment
4. calculate the measure for all points on the road segment (the old points and the new points added in step 3); see the short Python sketch after this list
5. join original attributes of GPS points (image ID needed for animation) with the calculated measure (step 4)
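
Steps 3 and 4 are the heart of the linear referencing: snap every GPS point to the closest location on the road axis and remember how far along the axis that location lies. Outside of FME the same idea can be sketched in a few lines of plain Python (the coordinates below are made up and the scale factor mentioned later is ignored):

from math import hypot

def snap_and_measure(p, line):
    """Return (measure, snapped_point) for point p against the polyline line:
    measure is the distance from the start of the polyline to the closest
    point on it (steps 3 and 4 above)."""
    best_d2, best_measure, best_point = float("inf"), 0.0, None
    measure_so_far = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = hypot(dx, dy)
        if seg_len == 0:
            continue
        # parameter of the orthogonal projection, clamped to the segment
        t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        qx, qy = x1 + t * dx, y1 + t * dy
        d2 = (p[0] - qx) ** 2 + (p[1] - qy) ** 2
        if d2 < best_d2:
            best_d2 = d2
            best_measure = measure_so_far + t * seg_len
            best_point = (qx, qy)
        measure_so_far += seg_len
    return best_measure, best_point

road = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]  # road axis vertices (made up)
gps = (60.0, 7.5)                                 # one GPS point (made up)
print(snap_and_measure(gps, road))                # (60.0, (60.0, 0.0))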

Here is the produced workbench file (FME 2012).




Unfortunately I can't post the original data, but if you have additional questions feel free to ask. Maybe FME has a better solution for this task, but I didn't have additional time to explore. The whole schema, presented in separate steps, is here.

  • 2DPointAdder creates geometry from attribute data
  • Reprojector changes the coordinate system of the GPS data to match the coordinate system of the road data
  • Counter creates an ID on every point (I wonder why I put it here...  just forget it ...)
  • CoordinateRounder removes unnecessary decimal places from the coordinates
  • Tester removes points that were collected before the measuring car entered the discussed road
  • rounding the coordinates
  • length calculation (_uk value) using LengthCalculator :-)
  • calculation of the scale factor (ExpressionEvaluator)
  • the ratio between the distance calculated from the geometry data (_uk) and the distance measured by the odometer (STAC_ZAV) and stored in the database … this calculation is not so important for this solution but it may be required
  • store the ratio (_koef) as a variable using VariableSetter
  • remove the height coordinate (2DForcer)
  • NeighborFinder finds the closest point on the road line for every GPS point. The coordinates are stored as attribute values
  • 2DPointReplacer takes the previously calculated coordinates from the attribute values and creates geometry objects
  • the custom transformer unique_geom creates a unique identifier for every point based on the point coordinates
  • overlay the points on the line (PointOnLineOverlayer) and join the generated line segments (LineJoiner)
  • calculate the measure for all vertices (points) of the line generated in the previous step
  • break the line feature into points (line vertices) using Chopper
  • create a unique identifier for every point based on the point coordinates (unique_geom)
  • extract the generated measure (LR) and store it as an attribute value (MeasureExtractor)
  • retrieve the variable created in step 1 and store it as an attribute value (VariableRetriever)
  • scale the calculated measure values using the variable value retrieved in the previous step
  • using FeatureMerger merge the calculated measure values (step 4) with the original attributes of the GPS points. The unique identifier based on geometry is used for this task (custom transformer: unique_geom).
  • remove unnecessary attributes using AttributeRemover
  • store the result in the desired format (in this case it was XLS)


Naming Convention (Spatial Column)

keywords: spatial database, spatial column, metadata, oracle, naming convention, PL/SQL …

sometimes, someone, somehow... doesn't pay too much attention to naming the database tables and columns. That kind of messy database produces a lot of headaches for the SW developers and GIS experts (speaking of spatial databases). This kind of mess happens especially when the tables are created using several different tools and by several different people. If you are at the end of the chain and you have to work with spatial tables that don't follow any kind of naming convention, you may lose a lot of time …. I don’t want to mention the nerves :-)

To help myself and others that share the same destiny a little bit, I’ve written a simple PL/SQL procedure that renames the spatial column of all spatial tables in one Oracle schema. The changes are also made in the spatial metadata (user_sdo_geom_metadata); the spatial index is rebuilt automatically.

In this particular case the schema name is PPSV and the desired spatial column name is GEOM... Change these to fit your needs …

If you use GeoServer and have already published layers based on tables that need column renaming, you just have to restart GeoServer after you apply the proposed procedure.

...enjoy!


declare
    cursor c_gtables is 
        select table_name t_name, column_name c_name 
            from  all_tab_cols 
            where owner = 'PPSV' 
            and  table_name not like '%$%' 
            and data_type = 'SDO_GEOMETRY' 
            order by table_name;

    v_g_rename varchar2(2000);  -- sql to rename geometry field
    v_md_update varchar2(2000); -- sql to update geometry metadata
    v_new_geom_field_name varchar2(100) := 'GEOM';
begin
    for  r_tablice in c_gtables
    loop
        --dbms_output.put_line('> TABLE: '||r_tablice.t_name || '   GEOMETRY FIELD: ' || r_tablice.c_name);
        v_g_rename := 'alter table ' || r_tablice.t_name || ' rename column ' || r_tablice.c_name || ' to ' || v_new_geom_field_name;
        v_md_update := 'update user_sdo_geom_metadata set column_name = ' || q'[']' || v_new_geom_field_name || q'[']'||  ' where table_name = ' || q'[']' || r_tablice.t_name || q'[']' ;
     
     
        if r_tablice.c_name != v_new_geom_field_name
        then
            execute immediate v_g_rename;
            dbms_output.put_line(v_g_rename);
            execute immediate v_md_update;
            dbms_output.put_line(v_md_update );
        end if;

     
    end loop;
exception
    when others then
        dbms_output.put_line('err: ' || v_g_rename);
        dbms_output.put_line('err: ' || v_md_update);
end;
/