Friday, December 14, 2012

Calling GDAL/OGR commands from Python

key words: python, GDAL/OGR, subprocess.check_call, os.walk, external commands, environment variables

If you plan to call GDAL/OGR commands from Python code you might need to set up some environment variables first. If they are already defined at the system level there is nothing to do, but when they are not, your script will have to take care of them.

Calling external commands from Python can be accomplished with the subprocess module. It offers several functions that can be used for this task. In this post I won't go through all the differences between them; the main thing I want to show is how to set up the environment variables required by an external program called through subprocess. In this particular case the external program is any of the GDAL/OGR commands.

If you are writing a program that calls GDAL/OGR, chances are you need it to process multiple files, and that is why this sample program uses the os.walk function. os.walk lets you visit multiple files across multiple folders. The sample filters all TIFF files that can be found under the grid_data folder.

Using the subprocess.check_call function I have demonstrated the invocation of the gdalinfo command for all TIFF files and, just as an example, one OGR command (ogrinfo). The same principles can be applied to file processing with any of the available GDAL/OGR commands.

Here is the source code.

.... enjoy ...

'''
Created on 10. okt. 2012
@author: Stipica Pavicic
'''
import os, fnmatch, subprocess

# GDAL/MapServer release folder and the subfolders that have to be on PATH;
# this only needs to be built once, so it is done before walking over the files
gdal_root = r'C:\release-1310-gdal-1-9-mapserver-6-0'
pathv = ';'.join([gdal_root + sub for sub in [
    r'\bin',
    r'\bin\gdal\python\osgeo',
    r'\bin\proj\apps',
    r'\bin\gdal\apps',
    r'\bin\ms\apps',
    r'\bin\gdal\csharp',
    r'\bin\ms\csharp',
    r'\bin\curl']])

# if you don't work with Oracle or you don't need the Oracle plugin, rename/remove the
# \bin\gdal\plugins\gdal_GEOR.dll and ogr_OCI.dll files to avoid unnecessary error messages
env_cust = {"GDAL_DATA":        gdal_root + r'\bin\gdal-data',
            "GDAL_DRIVER_PATH": gdal_root + r'\bin\gdal\plugins',
            "PYTHONPATH":       gdal_root + r'\bin\gdal\python\osgeo;' +
                                gdal_root + r'\bin\ms\python',
            "PROJ_LIB":         gdal_root + r'\bin\proj\SHARE',
            "PATH":             pathv}

for (root, dirs, files) in os.walk(r'L:\sp\python\grid_data'):
    #print (root)
    #print (os.listdir(root))                          # list of files and folders
    #print (fnmatch.filter(os.listdir(root), "*.tif")) # filter the list of files and folders

    for filename in fnmatch.filter(files, "*.tif"):

        fbasename, extension = os.path.splitext(filename)

        os.chdir(root)
        #print (root, filename, fbasename, extension)
        #subprocess.check_call(['echo', filename], env=env_cust, shell=True)
        subprocess.check_call(['gdalinfo', filename], env=env_cust, shell=True)
        subprocess.check_call(['ogrinfo', '--help-general'], env=env_cust, shell=True)
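A small variation, just as a sketch of my own (not part of the original script): instead of building the child environment from scratch, you can copy os.environ and only add or extend the GDAL-related variables, so that system variables such as SystemRoot and TEMP remain available to the called command. The gdal_root value and folder layout are the same assumptions as above.

import os, subprocess

gdal_root = r'C:\release-1310-gdal-1-9-mapserver-6-0'

env_cust = os.environ.copy()                                   # start from the current environment
env_cust["GDAL_DATA"]        = gdal_root + r'\bin\gdal-data'
env_cust["GDAL_DRIVER_PATH"] = gdal_root + r'\bin\gdal\plugins'
env_cust["PROJ_LIB"]         = gdal_root + r'\bin\proj\SHARE'
env_cust["PATH"]             = gdal_root + r'\bin;' + gdal_root + r'\bin\gdal\apps;' + env_cust.get("PATH", '')

subprocess.check_call(['gdalinfo', '--version'], env=env_cust, shell=True)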

Monday, December 3, 2012

Automation of Raster/grid data processing using ArcGIS and python

key words: automation, processing, python, ArcGIS, arcpy, xml, minidom

The main idea of this post is to define a way to process datasets that are stored as files and organised in multiple folders. To be more specific, I will describe one particular case.

Task: Create GeoTIFF files from delivered ASCII files (x, y, z). The ASCII files are organised in folders: the root folder is “grid_data” and there are several subfolders grouping the data by resolution (5 m grid, 2 m grid). There is no coordinate system definition attached to the files; the coordinate system definition and the grid resolution are delivered in a separate PDF document.

Before processing you always have to do some preparation, especially if you want to make the processing autonomous and not dependent on an operator. In this case I’ve decided to create an XML file (info.xml) that holds some basic information about the delivered data as well as the desired parameters for processing. Here is a sample file.
<?xml version="1.0" encoding="UTF-8"?>
<geoprocessing> 
 <input_data>
  <!-- dataset resolution in m --> 
  <res>5</res>
  <!-- name of ESRI prj file -->
  <srs>C:\sp\python\grid_data\srs\ED_1950_UTM_Zone_31N.prj</srs>
 </input_data>
 <results>
  <!-- Root folder for processed files. Leave empty if you want to use same folder as input data -->
  <root_folder></root_folder>
  <!-- name of ESRI prj file. Leave empty if you want to use same srs as input data -->
  <srs></srs>
 </results>
</geoprocessing>
I have decided to use XML for this purpose because it is more flexible when you need to make changes (for example if you want to apply a similar approach to some other task). To parse this XML file I have created the following Python class:
'''
Created on 23. nov. 2012

@author: Stipica Pavicic
'''
from xml.dom.minidom import parseString


'''
configuration sample file m:\python\info.xml
'''       

class ConfigObj(object):
    '''
    Parsing of the xml file that holds configuration parameters for processing.
    Parameters are separated into two sections: <input_data> and <results>.
    '''
    _dom=''
    _result_obj=''
    _input_obj=''
    
    class InputObj(object):
        '''
        parsing of <input_data> segment of info.xml file
        '''
        _res = 'not_set'
        _srs = 'not_set'
    
        def __init__(self, dom_obj):
            
            node = dom_obj.getElementsByTagName('input_data')[0]
            
            dom_elem = node.getElementsByTagName('srs')[0].firstChild
            if dom_elem:
                self._srs = dom_elem.data

    
            dom_elem = node.getElementsByTagName('res')[0].firstChild
            if dom_elem:
                self._res = dom_elem.data
         
        
        def getSrs(self):
            return self._srs
        
        def getRes(self):
            return self._res 
    
    class ResultObj(object):
        '''
        parsing of <results> segment of info.xml file
        '''
        _root_folder='not_set'
        _srs='not_set'
    
        def __init__(self, dom_obj):
            
            node = dom_obj.getElementsByTagName('results')[0]
            
            dom_elem = node.getElementsByTagName('srs')[0].firstChild
            if dom_elem:
                self._srs = dom_elem.data
            else:
                # in the case if result srs is not defined in xml file use the input data srs 
                nodei = dom_obj.getElementsByTagName('input_data')[0]
                dom_elemi = nodei.getElementsByTagName('srs')[0].firstChild
                self._srs = dom_elemi.data                
    
            dom_elem = node.getElementsByTagName('root_folder')[0].firstChild
            if dom_elem:
                self._root_folder = dom_elem.data
   
        
        def getSrs(self):
            return self._srs
        
        def getRootFolder(self):
            return self._root_folder 
        

    def __init__(self, info_xml_path):

        try:
            myfile = open(info_xml_path,'r')
            #convert to string:
            mydata = myfile.read()
            #close the file because we don't need it anymore:
            myfile.close()
            #parse the xml you got from the file
            self._dom = parseString(mydata)
            self._result_obj = self.ResultObj(self._dom)
            self._input_obj = self.InputObj(self._dom)
        except IOError:
            print("IO Error")
        except:
            print("General Error!")
        
        
    def getDom(self):
        return self._dom
    
    def getGetResultParam(self):
        '''
        access  parameters
        '''
        return self._result_obj
    
    def getGetInputParam(self):
        '''
        access  parameters
        '''
        return self._input_obj
To use this class you need to create an XML file with the configuration parameters (e.g. c:\test\info.xml). Here is sample code showing how to access the parameters:
from my.xml.configurator import ConfigObj

conf = ConfigObj(r'c:\test\info.xml')
print conf.getGetInputParam().getSrs()
print conf.getGetInputParam().getRes()
print conf.getGetResultParam().getSrs()
print conf.getGetResultParam().getRootFolder()
As you can see, it is very easy to adjust the Python code to a differently structured XML file. Maybe you need additional parameters inside the <input_data> section, or maybe you want to add another section for some other group of parameters (something like <global>). All this depends on the problem you want to solve with this approach. For this particular case I’ll stick with the parameters that already exist in the presented XML file. I’m calling it info.xml, but it can be any other name :-) So... for now you know how to create a configuration file, define some parameters and values, and access those parameters from Python code (the ConfigObj class). The next step is to create a script that processes all files in all subfolders starting from some arbitrary root folder. The script searches for configuration files (info.xml) and only the folders that contain one are processed. Here is the code for that. The processing is straightforward, but I will make some comments just to explain the idea; some comments are also included in the source code.
'''
Created on 23. nov. 2012
@author: Stipica Pavicic
'''

from my.xml.configurator import ConfigObj
from my.process_timer import Timer

import os, fnmatch
import arcpy
import random
import shutil
from arcpy.conversion import PointToRaster

arcpy.CheckOutExtension("3D")

def createTmpGDB(folder):
    '''creates temporary file geodatabase'''
    #temp geodatabase name
    tmp_gdb_name = 'tempXXX.gdb'
    
    #redefine name if already exists
    while os.path.exists(folder + os.sep + tmp_gdb_name):
        tmp_gdb_name = tmp_gdb_name.replace('XXX',str(random.randint(1000000, 9000000)))
    
    arcpy.CreateFileGDB_management(folder, tmp_gdb_name )
        
    return tmp_gdb_name

def deleteTmpGDB(tmp_gdb):
    '''removes temporary file geodatabase'''    
    shutil.rmtree(tmp_gdb)
    return 0 
    

def proces_grid_file(in_file_folder, out_file_folder, resolution, in_srs, out_srs, filename, temporery_gdb):
    try:
        fbasename, extension = os.path.splitext(filename)
        in_csv = in_file_folder + os.sep + filename
        #point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + fbasename 
        #basename can contain illegal characters for arcpy, that is why I use a random name (for this particular case the name of the intermediate result is not important)
        point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + str(random.randint(1000000, 9000000))
        while os.path.exists(point_data):
            #redefine point_data file name if already exists
            point_data = out_file_folder + os.sep + temporery_gdb + os.sep + "tp" + str(random.randint(1000000, 9000000))
            
        out_tif = out_file_folder + os.sep + fbasename + ".tif"

        arcpy.ASCII3DToFeatureClass_3d(in_csv, "XYZ", point_data, "POINT", "", out_srs, "", "", "DECIMAL_POINT")
        arcpy.PointToRaster_conversion(point_data , "Shape.Z", out_tif, "MOST_FREQUENT", "NONE", str(resolution))
        #process finished without exception
        return 1
    except Exception as e:
        print e
        #process finished with exception
        return 0
    

for (root, dirs, files) in os.walk(r'C:\sp\python\grid_data'):

    measure = Timer()   
    config_file_name='info.xml'
    filter='*.csv'
    
    # get parameters from xml file (conf) and define configuration for processing the files in folder (root)
    if config_file_name in files:
        print '*****************************************************************************'
        print '>>>> Root    : ', root
        print 'Nr. of files : ' , len(fnmatch.filter(files,filter))  # number of files in the folder that match the filter expression 
        print 'Config file  : ' , root + os.sep + config_file_name
        conf = ConfigObj(root + os.sep + config_file_name)
        
        #open prj files
        inputSrsFile   = open(conf.getGetInputParam().getSrs(),'r')
        resultSrsFile  = open(conf.getGetResultParam().getSrs(),'r')
        
        #read files
        inputSrs  = inputSrsFile.read()
        resultSrs = resultSrsFile.read()  
        
        #close files 
        inputSrsFile.close()
        resultSrsFile.close()
        
        print 'spatial reference system definition'
        print 'input data srs :' , inputSrs
        print 'results    srs :'  , resultSrs   
        
        
        # create temporary file geodatabase
        tmp_gdb = createTmpGDB(root)
        
        
        # loop through the list of files that match the filter criteria (filter) & process the files
        cnt = 0
        for filename in fnmatch.filter(files,filter):
            print '-----------------------------------------------------------------------------'
            cnt = cnt + 1    
            print cnt, ' -> processing file: ', filename
            fbasename, extension = os.path.splitext(filename)        
            #print root, '||', filename, '||', fbasename, '||', extension
            r=proces_grid_file(root, root, conf.getGetInputParam().getRes(), inputSrs, resultSrs, filename, tmp_gdb)
            if r:
                # r=1: processing finished without exception
                print filename, 'processing finished!'
            else:
                # r=0: processing finished with an exception
                print filename, ': file is not processed!'
            
            print filename, measure.getTimeDifference()
        
        #delete temporary file geodatabase
        deleteTmpGDB(root + os.sep + tmp_gdb)
        
        # total time for processing
        print measure.getElapsedlTime()
The main loop starts with the os.walk call, but before that I have included some helper functions:
  • createTmpGDB, to create a temporary file geodatabase
  • deleteTmpGDB, to remove the temporary file geodatabase
  • proces_grid_file, the file processing function


... how to use this code?

Firstly, look at how the files are organised. If they are stored in folders grouped by some shared property, there is a good chance this code can be useful to you, whether or not you need to perform the same processing as in this post. In this particular case the files were stored in folders according to resolution and spatial reference system, i.e. files with the same resolution and the same spatial reference system should be in the same folder. You can split files with the same properties across multiple folders, but you cannot put files with different properties in the same folder.

Secondly, for each folder you want to process, create one XML file with the properties. Use the same name for all XML files (see the config_file_name variable in the code; info.xml). Folders without a configuration file won’t be processed.

Thirdly, change the filter to suit your case (the filter variable, set to '*.csv' here). The files I had to process were delivered as CSV files.

And finally, run the script. To be able to run it you will have to establish a working environment (all imports should be accessible to the script). To measure processing time I have also used the Timer class described in one of my previous posts. If you don’t want to use it, just comment out all lines related to the Timer class (the import and the measure calls), or drop in a no-op stub as sketched below.
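If you prefer not to comment anything out, a minimal do-nothing stand-in (my own sketch, not part of the original post) keeps the measure calls working without the real my.process_timer module. The method names, including the original spelling of getElapsedlTime, match the Timer class used by the script.

class Timer(object):
    '''do-nothing replacement for my.process_timer.Timer'''
    def getTimeDifference(self):
        return 'timing disabled'
    def getElapsedlTime(self):
        return 'timing disabled'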

Feel free to use this code and modify it according to your needs. I hope to write a new post showing how to modify the existing code and reuse it following this concept: walking through the folders and processing files based on an XML configuration file.

.....

... enjoy!

Thursday, October 11, 2012

How to measure processing time in Python

keywords: python, process time, python class

...if you need to measure time during your Python processing, here is one helper class
'''
Created on 11. okt. 2012
@author: Stipica Pavicic
'''

import time
class Timer(object):
    '''
    timer starts with class initialisation
    '''

    def __init__(self):
        self.t1= time.time()
        self.t2= time.time()
        
    def getElapsedlTime(self):
        # gets the total time elapsed since class initialisation
        self.delta=time.time()-self.t1
        if self.delta > 60:
            self.delta = "Time elapsed:    %6.3f  min" % (self.delta/60) 
        else:
            self.delta = "Time elapsed:    %6.3f  sec" % (self.delta)
        return self.delta
        
    def getTimeDifference(self):
        # gets the time elapsed since the previous reading (for the first reading this is equal to the total elapsed time, getElapsedlTime())
        self.delta=time.time()-self.t2
        self.t2 = time.time()
        if self.delta > 60:
            self.delta = "Time difference: %6.3f  min" % (self.delta/60) 
        else:
            self.delta = "Time difference: %6.3f  sec" % (self.delta)
        return self.delta        
Using this class you can get the total time from class initialisation to some desired moment with getElapsedlTime(), or the time difference from the previous reading with getTimeDifference().

Just make sure that the Timer class is available to your script and you can use it like in the next example.
'''
Created on 11. okt. 2012
@author: Stipica Pavicic
'''

import time
from my.process_timer import Timer

print " ... initialise your timer"
measure=Timer()

print " ... begin some processing"
time.sleep(1.1)

# ... get elapsed time
print measure.getTimeDifference()
print measure.getElapsedlTime() 

print " ... some processing"
time.sleep(1.2)

# ... get elapsed time
print measure.getTimeDifference()
print measure.getElapsedlTime()

print " ... some processing"
time.sleep(1.3)

# ... get elapsed time
print measure.getTimeDifference()
print measure.getElapsedlTime()
Here is the output:
 ... initialise your timer
 ... begin some processing
Time difference:  1.100  sec
Time elapsed:     1.100  sec
 ... some processing
Time difference:  1.200  sec
Time elapsed:     2.300  sec
 ... some processing
Time difference:  1.300  sec
Time elapsed:     3.600  sec
... enjoy

Friday, September 21, 2012

PyDev Installation

keywords: Eclipse, python, pydev, installation

I have just installed Eclipse (Version: Juno Release, Build id: 20120614-1722) and wanted to install the Python IDE PyDev. When I started the installation procedure I ran into a problem.... here are the error log and the screen shot ....

HTTP Proxy Authentication Required: http://pydev.org/updates/content.xml
HTTP Proxy Authentication Required: http://pydev.org/updates/content.xml
Proxy Authentication Required

Luckily the solution was simple: it was only necessary to add some configuration options to the eclipse.ini file. The instructions I found on the internet were not very precise and could cause some confusion ... so I’ve decided to write one short post on this topic.

.... locate your eclipse.ini file and add these lines at the end:
-Dorg.eclipse.ecf.provider.filetransfer.excludeContributors=org.eclipse.ecf.provider.filetransfer.httpclient
-Dhttp.proxyPort=8080
-Dhttp.proxyHost=myproxy
-Dhttp.proxyUser=mydomain\myusername
-Dhttp.proxyPassword=mypassword
-Dhttp.nonProxyHosts=localhost|127.0.0.1
It is important to put these lines in the last block, after -vmargs. Here is the whole ini file (the added options are the last six lines).
-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20120522-1813.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_1.1.200.v20120522-1813
-product
org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
--launcher.XXMaxPermSize
256M
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
-vmargs
-Dosgi.requiredJavaVersion=1.5
-Dhelp.lucene.tokenizer=standard
-Xms40m
-Xmx512m
-Dorg.eclipse.ecf.provider.filetransfer.excludeContributors=org.eclipse.ecf.provider.filetransfer.httpclient
-Dhttp.proxyPort=8080
-Dhttp.proxyHost=myproxy
-Dhttp.proxyUser=mydomain\myusername
-Dhttp.proxyPassword=mypassword
-Dhttp.nonProxyHosts=localhost|127.0.0.1

...and don’t forget to restart Eclipse :-)


....here are more details.

... enjoy ...

Wednesday, September 19, 2012

Create grid/raster from ASCII file using GDAL, FME and ArcGIS

keywords: ASCII file, raster, grid, GDAL, OGR, FME, ArcGIS, python

... to extend a little bit the GDAL functionality presented in my previous post, I’ve decided to show you how to create a raster file from a list of coordinates stored in an ASCII file. The intention is to create a raster whose pixel values are taken from the third coordinate. This can be anything from height, temperature, moisture, wind speed,...

I will show you how to perform that with three tools I’m currently using: GDAL, FME and ArcGIS. This time I will limit myself to these three tools, but maybe in some future post I’ll include PostGIS, Oracle...

This task is not demanding for GIS professionals, but many people with limited GIS knowledge need it, and that is why I have decided to write something about it, or rather to demonstrate how to perform the task using three well known GIS tools.

The source ASCII file contains three columns with coordinates: easting, northing and height.

This is a segment of the sample dataset (dtm_predicted_tides.csv):
445058.00, 6707614.00, 114.20
445060.00, 6707614.00, 114.22
445062.00, 6707614.00, 114.24
445064.00, 6707614.00, 114.25
445022.00, 6707616.00, 113.94
445024.00, 6707616.00, 113.96

... let’s start with GDAL.

In my previous post I used GDAL to generate a concave hull.

This procedure is very similar. First we need to create a VRT file (dem.vrt).
<OGRVRTDataSource>
   <OGRVRTLayer name="dtm_predicted_tides">
       <LayerSRS>EPSG:23031</LayerSRS>
       <SrcDataSource>dtm_predicted_tides.csv</SrcDataSource>
       <GeometryType>wkbPoint</GeometryType>
       <GeometryField encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
   </OGRVRTLayer>                                                                                                       
</OGRVRTDataSource> 
To get the extent, use the ogrinfo command. The -al and -so switches give you summary information for all layers (we have only one layer defined in the VRT).
M:\__gdal_test>ogrinfo -al -so dem.vrt
INFO: Open of `dem.vrt'
     using driver `VRT' successful.

Layer name: dtm_predicted_tides
Geometry: Point
Feature Count: 433752
Extent: (444448.000000, 6707606.000000) - (447314.000000, 6709738.000000)
Layer SRS WKT:
PROJCS["ED50 / UTM zone 31N",
    GEOGCS["ED50",
       DATUM["European_Datum_1950",
           SPHEROID["International 1924",6378388,297,
               AUTHORITY["EPSG","7022"]],
           TOWGS84[-87,-98,-121,0,0,0,0],
           AUTHORITY["EPSG","6230"]],
       PRIMEM["Greenwich",0,
           AUTHORITY["EPSG","8901"]],
       UNIT["degree",0.0174532925199433,
           AUTHORITY["EPSG","9122"]],
       AUTHORITY["EPSG","4230"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",3],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
       AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","23031"]]
field_1: String (0.0)
field_2: String (0.0)
field_3: String (0.0)
From the data extent and the resolution of the data stored in the ASCII file, calculate the dimensions and extent of the raster file you are about to generate. To get the points centred within the pixels you should expand the raster area by half of the resolution value on each side. In this case the resolution is the same along the easting and northing axes. In this simple calculation do not forget that the number of pixels (i.e. the raster size) can only be an integer value (a small worked sketch follows the formulas below).
rmaxx = maxx + resx/2
rminx = minx - resx/2
rmaxy = maxy + resy/2
rminy = miny - resy/2

dimx = (rmaxx - rminx) / resx
dimy = (rmaxy - rminy) / resy

txe: rminx rmaxx
tye: rminy rmaxy
outsize: dimx dimy
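Here is a small worked check of that arithmetic (my own sketch), plugging in the extent reported by ogrinfo above and the 2 m data resolution:

resx = resy = 2.0                        # data resolution in metres
minx, maxx = 444448.0, 447314.0          # extent reported by ogrinfo
miny, maxy = 6707606.0, 6709738.0

rminx, rmaxx = minx - resx/2, maxx + resx/2   # 444447.0, 447315.0
rminy, rmaxy = miny - resy/2, maxy + resy/2   # 6707605.0, 6709739.0

dimx = int(round((rmaxx - rminx) / resx))     # 1434
dimy = int(round((rmaxy - rminy) / resy))     # 1067

print '-txe', rminx, rmaxx, '-tye', rminy, rmaxy, '-outsize', dimx, dimy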
The gdal_grid command will produce the desired grid file (dem.tiff). The search ellipse radii (radius1 and radius2) should be smaller than the data resolution.
gdal_grid -a nearest:radius1=1:radius2=1:nodata=-999
          -txe 444447 447315   -tye 6707605 6709739  
          -outsize 1434 1067  
          -of GTiff -ot Float32 
          -l dtm_predicted_tides dem.vrt dem.tiff
To check the results you can use the gdalinfo command.
M:\__gdal_test>gdalinfo dem.tiff
Driver: GTiff/GeoTIFF
Files: dem.tiff
Size is 1434, 1067
Coordinate System is:
PROJCS["ED50 / UTM zone 31N",
    GEOGCS["ED50",
       DATUM["European_Datum_1950",
           SPHEROID["International 1924",6378388,297.0000000000014,
               AUTHORITY["EPSG","7022"]],
           TOWGS84[-87,-98,-121,0,0,0,0],
           AUTHORITY["EPSG","6230"]],
       PRIMEM["Greenwich",0],
       UNIT["degree",0.0174532925199433],
       AUTHORITY["EPSG","4230"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",3],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
       AUTHORITY["EPSG","9001"]],
    AUTHORITY["EPSG","23031"]]
Origin = (444447.000000000000000,6707605.000000000000000)
Pixel Size = (2.000000000000000,2.000000000000000)
Metadata:
 AREA_OR_POINT=Area
Image Structure Metadata:
 INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  444447.000, 6707605.000) (  1d59'19.59"E, 60d29'57.52"N)
Lower Left  (  444447.000, 6709739.000) (  1d59'17.44"E, 60d31' 6.48"N)
Upper Right (  447315.000, 6707605.000) (  2d 2'27.50"E, 60d29'58.91"N)
Lower Right (  447315.000, 6709739.000) (  2d 2'25.46"E, 60d31' 7.87"N)
Center      (  445881.000, 6708672.000) (  2d 0'52.50"E, 60d30'32.70"N)
Band 1 Block=1434x1 Type=Float32, ColorInterp=Gray


... here is FME example ...

This is even simpler. The first thing you have to define is a CSV data reader; in a way it plays the same role as the VRT file. For large datasets, do not forget to limit the number of scanned lines.


After that, create point geometries from the attributes using the 3DPointReplacer transformer.


Use the NumericRasterizer transformer to create the raster. You can define the raster size either by entering the number of cells or by entering the resolution (pixel size), as shown in the next figure.

At this point everything is ready to be written to some raster/grid format. In this example I’ve used GeoTIFF. On the writer you can add the coordinate system definition.

Here is the workflow for the whole process:


...here is ArcGIS example...

In the Catalog Tree, navigate to the 3D Analyst toolbox by expanding Toolboxes / System Toolboxes / 3D Analyst Tools. Double-click the ASCII 3D to Feature Class tool.

Enter all the necessary values. That will create point features from the ASCII file. After that just start the second tool, Point to Raster (see next figure), and the task is finished... for large datasets consider drinking a coffee or taking a power nap :)

As I consider myself an FME addict :) I just love the ModelBuilder available in ArcGIS. Let me demonstrate how to create the previous process using ModelBuilder. Just open the ModelBuilder window from the Geoprocessing dropdown menu, drag the tools from the toolbox into ModelBuilder and configure them as demonstrated in the previous steps. The process can be started from the Model menu. Here is the final look of the process and the created raster...

...and one more interesting piece of ESRI functionality... you can export this process to a Python script. Then you have the basic process coded in Python, and you can modify it and make it more powerful. It is a very good start if you want to automate some processes. Here is the generated Python code.
# ---------------------------------------------------------------------------
# process.py
# Created on: 2012-09-19 13:25:05.00000
#   (generated by ArcGIS/ModelBuilder)
# Description:
# ---------------------------------------------------------------------------

# Import arcpy module
import arcpy

# Check out any necessary licenses
arcpy.CheckOutExtension("3D")


# Local variables:
dtm_predicted_tides_csv__2_ = "M:\\__gdal_test\\dtm_predicted_tides.csv"
dtm_predicted_tides_MB = "\\\\...\\...\\gis_data.gdb\\dtm_predicted_tides_MB"
dtm_predicted_tides_MBr = "\\\\...\\...\\gis_data.gdb\\dtm_predicted_tides_MBr"

# Process: ASCII 3D to Feature Class
arcpy.ASCII3DToFeatureClass_3d("M:\\__gdal_test\\dtm_predicted_tides.csv", "XYZ", dtm_predicted_tides_MB, "POINT", "", "PROJCS['ED_1950_UTM_Zone_31N',GEOGCS['GCS_European_1950',DATUM['D_European_1950',SPHEROID['International_1924',6378388.0,297.0]],PRIMEM['Greenwich',0.0],UNIT['Degree',0.0174532925199433]],PROJECTION['Transverse_Mercator'],PARAMETER['False_Easting',500000.0],PARAMETER['False_Northing',0.0],PARAMETER['Central_Meridian',3.0],PARAMETER['Scale_Factor',0.9996],PARAMETER['Latitude_Of_Origin',0.0],UNIT['Meter',1.0]]", "", "", "DECIMAL_POINT")

# Process: Point to Raster
arcpy.PointToRaster_conversion(dtm_predicted_tides_MB, "Shape.Z", dtm_predicted_tides_MBr, "MAXIMUM", "NONE", "2")
Now just imagine how simple it is to automate this process for multiple files stored in some folder on the filesystem... with a little bit of Python, walk through the folders, collect the file names, wrap the generated code in a loop and... voilà... you have a new automated process :-) Here is a rough sketch of that idea.
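This is only my own sketch (not generated by ModelBuilder), assuming CSV input files, the 2 m resolution and the same spatial reference file as in the exported script; the folder, geodatabase and prj paths are placeholders, and the output names are derived from the input names:

import os, fnmatch
import arcpy

arcpy.CheckOutExtension("3D")

root_folder = r'M:\__gdal_test'                            # assumed folder with the csv files
out_gdb     = r'M:\__gdal_test\gis_data.gdb'               # assumed existing file geodatabase
srs_prj     = r'M:\__gdal_test\ED_1950_UTM_Zone_31N.prj'   # assumed ESRI prj file

for (root, dirs, files) in os.walk(root_folder):
    for filename in fnmatch.filter(files, '*.csv'):
        fbasename, extension = os.path.splitext(filename)
        in_csv  = os.path.join(root, filename)
        points  = os.path.join(out_gdb, 'p_' + fbasename)
        out_tif = os.path.join(root, fbasename + '.tif')

        # same two steps as in the generated code, applied to every csv file
        arcpy.ASCII3DToFeatureClass_3d(in_csv, "XYZ", points, "POINT", "", srs_prj,
                                       "", "", "DECIMAL_POINT")
        arcpy.PointToRaster_conversion(points, "Shape.Z", out_tif, "MAXIMUM", "NONE", "2")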

....... I just want to mention here that FME and GDAL are also well integrated with Python.

I hope this post helps some GIS beginners or people that use GIS only occasionally....

....enjoy!

Friday, September 14, 2012

How to generate Concave Hull from regularly spaced point data using GDAL

keywords: GDAL, OGR, vrt, gdal_grid, gdal_contour, concave hull, grid/raster processing

If you have to process data delivered in plain ASCII files you have probably run into the concave hull problem. Contrary to the convex hull problem, GIS tools usually don’t have a built-in function that can help you with this issue. There is a good demonstration of the PostGIS ST_ConcaveHull function.

This is just to illustrate that there are multiple possible solutions to the concave hull problem.

The scenario presented in this post is not generic; it works best for regularly spaced (or almost regularly spaced) points. With some adjustments it can be applied to irregularly scattered points, but for now I’ll describe the solution for regularly spaced points. Although this procedure is not general, I hope it will eventually help somebody.

The idea is to create a raster/grid file from the point data and to identify areas (pixels) that don’t contain any point. The easiest way is to produce a grid in which the value of each pixel represents the number of points it contains. For regularly spaced data, if everything is set up well, we should get only the values 0 and 1.

From the created grid just generate the contour with value 1; it represents the concave hull for regularly spaced data. If you have almost regularly spaced data you can apply the same principle and generate the contour with value 0.1 instead (this also works for regular spacing). To illustrate the difference, here is a figure with three contours with values 0.1, 0.5 and 1.

I’ll start with a simple ASCII file with data organised into three columns: x, y, z. That is a very common format that many professionals use to share point data. Here are just a few lines from the file dtm_predicted_tides.csv (in this scenario the file does not contain headers).
446034.00, 6707606.00, 113.92
445036.00, 6707606.00, 113.95
445038.00, 6707606.00, 113.98
445040.00, 6707606.00, 114.00
...
First we have to create a VRT (virtual format) file because GDAL cannot directly read and “understand” the ASCII file. Details about the VRT format can be found in the GDAL documentation.

For this application (no headers in the ASCII file) the VRT file (dem.vrt) will be:
<OGRVRTDataSource>
    <OGRVRTLayer name="dtm_predicted_tides">
       <SrcDataSource>dtm_predicted_tides.csv</SrcDataSource>
               <GeometryType>wkbPoint</GeometryType>
               <GeometryField encoding="PointFromColumns" x="field_1" y="field_2" z="field_3"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
To test it, use the ogrinfo command:
M:\__gdal_test>ogrinfo -al -so dem.vrt
INFO: Open of `dem.vrt'
     using driver `VRT' successful.

Layer name: dtm_predicted_tides
Geometry: Point
Feature Count: 433752
Extent: (444448.000000, 6707606.000000) - (447314.000000, 6709738.000000)
Layer SRS WKT:
(unknown)
field_1: String (0.0)
field_2: String (0.0)
field_3: String (0.0)
From this info you will need the extent to calculate some parameters for the next step. To ensure that the generated contour line is closed, it is necessary to extend the area by at least one pixel. The pixel size is determined by the resolution of the original data; for regularly spaced points it is equal to the distance between two neighbouring points along the easting and northing axes. For almost regularly spaced data you have to estimate this value, bearing in mind the proposed scenario for concave hull estimation.

Taking into account the reported extent and the 2 m resolution, it is easy to calculate that a grid covering the point extent alone would be 1433 x 1066 pixels. To extend the area and to ensure that each point is located in the middle of a pixel, the extent has to be expanded by one and a half pixels on each side, in this case 3 m.

Now we have all the values necessary to run the command. Notice that the grid size also changes (to 1436 x 1069) because the extent was changed.
gdal_grid -a count:radius1=1:radius2=1
          -txe 444445 447317   -tye 6707603 6709741
          -outsize 1436 1069  
          -of GTiff -ot Float32 
          -l dtm_predicted_tides dem.vrt  concavehull_ext2.tiff
Usage: gdal_grid [--help-general] [--formats]
    [-ot {Byte/Int16/UInt16/UInt32/Int32/Float32/Float64/
         CInt16/CInt32/CFloat32/CFloat64}]
    [-of format] [-co "NAME=VALUE"]
    [-zfield field_name]
    [-a_srs srs_def] [-spat xmin ymin xmax ymax]
    [-clipsrc <geometry>|WKT|datasource|spat_extent]
    [-clipsrcsql sql_statement] [-clipsrclayer layer]
    [-clipsrcwhere expression]
    [-l layername]* [-where expression] [-sql select_statement]
    [-txe xmin xmax] [-tye ymin ymax] [-outsize xsize ysize]
    [-a algorithm[:parameter1=value1]*]    [-q]
    <src_datasource> <dst_filename>

Available algorithms and parameters with their's defaults:
    Inverse distance to a power (default)
invdist:power=2.0:smoothing=0.0:radius1=0.0:radius2=0.0:angle=0.0:max_points=0:min_points=0:nodata=0.0
    Moving average
       average:radius1=0.0:radius2=0.0:angle=0.0:min_points=0:nodata=0.0
    Nearest neighbor
       nearest:radius1=0.0:radius2=0.0:angle=0.0:nodata=0.0
    Various data metrics
        <metric name>:radius1=0.0:radius2=0.0:angle=0.0:min_points=0:nodata=0.0
       possible metrics are:
           minimum
           maximum
           range
           count
           average_distance
           average_distance_pts
Notice that we have used the count metric with both search ellipse radii set to 1. The intention was to get one point per pixel. With data that are not regularly spaced this is not achievable, so in that case you should play a little, first with the pixel size and then with the search ellipse radii. At the end just generate... let’s say the 0.1 contour, and you can get satisfactory results. .... Use gdalinfo to check the generated raster.
M:\__gdal_test>gdalinfo concavehull_ext2.tiff
Driver: GTiff/GeoTIFF
Files: concavehull_ext2.tiff
      concavehull_ext2.tiff.aux.xml
Size is 1436, 1069
Coordinate System is:
PROJCS["ED_1950_UTM_Zone_31N",
    GEOGCS["GCS_European_1950",
       DATUM["European_Datum_1950",
           SPHEROID["International_1924",6378388.0,297.0]],
       PRIMEM["Greenwich",0.0],
       UNIT["Degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["False_Easting",500000.0],
    PARAMETER["False_Northing",0.0],
    PARAMETER["Central_Meridian",3.0],
    PARAMETER["Scale_Factor",0.9996],
    PARAMETER["Latitude_Of_Origin",0.0],
    UNIT["Meter",1.0],
    AUTHORITY["EPSG","23031"]]
Origin = (444445.000000000000000,6707603.000000000000000)
Pixel Size = (2.000000000000000,2.000000000000000)
Image Structure Metadata:
 INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (  444445.000, 6707603.000) (  1d59'19.46"E, 60d29'57.45"N)
Lower Left  (  444445.000, 6709741.000) (  1d59'17.31"E, 60d31' 6.55"N)
Upper Right (  447317.000, 6707603.000) (  2d 2'27.63"E, 60d29'58.84"N)
Lower Right (  447317.000, 6709741.000) (  2d 2'25.59"E, 60d31' 7.94"N)
Center      (  445881.000, 6708672.000) (  2d 0'52.50"E, 60d30'32.70"N)
Band 1 Block=1436x1 Type=Float32, ColorInterp=Gray
 Min=0.000 Max=1.000
 Minimum=0.000, Maximum=1.000, Mean=0.283, StdDev=0.450
 Metadata:
    STATISTICS_MAXIMUM=1
    STATISTICS_MEAN=0,28255913031469
    STATISTICS_MINIMUM=0
    STATISTICS_STDDEV=0,45024393416031
The pixel size and corner coordinates should match what was previously calculated. You might not get the statistics and coordinate system info, but for now that is not important.

At this point we are ready to create the concave hull or, to be more precise, to generate contours. The following command will generate a shapefile with the 0.1 contour and an elev attribute that holds the contour value.
gdal_contour -a elev concavehull_ext2.tiff concavehull_ext2_z.shp -fl 0.1
Usage: gdal_contour [-b <band>] [-a <attribute_name>] [-3d] [-inodata]
                    [-snodata n] [-f <formatname>] [-i <interval>]
                    [-off <offset>] [-fl <level> <level>...]
                    [-nln <outlayername>] [-q]
                    <src_filename> <dst_filename>
Here is the generated concave hull for my case. Gray pixels contain no points, and for the other pixels the count value is equal to 1 (which was the intention from the beginning). If you need to, you can generate multiple contours at once; in that case you can use the elev attribute to extract the one you intend to use as the concave hull.
gdal_contour -a elev concavehull_ext2.tiff contours.shp -fl 0.1 0.5 0.7
If you need to visualise the points you can use ogr2ogr to create a shapefile from the VRT.
ogr2ogr dem.shp dem.vrt
Here is a segment with the metric raster, the points and the concave hull (contour = 1). It is good practice to use the coordinate system definition from the beginning of the process. For that you should add one subelement to the OGRVRTLayer element of the VRT:
<LayerSRS>EPSG:23031</LayerSRS>
That is sufficient to propagate the coordinate system definition to all generated files.

The whole process is very simple and can be summarised in a few lines (a small scripted version is sketched below):
1. create the VRT file and check it with ogrinfo; optionally create a shapefile with ogr2ogr
2. create the metric raster with gdal_grid (check it with gdalinfo)
3. generate the concave hull with gdal_contour
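Just as a sketch of how the three steps could be chained from Python (my own example, assuming the GDAL commands are on PATH and dem.vrt is already prepared as shown above):

import subprocess

layer = 'dtm_predicted_tides'

# 1. check the vrt
subprocess.check_call(['ogrinfo', '-al', '-so', 'dem.vrt'])

# 2. create the metric raster (count of points per pixel)
subprocess.check_call(['gdal_grid', '-a', 'count:radius1=1:radius2=1',
                       '-txe', '444445', '447317', '-tye', '6707603', '6709741',
                       '-outsize', '1436', '1069',
                       '-of', 'GTiff', '-ot', 'Float32',
                       '-l', layer, 'dem.vrt', 'concavehull_ext2.tiff'])

# 3. generate the concave hull contour
subprocess.check_call(['gdal_contour', '-a', 'elev',
                       'concavehull_ext2.tiff', 'concavehull_ext2_z.shp',
                       '-fl', '0.1'])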

... just play with your data and enjoy! :-)

Monday, April 30, 2012

Geomedia and Oracle: how to speed up initial connection

keywords: oracle, geomedia, geomedia objects, oracle log, sql hint, sql plus, oracle enterprise manager, spool, timing

If you are using an Oracle database with Geomedia Professional, or an application based on Geomedia Objects, you will probably want to speed up the initial connection, which can be very slow. In this post I will present a solution to that specific issue that can significantly improve the user experience. You can apply the same approach in many other situations where you cannot control or change the SQL generated by some tool/software.

To start the optimisation process you need to find out which statement causes the trouble. That is possible using Oracle tools such as Oracle Enterprise Manager. Log in as the same user you use from Geomedia, go to Performance / Top Activity and find the select statements you are looking for.

To help you a little, here are two screenshots: some arbitrary query executed using SQL Developer


and here is the same query found by Oracle Enterprise Manager (Performance/Top activity).


You can read more on this topic here.

You can accomplish the same thing using a tool such as TOAD, or with SQL queries, etc...

If you want to optimise the queries generated by Geomedia, you can find all of them in a log file. To get that log you have to create the file c:\temp\GOracle.log and then start Geomedia. Don’t be creative: the file name and location should be exactly as written, you cannot use a different location or file name :)

After you connect Geomedia to the Oracle database, the SQL queries generated by Geomedia will appear in the log file. That is the starting point for your optimisation.

Let me show you one query generated by Geomedia at the moment of initial connection (open SQL*Plus and try it yourself). Here is the SQL entered in SQL*Plus, followed by the content of my spool file (sql-log.txt).
spool sql-log.txt;
set timing on;
SELECT COUNT(1)
FROM ALL_OBJECTS
WHERE OBJECT_TYPE = 'TABLE'
 AND SECONDARY = 'N'
 AND (OWNER, OBJECT_NAME) IN
    (SELECT OWNER, TABLE_NAME
            FROM ALL_TABLES
            WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));
SELECT /*+ STAR_TRANSFORMATION */  COUNT(1)
FROM ALL_OBJECTS
WHERE OBJECT_TYPE = 'TABLE'
 AND SECONDARY = 'N'
 AND (OWNER, OBJECT_NAME) IN
    (SELECT OWNER, TABLE_NAME
            FROM ALL_TABLES
            WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));
spool off;

(Notice the optimiser hint used to alter the execution plan: /*+ STAR_TRANSFORMATION */.)

Here is the content of the spool file:
SQL> set timing on;
SQL> SELECT COUNT(1)
2   FROM ALL_OBJECTS
3   WHERE OBJECT_TYPE = 'TABLE'
4     AND SECONDARY = 'N'
5     AND (OWNER, OBJECT_NAME) IN
6      (SELECT OWNER, TABLE_NAME
7              FROM ALL_TABLES
8               WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));

COUNT(1)                                                                      
----------                                                                      
   27664                                                                      

Elapsed: 00:01:12.36
SQL> SELECT /*+ STAR_TRANSFORMATION */  COUNT(1)
2   FROM ALL_OBJECTS
3   WHERE OBJECT_TYPE = 'TABLE'
4     AND SECONDARY = 'N'
5     AND (OWNER, OBJECT_NAME) IN
6      (SELECT OWNER, TABLE_NAME
7              FROM ALL_TABLES
8               WHERE OWNER NOT IN ('SCOTT', 'MGMT_VIEW', 'WKPROXY', 'WKSYS', 'MDDATA', 'SYSMAN', 'ANONYMOUS', 'XDB', 'WK_TEST', 'OLAPSYS', 'CTXSYS', 'MDSYS', 'SI_INFORMTN_SCHEMA', 'ORDPLUGINS', 'ORDSYS', 'EXFSYS', 'WMSYS', 'DBSNMP', 'DMSYS', 'DIP', 'OUTLN', 'SYSTEM', 'SYS', 'SH'));

COUNT(1)                                                                      
----------                                                                      
   27664                                                                      

Elapsed: 00:00:01.23
SQL> spool off;

Notice that the elapsed time changed dramatically, from 1:12.36 to 1.23 seconds!

...just one brief comment: every time Geomedia connects to the Oracle database, the presented query is executed without any optimisation, and your users will be very unsatisfied. Not only the users of your system: as a software developer, system architect or GIS analyst you will also experience great disappointment waiting more than a minute just to establish a connection! And one more thing, this SQL is not the only one that causes trouble (examine your log file!). In this post I present principles you can apply to any SQL.

Now it is time to “tell” Oracle that when it receives this query from Geomedia it should run it in the optimised form. Here is the PL/SQL code to do that. This solution was proposed by Oracle guru Damir Vadas (thanks, Damir!).
set serveroutput on size 123456;
 
DECLARE
  l_sql            VARCHAR2(2048 CHAR);
  l_sql_tune_task_id  VARCHAR2(100);
BEGIN
  l_sql :=  'select OWNER, OBJECT_NAME, TO_CHAR(CREATED, ''MM/DD/YYYY HH24:MI:SS''), TO_CHAR(LAST_DDL_TIME, ''MM/DD/YYYY HH24:MI:SS''), ''TABLE'' from ALL_OBJECTS where OBJECT_TYPE = ''TABLE'' and SECONDARY = ''N'' and (OWNER,OBJECT_NAME) in (select OWNER,TABLE_NAME from ALL_TABLES where OWNER not in (''SCOTT'',''MGMT_VIEW'',''WKPROXY'',''WKSYS'',''MDDATA'',''SYSMAN'',''ANONYMOUS'',''XDB'',''WK_TEST'',''OLAPSYS'',''CTXSYS'',''MDSYS'',''SI_INFORMTN_SCHEMA'',''ORDPLUGINS'',''ORDSYS'',''EXFSYS'',''WMSYS'',''DBSNMP'',''DMSYS'',''DIP'',''OUTLN'',''SYSTEM'',''SYS'',''SH'')) order by OWNER, OBJECT_NAME';
  l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
                       sql_text => l_sql,
                       user_name       => null,
                       scope    => DBMS_SQLTUNE.scope_comprehensive,
                  time_limit      => 600,
                       task_name       => 'g3_tuning_task',
                       description     => 'Tuning task for GEOMEDIA.');
  DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
/
 
EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => 'g3_tuning_task');
 
SELECT task_name, status FROM dba_advisor_log WHERE TASK_NAME='g3_tuning_task';
 
SET LONG 10000;
SET PAGESIZE 1000
SET LINESIZE 200
SELECT DBMS_SQLTUNE.report_tuning_task('g3_tuning_task') AS recommendations FROM dual;
 
execute dbms_sqltune.accept_sql_profile(task_name => 'g3_tuning_task', replace => TRUE);    

Take into consideration that you cannot change the query (variable l_sql); take it exactly as it was written in the log file. You only have to take care of the quoting (double your quotes or use the q'[...]' syntax), and do not change upper case to lower case or anything else. The limitation of this procedure is that you cannot optimise queries with parameters that change every time.

Examine the reports you get from the proposed procedure (pay attention to the estimated benefit).

When you work with Oracle and Geomedia (read/write) you will find the GDOSYS schema in your database. There are several tables, views, triggers and sequences in that schema; one table (GEXCLUSIONS) is used to speed up the connection. Here is the description of that table from Intergraph's documentation: “The GEXCLUSIONS metadata table is specific to the Oracle Object Model Data Server and is used to exclude schemas from the initial connection scan. When establishing an Oracle connection, any schema that the connected user has privileges to see will be scanned for compatibility. The more schemas that are available to the connected user, the longer the connection takes. This is one reason it is not recommended to connect as a user with the DBA role.”

From my experience you can leave the content of this table as it is and just perform the proposed optimisation. If you do change the content of the GEXCLUSIONS table, you will have to perform the optimisation again, because every new entry in GEXCLUSIONS changes the query (variable l_sql).

… so ... play a little bit with this and very soon you will notice the benefit … be careful, try it first on the test environment not production :)

… enjoy ...