Pages

Wednesday 5 November 2014

Python tutorial: Converting a raster dataset to XYZ in Python

One of the most popular posts in my blog is about converting a raster image into XYZ text file.  Converting raster image to XYZ file may be necessary because machine learning algorithms (outside proprietary software) requires input to be a table. I had written a post about converting a raster file into XYZ using ARCGIS some three years ago, which seems to attract many visitors to my blog.

Here I will write a way to do it using Python. For reading geo-referenced raster file, we will use rasterio package which is a wrapper for gdal that provides clean and fast I/O for geospatial raster images. The package is written in cython therefore it is very fast. Reading a raster file with rasterio is a one liner code. You can download binary of rasterio here. Its a binary file so installing rasterio is just a matter of clicking binary executable file. After reading the raster file we will get bounding box of the image and compute XY of each pixel and later write in a csv file together with pixel values. The code works for any number of bands in the image.

I hope this piece of code is useful for you. In near future I will show you to do exact same thing within QGIS. Stay tuned.


Output XYZ file with band values

In addition, the above code can be combine with the code (Data exchange between MATLAB and Python: Reading and writing .mat files with Python)  so that  you can easily export multi-spectral and hyperspectral data as a mat file for MATLAB. Many people seem to be having problem with reading remote sensing image with multiband read function in MATLAB. If you are one them, use above codes and bypass multiband read function to get your remotes images straight in MATLAB.

3 comments:

  1. Hi ! I was searching for this kind of treatment and you help me so much. But I found an error that messes up data. Line 23 and 24 : newX = X.flatten() instead of newX = np.array(X.flatten(1)). Or maybe it's du to new versions of libs...

    ReplyDelete
    Replies
    1. Hi,I had the same error. The argument for flatten needs to be a string.
      From numpy doc =
      #'C’ means to flatten in row-major (C-style) order. ‘F’ means to flatten in column-major (Fortran- style) order
      # ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory, row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory. The default is ‘C’
      I could fix it by replacing the '1' with 'C', which I assume it has the same output as using '1'. Can someone confirm this?

      Delete
  2. I had to modify the final part due to the same error. Worked almost perfect, but it changes the width and height of the original input just in one unit for each. If you have the solution for this, I'd appreciate it! Thanks!

    ReplyDelete