Spatial reference missing from rasterio write to GeoTiff

I am trying to use rasterio to load in an image, modify the ndarray, then write out using the same spatial reference system as the original image. The below function is my attempt to do this. But the spatial reference system is missing from the output geotiff. Any suggestions on what I am doing wrong?
I have checked the input geotiff crs is valid ('epsg:32611').
# Function to write out an ndarray as a GeoTIFF using the spatial reference of a sample GeoTIFF file
def write_GeoTif_like(templet_tif_file, output_ndarry, output_tif_file):
    import rasterio
    orig = rasterio.open(templet_tif_file)
    with rasterio.open(output_tif_file, 'w', driver='GTiff', height=output_ndarry.shape[0],
                       width=output_ndarry.shape[1], count=1, dtype=output_ndarry.dtype,
                       crs=orig.crs, transform=orig.transform, nodata=-9999) as dst:
        dst.write(output_ndarry, 1)

Having been bitten by this issue before, I'd guess that your GDAL_DATA environment variable is not being set correctly (see https://github.com/conda/conda/issues/4050 for more detail). Without knowing more about your installation and OS I cannot say for sure, but if GDAL (and therefore rasterio) cannot find the directory holding its support files, such as those used for operations involving coordinate reference systems, you'll lose the CRS in the output tif.
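One quick way to test this guess is to check whether GDAL_DATA is set and points at a real directory before writing the file. The sketch below uses only the standard library; the helper name gdal_data_status is made up for illustration, and which directory is the right one depends on your installation.

```python
import os


def gdal_data_status(env=None):
    """Return (path, is_existing_directory) for the GDAL_DATA setting."""
    env = os.environ if env is None else env
    path = env.get("GDAL_DATA")  # None when the variable is not set
    return path, bool(path) and os.path.isdir(path)


if __name__ == "__main__":
    path, ok = gdal_data_status()
    if ok:
        print("GDAL_DATA points to an existing directory:", path)
    else:
        print("GDAL_DATA is unset or not a directory:", path)
```

If the variable turns out to be unset or wrong, pointing it at the directory that holds GDAL's support files before importing rasterio is one thing to try.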

Related

Adding pixelData to pydicom dataset causes VR Error

I am working with a set of DICOM images. I would like to create a new image with a header similar to an existing image. However, I already hold the pixel data in numpy arrays, so to avoid duplication I read the headers without PixelData:
metadata = pydicom.filereader.dcmread(image_path[l],stop_before_pixels=True)
In a separate function, I want to attach a different image (an ROI) to the modified metadata:
ds = metadata
ds.PixelData = roi.astype(np.int16).tostring() # A numpy array converted to the same datatype as pixel_array was
ds.save_as(os.path.join(write_dir,'ROI'+str(slice+1))+'.dcm')
This results in the error message below, which seems to indicate that the PixelData VR is not set in the dictionary. Thanks for your suggestions.
ValueError: Cannot write ambiguous VR of 'OB or OW' for data element
with tag (7fe0, 0010). Set the correct VR before writing, or use an
implicit VR transfer syntax

The VR for Pixel Data is ambiguous in the DICOM Standard. Depending on the exact nature of your dataset the required VR is either OB or OW. Because you're adding a brand new Pixel Data element to an existing dataset pydicom defaults the VR to 'OB or OW'. Normally this isn't an issue if your dataset is conformant because during write pydicom will automatically fix this so the correct VR is used (using the correct_ambiguous_vr() function). If your dataset isn't conformant then:
If your Pixel Data uses a compressed transfer syntax, like JPEG, then it should be OB.
Otherwise, it should be OB if Bits Allocated <= 8 and OW if > 8.
# Set the VR manually
ds['PixelData'].VR = 'OW'
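The two rules above can be written as a small helper. This is only a sketch: the function name pixel_data_vr is hypothetical, and it assumes you already know Bits Allocated and whether the transfer syntax is compressed.

```python
def pixel_data_vr(bits_allocated, is_compressed):
    """Pick the Pixel Data VR following the two rules above."""
    if is_compressed:
        return "OB"  # compressed transfer syntaxes (e.g. JPEG)
    return "OB" if bits_allocated <= 8 else "OW"


# Uncompressed 16-bit data, as produced by astype(np.int16)
print(pixel_data_vr(16, False))
```

With a pydicom dataset this might be applied as ds['PixelData'].VR = pixel_data_vr(ds.BitsAllocated, ds.file_meta.TransferSyntaxUID.is_compressed), assuming both Bits Allocated and the file meta information are present in the dataset.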

Python: Read vtk file, add data set then write vtk

I have an existing vtk file (of an FE mesh, a regular hexahedron mesh) and I would like to add a data set to it that I have in Python. Specifically, I would like to add this numpy data set to each node and then visualize it in ParaView.
Any tips on how I can get started on this?

VTK (and by extension ParaView) has great NumPy integration facilities. For a wonderful overview of these, please see the blog post series starting with "Improved VTK – numpy integration".
The important parts are:
You need to wrap your VTK data object in an adapter class
You add your NumPy array to the wrapped data set
Sketching this out, you can write:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
dataSet = ...
numpyArray = ...
adaptedDataSet = dsa.WrapDataObject(dataSet)
adaptedDataSet.PointData.append(numpyArray, 'arrayname')
If your data were associated with cells rather than points, you would change that last line to
adaptedDataSet.CellData.append(numpyArray, 'arrayname')
You'll have to be sure that the order of the data in the NumPy array matches the order of points in the hexahedron mesh.
Now, how do you do this in ParaView? You can add a Programmable Filter. The Python environment in which the script set on the Programmable Filter is executed already does this wrapping for you, so you can simplify the script above to:
# Shallow copy the input data to the output
output.VTKObject.ShallowCopy(inputs[0].VTKObject)
# Define the numpy array
numpyArray = ...
# Add the numpy array as a point data set
output.PointData.append(numpyArray, 'arrayName')
In the script above, output is a wrapped copy of the dataset produced by the Programmable Filter, saving you from having to do the wrapping manually. You do need to shallow copy the input object to the output as the script shows.

Thanks for your assistance. Here is how I solved my problem:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
# Read in base vtk
fileName = "Original.vtk"
reader = vtk.vtkUnstructuredGridReader()
reader.SetFileName(fileName)
reader.Update()
mesh = reader.GetOutput()
# Add data set and write VTK file
meshNew = dsa.WrapDataObject(mesh)
meshNew.PointData.append(NewDataSet, "new data")
writer = vtk.vtkUnstructuredGridWriter()
writer.SetFileName("New.vtk")
writer.SetInputData(meshNew.VTKObject)
writer.Write()

How to prevent GDAL from writing data source to disk when de-referenced

I need to extract the raster (stored as a numpy array) from a file. Following the very popular OGR Cookbook, I am reading in an OGR layer (geojson) and then rasterizing the vectors. I read that array using GDAL's ReadAsArray() function. That all works fine, and I can do all sorts of numpy things to it. However, GDAL automatically writes out the GDAL dataset I create because it's automatically de-referenced once the program ends. I don't need or want this file to be output because it's useless to have on disk; I just need the data in memory. How can I prevent this from happening?
I've tried not calling the FlushCache() function, but the file still gets output in the end.
Code:
...
# Create the destination data source
target = gdal.GetDriverByName('GTiff').Create(output_raster_path, source_raster.RasterXSize, source_raster.RasterYSize, 1, gdal.GDT_UInt16)
target.SetGeoTransform(source_raster.GetGeoTransform())
target.SetProjection(source_raster.GetProjection())
band = target.GetRasterBand(1)
band.SetNoDataValue(no_data_value)
gdal.RasterizeLayer(target, [1], source_layer, options=["ATTRIBUTE=BuildingID"])
raster = band.ReadAsArray()
return raster
Afterwards, once the program completes, a geotiff is written to output_raster_path, which I had just set as "temp.tif".

You can use the in-memory (MEM) driver for things like that. A dataset created with it is never written to disk.
mem_drv = gdal.GetDriverByName('MEM')
target = mem_drv.Create('', source_raster.RasterXSize, source_raster.RasterYSize, 1, gdal.GDT_UInt16)

python: perform gdalwarp in memory with gdal bindings

I currently have a processing chain in R which downloads MODIS data and then calls gdalwarp from the system to reproject a specific subdataset (e.g. NDVI) into WGS1984. The resulting GeoTiffs are then collected into an HDF5 file for further processing.
Now I'm moving the processing chain to python, and I was wondering if there's a way to skip the step of writing GeoTiffs to disk with the functionalities of the gdal module.
To be clear, the question is:
Can I perform gdalwarp using strictly the Python bindings from the gdal module, without writing to disk?
I've been researching a bit and the closest answers to my questions were these posts:
How to project and resample a grid to match another grid with GDAL python?
Replicating result of gdalwarp using gdal Python bindings
The first method requires a template, so not really what I'm looking for.
The second method looks more promising: it uses the function AutoCreateWarpedVRT, which seems to be quite what I want. However, contrary to the example in the answer, my result doesn't match the reference (independently of any error threshold).
In my previous implementation which calls gdalwarp directly, I've specified a target resolution in addition to the target reference system. So I assume that's something that could make the difference - but I haven't been able to set it within the gdal bindings in python.
Here's what I tried (sorry, not reproducible without the MODIS data):
import gdal
import osr
ds = gdal.Open('/data/MOD13A2.A2016305.h18v07.005.2016322013359.hdf')
t_srs = osr.SpatialReference()
t_srs.ImportFromEPSG(4326)
src_ds = gdal.Open(ds.GetSubDatasets()[0][0], gdal.GA_ReadOnly)
dst_wkt =t_srs.ExportToWkt()
error_threshold = 0.125
resampling=gdal.GRA_NearestNeighbour
tmp_ds = gdal.AutoCreateWarpedVRT( src_ds,
None, # src_wkt : left to default value --> will use the one from source
dst_wkt,
resampling,
error_threshold)
# create tiff
dst_ds = gdal.GetDriverByName('GTiff').CreateCopy('warp_test.tif', tmp_ds)
dst_ds = None
And this is for the reference:
gdalwarp -ot Int16 -tr 0.00892857142857143 0.00892857142857143 -t_srs EPSG:4326 "HDF4_EOS:EOS_GRID:MOD13A2.A2016305.h18v07.005.2016322013359.hdf:MODIS_Grid_16DAY_1km_VI:1 km 16 days NDVI" MOD13A2.A2016305.h18v07.005.2016322013359_MODIS_Grid_16DAY_1km_VI_1_km_16_days_NDVI.tif
The comparison:
i1 = gdal.Open('warp_test.tif')
i2 = gdal.Open('MOD13A2.A2016305.h18v07.005.2016322013359_MODIS_Grid_16DAY_1km_VI_1_km_16_days_NDVI.tif')
# test
print(i1.RasterXSize,i1.RasterYSize)
1267 1191
#reference
print(i2.RasterXSize,i2.RasterYSize)
1192 1120
i1.GetRasterBand(1).Checksum() == i2.GetRasterBand(1).Checksum()
False
So you can see, using the gdal.AutoCreateWarpedVRT function results in a dataset with different dimensions and resolution.

If you want to mimic your "reference" call to gdalwarp you can use:
from osgeo import gdal
ds = gdal.Warp('warp_test.tif', infile, dstSRS='EPSG:4326',
outputType=gdal.GDT_Int16, xRes=0.00892857142857143, yRes=0.00892857142857143)
ds = None
If you don't want to output to a file on disk, you can warp to an in-memory VRT file, for example:
ds = gdal.Warp('', infile, dstSRS='EPSG:4326', format='VRT',
outputType=gdal.GDT_Int16, xRes=0.00892857142857143, yRes=0.00892857142857143)
You can of course warp to any format in memory, but for formats other than VRT the full warped result will actually be held in memory.

Where to begin? Using x,y,z data to display a building lot

I used a builders' level to get x,y,z coordinates on a 110' x 150' building lot.
They are not in equally spaced rows and columns, but are randomly placed.
I have found a lot of info on mapping, and I'm looking forward to learning about GIS and how to use the many free software utilities out there.
Where should I start?
Now the data is in a csv file format, but I could change that.
It seems that I want to get the information I have into a "shapefile" or a raster format.
I suppose I could look up the formats and do this, but it seems that I haven't come across the proper utility for this part of the process.
Thank You Peter

You can convert your coordinates into a shapefile to display them in QGIS, ArcMap, or similar GIS programs. You probably want a polygon shapefile.
One easy way to do this is with PySAL:
>>> import pysal
>>> coords = [(0,0), (10,0), (10,10), (0,10), (0,0)]
>>> pts = map(pysal.cg.Point, coords)
>>> polygon = pysal.cg.Polygon(pts)
>>> shp = pysal.open('myPolygon.shp','w')
>>> shp.write(polygon)
>>> shp.close()
Note: pysal currently doesn't support Z coordinates, but there are plenty of similar libraries that do.
Also notice the first and last point are the same, indicating a closed polygon.
If your X,Y,Z coordinates are GPS coordinates you'll be able to align your data with other GIS data easily by telling the GIS what projection your data is in (WGS84, UTM Zone #, etc). If your coordinates are in local coordinates (not tied to a grid like UTM, etc) you'll need to "georeference" your coordinates in order to align them with other data.
Finally, using the ogr2ogr command you can easily export your data from a shapefile to other formats such as KML:
ogr2ogr -f KML myPolygon.kml myPolygon.shp

You can convert a CSV file into any OGR supported format. All you need is a VRT header file that describes the CSV file.
Here is an example:
<OGRVRTDataSource>
    <OGRVRTLayer name="bars">
        <SrcDataSource>bars.csv</SrcDataSource>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>EPSG:4326</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="longitude" y="latitude"/>
    </OGRVRTLayer>
</OGRVRTDataSource>
In the SrcDataSource element you set the CSV file name.
In your case you have points, so the wkbPoint geometry type in the example is fine.
The LayerSRS element indicates the coordinate reference system of the coordinates. If you have longitude and latitude, EPSG:4326 is fine.
The GeometryField element must have x and y attributes naming the columns in the CSV file that contain the coordinates. The CSV file must have a first line defining the field names.
Save the file with a .vrt extension.
Once you have this, use the ogr2ogr program, which you have if GDAL is installed.
If you want to convert the file to a Shapefile, just type in a console:
ogr2ogr -f "ESRI Shapefile" bars.shp bars.vrt
If your question is what to do with the data, you can check the gdal_grid utility program, which converts scattered data (like yours) to raster data. You can use the CSV with the VRT header file as the input, without changing the format.
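If you would rather generate the header file than write it by hand, the following is a minimal sketch using only the Python standard library and the element names from the OGR VRT documentation. The function name build_csv_vrt, the file name lot.csv and the column names x, y and z are assumptions for illustration. As far as I know the layer name has to match the CSV file's base name, and LayerSRS is left out here on the assumption that the builder's-level coordinates are local rather than georeferenced.

```python
import xml.etree.ElementTree as ET


def build_csv_vrt(csv_path, layer_name, x_col, y_col, z_col=None, srs=None):
    """Return OGR VRT XML describing point geometry stored in CSV columns."""
    root = ET.Element("OGRVRTDataSource")
    layer = ET.SubElement(root, "OGRVRTLayer", name=layer_name)
    ET.SubElement(layer, "SrcDataSource").text = csv_path
    # Use the 2.5D point type when a z column is supplied
    ET.SubElement(layer, "GeometryType").text = "wkbPoint25D" if z_col else "wkbPoint"
    if srs:  # omitted for local (non-georeferenced) coordinates
        ET.SubElement(layer, "LayerSRS").text = srs
    attrs = {"encoding": "PointFromColumns", "x": x_col, "y": y_col}
    if z_col:
        attrs["z"] = z_col
    ET.SubElement(layer, "GeometryField", attrs)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file and column names for the surveyed lot
    print(build_csv_vrt("lot.csv", "lot", "x", "y", z_col="z"))
```

Saving the printed text as lot.vrt next to lot.csv should let you run the same ogr2ogr command as above with lot.vrt as the input.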
