I currently have a processing chain in R which downloads MODIS data and then calls gdalwarp from the system to reproject a specific subdataset (e.g. NDVI) into WGS1984. The resulting GeoTiffs are then collected into an HDF5 file for further processing.
Now I'm moving the processing chain to Python, and I was wondering if there's a way to skip the step of writing GeoTiffs to disk by using the functionality of the gdal module.
To be clear, the question is:
Can I perform the equivalent of gdalwarp using strictly the Python bindings from the gdal module, without writing to disk?
I've been researching a bit and the closest answers to my questions were these posts:
How to project and resample a grid to match another grid with GDAL python?
Replicating result of gdalwarp using gdal Python bindings
The first method requires a template, so not really what I'm looking for.
The second method looks more promising: it uses the function AutoCreateWarpedVRT, which seems to be quite what I want. However, contrary to the example in the answer, my result doesn't match the reference (regardless of the error threshold).
In my previous implementation which calls gdalwarp directly, I've specified a target resolution in addition to the target reference system. So I assume that's something that could make the difference - but I haven't been able to set it within the gdal bindings in python.
Here's what I tried (sorry, not reproducible without the MODIS data):
from osgeo import gdal, osr  # recent GDAL releases expose the bindings under osgeo
ds = gdal.Open('/data/MOD13A2.A2016305.h18v07.005.2016322013359.hdf')
t_srs = osr.SpatialReference()
t_srs.ImportFromEPSG(4326)
src_ds = gdal.Open(ds.GetSubDatasets()[0][0], gdal.GA_ReadOnly)
dst_wkt = t_srs.ExportToWkt()
error_threshold = 0.125
resampling = gdal.GRA_NearestNeighbour
tmp_ds = gdal.AutoCreateWarpedVRT(src_ds,
                                  None,  # src_wkt: left to default value, will use the one from source
                                  dst_wkt,
                                  resampling,
                                  error_threshold)
# create tiff
dst_ds = gdal.GetDriverByName('GTiff').CreateCopy('warp_test.tif', tmp_ds)
dst_ds = None
And this is for the reference:
gdalwarp -ot Int16 -tr 0.00892857142857143 0.00892857142857143 -t_srs EPSG:4326 "HDF4_EOS:EOS_GRID:MOD13A2.A2016305.h18v07.005.2016322013359.hdf:MODIS_Grid_16DAY_1km_VI:1 km 16 days NDVI" MOD13A2.A2016305.h18v07.005.2016322013359_MODIS_Grid_16DAY_1km_VI_1_km_16_days_NDVI.tif
The comparison:
i1 = gdal.Open('warp_test.tif')
i2 = gdal.Open('MOD13A2.A2016305.h18v07.005.2016322013359_MODIS_Grid_16DAY_1km_VI_1_km_16_days_NDVI.tif')
# test
print(i1.RasterXSize,i1.RasterYSize)
1267 1191
#reference
print(i2.RasterXSize,i2.RasterYSize)
1192 1120
i1.GetRasterBand(1).Checksum() == i2.GetRasterBand(1).Checksum()
False
So you can see, using the gdal.AutoCreateWarpedVRT function results in a dataset with different dimensions and resolution.
If you want to mimic your "reference" call to gdalwarp you can use:
from osgeo import gdal
# infile: path or subdataset name of the source raster
ds = gdal.Warp('warp_test.tif', infile, dstSRS='EPSG:4326',
               outputType=gdal.GDT_Int16, xRes=0.00892857142857143, yRes=0.00892857142857143)
ds = None
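The xRes and yRes arguments are what pin down the output dimensions. As a rough sketch (not GDAL's actual code), the raster size is the output extent divided by the pixel size. The extent used below is an illustrative assumption for a tile that is 10° tall and whose widest edge spans about 10.64° of longitude; it is not read from the MODIS file, and raster_size is a hypothetical helper name:

```python
def raster_size(x_min, x_max, y_min, y_max, x_res, y_res):
    """Approximate output size in pixels for a given extent and pixel size."""
    width = round((x_max - x_min) / x_res)
    height = round((y_max - y_min) / y_res)
    return width, height


# Pixel size from the gdalwarp call, roughly 1/112 of a degree
res = 0.00892857142857143

# Hypothetical extent in degrees (an assumption, not taken from the data)
print(raster_size(0.0, 10.6418, 10.0, 20.0, res, res))  # (1192, 1120)
```

With that assumed extent the sketch reproduces the 1192 x 1120 size of the reference file, whereas AutoCreateWarpedVRT picks its own pixel size and therefore ends up with different dimensions.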
If you don't want to output to a file on disk, you can warp to an in-memory VRT file, for example:
ds = gdal.Warp('', infile, dstSRS='EPSG:4326', format='VRT',
               outputType=gdal.GDT_Int16, xRes=0.00892857142857143, yRes=0.00892857142857143)
You can of course warp to any format in memory, but for formats other than VRT the warped pixels will actually be held in memory, whereas a VRT only stores the description of the warp and computes the pixels when they are read.
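To get the warped pixels as an array without touching disk, one option is to warp into the MEM driver and read the band directly. This is a minimal sketch, assuming a GDAL version that provides gdal.Warp; warp_options and warp_to_array are hypothetical helper names, and subdataset_name stands for an entry returned by GetSubDatasets():

```python
def warp_options(epsg=4326, res=0.00892857142857143):
    """Keyword arguments for gdal.Warp mirroring the gdalwarp call above."""
    return {
        "format": "MEM",  # keep the warped raster in memory
        "dstSRS": "EPSG:%d" % epsg,
        "xRes": res,
        "yRes": res,
    }


def warp_to_array(subdataset_name, **overrides):
    """Warp a (sub)dataset in memory and return band 1 as a NumPy array."""
    from osgeo import gdal  # imported lazily so warp_options() needs no GDAL

    opts = warp_options()
    opts.update(overrides)
    opts.setdefault("outputType", gdal.GDT_Int16)
    ds = gdal.Warp("", subdataset_name, **opts)
    if ds is None:
        raise RuntimeError("gdal.Warp failed for %s" % subdataset_name)
    data = ds.GetRasterBand(1).ReadAsArray()
    ds = None  # release the in-memory dataset
    return data
```

The returned array can then be written straight into the HDF5 file, so no intermediate GeoTiff is needed.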
I have several pyramidal, tiled TIFF images that were converted from a different format. The converter program wrote incorrect data to the XResolution and YResolution TIFF metadata. How can I modify these fields?
tiff.ResolutionUnit: 'centimeter'
tiff.XResolution: '0.34703996762331574'
tiff.YResolution: '0.34704136833246829'
Ideally I would like to use Python or a command-line tool.
One can use tifftools.tiff_set from Tiff Tools.
import tifftools
tifftools.tiff_set(
    PATH_TO_ORIG_IMAGE,
    PATH_TO_NEW_IMAGE,
    overwrite=False,
    setlist=[
        (
            tifftools.Tag.RESOLUTIONUNIT,
            tifftools.constants.ResolutionUnit.CENTIMETER.value,
        ),
        (tifftools.Tag.XRESOLUTION, xresolution),
        (tifftools.Tag.YRESOLUTION, yresolution),
    ],
)
Replace xresolution and yresolution with the desired values. These values must be floats. In this example, the resolution unit is centimeter.
This is also possible with the excellent tifffile package. In fact there is an example of this use case in the README.
from tifffile import TiffFile
with TiffFile('temp.tif', mode='r+') as tif:
    _ = tif.pages[0].tags['XResolution'].overwrite((96000, 1000))
Be aware that this will overwrite the original image. If this is not desired, make a copy of the image first and then overwrite the tags.
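The copy-first approach can be sketched as follows. The resolution tags are stored as rationals, which is why the example above passes a (numerator, denominator) pair. The helper names to_rational and set_resolution_on_copy are hypothetical, and the sketch only touches the first page of the file:

```python
import shutil
from fractions import Fraction


def to_rational(value, max_denominator=1_000_000):
    """Convert a number to a (numerator, denominator) pair for a rational tag."""
    frac = Fraction(value).limit_denominator(max_denominator)
    return frac.numerator, frac.denominator


def set_resolution_on_copy(src_path, dst_path, xres, yres):
    """Copy src_path to dst_path, then overwrite the resolution tags of the copy."""
    from tifffile import TiffFile  # third-party package, imported lazily

    shutil.copy2(src_path, dst_path)  # the original file stays untouched
    with TiffFile(dst_path, mode='r+') as tif:
        tags = tif.pages[0].tags
        tags['XResolution'].overwrite(to_rational(xres))
        tags['YResolution'].overwrite(to_rational(yres))
```

For a pyramidal TIFF the other pages carry their own resolution tags, so the same overwrite would presumably need to be repeated for each page that is wrong.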
I have an existing vtk file (of an FE mesh, a regular hexahedron mesh) and I would like to add a data set to it that I have in Python. Specifically, I would like to add this numpy data set to each node and then visualize it in ParaView.
Any tips on how I can get started on this?
VTK (and by extension ParaView) has great NumPy integration facilities. For a wonderful overview of these, please see the blog post series starting with "Improved VTK – numpy integration".
The important parts are:
You need to wrap your VTK data object in an adapter class
You add your NumPy array to the wrapped data set
Sketching this out, you can write:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
dataSet = ...
numpyArray = ...
adaptedDataSet = dsa.WrapDataObject(dataSet)
adaptedDataSet.PointData.append(numpyArray, 'arrayname')
If your data were instead associated with cells rather than points, you would change that last line to
adaptedDataSet.CellData.append(numpyArray, 'arrayname')
You'll have to be sure that the order of the data in the NumPy array matches the order of points in the hexahedron mesh.
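A cheap sanity check before appending is to compare the array length with the number of points. This sketch cannot verify the ordering itself, only the count. check_point_array is a hypothetical helper, and it relies only on GetNumberOfPoints(), which VTK data sets provide:

```python
def check_point_array(data_set, values):
    """Raise ValueError unless values has exactly one entry per point of data_set."""
    n_points = data_set.GetNumberOfPoints()
    if len(values) != n_points:
        raise ValueError(
            "expected %d values (one per point), got %d" % (n_points, len(values))
        )
    return n_points
```

Calling check_point_array(dataSet, numpyArray) right before the append turns a silent mismatch into an immediate error.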
Now, how do you do this in ParaView? You can add a Programmable Filter. The Python environment in which the script set on the Programmable Filter is executed already does this wrapping for you, so you can simplify the script above to:
# Shallow copy the input data to the output
output.VTKObject.ShallowCopy(inputs[0].VTKObject)
# Define the numpy array
numpyArray = ...
# Add the numpy array as a point data set
output.PointData.append(numpyArray, 'arrayName')
In the script above, output is a wrapped copy of the dataset produced by the Programmable Filter, saving you from having to do the wrapping manually. You do need to shallow copy the input object to the output as the script shows.
Thanks for your assistance. Here is how I solved my problem:
import vtk
from vtk.numpy_interface import dataset_adapter as dsa
# Read in base vtk
fileName = "Original.vtk"
reader = vtk.vtkUnstructuredGridReader()
reader.SetFileName(fileName)
reader.Update()
mesh = reader.GetOutput()
# Add data set and write VTK file
meshNew = dsa.WrapDataObject(mesh)
meshNew.PointData.append(NewDataSet, "new data")
writer = vtk.vtkUnstructuredGridWriter()
writer.SetFileName("New.vtk")
writer.SetInputData(meshNew.VTKObject)
writer.Write()
I need to extract the raster (stored as a numpy array) from a file. Following the very popular OGR Cookbook, I am reading in an OGR layer (geojson) and then rasterizing the vectors. I read that array using GDAL's ReadAsArray() function. That all works fine, and I can do all sorts of numpy things to it. However, GDAL automatically writes out the GDAL dataset I create because it's automatically dereferenced once the program ends. I don't need or want this file to be output because it's useless to have on disk; I just need the data in memory. How can you prevent this from happening?
I've tried not calling the FlushCache() function, but the file still gets output in the end.
Code:
...
# Create the destination data source
target = gdal.GetDriverByName('GTiff').Create(output_raster_path, source_raster.RasterXSize, source_raster.RasterYSize, 1, gdal.GDT_UInt16)
target.SetGeoTransform(source_raster.GetGeoTransform())
target.SetProjection(source_raster.GetProjection())
band = target.GetRasterBand(1)
band.SetNoDataValue(no_data_value)
gdal.RasterizeLayer(target, [1], source_layer, options=["ATTRIBUTE=BuildingID"])
raster = band.ReadAsArray()
return raster
Afterwards, once the program completes, a geotiff is written to output_raster_path, which I had just set as "temp.tif".
You can use the in-memory (MEM) driver for things like that.
mem_drv = gdal.GetDriverByName('MEM')
target = mem_drv.Create('', source_raster.RasterXSize, source_raster.RasterYSize, 1, gdal.GDT_UInt16)
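Put together with the code from the question, the whole rasterization can stay in memory. This is a sketch under the assumption that source_raster and source_layer are opened as in the question; burn_options and rasterize_to_array are hypothetical helper names:

```python
def burn_options(attribute):
    """Option list telling RasterizeLayer which attribute to burn in."""
    return ["ATTRIBUTE=" + attribute]


def rasterize_to_array(source_raster, source_layer, attribute, no_data_value):
    """Rasterize source_layer onto an in-memory grid matching source_raster."""
    from osgeo import gdal  # imported lazily so burn_options() needs no GDAL

    target = gdal.GetDriverByName('MEM').Create(
        '', source_raster.RasterXSize, source_raster.RasterYSize, 1, gdal.GDT_UInt16)
    target.SetGeoTransform(source_raster.GetGeoTransform())
    target.SetProjection(source_raster.GetProjection())
    band = target.GetRasterBand(1)
    band.SetNoDataValue(no_data_value)
    gdal.RasterizeLayer(target, [1], source_layer, options=burn_options(attribute))
    # Nothing is written to disk; the dataset is freed when target goes out of scope
    return band.ReadAsArray()
```

For the question's data this would be called as rasterize_to_array(source_raster, source_layer, "BuildingID", no_data_value).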
I am trying to transform a .grib file into a GeoTIFF to be used in a GIS (ArcGIS to be particular), but am having trouble getting the image to project properly. I have been able to create a GeoTIFF, using GDAL in Python, that shows the data but is not showing up in the correct location when brought into ArcGIS. The resulting image is below.
The data I am working with can be downloaded from: https://gimms.gsfc.nasa.gov/SMOS/SMAP/L05/
I am trying to project the data into WGS84 Web Mercator (Auxiliary Sphere), EPSG: 3857
Note: I have tried bringing in the data via ArcMap by creating a Raster Mosaic which should be able to work with .grib data, but I didn't have any luck.
Update: I have also tried using the Project Raster tool, but ArcGIS does not like the default projection that comes from the .grib file and gives an error.
The code I'm using:
from osgeo import gdal
src_filename = r"C:\att\project\arcshare\public\disaster_response\nrt_products\smap\20150402_20150404_anom1.grib"
dst_filename = r"C:\att\project\arcshare\public\disaster_response\nrt_products\smap\smap_py_test1.tif"
#Open existing dataset
src_ds = gdal.Open(src_filename)
#Open output format driver, see gdal_translate --formats for list
format = "GTiff"
driver = gdal.GetDriverByName( format )
#Output to new format
dst_ds = driver.CreateCopy( dst_filename, src_ds, 0 )
#Properly close the datasets to flush to disk
dst_ds = None
src_ds = None
I am not very well versed in using GDAL or GDAL in Python, so any help or tips would be greatly appreciated.
Try using gdal.Translate (in Python) or gdal_translate (from command line). Here are two examples of how I have used each approach in the past:
Option 1: Python approach
from osgeo import gdal
# Open existing dataset
src_ds = gdal.Open(src_filename)
# Ensure number of bands in GeoTiff will be same as in GRIB file.
# GDAL starts band counts at 1, not 0 like Python's range() does.
bands = []  # Set up band list for gdal.Translate().
if src_ds is not None:
    bands = list(range(1, src_ds.RasterCount + 1))
# Open output format driver
out_form= "GTiff"
# Output to new format using gdal.Translate. See https://gdal.org/python/ for osgeo.gdal.Translate options.
dst_ds = gdal.Translate(dst_filename, src_ds, format=out_form, bandList=bands)
# Properly close the datasets to flush to disk
dst_ds = None
src_ds = None
Option 2: Command line gdal_translate (called from Python) approach
import os
# Open output format driver, see gdal_translate --formats for list
out_form = "GTiff"
# Pull out specific band of interest
band=3
# Convert from GRIB to GeoTIFF using system gdal_translate
src_ds = src_filename
dst_ds = dst_filename
os.system("gdal_translate -b %s -of %s %s %s" %(str(band), out_form, src_ds, dst_ds))
I've had trouble in the past creating a multi-band GeoTiff using option 2, so I recommend using option 1 when possible.
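If you do call the command-line tool, passing the arguments as a list avoids quoting problems with paths that contain spaces, because no shell is involved. This is a minimal sketch; translate_command and run_translate are hypothetical helper names, and gdal_translate is assumed to be on the PATH:

```python
import subprocess


def translate_command(src, dst, band, out_form="GTiff"):
    """Build the gdal_translate argument list for a single band."""
    return ["gdal_translate", "-b", str(band), "-of", out_form, src, dst]


def run_translate(src, dst, band, out_form="GTiff"):
    """Run gdal_translate; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(translate_command(src, dst, band, out_form), check=True)
```

Unlike os.system, this also surfaces a failed conversion as an exception instead of an ignored return code.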
Something like this should transform your native coordinates into your desired projection. This is not tested yet. (It could be latitude instead of latitudes.)
from cfgrib import xarray_store
from pyproj import Proj, transform
grib_data = xarray_store.open_dataset('your_grib_file.grib')
lat = grib_data.latitudes.values
lon = grib_data.longitudes.values
lon_transformed, lat_transformed = transform(Proj(init='init_projection'),
                                             Proj(init='target_projection'), lon, lat)
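For the Web Mercator target mentioned in the question (EPSG:3857), the forward transform is simple enough to write out by hand, which can help when checking what pyproj returns. This sketch assumes the GRIB coordinates are plain longitude and latitude in degrees, which is an assumption about the file; lonlat_to_web_mercator is a hypothetical helper name:

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4.0 + math.radians(lat_deg) / 2.0))
    return x, y


x, y = lonlat_to_web_mercator(180.0, 0.0)
print(round(x, 2))  # 20037508.34
```

The formula is only valid away from the poles, since y grows without bound as the latitude approaches 90 degrees.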
I am trying to use rasterio to load in an image, modify the ndarray, then write out using the same spatial reference system as the original image. The below function is my attempt to do this. But the spatial reference system is missing from the output geotiff. Any suggestions on what I am doing wrong?
I have checked the input geotiff crs is valid ('epsg:32611').
# Function to write out an ndarray as a GeoTIFF using the spatial reference of a sample GeoTIFF file
def write_GeoTif_like(templet_tif_file, output_ndarry, output_tif_file):
    import rasterio
    orig = rasterio.open(templet_tif_file)
    with rasterio.open(output_tif_file, 'w', driver='GTiff', height=output_ndarry.shape[0],
                       width=output_ndarry.shape[1], count=1, dtype=output_ndarry.dtype,
                       crs=orig.crs, transform=orig.transform, nodata=-9999) as dst:
        dst.write(output_ndarry, 1)
Having been bitten by this issue before, I'd guess that your GDAL_DATA environment variable is not being set correctly (see https://github.com/conda/conda/issues/4050 for more detail). Without knowing more about your installation/OS, I cannot say for sure, but if gdal (and rasterio) are unable to find the directory with the support files, such as those needed for operations involving coordinate reference systems, you'll lose the CRS in the output tif.
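A quick way to test this guess is to look at the variable from the same Python environment that runs rasterio. This is a minimal sketch using only the standard library; check_data_dir is a hypothetical helper name:

```python
import os


def check_data_dir(var_name="GDAL_DATA", environ=None):
    """Return (value, is_existing_directory) for an environment variable."""
    environ = os.environ if environ is None else environ
    value = environ.get(var_name)
    return value, bool(value) and os.path.isdir(value)


value, ok = check_data_dir()
print("GDAL_DATA =", value, "(directory exists)" if ok else "(missing or not a directory)")
```

If the variable is unset or points to a directory that does not exist, fixing it (or reinstalling the package so it is set on activation) would be the first thing to try.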