MODIS AQUA Data - Stacking / Mosaic data with python GDAL

MODIS AQUA Data - Stacking / Mosaic data with python GDAL - python

I know how to access and plot subdatasets using gdal and python. However, I'm wondering if there's a way to use the GEO data contained in the HDF4 file so I could look at the same area over many years.
And if possible, can an area be cut out of the data and how?
UPDATE:
To be more specific: I plotted MODIS Data and as you can see below the river moves downwards (rectangular structure top left corner). So over a whole year it's not the same location that i'm observing.
There's a directory in the subdatasets called Geolocation Fields with Long and Alt directories. So is it possible to access this information or lay it over the data to cut out a specific area?
If we for example take a look at the NASA picture below would it be possible to cut it between 10-15 alt. and -5 to 0 long.
You can download a sample file by copying the url below:
https://ladsweb.modaps.eosdis.nasa.gov/archive/allData/6/MYD021KM/2009/034/MYD021KM.A2009034.1345.006.2012058160107.hdf
UPDATE:
I ran
x0, dx, dxdy, y0, dydx, dy = hdf_file.GetGeoTransform()
which gave me the following output:
x0: 0.0
dx: 1.0
dxdy: 0.0
y0: 0.0
dydx: 0.0
dy: 1.0
As well as
gdal.Warp(workdir2+"/output.tif",workdir1+"/MYD021KM.A2009002.1345.006.2012058153105.hdf")
which gave me the following error:
ERROR 1: Input file /Volumes/Transcend/Master_Thesis/Data/AQUA_002_1345/MYD021KM.A2009002.1345.006.2012058153105.hdf has no raster bands.
**UPDATE 2: **
Here's my code on how I open and read my hdf files:
all_files is a list containing file names like:
MYD021KM.A2008002.1345.006.2012066153213.hdf
MYD021KM.A2008018.1345.006.2012066183305.hdf
MYD021KM.A2008034.1345.006.2012067035823.hdf
MYD021KM.A2008050.1345.006.2012067084421.hdf
etc .....
for fe in all_files:
print "\nopening file: ", fe
try:
hdf_file = gdal.Open(workdir1 + "/" + fe)
print "getting subdatasets..."
subDatasets = hdf_file.GetSubDatasets()
Emissiv_Bands = gdal.Open(subDatasets[2][0])
print "getting bands..."
Bands = Emissiv_Bands.ReadAsArray()
print "unit conversion ... "
get_name_tag = re.findall(".A(\d{7}).", all_files[i])[0]
print "name tag of current file: ", get_name_tag
# Code for 1 Band:
L_B_1 = radiance_scales[specific_band] * (Bands[specific_band] - radiance_offsets[specific_band]) # Source: MODIS Level 1B Product User's Guide Page 36 MOD_PR02 V6.1.12 (TERRA)/V6.1.15 (AQUA)
data_1_band['%s' % get_name_tag] = L_B_1
L_B_1_mean['%s' % get_name_tag] = L_B_1.mean()
# Code for many different Bands:
data_all_bands["%s" % get_name_tag] = []
for k in Band_nrs[lowest_band:highest_band]: # Bands 8-11
L_B = radiance_scales[k] * (Bands[k] - radiance_offsets[k]) # List with all bands
print "Appending mean value of {} for band {} out of {}".format(L_B.mean(), Band_nrs[k], len(Band_nrs))
data_all_bands['%s' % get_name_tag].append(L_B.mean()) # Mean radiance values
i=i+1
print "data added. Adding i+1 = ", i
except AttributeError:
print "\n*******************************"
print "Can't open file {}".format(workdir1 + "/" + fe)
print "Skipping this file..."
print "*******************************"
broken_files.append(workdir1 + "/" + fe)
i=i+1

Without knowing your exact data source and desired output etc. it is hard to give you a specific answer. With that said, it appears that you have the native .hdf format of MODIS images and wish to do some subsetting to get the images referenced to the same area, then plot etc.
It might help for you to look at gdal.Warp() from the gdal module. This method is able to take a .hdf file and subset a series of images to the same bounding box with the same resolution/number of rows and columns.
You can then analyse and plot these images/compare pixels etc.
I hope that this gives you a good starting point to get started.
gdal.Warp docs: https://gdal.org/python/osgeo.gdal-module.html#Warp
More general warp help: https://www.gdal.org/gdalwarp.html
Something like this:
import gdal
# Set up the gdal.Warp options such as desired spatial resolution,
# resampling algorithm to use and output format.
# See: https://gdal.org/python/osgeo.gdal-module.html#WarpOptions
# for other options that can be specified.
warp_options = gdal.WarpOptions(format="GTiff",
outputBounds=[min_x, min_y, max_x, max_y],
xRes=res,
yRes=res,
# PROBABLY NEED TO SET t_srs TOO
)
# Apply the warp.
# (output_file, input_file, options)
gdal.Warp("/path/to/output_file.tif",
"/path/to/input_file.hdf",
options=warp_options)
Exact code to write:
# Apply the warp.
# (output_file, input_file, options)
gdal.Warp('/path/to/output_file.tif',
'/path/to/HDF4_EOS:EOS_SWATH:"MYD021KM.A2009034.1345.006.2012058160107.hdf":MODIS_SWATH_Type_L1B:EV_1KM_RefSB',
options=warp_options)

Related

How to specify burn values by attribute using GDAL rasterize (python API)?

I'm using gdal.RasterizeLayer() to convert a shapefile to a GeoTiff using a GeoTiff template while burning the output values by ATTRIBUTE. What I want to output is a .tif where the burn value corresponds to the value of a given attribute. What I find is that gdal.RasterizeLayer() is burning to strange values that do not correspond to the values in my attribute field. Here's what I have currently:
gdalformat = 'GTiff'
datatype = gdal.GDT_Byte
# Open Shapefile
shapefile = ogr.Open(self.filename)
shapefile_layer = shapefile.GetLayer()
# Get projection info from reference image
image = gdal.Open(ref_image, gdal.GA_ReadOnly)
output = gdal.GetDriverByName(gdalformat).Create(output_tif, image.RasterXSize, image.RasterYSize, 1, datatype,
options=['COMPRESS=DEFLATE'])
output.SetProjection(image.GetProjectionRef())
output.SetGeoTransform(image.GetGeoTransform())
# Write data to band 1
band = output.GetRasterBand(1)
band.SetNoDataValue(0)
gdal.RasterizeLayer(output, [1], shapefile_layer, options=['ATTRIBUTE=FCode'])
# Close datasets
band = None
output = None
image = None
shapefile = None
# Build image overviews
subprocess.call("gdaladdo --config COMPRESS_OVERVIEW DEFLATE " + output_tif + " 2 4 8 16 32 64", shell=True)
What occurs is that the output .tif correctly assigns different burn values for each attribute, but the value does not correspond to the attribute value. For example, the input attribute value FCode=46006 turns into a burn value of 182 (and it's not clear why!). I tried adding and removing the 'COMPRESS=DEFLATE' option, and adding and removing the '3D' option for gdal.RasterizeLayer(). None affect the output burn values.
You can see the input shapefile and attribute values here: input .shp
And the output, with the incorrect values, here: output raster

I fixed this myself by changing the type to gdal.GDT_Int32.

Find neighbouring polygons in Python QGIS

I am using code I found and slightly modified for my purposes. The problem is, it is not doing exactly what I want, and I am stuck with what to change to fix it.
I am searching for all neighbouring polygons, that share common borded (a line), that is not a point
My goal: 135/12 is neigbour with 319/2 135/4, 317 but not with 320/1
What I get in my QGIS table after I run my script
NEIGBOURS are the neighbouring polygons,
SUM is the number of neighbouring polygons
The code I use also includes 320/1 as a neighbouring polygon. How to fix it?
from qgis.utils import iface
from PyQt4.QtCore import QVariant
_NAME_FIELD = 'Nr'
_SUM_FIELD = 'calc'
_NEW_NEIGHBORS_FIELD = 'NEIGHBORS'
_NEW_SUM_FIELD = 'SUM'
layer = iface.activeLayer()
layer.startEditing()
layer.dataProvider().addAttributes(
[QgsField(_NEW_NEIGHBORS_FIELD, QVariant.String),
QgsField(_NEW_SUM_FIELD, QVariant.Int)])
layer.updateFields()
feature_dict = {f.id(): f for f in layer.getFeatures()}
index = QgsSpatialIndex()
for f in feature_dict.values():
index.insertFeature(f)
for f in feature_dict.values():
print 'Working on %s' % f[_NAME_FIELD]
geom = f.geometry()
intersecting_ids = index.intersects(geom.boundingBox())
neighbors = []
neighbors_sum = 0
for intersecting_id in intersecting_ids:
intersecting_f = feature_dict[intersecting_id]
if (f != intersecting_f and
not intersecting_f.geometry().disjoint(geom)):
neighbors.append(intersecting_f[_NAME_FIELD])
neighbors_sum += intersecting_f[_SUM_FIELD]
f[_NEW_NEIGHBORS_FIELD] = ','.join(neighbors)
f[_NEW_SUM_FIELD] = neighbors_sum
layer.updateFeature(f)
layer.commitChanges()
print 'Processing complete.'

I have found somewhat a workaround for it. Before using my script, I create a small (for my purposes, 0,01 m was enough) buffer around all joints. Later, I use a Difference tool to remove the buffer areas from my main layer, thus removing not-needed neighbouring polygons. Using the code now works fine

ROS CompressedDepth to numpy (or cv2)

Folks,
I am using this link as starting point to convert my CompressedDepth (image of type: "32FC1; compressedDepth," in meters) image to OpenCV frames:
Python CompressedImage Subscriber Publisher
I get an empty data when I try to print, or I get a NonType when I see the result of my array, etc.
What is the right way to convert a compressedDepth image?
Republishing is not gonna work do to wifi/router bandwidth and speed constraints.

The right way to decode compressedDepth is to first remove the header from the raw data and then convert the remaining data.
This is documented in image_transport_plugins/compressed_depth_image_transport/src/codec.cpp.
On my machine the header size is 12 bytes. This might however be different on other architectures since the size of an enum is not defined.
The following python code snippet exports compressed 16UC1 and 32FC1 depth images as png file:
# 'msg' as type CompressedImage
depth_fmt, compr_type = msg.format.split(';')
# remove white space
depth_fmt = depth_fmt.strip()
compr_type = compr_type.strip()
if compr_type != "compressedDepth":
raise Exception("Compression type is not 'compressedDepth'."
"You probably subscribed to the wrong topic.")
# remove header from raw data
depth_header_size = 12
raw_data = msg.data[depth_header_size:]
depth_img_raw = cv2.imdecode(np.fromstring(raw_data, np.uint8), cv2.CV_LOAD_IMAGE_UNCHANGED)
if depth_img_raw is None:
# probably wrong header size
raise Exception("Could not decode compressed depth image."
"You may need to change 'depth_header_size'!")
if depth_fmt == "16UC1":
# write raw image data
cv2.imwrite(os.path.join(path_depth, "depth_" + str(msg.header.stamp) + ".png"), depth_img_raw)
elif depth_fmt == "32FC1":
raw_header = msg.data[:depth_header_size]
# header: int, float, float
[compfmt, depthQuantA, depthQuantB] = struct.unpack('iff', raw_header)
depth_img_scaled = depthQuantA / (depth_img_raw.astype(np.float32)-depthQuantB)
# filter max values
depth_img_scaled[depth_img_raw==0] = 0
# depth_img_scaled provides distance in meters as f32
# for storing it as png, we need to convert it to 16UC1 again (depth in mm)
depth_img_mm = (depth_img_scaled*1000).astype(np.uint16)
cv2.imwrite(os.path.join(path_depth, "depth_" + str(msg.header.stamp) + ".png"), depth_img_mm)
else:
raise Exception("Decoding of '" + depth_fmt + "' is not implemented!")

Generate 2d images of molecules from PubChem FTP data

Rather than crawl PubChem's website, I'd prefer to be nice and generate the images locally from the PubChem ftp site:
ftp://ftp.ncbi.nih.gov/pubchem/specifications/
The only problem is that I'm limited to OSX and Linux and I can't seem to find a way of programmatically generating the 2d images that they have on their site. See this example:
https://pubchem.ncbi.nlm.nih.gov/compound/6#section=Top
Under the heading "2D Structure" we have this image here:
https://pubchem.ncbi.nlm.nih.gov/image/imgsrv.fcgi?cid=6&t=l
That is what I'm trying to generate.

If you want something working out of the box I would suggest using molconvert from ChemAxon's Marvin (https://www.chemaxon.com/products/marvin/), which is free for academics. It can be used easily from the command line and it supports plenty of input and output formats. So for your example it would be:
molconvert "png" -s "C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl" -o cdnb.png
Resulting in the following image:
It also allows you to set parameters such as width, height, quality, background color and so on.
However, if you are a programmer I would definitely recommend RDKit. Follows a code which generates images for a pair of compounds given as smiles.
from rdkit import Chem
from rdkit.Chem import Draw
ms_smis = [["C1=CC(=C(C=C1[N+](=O)[O-])[N+](=O)[O-])Cl", "cdnb"],
["C1=CC(=CC(=C1)N)C(=O)N", "3aminobenzamide"]]
ms = [[Chem.MolFromSmiles(x[0]), x[1]] for x in ms_smis]
for m in ms: Draw.MolToFile(m[0], m[1] + ".svg", size=(800, 800))
This gives you following images:

So I also emailed the PubChem guys and they got back to me very quickly with this response:
The only bulk access we have to images is through the download
service: https://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi
You can request up to 50,000 images at a time.
Which is better than I was expecting, but still not amazing since it requires downloading things that I in theory could generate locally. So I'm leaving this question open until some kind soul writes an open source library to do the same.
Edit:
I figure I might as well save people some time if they are doing the same thing as I am. I've created a Ruby Gem backed on Mechanize to automate the downloading of images. Please be kind to their servers and only download what you need.
https://github.com/zachaysan/pubchem
gem install pubchem

An open source option is the Indigo Toolkit, which also has pre-compiled packages for Linux, Windows, and MacOS and language bindings for Python, Java, .NET, and C libraries. I chose the 1.4.0 beta.
I had a similar interest to yours in converting SMILES to 2D structures and adapted my Python to address your question and to capture timing information. It uses the PubChem FTP (Compound/Extras) download of CID-SMILES.gz. The following script is an implementation of a local SMILES-to-2D-structure converter that reads a range of rows from the PubChem CID-SMILES file of isomeric SMILES (which contains over 102 million compound records) and converts the SMILES to PNG images of the 2D structures. In three tests with 1000 SMILES-to-structure conversions, it took 35, 50, and 60 seconds to convert 1000 SMILES at file row offsets of 0, 100,000, and 10,000,000 on my Windows 10 laptop (Intel i7-7500U CPU, 2.70GHz) with a solid state drive and running Python 3.7.4. The 3000 files totaled 100 MB in size.
from indigo import *
from indigo.renderer import *
import subprocess
import datetime
def timerstart():
# start timer and print time, return start time
start = datetime.datetime.now()
print("Start time =", start)
return start
def timerstop(start):
# end timer and print time and elapsed time, return elapsed time
endtime = datetime.datetime.now()
elapsed = endtime - start
print("End time =", endtime)
print("Elapsed time =", elapsed)
return elapsed
numrecs = 1000
recoffset = 0 # 10000000 # record offset
starttime = timerstart()
indigo = Indigo()
renderer = IndigoRenderer(indigo)
# set render options
indigo.setOption("render-atom-color-property", "color")
indigo.setOption("render-coloring", True)
indigo.setOption("render-comment-position", "bottom")
indigo.setOption("render-comment-offset", "20")
indigo.setOption("render-background-color", 1.0, 1.0, 1.0)
indigo.setOption("render-output-format", "png")
# set data path (including data file) and output file path
datapath = r'../Download/CID-SMILES'
pngpath = r'./2D/'
# read subset of rows from data file
mycmd = "head -" + str(recoffset+numrecs) + " " + datapath + " | tail -" + str(numrecs)
print(mycmd)
(out, err) = subprocess.Popen(mycmd, stdout=subprocess.PIPE, shell=True).communicate()
lines = str(out.decode("utf-8")).split("\n")
count = 0
for line in lines:
try:
cols = line.split("\t") # split on tab
key = cols[0] # cid in cols[0]
smiles = cols[1] # smiles in cols[1]
mol = indigo.loadMolecule(smiles)
s = "CID=" + key
indigo.setOption("render-comment", s)
#indigo.setOption("render-image-size", 200, 250)
#indigo.setOption("render-image-size", 400, 500)
renderer.renderToFile(mol, pngpath + key + ".png")
count += 1
except:
print("Error processing line after", str(count), ":", line)
pass
elapsedtime = timerstop(starttime)
print("Converted", str(count), "SMILES to PNG")

Need to take data from text file to spreadsheet for analysis

I am working with data which I get in text files, and which has to be subsequently analysed. I'm currently using Excel for this task. The original file looks like this:
Contact Angle (deg) 86.20
Wetting Tension (dy/cm) 4.836
Wetting Tension Left (dy/cm) 39.44
Wetting Tension Right (dy/cm) 39.44
Base Tilt Angle (deg) 0.00
Base (mm) 1.6858
Base Area (mm2) 2.2322
Height (mm) 0.7888
Tip Width (mm) 0.9707
Wetted Tip Width (mm) 0.9581
Sessile Volume (ul) 1.1374
Sessile Surface Area (mm2) 4.1869
Contrast (cts) 245
Sharpness (cts) 161
Black Peak (cts) 10
White Peak (cts) 255
Edge Threshold (cts) 111
Base Left X (mm) 4.138
Base Right X (mm) 5.821
Base Y (mm) 2.980
RMS Fit Error (mm) 3.545E-3
#1600
I don't need the majority of this information, and for now, all I need is the Contact Angle at the top, and the time (prefixed by the '#' at the bottom). At the moment, I have a script which extracts the information I need and creates another text file for easy reading. The code used is below:
infile = "in.txt"
outfile = "newout.out"
measure_time = ""
with open(infile) as f, open(outfile, 'w') as f2:
for line in f:
if line.split():
if line.split()[0] == "Contact":
contact_angle = line.split()[-1].strip()
f2.write("Contact Angle (deg): " + contact_angle + '\n')
if line.split()[0][0] == '#':
for i in range(1,5):
measure_time += (line.split()[0][i])
f2.write("Measured at: " + measure_time[:2] + ":" + measure_time[2:] + '\n')
measure_time = ""
else:
continue
What I am looking for is a way to get my data nicely formatted in a spreadsheet for easy analysis. I would like the angles in the same row, in adjoining cells, and the measurement time in the cells below that, but I'm unsure what the best way to go about this is.
Can anyone with some more Python experience help me here?
EDIT: The image here shows what I tried to explain (poorly) above.
EDIT2: The solution posted below by #RonRosenfeld works, but I would still prefer to have a Python solution for this problem, as stated earlier. As I have no previous experience with Excel VBA, I would rather use something familiar to me.

I would just read the original file or files into Excel, selecting only those lines that begin with the Contact Angle, or # token. I'm not sure how much error checking you need to do. The following assumes that you will select multiple files, and that each file is formatted as you demonstrated in your original data. It will output the angles in row 1, and the corresponding times in row 2. It does NOT check for proper formatting; or that every Angle has a corresponding Time.
It also does NOT test and will give an error, if you only select one file. That capability can be added, if necessary.
EDIT: modified to account for either TAB or SPACE as the separator; also added code to clear worksheet and autofit the columns
It should also be easy to modify if you want to select additional parameters.
Option Explicit
'Set Reference to Microsoft Scripting Runtime
Sub GetDataFromTextFiles()
Dim FSO As FileSystemObject
Dim TS As TextStream
Dim F As File
Dim sLines As Variant
Dim I As Long, J As Long
Dim sFilePath
Dim S As String
Dim vLines() As Variant
Dim rExtract As Range
'Hard Coded here but could also use a
'User form to select multiple lines
vLines = Array("#", "Contact Angle")
Set rExtract = [b3]
Cells.Clear
[a3] = "Contact Angle (deg)"
[a4] = "Measured At"
sFilePath = Application.GetOpenFilename("Text Files (*.txt), *.txt", MultiSelect:=True)
Set FSO = New FileSystemObject
For J = LBound(sFilePath) To UBound(sFilePath)
Set TS = FSO.OpenTextFile(sFilePath(J), ForReading)
Do Until TS.AtEndOfStream = True
S = Trim(Replace(TS.ReadLine, Chr(9), Chr(32)))
For I = 0 To UBound(vLines)
If InStr(1, S, vLines(I)) = 1 Then
Select Case I
Case 0 '#
With rExtract(2, 1)
.Value = TimeSerial(Int(Mid(S, 2) / 100), Mid(S, 2) Mod 100, 0)
.NumberFormat = "hh:mm"
End With
Case 1 '#
rExtract(1, 1) = Mid(S, InStrRev(S, " ") + 1)
'advance to next column after outputting angle
Set rExtract = rExtract(1, 2)
End Select
End If
Next I
Loop
Next J
Cells.EntireColumn.AutoFit
End Sub
Here is another macro that does NOT require setting a reference to Microsoft Scripting Runtime. It does not use the FileSystemObject, but rather uses built-in VBA routines to read the file. I have been told that it will run more quickly, but I've not tested it myself. In addition, there could be issues with certain types of data, but they do not seem to exist in your files, and it runs fine on your sample.
Option Explicit
Sub GetDataFromTextFiles()
Dim sLines As Variant
Dim I As Long, J As Long
Dim sFilePath
Dim S As String
Dim vLines() As Variant
Dim rExtract As Range
'Hard Coded here but could also use a
'User form to select multiple lines
vLines = Array("#", "Contact Angle")
Set rExtract = [b3]
Cells.Clear
[a3] = "Contact Angle (deg)"
[a4] = "Measured At"
sFilePath = Application.GetOpenFilename("Text Files (*.txt), *.txt", MultiSelect:=True)
For J = LBound(sFilePath) To UBound(sFilePath)
Open sFilePath(J) For Input As #1
Do While Not EOF(1)
Input #1, S
S = Trim(Replace(S, Chr(9), Chr(32)))
For I = 0 To UBound(vLines)
If InStr(1, S, vLines(I)) = 1 Then
Select Case I
Case 0 '#
With rExtract(2, 1)
.Value = TimeSerial(Int(Mid(S, 2) / 100), Mid(S, 2) Mod 100, 0)
.NumberFormat = "hh:mm"
End With
Case 1
rExtract(1, 1) = Mid(S, InStrRev(S, " ") + 1)
'advance to next column after outputting angle
Set rExtract = rExtract(1, 2)
End Select
End If
Next I
Loop
Close #1
Next J
Cells.EntireColumn.AutoFit
End Sub

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.