GDAL WriteArray issue - python

I'm using Python GDAL to write raster data to a .tif file. Here's the code:
import numpy, sys
from osgeo import gdal, utils
from osgeo.gdalconst import *
# register all of the GDAL drivers
gdal.AllRegister()
# open the image
inDs = gdal.Open("C:\\Documents and Settings\\patrick\\Desktop\\tiff elevation\\EBK1KM\\color_a1.tif",GDT_UInt16)
if inDs is None:
    print "couldn't open input dataset"
    sys.exit(1)
else:
    print "opening was successful!"
cols = inDs.RasterXSize
rows = inDs.RasterYSize
bands = inDs.RasterCount
driver = inDs.GetDriver()
driver.Create("C:\\Documents and Settings\\patrick\\Desktop\\tiff elevation\\EBK1KM\\newfile.tif",cols,rows,3,GDT_UInt16)
outDs = gdal.Open("C:\\Documents and Settings\\patrick\\Desktop\\tiff elevation\\EBK1KM\\newfile.tif")
if outDs is None:
    print "failure to create new file"
    sys.exit(1)
outBand1 = outDs.GetRasterBand(1)
outBand2 = outDs.GetRasterBand(2)
outBand3 = outDs.GetRasterBand(3)
data1 = inDs.GetRasterBand(1).ReadAsArray()
data2 = inDs.GetRasterBand(2).ReadAsArray()
data3 = inDs.GetRasterBand(3).ReadAsArray()
outBand1.WriteArray(data1,0,0)
outBand2.WriteArray(data2,0,0)
outBand3.WriteArray(data3,0,0)
print "before closing out the file"
print outDs.GetRasterBand(1).ReadAsArray(700,700,5,5)
print outDs.GetRasterBand(2).ReadAsArray(700,700,5,5)
print outDs.GetRasterBand(3).ReadAsArray(700,700,5,5)
outDs.SetProjection(inDs.GetProjection())
outDs.SetGeoTransform(inDs.GetGeoTransform())
outDs = None
outDs = gdal.Open("C:\\Documents and Settings\\patrick\\Desktop\\tiff elevation\\EBK1KM\\newfile.tif")
print "after reopening"
print outDs.GetRasterBand(1).ReadAsArray(700,700,5,5)
print outDs.GetRasterBand(2).ReadAsArray(700,700,5,5)
print outDs.GetRasterBand(3).ReadAsArray(700,700,5,5)
The output differs before closing and after reopening the output dataset:
before closing out the file
[[ 36 35 55 121 0]
[ 54 0 111 117 0]
[ 0 117 152 56 0]
[ 89 122 56 0 0]
[102 107 0 25 53]]
[[ 68 66 126 200 0]
[ 78 0 166 157 0]
[ 0 235 203 70 0]
[229 251 107 0 0]
[241 203 0 42 121]]
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
after reopening
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
Is there some command I'm missing to ensure that the file is written and saved before setting the variable to None? I've tried adding both of the following with no luck:
outband1.FlushCache()
outDs.FlushCache()

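You don't need to Create and then Open the raster: gdal.Open defaults to read-only access (GA_ReadOnly), so the dataset you reopened was not writable, whereas driver.Create already returns a writable dataset. You also don't need gdal.AllRegister() at the beginning, as it has already been called when you load GDAL into Python (see the Raster API tutorial).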
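Picking up after the input dataset has been opened (with modifications; out_fname is the output path, and driver, cols, and rows come from the question's code):
# Read the three input bands into a list of arrays
data = [inDs.GetRasterBand(i + 1).ReadAsArray() for i in range(3)]
# Create a new raster data source; Create() returns a writable dataset
outDs = driver.Create(out_fname, cols, rows, 3, gdal.GDT_UInt16)
# Write metadata
outDs.SetGeoTransform(inDs.GetGeoTransform())
outDs.SetProjection(inDs.GetProjection())
# Write raster data sets
for i in range(3):
    outBand = outDs.GetRasterBand(i + 1)
    outBand.WriteArray(data[i])
# Close raster file
outDs = None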
Sometimes I add this to ensure the file is fully deallocated, and to prevent running into some gotchas:
del outDs, outBand

Related

Rasterio grayscale .tif image is empty

I'm facing an issue when opening a .tif with rasterio using the code below.
import rasterio
fp = 'image.tif'
image = rasterio.open(fp)
print(image.read())
When printing the content of the image, I get this
[[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]]
I verified all values and they are all 0. However, when I drag the image into QGIS, I can view it and confirm that it contains values ranging from 101 to 122.
[Screenshot: the image as displayed in QGIS]
Any idea how to read the image and get these 101 to 122 values as a numpy array?
Here's a link to the image in question

LabelBinarizer gives all values zeros

I'm encoding my labels with label binarizer like this:
from sklearn.preprocessing import LabelBinarizer
# Transform labels to one-hot
lb = LabelBinarizer()
Y = lb.fit_transform(df.classification)
But when I print Y I get all zeros like:
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]]
I don't know if all the values in all rows are zeros or not. Unfortunately, I can't see the complete row and couldn't find a way to do so. Are these values right or not?
Any help would be appreciated.
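A minimal sketch of how one might check such a matrix without printing every row. The array below is a hypothetical stand-in for the Y returned by LabelBinarizer (the real data is not shown in the question), built with NumPy only. With three or more classes a one-hot matrix has exactly one 1 per row, so counting non-zeros per row says more than the truncated printout does:

```python
import numpy as np

# Hypothetical stand-in for Y: 2000 samples, 1500 classes, one 1 per row.
rng = np.random.default_rng(0)
labels = rng.integers(0, 1500, size=2000)
Y = np.zeros((2000, 1500), dtype=int)
Y[np.arange(2000), labels] = 1

# The default printout only shows the corners of a large array,
# so it can look all-zero even though every row contains a 1.
print(Y)

# Count the non-zero entries in each row instead of reading the printout.
nonzero_per_row = np.count_nonzero(Y, axis=1)
print(nonzero_per_row.min(), nonzero_per_row.max())

# Column index of the 1 in the first row.
print(np.flatnonzero(Y[0]))
```

If the whole array really has to be printed, np.set_printoptions(threshold=...) raises the size at which NumPy starts abbreviating, but for a matrix this wide the per-row counts are usually easier to read.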

matlab's bwmorph(image, 'spur') in python

I'm porting a MATLAB image processing script over to Python/skimage and haven't been able to find an equivalent of MATLAB's bwmorph function in skimage, specifically the 'spur' operation. The MATLAB docs say this about the spur operation:
Removes spur pixels. For example:
0 0 0 0           0 0 0 0
0 0 0 0           0 0 0 0
0 0 1 0  becomes  0 0 0 0
0 1 0 0           0 1 0 0
1 1 0 0           1 1 0 0
I've implemented a version in Python that handles the above case fine:
import numpy as np
from scipy import ndimage
def _neighbors_conv(image):
    image = image.astype(int)
    # 8-neighbor kernel; pixels outside the image count as set (cval=1)
    k = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
    neighborhood_count = ndimage.convolve(image, k, mode='constant', cval=1)
    neighborhood_count[~image.astype(bool)] = 0
    return neighborhood_count
def spur(image):
    return _neighbors_conv(image) > 1
def bwmorph(image, fn, n=1):
    for _ in range(n):
        image = fn(image)
    return image
t = [[0, 0, 0, 0],
     [0, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 1, 0, 0]]
t = np.array(t)
print('neighbor count:')
print(_neighbors_conv(t))
print('after spur:')
print(bwmorph(t, spur).astype(int))
neighbor count:
[[0 0 0 0]
[0 0 1 0]
[0 3 0 0]
[7 5 0 0]]
after spur:
[[0 0 0 0]
[0 0 0 0]
[0 1 0 0]
[1 1 0 0]]
The above works by removing any pixels that only have a single neighboring pixel.
I have noticed that the above implementation behaves differently than matlab's spur operation though. Take this example in matlab:
0 0 0 0 0
0 0 1 0 0
0 1 1 1 1
0 0 1 0 0
0 0 0 0 0
becomes, via bwmorph(t,'spur',1):
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 0 0
0 0 0 0 0
The spur operation is a bit more complex than looking at the 8-neighbor count. It is not clear to me how to extend my implementation to satisfy this case without making it too aggressive (i.e. removing valid pixels).
What is the underlying logic of matlab's spur or is there a python implementation already available that I can use?
UPDATE:
I have found Octave's implementation of spur that uses a LUT:
case('spur')
## lut=makelut(inline("xor(x(2,2),(sum((x&[0,1,0;1,0,1;0,1,0])(:))==0)&&(sum((x&[1,0,1;0,0,0;1,0,1])(:))==1)&&x(2,2))","x"),3);
## which is the same as
lut=repmat([zeros(16,1);ones(16,1)],16,1); ## identity
lut([18,21,81,273])=0; ## 4 qualifying patterns
lut=logical(lut);
cmd="BW2=applylut(BW, lut);";
(via https://searchcode.com/codesearch/view/9585333/)
Assuming that is correct I just need to be able to create this LUT in python and apply it...
I ended up implementing my own version of spur and the other bwmorph operations. For future internet travelers who have the same need, here is a handy gist of what I ended up using:
https://gist.github.com/bmabey/4dd36d9938b83742a88b6f68ac1901a6
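For reference, the Octave LUT quoted in the update can be sketched in NumPy roughly as follows. This is only a reading of the four-pattern table shown above, not the code from the gist; the helper names make_spur_lut and apply_lut are made up for illustration, and the 3x3 neighborhood is assumed to be indexed column-major with zero padding at the border, as in makelut/applylut.

```python
import numpy as np

def make_spur_lut():
    # Identity table: the output equals the center pixel, which is bit 4
    # when the 3x3 neighborhood is read column-major.
    lut = ((np.arange(512) >> 4) & 1).astype(bool)
    # Octave's 1-based entries 18, 21, 81, 273 are 17, 20, 80, 272 here:
    # center pixel set plus exactly one diagonal neighbor.
    lut[[17, 20, 80, 272]] = False
    return lut

def apply_lut(bw, lut):
    bw = np.asarray(bw).astype(bool)
    rows, cols = bw.shape
    # Zero-pad so border pixels see an empty neighborhood outside the image.
    padded = np.pad(bw, 1, mode="constant", constant_values=False).astype(np.int64)
    index = np.zeros(bw.shape, dtype=np.int64)
    for c in range(3):
        for r in range(3):
            bit = c * 3 + r  # column-major position inside the 3x3 window
            index += padded[r:r + rows, c:c + cols] << bit
    return lut[index]

t = np.array([[0, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 0]])
spurred = apply_lut(t, make_spur_lut()).astype(int)
print(spurred)

cross = np.array([[0, 0, 0, 0, 0],
                  [0, 0, 1, 0, 0],
                  [0, 1, 1, 1, 1],
                  [0, 0, 1, 0, 0],
                  [0, 0, 0, 0, 0]])
print(apply_lut(cross, make_spur_lut()).astype(int))
```

Read this way, the table only clears pixels whose single neighbor is a diagonal one. It reproduces the first example from the MATLAB docs, but it leaves the cross-shaped example unchanged, so it does not match the bwmorph(t,'spur',1) result shown in the question.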

rotate an nxnxn matrix in python

I have a binary array of size 64x64x64, where a volume of 40x40x40 is set to "1" and rest is "0". I have been trying to rotate this cube about its center around z-axis using skimage.transform.rotate and also Opencv as:
import cv2
import numpy as np
def rotateImage(image, angle):
    row, col = image.shape
    # cv2 expects the center as (x, y), i.e. (column, row)
    center = (col / 2, row / 2)
    rot_mat = cv2.getRotationMatrix2D(center, angle, 1.0)
    new_image = cv2.warpAffine(image, rot_mat, (col, row))
    return new_image
In the case of OpenCV, I tried a 2D rotation of each individual slice of the cube (Cube[:,:,n=1,2,3...p]).
After rotating, the total sum of the values in the array changes. This may be caused by interpolation during rotation. How can I rotate a 3D array of this kind without adding anything to the array?
OK, so I understand now what you are asking. The closest I can come up with is scipy.ndimage. There is also a way to interface with ImageJ from Python, which might be easier. But here is what I did with scipy.ndimage:
import numpy as np
from scipy import ndimage
angle = 25  # angle should be in degrees
Rotatedim = ndimage.rotate(yourimage, angle, reshape=False, output=np.int32, order=5, prefilter=False)
This preserved the sum for some angles and not for others; by playing around more with the parameters you might be able to get your desired outcome.
One option is to convert into sparse, and transform the coordinates using a matrix rotation. Then transform back into dense. In 2 dimensions, this looks like:
import numpy as np
import scipy.sparse
import math
N = 10
space = np.zeros((N, N), dtype=np.int8)
space[3:7, 3:7].fill(1)
print(space)
print(np.sum(space))
space_coo = scipy.sparse.coo_matrix(space)
Coords = np.array(space_coo.nonzero()) - 3
theta = 30 * 3.1416 / 180
R = np.array([[math.cos(theta), math.sin(theta)], [-math.sin(theta), math.cos(theta)]])
space2_coords = R.dot(Coords)
space2_coords = np.round(space2_coords).astype(int)
space2_coords += 3
space2_sparse = scipy.sparse.coo_matrix(([1] * space2_coords.shape[1], (space2_coords[0], space2_coords[1])), shape=(N, N))
space2 = space2_sparse.todense()
print(space2)
print(np.sum(space2))
Output:
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 1 1 1 1 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
16
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 1 1 1 1 0 0 0 0]
[0 0 1 1 1 1 1 0 0 0]
[0 1 1 0 1 1 0 0 0 0]
[0 0 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]]
16
The advantage is that you'll get exactly as many 1 values before and after the transform. The downside is that you might get 'holes', as above, and/or duplicate coordinates, giving values of '2' in the final dense matrix.
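The same coordinate-rotation idea can be extended to the 64x64x64 case from the question. The sketch below is an assumption-laden illustration rather than a tested recipe for every angle: the function name rotate_points_about_z is made up, the rotation is taken in the plane of the first two array axes (i.e. about the third axis), and duplicate target voxels are accumulated with np.add.at so the total is preserved as long as no point leaves the grid.

```python
import numpy as np

def rotate_points_about_z(volume, angle_deg):
    vol = np.asarray(volume)
    mask = vol != 0
    coords = np.argwhere(mask).astype(float)   # one (i, j, k) row per non-zero voxel
    values = vol[mask]                         # same C order as np.argwhere
    center = (np.array(vol.shape[:2]) - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Rotate only the first two coordinates; the third axis is left untouched.
    coords[:, :2] = np.rint((coords[:, :2] - center) @ rot.T + center)
    new = coords.astype(int)
    # Drop points that would land outside the array (this would change the sum).
    inside = np.all((new >= 0) & (new < np.array(vol.shape)), axis=1)
    out = np.zeros(vol.shape, dtype=np.int64)
    # np.add.at accumulates duplicates, so two points landing on one voxel give 2.
    np.add.at(out, tuple(new[inside].T), values[inside])
    return out

cube = np.zeros((64, 64, 64), dtype=np.int32)
cube[12:52, 12:52, 12:52] = 1
rotated = rotate_points_about_z(cube, 30)
# Both totals are 64000 here, because the rotated 40x40x40 cube stays inside the grid.
print(cube.sum(), rotated.sum())
```

As with the 2D version, the result can contain holes and voxels with values above 1, so it keeps the sum but not the binary nature of the array.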

outputting large matrix in python from a dictionary

I have a python dictionary formatted in the following way:
data[author1][author2] = 1
This dictionary contains an entry for every possible author pair (all pairs of 8500 authors), and I need to output a matrix that looks like this for all author pairs:
        "auth1" "auth2" "auth3" "auth4" ...
"auth1"    0       1       0       3
"auth2"    1       0       2       0
"auth3"    0       2       0       1
"auth4"    3       0       1       0
...
I have tried the following method:
x = numpy.array([[data[author1][author2] for author2 in sorted(data[author1])] for author1 in sorted(data)])
print x
outf.write(x)
However, printing this leaves me with this:
[[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
...,
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]
[0 0 0 ..., 0 0 0]]
and the output file is just a blank text file. I am trying to format the output in a way to read into Gephi (https://gephi.org/users/supported-graph-formats/csv-format/)
You almost got it right; your list comprehension is inverted. This will give you the expected result:
import numpy as np
d = dict(auth1=dict(auth1=0, auth2=1, auth3=0, auth4=3),
         auth2=dict(auth1=1, auth2=0, auth3=2, auth4=0),
         auth3=dict(auth1=0, auth2=2, auth3=0, auth4=1),
         auth4=dict(auth1=3, auth2=0, auth3=1, auth4=0))
names = sorted(d)
np.array([[d[i][j] for i in names] for j in names])
#array([[0, 1, 0, 3],
# [1, 0, 2, 0],
# [0, 2, 0, 1],
# [3, 0, 1, 0]])
You could use pandas. Using @Saullo Castro's input:
import pandas as pd
df = pd.DataFrame.from_dict(d)
Result:
>>> df
       auth1  auth2  auth3  auth4
auth1      0      1      0      3
auth2      1      0      2      0
auth3      0      2      0      1
auth4      3      0      1      0
And if you want to save it, you can just call df.to_csv(file_name).
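If pulling in pandas is not desirable, the labelled matrix can also be written row by row with the standard csv module. This is a minimal sketch under a few assumptions: write_matrix_csv is a made-up helper name, the semicolon delimiter is a guess at what the Gephi matrix format expects (check the format page linked in the question), and missing pairs default to 0 even though the question says every pair is present.

```python
import csv
import io

def write_matrix_csv(data, out, delimiter=";"):
    """Write a nested dict data[row][col] as a labelled square matrix."""
    names = sorted(data)
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    # Header row: empty corner cell followed by the column labels.
    writer.writerow([""] + names)
    for row_name in names:
        row = data[row_name]
        # Assumption: a missing pair is written as 0.
        writer.writerow([row_name] + [row.get(col_name, 0) for col_name in names])

d = dict(auth1=dict(auth1=0, auth2=1, auth3=0, auth4=3),
         auth2=dict(auth1=1, auth2=0, auth3=2, auth4=0),
         auth3=dict(auth1=0, auth2=2, auth3=0, auth4=1),
         auth4=dict(auth1=3, auth2=0, auth3=1, auth4=0))

buffer = io.StringIO()  # stands in for an open text file
write_matrix_csv(d, buffer)
matrix_text = buffer.getvalue()
print(matrix_text)
```

Writing one row at a time avoids building the full 8500 x 8500 array in memory before saving, and passing an open file object instead of the StringIO buffer writes the same text to disk.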
