Hello, I have a problem converting a TIFF file to a NumPy array.
I have a 16-bit signed raster file that I want to convert to a NumPy array.
I am using the GDAL library for this.
import numpy
from osgeo import gdal
ds = gdal.Open("C:/.../dem.tif")
dem = numpy.array(ds.GetRasterBand(1).ReadAsArray())
At first glance everything converts well, but when I compared the result obtained in Python with the result in GIS software, I got different values.
Python result
Arcmap result
I found many values in the NumPy array below 91 and above 278 (the real min and max values), which should not exist.
GDAL already returns a NumPy array, and wrapping it in np.array creates a copy by default, which is an unnecessary performance hit. Just use:
dem = ds.GetRasterBand(1).ReadAsArray()
Or, if it's a single-band raster, simply:
dem = ds.ReadAsArray()
Regarding the statistics, are you sure ArcMap shows the absolute high/low values? QGIS, for example, often draws the statistics from a sample of the dataset (for performance) and, depending on the settings, sometimes uses percentiles (e.g. 1% and 99%).
Edit: BTW, is this a public dataset, like an SRTM tile? It might help if you list the source.
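One more thing worth checking (an assumption, since we can't see the data): signed 16-bit DEMs often carry a NoData fill value such as -32768, which ArcMap excludes from its statistics but which shows up in the raw array. The band's GetNoDataValue() tells you the fill; a minimal sketch of masking it, using a synthetic array in place of the real raster:

```python
import numpy as np

# Synthetic stand-in for the DEM: real values span 91..278, plus a
# hypothetical NoData fill of -32768 (query band.GetNoDataValue() for the real one).
NODATA = -32768
dem = np.array([[91, 150, 278],
                [NODATA, 200, NODATA]], dtype=np.int16)

valid = dem[dem != NODATA]       # mask the fill value before taking statistics
print(valid.min(), valid.max())  # 91 278
```

If the min/max over the masked array match ArcMap, the "impossible" values were just the fill.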
Related
I have a big file saved from MATLAB with version -v7.3. When I read it in Python, the shape of the matrix changes! Is that normal?
For example, let's take the matrix below in MATLAB:
clear all, clc
A = randn(10,3) + randn(10,3)*i;
save('example.mat','-v7.3'); %% The saved file is example.mat with version 7.3
Above, the saved file is example.mat, containing a matrix of size (10,3).
Now let's go to Python to read that file:
import numpy as np
import h5py as h5
data_try = h5.File('example.mat', 'r')
A = np.array(data_try['A'])
A = A.view(np.complex128) # here the matrix should be equivalent to the one in MATLAB
But what I find is that A in Python has size (3,10)! And when the matrix has three dimensions, the shape changes as well.
Is it normal that Python reads the transpose of a matrix coming from MATLAB, or is something going wrong?
However, when using the other way, as below:
import scipy.io as spio
Data = spio.loadmat('example.mat', squeeze_me=True)
A = Data['A']
In that case everything works nicely, but unfortunately we cannot use that approach for big matrices!
Please, is there any solution for this issue?
You might be facing a problem with the different memory layouts of MATLAB (column-major) and NumPy (row-major)... check e.g. this question for a related discussion and a solution (reshaping in Fortran order, which is also column-major).
SciPy's .mat interface takes care of this reinterpretation automatically, which is why you don't encounter the problem when using it.
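A minimal NumPy-only sketch of the shape fix (assuming the usual h5py read; reassembling the complex values, as in the question, is a separate step):

```python
import numpy as np

# What h5py hands back: MATLAB's (10, 3) matrix arrives with shape (3, 10),
# because MATLAB stores column-major and HDF5/NumPy read row-major.
from_h5py = np.arange(30).reshape(3, 10)

# For 2-D data a plain transpose restores the MATLAB shape:
A = from_h5py.T
print(A.shape)                          # (10, 3)

# For N-D data, reversing all axes (.T does exactly that) is the general fix:
cube_from_h5py = np.zeros((4, 3, 10))   # MATLAB shape was (10, 3, 4)
print(cube_from_h5py.T.shape)           # (10, 3, 4)
```

Transposing is cheap here: NumPy only flips the strides, no data is copied, so it works for big matrices too.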
I'm currently using Python 3.7.3 on Linux CentOS 7. I am conducting research related to NASA's THEMIS All-Sky-Imager (ASI) database, where I attempt to extract auroral images (keograms) from the .cdf files (which describe visual information for each auroral epoch in the form of a NumPy array). I convert the NumPy array to 'approximately' its corresponding image using Matplotlib and PIL (Python Imaging Library) in the following code:
I would like to note that I'm using the Python3 console within the Linux terminal. Also, I am using the SpacePy Library to read the .cdf file using the pycdf.CDF(...) function.
# Define .cdf file object as downloaded from NASA THEMIS ASI Database
>>> cdf = pycdf.CDF('/projectnb/burbsp/big/SATELLITE/themis/data/thg/l1/asi/gill/2008/01/thg_l1_asf_gill_2008011403_v01.cdf')
# Display .cdf file objects; 'thg_asf_gill' defines the set of arrays which contains the visual information stored for each image; i.e. cdf['thg_asf_gill'][0,:,:] would define the first epoch array up to cdf['thg_asf_gill'][1197,:,:] which is the last epoch array (a total 1198 epoch arrays).
>>> print(cdf)
range_epoch: CDF_EPOCH [2]
thg_asf_gill: CDF_UINT2 [1198, 256, 256]
thg_asf_gill_column: CDF_UINT2 [256] NRV
thg_asf_gill_epoch: CDF_EPOCH [1198]
thg_asf_gill_epoch0: CDF_EPOCH [] NRV
thg_asf_gill_row: CDF_UINT2 [256] NRV
thg_asf_gill_tend: CDF_REAL8 [1198]
thg_asf_gill_time: CDF_REAL8 [1198]
Then, I use what I've seen suggested on Stack Overflow to convert the NumPy array (taking the last epoch array as an example) into an image plot.
>>> pyplot.imshow(cdf['thg_asf_gill'][1197,:,:])
>>> pyplot.show()
The resulting image (Image Plot of Aurora) can be found below:
As you can see there are some features that aren't shown because they are a bit too dim, so I tried to play around with the color scaling using the following:
pyplot.clim(vmin,vmax)
where vmin and vmax are some values; in particular I used vmin = 5000 and vmax = 10000 (I also played around with other values). The resulting image (Altered Image Plot of Aurora) is:
The image becomes too distorted (resolution-wise).
The NASA THEMIS ASI image is found below:
As you can see, the image yielded from PIL and Matplotlib is somewhat of a transformation/rotation of the actual NASA image. Also, the dim-ish features are lost. So, what's a way to increase the resolution of the resultant image such that the bright features don't become too bright/distorted (as seen in the altered image) and the dim features are enhanced?
Thank you!
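Regarding the contrast question above, one technique worth trying (a sketch of a standard percentile stretch, not anything THEMIS-specific): clip the frame at two percentiles and rescale, so a few very bright pixels stop dominating while dim features get stretched. The orientation difference versus the NASA product can usually be undone with np.rot90 or a transpose. The frame below is synthetic:

```python
import numpy as np

# Synthetic 16-bit counts standing in for one cdf['thg_asf_gill'] frame.
rng = np.random.default_rng(0)
frame = rng.integers(3000, 12000, size=(256, 256)).astype(np.uint16)

# Clip at the 1st/99th percentiles, then rescale to 0..255.
lo, hi = np.percentile(frame, (1, 99))
stretched = np.clip(frame, lo, hi)
stretched = ((stretched - lo) / (hi - lo) * 255).astype(np.uint8)
print(stretched.min(), stretched.max())   # 0 255

# If the orientation is off compared to the reference image:
oriented = np.rot90(stretched)
```

Passing the stretched array to pyplot.imshow (or vmin=lo, vmax=hi directly) keeps both ends of the histogram visible.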
I have a large 40 MB (about 173,397 lines) .dat file filled with binary data (random symbols). It is an astronomical photograph. I need to read and display it with Python. I am using a binary file because I will need to extract pixel-value data from specific regions of the image, but for now I just need to ingest it into Python; something like the READU procedure in IDL. I tried numpy and matplotlib but nothing worked. Suggestions?
You need to know the data type and dimensions of the binary file. For example, if the file contains float data, use numpy.fromfile like:
import numpy as np
data = np.fromfile(filename, dtype=float)
Then reshape the array to the dimensions of the image, dims, using numpy.reshape (the equivalent of REFORM in IDL):
im = np.reshape(data, dims)
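A round-trip sketch of the same idea, writing a synthetic file first so the dtype and dimensions are known (with the real data you have to get both from the file's documentation; the filename and dims here are hypothetical):

```python
import numpy as np

# Write a synthetic binary image (stand-in for the real .dat file).
dims = (100, 100)
original = np.arange(dims[0] * dims[1], dtype=np.float64).reshape(dims)
original.tofile('image.dat')

# Read it back: dtype and dims must match exactly how the file was written.
data = np.fromfile('image.dat', dtype=np.float64)
im = np.reshape(data, dims)
print(im.shape)                       # (100, 100)
print(np.array_equal(im, original))   # True
```

If the dtype or shape is wrong, np.fromfile won't complain; you just get garbage or a size mismatch at reshape, so that is the first thing to check.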
I have tens of thousands of text files to analyze, where each text file represents a snapshot in time of the physical state of a system. The micro-state of each "pixel" is represented by floats from 0 to 1. Is it possible for OpenCV to directly read a text file without first having to convert the text file to an image format? I do not want to create tens of thousands of image files every time I carry out this analysis.
Context/goal: I am analyzing a thermal simulation of a nano-magnetic system, and will eventually need to use OpenCV to calculate the contour areas of clusters formed above a certain threshold value.
I've included my code attempt below, using a test text file. The system is a square system of side length 40, and I am analyzing the column of 40^2 = 1600 data points which I call mag (for magnetization, as this is from a scientific research project). I multiply each "pixel" by 255 to mimic grayscale. As soon as the program reaches the cv2.threshold line, I get an error:
~/anaconda/conda-bld/work/opencv-2.4.8/modules/imgproc/src/thresh.cpp:783: error: (-210) in function threshold
which I suspect arises from my mimicking grayscale instead of reading an actual grayscale image file.
import numpy as np
import cv2
import matplotlib.pyplot as plt
SideDim = 40
dud, mag = np.loadtxt('Aex_testfile.txt', unpack=True, usecols=(4,5), skiprows=2)
mag = np.reshape(mag, (SideDim,SideDim))
for row in range(SideDim):
for col in range(SideDim):
mag[row][col] = round(255 * mag[row][col])
mag = mag.astype(np.int)
ret,thresh = cv2.threshold(mag,0,255,cv2.THRESH_BINARY)
plt.imshow(thresh,'gray')
Regarding the question in your post title:
In Python, cv2 does not convert text into an image format; "images" are just NumPy arrays. So you are correct in using np.loadtxt to import the data (though I'm partial to np.genfromtxt(), as it's slightly more robust).
Regarding the error you're getting:
Error code -210 is defined as:
#define CV_StsUnsupportedFormat -210 /* the data format/type is not supported by the function*/
cv2.threshold() expects 8-bit unsigned integer (or 32-bit float) data. Instead of casting mag to np.int, cast it to np.uint8. This should fix your error.
Other things to note:
With NumPy arrays, you don't need those ugly nested loops to multiply each value by 255. Instead, just do mag * 255.
Instead of multiplying by 255 (which doesn't quite make sense unless you're positive your maximum value is 1...), you should really just normalize your array. Something like (mag / mag.max()) * 255 would be a better solution.
You don't need open CV for this part of the program. Instead, you can just do it in numpy:
thresh = 255 * (mag > threshval)
This will produce an array, thresh, in which every value greater than threshval is set to 255.
In general, I think it would behoove you to learn numpy before jumping into opencv. I think you'd be surprised at how much you can do in numpy.
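Putting those points together, a NumPy-only sketch of the scale-and-threshold step (the input here is synthetic; with the real file, the np.loadtxt line from the question supplies mag):

```python
import numpy as np

SideDim = 40
rng = np.random.default_rng(1)
mag = rng.random((SideDim, SideDim))   # stand-in for the 0..1 magnetization data

# Vectorized scaling -- no nested loops -- and a uint8 cast for OpenCV later.
img = np.round(mag / mag.max() * 255).astype(np.uint8)

# Pure-NumPy threshold: values above threshval become 255, the rest 0.
threshval = 128
thresh = np.where(img > threshval, 255, 0).astype(np.uint8)
print(thresh.dtype)   # uint8
```

The uint8 array is also exactly what cv2.threshold and cv2.findContours accept, so the same cast serves the later contour-area step.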
I am looking to store pixel values from satellite imagery into an array. I've been using
np.empty((image_width, image_length))
and it worked for smaller subsets of an image, but when using it on the entire image (3858 x 3743) the code terminates very quickly and all I get is an array of zeros.
I load the image values into the array using a loop and opening the image with gdal
img = gdal.Open(os.path.join(fn, fname)).ReadAsArray()
but when I print the array I end up with just zeros.
I have tried almost every single dtype that I could find in the numpy documentation but keep getting the same result.
Is numpy unable to load this many values or is there a way to optimize the array?
I am working with 8-bit TIFF images that contain NDVI (decimal) values.
Thanks
I'm not certain what type of images you are trying to read, but in the case of RADARSAT-2 images you can do the following:
dataset = gdal.Open("RADARSAT_2_CALIB:SIGMA0:" + inpath + "product.xml")
S_HH = dataset.GetRasterBand(1).ReadAsArray()
S_VV = dataset.GetRasterBand(2).ReadAsArray()
# gets the intensity (Intensity = re**2+imag**2), and amplitude = sqrt(Intensity)
self.image_HH_I = numpy.real(S_HH)**2+numpy.imag(S_HH)**2
self.image_VV_I = numpy.real(S_VV)**2+numpy.imag(S_VV)**2
But that is specific to that type of image (in this case each image contains several bands, so I need to read each band separately with GetRasterBand(i) and then call ReadAsArray()). If there is a specific GDAL driver for the type of images you want to read, life gets very easy.
If you give some more info on the type of images you want to read, I can maybe help more specifically.
Edit: did you try something like this? (Not sure whether this will work on TIFF, or how many bytes the header is, hence the something:)
A = open(filename, "rb")
B = numpy.fromfile(A, dtype='uint8')[something:].reshape(3858, 3743)
C = B * 1.0
A.close()
Edit: the problem was solved by using 64-bit Python instead of 32-bit, due to memory errors at 2 GB with the 32-bit Python version.
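For reference, a quick way to check how big such an array actually is (sizes computed from the question's dimensions; a single array of that size is modest, though intermediate copies in a loop can multiply the footprint in a 32-bit process):

```python
import numpy as np

shape = (3858, 3743)                 # the full image from the question
print(shape[0] * shape[1])           # 14440494 pixels

# nbytes gives the in-memory size for a given dtype:
print(np.zeros(shape, dtype=np.uint8).nbytes)    # 14440494 bytes, ~14 MB
print(np.zeros(shape, dtype=np.float64).nbytes)  # 115523952 bytes, ~115 MB
```

Even as float64, the image is well under the ~2 GB address-space ceiling of a 32-bit process, which is why the limit only bites once several copies pile up.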