I am trying to make a palettized version of my height image data (using Python/Matplotlib), and for some reason it is giving me quite weird horizontal lines that I know are not actually present in the dataset.
Both images (mine and the "better" one).
Is this something weird with how Matplotlib normalizes the data? I just don't quite understand how this could happen, so I am at a loss for where to start. I have provided my code below (sorry if there is a typo; I changed it slightly so it makes sense outside of its original context).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def getheightprofile(fileloc, color_palette='jet'):
    # read data from file
    data = pd.read_csv(fileloc, skiprows=0)
    # generate colormap (I'm using the jet colormap rn)
    colormap = plt.get_cmap(color_palette)
    # normalize the height data to the range [0, 1]
    norm = (data - np.min(data)) / (np.max(data) - np.min(data))
    # convert the height data to RGBA values using the palette
    palettized_data = (colormap(norm) * 255).astype(np.uint8)
    # save the file as a png (to check quality)
    saveloc = r'C:\Users\...\palletized_height_profile.png'
    plt.imsave(saveloc, palettized_data)
    # return the nice numbers for later analysis
    return palettized_data

# file location of the raw data
fileloc = r'C:\Users\...\raw_height_profile.csv'

# generate height profile map
palettized_image = getheightprofile(fileloc)
But instead of returning the nice image that I think I should get, it returns a super weird image with lines across it. Note: I know these images aren't quite the same palettization, but I think you can understand the issue.
Does anyone understand how, why, etc.? I have also attached a link to the dataset, because maybe that is helpful...but I am quite sure there is nothing wrong with the data.
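One thing I noticed while writing this up, though I haven't confirmed it is the cause: np.min and np.max on a pandas DataFrame reduce per column (they return a Series with one value per column), so the normalization above is per-column rather than over the whole height map, which is the kind of thing that can produce stripe artifacts. A small sketch of the difference:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12.0).reshape(3, 4))

# Per-column: np.min(data) is a Series with one minimum per column,
# so each column is normalized independently (stripe-prone).
per_column = (data - np.min(data)) / (np.max(data) - np.min(data))

# Global: convert to a plain ndarray first, then normalize against
# the single global min/max of the whole height map.
arr = data.to_numpy()
global_norm = (arr - arr.min()) / (arr.max() - arr.min())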
I have a large TIFF file (around 2 GB) containing a map. I have been able to successfully read the data and even display it using the following Python code:
import rasterio
from rasterio.plot import show
with rasterio.open("image.tif") as img:
show(img)
data = img.read()
This works just fine. However, I need to be able to display specific parts of this map without having to load the entire file into memory (as it takes up too much RAM and is not doable on many other PCs). I tried using the Window class of rasterio in order to do that, but when I tried to display the map the outcome was different from how the full map is displayed (as if it caused data loss):
import rasterio
from rasterio.plot import show
from rasterio.windows import Window
with rasterio.open("image.tif") as img:
data = img.read(window=Window(0, 0, 100000, 100000))
show(data)
So my question is: how can I display a part of the map without having to load the entire file into memory, while also making it look as if it had been cropped from the full map image?
thanks in advance :)
The reason that it displays nicely in the first case, but not in the second, is that in the first case you pass an instance of rasterio.DatasetReader to show (show(img)), but in the second case you pass in a numpy array (show(data)). The DatasetReader contains additional information, in particular an affine transformation and color interpretation, which show uses.
The additional things show does in the first case (for RGB data) can be recreated for the windowed case like so:
import rasterio
from rasterio.enums import ColorInterp
from rasterio.plot import show
from rasterio.windows import Window

with rasterio.open("image.tif") as img:
    window = Window(0, 0, 100000, 100000)

    # Lookup table for the color space in the source file
    source_colorinterp = dict(zip(img.colorinterp, img.indexes))

    # Read the image in the proper order so the numpy array will have the colors in the
    # order expected by matplotlib (RGB)
    rgb_indexes = [
        source_colorinterp[ci]
        for ci in (ColorInterp.red, ColorInterp.green, ColorInterp.blue)
    ]
    data = img.read(rgb_indexes, window=window)

    # Also pass in the affine transform corresponding to the window in order to
    # display the correct coordinates and possibly orientation
    show(data, transform=img.window_transform(window))
(I figured out what show does by looking at the source code here)
In the case of data with a single channel, the underlying matplotlib library used for plotting scales the color range based on the min and max value of the data. To get exactly the same colors as before, you'll need to know the min and max of the whole image, or some values that come reasonably close.
Then you can explicitly tell matplotlib's imshow how to scale:
with rasterio.open("image.tif") as img:
window = Window(0, 0, 100000, 100000)
data = img.read(window=window, masked=True)
# adjust these
value_min = 0
value_max = 255
show(data, transform=img.window_transform(window), vmin=value_min, vmax=value_max)
Additional kwargs (like vmin and vmax here) will be passed on to matplotlib.axes.Axes.imshow, as documented here.
From the matplotlib documentation:
vmin, vmax: float, optional
When using scalar data and no explicit norm, vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. It is deprecated to use vmin/vmax when norm is given. When using RGB(A) data, parameters vmin/vmax are ignored.
That way you could also change the colormap it uses etc.
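For example, a minimal sketch building on the snippet above ('viridis' is just an arbitrary palette choice here):

show(data, transform=img.window_transform(window), cmap='viridis', vmin=value_min, vmax=value_max)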
The Problem:
I'm trying to simulate a live video by cycling through a series of still images I have saved in a directory, but when I add the animation and update functions my plot is displayed empty.
Background on why I'm doing this:
I believe it's important for me to do it this way rather than a complete change of approach, say turning the images into a video first and then displaying that, because what I really want to test is the image analysis I will be adding and then overlaying on each frame. The final application will be receiving frames one by one from a camera and will need to do some processing, display the image + annotations + output the data as .csv, etc. I'm simulating this for now because I do not have any of the hardware to generate the images and will not have it for several months, during which time I need to get the image processing set up, but I do have access to some sets of stills that are approximately what will be produced. In case it's relevant, my simulation images are 1680x1220 and are 1.88 MB TIFFs, though I could convert and compress them if needed, and in the final form the resolution will be a bit higher and the image format could probably be adjusted if needed.
What I have tried:
I followed an example to list all files in a folder, and an example to update a plot. However, the plot displays blank when I run the code.
I added a line to print the current file name, and I can see this cycling as expected.
I also made sure the images will display in the plot if I just create a plot and add one image, and they do. But when combined with the animation function the plot is blank, and I'm not sure what I've done wrong/failed to include.
I also tried adding a plt.pause() in the update, but again this didn't work.
I increased the interval up to 2000 to give it more time, but that didn't work. I believe 2000 is extreme; I'm expecting it should work at more like 20-30 fps. Going to 0.5 fps tells me the code is wrong or incomplete, rather than it just being a question of needing time to read the image file.
I appreciate no one else has my images, but they are nothing special. I'm using 60 images but I guess it could be tested with any 2 random images and setting range(60) to range(2) instead?
The example I copied originally demonstrated the animation function by making a random array, and if I do that it will show a plot that updates with random squares as expected.
Replacing:
A = np.random.randn(10,10)
im.set_array(A)
...with my image instead...
im = cv2.imread(files[i],0)
...and the plot remains empty/blank. I get a window shown called "Figure1" (like when using the random array), but unlike with the array there is nothing in this window.
Full code:
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
import os
import cv2

def update(i):
    im = cv2.imread(files[i], 0)
    print(files[i])
    #plt.pause(0.1)
    return im

path = 'C:\\Test Images\\'

files = []
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.TIFF' in file:
            files.append(os.path.join(r, file))

ani = FuncAnimation(plt.gcf(), update, frames=range(60), interval=50, blit=False)
plt.show()
I'm a Python and programming novice, so I have relied on adjusting examples others have given online, but I have only a simplistic understanding of how they work and end up with a lot of trial and error on the syntax. I just can't figure out anything to make this one work, though.
Cheers for any help!
The main reason nothing is showing up is that you never add the images to the plot. I've provided some code below to do what you want; be sure to look up anything you are curious about or don't understand!
import glob
import os

from matplotlib import animation
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# The folder with your images (be careful about putting spaces in directory names!)
IMG_DIRPATH = 'C:\\Test Images\\'
IMG_EXT = '.TIFF'  # the file extension of your images

# Create a figure, and set to the desired size.
fig = plt.figure(figsize=[5, 5])

# Create axes for the current figure so that images can be sized appropriately.
# Passing in [0, 0, 1, 1] makes the axes fill the whole figure.
# frame_on=False means we won't have a bounding box, and setting xticks=[] and
# yticks=[] means that we won't have pesky tick marks along our image.
ax_props = {'frame_on': False, 'xticks': [], 'yticks': []}
ax = plt.axes([0, 0, 1, 1], **ax_props)

# Get all image filenames.
img_filepaths = glob.glob(os.path.join(IMG_DIRPATH, '*' + IMG_EXT))

def update_image(img_filepath):
    # Remove all existing images on the axes, and restore our settings.
    ax.clear()
    ax.update(ax_props)

    # Read the current image.
    img = mpimg.imread(img_filepath)

    # Add the current image to the plot axes.
    ax.imshow(img)

anim = animation.FuncAnimation(fig, update_image, frames=img_filepaths, interval=250)
plt.show()
I have a satellite image of 7-channels (Basically I have seven .tif files, one for each band). And I have a .csv file with coordinates of points-of-interest that are in the region shot by the satellite. I want to cut small portions of the image in the surroundings of each coordinate point. How could I do that?
As I don't have fully working code right now, the exact size of those small portions of the image doesn't really matter. For the explanation of this question, let's say that I want them to be 15x15 pixels. So for the moment, my final objective is to obtain a lot of 15x15x7 vectors, one for every coordinate point that I have in the .csv file. And that is what I am stuck on. (The "7" in "15x15x7" is because the image has 7 channels.)
Just to give some background in case it's relevant: I will use those vectors later to train a CNN model in keras.
This is what I did so far: (I am using jupyter notebook, anaconda environment)
Imported gdal, numpy, matplotlib, geopandas, among other libraries.
Opened the .tif files using gdal and converted them into arrays.
Opened the .csv file using pandas.
Created a numpy array called "imagen" of shape (7931, 7901, 7) that will host the 7 bands of the satellite image (in the form of numbers). At this point I just need to know which rows and columns of the array "imagen" correspond to each coordinate point. In other words, I need to convert every coordinate point into a pair of numbers (row, column). And that is what I am stuck on.
After that, I think that the "cutting part" will be easy.
#I import libraries
from osgeo import gdal
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import geopandas
from geopandas import GeoDataFrame
from shapely.geometry import Point

#I access the satellite images (I just show one here to make it short)
b1 = r"E:\Imágenes Satelitales\2017\226_86\1\LC08_L1TP_226086_20170116_20170311_01_T1_sr_band1.tif"
band1 = gdal.Open(b1, gdal.GA_ReadOnly)

#I open the .csv file
file_svc = r"C:\Users\Administrador\Desktop\DeepLearningInternship\Crop Yield Prediction\Crop Type Classification model - CNN\First\T28_Pringles4.csv"
df = pd.read_csv(file_svc)
print(df.head())
That prints something like this:
Lat1 Long1 CropingState
-37.75737 -61.14537 Barbecho
-37.78152 -61.15872 Verdeo invierno
-37.78248 -61.17755 Barbecho
-37.78018 -61.17357 Campo natural
-37.78850 -61.18501 Campo natural
#I create the array "imagen" (I only show one channel here to make it short)
imagen = np.zeros((7931, 7901, 7), dtype=np.float32)
imagen[:,:,0] = band1.ReadAsArray().astype(np.float32)
#And then I can plot it:
plt.imshow(imagen[:,:,0], cmap = 'hot')
plt.plot()
Which plots something like this:
(https://github.com/jamesluc007/DeepLearningInternship/blob/master/Crop%20Yield%20Prediction/Crop%20Type%20Classification%20model%20-%20CNN/First/red_band.png)
I want to transform those (-37, -61) coordinates into something like (2230, 1750) row/column indices, but I haven't figured out how yet. Any clues?
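A minimal sketch of the usual approach, using the dataset's geotransform. This assumes the CSV coordinates are in the same coordinate reference system as the raster; if the raster uses a projected CRS (e.g. UTM, common for Landsat products), the lat/long points would first need to be reprojected, e.g. with osgeo.osr. It also assumes GDAL 2+, where gdal.InvGeoTransform returns the inverted transform directly:

from osgeo import gdal

# band1 is the dataset opened above; its geotransform maps
# (column, row) pixel indices to georeferenced (x, y) coordinates.
gt = band1.GetGeoTransform()
inv_gt = gdal.InvGeoTransform(gt)  # inverse mapping: (x, y) -> (column, row)

def coords_to_pixel(x, y):
    # For a geographic CRS, x is longitude and y is latitude.
    col, row = gdal.ApplyGeoTransform(inv_gt, x, y)
    return int(row), int(col)

# Cut a 15x15x7 patch around the first point in the table above.
row, col = coords_to_pixel(-61.14537, -37.75737)
patch = imagen[row - 7:row + 8, col - 7:col + 8, :]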
I have a binary file of unsigned 64-bit values that contains multiple images. This file is the output of numerical analysis software that is designed to store graphic information in binary form. The software itself has a built-in function to export images, but it's an old piece of software and doing that is a pain in the ass.
So I am trying to convert this file to multiple images using Python. I found a potential solution here.
The following is the code I copied from the aforementioned post, with minimal changes for my specific file:
import numpy as np
import matplotlib.pyplot as plt

def main():
    data = read_data('test21.SGR', 8192, 8192)
    visualize(data)

def read_data(filename, width, height):
    with open(filename, 'r') as infile:
        # Skip the header
        infile.seek(8192)
        data = np.fromfile(infile, dtype=np.uint64)
    # Reshape the data into a 3D array. (-1 is a placeholder for however many
    # images are in the file... E.g. 2000)
    return data.reshape((width, height, -1))

def visualize(data):
    # There are better ways to do this, but let's keep it simple
    plt.ion()
    fig, ax = plt.subplots()
    im = ax.imshow(data[:,:,0], cmap=plt.cm.gray)
    for i in xrange(data.shape[-1]):
        image = data[:,:,i]
        im.set(data=image, clim=[image.min(), image.max()])
        fig.canvas.draw()

main()
But when I use this code the error says:
ValueError: total size of new array must be unchanged
If this works out, I will basically be saving at least 2 hours at work on image extraction. I am a newbie in Python, so I don't quite understand how to resolve this issue; any help will be appreciated.
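That ValueError from reshape means the number of elements read is not an exact multiple of width * height, so the first thing to check is the element count. A small diagnostic sketch, using the sizes assumed in the code above (the real width, height, and header size of the .SGR format may differ); note that np.fromfile wants the file opened in binary mode ('rb'), not text mode ('r'):

import numpy as np

# Assumptions carried over from the question's code.
width, height, header_bytes = 8192, 8192, 8192

with open('test21.SGR', 'rb') as infile:  # binary mode
    infile.seek(header_bytes)             # skip the header
    data = np.fromfile(infile, dtype=np.uint64)

frame_size = width * height
n_frames, remainder = divmod(data.size, frame_size)
print(n_frames, remainder)  # a nonzero remainder reproduces the reshape error

if remainder == 0:
    images = data.reshape((-1, height, width))  # frame index first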
I'm trying to set the graph background to a DICOM image. I followed this example, but the image data given by dicom.pixel_array isn't RGBA. I'm not sure how to convert it, either. I'm also not sure what exactly bokeh is expecting. I've tried finding specifics in the documentation, but no such luck.
from bokeh.plotting import figure, show, output_file
import dicom
import numpy as np
path = "/pathToDicomImage.dcm"
data = dicom.read_file(path)
img = data.pixel_array
p = figure(x_range=(0,10), y_range=(0,10))
# must give a vector of images
p.image_rgba(image=[img], x=0, y=0, dw=10, dh=10)
output_file("image_rgba.html", title="image_rgba.py example")
show(p)
This code doesn't give me any errors, but it doesn't display anything. Maybe the pixel array doesn't have alpha data, so alpha defaults to 0? I'm not sure. Also, I can't quite figure out how to test it.
SOLVED
As was pointed out, I just needed to map the pixel data to RGBA space. For this instance, that means duplicating the data to each channel and setting alpha all the way up.
def dicom_image_to_RGBA(image_data):
    rows = len(image_data)
    cols = rows
    img = np.empty((rows, cols), dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape((rows, cols, 4))
    for i in range(0, rows):
        for j in range(0, cols):
            view[i][j][0] = image_data[i][j]
            view[i][j][1] = image_data[i][j]
            view[i][j][2] = image_data[i][j]
            view[i][j][3] = 255
    return img
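An equivalent vectorized version without the per-pixel loops, as a sketch: it assumes the pixel values already fit in 8 bits (often not true for 16-bit DICOM data, which would need scaling first) and a little-endian platform, which is the byte order bokeh's image_rgba expects:

import numpy as np

def dicom_image_to_RGBA_vectorized(image_data):
    arr = np.asarray(image_data, dtype=np.uint8)  # assumes values fit in 8 bits
    rows, cols = arr.shape
    img = np.empty((rows, cols), dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape((rows, cols, 4))
    view[:, :, 0] = arr  # red
    view[:, :, 1] = arr  # green
    view[:, :, 2] = arr  # blue
    view[:, :, 3] = 255  # fully opaque
    return img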
Not being an expert in Python, I have had a glance at pydicom's capabilities in handling pixel data. I figured out that pixel_array is the value of the pixel data attribute of the DICOM dataset as-is, and pydicom does not offer any functionality to convert it into some standard format which can be handled uniformly. This means you will have to convert it to RGB in most cases, which is a quite complicated and error-prone task.
Things to consider in this:
The encoding (Big/Little Endian, various compression methods like JPEG, JPEG-LS, RLE, ZIP) - DICOM attribute (0002,0010) TransferSyntaxUID
The type of pixel data (Grayscale, RGB, ...) - DICOM attributes (0028,0004) PhotometricInterpretation, (0028,0103) PixelRepresentation
In case of color images: are the values encoded colour by plane (RRRRR..... GGGGG..... BBBBB) or colour by pixel as you expect it to be (RGB RGB ...)?
The bit depth, and which bits are used for actual pixel data values - (0028,0100) BitsAllocated, (0028,0101) BitsStored, (0028,0102) HighBit.
Are the pixel data values really the values to be displayed, or are they indices into a colour/grayscale lookup table - (0028,3000) ModalityLUTSequence, (0028,3002) LUTDescriptor, (0028,3003) LUTExplanation, (0028,3004) ModalityLUTType, (0028,3006) LUTData.
Scary, isn't it? For some modern image classes like Enhanced MR, there is even more than that.
However, if you constrain yourself to a particular type of image (e.g. Computed Radiography), limitations to the above apply that make your life a bit easier.
If you would post a DICOM dump of the image header I could give you some hints how to display that particular image.
HTH
kritzel
What you need to do is map the pixel data returned from pixel_array to RGB space. Usually that is done using a look up table (LUT). Take a look at the functions GetImage and GetLUTValue in the dicomparser module in the dicompyler-core library.
In GetLUTValue it maps the data to an 8-bit greyscale image. If you want to use a different LUT, you would need to map the color space accordingly.
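As an illustration of the idea (this is not dicompyler-core's actual code, just a minimal sketch of a linear window/level mapping to 8-bit greyscale; center and width would typically come from the (0028,1050) WindowCenter and (0028,1051) WindowWidth attributes):

import numpy as np

def window_to_uint8(pixels, center, width):
    # Values at center - width/2 map to 0, values at center + width/2
    # map to 255, and everything outside that range is clipped.
    lo = center - width / 2.0
    scaled = (np.asarray(pixels, dtype=np.float64) - lo) / float(width)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)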