Datashader on Google Colab - blank output - python

I'm trying out Datashader on Google Colab to visualise a large dataset of longitudes and latitudes, colored logarithmically with the colorcet.fire colormap, but my code produces a completely blank output.
Code in text:
import datashader as ds
import pandas as pd
import colorcet
data = pd.read_csv('hab.csv', usecols=['longitude','latitude'])
cvs = ds.Canvas()
agg = cvs.points(data, 'latitude', 'longitude')
ds.tf.set_background(ds.tf.shade(agg, cmap=colorcet.fire, how='log'))
What I see on Colab:

I'm not a Colab user, but yes, when I run your code locally with the five datapoints shown I get a blank plot. In my local version, that's because the code specifies a colormap whose highest value is white, and with only a few scattered points each of them is at the highest value. The code calls set_background, perhaps intending to set the background to black as would suit that colormap, but it doesn't pass "black", so the set_background call does nothing. If I specify the background color and add Datashader spreading so that these single datapoints are easier to see, I do get a plot from your code:
cvs = ds.Canvas()
agg = cvs.points(data, 'latitude', 'longitude')
ds.tf.set_background(ds.tf.shade(ds.tf.spread(agg, px=10), cmap=colorcet.fire, how='log'), "black")
You may have some other problem as well, though, since the plot you showed wasn't just white, it appeared to be transparent. And if your dataset is indeed large, you should see output anyway, because data points would then overlap and use all the colors in the colormap.
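The difference between a handful of points and a large dataset can be illustrated without Datashader at all. The sketch below is only a simplified stand-in, not Datashader's implementation: it bins points on a grid with NumPy and lists the distinct per-bin counts. The helper name, the grid size and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def distinct_nonzero_counts(x, y, bins=100):
    """Bin points on a bins x bins grid and return the distinct nonzero counts."""
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
    return np.unique(counts[counts > 0])

# Five scattered points: every occupied bin holds exactly one point,
# so there is only a single count value for a colormap to represent.
few_x = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
few_y = np.array([0.2, 0.8, 0.4, 0.6, 0.1])
few = distinct_nonzero_counts(few_x, few_y)

# Many random points (synthetic data): bins overlap heavily,
# so many different count values appear.
rng = np.random.default_rng(0)
many = distinct_nonzero_counts(rng.random(100_000), rng.random(100_000))

print(len(few), len(many))
```

With only one distinct count, every visible pixel ends up with the same color, which is why a sparse sample can look blank against a matching background.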

Related

Why are there Horizontal Stripes on my Palettized Image?

I am trying to make a palettized version of my height image data (using Python/Matplotlib) and for some reason...it is giving me quite weird horizontal lines which I know are not actually present in the dataset.
Both images (mine and the "better" one).
Is this something weird with how Matplotlib normalizes the data? I just don't quite understand how this could happen, so I am at a loss for where to start. I have provided my code below (sorry if there is a typo, I slightly changed it to make sense outside of the code).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
def getheightprofile(fileloc, color_palette='jet'):
    # read data from file
    data = pd.read_csv(fileloc, skiprows=0)
    # generate colormap (I'm using the jet colormap rn)
    colormap = plt.get_cmap(color_palette)
    # normalize the height data to the range [0, 1]
    norm = (data - np.min(data)) / (np.max(data) - np.min(data))
    # convert the height data to RGB values using the palette
    palettized_data = (colormap(norm)*255).astype(np.uint8)
    # save the file as a png (to check quality)
    saveloc = r'C:\Users\...\palletized_height_profile.png'
    plt.imsave(saveloc, palettized_data)
    # return the nice numbers for later analysis
    return palettized_data
# file location of the raw data
fileloc = r'C:\Users\...\raw_height_profile.csv'
# generate height profile map (called after the function is defined)
palettized_image = getheightprofile(fileloc)
But instead of returning the nice image that I think I should get, it returns a very strange image with lines across it. Note: I know these images don't use quite the same palettization, but I think you can understand the issue.
Does anyone understand how, why, etc.? I have also attached a link to the dataset, because maybe that is helpful...but I am quite sure there is nothing wrong with the data.

Display specific part of tiff image using rasterio without having to load the entire file

I have a large tiff file (around 2GB) containing a map. I have been able to successfully read the data and even display it using the following python code:
import rasterio
from rasterio.plot import show
with rasterio.open("image.tif") as img:
    show(img)
    data = img.read()
This works just fine. However, I need to be able to display specific parts of this map without having to load the entire file into memory (it takes up too much RAM and is not feasible on many other PCs). I tried using rasterio's Window class to do that, but when I displayed the map, the result looked different from how the full map is displayed (as if it had caused data loss):
import rasterio
from rasterio.plot import show
from rasterio.windows import Window
with rasterio.open("image.tif") as img:
    data = img.read(window=Window(0, 0, 100000, 100000))
    show(data)
So my question is, how can I display a part of the map without having to load into memory the entire file, while also making it look as if it had been cropped from the full map image?
thanks in advance :)
The reason that it displays nicely in the first case, but not in the second, is that in the first case you pass an instance of rasterio.DatasetReader to show (show(img)), but in the second case you pass in a numpy array (show(data)). The DatasetReader contains additional information, in particular an affine transformation and color interpretation, which show uses.
The additional things show does in the first case (for RGB data) can be recreated for the windowed case like so:
import rasterio
from rasterio.enums import ColorInterp
from rasterio.plot import show
from rasterio.windows import Window
with rasterio.open("image.tif") as img:
    window = Window(0, 0, 100000, 100000)
    # Lookup table for the color space in the source file
    source_colorinterp = dict(zip(img.colorinterp, img.indexes))
    # Read the image in the proper order so the numpy array will have the colors in the
    # order expected by matplotlib (RGB)
    rgb_indexes = [
        source_colorinterp[ci]
        for ci in (ColorInterp.red, ColorInterp.green, ColorInterp.blue)
    ]
    data = img.read(rgb_indexes, window=window)
    # Also pass in the affine transform corresponding to the window in order to
    # display the correct coordinates and possibly orientation
    show(data, transform=img.window_transform(window))
(I figured out what show does by looking at the source code here)
In case of data with a single channel, the underlying matplotlib library used for plotting scales the color range based on the min and max value of the data. To get exactly the same colors as before, you'll need to know the min and max of the whole image, or some values that come reasonably close.
Then you can explicitly tell matplotlib's imshow how to scale:
with rasterio.open("image.tif") as img:
    window = Window(0, 0, 100000, 100000)
    data = img.read(window=window, masked=True)
    # adjust these
    value_min = 0
    value_max = 255
    show(data, transform=img.window_transform(window), vmin=value_min, vmax=value_max)
Additional kwargs (like vmin and vmax here) will be passed on to matplotlib.axes.Axes.imshow, as documented here.
From the matplotlib documentation:
vmin, vmax: float, optional
When using scalar data and no explicit norm, vmin and vmax define the data range that the colormap covers. By default, the colormap covers the complete value range of the supplied data. It is deprecated to use vmin/vmax when norm is given. When using RGB(A) data, parameters vmin/vmax are ignored.
That way you could also change the colormap it uses etc.
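The effect of the color range on single-channel data can be sketched with plain NumPy. This is only an illustration of linear min/max scaling, not rasterio's or matplotlib's code. The `normalize` helper and the small synthetic array standing in for the raster are assumptions.

```python
import numpy as np

def normalize(values, vmin=None, vmax=None):
    """Linearly map values to [0, 1]; fall back to the data's own min/max."""
    values = np.asarray(values, dtype=float)
    vmin = values.min() if vmin is None else vmin
    vmax = values.max() if vmax is None else vmax
    return (values - vmin) / (vmax - vmin)

# Stand-in for the whole raster (values 0..255) and for a windowed read of it
full = np.arange(0, 256, dtype=float).reshape(16, 16)
window = full[:4, :4]

# Scaled by the window's own range: its brightest pixel is stretched to 1.0
auto_scaled = normalize(window)

# Scaled by the full image's range: pixels keep the shade they have in the full plot
fixed_scaled = normalize(window, vmin=0, vmax=255)

print(auto_scaled.max(), fixed_scaled.max())
```

Passing the same fixed range for every window is what keeps a cropped view visually consistent with the full image.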

Plotly in Python: show mean and variance of selected data

I am generating histograms using go.Histogram as described here. I am getting what is expected:
What I want to do is to show some statistics of the selected data, as shown in the next image (the white box I added manually in Paint):
I have tried this, and within the function selection_fn I placed the add_annotation call described here. However, it does nothing, and there are no errors either.
How can I do this?
Edit: I am using this code taken from this link
import plotly.graph_objects as go
import numpy as np
x = np.random.randn(500)
fig = go.Figure(data=[go.Histogram(x=x)])
fig.show()
with obviously another data set.
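One part of this, computing the statistics for a selected range and formatting them as annotation text, can be sketched independently of Plotly. The helper name `selection_stats`, the range bounds and the sample values are hypothetical. How the bounds are obtained from a selection event is not covered here.

```python
import numpy as np

def selection_stats(x, lower, upper):
    """Return (mean, variance, count) of the values of x inside [lower, upper]."""
    x = np.asarray(x, dtype=float)
    selected = x[(x >= lower) & (x <= upper)]
    if selected.size == 0:
        return None  # nothing selected
    # var() is the population variance (ddof=0); use var(ddof=1) for the sample variance
    return selected.mean(), selected.var(), selected.size

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
stats = selection_stats(x, -1.0, 2.0)
label = "mean={:.2f}, var={:.2f}, n={}".format(*stats)
print(label)
```

The resulting string could then be used as the text of an annotation.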

Is there a way to plot many markers in Folium?

I am trying to use Folium for geographical information reading from a pandas dataframe.
The code I have is this one:
import folium
from folium import plugins
import pandas as pd
# ...operations on dataframe df...
map_1 = folium.Map(location=[51.5073219, -0.1276474],
                   zoom_start=12,
                   tiles='Stamen Terrain')
markerCluster = folium.plugins.MarkerCluster().add_to(map_1)
lat=0.
lon=0.
for index,row in df.iterrows():
    lat=row['lat']
    lon=row['lon']
    folium.Marker([lat, lon], popup=row['name']).add_to(markerCluster)
map_1
map_1
df is a dataframe with longitude, latitude and name information. Longitude and latitude are float.
I am using jupyter notebook and the map does not appear. Just a white empty box.
I have also tried to save the map:
map_1.save(outfile='map_1.html')
but also opening the file doesn't work (using Chrome, Firefox or Explorer).
I have tried reducing the number of markers displayed: below 300 markers the code works and the map is displayed correctly. With more than 300 markers the map is blank again.
The size of the file is below 5 MB and should be processed correctly by Chrome.
Is there a way around it (I have more than 2000 entries in the dataframe and I would like to plot them all)? Or to set the maximum number of markers shown in folium?
Thanks
This might be too late, but I stumbled upon the same problem and found a solution that worked for me without having to remove the popups, so I figured that anybody with the same issue could try it out. Try replacing popup=row['name'] with popup=folium.Popup(row['name'], parse_html=True) and see whether it works. You can read more about it here: https://github.com/python-visualization/folium/issues/726
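If the blank map is caused by special characters in some of the names (an assumption, although escaping such characters is what parse_html=True is for), it may help to find the affected rows. The sketch below uses only the standard library. The helper name and the sample names are hypothetical.

```python
import html

def names_needing_escape(names):
    """Return the names that html.escape would change (they contain &, <, >, " or ')."""
    return [name for name in names if html.escape(name) != name]

# Hypothetical sample; in practice this would be something like df['name']
sample_names = ["Tower Bridge", "St Paul's Cathedral", "Fish & Chips <Soho>"]
suspects = names_needing_escape(sample_names)
print(suspects)
```

If the map renders once the suspect rows are left out, that would support this explanation.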

pylab.scatter creates colorbar with weird white lines

When I run this on OSX Yosemite, the generated colorbar has weird white lines (see image below). Is there any way I can generate a colorbar without these ugly lines?
import pylab
import numpy
x = numpy.random.random(50)
y = numpy.random.random(50)
s = pylab.scatter(x,y,c=y)
pylab.colorbar(s)
pylab.savefig('/Users/kilojoules/plot.pdf')
This is a known issue (not in matplotlib itself, but in many PDF viewers), which is also described in the documentation of the colorbar function, along with a workaround:
# create the colorbar
cbar = pylab.colorbar(s)
# set the color of the lines
cbar.solids.set_edgecolor("face")
This should fix it.
For further reading: the relevant issue on github
