I have a specific implementation question about taking data mapped with a colormap (cmap) and converting it to RGBA values. Essentially, I have a bunch of data for which I would like to create an errorbar() plot where the points as well as the error bars themselves are colored by the size of some other value (for concreteness, let's say its contribution to the chi-square of the fit of some model). Let's say I have an (N,4) array called D, where the first two columns are the X and Y data, the third column is the value of the error bar, and the last column is its contribution to the chi-square function.
How would I go about 1) mapping the range of chi-square contribution values to a cmap, and 2) getting RGBA values from these so that I can loop over the errorbar() function and plot what I was hoping to plot?
This may actually be helpful (http://matplotlib.org/api/cm_api.html), but I'm unable to find any examples or additional information about how to use ScalarMappable() (which does have a to_rgba() method).
Thanks!
You can map scalar values to a colormap by calling the colormap objects in matplotlib.cm on the values. The values should lie between 0 and 1. So, to get RGBA values for some chi-square distributed data (which I'll generate randomly), I would do:
import numpy as np
from matplotlib import cm
chisq = np.random.chisquare(4, 8)
chisq -= chisq.min()
chisq /= chisq.max()
errorbar_colors = cm.winter(chisq)
Instead of having the color scale start and end at the minimum and maximum actual values, you could subtract the minimum you want and divide by the difference between the maximum and minimum you want.
Now errorbar_colors will be a (8, 4) array of RGBA values from the winter colormap:
array([[ 0. , 0.7372549 , 0.63137255, 1. ],
[ 0. , 0.7372549 , 0.63137255, 1. ],
[ 0. , 0.4745098 , 0.7627451 , 1. ],
[ 0. , 1. , 0.5 , 1. ],
[ 0. , 0.36078431, 0.81960784, 1. ],
[ 0. , 0.47843137, 0.76078431, 1. ],
[ 0. , 0. , 1. , 1. ],
[ 0. , 0.48627451, 0.75686275, 1. ]])
To plot this, you can just iterate over the colors and the datapoints and draw errorbars:
import matplotlib.pyplot as plt
heights = np.random.randn(8)
sem = .4
for i, (height, color) in enumerate(zip(heights, errorbar_colors)):
    plt.plot([i, i], [height - sem, height + sem], c=color, lw=3)
plt.plot(heights, marker="o", ms=12, color=".3")
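Since the question asks about ScalarMappable, here is a minimal sketch that lets matplotlib do the normalization instead: matplotlib.colors.Normalize rescales the chi-square column to [0, 1], cm.ScalarMappable(...).to_rgba() turns it into RGBA rows, and those rows are fed to one errorbar() call per point. The array D, its random contents, and the winter colormap are assumptions for illustration, and it assumes matplotlib 3.1 or later so that colorbar() accepts a ScalarMappable that holds no data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import cm, colors

rng = np.random.default_rng(0)
N = 8
# Hypothetical (N, 4) array laid out as in the question:
# x, y, error bar size, chi-square contribution
D = np.column_stack([np.arange(N), rng.standard_normal(N),
                     np.full(N, 0.4), rng.chisquare(4, N)])

# 1) Map the range of chi-square contributions onto [0, 1] and onto a colormap
norm = colors.Normalize(vmin=D[:, 3].min(), vmax=D[:, 3].max())
mappable = cm.ScalarMappable(norm=norm, cmap="winter")

# 2) to_rgba returns an (N, 4) array of RGBA floats, one row per point
rgba = mappable.to_rgba(D[:, 3])

fig, ax = plt.subplots()
for (x, y, err, _), color in zip(D, rgba):
    # One errorbar() call per point so each marker and its bar get their own color
    ax.errorbar(x, y, yerr=err, fmt="o", color=color, ecolor=color)

# The same mappable drives a colorbar for the chi-square scale
fig.colorbar(mappable, ax=ax, label="chi-square contribution")
```

Building the mappable once means the colorbar and the point colors share exactly the same scale.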
However, none of the built-in matplotlib colormaps are all that well-suited to this task. For some improvement, you could use seaborn to generate a sequential color palette that can be used to color lines:
import numpy as np
import seaborn
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
chisq = np.random.chisquare(4, 8)
chisq -= chisq.min()
chisq /= chisq.max()
cmap = ListedColormap(seaborn.color_palette("GnBu_d"))
errorbar_colors = cmap(chisq)
heights = np.random.randn(8)
sem = .4
for i, (height, color) in enumerate(zip(heights, errorbar_colors)):
    plt.plot([i, i], [height - sem, height + sem], c=color, lw=3)
plt.plot(heights, marker="o", ms=12, color=".3")
But even here, I have doubts that this is going to be the best way to get your point across. I don't know exactly what your data look like, but I would advise making two plots, one with the dependent variable you would be plotting here, and a second with the chi square statistic as the dependent variable. Alternatively, if you're interested in the relationship between the size of the error bars and the chi square value, I would plot that directly with a scatterplot.
I need to convert some colors from RGB and for this, I need to know what the color model of matplotlib's colormap is. My best guess would be CIELab but then I'm wondering about the fourth dimension.
import matplotlib
import numpy as np
scale = np.random.rand(500)
matplotlib.cm.viridis(scale)
# output
array([[0.175707, 0.6979 , 0.491033, 1. ],
[0.127568, 0.566949, 0.550556, 1. ],
[0.166383, 0.690856, 0.496502, 1. ],
...,
[0.169646, 0.456262, 0.55803 , 1. ],
[0.139147, 0.533812, 0.555298, 1. ],
[0.120565, 0.596422, 0.543611, 1. ]])
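For what it's worth, the colormap output is not CIELab: each row is plain RGB followed by an alpha (opacity) channel, all as floats in [0, 1], which is why the fourth column is all 1.0. A minimal sketch of converting those rows to 8-bit RGB triples or hex strings, assuming matplotlib 3.5 or later for the matplotlib.colormaps registry:

```python
import numpy as np
import matplotlib
from matplotlib import colors

viridis = matplotlib.colormaps["viridis"]  # the same colormap as matplotlib.cm.viridis
rgba = viridis(np.linspace(0, 1, 5))       # shape (5, 4): columns are R, G, B, alpha

rgb = rgba[:, :3]                           # drop alpha (1.0 means fully opaque)
rgb_255 = np.round(rgb * 255).astype(int)   # 8-bit integer triples for other tools
hex_codes = [colors.to_hex(c) for c in rgba]  # '#rrggbb' strings; alpha dropped by default
print(rgb_255)
print(hex_codes)
```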
I have a 2D array of size (3,2) and I have to resample it using nearest-neighbor, bilinear and bicubic interpolation so that its size becomes (4,3).
I am using Python, numpy and scipy for this.
How can I achieve resampling of the input array?
There is a good tutorial on re-sampling using convolution here.
For integer factor up-scaling:
import numpy
import scipy
from scipy import ndimage, signal
# Scale factor
factor = 2
# Input image
a = numpy.arange(16).reshape((4,4))
# Empty image enlarged by scale factor (rows and columns scaled separately)
b = numpy.zeros((a.shape[0]*factor, a.shape[1]*factor))
# Fill the new array with the original values
b[::factor,::factor] = a
# Define the convolution kernel (a box of width `factor` replicates each value: nearest neighbor)
kernel_1d = scipy.signal.windows.boxcar(factor)
kernel_2d = numpy.outer(kernel_1d, kernel_1d)
# Apply the separable kernel by 2-D convolution (equivalent to filtering each axis in turn);
# mode="valid" drops the border where the kernel lacks full support
c = scipy.signal.convolve(b, kernel_2d, mode="valid")
Note that the factor can be different for each axis, and that you can also apply the convolution sequentially, on each axis. The kernels for bilinear and bicubic are also shown in the link, with bilinear interpolation making use of a triangular signal (scipy.signal.windows.triang) and bicubic being a piecewise function.
You should also mind which portion of the interpolated image is valid; along the edges there is not sufficient support for the kernel.
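To make the bilinear case concrete, here is a sketch that zero-stuffs the array and convolves each axis in turn with a triangular kernel, scipy.signal.windows.triang(2*factor - 1), which for factor=2 is [0.5, 1, 0.5]. The helper name upsample_linear is hypothetical, it uses numpy.convolve per axis rather than the 2-D scipy.signal.convolve above, and the last sample along each axis shows the edge effect just mentioned (only half the kernel has support there).

```python
import numpy as np
from scipy import signal

def upsample_linear(a, factor):
    """Hypothetical helper: integer-factor linear upsampling by zero insertion
    followed by a triangular kernel, applied separably along each axis."""
    kernel = signal.windows.triang(2 * factor - 1)  # e.g. [0.5, 1, 0.5] for factor=2
    out = np.asarray(a, dtype=float)
    for axis in range(out.ndim):
        # Zero-stuff along this axis: original samples every `factor` positions
        shape = list(out.shape)
        shape[axis] *= factor
        stuffed = np.zeros(shape)
        index = [slice(None)] * out.ndim
        index[axis] = slice(None, None, factor)
        stuffed[tuple(index)] = out
        # Convolve along this axis only; mode="same" keeps the enlarged length
        out = np.apply_along_axis(np.convolve, axis, stuffed, kernel, mode="same")
    return out

# Interior values are midpoints between neighbors; the trailing value is an edge effect
print(upsample_linear(np.arange(4.0), 2))
```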
Bi-cubic interpolation is the best option of the three, as far as satellite imagery goes.
There is a simpler solution for this: scipy.ndimage.zoom (https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.zoom.html).
Nearest neighbor interpolation is order=0, bilinear interpolation is order=1, and bicubic is order=3 (default).
import numpy as np
import scipy.ndimage
x = np.arange(6).reshape(3,2).astype(float)
z = (4/3, 3/2)
print('Original array:\n{0}\n\n'.format(x))
methods=['nearest-neighbor', 'bilinear', 'biquadratic', 'bicubic']
for o in range(4):
    print('Resampled with {0} interpolation:\n {1}\n\n'.
          format(methods[o], scipy.ndimage.zoom(x, z, order=o)))
This results in:
Original array:
[[0. 1.]
[2. 3.]
[4. 5.]]
Resampled with nearest-neighbor interpolation:
[[0. 1. 1.]
[2. 3. 3.]
[2. 3. 3.]
[4. 5. 5.]]
Resampled with bilinear interpolation:
[[0. 0.5 1. ]
[1.33333333 1.83333333 2.33333333]
[2.66666667 3.16666667 3.66666667]
[4. 4.5 5. ]]
Resampled with biquadratic interpolation:
[[1.04083409e-16 5.00000000e-01 1.00000000e+00]
[1.11111111e+00 1.61111111e+00 2.11111111e+00]
[2.88888889e+00 3.38888889e+00 3.88888889e+00]
[4.00000000e+00 4.50000000e+00 5.00000000e+00]]
Resampled with bicubic interpolation:
[[5.55111512e-16 5.00000000e-01 1.00000000e+00]
[1.03703704e+00 1.53703704e+00 2.03703704e+00]
[2.96296296e+00 3.46296296e+00 3.96296296e+00]
[4.00000000e+00 4.50000000e+00 5.00000000e+00]]
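If you think in terms of the target shape rather than zoom factors, the factors can be derived from the two shapes. A minimal sketch; resample_to_shape is a hypothetical helper name, and it relies on zoom rounding the output size (round(input_size * factor)), so small floating-point errors in the factors do not change the result shape.

```python
import numpy as np
import scipy.ndimage

def resample_to_shape(x, shape, order=3):
    """Hypothetical helper: resample x to `shape` with scipy.ndimage.zoom.
    Spline order 0 = nearest neighbor, 1 = bilinear, 3 = bicubic."""
    factors = tuple(new / old for new, old in zip(shape, x.shape))
    return scipy.ndimage.zoom(x, factors, order=order)

x = np.arange(6).reshape(3, 2).astype(float)
for order, name in [(0, "nearest-neighbor"), (1, "bilinear"), (3, "bicubic")]:
    y = resample_to_shape(x, (4, 3), order=order)
    print(name, y.shape)
```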
I am trying to plot a 4D array, using the color as the 4th dimension. Here is a sample of my matrix:
[[ 4.216 0. 1. 0. ]
[ 5.36 0. 1. 0. ]
[ 5.374 0. 2. 0. ]
...,
[ 0.294 0. 1. 0. ]
[ 0.314 0. 2. 0. ]
[ 0.304 0. 1. 0. ]]
4th column only contains values 0, 1 and 2.
So when I try to plot it using this script:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(data[:,0],data[:,1],data[:,2], c=data[:,3], cmap=plt.hot())
plt.show()
I am getting this error:
TypeError: can't multiply sequence by non-int of type 'float'
This isn't a 4D array. It's a 2D array with 4 columns (the 2 dimensions could be referred to as "rows" and "columns"). But I see what you're trying to say—each row could be interpreted as describing a point in 4-dimensional space, with the fourth "dimension" being colour.
Two-dimensionality is actually the key to the problem. I suspect your data variable is a numpy.matrix rather than a vanilla numpy.array. A matrix is a particular class of 2D array that has various special properties, including the fact that a slice of it (for example, data[:, 0]) is still a 2-dimensional matrix object, whereas .scatter() expects each argument to be a 1-D array.
The fix is to say:
data = numpy.asarray(data)
to convert your data from a matrix to a normal array whose column slices will be 1-dimensional.
BTW: you probably meant to say cmap='hot'. The call to plt.hot() changes the default colormap as a side effect (so your figure may look right) and actually returns None.
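Putting both fixes together, a minimal sketch; the five-row np.matrix is made-up stand-in data (with a few nonzero values in the last column so the colors differ), it assumes matplotlib 3.2 or later so that projection="3d" works without importing mpl_toolkits explicitly, and the Agg backend is only there so it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's data, deliberately a numpy.matrix
data = np.matrix([[4.216, 0., 1., 0.],
                  [5.36,  0., 1., 0.],
                  [5.374, 0., 2., 0.],
                  [0.294, 0., 1., 1.],
                  [0.314, 0., 2., 2.]])

matrix_col_shape = data[:, 0].shape  # (5, 1): a matrix slice stays 2-D
data = np.asarray(data)              # convert to a plain ndarray
array_col_shape = data[:, 0].shape   # (5,): column slices are now 1-D

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
# Pass the colormap by name instead of calling plt.hot()
sc = ax.scatter(data[:, 0], data[:, 1], data[:, 2], c=data[:, 3], cmap="hot")
```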
I am trying to replicate the results from a paper.
"Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40°S." - Lenton et al. (2006)
The figures published show "the log of the variance of the 2D-FT".
I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array.
Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude:
import numpy as np
from numpy import ma
from matplotlib import pyplot as plt
from Scientific.IO.NetCDF import NetCDFFile
### input directory
indir = '/home/nicholas/data/'
### get the flux data which is in
### [time(5day ave for 10 years),latitude,longitude]
nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r')
cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:]
cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land
nc.close()
cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s
### create an array that consists of the seasonal signal for each pixel
year_stack = np.split(cflux, 10, axis=0)
year_stack = np.array(year_stack)
signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1))
signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask
### average the array over latitude(or longitude)
signal_time_lon = ma.mean(signal_array, axis=1)
### do a 2D Fourier Transform of the time/space image
ft = np.fft.fft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log(ps)
log_mgft= np.log(mgft)
Every second row of the ft consists completely of zeros. Why is this?
Would it be acceptable to add a small random number to the signal to avoid this?
signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05
EDIT: Adding images and clarifying meaning
The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I expect that the reason that I get rows of zeros is that I have re-created the timeseries for each pixel. The ft[0, 0] pixel contains the mean of the signal. So the ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal in the rows of the starting image.
Here is the starting image, produced with the following code:
plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight')
Here is the result, produced with the following code:
ft = np.fft.rfft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log1p(mgft)
plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight')
It may not be clear in the image but it is only every second row that contains completely zeros. Every tenth pixel (log_ps[10, 0]) is a high value. The other pixels (log_ps[2, 0], log_ps[4, 0] etc) have very low values.
Consider the following example:
In [59]: from numpy import absolute; from numpy.fft import fft
In [60]: absolute(fft([1,2,3,4]))
Out[60]: array([ 10. , 2.82842712, 2. , 2.82842712])
In [61]: absolute(fft([1,2,3,4, 1,2,3,4]))
Out[61]:
array([ 20. , 0. , 5.65685425, 0. ,
4. , 0. , 5.65685425, 0. ])
In [62]: absolute(fft([1,2,3,4, 1,2,3,4, 1,2,3,4]))
Out[62]:
array([ 30. , 0. , 0. , 8.48528137,
0. , 0. , 6. , 0. ,
0. , 8.48528137, 0. , 0. ])
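If X[k] = fft(x) for a length-N signal x, and Y[k] = fft([x x]), then Y[2k] = 2*X[k] and Y[2k+1] = 0 for k in {0, 1, ..., N-1}. More generally, repeating x M times gives Y[M*k] = M*X[k], with every other coefficient equal to zero.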
Therefore, I would look into how your signal_time_lon is being tiled. That may be where the problem lies.
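The same effect can be checked numerically for the situation in the question, where a one-year cycle is repeated 10 times with np.tile. A sketch, assuming 73 five-day means per year (730 / 10) and using random numbers in place of the real seasonal cycle:

```python
import numpy as np

rng = np.random.default_rng(0)
period = rng.standard_normal(73)  # hypothetical one-year cycle of 73 five-day means
M = 10                            # number of repeats, as in np.tile(..., (10, 1, 1))
tiled = np.tile(period, M)        # length 730

X = np.fft.fft(period)
Y = np.fft.fft(tiled)

# Every M-th coefficient equals M times the single-period coefficient...
on_grid = np.allclose(Y[::M], M * X)
# ...and all remaining coefficients vanish up to round-off
off_grid_max = np.abs(np.delete(Y, np.arange(0, Y.size, M))).max()
print(on_grid, off_grid_max)
```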
I have the bbox of a matplotlib.patches.Rectangle object (a bar from a bar graph) in display coordinates, like this:
Bbox(array([[ 0., 0.],[ 1., 1.]]))
But I would like that not in display coordinates but data coordinates. I'm pretty sure this requires a transform. What's the method for doing this?
I'm not sure how you got the Bbox in display coordinates. Almost everything the user interacts with is in data coordinates (those look like axis or data coordinates to me, not display pixels). The following should fully explain the transforms as they apply to Bboxes:
from matplotlib import pyplot as plt
# align='edge' reproduces the old default, where each bar starts at its x value
bars = plt.bar([1,2,3],[3,4,5], align='edge')
ax = plt.gca()
fig = plt.gcf()
b = bars[0].get_bbox() # bbox instance
print(b)
# box in data coords
#Bbox(array([[ 1. , 0. ],
# [ 1.8, 3. ]]))
b2 = b.transformed(ax.transData)
print(b2)
# box in display coords (pixel values depend on figure size, DPI and axis limits)
#Bbox(array([[ 80. , 48. ],
# [ 212.26666667, 278.4 ]]))
print(b2.transformed(ax.transData.inverted()))
# box back in data coords
#Bbox(array([[ 1. , 0. ],
# [ 1.8, 3. ]]))
print(b2.transformed(ax.transAxes.inverted()))
# box in axes coordinates
#Bbox(array([[ 0. , 0. ],
# [ 0.26666667, 0.6 ]]))
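Going the other way, which is what the question asks, a bbox that really is in display (pixel) coordinates can be obtained from the artist's get_window_extent() and mapped back with the inverted data transform. A sketch assuming the Agg backend and align="edge" so the first bar spans x = 1 to 1.8; the figure is drawn first so that the autoscaled limits, and therefore the transforms, are settled.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar([1, 2, 3], [3, 4, 5], align="edge")
fig.canvas.draw()  # finalize autoscaled limits before using the transforms

display_bbox = bars[0].get_window_extent()                     # pixels
data_bbox = display_bbox.transformed(ax.transData.inverted())  # data coordinates
axes_bbox = display_bbox.transformed(ax.transAxes.inverted())  # fractions of the axes

print(data_bbox.get_points())  # approximately [[1, 0], [1.8, 3]]
```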