Python - Plotting the T-value above a bar plot

This is a minimal example of what I am doing:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
b_mean = np.mean(B)
ori_t = stats.ttest_1samp(B, 0)[0]
r1 = [1]
barWidth = 0.02  # assumed value; barWidth was not defined in the original snippet
plt.bar(r1,b_mean,width=barWidth, color="blue")
This code produces a bar plot of the mean of the B array. Now I would like to display the T-value (ori_t, computed with stats.ttest_1samp) above the bar. I tried the following:
plt.text(x=r1, y=b_mean+0.1, s=ori_t, size = 6)
Each time, it raises
TypeError: float() argument must be a string or a number
which I don't understand. Does anyone know how to achieve this or work around it?

The problem is that you are passing r1 = [1] as the x-position of your text. r1 is a list, but the x and y arguments of plt.text must be scalars. So either write x=1 or x=r1[0]; both are scalars. I have included the missing imports to make the answer complete, and I have also adjusted the y-limits so that the text fits inside the axes.
From the docs:
x, y : scalars
The position to place the text. By default, this is in data coordinates. The coordinate system can be changed using the transform parameter.
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
b_mean = np.mean(B)
ori_t = stats.ttest_1samp(B, 0)[0]
r1 = [1]
plt.bar(r1,b_mean,width=0.02, color="blue")
plt.text(x=r1[0], y=b_mean+0.1, s=ori_t, size = 10)
# plt.text(x=1, y=b_mean+0.1, s=ori_t, size = 10)
plt.ylim(0, b_mean+0.2)
plt.show()
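As an alternative sketch, assuming Matplotlib 3.4 or newer (where Axes.bar_label was added), the label can be attached to the bar container directly, with the T-value formatted to two decimals. The label text "t = ...", the bar width of 0.5 and the 20% headroom are illustrative choices, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

B = np.array([0.6383447, 0.5271385, 1.7721380, 1.7817880])
b_mean = np.mean(B)
t_value = stats.ttest_1samp(B, 0).statistic  # same number as stats.ttest_1samp(B, 0)[0]

fig, ax = plt.subplots()
bars = ax.bar([1], [b_mean], width=0.5, color="blue")  # BarContainer holding one bar
# bar_label places one text label per bar, offset 3 points above the bar's top edge
labels = ax.bar_label(bars, labels=[f"t = {t_value:.2f}"], padding=3)
ax.set_ylim(0, b_mean * 1.2)  # leave headroom so the label stays inside the axes
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the figure as usual.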

Related

wrong number of arguments when using x, y arrays as pixel positions and z as colour value in matplotlib.pcolor()

I have been trying to make a coloured image with a colourbar scale. Each pixel should correspond to colour bar values from the Z array, and x,y should be pixel position arguments (in mm) similar to this:
https://ars.els-cdn.com/content/image/1-s2.0-S0010218014001989-gr6_lrg.jpg
The data is imported from many text files. However, a small sample of this input data can be recreated as a pandas DataFrame with the following code (the x-axis range is -30 to 30 mm in steps of 1 mm, and the y-axis range is 6 to 15 mm in steps of 3 mm):
import numpy as np
import pandas as pd
x = np.linspace(-30, 30, 61)
y = np.linspace(6, 15, 4)
z = 1.35 * np.random.rand(61*4)  # 1-D array: one value per (x, y) position
for i in range(0, 3, 1):
    x = np.append(x, x[:61])  # repeat the 61 x positions once for each y row
y = np.repeat(y, 61)  # each y value repeated for all 61 x positions
df = pd.DataFrame()
df['X [mm]'] = x
df['Y [mm]'] = y
df['LDA1-Mean [m/s]'] = z
print(df)
Now running the following code:
import matplotlib.pyplot as plt
Z = df['LDA1-Mean [m/s]'].to_list()
positions = np.array(list(zip(df['X [mm]'], df['Y [mm]'])))
plt.pcolor(positions, Z)
plt.show()
plt.savefig('solutions/graphs/test.png', dpi=300, bbox_inches="tight")
Produces the following error:
TypeError: pcolor() takes 1 or 3 positional arguments but 2 were given
Is there a better way to do this with imshow() or contourf()? I'm open to suggestions.
Because the order of the X and Y data is not always consistent, I'd prefer to always use the X and Y position data rather than reordering the Z data (and passing only one argument).
Thank you in advance for the help - I am still new to programming. Please feel free to ask questions if there is something I have not explained.
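The error arises because pcolor accepts either a single 2-D array C or the three arguments X, Y, C; it does not take a list of (x, y) pairs plus values. One possible approach, sketched here under the assumption that every (X, Y) pair occurs exactly once, is to pivot the long-format DataFrame into a 2-D grid. DataFrame.pivot returns sorted rows and columns, so the original row order does not matter (the shuffle step only demonstrates that), and shading="nearest" (Matplotlib 3.3+) centres each cell on its measured position. For scattered points that do not form a full grid, Axes.tricontourf is another option.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Sample data in the same long format as the question (random velocities)
x = np.tile(np.linspace(-30, 30, 61), 4)
y = np.repeat(np.linspace(6, 15, 4), 61)
z = 1.35 * np.random.rand(61 * 4)
df = pd.DataFrame({"X [mm]": x, "Y [mm]": y, "LDA1-Mean [m/s]": z})
df = df.sample(frac=1, random_state=0)  # shuffle rows to show that row order is irrelevant

# Rows become Y positions and columns X positions; pivot returns both sorted
grid = df.pivot(index="Y [mm]", columns="X [mm]", values="LDA1-Mean [m/s]")

fig, ax = plt.subplots()
# X, Y, C form: 1-D X (61 values), 1-D Y (4 values), 2-D C of shape (4, 61)
mesh = ax.pcolormesh(
    grid.columns.to_numpy(),
    grid.index.to_numpy(),
    grid.to_numpy(),
    shading="nearest",  # each cell is centred on its (X, Y) measurement position
)
fig.colorbar(mesh, ax=ax, label="LDA1-Mean [m/s]")
ax.set_xlabel("X [mm]")
ax.set_ylabel("Y [mm]")
```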

I am a beginner and have a question related to plotting in Python

I am new to Python.
I want to know the syntax for the following problem.
Suppose I want to plot the quantity x = (a constant with a fixed given value) * ln(1 + z) against z, which varies from c to d.
How do I define the variables x and z, and how do I use the natural logarithm (ln) function?
I have imported numpy, scipy and matplotlib, but I do not know how to proceed from there.
Since you already imported numpy, here is just another answer:
import numpy as np
import matplotlib.pyplot as plt
x_coeff = 10
c = 0
d = 100
z = list(range(c, d))  # integers c, c+1, ..., d-1 (range excludes d)
x = [x_coeff * np.log(1 + v) for v in z]  # np.log is the natural logarithm
plt.plot(z, x)
plt.show()
It's always better to check the documentation and show your first attempt:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.log.html
You might also need to understand "list comprehensions".
They are a concise and convenient way to create lists in Python.
To plot a curve, you need two lists: one holds the domain values on the x-axis and the other holds the corresponding values on the y-axis. First, read the constant with Python's built-in input() function and convert it to an int (input() returns a string), then use the log function from the math module to compute the logarithm.
import math
import matplotlib.pyplot as plt
a = int(input("enter a value for constant : "))
c,d = 0,100
xvals = list(range(c,d,1)) # start,end,step
print(xvals)
yvals = [a*math.log(1+x) for x in xvals]
print(yvals)
plt.plot(xvals,yvals)
plt.show()
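Since numpy is already imported, a vectorised sketch avoids building lists element by element. np.linspace includes both endpoints c and d (unlike range, which stops before d), and np.log1p(z) computes ln(1 + z). The constant 10.0 and the resolution of 500 points are assumed values for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it to get an interactive window
import matplotlib.pyplot as plt
import numpy as np

k = 10.0       # the fixed constant (example value)
c, d = 0, 100  # z runs from c to d, both endpoints included

z = np.linspace(c, d, 500)  # 500 evenly spaced z values
x = k * np.log1p(z)         # log1p(z) = ln(1 + z), evaluated for the whole array at once

fig, ax = plt.subplots()
ax.plot(z, x)
ax.set_xlabel("z")
ax.set_ylabel("x = k ln(1 + z)")
```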

Python: Get values of array which correspond to contour lines

Is there a way to extract the data from an array that corresponds to a line of a contour plot in Python? For example, I have the following code:
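import matplotlib.pyplot as plt
import numpy as np
n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
# values: an n x n data array loaded from a file (not shown)
plt.contour(x, y, values)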
where values is a 2D array of data (I stored the data in a file, but it does not seem possible to upload it here). The picture below shows the corresponding contour plot. My question is: is it possible to get exactly the data from values that corresponds to, for example, the left contour line in the plot?
Worth noting here, since this post was the top hit when I had the same question, that this can be done with scikit-image much more simply than with matplotlib. I'd encourage you to check out skimage.measure.find_contours. A snippet of their example:
import numpy as np
from skimage import measure
x, y = np.ogrid[-np.pi:np.pi:100j, -np.pi:np.pi:100j]
r = np.sin(np.exp((np.sin(x)**3 + np.cos(y)**2)))
# Each contour is an (N, 2) array of (row, column) coordinates at level 0.8
contours = measure.find_contours(r, 0.8)
which can then be plotted/manipulated as you need. I like this more because you don't have to get into the deep weeds of matplotlib.
If cs = plt.contour(...), then cs is a QuadContourSet. From it, we can access the individual lines using:
cs.collections[0].get_paths()
This returns all the individual paths of the first contour level. To access the actual x, y locations, we need to look at the vertices attribute of each path. The first contour line drawn should be accessible using:
X, Y = cs.collections[0].get_paths()[0].vertices.T
The example below shows how to access any of the lines; it only uses the first one. Note that cs.collections was deprecated in Matplotlib 3.8, so on recent versions this code emits a warning or fails.
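import matplotlib.pyplot as plt
import numpy as np
n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
values = x**0.5 * y**0.5
fig1, ax1 = plt.subplots(1)
cs = ax1.contour(x, y, values)
# Collect the (M, 2) vertex array of every path in the first contour level
# (cs.collections is deprecated from Matplotlib 3.8 onwards)
lines = []
for line in cs.collections[0].get_paths():
    lines.append(line.vertices)
fig1.savefig('contours1.png')
fig2, ax2 = plt.subplots(1)
# Plot only the first extracted line: column 0 holds x, column 1 holds y
ax2.plot(lines[0][:, 0], lines[0][:, 1])
fig2.savefig('contours2.png')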
(Figures not preserved: contours1.png, the full contour plot, and contours2.png, the first extracted line.)
plt.contour returns a QuadContourSet which holds the data you're after.
See Get coordinates from the contour in matplotlib? (which this question is probably a duplicate of...)
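Because cs.collections is deprecated in Matplotlib 3.8, one version-independent option is to call contourpy directly; it is the contouring library Matplotlib has used internally since version 3.6. The sketch below rests on several assumptions: values is a stand-in array with two vertical contour lines at level 0, "left" is taken to mean the line piece with the smallest mean x, and the data under the line is approximated by the grid node nearest to each vertex.

```python
import numpy as np
import contourpy  # contour engine used by Matplotlib >= 3.6

n = 100
x, y = np.mgrid[0:1:n*1j, 0:1:n*1j]
values = np.cos(2 * np.pi * x)  # stand-in data: level 0 gives lines at x = 0.25 and x = 0.75

gen = contourpy.contour_generator(x, y, values, line_type="Separate")
lines = gen.lines(0.0)  # list of (M, 2) arrays of (x, y) vertices, one per line piece

left = min(lines, key=lambda seg: seg[:, 0].mean())  # piece with the smallest mean x

# Look up the grid value nearest to each vertex (x varies along axis 0 of mgrid)
step = 1.0 / (n - 1)
i = np.clip(np.rint(left[:, 0] / step).astype(int), 0, n - 1)
j = np.clip(np.rint(left[:, 1] / step).astype(int), 0, n - 1)
on_line = values[i, j]  # data from values along the left contour line
```

Interpolating values at the exact vertex positions instead of taking the nearest node would be a refinement; the nearest-node lookup keeps the sketch short.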

binned_statistic_2d producing unexpected negative values

I'm using scipy.stats.binned_statistic_2d and then plotting the output. When I use stat="count", I have no problems. When I use stat="mean" (or np.max, for that matter), I end up with negative values in each bin (as shown by the color bar), which should not happen because I constructed zvals so that it is always greater than zero. Does anyone know why this happens? I've included the minimal code I use to generate the plots. I also get an "invalid value" RuntimeWarning, which makes me think that something strange is going on in binned_statistic_2d. The following code should run as-is.
From the documentation:
'count' : compute the count of points within each bin. This is
identical to an unweighted histogram. `values` array is not
referenced.
which leads me to believe that there might be something going on in binned_statistic_2d and how it handles z-values.
import numbers as _numbers
import numpy as _np
import scipy as _scipy
import matplotlib as _mpl
import types as _types
import scipy.stats
from matplotlib import pyplot as _plt
norm_args = (0, 3, int(1e5)) # loc, scale, size
x = _np.random.random(norm_args[-1]) # xvals can be log scaled.
y = _np.random.normal(*norm_args) #_np.random.random(norm_args[-1]) #
z = _np.abs(_np.random.normal(1e2, *norm_args[1:]))
nbins = 1e2
kwargs = {}
stat = _np.max
fig, ax = _plt.subplots()
binned_stats = _scipy.stats.binned_statistic_2d(x, y, z, stat,
nbins)
H, xedges, yedges, binnumber = binned_stats
Hplot = H
if isinstance(stat, str):
    cbar_title = stat.title()
elif callable(stat):  # also covers NumPy functions that are not plain types.FunctionType objects
    cbar_title = stat.__name__.title()
XX, YY = _np.meshgrid(xedges, yedges)
Image = ax.pcolormesh(XX, YY, Hplot.T) #norm=norm,
ax.autoscale(tight=True)
grid_kargs = {'orientation': 'vertical'}
cax, kw = _mpl.colorbar.make_axes_gridspec(ax, **grid_kargs)
cbar = fig.colorbar(Image, cax=cax)
cbar.set_label(cbar_title)
Here's the runtime warning:
/Users/balterma/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/matplotlib/colors.py:584: RuntimeWarning: invalid value encountered in less
  cbook._putmask(xa, xa < 0.0, -1)
(Images not preserved: the plots produced with mean, with max and with count.)
It turns out the problem was in the interaction with plt.pcolormesh. Empty bins are returned as NaN for statistics such as "mean", so I had to convert the output array from binned_statistic_2d into a masked array that masks the NaNs.
Here's the question that gave me the answer:
pcolormesh with missing values?
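A minimal sketch of that fix, using synthetic data shaped like the question's (the seed, the 1000 points and the 50 bins are arbitrary choices): np.ma.masked_invalid masks the NaN entries that binned_statistic_2d returns for empty bins, so pcolormesh leaves those cells blank instead of colour-mapping them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.normal(0, 3, 1000)
z = np.abs(rng.normal(100, 3, 1000))  # strictly positive values, as in the question

res = stats.binned_statistic_2d(x, y, z, statistic="mean", bins=50)
counts = stats.binned_statistic_2d(x, y, z, statistic="count", bins=50).statistic

# Empty bins are NaN in res.statistic; masking them keeps them out of the colour map
H = np.ma.masked_invalid(res.statistic)

fig, ax = plt.subplots()
# statistic has shape (nx, ny); pcolormesh expects (ny, nx), hence the transpose
mesh = ax.pcolormesh(res.x_edge, res.y_edge, H.T)
fig.colorbar(mesh, ax=ax, label="Mean")
```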

Use of pandas.shift() to align datasets based on scipy.signal.correlate

I have datasets that look like the following: data0, data1, data2 (analogous to time versus voltage data)
If I load and plot the datasets using code like:
import pandas as pd
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
data0 = pd.read_csv('data0.csv')
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')
plt.plot(data0.x, data0.y, data1.x, data1.y, data2.x, data2.y)
I get something like:
Now I try to correlate data0 with data1:
shft01 = np.argmax(signal.correlate(data0.y, data1.y)) - len(data1.y)
print(shft01)
plt.figure()
plt.plot(data0.x, data0.y,
data1.x.shift(-shft01), data1.y)
fig = plt.gcf()
with output:
-99
and
which works just as expected! But if I try the same thing with data2, I get a plot that looks like:
with a positive shift of 410. I think I am just not understanding how pd.shift() works, but I was hoping I could use it to align my data sets. As far as I understand, the return value of correlate() tells me how far apart my data sets are, so I should be able to use shift to overlap them.
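pandas' shift() is not the right method for shifting a curve along the x-axis: it moves values by a number of rows (index positions) and fills the vacated rows with NaN, rather than adding an offset. You should adjust the x values of the points instead: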
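plt.plot(data0.x, data0.y)
for target in [data1, data2]:
    # average sample spacing of the reference x values
    dx = np.mean(np.diff(data0.x.values))
    # peak index of the cross-correlation, offset by the length of target, in x units
    shift = (np.argmax(signal.correlate(data0.y, target.y)) - len(target.y)) * dx
    plt.plot(target.x + shift, target.y)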
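The difference between the two operations shows up on a tiny Series: shift() moves values by a number of rows and pads with NaN, while adding an offset displaces every x value along the axis. The two only coincide for uniformly spaced x values, and even then shift() loses points at one end. The four x values and the 0.25 offset are arbitrary.

```python
import numpy as np
import pandas as pd

x = pd.Series([0.0, 0.1, 0.2, 0.3])

moved = x.shift(1)  # values move down one row; the first row becomes NaN
offset = x + 0.25   # every x value is displaced by 0.25 along the axis
```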
here is the output:
@HYRY, one correction to your answer: there is an off-by-one mismatch between len(), which counts elements (one-based), and np.argmax(), which returns a zero-based index. The line should read:
shift = (np.argmax(signal.correlate(data0.y, target.y)) - (len(target.y)-1)) * dx
For example, in the case where your signals are already aligned (both of length N):
len(target.y) = N (one-based count)
The cross-correlation has length 2N-1, so for aligned data its peak is at the center index:
np.argmax(signal.correlate(data0.y, target.y)) = N - 1 (zero-based)
shift = ((N-1) - N) * dx = (-1) * dx, when we really want 0 * dx; the corrected line gives shift = ((N-1) - (N-1)) * dx = 0.
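SciPy 1.6 and later provide scipy.signal.correlation_lags, which returns the lag belonging to each index of the correlate output and so removes the need to get the N - 1 offset right by hand. The sketch below uses two synthetic Gaussian pulses (assumed data, not the question's) that are 3 time units apart, and checks that both approaches agree.

```python
import numpy as np
from scipy import signal

dx = 0.1
t = np.arange(0, 20, dx)      # uniformly spaced samples
a = np.exp(-((t - 8) ** 2))   # reference pulse centred at t = 8
b = np.exp(-((t - 11) ** 2))  # the same pulse, 3 time units later

corr = signal.correlate(a, b, mode="full")
lags = signal.correlation_lags(len(a), len(b), mode="full")
lag = lags[np.argmax(corr)]   # lag (in samples) at the correlation peak
shift = lag * dx              # convert samples to x units

# Corrected formula from the answer above: subtract len(b) - 1, not len(b)
manual = (np.argmax(corr) - (len(b) - 1)) * dx
```

The negative lag means b has to move 3 units to the left, which matches adding shift to target.x in the answer's loop.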
