minimum and maximum length to delimit my line - python

Sorry for the stupid question, but I have been on it for nearly an hour already. Here is a sample of my dataframe:
SEASON Total
0 2004-2005 4
1 2005-2006 4
2 2006-2007 1
3 2007-2008 7
4 2008-2009 7
5 2009-2010 4
6 2010-2011 4
7 2012-2013 4
8 2013-2014 1
9 2014-2015 2
10 2015-2016 3
11 2016-2017 13
12 2017-2018 18
13 2018-2019 8
I have done this:
plt.figure(figsize=(13,6))
plt.plot(per_year.index, per_year['Total'])
plt.xticks(per_year.index, per_year['SEASON'].unique());
plt.title('AVg assist PER YEAR')
plt.axvline(x=10,color='red', linestyle='--')
plt.axhline(y=3.8,color='orange', xmax=10)
plt.axhline(y=11.75, xmax=10)
plt.tight_layout()
All I want is to be able to give a maximum length to my first horizontal line (where it has to stop) and a minimum to my second horizontal line to say where it has to finish. I am pretty sure I can do it if I change the axis to proper numbers, but I want to keep it as it is.

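From the docs: the xmin and xmax arguments of axhline must be between 0 and 1, where 0 is the far left of the plot and 1 is the far right.
Calculate the fraction from the number of x items (this is approximate, because the axes usually include a small margin on each side):
xmax = 10/len(per_year.index)
Or use the hlines method of the axes, which takes xmin and xmax in data coordinates: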
ax = plt.gca()
ax.hlines(y=3.8,xmin=0, xmax=10, color='r')
ax.hlines(y=11.75,xmin=0, xmax=10, color='g')
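The fraction that axhline expects is measured against the current x-limits, so it can also be computed from them directly. Below is a minimal sketch of that conversion, assuming the limits are read from ax.get_xlim(). The helper name data_to_axes_fraction and the example limits are illustrative and not part of matplotlib.

```python
def data_to_axes_fraction(x, xlim):
    """Convert a data x-coordinate to a 0-1 axes fraction for (left, right) limits."""
    left, right = xlim
    return (x - left) / (right - left)

# Illustrative limits; in a real plot use xlim = plt.gca().get_xlim()
xlim = (-0.65, 13.65)
frac = data_to_axes_fraction(10, xlim)

# The first line would stop at x=10 and the second would start there:
# plt.axhline(y=3.8, xmax=frac, color='orange')
# plt.axhline(y=11.75, xmin=frac)
```

If the x-limits change after the lines are drawn, the fractions would need to be recomputed, which is one reason the hlines approach may be simpler.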

Errorbar plot for Likert scale confidence values

I have the following dataset, for 36 fragments in total (36 rows × 3 columns):
Fragment lower upper
0 1 1 5
1 2 2 5
2 3 3 5
3 4 2 5
4 5 1 5
5 6 1 5
I've calculated these lower and upper bounds from this dataset (966 rows × 2 columns):
Fragment Confidence Value
0 33 4
1 26 4
2 23 3
3 16 2
4 36 3
which contains multiple instances of each fragment and an associated confidence value.
The confidence values come from a Likert scale, i.e. 1-5. I want to create an error bar plot, for example like this:
So the y-axis should show each fragment (1-36) and the x-axis should show the range/std/mean (?) of the confidence values for each fragment.
I've tried this, but it's not exactly what I want. I think using the lower and upper bounds isn't the best idea; maybe I need std/range...
#confpd is the second dataset from above
meanconfs = confpd.groupby('Fragment', as_index=False)['Confidence Value'].mean()
minconfs = confpd.groupby('Fragment', as_index=False)['Confidence Value'].min()
maxconfs = confpd.groupby('Fragment', as_index=False)['Confidence Value'].max()
data_dict = {}
data_dict['Fragment'] = ['1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18',
                         '19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36']
data_dict['lower'] = minconfs['Confidence Value']
data_dict['upper'] = maxconfs['Confidence Value']
dataset = pd.DataFrame(data_dict)
##dataset is the first dataset I show above
for lower,upper,y in zip(dataset['lower'],dataset['upper'],range(len(dataset))):
    plt.plot((lower,upper),(y,y),'ro-',color='orange')
plt.yticks(range(len(dataset)),list(dataset['Fragment']))
The result of this code is this, which is not what I want.
Any help is greatly appreciated!!
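One possible reading of the request, offered as an assumption rather than a confirmed answer, is to plot the mean per fragment and use the standard deviation as the error bar instead of the minimum and maximum. The sketch below only computes those numbers, using made-up (fragment, confidence) pairs in place of the real 966-row dataset.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up (fragment, confidence) pairs standing in for the real data
rows = [(1, 1), (1, 3), (1, 5), (2, 4), (2, 4), (3, 2), (3, 4)]

# Collect all confidence values per fragment
values = defaultdict(list)
for fragment, confidence in rows:
    values[fragment].append(confidence)

fragments = sorted(values)
means = [mean(values[f]) for f in fragments]
stds = [pstdev(values[f]) for f in fragments]  # population standard deviation

# These lists could then be drawn with, for example:
# plt.errorbar(means, fragments, xerr=stds, fmt='o')
```

Whether the standard deviation is meaningful for ordinal Likert data is a separate question; the range (min to max) could be passed as asymmetric errors instead.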

How do the subplot indices work in Python?

I'm trying to make sense of how the subplot indices work, but they don't seem intuitive at all. I particularly have an issue with the third index. I know that there are other ways to create subplots in Python, but I am trying to understand how subplots written in this manner work because they are used extensively.
I am trying to use a trivial example to see if I understand what I'm doing. So, here's what I want to do:
Row 1 has 3 columns
Row 2 has 2 columns
Row 3 has 3 columns
Rows 4 and 5 have 2 columns. However, I want to have the left subplot span rows 4 and 5.
This is the code for the first 3 rows. I don't understand why the third index of ax4 is 3 instead of 4.
ax1 = plt.subplot(5,3,1)
ax2 = plt.subplot(5,3,2)
ax3 = plt.subplot(5,3,3)
ax4 = plt.subplot(5,2,3)
ax5 = plt.subplot(5,2,4)
ax6 = plt.subplot(5,3,7)
ax7 = plt.subplot(5,3,8)
ax8 = plt.subplot(5,3,9)
For the three subplots that sit in rows 4 and 5, I can't seem to get it right. Here's my wrong attempt:
ax9 = plt.subplot(4,2,10)
ax10 = plt.subplot(5,2,12)
ax11 = plt.subplot(5,2,15)
The indices are from left to right, and then wrap at the end of the row. So subplot(2, 3, x):
1 2 3
4 5 6
For your example, with subplot(5, 3, x) the subplots are indexed:
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
For the ax4=subplot(5, 2, x) they are indexed:
1 2
3 4
5 6
7 8
9 10
To span subplots, you can pass the first and last indices as a tuple:
ax9 = plt.subplot(5, 2, (7, 9))
ax10 = plt.subplot(5, 2, (8, 10))
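The numbering rule above can be written down directly. This is a small sketch, assuming zero-based row and column positions; subplot_index is a hypothetical helper, not a matplotlib function.

```python
def subplot_index(row, col, ncols):
    """Return the 1-based subplot index for a zero-based (row, col) in a grid with ncols columns."""
    return row * ncols + col + 1

# Second row, first column of a 5x2 grid: this is why ax4 uses index 3.
ax4_index = subplot_index(1, 0, ncols=2)

# Third row, first column of a 5x3 grid: the index used for ax6.
ax6_index = subplot_index(2, 0, ncols=3)

# Left column of rows 4 and 5 in a 5x2 grid: the (first, last) pair for a spanning subplot.
span = (subplot_index(3, 0, ncols=2), subplot_index(4, 0, ncols=2))
```

The index only depends on the number of columns, which is why the same figure can mix calls that use 3 columns with calls that use 2.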

How to create n subplots (box plots) automatically?

I need to show n (e.g. 5) box plots. How can I do it?
df =
col1 col2 col3 col4 col5 result
1 3 1 1 4 0
1 2 2 4 9 1
1 2 1 3 7 1
This is my current code. But it does not display the data inside plots. Also, plots are very thin if n is for example 10 (is it possible to go to a new line automatically?).
n=5
columns = df.columns
i = 0
fig, axes = plt.subplots(1, n, figsize=(20,5))
for ax in axes:
    df.boxplot(by="result", column = [columns[i]], vert=False, grid=True)
    i = i + 1
display(fig)
This example is for Azure Databricks, but I appreciate just a matplotlib solution as well if it's applicable.
I am not sure I understood what you are trying to do, but the following code will show you the plots. You can control the figure size by changing the values in figsize=(10,10).
Code:
df.boxplot(by="result",figsize=(10,10));
Result:
To make the boxes horizontal and show the grid:
df.boxplot(by="result",figsize=(10,10),vert=False, grid=True);
I solved it myself as follows:
df.boxplot(by="result", column = columns[0:4], vert=False, grid=True, figsize=(30,10), layout = (3, 5))
If you want additional rows to be generated while keeping the number of columns constant, adjust the layout as follows:
In [41]: ncol = 2
In [42]: df
Out[42]:
v0 v1 v2 v3 v4 v5 v6
0 0 3 6 9 12 15 18
1 1 4 7 10 13 16 19
2 2 5 8 11 14 17 20
In [43]: df.boxplot(by='v6', layout=(df.shape[1] // ncol + 1, ncol)) # use floor division to determine how many rows are required
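Floor division plus one always yields enough rows, but it can leave an empty row when the number of plots divides evenly by the number of columns. Here is a hedged sketch of the same idea with ceiling division; grid_shape is a hypothetical helper, and the assumption that the by column is not plotted itself should be checked against your pandas version.

```python
import math

def grid_shape(n_plots, ncol):
    """Return (rows, cols) with just enough rows to hold n_plots in ncol columns."""
    return (math.ceil(n_plots / ncol), ncol)

layout_for_5 = grid_shape(5, 2)
layout_for_6 = grid_shape(6, 2)

# Possible use, assuming the grouping column is excluded from the plotted columns:
# df.boxplot(by='v6', layout=grid_shape(df.shape[1] - 1, ncol))
```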

3D Plot after using for loop and range (Python)

I have some questions for which I couldn't find any answers, although I searched for them.
My code so far is the following:
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from math import *
from scipy.special import *
import matplotlib.pyplot as plt
import numpy as np
## Parameter definitions for the pressure equation after Rudnicki (1986) ##
q = 6.0/1000
lameu = 11.2*10**9
lame = 8.4*10**9
pi
alpha = 0.65
G = 8.4*10**9
k = 1.0e-15
eta = 0.001
t = 1000*365*24600
kappa = k/eta
print "kappa is:",kappa
c = ((kappa*(lameu-lame)*(lame+2*G))/((alpha**2)*(lameu+2*G)))
print "c is:",c
xmin = -10
xmax = 10
ymin = -10
ymax = 10
for x in range (xmin,xmax):
    for y in range (ymin,ymax):
        r=sqrt(x**2+y**2)
        P=(q/(rhof*4*pi*kappa))*(expn(1,r**2/(4*c*t)))
        z = P/1e6
        print x, y, z
x, y = np.meshgrid(x, y)
## Plotting in 3D ##
fig = plt.figure()
ax = fig.gca(projection='3d')
surf = ax.plot_surface(x, y, z, rstride=1, cstride=1, cmap=cm.jet, linewidth=0,
antialiased=False, vmin=np.nanmin(z), vmax=np.nanmax(z))
fig.colorbar(surf, shrink=0.5, aspect=5)
## Axis scales ##
ax.set_xlim(xmin,xmax) # set the x-axis scale
ax.set_ylim(ymin,ymax) # set the y-axis scale
## Axis labels ##
ax.set_title('Pressure distribution')
ax.set_xlabel('Distance to well [m]')
ax.set_ylabel('Distance to well [m]')
ax.set_zlabel('Pressure [MPa]')
plt.show()
If I try to run the program, my values for x, y and z show up as intended, but I don't get any 3D plot. I had this issue once before, so I tried to define my infinite values for z to be treated as not a number:
z[z==np.inf] = np.nan
After adding this to my code, I get the following error:
TypeError: 'numpy.float64' object does not support item assignment
What exactly does this mean? I don't get it in this context. I think I need it for my plot?
What exactly is the difference in my for loop between, e.g.:
for x in range [-10,10]
and
for x in range (-10,10)
?
I know there are functions that use
P[x,y]=....
instead of only
P=....
?
When do I have to use the brackets?
I hope someone can enlighten me. Thanks!
To answer your various questions:
z[z==np.inf] = np.nan
After adding this to my code, i get the following error: TypeError: 'numpy.float64' object does not support item assignment
This is because z is just a single number (a numpy.float64 that the loop overwrites on every iteration), not an array.
The () and [] confusion is simple: you access elements of a list (or any other container class implementing __getitem__) using the [] brackets. You call objects using ().
Essentially, these two bits of syntax are short forms of the less convenient versions:
myObject[key] results in myObject.__getitem__(key), and myObject(variable) results in myObject.__call__(variable). It's just syntax.
Typically, these are used to create functions and container classes (you could misuse them, but it would make for some very confusing code).
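To make the distinction concrete, here is a small sketch with a toy class that supports both forms. The class name Squares is made up for illustration. The last part shows why range[-10, 10] from the question fails: range can be called but not indexed.

```python
class Squares:
    """Toy object that supports both indexing and calling."""

    def __getitem__(self, key):
        # Runs for squares[key]
        return key ** 2

    def __call__(self, value):
        # Runs for squares(value)
        return f"called with {value}"

squares = Squares()
indexed = squares[4]   # dispatches to __getitem__
called = squares(4)    # dispatches to __call__

# range is callable but not subscriptable, so square brackets raise a TypeError
try:
    range[-10, 10]
    range_error = None
except TypeError as exc:
    range_error = exc
```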
As for making your plotting work, you're going to want to make z an array of data points with the correct shape.
The issue you were having is that you did not provide the data in the form plot_surface requires: it needs 2D arrays. XX and YY are exactly what numpy.meshgrid creates; if I recall correctly, its x and y arguments can be plain lists, but I haven't tried it.
At any rate, you normally have elements looking like this (for a square grid):
XX
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9
YY
1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9
and then ZZ is just the z values of the function at the corresponding points, i.e. if you're plotting some function f(x,y), then you could do something like:
for i in range(len(XX)):
    for j in range(len(XX[0])):
        ZZ[i][j] = f(XX[i][j], YY[i][j])
There is likely a much faster numpy way to do the array operations, though.
I normally do something like this:
import numpy as np
# other boilerplate imports and variable definitions from the question go here
# (note: rhof is used in the question but never defined there)
xs = np.linspace(xStart, xStop, num=50)
ys = np.linspace(yStart, yStop, num=50)
XX, YY = np.meshgrid(xs, ys)
ZZ = np.zeros_like(XX)
# meshgrid's default indexing puts y along the rows and x along the columns
for i, y in enumerate(ys):
    for j, x in enumerate(xs):
        r = sqrt(x**2 + y**2)
        P = (q/(rhof*4*pi*kappa))*(expn(1, r**2/(4*c*t)))
        ZZ[i][j] = P/1e6
fig = plt.figure()
# fig.gca(projection='3d') no longer works in recent matplotlib versions
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(XX, YY, ZZ, rstride=1, cstride=1, cmap=cm.jet, linewidth=0,
                       antialiased=False, vmin=np.nanmin(ZZ), vmax=np.nanmax(ZZ))
fig.colorbar(surf, shrink=0.5, aspect=5)
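The explicit loops can likely be replaced by evaluating the function on the whole grid at once. The sketch below uses a stand-in function f built from NumPy operations instead of the pressure formula; scipy.special.expn should accept arrays in the same way, but that is an assumption worth checking.

```python
import numpy as np

def f(x, y):
    # Stand-in for the pressure formula; works on scalars and on arrays alike
    r = np.sqrt(x**2 + y**2)
    return np.exp(-r**2 / 50.0)

xs = np.linspace(-10, 10, num=50)
ys = np.linspace(-10, 10, num=50)
XX, YY = np.meshgrid(xs, ys)

# One call evaluates f on the whole grid; ZZ has the same 2D shape as XX and YY
ZZ = f(XX, YY)

# Infinite values can now be masked, because ZZ is an array rather than a single number
ZZ[np.isinf(ZZ)] = np.nan
```

With ZZ as a 2D array, the item assignment that raised the TypeError in the question works as intended.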

How to draw bar in python

I want to draw a bar chart for the data below:
4 1406575305 4
4 -220936570 2
4 2127249516 2
5 -1047108451 4
5 767099153 2
5 1980251728 2
5 -2015783241 2
6 -402215764 2
7 927697904 2
7 -631487113 2
7 329714360 2
7 1905727440 2
8 1417432814 2
8 1906874956 2
8 -1959144411 2
9 859830686 2
9 -1575740934 2
9 -1492701645 2
9 -539934491 2
9 -756482330 2
10 1273377106 2
10 -540812264 2
10 318171673 2
The 1st column is the x-axis and the 3rd column is the y-axis. Multiple rows can exist for the same x-axis value. For example,
4 1406575305 4
4 -220936570 2
4 2127249516 2
This means three bars for the x-axis value 4, and each bar is labelled with its tag (the value in the middle column). A sample bar chart looks like this:
http://matplotlib.org/examples/pylab_examples/barchart_demo.html
I am using matplotlib.pyplot and numpy. Thanks.
I followed the tutorial you linked to, but it's a bit tricky to shift them by a nonuniform amount:
import numpy as np
import matplotlib.pyplot as plt
x, label, y = np.genfromtxt('tmp.txt', dtype=int, unpack=True)
ux, uidx, uinv = np.unique(x, return_index=True, return_inverse=True)
max_width = np.bincount(x).max()
bar_width = 1/(max_width + 0.5)
locs = x.astype(float)
shifted = []
for i in range(max_width):
    where = np.setdiff1d(uidx + i, shifted)
    locs[where[where<len(locs)]] += i*bar_width
    shifted = np.concatenate([shifted, where])
plt.bar(locs, y, bar_width)
If you want you can label them with the second column instead of x:
plt.xticks(locs + bar_width/2, label, rotation=-90)
I'll leave doing both of them as an exercise to the reader (mainly because I have no idea how you want them to show up).
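As an alternative way to see the shifting step, the positions can also be computed by counting how often each x value has already appeared. This is a minimal sketch in plain Python, not the method used in the answer above; grouped_bar_positions is a hypothetical helper and the short xs list is a cut-down version of the first column.

```python
from collections import Counter

def grouped_bar_positions(xs, bar_width):
    """Return one bar position per entry, shifting repeats of the same x to the right."""
    seen = Counter()
    positions = []
    for x in xs:
        # The k-th repeat of x is shifted by k bar widths
        positions.append(x + seen[x] * bar_width)
        seen[x] += 1
    return positions

xs = [4, 4, 4, 5, 5, 6]
max_per_group = max(Counter(xs).values())
bar_width = 1 / (max_per_group + 0.5)
locs = grouped_bar_positions(xs, bar_width)

# The result could be passed to plt.bar(locs, y, bar_width)
```

Unlike the index-based version, this does not require the rows to be sorted by x.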
