Matplotlib Bar Plot Grouping Subplots - python

I'm currently generating this plot:
But as you can see, it is taking the items from the arrays below and spreading them across the xticks.
[array([ 1.77009257, 1.57980963, 0.31896943, 0.01874767]), array([ 1.02788175, 0.99604306])]
[array([ 0.20091287, 0.14682076, 0.03212798, 0.00187477]), array([ 0.09545977, 0.11318596])]
What I want is to create a cluster of all four items from the first array over the xtick -2wks and a cluster of the two items of the next array over xtick -1wk.
Bonus points if you can then give each bar in a given cluster the corresponding label from these arrays.
[Index([u'AL GAINESVILLE LOCK', u'AL GREENSBORO', u'AL HIGHLAND HOME', u'AL BREWTON 3 SSE'],dtype='object', name=u'StateCity'), Index([u'AL GREENSBORO', u'AL GAINESVILLE LOCK'], dtype='object', name=u'StateCity')]

You might be best off using pandas plot for this. The second example here looks very similar to what you would like to achieve if I understand you correctly.
If you transpose your data so that the labels in the Index objects from your question make up the columns and your xticks make up the new index, you should get what you are looking for.
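A minimal sketch of that layout, assuming the first array belongs to xtick -2wks and the second to -1wk, as described in the question. The variable names `two_weeks` and `one_week` are made up for illustration, and the `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Values and labels copied from the question; one Series per xtick.
two_weeks = pd.Series(
    [1.77009257, 1.57980963, 0.31896943, 0.01874767],
    index=["AL GAINESVILLE LOCK", "AL GREENSBORO",
           "AL HIGHLAND HOME", "AL BREWTON 3 SSE"],
)
one_week = pd.Series(
    [1.02788175, 0.99604306],
    index=["AL GREENSBORO", "AL GAINESVILLE LOCK"],
)

# After the transpose, rows are the xticks and columns are the bars
# inside each cluster. Stations missing for a week become NaN (no bar).
df = pd.DataFrame({"-2wks": two_weeks, "-1wk": one_week}).T

# pandas draws one cluster per row and adds a legend entry per column.
ax = df.plot(kind="bar", rot=0)
```

Each station gets its own color and legend entry, which covers the labeling part of the question as long as a legend is acceptable in place of per-bar text.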

Related

Create a scatter plot from an ndarray using the position in the row as the x-axis value and the value in the array for the y-axis

Much like the title says, I am trying to create a graph that shows 1-6 on the x-axis (the value's position in the row of the array) and its value on the y-axis. A snippet from the array is shown below, with each column representing a coefficient number from 1-6.
[0.99105 0.96213 0.96864 0.96833 0.96698 0.97381]
[0.99957 0.99709 0.9957 0.9927 0.98492 0.98864]
[0.9967 0.98796 0.9887 0.98613 0.98592 0.99125]
[0.9982 0.99347 0.98943 0.96873 0.91424 0.83831]
[0.9985 0.99585 0.99209 0.98399 0.97253 0.97942]
It's already set up as a numpy array. I think it's relatively straightforward, just drawing a complete mental blank.
Any ideas?
Do you want something like this?
import numpy as np
import matplotlib.pyplot as plt

a = np.array([[0.99105, 0.96213, 0.96864, 0.96833, 0.96698, 0.97381],
              [0.99957, 0.99709, 0.9957, 0.9927, 0.98492, 0.98864],
              [0.9967, 0.98796, 0.9887, 0.98613, 0.98592, 0.99125],
              [0.9982, 0.99347, 0.98943, 0.96873, 0.91424, 0.83831],
              [0.9985, 0.99585, 0.99209, 0.98399, 0.97253, 0.97942]])
# x repeats 1..6 once per row; scatter flattens both x and y
plt.scatter(x=np.tile(np.arange(a.shape[1]), a.shape[0])+1, y=a)
output:
Note that you can emulate the same with groups using:
plt.plot(a.T, marker='o', ls='')
x = np.arange(a.shape[1])  # one tick per column (coefficient number)
plt.xticks(x, x+1)
output:

I want to detect ranges with the same numerical boundaries of a dataset using matplotlib or pandas in python 3.7

I have a ton of ranges, all numeric. Each range has a minimum and a maximum that cannot be exceeded. If the maximum of one range reaches above the minimum of another, there is a small area that covers both of them, so you could write one range that includes the others.
I want to see whether some ranges overlap, or whether I can find a few ranges that cover most of the others. The goal is to simplify them by using one range in place of the ranges that fit inside it. For example, 7.8 - 9.6 and 7.9 - 9.6 can be covered with one range.
You can see my attempt to visualize them. But when I use my entire dataset, consisting of hundreds of ranges, my graph is no longer useful.
I know that I can detect recurring ranges using Python, but I don't want to know how often a range occurs. I want to know how many ranges lie within the same numerical boundaries, and whether a couple of ranges can cover all of them. Finally, my goal is to have the master ranges sorted into categories: range 1 covering 50 other ranges, then range 2 covering 25 ranges, and so on.
My current program shows the overlap of the ranges, but I also want that as printed output with the exact numbers.
It would be nice if you could share some ideas for solving this problem, or any suggestions on tools within Python 3.7.
import matplotlib.pyplot as plt
intervals = [[3.6,4.5],
[3.6,4.5],
[7.8,9.6],
[7.9,9.6],
[7.8,9.6],
[3.4,4.1],
[2.8,3.4],
[8.25,9.83],
[3.62,3.96],
[8.25,9.83],
[0.62,0.68],
[2.15,2.49],
[0.8,1.0],
[0.8,1.0],
[3.1,3.9],
[6.7,8.3],
[1,1.5],
[1,1.2],
[1.5,1.8],
[1.8,2.5],
[3,4.0],
[6.5,8.0],
[1.129,1.35],
[2.82,3.38],
[1.69,3.38],
[3.38,6.21],
[2.25,2.82],
[5.649,6.214],
[1.920,6.214]
]
for interval in intervals:
    # draw each range as a thick translucent bar; overlaps appear darker
    plt.plot(interval, [0, 0], 'b', alpha=0.2, linewidth=100)
plt.show()
Here is an idea: make a pandas DataFrame from the array, then subtract column 1 from column 2 (column 1 is x and column 2 is y). After that, create a histogram of the resulting range widths and their frequencies.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
intervals = [[3.6,4.5],
[3.6,4.5],
[7.8,9.6],
[7.9,9.6],
[7.8,9.6],
[3.4,4.1],
[2.8,3.4],
[8.25,9.83],
[3.62,3.96],
[8.25,9.83],
[0.62,0.68],
[2.15,2.49],
[0.8,1.0],
[0.8,1.0],
[3.1,3.9],
[6.7,8.3],
[1,1.5],
[1,1.2],
[1.5,1.8],
[1.8,2.5],
[3,4.0],
[6.5,8.0],
[1.129,1.35],
[2.82,3.38],
[1.69,3.38],
[3.38,6.21],
[2.25,2.82],
[5.649,6.214],
[1.920,6.214]]
intervals_ar = np.array(intervals)
df = pd.DataFrame({'Column1': intervals_ar[:, 0], 'Column2': intervals_ar[:, 1]})
df['Ranges'] = df['Column2'] - df['Column1']
print(df)
frequency_range = df['Ranges'].value_counts().sort_index()
print(frequency_range)
df.Ranges.value_counts().sort_index().plot(kind = 'hist', bins = 5)
plt.title("Histogram Frequency vs Range (column 2 - column 1)")
plt.show()
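The histogram groups ranges by width, not by where they sit on the number line. A different technique, which is not what the answer above does, is to sort the ranges and merge the ones that overlap, counting how many inputs fall into each merged "master" range. The sketch below is one possible way to do that; the helper name `merge_intervals` and the shortened `sample` list are made up for illustration, and ranges that merely touch (one ends exactly where the next starts) are treated as overlapping.

```python
def merge_intervals(intervals):
    """Merge overlapping [low, high] ranges.

    Returns (low, high, count) tuples, where count is the number of
    input ranges absorbed, sorted with the largest count first.
    """
    merged = []  # each entry is [low, high, count]
    for low, high in sorted(intervals):
        if merged and low <= merged[-1][1]:
            # Overlaps (or touches) the current master range: extend it.
            merged[-1][1] = max(merged[-1][1], high)
            merged[-1][2] += 1
        else:
            # Gap before this range: start a new master range.
            merged.append([low, high, 1])
    return sorted((tuple(m) for m in merged),
                  key=lambda m: m[2], reverse=True)


# A few of the ranges from the question.
sample = [[3.6, 4.5], [3.6, 4.5], [7.8, 9.6], [7.9, 9.6],
          [7.8, 9.6], [8.25, 9.83], [0.62, 0.68]]

for low, high, count in merge_intervals(sample):
    print(f"{low} - {high} covers {count} ranges")
```

Because merging is transitive, a chain of partially overlapping ranges collapses into one master range that can be wider than any single input. If that is too coarse for the full dataset, the same loop could be adapted to only absorb ranges that are fully contained in the current one.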

How to plot errorbar in line chart from dataframes

I have two dataframes, df_avg and df_sem, which contain mean values and standard errors of the means, respectively. For example:
KPCmb1 KPCmb1IA KPCmb2 KPCmb3 KPCmb4 KPCmb5 KPCmb6
temp
19.99 15.185905 24.954296 22.610052 29.249107 26.151815 34.374257 36.589218
20.08 15.198452 24.998227 22.615342 29.229325 26.187794 34.343738 36.596730
20.23 15.208917 25.055061 22.647499 29.234424 26.193382 34.363549 36.580033
20.47 15.244485 25.092773 22.691421 29.206816 26.202425 34.337385 36.640839
20.62 15.270921 25.145798 22.720752 29.217821 26.235101 34.364162 36.600030
and
KPCmb1 KPCmb1IA KPCmb2 KPCmb3 KPCmb4 KPCmb5 KPCmb6
temp
19.99 0.342735 0.983424 0.131502 0.893494 1.223318 0.536450 0.988185
20.08 0.347366 0.983732 0.136239 0.898661 1.230763 0.534779 0.993970
20.23 0.348641 0.981614 0.134729 0.898790 1.227567 0.529240 1.005609
20.47 0.350937 0.993973 0.138411 0.881142 1.237749 0.526841 0.991591
20.62 0.345863 0.983064 0.132934 0.883863 1.234746 0.533048 0.987520
I want to plot a line chart using temp as the x-axis and the dataframe columns as the y-axes. I also want to use the df_sem dataframe to provide error bars for each line (note the column names are the same between the two dataframes).
I can achieve this with the following code:
df_avg.plot(yerr=df_sem), but this does not allow me to change many aspects of the plot, like DPI, labels, and things like that.
So I've tried to make the plot using the following code as an alternative:
plt.figure()
x = df_avg.index
y = df_avg
plt.errorbar(x,y,yerr=df_sem)
plt.show()
But this gives me the error: ValueError: shape mismatch: objects cannot be broadcast to a single shape
How do I go about making the same chart that I am able to using pandas plotting with matplotlib plotting?
Thanks!
You can do just a simple for loop:
for col in df_avg.columns:
    # one errorbar line per column, matched by column name
    plt.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col], label=col)
plt.legend()
Output:
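Since the question was also about controlling DPI and labels, here is a hedged sketch of the same loop on an explicit figure and axes. It uses only the first three rows and two columns of the tables in the question; the DPI, figure size, y-label text and `capsize` are arbitrary choices, and the `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Small excerpt of the data shown in the question.
temp = pd.Index([19.99, 20.08, 20.23], name="temp")
df_avg = pd.DataFrame({"KPCmb1": [15.185905, 15.198452, 15.208917],
                       "KPCmb2": [22.610052, 22.615342, 22.647499]},
                      index=temp)
df_sem = pd.DataFrame({"KPCmb1": [0.342735, 0.347366, 0.348641],
                       "KPCmb2": [0.131502, 0.136239, 0.134729]},
                      index=temp)

# Creating the figure yourself gives control over DPI and size.
fig, ax = plt.subplots(figsize=(6, 4), dpi=150)

for col in df_avg.columns:
    ax.errorbar(df_avg.index, df_avg[col], yerr=df_sem[col],
                label=col, capsize=3)

ax.set_xlabel("temp")
ax.set_ylabel("mean value")  # placeholder label
ax.legend()
```

The pandas call from the question can be pointed at the same axes with `df_avg.plot(yerr=df_sem, ax=ax)`, which keeps the one-liner while still letting you set DPI and labels on `fig` and `ax`.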

Python sort_values plot is inverted

New Python learner here. This seems like a very simple task but I can't do it to save my life.
All I want to do is to grab 1 column from my DataFrame, sort it, and then plot it. THAT'S IT. But when I plot it, the graph is inverted. Upon examination, I find that the values are sorted, but the index is not...
Here is my simple 3 liner code:
testData = pd.DataFrame([5,2,4,2,5,7,9,7,8,5,4,6],[9,4,3,1,5,6,7,5,4,3,7,8])
x = testData[0].sort_values()
plt.plot(x)
edit:
Using matplotlib
If you're talking about ordering the values sequentially on the x-axis (0, 1, 2, 3, 4, ...), you need to re-index your values.
x = testData[0].sort_values()
x.index = range(len(x))
plt.plot(x)
Otherwise, if you want your values sorted in the data frame but displayed in order of index, you want a scatter plot, not a line plot:
plt.scatter(x.index, x.values)
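The underlying reason is that `sort_values()` keeps each value's original index label, and plotting a Series uses those labels as x-coordinates. A small sketch with the data from the question shows this, using `reset_index(drop=True)` as an alternative to assigning `range(len(x))`; the name `x_reset` is made up for illustration.

```python
import pandas as pd

testData = pd.DataFrame([5, 2, 4, 2, 5, 7, 9, 7, 8, 5, 4, 6],
                        [9, 4, 3, 1, 5, 6, 7, 5, 4, 3, 7, 8])

x = testData[0].sort_values()
# The values are sorted, but the index still holds the original labels,
# so a line plot connects the points in value order at scattered x positions.
print(x.index.tolist())

# Dropping the old index gives positions 0..n-1 for the x-axis.
x_reset = x.reset_index(drop=True)
print(x_reset.index.tolist())
```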

Plot rolling mean together with data

I have a DataFrame that looks something like this:
####delays:
Worst case Avg case
2014-10-27 2.861433 0.953108
2014-10-28 2.899174 0.981917
2014-10-29 3.080738 1.030154
2014-10-30 2.298898 0.711107
2014-10-31 2.856278 0.998959
2014-11-01 3.118587 1.147104
...
I would like to plot the data of this DataFrame together with its rolling mean. The data itself should be a dotted line and the rolling mean a solid line. The worst case column should be in red, while the average case column should be in blue.
I've tried the following code:
import pandas as pd
import matplotlib.pyplot as plt
rolling = pd.rolling_mean(delays, 7)
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
plt.title('Delays per day on entire network')
plt.xlabel('Date')
plt.ylabel('Minutes')
plt.show()
Unfortunately, this gives me 2 different plots. One with the data and one with the rolling mean. Also, the worst case column and average case column are both in red.
How can I get this to work?
You need to tell pandas where you want to plot. By default, pandas creates a new figure.
Just replace these 2 lines:
delays.plot(x_compat=True, style='r--')
rolling.plot(style='r')
with:
ax_delays = delays.plot(x_compat=True, style='--', color=["r","b"])
rolling.plot(color=["r","b"], ax=ax_delays, legend=0)
In the 2nd line you now tell pandas to plot on ax_delays, and not to show the legend again.
To get 2 different colors for the 2 lines, pass one color per column in the color argument (see above).
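Note that `pd.rolling_mean` from the question was removed in later pandas releases; the replacement is `DataFrame.rolling(window).mean()`. Below is a self-contained sketch of the whole plot using the six rows shown in the question. The window of 3 is an assumption made only because the excerpt is too short for a 7-day window, and the `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The six rows shown in the question.
delays = pd.DataFrame(
    {"Worst case": [2.861433, 2.899174, 3.080738, 2.298898, 2.856278, 3.118587],
     "Avg case": [0.953108, 0.981917, 1.030154, 0.711107, 0.998959, 1.147104]},
    index=pd.date_range("2014-10-27", periods=6, freq="D"),
)

# Modern replacement for pd.rolling_mean(delays, 3).
rolling = delays.rolling(3).mean()

# First call creates the axes: dashed raw data, red and blue.
ax = delays.plot(x_compat=True, style="--", color=["r", "b"])
# Second call reuses the same axes: solid rolling mean, no second legend.
same_ax = rolling.plot(x_compat=True, ax=ax, color=["r", "b"], legend=False)

ax.set_title("Delays per day on entire network")
ax.set_xlabel("Date")
ax.set_ylabel("Minutes")
```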
