Dear members of stackoverflow,
I want to create a circular frequency histogram (rose diagram) using the frequencies for each bin, listed as a single column in a text file. How could I do this using matplotlib.pyplot and numpy in Python 3?
I made an initial attempt with code I found on the internet, but in the resulting rose diagram the bins overlap when they should sit beside each other. Another detail: the radius of each bin should equal its frequency, but this also changes and does not match my frequencies.
I want my bins to go from 0 to 360 degrees with a width of 10 degrees, for example 0-10, 10-20, etc.
This is a sample of the text file with the frequencies (frequencies.txt):
0
0
0
0
0
2
0
1
1
0
1
0
0
1
2
29
108
262
290
184
81
25
7
2
3
1
1
0
0
0
0
0
0
0
0
0
You could create a polar bar plot. The angles need to be converted from degrees to radians.
frequencies = np.loadtxt('frequencies.txt') would read the values from the file.
import numpy as np
import matplotlib.pyplot as plt
frequencies = [0, 0, 0, 0, 0, 2, 0, 1, 1, 0, 1, 0, 0, 1, 2, 29, 108, 262, 290,
184, 81, 25, 7, 2, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
fig = plt.figure()
ax = plt.axes(polar=True)
theta = np.radians(np.arange(0, 360, 10))
width = np.radians(10)
ax.bar(theta, frequencies, width=width,
facecolor='lightblue', edgecolor='red', alpha=0.5, align='edge')
ax.set_xticks(theta)
plt.show()
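A common cause of overlapping bars on a polar axis is passing degrees (or leaving the default bar width) where radians are expected. Here is a minimal sketch of the bin setup, assuming the file holds one count per line for 36 bins; an in-memory io.StringIO stands in for frequencies.txt so the snippet runs on its own:

```python
import io
import numpy as np

# Stand-in for frequencies.txt (assumption: one count per line, 36 lines).
sample = "\n".join(str(v) for v in
    [0, 0, 0, 0, 0, 2, 0, 1, 1, 0, 1, 0, 0, 1, 2, 29, 108, 262, 290,
     184, 81, 25, 7, 2, 3, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
frequencies = np.loadtxt(io.StringIO(sample))

bin_width_deg = 10
left_edges_deg = np.arange(0, 360, bin_width_deg)  # 0, 10, ..., 350
theta = np.radians(left_edges_deg)                 # polar axes expect radians
width = np.radians(bin_width_deg)                  # bar width in radians too

# One frequency per bin, and the bins cover the circle exactly once.
assert len(frequencies) == len(theta)
assert np.isclose(theta[-1] + width, 2 * np.pi)

# Bin holding the largest count, reported as a degree range.
peak = int(np.argmax(frequencies))
print(f"peak bin: {left_edges_deg[peak]}-{left_edges_deg[peak] + bin_width_deg} degrees")
```

With theta and width built this way, align='edge' in ax.bar makes each bar start at its left edge, so neighbouring bins touch instead of overlapping.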
I started learning how to plot data in Python and I need help achieving the following:
I have the following example df6:
df6 = pd.DataFrame({
'emails': [50, 60 ,30, 40, 90, 10, 0,85 ],
'delivered': [20, 16 ,6, 15, 66, 6, 0,55 ]
})
df6
Looks like:
   emails  delivered
0      50         20
1      60         16
2      30          6
3      40         15
4      90         66
5      10          6
6       0          0
7      85         55
I need to plot emails vs. delivered in a four-quadrant chart. The x and y ranges should extend slightly beyond the maximum values, and the quadrant lines should cross at the means of both columns.
What I did so far: I used describe() to get the statistics of df6, then:
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.axhline(y=45.6, color="black", linestyle="--")
plt.axvline(x=23, color="black", linestyle="--")
plt.plot(df6['delivered'],df6['emails'],"o")
plt.xlim([0, df6['delivered'].max()+20])
plt.ylim([0, df6['emails'].max()+20])
plt.show()
I got the following output so far:
What I am looking for is a chart split into four quadrants, with each quadrant labeled with the count of points that fall in it:
I found it easier to normalize the data before plotting. UPDATE: I had attached the counts to the wrong quadrants; the version below places each count in the quadrant it was computed for.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scale = scaler.fit(df6)
# standardize each column (zero mean, unit variance)
norm_df = pd.DataFrame(scale.transform(df6), columns=df6.columns)
# the plot uses x = delivered and y = emails
quadrant_1 = sum(np.logical_and(norm_df['emails'] < 0, norm_df['delivered'] < 0))  # lower left
print(quadrant_1)
quadrant_2 = sum(np.logical_and(norm_df['emails'] > 0, norm_df['delivered'] < 0))  # upper left
print(quadrant_2)
quadrant_3 = sum(np.logical_and(norm_df['emails'] < 0, norm_df['delivered'] > 0))  # lower right
print(quadrant_3)
quadrant_4 = sum(np.logical_and(norm_df['emails'] > 0, norm_df['delivered'] > 0))  # upper right
print(quadrant_4)
fig, ax = plt.subplots()
fig.set_size_inches(7, 5)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.axhline(y=0, color="black", linestyle="--")
plt.axvline(x=0, color="black", linestyle="--")
plt.plot(norm_df['delivered'],norm_df['emails'],"o")
plt.gca().spines['bottom'].set_visible(False)
plt.gca().spines['left'].set_visible(False)
plt.gca().axes.get_xaxis().set_visible(False)
plt.gca().axes.get_yaxis().set_visible(False)
plt.text(0,-2.1,'Delivered',horizontalalignment='center', verticalalignment='center')
plt.text(-2.1,0,'Emails', horizontalalignment='center', verticalalignment='center', rotation=90)
# each count goes at the (x, y) position inside its own quadrant
plt.text(1,1,'Count: ' + str(quadrant_4),horizontalalignment='center', verticalalignment='center')
plt.text(-1,1,'Count: ' + str(quadrant_2), horizontalalignment='center', verticalalignment='center')
plt.text(-1,-1,'Count: ' + str(quadrant_1),horizontalalignment='center', verticalalignment='center')
plt.text(1,-1,'Count: ' + str(quadrant_3), horizontalalignment='center', verticalalignment='center')
plt.xlim([-2, 2])
plt.ylim([-2, 2])
plt.show()
So to use the means in your plots you can start by simply modifying these 2 lines:
plt.axhline(y=df6['emails'].mean(), color="black", linestyle="--")
plt.axvline(x=df6['delivered'].mean(), color="black", linestyle="--")
We can then use DataFrame.value_counts to compute the counts:
counts = df6.transform(lambda s: s >= s.mean()).value_counts()
pos = df6.agg(['min', 'max'])
Here counts contains the values of each pair of above/below means:
emails  delivered
False   False        4
True    False        2
        True         2
and pos contains the x/y (or email/delivered) coordinates at which the boxes are placed:
     emails  delivered
min       0          0
max      90         66
So you can adjust pos to change the annotation placement.
Finally you want to do the annotation on the figure:
# Series.items() replaces iteritems(), which was removed in pandas 2.0
for (eml, dlv), num in counts.items():
    ax.text(s=f'count: {num}',
            x=pos.loc['max' if dlv else 'min', 'delivered'],
            y=pos.loc['max' if eml else 'min', 'emails'],
            ha='right' if dlv else 'left',
            va='top' if eml else 'bottom',
            )
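Note that value_counts only lists the combinations that actually occur, so a quadrant with no points gets no label at all. If you want a "count: 0" label there too, you can count every combination explicitly. A minimal sketch with plain numpy, using the same data; the names emails_hi, delivered_hi and quadrant_counts are only illustrative:

```python
import numpy as np

emails = np.array([50, 60, 30, 40, 90, 10, 0, 85])
delivered = np.array([20, 16, 6, 15, 66, 6, 0, 55])

# True where a value is at or above the mean of its own column.
emails_hi = emails >= emails.mean()
delivered_hi = delivered >= delivered.mean()

# Count the points for every (emails_hi, delivered_hi) combination,
# including combinations that contain no points.
quadrant_counts = {
    (e, d): int(np.sum((emails_hi == e) & (delivered_hi == d)))
    for e in (False, True)
    for d in (False, True)
}

for (e, d), n in quadrant_counts.items():
    print(f"emails {'high' if e else 'low'}, delivered {'high' if d else 'low'}: {n}")
```

The dictionary keys have the same (emails, delivered) order as the index of counts above, so the same annotation loop should work on quadrant_counts.items().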
You are just missing the code that sets the position of the left and bottom spines:
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
df6 = pd.DataFrame({'emails': [50, 60 ,30, 40, 90, 10, 0,85 ],
                    'delivered': [20, 16 ,6, 15, 66, 6, 0,55 ]})
plt.plot(df6['delivered'],df6['emails'],"o")
# count the points in the upper-left quadrant (emails above their mean,
# delivered below theirs), which is where the annotation at (5, 60) sits
count = np.count_nonzero(
    (df6['emails'] > df6['emails'].mean()) &
    (df6['delivered'] < df6['delivered'].mean()))
plt.annotate('count: %s'%count,(5,60))
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
# move the remaining spines so they cross at the column means
plt.gca().spines['left'].set_position(('data',df6['delivered'].mean()))
plt.gca().spines['bottom'].set_position(('data',df6['emails'].mean()))
plt.show()
Here's another solution, with a more symmetric looking plot:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(
{
"emails": [50, 60, 30, 40, 90, 10, 0, 85],
"delivered": [20, 16, 6, 15, 66, 6, 0, 55],
}
)
plt.plot(df["delivered"], df["emails"], "o")
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["left"].set_position(("data", df["delivered"].mean()))
plt.gca().spines["bottom"].set_position(("data", df["emails"].mean()))
def get_lims(df, column, w=0.1):
    mean = df[column].mean()
    max_diff = max(
        abs(df[column].max() - mean),
        abs(df[column].min() - mean),
    )
    return [mean - max_diff - max_diff * w, mean + max_diff + max_diff * w]
plt.xlim(get_lims(df, "delivered"))
plt.ylim(get_lims(df, "emails"))
plt.show()
I am trying to plot a point-to-point line plot in Python.
My data is in a pandas DataFrame, as below:
df = pd.DataFrame({
'x_coordinate': [0, 0, 0, 0, 1, 1,-1,-1,-2,0],
'y_coordinate': [0, 2, 1, 3, 3, 1,1,-2,2,-1],
})
print(df)
   x_coordinate  y_coordinate
0             0             0
1             0             2
2             0             1
3             0             3
4             1             3
5             1             1
6            -1             1
7            -1            -2
8            -2             2
9             0            -1
When I plot this, it joins the points in the order they appear in the df.
df.plot('x_coordinate','y_coordinate')
But is there a way to plot an order number next to each connection, i.e. the order in which the line travels? Say 1 for the first connection from (0,0) to (0,2), 2 for the one from (0,2) to (0,1), and so on?
The plot is OK. If you want to check how each vertex is plotted, you need modified data, because several of your points share the same x value and their segments overlap. Here is the modified data (x only) and the plot.
df = pd.DataFrame({
'x_coordinate': [0.1, 0.2, 0.3, 0.4, 1.5, 1.6,-1.7,-1.8,-2.9,0.1],
'y_coordinate': [0, 2, 1, 3, 3, 1,1,-2,2,-1],
})
Edit
For your new request, the code is modified as follows (full runnable code).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({
'x_coordinate': [0.1, 0.2, 0.3, 0.4, 1.5, 1.6,-1.7,-1.8,-2.9,0.1],
'y_coordinate': [0, 2, 1, 3, 3, 1,1,-2,2,-1],
})
fig = plt.figure(figsize=(6,5))
ax1 = fig.add_subplot(1, 1, 1)
df.plot('x_coordinate','y_coordinate', legend=False, ax=ax1)
# label each point with its position in the DataFrame (P0, P1, ...)
for i, (x, y) in enumerate(zip(df.x_coordinate.values, df.y_coordinate.values)):
    ax1.annotate("P" + str(i), (x, y))
plt.show()
I found an easier way to do it and thought I would share it:
fig, ax = plt.subplots()
df.plot('x_coordinate','y_coordinate',ax=ax)
for k, v in df[['x_coordinate','y_coordinate']].iterrows():
    ax.annotate('p'+str(k+1), v)
plt.show()
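Both answers label the points. If you would rather number the connections themselves, as the question describes, one option is to put the label at the midpoint of each segment. A minimal sketch with numpy, using the original data; the names mid_x, mid_y and segment_labels are only illustrative:

```python
import numpy as np

x = np.array([0, 0, 0, 0, 1, 1, -1, -1, -2, 0], dtype=float)
y = np.array([0, 2, 1, 3, 3, 1, 1, -2, 2, -1], dtype=float)

# Segment i joins point i and point i + 1, so pair each point with its successor.
mid_x = (x[:-1] + x[1:]) / 2
mid_y = (y[:-1] + y[1:]) / 2

# (order number, midpoint) for each of the len(x) - 1 segments, numbered from 1.
segment_labels = [
    (i, (float(mx), float(my)))
    for i, (mx, my) in enumerate(zip(mid_x, mid_y), start=1)
]

for i, (mx, my) in segment_labels:
    print(f"segment {i}: midpoint ({mx}, {my})")

# With an existing Axes `ax`, each number could then be drawn with:
#     ax.annotate(str(i), (mx, my))
```

With the original data the first few segments all lie on x = 0, so their numbers end up close together; the shifted x values from the first answer would spread them out.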
I am using the following code to draw 5 bars for 3 different data sets a, b and c. How can I show all colors in each bar without their values adding up? For example, if in the first bar the value of Green is 1, Yellow is 3 and Red is 6, I don't want the final value to be 10; it should be 6, but every color should appear up to its own value. I don't want to use transparent colors or only bar outlines.
import matplotlib.pyplot as plt
import numpy as np
a = [1, 2, 3, 4, 5]
b = [3, 4, 1, 10, 9]
c = [6, 7, 2, 4, 6]
ind = np.arange(len(a))
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=a, width=0.35, align='center', label='Green',
facecolor='g')
ax.bar(x=ind, height=b, width=0.35, align='center', label='Yellow',
facecolor='y')
ax.bar(x=ind, height=c, width=0.35, align='center', label='Red', facecolor='r')
plt.xticks(ind, a)
plt.xlabel('Coordination Number')
plt.ylabel('Frequency')
plt.legend()
plt.show()
You give 6 as the final value for the first bar, but it was unclear whether that is meant to be the maximum of the three values. I assumed it is, and scaled each value's share of the total to that maximum.
I created a stacked graph based on the results.
import numpy as np
import pandas as pd
a = [1, 2, 3, 4, 5]
b = [3, 4, 1, 10, 9]
c = [6, 7, 2, 4, 6]
ind = np.arange(len(a))
df = pd.DataFrame({'a':a,'b':b,'c':c}, index=ind)
df['total'] = df.sum(axis=1)
df['max'] = df[['a','b','c']].max(axis=1)
df['aa'] = df['max']*(df['a']/df['total'])
df['bb'] = df['max']*(df['b']/df['total'])
df['cc'] = df['max']*(df['c']/df['total'])
df
   a   b  c  total  max        aa        bb        cc
0  1   3  6     10    6  0.600000  1.800000  3.600000
1  2   4  7     13    7  1.076923  2.153846  3.769231
2  3   1  2      6    3  1.500000  0.500000  1.000000
3  4  10  4     18   10  2.222222  5.555556  2.222222
4  5   9  6     20    9  2.250000  4.050000  2.700000
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=df.loc[:,'aa'], bottom=0, width=0.35, align='center', label='Green',
facecolor='g')
ax.bar(x=ind, height=df.loc[:,'bb'], bottom=df.loc[:,'aa'], width=0.35, align='center', label='Yellow',
facecolor='y')
ax.bar(x=ind, height=df.loc[:,'cc'], bottom=df.loc[:,'aa']+df.loc[:,'bb'], width=0.35, align='center', label='Red', facecolor='r')
plt.xticks(ind, a)
plt.xlabel('Coordination Number')
plt.ylabel('Frequency')
plt.legend()
plt.show()
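Another reading of the question is that each color should end exactly at its own value (Green at 1, Yellow at 3, Red at 6 in the first bar), not at a proportional share. Under that assumption, each series only needs to be drawn from the top of the next-shorter series up to its own value. A minimal sketch of that calculation with numpy; the names bottoms and heights are only illustrative:

```python
import numpy as np

a = [1, 2, 3, 4, 5]     # Green
b = [3, 4, 1, 10, 9]    # Yellow
c = [6, 7, 2, 4, 6]     # Red
values = np.array([a, b, c])  # shape: (3 series, 5 bars)

# For each bar, rank the series from shortest to tallest.
order = np.argsort(values, axis=0, kind="stable")
sorted_vals = np.take_along_axis(values, order, axis=0)

# Each series is visible only above the next-shorter one: its segment starts
# where the shorter one ends, and the whole bar tops out at the maximum.
bottoms_sorted = np.vstack([np.zeros(values.shape[1], dtype=values.dtype),
                            sorted_vals[:-1]])
heights_sorted = sorted_vals - bottoms_sorted

# Scatter the segments back into the original series order.
bottoms = np.empty_like(values)
heights = np.empty_like(values)
np.put_along_axis(bottoms, order, bottoms_sorted, axis=0)
np.put_along_axis(heights, order, heights_sorted, axis=0)

print(bottoms)
print(heights)
```

Row k of heights and bottoms can then be passed to ax.bar(x=ind, height=heights[k], bottom=bottoms[k], ...) with the color of series k. When two series tie in a bar (Green and Red are both 4 in the fourth bar), one of them gets a zero-height segment and is hidden.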
If I understand your question correctly, you want to show all colour bars starting from the same zero baseline and grouped together under their corresponding Number?
I'll use bokeh for plotting, since it provides an easy way to "offset" each bar in the group. To vary the amount of visual offset for each bar, change the second parameter of the dodge function. For this combination of widths, 0.05 seemed like a nice value.
from bokeh.io import output_notebook, output_file, show
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import dodge
output_notebook() # or output_file("chart.html") if not using Jupyter
x_axis_values = [str(x) for x in range(1, 6)]
data = {
"Coordination Number" : x_axis_values,
"Green" : [1, 2, 3, 4, 5],
"Yellow" : [3, 4, 1, 10, 9],
"Red" : [6, 7, 2, 4, 6]
}
src = ColumnDataSource(data=data)
p = figure(
x_range=x_axis_values, y_range=(0, 10), height=275,  # named plot_height in older Bokeh releases
title="Offset Group Bar Chart", toolbar_location=None, tools="")
p.vbar(
x=dodge('Coordination Number', -0.05, range=p.x_range),
top='Green', width=0.2, source=src, color="#8DD3C7", legend_label="Green")
p.vbar(
x=dodge('Coordination Number', 0.0, range=p.x_range),
top='Yellow', width=0.2, source=src, color="#FFD92F", legend_label="Yellow")
p.vbar(
x=dodge('Coordination Number', 0.05, range=p.x_range),
top='Red', width=0.2, source=src, color="#E15759", legend_label="Red")
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
p.xaxis.axis_label = "Coordination Number"
p.yaxis.axis_label = "Frequency"
show(p)
I'm plotting many subplots in the same figure, and the xtick labels of adjacent subplots overlap each other. I do not want any space between the subplots.
Here is an example:
In particular I would like xtick labels not to be above/below the green lines, just like it happens at the points indicated with red squares.
One idea I had so far: in a case where max=4 and min=0, I'd draw the tick labels for 1, 2 and 3 at their respective locations, then draw the label 4 at position 3.8 and the label 0 at position 0.2. Any ideas?
thanks!
Not exactly what you asked for, but a quick solution is to set the alignment parameter:
pylab.xticks(..., horizontalalignment='left')
pylab.yticks(..., verticalalignment='bottom')
This will apply to all ticks.
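If only the outermost labels should move, the alignment can also be set on individual tick labels. A minimal sketch, assuming a single Axes with fixed ticks from 0 to 4 as in the question's example; the plotted data is made up for illustration:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])  # illustrative data only
ax.set_xlim(0, 4)
ax.set_xticks([0, 1, 2, 3, 4])

# Pull only the outermost labels inward; the inner labels stay centered.
labels = ax.get_xticklabels()
labels[0].set_horizontalalignment("left")    # label "0" extends to the right of its tick
labels[-1].set_horizontalalignment("right")  # label "4" extends to the left of its tick

fig.canvas.draw()  # render once; use plt.show() to view it interactively
```

This keeps the ticks themselves at 0 and 4 and only shifts where the label text is anchored, so there is no need to place labels at 0.2 or 3.8.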
This is how I would do it:
axScatter.set_xticks([0, 1, 2, 3, 4 ,5 ,6])
axScatter.set_yticks([-8, -6, -4, -2, 0, 2, 4, 6])
And you can use:
axScatter.yaxis.set_major_formatter(nullfmt)
to make the y-axis labels disappear for the top-right and bottom-right plots. Here nullfmt is presumably a matplotlib.ticker.NullFormatter() instance.
The whole plt.figure routine should look something like this:
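import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
nullfmt = NullFormatter()  # formatter that hides tick labels
fig = plt.figure()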
axplot_topleft = fig.add_subplot(2,2,1)
axplot_topleft.xaxis.set_major_formatter(nullfmt)
axplot_topleft.set_yticks([-8, -6, -4, -2, 0, 2, 4, 6])
axplot_topright = fig.add_subplot(2,2,2)
axplot_topright.xaxis.set_major_formatter(nullfmt)
axplot_topright.yaxis.set_major_formatter(nullfmt)
axplot_bottomleft = fig.add_subplot(2,2,3)
axplot_bottomleft.set_xticks([0, 1, 2, 3, 4 ,5 ,6])
axplot_bottomleft.set_yticks([-8, -6, -4, -2, 0, 2, 4, 6])
axplot_bottomright = fig.add_subplot(2,2,4)
axplot_bottomright.yaxis.set_major_formatter(nullfmt)
axplot_bottomright.set_xticks([0, 1, 2, 3, 4 ,5 ,6])