How to create a pie-chart from pandas DataFrame? - python

I have a dataframe, with Count arranged in descending order, that looks something like this:
df = pd.DataFrame({'Topic': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
'Count': [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]})
But with more than 50 rows.
I would like to create a pie chart for the top 10 topics, with the rest summed up and shown as a single slice labelled "Others" with its percentage. Is it possible to leave the labels off the slices and list them separately in a legend?
Thanking in anticipation

Replace Topic with 'Other' for every row outside the top N using Series.where, aggregate with sum, and then plot the aggregated Series with Series.plot.pie:
import matplotlib.pyplot as plt
N = 10
df['Topic'] = df['Topic'].where(df['Count'].isin(df['Count'].nlargest(N)), 'Other')
s = df.groupby('Topic')['Count'].sum()
# plot the aggregated Series, not the original dataframe
pie = s.plot.pie(legend=False)
#https://stackoverflow.com/a/44076433/2901002
# multiply by 100 so the labels show percentages rather than fractions
labels = [f'{l}, {p:0.1f}%' for l, p in zip(s.index, s / s.sum() * 100)]
plt.legend(bbox_to_anchor=(0.85, 1), loc='upper left', labels=labels)
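The aggregation and the legend labels can be checked without plotting anything. The following is a minimal sketch under stated assumptions: the helper name top_n_with_other is hypothetical (it is not part of pandas), and it slices the sorted frame instead of using isin, so it always keeps exactly n rows even when counts are tied.

```python
import pandas as pd

def top_n_with_other(df, n, label_col="Topic", value_col="Count", other="Other"):
    """Return a Series with the n largest rows plus one summed 'other' entry.

    Hypothetical helper for illustration; not part of pandas.
    """
    ordered = df.sort_values(value_col, ascending=False)
    top = ordered.iloc[:n].set_index(label_col)[value_col]
    if len(ordered) > n:
        # sum everything outside the top n into a single extra entry
        rest = ordered.iloc[n:][value_col].sum()
        top = pd.concat([top, pd.Series({other: rest})])
    return top

df = pd.DataFrame({
    "Topic": list("ABCDEFGHIJKLM"),
    "Count": [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20],
})

s = top_n_with_other(df, 10)
# percentage of the total for each slice, formatted for a legend
labels = [f"{name}, {pct:.1f}%" for name, pct in zip(s.index, s / s.sum() * 100)]
print(labels[-1])  # Other, 11.5%
```

The resulting Series can be passed to s.plot.pie(...) and the labels list to plt.legend(labels=labels), as in the answer above.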

You need to build a new dataframe. This assumes your counts are sorted in descending order (if not, use df.sort_values(by='Count', ascending=False, inplace=True)):
TOP = 10
df2 = df.iloc[:TOP]
# DataFrame.append was removed in pandas 2.0, so build the 'Other' row and use pd.concat
other = pd.DataFrame({'Topic': ['Other'], 'Count': [df['Count'].iloc[TOP:].sum()]})
df2 = pd.concat([df2, other], ignore_index=True)
df2.set_index('Topic').plot.pie(y='Count', legend=False)
[Figure: example pie charts for N=10 and N=5]
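Percentages in the legend:
N = 5
df2 = df.iloc[:N]
other = pd.DataFrame({'Topic': ['Other'], 'Count': [df['Count'].iloc[N:].sum()]})
df2 = pd.concat([df2, other], ignore_index=True)
df2.set_index('Topic').plot.pie(y='Count', legend=False)
# convert the counts to percentages of the total for the legend labels
pct = df2['Count'] / df2['Count'].sum() * 100
leg = plt.legend(labels=[f'{p:.1f}%' for p in pct])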
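To address the second part of the question (no labels on the slices, names and percentages in a separate legend), one option is to call matplotlib's Axes.pie directly and hand the wedges to Axes.legend. This is only a sketch: the label format, figure size, and legend position are assumptions, and the Agg backend is selected so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Topic": list("ABCDEFGHIJKLM"),
    "Count": [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20],
})

N = 5
top = df.iloc[:N]
other = pd.DataFrame({"Topic": ["Other"], "Count": [df["Count"].iloc[N:].sum()]})
df2 = pd.concat([top, other], ignore_index=True)

# legend text: topic name plus its share of the total
pct = df2["Count"] / df2["Count"].sum() * 100
legend_labels = [f"{t} ({p:.1f}%)" for t, p in zip(df2["Topic"], pct)]

fig, ax = plt.subplots(figsize=(6, 4))
# no labels= argument, so nothing is written next to the slices
wedges = ax.pie(df2["Count"])[0]
# place the legend to the right of the pie
ax.legend(wedges, legend_labels, loc="center left", bbox_to_anchor=(1.0, 0.5))
# fig.savefig("pie.png", bbox_inches="tight")  # uncomment to write the chart to disk
```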

import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'Topic': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M'],
'Count': [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20]})
df.index = df.Topic
plot = df.plot.pie(y='Count', figsize=(5, 5))
plt.show()
See the documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.pie.html

Related

Pandas/Matplotlib: How do I plot in groups and color code based on another column?

I hoped this would be very simple, but I wasted way too much time on this already. There has to be a simple way of doing this.
I have a very simple dataframe:
I simply want to plot a bar chart that groups by the column "data_range", so that I have three bars per group showing the mean values for the three "trade_types".
df.groupby('data_range')['mean'].plot(legend=True)
The closest I got to making this happen was with this code. It returned this plot:
Which is already close, except that I want bars, label each group with the corresponding data_range and have the same color for each trade_type (also displayed in the legend). If I use .bar after .plot, I receive three different plots instead of one. How do I simply create a bar plot, that shows each data_range group and makes it comparable?
You can pivot your table first, and then the bar plot will work as you want.
import pandas as pd
#making a table like yours but with different values
df = pd.DataFrame({
'data_range':['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
'trade_type':['T1', 'T2', 'T3', 'T1', 'T2', 'T3', 'T1', 'T2', 'T3'],
'mean':[17, 11, 18, 15, 15, 11, 11, 6, 16],
})
#pivot the table so each trade type is a column
piv_df = df.pivot(index='data_range',columns='trade_type',values='mean')
#print(piv_df) #this is what the pivoted table looks like
# T1 T2 T3
#A 17 11 18
#B 15 15 11
#C 11 6 16
piv_df.plot.bar()
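One caveat: DataFrame.pivot raises a ValueError when an (index, columns) pair occurs more than once. If the real data has several rows per data_range/trade_type pair, pivot_table with an aggregation function is one way around that. A minimal sketch with made-up values, not the asker's data:

```python
import pandas as pd

# Made-up data: the (A, T1) pair appears twice, which DataFrame.pivot would reject
df = pd.DataFrame({
    "data_range": ["A", "A", "A", "B", "B"],
    "trade_type": ["T1", "T1", "T2", "T1", "T2"],
    "mean": [10.0, 20.0, 11.0, 15.0, 6.0],
})

# pivot_table aggregates duplicate pairs instead of failing
piv_df = df.pivot_table(index="data_range", columns="trade_type",
                        values="mean", aggfunc="mean")
print(piv_df)
# piv_df.plot.bar() then draws one group of bars per data_range, as before
```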
There's also a great plotting library called seaborn, which builds on matplotlib and allows more customization than the pandas built-in plots. Here's how the same plot could be made in seaborn:
import seaborn as sns
import pandas as pd
#making a table like yours but with different values
df = pd.DataFrame({
'data_range':['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
'trade_type':['T1', 'T2', 'T3', 'T1', 'T2', 'T3', 'T1', 'T2', 'T3'],
'mean':[17, 11, 18, 15, 15, 11, 11, 6, 16],
})
sns.barplot(
x = 'data_range',
y = 'mean',
hue = 'trade_type',
data = df,
)

Drawing box-plot without tails only max and min on the edges of the rectangle in python

I would like to draw a box plot for the data set below, but I don't need the tails. I need a rectangle with the max and min on its edges. It does not have to be a rectangle; it could be a thick line.
Please help.
Thank you.
import seaborn as sns
import pandas as pd
df=pd.DataFrame({'grup':['a','a','a','a','b','b','b','c','c','c','c','c','c','c'],'X1':
[10,9,12,5,20,43,28,40,65,78,65,98,100,150]})
df
ax = sns.boxplot(x="grup", y="X1", data=df, palette="Set3")
You can create a barplot, using the minimums as 'bottom' and the difference between maximums and minimums as heights.
Note that a barplot has a "sticky" bottom, fixing the lowest point of the y-axis to the lowest bar. As a remedy, we can change the ylim.
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'grup': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c'],
'X1': [10, 9, 12, 5, 20, 43, 28, 40, 65, 78, 65, 98, 100, 150]})
grups = np.unique(df['grup'])
bottoms = [df[df['grup'] == g]['X1'].min() for g in grups]
heights = [df[df['grup'] == g]['X1'].max() - g_bottom for g, g_bottom in zip(grups, bottoms)]
ax = sns.barplot(x=grups, y=heights, bottom=bottoms, palette="Set3", ec='black')
# for reference, show where the values are; leave this line out for the final plot
sns.stripplot(x='grup', y='X1', color='red', s=10, data=df, ax=ax)
ax.set_xlabel('grup') # needed because the barplot isn't directly using a dataframe
ax.set_ylabel('X1')
ax.set_ylim(ymin=0)
Update: adding the minimum and maximum values:
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.DataFrame({'grup': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c'],
'X1': [10, 9, 12, 5, 20, 43, 28, 40, 65, 78, 65, 98, 100, 150]})
grups = np.unique(df['grup'])
bottoms = np.array([df[df['grup'] == g]['X1'].min() for g in grups])
tops = np.array([df[df['grup'] == g]['X1'].max() for g in grups])
ax = sns.barplot(x=grups, y=tops - bottoms, bottom=bottoms, palette="Set3", ec='black')
ax.set_xlabel('grup') # needed because the barplot isn't directly using a dataframe
ax.set_ylabel('X1')
ax.set_ylim(ymin=0)
for i, (grup, bottom, top) in enumerate(zip(grups, bottoms, tops)):
    ax.text(i, bottom, f'\n{bottom}', ha='center', va='center')
    ax.text(i, top, f'{top}\n', ha='center', va='center')
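The per-group minimum and maximum can also be computed in one step with groupby and drawn with plain matplotlib. This is a sketch of the same idea (bars that start at the minimum and end at the maximum), not a replacement for the seaborn code above; the bar color is an arbitrary choice and the Agg backend is selected so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "grup": ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c"],
    "X1": [10, 9, 12, 5, 20, 43, 28, 40, 65, 78, 65, 98, 100, 150],
})

# one row per group with its minimum and maximum
stats = df.groupby("grup")["X1"].agg(["min", "max"])

fig, ax = plt.subplots()
# each bar starts at the group's minimum and spans up to its maximum
bars = ax.bar(stats.index, stats["max"] - stats["min"], bottom=stats["min"],
              color="lightsteelblue", edgecolor="black")
ax.set_xlabel("grup")
ax.set_ylabel("X1")
ax.set_ylim(bottom=0)
```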

Plotly python bar plot stack order

Here is my code for the dataframe
df = pd.DataFrame({'var_1':['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
'var_2':['m', 'n', 'o', 'm', 'n', 'o', 'm', 'n', 'o'],
'var_3':[np.random.randint(25, 33) for _ in range(9)]})
Here is the dataframe that I have
  var_1 var_2  var_3
0     a     m     27
1     a     n     28
2     a     o     28
3     b     m     31
4     b     n     30
5     b     o     25
6     c     m     27
7     c     n     32
8     c     o     27
Here is the code I used to get the stacked bar plot
fig = px.bar(df, x='var_3', y='var_1', color='var_2', orientation='h', text='var_3')
fig.update_traces(textposition='inside', insidetextanchor='middle')
fig
But I want the bars to stack in descending order of the values, with the largest at the start/bottom and the smallest at the top.
How should I update the layout to get that?
import numpy as np
import pandas as pd
import plotly.graph_objects as go
df = pd.DataFrame({'var_1':['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'var_2':['m', 'n', 'o', 'm', 'n', 'o', 'm', 'n', 'o'],
                   'var_3':[np.random.randint(25, 33) for _ in range(9)]})
df.sort_values(['var_1', 'var_3'], ignore_index=True, inplace=True, ascending=False)
# colors
colors = {'o': 'red',
          'm': 'blue',
          'n': 'green'}
# show each var_2 value in the legend only the first time it appears
first_seen = ~df['var_2'].duplicated()
# traces
data = []
# loop across the different rows
for i in range(df.shape[0]):
    data.append(go.Bar(x=[df['var_3'][i]],
                       y=[df['var_1'][i]],
                       orientation='h',
                       text=str(df['var_3'][i]),
                       marker=dict(color=colors[df['var_2'][i]]),
                       name=df['var_2'][i],
                       legendgroup=df['var_2'][i],
                       showlegend=bool(first_seen[i])))
# layout
layout = dict(barmode='stack',
              yaxis={'title': 'var_1'},
              xaxis={'title': 'var_3'})
# figure
fig = go.Figure(data=data, layout=layout)
fig.update_traces(textposition='inside', insidetextanchor='middle')
fig.show()
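The ordering logic can be checked on its own, without drawing anything. The sketch below is an illustration under stated assumptions: it uses the fixed values printed in the question instead of random ones, sorts var_1 in ascending order for readability, and adds a show_legend column (a hypothetical name) that flags the first occurrence of each var_2 value. It does not call Plotly.

```python
import pandas as pd

# Fixed values copied from the dataframe printed in the question
df = pd.DataFrame({
    "var_1": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "var_2": ["m", "n", "o", "m", "n", "o", "m", "n", "o"],
    "var_3": [27, 28, 28, 31, 30, 25, 27, 32, 27],
})

# Within each var_1 bar, put the largest var_3 first so it is stacked first
ordered = df.sort_values(["var_1", "var_3"], ascending=[True, False],
                         ignore_index=True)

# True only for the first row of each var_2 value; usable as a showlegend flag
ordered["show_legend"] = ~ordered["var_2"].duplicated()
print(ordered)
```

Adding one trace per row of ordered, in this order, stacks each bar from its largest segment to its smallest.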

Aggregate functions block plot interactivity in Altair

Consider the following code, adapted from the Altair website:
import altair as alt
import pandas as pd
source = pd.DataFrame({
'a': ['A', 'B', 'B', 'B', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
alt.Chart(source).mark_bar().encode(
x='a',
y='b:Q'
).interactive()
Outputs this plot:
Which is interactive (we can zoom in). However, if I change the Y encoding field to the following (which is what I need) by adding an aggregate function:
import altair as alt
import pandas as pd
source = pd.DataFrame({
'a': ['A', 'B', 'B', 'B', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
alt.Chart(source).mark_bar().encode(
x='a',
y='sum(b):Q'
).interactive()
The plot is no longer interactive. Is it possible to make it interactive while using an aggregate function, i.e. to move it around, zoom in, and zoom out?
Thank you :)
This is a known limitation in Vega/Vega-Lite; see https://github.com/vega/vega-lite/issues/5308
As a workaround, you can pass pre-aggregated data to the chart:
import altair as alt
import pandas as pd
source = pd.DataFrame({
'a': ['A', 'B', 'B', 'B', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
data = source.groupby('a').sum().reset_index()
alt.Chart(data).mark_bar().encode(
x='a',
y=alt.Y('b', title='Sum of b')
).interactive()
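To see what the chart receives in this workaround, the pre-aggregation step can be run on its own. A minimal sketch; as_index=False is used here as an alternative spelling that, for this data, gives the same frame as sum().reset_index():

```python
import pandas as pd

source = pd.DataFrame({
    "a": ["A", "B", "B", "B", "E", "F", "G", "H", "I"],
    "b": [28, 55, 43, 91, 81, 53, 19, 87, 52],
})

# one row per category, with b already summed
data = source.groupby("a", as_index=False)["b"].sum()
print(data)
```

The three B rows collapse into a single row with b = 189, so the chart no longer needs an aggregate in its encoding.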

How to plot multiple groups in different colors and shapes with matplotlib?

Given the following DataFrame (in pandas):
         X    Y Type Region
index
1      100   50    A     US
2       50   25    A     UK
3       70   35    B     US
4       60   40    B     UK
5       80  120    C     US
6      120   35    C     UK
In order to generate the DataFrame:
import pandas as pd
data = pd.DataFrame({'X': [100, 50, 70, 60, 80, 120],
'Y': [50, 25, 35, 40, 120, 35],
'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
'Region': ['US', 'UK'] * 3
},
columns=['X', 'Y', 'Type', 'Region']
)
I tried to make a scatter plot of X and Y, colored by Type and shaped by Region. How could I achieve it in matplotlib?
With more Pandas:
from pandas import DataFrame
from matplotlib.pyplot import show, subplots
from itertools import cycle # Useful when you might have lots of Regions
data = DataFrame({'X': [100, 50, 70, 60, 80, 120],
'Y': [50, 25, 35, 40, 120, 35],
'Type': ['A', 'A', 'B', 'B', 'C', 'C'],
'Region': ['US', 'UK'] * 3
},
columns=['X', 'Y', 'Type', 'Region']
)
cs = {'A':'red',
'B':'blue',
'C':'green'}
markers = ('+','o','>')
fig, ax = subplots()
for region, marker in zip(set(data.Region), cycle(markers)):
    reg_data = data[data.Region == region]
    reg_data.plot(x='X', y='Y',
                  kind='scatter',
                  ax=ax,
                  c=[cs[x] for x in reg_data.Type],
                  marker=marker,
                  label=region)
ax.legend()
show()
For this kind of multi-dimensional plot, though, check out seaborn (works well with pandas).
One approach would be the following. It is not elegant, but it works:
import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np
plt.ion()
colors = ['g', 'r', 'c', 'm', 'y', 'k', 'b']
markers = ['*','+','D','H']
# one plot call per (Type, Region) combination
for iType in range(len(data.Type.unique())):
    for iRegion in range(len(data.Region.unique())):
        plt.plot(data.X.values[np.bitwise_and(data.Type.values == data.Type.unique()[iType],
                                              data.Region.values == data.Region.unique()[iRegion])],
                 data.Y.values[np.bitwise_and(data.Type.values == data.Type.unique()[iType],
                                              data.Region.values == data.Region.unique()[iRegion])],
                 color=colors[iType], marker=markers[iRegion], ms=10)
I am not familiar with pandas, but there must be some more elegant way to do the filtering. A list of markers can be obtained from matplotlib.markers.MarkerStyle.markers.keys(), and the default color cycle from plt.rcParams['axes.prop_cycle'].by_key()['color'].
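The filtering can be left to pandas by grouping on both columns at once. The sketch below is one possible way to do it, not the only one: the color and marker mappings are arbitrary choices, and the Agg backend is selected so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({
    "X": [100, 50, 70, 60, 80, 120],
    "Y": [50, 25, 35, 40, 120, 35],
    "Type": ["A", "A", "B", "B", "C", "C"],
    "Region": ["US", "UK"] * 3,
})

colors = {"A": "red", "B": "blue", "C": "green"}  # one color per Type
markers = {"US": "o", "UK": "^"}                  # one marker per Region

fig, ax = plt.subplots()
# groupby on two columns yields one sub-frame per (Type, Region) combination
for (type_, region), group in data.groupby(["Type", "Region"]):
    ax.scatter(group["X"], group["Y"], color=colors[type_],
               marker=markers[region], s=80, label=f"{type_} / {region}")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend(title="Type / Region")
```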
