I need to plot a pie chart using matplotlib but my DataFrame has 3 columns namely gender, segment and total_amount.
I have tried playing with plt.pie() arguments but it only takes x and labels for data. I tried setting gender as a legend but then it doesn't look right.
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'gender': {0: 'Female',
1: 'Female',
2: 'Female',
3: 'Male',
4: 'Male',
5: 'Male'},
'Segment': {0: 'Gold',
1: 'Platinum',
2: 'Silver',
3: 'Gold',
4: 'Platinum',
5: 'Silver'},
'total_amount': {0: 2110045.0,
1: 2369722.0,
2: 1897545.0,
3: 2655970.0,
4: 2096445.0,
5: 2347134.0}})
plt.pie(data = df,x="total_amount",labels="Segment")
plt.legend(df.gender)
plt.show()
The result I want is a pie chart of total_amount and its labels as gender and segment. If I can get the percentage, it will be a bonus.
I suggest the following:
# Data to plot
# Take the information from the Segment and gender columns and join them into one string
labels = df["Segment"]+ " " + df["gender"].map(str)
# Extract the sizes of the segments
sizes = df["total_amount"]
# Plot with labels and percentage
plt.pie(sizes, labels=labels,autopct='%1.1f%%')
plt.show()
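If you also want the absolute amounts next to the percentages, autopct accepts a callable as well as a format string. Below is a minimal self-contained sketch of that idea; the helper name pct_and_amount and the title text are my own, and the amount is reconstructed from the percentage matplotlib passes in, so treat it as approximate.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Segment": ["Gold", "Platinum", "Silver", "Gold", "Platinum", "Silver"],
    "total_amount": [2110045.0, 2369722.0, 1897545.0,
                     2655970.0, 2096445.0, 2347134.0],
})

# One label per wedge, built from both categorical columns
labels = df["gender"] + " " + df["Segment"]
sizes = df["total_amount"]
total = sizes.sum()


def pct_and_amount(pct):
    # Hypothetical helper: matplotlib passes each wedge's share in percent,
    # so the amount is recovered from that share (approximate).
    return f"{pct:.1f}%\n({pct * total / 100:,.0f})"


fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct=pct_and_amount)
ax.set_title("total_amount by gender and segment")
```

Call plt.show() afterwards to display the figure.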
You should get this:
I am trying to plot a world map with all the countries having different risk levels (low, moderate and high). I would like to make each risk level a different color but am not sure how to change the color scheme so that each risk category has a color of my choice.
The df.risk variable currently codes low as 1, moderate as 2 and high as 3, so it is treated as a continuous variable; however, I would like to use discrete colors.
fig = go.Figure(data=go.Choropleth(
locations = df['code'],
z = df['risk'],
text = df['COUNTRY'],
colorscale = 'Rainbow',
autocolorscale=False,
reversescale=True,
marker_line_color='darkgray',
marker_line_width=0.5,
colorbar_tickprefix = '',
colorbar_title = 'Risk level',
))
fig.update_layout(
title_text='Risk map',
geo=dict(
showframe=False,
showcoastlines=False,
projection_type='equirectangular'
),
annotations = [dict(
x=0.55,
y=0.15,
xref='paper',
yref='paper',
text='Source: <a href="www.google.com">\
Google</a>',
showarrow = False
)]
)
fig.show()
My sample df is:
{'Country': {0: 'Afghanistan',
1: 'Albania',
2: 'Algeria',
3: 'American Samoa',
4: 'Andorra'},
'code': {0: 'AFG', 1: 'ALB', 2: 'DZA', 3: 'ASM', 4: 'AND'},
'risk': {0: 'High', 1: 'Moderate', 2: 'High', 3: 'Low', 4: 'High'}}
In this case I would rather use plotly.express with color=df['risk'] and then set color_discrete_map={'High':'red', 'Moderate':'Yellow','Low':'Green'}:
Complete code:
import plotly.express as px
import pandas as pd
fig = px.choropleth(locations=df['Country'],
locationmode="country names",
color=df['risk'],
color_discrete_map={'High':'red',
'Moderate':'Yellow',
'Low':'Green'}
#scope="usa"
)
fig.show()
df = pd.DataFrame({
'id': {0: -3, 1: 2, 2: -3, 3: 1},
'val': {0: 0.4, 1: 0.03, 2: 0.88, 3: 1.3},
'indicator': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'count': {0: 40000, 1: 5779, 2: 3000, 3: 31090}
})
df
and I hope to get a plot like the following:
I know that with the following code I can get a close plot, but I also want the line width to vary with the "count" variable. When I tried adding size = 'count', I did not get a meaningful plot. As for the legend, I want only one legend for "indicator" rather than two:
plt.figure()
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df)
sns.scatterplot(x = 'id', y = 'val', hue = 'indicator', size = 'count', data = df)
To answer the second part of your question - you can disable the lineplot legend like so:
sns.lineplot(x = 'id', y = 'val', hue = 'indicator', data = df, legend=False)
This will leave you with two legend groups - one for colours and one for sizes. This is the easiest way, but you can also tinker with plt.legend() and build your own from scratch.
As for making the lines vary their thickness dynamically from one point to another, I don't think you can do it using seaborn. For something like that you'd need a lower-level library such as bokeh, or you could use matplotlib directly to draw connecting lines between the markers, adjusting for their varying size.
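One way to sketch the matplotlib route is a LineCollection: split each line into many short segments and give every segment its own width, interpolated from "count". This is only an illustration; the helper add_tapered_line, the max_width scaling and the colors are my own choices, not something from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.collections import LineCollection


def add_tapered_line(ax, x, y, widths, color, steps=50):
    """Hypothetical helper: draw a polyline whose width is interpolated
    linearly between the widths given at its points."""
    x, y, widths = (np.asarray(a, dtype=float) for a in (x, y, widths))
    idx = np.arange(len(x))
    t = np.linspace(0, len(x) - 1, (len(x) - 1) * steps + 1)
    xs, ys, ws = (np.interp(t, idx, a) for a in (x, y, widths))
    points = np.column_stack([xs, ys]).reshape(-1, 1, 2)
    segments = np.concatenate([points[:-1], points[1:]], axis=1)
    # Each short segment gets the mean width of its two end points
    lc = LineCollection(segments, linewidths=(ws[:-1] + ws[1:]) / 2, colors=color)
    ax.add_collection(lc)
    return lc


df = pd.DataFrame({
    "id": [-3, 2, -3, 1],
    "val": [0.4, 0.03, 0.88, 1.3],
    "indicator": ["A", "A", "B", "B"],
    "count": [40000, 5779, 3000, 31090],
})

max_width = 10  # line width in points used for the largest count (arbitrary)
fig, ax = plt.subplots()
for (name, group), color in zip(df.groupby("indicator"), ["tab:blue", "tab:orange"]):
    group = group.sort_values("id")
    widths = group["count"] / df["count"].max() * max_width
    add_tapered_line(ax, group["id"], group["val"], widths, color)
    ax.plot([], [], color=color, linewidth=3, label=name)  # proxy for the legend

ax.autoscale()  # make sure the axis limits cover the collections
ax.set_xlabel("id")
ax.set_ylabel("val")
ax.legend(title="indicator")
```

Call plt.show() to display it. Because the legend is built from the empty proxy lines, you get a single "indicator" legend and no size legend.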
I have a pandas dataframe. It looks like this:
I'm trying to create a scatter plot with a different color for each point. I tried:
df.plot.scatter(x='x',y='y',c=df.colors.tolist(), legend=True)
I get the scatter plot all right, but I'm not able to show a label indicating that "Financials" is associated with color '#4ce068'.
How do I do that here? Thanks
I also tried:
df.plot.scatter(x='x',y='y',c=df.colors.tolist(),label=df.key.unique().tolist())
This almost works, but there are too many labels and it is hard to see which color is associated with which label.
I would like the key to be shown with its associated color, preferably at the top of the chart, i.e. next to the title. Is that possible?
My dataframe in dictionary format:
{'key': {0: 'Financials',
1: 'Consumer Discretionary',
2: 'Industrials',
3: 'Communication Services',
4: 'Communication Services',
5: 'Consumer Discretionary',
6: 'Health Care',
7: 'Information Technology',
8: 'Consumer Discretionary',
9: 'Information Technology'},
'x': {0: 1630000.0,
1: 495800000.0,
2: 562790000.0,
3: 690910000.0,
4: 690910000.0,
5: 753090000.0,
6: 947680000.0,
7: 1010000000.0,
8: 1090830000.0,
9: 1193600000.0},
'y': {0: 0.02549175,
1: 0.0383163,
2: -0.09842154,
3: 0.03876266,
4: 0.03596279,
5: 0.01897367,
6: 0.06159238,
7: 0.0291209,
8: 0.003931255,
9: -0.007134976},
'colors': {0: '#4ce068',
1: '#ecef4a',
2: '#ff3c83',
3: '#ff4d52',
4: '#ff4d52',
5: '#ecef4a',
6: '#00f1d1',
7: '#d9d9d9',
8: '#ecef4a',
9: '#d9d9d9'}}
If you don't mind using plotly you can try
import pandas as pd
import plotly.express as px
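# Map each key to its own color; passing the raw colors column as a sequence would misalign keys and colors
px.scatter(df, 'x', 'y', color="key", color_discrete_map=dict(zip(df["key"], df["colors"])))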
But given your choice of colors just px.scatter(df, 'x', 'y', color="key") will look better
Here's one way to do it:
fig, ax = plt.subplots()
for i in df.colors.unique():
    df_color = df[df.colors==i]
    df_color.plot.scatter(x='x',y='y', ax=ax, c=i, label=df_color.key.iloc[0])
ax.legend(loc='best')
You'll get a scatter plot with one legend entry per key, each drawn in its own color.
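To put the legend at the top of the chart, as asked in the question, you can anchor it just above the axes with bbox_to_anchor. A minimal sketch, assuming the same df as in the question; the title text, the column count and the top margin are arbitrary choices of mine:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "key": ["Financials", "Consumer Discretionary", "Industrials",
            "Communication Services", "Communication Services",
            "Consumer Discretionary", "Health Care", "Information Technology",
            "Consumer Discretionary", "Information Technology"],
    "x": [1630000.0, 495800000.0, 562790000.0, 690910000.0, 690910000.0,
          753090000.0, 947680000.0, 1010000000.0, 1090830000.0, 1193600000.0],
    "y": [0.02549175, 0.0383163, -0.09842154, 0.03876266, 0.03596279,
          0.01897367, 0.06159238, 0.0291209, 0.003931255, -0.007134976],
    "colors": ["#4ce068", "#ecef4a", "#ff3c83", "#ff4d52", "#ff4d52",
               "#ecef4a", "#00f1d1", "#d9d9d9", "#ecef4a", "#d9d9d9"],
})

fig, ax = plt.subplots()
# One scatter call per key gives one legend entry per key
for key, group in df.groupby("key", sort=False):
    ax.scatter(group["x"], group["y"], color=group["colors"].iloc[0], label=key)

# "lower center" of the legend box sits at the top edge of the axes
ax.legend(loc="lower center", bbox_to_anchor=(0.5, 1.0), ncol=3, frameon=False)
fig.suptitle("Scatter by sector")  # placeholder title
fig.subplots_adjust(top=0.78)      # leave room for the legend and the title
```

Call plt.show() to display it. Tune ncol and the top margin to your figure size.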
slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
I'd like to see this picture
But I don't know how to do it. I want a red color for Russian people, a green color for USA people and a yellow color for Chinese people.
My attempt to find a solution:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
sns.set(style="white")
slov = {'People': {0: 'Ivan', 1: 'John', 2: 'Peter', 3: 'Ming'}, 'Country':{0: 'Russia', 1: 'USA', 2: 'USA', 3: 'China'},\
'Height': {0: 181, 1: 175, 2: 174, 3: 173}}
obj = pd.DataFrame(slov)
palette=["g", "b", "r"]
obj['Color']='r'
row_index = obj.Country == 'Russia'
obj.loc[row_index, 'Color'] = 'r'
row_index = obj.Country == 'USA'
obj.loc[row_index, 'Color'] = 'g'
row_index = obj.Country == 'China'
obj.loc[row_index, 'Color'] = 'y'
g = sns.factorplot(x="People", y="Height", data=obj, kind='bar', palette=obj['Color'])
plt.show()
Maybe my solution is not very good: I added the colors to the DataFrame, which does not seem right. How can I solve this without adding the colors to my DataFrame?
You can use map with a dict:
d = {'Russia':'r', 'USA':'g','China':'y'}
g = sns.factorplot(x="People",
y="Height",
data=obj,
kind='bar',
palette=obj['Country'].map(d))
plt.show()
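Note that sns.factorplot was renamed to sns.catplot in seaborn 0.9 and, as far as I know, is no longer available in recent releases. On a newer seaborn, an alternative sketch is to pass the country as hue and give the dict directly as the palette, which also produces a legend; dodge=False keeps each bar centered on its tick. The name country_colors is mine.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

obj = pd.DataFrame({
    "People": ["Ivan", "John", "Peter", "Ming"],
    "Country": ["Russia", "USA", "USA", "China"],
    "Height": [181, 175, 174, 173],
})

# Color per country; no extra column in the DataFrame is needed
country_colors = {"Russia": "r", "USA": "g", "China": "y"}

fig, ax = plt.subplots()
sns.barplot(x="People", y="Height", hue="Country", palette=country_colors,
            dodge=False, data=obj, ax=ax)
```

Call plt.show() to display it.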
How can I format Matplotlib plots on multi-indexed data to resemble Excel's PivotChart axis layout? Excel's PivotChart feature groups similar axis labels together, whereas MPL labels each tick individually as (Index1,Index2). Using the Sample Data, I've provided the outputs for both Excel and MPL; notice how Index1 is grouped in the Excel chart, but not in the MPL plot.
data = {
'Index1': {0: 'A', 1: 'A', 2: 'B', 3: 'B'},
'Index2': {0: 1, 1: 2, 2: 1, 3: 2},
'Value': {0: 50, 1: 100, 2: 50, 3: 100}
}
Matplotlib Chart
Excel Chart
Does anyone have a solution? Ideally, the number of multi-index levels will not matter. Thanks for the help!