I have a dataframe that will be visualized. This is the code to obtain that dataframe:
zonasi = df.groupby('kodya / kab')['customer'].nunique()
zonasi
This is the output of the code above:
kab bandung 1
kab bandung barat 4
kab banyumas 2
kab batang 1
kab bekasi 29
kab bogor 13
kab kudus 11
kab tangerang 15
kab tegal 2
kota adm jakarta barat 14
kota adm jakarta pusat 6
kota adm jakarta selatan 10
kota adm jakarta timur 23
kota adm jakarta utara 9
kota balikpapan 1
kota bandung 12
kota bekasi 12
kota semarang 11
kota surabaya 3
kota surakarta 2
kota tangerang 10
kota tasikmalaya 2
no data 44
I want to visualize the output as a pie chart, but since 'kodya / kab' has many unique values, the labels overlap. So I want to try using explode to draw the pie chart (as a donut chart).
I tried using this code:
import matplotlib.pyplot as plt
import seaborn as sns
# colors
colors = sns.color_palette('husl')
# explosion
explode = (0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05)
plt.pie(zonasi, colors = colors, autopct='%.2f%%', startangle = 90, pctdistance = 0.85, explode = explode)
# draw circle
centre_circle = plt.Circle((0, 0), 0.70, fc = 'white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
# Equal aspect ratio ensures that pie is drawn as a circle
plt.axis('equal')
plt.tight_layout()
plt.show()
but it returns this error:
'explode' must be of length 'x'
The thing is, I want to reuse this visualization code on different dataframes, so the labels (and their number) will differ from one dataframe to another. How can I define the explode variable so that it adjusts to the labels automatically?
This is an example of what I want my output to look like (image not included):
Thank you in advance for the help.
You could do this:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = {'kodya / kab': ['kab bandung', 'kab bandung barat', 'kab banyumas', 'kab batang', 'kab bekasi', 'kab bogor', 'kab kudus', 'kab tangerang', 'kab tegal', 'kota adm jakarta barat', 'kota adm jakarta pusat', 'kota adm jakarta selatan', 'kota adm jakarta timur', 'kota adm jakarta utara', 'kota balikpapan', 'kota bandung', 'kota bekasi', 'kota semarang', 'kota surabaya', 'kota surakarta', 'kota tangerang', 'kota tasikmalaya', 'no data'],
'customer': [1, 4, 2, 1, 29, 13, 11, 15, 2, 14, 6, 10, 23, 9, 1, 12, 12, 11, 3, 2, 10, 2, 44]}
zonasi = pd.DataFrame(data)
zonasi.set_index('kodya / kab', inplace=True) # set the index to 'kodya / kab'
colors = sns.color_palette('husl')
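explode = np.zeros(len(zonasi))  # one value per wedge, so the length always matches the data
explode[1:5] = 0.1  # pull out wedges 1 to 4; use explode[:] = 0.05 to separate every wedge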
zonasi.plot.pie(y='customer', colors=colors, autopct='%.2f%%', startangle=90, pctdistance=0.85, explode=explode,legend=False)
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.axis('equal')
plt.tight_layout()
plt.show()
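Since the goal is to reuse the chart on other dataframes, one option is to wrap it in a small function that builds explode from the length of whatever data it receives. This is a sketch only: the function name donut_chart and its gap/hole parameters are hypothetical, and the four-row Series is just a subset of the zonasi output above. Note that exploding every wedge separates the slices but does not by itself stop labels of very small slices from overlapping.

```python
import matplotlib.pyplot as plt
import pandas as pd


def donut_chart(counts, gap=0.05, hole=0.70):
    """Draw a donut chart from a Series of counts (index = wedge labels).

    Hypothetical helper: the explode offsets are built from len(counts),
    so the chart works for any number of categories.
    """
    # One explode offset per wedge, so the length always matches the data
    explode = [gap] * len(counts)
    fig, ax = plt.subplots(figsize=(8, 8))
    wedges, texts, autotexts = ax.pie(counts.values, labels=counts.index,
                                      autopct='%.2f%%', startangle=90,
                                      pctdistance=0.85, explode=explode)
    # A white circle in the middle turns the pie into a donut
    ax.add_artist(plt.Circle((0, 0), hole, fc='white'))
    ax.axis('equal')  # keep the donut circular
    return fig, ax, wedges


# Usage with a few rows of the zonasi output above
zonasi = pd.Series({'kab bandung': 1, 'kab bekasi': 29,
                    'kota bandung': 12, 'no data': 44})
fig, ax, wedges = donut_chart(zonasi)
plt.tight_layout()
plt.show()
```

Passing a groupby result such as df.groupby('kodya / kab')['customer'].nunique() directly should work the same way, because the function only relies on the Series' values and index.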
which gives a donut chart with wedges 1 to 4 pulled out (image not included).
I need to merge the rows of each city with the same name and calculate its overall percentage, to see which city has the lowest literacy rate.
Code:
import pandas as pd
df = pd.DataFrame({'Cities': ["Cape Town", "Cape Town", "Cape Town", "Tokyo", "Cape Town", "Tokyo", "Mumbai", "Belgium", "Belgium" ],
'LiteracyRate': [0.05, 0.35, 0.2, 0.11, 0.15, 0.2, 0.65, 0.35, 0.45]})
print(df)
This prints:
Cities LiteracyRate
0 Cape Town 0.05
1 Cape Town 0.35
2 Cape Town 0.2
3 Tokyo 0.11
4 Cape Town 0.15
5 Tokyo 0.2
6 Mumbai 0.65
7 Belgium 0.35
8 Belgium 0.45
I'm expecting this:
Cities LiteracyRate %LiteracyRate
0 Cape Town 0.75 75
1 Tokyo 0.31 31
2 Mumbai 0.65 65
3 Belgium 0.8 80
I tried the code below, but it doesn't give the desired result: rows with the same city name are still not merged, and the percentages are wrong.
# Calculate the percentage
df["%LiteracyRate"] = (df["LiteracyRate"]/df["LiteracyRate"].sum())*100
# Show the DataFrame
print(df)
You can use groupby() in pandas to merge cities with the same name, and sum() to add up their rates:
df = df.groupby('Cities').sum()
Then you can format the results with:
df['%LiteracyRate'] = (df['LiteracyRate']*100).round().astype(int)
df = df.reset_index()
To sort them by literacy rate:
df = df.sort_values(by='%LiteracyRate')
df = df.reset_index(drop=True)  # renumber rows; drop=True avoids adding an extra 'index' column
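Putting the steps together, a sketch that also answers the "lowest literacy rate" part of the question. By default groupby sorts the city names alphabetically; passing sort=False keeps them in order of first appearance, which matches the expected table in the question. idxmin returns the row label of the smallest percentage.

```python
import pandas as pd

df = pd.DataFrame({'Cities': ["Cape Town", "Cape Town", "Cape Town", "Tokyo", "Cape Town",
                              "Tokyo", "Mumbai", "Belgium", "Belgium"],
                   'LiteracyRate': [0.05, 0.35, 0.2, 0.11, 0.15, 0.2, 0.65, 0.35, 0.45]})

# Sum the rates per city; sort=False keeps cities in order of first appearance
summary = df.groupby('Cities', as_index=False, sort=False)['LiteracyRate'].sum()
# Express the summed rate as a whole-number percentage
summary['%LiteracyRate'] = (summary['LiteracyRate'] * 100).round().astype(int)
# Row label of the smallest percentage gives the city with the lowest rate
lowest = summary.loc[summary['%LiteracyRate'].idxmin(), 'Cities']
print(summary)
print('Lowest:', lowest)
```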
Hope this helps!
I have a dataframe like so:
switch_summary duration count
McDonalds -> Arbys -> McDonalds 0.067 1
Wendys -> Popeyes -> McDonalds -> KFC 0.293 1
Arbys -> Wendys -> Popeyes -> McDonalds 0.542 2
Arbys -> McDonalds -> KFC 1.075 1
KFC -> Arbys -> Wendys -> Popeyes 2.123 3
KFC -> Wendys -> Popeyes -> Arbys 2.297 1
I want to create a seaborn plot to visualize both the duration and the count.
I have the following code:
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize = [15,7])
ax = sns.barplot(x = 'duration',
                 y = 'switch_summary',
                 palette = sns.color_palette('winter', 10),
                 data = df)
for p in ax.patches:
    width = p.get_width()
    plt.text(0.1 + p.get_width(), p.get_y() + 0.55 * p.get_height(),
             '~{:1.2f}\n days'.format(width),
             ha = 'center', va = 'center', color = 'black', weight = 'bold')
ax.set(title = 'Top 10 Fastest Trends',
       xlabel = 'Total Duration in Trend',
       ylabel = 'Restaurant Trend')
This code will display the duration but I also want to display the count.
How can I display the count in the plt.text() portion of the code?
One quick workaround is to enumerate the patches and use the index to look up the count:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
data = {'switch_summary': ['McDonalds -> Arbys -> McDonalds', 'Wendys -> Popeyes -> McDonalds -> KFC', 'Arbys -> Wendys -> Popeyes -> McDonalds', 'Arbys -> McDonalds -> KFC', 'KFC -> Arbys -> Wendys -> Popeyes', 'KFC -> Wendys -> Popeyes -> Arbys'],
        'duration': [0.067, 0.293, 0.542, 1.075, 2.123, 2.297],
        'count': [1, 1, 2, 1, 3, 1]
        }
df = pd.DataFrame(data)
plt.figure(figsize = [15,7])
ax = sns.barplot(x = 'duration',
                 y = 'switch_summary',
                 palette = sns.color_palette('winter', 10),
                 data = df)
# Assumes the bars are drawn in the same order as the rows of df,
# so the position i picks the matching count
for i, p in enumerate(ax.patches):
    width = p.get_width()
    plt.text(0.1 + p.get_width(), p.get_y() + 0.55 * p.get_height(),
             '~{:1.2f}\n days \n count - {}'.format(width, df['count'].iloc[i]),
             ha = 'center', va = 'center', color = 'black', weight = 'bold')
ax.set(title = 'Top 10 Fastest Trends',
       xlabel = 'Total Duration in Trend',
       ylabel = 'Restaurant Trend')
plt.show()
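An alternative that avoids positioning plt.text by hand is Axes.bar_label (matplotlib 3.4 and later), which places one label at the end of each bar of a bar container. This sketch swaps seaborn's barplot for plain matplotlib barh so there is exactly one bar per row, and builds each label from the same row so duration and count cannot get out of step; the label wording is only an example.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'switch_summary': ['McDonalds -> Arbys -> McDonalds',
                       'Wendys -> Popeyes -> McDonalds -> KFC',
                       'Arbys -> Wendys -> Popeyes -> McDonalds',
                       'Arbys -> McDonalds -> KFC',
                       'KFC -> Arbys -> Wendys -> Popeyes',
                       'KFC -> Wendys -> Popeyes -> Arbys'],
    'duration': [0.067, 0.293, 0.542, 1.075, 2.123, 2.297],
    'count': [1, 1, 2, 1, 3, 1],
})

fig, ax = plt.subplots(figsize=(15, 7))
# One horizontal bar per row, drawn in DataFrame order
bars = ax.barh(df['switch_summary'], df['duration'], color='teal')
# One label per bar, built from the same row as the bar
labels = [f'~{d:.2f} days, count {c}' for d, c in zip(df['duration'], df['count'])]
texts = ax.bar_label(bars, labels=labels, padding=3)
ax.invert_yaxis()  # first row at the top, as in the seaborn version
ax.set(title='Top 10 Fastest Trends',
       xlabel='Total Duration in Trend',
       ylabel='Restaurant Trend')
plt.tight_layout()
plt.show()
```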
I need to draw two line charts and a bar chart in the same figure. Both line charts share the same y-axis, but the bar chart has a different one.
Table format:
Indicator Name 2011 2012 2013 2014 2015 2016 2017 2018 2019
Bank nonperforming loans to total gross loans (%) 3.8 3.6 5.5 4.2 3.2 2.6 2.4 3.4 4.2
Bank nonperforming loans to total net loans (%) 3 2.2 3.8 2.6 1.2 1.7 1.3 2.2 2.5
Bank Total gross loans (LK Bn) 99 116.6 191 165 152.8 142.3 160.7 263.1 275.5
This is my code:
df.loc['Bank nonperforming loans to total gross loans (%)', years].plot(kind='line', color='mediumvioletred', marker='o',
                                                                        markerfacecolor='blue', markersize=9, label="NPL %")
df.loc['Bank nonperforming loans to total net loans (%)', years].plot(kind='line', color='blue', label="SL")
plt.twinx()
df.loc['Bank Total gross loans (LK Bn)', years].plot(kind='bar', color='brown', label="chk")
plt.ylim([90,280])
plt.title('Immigration from Afghanistan')
plt.ylabel('NPL %')
plt.xlabel('years')
plt.legend()
Below is the graph I get; it doesn't show the bar chart (image not included).
Your case requires more control than DataFrame.plot provides. You need to control the drawing order: the lines should be on top of the bar plot, but the bars live on the twinned axis, which is drawn above the original one, and that creates another problem. Here is a solution, mainly based on an existing answer to a similar question.
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
2011: [3.8, 3, 99],
2012: [3.6, 2.2, 116.6],
2013: [5.5, 3.8, 191],
2014: [4.2, 2.6, 165],
2015: [3.2, 1.2, 152.8],
2016: [2.6, 1.7, 142.3],
2017: [2.4, 1.3, 160.7],
2018: [3.4, 2.2, 263.1],
2019: [4.2, 2.5, 275.5],
}
df = pd.DataFrame(
data,
index=['Bank nonperforming loans to total gross loans (%)',
'Bank nonperforming loans to total net loans (%)',
'Bank Total gross loans (LK Bn)'],
columns=data.keys()
)
years = list(data.keys())
fig, ax = plt.subplots()
# axis for the bar
ax2 = ax.twinx()
ax2.set_ylim([90, 280])
# create a transparent clone of ax for the lines;
# we need this to draw the line plots on top of the bars
ax1 = fig.add_axes(ax.get_position(), sharex=ax, sharey=ax)
ax1.set_facecolor('none')
ax1.set_axis_off()
ax1.set_xticks(years)
bar = ax2.bar(
years,
df.loc['Bank Total gross loans (LK Bn)'], color='brown',
label='chk',
)
line1, = ax1.plot(
df.loc['Bank nonperforming loans to total gross loans (%)', years],
color='mediumvioletred',
marker='o',
markerfacecolor='blue',
markersize=9,
label='NPL %',
)
line2, = ax1.plot(
df.loc['Bank nonperforming loans to total net loans (%)', years],
color='blue',
label='SL',
)
ax.set_title('Immigration from Afghanistan')
ax.set_ylabel('NPL %')
ax.set_xlabel('years')
ax2.legend([bar, line1, line2], ['chk', 'NPL %', 'SL'])
plt.show()
Plot (image not included):
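A commonly used, simpler variant of the same idea skips the cloned third axis: keep the lines on the original axes, draw the bars on the twin, then raise the original axes above the twin with set_zorder and hide its background patch so the bars stay visible underneath. This is a sketch with the question's data; the left axis still carries the percentages and the right axis the loan totals.

```python
import matplotlib.pyplot as plt
import pandas as pd

years = list(range(2011, 2020))
df = pd.DataFrame(
    [[3.8, 3.6, 5.5, 4.2, 3.2, 2.6, 2.4, 3.4, 4.2],
     [3, 2.2, 3.8, 2.6, 1.2, 1.7, 1.3, 2.2, 2.5],
     [99, 116.6, 191, 165, 152.8, 142.3, 160.7, 263.1, 275.5]],
    index=['Bank nonperforming loans to total gross loans (%)',
           'Bank nonperforming loans to total net loans (%)',
           'Bank Total gross loans (LK Bn)'],
    columns=years,
)

fig, ax = plt.subplots()
ax2 = ax.twinx()  # right-hand axis for the bars

bars = ax2.bar(years, df.loc['Bank Total gross loans (LK Bn)'], color='brown', label='chk')
ax2.set_ylim(90, 280)

line1, = ax.plot(years, df.loc['Bank nonperforming loans to total gross loans (%)'],
                 color='mediumvioletred', marker='o', markerfacecolor='blue',
                 markersize=9, label='NPL %')
line2, = ax.plot(years, df.loc['Bank nonperforming loans to total net loans (%)'],
                 color='blue', label='SL')

# Draw the line axes above the bar axes and make its background transparent
ax.set_zorder(ax2.get_zorder() + 1)
ax.patch.set_visible(False)

ax.set_xlabel('years')
ax.set_ylabel('NPL %')
ax.legend([bars, line1, line2], ['chk', 'NPL %', 'SL'])
plt.show()
```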
I am trying to plot the happiness score of 30 different countries from 2012 to 2018; some years are missing a happiness value.
The arrays are Happiness, Year and Country.
I want the x-axis to be the year, the y-axis to be the happiness score, and the z value (the colour) to be the country (each country is numbered 1-30), so that a colour connects each country's scores across the years.
The shape of each array is (210,). Here is my code:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.tri as tri
sns.set(style="white")
x=Year
y=Happiness
z=country
fig = plt.figure(figsize=(50, 50))
ax = fig.add_subplot(111)
nptsx, nptsy = 100, 100
xg, yg = np.meshgrid(np.linspace(x.min(), x.max(), nptsx),
np.linspace(y.min(), y.max(), nptsy))
triangles = tri.Triangulation(x, y)
tri_interp = tri.CubicTriInterpolator(triangles, z)
zg = tri_interp(xg, yg)
# change levels here according to your data
levels = np.linspace(1, 210, 30)
colormap = ax.contourf(xg, yg, zg, levels,
cmap=plt.cm.Blues,
norm=plt.Normalize(vmax=z.max(), vmin=z.min()))
# plot data points
ax.plot(x, y, color="#444444", marker="o", linestyle="", markersize=15)
# add a colorbar
fig.colorbar(colormap,
orientation='vertical', # horizontal colour bar
shrink=0.85)
# graph extras: look at xlim and ylim
ax.set_xlim((2012, 2018))
ax.set_ylim((0, 10))
ax.set_aspect("equal", "box")
plt.show()
Here is the error I get when I run the code:
RuntimeError                              Traceback (most recent call last)
<ipython-input-65-2779759126bf> in <module>
     16                      np.linspace(y.min(), y.max(), nptsy))
     17
---> 18 triangles = tri.Triangulation(x, y)
     19 tri_interp = tri.CubicTriInterpolator(triangles, z)
     20 zg = tri_interp(xg, yg)

C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\tri\triangulation.py in __init__(self, x, y, triangles, mask)
     52             # No triangulation specified, so use matplotlib._qhull to obtain
     53             # Delaunay triangulation.
---> 54             self.triangles, self._neighbors =_qhull.delaunay(x,y)
     55             self.is_delaunay = True
     56         else:

RuntimeError: Error in qhull Delaunay triangulation calculation: input inconsistency (exitcode=1); use python verbose option (-v) to see original qhull error.
A sample of my data (.CSV file):
Entity Code Year World Happiness Report(Cantril Ladder(0=worst; 10=best))
Argentina 1 2012 6.4
Argentina 1 2013 6.5
Argentina 1 2014 6.6
Argentina 1 2015 6.6
Argentina 1 2016 6.4
Argentina 1 2017 6.0
Argentina 1 2018 5.7
Australia 2 2012 7.1
Australia 2 2013 7.3
Australia 2 2014 7.2
Australia 2 2015 7.3
Australia 2 2016 7.2
Australia 2 2017 7.2
Australia 2 2018 7.1
Brazil 3 2012 6.6
Brazil 3 2013 7.1
Brazil 3 2014 6.9
Brazil 3 2015 6.5
Brazil 3 2016 6.3
Brazil 3 2017 6.3
Brazil 3 2018 6.1
Your error comes from the way you read your CSV file: without assigning a type to your data, it may be treated as strings.
Try this:
import pandas as pd
import numpy as np
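df = pd.read_csv("untitled.csv", delimiter=',')
# Assumption: the score is the 4th column; rename its long header to 'Happiness'
df = df.rename(columns={df.columns[3]: 'Happiness'})
x=np.array(df['Year'].values, dtype='float')
y=np.array(df['Happiness'].values, dtype='float')
z=np.array(df['Code'].values, dtype='int')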
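The description in the question (a colour connecting each country's scores through the years) sounds like one line per country rather than a contour surface. If that is the goal, a hedged alternative skips the triangulation entirely. This sketch inlines the three sample countries from the question and assumes the long happiness header has been shortened to Happiness; pd.to_numeric(errors='coerce') forces numeric columns, and dropna removes the years with a missing value.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question; the header is shortened (assumption)
csv_text = """Entity,Code,Year,Happiness
Argentina,1,2012,6.4
Argentina,1,2013,6.5
Argentina,1,2014,6.6
Argentina,1,2015,6.6
Argentina,1,2016,6.4
Argentina,1,2017,6.0
Argentina,1,2018,5.7
Australia,2,2012,7.1
Australia,2,2013,7.3
Australia,2,2014,7.2
Australia,2,2015,7.3
Australia,2,2016,7.2
Australia,2,2017,7.2
Australia,2,2018,7.1
Brazil,3,2012,6.6
Brazil,3,2013,7.1
Brazil,3,2014,6.9
Brazil,3,2015,6.5
Brazil,3,2016,6.3
Brazil,3,2017,6.3
Brazil,3,2018,6.1
"""
df = pd.read_csv(io.StringIO(csv_text))

# Make sure the plotted columns are numeric; unparsable cells become NaN
df['Year'] = pd.to_numeric(df['Year'], errors='coerce')
df['Happiness'] = pd.to_numeric(df['Happiness'], errors='coerce')
# Drop the years that have no happiness value
df = df.dropna(subset=['Year', 'Happiness'])

fig, ax = plt.subplots(figsize=(10, 6))
cmap = plt.cm.viridis
n_codes = df['Code'].max()
for code, group in df.groupby('Code'):
    group = group.sort_values('Year')
    # One line per country, coloured by its numeric code
    ax.plot(group['Year'], group['Happiness'], marker='o',
            color=cmap((code - 1) / max(n_codes - 1, 1)),
            label=group['Entity'].iloc[0])

ax.set_xlabel('Year')
ax.set_ylabel('Happiness (Cantril ladder, 0-10)')
ax.legend(fontsize='small', ncol=2)
plt.show()
```

With all 30 countries the legend gets long; a colorbar keyed on Code would be another option.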
I have a dictionary of values:
{'Spanish Omlette': -0.20000000000000284,
'Crumbed Chicken Salad': -1.2999999999999972,
'Chocolate Bomb': 0.0,
'Seed Nut Muesli': -3.8999999999999915,
'Fruit': -1.2999999999999972,
'Frikerdels Salad': -1.2000000000000028,
'Seed Nut Cheese Biscuits': 0.4000000000000057,
'Chorizo Pasta': -2.0,
'No carbs Ice Cream': 0.4000000000000057,
'Veg Stew': 0.4000000000000057,
'Bulgar spinach Salad': 0.10000000000000853,
'Mango Cheese': 0.10000000000000853,
'Crumbed Calamari chips': 0.10000000000000853,
'Slaw Salad': 0.20000000000000284,
'Mango': -1.2000000000000028,
'Rice & Fish': 0.20000000000000284,
'Almonds Cheese': -0.09999999999999432,
'Nectarine': -1.7000000000000028,
'Banana Cheese': 0.7000000000000028,
'Mediteranean Salad': 0.7000000000000028,
'Almonds': -4.099999999999994}
I am trying to get the aggregated sum of the values of each food item from the dictionary using Pandas:
fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food','weight']).groupby('food')['weight'].agg(['sum']).sort_values(by='sum', ascending=0)
The above code gives the correct output:
sum
food
Banana Cheese 0.7
Mediteranean Salad 0.7
Seed Nut Cheese Biscuits 0.4
Veg Stew 0.4
No carbs Ice Cream 0.4
Slaw Salad 0.2
Rice & Fish 0.2
Almonds Mango 0.1
Bulgar spinach Salad 0.1
Crumbed Calamari chips 0.1
Frikkadels Salad 0.1
Mango Cheese 0.1
Chocolate Bomb 0.0
Burrito Salad 0.0
Fried Eggs Cheese Avocado 0.0
Burger and Chips -0.1
Traditional Breakfast -0.1
Almonds Cheese -0.1
However, I need the output as two columns, not the single column (plus index) that pandas gives me above.
How do I get the output into a format I can plot, i.e. with the label and the value as separate columns?
Set as_index=False when calling groupby:
fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food','weight']).groupby('food',as_index=False).agg({"weight":"sum"}).sort_values(by='weight', ascending=0)
You can use the as_index=False parameter in groupby and aggregate with sum:
fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food','weight'])
print (fooddata.groupby('food', as_index=False)['weight']
.sum()
.sort_values(by='weight', ascending=0))
food weight
2 Banana Cheese 0.7
12 Mediteranean Salad 0.7
20 Veg Stew 0.4
14 No carbs Ice Cream 0.4
16 Seed Nut Cheese Biscuits 0.4
18 Slaw Salad 0.2
15 Rice & Fish 0.2
3 Bulgar spinach Salad 0.1
6 Crumbed Calamari chips 0.1
11 Mango Cheese 0.1
4 Chocolate Bomb 0.0
1 Almonds Cheese -0.1
19 Spanish Omlette -0.2
10 Mango -1.2
8 Frikerdels Salad -1.2
9 Fruit -1.3
7 Crumbed Chicken Salad -1.3
13 Nectarine -1.7
5 Chorizo Pasta -2.0
17 Seed Nut Muesli -3.9
0 Almonds -4.1
Another solution is to add reset_index:
print (fooddata.groupby('food')['weight']
.sum()
.sort_values(ascending=0)
.reset_index(name='sum'))
food sum
0 Banana Cheese 0.7
1 Mediteranean Salad 0.7
2 Veg Stew 0.4
3 Seed Nut Cheese Biscuits 0.4
4 No carbs Ice Cream 0.4
5 Slaw Salad 0.2
6 Rice & Fish 0.2
7 Crumbed Calamari chips 0.1
8 Mango Cheese 0.1
9 Bulgar spinach Salad 0.1
10 Chocolate Bomb 0.0
11 Almonds Cheese -0.1
12 Spanish Omlette -0.2
13 Mango -1.2
14 Frikerdels Salad -1.2
15 Crumbed Chicken Salad -1.3
16 Fruit -1.3
17 Nectarine -1.7
18 Chorizo Pasta -2.0
19 Seed Nut Muesli -3.9
20 Almonds -4.1
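Both approaches give the same two-column frame. A quick check on a few entries of foodWeight (a subset, for brevity): ignore_index=True in sort_values renumbers the rows of the as_index=False result, so it compares equal to the reset_index version.

```python
import pandas as pd

# A few entries from foodWeight above (subset, for brevity)
foodWeight = {'Banana Cheese': 0.7000000000000028, 'Veg Stew': 0.4000000000000057,
              'Almonds': -4.099999999999994, 'Mango': -1.2000000000000028}
fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food', 'weight'])

# Option 1: keep the group keys as a regular column from the start
a = (fooddata.groupby('food', as_index=False)['weight'].sum()
     .sort_values('weight', ascending=False, ignore_index=True))
# Option 2: aggregate to a Series indexed by food, then move the index into a column
b = (fooddata.groupby('food')['weight'].sum()
     .sort_values(ascending=False)
     .reset_index())
print(a)
print(a.equals(b))
```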
For plotting, it is better not to reset the index: the index values then become the x-axis labels. Use plot:
fooddata.groupby('food')['weight'].sum().sort_values(ascending=0).plot()
Or, for a horizontal bar chart, use plot.barh:
fooddata.groupby('food')['weight'].sum().sort_values(ascending=0).plot.barh()
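A slightly fuller sketch of the barh variant. barh draws the first row at the bottom, so sorting in ascending order puts the largest total at the top; the vertical line at zero is an optional addition that separates positive and negative totals. The four entries are a subset of foodWeight.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few entries from foodWeight above (subset, for brevity)
foodWeight = {'Banana Cheese': 0.7000000000000028, 'Veg Stew': 0.4000000000000057,
              'Mango': -1.2000000000000028, 'Almonds': -4.099999999999994}
fooddata = pd.DataFrame(list(foodWeight.items()), columns=['food', 'weight'])

# Keep 'food' as the index: pandas uses it for the bar labels
totals = fooddata.groupby('food')['weight'].sum()
# barh draws the first row at the bottom, so ascending order puts the largest bar on top
ax = totals.sort_values(ascending=True).plot.barh(figsize=(8, 4))
ax.set_xlabel('Sum per food item')
ax.axvline(0, color='black', linewidth=0.8)  # reference line at zero
plt.tight_layout()
plt.show()
```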
After grouping, you need to reset the index or pass as_index=False to groupby. To paraphrase another post: by default, aggregation functions do not return the columns you group by as regular columns; instead, the grouped columns become the index of the returned object. Passing as_index=False, or calling reset_index afterwards, returns them as regular columns.
Below is my attempt to turn your results into a meaningful graph:
import pandas as pd
import matplotlib.pyplot as plt
df = fooddata.reset_index()  # move 'food' from the index into a column next to 'sum'
# x='food' makes pandas use the food names as the bar labels
ax = df.plot(kind='barh', x='food', y='sum', title="Total Sum per Food Item", figsize=(15, 10), legend=True, fontsize=12)
ax.set_xlabel("Sum per Food Item", fontsize=12)
ax.set_ylabel("Food Items", fontsize=12)
plt.show()
This results in the following chart (image not included).