I need two line charts and a bar chart in the same diagram. Both line charts share the same y axis,
but the bar chart uses a different axis.
Table format:
Indicator Name                                       2011  2012   2013  2014  2015   2016   2017   2018   2019
Bank nonperforming loans to total gross loans (%)    3.8   3.6    5.5   4.2   3.2    2.6    2.4    3.4    4.2
Bank nonperforming loans to total net loans (%)      3     2.2    3.8   2.6   1.2    1.7    1.3    2.2    2.5
Bank Total gross loans (LK Bn)                       99    116.6  191   165   152.8  142.3  160.7  263.1  275.5
This is my code:
df.loc['Bank nonperforming loans to total gross loans (%)', years].plot(kind = 'line',color='mediumvioletred',marker ='o',
markerfacecolor ='blue',markersize=9,label = "NPL %")
df.loc['Bank nonperforming loans to total net loans (%)', years].plot(kind = 'line',color='blue',label = "SL")
plt.twinx()
df.loc['Bank Total gross loans (LK Bn)', years].plot(kind = 'Bar',color='brown',label = "chk")
plt.ylim([90,280])
plt.title('Immigration from Afghanistan')
plt.ylabel('NPL %')
plt.xlabel('years')
plt.legend()
Below is the graph that I get, but it doesn't show the bar graph.
Your case requires more control than DataFrame.plot may provide. You need to define the order of your plots: the lines should be on top of the bar plot, but in addition your bars are on a twinned axis, which creates another problem. Here is a solution to your problem, mainly based on this answer.
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
2011: [3.8, 3, 99],
2012: [3.6, 2.2, 116.6],
2013: [5.5, 3.8, 191],
2014: [4.2, 2.6, 165],
2015: [3.2, 1.2, 152.8],
2016: [2.6, 1.7, 142.3],
2017: [2.4, 1.3, 160.7],
2018: [3.4, 2.2, 263.1],
2019: [4.2, 2.5, 275.5],
}
df = pd.DataFrame(
data,
index=['Bank nonperforming loans to total gross loans (%)',
'Bank nonperforming loans to total net loans (%)',
'Bank Total gross loans (LK Bn)'],
columns=data.keys()
)
years = list(data.keys())
fig, ax = plt.subplots()
# axis for the bar
ax2 = ax.twinx()
ax2.set_ylim([90, 280])
# create a cloned axis from ax for the lines;
# we need this to draw the line plots on top of the bars
ax1 = fig.add_axes(ax.get_position(), sharex=ax, sharey=ax)
ax1.set_facecolor('none')
ax1.set_axis_off()
ax1.set_xticks(years)
bar = ax2.bar(
years,
df.loc['Bank Total gross loans (LK Bn)'], color='brown',
label='chk',
)
line1, = ax1.plot(
df.loc['Bank nonperforming loans to total gross loans (%)', years],
color='mediumvioletred',
marker='o',
markerfacecolor='blue',
markersize=9,
label='NPL %',
)
line2, = ax1.plot(
df.loc['Bank nonperforming loans to total net loans (%)', years],
color='blue',
label='SL',
)
ax.set_title('Immigration from Afghanistan')
ax.set_ylabel('NPL %')
ax.set_xlabel('years')
ax2.legend([bar, line1, line2], ['chk', 'NPL %', 'SL'])
Plot:
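A lighter variant, offered as a sketch rather than as the method used above: instead of adding a third axes, the original axes can be raised above the twinned one with set_zorder and its background hidden so the bars show through. The values are copied from the question's table; the variable names are chosen here for illustration only.

```python
import matplotlib.pyplot as plt

# Values copied from the question's table
years = list(range(2011, 2020))
npl_gross = [3.8, 3.6, 5.5, 4.2, 3.2, 2.6, 2.4, 3.4, 4.2]
npl_net = [3, 2.2, 3.8, 2.6, 1.2, 1.7, 1.3, 2.2, 2.5]
gross_loans = [99, 116.6, 191, 165, 152.8, 142.3, 160.7, 263.1, 275.5]

fig, ax = plt.subplots()
ax2 = ax.twinx()  # right-hand y axis for the bars

bars = ax2.bar(years, gross_loans, color="brown")
ax2.set_ylim(90, 280)

line1, = ax.plot(years, npl_gross, color="mediumvioletred", marker="o")
line2, = ax.plot(years, npl_net, color="blue")

# Draw the line axes above the bar axes and make its background
# transparent so that the bars underneath remain visible.
ax.set_zorder(ax2.get_zorder() + 1)
ax.patch.set_visible(False)

ax.set_xlabel("years")
ax.set_ylabel("NPL %")
ax.legend([bars, line1, line2], ["chk", "NPL %", "SL"])
```

Call plt.show() or fig.savefig(...) afterwards to render the figure. The legend is attached to ax because that is now the topmost axes.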
I have a dataframe that will be visualized. This is the code to obtain that dataframe:
zonasi = df.groupby('kodya / kab')['customer'].nunique()
zonasi
this is the output from the code above:
kab bandung 1
kab bandung barat 4
kab banyumas 2
kab batang 1
kab bekasi 29
kab bogor 13
kab kudus 11
kab tangerang 15
kab tegal 2
kota adm jakarta barat 14
kota adm jakarta pusat 6
kota adm jakarta selatan 10
kota adm jakarta timur 23
kota adm jakarta utara 9
kota balikpapan 1
kota bandung 12
kota bekasi 12
kota semarang 11
kota surabaya 3
kota surakarta 2
kota tangerang 10
kota tasikmalaya 2
no data 44
I want to visualize the output into pie chart, but since the x labels ('kodya / kab') have a lot of different unique values, the xlabels are overlapping. So, I want to try using explode to visualize the pie chart (donut chart).
I tried using this code:
#colors
colors = sns.color_palette('husl')
#explosion
explode = (0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05)
plt.pie(zonasi, colors = colors, autopct='%.2f%%', startangle = 90, pctdistance = 0.85, explode = explode)
#draw circle
centre_circle = plt.Circle((0, 0), 0.70,fc = 'white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
#Equal aspect ratio ensures that pie is drawn as a circle
ax.axis('equal')
plt.tight_layout()
plt.show()
but it returns this error:
'explode' must be of length 'x'
The thing is, I want to reuse the visualization code for different dataframes, so the x labels will differ from one to another. How can I define the explode variable so that it adjusts to the x labels automatically?
This is the example of what my output will look like:
Thank you in advance for the help.
You could do this:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
data = {'kodya / kab': ['kab bandung', 'kab bandung barat', 'kab banyumas', 'kab batang', 'kab bekasi', 'kab bogor', 'kab kudus', 'kab tangerang', 'kab tegal', 'kota adm jakarta barat', 'kota adm jakarta pusat', 'kota adm jakarta selatan', 'kota adm jakarta timur', 'kota adm jakarta utara', 'kota balikpapan', 'kota bandung', 'kota bekasi', 'kota semarang', 'kota surabaya', 'kota surakarta', 'kota tangerang', 'kota tasikmalaya', 'no data'],
'customer': [1, 4, 2, 1, 29, 13, 11, 15, 2, 14, 6, 10, 23, 9, 1, 12, 12, 11, 3, 2, 10, 2, 44]}
zonasi = pd.DataFrame(data)
zonasi.set_index('kodya / kab', inplace=True) # set the index to 'kodya / kab'
colors = sns.color_palette('husl')
explode = np.zeros(len(zonasi))
explode[1:5] = 0.1
zonasi.plot.pie(y='customer', colors=colors, autopct='%.2f%%', startangle=90, pctdistance=0.85, explode=explode,legend=False)
centre_circle = plt.Circle((0, 0), 0.70, fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
plt.axis('equal')
plt.tight_layout()
plt.show()
which gives
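If every slice should get the same offset, as in the question's hand-written tuple, the explode sequence can be derived from the length of the data. The helper below is a minimal sketch; the name uniform_explode and the short counts list are made up for illustration.

```python
def uniform_explode(values, offset=0.05):
    """Return one explode offset per slice, whatever the number of slices."""
    return [offset] * len(values)


# Hypothetical counts standing in for the zonasi series
counts = [1, 4, 2, 1, 29, 13]
explode = uniform_explode(counts)
print(len(explode), explode[0])
```

The result can be passed straight to the pie call, for example explode=uniform_explode(zonasi). With NumPy, np.full(len(zonasi), 0.05) builds the same sequence.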
I have a large data set containing years, NFL teams, their total salaries paid out for that year, and more misc stats. I want to create a plot that has the years on the x-axis, total salaries on the y and then has multiple lines, one for each team.
The data I want to plot looks something like this except there are of course way more teams and years and the total salaries are accurate:
Year  Team    Salaries
2015  Miami   $100
2015  Denver  $150
2015  LA      $125
2016  Miami   $125
2016  Denver  $100
2016  LA      $100
I know pandas plot function and I can set the x-axis but when I set y to be total salaries it just gives me a single line. I also do not know how to set it to break up the data by each team so each team is treated as a separate line.
You want to use a pivot table to get a new column per team.
Once you've got the data reshaped like this, plotting is easy. Check out the documentation on pivot tables.
import pandas as pd
df = pd.DataFrame(
{
"Year": ["2015", "2016", "2017", "2018"] * 6,
"Team": ["Miami", "Denver", "LA"] * 8,
"Salaries": [100, 200, 150, 125, 100, 250] * 4,
}
)
df.pivot_table(values="Salaries",index="Year",columns="Team").plot()
The result of the pivot table looks like this
Team Denver LA Miami
Year
2015 100 150 100
2016 200 250 125
2017 100 150 100
2018 200 250 125
And the plot:
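The same reshaping can be sketched on the question's own sample rows. This assumes, since the question does not say, that the salaries are stored as text with a leading dollar sign, so they are converted to numbers first. It uses pivot rather than pivot_table, which is a deliberate swap: pivot raises an error on duplicate (Year, Team) pairs, while pivot_table would average them by default.

```python
import pandas as pd

# Sample rows from the question
df = pd.DataFrame(
    {
        "Year": [2015, 2015, 2015, 2016, 2016, 2016],
        "Team": ["Miami", "Denver", "LA", "Miami", "Denver", "LA"],
        "Salaries": ["$100", "$150", "$125", "$125", "$100", "$100"],
    }
)

# Assumption: salaries are strings such as "$100"; strip the sign and convert
df["Salaries"] = df["Salaries"].str.replace("$", "", regex=False).astype(float)

# One row per year, one column per team
wide = df.pivot(index="Year", columns="Team", values="Salaries")
print(wide)
```

Calling wide.plot() then draws one line per team with the years on the x-axis.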
Alternative via seaborn:
import seaborn as sns
import pandas as pd
df = pd.DataFrame(
{
"Year": ["2015", "2016", "2017", "2018"] * 6,
"Team": ["Miami", "Denver", "LA"] * 8,
"Salaries": [100, 200, 150, 125, 100, 250] * 4,
}
)
sns.lineplot(x='Year', y='Salaries', hue='Team', data=df)
OUTPUT:
NOTE: Thanks to @Cornelius Roemer for the model data.
Say I have the following dataframes:
Earthquakes:
latitude longitude place year
0 36.087000 -106.168000 New Mexico 1973
1 33.917000 -90.775000 Mississippi 1973
2 37.160000 -104.594000 Colorado 1973
3 37.148000 -104.571000 Colorado 1973
4 36.500000 -100.693000 Oklahoma 1974
… … … … …
13941 36.373500 -96.818700 Oklahoma 2016
13942 36.412200 -96.882400 Oklahoma 2016
13943 37.277167 -98.072667 Kansas 2016
13944 36.939300 -97.896000 Oklahoma 2016
13945 36.940500 -97.906300 Oklahoma 2016
and Wells:
LAT LONG BBLS Year
0 36.900324 -98.218260 300.0 1977
1 36.896636 -98.177720 1000.0 2002
2 36.806113 -98.325840 1000.0 1988
3 36.888589 -98.318530 1000.0 1985
4 36.892128 -98.194620 2400.0 2002
… … … … …
11117 36.263285 -99.557631 1000.0 2007
11118 36.263220 -99.548647 1000.0 2007
11119 36.520160 -99.334183 19999.0 2016
11120 36.276728 -99.298563 19999.0 2016
11121 36.436857 -99.137391 60000.0 2012
How can I make a line graph showing the sum of BBLS per year (from Wells) and the number of earthquakes per year (from Earthquakes), where the x-axis shows the years since 1980, the y1-axis shows the sum of BBLS per year, and the y2-axis shows the number of earthquakes?
I believe I need a groupby with count (for earthquakes) and sum (for BBLS) to make the plot, but I have tried many variations and can't work out how to do it.
The only one that kinda worked was the line graph for earthquakes as follows:
Earthquakes.pivot_table(index=['year'],columns='type',aggfunc='size').plot(kind='line')
Still, for the line graph for BBLS nothing has worked
Wells.pivot_table(index=['Year'],columns='BBLS',aggfunc='count').plot(kind='line')
This one either:
plt.plot(Wells['Year'].values, Wells['BBL'].values, label='Barrels Produced')
plt.legend() # Plot legends (the two labels)
plt.xlabel('Year') # Set x-axis text
plt.ylabel('Earthquakes') # Set y-axis text
plt.show() # Display plot
This one from another thread either:
fig, ax = plt.subplots(figsize=(10,8))
Earthquakes.plot(ax = ax, marker='v')
ax.title.set_text('Earthquakes and Injection Wells')
ax.set_ylabel('Earthquakes')
ax.set_xlabel('Year')
ax.set_xticks(Earthquakes['year'])
ax2=ax.twinx()
ax2.plot(Wells.Year, Wells.BBL, color='c',
linewidth=2.0, label='Number of Barrels', marker='o')
ax2.set_ylabel('Annual Number of Barrels')
lines_1, labels_1 = ax.get_legend_handles_labels()
lines_2, labels_2 = ax2.get_legend_handles_labels()
lines = lines_1 + lines_2
labels = labels_1 + labels_2
ax.legend(lines, labels, loc='upper center')
Input data:
>>> df2 # Earthquakes
year
0 2007
1 1974
2 1979
3 1992
4 2006
.. ...
495 2002
496 2011
497 1971
498 1977
499 1985
[500 rows x 1 columns]
>>> df1 # Wells
BBLS year
0 16655 1997
1 7740 1998
2 37277 2000
3 20195 2014
4 11882 2018
.. ... ...
495 30832 1981
496 24770 2018
497 14949 1980
498 24743 1975
499 46933 2019
[500 rows x 2 columns]
Prepare data to plot:
data1 = df2.value_counts("year").sort_index().rename("Earthquakes")  # earthquakes per year
data2 = df1.groupby("year")["BBLS"].sum()  # total BBLS per year
Simple plot:
ax1 = data1.plot(legend=True, color="blue")
ax2 = data2.plot(legend=True, color="red", ax=ax1.twinx())
Now, you can do whatever with the 2 axes.
A more controlled chart
# Figure and axis
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
# Data
line1, = ax1.plot(data1.index, data1.values, label="Earthquakes", color="b")
line2, = ax2.plot(data2.index, data2.values / 10**6, label="Barrels", color="r")
# Legend
lines = [line1, line2]
ax1.legend(lines, [line.get_label() for line in lines])
# Titles
ax1.set_title("")
ax1.set_xlabel("Year")
ax1.set_ylabel("Earthquakes")
ax2.set_ylabel("Barrels Produced (MMbbl)")
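The question also asks for years from 1980 onwards and uses different column names in the two frames (year in Earthquakes, Year in Wells). A small sketch of the aggregation with that filter, using a few made-up rows in place of the real data:

```python
import pandas as pd

# Made-up rows that only mimic the column layout from the question
earthquakes = pd.DataFrame({"year": [1973, 1973, 1980, 1981, 1981, 2016]})
wells = pd.DataFrame(
    {
        "Year": [1977, 1985, 1985, 2002, 2016],
        "BBLS": [300.0, 1000.0, 1000.0, 2400.0, 19999.0],
    }
)

# Count earthquakes per year, keeping 1980 and later
quakes_per_year = (
    earthquakes.loc[earthquakes["year"] >= 1980]
    .groupby("year")
    .size()
    .rename("Earthquakes")
)

# Sum barrels per year, keeping 1980 and later
bbls_per_year = wells.loc[wells["Year"] >= 1980].groupby("Year")["BBLS"].sum()

# Align both series on the year; years missing from one side become 0
combined = pd.concat([quakes_per_year, bbls_per_year], axis=1).fillna(0).sort_index()
print(combined)
```

From here, combined["BBLS"].plot() followed by combined["Earthquakes"].plot(secondary_y=True) is one way to get the two y axes. Filling missing years with 0 is a choice made for this sketch; leaving them as NaN would produce gaps in the lines instead.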
I have a few lines of code that creates a great bar chart.
import plotly.express as px
fig = px.bar(df, x="site_name", y="value", color="variable", title="2019 & 2020 Revenue vs. Expenses")
fig.show()
Here is some sample data.
# Import pandas library
import pandas as pd
# initialize list of lists
data = [['14', '20 Hudson Yards_IB', 'revenue', '2651324.42'],
['5', '5 Camden Yards', 'revenue', '2772115.61'],
['12', 'PHO_SUNS', 'revenue', '2818249.28'],
['87', 'OSUMC--MAIN', 'expense', '4300628.21'],
['7', 'DA STADIUM', 'expense', '6452307.48'],
['99', '50 Midtown East', 'expense', '3534523.12']]
# Create the pandas DataFrame
df_result = pd.DataFrame(data, columns = ['index', 'site_name', 'variable', 'value'])
# print dataframe.
df_result
Result:
index site_name variable value
0 14 20 Hudson Yards_IB revenue 2651324.42
1 5 5 Camden Yards revenue 2772115.61
2 12 PHO_SUNS revenue 2818249.28
3 87 OSUMC--MAIN expense 4300628.21
4 7 DA STADIUM expense 6452307.48
5 99 50 Midtown East expense 3534523.12
The chart is almost exactly what I want, but I'm trying to get two specific colors for two variables. One variable is 'revenue' and one variable is 'expenses'. Now, with the default code shown above, my chart is red for revenue and blue for expenses. What I really want to do is make the 'revenue' green and 'expenses' red. How can I control this?
Use color_discrete_sequence, and to specify the order, use category_orders. The category names are case-sensitive and must match the values in the variable column exactly; in the sample data these are 'revenue' and 'expense'.
fig = px.bar(df, x="site_name", y="value", color="variable",
color_discrete_sequence=["green","red"],
category_orders={"variable":["revenue","expense"]},
title="2019 & 2020 Revenue vs. Expenses")
Or use color_discrete_map:
fig = px.bar(df, x="site_name", y="value", color="variable",
color_discrete_map={"revenue": "green", "expense":"red"},
category_orders={"variable":["revenue","expense"]},
title="2019 & 2020 Revenue vs. Expenses")
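Because the keys of color_discrete_map have to match the category values exactly, it can help to check them against the data before plotting. The sketch below does that on the question's sample frame and also converts the value column, which the sample stores as strings, to numbers. The color_map and missing names are introduced here for illustration.

```python
import pandas as pd

# Sample data from the question
data = [['14', '20 Hudson Yards_IB', 'revenue', '2651324.42'],
        ['5', '5 Camden Yards', 'revenue', '2772115.61'],
        ['12', 'PHO_SUNS', 'revenue', '2818249.28'],
        ['87', 'OSUMC--MAIN', 'expense', '4300628.21'],
        ['7', 'DA STADIUM', 'expense', '6452307.48'],
        ['99', '50 Midtown East', 'expense', '3534523.12']]
df = pd.DataFrame(data, columns=['index', 'site_name', 'variable', 'value'])

# The sample stores amounts as strings; convert so bar heights are numeric
df['value'] = pd.to_numeric(df['value'])

# Keys must match the category values exactly (case-sensitive)
color_map = {'revenue': 'green', 'expense': 'red'}
missing = set(df['variable'].unique()) - set(color_map)
print(missing)
```

An empty set means every category has a color, and the dictionary can then be passed as color_discrete_map=color_map to px.bar.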
I am trying to plot the happiness degree of 30 different countries from 2012 to 2018; some years are missing a happiness value.
The arrays are Happiness, Year and Country.
I want the y axis to be the happiness degree, the x axis to be the year and the z value to be the country (each country is marked by a number from 1-30), so that a color connects the different degrees through the years for each country.
The shape of each array is (210,). Here is my code:
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.tri as tri
sns.set(style="white")
x=Year
y=Hapiness
z=country
fig = plt.figure(figsize=(50, 50))
ax = fig.add_subplot(111)
nptsx, nptsy = 100, 100
xg, yg = np.meshgrid(np.linspace(x.min(), x.max(), nptsx),
np.linspace(y.min(), y.max(), nptsy))
triangles = tri.Triangulation(x, y)
tri_interp = tri.CubicTriInterpolator(triangles, z)
zg = tri_interp(xg, yg)
# change levels here according to your data
levels = np.linspace(1, 210, 30)
colormap = ax.contourf(xg, yg, zg, levels,
cmap=plt.cm.Blues,
norm=plt.Normalize(vmax=z.max(), vmin=z.min()))
# plot data points
ax.plot(x, y, color="#444444", marker="o", linestyle="", markersize=15)
# add a colorbar
fig.colorbar(colormap,
orientation='vertical', # horizontal colour bar
shrink=0.85)
# graph extras: look at xlim and ylim
ax.set_xlim((2012, 2018))
ax.set_ylim((0, 10))
ax.set_aspect("equal", "box")
plt.show()
Here is the error I get when I run the code:
RuntimeError                              Traceback (most recent call last)
<ipython-input-65-2779759126bf> in <module>
16 np.linspace(y.min(), y.max(), nptsy))
17
---> 18 triangles = tri.Triangulation(x, y)
19 tri_interp = tri.CubicTriInterpolator(triangles, z)
20 zg = tri_interp(xg, yg)
C:\ProgramData\Anaconda3\lib\site-packages\matplotlib\tri\triangulation.py in __init__(self, x, y, triangles, mask)
52 # No triangulation specified, so use matplotlib._qhull to obtain
53 # Delaunay triangulation.
---> 54 self.triangles, self._neighbors = _qhull.delaunay(x, y)
55 self.is_delaunay = True
56 else:
RuntimeError: Error in qhull Delaunay triangulation calculation: input inconsistency (exitcode=1); use python verbose option (-v) to see original qhull error.
A sample of my data (.CSV file):
Entity Code Year World Happiness Report(Cantril Ladder(0=worst; 10=best))
Argentina 1 2012 6.4
Argentina 1 2013 6.5
Argentina 1 2014 6.6
Argentina 1 2015 6.6
Argentina 1 2016 6.4
Argentina 1 2017 6.0
Argentina 1 2018 5.7
Australia 2 2012 7.1
Australia 2 2013 7.3
Australia 2 2014 7.2
Australia 2 2015 7.3
Australia 2 2016 7.2
Australia 2 2017 7.2
Australia 2 2018 7.1
Brazil 3 2012 6.6
Brazil 3 2013 7.1
Brazil 3 2014 6.9
Brazil 3 2015 6.5
Brazil 3 2016 6.3
Brazil 3 2017 6.3
Brazil 3 2018 6.1
Your error is in the way you read your CSV file: without assigning a type to your data, it will be treated as a string.
Try this:
import pandas as pd
import numpy as np
df = pd.read_csv("untitled.csv", delimiter=',')
x=np.array(df['Year'].values, dtype='float')
y=np.array(df['Happiness'].values, dtype='float')
z=np.array(df['Code'].values, dtype='int')
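Since the question mentions that some years have no happiness value, a slightly more defensive sketch of the same conversion is shown below. It reads an inline sample instead of untitled.csv, assumes the long header has been shortened to Happiness, and uses ".." as a made-up placeholder for a missing value. Non-numeric entries are coerced to NaN and dropped before the arrays are built.

```python
import io

import numpy as np
import pandas as pd

# Inline stand-in for the CSV file; ".." is a hypothetical missing-value marker
csv_text = """Entity,Code,Year,Happiness
Argentina,1,2012,6.4
Argentina,1,2013,..
Australia,2,2012,7.1
Australia,2,2013,7.3
"""

df = pd.read_csv(io.StringIO(csv_text))

# Turn anything non-numeric into NaN, then drop those rows
df["Happiness"] = pd.to_numeric(df["Happiness"], errors="coerce")
df = df.dropna(subset=["Happiness"])

x = df["Year"].to_numpy(dtype=float)
y = df["Happiness"].to_numpy(dtype=float)
z = df["Code"].to_numpy(dtype=int)
print(x.dtype, y.dtype, len(x))
```

Printing df.dtypes right after read_csv is a quick way to confirm whether a column was read as text (object) rather than as numbers.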