I have a DataFrame df with many columns. The column df["house_electricity"] contains the values 1, 0 or blank/NA. I want to plot this column as a pie chart that shows the percentages of only 1 and 0. I also want a second pie chart that shows the percentages of 1, 0 and blank/NA.
customer_id  house_electricity  house_refrigerator
cid01        0                  0
cid02        1                  na
cid03                           1
cid04        1
cid05        na                 0
I wrote the following, but it didn't give me the expected result:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("my_file.csv")
df_col=df.columns
df["house_electricity"].plot(kind="pie")
For example, given this DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[1,0,np.nan,1,1,1,'',0,0,np.nan]})
df
a
0 1
1 0
2 NaN
3 1
4 1
5 1
6
7 0
8 0
9 NaN
The following code plots a pie chart of the counts of every value, including NaN and the empty string:
df["a"].value_counts(dropna=False).plot(kind="pie")
If you want to combine NaN and empty values, replace the empty strings with np.nan first, then plot:
df["a"].replace("", np.nan).value_counts(dropna=False).plot(kind="pie")
To get a pie chart with three slices (0, 1 and NaN) and percentage labels, try this code:
import pandas as pd
import matplotlib.pyplot as plt
data = {'customer_id': ['cid01', 'cid02', 'cid03', 'cid04', 'cid05'],
'house_electricity': [0, 1, None, 1, None],
'house_refrigerator': [0, None, 1, None, 0]}
df = pd.DataFrame(data)
# sort_index() orders the counts as 0, 1, NaN so they match the labels below
counts = df['house_electricity'].value_counts(dropna=False).sort_index()
counts.plot.pie(autopct='%1.1f%%', labels=['0', '1', 'NaN'], shadow=True)
plt.title('Percentage distribution of house_electricity column')
plt.axis('equal')
plt.show()
I would like to plot a heatmap from a CSV file that contains pixel positions. The file looks like this:
0 0 8.400000e+01
1 0 8.500000e+01
2 0 8.700000e+01
3 0 8.500000e+01
4 0 9.400000e+01
5 0 7.700000e+01
6 0 8.000000e+01
7 0 8.300000e+01
8 0 8.900000e+01
9 0 8.500000e+01
10 0 8.300000e+01
I tried writing a few lines of Python, but they raise an error. I suspect the problem is the format of column 3, which contains strings. Is there any way to plot this kind of file?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
path_to_csv= "/run/media/test.txt"
df= pd.read_csv(path_to_csv ,sep='\t')
plt.imshow(df,cmap='hot',interpolation='nearest')
plt.show(df)
I also tried seaborn, without success.
Here is the error:
TypeError: Image data of dtype object cannot be converted to float
You can pass dtype=float as a keyword argument to pandas.read_csv:
df = pd.read_csv(path_to_csv, sep='\t', dtype=float)
Or use pandas.DataFrame.astype:
plt.imshow(df.astype(float), cmap='hot', interpolation='nearest', aspect='auto')
plt.show()
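Note that imshow on the raw three-column frame draws the three columns themselves as an image, not the pixels. If, as the sample suggests, the columns are x, y and value, the values have to be pivoted into a 2-D grid first. A minimal sketch, assuming a whitespace-separated file with no header row; the column names x, y and value are my own, and io.StringIO holding the first rows from the question stands in for the real file:

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# First rows of the question's file; replace io.StringIO(raw) with the file path.
raw = """0 0 8.400000e+01
1 0 8.500000e+01
2 0 8.700000e+01
3 0 8.500000e+01
4 0 9.400000e+01
"""

# Assumed layout: x, y, value separated by whitespace, no header row.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None,
                 names=["x", "y", "value"])

# One row per y, one column per x; missing pixels become NaN.
grid = df.pivot(index="y", columns="x", values="value")

fig, ax = plt.subplots()
im = ax.imshow(grid.to_numpy(dtype=float), cmap="hot",
               interpolation="nearest", aspect="auto", origin="lower")
fig.colorbar(im, ax=ax, label="value")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```

header=None also matters here: without it, read_csv would use the first data row as column names.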
Imagine I have the following DataFrames:
import pandas as pd
import seaborn as sns
import numpy as np
d = {'val': [1, 2,3,4], 'a': [1, 1, 2, 2]}
d2 = {'val': [1, 2], 'a': [1, 2]}
df = pd.DataFrame(data=d)
df2 = pd.DataFrame(data=d2)
This gives me two DataFrames that look like the following:
df =
val a
0 1 1
1 2 1
2 3 2
3 4 2
and
df2 =
val a
0 1 1
1 2 2
Now I want to create a boxplot of val in df grouped by the values of a. First, fix a = 1: there are two values of val, 1 and 2, so create a box at x = 1 based on the values {1, 2}. Then move on to a = 2: there are two values, val = {3, 4}, so create a box at x = 2 based on the values {3, 4}.
Then I want to draw a line based on df2, where a is again the x-axis and val the y-axis. This is how I did it:
ax = df.boxplot(column=['val'], by = ['a'],meanline=True, showmeans=True, showcaps=True,showbox=True)
sns.pointplot(x='a', y='val', data=df2, ax=ax)
The problem is that the box for a = 1 is shifted to a = 2, and the box for a = 2 disappears. I am not sure whether there is an error in my code or whether it is a bug.
If I only draw the boxplot, everything is fine:
ax = df.boxplot(column=['val'], by = ['a'],meanline=True, showmeans=True, showcaps=True,showbox=True)
The boxes are at the right positions, but as soon as I add the pointplot, things stop working.
Does anyone have an idea what to do?
The problem is that you are plotting categories on the x-axis. pointplot places the first category at position 0, while the pandas boxplot starts at 1, hence the shift. One possibility is to use a twinned axis:
ax = df.boxplot(column=['val'], by = ['a'])
ax2 = ax.twiny()
sns.pointplot(x='a', y='val', data=df2, ax=ax2)
ax2.xaxis.set_visible(False)
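Another option, sketched below with plain matplotlib instead of seaborn, is to keep a single axis and place the df2 points at the positions the pandas boxplot uses. This assumes matplotlib's default of drawing the boxes at x = 1, 2, ... in sorted group order:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'val': [1, 2, 3, 4], 'a': [1, 1, 2, 2]})
df2 = pd.DataFrame({'val': [1, 2], 'a': [1, 2]})

ax = df.boxplot(column=['val'], by='a', meanline=True, showmeans=True,
                showcaps=True, showbox=True)

# pandas draws one box per sorted group at x = 1, 2, ..., so map each
# category of a to that position before plotting the line.
positions = {cat: i + 1 for i, cat in enumerate(sorted(df['a'].unique()))}
ax.plot(df2['a'].map(positions).to_numpy(), df2['val'].to_numpy(),
        marker='o', color='red')
plt.show()
```

Because everything stays on one axis, the boxes and the line share the same x scale and no second axis has to be hidden.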
I have the following pandas DataFrame and would like to create n plots side by side, where n is the number of unique labels (l1, l2, ...) in column a1. In the example below there will be two plots, one for l1 and one for l2. Each plot shows a4 on the x-axis against a3 on the y-axis. For example, ax[0] will contain the graph for l1, with three lines linking the points [(1,15),(2,20)], [(1,17),(2,19)] and [(1,23),(2,15)].
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df=pd.DataFrame(d)
df
a1 a2 a3 a4
l1 a 15 1
l1 a 20 2
l1 b 17 1
l1 b 19 2
l1 c 23 1
l1 c 15 2
l2 d 22 1
l2 d 21 2
l2 e 23 1
l2 e 23 2
l2 f 24 1
l2 f 27 2
I currently have the following:
def graph(dataframe):
x = dataframe["a4"]
y = dataframe["a3"]
ax[0].plot(x,y) #how do I plot and set the title for each group in their respective subplot without the use of for-loop?
fig, ax = plt.subplots(1,len(pd.unique(df["a1"])),sharey='row',figsize=(15,2))
df.groupby(["a1"]).apply(graph)
However, my attempt plots all of a3 against a4 on the first subplot (because I wrote ax[0].plot()). I could use a for loop to accomplish this, but with a large number of unique groups in a1 it will be computationally expensive. Is there a way to turn the ax[0].plot(x,y) line into a one-liner that does this without a for loop? Any input is appreciated.
I do not see any way of avoiding a for loop when plotting this data with pandas. My initial thought was to reshape the dataframe to make subplots=True work, like this:
dfp = df.pivot(columns='a1').swaplevel(axis=1).sort_index(axis=1)
dfp
But I do not see how to select level 1 of the columns MultiIndex to make something like dfp.plot(x='a4', y='a3', subplots=True) work.
Removing level 0 and then running the plotting function with
dfp.droplevel(axis=1, level=0).plot(x='a4', y='a3', subplots=True) raises ValueError: x must be a label or position. And even if this worked, there would still be the issue of linking the correct points together.
The seaborn package was created to conveniently plot this kind of dataset. If you are open to using it, here is an example with relplot:
import pandas as pd # v 1.1.3
import seaborn as sns # v 0.11.0
d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)
sns.relplot(data=df, x='a4', y='a3', col='a1', hue ='a2', kind='line', height=4)
You can customize the colors with the palette argument and adjust the grid layout with col_wrap.
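If you prefer to stay with pandas and matplotlib, the loop only runs once per unique a1 label, and each subplot needs its own plotting call anyway, so the loop itself is usually cheap compared with the drawing. A minimal sketch that pivots each group so that every a2 value becomes one line:

```python
import pandas as pd
from matplotlib import pyplot as plt

d = {'a1': ['l1','l1','l1','l1','l1','l1','l2','l2','l2','l2','l2','l2'],
     'a2': ['a', 'a', 'b','b','c','c','d','d','e','e','f','f'],
     'a3': [15,20,17,19,23,15,22,21,23,23,24,27],
     'a4': [1,2,1,2,1,2,1,2,1,2,1,2]}
df = pd.DataFrame(d)

groups = df.groupby('a1')
fig, axes = plt.subplots(1, groups.ngroups, sharey='row', figsize=(15, 2),
                         squeeze=False)

# One iteration per a1 label; pivot so each a2 value is a column (= one line).
for ax, (label, grp) in zip(axes[0], groups):
    wide = grp.pivot(index='a4', columns='a2', values='a3')
    wide.plot(ax=ax, marker='o', title=str(label))
plt.show()
```

squeeze=False keeps axes two-dimensional, so the same code also works when a1 has a single label.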
I would like to print the DataFrame besides the plot. What would be a pythonic way to do that?
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Age':[21,22,23,24,25,26,27,28,29,30],'Count':[4,1,3,7,2,3,5,1,1,5]})
print(df)
Age Count
0 21 4
1 22 1
2 23 3
3 24 7
4 25 2
5 26 3
6 27 5
7 28 1
8 29 1
9 30 5
plt.rcParams['figure.figsize']=(10,6)
fig,ax = plt.subplots()
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
plt.plot(df['Age'],df['Count'])
I would like a graph like this, with the DataFrame's plotted values printed alongside it. How can I do that?
You can use ax.text to add the DataFrame to the plot. DataFrames have a .to_string method which makes formatting nice. Supply index=False to remove the row index.
plt.rcParams['figure.figsize']=(10, 6)
fig,ax = plt.subplots()
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
# x and y are in data coordinates; adjust them to where you want the text.
ax.text(x=28.5, y=4.5, s=df.to_string(index=False))
plt.plot(df['Age'],df['Count'])
plt.show()
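The coordinates passed to ax.text above are data coordinates, so they have to be adjusted whenever the data changes. A variant, assuming you want the table just outside the right edge of the axes, uses transform=ax.transAxes (0 to 1 across the axes) and a monospace font so the columns produced by to_string stay aligned. The question's pristina font settings are left out for brevity:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Age': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                   'Count': [4, 1, 3, 7, 2, 3, 5, 1, 1, 5]})

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(df['Age'], df['Count'])
ax.set_xlabel('Age', fontsize=20)
ax.set_ylabel('Count', fontsize=20)

# x=1.02 is just right of the axes in axes coordinates; leave room for it.
fig.subplots_adjust(right=0.8)
table_text = ax.text(1.02, 0.5, df.to_string(index=False),
                     transform=ax.transAxes, va='center', family='monospace')
plt.show()
```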
Another option is to use the function plt.table():
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame({'Age':[21,22,23,24,25,26,27,28,29,30],'Count':[4,1,3,7,2,3,5,1,1,5]})
plt.rcParams['figure.figsize']=(10,15)
fig,ax = plt.subplots()
plt.subplots_adjust(left=0.1, right=0.85, top=0.9, bottom=0.1)
font_used={'fontname':'pristina', 'color':'Black'}
ax.set_ylabel('Count',fontsize=20,**font_used)
ax.set_xlabel('Age',fontsize=20,**font_used)
plt.plot(df['Age'],df['Count'])
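# cellText must be 2-D (a list of rows); a 1-D Series of strings would be read character by character
ax.table(cellText=df[['Count']].astype(str).values.tolist(),
rowLabels=df['Age'].astype(str).tolist(),
colLabels=['Count'],
colWidths=[0.2],
loc='right')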
plt.show()
This creates a table listing each Age with its Count, placed to the right of the axes. Make sure to leave room for it with subplots_adjust(), as in the code above.
pandas has a to_html method; you can place the resulting HTML next to the graph. What are you placing the graph and DataFrame into?
df.to_html()
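If the target is a web page or HTML report, one way to put the graph and the DataFrame side by side is to embed the figure as a base64 PNG next to the output of df.to_html(). A minimal sketch; the flexbox layout is my own choice, not something pandas provides:

```python
import base64
import io

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Age': [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
                   'Count': [4, 1, 3, 7, 2, 3, 5, 1, 1, 5]})

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(df['Age'], df['Count'])
ax.set_xlabel('Age')
ax.set_ylabel('Count')

# Render the figure to PNG bytes in memory and encode them for an <img> tag.
buf = io.BytesIO()
fig.savefig(buf, format='png', bbox_inches='tight')
img_b64 = base64.b64encode(buf.getvalue()).decode('ascii')

# Figure on the left, table on the right.
html = (
    '<div style="display: flex; align-items: center; gap: 2em;">'
    f'<img src="data:image/png;base64,{img_b64}">'
    f'{df.to_html(index=False)}'
    '</div>'
)
```

The string can then be written to a file (the name is up to you, e.g. report.html) or shown in Jupyter with IPython.display.HTML(html).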
I have the following dataset with three variables:
df['Score']: a float dummy (1 or 0)
df['Province']: an object column where each row is a region
df['Product type']: an object column indicating the industry
I would like to create a jointplot with the different industries on the x-axis, the different provinces on the y-axis, and the relative frequency of the score as the colours.
Something like this.
https://seaborn.pydata.org/examples/hexbin_marginals.html
So far, I have only managed to do the following:
mean = df.groupby(['Province', 'Product type'])['score'].mean()
But I am not sure how to plot it.
Thanks!
If you are looking for a heatmap, you could use seaborn's heatmap function. However, you need to pivot your table first.
Here is a small example:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
score = [1, 1, 1, 0, 1, 0, 0, 0]
provinces = ['Place1' ,'Place2' ,'Place2', 'Place3','Place1', 'Place2','Place3','Place1']
products = ['Product1' ,'Product3' ,'Product2', 'Product2','Product1', 'Product2','Product1','Product1']
df = pd.DataFrame({'Province': provinces,
'Product type': products,
'score': score
})
My df looks like:
  Province Product type  score
0 Place1 Product1 1
1 Place2 Product3 1
2 Place2 Product2 1
3 Place3 Product2 0
4 Place1 Product1 1
5 Place2 Product2 0
6 Place3 Product1 0
7 Place1 Product1 0
Then:
df_heatmap = df.pivot_table(values='score', index='Province', columns='Product type', aggfunc='mean')
sns.heatmap(df_heatmap,annot=True)
plt.show()
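The question's own groupby result can also be plotted directly: unstacking the Product type level turns the Series of means into the same Province-by-Product type table. Because score is a 0/1 dummy, each cell's mean is the relative frequency of score == 1. A sketch that rebuilds the example df above so it runs on its own; combinations with no rows stay NaN and are left blank by heatmap:

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Province': ['Place1', 'Place2', 'Place2', 'Place3',
                 'Place1', 'Place2', 'Place3', 'Place1'],
    'Product type': ['Product1', 'Product3', 'Product2', 'Product2',
                     'Product1', 'Product2', 'Product1', 'Product1'],
    'score': [1, 1, 1, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column = share of rows with score == 1.
mean = df.groupby(['Province', 'Product type'])['score'].mean()

# Move 'Product type' from the row index to the columns.
df_heatmap = mean.unstack('Product type')

sns.heatmap(df_heatmap, annot=True, fmt='.2f', cmap='viridis')
plt.show()
```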