Forcing x-axis of pyplot histogram (python, pandas) - python

Hi, I have a DataFrame called vc that looks like this.
It is a count of scores. The score range is from 0 to 40.
However, as shown below, only a few scores have actual counts, and I can't get the histogram to have the x-axis that I want.
d = {'count': [9, 30, 6, 2,3,1,1,4,1,1,2,2,6,3]}
vc = pd.DataFrame(data=d, index=[12,13,14,15,17,18,19,20,21,22,23,24,25,26])
vc
index count
12 9
13 30
14 6
15 2
17 3
18 1
19 1
20 4
...and so on
I want to make a histogram with the x-axis running from 0 to 40, like this: (image: the histogram I want)
However, my histogram doesn't show the scores with zero counts.
vc=vc.sort_index()
ax = vc.plot(kind='bar', legend=False)
ax.set_xlabel("score")
ax.set_ylabel("count")
ax.set_xticks(range(0,40,5))
The resulting histogram: (image)
How can I produce the histogram I want, as in the first image?
I've tried for hours without success. Thank you.

Maybe not a very clever solution, but you can plot a blank bar chart over the range you need, then plot your count table on top of it:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
d = {'count': [9, 30, 6, 2,3,1,1,4,1,1,2,2,6,3]}
vc = pd.DataFrame(data=d, index=[12,13,14,15,17,18,19,20,21,22,23,24,25,26])
# zero-height bars draw nothing; they only stretch the x-axis to cover 0..40
plt.bar(x=np.arange(0, 41), height=0)
plt.bar(vc.index.to_list(), vc['count'])
plt.show()
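Another option, sketched here on the assumption that the scores are integers from 0 to 40, is to reindex the count table so the missing scores become explicit zeros, and then plot that. The name `full` is only illustrative and does not come from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

d = {'count': [9, 30, 6, 2, 3, 1, 1, 4, 1, 1, 2, 2, 6, 3]}
vc = pd.DataFrame(data=d,
                  index=[12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26])

# Assumption: valid scores are the integers 0..40.
# Scores that never occur get a count of 0 instead of being absent.
full = vc.reindex(range(0, 41), fill_value=0)

fig, ax = plt.subplots()
ax.bar(full.index, full['count'])
ax.set_xlabel("score")
ax.set_ylabel("count")
ax.set_xticks(range(0, 41, 5))
plt.show()
```

Because `ax.bar` places the bars at their numeric positions, the ticks set with `set_xticks` line up with the actual scores, which is not the case with `vc.plot(kind='bar')`, where bars sit at positions 0, 1, 2, ... regardless of the index values.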

Related

pie chart drawing for a specific column in pandas python

I have a dataframe df with many columns. The column df["house_electricity"] contains the values 1, 0, or blank/NA. I want to plot this column as a pie chart showing the percentages of 1 and 0 only. I also want a second pie chart showing the percentages of 1, 0, and blank/NA together.
customer_id  house_electricity  house_refrigerator
cid01        0                  0
cid02        1                  na
cid03                           1
cid04        1
cid05        na                 0
# I wrote the following, but it didn't give me my expected result
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("my_file.csv")
df_col=df.columns
df["house_electricity"].plot(kind="pie")
For a dataframe
df = pd.DataFrame({'a':[1,0,np.nan,1,1,1,'',0,0,np.nan]})
df
a
0 1
1 0
2 NaN
3 1
4 1
5 1
6
7 0
8 0
9 NaN
The code below plots every distinct value, including NaN and the empty string, as its own slice:
df["a"].value_counts(dropna=False).plot(kind="pie")
If you want to combine NaN and empty values, replace the empty strings with np.nan first, then plot:
df["a"].replace("", np.nan).value_counts(dropna=False).plot(kind="pie")
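To get the two charts the question asks for (1 and 0 only, and 1, 0 and NA together), one possible sketch is shown here. The Series `s` is hypothetical sample data standing in for df["house_electricity"], and the names `known` and `with_na` are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample standing in for df["house_electricity"]
s = pd.Series([0, 1, np.nan, 1, np.nan], name="house_electricity")

# Percentages of 1 and 0 only: value_counts drops NaN by default
known = s.value_counts(normalize=True) * 100

# Percentages of 1, 0 and missing values together
with_na = s.value_counts(normalize=True, dropna=False) * 100
# Give the NaN slice a readable label
with_na.index = with_na.index.map(lambda v: "NA" if pd.isna(v) else str(int(v)))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
known.plot.pie(ax=ax1, autopct='%1.1f%%', title="1 and 0 only")
with_na.plot.pie(ax=ax2, autopct='%1.1f%%', title="1, 0 and NA")
plt.show()
```

If the blanks are read in as empty strings rather than NaN, apply the `replace("", np.nan)` step first so that they are counted as missing.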
As a solution, try this code, which produces three slices (0, 1 and NaN).
import pandas as pd
import matplotlib.pyplot as plt
data = {'customer_id': ['cid01', 'cid02', 'cid03', 'cid04', 'cid05'],
'house_electricity': [0, 1, None, 1, None],
'house_refrigerator': [0, None, 1, None, 0]}
df = pd.DataFrame(data)
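counts = df['house_electricity'].value_counts(dropna=False)
# take the slice labels from counts.index; value_counts sorts by frequency,
# so a hard-coded label list can end up attached to the wrong slices
counts.plot.pie(autopct='%1.1f%%', shadow=True)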
plt.title('Percentage distribution of house_electricity column')
plt.axis('equal')
plt.show()

how to plot the variation of feature

I have a dataset with 2 columns, and I would like to show how one feature varies with the binary output value.
data
id Column1 output
1 15 0
2 80 1
3 120 1
4 20 0
... ... ...
I would like to draw a plot with Python where the x-axis contains the values of Column1 and the y-axis contains the percentage of positive outputs.
I already know that my plot should have the shape of an exponential function: when Column1 has smaller values I get more positive outputs than when it has large values.
An exponential-looking plot needs two lists of equal length, one for x and one for y.
Try this:
import matplotlib.pyplot as plt
# x-axis points list (must have as many points as yL, otherwise plt.plot raises an error)
xL = [5,10,15,20,25]
# y-axis points list
yL = [100,50,25,12,10]
plt.plot(xL, yL)
plt.axis([0, 35, 0, 200])
plt.show()
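To get the y values from the data instead of typing them in, one possible approach is to bin Column1 and take the mean of the 0/1 output in each bin. This is only a sketch: the rows beyond the four shown in the question and the bin edges are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data in the same shape as the question's table
data = pd.DataFrame({
    "Column1": [15, 80, 120, 20, 35, 60, 95, 10],
    "output":  [0, 1, 1, 0, 1, 0, 1, 0],
})

# Assumed bin edges; adjust them to the real range of Column1
bins = [0, 50, 100, 150]
data["bin"] = pd.cut(data["Column1"], bins=bins)

# The mean of a 0/1 column is the fraction of positives; scale it to percent
pct_positive = data.groupby("bin", observed=False)["output"].mean() * 100

# Plot each percentage at the midpoint of its bin
mids = [interval.mid for interval in pct_positive.index]
plt.plot(mids, pct_positive.values, marker="o")
plt.xlabel("Column1")
plt.ylabel("percent positive")
plt.show()
```

Narrower bins give a smoother curve but need more rows per bin for the percentages to be meaningful.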

Ascending order of bars in seaborn barplot

I have the following dataframe
Class Age Percentage
0 2004 3 43.491170
1 2004 2 29.616607
2 2004 4 13.838925
3 2004 6 10.049712
4 2004 5 2.637445
5 2004 1 0.366142
6 2005 2 51.267369
7 2005 3 19.589268
8 2005 6 13.730432
9 2005 4 11.155305
10 2005 5 3.343524
11 2005 1 0.913590
12 2005 9 0.000511
I would like to make a bar plot using seaborn where in the y-axis is the 'Percentage', in the x-axis is the 'Class' and label them using the 'Age' column. I would also like to arrange the bars in descending order, i.e. from the bigger to the smaller bar.
In order to do that I thought of the following: I will change the hue_order parameter based on the order of the 'Percentage' variable. For example, if I sort the 'Percentage' column in descending order for the Class == 2004, then the hue_order = [3, 2, 4, 6, 5, 1].
Here is my code:
import matplotlib.pyplot as plt
import seaborn as sns
def hue_order():
    for cls in dataset.Class.unique():
        temp_df = dataset[dataset['Class'] == cls]
        order = temp_df.sort_values('Percentage', ascending = False)['Age']
    return order
sns.barplot(x="Class", y="Percentage", hue = 'Age',
            hue_order= hue_order(),
            data=dataset)
plt.show()
However, the bars are in descending order only for the Class == 2005. Any help?
In my question, I am using the hue parameter, thus, it is not a duplicate as proposed.
The seaborn hue parameter adds another dimension to the plot. The hue_order determines the order in which this dimension is handled. However, you cannot split that order. This means you can change the order so that Age == 2 is in third place everywhere in the plot, but you cannot change it partially, so that it is in first place in one group and in third place in another.
To achieve what is desired here, namely different orders of the auxiliary dimension within the same axes, you need to handle this manually.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({"Class" : [2004]*6+[2005]*7,
                   "Age" : [3,2,4,6,5,1,2,3,6,4,5,1,9],
                   "Percentage" : [50,40,30,20,10,30,20,35,40,50,45,30,15]})
def sortedgroupedbar(ax, x,y, groupby, data=None, width=0.8, **kwargs):
    order = np.zeros(len(data))
    df = data.copy()
    for xi in np.unique(df[x].values):
        group = data[df[x] == xi]
        a = group[y].values
        b = sorted(np.arange(len(a)),key=lambda x:a[x],reverse=True)
        c = sorted(np.arange(len(a)),key=lambda x:b[x])
        order[data[x] == xi] = c
    df["order"] = order
    u, df["ind"] = np.unique(df[x].values, return_inverse=True)
    step = width/len(np.unique(df[groupby].values))
    for xi,grp in df.groupby(groupby):
        ax.bar(grp["ind"]-width/2.+grp["order"]*step+step/2.,
               grp[y],width=step, label=xi, **kwargs)
    ax.legend(title=groupby)
    ax.set_xticks(np.arange(len(u)))
    ax.set_xticklabels(u)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
fig, ax = plt.subplots()
sortedgroupedbar(ax, x="Class",y="Percentage", groupby="Age", data=df)
plt.show()
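The per-group ordering that the loop over `np.unique(df[x].values)` builds can also be sketched with pandas' grouped `rank`. This is an alternative way to compute the same "order" column, not the code used in the answer; it uses the answer's example data.

```python
import pandas as pd

df = pd.DataFrame({"Class": [2004]*6 + [2005]*7,
                   "Age": [3, 2, 4, 6, 5, 1, 2, 3, 6, 4, 5, 1, 9],
                   "Percentage": [50, 40, 30, 20, 10, 30, 20, 35, 40, 50, 45, 30, 15]})

# Position of each bar inside its Class group:
# 0 for the tallest bar, 1 for the next tallest, and so on.
# method="first" breaks ties by row order.
df["order"] = (df.groupby("Class")["Percentage"]
                 .rank(ascending=False, method="first")
                 .astype(int) - 1)
print(df)
```

Each row's "order" value is the slot it occupies within its Class group, which is what the bar offsets `grp["order"]*step` rely on.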

Seaborn, violin plot with one data per column

I would like to combine this violin plot http://seaborn.pydata.org/generated/seaborn.violinplot.html (fourth example with split=True) with this one http://seaborn.pydata.org/examples/elaborate_violinplot.html.
Actually, I have a DataFrame with a column Success (Yes or No) and several data columns. For example:
df = pd.DataFrame(
{"Success": 50 * ["Yes"] + 50 * ["No"],
"A": np.random.randint(1, 7, 100),
"B": np.random.randint(1, 7, 100)}
)
A B Success
0 6 4 Yes
1 6 2 Yes
2 1 1 Yes
3 1 2 Yes
.. .. .. ...
95 4 4 No
96 2 1 No
97 2 6 No
98 2 3 No
99 2 1 No
I would like to plot a violin plot for each data column. It works with :
import seaborn as sns
sns.violinplot(data=df[["A", "B"]], inner="quartile", bw=.15)
Now I would like to split each violin according to the Success column. However, using hue="Success" raises the error: Cannot use 'hue' without 'x' or 'y'. How can I split the violins according to the "Success" column?
If I understand your question correctly, you need to reshape your dataframe into long format:
df = pd.melt(df, value_vars=['A', 'B'], id_vars='Success')
sns.violinplot(x='variable', y='value', hue='Success', data=df)
plt.show()
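As a sketch of what the reshaping does, the snippet below prints the long-format frame and adds split=True, which the question asks for. The fixed random seed and the name `long_df` are only there for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
df = pd.DataFrame({"Success": 50 * ["Yes"] + 50 * ["No"],
                   "A": rng.integers(1, 7, 100),
                   "B": rng.integers(1, 7, 100)})

# Long format: one row per (Success, variable, value) triple,
# where "variable" holds the original column name ("A" or "B")
long_df = pd.melt(df, id_vars="Success", value_vars=["A", "B"])
print(long_df.head())

# split=True draws the "Yes" and "No" halves back to back in one violin
sns.violinplot(x="variable", y="value", hue="Success", data=long_df,
               split=True, inner="quartile")
plt.show()
```

split=True only works when the hue variable has exactly two levels, which is the case for a Yes/No column.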
I was able to adapt an example of a violin plot over a DataFrame like so:
df = pd.DataFrame({"Success": 50 * ["Yes"] + 50 * ["No"],
"A": np.random.randint(1, 7, 100),
"B": np.random.randint(1, 7, 100)})
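# pass x, y and hue by keyword; recent seaborn versions reject positional data arguments
sns.violinplot(x=df.A, y=df.B, hue=df.Success, inner="quartile", split=True)
# sns.plt is not available in recent seaborn versions; use matplotlib.pyplot (imported as plt)
plt.show()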
Clearly, it still needs some work: the A scale should be sized to fit a single half-violin, for example.

plotting a line graph on a count plot with a separate y-axis on the right side

I've created a dummy dataframe which is similar to the one I'm using.
The dataframe consists of Fare prices, Cabin-type, and Survival (1 is alive, 0 = dead).
The first plot creates many graphs via factorplot, with each graph representing the Cabin type. The x-axis is represented by the Fare price and Y-axis is just a count of the number of occurrences at that Fare price.
I then created another series by grouping on [Cabin, Fare] and taking the mean of Survived, which gives the survival rate at each Cabin and Fare price.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(dict(
Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40 ,30, 20, 30, 30],
Cabin=list('AAABCDBDCDDDC'),
Survived=[1, 0, 0, 0 ,0 ,1 ,1 ,0 ,1 ,1 , 0, 1, 1]
))
g =sns.factorplot(x='Fare', col='Cabin', kind='count', data=df,
col_wrap=3, size=3, aspect=1.3, palette='muted')
plt.show()
x =df.groupby(['Cabin','Fare']).Survived.mean()
What I would like to do is draw a line plot on top of the count graph above, so that the x-axis stays the same and each graph still represents one Cabin type. The y-axis of the line should be the survival mean computed in the groupby series x above; when printed, it is the third column below.
Cabin Fare
A 10 0.000000
20 1.000000
30 0.000000
B 20 1.000000
40 0.000000
C 30 1.000000
40 0.500000
D 10 1.000000
20 0.000000
30 0.666667
The y-axis for the line plot should be on the right side, and the range I would like is [0, .20, .40, .60, .80, 1.0, 1.2]
I looked through the seaborn docs for a while, but I couldn't figure out how to properly do this.
My desired output looks something like this image. I'm sorry my writing looks horrible, I don't know how to use paint well. So the ticks and numbers are on the right side of each graph. The line plot will be connected via dots at each x,y point. So for Cabin A, the first x,y point is (10,0) with 0 corresponding to the right y-axis. The second point is (20,1) and so on.
Data operations:
Compute frequency counts:
df_counts = pd.crosstab(df['Fare'], df['Cabin'])
Compute the means for each group and unstack the result to get a DataFrame. The NaNs are left as they are rather than replaced by zeros, so that they show up as breaks in the line plot; otherwise the lines would be continuous, which wouldn't make much sense here.
df_means = df.groupby(['Cabin','Fare']).Survived.mean().unstack().T
Prepare the x-axis labels as strings:
df_counts.index = df_counts.index.astype(str)
df_means.index = df_means.index.astype(str)
Plotting:
fig, ax = plt.subplots(1, 4, figsize=(10,4))
df_counts.plot.bar(ax=ax, ylim=(0,5), cmap=plt.cm.Spectral, subplots=True,
legend=None, rot=0)
# Use secondary y-axis(right side)
df_means.plot(ax=ax, secondary_y=True, marker='o', color='r', subplots=True,
legend=None, xlim=(0,4))
# Adjust spacing between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()
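The same idea can be sketched with plain matplotlib, using one twinx() axis per Cabin for the right-hand scale. This is an alternative under stated assumptions, not the answer's method: the names `fares`, `cabins`, `positions` and `right` are illustrative, and the right axis limit of 1.2 is taken from the range requested in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
))

fares = sorted(df['Fare'].unique())
cabins = sorted(df['Cabin'].unique())
positions = list(range(len(fares)))  # categorical x positions shared by bars and line

fig, axes = plt.subplots(1, len(cabins), figsize=(12, 3))
for ax, cabin in zip(axes, cabins):
    sub = df[df['Cabin'] == cabin]
    # Count per fare (0 where a fare does not occur in this cabin)
    counts = sub['Fare'].value_counts().reindex(fares, fill_value=0)
    # Survival rate per fare (NaN where a fare does not occur, which breaks the line)
    means = sub.groupby('Fare')['Survived'].mean().reindex(fares)

    ax.bar(positions, counts.values)
    ax.set_xticks(positions)
    ax.set_xticklabels(fares)
    ax.set_title(f"Cabin = {cabin}")
    ax.set_xlabel("Fare")
    ax.set_ylabel("count")

    # Second y-axis on the right side for the survival rate
    right = ax.twinx()
    right.plot(positions, means.values, marker='o', color='r')
    right.set_ylim(0, 1.2)
    right.set_ylabel("survival rate")

fig.tight_layout()
plt.show()
```

Isolated points, such as a fare whose neighbours are missing in that cabin, still appear because of the marker, even though no line segment connects them.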
