Python pandas groupby boxplots overlap

I'm puzzled by this Pandas/Matplotlib behaviour:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
series = pd.Series(np.arange(10))
classifier = lambda x: 'Odd' if x%2 else "Even"
grouped = series.groupby(classifier)
grouped.plot(kind='box')
plt.show()
How do I get the boxplots next to each other Pandas style i.e. with nice syntax? :)
(Pandas v. 0.16.2, Matplotlib v. 1.4.3)
Edit:
I know I could do this:
grouped = grouped.apply(pd.Series.to_frame)
but I would assume there's a cleaner way to do this?

So my general advice is to avoid plotting through pandas with the following exceptions:
Super quick 'n' dirty interactive exploration and inspection
Time series
Any other time, you'll want to use seaborn or roll your own matplotlib function. Since you're working with a dataframe, seaborn is your best bet, although labeled data support is very quickly coming down the pipe for matplotlib.
I'm also going to advise that you go ahead and create the dataframe with the classification stored inside of it.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
seaborn.set(style='ticks')
df = pd.DataFrame(np.arange(10), columns=['val'])
df['class'] = df['val'].apply(lambda x: 'Odd' if x%2 else "Even")
seaborn.boxplot(x='class', y='val', data=df, width=0.5)
seaborn.despine(offset=10, trim=True)
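If you would rather stay within pandas, one possible sketch uses `DataFrame.boxplot` with its `by` argument, which draws one box per group on a shared axis. The column names `val` and `class` follow the example above. The `Agg` backend line is only an assumption for running without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same data as above, with the classification stored as a column
df = pd.DataFrame({"val": np.arange(10)})
df["class"] = np.where(df["val"] % 2, "Odd", "Even")

fig, ax = plt.subplots()
# by="class" draws one box per class, side by side on the same axes
df.boxplot(column="val", by="class", ax=ax)

# Collect the tick labels to confirm one box per group was drawn
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

In an interactive session, calling `plt.show()` afterwards displays the figure.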

Related

Unable to change the tick frequency on my chart

I have seen many questions on changing the tick frequency on SO, and they helped when I was building a line chart, but I have been struggling now that it's a bar chart. Below is my code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar')
plt.show()
That is the output I see. How do I change the tick frequency?
(To be clearer: I want a tick every 5 units on the x axis.)
Using Pandas plot function you can do:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(1,10,(90,1)),columns=['Values'])
df.plot(kind='bar', xticks=np.arange(0,90,5))
Or better:
df.plot(kind='bar', xticks=list(df.index[0::5]))
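The same result can probably be reached on the matplotlib side, by thinning the ticks on the axes that `df.plot` returns. This is only a sketch: it assumes the default integer index, so that bar positions and index labels coincide, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (90, 1)), columns=["Values"])
ax = df.plot(kind="bar")

# Bars sit at positions 0..len(df)-1, so keep every fifth position
positions = np.arange(0, len(df), 5)
ax.set_xticks(positions)
# Label each remaining tick with the matching index value
ax.set_xticklabels([str(i) for i in df.index[positions]])
```

Setting the labels explicitly matters when the index is not a plain 0..n-1 range, because bar positions are always counted from zero.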

How to use sns.jointplot form specific data?

I want to use jointplot from seaborn. I am using the following code, but I do not understand the sns.jointplot command. I want g on the x axis and Years on the y axis.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('My.txt', sep="\t", header=None)
df.info()
print(df)
Years=df[0]
date=df[1]
Lat=df[2]
Lon=df[3]
D=df[4]
g=df[5]
sns.jointplot(data=df, x=5, y=0)
should work? Otherwise, you will need to provide a Minimal, Complete, and Verifiable example that includes a toy dataset (refer to How to make good reproducible pandas examples)

Grouped Histogram in Python

Is there a simple way of creating histograms for a continuous variable (mpg) that is filtered by a categorical variable (cyl=4,8)? So essentially I need two histograms for mpg grouped by cyl, one for cyl=4 and one for cyl=8.
Here is an example from a different dataset:
import numpy as np
import pandas as pd
import seaborn as sns
data = pd.DataFrame()
data[4] = np.random.normal(0,10,300)
data[8] = np.random.normal(20,11,300)
sns.distplot(data[4], color="skyblue")
sns.distplot(data[8], color="orange")
I just used a random sample.
I am being a little lazy here, but all you need is the seaborn package.
There are many more options available; you can read about them here [https://python-graph-gallery.com/]
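For data in the shape the question describes (one `mpg` column and one `cyl` column), a plain pandas and matplotlib sketch could filter on `cyl` and draw one histogram per group. This swaps seaborn for `Axes.hist`. The values are synthetic, and the means and spreads are arbitrary assumptions chosen only to produce visibly separate groups.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic long-form data: one row per car, with cyl and mpg columns
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cyl": np.repeat([4, 6, 8], 100),
    "mpg": np.concatenate([
        rng.normal(30, 4, 100),
        rng.normal(22, 3, 100),
        rng.normal(15, 3, 100),
    ]),
})

# Keep only the two categories of interest, then plot one histogram each
subset = df[df["cyl"].isin([4, 8])]
fig, ax = plt.subplots()
for cyl, group in subset.groupby("cyl"):
    ax.hist(group["mpg"], bins=15, alpha=0.6, label=f"cyl={cyl}")
ax.set_xlabel("mpg")
ax.legend()

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

Using `alpha` below 1 keeps both histograms visible where they overlap.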

Plotting a bar chart

I have an imported excel file in python and want to create a bar chart.
In the bar chart, I want the bars to be separated by profit, 0-10, 10-20, 20-30...
How do I do this?
This is one of the things I have tried:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df.plot(kind="bar",x="profit", y="people")
df[df.profit<=10]
plt.show()
and:
df[df.profit range (10,20)]
It is a bit difficult to help you better without a sample of your data, but I constructed a dataset randomly that should have the shape of yours, so that this solution can hopefully be useful to you:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# For random data
import random
%matplotlib inline
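# Random integers in [0, 100) for both columns
df = pd.DataFrame({
    'profit': [random.randrange(100) for _ in range(100)],
    'people': [random.randrange(100) for _ in range(100)],
})
display(df)
# Eleven edges (0, 10, ..., 100) give ten bins, so profits above 90 are not dropped
out = pd.cut(df['profit'], bins=[x*10 for x in range(11)], include_lowest=True)
# Count the rows that fall in each bin and draw the counts as bars
ax = out.value_counts(sort=False).plot.bar(rot=0, color="b", figsize=(14,4))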
plt.xlabel("Profit")
plt.ylabel("People")
plt.show()
I had a look at another question on here (Pandas bar plot with binned range) and there they explained how this issue can be solved.
Hope it helps :)
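If the goal is the total of the `people` column per profit range rather than the number of rows in each range, a possible variant groups by the binned profit and sums. This is a sketch on a tiny hand-made frame, not the asker's data. The bin edges 0, 10, ..., 100 are an assumption about the intended ranges.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Tiny hand-made example so the sums are easy to check by eye
df = pd.DataFrame({"profit": [5, 15, 15, 95], "people": [1, 2, 3, 4]})

# Assumed bin edges: 0, 10, ..., 100 (ten ranges)
bins = list(range(0, 101, 10))
profit_range = pd.cut(df["profit"], bins=bins, include_lowest=True)

# Sum 'people' within each range; observed=False keeps the empty ranges
totals = df.groupby(profit_range, observed=False)["people"].sum()

ax = totals.plot.bar(rot=0)
ax.set_xlabel("Profit")
ax.set_ylabel("People")
```

Keeping the empty ranges means the x axis always shows all ten bins, which makes charts from different datasets comparable.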

Make pandas plot() show xlabel and xvalues

I am using the standard pandas.df.plot() function to plot two columns in a dataframe. For some reason, the x-axis values and the xlabel are not visible! There seem to be no options to turn them on in the function either (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html).
Does someone know what is going on, and how to correct it?
import matplotlib.cm as cm
import pandas as pd
ax1 = df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn);
gives this:
This is a bug with Jupyter notebooks displaying pandas scatterplots that have a colorscale displayed while using Matplotlib as the plotting backend.
@june-skeeter has a solution in the answers that works. Alternatively, pass sharex=False to df.plot.scatter and you don't need to create subplots.
import matplotlib.cm as cm
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
df.plot.scatter(
    x='t',
    y='hlReference',
    c='STEP_STRENGTH',
    cmap=cm.autumn,
    sharex=False
)
See the discussion in this closed pandas issue, which references the above solution in a related SO answer.
Still an issue with pandas v1.1.0. You can track the issue here: https://github.com/pandas-dev/pandas/issues/36064
Create your axes instance first and then send it as an argument to the plot()
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
X = np.random.rand(10,3)
df = pd.DataFrame(X,columns=['t','hlReference', 'STEP_STRENGTH'])
fig,ax1=plt.subplots()
df.plot.scatter(x='t', y='hlReference', c='STEP_STRENGTH', cmap=cm.autumn,ax=ax1)
