I would like to plot from the seaborn dataset 'tips'.
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
x1 = tips.loc[(df['time']=='lunch'), 'tip']
x2 = tips.loc[(df['time']=='dinner'),'tip']
x1.plot.kde(color='orange')
x2.plot.kde(color='blue')
plt.show()
I don't know exactly where it's wrong...
Thanks for the help.
Seaborn's sns.kdeplot() supports the hue argument to split the plot between different categories:
import seaborn as sns
import pandas as pd
tips = sns.load_dataset("tips")
sns.kdeplot(data=tips, x='tip', hue='time')
Of course your approach could work too, but there are several problems with your code:
What is df? Shouldn't that be tips?
The category names Lunch and Dinner must be capitalized, as in the data.
You're mixing different indexing techniques. It should be e.g. x1 = tips.tip[tips['time'] == 'Lunch'].
If you want to plot two KDEs in the same diagram, they should be scaled according to sample size. With my approach above, seaborn has done that automatically.
Because you are loading the data from seaborn's built-in datasets, keep in mind that column names and category values are case sensitive; replace them with the correct names.
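For reference, here is a minimal sketch of the asker's pandas approach with the points above applied. To keep it runnable offline it uses a small synthetic stand-in for `tips` (an assumption, not the real dataset), and `Series.plot.kde` needs SciPy installed. Note that each curve here is normalized on its own, so unlike the seaborn call the two densities are not scaled by sample size.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for sns.load_dataset("tips"); values are made up.
rng = np.random.default_rng(0)
tips = pd.DataFrame({
    "tip": rng.gamma(shape=4.0, scale=0.75, size=120),
    "time": ["Lunch"] * 40 + ["Dinner"] * 80,
})

# Filter on the correct frame name and the capitalized category values.
x1 = tips.tip[tips["time"] == "Lunch"]
x2 = tips.tip[tips["time"] == "Dinner"]

# Draw both KDE curves on the same axes.
fig, ax = plt.subplots()
x1.plot.kde(ax=ax, color="orange", label="Lunch")
x2.plot.kde(ax=ax, color="blue", label="Dinner")
ax.set_xlabel("tip")
ax.legend()
# plt.show()  # uncomment to display interactively
```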
You can plot the cumulative distribution between the time and density as follows:
sns.kdeplot(
data=tips, x="total_bill", hue="time",
cumulative=True, common_norm=False, common_grid=True,
)
I would like to have every column on my x-Axis and every value on my y-Axis.
With plotly and seaborn I could only find a way to plot the values against each other (column 1 on x vs column 2 on y).
So for my shown example following would be columns:
"Import Files", "Defining Variables", "Simulate Cutting Down",...
I would like to have all their values on the y-Axis.
So what I basically want is
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('timings.csv')
df.T.plot()
plt.show()
but with scatter. Matplotlib, Seaborn or Plotly is fine by me.
This would be an example for a csv File, since I can't upload a file:
Import Files,Defining Variables,Copy All Cutters,Simulate Cutting Down,Calculalte Circle, Simulate Cutting Circle, Calculate Unbalance,Write to CSV,Total Time
0.015956878662109375,0.0009989738464355469,0.022938966751098633,0.1466083526611328,0.0009968280792236328,48.128061294555664,0.0,0.014995098114013672,48.33055639266968
0.015958786010742188,0.0,0.024958133697509766,0.14598894119262695,0.0,49.22848296165466,0.0,0.004987239837646484,49.42037606239319
0.015943288803100586,0.0,0.036900997161865234,0.14561033248901367,0.0,46.80884146690369,0.0,0.004009723663330078,47.011305809020996
I only used the data you provided; as mentioned by others in the comments, barplot is more suited for this data but here it is with scatter plot:
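import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv('timings.csv')  # file name taken from the question
fig, ax = plt.subplots(figsize=(16,5))
# melt() turns every column into rows of (variable, value) pairs
sns.scatterplot(data=df.melt(), x='variable', y='value', ax=ax)
ax.set_xlabel('')
ax.set_ylabel('Time in seconds')
plt.show()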
I am trying to produce a stacked barplot showing some categories. However, the current dataframe makes it difficult to stack the categories together. Also, some years have no count, and these should be removed. Any ideas how to produce something like the pic below? I highly appreciate your time.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df=pd.read_csv(r"https://raw.githubusercontent.com/tuyenhavan/Course_Data/main/test_barplot.csv")
fig,ax=plt.subplots(figsize=(15,8))
ax.bar(df.Year,df.Number)
plt.xticks(np.arange(2000,2022),np.arange(2000,2022))
plt.xlabel("Year", fontsize=15)
plt.ylabel("Number", fontsize=15)
plt.xticks()
plt.show()
You can reshape the data such that the stacked categories are columns. Then you can use pandas plot.bar with stacked=True. reindex adds the missing years.
fig, ax=plt.subplots(figsize=(15,8))
df_stack = df.pivot_table(index="Year",
                          columns="Category",
                          values="Number",
                          aggfunc="sum")  # string name avoids a warning in newer pandas
df_stack = df_stack.reindex(np.arange(2000, 2022))
df_stack.plot.bar(stacked=True, ax=ax)
plt.xlabel("Year", fontsize=15)
plt.ylabel("Number", fontsize=15)
Agriculture appears twice in the legend because one of the labels has a trailing space and the other does not.
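One way to merge the two labels is to strip whitespace from the category column before pivoting. The sketch below uses a tiny made-up frame with the same column names as the question (the values are assumptions for illustration only):

```python
import pandas as pd

# Made-up rows; note the trailing space in the second "Agriculture ".
df = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2001],
    "Category": ["Agriculture", "Agriculture ", "Forest", "Agriculture"],
    "Number": [1, 2, 3, 4],
})

# Remove leading/trailing whitespace so both spellings become one category.
df["Category"] = df["Category"].str.strip()

df_stack = df.pivot_table(index="Year",
                          columns="Category",
                          values="Number",
                          aggfunc="sum")
print(df_stack)
```

After stripping, the two Agriculture rows for 2000 are summed into a single column.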
I have a count table as dataframe in Python and I want to plot my distribution as a boxplot. E.g.:
df=pandas.DataFrame.from_items([('Quality',[29,30,31,32,33,34,35,36,37,38,39,40]), ('Count', [3,38,512,2646,9523,23151,43140,69250,107597,179374,840596,38243])])
I 'solved' it by repeating each quality value by its count. But I don't think it's a good way, and my dataframe is getting very, very big.
In R it's a one-liner:
ggplot(df, aes(x=1,y=Quality,weight=Count)) + geom_boxplot()
This will output: [image: boxplot from R]
My aim is to compare the distribution of different groups and it should look like
Can Python solve it like this too?
What are you trying to look at here? The boxplot code below produces the following figure.
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the same column order
df=pd.DataFrame({'Quality': [29,30,31,32,33,34,35,36,37,38,39,40], 'Count': [3,38,512,2646,9523,23151,43140,69250,107597,179374,840596,38243]})
plt.figure()
df_box = df.boxplot(column='Quality', by='Count',return_type='axes')
If you want to look at your Quality distribution weighted by Count, you can try plotting a histogram:
plt.figure()
# 'normed' was removed from plt.hist in newer Matplotlib; 'density' is its replacement
df_hist = plt.hist(df.Quality, bins=10, range=None, density=False, weights=df.Count)
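If a box plot is still wanted without repeating rows, one possible sketch is to compute weighted quantiles by hand and pass them to Matplotlib's `Axes.bxp`, which draws a box from precomputed statistics. The helper `weighted_quantile` is a hypothetical name introduced here, and the whiskers are simply set to the minimum and maximum, which is a simplification and not the 1.5 IQR rule that ggplot uses.

```python
import numpy as np
import matplotlib.pyplot as plt

quality = np.array([29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40], dtype=float)
count = np.array([3, 38, 512, 2646, 9523, 23151, 43140, 69250,
                  107597, 179374, 840596, 38243], dtype=float)

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (hypothetical helper)."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum_share = np.cumsum(weights) / weights.sum()
    idx = min(np.searchsorted(cum_share, q), len(values) - 1)
    return values[idx]

# Box statistics in the dictionary format expected by Axes.bxp.
stats = {
    "label": "Quality",
    "q1": weighted_quantile(quality, count, 0.25),
    "med": weighted_quantile(quality, count, 0.50),
    "q3": weighted_quantile(quality, count, 0.75),
    "whislo": quality.min(),   # simplification: whiskers span the full range
    "whishi": quality.max(),
}

fig, ax = plt.subplots()
ax.bxp([stats], showfliers=False)
ax.set_ylabel("Quality")
# plt.show()  # uncomment to display interactively
```

To compare several groups, one such dictionary per group can be appended to the list passed to `bxp`.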
I'm trying to plot two series together in Pandas, from different dataframes.
Both have datetime indexes, so they can be plotted together:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot()
plt.plot()
Yields:
All fine, but I need the green graph to have its own scale. So I use the secondary_y argument:
amazon_prices.Close.plot()
data[amazon].BULL_MINUS_BEAR.resample("W").plot(secondary_y=True)
plt.plot()
This secondary_y creates a problem, as instead of having the desired graph, I have the following:
Any help with this is hugely appreciated.
(Less relevant notes: I'm (evidently) using Pandas, Matplotlib, and all this is in an IPython notebook)
EDIT:
I've since noticed that removing the resample("W") solves the issue. It is still a problem however as the non-resampled data is too noisy to be visible. Being able to plot sampled data with a secondary axis would be hugely helpful.
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
df.a = df.a*100
fig, ax1 = plt.subplots(1,1)
df.a.plot(ax=ax1, color='blue', label='a')
ax2 = ax1.twinx()
df.b.plot(ax=ax2, color='green', label='b')
ax1.set_ylabel('a')
ax2.set_ylabel('b')
ax1.legend(loc=3)
ax2.legend(loc=0)
plt.show()
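The same twin-axis idea should also cover the resampled case from the question. The sketch below is built on synthetic data (the series names and values are assumptions, not the asker's data). It aggregates the resampler with `.mean()` before plotting, since in current pandas `resample("W")` alone returns a resampler object and not a series, and it plots through Matplotlib directly so that the daily and weekly series share one plain datetime axis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily data standing in for the price and sentiment series.
rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=200, freq="D")
prices = pd.Series(300 + rng.normal(0, 1, 200).cumsum(), index=idx, name="Close")
sentiment = pd.Series(rng.normal(0, 1, 200), index=idx, name="BULL_MINUS_BEAR")

# Aggregate to weekly means to smooth the noisy series.
weekly = sentiment.resample("W").mean()

fig, ax1 = plt.subplots()
ax1.plot(prices.index, prices.values, color="blue", label="Close")
ax1.set_ylabel("Close")

# Second y-axis sharing the same x-axis.
ax2 = ax1.twinx()
ax2.plot(weekly.index, weekly.values, color="green", label="weekly mean")
ax2.set_ylabel("BULL_MINUS_BEAR (weekly mean)")
# plt.show()  # uncomment to display interactively
```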
I had the same issue, always getting a strange plot when I wanted a secondary_y.
I don't know why no-one mentioned this method in this post, but here's how I got it to work, using the same example as cphlewis:
import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import random
df = pd.DataFrame(random((15,2)),columns=['a','b'])
ax = df.plot(secondary_y=['b'])
plt.show()
Here's what it'll look like
I have a data set that has two independent variables and one dependent variable. I thought the best way to represent the dataset is by a checkerboard-type plot wherein the color of the cells represents a range of values, like this:
I can't seem to find code to do this automatically.
You need to use a plotting package to do this. For example, with matplotlib:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
X = 100*np.random.rand(6,6)
fig, ax = plt.subplots()
i = ax.imshow(X, cmap=cm.jet, interpolation='nearest')
fig.colorbar(i)
plt.show()
For those who come across this years later, as I did: what the original poster wants is a heatmap.
Matplotlib's documentation includes an example of this kind of plot.
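As a rough sketch of how data with two independent variables and one dependent variable could be turned into such a heatmap, the long-format table can be pivoted into a grid and passed to `imshow`. The column names `x`, `y`, `z` and all values below are assumptions made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up long-format data: x and y are independent, z is dependent.
df = pd.DataFrame({
    "x": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "y": [10, 20, 30, 10, 20, 30, 10, 20, 30],
    "z": [0.1, 0.4, 0.9, 0.2, 0.5, 0.7, 0.3, 0.6, 0.8],
})

# Rows become y values, columns become x values, cells hold z.
grid = df.pivot(index="y", columns="x", values="z")

fig, ax = plt.subplots()
im = ax.imshow(grid.to_numpy(), cmap="viridis",
               interpolation="nearest", origin="lower")

# Label the cells with the actual x and y values.
ax.set_xticks(range(grid.shape[1]))
ax.set_xticklabels(grid.columns)
ax.set_yticks(range(grid.shape[0]))
ax.set_yticklabels(grid.index)
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.colorbar(im, ax=ax, label="z")
# plt.show()  # uncomment to display interactively
```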