I'm trying to do a line plot with one line per column. My dataset looks like this:
I'm using this code, but it's giving me the following error:
ValueError: Wrong number of items passed 3, placement implies 27
plot_x = 'bill__effective_due_date'
plot_y = ['RR_bucket1_perc', 'RR_bucket7_perc', 'RR_bucket14_perc']
ax = sns.pointplot(x=plot_x, y=plot_y, data=df_rollrates_plot, marker="o", palette=sns.color_palette("coolwarm"))
display(ax.figure)
Maybe it's a silly question but I'm new to python so I'm not sure how to do this. This is my expected output:
Thanks!!
You can plot the dataframe as follows (edit: I updated the code below to make bill__effective_due_date the index of the dataframe):
import seaborn as sns
import pandas as pd
rr1 = [20,10,2,10,2,5]
rr7 = [17,8,2,8,2,4]
rr14 = [12,5,2,5,2,3]
x = ['Nov-1','Nov2','Nov-3','Nov-4','Nov-5','Nov-6']
df_rollrates_plot = pd.DataFrame({'RR_bucket1_perc':rr1,
'RR_bucket7_perc':rr7,
'RR_bucket14_perc':rr14})
df_rollrates_plot.index = x
df_rollrates_plot.index.name = 'bill__effective_due_date'
sns.lineplot(data=df_rollrates_plot)
plt.grid()
Your data is in the wrong shape to take advantage of the hue parameter in seaborn's lineplot. You need to stack it so that the columns become categorical values.
import pandas as pd
import seaborn as sns
rr1 = [20,10,2,10,2,5]
rr7 = [17,8,2,8,2,4]
rr14 = [12,5,2,5,2,3]
x = ['Nov-1','Nov2','Nov-3','Nov-4','Nov-5','Nov-6']
df = pd.DataFrame({'bill_effective_due_date':x,
'RR_bucket1_perc':rr1,
'RR_bucket7_perc':rr7,
'RR_bucket14_perc':rr14})
# This is where you are reshaping your data to make it work like you want
df = df.set_index('bill_effective_due_date').stack().reset_index()
df.columns=['bill_effective_due_date','roll_rates_perc','roll_rates']
sns.lineplot(data=df, x='bill_effective_due_date',y='roll_rates', hue='roll_rates_perc', marker='o')
Related
I'm using the below code to get Segment and Year in x-axis and Final_Sales in y-axis but it is throwing me an error.
CODE
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
order = pd.read_excel("Sample.xls", sheet_name = "Orders")
order["Year"] = pd.DatetimeIndex(order["Order Date"]).year
result = order.groupby(["Year", "Segment"]).agg(Final_Sales=("Sales", sum)).reset_index()
bar = plt.bar(x = result["Segment","Year"], height = result["Final_Sales"])
ERROR
Can someone help me to correct my code to see the output as below.
Required Output
Try to add another pair of brackets - result[["Segment","Year"]],
What you tried to do is to retrieve column named - "Segment","Year",
But actually what are you trying to do is to retrieve a list of columns - ["Segment","Year"].
There are several problems with your code:
When using several columns to index a dataframe you want to pass a list of columns to [] (see the docs) as follows :
result[["Segment","Year"]]
From the figure you provide it looks like you want to use year as hue. matplotlib.barplot doesn't have a hue argument, you would have to build it manually as described here. Instead you can use seaborn library that you are already importing anyway (see https://seaborn.pydata.org/generated/seaborn.barplot.html):
sns.barplot(x = 'Segment', y = 'Final_Sales', hue = 'Year', data = result)
I have a pandas dataframe df for which I plot a multi-histogram as follow :
df.hist(bins=20)
This give me a result that look like this (Yes this exemple is ugly since there is only one data per histogram, sorry) :
I have a subplot for each numerical column of my dataframe.
Now I want all my histograms to have an X-axis between 0 and 1. I saw that the hist() function take a ax parameter, but I cannot manage to make it work.
How is it possible to do that ?
EDIT :
Here is a minmal example :
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
df.hist(bins=20)
plt.show()
Here is a solution that works, but for sure is not ideal:
import pandas as pd
import matplotlib.pyplot as plt
myArray = [(0,0,0,0,0.5,0,0,0,1),(0,0,0,0,0.5,0,0,0,1)]
myColumns = ['col1','col2','col3','co4','col5','col6','col7','col8','col9']
df = pd.DataFrame(myArray,columns=myColumns)
print(df)
ax = df.hist(bins=20)
for x in ax:
for y in x:
y.set_xlim(0,1)
plt.show()
I have a list of case and control samples along with the information about what characteristics are present or absent in each of them. A dataframe including the information can be generated by Pandas:
import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
I need to visualize this data as a dotplot/scatterplot in the way that both of the x and y axis to be categorical and presence/absence to be coded by different shapes. Something like following:
Patient| x x -
Control| - x -
__________________
GeneA GeneB GeneC
I am new to Matplotlib/seaborn and I can plot simple line plots and scatter plots. But searching online I could not find any instructions or plot similar to what I need here.
A quick way would be:
import pandas as pd
import matplotlib.pyplot as plt
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present'])
Thanks to #DEEPAK SURANA for adding labels to the colorbar.
I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.
# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
'Control':[False,True,False]}
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)
# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
for idx, gene in enumerate(df.columns[:-1]):
df_gene = df[[gene, 'level']]
cList = ['blue' if x == True else 'red' for x in df[gene]]
for inr_idx, lv in enumerate(df['level']):
ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
fig.tight_layout()
plt.yticks([i for i in range(len(df.index))], list(df.index))
plt.xticks([i for i in range(len(df.columns)-1)], list(df.columns[:-1]))
plt.show()
Something like this might work
import pandas as pd
import numpy as np
from matplotlib.ticker import FixedLocator
df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
plot = df.T.plot()
loc = FixedLocator([0,1,2])
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)
look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html
and https://matplotlib.org/api/ticker_api.html
I think you have to convert the boolean values to zeros and ones to make it work. Someting like df.astype(int)
I am trying to make a line plot in which every one of the elements from the index appears as an xtick.
import pandas as pd
ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
'17-05','17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1,3,5,2,3,6,4,7,8,5,3,8]
df = pd.DataFrame(data,index=ind)
df.plot(kind='line',x_compat=True)
however the resultant plot skips every second element of the index like so:
My code to call the plot includes the (x_compat=True) parameter which the documentation for pandas suggests should stop the auto tick configuratioin but it seems to have no effect.
You need to use ticker object on axis and then use that axis when plotting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ind = ['16-12', '17-01', '17-02', '17-03', '17-04',
'17-05','17-06', '17-07', '17-08', '17-09', '17-10', '17-11']
data = [1,3,5,2,3,6,4,7,8,5,3,8]
df = pd.DataFrame(data,index=ind)
ax2 = plt.axes()
ax2.xaxis.set_major_locator(ticker.MultipleLocator(1))
df.plot(kind='line', ax=ax2)
I am analysing a two-factorial (M)ANOVA; the sampling design consists of two categorical variables with two and three levels respectively and a response of dimension 4. Having done all the data parsing in python, I would like to continue plotting the data within python, too. (Rather than switch to R for the plotting.) My code, though, is not only very verbose, but the whole thing looks and feels like a really bad hack, too. My question: What is the pandas-matplotlib-way of producing the following plot? Out of interest: I would also be happy to see a solution that is not using seaborn.
The solution in R (the plotting is 2 lines of code):
# Data managment
library(reshape2)
# Plotting
library(ggplot2)
# Creating sample data
set.seed(12345)
dat = data.frame(matrix(rnorm(42*4, mean=c(10,3,5,1)), ncol=4, byrow=T))
names(dat) = c('Base', 'State23', 'State42', 'End')
gen = factor(sample(2, size=42, replace=T), labels=c('WT', 'HET'))
env = factor(sample(3, size=42, replace=T), labels=c('heavySmoker', 'casualSmoker', 'nonSmoker'))
dat$genotype = gen
dat$environment = env
# Plotting the data
dam = melt(dat, measure.vars=c('Base', 'State23', 'State42', 'End'))
p = ggplot(dam, aes(genotype, value, fill=environment)) + geom_boxplot() + facet_wrap(~variable, nrow=1)
ggsave('boxplot-r.png', plot=p)
This will produce the following plot:
My current solution in python:
# Numerics
import numpy as np
from numpy.random import randint
# Data managment
import pandas as pd
from pandas import DataFrame
from pandas import Series
# Plotting
import matplotlib
matplotlib.use('Qt4Agg')
import matplotlib.pyplot as pt
import seaborn as sns
# Creating sample data
np.random.seed(12345)
index = pd.Index(np.arange(42))
frame = DataFrame(np.random.randn(42,4) + np.array([10,3,5,1]), columns=['Base', 'State23', 'State42', 'End'], index=index)
genotype = Series(['WT', 'HET'], name='genotype', dtype='category')
environment = Series(['heavySmoker', 'casualSmoker', 'nonSmoker'], name='environment', dtype='category')
gen = genotype[np.random.randint(2, size=42)]
env = environment[np.random.randint(3, size=42)]
gen.index = frame.index
env.index = frame.index
frame['genotype'] = gen
frame['environment'] = env
# Plotting the data
response = ['Base', 'State23', 'State42', 'End']
fig, ax = pt.subplots(1, len(response), sharex=True, sharey=True)
for i, r in enumerate(response):
sns.boxplot(data=frame, x='genotype', y=r, hue='environment', ax=ax[i])
ax[i].set_ylabel('')
ax[i].set_title(r)
fig.subplots_adjust(wspace=0)
fig.savefig('boxplot-python.png')
This will produce the following plot:
As you probably agree, the code is not only verbose, but it also does not really do what I want. For example, I have no idea how to remove the multiple appearance of the legend, and the labelling on the x-axis is odd.
Edited to use factorplot instead of Facetgrid as suggested by mwaskom in the comments.
If you melt the dataframe, then you can take advantage of Seaborn's factorplot:
df = pd.melt(frame, id_vars=['genotype', 'environment'])
sns.factorplot(data=df, x='genotype', y='value',
hue='environment', col='variable',
kind='box', legend=True)
You can rename "value" and "variable" as you wish in the melt function.
Here is the resulting chart:
Previous answer with FacetGrid:
g = sns.FacetGrid(df, col="variable", size=4, aspect=.7)
g.map(sns.boxplot, "genotype", "value", "environment").add_legend(title="environment")