I have a large Pandas DataFrame that contains three columns: two different dates and one of measurement (floats). I want to plot a 3d figure (eg. trisurf, plot_surface, etc) where the dates are on the x and y axes and measurement is on the z axis. I tried using the suggestions in this post, but it isn't helpful.
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.dates as dates
import datetime
import matplotlib.ticker as ticker
import pandas as pd
df = pd.DataFrame()
df['date1'] = pd.date_range(start='2018-01-05', end='2018-04-15', freq='1D')
df['date2'] = pd.date_range(start='2018-01-19', end='2018-04-29', freq='1D')
df['mydata'] = np.sin(2*np.linspace(-1,1,len(df))) # dummy variable
def format_date(x, pos=None):
return dates.num2date(x).strftime('%Y-%m-%d') #use FuncFormatter to format dates
plt.figure()
ax = Axes3D(fig,rect=[0,0.1,1,1]) #make room for date labels
ax.plot_trisurf(df.date1, df.date2, df.mydata, cmap=cm.coolwarm, linewidth=0.2)
ax.w_xaxis.set_major_locator(ticker.FixedLocator(some_dates)) # I want all the dates on my xaxis
ax.w_xaxis.set_major_formatter(ticker.FuncFormatter(format_date))
ax.w_yaxis.set_major_locator(ticker.FixedLocator(some_dates))
ax.w_yaxis.set_major_formatter(ticker.FuncFormatter(format_date))
for tl in ax.w_xaxis.get_ticklabels(): # re-create what autofmt_xdate but with w_xaxis
tl.set_ha('right')
tl.set_rotation(30)
for tl in ax.w_yaxis.get_ticklabels():
tl.set_ha('right')
#tl.set_rotation(30)
ax.set_xlabel('date1')
ax.set_ylabel('date2')
ax.set_zlabel('mydata')
plt.show()
I keep getting the error RuntimeError: Error in qhull Delaunay triangulation calculation: singular input data (exitcode=2); use python verbose option (-v) to see original qhull error. What am I doing wrong and how do I resolve it?
I 'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have an array of columns with values that I want to plot histogram for and I tried plotting a histogram for each of them:
continous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continous:
sns.distplot(df[column])
And I get this result - only one plot with (presumably) all histograms:
My desired result is multiple histograms that looks like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot() .
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range(pd.datetime(2014, 7, 1).strftime('%Y-%m-%d'), periods=300).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
plt.figure() # <==================== here!
sns.distplot(df[column])
Distplot has since been deprecated in seaborn versions >= 0.14.0. You can, however, use sns.histplot() to plot histogram distributions of the entire dataframe (numerical features only) in the following way:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
sns.histplot(df[col], ax=ax[i]) # histogram call
ax[i].set_title(col)
# remove scientific notation for both axes
ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
If, you specifically want a way to estimate the probability density function of a continuous random variable using the Kernel Density Function (mimicing the default behavior of sns.distplot()), then inside the sns.histplot() function call, add kde=True, and you will have curves overlaying the histograms.
Also works when looping with plt.show() inside:
for column in df.columns:
sns.distplot(df[column])
plt.show()
I am trying to do a plot of values over time using seaborn linear model plot but I get the error
TypeError: invalid type promotion
I have read that it is not possible to plot pandas date objects, but that seems really strange given seaborn requires you pass a pandas DataFrame to the plots.
Below is a simple example. Does anyone know how I can get this to work?
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pd.DataFrame({'date':date, 'value': value})
df['date'] = pd.to_datetime(df['date'])
g = sns.lmplot(x="date", y="value", data=df, size = 4, aspect = 1.5)
I am trying to do a plot like this one I created in r using ggplot hence why I want to use sns.lmplot
You need to convert your dates to floats, then format the x-axis to reinterpret and format the floats into dates.
Here's how I would do this:
import pandas
import seaborn
from matplotlib import pyplot, dates
%matplotlib inline
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
'date': pandas.to_datetime(date), # pandas dates
'datenum': dates.datestr2num(date), # maptlotlib dates
'value': value
})
#pyplot.FuncFormatter
def fake_dates(x, pos):
""" Custom formater to turn floats into e.g., 2016-05-08"""
return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot('datenum', 'value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(fake_dates)
# legible labels
ax.tick_params(labelrotation=45)
I have found a derived solution from Paul H. for plotting timestamp in seaborn. I had to apply it over my data due to some backend error messages that was returning.
In my solution, I added a matplotlib.ticker FuncFormatter over the ax.xaxis.set_major_formatter. This FuncFormatter wraps the fake_dates function. This way, one doesn't need to insert the #pyplot.FuncFormatter beforehand.
Here is my solution:
import pandas
import seaborn
from matplotlib import pyplot, dates
from matplotlib.ticker import FuncFormatter
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
'date': pandas.to_datetime(date), # pandas dates
'datenum': dates.datestr2num(date), # maptlotlib dates
'value': value
})
def fake_dates(x, pos):
""" Custom formater to turn floats into e.g., 2016-05-08"""
return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot('datenum', 'value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(FuncFormatter(fake_dates))
# legible labels
ax.tick_params(labelrotation=45)
fig.tight_layout()
I hope that works.
I am creating multiple categorical plots for data frame df with a for loop:
object_bol = df.dtypes == 'object'
for catplot in df.dtypes[object_bol].index:
sns.countplot(y=catplot,data=df)
plt.show()
Output is all the plots sequenced one after the other, how do i assign this to a grid with n columns and m rows (n & m vary depending on number of objects in data frame)?
You would want to extend the example from How do I plot two countplot graphs side by side in seaborn? to more subplots.
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame(np.random.choice(list("abcd"), size=(100,20), p=[.4,.3,.2,.1]))
fig, axes =plt.subplots(5,4, figsize=(10,10), sharex=True)
axes = axes.flatten()
object_bol = df.dtypes == 'object'
for ax, catplot in zip(axes, df.dtypes[object_bol].index):
sns.countplot(y=catplot, data=df, ax=ax, order=np.unique(df.values))
plt.tight_layout()
plt.show()
You would get something similar without seaborn directly from pandas:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame(np.random.choice(list("abcd"), size=(100,20), p=[.4,.3,.2,.1]))
df.apply(pd.value_counts).plot(kind="barh", subplots=True, layout=(4,5), legend=False)
plt.tight_layout()
plt.show()
I have data of the following format:
import pandas as ps
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],\
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],\
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=ps.DataFrame(table,columns=['time','data','type']
I would like to plot data as a function of time connected as a line, but I would like each line to be a separate color for unique types. In this example, the result would be three lines: a data(time) line for each type a, b, and, c. Any guidance is appreciated.
I have been unable to produce a line with this data--pandas.scatter will produce a plot, while pandas.plot will not. I have been messing with loops to produce a plot for each type, but I have not found a straight forward way to do this. My data typically has an unknown number of unique 'type's. Does pandas and/or matpltlib have a way to create this type of plot?
Pandas plotting capabilities will allow you to do this if everything is indexed properly. However, sometimes it's easier to just use matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
groups = df.groupby('type')
fig, ax = plt.subplots()
for name, group in groups:
ax.plot(group['time'], group['data'], label=name)
ax.legend(loc='best')
plt.show()
If you'd prefer to use the pandas plotting wrapper, you'll need to override the legend labels:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
df.index = df['time']
groups = df[['data', 'type']].groupby('type')
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
Just to throw in the seaborn solution.
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, hue="type", size=5)
g.map(plt.plot, "time", "data")
g.add_legend()