I am trying to plot values over time using seaborn's linear model plot (lmplot), but I get the error:
TypeError: invalid type promotion
I have read that it is not possible to plot pandas date objects, but that seems really strange given seaborn requires you pass a pandas DataFrame to the plots.
Below is a simple example. Does anyone know how I can get this to work?
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pd.DataFrame({'date':date, 'value': value})
df['date'] = pd.to_datetime(df['date'])
g = sns.lmplot(x="date", y="value", data=df, height=4, aspect=1.5)
I am trying to reproduce a plot I created in R with ggplot, which is why I want to use sns.lmplot.
You need to convert your dates to floats, then format the x-axis to reinterpret and format the floats into dates.
Here's how I would do this:
import pandas
import seaborn
from matplotlib import pyplot, dates
%matplotlib inline
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
    'date': pandas.to_datetime(date), # pandas dates
    'datenum': dates.datestr2num(date), # matplotlib dates
    'value': value
})
@pyplot.FuncFormatter
def fake_dates(x, pos):
    """ Custom formatter to turn floats into e.g., 2016-05-08"""
    return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot(x='datenum', y='value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(fake_dates)
# legible labels
ax.tick_params(labelrotation=45)
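The conversion in both directions can be sketched without seaborn. This is only an illustration of the idea in the answer above, not part of it: np.polyfit stands in for the regression, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates

dates = pd.to_datetime(['1975-12-03', '2008-08-20', '2011-03-16'])
values = np.array([1, 4, 5])

# datetime -> float: days since matplotlib's epoch
datenum = mdates.date2num(dates.to_pydatetime())

# a straight-line fit needs numeric x values, which is why the conversion matters
slope, intercept = np.polyfit(datenum, values, deg=1)

# float -> datetime: the same mapping the tick formatter applies to each tick
labels = [mdates.num2date(x).strftime('%Y-%m-%d') for x in datenum]
print(labels)
```

The floats are only an internal representation; the formatter turns them back into readable dates when the axis is drawn.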
I derived a solution from Paul H.'s answer for plotting timestamps in seaborn. I had to adapt it for my data because the original raised some backend error messages.
In my solution, I pass a matplotlib.ticker FuncFormatter to ax.xaxis.set_major_formatter. This FuncFormatter wraps the fake_dates function, so there is no need to put the @pyplot.FuncFormatter decorator on it beforehand.
Here is my solution:
import pandas
import seaborn
from matplotlib import pyplot, dates
from matplotlib.ticker import FuncFormatter
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
    'date': pandas.to_datetime(date), # pandas dates
    'datenum': dates.datestr2num(date), # matplotlib dates
    'value': value
})
def fake_dates(x, pos):
    """ Custom formatter to turn floats into e.g., 2016-05-08"""
    return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot(x='datenum', y='value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(FuncFormatter(fake_dates))
# legible labels
ax.tick_params(labelrotation=45)
fig.tight_layout()
I hope that works.
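Since the question asked for lmplot specifically, here is a minimal sketch of the same formatter trick applied to the FacetGrid that lmplot returns. It assumes a seaborn version where lmplot accepts height (0.9 or later); ci=None is my own choice to skip the bootstrap band for three points, and the axis labels are illustrative.

```python
import pandas as pd
import seaborn as sns
from matplotlib import dates as mdates
from matplotlib.ticker import FuncFormatter

date = ['1975-12-03', '2008-08-20', '2011-03-16']
df = pd.DataFrame({
    'datenum': mdates.datestr2num(date),  # matplotlib date floats
    'value': [1, 4, 5],
})

def fake_dates(x, pos):
    """Turn a matplotlib date float back into a YYYY-MM-DD label."""
    return mdates.num2date(x).strftime('%Y-%m-%d')

# lmplot is figure-level: it creates its own figure and returns a FacetGrid
g = sns.lmplot(x='datenum', y='value', data=df, height=4, aspect=1.5, ci=None)

# apply the formatter to every facet (there is only one here)
for ax in g.axes.flat:
    ax.xaxis.set_major_formatter(FuncFormatter(fake_dates))
    ax.tick_params(axis='x', labelrotation=45)
g.set_axis_labels('date', 'value')
```

Because lmplot builds its own figure, the axes are reached through g.axes rather than passed in with ax= as in the regplot version.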
Related
I am doing data segmentation on a fairly large dataset of 1200 rows and 17 columns. I want to plot the distribution of the entire data for country and population.
When I am trying to work with below code I am getting an error:
ValueError: could not convert string to float: 'Canada'
The code:
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
data = pd.read_excel("TestData.xls")
plt.figure(1, figsize=(15, 6))
n = 0
for x in ['Country', 'Product', 'Sales']:
    n += 1
    plt.subplot(1, 3, n)
    plt.subplots_adjust(hspace=0.5, wspace=0.5)
    sns.distplot(data[x], bins=20)
    plt.title('Displot of {}'.format(x))
plt.show()
seaborn.distplot() expects numeric data. If you pass strings as the first argument (such as a column of country names), each string must be in the format of an integer or float; otherwise the conversion fails with the error above.
You can try:
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
data = pd.read_excel("TestData.xls")
plt.figure(1, figsize=(15, 6))
n = 0
for x in ['Country', 'Product', 'Sales']:
    n += 1
    plt.subplot(1, 3, n)
    plt.subplots_adjust(hspace=0.5, wspace=0.5)
    a = data[x]
    # a is a whole column (a pandas Series), so check its dtype rather than
    # calling str methods such as isdigit() on it
    if pd.api.types.is_numeric_dtype(a):
        sns.distplot(a, bins=20)
    else:
        print(f"Column {x!r} is not numeric.")
    plt.title('Displot of {}'.format(x))
plt.show()
But do note from the linked documentation:
Warning
This function is deprecated and will be removed in a future version. Please adapt your code to use one of two new functions:
displot(), a figure-level function with a similar flexibility over the kind of plot to draw
histplot(), an axes-level function for plotting histograms, including with kernel density smoothing
I'm using Seaborn in a Jupyter notebook to plot histograms like this:
import numpy as np
import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('CTG.csv', sep=',')
sns.distplot(df['LBE'])
I have a list of columns whose values I want to plot as histograms, so I tried plotting one histogram per column:
continous = ['b', 'e', 'LBE', 'LB', 'AC']
for column in continous:
    sns.distplot(df[column])
And I get this result - only one plot with (presumably) all histograms:
My desired result is multiple histograms that look like this (one for each variable):
How can I do this?
Insert plt.figure() before each call to sns.distplot().
Here's an example with plt.figure():
Here's an example without plt.figure():
Complete code:
# imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [6, 2]
%matplotlib inline
# sample time series data
np.random.seed(123)
df = pd.DataFrame(np.random.randint(-10,12,size=(300, 4)), columns=list('ABCD'))
datelist = pd.date_range('2014-07-01', periods=300).tolist()
df['dates'] = datelist
df = df.set_index(['dates'])
df.index = pd.to_datetime(df.index)
df.iloc[0]=0
df=df.cumsum()
# create distplots
for column in df.columns:
    plt.figure() # <==================== here!
    sns.distplot(df[column])
distplot has been deprecated since seaborn 0.11.0. You can, however, use sns.histplot() to plot histograms for every column of the dataframe (numerical features only) in the following way:
fig, axes = plt.subplots(2,5, figsize=(15, 5))
ax = axes.flatten()
for i, col in enumerate(df.columns):
    sns.histplot(df[col], ax=ax[i]) # histogram call
    ax[i].set_title(col)
    # remove scientific notation for both axes
    ax[i].ticklabel_format(style='plain', axis='both')
fig.tight_layout(w_pad=6, h_pad=4) # change padding
plt.show()
If you specifically want to estimate the probability density function of a continuous random variable using kernel density estimation (mimicking the default behavior of sns.distplot()), add kde=True to the sns.histplot() call, and curves will overlay the histograms.
Also works when looping with plt.show() inside:
for column in df.columns:
    sns.distplot(df[column])
    plt.show()
I have a pandas dataframe with a date column (RankingDate).
This date field is initially a string loaded from a csv in the format "2006-11-03".
After running df["RankingDate"]=pd.to_datetime(df["RankingDate"]), the data type becomes '<M8[ns]'
I then plot multiple lines over time using seaborn:
f, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(data, time='RankingDate', unit='Dummy', condition='Player', value='Points', ax=ax)
However, this gives me a chart where the date axis is labelled in nanoseconds (i.e. values on the order of 1e18), instead of a nice date format like "2006-11-03".
How can I get seaborn to display a date instead of nanoseconds?
Example code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
RankingDate = ['2015-03-02','2015-03-03','2015-03-04','2015-03-05','2015-03-06']
Player = ['Player1','Player2','Player2','Player1','Player1']
Points = np.random.randn(5)
df = pd.DataFrame({'RankingDate': RankingDate , 'Player': Player, 'Points': Points})
df["RankingDate"]=pd.to_datetime(df["RankingDate"])
df["Dummy"]=0
f, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(df, time='RankingDate', unit='Dummy', condition='Player', value='Points', ax=ax)
Below I have the following script which creates a simple time series plot:
%matplotlib inline
import datetime
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
df = []
start_date = datetime.datetime(2015, 7, 1)
for i in range(10):
    for j in [1,2]:
        unit = 'Ones' if j == 1 else 'Twos'
        date = start_date + datetime.timedelta(days=i)
        df.append({
            'Date': date.strftime('%Y%m%d'),
            'Value': i * j,
            'Unit': unit
        })
df = pd.DataFrame(df)
sns.tsplot(df, time='Date', value='Value', unit='Unit', ax=ax)
fig.autofmt_xdate()
And the result of this is the following:
As you can see the x-axis has strange numbers for the datetimes, and not the usual "nice" representations that come with matplotlib and other plotting utilities. I've tried many things, re-formatting the data but it never comes out clean. Anyone know a way around?
Matplotlib represents dates as floating-point numbers (in days), so unless you (or pandas or seaborn) tell it that your values represent dates, it will not format the ticks as dates. I'm not a seaborn expert, but it looks like it (or pandas) does convert the datetime objects to matplotlib dates, but then does not assign proper locators and formatters to the axes. This is why you get these strange numbers, which are in fact just the number of days since matplotlib's epoch (0001-01-01 in older releases; 1970-01-01 by default since matplotlib 3.3). So you'll have to take care of the ticks manually (which, in most cases, is better anyway as it gives you more control).
So you'll have to assign a date locator, which decides where to put ticks, and a date formatter, which will then format the strings for the tick labels.
import datetime
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# build up the data
df = []
start_date = datetime.datetime(2015, 7, 1)
for i in range(10):
    for j in [1,2]:
        unit = 'Ones' if j == 1 else 'Twos'
        date = start_date + datetime.timedelta(days=i)
        # I believe it makes more sense to directly convert the datetime to a
        # "matplotlib"-date (float), instead of creating strings and then let
        # pandas parse the string again
        df.append({
            'Date': mdates.date2num(date),
            'Value': i * j,
            'Unit': unit
        })
df = pd.DataFrame(df)
# build the figure
fig, ax = plt.subplots()
sns.tsplot(df, time='Date', value='Value', unit='Unit', ax=ax)
# assign locator and formatter for the xaxis ticks.
ax.xaxis.set_major_locator(mdates.AutoDateLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y.%m.%d'))
# put the labels at 45deg since they tend to be too long
fig.autofmt_xdate()
plt.show()
Result:
For me, @hitzg's answer results in "OverflowError: signed integer is greater than maximum" in the depths of DateFormatter.
Looking at my dataframe, my indices are datetime64, not datetime. Pandas converts these nicely though. The following works great for me:
import matplotlib as mpl
def myFormatter(x, pos):
    return pd.to_datetime(x)
[ . . . ]
ax.xaxis.set_major_formatter(mpl.ticker.FuncFormatter(myFormatter))
Here is a potentially inelegant solution, but it's the only one I have ... Hope it helps!
g = sns.pointplot(x, y, data=df, ci=False);
unique_dates = sorted(list(df['Date'].drop_duplicates()))
date_ticks = range(0, len(unique_dates), 5)
g.set_xticks(date_ticks);
g.set_xticklabels([unique_dates[i].strftime('%d %b') for i in date_ticks], rotation='vertical');
g.set_xlabel('Date');
Let me know if you see any issues!
def myFormatter(x, pos):
    return pd.to_datetime(x).strftime('%Y%m%d')
ax.xaxis.set_major_formatter(mpl.ticker.FuncFormatter(myFormatter))
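Which formatter works depends on what the axis values actually are. The sketch below contrasts the two conventions that appear in these answers: pandas datetime64 values are nanoseconds since 1970-01-01, while matplotlib date floats are days. The OverflowError above is plausibly what happens when a nanosecond value is read as a number of days. The function names are made up for the example.

```python
import pandas as pd
import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter

ts = pd.Timestamp('2015-03-02')

as_ns = float(ts.value)                          # pandas: nanoseconds since 1970-01-01
as_days = mdates.date2num(ts.to_pydatetime())    # matplotlib: days since its epoch

def ns_formatter(x, pos):
    """For axes whose values are nanosecond timestamps."""
    return pd.to_datetime(int(x), unit='ns').strftime('%Y-%m-%d')

def days_formatter(x, pos):
    """For axes whose values are matplotlib date floats."""
    return mdates.num2date(x).strftime('%Y-%m-%d')

# calling a FuncFormatter the way the axis does: formatter(value, position)
label_from_ns = FuncFormatter(ns_formatter)(as_ns, 0)
label_from_days = FuncFormatter(days_formatter)(as_days, 0)
print(label_from_ns, label_from_days)
```

Both formatters produce the same label; the point is to pick the one that matches the units on the axis.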
I have data of the following format:
import pandas as ps
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],\
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],\
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=ps.DataFrame(table,columns=['time','data','type'])
I would like to plot data as a function of time connected as a line, but I would like each line to be a separate color for unique types. In this example, the result would be three lines: a data(time) line for each type a, b, and c. Any guidance is appreciated.
I have been unable to produce a line with this data: pandas.scatter will produce a plot, while pandas.plot will not. I have been messing with loops to produce a plot for each type, but I have not found a straightforward way to do this. My data typically has an unknown number of unique 'type's. Do pandas and/or matplotlib have a way to create this type of plot?
Pandas plotting capabilities will allow you to do this if everything is indexed properly. However, sometimes it's easier to just use matplotlib directly:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
groups = df.groupby('type')
fig, ax = plt.subplots()
for name, group in groups:
    ax.plot(group['time'], group['data'], label=name)
ax.legend(loc='best')
plt.show()
If you'd prefer to use the pandas plotting wrapper, you'll need to override the legend labels:
import pandas as pd
import matplotlib.pyplot as plt
table={'time':[1,2,3,4,5,1,2,3,4,5,1,2,3,4,5],
'data':[1,1,2,2,2,1,2,3,4,5,1,2,2,2,3],
'type':['a','a','a','a','a','b','b','b','b','b','c','c','c','c','c']}
df=pd.DataFrame(table, columns=['time','data','type'])
df.index = df['time']
groups = df[['data', 'type']].groupby('type')
fig, ax = plt.subplots()
groups.plot(ax=ax, legend=False)
names = [item[0] for item in groups]
ax.legend(ax.lines, names, loc='best')
plt.show()
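Another way to get everything "indexed properly" for the pandas wrapper is to reshape the table first. This is an alternative sketch, not taken from the answer above: pivot turns each type into its own column, and DataFrame.plot then draws one labelled line per column. The name wide is made up for the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

table = {'time': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
         'data': [1, 1, 2, 2, 2, 1, 2, 3, 4, 5, 1, 2, 2, 2, 3],
         'type': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c']}
df = pd.DataFrame(table, columns=['time', 'data', 'type'])

# long -> wide: rows indexed by time, one column per type
wide = df.pivot(index='time', columns='type', values='data')

# one line per column; the legend labels come from the column names
ax = wide.plot()
ax.set_ylabel('data')
```

This works for any number of unique types, as long as each (time, type) pair occurs only once.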
Just to throw in the seaborn solution.
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.FacetGrid(df, hue="type", height=5)
g.map(plt.plot, "time", "data")
g.add_legend()