I have a pandas dataframe with a date column (RankingDate).
This date field is initially a string loaded from a CSV in the format "2006-11-03".
After running df["RankingDate"]=pd.to_datetime(df["RankingDate"]), the data type becomes '<M8[ns]'
I then plot multiple lines over time using seaborn:
f, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(data, time='RankingDate', unit='Dummy', condition='Player', value='Points', ax=ax)
However, this gives me a chart where the date axis is labelled in nanoseconds (values on the order of 1e18) instead of a readable date format like "2006-11-03".
How can I get seaborn to display a date instead of nanoseconds?
Example code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
RankingDate = ['2015-03-02','2015-03-03','2015-03-04','2015-03-05','2015-03-06']
Player = ['Player1','Player2','Player2','Player1','Player1']
Points = np.random.randn(5)
df = pd.DataFrame({'RankingDate': RankingDate , 'Player': Player, 'Points': Points})
df["RankingDate"]=pd.to_datetime(df["RankingDate"])
df["Dummy"]=0
f, ax = plt.subplots(figsize=(16, 8))
sns.tsplot(df, time='RankingDate', unit='Dummy', condition='Player', value='Points', ax=ax)
I'm attempting to plot a pandas stacked bar plot with the x axis showing Months on the major ticks, or years on Jan 1, ideally with small ticks identifying the weeks but with no label.
I have a dataset with a datetime index that was then grouped by week and then I plot that dataset. If I don't attempt to control the settings the dates show up but are vertical and don't fit. So I used the set formatter to fix that but then the axes changed to 1970 as if following an index number instead of date. If I replace the pandas plotting with a regular bar chart, the "ConciseDateFormatter" works as desired/expected. But I wanted to use stacked with pandas as creating a regular stacked bar chart is a pain. I don't understand why I can't control pandas axes like I can a regular plot.
One thing I notice is that the index is shown as an object. If I convert it with to_datetime(), it adds 00:00 times that I don't want on the axes or in my data.
My data is a simple set of weekly random data:
date A B C D
3/20/2022 1.540765154 0.504616419 1.543679189 2.952934623
3/27/2022 1.781135128 4.594966635 4.799026389 3.499803401
4/3/2022 0.254059207 0.69835265 0.323039575 1.628138491
4/10/2022 3.112760301 0.287056897 4.372938373 0.130817579
4/17/2022 0.497273044 0.913246096 1.296612207 1.250610278
4/24/2022 1.370087689 3.124985109 4.322253295 4.49571603
5/1/2022 3.952629538 3.976896924 1.679311114 1.265443147
5/8/2022 3.470328161 1.266161308 3.990502436 1.364929959
5/15/2022 2.296588269 4.639761391 0.04685036 1.438471692
5/22/2022 3.443458637 2.66592719 0.968656871 2.349325343
5/29/2022 1.820278464 4.794211675 2.435710815 2.156110694
6/5/2022 4.328825266 0.049132356 1.842839099 3.665701299
6/12/2022 0.184631564 0.412976815 4.787477069 4.80052839
6/19/2022 4.846734385 3.471474741 1.808871854 2.440013553
6/26/2022 1.612870444 0.70191857 3.55713114 1.438699834
7/3/2022 2.896859156 4.025996887 0.209608767 4.174881655
Code:
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
maxval = 200
values = ['A','B','C','D']
cum = [v + '_CUM' for v in values]
df = pd.read_csv('test_data.csv', index_col='date', parse_dates=True,
                 infer_datetime_format=True)
#df.index = pd.to_datetime(df.index.date).strftime("%b %d")
df = df.join(df.cumsum(), rsuffix="_CUM")
df = df.join(df[cum]/maxval * 100, rsuffix="_LIFE")
fig, axs = plt.subplots(nrows=2, ncols=1, sharex=False, squeeze=False,
                        facecolor='white')
axs = axs.flatten()
ax = axs[0]
df[values].plot.bar(ax=ax, grid=True, stacked=True, legend=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(
    mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))
# ax.xaxis.set_tick_params(rotation = 0)
plt.show(block=False)
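A likely explanation for the 1970 labels: pandas bar plots place the bars at integer positions 0, 1, 2, ... and only label them with the index values, so a date locator reads those small integers as days just after the epoch. The sketch below sidesteps this by stacking with matplotlib's own ax.bar and a running bottom array. The random data is a hypothetical stand-in for test_data.csv, and the width of 5 days is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for test_data.csv: 16 weekly rows, columns A-D
rng = np.random.default_rng(0)
idx = pd.date_range("2022-03-20", periods=16, freq="7D")
df = pd.DataFrame(rng.uniform(0, 5, size=(16, 4)), index=idx, columns=list("ABCD"))

fig, ax = plt.subplots(figsize=(10, 4))
bottom = np.zeros(len(df))
for col in df.columns:
    heights = df[col].to_numpy()
    # Real dates on the x axis; width is given in days
    ax.bar(df.index, heights, width=5, bottom=bottom, label=col)
    bottom = bottom + heights  # the next column stacks on top of this one

# Month labels on major ticks, unlabeled weekly minor ticks
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_minor_locator(mdates.WeekdayLocator(byweekday=mdates.SU))
ax.xaxis.set_major_formatter(
    mdates.ConciseDateFormatter(ax.xaxis.get_major_locator()))
ax.legend()
fig.tight_layout()
```

The loop is only a few lines longer than the pandas call, and because the x values are real dates, the locators and formatter behave as they do for a regular bar chart.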
I want to plot my dataframe (df) as a bar plot based on the time columns, where each bar represents the value counts() for each letter that appears in the column.
Expected output: (figure not shown)
date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N
When I select individual time columns, I can do as below
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
df = pd.read_csv('df.csv')
df = df['04:00:00'].value_counts()
df.plot(kind='bar')
plt.show()
How can I plot all the columns on the same bar plot, as shown in the expected output?
One possible solution is:
pd.DataFrame({t: df[t].value_counts() for t in df.columns if t != "date"}).T.plot.bar()
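To make the one-liner easier to follow, here is a self-contained sketch that runs it step by step on the sample data from the question. The fillna(0) step and the axis labels are additions for illustration; letters that never occur in a column would otherwise show up as NaN in the count table.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count the letters in each time column; rows = time columns, columns = letters
time_cols = [c for c in df.columns if c != "date"]
counts = pd.DataFrame({t: df[t].value_counts() for t in time_cols}).T
counts = counts.fillna(0).astype(int)  # letters absent from a column count as 0

# One group of bars per time column, one bar per letter
ax = counts.plot.bar(rot=0)
ax.set_xlabel("time")
ax.set_ylabel("count")
```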
Here is an approach via seaborn's catplot:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from io import StringIO
df_str = '''date,00:00:00,01:00:00,02:00:00,03:00:00,04:00:00
2002-02-01,Y,Y,U,N,N
2002-02-02,U,N,N,N,N
2002-02-03,N,N,N,N,N
2002-02-04,N,N,N,N,N
2002-02-05,N,N,N,N,N'''
df = pd.read_csv(StringIO(df_str))
df_long = df.set_index('date').melt(var_name='hour', value_name='kind')
g = sns.catplot(kind='count', data=df_long, x='kind', palette='mako',
                col='hour', col_wrap=5, height=3, aspect=0.5)
for ax in g.axes.flat:
    ax.set_xlabel(ax.get_title())  # use the title as xlabel
    ax.grid(True, axis='y')
    ax.set_title('')
    if len(ax.get_ylabel()) == 0:
        sns.despine(ax=ax, left=True)  # remove left axis for interior subplots
        ax.tick_params(axis='y', size=0)
plt.tight_layout()
plt.show()
I have around 4475 rows of csv data like below:
,Time,Values,Size
0,1900-01-01 23:11:30.368,2,
1,1900-01-01 23:11:30.372,2,
2,1900-01-01 23:11:30.372,2,
3,1900-01-01 23:11:30.372,2,
4,1900-01-01 23:11:30.376,2,
5,1900-01-01 23:11:30.380,,
6,1900-01-01 23:11:30.380,,
7,1900-01-01 23:11:30.380,,
8,1900-01-01 23:11:30.380,,321
9,1900-01-01 23:11:30.380,,111
.
.
4474,1900-01-01 23:11:32.588,,
When I try to create a simple seaborn lineplot with the code below, it draws a continuous line, but my 'Values' column has many empty/NaN values that should show up as gaps in the chart. How can I do that?
from datetime import datetime
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.read_csv("Data.csv")
sns.set(rc={'figure.figsize':(13,4)})
ax =sns.lineplot(x="Time", y="Values", data=df)
ax.set(xlabel='Time', ylabel='Values')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
As reported in this answer:
I've looked at the source code and it looks like lineplot drops nans from the DataFrame before plotting. So unfortunately it's not possible to do it properly.
So, the easiest way to do it is to use matplotlib in place of seaborn.
In the code below I generate a dataframe like yours, with 20% of the values in the 'Values' column missing, and I use matplotlib to draw the plot:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({'Time': pd.date_range(start = '1900-01-01 23:11:30', end = '1900-01-01 23:11:30.1', freq = 'ms')})  # 'ms' = millisecond steps (the older alias 'L' is deprecated in recent pandas)
df['Values'] = np.random.randint(low = 2, high = 10, size = len(df))
df['Values'] = df['Values'].mask(np.random.random(df['Values'].shape) < 0.2)
fig, ax = plt.subplots(figsize = (13, 4))
ax.plot(df['Time'], df['Values'])
ax.set(xlabel = 'Time', ylabel = 'Values')
plt.xticks(rotation = 90)
plt.tight_layout()
plt.show()
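The same idea applied to the CSV layout from the question, as a rough sketch: the first ten rows are pasted inline as a stand-in for Data.csv, and the Time column is parsed into datetimes on load. Whether parse_dates suits the full file is an assumption.

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First ten rows from the question, standing in for Data.csv
csv_text = """,Time,Values,Size
0,1900-01-01 23:11:30.368,2,
1,1900-01-01 23:11:30.372,2,
2,1900-01-01 23:11:30.372,2,
3,1900-01-01 23:11:30.372,2,
4,1900-01-01 23:11:30.376,2,
5,1900-01-01 23:11:30.380,,
6,1900-01-01 23:11:30.380,,
7,1900-01-01 23:11:30.380,,
8,1900-01-01 23:11:30.380,,321
9,1900-01-01 23:11:30.380,,111
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["Time"])

fig, ax = plt.subplots(figsize=(13, 4))
# matplotlib keeps the NaN rows, so the line is interrupted where values are missing
(line,) = ax.plot(df["Time"], df["Values"])
ax.set(xlabel="Time", ylabel="Values")
fig.autofmt_xdate(rotation=90)
fig.tight_layout()
```

A single valid point surrounded by NaNs has no neighbours to connect to, so it only becomes visible if a marker is added (for example marker="o").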
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")  # Seoul COVID-19 confirmed-case data
confirmed_dates = corona_data["확진일"]  # "확진일" is the confirmation-date column
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
plt.rc('font', family='Malgun Gothic')
corona_data["확진일"].plot(title="확진일 별 확진자 추이")  # title: "Trend of confirmed cases by confirmation date"
plt.show()
This plot shows plain numbers on the x-axis and dates on the y-axis, but I want dates on the x-axis and numbers on the y-axis. How can I solve this?
If your data is in a dataframe, I recommend using Seaborn to visualize it. It has a great API that allows you to plot elements of your dataframe by referencing column names. Here is a toy example:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load data
df = pd.read_csv(...)
# Plot scatter plot
sns.scatterplot(x='col_1', y='col_2', data=df)
plt.show()
Check out the Seaborn documentation for more.
The problem seems to be that the series you plot contains only one set of values, the dates, so they end up on the y-axis against the row index. You could add a column that contains the row numbers and then select what you want on the x and y axes by passing the column names to the plot function:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
corona_data = pd.read_csv("서울시 코로나19 확진자 현황 csv.csv", encoding="cp949")
confirmed_dates = corona_data["확진일"]
confirmed_date = [datetime.strptime(date, "%Y-%m-%d") for date in confirmed_dates]
corona_data["확진일"]= confirmed_date
# now add the numbers to the dataset
corona_data["numbers"] = list(range(len(confirmed_dates)))
plt.rc('font', family='Malgun Gothic')
# and tell the plot function that you want "확진일" as the x axis and "numbers" as the y axis
corona_data.plot("확진일","numbers",title="확진일 별 확진자 추이")
plt.show()
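If the goal is the number of confirmed cases per day rather than a running row number (an assumption based on the plot title), counting the rows per date may be closer to what is wanted. The sketch below uses hypothetical toy data with an English column name, confirmed_date, in place of the real CSV.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: one row per confirmed case
cases = pd.DataFrame({"confirmed_date": [
    "2020-03-01", "2020-03-01", "2020-03-02",
    "2020-03-04", "2020-03-04", "2020-03-04",
]})
cases["confirmed_date"] = pd.to_datetime(cases["confirmed_date"])

# Rows per date, sorted chronologically: dates become the index (x axis)
daily = cases["confirmed_date"].value_counts().sort_index()

ax = daily.plot(title="Confirmed cases per day")
ax.set_xlabel("date")
ax.set_ylabel("cases")
```

Because the dates are the index of the resulting Series, pandas puts them on the x-axis and the counts on the y-axis without any extra arguments.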
I am trying to do a plot of values over time using seaborn linear model plot but I get the error
TypeError: invalid type promotion
I have read that it is not possible to plot pandas date objects, but that seems really strange given seaborn requires you pass a pandas DataFrame to the plots.
Below is a simple example. Does anyone know how I can get this to work?
import pandas as pd
import seaborn as sns; sns.set(color_codes=True)
import matplotlib.pyplot as plt
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pd.DataFrame({'date':date, 'value': value})
df['date'] = pd.to_datetime(df['date'])
g = sns.lmplot(x="date", y="value", data=df, size = 4, aspect = 1.5)
I am trying to make a plot like one I created in R using ggplot, which is why I want to use sns.lmplot.
You need to convert your dates to floats, then format the x-axis to reinterpret and format the floats into dates.
Here's how I would do this:
import pandas
import seaborn
from matplotlib import pyplot, dates
%matplotlib inline
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
    'date': pandas.to_datetime(date),   # pandas dates
    'datenum': dates.datestr2num(date), # matplotlib dates
    'value': value
})
@pyplot.FuncFormatter
def fake_dates(x, pos):
    """ Custom formatter to turn floats into e.g., 2016-05-08"""
    return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot(x='datenum', y='value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(fake_dates)
# legible labels
ax.tick_params(labelrotation=45)
I derived a solution from Paul H.'s answer for plotting timestamps in seaborn. I had to adapt it for my data because of some backend error messages I was getting.
In my solution, I wrap the fake_dates function in a matplotlib.ticker FuncFormatter when passing it to ax.xaxis.set_major_formatter. This way, fake_dates does not need the pyplot.FuncFormatter line in front of it.
Here is my solution:
import pandas
import seaborn
from matplotlib import pyplot, dates
from matplotlib.ticker import FuncFormatter
date = ['1975-12-03','2008-08-20', '2011-03-16']
value = [1,4,5]
df = pandas.DataFrame({
    'date': pandas.to_datetime(date),   # pandas dates
    'datenum': dates.datestr2num(date), # matplotlib dates
    'value': value
})
def fake_dates(x, pos):
    """ Custom formatter to turn floats into e.g., 2016-05-08"""
    return dates.num2date(x).strftime('%Y-%m-%d')
fig, ax = pyplot.subplots()
# just use regplot if you don't need a FacetGrid
seaborn.regplot(x='datenum', y='value', data=df, ax=ax)
# here's the magic:
ax.xaxis.set_major_formatter(FuncFormatter(fake_dates))
# legible labels
ax.tick_params(labelrotation=45)
fig.tight_layout()
I hope that works.
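As a variation on the same idea, matplotlib's built-in DateFormatter can stand in for the hand-written fake_dates function, since it also interprets axis values as matplotlib date numbers. This sketch swaps seaborn's regplot for a plain scatter plus a numpy.polyfit line to stay dependency-light; that swap is an illustration choice, not part of the answers above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

date = ['1975-12-03', '2008-08-20', '2011-03-16']
value = [1, 4, 5]

# Dates as floats (matplotlib date numbers), which a linear fit can handle
datenum = np.asarray(mdates.datestr2num(date), dtype=float)
slope, intercept = np.polyfit(datenum, value, deg=1)

fig, ax = plt.subplots()
ax.scatter(datenum, value)
ax.plot(datenum, slope * datenum + intercept)

# Reinterpret the float axis as dates when drawing the tick labels
formatter = mdates.DateFormatter('%Y-%m-%d')
ax.xaxis.set_major_formatter(formatter)
ax.tick_params(axis='x', labelrotation=45)
fig.tight_layout()
```

The same formatter line should also work after seaborn.regplot on the datenum column, because the formatter only depends on the axis values being matplotlib date numbers.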