Histogram per hour - matplotlib - python

I'm analyzing public data on transport accidents in the UK.
My dataframe looks like this :
Index Time
0 02:30
1 00:37
2 01:25
3 09:15
4 07:53
5 09:29
6 08:53
7 10:05
I'm trying to plot a histogram showing accident distribution by time of day,
here is my code :
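import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.dates as mdates
# parse the 'HH:MM' strings; the date part defaults to 1900-01-01
df['hour']=pd.to_datetime(df['Time'],format='%H:%M')
df.set_index('hour', drop=False, inplace=True)
# count the accidents in each 60-minute bin and draw the counts as bars
df['hour'].groupby(pd.Grouper(freq='60Min')).count().plot(kind='bar', color='b')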
This is the output:
In this graph, I'd like to change the labels on the x-axis to the format 'hh:mm'. How would I go about doing this?

What is missing is a formatter for the tick labels of the matplotlib x-axis:
df.set_index('hour', drop=False, inplace=True)
df = df['hour'].groupby(pd.Grouper(freq='60Min')).count()
ax = df.plot(kind='bar', color='b')
ticklabels = df.index.strftime('%H:%Mh')
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(ticklabels))
plt.show()
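For reference, here is a self-contained sketch of the same idea using the sample times from the question. A few things here are my assumptions rather than the answer's code: it uses the headless Agg backend, resample in place of pd.Grouper, set_xticklabels in place of FixedFormatter, and a plain 'HH:MM' label without the trailing 'h'.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# sample times copied from the question
times = ["02:30", "00:37", "01:25", "09:15", "07:53", "09:29", "08:53", "10:05"]
df = pd.DataFrame({"Time": times})
df["hour"] = pd.to_datetime(df["Time"], format="%H:%M")

# count rows per 60-minute bin; empty bins are kept with a count of 0
counts = df.set_index("hour")["Time"].resample("60min").count()

# build 'HH:MM' labels from the bin start times
labels = list(counts.index.strftime("%H:%M"))

fig, ax = plt.subplots()
counts.plot(kind="bar", color="b", ax=ax)
# a pandas bar plot has one tick per bar, so the labels line up one to one
ax.set_xticklabels(labels, rotation=45)
fig.tight_layout()
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to see the result.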

Related

pandas bar plot combined with line plot shows the time axis beginning at 1970

I am trying to draw a stock market graph: closing price over time and volume over time.
Somehow the x-axis shows dates in 1970.
The graph and the code follow.
The code is:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
pd_data = pd.DataFrame(data, columns=['id', 'symbol', 'volume', 'high', 'low', 'open', 'datetime','close','datetime_utc','created_at'])
pd_data['DOB'] = pd.to_datetime(pd_data['datetime_utc']).dt.strftime('%Y-%m-%d')
pd_data.set_index('DOB')
print(pd_data)
print(pd_data.dtypes)
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
#ax.pd_data['volume'].plot(secondary_y=True, kind='bar')
ax1=pd_data.plot(y='volume',secondary_y=True, ax=ax,kind='bar')
ax1.set_ylabel('Volumne')
# Choose your xtick format string
date_fmt = '%d-%m-%y'
date_formatter = mdates.DateFormatter(date_fmt)
ax1.xaxis.set_major_formatter(date_formatter)
# set monthly locator
ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=1))
# set font and rotation for date tick labels
plt.gcf().autofmt_xdate()
plt.show()
I also tried plotting the two graphs independently, without ax=ax:
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volumne')
Then the price graph shows the years properly, whereas the volume graph shows 1970.
And if I swap them:
ax1=pd_data.plot(y='volume',secondary_y=True,kind='bar')
ax1.set_ylabel('Volumne')
ax=pd_data.plot(x='DOB',y='close',kind = 'line')
ax.set_ylabel("price")
Now the volume graph shows the years properly, whereas the price graph shows the years as 1970.
I tried removing secondary_y and also changing bar to line, but no luck.
Somehow pandas changes the year after the first graph is drawn.
I do not advise drawing a bar plot with this many bars.
This answer explains why there is an issue with the xtick labels, and how to resolve the issue.
Plotting with pandas.DataFrame.plot works without issue with .set_major_locator
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas_datareader as web # conda install -c anaconda pandas-datareader or pip install pandas-datareader
# download data
df = web.DataReader('amzn', data_source='yahoo', start='2015-02-21', end='2021-04-27')
# plot
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, alpha=0.5, rot=0, lw=0.5)
ax1.set(ylabel='Volume')
# format
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
plt.setp(ax.get_xticklabels(), ha="center")
plt.show()
Why are the OP x-tick labels starting from 1970?
With pandas, bar plot locations are 0-indexed (0, 1, 2, ...), and when a location of 0 is interpreted as a date it corresponds to 1970.
See Pandas bar plot changes date format
Most solutions for bar plots simply reformat the labels to the appropriate datetime. This is cosmetic, however, and will not align the locations of the line plot and the bar plot.
Solution 2 of this answer shows how to change the tick locators, but it is not worth the extra code when plt.bar can be used.
print(pd.to_datetime(ax1.get_xticks()))
DatetimeIndex([ '1970-01-01 00:00:00',
'1970-01-01 00:00:00.000000001',
'1970-01-01 00:00:00.000000002',
'1970-01-01 00:00:00.000000003',
...
'1970-01-01 00:00:00.000001552',
'1970-01-01 00:00:00.000001553',
'1970-01-01 00:00:00.000001554',
'1970-01-01 00:00:00.000001555'],
dtype='datetime64[ns]', length=1556, freq=None)
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)')
print(ax.get_xticks())
ax1 = df.plot(y='Volume', secondary_y=True, ax=ax, kind='bar')
print(ax1.get_xticks())
ax1.set_xlim(0, 18628.)
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.] ← ax tick locations
[ 0 1 2 ... 1553 1554 1555] ← ax1 tick locations
With plt.bar the bar plot locations are indexed based on the datetime
ax = df.plot(y='Close', color='magenta', ls='-.', figsize=(10, 6), ylabel='Price ($)', rot=0)
plt.setp(ax.get_xticklabels(), ha="center")
print(ax.get_xticks())
ax1 = ax.twinx()
ax1.bar(df.index, df.Volume)
print(ax1.get_xticks())
date_fmt = '%d-%m-%y'
years = mdates.YearLocator() # every year
yearsFmt = mdates.DateFormatter(date_fmt)
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(yearsFmt)
[out]:
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
[16071. 16436. 16801. 17167. 17532. 17897. 18262. 18628.]
sns.barplot(x=df.index, y=df.Volume, ax=ax1) has xtick locations as [ 0 1 2 ... 1553 1554 1555], so the bar plot and line plot did not align.
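The difference between the two kinds of bar plot can be checked without downloading any data. Below is a minimal sketch on synthetic data; the date range, the column values and the Agg backend are my assumptions, made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic stand-in for the stock data (values are made up)
idx = pd.date_range("2020-01-01", periods=30, freq="D")
df = pd.DataFrame({"Close": np.linspace(100, 130, 30),
                   "Volume": np.arange(1, 31)}, index=idx)

# pandas bar plot: the bars sit at integer positions 0..n-1
fig1, ax_pd = plt.subplots()
df.plot(y="Volume", kind="bar", ax=ax_pd, legend=False)
pandas_positions = ax_pd.get_xticks()

# matplotlib line + bar on a twin axis: both use date-based positions
fig2, ax_line = plt.subplots()
ax_line.plot(df.index, df["Close"], color="magenta", ls="-.")
ax_bar = ax_line.twinx()
bars = ax_bar.bar(df.index, df["Volume"], alpha=0.5)
bar_centers = np.array([b.get_x() + b.get_width() / 2 for b in bars])

# the date formatter now works because the x values really are dates
ax_line.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%y"))
fig2.autofmt_xdate()
```

pandas_positions holds the integers 0 to 29, while bar_centers holds matplotlib date numbers, which is why only the second figure lines up with a date axis.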
I could not find the reason for 1970, so instead I plotted with matplotlib.pyplot directly rather than indirectly through pandas, and passed an array of datetime objects instead of the pandas column.
The following code worked:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
import numpy as np
pd_data = pd.read_csv("/home/stockdata.csv",sep='\t')
pd_data['DOB'] = pd.to_datetime(pd_data['datetime2']).dt.strftime('%Y-%m-%d')
dates=[dt.datetime.strptime(d,'%Y-%m-%d').date() for d in pd_data['DOB']]
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%m/%d/%Y'))
plt.gca().xaxis.set_major_locator(mdates.MonthLocator(interval=2))
plt.bar(dates,pd_data['close'],align='center')
plt.gca().xaxis.set_minor_locator(plt.MultipleLocator(1))
plt.gcf().autofmt_xdate()
plt.show()
I created a dates array in datetime format. If I make the graph from that array, the dates are no longer shown as 1970.
open high low close volume datetime datetime2
35.12 35.68 34.79 35.58 1432995 1244385200000 2012-6-15 10:30:00
35.69 36.02 35.37 35.78 1754319 1244371600000 2012-6-16 10:30:00
35.69 36.23 35.59 36.23 3685845 1245330800000 2012-6-19 10:30:00
36.11 36.52 36.03 36.32 2635777 1245317200000 2012-6-20 10:30:00
36.54 36.6 35.8 35.9 2886412 1245303600000 2012-6-21 10:30:00
36.03 36.95 36.0 36.09 3696278 1245390000000 2012-6-22 10:30:00
36.5 37.27 36.18 37.11 2732645 1245376400000 2012-6-23 10:30:00
36.98 37.11 36.686 36.83 1948411 1245335600000 2012-6-26 10:30:00
36.67 37.06 36.465 37.05 2557172 1245322000000 2012-6-27 10:30:00
37.06 37.61 36.77 37.52 1780126 1246308400000 2012-6-28 10:30:00
37.47 37.77 37.28 37.7 1352267 1246394800000 2012-6-29 10:30:00
37.72 38.1 37.68 37.76 2194619 1246381200000 2012-6-30 10:30:00
The plot I get is:

How to change xticks to yearly interval in pandas time series plot

I am very new to pandas. I have searched many StackOverflow questions similar to this one about changing xtick labels to yearly intervals, but they are all slightly different and did not solve my problem, so I decided to ask my own question.
Here is my question: I have a mock data frame that I want to plot with yearly xticks on the x-axis.
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': pd.date_range('1991-01-01','2019-01-01')}).set_index('date')
df['value'] = np.random.randn(len(df))
df.plot()
This gives:
Xticks ==> 1995 2000 2005 etc
But I want ==> 1991 1992 ... 2019
How to do that?
So far I have tried this:
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
fig,ax = plt.subplots()
df.plot(ax=ax)
ax.xaxis.set_major_locator(matplotlib.dates.YearLocator(base=1))
# ax.xaxis.set_minor_locator(matplotlib.dates.YearLocator(base=1))
# ax.set_xticklabels(list(df.index.time))
This gives just 2005 as an xtick, and nothing else I tried has worked so far.
Links I looked at:
- Changing xticks in a pandas plot
- Python: Change the time on xticks for Pandas Plot
- https://matplotlib.org/3.1.1/api/dates_api.html
You need to use the x_compat=True argument to have pandas choose the units in a way that they are compatible with matplotlib.dates locators and formatters.
df.plot(ax=ax, x_compat=True)
Complete code:
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
df = pd.DataFrame({'date': pd.date_range('1991-01-01','2019-01-01')}).set_index('date')
df['value'] = np.random.randn(len(df))
fig,ax = plt.subplots()
df.plot(ax=ax, x_compat=True)
ax.xaxis.set_major_locator(matplotlib.dates.YearLocator(base=1))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter("%Y"))
plt.show()
You can try this:
import datetime
# create xticks
xticks = pd.date_range(datetime.datetime(1990,1,1), datetime.datetime(2020,1,1), freq='YS')
# plot
fig, ax = plt.subplots(figsize=(12,8))
df['value'].plot(ax=ax,xticks=xticks.to_pydatetime())
ax.set_xticklabels([x.strftime('%Y') for x in xticks]);
plt.xticks(rotation=90);
Complete Example
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import datetime
# data
df = pd.DataFrame({'date': pd.date_range('1991-01-01','2019-01-01')}).set_index('date')
df['value'] = np.random.randn(len(df))
# create xticks
xticks = pd.date_range(datetime.datetime(1990,1,1), datetime.datetime(2020,1,1), freq='YS')
# plot
fig, ax = plt.subplots(figsize=(12,8))
df['value'].plot(ax=ax,xticks=xticks.to_pydatetime())
ax.set_xticklabels([x.strftime('%Y') for x in xticks]);
plt.xticks(rotation=90);
plt.show()
This gives:

Seaborn color palette: how to choose which part to center on (e.g. which end of the plot is red and which end is blue)?

I want a catplot of a Pandas dataframe that contains a numerical value for all hours of a year. It has 3 columns: Hour, Weekday, and Value. I plot it like this:
cat_weekdayhour = plt.figure(figsize=(12,12))
cat_weekdayhour = sns.set_context("paper")
cat_weekdayhour = sns.set(style="darkgrid", font_scale=.6)
weekdayhour.shape
cat_weekdayhour = sns.catplot(x="Weekday", y="Value", hue="Hour", kind="swarm", palette="coolwarm", data=dataframe)
This gives me the following catplot, but I don't like how the early hours of a day (like 0-4 AM) are very blue and then the last hours (8-11 PM) are red. Instead, I want to center the RED color to the DAY hours and then make all the night hours blue. Can this be done? Thank you.
Create your own palette by combining "coolwarm" with its reversed palette. I have an older version of seaborn, so I'll use swarmplot to illustrate.
Sample Data
import seaborn as sns
import numpy as np
import pandas as pd
n = 1000
np.random.seed(123)
df = pd.DataFrame({'Weekday': ['Friday']*n,
'Hour': np.random.randint(0,24,n),
'Value': np.random.randint(40,150,n)})
coolwarm palette
sns.swarmplot(x="Weekday", y="Value", hue="Hour", palette="coolwarm", data=df)
Custom palette
# 24 hours so split evenly between the two
mypal = sns.color_palette("coolwarm", 12) + sns.color_palette("coolwarm_r", 12)
sns.swarmplot(x="Weekday", y="Value", hue="Hour", palette=mypal, data=df)
One idea can be to create a new colormap, let's call it "cycliccoolwarm" which contains the original colormap and a reversed version of it. Then using this new colormap is as easy as any other existing colormap.
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
import seaborn as sns
import numpy as np
import pandas as pd
df = pd.DataFrame({"Day" : np.repeat(np.array(list("1234567")),24*40 ),
"Hour" : np.tile(np.arange(0,24), 7*40).astype(int),
"Value" : np.random.rand(7*24*40)*180})
df['Hour'] = df['Hour'].apply('{:02d}:00'.format)
cmap_orig = plt.get_cmap("coolwarm")
colors = cmap_orig(np.concatenate((np.linspace(0,1,128), np.linspace(1,0,128))))
cmap = ListedColormap(colors)
plt.cm.register_cmap("cycliccoolwarm", cmap=cmap)
g = sns.catplot(x="Day", y="Value", hue="Hour", kind="swarm", palette="cycliccoolwarm", data=df)
plt.show()
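Note that plt.cm.register_cmap may not exist on recent matplotlib releases (as far as I know it was deprecated and later removed). A sketch of the same colormap registered through the colormap registry instead, assuming matplotlib 3.5 or later:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

# sample coolwarm from 0 to 1 and then back from 1 to 0
cmap_orig = plt.get_cmap("coolwarm")
positions = np.concatenate((np.linspace(0, 1, 128), np.linspace(1, 0, 128)))
colors = cmap_orig(positions)  # RGBA array of shape (256, 4)

cmap = ListedColormap(colors, name="cycliccoolwarm")

# register under the same name as above; skip if it is already registered
if "cycliccoolwarm" not in matplotlib.colormaps:
    matplotlib.colormaps.register(cmap, name="cycliccoolwarm")
```

After this, palette="cycliccoolwarm" should be usable in the catplot call exactly as in the code above.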
You are using the Hour column in your dataframe as the source for the coloring. Instead, you could introduce a new column that contains the values from Hour relative to 12:00:
dataframe['Color'] = -abs(dataframe['Hour'] - 12)
So 12:00 will become 0 and midnight (0:00) will become -12, while 6:00 and 18:00 will both become -6.
This way the middle of the day will be represented by the highest value (red → 0) while midnight will be represented by the lowest value (blue → -12).
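To see the mapping for every hour of the day, here is a small sketch. It assumes Hour holds integers from 0 to 23, which is my reading of the question rather than something it states.

```python
import pandas as pd

# assumption: hours are integers 0..23
dataframe = pd.DataFrame({"Hour": range(24)})

# distance from noon, negated so that noon gets the highest value
dataframe["Color"] = -(dataframe["Hour"] - 12).abs()

# show the value assigned to each hour
print(dataframe.set_index("Hour")["Color"].to_dict())
```

The new column can then be passed as hue="Color" with palette="coolwarm", so that hours equally far from noon share a color.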

Manipulating Dates in x-axis Pandas Matplotlib

I have a pretty simple set of data as displayed below. I am looking for a way to plot this stacked bar chart and format the x-axis (dates) so it starts at 1996-31-12 and ends at 2016-31-12 on increments of 365 days. The code I have written is plotting every single date and therefore the x-axis is very bunched up and not readable.
Dataframe:
Date A B
1996-31-12 10 3
1997-31-03 5 6
1997-31-07 7 5
1997-30-11 3 12
1997-31-12 4 10
1998-31-03 5 8
.
.
.
2016-31-12 3 9
This is a similar question: Pandas timeseries plot setting x-axis major and minor ticks and labels
You can manage this using matplotlib itself instead of pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# if your dates are strings you need this step; the sample data is year-day-month
df.Date = pd.to_datetime(df.Date, format='%Y-%d-%m')
fig,ax = plt.subplots()
ax.plot_date(df.Date,df.A)
ax.plot_date(df.Date,df.B)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
plt.show()
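Since the question asks for a stacked bar chart, here is a hedged sketch of that variant using the first rows of the sample data. The bar width of 60 days, the conversion to date numbers and the Agg backend are my choices for illustration, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# first rows of the sample data from the question (year-day-month strings)
df = pd.DataFrame({
    "Date": ["1996-31-12", "1997-31-03", "1997-31-07",
             "1997-30-11", "1997-31-12", "1998-31-03"],
    "A": [10, 5, 7, 3, 4, 5],
    "B": [3, 6, 5, 12, 10, 8],
})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%d-%m")

# convert to matplotlib date numbers so the bar width is simply in days
x = mdates.date2num(df["Date"].to_numpy())

fig, ax = plt.subplots()
bottom_bars = ax.bar(x, df["A"].to_numpy(), width=60, label="A")
# stack B on top of A by starting each B bar at the height of A
top_bars = ax.bar(x, df["B"].to_numpy(), width=60,
                  bottom=df["A"].to_numpy(), label="B")

ax.xaxis_date()  # tell the axis that the x values are dates
ax.xaxis.set_major_locator(mdates.YearLocator())      # one tick per year
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.legend()
```

YearLocator places the ticks on 1 January of each year; ticks exactly on 31 December would need explicitly chosen tick positions.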

matplotlib - plot wrong datetime

I'd like to plot a figure with matplotlib in which the x-axis shows timestamps as yy-mm-dd hh-mm-ss. My timestamps are datetime64 (a pandas Series), and to show the minutes and seconds correctly I followed the hint in this link and used date2num. The problem is that it plots nonsensical dates:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as md
for df in dfs:
datenums=md.date2num(df.toPandas()["timestamp"])
plt.xticks(rotation=25)
xfmt = md.DateFormatter('%Y-%m-%d %H:%M:%S')
ax.xaxis.set_major_formatter(xfmt)
plt.plot(datenums,x)
plt.show()
where df.toPandas()["timestamp"] is:
0 2015-12-15 03:53:13
Name: timestamp, dtype: datetime64[ns]
I tried converting datetime64 to datetime, but the result doesn't change.
If your timestamp values are in seconds, you can build a list of all the tick labels from them and then add the labels to the plot, assuming your data corresponds to an array of timestamps:
import matplotlib.pyplot as plt
import numpy as np
import datetime
OX_ticks_name = [datetime.datetime.fromtimestamp(x).strftime('%Y-%m-%d %H:%M:%S') for x in arrayTmstmp]
OX_ticks_pos = np.arange(0,len(arrayTmstmp))
fig, ax = plt.subplots(figsize=(16, 9), dpi=100)
...
ax.set_xticks(OX_ticks_pos)
ax.set_xticklabels(OX_ticks_name, rotation=40, horizontalalignment='right', fontsize=10)
plt.tight_layout()
plt.show()
Of course, the position of each tick and the name for each can be configured as you want.
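A runnable version of the snippet above might look like the sketch below. The timestamp and data values are made up, the variable names are hypothetical, and I pass an explicit UTC timezone so the labels do not depend on the machine's local time; the original snippet uses local time.

```python
import datetime
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np

# hypothetical data: Unix timestamps in seconds, one minute apart
array_tmstmp = np.array([1450151593, 1450151653, 1450151713])
values = np.array([1.0, 2.5, 2.0])

# one label per timestamp, formatted in UTC
tick_names = [
    datetime.datetime.fromtimestamp(int(t), tz=datetime.timezone.utc)
    .strftime("%Y-%m-%d %H:%M:%S")
    for t in array_tmstmp
]
# the points are plotted at positions 0, 1, 2, ... and labelled afterwards
tick_pos = np.arange(len(array_tmstmp))

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(tick_pos, values, marker="o")
ax.set_xticks(tick_pos)
ax.set_xticklabels(tick_names, rotation=40,
                   horizontalalignment="right", fontsize=10)
fig.tight_layout()
```

Because the points are placed at evenly spaced integer positions, this approach only labels the data; it does not space the points according to the actual time gaps.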
