Manipulating Dates in x-axis Pandas Matplotlib

I have a pretty simple data set, shown below. I am looking for a way to plot it as a stacked bar chart and format the x-axis (dates) so that it starts at 1996-31-12 and ends at 2016-31-12 in increments of 365 days. The code I have written plots every single date, so the x-axis is bunched up and unreadable.
Dataframe:
Date A B
1996-31-12 10 3
1997-31-03 5 6
1997-31-07 7 5
1997-30-11 3 12
1997-31-12 4 10
1998-31-03 5 8
.
.
.
2016-31-12 3 9

This is a similar question: Pandas timeseries plot setting x-axis major and minor ticks and labels
You can manage this using matplotlib itself instead of pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
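# if your dates are strings, convert them first; the sample data is in
# year-day-month order, so pass the format explicitly
df.Date = pd.to_datetime(df.Date, format='%Y-%d-%m')
fig, ax = plt.subplots()
# ax.plot handles datetime values directly (plot_date is deprecated in recent Matplotlib)
ax.plot(df.Date, df.A, 'o')
ax.plot(df.Date, df.B, 'o')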
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
plt.show()
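
The answer above draws markers rather than the stacked bars the question asks for. A minimal sketch of the stacked variant follows, assuming the sample rows from the question and its year-day-month date format. The 60-day bar width, the 31 December tick anchor, and the Agg backend are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (dates are year-day-month).
df = pd.DataFrame({
    "Date": ["1996-31-12", "1997-31-03", "1997-31-07",
             "1997-30-11", "1997-31-12", "1998-31-03"],
    "A": [10, 5, 7, 3, 4, 5],
    "B": [3, 6, 5, 12, 10, 8],
})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%d-%m")

fig, ax = plt.subplots()
# Stack B on top of A by starting each B bar at the height of A.
# width is in days because the x values are dates (assumed value).
ax.bar(df["Date"], df["A"], width=60, label="A")
ax.bar(df["Date"], df["B"], width=60, bottom=df["A"], label="B")

# One major tick per year, anchored on 31 December.
ax.xaxis.set_major_locator(mdates.YearLocator(base=1, month=12, day=31))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Because the bars are placed at real date positions instead of one category per row, the locator decides where the ticks go and the axis no longer shows every single date.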

Related

Categorical data visualization - scatter plot with multiple X using Pandas and Seaborn

I spent many hours looking for tips how to create categorical plot using Seaborn and Pandas having several Xs to be added on x-axis, but I have not found the solution.
For specified columns from excel (for example: S1_1, S1_2, S1_3) I would like to create one scatterplot with readings - it means for each column header 9 measurements are expected. Please refer to the image to see the data structure in excel. I was unable to find the right function.
I tried with the following code, but this is not what I wanted to achieve.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_excel("panda.xlsx")
dfx = pd.DataFrame({"CHAR": ["S1_1","S1_2","S1_3"]})
sns.stripplot(x=dfx['CHAR'],y=df['S1_1'],color='black')
sns.stripplot(x=dfx['CHAR'],y=df['S1_2'],color='black')
sns.stripplot(x=dfx['CHAR'],y=df['S1_3'],color='black')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
Expected vs obtained plot:

You're overthinking things. You don't need to call stripplot separately for each column. I generated new random data since you didn't share yours in a copy-and-pastable form, but stripplot will basically do what I think you want with a very short invocation.
> print(df)
S1 S2 S3 S4
0 0.314097 0.678525 0.228356 0.770293
1 0.207790 0.739484 0.965662 0.604426
2 0.975562 0.959384 0.088162 0.265529
3 0.616823 0.902795 0.015561 0.662020
4 0.210507 0.287713 0.660347 0.763312
5 0.763505 0.381314 0.759422 0.257578
6 0.707832 0.912063 0.774681 0.534284
7 0.996891 0.258103 0.313047 0.729142
8 0.121308 0.797310 0.286265 0.757299
> sns.stripplot(data=df[["S1", "S2", "S3"]], color='black')
> plt.xlabel("X Axis")

Plotting time series dataframe in python

I am having a really hard time plotting a time series from a data frame in Python. The column data types are shown below.
Time_split datetime64[ns]
Total_S4_Sig1 float64
The time_split column is the X axis and is the time variable. The total s4 is the Y variable and is a float.
0 15:21:00
1 15:22:00
2 15:23:00
3 15:24:00
4 15:25:00
5 19:29:00
6 19:30:00
7 19:31:00
8 19:32:00
9 19:33:00
Please note that the times never have a seconds component, i.e. the seconds are always 00, and the data is continuous at one-minute intervals.
The data will NOT NECESSARILY start at a whole hour. It could start at any time, for example 15:35. I want to create a graph where the x-axis major ticks are at full hours like 19:00, 21:00, 22:00 and the minor ticks are at the half hours, i.e. 19:30, 21:30. I do not want the seconds part of the time to be shown, as it is useless.
What I want it to do is just graph hour and minute in format HH:MM and major markings at whole hours and minor markings at half hours.
keydata["Time_split"] = keydata["Time_split"].dt.time
keydata.plot(x='Time_split', y='Total_S4_Sig1')
plt.show()
This code leads to such a plot.
I do not want the seconds to be shown and I want the marking at full hours and minor markings at half hours.
keydata["Time_split"] = keydata["Time_split"].dt.time
time_form = mdates.DateFormatter("%H:%M")
ax = keydata.plot(x='Time_split', y='Total_S4_Sig1')
ax.xaxis.set_major_formatter(time_form)
plt.show()
This code leads to such a plot.
Please be advised the seconds will always be 00

Try using matplotlib date formatting
import matplotlib.dates as mdates
date_fmt = mdates.DateFormatter('%H:%M:%S')
# plot your data
ax = df.plot.line(x='time', y='values')
# add the date formatter as the x axis tick formatter
ax.xaxis.set_major_formatter(date_fmt)

The following should address the problems you are facing:
import pandas as pd
from datetime import date, datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as md
import numpy as np
#testing data
#keydata = pd.read_csv('test.txt',sep='\t',header=None,names=['Time_split','Total_S4_Sig1'])
x = pd.to_datetime(keydata['Time_split'])
y = keydata['Total_S4_Sig1']
# plot
fig, ax = plt.subplots(1, 1)
ax.plot(x, y,'ok')
# format xtick labels as hours:minutes
xformatter = md.DateFormatter('%H:%M')
# major ticks every hour
ax.xaxis.set_major_locator(md.HourLocator(interval=1))
# minor ticks every half hour
ax.xaxis.set_minor_locator(md.MinuteLocator(byminute=[0, 30], interval=1))
ax.xaxis.set_major_formatter(xformatter)
plt.show()
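
A self-contained variant of the same approach, using synthetic minute-wise data in place of the asker's file, can be sketched as follows. The date, the sine-wave values, and the Agg backend are assumptions made only so the example runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as md
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one reading per minute, starting off the hour at 15:21.
times = pd.date_range("2020-01-01 15:21", "2020-01-01 19:33", freq="min")
keydata = pd.DataFrame({
    "Time_split": times,
    "Total_S4_Sig1": np.sin(np.arange(len(times)) / 30),
})

# .dt.time yields plain datetime.time objects in an object column rather than
# datetime64 values, so this sketch keeps the original column for plotting.
as_time = keydata["Time_split"].dt.time

fig, ax = plt.subplots()
ax.plot(keydata["Time_split"], keydata["Total_S4_Sig1"])
ax.xaxis.set_major_locator(md.HourLocator())                  # ticks at full hours
ax.xaxis.set_minor_locator(md.MinuteLocator(byminute=[30]))   # ticks at half hours
ax.xaxis.set_major_formatter(md.DateFormatter("%H:%M"))       # HH:MM, no seconds
```

Plotting with `ax.plot` on the datetime column keeps the axis in matplotlib's date units, which is what the `HourLocator` and `DateFormatter` expect.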

Python Pandas - Plotting multiple Bar plots by category from dataframe

I have dataframe which looks like
df = pd.DataFrame(data={'ID':[1,1,1,2,2,2], 'Value':[13, 12, 15, 4, 2, 3]})
Index ID Value
0 1 13
1 1 12
2 1 15
3 2 4
4 2 2
5 2 3
and I want to plot it by the IDs (categories) so that each category would have different bar plot,
so in this case I would have two figures,
one figure with bar plot of ID=1,
and second separate figure bar plot of ID=2.
Can I do it (preferably without loops) with something like df.plot(y='Value', kind='bar')?

Two options are possible: one using matplotlib, and the other using seaborn, which you should absolutely know about as it works well with Pandas.
Pandas with matplotlib
You have to create a figure with subplots, setting the number of rows and columns yourself. plt.subplots returns a 1-D array of axes if either nrows or ncols is 1 (and a single Axes object if both are), or a 2-D array otherwise. Then you pass each axes object to the Pandas plot method.
If the number of categories is unknown or large, you need to use a loop.
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots( nrows=1, ncols=2, sharey=True )
df.loc[ df["ID"] == 1, 'Value' ].plot.bar( ax=axes[0] )
df.loc[ df["ID"] == 2, 'Value' ].plot.bar( ax=axes[1] )
plt.show()
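
The loop mentioned above can be sketched as follows, assuming the question's dataframe. The use of `groupby`, `squeeze=False`, and the Agg backend are choices made for this illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(data={'ID': [1, 1, 1, 2, 2, 2], 'Value': [13, 12, 15, 4, 2, 3]})

groups = df.groupby("ID")
# squeeze=False always returns a 2-D array of axes, even for a single category.
fig, axes = plt.subplots(nrows=1, ncols=groups.ngroups, sharey=True, squeeze=False)

# One bar plot per category, each drawn on its own axes.
for ax, (group_id, group) in zip(axes.ravel(), groups):
    group["Value"].plot.bar(ax=ax)
    ax.set_title(f"ID={group_id}")
```

To get a separate figure per category instead of side-by-side subplots, call `plt.subplots()` inside the loop.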
Pandas with seaborn
Seaborn is the most amazing graphical tool that I know. The function catplot lets you plot a series of graphs, one per value of the column passed as the col argument. You can select the type of plot with kind.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
df['index'] = [1,2,3] * 2
sns.catplot(kind='bar', data=df, x='index', y='Value', col='ID')
plt.show()
I added an index column in order to compare with df.plot.bar. If you don't want it, remove x='index' and each facet will display a single bar with an error bar.

Histogram per hour - matplotlib

I'm analyzing public data on transport accidents in the UK.
My dataframe looks like this :
Index Time
0 02:30
1 00:37
2 01:25
3 09:15
4 07:53
5 09:29
6 08:53
7 10:05
I'm trying to plot a histogram showing accident distribution by time of day,
here is my code :
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import datetime as dt
import matplotlib.dates as mdates
import pandas as pd
df['hour']=pd.to_datetime(df['Time'],format='%H:%M')
df.set_index('hour', drop=False, inplace=True)
df['hour'].groupby(pd.Grouper(freq='60Min')).count().plot(kind='bar', color='b')
This is the output:
In this graph, I'd like to change the labels on the x-axis to the format 'hh:mm'. How would I go about doing this?

What you are missing is setting the format of the x-axis tick labels:
df.set_index('hour', drop=False, inplace=True)
df = df['hour'].groupby(pd.Grouper(freq='60Min')).count()
ax = df.plot(kind='bar', color='b')
ticklabels = df.index.strftime('%H:%Mh')
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(ticklabels))
plt.show()
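
An alternative sketch that avoids the formatter altogether: convert the grouped index to 'hh:mm' strings before plotting, so the bar plot uses them as labels directly. It assumes the sample rows from the question. The use of `resample` and the Agg backend are choices made for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"Time": ["02:30", "00:37", "01:25", "09:15",
                            "07:53", "09:29", "08:53", "10:05"]})
df["hour"] = pd.to_datetime(df["Time"], format="%H:%M")

# Count accidents per 60-minute bin; empty hours are kept with a count of 0.
counts = df.set_index("hour").sort_index()["Time"].resample("60min").count()

# Replace the datetime index with 'hh:mm' strings so they become the bar labels.
counts.index = counts.index.strftime("%H:%M")
ax = counts.plot(kind="bar", color="b")
```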

Plotting Pandas Datetime Timeseries in AM/PM format

I have a pandas series with Timestamp indices that I'd like to plot.
print(example.head())
2015-08-11 20:07:00-05:00 26
2015-08-11 20:08:00-05:00 66
2015-08-11 20:09:00-05:00 71
2015-08-11 20:10:00-05:00 63
2015-08-11 20:11:00-05:00 73
But when I plot it in pandas with:
plt.figure(figsize = (15,8))
cubs1m.plot(kind='area')
I'd like the values on the y-axis to show up in AM/PM format (8:08PM), not military time(20:08). Is there an easy way to do this?
And also, how would I control # of ticks and # of labels plotting with pandas?
Thanks in advance.

Your question has two elements:
How to control # of ticks/labels on a plot
How to change 24-hour time to 12-hour time
Axes methods set_xticks, set_yticks, set_xticklabels, and set_yticklabels control the ticks and the labels:
import matplotlib.pyplot as plt
plt.plot(range(10), range(10))
plt.gca().set_xticks(range(0,10,2))
plt.gca().set_xticklabels(['a{}'.format(ii) for ii in range(0,10,2)])
To change the time format, use strftime with the 12-hour directives %I and %p: How can I convert 24 hour time to 12 hour time?
import pandas as pd
data = pd.Series(range(12), index=pd.date_range('2016-2-3 9:00','2016-2-3 20:00', freq='H'))
ax = data.plot(xticks=data.index[::2])
# DatetimeIndex.strftime formats every tick at once (pd.datetime was removed in pandas 2.0)
ax.set_xticklabels(data.index[::2].strftime('%I %p'))
This question covers an alternate approach to plotting with dates: Pandas timeseries plot setting x-axis major and minor ticks and labels
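
As a hedged sketch of that alternate approach, the ticks and the 12-hour labels can both be handled by matplotlib's date locator and formatter. The hourly sample series, the two-hour tick spacing, and the Agg backend are assumptions for illustration. The AM/PM text produced by `%p` depends on the locale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical hourly series from 09:00 to 20:00.
data = pd.Series(range(12),
                 index=pd.date_range("2016-02-03 09:00", periods=12, freq="60min"))

fig, ax = plt.subplots()
ax.plot(data.index, data.values)

# Control the number of ticks with a locator: one tick every 2 hours.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
# Control the label text with a formatter: 12-hour clock with AM/PM.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%I:%M %p"))
```

For a timezone-aware index like the one in the question, `DateFormatter` also accepts a `tz` argument so the labels can be shown in local time.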
