Arrange pandas DataFrame for color Plotting

Arrange pandas DataFrame for color Plotting - python

I have a dataframe which looks like this (left column is the index):
YYYY-MO-DD HH-MI-SS_SSS ATMOSPHERIC PRESSURE (hPa) mean
2016-11-07 14:00:00 1014.028782
2016-11-07 15:00:00 1014.034111
.... ....
2016-11-30 09:00:00 1006.516436
2016-11-30 10:00:00 1006.216156
Now I want to plot a colormap with this data - so I want to create an X (horizontal axis) to be just the dates:
2016-11-07, 2016-11-08,...,2016-11-30
and the Y (Vertical axis) to be the time:
00:00:00, 01:00:00, 02:00:00, ..., 23:00:00
And finally the Z (color map) to be the pressure data for each date and time [f(x,y)].
How can I arrange the data for this kind of plotting ?
Thank you !

With test data prepared like so:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
samples = 24 * 365
index = pd.date_range('2017-01-01', freq='1H', periods=samples)
data = pd.DataFrame(np.random.rand(samples), index=index, columns=['data'])
I would do something like this:
data = data.reset_index()
data['date'] = data['index'].apply(lambda x: x.date())
data['time'] = data['index'].apply(lambda x: x.time())
pivoted = data.pivot(index='time', columns='date', values='data')
fig, ax = plt.subplots(1, 1)
ax.imshow(pivoted, origin='lower', cmap='viridis')
plt.show()
Which produces:
To improve the axis labeling, this is a start:
ax.set_yticklabels(['{:%H:%M:%S}'.format(t) for t in data['time'].unique()])
ax.set_xticklabels(['{:%Y-%m-%d}'.format(t) for t in data['date'].unique()])
but you'll need to figure out how to choose how often a label appears with set_xticks() and set_yticks()

Related

How to aggregate a metric and plot groups separately

I have this dataset:
df = pd.DataFrame()
df['year'] = [2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011,2011]
df['month'] = [1,2,3,4,5,6,1,2,3,4,5,6]
df['after'] = [0,0,0,1,1,1,0,0,0,1,1,1]
df['campaign'] = [0,0,0,0,0,0,1,1,1,1,1,1]
df['sales'] = [10000,11000,12000,10500,10000,9500,7000,8000,5000,6000,6000,7000]
df['date_m'] = pd.to_datetime(df.year.astype(str) + '-' + df.month.astype(str))
And I want to make a line plot grouped by month and campaign, so I have tried this code:
df['sales'].groupby(df['date_m','campaign']).mean().plot.line()
But I get this error message KeyError: ('date_m', 'campaign'). Please, any help will be greatly appreciated.

Plotting is typically dependant upon the shape of the DataFrame.
.groupby creates a long format DataFrame, which is great for seaborn
.pivot_table creates a wide format DataFrame, which easily works with pandas.DataFrame.plot
.groupby the DataFrame
df['sales'].groupby(...) is incorrect, because df['sales'] selects one column of the dataframe; none of the other columns are available
.groupby converts the DataFrame into a long format, which is great for plotting with seaborn.lineplot.
Specify the hue parameter to separate by 'campaign'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# perform groupby and reset the index
dfg = df.groupby(['date_m','campaign'])['sales'].mean().reset_index()
# display(dfg.head())
date_m campaign sales
0 2011-01-01 0 10000
1 2011-01-01 1 7000
2 2011-02-01 0 11000
3 2011-02-01 1 8000
4 2011-03-01 0 12000
# plot with seaborn
sns.lineplot(data=dfg, x='date_m', y='sales', hue='campaign')
.pivot_table the DataFrame
.pivot_table shapes the DataFrame correctly for plotting with pandas.DataFrame.plot, and it has an aggregation parameter.
The DataFrame is shaped into a wide format.
# pivot the dataframe into the correct shape for plotting
dfp = df.pivot_table(index='date_m', columns='campaign', values='sales', aggfunc='mean')
# display(dfp.head())
campaign 0 1
date_m
2011-01-01 10000 7000
2011-02-01 11000 8000
2011-03-01 12000 5000
2011-04-01 10500 6000
2011-05-01 10000 6000
# plot the dataframe
dfp.plot()
Plotting with matplotlib directly
fig, ax = plt.subplots(figsize=(8, 6))
for v in df.campaign.unique():
# select the data based on the campaign
data = df[df.campaign.eq(v)]
# this is only necessary if there is more than one value per date
data = data.groupby(['date_m','campaign'])['sales'].mean().reset_index()
ax.plot('date_m', 'sales', data=data, label=f'{v}')
plt.legend(title='campaign')
plt.show()
Notes
Package versions:
pandas v1.2.4
seaborn v0.11.1
matplotlib v3.3.4

How to extract hour:minute from a datetime stamp in Python

I have dataframe as given below: df=
POA ... Inverter efficiency
2019-01-25 08:00:00 20.608713 ... 0.708626
2019-01-29 08:00:00 200.250137 ... 0.017787
2019-01-29 08:30:00 347.699615 ... 0.000000
2019-01-29 09:00:00 492.822662 ... 0.000000
2019-01-29 09:30:00 620.336243 ...
.
.
2019-03-07 13:00:00 1151.468384 ... 1.067493
2019-03-07 13:30:00 1119.876831 ... 2.311577
2019-03-07 14:00:00 1038.760864 ... 3.395081
I want to plot 24 hours plot for all days. My code
plot(df.index.hour,df['POA'])
Result is:
However, there is a data at 08:30, 09:30,..., etc. But it is not reflected in plot. In fact, these intermediary hour data points are combined with 08, 09hr, etc data. So, my question is, how to show 08.30, 09.30,...,etc data as well on plot? (Looks like I have to extract both hour and minute from same datetime)
My accepted below answer gives following plot and this is what I wanted. But, x-axis ticks are clubbed together. They don't appear as in my first above plot. How to correct x-axis ticks in my second plot?: '

#rng = pd.date_range('1/5/2018 00:00', periods=5, freq='35T')
#df = pd.DataFrame({'POA':randint(1, 10, 5)}, index=rng)
labels = df.index.strftime('%H:%M')
x = np.arange(len(labels))
plt.plot(x, df['POA'])
plt.xticks(x, labels)
Steps:
labels = df.index.strftime('%H:%M') => Convert the datetime to "Hours:minutes" format to use as x labels
x = np.arange(len(labels)) => Create a dummy x axis for matplotlib
plt.plot(x, df['POA']) => Make the plot
plt.xticks(x, labels) => Replace the x labels with datetime
Assumption: The datetime index is sorted, if not the graph will be messed up. If the index is not in sorted order then sort it before plotting for correct results.
We can further enhance the x axis to include seconds, dates, etc by using the appropriate string formatter in df.index.strftime
Solution with skipping x-ticks to avoid clubbed x labels
#rng = pd.date_range('1/5/2018 00:00', periods=50, freq='35T')
#df = pd.DataFrame({'POA':randint(1, 10, 50)}, index=rng)
labels = df.index.strftime('%H:%M')
x = np.arange(len(labels))
fig, ax = plt.subplots()
plt.plot(x, df['POA'])
plt.xticks(x, labels)
skip_every_n = 10
for i, x_label in enumerate(ax.xaxis.get_ticklabels()):
if i % skip_every_n != 0:
x_label.set_visible(False)

Pandas dataframe groupby plot

I have a dataframe which is structured as:
Date ticker adj_close
0 2016-11-21 AAPL 111.730
1 2016-11-22 AAPL 111.800
2 2016-11-23 AAPL 111.230
3 2016-11-25 AAPL 111.790
4 2016-11-28 AAPL 111.570
...
8 2016-11-21 ACN 119.680
9 2016-11-22 ACN 119.480
10 2016-11-23 ACN 119.820
11 2016-11-25 ACN 120.740
...
How can I plot based on the ticker the adj_close versus Date?

Simple plot,
you can use:
df.plot(x='Date',y='adj_close')
Or you can set the index to be Date beforehand, then it's easy to plot the column you want:
df.set_index('Date', inplace=True)
df['adj_close'].plot()
If you want a chart with one series by ticker on it
You need to groupby before:
df.set_index('Date', inplace=True)
df.groupby('ticker')['adj_close'].plot(legend=True)
If you want a chart with individual subplots:
grouped = df.groupby('ticker')
ncols=2
nrows = int(np.ceil(grouped.ngroups/ncols))
fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(12,4), sharey=True)
for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
grouped.get_group(key).plot(ax=ax)
ax.legend()
plt.show()

Similar to Julien's answer above, I had success with the following:
fig, ax = plt.subplots(figsize=(10,4))
for key, grp in df.groupby(['ticker']):
ax.plot(grp['Date'], grp['adj_close'], label=key)
ax.legend()
plt.show()
This solution might be more relevant if you want more control in matlab.
Solution inspired by: https://stackoverflow.com/a/52526454/10521959

The question is How can I plot based on the ticker the adj_close versus Date?
This can be accomplished by reshaping the dataframe to a wide format with .pivot or .groupby, or by plotting the existing long form dataframe directly with seaborn.
In the following sample data, the 'Date' column has a datetime64[ns] Dtype.
Convert the Dtype with pandas.to_datetime if needed.
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
Imports and Sample Data
import pandas as pd
import pandas_datareader as web # for sample data; this can be installed with conda if using Anaconda, otherwise pip
import seaborn as sns
import matplotlib.pyplot as plt
# sample stock data, where .iloc[:, [5, 6]] selects only the 'Adj Close' and 'tkr' column
tickers = ['aapl', 'acn']
df = pd.concat((web.DataReader(ticker, data_source='yahoo', start='2020-01-01', end='2022-06-21')
.assign(ticker=ticker) for ticker in tickers)).iloc[:, [5, 6]]
# display(df.head())
Date Adj Close ticker
0 2020-01-02 73.785904 aapl
1 2020-01-03 73.068573 aapl
2 2020-01-06 73.650795 aapl
3 2020-01-07 73.304420 aapl
4 2020-01-08 74.483604 aapl
# display(df.tail())
Date Adj Close ticker
1239 2022-06-14 275.119995 acn
1240 2022-06-15 281.190002 acn
1241 2022-06-16 270.899994 acn
1242 2022-06-17 275.380005 acn
1243 2022-06-21 282.730011 acn
pandas.DataFrame.pivot & pandas.DataFrame.plot
pandas plots with matplotlib as the default backend.
Reshaping the dataframe with pandas.DataFrame.pivot converts from long to wide form, and puts the dataframe into the correct format to plot.
.pivot does not aggregate data, so if there is more than 1 observation per index, per ticker, then use .pivot_table
Adding subplots=True will produce a figure with two subplots.
# reshape the long form data into a wide form
dfp = df.pivot(index='Date', columns='ticker', values='Adj Close')
# display(dfp.head())
ticker aapl acn
Date
2020-01-02 73.785904 203.171112
2020-01-03 73.068573 202.832764
2020-01-06 73.650795 201.508224
2020-01-07 73.304420 197.157654
2020-01-08 74.483604 197.544434
# plot
ax = dfp.plot(figsize=(11, 6))
Use seaborn, which accepts long form data, so reshaping the dataframe to a wide form isn't necessary.
seaborn is a high-level api for matplotlib
sns.lineplot: axes-level plot
fig, ax = plt.subplots(figsize=(11, 6))
sns.lineplot(data=df, x='Date', y='Adj Close', hue='ticker', ax=ax)
sns.relplot: figure-level plot
Adding row='ticker', or col='ticker', will generate a figure with two subplots.
g = sns.relplot(kind='line', data=df, x='Date', y='Adj Close', hue='ticker', aspect=1.75)

Pandas: bar plot with multiIndex dataframe

I have a pandas DataFrame with a TIMESTAMP column (not the index), and the timestamp format is as follows:
2015-03-31 22:56:45.510
I also have columns called CLASS and AXLES. I would like to compute the count of records for each month separately for each unique value of AXLES (AXLES can take an integer value between 3-12).
I came up with a combination of resample and groupby:
resamp = dfWIM.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
This seems to give me a multiIndex dataframe object, as shown below.
In [72]: resamp
Out [72]:
AXLES TIMESTAMP
3 2014-07-31 5517
2014-08-31 31553
2014-09-30 42816
2014-10-31 49308
2014-11-30 44168
2014-12-31 45518
2015-01-31 54782
2015-02-28 52166
2015-03-31 47929
4 2014-07-31 3147
2014-08-31 24810
2014-09-30 39075
2014-10-31 46857
2014-11-30 42651
2014-12-31 48282
2015-01-31 42708
2015-02-28 43904
2015-03-31 50033
From here, how can I access different components of this multiIndex object to create a bar plot for the following conditions?
show data when AXLES = 3
show x ticks in the Month - Year format (no days, hours, minutes etc.)
Thanks!
EDIT: Following code gives me the plot, but I could not change the xtick formatting to MM-YY.
resamp[3].plot(kind='bar')
EDIT 2 below is a code snippet that generates a small sample of the data similar to what I have:
dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
resamp[3].plot(kind='bar')
EDIT 3:
Here below is the solution:
A.Plot the whole resampled dataframe (based on #Ako 's suggestion):
df = resamp.unstack(0)
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)
B.Plot an individual index from the resampled dataframe (based on #Alexander 's suggestion):
df = resamp[3]
df.index = [ts.strftime('%b 20%y') for ts in df.index]
df.plot(kind='bar', rot=0)

You could generate and set the labels explicitly using ax.xaxis.set_major_formatter with a ticker.FixedFormatter. This will allow you to keep your DataFrame's MultiIndex with timestamp values, while displaying the timestamps in the desired %m-%Y format:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.ticker as ticker
dftest = {'TIMESTAMP':['2014-08-31','2014-09-30','2014-10-31'], 'AXLES':[3, 3, 3], 'CLASS':[5,6,7]}
dfTest = pd.DataFrame(dftest)
dfTest.TIMESTAMP = pd.to_datetime(pd.Series(dfTest.TIMESTAMP))
resamp = dfTest.set_index('TIMESTAMP').groupby('AXLES').resample('M', how='count').CLASS
ax = resamp[3].plot(kind='bar')
ticklabels = [timestamp.strftime('%m-%Y') for axle, timestamp in resamp.index]
ax.xaxis.set_major_formatter(ticker.FuncFormatter(lambda x, pos: ticklabels[int(x)]))
plt.gcf().autofmt_xdate()
plt.show()
yields

The following should work, but it is difficult to test without some data.
Start by resetting your index to get access to the TIMESTAMP column. Then use strftime to format it to your desired text representation (e.g. mm-yy). Finally, reset the index back to AXLES and TIMESTAMP.
df = resamp.reset_index()
df['TIMESTAMP'] = [ts.strftime('%m-%y') for ts in df.TIMESTAMP]
df.set_index(['AXLES', 'TIMESTAMP'], inplace=True)
>>> df.xs(3, level=0).plot(kind='bar')

Matplotlib: Making a line graph's datetime x axis labels look like Excel

I have a simple pandas DataFrame with yearly values that I am plotting as a line graph:
import matplotlib.pyplot as plt
import pandas as pd
>>>df
a b
2010-01-01 9.7 9.0
2011-01-01 8.8 14.2
2012-01-01 8.4 7.6
2013-01-01 9.6 8.4
2014-01-01 8.2 5.5
The expected format for the X axis is to use no margins for the labels:
fig = plt.figure(0)
ax = fig.add_subplot(1, 1, 1)
df.plot(ax = ax)
But I would like to force the values to plot in the middle of the year range, like as done in excel:
I have tried setting the x axis margins:
ax.margins(xmargin = 1)
But can see no difference.

If you just want to move the dates, you could try adding this line at the end:
ax.set_xlim(ax.get_xlim()[0] - 0.5, ax.get_xlim()[1] + 0.5)
If you need to format the dates as well you could either modify your index or make changes in the plotted ticks like so:
(presuming that you df.index is a datetime object)
ax.set_xticklabels(df.index.to_series().apply(lambda x: x.strftime('%d/%m/%Y')))
This will format the dates to look like your Excel example.
Or you could change your index to look like you want and then call .plot():
df.index = df.index.to_series().apply(lambda x: x.strftime('%d/%m/%Y'))
print df.index.tolist()
['01/01/2010', '01/01/2011', '01/01/2012', '01/01/2013', '01/01/2014']
And, if you index is not datetime you need to convert it first like this:
df.index = pd.to_datetime(df.index)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Arrange pandas DataFrame for color Plotting - python

Related

How to aggregate a metric and plot groups separately

How to extract hour:minute from a datetime stamp in Python

Pandas dataframe groupby plot

Pandas: bar plot with multiIndex dataframe

Matplotlib: Making a line graph's datetime x axis labels look like Excel

Categories

Resources