Month,Year with Value Plot,Pandas and MatPlotLib - python

I have a DataFrame with month, year and Value columns, and I want to make a time series plot.
Sample:
month year Value
12 2016 0.006437804129357764
1 2017 0.013850880792606646
2 2017 0.013330349031207292
3 2017 0.07663058273768052
4 2017 0.7822831457266424
5 2017 0.8089573099244689
6 2017 1.1634845000200715
I'm trying to plot the Value data against year and month, with year and month on the x-axis and Value on the y-axis.

One way is this:
import pandas as pd
import matplotlib.pyplot as plt
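# Build a 'month-year' string, then parse it into real datetimes
df['date'] = df['month'].map(str) + '-' + df['year'].map(str)
# Keep the datetimes (no strftime back to strings) so the x-axis is a true time axis
df['date'] = pd.to_datetime(df['date'], format='%m-%Y')
fig, ax = plt.subplots()
# ax.plot handles datetimes directly; plot_date is deprecated in recent Matplotlib
ax.plot(df['date'], df['Value'], marker='o')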
plt.show()

You need to set a DatetimeIndex for pandas to plot the axis properly. A one-line modification of your DataFrame (assuming you no longer need year and month as columns, and that using the first day of each month is acceptable) would do:
df.set_index(pd.to_datetime({
    'day': 1,
    'month': df.pop('month'),
    'year': df.pop('year')
}), inplace=True)
df.Value.plot()
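If the month and year columns should stay in the DataFrame, a non-destructive variant of the same idea can be sketched as below. This is only a sketch: it uses the first four rows of the sample from the question, and the names dates and series are chosen here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First four rows of the sample data from the question
df = pd.DataFrame({
    'month': [12, 1, 2, 3],
    'year': [2016, 2017, 2017, 2017],
    'Value': [0.006437804129357764, 0.013850880792606646,
              0.013330349031207292, 0.07663058273768052],
})

# pd.to_datetime can assemble dates from year/month/day columns;
# day=1 is an assumption (first day of each month)
dates = pd.to_datetime(df[['year', 'month']].assign(day=1))

# Use the dates as the index without dropping the original columns from df
series = df.set_index(dates)['Value']
ax = series.plot(marker='o')
ax.set_ylabel('Value')
```

Because df itself is left unchanged, the month and year columns remain available for later grouping or filtering.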

Related

Change xtick labels from pandas plot to different date format

I plot data from this dataframe.
Date 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Date
01-01 12.896 13.353 12.959 13.011 13.073 12.721 12.643 12.484 12.876 13.102
01-02 12.915 13.421 12.961 13.103 13.125 12.806 12.644 12.600 12.956 13.075
01-03 12.926 13.379 13.012 13.116 13.112 12.790 12.713 12.634 12.959 13.176
01-04 13.051 13.414 13.045 13.219 13.051 12.829 12.954 12.724 13.047 13.187
01-05 13.176 13.417 13.065 13.148 13.115 12.874 12.956 12.834 13.098 13.123
The plot generates a line plot for each year across every day of the year.
ice_data.plot(figsize=(20,12), title='Arctic Sea Ice Extent', lw=4, fontsize=16, ax=ax, grid=True)
However, this plot generates the xtick labels in the format taken from the dataframe index, which is 01-01 (January 1). I want the xtick labels to read 01 Jan instead, but I cannot figure out how to manipulate the dataframe using strftime() to accomplish this.
Convert the index values to datetimes with to_datetime, prepending a leap year (such as 2000) so that 02-29 parses correctly, then convert them to the custom format with DatetimeIndex.strftime:
idx = pd.to_datetime('2000' + ice_data.index, format='%Y%m-%d').strftime('%d %b')
ice_data = ice_data.set_index(idx)
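An alternative, sketched here under assumptions, is to keep real datetimes on the x-axis and let Matplotlib format the tick labels with matplotlib.dates.DateFormatter. The small frame below reuses the first three rows and two year columns from the question; plotting goes through Matplotlib directly rather than DataFrame.plot, and the month abbreviation depends on the active locale.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative subset of the data shown in the question
ice_data = pd.DataFrame(
    {'2011': [12.896, 12.915, 12.926], '2012': [13.353, 13.421, 13.379]},
    index=['01-01', '01-02', '01-03'],
)

# Prepend a leap year so that a '02-29' entry would also parse
dates = pd.to_datetime('2000-' + ice_data.index, format='%Y-%m-%d')

fig, ax = plt.subplots()
for year in ice_data.columns:
    ax.plot(dates, ice_data[year], label=year)

# Show ticks as day plus abbreviated month, e.g. 01 Jan
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d %b'))
ax.legend()
```

With this approach the index is never turned into strings, so Matplotlib can still choose sensible tick positions when the full year of data is plotted.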

Creating a HeatMap from a Pandas MultiIndex Series

I have a pandas DataFrame and I need to create a heatmap. My data looks like this, and I'd like to put the years in columns and the days in rows, then use that with Seaborn to create a heatmap.
I tried multiple ways but I always got "inconsistent shape" when I passed the DataFrame, so any recommendation on how to transform it?
Year and day are the index levels of this series:
2016
Tuesday 4
Wednesday 6
.....
2017
Tuesday 4.4
Monday 3.5
....
import seaborn as sns
ax = sns.heatmap(dayofweek)
If you have a DataFrame like this:
import numpy as np
import pandas as pd
years = range(2016,2019)
months = range(1,6)
df = pd.DataFrame(index=pd.MultiIndex.from_product([years,months]))
df['vals'] = np.random.random(size=len(df))
You can reformat the data to a rectangular shape using:
df2 = df.reset_index().pivot(columns='level_0',index='level_1',values='vals')
sns.heatmap(df2)
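For a Series with a (year, day) MultiIndex like the one in the question, Series.unstack gives the same rectangular shape more directly. The following is a minimal sketch that uses only the four values visible in the question; the level names year and day are assumptions made here for readability.

```python
import pandas as pd

# The four rows visible in the question; the elided rows are left out
index = pd.MultiIndex.from_tuples(
    [(2016, 'Tuesday'), (2016, 'Wednesday'), (2017, 'Tuesday'), (2017, 'Monday')],
    names=['year', 'day'],  # assumed level names
)
dayofweek = pd.Series([4, 6, 4.4, 3.5], index=index)

# Move the 'year' level into the columns: days become rows, years become columns.
# Combinations missing from the Series show up as NaN.
table = dayofweek.unstack(level='year')
print(table)
```

The resulting table can then be passed to sns.heatmap(table); NaN cells are normally left blank in the heatmap.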

Python Pandas. Date object split by separate columns.

I have dates in Python (pandas) written as "1/31/2010". To apply linear regression I want to have 3 separate variables: number of day, number of month, number of year.
What will be the way to split a column with date in pandas into 3 columns?
Another question is to have the same but group days into 3 groups: 1-10, 11-20, 21-31.
df['date'] = pd.to_datetime(df['date'])
#Create 3 additional columns
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
Ideally, you can do this without creating 3 additional columns; you can just pass the Series to your function.
In [2]: pd.to_datetime('01/31/2010').day
Out[2]: 31
In [3]: pd.to_datetime('01/31/2010').month
Out[3]: 1
In [4]: pd.to_datetime('01/31/2010').year
Out[4]: 2010
This answers only your first question
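For the second question (grouping days into 1-10, 11-20 and 21-31), one possible approach is pandas.cut on the day of the month. This is a sketch under assumptions: the sample dates and the column name day_group are made up here for illustration.

```python
import pandas as pd

# Hypothetical sample dates in the "1/31/2010" style from the question
df = pd.DataFrame({'date': ['1/31/2010', '2/5/2010', '3/15/2010']})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Bins are right-inclusive by default: (0, 10], (10, 20], (20, 31]
df['day_group'] = pd.cut(
    df['date'].dt.day,
    bins=[0, 10, 20, 31],
    labels=['1-10', '11-20', '21-31'],
)
print(df)
```

The result is a categorical column, which can be turned into dummy variables with pd.get_dummies if the regression needs numeric inputs.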
One solution is to extract attributes of pd.Timestamp objects using operator.attrgetter.
The benefit of this method is you can easily expand / change the attributes you require. In addition, the logic is not specific to object type.
from operator import attrgetter
import pandas as pd
df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
attr_list = ['day', 'month', 'year']
attrs = attrgetter(*attr_list)
df[attr_list] = df['date'].apply(attrs).apply(pd.Series)
print(df)
date day month year
0 2010-01-21 21 1 2010
1 2015-05-05 5 5 2015
2 2018-04-30 30 4 2018
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'yyyymmdd': ['20150204', '20160305']})
for col, field in [("year", "%Y"), ("month", "%m"), ("day", "%d")]:
    df[col] = df["yyyymmdd"].apply(
        lambda cell: datetime.strptime(cell, "%Y%m%d").strftime(field))
print(df)
yyyymmdd year month day
0 20150204 2015 02 04
1 20160305 2016 03 05

When plotting datetime index data, put markers in the plot on specific days (e.g. weekend)

I create a pandas dataframe with a DatetimeIndex like so:
import datetime
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create datetime index and random data column
todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=14, freq='D')
data = np.random.randint(1, 10, size=14)
columns = ['A']
df = pd.DataFrame(data, index=index, columns=columns)
# initialize new weekend column, then set all values to 'yes' where the index corresponds to a weekend day
df['weekend'] = 'no'
df.loc[(df.index.weekday == 5) | (df.index.weekday == 6), 'weekend'] = 'yes'
print(df)
Which gives
A weekend
2014-10-13 7 no
2014-10-14 6 no
2014-10-15 7 no
2014-10-16 9 no
2014-10-17 4 no
2014-10-18 6 yes
2014-10-19 4 yes
2014-10-20 7 no
2014-10-21 8 no
2014-10-22 8 no
2014-10-23 1 no
2014-10-24 4 no
2014-10-25 3 yes
2014-10-26 8 yes
I can easily plot the A column with pandas by doing:
df.plot()
plt.show()
which plots a line of the A column but leaves out the weekend column as it does not hold numerical data.
How can I put a "marker" on each spot of the A column where the weekend column has the value yes?
In the meantime I found out that it is as simple as using boolean indexing in pandas. Here I do the plot directly with pyplot instead of pandas' own plot wrapper (which is more convenient for me):
plt.plot(df.index, df.A)
plt.plot(df[df.weekend=='yes'].index, df[df.weekend=='yes'].A, 'ro')
Now the red dots mark all weekend days, i.e. the rows where df.weekend == 'yes'.
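The same marking can also be done in a single plot call with Matplotlib's markevery argument, without the helper weekend column. This is a sketch that rebuilds the frame with the dates and A values printed above; weekend_positions is a name introduced here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same dates and A values as in the printed frame above
index = pd.date_range('2014-10-13', periods=14, freq='D')
df = pd.DataFrame({'A': [7, 6, 7, 9, 4, 6, 4, 7, 8, 8, 1, 4, 3, 8]}, index=index)

# Integer positions of Saturdays (5) and Sundays (6)
weekend_positions = np.flatnonzero(df.index.weekday >= 5)

fig, ax = plt.subplots()
# markevery restricts the markers to the listed positions; the line is drawn everywhere
ax.plot(df.index, df['A'], marker='o', markerfacecolor='red',
        markeredgecolor='red', markevery=weekend_positions.tolist())
```

The two-call version above is easier to style independently (for example with a separate legend entry), while markevery keeps everything on one line object.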

How to plot different groups of data from a dataframe into a single figure

I have a temperature file with many years temperature records, in a format as below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Every year has a different number of records, taken at different times, so the pandas DatetimeIndex differs from year to year.
I want to plot the different years' data in the same figure for comparison. The x-axis is Jan to Dec and the y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you are running a Jupyter/IPython notebook and having problems using:
ax = df1.plot()
df2.plot(ax=ax)
Run both commands inside the same cell. For some reason it won't work when they are separated into sequential cells, at least for me.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# group by month and year, and aggregate the mean of the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year 2005 2006 2007 2012
Month
4 NaN 20.6 NaN 20.7
5 NaN NaN 15.533333 NaN
8 19.566667 NaN NaN NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
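If the individual records should be compared without averaging them per month, one possible variant is to plot each year against the day of the year. This is a minimal sketch that reads the ten rows from the question out of an in-memory string instead of temp.csv; raw, year and group are names introduced here.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt

# The ten records shown in the question
raw = """2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
"""
df = pd.read_csv(io.StringIO(raw), names=['date', 'time', 'value'],
                 parse_dates=['date'])

fig, ax = plt.subplots()
# One line per year, positioned by day of year so all years share the x-axis
for year, group in df.groupby(df['date'].dt.year):
    ax.plot(group['date'].dt.dayofyear, group['value'],
            marker='.', label=str(year))
ax.set_xlabel('Day of year')
ax.set_ylabel('Temperature')
ax.legend()
```

Day of year differs by one after February between leap and non-leap years, which is usually negligible for this kind of comparison.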
To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
    if BAR == 'FOO':
        pass
    else:
        dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
    dict_of_dfs[BAR].plot(ax=ax)
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)
