Pandas: changing years based on an int value

I'm trying to add years to the dates in one column based on a number in another column.
This is what I mean:
base_date amount_years
0 2006-09-01 2
1 2007-04-01 4
The result would be:
base_date amount_years
0 2008-09-01 2
1 2011-04-01 4
Is there a way to achieve this in python?

Use DateOffset with apply and axis=1 to process the DataFrame row by row:
import pandas as pd
# add the per-row number of years to each date
f = lambda x: x['base_date'] + pd.offsets.DateOffset(years=x['amount_years'])
df['base_date'] = df.apply(f, axis=1)
print(df)
base_date amount_years
0 2008-09-01 2
1 2011-04-01 4
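For reference, here is a self-contained sketch of the same idea that can be run as is. The third row (a leap day) and the helper name add_years are my own additions for illustration and are not part of the question's data.

```python
import pandas as pd

# Sample data from the question, plus one leap-day row (an assumption added
# here to show how DateOffset treats 29 February).
df = pd.DataFrame({
    "base_date": pd.to_datetime(["2006-09-01", "2007-04-01", "2016-02-29"]),
    "amount_years": [2, 4, 1],
})

def add_years(row):
    # hypothetical helper: shift one row's date by that row's number of years
    return row["base_date"] + pd.DateOffset(years=int(row["amount_years"]))

df["base_date"] = df.apply(add_years, axis=1)
print(df)
```

DateOffset is calendar-aware, so the leap-day row ends up on 2017-02-28 rather than raising an error.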

Related

Get sum of values month, year wise using Pandas

Below is the data that I have:
There are three columns: class, date and marks.
I need my output to be in the format below:
Here the column headers 1, 2, 3 are the classes, and each column contains the total marks per month and year.
My approach was to use the following logic:
(Although I get the result, I am not satisfied with the code. I was hoping someone could help me with a more efficient solution.)
# "class" is a reserved word in Python, so the variable is named "classes"
classes = sorted(df['Class'].unique())
dict_all = dict.fromkeys(classes, 1)
for c in classes:
    actuals = []
    for i in range(2018, 2019):
        for j in range(1, 13):
            a = df['date'].map(lambda x: x.year == i)
            b = df['date'].map(lambda x: x.month == j)
            # total marks of class c in year i, month j
            x = df[a & b & (df['Class'] == c)].marks.sum()
            actuals.append(x)
    dict_all[c] = actuals
result = pd.DataFrame.from_dict(dict_all)

Input:
Class Date Total
0 3 01-01-2018 32
1 2 01-01-2018 69
2 1 01-01-2018 129
3 3 01-01-2019 12
Using df.pivot:
df1 = df.pivot(index="Date", columns="Class", values="Total").reset_index().fillna(0)
print(df1)
Using crosstab:
df1 = pd.crosstab(index=df["Date"], columns=df["Class"], values=df["Total"], aggfunc="max").reset_index().fillna(0)
print(df1)
Using groupby and unstack():
df1 = df.groupby(['Date', 'Class'])['Total'].max().unstack(fill_value=0).reset_index()
print(df1)
All three approaches give the same output:
Class Date 1 2 3
0 01-01-2018 129 69 32
1 01-01-2019 0 0 12
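The snippets above keep one row per exact date and take the max. If the goal is the total per month and year, as the question asks, one possible sketch is to group on a monthly period and sum. The extra row dated 15-01-2018 and the day-first date format are assumptions made here only to show several dates collapsing into one month.

```python
import pandas as pd

# Sample input from the answer, plus one extra January 2018 row for class 1
# (an assumption, added so the monthly sum differs from the per-date values).
df = pd.DataFrame({
    "Class": [3, 2, 1, 3, 1],
    "Date": ["01-01-2018", "01-01-2018", "01-01-2018", "01-01-2019", "15-01-2018"],
    "Total": [32, 69, 129, 12, 1],
})

# Assumption: the dates are written day-first (DD-MM-YYYY).
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Group by calendar month and class, sum the totals, then spread the
# classes into columns; month/class pairs without data become 0.
monthly = (
    df.groupby([df["Date"].dt.to_period("M"), "Class"])["Total"]
      .sum()
      .unstack(fill_value=0)
)
print(monthly)
```

This replaces the nested year/month loops with a single groupby, so the cost no longer grows with the number of years scanned.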

Create an index for each month in an array for a specific time interval

I have the beginning and end of an interval and want to create an index for each month in it.
I'm using pandas, but with the following approach I have to calculate the number of months myself:
import pandas as pd
pd.period_range('2014-04', periods=<number-of-month>, freq='M')
Is there a way to create it automatically? For example, I would pass the beginning and end of the interval as two arguments and get an index for each month; in other words:
pseudo-code:
pd.period_range(start='2014-04', end='2014-07', freq='M')
The expected output for the above pseudo-code is [0, 0, 0], because there are 3 months from 2014-04 to 2014-07.
This is the DataFrame I want to build, with rows accessible by index:
index date count
0 2014-04 0
1 2014-05 0
2 2014-06 0
Initially every entry is zero; I call this column count. I want to increment the count column by date, for example:
a = pd.period_range(start='2014-04', end='2014-07', freq='M')
a['2014-04'] += 1
index date count
0 2014-04 1
1 2014-05 0
2 2014-06 0
How can I implement it?

Create a PeriodIndex with period_range, then use loc to add 1 to the count column:
a = pd.period_range(start='2014-04', end='2014-07', freq='M')
df = pd.DataFrame({'count':0}, index=a)
df.loc['2014-04', 'count'] += 1
print (df)
count
2014-04 1
2014-05 0
2014-06 0
2014-07 0
Solution with Series:
a = pd.period_range(start='2014-04', end='2014-07', freq='M')
s = pd.Series(0, index=a)
s['2014-04'] += 1
print (s)
2014-04 1
2014-05 0
2014-06 0
2014-07 0
Freq: M, dtype: int64
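Note that period_range includes the end period, which is why the outputs above have four rows while the question expected three. If the end month should be excluded, one possible sketch is to drop the last period; the names months, half_open and counts are only illustrative.

```python
import pandas as pd

# period_range treats both start and end as inclusive.
months = pd.period_range(start="2014-04", end="2014-07", freq="M")
print(len(months))

# Drop the last period to get a half-open interval [2014-04, 2014-07).
half_open = months[:-1]

# One zero-initialised counter per remaining month.
counts = pd.Series(0, index=half_open)
counts.loc[pd.Period("2014-04", freq="M")] += 1
print(counts)
```

Whether the end month belongs to the interval is a design choice; the slice makes that choice explicit instead of adjusting the end string by hand.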

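If I understand correctly, build a pandas.Series indexed by pd.date_range(...). Note that the index then holds month-end timestamps rather than periods, and that recent pandas releases prefer the alias "ME" over "M" for month-end in date_range: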
import pandas as pd
s = pd.Series(0, index=pd.date_range(start='2014-04', end='2019-08', freq="M"))
s['2014-04'] += 1
s.head()
Output:
2014-04-30 1
2014-05-31 0
2014-06-30 0
2014-07-31 0
2014-08-31 0
Freq: M, dtype: int64

Accessing different columns from DataFrame in transform

I want to write a transformation function accessing two columns from a DataFrame and pass it to transform().
Here is the DataFrame which I would like to modify:
print(df)
date increment
0 2012-06-01 0
1 2003-04-08 1
2 2009-04-22 3
3 2018-05-24 6
4 2006-09-25 2
5 2012-11-02 4
I would like to increment the year in column date by the number of years given in column increment. My attempt (which does not work) is:
df.transform(lambda df: date(df.date.year + df.increment, 1, 1))
Is there a way to access individual columns in the function (here a lambda function) passed to transform()?

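You can use pandas.to_timedelta. Note that unit='Y' adds an average year of 365.2425 days per increment, so the results drift away from the original calendar day (see the times in the output below), and newer pandas versions no longer accept unit='Y':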
# If necessary convert to date type first
# df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'] + pd.to_timedelta(df['increment'], unit='Y')
[out]
date increment
0 2012-06-01 00:00:00 0
1 2004-04-07 05:49:12 1
2 2012-04-21 17:27:36 3
3 2024-05-23 10:55:12 6
4 2008-09-24 11:38:24 2
5 2016-11-01 23:16:48 4
or alternatively:
df['date'] = pd.to_datetime({'year': df.date.dt.year.add(df.increment),
                             'month': df.date.dt.month,
                             'day': df.date.dt.day})
[out]
date increment
0 2012-06-01 0
1 2004-04-08 1
2 2012-04-22 3
3 2024-05-24 6
4 2008-09-25 2
5 2016-11-02 4
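Your own solution could also be fixed by using the apply method with axis=1 instead (note that, as written, it sets every date to January 1 of the new year):
from datetime import date
# with axis=1 the lambda receives one row at a time
df.apply(lambda row: date(row['date'].year + row['increment'], 1, 1), axis=1)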
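Another option that stays on calendar dates without calling a function per row is to apply one DateOffset per distinct increment value. This is only a sketch under the assumption that the number of distinct increments is small compared to the number of rows; the names shifted, years and labels are illustrative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-06-01", "2003-04-08", "2009-04-22",
                            "2018-05-24", "2006-09-25", "2012-11-02"]),
    "increment": [0, 1, 3, 6, 2, 4],
})

shifted = df["date"].copy()
# groups maps each distinct increment to the row labels that carry it,
# so every group is shifted with a single vectorised addition.
for years, labels in df.groupby("increment").groups.items():
    shifted.loc[labels] = df.loc[labels, "date"] + pd.DateOffset(years=int(years))

df["date"] = shifted
print(df)
```

With many distinct increment values this loop loses its advantage, and the row-wise apply or the to_datetime construction shown earlier may be simpler.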

better way to create a new column instead of for loop

Is there a faster way to write this code?
I just want to calculate t_last - t_i and create a new column.
time_ges = pd.DataFrame()
for i in range(0, len(df.GesamteMessung_Sec.index), 1):
    time = df.GesamteMessung_Sec.iloc[-1] - df.GesamteMessung_Sec.iloc[i]
    time_ges = time_ges.append(pd.DataFrame({'echte_Ladezeit': time}, index=[0]), ignore_index=True)
df['echte_Ladezeit'] = time_ges
This code takes a long time to run. Is there a better way to do this?
thanks, R

You can subtract the column GesamteMessung_Sec from its last value and use to_frame to convert the resulting Series to a DataFrame:
df = pd.DataFrame({'GesamteMessung_Sec':[10,2,1,5]})
print (df)
GesamteMessung_Sec
0 10
1 2
2 1
3 5
time_ges = (df.GesamteMessung_Sec.iloc[-1] - df.GesamteMessung_Sec).to_frame('echte_Ladezeit')
print (time_ges )
echte_Ladezeit
0 -5
1 3
2 4
3 0
If you need it as a new column of the original DataFrame:
df = pd.DataFrame({'GesamteMessung_Sec':[10,2,1,5]})
df['echte_Ladezeit'] = df.GesamteMessung_Sec.iloc[-1] - df.GesamteMessung_Sec
print (df)
GesamteMessung_Sec echte_Ladezeit
0 10 -5
1 2 3
2 1 4
3 5 0

How to efficiently add rows for those data points which are missing from a sequence using pandas?

I have the following time series dataset of the number of sales happening for a day as a pandas data frame.
date, sales
20161224,5
20161225,2
20161227,4
20161231,8
Now I want to include the missing data points (i.e. the missing dates) with a constant value (zero), so that the data looks like the following. How can I do this efficiently with pandas (assuming the data frame is ~50MB)?
date, sales
20161224,5
20161225,2
20161226,0**
20161227,4
20161228,0**
20161229,0**
20161230,0**
20161231,8
**Missing rows that have been added to the data frame.
Any help will be appreciated.

First convert the date column with to_datetime, then use set_index and reindex over the full date range from the minimum to the maximum index value, then reset_index; if necessary, change the format with strftime:
df.date = pd.to_datetime(df.date, format='%Y%m%d')
df = df.set_index('date')
# the parentheses let the method chain span several lines
df = (df.reindex(pd.date_range(df.index.min(), df.index.max()), fill_value=0)
        .reset_index()
        .rename(columns={'index':'date'}))
print (df)
date sales
0 2016-12-24 5
1 2016-12-25 2
2 2016-12-26 0
3 2016-12-27 4
4 2016-12-28 0
5 2016-12-29 0
6 2016-12-30 0
7 2016-12-31 8
Finally, if you need to change the format back:
df.date = df.date.dt.strftime('%Y%m%d')
print (df)
date sales
0 20161224 5
1 20161225 2
2 20161226 0
3 20161227 4
4 20161228 0
5 20161229 0
6 20161230 0
7 20161231 8
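A shorter variant of the same idea uses asfreq, which reindexes to a regular daily frequency between the first and last date. This is a sketch on the question's sample data; it assumes the dates are unique, and the name filled is illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "date": ["20161224", "20161225", "20161227", "20161231"],
    "sales": [5, 2, 4, 8],
})
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# asfreq("D") inserts a row for every missing calendar day;
# fill_value=0 is used for the rows that are added.
filled = df.set_index("date").asfreq("D", fill_value=0).reset_index()

# Convert back to the compact YYYYMMDD text format if required.
filled["date"] = filled["date"].dt.strftime("%Y%m%d")
print(filled)
```

Because the index keeps its name, reset_index restores the date column directly and no rename step is needed.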
