I have a dataframe, df, and I want to calculate the delta over each 7-day period:
Monday Tuesday Wednesday Thursday Friday Sat Sun
5 10 15 20 25 30 35
1 2 3 4 5 6 7
I would like to find the delta for the first row, starting with Monday (5) and ending on Sun (35)
The delta for the first 7 day time period would be: 35 - 5 = 30
The next 7 day window delta would be: 7 - 1 = 6 and so on
The date would begin on 1/1/2020 and continue by 7 day or weekly increments.
Desired output: (New dataframe with the newly created Date and Delta columns)
Date Delta
1/1/2020 30
1/8/2020 6
This is what I am doing:
import pandas as pd
import numpy as np
df = pd.read_csv('df.csv')
df['Delta'] = df['Sunday'] - df['Monday]
df['Date'] = pd.date_range(start='1/1/2020', periods=len(df), freq='Weeks')
df2.to_csv('df2.csv')
Any suggestion is appreciated
Let's try building the dates with date_range, using a multiple in freq:
df['Delta'] = df.Sun.sub(df.Monday)
df['Date'] = pd.date_range(pd.Timestamp('2020-01-01'), periods=len(df), freq='7D')
or simply
df = df.assign(
    Delta=df.Sun.sub(df.Monday),
    Date=pd.date_range(pd.Timestamp('2020-01-01'), periods=len(df), freq='7D'),
)
Monday Tuesday Wednesday Thursday Friday Sat Sun Delta Date
0 5 10 15 20 25 30 35 30 2020-01-01
1 1 2 3 4 5 6 7 6 2020-01-08
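If only the Date and Delta columns from the desired output are needed, a separate frame can be built instead of adding columns to df. This is a minimal sketch: the sample values are copied from the question, and the name df2 is taken from the question's to_csv call.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    [[5, 10, 15, 20, 25, 30, 35], [1, 2, 3, 4, 5, 6, 7]],
    columns=["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Sat", "Sun"],
)

# New frame holding only the weekly start date and the Sun - Monday delta
df2 = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=len(df), freq="7D"),
    "Delta": df["Sun"] - df["Monday"],
})
print(df2)
```

Using periods=len(df) keeps the number of dates tied to the number of rows, so the same code should work when more weeks are added.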
# necessary imports
import datetime
import pandas as pd
Can do:
numdays=5
base = datetime.datetime(2020,1,1)
date_list = [base + datetime.timedelta(days=7*x) for x in range(numdays)]
Then:
df=pd.DataFrame({'Date':date_list})
If you have another list of values, e.g. Deltas_list, that you want to include in this dataframe:
Deltas_list=[0,1,2,3,4]
Deltas=pd.Series(Deltas_list)
df['Delta']=Deltas
df will be:
Date Delta
0 2020-01-01 0
1 2020-01-08 1
2 2020-01-15 2
3 2020-01-22 3
4 2020-01-29 4
I am using .weekday() to find the day of the week as an integer (Monday = 0 ... Sunday = 6) for every day from today until next year (+365 days from today). The problem is that if the 1st of the month falls mid-week, I need the day of the week to be counted from the 1st, with the 1st day of the month being 0.
Ex. If the month starts on a Wednesday, then Wednesday = 0 ... Sunday = 4 (for that week only).
[Image: annotated picture of a month explaining what I want to do]
I originally had the code below, but it is wrong: the first branch always runs, since weekday() is always less than 7.
import datetime
from datetime import date
for day in range(1, 365):
    departure_date = date.today() + datetime.timedelta(days=day)
    if departure_date.weekday() < 7:
        day_of_week = departure_date.day
    else:
        day_of_week = departure_date.weekday()
The following seems to do the job properly:
import datetime as dt

def custom_weekday(date):
    if date.weekday() > (date.day - 1):
        return date.day - 1
    else:
        return date.weekday()

for day in range(1, 366):
    departure_date = dt.date.today() + dt.timedelta(days=day)
    day_of_week = custom_weekday(date=departure_date)
    print(departure_date, day_of_week, departure_date.weekday())
Your code had two small bugs:
the if condition was wrong
days are represented inconsistently: date.weekday() is 0-based, date.day is 1-based
For every date, get the first week of that month. Then, check if the date is within that first week. If it is, use the .day - 1 value (since you are 0-based). Otherwise, use the .weekday().
from datetime import date, datetime, timedelta

for day in range(-5, 40):
    departure_date = date.today() + timedelta(days=day)
    first_week = date(departure_date.year, departure_date.month, 1).isocalendar()[1]
    if first_week == departure_date.isocalendar()[1]:
        day_of_week = departure_date.day - 1
    else:
        day_of_week = departure_date.weekday()
    print(departure_date, day_of_week)
2021-08-27 4
2021-08-28 5
2021-08-29 6
2021-08-30 0
2021-08-31 1
2021-09-01 0
2021-09-02 1
2021-09-03 2
2021-09-04 3
2021-09-05 4
2021-09-06 0
2021-09-07 1
2021-09-08 2
2021-09-09 3
2021-09-10 4
2021-09-11 5
2021-09-12 6
2021-09-13 0
2021-09-14 1
2021-09-15 2
2021-09-16 3
2021-09-17 4
2021-09-18 5
2021-09-19 6
2021-09-20 0
2021-09-21 1
2021-09-22 2
2021-09-23 3
2021-09-24 4
2021-09-25 5
2021-09-26 6
2021-09-27 0
2021-09-28 1
2021-09-29 2
2021-09-30 3
2021-10-01 0
2021-10-02 1
2021-10-03 2
2021-10-04 0
2021-10-05 1
2021-10-06 2
2021-10-07 3
2021-10-08 4
2021-10-09 5
2021-10-10 6
For any date D.M.Y, get the weekday W of 1.M.Y.
Then you need to adjust weekday value only for the first 7-W days of that month. To adjust, simply subtract the value W.
Example for September 2021: the first date of month (1.9.2021) is a Wednesday, so W is 2. You need to adjust weekdays for dates 1.9.2021 to 5.9.2021 (because 7-2 is 5) in that month by minus 2.
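The rule above can be sketched as a small function. The name adjusted_weekday is hypothetical, chosen only for this illustration; everything else uses the standard datetime module.

```python
from datetime import date

def adjusted_weekday(d: date) -> int:
    # W: weekday of the first day of d's month (Monday = 0 ... Sunday = 6)
    w = d.replace(day=1).weekday()
    # Only the first 7 - W days of the month fall in the partial first week
    if d.day <= 7 - w:
        return d.weekday() - w
    return d.weekday()

# September 2021 starts on a Wednesday, so W is 2
for d in (date(2021, 9, 1), date(2021, 9, 5), date(2021, 9, 6)):
    print(d, adjusted_weekday(d))
```

For a month that starts on a Monday, W is 0 and the function simply returns the normal weekday.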
Here is the data:
id  date    population
1   2021-5  21
2   2021-5  22
3   2021-5  23
4   2021-5  24
1   2021-4  17
2   2021-4  24
3   2021-4  18
4   2021-4  29
1   2021-3  20
2   2021-3  29
3   2021-3  17
4   2021-3  22
I want to calculate the monthly change in population for each id, so the result will be:
id  date  delta
1   5     .2353
1   4     -.15
2   5     -.1519
2   4     -.2083
3   5     .2174
3   4     .0556
4   5     -.2083
4   4     .3182
delta := (this month - last month) / last month
How should I approach this in pandas? I'm thinking of groupby, but I don't know what to do next.
remember there might be more dates. but results is always
Use GroupBy.pct_change after sorting by id and by date (descending), then remove the rows with missing values in the delta column:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['id','date'], ascending=[True, False])
df['delta'] = df.groupby('id')['population'].pct_change(-1)
df = df.dropna(subset=['delta'])
print (df)
id date population delta
0 1 2021-05-01 21 0.235294
4 1 2021-04-01 17 -0.150000
1 2 2021-05-01 22 -0.083333
5 2 2021-04-01 24 -0.172414
2 3 2021-05-01 23 0.277778
6 3 2021-04-01 18 0.058824
3 4 2021-05-01 24 -0.172414
7 4 2021-04-01 29 0.318182
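Try this (assuming the rows are ordered newest first within each id, as in the question, so that x.iloc[1] is the previous month):
df.groupby('id')['population'].rolling(2).apply(lambda x: (x.iloc[0] - x.iloc[1]) / x.iloc[1]).dropna()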
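Maybe you could try something like:
data['delta'] = data['population'].diff()
data['delta'] /= data['population']
With this approach the first row will be NaN, but for the rest this should work. Note that it does not group by id, and it divides by the current row's population rather than the previous one.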
I have a dataset with meteorological features for 2019, to which I want to join two columns from power consumption datasets for 2017 and 2018. I want to match them by hour, day and month, but the data belongs to different years. How can I do that?
The meteo dataset is a similar 6-column dataframe with a DatetimeIndex covering 2019.
You can derive three additional columns from the index that represent the hour, day and month, and use them for a later join. DatetimeIndex has attributes for the different parts of the timestamp:
import pandas as pd
ind = pd.date_range(start='2020-01-01', end='2020-01-20', periods=10)
df = pd.DataFrame({'number' : range(10)}, index = ind)
df['hour'] = df.index.hour
df['day'] = df.index.day
df['month'] = df.index.month
print(df)
number hour day month
2020-01-01 00:00:00 0 0 1 1
2020-01-03 02:40:00 1 2 3 1
2020-01-05 05:20:00 2 5 5 1
2020-01-07 08:00:00 3 8 7 1
2020-01-09 10:40:00 4 10 9 1
2020-01-11 13:20:00 5 13 11 1
2020-01-13 16:00:00 6 16 13 1
2020-01-15 18:40:00 7 18 15 1
2020-01-17 21:20:00 8 21 17 1
2020-01-20 00:00:00 9 0 20 1
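Once both frames carry the month, day and hour columns, the join itself could look like the sketch below. The frames meteo and power, their column names, and the helper with_keys are hypothetical stand-ins for the real datasets, filled with one day of hourly dummy values.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Hypothetical stand-ins: one day of hourly data from 2019 and from 2017
meteo = pd.DataFrame(
    {"temperature": range(24)},
    index=pd.date_range("2019-03-01", periods=24, freq=one_hour),
)
power = pd.DataFrame(
    {"consumption": range(100, 124)},
    index=pd.date_range("2017-03-01", periods=24, freq=one_hour),
)

keys = ["month", "day", "hour"]

def with_keys(frame):
    # Copy the frame and expose parts of the DatetimeIndex as ordinary columns
    out = frame.copy()
    out["month"] = out.index.month
    out["day"] = out.index.day
    out["hour"] = out.index.hour
    return out

# Inner join on the year-independent keys; the 2019 timestamps are kept as a column
joined = (
    with_keys(meteo)
    .rename_axis("datetime")
    .reset_index()
    .merge(with_keys(power), on=keys, how="inner")
)
print(joined.head())
```

An inner join drops timestamps that exist in only one year; how="left" would keep every meteo row instead.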
I have two different columns in my dataset,
start end
0 2015-01-01 2017-01-01
1 2015-01-02 2015-06-02
2 2015-01-03 2015-12-03
3 2015-01-04 2020-11-25
4 2015-01-05 2025-07-27
I want the difference between start and end in a specific way, here's my desired output.
year_diff month_diff
2 1
0 6
0 12
5 11
10 7
Here the day is not important to me, only the month and year. I've tried converting to periods to get the difference, but that returns the difference in months only. How can I achieve my desired output?
df['end'].dt.to_period('M') - df['start'].dt.to_period('M')
Try:
df["year_diff"] = df["end"].dt.year.sub(df["start"].dt.year)
df["month_diff"] = df["end"].dt.month.sub(df["start"].dt.month)
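Subtracting the month numbers directly goes negative whenever the end month is earlier in the year than the start month. One possible variation, sketched here with the sample dates from the question, is to compute the total number of calendar months first and then split it into years and months. Note that this counts January to June as 5 months; the question's table appears to count one more, so adding 1 to match it would be an assumption.

```python
import pandas as pd

# Sample dates copied from the question
df = pd.DataFrame({
    "start": pd.to_datetime(
        ["2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05"]
    ),
    "end": pd.to_datetime(
        ["2017-01-01", "2015-06-02", "2015-12-03", "2020-11-25", "2025-07-27"]
    ),
})

# Total calendar months between the two dates, ignoring the day of the month
total_months = (
    (df["end"].dt.year - df["start"].dt.year) * 12
    + (df["end"].dt.month - df["start"].dt.month)
)

# Split into whole years and the remaining months
df["year_diff"] = total_months // 12
df["month_diff"] = total_months % 12
print(df)
```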
This solution assumes that the number of days that make up a year (365) and a month (30) are constant. If the datetimes are strings, convert them into datetime objects. In a pandas DataFrame this can be done like so:
def to_datetime(dataframe):
    new_dataframe = pd.DataFrame()
    new_dataframe[0] = pd.to_datetime(dataframe[0], format="%Y-%m-%d")
    new_dataframe[1] = pd.to_datetime(dataframe[1], format="%Y-%m-%d")
    return new_dataframe
Next, column 0 can be subtracted from column 1 to give the difference in days. We can floor-divide this number by 365 using the // operator to get the number of whole years. We can get the number of remaining days using the % operator and floor-divide this by 30 using the // operator to get the number of whole months.
def get_time_diff(dataframe):
    dataframe[2] = dataframe[1] - dataframe[0]
    diff_dataframe = pd.DataFrame(columns=["year_diff", "month_diff"])
    for i in range(0, dataframe.index.stop):
        year_diff = dataframe[2][i].days // 365
        month_diff = (dataframe[2][i].days % 365) // 30
        diff_dataframe.loc[i] = [year_diff, month_diff]
    return diff_dataframe
An example output from using these functions would be
start end days_diff year_diff month_diff
0 2019-10-15 2021-08-11 666 days 1 10
1 2020-02-11 2022-10-13 975 days 2 8
2 2018-12-17 2020-09-16 639 days 1 9
3 2017-01-03 2017-01-28 25 days 0 0
4 2019-12-21 2022-03-10 810 days 2 2
5 2018-08-08 2019-05-07 272 days 0 9
6 2017-06-18 2020-08-01 1140 days 3 1
7 2017-11-14 2020-04-17 885 days 2 5
8 2019-08-19 2020-05-10 265 days 0 8
9 2018-05-05 2020-09-08 857 days 2 4
Note: This will give the number of whole years and months. Hence, if there is a remainder of 29 days, one day short from a month, this will not be counted.
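The same arithmetic can also be written without the explicit loop. This is a sketch under the same 365-day year and 30-day month assumption, using two rows from the example output above and named columns instead of 0 and 1.

```python
import pandas as pd

# Two rows taken from the example output above
df = pd.DataFrame({
    "start": pd.to_datetime(["2019-10-15", "2017-01-03"]),
    "end": pd.to_datetime(["2021-08-11", "2017-01-28"]),
})

# Difference in whole days as an integer column
days = (df["end"] - df["start"]).dt.days

# Whole years, then whole 30-day months from the remainder
df["year_diff"] = days // 365
df["month_diff"] = (days % 365) // 30
print(df)
```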
I have a multi index df called groupt3 in pandas which looks like this when I enter groupt3.head():
datetime song sum rat
artist datetime
2562 8 2 2 26 0
46 19 19 26 0
47 3 3 26 0
4Hero 1 2 2 32 0
26 20 20 32 0
9 10 10 32 0
I would like to have a "flat" data frame that takes the artist index and the datetime index and "repeats" them to form this:
artist date time song sum rat
2562 8 2 26 0
2562 46 19 26 0
2562 47 3 26 0
etc...
Thanks.
Using pandas.DataFrame.to_records().
Example:
import pandas as pd
import numpy as np
arrays = [['Monday','Monday','Tursday','Tursday'],
['Morning','Noon','Morning','Evening']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['Weekday', 'Time'])
df = pd.DataFrame(np.random.randint(5, size=(4,2)), index=index)
In [39]: df
Out[39]:
0 1
Weekday Time
Monday Morning 1 3
Noon 2 1
Tursday Morning 3 3
Evening 1 2
In [40]: pd.DataFrame(df.to_records())
Out[40]:
Weekday Time 0 1
0 Monday Morning 1 3
1 Monday Noon 2 1
2 Tursday Morning 3 3
3 Tursday Evening 1 2
I think you can use reset_index:
import pandas as pd
import numpy as np
np.random.seed(0)
arrays = [['Monday','Monday','Tursday','Tursday'],
['Morning','Noon','Morning','Evening']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['Weekday', 'Time'])
df = pd.DataFrame(np.random.randint(5, size=(4,2)), index=index)
print(df)
0 1
Weekday Time
Monday Morning 4 0
Noon 3 3
Tursday Morning 3 1
Evening 3 2
print(df.reset_index())
Weekday Time 0 1
0 Monday Morning 4 0
1 Monday Noon 3 3
2 Tursday Morning 3 1
3 Tursday Evening 3 2