idxmax() returns index instead of timestamp - python

I'm a Python beginner.
I am trying to get the highest price of a particular stock per month, and the date on which that maximum occurred.
Getting the maximum value per month works fine using max(),
but when I try to get the corresponding dates of the max price using idxmax(), my code returns the row index instead of the date. My code looks like this:
Max_Date = Daily_High.groupby(pd.Grouper(key="Date", freq="M")).High.idxmax()
Output
Date High
0 2020-04-30 9929
1 2020-05-31 9946
2 2020-06-30 9966
3 2020-07-31 9993
4 2020-08-31 10014
5 2020-09-30 10016
6 2020-10-31 10044
7 2020-11-30 10063
8 2020-12-31 10097
9 2021-01-31 10114
10 2021-02-28 10125
11 2021-03-31 10139
12 2021-04-30 10180
13 2021-05-31 10182
Output Should be like this
Date High Max Date
0 2020-04-30 2020-04-30
1 2020-05-31 2020-05-26
2 2020-06-30 2020-06-23
3 2020-07-31 2020-07-31
4 2020-08-31 2020-08-31
5 2020-09-30 2020-09-02
6 2020-10-31 2020-10-13
7 2020-11-30 2020-11-09
8 2020-12-31 2020-12-29
9 2021-01-31 2021-01-25
10 2021-02-28 2021-02-09
11 2021-03-31 2021-03-02
12 2021-04-30 2021-04-29
13 2021-05-31 2021-05-03
Hope you can help me to get the correct date. Thank you!

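idxmax() returns the index label of the maximum, so with the default integer index you get row numbers. Create a DatetimeIndex and remove key="Date" from pd.Grouper:
Max_Date = Daily_High.set_index('Date').groupby(pd.Grouper(freq="M")).High.idxmax()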
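If you would rather keep the default integer index, another way to read the result is to treat the values returned by idxmax() as row labels and look them up in the original frame. This is only a sketch under assumptions: the prices below are made up, and the names daily_high, month, idx and result are illustrative, not from the question.

```python
import pandas as pd

# Made-up sample data standing in for the question's Daily_High frame
daily_high = pd.DataFrame({
    "Date": pd.to_datetime([
        "2020-04-01", "2020-04-15", "2020-04-30",
        "2020-05-05", "2020-05-26", "2020-05-29",
    ]),
    "High": [10.0, 12.5, 11.0, 20.0, 25.0, 21.0],
})

# Group by calendar month; idxmax() gives the row label of each monthly maximum
month = daily_high["Date"].dt.to_period("M")
idx = daily_high.groupby(month)["High"].idxmax()

# Use those labels to pull the matching date and price from the original rows
result = daily_high.loc[idx, ["Date", "High"]].reset_index(drop=True)
print(result)
```

Here the lookup with .loc returns one row per month, holding both the maximum and the date it occurred on.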

Related

Python to increment date every week in a dataframe

I need to increment a date column in weekly steps. Here is my code:
import datetime
import pandas as pd
import numpy as np
c=15
s={'week':[1,2,3,4,5,6,7,8],'Sales':[10,20,30,40,50,60,70,80]}
p=pd.DataFrame(data=s)
# convert the week number to the Monday of that week in 2021
p['week'] =p['week'].apply(
    lambda x: datetime.datetime.strptime(f'2021-{x:02}-1', '%Y-%U-%u')
)
O/P-
How can I continue from the last row of the week column to get the next 15 weeks?
Basically, the desired output for week continues from 2022-03-01 for the next 14 weeks.
One option is to use date_range to generate additional dates, then use set_index + reindex to append them:
p = p.set_index('week').reindex(pd.date_range('2021-01-04', periods=8+14, freq='W-MON')).rename_axis(['week']).reset_index()
Output:
week Sales
0 2021-01-04 10.0
1 2021-01-11 20.0
2 2021-01-18 30.0
3 2021-01-25 40.0
4 2021-02-01 50.0
5 2021-02-08 60.0
6 2021-02-15 70.0
7 2021-02-22 80.0
8 2021-03-01 NaN
9 2021-03-08 NaN
10 2021-03-15 NaN
11 2021-03-22 NaN
12 2021-03-29 NaN
13 2021-04-05 NaN
14 2021-04-12 NaN
15 2021-04-19 NaN
16 2021-04-26 NaN
17 2021-05-03 NaN
18 2021-05-10 NaN
19 2021-05-17 NaN
20 2021-05-24 NaN
21 2021-05-31 NaN
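The same set_index + reindex approach can be sketched without hard-coding the start date or the row count, by deriving both from the frame. This assumes the week column already holds weekly Monday dates; extra_weeks, full_range and extended are illustrative names, not from the question.

```python
import pandas as pd

# Starting frame: 8 weekly Mondays, as produced by the question's strptime step
p = pd.DataFrame({
    "week": pd.date_range("2021-01-04", periods=8, freq="W-MON"),
    "Sales": [10, 20, 30, 40, 50, 60, 70, 80],
})

extra_weeks = 14  # how many weeks to append (assumed value)

# Build the full weekly range from the first existing week
full_range = pd.date_range(
    p["week"].min(), periods=len(p) + extra_weeks, freq="W-MON"
)

# Reindexing adds the new weeks; rows without sales become NaN
extended = (
    p.set_index("week")
     .reindex(full_range)
     .rename_axis("week")
     .reset_index()
)
print(extended.tail())
```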
You can set the length of the week list with the range() function and your variable c, but you will also need to pad the sales list, which must have the same number of elements:
import pandas as pd
import numpy as np
import datetime
c=15
weeks = list(range(1, c+1))
sales = [10,20,30,40,50,60,70,80]
# pad sales with None so both lists have the same length
s={'week':weeks,'Sales':sales+[None]*max(len(weeks)-len(sales), 0)}
p=pd.DataFrame(data=s)
p['week'] =p['week'].apply(
    lambda x: datetime.datetime.strptime(f'2021-{x:02}-1', '%Y-%U-%u')
)
print(p)
Another option is DateOffset:
p = pd.concat([p, pd.DataFrame({'week': [p.iloc[-1,0]+pd.DateOffset(weeks=i) for i in range(1,c)]})], ignore_index=True)
>>> p
week Sales
0 2021-01-04 10.0
1 2021-01-11 20.0
2 2021-01-18 30.0
3 2021-01-25 40.0
4 2021-02-01 50.0
5 2021-02-08 60.0
6 2021-02-15 70.0
7 2021-02-22 80.0
8 2021-03-01 NaN
9 2021-03-08 NaN
10 2021-03-15 NaN
11 2021-03-22 NaN
12 2021-03-29 NaN
13 2021-04-05 NaN
14 2021-04-12 NaN
15 2021-04-19 NaN
16 2021-04-26 NaN
17 2021-05-03 NaN
18 2021-05-10 NaN
19 2021-05-17 NaN
20 2021-05-24 NaN
21 2021-05-31 NaN

Adding a year to a period?

I have a column which I have converted to datetime:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
date
2021-10-21 00:00:00
2021-10-24 00:00:00
2021-10-25 00:00:00
2021-10-26 00:00:00
And I need to add 1 year to these dates based on a condition:
df.loc[df['quarter'] == "Q4_", 'date'] + pd.offsets.DateOffset(years=1)
but it's not working; the column is unchanged:
date
2021-10-21 00:00:00
2021-10-24 00:00:00
2021-10-25 00:00:00
2021-10-26 00:00:00
I have tried converting it to period since I only need the year to be used in a concatenation later:
df['year'] = df['date'].dt.to_period('Y')
but I cannot add any number to a period.
This appears to be working for me:
import pandas as pd
df = pd.DataFrame({'date':pd.date_range('1/1/2021', periods=50, freq='M')})
print(df.head(24))
Input:
date
0 2021-01-31
1 2021-02-28
2 2021-03-31
3 2021-04-30
4 2021-05-31
5 2021-06-30
6 2021-07-31
7 2021-08-31
8 2021-09-30
9 2021-10-31
10 2021-11-30
11 2021-12-31
12 2022-01-31
13 2022-02-28
14 2022-03-31
15 2022-04-30
16 2022-05-31
17 2022-06-30
18 2022-07-31
19 2022-08-31
20 2022-09-30
21 2022-10-31
22 2022-11-30
23 2022-12-31
Add a year:
df.loc[df['date'].dt.quarter == 4, 'date'] += pd.offsets.DateOffset(years=1)
print(df.head(24))
Note that, per your logic, the year increases starting in October.
Output:
date
0 2021-01-31
1 2021-02-28
2 2021-03-31
3 2021-04-30
4 2021-05-31
5 2021-06-30
6 2021-07-31
7 2021-08-31
8 2021-09-30
9 2022-10-31
10 2022-11-30
11 2022-12-31
12 2022-01-31
13 2022-02-28
14 2022-03-31
15 2022-04-30
16 2022-05-31
17 2022-06-30
18 2022-07-31
19 2022-08-31
20 2022-09-30
21 2023-10-31
22 2023-11-30
23 2023-12-31
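A likely reason the original attempt appeared to do nothing is that the expression builds a new Series and discards it; the result has to be assigned back. Below is a small sketch using the question's quarter column, with made-up values for it, that also pulls out the year as a plain integer for later concatenation. The name mask is illustrative.

```python
import pandas as pd

# Dates from the question; the quarter values are assumed for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-10-21", "2021-10-24", "2021-10-25", "2021-10-26"]),
    "quarter": ["Q4_", "Q3_", "Q4_", "Q4_"],
})

mask = df["quarter"] == "Q4_"

# Assign the shifted dates back; without the assignment df is left untouched
df.loc[mask, "date"] = df.loc[mask, "date"] + pd.DateOffset(years=1)

# An integer year is enough if it is only needed for string concatenation
df["year"] = df["date"].dt.year
print(df)
```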

December/January Seasonal Mean

I am attempting to calculate the seasonal means for the winter months of DJF and DJ. I first tried to use xarray's .groupby function:
ds.groupby('time.month').mean('time')
Then I realized that instead of grouping the previous year's December with the subsequent Jan./Feb., it was grouping all three months from the same year. I was then able to solve the DJF season by resampling and writing a function to select the proper 3-month period:
def is_djf(month):
    return (month == 12)
ds.resample('QS-MAR').mean('time')
ds.sel(time=is_djf(ds['time.month']))
I am still unfortunately unsure how to solve for the Dec./Jan. season since the resampling method I used was for offsetting quarterly. Thank you for any and all help!
Use resample with QS-DEC.
Suppose this dataframe:
time val
0 2020-12-31 1
1 2021-01-31 1
2 2021-02-28 1
3 2021-03-31 2
4 2021-04-30 2
5 2021-05-31 2
6 2021-06-30 3
7 2021-07-31 3
8 2021-08-31 3
9 2021-09-30 4
10 2021-10-31 4
11 2021-11-30 4
12 2021-12-31 5
13 2022-01-31 5
14 2022-02-28 5
>>> df.set_index('time').resample('QS-DEC').mean()
val
time
2020-12-01 1.0
2021-03-01 2.0
2021-06-01 3.0
2021-09-01 4.0
2021-12-01 5.0
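For the two-month Dec./Jan. season, quarterly resampling does not fit, so one possible approach is to keep only those two months and label each December with the following year before grouping. The sketch below uses pandas with made-up values; dj, season_year and dj_mean are illustrative names. The same idea should carry over to an xarray time coordinate, but that is an assumption and is not shown here.

```python
import pandas as pd

# Made-up monthly values; February is set to 100 to show it is excluded
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-12-31", "2021-01-31", "2021-02-28",
        "2021-12-31", "2022-01-31", "2022-02-28",
    ]),
    "val": [1.0, 3.0, 100.0, 5.0, 7.0, 100.0],
})

# Keep only December and January
dj = df[df["time"].dt.month.isin([12, 1])].copy()

# Label each winter by the year of its January: December moves forward one year
dj["season_year"] = dj["time"].dt.year + (dj["time"].dt.month == 12).astype(int)

# Mean over each Dec./Jan. pair
dj_mean = dj.groupby("season_year")["val"].mean()
print(dj_mean)
```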

Why is pandas str.replace returning NaN?

I am trying to remove the comma separator from values in a pandas dataframe so that I can convert them to integers. I have been using the following method:
df_orders['qty'] = df_orders['qty'].str.replace(',','')
However this seems to be returning NaN values for some numbers which did not originally contain ',' in their values. I have included a sample of my Input data and current output below:
Input:
date sku qty
556603 2020-10-25 A 6
590904 2020-10-21 A 5
595307 2020-10-20 A 31
602678 2020-10-19 A 11
615022 2020-10-18 A 2
641077 2020-10-16 A 1
650203 2020-10-15 A 3
655363 2020-10-14 A 18
667919 2020-10-13 A 5
674990 2020-10-12 A 2
703901 2020-10-09 A 1
715411 2020-10-08 A 1
721557 2020-10-07 A 31
740515 2020-10-06 A 49
752670 2020-10-05 A 4
808426 2020-09-28 A 2
848057 2020-09-23 A 1
865751 2020-09-21 A 2
886630 2020-09-18 A 3
901095 2020-09-16 A 47
938648 2020-09-10 A 2
969909 2020-09-07 A 3
1021548 2020-08-31 A 2
1032254 2020-08-30 A 8
1077443 2020-08-25 A 5
1089670 2020-08-24 A 24
1098843 2020-08-23 A 16
1102025 2020-08-22 A 23
1179347 2020-08-12 A 1
1305700 2020-07-29 A 1
1316343 2020-07-28 A 1
1399930 2020-07-19 A 1
1451864 2020-07-15 A 1
1463195 2020-07-14 A 15
2129080 2020-05-19 A 1
2143468 2020-05-18 A 1
Current Output:
date sku qty
556603 2020-10-25 A 6
590904 2020-10-21 A 5
595307 2020-10-20 A 31
602678 2020-10-19 A 11
615022 2020-10-18 A 2
641077 2020-10-16 A 1
650203 2020-10-15 A 3
655363 2020-10-14 A NaN
667919 2020-10-13 A NaN
674990 2020-10-12 A NaN
703901 2020-10-09 A NaN
715411 2020-10-08 A NaN
721557 2020-10-07 A NaN
740515 2020-10-06 A NaN
752670 2020-10-05 A NaN
808426 2020-09-28 A 2
848057 2020-09-23 A 1
865751 2020-09-21 A 2
886630 2020-09-18 A 3
901095 2020-09-16 A 47
938648 2020-09-10 A NaN
969909 2020-09-07 A NaN
1021548 2020-08-31 A NaN
1032254 2020-08-30 A NaN
1077443 2020-08-25 A NaN
1089670 2020-08-24 A NaN
1098843 2020-08-23 A NaN
1102025 2020-08-22 A NaN
1179347 2020-08-12 A NaN
1305700 2020-07-29 A NaN
1316343 2020-07-28 A 1
1399930 2020-07-19 A 1
1451864 2020-07-15 A 1
1463195 2020-07-14 A 15
2129080 2020-05-19 A 1
2143468 2020-05-18 A 1
I have had a look around but can't seem to find what is causing this error.
I was able to reproduce your issue:
# toy df
df
qty
0 1
1 2,
2 3
df['qty'].str.replace(',', '')
0 NaN
1 2
2 NaN
Name: qty, dtype: object
I created df by doing this:
df = pd.DataFrame({'qty': [1, '2,', 3]})
In other words, your column has mixed data types - some values are integers while others are strings. So when you apply .str methods on mixed types, non str types are converted to NaN to indicate "hey it doesn't make sense to run a str method on an int".
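One way to confirm that a column really has mixed types is to count the Python type of each value. This is a small sketch on the toy frame; type_counts and non_strings are illustrative names.

```python
import pandas as pd

# Toy frame with mixed ints and strings, as in the reproduction
df = pd.DataFrame({"qty": [1, "2,", 3]})

# Count how many values of each Python type the column holds
type_counts = df["qty"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Flag the rows that .str methods would turn into NaN
non_strings = df["qty"].map(lambda v: not isinstance(v, str))
print(df[non_strings])
```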
You may fix this by converting the entire column to string, then back to int:
df['qty'].astype(str).str.replace(',', '').astype(int)
Or if you want something a little more robust, try
df['qty'] = pd.to_numeric(
    df['qty'].astype(str).str.replace(',', '').str.extract(r'(\d+)', expand=False),
    errors='coerce')
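A self-contained sketch of the string-then-numeric approach, assuming a made-up qty column that mixes integers, comma-separated strings and a missing value. The nullable Int64 dtype is a choice made here so that missing values can sit alongside integers; it is not part of the original answer.

```python
import pandas as pd

# Made-up mixed column: ints, strings with commas, and a missing value
df_orders = pd.DataFrame({"qty": [6, "1,234", 31, "2,", None]})

# Cast everything to str first so .str.replace sees every row
cleaned = df_orders["qty"].astype(str).str.replace(",", "", regex=False)

# Anything that is not a number (such as the string "None") becomes NaN,
# then the nullable Int64 dtype keeps the rest as integers
df_orders["qty"] = pd.to_numeric(cleaned, errors="coerce").astype("Int64")
print(df_orders)
```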

Pandas fill consecutive null date values from previous dates + a constant number of days

I have a dataframe that contains a date column
Comp_date
0 2020-04-24
1 NaT
2 NaT
3 NaT
4 2020-08-06
5 NaT
6 NaT
7 NaT
8 2020-08-22
9 NaT
I am trying to fill each null with the previous date plus a constant number of days (10), but I am unable to do so. I tried the following:
df['Comp_date']=df['Comp_date'].fillna((df['Comp_date'].shift()+pd.to_timedelta(10, unit='D')), inplace=True)
Nothing happens and I get the same result. Any help?
expected outcome
Comp_date
0 2020-04-24
1 2020-05-04
2 2020-05-14
3 2020-05-24
4 2020-08-06
5 2020-08-16
6 2020-08-26
7 2020-09-05
8 2020-08-22
9 2020-09-01
The idea is to create a group for each run of missing values with Series.notna and Series.cumsum, number the rows within each group with GroupBy.cumcount, multiply that counter by the number of days with Series.mul, convert the result to timedeltas with to_timedelta, and add it to the forward-filled dates from ffill:
num_days = 10
g = df['Comp_date'].notna().cumsum()
days = pd.to_timedelta(df.groupby(g).cumcount().mul(num_days), unit='d')
df['Comp_date'] = df['Comp_date'].ffill().add(days)
print(df)
Comp_date
0 2020-04-24
1 2020-05-04
2 2020-05-14
3 2020-05-24
4 2020-08-06
5 2020-08-16
6 2020-08-26
7 2020-09-05
8 2020-08-22
9 2020-09-01
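To see why this works where a single shift() does not (shift() only looks one row back, so consecutive NaT values stay empty), it can help to look at the intermediate values. Below is a self-contained sketch with the question's data; group_id, steps and filled are illustrative names.

```python
import pandas as pd

# The question's column, with None standing in for NaT
df = pd.DataFrame({"Comp_date": pd.to_datetime([
    "2020-04-24", None, None, None,
    "2020-08-06", None, None, None,
    "2020-08-22", None,
])})
num_days = 10

# Each real date starts a new group: 1,1,1,1,2,2,2,2,3,3
group_id = df["Comp_date"].notna().cumsum()

# Position inside each group: 0,1,2,3,0,1,2,3,0,1
steps = df.groupby(group_id).cumcount()

# Forward-fill the last real date, then add steps * num_days
filled = df["Comp_date"].ffill() + pd.to_timedelta(steps * num_days, unit="D")
print(pd.DataFrame({"group": group_id, "step": steps, "filled": filled}))
```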
I'm not clear on your question, but this adds a constant number of days to the last observed Comp_date.
constant_number_of_days = 2
df2 = df['Comp_date'].ffill().to_frame()
df2.loc[df['Comp_date'].isnull(), 'Comp_date'] += pd.Timedelta(days=constant_number_of_days)
>>> df2
Comp_date
0 2020-04-24
1 2020-04-26
2 2020-04-26
3 2020-04-26
4 2020-08-06
5 2020-08-08
6 2020-08-08
7 2020-08-08
8 2020-08-22
9 2020-08-24
