I have a dataframe and convert the time column to years and months like this:
consumer_confidence = pd.read_csv('consumer_confidence.csv')
business_confidence = pd.read_csv('business_confidence.csv')
consumer_confidence['Year'] = pd.to_datetime(consumer_confidence['TIME']).dt.year
consumer_confidence['Month'] = pd.to_datetime(consumer_confidence['TIME']).dt.month
business_confidence['Year'] = pd.to_datetime(business_confidence['TIME']).dt.year
business_confidence['Month'] = pd.to_datetime(business_confidence['TIME']).dt.month
business_confidence = business_confidence.groupby('Year')['Value'].sum()
consumer_confidence = consumer_confidence.groupby('Year')['Value'].sum()
Attempting the .groupby() statements results in this error:
AttributeError: 'Series' object has no attribute 'Year'
I am unsure how to resolve this as 'Year' should now be a column in the dataframe. Could someone explain my error here?
Your code (with some sample inputs as shown below) works fine for me:
import pandas as pd
'''
consumer_confidence = pd.read_csv('consumer_confidence.csv')
business_confidence = pd.read_csv('business_confidence.csv')
'''
consumer_confidence = pd.DataFrame({'TIME':['2021-01-01', '2021-02-01', '2022-04-11', '2022-04-12'], 'Value':[1,2,3,4]})
business_confidence = pd.DataFrame({'TIME':['2020-01-01', '2021-02-01', '2022-04-11', '2022-04-12'], 'Value':[5,6,7,8]})
print(consumer_confidence)
print(business_confidence)
consumer_confidence['Year'] = pd.to_datetime(consumer_confidence['TIME']).dt.year
consumer_confidence['Month'] = pd.to_datetime(consumer_confidence['TIME']).dt.month
business_confidence['Year'] = pd.to_datetime(business_confidence['TIME']).dt.year
business_confidence['Month'] = pd.to_datetime(business_confidence['TIME']).dt.month
print(consumer_confidence)
print(business_confidence)
business_confidence = business_confidence.groupby('Year')['Value'].sum()
consumer_confidence = consumer_confidence.groupby('Year')['Value'].sum()
print(consumer_confidence)
print(business_confidence)
Output:
TIME Value
0 2021-01-01 1
1 2021-02-01 2
2 2022-04-11 3
3 2022-04-12 4
TIME Value
0 2020-01-01 5
1 2021-02-01 6
2 2022-04-11 7
3 2022-04-12 8
TIME Value Year Month
0 2021-01-01 1 2021 1
1 2021-02-01 2 2021 2
2 2022-04-11 3 2022 4
3 2022-04-12 4 2022 4
TIME Value Year Month
0 2020-01-01 5 2020 1
1 2021-02-01 6 2021 2
2 2022-04-11 7 2022 4
3 2022-04-12 8 2022 4
Year
2021 3
2022 7
Name: Value, dtype: int64
Year
2020 5
2021 6
2022 15
Name: Value, dtype: int64
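The question does not show how the error arose, so the following is only a guess: because the groupby results are assigned back to consumer_confidence and business_confidence, those names hold a Series after the first run, and re-running the earlier lines (for example in a notebook) would then operate on a Series rather than a dataframe. A minimal sketch that sidesteps this by storing the yearly sums under a new, hypothetical name (consumer_by_year) and parsing TIME only once:

```python
import pandas as pd

consumer_confidence = pd.DataFrame({
    'TIME': ['2021-01-01', '2021-02-01', '2022-04-11', '2022-04-12'],
    'Value': [1, 2, 3, 4],
})

# Parse the TIME column once and reuse the parsed values.
parsed = pd.to_datetime(consumer_confidence['TIME'])
consumer_confidence['Year'] = parsed.dt.year
consumer_confidence['Month'] = parsed.dt.month

# consumer_by_year is a hypothetical name; keeping the result separate
# leaves consumer_confidence as a dataframe, so this code can be re-run.
consumer_by_year = consumer_confidence.groupby('Year')['Value'].sum()
print(consumer_by_year)
```

The same pattern would apply to business_confidence.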
I want to extract week of month column from the date.
Dummy Data:
data = pd.DataFrame(pd.date_range('1/1/2000', periods=100, freq='D'))
Code I tried:
import math

def add_week_of_month(df):
    df['monthweek'] = pd.to_numeric(df.index.day/7)
    df['monthweek'] = df['monthweek'].apply(lambda x: math.ceil(x))
    return df
But this code counts 7-day periods within a month: for the first 7 days of a month the column is 1, from day 8 to day 14 it is 2, etc.
What I want instead is calendar weeks per month: on the first day of the month the feature would be 1, from the first Monday after that it would be 2, etc.
Can anyone help me with this?
You can convert the dates to weekly periods and subtract the weekly period of the first day of the month, adding 1 if the month starts on a Monday.
If you want weeks starting on Sundays, use 'W-SAT' as period and start.dt.dayofweek.eq(6).
# get first day of month
start = data[0]+pd.offsets.MonthBegin()+pd.offsets.MonthBegin(-1)
# or
# start = data[0].dt.to_period('M').dt.to_timestamp()
data['monthweek'] = ((data[0].dt.to_period('W')-start.dt.to_period('W'))
.apply(lambda x: x.n)
.add(start.dt.dayofweek.eq(0))
)
NB. in your input, column 0 is the date.
output:
0 monthweek
0 2000-01-01 0
1 2000-01-02 0
2 2000-01-03 1 # Monday
3 2000-01-04 1
4 2000-01-05 1
5 2000-01-06 1
6 2000-01-07 1
7 2000-01-08 1
8 2000-01-09 1
9 2000-01-10 2 # Monday
10 2000-01-11 2
.. ... ...
95 2000-04-05 1
96 2000-04-06 1
97 2000-04-07 1
98 2000-04-08 1
99 2000-04-09 1
[100 rows x 2 columns]
Example for 2001 (starts on a Monday):
0 monthweek
0 2001-01-01 1 # Monday
1 2001-01-02 1
2 2001-01-03 1
3 2001-01-04 1
4 2001-01-05 1
5 2001-01-06 1
6 2001-01-07 1
7 2001-01-08 2 # Monday
8 2001-01-09 2
9 2001-01-10 2
10 2001-01-11 2
11 2001-01-12 2
12 2001-01-13 2
13 2001-01-14 2
14 2001-01-15 3
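Get the first day of the month, add its weekday to the day of the month, and divide by 7:
import math

def week_of_month(dt):
    # week_of_month is an illustrative name; dt is a datetime.date or datetime
    first_day = dt.replace(day=1)
    dom = dt.day
    adjusted_dom = dom + first_day.weekday()
    return int(math.ceil(adjusted_dom/7.0))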
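The same arithmetic can be applied to a whole column without a Python-level loop. A minimal sketch, assuming the dates live in a column named date (a hypothetical name; in the question it is column 0) and that weeks start on Monday:

```python
import pandas as pd

# Hypothetical column name 'date'; same 100 days as in the question.
data = pd.DataFrame({'date': pd.date_range('2000-01-01', periods=100, freq='D')})

# First day of each row's month.
first_day = data['date'].dt.to_period('M').dt.to_timestamp()

# Day of month shifted by the weekday of the 1st (Monday=0 ... Sunday=6).
adjusted_dom = data['date'].dt.day + first_day.dt.dayofweek

# Integer ceiling of adjusted_dom / 7.
data['monthweek'] = (adjusted_dom + 6) // 7
print(data.head(12))
```

With this numbering the first day of a month is always in week 1, and the count goes up on each following Monday.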
Here is data
id  date    population
1   2021-5  21
2   2021-5  22
3   2021-5  23
4   2021-5  24
1   2021-4  17
2   2021-4  24
3   2021-4  18
4   2021-4  29
1   2021-3  20
2   2021-3  29
3   2021-3  17
4   2021-3  22
I want to calculate the monthly change in population for each id, so the result will be:
id  date  delta
1   5     .2353
1   4     -.15
2   5     -.1519
2   4     -.2083
3   5     .2174
3   4     .0556
4   5     -.2083
4   4     .3182
delta := (this month - last month) / last month
How should I approach this in pandas? I'm thinking of groupby but don't know what to do next.
remember there might be more dates. but results is always
Sort by id and date first, then use GroupBy.pct_change, and finally remove the rows with missing values in the delta column:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['id','date'], ascending=[True, False])
df['delta'] = df.groupby('id')['population'].pct_change(-1)
df = df.dropna(subset=['delta'])
print (df)
id date population delta
0 1 2021-05-01 21 0.235294
4 1 2021-04-01 17 -0.150000
1 2 2021-05-01 22 -0.083333
5 2 2021-04-01 24 -0.172414
2 3 2021-05-01 23 0.277778
6 3 2021-04-01 18 0.058824
3 4 2021-05-01 24 -0.172414
7 4 2021-04-01 29 0.318182
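For reference, a self-contained sketch of the same idea on the question's data, sorted oldest first so that the default pct_change() compares each month with the previous one. The zero-padded date strings and the name result are assumptions made for the example:

```python
import pandas as pd

# Data from the question, with months zero-padded for unambiguous parsing.
df = pd.DataFrame({
    'id': [1, 2, 3, 4] * 3,
    'date': ['2021-05'] * 4 + ['2021-04'] * 4 + ['2021-03'] * 4,
    'population': [21, 22, 23, 24, 17, 24, 18, 29, 20, 29, 17, 22],
})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m')

# Oldest month first within each id.
df = df.sort_values(['id', 'date'])

# (this month - last month) / last month, computed per id.
df['delta'] = df.groupby('id')['population'].pct_change()

# The earliest month of each id has no predecessor, so drop it.
result = df.dropna(subset=['delta'])
print(result)
```

Sorting in ascending order avoids the negative period in pct_change(-1); the values are the same, only the row order differs.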
Try this:
# rows of each id are ordered newest first: x.iloc[0] is this month, x.iloc[1] is last month
df.groupby('id')['population'].rolling(2).apply(lambda x: (x.iloc[0] - x.iloc[1]) / x.iloc[1]).dropna()
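Maybe you could try something like this (assuming the rows of each id are ordered newest first, as in the question):
data['delta'] = data.groupby('id')['population'].diff(-1)
data['delta'] /= data.groupby('id')['population'].shift(-1)
With this approach the oldest month of each id will be NaN, but for the rest this should work.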
I have a date object and a date column 'date1' in pandas dataframe 'df' as below:
date = '202107'
df
date1
0 2021-07-01
1 2021-08-01
2 2021-09-01
3 2021-10-01
4 2021-11-01
5 2023-02-01
6 2023-03-01
I want to create a column 'months' in df where
months = (date1 + 1month) - date
My output dataframe should look like below:
df
date1 months
0 2021-07-01 1
1 2021-08-01 2
2 2021-09-01 3
3 2021-10-01 4
4 2021-11-01 5
5 2023-02-01 20
6 2023-03-01 21
Here's a way to do it using pandas:
date = '202107'
date = pd.to_datetime(date, format='%Y%m')
df['months'] = (df.date + pd.offsets.MonthBegin(1)).dt.month - date.month
print(df)
date months
0 2021-07-01 1
1 2021-08-01 2
2 2021-09-01 3
3 2021-10-01 4
4 2021-11-01 5
Given a date variable as follows
mydate = 202003
and a dataframe df containing a datetime column START_DATE, you can do:
mydate_to_use = pd.to_datetime(str(mydate), format='%Y%m')
df['months'] = (df['START_DATE'].dt.year - mydate_to_use.year) * 12 + (df['START_DATE'].dt.month - mydate_to_use.month)
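Applied to the dates from the question, with the + 1 that the question's definition (date1 + 1 month) implies, this year-and-month arithmetic might look as follows. The name ref is hypothetical:

```python
import pandas as pd

# Dates from the question, including the two that fall in a later year.
df = pd.DataFrame({'date1': pd.to_datetime([
    '2021-07-01', '2021-08-01', '2021-09-01', '2021-10-01',
    '2021-11-01', '2023-02-01', '2023-03-01',
])})

# ref is a hypothetical name for the parsed reference month.
ref = pd.to_datetime('202107', format='%Y%m')

# Whole months between ref and date1, plus one for the "+ 1 month".
df['months'] = ((df['date1'].dt.year - ref.year) * 12
                + (df['date1'].dt.month - ref.month) + 1)
print(df)
```

Because the year difference is multiplied by 12, the result stays correct when date1 and the reference month are in different years.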
IIUC
s=(df.date1-pd.to_datetime(date,format='%Y%m'))//np.timedelta64(1, 'M')+1
Out[118]:
0 1
1 2
2 3
3 4
4 5
Name: date1, dtype: int64
df['months']=s
Update
(df.date1.dt.year*12+df.date1.dt.month)-(pd.to_numeric(date)//100)*12-(pd.to_numeric(date)%100)+1
Out[379]:
0 1
1 2
2 3
3 4
4 5
5 20
6 21
Name: date1, dtype: int64
I have a dataframe like this:
time
2018-06-25 20:42:00
2016-06-26 23:51:00
2017-05-34 12:29:00
2016-03-11 10:14:00
Now I created a column like this
df['isEIDRange'] = 0
Let's say the Eid festival is on 15 June 2018.
So I want to set isEIDRange to 1 if the date is between 10 June 2018 and 20 June 2018 (5 days before and 5 days after Eid).
How can I do it?
Something like?
df.loc[ (df.time > 15 June - 5 days) & (df.time < 15 June + 5 days), 'isEIDRange' ] = 1
Use Series.between to test the values, then cast the boolean mask to integers:
df['isEIDRange'] = df['time'].between('2018-06-10', '2018-06-20').astype(int)
If you want a dynamic solution:
df = pd.DataFrame({"time": pd.date_range("2018-06-08", "2018-06-22")})
#print (df)
date = '15 June 2018'
d = pd.to_datetime(date)
diff = pd.Timedelta(5, unit='d')
df['isEIDRange1'] = df['time'].between(d - diff, d + diff).astype(int)
df['isEIDRange2'] = df['time'].between(d - diff, d + diff, inclusive='neither').astype(int)
print (df)
time isEIDRange1 isEIDRange2
0 2018-06-08 0 0
1 2018-06-09 0 0
2 2018-06-10 1 0
3 2018-06-11 1 1
4 2018-06-12 1 1
5 2018-06-13 1 1
6 2018-06-14 1 1
7 2018-06-15 1 1
8 2018-06-16 1 1
9 2018-06-17 1 1
10 2018-06-18 1 1
11 2018-06-19 1 1
12 2018-06-20 1 0
13 2018-06-21 0 0
14 2018-06-22 0 0
Or set values by numpy.where:
df['isEIDRange'] = np.where(df['time'].between(d - diff, d + diff), 1, 0)
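One thing to watch for: the time column in the question carries a time of day, while the upper bound d + diff is midnight at the start of 20 June, so a row such as 2018-06-20 23:51 would fall outside the range. A possible sketch, assuming whole days are meant, normalizes the timestamps first; the sample rows below are made up for illustration:

```python
import pandas as pd

# Made-up sample rows with a time of day, in the style of the question.
df = pd.DataFrame({'time': pd.to_datetime([
    '2018-06-25 20:42:00',
    '2018-06-20 23:51:00',
    '2018-06-10 00:00:00',
    '2018-06-15 10:14:00',
])})

eid = pd.Timestamp('2018-06-15')
window = pd.Timedelta(days=5)

# normalize() sets the time of day to midnight, so only the date is compared.
day = df['time'].dt.normalize()

# between() includes both bounds by default.
df['isEIDRange'] = day.between(eid - window, eid + window).astype(int)
print(df)
```

Without the normalize() step, the second row would get 0 because 23:51 on 20 June is later than the upper bound.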
You can use loc or np.where:
import numpy as np
df['isEIDRange'] = np.where((df['time'] > '2018-06-10') & (df['time'] < '2018-06-20'), 1, df['isEIDRange'])
This means that when the column time is strictly between 2018-06-10 and 2018-06-20, the column isEIDRange will be equal to 1; otherwise it will retain its original value (0).
You can use pandas date_range for this:
eid = pd.date_range("2019-10-15", "2019-10-20")
df = pd.DataFrame({"dates": pd.date_range("2019-10-13", "2019-10-20")})
df["eid"] = 0
df.loc[df["dates"].isin(eid), "eid"] = 1
and output:
dates eid
0 2019-10-13 0
1 2019-10-14 0
2 2019-10-15 1
3 2019-10-16 1
4 2019-10-17 1
5 2019-10-18 1
6 2019-10-19 1
7 2019-10-20 1
I have loaded a pandas dataframe from a .csv file that contains a column having datetime values.
df = pd.read_csv('data.csv')
The name of the column having the datetime values is pickup_datetime. Here's what I get if I do df['pickup_datetime'].head():
0 2009-06-15 17:26:00+00:00
1 2010-01-05 16:52:00+00:00
2 2011-08-18 00:35:00+00:00
3 2012-04-21 04:30:00+00:00
4 2010-03-09 07:51:00+00:00
Name: pickup_datetime, dtype: datetime64[ns, UTC]
How do I convert this column into a numpy array having only the day values of the datetime? For example: 15 from row 0 (2009-06-15 17:26:00+00:00), 05 from row 1 (2010-01-05 16:52:00+00:00), etc.
df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'], errors='coerce')
df['pickup_datetime'].dt.day.values
# array([15, 5, 18, 21, 9])
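A self-contained sketch of the same two steps on the five rows shown in the question, using Series.to_numpy() instead of .values (both return a NumPy array here). Passing utc=True is an assumption based on the +00:00 offsets in the data:

```python
import pandas as pd

# The five rows shown in the question, as strings read from a CSV would be.
df = pd.DataFrame({'pickup_datetime': [
    '2009-06-15 17:26:00+00:00',
    '2010-01-05 16:52:00+00:00',
    '2011-08-18 00:35:00+00:00',
    '2012-04-21 04:30:00+00:00',
    '2010-03-09 07:51:00+00:00',
]})

# Parse to timezone-aware timestamps (UTC assumed from the +00:00 offsets).
df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'], utc=True)

# Day of month for every row, as a NumPy array.
days = df['pickup_datetime'].dt.day.to_numpy()
print(days)
```

The days are integers, so the leading zero in 05 is not kept; formatting with two digits would be a separate string step.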
Just adding another variant, although coldspeed has already provided a brief answer as an X-mas and New Year bonus :-) :
>>> df
pickup_datetime
0 2009-06-15 17:26:00+00:00
1 2010-01-05 16:52:00+00:00
2 2011-08-18 00:35:00+00:00
3 2012-04-21 04:30:00+00:00
4 2010-03-09 07:51:00+00:00
Convert the strings to timestamps by inferring their format:
>>> df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])
>>> df
pickup_datetime
0 2009-06-15 17:26:00
1 2010-01-05 16:52:00
2 2011-08-18 00:35:00
3 2012-04-21 04:30:00
4 2010-03-09 07:51:00
You can pick only the day from pickup_datetime:
>>> df['pickup_datetime'].dt.day
0 15
1 5
2 18
3 21
4 9
Name: pickup_datetime, dtype: int64
You can pick only the month from pickup_datetime:
>>> df['pickup_datetime'].dt.month
0 6
1 1
2 8
3 4
4 3
You can pick only the year from pickup_datetime:
>>> df['pickup_datetime'].dt.year
0 2009
1 2010
2 2011
3 2012
4 2010