I want to create a new column called DateTime using numerical columns "Year","Month","Day","Hour","Minute".
Year Month Day Hour Minute
2019 5 9 11 0
2019 5 9 11 10
2019 5 9 11 20
This is my code:
df["DateTime"] = pd.to_datetime(df[["Year","Month","Day","Hour","Minute"]])
The expected result is:
DateTime
2019-05-09 11:00:00
2019-05-09 11:10:00
2019-05-09 11:20:00
However, I get this wrong result:
DateTime
2019-05-09
2019-05-09
2019-05-09
Try this (note that strftime returns formatted strings rather than datetime values):
d = {
'Year': [2019,2019],
'Month': [5,6],
'Day': [12,13],
'Hour': [12,20],
'Minute': [30,45],
}
df = pd.DataFrame(d)
df["DateTime"] = pd.to_datetime(df[["Year","Month","Day","Hour","Minute"]]).dt.strftime('%d/%m/%y %H:%M')
df
Year Month Day Hour Minute DateTime
0 2019 5 12 12 30 12/05/19 12:30
1 2019 6 13 20 45 13/06/19 20:45
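Because strftime converts the column to plain strings, it may be worth keeping a true datetime column as well. The following is a minimal sketch built from the question's sample rows; the names `stamps` and `DateTimeText` are only illustrative. In this small example the assembled values do keep their hour and minute:

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2019, 2019, 2019],
    "Month": [5, 5, 5],
    "Day": [9, 9, 9],
    "Hour": [11, 11, 11],
    "Minute": [0, 10, 20],
})

# to_datetime assembles a datetime64 column from the component columns
stamps = pd.to_datetime(df[["Year", "Month", "Day", "Hour", "Minute"]])
df["DateTime"] = stamps

# strftime is only needed for display; it yields strings, not datetimes
df["DateTimeText"] = stamps.dt.strftime("%Y-%m-%d %H:%M:%S")
print(df[["DateTime", "DateTimeText"]])
```

Keeping the datetime64 column means later steps such as resample or the .dt accessor still work, while the text column can be used for reports.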
local_time
5398 2019-02-14 14:35:42+01:00
5865 2021-09-22 04:28:53+02:00
6188 2018-05-04 09:34:53+02:00
6513 2019-11-09 15:54:51+01:00
6647 2019-09-18 09:25:43+02:00
df_with_local_time['local_time'].loc[6647] returns
datetime.datetime(2019, 9, 18, 9, 25, 43, tzinfo=<DstTzInfo 'Europe/Oslo' CEST+2:00:00 DST>)
Based on the column, I would like to generate multiple date-related columns:
def datelike_variables(i):
year = i.year
month = i.month
#dayofweek = i.dayofweek
day = i.day
hour = i.hour
return year, month, day, hour
df_with_local_time[['year','month','day','hour']]=df_with_local_time['local_time'].apply(datelike_variables,axis=1,result_type="expand")
returns TypeError: datelike_variables() got an unexpected keyword argument 'result_type'
Expected result:
local_time year month day hour
5398 2019-02-14 14:35:42+01:00 2019 02 14 14
5865 2021-09-22 04:28:53+02:00 2021 09 22 04
6188 2018-05-04 09:34:53+02:00 2018 05 04 09
6513 2019-11-09 15:54:51+01:00 2019 11 09 15
6647 2019-09-18 09:25:43+02:00 2019 09 18 09
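The error occurs because you are calling Series.apply, which has no result_type parameter, so the extra keyword is passed on to your function. Return a pd.Series from the function instead: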
def datelike_variables(i):
year = i.year
month = i.month
#dayofweek = i.dayofweek
day = i.day
hour = i.hour
return pd.Series([year, month, day, hour])
df_with_local_time[['year','month','day','hour']]=df_with_local_time['local_time'].apply(datelike_variables)
print (df_with_local_time)
local_time year month day hour
5398 2019-02-14 14:35:42+01:00 2019 2 14 14
5865 2021-09-22 04:28:53+02:00 2021 9 22 4
6188 2018-05-04 09:34:53+02:00 2018 5 4 9
6513 2019-11-09 15:54:51+01:00 2019 11 9 15
6647 2019-09-18 09:25:43+02:00 2019 9 18 9
Your original function also works if you call it through a lambda in DataFrame.apply, which does accept result_type:
def datelike_variables(i):
year = i.year
month = i.month
#dayofweek = i.dayofweek
day = i.day
hour = i.hour
return year, month, day, hour
df_with_local_time[['year','month','day','hour']]=df_with_local_time.apply(lambda x: datelike_variables(x['local_time']), axis=1,result_type="expand")
print (df_with_local_time)
local_time year month day hour
5398 2019-02-14 14:35:42+01:00 2019 2 14 14
5865 2021-09-22 04:28:53+02:00 2021 9 22 4
6188 2018-05-04 09:34:53+02:00 2018 5 4 9
6513 2019-11-09 15:54:51+01:00 2019 11 9 15
6647 2019-09-18 09:25:43+02:00 2019 9 18 9
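If the column can be converted to a proper timezone-aware datetime dtype, the .dt accessor may avoid apply altogether. This is a sketch under the assumption that every value belongs to the Europe/Oslo zone shown in the question; the sample frame is rebuilt from strings purely for illustration:

```python
import pandas as pd

df_with_local_time = pd.DataFrame(
    {"local_time": ["2019-02-14 14:35:42+01:00", "2021-09-22 04:28:53+02:00"]},
    index=[5398, 5865],
)

# Mixed UTC offsets are parsed via UTC, then converted back to local time
local = pd.to_datetime(df_with_local_time["local_time"], utc=True).dt.tz_convert("Europe/Oslo")

# Vectorized extraction of the date parts, no Python-level function needed
df_with_local_time["year"] = local.dt.year
df_with_local_time["month"] = local.dt.month
df_with_local_time["day"] = local.dt.day
df_with_local_time["hour"] = local.dt.hour
print(df_with_local_time)
```

On large frames this is usually faster than apply, because the extraction runs on the whole column at once.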
I have a pandas Dataframe that looks like this:
year month name value1 value2
0 2021 7 cars 5000 4000
1 2021 7 boats 2000 250
2 2021 9 cars 3000 7000
And I want it to look like this:
year month day name value1 value2
0 2021 7 1 cars 161.29 129.03
1 2021 7 2 cars 161.29 129.03
2 2021 7 3 cars 161.29 129.03
3 2021 7 4 cars 161.29 129.03
...
31 2021 7 1 boats 64.51 8.064
32 2021 7 2 boats 64.51 8.064
33 2021 7 3 boats 64.51 8.064
...
62 2021 9 1 cars 100 233.33
63 2021 9 2 cars 100 233.33
64 2021 9 3 cars 100 233.33
The idea is that I want to divide the value columns by the number of days in the month and create a day column, so that in the end I can build a date column by combining year, month and day.
Can anyone help me?
One option would be to use monthrange from calendar to get the number of days in a given month, divide the value by days in the month, then use Index.repeat to scale up the DataFrame and groupby cumcount to add in the Days:
from calendar import monthrange
import pandas as pd
df = pd.DataFrame(
{'year': {0: 2021, 1: 2021, 2: 2021}, 'month': {0: 7, 1: 7, 2: 9},
'name': {0: 'cars', 1: 'boats', 2: 'cars'},
'value1': {0: 5000, 1: 2000, 2: 3000},
'value2': {0: 4000, 1: 250, 2: 7000}})
days_in_month = (
df[['year', 'month']].apply(lambda x: monthrange(*x)[1], axis=1)
)
# Calculate new values
df.loc[:, 'value1':] = df.loc[:, 'value1':].div(days_in_month, axis=0)
df = df.loc[df.index.repeat(days_in_month)] # Scale Up DataFrame
df.insert(2, 'day', df.groupby(level=0).cumcount() + 1) # Add Days Column
df = df.reset_index(drop=True) # Clean up Index
df:
year month day name value1 value2
0 2021 7 1 cars 161.290323 129.032258
1 2021 7 2 cars 161.290323 129.032258
2 2021 7 3 cars 161.290323 129.032258
3 2021 7 4 cars 161.290323 129.032258
4 2021 7 5 cars 161.290323 129.032258
.. ... ... ... ... ... ...
87 2021 9 26 cars 100.000000 233.333333
88 2021 9 27 cars 100.000000 233.333333
89 2021 9 28 cars 100.000000 233.333333
90 2021 9 29 cars 100.000000 233.333333
91 2021 9 30 cars 100.000000 233.333333
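The question's final goal, a date column built from year, month and day, can be sketched as follows. The small `daily` frame here is a hand-written stand-in for the expanded result, not the actual output above:

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the expanded frame
daily = pd.DataFrame({
    "year": [2021, 2021, 2021],
    "month": [7, 7, 9],
    "day": [1, 2, 30],
    "name": ["cars", "cars", "cars"],
})

# to_datetime accepts a frame with year, month and day columns
daily["date"] = pd.to_datetime(daily[["year", "month", "day"]])
print(daily)
```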
For that, you need to create a temporary dataframe that lists the days in each month, merge it with your data, and then divide the values.
Let's assume your data covers a single year, so we can build the date range from it straight away and create the temporary dataframe (365 periods assumes a non-leap year):
dt_range = pd.DataFrame(pd.date_range(str(df.loc[0,'year']) + '-01-01', periods=365))
dt_range.columns = ['dte']
dt_range['year'] = dt_range['dte'].dt.year
dt_range['month'] = dt_range['dte'].dt.month
dt_range['day'] = dt_range['dte'].dt.day
Now we can create the new dataframe by merging on year and month:
new_df = pd.merge(df, dt_range,how='left',on=['year','month'])
Now all we have to do is group by day and aggregate, and we have what you need:
new_df = new_df.groupby(['year','month','day']).agg({'value':'mean'})
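The merge idea can also be sketched end to end with the division step written out explicitly. This sketch assumes a single `value` column and a single year, and divides by days_in_month instead of using a groupby, so it is a variation on the answer above rather than the same code:

```python
import pandas as pd

df = pd.DataFrame({"year": [2021, 2021, 2021],
                   "month": [7, 8, 9],
                   "value": [5000, 5000, 5000]})

# One row per calendar day of the (single) year found in the data
year = df.loc[0, "year"]
days = pd.DataFrame({"dte": pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")})
days["year"] = days["dte"].dt.year
days["month"] = days["dte"].dt.month
days["day"] = days["dte"].dt.day

# The merge repeats each monthly row once per day of that month
daily = df.merge(days, how="left", on=["year", "month"])

# Spread the monthly value evenly over the days of the month
daily["value"] = daily["value"] / daily["dte"].dt.days_in_month
print(daily.head())
```

Using an explicit end date rather than a fixed number of periods keeps the range correct in leap years.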
You can use resample to upsample months into days:
import pandas as pd
df = pd.DataFrame([[2021,7,5000]], columns=['year', 'month', 'value'])
# create datetime column as period
df['datetime'] = pd.to_datetime(df['month'].astype(str) + '/' + df['year'].astype(str)).dt.to_period("M")
# calculate values per day by dividing the value by number of days per month
df['ndays'] = df['datetime'].apply(lambda x: x.days_in_month)
df['value'] = df['value'] / df['ndays']
# set datetime as index and resample:
df = df[['value', 'datetime']].set_index('datetime')
df = df.resample('d').ffill().reset_index()
#split datetime to separate columns
df['day'] = df['datetime'].dt.day
df['month'] = df['datetime'].dt.month
df['year'] = df['datetime'].dt.year
df.drop(columns=['datetime'], inplace=True)
value day month year
0 161.29 1 7 2021
1 161.29 2 7 2021
2 161.29 3 7 2021
3 161.29 4 7 2021
4 161.29 5 7 2021
I assume the dataframe can have more months, for example by extending your initial dataframe a little:
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO("""
year month value
2021 7 5000
2021 8 5000
2021 9 5000
"""), sep=r"\s+")  # any whitespace, so the sample parses whether it uses tabs or spaces
Which gives dataframe df:
year month value
0 2021 7 5000
1 2021 8 5000
2 2021 9 5000
The solution is a single chained expression: first a datetime index is created from the raw year and month, then resample is used to convert months to days, and finally value is overwritten with the per-day average for each month:
df_out = (
df.set_index(pd.DatetimeIndex(pd.to_datetime(dict(year=df.year, month=df.month, day=1)), freq="MS"))
.resample('D')
.ffill()
.assign(value = lambda df: df.value/df.index.days_in_month)
)
Resulting dataframe:
year month value
2021-07-01 2021 7 161.290323
2021-07-02 2021 7 161.290323
2021-07-03 2021 7 161.290323
2021-07-04 2021 7 161.290323
2021-07-05 2021 7 161.290323
... ... ...
2021-08-28 2021 8 161.290323
2021-08-29 2021 8 161.290323
2021-08-30 2021 8 161.290323
2021-08-31 2021 8 161.290323
2021-09-01 2021 9 166.666667
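Note that September has only 30 days, so its value differs from the previous months. Also note that the resampled index stops at the last timestamp, 2021-09-01, so the remaining days of September are not generated.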
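If the last month should be expanded to all of its days as well, one possible variation is to build the full daily range explicitly and reindex with forward fill. This sketch swaps resample for reindex; the names `monthly`, `full_days` and `daily` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"year": [2021, 2021, 2021],
                   "month": [7, 8, 9],
                   "value": [5000, 5000, 5000]})

# Index each monthly row by the first day of its month
monthly = df.set_index(pd.to_datetime(dict(year=df.year, month=df.month, day=1)))

# Daily range from the first month start to the end of the last month
last_day = monthly.index.max() + pd.offsets.MonthEnd(0)
full_days = pd.date_range(monthly.index.min(), last_day, freq="D")

# Forward-fill each monthly row over its days, then take the daily share
daily = monthly.reindex(full_days, method="ffill")
daily["value"] = daily["value"] / daily.index.days_in_month
print(daily.tail())
```

MonthEnd(0) rolls a date forward to the end of its own month, which is what extends the range to 2021-09-30 here.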
index date miles
0 7/8/2015 14:00:00 10
1 7/8/2015 15:00:01 2
2 7/8/2015 16:00:01 5
3 7/9/2015 09:00:02 12
4 7/10/2015 12:00:00 4
5 7/11/2015 11:00:00 25
6 7/12/2015 04:34:33 10
7 7/12/2015 05:35:35 22
8 7/12/2015 23:11:11 14
9 7/13/2015 01:00:23 10
10 7/13/2015 03:00:03 2
I want to turn this table into the following:
7/8/2015 17
7/9/2015 12
7/10/2015 4
7/11/2015 25
7/12/2015 46
7/13/2015 12
How can I do something like this in Python? I want to group by date to get the sum of miles for each day.
If you are asking how to add up the miles for the same day into one line, one way is to go through all the rows with a for loop, add the miles of rows that share the same date to a running total, and then print one line per date.
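The loop described above could look like the following plain-Python sketch. The `rows` list is a shortened, hand-copied sample of the question's data, and `totals` is an illustrative name:

```python
from collections import defaultdict
from datetime import datetime

rows = [
    ("7/8/2015 14:00:00", 10),
    ("7/8/2015 15:00:01", 2),
    ("7/8/2015 16:00:01", 5),
    ("7/9/2015 09:00:02", 12),
]

# Running total of miles per calendar date
totals = defaultdict(int)
for stamp, miles in rows:
    day = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S").date()
    totals[day] += miles

# One output line per date, in chronological order
for day, miles in sorted(totals.items()):
    print(day, miles)
```

This works without pandas, although for large tables the resample or groupby approaches are likely to be faster.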
Using resample:
df.set_index('date', inplace=True)
ddf = df.resample('1D').sum()
resample needs a datetime index, so convert 'date' with pd.to_datetime if it is still a string column, and set it as the index before resampling.
If df is your sample input, ddf will look like this:
miles
date
2015-07-08 17
2015-07-09 12
2015-07-10 4
2015-07-11 25
2015-07-12 46
2015-07-13 12
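For completeness, here is a self-contained sketch of the resample approach, including the conversion of the date strings. The explicit format string is an assumption that the dates are month/day/year, as in the sample:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["7/8/2015 14:00:00", "7/8/2015 15:00:01", "7/8/2015 16:00:01",
             "7/9/2015 09:00:02", "7/10/2015 12:00:00", "7/11/2015 11:00:00",
             "7/12/2015 04:34:33", "7/12/2015 05:35:35", "7/12/2015 23:11:11",
             "7/13/2015 01:00:23", "7/13/2015 03:00:03"],
    "miles": [10, 2, 5, 12, 4, 25, 10, 22, 14, 10, 2],
})

# resample requires real datetimes, so parse the strings first
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %H:%M:%S")

# Daily bins on the datetime index, summing miles within each day
ddf = df.set_index("date").resample("1D").sum()
print(ddf)
```

Note that resample also emits rows for days with no data (with a sum of 0), which groupby would simply omit.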
As @Valentino mentioned:
import pandas
data = {
'date': ['7/8/2015 14:00:00', '7/8/2015 14:00:00', '7/8/2015 14:00:00', '7/9/2015 14:00:00'],
'miles': [10, 2, 5, 12]
}
df = pandas.DataFrame(data)
df['date'] = pandas.to_datetime(df.date)
df['date'] = df['date'].dt.strftime('%m/%d/%Y')
print(df)
Out:
date miles
0 07/08/2015 10
1 07/08/2015 2
2 07/08/2015 5
3 07/09/2015 12
print(df.groupby('date').sum())
Out:
miles
date
07/08/2015 17
07/09/2015 12
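Grouping on formatted strings sorts the groups as text, which can misorder dates that span more than one year. A sketch that groups on the calendar date instead (the sample rows, including the 2016 entry, are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["7/8/2015 14:00:00", "7/8/2015 15:00:01",
             "7/9/2015 09:00:02", "1/5/2016 12:00:00"],
    "miles": [10, 2, 12, 4],
})
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y %H:%M:%S")

# Group on the date part only; groups come out in chronological order
per_day = df.groupby(df["date"].dt.date)["miles"].sum()
print(per_day)
```

Formatting with strftime can then be applied to the result at the very end, purely for display.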
I am currently working on a dataset of 8 000 rows.
I want to split my date column into day, month and year. The dtype of the date column is object.
How can I split the whole date column into day, month and year?
A sample of the date of my dataset is shown below:
date
01-01-2016
01-01-2016
01-01-2016
01-01-2016
01-01-2016
df=pd.DataFrame(columns=['date'])
df['date'] = pd.to_datetime(df['date'], infer_datetime_format=True)
print(df)
dt=datetime.strptime('date',"%d-%m-%y")
print(dt)
This is the code I am using to split the date, but it raises this error:
ValueError: time data 'date' does not match format '%d-%m-%y'
If you have pandas you can do this:
import pandas as pd
# Recreate your dataframe
df = pd.DataFrame(dict(date=['01-01-2016']*6))
df.date = pd.to_datetime(df.date)
# Create 3 new columns
df[['year','month','day']] = df.date.apply(lambda x: pd.Series(x.strftime("%Y,%m,%d").split(",")))
df
Returns
date year month day
0 2016-01-01 2016 01 01
1 2016-01-01 2016 01 01
2 2016-01-01 2016 01 01
3 2016-01-01 2016 01 01
4 2016-01-01 2016 01 01
5 2016-01-01 2016 01 01
Or without the formatting options:
df['year'],df['month'],df['day'] = df.date.dt.year, df.date.dt.month, df.date.dt.day
df
Returns
date year month day
0 2016-01-01 2016 1 1
1 2016-01-01 2016 1 1
2 2016-01-01 2016 1 1
3 2016-01-01 2016 1 1
4 2016-01-01 2016 1 1
5 2016-01-01 2016 1 1
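The ValueError in the question arises because strptime was given the literal string 'date' rather than the column values, and because %y expects a two-digit year while the data has four digits (%Y). A vectorized sketch with an explicit format follows; it assumes the dates are day-first (dd-mm-yyyy), which the all-identical sample cannot confirm, and the sample values are invented to show the difference:

```python
import pandas as pd

df = pd.DataFrame({"date": ["01-01-2016", "15-03-2016", "31-12-2016"]})

# An explicit format avoids day/month ambiguity (assumed dd-mm-yyyy here)
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Integer date parts via the .dt accessor
df["day"] = df["date"].dt.day
df["month"] = df["date"].dt.month
df["year"] = df["date"].dt.year
print(df)
```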
I found this but can't get the syntax correct.
time.asctime(time.strptime('2017 28 1', '%Y %W %w'))
I want to set a new column to show the month in the format "201707" for July. It can be int64 or string; it doesn't have to be an actual readable date in the column.
My dataframe column ['Week'] is also in the format 201729 i.e. YYYYWW
dfAttrition_Billings_KPIs['Day_1'] = \
time.asctime(time.strptime(dfAttrition_Billings_KPIs['Week'].str[:4]
+ dfAttrition_Billings_KPIs['Month'].str[:-2] - 1 + 1', '%Y %W %w'))
So I want the rows that have week 201729 to show 201707 in a new month field. The output depends on the row's value in 'Week'.
I have a million records, so I would like to avoid row iteration, lambdas and slow functions where possible :)
Use to_datetime with the format parameter, appending 1 to each value so that the date falls on the Monday of that week; then use strftime to get the YYYYMM format:
df = pd.DataFrame({'date':[201729,201730,201735]})
df['date1']=pd.to_datetime(df['date'].astype(str) + '1', format='%Y%W%w')
df['date2']=pd.to_datetime(df['date'].astype(str) + '1', format='%Y%W%w').dt.strftime('%Y%m')
print (df)
date date1 date2
0 201729 2017-07-17 201707
1 201730 2017-07-24 201707
2 201735 2017-08-28 201708
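To check individual values, or to see how the format string works, the same conversion can be sketched with the standard library. The helper name `week_to_month` is hypothetical, and this per-value version is meant for spot checks rather than for a million rows. Note that %W counts weeks from the first Monday of the year, which is not always the same as ISO week numbering:

```python
from datetime import datetime


def week_to_month(yyyyww: int) -> str:
    """Return YYYYMM for the Monday of a YYYYWW week code (hypothetical helper)."""
    text = str(yyyyww)
    # %Y = year, %W = week number (Monday first), %w = weekday with 1 = Monday
    monday = datetime.strptime(f"{text[:4]} {text[4:]} 1", "%Y %W %w")
    return monday.strftime("%Y%m")


print(week_to_month(201729))
print(week_to_month(201735))
```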
If you need to convert from datetime to a custom week format:
df = pd.DataFrame({'date':pd.date_range('2017-01-01', periods=10)})
df['date3'] = df['date'].dt.strftime('%Y %W %w')
print (df)
date date3
0 2017-01-01 2017 00 0
1 2017-01-02 2017 01 1
2 2017-01-03 2017 01 2
3 2017-01-04 2017 01 3
4 2017-01-05 2017 01 4
5 2017-01-06 2017 01 5
6 2017-01-07 2017 01 6
7 2017-01-08 2017 01 0
8 2017-01-09 2017 02 1
9 2017-01-10 2017 02 2