I have columnar data of dates of the form mm-dd as shown. I need to add the correct year (dates October to December are 2017 and dates after 1-1 are 2018) and make a datetime object. The code below works, but it's ugly. Is there a more Pythonic way to accomplish this?
import pandas as pd
from datetime import datetime
import io
data = '''Date
1-3
1-2
1-1
12-21
12-20
12-19
12-18'''
df = pd.read_csv(io.StringIO(data))
for i,s in enumerate(df.Date):
    s = s.split('-')
    if int(s[0]) >= 10:
        s = s[0]+'-'+s[1]+'-17'
    else:
        s = s[0]+'-'+s[1]+'-18'
    df.Date[i] = pd.to_datetime(s)
    print(df.Date[i])
Prints:
2018-01-03 00:00:00
2018-01-02 00:00:00
2018-01-01 00:00:00
2017-12-21 00:00:00
2017-12-20 00:00:00
2017-12-19 00:00:00
2017-12-18 00:00:00
You can convert the dates to pandas datetime objects, then set their year with datetime.replace. See docs for more information.
You can use the below code:
df['Date'] = pd.to_datetime(df['Date'], format="%m-%d")
df['Date'] = df['Date'].apply(lambda x: x.replace(year=2017) if x.month in(range(10,13)) else x.replace(year=2018))
Output:
Date
0 2018-01-03
1 2018-01-02
2 2018-01-01
3 2017-12-21
4 2017-12-20
5 2017-12-19
6 2017-12-18
This is one way using pandas vectorised functionality:
import numpy as np
df['Date'] = pd.to_datetime(df['Date'] + \
    np.where(df['Date'].str.split('-').str[0].astype(int).between(10, 12),
             '-2017', '-2018'))
print(df)
Date
0 2018-01-03
1 2018-01-02
2 2018-01-01
3 2017-12-21
4 2017-12-20
5 2017-12-19
6 2017-12-18
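Another way to sketch the same idea, avoiding string concatenation: split the mm-dd strings into integer month and day columns, choose the year with np.where, and let pd.to_datetime assemble the dates from a frame with year, month and day columns. The intermediate frame name (parts) is just for illustration, and the October cutoff is the rule from the question.

```python
import io

import numpy as np
import pandas as pd

data = """Date
1-3
1-2
1-1
12-21
12-20
12-19
12-18"""
df = pd.read_csv(io.StringIO(data))

# Split "mm-dd" into two integer columns.
parts = df["Date"].str.split("-", expand=True).astype(int)
parts.columns = ["month", "day"]

# October to December belong to 2017, everything else to 2018.
parts["year"] = np.where(parts["month"] >= 10, 2017, 2018)

# to_datetime can assemble dates from year/month/day columns.
df["Date"] = pd.to_datetime(parts[["year", "month", "day"]])
print(df)
```

This keeps the year rule in one place, so changing the cutoff month only touches the np.where line.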
Related
I have the following dataframe:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'Id_sensor': [1, 2, 3, 4],
'Date_start': ['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0',
'2018-01-04 00:14:00.0', '2018-01-04'],
'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
The columns Date_start and Date_end are of type object. I would like to convert them to a date type and make both columns look the same. In other words, I want the hour, minute and second fields that Date_end is missing to be filled with zeros.
I tried to make the following code:
df['Date_start'] = pd.to_datetime(df['Date_start'], format='%Y/%m/%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end'], format='%Y/%m/%d %H:%M:%S')
My output:
Id_sensor Date_start Date_end
1 2018-01-04 00:00:00 2018-01-05
2 2018-01-04 00:00:10 2018-01-06
3 2018-01-04 00:14:00 2017-01-06
4 2018-01-04 00:00:00 2018-01-05
But I would like the output to be like this:
Id_sensor Date_start Date_end
1 2018-01-04 00:00:00 2018-01-05 00:00:00
2 2018-01-04 00:00:10 2018-01-06 00:00:00
3 2018-01-04 00:14:00 2017-01-06 00:00:00
4 2018-01-04 00:00:00 2018-01-05 00:00:00
What is actually happening is that both Series, df['Date_start'] and df['Date_end'], are of type datetime64[ns], but when the dataframe is displayed, a column whose time values are all zero is shown without them. If you need formatted output, you can convert the columns back to object (string) type and format them with dt.strftime:
df['Date_start'] = pd.to_datetime(df['Date_start']).dt.strftime('%Y/%m/%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end']).dt.strftime('%Y/%m/%d %H:%M:%S')
print (df)
Outputs:
Id_sensor Date_start Date_end
0 1 2018/01/04 00:00:00 2018/01/05 00:00:00
1 2 2018/01/04 00:00:10 2018/01/06 00:00:00
2 3 2018/01/04 00:14:00 2017/01/06 00:00:00
3 4 2018/01/04 00:00:00 2018/01/05 00:00:00
You can first convert your columns to datetime datatype using to_datetime, and subsequently use dt.strftime to convert the columns to string datatype with your desired format:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({
'Id_sensor': [1, 2, 3, 4],
'Date_start': ['2018-01-04 00:00:00.0', '2018-01-04 00:00:10.0',
'2018-01-04 00:14:00.0', '2018-01-04'],
'Date_end': ['2018-01-05', '2018-01-06', '2017-01-06', '2018-01-05']})
df['Date_start'] = pd.to_datetime(df['Date_start']).dt.strftime('%Y-%m-%d %H:%M:%S')
df['Date_end'] = pd.to_datetime(df['Date_end']).dt.strftime('%Y-%m-%d %H:%M:%S')
print(df)
# Output:
#
# Id_sensor Date_start Date_end
# 0 1 2018-01-04 00:00:00 2018-01-05 00:00:00
# 1 2 2018-01-04 00:00:10 2018-01-06 00:00:00
# 2 3 2018-01-04 00:14:00 2017-01-06 00:00:00
# 3 4 2018-01-04 00:00:00 2018-01-05 00:00:00
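A small sketch to illustrate the point made in both answers: after conversion the midnight time is still there, it is only hidden when the column is printed. One assumption to flag: on pandas 2.0 or later, a column that mixes '2018-01-04 00:00:00.0' and '2018-01-04' may be rejected by a plain pd.to_datetime call, so this sketch passes format="ISO8601", which does not exist in older versions.

```python
import pandas as pd

df = pd.DataFrame({
    "Id_sensor": [1, 2, 3, 4],
    "Date_start": ["2018-01-04 00:00:00.0", "2018-01-04 00:00:10.0",
                   "2018-01-04 00:14:00.0", "2018-01-04"],
    "Date_end": ["2018-01-05", "2018-01-06", "2017-01-06", "2018-01-05"],
})

# Assumption: pandas >= 2.0, where format="ISO8601" accepts ISO strings
# with differing levels of detail in the same column.
for col in ["Date_start", "Date_end"]:
    df[col] = pd.to_datetime(df[col], format="ISO8601")

print(df.dtypes)               # both date columns are datetime dtype
print(df["Date_end"].iloc[0])  # a single Timestamp prints with its time part

# Only convert to text when a fixed display format is really needed.
as_text = df["Date_end"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text)
```

Keeping the columns as datetimes and formatting only for display means arithmetic such as Date_end - Date_start still works.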
I have data that looks like this.
VendorID lpep_pickup_datetime lpep_dropoff_datetime store_and_fwd_flag
2 1/1/2018 0:18:50 1/1/2018 12:24:39 AM N
2 1/1/2018 0:30:26 1/1/2018 12:46:42 AM N
2 1/1/2018 0:07:25 1/1/2018 12:19:45 AM N
2 1/1/2018 0:32:40 1/1/2018 12:33:41 AM N
2 1/1/2018 0:32:40 1/1/2018 12:33:41 AM N
2 1/1/2018 0:38:35 1/1/2018 1:08:50 AM N
2 1/1/2018 0:18:41 1/1/2018 12:28:22 AM N
2 1/1/2018 0:38:02 1/1/2018 12:55:02 AM N
2 1/1/2018 0:05:02 1/1/2018 12:18:35 AM N
2 1/1/2018 0:35:23 1/1/2018 12:42:07 AM N
So, I converted df.lpep_pickup_datetime to datetime, but originally it comes in as a string. I'm not sure which one is easier to work with. I want to append 5 fields onto my current dataframe: year, month, day, weekday, and hour.
I tried this:
df['Year']=[d.split('-')[0] for d in df.lpep_pickup_datetime]
df['Month']=[d.split('-')[1] for d in df.lpep_pickup_datetime]
df['Day']=[d.split('-')[2] for d in df.lpep_pickup_datetime]
That gives me this error: AttributeError: 'Timestamp' object has no attribute 'split'
I tried this:
df2 = pd.DataFrame(df.lpep_pickup_datetime.dt.strftime('%m-%d-%Y-%H').str.split('/').tolist(),
columns=['Month', 'Day', 'Year', 'Hour'],dtype=int)
df = pd.concat((df,df2),axis=1)
That gives me this error: AssertionError: 4 columns passed, passed data had 1 columns
Basically, I want to parse df.lpep_pickup_datetime into year, month, day, weekday, and hour, appending each to the same dataframe. How can I do that?
Thanks!!
Here you go. First I create a random dataset and copy the date column to the name you want, so you can just copy the code. Pandas has extensive support for time-series manipulation, so you don't actually need to import datetime. Here you can find a lot more information about it:
import pandas as pd
date_rng = pd.date_range(start='1/1/2018', end='4/01/2018', freq='H')
df = pd.DataFrame(date_rng, columns=['date'])
df['lpep_pickup_datetime'] = df['date']
df['year'] = df['lpep_pickup_datetime'].dt.year
df['month'] = df['lpep_pickup_datetime'].dt.month
df['weekday'] = df['lpep_pickup_datetime'].dt.weekday
df['day'] = df['lpep_pickup_datetime'].dt.day
df['hour'] = df['lpep_pickup_datetime'].dt.hour
print(df)
Output:
date lpep_pickup_datetime year month weekday day hour
0 2018-01-01 00:00:00 2018-01-01 00:00:00 2018 1 0 1 0
1 2018-01-01 01:00:00 2018-01-01 01:00:00 2018 1 0 1 1
2 2018-01-01 02:00:00 2018-01-01 02:00:00 2018 1 0 1 2
3 2018-01-01 03:00:00 2018-01-01 03:00:00 2018 1 0 1 3
4 2018-01-01 04:00:00 2018-01-01 04:00:00 2018 1 0 1 4
... ... ... ... ... ... ... ...
2156 2018-03-31 20:00:00 2018-03-31 20:00:00 2018 3 5 31 20
2157 2018-03-31 21:00:00 2018-03-31 21:00:00 2018 3 5 31 21
2158 2018-03-31 22:00:00 2018-03-31 22:00:00 2018 3 5 31 22
2159 2018-03-31 23:00:00 2018-03-31 23:00:00 2018 3 5 31 23
2160 2018-04-01 00:00:00 2018-04-01 00:00:00 2018 4 6 1 0
EDIT: Since this is not working (as stated in the comments on this answer), I believe your column is not being parsed as datetime. Try this before applying anything (note %Y for the four-digit year; swap %d and %m if the month comes first in your data):
df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime'], format='%d/%m/%Y %H:%M:%S')
If this format is recognized properly, then you should have no trouble using dt.year, dt.month, dt.hour, dt.day and dt.weekday.
Give this a go. Since your dates are in the datetime dtype already, just use the datetime properties to extract each part.
import pandas as pd
from datetime import datetime as dt
# Creating a fake dataset of dates.
dates = [dt.now().strftime('%d/%m/%Y %H:%M:%S') for i in range(10)]
df = pd.DataFrame({'lpep_pickup_datetime': dates})
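# Pass the format explicitly so day-first strings are never read as month-first.
df['lpep_pickup_datetime'] = pd.to_datetime(df['lpep_pickup_datetime'], format='%d/%m/%Y %H:%M:%S')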
# Parse each date into its parts and store as a new column.
df['month'] = df['lpep_pickup_datetime'].dt.month
df['day'] = df['lpep_pickup_datetime'].dt.day
df['year'] = df['lpep_pickup_datetime'].dt.year
# ... and so on ...
Output:
lpep_pickup_datetime month day year
0 2019-09-24 16:46:10 9 24 2019
1 2019-09-24 16:46:10 9 24 2019
2 2019-09-24 16:46:10 9 24 2019
3 2019-09-24 16:46:10 9 24 2019
4 2019-09-24 16:46:10 9 24 2019
5 2019-09-24 16:46:10 9 24 2019
6 2019-09-24 16:46:10 9 24 2019
7 2019-09-24 16:46:10 9 24 2019
8 2019-09-24 16:46:10 9 24 2019
9 2019-09-24 16:46:10 9 24 2019
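Putting the two answers together for the data shown in the question, here is a minimal sketch that adds all five requested fields in one step. It assumes the pickup strings are month-first (as in '1/1/2018 0:18:50'); the sample rows are copied from the question, and the second date is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "lpep_pickup_datetime": ["1/1/2018 0:18:50", "1/1/2018 0:30:26",
                             "1/6/2018 13:07:25"],  # last row is illustrative
})

# Assumption: month/day/year order. Adjust the format if your file differs.
pickup = pd.to_datetime(df["lpep_pickup_datetime"], format="%m/%d/%Y %H:%M:%S")

# assign() appends every derived column to the same dataframe at once.
df = df.assign(
    year=pickup.dt.year,
    month=pickup.dt.month,
    day=pickup.dt.day,
    weekday=pickup.dt.weekday,  # Monday is 0, Sunday is 6
    hour=pickup.dt.hour,
)
print(df)
```

The original string column is left untouched here; assign the parsed Series back to it if you want to keep working with datetimes.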
I have a CSV file that has a column that has values like:
10/23/2018 11:00:00 PM
I want to convert these values strictly by time and create a new column which takes the time of the entry (11:00:00 etc) and changes it into an hour ending time.
Example looks like:
11:00:00 PM to 12:00:00 AM = 24, 12:00:00 AM to 1:00:00 AM = 1, 1:00:00 AM to 2:00:00 AM = 2 .....etc
Looking for a simple way to calculate these by indexing them based off this conversion.
My first pseudo code idea is to do something like grabbing the column df['Date'] and finding out what the time is:
file = pd.read_csv()
def conv(n):
    date_time = n.iloc[1,1] #Position of the date-time column in file
    for i in date_time:
        time = date_time[11:] #Point of the line where time begins
Unsure how to proceed.
You can also do this:
import pandas as pd
data ='''
10/23/2018 11:00:00 PM
10/23/2018 12:00:00 AM
'''.strip().split('\n')
df = pd.DataFrame(data, columns=['date'])
df['date'] = pd.to_datetime(df['date'])
#df['pad1hour'] = df['date'].dt.hour+1
#or
df['pad1hour'] = df['date'] + pd.Timedelta('1 hours')
# I prefer the second as you can add whatever interval e.g. '1 days 3 minutes'
print(df['pad1hour'].dt.time)
You should convert to a datetime with pd.to_datetime(df.your_col) (your format will be automatically parsed correctly, though you can specify it to improve the speed) and then you can use the .dt.hour accessor.
import pandas as pd
# Sample Data
df = pd.DataFrame({'date': pd.date_range('2018-01-01', '2018-01-03', freq='30min')})
df['hour'] = df.date.dt.hour+1
print(df.sample(20))
date hour
95 2018-01-02 23:30:00 24
66 2018-01-02 09:00:00 10
82 2018-01-02 17:00:00 18
80 2018-01-02 16:00:00 17
75 2018-01-02 13:30:00 14
83 2018-01-02 17:30:00 18
49 2018-01-02 00:30:00 1
47 2018-01-01 23:30:00 24
30 2018-01-01 15:00:00 16
52 2018-01-02 02:00:00 3
29 2018-01-01 14:30:00 15
86 2018-01-02 19:00:00 20
59 2018-01-02 05:30:00 6
65 2018-01-02 08:30:00 9
92 2018-01-02 22:00:00 23
8 2018-01-01 04:00:00 5
91 2018-01-02 21:30:00 22
10 2018-01-01 05:00:00 6
89 2018-01-02 20:30:00 21
51 2018-01-02 01:30:00 2
This is the best way to do it:
from datetime import timedelta
import pandas as pd
file = pd.read_csv('your_file.csv')  # placeholder path; use your own file
Case One: If you want to keep the date
file['New datetime'] = file['Date_time'].apply(lambda x: pd.to_datetime(x) + timedelta(hours = 1))
Case Two: If you just want the time
file['New time'] = file['Date_time'].apply(lambda x: (pd.to_datetime(x) + timedelta(hours = 1)).time())
If you need the column's data type as string instead of Timestamp you can just do:
file['New time'] = file['New time'].astype(str)
This converts the values to readable strings.
Hope it helps.
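For the hour-ending numbering asked about in the question (11 PM to midnight is 24, midnight to 1 AM is 1), a minimal sketch is to parse the 12-hour strings with an explicit format and add one to the hour. The column names Date and hour_ending are placeholders, and the sample values follow the question's format.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["10/23/2018 11:00:00 PM",
             "10/24/2018 12:00:00 AM",
             "10/24/2018 1:00:00 AM"],
})

# %I is the 12-hour clock hour and %p the AM/PM marker.
stamp = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")

# .dt.hour runs from 0 to 23, so hour ending is simply hour + 1 (1 to 24).
df["hour_ending"] = stamp.dt.hour + 1
print(df)
```

Unlike adding a Timedelta and reading the time back, this keeps 24 for the last hour of the day instead of wrapping around to 0.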
I am trying to extract a Date and a Time from a Timestamp:
DateTime
31/12/2015 22:45
to be:
Date | Time |
31/12/2015| 22:45 |
however when I use:
df['Date'] = pd.to_datetime(df['DateTime']).dt.date
I get:
2015-12-31
Similarly, with Time:
df['Time'] = pd.to_datetime(df['DateTime']).dt.time
gives
23:45:00
but if I try to format it I get an error:
df['Date'] = pd.to_datetime(df['DateTime'], format='%d/%m/%Y').dt.date
ValueError: unconverted data remains: 00:00
Try strftime
df['DateTime'] = pd.to_datetime(df['DateTime'], dayfirst=True)  # dates are dd/mm/yyyy
df['Date'] = df['DateTime'].dt.strftime('%d/%m/%Y')
df['Time'] = df['DateTime'].dt.strftime('%H:%M')
DateTime Date Time
0 2015-12-31 22:45:00 31/12/2015 22:45
Option 1
Since you don't really need to operate on the dates per se, just split your column on space:
df = df.DateTime.str.split(expand=True)
df.columns = ['Date', 'Time']
df
Date Time
0 31/12/2015 22:45
Option 2
Alternatively, just drop the format specifier completely:
v = pd.to_datetime(df['DateTime'], errors='coerce', dayfirst=True)  # dates are dd/mm/yyyy
df['Time'] = v.dt.time
df['Date'] = v.dt.floor('D')
df
Time Date
0 22:45:00 2015-12-31
If your DateTime column is already a datetime type, you shouldn't need to call pd.to_datetime on it.
Are you looking for a string ("12:34") or a timestamp (the concept of 12:34 in the afternoon)? If you're looking for the former, there are answers here that cover that. If you're looking for the latter, you can use the .dt.time and .dt.date accessors.
>>> pd.__version__
u'0.20.2'
>>> df = pd.DataFrame({'DateTime':pd.date_range(start='2018-01-01', end='2018-01-10')})
>>> df['date'] = df.DateTime.dt.date
>>> df['time'] = df.DateTime.dt.time
>>> df
DateTime date time
0 2018-01-01 2018-01-01 00:00:00
1 2018-01-02 2018-01-02 00:00:00
2 2018-01-03 2018-01-03 00:00:00
3 2018-01-04 2018-01-04 00:00:00
4 2018-01-05 2018-01-05 00:00:00
5 2018-01-06 2018-01-06 00:00:00
6 2018-01-07 2018-01-07 00:00:00
7 2018-01-08 2018-01-08 00:00:00
8 2018-01-09 2018-01-09 00:00:00
9 2018-01-10 2018-01-10 00:00:00
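As a side note on the ValueError in the question: the format string only described the date, so the time part was left over. A minimal sketch that describes the whole string and then formats the two pieces as text (sample value taken from the question):

```python
import pandas as pd

df = pd.DataFrame({"DateTime": ["31/12/2015 22:45"]})

# The format must cover the entire string, including the time.
parsed = pd.to_datetime(df["DateTime"], format="%d/%m/%Y %H:%M")

# strftime returns strings, so the original dd/mm/yyyy look is preserved.
df["Date"] = parsed.dt.strftime("%d/%m/%Y")
df["Time"] = parsed.dt.strftime("%H:%M")
print(df)
```

Use parsed.dt.date and parsed.dt.time instead if you need date and time objects rather than strings.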
I have data in a table as presented below:
YEAR DOY Hour
2015 1 0
2015 1 1
2015 1 2
2015 1 3
2015 1 4
2015 1 5
This is how I'm reading the file:
df = pd.read_table('data2015.lst', sep='\s+')
lines = len(df)
To convert it to a datetime object I do:
dates = []
for l in range(0,lines):
    date = str(df.ix[l,0])[:-2] +' '+ str(df.ix[l,1])[:-2] +' '+ str(df.ix[l,2])[:-2]
    d = pd.to_datetime(date, format='%Y %j %H')
    dates.append(d)
But this is taking a lot of time.
Is there some way to do it (more directly) without the loop?
You can do it in one line when reading it:
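# pd.datetime was removed in pandas 2.0, so use the standard library class.
# Note that date_parser itself is deprecated in recent pandas versions.
from datetime import datetime
df = pd.read_csv('file.txt', sep=r'\s+', index_col='Timestamp',
                 parse_dates={'Timestamp': [0,1,2]},
                 date_parser=lambda x: datetime.strptime(x, '%Y %j %H'))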
Timestamp
2015-01-01 00:00:00
2015-01-01 01:00:00
2015-01-01 02:00:00
2015-01-01 03:00:00
2015-01-01 04:00:00
2015-01-01 05:00:00
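If you would rather not depend on date_parser, here is a sketch of a vectorised alternative that works on the columns after reading: build a year plus day-of-year string, parse it with %Y%j, and add the hours as a timedelta. The inline sample stands in for data2015.lst, and the cast to int is there because the question suggests the columns may be read as floats.

```python
import io

import pandas as pd

raw = """YEAR DOY Hour
2015 1 0
2015 1 1
2015 1 2
2015 1 3
2015 1 4
2015 1 5
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Build strings like "2015001": four-digit year + zero-padded day of year.
year = df["YEAR"].astype(int).astype(str)
doy = df["DOY"].astype(int).astype(str).str.zfill(3)
day = pd.to_datetime(year + doy, format="%Y%j")

# Add the hour of day as a timedelta, all in one vectorised step.
df["Timestamp"] = day + pd.to_timedelta(df["Hour"], unit="h")
print(df)
```

Because every step operates on whole columns, this avoids the Python-level loop that made the original approach slow.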