How do I subtract two time columns from each other in Python? - python

I have two columns, Start and HT, both of object dtype.
The output I need is (HT - Start) in minutes.
I tried to convert them to datetime with pd.to_datetime, but it throws an error:
TypeError: <class 'datetime.time'> is not convertible to datetime
Start     HT
09:30:00  09:40:00
09:30:00  09:36:00
09:30:00  09:50:00
09:30:00  10:36:00
Expected output:
Start     HT        diff(in minutes)
09:30:00  09:40:00  10
09:30:00  09:36:00  6
09:30:00  09:50:00  20
09:30:00  10:36:00  66
Please help.

You should first convert the values using pd.to_datetime(). If the columns hold datetime.time objects, cast them to strings first:
df['Start'] = pd.to_datetime(df['Start'].astype(str), format='%H:%M:%S').dt.time.apply(str)
df['HT'] = pd.to_datetime(df['HT'].astype(str), format='%H:%M:%S').dt.time.apply(str)
df['diff(in minutes)'] = (pd.to_timedelta(df['HT']) - pd.to_timedelta(df['Start'])).dt.total_seconds() / 60
print(df)
You can simplify the above code using pd.to_timedelta()
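# astype(str) lets this work on datetime.time objects as well as on strings
df['Start'] = pd.to_timedelta(df['Start'].astype(str))
df['HT'] = pd.to_timedelta(df['HT'].astype(str))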
df['diff(in minutes)'] = (df['HT'] - df['Start']).dt.total_seconds() / 60
print(df)
Start HT diff(in minutes)
0 09:30:00 09:40:00 10.0
1 09:30:00 09:36:00 6.0
2 09:30:00 09:50:00 20.0
3 09:30:00 10:36:00 66.0
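If the columns really hold datetime.time objects, as the TypeError in the question suggests, one option is to go through strings before building timedeltas. The following is a minimal sketch under that assumption; the sample frame and the variable names start, ht and minutes are illustrative and not taken from the original post.

```python
import datetime as dt
import pandas as pd

# Hypothetical sample frame holding datetime.time objects (object dtype)
df = pd.DataFrame({
    "Start": [dt.time(9, 30)] * 4,
    "HT": [dt.time(9, 40), dt.time(9, 36), dt.time(9, 50), dt.time(10, 36)],
})

# str(datetime.time) yields "HH:MM:SS", which pd.to_timedelta can parse
start = pd.to_timedelta(df["Start"].astype(str))
ht = pd.to_timedelta(df["HT"].astype(str))

# Whole minutes between the two columns; the original columns stay untouched
minutes = ((ht - start).dt.total_seconds() // 60).astype(int)
df["diff(in minutes)"] = minutes
print(df)
```

Because Start and HT are not overwritten, they keep printing as plain times, and the difference comes out as integers, as in the expected output.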

Related

How to calculate the difference in hours between two timestamps and exclude weekends

I have a dataframe like this:
Folder1 Folder2
0 2021-11-22 12:00:00 2021-11-24 10:00:00
1 2021-11-23 10:30:00 2021-11-25 18:30:00
2 2021-11-12 10:30:00 2021-11-15 18:30:00
3 2021-11-23 10:00:00 NaN
Using this code:
from datetime import datetime
def strfdelta(td: pd.Timedelta):
    # format a Timedelta as HH:MM:SS, with hours allowed to exceed 24
    seconds = td.total_seconds()
    hours = int(seconds // 3600)
    minutes = int((seconds % 3600) // 60)
    seconds = int(seconds % 60)
    return f"{hours:02}:{minutes:02}:{seconds:02}"
df["Folder1"] = pd.to_datetime(df["Folder1"])
df["Folder2"] = pd.to_datetime(df["Folder2"])
bm1 = df["Folder1"].notna() & df["Folder2"].notna()
bm2 = df["Folder1"].notna() & df["Folder2"].isna()
df["Time1"] = (df.loc[bm1, "Folder2"] - df.loc[bm1, "Folder1"]).apply(strfdelta)
df["Time2"] = (datetime.now() - df.loc[bm2, "Folder1"]).apply(strfdelta)
I have this df:
Folder1 Folder2 Time1 Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46:00:00 NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56:00:00 NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00 80:00:00 NaN
3 2021-11-23 10:00:00 NaN NaN 03:00:00
Basically, this is what I want, but how can I exclude weekend hours when calculating the difference between the timestamps in Folder1 and Folder2? What should I change to get a df like this:
Folder1 Folder2 Time1 Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46:00:00 NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56:00:00 NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32:00:00 NaN
3 2021-11-23 10:00:00 NaN NaN 03:00:00
So, in the row with index 2, 13.11 and 14.11 fell on a weekend, so the difference in Time1 should be 32 instead of 80.
I think you could leverage the pandas.date_range function combined with pandas.tseries.offsets.CustomBusinessHour, like this:
# import pandas and numpy
import pandas as pd
import numpy as np
# construct dataframe
df = pd.DataFrame()
df["Folder1"] = pd.to_datetime(
    pd.Series(
        [
            "2021-11-22 12:00:00",
            "2021-11-23 10:30:00",
            "2021-11-12 10:30:00",
            "2021-11-23 10:00:00",
        ]
    )
)
df["Folder2"] = pd.to_datetime(
    pd.Series(
        [
            "2021-11-24 10:00:00",
            "2021-11-25 18:30:00",
            "2021-11-15 18:30:00",
            np.nan,
        ]
    )
)
# define custom business hours (Monday to Friday by default)
cbh = pd.tseries.offsets.CustomBusinessHour(start="0:00", end="23:59")
# actual calculation: count the business-hour steps between the two timestamps
df["Time1"] = df[~(df["Folder1"].isnull() | df["Folder2"].isnull())].apply(
    lambda row: len(
        pd.date_range(
            start=row["Folder1"],
            end=row["Folder2"],
            freq=cbh)),
    axis=1,
)
df.head()
Which for me yields:
print(df.head())
Folder1 Folder2 Time1
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46.0
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56.0
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32.0
3 2021-11-23 10:00:00 NaT NaN
As a bonus, you can do your Time2 calculation with it as well:
df["Time2"] = df[df["Folder2"].isnull()].apply(
    lambda row: len(
        pd.date_range(
            start=row["Folder1"],
            end=pd.Timestamp.now(),
            freq=cbh)),
    axis=1,
)
Which for me yields (at 14:45 CET):
print(df.head())
Folder1 Folder2 Time1 Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46.0 NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56.0 NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32.0 NaN
3 2021-11-23 10:00:00 NaT NaN 5.0
df = pd.DataFrame({'Folder1': ['2021-11-22 12:00:00', '2021-11-23 10:30:00', '2021-11-12 10:30:00', '2021-11-23 10:00:00'],
                   'Folder2': ['2021-11-24 10:00:00', '2021-11-25 18:30:00', '2021-11-15 18:30:00', None]})
df[['Folder1','Folder2']] = df[['Folder1','Folder2']].apply(pd.to_datetime)
def strfdelta(t1, t2):
    # collect the Saturdays and Sundays that fall between t1 and t2
    hd = pd.date_range(t1, t2, freq='W-SAT').append(pd.date_range(t1, t2, freq='W-SUN'))
    # remove 24 hours per weekend day from the elapsed time
    sec = (t2-t1).total_seconds() - len(hd)*24*3600
    return f"{int(sec//3600):02d}:{int((sec%3600)//60):02d}:{int(sec%60):02d}"
now = pd.to_datetime('now')
df['Time1'] = df.fillna(now).apply(lambda x: strfdelta(x['Folder1'], x['Folder2']), axis=1)
print(df)
Prints:
Folder1 Folder2 Time1
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46:00:00
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56:00:00
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32:00:00
3 2021-11-23 10:00:00 NaT 20:58:26
df['Folder1'] = pd.to_datetime(df['Folder1'])
df['Folder2'] = pd.to_datetime(df['Folder2']).fillna(df['Folder1'])
# Create a column with the daily date range covered by each row
df['missing'] = df.apply(lambda x: pd.date_range(start=x['Folder1'], end=x['Folder2'], freq='D'), axis=1)
# True where the range contains a Sunday ('0') or a Saturday ('6')
has_weekend = df['missing'].apply(lambda x: x.strftime('%w')).map(set).astype(str).str.contains('0|6')
# Elapsed whole hours between the two date columns
hours = (df['Folder2'] - df['Folder1']).dt.total_seconds() // 3600
# Subtract 48 hours when the range touches a weekend, otherwise keep the plain difference
df = df.assign(time=np.where(has_weekend, hours - 48, hours)).drop(columns=['missing'])
print(df)
Folder1 Folder2 time
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46.0
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56.0
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32.0
3 2021-11-23 10:00:00 2021-11-23 10:00:00 0.0
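The answers above either count whole business hours or subtract whole weekend days. As an alternative sketch, the interval can be walked one calendar day at a time, adding only the portion that falls on a weekday. The helper name weekday_hours is hypothetical (it does not appear in any of the answers), and the sample frame simply repeats the question's data.

```python
import pandas as pd

def weekday_hours(start, end):
    """Hours between start and end, skipping Saturdays and Sundays."""
    if pd.isna(start) or pd.isna(end):
        return float("nan")
    total = pd.Timedelta(0)
    day = start.normalize()  # midnight at the beginning of the first day
    while day < end:
        next_day = day + pd.Timedelta(days=1)
        if day.dayofweek < 5:  # Monday=0 ... Friday=4
            # add only the part of this day that lies inside [start, end]
            total += min(end, next_day) - max(start, day)
        day = next_day
    return total.total_seconds() / 3600

df = pd.DataFrame({
    "Folder1": pd.to_datetime(["2021-11-22 12:00:00", "2021-11-23 10:30:00",
                               "2021-11-12 10:30:00", "2021-11-23 10:00:00"]),
    "Folder2": pd.to_datetime(["2021-11-24 10:00:00", "2021-11-25 18:30:00",
                               "2021-11-15 18:30:00", None]),
})
df["Time1"] = [weekday_hours(a, b) for a, b in zip(df["Folder1"], df["Folder2"])]
print(df)
```

This version also handles intervals that start or end in the middle of a weekend, at the cost of a Python-level loop per row.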

Add missing timestamp row to a dataframe

I have a dataframe which contains data that were measured at two hours interval each day, some time intervals are however missing. My dataset looks like below:
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
I'm trying to insert the missing time intervals and fill their values with NaN.
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
I would appreciate any help on how to achieve this in Python, as I'm a newbie just starting out with the language.
Create DatetimeIndex and use DataFrame.asfreq:
print (df)
date val
0 2020-12-01 08:00:00 145.9
1 2020-12-01 10:00:00 100.0
2 2020-12-01 16:00:00 99.3
3 2020-12-01 18:00:00 91.0
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').asfreq('2H')
print (df)
val
date
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
Assuming your df looks like
datetime value
0 2020-12-01T08:00:00 145.9
1 2020-12-01T10:00:00 100.0
2 2020-12-01T16:00:00 99.3
3 2020-12-01T18:00:00 91.0
make sure the datetime column has a datetime dtype:
df['datetime'] = pd.to_datetime(df['datetime'])
so that you can now resample to 2-hourly frequency:
df.resample('2H', on='datetime').mean()
value
datetime
2020-12-01 08:00:00 145.9
2020-12-01 10:00:00 100.0
2020-12-01 12:00:00 NaN
2020-12-01 14:00:00 NaN
2020-12-01 16:00:00 99.3
2020-12-01 18:00:00 91.0
Note that you don't need to set the on= keyword if your df already has a datetime index. The df resulting from resampling will have a datetime index.
Also note that I'm using .mean() as aggfunc, meaning that if you have multiple values within the two hour intervals, you'll get the mean of that.
You can try the following, which uses datetime and timedelta:
from datetime import datetime, timedelta
# Assuming that the data is given like below.
data = ['2020-12-01 08:00:00 145.9',
        '2020-12-01 10:00:00 100.0',
        '2020-12-01 16:00:00 99.3',
        '2020-12-01 18:00:00 91.0']
# initialize the start time using data[0]
date = data[0].split()[0].split('-')
time = data[0].split()[1].split(':')
start = datetime(int(date[0]), int(date[1]), int(date[2]), int(time[0]), int(time[1]), int(time[2]))
newdata = []
newdata.append(data[0])
i = 1
while i < len(data):
    nxt = start + timedelta(hours=2)
    if (str(nxt) != (data[i].split()[0] + ' ' + data[i].split()[1])):
        # the expected timestamp is missing: insert a NaN row for it
        newdata.append(str(nxt) + ' NaN')
    else:
        # the expected timestamp is present: copy the row and move on
        newdata.append(data[i])
        i+=1
    start = nxt
newdata
NOTE: timedelta(hours=2) adds 2 hours to the existing time.
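Another way to fill the gaps, sketched here under the assumption that the readings sit in columns named date and val (as in the first answer), is to build the complete two-hour grid explicitly and reindex onto it. The names full_index and filled are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-12-01 08:00:00", "2020-12-01 10:00:00",
                            "2020-12-01 16:00:00", "2020-12-01 18:00:00"]),
    "val": [145.9, 100.0, 99.3, 91.0],
})

s = df.set_index("date")["val"]

# Full two-hour grid from the first to the last observed timestamp
full_index = pd.date_range(s.index.min(), s.index.max(), freq=pd.Timedelta(hours=2))

# Timestamps missing from the original data come back as NaN
filled = s.reindex(full_index)
print(filled)
```

Passing a Timedelta as the frequency avoids depending on a particular spelling of the hourly alias, and reindex leaves existing values untouched rather than aggregating them.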

Generating list of 5 minute interval between two times

I have the following strings:
start = "07:00:00"
end = "17:00:00"
How can I generate a list of 5-minute intervals between those times, i.e.
["07:00:00","07:05:00",...,"16:55:00","17:00:00"]
This works for me, I'm sure you can figure out how to put the results in the list instead of printing them out:
>>> import datetime
>>> start = "07:00:00"
>>> end = "17:00:00"
>>> delta = datetime.timedelta(minutes=5)
>>> start = datetime.datetime.strptime( start, '%H:%M:%S' )
>>> end = datetime.datetime.strptime( end, '%H:%M:%S' )
>>> t = start
>>> while t <= end:
...     print(t.strftime('%H:%M:%S'))
...     t += delta
...
07:00:00
07:05:00
07:10:00
07:15:00
07:20:00
07:25:00
07:30:00
07:35:00
07:40:00
07:45:00
07:50:00
07:55:00
08:00:00
08:05:00
08:10:00
08:15:00
08:20:00
08:25:00
08:30:00
08:35:00
08:40:00
08:45:00
08:50:00
08:55:00
09:00:00
09:05:00
09:10:00
09:15:00
09:20:00
09:25:00
09:30:00
09:35:00
09:40:00
09:45:00
09:50:00
09:55:00
10:00:00
10:05:00
10:10:00
10:15:00
10:20:00
10:25:00
10:30:00
10:35:00
10:40:00
10:45:00
10:50:00
10:55:00
11:00:00
11:05:00
11:10:00
11:15:00
11:20:00
11:25:00
11:30:00
11:35:00
11:40:00
11:45:00
11:50:00
11:55:00
12:00:00
12:05:00
12:10:00
12:15:00
12:20:00
12:25:00
12:30:00
12:35:00
12:40:00
12:45:00
12:50:00
12:55:00
13:00:00
13:05:00
13:10:00
13:15:00
13:20:00
13:25:00
13:30:00
13:35:00
13:40:00
13:45:00
13:50:00
13:55:00
14:00:00
14:05:00
14:10:00
14:15:00
14:20:00
14:25:00
14:30:00
14:35:00
14:40:00
14:45:00
14:50:00
14:55:00
15:00:00
15:05:00
15:10:00
15:15:00
15:20:00
15:25:00
15:30:00
15:35:00
15:40:00
15:45:00
15:50:00
15:55:00
16:00:00
16:05:00
16:10:00
16:15:00
16:20:00
16:25:00
16:30:00
16:35:00
16:40:00
16:45:00
16:50:00
16:55:00
17:00:00
>>>
Try:
# import modules
from datetime import datetime, timedelta
# Create starting and end datetime object from string
start = datetime.strptime("07:00:00", "%H:%M:%S")
end = datetime.strptime("17:00:00", "%H:%M:%S")
# min_gap
min_gap = 5
# compute datetime interval
arr = [(start + timedelta(hours=min_gap*i/60)).strftime("%H:%M:%S")
for i in range(int((end-start).total_seconds() / 60.0 / min_gap))]
print(arr)
# ['07:00:00', '07:05:00', '07:10:00', '07:15:00', '07:20:00', '07:25:00', '07:30:00', ..., '16:55:00']
Explanations:
First, you need to convert the time string to a datetime object. strptime does that.
Then, we find the number of minutes between the starting and ending datetimes. We can do it like this:
(end-start).total_seconds() / 60.0
However, in our case, we only want to iterate every n minutes. So, in our loop, we need to divide it by n.
Also, as we will iterate over this number of steps, we need to convert it to int for the for loop. That results in:
int((end-start).total_seconds() / 60.0 / min_gap)
Then, for each element of the loop, we add the elapsed time to the initial datetime. timedelta is designed for this. As its parameter, we specify the number of hours we want to add: min_gap*i/60. Note that range() stops before the final step, so the list ends at 16:55:00.
Finally, we convert this datetime object back to a string using strftime.
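If the end time itself should be part of the list, as in the question, the step count needs one extra element. A small sketch of that variant follows; the function name time_steps and its parameters are hypothetical, and it steps in whole minutes rather than fractional hours.

```python
from datetime import datetime, timedelta

def time_steps(start, end, minutes=5):
    """Return HH:MM:SS strings from start to end (inclusive) in fixed steps."""
    fmt = "%H:%M:%S"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    step = timedelta(minutes=minutes)
    # timedelta // timedelta gives the number of whole steps; +1 includes the end point
    count = (t1 - t0) // step + 1
    return [(t0 + i * step).strftime(fmt) for i in range(count)]

arr = time_steps("07:00:00", "17:00:00")
print(arr[:3], arr[-2:])
```

Integer arithmetic on timedelta objects sidesteps the float rounding that dividing by 60.0 can introduce.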

Flagging list of datetimes within date ranges in pandas dataframe

I've looked around (e.g. Python - Locating the closest timestamp) but can't find anything on this.
I have a list of datetimes, and a dataframe containing 10k + rows, of start and end times (formatted as datetimes).
The dataframe is effectively listing parameters for runs of an instrument.
The list describes times from an alarm event.
The datetime list items each fall within a row (i.e. between a start and end time) of the dataframe. Is there an easy way to locate the rows whose time range contains each alarm time? (Sorry for the poor wording!)
E.g.
for i in alarms:
    df.loc[(df.start_time < i) & (df.end_time > i), 'Flag'] = 'Alarm'
(This didn't work, but it shows my approach.)
Example datasets
# making list of datetimes for the alarms
df = pd.DataFrame({'Alarms':["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"]})
df['Alarms'] = pd.to_datetime(df['Alarms'])
alarms = list(df.Alarms.unique())
# dataframe of runs containing start and end times
n=33
rng1 = pd.date_range('2019-07-18', '2019-07-22', periods=n)
rng2 = pd.date_range('2019-07-18 03:00:00', '2019-07-22 03:00:00', periods=n)
df = pd.DataFrame({ 'start_date': rng1, 'end_Date': rng2})
Here, a flag should be set on rows (well, indices) 4, 13 and 21.
You can use pandas.IntervalIndex here:
# Create and set IntervalIndex
intervals = pd.IntervalIndex.from_arrays(df.start_date, df.end_Date)
df = df.set_index(intervals)
# Update using loc
df.loc[alarms, 'flag'] = 'alarm'
# Finally, reset_index
df = df.reset_index(drop=True)
[out]
start_date end_Date flag
0 2019-07-18 00:00:00 2019-07-18 03:00:00 NaN
1 2019-07-18 03:00:00 2019-07-18 06:00:00 NaN
2 2019-07-18 06:00:00 2019-07-18 09:00:00 NaN
3 2019-07-18 09:00:00 2019-07-18 12:00:00 NaN
4 2019-07-18 12:00:00 2019-07-18 15:00:00 alarm
5 2019-07-18 15:00:00 2019-07-18 18:00:00 NaN
6 2019-07-18 18:00:00 2019-07-18 21:00:00 NaN
7 2019-07-18 21:00:00 2019-07-19 00:00:00 NaN
8 2019-07-19 00:00:00 2019-07-19 03:00:00 NaN
9 2019-07-19 03:00:00 2019-07-19 06:00:00 NaN
10 2019-07-19 06:00:00 2019-07-19 09:00:00 NaN
11 2019-07-19 09:00:00 2019-07-19 12:00:00 NaN
12 2019-07-19 12:00:00 2019-07-19 15:00:00 NaN
13 2019-07-19 15:00:00 2019-07-19 18:00:00 alarm
14 2019-07-19 18:00:00 2019-07-19 21:00:00 NaN
15 2019-07-19 21:00:00 2019-07-20 00:00:00 NaN
16 2019-07-20 00:00:00 2019-07-20 03:00:00 NaN
17 2019-07-20 03:00:00 2019-07-20 06:00:00 NaN
18 2019-07-20 06:00:00 2019-07-20 09:00:00 NaN
19 2019-07-20 09:00:00 2019-07-20 12:00:00 NaN
20 2019-07-20 12:00:00 2019-07-20 15:00:00 NaN
21 2019-07-20 15:00:00 2019-07-20 18:00:00 alarm
22 2019-07-20 18:00:00 2019-07-20 21:00:00 NaN
23 2019-07-20 21:00:00 2019-07-21 00:00:00 NaN
24 2019-07-21 00:00:00 2019-07-21 03:00:00 NaN
25 2019-07-21 03:00:00 2019-07-21 06:00:00 NaN
26 2019-07-21 06:00:00 2019-07-21 09:00:00 NaN
27 2019-07-21 09:00:00 2019-07-21 12:00:00 NaN
28 2019-07-21 12:00:00 2019-07-21 15:00:00 NaN
29 2019-07-21 15:00:00 2019-07-21 18:00:00 NaN
30 2019-07-21 18:00:00 2019-07-21 21:00:00 NaN
31 2019-07-21 21:00:00 2019-07-22 00:00:00 NaN
32 2019-07-22 00:00:00 2019-07-22 03:00:00 NaN
You named your columns start_date and end_Date, but in your for loop you use start_time and end_time.
Try this:
import pandas as pd
df = pd.DataFrame({'Alarms': ["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"]})
df['Alarms'] = pd.to_datetime(df['Alarms'])
alarms = list(df.Alarms.unique())
# dataframe of runs containing start and end times
n = 33
rng1 = pd.date_range('2019-07-18', '2019-07-22', periods=n)
rng2 = pd.date_range('2019-07-18 03:00:00', '2019-07-22 03:00:00', periods=n)
df = pd.DataFrame({'start_date': rng1, 'end_Date': rng2})
for i in alarms:
    df.loc[(df.start_date < i) & (df.end_Date > i), 'Flag'] = 'Alarm'
print(df[df['Flag']=='Alarm']['Flag'])
Output:
4 Alarm
13 Alarm
21 Alarm
Name: Flag, dtype: object
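For a longer alarm list, the per-alarm loop can be replaced by a single broadcast comparison. This is only a sketch: it reuses the question's example data, parses the alarm strings with an explicit day-first format (an assumption about how they should be read), and builds a rows-by-alarms boolean matrix, so memory grows with the product of the two sizes. The names starts, ends and hits are illustrative.

```python
import pandas as pd

# Alarm times, parsed with an explicit day/month/year format
alarms = pd.to_datetime(
    ["18/07/19 14:56:21", "19/07/19 15:05:15", "20/07/19 15:46:00"],
    format="%d/%m/%y %H:%M:%S",
)

# Runs with start and end times, as in the question
n = 33
df = pd.DataFrame({
    "start_date": pd.date_range("2019-07-18", "2019-07-22", periods=n),
    "end_Date": pd.date_range("2019-07-18 03:00:00", "2019-07-22 03:00:00", periods=n),
})

# Column vectors of shape (n, 1) broadcast against the alarm array of shape (3,)
starts = df["start_date"].to_numpy()[:, None]
ends = df["end_Date"].to_numpy()[:, None]
hits = (starts < alarms.to_numpy()) & (ends > alarms.to_numpy())

# A row is flagged if any alarm falls strictly inside its time range
df.loc[hits.any(axis=1), "Flag"] = "Alarm"
print(df[df["Flag"] == "Alarm"])
```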

How to sort by English date format, not American, with pandas .sort()

symb dates
4 BLK 01/03/2014 09:00:00
0 BBR 02/06/2014 09:00:00
21 HZ 02/06/2014 09:00:00
24 OMNI 02/07/2014 09:00:00
31 NOTE 03/04/2014 09:00:00
65 AMP 03/04/2016 09:00:00
40 RBY 04/07/2014 09:00:00
Here's a sample of the output from df.sort('date').
As you can see, it uses the days for the months and vice versa. Any idea how to fix this?
You can use pandas.to_datetime with the format argument, then sort the result (df.sort has been removed from pandas; use sort_values):
>> df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M:%S')
>> df.sort_values('date')
date symb
0 2014-01-03 09:00:00 BLK
1 2014-02-06 09:00:00 BBR
2 2014-02-06 09:00:00 HZ
3 2014-02-07 09:00:00 OMNI
4 2014-03-04 09:00:00 NOTE
6 2014-04-07 09:00:00 RBY
5 2016-03-04 09:00:00 AMP
You can use to_datetime to parse the column and sort_values to sort it:
#format mm/dd/YYYY
df['dates'] = pd.to_datetime(df['dates'])
print (df.sort_values('dates'))
symb dates
4 BLK 2014-01-03 09:00:00
0 BBR 2014-02-06 09:00:00
21 HZ 2014-02-06 09:00:00
24 OMNI 2014-02-07 09:00:00
31 NOTE 2014-03-04 09:00:00
40 RBY 2014-04-07 09:00:00
65 AMP 2016-03-04 09:00:00
#format dd/mm/YYYY
df['dates'] = pd.to_datetime(df['dates'], dayfirst=True)
print (df.sort_values('dates'))
symb dates
4 BLK 2014-03-01 09:00:00
31 NOTE 2014-04-03 09:00:00
0 BBR 2014-06-02 09:00:00
21 HZ 2014-06-02 09:00:00
24 OMNI 2014-07-02 09:00:00
40 RBY 2014-07-04 09:00:00
65 AMP 2016-04-03 09:00:00
Another solution is to use the parse_dates parameter of read_csv; if the format is dd/mm/YYYY, add dayfirst=True:
import pandas as pd
import numpy as np
from io import StringIO
temp=u"""symb,dates
BLK,01/03/2014 09:00:00
BBR,02/06/2014 09:00:00
HZ,02/06/2014 09:00:00
OMNI,02/07/2014 09:00:00
NOTE,03/04/2014 09:00:00
AMP,03/04/2016 09:00:00
RBY,04/07/2014 09:00:00"""
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=['dates'])
print (df)
symb dates
0 BLK 2014-01-03 09:00:00
1 BBR 2014-02-06 09:00:00
2 HZ 2014-02-06 09:00:00
3 OMNI 2014-02-07 09:00:00
4 NOTE 2014-03-04 09:00:00
5 AMP 2016-03-04 09:00:00
6 RBY 2014-04-07 09:00:00
print (df.dtypes)
symb object
dates datetime64[ns]
dtype: object
print (df.sort_values('dates'))
symb dates
0 BLK 2014-01-03 09:00:00
1 BBR 2014-02-06 09:00:00
2 HZ 2014-02-06 09:00:00
3 OMNI 2014-02-07 09:00:00
4 NOTE 2014-03-04 09:00:00
6 RBY 2014-04-07 09:00:00
5 AMP 2016-03-04 09:00:00
#after testing replace 'StringIO(temp)' to 'filename.csv'
df = pd.read_csv(StringIO(temp), parse_dates=['dates'], dayfirst=True)
print (df)
symb dates
0 BLK 2014-03-01 09:00:00
1 BBR 2014-06-02 09:00:00
2 HZ 2014-06-02 09:00:00
3 OMNI 2014-07-02 09:00:00
4 NOTE 2014-04-03 09:00:00
5 AMP 2016-04-03 09:00:00
6 RBY 2014-07-04 09:00:00
print (df.dtypes)
symb object
dates datetime64[ns]
dtype: object
print (df.sort_values('dates'))
symb dates
0 BLK 2014-03-01 09:00:00
4 NOTE 2014-04-03 09:00:00
1 BBR 2014-06-02 09:00:00
2 HZ 2014-06-02 09:00:00
3 OMNI 2014-07-02 09:00:00
6 RBY 2014-07-04 09:00:00
5 AMP 2016-04-03 09:00:00
I am not sure how you are getting the data, but if you are importing it from a source such as a CSV file, you could use pandas.read_csv and set the parse_dates argument. The question is: what is the type of the dates column? You can easily change the values to date-like objects using dateutil.parser.parse. For example,
import pandas
import dateutil.parser
data = {'symb': ['BLK', 'BBR', 'HZ', 'OMNI', 'NOTE', 'AMP', 'RBY'],
        'dates': ['01/03/2014 09:00:00', '02/06/2014 09:00:00', '02/06/2014 09:00:00',
                  '02/07/2014 09:00:00', '03/04/2014 09:00:00', '03/04/2016 09:00:00',
                  '04/07/2014 09:00:00']}
df = pandas.DataFrame.from_dict(data)
df.dates = df.dates.apply(dateutil.parser.parse)
print(df.to_string())
# OUTPUT
# 0 2014-01-03 09:00:00 BLK
# 1 2014-02-06 09:00:00 BBR
# 2 2014-02-06 09:00:00 HZ
# 3 2014-02-07 09:00:00 OMNI
# 4 2014-03-04 09:00:00 NOTE
# 5 2016-03-04 09:00:00 AMP
# 6 2014-04-07 09:00:00 RBY
This gets you the ISO 8601 format, which may be preferable to the dd/mm/yyyy format, but if you must have that format you can use the code recommended by @umutto.
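When the strings are known to be day-first, spelling the format out removes any guessing by the parser. The sketch below assumes the question's sample data and a column named dates; the variable result and the choice of a stable sort (so that rows with equal dates keep their input order) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "symb": ["BLK", "BBR", "HZ", "OMNI", "NOTE", "AMP", "RBY"],
    "dates": ["01/03/2014 09:00:00", "02/06/2014 09:00:00", "02/06/2014 09:00:00",
              "02/07/2014 09:00:00", "03/04/2014 09:00:00", "03/04/2016 09:00:00",
              "04/07/2014 09:00:00"],
})

# %d comes before %m, so "01/03/2014" is read as 1 March 2014
df["dates"] = pd.to_datetime(df["dates"], format="%d/%m/%Y %H:%M:%S")

# mergesort is stable, so BBR stays ahead of HZ on the shared date
result = df.sort_values("dates", kind="mergesort")
print(result)
```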
