I'm trying to read a CSV file, where some columns have date or time values.
I started with this:
import pandas as pd
from datetime import datetime
timeparse = lambda x: datetime.strptime(x, '%H:%M:%S.%f')
lap_times = pd.read_csv(
    'data/lap_times.csv',
    parse_dates={'time_datetime': ['time']},
    date_parser=timeparse
)
But some rows in that column use the format %M:%S.%f and others use %H:%M:%S.%f, so I get an error.
I thought about writing a function like the one below, but I can't see how to pass it an argument so that it transforms each row of the column.
def timeparse_1():
    try:
        return datetime.strptime(x, '%H:%M:%S.%f')
    finally:
        return datetime.strptime(x, '%M:%S.%f')
But I'm getting:
NameError: name 'x' is not defined
It would be easier if you post a sample of your CSV file, but something like this may work:
import pandas as pd
from datetime import datetime as dt
df = pd.DataFrame({'Time': ['12:34:56', '12:34:56.789']})
df.Time = df.Time.apply(lambda x: dt.strptime(x, '%H:%M:%S.%f') if len(x) > 8 else dt.strptime(x, '%H:%M:%S'))
Which will result in:
>>> df.Time
0   1900-01-01 12:34:56.000
1   1900-01-01 12:34:56.789
Name: Time, dtype: datetime64[ns]
But there is a better way:
import pandas as pd
df = pd.DataFrame({'Time': ['12:34:56', '12:34:56.789']})
df.Time = df.Time.apply(pd.to_datetime)
Which results in the following:
>>> df.Time
0   2022-11-20 12:34:56.000
1   2022-11-20 12:34:56.789
Name: Time, dtype: datetime64[ns]
Here pd.to_datetime fills in today's date (2022-11-20 when this was run) to complete the datetime object.
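The try/finally idea from the question can be made to work by giving the function a parameter and using except instead of finally (a finally block always runs, so its return overrides the one in try). The following is only a minimal sketch, assuming the column contains just the two layouts %H:%M:%S.%f and %M:%S.%f; the name parse_lap_time and the sample values are made up for illustration.

```python
from datetime import datetime

import pandas as pd


def parse_lap_time(value):
    """Parse 'H:M:S.f' first and fall back to 'M:S.f' (hypothetical helper)."""
    try:
        return datetime.strptime(value, '%H:%M:%S.%f')
    except ValueError:
        # The hours field is missing, so try the shorter layout.
        return datetime.strptime(value, '%M:%S.%f')


# Made-up sample values, one per layout.
times = pd.Series(['1:02:03.500', '1:23.456'])
parsed = times.apply(parse_lap_time)
print(parsed)
```

Because strptime has no date to work with, every parsed value carries the placeholder date 1900-01-01.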
How can I convert "2022-03-01 1:01:42 AM" to just 1:01:42?
I tried to strip just the time out and convert to datetime format, but it keeps adding the current date to the beginning. Otherwise, it doesn't properly convert to datetime format so I can plot it later. All I want is the time in datetime format.
def time():
    df['Time'] = df['TIME'].apply(lambda x: x.split(' ')[1])
    df['Time'] = pd.to_datetime(df.Time, format = '%H:%M:%S', errors='ignore').dt.time
How about using a simple date format?
from datetime import datetime
now = datetime.now()
print (now.strftime("%H:%M:%S"))
There is no need to split it. Simply:
import pandas as pd
df = pd.DataFrame({'ID':[1,2],
                   'TIME':['2022-03-01 1:01:42 AM', '2022-03-01 12:01:42 PM']})
# errors='ignore' is dropped here: it is deprecated in recent pandas, and an unparsed column would break .dt anyway
df['Time'] = pd.to_datetime(df.TIME).dt.time
Output:
df.iloc[0]['Time']
Out[1]: datetime.time(1, 1, 42)
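If you would rather not rely on format inference, the layout can be spelled out explicitly. This is a sketch under the assumption that every value looks like the two samples above (date, 12-hour clock time, AM/PM marker); the intermediate name parsed is only for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame({'TIME': ['2022-03-01 1:01:42 AM', '2022-03-01 12:01:42 PM']})

# %I is the hour on a 12-hour clock and %p is the AM/PM marker.
parsed = pd.to_datetime(df['TIME'], format='%Y-%m-%d %I:%M:%S %p')

# .dt.time drops the date and leaves datetime.time objects.
df['Time'] = parsed.dt.time
print(df['Time'].tolist())
```

Note that the resulting column has object dtype (it holds datetime.time values), so it is no longer a datetime64 column.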
I have a csv file like this:
Tarih, Şimdi, Açılış, Yüksek, Düşük, Hac., Fark %
31.05.2022, 8,28, 8,25, 8,38, 8,23, 108,84M, 0,61%
(more than a thousand lines)
I want to change it like this:
Tarih, Şimdi, Açılış, Yüksek, Düşük, Hac., Fark %
5/31/2022, 8.28, 8.25, 8.38, 8.23, 108.84M, 0.61%
In particular, the "Tarih" (date) column is in Day.Month.Year format and I need it in Month/Day/Year format.
I wrote the following code:
import pandas as pd
import numpy as np
import datetime
df=pd.read_csv("qwe.csv", encoding= 'utf-8')
df.Tarih=df.Tarih.str.replace(".","/")
df.Şimdi=df.Şimdi.str.replace(",",".")
df.Açılış=df.Açılış.str.replace(",",".")
df.Yüksek=df.Yüksek.str.replace(",",".")
df.Düşük=df.Düşük.str.replace(",",".")
for i in df['Tarih']:
    q = 1
    datetime_obj = datetime.datetime.strptime(i, "%d/%m/%Y")
    df['Tarih'].loc[df['Tarih'].values == q] = datetime_obj
But the "for" loop in my code doesn't work. I need help on this. Thank you
Looking only at the date conversion: you can parse the column into datetime objects using arguments to pd.read_csv, then convert to your desired format by applying strftime to the column.
If I have the following tmp.csv:
date, value
30.05.2022, 4.2
31.05.2022, 42
01.06.2022, 420
import pandas as pd
df = pd.read_csv('tmp.csv', parse_dates=['date'], dayfirst=True)
df['date'] = df['date'].dt.strftime('%m/%d/%Y')
print(df)
Output:
         date  value
0  05/30/2022    4.2
1  05/31/2022   42.0
2  06/01/2022  420.0
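The question asked for 5/31/2022 without leading zeros, whereas strftime('%m/%d/%Y') produces 05/31/2022. One way to get the unpadded form, sketched here with made-up sample values, is to build the string from the integer month, day, and year fields. The same sketch shows a literal (non-regex) replacement of decimal commas; it assumes the numeric columns have already been read as strings.

```python
import pandas as pd

# Made-up values in the same day.month.year layout as the question.
dates = pd.Series(['31.05.2022', '01.06.2022'])
parsed = pd.to_datetime(dates, format='%d.%m.%Y')

# Build month/day/year from the integer fields so no zero padding appears.
formatted = parsed.apply(lambda d: f'{d.month}/{d.day}/{d.year}')

# Swap decimal commas for points with a literal replace, then convert to float.
prices = pd.Series(['8,28', '8,25']).str.replace(',', '.', regex=False).astype(float)

print(formatted.tolist())
print(prices.tolist())
```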
I have two DataFrame columns containing dates. The values are strings in dd/mm/YYYY format, and I want to subtract one column from the other. I can't subtract strings, so I converted them to datetime, but after the conversion some dates come out with day and month swapped (as if the format were YYYY-dd-mm), so the subtraction gives the wrong result. For example:
Initial content:
a: 05/09/2021
b: 30/09/2021
Expected result: 25 days.
Converted to datetime:
a: 2021-05-09
b: 2021-09-30 (for some reason this date stays the same)
Result: 144 days.
I'm using pandas and datetime for this project.
Is there a way to subtract these two columns and get the correct result?
--- Answer
When I used
pd.to_datetime(date, format="%d/%m/%Y")
It worked. Thank you all for your time. This is my first project in pandas. :)
df = pd.DataFrame({'Date1': ['05/09/2021'], 'Date2': ['30/09/2021']})
df = df.apply(lambda x:pd.to_datetime(x,format=r'%d/%m/%Y')).assign(Delta=lambda x: (x.Date2-x.Date1).dt.days)
print(df)
       Date1      Date2  Delta
0 2021-09-05 2021-09-30     25
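To see where the 144 days in the question comes from, it helps to parse the same string both ways. This is only an illustrative sketch: without a format, pandas reads an ambiguous string such as 05/09/2021 month-first, while 30/09/2021 can only be day-first. The variable names are made up.

```python
import pandas as pd

a, b = '05/09/2021', '30/09/2021'

# Default parsing is month-first, so this is read as 9 May 2021.
ambiguous = pd.to_datetime(a)

# With an explicit day-first format, the same string is 5 September 2021.
start = pd.to_datetime(a, format='%d/%m/%Y')
end = pd.to_datetime(b, format='%d/%m/%Y')

wrong_days = (end - ambiguous).days
right_days = (end - start).days
print(wrong_days, right_days)
```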
I just answered a similar question about subtracting dates in Python:
from datetime import datetime
date_format_str = '%Y-%m-%d %H:%M:%S.%f'
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = datetime.strptime(date_1, date_format_str)
end = datetime.strptime(date_2, date_format_str)
# get the interval between the two timestamps as a timedelta object
diff = end - start
# express that interval in hours
diff_in_hours = diff.total_seconds() / 3600
print(diff_in_hours)
# get the difference between the two calendar dates, in whole days
diff = end.date() - start.date()
print(diff.days)
With pandas:
import pandas as pd
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = pd.to_datetime(date_1, format='%Y-%m-%d %H:%M:%S.%f')
end = pd.to_datetime(date_2, format='%Y-%m-%d %H:%M:%S.%f')
# get the difference between two datetimes as timedelta object
diff = end - start
print(diff.days)
I am trying to parse some dates with datetime, but for some reason it seems to ignore my format argument. I want day/month/year format, which is the format the CSV file uses, but it doesn't come out that way when I try this:
df = pd.read_csv('test.csv', parse_dates=['Date'],
                 date_parser=lambda x: pd.to_datetime(x, format='%d/%m/%Y'))
The result still shows the dates as YYYY-MM-DD.
Why is it, as far as I can tell, "defaulting" to %Y-%m-%d?
This should work.
import datetime as dt
import pandas as pd
df = pd.read_csv('test.csv')
formatted_dates = []
for old_date in df['Date']:
    dt_obj = dt.datetime.strptime(old_date,'%d/%m/%Y')
    new_date = """{}/{}/{}""".format(dt_obj.day,dt_obj.month,dt_obj.year)
    formatted_dates.append(new_date)
df['Date'] = formatted_dates
Output:
18/1/2017
22/1/2017
31/1/2017
...
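P.S. This is not really a bug in parse_dates/date_parser in pd.read_csv: the format argument only controls how the text is read. Once parsed, the column holds datetime64 values, which have no format of their own and are always displayed as YYYY-MM-DD. To get day/month/year text you have to convert the values back to strings, as above.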
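As a vectorized alternative to the loop above, the column can be parsed once and then rendered as text with .dt.strftime. This is a sketch only: the CSV contents are made up to stand in for test.csv, and the column name Date_text is hypothetical.

```python
import io

import pandas as pd

# Stand-in for test.csv; the contents are made up.
csv_text = "Date,Value\n18/01/2017,1\n22/01/2017,2\n31/01/2017,3\n"

df = pd.read_csv(io.StringIO(csv_text))

# Parsing gives datetime64 values, which print as YYYY-MM-DD.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Formatting gives plain strings in whatever layout is requested.
df['Date_text'] = df['Date'].dt.strftime('%d/%m/%Y')
print(df)
```

Keeping the datetime64 column alongside the text column lets you still sort and subtract dates; the text column is only for display or export.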
I have written a function to convert pandas datetime dates to month-end:
import pandas
import numpy
import datetime
from pandas.tseries.offsets import Day, MonthEnd
def get_month_end(d):
    month_end = d - Day() + MonthEnd()
    if month_end.month == d.month:
        return month_end  # 31/March + MonthEnd() returns 30/April
    else:
        # raise with a formatted message (Python 3; str + Timestamp concatenation would fail)
        raise ValueError("Something went wrong while converting dates to EOM: {} was converted to {}".format(d, month_end))
This function seems to be quite slow, and I was wondering whether there is a faster alternative. I noticed it because I am running it on a dataframe column with 50,000 dates, and the code is much slower since I introduced the function (compared with before I was converting dates to end-of-month).
df = pandas.read_csv(inpath, na_values = nas, converters = {open_date: read_as_date})
df[open_date] = df[open_date].apply(get_month_end)
I am not sure if that's relevant, but I am reading the dates in as follows:
def read_as_date(x):
    return datetime.datetime.strptime(x, fmt)
Revised: converting to a period and then back to a timestamp does the trick.
In [104]: df = DataFrame(dict(date = [Timestamp('20130101'),Timestamp('20130131'),Timestamp('20130331'),Timestamp('20130330')],value=randn(4))).set_index('date')
In [105]: df
Out[105]:
               value
date
2013-01-01 -0.346980
2013-01-31  1.954909
2013-03-31 -0.505037
2013-03-30  2.545073
In [106]: df.index = df.index.to_period('M').to_timestamp('M')
In [107]: df
Out[107]:
               value
2013-01-31 -0.346980
2013-01-31  1.954909
2013-03-31 -0.505037
2013-03-31  2.545073
Note that this type of conversion can also be done as follows, although the approach above is slightly faster.
In [85]: df.index + pd.offsets.MonthEnd(0)
Out[85]: DatetimeIndex(['2013-01-31', '2013-01-31', '2013-03-31', '2013-03-31'], dtype='datetime64[ns]', name=u'date', freq=None, tz=None)
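If the date column is in datetime format, this rolls each date forward to the last day of its month (for example, the first day of a month becomes the last day of that same month; dates already at a month end are unchanged):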
df['date1']=df['date'] + pd.offsets.MonthEnd(0)
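The difference between MonthEnd(0) and MonthEnd(1) is easy to get wrong, and it is what the month check in the question's function was guarding against. A small sketch with made-up dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2013-01-01', '2013-03-30', '2013-03-31']))

# MonthEnd(0) rolls forward to the end of the same month and leaves
# dates that are already a month end unchanged.
rolled = dates + pd.offsets.MonthEnd(0)

# MonthEnd(1) moves to the next month end strictly after the date,
# so 31 March jumps to 30 April.
shifted = dates + pd.offsets.MonthEnd(1)

print(rolled.dt.strftime('%Y-%m-%d').tolist())
print(shifted.dt.strftime('%Y-%m-%d').tolist())
```

Both operations work on the whole column at once, so no per-row apply is needed.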
import pandas as pd
# df0 is assumed to be an existing DataFrame with a 'Calendar day' column
df0['Calendar day'] = pd.to_datetime(df0['Calendar day'], format='%m/%d/%Y')
# pd.datetools.normalize_date is no longer available; .dt.normalize() sets the time to midnight
df0['Calendar day'] = df0['Calendar day'].dt.normalize()
df0['Month Start Date'] = df0['Calendar day'].dt.to_period('M').apply(lambda r: r.start_time)
This code should work. Calendar day is a column in which dates are given in the format %m/%d/%Y; for example, 12/28/2014 is 28 December 2014. Note that this gives the start of the month rather than the end: for that date the output is 2014-12-01, as a pandas Timestamp.
You can also use NumPy to get the month start faster:
import numpy as np
date_array = np.array(['2013-01-01', '2013-01-15', '2013-01-30']).astype('datetime64[ns]')
month_start_date = date_array.astype('datetime64[M]')
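The same truncation trick can be extended to month ends, which is what the question asked for: truncate to the month, step one month forward, convert back to days, and step one day back. This is a sketch with made-up dates, not a benchmarked recommendation.

```python
import numpy as np

date_array = np.array(['2013-01-15', '2013-02-10', '2012-02-01'], dtype='datetime64[D]')

# Truncate each date to its month.
month_start = date_array.astype('datetime64[M]')

# First day of the following month, expressed in days.
next_month_start = (month_start + np.timedelta64(1, 'M')).astype('datetime64[D]')

# One day earlier is the last day of the original month.
month_end = next_month_start - np.timedelta64(1, 'D')

print(month_end)
```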
In case the date is not in the index but in another column (works for Pandas 0.25.0):
import pandas as pd
import numpy as np
df = pd.DataFrame(dict(date = [pd.Timestamp('20130101'),
                               pd.Timestamp('20130201'),
                               pd.Timestamp('20130301'),
                               pd.Timestamp('20130401')],
                       value = np.random.rand(4)))
print(df.to_string())
df.date = df.date.dt.to_period('M').dt.to_timestamp('M')
print(df.to_string())
Output:
        date     value
0 2013-01-01  0.295791
1 2013-02-01  0.278883
2 2013-03-01  0.708943
3 2013-04-01  0.483467
        date     value
0 2013-01-31  0.295791
1 2013-02-28  0.278883
2 2013-03-31  0.708943
3 2013-04-30  0.483467
What you are looking for might be:
df.resample('M').last()
The other method, as mentioned earlier by @Jeff:
df.index = df.index.to_period('M').to_timestamp('M')