How to convert normal string to datetime in Pandas - python

I have more than a million data points, stored as plain strings in a .csv file, in the format below:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
I loaded the data into pandas and tried to convert the column to datetime using pandas.to_datetime(df['Datetime']). However, the resulting time series is not correct: the conversion produced datetimes that are not in the original data, for example 2016-12-11 23:30:00.

It has been a while since I worked with pandas, but the output in your example uses a different date format than the example lines from the CSV:
yyyy-mm-dd hh:mm:ss
instead of
dd/mm/yyyy hh:mm:ss
The to_datetime function takes a "format" parameter; passing it should help if the ambiguous day/month order is the cause.
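A minimal sketch of the format parameter mentioned above, assuming the strings are day-first (dd/mm/yyyy) as in the question's sample. The names raw and parsed are only illustrative.

```python
import pandas as pd

# Sample values in the question's day-first layout (dd/mm/yyyy hh:mm:ss).
raw = pd.Series([
    "22/12/2015 17:00:00",
    "22/12/2015 18:00:00",
    "11/12/2015 23:30:00",
])

# An explicit format leaves no room for guessing the day/month order.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M:%S")

print(parsed)
```

With an explicit format, a row that does not match raises an error instead of being silently reinterpreted, which makes bad input easier to spot.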

You want to use the option dayfirst=True
pd.to_datetime(df.Datetime, dayfirst=True)
This:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
11/12/2015 23:30:00
Gets converted to
0 2015-12-22 17:00:00
1 2015-12-22 18:00:00
2 2015-12-11 23:30:00
Name: Datetime, dtype: datetime64[ns]

Related

Datetime dataframe conversion

I am able to convert each column to datetime64[ns] individually as a Series, but when I try to do it over the whole DataFrame I get this error:
df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']]=pd.to_datetime(df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']],format='%d-%m-%Y %H:%M:%S')
to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Date Range           ME Created Date/Time  Ready For Books Date/Time
11-05-2022 00:00:00  02-05-2022 14:31:37   11-05-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
I solved it with the apply method, but I wanted to do it directly with pd.to_datetime().
df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']] = df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']].apply(pd.to_datetime, format='%d-%m-%Y %H:%M:%S')
So I have 2 questions:
Is it possible to use to_datetime() directly on the dataframe as shown above without apply method?
Is it possible for to_datetime() to return the output as 'Date' without the input timestamp & without the help of .dt.date accessor?
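I'm not sure this is the most efficient way, but it is certainly one of the easiest to read. Pass the format explicitly so that day-first values such as 11-05-2022 are not read as month-first:
df = df.applymap(lambda x: pd.to_datetime(x, format='%d-%m-%Y %H:%M:%S').date())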
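On the first question, a hedged sketch of why the direct call fails. When pd.to_datetime receives a DataFrame, it tries to assemble one datetime per row from columns named year, month and day, which is what the error message refers to. Converting several string columns therefore still goes column by column. The frames parts and raw below are made-up samples, and .dt.normalize() is one possible way to drop the time of day while keeping a datetime64 dtype.

```python
import pandas as pd

# Case 1: a DataFrame with year/month/day columns is assembled row by row.
parts = pd.DataFrame({"year": [2022, 2022], "month": [5, 9], "day": [11, 10]})
assembled = pd.to_datetime(parts)

# Case 2: several string columns are converted one column at a time.
raw = pd.DataFrame({
    "Date Range": ["11-05-2022 00:00:00", "10-09-2022 00:00:00"],
    "ME Created Date/Time": ["02-05-2022 14:31:37", "06-09-2022 14:19:03"],
})
converted = raw.apply(
    # normalize() resets the time to midnight; the dtype stays datetime64.
    lambda col: pd.to_datetime(col, format="%d-%m-%Y %H:%M:%S").dt.normalize()
)

print(assembled)
print(converted)
```

Unlike .dt.date, which yields Python date objects in an object column, normalize() keeps the columns usable with the usual datetime operations.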

Python - Pandas, issue with datetime format

I have a two-columns data frame, with departure and arrival times (see example below).
In order to perform operations on those times, I want to convert the strings to datetime format, keeping only the hour/minute/second information.
Example of input data - file name = table
departure_time,arrival_time
07:00:00,07:30:00
07:00:00,07:15:00
07:05:00,07:22:00
07:10:00,07:45:00
07:15:00,07:50:00
07:10:00,07:26:00
07:40:00,08:10:00
I ran this code to import the table file and then convert the column to datetime format:
import pandas as pd
from datetime import datetime
df= pd.read_excel("table.xlsx")
df['arrival_time']= pd.to_datetime(df['arrival_time'], format= '%H:%M:%S')
but get this error:
ValueError: time data ' 07:30:00' does not match format '%H:%M:%S' (match)
What mistake am I making?
This looks like an import issue: the value ' 07:30:00' has a space in front. If you're importing a CSV, you can pass skipinitialspace=True to read_csv.
If I import your CSV file, and use your code, it works fine:
CSV:
departure_time,arrival_time
07:00:00,07:30:00
07:00:00,07:15:00
07:05:00,07:22:00
07:10:00,07:45:00
07:15:00,07:50:00
07:10:00,07:26:00
07:40:00,08:10:00
df = pd.read_csv('test.csv', skipinitialspace=True)
df['arrival_time']= pd.to_datetime(df['arrival_time'], format='%H:%M:%S').dt.time
print(df)
  departure_time arrival_time
0       07:00:00     07:30:00
1       07:00:00     07:15:00
2       07:05:00     07:22:00
3       07:10:00     07:45:00
4       07:15:00     07:50:00
5       07:10:00     07:26:00
6       07:40:00     08:10:00
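If the data comes from an Excel file or is already loaded, skipinitialspace is not available, so a possible alternative is to strip the whitespace from the strings before parsing. This is a small sketch with made-up rows that reproduce the leading space from the error message.

```python
import pandas as pd

# Made-up rows; arrival_time carries the stray leading space from the error.
df = pd.DataFrame({
    "departure_time": ["07:00:00", "07:05:00"],
    "arrival_time": [" 07:30:00", " 07:22:00"],
})

# Remove surrounding whitespace, then parse with the strict format.
cleaned = df["arrival_time"].str.strip()
df["arrival_time"] = pd.to_datetime(cleaned, format="%H:%M:%S").dt.time

print(df)
```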

Pandas dataframe not including time of day when converting from UNIX

I am retrieving data from an API which is timestamped in UNIX millisecond time and am trying to save this data to a CSV file. The data is in daily intervals but represented in UNIX millisecond time as mentioned.
I am using pandas functions to convert from milliseconds to datetime, but it is still not saving the data with the time-of-day part. My code is as follows:
import pandas as pd
import requests

ticker = 'tBTCUSD'
# url and params are defined elsewhere (not shown here)
r = requests.get(url, params = params)
data = pd.DataFrame(r.json())
data.set_index([0], inplace = True)
data.index = pd.to_datetime(data.index, unit = 'ms' )
data.to_csv('bitfinex_{}_usd_{}.csv'.format(ticker[1:-3].lower(), '1D'), mode='a', header=False)
It saves the data as 2020-08-21 instead of 2020-08-21 00:00:00. When I poll the API on say, an hourly or 15-minutely basis, that still includes the time but on daily intervals it doesn't. I was wondering if there is a step that I am missing to convert the time accordingly from UNIX millisecond to a %Y-%m-%d %H:%M:%S %Z format?
You can always explicitly specify the format:
data.index = pd.to_datetime(data.index, unit='ms').strftime('%Y-%m-%d %H:%M:%S UTC')
print(data)
1 2 3 4 5
0
2020-09-10 00:00:00 UTC 10241.000000 10333.862868 10516.00000 10233.087967 3427.178984
2020-09-09 00:00:00 UTC 10150.000000 10240.000000 10359.00000 10010.000000 2406.147398
2020-09-08 00:00:00 UTC 10400.000000 10148.000000 10464.00000 9882.400000 6761.138356
2020-09-07 00:00:00 UTC 10275.967600 10397.000000 10430.00000 9913.800000 6301.951492
2020-09-06 00:00:00 UTC 10197.000000 10276.000000 10365.07422 10031.000000 2755.663001
... ... ... ... ... ...
2020-05-18 00:00:00 UTC 9668.200000 9714.825163 9944.00000 9450.000000 9201.536549
2020-05-17 00:00:00 UTC 9386.000000 9668.200000 9883.50000 9329.700000 9663.262087
2020-05-16 00:00:00 UTC 9307.600000 9387.952090 9580.00000 9222.000000 4157.691762
2020-05-15 00:00:00 UTC 9791.000000 9311.200000 9848.90000 9130.200000 11340.269781
2020-05-14 00:00:00 UTC 9311.967387 9790.954158 9938.70000 9266.200000 12867.687617
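Another option, sketched here with made-up values rather than real API data, is to keep the index as datetimes and let to_csv do the formatting through its date_format parameter. That way the index stays a DatetimeIndex in memory and only the written text changes.

```python
import pandas as pd

# Two illustrative daily rows keyed by UNIX milliseconds (not real API data).
data = pd.DataFrame(
    {1: [11500.0, 11600.0]},
    index=[1597968000000, 1598054400000],
)
data.index = pd.to_datetime(data.index, unit="ms")

# With no path given, to_csv returns the CSV text instead of writing a file.
csv_text = data.to_csv(header=False, date_format="%Y-%m-%d %H:%M:%S")

print(csv_text)
```

The same date_format argument can be passed when writing to a file path with mode='a'.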

How can I parse this date format into datetime? Python/Pandas

The starting date format I currently have is 2019-09-04 16:00 UTC+3 and I'm trying to convert it into a datetime format of 2019-09-04 16:00:00+0300.
The format I thought would work was format='%Y-%m-%d %H:%M %Z%z', but when I run it I get the error message ValueError: Cannot parse both %Z and %z.
Does anyone know the correct format to use, or should I be trying a different method altogether? Thanks.
Edit
Sorry, I had a hard time putting into words what it is I am looking to do, hopefully I can clarify.
I'm looking to change all the date and times in a dataframe into the datetime format.
This is the method I was trying to use which presented me with an error
df['datepicker'] = pd.to_datetime(df['datepicker'], format='%Y-%m-%d %H:%M %Z%z')
And here is a sample of the data I currently have.
datepicker
2019-09-07 16:00 UTC+2
2019-09-04 18:30 UTC+4
2019-09-06 17:00 UTC±0
2019-09-10 16:00 UTC+1
2019-09-04 18:00 UTC+3
And this is what I'm looking to convert them into, a timestamp format.
datepicker
2019-09-07 16:00:00+0200
2019-09-04 18:30:00+0400
2019-09-06 17:00:00+0000
2019-09-10 16:00:00+0100
2019-09-04 18:00:00+0300
pandas.to_datetime should parse this happily if you tweak the strings slightly:
import pandas as pd
df = pd.DataFrame({"datepicker":[ "2019-09-07 16:00 UTC+2", "2019-09-04 18:30 UTC+4",
"2019-09-06 17:00 UTC±0", "2019-09-10 16:00 UTC+1",
"2019-09-04 18:00 UTC+3"]})
df['datetime'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'))
# df['datetime']
# 0 2019-09-07 16:00:00-02:00
# 1 2019-09-04 18:30:00-04:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 16:00:00-01:00
# 4 2019-09-04 18:00:00-03:00
# Name: datetime, dtype: object
Note that due to the mixed UTC offsets, the column's data type is 'object' (datetime objects). If you wish, you can also convert to UTC straight away, to get a column of dtype datetime64[ns, UTC]:
df['UTC'] = pd.to_datetime(df['datepicker'].str.replace('±', '+'), utc=True)
# df['UTC']
# 0 2019-09-07 18:00:00+00:00
# 1 2019-09-04 22:30:00+00:00
# 2 2019-09-06 17:00:00+00:00
# 3 2019-09-10 17:00:00+00:00
# 4 2019-09-04 21:00:00+00:00
# Name: UTC, dtype: datetime64[ns, UTC]
When I define the timezone as below, it works as you expect.
from datetime import datetime, timedelta, timezone
UTC = timezone(timedelta(hours=+3))
dt = datetime(2019, 1, 1, 12, 0, 0, tzinfo=UTC)
timestampStr = dt.strftime("%Y-%m-%d %H:%M %Z%z")
print(timestampStr)
With the output of:
2019-01-01 12:00 UTC+03:00+0300
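Note that the parsed output in the first answer shows -02:00 for "UTC+2", while the question asks for +0200. If the strings are meant as "hours ahead of UTC", one hedged alternative is to avoid relying on the parser's reading of the suffix and extract the offset explicitly. The sketch below assumes whole-hour offsets and the exact "UTC<sign><hours>" suffix from the sample; the regular expression and the column names are illustrative.

```python
import pandas as pd

s = pd.Series([
    "2019-09-07 16:00 UTC+2",
    "2019-09-04 18:30 UTC+4",
    "2019-09-06 17:00 UTC±0",
])

# Split each string into the naive timestamp, the sign and the hour offset.
parts = s.str.extract(r"^(?P<naive>.+) UTC(?P<sign>[+\-±])(?P<hours>\d+)$")

naive = pd.to_datetime(parts["naive"], format="%Y-%m-%d %H:%M")
sign = parts["sign"].map({"+": 1, "-": -1, "±": 1})
offset = pd.to_timedelta(sign * parts["hours"].astype(int), unit="h")

# Assumption: "UTC+2" means local time is two hours ahead of UTC,
# so the UTC instant is the local wall time minus the offset.
utc = (naive - offset).dt.tz_localize("UTC")

print(utc)
```

From the UTC column, individual values can be shifted to a fixed offset for display if the original local wall time is needed again.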

Python - Local Time

I have a dataframe that has entries like this, where the times are in UTC:
start_date_time timezone
1 2017-01-01 14:00:00 America/Los_Angeles
2 2017-01-01 14:00:00 America/Denver
3 2017-01-01 14:00:00 America/Phoenix
4 2017-01-01 14:30:00 America/Los_Angeles
5 2017-01-01 14:30:00 America/Los_Angeles
I need to be able to group by date (local date, not UTC date) and I need to be able to create indicators for whether the event happened between certain times (local times, not UTC times).
I have successfully done the above in R by:
Creating a time variable in each of the timezones
Converting those to strings
Pulling each of the string date/time variables into one column, which one I pull depends on the appropriate timezone
Then, splitting that column to get a string date column and a string time column
I can then convert everything back to datetime objects for comparisons. e.g. now I can say if something happened between 2 and 3pm and it will correctly identify everything that happened between 2 and 3pm locally.
I have tried a number of approaches in Python and now have the dates as
2017-01-02 04:30:00-08:00
but I can't figure out how to go from there to
2017-01-01 20:30:00
Thanks!
Your example is incorrect. Your timezone is eight hours behind UTC, which means you need to add eight hours to 4:30AM which is 12:30PM UTC time.
The datetime method astimezone(...) will do the conversion for you. For ease of use, I recommend pytz.
However, in pure Python:
import datetime as dt
local_tz = dt.timezone(dt.timedelta(hours=-8))
utc = dt.timezone.utc
d = dt.datetime(2017, 1, 2, 4, 30, 0, 0, local_tz)
print(d, d.astimezone(utc))
Will print:
2017-01-02 04:30:00-08:00 2017-01-02 12:30:00+00:00
Here's an example using pytz to lookup time zones:
import datetime as dt
import pytz
dates = [("2017-01-01 14:00:00", "America/Los_Angeles"),
("2017-01-01 14:00:00", "America/Denver"),
("2017-01-01 14:00:00", "America/Phoenix"),
("2017-01-01 14:30:00", "America/Los_Angeles"),
]
for d, tz_str in dates:
    start = dt.datetime.strptime(d, "%Y-%m-%d %H:%M:%S")
    start = start.replace(tzinfo=pytz.utc)
    local_tz = pytz.timezone(tz_str) # convert to desired timezone
    print(start, local_tz.zone, "\t", start.astimezone(local_tz))
This produces:
2017-01-01 14:00:00+00:00 America/Los_Angeles 2017-01-01 06:00:00-08:00
2017-01-01 14:00:00+00:00 America/Denver 2017-01-01 07:00:00-07:00
2017-01-01 14:00:00+00:00 America/Phoenix 2017-01-01 07:00:00-07:00
2017-01-01 14:30:00+00:00 America/Los_Angeles 2017-01-01 06:30:00-08:00
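On Python 3.9 and later, the same idea can be sketched with the standard-library zoneinfo module instead of pytz. This assumes the start_date_time strings are UTC, as stated in the question, and that a time zone database is available on the system. The helper to_local and the 6-to-7 a.m. window are made up for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

rows = [
    ("2017-01-01 14:00:00", "America/Los_Angeles"),
    ("2017-01-01 14:00:00", "America/Denver"),
    ("2017-01-02 04:30:00", "America/Los_Angeles"),
]

def to_local(utc_text, tz_name):
    """Interpret utc_text as UTC and convert it to the named time zone."""
    utc_dt = datetime.strptime(utc_text, "%Y-%m-%d %H:%M:%S")
    utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))

local = [to_local(text, tz) for text, tz in rows]

# Local calendar date, usable as a grouping key.
local_dates = [d.date() for d in local]

# Indicator for events between 6 and 7 a.m. local time.
in_window = [6 <= d.hour < 7 for d in local]

for d, flag in zip(local, in_window):
    print(d, d.date(), flag)
```

The third row shows a UTC timestamp from 2 January landing on 1 January in local time, which is why the grouping key has to be computed after the conversion.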
