Issue while converting pandas to datetime - python

I'm converting string to datetime datatype using pandas,
here is my snippet,
df[col] = pd.to_datetime(df[col], format='%H%M%S%d%m%Y', errors='coerce')
input :
col
00000001011970
00000001011970
...
00000001011970
output:
col
1970-01-01
1970-01-01
...
1970-01-01 00:00:00
The output is a mix of plain dates and dates with a time.
I need every value in the output as a date with time.
Please help me find out where I am going wrong.

The time is there. It just so happens, because it's midnight, 00:00:00, it is not showing explicitly.
You can check that it's there with e.g.
df[col].dt.minute
which will give a Series of 0's.
To print out the time explicitly, you could use
df[col].dt.strftime('%H:%M:%S')
Alter the format as you see fit.
Keep in mind that the visual output of anything in Pandas (or on computers in general) does not have to be exactly what is stored. It is up to the programmer to format the output into what they want. Calculations on the variables still use all of the (invisible) information.

Just like the other answer suggested, the time is there, but since it's midnight (00:00:00), it's not showing explicitly. To print out the date with time you can try this (note that .dt.strftime returns strings, so the column no longer has a datetime dtype):
df[col] = pd.to_datetime(df[col], format='%H%M%S%d%m%Y', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S')
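To make both answers concrete, here is a minimal sketch. The two sample values are made up for illustration (the second one is not from the question and only serves to show a non-midnight time); the format string is the one from the question.

```python
import pandas as pd

# Hypothetical sample data: the first value is from the question,
# the second is invented to show a non-midnight time.
df = pd.DataFrame({"col": ["00000001011970", "12345601011970"]})

# Parse with the format from the question (HHMMSS followed by DDMMYYYY).
parsed = pd.to_datetime(df["col"], format="%H%M%S%d%m%Y", errors="coerce")

# The time component is stored even when the repr hides it.
minutes = parsed.dt.minute.tolist()

# strftime forces an explicit date-with-time display (result is strings).
formatted = parsed.dt.strftime("%Y-%m-%d %H:%M:%S").tolist()

print(minutes)    # [0, 34]
print(formatted)  # ['1970-01-01 00:00:00', '1970-01-01 12:34:56']
```

If you still need to do datetime arithmetic, it is probably better to keep the parsed column as it is and only call strftime when displaying or exporting.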

Related

convert time to UTC in pandas

I have multiple csv files, I've set DateTime as the index.
df6.set_index("gmtime", inplace=True)
#correct the underscores in old datetime format
df6.index = [" ".join( str(val).split("_")) for val in df6.index]
df6.index = pd.to_datetime(df6.index)
The time was put in GMT, but I think it's been saved as BST (British summertime) when I set the clock for raspberry pi.
I want to shift the time one hour backwards. When I use
df6.tz_convert(pytz.timezone('utc'))
it gives me the error below, because the timestamps carry no time zone information:
Cannot convert tz-naive timestamps, use tz_localize to localize
How can I shift the time by one hour?
Given a column that contains date/time info as string, you would convert to datetime, localize to a time zone (here: Europe/London), then convert to UTC. You can do that before you set as index.
Ex:
import pandas as pd
dti = pd.to_datetime(["2021-09-01"]).tz_localize("Europe/London").tz_convert("UTC")
print(dti) # notice 1 hour shift:
# DatetimeIndex(['2021-08-31 23:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
Note: setting a time zone means that DST is accounted for, i.e. here, during winter you'd have UTC+0 and during summer UTC+1.
To add to FObersteiner's response (sorry, new user, can't comment on answers yet):
I've noticed that in all the real-world situations where I've run into this (with full dataframes or pandas Series instead of just a single date), .tz_localize() and .tz_convert() need to be called slightly differently.
What's worked for me is
df['column'] = pd.to_datetime(df['column']).dt.tz_localize('Europe/London').dt.tz_convert('UTC')
Without the .dt, I get "index is not a valid DatetimeIndex or PeriodIndex."
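Putting the two answers together, a minimal sketch of the column-based approach could look like this. The frame, its values, and the "value" column are made up; only the "gmtime" column name comes from the question. It assumes the timestamps were recorded in UK local time.

```python
import pandas as pd

# Hypothetical data: one summer (BST) and one winter (GMT) timestamp.
df = pd.DataFrame({
    "gmtime": ["2021-09-01 12:00:00", "2021-12-01 12:00:00"],
    "value": [1, 2],
})

# On a Series, the time zone methods live under the .dt accessor.
utc = (
    pd.to_datetime(df["gmtime"])
    .dt.tz_localize("Europe/London")  # interpret the naive times as UK local time
    .dt.tz_convert("UTC")             # then express them in UTC
)

# Set the converted column as the index afterwards.
df = df.assign(gmtime=utc).set_index("gmtime")

# On a DatetimeIndex the same attributes are available directly, without .dt.
hours = df.index.hour.tolist()
print(hours)  # [11, 12]: the summer row shifts by one hour, the winter row does not
```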

How to convert string/float to time in python/pandas?

I have a dataset which stores durations as decimal hours in string form, e.g. 3 hours and 7 minutes is stored as the string 3.11.
I want to convert the column containing these values into datetime in a way that I get: 03:07.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'])
I get: 1970-01-01 00:00:00.000000003 which is obviously not what I want.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'], format='%H:%M')
I get the following error: ValueError: time data '3' does not match format '%H:%M' (match)
Any help is highly appreciated
You want to convert these values to timedelta instead of datetime. Thus you should use the pd.to_timedelta function, like:
pd.to_timedelta(df["ConnectedDuration"].astype('float'), unit='h')
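As a rough sketch of how to get from there to an HH:MM display: the sample values below are made up apart from "3.11", and rounding to the nearest minute is my own assumption about what is wanted (3.11 hours is 3 h 6 min 36 s, which rounds to 03:07).

```python
import pandas as pd

# Hypothetical sample data; only "3.11" comes from the question.
df = pd.DataFrame({"ConnectedDuration": ["3.11", "0.5", "12.25"]})

# Interpret the strings as decimal hours and convert to timedeltas.
td = pd.to_timedelta(df["ConnectedDuration"].astype(float), unit="h")

# Round to whole minutes (assumption), then split into hours and minutes.
total_minutes = (td.dt.total_seconds() / 60).round().astype(int)
hhmm = (
    (total_minutes // 60).astype(str).str.zfill(2)
    + ":"
    + (total_minutes % 60).astype(str).str.zfill(2)
)

print(hhmm.tolist())  # ['03:07', '00:30', '12:15']
```

Keep the timedelta column around if you want to sum or compare durations; the HH:MM strings are only for display.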

Aligning datetime formats for comparison

I'm having trouble aligning two different dates. I have an Excel import which I turn into a DateTime in pandas and I would like to compare this DateTime with the current DateTime. The trouble is in the formatting of the imported DateTime.
Excel format of the date:
2020-07-06 16:06:00 (which is yyyy-dd-mm hh:mm:ss)
When I add the DateTime to my DataFrame it creates the datatype Object. After I convert it with pd.to_datetime it creates the format yyyy-mm-dd hh:mm:ss. It seems that the month and the day are getting mixed up.
Example code:
df = pd.read_excel('my path')
df['Arrival'] = pd.to_datetime(df['Arrival'], format='%Y-%d-%m %H:%M:%S')
print(df.dtypes)
Expected result:
2020-06-07 16:06:00
Actual result:
2020-07-06 16:06:00
How do I resolve this?
Gr,
Sempah
An ISO-8601 date/time is always yyyy-MM-dd, not yyyy-dd-MM. You've got the month and date positions switched around.
While localized date/time strings are inconsistent about the order of month and date, this particular format where the year comes first always starts with the biggest units (years) and decreases in unit size going right (month, date, hour, etc.)
It's solved. I think that I misunderstood the results. It was already working without my knowledge. Thanks for the help anyway.
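For anyone landing here later, a small sketch of the difference between the two readings of the same string. It starts from a string value for illustration, whereas read_excel may well hand you datetimes already; the variable names are made up.

```python
import pandas as pd

# The date string from the question, as text.
raw = pd.Series(["2020-07-06 16:06:00"])

# ISO 8601 reading: year, month, day.
iso = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S")

# Forcing year, day, month swaps the two fields.
swapped = pd.to_datetime(raw, format="%Y-%d-%m %H:%M:%S")

print(iso.dt.month[0], iso.dt.day[0])          # 7 6
print(swapped.dt.month[0], swapped.dt.day[0])  # 6 7

# Once parsed, comparing with the current time is a plain comparison.
is_past = iso < pd.Timestamp.now()
```

Whichever reading is correct for your data, the printed representation of a parsed datetime is always year-month-day, so the display alone does not tell you which format was used for parsing.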

How can I calculate the number of days between two dates with different format in Python?

I have a pandas dataframe with a column of orderdates formatted like this: 2019-12-26.
However, when I take the max of this column it gives 2019-12-12, while it is actually 2019-12-26. It makes sense because my date format is Dutch and the max() function uses the 'American' (correct me if I'm wrong) format.
This means that my calculations aren't correct.
How can I change the way the function calculates? Or, if that's not possible, change the format of my date column so the calculations are correct?
[In] df['orderdate'] = df['orderdate'].astype('datetime64[ns]')
print(df["orderdate"].max())
[Out] 2019-12-12 00:00:00
Thank you!

Calculate difference between two datetimes if both present in pandas DataFrame

I currently have various time columns (DateTime format) in a pandas DataFrame, as shown below:
Entry Time Exit Time
00:30:59.555 06:30:59.555
00:56:43.200
10:30:30.500 11:30:30.500
I would like to return the difference between these times (Exit Time - Entry Time) in a new column in the dataframe if both Entry Time and Exit Time are present. Otherwise, I would like to skip the row, as shown below:
Entry Time Exit Time Time Difference
00:30:59.555 06:30:59.555 06:00:00.000
00:56:43.200
10:30:30.500 12:00:30.500 01:30:00.000
I am fairly new to Python, so my apologies if this is an obvious question. Any help would be greatly appreciated!
If your dtypes really are datetimes then it's really simple:
In [36]:
df['Difference Time'] = df['Exit Time'] - df['Entry Time']
df
Out[36]:
Entry Time Exit Time Difference Time
0 2014-08-01 00:30:59.555000 2014-08-01 06:30:59.555000 06:00:00
1 2014-08-01 00:56:43.200000 NaT NaT
2 2014-08-01 10:30:30.500000 2014-08-01 11:30:30.500000 01:00:00
[3 rows x 3 columns]
If they are not, then you need to convert them using pd.to_datetime, e.g.
df['Entry Time'] = pd.to_datetime(df['Entry Time'])
EDIT
There seems to be some additional weirdness with your data which I don't quite understand but the following seems to have worked for you:
df.dropna()['Exit_Time'] - df.dropna()['Entry_Time']
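A self-contained sketch of the whole thing, using the times from the question as strings. The explicit format string and the helper variable names are my own assumptions; when only a time is given, pandas fills in a default date, which cancels out in the subtraction.

```python
import pandas as pd

# Sample data from the question; the missing exit time is represented as None.
df = pd.DataFrame({
    "Entry Time": ["00:30:59.555", "00:56:43.200", "10:30:30.500"],
    "Exit Time": ["06:30:59.555", None, "11:30:30.500"],
})

# Convert the strings to datetimes; missing values become NaT.
entry = pd.to_datetime(df["Entry Time"], format="%H:%M:%S.%f")
exit_ = pd.to_datetime(df["Exit Time"], format="%H:%M:%S.%f")

# Subtraction propagates NaT, so incomplete rows get no difference.
df["Time Difference"] = exit_ - entry

# Optionally keep only the rows where both times were present.
complete = df.dropna(subset=["Time Difference"])

print(df["Time Difference"].tolist())
print(len(complete))  # 2
```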
