I have a dataframe like:
This time            Time difference
2000-01-01 00:00:00  -3:00
2000-03-01 05:00:00  -5:00
...                  ...
2000-01-24 16:10:00  -7:00
I'd like to convert the 2nd column (-3:00 means minus 3 hours) from a string into something like a time offset that I can use directly in arithmetic with the 1st column (which is already datetime64[ns]).
I thought pandas would have something for this, but I couldn't find anything straightforward. Does anyone have a suggestion?
You can use pd.to_timedelta:
df['Time difference'] = pd.to_timedelta(df['Time difference']+':00')
Note: I appended ':00' because pd.to_timedelta expects strings in "hh:mm:ss" format, so "-3:00" has to become "-3:00:00" to be read as minus 3 hours.
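For illustration, here is a minimal sketch of the conversion followed by the arithmetic the question asks about. The two sample rows come from the question; the "Shifted" column name is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "This time": pd.to_datetime(["2000-01-01 00:00:00", "2000-03-01 05:00:00"]),
    "Time difference": ["-3:00", "-5:00"],
})

# Pad "-3:00" to "-3:00:00" so it is parsed as hours:minutes:seconds.
df["Time difference"] = pd.to_timedelta(df["Time difference"] + ":00")

# "Shifted" is a hypothetical column name used only for this illustration.
# Adding a negative timedelta moves the timestamp back in time.
df["Shifted"] = df["This time"] + df["Time difference"]
print(df)
```

Once the column is timedelta64[ns], it can be added to or subtracted from the datetime column element-wise.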
I'm converting strings to the datetime datatype using pandas.
Here is my snippet:
df[col] = pd.to_datetime(df[col], format='%H%M%S%d%m%Y', errors='coerce')
input :
col
00000001011970
00000001011970
...
00000001011970
output:
col
1970-01-01
1970-01-01
...
1970-01-01 00:00:00
The output shows a mix of bare dates and dates with times.
I need every value to show the date with the time.
Please help me understand where I am going wrong.
The time is there. Because it is midnight (00:00:00), it just isn't displayed explicitly.
You can verify that it is there with, e.g.,
df[col].dt.minute
which will give a Series of 0's.
To print out the time explicitly, you could use
df[col].dt.strftime('%H:%M:%S')
Alter the format as you see fit.
Keep in mind that what Pandas (or a computer in general) displays does not have to match exactly what is stored. It is up to the programmer to format the output as needed. Calculations on the variables still use all of the stored information, including the parts that are not displayed.
As the other answer says, the time is there; because it is midnight (00:00:00), it is not shown explicitly. To get the date with the time written out, you can try this (note that strftime returns strings, so the column is no longer a datetime column afterwards):
df[col] = pd.to_datetime(df[col], format='%H%M%S%d%m%Y', errors='coerce').dt.strftime('%Y-%m-%d %H:%M:%S')
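To make the difference between the stored value and its display concrete, here is a small sketch. The second sample value (12:30:00) is invented for illustration, and the names parsed and as_text are hypothetical.

```python
import pandas as pd

# First value is from the question; the second is made up to include a non-midnight time.
s = pd.Series(["00000001011970", "12300001011970"])

# Parse as HHMMSSddmmYYYY; the result is a datetime64 Series.
parsed = pd.to_datetime(s, format="%H%M%S%d%m%Y", errors="coerce")

# The time component is stored even when the display omits it.
print(parsed.dt.hour.tolist())

# strftime produces text, useful for display or export but not for date arithmetic.
as_text = parsed.dt.strftime("%Y-%m-%d %H:%M:%S")
print(as_text.tolist())
```

Keeping the datetime column and formatting only at output time avoids losing the ability to do date arithmetic.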
I have a large dataframe that, in its date column, has a mixture of date formats (only 2).
Most are in the correct format but there is some data that is in a different format.
For example, most are 2013-11-07, but some are 20170510. Pandas throws an exception when I try to validate the data against a schema I have.
Is there a quick way to convert all dates to have the same format as the majority? Or do I have to do something more painful/manual?
i.e.
date \
0 2013-11-07 False
2 2013-11-07 False
... ... ... ... ... ...
3595037 20170510 NaN
3595038 20200701 NaN
Is there a quick way to convert all dates to have the same format as the majority?
Since you have only two formats, one like 2013-11-07 and the other like 20170510, it is enough to remove the "-" from the first to get a common format, i.e.
import pandas as pd
df = pd.DataFrame({'day':['2013-11-07','20170510']})
df['day'] = df['day'].str.replace('-','')
print(df)
output
day
0 20131107
1 20170510
pandas.to_datetime parses this format correctly:
df['day'] = pd.to_datetime(df['day'])
print(df)
output
day
0 2013-11-07
1 2017-05-10
Disclaimer: I converted to the format of the minority, not the majority. It is possible to convert to the majority format using a regular expression; however, if what you want in the end is datetime objects, that is an unnecessary complication.
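If the string column itself has to end up in the majority format, the regular-expression route mentioned above could look like the sketch below. The pattern is my own assumption about what the minority rows look like (exactly eight digits) and is not taken from the question's real data.

```python
import pandas as pd

df = pd.DataFrame({"day": ["2013-11-07", "20170510"]})

# Rewrite values that are exactly 8 digits (YYYYMMDD) as YYYY-MM-DD.
# Values already in YYYY-MM-DD form do not match the pattern and stay unchanged.
df["day"] = df["day"].str.replace(
    r"^(\d{4})(\d{2})(\d{2})$", r"\1-\2-\3", regex=True
)
print(df)
```

The anchors ^ and $ keep the pattern from touching anything that is not a bare 8-digit value.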
I have a dataset that stores durations such as 3 hours and 7 minutes as strings like 3.11.
I want to convert the column containing these values into datetime so that I get: 03:07.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'])
I get: 1970-01-01 00:00:00.000000003, which is obviously not what I want.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'], format='%H:%M')
I get the following error: ValueError: time data '3' does not match format '%H:%M' (match)
Any help is highly appreciated
You want to convert these values to timedelta rather than datetime, because they are durations and not points in time. So use pd.to_timedelta, treating the number as decimal hours:
pd.to_timedelta(df["ConnectedDuration"].astype('float'), unit='h')
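As a sketch of how to get from there to the 03:07 display the question asks for: 3.11 hours is 3 h 6 min 36 s, so I round to the nearest minute before formatting. The second sample value and the hh_mm column name are made up for illustration.

```python
import pandas as pd

# "3.11" is from the question; "0.5" is an invented second sample.
df = pd.DataFrame({"ConnectedDuration": ["3.11", "0.5"]})

# Interpret the numbers as decimal hours.
durations = pd.to_timedelta(df["ConnectedDuration"].astype(float), unit="h")

# Round to whole minutes so 3 h 6 min 36 s becomes 3 h 7 min.
rounded = durations.dt.round("min")

# Build an HH:MM string from the components.
# Assumes every duration is under 24 hours; the "hours" component excludes whole days.
parts = rounded.dt.components
df["hh_mm"] = (
    parts["hours"].astype(str).str.zfill(2)
    + ":"
    + parts["minutes"].astype(str).str.zfill(2)
)
print(df)
```

The timedelta Series (durations) is the one to keep for calculations; the hh_mm strings are only for display.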
Currently, I have a series of Datetime Values that display as so
0 Datetime
1 20041001
2 20041002
3 20041003
4 20041004
they are within a series named
d['Datetime']
They were originally something like
20041001ABCDEF
But I split the end off just to leave them with the remaining numbers. How do I go about putting them into the following format?
2004-10-01
You can parse the strings with an explicit format:
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y%m%d')
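A self-contained sketch of the whole path, starting from strings shaped like the original 20041001ABCDEF values. Slicing the first eight characters is my assumption about how the suffix was split off; the names raw and dates are hypothetical.

```python
import pandas as pd

# Sample values shaped like the original strings described in the question.
raw = pd.Series(["20041001ABCDEF", "20041002ABCDEF"])

# Keep the first 8 characters (YYYYMMDD) and parse them with an explicit format.
dates = pd.to_datetime(raw.str[:8], format="%Y%m%d")
print(dates)

# If text in the form 2004-10-01 is needed rather than datetime values:
print(dates.dt.strftime("%Y-%m-%d").tolist())
```

A datetime64 Series already displays as 2004-10-01, so the strftime step is only needed when the result must be a string.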
I am trying to use pandas.DatetimeIndex.asof() to find the closest value to a certain date. However, what is the input for this function exactly?
The documentation states that the input is a label but of what format?
To be more specific, I have a DataFrame that looks like this, where the datetime column is set as an index. I want the code to return the index of the row whose datetime is closest to 2018-07-28 13:00:00.
datetime | price
2018-07-28 12:57:13 8.50
2018-07-28 12:59:45 8.60
2018-07-28 13:01:19 8.70
2018-07-28 13:03:27 8.65
Agreed, the use of the word label in the documentation is unclear. The format should be the same as your datetime format. For example:
# If datetime column is already in datetime format:
df.set_index(df.datetime).asof('2018-07-28 13:00:00')
# If datetime is not already in proper datetime format
df.set_index(pd.to_datetime(df.datetime)).asof('2018-07-28 13:00:00')
This returns a Series holding the last row at or before the given time:
datetime 2018-07-28 12:59:45
price 8.6
Name: 2018-07-28 13:00:00, dtype: object
Alternative solution (better IMO)
I think a better way to do this, though, is to subtract your target datetime from the datetime column, find the minimum absolute difference, and extract that row using loc. This way you get the true closest value, including rows that come after the target (asof is limited to the most recent label up to and including the passed label, as noted in the docs you linked).
>>> df.loc[abs(df.datetime - pd.to_datetime('2018-07-28 13:00:00')).idxmin()]
datetime 2018-07-28 12:59:45
price 8.6
Name: 1, dtype: object
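The difference between "most recent at or before" and "truly nearest" is easier to see with a target where the two disagree. The sketch below uses the question's data with datetime as the index, but a different target (13:01:00) chosen for illustration. It uses Index.get_indexer with method="nearest" as another way to find the closest label, which assumes the index is sorted.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [8.50, 8.60, 8.70, 8.65]},
    index=pd.to_datetime([
        "2018-07-28 12:57:13",
        "2018-07-28 12:59:45",
        "2018-07-28 13:01:19",
        "2018-07-28 13:03:27",
    ]),
)
df.index.name = "datetime"

# Illustrative target; not the one used in the question.
target = pd.Timestamp("2018-07-28 13:01:00")

# asof: the last index label at or before the target.
previous_label = df.index.asof(target)

# Nearest label in either direction; requires a sorted (monotonic) index.
pos = df.index.get_indexer([target], method="nearest")[0]
nearest_label = df.index[pos]

print(previous_label, nearest_label)
print(df.loc[nearest_label])
```

Here asof picks 12:59:45 (75 seconds earlier), while the nearest label is 13:01:19 (19 seconds later).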