Two of the columns in my dataset are hour and mins as integers. Here's a snippet of the dataset.
I'm creating a timestamp through the following code:
TIME = pd.to_timedelta(df["hour"], unit='h') + pd.to_timedelta(df["mins"], unit='m')
#df['TIME'] = TIME
df['TIME'] = TIME.astype(str)
I convert TIME to a string because I'm exporting the dataframe to MS Excel, which doesn't support the timedelta format.
Now I want a timestamp for every minute.
For that, I want to fill in the missing minutes and set TOTAL_TRADE_RATE to zero for them, which first requires setting the TIME column as the index. I'm applying this:
df = df.set_index('TIME')
df.index = pd.DatetimeIndex(df.index)
df.resample('60s').sum().reset_index()
but it's giving the following error:
Unknown string format: 0 days 09:33:00.000000000
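The error comes from passing timedelta strings such as "0 days 09:33:00.000000000" to pd.DatetimeIndex, which expects dates. One possible way around it, as a minimal sketch: resample on the timedelta values themselves and convert to strings only at the very end, just before the Excel export. The sample rows below are made up for illustration; hour, mins and TOTAL_TRADE_RATE are the column names from the question.

```python
import pandas as pd

# Made-up sample rows; 09:32 is deliberately missing.
df = pd.DataFrame({
    "hour": [9, 9, 9],
    "mins": [30, 31, 33],
    "TOTAL_TRADE_RATE": [1.0, 2.0, 3.0],
})

# Build the time of day as a timedelta and keep it as a timedelta for now.
time = pd.to_timedelta(df["hour"], unit="h") + pd.to_timedelta(df["mins"], unit="m")

# Resample directly on the TimedeltaIndex; empty one-minute bins sum to 0.
per_minute = (
    df.set_index(time)[["TOTAL_TRADE_RATE"]]
    .resample("60s")
    .sum()
    .rename_axis("TIME")
    .reset_index()
)

# Convert to strings only at the end, for the Excel export.
per_minute["TIME"] = per_minute["TIME"].astype(str)
```

Only TOTAL_TRADE_RATE is summed here; other columns would need their own aggregation.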
I am currently working on multiple datasets with a Timestamp column in dd/mm/yyyy HH:MM format: daily data at 5-minute intervals.
I want to resample each dataset to fill in the missing dates and timestamps.
The issue is that in a few datasets some rows are in ddmmyy format, then the format abruptly
changes to mmddyyyy after, say, the first few hundred rows, and then back to ddmmyy, without any pattern.
I need a solution or help to correct this issue.
The code I am using:
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df['Timestamp'] = df.Timestamp.dt.strftime('%d/%m/%y %H:%M')
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
start_dt = df.loc[0, "Timestamp"]
end_dt = df["Timestamp"].iloc[-1]
r = pd.date_range(start=start_dt, end=end_dt, freq="5min")
# Reindexing by adding missing dates
df = df.set_index('Timestamp').reindex(r).rename_axis("Timestamp").reset_index()
Use a regex to separate the rows in ddmmyy format from those in mmddyy format, then convert each group to datetime with its own format.
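As a rough sketch of the same idea without a regex (a swapped-in technique, not the one named above): parse the column once per format with errors="coerce" and fill the failures from the second pass. The sample strings and the two format strings are assumptions. Note that a date such as 03/04/2021 is valid in both layouts, so ambiguous rows are silently read as day-first here and may need a separate check.

```python
import pandas as pd

# Made-up sample with mixed day-first and month-first rows.
s = pd.Series(["13/01/2021 00:05", "01/14/2021 00:10", "25/12/2021 09:00"])

# First pass: day-first. Rows that cannot be day-first become NaT.
day_first = pd.to_datetime(s, format="%d/%m/%Y %H:%M", errors="coerce")

# Second pass: month-first, used only where the first pass failed.
month_first = pd.to_datetime(s, format="%m/%d/%Y %H:%M", errors="coerce")

parsed = day_first.fillna(month_first)
```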
I'm trying to change multiple columns to the same datatype at once.
The columns contain time data (hours, minutes and seconds), like
And the data
I'm not able to convert multiple columns at once with pd.to_datetime so that they hold only the time. I don't want the date, but pd.to_datetime adds a date to the column as well, which is not required; I just want the time.
How can I convert the columns to datetime and keep only the time in them?
First, you can't have a datetime with only a time in it in pandas/Python: a column of Python time objects is stored with object dtype.
So either convert all the columns to datetimes (which also carry a date):
cols = ['Total Break Time','col1','col2']
df[cols] = df[cols].apply(pd.to_datetime)
Or convert the columns to timedeltas, which look similar to times and also work with the datetime-like methods in pandas:
df[cols] = df[cols].apply(pd.to_timedelta)
You can keep only the time part as follows:
import pandas as pd
df['Total Break Time'] = pd.to_datetime(df['Total Break Time'],format= '%H:%M:%S' ).dt.time
Then you can repeat this for all your columns, as I suppose you already do.
The trick is to convert to datetime first and then pick out only the part you need.
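To do this for several columns at once, something along these lines should work. It is only a sketch: the sample values are invented, and col1 stands in for whatever other time columns the dataframe has.

```python
import pandas as pd

# Invented sample data in HH:MM:SS form.
df = pd.DataFrame({
    "Total Break Time": ["00:15:30", "01:02:03"],
    "col1": ["10:00:00", "11:30:00"],
})
cols = ["Total Break Time", "col1"]

# Option 1: Python time objects (the columns end up with object dtype).
as_time = df[cols].apply(lambda s: pd.to_datetime(s, format="%H:%M:%S").dt.time)

# Option 2: timedeltas, which keep a proper pandas dtype and support arithmetic.
as_delta = df[cols].apply(pd.to_timedelta)
```

The timedelta version is usually the easier one to do arithmetic with, since datetime.time objects cannot be added or subtracted directly.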
I have a dataframe with a column of object dtype in the format mm:ss. I want to convert that column to a time format so that I can turn the time into seconds instead of mm:ss. However, I have not been able to convert the column into a time format.
Example of my data:
time
33:22
24:56
30:15
26:57
I have tried:
df['time'] = pd.to_timedelta(df['time'])
How do I convert this object data type column to a time format? And ultimately to total seconds?
Just add '00:' to the beginning.
df['time'] = pd.to_timedelta('00:' + df['time'])
df['total seconds'] = df['time'].dt.total_seconds()
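Here is the same thing run on the sample data from the question, as a small self-contained sketch. The "00:" prefix turns mm:ss into hh:mm:ss, which is a layout pd.to_timedelta accepts.

```python
import pandas as pd

# Sample values from the question, in mm:ss form.
df = pd.DataFrame({"time": ["33:22", "24:56", "30:15", "26:57"]})

# Prefix a zero hour so each value reads as hh:mm:ss, then parse.
df["time"] = pd.to_timedelta("00:" + df["time"])

# Total seconds as floats, e.g. 33 * 60 + 22 = 2002.
df["total seconds"] = df["time"].dt.total_seconds()
```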
I have a pandas dataframe column that pandas currently thinks is an object. It's written as "58:42.5" which is the minutes then seconds and fractions of a second. I want to convert that to a time type so that I can subtract the two time columns I have to get a duration.
I've tried:
merged['started_at'] = pd.to_datetime(merged['started_at'],format='%M:%S').dt.time
However, that returns an error saying that the .5 was not converted (ValueError: unconverted data remains: .5). How can I convert my two columns of time data to the correct format so that I can eventually do math with them?
My columns look like:
started_at , ended_at
58:42.5 , 00:02.3
00:55.5 , 02:13.9
Thanks for your help.
You can convert to timedelta without specifying a format:
x = '58:42.5'
res = pd.to_timedelta('00:'+x)
Timedelta('0 days 00:58:42.500000')
This transformation can easily be applied to a series:
merged['started_at'] = pd.to_timedelta('00:' + merged['started_at'], errors='coerce')
Your subsequent operations, for example difference between timedelta objects, should follow naturally.
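For instance, the duration could be computed roughly like this, using the two rows from the question. The wrap-around step is my assumption: the first row ends at 00:02.3 after starting at 58:42.5, so I am guessing the clock rolled over into the next hour. Drop that step if it does not apply to your data.

```python
import pandas as pd

# The two rows shown in the question.
merged = pd.DataFrame({
    "started_at": ["58:42.5", "00:55.5"],
    "ended_at": ["00:02.3", "02:13.9"],
})

# Convert both columns from mm:ss.f strings to timedeltas.
for col in ["started_at", "ended_at"]:
    merged[col] = pd.to_timedelta("00:" + merged[col], errors="coerce")

merged["duration"] = merged["ended_at"] - merged["started_at"]

# Assumption: a negative duration means the end time fell in the next hour.
wrapped = merged["duration"] < pd.Timedelta(0)
merged.loc[wrapped, "duration"] += pd.Timedelta(hours=1)
```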
I have a column of Unix timestamps. I want to convert this column to just dates in a %y-%m-%d format. Just to test the to_datetime() function I did the below, which works as expected and gives me the column in a format like 2015-05-12 00:11:30:
df['time'] = pd.to_datetime(df['time'], unit='s')
When I add the format argument as below, I get an error:
df['time'] = pd.to_datetime(df['time'], unit='s', format='%d/%m/%Y')
The error is ValueError: time data 1431389490 does not match format '%d/%m/%Y'
How can I strip off the hours, minutes and seconds so I am only left with 2015-05-12?
If you want to extract just the date, you can do that in a second step after converting to datetime:
x = pd.to_datetime(pd.Series([1431389490]), unit='s')
# Datetime columns have a `.dt` attribute, with useful properties
# and methods for working with dates
x.dt.date
Out[7]:
0 2015-05-12
dtype: object
This will discard the information about hours and minutes, but you will still be able to work with the resulting column/series easily because each value is a datetime.date object, e.g. you can subtract a certain date from your column to find the number of days between them.
If you want to keep the information about hours and minutes, but only display it differently, I'm not sure that's possible.
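For what it's worth, two other options that may be worth trying, sketched here with the timestamp from the question: .dt.normalize() keeps a datetime64 column but sets the time to midnight, and .dt.strftime() produces display strings while leaving the original column untouched.

```python
import pandas as pd

# The Unix timestamp from the question, parsed as seconds.
x = pd.to_datetime(pd.Series([1431389490]), unit="s")

# datetime.date objects (object dtype), as in the answer above.
dates = x.dt.date

# Still datetime64, with the time part reset to 00:00:00.
midnight = x.dt.normalize()

# Plain strings for display only; x keeps its hours and minutes.
shown = x.dt.strftime("%Y-%m-%d")
```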