I was reformatting some data in a DataFrame and needed to calculate values for a new timedelta column. I did this by subtracting from each event's start date the start date of the next row (the series shifted up one row):
data['DURATION_NEW'] = (data['START'] - data['START'].shift(-1))
This works fine and creates a timedelta column, but the values are displayed in a very strange format:
foo['DURATION_NEW']
Out[80]:
0 -1 days +23:53:30
1 -1 days +15:35:00
2 -1 days +23:50:00
3 -1 days +23:49:00
4 -1 days +23:53:30
1459 -1 days +23:47:00
1461 -1 days +23:51:00
1462 -1 days +22:08:01
1463 -1 days +23:39:30
1464 NaT
Name: DURATION_NEW, Length: 1406, dtype: timedelta64[ns]
I need to somehow convert this data to be displayed in seconds. First I tried to convert it to a datetime, but for some reason got an error that dtype timedelta64[ns] cannot be converted to datetime64[ns].
Next I tried to manually re-convert it while specifying that I want it to be in seconds:
foo['DURATION_NEW'] = pd.to_timedelta(foo['DURATION_NEW'], unit='sec')
That didn't work either; the column stays exactly as it was.
How can I do this properly?
Use the total_seconds() method on the dt accessor:
foo['DURATION_NEW'].dt.total_seconds()
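As a minimal sketch of this approach: the three START values below are made up for illustration, chosen so that the first difference reproduces the -1 days +23:53:30 shown in the question. It suggests that this display is simply how pandas prints a negative duration (here -390 seconds), and that swapping the operands would give positive durations instead.

```python
import pandas as pd

# Hypothetical START values, invented for this illustration.
data = pd.DataFrame({
    "START": pd.to_datetime([
        "2020-01-01 00:00:00",
        "2020-01-01 00:06:30",
        "2020-01-01 00:16:30",
    ])
})

# Same expression as in the question: current start minus next start.
data["DURATION_NEW"] = data["START"] - data["START"].shift(-1)

# total_seconds() returns floats; the trailing NaT becomes NaN.
seconds = data["DURATION_NEW"].dt.total_seconds()

# Swapping the operands yields positive durations instead.
positive_seconds = (data["START"].shift(-1) - data["START"]).dt.total_seconds()

print(data["DURATION_NEW"])
print(seconds)
print(positive_seconds)
```

If whole seconds are wanted, the NaN in the last row has to be handled first (for example dropped or filled) before casting to an integer dtype.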
How can I convert a float64 value into a datetime value?
Here are the first five float values from the dataset:
0 41245.0
1 41701.0
2 36361.0
3 36145.0
4 42226.0
Name: product_first_sold_date, dtype: float64
And to convert the float type to datetime type value I wrote this:
from datetime import datetime
pd.to_datetime(y['product_first_sold_date'], format='%m%d%Y.0', errors='coerce')
But the output was NaT for every row in the dataset:
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
Name: product_first_sold_date, Length: 19273, dtype: datetime64[ns]
Then I tried this:
print(pd.to_datetime(y.product_first_sold_date, infer_datetime_format=True))
but it shows the same date for all the rows in the dataset
0 1970-01-01 00:00:00.000041245
1 1970-01-01 00:00:00.000041701
2 1970-01-01 00:00:00.000036361
3 1970-01-01 00:00:00.000036145
4 1970-01-01 00:00:00.000042226
I really can't figure out what's wrong with the code.
I have also tried this:
pd.to_datetime(pd.Series(g.product_first_sold_date).astype(str), format='%d%m%Y.0')
and got the following error (I also tried changing the format to '%Y%m%d.0'):
ValueError: time data '41245.0' does not match format '%d%m%Y.0' (match)
It looks like nothing works, or maybe I just did something wrong. I don't know how to fix this. Thanks in advance!
I'd assume these floating point values represent dates as Excel handles them internally, i.e. days since 1900-01-01.
To convert this format to a Python/pandas datetime, set the appropriate origin and unit:
df['product_first_sold_date'] = pd.to_datetime(df['product_first_sold_date'],
                                               origin='1899-12-30',
                                               unit='D')
...which gives for the provided example
0 2012-12-02
1 2014-03-03
2 1999-07-20
3 1998-12-16
4 2015-08-10
Name: product_first_sold_date, dtype: datetime64[ns]
Important to note here (see @chux-ReinstateMonica's comment) is that 1900-01-01 is day 1 in Excel, not day zero (which is what you have to provide as origin). Day zero is 1899-12-30; in case you wonder why it's not 1899-12-31, the explanation is quite interesting: Excel treats 1900 as a leap year, so every serial number after February 1900 carries one extra day.
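A self-contained sketch of the conversion, using the five values from the question. It rests on the answer's assumption that the floats are Excel serial dates; the variable names `serials`, `dates` and `dates_alt` are illustrative only. The second formulation adds the day counts to the origin as timedeltas, which may make it clearer what `origin` and `unit` do.

```python
import pandas as pd

# The five serial numbers from the question (assumed to be Excel serial dates).
serials = pd.Series([41245.0, 41701.0, 36361.0, 36145.0, 42226.0],
                    name="product_first_sold_date")

# Interpret each value as a number of days counted from the origin 1899-12-30.
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")

# Equivalent formulation: add the day counts to the origin as timedeltas.
dates_alt = pd.Timestamp("1899-12-30") + pd.to_timedelta(serials, unit="D")

print(dates)
```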
I have a DataFrame whose columns are of string type, and I want to convert the deltas column into total hours:
deltas
0 2 days 12:19:00
1 04:45:00
2 3 days 06:41:00
3 5 days 01:55:00
4 13:57:00
Desired Output:
deltas
0 60 hours
1 4 hours
I tried pd.to_timedelta(), but I get the error "only leading negative signs are allowed", and I am totally stuck.
To get the number of hours as int, run:
(pd.to_timedelta(df['deltas']) / np.timedelta64(1, 'h')).astype(int)
The first step converts the string representation of each Timedelta to an actual Timedelta. The result is then divided by 1 hour and converted to int.
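A runnable sketch of these two steps on the sample rows from the question. It divides by `pd.Timedelta(hours=1)` rather than `np.timedelta64(1, 'h')`, which should be equivalent here and avoids the NumPy import; the names `td`, `hours_float` and `hours_int` are illustrative.

```python
import pandas as pd

# Sample rows from the question, stored as strings.
df = pd.DataFrame({
    "deltas": ["2 days 12:19:00", "04:45:00", "3 days 06:41:00",
               "5 days 01:55:00", "13:57:00"]
})

# Step 1: parse the strings into real timedeltas.
td = pd.to_timedelta(df["deltas"])

# Step 2: dividing by one hour gives the duration in (fractional) hours.
hours_float = td / pd.Timedelta(hours=1)

# Casting to int truncates the fractional part.
hours_int = hours_float.astype(int)

print(hours_int)
```

Note that the cast truncates instead of rounding, so 5 days 01:55:00 becomes 121 hours, not 122.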
I was working with some data in pandas, and after saving it to CSV the format changed from 00:31:24.904000 (timedelta64[ns]) to 0 days 00:31:24.904000 (object):
0 0 days 00:25:20.835688000
1 0 days 00:01:44.004000000
2 0 days 00:18:29.023000000
3 0 days 00:09:06.633000000
4 0 days 00:02:16.826000000
...
6004 0 days 00:00:00.000000000
6005 0 days 00:31:24.904000000
6006 0 days 00:02:31.637000000
6007 0 days 00:03:40.214000000
6008 0 days 00:01:26.577000000
Name: Time, Length: 6009, dtype: object
How can I convert it back to timedelta or some other date/time related format?
How can I avoid such a conversion when saving to CSV?
How can I convert it back to timedelta or some other date/time related format?
df['Time'] = pd.to_timedelta(df['Time'])
How can I avoid such a conversion when saving to CSV?
That is not possible, because a CSV file stores all data as strings; the dtype is lost on saving and has to be restored after loading.
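A small round-trip sketch of both points, using an in-memory buffer in place of a real file. The three durations are taken from the question's output; the names `original`, `buffer` and `loaded` are illustrative.

```python
import io

import pandas as pd

# A small frame with a genuine timedelta column.
original = pd.DataFrame({
    "Time": pd.to_timedelta(["00:25:20.835688", "00:01:44.004", "00:31:24.904"])
})

# Write to an in-memory CSV and read it back, as a stand-in for a file on disk.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# After loading, the column holds plain text, not timedeltas.
was_timedelta_after_load = pd.api.types.is_timedelta64_dtype(loaded["Time"])

# Restore the dtype explicitly.
loaded["Time"] = pd.to_timedelta(loaded["Time"])

print(was_timedelta_after_load)
print(loaded.dtypes)
```

If keeping dtypes across saves matters, a binary format such as pickle (`DataFrame.to_pickle` / `pd.read_pickle`) is one option that stores them, at the cost of no longer having a plain-text file.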
I have a DataFrame df like this:
timestamp values
0 1574288141 34
1 1574288241 23
2 1574288341 22
3 1574288441 10
Here the timestamp column holds epoch times. I want to convert it into datetimes in the format 2019-11-20 04:03:01, expressed in EST.
When I do
pd.to_datetime(df['timestamp'], unit='s')
I get the conversion and the required format but the time doesn't seem to be in EST. It is 4 hours ahead of EST.
I have tried to convert utc to Eastern using the code
pd.to_datetime(df['timestamp'], unit='s').tz_localize('utc').dt.tz_convert('US/Eastern')
But I am getting an error
TypeError: index is not a valid DatetimeIndex or PeriodIndex
You need to add the dt accessor, since your input is a Series, not an index:
pd.to_datetime(df.timestamp,unit='s').dt.tz_localize('utc').dt.tz_convert('US/Eastern')
Out[8]:
0 2019-11-20 17:15:41-05:00
1 2019-11-20 17:17:21-05:00
2 2019-11-20 17:19:01-05:00
3 2019-11-20 17:20:41-05:00
Name: timestamp, dtype: datetime64[ns, US/Eastern]
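A variant sketch of the same conversion: passing `utc=True` to `pd.to_datetime` makes the result timezone-aware in one step, so only `tz_convert` is needed afterwards. It uses the zone name "America/New_York", which is assumed here to be interchangeable with the "US/Eastern" alias used above; `eastern` and `eastern_naive` are illustrative names.

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    "timestamp": [1574288141, 1574288241, 1574288341, 1574288441],
    "values": [34, 23, 22, 10],
})

# Parse the epoch seconds as timezone-aware UTC, then convert to Eastern time.
eastern = (
    pd.to_datetime(df["timestamp"], unit="s", utc=True)
    .dt.tz_convert("America/New_York")
)

# Optionally drop the UTC offset and keep only the local wall-clock time.
eastern_naive = eastern.dt.tz_localize(None)

print(eastern)
print(eastern_naive)
```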
I am trying to alter the text in every second row after interpolating the numeric values between rows.
stamp value
0 00:00:00 2
1 00:00:00 3
2 01:00:00 5
I am trying to apply this change to every second stamp row (i.e. 30 instead of 00 between the colons); stamp is a str column:
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
Function to change the string:
def time_vals(row):
    # run only on odd rows (1/2 hr)
    if int(row.name) % 2 != 0:
        l, m, r = row.split(':')
        return l+":30:"+r
I have tried the following:
hh_weather['time'] =hh_weather[hh_weather.rows[::2]['time']].apply(time_vals(2))
but I get an error: AttributeError: 'DataFrame' object has no attribute 'rows'
and when I try:
hh_weather['time'] = hh_weather['time'].apply(time_vals)
AttributeError: 'str' object has no attribute 'name'
Any ideas?
Use timedelta instead of str
The strength of Pandas lies in vectorised functionality. Here you can use timedelta to represent times numerically. If data is as in your example, i.e. seconds are always zero, you can floor by hour and add 30 minutes. Then assign this series conditionally to df['stamp'].
# convert to timedelta
df['stamp'] = pd.to_timedelta(df['stamp'])
# create series by flooring by hour, then adding 30 minutes
s = df['stamp'].dt.floor('h') + pd.Timedelta(minutes=30)
# assign new series conditional on index
df['stamp'] = np.where(df.index % 2, s, df['stamp'])
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
# convert the string values to timedelta (easier to work with for times)
df['stamp'] = pd.to_timedelta(df['stamp'])
# select only the odd rows of the `stamp` column and add 30 minutes to each
odd_df = df.loc[1::2, 'stamp'] + pd.to_timedelta('30 min')
# write the new series (odd_df) back into df, aligned on the index
df.loc[odd_df.index, 'stamp'] = odd_df
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
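A self-contained sketch combining the ideas above, under two assumptions not stated in the question: rows are selected by position rather than by index label (in case the index is not 0, 1, 2, ...), and the result is wanted back as HH:MM:SS strings in a hypothetical extra column `stamp_str`. The string formatting assumes non-negative durations.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"stamp": ["00:00:00", "00:00:00", "01:00:00"],
                   "value": [2, 3, 5]})

# Work with timedeltas instead of strings.
df["stamp"] = pd.to_timedelta(df["stamp"])

# Boolean mask for every second row, based on position rather than index label.
odd_rows = np.arange(len(df)) % 2 == 1

# Add half an hour to the selected rows only.
df.loc[odd_rows, "stamp"] += pd.Timedelta(minutes=30)

# Format back to HH:MM:SS text (assumes non-negative durations).
total = df["stamp"].dt.total_seconds().astype(int)
df["stamp_str"] = [
    f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}" for s in total
]

print(df)
```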