I have a data frame whose deltas column is of type string, and I want to convert it into total hours:
deltas
0 2 days 12:19:00
1 04:45:00
2 3 days 06:41:00
3 5 days 01:55:00
4 13:57:00
Desired Output:
deltas
0 60 hours
1 4 hours
I tried pd.to_timedelta() but I get the error "only leading negative signs are allowed", and I am totally stuck on this.
To get the number of hours as int, run:
(pd.to_timedelta(df.deltas) / np.timedelta64(1, 'h')).astype(int)
The first step is to convert the string representation into an actual Timedelta.
Then divide it by 1 hour and convert to int.
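Putting it together with the sample data, a minimal runnable sketch (assuming the column is named deltas as in the example):
import numpy as np
import pandas as pd

df = pd.DataFrame({'deltas': ['2 days 12:19:00', '04:45:00', '3 days 06:41:00',
                              '5 days 01:55:00', '13:57:00']})

# parse the strings into Timedelta, then express each value in whole hours
df['hours'] = (pd.to_timedelta(df['deltas']) / np.timedelta64(1, 'h')).astype(int)
print(df['hours'].tolist())  # [60, 4, 78, 121, 13]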
I was working on re-formatting some data in a dataframe, and I needed to calculate a value for a new timedelta column, which I did by subtracting the START series shifted up one row from the start date of each event:
data['DURATION_NEW'] = (data['START'] - data['START'].shift(-1))
This works fine and creates a timedelta column, but the data there are in a very strange format:
foo['DURATION_NEW']
Out[80]:
0 -1 days +23:53:30
1 -1 days +15:35:00
2 -1 days +23:50:00
3 -1 days +23:49:00
4 -1 days +23:53:30
1459 -1 days +23:47:00
1461 -1 days +23:51:00
1462 -1 days +22:08:01
1463 -1 days +23:39:30
1464 NaT
Name: DURATION_NEW, Length: 1406, dtype: timedelta64[ns]
I need to somehow convert this data to be displayed in seconds. First I tried to convert it to a datetime, but for some reason got an error that dtype timedelta64[ns] cannot be converted to datetime64[ns].
Next I tried to manually re-convert it while specifying that I want it to be in seconds:
foo['DURATION_NEW'] = pd.to_timedelta(foo['DURATION_NEW'], unit='sec')
That didn't work either; everything stays exactly as it is now.
How can I do this properly?
Use the total_seconds() method on the dt accessor:
foo['DURATION_NEW'].dt.total_seconds()
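For example, on the series above (a sketch; the NaT row comes back as NaN):
seconds = foo['DURATION_NEW'].dt.total_seconds()
# -1 days +23:53:30 is 6.5 minutes before zero, i.e. -390.0 seconds
print(seconds.head())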
I have a dataframe in which one column, called 'date', contains multiple dates. I want to convert it to the number of days counted from today's date (2020-07-04). Here's the code:
profile['membership_date'] = pd.to_datetime(profile['became_member_on'].astype(str), format='%Y%m%d')
The column looks like this:
0 2017-02-12
1 2017-07-15
2 2018-07-12
3 2017-05-09
4 2017-08-04
5 2018-04-26
Then we get today's date:
today_date = datetime.date.today().strftime('%Y-%m-%d')
# calculate days; I tried two different ways but still get an error
profile['membership_date'] - today_date
profile['membership_days'] = (profile['membership_date'] - today_date).days
#error:unsupported operand type(s) for -: 'DatetimeIndex' and 'str'
Can someone help me? Thanks.
Use Series.sub to subtract pd.Timestamp.now() from the membership_date column, which returns a series of timedelta objects; finally, use Series.dt.days to get the integer number of days elapsed between the two dates.
profile['membership_days'] = (
profile['membership_date'].sub(pd.Timestamp.now()).dt.days
)
Result:
# print(profile)
  membership_date  membership_days
0 2017-02-12 -1239
1 2017-07-15 -1086
2 2018-07-12 -724
3 2017-05-09 -1153
4 2017-08-04 -1066
5 2018-04-26 -801
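Note that the values are negative because the dates lie in the past relative to now. If positive day counts are wanted instead, reverse the subtraction (a sketch):
profile['membership_days'] = (pd.Timestamp.now() - profile['membership_date']).dt.days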
I have a dataframe with a '%Y/%U' date column:
     Count       YW       Date
0 2 2017/19 2017-05-13
1 2 2017/20 2017-05-19
2 24 2017/22 2017-06-03
3 35 2017/23 2017-06-10
4 41 2017/24 2017-06-17
.. ... ... ...
126 51 2020/05 2020-02-06
127 26 2020/06 2020-02-15
128 30 2020/07 2020-02-22
129 26 2020/08 2020-02-29
130 18 2020/09 2020-03-04
I'm trying to add the missing weeks, like 2017/21 with 0 Count values, so I created this index:
idx = pd.date_range(df['Date'].min(), df['Date'].max(), freq='W').floor('d')
Which yields:
DatetimeIndex(['2017-05-14', '2017-05-21', '2017-05-28', '2017-06-04',
'2017-06-11', '2017-06-18', '2017-06-25', '2017-07-02',
'2017-07-09', '2017-07-16',
...
'2019-12-29', '2020-01-05', '2020-01-12', '2020-01-19',
'2020-01-26', '2020-02-02', '2020-02-09', '2020-02-16',
'2020-02-23', '2020-03-01'],
dtype='datetime64[ns]', length=147, freq=None)
Almost there, converting to '%Y/%U' again:
idx = idx.strftime('%Y/%U')
But this yields:
Index(['2017/20', '2017/21', '2017/22', '2017/23', '2017/24', '2017/25',
'2017/26', '2017/27', '2017/28', '2017/29',
...
'2019/52', '2020/01', '2020/02', '2020/03', '2020/04', '2020/05',
'2020/06', '2020/07', '2020/08', '2020/09'],
dtype='object', length=147)
I'm not sure yet whether it is a problem with reindexing, but I've noticed that the first year/week pair is now 2017/20 instead of 2017/19. This is because the freq='W' offset converts every date to the corresponding week's starting day, the default being the 'W-SUN' anchored offset. Indeed, 2017-05-14 is a Sunday.
The problem is that the converted date now yields the next week number: 2017-05-13 was converted to 2017-05-14. The %U strftime code does start weeks on Sunday as well, but it counts from the previous Sunday. Using 'W-SAT' (since 2017-05-13 was a Saturday) fixes the start, but then the end will be wrong in this case.
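A small sketch reproducing the off-by-one described above:
import pandas as pd

# 2017-05-13 is a Saturday: %U places it in week 19, but the 'W' (= 'W-SUN')
# offset snaps it forward to Sunday 2017-05-14, which %U counts as week 20
print(pd.Timestamp('2017-05-13').strftime('%Y/%U'))                           # 2017/19
print(pd.date_range('2017-05-13', periods=1, freq='W')[0].strftime('%Y/%U'))  # 2017/20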
Is there any dynamic solution so date_range would start and end with the proper weeks?
I am trying to alter the text in every second row after interpolating the numeric values between rows.
stamp value
0 00:00:00 2
1 00:00:00 3
2 01:00:00 5
I am trying to apply this change to every second stamp row (i.e. 30 instead of 00 between the colons) in the str column:
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
Function to change the string:
def time_vals(row):
#run only on odd rows (1/2 hr)
if int(row.name) % 2 != 0:
l, m, r = row.split(':')
return l+":30:"+r
I have tried the following:
hh_weather['time'] =hh_weather[hh_weather.rows[::2]['time']].apply(time_vals(2))
but I get an error: AttributeError: 'DataFrame' object has no attribute 'rows'
and when I try:
hh_weather['time'] = hh_weather['time'].apply(time_vals)
AttributeError: 'str' object has no attribute 'name'
Any ideas?
Use timedelta instead of str
The strength of Pandas lies in vectorised functionality. Here you can use timedelta to represent times numerically. If the data is as in your example, i.e. seconds are always zero, you can floor by hour and add 30 minutes. Then assign this series conditionally to df['stamp'].
# convert to timedelta
df['stamp'] = pd.to_timedelta(df['stamp'])
# create series by flooring by hour, then adding 30 minutes
s = df['stamp'].dt.floor('h') + pd.Timedelta(minutes=30)
# assign new series conditional on index
df['stamp'] = np.where(df.index % 2, s, df['stamp'])
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
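The same conditional assignment can also be written with .loc instead of np.where (a sketch, equivalent for a default integer index):
df.loc[df.index % 2 == 1, 'stamp'] = s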
# convert the string values to timedelta (easier to work with than str)
df['stamp'] = pd.to_timedelta(df['stamp'])
# slice only the odd rows from the `stamp` column and add 30 minutes to each
odd_df = df.loc[1::2, 'stamp'] + pd.to_timedelta('30 min')
# update the existing df with the new series (odd_df), aligned on index
df['stamp'].update(odd_df)
#print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
I am looking to determine the count of string variables in a column across a 3-month data sample. Samples were taken at random times throughout each day. I can group the data by hour, but I require the fidelity of 30-minute intervals (e.g. 0500-0530, 0530-0600) on roughly 10k rows of data.
An example of the data:
datetime stringvalues
2018-06-06 17:00 A
2018-06-07 17:30 B
2018-06-07 17:33 A
2018-06-08 19:00 B
2018-06-09 05:27 A
I have tried setting the datetime column as the index, but I cannot figure out how to group the data on anything other than 'hour', and I don't have fidelity on the string value count:
df['datetime'] = pd.to_datetime(df['datetime'])
df.index = df['datetime']
df.groupby(df.index.hour).count()
Which returns an output similar to:
datetime stringvalues
datetime
5 0 0
6 2 2
7 5 5
8 1 1
...
I researched multi-indexing and resampling to some length the past two days but I have been unable to find a similar question. The desired result would look something like this:
datetime A B
0500 1 2
0530 3 5
0600 4 6
0630 2 0
....
There is no straightforward way to do a TimeGrouper on the time component, so we do this in two steps:
v = (df.groupby([pd.Grouper(key='datetime', freq='30min'), 'stringvalues'])
.size()
.unstack(fill_value=0))
v.groupby(v.index.time).sum()
stringvalues A B
05:00:00 1 0
17:00:00 1 0
17:30:00 1 1
19:00:00 0 1
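If labels like the 0500/0530 rows from the desired output are wanted, the time index can be reformatted afterwards (a sketch):
out = v.groupby(v.index.time).sum()
out.index = [t.strftime('%H%M') for t in out.index]  # e.g. '0500', '1730'
print(out)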