I am trying to alter the text on every second row after interpolating the numeric values between rows.
stamp value
0 00:00:00 2
1 00:00:00 3
2 01:00:00 5
I am trying to apply this change to every second stamp row (i.e. 30 instead of 00 between the colons) in the str column:
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
Function to change the string:
def time_vals(row):
    # run only on odd rows (1/2 hr)
    if int(row.name) % 2 != 0:
        l, m, r = row.split(':')
        return l + ":30:" + r
I have tried the following:
hh_weather['time'] =hh_weather[hh_weather.rows[::2]['time']].apply(time_vals(2))
but I get an error: AttributeError: 'DataFrame' object has no attribute 'rows'
and when I try:
hh_weather['time'] = hh_weather['time'].apply(time_vals)
AttributeError: 'str' object has no attribute 'name'
Any ideas?
Use timedelta instead of str
The strength of Pandas lies in vectorised functionality. Here you can use timedelta to represent times numerically. If data is as in your example, i.e. seconds are always zero, you can floor by hour and add 30 minutes. Then assign this series conditionally to df['stamp'].
import numpy as np
import pandas as pd

# convert to timedelta
df['stamp'] = pd.to_timedelta(df['stamp'])
# create a series by flooring to the hour, then adding 30 minutes
s = df['stamp'].dt.floor('h') + pd.Timedelta(minutes=30)
# assign the new series conditionally on the index
df['stamp'] = np.where(df.index % 2, s, df['stamp'])
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
# convert the string values to timedelta (easier to work with as time)
df['stamp'] = pd.to_timedelta(df['stamp'])
# slice only the odd rows from the `stamp` column and add 30 minutes to them
odd_df = df.loc[1::2, 'stamp'] + pd.to_timedelta('30 min')
# update the existing df with the new series (odd_df), aligned on index
df['stamp'].update(odd_df)
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
Related
I have a dataset with a column named "Time" of type object.
Some rows are given as 10 for 10:00 and others as 1000.
How do I convert this column to a time format?
weather['Time'] = pd.to_datetime(weather['Time'], format='%H:%M').dt.Time
This is the code I used, and I am getting this error: ValueError: time data '10' does not match format '%H:%M' (match)
You can convert the column to the required time format first, like this:
weather = pd.DataFrame(['1000', '10:00', '10', '1000'], columns=['Time'])

def convert_time(x):
    if len(x) == 2:
        return f'{x}:00'
    if ':' not in x:
        return x[:2] + ':' + x[2:]
    return x

weather.Time = weather.Time.apply(convert_time)
weather.Time
Out[1]:
0 10:00
1 10:00
2 10:00
3 10:00
To convert it to datetime:
weather.Time = pd.to_datetime(weather.Time)
Just the time component:
weather.Time.dt.time
Out[92]:
0 10:00:00
1 10:00:00
2 10:00:00
3 10:00:00
Another possible solution, which is based on the following ideas:
Replace :, when there is one, by the empty string.
Right pad with zeros, so that all entries will have 4 digits.
Use pd.to_datetime to convert to the wanted time format.
import pandas as pd

weather = pd.DataFrame({'Time': ['20', '1000', '12:30', '0930']})
pd.to_datetime(weather['Time'].str.replace(':', '').str.pad(
    4, side='right', fillchar='0'), format='%H%M').dt.time
Output:
0 20:00:00
1 10:00:00
2 12:30:00
3 09:30:00
Name: Time, dtype: object
I have a pandas dataframe column that contains a str of time as follows:
Time
17:24:00
17:25:00
17:26:00
and I have already cast the column to datetime as follows:
new_data['Time'] = pd.to_datetime(new_data['Time'],
format='%H:%M:%S').apply(pd.Timestamp)
Now the column is in datetime64 as needed and shows:
0 1900-01-01 17:24:00
1 1900-01-01 17:25:00
2 1900-01-01 17:26:00
My question is as follows: I need to use the different time components of time as selection criteria for filtering. For example:
filtered_df = data.loc[(new_data['Time'].Hour == 17 )]
or
filtered_df = data.loc[(new_data['Time'].Minutes == 24 )]
But I cannot figure out how to select the individual components (Hour, Minutes, Seconds) now that I have it converted to datetime.
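For reference, once the column is datetime64 the individual components are exposed through the .dt accessor (dt.hour, dt.minute, dt.second). A minimal sketch with made-up data:

```python
import pandas as pd

new_data = pd.DataFrame({'Time': ['17:24:00', '17:25:00', '18:26:00']})
new_data['Time'] = pd.to_datetime(new_data['Time'], format='%H:%M:%S')

# the hour/minute/second components come from the .dt accessor
filtered_df = new_data.loc[new_data['Time'].dt.hour == 17]
print(filtered_df)
```

The same pattern works for minutes, e.g. new_data['Time'].dt.minute == 24.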
I have a data frame with a column of type String, and I want to convert the deltas column into total hours.
deltas
0 2 days 12:19:00
1 04:45:00
2 3 days 06:41:00
3 5 days 01:55:00
4 13:57:00
Desired Output:
deltas
0 60 hours
1 4 hours
I tried pd.to_timedelta(), but I get this error: only leading negative signs are allowed, and I am totally stuck.
To get the number of hours as an int run:
import numpy as np
import pandas as pd

(pd.to_timedelta(df.deltas) / np.timedelta64(1, 'h')).astype(int)
The first step is to convert the string representation of Timedelta to
actual Timedelta.
Then divide it by 1 hour and convert to int.
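If the exact "60 hours" string formatting from the desired output is needed, the integer hours can be mapped back to strings. A sketch, reusing the division above on the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'deltas': ['2 days 12:19:00', '04:45:00',
                              '3 days 06:41:00', '5 days 01:55:00',
                              '13:57:00']})
# convert to Timedelta, divide by 1 hour, truncate to int
hours = (pd.to_timedelta(df['deltas']) / np.timedelta64(1, 'h')).astype(int)
# format as "N hours" strings
df['deltas'] = hours.astype(str) + ' hours'
print(df['deltas'])
```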
I have a Python data frame containing a column with Date Time like this
2019-01-02 09:00:00 (which means January 2, 2019 9 AM)
There may be a bunch of rows which have the same date in the Date Time column.
In other words, I can have 2019-01-02 09:00:00 or 2019-01-02 09:15:00 or 2019-01-02 09:30:00 and so on.
Now I need to find the row index of the first occurrence of the date 2019-01-02 in the Python data frame.
I can obviously do this using a loop, but am wondering if there is a better way.
With the df['Date Time'].str.contains() method, I can get that all the rows that match a given date, but I need the index.
The generic question is that how do we find the index of a first occurrence of a match in a cell in Python data frame that matches a given string pattern.
The more specific question is that how do we find the index of a first occurrence of a match in a cell in Python data frame that matches a given date in a cell that contains date Time assuming that the Python data frame is sorted in chronologically ascending order of date Time , i.e.
2019-01-02 09:00:00 occurs at an index earlier than 2019-01-02 09:15:00 followed by 2019-01-03 09:00:00 and so on.
Thank you for any inputs
You can use next with iter to get the first index value matching the condition; this prevents a failure if there are no matching values:
df = pd.DataFrame({'dates':pd.date_range(start='2018-01-01 20:00:00',
end='2018-01-02 02:00:00', freq='H')})
print (df)
dates
0 2018-01-01 20:00:00
1 2018-01-01 21:00:00
2 2018-01-01 22:00:00
3 2018-01-01 23:00:00
4 2018-01-02 00:00:00
5 2018-01-02 01:00:00
6 2018-01-02 02:00:00
date = '2018-01-02'
mask = df['dates'] >= date
idx = next(iter(mask.index[mask]), 'not exist')
print (idx)
4
date = '2018-01-08'
mask = df['dates'] >= date
idx = next(iter(mask.index[mask]), 'not exist')
print (idx)
not exist
If performance is important, see Efficiently return the index of the first value satisfying condition in array.
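Since the question states the frame is sorted chronologically, a binary search via searchsorted is one such efficient option. A sketch (not from the original answers), using the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'dates': pd.date_range(start='2018-01-01 20:00:00',
                                          end='2018-01-02 02:00:00', freq='h')})
# binary search: position of the first timestamp >= the target date
pos = df['dates'].searchsorted(pd.Timestamp('2018-01-02'))
# a position past the end means no row matched
idx = df.index[pos] if pos < len(df) else 'not exist'
print(idx)
```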
Yep, you can use .loc with a condition to slice the df, and then return the index using .iloc.
import pandas as pd
df = pd.DataFrame({'time':pd.date_range(start='2018-01-01 00:00:00',end='2018-12-31 00:00:00', freq='H')}, index=None).reset_index(drop=True)
# then use conditions and .iloc to get the first instance
df.loc[df['time']>'2018-10-30 01:00:00'].iloc[[0,]].index[0]
# if you specify a coarser condition, for instance without time,
# it will also return the first instance
df.loc[df['time']>'2018-10-30'].iloc[[0,]].index[0]
I do not know if it is optimal, but it works:
(df['Date Time'].dt.strftime('%Y-%m-%d') == '2019-01-02').idxmax()
I am looking to determine the count of string variables in a column across a 3 month data sample. Samples were taken at random times throughout each day. I can group the data by hour, but I require the fidelity of 30 minute intervals (e.g. 0500-0530, 0530-0600) on roughly 10k rows of data.
An example of the data:
datetime stringvalues
2018-06-06 17:00 A
2018-06-07 17:30 B
2018-06-07 17:33 A
2018-06-08 19:00 B
2018-06-09 05:27 A
I have tried setting the datetime column as the index, but I cannot figure out how to group the data on anything other than 'hour', and I don't have fidelity on the string value count:
df['datetime'] = pd.to_datetime(df['datetime'])
df.index = df['datetime']
df.groupby(df.index.hour).count()
Which returns an output similar to:
datetime stringvalues
datetime
5 0 0
6 2 2
7 5 5
8 1 1
...
I researched multi-indexing and resampling to some length the past two days but I have been unable to find a similar question. The desired result would look something like this:
datetime A B
0500 1 2
0530 3 5
0600 4 6
0630 2 0
....
There is no straightforward way to do a TimeGrouper on the time component, so we do this in two steps:
v = (df.groupby([pd.Grouper(key='datetime', freq='30min'), 'stringvalues'])
.size()
.unstack(fill_value=0))
v.groupby(v.index.time).sum()
stringvalues A B
05:00:00 1 0
17:00:00 1 0
17:30:00 1 1
19:00:00 0 1
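The two steps can also be collapsed into one with pd.crosstab, flooring each timestamp to its half-hour bin and keeping only the time of day. A sketch, assuming the same column names:

```python
import datetime
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.to_datetime(['2018-06-06 17:00', '2018-06-07 17:30',
                                '2018-06-07 17:33', '2018-06-08 19:00',
                                '2018-06-09 05:27']),
    'stringvalues': ['A', 'B', 'A', 'B', 'A'],
})
# floor each timestamp to its half-hour bin, keep only the time of day,
# then cross-tabulate the bins against the string values
v = pd.crosstab(df['datetime'].dt.floor('30min').dt.time, df['stringvalues'])
print(v)
```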