Adding different days in a DataFrame with a fixed date - python

I have a DataFrame with numbers ('number') and I want to add these numbers to a date.
Unfortunately, my attempts don't work and I always get error messages, no matter what I try.
Here is an example of what I tried:
import pandas as pd
from datetime import datetime
number = pd.DataFrame({'date1': ['7053','0','16419','7112','-2406','2513','8439','-180','13000','150','1096','15150','3875','-10281']})
number
df = datetime(2010, 1, 1) + number['date1']  # this line raises the errors shown below
df
The result should be a column or DataFrame of dates (YYYY/MM/DD), each computed as "start date + number" (number of days). For example:
result = pd.DataFrame({'result': ['2001/03/01','1981/11/08','1975/04/08','2023/05/02']})
result
Currently the 'date1' column of the DataFrame 'number' has dtype object.
With that dtype I get this error:
unsupported operand type(s) for +: 'numpy.ndarray' and 'Timestamp'
If I convert the column to str or int64, I get this error instead:
addition/subtraction of integers and integer-arrays with timestamp is no longer supported. instead of adding/subtracting `n`, use `n * obj.freq`
What am I doing wrong or can someone help me?
Thanks a lot!

To add the numbers in the original column as days to 2010-01-01, use to_datetime with unit='d' and origin:
number['date1'] = pd.to_datetime(number['date1'].astype(int), unit='d', origin='2010-01-01')
print(number)
date1
0 2029-04-24
1 2010-01-01
2 2054-12-15
3 2029-06-22
4 2003-06-01
5 2016-11-18
6 2033-02-08
7 2009-07-05
8 2045-08-05
9 2010-05-31
10 2013-01-01
11 2051-06-25
12 2020-08-11
13 1981-11-08
For the YYYY/MM/DD format, start again from the original string column and add Series.dt.strftime:
number['date1'] = pd.to_datetime(number['date1'].astype(int), unit='d', origin='2010-01-01').dt.strftime('%Y/%m/%d')
print(number)
date1
0 2029/04/24
1 2010/01/01
2 2054/12/15
3 2029/06/22
4 2003/06/01
5 2016/11/18
6 2033/02/08
7 2009/07/05
8 2045/08/05
9 2010/05/31
10 2013/01/01
11 2051/06/25
12 2020/08/11
13 1981/11/08

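Alternatively, starting from the original column, keep the datetime column and create the formatted strings as a separate Series:
number['date1'] = pd.to_datetime(number['date1'].astype(int), unit='d', origin='2010/01/01')
result = number['date1'].dt.strftime('%Y/%m/%d')
print(result)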
0 2029/04/24
1 2010/01/01
2 2054/12/15
3 2029/06/22
4 2003/06/01
5 2016/11/18
6 2033/02/08
7 2009/07/05
8 2045/08/05
9 2010/05/31
10 2013/01/01
11 2051/06/25
12 2020/08/11
13 1981/11/08
Name: date1, dtype: object
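Both errors in the question come from adding bare numbers (or strings) directly to a datetime; pandas needs them as Timedelta values first. As an alternative sketch to the origin= approach, the column can be converted with pd.to_timedelta and added to a Timestamp. The sample values are a subset of the question's data, and the 'result' column name is a hypothetical choice.

```python
import pandas as pd

# Subset of the question's data (strings, dtype object)
number = pd.DataFrame({'date1': ['7053', '0', '-180', '-10281']})

start = pd.Timestamp('2010-01-01')

# Convert the strings to integers, then to day-length Timedeltas
offsets = pd.to_timedelta(number['date1'].astype(int), unit='D')

# Timestamp + Timedelta Series gives a datetime Series; format it as YYYY/MM/DD
number['result'] = (start + offsets).dt.strftime('%Y/%m/%d')
print(number)
```

Negative numbers simply move the date backwards, so -10281 lands on 1981/11/08 as in the answer's output.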

Related

Why isn't my column converting to string from int?

*Input:*
df["waiting_time"].value_counts()
*Output:*
2 days 6724
4 days 5290
1 days 5213
7 days 4906
6 days 4037
...
132 days 1
125 days 1
117 days 1
146 days 1
123 days 1
Name: waiting_time, Length: 128, dtype: int64
I tried:
df['wait_dur'] = df['waiting_time'].values.astype(str)
I've tried apply as well. The data type doesn't change; it stays the same.
You need to drop the .values part of your code:
df['wait_dur'] = df['waiting_time'].astype(str)
If you check the first row, for example, you get:
type(df['wait_dur'][0])
<class 'str'>
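One likely source of the confusion: the "dtype: int64" line printed by value_counts() describes the counts, not the values being counted, so it stays int64 even after the conversion. A small sketch with hypothetical integer data standing in for 'waiting_time':

```python
import pandas as pd

# Hypothetical integers standing in for the question's 'waiting_time' column
df = pd.DataFrame({'waiting_time': [2, 4, 2, 1, 7]})
df['wait_dur'] = df['waiting_time'].astype(str)

# The converted column holds Python strings
print(type(df['wait_dur'].iloc[0]))  # <class 'str'>

# value_counts() returns the counts as values; their dtype is what gets printed
counts = df['wait_dur'].value_counts()
print(counts.dtype)  # int64: the dtype of the counts, not of the strings
```

Checking df['wait_dur'].dtype or the type of a single element is a more reliable test than reading the value_counts() footer.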
df = df.applymap(str)
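This should work: it applies str to every cell of the DataFrame. In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which does the same thing.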

How to calculate a Process Duration from a TimeSeries Dataset with Pandas

I have a huge dataset of various sensor data sorted chronologically (by timestamp) and by sensor type. I want to calculate the duration of a process in seconds by subtracting the first entry of a sensor from the last entry. This is to be done with python and pandas. Attached is an example for better understanding:
[Image: example of the sensor data]
I want to subtract the first row from the last row for each sensor type to get the process duration in seconds (e.g. row 8 minus row 1: 2022-04-04T09:44:56.962Z - 2022-04-04T09:44:56.507Z = 0.455 seconds).
The duration should then be written to a newly created column in the last row of the sensor type.
Thanks in advance!
Assuming your 'timestamp' column has already been converted with to_datetime, would this work?
df['diffPerSensor_type']=df.groupby('sensor_type')['timestamp'].transform('last')-df.groupby('sensor_type')['timestamp'].transform('first')
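You could then extract the duration in seconds, including the fractional part, with:
df['diffPerSensor_type'].dt.total_seconds()
(.dt.seconds returns only the whole-seconds component, which would be 0 for a 0.455-second duration.)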
If someone wants to reproduce an example, here is a df:
import pandas as pd
df = pd.DataFrame({
'sensor_type' : [0]*7 + [1]*11 + [13]*5 + [8]*5,
'timestamp' : pd.date_range('2022-04-04', periods=28, freq='ms'),
'value' : [128] * 28
})
df['time_diff in milliseconds'] = (df.groupby('sensor_type')['timestamp']
.transform(lambda x: x.iloc[-1]-x.iloc[0])
.dt.components.milliseconds)
print(df.head(10))
sensor_type timestamp value time_diff in milliseconds
0 0 2022-04-04 00:00:00.000 128 6
1 0 2022-04-04 00:00:00.001 128 6
2 0 2022-04-04 00:00:00.002 128 6
3 0 2022-04-04 00:00:00.003 128 6
4 0 2022-04-04 00:00:00.004 128 6
5 0 2022-04-04 00:00:00.005 128 6
6 0 2022-04-04 00:00:00.006 128 6
7 1 2022-04-04 00:00:00.007 128 10
8 1 2022-04-04 00:00:00.008 128 10
9 1 2022-04-04 00:00:00.009 128 10
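My solution is nearly the same as @Daniel Weigel's, except that I used a lambda to calculate the difference. Note that .dt.components.milliseconds returns only the milliseconds component; for durations of one second or longer, use .dt.total_seconds() * 1000 instead.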
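The question also asks for the duration to appear only in the last row of each sensor type. A minimal sketch building on the transform approach, assuming each sensor_type occupies one contiguous block (groupby would otherwise merge separate runs of the same sensor). The sample timestamps and the duration_s column name are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'sensor_type': [0, 0, 0, 1, 1],
    'timestamp': pd.to_datetime([
        '2022-04-04T09:44:56.507Z', '2022-04-04T09:44:56.700Z',
        '2022-04-04T09:44:56.962Z', '2022-04-04T09:44:57.000Z',
        '2022-04-04T09:44:58.250Z',
    ]),
})

g = df.groupby('sensor_type')['timestamp']
# Last minus first timestamp per sensor type, as float seconds (fractions kept)
duration = (g.transform('last') - g.transform('first')).dt.total_seconds()

# True only for the last row of each sensor_type
is_last = ~df.duplicated('sensor_type', keep='last')

# Keep the duration in that last row and NaN everywhere else
df['duration_s'] = duration.where(is_last)
print(df)
```

Sensor 0 here reproduces the question's example: 56.962 - 56.507 = 0.455 seconds.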

splitting space-separated string in one column to two (int) columns - pandas python

I have a dataset that looks like this:
df1.head()
time/wattage
0 1303132930 225.57
1 1303132931 226.09
2 1303132932 222.74
3 1303132933 222.20
4 1303132934 222.11
That has the dtype as:
df1.dtypes
time/wattage object
dtype: object
I want to have something like this:
df1.head()
time wattage
0 1303132930 225.57
1 1303132931 226.09
2 1303132932 222.74
3 1303132933 222.20
4 1303132934 222.11
where time and wattage are in 'int' and 'float' types, respectively.
Thanks!
You could do:
df1[['time','wattage']] = df1['time/wattage'].str.split(' ', expand=True)
Output:
time/wattage time wattage
0 1303132930 225.57 1303132930 225.57
1 1303132931 226.09 1303132931 226.09
2 1303132932 222.74 1303132932 222.74
3 1303132933 222.20 1303132933 222.20
4 1303132934 222.11 1303132934 222.11
The new columns are still string (object) dtype, so you still need to cast them to the correct types.
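A sketch of that cast, assuming every row holds exactly one space-separated pair (rows with extra spaces would produce extra columns). The sample rows come from the question; astype with a dict maps each column to its own target dtype:

```python
import pandas as pd

df1 = pd.DataFrame({'time/wattage': [
    '1303132930 225.57', '1303132931 226.09', '1303132932 222.74',
]})

# Split on the space into two string columns
parts = df1['time/wattage'].str.split(' ', expand=True)
parts.columns = ['time', 'wattage']

# Cast to int and float; df1 now holds only the two new columns
df1 = parts.astype({'time': 'int64', 'wattage': 'float64'})
print(df1.dtypes)
```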

Problem with tuple indices in loop in Python Pandas?

I'm trying to calculate the number of days until the next holiday and since the last holiday. My approach is below:
holidays = pd.Series(pd.to_datetime(["01.01.2013", "06.01.2013", "14.02.2013","29.03.2013",
"31.03.2013", "01.04.2013", "01.05.2013", "03.05.2013",
"19.05.2013", "26.05.2013", "30.05.2013", "23.06.2013",
"15.07.2013", "27.10.2013", "01.11.2013", "11.11.2013",
"24.12.2013", "25.12.2013", "26.12.2013", "31.12.2013",
"01.01.2014", "06.01.2014", "14.02.2014", "30.03.2014",
"18.04.2014", "20.04.2014", "21.04.2014", "01.05.2014",
"03.05.2014", "03.05.2014", "26.05.2014", "08.06.2014",
"19.06.2014", "23.06.2014", "15.08.2014", "26.10.2014",
"01.11.2014", "11.11.2014", "24.12.2014", "25.12.2014",
"26.12.2014", "31.12.2014",
"01.01.2015", "06.01.2015", "14.02.2015", "29.03.2015",
"03.04.2015", "05.04.2015", "06.04.2015", "01.05.2015",
"03.05.2015", "24.05.2015", "26.05.2015", "04.06.2015",
"23.06.2015", "15.08.2015", "25.10.2015", "01.11.2015",
"11.11.2015", "24.12.2015", "25.12.2015", "26.12.2015",
"31.12.2015"], dayfirst=True))
#Number of days until next holiday
d_until_next_holiday = []
#Number of days since last holiday
d_since_last_holiday = []
for row in data.itertuples():
next_special_date = holidays[holidays >= row["Date"]].iloc[0]
d_until_next_holiday.append((next_special_date - row["Date"])/pd.Timedelta('1D'))
previous_special_date = holidays[holidays <= row.index].iloc[-1]
d_since_last_holiday.append((row["Date"] - previous_special_date)/pd.Timedelta('1D'))
#Add new cols to DF
sto2STG14["d_until_next_holiday"] = d_until_next_holiday
sto2STG14["d_since_last_holiday"] = d_since_last_holiday
However, I get this error:
TypeError: tuple indices must be integers or slices, not str
Why do I get this error? I know that row is a tuple, but I use .iloc[0] and .iloc[-1] in my code. What can I do?
With pandas, you rarely need to loop. In this case, the .shift method allows you to compute everything in one go:
import pandas
holidays = pandas.Series(pandas.to_datetime([
"01.01.2013", "06.01.2013", "14.02.2013","29.03.2013",
"31.03.2013", "01.04.2013", "01.05.2013", "03.05.2013",
"19.05.2013", "26.05.2013", "30.05.2013", "23.06.2013",
"15.07.2013", "27.10.2013", "01.11.2013", "11.11.2013",
"24.12.2013", "25.12.2013", "26.12.2013", "31.12.2013",
"01.01.2014", "06.01.2014", "14.02.2014", "30.03.2014",
"18.04.2014", "20.04.2014", "21.04.2014", "01.05.2014",
"03.05.2014", "03.05.2014", "26.05.2014", "08.06.2014",
"19.06.2014", "23.06.2014", "15.08.2014", "26.10.2014",
"01.11.2014", "11.11.2014", "24.12.2014", "25.12.2014",
"26.12.2014", "31.12.2014",
"01.01.2015", "06.01.2015", "14.02.2015", "29.03.2015",
"03.04.2015", "05.04.2015", "06.04.2015", "01.05.2015",
"03.05.2015", "24.05.2015", "26.05.2015", "04.06.2015",
"23.06.2015", "15.08.2015", "25.10.2015", "01.11.2015",
"11.11.2015", "24.12.2015", "25.12.2015", "26.12.2015",
"31.12.2015"
], dayfirst=True)
)
results = (
holidays
.sort_values()
.to_frame('holiday')
.assign(
days_since_prev=lambda df: df['holiday'] - df['holiday'].shift(1),
days_until_next=lambda df: df['holiday'].shift(-1) - df['holiday'],
)
)
results.head(10)
And I get:
holiday days_since_prev days_until_next
0 2013-01-01 NaT 5 days
1 2013-01-06 5 days 39 days
2 2013-02-14 39 days 43 days
3 2013-03-29 43 days 2 days
4 2013-03-31 2 days 1 days
5 2013-04-01 1 days 30 days
6 2013-05-01 30 days 2 days
7 2013-05-03 2 days 16 days
8 2013-05-19 16 days 7 days
9 2013-05-26 7 days 4 days
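The TypeError itself comes from itertuples(), which yields namedtuples: fields are read as attributes (row.Date, and row.Index for the index), so row["Date"] fails, and row.index is the tuple's own index method rather than the date. To get the per-date distances the question asks for without a loop, one option (a swapped-in technique, not the shift approach above) is numpy.searchsorted over the sorted holidays. The data frame with a 'Date' column follows the question; the sample dates and the holiday subset are hypothetical:

```python
import numpy as np
import pandas as pd

# Subset of the question's holidays; `data` and its dates are hypothetical
holidays = pd.Series(pd.to_datetime(
    ["01.01.2013", "06.01.2013", "14.02.2013", "29.03.2013"], dayfirst=True))
data = pd.DataFrame({'Date': pd.to_datetime(
    ['2012-12-25', '2013-01-03', '2013-02-14', '2013-03-01', '2013-04-10'])})

# searchsorted needs sorted input, so sort and de-duplicate the holidays
hol = holidays.drop_duplicates().sort_values().to_numpy()
dates = data['Date'].to_numpy()
n = len(hol)

# Position of the first holiday >= date, and of the last holiday <= date
i_next = np.searchsorted(hol, dates, side='left')
i_prev = np.searchsorted(hol, dates, side='right') - 1

# Clip so indexing stays valid; rows with no next/previous holiday become NaN below
next_hol = hol[np.clip(i_next, 0, n - 1)]
prev_hol = hol[np.clip(i_prev, 0, n - 1)]
one_day = np.timedelta64(1, 'D')

data['d_until_next_holiday'] = np.where(i_next < n, (next_hol - dates) / one_day, np.nan)
data['d_since_last_holiday'] = np.where(i_prev >= 0, (dates - prev_hol) / one_day, np.nan)
print(data)
```

Like the original loop, a date that is itself a holiday gets 0 in both columns; unlike the loop, dates before the first or after the last holiday give NaN instead of raising an IndexError.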

Time arithmetic on pandas series

I have a pandas DataFrame with a column "StartTime" that could be any datetime value. I would like to create a second column that gives the StartTime relative to the beginning of the week (i.e., 12am on the previous Sunday). For example, this post is 5 days, 14 hours since the beginning of this week.
StartTime
1 2007-01-19 15:59:24
2 2007-03-01 04:16:08
3 2006-11-08 20:47:14
4 2008-09-06 23:57:35
5 2007-02-17 18:57:32
6 2006-12-09 12:30:49
7 2006-11-11 11:21:34
I can do this, but it's pretty dang slow:
def time_since_week_beg(x):
y = x.to_pydatetime()  # Timestamp.to_datetime() was deprecated in favour of to_pydatetime()
return pd.Timedelta(days=y.weekday(),
hours=y.hour,
minutes=y.minute,
seconds=y.second
)
df['dt'] = df.StartTime.apply(time_since_week_beg)
What I want is something like this, but without the error:
df['dt'] = pd.Timedelta(days=df.StartTime.dt.dayofweek,
hours=df.StartTime.dt.hour,
minute=df.StartTime.dt.minute,
second=df.StartTime.dt.second
)
TypeError: Invalid type <class 'pandas.core.series.Series'>. Must be int or float.
Any thoughts?
You can use a list comprehension:
df['dt'] = [pd.Timedelta(days=ts.dayofweek,
hours=ts.hour,
minutes=ts.minute,
seconds=ts.second)
for ts in df.StartTime]
>>> df
StartTime dt
0 2007-01-19 15:59:24 4 days 15:59:24
1 2007-03-01 04:16:08 3 days 04:16:08
2 2006-11-08 20:47:14 2 days 20:47:14
3 2008-09-06 23:57:35 5 days 23:57:35
4 2007-02-17 18:57:32 5 days 18:57:32
5 2006-12-09 12:30:49 5 days 12:30:49
6 2006-11-11 11:21:34 5 days 11:21:34
Depending on the format of StartTime, you may need:
...for ts in pd.to_datetime(df.StartTime)
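A vectorized sketch that avoids building one Timedelta per row. Note that dayofweek counts Monday as 0, so the list comprehension above measures from midnight on Monday; the question asks for Sunday, which (dayofweek + 1) % 7 handles. The dt_sun column name is hypothetical; the sample timestamps come from the question:

```python
import pandas as pd

df = pd.DataFrame({'StartTime': pd.to_datetime([
    '2007-01-19 15:59:24', '2007-03-01 04:16:08', '2008-09-06 23:57:35'])})
st = df['StartTime']

# Time elapsed since midnight of the same day
time_of_day = st - st.dt.normalize()

# Week starting Monday (dayofweek: Monday=0 ... Sunday=6), as in the list comprehension
df['dt'] = pd.to_timedelta(st.dt.dayofweek, unit='D') + time_of_day

# Week starting Sunday, as described in the question
df['dt_sun'] = pd.to_timedelta((st.dt.dayofweek + 1) % 7, unit='D') + time_of_day
print(df)
```

Unlike the list comprehension, this keeps any sub-second part of StartTime, since it subtracts the normalized date instead of rebuilding the time from hours, minutes and seconds.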
