Subtracting time intervals from date columns in Pandas dataframes - python

How would I be able to subtract 1 second and 1 minute and 1 month from data['date'] column?
import pandas as pd
d = {'col1': [4, 5, 2, 2, 3, 5, 1, 1, 6], 'col2': [6, 2, 1, 7, 3, 5, 3, 3, 9],
'label':['Old','Old','Old','Old','Old','Old','Old','Old','Old'],
'date': ['2022-01-24 10:07:02', '2022-01-27 01:55:03', '2022-01-30 19:09:03', '2022-02-02 14:34:06',
'2022-02-08 12:37:03', '2022-02-10 03:07:02', '2022-02-10 14:02:03', '2022-02-11 00:32:25',
'2022-02-12 21:42:03']}
data = pd.DataFrame(d)
# subtract the dates by 1 second
date_mod_s = pd.to_datetime(data['date'])
# subtract the dates by 1 minute
date_mod_m = pd.to_datetime(data['date'])
# subtract the dates by 1 month
date_mod_M = pd.to_datetime(data['date'])

Your date column is of type string. Convert it to pd.Timestamp and you can use pd.DateOffset:
pd.to_datetime(data["date"]) - pd.DateOffset(months=1, minutes=1, seconds=1)
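For the three separate variables in the question, each offset can also be applied on its own; a minimal sketch using a two-row sample of the same data:

```python
import pandas as pd

data = pd.DataFrame({'date': ['2022-01-24 10:07:02', '2022-02-12 21:42:03']})
dates = pd.to_datetime(data['date'])

# subtract the dates by 1 second / 1 minute / 1 month
date_mod_s = dates - pd.DateOffset(seconds=1)
date_mod_m = dates - pd.DateOffset(minutes=1)
date_mod_M = dates - pd.DateOffset(months=1)
print(date_mod_M.tolist())
```

Note that `pd.DateOffset(months=1)` is calendar-aware, so subtracting a month steps back to the same day of the previous month rather than a fixed number of days.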

Related

Replace some data form a dataframe to another under specific conditions with pandas

Hi everyone, I'm quite new to Pandas, so I'll only attach pseudo-code because I have no idea how to implement this.
I have two DataFrames: one with a Job number and a date related to it (let's call this DF2), and a bigger one with a bunch of different data (this will be DF1).
I would like to compare DF1 with DF2 and, if the string in DF1[jobNo.] equals a string in DF2[jobNo.], get DF1[Date] == DF2[Date].
Any ideas? I really need your help.
Thanks
If you're trying to check whether the dates match when the jobNo values match, my approach would be to merge the two dataframes on jobNo and compare the dates:
import pandas as pd
df1 = pd.DataFrame({'jobNo': [0, 3, 1], 'date': [9, 8, 3]})
df2 = pd.DataFrame({'jobNo': [0, 3, 2], 'date': [9, 5, 3]})
df3 = df2.merge(df1, on=["jobNo"], suffixes=('_2', '_1'))
df3["date_match"] = df3.apply(lambda x: x["date_2"] == x["date_1"], axis=1)
print(df3)
jobNo date_2 date_1 date_match
0 0 9 9 True
1 3 5 8 False
If what you mean by df1["date"] == df2["date"] is that we're going to change the date in df1 when there's a match, then this code looks for a match and replaces the date using apply:
import pandas as pd
df1 = pd.DataFrame({'jobNo': [0, 3, 1], 'date': [9, 8, 3]})
df2 = pd.DataFrame({'jobNo': [0, 3, 2], 'date': [7, 5, 4]})
df1['new_date'] = df1.apply(
    lambda x: x['date'] if x['jobNo'] not in df2['jobNo'].values
    else df2[df2['jobNo'] == x['jobNo']]['date'].values[0],
    axis=1)
print(df1)
jobNo date new_date
0 0 9 7
1 3 8 5
2 1 3 3
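A vectorized alternative to the row-wise apply, assuming the same df1/df2 as above, is a left merge with a fallback to the original date:

```python
import pandas as pd

df1 = pd.DataFrame({'jobNo': [0, 3, 1], 'date': [9, 8, 3]})
df2 = pd.DataFrame({'jobNo': [0, 3, 2], 'date': [7, 5, 4]})

# Left-merge df2's date onto df1; rows without a matching jobNo get NaN,
# which is then filled with df1's own date.
merged = df1.merge(df2, on='jobNo', how='left', suffixes=('', '_2'))
df1['new_date'] = merged['date_2'].fillna(merged['date']).astype(int)
print(df1)
```

This produces the same new_date column as the apply version and avoids a per-row lookup into df2.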

How to edit all data value given in a dataframe except for the values of a particular index?

I have a dataframe consisting of float64 values. I have to divide each value by hundred except for the values of the row at index no. 388. For that I wrote the following code.
Dataset
Preprocessing:
df = pd.read_csv('state_cpi.csv')
d = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6, 'July':7, 'August':8, 'September':9, 'October':10, 'November':11, 'December':12}
df['Month']=df['Name'].map(d)
r = {'Rural':1, 'Urban':2, 'Rural+Urban':3}
df['Region_code']=df['Sector'].map(r)
df['Himachal Pradesh'] = df['Himachal Pradesh'].str.replace('--','NaN')
df['Himachal Pradesh'] = df['Himachal Pradesh'].astype('float64')
Extracting the data of interest:
data = df.iloc[:,3:-2]
Applying the division to the data dataframe:
data.iloc[:388] = (data.iloc[:388] / 100).round(2)
data.iloc[389:] = (data.iloc[389:] / 100).round(2)
It returned a dataframe where the data of row no. 388 was also divided by 100.
As an example, I use the dataframe created below. All indices except 10 are copied into the aaa list. These index labels are then used in the query, and 1 is added to each element; the row with index 10 remains unchanged.
df = pd.DataFrame({'a': [1, 23, 4, 5, 7, 7, 8, 10, 9],
'b': [1, 2, 3, 4, 5, 6, 7, 8, 9]},
index=[1, 2, 5, 7, 8, 9, 10, 11, 12])
aaa = df[df.index != 10].index
df.loc[aaa, :] = df.loc[aaa, :] + 1
In your case, the code will be as follows:
aaa = data[data.index != 388].index
data.loc[aaa, :] = (data.loc[aaa, :] / 100).round(2)
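The intermediate index isn't strictly needed; a boolean mask over the index gives the same result. A small sketch with made-up row labels around 388:

```python
import pandas as pd

data = pd.DataFrame({'a': [100.0, 200.0, 300.0]}, index=[387, 388, 389])

# Divide every row except the one labelled 388.
mask = data.index != 388
data.loc[mask] = (data.loc[mask] / 100).round(2)
print(data)
```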

Groupby columns on ID and month and assign value for each month as new columns

I have a dataset where I group the monthly data by the same id:
temp1 = listvar[2].groupby(["id", "month"])["value"].mean()
This results in this:
id month
SN10380 1 -9.670370
2 -8.303571
3 -4.932143
4 0.475862
5 5.732000
...
SN99950 8 6.326786
9 4.623529
10 1.290566
11 -0.867273
12 -2.485455
I then want each month and the corresponding value as its own column on the same ID, like this:
id month_1 month_2 month_3 month_4 .... month_12
SN10380 -9.670370 -8.303571 .....
SN99950
I have tried different solutions using apply(), transform() and agg(), but wasn't able to produce the wanted output.
You could use unstack. Here's the sample code:
import pandas as pd
df = pd.DataFrame({
"id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
"month": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
"value": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
})
temp1 = df.groupby(["id", "month"])["value"].mean()
temp1.unstack()
I hope it helps!
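To also get the month_1, ..., month_12 column names shown in the question, the unstacked columns can be renamed; a sketch built on a smaller sample:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "month": [1, 2, 1, 2],
    "value": [11, 12, 16, 17],
})

# Pivot months into columns, then prefix the month numbers.
wide = df.groupby(["id", "month"])["value"].mean().unstack()
wide.columns = [f"month_{m}" for m in wide.columns]
print(wide)
```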

Pandas Time Difference in Minutes

After computing the time difference in a Pandas DataFrame, I am not able to get the time difference in number of minutes.
import pandas as pd
df = pd.DataFrame({'year': [2019] * 5,'month': [8] * 5,'day': [16] * 5,'hour': [12, 12, 12, 12, 13],
'minute': [1, 2, 3, 4, 5]})
df_2 = pd.DataFrame({'year': [2019] * 5,'month': [7] * 5,'day': [22] * 5,'hour': [11, 12, 12, 13, 14],
'minute': [1, 2, 3, 4, 5]})
df = pd.DataFrame(pd.to_datetime(df), columns=['Time_Stamp'])
df_2 = pd.DataFrame(pd.to_datetime(df_2), columns=['Time_Stamp_2'])
df['Time_Stamp_2']=df_2['Time_Stamp_2']
df['TimeDiff'] = df.Time_Stamp - df.Time_Stamp_2
df
Tried df['TimeDiff'].dt.seconds/60 but, that ignored the Days difference.
You need to use total_seconds():
df['TimeDiff'] = (df.Time_Stamp - df.Time_Stamp_2).dt.total_seconds().div(60)
The TimeDiff column:
0 36060.0
1 36000.0
2 36000.0
3 35940.0
4 35940.0
Name: TimeDiff, dtype: float64
Alternatively, run (with numpy imported as np): (df.Time_Stamp - df.Time_Stamp_2) / np.timedelta64(1, 'm')
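Both approaches give the same result; a self-contained sketch with two hand-made timestamps 25 days apart (36 000 minutes, plus or minus the hour difference):

```python
import numpy as np
import pandas as pd

t1 = pd.Series(pd.to_datetime(['2019-08-16 12:01:00', '2019-08-16 13:05:00']))
t2 = pd.Series(pd.to_datetime(['2019-07-22 11:01:00', '2019-07-22 14:05:00']))

diff = t1 - t2
minutes_a = diff.dt.total_seconds() / 60   # via total_seconds
minutes_b = diff / np.timedelta64(1, 'm')  # via timedelta division
print(minutes_a.tolist())
```

Unlike .dt.seconds, both of these account for the full day component of the difference.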

Drop consecutive duplicates which have milliseconds different sampling frequency - Python

The dataframe looks like this:
0, 3710.968017578125, 2012-01-07T03:13:43.859Z
1, 3710.968017578125, 2012-01-07T03:13:48.890Z
2, 3712.472900390625, 2012-01-07T03:13:53.906Z
3, 3712.472900390625, 2012-01-07T03:13:58.921Z
4, 3713.110107421875, 2012-01-07T03:14:03.900Z
5, 3713.110107421875, 2012-01-07T03:14:03.937Z
6, 3713.89892578125, 2012-01-07T03:14:13.900Z
7, 3713.89892578125, 2012-01-07T03:14:13.968Z
8, 3713.89892578125, 2012-01-07T03:14:19.000Z
9, 3714.64990234375, 2012-01-07T03:14:24.000Z
10, 3714.64990234375, 2012-01-07T03:14:24.015Z
11, 3714.64990234375, 2012-01-07T03:14:29.000Z
12, 3714.64990234375, 2012-01-07T03:14:29.031Z
At some rows there are lines whose timestamps differ only by milliseconds; I want to drop those and only keep rows whose timestamps differ at the second level. There are also rows that share the same value but have timestamps seconds apart, like rows 9 to 12, therefore I can't use a.loc[a.shift() != a]
The desired output would be:
0, 3710.968017578125, 2012-01-07T03:13:43.859Z
1, 3710.968017578125, 2012-01-07T03:13:48.890Z
2, 3712.472900390625, 2012-01-07T03:13:53.906Z
3, 3712.472900390625, 2012-01-07T03:13:58.921Z
4, 3713.110107421875, 2012-01-07T03:14:03.900Z
6, 3713.89892578125, 2012-01-07T03:14:13.900Z
8, 3713.89892578125, 2012-01-07T03:14:19.000Z
9, 3714.64990234375, 2012-01-07T03:14:24.000Z
11, 3714.64990234375, 2012-01-07T03:14:29.000Z
Try:
df.groupby(pd.to_datetime(df[2]).astype('datetime64[s]')).head(1)
I hope it's self-explanatory.
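A small runnable version of the same idea, using dt.floor('s') (which truncates to whole seconds like the astype above and also handles the tz-aware 'Z' timestamps); the column name 2 mirrors the unnamed dataframe in the question:

```python
import pandas as pd

df = pd.DataFrame({2: ['2012-01-07T03:14:03.900Z',
                       '2012-01-07T03:14:03.937Z',
                       '2012-01-07T03:14:13.900Z']})

# Truncate each timestamp to whole seconds and keep the first row
# of every truncated-second group.
seconds = pd.to_datetime(df[2]).dt.floor('s')
out = df.groupby(seconds).head(1)
print(out.index.tolist())
```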
You can use the script below. I didn't get your dataframe's column names, so I invented the columns ['x', 'date_time']:
df = pd.DataFrame([
(3710.968017578125, pd.to_datetime('2012-01-07T03:13:43.859Z')),
(3710.968017578125, pd.to_datetime('2012-01-07T03:13:48.890Z')),
(3712.472900390625, pd.to_datetime('2012-01-07T03:13:53.906Z')),
(3712.472900390625, pd.to_datetime('2012-01-07T03:13:58.921Z')),
(3713.110107421875, pd.to_datetime('2012-01-07T03:14:03.900Z')),
(3713.110107421875, pd.to_datetime('2012-01-07T03:14:03.937Z')),
(3713.89892578125, pd.to_datetime('2012-01-07T03:14:13.900Z')),
(3713.89892578125, pd.to_datetime('2012-01-07T03:14:13.968Z')),
(3713.89892578125, pd.to_datetime('2012-01-07T03:14:19.000Z')),
(3714.64990234375, pd.to_datetime('2012-01-07T03:14:24.000Z')),
(3714.64990234375, pd.to_datetime('2012-01-07T03:14:24.015Z')),
(3714.64990234375, pd.to_datetime('2012-01-07T03:14:29.000Z')),
(3714.64990234375, pd.to_datetime('2012-01-07T03:14:29.031Z'))],
columns=['x', 'date_time'])
# create a column 'time_diff' with the difference between the datetime
# of the current row and the previous row within each 'x' group,
# keep only rows where the difference is either NaT or more than 1 second,
# then drop the temporary 'time_diff' column
df['time_diff'] = df.groupby('x')['date_time'].diff()
df = df[(df['time_diff'].isnull()) | (df['time_diff'].map(lambda x: x.seconds > 1))]
df = df.drop(['time_diff'], axis=1)
df
