After computing the time difference in a Pandas DataFrame, I am not able to get the time difference as a number of minutes.
import pandas as pd
df = pd.DataFrame({'year': [2019] * 5, 'month': [8] * 5, 'day': [16] * 5,
                   'hour': [12, 12, 12, 12, 13], 'minute': [1, 2, 3, 4, 5]})
df_2 = pd.DataFrame({'year': [2019] * 5, 'month': [7] * 5, 'day': [22] * 5,
                     'hour': [11, 12, 12, 13, 14], 'minute': [1, 2, 3, 4, 5]})
df = pd.to_datetime(df).to_frame(name='Time_Stamp')
df_2 = pd.to_datetime(df_2).to_frame(name='Time_Stamp_2')
df['Time_Stamp_2'] = df_2['Time_Stamp_2']
df['TimeDiff'] = df.Time_Stamp - df.Time_Stamp_2
df
I tried df['TimeDiff'].dt.seconds / 60, but that ignores the days part of the difference.
You need to use total_seconds()
df['TimeDiff'] = (df.Time_Stamp - df.Time_Stamp_2).dt.total_seconds().div(60)
The TimeDiff column:
0 36060.0
1 36000.0
2 36000.0
3 35940.0
4 35940.0
Name: TimeDiff, dtype: float64
Another option is to divide the timedelta column by a one-minute timedelta64 (numpy must be imported as np): (df.Time_Stamp - df.Time_Stamp_2) / np.timedelta64(1, 'm')
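A minimal, self-contained version of that alternative (a sketch; it assumes the Time_Stamp columns built above are already datetime64, and TimeDiff_min is just an illustrative column name):
import numpy as np
# Dividing a timedelta64 column by a one-minute timedelta64 yields a float number of minutes
df['TimeDiff_min'] = (df.Time_Stamp - df.Time_Stamp_2) / np.timedelta64(1, 'm')
print(df['TimeDiff_min'])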
I have a dataframe consisting of float64 values. I have to divide each value by 100, except for the values in the row with index no. 388. For that I wrote the following code.
Preprocessing:
df = pd.read_csv('state_cpi.csv')
d = {'January':1, 'February':2, 'March':3, 'April':4, 'May':5, 'June':6, 'July':7, 'August':8, 'September':9, 'October':10, 'November':11, 'December':12}
df['Month']=df['Name'].map(d)
r = {'Rural':1, 'Urban':2, 'Rural+Urban':3}
df['Region_code']=df['Sector'].map(r)
df['Himachal Pradesh'] = df['Himachal Pradesh'].str.replace('--','NaN')
df['Himachal Pradesh'] = df['Himachal Pradesh'].astype('float64')
Extracting the data of use:
data = df.iloc[:,3:-2]
Applying the division on the data dataframe
data.loc[:388, :] = (data.loc[:388, :] / 100).round(2)
data.loc[389:, :] = (data.loc[389:, :] / 100).round(2)
This returned a dataframe where the data of row no. 388 was also divided by 100.
As an example, consider the dataframe created below. All the indices except 10 are collected in the aaa list. Those index labels are then used to select rows, and 1 is added to each element; the row with index 10 remains unchanged.
df = pd.DataFrame({'a': [1, 23, 4, 5, 7, 7, 8, 10, 9],
                   'b': [1, 2, 3, 4, 5, 6, 7, 8, 9]},
                  index=[1, 2, 5, 7, 8, 9, 10, 11, 12])
aaa = df[df.index != 10].index
df.loc[aaa, :] = df.loc[aaa, :] + 1
In your case, the code will be as follows:
aaa = data[data.index != 388].index
data.loc[aaa, :] = (data.loc[aaa, :] / 100).round(2)
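For reference, a small self-contained sketch of the same idea on a toy frame (made-up numbers, not the state_cpi data), using the boolean mask directly instead of a separate index list:
import pandas as pd

toy = pd.DataFrame({'x': [100.0, 200.0, 300.0],
                    'y': [400.0, 500.0, 600.0]},
                   index=[387, 388, 389])

mask = toy.index != 388                               # every row except label 388
toy.loc[mask, :] = (toy.loc[mask, :] / 100).round(2)
print(toy)                                            # row 388 keeps its original values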
How would I be able to subtract 1 second, 1 minute, and 1 month from the data['date'] column?
import pandas as pd
d = {'col1': [4, 5, 2, 2, 3, 5, 1, 1, 6], 'col2': [6, 2, 1, 7, 3, 5, 3, 3, 9],
'label':['Old','Old','Old','Old','Old','Old','Old','Old','Old'],
'date': ['2022-01-24 10:07:02', '2022-01-27 01:55:03', '2022-01-30 19:09:03', '2022-02-02 14:34:06',
'2022-02-08 12:37:03', '2022-02-10 03:07:02', '2022-02-10 14:02:03', '2022-02-11 00:32:25',
'2022-02-12 21:42:03']}
data = pd.DataFrame(d)
# subtract 1 second from the dates
date_mod_s = pd.to_datetime(data['date'])
# subtract 1 minute from the dates
date_mod_m = pd.to_datetime(data['date'])
# subtract 1 month from the dates
date_mod_M = pd.to_datetime(data['date'])
Your date column is of type string. Convert it to pd.Timestamp and you can use pd.DateOffset:
pd.to_datetime(data["date"]) - pd.DateOffset(months=1, minutes=1, seconds=1)
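If you want the three separate variables from the question, the same idea can be applied one offset at a time; a sketch (pd.Timedelta covers fixed-length units such as seconds and minutes, while pd.DateOffset handles calendar months):
dates = pd.to_datetime(data['date'])
date_mod_s = dates - pd.Timedelta(seconds=1)    # subtract 1 second
date_mod_m = dates - pd.Timedelta(minutes=1)    # subtract 1 minute
date_mod_M = dates - pd.DateOffset(months=1)    # subtract 1 calendar month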
I'm trying to iterate over the number of hours between two timestamps. For example:
a = 2018-01-19 12:35:00
b = 2018-01-19 18:50:00
for hour in range(a.hour, b.hour + 1):
    print(hour)
This will result in: 12, 13, 14, 15, 16, 17, 18
Later on I want to use the hour variable, so I need it to count how many hours of difference there are, not the hour values themselves.
The result I want is: 0, 1, 2, 3, 4, 5, 6
There is another issue with timestamps like these:
c = 2018-01-16 17:59:00
d = 2018-01-17 00:14:00
because the hour in 00:14:00 is 0.
In this case I want to get: 0, 1, 2, 3, 4, 5, 6, 7
I don't know how to do this; can anyone help, please?
The object you want is a "timedelta" object: it represents the duration between two timestamps. Say you wanted to start at a datetime object and then do something every hour after that. Don't try to figure out the interval logic yourself; use the built-in tools.
>>> from datetime import datetime, timedelta
>>> a = datetime.now()
>>> a
datetime.datetime(2020, 8, 17, 6, 33, 25, 529995)
>>> a + timedelta(hours=1)
datetime.datetime(2020, 8, 17, 7, 33, 25, 529995)
>>> a + timedelta(hours=2)
datetime.datetime(2020, 8, 17, 8, 33, 25, 529995)
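Applied to the original question, one sketch along those lines is to round both timestamps down to the hour and then step with timedelta, which reproduces the 0..6 and 0..7 sequences asked for:
from datetime import datetime, timedelta

a = datetime(2018, 1, 19, 12, 35)
b = datetime(2018, 1, 19, 18, 50)

# Truncate both to the whole hour, then count one-hour steps from a's hour up to b's hour
start = a.replace(minute=0, second=0, microsecond=0)
end = b.replace(minute=0, second=0, microsecond=0)

hour = 0
t = start
while t <= end:
    print(hour)              # prints 0, 1, 2, 3, 4, 5, 6
    t += timedelta(hours=1)
    hour += 1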
Try this
from datetime import datetime
def date_range(x, y):
    fmt = '%Y-%m-%d %H:%M:%S'
    x, y = datetime.strptime(x, fmt), datetime.strptime(y, fmt)
    duration = y.replace(minute=59) - x.replace(minute=0)
    days, seconds = duration.days, duration.seconds
    hours = days * 24 + seconds // 3600
    return list(range(hours + 1))
a = '2018-01-19 12:35:00'
b = '2018-01-19 18:50:00'
c = '2018-01-16 17:59:00'
d = '2018-01-17 00:14:00'
print(date_range(a, b))
print(date_range(c, d))
Output:
[0, 1, 2, 3, 4, 5, 6]
[0, 1, 2, 3, 4, 5, 6, 7]
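If you only need the count rather than the list, essentially the same number can be read off a single timedelta; a sketch reusing a and b from above:
# Truncate both timestamps to the hour, then count whole hours between them
start = datetime.strptime(a, '%Y-%m-%d %H:%M:%S').replace(minute=0, second=0)
end = datetime.strptime(b, '%Y-%m-%d %H:%M:%S').replace(minute=0, second=0)
n_hours = int((end - start).total_seconds() // 3600)   # 6 here; 7 for c and d
print(list(range(n_hours + 1)))                        # [0, 1, 2, 3, 4, 5, 6]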
Is it possible to have a dataframe where (for example) there is a column called "data" and each element in that column is a numpy array?
| Data              | Time          |
| [1, 2, 3, ... 10] | June 12, 2020 |
| [11, 12, ..., 20] | June 13, 2020 |
If so, how do you create a dataframe in this format?
Not sure you want to do it this way, but it works.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Data': [np.array([1, 2, 3, 10]), np.array([11, 12, 13, 20])],
                   'Time': ['June 12, 2020', 'June 13, 2020']})
print (df)
Output:
Data Time
0 [1, 2, 3, 10] June 12, 2020
1 [11, 12, 13, 20] June 13, 2020
You can also do it with lists:
df = pd.DataFrame({'Data': [[1, 2, 3, 10], [11,12,13,20]], 'Time' : ['June 12, 2020', 'June 13, 2020']})
Yes you can; see this question. It's useful when your data is grouped by date, indexes, etc., because you compress several rows into one, but in terms of pandas operations it may not be that efficient. You may prefer to use the groupby() method and then apply your operations, as sketched below.
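A sketch of what that means in practice (the column names here are made up): groupby()/agg(list) builds such array-valued cells from ordinary rows, and explode() flattens them back out when you need row-wise operations again:
import pandas as pd

raw = pd.DataFrame({'Time': ['June 12, 2020'] * 3 + ['June 13, 2020'] * 2,
                    'value': [1, 2, 3, 11, 12]})

# Compress: one row per date, with all values for that date collected into a list
packed = raw.groupby('Time')['value'].agg(list).reset_index(name='Data')

# Flatten again, one value per row
unpacked = packed.explode('Data')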
I have a dataset where I group the monthly data by the same id:
temp1 = listvar[2].groupby(["id", "month"])["value"].mean()
This results in:
id month
SN10380 1 -9.670370
2 -8.303571
3 -4.932143
4 0.475862
5 5.732000
...
SN99950 8 6.326786
9 4.623529
10 1.290566
11 -0.867273
12 -2.485455
I then want to have each month and the corresponding value as its own column for the same ID, like this:
id month_1 month_2 month_3 month_4 .... month_12
SN10380 -9.670370 -8.303571 .....
SN99950
I have tried different solutions using apply(), transform() and agg(), but wasn't able to produce the wanted output.
You could use unstack. Here's the sample code:
import pandas as pd
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "month": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "value": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
})
temp1 = df.groupby(["id", "month"])["value"].mean()
temp1.unstack()
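To get the month_1 ... month_12 column names from the question, the unstacked frame can be renamed afterwards; for example (add_prefix is standard pandas, the rest is just the sample data above):
wide = temp1.unstack().add_prefix('month_').reset_index()
print(wide)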
I hope it helps!