I have the code below:
for i in range(index, len(df_2)+1):
    if df_2.loc[i, 'Duration'] == 0:
        df_2.loc[i, 'Duration'] = df_2.loc[i, "idle_hrs"] + df_2.loc[i - 1, "Duration"]
How can I write this more simply and reduce the running time? Is there a way to write it in list-comprehension style?
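You can use shift for the accumulation. Note that shift reads the original value of the previous row, not a value updated earlier in the same pass, so this matches the loop only when the zeros are not consecutive.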
import pandas as pd
index = 2
df = pd.DataFrame({'Duration': [1,0,2,0], 'idle_hrs': [10,20,30,40]})
df
Duration idle_hrs
0 1 10
1 0 20
2 2 30
3 0 40
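start_df = df[:index].copy()  # keep an independent copy of the rows before index
df.loc[df['Duration'] == 0, 'Duration'] = df['idle_hrs'] + df['Duration'].shift(1)
df.iloc[:index] = start_df  # restore the rows the loop would not have touched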
df
Duration idle_hrs
0 1.0 10
1 0.0 20
2 2.0 30
3 42.0 40
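When several zeros follow each other, the loop in the question adds idle_hrs on top of the value it has just written, which a single shift cannot reproduce. The following is a minimal sketch of a vectorized variant that tries to keep the loop's behaviour. The helper name fill_zero_durations and the sample data are made up for illustration, and it assumes a default RangeIndex and index >= 1.

```python
import numpy as np
import pandas as pd

def fill_zero_durations(df, index):
    """Hypothetical helper: vectorized form of the question's loop."""
    # Rows the loop would rewrite: zero Duration at position >= index.
    m = (df["Duration"] == 0) & (np.arange(len(df)) >= index)
    # Every untouched row starts a new run; rewritten rows join the run before them.
    run = (~m).cumsum()
    # Last untouched Duration, carried forward over each run.
    base = df["Duration"].where(~m).ffill()
    # Running sum of idle_hrs inside each run (untouched rows contribute 0).
    extra = df["idle_hrs"].where(m, 0).groupby(run).cumsum()
    out = df.copy()
    out["Duration"] = df["Duration"].where(~m, base + extra)
    return out

df = pd.DataFrame({"Duration": [1, 0, 2, 0, 0], "idle_hrs": [10, 20, 30, 40, 50]})
result = fill_zero_durations(df, index=2)
print(result)
```

Row 1 stays at 0 because it lies before index, while rows 3 and 4 become 42 and 92, the same values the explicit loop would write.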
I want to write a function with a loop and a conditional that counts only when Actual Result = 1.
So the number should increase by 1 each time Actual Result = 1.
This is my dataframe:
This is my code, but it doesn't produce the result that I want:
def func_count(x):
    for i in range(1,880):
        if x['Actual Result']==1:
            result = i
        else:
            result = '-'
        return result
X_machine_learning['Count'] = X_machine_learning.apply(lambda x:func_count(x),axis=1)
When I filter for Count != '-', the result looks like this:
The number is always equal to 1 and does not increase by 1 each time Actual Result = 1. Any solution?
Try something like this:
import pandas as pd
df = pd.DataFrame({
'age': [30,25,40,12,16,17,14,50,22,10],
'actual_result': [0,1,1,1,0,0,1,1,1,0]
})
count = 0
lst_count = []
for i in range(len(df)):
    if df['actual_result'][i] == 1:
        count+=1
        lst_count.append(count)
    else:
        lst_count.append('-')
df['count'] = lst_count
print(df)
Result
age actual_result count
0 30 0 -
1 25 1 1
2 40 1 2
3 12 1 3
4 16 0 -
5 17 0 -
6 14 1 4
7 50 1 5
8 22 1 6
9 10 0 -
Actually, you don't need to loop over the dataframe; doing so is usually a pandas anti-pattern that is best avoided. With df as your dataframe, you could try the following instead:
m = df["Actual Result"] == 1
df["Count"] = m.cumsum().where(m, "-")
Result for the following dataframe
df = pd.DataFrame({"Actual Result": [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]})
is
Actual Result Count
0 1 1
1 1 2
2 0 -
3 1 3
4 1 4
5 1 5
6 0 -
7 0 -
8 1 6
9 0 -
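One side effect of the "-" placeholder is that the Count column becomes object dtype, which gets in the way of later arithmetic. If that matters, a possible variation (an assumption about what you need, not part of the answer above) is to leave the non-matching rows as missing values and use the nullable Int64 dtype:

```python
import pandas as pd

df = pd.DataFrame({"Actual Result": [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]})

m = df["Actual Result"] == 1
# where(m) leaves NaN on non-matching rows; Int64 stores those as <NA>
# while keeping the counts as integers.
df["Count"] = m.cumsum().where(m).astype("Int64")
print(df)
```

Filtering is then df[df["Count"].notna()] instead of comparing against "-".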
I am trying to implement the 'Bottom-Up Computation' algorithm in data mining (https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-050.pdf).
I need to use the 'pandas' library to create a dataframe and provide it to a recursive function, which should also return a dataframe as output. I am only able to return the final column as output, because I am unable to figure out how to dynamically build a data frame.
Here is the python program:
import pandas as pd
def project_data(df, d):
    return df.iloc[:, d]

def select_data(df, d, val):
    col_name = df.columns[d]
    return df[df[col_name] == val]

def remove_first_dim(df):
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    df_temp = select_data(df, 0, v)
    return remove_first_dim(df_temp)

def buc(df):
    dims = df.shape[1]
    if dims == 1:
        input_sum = sum(project_data(df, 0))
        print(input_sum)
    else:
        dim_vals = set(project_data(df, 0).values)
        for dim_val in dim_vals:
            sub_data = slice_data_dim0(df, dim_val)
            buc(sub_data)
        sub_data = remove_first_dim(df)
        buc(sub_data)

data = {'A':[1,1,1,1,2],
        'B':[1,1,2,3,1],
        'M':[10,20,30,40,50]
        }
df = pd.DataFrame(data, columns = ['A','B','M'])
buc(df)
I get the following output:
30
30
40
100
50
50
80
30
40
But what I need is a dataframe, like this (not necessarily formatted, but a data frame):
A B M
0 1 1 30
1 1 2 30
2 1 3 40
3 1 ALL 100
4 2 1 50
5 2 ALL 50
6 ALL 1 80
7 ALL 2 30
8 ALL 3 40
9 ALL ALL 150
How do I achieve this?
Unfortunately, pandas has no built-in way to produce subtotals at every level, so the trick is to calculate them on the side and concatenate them with the original dataframe.
from itertools import combinations
import numpy as np
dim = ['A', 'B']
vals = ['M']
df = pd.concat(
    [df]
    # subtotals:
    + [df.groupby(list(gr), as_index=False)[vals].sum() for r in range(len(dim)-1) for gr in combinations(dim, r+1)]
    # total:
    + [df.groupby(np.zeros(len(df)))[vals].sum()]
    )\
    .sort_values(dim)\
    .reset_index(drop=True)\
    .fillna("ALL")
Output:
A B M
0 1 1 10
1 1 1 20
2 1 2 30
3 1 3 40
4 1 ALL 100
5 2 1 50
6 2 ALL 50
7 ALL 1 80
8 ALL 2 30
9 ALL 3 40
10 ALL ALL 150
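Note that the output above still contains the two raw (1, 1) rows, whereas the question asked for them summed to 30. A minimal sketch of a variation that aggregates at every level, including the full (A, B) grouping, is shown below. It reuses the dim and vals names from the answer; the parts list and the all_last sort key are made up for illustration, and the sort key assumes the real dimension values are numeric.

```python
from itertools import combinations
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 1, 2], "B": [1, 1, 2, 3, 1], "M": [10, 20, 30, 40, 50]})
dim = ["A", "B"]
vals = ["M"]

parts = []
for r in range(len(dim) + 1):
    for gr in combinations(dim, r):
        if gr:
            part = df.groupby(list(gr), as_index=False)[vals].sum()
        else:
            # Empty grouping: the grand total as a one-row frame.
            part = df[vals].sum().to_frame().T
        # Dimensions that were summed away are labelled "ALL".
        parts.append(part.assign(**{d: "ALL" for d in dim if d not in gr}))

def all_last(col):
    # Sort key (illustrative): treat "ALL" as larger than any real value.
    return col.map(lambda v: float("inf") if v == "ALL" else v)

cube = (
    pd.concat(parts, ignore_index=True)[dim + vals]
    .sort_values(dim, key=all_last)
    .reset_index(drop=True)
)
print(cube)
```

Labelling the summed-away dimensions explicitly, rather than relying on fillna, keeps genuine missing values in the data from being mistaken for subtotal rows.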
I want to generate some sort of cycle counter for my DataFrame. One cycle in the example below has a length of 4. The last column shows how it is supposed to look; the rest are my own attempts.
My current code looks like this:
import pandas as pd
import numpy as np
l = list(np.linspace(0,10,12))
data = [
('time',l),
('A',[0,5,0.6,-4.8,-0.3,4.9,0.2,-4.7,0.5,5,0.1,-4.6]),
('B',[ 0,300,20,-280,-25,290,30,-270,40,300,-10,-260]),
]
df = pd.DataFrame.from_dict(dict(data))
length = len(df)
df.loc[0,'cycle']=1
df['cycle'] = length/4 +df.loc[0,'cycle']
i = 0
for i in range(0,length):
    df.loc[i,'new_cycle']=i+1
df['want_cycle']= [1,1,1,1,2,2,2,2,3,3,3,3]
print(length)
print(df)
I also need an if condition in the code, so that the value of df['new_cycle'] only increases when the index counter reaches, for example, 4. So far I have failed to find a proper way to implement such a condition.
Try this with the default range index. Because your dataframe's row index is a range starting at 0 (the default index of a dataframe), you can use floor division to calculate your cycle:
df['cycle'] = df.index//4 + 1
Output:
time A B cycle
0 0.000000 0.0 0 1
1 0.909091 5.0 300 1
2 1.818182 0.6 20 1
3 2.727273 -4.8 -280 1
4 3.636364 -0.3 -25 2
5 4.545455 4.9 290 2
6 5.454545 0.2 30 2
7 6.363636 -4.7 -270 2
8 7.272727 0.5 40 3
9 8.181818 5.0 300 3
10 9.090909 0.1 -10 3
11 10.000000 -4.6 -260 3
Now, if your dataframe index isn't the default, then you can use something like this:
df['cycle'] = [df.index.get_loc(i) // 4 + 1 for i in df.index]
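The list comprehension calls get_loc once per row, and get_loc does not return a single integer when labels are duplicated. A minimal sketch of an alternative that depends only on row position is to floor-divide np.arange(len(df)); the letter index below is made-up sample data.

```python
import numpy as np
import pandas as pd

# Made-up frame with a non-default (string) index.
df = pd.DataFrame({"A": range(12)}, index=list("abcdefghijkl"))

# Row positions 0..11, floor-divided into blocks of 4, numbered from 1.
df["cycle"] = np.arange(len(df)) // 4 + 1
print(df)
```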
I've added just 1 thing for you, a new variable called new_cycle which will keep the count you're after.
In the for loop we're checking to see whether or not i is divisible by 4 without a remainder, if it is we're adding 1 to the new variable, and filling the data frame with this value the same way you did.
import pandas as pd
import numpy as np
l = list(np.linspace(0,10,12))
data = [
('time',l),
('A',[0,5,0.6,-4.8,-0.3,4.9,0.2,-4.7,0.5,5,0.1,-4.6]),
('B',[ 0,300,20,-280,-25,290,30,-270,40,300,-10,-260]),
]
df = pd.DataFrame.from_dict(dict(data))
length = len(df)
df.loc[0,'cycle']=1
df['cycle'] = length/4 +df.loc[0,'cycle']
new_cycle = 0
for i in range(0,length):
    if i % 4 == 0:
        new_cycle += 1
    df.loc[i,'new_cycle']= new_cycle
df['want_cycle'] = [1,1,1,1,2,2,2,2,3,3,3,3]
print(length)
print(df)
I am new to programming so don't know much about it.
I have a dataset like this:-
Type Value
A 40
A 70
A 125
A 150
B 50
B 80
B 130
B 150
And I want in this format:
Type <60 >60 >90 >120
A 1 3 2 2
B 1 3 2 2
Basically, count and categorize the values.
def delay_tag(list_name):
    empty_list = []
    for i in range(0, len(airline)):
        if list_name[i] < 60:
            empty_list.append('<60')
        elif (list_name[i] > 60):
            empty_list.append('>60')
        elif (list_name[i] >= 120):
            empty_list.append('>120')
        else:
            empty_list.append('>= 180')
    return(empty_list)
This is what I tried.
This may give you an idea.
import pandas as pd
df = pd.DataFrame({
'Type':['A','A','A','A','B','B','B','B'],
'Value':[40,70,125,150,50,80,130,150]
})
df_lt60 = df[df['Value']<60]
print(df_lt60.groupby('Type').Value.nunique())
df_gt60 = df[df['Value']>=60]
print(df_gt60.groupby('Type').Value.nunique())
import pandas as pd
df = pd.read_csv('your_file.csv')
fun = lambda x:{'<60':x.lt(60).sum(),'>60':x.gt(60).sum(),'>90':x.gt(90).sum(),'>120':x.gt(120).sum()}
pd.DataFrame(df.groupby('Type').Value.apply(fun)).reset_index().pivot(index='Type', columns='level_1', values='Value')
Out[76]:
level_1 <60 >120 >60 >90
Type
A 1 2 3 2
B 1 2 3 2
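Because the requested buckets overlap (a value of 150 counts toward >60, >90 and >120), each column is just a count of rows passing one threshold. A minimal sketch that builds one boolean column per threshold and sums them per Type is below; the names flags and counts are made up for illustration, and the data is typed in from the question instead of being read from a file.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Value": [40, 70, 125, 150, 50, 80, 130, 150],
})

# One boolean column per threshold; dict order fixes the column order.
flags = pd.DataFrame({
    "<60": df["Value"].lt(60),
    ">60": df["Value"].gt(60),
    ">90": df["Value"].gt(90),
    ">120": df["Value"].gt(120),
})

# Summing booleans per group counts the rows that pass each threshold.
counts = flags.groupby(df["Type"]).sum()
print(counts)
```

This keeps the columns in the order they were declared, so no pivot or re-sorting step is needed.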
I want to know if there is any faster way to do the following loop? Maybe use apply or rolling apply function to realize this
Basically, I need to access previous row's value to determine current cell value.
df.ix[0] = (np.abs(df.ix[0]) >= So) * np.sign(df.ix[0])
for i in range(1, len(df)):
    for col in list(df.columns.values):
        if ((df[col].ix[i] > 1.25) & (df[col].ix[i-1] == 0)) | :
            df[col].ix[i] = 1
        elif ((df[col].ix[i] < -1.25) & (df[col].ix[i-1] == 0)):
            df[col].ix[i] = -1
        elif ((df[col].ix[i] <= -0.75) & (df[col].ix[i-1] < 0)) | ((df[col].ix[i] >= 0.5) & (df[col].ix[i-1] > 0)):
            df[col].ix[i] = df[col].ix[i-1]
        else:
            df[col].ix[i] = 0
As you can see, the loop updates the dataframe as it goes, and each row needs the already-updated value of the previous row, so using shift will not work.
For example:
Input:
A B C
1.3 -1.5 0.7
1.1 -1.4 0.6
1.0 -1.3 0.5
0.4 1.4 0.4
Output:
A B C
1 -1 0
1 -1 0
1 -1 0
0 1 0
You can use the .shift() function to access previous or next values.
Previous value for the col column:
df['col'].shift()
Next value for the col column:
df['col'].shift(-1)
Example:
In [38]: df
Out[38]:
a b c
0 1 0 5
1 9 9 2
2 2 2 8
3 6 3 0
4 6 1 7
In [39]: df['prev_a'] = df['a'].shift()
In [40]: df
Out[40]:
a b c prev_a
0 1 0 5 NaN
1 9 9 2 1.0
2 2 2 8 9.0
3 6 3 0 2.0
4 6 1 7 6.0
In [43]: df['next_a'] = df['a'].shift(-1)
In [44]: df
Out[44]:
a b c prev_a next_a
0 1 0 5 NaN 9.0
1 9 9 2 1.0 2.0
2 2 2 8 9.0 6.0
3 6 3 0 2.0 6.0
4 6 1 7 6.0 NaN
I am surprised there isn't a native pandas solution to this as well, because shift and rolling do not get it done. I have devised a way to do this using the standard pandas syntax but I am not sure if it performs any better than your loop... My purposes just required this for consistency (not speed).
import pandas as pd
df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
new_col = 'c'
def apply_func_decorator(func):
    prev_row = {}  # remembers the previous row, including its computed value
    def wrapper(curr_row, **kwargs):
        val = func(curr_row, prev_row)
        prev_row.update(curr_row)
        prev_row[new_col] = val
        return val
    return wrapper

@apply_func_decorator
def running_total(curr_row, prev_row):
    return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)

df[new_col] = df.apply(running_total, axis=1)
print(df)
# Output will be:
# a b c
# 0 0 0 0
# 1 1 10 11
# 2 2 20 33
Disclaimer: I used pandas 0.16 but with only slight modification this will work for the latest versions too.
Others had similar questions and I posted this solution on those as well:
Reference previous row when iterating through dataframe
Reference values in the previous row with map or apply
@maxU has it right with shift. I think you can even compare dataframes directly, something like this:
df_prev = df.shift(1)  # previous row's values
df_out = pd.DataFrame(index=df.index,columns=df.columns)
df_out[(df>1.25) & (df_prev == 0)] = 1
df_out[(df<-1.25) & (df_prev == 0)] = -1
df_out[(df<-.75) & (df_prev <0)] = df_prev
df_out[(df>.5) & (df_prev >0)] = df_prev
The syntax may be off, but if you provide some test data I think this could work.
It saves you having to loop at all.
EDIT - Update based on comment below
I would try my absolute best not to loop through the DF itself. You're better off going column by column, sending to a list and doing the updating, then just importing back again. Something like this:
# .ix has been removed from pandas; .iloc selects the first row by position
df.iloc[0] = (np.abs(df.iloc[0]) >= 1.25) * np.sign(df.iloc[0])
for col in df.columns.tolist():
    currData = df[col].tolist()
    for currRow in range(1,len(currData)):
        if currData[currRow]> 1.25 and currData[currRow-1]== 0:
            currData[currRow] = 1
        elif currData[currRow] < -1.25 and currData[currRow-1]== 0:
            currData[currRow] = -1
        elif currData[currRow] <=-.75 and currData[currRow-1]< 0:
            currData[currRow] = currData[currRow-1]
        elif currData[currRow]>= .5 and currData[currRow-1]> 0:
            currData[currRow] = currData[currRow-1]
        else:
            currData[currRow] = 0
    df[col] = currData
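For reference, here is a self-contained sketch of the same column-by-column idea run on the sample input from the question. The function name latch_column and its parameter names are made up for illustration. It implements only the conditions that are visible in the question; the first condition there is cut off, so the last value of column B comes out as 0 here and not the 1 shown in the question's expected output.

```python
import numpy as np
import pandas as pd

def latch_column(values, enter=1.25, hold_neg=-0.75, hold_pos=0.5):
    """Hypothetical helper: walk one column, each state depending on the previous state."""
    # First row: sign of the value if it is large enough, otherwise 0.
    out = [float(np.sign(values[0])) if abs(values[0]) >= enter else 0.0]
    for x in values[1:]:
        prev = out[-1]
        if x > enter and prev == 0:
            state = 1.0
        elif x < -enter and prev == 0:
            state = -1.0
        elif (x <= hold_neg and prev < 0) or (x >= hold_pos and prev > 0):
            state = prev  # stay in the current state
        else:
            state = 0.0
        out.append(state)
    return out

df = pd.DataFrame({
    "A": [1.3, 1.1, 1.0, 0.4],
    "B": [-1.5, -1.4, -1.3, 1.4],
    "C": [0.7, 0.6, 0.5, 0.4],
})

# Process each column as a plain Python list, then rebuild the frame.
result = pd.DataFrame(
    {col: latch_column(df[col].tolist()) for col in df.columns},
    index=df.index,
)
print(result)
```

Working on plain lists avoids the per-cell indexing overhead of the dataframe while still letting each step read the value written in the previous step.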