Better way to create a new column instead of a for loop

Is there a faster way to write this code?
I just want to calculate t_last - t_i for every row and store the result in a new column:
time_ges = pd.DataFrame()
for i in range(0, len(df.GesamteMessung_Sec.index), 1):
    time = df.GesamteMessung_Sec.iloc[-1] - df.GesamteMessung_Sec.iloc[i]
    time_ges = time_ges.append(pd.DataFrame({'echte_Ladezeit': time}, index=[0]), ignore_index=True)
df['echte_Ladezeit'] = time_ges
This code takes a long time to run. Is there a better way to do this?
Thanks, R

You can subtract the whole column GesamteMessung_Sec from its last value in one vectorized operation, then use to_frame to convert the resulting Series to a DataFrame:
df = pd.DataFrame({'GesamteMessung_Sec': [10, 2, 1, 5]})
print(df)
   GesamteMessung_Sec
0                  10
1                   2
2                   1
3                   5
time_ges = (df.GesamteMessung_Sec.iloc[-1] - df.GesamteMessung_Sec).to_frame('echte_Ladezeit')
print(time_ges)
   echte_Ladezeit
0              -5
1               3
2               4
3               0
If you need the result as a new column of the original DataFrame:
df = pd.DataFrame({'GesamteMessung_Sec': [10, 2, 1, 5]})
df['echte_Ladezeit'] = df.GesamteMessung_Sec.iloc[-1] - df.GesamteMessung_Sec
print(df)
   GesamteMessung_Sec  echte_Ladezeit
0                  10              -5
1                   2               3
2                   1               4
3                   5               0
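As a quick sanity check, the sketch below confirms that the vectorized subtraction gives the same values as a row-by-row loop. The loop collects its results in a plain list rather than calling DataFrame.append, which was removed in pandas 2.0. The names col, looped and vectorized are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({"GesamteMessung_Sec": [10, 2, 1, 5]})
col = df["GesamteMessung_Sec"]

# Row-by-row version of t_last - t_i, collected in a plain Python list.
looped = [col.iloc[-1] - col.iloc[i] for i in range(len(col))]

# Vectorized version: the scalar col.iloc[-1] is broadcast against every row.
vectorized = col.iloc[-1] - col

# Both approaches produce the same values; only the vectorized one scales well.
assert vectorized.tolist() == looped

df["echte_Ladezeit"] = vectorized
print(df)
```

On a large frame the loop pays Python-level overhead for every row, while the vectorized expression is a single array operation.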

Related

Pandas dataframe call for a previous row from inside a loop gives me wrong(?) values

I'm trying to do a recursive calculation that requires combining row i with row i-1.
I don't know why, but I can't make it work with iloc[i-1] or .shift().
I define a function:
import numpy as np
import pandas as pd

def myfunction(df):
    # get the length of the table for the loop
    code_len = df.shape[0]
    print(df)
    # create the new output df
    dnl = df
    # this loop takes the i-th row and subtracts the (i-1)-th row from it
    for i in range(1, code_len):
        print(i)
        dnl.iloc[i] = (df.iloc[i] - df.iloc[i-1])
    # drop the first row because there is no subtraction there
    dnl = dnl.drop(0)
    print(dnl)
    return dnl
When I call this function:
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=['0', '1', '2', '3'])
something = myfunction(df)
the first print of the input df shows:
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
Each row differs by 4 from the one before it.
Inside the loop I try to subtract row i-1 from row i, but it doesn't give me the desired result:
   0  1  2  3
1  4  4  4  4
2  4  5  6  7
3  8  8  8  8
I also tried with .shift():
def myfunction(df):
    code_len = df.shape[0]
    print(df.tail())
    dnl = df
    for i in range(1, code_len):
        print(i)
        dnl.iloc[i] = (df.iloc[i] - df.shift().iloc[i])
    dnl = dnl.drop(0)
    print(dnl)
    return dnl
but I get the same result.
Isn't i-1 supposed to work here? What am I missing?
I expect to get:
   0  1  2  3
1  4  4  4  4
2  4  4  4  4
3  4  4  4  4
Thanks

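This happens because dnl = df does not make a copy: both names refer to the same DataFrame, so every change made through dnl is also visible through df. By the time the loop reaches row i, row i-1 has already been overwritten with a difference. Use copy() to get an independent DataFrame: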
dnl = df.copy()
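A minimal sketch of the fix, using the 4x4 frame from the question. The function name row_differences is made up for this example. The second half uses DataFrame.diff(), a different technique from the loop in the question, which computes the same row-to-row differences without any loop.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=["0", "1", "2", "3"])

def row_differences(frame):
    """Subtract row i-1 from row i, writing into an independent copy."""
    out = frame.copy()  # independent object, so `frame` is never modified
    for i in range(1, len(frame)):
        out.iloc[i] = frame.iloc[i] - frame.iloc[i - 1]
    # The first row has no predecessor, so drop it.
    return out.drop(frame.index[0])

looped = row_differences(df)

# Loop-free alternative: diff() subtracts the previous row from each row.
# The first row becomes NaN, so it is sliced off before casting back to int.
vectorized = df.diff().iloc[1:].astype(int)

print(looped)
print(vectorized)
```

Because the loop reads from frame and writes to out, earlier iterations can no longer affect later ones.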

Apply function to dataframe row which references previous row's data

I'm trying to set column 'b' of my DataFrame based on its previous value from the row above. Is there any way to do this without iterating through the rows or using decorators with the apply function?
Pseudocode:
if row != 0:
    curr_row['b'] = prev_row['b'] + curr_row['a']
else:
    curr_row['b'] = curr_row['a']
Here's what I've tried:
df = pd.DataFrame({'a': [1,2,3,4,5],
                   'b': [0,0,0,0,0]})
df.b = df.apply(lambda row: row.a if row.name < 1 else (df.iloc[row.name-1].b + row.a), axis=1)
Output:
   a  b
0  1  1
1  2  2
2  3  3
3  4  4
4  5  5
Desired output:
   a   b
0  1   1
1  2   3
2  3   6
3  4  10
4  5  15
If I run the apply function a second time on the new df, one more row of column b is correct:
   a  b
0  1  1
1  2  3
2  3  5
3  4  7
4  5  9
This pattern continues if I continue to re-run the apply function until the output is correct.
I'm guessing the issue has something to do with the mechanics of how the apply function works which makes it break when you use a value from the same column you are 'applying' on. That or I'm just being an idiot somehow (very plausible). Can someone explain this?
Do I have to use decorators to store the previous row or is there a cleaner way to do this?

What you need is cumsum(), the cumulative sum of column a:
df = pd.DataFrame({'a': [1,2,3,4,5],
                   'b': [0,0,0,0,0]})
df = df.assign(b=df.a.cumsum())
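As for why the apply attempt falls short: the right-hand side df.apply(...) is evaluated completely before the result is assigned to df.b, so every lookup of the previous row's b still sees the old column (all zeros on the first run). Each rerun therefore fixes only one more row. The sketch below writes the intended recurrence as an explicit loop and checks that cumsum() gives the same values. The names running and total are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [0, 0, 0, 0, 0]})

# The recurrence b[i] = b[i-1] + a[i] (with b[0] = a[0]) as an explicit loop.
running = []
total = 0
for value in df["a"]:
    total += value
    running.append(total)

# cumsum() computes the same running total in one vectorized call.
df["b"] = df["a"].cumsum()

assert df["b"].tolist() == running
print(df)
```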

How to generate a sequence of numbers when encountered a value in python pandas dataframe

[Image: sample and expected data]
The first block is the current data and the second block is the expected data: once I encounter a 1, each following row should be incremented by one, and the same should happen again for the next country (b).

First replace every value after the first 1 in each group with 1, so that GroupBy.cumsum can be used:
df = pd.DataFrame({'c': ['a']*3 + ['b']*3 + ['c']*3, 'v': [1,0,0,0,1,0,0,0,1]})
s = df.groupby('c')['v'].cumsum()
df['new'] = s.where(s.eq(0), 1).groupby(df['c']).cumsum()
print(df)
   c  v  new
0  a  1    1
1  a  0    2
2  a  0    3
3  b  0    0
4  b  1    1
5  b  0    2
6  c  0    0
7  c  0    0
8  c  1    1
Another solution is to replace all values other than 1 with missing values and forward fill the 1 within each group; the leading missing values that remain are then replaced with 0, so the cumulative sum per group gives the same result:
s = df['v'].where(df['v'].eq(1)).groupby(df['c']).ffill().fillna(0).astype(int)
df['new'] = s.groupby(df['c']).cumsum()
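To make the intermediate steps visible, the sketch below stores each stage of the first solution in its own column. It also tries a third variant based on GroupBy.cummax, which is not from the answer above and assumes v contains only 0 and 1. The column names seen, flag, new and new_cummax are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"c": ["a"] * 3 + ["b"] * 3 + ["c"] * 3,
                   "v": [1, 0, 0, 0, 1, 0, 0, 0, 1]})

# Step 1: running sum per group; stays 0 until the first 1 appears.
df["seen"] = df.groupby("c")["v"].cumsum()

# Step 2: keep the zeros, turn every later value into 1 (a "started" flag).
df["flag"] = df["seen"].where(df["seen"].eq(0), 1)

# Step 3: summing the flag per group yields 1, 2, 3, ... from the first 1 on.
df["new"] = df["flag"].groupby(df["c"]).cumsum()

# Variant (assumes v holds only 0 and 1): cummax gives the same flag directly.
flag_cummax = df.groupby("c")["v"].cummax()
df["new_cummax"] = flag_cummax.groupby(df["c"]).cumsum()

print(df)
```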

Python - Pandas dataframe - iteration through a column

I have a pandas DataFrame and I would like to add an empty column (named nb_trade), then fill this new column in increments of 5, so that I get a column with the values 5, 10, 15, 20, ...
The code below assigns the same value (the one from the last iteration of i) to the whole column, which is not what I want:
big_df["nb_trade"] = '0'
for i in range(big_df.shape[0]):
    big_df['nb_trade'] = 5*(i+1)
Can someone help me?

Use range or np.arange:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1,2,3]})
print(df)
   a
0  1
1  2
2  3
df['new'] = range(5, len(df.index) * 5 + 5, 5)
print(df)
   a  new
0  1    5
1  2   10
2  3   15
df['new'] = np.arange(5, df.shape[0] * 5 + 5, 5)
print(df)
   a  new
0  1    5
1  2   10
2  3   15
Solution by John Galt from the comments:
df['new'] = np.arange(5, 5*(df.shape[0]+1), 5)
print(df)
   a  new
0  1    5
1  2   10
2  3   15
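The loop in the question fails because big_df['nb_trade'] = 5*(i+1) assigns a single scalar to the entire column on every pass, so only the last iteration's value survives. The sketch below reproduces that behavior on a small stand-in frame (the column a and the variable after_loop are made up for the example), then fills the column in one step.

```python
import numpy as np
import pandas as pd

big_df = pd.DataFrame({"a": [1, 2, 3]})  # stand-in for the real data

# Original loop: each pass overwrites the WHOLE column with one scalar.
for i in range(big_df.shape[0]):
    big_df["nb_trade"] = 5 * (i + 1)
after_loop = big_df["nb_trade"].tolist()  # every row holds the last value

# Vectorized fill: one value per row, 5, 10, 15, ...
big_df["nb_trade"] = 5 * np.arange(1, len(big_df) + 1)

print(after_loop)
print(big_df)
```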

Add column to pandas without headers

How does one append a column of constant values to a pandas dataframe without headers? I want to append the column at the end.
With headers I can do it this way:
df['new'] = pd.Series([0 for x in range(len(df.index))], index=df.index)

Every non-empty DataFrame has columns, an index and values; when no headers are given, the columns get default integer labels.
You can create a new column under the next integer label and fill it with a scalar:
df[len(df.columns)] = 0
Sample:
df = pd.DataFrame({0: [1,2,3],
                   1: [4,5,6]})
print(df)
   0  1
0  1  4
1  2  5
2  3  6
df[len(df.columns)] = 0
print(df)
   0  1  2
0  1  4  0
1  2  5  0
2  3  6  0
Also, the simplest way to create a new column with a name is:
df['new'] = 1
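One caveat worth sketching: len(df.columns) is only the next free label when the existing labels are the default 0, 1, ..., n-1. If the frame is a subset of columns or the labels otherwise have gaps, that label may already exist and the assignment would overwrite it. The second half below uses max(columns) + 1 instead, which is an assumption of this example (it requires all labels to be integers) and is not part of the answer above.

```python
import pandas as pd

# Headerless frame: columns get the default labels 0 and 1.
df = pd.DataFrame([[1, 4], [2, 5], [3, 6]])
df[len(df.columns)] = 0  # next free label is 2
labels_default = df.columns.tolist()

# Frame whose integer labels do not start at 0: len(columns) == 2,
# but a column labeled 2 already exists and would be overwritten.
sub = pd.DataFrame({1: [1, 2], 2: [3, 4]})
next_label = max(sub.columns) + 1  # assumes every label is an integer
sub[next_label] = 0

print(df)
print(sub)
```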
