While Loop Alternative in Python

I am working on a huge dataframe and trying to create a new column based on a condition in another column. Right now I have a big while loop, and the calculation takes too much time. Is there an easier way to do it, with a lambda for example?
def promo(dataframe, a):
    i=0
    while i < len(dataframe)-1:
        i=i+1
        if dataframe.iloc[i-1,5] >= a:
            dataframe.iloc[i-1,6] = 1
        else:
            dataframe.iloc[i-1,6] = 0
    return dataframe

Avoid loops in pandas; they are slow compared to a vectorized solution. Compare the column with a to get a boolean mask, then convert the mask to integers with astype(int), so that True and False become 1 and 0:
import pandas as pd
dataframe = pd.DataFrame({'A':list('abcdef'),
                          'B':[4,5,4,5,5,4],
                          'C':[7,8,9,4,2,3],
                          'D':[1,3,5,7,1,0],
                          'E':list('aaabbb'),
                          'F':[5,3,6,9,2,4],
                          'G':[5,3,6,9,2,4]
})
a = 5
# Compare column 5 (F) with a, then cast the boolean mask to 0/1
dataframe['new'] = (dataframe.iloc[:,5] >= a).astype(int)
print(dataframe)
   A  B  C  D  E  F  G  new
0  a  4  7  1  a  5  5    1
1  b  5  8  3  a  3  3    0
2  c  4  9  5  a  6  6    1
3  d  5  4  7  b  9  9    1
4  e  5  2  1  b  2  2    0
5  f  4  3  0  b  4  4    0
If you want to overwrite the 7th column:
a = 5
dataframe.iloc[:,6] = (dataframe.iloc[:,5] >= a).astype(int)
print(dataframe)
   A  B  C  D  E  F  G
0  a  4  7  1  a  5  1
1  b  5  8  3  a  3  0
2  c  4  9  5  a  6  1
3  d  5  4  7  b  9  1
4  e  5  2  1  b  2  0
5  f  4  3  0  b  4  0
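The loop from the question can be wrapped up as a single vectorized function. This is only a minimal sketch, assuming the same column positions as in the question (condition in column 5, result in column 6) and unique column names; the function name promo_vectorized and the sample values are hypothetical.

```python
import pandas as pd


def promo_vectorized(dataframe, a):
    """Return a copy with column 6 set to 1 where column 5 >= a, else 0."""
    out = dataframe.copy()  # leave the caller's frame untouched
    target = out.columns[6]  # label of the 7th column (assumes unique names)
    # One comparison over the whole column instead of a Python-level loop.
    out[target] = (out.iloc[:, 5] >= a).astype(int)
    return out


# Hypothetical sample data with the same layout as the answer's frame.
sample = pd.DataFrame({
    "A": list("abcdef"),
    "B": [4, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": [1, 3, 5, 7, 1, 0],
    "E": list("aaabbb"),
    "F": [5, 3, 6, 9, 2, 4],
    "G": [0, 0, 0, 0, 0, 0],
})
result = promo_vectorized(sample, 5)
print(result["G"].tolist())
```

Note that the while loop in the question runs only len(dataframe)-1 times, so it never reaches the last row; the vectorized version covers every row.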

Related

Pandas take the line value below

Here is a simplified model of the real data:
   C  S  E  D
1  1  3  0  0
2  1  5  0  0
3  1  6  0  0
4  2  1  0  0
5  2  3  0  0
6  2  7  0  0
C - category, S - start, E - end, D - delta
Using pandas, column E has to be filled with the value of column S from the next row (id + 1) within the same category; for the last row of each category, E is equal to the value of column S in the same row.
The intermediate result should be:
   C  S  E  D
1  1  3  5  0
2  1  5  6  0
3  1  6  6  0
4  2  1  3  0
5  2  3  7  0
6  2  7  7  0
And then subtract S from E and put it in D. This, in principle, is easy. The difficulty is filling in column E
The result is this:
   C  S  E  D
1  1  3  5  2
2  1  5  6  1
3  1  6  6  0
4  2  1  3  2
5  2  3  7  4
6  2  7  7  0
Use DataFrameGroupBy.shift to take the next S value within each category, replace the missing value on the last row of each category with the original S by Series.fillna, and then subtract to get column D:
df['E'] = df.groupby('C')['S'].shift(-1).fillna(df['S']).astype(int)
df['D'] = df['E'] - df['S']
Or, if you need DataFrame.assign, use a lambda function for D so that it is computed from the newly assigned E column:
df = df.assign(E = df.groupby('C')['S'].shift(-1).fillna(df['S']).astype(int),
               D = lambda x: x['E'] - x['S'])
print(df)
   C  S  E  D
1  1  3  5  2
2  1  5  6  1
3  1  6  6  0
4  2  1  3  2
5  2  3  7  4
6  2  7  7  0
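To see why the fillna step is needed, it can help to look at the intermediate result of shift(-1). The following is a small sketch that rebuilds the sample data from the question (the variable name next_s is only illustrative):

```python
import pandas as pd

# Sample data from the question, indexed 1..6.
df = pd.DataFrame(
    {"C": [1, 1, 1, 2, 2, 2], "S": [3, 5, 6, 1, 3, 7]},
    index=range(1, 7),
)

# Next S within each category; the last row of each category has no
# successor, so shift(-1) leaves NaN there (and the dtype becomes float).
next_s = df.groupby("C")["S"].shift(-1)
print(next_s.tolist())

# Fall back to the row's own S where there is no next row, then restore ints.
df["E"] = next_s.fillna(df["S"]).astype(int)
df["D"] = df["E"] - df["S"]
print(df)
```

Because the shift happens per group, the last row of category 1 does not pick up the first S of category 2.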

Decrement a python pandas column relative to another one

I have this structure, with column B holding the number of occurrences of the corresponding value in column A.
df = pd.DataFrame(dict(A=list('aaabbcccc'), B=list('333224444')))
df
# A B
# 0 a 3
# 1 a 3
# 2 a 3
# 3 b 2
# 4 b 2
# 5 c 4
# 6 c 4
# 7 c 4
# 8 c 4
I am looking for an elegant way to add a column C that counts down from the value of B, decrementing on each row within a group.
res
# A B C
# 0 a 3 2
# 1 a 3 1
# 2 a 3 0
# 3 b 2 1
# 4 b 2 0
# 5 c 4 3
# 6 c 4 2
# 7 c 4 1
# 8 c 4 0
Use cumcount(ascending=False), as suggested by @ALollz:
df.groupby('B').cumcount(ascending=False)
0 2
1 1
2 0
3 1
4 0
5 3
6 2
7 1
8 0
dtype: int64
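The result of cumcount can be assigned straight to the new column. A minimal sketch follows; grouping by A instead of B is an assumption made here, on the reasoning that two different A values with the same count would otherwise be merged into one group.

```python
import pandas as pd

df = pd.DataFrame(dict(A=list("aaabbcccc"), B=list("333224444")))

# Number the rows of each group from the end: the last row gets 0.
# Grouping by A is an assumption of this sketch; the answer groups by B,
# which gives the same result for this particular data.
df["C"] = df.groupby("A").cumcount(ascending=False)
print(df)
```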

How do I multiply a pandas column with a part of a multi index dataframe

I have a data frame with a multi index and one column.
The index levels are type and amount; the column is called count.
I would like to add a column that multiplies amount and count:
df2 = df.groupby(['type','amount']).count().copy()
# I then dropped all columns but one and renamed it to "count"
df2['total_amount'] = df2['count'].multiply(df2['amount'], axis='index')
This doesn't work; I get a KeyError on amount.
How do I access a part of the multi index to use it in calculations?
Use GroupBy.transform, which returns a Series of aggregated values with the same size as the original df, so the two can be multiplied directly:
count = df.groupby(['type','amount'])['type'].transform('count')
df['total_amount'] = df['amount'].multiply(count, axis='index')
print(df)
   A  amount  C  D  E type  total_amount
0  a       4  7  1  5    a             8
1  b       5  8  3  3    a             5
2  c       4  9  5  6    a             8
3  d       5  4  7  9    b            10
4  e       5  2  1  2    b            10
5  f       4  3  0  4    b             4
Or, starting from this sample data, aggregate first and then multiply by the amount level of the index:
df = pd.DataFrame({'A':list('abcdef'),
                   'amount':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'type':list('aaabbb')})
print(df)
   A  amount  C  D  E type
0  a       4  7  1  5    a
1  b       5  8  3  3    a
2  c       4  9  5  6    a
3  d       5  4  7  9    b
4  e       5  2  1  2    b
5  f       4  3  0  4    b
df2 = df.groupby(['type','amount'])['type'].count().to_frame('count')
df2['total_amount'] = df2['count'].mul(df2.index.get_level_values('amount'))
print(df2)
             count  total_amount
type amount
a    4           2             8
     5           1             5
b    4           1             4
     5           2            10
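The KeyError in the question arises because amount is an index level of df2, not a column. Besides get_level_values, another option is to move the index levels back into columns with reset_index. The sketch below is only an illustration on a reduced version of the sample data; the variable name flat is hypothetical.

```python
import pandas as pd

# Reduced version of the sample data: only the two grouping columns.
df = pd.DataFrame({
    "amount": [4, 5, 4, 5, 5, 4],
    "type": list("aaabbb"),
})

# size() counts the rows in each (type, amount) group.
df2 = df.groupby(["type", "amount"]).size().to_frame("count")

# reset_index turns the index levels into ordinary columns,
# so amount can be used like any other column.
flat = df2.reset_index()
flat["total_amount"] = flat["count"] * flat["amount"]
print(flat)
```

This gives up the multi index in exchange for plain column access; get_level_values keeps the index as it is.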

Pad dataframe discontinuous column

I have the following dataframe:
  Name  B  C  D  E
1    A  1  2  2  7
2    A  7  1  1  7
3    B  1  1  3  4
4    B  2  1  3  4
5    B  3  1  3  4
What I'm trying to do is to obtain a new dataframe in which, for rows with the same "Name", the elements in the "B" column are continuous, hence in this example for rows with "Name" = A, the dataframe would have to be padded with elements ranging from 1 to 7, and the values for columns C, D, E should be 0.
   Name  B  C  D  E
1     A  1  2  2  7
2     A  2  0  0  0
3     A  3  0  0  0
4     A  4  0  0  0
5     A  5  0  0  0
6     A  6  0  0  0
7     A  7  0  0  0
8     B  1  1  3  4
9     B  2  1  5  4
10    B  3  4  3  6
What I've done so far is to turn the B column values for the same "Name" into continuous values:
new_idx = df_.groupby('Name').apply(lambda x: np.arange(x.index.min(), x.index.max() + 1)).apply(pd.Series).stack()
and reindexing the original (having set B as the index) df using this new Series, but I'm having trouble reindexing using duplicates. Any help would be appreciated.
You can use:
def f(x):
    # Full range of B values between this group's min and max
    a = np.arange(x.index.min(), x.index.max() + 1)
    # Rows missing from the group are added and filled with 0
    x = x.reindex(a, fill_value=0)
    return (x)
new_idx = (df.set_index('B')
             .groupby('Name')
             .apply(f)
             .drop('Name', axis=1)
             .reset_index()
             .reindex(columns=df.columns))
print(new_idx)
  Name  B  C  D  E
0    A  1  2  2  7
1    A  2  0  0  0
2    A  3  0  0  0
3    A  4  0  0  0
4    A  5  0  0  0
5    A  6  0  0  0
6    A  7  1  1  7
7    B  1  1  3  4
8    B  2  1  3  4
9    B  3  1  3  4
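The same padding can also be sketched without groupby.apply, by building every (Name, B) pair up front and reindexing once. This is an alternative to the answer above, not the answer's own method, and it assumes the (Name, B) pairs in the input are unique; the names bounds, full_index and padded are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "B", "B", "B"],
    "B": [1, 7, 1, 2, 3],
    "C": [2, 1, 1, 1, 1],
    "D": [2, 1, 3, 3, 3],
    "E": [7, 7, 4, 4, 4],
})

# Smallest and largest B for each Name.
bounds = df.groupby("Name")["B"].agg(["min", "max"])

# Every (Name, B) pair that should exist in the padded result.
full_index = pd.MultiIndex.from_tuples(
    [
        (name, b)
        for name, lo, hi in bounds.itertuples()
        for b in range(int(lo), int(hi) + 1)
    ],
    names=["Name", "B"],
)

# Reindex on the pair; pairs missing from the input get 0 in C, D and E.
padded = (
    df.set_index(["Name", "B"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
print(padded)
```

Reindexing on the (Name, B) pair sidesteps the duplicate-label problem mentioned in the question, since B alone repeats across names but the pair does not.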

Add a name to pandas dataframe index

As the picture shows, how can I add a name to the index of a pandas dataframe? Once added, it should look like this:
You need to set the index name:
df.index.name = 'code'
Or rename_axis:
df = df.rename_axis('code')
Sample:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10,size=(5,5)),columns=list('ABCDE'),index=list('abcde'))
print(df)
   A  B  C  D  E
a  8  8  3  7  7
b  0  4  2  5  2
c  2  2  1  0  8
d  4  0  9  6  2
e  4  1  5  3  4
df.index.name = 'code'
print(df)
      A  B  C  D  E
code
a     8  8  3  7  7
b     0  4  2  5  2
c     2  2  1  0  8
d     4  0  9  6  2
e     4  1  5  3  4
df = df.rename_axis('code')
print(df)
      A  B  C  D  E
code
a     8  8  3  7  7
b     0  4  2  5  2
c     2  2  1  0  8
d     4  0  9  6  2
e     4  1  5  3  4
