decrement a python pandas column relativly to an other one

decrement a python pandas column relativly to an other one - python

I have this structure with column B holding the number of same occurrence of the value of column A.
df = pd.DataFrame(dict(A=list('aaabbcccc'), B=list('333224444')))
df
# A B
# 0 a 3
# 1 a 3
# 2 a 3
# 3 b 2
# 4 b 2
# 5 c 4
# 6 c 4
# 7 c 4
# 8 c 4
I look for an elegant way to add the C column, that decrement for each line the value of B.
res
# A B C
# 0 a 3 2
# 1 a 3 1
# 2 a 3 0
# 3 b 2 1
# 4 b 2 0
# 5 c 4 3
# 6 c 4 2
# 7 c 4 1
# 8 c 4 0

Use cumcount(ascending=False), as suggested by #ALollz:
df.groupby('B').cumcount(ascending=False)
0 2
1 1
2 0
3 1
4 0
5 3
6 2
7 1
8 0
dtype: int64

Related

Pandas drop duplicate base on 2 columns, having differents value

How to drop duplicate in that specific way:
Index B C
1 2 1
2 2 0
3 3 1
4 3 1
5 4 0
6 4 0
7 4 0
8 5 1
9 5 0
10 5 1
Desired output :
Index B C
3 3 1
5 4 0
So dropping duplicate on B but if C is the same on all row and keep one sample/record.
For example, B = 3 for index 3/4 but since C = 1 for both, I do not destroy them all
But for example B = 5 for index 8/9/10 since C = 1 or 0, it get destroy.

Try this, using transform with nunique and drop_duplicates:
df[df.groupby('B')['C'].transform('nunique') == 1].drop_duplicates(subset='B')
Output:
B C
Index
3 3 1
5 4 0

Pandas take the line value below

There is such a model of real data:
C S E D
1 1 3 0 0
2 1 5 0 0
3 1 6 0 0
4 2 1 0 0
5 2 3 0 0
6 2 7 0 0
С - category, S - start, E - end, D - delta
Using pandas, you need to enter the value of column S with the condition id = id+1 in column E, and the last value of category E is equal to the value from column S of the same row
It turns out:
C S E D
1 1 3 5 0
2 1 5 6 0
3 1 6 6 0
4 2 1 3 0
5 2 3 7 0
6 2 7 7 0
And then subtract S from E and put it in D. This, in principle, is easy. The difficulty is filling in column E
The result is this:
C S E D
1 1 3 5 2
2 1 5 6 1
3 1 6 6 0
4 2 1 3 2
5 2 3 7 4
6 2 7 7 0

Use DataFrameGroupBy.shift with replace last missing values by original with Series.fillna and then only subtract for column D:
df['E'] = df.groupby('C')['S'].shift(-1).fillna(df['S']).astype(int)
df['D'] = df['E'] - df['S']
Or if use DataFrame.assign is necessary use lambda function for use counted values of E column:
df = df.assign(E = df.groupby('C')['S'].shift(-1).fillna(df['S']).astype(int),
D = lambda x: x['E'] - x['S'])
print (df)
C S E D
1 1 3 5 2
2 1 5 6 1
3 1 6 6 0
4 2 1 3 2
5 2 3 7 4
6 2 7 7 0

While Loop Alternative in Python

I am working on a huge dataframe and trying to create a new column, based on a condition in another column. Right now, I have a big while-loop and this calculation takes too much time, is there an easier way to do it?
With lambda for example?:
def promo(dataframe, a):
i=0
while i < len(dataframe)-1:
i=i+1
if dataframe.iloc[i-1,5] >= a:
dataframe.iloc[i-1,6] = 1
else:
dataframe.iloc[i-1,6] = 0
return dataframe

Don't use loops in pandas, they are slow compared to a vectorized solution - convert boolean mask to integers by astype True, False are converted to 1, 0:
dataframe = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':list('aaabbb'),
'F':[5,3,6,9,2,4],
'G':[5,3,6,9,2,4]
})
a = 5
dataframe['new'] = (dataframe.iloc[:,5] >= a).astype(int)
print (dataframe)
A B C D E F G new
0 a 4 7 1 a 5 5 1
1 b 5 8 3 a 3 3 0
2 c 4 9 5 a 6 6 1
3 d 5 4 7 b 9 9 1
4 e 5 2 1 b 2 2 0
5 f 4 3 0 b 4 4 0
If you want to overwrite the 7th column:
a = 5
dataframe.iloc[:,6] = (dataframe.iloc[:,5] >= a).astype(int)
print (dataframe)
A B C D E F G
0 a 4 7 1 a 5 1
1 b 5 8 3 a 3 0
2 c 4 9 5 a 6 1
3 d 5 4 7 b 9 1
4 e 5 2 1 b 2 0
5 f 4 3 0 b 4 0

Pad dataframe discontinuous column

I have the following dataframe:
Name B C D E
1 A 1 2 2 7
2 A 7 1 1 7
3 B 1 1 3 4
4 B 2 1 3 4
5 B 3 1 3 4
What I'm trying to do is to obtain a new dataframe in which, for rows with the same "Name", the elements in the "B" column are continuous, hence in this example for rows with "Name" = A, the dataframe would have to be padded with elements ranging from 1 to 7, and the values for columns C, D, E should be 0.
Name B C D E
1 A 1 2 2 7
2 A 2 0 0 0
3 A 3 0 0 0
4 A 4 0 0 0
5 A 5 0 0 0
6 A 6 0 0 0
7 A 7 0 0 0
8 B 1 1 3 4
9 B 2 1 5 4
10 B 3 4 3 6
What I've done so far is to turn the B column values for the same "Name" into continuous values:
new_idx = df_.groupby('Name').apply(lambda x: np.arange(x.index.min(), x.index.max() + 1)).apply(pd.Series).stack()
and reindexing the original (having set B as the index) df using this new Series, but I'm having trouble reindexing using duplicates. Any help would be appreciated.

You can use:
def f(x):
a = np.arange(x.index.min(), x.index.max() + 1)
x = x.reindex(a, fill_value=0)
return (x)
new_idx = (df.set_index('B')
.groupby('Name')
.apply(f)
.drop('Name', 1)
.reset_index()
.reindex(columns=df.columns))
print (new_idx)
Name B C D E
0 A 1 2 2 7
1 A 2 0 0 0
2 A 3 0 0 0
3 A 4 0 0 0
4 A 5 0 0 0
5 A 6 0 0 0
6 A 7 1 1 7
7 B 1 1 3 4
8 B 2 1 3 4
9 B 3 1 3 4

How to grouby one column and do nothing to other columns in pandas?

I have a dataframe like this:
a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3
How to groupby 'a', and do nothing to column b c d, and split into several dataframes? Like this:
First groupby column 'a':
a b c d
0 1 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 2 1 1 1
5 2 2 2
6 3 3 3
And then split into different dataframes based on numbers in 'a':
dataframe 1:
a b c d
0 1 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
dataframe 2:
a b c d
0 2 1 1 1
1 2 2 2
2 3 3 3
:
:
:
dataframe n:
a b c d
0 n 1 1 1
1 2 2 2
2 3 3 3

Iterate over each group that df.groupby returns.
for _, g in df.groupby('a'):
print(g, '\n')
a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4
a b c d
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3
If you want a dict of dataframes, I'd recommend:
df_dict = {idx : g for idx, g in df.groupby('a')}
Here, idx is the unique a value.
A couple of nifty techniques courtesy Root:
df_dict = dict(list(df.groupby('a'))) # for a dictionary
And,
idxs, dfs = zip(*df.groupby('a')) # separate lists
idxs
(1, 2)
dfs
( a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4, a b c d
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3)

This is the way by using np.split
idx=df.a.diff().fillna(0).nonzero()[0]
dfs = np.split(df, idx, axis=0)
dfs
Out[210]:
[ a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4, a b c d
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3]
dfs[0]
Out[211]:
a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

decrement a python pandas column relativly to an other one - python

Use cumcount(ascending=False), as suggested by #ALollz: df.groupby('B').cumcount(ascending=False) 0 2 1 1 2 0 3 1 4 0 5 3 6 2 7 1 8 0 dtype: int64

Related

Pandas drop duplicate base on 2 columns, having differents value

Pandas take the line value below

While Loop Alternative in Python

Pad dataframe discontinuous column

How to grouby one column and do nothing to other columns in pandas?

Categories

Resources