If I have a pandas dataframe with 4 columns like this:
A B C D
0 2 4 1 9
1 3 2 9 7
2 1 6 9 2
3 8 6 5 4
is it possible to apply df.cumsum() in some way to get the results in new columns, each next to its existing column, like this:
A AA B BB C CC D DD
0 2 2 4 4 1 1 9 9
1 3 5 2 6 9 10 7 16
2 1 6 6 12 9 19 2 18
3 8 14 6 18 5 24 4 22
You can create new columns using assign:
result = df.assign(**{col*2:df[col].cumsum() for col in df})
and order the columns with sort_index:
result.sort_index(axis=1)
# A AA B BB C CC D DD
# 0 2 2 4 4 1 1 9 9
# 1 3 5 2 6 9 10 7 16
# 2 1 6 6 12 9 19 2 18
# 3 8 14 6 18 5 24 4 22
Note that depending on the column names, sorting may not produce the desired order. In that case, using reindex is a more robust way of ensuring you obtain the desired column order:
result = df.assign(**{col*2:df[col].cumsum() for col in df})
result = result.reindex(columns=[item for col in df for item in (col, col*2)])
Here is an example which demonstrates the difference:
import pandas as pd
df = pd.DataFrame({'A': [2, 3, 1, 8], 'A A': [4, 2, 6, 6], 'C': [1, 9, 9, 5], 'D': [9, 7, 2, 4]})
result = df.assign(**{col*2:df[col].cumsum() for col in df})
print(result.sort_index(axis=1))
# A A A A AA A AA C CC D DD
# 0 2 4 4 2 1 1 9 9
# 1 3 2 6 5 9 10 7 16
# 2 1 6 12 6 9 19 2 18
# 3 8 6 18 14 5 24 4 22
result = result.reindex(columns=[item for col in df for item in (col, col*2)])
print(result)
# A AA A A A AA A C CC D DD
# 0 2 2 4 4 1 1 9 9
# 1 3 5 2 6 9 10 7 16
# 2 1 6 6 12 9 19 2 18
# 3 8 14 6 18 5 24 4 22
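A side note: col*2 doubles the whole name ('A' becomes 'AA', but a longer name like 'Val' would become 'ValVal'). If a fixed suffix reads better, a variant like this (my naming, not from the original) works the same way:
result = df.assign(**{col + '_cum': df[col].cumsum() for col in df})
result = result.reindex(columns=[c for col in df for c in (col, col + '_cum')])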
@unutbu's way certainly works, but using insert reads better to me. Plus, you don't need to worry about sorting/reindexing!
for i, col_name in enumerate(df):
    # place each cumsum column directly after its source column
    df.insert(i * 2 + 1, col_name * 2, df[col_name].cumsum())
df
returns
A AA B BB C CC D DD
0 2 2 4 4 1 1 9 9
1 3 5 2 6 9 10 7 16
2 1 6 6 12 9 19 2 18
3 8 14 6 18 5 24 4 22
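One caveat: insert modifies df in place and raises a ValueError if a column with that name already exists, so running the loop twice fails. Starting from a fresh copy of the original four-column frame keeps it intact (a minimal sketch):
result = df.copy()  # work on a copy so the original frame stays unchanged
for i, col_name in enumerate(df):
    result.insert(i * 2 + 1, col_name * 2, result[col_name].cumsum())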
I would like to replace values in a column, but only the values seen after a specific value.
for example, I have the following dataset:
In [108]: df = pd.DataFrame([[12,13,14,15,16,17],[4,10,5,6,1,3],[1,3,5,4,9,1],[2,4,1,8,3,4],[4,2,6,7,1,8]], index=['ID','time','A','B','C']).T
In [109]: df
Out[109]:
ID time A B C
0 12 4 1 2 4
1 13 10 3 4 2
2 14 5 5 1 6
3 15 6 4 8 7
4 16 1 9 3 1
5 17 3 1 4 8
and I want to change, for column "A", all the values that come after a 5 to a 1; for column "B", all the values that come after a 1 to a 6; and for column "C", all the values that come after a 7 to a 5, so it will look like this:
ID time A B C
0 12 4 1 2 4
1 13 10 3 4 2
2 14 5 5 1 6
3 15 6 1 6 7
4 16 1 1 6 5
5 17 3 1 6 5
I know that I could use where to get a similar effect, e.g. a condition like df["A"] = np.where(x!=5,1,x), but obviously this changes the values before the 5 as well. I can't think of anything else at the moment.
Thanks for the help.
Use DataFrame.mask with values shifted by DataFrame.shift and compared against the dictionary; DataFrame.cummax then keeps the mask True for every row after the first match:
import pandas as pd

df = pd.DataFrame([[12, 13, 14, 15, 16, 17], [4, 10, 5, 6, 1, 3],
                   [1, 3, 5, 4, 9, 1], [2, 4, 1, 8, 3, 4], [4, 2, 6, 7, 1, 8]],
                  index=['ID', 'time', 'A', 'B', 'C']).T
after = {'A':5, 'B':1, 'C': 7}
new = {'A':1, 'B':6, 'C': 5}
cols = list(after.keys())
s = pd.Series(new)
df[cols] = df[cols].mask(df[cols].shift().eq(after).cummax(), s, axis=1)
print(df)
ID time A B C
0 12 4 1 2 4
1 13 10 3 4 2
2 14 5 5 1 6
3 15 6 1 6 7
4 16 1 1 6 5
5 17 3 1 6 5
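To see what drives the replacement, it helps to print the intermediate mask on the original frame (before it is overwritten): shift pushes each value down one row, eq flags the rows right after the trigger values, and cummax keeps those flags True from there on. For this frame I'd expect:
print(df[cols].shift().eq(after).cummax())
#        A      B      C
# 0  False  False  False
# 1  False  False  False
# 2  False  False  False
# 3   True   True  False
# 4   True   True   True
# 5   True   True   True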
I was wondering if there was a way (possibly using Pandas) to cut a dataset (table below) into groups when the value of B transitions either above or below a set value:
A   B
1   10
2   15
3   12
4   2
5   5
6   3
7   4
8   2
9   14
10  11
For instance if the transition value was 6 then they would be grouped like:
A   B   Group
1   10  A
2   15  A
3   12  A
4   2   B
5   5   B
6   3   B
7   4   B
8   2   B
9   14  C
10  11  C
It's important that there are distinct groups, not just everything above/below 6 ending up in a single group.
Try:
s = df['B'].gt(6)
df['Group'] = s.ne(s.shift()).cumsum()
>>> df
A B Group
0 1 10 1
1 2 15 1
2 3 12 1
3 4 2 2
4 5 5 2
5 6 3 2
6 7 4 2
7 8 2 2
8 9 14 3
9 10 11 3
If you want letters: df['Group'].add(64).apply(chr)
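Breaking that down: s flags rows above the threshold, s.ne(s.shift()) marks the rows where the flag changes (the first row compares against NaN, so it always starts a group), and cumsum turns each change into a new group number:
s = df['B'].gt(6)
print(s.ne(s.shift()).tolist())
# [True, False, False, True, False, False, False, False, True, False]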
Here is my suggestion. Not the shortest, but it works:
import itertools

# 1 where B is above the threshold, 0 otherwise
l = [(i > 6) * 1 for i in df.B.to_list()]
# collapse consecutive equal flags into runs
m = [(i, list(k)) for i, k in itertools.groupby(l)]
vals = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
res = []
for i in range(len(m)):
    # one letter per run, repeated for the run's length
    res.extend([vals[i]] * len(m[i][1]))
df['Group'] = res
>>> print(df)
A B Group
0 1 10 A
1 2 15 A
2 3 12 A
3 4 2 B
4 5 5 B
5 6 3 B
6 7 4 B
7 8 2 B
8 9 14 C
9 10 11 C
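If more than seven groups can occur, the hardcoded vals list runs out; string.ascii_uppercase yields the same letters without spelling them out (a small tweak, not part of the original answer):
import string
vals = list(string.ascii_uppercase)  # 'A' through 'Z'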
I have a dataframe generated by pandas, as follows:
NO CODE
1 a
2 a
3 a
4 a
5 a
6 a
7 b
8 b
9 a
10 a
11 a
12 a
13 b
14 a
15 a
16 a
I want to derive the NUM column from the CODE column. The rule: within each consecutive run of 'a' values, NUM counts up from 1, while 'b' rows keep their code:
NO CODE NUM
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b b
8 b b
9 a 1
10 a 2
11 a 3
12 a 4
13 b b
14 a 1
15 a 2
16 a 3
Thank you!
Try:
import numpy as np

a_group = df.CODE.eq('a')
df['NUM'] = np.where(a_group,
                     df.groupby(a_group.ne(a_group.shift()).cumsum())
                       .CODE.cumcount() + 1,
                     df.CODE)
on
df = pd.DataFrame({'CODE':list('baaaaaabbaaaabbaa')})
yields
CODE NUM
-- ------ -----
0 b b
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b b
8 b b
9 a 1
10 a 2
11 a 3
12 a 4
13 b b
14 b b
15 a 1
16 a 2
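For intuition, the grouper built inside the groupby assigns one id per consecutive run of equal CODE values; on the 'baaaaaabbaaaabbaa' frame I'd expect:
a_group = df.CODE.eq('a')
print(a_group.ne(a_group.shift()).cumsum().tolist())
# [1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6]
cumcount then numbers the rows within each run starting at 0, and the +1 makes the count 1-based.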
IIUC
s = df.CODE.eq('b').cumsum()
df['NUM'] = df.CODE.where(df.CODE.eq('b'),
                          s[~df.CODE.eq('b')].groupby(s).cumcount() + 1)
df
Out[514]:
NO CODE NUM
0 1 a 1
1 2 a 2
2 3 a 3
3 4 a 4
4 5 a 5
5 6 a 6
6 7 b b
7 8 b b
8 9 a 1
9 10 a 2
10 11 a 3
11 12 a 4
12 13 b b
13 14 a 1
14 15 a 2
15 16 a 3
I want to split a dataframe into unevenly sized chunks using row indices.
The code below:
groups = df.groupby((np.arange(len(df.index))/l[1]).astype(int))
works only for chunks of uniform size.
df
a b c
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
l = [2, 5, 7]
df1
1 1 1
2 2 2
df2
3 3 3
4 4 4
5 5 5
df3
6 6 6
7 7 7
df4
8 8 8
You could use a list comprehension, after a small modification to your list l:
print(df)
a b c
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
5 6 6 6
6 7 7 7
7 8 8 8
l = [2,5,7]
l_mod = [0] + l + [max(l)+1]
list_of_dfs = [df.iloc[l_mod[n]:l_mod[n+1]] for n in range(len(l_mod)-1)]
Output:
list_of_dfs[0]
a b c
0 1 1 1
1 2 2 2
list_of_dfs[1]
a b c
2 3 3 3
3 4 4 4
4 5 5 5
list_of_dfs[2]
a b c
5 6 6 6
6 7 7 7
list_of_dfs[3]
a b c
7 8 8 8
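One thing to watch: max(l)+1 equals len(df) only because the last split index happens to be the second-to-last row. Building the final boundary from len(df) is more robust for arbitrary index lists (my tweak, not part of the original answer):
l_mod = [0] + l + [len(df)]  # always covers the tail rows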
I think this is what you need:
df = pd.DataFrame({'a': np.arange(1, 8),
                   'b': np.arange(1, 8),
                   'c': np.arange(1, 8)})
df
a b c
0 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 5 5 5
5 6 6 6
6 7 7 7
last_check = 0
dfs = []
for ind in [2, 5, 7]:
    # slice from the end of the previous chunk up to row ind-1 (inclusive)
    dfs.append(df.loc[last_check:ind-1])
    last_check = ind
Although a list comprehension is generally more concise than a for loop, the last_check variable is needed when there is no pattern in your list of indices.
dfs[0]
a b c
0 1 1 1
1 2 2 2
dfs[2]
a b c
5 6 6 6
6 7 7 7
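If the frame has rows past the last index (like the 8-row frame in the question), one extra append after the loop picks them up:
dfs.append(df.loc[last_check:])  # any rows after the last split index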
I think this is what you are looking for.
l = [2, 5, 7]
dfs = []
prev = 0
for val in l:
    # slice from the previous boundary up to (but not including) val
    dfs.append(df.iloc[prev:val])
    prev = val
Output:
a b c
0 1 1 1
1 2 2 2
a b c
2 3 3 3
3 4 4 4
4 5 5 5
a b c
5 6 6 6
6 7 7 7
Another Solution:
l = [2, 5, 7]
t = np.arange(l[-1])
l.reverse()
for val in l:
    # overwrite the first val entries with that boundary as the label
    t[:val] = val
temp = pd.DataFrame(t)
temp = pd.concat([df, temp], axis=1)
for u, v in temp.groupby(0):
    print(v)
Output:
a b c 0
0 1 1 1 2
1 2 2 2 2
a b c 0
2 3 3 3 5
3 4 4 4 5
4 5 5 5 5
a b c 0
5 6 6 6 7
6 7 7 7 7
You can create an array to use for indexing via NumPy:
import pandas as pd, numpy as np
df = pd.DataFrame(np.arange(24).reshape((8, 3)), columns=list('abc'))
L = [2, 5, 7]
# flag the split positions; cumsum turns the flags into a chunk id per row
idx = np.cumsum(np.in1d(np.arange(len(df.index)), L))
for _, chunk in df.groupby(idx):
    print(chunk, '\n')
a b c
0 0 1 2
1 3 4 5
a b c
2 6 7 8
3 9 10 11
4 12 13 14
a b c
5 15 16 17
6 18 19 20
a b c
7 21 22 23
Instead of defining a new variable for each dataframe, you can use a dictionary:
d = dict(tuple(df.groupby(idx)))
print(d[1]) # print second groupby value
a b c
2 6 7 8
3 9 10 11
4 12 13 14
I could not figure out how to solve the following problem.
consider the following data set:
df = pd.DataFrame(data=np.array([['a', 1, 2, 3], ['a', 4, 5, 6],
                                 ['b', 7, 8, 9], ['b', 10, 11, 12]]),
                  columns=['id', 'A', 'B', 'C'])
id A B C
a 1 2 3
a 4 5 6
b 7 8 9
b 10 11 12
I need to group the data by id and, for each group, append the group's first row as additional columns, like the following dataset:
id A B C A B C
a 1 2 3 1 2 3
a 4 5 6 1 2 3
b 7 8 9 7 8 9
b 10 11 12 7 8 9
I would really appreciate your help.
I tried the following, but could not generalize it:
df1 = df.loc[0:0, 'A':'C']
df3 = pd.concat([df, df1], axis=1)
Use groupby + transform('first'), and then concatenate df with the result:
v = df.groupby('id').transform('first')
pd.concat([df, v], axis=1)
id A B C A B C
0 a 1 2 3 1 2 3
1 a 4 5 6 1 2 3
2 b 7 8 9 7 8 9
3 b 10 11 12 7 8 9
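Since the concat leaves duplicated column names (A B C A B C), it can help to tag the copied block, for example with add_suffix (the _first suffix is just an illustrative choice):
pd.concat([df, v.add_suffix('_first')], axis=1)
#   id   A   B   C A_first B_first C_first
# 0  a   1   2   3       1       2       3
# ...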
cumcount + where + ffill:
v = df.groupby('id').cumcount() == 0
pd.concat([df, df.iloc[:, 1:].where(v).ffill()], axis=1)
Out[57]:
id A B C A B C
0 a 1 2 3 1 2 3
1 a 4 5 6 1 2 3
2 b 7 8 9 7 8 9
3 b 10 11 12 7 8 9
One can also try drop_duplicates and merge.
df_unique = df.drop_duplicates("id")
df.merge(df_unique, on="id", how="left")
id A_x B_x C_x A_y B_y C_y
0 a 1 2 3 1 2 3
1 a 4 5 6 1 2 3
2 b 7 8 9 7 8 9
3 b 10 11 12 7 8 9
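The _x/_y suffixes are merge's defaults; passing suffixes keeps the original names on the left-hand columns (the _first suffix is again just an illustrative choice):
df.merge(df_unique, on="id", how="left", suffixes=("", "_first"))
#   id   A   B   C  A_first  B_first  C_first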