I have a DataFrame like this:
id val1 val2
0 A B
1 B B
2 A A
3 A A
And I would like to swap the values, like this:
id val1 val2
0 B A
1 A A
2 B B
3 B B
I need to consider that the df could have other columns that I would like to keep unchanged.
You can use pd.DataFrame.applymap with a dictionary:
d = {'B': 'A', 'A': 'B'}
df = df.applymap(d.get).fillna(df)
print(df)
id val1 val2
0 0 B A
1 1 A A
2 2 B B
3 3 B B
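Note that in pandas 2.1+ applymap is deprecated in favour of DataFrame.map; as a sketch, the same idea on newer versions reads:
df = df.map(d.get).fillna(df)  # pandas >= 2.1 spelling of applymap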
For performance, in particular memory usage, you may wish to use categorical data:
for col in df.columns[1:]:
    df[col] = df[col].astype('category')
    df[col] = df[col].cat.rename_categories(d)
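If you want to verify the savings on your own data, DataFrame.memory_usage with deep=True counts the Python string objects held in object columns, so you can compare the footprint before and after the cast:
print(df.memory_usage(deep=True))  # run before and after converting to category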
Try stacking, mapping, and then unstacking:
df[['val1', 'val2']] = (
    df[['val1', 'val2']].stack().map({'B': 'A', 'A': 'B'}).unstack())
df
id val1 val2
0 0 B A
1 1 A A
2 2 B B
3 3 B B
For a (much) faster solution, use a nested list comprehension.
mapping = {'B': 'A', 'A': 'B'}
df[['val1', 'val2']] = [
    [mapping.get(x, x) for x in row] for row in df[['val1', 'val2']].values]
df
id val1 val2
0 0 B A
1 1 A A
2 2 B B
3 3 B B
You can swap two values efficiently using numpy.where. However, if there are more than two values, this method stops working.
import numpy as np

a = df[['val1', 'val2']].values
df[['val1', 'val2']] = np.where(a=='A', 'B', 'A')
id val1 val2
0 0 B A
1 1 A A
2 2 B B
3 3 B B
To adapt this to keep other values the same, you can use np.select:
c1 = a=='A'
c2 = a=='B'
np.select([c1, c2], ['B', 'A'], a)
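For completeness, a sketch writing the result back (a, c1 and c2 as defined above; anything that is neither 'A' nor 'B' falls through to the default and stays unchanged):
df[['val1', 'val2']] = np.select([c1, c2], ['B', 'A'], default=a)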
Use factorize and roll the corresponding values:
def swaparoo(col):
    i, r = col.factorize()
    return pd.Series(r[(i + 1) % len(r)], col.index)
df[['id']].join(df[['val1', 'val2']].apply(swaparoo))
id val1 val2
0 0 B A
1 1 A A
2 2 B B
3 3 B B
Alternative gymnastics using the same function. This incorporates the whole dataframe into the factorization.
df.set_index('id').stack().pipe(swaparoo).unstack().reset_index()
Examples
df = pd.DataFrame(dict(id=range(4), val1=[*'ABAA'], val2=[*'BBAA']))
print(
    df,
    df.set_index('id').stack().pipe(swaparoo).unstack().reset_index(),
    sep='\n\n'
)
id val1 val2
0 0 A B
1 1 B B
2 2 A A
3 3 A A
id val1 val2
0 0 B A
1 1 A A
2 2 B B
3 3 B B
df = pd.DataFrame(dict(id=range(4), val1=[*'AAAA'], val2=[*'BBBB']))
print(
    df,
    df.set_index('id').stack().pipe(swaparoo).unstack().reset_index(),
    sep='\n\n'
)
id val1 val2
0 0 A B
1 1 A B
2 2 A B
3 3 A B
id val1 val2
0 0 B A
1 1 B A
2 2 B A
3 3 B A
df = pd.DataFrame(dict(id=range(4), val1=[*'AAAA'], val2=[*'BBBB'], val3=[*'CCCC']))
print(
    df,
    df.set_index('id').stack().pipe(swaparoo).unstack().reset_index(),
    sep='\n\n'
)
id val1 val2 val3
0 0 A B C
1 1 A B C
2 2 A B C
3 3 A B C
id val1 val2 val3
0 0 B C A
1 1 B C A
2 2 B C A
3 3 B C A
df = pd.DataFrame(dict(id=range(4), val1=[*'ABCD'], val2=[*'BCDA'], val3=[*'CDAB']))
print(
    df,
    df.set_index('id').stack().pipe(swaparoo).unstack().reset_index(),
    sep='\n\n'
)
id val1 val2 val3
0 0 A B C
1 1 B C D
2 2 C D A
3 3 D A B
id val1 val2 val3
0 0 B C D
1 1 C D A
2 2 D A B
3 3 A B C
Using replace. Why do we need a 'C' here? It keeps the A and B replacements from colliding with each other; check this:
df[['val1','val2']].replace({'A':'C','B':'A','C':'B'})
Out[263]:
val1 val2
0 B A
1 A A
2 B B
3 B B
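For completeness, assigning the result back so the other columns stay unchanged:
df[['val1', 'val2']] = df[['val1', 'val2']].replace({'A': 'C', 'B': 'A', 'C': 'B'})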
Related
I have the following example data set:
A    B  C  D
foo  0  1  1
bar  0  0  1
baz  1  1  0
How could I extract the column names of each 1 occurrence in a row and put them into another column E, so that I get the following table:
A    B  C  D  E
foo  0  1  1  C, D
bar  0  0  1  D
baz  1  1  0  B, C
Note that there can be more than two 1s per row.
You can use DataFrame.dot.
df['E'] = df[['B', 'C', 'D']].dot(df.columns[1:] + ', ').str.rstrip(', ')
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C
Inspired by jezrael's answer in this post.
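How the trick works: multiplying a string by 0 or 1 yields '' or the string itself, and the row-wise dot product concatenates the surviving pieces. A self-contained sketch, rebuilding the example frame from the question:
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'baz'],
                   'B': [0, 0, 1],
                   'C': [1, 0, 1],
                   'D': [1, 1, 0]})

# each label survives the multiplication only where its column holds a 1
df['E'] = df[['B', 'C', 'D']].dot(df.columns[1:] + ', ').str.rstrip(', ')
print(df)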
Another way: convert each row to boolean and use it as a selection mask to filter the column names.
cols = pd.Index(['B', 'C', 'D'])
df['E'] = df[cols].astype('bool').apply(lambda row: ", ".join(cols[row]), axis=1)
df
A B C D E
0 foo 0 1 1 C, D
1 bar 0 0 1 D
2 baz 1 1 0 B, C
It has been a long time since I dealt with the pandas library. I searched but could not come up with an efficient way; there might be an existing function in the library for this.
Let's say I have the dataframe below:
df1 = pd.DataFrame({'V1': ['A', 'A', 'B'],
                    'V2': ['B', 'C', 'C'],
                    'Value': [4, 1, 5]})
df1
And I would like to extend this dataset so that it contains every combination of the categories, each carrying exactly the same value:
df2 = pd.DataFrame({'V1': ['A', 'B', 'A', 'C', 'B', 'C'],
                    'V2': ['B', 'A', 'C', 'A', 'C', 'B'],
                    'Value': [4, 4, 1, 1, 5, 5]})
df2
In other words, in df1, A and B have a Value of 4, and I also want a row saying that B and A have a Value of 4 in the second dataframe. It is very similar to melting. I also do not want to use a for loop; I am looking for a more efficient way.
Use:
df = pd.concat([df1, df1.rename(columns={'V2':'V1', 'V1':'V2'})]).sort_index().reset_index(drop=True)
Output:
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5
Or np.vstack:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns)
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5
For correct order:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index()
V1 V2 Value
0 A B 4
0 B A 4
1 A C 1
1 C A 1
2 B C 5
2 C B 5
And index reset:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index().reset_index(drop=True)
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5
You can use the assign and append methods:
df1.append(df1.assign(V1=df1.V2, V2=df1.V1), ignore_index=True)
Output:
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5
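Note that DataFrame.append was removed in pandas 2.0; one way to spell the same thing with pd.concat:
out = pd.concat([df1, df1.assign(V1=df1.V2, V2=df1.V1)], ignore_index=True)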
I want to load columns with specific prefixes into separate DataFrames.
The columns I want have specific prefixes i.e.
A_1 A_2 B_1 B_2 C_1 C_2
1 0 0 0 0 0
1 0 0 1 1 1
0 1 1 1 1 0
I have a list of all the prefixes:
prefixes = ["A", "B", "C"]
I want to do something like this:
for prefix in prefixes:
    f"df_{prefix}" = pd.read_csv("my_file.csv",
                                 usecols=[f"{prefix}_1",
                                          f"{prefix}_2",
                                          f"{prefix}_3"])
So each DataFrame has the prefix in the name, but I'm not quite sure of the best way to do this or the syntax required.
You could try it with a different approach: load the full csv once, then create three dfs out of it by dropping the columns that don't match each prefix.
x = pd.read_csv("my_file.csv")
notA = [c for c in x.columns if 'A' not in c]
notB = [c for c in x.columns if 'B' not in c]
notC = [c for c in x.columns if 'C' not in c]
a = x.drop(columns=notA)
b = x.drop(columns=notB)
c = x.drop(columns=notC)
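A tighter variant of the same idea is DataFrame.filter with an anchored regex, which avoids the pitfall that 'A' not in c would also drop unrelated columns merely containing that letter. A sketch, using a dict to hold the sub-frames:
x = pd.read_csv("my_file.csv")
# one sub-frame per prefix, keyed by the prefix itself
dfs = {p: x.filter(regex=f'^{p}_') for p in ["A", "B", "C"]}
a, b, c = dfs["A"], dfs["B"], dfs["C"]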
Considering you have a big dataframe like this:
In [1341]: df
Out[1341]:
A_1 A_2 B_1 B_2 C_1 C_2
0 1 0 0 0 0 0
1 1 0 0 1 1 1
2 0 1 1 1 1 0
Have a master list of prefixes:
In [1374]: master_list = ['A','B','C']
Create an empty dictionary to hold multiple subsets of dataframe:
In [1377]: dct = {}
Loop through the master list and store the column names in the above dict:
In [1378]: for i in master_list:
      ...:     dct['{}_list'.format(i)] = [e for e in df.columns if e.startswith(i)]
Now, the dct has below keys with values:
A_list : ['A_1', 'A_2']
B_list : ['B_1', 'B_2']
C_list : ['C_1', 'C_2']
Then, subset your dataframes like below:
In [1381]: for k in dct:
      ...:     dct[k] = df[dct[k]]
Now, the dictionary has actual rows of dataframe against every key:
In [1384]: for k in dct:
      ...:     print(dct[k])
   A_1  A_2
0    1    0
1    1    0
2    0    1
   B_1  B_2
0    0    0
1    0    1
2    1    1
   C_1  C_2
0    0    0
1    1    1
2    1    0
First filter out the columns that don't match, using startswith with boolean indexing and loc to select the matching columns:
print (df)
A_1 A_2 B_1 B_2 C_1 D_2
0 1 0 0 0 0 0
1 1 0 0 1 1 1
2 0 1 1 1 1 0
prefixes = ["A", "B", "C"]
df = df.loc[:, df.columns.str.startswith(tuple(prefixes))]
print (df)
A_1 A_2 B_1 B_2 C_1
0 1 0 0 0 0
1 1 0 0 1 1
2 0 1 1 1 1
Then create a MultiIndex by splitting the column names, and build a dictionary of DataFrames with groupby:
df.columns = df.columns.str.split('_', expand=True)
print (df)
A B C
1 2 1 2 1
0 1 0 0 0 0
1 1 0 0 1 1
2 0 1 1 1 1
d = {k: v[k] for k, v in df.groupby(level=0, axis=1)}
print (d['A'])
1 2
0 1 0
1 1 0
2 0 1
Or use a lambda function with split:
d = {k: v for k, v in df.groupby(lambda x: x.split('_')[0], axis=1)}
print (d['A'])
A_1 A_2
0 1 0
1 1 0
2 0 1
I have a dataframe as follows:
data
0 a
1 a
2 a
3 a
4 a
5 b
6 b
7 b
8 b
9 b
I want to group the repeating values of a and b into a single row element as follows:
data
0 a
a
a
a
a
1 b
b
b
b
b
How do I go about doing this? I tried the following but it puts each repeating value in its own column
df.groupby('data')
This looks like a pivot problem, but the column key (created by cumcount) and the index key (created by factorize) are missing, so it is hard to see at first:
pd.crosstab(pd.factorize(df.data)[0], df.groupby('data').cumcount(), df.data, aggfunc='sum')
Out[358]:
col_0 0 1 2 3 4
row_0
0 a a a a a
1 b b b b b
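The same reshape can be spelled with pivot once both keys are built explicitly; a sketch of the equivalent steps:
out = (df.assign(row=pd.factorize(df['data'])[0],          # index key
                 col=df.groupby('data').cumcount())        # column key
         .pivot(index='row', columns='col', values='data'))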
Something like the following, which builds a group index from the points where the value changes:
index = (df['data'] != df['data'].shift()).cumsum() - 1
index.name = None  # clear the inherited 'data' name so it doesn't clash with the column
df = df.set_index(index)
data
0 a
0 a
0 a
0 a
0 a
1 b
1 b
1 b
1 b
1 b
You can use pd.factorize followed by set_index:
df = df.assign(key=pd.factorize(df['data'], sort=False)[0]).set_index('key')
print(df)
data
key
0 a
0 a
0 a
0 a
0 a
1 b
1 b
1 b
1 b
1 b
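An equivalent spelling, as a sketch, uses groupby.ngroup, which numbers the groups directly (sort=False keeps first-appearance order, matching factorize):
df = df.set_index(df.groupby('data', sort=False).ngroup().rename('key'))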
I have the following data frame:
1 A a
1 A b
2 B c
1 A d
How do I combine the values of all rows that share the same first two values into a single row, like this:
1 A a, b, d
2 B c
You can use groupby and apply the join function:
df.columns = ['a','b','c']
print (df)
a b c
0 1 A a
1 1 A b
2 2 B c
3 1 A d
print (df.groupby(['a', 'b'])['c'].apply(', '.join).reset_index())
a b c
0 1 A a, b, d
1 2 B c
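The same thing with agg, keeping the grouping keys as regular columns:
out = df.groupby(['a', 'b'], as_index=False)['c'].agg(', '.join)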
Or if the first column is the index:
df.columns = ['a','b']
print (df)
a b
1 A a
1 A b
2 B c
1 A d
df1 = df.b.groupby([df.index, df.a]).apply(', '.join).reset_index(name='c')
df1.columns = ['a','b','c']
print (df1)
a b c
0 1 A a, b, d
1 2 B c