How to method-chain `ffill(axis=1)` in a dataframe

I would like to fill column b of a dataframe with values from a in case b is nan, and I would like to do it in a method chain, but I cannot figure out how to do this.
The following works
import numpy as np
import pandas as pd
df = pd.DataFrame(
{"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}
)
df["b"] = df[["a", "b"]].ffill(axis=1)["b"]
print(df.to_markdown())
| | a | b | c |
|---:|----:|----:|:----|
| 0 | 1 | 10 | a |
| 1 | 2 | 2 | b |
| 2 | 3 | 3 | c |
| 3 | 4 | 40 | d |
but is not method-chained. Thanks a lot for the help!

This replaces NA in column df.b with values from df.a using fillna instead of ffill:
import numpy as np
import pandas as pd
df = (
pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]})
.assign(b=lambda x: x.b.fillna(x.a))
)
display(df)
df.dtypes
Output:

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]})
df['b'] = df.b.fillna(df.a)
| | a | b | c |
|---:|----:|----:|:----|
| 0 | 1 | 10 | a |
| 1 | 2 | 2 | b |
| 2 | 3 | 3 | c |
| 3 | 4 | 40 | d |

One solution I have found is by using the pyjanitor library:
import numpy as np
import pandas as pd
import janitor  # the pyjanitor package is imported under the name "janitor"
df = pd.DataFrame(
{"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}
)
df.case_when(
lambda x: x["b"].isna(), lambda x: x["a"], lambda x: x["b"], column_name="b"
)
Here, the case_when(...) can be integrated into a chain of manipulations and we still keep the whole dataframe in the chain.
I wonder how this could be accomplished without pyjanitor.
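Without pyjanitor, one possible approach is `assign` with a lambda, which receives the DataFrame as it exists at that point in the chain. The sketch below shows two variants: one with `fillna` and one that keeps the original `ffill(axis=1)` logic. The name `df_ffill` is only for illustration.

```python
import numpy as np
import pandas as pd

data = {"a": [1, 2, 3, 4], "b": [10, np.nan, np.nan, 40], "c": ["a", "b", "c", "d"]}

# Variant 1: fill missing values in b from a; x is the frame inside the chain
df = (
    pd.DataFrame(data)
    .assign(b=lambda x: x["b"].fillna(x["a"]))
)

# Variant 2: same result, keeping the row-wise ffill from the question
df_ffill = (
    pd.DataFrame(data)
    .assign(b=lambda x: x[["a", "b"]].ffill(axis=1)["b"])
)

print(df)
print(df_ffill)
```

Both variants keep the whole DataFrame in the chain, so further methods can follow the `assign` call.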

Related

How to replace values in column in one DataFrame by values from second DataFrame both have major key in Python Pandas?

I have 2 DataFrames in Python Pandas like below:
DF1
COL1 | ... | COLn
-----|------|-------
A | ... | ...
B | ... | ...
A | ... | ...
.... | ... | ...
DF2
G1 | G2
----|-----
A | 1
B | 2
C | 3
D | 4
I need to replace the values in DF1 COL1 with the values from DF2 G2.
So, as a result, I need DF1 in a format like the one below:
COL1 | ... | COLn
-----|------|-------
1 | ... | ...
2 | ... | ...
1 | ... | ...
.... | ... | ...
Of course my table is huge, so it would be good to do this automatically rather than by manually adjusting the values :)
How can I do that in Python Pandas?

import pandas as pd
df1 = pd.DataFrame({"COL1": ["A", "B", "A"]}) # Add more columns as required
df2 = pd.DataFrame({"G1": ["A", "B", "C", "D"], "G2": [1, 2, 3, 4]})
df1["COL1"] = df1["COL1"].map(df2.set_index("G1")["G2"])
output df1:
COL1
0 1
1 2
2 1
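One thing worth checking with `map` is what happens to keys that have no match in the lookup table. The sketch below uses a made-up value `"Z"` that does not appear in `df2["G1"]`; the names `lookup` and `mapped` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"COL1": ["A", "B", "Z"]})  # "Z" is a hypothetical unmatched key
df2 = pd.DataFrame({"G1": ["A", "B", "C", "D"], "G2": [1, 2, 3, 4]})

# Series indexed by G1, so map can look up each COL1 value
lookup = df2.set_index("G1")["G2"]

# Values without a match in the lookup become NaN
mapped = df1["COL1"].map(lookup)

print(mapped)
print(mapped.isna().sum(), "value(s) had no match")
```

If unmatched rows are possible in the real table, counting the NaN values after mapping is a cheap sanity check.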

You could try using the assign or update method of DataFrame:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'B': [7, 8, 9]})
Try
df1 = df1.assign(B=df2['B'])  # assign returns a new DataFrame
or
df1.update(df2)  # update modifies df1 in place
Here are links to the docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.assign.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.update.html

Consequently subtract columns and get the resulted cumsum

Having the dataframe like this:
|one|two|three|
| 1 | 2 | 4 |
| 4 | 6 | 3 |
| 2 | 4 | 9 |
How can I subtract values from column one from values of column two and so on and then get the sum of obtained values? Like
|one|two|three| one-two | one-three | two-three | SUM |
| 1 | 2 | 4 | -1 | -3 | -2 | -6 |
| 4 | 6 | 3 |
| 2 | 4 | 9 |
As a result, I need a df with only the one-three columns and SUM.

You can try this:
from itertools import combinations
import pandas as pd
df = pd.DataFrame({'one': {0: 1, 1: 4, 2: 2},
'two': {0: 2, 1: 6, 2: 4},
'three': {0: 4, 1: 3, 2: 9}})
Create the column combinations using itertools.combinations:
## create column combinations
column_combinations = list(combinations(list(df.columns), 2))
Subtract each pair of columns and store the result in a new column:
column_names = []
for column_comb in column_combinations:
    name = f"{column_comb[0]}_{column_comb[1]}"
    df[name] = df[column_comb[0]] - df[column_comb[1]]
    column_names.append(name)
df["SUM"] = df[column_names].sum(axis=1)
print(df)
output:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
column_differences = df.apply(np.diff, axis=1)
total_column_differences = np.sum(column_differences.to_list(), axis=1)
df['SUM'] = total_column_differences
print(df)
Gives the following.
a b c SUM
0 1 4 7 6
1 2 5 8 6
2 3 6 9 6

DataFrames make operations like this very easy
df['new_column'] = df['colA'] - df['colB']
PyData has a great resource to learn more.
In your example:
import pandas as pd
df = pd.DataFrame(data=[[1,2,4],[4,6,3], [2,4,9]], columns=['one', 'two', 'three'])
df['one-two'] = df['one'] - df['two']
df['one-three'] = df['one'] - df['three']
df['two-three'] = df['two'] - df['three']
df['sum'] = df['one-two'] + df['one-three'] + df['two-three']
df.drop(columns=['one', 'two', 'three'], inplace=True)
# print(df)
one-two one-three two-three sum
0 -1 -3 -2 -6
1 -2 1 3 2
2 -2 -7 -5 -14

Assuming you have only 3 columns and only want pairs of 2 features:
from itertools import combinations
from pandas import DataFrame
# Creating the DataFrame
df = DataFrame({'one': [1,4,2], 'two': [2,6,4], 'three': [4,3,9]})
# Getting the possible feature combinations
combs = combinations(df.columns, 2)
# Calculating the totals for the column pairs
for comb in combs:
    df['-'.join(comb)] = df[comb[0]] - df[comb[1]]
# Adding the totals to the DataFrame
df['SUM'] = df[df.columns[3:]].sum(axis=1)
one two three one-two one-three two-three SUM
0 1 2 4 -1 -3 -2 -6
1 4 6 3 -2 1 3 2
2 2 4 9 -2 -7 -5 -14
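If only the difference columns and the SUM are wanted, the pairwise differences can be built in a separate frame instead of being added to `df`. This is a minimal sketch of that idea; the names `diffs` and `result` are only for illustration.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"one": [1, 4, 2], "two": [2, 6, 4], "three": [4, 3, 9]})

# One new column per pair of original columns, named "left-right"
diffs = pd.DataFrame(
    {
        f"{left}-{right}": df[left] - df[right]
        for left, right in combinations(df.columns, 2)
    }
)

# Row-wise sum of all pairwise differences
result = diffs.assign(SUM=diffs.sum(axis=1))

print(result)
```

Because `df` itself is never modified, this does not depend on the position of the new columns, unlike `df.columns[3:]`.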

Remove pandas columns based on list

I have a list:
my_list = ['a', 'b']
and a pandas dataframe:
d = {'a': [1, 2], 'b': [3, 4], 'c': [1, 2], 'd': [3, 4]}
df = pd.DataFrame(data=d)
What can I do to remove the columns in df based on the list my_list, in this case columns a and b?

This is very simple:
df = df.drop(columns=my_list)
drop removes the columns whose names are in the list passed to columns and returns a new DataFrame.
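If the list might contain names that are not in the DataFrame, `drop` raises a `KeyError` by default. A small sketch, assuming a made-up extra name `"zz"` in the list, shows the `errors="ignore"` option; the name `trimmed` is only for illustration.

```python
import pandas as pd

my_list = ["a", "b", "zz"]  # "zz" is a hypothetical name that is not a column
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1, 2], "d": [3, 4]})

# errors="ignore" skips labels that do not exist instead of raising KeyError
trimmed = df.drop(columns=my_list, errors="ignore")

print(trimmed)
```

Since `drop` returns a new DataFrame, it also fits into a method chain.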

This is a concise script using list comprehension: [df.pop(x) for x in my_list]
my_list = ['a', 'b']
d = {'a': [1, 2], 'b': [3, 4], 'c': [1, 2], 'd': [3, 4]}
df = pd.DataFrame(data=d)
print(df.to_markdown())
| | a | b | c | d |
|---:|----:|----:|----:|----:|
| 0 | 1 | 3 | 1 | 3 |
| 1 | 2 | 4 | 2 | 4 |
[df.pop(x) for x in my_list]
print(df.to_markdown())
| | c | d |
|---:|----:|----:|
| 0 | 1 | 3 |
| 1 | 2 | 4 |

You can select required columns as well:
cols_of_interest = ['c', 'd']
df = df[cols_of_interest]
If you have a range of columns to drop, for example positions 2 to 8, you can use:
df.drop(df.iloc[:,2:8].head(0).columns, axis=1)
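A possibly simpler way to drop columns by position is to slice `df.columns` directly. The sketch below assumes a made-up frame with ten columns; note that the slice `2:8` covers positions 2 through 7.

```python
import pandas as pd

# Hypothetical one-row frame with ten columns named a..j
df = pd.DataFrame([range(10)], columns=list("abcdefghij"))

# df.columns[2:8] selects the labels at positions 2..7
trimmed = df.drop(columns=df.columns[2:8])

print(trimmed)
```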

pandas join on index of a particular column

I have three lists which look like this:
l1 = ["a", "b" , "c", "d", "e", "f", "g"]
l2 = ["a", "d", "f"]
l3 = ["b", "g"]
I would like to get a dataframe which looks like this:
| l1 | l2 | l3 |
|----|------|------|
| a | a | None |
| b | None | b |
| c | None | None |
| d | d | None |
| e | None | None |
| f | f | None |
| g | None | g |
I have tried to use the join/merge operations but could not figure this out.
How could I accomplish this?

You can do this using list comprehensions:
import pandas as pd
import numpy as np
a = [i if i in l2 else np.nan for i in l1]
b = [i if i in l3 else np.nan for i in l1]
df = pd.DataFrame({'l1': l1, 'l2': a, 'l3': b})
print(df)
Output:
l1 l2 l3
0 a a NaN
1 b NaN b
2 c NaN NaN
3 d d NaN
4 e NaN NaN
5 f f NaN
6 g NaN g

There are a few args in pd.merge that you can use for this purpose: left_on, right_on and how.
left_on specifies which column in the left dataframe pandas should join on.
right_on is the same as left_on, but for the right dataframe.
how specifies which type of join to perform. In this case you probably want a left join.
Learn more on this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
You can do something like this:
l1 = ["a", "b" , "c", "d", "e", "f", "g"]
l2 = ["a", "d", "f"]
l3 = ["b", "g"]
df = pd.DataFrame({'l1': l1})
df_l2 = pd.DataFrame({'l2': l2})
df_l3 = pd.DataFrame({'l3': l3})
df = pd.merge(df, df_l2, left_on='l1', right_on='l2', how='left')
df = pd.merge(df, df_l3, left_on='l1', right_on='l3', how='left')
Output:
l1 l2 l3
0 a a NaN
1 b NaN b
2 c NaN NaN
3 d d NaN
4 e NaN NaN
5 f f NaN
6 g NaN g
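Another option that avoids both the merge and the list comprehensions is `Series.where` combined with `isin`. This is a sketch of that alternative, not the approach from the answers above.

```python
import pandas as pd

l1 = ["a", "b", "c", "d", "e", "f", "g"]
l2 = ["a", "d", "f"]
l3 = ["b", "g"]

df = pd.DataFrame({"l1": l1}).assign(
    # Keep the l1 value where it also appears in l2 / l3, otherwise NaN
    l2=lambda d: d["l1"].where(d["l1"].isin(l2)),
    l3=lambda d: d["l1"].where(d["l1"].isin(l3)),
)

print(df)
```

This keeps one row per element of `l1`, even if `l2` or `l3` contained repeated values, whereas a left merge would duplicate rows in that case.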

Plot in python after crosstab merge

I'd like to plot my DataFrame. I had this DF first:
id|project|categories|rating
1 | a | A | 1
1 | a | B | 1
1 | a | C | 2
1 | b | A | 1
1 | b | B | 1
2 | c | A | 1
2 | c | B | 2
used this code:
import pandas as pd
df = pd.DataFrame(...)
(df.groupby('id').project.nunique().reset_index()
.merge(pd.crosstab(df.id, df.categories).reset_index()))
and now got this DataFrame:
id | project | A | B | C |
1 | 2 | 2 | 2 | 1 |
2 | 1 | 1 | 1 | 0 |
Now I'd like to plot the DF. I want to show whether the number of projects depends on how many categories are affected, or on which categories are affected. I know how to visualize DataFrames, but after the crosstab and merge it is not working as usual.

I reproduced your data using below code:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 1, 1, 2, 2,],\
'project': ['a', 'a', 'a', 'b', 'b', 'c', 'c'],\
'categories': ['A', 'B', 'C', 'A', 'B', 'A', 'B'],\
'rating': [1, 1, 2, 1, 1, 1, 2]})
Now data looks like this
categories id project rating
0 A 1 a 1
1 B 1 a 1
2 C 1 a 2
3 A 1 b 1
4 B 1 b 1
5 A 2 c 1
6 B 2 c 2
If you want to plot 'category count' as a function of 'project count' it looks like this.
import matplotlib.pyplot as plt
# this line is your code
df2 = df.groupby('id').project.nunique().reset_index().merge(pd.crosstab(df.id, df.categories).reset_index())
plt.scatter(df2.project, df2.A, label='A', alpha=0.5)
plt.scatter(df2.project, df2.B, label='B', alpha=0.5)
plt.scatter(df2.project, df2.C, label='C', alpha=0.5)
plt.xlabel('project count')
plt.ylabel('category count')
plt.legend()
plt.show()
And you will get this
