Lookup values in cells based on values in another column - python

I have a pandas dataframe that looks like:
BestVal A B C Value (1 - BestVal)
A 0.1 0.29 0.3 0.9
B 0.33 0.21 0.45 0.79
A 0.16 0.71 0.56 0.84
C 0.51 0.26 0.85 0.15
For each row, I want to take the column name stored in BestVal, look up that row's value in the named column, subtract it from 1, and store the result in Value.

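Use DataFrame.lookup for performance. Note that lookup was deprecated in pandas 1.2 and removed in 2.0, so this only runs on older versions.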
df['Value'] = 1 - df.lookup(df.index, df.BestVal)
df
BestVal A B C Value
0 A 0.10 0.29 0.30 0.90
1 B 0.33 0.21 0.45 0.79
2 A 0.16 0.71 0.56 0.84
3 C 0.51 0.26 0.85 0.15
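On pandas 2.0 and later, where DataFrame.lookup is no longer available, the same row-wise pick can be sketched with NumPy integer indexing. This is a minimal sketch rather than the answer's own method: value_cols, col_pos and picked are names introduced here for illustration, and it assumes every BestVal entry names one of the value columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A", 0.10, 0.29, 0.30],
     ["B", 0.33, 0.21, 0.45],
     ["A", 0.16, 0.71, 0.56],
     ["C", 0.51, 0.26, 0.85]],
    columns=["BestVal", "A", "B", "C"],
)

value_cols = ["A", "B", "C"]  # illustrative name: the columns to pick from
# Position of each row's BestVal label within value_cols (-1 if not found).
col_pos = pd.Index(value_cols).get_indexer(df["BestVal"])
# Pick one cell per row: row i, column col_pos[i].
picked = df[value_cols].to_numpy()[np.arange(len(df)), col_pos]
df["Value"] = 1 - picked
print(df)
```

Because get_indexer returns -1 for labels it cannot find, it may be worth checking col_pos for -1 before indexing if BestVal can contain unexpected names.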

You could use apply:
import pandas as pd
data = [['A', 0.1, 0.29, 0.3],
['B', 0.33, 0.21, 0.45],
['A', 0.16, 0.71, 0.56],
['C', 0.51, 0.26, 0.85]]
df = pd.DataFrame(data=data, columns=['BestVal', 'A', 'B', 'C'])
df['Value'] = df.apply(lambda x: 1 - x[x.BestVal], axis=1)
print(df)
Output
BestVal A B C Value
0 A 0.10 0.29 0.30 0.90
1 B 0.33 0.21 0.45 0.79
2 A 0.16 0.71 0.56 0.84
3 C 0.51 0.26 0.85 0.15

Related

Convert pandas dataframe from row to column [duplicate]

I have the following data frame:
ID value freq
A a 0.1
A b 0.12
A c 0.19
B a 0.15
B b 0.2
B c 0.09
C a 0.39
C b 0.15
C c 0.01
and I would like to get the following
ID freq_a freq_b freq_c
A 0.1 0.12 0.19
B 0.15 0.2 0.09
C 0.39 0.15 0.01
Any ideas how to easily do this?
Using pivot:
df.pivot(index='ID', columns='value', values='freq').add_prefix('freq_').reset_index()
Output:
value ID freq_a freq_b freq_c
0 A 0.10 0.12 0.19
1 B 0.15 0.20 0.09
2 C 0.39 0.15 0.01
Use pivot_table:
out = df.pivot_table('freq', 'ID', 'value').add_prefix('freq_') \
.rename_axis(columns=None).reset_index()
print(out)
# Output
ID freq_a freq_b freq_c
0 A 0.10 0.12 0.19
1 B 0.15 0.20 0.09
2 C 0.39 0.15 0.01
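The two answers behave differently when an (ID, value) pair occurs more than once: pivot refuses to reshape, while pivot_table aggregates the duplicates (using the mean by default). A minimal sketch of that difference, using hypothetical data in which the pair (A, a) is repeated:

```python
import pandas as pd

# Hypothetical data: the (A, a) pair appears twice.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B"],
    "value": ["a", "a", "b", "a"],
    "freq": [0.10, 0.30, 0.12, 0.15],
})

try:
    df.pivot(index="ID", columns="value", values="freq")
    pivot_failed = False
except ValueError:
    # pivot cannot reshape when an (index, columns) pair is duplicated.
    pivot_failed = True

# pivot_table aggregates the duplicates instead (mean by default).
out = df.pivot_table(values="freq", index="ID", columns="value").add_prefix("freq_")
print(pivot_failed)
print(out)
```

For the data in the question, where every pair is unique, both approaches give the same table.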

Find matching column interval in pandas

I have a pandas dataframe with multiple columns whose values increase from some value between 0 and 1 in column A up to column E, which is always 1 (they represent cumulative probabilities).
ID A B C D E SIM
1: 0.49 0.64 0.86 0.97 1.00 0.98
2: 0.76 0.84 0.98 0.99 1.00 0.87
3: 0.32 0.56 0.72 0.92 1.00 0.12
The column SIM represents a column with random uniform numbers.
I wish to add a new column SIM_CAT holding the name of the column whose value is the right boundary of the interval in which the SIM value falls:
ID A B C D E SIM SIM_CAT
1: 0.49 0.64 0.86 0.97 1.00 0.98 E
2: 0.76 0.84 0.98 0.99 1.00 0.87 C
3: 0.32 0.56 0.72 0.92 1.00 0.12 A
Is there a concise way to do that?
You can compare the columns with SIM and use idxmax to find the first column whose value is greater than or equal to SIM:
cols = list('ABCDE')
df['SIM_CAT'] = df[cols].ge(df.SIM, axis=0).idxmax(axis=1)
df
ID A B C D E SIM SIM_CAT
0 1: 0.49 0.64 0.86 0.97 1.0 0.98 E
1 2: 0.76 0.84 0.98 0.99 1.0 0.87 C
2 3: 0.32 0.56 0.72 0.92 1.0 0.12 A
If SIM can contain values greater than 1:
cols = list('ABCDE')
df['SIM_CAT'] = None
df.loc[df.SIM <= 1, 'SIM_CAT'] = df[cols].ge(df.SIM, axis=0).idxmax(axis=1)
df
ID A B C D E SIM SIM_CAT
0 1: 0.49 0.64 0.86 0.97 1.0 0.98 E
1 2: 0.76 0.84 0.98 0.99 1.0 0.87 C
2 3: 0.32 0.56 0.72 0.92 1.0 0.12 A
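The reason the SIM > 1 case needs separate handling is that idxmax on a row with no True values still returns the first column label. A minimal sketch of this, assuming a hypothetical fourth row with SIM = 1.2; the names hits and naive are introduced here for illustration, and Series.where is used as a variant of the masked assignment above:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.49, 0.76, 0.32, 0.20],
    "B": [0.64, 0.84, 0.56, 0.40],
    "C": [0.86, 0.98, 0.72, 0.60],
    "D": [0.97, 0.99, 0.92, 0.80],
    "E": [1.00, 1.00, 1.00, 1.00],
    "SIM": [0.98, 0.87, 0.12, 1.20],  # last row is hypothetical, outside [0, 1]
})

cols = list("ABCDE")
hits = df[cols].ge(df["SIM"], axis=0)  # True where the column value >= SIM
naive = hits.idxmax(axis=1)            # a row with no True falls back to "A"
# Keep the label only where at least one column matched; otherwise leave it missing.
df["SIM_CAT"] = naive.where(hits.any(axis=1))
print(df)
```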

Python(Pandas) - Create a column by matching column's values into dataframe

I have the below assumed dataframe
a b c d e F
0.02 0.62 0.31 0.67 0.27 a
0.30 0.07 0.23 0.42 0.00 a
0.82 0.59 0.34 0.73 0.29 a
0.90 0.80 0.13 0.14 0.07 d
0.50 0.62 0.94 0.34 0.53 d
0.59 0.84 0.95 0.42 0.54 d
0.13 0.33 0.87 0.20 0.25 d
0.47 0.37 0.84 0.69 0.28 e
Column F holds the name of one of the other columns of the dataframe.
For each row, I want to take the value from the column named in F and return these values in one new column.
The outcome will look like this:
a b c d e F To_Be_Filled
0.02 0.62 0.31 0.67 0.27 a 0.02
0.30 0.07 0.23 0.42 0.00 a 0.30
0.82 0.59 0.34 0.73 0.29 a 0.82
0.90 0.80 0.13 0.14 0.07 d 0.14
0.50 0.62 0.94 0.34 0.53 d 0.34
0.59 0.84 0.95 0.42 0.54 d 0.42
0.13 0.33 0.87 0.20 0.25 d 0.20
0.47 0.37 0.84 0.69 0.28 e 0.28
I am able to identify each case with the below, but not sure how to do it across the whole dataframe.
test.loc[test.iloc[:,5]=='a',test.columns=='a']
Many thanks in advance.
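You can use lookup (deprecated in pandas 1.2 and removed in 2.0, so this requires an older version):
import numpy as np
df['To_Be_Filled'] = df.lookup(np.arange(len(df)), df['F'])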
df
Out:
a b c d e F To_Be_Filled
0 0.02 0.62 0.31 0.67 0.27 a 0.02
1 0.30 0.07 0.23 0.42 0.00 a 0.30
2 0.82 0.59 0.34 0.73 0.29 a 0.82
3 0.90 0.80 0.13 0.14 0.07 d 0.14
4 0.50 0.62 0.94 0.34 0.53 d 0.34
5 0.59 0.84 0.95 0.42 0.54 d 0.42
6 0.13 0.33 0.87 0.20 0.25 d 0.20
7 0.47 0.37 0.84 0.69 0.28 e 0.28
np.arange(len(df)) can be replaced with df.index.
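On pandas versions without DataFrame.lookup, a factorize-based pattern along the lines of the one in the pandas indexing guide can stand in for it. This is a sketch under that assumption, not the answer's original code; codes and uniques are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0.02, 0.30, 0.82, 0.90, 0.50, 0.59, 0.13, 0.47],
    "b": [0.62, 0.07, 0.59, 0.80, 0.62, 0.84, 0.33, 0.37],
    "c": [0.31, 0.23, 0.34, 0.13, 0.94, 0.95, 0.87, 0.84],
    "d": [0.67, 0.42, 0.73, 0.14, 0.34, 0.42, 0.20, 0.69],
    "e": [0.27, 0.00, 0.29, 0.07, 0.53, 0.54, 0.25, 0.28],
    "F": ["a", "a", "a", "d", "d", "d", "d", "e"],
})

# codes[i] is the position of row i's label within uniques.
codes, uniques = pd.factorize(df["F"])
# Keep only the referenced columns, in the order of uniques,
# then pick row i, column codes[i].
picked = df.reindex(columns=uniques).to_numpy()[np.arange(len(df)), codes]
df["To_Be_Filled"] = picked
print(df)
```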

python pandas change dataframe to pivoted columns

I have a dataframe that looks as following:
Type Month Value
A 1 0.29
A 2 0.90
A 3 0.44
A 4 0.43
B 1 0.29
B 2 0.50
B 3 0.14
B 4 0.07
I want to change the dataframe to following format:
Type A B
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
Is this possible?
Use set_index + unstack
df.set_index(['Month', 'Type']).Value.unstack()
Type A B
Month
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
To match your exact output
df.set_index(['Month', 'Type']).Value.unstack().rename_axis(None)
Type A B
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
Pivot solution:
In [70]: df.pivot(index='Month', columns='Type', values='Value')
Out[70]:
Type A B
Month
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
In [71]: df.pivot(index='Month', columns='Type', values='Value').rename_axis(None)
Out[71]:
Type A B
1 0.29 0.29
2 0.90 0.50
3 0.44 0.14
4 0.43 0.07
You have a long-format table that you want to transform into wide format.
pandas handles this natively:
df.pivot(index='Month', columns='Type', values='Value')

Python Pandas Shift Dataframe Column Down Into Rows (reset index on column?)

How would you drop or reset the column axis so that the data shifts down, the current column headers become a row, and the headers turn into something like [0, 1, 2, 3, 4, 5]? I then want to set the column headers to the values of df[5]. I reset the index of the row axis all the time, but have never needed to do it for columns.
# Keys are listed in the column order shown below; float noise such as 0.10000000000000001 is written as 0.10.
df = pd.DataFrame({
    '0': [0.08, 0.06, 0.10, 0.08],
    '0.1': [0.08, 0.06, 0.10, 0.08],
    '0.2': [0.10, 0.05, 0.15, 0.08],
    '0.3': [0.24, 0.25, 0.65, 0.98],
    '0.4': [0.90, 0.33, 0.30, 0.24],
    'very_low': ['High', 'Low', 'Middle', 'Low'],
})
0 0.1 0.2 0.3 0.4 very_low
0 0.08 0.08 0.10 0.24 0.90 High
1 0.06 0.06 0.05 0.25 0.33 Low
2 0.10 0.10 0.15 0.65 0.30 Middle
3 0.08 0.08 0.08 0.98 0.24 Low
If I understood you correctly, something like this?
df2 = pd.concat([pd.DataFrame(df.columns).T, pd.DataFrame(df.values)],
ignore_index=True).iloc[:, :-1]
df2.columns = [df.columns[-1]] + df.iloc[:, -1].tolist()
>>> df2
very_low High Low Middle Low
0 0 0.1 0.2 0.3 0.4
1 0.08 0.08 0.1 0.24 0.9
2 0.06 0.06 0.05 0.25 0.33
3 0.1 0.1 0.15 0.65 0.3
4 0.08 0.08 0.08 0.98 0.24
I think this is what you want:
tdf = df.T
tdf.columns = tdf.iloc[5]
tdf.drop(tdf.tail(1).index,inplace=True)
>>> tdf
very_low High Low Middle Low
0 0.08 0.06 0.1 0.08
0.1 0.08 0.06 0.1 0.08
0.2 0.1 0.05 0.15 0.08
0.3 0.24 0.25 0.65 0.98
0.4 0.9 0.33 0.3 0.24
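The transpose-based result can also be sketched without positional indexing, by moving the label column into the index before transposing. This is an alternative sketch, not the answer's own code, and it assumes the label column is named very_low as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "0": [0.08, 0.06, 0.10, 0.08],
    "0.1": [0.08, 0.06, 0.10, 0.08],
    "0.2": [0.10, 0.05, 0.15, 0.08],
    "0.3": [0.24, 0.25, 0.65, 0.98],
    "0.4": [0.90, 0.33, 0.30, 0.24],
    "very_low": ["High", "Low", "Middle", "Low"],
})

# Move the label column into the index, then swap rows and columns,
# so the labels become the column headers and the old headers become the index.
tdf = df.set_index("very_low").T
print(tdf)
```

Since the label "Low" occurs twice, the resulting frame has duplicate column names, just like the output above.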
