I have the following dataframe:
import pandas as pd
data = {0: [-1, -14], 1: [-3, 2], 2: [7, 10], 4: [-10, 15]}
df = pd.DataFrame(data)
I know how to sort by a specific row:
df.sort_values(by=0, ascending=False, axis=1)
How is it possible to sort the dataframe by the absolute value of the first row?
In this case I will have something like:
sorted_data = {0: [-10, 15], 1: [7, 10], 2: [-3, 2], 4: [-1, -14]}
Sort the Series sliced from row 0 by absolute value, then pass its reordered index to select the columns of the original df:
df_sorted = df[df.iloc[0].abs().sort_values(ascending=False).index]
Out[94]:
     4   2   1    0
0  -10   7  -3   -1
1   15  10   2  -14
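Broken into steps, the one-liner reads like this (a sketch on the same df; the name order is just illustrative):
order = df.iloc[0].abs().sort_values(ascending=False).index  # column labels ordered by |row 0|, descending
df_sorted = df[order]                                        # select the columns in that order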
Pandas 1.1 added a key argument to sort_values:
import numpy as np
df.sort_values(0, axis=1, key=np.abs, ascending=False)
     4   2   1    0
0  -10   7  -3   -1
1   15  10   2  -14
Let us try argsort
df = df.iloc[:,(-df.loc[0].abs()).argsort()]
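On the sample data this should produce the same column order as the approaches above:
     4   2   1    0
0  -10   7  -3   -1
1   15  10   2  -14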
I want to select the whole row in which the minimal value of 3 selected columns is found, in a dataframe like this:
it is supposed to look like this afterwards:
I tried something like
dfcheckminrow = dfquery[dfquery == dfquery['A':'C'].min().groupby('ID')]
obviously it didn't work out well.
Thanks in advance!
Bkeesey's answer looks like it almost got you to your solution. I added one more step to get the overall minimum for each group.
import pandas as pd
# create sample df
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
'A': [30, 14, 100, 67, 1, 20],
'B': [10, 1, 2, 5, 100, 3],
'C': [1, 2, 3, 4, 5, 6],
})
# set "ID" as the index
df = df.set_index('ID')
# get the per-group min of columns A and B, broadcast back to each row
mindf = df[['A','B']].groupby('ID').transform('min')
# get the min between columns and add it to df
df['min'] = mindf.apply(min, axis=1)
# filter df for when A or B matches the min
df2 = df.loc[(df['A'] == df['min']) | (df['B'] == df['min'])]
print(df2)
In my simplified example, I'm just finding the minimum between columns A and B. Here's the output:
      A    B  C  min
ID
1    14    1  2    1
2   100    2  3    2
3     1  100  5    1
One method to filter the initial DataFrame based on a groupby condition is to use transform to find the minimum for each "ID" group, then use loc to keep the rows where the condition `any(axis=1)` (checking across each row) is met.
# create sample df
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
'A': [30, 14, 100, 67, 1, 20],
'B': [10, 1, 2, 5, 100, 3]})
# set "ID" as the index
df = df.set_index('ID')
Sample df:
      A    B
ID
1    30   10
1    14    1
2   100    2
2    67    5
3     1  100
3    20    3
Use groupby and transform to find the minimum value for each "ID" group.
Then use loc to filter the initial df to the rows where any(axis=1) is True:
df.loc[(df == df.groupby('ID').transform('min')).any(axis=1)]
Output:
      A    B
ID
1    14    1
2   100    2
2    67    5
3     1  100
3    20    3
In this example only the first row is removed, since neither of its columns holds the minimum for its "ID" group.
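As a quick check (a sketch on the sample df above), the boolean mask for that first row is all False, which is why it drops out:
mask = df == df.groupby('ID').transform('min')
print(mask.head(1))
#        A      B
# ID
# 1  False  False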
I am trying to find the actual value that corresponds to the absolute minimum from multiple columns. For example:
df = pd.DataFrame({'A': [10, -5, -20, 50], 'B': [-5, 10, 30, 300], 'C': [15, 30, 15, 10]})
The output for this should be another column with the values -5, -5, 15 and 10.
I tried df['D'] = df[['A', 'B', 'C']].abs().min(axis=1), but it returns the minimum of absolutes, thereby losing the sign.
Try with idxmin
df['D'] = df.values[df.index, df.columns.get_indexer(df[['A', 'B', 'C']].abs().idxmin(axis=1))]
df
Out[176]:
     A    B   C   D
0   10   -5  15  -5
1   -5   10  30  -5
2  -20   30  15  15
3   50  300  10  10
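An equivalent sketch using plain positional NumPy indexing (assuming import numpy as np; the names sub and pos are only illustrative):
import numpy as np

sub = df[['A', 'B', 'C']]
pos = sub.abs().to_numpy().argmin(axis=1)           # column position of the smallest |value| in each row
df['D'] = sub.to_numpy()[np.arange(len(sub)), pos]  # pick the signed value at that position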
I have two dataframes:
df_1 = pd.DataFrame({'a' : [7,8, 2], 'b': [6, 6, 11], 'c': [4, 8, 6]})
df_1
and
df_2 = pd.DataFrame({'d' : [8, 4, 12], 'e': [16, 2, 1], 'f': [9, 3, 4]})
df_2
My goal is something like:
In a way that, 'in one shot', I can subtract all the columns at once.
I'm trying a for loop but I'm stuck!
You can subtract them as numpy arrays (using .values) and then put the result in a dataframe:
df_3 = pd.DataFrame(df_1.values - df_2.values, columns=list('xyz'))
#      x    y   z
# 0   -1  -10  -5
# 1    4    4   5
# 2  -10   10   2
Or rename df_1.columns and df_2.columns to ['x','y','z'] and you can subtract them directly:
df_1.columns = df_2.columns = list('xyz')
df_3 = df_1 - df_2
#      x    y   z
# 0   -1  -10  -5
# 1    4    4   5
# 2  -10   10   2
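If you would rather keep df_1's original labels a, b, c instead of renaming, a third sketch is to subtract df_2 positionally as a NumPy array:
df_3 = df_1.sub(df_2.to_numpy())  # positional subtraction; the result keeps columns a, b, c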
I want to select rows from a dataframe based on values in the index combined with values in a specific column:
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [0, 20, 30], [40, 20, 30]],
index=[4, 5, 6, 7], columns=['A', 'B', 'C'])
    A   B   C
4   0   2   3
5   0   4   1
6   0  20  30
7  40  20  30
with
df.loc[df['A'] == 0, 'C'] = 99
I can select all rows with column A = 0 and replace the value in column C with 99, but how can I select all rows with column A = 0 and the index < 6 (I want to combine selection on the index with selection on the column)?
You can use multiple conditions in your loc statement:
df.loc[(df.index < 6) & (df.A == 0), 'C'] = 99
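Note that the parentheses around each condition are required, because & binds more tightly than the comparisons. On the sample df the result should be:
    A   B   C
4   0   2  99
5   0   4  99
6   0  20  30
7  40  20  30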
What I have now:
import numpy as np
# 1) Read CSV with headers
data = np.genfromtxt("big.csv", delimiter=',', names=True)
# 2) Get absolute values for column in a new ndarray
new_ndarray = np.absolute(data["target_column_name"])
# 3) Append column in new_ndarray to data
# I'm having trouble here. Can't get hstack, concatenate, append, etc. to work
# 4) Sort by new column and obtain a new ndarray
data.sort(order="target_column_name_abs")
I would like:
A solution for 3): a way to add this new "abs" column to the original ndarray, or
another approach that lets me sort a CSV file by the absolute values of a column.
Here is a way to do it.
First, let's create a sample array:
In [39]: a = (np.arange(12).reshape(4, 3) - 6)
In [40]: a
Out[40]:
array([[-6, -5, -4],
       [-3, -2, -1],
       [ 0,  1,  2],
       [ 3,  4,  5]])
OK, let's say
In [41]: col = 1
which is the column we want to sort by,
and here is the sorting code - using Python's sorted:
In [42]: b = sorted(a, key=lambda row: np.abs(row[col]))
Let's convert b from list to array, and we have:
In [43]: np.array(b)
Out[43]:
array([[ 0,  1,  2],
       [-3, -2, -1],
       [ 3,  4,  5],
       [-6, -5, -4]])
Which is the array with the rows sorted according to
the absolute value of column 1.
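The same idea carries over to the structured array that genfromtxt(..., names=True) returns, without needing a helper column (a sketch reusing the question's placeholder column name):
order = np.argsort(np.abs(data["target_column_name"]))  # row order by absolute value of the column
data_sorted = data[order]                               # structured array with its rows reordered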
Here's a solution using pandas:
In [117]: import pandas as pd
In [118]: df = pd.read_csv('test.csv')
In [119]: df
Out[119]:
   a   b
0  1  -3
1  2   2
2  3  -1
3  4   4
In [120]: df['c'] = abs(df['b'])
In [121]: df
Out[121]:
   a   b  c
0  1  -3  3
1  2   2  2
2  3  -1  1
3  4   4  4
In [122]: df.sort_values(by='c')
Out[122]:
   a   b  c
2  3  -1  1
1  2   2  2
0  1  -3  3
3  4   4  4
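With pandas 1.1 or later the helper column c isn't needed, since sort_values accepts a key (a sketch):
df.sort_values(by='b', key=lambda s: s.abs())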