I have the following dataframe:
import pandas as pd
data = {0: [-1, -14], 1: [-3, 2], 2: [7, 10], 4: [-10, 15]}
df = pd.DataFrame(data)
I know how to sort by a specific row:
df.sort_values(by=0, ascending=False, axis=1)
How is it possible to sort the dataframe by the absolute value of the first row?
In this case I will have something like:
sorted_data = {0: [-10, 15], 1: [7, 10], 2: [-3, 2], 4: [-1, -14]}
Sort the Series sliced from row 0 by absolute value, then pass its reordered index to select the columns of the original df:
df_sorted = df[df.iloc[0].abs().sort_values(ascending=False).index]
Out[94]:
     4   2   1    0
0  -10   7  -3   -1
1   15  10   2  -14
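Broken into steps, the one-liner reads like this (a sketch on the same df; the name order is just illustrative):
order = df.iloc[0].abs().sort_values(ascending=False).index  # column labels ordered by |row 0|, descending
df_sorted = df[order]                                        # select the columns in that order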
Pandas 1.1 added a key argument to sort_values:
import numpy as np
df.sort_values(0, axis=1, key=np.abs, ascending=False)
     4   2   1    0
0  -10   7  -3   -1
1   15  10   2  -14
Let us try argsort
df = df.iloc[:,(-df.loc[0].abs()).argsort()]
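On the sample data this should produce the same column order as the approaches above:
     4   2   1    0
0  -10   7  -3   -1
1   15  10   2  -14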
I want to select the whole row in which the minimal value of 3 selected columns is found, in a dataframe like this:
it is supposed to look like this afterwards:
I tried something like
dfcheckminrow = dfquery[dfquery == dfquery['A':'C'].min().groupby('ID')]
obviously it didn't work out well.
Thanks in advance!
Bkeesey's answer looks like it almost got you to your solution. I added one more step to get the overall minimum for each group.
import pandas as pd
# create sample df
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
'A': [30, 14, 100, 67, 1, 20],
'B': [10, 1, 2, 5, 100, 3],
'C': [1, 2, 3, 4, 5, 6],
})
# set "ID" as the index
df = df.set_index('ID')
# get the per-group min of columns A and B, broadcast back to each row
mindf = df[['A','B']].groupby('ID').transform('min')
# get the min between columns and add it to df
df['min'] = mindf.apply(min, axis=1)
# filter df for when A or B matches the min
df2 = df.loc[(df['A'] == df['min']) | (df['B'] == df['min'])]
print(df2)
In my simplified example, I'm just finding the minimum between columns A and B. Here's the output:
      A    B  C  min
ID
1    14    1  2    1
2   100    2  3    2
3     1  100  5    1
One method to filter the initial DataFrame based on a groupby condition is to use transform to find the minimum for each "ID" group, then use loc to keep the rows where the condition `any(axis=1)` (checking across each row) is met.
# create sample df
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
'A': [30, 14, 100, 67, 1, 20],
'B': [10, 1, 2, 5, 100, 3]})
# set "ID" as the index
df = df.set_index('ID')
Sample df:
      A    B
ID
1    30   10
1    14    1
2   100    2
2    67    5
3     1  100
3    20    3
Use groupby and transform to find the minimum value for each "ID" group.
Then use loc to filter the initial df to the rows where any(axis=1) is True:
df.loc[(df == df.groupby('ID').transform('min')).any(axis=1)]
Output:
      A    B
ID
1    14    1
2   100    2
2    67    5
3     1  100
3    20    3
In this example only the first row is removed, since neither of its columns holds the minimum for its "ID" group.
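As a quick check (a sketch on the sample df above), the boolean mask for that first row is all False, which is why it drops out:
mask = df == df.groupby('ID').transform('min')
print(mask.head(1))
#        A      B
# ID
# 1  False  False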
I am trying to find the actual value that corresponds to the absolute minimum from multiple columns. For example:
df = pd.DataFrame({'A': [10, -5, -20, 50], 'B': [-5, 10, 30, 300], 'C': [15, 30, 15, 10]})
The output for this should be another column with the values -5, -5, 15 and 10.
I tried df['D'] = df[['A', 'B', 'C']].abs().min(axis=1), but it returns the minimum of absolutes, thereby losing the sign.
Try with idxmin
df['D'] = df.values[df.index, df.columns.get_indexer(df[['A', 'B', 'C']].abs().idxmin(axis=1))]
df
Out[176]:
     A    B   C   D
0   10   -5  15  -5
1   -5   10  30  -5
2  -20   30  15  15
3   50  300  10  10
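An equivalent sketch using plain positional NumPy indexing (assuming import numpy as np; the names sub and pos are only illustrative):
import numpy as np

sub = df[['A', 'B', 'C']]
pos = sub.abs().to_numpy().argmin(axis=1)           # column position of the smallest |value| in each row
df['D'] = sub.to_numpy()[np.arange(len(sub)), pos]  # pick the signed value at that position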
I have two dataframes:
df_1 = pd.DataFrame({'a' : [7,8, 2], 'b': [6, 6, 11], 'c': [4, 8, 6]})
df_1
and
df_2 = pd.DataFrame({'d' : [8, 4, 12], 'e': [16, 2, 1], 'f': [9, 3, 4]})
df_2
My goal is something like:
In a way that, 'in one shot', I can subtract all the columns at once.
I'm trying a for loop but I'm stuck!
You can subtract them as numpy arrays (using .values) and then put the result in a dataframe:
df_3 = pd.DataFrame(df_1.values - df_2.values, columns=list('xyz'))
#      x    y   z
# 0   -1  -10  -5
# 1    4    4   5
# 2  -10   10   2
Or rename df_1.columns and df_2.columns to ['x','y','z'] and you can subtract them directly:
df_1.columns = df_2.columns = list('xyz')
df_3 = df_1 - df_2
#      x    y   z
# 0   -1  -10  -5
# 1    4    4   5
# 2  -10   10   2
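If you would rather keep df_1's original labels a, b, c instead of renaming, a third sketch is to subtract df_2 positionally as a NumPy array:
df_3 = df_1.sub(df_2.to_numpy())  # positional subtraction; the result keeps columns a, b, c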
I want to select rows from a dataframe based on values in the index combined with values in a specific column:
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [0, 20, 30], [40, 20, 30]],
index=[4, 5, 6, 7], columns=['A', 'B', 'C'])
    A   B   C
4   0   2   3
5   0   4   1
6   0  20  30
7  40  20  30
with
df.loc[df['A'] == 0, 'C'] = 99
I can select all rows with column A = 0 and replace the value in column C with 99, but how can I select all rows with column A = 0 and the index < 6 (I want to combine selection on the index with selection on the column)?
You can use multiple conditions in your loc statement:
df.loc[(df.index < 6) & (df.A == 0), 'C'] = 99
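Note that the parentheses around each condition are required, because & binds more tightly than the comparisons. On the sample df the result should be:
    A   B   C
4   0   2  99
5   0   4  99
6   0  20  30
7  40  20  30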
What I have now:
import numpy as np
# 1) Read CSV with headers
data = np.genfromtxt("big.csv", delimiter=',', names=True)
# 2) Get absolute values for column in a new ndarray
new_ndarray = np.absolute(data["target_column_name"])
# 3) Append column in new_ndarray to data
# I'm having trouble here. Can't get hstack, concatenate, append, etc. to work
# 4) Sort by new column and obtain a new ndarray
data.sort(order="target_column_name_abs")
I would like:
A solution for 3): a way to add this new "abs" column to the original ndarray, or
another approach that lets me sort a CSV file by the absolute values of a column.
Here is a way to do it.
First, let's create a sample array:
In [39]: a = (np.arange(12).reshape(4, 3) - 6)
In [40]: a
Out[40]:
array([[-6, -5, -4],
       [-3, -2, -1],
       [ 0,  1,  2],
       [ 3,  4,  5]])
OK, let's say
In [41]: col = 1
which is the column we want to sort by,
and here is the sorting code - using Python's sorted:
In [42]: b = sorted(a, key=lambda row: np.abs(row[col]))
Let's convert b from list to array, and we have:
In [43]: np.array(b)
Out[43]:
array([[ 0,  1,  2],
       [-3, -2, -1],
       [ 3,  4,  5],
       [-6, -5, -4]])
Which is the array with the rows sorted according to
the absolute value of column 1.
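The same idea carries over to the structured array that genfromtxt(..., names=True) returns, without needing a helper column (a sketch reusing the question's placeholder column name):
order = np.argsort(np.abs(data["target_column_name"]))  # row order by absolute value of the column
data_sorted = data[order]                               # structured array with its rows reordered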
Here's a solution using pandas:
In [117]: import pandas as pd
In [118]: df = pd.read_csv('test.csv')
In [119]: df
Out[119]:
   a   b
0  1  -3
1  2   2
2  3  -1
3  4   4
In [120]: df['c'] = abs(df['b'])
In [121]: df
Out[121]:
   a   b  c
0  1  -3  3
1  2   2  2
2  3  -1  1
3  4   4  4
In [122]: df.sort_values(by='c')
Out[122]:
   a   b  c
2  3  -1  1
1  2   2  2
0  1  -3  3
3  4   4  4
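With pandas 1.1 or later the helper column c isn't needed, since sort_values accepts a key (a sketch):
df.sort_values(by='b', key=lambda s: s.abs())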