Mathematical action on a column using list of indexes - python

I'd like to apply a mathematical operation (division by 2) to each cell of col1 that satisfies the condition col1 > 5, and save the result in df. Any ideas how to do that?
I've tried apply with a lambda, but without success, because the whole df was changed to the new values.
import pandas as pd

df = pd.DataFrame(data={'col1': [8, 6, 2, 2],
                        'col2': [2, 2, 1, 1],
                        'col3': [4, 4, 4, 4]})
out = df[df.col1 > 5].index
I expect the first column to look like [4, 3, 2, 2]

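You are very close, since you already have the indices. You just need to divide the values at those indices and assign the result back to the same positions. Use .loc for the assignment; chained indexing such as df['col1'][ix] = ... may write to a temporary copy and leave df unchanged. It will look like this:
import pandas as pd

df = pd.DataFrame(data={'col1': [8, 6, 2, 2],
                        'col2': [2, 2, 1, 1],
                        'col3': [4, 4, 4, 4]})
# indices where the condition is true
ix = df[df['col1'] > 5].index
# floor-divide the selected cells and write them back in place
df.loc[ix, 'col1'] = df.loc[ix, 'col1'] // 2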
>>> df
   col1  col2  col3
0     4     2     4
1     3     2     4
2     2     1     4
3     2     1     4
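The same update can also be written without materializing the index first, by passing the boolean condition straight to .loc. This is a minimal sketch on the same data; the name `mask` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [8, 6, 2, 2],
                   'col2': [2, 2, 1, 1],
                   'col3': [4, 4, 4, 4]})

# Boolean mask (hypothetical name) marking the rows where col1 > 5
mask = df['col1'] > 5

# Floor-divide only the selected cells of col1; other rows and columns are untouched
df.loc[mask, 'col1'] //= 2

print(df['col1'].tolist())
```

Floor division (//) keeps the column as integers; plain division (/) would produce floats instead.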

Related

Check pandas df2.colA for occurrences of df1.id and write (df2.colB, df2.colC) into df1.colAB

I have two pandas DataFrames of different lengths. df1 has unique ids in column id. These ids occur (possibly multiple times) in df2.colA. For each df1.id, I'd like to collect all of its occurrences in df2.colA into a list and store that list in a new column of df1, either as the matching indices of df2 or together with other row entries (such as colB) of each match.
Example:
df1.id = [1, 2, 3, 4]
df2.colA = [3, 4, 4, 2, 1, 1]
df2.colB = [5, 9, 6, 5, 8, 7]
So that my operation creates something like:
df1.colAB = [ [[1,8],[1,7]], [[2,5]], [[3,5]], [[4,9],[4,6]] ]
I've tried a bunch of approaches: mapping, looping explicitly (super slow), checking with isin, etc.
You could use pandas apply to iterate over each value of df1.id. For each value, select the rows of df2 where colA matches, use loc to pull the corresponding colB values, and build the list of [id, colB] pairs with a list comprehension inside the apply.
import pandas as pd
# setup
df1 = pd.DataFrame({'id':[1,2,3,4]})
print(df1)
df2 = pd.DataFrame({
'colA' : [3, 4, 4, 2, 1, 1],
'colB' : [5, 9, 6, 5, 8, 7]
})
print(df2)
# code: for each id, pair it with every colB value whose colA equals that id
df1['colAB'] = df1['id'].apply(
    lambda i: [[i, b] for b in df2.loc[df2.colA == i, 'colB']])
print(df1)
Output from df1
   id             colAB
0   1  [[1, 8], [1, 7]]
1   2          [[2, 5]]
2   3          [[3, 5]]
3   4  [[4, 9], [4, 6]]
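The apply-based approach filters df2 once per id, which can become slow on large frames. One possible alternative is to group df2 a single time and then map the result onto df1. This is only a sketch on the same example data; the names `pairs` and `lookup` are introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 4]})
df2 = pd.DataFrame({'colA': [3, 4, 4, 2, 1, 1],
                    'colB': [5, 9, 6, 5, 8, 7]})

# One [colA, colB] pair per row of df2, indexed by the colA value
pairs = pd.Series(df2[['colA', 'colB']].values.tolist(), index=df2['colA'])

# Collect all pairs that share the same colA value into one list
lookup = pairs.groupby(level=0).agg(list)

# Look up each id; an id that never occurs in df2.colA would become NaN
df1['colAB'] = df1['id'].map(lookup)
print(df1)
```

Whether this is actually faster depends on the sizes of df1 and df2, so it is worth timing both versions on real data.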

pandas groupby & lambda function to return nlargest(2)

Given this pandas DataFrame:
df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3],
                   'pay_date': ['Jul1', 'Jul2', 'Jul8', 'Aug5', 'Aug7', 'Aug22'],
                   'id_ind': [1, 2, 1, 2, 3, 1]})
I am trying to groupby 'id' and 'pay_date'. I only want to keep df['id_ind'].nlargest(2) in the dataframe after grouping by 'id' and 'pay_date'. Here is my code:
df = pd.DataFrame(df.groupby(['id', 'pay_date'])['id_ind'].apply(
    lambda x: x.nlargest(2)).reset_index())
This does not work, as the new df returns all the records. If it worked, 'id'==2 would only appear twice in the df, as there are 3 records and I only want the 2 largest by 'id_ind'.
My desired output:
pd.DataFrame({'id': [1, 1, 2, 2, 3],
'pay_date': ['Jul1', 'Jul2', 'Aug5', 'Aug7', 'Aug22'],
'id_ind': [1, 2, 2, 3, 1]})
Sort on id_ind, then use groupby.tail to keep the last two rows (the two largest id_ind values) of each id:
df_final = (df.sort_values('id_ind').groupby('id').tail(2)
              .sort_index()
              .reset_index(drop=True))
Out[29]:
   id  id_ind pay_date
0   1       1     Jul1
1   1       2     Jul2
2   2       2     Aug5
3   2       3     Aug7
4   3       1    Aug22
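The attempt in the question returns every record because each ('id', 'pay_date') combination is unique in this data, so every group holds a single row. Grouping by 'id' alone lets nlargest do the selection. The following is a sketch of that variant; the names `top2` and `keep` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3],
                   'pay_date': ['Jul1', 'Jul2', 'Jul8', 'Aug5', 'Aug7', 'Aug22'],
                   'id_ind': [1, 2, 1, 2, 3, 1]})

# Two largest id_ind values per id; the result is indexed by (id, original row label)
top2 = df.groupby('id')['id_ind'].nlargest(2)

# Use the original row labels (second index level) to select the full rows
keep = top2.index.get_level_values(1)
df_final = df.loc[keep].sort_index().reset_index(drop=True)
print(df_final)
```

The sort_index call restores the original row order before the index is reset.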

Maximum of an array constituting a pandas dataframe cell

I have a pandas dataframe in which a column is formed by arrays. So every cell is an array.
Say there is a column A in dataframe df, such that
A = [ [1, 2, 3],
[4, 5, 6],
[7, 8, 9],
... ]
I want to operate in each array and get, e.g. the maximum of each array, and store it in another column.
In the example, I would like to obtain another column
B = [ 3,
6,
9,
...]
I have tried these approaches so far, but none of them gives what I want.
df['B'] = np.max(df['A'])
df.applymap(lambda B: A.max())
df['B'] = df.applymap(lambda B: np.max(np.array(df['A'].tolist()), 0))
How should I proceed? And is this the best way to have my dataframe organized?
You can just apply(max). It doesn't matter if the values are lists or np.array.
df = pd.DataFrame({'a': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]})
df['b'] = df['a'].apply(max)
print(df)
Outputs
           a  b
0  [1, 2, 3]  3
1  [4, 5, 6]  6
2  [7, 8, 9]  9
Here is one way without apply:
df['B'] = np.max(df['A'].values.tolist(), axis=1)
           A  B
0  [1, 2, 3]  3
1  [4, 5, 6]  6
2  [7, 8, 9]  9
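One difference between the two answers: the np.max version with axis=1 first stacks the cells into a 2-D array, so it needs every row to have the same length, while the element-wise apply treats each cell on its own. This small sketch uses made-up ragged data (not from the question) to show the apply variant.

```python
import pandas as pd

# Hypothetical data: the lists in column A have different lengths
df = pd.DataFrame({'A': [[1, 2, 3], [4, 5], [7, 8, 9, 10]]})

# The built-in max is called once per cell, so row lengths may differ
df['B'] = df['A'].apply(max)

print(df['B'].tolist())
```

When all rows do have the same length, the vectorized NumPy version is usually the faster choice on large frames.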

Duplicating rows with certain value in a column

I have to duplicate rows that have a certain value in a column and replace the value with another value.
For instance, I have this data:
import pandas as pd
df = pd.DataFrame({'Date': [1, 2, 3, 4], 'B': [1, 2, 3, 2], 'C': ['A','B','C','D']})
Now, I want to duplicate the rows that have 2 in column 'B' then change 2 to 4
df = pd.DataFrame({'Date': [1, 2, 2, 3, 4, 4], 'B': [1, 2, 4, 3, 2, 4], 'C': ['A','B','B','C','D','D']})
Please help me on this one. Thank you.
DataFrame.append was removed in pandas 2.0, so use pd.concat to add the rows where B == 2, which you can select with a boolean mask and change to B = 4 with assign. If order matters, you can then sort by C (to reproduce your desired frame); a stable sort keeps each copy right after its original row:
>>> pd.concat([df, df[df.B.eq(2)].assign(B=4)]).sort_values('C', kind='mergesort')
   Date  B  C
0     1  1  A
1     2  2  B
1     2  4  B
2     3  3  C
3     4  2  D
3     4  4  D
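If column C were not sorted to begin with, ordering by the original row position may be closer to the intent. This sketch uses pd.concat followed by a stable sort on the index, so each duplicate lands directly after its source row. The name `dupes` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Date': [1, 2, 3, 4],
                   'B': [1, 2, 3, 2],
                   'C': ['A', 'B', 'C', 'D']})

# Copies of the rows where B == 2, with B replaced by 4
dupes = df[df['B'].eq(2)].assign(B=4)

# Append the copies, then use a stable sort so each copy follows its original row
result = (pd.concat([df, dupes])
            .sort_index(kind='mergesort')
            .reset_index(drop=True))
print(result)
```

The reset_index(drop=True) call replaces the duplicated index labels with a fresh 0..n-1 range.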

Convert pandas dataframe to a column-based order list

I want to convert pandas dataframe into a list.
For example, I have a dataframe like below, and I want to make list with all columns.
Dataframe (df)
A  B   C
0  4   8
1  5   9
2  6  10
3  7  11
Expected result
[[0,1,2,3], [4,5,6,7], [8,9,10,11]]
If I use df.values.tolist(), it will return in row-based order list like below.
[[0,4,8], [1,5,9], [2,6,10], [3,7,11]]
It is possible to transpose the dataframe, but I want to know whether there are better solutions.
I think the simplest way is to transpose.
Use T or numpy.ndarray.transpose:
df1 = df.T.values.tolist()
print (df1)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Or:
df1 = df.values.transpose().tolist()
print (df1)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
Another option uses a list comprehension (thank you, John Galt):
L = [df[x].tolist() for x in df.columns]
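The transpose and list-comprehension variants can behave differently when the columns have different dtypes. Transposing goes through a single array with one common dtype, while the comprehension converts each column separately. This sketch uses a made-up mixed-dtype frame (not from the question); the names `via_transpose` and `via_columns` are illustrative.

```python
import pandas as pd

# Hypothetical frame with one integer column and one float column
df = pd.DataFrame({'A': [0, 1], 'B': [0.5, 1.5]})

# Transposing forces a common dtype, so the integers in A become floats
via_transpose = df.T.values.tolist()

# Converting column by column keeps each column's own type
via_columns = [df[c].tolist() for c in df.columns]

print(via_transpose)
print(via_columns)
```

For a frame whose columns all share one dtype, as in the question, both variants give the same result.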
