Select a column based on the name of a row (pandas) [duplicate]

I'm trying to select, for each row, the value from the column whose name is stored in that row, using pandas.
For example:
names  A  B  C  D
    A  1  0  2  0
    A  2  1  0  0
    C  1  0  4  0
    A  2  0  0  0
    D  0  1  2  5
As output I want something like:
name  value
   A      1
   A      2
   C      4
   A      2
   D      5
Is there a fast (efficient) way to do such an operation?

Try lookup:
df.lookup(df.index, df.names)
Out[390]: array([1, 2, 4, 2, 5], dtype=int64)
# to assign it back as a column:
# df['value'] = df.lookup(df.index, df.names)
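Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0. A sketch of the replacement suggested in the pandas deprecation notes, using factorize plus NumPy fancy indexing:
import numpy as np
import pandas as pd

# equivalent of df.lookup(df.index, df.names) on modern pandas
idx, cols = pd.factorize(df['names'])
df['value'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]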

You can, for example, use the multi-purpose apply function:
import pandas

data = {
    'names': ['A', 'A', 'C', 'A', 'D'],
    'A': [1, 2, 1, 2, 0],
    'B': [0, 1, 0, 0, 1],
    'C': [2, 0, 4, 0, 2],
    'D': [0, 0, 0, 0, 5],
}
df = pandas.DataFrame(data)
print(df)

def process_it(row):
    # pick the value of the column whose name is stored in this row's 'names'
    return row[row['names']]

df['selection'] = df.apply(process_it, axis=1)
print(df[['names', 'selection']])

Related

Pandas DataFrame filter by multiple column criteria and multiple intervals

I have checked several answers but have had no luck so far.
My dataset is like this:
df = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Place': [1, 2, 3, 4, 2, 3],
    'Value1': [1, 1, 2, 3, 4, 5],
    'Value2': [1, 1, 2, 3, 4, 5]
}, columns=['Location', 'Place', 'Value1', 'Value2'])
Location  Place  Value1  Value2
       A      1       1       1
       A      2       1       1
       A      3       2       2
       B      4       3       3
       C      2       4       4
       C      3       5       5
and I have a list of intervals:
A: [0, 1]
A: [3, 5]
B: [1, 3]
C: [1, 4]
C: [6, 10]
Now I want to keep every row whose Location appears in the filter list and whose Place falls within one of that Location's intervals. So the desired output will be:
Location  Place  Value1  Value2
       A      1       1       1
       A      3       2       2
       C      2       4       4
       C      3       5       5
I know that I can chain multiple between conditions with |, but I have a really long list of intervals, so entering the conditions manually is not feasible. I also considered a for loop to slice the data by location first, but I think there could be a more efficient way.
Thank you for your help.
Edit: Currently the list of intervals is just strings like this:
A 0 1
A 3 5
B 1 3
C 1 4
C 6 10
but I would like to parse them into a list of dicts. Suggestions for a better structure are also welcome!
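A minimal parsing sketch (my addition, not from the original post) that turns those raw lines into the dict-of-lists structure used by the second answer below:
raw = """A 0 1
A 3 5
B 1 3
C 1 4
C 6 10"""

intervals = {}
for line in raw.splitlines():
    loc, lo, hi = line.split()
    intervals.setdefault(loc, []).append([int(lo), int(hi)])

# intervals == {'A': [[0, 1], [3, 5]], 'B': [[1, 3]], 'C': [[1, 4], [6, 10]]}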
First define the dataframe df and the filters dff:
df = pd.DataFrame({
    'Location': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Place': [1, 2, 3, 4, 2, 3],
    'Value1': [1, 1, 2, 3, 4, 5],
    'Value2': [1, 1, 2, 3, 4, 5]
}, columns=['Location', 'Place', 'Value1', 'Value2'])

dff = pd.DataFrame({'Location': ['A', 'A', 'B', 'C', 'C'],
                    'fPlace': [[0, 1], [3, 5], [1, 3], [1, 4], [6, 10]]})
dff[['p1', 'p2']] = pd.DataFrame(dff['fPlace'].to_list())
now dff is:
  Location   fPlace  p1  p2
0        A   [0, 1]   0   1
1        A   [3, 5]   3   5
2        B   [1, 3]   1   3
3        C   [1, 4]   1   4
4        C  [6, 10]   6  10
where fPlace has been split into the lower and upper bounds p1 and p2, the filter that should be applied to Place. Next:
df.merge(dff).query('Place >= p1 and Place <= p2').drop(columns=['fPlace', 'p1', 'p2'])
result:
  Location  Place  Value1  Value2
0        A      1       1       1
5        A      3       2       2
7        C      2       4       4
9        C      3       5       5
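One caveat (my note, not part of the original answer): the merge duplicates each df row once per interval listed for its Location, so a Place falling inside two overlapping intervals would appear twice in the result. A drop_duplicates guards against that, and query supports the chained comparison directly:
result = (df.merge(dff)
            .query('p1 <= Place <= p2')
            .drop(columns=['fPlace', 'p1', 'p2'])
            .drop_duplicates())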
Prerequisites:
# presumed setup for your intervals:
intervals = {
    "A": [
        [0, 1],
        [3, 5],
    ],
    "B": [
        [1, 3],
    ],
    "C": [
        [1, 4],
        [6, 10],
    ],
}
Actual solution:
# map each Location to its list of intervals, then explode to one row per interval
x = df["Location"].map(intervals).explode().str
l, r = x[0], x[1]  # lower and upper bounds, index-aligned to df (with duplicates)
res = df["Place"].loc[l.index].between(l, r)  # test Place against every interval
res = res.loc[res].index.unique()  # rows matching at least one of their intervals
res = df.loc[res]
Outputs:
>>> res
  Location  Place  Value1  Value2
0        A      1       1       1
2        A      3       2       2
4        C      2       4       4
5        C      3       5       5

pandas groupby & lambda function to return nlargest(2)

Please see this pandas df:
df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3],
                   'pay_date': ['Jul1', 'Jul2', 'Jul8', 'Aug5', 'Aug7', 'Aug22'],
                   'id_ind': [1, 2, 1, 2, 3, 1]})
I am trying to group by 'id' and 'pay_date' and keep only df['id_ind'].nlargest(2) in the dataframe after grouping. Here is my code:
df = pd.DataFrame(df.groupby(['id', 'pay_date'])['id_ind'].apply(
    lambda x: x.nlargest(2))).reset_index()
This does not work, as the new df returns all the records. If it worked, 'id' == 2 would only appear twice in the df, as there are 3 records and I only want the 2 largest by 'id_ind'.
My desired output:
pd.DataFrame({'id': [1, 1, 2, 2, 3],
              'pay_date': ['Jul1', 'Jul2', 'Aug5', 'Aug7', 'Aug22'],
              'id_ind': [1, 2, 2, 3, 1]})
Sort on id_ind and use groupby.tail. (Grouping by both id and pay_date makes every group a single row, which is why nlargest(2) kept everything; group by id alone instead.)
df_final = (df.sort_values('id_ind').groupby('id').tail(2)
              .sort_index()
              .reset_index(drop=True))
Out[29]:
   id  id_ind pay_date
0   1       1     Jul1
1   1       2     Jul2
2   2       2     Aug5
3   2       3     Aug7
4   3       1    Aug22
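For completeness, the asker's nlargest idea also works once the grouping is on id alone; a sketch (my addition):
# SeriesGroupBy.nlargest returns a Series indexed by (id, original row label);
# the second index level tells us which rows to keep
keep = df.groupby('id')['id_ind'].nlargest(2).index.get_level_values(1)
df_final = df.loc[keep].sort_index().reset_index(drop=True)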

Duplicating rows with certain value in a column

I have to duplicate rows that have a certain value in a column and replace the value with another value.
For instance, I have this data:
import pandas as pd
df = pd.DataFrame({'Date': [1, 2, 3, 4], 'B': [1, 2, 3, 2], 'C': ['A','B','C','D']})
Now, I want to duplicate the rows that have 2 in column 'B' and then change the 2 to 4:
df = pd.DataFrame({'Date': [1, 2, 2, 3, 4, 4], 'B': [1, 2, 4, 3, 2, 4], 'C': ['A','B','B','C','D','D']})
Please help me on this one. Thank you.
You can use append to append the rows where B == 2 (selected with a boolean mask) after reassigning B to 4 with assign. If order matters, you can then sort by C to reproduce your desired frame:
>>> df.append(df[df.B.eq(2)].assign(B=4)).sort_values('C')
   B  C  Date
0  1  A     1
1  2  B     2
1  4  B     2
2  3  C     3
3  2  D     4
3  4  D     4
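Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat is the drop-in replacement (a sketch for modern pandas):
out = pd.concat([df, df[df.B.eq(2)].assign(B=4)]).sort_values('C')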

Pandas change rows in DataFrame without assignment

Say we have a Python Pandas DataFrame:
In[1]: df = pd.DataFrame({'A': [1, 1, 2, 3, 5],
                          'B': [5, 6, 7, 8, 9]})
In[2]: print(df)
   A  B
0  1  5
1  1  6
2  2  7
3  3  8
4  5  9
I want to change rows that match a certain condition. I know that this can be done via direct assignment:
In[3]: df[df.A==1] = pd.DataFrame([{'A': 0, 'B': 5},
                                   {'A': 0, 'B': 6}])
In[4]: print(df)
   A  B
0  0  5
1  0  6
2  2  7
3  3  8
4  5  9
My question is: Is there an equivalent solution to the above assignment that would return a new DataFrame with the rows changed, i.e. a stateless solution? I'm looking for something like pandas.DataFrame.assign but which acts on rows instead of columns.
DataFrame.copy
df2 = df.copy()
df2[df.A == 1] = pd.DataFrame([{'A': 0, 'B': 5}, {'A': 0, 'B': 6}])
DataFrame.mask + fillna
m = df.A == 1
fill_df = pd.DataFrame([{'A': 0, 'B': 5}, {'A': 0, 'B': 6}], index=df.index[m])
df2 = df.mask(m).fillna(fill_df)
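One caveat (my addition): mask first replaces the matched rows with NaN, which upcasts the integer columns to float before fillna refills them. Casting back restores the original dtypes:
m = df.A == 1
fill_df = pd.DataFrame([{'A': 0, 'B': 5}, {'A': 0, 'B': 6}], index=df.index[m])
df2 = df.mask(m).fillna(fill_df).astype(df.dtypes)  # df.dtypes maps column -> dtype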

Pandas: Compare every two rows and output result to a new dataframe

import pandas as pd
df1 = pd.DataFrame({'ID': ['i1', 'i2', 'i3'],
                    'A': [2, 3, 1],
                    'B': [1, 1, 2],
                    'C': [2, 1, 0],
                    'D': [3, 1, 2]})
df1 = df1.set_index('ID')  # set_index returns a new frame; assign it back
df1.head()
    A  B  C  D
ID
i1  2  1  2  3
i2  3  1  1  1
i3  1  2  0  2
df2 = pd.DataFrame({'ID': ['i1-i2', 'i1-i3', 'i2-i3'],
                    'A': [2, 1, 1],
                    'B': [1, 1, 1],
                    'C': [1, 0, 0],
                    'D': [1, 1, 1]})
df2 = df2.set_index('ID')
df2
       A  B  C  D
ID
i1-i2  2  1  1  1
i1-i3  1  1  0  1
i2-i3  1  1  0  1
Given a data frame df1, I want to compare every pair of distinct rows, take the smaller value in each column, and output the result to a new data frame like df2.
For example, comparing row i1 with row i2 gives the new row i1-i2 as 2, 1, 1, 1.
Please advise on the best way to do this in pandas.
Try this:
import numpy as np
from itertools import combinations

v = df1.values
r = pd.DataFrame([np.minimum(v[t[0]], v[t[1]])
                  for t in combinations(np.arange(len(df1)), 2)],
                 columns=df1.columns,
                 index=list(combinations(df1.index, 2)))
Result:
In [72]: r
Out[72]:
          A  B  C  D
(i1, i2)  2  1  1  1
(i1, i3)  1  1  0  2
(i2, i3)  1  1  0  1
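A fully vectorized variant (my sketch, reusing v from above) that also produces the 'i1-i2'-style labels the question asked for:
import numpy as np
from itertools import combinations

i, j = np.array(list(combinations(range(len(df1)), 2))).T  # row positions of each pair
r = pd.DataFrame(np.minimum(v[i], v[j]),
                 columns=df1.columns,
                 index=['-'.join(p) for p in combinations(df1.index, 2)])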
