Pandas Row-Indexing without looping - python

I'm trying to select rows with pandas indexing, but I can't find a way to pass in a list of values to match. These are my attempts without loops:
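import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 9], 'b': [3, 3, 4, 5, 100]})
print(df)
interest = [3, 4]
# Attempts that do not give the desired rows:
# results = df['a'].eq(interest)                 # fails: the list length differs from the column length
# results = df[(df['a'] == 3) & (df['a'] == 4)]  # always empty: 'a' cannot be 3 and 4 at once
# print(results)
# print(df[df['b'] == 3]) # index 0 1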
With loops, I'm able to get my desired result.
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 9], 'b': [3, 3, 4, 5, 100]})
print(df)
lst = [3, 4]
print('index values are : {}'.format(lst))
results = pd.DataFrame()
for itr in lst:
    if results.empty:
        results = df[df['a'] == itr]
    else:
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        results = pd.concat([results, df[df['a'] == itr]])
print('result : \n{}'.format(results))
I've searched, but most of the documentation I found either filters on both columns 'a' and 'b' or matches only one value at a time rather than a list. Let me know if anything is unclear.

IIUC you want .isin?
>>> df[df.a.isin([3,4])]
   a  b
2  3  4
3  4  5
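A slightly fuller sketch of the same idea, reusing the question's `df` and `interest` list. The `~` negation at the end is an illustrative extra, not something the question asked for.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 9], 'b': [3, 3, 4, 5, 100]})
interest = [3, 4]

# Boolean mask: True where column 'a' holds any value from the list
mask = df['a'].isin(interest)

results = df[mask]   # rows whose 'a' is 3 or 4
others = df[~mask]   # illustrative extra: rows whose 'a' is not in the list

print(results)
print(others)
```

Because `isin` returns one boolean per row, the mask can be combined with other conditions using `&` and `|`.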

Related

How to change values in one column based on whether conditions in columns A and B are met in Pandas/Python

I have a data frame. I want to change values in column C to null values based on whether conditions in columns A and B are met. To do this, I think I need to iterate over the rows of the dataframe, but I can't figure out how:
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
I tried something like this:
for row in df.iterrows()
    if df['A'] > 2 and df['B'] == 3:
        df['C'] == np.nan
but I just keep getting errors. Could someone please show me how to do this?
Yours is not a DataFrame, it's a dictionary. This is a DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
It is usually faster to use pandas/numpy arithmetic instead of regular Python loops.
df.loc[(df['A'].values > 2) & (df['B'].values == 3), 'C'] = np.nan
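As a self-contained sketch of the vectorized approach, the condition can be built as a named boolean mask first. The cast of column C to float is an assumption added here so the column can hold NaN regardless of how strictly the installed pandas version treats dtype changes on assignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})

# Assumption: cast C to float up front so assigning NaN needs no implicit upcast
df['C'] = df['C'].astype(float)

# One boolean per row; & is the element-wise "and" (parentheses are required)
cond = (df['A'] > 2) & (df['B'] == 3)

# Assign NaN to C only in the rows where the condition holds
df.loc[cond, 'C'] = np.nan

print(df)
```

Only the last row has A > 2 and B == 3, so it is the only row whose C value is replaced.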
Or if you insist on your way of coding, the code (besides converting df to a real DataFrame) can be updated to:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
for i, row in df.iterrows():
    if row.loc['A'] > 2 and row.loc['B'] == 3:
        df.loc[i, 'C'] = np.nan
or
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1, 4, 1, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]})
for i, row in df.iterrows():
    if df.loc[i, 'A'] > 2 and df.loc[i, 'B'] == 3:
        df.loc[i, 'C'] = np.nan
You can try
df.loc[(df["A"].values > 2) & (df["B"].values==3), "C"] = None
Using pandas and numpy is way easier for you :D

Is there an elegant way to iterate over index and one column of a pandas dataframe?

I'd like to have a loop that iterates over both the index and the entries in one specific column of a dataframe. I've found a solution that works, but I feel there should be something more elegant. Any suggestions?
Working example:
import pandas as pd
df = pd.DataFrame(index = [10, 20, 30])
df['A'] = [1, 2, 3]
df['B'] = [5, 7, 9]
# This is the part that feels like it could be more elegant
for i, v in zip(df.index, df['A']):
    print(i, v)
A DataFrame column is a Series, which has a dictionary-like interface for this purpose. You can use df['A'].items():
import pandas as pd
df = pd.DataFrame(index=[10, 20, 30])
df['A'] = [1, 2, 3]
df['B'] = [5, 7, 9]
for i, v in df['A'].items():
    print(i, v)
10 1
20 2
30 3
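As a small sketch of what `items()` hands back, the pairs can be collected into a list. The `itertuples()` loop is added here only as an illustrative comparison for the case where more than one column is needed; it is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [5, 7, 9]}, index=[10, 20, 30])

# Series.items() yields (index label, value) pairs for a single column
pairs = list(df['A'].items())
print(pairs)

# Illustrative comparison: itertuples() exposes the index and several columns per row
for row in df[['A', 'B']].itertuples():
    print(row.Index, row.A, row.B)
```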

Pandas Efficient Filtering: Same filter condition on multiple columns

Say I have the data below:
df = pd.DataFrame({'col1': [1, 2, 1],
                   'col2': [2, 4, 3],
                   'col3': [3, 6, 5],
                   'col4': [4, 8, 7]})
Is there a way to use list comprehensions to filter data efficiently? For example, if I wanted to find all rows where col2 was even OR col3 was even OR col4 was even, is there a simpler way than just writing this?
df[(df['col2'] % 2 == 0) | (df['col3'] % 2 == 0) | (df['col4'] % 2 == 0)]
It would be nice if I could pass in a list of columns and the condition to check.
df[(df[cols] % 2 == 0).any(axis=1)]
where cols is your list of columns
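A runnable sketch of that one-liner on the question's data, split into steps. The names `is_even`, `any_even`, and `all_even` are made up for illustration, and the `all(axis=1)` variant is an extra not requested in the question.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1],
                   'col2': [2, 4, 3],
                   'col3': [3, 6, 5],
                   'col4': [4, 8, 7]})

cols = ['col2', 'col3', 'col4']

# Apply the same condition to every listed column at once (boolean DataFrame)
is_even = df[cols] % 2 == 0

# any(axis=1) collapses each row to True if at least one listed column matched
any_even = df[is_even.any(axis=1)]

# Illustrative variant: keep rows where every listed column matched
all_even = df[is_even.all(axis=1)]

print(any_even)
print(all_even)
```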

Pandas: number of rows where df['A'] == df['B'] or df['B'] == []

Consider the following dataframe df with columns A and B. I am trying to find the number of rows where df['A'] == df['B'] or df['B'] == []. How can I do this?
                            A                   B
m:QueryId
970000000  [0, 1, 2, 3, 4, 5]  [0, 1, 2, 3, 4, 5]
970000001                 [0]                 [0]
970000002     [1, 2, 3, 4, 5]                  []
970000003        [0, 1, 2, 3]                  []
970000004           [1, 2, 4]              [5, 6]
Try with :
df[df['A'].eq(df['B'])|~df['B'].astype(bool)]
For count of such rows:
(df['A'].eq(df['B'])|~df['B'].astype(bool)).sum()
IIUC, if you only want to check whether the two lists are equal, a simple comparison is enough. For the empty lists, if all elements are lists, you can just check that the length is 0:
((df['A']==df['B'])|df['B'].str.len().eq(0)).sum()
output: 4
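The count can be reproduced with a small sketch that rebuilds the question's table by hand. The construction of `df` below is an assumption about how the data is stored (plain Python lists in object columns).

```python
import pandas as pd

# Assumption: each cell holds a plain Python list
df = pd.DataFrame(
    {'A': [[0, 1, 2, 3, 4, 5], [0], [1, 2, 3, 4, 5], [0, 1, 2, 3], [1, 2, 4]],
     'B': [[0, 1, 2, 3, 4, 5], [0], [], [], [5, 6]]},
    index=pd.Index([970000000, 970000001, 970000002, 970000003, 970000004],
                   name='m:QueryId'))

# Element-wise comparison: each pair of lists is compared with ==
same = df['A'] == df['B']

# .str.len() also works on list elements, so length 0 marks an empty list
empty_b = df['B'].str.len().eq(0)

# True counts as 1, so summing the combined mask gives the number of rows
count = int((same | empty_b).sum())
print(count)
```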

Drop columns if all entries in column match item from a list in Pandas

I have a dataframe that I am trying to drop some columns from, based on their content. If all of the rows in a column have the same value as one of the items in a list, then I want to drop that column. I am having trouble doing this without messing up the loops. Is there a better way to do this, or some error I can fix? I am getting an error that says:
IndexError: index 382 is out of bounds for axis 0 with size 382
Code:
def trimADAS(df):
    notList = ["Word Recall Test","Result"]
    print("START")
    print(len(df.columns))
    numCols = len(df.columns)
    for h in range(numCols): # for every column
        for i in range(len(notList)): # for every list item
            if df[df.columns[h]].all() == notList[i]: # if all column entries == list item
                print(notList[i]) # print list item
                print(df[df.columns[h]]) # print column
                print(df.columns[h]) # print column name
                df.drop([df.columns[h]], axis = 1, inplace = True) # drop this column
                numCols -= 1
    print("END")
    print(len(df.columns))
    print(df.columns)
    return()
Loopy isn't usually the way to go with pandas. Here's one solution.
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 1, 1],
                   'B': [2, 2, 2, 2],
                   'C': [3, 3, 3, 3],
                   'D': [4, 4, 4, 4],
                   'E': [5, 5, 5, 5]})
lst = [2, 3, 4]
# Drop every column whose entries all equal one of the items in lst
df = df.drop([x for x in df if any((df[x]==i).all() for i in lst)], axis=1)
#    A  E
# 0  1  5
# 1  1  5
# 2  1  5
# 3  1  5
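The same approach can be sketched with the string values from the question. The column names and sample rows below are made up for illustration. Collecting the column names first and dropping them in one call avoids shrinking the frame while indexing into it by position, which is what triggers the IndexError in the original loop.

```python
import pandas as pd

# Hypothetical sample data; only not_list comes from the question
df = pd.DataFrame({'test': ['Word Recall Test'] * 3,
                   'score': [4, 7, 5],
                   'label': ['Result'] * 3,
                   'note': ['Result', 'ok', 'Result']})

not_list = ['Word Recall Test', 'Result']

# A column qualifies only if every one of its entries equals the same list item
to_drop = [col for col in df.columns
           if any((df[col] == item).all() for item in not_list)]

# Drop all qualifying columns in a single call, returning a new frame
trimmed = df.drop(columns=to_drop)

print(to_drop)
print(trimmed)
```

The `note` column survives because only some of its entries match a list item.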
