if-else for multiple conditions dataframe [duplicate] - python

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I don't know how to properly write the following idea:
I have a dataframe that has two columns, and many many rows.
I want to create a new column based on the data in these two columns, such that if either of them contains 1 the new value will be 1, otherwise 0.
Something like that:
if (df['col1']==1 | df['col2']==1):
df['newCol']=1
else:
df['newCol']=0
I tried to use the .loc function in different ways but I get different errors, so either I'm not using it correctly, or this is not the right solution...
Would appreciate your help. Thanks!

Simply use np.where or np.select
df['newCol'] = np.where((df['col1']==1) | (df['col2']==1), 1, 0)
OR
df['newCol'] = np.select([cond1, cond2, cond3], [choice1, choice2, choice3], default=def_value)
With np.select, wherever a condition is true the corresponding choice is used; rows that match none of the conditions get the default value.
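A runnable sketch of the np.select form, using hypothetical data shaped like the question's col1/col2:

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the question's col1/col2 layout
df = pd.DataFrame({'col1': [1, 0, 0, 1], 'col2': [0, 1, 0, 1]})

# Conditions are paired positionally with choices; the first condition
# that is True wins, otherwise `default` is used.
conditions = [(df['col1'] == 1) | (df['col2'] == 1)]
choices = [1]
df['newCol'] = np.select(conditions, choices, default=0)
print(df['newCol'].tolist())  # [1, 1, 0, 1]
```

With a single condition this is equivalent to the np.where call above; np.select earns its keep once you have several mutually exclusive conditions.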

One way to solve this using .loc:
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = 1
df['newCol'].fillna(0, inplace=True)
In case you want newCol as a string, use:
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = '1'
df['newCol'].fillna('0', inplace=True)
or
df['newCol']=df['newCol'].astype(str)
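Since the target values are just 0 and 1, the boolean mask itself can also be cast directly, which skips both np.where and the fillna step. A minimal sketch with assumed sample data:

```python
import pandas as pd

# Assumed sample data for illustration
df = pd.DataFrame({'col1': [1, 0, 2], 'col2': [0, 1, 3]})

# A boolean Series cast to int becomes 0/1 directly
df['newCol'] = ((df['col1'] == 1) | (df['col2'] == 1)).astype(int)
print(df['newCol'].tolist())  # [1, 1, 0]
```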


Is loc an optional attribute when searching dataframe? [duplicate]

This question already has answers here:
What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?
(4 answers)
Closed 1 year ago.
Both the following lines seem to give the same output:
df1 = df[df['MRP'] > 1500]
df1 = df.loc[df['MRP'] > 1500]
Is loc an optional attribute when searching dataframe?
From the pandas.DataFrame.loc documentation:
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean
array.
When you are using a boolean array to filter out data, .loc is optional: in your example, df['MRP'] > 1500 gives a boolean Series, so it's not necessary to use .loc in that case.
df[df['MRP']>15]
MRP cat
0 18 A
3 19 D
6 18 C
But if you want to access some other columns where this Boolean Series has True value, then you may use .loc:
df.loc[df['MRP']>15, 'cat']
0 A
3 D
6 C
Or, if you want to change the values where the condition is True:
df.loc[df['MRP']>15, 'cat'] = 'found'
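The outputs above come from a small frame that isn't shown; a reconstruction (the exact values are an assumption, chosen to match the printed rows) makes the examples runnable end to end:

```python
import pandas as pd

# Reconstructed sample frame consistent with the outputs shown above
df = pd.DataFrame({'MRP': [18, 10, 12, 19, 14, 11, 18],
                   'cat': list('ABCDEFC')})

mask = df['MRP'] > 15

# For plain row filtering the two spellings are identical
print(df[mask].equals(df.loc[mask]))  # True

# But selecting a column at the same time, or assigning, needs .loc
df.loc[mask, 'cat'] = 'found'
print(df['cat'].tolist())
```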

create a new column which is a value_counts of another column in python [duplicate]

This question already has answers here:
pandas add column to groupby dataframe
(3 answers)
Closed 2 years ago.
I have a pandas dataframe df that contains a column, say x, and I would like to create another column out of x which is the value_count of each item in x.
Here is my approach
x_counts= []
for item in df['x']:
item_count = len(df[df['x']==item])
x_counts.append(item_count)
df['x_count'] = x_counts
This works but it is far from efficient. I am looking for a more efficient way to handle this. Your approach and recommendations are highly appreciated.
It sounds like you are looking for the groupby function to get the count of items in x.
There are many other function-driven methods, but they may differ across versions.
I suppose that you are looking to join the same elements and find their sum:
df.loc[:, 'x_count'] = 1  # add an x_count column with value 1 in every row
aggregate_functions = {"x_count": "sum"}
df = df.groupby(["x"], as_index=False, sort=False).aggregate(aggregate_functions)  # as_index=False keeps x as a regular column instead of making it the index
Hope it helps.
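Note that the groupby above collapses the frame to one row per unique x. If, as in the question, you want the count attached to every original row, groupby + transform broadcasts the group size back onto the original index. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'x': ['a', 'b', 'a', 'a', 'b']})

# transform('count') returns a Series aligned with df's index,
# repeating each group's size for every member row
df['x_count'] = df.groupby('x')['x'].transform('count')
print(df['x_count'].tolist())  # [3, 2, 3, 3, 2]
```

An equivalent spelling is df['x'].map(df['x'].value_counts()); both avoid the per-row loop entirely.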

how to get one number from pandas sum / is in function [duplicate]

This question already has an answer here:
Count occurrences of certain string in entire pandas dataframe
(1 answer)
Closed 2 years ago.
Suppose I want to find the number of occurrences of something in a pandas dataframe as one number.
If I do df.isin(["ABC"]).sum() it gives me a table of all occurrences of "ABC" under each column.
What do I do if I want just one number which is the number of "ABC" entries under column 1?
Moreover, is there code to find entries that have both "ABC" under say column 1 and "DEF" under column 2. even this should just be a single number of entries/rows that have both of these.
You can check with groupby + size
out = df.groupby(['col1', 'col2']).size()
print(out.loc[('ABC','DEF')])
Q1: I'm sure there are more sophisticated ways of doing this, but you can do something like:
num_occurences = data[(data['column_name'] == 'ABC')]
len(num_occurences.index)
Q2: To add in 'DEF' search, you can try
num_occurences = data[(data['column_name'] == 'ABC') & (data['column_2_name'] == 'DEF')]
len(num_occurences.index)
I know this works for quantitative values; you'll need to see with qualitative.
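Both counts can also be collapsed to a single number by summing a boolean mask, since True counts as 1. A sketch under assumed column names and data:

```python
import pandas as pd

# Hypothetical data; column names are assumptions for illustration
df = pd.DataFrame({'col1': ['ABC', 'ABC', 'XYZ'],
                   'col2': ['DEF', 'GHI', 'DEF']})

# Summing a boolean Series yields the number of True entries
n_abc = (df['col1'] == 'ABC').sum()
n_both = ((df['col1'] == 'ABC') & (df['col2'] == 'DEF')).sum()
print(n_abc, n_both)  # 2 1
```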

Conditional Statement with a "wildcard" [duplicate]

This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 2 years ago.
In table A, there’s columns 1 and 2.
Column 1 is unique id’s like (‘A12324’) and column 2 is blank for now.
I want to fill the value of column 2 with Yes if the id starts with A, and else No.
Is anyone familiar with how I can maybe use a left() for this?
I tried this, but the error read that left is not defined:
TableA.loc[TableA['col1'] == left('A',1), 'hasAnA'] = 'Yes'
You can use the pd.Series.str.startswith() method:
>>> frame = pd.DataFrame({'colA': ['A12342', 'B123123231'], 'colB': False})
>>> condition = frame['colA'].str.startswith('A')
>>> frame.loc[condition, 'colB'] = 'Yes'
>>> frame
colA colB
0 A12342 Yes
1 B123123231 False
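Since the question asks for Yes/No in every row (not just the matching ones), np.where fills both branches in one pass and avoids the leftover False values. A sketch with hypothetical ids:

```python
import numpy as np
import pandas as pd

# Hypothetical ids shaped like the question's column 1
TableA = pd.DataFrame({'col1': ['A12324', 'B98765']})

# str.startswith gives a boolean Series; np.where maps it to Yes/No
TableA['col2'] = np.where(TableA['col1'].str.startswith('A'), 'Yes', 'No')
print(TableA['col2'].tolist())  # ['Yes', 'No']
```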

Pandas: Add a scalar to multiple new columns in an existing dataframe [duplicate]

This question already has answers here:
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 4 years ago.
I recently answered a question where the OP was looking to add multiple columns with multiple different values to an existing dataframe (link). And it's fairly succinct, but I don't think very fast.
Ultimately I was hoping I could do something like:
# Existing dataframe
df = pd.DataFrame({'a':[1,2]})
df[['b','c']] = 0
Which would result in:
a b c
1 0 0
2 0 0
But it throws an error.
Is there a super simple way to do this that I'm missing? Or is the answer I posted earlier the fastest / easiest way?
NOTE
I understand this could be done via loops, or via assigning scalars to multiple columns, but am trying to avoid that if possible. Assume 50 columns or whatever number you wouldn't want to write:
df['b'], df['c'], ..., df['xyz'] = 0, 0, ..., 0
Not a duplicate:
The "Possible duplicate" question suggested to this shows multiple different values assigned to each column. I'm simply asking if there is a very easy way to assign a single scalar value to multiple new columns. The answer could correctly and very simply be, "No" - but worth knowing so I can stop searching.
Why not use assign?
df.assign(**dict.fromkeys(['b','c'],0))
Out[781]:
a b c
0 1 0 0
1 2 0 0
Or create the dict with d = dict(zip(namelist, valuelist))
I think you want to do
df['b'], df['c'] = 0, 0
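For the 50-column case the NOTE mentions, the assign + dict.fromkeys pattern scales by generating the names programmatically. A sketch (the col0..col49 naming scheme is an assumption):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

# Generate the new column names, then fan the scalar 0 out to all of
# them in a single assign call via dict.fromkeys
new_cols = [f'col{i}' for i in range(50)]
df = df.assign(**dict.fromkeys(new_cols, 0))
print(df.shape)  # (2, 51)
```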
