Conditional Statement with a "wildcard" [duplicate] - python

This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 2 years ago.
In table A, there are two columns, 1 and 2.
Column 1 holds unique ids such as 'A12324', and column 2 is blank for now.
I want to fill column 2 with Yes if the id starts with A, and No otherwise.
Is anyone familiar with how I could use something like a left function for this?
I tried this, but the error says that left is not defined.
TableA.loc[TableA['col1'] == left('A',1), 'hasAnA'] = 'Yes'

You can use the pd.Series.str.startswith() method:
>>> frame = pd.DataFrame({'colA': ['A12342', 'B123123231'], 'colB': False})
>>> condition = frame['colA'].str.startswith('A')
>>> frame.loc[condition, 'colB'] = 'Yes'
>>> frame
         colA   colB
0      A12342    Yes
1  B123123231  False
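If the non-matching rows should read No, as the question asks, one option is np.where on the same boolean mask. This is a minimal sketch, assuming made-up ids and the question's column names col1 and hasAnA:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
table_a = pd.DataFrame({"col1": ["A12324", "B99887", "A00001"]})

# Boolean Series: True where the id starts with "A".
starts_with_a = table_a["col1"].str.startswith("A")

# Take "Yes" where the mask is True and "No" everywhere else.
table_a["hasAnA"] = np.where(starts_with_a, "Yes", "No")
print(table_a)
```

This fills the whole column in one step, so no default value has to be set beforehand.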

Related

Python Panda case statement [duplicate]

This question already has answers here:
Convert categorical data in pandas dataframe
(15 answers)
Convert categorical variables from String to int representation
(3 answers)
Closed 5 months ago.
I have a column with the following values:
City
A
B
C
I want to create a heatmap but can't because this column is not an integer, so I want to convert it as follows:
city_new
1
2
3
I have tried this case statement but it does not work
df['city_new'] = np.where(df['City']='A', 1,
np.where(df['City']='B', 2,
np.where(df['City']='C', 3)))
You can use pandas.factorize, so that you don't have to write the conditions yourself (e.g. if you have 1000 different cities):
df["new_city"] = pd.factorize(df["City"])[0] + 1
Output:
  City  new_city
0    A         1
1    B         2
2    C         3
You could use replace to map A, B, C to 1, 2, 3, as in the code below:
df['city_new'] = df['City'].replace(['A','B','C'], [1,2,3])
Your code was incorrect for two reasons:
You used = instead of == to compare against the string.
You need to supply the equivalent of an 'else' value for when none of the conditions is true; this is the value 4 in the code below.
Your code should look like this:
df['City_New'] = np.where(df['City']=='A', 1, np.where(df['City']=='B', 2, np.where(df['City']=='C', 3, 4)))
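Nested np.where calls get hard to read as the number of cities grows. A flatter alternative is a lookup dict with Series.map. This is only a sketch: the dict name city_codes, the extra city "D", and the fallback code 4 are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data with one city ("D") that has no code assigned.
df = pd.DataFrame({"City": ["A", "B", "C", "D"]})

# Explicit lookup table instead of nested conditions.
city_codes = {"A": 1, "B": 2, "C": 3}

# map() gives NaN for unknown cities; fillna supplies the "else" value,
# and astype(int) turns the resulting floats back into integers.
df["city_new"] = df["City"].map(city_codes).fillna(4).astype(int)
print(df)
```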

Python Pandas - rename a value conditionally [duplicate]

This question already has answers here:
Pandas: Replace a string with 'other' if it is not present in a list of strings
(5 answers)
Closed 1 year ago.
df2 = np.where(df2['color'] != 'blue' | 'red')
I want to create one category for many categorical values, such as:
If the color is not blue or red, call the color "other"
Please and thank you <3
You are basically halfway there. You just have to pass two more arguments to np.where to achieve what you want.
df2['color'] = np.where((df2['color'] == 'blue') | (df2['color'] == 'red'), df2['color'], 'other')
Testing for equality is easier to read than testing for inequality. Where the condition is True, the existing df2['color'] value is kept; where it is False for a row, 'other' is used instead.
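With more than a couple of colors to keep, chaining == comparisons gets long. A possible variant uses Series.isin together with Series.where. The sample data and the list name keep are assumptions:

```python
import pandas as pd

# Hypothetical sample data.
df2 = pd.DataFrame({"color": ["blue", "green", "red", "yellow"]})

# Colors that should keep their own name.
keep = ["blue", "red"]

# Series.where keeps values where the condition is True
# and replaces the rest with "other".
df2["color"] = df2["color"].where(df2["color"].isin(keep), "other")
print(df2)
```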

Is loc an optional attribute when searching dataframe? [duplicate]

This question already has answers here:
What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?
(4 answers)
Closed 1 year ago.
Both the following lines seem to give the same output:
df1 = df[df['MRP'] > 1500]
df1 = df.loc[df['MRP'] > 1500]
Is loc an optional attribute when searching dataframe?
From the pandas.DataFrame.loc documentation:
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean
array.
When you filter rows with a Boolean array, .loc is optional. In your example, df['MRP'] > 1500 gives a Boolean Series, so it's not necessary to use .loc in that case.
df[df['MRP']>15]
   MRP cat
0   18   A
3   19   D
6   18   C
But if you want to select particular columns for the rows where this Boolean Series is True, then you can use .loc:
df.loc[df['MRP']>15, 'cat']
0    A
3    D
6    C
Or, if you want to change the values where the condition is True:
df.loc[df['MRP']>15, 'cat'] = 'found'
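To see both points side by side, here is a small sketch with made-up data. It checks that the two row filters return the same frame and then assigns through .loc with a row mask and a column label:

```python
import pandas as pd

# Hypothetical data; column names follow the answer above.
df = pd.DataFrame({"MRP": [18, 12, 9, 19], "cat": ["A", "B", "C", "D"]})

mask = df["MRP"] > 15

# Plain brackets and .loc select the same rows for a boolean mask.
same_rows = df[mask].equals(df.loc[mask])
print(same_rows)

# Assignment with a row mask and a column label goes through .loc.
df.loc[mask, "cat"] = "found"
print(df)
```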

if-else for multiple conditions dataframe [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I don't know how to properly write the following idea:
I have a dataframe that has two columns, and many many rows.
I want to create a new column based on the data in these two columns, such that if either of them contains 1 the value will be 1, otherwise 0.
Something like this:
if (df['col1']==1 | df['col2']==1):
df['newCol']=1
else:
df['newCol']=0
I tried to use .loc in different ways, but I get different errors, so either I'm not using it correctly, or this is not the right solution...
Would appreciate your help. Thanks!
Simply use np.where or np.select. Note that each comparison needs its own parentheses, because | binds more tightly than ==:
df['newCol'] = np.where((df['col1']==1) | (df['col2']==1), 1, 0)
OR
df['newCol'] = np.select([cond1, cond2, cond3], [choice1, choice2, choice3], default=def_value)
With np.select, each row takes the choice that corresponds to the first condition that is true, and def_value if none is.
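The np.select line above uses placeholders. A concrete version for this question might look like the sketch below; the sample values are made up, and the names conditions and choices are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's column names.
df = pd.DataFrame({"col1": [1, 0, 0, 2], "col2": [0, 1, 0, 0]})

# One boolean Series per condition, checked in order.
conditions = [df["col1"] == 1, df["col2"] == 1]

# Value to use when the matching condition is True.
choices = [1, 1]

# Rows where no condition holds get the default.
df["newCol"] = np.select(conditions, choices, default=0)
print(df)
```

For a single either/or test, np.where is shorter; np.select pays off once each condition maps to a different value.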
One way to solve this using .loc:
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = 1
df['newCol'] = df['newCol'].fillna(0)
If you want newCol as a string, use:
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = '1'
df['newCol'] = df['newCol'].fillna('0')
or
df['newCol']=df['newCol'].astype(str)

Pandas: Add a scalar to multiple new columns in an existing dataframe [duplicate]

This question already has answers here:
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 4 years ago.
I recently answered a question where the OP was looking to add multiple columns with multiple different values to an existing dataframe (link). My answer is fairly succinct, but I don't think it is very fast.
Ultimately I was hoping I could do something like:
# Existing dataframe
df = pd.DataFrame({'a':[1,2]})
df[['b','c']] = 0
Which would result in:
a b c
1 0 0
2 0 0
But it throws an error.
Is there a super simple way to do this that I'm missing? Or is the answer I posted earlier the fastest / easiest way?
NOTE
I understand this could be done via loops, or via assigning scalars to multiple columns, but am trying to avoid that if possible. Assume 50 columns or whatever number you wouldn't want to write:
df['b'], df['c'], ..., df['xyz'] = 0, 0, ..., 0
Not a duplicate:
The "Possible duplicate" question suggested for this one shows multiple different values assigned to each column. I'm simply asking if there is a very easy way to assign a single scalar value to multiple new columns. The answer could correctly and very simply be, "No" - but worth knowing so I can stop searching.
Why not use assign? It returns a new DataFrame with the extra columns:
df.assign(**dict.fromkeys(['b','c'],0))
Out[781]:
   a  b  c
0  1  0  0
1  2  0  0
Or create the dict with d = dict(zip(namelist, valuelist))
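For the "50 columns" case from the question, the same assign call works with a generated list of names. This is a sketch: the names col_0 to col_4 are placeholders, and the result has to be assigned back because assign does not modify df in place:

```python
import pandas as pd

# Existing dataframe from the question.
df = pd.DataFrame({"a": [1, 2]})

# Hypothetical column names; any list of strings would do.
new_columns = [f"col_{i}" for i in range(5)]

# dict.fromkeys maps every name to the same scalar, and
# assign broadcasts that scalar down each new column.
df = df.assign(**dict.fromkeys(new_columns, 0))
print(df)
```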
I think you want to do
df['b'], df['c'] = 0, 0
