Python Pandas case statement [duplicate]

This question already has answers here:
Convert categorical data in pandas dataframe
(15 answers)
Convert categorical variables from String to int representation
(3 answers)
Closed 5 months ago.
I have a column with the following values:
City
A
B
C
I want to create a heatmap but can't because this column is not an integer, so I will convert it as follows:
city_new
1
2
3
I have tried this case statement, but it does not work:
df['city_new'] = np.where(df['City']='A', 1,
np.where(df['City']='B', 2,
np.where(df['City']='C', 3)))

You can use pandas.factorize so that you don't have to write the conditions yourself (e.g. if you have 1000 different cities):
df["new_city"] = pd.factorize(df["City"])[0] + 1
Output:
City new_city
0 A 1
1 B 2
2 C 3
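A minimal sketch (with assumed sample data) showing how factorize assigns codes: integers follow order of first appearance, and repeated values reuse the same code.

```python
import pandas as pd

# Hypothetical example: repeated cities reuse the same integer code.
df = pd.DataFrame({"City": ["B", "A", "B", "C", "A"]})
codes, uniques = pd.factorize(df["City"])
df["new_city"] = codes + 1  # shift from 0-based to 1-based

print(df["new_city"].tolist())   # [1, 2, 1, 3, 2]
print(list(uniques))             # ['B', 'A', 'C'] — order of first appearance
```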

You could use the replace method to replace A, B, C with 1, 2, 3, as in the code below.
df['city_new'] = df['City'].replace(['A','B','C'], [1,2,3])
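A small sketch, with assumed sample data, contrasting replace with the closely related Series.map: replace leaves unmatched values untouched, while map turns them into NaN.

```python
import pandas as pd

# Assumed sample data; 'D' is deliberately absent from the mapping.
df = pd.DataFrame({"City": ["A", "B", "C", "D"]})

# replace leaves unmatched values as they were...
df["via_replace"] = df["City"].replace(["A", "B", "C"], [1, 2, 3])

# ...while map sends them to NaN.
df["via_map"] = df["City"].map({"A": 1, "B": 2, "C": 3})

print(df["via_replace"].tolist())  # [1, 2, 3, 'D']
```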

Your code was incorrect for two reasons:
You used = instead of == to check for the string.
You need to state the equivalent of an 'else' clause in case none of the conditions is true; this is the value 4 in the code below.
Your code should look like this:
df['City_New'] = np.where(df['City']=='A', 1, np.where(df['City']=='B', 2, np.where(df['City']=='C', 3, 4)))
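As a sketch of an alternative that scales better than nested np.where when there are many cities, the same mapping can be written with np.select (sample data assumed):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"City": ["A", "B", "C", "X"]})

conditions = [df["City"] == "A", df["City"] == "B", df["City"] == "C"]
choices = [1, 2, 3]
df["City_New"] = np.select(conditions, choices, default=4)  # 4 plays the 'else' role

print(df["City_New"].tolist())  # [1, 2, 3, 4]
```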


Is loc an optional attribute when searching dataframe? [duplicate]

This question already has answers here:
What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?
(4 answers)
Closed 1 year ago.
Both the following lines seem to give the same output:
df1 = df[df['MRP'] > 1500]
df1 = df.loc[df['MRP'] > 1500]
Is loc an optional attribute when searching dataframe?
Coming from the pandas.DataFrame.loc documentation:
Access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, but may also be used with a boolean
array.
When you are using a Boolean array to filter data, .loc is optional. In your example, df['MRP'] > 1500 gives a Boolean Series, so .loc is not necessary in that case.
df[df['MRP']>15]
MRP cat
0 18 A
3 19 D
6 18 C
But if you want to access some other columns where this Boolean Series has True value, then you may use .loc:
df.loc[df['MRP']>15, 'cat']
0 A
3 D
6 C
Or, if you want to change the values where the condition is True:
df.loc[df['MRP']>15, 'cat'] = 'found'
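A runnable sketch of the points above, using assumed sample data consistent with the output shown: plain brackets and .loc return the same rows for a Boolean mask, but selecting a column at the same time, or assigning, needs .loc.

```python
import pandas as pd

# Assumed sample data matching the toy output above.
df = pd.DataFrame({"MRP": [18, 10, 12, 19, 14, 11, 18],
                   "cat": ["A", "B", "C", "D", "E", "F", "C"]})

mask = df["MRP"] > 15

# Pure Boolean filtering: .loc is optional.
assert df[mask].equals(df.loc[mask])

# Row-and-column selection plus assignment requires .loc.
df.loc[mask, "cat"] = "found"
print(df["cat"].tolist())  # ['found', 'B', 'C', 'found', 'E', 'F', 'found']
```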

how to fill empty cells with 0 in python pandas [duplicate]

This question already has answers here:
How to replace NaN values by Zeroes in a column of a Pandas Dataframe?
(17 answers)
Closed 2 years ago.
I have a column in my dataset with the values single and married. Some of the cells are empty. I want to convert single to 0 and married to 1, and convert the column from string to int.
df.X4[df.X4 == 'single'] = 1
df.X4[df.X4 == 'married'] = 2
df['X4'] = df['X4'].astype(str).astype(int)
The cells that have no value give this error:
ValueError: invalid literal for int() with base 10: 'nan'
I have tried fillna like this: df.X4.fillna(0), but it still gives the same error.
Let us try
df.X4[df.X4 == 'single'] = 1
df.X4[df.X4 == 'married'] = 2
df['X4'] = pd.to_numeric(df['X4'], errors='coerce').fillna(0).astype(int)
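A self-contained sketch of why this works, with an assumed sample column: astype(str).astype(int) fails on the missing cell (the string 'nan' is not an int), whereas pd.to_numeric with errors='coerce' first turns anything non-numeric into NaN, which fillna can then replace.

```python
import numpy as np
import pandas as pd

# Assumed sample column with one missing value.
df = pd.DataFrame({"X4": ["single", "married", np.nan, "single"]})

df.loc[df["X4"] == "single", "X4"] = 1
df.loc[df["X4"] == "married", "X4"] = 2

# coerce maps the remaining NaN (and any stray strings) to NaN,
# fillna(0) replaces it, and astype(int) is then safe.
df["X4"] = pd.to_numeric(df["X4"], errors="coerce").fillna(0).astype(int)
print(df["X4"].tolist())  # [1, 2, 0, 1]
```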

Conditional Statement with a "wildcard" [duplicate]

This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 2 years ago.
In table A, there are columns 1 and 2.
Column 1 is unique id’s like (‘A12324’) and column 2 is blank for now.
I want to fill the value of column 2 with Yes if the id starts with A, and else No.
Is anyone familiar with how I can maybe use a LEFT for this?
I tried this, but my error read that left is not defined.
TableA.loc[TableA['col1'] == left('A',1), 'hasAnA'] = 'Yes'
You can use the pd.Series.str.startswith() method:
>>> frame = pd.DataFrame({'colA': ['A12342', 'B123123231'], 'colB': False})
>>> condition = frame['colA'].str.startswith('A')
>>> frame.loc[condition, 'colB'] = 'Yes'
>>> frame
colA colB
0 A12342 Yes
1 B123123231 False
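To get the exact Yes/No labels the question asks for, the startswith mask can also feed np.where. A sketch with hypothetical ids:

```python
import numpy as np
import pandas as pd

# Hypothetical ids; column names follow the question's TableA example.
TableA = pd.DataFrame({"col1": ["A12324", "B98765", "A55555"]})
TableA["hasAnA"] = np.where(TableA["col1"].str.startswith("A"), "Yes", "No")
print(TableA["hasAnA"].tolist())  # ['Yes', 'No', 'Yes']
```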

if-else for multiple conditions dataframe [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I don't know how to write the following idea properly:
I have a dataframe that has two columns, and many many rows.
I want to create a new column based on the data in these two columns, such that if there's 1 in one of them the value will be 1, otherwise 0.
Something like that:
if (df['col1']==1 | df['col2']==1):
df['newCol']=1
else:
df['newCol']=0
I tried to use the .loc function in different ways but got different errors, so either I'm not using it correctly, or this is not the right solution...
Would appreciate your help. Thanks!
Simply use np.where or np.select
df['newCol'] = np.where((df['col1']==1) | (df['col2']==1), 1, 0)
OR
df['newCol'] = np.select([cond1, cond2, cond3], [choice1, choice2, choice3], default=def_value)
When a particular condition is true, np.select substitutes the corresponding choice.
One way to solve this using .loc:
df.loc[(df['col1']==1) | (df['col2']==1), 'newCol'] = 1
df['newCol'].fillna(0, inplace=True)
In case you want newCol as a string, use
df.loc[(df['col1']==1) | (df['col2']==1), 'newCol'] = '1'
df['newCol'].fillna('0', inplace=True)
or
df['newCol']=df['newCol'].astype(str)
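A runnable sketch of the corrected condition, with assumed toy data. Note the parentheses: `|` binds tighter than `==`, so each comparison must be wrapped or pandas raises an error.

```python
import numpy as np
import pandas as pd

# Assumed toy data with the two columns from the question.
df = pd.DataFrame({"col1": [1, 0, 0, 1], "col2": [0, 1, 0, 0]})

# (df['col1'] == 1 | df['col2'] == 1) would evaluate 1 | df['col2'] first;
# the parentheses below give the intended element-wise OR.
df["newCol"] = np.where((df["col1"] == 1) | (df["col2"] == 1), 1, 0)
print(df["newCol"].tolist())  # [1, 1, 0, 1]
```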

Pandas: Add a scalar to multiple new columns in an existing dataframe [duplicate]

This question already has answers here:
How to add multiple columns to pandas dataframe in one assignment?
(13 answers)
Closed 4 years ago.
I recently answered a question where the OP was looking to add multiple columns with multiple different values to an existing dataframe (link). And it's fairly succinct, but I don't think very fast.
Ultimately I was hoping I could do something like:
# Existing dataframe
df = pd.DataFrame({'a':[1,2]})
df[['b','c']] = 0
Which would result in:
a b c
1 0 0
2 0 0
But it throws an error.
Is there a super simple way to do this that I'm missing? Or is the answer I posted earlier the fastest / easiest way?
NOTE
I understand this could be done via loops, or via assigning scalars to multiple columns, but am trying to avoid that if possible. Assume 50 columns or whatever number you wouldn't want to write:
df['b'], df['c'], ..., df['xyz'] = 0, 0, ..., 0
Not a duplicate:
The "Possible duplicate" question suggested to this shows multiple different values assigned to each column. I'm simply asking if there is a very easy way to assign a single scalar value to multiple new columns. The answer could correctly and very simply be, "No" - but worth knowing so I can stop searching.
Why not use assign:
df.assign(**dict.fromkeys(['b','c'],0))
Out[781]:
a b c
0 1 0 0
1 2 0 0
Or create the dict with d = dict(zip(namelist, valuelist)).
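A sketch of the assign approach with an assumed list of names standing in for the 50-column case. dict.fromkeys broadcasts one scalar to every key, and assign returns a new DataFrame, so the result must be bound back to a name.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

new_cols = ["b", "c"]  # imagine 50 names here
# dict.fromkeys(new_cols, 0) builds {'b': 0, 'c': 0}.
df = df.assign(**dict.fromkeys(new_cols, 0))
print(df.columns.tolist())  # ['a', 'b', 'c']
```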
I think you want to do
df['b'], df['c'] = 0, 0
