This question already has answers here:
Pandas: Replace a string with 'other' if it is not present in a list of strings
(5 answers)
Closed 1 year ago.
df2 = np.where(df2['color'] != 'blue' | 'red')
I want to create one category for many categorical values, such as:
If the color is not blue or red, call the color "other"
Please and thank you <3
You are basically halfway there. You just have to provide 2 more parameters to achieve what you want.
df2['color'] = np.where((df2['color'] == 'blue') | (df2['color'] == 'red'), df2['color'], 'other')
Reading the equality is easier because there is less cognitive load. If the condition is True for a row, that row keeps its df2['color'] value; if it is False, 'other' is used instead.
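As a quick self-contained check of the approach, with made-up sample data:

```python
import numpy as np
import pandas as pd

# hypothetical sample data
df2 = pd.DataFrame({'color': ['blue', 'green', 'red', 'yellow']})

# keep 'blue' and 'red'; replace everything else with 'other'
df2['color'] = np.where((df2['color'] == 'blue') | (df2['color'] == 'red'),
                        df2['color'], 'other')
print(df2['color'].tolist())  # ['blue', 'other', 'red', 'other']
```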
This question already has answers here:
Custom sorting in pandas dataframe
(5 answers)
Closed 11 months ago.
I am working on my dataset. I need to sort one of its columns from the smallest to the largest, like:
However, when I use:
count20 = count20.sort_values(by = ['Month Year', 'Age'])
I got:
Can anyone help me with this?
Thank you very much!
Define a function like this:
def fn(x):
    # keep only the alphanumeric characters of each value before comparing
    output = []
    for item, value in x.items():  # iteritems() was removed in pandas 2.0
        output.append(''.join(e for e in str(value) if e.isalnum()))
    return pd.Series(output, index=x.index)
and pass this function as the key while sorting values (the key callable should return a Series of the same shape as its input):
count20 = count20.sort_values(by=['Month Year', 'Age'], key=fn)
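For reference, here is the same idea as a runnable sketch with made-up data resembling the question's columns; it uses `items()` (since `iteritems()` was removed from pandas) and returns a `Series`, which is what the `key` callable is expected to produce:

```python
import pandas as pd

def fn(x):
    # strip non-alphanumeric characters from each value before comparing
    output = []
    for item, value in x.items():
        output.append(''.join(e for e in str(value) if e.isalnum()))
    return pd.Series(output, index=x.index)

# hypothetical data resembling the question's columns
count20 = pd.DataFrame({'Month Year': ['Jan 2020', 'Jan 2020', 'Feb 2020'],
                        'Age': ['18-34', '0-17', '0-17']})
count20 = count20.sort_values(by=['Month Year', 'Age'], key=fn)
print(count20['Age'].tolist())
```

Note that the comparison is still lexicographic on the cleaned strings, so this sketch only works when that ordering matches the intended one.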
This question already has answers here:
Python Pandas replace values if not in value range
(4 answers)
Closed last year.
I am working with a pandas df and I am trying to set all the numbers that are outside of a range to null, but I'm having trouble:
df['Numbers'] = df['Numbers'].mask((df['Numbers']< -10) & (df['Numbers']> 10), inplace=True)
So I want to keep the numbers between -10 and 10, if the numbers are outside of those two numbers, it should be set as null.
What am I doing wrong here?
One thing that immediately jumps out at me is that you're using & with your two conditions, so you're basically trying to select all numbers that are both less than -10 and greater than 10... which isn't gonna work ;) (Also, mask(..., inplace=True) returns None, so assigning that result back to the column wipes it out.)
I'd rewrite your code like this:
df.loc[df['Numbers'].lt(-10) | df['Numbers'].gt(10), 'Numbers'] = np.nan
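A runnable sketch of that line with made-up data (note that `lt`/`gt` are strict, so -10 and 10 themselves are kept):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Numbers': [-15, -10, 0, 10, 12]})
df.loc[df['Numbers'].lt(-10) | df['Numbers'].gt(10), 'Numbers'] = np.nan
print(df['Numbers'].tolist())  # [nan, -10.0, 0.0, 10.0, nan]
```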
I would do it like this:
df['Numbers'] = df['Numbers'].where((df['Numbers']>-10) & (df['Numbers']<10))
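As a sketch with made-up data: `where` keeps values where the condition holds and inserts NaN elsewhere; with these strict inequalities, -10 and 10 themselves are also nulled.

```python
import pandas as pd

df = pd.DataFrame({'Numbers': [-15, -10, 0, 7, 12]})
df['Numbers'] = df['Numbers'].where((df['Numbers'] > -10) & (df['Numbers'] < 10))
print(df['Numbers'].tolist())  # [nan, nan, 0.0, 7.0, nan]
```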
This question already has an answer here:
Count occurrences of certain string in entire pandas dataframe
(1 answer)
Closed 2 years ago.
Suppose I want to find the number of occurrences of something in a pandas dataframe as one number.
If I do df.isin(["ABC"]).sum() it gives me a table of all occurrences of "ABC" under each column.
What do I do if I want just one number which is the number of "ABC" entries under column 1?
Moreover, is there code to find entries that have both "ABC" under, say, column 1 and "DEF" under column 2? Even this should just be a single number of entries/rows that have both of these.
You can check with groupby + size:
out = df.groupby(['col1', 'col2']).size()
print(out.loc[('ABC','DEF')])
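A small sketch of this with made-up column names and data:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['ABC', 'ABC', 'XYZ'],
                   'col2': ['DEF', 'GHI', 'DEF']})
out = df.groupby(['col1', 'col2']).size()
print(out.loc[('ABC', 'DEF')])  # 1
```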
Q1: I'm sure there are more sophisticated ways of doing this, but you can do something like:
num_occurences = data[(data['column_name'] == 'ABC')]
len(num_occurences.index)
Q2: To add in 'DEF' search, you can try
num_occurences = data[(data['column_name'] == 'ABC') & (data['column_2_name'] == 'DEF')]
len(num_occurences.index)
I know this works for quantitative values; you'll need to check whether it does for qualitative ones.
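Both counts can also be written by summing a boolean mask directly, since True counts as 1. A sketch with made-up column names and data:

```python
import pandas as pd

data = pd.DataFrame({'column_name': ['ABC', 'ABC', 'XYZ'],
                     'column_2_name': ['DEF', 'GHI', 'DEF']})

# Q1: occurrences of 'ABC' in one column
n_abc = (data['column_name'] == 'ABC').sum()

# Q2: rows matching both conditions
n_both = ((data['column_name'] == 'ABC') & (data['column_2_name'] == 'DEF')).sum()
print(n_abc, n_both)  # 2 1
```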
This question already has answers here:
pandas select from Dataframe using startswith
(5 answers)
Closed 2 years ago.
In table A, there are columns 1 and 2.
Column 1 is unique id’s like (‘A12324’) and column 2 is blank for now.
I want to fill the value of column 2 with Yes if the id starts with A, and else No.
Is anyone familiar with how I can maybe use a LEFT for this?
I tried this, but I got an error saying that left is not defined:
TableA.loc[TableA['col1'] == left('A',1), 'hasAnA'] = 'Yes'
You can use the pd.Series.str.startswith() method:
>>> frame = pd.DataFrame({'colA': ['A12342', 'B123123231'], 'colB': False})
>>> condition = frame['colA'].str.startswith('A')
>>> frame.loc[condition, 'colB'] = 'Yes'
>>> frame
colA colB
0 A12342 Yes
1 B123123231 False
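Since the question asks for Yes/No rather than Yes/False, one alternative sketch is to combine `str.startswith` with `np.where`, which fills both branches in one step (made-up data):

```python
import numpy as np
import pandas as pd

TableA = pd.DataFrame({'col1': ['A12324', 'B98765']})
TableA['hasAnA'] = np.where(TableA['col1'].str.startswith('A'), 'Yes', 'No')
print(TableA['hasAnA'].tolist())  # ['Yes', 'No']
```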
This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 3 years ago.
I don't know how to write the following idea properly:
I have a dataframe that has two columns, and many many rows.
I want to create a new column based on the data in these two columns, such that if there's 1 in one of them the value will be 1, otherwise 0.
Something like that:
if (df['col1']==1 | df['col2']==1):
df['newCol']=1
else:
df['newCol']=0
I tried to use the .loc function in different ways, but I get different errors, so either I'm not using it correctly, or this is not the right solution...
Would appreciate your help. Thanks!
Simply use np.where or np.select
df['newCol'] = np.where((df['col1'] == 1) | (df['col2'] == 1), 1, 0)
OR
df['newCol'] = np.select([cond1, cond2, cond3], [choice1, choice2, choice3], default=def_value)
When a particular condition is true, np.select replaces the value with the corresponding choice.
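With the question's columns plugged in, both forms look like this; note the parentheses around each comparison, since `|` binds more tightly than `==` in Python (made-up data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 0], 'col2': [0, 1, 0]})

# np.where form
df['newCol'] = np.where((df['col1'] == 1) | (df['col2'] == 1), 1, 0)

# equivalent np.select form
df['newCol2'] = np.select([(df['col1'] == 1) | (df['col2'] == 1)], [1], default=0)
print(df['newCol'].tolist())  # [1, 1, 0]
```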
One way to solve this using .loc:
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = 1
df['newCol'] = df['newCol'].fillna(0)
In case you want newCol as a string, use:
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = '1'
df['newCol'] = df['newCol'].fillna('0')
or
df['newCol']=df['newCol'].astype(str)
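A runnable sketch of the `.loc` route with made-up data; the parentheses around each comparison are required because `|` binds more tightly than `==`:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 0], 'col2': [0, 1, 0]})
df.loc[(df['col1'] == 1) | (df['col2'] == 1), 'newCol'] = 1
df['newCol'] = df['newCol'].fillna(0)
print(df['newCol'].tolist())  # [1.0, 1.0, 0.0]
```

The filled column is float (because the unmatched rows were NaN first); cast with `astype(int)` or `astype(str)` if another dtype is wanted.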