How to replace values in a pandas DataFrame on a condition? [duplicate]

This question already has answers here:
Replacing column values in a pandas DataFrame
(16 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
I defined a list of values to search for in a column of a DataFrame, and I want to replace every value in that column that matches.
First_name
0 Jon
1 Bill
2 Bill
from pandas import DataFrame

names = {'First_name': ['Jon', 'Bill', 'Bill']}
name_list = ['Bill']
df = DataFrame(names, columns=['First_name'])
df.loc[df['First_name'].apply(str) in name_list] = 'Killed'
Should result in
First_name
0 Jon
1 Killed
2 Killed
but I'm getting an error
TypeError: 'in ' requires string as left operand, not Series
Not too sure why, since I am applying (str) to the left operand

Do you mean the following? Python's in operator can't be applied element-wise to a Series, which is why the TypeError is raised; Series.isin is the element-wise equivalent and returns a boolean mask:
name_list = ['Bill']
df.loc[df['First_name'].isin(name_list), 'First_name'] = 'Killed'
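Putting the answer together, a minimal runnable sketch of the isin approach, using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({'First_name': ['Jon', 'Bill', 'Bill']})
name_list = ['Bill']

# isin builds a boolean mask: True where the value appears in name_list
df.loc[df['First_name'].isin(name_list), 'First_name'] = 'Killed'
print(df['First_name'].tolist())
# ['Jon', 'Killed', 'Killed']
```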

change a value based on other value in dataframe [duplicate]

This question already has answers here:
Pandas DataFrame: replace all values in a column, based on condition
(8 answers)
Conditional Replace Pandas
(7 answers)
Closed 1 year ago.
If the product type == 'Option', I want to replace the value in the PRICE column with the value of the STRIKE column.
How can I do this without a for loop, to make it faster?
Right now I have the following, but it's slow:
for i in range(df.shape[0]):
    if df.loc[i, 'type'] == 'Option':
        df.loc[i, 'PRICE'] = df.loc[i, 'STRIKE']
Use .loc in a vectorized fashion:
df.loc[df['type'] == 'Option', 'PRICE'] = df['STRIKE']
Or, with an explicit boolean mask:
mask = (df['type'] == 'Option')
df.loc[mask, 'PRICE'] = df.loc[mask, 'STRIKE']
(Note that df[mask].PRICE = df[mask].STRIKE would assign to a temporary copy and leave df unchanged.)
see:
https://www.geeksforgeeks.org/boolean-indexing-in-pandas/
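Another vectorized option is numpy.where, which builds the whole column in one pass. A sketch with made-up sample values (the column names type, PRICE, and STRIKE are from the question):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'type': ['Option', 'Future', 'Option'],
    'PRICE': [10.0, 20.0, 30.0],
    'STRIKE': [11.0, 21.0, 31.0],
})

# where type == 'Option', take STRIKE; otherwise keep the existing PRICE
df['PRICE'] = np.where(df['type'] == 'Option', df['STRIKE'], df['PRICE'])
print(df['PRICE'].tolist())
# [11.0, 20.0, 31.0]
```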

Data frame renaming columns [duplicate]

This question already has answers here:
Remove or replace spaces in column names
(2 answers)
How can I make pandas dataframe column headers all lowercase?
(6 answers)
Closed 1 year ago.
data sample from CSV file
Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Stnd,Stnd Description,Underhood ID,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay,Comb CO2
ACURA RDX,3.5,6,SemiAuto-6,2WD,Gasoline,FA,T3B125,Federal Tier 3 Bin 125,JHNXT03.5GV3,small SUV,3,20,28,23,5,No,386
import pandas as pd
df_18 = pd.read_csv('file name')
Request:
Rename all column labels to replace spaces with underscores and convert everything to lowercase.
The code below did not work, and I don't know why:
df_18.rename(str.lower().str.strip().str.replace(" ","_"),axis=1,inplace=True)
You can assign a list of column names directly to pandas.DataFrame.columns. Perform the required operations (lower, strip, and replace) on each column name in a list comprehension, then assign the result back to df_18.columns:
df_18.columns = [col.lower().strip().replace(" ","_") for col in df_18]
OUTPUT:
model displ cyl ... greenhouse_gas_score smartway comb_co2
0 ACURA RDX 3.5 6 ... 5 No 386
[1 rows x 18 columns]
There are many ways to rename columns; see:
reference for renaming columns
reference for replace string
You can use the code below:
df_18.columns = [col.lower().replace(" ", "_") for col in df_18.columns]
Or, renaming one column at a time:
for column in df_18.columns:
    new_column_name = column.lower().strip().replace(" ", "_")
    if new_column_name != column:
        # copy the data under the new name, then drop the old column
        # (note: this moves renamed columns to the end of the DataFrame)
        df_18[new_column_name] = df_18[column]
        del df_18[column]
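For reference, the asker's rename call can also be fixed: rename accepts a callable that is applied to each label, so the string methods must be chained inside a lambda rather than called on the str class. A sketch with a few column names taken from the CSV sample above:

```python
import pandas as pd

df_18 = pd.DataFrame(columns=['Cert Region', 'Air Pollution Score', 'City MPG'])

# columns= takes a function mapping each old label to a new one
df_18 = df_18.rename(columns=lambda c: c.lower().strip().replace(" ", "_"))
print(df_18.columns.tolist())
# ['cert_region', 'air_pollution_score', 'city_mpg']
```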

One-liner to identify duplicates using pandas? [duplicate]

This question already has answers here:
How do I get a list of all the duplicate items using pandas in python?
(13 answers)
Closed 1 year ago.
While prepping for data-analyst interview questions, I came across "find all duplicate emails (not unique emails) in a one-liner using pandas."
The best I've got is not a single line but rather three:
# initialize dataframe
import pandas as pd
d = {'email':['a','b','c','a','b']}
df= pd.DataFrame(d)
# select emails having duplicate entries
results = pd.DataFrame(df.value_counts())
results.columns = ['count']
results[results['count'] > 1]
>>>
count
email
b 2
a 2
Could the second block (the code after the second comment) be condensed into a one-liner that avoids the temporary variable results?
Just use duplicated:
>>> df[df.duplicated()]
email
3 a
4 b
Or if you want a list:
>>> df[df["email"].duplicated()]["email"].tolist()
['a', 'b']
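If you want to keep the counts, as in the question's three-line version, one way to condense it into a single expression is to chain value_counts with a callable passed to .loc; a sketch:

```python
import pandas as pd

df = pd.DataFrame({'email': ['a', 'b', 'c', 'a', 'b']})

# counts per email, then keep only those seen more than once
dupes = df['email'].value_counts().loc[lambda s: s > 1]
print(dupes)
```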

Count instances in a dataframe [duplicate]

This question already has answers here:
Pandas, group by count and add count to original dataframe?
(3 answers)
Closed 3 years ago.
I have a dataframe containing a column of values (X).
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
For each row, I would like to find how many times its value of X appears in the column (call this A).
My expected output is:
    X  A
0   2  6
1   3  2
2   5  2
3   2  6
4   2  6
5   3  2
6   7  2
7   2  6
8   2  6
9   7  2
10  5  2
11  2  6
Create a temp column of 1s, then groupby and count via transform to get your desired answer:
df = pd.DataFrame({'X': [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})
df['temp'] = 1
df['count'] = df.groupby('X')['temp'].transform('count')
del df['temp']
print(df)
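The temporary column isn't strictly needed: transform on the grouped column itself gives the same per-row counts. A sketch, using the column name A from the question:

```python
import pandas as pd

df = pd.DataFrame({'X': [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})

# for each row, count how many times its X value occurs in the whole column
df['A'] = df.groupby('X')['X'].transform('count')
print(df['A'].tolist())
# [6, 2, 2, 6, 6, 2, 2, 6, 6, 2, 2, 6]
```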

Python data frames - how to select all columns that have a specific substring in their name [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 7 years ago.
In Python I have a DataFrame (df) that contains columns with the following names: A_OPEN, A_CLOSE, B_OPEN, B_CLOSE, C_OPEN, C_CLOSE, D_, etc.
How can I easily select only the columns that contain _CLOSE in their name? A, B, C, D, E, F, etc. can have any value, so I do not want to use the specific column names.
In SQL this would be done with the LIKE operator: df[like '%_CLOSE%']
What's the Python way?
You could use a list comprehension, e.g.:
df[[x for x in df.columns if "_CLOSE" in x]]
Example:
df = pd.DataFrame(
    columns=['_CLOSE_A', '_CLOSE_B', 'C'],
    data=[[2, 3, 4], [3, 4, 5]]
)
Then,
>>> print(df[[x for x in df.columns if "_CLOSE" in x]])
   _CLOSE_A  _CLOSE_B
0         2         3
1         3         4
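pandas also has a built-in for this: DataFrame.filter with the like parameter keeps the columns whose label contains the given substring (a regex parameter is also available for patterns). A sketch on the same example:

```python
import pandas as pd

df = pd.DataFrame(
    columns=['_CLOSE_A', '_CLOSE_B', 'C'],
    data=[[2, 3, 4], [3, 4, 5]]
)

# keep only columns whose label contains the substring '_CLOSE'
close_cols = df.filter(like='_CLOSE')
print(close_cols.columns.tolist())
# ['_CLOSE_A', '_CLOSE_B']
```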
