Data frame renaming columns [duplicate] - python

This question already has answers here:
Remove or replace spaces in column names
(2 answers)
How can I make pandas dataframe column headers all lowercase?
(6 answers)
Closed 1 year ago.
data sample from CSV file
Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Stnd,Stnd Description,Underhood ID,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay,Comb CO2
ACURA RDX,3.5,6,SemiAuto-6,2WD,Gasoline,FA,T3B125,Federal Tier 3 Bin 125,JHNXT03.5GV3,small SUV,3,20,28,23,5,No,386
import pandas as pd
df_18 = pd.read_csv('file name')
Request:
Rename all column labels to replace spaces with underscores and convert everything to lowercase.
The below code did not work, and I don't know why:
df_18.rename(str.lower().str.strip().str.replace(" ","_"),axis=1,inplace=True)

You can directly assign a list of column names to pandas.DataFrame.columns. Perform the required operations (lower, strip, and replace) in a list comprehension over the existing column names, then assign the result back to df_18.columns:
df_18.columns = [col.lower().strip().replace(" ","_") for col in df_18]
OUTPUT:
model displ cyl ... greenhouse_gas_score smartway comb_co2
0 ACURA RDX 3.5 6 ... 5 No 386
[1 rows x 18 columns]
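As for why the code in the question fails: str.lower() is being called with no argument, which raises a TypeError. rename expects a function (a mapper) to apply to each label, not the result of calling one. A minimal sketch of the rename-based fix, using a tiny hypothetical frame in place of the CSV:

```python
import pandas as pd

# Tiny frame standing in for the CSV (hypothetical data)
df_18 = pd.DataFrame({"Cert Region": ["FA"], "Air Pollution Score": [3]})

# Pass a function to rename; it is applied to each column label in turn
df_18.rename(columns=lambda c: c.lower().strip().replace(" ", "_"), inplace=True)

print(df_18.columns.tolist())  # ['cert_region', 'air_pollution_score']
```

Passing the bound chain of string methods as a single lambda is what the original code was reaching for.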

There are many ways to rename columns; see the pandas references on renaming columns and on replacing substrings in strings. You can use the code below:
df_18.columns=[col.lower().replace(" ","_") for col in df_18.columns]

# Copy each column under its cleaned name, then drop the original.
# Iterate over a list copy, since the loop mutates the columns.
# Note: renamed columns are appended at the end, so column order changes.
for column in list(df_18.columns):
    new_column_name = column.lower().strip().replace(" ", "_")
    if new_column_name != column:
        df_18[new_column_name] = df_18[column]
        del df_18[column]

Related

Split comma-separated cell in pandas dataframe into different columns [duplicate]

This question already has answers here:
Pandas split column into multiple columns by comma
(7 answers)
Closed 1 year ago.
This post was edited and submitted for review 1 year ago and failed to reopen the post:
Original close reason(s) were not resolved
How do I split the comma-separated string into new columns?
Expected output
Source Target Weight
0 Majed Moqed Majed Moqed 0
Try this:
df['Source'] = df['(Source, Target, Weight)'].str.split(',').str[0]
df['Target'] = df['(Source, Target, Weight)'].str.split(',').str[1]
df['Weight'] = df['(Source, Target, Weight)'].str.split(',').str[2]
Try this:
col = '(Source, Target, Weight)'
df = pd.DataFrame(df[col].str.split(',').tolist(), columns=col[1:-1].split(', '))
You can also do:
col = '(Source, Target, Weight)'
df[col.strip('()').split(', ')] = df[col].str.split(',', expand=True)
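A runnable sketch of the expand=True approach, assuming a single column literally named '(Source, Target, Weight)' holding comma-separated strings (the sample row is hypothetical, matching the expected output above):

```python
import pandas as pd

# Hypothetical data matching the question's expected output
col = "(Source, Target, Weight)"
df = pd.DataFrame({col: ["Majed Moqed,Majed Moqed,0"]})

# Derive the new column names from the header itself, then split the values
df[col.strip("()").split(", ")] = df[col].str.split(",", expand=True)
df = df.drop(columns=[col])

print(df)
#         Source       Target Weight
# 0  Majed Moqed  Majed Moqed      0
```

Note that the split values are still strings; cast Weight with astype(int) if a numeric column is needed.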

Pandas: How to remove non-English characters? [duplicate]

This question already has answers here:
Remove non-ASCII characters from pandas column
(8 answers)
Closed 1 year ago.
In my DF there are values like الÙجيرة in different columns. How can I remove such values? I am reading the data from an Excel file, so if something could be done on reading, that would be great.
Also, I have some values like Battery ÁÁÁ that I want to become Battery. How can I delete these non-English characters but keep the other content?
You can use a regex to remove unwanted characters from your strings:
import re
import pandas as pd

records = [{'name': 'Foo الÙجيرة'}, {'name': 'Battery ÁÁÁ'}]
df = pd.DataFrame.from_records(records)

# Allow alphanumeric characters and spaces (add more characters as needed).
# Note: the range [A-z] would also match punctuation between 'Z' and 'a',
# so spell it out as A-Za-z.
pattern = re.compile('[^A-Za-z0-9 ]+')

def clean_text(string):
    # sub() removes every match; strip() drops any leftover whitespace
    return pattern.sub('', string).strip()

# Apply to your df
df['clean_name'] = df['name'].apply(clean_text)
name clean_name
0 Foo الÙجيرة Foo
1 Battery ÁÁÁ Battery
For more solutions, you can read this SO Q: Python, remove all non-alphabet chars from string
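The same cleanup can also be done without apply, using pandas' vectorized string methods (a sketch under the same assumption about which characters to keep):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Foo الÙجيرة", "Battery ÁÁÁ"]})

# Drop everything outside A-Z, a-z, 0-9 and spaces, then trim leftover whitespace
df["clean_name"] = (
    df["name"].str.replace(r"[^A-Za-z0-9 ]+", "", regex=True).str.strip()
)

print(df["clean_name"].tolist())  # ['Foo', 'Battery']
```

The vectorized form avoids a Python-level function call per row, which matters on large frames.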
You can also slice or split the strings with a lambda function:
df[column_name] = df[column_name].apply(lambda value: value[start:stop])
# e.g. df['location'] = df['location'].apply(lambda location: location[0:4])
Split method (note: split('') raises a ValueError, so split on a real separator such as a space):
df[column_name] = df[column_name].apply(lambda value: value.split(' ')[0])

How to replace values in a Pandas Dataframe on a condition? [duplicate]

This question already has answers here:
Replacing column values in a pandas DataFrame
(16 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
I defined a list of values that I am searching for in a column of a dataframe and want to replace all values in that df that match.
First_name
0 Jon
1 Bill
2 Bill
names = {'First_name': ['Jon','Bill', 'Bill']}
name_list = ['Bill']
df = DataFrame(names,columns=['First_name'])
df.loc[df['First_name'].apply(str) in name_list] = 'Killed'
Should result in
First_name
0 Jon
1 Killed
2 Killed
but I'm getting an error
TypeError: 'in ' requires string as left operand, not Series
Not too sure why, since I am applying (str) to the left operand
Do you mean:
name_list = ['Bill']
df.loc[df['First_name'].isin(name_list), 'First_name'] = 'Killed'
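The isin-based fix above can equivalently be written with Series.mask, which replaces values wherever a boolean condition holds (a sketch using the question's data):

```python
import pandas as pd

df = pd.DataFrame({"First_name": ["Jon", "Bill", "Bill"]})
name_list = ["Bill"]

# mask() replaces values where the condition is True
df["First_name"] = df["First_name"].mask(df["First_name"].isin(name_list), "Killed")

print(df["First_name"].tolist())  # ['Jon', 'Killed', 'Killed']
```

The original error arises because the Python `in` operator cannot be applied element-wise to a Series; isin is the vectorized equivalent.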

Python/Pandas - Query a MultiIndex Column [duplicate]

This question already has answers here:
Select columns using pandas dataframe.query()
(5 answers)
Closed 4 years ago.
I'm trying to use query on a MultiIndex column. It works on a MultiIndex row, but not the column. Is there a reason for this? The documentation shows examples like the first one below, but it doesn't indicate that it won't work for a MultiIndex column.
I know there are other ways to do this, but I'm specifically trying to do it with the query function
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((4,4)))
df.index = pd.MultiIndex.from_product([[1,2],['A','B']])
df.index.names = ['RowInd1', 'RowInd2']
# This works
print(df.query('RowInd2 in ["A"]'))
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
df.columns.names = ['ColInd1', 'ColInd2']
# query on index works, but not on the multiindexed column
print(df.query('index < 2'))
print(df.query('ColInd2 in ["A"]'))
To answer my own question, it looks like query shouldn't be used at all (regardless of using MultiIndex columns) for selecting certain columns, based on the answer(s) here:
Select columns using pandas dataframe.query()
You can use IndexSlice:
df.query('ilevel_0>2')
Out[327]:
ColInd1 1 2
ColInd2 A B A B
3 0.652576 0.639522 0.52087 0.446931
df.loc[:,pd.IndexSlice[:,'A']]
Out[328]:
ColInd1 1 2
ColInd2 A A
0 0.092394 0.427668
1 0.326748 0.383632
2 0.717328 0.354294
3 0.652576 0.520870
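For selecting on a column level by name (the thing query cannot do here), DataFrame.xs is another option. A sketch mirroring the question's setup (random data, so only the shape and labels are predictable):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 4)))
df.columns = pd.MultiIndex.from_product([[1, 2], ["A", "B"]])
df.columns.names = ["ColInd1", "ColInd2"]

# Select every column whose ColInd2 label is 'A';
# xs drops the matched level from the result by default
sub = df.xs("A", axis=1, level="ColInd2")
print(sub.columns.tolist())  # [1, 2]
```

Unlike pd.IndexSlice, xs drops the selected level, which is often what you want when only one label remains.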

Python data frames - how to select all columns that have a specific substring in their name [duplicate]

This question already has answers here:
Find column whose name contains a specific string
(8 answers)
Closed 7 years ago.
In Python I have a data frame (df) that contains columns with the following names: A_OPEN, A_CLOSE, B_OPEN, B_CLOSE, C_OPEN, C_CLOSE, D_, etc.
How can I easily select only the columns that contain _CLOSE in their name? A, B, C, D, E, F, etc. can have any value, so I do not want to use the specific column names.
In SQL this would be done with the LIKE operator: df[like'%_CLOSE%']
What's the Python way?
You could use a list comprehension, e.g.:
df[[x for x in df.columns if "_CLOSE" in x]]
Example:
df = pd.DataFrame(
columns = ['_CLOSE_A', '_CLOSE_B', 'C'],
data = [[2,3,4], [3,4,5]]
)
Then,
>>> print(df[[x for x in df.columns if "_CLOSE" in x]])
_CLOSE_A _CLOSE_B
0 2 3
1 3 4
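pandas also has a built-in for exactly this: DataFrame.filter with the like parameter (or regex for patterns) matches substrings in column names, much like SQL's LIKE:

```python
import pandas as pd

df = pd.DataFrame(
    columns=["A_OPEN", "A_CLOSE", "B_CLOSE", "C"],
    data=[[1, 2, 3, 4], [5, 6, 7, 8]],
)

# Keep only columns whose name contains '_CLOSE'
close_cols = df.filter(like="_CLOSE")
print(close_cols.columns.tolist())  # ['A_CLOSE', 'B_CLOSE']
```

This reads more directly than the list comprehension and expresses the intent in one call.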
