Split comma-separated cell in pandas dataframe into different columns [duplicate]


This question already has answers here:
Pandas split column into multiple columns by comma
(7 answers)
Closed 1 year ago.
How do I split a comma-separated string into new columns?
Expected output:
   Source       Target       Weight
0  Majed Moqed  Majed Moqed  0

Try this:
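# A Series has no .split method; use the .str accessor and take each part by position
df['Source'] = df['(Source, Target, Weight)'].str.split(',').str[0]
df['Target'] = df['(Source, Target, Weight)'].str.split(',').str[1]
df['Weight'] = df['(Source, Target, Weight)'].str.split(',').str[2]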

Try this:
col = '(Source, Target, Weight)'
df = pd.DataFrame(df[col].str.split(',').tolist(), columns=col[1:-1].split(', '))

You can also do:
col = '(Source, Target, Weight)'
df[col.strip('()').split(', ')] = df[col].str.split(',', expand=True)
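
For reference, here is a minimal, self-contained sketch of the expand=True approach. The second data row and the names new_cols and parts are made up for illustration, and it assumes the cell values are plain comma-separated strings with no surrounding parentheses.

```python
import pandas as pd

col = "(Source, Target, Weight)"
# Hypothetical sample data: the first row mirrors the question, the second is invented.
df = pd.DataFrame({col: ["Majed Moqed,Majed Moqed,0", "A,B,3"]})

# Derive the new column names from the combined header.
new_cols = col.strip("()").split(", ")

# expand=True returns one column per split piece instead of a column of lists.
parts = df[col].str.split(",", expand=True)
parts.columns = new_cols

# The pieces are still strings, so convert Weight explicitly if numbers are needed.
parts["Weight"] = parts["Weight"].astype(int)
print(parts)
```

Note that the split pieces are strings, which is why the sketch converts Weight separately.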

Related

Data frame renaming columns [duplicate]

This question already has answers here:
Remove or replace spaces in column names
(2 answers)
How can I make pandas dataframe column headers all lowercase?
(6 answers)
Closed 1 year ago.
data sample from CSV file
Model,Displ,Cyl,Trans,Drive,Fuel,Cert Region,Stnd,Stnd Description,Underhood ID,Veh Class,Air Pollution Score,City MPG,Hwy MPG,Cmb MPG,Greenhouse Gas Score,SmartWay,Comb CO2
ACURA RDX,3.5,6,SemiAuto-6,2WD,Gasoline,FA,T3B125,Federal Tier 3 Bin 125,JHNXT03.5GV3,small SUV,3,20,28,23,5,No,386
import pandas as pd
df_18 = pd.read_csv('file name')
Request:
Rename all column labels to replace spaces with underscores and convert everything to lowercase.
The code below did not work, and I don't know why:
df_18.rename(str.lower().str.strip().str.replace(" ","_"),axis=1,inplace=True)

You can assign a list of column names directly to pandas.DataFrame.columns. Apply the required operations (lower, strip, and replace) to each column name in a list comprehension, then assign the result back to the dataframe's columns:
df_18.columns = [col.lower().strip().replace(" ","_") for col in df_18]
OUTPUT:
model displ cyl ... greenhouse_gas_score smartway comb_co2
0 ACURA RDX 3.5 6 ... 5 No 386
[1 rows x 18 columns]
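
As for why the original attempt fails: str.lower() is called with no argument, so Python raises a TypeError before rename ever runs. A rough sketch of two alternatives follows; the helper name clean_label and the shortened column list are made up for illustration.

```python
import pandas as pd

# Hypothetical, shortened set of the column labels from the question.
df_18 = pd.DataFrame(columns=["Model", "Cert Region", "Air Pollution Score", "Comb CO2"])

def clean_label(label: str) -> str:
    """Lowercase a label, trim it, and replace spaces with underscores."""
    return label.strip().lower().replace(" ", "_")

# Option 1: pass a callable to rename; it is applied to every column label.
renamed = df_18.rename(columns=clean_label)

# Option 2: use the vectorized string methods available on the column Index.
vectorized = df_18.copy()
vectorized.columns = vectorized.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(renamed.columns))
```

Both options leave df_18 itself unchanged, so assign the result back if you want to keep it.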

There are many ways to rename columns. For example, you can use the code below:
df_18.columns=[col.lower().replace(" ","_") for col in df_18.columns]

for column in df_18.columns:
    new_column_name = column.lower().strip().replace(" ","_")
    if new_column_name != column:
        df_18[new_column_name] = df_18[column]
        del df_18[column]

How to create new columns deriving from a categorical column in python? [duplicate]

This question already has answers here:
Pandas: Get Dummies
(5 answers)
What is the most efficient way of counting occurrences in pandas?
(4 answers)
Closed 1 year ago.
I have a data frame with a categorical column (TweetType) with three categories (T, RT and RE). I want to count how many times these categories appear and then sum them. I created three new columns, respectively T, RT, and RE.
def tweet_type(df):
    result = df.copy()
    result['T'] = result['tweetType'].str.contains("T")
    result['RT'] = result['tweetType'].str.contains("RT")
    result['RE'] = result['tweetType'].str.contains("RE")
    return result
tweet_type(my_df)
Then I converted the boolean into 0 and 1. The problem is that the code matches T as RT and the result is not right.
What I obtain is:
TweetType RT T RE
RT 1 1 0
RE 0 0 1
T 1 0 0
RT 1 1 0

Instead of str.contains you should use boolean eq for exact matches:
def tweet_type(df):
    result = df.copy()
    result['T'] = result['Tweet_Type'].eq("T")
    result['RT'] = result['Tweet_Type'].eq("RT")
    result['RE'] = result['Tweet_Type'].eq("RE")
    return result
However, there is an easier method for what you're trying to achieve.
Why not use one-hot encoding using get_dummies to do this:
new_df = pd.get_dummies(df, columns=["Tweet_Type"])
If you don't want the prefix Tweet_Type_:
new_df = pd.get_dummies(df, columns=["Tweet_Type"], prefix='', prefix_sep='')
If you wish to retain the first column:
df = pd.concat([df, new_df], axis=1)
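
To make the difference concrete, here is a small sketch using the four sample values from the question. The variable names are made up, and the column is called tweetType as in the question's code.

```python
import pandas as pd

# Sample values taken from the question's output table.
df = pd.DataFrame({"tweetType": ["RT", "RE", "T", "RT"]})

# Substring match: "RT" also contains "T", so RT rows are flagged as T.
contains_T = df["tweetType"].str.contains("T")

# Exact match: only rows whose value is exactly "T" are flagged.
equals_T = df["tweetType"].eq("T")

# One-hot encode the column and sum each indicator to get per-category counts.
dummies = pd.get_dummies(df["tweetType"]).astype(int)
counts = dummies.sum()

print(contains_T.tolist())  # [True, False, True, True]
print(equals_T.tolist())    # [False, False, True, False]
print(counts.to_dict())
```

If only the totals are needed, df["tweetType"].value_counts() gives the same counts without building the indicator columns.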

You can use a logical AND to exclude the records that contain RT:
(result['tweetType'].str.contains("T")) & (~result['tweetType'].str.contains("RT"))

Count instances in a dataframe [duplicate]

This question already has answers here:
Pandas, group by count and add count to original dataframe?
(3 answers)
Closed 3 years ago.
I have a dataframe containing a column of values (X).
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
For each row, I would like to find how many times its value of X appears (A).
My expected output is:

Create a temporary column of ones, then group by X and count to get the desired answer:
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
df['temp'] = 1
df['count'] = df.groupby(['X'],as_index=False).transform(pd.Series.count)
del df['temp']
print(df)
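
A shorter variant that skips the temporary column, sketched under the assumption that the result column should be named A as in the question:

```python
import pandas as pd

df = pd.DataFrame({"X": [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})

# transform("count") returns one value per original row: the size of that row's group.
df["A"] = df.groupby("X")["X"].transform("count")

# Equivalent alternative: look up each value in the table of value counts.
via_map = df["X"].map(df["X"].value_counts())

print(df)
```

Because transform keeps the original row order and index, the result can be assigned straight back to the dataframe.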

remove rows from dataframe where contents could be a choice of strings [duplicate]

This question already has answers here:
dropping rows from dataframe based on a "not in" condition [duplicate]
(2 answers)
Closed 4 years ago.
I can do something like:
data = df[ df['Proposal'] != 'C000' ]
to remove all proposals with the string C000, but how can I do something like:
data = df[ df['Proposal'] not in ['C000','C0001'] ]
to remove all proposals that match either C000 or C0001 (and so on)?

You can try this,
df = df.drop(df[df['Proposal'].isin(['C000','C0001'])].index)
Or to select the required ones,
df = df[~df['Proposal'].isin(['C000','C0001'])]

import numpy as np
data = df.loc[np.logical_not(df['Proposal'].isin({'C000','C0001'})), :]
# or
data = df.loc[ ~df['Proposal'].isin({'C000','C0001'}) , :]
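
Python's not in asks for a single True/False answer, whereas a row filter needs one boolean per row, which is what isin provides. A minimal sketch with invented sample data (the Value column and the unwanted list are hypothetical):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "Proposal": ["C000", "C0001", "C0002", "C000"],
    "Value": [1, 2, 3, 4],
})

unwanted = ["C000", "C0001"]

# isin builds a boolean mask per row; ~ inverts it to keep the rows NOT in the list.
mask = df["Proposal"].isin(unwanted)
data = df[~mask]

print(data)
```

Note that the match is exact, so C0002 is kept even though it starts with C000.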

DataFrame Column Manipulation [duplicate]

This question already has answers here:
DataFrame String Manipulation
(3 answers)
Closed 8 years ago.
I have a dataframe which I load from an excel file like this:
df = pd.read_excel(filename, 0, index_col=0, skiprows=0, parse_cols=[0, 8, 9], tz='UTC',
parse_dates=True)
I do some simple changing of the column names just for my own readability:
df.columns = ['Ticker', 'Price']
The data in the ticker column looks like:
AAV.
AAV.
AAV.UN
AAV.UN
I am trying to remove the period from the end of the ticker when no other letters follow it.
I know I could use something like:
df['Ticker'].str.rstrip('.')
But that does not work. Is there some other way to do what I need? I think my issue is that the method is for a series and not a column of values. I tried apply and could not seem to get that to work either.
Any suggestions?

You can use map() with a lambda like this:
df['Ticker'] = df['Ticker'].map( lambda x : x[:-1] if x.endswith('.') else x)
Ticker
0 AAV
1 AAV
2 AAV.UN
3 AAV.UN
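
For comparison, a small sketch of both approaches on the sample tickers. It assumes, as a guess, that the original str.rstrip attempt appeared not to work because its result was never assigned back; string methods return a new Series rather than modifying the column in place.

```python
import pandas as pd

df = pd.DataFrame({"Ticker": ["AAV.", "AAV.", "AAV.UN", "AAV.UN"]})

# map with a lambda: drops exactly one trailing period, if present.
mapped = df["Ticker"].map(lambda x: x[:-1] if x.endswith(".") else x)

# str.rstrip: drops every trailing period and returns a new Series.
stripped = df["Ticker"].str.rstrip(".")

# Assign the result back to actually change the column.
df["Ticker"] = stripped
print(df)
```

The two differ only when a ticker ends in more than one period: rstrip removes all of them, while the lambda removes just the last.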
