This question already has answers here:
DataFrame String Manipulation
(3 answers)
Closed 8 years ago.
I have a dataframe which I load from an excel file like this:
df = pd.read_excel(filename, 0, index_col=0, skiprows=0, parse_cols=[0, 8, 9], tz='UTC',
                   parse_dates=True)
I do some simple changing of the column names just for my own readability:
df.columns = ['Ticker', 'Price']
The data in the ticker column looks like:
AAV.
AAV.
AAV.UN
AAV.UN
I am trying to remove the period from the end of the string when no other letters follow it.
I know I could use something like:
df['Ticker'].str.rstrip('.')
But that does not work. Is there some other way to do what I need? I think my issue is that the method is for a Series and not a column of values. I tried apply and could not seem to get that to work either.
Any suggestions?
You can use map() with a lambda like this:
df['Ticker'] = df['Ticker'].map( lambda x : x[:-1] if x.endswith('.') else x)
Ticker
0 AAV
1 AAV
2 AAV.UN
3 AAV.UN
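A vectorized alternative is str.replace with a regex anchored at the end of the string; a minimal sketch, assuming the column holds plain strings (the sample data below mirrors the question's Ticker values):

```python
import pandas as pd

# Sample data mirroring the question's Ticker column
df = pd.DataFrame({'Ticker': ['AAV.', 'AAV.', 'AAV.UN', 'AAV.UN']})

# Remove a period only when it is the last character of the string
df['Ticker'] = df['Ticker'].str.replace(r'\.$', '', regex=True)
print(df['Ticker'].tolist())  # ['AAV', 'AAV', 'AAV.UN', 'AAV.UN']
```

Because the pattern is anchored with `$`, interior periods (as in AAV.UN) are left untouched.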
This question already has answers here:
access value from dict stored in python df
(3 answers)
Closed 3 months ago.
Edit: the dummy dataframe has been updated.
I have a pandas data frame with the below kind of column with 200 rows.
Let's say the name of df is data.
| B                                      |
|----------------------------------------|
| {'animal':'cat', 'bird':'peacock'...}  |
I want to extract the value of animal to a separate column C for all the rows.
I tried the below code but it doesn't work.
data['C'] = data["B"].apply(lambda x: x.split(':')[-2] if ':' in x else x)
Please help.
You can unpack the dictionary with pd.json_normalize:
import pandas as pd
data = pd.DataFrame({'B': [{0: {'animal': 'cat', 'bird': 'peacock'}}]})
data['C'] = pd.json_normalize(data['B'])['0.animal']
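If the column holds flat dicts rather than the nested structure above (an assumption about your data), a plain apply is enough; a small sketch with made-up values:

```python
import pandas as pd

# Hypothetical flat-dict column
data = pd.DataFrame({'B': [{'animal': 'cat', 'bird': 'peacock'},
                           {'animal': 'dog', 'bird': 'crow'}]})

# Pull the 'animal' value out of each dict; .get avoids a KeyError on missing keys
data['C'] = data['B'].apply(lambda d: d.get('animal'))
print(data['C'].tolist())  # ['cat', 'dog']
```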
I'm not totally sure of the structure of your data. Does this look right?
import pandas as pd
import re
df = pd.DataFrame({
"B": ["'animal':'cat'", "'bird':'peacock'"]
})
df["C"] = df.B.apply(lambda x: re.sub(r".*?\:(.*$)", r"\1", x))
This question already has answers here:
Pandas split column into multiple columns by comma
(7 answers)
Closed 1 year ago.
How do I split the comma-separated string into new columns?
Expected output
Source Target Weight
0 Majed Moqed Majed Moqed 0
Try this:
df['Source'] = df['(Source, Target, Weight)'].str.split(',').str[0]
df['Target'] = df['(Source, Target, Weight)'].str.split(',').str[1]
df['Weight'] = df['(Source, Target, Weight)'].str.split(',').str[2]
Try this:
col = '(Source, Target, Weight)'
df = pd.DataFrame(df[col].str.split(',').tolist(), columns=col[1:-1].split(', '))
You can also do:
col = '(Source, Target, Weight)'
df[col.strip('()').split(', ')] = df[col].str.split(',', expand=True)
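All three answers rely on the same str.split idea; here is a minimal runnable sketch (with made-up data, since the question's dataframe isn't shown):

```python
import pandas as pd

col = '(Source, Target, Weight)'
df = pd.DataFrame({col: ['Majed Moqed,Majed Moqed,0']})

# expand=True returns a DataFrame with one column per comma-separated field
df[['Source', 'Target', 'Weight']] = df[col].str.split(',', expand=True)
print(df[['Source', 'Target', 'Weight']].values.tolist())
# [['Majed Moqed', 'Majed Moqed', '0']]
```

Note the split values stay strings; cast Weight with astype(int) if you need numbers.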
This question already has answers here:
Pyspark : select specific column with its position
(2 answers)
Closed 3 years ago.
I'm new to Python Spark, having only just started in Python, so apologies if this question is really dim.
How do I delete columns or reorder my PySpark dataframe using column number refs, not column names? My column names are long and I have a lot of columns, so using names is very tedious.
I want to turn eg:
Data = Data.drop("070_thing", "230_anglething", "152_magnetthing", "200_status_thing", "155_thing")
into:
Data = Data.drop(1, 5, 9, 15, 22)
Also reorder, so instead of:
df = df.select("id","name","time","city")
I want to put:
df = df.select(4, 3, 2, 1)
Thanks
You can use df.columns:
columns = Data.columns
Data.select(columns[0], columns[1])
or:
from operator import itemgetter
df_getter = lambda cols: list(itemgetter(*cols)(Data.columns))
Data.select(*df_getter((1, 2, 3)))
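The index-to-name translation itself is plain Python, so it can be sketched without Spark (the column names below are made up for illustration):

```python
from operator import itemgetter

# Hypothetical column names, as Data.columns would return them
columns = ['id', 'name', 'time', 'city']

# Translate positions into names, then pass the names to select()/drop()
picked = list(itemgetter(3, 2, 1, 0)(columns))
print(picked)  # ['city', 'time', 'name', 'id']

# In Spark this would then be: Data.select(*picked)
```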
This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 4 years ago.
I have a dataframe in pandas, with a column which is a vector:
df = pd.DataFrame({'ID':[1,2], 'Averages':[[1,2,3],[4,5,6]]})
and I wish to split and divide it into elements which would look like this:
df2 = pd.DataFrame({'ID':[1,2], 'A':[1,4], 'B':[2,5], 'C':[3,6]})
I have tried
df['Averages'].astype(str).str.split(' ') but with no luck. Any help would be appreciated.
pd.concat([df['ID'], df['Averages'].apply(pd.Series)], axis = 1).rename(columns = {0: 'A', 1: 'B', 2: 'C'})
This will work:
df[['A','B','C']] = pd.DataFrame(df['Averages'].values.tolist(), index=df.index)
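A runnable sketch of that second approach, using the question's own dataframe:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Averages': [[1, 2, 3], [4, 5, 6]]})

# Each list becomes a row of a new frame; assign the three columns directly
df[['A', 'B', 'C']] = pd.DataFrame(df['Averages'].tolist(), index=df.index)
print(df[['ID', 'A', 'B', 'C']].values.tolist())  # [[1, 1, 2, 3], [2, 4, 5, 6]]
```

This is usually faster than apply(pd.Series), which builds a Series per row.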
This question already has answers here:
Select columns using pandas dataframe.query()
(5 answers)
Closed 4 years ago.
I'm trying to use query on a MultiIndex column. It works on a MultiIndex row, but not the column. Is there a reason for this? The documentation shows examples like the first one below, but it doesn't indicate that it won't work for a MultiIndex column.
I know there are other ways to do this, but I'm specifically trying to do it with the query function.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((4,4)))
df.index = pd.MultiIndex.from_product([[1,2],['A','B']])
df.index.names = ['RowInd1', 'RowInd2']
# This works
print(df.query('RowInd2 in ["A"]'))
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
df.columns.names = ['ColInd1', 'ColInd2']
# query on index works, but not on the multiindexed column
print(df.query('index < 2'))
print(df.query('ColInd2 in ["A"]'))
To answer my own question, it looks like query shouldn't be used at all (regardless of using MultiIndex columns) for selecting certain columns, based on the answer(s) here:
Select columns using pandas dataframe.query()
You can use IndexSlice:
df.query('ilevel_0>2')
Out[327]:
ColInd1 1 2
ColInd2 A B A B
3 0.652576 0.639522 0.52087 0.446931
df.loc[:,pd.IndexSlice[:,'A']]
Out[328]:
ColInd1 1 2
ColInd2 A A
0 0.092394 0.427668
1 0.326748 0.383632
2 0.717328 0.354294
3 0.652576 0.520870
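For selecting columns by a level value, df.xs is a compact alternative to IndexSlice; a sketch using fixed values rather than the random data above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4))
df.columns = pd.MultiIndex.from_product([[1, 2], ['A', 'B']],
                                        names=['ColInd1', 'ColInd2'])

# Select every column whose ColInd2 level equals 'A';
# xs drops the matched level, leaving only ColInd1
sub = df.xs('A', axis=1, level='ColInd2')
print(list(sub.columns))  # [1, 2]
```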