This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 4 years ago.
I have a dataframe in pandas, with a column which is a vector:
df = pd.DataFrame({'ID':[1,2], 'Averages':[[1,2,3],[4,5,6]]})
and I wish to split and divide it into elements which would look like this:
df2 = pd.DataFrame({'ID':[1,2], 'A':[1,4], 'B':[2,5], 'C':[3,6]})
I have tried
df['Averages'].astype(str).str.split(' ') but with no luck. Any help would be appreciated.
pd.concat([df['ID'], df['Averages'].apply(pd.Series)], axis = 1).rename(columns = {0: 'A', 1: 'B', 2: 'C'})
This will work:
df[['A','B','C']] = pd.DataFrame(df['Averages'].values.tolist(), index=df.index)
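For reference, a minimal self-contained sketch of the tolist approach, assuming every list in Averages has exactly three elements. The names expanded and result are illustrative only; A, B and C come from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2], 'Averages': [[1, 2, 3], [4, 5, 6]]})

# Build a frame from the lists, one column per list position,
# reusing the original index so the rows stay aligned.
expanded = pd.DataFrame(df['Averages'].tolist(), index=df.index,
                        columns=['A', 'B', 'C'])

# Put the new columns next to ID; the list column itself is left out.
result = pd.concat([df[['ID']], expanded], axis=1)
print(result)
```

If the lists can have different lengths, the columns argument would need to match the longest list, and shorter rows are padded with missing values.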
This question already has answers here:
access value from dict stored in python df
(3 answers)
Closed 3 months ago.
Edit: the dummy dataframe is edited
I have a pandas data frame with the below kind of column with 200 rows.
Let's say the name of df is data.
-----------------------------------|
B
-----------------------------------|
{'animal':'cat', 'bird':'peacock'...}
I want to extract the value of animal to a separate column C for all the rows.
I tried the below code but it doesn't work.
data['C'] = data["B"].apply(lambda x: x.split(':')[-2] if ':' in x else x)
Please help.
The dictionary is unpacked with pd.json_normalize
import pandas as pd
data = pd.DataFrame({'B': [{0: {'animal': 'cat', 'bird': 'peacock'}}]})
data['C'] = pd.json_normalize(data['B'])['0.animal']
I'm not totally sure of the structure of your data. Does this look right?
import pandas as pd
import re
df = pd.DataFrame({
"B": ["'animal':'cat'", "'bird':'peacock'"]
})
df["C"] = df.B.apply(lambda x: re.sub(r".*?\:(.*$)", r"\1", x))
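If column B holds flat dictionaries (or their string representations), a sketch along these lines may be enough. The sample rows and the helper to_dict are hypothetical, since the question elides the full dictionaries.

```python
import ast
import pandas as pd

# Hypothetical data: one real dict and one dict stored as a string.
data = pd.DataFrame({'B': [{'animal': 'cat', 'bird': 'peacock'},
                           "{'animal': 'dog', 'bird': 'crow'}"]})

def to_dict(value):
    # Parse string representations safely; pass real dicts through unchanged.
    return ast.literal_eval(value) if isinstance(value, str) else value

# dict.get returns None when the 'animal' key is missing.
data['C'] = data['B'].map(to_dict).map(lambda d: d.get('animal'))
print(data['C'])
```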
This question already has answers here:
Extract int from string in Pandas
(8 answers)
Closed 1 year ago.
Below is the dataframe
import pandas as pd
import numpy as np
d = {'col1': ['Get URI||1621992600749||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90',
'Get URI||1621992600799||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90']}
df = pd.DataFrame(data=d)
I need to extract the values "1621992600749" and "1621992600799".
I have tried several approaches, including the split function:
new = df["col1"].str.split("||", n = 1, expand = True)
but it doesn't give the expected result. Any thoughts would be helpful.
You can use str.extract with a regex:
df['col1'].str.extract(r'(\d+)')
#output
               0
0  1621992600749
1  1621992600799
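The split attempt in the question can also be made to work. By default pandas treats a split pattern longer than one character as a regular expression, and "|" is the regex alternation operator, so the delimiter has to be escaped. A minimal sketch, where the column name ts is an assumption made for illustration:

```python
import pandas as pd

d = {'col1': ['Get URI||1621992600749||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90',
              'Get URI||1621992600799||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90']}
df = pd.DataFrame(data=d)

# Escape both pipes so the pattern matches the literal "||" delimiter.
parts = df['col1'].str.split(r'\|\|', expand=True)

# Column 1 is the second field of each row.
df['ts'] = parts[1]
print(df['ts'])
```

Unlike the regex extraction, this also gives access to the other fields through the remaining columns of parts.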
This question already has answers here:
Split / Explode a column of dictionaries into separate columns with pandas
(13 answers)
Closed 2 years ago.
I have a pandas DataFrame df with a column latlng.
The rows in this column have the format
{'latitude': '34.041005', 'longitude': '-118.249569'}.
In order to add markers to a map (using the folium library), I would like to create two columns, 'latitude' and 'longitude', which in this example would contain 34.041005 and -118.249569 respectively.
EDIT:
Managed to have it working with this first step:
df['latlng'] = df['latlng'].map(eval)
You can use pd.json_normalize to avoid apply, which is costly:
In [684]: df_out = pd.json_normalize(df.latlng)
In [686]: df_out
Out[686]:
    latitude    longitude
0  34.041005  -118.249569
1  30.041005  -120.249569
Then you can concat these columns back to df like this:
pd.concat([df.drop('latlng', axis=1), df_out], axis=1)
The following should work, assuming the column already holds dicts (for example after the map(eval) step from the question):
df['latitude'] = [d['latitude'] for d in df['latlng']]
df['longitude'] = [d['longitude'] for d in df['latlng']]
This should do the job for you:
df['latlng'].apply(pd.Series)
Try this:
df_new = pd.DataFrame(df['latlng'].values.tolist(), index=df.index)
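Putting the pieces together, here is a hedged end-to-end sketch. It assumes latlng is stored as strings, swaps eval for ast.literal_eval (which only accepts Python literals), and converts the values to floats. The name column and the second row are illustrative.

```python
import ast
import pandas as pd

# Hypothetical frame: latlng stored as strings, plus an illustrative 'name' column.
df = pd.DataFrame({
    'name': ['a', 'b'],
    'latlng': ["{'latitude': '34.041005', 'longitude': '-118.249569'}",
               "{'latitude': '30.041005', 'longitude': '-120.249569'}"],
})

# Turn each string into a dict without executing arbitrary code.
parsed = df['latlng'].map(ast.literal_eval)

# One column per key, then convert the string values to floats.
coords = pd.DataFrame(parsed.tolist(), index=df.index).astype(float)

# Attach the new columns to the original frame.
df = df.join(coords)
print(df[['name', 'latitude', 'longitude']])
```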
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
Considering two dataframes like the ones below:
import pandas as pd
df = pd.DataFrame({'id_emp' : [1,2,3,4,5],
'name_emp': ['Cristiano', 'Gaúcho', 'Fenômeno','Angelin', 'Souza']})
df2 = pd.DataFrame({'id_emp': [1,2,3,6,7],
'name_emp': ['Cristiano', 'Gaúcho', 'Fenômeno', 'Kaká', 'Sérgio'],
'Description': ['Forward', 'Middle', 'Forward', 'back', 'winger']})
I have to create a third dataframe by combining them. I need to compare the id_emp values of the two dataframes; where they match, the third dataframe should receive the columns name_emp and Description in addition to id_emp. The expected output is as follows:
id_emp|name_emp|Description
1 |Cristiano|Forward
2 |Gaúcho |Middle
3 |Fenômeno |Forward
All you need is merge:
df.merge(df2)
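With no arguments, merge performs an inner join on all columns the two frames share (here id_emp and name_emp). A sketch that spells those defaults out, with df3 as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'id_emp': [1, 2, 3, 4, 5],
                   'name_emp': ['Cristiano', 'Gaúcho', 'Fenômeno', 'Angelin', 'Souza']})
df2 = pd.DataFrame({'id_emp': [1, 2, 3, 6, 7],
                    'name_emp': ['Cristiano', 'Gaúcho', 'Fenômeno', 'Kaká', 'Sérgio'],
                    'Description': ['Forward', 'Middle', 'Forward', 'back', 'winger']})

# An inner join keeps only the key combinations present in both frames.
df3 = df.merge(df2, on=['id_emp', 'name_emp'], how='inner')
print(df3)
```

Joining on id_emp alone would also keep the three matching rows, but pandas would then produce name_emp_x and name_emp_y because the name column exists in both frames.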
This question already has answers here:
Select columns using pandas dataframe.query()
(5 answers)
Closed 4 years ago.
I'm trying to use query on a MultiIndex column. It works on a MultiIndex row, but not the column. Is there a reason for this? The documentation shows examples like the first one below, but it doesn't indicate that it won't work for a MultiIndex column.
I know there are other ways to do this, but I'm specifically trying to do it with the query function.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((4,4)))
df.index = pd.MultiIndex.from_product([[1,2],['A','B']])
df.index.names = ['RowInd1', 'RowInd2']
# This works
print(df.query('RowInd2 in ["A"]'))
df = pd.DataFrame(np.random.random((4,4)))
df.columns = pd.MultiIndex.from_product([[1,2],['A','B']])
df.columns.names = ['ColInd1', 'ColInd2']
# query on index works, but not on the multiindexed column
print(df.query('index < 2'))
print(df.query('ColInd2 in ["A"]'))
To answer my own question, it looks like query shouldn't be used at all (regardless of using MultiIndex columns) for selecting certain columns, based on the answer(s) here:
Select columns using pandas dataframe.query()
You can use IndexSlice to select the columns; query only filters rows:
df.query('ilevel_0>2')
Out[327]:
ColInd1         1                  2
ColInd2         A         B        A         B
3        0.652576  0.639522  0.52087  0.446931
df.loc[:,pd.IndexSlice[:,'A']]
Out[328]:
ColInd1         1         2
ColInd2         A         A
0        0.092394  0.427668
1        0.326748  0.383632
2        0.717328  0.354294
3        0.652576  0.520870
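As an alternative to positional slicing, the column level can be addressed by name. This is a sketch rather than a replacement for query; the names only_a, mask and same are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((4, 4)))
df.columns = pd.MultiIndex.from_product([[1, 2], ['A', 'B']])
df.columns.names = ['ColInd1', 'ColInd2']

# Cross-section on the named column level; drop_level=False keeps both levels.
only_a = df.xs('A', axis=1, level='ColInd2', drop_level=False)

# The same selection written as a boolean mask over the level values.
mask = df.columns.get_level_values('ColInd2') == 'A'
same = df.loc[:, mask]
print(only_a)
```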