How to convert 'SeriesGroupBy' object to list - python

I have a dataframe with several columns that I am filtering, and I want to use the result to filter another dataframe. To do that I need the group by result as a plain list of values, but I am having a hard time converting the SeriesGroupBy object to a list. I am using:
id_list = df[df['date_diff'] == pd.Timedelta('0 days')].groupby('id')['id'].tolist()
I've tried reset_index(), to_frame() and .values before to_list(), with no luck.
The error is:
'SeriesGroupBy' object has no attribute tolist
Expected output: simply a list of ids.

Try:
id_list = df[df['date_diff'] == pd.Timedelta('0 days')].groupby('id')['id'].apply(list)
Also, I am a bit skeptical about how you are comparing df['date_diff'] with the groupby output.
EDIT: This might be useful for your intended purpose (s might be output of your groupby):
s = pd.Series([['a','a','b'],['b','b','c','d'],['a','b','e']])
s.explode().unique().tolist()
['a', 'b', 'c', 'd', 'e']
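A minimal sketch of both routes, assuming a small made-up dataframe with the question's column names (the values are hypothetical). It shows what apply(list) returns per group, and one way to get a flat list of ids without a groupby at all:

```python
import pandas as pd

# Hypothetical data: column names follow the question, values are made up.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],
    "date_diff": pd.to_timedelta([0, 1, 0, 0, 0], unit="D"),
})

mask = df["date_diff"] == pd.Timedelta("0 days")

# One list of values per group (a Series indexed by id).
per_group = df[mask].groupby("id")["id"].apply(list)

# A flat list of the distinct ids that passed the filter.
id_list = df.loc[mask, "id"].unique().tolist()
```

If the goal is only to filter another dataframe, the flat list can be passed to isin, for example other_df[other_df['id'].isin(id_list)], where other_df is a placeholder name.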

Related

Python - Change the original dataframe object from a dictionary?

I'm trying to perform a number of operations on a list of dataframes. I've opted to use a dictionary to help me with this process, but I was wondering if it's possible to reference the originally created dataframe with the changes.
So using the below code as an example, is it possible to call the dfA object with the columns ['a', 'b', 'c'] that were added when it was nested within the dictionary object?
dfA = pd.DataFrame(data=[1], columns=['x'])
dfB = pd.DataFrame(data=[1], columns=['y'])
dfC = pd.DataFrame(data=[1], columns=['z'])
dfdict = {'A': dfA,
          'B': dfB,
          'C': dfC}
df_dummy = pd.DataFrame(data=[[1,2,3]], columns=['a', 'b', 'c'])
for key in dfdict:
    dfdict[str(key)] = pd.concat([dfdict[str(key)], df_dummy], axis=1)
After the loop, the dfA you created initially and the DataFrame stored under 'A' in your dictionary are two different objects, because pd.concat returns a new DataFrame rather than modifying its inputs. (You can confirm this by running dfA is dfdict['A'] or id(dfA) == id(dfdict['A']), both of which should return False.)
To access the second (newly created) object you need to call it from the dictionary.
dfdict['A']
Or:
dfdict.get('A')
The returned DataFrame will have the new columns you added.
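The identity check can be sketched as follows, using a cut-down version of the question's setup with a single dataframe. The names same_before, same_after, original_columns and new_columns are only there for illustration:

```python
import pandas as pd

dfA = pd.DataFrame(data=[1], columns=["x"])
dfdict = {"A": dfA}
df_dummy = pd.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "c"])

# Before the loop, the dictionary value and dfA are the same object.
same_before = dfdict["A"] is dfA

# pd.concat returns a new DataFrame; assigning it rebinds the dictionary key only.
for key in dfdict:
    dfdict[key] = pd.concat([dfdict[key], df_dummy], axis=1)

same_after = dfdict["A"] is dfA
original_columns = list(dfA.columns)
new_columns = list(dfdict["A"].columns)

# Optionally rebind the name so it points at the updated frame.
dfA = dfdict["A"]
```

Rebinding the name at the end is one option if the rest of the code should keep using dfA.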

Convert pandas.core.groupby.SeriesGroupBy to a DataFrame

This question didn't have a satisfactory answer, so I'm asking it again.
Suppose I have the following Pandas DataFrame:
df1 = pd.DataFrame({'group': ['a', 'a', 'b', 'b'], 'values': [1, 1, 2, 2]})
I group by the first column 'group':
g1 = df1.groupby('group')
I've now created a "DataFrameGroupBy". Then I extract the first column from the GroupBy object:
g1_1st_column = g1['group']
The type of g1_1st_column is "pandas.core.groupby.SeriesGroupBy". Notice it's not a "DataFrameGroupBy" anymore.
My question is, how can I convert the SeriesGroupBy object back to a DataFrame object? I tried using the .to_frame() method, and got the following error:
g1_1st_column = g1['group'].to_frame()
AttributeError: Cannot access callable attribute 'to_frame' of 'SeriesGroupBy' objects, try using the 'apply' method.
How would I use the apply method, or some other method, to convert to a DataFrame?
Manish Saraswat kindly answered my question in the comments.
g1['group'].apply(pd.DataFrame)
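As an alternative sketch that does not rely on apply, a SeriesGroupBy can be iterated to get one Series per group, and each Series does have to_frame(). This example selects the 'values' column rather than 'group', which is an assumption made so the result is easier to read:

```python
import pandas as pd

df1 = pd.DataFrame({"group": ["a", "a", "b", "b"], "values": [1, 1, 2, 2]})
g1 = df1.groupby("group")

# Iterating a SeriesGroupBy yields (group key, Series) pairs.
frames = {name: series.to_frame() for name, series in g1["values"]}

# Stitch the per-group frames back together; the dict keys become an outer index level.
combined = pd.concat(frames)
```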

How to change column names in pandas Dataframe using a list of names?

I have been trying to change the column names of a pandas dataframe using a list of names. The following code is being used:
df.rename(columns = list_of_names, inplace=True)
However, I get a TypeError each time, with an error message that says "list object is not callable".
Why does this happen, and what can I do to solve this problem?
Thank you for your help.
You could use:
df.columns = ['Leader', 'Time', 'Score']
If you need rename (where l is the list of new names), pass it a mapping from old names to new names and keep the result:
df = df.rename(columns=dict(zip(df.columns, l)))
Just update the columns attribute:
df.columns = list_of_names
set_axis
To set column names, use set_axis along axis=1 or axis='columns':
df = df.set_axis(list_of_names, axis=1)
Note that the default axis=0 sets the index (row) labels instead.
Why not just modify df.columns directly?
The accepted answer is fine and is used often, but set_axis has some advantages:
set_axis allows method chaining:
df.some_method().set_axis(list_of_names, axis=1).another_method()
vs:
df = df.some_method()
df.columns = list_of_names
df.another_method()
set_axis should theoretically provide better error checking than directly modifying an attribute, though I can't find a specific example at the moment.
If your list is column_list = ['a', 'b', 'c'] and the original df.columns is ['X', 'Y', 'Z'], you just need:
df.columns = column_list
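The three approaches from the answers can be compared side by side. This is a small sketch on a made-up one-row dataframe; rename accepts a dict-like mapping or a function for columns, which is why a bare list fails there:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["X", "Y", "Z"])
list_of_names = ["a", "b", "c"]

# rename expects a dict-like mapping (or a function), so pair old names with new ones.
renamed = df.rename(columns=dict(zip(df.columns, list_of_names)))

# set_axis returns a new DataFrame and can sit in a method chain.
chained = df.set_axis(list_of_names, axis=1)

# Assigning to .columns changes df in place.
df.columns = list_of_names
```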

Renaming columns on DataFrame output of pandas.concat

I'm constructing a new DataFrame by concatenating the columns of other DataFrames, like so:
pairs = pd.concat([pos1['Close'], pos2['Close'], pos3['Close'], pos4['Close'], pos5['Close'],
                   pos6['Close'], pos7['Close']], axis=1)
I want to rename all of the columns of the pairs DataFrame to the symbols of the underlying securities. Is there a way to do this during the concat method call? Reading through the docs on the method here http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.concat.html didn't give me a solid answer.
You can achieve the same in one go using the keys parameter:
pairs = pd.concat([pos1['Close'], pos2['Close'], pos3['Close'], pos4['Close'], pos5['Close'], pos6['Close'], pos7['Close']],
                  axis=1, keys=['JPM', 'WFC', 'BAC', 'C', 'STI', 'PNC', 'CMA'])
This is the approach I'm taking. Seems to fit all my requirements.
symbols = ['JPM', 'WFC', 'BAC', 'C', 'STI', 'PNC', 'CMA']
pairs.columns = symbols
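A runnable sketch of the keys approach, using three small made-up frames in place of the question's pos1 through pos7 (the prices are hypothetical):

```python
import pandas as pd

# Hypothetical price frames standing in for pos1, pos2, ...; the numbers are made up.
pos1 = pd.DataFrame({"Close": [10.0, 11.0]})
pos2 = pd.DataFrame({"Close": [20.0, 21.0]})
pos3 = pd.DataFrame({"Close": [30.0, 31.0]})

symbols = ["JPM", "WFC", "BAC"]
frames = [pos1, pos2, pos3]

# With Series inputs and axis=1, keys become the column labels.
pairs = pd.concat([frame["Close"] for frame in frames], axis=1, keys=symbols)
```

Keeping the symbols and frames in parallel lists also avoids repeating ['Close'] for every input.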

Pandas dataframe column selection

I am using Pandas to select columns from a dataframe, olddf. Let's say the variable names are 'a', 'b', 'c', 'startswith1', 'startswith2', 'startswith3', ..., 'startswith10'.
My approach was to create a list of all variables with a common starting value.
filter_col = [col for col in list(olddf) if col.startswith('startswith')]
I'd like to then select columns within that list as well as others, by name, so I don't have to type them all out. However, this doesn't work:
newdf = olddf['a','b',filter_col]
And this doesn't either:
newdf = olddf[['a','b'],filter_col]
I'm a newbie so this is probably pretty simple. Is the reason this doesn't work because I'm mixing a list improperly?
Thanks.
Use
newdf = olddf[['a','b']+filter_col]
since adding lists concatenates them:
In [264]: ['a', 'b'] + ['startswith1']
Out[264]: ['a', 'b', 'startswith1']
Alternatively, you could use the filter method:
newdf = olddf.filter(regex=r'^(startswith|[ab])')
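Both options can be tried on a small made-up dataframe. In this sketch the regex is a slightly stricter variant of the one above (an assumption on my part, not the answer's pattern): it matches 'a' and 'b' exactly, so a column such as 'abc' would not be picked up by accident.

```python
import pandas as pd

olddf = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["a", "b", "c", "startswith1", "startswith2"],
)

# Option 1: build the prefix list, then concatenate it with the explicit names.
filter_col = [col for col in olddf.columns if col.startswith("startswith")]
by_list = olddf[["a", "b"] + filter_col]

# Option 2: anchor the pattern with $ so only 'a', 'b' and the prefixed columns match.
by_regex = olddf.filter(regex=r"^(startswith.*|a|b)$")
```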
