This question already has answers here:
Pandas: drop a level from a multi-level column index?
(7 answers)
Closed 3 years ago.
I have a dataframe that has a redundant set of columns that I would like to get rid of. My actual use case is a bit convoluted, but the essence can be captured in the following:
import pandas as pd
my_frame = pd.DataFrame(data={'a': [1, 1, 3], 'b': [7, 8, 9], 'c': [4, 5, 6],
                              'subcolumn_1': ['A1', 'A2', 'A3'],
                              'subcolumn_2': ['B1', 'B2', 'B3']})
my_frame.set_index(keys=['subcolumn_1', 'subcolumn_2'], inplace=True)
my_frame.transpose()
i.e.
subcolumn_1  A1  A2  A3
subcolumn_2  B1  B2  B3
a             1   1   3
b             7   8   9
c             4   5   6
I would like to delete subcolumn_2. However, I cannot do so via the standard method (e.g. by a drop) because subcolumn_2 is a column header, not an actual row.
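Try droplevel. Note that transpose() returns a new frame rather than modifying my_frame, so assign the result first; only then do the columns form the two-level header:
my_frame = my_frame.transpose()
my_frame.columns = my_frame.columns.droplevel(1)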
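A self-contained sketch of the same idea, for anyone who wants to run it end to end. It rebuilds the frame from the question and drops the level by name instead of by position. The variable name `wide` is an assumption introduced here and does not appear in the original post.

```python
import pandas as pd

my_frame = pd.DataFrame(data={'a': [1, 1, 3], 'b': [7, 8, 9], 'c': [4, 5, 6],
                              'subcolumn_1': ['A1', 'A2', 'A3'],
                              'subcolumn_2': ['B1', 'B2', 'B3']})

# `wide` is a hypothetical name: the transposed frame with a two-level column header.
wide = my_frame.set_index(['subcolumn_1', 'subcolumn_2']).transpose()

# Drop the unwanted header level by its name; only subcolumn_1 remains.
wide.columns = wide.columns.droplevel('subcolumn_2')

print(wide)
```

Dropping by name avoids having to remember which position the level occupies.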
This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Split (explode) pandas dataframe string entry to separate rows
(27 answers)
Closed 3 years ago.
Let's assume that I have a pandas DataFrame whose column A contains n-dimensional vectors. I would like to split this column into multiple columns. Basically, my dataset looks like:
A B C
[1,0,2,3,5] ... ...
[4,5,3,2,1] ... ...
.........................
And I want to have :
A0 A1 A2 A3 A4 B C
1 0 2 3 5 ... ...
4 5 3 2 1 ... ...
.......................
I can solve this problem by using apply function and for loops, I think. But, I imagine that there exists a better (faster, easier to read, ...) way to do so.
Edit: My post gets marked as duplicate. But the given answers have a solution which leads to more rows. I want more columns as shown above.
Thanks,
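One possible approach, sketched under the assumption that every entry of A is a Python list of the same length: build a new frame from the lists and join it back to the remaining columns. The values used for B and C below are placeholders, since the question elides them.

```python
import pandas as pd

# Placeholder values for B and C; the question only shows "..." for them.
df = pd.DataFrame({
    "A": [[1, 0, 2, 3, 5], [4, 5, 3, 2, 1]],
    "B": ["x", "y"],
    "C": [10, 20],
})

# Turn each list into a row of its own frame, keeping the original index,
# then label the new columns A0, A1, ...
expanded = pd.DataFrame(df["A"].tolist(), index=df.index).add_prefix("A")

# Put the expanded columns in front of the untouched ones.
result = pd.concat([expanded, df.drop(columns="A")], axis=1)

print(result)
```

This widens the frame into more columns and leaves the row count unchanged, which is the distinction the edit in the question draws against explode-style answers.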
This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
Given a Pandas Dataframe df, with column names 'Session', and 'List':
Can I group together the 'List' values for the same values of 'Session'?
My Approach
I've tried solving the problem by creating a new dataframe and iterating through the rows of the initial dataframe while maintaining a session counter that I increment whenever the session changes.
If it hasn't changed, I append the List value for that row, followed by a comma.
Whenever the session changes, I use strip to get rid of the extra trailing comma.
Initial DataFrame
Session List
0 1 a
1 1 b
2 1 c
3 2 d
4 2 e
5 3 f
Required DataFrame
Session List
0 1 a,b,c
1 2 d,e
2 3 f
Can someone suggest something more efficient or simple?
Thank you in advance.
Use groupby, agg, and reset_index:
>>> df.groupby('Session')['List'].agg(','.join).reset_index()
   Session   List
0        1  a,b,c
1        2    d,e
2        3      f
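For anyone who wants to run this outside a session that already has df defined, here is a minimal sketch that rebuilds the sample frame from the question first. It assumes the List column holds strings; `','.join` would raise a TypeError on non-string values, in which case a conversion such as `.astype(str)` beforehand would be needed.

```python
import pandas as pd

df = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 3],
    "List": ["a", "b", "c", "d", "e", "f"],
})

# Join the List values within each Session, then restore Session as a column.
out = df.groupby("Session")["List"].agg(",".join).reset_index()

print(out)
```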
This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 4 years ago.
I have a dataframe df
a b c
0 5 6 9
1 6 7 10
2 7 8 11
3 8 9 12
So if I want to select only columns a and b and store them in another df, I would use something like this:
df1 = df[['a','b']]
But I have seen places where people write it this way
df1 = df[['a','b']].copy()
Can anyone explain what .copy() does here, given that the earlier code works just fine?
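For example, suppose you give a dataframe a second name and then modify it in place (example using replace with inplace=True):
df2 = df
df2.replace('blah', 'foo', inplace=True)
Plain assignment does not copy anything; df2 is just another name for the same object, so df is changed as well. Here:
df.equals(df2)
Will be:
True
If you want the change to apply only to df2, make a copy first:
df2 = df.copy()
df2.replace('blah', 'foo', inplace=True)
Then now (assuming df actually contains 'blah'):
df.equals(df2)
Returns:
False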
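Applied to the column selection from the question, a minimal sketch of the difference might look like this. The names `alias` and `subset` are assumptions introduced for illustration. In pandas versions without copy-on-write enabled, assigning into a selection made without .copy() may also emit a SettingWithCopyWarning, which the explicit copy avoids.

```python
import pandas as pd

df = pd.DataFrame({"a": [5, 6, 7, 8], "b": [6, 7, 8, 9], "c": [9, 10, 11, 12]})

alias = df                      # plain assignment: a second name for the same object
subset = df[["a", "b"]].copy()  # explicit copy: an independent frame

subset.loc[0, "a"] = 100        # edit the copy only

print(alias is df)         # True: no data was copied by the assignment
print(df.loc[0, "a"])      # 5: the original is untouched
print(subset.loc[0, "a"])  # 100: only the copy changed
```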
This question already has answers here:
Pandas filtering for multiple substrings in series
(3 answers)
Closed 4 years ago.
I tried
df = df[~df['event.properties.comment'].isin(['Extra'])]
Problem is it would just drop the row if the column contains exactly 'Extra' and I need to drop the ones that contain it even as a substring.
Any help?
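Use str.contains to test for the substring and negate the resulting mask with ~ to drop the matching rows. Filling NaN with '' first keeps missing values from breaking the boolean mask. To check several substrings at once, join them with | in the pattern, which acts as an "or" condition.
Consider this df: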
vals ids
0 1 ~
1 2 bball
2 3 NaN
3 4 Extra text
df[~df.ids.fillna('').str.contains('Extra')]
Out:
vals ids
0 1 ~
1 2 bball
2 3 NaN
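A runnable sketch of the same filter, extended to more than one substring. The second substring, "Bonus", is hypothetical and only there to show the "or" pattern; `na=False` is used here as an alternative to `fillna('')` for treating missing values as non-matches.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "vals": [1, 2, 3, 4],
    "ids": ["~", "bball", np.nan, "Extra text"],
})

# "Bonus" is a made-up second substring for illustration.
unwanted = ["Extra", "Bonus"]

# Escape each substring so it is matched literally, then join with "|" (regex "or").
pattern = "|".join(re.escape(s) for s in unwanted)

# na=False makes missing values count as "does not contain".
mask = df["ids"].str.contains(pattern, na=False)
filtered = df[~mask]

print(filtered)
```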
This question already has answers here:
Remap values in pandas column with a dict, preserve NaNs
(11 answers)
Replace values in a pandas series via dictionary efficiently
(1 answer)
Closed 4 years ago.
I have a dataframe where I need to change row values if they are present in a dictionary like this:
dict = {"A": "Apple", "B": "Ball", "C": "Cat"}
c1 c2 c3
0 A Tree GH
1 B Train GC
2 C Yarn GR
I want the values in column c1 to be replaced from the dict whenever they are present as keys.
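One way this could be done, as a sketch: map the column through the dictionary and fall back to the original value wherever there is no matching key. The dictionary is named `mapping` here to avoid shadowing the built-in dict, and the fourth row ("D") is a hypothetical addition to show what happens to a value that is not in the dictionary.

```python
import pandas as pd

mapping = {"A": "Apple", "B": "Ball", "C": "Cat"}

# The last row is hypothetical; it is not part of the data in the question.
df = pd.DataFrame({
    "c1": ["A", "B", "C", "D"],
    "c2": ["Tree", "Train", "Yarn", "Boat"],
    "c3": ["GH", "GC", "GR", "GX"],
})

# map() yields NaN for keys missing from the dictionary;
# fillna() then restores the original value in those positions.
df["c1"] = df["c1"].map(mapping).fillna(df["c1"])

print(df)
```

`df["c1"].replace(mapping)` should give the same result in one call; the map-plus-fillna form is often suggested for large columns, though any speed difference depends on the data.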