How to pivot tables with duplicate entries? Order matters [duplicate] - python

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 6 months ago.
I have a Pandas DataFrame in Python such as this:
Group Pre/post Value
0 A Pre 3
1 A Pre 5
2 A Post 13
3 A Post 15
4 B Pre 7
5 B Pre 8
6 B Post 17
7 B Post 18
And I'd like to turn it into a different table such as:
Group Pre Post
0 A 3 13
1 A 5 15
2 B 7 17
3 B 8 18
I tried pivoting with df.pivot(index='Group', columns='Pre/post', values='Value'), but since I have repeated values and order is important, it raised an error about duplicate index entries.

Here is one way to do it: use list as the aggfunc in pivot_table to collect the duplicate values for each index/column pair into a list, then use explode to split each list back into multiple rows.
(df.pivot_table(index='Group', columns='Pre/post', values='Value', aggfunc=list)
   .reset_index()
   .explode(['Post', 'Pre'], ignore_index=True))
Pre/post Group Post Pre
0 A 13 3
1 A 15 5
2 B 17 7
3 B 18 8
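For completeness, here is a self-contained sketch of the same approach (assuming pandas >= 1.3, since exploding multiple columns at once needs it); the trailing column selection simply restores the Group/Pre/Post order from the question:

import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Pre/post': ['Pre', 'Pre', 'Post', 'Post'] * 2,
                   'Value': [3, 5, 13, 15, 7, 8, 17, 18]})

# Collect duplicates into lists, then explode the lists back into rows.
out = (df.pivot_table(index='Group', columns='Pre/post', values='Value', aggfunc=list)
         .reset_index()
         .explode(['Post', 'Pre'], ignore_index=True)
         [['Group', 'Pre', 'Post']])   # restore the desired column order
print(out)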

Is there a case sensitive method to filter columns in a dataframe by header? [duplicate]

This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I have a dataframe with multiple columns and different headers.
I want to filter the dataframe to keep only the columns that start with the letter I. Some of my column headers have the letter i but start with a different letter.
Is there a way to do this?
I tried using df.filter but for some reason, it's not case sensitive.
You can use df.filter with the regex parameter:
df.filter(regex=r'(?i)^i')
This will return the columns starting with i, ignoring case.
Example below. Let's consider the input dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 20, (5, 4)),
                  columns=['itest', 'Itest', 'another', 'anothericol'])
print(df)
itest Itest another anothericol
0 1 4 14 17
1 17 10 14 1
2 16 18 10 7
3 10 12 17 14
4 6 15 17 19
With df.filter:
print(df.filter(regex=r'(?i)^i'))
itest Itest
0 1 4
1 17 10
2 16 18
3 10 12
4 6 15
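Note that df.filter's regex is case sensitive by default, so if you only want the columns starting with a capital I, dropping the (?i) flag is enough. A minimal sketch of that, plus an equivalent boolean-mask approach on the column labels:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 20, (5, 4)),
                  columns=['itest', 'Itest', 'another', 'anothericol'])

# Case-sensitive: only columns whose name starts with a capital I.
print(df.filter(regex=r'^I'))

# Equivalent without a regex, masking on the column labels.
print(df.loc[:, df.columns.str.startswith('I')])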

pandas pivot table and aggregate [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
What I have is the following:
test_df = pd.DataFrame({"index":[1,2,3,1],"columns":[5,6,7,5],"values":[9,9,9,9]})
index columns values
0 1 5 9
1 2 6 9
2 3 7 9
3 1 5 9
I would like the following: the index column as my index, the columns column as the columns, and the values aggregated in their respective cells, like this:
5 6 7
1 18 nan nan
2 nan 9 nan
3 nan nan 9
thank you!!
EDIT: sorry, I made a mistake. The value columns are also categorical, and I need their individual values. So instead of 18 it should be something like [9: 2, 10: 0, 11: 0] (assuming the possible categorical values are 9, 10 and 11).
What about this:
test_df.pivot_table(values='values', index='index', columns='columns', aggfunc='sum')
Also: this is covered in the manual here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html. I suspect you need to read up on the aggfunc parameter.
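Regarding the EDIT: if you need a count per category rather than a sum, one option (not part of the answer above) is pd.crosstab, which counts how often each values category occurs in every index/columns cell:

import pandas as pd

test_df = pd.DataFrame({"index": [1, 2, 3, 1],
                        "columns": [5, 6, 7, 5],
                        "values": [9, 9, 9, 9]})

# Rows are (index, values) pairs, columns are the 'columns' categories,
# and each cell holds the number of occurrences.
counts = pd.crosstab(index=[test_df["index"], test_df["values"]],
                     columns=test_df["columns"])
print(counts)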

Pandas merging columns by reverse complement string

So I am stuck on how to approach a data manipulation problem in pandas. I have an example dataframe below with a sum of 25 counts in each row.
I would like to merge columns whose names are reverse complements of each other.
AA CC GG AT TT
4 7 0 9 5
3 8 5 5 2
8 6 2 8 1
The columns "AA" and "TT" are reverse complements of each other, as are "CC" and "GG":
AA/TT CC/GG AT
9 7 9
5 13 5
9 8 8
How can I match the reverse complement of a column name and merge it with the corresponding column?
Note: I already have a function to find the reverse complement of a string.
I'd suggest just creating a new frame using pd.concat:
new_df = pd.concat([df[['AA', 'TT']].sum(axis=1).rename('AA/TT'),
                    df[['CC', 'GG']].sum(axis=1).rename('CC/GG'),
                    df['AT']], axis=1)
>>> new_df
AA/TT CC/GG AT
0 9 7 9
1 5 13 5
2 9 8 8
More generally, you could do it with a list comprehension. Given the reverse complements:
reverse_compliments = [['AA','TT'], ['CC','GG']]
Find the columns of your original dataframe that are not in reverse_compliments (there might be a better way here, but this works):
import numpy as np

reverse_compliments.append(df.columns.difference(
    np.array(reverse_compliments).flatten()))
And use pd.concat with a list comprehension:
new_df = pd.concat([df[x].sum(axis=1).rename('/'.join(x)) for x in reverse_compliments],
                   axis=1)
>>> new_df
AA/TT CC/GG AT
0 9 7 9
1 5 13 5
2 9 8 8
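Since you say you already have a reverse-complement function, you can also build the column groups automatically instead of hard-coding them. A sketch under that assumption (revcomp here is just a stand-in for your own function):

import pandas as pd

def revcomp(seq):
    # Stand-in for the reverse-complement function you already have.
    comp = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(comp[base] for base in reversed(seq))

df = pd.DataFrame({'AA': [4, 3, 8], 'CC': [7, 8, 6], 'GG': [0, 5, 2],
                   'AT': [9, 5, 8], 'TT': [5, 2, 1]})

# Pair each column with its reverse complement; palindromes like 'AT' stay alone.
groups, seen = [], set()
for col in df.columns:
    if col in seen:
        continue
    rc = revcomp(col)
    group = [col, rc] if rc in df.columns and rc != col else [col]
    seen.update(group)
    groups.append(group)

new_df = pd.concat([df[g].sum(axis=1).rename('/'.join(g)) for g in groups], axis=1)
print(new_df)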

Is there an easy way to eliminate duplicate rows in a DataFrame in Python- pandas? [duplicate]

This question already has answers here:
Find unique rows in numpy.array
(20 answers)
Closed 5 years ago.
My problem is that my data isn't a good representation of what is really going on, because it has a lot of duplicate rows. Consider the following:
a b
1 23 42
2 23 42
3 23 42
4 14 12
5 14 12
I only want one of each duplicated row and want to eliminate the rest. It should look like the following after it's done:
a b
1 23 42
2 14 12
Is there a function to do this?
Let's use drop_duplicates with keep='first':
df2.drop_duplicates(keep='first')
Output:
a b
1 23 42
4 14 12
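A small self-contained sketch: keep='first' is already the default, and an optional reset_index(drop=True) renumbers the surviving rows if you want a clean 0..n-1 index like in the expected output:

import pandas as pd

df = pd.DataFrame({'a': [23, 23, 23, 14, 14],
                   'b': [42, 42, 42, 12, 12]})

deduped = df.drop_duplicates(keep='first')   # keeps the first row of each duplicate group
print(deduped.reset_index(drop=True))        # optional: renumber the remaining rows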

Pandas how to delete alternate rows [duplicate]

This question already has answers here:
Deleting DataFrame row in Pandas based on column value
(18 answers)
Closed 7 years ago.
I have a pandas dataframe with duplicate ids. Below is my dataframe
id nbr type count
7 21 High 4
7 21 Low 6
8 39 High 2
8 39 Low 3
9 13 High 5
9 13 Low 7
How do I delete only the rows having the type Low?
You can also just slice your df using iloc:
df.iloc[::2]
This steps through the frame two rows at a time, which keeps only the High rows here because they strictly alternate with the Low rows.
You can try it this way:
df = df[df.type != "Low"]
Another possible solution is to use drop_duplicates:
df = df.drop_duplicates('nbr')
print(df)
id nbr type count
0 7 21 High 4
2 8 39 High 2
4 9 13 High 5
You can also do:
df.drop_duplicates('nbr', inplace=True)
That way you don't have to reassign it.
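Of the three options, the boolean mask is the only one that does not rely on the High/Low rows strictly alternating or on nbr being unique. A quick sketch on the sample data:

import pandas as pd

df = pd.DataFrame({'id': [7, 7, 8, 8, 9, 9],
                   'nbr': [21, 21, 39, 39, 13, 13],
                   'type': ['High', 'Low'] * 3,
                   'count': [4, 6, 2, 3, 5, 7]})

# Keep every row whose type is not "Low", regardless of row order.
print(df[df['type'] != 'Low'])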
