Pandas how to delete alternate rows [duplicate] - python

This question already has answers here:
Deleting DataFrame row in Pandas based on column value
(18 answers)
Closed 7 years ago.
I have a pandas dataframe with duplicate ids. Below is my dataframe
id nbr type count
7 21 High 4
7 21 Low 6
8 39 High 2
8 39 Low 3
9 13 High 5
9 13 Low 7
How can I delete only the rows having the type Low?

You can also just slice your df using iloc:
df.iloc[::2]
This keeps every second row, starting from the first (it relies on the High and Low rows strictly alternating, as in the sample data).
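A minimal runnable sketch, rebuilding the question's dataframe, showing which rows the slice keeps:

```python
import pandas as pd

# Rebuild the question's dataframe (assumes rows strictly
# alternate High/Low, as in the posted example)
df = pd.DataFrame({
    "id":    [7, 7, 8, 8, 9, 9],
    "nbr":   [21, 21, 39, 39, 13, 13],
    "type":  ["High", "Low"] * 3,
    "count": [4, 6, 2, 3, 5, 7],
})

# iloc[::2] keeps positions 0, 2, 4, ... - here, the "High" rows
print(df.iloc[::2])
```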

You can try this way :
df = df[df.type != "Low"]

Another possible solution is to use drop_duplicates
df = df.drop_duplicates('nbr')
print(df)
id nbr type count
0 7 21 High 4
2 8 39 High 2
4 9 13 High 5
You can also do:
df.drop_duplicates('nbr', inplace=True)
That way you don't have to reassign it.
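An end-to-end sketch, rebuilding the question's data, showing that on this input the boolean filter and drop_duplicates keep the same rows (the "High" row appears first for each nbr):

```python
import pandas as pd

# Rebuild the question's dataframe
df = pd.DataFrame({
    "id":    [7, 7, 8, 8, 9, 9],
    "nbr":   [21, 21, 39, 39, 13, 13],
    "type":  ["High", "Low", "High", "Low", "High", "Low"],
    "count": [4, 6, 2, 3, 5, 7],
})

# Boolean filter: keeps every row whose type is not "Low"
filtered = df[df.type != "Low"]

# drop_duplicates: keeps the first row for each nbr, which here
# is always the "High" row because it comes first
deduped = df.drop_duplicates("nbr")

print(filtered.equals(deduped))  # True
```

Note the boolean filter expresses the intent directly ("drop Low"), while drop_duplicates only coincides with it because of the row ordering.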

How to pivot tables with duplicate entries? Order matters [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 6 months ago.
I have a Pandas DataFrame in Python such as this:
Group Pre/post Value
0 A Pre 3
1 A Pre 5
2 A Post 13
3 A Post 15
4 B Pre 7
5 B Pre 8
6 B Post 17
7 B Post 18
And I'd like to turn it into a different table such as:
Group Pre Post
0 A 3 13
1 A 5 15
2 B 7 17
3 B 8 18
I tried pivoting with df.pivot(index='Group', columns='Pre/post', values='Value') but since I have repeated values and order is important, it raised an error (pivot cannot reshape when the index/column pairs contain duplicates).
Here is one way to do it: use list as the aggfunc in pivot_table to collect the duplicated values for each index/column pair into a list, then use explode to split the lists back into multiple rows.
df.pivot_table(index='Group', columns='Pre/post', values='Value', aggfunc=list
).reset_index().explode(['Post','Pre'], ignore_index=True)
Pre/post Group Post Pre
0 A 13 3
1 A 15 5
2 B 17 7
3 B 18 8
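A self-contained sketch of the above, rebuilding the question's data (multi-column explode needs pandas 1.3 or newer):

```python
import pandas as pd

# Rebuild the question's dataframe
df = pd.DataFrame({
    "Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Pre/post": ["Pre", "Pre", "Post", "Post"] * 2,
    "Value": [3, 5, 13, 15, 7, 8, 17, 18],
})

# Collect the duplicate values into lists per (Group, Pre/post) pair,
# then explode both list columns back into aligned rows
out = (df.pivot_table(index="Group", columns="Pre/post",
                      values="Value", aggfunc=list)
         .reset_index()
         .explode(["Post", "Pre"], ignore_index=True))
print(out)
```

Exploding both columns in one call keeps the Pre and Post lists aligned row by row, which is what preserves the original ordering.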

DataFrame values frequency [duplicate]

This question already has answers here:
Count number of values in an entire DataFrame
(3 answers)
Closed 1 year ago.
I have a DataFrame and I want to find the value frequencies across the whole frame.
a b
0 5 7
1 7 8
2 5 7
The result should be like:
5 2
7 3
8 1
Use DataFrame.stack with Series.value_counts and Series.sort_index:
s = df.stack().value_counts().sort_index()
Or DataFrame.melt:
s = df.melt()['value'].value_counts().sort_index()
print (s)
5 2
7 3
8 1
Name: value, dtype: int64
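A runnable sketch of the stack approach, rebuilding the question's two-column frame:

```python
import pandas as pd

# Rebuild the question's dataframe
df = pd.DataFrame({"a": [5, 7, 5], "b": [7, 8, 7]})

# stack() flattens the whole frame into one Series,
# then value_counts tallies every value in it
s = df.stack().value_counts().sort_index()
print(s)
```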
A simple way is to put the data in a pd.Series and count the unique values:
import pandas as pd
# creating the series
s = pd.Series(data = [5,10,9,8,8,4,5,9,10,0,1])
# finding the unique count
print(s.value_counts())
output:
10 2
9 2
8 2
5 2
4 1
1 1
0 1

Is there a case sensitive method to filter columns in a dataframe by header? [duplicate]

This question already has answers here:
How to select all columns whose names start with X in a pandas DataFrame
(11 answers)
Closed 2 years ago.
I have a dataframe with multiple columns and different headers.
I want to filter the dataframe to keep only the columns that start with the letter I. Some of my column headers have the letter i but start with a different letter.
Is there a way to do this?
I tried using df.filter but for some reason, it's not case sensitive.
You can use df.filter with the regex parameter:
df.filter(regex=r'(?i)^i')
This returns the columns starting with i, ignoring case.
Example below:
Let's consider the input dataframe:
df = pd.DataFrame(np.random.randint(0,20,(5,4)),
columns=['itest','Itest','another','anothericol'])
print(df)
itest Itest another anothericol
0 1 4 14 17
1 17 10 14 1
2 16 18 10 7
3 10 12 17 14
4 6 15 17 19
With df.filter
print(df.filter(regex=r'(?i)^i'))
itest Itest
0 1 4
1 17 10
2 16 18
3 10 12
4 6 15
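Conversely, if you want the match to be strictly case sensitive (only headers starting with a capital I, say), drop the (?i) flag; a small sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 20, (5, 4)),
                  columns=["itest", "Itest", "another", "anothericol"])

# Without the (?i) flag the regex match is case sensitive
print(list(df.filter(regex=r"^I").columns))  # ['Itest']
print(list(df.filter(regex=r"^i").columns))  # ['itest']
```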

Adding an entire column's data below another column in pandas [duplicate]

This question already has answers here:
Convert columns into rows with Pandas
(6 answers)
Closed 2 years ago.
I have a dataframe like this:
time a b
0 10 20
1 11 21
Now I need a dataframe like this:
time a
0 10
1 11
0 20
1 21
This can be done with melt:
df.melt('time', value_name='a').drop('variable', axis=1)
Output:
time a
0 0 10
1 1 11
2 0 20
3 1 21
Or, if you have columns other than a and b in your data, pass the value columns explicitly:
df.melt('time', ['a','b'], value_name='a').drop('variable', axis=1)

pandas pivot table and aggregate [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
What I have is the following:
test_df = pd.DataFrame({"index":[1,2,3,1],"columns":[5,6,7,5],"values":[9,9,9,9]})
index columns values
0 1 5 9
1 2 6 9
2 3 7 9
3 1 5 9
I would like the index column as my index, the columns column as the columns, and the values aggregated into their respective fields, like this:
5 6 7
1 18 nan nan
2 nan 9 nan
3 nan nan 9
thank you!!
EDIT: sorry, I made a mistake. The values column is also categorical, and I need the individual counts, so instead of 18 it should be something like [9:2,10:0,11:0] (assuming the possible categorical values are 9, 10, 11).
What about:
test_df.pivot_table(values='values', index='index', columns='columns', aggfunc='sum')
Also, this is covered in the manual: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html. I suspect you need to read more about the 'aggfunc' parameter.
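For the original (pre-edit) question, that call produces exactly the requested table; a runnable sketch:

```python
import pandas as pd

test_df = pd.DataFrame({"index": [1, 2, 3, 1],
                        "columns": [5, 6, 7, 5],
                        "values": [9, 9, 9, 9]})

# aggfunc='sum' adds up the duplicate (index, columns) pairs,
# so the two (1, 5) rows combine into 18; missing pairs become NaN
out = test_df.pivot_table(values="values", index="index",
                          columns="columns", aggfunc="sum")
print(out)
```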
