DataFrame values frequency [duplicate] - python

This question already has answers here:
Count number of values in an entire DataFrame
(3 answers)
Closed 1 year ago.
I have a DataFrame and want to count how often each value occurs across the whole frame.
   a  b
0  5  7
1  7  8
2  5  7
The result should be like:
5 2
7 3
8 1

Use DataFrame.stack with Series.value_counts and Series.sort_index:
s = df.stack().value_counts().sort_index()
Or DataFrame.melt:
s = df.melt()['value'].value_counts().sort_index()
print(s)
5 2
7 3
8 1
Name: value, dtype: int64
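For reference, here is a minimal self-contained sketch of the stack approach on the frame from the question. The names counts and freq are only illustrative, and the conversion to a dict is an addition to make the result easy to inspect:

```python
import pandas as pd

# The frame from the question.
df = pd.DataFrame({"a": [5, 7, 5], "b": [7, 8, 7]})

# stack() flattens all columns into a single Series, value_counts() tallies
# each distinct value, and sort_index() orders the result by value.
counts = df.stack().value_counts().sort_index()

# Convert to a plain dict of Python ints for easy inspection.
freq = {int(value): int(count) for value, count in counts.items()}
print(freq)  # {5: 2, 7: 3, 8: 1}
```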

A simple way is to build a pd.Series and call value_counts, which counts how often each unique value occurs:
import pandas as pd
# create the series
s = pd.Series(data=[5, 10, 9, 8, 8, 4, 5, 9, 10, 0, 1])
# count the occurrences of each unique value
print(s.value_counts())
Output:
10 2
9 2
8 2
5 2
4 1
1 1
0 1

Related

add column to dataframe with sequence of integers depending on another column [duplicate]

This question already has answers here:
Add a sequential counter column on groups to a pandas dataframe
(4 answers)
Add conditional counter: counter column based on value of other columns
(2 answers)
Closed 7 months ago.
df = pd.DataFrame({'A':[3,5,2,5,4,2,5,2,3,1,4,1], 'B':['x','y','x','x','y','z','z','x','y','y','x','z']})
I'd like to add a column C that, for each letter in B, contains sequential integers:
    A  B  C
0   3  x  1
1   5  y  1
2   2  x  2
3   5  x  3
4   4  y  2
5   2  z  1
6   5  z  2
7   2  x  4
8   3  y  3
9   1  y  4
10  4  x  5
11  1  z  3
You can group by B and use cumcount(), which numbers the rows within each group starting from 0, so add 1 to start at 1:
df = pd.DataFrame({'A':[3,5,2,5,4,2,5,2,3,1,4,1], 'B':['x','y','x','x','y','z','z','x','y','y','x','z']})
df['C'] = df.groupby('B').cumcount() + 1
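As a quick check, the following sketch compares the cumcount result with the column C requested in the question. The list expected_c is simply that column copied from the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [3, 5, 2, 5, 4, 2, 5, 2, 3, 1, 4, 1],
    "B": ["x", "y", "x", "x", "y", "z", "z", "x", "y", "y", "x", "z"],
})

# cumcount() is zero-based within each group, hence the + 1.
df["C"] = df.groupby("B").cumcount() + 1

# Column C as written in the question.
expected_c = [1, 1, 2, 3, 2, 1, 2, 4, 3, 4, 5, 3]
print(df["C"].tolist() == expected_c)  # True
```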

How to add all top cells in present cell of a column in dataframe [duplicate]

This question already has an answer here:
Cumsum as a new column in an existing Pandas data
(1 answer)
Closed 2 years ago.
For example, assume we have the dataframe below:
Num
0 2
1 4
2 1
3 5
4 3
The expected output, in a new column "sum", should be as below:
Num sum
0 2 2
1 4 6 (2+4)
2 1 7 (2+4+1)
3 5 12 (2+4+1+5)
4 3 15 (2+4+1+5+3)
This can be achieved using cumsum:
df['sum'] = df['Num'].cumsum()
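A minimal runnable version, assuming the frame is built from the values shown in the question:

```python
import pandas as pd

df = pd.DataFrame({"Num": [2, 4, 1, 5, 3]})

# cumsum() returns the running total: each row holds the sum of itself
# and all rows above it.
df["sum"] = df["Num"].cumsum()

print(df["sum"].tolist())  # [2, 6, 7, 12, 15]
```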

How to retrieve data if you know the column value and row value using a pandas data frame? [duplicate]

This question already has answers here:
How are iloc and loc different?
(6 answers)
Selection with .loc in python
(5 answers)
Closed 4 years ago.
If I have a pandas data frame like this:
   A  B  C  D  E
1  3  4  2  5  1
2  5  4  2  4  4
3  5  1  8  1  3
4  1  1  9  9  4
5  3  6  4  1  1
and want to find the value at a row label of 3 and a column label of D, how do I go about it?
In this case, with a row label of 3 and a column label of D, how would I get 1 returned?
Or, with a row label of 2 and a column label of B, how would I get 4 returned?
You can use DataFrame.loc: df.loc[row, 'col_name'], e.g. df.loc[2, 'B'] returns 4.
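The sketch below rebuilds the frame from the question, assuming its index is the integer labels 1 to 5, and looks up both cells. It also shows DataFrame.at (label-based scalar access) and DataFrame.iloc (position-based) for comparison; these two are additions not mentioned in the answer:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [3, 5, 5, 1, 3],
        "B": [4, 4, 1, 1, 6],
        "C": [2, 2, 8, 9, 4],
        "D": [5, 4, 1, 9, 1],
        "E": [1, 4, 3, 4, 1],
    },
    index=[1, 2, 3, 4, 5],  # assumed: row labels 1..5, as printed in the question
)

# Label-based lookup: row label first, then column label.
value_3d = df.loc[3, "D"]
value_2b = df.loc[2, "B"]

# .at is a label-based accessor for a single cell.
same_3d = df.at[3, "D"]

# .iloc is position-based: row label 3 sits at position 2, column D at position 3.
by_position = df.iloc[2, 3]

print(value_3d, value_2b, same_3d, by_position)  # 1 4 1 1
```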

how to union two data frames so that every value in one data frame is linked to all values in another using python and pandas [duplicate]

This question already has answers here:
cartesian product in pandas
(13 answers)
Closed 4 years ago.
For example, the data is:
a=pd.DataFrame({'aa':[1,2,3]})
b=pd.DataFrame({'bb':[4,5]})
What I want is to combine these two data frames so that the new frame is:
aa bb
1 4
1 5
2 4
2 5
3 4
3 5
You can see that every value in a is linked to all the values in b in the new frame. I could probably use tile or repeat to do this, but I have multiple frames that need the same treatment, so I want to know if there is a better way.
Could anyone help me out here?
You can do it like this:
In [24]: a['key'] = 1
In [25]: b['key'] = 1
In [27]: pd.merge(a, b, on='key').drop('key', axis=1)
Out[27]:
aa bb
0 1 4
1 1 5
2 2 4
3 2 5
4 3 4
5 3 5
You can use pd.MultiIndex.from_product and then reset_index. This generates all combinations of the two sets of data (the same idea as itertools.product):
df_output = (pd.DataFrame(index=pd.MultiIndex.from_product([a.aa, b.bb], names=['aa', 'bb']))
             .reset_index())
and you get
aa bb
0 1 4
1 1 5
2 2 4
3 2 5
4 3 4
5 3 5
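Another option, not mentioned in the answers above, is merge with how="cross", which is available in pandas 1.2 and later. The sketch below assumes such a version. The third frame c is hypothetical and is there only to show how the same step could be chained over several frames with functools.reduce:

```python
from functools import reduce

import pandas as pd

a = pd.DataFrame({"aa": [1, 2, 3]})
b = pd.DataFrame({"bb": [4, 5]})

# how="cross" pairs every row of `a` with every row of `b` (pandas >= 1.2),
# so no temporary key column is needed.
pairs = a.merge(b, how="cross")
print(pairs)

# Hypothetical third frame, only to illustrate chaining over multiple frames.
c = pd.DataFrame({"cc": [6, 7]})
triples = reduce(lambda left, right: left.merge(right, how="cross"), [a, b, c])
print(triples.shape)  # (12, 3)
```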

how to find data from dataFrame at a time,when the condition is a list [duplicate]

This question already has answers here:
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 6 years ago.
I have the following data frame:
In [107]: xx
Out[107]:
1 2 3 4
0 0 -1.234881 0.039231 -0.399870
1 1 -1.761733 -1.186537 0.043678
2 2 0.707564 -0.270639 -0.251519
3 3 -0.979584 0.476025 -1.587889
4 4 -0.576429 1.987681 -0.322581
5 5 -0.695509 1.285029 0.393906
6 6 -0.036627 -0.380702 -0.170813
7 7 0.673423 0.860289 -0.774651
8 8 -1.000333 0.978760 0.256645
9 9 -0.446005 -0.584627 0.187244
and the condition is that the value in column 1 must be in a list such as
con = [2,4,6,8]
Is there any function I can use so that I get a result like the following?
1 2 3 4
2 2 0.707564 -0.270639 -0.251519
4 4 -0.576429 1.987681 -0.322581
6 6 -0.036627 -0.380702 -0.170813
8 8 -1.000333 0.978760 0.256645
thanks!
You can use the .isin() method:
con = [2,4,6,8]
xx[xx["1"].isin(con)]
Using isin:
In [29]: df[df['1'].isin(con)]
Out[29]:
1 2 3 4
2 2 0.707564 -0.270639 -0.251519
4 4 -0.576429 1.987681 -0.322581
6 6 -0.036627 -0.380702 -0.170813
8 8 -1.000333 0.978760 0.256645
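A self-contained sketch of the isin filter follows. It rebuilds only the first two columns of the question's frame and assumes the column labels are the strings "1" and "2"; if they are integers in the real data, use xx[1] instead. The names mask, selected, and rest are illustrative:

```python
import pandas as pd

# First two columns of the frame from the question; the labels are assumed
# to be strings here (they could equally be integers).
xx = pd.DataFrame({
    "1": range(10),
    "2": [-1.234881, -1.761733, 0.707564, -0.979584, -0.576429,
          -0.695509, -0.036627, 0.673423, -1.000333, -0.446005],
})
con = [2, 4, 6, 8]

# isin() returns a boolean Series that is True where the value is in `con`.
mask = xx["1"].isin(con)
selected = xx[mask]

# Negating the mask with ~ keeps the rows whose value is NOT in `con`.
rest = xx[~mask]

print(selected)
```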
