How can I extract data using 'groupby'? - python

import pandas as pd

df = pd.DataFrame({'date': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                   'name': list('aaaaabbbbbccccc'),
                   'v1': [10, 20, 30, 40, 50, 10, 20, 30, 40, 50, 10, 20, 30, 40, 50],
                   'v2': [10, 20, 30, 40, 50, 10, 20, 30, 40, 50, 10, 20, 30, 40, 50],
                   'v3': [10, 20, 30, 40, 50, 10, 20, 30, 40, 50, 10, 20, 30, 40, 50]})
a = list(set(df.name))
plus = []
for i in a:
    sep = df[df.name == i]
    sep2 = sep[(sep.v1 >= 10) & (sep.v2 >= 20) & (sep.v3 <= 40)]
    plus.append(sep2)
result = pd.concat(plus)
print(result)
I know this is not a good example anyway;
I would like to handle the data separately by name,
but this approach takes too long on big data.
How can I extract the data using groupby?
Even better if a function is used (def ... apply ...):
df.groupby(['name'])(df['v1'] > 20) ...???? It cannot work...

Looking at your desired data set, I don't think you need to group your df - you can simply filter it:
In [112]: df.query('v1 >= 10 and v2 >= 20 and v3 <= 40')
Out[112]:
    date name  v1  v2  v3
1      2    a  20  20  20
2      3    a  30  30  30
3      4    a  40  40  40
6      2    b  20  20  20
7      3    b  30  30  30
8      4    b  40  40  40
11     2    c  20  20  20
12     3    c  30  30  30
13     4    c  40  40  40
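That said, since the question explicitly asks for a groupby with a function, the per-group pattern can be sketched with apply. This is only a sketch of the pattern (the helper name pick is made up); for a plain row-wise condition it is slower than the query above:

```python
import pandas as pd

df = pd.DataFrame({'date': [1, 2, 3, 4, 5] * 3,
                   'name': list('aaaaabbbbbccccc'),
                   'v1': [10, 20, 30, 40, 50] * 3,
                   'v2': [10, 20, 30, 40, 50] * 3,
                   'v3': [10, 20, 30, 40, 50] * 3})

def pick(g):
    # the same filter, applied to each name-group separately
    return g[(g.v1 >= 10) & (g.v2 >= 20) & (g.v3 <= 40)]

# group_keys=False keeps the original index instead of adding a group level
result = df.groupby('name', group_keys=False).apply(pick)
print(result)
```

A custom pick can do anything per group (rolling windows, per-group thresholds, ...), which is where apply earns its cost over a vectorized filter.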

Related

Pandas - How to offset all values if less than the previous value on the whole dataframe

I've got a dataframe as follows:
time  value
0     30
1     40
5     55
10    10
11    25
20    10
As the stored value should only increment (but it sometimes gets reset), I want to create output like the following:
0     30
1     40
5     55
10    65   // offset 55
11    80   // offset 55
20    90   // offset 80
Any easy way to achieve it with pandas?
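A loop-free sketch of one way to do this (assuming a value column as above): detect each reset with diff, accumulate the pre-reset values as a running offset with cumsum, and add the offset back:

```python
import pandas as pd

df = pd.DataFrame({'time': [0, 1, 5, 10, 11, 20],
                   'value': [30, 40, 55, 10, 25, 10]})

s = df['value']
# a reset is any point where the raw value drops below its predecessor
reset = s.diff() < 0
# at each reset the running offset grows by the previous raw value, which,
# together with the offset already applied, equals the previous corrected value
offset = s.shift(1).where(reset, 0).cumsum()
df['value'] = (s + offset).astype(int)
print(df)
```

This yields 30, 40, 55, 65, 80, 90 for the sample data, matching the desired output above.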

Is there any way to ungroup the groupby dataframe while adding an additional column?

Suppose we take a pandas dataframe...
item MRP sold
0 A 10 10
1 A 36 4
2 B 32 6
3 A 26 7
4 B 30 9
Then, after a groupby('item').mean(),
it becomes:
item MRP sold
0 A 24 7
1 B 31 7.5
Is there a way to retain the mean values of MRP for all the unique items and create another column that contains those values when ungrouped?
Basically what I want is:
item MRP sold Mean_MRP
0 A 10 10 24
1 A 36 4 24
2 B 32 6 31
3 A 26 7 24
4 B 30 9 31
There are a lot of items, so I need a faster, optimised way to do this.
Use the transform function:
df = df.assign(Mean_MRP=lambda x: x.groupby('item')['MRP'].transform('mean'))
df
item MRP sold Mean_MRP
0 A 10 10 24
1 A 36 4 24
2 B 32 6 31
3 A 26 7 24
4 B 30 9 31
You could also use the pyjanitor module, which makes the code a bit cleaner:
import janitor

df.groupby_agg(by='item',
               agg='mean',
               agg_column_name='MRP',
               new_column_name='Mean_MRP')
Try using transform (select the MRP column first, otherwise the transform returns one column per value column):
df['Mean_MRP'] = df.groupby('item')['MRP'].transform('mean')

Select rows from pandas df, where index appears somewhere in another df

Assume the following:
df1:
x y z
1 10 11
2 20 22
3 30 33
4 40 44
1 20 21
1 30 31
1 40 41
2 10 12
2 30 32
2 40 42
3 10 31
3 20 23
3 40 43
4 10 14
4 20 24
4 30 34
df2:
x b
1 100
2 200
df3:
y c
10 1000
20 2000
I want all rows from df1, for which either x or y appears in either df2 or df3 respectively, meaning in this case
out:
x y z
1 10 11
2 20 22
1 20 21
1 30 31
1 40 41
2 10 12
2 30 32
2 40 42
3 10 31
3 20 23
4 10 14
4 20 24
I would like to do this in pure pandas with no for loops. It seems standard enough to me, but I don't really know what to search for.
You can use isin for both conditions, chain them with a bitwise OR, and perform boolean indexing on the dataframe with the result:
df1[df1.x.isin(df2.x) | df1.y.isin(df3.y)]
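As a self-contained check (rebuilding the sample frames from the question), the filter returns exactly the 12 rows listed above:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3, 4, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
                    'y': [10, 20, 30, 40, 20, 30, 40, 10, 30, 40,
                          10, 20, 40, 10, 20, 30],
                    'z': [11, 22, 33, 44, 21, 31, 41, 12, 32, 42,
                          31, 23, 43, 14, 24, 34]})
df2 = pd.DataFrame({'x': [1, 2], 'b': [100, 200]})
df3 = pd.DataFrame({'y': [10, 20], 'c': [1000, 2000]})

# keep rows whose x occurs in df2.x OR whose y occurs in df3.y
out = df1[df1.x.isin(df2.x) | df1.y.isin(df3.y)]
print(out)
```

Only the four rows with x in {3, 4} and y in {30, 40} (z values 33, 44, 43, 34) are dropped.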

Checking if values of a row are consecutive

I have a df like this:
1 2 3 4 5 6
0 5 10 12 35 70 80
1 10 11 23 40 42 47
2 5 26 27 38 60 65
All the values in each row are distinct and in increasing order.
I would like to create a new column with 1 or 0 depending on whether the row contains at least two consecutive numbers.
For example, the second and third rows qualify: they have 10 and 11, and 26 and 27. Is there a more pythonic way than using an iterator?
Thanks
Use DataFrame.diff for the difference per row, compare with 1, check whether at least one value per row is True, and finally cast to integer:
df['check'] = df.diff(axis=1).eq(1).any(axis=1).astype(int)
print (df)
1 2 3 4 5 6 check
0 5 10 12 35 70 80 0
1 10 11 23 40 42 47 1
2 5 26 27 38 60 65 1
To improve performance, use numpy:
import numpy as np

arr = df.to_numpy()
df['check'] = np.any((arr[:, 1:] - arr[:, :-1]) == 1, axis=1).astype(int)

Create a column with periodically repeated values in pandas

I have a sample data frame df with one column:
Cost
30
49
98
10
37
20
10
48
70
20
30
40
50
29
90
39
30
29
50
40
and a list: id_list = ["A","B","C","D"], which contains 4 different id types. I would like to create a new column in the data frame where the first 5 cost values get "A", the next 5 cost values get "B", ..., and the last 5 cost values get "D". In other words, I want to repeat each element of id_list 5 times, so my new df will look like this:
Cost ID
30 A
49 A
98 A
10 A
37 A
20 B
10 B
48 B
70 B
20 B
30 C
40 C
50 C
29 C
90 C
39 D
30 D
29 D
50 D
40 D
My actual data frame has many rows, and the actual id_list has many elements.
The row count is a multiple of 5, so the final data frame fills exactly.
In general I know how to add a column with specific values to a pandas data frame,
but I don't know how to do this with repeated values.
Could you suggest how I can do this in Python?
Thanks in advance for any help.
There is a function in numpy for this, repeat:
import numpy as np

df['New'] = np.repeat(id_list, 5)
df
Out[23]:
Cost New
0 30 A
1 49 A
2 98 A
3 10 A
4 37 A
5 20 B
6 10 B
7 48 B
8 70 B
9 20 B
10 30 C
11 40 C
12 50 C
13 29 C
14 90 C
15 39 D
16 30 D
17 29 D
18 50 D
19 40 D
Numpy-free v1:
df.assign(ID=sum(zip(*[id_list] * 5), tuple()))
Cost ID
0 30 A
1 49 A
2 98 A
3 10 A
4 37 A
5 20 B
6 10 B
7 48 B
8 70 B
9 20 B
10 30 C
11 40 C
12 50 C
13 29 C
14 90 C
15 39 D
16 30 D
17 29 D
18 50 D
19 40 D
Numpy-free v2:
df.assign(ID=[x for x in id_list for _ in range(5)])
I would suggest something like this, which takes advantage of the [item]*n => [item, item, item, ...] expansion that Python does:
labels = ['label1', 'label2', 'label3']
num = 5
repeated = []
for i in labels:
    repeated.extend([i] * num)
You can then add the column to your dataframe, e.g. df['ID'] = repeated.

Categories

Resources