Grouping data in a dataframe [duplicate] - python

This question already has an answer here:
Pandas long to wide (unmelt or similar?) [duplicate]
(1 answer)
Closed 2 months ago.
I have a pandas DataFrame with a few columns; let's say it looks like this:
Heading 1  Values
A          1
A          2
B          9
B          8
B          6
What I want is to "pivot" or group the table so it would look something like:
Heading 1  Value 1  Value 2  Value 3
A          1        2
B          9        8        6
I have tried grouping the table and pivoting/unpivoting it in several ways, but I cannot figure out how to do it properly.

You can derive a new column that holds a row number (so to speak) for each partition of heading 1:
import pandas as pd

df = pd.DataFrame({"heading 1": ['A', 'A', 'B', 'B', 'B'], "Values": [1, 2, 9, 8, 6]})
df['rn'] = df.groupby(['heading 1']).cumcount() + 1
  heading 1  Values  rn
0         A       1   1
1         A       2   2
2         B       9   1
3         B       8   2
4         B       6   3
Then you can pivot, using the newly derived column as your columns argument:
df = df.pivot(index='heading 1', columns='rn', values='Values').reset_index()
rn heading 1    1    2    3
0          A  1.0  2.0  NaN
1          B  9.0  8.0  6.0
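If you also want the columns labelled "Value 1", "Value 2", ... as in the desired output, here is a minimal sketch of the full pipeline (using the same sample data as above; the add_prefix step is just one way to tidy the labels):
import pandas as pd

df = pd.DataFrame({"Heading 1": ['A', 'A', 'B', 'B', 'B'],
                   "Values": [1, 2, 9, 8, 6]})

out = (df.assign(rn=df.groupby("Heading 1").cumcount() + 1)  # row number within each group
         .pivot(index="Heading 1", columns="rn", values="Values")
         .add_prefix("Value ")                               # columns 1, 2, 3 -> "Value 1", "Value 2", ...
         .reset_index())
out.columns.name = None                                      # drop the leftover "rn" columns label
print(out)
#   Heading 1  Value 1  Value 2  Value 3
# 0         A      1.0      2.0      NaN
# 1         B      9.0      8.0      6.0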

Related

How to create new column in Pandas dataframe where each row is product of previous rows

I have the following DataFrame dt:
   a
0  1
1  2
2  3
3  4
4  5
How do I create a new column where each row is a function of previous rows?
For instance, say the formula is:
B_row(t) = A_row(t-1)+A_row(t-2)+3
Such that:
a b
0 1 /
1 2 /
2 3 6
3 4 8
4 5 10
Also, I often hear that we shouldn't loop through rows in Pandas; however, it seems to me that I would have to go through each row and build the column up recursively, as I would in regular Python.
You could use cumprod:
dt['b'] = dt['a'].cumprod()
Output:
   a    b
0  1    1
1  2    2
2  3    6
3  4   24
4  5  120
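The worked example in the question (b(t) = a(t-1) + a(t-2) + 3) can also be vectorized without looping over rows; a minimal sketch using shifted copies of the column, assuming dt is the frame above:
import pandas as pd

dt = pd.DataFrame({'a': [1, 2, 3, 4, 5]})

# shift(1) and shift(2) hold the previous and second-previous values of 'a'
dt['b'] = dt['a'].shift(1) + dt['a'].shift(2) + 3
print(dt)
#    a     b
# 0  1   NaN
# 1  2   NaN
# 2  3   6.0
# 3  4   8.0
# 4  5  10.0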

float precision in Pandas df [duplicate]

This question already has answers here:
Set value for particular cell in pandas DataFrame using index
(23 answers)
Closed 2 years ago.
I am trying to change a number in the df, but pandas truncates it to an integer.
   A  B
0  1  4
1  2  5
2  3  6
I change a number:
df['B'][1] = 1.2
it gives:
   A  B
0  1  4
1  2  1
2  3  6
instead of:
   A    B
0  1    4
1  2  1.2
2  3    6
Two things are going on here. Chained indexing like df['B'][1] = 1.2 is unreliable (it can end up assigning to a copy, and pandas warns against it), and column B holds integers, so the float gets truncated to 1. Set the cell with .loc instead; pandas will upcast the column to float:
df.loc[1, "B"] = 1.2
result:
   A    B
0  1  4.0
1  2  1.2
2  3  6.0
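Alternatively, if you know the column will need fractional values, you can convert it to float up front so later element assignments are not truncated. A minimal sketch (the astype step is just one option):
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df['B'] = df['B'].astype(float)  # B is now float64
df.loc[1, 'B'] = 1.2
print(df)
#    A    B
# 0  1  4.0
# 1  2  1.2
# 2  3  6.0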

group by two columns and use third column as value without using pivot_table [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
I have the following dataframe.
   user  movie  rating
0     1      1       3
1     1      2       4
2     2      1       2
3     2      2       5
4     3      1       3
My desired output is
movie  1  2
user
1      3  4
2      2  5
3      3  0
If a user has not rated a movie, I need to have '0' in the corresponding output column, otherwise, the rating value should be present.
Note: I was able to achieve this with pivot_table, but the catch is my dataset contains more than 100000 columns, which gives me "Unstacked DataFrame is too big, causing int32 overflow". I am trying groupby as an alternative to bypass this error.
I am trying the following, but it doesn't include the values from the 'rating' column of my dataframe.
df.groupby(['user', 'movie']).size().unstack('movie', fill_value=0)
Try using pd.crosstab:
pd.crosstab(df.user, df.movie, values=df.rating, aggfunc='first').fillna(0)
# movie    1    2
# user
# 1      3.0  4.0
# 2      2.0  5.0
# 3      3.0  0.0
To get integer values, just append .astype(int), as follows:
pd.crosstab(df.user, df.movie, values=df.rating, aggfunc='first').fillna(0).astype(int)
# movie  1  2
# user
# 1      3  4
# 2      2  5
# 3      3  0
I'm not sure why you would expect movie 3, since it doesn't exist in the original data sample, but other than that this will work for you:
movie_ratings.set_index(['user', 'movie']).unstack('movie', fill_value=0)
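Since the question explicitly asks for a groupby-based alternative, here is a minimal sketch along those lines, assuming df is the frame shown above and each (user, movie) pair appears at most once (so 'first' is a safe aggregation):
import pandas as pd

df = pd.DataFrame({'user':   [1, 1, 2, 2, 3],
                   'movie':  [1, 2, 1, 2, 1],
                   'rating': [3, 4, 2, 5, 3]})

out = (df.groupby(['user', 'movie'])['rating']
         .first()                          # one rating per (user, movie) pair
         .unstack('movie', fill_value=0))  # missing pairs become 0
print(out)
# movie  1  2
# user
# 1      3  4
# 2      2  5
# 3      3  0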

expand pandas groupby results to initial dataframe

Say I have a dataframe df and group it by a few columns into dfg, taking the median of one of its columns. How could I then take those median values and expand them out into a new column of the original df, associated with the respective conditions? This will mean there are duplicates, but I will be using this column in a subsequent calculation, and having it in the original frame makes that possible.
Example data:
import pandas as pd
import numpy as np

data = {'idx': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
        'condition1': [1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4],
        'condition2': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2],
        'values': np.random.normal(0, 1, 16)}
df = pd.DataFrame(data)
dfg = df.groupby(['idx', 'condition2'], as_index=False)['values'].median()
example of desired result (note duplicates corresponding to correct conditions):
    idx  condition1  condition2    values   medians
0     1           1           1  0.350310  0.656355
1     1           1           2 -0.291736 -0.024304
2     1           2           1  1.593545  0.656355
3     1           2           2 -1.275154 -0.024304
4     1           3           1  0.075259  0.656355
5     1           3           2  1.054481 -0.024304
6     1           4           1  0.962400  0.656355
7     1           4           2  0.243128 -0.024304
8     2           1           1  1.717391  1.155406
9     2           1           2  0.788847  1.006583
10    2           2           1  1.145891  1.155406
11    2           2           2 -0.492063  1.006583
12    2           3           1 -0.157029  1.155406
13    2           3           2  1.224319  1.006583
14    2           4           1  1.164921  1.155406
15    2           4           2  2.042239  1.006583
I believe you need GroupBy.transform with 'median' for the new column:
df['medians'] = df.groupby(['idx', 'condition2'])['values'].transform('median')
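If you have already computed dfg (as in the question) and would rather attach it than recompute, a minimal sketch merging it back onto df (the 'medians' column name is just illustrative):
# assuming df and dfg are built as in the question
df = df.merge(dfg.rename(columns={'values': 'medians'}),
              on=['idx', 'condition2'],
              how='left')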

how to union two data frames so that every value in one data frame is linked to all values in another using python and pandas [duplicate]

This question already has answers here:
cartesian product in pandas
(13 answers)
Closed 4 years ago.
For example, the data is:
import pandas as pd

a = pd.DataFrame({'aa': [1, 2, 3]})
b = pd.DataFrame({'bb': [4, 5]})
What I want is to union these two data frames so the new frame is:
aa  bb
1   4
1   5
2   4
2   5
3   4
3   5
You can see that every value in a is linked to all the values in b in the new frame. I could probably use tile or repeat to do this, but I have multiple frames that need this done repeatedly, so I want to know if there is a better way.
Could anyone help me out here?
You can do it like this:
In [24]: a['key'] = 1
In [25]: b['key'] = 1
In [27]: pd.merge(a, b, on='key').drop('key', axis=1)
Out[27]:
   aa  bb
0   1   4
1   1   5
2   2   4
3   2   5
4   3   4
5   3   5
You can use pd.MultiIndex.from_product and then reset_index. It generates all the combinations between both sets of data (the same idea as itertools.product):
df_output = (pd.DataFrame(index=pd.MultiIndex.from_product([a.aa, b.bb], names=['aa', 'bb']))
               .reset_index())
and you get
   aa  bb
0   1   4
1   1   5
2   2   4
3   2   5
4   3   4
5   3   5
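In newer pandas versions (1.2 and later), merge also supports how='cross', which produces the same cartesian product without a dummy key. A minimal sketch using the frames from the question:
import pandas as pd

a = pd.DataFrame({'aa': [1, 2, 3]})
b = pd.DataFrame({'bb': [4, 5]})

out = a.merge(b, how='cross')  # every row of a paired with every row of b
print(out)
#    aa  bb
# 0   1   4
# 1   1   5
# 2   2   4
# 3   2   5
# 4   3   4
# 5   3   5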
