add serial count to each group pandas python

add serial count to each group pandas python - python

I have a simple problem but haven't been able to fix it.
I have a simple table such as:
group1
a
a
a
b
b
b
c
c
I can add a count to the column with:
df['count'] = range(1, len(df) + 1)
I have tried to alter this with groupby functions but can't manage to stop the count and restart per group such as:
group1 count
a 1
a 2
a 3
b 1
b 2
b 3
c 1
c 2

You could use cumcount. If you want to start from 1 you could add it:
In [16]: df['count'] = df.groupby('group1').cumcount()+1
In [17]: df
Out[17]:
group1 count
0 a 1
1 a 2
2 a 3
3 b 1
4 b 2
5 b 3
6 c 1
7 c 2

Related

Join an array to every row in the pandas dataframe

I have a data frame and an array as follows:
df = pd.DataFrame({'x': range(0,5), 'y' : range(1,6)})
s = np.array(['a', 'b', 'c'])
I would like to attach the array to every row of the data frame, such that I got a data frame as follows:
What would be the most efficient way to do this?

Just plain assignment:
# replace the first `s` with your desired column names
df[s] = [s]*len(df)

Try this:
for i in s:
df[i] = i
Output:
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

You could use pandas.concat:
pd.concat([df, pd.DataFrame(s).T], axis=1).ffill()
output:
x y 0 1 2
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

You can try using df.loc here.
df.loc[:, s] = s
print(df)
x y a b c
0 0 1 a b c
1 1 2 a b c
2 2 3 a b c
3 3 4 a b c
4 4 5 a b c

How to populate categories in one column and paste the exact value in other column

It has been a long time that I dealt with pandas library. I searched for it but could not come up with an efficient way, which might be a function existed in the library.
Let's say I have the dataframe below:
df1 = pd.DataFrame({'V1':['A','A','B'],
'V2':['B','C','C'],
'Value':[4, 1, 5]})
df1
And I would like to extend this dataset and populate all the combinations of categories and put its corresponding value as exactly the same.
df2 = pd.DataFrame({'V1':['A','B','A', 'C', 'B', 'C'],
'V2':['B','A','C','A','C','B'],
'Value':[4, 4 , 1, 1, 5, 5]})
df2
In other words, in df1, A and B has Value of 4 and I also want to have a row of that B and A has Value of 4 in the second dataframe. It is very similar to melting. I also do not want to use a for loop. I am looking for a more efficient way.

Use:
df = pd.concat([df1, df1.rename(columns={'V2':'V1', 'V1':'V2'})]).sort_index().reset_index(drop=True)
Output:
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5

Or np.vstack:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns)
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5
>>>
For correct order:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index()
V1 V2 Value
0 A B 4
0 B A 4
1 A C 1
1 C A 1
2 B C 5
2 C B 5
>>>
And index reset:
>>> pd.DataFrame(np.vstack((df1.to_numpy(), df1.iloc[:, np.r_[1:-1:-1, -1]].to_numpy())), columns=df1.columns, index=[*df1.index, *df1.index]).sort_index().reset_index(drop=True)
V1 V2 Value
0 A B 4
1 B A 4
2 A C 1
3 C A 1
4 B C 5
5 C B 5
>>>

You can use methods assign and append:
df1.append(df1.assign(V1=df1.V2, V2=df1.V1), ignore_index=True)
Output:
V1 V2 Value
0 A B 4
1 A C 1
2 B C 5
3 B A 4
4 C A 1
5 C B 5

Grouping the columns and identifying values which are not part of this group

I have a DataFrame which looks like this:
df:-
A B
1 a
1 a
1 b
2 c
3 d
Now using this dataFrame i want to get the following new_df:
new_df:-
item val_not_present
1 c #1 doesn't have values c and d(values not part of group 1)
1 d
2 a #2 doesn't have values a,b and d(values not part of group 2)
2 b
2 d
3 a #3 doesn't have values a,b and c(values not part of group 3)
3 b
3 c
or an individual DataFrame for each items like:
df1:
item val_not_present
1 c
1 d
df2:-
item val_not_present
2 a
2 b
2 d
df3:-
item val_not_present
3 a
3 b
3 c
I want to get all the values which are not part of that group.

You can use np.setdiff and explode:
values_b = df.B.unique()
pd.DataFrame(df.groupby("A")["B"].unique().apply(lambda x: np.setdiff1d(values_b,x)).rename("val_not_present").explode())
Output:
val_not_present
A
1 c
1 d
2 a
2 b
2 d
3 a
3 b
3 c

Another approach is using crosstab/pivot_table to get counts and then filter on where count is 0 and transform to dataframe:
m = pd.crosstab(df['A'],df['B'])
pd.DataFrame(m.where(m.eq(0)).stack().index.tolist(),columns=['A','val_not_present'])
A val_not_present
0 1 c
1 1 d
2 2 a
3 2 b
4 2 d
5 3 a
6 3 b
7 3 c

You could convert B to a categorical datatype and then compute the value counts. Categorical variables will show categories that have frequency counts of zero so you could do something like this:
df['B'] = df['B'].astype('category')
new_df = (
df.groupby('A')
.apply(lambda x: x['B'].value_counts())
.reset_index()
.query('B == 0')
.drop(labels='B', axis=1)
.rename(columns={'level_1':'val_not_present',
'A':'item'})
)

merging rows with repeating column values

I have a dataframe as follows:
data
0 a
1 a
2 a
3 a
4 a
5 b
6 b
7 b
8 b
9 b
I want to group the repeating values of a and b into a single row element as follows:
data
0 a
a
a
a
a
1 b
b
b
b
b
How do I go about doing this? I tried the following but it puts each repeating value in its own column
df.groupby('data')

Seems like a pivot problem, but since you missing the column(create by cumcount) and index(create by factorize) columns , it is hard to figure out
pd.crosstab(pd.factorize(df.data)[0],df.groupby('data').cumcount(),df.data,aggfunc='sum')
Out[358]:
col_0 0 1 2 3 4
row_0
0 a a a a a
1 b b b b b

Something like
index = ((df['data'] != df['data'].shift()).cumsum() - 1).rename(columns= {'data':''})
df = df.set_index(index)
data
0 a
0 a
0 a
0 a
0 a
1 b
1 b
1 b
1 b
1 b

You can use pd.factorize followed by set_index:
df = df.assign(key=pd.factorize(df['data'], sort=False)[0]).set_index('key')
print(df)
data
key
0 a
0 a
0 a
0 a
0 a
1 b
1 b
1 b
1 b
1 b

How can I convert ranked table with pandas?

Let me simplify my problem for easy explanation.
I have a pandas DataFrame table with the below format:
a b c
0 1 3 2
1 3 1 2
2 3 2 1
The numbers in each row present ranks of columns.
For example, the order of the first row is {a, c, b}.
How can I convert the above to the below ?
1 2 3
0 a c b
1 c a b
2 c b a
I googled all day long. But I couldn't find any solutions until now.

Looks like you are just mapping one value to another and renaming the columns, e.g.:
>>> df = pd.DataFrame({'a':[1,3,3], 'b':[3,1,2], 'c':[2,2,1]})
>>> df = df.applymap(lambda x: df.columns[x-1])
>>> df.columns = [1,2,3]
>>> df
1 2 3
0 a c b
1 c a b
2 c b a

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

add serial count to each group pandas python - python

You could use cumcount. If you want to start from 1 you could add it: In [16]: df['count'] = df.groupby('group1').cumcount()+1 In [17]: df Out[17]: group1 count 0 a 1 1 a 2 2 a 3 3 b 1 4 b 2 5 b 3 6 c 1 7 c 2

Related

Join an array to every row in the pandas dataframe

How to populate categories in one column and paste the exact value in other column

Grouping the columns and identifying values which are not part of this group

merging rows with repeating column values

How can I convert ranked table with pandas?

Categories

Resources