This question already has an answer here:
Pandas rolling sum on string column
(1 answer)
Closed 3 years ago.
I have a pandas DataFrame built by the code below. I need to add a dynamic column that, for each row, concatenates every value in the user's sequence up to and including that row. A loop sounds like the logical solution but would be very inefficient over a large DataFrame (1M+ rows).
import pandas as pd

user_id=[1,1,1,1,2,2,2,3,3,3,3,3]
variable=["A","B","C","D","A","B","C","A","B","C","D","E"]
sequence=[0,1,2,3,0,1,2,0,1,2,3,4]
df=pd.DataFrame(list(zip(user_id,variable,sequence)),columns =['User_ID', 'Variables','Seq'])
# Desired result: this column needs to be generated dynamically
df['dynamic_column']=["A","AB","ABC","ABCD","A","AB","ABC","A","AB","ABC","ABCD","ABCDE"]
I need to create the dynamic column efficiently, based on the user ID and the sequence number. I have played with the pandas shift function, but that still ends up requiring a loop. I am looking for a simple, efficient way of creating that concatenated column.
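This is a grouped cumsum. On a column of strings, the cumulative sum concatenates the values:
# group_keys=False keeps the original row index, so the result aligns with df on recent pandas versions
df['dynamic_column'] = df.groupby('User_ID', group_keys=False)['Variables'].apply(lambda x: x.cumsum())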
Output:
0 A
1 AB
2 ABC
3 ABCD
4 A
5 AB
6 ABC
7 A
8 AB
9 ABC
10 ABCD
11 ABCDE
Name: Variables, dtype: object
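If the rows are not guaranteed to be ordered by Seq within each user, or cumsum on a string column is not available in your pandas version, the same idea can be sketched with itertools.accumulate. This is only an illustration under those assumptions: the helper name running_concat is hypothetical, and the data is rebuilt from the question.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame({
    "User_ID": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "Variables": list("ABCD") + list("ABC") + list("ABCDE"),
    "Seq": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4],
})


def running_concat(values: pd.Series) -> pd.Series:
    """Return the running string concatenation of one group's values."""
    # accumulate() defaults to +, which concatenates strings.
    return pd.Series(list(accumulate(values)), index=values.index)


# Sort so each user's rows are in sequence order before accumulating.
ordered = df.sort_values(["User_ID", "Seq"])

# transform() returns one value per input row; assignment aligns on the index.
df["dynamic_column"] = ordered.groupby("User_ID")["Variables"].transform(running_concat)
```

Note that each row stores a progressively longer string, so memory use grows with the square of the group length; for very long sequences per user that may matter more than the loop itself.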
Your question is a little vague, but would something like this work?
df['DynamicColumn'] = df['user_id'] + df['sequencenumber']
This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 1 year ago.
I turned a JSON file into a dataframe, but I am unsure how to map a certain value from the JSON dataframe onto my existing dataframe.
df1 = # (the 2nd column doesn't matter, it's just there)
category_id  tags
1            a
1            a
10           b
10           c
40           d
df2 (json) =
id  title
1   film
2   music
3   travel
4   cooking
5   dance
I would like to make a new column in df1 that maps the titles from df2 onto df1 according to the category_id. I am sorry, I am new to Python programming. I know I can hard-code a dictionary of keys and values and go from there. However, I was wondering whether there is an easier way to do this with Python/pandas.
You can use pandas.Series.map(), which maps the values of a Series according to an input correspondence (here, a Series of titles indexed by id).
df1['title'] = df1['category_id'].map(df2.set_index('id')['title'])
# print(df1)
   category_id tags title
0            1    a  film
1            1    a  film
2           10    b   NaN
3           10    c   NaN
4           40    d   NaN
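The linked duplicate is about merging, so for comparison here is a minimal sketch of the same lookup written as a left merge. It assumes the id values in df2 are unique; the frames are rebuilt from the question's sample data and the variable name merged is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "category_id": [1, 1, 10, 10, 40],
    "tags": ["a", "a", "b", "c", "d"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "title": ["film", "music", "travel", "cooking", "dance"],
})

# how="left" keeps every row of df1; category_id values with no match get NaN.
merged = (
    df1.merge(df2, how="left", left_on="category_id", right_on="id")
       .drop(columns="id")  # the join key from df2 is redundant after merging
)
```

A merge is handy when several columns need to come across from df2 at once; for a single column, map is usually the shorter spelling.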
This question already has answers here:
Multiple aggregations of the same column using pandas GroupBy.agg()
(4 answers)
Closed 2 years ago.
I have a dataframe of individuals who called a variety of numbers, like so:
Person Called
A 123
B 123
C 234
I need to create a new dataframe that makes a list of people who called that number and the count. Like this:
Persons Called Count
A, B 123 2
C 234 1
I'm pretty sure I can just create a for loop that counts the number of times and appends them to a list, but I was wondering if there's a more efficient way to do this without a for loop. Apologies if the formatting is incorrect. I'm new to the forum.
Use named aggregation with GroupBy.agg:
df1 = (df.groupby('Called')
         .agg(Persons=('Person', ','.join),
              Count=('Person', 'size'))
         .reset_index())
print (df1)
Called Persons Count
0 123 A,B 2
1 234 C 1
Because only one column is being processed, an alternative is to select that column after the groupby and pass a list of (name, function) tuples:
df1 = (df.groupby('Called')['Person']
         .agg([('Persons', ','.join),
               ('Count', 'size')])
         .reset_index())
print (df1)
Called Persons Count
0 123 A,B 2
1 234 C 1
This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Split (explode) pandas dataframe string entry to separate rows
(27 answers)
Closed 3 years ago.
Let's assume that I have a pandas DataFrame whose column A contains n-dimensional vectors. I would like to split this column into multiple columns. Basically, my dataset looks like:
A B C
[1,0,2,3,5] ... ...
[4,5,3,2,1] ... ...
.........................
And I want to have :
A0 A1 A2 A3 A4 B C
1 0 2 3 5 ... ...
4 5 3 2 1 ... ...
.......................
I think I can solve this problem by using the apply function and for loops. But I imagine there is a better (faster, easier to read, ...) way to do so.
Edit: My post was marked as a duplicate, but the answers given there produce more rows. I want more columns, as shown above.
Thanks,
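One possible approach, sketched here under the assumption that every list in column A has the same length, is to build a new DataFrame from the lists and concatenate it with the remaining columns. The values in B and C are placeholders, since the question elides them.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [[1, 0, 2, 3, 5], [4, 5, 3, 2, 1]],
    "B": ["x", "y"],  # placeholder values, not from the question
    "C": [10, 20],    # placeholder values, not from the question
})

# Build one column per vector position, reusing the original row index.
expanded = pd.DataFrame(df["A"].tolist(), index=df.index).add_prefix("A")

# Swap the list column for the new A0..A4 columns.
result = pd.concat([expanded, df.drop(columns="A")], axis=1)
```

Passing index=df.index matters when the original frame does not have a default 0..n-1 index; without it, concat would align the new columns against the wrong rows.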
This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
Given a Pandas Dataframe df, with column names 'Session', and 'List':
Can I group together the 'List' values for the same values of 'Session'?
My Approach
I've tried solving the problem by creating a new dataframe and iterating through the rows of the initial dataframe while maintaining a session counter that I increment whenever the session changes.
If it hasn't changed, I append the List value for that row, followed by a comma.
Whenever the session changes, I use strip to get rid of the extra trailing comma.
Initial DataFrame
Session List
0 1 a
1 1 b
2 1 c
3 2 d
4 2 e
5 3 f
Required DataFrame
Session List
0 1 a,b,c
1 2 d,e
2 3 f
Can someone suggest something more efficient or simple?
Thank you in advance.
Use groupby, agg and reset_index:
>>> df.groupby('Session')['List'].agg(','.join).reset_index()
Session List
0 1 a,b,c
1 2 d,e
2 3 f
>>>
This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 6 months ago.
I currently have the dataframe at the top. Is there a way to use a groupby function to group the data and concatenate the words, producing another dataframe in the format shown further below, using Python pandas?
Thanks
You can apply join on your column after groupby:
df.groupby('index')['words'].apply(','.join)
Example:
In [326]:
df = pd.DataFrame({'id':['a','a','b','c','c'], 'words':['asd','rtr','s','rrtttt','dsfd']})
df
Out[326]:
id words
0 a asd
1 a rtr
2 b s
3 c rrtttt
4 c dsfd
In [327]:
df.groupby('id')['words'].apply(','.join)
Out[327]:
id
a asd,rtr
b s
c rrtttt,dsfd
Name: words, dtype: object
If you want to save even more ink, you don't need to use .apply() since .agg() can take a function to apply to each group:
df.groupby('id')['words'].agg(','.join)
OR
# this way you can add multiple columns and different aggregates as needed.
df.groupby('id').agg({'words': ','.join})
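One caveat with ','.join is that it raises a TypeError if a group contains non-string values such as numbers or None. A minimal sketch of one way to handle that, assuming missing values should simply be skipped, is below. The helper name join_present and the mixed-type sample data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["a", "a", "b", "c", "c"],
    "words": ["asd", 7, "s", None, "dsfd"],  # mixed types, for illustration only
})


def join_present(values: pd.Series) -> str:
    """Join the non-missing values of one group as comma-separated text."""
    # dropna() skips None/NaN; astype(str) makes numbers joinable.
    return ",".join(values.dropna().astype(str))


joined = df.groupby("id")["words"].agg(join_present)
```

Whether to drop missing values or render them as a placeholder is a design choice that depends on what the joined string is used for.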