This question already has answers here:
Pandas, Pivot table from 2 columns with values being a count of one of those columns
(2 answers)
Most efficient way to melt dataframe with a ton of possible values pandas
(2 answers)
How to form a pivot table on two categorical columns and count for each index?
(2 answers)
Closed 2 years ago.
I am trying to transform the rows and count the occurrences of the values, grouped by id.
Dataframe:
id value
A cake
A cookie
B cookie
B cookie
C cake
C cake
C cookie
expected:
id cake cookie
A 1 1
B 0 2
C 2 1
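For reference, a minimal sketch of one way to get the expected counts, using pd.crosstab (a pivot_table with aggfunc='size' would work too); the DataFrame construction just recreates the sample data:
import pandas as pd

df = pd.DataFrame({"id": ["A", "A", "B", "B", "C", "C", "C"],
                   "value": ["cake", "cookie", "cookie", "cookie", "cake", "cake", "cookie"]})

# Count occurrences of each value per id; absent combinations become 0
out = pd.crosstab(df["id"], df["value"]).reset_index()
out.columns.name = None
print(out)
#   id  cake  cookie
# 0  A     1       1
# 1  B     0       2
# 2  C     2       1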
This question already has an answer here:
Pandas rolling sum on string column
(1 answer)
Closed 3 years ago.
I have a Pandas dataframe built by the code below. I need to add a dynamic column that concatenates every value in a sequence before a given line. A loop sounds like the logical solution, but it would be very inefficient over a large dataframe (1M+ rows).
import pandas as pd

user_id=[1,1,1,1,2,2,2,3,3,3,3,3]
variable=["A","B","C","D","A","B","C","A","B","C","D","E"]
sequence=[0,1,2,3,0,1,2,0,1,2,3,4]
df=pd.DataFrame(list(zip(user_id,variable,sequence)),columns=['User_ID','Variables','Seq'])
# Need to add a column dynamically
df['dynamic_column']=["A","AB","ABC","ABCD","A","AB","ABC","A","AB","ABC","ABCD","ABCDE"]
I need to be able to create the dynamic column in an efficient way based on the user_id and the sequence number. I have played with the pandas shift function and that just results in having to create a loop. Looking for some easy efficient way of creating that dynamic concatenated column.
This is a cumulative sum; on a string column, cumsum concatenates, so taking it per User_ID group gives the running prefix:
df['dynamic_column'] = df.groupby('User_ID').Variables.apply(lambda x: x.cumsum())
Output:
0 A
1 AB
2 ABC
3 ABCD
4 A
5 AB
6 ABC
7 A
8 AB
9 ABC
10 ABCD
11 ABCDE
Name: Variables, dtype: object
Your question is a little vague, but would something like this work?
df['DynamicColumn'] = df['User_ID'].astype(str) + df['Seq'].astype(str)
This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Split (explode) pandas dataframe string entry to separate rows
(27 answers)
Closed 3 years ago.
Let's assume that I have a pandas dataframe whose column A contains n-dimensional vectors. I would like to split this column into multiple columns. Basically, my dataset looks like:
A B C
[1,0,2,3,5] ... ...
[4,5,3,2,1] ... ...
.........................
And I want to have :
A0 A1 A2 A3 A4 B C
1 0 2 3 5 ... ...
4 5 3 2 1 ... ...
.......................
I can solve this problem with apply and for loops, I think, but I imagine there is a better (faster, easier to read, ...) way to do so.
Edit: My post was marked as a duplicate, but the given answers produce more rows; I want more columns, as shown above.
Thanks,
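For reference, a minimal sketch of the usual column-wise expansion: build a new frame from the lists and concatenate it back. The B/C values below are placeholders, since the post elides them:
import pandas as pd

df = pd.DataFrame({"A": [[1, 0, 2, 3, 5], [4, 5, 3, 2, 1]],
                   "B": ["b1", "b2"],   # placeholder values
                   "C": ["c1", "c2"]})  # placeholder values

# One column per vector element; tolist() avoids a slow row-wise apply
expanded = pd.DataFrame(df["A"].tolist(), index=df.index)
expanded.columns = [f"A{i}" for i in expanded.columns]

out = pd.concat([expanded, df.drop(columns="A")], axis=1)
print(out)
#    A0  A1  A2  A3  A4   B   C
# 0   1   0   2   3   5  b1  c1
# 1   4   5   3   2   1  b2  c2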
This question already has answers here:
How to only do string manupilation on column of pandas that have 4 digits or less?
(3 answers)
Closed 3 years ago.
I have a pandas dataframe with a Phone column; however, the data is a bit inconsistent. Here are some examples that I would like to focus on.
df["Phone"]
0 732009852
1 738073222
2 755920306
3 0755353288
Row 3 has the necessary leading 0 for an Australian number. How do I update rows like 0, 1, and 2?
Use pandas.Series.str.zfill:
import pandas as pd

s = pd.Series(['732009852', '0755353288'])
s.str.zfill(10)
Output:
0 0732009852
1 0755353288
Or pd.Series.str.rjust:
print(df["Phone"].str.rjust(10, '0'))
Output:
0 0732009852
1 0738073222
2 0755920306
3 0755353288
This question already has answers here:
Use a list of values to select rows from a Pandas dataframe
(8 answers)
Filter dataframe rows if value in column is in a set list of values [duplicate]
(7 answers)
Closed 5 years ago.
This is a subset of a dataframe:
index drug_id values
1 le.1 f
2 le.7 h
3 le.10 9
4 le.11 10
5 le.15 S
I want to remove the rows whose values in the drug_id column are le.7, le.10, or le.11.
This is my code:
df.drop(df.drug_id[['le.7', 'le.10', 'le.11']], inplace = True )
I also tried this:
df.drop(df.drug_id == ['le.7', 'le.10', 'le.11'], inplace = True )
But neither of them worked. Any suggestions?
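For reference, a minimal sketch of the standard approach: df.drop expects index labels rather than column values, which is why both attempts fail; filtering with a boolean mask from isin does the job:
import pandas as pd

df = pd.DataFrame({"drug_id": ["le.1", "le.7", "le.10", "le.11", "le.15"],
                   "values": ["f", "h", "9", "10", "S"]},
                  index=[1, 2, 3, 4, 5])

# Keep only the rows whose drug_id is NOT in the removal list
to_remove = ["le.7", "le.10", "le.11"]
df = df[~df["drug_id"].isin(to_remove)]
print(df)
#   drug_id values
# 1    le.1      f
# 5   le.15      S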