python-pandas: use values as index and count them in dataframe [duplicate]

This question already has answers here:
How to get value counts for multiple columns at once in Pandas DataFrame?
(14 answers)
Closed 2 years ago.
Need help with pandas.
I have this dataframe:
a b c
Yes No Yes
Yes Yes No
Yes No No
How can I write pandas code that turns my dataframe into:
a b c
Yes 3 1 1
No 0 2 2
I'm looking into using iloc and lambda, but I'm really clueless. Is there a way to implement this?

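This actually does the trick (a value that never occurs in a column comes back as NaN, hence the fillna):
df.apply(pd.Series.value_counts).fillna(0).astype(int)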
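A minimal, self-contained sketch of the per-column count, assuming the sample frame from the question. The `reindex` call is an addition here, used only to put the rows in the order the question shows; values absent from a column are filled with 0.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": ["Yes", "Yes", "Yes"],
    "b": ["No", "Yes", "No"],
    "c": ["Yes", "No", "No"],
})

counts = (
    df.apply(pd.Series.value_counts)  # count values column by column
      .fillna(0)                      # a value missing from a column becomes 0
      .astype(int)                    # NaN forced floats; go back to integers
      .reindex(["Yes", "No"])         # row order used in the question
)
print(counts)
```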

Related

Change the value from another dataframe [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I am new to Python. I want to replace all the values in the 'Starting' column of df_2 with the matching 'Station' values from df_1. I did it with a for loop, but how can I do this in the simplest way?
df_1:
ID Station
0 1 Satose
1 2 Forlango
2 3 poterio
.
.
df_2:
Rail_Number Starting Ending
AABDD 3 44433
DLRAKA 1 45232
MiGOMu 2 18756
.
.
I have answered a similar question here :
Replace a value in a dataframe with a value from another dataframe
Step 1: Convert the two columns of df_1 into a dictionary that maps each ID to its Station:
d = dict(zip(df_1.ID, df_1.Station))
Step 2: Now map the 'Starting' column of df_2 through this dictionary:
df_2['Starting'] = df_2['Starting'].map(d)
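The two steps can be sketched end to end as follows, assuming only the three sample rows shown in the question. Here a Series indexed by ID stands in for the dictionary, which `map` accepts in the same way; the name `lookup` is made up for this illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df_1 = pd.DataFrame({"ID": [1, 2, 3],
                     "Station": ["Satose", "Forlango", "poterio"]})
df_2 = pd.DataFrame({
    "Rail_Number": ["AABDD", "DLRAKA", "MiGOMu"],
    "Starting": [3, 1, 2],
    "Ending": [44433, 45232, 18756],
})

# Lookup table: index is the ID, values are the station names.
lookup = df_1.set_index("ID")["Station"]

# Replace each ID in 'Starting' with its station name.
# IDs without a match in df_1 would become NaN.
df_2["Starting"] = df_2["Starting"].map(lookup)
print(df_2)
```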

Pandas dataframe with concatenated column [duplicate]

This question already has an answer here:
Pandas rolling sum on string column
(1 answer)
Closed 3 years ago.
I have a Pandas dataframe that looks like the code below. I need to add a dynamic column that concatenates every value in the sequence up to and including a given row. A loop sounds like the logical solution but would be very inefficient over a large dataframe (1M+ rows).
user_id=[1,1,1,1,2,2,2,3,3,3,3,3]
variable=["A","B","C","D","A","B","C","A","B","C","D","E"]
sequence=[0,1,2,3,0,1,2,0,1,2,3,4]
df=pd.DataFrame(list(zip(user_id,variable,sequence)),columns =['User_ID', 'Variables','Seq'])
# Need to add a column dynamically
df['dynamic_column']=["A","AB","ABC","ABCD","A","AB","ABC","A","AB","ABC","ABCD","ABCDE"]
I need to create the dynamic column efficiently, based on the user_id and the sequence number. I have played with the pandas shift function, but that still ends up requiring a loop. I am looking for an easy, efficient way of creating that dynamic concatenated column.
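This is a grouped cumsum (on strings, cumsum concatenates). Using transform keeps the result aligned with the original index:
df['dynamic_column'] = df.groupby('User_ID').Variables.transform(lambda x: x.cumsum())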
Output:
0 A
1 AB
2 ABC
3 ABCD
4 A
5 AB
6 ABC
7 A
8 AB
9 ABC
10 ABCD
11 ABCDE
Name: Variables, dtype: object
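As a hedged alternative that does not rely on cumsum accepting strings, the running concatenation can be sketched with `itertools.accumulate` inside a grouped `transform`. This is a swapped-in technique, not the answer's own code, and it assumes the sample data from the question.

```python
from itertools import accumulate

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "User_ID": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    "Variables": list("ABCDABCABCDE"),
    "Seq": [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4],
})

# Make sure rows are in sequence order within each user before accumulating.
df = df.sort_values(["User_ID", "Seq"])

# accumulate() adds each element to the running total; for strings that
# means concatenation. The result keeps the group's index so it aligns
# with the original rows.
df["dynamic_column"] = df.groupby("User_ID")["Variables"].transform(
    lambda s: pd.Series(list(accumulate(s)), index=s.index)
)
print(df)
```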
Your question is a little vague, but would something like this work?
df['DynamicColumn'] = df['User_ID'] + df['Seq']

Combining pandas rows based on condition [duplicate]

This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
Given a Pandas Dataframe df, with column names 'Session', and 'List':
Can I group together the 'List' values for the same values of 'Session'?
My Approach
I've tried solving the problem by creating a new dataframe and iterating through the rows of the initial dataframe while maintaining a session counter that I increment when I see that the session has changed.
If it hasn't changed, I append the List value for that row, followed by a comma.
Whenever the session changes, I use strip to get rid of the extra trailing comma.
Initial DataFrame
Session List
0 1 a
1 1 b
2 1 c
3 2 d
4 2 e
5 3 f
Required DataFrame
Session List
0 1 a,b,c
1 2 d,e
2 3 f
Can someone suggest something more efficient or simple?
Thank you in advance.
Use groupby, agg, and reset_index:
>>> df.groupby('Session')['List'].agg(','.join).reset_index()
Session List
0 1 a,b,c
1 2 d,e
2 3 f
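A self-contained sketch of the same grouping, assuming the sample frame from the question. The `astype(str)` step is an addition for illustration: `','.join` raises a TypeError on non-string values, so converting first makes the approach work for numeric columns as well.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Session": [1, 1, 1, 2, 2, 3],
    "List": ["a", "b", "c", "d", "e", "f"],
})

result = (
    df.groupby("Session")["List"]
      .agg(lambda s: ",".join(s.astype(str)))  # one joined string per session
      .reset_index()                           # turn 'Session' back into a column
)
print(result)
```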

How do I clean phone numbers in pandas [duplicate]

This question already has answers here:
How to only do string manupilation on column of pandas that have 4 digits or less?
(3 answers)
Closed 3 years ago.
I have a pandas dataframe with a Phone column; however, the data is a bit inconsistent. Here are some examples that I would like to focus on.
df["Phone"]
0 732009852
1 738073222
2 755920306
3 0755353288
Row 3 has the necessary leading 0 for an Australian number. How do I update rows like 0, 1 and 2 so that they have it too?
Use pandas.Series.str.zfill:
s = pd.Series(['732009852', '0755353288'])
s.str.zfill(10)
Output:
0 0732009852
1 0755353288
Or pd.Series.str.rjust:
print(df["Phone"].str.rjust(10, '0'))
Output:
0 0732009852
1 0738073222
2 0755920306
3 0755353288
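Applied to the question's own column, the padding can be sketched as below. The `astype(str)` call is an assumption added for safety: if the column were stored as integers, the `.str` accessor would not be available until the values are converted to strings.

```python
import pandas as pd

# Sample data copied from the question, stored as strings.
df = pd.DataFrame(
    {"Phone": ["732009852", "738073222", "755920306", "0755353288"]}
)

# Left-pad every number with zeros up to 10 characters.
# Values that are already 10 characters long are left unchanged.
df["Phone"] = df["Phone"].astype(str).str.zfill(10)
print(df["Phone"])
```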

subset the dataframe into a new one using copy [duplicate]

This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 4 years ago.
I have a dataframe df
a b c
0 5 6 9
1 6 7 10
2 7 8 11
3 8 9 12
So if I want to select only col a and b and store it in another df I would use something like this
df1 = df[['a','b']]
But I have seen places where people write it this way
df1 = df[['a','b']].copy()
Can anyone explain what .copy() does, given that the earlier code works just fine?
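For example, suppose you want to modify a dataframe in place (here using replace):
df2=df
df2.replace('blah','foo',inplace=True)
Here df2 is just another name for the same object, so df is changed as well:
df.equals(df2)
Will be:
True
If you want the change to apply only to df2, make a copy first:
df2=df.copy()
df2.replace('blah','foo',inplace=True)
Then now (provided df actually contains 'blah'):
df.equals(df2)
Returns:
False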
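The difference can be sketched with the frame from the question. The names `alias` and `df1` are chosen for this illustration. Plain assignment only creates a second name for the same object, whereas `.copy()` on the column subset gives an independent frame that can be edited safely (in pandas versions without copy-on-write, assigning into an uncopied subset may also trigger a SettingWithCopyWarning).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [5, 6, 7, 8],
    "b": [6, 7, 8, 9],
    "c": [9, 10, 11, 12],
})

# Plain assignment: 'alias' and 'df' are the same object.
alias = df
alias.loc[0, "a"] = 99          # this also changes df
print(df.loc[0, "a"])

# Explicit copy of a column subset: fully independent of df.
df1 = df[["a", "b"]].copy()
df1.loc[0, "a"] = -1            # df is left untouched
print(df.loc[0, "a"], df1.loc[0, "a"])
```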
