Two column DataFrame to transition table (pivot) [duplicate]

This question already has answers here:
Get statistics for each group (such as count, mean, etc) using pandas GroupBy?
(9 answers)
How can I pivot a dataframe?
(5 answers)
Closed 3 years ago.
I have a pandas dataframe with two columns. I want to measure the transition count, that is, the number of times each unique value in the first column occurs together with each unique value in the second column. This should be a pivot or pivot_table, but I am stuck. In the code below, trial is the input dataframe and ans is the result I would like to obtain by manipulating trial.
I did not spot a similar question that uses only two columns. The others pivot on a third column and aggregate it with mean or sum. Here there are only two columns, both non-numeric, and I want to count the transitions rather than aggregate a numeric column.
If there is a similar question, it would be very helpful if someone could point me to it.
trial=pd.DataFrame({'col1':list('AABCCCDDDD'),'col2':list('XYXXXYYXZZ')})
index col1 col2
0 A X
1 A Y
2 B X
3 C X
4 C X
5 C Y
6 D Y
7 D X
8 D Z
9 D Z
ans=pd.DataFrame({'col1':list('ABCD'),'X':[1,1,2,1],'Y':[1,0,1,1],'Z':[0,0,0,2]})
ans.set_index('col1')
col1 X Y Z
A 1 1 0
B 1 0 0
C 2 1 0
D 1 1 2
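A minimal sketch of one way to build this transition table, assuming trial is defined as above: pd.crosstab counts each col1/col2 pairing and fills missing combinations with 0 (pivot_table with aggfunc='size' gives the same result).
import pandas as pd

trial = pd.DataFrame({'col1': list('AABCCCDDDD'), 'col2': list('XYXXXYYXZZ')})

# Cross-tabulate the two columns: rows are col1 values, columns are col2 values,
# and each cell is the number of times that pair occurs in the data.
ans = pd.crosstab(trial['col1'], trial['col2'])
print(ans)

# Equivalent using pivot_table, counting rows instead of aggregating a value column.
ans2 = trial.pivot_table(index='col1', columns='col2', aggfunc='size', fill_value=0)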

Related

Fill a dataframe with values from another dataframe according to column values [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 10 months ago.
I have two dataframes.
The first one, let's say dfrA:
x,y,z
0,0,1
0,1,2
0,2,3
0,3,4
1,0,5
1,1,6
1,2,7
1,3,8
2,0,9
2,1,10
2,2,11
2,3,12
3,0,13
3,1,14
3,2,15
3,3,16
and another one, let's say dfrB
x,y
1,2
2,3
I would like to add a z column to dfrB, taking the z value from the row of dfrA that has the same x and y as the row in dfrB.
In other words, I expect:
x,y,z
1,2,7
2,3,12
I am able to add an empty column to dfrB:
df_support = pd.DataFrame(columns=['z'])
dfrB = dfrB.join(df_support, how="outer")
How can I now fill column z in dfrB? I would like to avoid a loop full of if statements.
You can try pandas.DataFrame.merge
dfrB['z'] = dfrB.merge(dfrA, on=['x', 'y'], how='left')['z']
print(dfrB)
x y z
0 1 2 7
1 2 3 12
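A slightly shorter variant, sketched under the same assumptions (dfrA and dfrB as above): merge the whole frames in one step instead of assigning the single column afterwards.
import pandas as pd

dfrA = pd.DataFrame({'x': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                     'y': [0, 1, 2, 3] * 4,
                     'z': list(range(1, 17))})
dfrB = pd.DataFrame({'x': [1, 2], 'y': [2, 3]})

# A left merge keeps every row of dfrB and pulls in the matching z from dfrA;
# rows with no match in dfrA would get NaN in z.
dfrB = dfrB.merge(dfrA, on=['x', 'y'], how='left')
print(dfrB)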

Are the values of column xy of df1 also present in column zy of df2? [duplicate]

This question already has answers here:
Check if Pandas column contains value from another column
(3 answers)
Check if value from one dataframe exists in another dataframe
(4 answers)
Closed 11 months ago.
I have two dataframes and want to check which values of col1 in df1 also occur in col1 of df2. If a value occurs, col2_new should be 1, otherwise 0. Is it best to do this with a list, i.e. convert the column of df1 to a list and then loop over the column of the other dataframe, or is there a more elegant way?
df1 (before):
index col1
1     a
2     b
3     c
df2:
index col1
1     a
2     e
3     b
df1 (after):
index col1 col2_new
1     a    1
2     b    1
3     c    0
Use Series.isin and convert the boolean mask to integers:
df1['col2_new'] = df1['col1'].isin(df2['col1']).astype(int)
Or:
df1['col2_new'] = np.where(df1['col1'].isin(df2['col1']), 1, 0)
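A self-contained sketch of the isin approach, assuming df1 and df2 look like the example above:
import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'b', 'c']}, index=[1, 2, 3])
df2 = pd.DataFrame({'col1': ['a', 'e', 'b']}, index=[1, 2, 3])

# isin returns a boolean mask; casting to int turns it into 1/0 flags.
df1['col2_new'] = df1['col1'].isin(df2['col1']).astype(int)
print(df1)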

reshape data to split one column into multiple columns based on delimiter in pandas or otherwise in python [duplicate]

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 3 years ago.
I have the following dataframe:
df_in = pd.DataFrame({
    'State': ['C', 'B', 'D', 'A', 'C', 'B'],
    'Contact': ['alpha a. theta| beta', 'beta| alpha a. theta| delta', 'Theta', 'gamma| delta', 'alpha|Eta| gamma| delta', 'beta'],
    'Timestamp': [911583000000, 912020000000, 912449000000, 912742000000, 913863000000, 915644000000]})
How do I transform it so that the second column, which has pipe-separated data, is broken out into separate rows, as follows:
df_out = pd.DataFrame({
    'State': ['C', 'C', 'B', 'B', 'B', 'D', 'A', 'A', 'C', 'C', 'C', 'C', 'B'],
    'Contact': ['alpha a. theta', 'beta', 'beta', 'alpha a. theta', 'delta', 'Theta', 'gamma', 'delta', 'alpha', 'Eta', 'gamma', 'delta', 'beta'],
    'Timestamp': [911583000000, 911583000000, 912020000000, 912020000000, 912020000000, 912449000000, 912742000000, 912742000000, 913863000000, 913863000000, 913863000000, 913863000000, 915644000000]})
print(df_in)
print(df_out)
I could use pd.melt, but that requires the 'Contact' column to already be broken out into multiple columns rather than having all the contacts in one delimiter-separated column.
You could split the column, then merge on the index:
df_in.Contact.str.split('|', expand=True).stack().reset_index()\
    .merge(df_in.reset_index(), left_on='level_0', right_on='index')\
    .drop(['level_0', 'level_1', 'index', 'Contact'], axis=1)
Out:
0 State Timestamp
0 alpha a. theta C 911583000000
1 beta C 911583000000
2 beta B 912020000000
3 alpha a. theta B 912020000000
4 delta B 912020000000
5 Theta D 912449000000
6 gamma A 912742000000
7 delta A 912742000000
8 alpha C 913863000000
9 Eta C 913863000000
10 gamma C 913863000000
11 delta C 913863000000
12 beta B 915644000000
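On newer pandas versions (0.25+), a shorter sketch of the same reshape uses DataFrame.explode, assuming df_in as defined in the question; the str.strip call is my own addition to drop the spaces left after splitting on '|'.
# Split the pipe-separated contacts into lists, emit one list element per row,
# then strip the stray whitespace around each contact name.
df_out = df_in.assign(Contact=df_in['Contact'].str.split('|')).explode('Contact')
df_out['Contact'] = df_out['Contact'].str.strip()
print(df_out.reset_index(drop=True))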

Combining pandas rows based on condition [duplicate]

This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
Given a Pandas Dataframe df, with column names 'Session', and 'List':
Can I group together the 'List' values for the same values of 'Session'?
My Approach
I've tried solving the problem by creating a new dataframe and iterating through the rows of the initial dataframe while maintaining a session counter that I increment whenever the session changes.
If the session hasn't changed, I append that row's List value followed by a comma.
Whenever the session changes, I use strip to get rid of the trailing comma.
Initial DataFrame
Session List
0 1 a
1 1 b
2 1 c
3 2 d
4 2 e
5 3 f
Required DataFrame
Session List
0 1 a,b,c
1 2 d,e
2 3 f
Can someone suggest something more efficient or simple?
Thank you in advance.
Use groupby with agg and reset_index:
>>> df.groupby('Session')['List'].agg(','.join).reset_index()
Session List
0 1 a,b,c
1 2 d,e
2 3 f
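A self-contained version of the same approach, with the example dataframe built inline; agg(','.join) concatenates the List strings within each Session:
import pandas as pd

df = pd.DataFrame({'Session': [1, 1, 1, 2, 2, 3], 'List': list('abcdef')})

# Join the List strings within each Session group, comma-separated.
out = df.groupby('Session')['List'].agg(','.join).reset_index()
print(out)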

Error subsetting a data frame in python [duplicate]

This question already has an answer here:
Python - splitting dataframe into multiple dataframes based on column values and naming them with those values [duplicate]
(1 answer)
Closed 4 years ago.
I am learning python and pandas and am having trouble overcoming an error while trying to subset a data frame.
I have an input data frame:
df0-
Index Group Value
1 A 10
2 A 15
3 B 20
4 C 10
5 C 10
df0.dtypes-
Group object
Value float64
I am trying to split it into separate dataframes based on the unique values in the Group column, with the output looking something like this:
df1-
Index Group Value
1 A 10
2 A 15
df2-
Index Group Value
3 B 20
df3-
Index Group Value
4 C 10
5 C 10
So far I have written this code to subset the input:
UniqueGroups = df0['Group'].unique().tolist()
OutputFrame = {}
for x in UniqueAgencies:
    ReturnFrame[str('ConsolidateReport_')+x] = UniqueAgencies[df0['Group']==x]
The code above returns the following error, which I can't quite wrap my head around. Can anyone point me in the right direction?
*** TypeError: list indices must be integers or slices, not str
You can use groupby to group on that column:
for _, g in df0.groupby('Group'):
    print(g)
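If you want the named subsets the question was aiming for, one possible sketch (the ConsolidateReport_ prefix just follows the question's own naming) collects them in a dict keyed by group:
import pandas as pd

df0 = pd.DataFrame({'Group': ['A', 'A', 'B', 'C', 'C'],
                    'Value': [10.0, 15.0, 20.0, 10.0, 10.0]})

# One sub-dataframe per unique Group value, keyed by a descriptive name.
OutputFrame = {'ConsolidateReport_' + g: sub for g, sub in df0.groupby('Group')}
print(OutputFrame['ConsolidateReport_A'])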
