Dataframe method to Transpose multiple rows to single column - python

How can I transpose multiple rows into a single column?
Some of my rows contain the word 'Narrative', so there are many such rows.
If the word 'Narrative' is found, I want to transpose that row's value into a single column.
(The example input, the needed output and the original dataframe were shown as images.)

Find the rows where x == 'narrative' and move their y values into a new column:
idx = df[df['x'] == 'narrative'].index
df1 = df.drop(idx).assign(narrative=df.loc[idx, 'y'].values).reset_index(drop=True)
Output:
>>> df1
x y z narrative
0 a b c A
1 d a b B
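
A minimal reproduction of the idea, assuming the original dataframe alternates data rows with 'narrative' rows whose text sits in column y (the z values for those rows are made up here):

import pandas as pd

# Assumed layout: each data row is followed by a row whose x value is
# 'narrative' and whose y value holds the narrative text.
df = pd.DataFrame({
    'x': ['a', 'narrative', 'd', 'narrative'],
    'y': ['b', 'A', 'a', 'B'],
    'z': ['c', None, 'b', None],
})

idx = df[df['x'] == 'narrative'].index               # rows to move
df1 = (df.drop(idx)                                  # keep only the data rows
         .assign(narrative=df.loc[idx, 'y'].values)  # attach their narrative text
         .reset_index(drop=True))
print(df1)
#    x  y  z narrative
# 0  a  b  c         A
# 1  d  a  b         B

Note that .assign(... .values) only lines up if there is exactly one 'narrative' row per data row.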

Related

add/combine columns after searching in a DataFrame

I'm trying to copy data from different columns to a particular column in the same DataFrame.
Index  col1A  col2A  colB  list  CT  CW  CH
0      1                   :
1             b
2      2
3                    3d
But prior to that I want to check whether those columns (col1A, col2A, colB) exist in the DataFrame, group the ones that are present, and move the grouped data to the relevant target columns (CT, CH, etc.), like:
   CH  CW  CT
0  1       1
1  b       b
2  2       2
3  3d      3d
I did,
col_list1 = ['ColA','ColB','ColC']
test1 = any([i in df.columns for i in col_list1])
if test1 == True:
    df['CH'] = df['Col1A'] + df['Col2A']
    df['CT'] = df['ColB']
This code throws a KeyError. I want it to ignore the columns that are not present and add only those that are present.
IIUC, you can use Python set or Series.isin to find the common columns
cols = list(set(col_list1) & set(df.columns))
# or
cols = df.columns[df.columns.isin(col_list1)]
df['CH'] = df[cols].sum(axis=1)
Instead of just concatenating the columns with +, collect the columns that are present into a list and add them element-wise with np.sum (axis=0 across the list of Series):
df['CH'] = np.sum([df[c] for c in col_list1 if c in df], axis=0)
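
As a quick sanity check, here is a hedged sketch on a made-up frame where only some of the wanted columns exist (Col1A and ColB are present, Col2A is not):

import numpy as np
import pandas as pd

# Made-up frame: only some of the wanted columns exist
df = pd.DataFrame({'Col1A': [1, 2], 'ColB': [10, 20]})
col_list1 = ['Col1A', 'Col2A', 'ColB']

cols = df.columns[df.columns.isin(col_list1)]  # Index(['Col1A', 'ColB'])
df['CH'] = df[cols].sum(axis=1)                # 11, 22

# same result with np.sum over the list of existing columns
df['CH'] = np.sum([df[c] for c in col_list1 if c in df], axis=0)
print(df)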

grouping and printing the maximum in a dataframe in python

A dataframe has 3 columns:
A B C
^0hand(%s)leg$ 27;30 42;54
^-(%s)hand0leg 39;30 47;57
^0hand(%s)leg$ 24;33 39;54
Column A holds regex patterns like this. If two patterns are identical, for example rows 1 and 3 here, the rows have to be merged, keeping only the maximum of each value, as below:
Output:
A B C
^0hand(%s)leg$ 27;33 42;54
^-(%s)hand0leg 39;30 47;57
Any leads will be helpful
You could use:
(df.set_index('A').stack()
.str.extract('(\d+);(\d+)').astype(int)
.groupby(level=[0,1]).agg(max).astype(str)
.assign(s=lambda d: d[0]+';'+d[1])['s'] # OR # .apply(';'.join, axis=1)
.unstack(1)
.loc[df['A'].unique()] ## only if the order of rows matters
.reset_index()
)
output:
A B C
0 ^0hand(%s)leg$ 27;33 42;54
1 ^-(%s)hand0leg 39;30 47;57
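
For reference, a hedged, self-contained version built around the three-row frame from the question (using .max() and the ';'.join variant of the same chain):

import pandas as pd

df = pd.DataFrame({
    'A': ['^0hand(%s)leg$', '^-(%s)hand0leg', '^0hand(%s)leg$'],
    'B': ['27;30', '39;30', '24;33'],
    'C': ['42;54', '47;57', '39;54'],
})

out = (df.set_index('A').stack()                   # one 'n;m' string per (A, column)
         .str.extract(r'(\d+);(\d+)').astype(int)  # split into two integer columns
         .groupby(level=[0, 1]).max().astype(str)  # max per pattern and column
         .apply(';'.join, axis=1)                  # glue back into 'n;m'
         .unstack(1)                               # columns B and C again
         .loc[df['A'].unique()]                    # keep the original row order
         .reset_index())
print(out)
#                 A      B      C
# 0  ^0hand(%s)leg$  27;33  42;54
# 1  ^-(%s)hand0leg  39;30  47;57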

How can I merge multiple values in one cell in python

I have two dataframes. I have to look up values in dataframe 2 and fill in the corresponding values in dataframe 1, or make a new dataframe. How can I do it in python?
Inputs:
Dataframe 1:
Value: 10, [20,30], 5
Dataframe 2:
Value: 10, 20, 30, 5
Letter: a, b, a, c
Output should be like this
Dataframe 3:
Value: 10, [20,30], 5
Letter: a, [b,a], c
So you have one DataFrame column that can contain either a simple value or a list, and a second DataFrame that you want to use as a translation table.
I will assume that the column Value in df2 only contains unique values.
A simple way is to explode df1.Value so that each cell holds a single value, and to reset the index so the original index is stored in a column you can aggregate on later. Then just merge with df2 and aggregate on the saved original index:
(df1.reset_index()
    .explode('Value')
    .merge(df2, how='left', on='Value')
    .groupby('index')
    .agg(lambda x: x.iat[0] if len(x) == 1 else x.to_list())
)
It gives, as expected:
Value Letter
index
0 10 a
1 [20, 30] [b, a]
2 5 c
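
A hedged, self-contained reproduction with the two frames built as described in the question; the astype call is only there to give the merge keys the same dtype after explode:

import pandas as pd

df1 = pd.DataFrame({'Value': [10, [20, 30], 5]})
df2 = pd.DataFrame({'Value': [10, 20, 30, 5],
                    'Letter': ['a', 'b', 'a', 'c']})

df3 = (df1.reset_index()
          .explode('Value')                    # one row per single value
          .astype({'Value': 'int64'})          # align key dtype with df2
          .merge(df2, how='left', on='Value')  # look up the letters
          .groupby('index')                    # back to the original rows
          .agg(lambda x: x.iat[0] if len(x) == 1 else x.to_list()))
print(df3)
#        Value  Letter
# index
# 0         10       a
# 1   [20, 30]  [b, a]
# 2          5       c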

Pandas Return a list of Rows whose Substrings are found in another column

I have the following dataset, and I would like to end up with a column like the one shown below.
Ideally, I would like to convert both columns to the same case, split the strings on spaces, and return the rows that contain a substring found in the other column.
Split the first column with Series.str.split and check its values against the flattened split values of the second column with DataFrame.isin. Reduce to at least one True per row with DataFrame.any, use the mask for boolean indexing to filter the first column and, if necessary, create a one-column DataFrame with Series.to_frame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'column_a': ['ga lt', 'ka', 'ku', 'na ma', np.nan, np.nan],
                   'column_b': ['se', 'ga', 'ma po', 'na', 'ka ch', 'wa wo']})

vals = [y for x in df['column_b'] for y in x.split()]
mask = df['column_a'].str.split(expand=True).isin(vals).any(axis=1)
df = df.loc[mask, 'column_a'].to_frame('column_a_in_column_b')
print(df)
column_a_in_column_b
0 ga lt
1 ka
3 na ma
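
If you prefer plain Python sets, a hedged alternative sketch (not the answer above) that also lowercases both columns, since the question asked for matching in the same case:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column_a': ['ga lt', 'ka', 'ku', 'na ma', np.nan, np.nan],
                   'column_b': ['se', 'ga', 'ma po', 'na', 'ka ch', 'wa wo']})

# vocabulary of lower-cased tokens taken from column_b
vocab = {tok for cell in df['column_b'].dropna() for tok in cell.lower().split()}

# keep rows of column_a that share at least one token with the vocabulary
mask = df['column_a'].fillna('').apply(lambda s: bool(vocab & set(s.lower().split())))
print(df.loc[mask, 'column_a'].to_frame('column_a_in_column_b'))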

How to append column values of one dataframe to column of another dataframe

I'm working with 2 dataframes, A & B. Dataframe A is populated with values, while dataframe B is empty except for a header structure.
I want to take the values of a column in dataframe A and append them to the corresponding column in dataframe B.
I've placed the values of the dataframe A column I want to append in a list and tried setting the destination column equal to that list:
dataframeB['x'] = list(dataframeA['A'])
This yields the following error:
ValueError: Length of values does not match length of index
The result I expect is that Dataframe A's column A transfers over to Dataframe B's column x.
Dataframe A:
A B C D
1 2 3 4
1 2 3 4
Dataframe B:
x y
- -
Create the dataframe with the data already in it:
dataframeB = pd.DataFrame({'x': dataframeA['A']})
Then you can add columns in from the other dataframe:
dataframeB['y'] = dataframeA['B']
Result:
x y
1 2
1 2
