How can I merge multiple values in one cell in python - python

I have two dataframes. I need to look up each value of dataframe 1 in dataframe 2 and put the corresponding letter into dataframe 1, or build a new dataframe. How can I do this in Python?
Inputs:
Dataframe 1:
Value: 10, [20,30], 5
Dataframe 2:
Value: 10, 20, 30, 5
Letter: a, b, a, c
Output should be like this
Dataframe 3:
Value: 10, [20,30], 5
Letter: a, [b,a], c

So you have one DataFrame column that can contain either a simple value or a list, and a second DataFrame that you want to use as a translations table.
I will assume that the column Value in df2 contains only unique values.
A simple way is to explode df1.Value to have one single value in each cell, and reset the index to store the original index in a dataframe column to be able to later aggregate on it. Then you just merge with df2 and aggregate on the saved original index:
(df1.reset_index()
    .explode('Value')
    .merge(df2, how='left', on='Value')
    .groupby('index')
    .agg(lambda x: x.iat[0] if len(x) == 1 else x.to_list()))
As expected, it gives:
Value Letter
index
0 10 a
1 [20, 30] [b, a]
2 5 c
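For anyone who wants to run this end to end, here is a minimal sketch of the same explode / merge / aggregate idea. The helper name lookup_letters and the sample frames are only for illustration. It assumes df2.Value is unique and that every value in df1 can be cast to df2's dtype. It aggregates with list first and then unwraps one-element lists, which is a small variation on the lambda above.

```python
import pandas as pd


def lookup_letters(df1, df2):
    """Hypothetical helper: attach df2's Letter to each scalar or list in df1.Value."""
    # Keep the original row label in an 'index' column, then put one value per row.
    exploded = df1.reset_index().explode('Value')
    # explode leaves an object column; cast it so both merge keys share a dtype.
    exploded = exploded.astype({'Value': df2['Value'].dtype})
    merged = exploded.merge(df2, how='left', on='Value')
    # Collect everything that came from the same original row into lists.
    out = merged.groupby('index').agg(list)
    # Unwrap one-element lists so plain values stay plain.
    for col in out.columns:
        out[col] = out[col].map(lambda v: v[0] if len(v) == 1 else v)
    return out


df1 = pd.DataFrame({'Value': [10, [20, 30], 5]})
df2 = pd.DataFrame({'Value': [10, 20, 30, 5], 'Letter': ['a', 'b', 'a', 'c']})
df3 = lookup_letters(df1, df2)
print(df3)
```

Note that a one-element list in df1 (for example [20]) would also come back as a plain value, because after exploding it is indistinguishable from a scalar.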

Related

If a dataframe contains a value in a column, how to perform a calculation in another column? - Pandas/ Python

Original Dataframe:
A B C
123 1500 0
Output:
A B C
123 1500 1.2
Logic needed: If column A contains 123, then for column C, take the value of B and multiply it with 0.0008, all the other values in column C should not be altered.
You could check if column "A" values are 123 or not and use mask on "C" to replace values there:
df['C'] = df['C'].mask(df['A']==123, df['B']*0.0008)
Output:
A B C
0 123 1500 1.2
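Here is a small runnable sketch of the mask approach. The second row (456, 2000, 7) is not from the question; it is added only to show that rows where A is not 123 keep their original C.

```python
import pandas as pd

# First row is from the question; the second row is an illustrative assumption.
df = pd.DataFrame({'A': [123, 456], 'B': [1500, 2000], 'C': [0, 7]})

# Where A equals 123, take B * 0.0008; everywhere else keep the existing C.
df['C'] = df['C'].mask(df['A'] == 123, df['B'] * 0.0008)
print(df)
```

Because mask works on whole columns, no explicit loop over rows is needed.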
I think you simply want to check whether df['A'] contains 123 and, if it does, multiply the column B value by 0.0008 and store it in column C. If that's the case, here's how you could go about it:
# Assuming your dataframe is a pandas DataFrame stored in the variable 'df'
vals_a = df.index[df['A'] == 123].tolist()  # index labels of the rows where A is 123
for b in vals_a:
    # write to the same row label b (not to the value of B)
    df.loc[b, 'C'] = df.loc[b, 'B'] * 0.0008
This replaces the value in column 'C' with 0.0008 times the value in column 'B', on the same row, wherever column A is 123.

Dataframe method to Transpose multiple rows to single column

How can I transpose multiple rows to a single column?
My rows contain the word 'Narrative', and there are many similar words.
If the word 'Narrative' is found, I want to transpose it to a single column.
Find the rows where x == 'narrative' and move their y values to a new column:
idx = df[df['x'] == 'narrative'].index
df1 = df.drop(idx).assign(narrative=df.loc[idx, 'y'].values).reset_index(drop=True)
Output:
>>> df1
x y z narrative
0 a b c A
1 d a b B
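The question's input was only posted as an image, so the frame below is a guess at its shape: each regular row is followed by a 'narrative' row whose y holds the text to move. Under that assumption, the two lines above can be run like this:

```python
import pandas as pd

# Assumed input: the real data was not included as text in the question.
df = pd.DataFrame({
    'x': ['a', 'narrative', 'd', 'narrative'],
    'y': ['b', 'A', 'a', 'B'],
    'z': ['c', None, 'b', None],
})

# Index labels of the rows to fold into a column.
idx = df[df['x'] == 'narrative'].index
# Drop those rows and attach their y values, by position, as a new column.
df1 = (df.drop(idx)
         .assign(narrative=df.loc[idx, 'y'].values)
         .reset_index(drop=True))
print(df1)
```

This only works when there are exactly as many 'narrative' rows as remaining rows, since the values are attached by position rather than by index.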

Extract row data from dictionary of dataframes based on filter on a column value

The dictionary dict_set has dataframes as the value for their keys.
I'm trying to extract rows from a dictionary of dataframes by filtering on the value in column 'A' of each dataframe.
dict_set={}
dict_set['a']=pd.DataFrame({'A':[1,2,3],'B':[1,2,3]})
dict_set['b']=pd.DataFrame({'A':[1,4,5],'B':[1,5,6]})
df=pd.concat([dict_set[x][dict_set[x]['A']==1] for x in dict_set.keys()],axis=0)
The output is:
A B
0 1 1
0 1 1
But I would want the output to be
A B x
0 1 1 a
0 1 1 b
Basically, I want the value of x to be present in the new dataframe formed as a column, say column x in the dataframe formed such that df[x] would give me the x values. Is there a simple way to do this?
Try this:
pd.concat([df.query("A == 1") for df in dict_set.values()], keys=dict_set.keys())\
.reset_index(level=0)\
.rename(columns={'level_0':'x'})
Output:
x A B
0 a 1 1
0 b 1 1
Details:
Get the dataframes from the dictionary with a list comprehension and filter each one. Here I chose to use query, but you could also use boolean indexing with df[df['A'] == 1]. Then call pd.concat with the keys parameter set to the dictionary keys. Lastly, reset_index with level=0 and rename the resulting column.
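As a self-contained sketch of the same steps, this version uses the boolean-indexing variant instead of query. The data is the dict_set from the question; the variable name filtered is just for illustration.

```python
import pandas as pd

dict_set = {
    'a': pd.DataFrame({'A': [1, 2, 3], 'B': [1, 2, 3]}),
    'b': pd.DataFrame({'A': [1, 4, 5], 'B': [1, 5, 6]}),
}

# Filter each frame with a boolean mask instead of query.
filtered = [frame[frame['A'] == 1] for frame in dict_set.values()]

# keys= adds the dictionary key as an outer index level; reset_index(level=0)
# turns that unnamed level into a column called 'level_0', which is renamed to 'x'.
df = (pd.concat(filtered, keys=list(dict_set))
        .reset_index(level=0)
        .rename(columns={'level_0': 'x'}))
print(df)
```

After this, df['x'] gives the dictionary key each row came from.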

How to append column values of one dataframe to column of another dataframe

I'm working with 2 dataframes, A & B. Dataframe A is populated with values, while dataframe B is empty except for a header structure.
I want to take the values of a column in dataframe A and append them to the corresponding column in dataframe B.
I've placed the values of the dataframe A column I want to append in a list, and tried setting the destination column equal to that list:
dataframeB[x] = list(dataframeA[A])
This yields the following error:
ValueError: Length of values does not match length of index
The result I expect is that Dataframe A's column A transfers over to Dataframe B's column x.
Dataframe A
A B C D
1 2 3 4
1 2 3 4
Dataframe B
x y
- -
Create the dataframe with the data already in it...
dataframeB = pd.DataFrame({'x': dataframeA['A']})
Then you can add columns in from the other dataframe:
dataframeB['y'] = dataframeA['B']
Result:
x y
1 2
1 2
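If the column mapping is known up front, a possible one-step alternative is to select the source columns and rename them to the target headers. This is only a sketch; the mapping A to x and B to y follows the answer above, and the sample values come from the question.

```python
import pandas as pd

dataframeA = pd.DataFrame({'A': [1, 1], 'B': [2, 2], 'C': [3, 3], 'D': [4, 4]})

# Select the source columns and rename them to the target header names.
dataframeB = dataframeA[['A', 'B']].rename(columns={'A': 'x', 'B': 'y'})
print(dataframeB)
```

Both frames share the same index here, so there is no length mismatch to worry about.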

Replace a column in Pandas dataframe with another that has same index but in a different order

I'm trying to re-insert back into a pandas dataframe a column that I extracted and of which I changed the order by sorting it.
Very simply, I have extracted a column from a pandas df:
col1 = df.col1
This column contains integers. I used the .sort() method to order it from smallest to largest, and then did some operations on the data.
col1.sort()
#do stuff that changes the values of col1.
Now the indexes of col1 are the same as the indexes of the overall df, but in a different order.
I was wondering how I can insert the column back into the original dataframe (replacing the col1 that is there at the moment)
I have tried both of the following methods:
1)
df.col1 = col1
2)
df.insert(column_index_of_col1, "col1", col1)
but both methods give me the following error:
ValueError: cannot reindex from a duplicate axis
Any help will be greatly appreciated.
Thank you.
Consider this DataFrame:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])
df
Out:
A B
0 1 6
0 2 5
1 3 4
Assign the second column to b and sort it and take the square, for example:
b = df['B']
b = b.sort_values()
b = b**2
Now b is:
b
Out:
1 16
0 25
0 36
Name: B, dtype: int64
Without knowing the exact operation you've done on the column, there is no way to know whether 25 corresponds to the first row in the original DataFrame or the second one. You can take the inverse of the operation (take the square root and match, for example) but that would be unnecessary I think. If you start with an index that has unique elements (df = df.reset_index()) it would be much easier. In that case,
df['B'] = b
should work just fine.
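Putting the pieces together, a minimal sketch of the unique-index route could look like this. It reuses the example frame above; squaring stands in for whatever operation was actually applied.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [6, 5, 4]}, index=[0, 0, 1])

# Make the index unique; the old labels are kept in an 'index' column.
df = df.reset_index()

# Sort and transform the column; each value keeps its (now unique) row label.
b = df['B'].sort_values() ** 2

# Assignment aligns on the index, so every value returns to its own row
# regardless of the sorted order.
df['B'] = b
print(df)
```

Since pandas aligns on labels during assignment, the order of b does not matter once the labels are unique.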
