I want to move the column value to another column depending on the condition.
In the table below, if column A is 4 or more, the value of A1_1 is moved to A1_3; if the value is 3, it is moved to A1_2; and if the value is 2 or less, the value is kept in A1_1.
I want to apply the same logic to columns B, B1_1, B1_2, and B1_3.
How to approach it?
A  B  A1_1    A1_2  A1_3  B1_1    B1_2  B1_3
1  1  Apple               Apple
2  2  Banana              Banana
3  3  Tomato              Tomato
4  4  Apple               Apple
5  5  Banana              Banana
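Applying this rule, the desired result (derived from the description above) would look like this:
A  B  A1_1    A1_2    A1_3    B1_1    B1_2    B1_3
1  1  Apple                   Apple
2  2  Banana                  Banana
3  3          Tomato                  Tomato
4  4                  Apple                   Apple
5  5                  Banana                  Banana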
You can iterate over your DataFrame with iterrows and easily apply this logic:
import numpy as np

for index, row in df.iterrows():
    if row['A'] >= 4:
        # move the value from A1_1 to A1_3
        df.at[index, 'A1_3'] = row['A1_1']
        df.at[index, 'A1_1'] = np.nan
    elif row['A'] == 3:
        # move the value from A1_1 to A1_2
        df.at[index, 'A1_2'] = row['A1_1']
        df.at[index, 'A1_1'] = np.nan
df.iterrows() returns an index and a row on each iteration: index is the row's index, and row is the entire row, in which you can access each cell with row['Name_Of_Column']. Note that writing to row does not modify the original DataFrame, which is why the code above assigns through df.at. You can replace np.nan with None or 0, depending on your needs.
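If the frame is large, a vectorized alternative avoids the Python-level loop entirely. A minimal sketch using boolean masks, covering both the A and B groups (column names taken from the question):
import numpy as np

for key, src, dest2, dest3 in [('A', 'A1_1', 'A1_2', 'A1_3'),
                               ('B', 'B1_1', 'B1_2', 'B1_3')]:
    high = df[key] >= 4   # rows whose value moves to the *_3 column
    mid = df[key] == 3    # rows whose value moves to the *_2 column
    df.loc[high, dest3] = df.loc[high, src]
    df.loc[mid, dest2] = df.loc[mid, src]
    df.loc[high | mid, src] = np.nan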
You can use .apply(axis=1) to iterate through each row and move the columns as you deem fit.
def update_row(row):
    # pick the destination column based on the value in A
    col = 'A1_1'
    if row['A'] > 3:
        col = 'A1_3'
    elif row['A'] == 3:
        col = 'A1_2'
    # set A1_1 to an empty string and move the value to the required column
    a1_val = row['A1_1']
    row['A1_1'] = ""
    row[col] = a1_val
    return row

df = df.apply(update_row, axis=1)
Note that .apply returns a new DataFrame, so assign the result back to df.
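The question also asks for the same logic on the B columns; one way (a sketch, assuming the B1_* columns mirror the A1_* ones) is to parameterize the helper and apply it once per group:
def update_group(row, key, prefix):
    # choose the destination column from the value in the key column
    col = prefix + '_1'
    if row[key] > 3:
        col = prefix + '_3'
    elif row[key] == 3:
        col = prefix + '_2'
    val = row[prefix + '_1']
    row[prefix + '_1'] = ""
    row[col] = val
    return row

df = df.apply(lambda r: update_group(update_group(r, 'A', 'A1'), 'B', 'B1'), axis=1)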
DF1 = [
    Column A  Column B
    Cell 1    Cell 2
    Cell 3    Cell 4
,
    Column A  Column B
    Cell 1    Cell 2
    Cell 3    Cell 4
]
DF2 = [ NY, FL ]
In this case, DF1 and DF2 each have two elements.
The result I am looking for is the following
Main_DF =
[
    Column A  Column B  Column C
    Cell 1    Cell 2    NY
    Cell 3    Cell 4    NY
,
    Column A  Column B  Column C
    Cell 1    Cell 2    FL
    Cell 3    Cell 4    FL
]
I tried to use pd.concat, assign, and insert, but none of them gives me the result I'm looking for.
Lists hold references to dataframes. So, you can amend the dataframes and not need to amend the list at all.
So, I'd do something like...
for df, val in zip(DF1, DF2):
    df['Column C'] = val
Using zip allows you to iterate through the two lists in sync with each other:
the 1st element of DF1 goes into df, and the 1st element of DF2 goes into val
the 2nd element of DF1 goes into df, and the 2nd element of DF2 goes into val
and so on.
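A minimal, self-contained sketch of the whole approach (two small example frames stand in for the real data):
import pandas as pd

DF1 = [
    pd.DataFrame({'Column A': ['Cell 1', 'Cell 3'],
                  'Column B': ['Cell 2', 'Cell 4']}),
    pd.DataFrame({'Column A': ['Cell 1', 'Cell 3'],
                  'Column B': ['Cell 2', 'Cell 4']}),
]
DF2 = ['NY', 'FL']

# amend each frame in place; the list keeps referencing the same objects
for df, val in zip(DF1, DF2):
    df['Column C'] = val

# if you then want one big frame, pd.concat works on the amended list
Main_DF = pd.concat(DF1)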
I have a huge dataframe with 40 columns (10 groups of 4 columns each), with values in some groups and NaN in others. I want the values in every row left-shifted, such that wherever values are present in that row, the final DataFrame is filled Group 1 -> Group 2 -> Group 3 and so on.
(A sample dataframe and the required output were shown here as images.)
I have used the code below to shift the values left. However, if a value is missing within an available group (e.g. Item 2 type-1, or Item 3 cat-2), this code ignores the gap and pulls in the value to its right, and so on.
import numpy as np
import pandas as pd

v = df1.values
# row indices repeated across each row, for fancy indexing below
a = [[n] * v.shape[1] for n in range(v.shape[0])]
# a stable argsort on "is null" pushes the NaNs in each row to the right
b = pd.isnull(v).argsort(axis=1, kind='mergesort')
df2 = pd.DataFrame(v[a, b], df1.index, df1.columns)
How can I achieve this? Thanks.
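One possible direction (a sketch, not a tested answer, assuming exactly 10 groups of 4 columns) is to shift whole groups left rather than individual cells, so that gaps inside a group are preserved:
import numpy as np
import pandas as pd

n_groups, group_size = 10, 4  # assumed layout

# view each row as a sequence of 4-column groups
v = df1.to_numpy().reshape(len(df1), n_groups, group_size)

out = np.full(v.shape, np.nan, dtype=object)
for i, row in enumerate(v):
    # keep only groups containing at least one non-null value,
    # preserving any NaNs inside those groups
    present = [g for g in row if pd.notnull(g).any()]
    if present:
        out[i, :len(present)] = present

df2 = pd.DataFrame(out.reshape(len(df1), -1),
                   index=df1.index, columns=df1.columns)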
I want to remove the rows from a pandas dataframe that contain, in a particular column, strings whose word count is greater than or equal to a desired length.
For example:
Input frame:
   X  Y
0  0  Hi how are you.
1  1  An apple
2  2  glass of water
3  3  I like to watch movie
Now, say I want to remove from the dataframe the rows whose strings have 4 or more words.
The desired output frame must be:
   X  Y
1  1  An apple
2  2  glass of water
The rows with values 0 and 3 in column 'X' are removed, as the word counts of their strings are 4 and 5 respectively.
First split the values by whitespace, get the number of words with Series.str.len, and invert the condition from >= to < with Series.lt for boolean indexing:
df = df[df['Y'].str.split().str.len().lt(4)]
# alternative with an inverted mask via ~
# df = df[~df['Y'].str.split().str.len().ge(4)]
print(df)
   X  Y
1  1  An apple
2  2  glass of water
You can count the spaces instead (fewer than 3 separators means fewer than 4 words):
df[df.Y.str.count(r'\s+').lt(3)]
   X  Y
1  1  An apple
2  2  glass of water
After using transpose on a dataframe, there is always an extra row left over from the initial dataframe's index. For example:
import pandas as pd
df = pd.DataFrame({'fruit':['apple','banana'],'number':[3,5]})
df
fruit number
0 apple 3
1 banana 5
df.transpose()
0 1
fruit apple banana
number 3 5
Even when I reset the index:
df.reset_index(drop = True, inplace = True)
df
fruit number
0 apple 3
1 banana 5
df.transpose()
0 1
fruit apple banana
number 3 5
The problem is that when I save the dataframe to a csv file by:
df.to_csv(f)
this extra row stays at the top and I have to remove it manually every time.
Also this doesn't work:
df.to_csv(f, index = None)
because the old index is no longer considered an index (just another row...).
It also happens when I transpose the other way around: I get an extra column which I cannot remove.
Any tips?
I had the same problem; I solved it by setting the index before doing the transpose, i.e. df.set_index('fruit').transpose():
import pandas as pd
df = pd.DataFrame({'fruit':['apple','banana'],'number':[3,5]})
df
fruit number
0 apple 3
1 banana 5
And df.set_index('fruit').transpose() gives:
fruit apple banana
number 3 5
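With the index set before transposing, the CSV no longer carries the spurious row; a small sketch (the filename is just a placeholder):
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana'], 'number': [3, 5]})

# set the meaningful column as the index first, then transpose and save
df.set_index('fruit').transpose().to_csv('out.csv')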
Instead of removing the extra index, why not try setting the new index that you want and then using slicing?
Step 1: set the new header you want, taking it from the first row of the transposed frame:
df.columns = df.iloc[0]
Step 2: create a new dataframe without that extra row:
df_new = df[1:]
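Putting both steps together on the example from the question (a sketch; the new header keeps 'fruit' as its name, which you can clear with df_new.columns.name = None if needed):
import pandas as pd

df = pd.DataFrame({'fruit': ['apple', 'banana'], 'number': [3, 5]})
df = df.transpose()

# promote the first row (the old 'fruit' row) to the header...
df.columns = df.iloc[0]
# ...and drop it from the body
df_new = df[1:]

df_new.to_csv('out.csv')  # placeholder filename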
I have a two-column dataframe df in which each row is distinct; one element in one column can map to one or more elements in the other column. I want to filter OUT those elements, so that in the final dataframe each element in one column maps to a unique element in the other column.
What I am doing is to group by one column, count the duplicates, remove the rows with counts more than 1, and then do the same for the other column. I am wondering if there is a better, simpler way.
Thanks
edit1: I just realized my solution is INCORRECT: removing multi-mapping elements in column A changes the number of mappings in column B. Consider the following example:
A  B
1  4
1  3
2  4
1 maps to 3 and 4, so the first two rows should be removed; and 4 maps to 1 and 2, so the third row should go as well. The final table should be empty. However, my solution keeps the last row.
Can anyone provide a fast and simple solution? Thanks.
Well, you could do something like the following:
>>> df
A B
0 1 4
1 1 3
2 2 4
3 3 5
You only want to keep a row if no other row has its value of 'A' and no other row has its value of 'B'. Only row 3 meets those conditions in this example:
>>> Aone = df.groupby('A').filter(lambda x: len(x) == 1)
>>> Bone = df.groupby('B').filter(lambda x: len(x) == 1)
>>> Aone.merge(Bone,on=['A','B'],how='inner')
A B
0 3 5
Explanation:
>>> Aone = df.groupby('A').filter(lambda x: len(x) == 1)
>>> Aone
A B
2 2 4
3 3 5
The above grabs the rows that may be allowed based on looking at column 'A' alone.
>>> Bone = df.groupby('B').filter(lambda x: len(x) == 1)
>>> Bone
A B
1 1 3
3 3 5
The above grabs the rows that may be allowed based on looking at column 'B' alone. Merging the two then takes the intersection, leaving only the rows that meet both conditions:
>>> Aone.merge(Bone,on=['A','B'],how='inner')
Note, you could also do a similar thing using groupby/transform. But transform tends to be slowish so I didn't do it as an alternative.
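For reference, a transform-based version might look like this (a sketch of the alternative mentioned above, not necessarily faster):
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3], 'B': [4, 3, 4, 5]})

# keep rows whose 'A' value and whose 'B' value each occur exactly once
mask = (df.groupby('A')['A'].transform('size') == 1) \
     & (df.groupby('B')['B'].transform('size') == 1)
print(df[mask])  # only the (3, 5) row survives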