Starting from a df imported from Excel like this:

Code  Time  Rev
AAA      5    3
AAA      3    2
AAA      6    1
BBB     10    2
BBB      5    1
I want to add a new column that flags the last revision, like this:

Code  Time  Rev  Last
AAA      5    3  OK
AAA      3    2  NOK
AAA      6    1  NOK
BBB     10    2  OK
BBB      5    1  NOK
The df is already sorted by 'Code' and 'Rev':
df = df.sort_values(['Code', 'Rev'], ascending=[True, False])
My idea was to evaluate the 'Code' column: if the value in 'Code' equals the one in the row above, the new column must get 'NOK'. Unfortunately, I am not able to write this in Python.
You can do:
# Create a column called 'Last' filled with 'NOK'
df['Last'] = 'NOK'
# Sorting is skipped because the df is already sorted.
# Then locate the first row in each group and change its value to 'OK'
df.loc[df.groupby('Code', as_index=False).nth(0).index, 'Last'] = 'OK'
You can use pandas.groupby.cumcount and set the first row of each group to 'OK'.
import pandas as pd

dict_ = {
    'Code': ['AAA', 'AAA', 'AAA', 'BBB', 'BBB'],
    'Time': [5, 3, 6, 10, 5],
    'Rev': [3, 2, 1, 2, 1],
}
df = pd.DataFrame(dict_)
df['Last'] = 'NOK'
df.loc[df.groupby('Code').cumcount() == 0, 'Last'] = 'OK'
This gives us the expected output:
df
Code Time Rev Last
0 AAA 5 3 OK
1 AAA 3 2 NOK
2 AAA 6 1 NOK
3 BBB 10 2 OK
4 BBB 5 1 NOK
Or you can fetch the head of each group and set its value to 'OK'.
df.loc[df.groupby('Code').head(1).index, 'Last'] = 'OK'
which gives us the same thing
df
Code Time Rev Last
0 AAA 5 3 OK
1 AAA 3 2 NOK
2 AAA 6 1 NOK
3 BBB 10 2 OK
4 BBB 5 1 NOK
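Not from the answers above, but since the frame is already sorted, a minimal sketch using Series.duplicated would also work: every occurrence of a 'Code' after the first is a duplicate, so only the first row per code gets 'OK'.

import numpy as np

# assumes df is sorted by 'Code' and descending 'Rev', as in the question
df['Last'] = np.where(df['Code'].duplicated(), 'NOK', 'OK')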
I have one column called "A" with only values 0 or 1. I have another column called "B". If the value in column A is 0, I want the value in column B to be "Cat"; if it is 1, I want it to be "Dog".
Sample DataFrame column:
print(df)
A
0 0
1 1
Is there anyway to fill the B column as such without a for loop?
Desired:
print(df)
A B
0 0 Cat
1 1 Dog
Thanks
You can simply try the below using map:
Sample Data
print(df)
A
0 0
1 1
2 0
3 1
4 1
5 1
6 0
7 0
8 1
Result:
df['B'] = df['A'].map({0:'Cat', 1:'Dog'})
print(df)
A B
0 0 Cat
1 1 Dog
2 0 Cat
3 1 Dog
4 1 Dog
5 1 Dog
6 0 Cat
7 0 Cat
8 1 Dog
Next time, please post your research and minimal reproducible code. See comments
import pandas as pd

d = {'A': [0, 1, 0]}
df = pd.DataFrame(data=d)
m = {0: "Cat", 1: "Dog"}
df['B'] = df['A'].map(m)
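For a strictly binary column, an alternative sketch (my addition, not part of the original answers) is numpy.where; unlike map, it never produces NaN, though it treats every non-zero value as 1.

import numpy as np

# pick 'Cat' where A equals 0, otherwise 'Dog'
df['B'] = np.where(df['A'].eq(0), 'Cat', 'Dog')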
I have a dataframe that looks like below
Group  ID
1      AAA
1      BBB
1      CCC
2      AAA
2      DDD
2      CCC
3      AAA
3      GGG
3      TTT
Here I want to find the number of IDs that are present in group 1, in groups 1 and 2, and in groups 1, 2 and 3.
I want the final table to look like below
Group  Count
1      3
2      2
3      1
This is just an example table, but I have 10 groups and millions of rows of data like this, so I need an efficient way to calculate the same.
Try with crosstab, then cumsum:
pd.crosstab(df.Group, df.ID).cumsum().eq([1, 2, 3], axis=0).sum(1).reset_index(name='count')
Out[70]:
Group count
0 1 3
1 2 2
2 3 1
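Since the question mentions 10 groups, the hard-coded [1, 2, 3] would need to grow with the number of groups; a sketch of the same idea generalized to any number of groups (my assumption, not part of the original answer):

import pandas as pd

# cumulative count of each ID down the ordered groups
ct = pd.crosstab(df.Group, df.ID).cumsum()
# row k equals k+1 only for IDs present in every group up to that row
out = ct.eq(list(range(1, len(ct) + 1)), axis=0).sum(1).reset_index(name='count')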
I have a dataframe with hundreds of columns and thousands of rows, but the basic structure is:
Index 0 1 2
0 AAA NaN AAA
1 NaN BBB NaN
2 NaN NaN CCC
3 DDD DDD DDD
I would like to add two new columns: one would be an id equal to the first non-NaN value in each row, and the second would be a count of the values in each row. It would look like this. To be clear, all non-NaN values in a row are always the same.
Index id count 0 1 2
0 AAA 2 AAA NaN AAA
1 BBB 1 NaN BBB NaN
2 CCC 1 NaN NaN CCC
3 DDD 3 DDD DDD DDD
Any help in figuring out a way to do this would be greatly appreciated. Thanks
This should work:
# back-fill along each row so column 0 holds the first non-NaN value of the row
df['id'] = df.bfill(axis=1).iloc[:, 0].fillna('All NANs')
# count the non-null values per row, excluding the new 'id' column
df['count'] = df.drop(columns=["id"]).notnull().sum(axis=1)
To move the two new columns to the front:
df = df[list(df.columns[-2:]) + list(df.columns[:-2])]
Create the DataFrame:
import numpy as np
import pandas as pd

test_df = pd.DataFrame([['AAA', np.nan, 'AAA'], [np.nan, 'BBB', np.nan], [np.nan, np.nan, 'CCC'], ['DDD', 'DDD', 'DDD']])
Count the non-NaN elements in each row as count
test_df['count'] = test_df.notna().sum(axis=1)
Option-1: Select the first element in the row as id (regardless of NaN value)
test_df['id'] = test_df[0]
Option-2: Select the first non-NaN element as id for each row
test_df['id'] = test_df.apply(lambda x: x[x.first_valid_index()], axis=1)
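One caveat worth adding (my note, not in the original answer): first_valid_index() returns None for an all-NaN row, and x[None] then raises a KeyError. A guarded variant:

# fall back to NaN when a row contains no valid value at all
test_df['id'] = test_df.apply(
    lambda x: x[x.first_valid_index()] if x.first_valid_index() is not None else np.nan,
    axis=1)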
I have an existing pandas DataFrame that I want to manipulate according to the following pattern:
The existing table has different sets of codes in column 'code'. Each 'code' has certain labels listed in column 'label', and each label has been tagged with either 0 or 1 in column 'tag'.
I need to add a 'new_column' with values 0 or 1 for each set of 'code', based on the following condition: fill 1 in 'new_column' only when all the labels of a particular 'code' have value 1 in the 'tag' column. Note that I need to fill 1 for all the rows belonging to that particular 'code'.
As shown in the desired table, only code=30 has all of its 'tag' values equal to 1, so I set 'new_column' to 1 for that code; the rest of the codes are set to 0.
Existing Table:
code label tag
0 10 AAA 0
1 10 BBB 1
2 10 CCC 0
3 10 DDD 0
4 10 EEE 0
5 20 AAA 1
6 20 CCC 0
7 20 DDD 1
8 30 BBB 1
9 30 CCC 1
10 30 EEE 1
Desired Table
code label tag new_column
0 10 AAA 0 0
1 10 BBB 1 0
2 10 CCC 0 0
3 10 DDD 0 0
4 10 EEE 0 0
5 20 AAA 1 0
6 20 CCC 0 0
7 20 DDD 1 0
8 30 BBB 1 1
9 30 CCC 1 1
10 30 EEE 1 1
I have not tried any solution yet as it seems beyond my present level of expertise.
I think the right answer to this question is the one given by @user3483203 in the comments:
df['new_column'] = df.groupby('code')['tag'].transform(all).astype(int)
The transform method applies to the dataframe whatever is passed to it, keeping the axis length the same.
The simple example in the documentation clearly explains the usage.
Coming to this particular question, the following happens when you run this snippet:
You first perform the grouping with respect to the 'code'. You end up with a DataFrameGroupBy object.
Next, from this you choose the tag column, ending up with a SeriesGroupBy object.
To this grouping, you apply the all function via transform, ultimately typecasting the boolean values to type int.
Basically, you can understand it like this (the values are binary to make them more related to your answer):
>>> int(all([1, 1, 1, 1]))
1
>>> int(all([1, 0, 1, 1]))
0
Finally, you assign the column you just created to new_column on the original dataframe.
The initial answer by user3483203 works; here is a variation, though his way was more concise. See the sketch below.
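The variation itself was not included in the original, so here is a plausible sketch (my assumption): since 'tag' is binary, the group-wise minimum is 1 exactly when every tag in the group is 1.

# min of a 0/1 column within each group is 1 only if all values are 1
df['new_column'] = df.groupby('code')['tag'].transform('min')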
I'm trying to replace a row in a dataframe with the row of another dataframe only if they share a common column.
Here is the first dataframe:
index no foo
0 0 1
1 1 2
2 2 3
3 3 4
4 4 5
5 5 6
and the second dataframe:
index no foo
0 2 aaa
1 3 bbb
2 22 3
3 33 4
4 44 5
5 55 6
I'd like my result to be
index no foo
0 0 1
1 1 2
2 2 aaa
3 3 bbb
4 4 5
5 5 6
The result of the inner merge between both dataframes returns the correct rows, but I'm having trouble inserting them at the correct index in the first dataframe.
Any help would be greatly appreciated.
Thank you.
This should work as well:
# after the left merge, 'foo_y' holds df2's value where 'no' matched, NaN otherwise;
# r['foo_y'] == r['foo_y'] is False only for NaN, so df1's 'foo_x' is kept in that case
df1['foo'] = pd.merge(df1, df2, on='no', how='left').apply(lambda r: r['foo_y'] if r['foo_y'] == r['foo_y'] else r['foo_x'], axis=1)
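A vectorized sketch of the same idea (assuming 'no' is unique in df2, as it is in the sample data):

# look up df2's 'foo' by 'no'; where there is no match, keep df1's original value
df1['foo'] = df1['no'].map(df2.set_index('no')['foo']).fillna(df1['foo'])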
You could use apply; there is probably a better way than this:
In [67]:
# define a function that takes a row of df1 and tries to find a match in df2
def func(x):
    # select the matching 'no' rows in df2 and test the length of the result
    match = df2.loc[df2.no == x.no, 'foo']
    if len(match) > 0:
        return match.values[0]  # return the first matched value
    else:
        return x.foo  # no match, so keep the existing value

# call apply row-wise (axis=1 means row-wise)
df1.foo = df1.apply(lambda row: func(row), axis=1)
df1
Out[67]:
index no foo
0 0 0 1
1 1 1 2
2 2 2 aaa
3 3 3 bbb
4 4 4 5
5 5 5 6
[6 rows x 3 columns]