Can I search for values to match results from another dataframe? - python

What I want to do is line up values from two dataframes, but they differ in shape and size.
Say I want to extract column D from one of the dataframes and append it to the other.
DataFrame1:
A B C D
1 1 0 2
1 4 0 1
1 0 2 4
2 2 3 0
2 1 0 1
Dataframe2
A B C D
1 1 0 54
1 4 0 10
1 0 2 54
2 2 3 55
2 1 0 34
outcome I'm looking for:
A B C D newD
1 1 0 2 54
1 4 0 1 10
1 0 2 4 54
2 2 3 0 55
2 1 0 1 34
I tried this
DataFrame1['newD'] = DataFrame2.loc[DataFrame1[['A', 'B', 'C']] == DataFrame2['A', 'B', 'C']]['D']
but I got a key error: KeyError: ('A', 'B', 'C')
Is there an easy way to get this result?
Bonus question: is it possible to use multiple criteria in the search (e.g. D not null or something similar)?

Isn't it merge:
pd.merge(df1,df2, on=['A','B','C'], how='left')
Output:
A B C D_x D_y
0 1 1 0 2 54
1 1 4 0 1 10
2 1 0 2 4 54
3 2 2 3 0 55
4 2 1 0 1 34
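If the goal is the exact layout from the question (a single extra column called newD rather than D_x and D_y), one option is to rename D in the second frame before merging. This is only a sketch: the frames are rebuilt from the question's sample data, and the name lookup is made up for illustration. The boolean mask in the middle shows one way to handle the bonus question, filtering the lookup rows on any condition before the merge.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": [1, 4, 0, 2, 1],
                    "C": [0, 0, 2, 3, 0], "D": [2, 1, 4, 0, 1]})
df2 = pd.DataFrame({"A": [1, 1, 1, 2, 2], "B": [1, 4, 0, 2, 1],
                    "C": [0, 0, 2, 3, 0], "D": [54, 10, 54, 55, 34]})

# Rename D up front so the merge does not produce D_x / D_y suffixes.
lookup = df2[["A", "B", "C", "D"]].rename(columns={"D": "newD"})

# Extra criteria (bonus question): keep only lookup rows where newD is not null.
lookup = lookup[lookup["newD"].notna()]

# A left merge keeps every row of df1, in df1's order.
result = df1.merge(lookup, on=["A", "B", "C"], how="left")
print(result)
```

Rows of df1 with no match in the lookup would get NaN in newD, which is the usual behaviour of a left merge.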

Related

All possible permutations within groups of column pandas

I have a df
a b c d
1 0 1 2 4
2 0 1 3 5
3 0 2 1 7
4 1 3 2 5
Within groups, grouped by 'a' and 'b' I want all possible permutations of 'c'
a b c d
1 0 1 2 4
0 1 3 5
0 2 1 7
2 0 1 3 5
0 1 2 4
0 2 1 7
3 1 3 2 5
...
...
I tried:
s=pd.Series({x: list(it.permutations(y) )for x , y in df.groupby(['a','b']).c})
0 1 [(3,2),(2,3)]
2 [(1,)]
1 3 [(2,)]
explode() alone does not do what I need, since I need all combinations of the groups' permutations.
For example, in this case there are 2 different ways to order rows 1 and 2. If another group also had 2 different permutations, there would be 2*2=4 ways.
Does anybody have an idea?
Fix your code with groupby and explode
s=pd.Series({x: list(itertools.permutations(y) )for x , y in df.groupby('a').b}).explode().explode().reset_index()
index 0
0 0 1
1 0 2
2 0 3
3 0 1
4 0 3
5 0 2
6 0 2
7 0 1
8 0 3
9 0 2
10 0 3
11 0 1
12 0 3
13 0 1
14 0 2
15 0 3
16 0 2
17 0 1
18 1 1
19 1 2
20 1 2
21 1 1
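For the "all combinations of groups" part of the question, one possible approach is to permute the row labels inside each (a, b) group and then take the Cartesian product across groups with itertools.product. This is a sketch under assumptions, not the answerer's method: the frame is rebuilt from the question's sample, and the names per_group and variants are made up for illustration.

```python
import itertools
import pandas as pd

df = pd.DataFrame(
    {"a": [0, 0, 0, 1], "b": [1, 1, 2, 3], "c": [2, 3, 1, 2], "d": [4, 5, 7, 5]},
    index=[1, 2, 3, 4],
)

# For every (a, b) group, list all orderings of that group's row labels.
per_group = [
    list(itertools.permutations(group.index))
    for _, group in df.groupby(["a", "b"])
]

# Picking one ordering from each group gives one complete row ordering;
# itertools.product enumerates every such pick.
variants = [
    df.loc[[label for perm in combo for label in perm]]
    for combo in itertools.product(*per_group)
]

for v in variants:
    print(v, end="\n\n")
```

With the sample data only the first group has more than one row, so there are 2 variants. The count grows as the product of the factorials of the group sizes, so this is only practical for small groups.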

Concatenate groups of multiple dataframes

I have a df1:
a b c
1 0 1 4
2 0 2 5
3 1 1 3
and a second df2:
a b c
1 0 1 5
2 0 2 5
3 1 1 4
These dataframes have the same groups in a and b. Within each group of 'a' and 'b' I want the rows of df2 underneath those of df1:
a b c
1 0 1 4
2 0 1 5
3 0 2 5
4 0 2 5
5 1 1 3
6 1 1 4
How can I combine groupby() and concat() to get the desired output?
You can do concat then sort_values
df = pd.concat([df1, df2]).sort_values(['a','b']).reset_index(drop=True)
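If the order inside each group matters (df1's row first, then df2's), a variant that does not depend on how the sort treats ties is to tag each frame with a temporary marker column and sort on it as well. A minimal sketch, assuming the sample frames from the question; the column name src is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [0, 0, 1], "b": [1, 2, 1], "c": [4, 5, 3]})
df2 = pd.DataFrame({"a": [0, 0, 1], "b": [1, 2, 1], "c": [5, 5, 4]})

combined = (
    pd.concat([df1.assign(src=1), df2.assign(src=2)])  # tag the origin of each row
      .sort_values(["a", "b", "src"])                  # group keys first, then origin
      .drop(columns="src")                             # remove the temporary marker
      .reset_index(drop=True)
)
print(combined)
```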

Pandas DataFrame - count 0s in every row

I have a dataframe that looks like this
x = pd.DataFrame.from_dict({'A':[1,2,0,4,0,6], 'B':[0, 0, 0, 44, 48, 81], 'C':[1,0,1,0,1,0]})
(assume it might have other columns).
I want to add a column, which specifies for each row, how many 0s there are in the specific columns A,B,C.
A B C num_zeros
0 1 0 1 1
1 2 0 0 2
2 0 0 1 2
3 4 44 0 1
4 0 48 1 1
5 6 81 0 1
Create a boolean dtype dataframe using ==, then use sum with axis=1:
x['num_zeros'] = (x == 0).sum(1)
Output:
A B C num_zeros
0 1 0 1 1
1 2 0 0 2
2 0 0 1 2
3 4 44 0 1
4 0 48 1 1
5 6 81 0 1
Now, if you want to explicitly define which columns to count, e.g. only count zeros in columns B and C, then you can use this:
x['Num_zeros_in_BC'] = (x == 0)[['B','C']].sum(1)
Output:
A B C num_zeros Num_zeros_in_BC
0 1 0 1 1 1
1 2 0 0 2 2
2 0 0 1 2 1
3 4 44 0 1 1
4 0 48 1 1 0
5 6 81 0 1 1
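Since the question says the frame may contain other columns, it can be safer to select the columns of interest before comparing, so that unrelated columns never enter the count. A small sketch, assuming the question's data plus a hypothetical extra column named other:

```python
import pandas as pd

x = pd.DataFrame({
    "A": [1, 2, 0, 4, 0, 6],
    "B": [0, 0, 0, 44, 48, 81],
    "C": [1, 0, 1, 0, 1, 0],
    "other": [0, 0, 0, 0, 0, 0],  # hypothetical extra column, must not be counted
})

cols = ["A", "B", "C"]
# Compare only the chosen columns to 0, then count the True values per row.
x["num_zeros"] = x[cols].eq(0).sum(axis=1)
print(x)
```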

Pad dataframe discontinuous column

I have the following dataframe:
Name B C D E
1 A 1 2 2 7
2 A 7 1 1 7
3 B 1 1 3 4
4 B 2 1 3 4
5 B 3 1 3 4
What I'm trying to do is obtain a new dataframe in which, for rows with the same "Name", the values in the "B" column are continuous. In this example, the rows with "Name" = A would have to be padded so that B ranges from 1 to 7, and the values of columns C, D and E in the added rows should be 0.
Name B C D E
1 A 1 2 2 7
2 A 2 0 0 0
3 A 3 0 0 0
4 A 4 0 0 0
5 A 5 0 0 0
6 A 6 0 0 0
7 A 7 0 0 0
8 B 1 1 3 4
9 B 2 1 5 4
10 B 3 4 3 6
What I've done so far is to turn the B column values for the same "Name" into continuous values:
new_idx = df_.groupby('Name').apply(lambda x: np.arange(x.index.min(), x.index.max() + 1)).apply(pd.Series).stack()
and reindexing the original (having set B as the index) df using this new Series, but I'm having trouble reindexing using duplicates. Any help would be appreciated.
You can use:
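import numpy as np

def f(x):
    # Full range of B values (B is the index here) for this group
    a = np.arange(x.index.min(), x.index.max() + 1)
    # Reindex on that range; rows that did not exist are filled with 0
    x = x.reindex(a, fill_value=0)
    return x

# 'Name' may or may not survive apply as a column, depending on the pandas
# version, so it is dropped with errors='ignore'
new_idx = (df.set_index('B')
             .groupby('Name')
             .apply(f)
             .drop(columns='Name', errors='ignore')
             .reset_index()
             .reindex(columns=df.columns))
print(new_idx)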
Name B C D E
0 A 1 2 2 7
1 A 2 0 0 0
2 A 3 0 0 0
3 A 4 0 0 0
4 A 5 0 0 0
5 A 6 0 0 0
6 A 7 1 1 7
7 B 1 1 3 4
8 B 2 1 3 4
9 B 3 1 3 4
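An alternative that avoids groupby.apply is to build the complete (Name, B) index explicitly and reindex once. This is only a sketch: it assumes each (Name, B) pair occurs at most once, rebuilds the frame from the question's sample, and uses the made-up names bounds, full_index and padded.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "A", "B", "B", "B"],
    "B": [1, 7, 1, 2, 3],
    "C": [2, 1, 1, 1, 1],
    "D": [2, 1, 3, 3, 3],
    "E": [7, 7, 4, 4, 4],
})

# Smallest and largest B per Name.
bounds = df.groupby("Name")["B"].agg(["min", "max"])

# Every (Name, B) pair that should exist after padding.
full_index = pd.MultiIndex.from_tuples(
    [(name, b) for name, lo, hi in bounds.itertuples() for b in range(lo, hi + 1)],
    names=["Name", "B"],
)

# Reindex on the full set of pairs; the added rows get 0 in C, D and E.
padded = (df.set_index(["Name", "B"])
            .reindex(full_index, fill_value=0)
            .reset_index())
print(padded)
```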

Create calculated column of sum values of other columns in pandas

I have a dataframe with about 60 columns and the following structure:
A B C Y
0 12 1 0 1
1 13 1 0 [....] 0
2 14 0 1 1
3 15 1 0 0
4 16 0 1 1
I want to create a new column Z which will be the sum of the values from columns B to Y.
How can I proceed?
To create a copy of the dataframe while including a new column, use assign
df.assign(Z=df.loc[:, 'B':'Y'].sum(1))
A B C Y Z
0 12 1 0 1 2
1 13 1 0 0 1
2 14 0 1 1 2
3 15 1 0 0 1
4 16 0 1 1 2
To assign it to the same dataframe, in place, use
df['Z'] = df.loc[:, 'B':'Y'].sum(1)
df
A B C Y Z
0 12 1 0 1 2
1 13 1 0 0 1
2 14 0 1 1 2
3 15 1 0 0 1
4 16 0 1 1 2
Try this
df['z']=df.iloc[:,1:].sum(1)
You could
In [2361]: df.assign(Z=df.loc[:, 'B':'Y'].sum(1))
Out[2361]:
A B C Y Z
0 12 1 0 1 2
1 13 1 0 0 1
2 14 0 1 1 2
3 15 1 0 0 1
4 16 0 1 1 2
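The label slice and the positional slice above only agree when Y is the last column. A small sketch of the difference, assuming the sample data from the question plus a hypothetical column named extra placed after Y:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [12, 13, 14, 15, 16],
    "B": [1, 1, 0, 1, 0],
    "C": [0, 0, 1, 0, 1],
    "Y": [1, 0, 1, 0, 1],
    "extra": [100] * 5,  # hypothetical column after Y
})

# Label-based slice: includes both end points, stops at Y.
by_label = df.loc[:, "B":"Y"].sum(axis=1)

# Position-based slice: everything after the first column, including "extra".
by_position = df.iloc[:, 1:].sum(axis=1)

print(by_label.tolist())
print(by_position.tolist())
```

So df.loc[:, 'B':'Y'] is the safer choice when more columns may be added to the right of Y later.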
