Mapping values from one DataFrame to another - python

I am trying to figure out a fast and clean way to map values from one DataFrame to another. Let's say I have a DataFrame A like this one:
   C1  C2  C3  C4  C5
1   a   b   c   a
2   d   a   e   b   a
3   a   c
4   b   e   e
Now I want to change those letter codes to actual values. My DataFrame B with the explanations looks like this:
Code Value
1 a 'House'
2 b 'Bike'
3 c 'Lamp'
4 d 'Window'
5 e 'Car'
So far my brute-force approach has been to go through every element in A and check it against the values in B with isin(). I know that I can also use a Series (or a simple dictionary) as B instead of a DataFrame and use, for example, the Code column as an index. But I would still need multiple loops to map everything.
Is there any other nice way to achieve my goal?

You could use replace:
A.replace(B.set_index('Code')['Value'])
import pandas as pd
A = pd.DataFrame(
{'C1': ['a', 'd', 'a', 'b'],
'C2': ['b', 'a', 'c', 'e'],
'C3': ['c', 'e', '', 'e'],
'C4': ['a', 'b', '', ''],
'C5': ['', 'a', '', '']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
print(A.replace(B.set_index('Code')['Value']))
yields
         C1       C2      C3       C4       C5
0   'House'   'Bike'  'Lamp'  'House'
1  'Window'  'House'   'Car'   'Bike'  'House'
2   'House'   'Lamp'
3    'Bike'    'Car'   'Car'

Another alternative is map. Although it requires looping over columns, if I didn't mess up the tests, it is still faster than replace:
import numpy as np
import pandas as pd
A = pd.DataFrame(np.random.choice(list("abcdef"), (1000, 1000)))
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
'Value': ["'House'", "'Bike'", "'Lamp'", "'Window'", "'Car'"]})
B = B.set_index("Code")["Value"]
%timeit A.replace(B)
1 loop, best of 3: 970 ms per loop
C = pd.DataFrame()
%%timeit
for col in A:
C[col] = A[col].map(B).fillna(A[col])
1 loop, best of 3: 586 ms per loop
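The column-wise map above can also be written without building C by hand. The following is a minimal sketch, assuming the small A and B from the question (with the quotes dropped from the values for readability); the name lookup is introduced here only for illustration, and no timing claim is made for this variant.

```python
import pandas as pd

# Sample data taken from the question; empty strings stand for missing cells.
A = pd.DataFrame(
    {'C1': ['a', 'd', 'a', 'b'],
     'C2': ['b', 'a', 'c', 'e'],
     'C3': ['c', 'e', '', 'e'],
     'C4': ['a', 'b', '', ''],
     'C5': ['', 'a', '', '']})
B = pd.DataFrame({'Code': ['a', 'b', 'c', 'd', 'e'],
                  'Value': ['House', 'Bike', 'Lamp', 'Window', 'Car']})

# Series indexed by code, so that map() can look values up by label.
lookup = B.set_index('Code')['Value']

# Map each column; codes missing from the lookup become NaN,
# and fillna(col) puts the original value back.
result = A.apply(lambda col: col.map(lookup).fillna(col))
print(result)
```

Because of the fillna(col) step, cells that have no entry in B (here the empty strings) are left unchanged, which mirrors the behaviour of replace.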

Related

Different ways to get unique get_level_values()

Consider the following DataFrame df:
df=
kind A B
names a1 a2 b1 b2 b3
Time
0.0 0.7804 0.5294 0.1895 0.9195 0.0508
0.1 0.1703 0.7095 0.8704 0.8566 0.5513
0.2 0.8147 0.9055 0.0506 0.4212 0.2464
0.3 0.3985 0.4515 0.7118 0.6146 0.2682
0.4 0.2505 0.2752 0.4097 0.3347 0.1296
When I issue the command levs = df.columns.get_level_values("kind"), I get that levs is equal to
Index(['A', 'A', 'A', 'B', 'B'], dtype='object', name='kind')
whereas I would like to have that levs is equal to Index(['A', 'B'], dtype='object', name='kind').
One way to achieve such an objective could be to run levs=list(set(levs)), but I am wondering if there are any other simple methods.
I think you can use levels:
out = df.columns.levels[0]
print (out)
Index(['A', 'B'], dtype='object')
EDIT: One idea with lookup by names of MultiIndex:
d = {v: k for k, v in enumerate(df.columns.names)}
print (d)
{'kind': 0, 'names': 1}
out = df.columns.levels[d['kind']]
print (out)
Index(['A', 'B'], dtype='object', name='kind')
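Two other options are to de-duplicate the result of get_level_values, or to ask the index for the unique values of a level directly. A minimal sketch, assuming a small stand-in frame with the same column layout (the zeros and the sub-frame sub are made up for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in for the question's df: two column levels named "kind" and "names".
cols = pd.MultiIndex.from_tuples(
    [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2"), ("B", "b3")],
    names=["kind", "names"],
)
df = pd.DataFrame(np.zeros((2, 5)), columns=cols)

# Option 1: take the level values and drop the duplicates.
kinds = df.columns.get_level_values("kind").unique()

# Option 2: ask the MultiIndex for the unique values of one level.
kinds_alt = df.columns.unique(level="kind")

# Both options only report values that are actually present,
# for example after selecting a subset of the columns.
sub = df[["A"]]
sub_kinds = sub.columns.unique(level="kind")

print(kinds)
print(kinds_alt)
print(sub_kinds)
```

Unlike list(set(levs)), both variants keep the order of first appearance and return an Index that still carries the level name.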

For loop in Panda Dataframe with multiple conditions and different data types [duplicate]

I need help with this code:
d={'Name': ['Mark', 'Lala', "Nina", 'Catherine', 'Izzy', 'Ozno', 'Kim'],
'Level' : ['A', 'B', 'C', 'D', 'E', 'D', 'D'],
'Seats' : [3000, 5000, 4000, 1000, 1000, 2600, 2400]}
df = pd.DataFrame(data = d)
I want to add a new column called "Level_corrected". It is a copy of df['Level'], but where df['Level'] == 'D' and df['Seats'] < 2500, the 'D' value in df['Level_corrected'] should become 'D-'.
The desired result is:
d={'Name': ['Mark', 'Lala', "Nina", 'Catherine', 'Izzy', 'Ozno', 'Kim'],
'Level' : ['A', 'B', 'C', 'D', 'E', 'D', 'D'],
'Seats' : [3000, 5000, 4000, 1000, 1000, 2600, 2400],
'Level_corrected': ['A', 'B', 'C', 'D-', 'E', 'D', 'D-']}
df = pd.DataFrame(data = d)
I've made several attempts (I didn't save the code ...), but the error seems to come from the different data types. The Level column is an 'object' and the Seats column is a float64.
Could someone please help me?
Many thanks!
Use Series.mask, chaining the two conditions with & (bitwise AND). Series.eq tests for equality and Series.lt for "less than":
df['Level_corrected'] = df['Level'].mask(df['Level'].eq('D') & df['Seats'].lt(2500), 'D-')
print (df)
Name Level Seats Level_corrected
0 Mark A 3000 A
1 Lala B 5000 B
2 Nina C 4000 C
3 Catherine D 1000 D-
4 Izzy E 1000 E
5 Ozno D 2600 D
6 Kim D 2400 D-
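The same result can be reached with a boolean mask and .loc, which some readers find easier to follow. A minimal sketch using the data from the question; the variable name needs_fix is introduced here for illustration:

```python
import pandas as pd

d = {'Name': ['Mark', 'Lala', 'Nina', 'Catherine', 'Izzy', 'Ozno', 'Kim'],
     'Level': ['A', 'B', 'C', 'D', 'E', 'D', 'D'],
     'Seats': [3000, 5000, 4000, 1000, 1000, 2600, 2400]}
df = pd.DataFrame(data=d)

# Start from a copy of the original column.
df['Level_corrected'] = df['Level']

# Each comparison yields a boolean Series; the parentheses are needed
# because & binds more tightly than == and <.
needs_fix = (df['Level'] == 'D') & (df['Seats'] < 2500)

# Overwrite only the rows where both conditions hold.
df.loc[needs_fix, 'Level_corrected'] = 'D-'
print(df)
```

The differing dtypes are not a problem here: each condition is evaluated on its own column, and only the resulting boolean Series are combined.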

how to extract index (multiple-level) for dataframe

mydf = pd.DataFrame({'dts':['1/1/2000','1/1/2000','1/1/2000','1/2/2000', '1/3/2000', '1/3/2000'],
'product':['A', 'B', 'A','A', 'A','B'],
'value':[1,2,2,3,6,1]})
a = mydf.groupby(['dts','product']).sum()
So a has a multi-level index now:
a
Out[1]:
                  value
dts      product
1/1/2000 A            3
         B            2
1/2/2000 A            3
1/3/2000 A            6
         B            1
How do I extract the product-level index in a? a.index['product'] does not work.
Use get_level_values:
>>> a.index.get_level_values(1)
Index(['A', 'B', 'A', 'A', 'B'], dtype='object', name='product')
You can also use the name of the level:
>>> a.index.get_level_values('product')
Index(['A', 'B', 'A', 'A', 'B'], dtype='object', name='product')
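If the level is needed as an ordinary column rather than as an Index, reset_index is another option. A minimal sketch with the data from the question; the names flat and products are introduced here for illustration:

```python
import pandas as pd

mydf = pd.DataFrame({'dts': ['1/1/2000', '1/1/2000', '1/1/2000',
                             '1/2/2000', '1/3/2000', '1/3/2000'],
                     'product': ['A', 'B', 'A', 'A', 'A', 'B'],
                     'value': [1, 2, 2, 3, 6, 1]})
a = mydf.groupby(['dts', 'product']).sum()

# Move both index levels back into regular columns.
flat = a.reset_index()

# The former index level is now a plain Series.
products = flat['product']
print(products)
```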

Sort or groupby dataframe in python using given string

I have the following dataframe:
Id Direction Load Unit
1 CN05059815 LoadFWD 0,0 NaN
2 CN05059815 LoadBWD 0,0 NaN
4 ....
....
and the following list:
list =['CN05059830','CN05059946','CN05060010','CN05060064' ...]
I would like to sort or group the data according to the elements of the list.
For example, the new data should follow exactly the same order as the list. It would start with CN05059815, which doesn't belong to the list, followed by CN05059830, CN05059946, ..., which do belong to the list. The other columns should stay with their rows.
One way is to use Categorical Data. Here's a minimal example:
# sample dataframe
df = pd.DataFrame({'col': ['A', 'B', 'C', 'D', 'E', 'F']})
# required ordering
lst = ['D', 'E', 'A', 'B']
# convert to categorical
df['col'] = df['col'].astype('category')
# set order, adding values not in lst to the front
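# (a list comprehension keeps the leftover values in a deterministic order, unlike a set)
order = [c for c in df['col'].cat.categories if c not in lst] + lst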
# attach ordering information to categorical series
df['col'] = df['col'].cat.reorder_categories(order)
# apply ordering
df = df.sort_values('col')
print(df)
col
2 C
5 F
3 D
4 E
0 A
1 B
Consider the approach and example below:
df = pd.DataFrame({
'col': ['a', 'b', 'c', 'd', 'e']
})
list_ = ['d', 'b', 'a']
print(df)
Output:
col
0 a
1 b
2 c
3 d
4 e
Then, to sort df according to the list and its ordering (values not in the list come first):
df.reindex(df.assign(dummy=df['col'])['dummy'].apply(lambda x: list_.index(x) if x in list_ else -1).sort_values().index)
Output:
col
2 c
4 e
3 d
1 b
0 a
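On pandas 1.1 or later, the same ordering can be expressed with the key argument of sort_values. A minimal sketch on the same sample data; the dictionary rank and the name sorted_df are introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col': ['a', 'b', 'c', 'd', 'e']})
list_ = ['d', 'b', 'a']

# Position of each value in the list; a dict lookup avoids
# calling list_.index() once per row.
rank = {value: pos for pos, value in enumerate(list_)}

# Values missing from the list get rank -1 so they sort to the front.
# mergesort is stable, so ties keep their original relative order.
sorted_df = df.sort_values(
    'col',
    key=lambda s: s.map(rank).fillna(-1),
    kind='mergesort',
)
print(sorted_df)
```

To send the values that are not in the list to the end instead, replace -1 with len(list_).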

Python Pandas lookup and replace df1 value from df2

I have two dataframes, df and df2.
df column FOUR matches with df2 column LOOKUP COL
I need to match df column FOUR with df2 column LOOKUP COL and replace df column FOUR with the corresponding values from df2 column RETURN THIS
The resulting dataframe could overwrite df but I have it listed as result below.
NOTE: THE INDEX DOES NOT MATCH ON EACH OF THE DATAFRAMES
df = pd.DataFrame([['a', 'b', 'c', 'd'],
['e', 'f', 'g', 'h'],
['j', 'k', 'l', 'm'],
['x', 'y', 'z', 'w']])
df.columns = ['ONE', 'TWO', 'THREE', 'FOUR']
ONE TWO THREE FOUR
0 a b c d
1 e f g h
2 j k l m
3 x y z w
df2 = pd.DataFrame([['a', 'b', 'd', '1'],
['e', 'f', 'h', '2'],
['j', 'k', 'm', '3'],
['x', 'y', 'w', '4']])
df2.columns = ['X1', 'Y2', 'LOOKUP COL', 'RETURN THIS']
X1 Y2 LOOKUP COL RETURN THIS
0 a b d 1
1 e f h 2
2 j k m 3
3 x y w 4
RESULTING DF
ONE TWO THREE FOUR
0 a b c 1
1 e f g 2
2 j k l 3
3 x y z 4
You can use Series.map. You'll need to create a dictionary or a Series to use in map. A Series makes more sense here but the index should be LOOKUP COL:
df['FOUR'] = df['FOUR'].map(df2.set_index('LOOKUP COL')['RETURN THIS'])
df
Out:
ONE TWO THREE FOUR
0 a b c 1
1 e f g 2
2 j k l 3
3 x y z 4
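One thing to keep in mind with map is that values without a match become NaN. A minimal sketch of keeping the original value in that case; the third row of df, whose FOUR value 's' has no counterpart in df2, is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame([['a', 'b', 'c', 'd'],
                   ['e', 'f', 'g', 'h'],
                   ['p', 'q', 'r', 's']],   # 's' is a hypothetical unmatched value
                  columns=['ONE', 'TWO', 'THREE', 'FOUR'])
df2 = pd.DataFrame([['a', 'b', 'd', '1'],
                    ['e', 'f', 'h', '2'],
                    ['j', 'k', 'm', '3'],
                    ['x', 'y', 'w', '4']],
                   columns=['X1', 'Y2', 'LOOKUP COL', 'RETURN THIS'])

# Lookup Series: index is the key column, values are the replacements.
lookup = df2.set_index('LOOKUP COL')['RETURN THIS']

# map() returns NaN where no key matches.
mapped = df['FOUR'].map(lookup)

# Fall back to the original value for unmatched rows.
df['FOUR'] = mapped.fillna(df['FOUR'])
print(df)
```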
df['FOUR'] = [df2.loc[df2['LOOKUP COL'] == i, 'RETURN THIS'].iloc[0] for i in df['FOUR']]
Something like this should be sufficient to do the trick, although there's probably a more pandas-native way to do it.
Basically, it is a list comprehension: as we iterate over each i in df['FOUR'], we look up the row of df2 whose LOOKUP COL equals i and take its RETURN THIS value, building a new list of replacement values.
