Map dictionary values in Pandas - python

I have a data frame (df) with the following:
var1
a 1
a 1
b 2
b 3
c 3
d 5
And a dictionary:
dict_cat = {
    'x': ['a', 'b', 'c'],
    'y': 'd'}
And I want to create a new column called cat that, depending on the var1 value, takes the corresponding dict key:
var1 cat
a 1 x
a 1 x
b 2 x
b 3 x
c 3 x
d 5 y
I have tried to map the dict to the variable using df['cat'] = df['var1'].map(dict_cat), but since the values are inside a list, Python does not recognize them and I only get NaN values. Is there a way to do this using map, or should I create a function that iterates over the rows checking whether var1 is in the dictionary lists?
Thanks!

You need to swap keys with values into a new dict and then use map:
print (df)
var1 var2
0 a 1
1 a 1
2 b 2
3 b 3
4 c 3
5 d 5
dict_cat = {'x' : ['a', 'b', 'c'],'y' : 'd' }
#swap keys and values into a flat lookup dict
d = {k: oldk for oldk, oldv in dict_cat.items() for k in oldv}
print (d)
{'a': 'x', 'b': 'x', 'c': 'x', 'd': 'y'}
df['cat'] = df['var1'].map(d)
print (df)
var1 var2 cat
0 a 1 x
1 a 1 x
2 b 2 x
3 b 3 x
4 c 3 x
5 d 5 y
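Note that the comprehension iterates over each value, so the scalar 'd' only works because it is a single character. As an aside, a more defensive sketch wraps non-list values first, so a multi-character label would not be split into individual characters:
d = {k: group
     for group, members in dict_cat.items()
     for k in (members if isinstance(members, list) else [members])}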
If the first column is the index, it is possible to use rename, or to convert the index with to_series and then use map:
print (df)
var1
a 1
a 1
b 2
b 3
c 3
d 5
dict_cat = {'x' : ['a', 'b', 'c'],'y' : 'd' }
d = {k: oldk for oldk, oldv in dict_cat.items() for k in oldv}
df['cat'] = df.rename(d).index
Or:
df['cat'] = df.index.to_series().map(d)
print (df)
var1 cat
a 1 x
a 1 x
b 2 x
b 3 x
c 3 x
d 5 y
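One caveat: any label missing from the flattened dict d maps to NaN, so a fillna fallback may be useful ('other' below is just a placeholder category):
df['cat'] = df.index.to_series().map(d).fillna('other')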

Related

how to transform a dict of lists to a dataframe in python?

I have a dict in python like this:
d = {"a": [1,2,3], "b": [4,5,6]}
I want to transform it into a dataframe like this:
letter number
a      1
a      2
a      3
b      4
b      5
b      6
I have tried this code:
df = pd.DataFrame.from_dict(d, orient='index').T
but this gave me:
   a  b
0  1  4
1  2  5
2  3  6
You can always read your data in as you already have and then .melt it:
When passed no id_vars or value_vars, melt turns each of your columns into their own rows.
import pandas as pd
d = {"a": [1,2,3], "b": [4,5,6]}
out = pd.DataFrame(d).melt(var_name='letter', value_name='value')
print(out)
letter value
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
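If you want the second column to be called number, as in the desired output, melt's value_name parameter covers that directly:
out = pd.DataFrame(d).melt(var_name='letter', value_name='number')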
To use 'letter' and 'number' as column labels you could use:
a2 = [[key, val] for key, x in d.items() for val in x]
dict2 = pd.DataFrame(a2, columns = ['letter', 'number'])
which gives
letter number
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
Yet another possible solution:
(pd.Series(d, index=d.keys(), name='numbers')
.rename_axis('letters').reset_index()
.explode('numbers', ignore_index=True))
Output:
letters numbers
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
5 b 6
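A slightly shorter sketch of the same idea (assuming pandas 0.25+ for Series.explode), letting the Series constructor take the index from the dict keys:
(pd.Series(d, name='number')
 .explode()
 .rename_axis('letter')
 .reset_index())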
This will yield what you want (there might be a simpler way though):
import pandas as pd
my_dict = {"a": [1,2,3], "b": [4,5,6]}
my_list = [[key, val] for key in my_dict for val in my_dict[key] ]
df = pd.DataFrame(my_list, columns=['letter','number'])
df
# Out[106]:
# letter number
# 0 a 1
# 1 a 2
# 2 a 3
# 3 b 4
# 4 b 5
# 5 b 6

How to transform dataframe to from-to pairs?

If I have a dataframe:
>>> import pandas as pd
>>> df = pd.DataFrame([
... ['A', 'B', 'C', 'D'],
... ['E', 'B', 'C']
... ])
>>> df
0 1 2 3
0 A B C D
1 E B C None
>>>
I should transform the dataframe to a two-column format:
x, y
-----
A, B
B, C
C, D
E, B
B, C
For each row, from left to right, take two neighboring values and make a pair of them.
It is a kind of from-to if you consider each row as a path.
How to do the transformation?
We can do explode with zip
s=pd.DataFrame(df.apply(lambda x : list(zip(x.dropna()[:-1],x.dropna()[1:])),axis=1).explode().tolist())
Out[336]:
0 1
0 A B
1 B C
2 C D
3 E B
4 B C
Update
s=df.apply(lambda x : list(zip(x.dropna()[:-1],x.dropna()[1:])),axis=1).explode()
s=pd.DataFrame(s.tolist(),index=s.index)
s
Out[340]:
0 1
0 A B
0 B C
0 C D
1 E B
1 B C
Pre-preparing the data could help too:
import pandas as pd
inp = [['A', 'B', 'C', 'D'],
['E', 'B', 'C']]
# Build the neighbor pairs for each row beforehand
inp2 = [[[i[k], i[k+1]] for k in range(len(i)-1)] for i in inp]
# Flatten the per-row pair lists into one list
inp2 = inp2[0] + inp2[1]
df = pd.DataFrame(inp2)
print(df)
Output:
0 1
0 A B
1 B C
2 C D
3 E B
4 B C
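For completeness, a plain-Python sketch of the same pairing idea, using zip over each row to build the from-to pairs before constructing the DataFrame:
import pandas as pd

rows = [['A', 'B', 'C', 'D'],
        ['E', 'B', 'C']]
# zip each row with itself shifted by one to get the neighbor pairs
pairs = [(a, b) for row in rows for a, b in zip(row, row[1:])]
out = pd.DataFrame(pairs, columns=['x', 'y'])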

Put keys of dictionary in column of DataFrame when the value of the key is in other column

This is the DataFrame:
d_vals ={'vals': [i for i in range(1, 6)]}
df = pd.DataFrame(d_vals)
df
vals
0 1
1 2
2 3
3 4
4 5
And this the dictionary:
d_groups = {
'a': [1, 2],
'b': [3, 5],
'c': [4]
}
The point is to get a new column groups containing the dictionary key whose list of values contains the value in the vals column.
The final DataFrame should be as follows:
vals groups
0 1 a
1 2 a
2 3 b
3 4 c
4 5 b
Use Series.map with the dictionary flattened into a new one:
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in d_groups.items() for k in oldv}
df['groups'] = df['vals'].map(d)
print (df)
vals groups
0 1 a
1 2 a
2 3 b
3 4 c
4 5 b
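An equivalent way to build that lookup, shown only as an alternative sketch (assumes pandas 0.25+ for Series.explode):
# explode the dict's lists, then invert the (key, value) pairs into a plain dict
mapping = {val: key for key, val in pd.Series(d_groups).explode().items()}
df['groups'] = df['vals'].map(mapping)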

Mapping dictionary onto dataframe when dictionary key is a list

I have a dictionary where the values are lists:
D = {1: ['a', 'b'], 2: ['c', 'd']}
I want to map the dictionary onto col1 of my dataframe.
col1
a
c
If the value of col1 is in one of my dictionary's value lists, then I want to replace the value of col1 with the corresponding dictionary key.
Like this, my dataframe will become:
col1
1
2
thanks in advance
I would first invert the dictionary:
mapping = {}
for key, values in D.items():
    for item in values:
        mapping[item] = key
and then
df['col1'] = df['col1'].map(mapping)
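A quick end-to-end usage sketch of that approach (the DataFrame construction here is only for illustration):
import pandas as pd

D = {1: ['a', 'b'], 2: ['c', 'd']}
df = pd.DataFrame({'col1': ['a', 'c']})

# invert the dict: each list element points back to its key
mapping = {item: key for key, values in D.items() for item in values}
df['col1'] = df['col1'].map(mapping)
# col1 is now 1, 2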
You can also try using stack + reset_index and set_index with map.
d = pd.DataFrame({1: ['a','b'], 2:['c', 'd']})
mapping = d.stack().reset_index().set_index(0)["level_1"]
s = pd.Series(['a', 'c'], name="col1")
s.map(mapping)
0 1
1 2
Name: col1, dtype: int64
Step by step demo
d.stack()
0 1 a
2 c
1 1 b
2 d
dtype: object
d.stack().reset_index()
level_0 level_1 0
0 0 1 a
1 0 2 c
2 1 1 b
3 1 2 d
d.stack().reset_index().set_index(0)
level_0 level_1
0
a 0 1
c 0 2
b 1 1
d 1 2
Finally, we select the level_1 column as our mapping to pass to the map function.
do you mean something like this???
D = {1 : ['a', 'b'], 2 : ['c', 'd']}
for key, value in D.items():
    for each in value:
        if each in D[key]:
            print(each, "is in D[%s]" % key)
Output:
a is in D[1]
b is in D[1]
c is in D[2]
d is in D[2]

replicate rows in pandas by specific column with the values from that column

What would be the most efficient way to solve this problem?
i_have = pd.DataFrame(data={
'id': ['A', 'B', 'C'],
'v' : [ 's,m,l', '1,2,3', 'k,g']
})
i_need = pd.DataFrame(data={
'id': ['A','A','A','B','B','B','C', 'C'],
'v' : ['s','m','l','1','2','3','k','g']
})
I thought about creating a new df and, while iterating over i_have, appending the records to the new df. But as the number of rows grows, it can take a while.
Use numpy.repeat with numpy.concatenate for flattening:
import numpy as np

#create lists by split
splitted = i_have['v'].str.split(',')
#get lengths of each list
lens = splitted.str.len()
df = pd.DataFrame({'id': np.repeat(i_have['id'], lens),
                   'v': np.concatenate(splitted)})
print (df)
id v
0 A s
0 A m
0 A l
1 B 1
1 B 2
1 B 3
2 C k
2 C g
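On newer pandas (0.25+), DataFrame.explode offers a shorter route to the same result, as an alternative sketch to the numpy approach:
df = i_have.assign(v=i_have['v'].str.split(',')).explode('v')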
Thanks to piRSquared for the solution for repeating multiple columns:
i_have = pd.DataFrame(data={
'id': ['A', 'B', 'C'],
'id1': ['A1', 'B1', 'C1'],
'v' : [ 's,m,l', '1,2,3', 'k,g']
})
print (i_have)
id id1 v
0 A A1 s,m,l
1 B B1 1,2,3
2 C C1 k,g
splitted = i_have['v'].str.split(',')
lens = splitted.str.len()
df = i_have.loc[i_have.index.repeat(lens)].assign(v=np.concatenate(splitted))
print (df)
id id1 v
0 A A1 s
0 A A1 m
0 A A1 l
1 B B1 1
1 B B1 2
1 B B1 3
2 C C1 k
2 C C1 g
If you have multiple columns, then first split the data by ',' with expand=True (thanks to piRSquared), then stack and ffill, i.e.:
i_have = pd.DataFrame(data={
'id': ['A', 'B', 'C'],
'v' : [ 's,m,l', '1,2,3', 'k,g'],
'w' : [ 's,8,l', '1,2,3', 'k,g'],
'x' : [ 's,0,l', '1,21,3', 'ks,g'],
'y' : [ 's,m,l', '11,2,3', 'ks,g'],
'z' : [ 's,4,l', '1,2,32', 'k,gs'],
})
i_want = i_have.apply(lambda x :x.str.split(',',expand=True).stack()).reset_index(level=1,drop=True).ffill()
If the values are not equal-sized, then:
i_want = i_have.apply(lambda x :x.str.split(',',expand=True).stack()).reset_index(level=1,drop=True)
i_want['id'] = i_want['id'].ffill()
Output i_want
id v w x y z
0 A s s s s s
1 A m 8 0 m 4
2 A l l l l l
3 B 1 1 1 11 1
4 B 2 2 21 2 2
5 B 3 3 3 3 32
6 C k k ks ks k
7 C g g g g gs
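On pandas 1.3+ (which added multi-column explode), the multi-column case can also be sketched without stack and ffill:
cols = ['v', 'w', 'x', 'y', 'z']
tmp = i_have.copy()
# split every value column into lists, then explode them together
tmp[cols] = tmp[cols].apply(lambda s: s.str.split(','))
i_want = tmp.explode(cols, ignore_index=True)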
Here's another way
In [1667]: (i_have.set_index('id').v.str.split(',').apply(pd.Series)
.stack().reset_index(name='v').drop('level_1', 1))
Out[1667]:
id v
0 A s
1 A m
2 A l
3 B 1
4 B 2
5 B 3
6 C k
7 C g
As pointed out in a comment:
In [1672]: (i_have.set_index('id').v.str.split(',', expand=True)
.stack().reset_index(name='v').drop('level_1', 1))
Out[1672]:
id v
0 A s
1 A m
2 A l
3 B 1
4 B 2
5 B 3
6 C k
7 C g
