How to turn string into dictionary with conditionals? - python

I have a dataframe (very large, millions of rows). Here is how it looks:
id value
a1 0:0,1:10,2:0,3:0,4:7
b4 0:5,1:0,2:0,3:0,4:1
c5 0:0,1:3,2:2,3:0,4:0
k2 0:0,1:2,2:0,3:4,4:0
I want to turn those strings into dictionaries, keeping only the key-value pairs in which neither the key nor the value is 0. So the desired result is:
id value
a1 {1:10, 4:7}
b4 {4:1}
c5 {1:3, 2:2}
k2 {1:2, 3:4}
How can I do that? When I try to use the dict() function, it raises KeyError: 0:
df["value"] = dict(df["value"])
So I have problems with turning it into a dictionary in the first place.
I also tried this:
df["value"] = json.loads(df["value"])
but it raises the same error.

This could do the trick, simply using list comprehensions:
import pandas as pd
dt = pd.DataFrame({"id":["a1", "b4", "c5", "k2"],
"value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})
def to_dict1(s):
    return [dict([map(int, y.split(":")) for y in x.split(",") if "0" not in y.split(":")]) for x in s]
dt["dict"] = to_dict1(dt["value"])
Another way to obtain the same result would be to use regular expressions (the lookahead (?!0\b) rejects a number that is exactly 0, and \d+ matches the digits of any other number):
import re
def to_dict2(s):
    return [dict([map(int, y) for y in re.findall(r"\b(?!0\b)(\d+):(?!0\b)(\d+)", x)]) for x in s]
In terms of performance, to_dict1 is almost 20% faster, according to my tests.
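A relative speed figure like this depends on the data and the machine, so it may be worth re-measuring locally. The following is a minimal sketch, not the answerer's benchmark: it runs on a plain list of strings rather than a DataFrame column, the names SAMPLE, PAIR, parse_split and parse_regex are made up for illustration, and the regex is only one possible way to skip pairs that contain a lone 0.

```python
import re
import timeit

# Hypothetical sample data in the same "key:value,key:value" layout as the question.
SAMPLE = ["0:0,1:10,2:0,3:0,4:7", "0:5,1:0,2:0,3:0,4:1",
          "0:0,1:3,2:2,3:0,4:0", "0:0,1:2,2:0,3:4,4:0"]

# Assumed pattern: two numbers around a colon, neither of them exactly "0".
PAIR = re.compile(r"\b(?!0\b)(\d+):(?!0\b)(\d+)")


def parse_split(strings):
    """Split-based parsing: drop every pair whose key or value is the string "0"."""
    return [
        {int(k): int(v)
         for k, v in (pair.split(":") for pair in s.split(","))
         if k != "0" and v != "0"}
        for s in strings
    ]


def parse_regex(strings):
    """Regex-based parsing with the assumed PAIR pattern."""
    return [{int(k): int(v) for k, v in PAIR.findall(s)} for s in strings]


# Both approaches should agree before their timings are compared.
assert parse_split(SAMPLE) == parse_regex(SAMPLE)

if __name__ == "__main__":
    data = SAMPLE * 10_000  # enlarge the input so the timing is less noisy
    for fn in (parse_split, parse_regex):
        seconds = timeit.timeit(lambda: fn(data), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

The equality check comes first on purpose: a timing comparison only means something if both functions return the same dictionaries.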

This code produces the result you want. I built a sample input like the one you provided and printed the result at the end.
import pandas as pd
df = pd.DataFrame(
    {
        'id': ['a1', 'b4', 'c5', 'k2'],
        'value': ['0:0,1:10,2:0,3:0,4:7', '0:5,1:0,2:0,3:0,4:1', '0:0,1:3,2:2,3:0,4:0', '0:0,1:2,2:0,3:4,4:0']
    }
)
value = []  # temporary list that collects the dicts without zero keys or values
for i, row in df.iterrows():
    pairs = row['value'].split(',')
    d = dict()
    for pair in pairs:
        k, v = pair.split(':')
        k = int(k)
        v = int(v)
        if (k != 0) and (v != 0):
            d[k] = v
    value.append(d)
df['value'] = pd.Series(value)
print(df)
# id value
#0 a1 {1: 10, 4: 7}
#1 b4 {4: 1}
#2 c5 {1: 3, 2: 2}
#3 k2 {1: 2, 3: 4}

def make_dict(row):
    """Build a dict from a list of "key:value" strings,
    e.g. ["0:0", "1:10", ...], skipping zero keys and zero values."""
    return {key: val for key, val
            in map(lambda x: map(int, x.split(":")), row)
            if key != 0 and val != 0}
df["value"] = df.value.str.split(",").apply(make_dict)

This is how I would do it:
def string_to_dict(s):
    d = {}
    pairs = s.split(',')  # get each key:value pair
    for pair in pairs:
        key, value = map(int, pair.split(':'))  # split key from value and convert both to int
        if key and value:  # skip the pairs with a zero key or a zero value
            d[key] = value
    return d
df['value'] = df['value'].apply(string_to_dict)

Use a dictionary comprehension to exclude the pairs whose key or value is equal to zero:
import pandas as pd
df = pd.DataFrame({"id":["a1", "b4", "c5", "k2"],
                   "value":["0:0,1:10,2:0,3:0,4:7","0:5,1:0,2:0,3:0,4:1","0:0,1:3,2:2,3:0,4:0","0:0,1:2,2:0,3:4,4:0"]})
# build one dict per row, keeping only the pairs whose key and value are both non-zero
df["value"] = [{int(k): int(v)
                for k, v in (x.split(':') for x in s.split(','))
                if int(k) != 0 and int(v) != 0}
               for s in df["value"]]
print(df)
output:
id value
0 a1 {1: 10, 4: 7}
1 b4 {4: 1}
2 c5 {1: 3, 2: 2}
3 k2 {1: 2, 3: 4}

Related

Look up from a dictionary to fill pandas DataFrame with a condition

I have a dictionary that's something like this.
d = {
"p1": ["$0.00", nan, "$25.00"],
"p2": ["$30.25", nan, "$12.25"],
"p3": ["$0.00"],
"p4": [nan, "$15.00"]
}
I'd like to look up values from the dictionary and assign them to a column in pandas DataFrame. The DataFrame is something like this:
df = pd.DataFrame({
'name': ["p1","p2","p3","p4"],
'val': [np.nan, np.nan, 4, np.nan]
})
Assignment:
def func(name):
    # look up from dict
    # apply condition
    return
df['val'] = df.apply(lambda x: func(x['name']), axis=1)
For the assignment, the condition is to find the largest value from the dictionary if the length of the value list is > 0, i.e. there is at least one element in the list. Note that the elements are stored as strings, so the function needs to convert them to float type.
Expected output:
First rework your dictionary to keep the "highest" value with a custom key. Then perform a simple map:
d2 = {k: max((e for e in v if pd.notna(e)),
             key=lambda x: float(x.strip('$')) if isinstance(x, str) else x)
      for k,v in d.items()}
# {'p1': '$25.00', 'p2': '$30.25', 'p3': '$0.00', 'p4': '$15.00'}
df['val'] = df['name'].map(d2)
output:
name val
0 p1 $25.00
1 p2 $30.25
2 p3 $0.00
3 p4 $15.00
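One caveat: the max call in this answer raises ValueError when a list is empty or holds only missing values. As a hedged sketch of the lookup function the question outlines, the version below returns None in that case and converts the price strings to float. largest_price is a hypothetical helper name, and missing values are assumed to be float NaN as in the question's dictionary.

```python
nan = float("nan")

d = {
    "p1": ["$0.00", nan, "$25.00"],
    "p2": ["$30.25", nan, "$12.25"],
    "p3": ["$0.00"],
    "p4": [nan, "$15.00"],
}


def largest_price(values):
    """Return the largest price in `values` as a float, or None if there is none."""
    prices = (
        float(v.strip("$"))
        for v in values
        if isinstance(v, str)  # skips NaN and any other non-string entry
    )
    return max(prices, default=None)


largest = {name: largest_price(values) for name, values in d.items()}
print(largest)  # {'p1': 25.0, 'p2': 30.25, 'p3': 0.0, 'p4': 15.0}
```

With a DataFrame like the one in the question, df['name'].map(largest) would then fill the column with floats instead of the original strings.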

Weird behaviour while manipulating Pandas dataframe within a dictionary

I am unable to understand this behaviour. I have a dataframe, which is present as a "value" inside a dictionary my_dict
my_dict = {'a': pd.DataFrame({'x': [1], 'y': [2]})}
print(my_dict)
>>{'a': x y
0 1 2}
Now, when I attempt a mathematical operation on the dataframe, that works, but a column renaming on the dataframe does not work -
for key, val in my_dict.items():
    val['z'] = val['x'] * val['y']
    val = val.rename(columns = {'x': 'new_x'})
print(my_dict)
{'a': x y z
0 1 2 2}
The mathematical operation val['z'] = val['x'] * val['y'] resulted in a new column z in the dataframe within my_dict
But the column renaming operation val = val.rename(columns = {'x': 'new_x'}) has no effect.
Why don't I see a column new_x in my_dict? What is going on?
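Change the assignment to an in-place rename. val = val.rename(...) creates a new DataFrame and only rebinds the loop variable val to it, so the DataFrame stored in my_dict stays unchanged:
for key, val in my_dict.items():
    val['z'] = val['x'] * val['y']
    val.rename(columns = {'x': 'new_x'},inplace=True)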
my_dict
Out[26]:
{'a': new_x y z
0 1 2 2}
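The underlying distinction is between mutating the object that the dictionary holds and rebinding the loop variable to a new object. A minimal sketch with plain lists instead of DataFrames (registry is a made-up name) reproduces the same effect and shows the alternative of assigning the new object back to the dictionary key:

```python
registry = {"a": [1, 2]}

for key, val in registry.items():
    val.append(3)      # mutates the list stored in the dict, so the change is visible
    val = val + [4]    # creates a new list and rebinds only the local name `val`

print(registry)        # {'a': [1, 2, 3]}

for key, val in registry.items():
    registry[key] = val + [4]  # store the new object back under the same key

print(registry)        # {'a': [1, 2, 3, 4]}
```

By the same reasoning, my_dict[key] = val.rename(columns={'x': 'new_x'}) would be an alternative to inplace=True.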

How to find sum of dictionaries in a pandas DataFrame across all rows?

I have a DataFrame
df = pd.DataFrame({'keywords': [{'a': 3, 'b': 4, 'c': 5}, {'c':1, 'd':2}, {'a':5, 'c':21, 'd':4}, {'b':2, 'c':1, 'g':1, 'h':1, 'i':1}]})
I want to add all the elements across all rows that would give the result without using iterrows:
a: 8
b: 6
c: 28
d: 6
g: 1
h: 1
i: 1
note: no element occurs twice in a single row in the original DataFrame.
Using collections.Counter, you can sum an iterable of Counter objects. Since Counter is a subclass of dict, you can then feed to pd.DataFrame.from_dict.
from collections import Counter
counts = sum(map(Counter, df['keywords']), Counter())
res = pd.DataFrame.from_dict(counts, orient='index')
print(res)
0
a 8
b 6
c 28
d 6
g 1
h 1
i 1
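One caveat with summing Counter objects, in case the data can contain zero or negative numbers: Counter addition keeps only positive totals, whereas Counter.update keeps every key. A small sketch with made-up rows (not the question's data):

```python
from collections import Counter

# Hypothetical rows; the "b" values cancel out to zero.
rows = [{"a": 3, "b": 2}, {"a": 5, "b": -2}]

added = sum(map(Counter, rows), Counter())  # `+` discards totals that are not positive

updated = Counter()
for row in rows:
    updated.update(row)                     # update() adds counts and keeps every key

print(added)    # Counter({'a': 8})
print(updated)  # Counter({'a': 8, 'b': 0})
```

For the data in the question all values are positive, so both variants give the same totals.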
Not sure how this compares in terms of optimization with @jpp's answer, but I'll give it a shot.
# What we're starting out with
df = pd.DataFrame({'keywords': [{'a': 3, 'b': 4, 'c': 5}, {'c':1, 'd':2}, {'a':5, 'c':21, 'd':4}, {'b':2, 'c':1, 'g':1, 'h':1, 'i':1}]})
# Turns the array of dictionaries into a DataFrame
values_df = pd.DataFrame(df["keywords"].values.tolist())
# Sums up the individual keys
sums = {key:values_df[key].sum() for key in values_df.columns}

Python 3.6: create new dict using values from another as indices

In Python 3.6.3, I have the following dict D1:
D1 = {0: array([1, 2, 3], dtype=int64), 1: array([0,4], dtype=int64)}
Each value inside the array is the index of the key of another dict D2:
D2 = {'Jack': 1, 'Mike': 2, 'Tim': 3, 'Paul': 4, 'Tommy': 5}
I am trying to create a third dict, D3, with the same keys as D1, and as values the keys of D2 corresponding to the indices of D1.values().
The result I am aiming for is:
D3 = {0: ['Mike','Tim','Paul'], 1: ['Jack','Tommy']}
My approach is partial in that I struggle to figure out how to tell D3 to get the keys from D1 and the values from D2. I am not too sure about the "and" in the attempt below. Any ideas?
D3 = {key:list(D1.values())[v] for key in D1.keys() and v in D2[v]}
You could use a dict-comprehension like so:
from numpy import array
D1 = {0: array([1, 2, 3]), 1: array([0,4])}
D2 = {'Jack': 1, 'Mike': 2, 'Tim': 3, 'Paul': 4, 'Tommy': 5}
temp = dict(zip(D2.values(), D2.keys())) # inverting key-value pairs
D3 = {k: [temp.get(i+1, 'N/A') for i in v] for k, v in D1.items()}
which results in:
{0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
If you're using Python 3.6+ you can use enumerate to create a dict to look up the names in D2 by index, and then map the indices in D1 to it:
r = dict(enumerate(D2))
D3 = {k: list(map(r.get, v)) for k, v in D1.items()}
D3 would become:
{0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
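Note that the enumerate approach looks names up by their position in D2, so it relies on D2 having been built in index order (dictionaries preserving insertion order is a language guarantee only from Python 3.7). A sketch that looks names up by the stored value instead, under the assumption taken from the question's data that D2's values are the positions plus one, could be:

```python
# Plain lists stand in for the NumPy arrays of the question.
D1 = {0: [1, 2, 3], 1: [0, 4]}
# Deliberately built out of order to show that insertion order does not matter here.
D2 = {'Tommy': 5, 'Jack': 1, 'Paul': 4, 'Mike': 2, 'Tim': 3}

name_by_value = {value: name for name, value in D2.items()}  # e.g. 2 -> 'Mike'

# Assumption: index i in D1 refers to the name whose D2 value is i + 1.
D3 = {key: [name_by_value[i + 1] for i in indices] for key, indices in D1.items()}

print(D3)  # {0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
```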
This is untested, but I believe it should get you headed in the right direction. I find it helpful sometimes to break a complicated one-liner out into multiple lines:
names = list(D2)  # D2's keys in insertion order, so names[idx] is the key at position idx
D3={}
for d1k,d1v in D1.items():
    D3[d1k] = []
    for idx in d1v:
        D3[d1k].append(names[idx])
Might not be the best solution, but it works:
D3={}
for key in D1.keys():
    value_list=D1.get(key)
    value_list= [x+1 for x in value_list]
    temp=[]
    for d2_key,value in D2.items():
        if value in value_list:
            temp.append(d2_key)
    D3[key]=temp
Output:
{0: ['Mike', 'Tim', 'Paul'], 1: ['Jack', 'Tommy']}
Here you go!
D1 = {0:[1, 2, 3], 1: [0,4]}
D2 = {'Jack': 1, 'Mike': 2, 'Tim': 3, 'Paul': 4, 'Tommy': 5}
D2_inverted = {v: k for k, v in D2.items()}
D3={}
for key in D1:
    temp = []
    for value in D1[key]:
        temp.append(D2_inverted[value+1])
    D3[key] = temp
print(D3)
Iterate the keys from D1;
Create a temporary list to store the values you wish to assign to the new dict, and fill it with the desired values from D2. (inverted its keys and values for simplicity);
Assign to D3.

How can I concatenate dicts (values to values of the same key and new key)? [duplicate]

This question already has answers here:
How do I merge two dictionaries in a single expression in Python?
(43 answers)
Closed 6 years ago.
I have a problem with concatenating dictionaries. I have a lot of code, so I will show my problem with an example.
d1 = {'the':3, 'fine':4, 'word':2}
+
d2 = {'the':2, 'fine':4, 'word':1, 'knight':1, 'orange':1}
+
d3 = {'the':5, 'fine':8, 'word':3, 'sequel':1, 'jimbo':1}
=
finald = {'the':10, 'fine':16, 'word':6, 'knight':1, 'orange':1, 'sequel':1, 'jimbo':1}
I am preparing word counts for a word cloud. I don't know how to add up the values of matching keys; it is a puzzle for me. Please help.
Best regards
I would use a Counter from collections for this.
from collections import Counter
d1 = {'the':3, 'fine':4, 'word':2}
d2 = {'the':2, 'fine':4, 'word':1, 'knight':1, 'orange':1}
d3 = {'the':5, 'fine':8, 'word':3, 'sequel':1, 'jimbo':1}
c = Counter()
for d in (d1, d2, d3):
    c.update(d)
print(c)
Outputs:
Counter({'fine': 16, 'the': 10, 'word': 6, 'orange': 1, 'jimbo': 1, 'sequel': 1, 'knight': 1})
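Since the goal is a word cloud, the Counter may need to be passed on as a plain dict or reduced to its most frequent words. A short sketch using the same three dictionaries follows; whether a given word-cloud library accepts a Counter directly is not something this thread states, so the conversion is shown as an assumption about what might be needed.

```python
from collections import Counter

d1 = {'the': 3, 'fine': 4, 'word': 2}
d2 = {'the': 2, 'fine': 4, 'word': 1, 'knight': 1, 'orange': 1}
d3 = {'the': 5, 'fine': 8, 'word': 3, 'sequel': 1, 'jimbo': 1}

c = Counter()
for d in (d1, d2, d3):
    c.update(d)           # adds the counts of d to the running totals

finald = dict(c)          # plain dict with the summed counts
top3 = c.most_common(3)   # the three most frequent words with their counts

print(finald)
print(top3)               # [('fine', 16), ('the', 10), ('word', 6)]
```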
import itertools
d1 = {'the':3, 'fine':4, 'word':2}
d2 = {'the':2, 'fine':4, 'word':1, 'knight':1, 'orange':1}
d3 = {'the':5, 'fine':8, 'word':3, 'sequel':1, 'jimbo':1}
dicts = [d1, d2, d3]
In [31]: answer = {k:sum(d[k] if k in d else 0 for d in dicts) for k in itertools.chain.from_iterable(dicts)}
In [32]: answer
Out[32]:
{'sequel': 1,
'the': 10,
'fine': 16,
'jimbo': 1,
'word': 6,
'orange': 1,
'knight': 1}
def sumDicts(*dicts):
    summed = {}
    for subdict in dicts:
        for (key, value) in subdict.items():
            summed[key] = summed.get(key, 0) + value
    return summed
Shell example:
>>> d1 = {'the':3, 'fine':4, 'word':2}
>>> d2 = {'the':2, 'fine':4, 'word':1, 'knight':1, 'orange':1}
>>> d3 = {'the':5, 'fine':8, 'word':3, 'sequel':1, 'jimbo':1}
>>> sumDicts(d1, d2, d3)
{'orange': 1, 'the': 10, 'fine': 16, 'jimbo': 1, 'word': 6, 'knight': 1, 'sequel': 1}
