I want to split an array into two arrays depending on whether each object has a 'confirmation' key. Is there a faster way than the simple for loop I used? The array has a lot of elements, and I am concerned about performance.
Before
[
{
'id':'1'
},
{
'id':'2'
},
{
'id':'3',
'confirmation':'20',
},
{
'id':'4',
'confirmation':'10',
}
]
After
[{'id': 3, 'confirmation': 20}, {'id': 4, 'confirmation': 10}]
[{'id': 1}, {'id': 2}]
Implementation using for loop
$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
dict1 = {"id":1}
dict2 = {"id":2}
dict3 = {"id":3, "confirmation":20}
dict4 = {"id":4, "confirmation":10}
list = [dict1, dict2, dict3, dict4]
list_with_confirmation = []
list_without_confirmation = []
for d in list:
    if 'confirmation' in d:
        list_with_confirmation.append(d)
    else:
        list_without_confirmation.append(d)
print(list_with_confirmation)
print(list_without_confirmation)
Update 1
This is the result on our real data. (3) is the fastest.
(1) 0.148394346
(2) 0.105772018
(3) 0.0339076519
_list = search()
logger.warning(time.time())  # 1504691716.5748231
list_with_confirmation = []
list_without_confirmation = []
for d in _list:
    if 'confirmation' in d:
        list_with_confirmation.append(d)
    else:
        list_without_confirmation.append(d)
logger.warning(len(list_with_confirmation))  # 69427
logger.warning(time.time())  # 1504691716.7232175 (0.148394346) --- (1)
list_with_confirmation = [d for d in _list if 'confirmation' in d]
list_without_confirmation = [d for d in _list if not 'confirmation' in d]
logger.warning(len(list_with_confirmation))  # 69427
logger.warning(time.time())  # 1504691716.8289895 (0.105772018) --- (2)
lists = ([], [])
[lists['confirmation' in d].append(d) for d in _list]
logger.warning(len(lists[1]))  # 69427
logger.warning(time.time())  # 1504691716.8628972 (0.0339076519) --- (3)
I could not work out how to use timeit in my environment, so sorry for the rough benchmark.
List comprehension might be slightly faster:
list_with_confirmation = [d for d in list if "confirmation" in d]
list_without_confirmation = [d for d in list if "confirmation" not in d]
Refer to Why is list comprehension so faster?
Probably it is the fastest way, but you could try another:
lists = ([], [])
for d in source_list:
    lists['confirmation' in d].append(d)
or even:
lists = ([], [])
[lists['confirmation' in d].append(d) for d in source_list]
This way lists[0] will be "without confirmation" and lists[1] will be "with confirmation". Do your own benchmarks.
Side note: don't use list as a variable name, as it shadows the built-in list constructor.
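For the "do your own benchmarks" part, a minimal timeit sketch might look like the following. The synthetic data, the function names (split_loop, split_comprehensions, split_indexed) and the repeat counts are all assumptions made for illustration, and the numbers will differ on real data.

```python
import timeit

# Synthetic data (an assumption): every third dict carries 'confirmation'.
source_list = [{'id': i, 'confirmation': i} if i % 3 == 0 else {'id': i}
               for i in range(30000)]

def split_loop(items):
    # Plain for loop with two appends, as in the question.
    with_conf, without_conf = [], []
    for d in items:
        if 'confirmation' in d:
            with_conf.append(d)
        else:
            without_conf.append(d)
    return with_conf, without_conf

def split_comprehensions(items):
    # Two passes over the data, one per output list.
    return ([d for d in items if 'confirmation' in d],
            [d for d in items if 'confirmation' not in d])

def split_indexed(items):
    # A bool indexes the tuple: False -> 0 (without), True -> 1 (with).
    lists = ([], [])
    for d in items:
        lists['confirmation' in d].append(d)
    return lists[1], lists[0]

if __name__ == '__main__':
    for func in (split_loop, split_comprehensions, split_indexed):
        # Best of 5 repeats, 10 calls per repeat.
        best = min(timeit.repeat(lambda: func(source_list), number=10, repeat=5))
        print('{:22}: {:.4f} s per 10 calls'.format(func.__name__, best))
```

Taking the minimum over several repeats tends to reduce noise from other processes, and keeping print calls out of the timed functions avoids measuring I/O instead of the split itself.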
If you execute below code:
dict1 = {"id":1}
dict2 = {"id":2}
dict3 = {"id":3, "confirmation":20}
dict4 = {"id":4, "confirmation":10}
_list = [dict1, dict2, dict3, dict4]
import timeit
def fun(_list):
    list_with_confirmation = []
    list_without_confirmation = []
    for d in _list:
        if 'confirmation' in d:
            list_with_confirmation.append(d)
        else:
            list_without_confirmation.append(d)
    print(list_with_confirmation)
    print(list_without_confirmation)
def my_fun(_list):
    list_with_confirmation = [d for d in _list if 'confirmation' in d]
    list_without_confirmation = [d for d in _list if not 'confirmation' in d]
    print(list_with_confirmation)
    print(list_without_confirmation)
if __name__ == '__main__':
    print(timeit.timeit("fun(_list)", setup="from __main__ import fun, _list",number=1))
    print(timeit.timeit("my_fun(_list)", setup="from __main__ import my_fun, _list",number=1))
You can get following statistics:
[{'confirmation': 20, 'id': 3}, {'confirmation': 10, 'id': 4}]
[{'id': 1}, {'id': 2}]
5.41210174561e-05
[{'confirmation': 20, 'id': 3}, {'confirmation': 10, 'id': 4}]
[{'id': 1}, {'id': 2}]
2.40802764893e-05
In this run the list comprehension version is the faster of the two, which suggests that list comprehension is the better-optimized approach. For more reference you can see: blog
Related
Lets say I have:
dict_listA = [
{'id':0, 'b':1},
{'id':1, 'b':2},
{'id':2, 'b':3},
]
and
dict_listB = [
{'id':1, 'b':1},
{'id':2, 'b':3},
{'id':3, 'b':2},
]
How would I get a list of the id's where we have the intersection of these based on 'id' but symmetric difference based on b?
same_a_different_b = [
{'id':1, 'b':2},
]
currently this is my solution:
for d1 in dict_listA:
    same_a_different_b = filter(lambda d2: d2['id'] == d1['id'] and d2['b'] != d1['b'], dict_listB)
I ask because this is currently the biggest time-sink in my program, I wish there was some way of doing it quicker. The result (same_a_different_b) is usually 0 or very small, one list has about 900 entries and the other around 1400. It currently takes 9 seconds.
Try this:
hashed = {e['id']: e['b'] for e in dict_listB}
same_a_different_b2 = [e for e in dict_listA if e['id'] in hashed and hashed[e['id']] != e['b']]
I think the complexity of this algorithm is O(len(a) + len(b)), because each list is traversed once and every dictionary lookup is O(1) on average.
By comparison, your solution is O(len(a) * len(b)).
If the lists can contain duplicate ids:
from collections import defaultdict
hashed = defaultdict(set)
for e in dict_listB:
    hashed[e['id']].add(e['b'])
same_a_different_b2 = [e for e in dict_listA if e['id'] in hashed and e['b'] not in hashed[e['id']]]
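To make the behaviour of the duplicate-tolerant version concrete, here is a small self-contained sketch. The contents of dict_listB are hypothetical (ids 1 and 2 each appear twice); only the technique comes from the answer above.

```python
from collections import defaultdict

dict_listA = [{'id': 0, 'b': 1}, {'id': 1, 'b': 2}, {'id': 2, 'b': 3}]
# Hypothetical data: ids 1 and 2 each appear twice in B.
dict_listB = [
    {'id': 1, 'b': 1}, {'id': 1, 'b': 5},
    {'id': 2, 'b': 3}, {'id': 2, 'b': 4},
    {'id': 3, 'b': 2},
]

# Map each id in B to the set of all 'b' values seen for it.
hashed = defaultdict(set)
for e in dict_listB:
    hashed[e['id']].add(e['b'])

# Keep entries of A whose id exists in B but whose 'b' matches none of B's values.
# The "in hashed" test runs first, so the defaultdict never grows during the scan.
result = [e for e in dict_listA
          if e['id'] in hashed and e['b'] not in hashed[e['id']]]
print(result)  # [{'id': 1, 'b': 2}]
```

Note the semantics: id 2 is dropped because one of B's entries for it has the same b, even though another one differs. If any differing entry should count as a match, the final condition would need to change.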
Compare speed (len(a) == len(b) == 2000):
from collections import defaultdict
import time
from itertools import product
dict_listA = [
{'id': 0, 'b': 1},
{'id': 1, 'b': 2},
{'id': 2, 'b': 3},
*[{'id': i, 'b': 1} for i in range(10000, 10000 + 2000)]
]
dict_listB = [
{'id': 1, 'b': 1},
{'id': 2, 'b': 3},
{'id': 3, 'b': 2},
*[{'id': i, 'b': 1} for i in range(20000, 20000 + 2000)]
]
same_a_different_b = [
{'id': 1, 'b': 2},
]
start_time = time.perf_counter()  # time.clock() was removed in Python 3.8
def previous_solution():
    new_same_a_different_b = []
    for d1 in dict_listA:
        new_same_a_different_b.extend(filter(lambda d2: d2['id'] == d1['id'] and d2['b'] != d1['b'], dict_listB))
    return new_same_a_different_b
def new_solution():
    hashed = {e['id']: e['b'] for e in dict_listB}
    return [e for e in dict_listA if e['id'] in hashed and hashed[e['id']] != e['b']]
def other_solution():
    return [d1 for d1, d2 in product(dict_listA, dict_listB) if d2['id'] == d1['id'] and d2['b'] != d1['b']]
for func, name in [
    (previous_solution, 'previous_solution'),
    (new_solution, 'new_solution'),
    (other_solution, 'other_solution')
]:
    start_time = time.perf_counter()
    new_result = func()
    print('{:20}: {:.5f}'.format(name, time.perf_counter() - start_time))
    # Only checks that the result is non-empty; the second operand is the assertion message.
    assert new_result, same_a_different_b
Results:
previous_solution : 1.06517
new_solution : 0.00073
other_solution : 0.60582
Here is one way using a list comprehension and itertools.product:
In [41]: from itertools import product
In [42]: [d1 for d1, d2 in product(dict_listA, dict_listB) if d2['id'] == d1['id'] and d2['b'] != d1['b']]
Out[42]: [{'id': 1, 'b': 2}]
But note that this will generate duplicate results if you have multiple matched items within dict_listB. If you don't want to keep all the duplicate ones, you can use a set comprehension instead.
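One caveat with the set comprehension idea: dicts are not hashable, so they cannot be placed in a set directly. A possible sketch, assuming the keys are strings and the values are hashable, stores a sorted tuple of items instead and converts back afterwards. The duplicate entry in dict_listB is made up for the demonstration.

```python
from itertools import product

dict_listA = [{'id': 1, 'b': 2}]
# Hypothetical data: two entries in B match id 1 with a different 'b'.
dict_listB = [{'id': 1, 'b': 1}, {'id': 1, 'b': 3}]

# The list comprehension reports the same dict from A once per match in B.
with_duplicates = [d1 for d1, d2 in product(dict_listA, dict_listB)
                   if d2['id'] == d1['id'] and d2['b'] != d1['b']]

# A sorted tuple of (key, value) pairs is hashable and order-independent.
unique_keys = {tuple(sorted(d1.items()))
               for d1, d2 in product(dict_listA, dict_listB)
               if d2['id'] == d1['id'] and d2['b'] != d1['b']}
unique = [dict(t) for t in unique_keys]

print(with_duplicates)  # [{'id': 1, 'b': 2}, {'id': 1, 'b': 2}]
print(len(unique))      # 1
```

This still walks the full Cartesian product, so it only removes the duplicates and does not change the O(len(a) * len(b)) cost.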
I have a list of the following form:
oldlist = [{'x': {'a':1,'b':2}, 'y':2},{'x':{'a':6,'b':7}, 'y':2},{'x':{'a':1,'b':2}, 'y':3},{'x':{'a':1,'b':2}, 'y':2},{'x':{'a':10,'b':11}, 'y':4}]
to be converted into
final = [{'x':{'a':1,'b':2},'y':[2,3,2],'count':3},{'x':{'a':6,'b':7},'y':[2],'count':1},{'x':{'a':10,'b':11},'y':[4],'count':1}]
I have tried
oldlist = [{'x': {'a':1,'b':2}, 'y':2},{'x':{'a':6,'b':7}, 'y':2},{'x':{'a':1,'b':2}, 'y':3},{'x':{'a':1,'b':2}, 'y':2},{'x':{'a':10,'b':11}, 'y':4}]
list1=[]
list2=[]
list3=[]
s = set([d['x'] for d in oldlist])
news=list(s)
for item in oldlist:
    if item['x'] == news[0]:
        list1.append(item['y'])
    if item['x'] == news[1]:
        list2.append(item['y'])
    if item['x'] == news[2]:
        list3.append(item['y'])
final=[]
dic1 = {'x':news[0],'y':list1,'count':len(list1)}
dic2 = {'x':news[1],'y':list2,'count':len(list2)}
dic3 = {'x':news[2],'y':list3,'count':len(list3)}
final.append(dic1)
final.append(dic2)
final.append(dic3)
print final
Getting
s = set([d['x'] for d in oldlist])
TypeError: unhashable type: 'dict'
Is there a simpler way to do it? Plus here I knew that x can have only three values so I created three variables list1, list2 and list3. What if x can have several other values and I have to find out a similar list of dictionaries like final! It should also work for strings!
EDIT:I tried this. But it all got messed up
s = list(frozenset(oldlist[0]['x'].items()))
print s
for item in oldlist:
    s.append(frozenset(item['x'].items()))
The set function can only handle hashable objects, such as strings, numbers, and tuples (as long as their elements are hashable too).
Data types like list and dict are unhashable, and hence the set function cannot handle them.
For some more clarity:
What do you mean by hashable in Python?
http://blog.lerner.co.il/is-it-hashable-fun-and-games-with-hashing-in-python/
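A quick way to see which values are hashable is to call the built-in hash() and catch the TypeError. The helper name is_hashable below is made up for this sketch.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('text'))      # True
print(is_hashable(42))          # True
print(is_hashable((1, 2)))      # True
print(is_hashable([1, 2]))      # False
print(is_hashable({'a': 1}))    # False
print(is_hashable((1, [2])))    # False: a tuple holding a list is not hashable

# A dict with hashable values can be turned into a hashable stand-in:
key = frozenset({'a': 1, 'b': 2}.items())
print(is_hashable(key))         # True
```

The frozenset of items is what makes it possible to use the contents of a dict as a set member or dictionary key.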
A basic implementation of what you need:
newlist = []
for elem in oldlist:
    found = False
    for item in newlist:
        if elem['x'] == item['x']:
            # Same x seen before: extend its y list and bump the count
            item['y'].append(elem['y'])
            item['count'] += 1
            found = True
            break
    if not found:
        newlist.append({'x': elem['x'], 'y': [elem['y']], 'count': 1})
This will give you the expected result
You can use defaultdict where keys are frozenset objects created from value of x in the original dicts and values are list of relative y. Then you can construct the final result with list comprehension and turn frozensets back to dicts:
from collections import defaultdict
oldlist = [{'x': {'a':1,'b':2}, 'y':2},{'x':{'a':6,'b':7}, 'y':2},{'x':{'a':1,'b':2}, 'y':3},{'x':{'a':1,'b':2}, 'y':2},{'x':{'a':10,'b':11}, 'y':4}]
res = defaultdict(list)
for d in oldlist:
    res[frozenset(d['x'].items())].append(d['y'])
final = [{'x': dict(k), 'y': v, 'count': len(v)} for k, v in res.items()] # [{'y': [2, 3, 2], 'x': {'a': 1, 'b': 2}, 'count': 3}, {'y': [4], 'x': {'a': 10, 'b': 11}, 'count': 1}, {'y': [2], 'x': {'a': 6, 'b': 7}, 'count': 1}]
Python's set does not accept dictionaries, and you cannot force it to, so try another method instead. (Take a closer look at the comments on the 5th and 6th lines.)
Try this code:
oldlist = [{'x': {'a':1,'b':2}, 'y':2},{'x':{'a':6,'b':7}, 'y':2},{'x':{'a':1,'b':2}, 'y':3},{'x':{'a':1,'b':2}, 'y':2},{'x':{'a':10,'b':11}, 'y':4}]
list1=[]
list2=[]
list3=[]
s = [d['x'] for d in oldlist] # Placed the dictionaries in a list
s = result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in s)] # This is the manual way on removing duplicates dictionaries in a list instead of using set
news=list(s)
for item in oldlist:
    if item['x'] == news[0]:
        list1.append(item['y'])
    if item['x'] == news[1]:
        list2.append(item['y'])
    if item['x'] == news[2]:
        list3.append(item['y'])
final=[]
dic1 = {'x':news[0],'y':list1,'count':len(list1)}
dic2 = {'x':news[1],'y':list2,'count':len(list2)}
dic3 = {'x':news[2],'y':list3,'count':len(list3)}
final.append(dic1)
final.append(dic2)
final.append(dic3)
print final
I have a list of the following form:
oldlist = [{'x': 1, 'y':2},{'x':2, 'y':2},{'x':1, 'y':3},{'x':1, 'y':2},{'x':3, 'y':4}]
to be converted into
final = [{'x':1,'y':[2,3,2],'count':3},{'x':2,'y':[2],'count':1},{'x':3,'y':[4],'count':1}]
I have tried
oldlist = [{'x': {'a':1,'b':2}, 'y':2},{'x':{'a':6,'b':7}, 'y':2},{'x':{'a':1,'b':2}, 'y':3},{'x':{'a':1,'b':2}, 'y':2},{'x':{'a':10,'b':11}, 'y':4}]
list1=[]
list2=[]
list3=[]
s = set([d['x'] for d in oldlist])
news=list(s)
for item in oldlist:
    if item['x'] == news[0]:
        list1.append(item['y'])
    if item['x'] == news[1]:
        list2.append(item['y'])
    if item['x'] == news[2]:
        list3.append(item['y'])
final=[]
dic1 = {'x':news[0],'y':list1,'count':len(list1)}
dic2 = {'x':news[1],'y':list2,'count':len(list2)}
dic3 = {'x':news[2],'y':list3,'count':len(list3)}
final.append(dic1)
final.append(dic2)
final.append(dic3)
print final
Is there a simpler way to do it? Plus here I knew that x can have only three values so I created three variables list1, list2 and list3. What if x can have several other values and I have to find out a similar list of dictionaries like final! It should also work for strings!
You can collect the dicts to defaultdict where key is x from original dicts and value is list of related y values. Then use list comprehension to generate the final result:
from collections import defaultdict
l = [{'x':1, 'y':2},{'x':2, 'y':2},{'x':1, 'y':3},{'x':1, 'y':2},{'x':3, 'y':4}]
res = defaultdict(list)
for d in l:
    res[d['x']].append(d['y'])
final = [{'x': k, 'y': v, 'count': len(v)} for k, v in res.items()] # [{'y': [2, 3, 2], 'x': 1, 'count': 3}, {'y': [2], 'x': 2, 'count': 1}, {'y': [4], 'x': 3, 'count': 1}]
x_list = [item['x'] for item in list1]
set_value = set(x_list)
final_lst = []
for i in set_value:
    d = {}
    d['x'] = i
    d['count'] = x_list.count(i)
    d['y'] = []
    for item in list1:
        if item['x'] == d['x']:
            d['y'].append(item['y'])
    final_lst.append(d)
print ("Final List", final_lst)
OK, I used a little trick here to do the job. It's not the most Pythonic way, but it still gets the answer you want.
The problem was indexing, so I changed s to t, where t is a list, and changed the end of the code because the keys in your dictionaries have to be strings, so x becomes 'x' and so on.
See the whole code below:
lista = [{'x':1, 'y':2},{'x':2, 'y':2},{'x':1, 'y':3},{'x':1, 'y':2},{'x':3, 'y':4}]
list1=[]
list2=[]
list3=[]
for d in lista:
    print(d)
s = set([d['x'] for d in lista])
t = []
for element in s: #turn t into a list which has the same ordering as s
    t.append(element)
for item in lista:
    if item['x'] == t[0]:
        list1.append(item['y'])
    if item['x'] == t[1]:
        list2.append(item['y'])
    if item['x'] == t[2]:
        list3.append(item['y'])
final=[]
dic1 = {'x':t[0],'y':list1,'count':len(list1)}
dic2 = {'x':t[1],'y':list2,'count':len(list2)}
dic3 = {'x':t[2],'y':list3,'count':len(list3)}
final.append(dic1)
final.append(dic2)
final.append(dic3)
print(final)
I have a list of dicts:
a =[{'id': 1,'desc': 'smth'},
{'id': 2,'desc': 'smthelse'},
{'id': 1,'desc': 'smthelse2'},
{'id': 1,'desc': 'smthelse3'}]
I would like to go through the list, find the dicts that have the same id value (e.g. id=1), and create a new dict:
b = [{'id':1, 'desc' : [smth, smthelse2,smthelse3]},
{'id': 2, 'desc': 'smthelse'}]
You can try:
import operator, itertools
key = operator.itemgetter('id')
b = [{'id': x, 'desc': [d['desc'] for d in y]}
     for x, y in itertools.groupby(sorted(a, key=key), key=key)]
It is better to keep the "desc" values as lists everywhere even if they contain a single element only. This way you can do
for d in b:
    print d['id']
    for desc in d['desc']:
        print desc
This would work for strings too, just returning individual characters, which is not what you want.
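The difference between a bare string and a one-element list is easy to see by running the same loop over both. The two sample dicts below are illustrative only.

```python
single = {'id': 2, 'desc': 'smthelse'}     # desc stored as a bare string
wrapped = {'id': 2, 'desc': ['smthelse']}  # desc stored as a one-element list

# Iterating a string yields its characters; iterating a list yields its items.
chars = [desc for desc in single['desc']]
items = [desc for desc in wrapped['desc']]

print(chars)  # ['s', 'm', 't', 'h', 'e', 'l', 's', 'e']
print(items)  # ['smthelse']
```

Keeping desc as a list in every entry means consumers can use one loop without checking the type first.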
And now the solution giving you a list of dicts of lists:
a =[{'id': 1,'desc': 'smth'},{'id': 2,'desc': 'smthelse'},{'id': 1,'desc': 'smthelse2'},{'id': 1,'desc': 'smthelse3'}]
c = {}
for d in a:
    c.setdefault(d['id'], []).append(d['desc'])
b = [{'id': k, 'desc': v} for k,v in c.iteritems()]
b is now:
[{'desc': ['smth', 'smthelse2', 'smthelse3'], 'id': 1},
{'desc': ['smthelse'], 'id': 2}]
from collections import defaultdict
d = defaultdict(list)
for x in a:
    d[x['id']].append(x['desc']) # group description by id
b = [dict(id=id, desc=desc if len(desc) > 1 else desc[0])
     for id, desc in d.items()]
To preserve order:
b = []
for id in (x['id'] for x in a):
    desc = d[id]
    if desc:
        b.append(dict(id=id, desc=desc if len(desc) > 1 else desc[0]))
        del d[id]
I have 2 lists of dictionaries, say:
l1 = [{"customer":"amy", "order":2}, {"customer":"amy", "order":3}, {"customer":"basil", "order":4}]
l2 = [{"customer":"amy", "died":"who fell down the stairs"}, {"customer":'basil', "died":"assaulted by bears"}]
I am looking for an elegant way of taking the keys from l2 and putting them into l1. This is for joining lists of dictionaries that use different values as their index
The function should look something like join(l1,l2,'customer'), and produce
l3 = [{"customer":"amy", "order":2,"died":"who fell down the stairs"}, {"customer":"amy", "order":3,"died":"who fell down the stairs"}, {"customer":"basil", "order":4,"died":"assaulted by bears"}]
l3 should have a dictionary for every dictionary in l1.
If l1 and l2 have the same non-joining key with different values, l2 takes precedence.
l2 will have unique values for the joining key.
right now I have tried this ugly piece of code:
def join(l1, l2, field):
    l3 = []
    rdict = {}
    for i in range(len(l2)):
        rdict[l2[i][field]] = i
    for d in l1:
        l3.append(dict(d.items() + l2[rdict[d[field]]].items()))
    return l3
as well as the solution from this SO question but that assumes only one index in all lists.
Thank you
Easy:
SELECT *
FROM l1, l2
WHERE l1.customer = l2.customer
...just kidding...
def join(t1,t2,column):
    result = []
    for entry in t2:
        for match in [d for d in t1 if d[column] == entry[column]]:
            result.append(dict((k,v) for k,v in entry.items()+match.items()))
    return result
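For Python 3, where dict.items() can no longer be concatenated with +, the same join can be sketched with a lookup dictionary. The name join_on is hypothetical, and leaving unmatched rows of l1 unchanged is an assumption based on the requirement that l3 has one dictionary per dictionary in l1.

```python
def join_on(l1, l2, key):
    """Merge each dict in l1 with the l2 dict that shares its `key` value."""
    # l2 has unique join values (stated in the question), so a plain dict works.
    index = {d[key]: d for d in l2}
    # Unpacking l2's dict last makes its values win on conflicting keys.
    # Rows of l1 without a partner in l2 are kept as they are (an assumption).
    return [{**d, **index.get(d[key], {})} for d in l1]

l1 = [{"customer": "amy", "order": 2},
      {"customer": "amy", "order": 3},
      {"customer": "basil", "order": 4}]
l2 = [{"customer": "amy", "died": "who fell down the stairs"},
      {"customer": "basil", "died": "assaulted by bears"}]

l3 = join_on(l1, l2, "customer")
for row in l3:
    print(row)
```

Building the index once makes the join roughly linear in len(l1) + len(l2), rather than scanning one list for every entry of the other. The {**a, **b} syntax needs Python 3.5 or later.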
Alternative answer...
def diff(d1, d2, key):
    if d1[key] != d2[key]:
        return d1
    new_keys = list(set(d2) - set(d1))
    for new_key in new_keys:
        d1[new_key] = d2[new_key]
    return d1
def join(l1, l2, key):
    l3 = l1
    for d2 in l2:
        l3 = map(lambda d1: diff(d1, d2, key), l3)
    return l3
l3= [{"id": 64, "attribute1": 2},
{"id": 62, "attribute1": 3},
{"id": 64, "attribute2": 3}]
l4 = [{"id": 64, "Energy1": 2},
{"id": 62, "Energy1": 3},
{"id": 64, "Energy2": 3}]
def m1(l1,l2):
    l1d = {}
    for dct in l1: l1d.setdefault(dct["id"], {}).update(dct)
    l2d = {}
    for dct in l2: l2d.setdefault(dct["id"], {}).update(dct)
    aa = {
        k : dict(l1d.get(k,{}),**v) for k,v in l2d.items()
    }
    aal = [*aa.values()]
    print(aal)
    return aal  # return the list itself; print() returns None
m1(l3, l4)
"""
Output :
[{'id': 64, 'attribute1': 2, 'attribute2': 3, 'Energy1': 2, 'Energy2': 3}, {'id': 62, 'attribute1': 3, 'Energy1': 3}]
"""
Explanation:
The code takes two lists of dictionaries
and merges them into one list of dictionaries.
The code first creates two dictionaries from the two lists of dictionaries.
The dictionaries are created by using the id as the key
and the rest of the dictionary as the value.
The code then creates a new dictionary by using
the id as the key and the merged dictionaries as the value.
The code then creates a list from the new dictionary.
The code then prints the list.
The code then returns the list.
A generally good method is to use a defaultdict(dict).
Method 2(BEST) :
from collections import defaultdict
from operator import itemgetter
l1 = [{"id":1, "b":2},
{"id":2, "b":3},
{"id":3, "b":"10"},
{"id":4, "b":"7"}]
l2 = [{"id":1, "c":4},
{"id":2, "c":5},
{"id":6, "c":8},
{"id":7, "c":9}]
def m2(l1,l2):
    d = defaultdict(dict)
    for l in (l1,l2):
        for innerdict in l :
            d[innerdict['id']].update(innerdict)
    dv = d.values()
    dvsorted = sorted(d.values(),key= itemgetter('id'))
    dvsorted1 = [*dvsorted]
    print(dvsorted1)
    return dvsorted1  # return the list itself; print() returns None
m2(l1, l2)
"""
Output :
[{'id': 1, 'b': 2, 'c': 4}, {'id': 2, 'b': 3, 'c': 5}, {'id': 3, 'b': '10'}, {'id': 4, 'b': '7'}, {'id': 6, 'c': 8}, {'id': 7, 'c': 9}]
"""
Explanation:
The code takes two lists of dictionaries as input
and returns a single list of dictionaries.
The code uses defaultdict to create a dictionary of dictionaries.
The code uses update to update the inner dictionaries.
The code uses itemgetter to sort the list of dictionaries.
The code uses * to unpack the list of dictionaries.
Print the list of dictionaries.