Value duplicated in dictionary - python

The following is my code:
test = [{'name' : 'one'}, {'name' : 'two'}]
a = {}
b = []
c = {}
for i in test:
    c['name'] = i['name']
    b.append(c)
a['items'] = b
print(a)
This produces the following content of dictionary a, which is wrong:
{'items': [{'name': 'two'}, {'name': 'two'}]}
Why does the output dictionary, a, contain the value 'two' twice, rather than 'one' once and 'two' once?

You only created one dict, c, so its 'name' key is overwritten each time through the loop, and b ends up holding several references to that same dict. You want a new dict to append to b on each iteration: move c = {} into the loop's body.
for i in test:
    c = {}
    c['name'] = i['name']
    b.append(c)
or
for i in test:
    c = {'name': i['name']}
    b.append(c)
or
b = [{'name': i['name']} for i in test]
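A quick way to see the aliasing is to compare object identity with `is`. This is only a minimal sketch; the names shared, aliased and fresh are made up for illustration and do not appear in the question.

```python
test = [{'name': 'one'}, {'name': 'two'}]

# One dict created outside the loop: every append stores the same object.
shared = {}
aliased = []
for i in test:
    shared['name'] = i['name']
    aliased.append(shared)

# A new dict per iteration: every list entry is a distinct object.
fresh = [{'name': i['name']} for i in test]

print(aliased[0] is aliased[1])  # True: both entries are the same dict
print(fresh[0] is fresh[1])      # False: two separate dicts
print(aliased)                   # [{'name': 'two'}, {'name': 'two'}]
print(fresh)                     # [{'name': 'one'}, {'name': 'two'}]
```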

Related

How to create a list of dictionaries in python?

I am trying to make a list of dictionaries in python. Why don't these three methods produce the same results?
A = [{}]*2
A[0]['first_name'] = 'Tom'
A[1]['first_name'] = 'Nancy'
print(A)
B = [{},{}]
B[0]['first_name'] = 'Tom'
B[1]['first_name'] = 'Nancy'
print(B)
C = [None]*2
C[0] = {}
C[1] = {}
C[0]['first_name'] = 'Tom'
C[1]['first_name'] = 'Nancy'
print(C)
this is what I get:
[{'first_name': 'Nancy'}, {'first_name': 'Nancy'}]
[{'first_name': 'Tom'}, {'first_name': 'Nancy'}]
[{'first_name': 'Tom'}, {'first_name': 'Nancy'}]
Your first method is only creating one dictionary. It's equivalent to:
templist = [{}]
A = templist + templist
This extends the list, but doesn't make a copy of the dictionary that's in it. It's also equivalent to:
tempdict = {}
A = []
A.append(tempdict)
A.append(tempdict)
All the list elements are references to the same tempdict object.
Barmar's answer explains why this happens. Here is how to do it correctly.
If you want to generate a list of empty dicts, use a list comprehension like this:
A = [{} for n in range(2)]
A[0]['first_name'] = 'Tom'
A[1]['first_name'] = 'Nancy'
print(A)
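The difference between the two constructions can be checked directly. The following is a small sketch (the variable names shared and separate are illustrative) that compares list multiplication with a per-element construction:

```python
shared = [{}] * 2                  # two references to one dict
separate = [{} for _ in range(2)]  # two independent dicts

print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False

shared[0]['first_name'] = 'Tom'
separate[0]['first_name'] = 'Tom'
print(shared)    # [{'first_name': 'Tom'}, {'first_name': 'Tom'}]
print(separate)  # [{'first_name': 'Tom'}, {}]
```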

How to remove duplicate item in List?

If I have a list for example :
courses = [{name: a, course: math, count:1}]
and I then enter name: a, course: math again, the list should become
courses = [{name: a, course: math, count:2}]
In other words, an item with the same name and course should not be appended to the list again; only its 'count' value should increase.
I tried :
def add_class(inputname,inputcourse):
for i in (len(courses)):
if courses[i]['name']== inputname and courses[i]['course']==inputcourse:
courses[i][count]+=1
else :
newdata = {"name":inputname, "course":inputcourse,count:1}
#i put count because this item is the first time.
courses.append(newdata)
print courses
I expect the output to be class = {name: a, course: math, count:2}, but the actual output is class = [{name: a, course: math, count:2},{name: a, course: math, count:1}]
If I then enter new data such as name: a, course: physic, the output is
[{name:a,course:physic,count:1},{name: a, course: math, count:2},{name: a, course: math, count:1}]
May I suggest a different approach?
Instead of using a list of dictionaries, which may be complicated to manage in your case, write your own class to store "name" and "course".
class NC:
    def __init__(self, n, c):
        self.name = n
        self.course = c

    def __hash__(self):
        return hash((self.name, self.course))

    def __eq__(self, other):
        # Compare the fields themselves: equal hashes alone do not
        # guarantee that two objects are equal.
        if not isinstance(other, NC):
            return NotImplemented
        return (self.name, self.course) == (other.name, other.course)

    def __repr__(self):
        return "[{}: {}]".format(self.name, self.course)
By defining the special methods __hash__ and __eq__ you make your objects hashable, so they can be counted by Counter.
If you write something like this:
from collections import Counter
a = NC("a", "math")
b = NC("b", "physics")
c = NC("a", "math")
l = [a, b, c]
lc = Counter(l)
print(lc)
The output is Counter({[a: math]: 2, [b: physics]: 1})
With this approach, you may just append all the NC objects to your list and at the end use Counter to get the repetitions.
EDIT after request in the comment.
To do the same thing in "real time" you can create an empty counter and then update it.
from collections import Counter
lc = Counter()
lc.update([NC("a", "math")])
print(lc)  # this prints: Counter({[a: math]: 1})
lc.update([NC("b", "physics")])
print(lc)  # this prints: Counter({[a: math]: 1, [b: physics]: 1})
lc.update([NC("a", "math")])
print(lc)  # this prints: Counter({[a: math]: 2, [b: physics]: 1})
Just remember that Counter.update expects an iterable, so to add a single element you have to pass a list containing that one element. Of course you can also add several elements together; for example, lc.update([NC("b", "physics"), NC("c", "chemistry")]) is valid and both objects are added to the counter.
You can use a for/else clause. The else block runs only if the loop finishes without reaching break. Here is an example:
courses = []
courses.append({'name': 'a', 'course': 'math', 'count': 1})
def add_course(d):
    for course in courses:
        if course['course'] == d['course'] and course['name'] == d['name']:
            # Existing entry found: bump its counter and stop searching.
            course['count'] += 1
            break
    else:
        # No break happened, so this name/course pair is new.
        d['count'] = 1
        courses.append(d)
add_course({'name': 'a', 'course': 'math'})
add_course({'name': 'a', 'course': 'english'})
print(courses)
The output is [{'name': 'a', 'course': 'math', 'count': 2}, {'name': 'a', 'course': 'english', 'count': 1}]
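If scanning the list on every insert becomes a concern, one possible alternative (a sketch only, not something proposed in the answers above) is to key a plain dict by the (name, course) pair and rebuild the list-of-dicts shape when it is needed. The name counts and this two-argument add_course signature are assumptions made for the illustration.

```python
# Hypothetical alternative: count by (name, course) tuple instead of
# searching a list of dicts.
counts = {}

def add_course(name, course):
    key = (name, course)
    counts[key] = counts.get(key, 0) + 1

add_course('a', 'math')
add_course('a', 'math')
add_course('a', 'english')

# Rebuild the shape used in the question; dicts keep insertion order
# in Python 3.7+.
courses = [{'name': n, 'course': c, 'count': k} for (n, c), k in counts.items()]
print(courses)
# [{'name': 'a', 'course': 'math', 'count': 2}, {'name': 'a', 'course': 'english', 'count': 1}]
```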

Unflatten nested Python dictionary

What would be the cleanest way to convert this
{"a.b.c[0].key1": 1, "a.b.c[1].key2": 2, "a.b.c[3].key3": 3}
Into this
{"a": {"b": {"c": [{"key1": 1}, {"key2": 2}, None, {"key3": 3}]}}}
the dictionary keys may be anything.
the length of the list may vary.
the depth of the dictionary may vary.
if there are missing values in the list the value must be None.
if values are repeated the last one declared is the one that counts.
I came up with the following working example.
Was wondering if we could find a better solution for our community.
import re

def unflatten(data):
    if type(data) != dict:
        return None
    regex = r'\.?([^.\[\]]+)|\[(\d+)\]'
    result_holder = {}
    for key,value in data.items():
        cur = result_holder
        prop = ""
        results = re.findall(regex, key)
        for result in results:
            prop = int(prop) if type(cur) == list else prop
            if (type(cur) == dict and cur.get(prop)) or (type(cur) == list and len(cur) > prop):
                cur = cur[prop]
            else:
                if type(cur) == list:
                    if type(prop) is int:
                        # pad the list with None up to the requested index
                        while len(cur) <= prop:
                            cur.append(None)
                cur[prop] = list() if result[1] else dict()
                cur = cur[prop]
            prop = result[1] or result[0]
        prop = int(prop) if type(cur) == list else prop
        if type(cur) == list:
            if type(prop) is int:
                while len(cur) <= prop:
                    cur.append(None)
                    print(len(cur), prop)
        cur[prop] = data[key]
    return result_holder[""] or result_holder
You can use recursion:
from itertools import groupby
import re

d = {"a.b.c[0].key1": 1, "a.b.c[1].key2": 2, "a.b.c[3].key3": 3}

def group_data(data):
    new_results = [[a, [i[1:] for i in b]] for a, b in groupby(sorted(data, key=lambda x:x[0]), key=lambda x:x[0])]
    arrays = [[a, list(b)] for a, b in groupby(sorted(new_results, key=lambda x:x[0].endswith(']')), key=lambda x:x[0].endswith(']'))]
    final_result = {}
    for a, b in arrays:
        if a:
            # keys such as "c[0]": group by the name before the bracket
            _chars = [[c, list(d)] for c, d in groupby(sorted(b, key=lambda x:re.findall(r'^\w+', x[0])[0]), key=lambda x:re.findall(r'^\w+', x[0])[0])]
            _key = _chars[0][0]
            final_result[_key] = [[int(re.findall(r'\d+', c)[0]), d[0]] for c, d in _chars[0][-1]]
            _d = dict(final_result[_key])
            final_result[_key] = [group_data([_d[i]]) if i in _d else None for i in range(min(_d), max(_d)+1)]
        else:
            for c, d in b:
                final_result[c] = group_data(d) if all(len(i) >1 for i in d) else d[0][0]
    return final_result

print(group_data([[*a.split('.'), b] for a, b in d.items()]))
Output:
{'a': {'b': {'c': [{'key1': 1}, {'key2': 2}, None, {'key3': 3}]}}}
A recursive function would probably be much easier to work with and more elegant.
This is partly pseudocode, but it may help you get thinking.
I haven't tested it, but I'm pretty sure it should work so long as you don't have any lists that are directly elements of other lists. So you can have dicts of dicts, dicts of lists, and lists of dicts, but not lists of lists.
# Pseudocode: parseListNotation and expandUntilThisSize are placeholder helpers.
def unflatten(data):
    resultDict = {}
    for e in data:
        insertElement(e.split("."), data[e], resultDict)
    return resultDict

def insertElement(path, value, subDict):
    if (path[0] is of the form "foo[n]"):
        key, index = parseListNotation(path[0])
        if (key not in subDict):
            subDict[key] = []
        if (index >= len(subDict[key])):
            subDict[key].expandUntilThisSize(index)
        if (subDict[key][index] is None):
            subDict[key][index] = {}
        # recurse on the rest of the path (path[1:]), not on the popped element
        subDict[key][index] = insertElement(path[1:], value, subDict[key][index])
    else:
        key = path[0]
        if (len(path) == 1):
            subDict[key] = value
        else:
            if (key not in subDict):
                subDict[key] = {}
            subDict[key] = insertElement(path[1:], value, subDict[key])
    return subDict
The idea is to build the dictionary from the inside out. E.g.:
For the first element, first create the dictionary
{key1: 1}
Then assign that to an element of a new dictionary
{c: [None]}, c[0] = {key1: 1}
Then assign that dictionary to the next element b in a new dict, like
{b: {c: [{key1: 1}]}}
Assign that result to a in a new dict
{a: {b: {c: [{key1: 1}]}}}
And lastly return that full dictionary, to use to add the next value.
If you're not familiar with recursive functions, I'd recommend practicing with some simpler ones, and then writing one that does what you want but for input that's only dictionaries.
General path of a dictionary-only recursive function:
Given a path that's a list of attributes of nested dictionaries ([a, b, c, key1] in your example, if c weren't a list):
Start(path, value):
    If there's only one item in your path, build a dictionary setting that key to your value, and you're done.
    If there's more than one, build a dictionary using the first element as a key, and set the value to the output of Start(path[1:], value).
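The recursive idea described above can be turned into runnable Python roughly as follows. This is a sketch under assumptions rather than the answerer's tested code: the helper names parse_path and insert_element are invented here, the token pattern is borrowed from the regex in the question, and each call fills in the existing container instead of returning a new dictionary.

```python
import re

# Token pattern taken from the question: a dotted name or a [n] index.
_TOKEN = re.compile(r'\.?([^.\[\]]+)|\[(\d+)\]')

def parse_path(flat_key):
    """Split 'a.b.c[0].key1' into ['a', 'b', 'c', 0, 'key1']."""
    return [int(index) if index else name
            for name, index in _TOKEN.findall(flat_key)]

def insert_element(path, value, container):
    """Recursively store value under path, creating dicts/lists as needed."""
    head, rest = path[0], path[1:]
    if isinstance(head, int):
        # Pad the list with None so that index `head` exists.
        while len(container) <= head:
            container.append(None)
    if not rest:
        container[head] = value  # a later duplicate key overwrites this
        return
    # The next token decides whether the child is a list or a dict.
    child_type = list if isinstance(rest[0], int) else dict
    child = container[head] if isinstance(head, int) else container.get(head)
    if not isinstance(child, child_type):
        child = child_type()
        container[head] = child
    insert_element(rest, value, child)

def unflatten(data):
    result = {}
    for flat_key, value in data.items():
        insert_element(parse_path(flat_key), value, result)
    return result

d = {"a.b.c[0].key1": 1, "a.b.c[1].key2": 2, "a.b.c[3].key3": 3}
print(unflatten(d))
# {'a': {'b': {'c': [{'key1': 1}, {'key2': 2}, None, {'key3': 3}]}}}
```

Because every path is tokenized before recursing, this version does not need the "no lists of lists" restriction mentioned above.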
Here is another variation on how to achieve the desired result. It is not as pretty as I would like, so I expect there is a more elegant way. It probably uses more regex than is really necessary, and the break used to handle the final key suggests the loop logic could be improved to remove that kind of manual intervention. That said, hopefully it helps you refine your approach.
import re

def unflatten(data):
    results = {}
    list_rgx = re.compile(r'[^\[\]]+\[\d+\]')
    idx_rgx = re.compile(r'\d+(?=\])')
    key_rgx = re.compile(r'[^\[]+')
    for text, value in data.items():
        cur = results
        keys = text.split('.')
        idx = None
        for i, key in enumerate(keys):
            stop = (i == len(keys) - 1)
            if idx is not None:
                val = value if stop else {}
                if len(cur) > idx:
                    cur[idx] = {key: val}
                else:
                    for x in range(len(cur), idx + 1):
                        cur.append({key: val}) if x == idx else cur.append(None)
                if stop:
                    break
                else:
                    cur[idx].get(key)
                    idx = None
            if stop:
                cur[key] = value
                break
            elif re.match(list_rgx, key):
                idx = int(re.search(idx_rgx, key).group())
                key = re.search(key_rgx, key).group()
                cur.setdefault(key, [])
            else:
                cur.setdefault(key, {})
            cur = cur.get(key)
    print(results)
Output:
d = {"a.b.c[0].key1": 1, "a.b.c[1].key2": 2, "a.b.c[3].key3": 3}
unflatten(d)
# {'a': {'b': {'c': [{'key1': 1}, {'key2': 2}, None, {'key3': 3}]}}}

An array into two arrays in fast way. python

I want to split an array into two arrays depending on whether each object has a 'confirmation' key. Is there a faster way than the simple for loop I used? The array has a lot of elements, and I am concerned about performance.
Before
[
{
'id':'1'
},
{
'id':'2'
},
{
'id':'3',
'confirmation':'20',
},
{
'id':'4',
'confirmation':'10',
}
]
After
[{'id': 3, 'confirmation': 20}, {'id': 4, 'confirmation': 10}]
[{'id': 1}, {'id': 2}]
Implementation using for loop
$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31)
dict1 = {"id":1}
dict2 = {"id":2}
dict3 = {"id":3, "confirmation":20}
dict4 = {"id":4, "confirmation":10}
list = [dict1, dict2, dict3, dict4]
list_with_confirmation = []
list_without_confirmation = []
for d in list:
    if 'confirmation' in d:
        list_with_confirmation.append(d)
    else:
        list_without_confirmation.append(d)
print(list_with_confirmation)
print(list_without_confirmation)
Update 1
This is the result on our real data. (3) is the fastest.
(1) 0.148394346
(2) 0.105772018
(3) 0.0339076519
_list = search()
logger.warning(time.time())  # 1504691716.5748231
list_with_confirmation = []
list_without_confirmation = []
for d in _list:
    if 'confirmation' in d:
        list_with_confirmation.append(d)
    else:
        list_without_confirmation.append(d)
logger.warning(len(list_with_confirmation))  # 69427
logger.warning(time.time())  # 1504691716.7232175 (0.148394346) --- (1)
list_with_confirmation = [d for d in _list if 'confirmation' in d]
list_without_confirmation = [d for d in _list if 'confirmation' not in d]
logger.warning(len(list_with_confirmation))  # 69427
logger.warning(time.time())  # 1504691716.8289895 (0.105772018) --- (2)
lists = ([], [])
[lists['confirmation' in d].append(d) for d in _list]
logger.warning(len(lists[1]))  # 69427
logger.warning(time.time())  # 1504691716.8628972 (0.0339076519) --- (3)
I could not work out how to use timeit in my environment. Sorry for the crude benchmark.
List comprehension might be slightly faster:
list_with_confirmation = [d for d in list if "confirmation" in d]
list_without_confirmation = [d for d in list if "confirmation" not in d]
Refer to Why is list comprehension so faster?
Probably it is the fastest way, but you could try another:
lists = ([], [])
for d in source_list:
    lists['confirmation' in d].append(d)
or even:
lists = ([], [])
[lists['confirmation' in d].append(d) for d in source_list]
This way lists[0] will be "without confirmation" and lists[1] will be "with confirmation". Do your own benchmarks.
Side note: don't name a variable list, as that shadows the built-in list type.
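For anyone who wants to run their own comparison, here is a minimal timeit sketch covering the three approaches discussed in this thread. The synthetic input, the function names and the repetition count are assumptions chosen for illustration; the timings depend on the machine and the data, so no expected numbers are shown.

```python
import timeit

# Synthetic input (an assumption): every third dict has a 'confirmation' key.
source_list = [
    {'id': i, 'confirmation': i} if i % 3 == 0 else {'id': i}
    for i in range(10000)
]

def with_loop(items):
    with_conf, without_conf = [], []
    for d in items:
        if 'confirmation' in d:
            with_conf.append(d)
        else:
            without_conf.append(d)
    return with_conf, without_conf

def with_comprehensions(items):
    # Two passes over the input, one per output list.
    return ([d for d in items if 'confirmation' in d],
            [d for d in items if 'confirmation' not in d])

def with_bool_index(items):
    # False indexes lists[0], True indexes lists[1].
    lists = ([], [])
    for d in items:
        lists['confirmation' in d].append(d)
    return lists[1], lists[0]

for fn in (with_loop, with_comprehensions, with_bool_index):
    seconds = timeit.timeit(lambda: fn(source_list), number=20)
    print(fn.__name__, seconds)
```

Passing a callable to timeit.timeit avoids the setup-string imports, and keeping print calls out of the timed functions means only the splitting itself is measured.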
If you run the code below:
import timeit

dict1 = {"id":1}
dict2 = {"id":2}
dict3 = {"id":3, "confirmation":20}
dict4 = {"id":4, "confirmation":10}
_list = [dict1, dict2, dict3, dict4]

def fun(_list):
    list_with_confirmation = []
    list_without_confirmation = []
    for d in _list:
        if 'confirmation' in d:
            list_with_confirmation.append(d)
        else:
            list_without_confirmation.append(d)
    print(list_with_confirmation)
    print(list_without_confirmation)

def my_fun(_list):
    list_with_confirmation = [d for d in _list if 'confirmation' in d]
    list_without_confirmation = [d for d in _list if 'confirmation' not in d]
    print(list_with_confirmation)
    print(list_without_confirmation)

if __name__ == '__main__':
    print(timeit.timeit("fun(_list)", setup="from __main__ import fun, _list",number=1))
    print(timeit.timeit("my_fun(_list)", setup="from __main__ import my_fun, _list",number=1))
You get the following output:
[{'confirmation': 20, 'id': 3}, {'confirmation': 10, 'id': 4}]
[{'id': 1}, {'id': 2}]
5.41210174561e-05
[{'confirmation': 20, 'id': 3}, {'confirmation': 10, 'id': 4}]
[{'id': 1}, {'id': 2}]
2.40802764893e-05
In this run the list comprehension version was faster, though with number=1 and print calls inside the timed functions this is only a rough measurement. For more detail, see: blog

python list of dictionaries find duplicates based on value

I have a list of dicts:
a =[{'id': 1,'desc': 'smth'},
{'id': 2,'desc': 'smthelse'},
{'id': 1,'desc': 'smthelse2'},
{'id': 1,'desc': 'smthelse3'}]
I would like to go through the list, find the dicts that have the same id value (e.g. id=1), and create a new list of dicts:
b = [{'id':1, 'desc' : [smth, smthelse2,smthelse3]},
{'id': 2, 'desc': 'smthelse'}]
You can try:
import operator, itertools
key = operator.itemgetter('id')
b = [{'id': x, 'desc': [d['desc'] for d in y]}
     for x, y in itertools.groupby(sorted(a, key=key), key=key)]
It is better to keep the "desc" values as lists everywhere, even if they contain only a single element. This way you can do
for d in b:
    print(d['id'])
    for desc in d['desc']:
        print(desc)
This loop would also run if a "desc" value were a plain string, but it would print the individual characters, which is not what you want.
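To make the point about strings concrete, here is a tiny illustrative sketch (the names as_list and as_chars are made up) of what iteration yields in each case:

```python
# Iterating a one-element list yields the whole description once.
as_list = [desc for desc in ['smthelse']]

# Iterating a bare string yields one item per character.
as_chars = [desc for desc in 'smthelse']

print(as_list)   # ['smthelse']
print(as_chars)  # ['s', 'm', 't', 'h', 'e', 'l', 's', 'e']
```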
And now the solution giving you a list of dicts of lists:
a =[{'id': 1,'desc': 'smth'},{'id': 2,'desc': 'smthelse'},{'id': 1,'desc': 'smthelse2'},{'id': 1,'desc': 'smthelse3'}]
c = {}
for d in a:
    c.setdefault(d['id'], []).append(d['desc'])
b = [{'id': k, 'desc': v} for k,v in c.items()]
b is now:
[{'desc': ['smth', 'smthelse2', 'smthelse3'], 'id': 1},
{'desc': ['smthelse'], 'id': 2}]
from collections import defaultdict
d = defaultdict(list)
for x in a:
    d[x['id']].append(x['desc'])  # group description by id
b = [dict(id=id, desc=desc if len(desc) > 1 else desc[0])
     for id, desc in d.items()]
To preserve order:
b = []
for id in (x['id'] for x in a):
    desc = d[id]
    if desc:
        b.append(dict(id=id, desc=desc if len(desc) > 1 else desc[0]))
        del d[id]
