python: flat list of dict values

I have a list of dicts like so:
a = [ {'list':[1,2,3]}, {'list':[1,4,5]} ]
I'm trying to get a flat set of the values under the 'list' key, like {1, 2, 3, 4, 5}. What's the quickest way?

You can write a loop like:
result = set()
for row in a:
    result.update(row['list'])
which I think will work reasonably fast.
Or you can use a set comprehension, which gives the following one-liner:
result = {x for row in a for x in row['list']}
In case not all elements contain a 'list' key, you can use .get(..) with an empty tuple as the default (a tuple is cheaper to construct than a list):
result = {x for row in a for x in row.get('list',())}
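As a quick check, both approaches give the same result on the sample data (a sketch, with one extra dict lacking the key to exercise .get):

```python
# Sample data from the question, plus one dict without a 'list' key
a = [{'list': [1, 2, 3]}, {'list': [1, 4, 5]}, {'other': [9]}]

# Loop version
result_loop = set()
for row in a:
    result_loop.update(row.get('list', ()))

# Set-comprehension version
result_comp = {x for row in a for x in row.get('list', ())}

print(result_loop)                # {1, 2, 3, 4, 5}
print(result_comp == result_loop) # True
```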

It is not clear what your definition of "quickest" is, but whether it is speed or number of lines I would use a combination of itertools and a generator.
>>> import itertools
>>> a = [ {'list':[1,2,3]}, {'list':[1,4,5]} ]
>>> b = set(itertools.chain.from_iterable(x['list'] for x in a if 'list' in x))
Note that I have added a guard against any elements that may not contain a 'list' key; you can omit that if you know this will always be true.
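A quick sanity check of this on the sample data (a sketch, with one extra dict lacking the 'list' key to exercise the guard):

```python
import itertools

a = [{'list': [1, 2, 3]}, {'list': [1, 4, 5]}, {'name': 'no list here'}]

# The generator skips dicts without a 'list' key; chain.from_iterable
# flattens the remaining lists into one stream of values.
b = set(itertools.chain.from_iterable(x['list'] for x in a if 'list' in x))
print(b)  # {1, 2, 3, 4, 5}
```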

A flat set can be made through reduce easily.
All you need is to use an initializer - the third argument to the reduce function.
reduce(
    lambda _set, _dict, key='list': _set.update(
        _dict.get(key) or set()) or _set,
    a,
    set())
The above code works for both Python 2 and Python 3, but on Python 3 you need to import reduce with from functools import reduce; see the functools documentation for details.
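A quick demonstration of the reduce version on the sample data (a sketch; the functools import is only needed on Python 3):

```python
from functools import reduce  # needed on Python 3

a = [{'list': [1, 2, 3]}, {'list': [1, 4, 5]}]

# set.update returns None, so `... or _set` hands the accumulator
# back to reduce on every step.
result = reduce(
    lambda _set, _dict, key='list': _set.update(
        _dict.get(key) or set()) or _set,
    a,
    set())
print(result)  # {1, 2, 3, 4, 5}
```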

Related

Iteration over a sub-list of dictionaries - Python

I have a list of dictionaries e.g.:
list_d = [{"a":1},{"b":2,"c":3}]
(case 1)
for item in list_d:
    # add values of each sub-list's dicts
(case 2)
for item in list_d[1]:
    # add values of the specific sub-list of dict
Case 1 returns the sum of each sub-list's dict values.
Case 2 returns only the keys of the dictionary.
Is there an efficient way to get at the dictionaries of the sub-list (case 2) so as to add up the values?
Here's one way to do it (on Python 3, first do from functools import reduce):
reduce(lambda x, y: x + sum(y.values()), list_d, 0)
That is, starting with 0 (as the first x), add the sum of all values in each dict within list_d.
Here's another way:
sum(sum(x.values()) for x in list_d)
That is, sum the (sum of values for each dict in list_d).
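A quick check that both expressions give the same total on the sample data (a sketch):

```python
from functools import reduce  # needed on Python 3

list_d = [{"a": 1}, {"b": 2, "c": 3}]

# Running total: add each dict's value-sum to the accumulator x
total_reduce = reduce(lambda x, y: x + sum(y.values()), list_d, 0)
# Same thing as a sum over a generator of per-dict sums
total_sum = sum(sum(x.values()) for x in list_d)

print(total_reduce, total_sum)  # 6 6
```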
As Antti points out, it is unclear what you are asking for. I would recommend having a look at Python's built-in tools for functional programming.
Consider the following examples:
from operator import add
list_d = [{"a":1},{"b":2,"c":3}]
case_1 = map(lambda d: sum(d.values()), list_d)
case_2 = reduce(add, map(lambda d: sum(d.values()), list_d))
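On Python 3, map returns a lazy iterator and reduce lives in functools, so a runnable sketch of the same two cases looks like:

```python
from functools import reduce
from operator import add

list_d = [{"a": 1}, {"b": 2, "c": 3}]

# Case 1: per-dict sums (materialized into a list so it can be printed)
case_1 = list(map(lambda d: sum(d.values()), list_d))
# Case 2: grand total across all dicts
case_2 = reduce(add, map(lambda d: sum(d.values()), list_d))

print(case_1)  # [1, 5]
print(case_2)  # 6
```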

Using Lambda with lists

I am trying to check if a string object is in a list. Simply written as:
if str in list:
The problem I am facing is that this list, is not a list of strings, but a list of tables. I understand that nothing is going to happen if I do this comparison directly. What I would like to do is access an attribute of each of these tables called 'Name'.
I could create a new list, and do my comparison against that:
newList = []
for i in list:
    newList.append(i.Name)
But as I am still a newbie, I am curious about Lambda's and wondered if it would be possible to implement that instead?
something like (... but probably nothing like):
if str in list (lambda x: x.Name):
You can write
if str in [x.Name for x in list]
Or more lazily,
if str in (x.Name for x in list)
The latter (with parentheses) builds a generator, while the former (with brackets) first builds the full list.
Lambdas aren't really needed here. You can just check it directly:
for table in my_list:
    if string in table.Name:
        # do stuff
Or using list comprehension, if you want it that way:
if string in [table.Name for table in my_list]:
    # do interesting stuff
More efficiently, as @Tim suggested, use a generator expression:
if string in (table.Name for table in my_list):
But if you insist on using lambdas:
names = map(lambda table: table.Name, my_list)
if string in names:
    # do amazing stuff!
Here's a little demo:
>>> class test():
...     def __init__(self, name):
...         self.Name = name
...
>>> my_list = [test(n) for n in ('a', 'b', 'c')]
>>> l = list(map(lambda table: table.Name, my_list))  # converted to list, it's printable
>>> l
['a', 'b', 'c']
Also, avoid using the names of built-in functions such as str or list for variable names. It will shadow them!
Hope this helps!
I guess you're looking for any:
if any(x.Name == s for x in lst):
    ...
If the list is not large and you need these names somewhere else, you can create a list or a set of names:
names = {x.Name for x in lst}
if s in names:
    ...
The lambda you wrote already exists in Python as attrgetter (in the operator module):
from operator import attrgetter
names = map(attrgetter('Name'), lst)
Note that comprehensions are usually preferred to that.
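For example, with a tiny stand-in class for the tables (hypothetical, just for illustration):

```python
from operator import attrgetter

# Hypothetical stand-in for the table objects from the question
class Table:
    def __init__(self, name):
        self.Name = name

tables = [Table('users'), Table('orders')]

# attrgetter('Name') is equivalent to lambda x: x.Name
names = list(map(attrgetter('Name'), tables))
print(names)             # ['users', 'orders']
print('users' in names)  # True
```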
You can use filter:
>>> foo = ["foo","bar","f","b"]
>>> list(filter(lambda x: "f" in x, foo))
['foo', 'f']
Update: I keep this answer because someone may come here for lambdas, but for this problem @arbautjc's answer is better.

Python list of tuples, need to unpack and clean up

Assume you have a list such as
x = [('Edgar',), ('Robert',)]
What would be the most efficient way to get to just the strings 'Edgar' and 'Robert'?
Don't really want x[0][0], for example.
Easy solution, and the fastest in most cases.
[item[0] for item in x]
#or
[item for (item,) in x]
Alternatively if you need a functional interface to index access (but slightly slower):
from operator import itemgetter
zero_index = itemgetter(0)
print(list(map(zero_index, x)))
Finally, if your sequence is too large to fit in memory, you can do this iteratively. This is much slower on collections but uses only one item's worth of memory.
from itertools import chain
x = [('Edgar',), ('Robert',)]
# list() is used to materialize the entire sequence.
# Normally you would use this in a for loop with no list() call.
print(list(chain.from_iterable(x)))
But if all you are going to do is iterate anyway, you can also just use tuple unpacking:
for (item,) in x:
    myfunc(item)
This is pretty straightforward with a list comprehension:
x = [('Edgar',), ('Robert',)]
y = [s for t in x for s in t]
This does the same thing as list(itertools.chain.from_iterable(x)) and is equivalent in behavior to the following code:
y = []
for t in x:
for s in t:
y.append(s)
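A quick check that the comprehension and the itertools version agree:

```python
from itertools import chain

x = [('Edgar',), ('Robert',)]

# Nested comprehension: for each tuple t in x, yield each element s of t
y_comp = [s for t in x for s in t]
# Equivalent flattening via itertools
y_chain = list(chain.from_iterable(x))

print(y_comp)             # ['Edgar', 'Robert']
print(y_comp == y_chain)  # True
```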
I need to send this string to another function.
If your intention is just to call a function for each string in the list, then there's no need to build a new list, just do...
def my_function(s):
    pass  # do the thing with 's'
x = [('Edgar',), ('Robert',)]
for (item,) in x:
my_function(item)
...or if you're prepared to sacrifice readability for performance, I suspect it's quickest to do...
def my_function(t):
    s = t[0]
    # do the thing with 's'
    return None
x = [('Edgar',), ('Robert',)]
filter(my_function, x)
Both map() and filter() will do the iteration in C, rather than Python bytecode, but map() will need to build a list of values the same length as the input list, whereas filter() will only build an empty list, as long as my_function() returns a 'falsish' value. (Note this applies to Python 2; in Python 3, both map() and filter() are lazy, so the filter() call above would never actually invoke my_function().)
Here is one way:
>>> [name for name, in x]
['Edgar', 'Robert']
Note the placement of the comma, which unpacks the tuple.
>>> from operator import itemgetter
>>> y = list(map(itemgetter(0), x))
>>> y
['Edgar', 'Robert']
>>> y[0]
'Edgar'
>>> y[1]
'Robert'

An interesting code for obtaining unique values from a list

Say given a list s = [2,2,2,3,3,3,4,4,4]
I saw the following code being used to obtain unique values from s:
unique_s = sorted(unique(s))
where unique is defined as:
def unique(seq):
    # not order preserving
    set = {}
    map(set.__setitem__, seq, [])
    return set.keys()
I'm just curious to know if there is any difference between this and just doing list(set(s))? Both result in a mutable object with the same values.
I'm guessing this code is faster since it's only looping once rather than twice in the case of type conversion?
You should use the code you describe:
list(set(s))
This works on all Pythons from 2.4 (I think) to 3.3, is concise, and uses built-ins in an easy-to-understand way.
The function unique appears to be designed to work even when set is not a built-in, which is the case for Python 2.3. Python 2.3 is fairly ancient (2003). The unique function is also broken on the Python 3.x series, since dict.keys returns a view object in Python 3.x.
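A quick illustration on Python 3 (the sort is there because a set has no defined order):

```python
s = [2, 2, 2, 3, 3, 3, 4, 4, 4]

# The simple, modern approach
unique_s = sorted(set(s))
print(unique_s)  # [2, 3, 4]

# On Python 3, dict.keys() returns a view object rather than a list,
# which is why the legacy unique() function no longer works as written.
print(type({}.keys()))  # <class 'dict_keys'>
```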
For a sorted sequence you could use itertools unique_justseen() recipe to get unique values while preserving order:
from itertools import groupby
from operator import itemgetter
print(list(map(itemgetter(0), groupby([2,2,2,3,3,3,4,4,4]))))
# -> [2, 3, 4]
To remove duplicate items from a sorted sequence inplace (to leave only unique values):
def del_dups(sorted_seq):
    prev = object()
    pos = 0
    for item in sorted_seq:
        if item != prev:
            prev = item
            sorted_seq[pos] = item
            pos += 1
    del sorted_seq[pos:]
L = [2,2,2,3,3,3,4,4,4]
del_dups(L)
print(L)  # -> [2, 3, 4]

python: union keys from multiple dictionaries?

I have 5 dictionaries and I want a union of their keys.
alldict = [dict1, dict2, dict3, dict4, dict5]
I tried
allkey = reduce(lambda x, y: set(x.keys()).union(y.keys()), alldict)
but it gave me an error
AttributeError: 'set' object has no attribute 'keys'
Am I doing it wrong? I'm using a normal for loop instead, but I wonder why the above code didn't work.
I think @chuck already answered the question of why it doesn't work, but a simpler way to do this would be to remember that the union method can take multiple arguments:
allkey = set().union(*alldict)
does what you want without any loops or lambdas.
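For example, with some small hypothetical dicts:

```python
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
dict3 = {'d': 5}
alldict = [dict1, dict2, dict3]

# Iterating over a dict yields its keys, so union() can take
# the dicts themselves as arguments.
allkey = set().union(*alldict)
print(allkey == {'a', 'b', 'c', 'd'})  # True
```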
Your solution works for the first two elements in the list, but then dict1 and dict2 get reduced into a set, and that set is passed to your lambda as x. Now x no longer has the method keys().
The solution is to make x a set from the very beginning by initializing the reduction with an empty set (which happens to be the neutral element of the union).
Try it with an initializer:
allkey = reduce(lambda x, y: x.union(y.keys()), alldict, set())
An alternative without any lambdas would be:
allkey = reduce(set.union, map(set, map(dict.keys, alldict)))
A simple strategy for non-functional neurons (pun intended):
allkey = []
for dictio in alldict:
    for key in dictio:
        allkey.append(key)
allkey = set(allkey)
We can convert this code to a much shorter form using a set comprehension:
allkey = {key for dictio in alldict for key in dictio}
This one-liner is still very readable in comparison with the conventional for loop.
The key to converting a nested loop into a list or set comprehension is to write the inner loop (the one that varies fastest in the nested loop) as the last for clause (that is, for key in dictio).
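A quick check that the loop and the comprehension agree, using small hypothetical dicts:

```python
alldict = [{'a': 1}, {'b': 2, 'c': 3}, {'a': 4, 'd': 5}]

# Loop version
allkey_loop = []
for dictio in alldict:
    for key in dictio:
        allkey_loop.append(key)
allkey_loop = set(allkey_loop)

# Set-comprehension version: inner loop comes last
allkey_comp = {key for dictio in alldict for key in dictio}

print(allkey_comp == allkey_loop == {'a', 'b', 'c', 'd'})  # True
```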
set().union(dict1.keys(), dict2.keys(), ...)
I tried passing the list itself and it didn't work, so I'm just putting this up here for anyone.
Just one more way, 'cause what the hay:
a={}; [ a.update(b) for b in alldict ] and a.keys()
or the slightly-more-mysterious
reduce(lambda a, b: a.update(b) or a, alldict, {}).keys()
(I'm bummed that there's no built-in function equivalent to
def f(a, b):
    r = {}
    r.update(a)
    r.update(b)
    return r
is there?)
If you only want to union the keys of 2 dicts, you can use the | operator.
Quote from docs:
Return a new set with elements from the set and all others.
Example:
all_keys = (dict1.keys() | dict2.keys())
