Python: iterating through a dictionary with list values
Given a dictionary of lists, such as
d = {'1':[11,12], '2':[21,21]}
Which is more pythonic or otherwise preferable:
for k in d:
    for x in d[k]:
        pass  # whatever with k, x
or
for k, dk in d.iteritems():
    for x in dk:
        pass  # whatever with k, x
or is there something else to consider?
EDIT: in case a list might be useful (e.g., standard dicts don't preserve insertion order before Python 3.7), this might be appropriate, although it's much slower:
d2 = list(d.items())
for k, dk in d2:
    for x in dk:
        pass  # whatever with k, x
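For readers on Python 3: `dict.iteritems()` no longer exists there, and `d.items()` returns a lightweight view rather than a list, so the second form above becomes `for k, dk in d.items():`. The following is a minimal Python 3 sketch using the question's example dict. The three helper names are made up for illustration. It checks that all three forms visit the same (k, x) pairs:

```python
d = {'1': [11, 12], '2': [21, 21]}

def pairs_by_key_lookup(d):
    # Form 1: iterate over keys and look each value list up again
    out = []
    for k in d:
        for x in d[k]:
            out.append((k, x))
    return out

def pairs_by_items(d):
    # Form 2: Python 3 spelling of d.iteritems(); items() is a view, not a copy
    out = []
    for k, dk in d.items():
        for x in dk:
            out.append((k, x))
    return out

def pairs_by_item_list(d):
    # Form 3: materialize the (key, list) pairs first, then loop over them
    out = []
    for k, dk in list(d.items()):
        for x in dk:
            out.append((k, x))
    return out

print(pairs_by_items(d))
```

On Python 3.7+ the pairs come out in insertion order, so all three lists compare equal element by element.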
Here's a speed test, why not:
import random
numEntries = 1000000
d = dict(zip(range(numEntries), [random.sample(range(0, 100), 2) for x in range(numEntries)]))
def m1(d):
    for k in d:
        for x in d[k]:
            pass
def m2(d):
    for k, dk in d.iteritems():
        for x in dk:
            pass
import cProfile
cProfile.run('m1(d)')
print
cProfile.run('m2(d)')
# Ran 3 trials:
# m1: 0.205, 0.194, 0.193: average 0.197 s
# m2: 0.176, 0.166, 0.173: average 0.172 s
# Method 1 takes 15% more time than method 2
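To rerun the comparison on Python 3, where `iteritems()` is gone, here is a minimal `timeit` sketch. The smaller dict size and the repeat counts are arbitrary choices, and `compare` is a hypothetical helper. Results depend on the machine and the interpreter version, so no numbers are claimed here:

```python
import random
import timeit

def m1(d):
    for k in d:
        for x in d[k]:
            pass

def m2(d):
    for k, dk in d.items():   # Python 3 replacement for d.iteritems()
        for x in dk:
            pass

def compare(num_entries=100_000, repeat=3):
    d = {i: random.sample(range(100), 2) for i in range(num_entries)}
    results = {}
    for fn in (m1, m2):
        # best of `repeat` single runs, in seconds
        results[fn.__name__] = min(timeit.repeat(lambda: fn(d), number=1, repeat=repeat))
    return results

if __name__ == '__main__':
    print(compare())
```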
cProfile example output:
3 function calls in 0.194 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.194 0.194 <string>:1(<module>)
1 0.194 0.194 0.194 0.194 stackoverflow.py:7(m1)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
4 function calls in 0.179 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.179 0.179 <string>:1(<module>)
1 0.179 0.179 0.179 0.179 stackoverflow.py:12(m2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
I considered a couple methods:
import itertools
COLORED_THINGS = {'blue': ['sky', 'jeans', 'powerline insert mode'],
                  'yellow': ['sun', 'banana', 'phone book/monitor stand'],
                  'red': ['blood', 'tomato', 'test failure']}
def forloops():
    """ Nested for loops. """
    for color, things in COLORED_THINGS.items():
        for thing in things:
            pass
def iterator():
    """ Use itertools and list comprehension to construct iterator. """
    for color, thing in (
            itertools.chain.from_iterable(
                [itertools.product((k,), v) for k, v in COLORED_THINGS.items()])):
        pass
def iterator_gen():
    """ Use itertools and generator to construct iterator. """
    for color, thing in (
            itertools.chain.from_iterable(
                (itertools.product((k,), v) for k, v in COLORED_THINGS.items()))):
        pass
I used ipython and memory_profiler to test performance:
>>> %timeit forloops()
1000000 loops, best of 3: 1.31 µs per loop
>>> %timeit iterator()
100000 loops, best of 3: 3.58 µs per loop
>>> %timeit iterator_gen()
100000 loops, best of 3: 3.91 µs per loop
>>> %memit -r 1000 forloops()
peak memory: 35.79 MiB, increment: 0.02 MiB
>>> %memit -r 1000 iterator()
peak memory: 35.79 MiB, increment: 0.00 MiB
>>> %memit -r 1000 iterator_gen()
peak memory: 35.79 MiB, increment: 0.00 MiB
As you can see, the method had no observable impact on peak memory usage, but nested for loops were unbeatable for speed (not to mention readability).
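A small sketch to confirm that the itertools variants produce the same (color, thing) pairs as the nested loops, just more slowly in the timings above. `pairs_nested` and `pairs_itertools` are illustrative names and do not come from the answer:

```python
import itertools

COLORED_THINGS = {'blue': ['sky', 'jeans', 'powerline insert mode'],
                  'yellow': ['sun', 'banana', 'phone book/monitor stand'],
                  'red': ['blood', 'tomato', 'test failure']}

def pairs_nested(mapping):
    # Two for-clauses: outer loop over colors, inner loop over things
    return [(color, thing)
            for color, things in mapping.items()
            for thing in things]

def pairs_itertools(mapping):
    # product((k,), v) pairs one key with each of its values;
    # chain.from_iterable concatenates those per-key iterators lazily
    return list(itertools.chain.from_iterable(
        itertools.product((k,), v) for k, v in mapping.items()))

print(pairs_nested(COLORED_THINGS) == pairs_itertools(COLORED_THINGS))
```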
Here's the list comprehension approach, using a nested comprehension:
r = [[i for i in d[x]] for x in d.keys()]
print r
[[11, 12], [21, 21]]
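The comprehension above only copies each value list. The sketch below assumes Python 3.7+, so that the printed order follows the dict literal. It shows a shorter equivalent and also a flat variant for when a single list of items is what you need:

```python
d = {'1': [11, 12], '2': [21, 21]}

# Nested comprehension from the answer: copies each value list
r = [[i for i in d[x]] for x in d.keys()]

# Equivalent, without the per-key lookup or the explicit inner loop
r2 = [list(v) for v in d.values()]

# If a single flat list of the items is wanted instead
flat = [x for xs in d.values() for x in xs]

print(r2, flat)
```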
My results from Brionius code:
3 function calls in 0.173 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.173 0.173 <string>:1(<module>)
1 0.173 0.173 0.173 0.173 speed.py:5(m1)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
4 function calls in 0.185 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.185 0.185 <string>:1(<module>)
1 0.185 0.185 0.185 0.185 speed.py:10(m2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.000 0.000 0.000 0.000 {method 'iteritems' of 'dict' objects}
Related
ProcessPoolExecutor read-only variable speed discrepancy
I have two versions of a multiprocessing program using concurrent.futures.ProcessPoolExecutor (Python 3.6, Linux) with surprising speed discrepancies despite seemingly minor changes (one is ~3x slower than the other). Each child process executes a simple function that reads from a large dict (it does not alter it) and returns a result. The first version of the function passes the dict into executor.submit() as an argument. The second version of the function reads from the global dict directly.
Code samples
Variable passed in:
#!/usr/bin/env python3
import concurrent.futures, pstats, sys, cProfile
BIG_DICT = {i: 2*i for i in range(10000)}
def foo(d):
    return d[0]
with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
    tasks = [executor.submit(foo, BIG_DICT) for _ in range(100000)]
    for task in concurrent.futures.as_completed(tasks):
        task.result()
Global variable read from:
#!/usr/bin/env python3
import concurrent.futures, pstats, sys, cProfile
BIG_DICT = {i: 2*i for i in range(10000)}
def foo():
    return BIG_DICT[0]
with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
    tasks = [executor.submit(foo) for _ in range(100000)]
    for task in concurrent.futures.as_completed(tasks):
        task.result()
Ideas
I've profiled both versions of the program using cProfile, and the majority of execution time seems to be spent waiting for locks. The global version only waits for about 10 seconds, while the pass-in version waits for almost 80 seconds!
From what I understand, when a process is forked it should make a copy of its parent's memory. As the program is multiprocessed and BIG_DICT is never actually modified after creation, there shouldn't be any need for locking to maintain state consistency between submitting each process. Since BIG_DICT needs to be copied into the memory space of each child process in both versions, why is there so much discrepancy in execution time?
A couple of ideas I have floating around: an implementation detail of ProcessPoolExecutor, a GIL quirk, or some sort of Python runtime/OS optimisation.
Profiling results
Variable passed in:
7672287 function calls in 92.434 seconds
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
ncalls tottime percall cumtime percall filename:lineno(function)
460133 75.428 0.000 75.428 0.000 {method 'acquire' of '_thread.lock' objects}
100001 7.034 0.000 7.034 0.000 {built-in method posix.write}
100001 2.490 0.000 2.490 0.000 {method '__enter__' of '_multiprocessing.SemLock' objects}
100001 0.686 0.000 78.344 0.001 _base.py:196(as_completed)
90033 0.553 0.000 75.879 0.001 threading.py:263(wait)
100000 0.548 0.000 13.639 0.000 process.py:449(submit)
190033 0.366 0.000 0.713 0.000 _base.py:174(_yield_finished_futures)
100000 0.351 0.000 0.598 0.000 _base.py:312(__init__)
90033 0.327 0.000 76.335 0.001 threading.py:533(wait)
100001 0.261 0.000 7.617 0.000 connection.py:181(send_bytes)
480065 0.260 0.000 0.382 0.000 threading.py:239(__enter__)
100001 0.258 0.000 11.329 0.000 queues.py:339(put)
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
Function was called by...
ncalls tottime cumtime
{method 'acquire' of '_thread.lock' objects} <-
    90033 0.078 0.078 threading.py:251(_acquire_restore)
    190033 0.391 0.391 threading.py:254(_is_owned)
    180066 74.956 74.956 threading.py:263(wait)
    1 0.003 0.003 threading.py:1062(_wait_for_tstate_lock)
{built-in method posix.write} <-
    100001 7.034 7.034 connection.py:365(_send)
{method '__enter__' of '_multiprocessing.SemLock' objects} <-
    100001 2.490 2.490 synchronize.py:95(__enter__)
_base.py:196(as_completed) <-
threading.py:263(wait) <-
    90033 0.553 75.879 threading.py:533(wait)
process.py:449(submit) <-
    100000 0.548 13.639 local.py:13(<listcomp>)
_base.py:174(_yield_finished_futures) <-
    190033 0.366 0.713 _base.py:196(as_completed)
_base.py:312(__init__) <-
    100000 0.351 0.598 process.py:449(submit)
threading.py:533(wait) <-
    90032 0.327 76.334 _base.py:196(as_completed)
    1 0.000 0.001 threading.py:828(start)
connection.py:181(send_bytes) <-
    100001 0.261 7.617 queues.py:339(put)
threading.py:239(__enter__) <-
    100000 0.070 0.116 _base.py:174(_yield_finished_futures)
    100000 0.033 0.051 _base.py:405(result)
    100000 0.083 0.108 queue.py:115(put)
    90032 0.040 0.058 threading.py:523(clear)
    90033 0.034 0.050 threading.py:533(wait)
queues.py:339(put) <-
    100000 0.258 11.329 process.py:449(submit)
    1 0.000 0.000 process.py:499(shutdown)
Global variable read from:
5949819 function calls in 27.158 seconds
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
ncalls tottime percall cumtime percall filename:lineno(function)
160569 10.072 0.000 10.072 0.000 {method 'acquire' of '_thread.lock' objects}
100001 5.453 0.000 5.453 0.000 {method '__enter__' of '_multiprocessing.SemLock' objects}
100001 5.338 0.000 5.338 0.000 {built-in method posix.write}
100000 0.883 0.000 1.163 0.000 _base.py:312(__init__)
100000 0.477 0.000 15.671 0.000 process.py:449(submit)
100001 0.438 0.000 6.133 0.000 connection.py:181(send_bytes)
100001 0.304 0.000 12.921 0.000 queues.py:339(put)
100000 0.304 0.000 0.304 0.000 process.py:116(__init__)
100001 0.277 0.000 0.432 0.000 reduction.py:38(__init__)
100000 0.267 0.000 0.333 0.000 threading.py:334(notify)
100000 0.240 0.000 0.747 0.000 queue.py:115(put)
100006 0.238 0.000 0.280 0.000 threading.py:215(__init__)
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
Function was called by...
ncalls tottime cumtime
{method 'acquire' of '_thread.lock' objects} <-
    15142 0.007 0.007 threading.py:251(_acquire_restore)
    115142 0.038 0.038 threading.py:254(_is_owned)
    30284 10.022 10.022 threading.py:263(wait)
    1 0.004 0.004 threading.py:1062(_wait_for_tstate_lock)
{method '__enter__' of '_multiprocessing.SemLock' objects} <-
    100001 5.453 5.453 synchronize.py:95(__enter__)
{built-in method posix.write} <-
    100001 5.338 5.338 connection.py:365(_send)
_base.py:312(__init__) <-
    100000 0.883 1.163 process.py:449(submit)
process.py:449(submit) <-
    100000 0.477 15.671 global.py:13(<listcomp>)
connection.py:181(send_bytes) <-
    100001 0.438 6.133 queues.py:339(put)
queues.py:339(put) <-
    100000 0.304 12.921 process.py:449(submit)
    1 0.000 0.000 process.py:499(shutdown)
process.py:116(__init__) <-
    100000 0.304 0.304 process.py:449(submit)
reduction.py:38(__init__) <-
    100001 0.277 0.432 reduction.py:48(dumps)
threading.py:334(notify) <-
    100000 0.267 0.333 queue.py:115(put)
queue.py:115(put) <-
    100000 0.240 0.747 process.py:449(submit)
threading.py:215(__init__) <-
    100000 0.238 0.280 _base.py:312(__init__)
    3 0.000 0.000 queue.py:27(__init__)
    1 0.000 0.000 queues.py:67(_after_fork)
    2 0.000 0.000 threading.py:498(__init__)
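One likely contributor, assuming the default fork start method on Linux: ProcessPoolExecutor pickles the function and arguments of every submitted task and sends them to a worker over a pipe. The pass-in version therefore serializes BIG_DICT once per task (100,000 times). The global version depends only on each of the 10 workers inheriting BIG_DICT when it is forked. cProfile.run profiles only the calling thread. Serialization done in the executor's background threads would therefore most likely appear in the main thread only indirectly, as time spent waiting on locks. The sketch below measures the argument payload only, with the hypothetical helper `payload_sizes`; it is not a full analysis:

```python
import pickle

BIG_DICT = {i: 2 * i for i in range(10000)}

def payload_sizes(n_tasks=100_000):
    # Approximate bytes pickled for the positional args of one submit() call.
    # The function itself is pickled by reference in both versions, so it is left out.
    passed_in = len(pickle.dumps((BIG_DICT,)))   # submit(foo, BIG_DICT)
    global_read = len(pickle.dumps(()))          # submit(foo)
    return {
        'per_task_passed_in': passed_in,
        'per_task_global': global_read,
        'total_passed_in_MB': passed_in * n_tasks / 1e6,
    }

if __name__ == '__main__':
    print(payload_sizes())
```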
Why does Python take more time to process a sorted list than an unsorted list?
Example:
import cProfile, random, copy
def foo(lIn):
    return [i*i for i in lIn]
lIn = [random.random() for i in range(1000000)]
lIn1 = copy.copy(lIn)
lIn2 = sorted(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn2)')
Result:
3 function calls in 0.075 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 0.075 0.075 :1()
1 0.070 0.070 0.070 0.070 test.py:716(foo)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
3 function calls in 0.143 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.006 0.006 0.143 0.143 :1()
1 0.137 0.137 0.137 0.137 test.py:716(foo)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Not really an answer yet, but the comment margin is a bit too small for this. As random.shuffle() would yield the same result (a slower second run), I decided to implement my own shuffle function and vary the number of times I'd shuffle. (In the example below, it's the parameter to xrange, 300000.)
def my_shuffle(array):
    for _ in xrange(300000):
        rand1 = random.randint(0, 999999)
        rand2 = random.randint(0, 999999)
        array[rand1], array[rand2] = array[rand2], array[rand1]
The other code is pretty much unmodified:
import cProfile, random, copy
def foo(lIn):
    return [i*i for i in lIn]
lIn = [random.random()*100000 for i in range(1000000)]
lIn1 = copy.copy(lIn)
my_shuffle(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn1)')
The results I got for the second cProfile depended on the number of times I shuffled:
Shuffle iterations  Time (s)
10000 0.062
100000 0.082
200000 0.099
400000 0.122
800000 0.137
8000000 0.141
10000000 0.141
100000000 0.248
It looks like the more you mess an array up, the longer operations take, up to a certain point. (I don't know about the last result. It took so long that I did some light other stuff in the background and don't really want to retry.)
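Here is a Python 3 sketch of the same experiment that lets you vary the number of swaps within one run. The `swaps` parameter and the `time_after_swaps` helper are illustrative names of my own. Timings depend heavily on the machine, so none are claimed. The swaps only reorder references: the float objects stay where they were allocated, so iteration visits them in an increasingly scattered memory order. That is a plausible cause of the slowdown, but it is not verified here:

```python
import random
import timeit

def my_shuffle(array, swaps, rng=random):
    # Swap `swaps` random pairs of positions in place (a partial shuffle)
    n = len(array)
    for _ in range(swaps):
        i = rng.randrange(n)
        j = rng.randrange(n)
        array[i], array[j] = array[j], array[i]

def foo(lIn):
    return [i * i for i in lIn]

def time_after_swaps(swaps, size=1_000_000, number=3):
    lIn = [random.random() * 100000 for _ in range(size)]
    lIn1 = list(lIn)          # shallow copy: same float objects, same order
    my_shuffle(lIn1, swaps)
    base = min(timeit.repeat(lambda: foo(lIn), number=1, repeat=number))
    shuffled = min(timeit.repeat(lambda: foo(lIn1), number=1, repeat=number))
    return base, shuffled

if __name__ == '__main__':
    for swaps in (10_000, 100_000, 400_000, 800_000):
        print(swaps, time_after_swaps(swaps))
```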
Most efficient way to create an array of cos and sin in Numpy
I need to store an array of size n with values of cos(x) and sin(x), let's say
array[[cos(0.9), sin(0.9)], [cos(0.35), sin(0.35)], ...]
The arguments of each pair of cos and sin are given by random choice. My code, as far as I have been improving it, is like this:
import numpy as np
def randvector(n):
    """ Generates random direction for n junctions in the unit circle """
    x = np.empty([n,2])
    theta = 2 * np.pi * np.random.random_sample((n))
    x[:,0] = np.cos(theta)
    x[:,1] = np.sin(theta)
    return x
Is there a shorter way or more effective way to achieve this?
Your code is efficient enough, and justhalf's answer is not bad, I think. For something both efficient and short, how about this code?
def randvector(n):
    theta = 2 * np.pi * np.random.random_sample((n))
    return np.vstack((np.cos(theta), np.sin(theta))).T
UPDATE: Appended cProfile results.
justhalf's:
5 function calls in 4.707 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.001 0.001 4.707 4.707 <string>:1(<module>)
1 2.452 2.452 4.706 4.706 test.py:6(randvector1)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.010 0.010 0.010 0.010 {method 'random_sample' of 'mtrand.RandomState' objects}
1 2.244 2.244 2.244 2.244 {numpy.core.multiarray.array}
OP's:
5 function calls in 0.088 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.088 0.088 <string>:1(<module>)
1 0.079 0.079 0.088 0.088 test.py:9(randvector2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.009 0.009 0.009 0.009 {method 'random_sample' of 'mtrand.RandomState' objects}
1 0.000 0.000 0.000 0.000 {numpy.core.multiarray.empty}
mine:
21 function calls in 0.087 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.087 0.087 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 numeric.py:322(asanyarray)
1 0.000 0.000 0.002 0.002 shape_base.py:177(vstack)
2 0.000 0.000 0.000 0.000 shape_base.py:58(atleast_2d)
1 0.076 0.076 0.087 0.087 test.py:17(randvector3)
6 0.000 0.000 0.000 0.000 {len}
1 0.000 0.000 0.000 0.000 {map}
2 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1 0.009 0.009 0.009 0.009 {method 'random_sample' of 'mtrand.RandomState' objects}
2 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array}
1 0.002 0.002 0.002 0.002 {numpy.core.multiarray.concatenate}
Your code already looks fine to me, but here are a few more thoughts.
Here's a one-liner. It is marginally slower than your version.
def randvector2(n):
    return np.exp((2.0j * np.pi) * np.random.rand(n, 1)).view(dtype=np.float64)
I get these timings for n=10000:
Yours: 1000 loops, best of 3: 716 µs per loop
My shortened version: 1000 loops, best of 3: 834 µs per loop
Now if speed is a concern, your approach is really very good. Another answer shows how to use hstack. That works well.
Here is another version that is just a little different from yours and is marginally faster:
def randvector3(n):
    x = np.empty([n,2])
    theta = (2 * np.pi) * np.random.rand(n)
    np.cos(theta, out=x[:,0])
    np.sin(theta, out=x[:,1])
    return x
This gives me the timing:
1000 loops, best of 3: 698 µs per loop
If you have access to numexpr, the following is faster (at least on my machine):
import numexpr as ne
def randvector4(n):
    sample = np.random.rand(n, 1)
    c = 2.0j * np.pi
    return ne.evaluate('exp(c * sample)').view(dtype=np.float64)
This gives me the timing:
1000 loops, best of 3: 366 µs per loop
Honestly though, if I were writing this for anything that wasn't extremely performance intensive, I'd do pretty much the same thing you did. It makes your intent pretty clear to the reader. The version with hstack works well too.
Another quick note: when I run timings for n=10, my one-line version is fastest. When I do n=10000000, the fast pure-numpy version is fastest.
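Why the one-liner works: exp(iθ) = cos θ + i sin θ, and NumPy stores each complex128 value as two adjacent float64 values (real part, then imaginary part). Viewing an (n, 1) complex array as float64 therefore yields an (n, 2) array whose rows are [cos θ, sin θ]. The small sketch below checks that the variants above agree when given the same angles. The function names are illustrative, and `theta` is passed in instead of drawn inside each function, which keeps the results comparable:

```python
import numpy as np

def with_empty(theta):
    # OP's approach: preallocate, then fill each column
    x = np.empty([theta.size, 2])
    x[:, 0] = np.cos(theta)
    x[:, 1] = np.sin(theta)
    return x

def with_out(theta):
    # Write the ufunc results straight into the column views
    x = np.empty([theta.size, 2])
    np.cos(theta, out=x[:, 0])
    np.sin(theta, out=x[:, 1])
    return x

def with_vstack(theta):
    return np.vstack((np.cos(theta), np.sin(theta))).T

def with_complex_view(theta):
    # exp(1j*theta) = cos(theta) + 1j*sin(theta); each complex128 is two float64s
    return np.exp(1j * theta.reshape(-1, 1)).view(np.float64)

theta = 2 * np.pi * np.random.random_sample(5)
print(np.allclose(with_complex_view(theta), with_empty(theta)))
```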
You can use a list comprehension to make the code a little bit shorter:
def randvector(n):
    return np.array([(np.cos(theta), np.sin(theta)) for theta in 2*np.pi*np.random.random_sample(n)])
But, as IanH mentioned in comments, this is slower. In fact, in my experiment, this is 5x slower, because it doesn't take advantage of NumPy vectorization.
So to answer your questions:
Is there a shorter way? Yes, which is what I give in this answer, although it's only shorter by a few characters (but it saves many lines!)
Is there a more effective (I believe you meant "efficient") way? I believe the answer to this question, without overly complicating the code, is no, since numpy already optimizes the vectorization (assigning the cos and sin values to the array).
Timing
Comparing various methods:
OP's randvector: 0.002131 s
My randvector: 0.013218 s
mskimm's randvector: 0.003175 s
So it seems that mskimm's randvector looks good in terms of code length and efficiency =D
Python: Most efficient way to toggle "verbose" output?
So, I have a script with a lot of debugging output that I can toggle on/off with a -v flag. My current code looks like this:
def vprint( obj ):
    if args.verbose:
        print obj
However, I'm thinking this is inefficient since every time I call vprint(), it has to jump to that function and check the value of args.verbose. I came up with this, which should be slightly more efficient:
if args.verbose:
    def vprint( obj ):
        print obj
else:
    def vprint( obj ):
        pass
While the if is now removed, it still has to jump to that function. So I was wondering if there was a way to define vprint as something like a function pointer that goes nowhere, so it could skip that altogether? Or is Python smart enough to know not to waste time on a function that's just pass?
Unless your performance analysis has led you here, it's probably not worth optimizing. A quick set of tests yields a minor improvement over 1000000 iterations (0.040 s less time spent inside the function itself: 0.182 s for vprint vs. 0.142 s for vprint2):
1000004 function calls in 0.424 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.424 0.424 <string>:1(<module>)
1 0.242 0.242 0.424 0.424 test.py:14(testit)
1 0.000 0.000 0.424 0.424 test.py:21(testit1)
1000000 0.182 0.000 0.182 0.000 test.py:6(vprint)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000004 function calls in 0.408 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 0.408 0.408 <string>:1(<module>)
1000000 0.142 0.000 0.142 0.000 test.py:10(vprint2)
1 0.266 0.266 0.408 0.408 test.py:14(testit)
1 0.000 0.000 0.408 0.408 test.py:18(testit2)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Test code follows:
#!/usr/bin/python
import cProfile
verbose=False
def vprint(msg):
    if verbose:
        print msg
def vprint2(msg):
    pass
def testit(fcn):
    for i in xrange(1000000):
        fcn(i)
def testit2():
    testit(vprint2)
def testit1():
    testit(vprint)
if __name__ == '__main__':
    cProfile.run('testit1()')
    cProfile.run('testit2()')
Flattening a shallow list in Python [duplicate]
This question already has answers here: How do I make a flat list out of a list of lists? (34 answers) Closed 6 years ago.
Is there a simple way to flatten a list of iterables with a list comprehension, or failing that, what would you all consider to be the best way to flatten a shallow list like this, balancing performance and readability?
I tried to flatten such a list with a nested list comprehension, like this:
[image for image in menuitem for menuitem in list_of_menuitems]
But I get in trouble of the NameError variety there, because the name 'menuitem' is not defined. After googling and looking around on Stack Overflow, I got the desired results with a reduce statement:
reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))
But this method is fairly unreadable because I need that list(x) call there because x is a Django QuerySet object.
Conclusion:
Thanks to everyone who contributed to this question. Here is a summary of what I learned. I'm also making this a community wiki in case others want to add to or correct these observations.
My original reduce statement is redundant and is better written this way:
>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))
This is the correct syntax for a nested list comprehension (Brilliant summary dF!):
>>> [image for mi in list_of_menuitems for image in mi]
But neither of these methods is as efficient as using itertools.chain:
>>> from itertools import chain
>>> list(chain(*list_of_menuitems))
And as @cdleary notes, it's probably better style to avoid * operator magic by using chain.from_iterable like so:
>>> import itertools
>>> chain = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(chain))
[1, 2, 3, 5, 89, 6]
If you're just looking to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.
>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']
It will work on anything that's iterable, which should include Django's iterable QuerySets, which it appears that you're using in the question.
Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead copying the items into the list that's being extended. chain will only incur this (same) overhead if you run list(chain) at the end.
Meta-Edit: Actually, it's less overhead than the question's proposed solution, because you throw away the temporary lists you create when you extend the original with the temporary.
Edit: As J.F. Sebastian says, itertools.chain.from_iterable avoids the unpacking and you should use that to avoid * magic, but the timeit app shows negligible performance difference.
You almost have it! The way to do nested list comprehensions is to put the for statements in the same order as they would go in regular nested for statements. Thus, this
for inner_list in outer_list:
    for item in inner_list:
        ...
corresponds to
[... for inner_list in outer_list for item in inner_list]
So you want
[image for menuitem in list_of_menuitems for image in menuitem]
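A quick way to check the ordering rule is to build the same list with explicit nested loops and with the comprehension, then compare. For illustration, plain lists stand in for the Django QuerySets here:

```python
list_of_menuitems = [['image00', 'image01'], ['image10'], []]

# Explicit nested loops: outer loop first, inner loop second
flat_loops = []
for menuitem in list_of_menuitems:
    for image in menuitem:
        flat_loops.append(image)

# Same order of `for` clauses in the comprehension
flat_comp = [image for menuitem in list_of_menuitems for image in menuitem]

print(flat_comp)
```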
@S.Lott: You inspired me to write a timeit app.
I figured it would also vary based on the number of partitions (number of iterators within the container list) -- your comment didn't mention how many partitions there were of the thirty items. This plot is flattening a thousand items in every run, with varying number of partitions. The items are evenly distributed among the partitions.
Code (Python 2.6):
#!/usr/bin/env python2.6
"""Usage: %prog item_count"""
from __future__ import print_function
import collections
import itertools
import operator
from timeit import Timer
import sys
import matplotlib.pyplot as pyplot
def itertools_flatten(iter_lst):
    return list(itertools.chain(*iter_lst))
def itertools_iterable_flatten(iter_iter):
    return list(itertools.chain.from_iterable(iter_iter))
def reduce_flatten(iter_lst):
    return reduce(operator.add, map(list, iter_lst))
def reduce_lambda_flatten(iter_lst):
    return reduce(operator.add, map(lambda x: list(x), [i for i in iter_lst]))
def comprehension_flatten(iter_lst):
    return list(item for iter_ in iter_lst for item in iter_)
METHODS = ['itertools', 'itertools_iterable', 'reduce', 'reduce_lambda', 'comprehension']
def _time_test_assert(iter_lst):
    """Make sure all methods produce an equivalent value.
    :raise AssertionError: On any non-equivalent value."""
    callables = (globals()[method + '_flatten'] for method in METHODS)
    results = [callable(iter_lst) for callable in callables]
    if not all(result == results[0] for result in results[1:]):
        raise AssertionError
def time_test(partition_count, item_count_per_partition, test_count=10000):
    """Run flatten methods on a list of :param:`partition_count` iterables.
    Normalize results over :param:`test_count` runs.
    :return: Mapping from method to (normalized) microseconds per pass.
    """
    iter_lst = [[dict()] * item_count_per_partition] * partition_count
    print('Partition count: ', partition_count)
    print('Items per partition:', item_count_per_partition)
    _time_test_assert(iter_lst)
    test_str = 'flatten(%r)' % iter_lst
    result_by_method = {}
    for method in METHODS:
        # Assumes this script is saved as test.py so the setup can import it
        setup_str = 'from test import %s_flatten as flatten' % method
        t = Timer(test_str, setup_str)
        # timeit() returns total seconds; convert to microseconds per pass
        per_pass = 1000000 * t.timeit(number=test_count) / test_count
        print('%20s: %.2f usec/pass' % (method, per_pass))
        result_by_method[method] = per_pass
    return result_by_method
if __name__ == '__main__':
    if len(sys.argv) != 2:
        raise ValueError('Need a number of items to flatten')
    item_count = int(sys.argv[1])
    partition_counts = []
    pass_times_by_method = collections.defaultdict(list)
    for partition_count in xrange(1, item_count):
        if item_count % partition_count != 0:
            continue
        items_per_partition = item_count / partition_count
        result_by_method = time_test(partition_count, items_per_partition)
        partition_counts.append(partition_count)
        for method, result in result_by_method.iteritems():
            pass_times_by_method[method].append(result)
    for method, pass_times in pass_times_by_method.iteritems():
        pyplot.plot(partition_counts, pass_times, label=method)
    pyplot.legend()
    pyplot.title('Flattening Comparison for %d Items' % item_count)
    pyplot.xlabel('Number of Partitions')
    pyplot.ylabel('Microseconds')
    pyplot.show()
Edit: Decided to make it community wiki.
Note: METHODS should probably be accumulated with a decorator, but I figure it'd be easier for people to read this way.
sum(list_of_lists, []) would flatten it.
l = [['image00', 'image01'], ['image10'], []]
print sum(l,[]) # prints ['image00', 'image01', 'image10']
This solution works for arbitrary nesting depths - not just the "list of lists" depth that some (all?) of the other solutions are limited to:
def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result
It's the recursion which allows for arbitrary depth nesting - until you hit the maximum recursion depth, of course...
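`basestring` exists only in Python 2. The following Python 3 sketch of the same recursive idea treats str and bytes as atoms. Without that check, a one-character string would recurse forever, because iterating over it yields the same string again. `flatten_any` is a hypothetical name:

```python
def flatten_any(x):
    result = []
    for el in x:
        # Recurse into iterables, but treat str/bytes as single items
        if hasattr(el, '__iter__') and not isinstance(el, (str, bytes)):
            result.extend(flatten_any(el))
        else:
            result.append(el)
    return result

print(flatten_any([1, [2, (3, 4)], 'ab', [[['c']]]]))
```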
In Python 2.6, using chain.from_iterable():
>>> from itertools import chain
>>> list(chain.from_iterable(mi.image_set.all() for mi in h.get_image_menu()))
It avoids creating an intermediate list.
Performance Results. Revised.
import itertools
def itertools_flatten( aList ):
    return list( itertools.chain(*aList) )
from operator import add
def reduce_flatten1( aList ):
    return reduce(add, map(lambda x: list(x), [mi for mi in aList]))
def reduce_flatten2( aList ):
    return reduce(list.__add__, map(list, aList))
def comprehension_flatten( aList ):
    return list(y for x in aList for y in x)
I flattened a 2-level list of 30 items 1000 times:
itertools_flatten 0.00554
comprehension_flatten 0.00815
reduce_flatten2 0.01103
reduce_flatten1 0.01404
Reduce is always a poor choice.
There seems to be a confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.
If you're thinking functional, it is as easy as this:
>>> from functools import reduce
>>> import operator
>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)
You see reduce respects the sequence type, so when you supply a tuple, you get back a tuple. Let's try with a list:
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
Aha, you get back a list.
How about performance:
>>> list2d = [[1,2,3],[4,5,6], [7], [8,9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop
from_iterable is pretty fast! But it's no comparison to reduce with concat.
>>> list2d = ((1,2,3),(4,5,6), (7,), (8,9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop
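A caveat before generalizing from the timing above: it uses only four tiny sequences. reduce(operator.concat, ...) builds a new sequence at every step and copies everything accumulated so far. Its cost therefore grows roughly quadratically with the number of sublists, while chain.from_iterable touches each item once. The sketch below lets you compare the two on larger inputs. The sizes are arbitrary and no results are claimed; `compare` is a hypothetical helper:

```python
import itertools
import operator
import timeit
from functools import reduce

def flatten_reduce(list2d):
    # Each concat copies the whole accumulated list so far
    return reduce(operator.concat, list2d)

def flatten_chain(list2d):
    # Visits each item once
    return list(itertools.chain.from_iterable(list2d))

def compare(n_sublists=1000, number=10):
    list2d = [[1, 2, 3]] * n_sublists
    return {
        'reduce_concat': timeit.timeit(lambda: flatten_reduce(list2d), number=number),
        'chain_from_iterable': timeit.timeit(lambda: flatten_chain(list2d), number=number),
    }

if __name__ == '__main__':
    for n in (10, 100, 1000):
        print(n, compare(n))
```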
Here is the correct solution using list comprehensions (they're backward in the question):
>>> join = lambda it: (y for x in it for y in x)
>>> list(join([[1,2],[3,4,5],[]]))
[1, 2, 3, 4, 5]
In your case it would be
[image for menuitem in list_of_menuitems for image in menuitem.image_set.all()]
or you could use join and say
join(menuitem.image_set.all() for menuitem in list_of_menuitems)
In either case, the gotcha was the nesting of the for loops.
Off the top of my head, you can eliminate the lambda:
reduce(list.__add__, map(list, [mi.image_set.all() for mi in list_of_menuitems]))
Or even eliminate the map, since you've already got a list-comp:
reduce(list.__add__, [list(mi.image_set.all()) for mi in list_of_menuitems])
You can also just express this as a sum of lists:
sum([list(mi.image_set.all()) for mi in list_of_menuitems], [])
This version is a generator. Tweak it if you want a list.
def list_or_tuple(l):
    return isinstance(l,(list,tuple))
## predicate will select the container to be flattened
## write your own as required
## this one flattens every list/tuple
def flatten(seq,predicate=list_or_tuple):
    ## recursive generator
    for i in seq:
        if predicate(i):
            for j in flatten(i,predicate):
                yield j
        else:
            yield i
You can pass your own predicate if you want to flatten only the items that satisfy a condition.
Taken from the Python Cookbook
If you have to flatten a more complicated list with non-iterable elements or with depth more than 2, you can use the following function:
def flat_list(list_to_flat):
    if not isinstance(list_to_flat, list):
        yield list_to_flat
    else:
        for item in list_to_flat:
            yield from flat_list(item)
It will return a generator object which you can convert to a list with the list() function. Notice that the yield from syntax is available starting from Python 3.3, but you can use explicit iteration instead.
Example:
>>> a = [1, [2, 3], [1, [2, 3, [1, [2, 3]]]]]
>>> print(list(flat_list(a)))
[1, 2, 3, 1, 2, 3, 1, 2, 3]
Here is a version working for multiple levels of lists using collections.abc.Iterable (plain collections.Iterable in older Pythons; that alias was removed in Python 3.10):
import collections.abc
def flatten(o, flatten_condition=lambda i: isinstance(i, collections.abc.Iterable) and not isinstance(i, str)):
    result = []
    for i in o:
        if flatten_condition(i):
            result.extend(flatten(i, flatten_condition))
        else:
            result.append(i)
    return result
Have you tried flatten? It is matplotlib.cbook.flatten(seq, scalarp=). Profiling it with cProfile's run on l=[[1,2,3],[4,5,6], [7], [8,9]]*N, i.e. run("list(flatten(l))"):

N      function calls (primitive)    total time
33     3732 (3303)                   0.007 s
66     7461 (6603)                   0.007 s
99     11190 (9903)                  0.010 s
132    14919 (13203)                 0.013 s

Most of that time goes into helper checks that run once per element (429 calls each at N=33): iterable, is_string_like, is_scalar_or_string and isMaskedArray. At N=33, is_scalar_or_string alone accounts for 0.006 s of the 0.007 s total.

UPDATE

Which gave me another idea, a plain recursive flattenlist (code below). Same lists, run("flattenlist(l)"):

N      function calls (primitive)    total time
33     564 (432)                     0.000 s
66     1125 (861)                    0.001 s
99     1686 (1290)                   0.001 s
132    2247 (1719)                   0.002 s
1320   22443 (17163)                 0.016 s

Here the only per-element overhead is a single isinstance call (17160 calls taking 0.005 s at N=1320).

To test how effective it is when the recursion gets deeper and the data much larger, I wrapped the N=1320 list again with new=[l]*M and ran run("flattenlist(new)"):

M      function calls (primitive)    total time
33     740589 (566316)               0.418 s
66     1481175 (1132629)             0.809 s
99     2221761 (1698942)             1.211 s
132    2962347 (2265255)             1.630 s
1320   29623443 (22652523)           16.103 s

Time grows roughly linearly with size: at M=1320, 10.842 s is spent inside flattenlist itself and 5.227 s in 22652520 isinstance calls.

I'll be using flattenlist rather than matplotlib's flatten for a long time, unless I need a generator that yields results lazily, as matplotlib.cbook.flatten does. This is fast. And here is the code:

typ = (list, tuple)

def flattenlist(d):
    thelist = []
    for x in d:
        if not isinstance(x, typ):
            thelist += [x]
        else:
            thelist += flattenlist(x)
    return thelist
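Because flattenlist recurses once per nesting level, nesting deeper than sys.getrecursionlimit() (1000 by default) raises RecursionError. A sketch of an explicit-stack variant that avoids this; the name flatten_iter is hypothetical and it keeps the same typ = (list, tuple) test:

```python
typ = (list, tuple)


def flatten_iter(d):
    # A stack of iterators replaces the call stack that flattenlist uses.
    result = []
    stack = [iter(d)]
    while stack:
        for x in stack[-1]:
            if isinstance(x, typ):
                stack.append(iter(x))
                break  # descend into x; the outer iterator keeps its position
            result.append(x)
        else:
            stack.pop()  # current iterator exhausted, return to its parent
    return result


# Build a list nested 5000 levels deep: [[[...[0], 1]...], 4999]
deep = [0]
for i in range(1, 5000):
    deep = [deep, i]

print(flatten_iter(deep)[:5])
# [0, 1, 2, 3, 4]
```

The output order matches flattenlist; only the bookkeeping moves from recursion to an explicit list.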
From my experience, the most efficient way to flatten a list of lists is:

flat_list = []
map(flat_list.extend, list_of_list)

This relies on Python 2, where map runs eagerly. In Python 3, map returns a lazy iterator, so the second line does nothing until the iterator is consumed; write for sublist in list_of_list: flat_list.extend(sublist) instead (or wrap the call in list(...)).

Some timeit comparisons (IPython, Python 2) with the other proposed methods:

import itertools
list_of_list = [range(10)]*1000
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 119 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#1000 loops, best of 3: 210 µs per loop
%timeit flat_list=[i for sublist in list_of_list for i in sublist]
#1000 loops, best of 3: 525 µs per loop
%timeit flat_list=reduce(list.__add__,list_of_list)
#100 loops, best of 3: 18.1 ms per loop

The gain is larger with longer sublists:

list_of_list = [range(1000)]*10
%timeit flat_list=[]; map(flat_list.extend, list_of_list)
#10000 loops, best of 3: 60.7 µs per loop
%timeit flat_list=list(itertools.chain.from_iterable(list_of_list))
#10000 loops, best of 3: 176 µs per loop

This method also works with any iterable object:

class SquaredRange(object):
    def __init__(self, n):
        self.range = range(n)

    def __iter__(self):
        for i in self.range:
            yield i**2

list_of_list = [SquaredRange(5)]*3
flat_list = []
list(map(flat_list.extend, list_of_list))  # list() forces the lazy map in Python 3
print(flat_list)
#[0, 1, 4, 9, 16, 0, 1, 4, 9, 16, 0, 1, 4, 9, 16]
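A Python 3 sketch of the same comparison using the standard timeit module, with the extend approach written as an eager loop. The function names are illustrative, and no timings are claimed here because they vary by machine and interpreter version:

```python
import itertools
import timeit

list_of_list = [list(range(10))] * 1000


def extend_loop(lol):
    # Same idea as map(flat_list.extend, ...), but eager in Python 3.
    flat_list = []
    for sublist in lol:
        flat_list.extend(sublist)
    return flat_list


def chain_flat(lol):
    return list(itertools.chain.from_iterable(lol))


def comprehension(lol):
    return [i for sublist in lol for i in sublist]


# All three must agree before their speed is worth comparing.
assert extend_loop(list_of_list) == chain_flat(list_of_list) == comprehension(list_of_list)

for fn in (extend_loop, chain_flat, comprehension):
    t = timeit.timeit(lambda: fn(list_of_list), number=1000)
    print(f"{fn.__name__}: {t / 1000 * 1e6:.1f} µs per call")
```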
def is_iterable(item):
    return isinstance(item, (list, tuple))

def flatten(items):
    for i in items:
        if is_iterable(i):
            for m in flatten(i):
                yield m
        else:
            yield i

Test:

print(list(flatten([1.0, 2, 'a', (4,), ((6,), (8,)), (((8,),(9,)), ((12,),(10)))])))
# [1.0, 2, 'a', 4, 6, 8, 8, 9, 12, 10]
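Because flatten is a generator, you can take just the first few leaves without walking the whole structure. A sketch assuming Python 3.3+, where yield from replaces the inner for/yield loop, combined with itertools.islice:

```python
from itertools import islice


def is_iterable(item):
    return isinstance(item, (list, tuple))


def flatten(items):
    for i in items:
        if is_iterable(i):
            yield from flatten(i)  # shorthand for: for m in flatten(i): yield m
        else:
            yield i


data = [1.0, 2, 'a', (4,), ((6,), (8,)), (((8,), (9,)), ((12,), (10)))]
print(list(islice(flatten(data), 4)))
# [1.0, 2, 'a', 4]
```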
What about:

from functools import reduce  # a builtin in Python 2; Python 3 needs this import
from operator import add
reduce(add, map(lambda x: list(x.image_set.all()), list_of_menuitems))

But Guido recommends against doing too much in a single line of code, since it reduces readability. There is minimal, if any, performance gain from doing this in one line rather than several.
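Without Django at hand, here is a sketch of the same reduce(add, ...) pattern with plain lists standing in for list(x.image_set.all()); the file names are made up. It also shows itertools.chain.from_iterable, which gives the same result without building a new intermediate list at every step as reduce(add, ...) does:

```python
from functools import reduce
from itertools import chain
from operator import add

# Hypothetical stand-ins for list(x.image_set.all()) on each menu item.
images_per_item = [["a.png", "b.png"], [], ["c.png"]]

# The [] start value keeps reduce from raising TypeError on an empty outer list.
via_reduce = reduce(add, images_per_item, [])
via_chain = list(chain.from_iterable(images_per_item))

print(via_reduce)
# ['a.png', 'b.png', 'c.png']
assert via_reduce == via_chain
```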
pylab provides a flatten function. See also NumPy's flatten (ndarray.flatten), which works on arrays rather than on arbitrary nested lists.
If you're looking for a simple one-liner that needs no imports, you can use a list comprehension:

a = [[1, 2, 3], [4, 5, 6]]
b = [i[x] for i in a for x in range(len(i))]
print(b)

which prints [1, 2, 3, 4, 5, 6].
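The indexing is not needed: iterating over each sublist directly gives the same result and also works for sublists that cannot be indexed, such as sets or generators. A short sketch comparing the two:

```python
a = [[1, 2, 3], [4, 5, 6]]

by_index = [i[x] for i in a for x in range(len(i))]
# Reads like the nested loops: for i in a: for x in i: collect x
direct = [x for i in a for x in i]

print(direct)
# [1, 2, 3, 4, 5, 6]
assert by_index == direct
```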
If each item in the list is a string (and none of the strings contains a single quote, so that str() renders every item inside '...'), you can use regular expressions (the re module):

>>> import re
>>> flattener = re.compile(r"'(.*?)'")
>>> stred = str(in_list)
>>> outed = flattener.findall(stred)

The above code converts in_list into a string, uses the regex to find every substring within single quotes (i.e. each item of the list) and returns them as a list. The capturing group (.*?) makes findall return only the text between the quotes; without it, every match would still include the quote characters.
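A small sketch of the regex approach on a made-up nested list, plus what happens when the single-quote caveat is violated: str() switches that item to double quotes, and the pattern pairs up the wrong quote characters:

```python
import re

flattener = re.compile(r"'(.*?)'")

in_list = [["apple", "banana"], ["cherry"], [["date"]]]
print(flattener.findall(str(in_list)))
# ['apple', 'banana', 'cherry', 'date']

# "it's" is rendered as "it's" (double quotes), so the matches go wrong.
tricky = [["it's"], ["ok"]]
print(flattener.findall(str(tricky)))
# ['s"], [']
```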
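A simple alternative is NumPy's concatenate, but note that the result here is a float array: the empty sublist [] becomes a float64 array, so the integers are promoted to float to match (without an empty sublist, integer input stays integer):

import numpy as np
print(np.concatenate([[1,2],[3],[5,89],[],[6]]))
# [ 1.  2.  3.  5. 89.  6.]
print(np.concatenate([[1,2],[3],[5,89],[],[6]]).tolist())
# [1.0, 2.0, 3.0, 5.0, 89.0, 6.0]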
The easiest way to do this in either Python 2 or 3 is the morph library (pip install morph). The code is:

import morph

nested = [[1,2],[3],[5,89],[],[6]]  # avoid naming a variable list, which shadows the builtin
flattened_list = morph.flatten(nested)  # returns [1, 2, 3, 5, 89, 6]
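It was hoped that Python would allow [*innerlist for innerlist in outer_list], but that never made it in: PEP 448 (Python 3.5) added unpacking in list displays such as [*a, *b], while unpacking inside a comprehension is rejected with a SyntaxError (at least through Python 3.13). The working equivalent is [x for innerlist in outer_list for x in innerlist].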
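A sketch of what does work today: the nested comprehension, PEP 448 unpacking in a list display (requires Python 3.5+), and itertools.chain.from_iterable. The sample outer_list is made up:

```python
import itertools

outer_list = [[1, 2], [3], [], [4, 5]]

# Nested comprehension: the for clauses appear in the same order as nested loops.
flat = [x for innerlist in outer_list for x in innerlist]

# PEP 448 unpacking works in a list display with a fixed number of parts.
first_two = [*outer_list[0], *outer_list[1]]

# chain.from_iterable performs the same flattening lazily.
flat_chain = list(itertools.chain.from_iterable(outer_list))

print(flat)       # [1, 2, 3, 4, 5]
print(first_two)  # [1, 2, 3]
assert flat == flat_chain
```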