incrementing defaultdict inside list comprehension (Python)

incrementing defaultdict inside list comprehension (Python) - python

I have this:
self.lines = [...]
cnt = defaultdict(int)
for line in self.lines:
cnt[line] += 1
Now this works. But I wonder if it (incrementing count of particular lines in defaultdict) could be done using list comprehension?
This is a syntax error:
[cnt[line] += 1 for line in self.lines]
Incidentally, why can't one use expressions like this within list comprehension? It's straightforward and would greatly improved both terseness and performance of such code.

Your list comprehension doesn't work because assignments are not expressions.
You shouldn't use list comprehension to replace a loop. Write a loop. List comprehensions are used for constructing lists.
Why do you think list comprehension would improve performance? If anything, it potentially hurts performance, because it needs to allocate and assign to the temporary list it builds, which is then never used. Imagine you have 1,000,000,000 lines in your original list.

You could use collections.Counter here:
>>> from collections import Counter
>>> lis = [1,2,3,3,3,5,6,1,2,2]
>>> Counter(lis)
Counter({2: 3, 3: 3, 1: 2, 5: 1, 6: 1})
cnt[line] += 1 is an assignment LC don't support assignments, and even using LCs for side effect is also a bad practice.
lis = []
[lis.append(x) for x in xrange(5)] #bad
Read: Is it Pythonic to use list comprehensions for just side effects?

Related

There is no counter in inline python?

When I'm trying to put counter in inline loop of Python, it tells me the syntax error. Apparently here it expects me to assign a value to i not k.
Could anyone help with rewriting the inline loop?
aa = [2, 2, 1]
k = 0
b = [k += 1 if i != 2 for i in aa ]
print(b)

You seem to misunderstand what you're doing. This:
[x for y in z]
is not an "inline for loop". A for loop can do anything, iterating on any iterable object. One of the things a for loop can do is create a list of items:
my_list = []
for i in other_list:
if condition_is_met:
my_list.append(i)
A list comprehension covers only this use case of a for loop:
my_list = [i for i in other_list if condition_is_met]
That's why it's called a "list comprehension" and not an "inline for loop" - because it only creates lists. The other things you might use a for loop for, like iterating a number, you can't directly use a list comprehension to do.
For your particular problem, you're trying to use k += 1 in a list comprehension. This operation doesn't return anything - it just modifies the variable k - so when python tries to assign that to a list item, the operation fails. If you want to count up with k, you should either just use a regular for loop:
for i in aa:
if i != 2:
k += 1
or use the list comprehension to indirectly measure what you want:
k += len([i for i in aa if i != 2])
Here, we use a list comprehension to construct a list of every element i in aa such that i != 2, then we take the number of elements in that list and add it to k. Since this operation actually produces a list of its own, the code will not crash, and it will have the same overall effect. This solution isn't always doable if you have more complicated things you'd like to do in a for loop - and it's slightly less efficient as well, because this solution requires actually creating the new list which isn't necessary for what you're trying to achieve.

you can use len() like so
print(len([i for i in a if i != 2]))

Organize/Formatting the python code to one line

Is there any way to rewrite the below python code in one line
for i in range(len(main_list)):
if main_list[i] != []:
for j in range(len(main_list[i])):
main_list[i][j][6]=main_list[i][j][6].strftime('%Y-%m-%d')
something like below,
[main_list[i][j][6]=main_list[i][j][6].strftime('%Y-%m-%d') for i in range(len(main_list)) if main_list[i] != [] for j in range(len(main_list[i]))]
I got SyntaxError for this.
Actually, i'm trying to storing all the values fetched from table into one list. Since the table contains date method/datatype, my requirement needs to convert it to string as i faced with malformed string error.
So my approach is to convert that element of list from datetime.date() to str. And i got it working. Just wanted it to work with one line

Use the explicit for loop. There's no better option.
A list comprehension is used to create a new list, not to modify certain elements of an existing list.
You may be able to update values via a list comprehension, e.g. [L.__setitem__(i, 'some_value') for i in range(len(L))], but this is not recommended as you are using a side-effect and in the process creating a list of None values which you then discard.
You could also write a convoluted list comprehension with a ternary statement indicating when you meet the 6th element in a 3rd nested sublist. But this will make your code difficult to maintain.
In short, use the for loop.

You're getting a syntax error because you're not allowed to perform assignments within a list comprehension. Python forbids assignments because it is discouraging over complex list comprehensions in favour of for loops.
Obviously you shouldn't do this on one line, but this is how to do it:
import datetime
# Example from your comment:
type1 = "some type"
main_list = [[], [],
[[1, 2, 3, datetime.date(2016, 8, 18), type1],
[3, 4, 5, datetime.date(2016, 8, 18), type1]], [], []]
def fmt_times(lst):
"""Format the fourth value of each element of each non-empty sublist"""
for i in range(len(lst)):
if lst[i] != []:
for j in range(len(lst[i])):
lst[i][j][3] = lst[i][j][3].strftime('%Y-%m-%d')
return lst
def fmt_times_one_line(main_list):
"""Format the fourth value of each element of each non-empty sublist"""
return [[] if main_list[i] == [] else [[main_list[i][j][k] if k != 3 else main_list[i][j][k].strftime('%Y-%m-%d') for k in range(len(main_list[i][j]))] for j in range(len(main_list[i])) ] for i in range(len(main_list))]
import copy
# Deep copy needed because fmt_times modifies the sublists.
assert fmt_times(copy.deepcopy(main_list)) == fmt_times_one_line(main_list)
The list comprehension is a functional thing. If you know how map() works in python or javascript then it's the same thing. In a map() or comprehension we generally don't mutate the data we're mapping over (and python discourages attempting it) so instead we recreate the entire object, substituting only the values we wanted to modify.

One line?
main_list = convert_list(main_list)
You will have to put a few more lines somewhere else though:
def convert_list(main_list):
for i, ml in enumerate(main_list):
if isinstance(ml, list) and len(ml) > 0:
main_list[i] = convert_list(ml)
elif isinstance(ml, datetime.date):
main_list[i] = ml.strftime('%Y-%m-%d')
return main_list
You might be able to whack this together with a list comprehension but it's a terrible idea (for reasons better explained in the other answer).

How does the list comprehension to flatten a python list work? [duplicate]

This question already has answers here:
How can I use list comprehensions to process a nested list?
(13 answers)
Closed 7 months ago.
I recently looked for a way to flatten a nested python list, like this: [[1,2,3],[4,5,6]], into this: [1,2,3,4,5,6].
Stackoverflow was helpful as ever and I found a post with this ingenious list comprehension:
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
I thought I understood how list comprehensions work, but apparently I haven't got the faintest idea. What puzzles me most is that besides the comprehension above, this also runs (although it doesn't give the same result):
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Can someone explain how python interprets these things? Based on the second comprension, I would expect that python interprets it back to front, but apparently that is not always the case. If it were, the first comprehension should throw an error, because 'sublist' does not exist. My mind is completely warped, help!

Let's take a look at your list comprehension then, but first let's start with list comprehension at it's easiest.
l = [1,2,3,4,5]
print [x for x in l] # prints [1, 2, 3, 4, 5]
You can look at this the same as a for loop structured like so:
for x in l:
print x
Now let's look at another one:
l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print a # prints [2,4]
That is the exact same as this:
a = []
l = [1,2,3,4,5]
for x in l:
if x % 2 == 0:
a.append(x)
print a # prints [2,4]
Now let's take a look at the examples you provided.
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print flattened_l # prints [1,2,3,4,5,6]
For list comprehension start at the farthest to the left for loop and work your way in. The variable, item, in this case, is what will be added. It will produce this equivalent:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
for item in sublist:
flattened_l.append(item)
Now for the last one
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Using the same knowledge we can create a for loop and see how it would behave:
for item in sublist:
for sublist in l:
exactly_the_same_as_l.append(item)
Now the only reason the above one works is because when flattened_l was created, it also created sublist. It is a scoping reason to why that did not throw an error. If you ran that without defining the flattened_l first, you would get a NameError

The for loops are evaluated from left to right. Any list comprehension can be re-written as a for loop, as follows:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
for item in sublist:
flattened_l.append(item)
The above is the correct code for flattening a list, whether you choose to write it concisely as a list comprehension, or in this extended version.
The second list comprehension you wrote will raise a NameError, as 'sublist' has not yet been defined. You can see this by writing the list comprehension as a for loop:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for item in sublist:
for sublist in l:
flattened_l.append(item)
The only reason you didn't see the error when you ran your code was because you had previously defined sublist when implementing your first list comprehension.
For more information, you may want to check out Guido's tutorial on list comprehensions.

For the lazy dev that wants a quick answer:
>>> a = [[1,2], [3,4]]
>>> [i for g in a for i in g]
[1, 2, 3, 4]

While this approach definitely works for flattening lists, I wouldn't recommend it unless your sublists are known to be very small (1 or 2 elements each).
I've done a bit of profiling with timeit and found that this takes roughly 2-3 times longer than using a single loop and calling extend…
def flatten(l):
flattened = []
for sublist in l:
flattened.extend(sublist)
return flattened
While it's not as pretty, the speedup is significant. I suppose this works so well because extend can more efficiently copy the whole sublist at once instead of copying each element, one at a time. I would recommend using extend if you know your sublists are medium-to-large in size. The larger the sublist, the bigger the speedup.
One final caveat: obviously, this only holds true if you need to eagerly form this flattened list. Perhaps you'll be sorting it later, for example. If you're ultimately going to just loop through the list as-is, this will not be any better than using the nested loops approach outlined by others. But for that use case, you want to return a generator instead of a list for the added benefit of laziness…
def flatten(l):
return (item for sublist in l for item in sublist) # note the parens

Note, of course, that the sort of comprehension will only "flatten" a list of lists (or list of other iterables). Also if you pass it a list of strings you'll "flatten" it into a list of characters.
To generalize this in a meaningful way you first want to be able to cleanly distinguish between strings (or bytearrays) and other types of sequences (or other Iterables). So let's start with a simple function:
import collections
def non_str_seq(p):
'''p is putatively a sequence and not a string nor bytearray'''
return isinstance(p, collections.Iterable) and not (isinstance(p, str) or isinstance(p, bytearray))
Using that we can then build a recursive function to flatten any
def flatten(s):
'''Recursively flatten any sequence of objects
'''
results = list()
if non_str_seq(s):
for each in s:
results.extend(flatten(each))
else:
results.append(s)
return results
There are probably more elegant ways to do this. But this works for all the Python built-in types that I know of. Simple objects (numbers, strings, instances of None, True, False are all returned wrapped in list. Dictionaries are returned as lists of keys (in hash order).

One-line & multi-line loops and vectorization in Python

Can one-line loops in Python only be used to build lists? (i.e. list comprehensions), or can they use for more general computing?
For example, I am aware that list comprehensions (~single-line loop) in Python, e.g.
my_list = [ 2*i for i in range(10)]
can also be built with a multi-line loop:
my_list = []
for i in range(10):
my_list.append(2*i)
But can we always transform general multi-line loops into one-line loops?
For example, say we have the following multi-line for loop:
my_array = np.ones(10*10)
for x in range(10):
my_array[x,:] = 0
can we convert it into a single-line loop? More generally:
Q1. Are the two forms functionally equivalent? (i.e. they support the same set of manipulations/operations)
Q2. I think I have read before that one-line loops in Python are vectorized. Is this true? And does this mean that they can iterate faster than multi-line loops?

But can we we always transform general multi-line loops in Python into one-line loops?
The short answer is no.
List comprehensions are good for projections (mapping) and/or filtering.
For example, if you have code like this:
result = []
for x in seq:
if bar(x):
result.append(foo(x))
Then, as you point out, it can benefit from being rewritten as a list comprehension:
result = [foo(x) for f in seq if bar(x)]
However list comprehensions are generally not so good for operations that don't fit into this projection or filtering pattern.
For example if you need to mutate the elements but don't need the result then a list comprehension wouldn't be suitable. The following code would be inconvenient to write as a list comprehension (assuming that both methods return None):
for x in seq:
x.foo() # Modify x
x.bar() # Modify x again
Some operations are not allowed inside a comprehension. An example is breaking out of the loop early if a condition is met:
for x in seq:
if x.foo():
break
else:
x.bar()

One thing I'll point out is that it's not just lists, you can use comprehension to create sets and even dictionaries.
>>> {i**2 for i in range(5)}
set([0, 1, 4, 16, 9])
>>> {i : str(i) for i in range(5)}
{0: '0', 1: '1', 2: '2', 3: '3', 4: '4'}
Also, list comprehension is generally faster than using append numerous times (like your example) because the comprehension is done by underlying C code, as opposed to append, which has the extra Python-layer.
Of course, comprehension has limitations like anything else. If you wanted to perform a larger set of operations on each element of a list/set, then a normal loop might be more appropriate.

what is a quick way to delete all elements from a list that do not satisfy a constraint?

I have a list of strings. I have a function that given a string returns 0 or 1. How can I delete all strings in the list for which the function returns 0?

[x for x in lst if fn(x) != 0]
This is a "list comprehension", one of Python's nicest pieces of syntactical sugar that often takes lines of code in other languages and additional variable declarations, etc.
See:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions

I would use a generator expression over a list comprehension to avoid a potentially large, intermediate list.
result = (x for x in l if f(x))
# print it, or something
print list(result)
Like a list comprehension, this will not modify your original list, in place.

edit: see the bottom for the best answer.
If you need to mutate an existing list, for example because you have another reference to it somewhere else, you'll need to actually remove the values from the list.
I'm not aware of any such function in Python, but something like this would work (untested code):
def cull_list(lst, pred):
"""Removes all values from ``lst`` which for which ``pred(v)`` is false."""
def remove_all(v):
"""Remove all instances of ``v`` from ``lst``"""
try:
while True:
lst.remove(v)
except ValueError:
pass
values = set(lst)
for v in values:
if not pred(v):
remove_all(v)
A probably more-efficient alternative that may look a bit too much like C code for some people's taste:
def efficient_cull_list(lst, pred):
end = len(lst)
i = 0
while i < end:
if not pred(lst[i]):
del lst[i]
end -= 1
else:
i += 1
edit...: as Aaron pointed out in the comments, this can be done much more cleanly with something like
def reversed_cull_list(lst, pred):
for i in range(len(lst) - 1, -1, -1):
if not pred(lst[i]):
del lst[i]
...edit
The trick with these routines is that using a function like enumerate, as suggested by (an) other responder(s), will not take into account the fact that elements of the list have been removed. The only way (that I know of) to do that is to just track the index manually instead of allowing python to do the iteration. There's bound to be a speed compromise there, so it may end up being better just to do something like
lst[:] = (v for v in lst if pred(v))
Actually, now that I think of it, this is by far the most sensible way to do an 'in-place' filter on a list. The generator's values are iterated before filling lst's elements with them, so there are no index conflict issues. If you want to make this more explicit just do
lst[:] = [v for v in lst if pred(v)]
I don't think it will make much difference in this case, in terms of efficiency.
Either of these last two approaches will, if I understand correctly how they actually work, make an extra copy of the list, so one of the bona fide in-place solutions mentioned above would be better if you're dealing with some "huge tracts of land."

>>> s = [1, 2, 3, 4, 5, 6]
>>> def f(x):
... if x<=2: return 0
... else: return 1
>>> for n,x in enumerate(s):
... if f(x) == 0: s[n]=None
>>> s=filter(None,s)
>>> s
[3, 4, 5, 6]

With a generator expression:
alist[:] = (item for item in alist if afunction(item))
Functional:
alist[:] = filter(afunction, alist)
or:
import itertools
alist[:] = itertools.ifilter(afunction, alist)
All equivalent.
You can also use a list comprehension:
alist = [item for item in alist if afunction(item)]
An in-place modification:
import collections
indexes_to_delete= collections.deque(
idx
for idx, item in enumerate(alist)
if afunction(item))
while indexes_to_delete:
del alist[indexes_to_delete.pop()]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.