What type of comprehension is this? - python

I came across the following Python code and am having trouble understanding it:
''.join(random.choice(string.ascii_lowercase + string.ascii_uppercase + string.digits) for i in range(length))
The for loop tells me it's a comprehension, but of what type? It's not a list comprehension, because the [] are missing (unless there's a special syntax at work here). I tried to work it out by running
random.choice(string.ascii_lowercase + string.ascii_uppercase + string.digits) for i in range(length)
directly in the interpreter but got syntax error at for.
I did some digging around and came to a not-so-sure conclusion that this is what's called a generator comprehension, but I didn't find any examples that look like this one; they all use the () notation for creating the generator object.
So, is it like join() works on iterators (and therefore generators) and we actually have a generator syntax here? If yes, can we omit the surrounding () when creating generator objects in function calls?

you need join() because the list contains characters, and you want to get a string, hence join()
random.choice() selects random character from the argument list
the argument list contains ASCII upper/lower case characters and digits
the length of the resulting string is length
Summing up all together, this line of code generates a random string with length length that contains upper/lower case letters and numbers.
This is a generator expression rather than a list comprehension: there are no [], and the generator's own () can be omitted because it is the sole argument to join().

It creates an iterator rather than a list. Take this example from the Python wiki:
# list comprehension
doubles = [2 * n for n in range(50)]
# same result, built by passing a generator expression to list()
doubles = list(2 * n for n in range(50))
Both produce the same list, but only the former is a list comprehension; the latter passes a generator expression to list(). I believe your example relies on the latter form, with join() in place of list(). The wiki I linked calls this a generator expression.
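To check the parenthesis rule directly, here is a minimal sketch. The names `length`, `alphabet`, `gen` and `total` are illustrative and not from the question. A generator expression can drop its own parentheses only when it is the sole argument of a call; anywhere else it needs them.

```python
import random
import string
import types

length = 8  # illustrative value, not from the question
alphabet = string.ascii_lowercase + string.ascii_uppercase + string.digits

# Standalone, the generator expression needs its own parentheses.
gen = (random.choice(alphabet) for i in range(length))
print(isinstance(gen, types.GeneratorType))  # True

# As the sole argument of a call, the call's parentheses are enough.
s = ''.join(random.choice(alphabet) for i in range(length))
print(len(s))  # 8

# With a second argument, the explicit parentheses are required again.
total = sum((n for n in range(5)), 10)
print(total)  # 20
```

This is also why typing the bare expression at the interpreter prompt gives a syntax error at `for`: there are no enclosing parentheses at all.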

Related

Generator expression vs list comprehension for adding values to a set [duplicate]

I am a tutor for an intermediate Python course at a university and I recently had some students come to me for the following problem (code is supposed to add all the values in a list to a set):
mylist = [10, 20, 30, 40]
my_set = set()
(my_set.add(num) for num in mylist)
print(my_set)
Their output was:
set()
Now, I realized their generator expression is the reason nothing is being added to the set, but I am unsure as to why.
I also realized that using a list comprehension rather than a generator expression:
[my_set.add(num) for num in mylist]
actually adds all the values to the set (although I realize this is memory inefficient as it involves allocating a list that is never used. The same could be done with just a for loop and no additional memory.).
My question is essentially why does the list comprehension add to the set, while the generator expression does not? Also would the generator expression be in-place, or would it allocate more memory?
Generator expressions are lazy; if you don't actually iterate over them, they do nothing (aside from computing the iterator for the outermost loop, e.g. in this case, doing work equivalent to iter(mylist) and storing the result for when the genexpr is actually iterated). To make it work, you'd have to run out the generator, e.g. using the consume recipe from the itertools documentation (a recipe you copy into your code, not a function that itertools exports):
consume(my_set.add(num) for num in mylist)
# Unoptimized equivalent:
for _ in (my_set.add(num) for num in mylist):
    pass
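Since consume is not importable from itertools, here is a simplified sketch of it. The recipe in the itertools documentation also takes an optional count `n`; this version omits it and always exhausts the iterator.

```python
from collections import deque


def consume(iterator):
    """Advance the iterator to the end, discarding every value (simplified recipe)."""
    # A zero-length deque pulls every item without storing any of them.
    deque(iterator, maxlen=0)


mylist = [10, 20, 30, 40]
my_set = set()
consume(my_set.add(num) for num in mylist)
print(my_set == {10, 20, 30, 40})  # True
```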
In any event, this is an insane thing to do; comprehensions and generator expressions are functional programming tools, and should not have side-effects, let alone be written solely for the purpose of producing side-effects. Code maintainers (reasonably) expect that comprehensions will trigger no "spooky action at a distance"; don't violate that expectation. Just use a set comprehension:
myset = {num for num in mylist}
or since the comprehension does nothing in this case, the set constructor:
myset = set(mylist) # Or with modern unpacking generalizations, myset = {*mylist}
Your students (and yourself perhaps) are using comprehension expressions as shorthand for loops - that's a bad pattern.
The answer to your question is that the list comprehension needs to be evaluated immediately, as the results are needed to populate the list, while the generator expression is only evaluated as it's being used.
You're interested in the side effect of that evaluation, but if the side effect is really the main goal, the code should just be:
myset = set(mylist)
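The difference between lazy and eager evaluation can be observed step by step. This is a small sketch; the names `lazy_set`, `eager_set`, `gen` and `results` are illustrative.

```python
mylist = [10, 20, 30, 40]

lazy_set = set()
gen = (lazy_set.add(num) for num in mylist)  # nothing in the body runs yet
print(lazy_set)  # set()

next(gen)  # runs the body once, for the first element only
print(lazy_set)  # {10}

eager_set = set()
results = [eager_set.add(num) for num in mylist]  # body runs for every element now
print(results)  # [None, None, None, None]
print(eager_set == set(mylist))  # True
```

The list of None values is the extra allocation mentioned in the question. The generator expression builds no such list, but it also does no work until something iterates over it.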

Validating generated strings quickly (+ maths)

I'm trying to validate a large amount of generated strings, pumped out by itertools.permutations
I'd like to validate them by checking every overlapping pair of adjacent characters against an array I have set up: a string is valid only if every overlapping two-character substring is in the "paths" array.
I have the following code to validate:
import re

def valid(s):
    # The lookahead captures every overlapping two-character substring.
    matches = re.findall("(?=(..))", s)
    for match in matches:
        if match not in paths:
            return False
    return True
Now I'm wondering if this can get any faster, since it's too slow for my liking. I assume a non-regex solution would be faster.
I was also wondering whether it is possible to pre-calculate how many accepted strings I will have, given that every character in the paths array is in the itertools iterable* (so the keyspace is known) and the size of the "paths" array is also known.
Edit: Paths currently has 250 combinations
This is the iterable "1920eran876i3om54lstchdkgbvupywjfx"
example valid output:
1920876
1920873
1920875
1920874
1920867
1920863
1920865
1920864
1920834
1920857
You can simplify this by realising that your regex simply matches every two-character pair. You can get this by zipping two different iterators, like so:
def valid(s):
    for c, d in zip(s[:-1], s[1:]):
        if c + d not in paths:
            return False
    return True
It might be faster to iterate through indices and use slice on every loop, but you'd have to test it.
I've been assuming that paths is a string, but if it's a Sequence[str] you can make it a set for extra performance.
You can speed this up further by hand-coding Python byte code, but that's probably a bit excessive.
If paths is a list, it would be quicker to turn it into a set, with set(paths) or a {...} literal. A set uses a hash table, so a membership test does not have to scan the whole list, which matters most when the item you're looking for is at the end or missing.
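Combining the suggestions above, a set-based version might look like the sketch below. The `allowed_pairs` parameter and the six sample pairs are hypothetical stand-ins for the real 250-entry paths list, and no timing claim is made; it would need measuring on the real data.

```python
def valid(s, allowed_pairs):
    """Return True if every overlapping two-character pair of s is allowed."""
    # all() stops at the first pair that is not allowed.
    return all(s[i:i + 2] in allowed_pairs for i in range(len(s) - 1))


# Hypothetical sample; the real paths list has 250 entries.
paths = ["19", "92", "20", "08", "87", "76"]
allowed = set(paths)  # build the set once, outside the hot loop

print(valid("1920876", allowed))  # True
print(valid("1920873", allowed))  # False: "73" is not in this sample set
```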

"for in" with unused variable

I need to generate a string with length x out of characters in y.
My straightforward approach was
''.join(random.choice(y) for i in xrange(x))
The problem with this is that i is unused.
Is there a better way to do this?
There is no better way; you can name the variable _ to indicate it is ignored:
''.join(random.choice(y) for _ in xrange(x))
_ is just a convention; experienced programmers reading your code will understand that it signifies 'not used' here. Python doesn't care either way.
From a performance perspective, using a list comprehension here happens to be faster:
''.join([random.choice(y) for _ in xrange(x)])
because the str.join() implementation makes two passes over its input, the first to determine the output length. To allow that second pass, any generator expression is turned into a list anyway. Using a list comprehension here short-cuts that conversion and is faster.
I would use a different function from the random package, random.sample. An example:
import string
from random import sample
y = string.ascii_letters
x = 10
xstr = "".join(sample(y, x))
No loop needed, though note that sample() never picks the same character twice.
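Here is a minimal sketch of how random.sample differs from the random.choice loop. sample draws without replacement, so it cannot return more characters than the population holds. random.choices (Python 3.6+, not mentioned in the answers above) is shown as one loop-free option that does allow repeats.

```python
import random
import string

y = string.ascii_letters
x = 10

without_repeats = "".join(random.sample(y, x))
print(len(set(without_repeats)) == x)  # True: all characters are distinct

with_repeats = "".join(random.choices(y, k=x))  # Python 3.6+
print(len(with_repeats))  # 10

try:
    random.sample("abc", 5)
except ValueError:
    print("sample cannot draw more items than the population holds")
```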

trying to parse a string and convert it to nested lists

I'm new to Python and blocking on this problem:
trying to go from a string like this:
mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'
to the nested list that is represented by the string. Basically, the reverse of
str(mylist)
If I try the obvious option
list(mystring)
it separates each character into a different element and I lose the nesting.
Is there an attribute to the list or str types that does this that I missed in the doc (I use Python 3.3)? Or do I need to code a function that does this?
Additionally, how would you go about implementing that function? I have no clue what would be required to create nested lists of arbitrary depth...
Thanks,
--Louis H.
Call the ast.literal_eval function on the string.
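A short sketch of that call on the string from the question (the name `nested` is illustrative):

```python
import ast

mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'

# literal_eval only accepts Python literals, so it does not run arbitrary code.
nested = ast.literal_eval(mystring)
print(nested)
# [[10, 20], [20, 50], [[0, 400], [50, 328], [22, 32]], 30, 12]
print(nested[2][1])  # [50, 328]
```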
To implement it by oneself, one could use a recursive function which would convert the string into a list of strings which represent lists. Then those strings would be passed to the function and so on.
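One possible shape for such a recursive function is sketched below. It walks the string character by character instead of first splitting it into substrings, and it assumes the input contains only integers, commas, brackets and whitespace. `parse_nested` and its inner helpers are hypothetical names, not library functions.

```python
def parse_nested(text):
    """Parse a string of nested integer lists (hypothetical helper)."""
    pos = 0  # index of the next unread character

    def skip_spaces():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_value():
        nonlocal pos
        skip_spaces()
        if pos < len(text) and text[pos] == '[':
            return parse_list()  # recursion handles arbitrary depth
        start = pos
        if pos < len(text) and text[pos] == '-':
            pos += 1
        while pos < len(text) and text[pos].isdigit():
            pos += 1
        if start == pos or text[start:pos] == '-':
            raise ValueError("expected a number or '[' at position {}".format(start))
        return int(text[start:pos])

    def parse_list():
        nonlocal pos
        pos += 1  # consume '['
        items = []
        skip_spaces()
        if pos < len(text) and text[pos] == ']':
            pos += 1  # empty list
            return items
        while True:
            items.append(parse_value())
            skip_spaces()
            if pos < len(text) and text[pos] == ',':
                pos += 1
            elif pos < len(text) and text[pos] == ']':
                pos += 1
                return items
            else:
                raise ValueError("expected ',' or ']' at position {}".format(pos))

    result = parse_value()
    skip_spaces()
    if pos != len(text):
        raise ValueError("unexpected trailing text at position {}".format(pos))
    return result


mystring = '[ [10, 20], [20,50], [ [0,400], [50, 328], [22, 32] ], 30, 12 ]'
print(parse_nested(mystring))
# [[10, 20], [20, 50], [[0, 400], [50, 328], [22, 32]], 30, 12]
```

In practice ast.literal_eval covers this and more; the sketch only illustrates how the recursion mirrors the nesting.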
If I try the obvious solution list(mystring) it separates each character into a different element and I lose the nesting.
This is because list() builds a list from any iterable by iterating over it, here through the string's __iter__() method. Iterating over a string yields one character at a time.
Alternatively, if you're looking for a more general conversion from strings to objects, I would suggest using the json module. It also handles dictionaries, and JSON is a tried-and-true specification used widely across the developer and web space.
import json
nested_list = json.loads(mystring)
# You can even go the other way; the round trip gives back the same list
json.loads(json.dumps(nested_list)) == nested_list
>>> True
Additionally, there are convenient functions for dealing directly with files that contain this kind of string representation:
# Instead of
data_structure = json.loads(open(filename).read())
# Just
with open(filename) as f:
    data_structure = json.load(f)
The same works in reverse with dump instead of load
If you want to know why you should use json instead of ast.literal_eval(), it's an extremely established point and you should read this question.

Python concatenate list

I'm new to python and this is just to automate something on my PC. I want to concatenate all the items in a list. The problem is that
''.join(list)
won't work as it isn't a list of strings.
This site http://www.skymind.com/~ocrow/python_string/ says the most efficient way to do it is
''.join([`num` for num in xrange(loop_count)])
but that isn't valid python...
Can someone explain the correct syntax for including this sort of loop in a string.join()?
You need to turn everything in the list into strings, using the str() constructor:
''.join(str(elem) for elem in lst)
Note that it's generally not a good idea to use list for a variable name, it'll shadow the built-in list constructor.
I've used a generator expression there to apply the str() constructor on each and every element in the list. An alternative method is to use the map() function:
''.join(map(str, lst))
The backticks in your example are another spelling of calling repr() on a value, which is subtly different from str(); you probably want the latter. Because it violates the Python principle of "There should be one-- and preferably only one --obvious way to do it.", the backticks syntax has been removed from Python 3.
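A small sketch of how str() and repr() differ when joining mixed values. The `items` list is illustrative.

```python
items = [1, 'a', 2.5]

print(''.join(str(item) for item in items))   # 1a2.5
print(''.join(repr(item) for item in items))  # 1'a'2.5
print(''.join(map(str, items)))               # 1a2.5
```

For numbers the two spellings usually agree; the quotes that repr() adds around strings are the visible difference here.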
Here is another way (discussion is about Python 2.x):
''.join(map(str, my_list))
This solution will have the fastest performance and it looks nice and simple imo. Using a generator won't be more efficient. In fact this will be more efficient, as ''.join has to allocate the exact amount of memory for the string based on the length of the elements so it will need to consume the whole generator before creating the string anyway.
Note that `` has been removed in Python 3 and it's not good practice to use it anymore; be more explicit by using str() if you have to, e.g. str(num).
Just use this; there is no need for [], and use str(num):
''.join(str(num) for num in xrange(loop_count))
For a list, just replace xrange(loop_count) with the list name.
Example:
>>> ''.join(str(num) for num in xrange(10)) #use range() in python 3.x
'0123456789'
If your Python is too old for "list comprehensions" (the odd [x for x in ...] syntax), use map():
''.join(map(str, list))
