How does this implementation of the Python sum function work? [duplicate] - python

This question already has answers here:
Understanding generators in Python
(13 answers)
What does "list comprehension" and similar mean? How does it work and how can I use it?
(5 answers)
Closed last year.
I have found an example online of how to count items in a list with the sum() function in Python; however, when I search for how to use the sum() function on the internet, all I can find is the basic sum(iterable, start), which adds numbers together from each element of the list/array.
Code I found, where each line of the file contains one word, and file = open("words.txt", "r"):
wordsInFile = sum(1 for line in file)
This works in my program, and I roughly see what is happening, but I would like to learn more about this kind of syntax and what it can or can't recognize besides line. It seems pretty efficient, but I can't find any website explaining how it works, which keeps me from using it in other contexts.

This expression is a generator expression.
First, let's write it a bit differently:
wordsInFile = sum([1 for line in file])
In this form, [1 for line in file] is called a list comprehension. It's basically a for loop which produces a list, wrapped up into one line. It's similar to
ones = []
for line in file:
    ones.append(1)
wordsInFile = sum(ones)
but a lot more concise.
Now, when we remove the brackets:
wordsInFile = sum(1 for line in file)
we get what's called a generator expression. It's basically the same as what I wrote before, except that it doesn't produce an intermediate list. It produces a special iterator object (a generator) that supplies the values on demand, which is more efficient for a function like sum that only uses its input once (and hence doesn't need us to waste a bunch of memory producing a big list).
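To make this concrete, here is a small sketch. It uses io.StringIO as a stand-in for words.txt (an assumption so it runs without a real file). It shows that line is just an ordinary loop-variable name, and that a generator expression accepts an if clause exactly like a list comprehension does:

```python
import io

# Stand-in for open("words.txt", "r"): one word per line, plus a blank line.
fake_file = io.StringIO("apple\nbanana\n\ncherry\n")

# Yields 1 for every line; sum() adds them up. "line" is only a name,
# so "_" (or "word", or anything else) works the same way.
word_count = sum(1 for _ in fake_file)

fake_file.seek(0)  # rewind: iterating the file consumed it

# The same syntax accepts an "if" clause, e.g. to skip blank lines.
non_blank = sum(1 for line in fake_file if line.strip())

print(word_count, non_blank)  # 4 3
```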


TypeError: 'list' object cannot be interpreted as an integer, when using a regex. How can I fix this?

I know there are similar threads to this question (I have looked at them already), but as a noob I cannot work out how to translate those answers to my script to make it work (4+ days of trying).
So, I have a Python script that randomly selects a subset of items from a file, along with the components of those items. I want to create two new text files as output: one with the subset of items and one with just a list of the components (ingredients) for those items.
To do this, I write the lines to the first text file (MenuOutput.txt) and then want to use a regex (re.sub) to strip the first part of the string from each line in the second file (ShoppingOutput.txt).
Now the issue: TypeError: 'list' object cannot be interpreted as an integer. I understand (I think) that the problem is that re.sub outputs a list object, but I don't know another way to strip the first part of each line of a text file. Is there a way to tweak the re.sub to make it work, or do I need another function I am unaware of?
Menu_choices = random.sample(sample_list, k=6)
MenuOutput = open('MenuOutput.txt', 'w')
for element in Menu_choices:
    MenuOutput.write(element)
MenuOutput.close()
MyFile = open('ShoppingOutput.txt', 'w')
ShoppingOutput = re.sub(r'.*?', 'I', Menu_choices)
for element in ShoppingOutput:
    MyFile.write(element)
MyFile.close
re.sub works on a single string, not on a list. Just like you loop over the list of strings to write them, you have to loop over them to perform other string manipulations on them.
with open('ShoppingOutput.txt', 'w') as my_file:
    for element in Menu_choices:
        my_file.write(re.sub(r'.*?', 'I', element))
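Notice also the switch to a with statement, which closes the file for you (in your original code, MyFile.close without parentheses never actually calls the method), and the use of snake_case for regular variables.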
Your regex seems both inexact and inefficient, though: a lazy .*? can match the empty string, so re.sub will not strip a prefix the way you intend. Probably better to just my_file.write('I' + element) and get rid of the re.sub, or perhaps replace it with a simple substring operation if the intent was to remove a prefix but you hadn't worked out the correct regex for that yet.
my_file.write('I' + element[element.index(' ')+1:])
would write 'I' followed by everything after the first space. Note that str.index raises ValueError if a line contains no space.
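If the goal was to drop the first word of each menu line, a regex anchored with ^ is one option. This is only a sketch: the line format and the sample menu_choices data below are assumptions, since the question does not show what sample_list contains.

```python
import re

# Hypothetical menu lines: a dish name followed by its ingredients.
menu_choices = ["Omelette eggs cheese\n", "Salad lettuce tomato\n", "Toast\n"]

def strip_first_word(line):
    # ^\S+ matches the leading word; [ \t]* eats the spaces after it but
    # keeps the trailing newline. count=1 replaces only the first match.
    return re.sub(r"^\S+[ \t]*", "", line, count=1)

shopping = [strip_first_word(line) for line in menu_choices]
print(shopping)  # ['eggs cheese\n', 'lettuce tomato\n', '\n']
```

Unlike element.index(' '), this does not raise on a line without a space such as "Toast\n"; it just leaves the newline.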

While loops where only conditions are executed? [duplicate]

This question already has answers here:
Remove all occurrences of a value from a list?
(26 answers)
Closed 2 years ago.
So I want to write while loops that consist only of the condition, without putting anything inside them. For example, I have a list arr from which I have to remove multiple occurrences of some element. The instant the condition raises an error, the while loop should end.
arr=[1,2,4,2,4,2,2]
This removes only one 2:
arr.remove(2)
I need to run this for as long as it does not raise an error. (In C++ you can put a semicolon right after the while condition to do this.)
I want something like this:
while(arr.remove(2));
Three things.
First, it's not considered good practice in Python (it's not "pythonic") to use an expression for its side effects. This is why, for example, plain = assignment is a statement rather than an expression; the := assignment expression added in Python 3.8 is a deliberately separate operator that can only bind plain names. (Although you can do something like a = b = 1 to set multiple variables to the same value, that statement doesn't break down as a = (b = 1); any attempt to use an = assignment as a value is a syntax error.)
Second, modifying data in place is also discouraged; it's usually better to make a copy and make the changes as the copy is constructed.
Third, even if this were a good way to do things, it wouldn't work in this case. When the remove method succeeds, it returns None, which is falsy, so your loop exits after the first removal. On the other hand, when it fails, instead of returning a false value it raises a ValueError, which will abort your whole program instead of just the loop, unless you wrap it in a try block.
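A quick sketch confirming both points about list.remove, using the list from the question:

```python
arr = [1, 2, 4, 2, 4, 2, 2]

# remove() deletes the first matching item and returns None (falsy),
# so "while arr.remove(2): pass" would stop after a single removal.
result = arr.remove(2)
print(result, arr)  # None [1, 4, 2, 4, 2, 2]

# With no match left, remove() raises ValueError instead of returning False.
raised = False
try:
    [1, 4, 4].remove(2)
except ValueError:
    raised = True
print(raised)  # True
```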
So a list comprehension such as [x for x in arr if x != 2] is probably the best solution here.
The way you are trying to solve this does not yield the results you want. Since you want to create a new list, you should not use the remove method, as per @Matthias's comment. The idiomatic way to do it would be something along these lines:
arr=[1,2,4,2,4,2,2]
arr = [x for x in arr if x != 2]
So I want to execute only while loop statements, without putting anything inside them.
That's really not necessary. Don't try to copy other languages' syntax in Python. Different languages are designed with different objectives and hence have different syntax (the grammar of the language). Python has a different way of doing things than C++.
If you want to focus on the effectiveness of the program, then that's a different story. See this for more information.
Unfortunately, remove doesn't return anything useful (it returns None), so you can't write anything neat and clean without putting something inside the while loop.
Pythonic ways to remove all occurrences of an element from a list:
list(filter((2).__ne__, arr))
Or
arr = [x for x in arr if x != 2]
Or
while 2 in arr:
    arr.remove(2)
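One difference between these options: the comprehension and filter versions build a new list and rebind the name, while the while loop mutates the existing list. If other names refer to the same list, slice assignment (not mentioned in the answers here) combines comprehension-style code with in-place semantics. A minimal sketch:

```python
original = [1, 2, 4, 2, 4, 2, 2]
alias = original  # a second name for the same list object

# Rebinding: "original" now names a new list; "alias" still sees the old one.
original = [x for x in original if x != 2]
print(original, alias)  # [1, 4, 4] [1, 2, 4, 2, 4, 2, 2]

arr = [1, 2, 4, 2, 4, 2, 2]
same = arr

# Slice assignment replaces the contents of the existing list in place,
# so every name that refers to it sees the change.
arr[:] = [x for x in arr if x != 2]
print(arr, same)  # [1, 4, 4] [1, 4, 4]
```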
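You can use:
arr = [1,2,4,2,4,2,2]
try:
    # pop() returns the removed 2 (truthy), so the loop keeps going until
    # index() raises ValueError; a falsy value such as 0 would stop it early
    while arr.pop(arr.index(2)):
        pass
except ValueError:
    pass
print(arr)
#[1, 4, 4]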
I am assuming you want to remove all occurrences of an element. This link might help you.

unpacking a split inside a list comprehension [duplicate]

This question already has an answer here:
List comprehension with repeated computation
(1 answer)
Closed 5 years ago.
If I want to generate a list of tuples based on elements of the lines of a document, I can do:
[(line.split()[0], line.split()[-1][3:8]) for line in open("doc.txt")]
for example (I added the slicing to show that I might want to apply some operations to the elements of the split).
Still, I would like to avoid calling split twice, because that's inefficient.
So I wanted to use something like unpacking, with
[(linesplit0, linesplit1[3:8]) for line in open("doc.txt") for (linesplit0, linesplit1) in line.split()]
but that can't work, since the split does not produce tuples, so unpacking each element of the split into two names fails.
What I would like is something that lets me use a placeholder name for the list resulting from the split (like splittedlist or whatever), which could be used with indexing (splittedlist[0]), unpacking, or both, and which would be compatible with list comprehension syntax.
Is it feasible?
You can use map (Python 3) or itertools.imap (Python 2) over open:
[(line[0], line[-1][3:8]) for line in map(str.split, open("doc.txt"))]
or use a generator expression:
[(line[0], line[-1][3:8]) for line in ( l.split() for l in open("doc.txt"))]
You can use map with the unbound method str.split:
[(linesplit[0], linesplit[-1][3:8]) for linesplit in map(str.split, open("doc.txt"))]
However, I'd stay away from these and use a generator function instead:
def read_input(filename):
    with open(filename) as f:
        for line in f:
            parts = line.split()
            yield parts[0], parts[-1][3:8]
It is a bit more code, but it is easier to follow (and readability counts), and the user can choose between using read_input('doc.txt') as is or wrapping it in list() if needed.
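If you do want to stay inside a single comprehension, two idioms give the split result a name. This sketch uses io.StringIO with made-up content in place of doc.txt; the walrus version needs Python 3.8+ and also skips blank lines, because an empty list is falsy.

```python
import io

# Stand-in for open("doc.txt"); the content is invented for the example.
doc = io.StringIO("alpha x abc12345678\nbeta y xyzABCDEFGH\n")

# "for parts in [line.split()]" binds a name to the split result once
# per line; it works in any Python 3 version.
pairs = [(parts[0], parts[-1][3:8])
         for line in doc
         for parts in [line.split()]]
print(pairs)  # [('alpha', '12345'), ('beta', 'ABCDE')]

doc.seek(0)  # rewind for the second version

# Python 3.8+: an assignment expression in the "if" clause.
pairs2 = [(parts[0], parts[-1][3:8])
          for line in doc
          if (parts := line.split())]
```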

Compound conditions in a for loop

Python allows an "if" condition in list comprehensions, e.g.:
[l for l in lines if l.startswith('example')]
This feature is missing from the regular for loop, so in the absence of:
for line in lines if line.startswith('example'):
    statements
one needs to check the condition inside the loop:
for line in lines:
    if line.startswith('example'):
        statements
or to embed a comprehension, like:
for line in [l for l in lines if l.startswith('example')]:
    statements
Is my understanding correct? Is there a better or more pythonic way than ones I listed above to achieve the same result of adding a condition in the for loop?
Please note that "lines" was chosen just as an example; any collection or generator could be there.
Several nice ideas came from other answers and comments, but I think this recent discussion on Python-ideas and its continuation are the best answer to this question.
To summarize: the idea has already been discussed in the past, and the benefits did not seem enough to justify the syntax change, considering:
increased language complexity and impact on the learning curve
technical changes in all implementations: CPython, Jython, PyPy, etc.
possible weird situations that extreme use of the syntax could lead to
One point that people seemed to weigh heavily was avoiding Perl-like complexity that compromises maintainability.
This message and this one nicely summarize possible alternatives to a compound if-statement in a for loop (most of which have already appeared on this page as well):
lines = ['example one', 'sample two', 'example three']  # sample data
def body(line):
    print(line)  # placeholder for the real loop body

# nested if
for l in lines:
    if l.startswith('example'):
        body(l)
# continue, to put an accent on the exceptional case
for l in lines:
    if not l.startswith('example'):
        continue
    body(l)
# hacky way of generator expression
# (better than a list comprehension, as it does not store a list)
for l in (l for l in lines if l.startswith('example')):
    body(l)
# and its named version
def gen(lines):
    return (l for l in lines if l.startswith('example'))
for line in gen(lines):
    body(line)
# functional style
for line in filter(lambda l: l.startswith('example'), lines):
    body(line)
Maybe not Pythonic, but you could filter the lines.
for line in filter(lambda l: l.startswith('example'), lines):
    print(line)
And you could define your own filter function, of course, if that lambda bothers you, or you want more complex filtering.
def my_filter(line):
    return line.startswith('example') or line.startswith('sample')
for line in filter(my_filter, lines):
    print(line)
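I would say that having the condition within the loop is better than building a filtered list first, because you aren't keeping that list in memory as you iterate over the lines. (In Python 3, filter and generator expressions are lazy as well, so they avoid that cost too.)
So that would just be:
for line in file:
    if not my_filter(line):
        continue
    # statements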
It's not that the feature is missing; I can't think of any way it could be done except in some special cases. (l for l in lines if l.startswith('example')) is a generator object, and the l variable is local to that object. The for only sees what is returned by the generator's __next__ method.
The for is very different because the result of the generator needs to be bound to a variable in the caller's scope. You could have written
for line in (line for line in lines if line.startswith('example')):
    foo(line)
safely, because those two line variables are in different scopes.
Further, the generator doesn't have to return just its local variable. It can evaluate any expression. How would you shortcut this?
for line in (foo(line)+'bar' for line in lines if line.startswith('example')):
    statements
Suppose you have a list of lists:
for l in (l[:] for l in list_of_lists if l):
    l.append('modified')
That shouldn't append to the original lists, and it doesn't, because l[:] hands the loop a copy of each non-empty list.
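A small sketch with made-up data that checks this: the loop appends only to the copies, so the original lists stay as they were.

```python
list_of_lists = [[1], [], [2, 3]]  # made-up sample data
copies = []

# l[:] makes a shallow copy; "if l" skips the empty list.
for l in (l[:] for l in list_of_lists if l):
    l.append('modified')
    copies.append(l)

print(list_of_lists)  # [[1], [], [2, 3]]
print(copies)         # [[1, 'modified'], [2, 3, 'modified']]
```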
Is there a better or more pythonic way than ones I listed above to achieve the same result of adding a condition in the for loop?
No, there is not, and there shouldn't be; that was the rationale for why list comprehensions got here in the first place. From the corresponding PEP (PEP 202):
List comprehensions provide a more concise way to create lists in situations where map() and filter() and/or nested loops would currently be used.
List comprehensions constitute an alternative to nested for and if statements; why would you want an alternative to the alternative?
If you need to use an if with a for, you nest it inside; if you don't want to do that, you use a list comprehension. Flat is better than nested, but readability counts; allowing an if there would result in long, ugly lines that are harder to parse visually.

How to loop exact N times in Python? [duplicate]

This question already has answers here:
Is it possible to implement a Python for range loop without an iterator variable?
(15 answers)
Closed 8 years ago.
For example, I need to read a file by calling readline 10 times.
with open("input") as input_file:
    for i in range(10):
        line = input_file.readline()
        # Process the line here
Using range to control the number of iterations is a very common technique. The only downside is that there's an unused i variable.
Is this the best I can get from Python? Any better ideas?
P.S. In Ruby we can do:
3.times do
  puts "This will be printed 3 times"
end
This is elegant and expresses the intention very clearly.
That's pretty much the best option for looping over something a specific amount of times.
A common idiom is to write for _ in range(10) instead, using _ as a conventional placeholder name for a variable you don't use.
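If you want something closer to Ruby's 3.times, a small helper can hide the loop. The times() function below is hypothetical, not part of the standard library; it is just a sketch of the idea built on the _ idiom:

```python
def times(n, action):
    """Call action() n times; the counter stays hidden inside the helper."""
    for _ in range(n):  # "_" signals that the loop variable is unused
        action()

messages = []
times(3, lambda: messages.append("This will be printed 3 times"))
for message in messages:
    print(message)
```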
Use islice from itertools:
from itertools import islice
with open("input", 'r') as input_file:
    for line in islice(input_file, 10):
        pass  # process line here
Since you can iterate over the lines of a file directly, there is no need to call input_file.readline().
See the documentation for itertools.islice.
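One practical difference is what happens with a short file. In this sketch, io.StringIO stands in for the input file (an assumption so it runs without a real file) and holds only three lines:

```python
import io
from itertools import islice

# Stand-in for open("input"): only 3 lines, fewer than the 10 requested.
input_file = io.StringIO("one\ntwo\nthree\n")

# islice stops after 10 items or when the file runs out, whichever comes
# first, whereas readline() would keep returning "" past the end.
lines = [line.rstrip("\n") for line in islice(input_file, 10)]
print(lines)  # ['one', 'two', 'three']
```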
There is no equivalent construct in python.
(Incidentally, even in the Ruby example, there is still a need for a counter! The language simply hides it from you. It is also possible that the interpreter is generating bytecode with those lines repeated multiple times, rather than a loop, but that is unlikely.)
The solution you already provided is pretty much the best one.
However, if you really wanted to, you could abstract it away using itertools.islice so you don't need to see an unused variable:
import itertools
def take(iterable, amount):
    # lazily yield at most `amount` items from `iterable`
    return itertools.islice(iterable, 0, amount)
with open("inputfile.txt") as file:
    # iterate the file object directly; file.readlines() would first
    # read the whole file into memory
    for line in take(file, 3):
        print(line)
