How to loop exact N times in Python? [duplicate] - python

This question already has answers here:
Is it possible to implement a Python for range loop without an iterator variable?
(15 answers)
Closed 8 years ago.
For example, I need to read a file by calling readline 10 times.
with open("input") as input_file:
    for i in range(10):
        line = input_file.readline()
        # Process the line here
Using range to control the number of iterations is a very common technique. The only downside is the unused i variable.
Is this the best I can get from Python? Any better ideas?
P.S. In Ruby we can do:
3.times do
  puts "This will be printed 3 times"
end
This is elegant and expresses the intention very clearly.

That's pretty much the best option for looping a specific number of times.
A common idiom is to write for _ in ... instead, using _ as a placeholder for the unused variable.
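Applied to the code from the question, that idiom looks like this:
with open("input") as input_file:
    for _ in range(10):
        line = input_file.readline()
        # Process the line here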

Use islice from itertools
from itertools import islice
with open("input", 'r') as input_file:
for line in islice(input_file, 10):
#process line
Since you can iterate over the lines of a file directly, there is no need to call input_file.readline().
See the documentation for itertools.islice.

There is no equivalent construct in Python.
(Incidentally, even in the Ruby example there is still a counter; the language simply hides it from you. It is also possible that the interpreter generates bytecode with those lines repeated multiple times rather than a loop, but that is unlikely.)

The solution you already provided is pretty much the best one.
However, if you really wanted to, you could abstract it away with itertools.islice so you never see an unused variable:
import itertools

def take(iterable, amount):
    return itertools.islice(iterable, 0, amount)

with open("inputfile.txt") as file:
    for line in take(file.readlines(), 3):
        print(line)

Related

How does this implementation of the Python sum function work? [duplicate]

This question already has answers here:
Understanding generators in Python
(13 answers)
What does "list comprehension" and similar mean? How does it work and how can I use it?
(5 answers)
Closed last year.
I have found an example online of how to count items in a list with the sum() function in Python; however, when I search for how to use sum(), all I can find is the basic sum(iterable, start), which adds up the elements of a list/array.
Code I found, where each line of the file contains one word, and file = open("words.txt", "r"):
wordsInFile = sum(1 for line in file)
This works in my program, and I roughly see what is happening, but I would like to learn more about this kind of syntax and what it can or can't recognize besides line. It seems pretty efficient, but I can't find any website explaining how it works, which keeps me from using it in other contexts in the future.
This expression is a generator.
First, let's write it a bit differently
wordsInFile = sum([1 for line in file])
In this form, [1 for line in file] is called a list comprehension. It's basically a for loop which produces a list, wrapped up into one line. It's similar to
wordsInFile = []
for line in file:
    wordsInFile.append(1)
but a lot more concise.
Now, when we remove the brackets
wordsInFile = sum(1 for line in file)
we get what's called a generator expression. It's basically the same as what I wrote before except that it doesn't produce an intermediate list. It produces a special iterator object that supplies the values on-demand, which is more efficient for a function like sum that only uses its input once (and hence doesn't need us to waste a bunch of memory producing a big list).
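Here is a small side-by-side illustration of the two forms (assuming a file named words.txt exists); the results are identical, only the memory behaviour differs:
# List comprehension: builds the full list of 1s in memory first
with open("words.txt") as f:
    count_list = sum([1 for line in f])

# Generator expression: hands sum() one value at a time, no intermediate list
with open("words.txt") as f:
    count_gen = sum(1 for line in f)

# The same pattern works with a condition, e.g. counting only non-blank lines
with open("words.txt") as f:
    non_blank = sum(1 for line in f if line.strip())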

Find the intersection of two sorted lists of integers using sys in Python 3

If you have done HireVue coding challenges, you know that the inputs are already in the system and should be read with:
import sys (I guess)
How should I write the code that reads the raw input to find the intersection between the 2 lists?
An intersection is simply a common value between the 2 lists.
My attempt:
def intersection(lst1, lst2):
    lst3 = [value for value in lst1 if value in lst2]
    return lst3
The problem is that I don't know how to put in the code the command that reads the raw input.
Just use the intersection operator of a set:
def intersection(lst1, lst2):
    return list(set(lst1) & set(lst2))
If you have import sys and you need to read the inputs yourself, one way I am aware of is using sys.stdin.
sys.stdin is Python's standard input stream; it normally reads from the terminal or console running the program, but it can be redirected to a different file as well. It may be that your coding platform has redirected it to reference their own input.
With sys.stdin, you have methods that can help you in taking inputs from stdin such as:
sys.stdin.read(): reads the whole input (or up to a given number of characters)
sys.stdin.readline(): reads a single line from sys.stdin
sys.stdin.readlines(): reads all remaining lines from sys.stdin
To see all its available methods, run import sys; help(sys.stdin) in Python 3.
Now, for your required intersection code, [val for val in lst1 if val in lst2] might work as expected, but it will take more time, since the in operator on a list searches lst2 in O(n) time for every val; this can be optimised by using sets, as mentioned in the other answers here. A sketch combining stdin input with the set approach is below.
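A minimal sketch, assuming each list arrives as one line of space-separated integers on stdin (adjust the parsing to whatever format the platform actually uses):
import sys

def intersection(lst1, lst2):
    # set membership is O(1) on average, so this beats the nested 'in' check
    return list(set(lst1) & set(lst2))

lines = sys.stdin.read().splitlines()
lst1 = [int(x) for x in lines[0].split()]
lst2 = [int(x) for x in lines[1].split()]
print(intersection(lst1, lst2))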
I hope this helps. I was initially planning to leave a comment, but with under 50 reputation for now I had no choice but to answer your question separately. Let me know by commenting below if anything is incorrect or unclear and we can update it.

Is there a way to improve my code that generates a txt file with 10^8 lines from the return of the itertools.product function?

I want to create a text file filled with lines that are the result of a permutation with repetition of 10 numbers in 8 possible positions, and I am using the itertools.product function because it returns exactly what I need. The problem is that the script takes too long and (I suppose) a lot of resources, mainly CPU time.
I have the following code:
from itertools import product
F = open("dic.txt", "w")
for option in product([0,1,2,3,4,5,6,7,8,9], repeat=8):
line = str()
for number in option:
line += str(number)
line += "\n"
F.write(line)
F.close()
It works perfectly if the repeat argument is just 5.
Edit: option is a tuple, which is why I loop over it again.
This is tricky to answer, because I think there are easier ways to accomplish your overall goal (which we still don't know). But your code could be made better like this:
from itertools import product
with open("dic.txt", "w") as f:
for option in product([0,1,2,3,4,5,6,7,8,9], repeat=8):
f.write("{}\n".format("".join(str(o) for o in option)))
You can also replace your list of choices with a range:
from itertools import product
with open("dic.txt", "w") as f:
for option in product(range(10), repeat=8):
f.write("{}\n".format("".join(str(o) for o in option)))
You could also accumulate a larger number of lines before doing the write.
Simply keep appending to the same string, e.g. 1000 lines at a time, before calling F.write(line). That way you save some of the overhead of writing to the file again and again. A sketch of this batching approach is shown below.
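A minimal sketch of that batching idea (the batch size of 1000 is an arbitrary assumption; note that the file object already buffers writes internally, so the gain from manual batching may be modest):
from itertools import product

BATCH = 1000
with open("dic.txt", "w") as f:
    buffer = []
    for option in product(range(10), repeat=8):
        buffer.append("".join(str(o) for o in option))
        if len(buffer) == BATCH:
            f.write("\n".join(buffer) + "\n")
            buffer = []
    if buffer:  # flush whatever is left over
        f.write("\n".join(buffer) + "\n")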
Maximum speed option, using CPython implementation details: push all the work to C, let product reuse its result tuple by mapping over the output so each tuple is converted and released before the next one is requested, and work with ints and raw bytes to avoid both the repeated int-to-str conversion and all encoding overhead:
from itertools import product
with open("dic.txt", "wb") as f:
f.writelines(map(b'%b\n'.__mod__, map(bytes, product(range(b'0'[0], b'9'[0]+1), repeat=8)))
Some of these optimizations might be slightly wrong for your use case (e.g. if you're on Windows and want line ending conversions, if your locale isn't ASCII compatible, etc.), but they're all relatively easy to workaround; this is just one of the simplest examples, for ASCII compatible locales where plain newlines are acceptable.
Try
F = open("dic.txt", "w")
F.write('\n'.join([str(option) for option in product([0,1,2,3,4,5,6,7,8,9], repeat=8)]))
F.close()

Compound conditions in a for loop

Python allows an "if" condition in list comprehensions, e.g.:
[l for l in lines if l.startswith('example')]
This feature is missing from the regular "for" loop, so in the absence of:
for line in lines if line.startswith('example'):
    statements
one needs to test the condition inside the loop:
for line in lines:
    if line.startswith('example'):
        statements
or to embed a comprehension, like:
for line in [l for l in lines if l.startswith('example')]:
    statements
Is my understanding correct? Is there a better or more Pythonic way than the ones I listed above to achieve the same result of adding a condition to the for loop?
Please note that "lines" was chosen just as an example; it could be any collection or generator.
Several nice ideas came from other answers and comments, but I think this recent discussion on Python-ideas and its continuation are the best answer to this question.
To summarize: the idea was already discussed in the past, and the benefits did not seem enough to motivate the syntax change, considering:
increased language complexity and impact on the learning curve
technical changes in all implementations: CPython, Jython, PyPy...
possible weird situations that extreme use of the syntax could lead to
One point that people seem to weigh heavily is avoiding Perl-like complexity that compromises maintainability.
This message and this one nicely summarize the possible alternatives (most of which have already appeared on this page) to a compound if-statement in a for-loop:
# nested if
for l in lines:
    if l.startswith('example'):
        body

# continue, to put an accent on the exceptional case
for l in lines:
    if not l.startswith('example'):
        continue
    body

# hacky way: generator expression
# (better than a list comprehension, as it does not store a list)
for l in (l for l in lines if l.startswith('example')):
    body()

# and its named version
def gen(lines):
    return (l for l in lines if l.startswith('example'))

for line in gen(lines):
    body

# functional style
for line in filter(lambda l: l.startswith('example'), lines):
    body()
Maybe not Pythonic, but you could filter the lines.
for line in filter(lambda l: l.startswith('example'), lines):
    print(line)
And you could define your own filter function, of course, if that lambda bothers you, or you want more complex filtering.
def my_filter(line):
    return line.startswith('example') or line.startswith('sample')

for line in filter(my_filter, lines):
    print(line)
I would say that having the condition within the loop is better because you aren't maintaining the "filtered" list in memory as you iterate over the lines.
So, that'd just be
for line in file:
    if not my_filter(line):
        continue
    # statements
It's not that the feature is missing; I can't think of any way it could be done except in some special cases. (l for l in lines if l.startswith('example')) is a generator object, and the l variable is local to that object. The for only sees what is returned by the generator's __next__ method.
The for is very different because the result of the generator needs to be bound to a variable in the caller's scope. You could have written
for line in (line for line in lines if line.startswith('example')):
    foo(line)
safely, because those two line variables are in different scopes.
Further, the generator doesn't have to return just its local variable. It can evaluate any expression. How would you shortcut this?
for line in (foo(line) + 'bar' for line in lines if line.startswith('example')):
    statements
Suppose you have a list of lists:
for l in (l[:] for l in list_of_lists if l):
    l.append('modified')
That shouldn't append to the original lists.
Is there a better or more pythonic way than ones I listed above to achieve the same result of adding a condition in the for loop?
No, there is not, and there shouldn't be; that was the rationale for why list comprehensions got here in the first place. From the corresponding PEP:
List comprehensions provide a more concise way to create lists in situations where map() and filter() and/or nested loops would currently be used.
List comprehensions are an alternative to nested fors and ifs; why would you want an alternative to the alternative?
If you need to use an if with a for, you nest it inside; if you don't want to do that, you use a list comprehension. Flat is better than nested, but readability counts; allowing an if there would result in long, ugly lines that are harder to parse visually.

Use readlines() with indices or parse lines on the fly?

I'm making a simple test function that asserts that the output from an interpreter I'm developing is correct, by reading the expression to evaluate and the expected result from a file, much like Python's doctest. This is for Scheme, so an example of an input file would be
> 42
42
> (+ 1 2 3)
6
My first attempt for a function that can parse such a file looks like the following, and it seems to work as expected:
import os

def run_test(filename):
    interp = Interpreter()
    response_next = False
    num_tests = 0
    with open(filename) as f:
        for line in f:
            if response_next:
                assert response == line.rstrip('\n')
                response_next = False
            elif line.startswith('> '):
                num_tests += 1
                response = interp.eval(line[2:])
                response = str(response) if response else ''
                response_next = True
    print "{:20} Ran {} tests successfully".format(os.path.basename(filename),
                                                   num_tests)
I wanted to improve it slightly by removing the response_next flag, as I am not a fan of such flags, and instead read in the next line within the elif block with next(f). I had a small unrelated question regarding that which I asked about in IRC at freenode. I got the help I wanted but I was also given the suggestion to use f.readlines() instead, and then use indexing on the resulting list. (I was also told that I could use groupby() in itertools for the pairwise lines, but I'll investigate that approach later.)
Now to the question: I was very curious why that approach would be better, but my Internet connection was flaky on the train and I was unable to ask, so I'll ask it here instead. Why would it be better to read everything with readlines() instead of parsing each line as it is read, on the fly?
I'm really wondering, because my feeling is the opposite: it seems cleaner to parse the lines one at a time so that everything is finished in one pass. I usually avoid using indices into arrays in Python and prefer to work with iterators and generators. Maybe it is impossible to answer and to guess what the person was thinking in case it was a subjective opinion, but if there is a general recommendation I'd be happy to hear about it.
It's certainly more Pythonic to process input iteratively rather than reading the whole input at once; for example, this will work if the input is a console.
An argument in favour of reading a whole array and indexing is that using next(f) could be unclear when combined with a for loop; the options there would be either to replace the for loop with a while True or to fully document that you are calling next on f within the loop:
try:
    while True:
        test = next(f)
        response = next(f)
except StopIteration:
    pass
As Jonas suggests you could accomplish this (if you're sure that the input will always consist of lines test/response/test/response etc.) by zipping the input with itself:
for test, response in zip(f, f): # Python 3
for test, response in itertools.izip(f, f): # Python 2
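A minimal sketch of the whole loop built on that pairing trick (Interpreter is the class from the question, and this assumes the file really does alternate strictly between test and response lines):
def run_test(filename):
    interp = Interpreter()
    num_tests = 0
    with open(filename) as f:
        for test, response in zip(f, f):
            num_tests += 1
            result = interp.eval(test[2:])
            result = str(result) if result else ''
            assert result == response.rstrip('\n')
    print("Ran {} tests successfully".format(num_tests))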
Reading everything into an array gives you the equivalent of random access: You use an array index to move down the array, and at any time you can check what's next and back up if necessary.
If you can carry out your task without backing up, you don't need the random access and it would be cleaner to do without it. In your examples, it seems that your syntax is always a single-line (?) expression followed by the expected response. So, I'd have written a top-level loop that iterates once per expression-value pair, reading lines as necessary.
If you want to support multi-line expressions and results, you can write separate functions to read each one: one that reads a complete expression, and one that reads a result (up to the next blank line). The important thing is that they should be able to consume as much input as they need and leave the input pointer in a reasonable state for the next read. A sketch of that structure follows.
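A rough sketch of that structure under simplifying assumptions (read_expression and read_result are hypothetical helpers; here an expression is still a single '> ' line, and a result is everything up to the next blank line or end of file):
def read_expression(f):
    # Hypothetical helper: returns the expression text, or None at end of file.
    line = f.readline()
    if not line:
        return None
    return line[2:].rstrip('\n')

def read_result(f):
    # Hypothetical helper: collects result lines up to the next blank line.
    lines = []
    for line in f:
        if not line.strip():
            break
        lines.append(line.rstrip('\n'))
    return '\n'.join(lines)

def run_test(filename):
    interp = Interpreter()
    num_tests = 0
    with open(filename) as f:
        while True:
            expr = read_expression(f)
            if expr is None:
                break
            num_tests += 1
            assert str(interp.eval(expr)) == read_result(f)
    print("Ran {} tests".format(num_tests))
Both helpers are only illustrative; the exact boundaries depend on how multi-line expressions and results are delimited in the test files.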
from itertools import ifilter, imap

def run_test(filename):
    interp = Interpreter()
    num_tests, num_passed, last_result = 0, 0, None
    with open(filename) as f:
        # iterate over non-blank lines
        for line in ifilter(None, imap(str.strip, f)):
            if line.startswith('> '):
                last_result = interp.eval(line[2:])
            else:
                num_tests += 1
                try:
                    assert line == repr(last_result)
                except AssertionError, e:
                    print e.message
                else:
                    num_passed += 1
    print("Ran {} tests, {} passed".format(num_tests, num_passed))
... this simply assumes that any result line refers to the preceding test.
I would avoid .readlines() unless you get some specific benefit from having the whole file available at once.
I also changed the comparison to look at the repr of the result, so it can distinguish between output types, i.e.
'6' + '2'
> '62'
60 + 2
> 62
