This question already has an answer here:
List comprehension with repeated computation
(1 answer)
Closed 5 years ago.
If I want to generate a list of tuples based on elements of the lines of a document, I can do:
[(line.split()[0], line.split()[-1][3:8]) for line in open("doc.txt")]
for example (I added the slicing to show that I might want to apply some operations to the elements of the split).
Still, I would like to avoid calling split twice, because that's inefficient.
So I wanted to use something like unpacking, with
[(linesplit0, linesplit1[3:8]) for line in open("doc.txt") for (linesplit0, linesplit1) in line.split()]
but that can't work, since the split does not produce tuples, so each element of the split would be short one value to unpack.
What I would like is something that lets me use a placeholder name for the list resulting from the split (like splittedlist or whatever), that could be used with indexing (splittedlist[0]), unpacking, or both, and that would be compatible with the list comprehension syntax.
Is it feasible?
You can use map (Python 3) or itertools.imap (Python 2) over the open file:
[(line[0], line[-1][3:8]) for line in map(str.split, open("doc.txt"))]
or use a generator:
[(line[0], line[-1][3:8]) for line in ( l.split() for l in open("doc.txt"))]
You can use map with the unbound method str.split:
[(linesplit[0], linesplit[-1][3:8]) for linesplit in map(str.split, open("doc.txt"))]
However I'd stay away from these; I'd instead use a generator:
def read_input(filename):
    with open(filename) as f:
        for line in f:
            parts = line.split()
            yield parts[0], parts[-1][3:8]
It might be a bit more code, but it is easier to follow - and readability counts - and the caller has a choice between iterating over read_input('doc.txt') directly or wrapping it in a list if needed.
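For instance, a minimal usage sketch (assuming doc.txt exists and its lines are whitespace-separated as in the question):

for first_word, tail in read_input('doc.txt'):   # lazy, one line at a time
    print(first_word, tail)

pairs = list(read_input('doc.txt'))              # or materialize it when a list is needed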
This question already has answers here:
Understanding generators in Python
(13 answers)
What does "list comprehension" and similar mean? How does it work and how can I use it?
(5 answers)
Closed last year.
I have found an example online of how to count items in a list with the sum() function in Python; however, when I search for how to use the sum() function on the internet, all I can find is the basic sum(iterable, start), which adds numbers together from each element of the list/array.
Code I found, where each line of the file contains one word, and file = open("words.txt", "r"):
wordsInFile = sum(1 for line in file)
This works in my program, and I roughly see what is happening, but I would like to learn more about this kind of syntax and what it can or can't do besides line. It seems pretty efficient, but I can't find any page explaining how it works, which keeps me from using it in other contexts in the future.
The 1 for line in file part is a generator expression.
First, let's write it a bit differently
wordsInFile = sum([1 for line in file])
In this form, [1 for line in file] is called a list comprehension. It's basically a for loop which produces a list, wrapped up into one line. It's similar to
wordsInFile = []
for line in file:
    wordsInFile.append(1)
but a lot more concise.
Now, when we remove the brackets
wordsInFile = sum(1 for line in file)
we get what's called a generator expression. It's basically the same as what I wrote before except that it doesn't produce an intermediate list. It produces a special iterator object that supplies the values on-demand, which is more efficient for a function like sum that only uses its input once (and hence doesn't need us to waste a bunch of memory producing a big list).
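For intuition, here is a rough explicit-loop equivalent of the generator-expression version (a sketch of the idea, not literally what sum executes internally):

wordsInFile = 0
for line in file:        # each line contributes the value 1
    wordsInFile += 1     # sum accumulates those 1s into a running total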
I know there are similar threads to this question (having looked at them already) but I cannot, as a noob, work out how to translate those answers across to adjust my script to make it work (4+ days of trying).
So... I have a Python script to randomly select a subset of items from a file, along with the components of those items. I want to create two new txt files as output: one with the subset of items and one with just a list of components (Ingredients) for those items.
To do this I write lines to the first txt file (MenuOutput.txt) and then want to use a regex (re.sub) to strip out the first part of the string from each line in the second file (ShoppingOutput.txt).
Now the issue: I get TypeError: 'list' object cannot be interpreted as an integer. I understand (I think) that the problem is that re.sub is working with a list object. But I don't know another way to strip the first part of each line from a text file. Is there a way of tweaking the re.sub to make it work, or do I need another function I am unaware of?
Menu_choices = random.sample(sample_list, k=6)
MenuOutput = open('MenuOutput.txt', 'w')
for element in Menu_choices:
    MenuOutput.write(element)
MenuOutput.close()

MyFile = open('ShoppingOutput.txt', 'w')
ShoppingOutput = re.sub(r'.*?', 'I', Menu_choices)
for element in ShoppingOutput:
    MyFile.write(element)
MyFile.close
Just like you loop over the list of strings to write them, you have to loop over them to perform other string manipulations on them.
with open('ShoppingOutput.txt', 'w') as my_file:
    for element in Menu_choices:
        my_file.write(re.sub(r'.*?', 'I', element))
Notice also the upgrade to a with statement, and using snake_case for regular variables.
Your regex seems both inexact and inefficient, though. It's probably better to just my_file.write('I' + element) and get rid of the effectively no-op re.sub, or perhaps to replace it with a simple substring operation if the intent was to remove a prefix and you hadn't worked out the correct regex for that yet.
my_file.write('I' + element[element.index(' ')+1:])
would write everything after the first space.
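If the intent really is "drop everything up to and including the first space", str.partition is a regex-free alternative. This is only a sketch of that assumption, with a fallback for lines that contain no space:

head, sep, rest = element.partition(' ')          # split once, on the first space
my_file.write('I' + (rest if sep else element))   # fall back to the whole string if no space was found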
I need to write an array of integers into a text file, but my formatting solution adds a comma after each item and I'd like to avoid the last one.
The code looks like this:
with open(name, 'a+') as f:
    line = ['FOO ', description, '|Bar|']
    f.writelines(line)
    f.writelines("%d," % item for item in values)
    f.writelines('\n')
Each line starts with a small description of what the array to follow contains, and then a list of integers. New lines are added in the loop as they become available.
The output I get looks something like this:
FOO description|Bar|274,549,549,824,824,824,824,824,794,765,765,736,736,736,736,736,
And I would like to have it look like this, without the last comma:
FOO description|Bar|274,549,549,824,824,824,824,824,794,765,765,736,736,736,736,736
I was unable to find a solution that works with writelines(), and I need to avoid lengthy processing in additional loops.
Use join:
f.writelines(",".join(map(str,values)))
Note that the values are first mapped to strings (rather than left as numbers) with map, since join only accepts strings.
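Put back into the shape of the original snippet, that could look like this (a sketch reusing the name, description and values variables from the question):

with open(name, 'a+') as f:
    f.writelines(['FOO ', description, '|Bar|'])
    f.write(",".join(map(str, values)))   # comma-separated, no trailing comma
    f.write('\n')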
You can slice it as in the example below; slicing with [:-1] always drops the last character, so build the comma-separated string first and then cut off the trailing comma:
numbers = "".join("%d," % item for item in values)   # e.g. "274,549,...,736,"
f.write(numbers[:-1])                                # drop the trailing comma
Slicing is the best approach and works well, at least in your case; once the numbers are joined into one string, write it without its last character:
f.writelines(numbers[:-1])
You can use the print function here.
print(*values,sep=',',file=f)
If you are using Python 2, import the print function first:
from __future__ import print_function
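A sketch of the whole line written with print, reusing the variables assumed in the question:

from __future__ import print_function   # only needed on Python 2

with open(name, 'a+') as f:
    print('FOO ', description, '|Bar|', sep='', end='', file=f)   # prefix, no newline yet
    print(*values, sep=',', file=f)                               # integers joined by commas, newline at the end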
Python allows an "if" condition in list comprehensions, e.g.:
[l for l in lines if l.startswith('example')]
This feature is missing from the regular "for" loop, so in the absence of:
for line in lines if line.startswith('example'):
    statements
one needs to test the condition inside the loop:
for line in lines:
    if line.startswith('example'):
        statements
or to embed a comprehension, like:
for line in [l for l in lines if l.startswith('example')]:
    statements
Is my understanding correct? Is there a better or more Pythonic way than the ones I listed above to achieve the same result of adding a condition to the for loop?
Please note that "lines" was chosen just as an example; any collection or generator could be there.
Several nice ideas came from other answers and comments, but I think this recent discussion on Python-ideas and its continuation are the best answer to this question.
To summarize: the idea was already discussed in the past, and the benefits did not seem enough to motivate the syntax change, considering:
increased language complexity and impact on learning curve
technical changes in all implementations: CPython, Jython, Pypy..
possible weird situations that extreme use of the syntax could lead to
One point that people seem to weigh heavily is avoiding Perl-like complexity that compromises maintainability.
This message and this one nicely summarize the possible alternatives (most of which have already appeared on this page as well) to a compound if condition in a for loop:
# nested if
for l in lines:
    if l.startswith('example'):
        body

# continue, to put an accent on exceptional case
for l in lines:
    if not l.startswith('example'):
        continue
    body

# hacky way of generator expression
# (better than comprehension as does not store a list)
for l in (l for l in lines if l.startswith('example')):
    body()

# and its named version
def gen(lines):
    return (l for l in lines if l.startswith('example'))

for line in gen(lines):
    body

# functional style
for line in filter(lambda l: l.startswith('example'), lines):
    body()
Maybe not Pythonic, but you could filter the lines.
for line in filter(lambda l: l.startswith('example'), lines):
    print(line)
And you could define your own filter function, of course, if that lambda bothers you, or you want more complex filtering.
def my_filter(line):
    return line.startswith('example') or line.startswith('sample')

for line in filter(my_filter, lines):
    print(line)
I would say that having the condition within the loop is better because you aren't maintaining the "filtered" list in memory as you iterate over the lines.
So, that'd just be
for line in file:
    if not my_filter(line):
        continue
    # statements
It's not that the feature is missing; I can't think of any way it could be done except in some special cases. (l for l in lines if l.startswith('example')) is a generator object, and the l variable is local to that object. The for loop only sees what is returned by the generator's __next__ method.
The proposed for syntax is very different, because the result of the generator would need to be bound to a variable in the caller's scope. You could have written
for line in (line for line in lines if line.startswith('example')):
    foo(line)
safely, because those two line variables are in different scopes.
Further, the generator doesn't have to return just its local variable. It can evaluate any expression. How would you shortcut this?
for line in (foo(line)+'bar' for line in lines if line.startswith('example')):
    statements
Suppose you have a list of lists
for l in (l[:] for l in list_of_lists if l):
    l.append('modified')
That shouldn't append to the original lists.
Is there a better or more pythonic way than ones I listed above to achieve the same result of adding a condition in the for loop?
No, there is not, and there shouldn't be; that is precisely why list comprehensions were added in the first place. From the corresponding PEP:
List comprehensions provide a more concise way to create lists in situations where map() and filter() and/or nested loops would currently be used.
List comprehensions are the alternative to nested for and if statements; why would you want an alternative to the alternative?
If you need to use an if with a for, you nest it inside; if you don't want to do that, you use a list comprehension. Flat is better than nested, but readability counts; allowing an if there would result in long, ugly lines that are harder to parse visually.
This question already has answers here:
Is it possible to implement a Python for range loop without an iterator variable?
(15 answers)
Closed 8 years ago.
For example, I need to read a file by calling readline 10 times.
with open("input") as input_file:
for i in range(10):
line = input_file.readline()
# Process the line here
Using range to control the number of iterations is a very common technique. The only downside is that there's an unused i variable.
Is this the best I can get from Python? Any better ideas?
P.S. In Ruby we can do:
3.times do
  puts "This will be printed 3 times"
end
which is elegant and expresses the intention very clearly.
That's pretty much the best option for looping over something a specific number of times.
A common idiom is to write for _ in ... instead, using _ as a sort of placeholder.
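Applied to the example from the question, that idiom would look like this sketch:

with open("input") as input_file:
    for _ in range(10):              # the loop counter is intentionally unnamed
        line = input_file.readline()
        # Process the line here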
Use islice from itertools
from itertools import islice

with open("input", 'r') as input_file:
    for line in islice(input_file, 10):
        # process line
Since you can iterate over the lines of a file directly, there is no need to call input_file.readline().
See the documentation for itertools.islice.
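islice also accepts a start index, so a hypothetical variation that skips a header line and then processes the next 10 lines would be:

from itertools import islice

with open("input") as input_file:
    for line in islice(input_file, 1, 11):   # skip line 0, yield lines 1 through 10
        # process line
        pass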
There is no equivalent construct in Python.
(Incidentally, even in the Ruby example, there is still a need for a counter! The language simply hides it from you. It is also possible that the interpreter is generating bytecode with those lines repeated multiple times, rather than a loop, but that is unlikely.)
The solution you already provided is pretty much the best one.
However, if you really wanted to, you could abstract it away using generators and islice so you don't have to see an unused variable:
import itertools

def take(iterable, amount):
    return itertools.islice(iterable, 0, amount)

with open("inputfile.txt") as file:
    for line in take(file.readlines(), 3):
        print(line)