In sum: I need to write a List Comprehension in which i refer to list that is being created by the List Comprehension.
This might not be something you need to do every day, but i don't think it's unusual either.
Maybe there's no answer here--still, please don't tell me i ought to use a for loop. That might be correct, but it's not helpful. The reason is the problem domain: this line of code is part of an ETL module, so performance is relevant, and so is the need to avoid creating a temporary container--hence my wish to code this step in a L/C. If a for loop would work for me here, i would just code one.
In any event, i am unable to write this particular list comprehension. The reason: the expression i need to write has this form:
[ some_function(s) for s in raw_data if s not in this_list ]
In that pseudo-code, "this_list" refers to the list created by evaluating that list comprehension. And that's why i'm stuck--because this_list isn't built until my list comprehension is evaluated, and because this list isn't yet built by the time i need to refer to it, i don't know how to refer to it.
What i have considered so far (and which might be based on one or more false assumptions, though i don't know exactly where):
doesn't the python interpreter have
to give this list-under-construction
a name? i think so
that temporary name is probably taken
from some bound method used to build
my list ('sum'?)
but even if i went to the trouble to
find that bound method and assuming
that it is indeed the temporary name
used by the python interpreter to
refer to the list while it is under
construction, i am pretty sure you
can't refer to bound methods
directly; i'm not aware of such an
explicit rule, but those methods (at
least the few that i've actually
looked at) are not valid python
syntax. I'm guessing one reason why
is so that we do not write them into
our code.
so that's the chain of my so-called reasoning, and which has led me to conclude, or at least guess, that i have coded myself into a corner. Still i thought i ought to verify this with the Community before turning around and going a different direction.
There used to be a way to do this using the undocumented fact that while the list was being built its value was stored in a local variable named _[1].__self__. However that quit working in Python 2.7 (maybe earlier, I wasn't paying close attention).
You can do what you want in a single list comprehension if you set up an external data structure first. Since all your pseudo code seemed to be doing with this_list was checking it to see if each s was already in it -- i.e. a membership test -- I've changed it into a set named seen as an optimization (checking for membership in a list can be very slow if the list is large). Here's what I mean:
raw_data = [c for c in 'abcdaebfc']
seen = set()
def some_function(s):
seen.add(s)
return s
print [ some_function(s) for s in raw_data if s not in seen ]
# ['a', 'b', 'c', 'd', 'e', 'f']
If you don't have access to some_function, you could put a call to it in your own wrapper function that added its return value to the seen set before returning it.
Even though it wouldn't be a list comprehension, I'd encapsulate the whole thing in a function to make reuse easier:
def some_function(s):
# do something with or to 's'...
return s
def add_unique(function, data):
result = []
seen = set(result) # init to empty set
for s in data:
if s not in seen:
t = function(s)
result.append(t)
seen.add(t)
return result
print add_unique(some_function, raw_data)
# ['a', 'b', 'c', 'd', 'e', 'f']
In either case, I find it odd that the list being built in your pseudo code that you want to reference isn't comprised of a subset of raw_data values, but rather the result of calling some_function on each of them -- i.e. transformed data -- which naturally makes one wonder what some_function does such that its return value might match an existing raw_data item's value.
I don't see why you need to do this in one go. Either iterate through the initial data first to eliminate duplicates - or, even better, convert it to a set as KennyTM suggests - then do your list comprehension.
Note that even if you could reference the "list under construction", your approach would still fail because s is not in the list anyway - the result of some_function(s) is.
As far as I know, there is no way to access a list comprehension as it's being built.
As KennyTM mentioned (and if the order of the entries is not relevant), then you can use a set instead. If you're on Python 2.7/3.1 and above, you even get set comprehensions:
{ some_function(s) for s in raw_data }
Otherwise, a for loop isn't that bad either (although it will scale terribly)
l = []
for s in raw_data:
item = somefunction(s)
if item not in l:
l.append(item)
Why don't you simply do:[ some_function(s) for s in set(raw_data) ]
That should do what you are asking for. Except when you need to preserve the order of the previous list.
Related
Say I make a list comprehension that looks something like this:
i = range(5)
a = [f(i) for i in i]
for some function f. Will using a dummy name identical to the iterator ever yield unexpected results? Sometimes I have variable names that are individual letters, and to me it is more readable to stick with the same letter rather than assigning a new one, like [f(x) for x in x] instead of [f(i) for i in x] (for instance, if the letter of the iterator x is meaningful, I will wonder what the heck i is).
TL;DR: It is safe, technically, but it's a poor choice stylistically.
In a list comprehension, before binding the free variable of the for-loop to any object, Python will use a GET_ITER opcode on the iterable to get an iterator. This is done just once at the beginning of the loop.
Therefore in the body of the "loop" of the list comprehension (which actually creates a scope in Python 3), you may rebind the name which originally pointed to the iterable without any consequence. The iteration deals with a reference to the iterator directly, and whether or not it has a name in scope is irrelevant. The same should hold true in Python 2, though the scoping implementation details are different: the name of the collection will be lost after the comprehension, as the loop variable name will remain bound to the final element of iteration.
There is no advantage to writing the code in this way, and it is less readable than just avoiding the name collision. So, you should prefer to name the collection so that it more obvious that it is a collection:
[f(x) for x in xs]
While you can get away with using duplicate variable names due to the way Python executes list comprehensions - even nested list comprehensions -
don't do this. It may seem more readable in your opinion, but for most people, it would be very confusing.
This leads to a much more important point, however. Why use a name like i, j, or x at all? Using single letter variable names cause confusion and are ambiguous. Instead, use a variable name which clearly conveys your intent.
Or, if you don't have an need at all for the values in the iterable your iterating through(eg. You simple want to repeat a block of code a certain number of times), use a "throw-away" variable, _, to convey to the reader of your code that the value is not important and should be ignored.
But don't use non-descriptive, duplicate, single letter variable names. That will only serve to confuse future readers of your code, make your intent unclear, and create code hard to maintain and debug.
Because in the end, would you rather maintain code like this
[str(x) for x in x]
or this?
[str(user_id) for user_id in user_ids]
Probably a stupid question, but I am wondering in general, and if anyone knows, how much foresight the Python interpreter has, specifically in the field of regular expressions and text parsing.
Suppose my code at some point looks like this:
mylist = ['a', 'b', 'c', ... ]
if 'g' in list: print(mylist.index('g'))
is there any safer way to do this with a while loop or similar. I mean, will the index be looked up with a second parsing from the beginning or are the two g's (in the above line) the same thing in Python's mind?
It'll do the lookup both times. If it's worth it (for, say, a very big list), use try:
try:
print(mylist.index('g'))
except ValueError:
pass
The result of the containment check is not cached, and so the index will need to be discovered anew. And the dynamic nature of Python makes implicit caching of such a thing unreliable since the __contains__() method may mutate the object (although it would be a violation of several programming principles to do so).
Your code will result in two lookups, first to determine if 'g' is in the list and second to find the index. Python won't try to consolidate them into a single lookup. If you're worried about efficiency you can use a dictionary instead of a list which will make both lookups O(1) instead of O(n).
You can easily make a dict to look up. Something like this:
mydict = {k:v for v,k in enumerate(mylist)}
The overhead of creating the dict won't be worthwhile unless you are doing a few such lookups on the same list
Try is better option to find the index of element in list.
try:
print(mylist.index('g'))
except ValueError:
print "value not in list"
pass
yeah it will be looked up twice, the python interpreter doesn't cache instructions, though I've being wondering if its possible (for certain things), if this is an issue, then you can use sets or dicts both of which have constant look up time.
Either way it seems you are LBYL, in python we tend to EAFP so its quite common to wrap such things in try ... except blocks
I'm fairly new to python and have found that I need to query a list about whether it contains a certain item.
The majority of the postings I have seen on various websites (including this similar stackoverflow question) have all suggested something along the lines of
for i in list
if i == thingIAmLookingFor
return True
However, I have also found from one lone forum that
if thingIAmLookingFor in list
# do work
works.
I am wondering if the if thing in list method is shorthand for the for i in list method, or if it is implemented differently.
I would also like to which, if either, is more preferred.
In your simple example it is of course better to use in.
However... in the question you link to, in doesn't work (at least not directly) because the OP does not want to find an object that is equal to something, but an object whose attribute n is equal to something.
One answer does mention using in on a list comprehension, though I'm not sure why a generator expression wasn't used instead:
if 5 in (data.n for data in myList):
print "Found it"
But this is hardly much of an improvement over the other approaches, such as this one using any:
if any(data.n == 5 for data in myList):
print "Found it"
the "if x in thing:" format is strongly preferred, not just because it takes less code, but it also works on other data types and is (to me) easier to read.
I'm not sure how it's implemented, but I'd expect it to be quite a lot more efficient on datatypes that are stored in a more searchable form. eg. sets or dictionary keys.
The if thing in somelist is the preferred and fastest way.
Under-the-hood that use of the in-operator translates to somelist.__contains__(thing) whose implementation is equivalent to: any((x is thing or x == thing) for x in somelist).
Note the condition tests identity and then equality.
for i in list
if i == thingIAmLookingFor
return True
The above is a terrible way to test whether an item exists in a collection. It returns True from the function, so if you need the test as part of some code you'd need to move this into a separate utility function, or add thingWasFound = False before the loop and set it to True in the if statement (and then break), either of which is several lines of boilerplate for what could be a simple expression.
Plus, if you just use thingIAmLookingFor in list, this might execute more efficiently by doing fewer Python level operations (it'll need to do the same operations, but maybe in C, as list is a builtin type). But even more importantly, if list is actually bound to some other collection like a set or a dictionary thingIAmLookingFor in list will use the hash lookup mechanism such types support and be much more efficient, while using a for loop will force Python to go through every item in turn.
Obligatory post-script: list is a terrible name for a variable that contains a list as it shadows the list builtin, which can confuse you or anyone who reads your code. You're much better off naming it something that tells you something about what it means.
I have a list of lists that looks like:
floodfillque = [[1,1,e],[1,2,w], [1,3,e], [2,1,e], [2,2,e], [2,3,w]]
for each in floodfillque:
if each[2] == 'w':
floodfillque.remove(each)
else:
tempfloodfill.append(floodfillque[each[0+1][1]])
That is a simplified, but I think relevant part of the code.
Does the floodfillque[each[0+1]] part do what I think it is doing and taking the value at that location and adding one to it or no? The reason why I ask is I get this error:
TypeError: 'int' object is unsubscriptable
And I think I am misunderstanding what that code is actually doing or doing it wrong.
In addition to the bug in your code that other answers have already spotted, you have at least one more:
for each in floodfillque:
if each[2] == 'w':
floodfillque.remove(each)
don't add or remove items from the very container you're looping on. While such a bug will usually be diagnosed only for certain types of containers (not including lists), it's just as terrible for lists -- it will end up altering your intended semantics by skipping some items or seeing some items twice.
If you can't substantially alter and enhance your logic (generally by building a new, separate container rather than mucking with the one you're looping on), the simplest workaround is usually to loop on a copy of the container you must alter:
for each in list(floodfillque):
Now, your additions and removals won't alter what you're actually looping on (because what you're looping on is a copy, a "snapshot", made once and for all at loop's start) so your semantics will work as intended.
Your specific approach to altering floodfillque also has a performance bug -- it behaves quadratically, while sound logic (building a new container rather than altering the original one) would behave linearly. But, that bug is harder to fix without refactoring your code from the current not-so-nice logic to the new, well-founded one.
Here's what's happening:
On the first iteration of the loop, each is [1, 1, 'e']. Since each[2] != 'w', the else is executed.
In the else, you take each[0+1][1], which is the same as (each[0+1])[1]. each[0+1] is 1, and so you are doing (1)[1]. int objects can't be indexed, which is what's raising the error.
Does the floodfillque[each[0+1] part
do what I think it is doing and taking
the value at that location and adding
one to it or no?
No, it sounds like you want each[0] + 1.
Either way, the error you're getting is because you're trying to take the second item of an integer... each[0+1][1] resolves to each[1][1] which might be something like 3[1], which doesn't make any sense.
The other posters are correct. However, there is another bug in this code, which is that you are modifying floodfillque as you are iterating over it. This will cause problems, because Python internally maintains a counter to handle the loop, and deleting elements does not modify the counter.
The safe way to do this is to iterate of a copy of the loop:
for each in floodfillque[ : ]:
([ : ] is Python's notation for a copy.)
Here is how I understand NoahClark's intentions:
Remove those sublists whose third element is 'w'
For the remaining sublist, add 1 to the second item
If this is the case, the following will do:
# Here is the original list
floodfillque = [[1,1,'e'], [1,2,'w'], [1,3,'e'], [2,1,'e'], [2,2,'e'], [2,3,'w']]
# Remove those sublists which have 'w' as the third element
# For the rest, add one to the second element
floodfillque = [[a,b+1,c] for a,b,c in floodfillque if c != 'w']
This solution works fine, but it is not the most efficient: it creates a new list instead of patching up the original one.
I have a list full of various bits of information that I would like to pass to several strings for inclusion via the new string format method. As a toy example, let us define
thelist = ['a', 'b', 'c']
I would like to do a print statement like print '{0} {2}'.format(thelist) and print '{1} {2}'.format(thelist)
When I run this, I receive the message IndexError: tuple index out of range; when mucking about, it clearly takes the whole list as a single object. I would, of course, rather it translate thelist to 'a', 'b', 'c'.
I tried using a tuple and received the same error.
What on Earth is this particular technique called? If I knew the name, I could have searched for it. "Expand" is clearly not it. "Explode" doesn't yield anything useful.
My actual use is much longer and more tedious than the toy example.
.format(*thelist)
It's part of the calling syntax in Python. I don't know the name either, and I'm not convinced it has one. See the tutorial.
It doesn't just work on lists, though, it works for any iterable object.
'{0} {2}'.format(*thelist)
docs