Python, sets, Comparing two elements in the same set - python

If I have a python set and I want to find out if one element in the set is part of another element in the same set, how do I do it?
I've tried using indices but I run into the following:
mySet = {"hello", "lo"}
mySet[1] in mySet[0] #I expect to return true
TypeError: 'set' object does not support indexing
I haven't found the python docs to be particularly helpful in this situation because I don't know how to compare elements within a set.
BTW, this is my first Stackoverflow question ever. I tried to adhere to the best practices. If there is a way I can improve the question, please let me know. Thank you for your help!

Sets don't have order. The index of an element is effectively the element itself. If you do need sets (although I have suspicions another data structure may be suitable) then they are iterable, and you can compare each element with other elements, but this won't be terrific performance wise, eg:
mySet = {"hello", "lo"}
for item in mySet:
    for other_item in mySet.difference([item]):
        if item in other_item:
            print(item, other_item)

'set' object does not support indexing.
That clearly states that you cannot index into a set, as in mySet[1].
To get a single element out of a set you can call mySet.pop(), which removes and returns an arbitrary element.

It looks like you're not actually trying to compare sets, but rather members of sets. The problem is you can't grab indexed members, because sets are an unordered (and as such unindexed) collection of elements.
You're trying to compare these two elements (strings). What you want is therefore a list or tuple:
>>> myTuple = ('hello', 'lo')
>>> myTuple[1] in myTuple[0]
True
This checks if the string 'lo' is a substring of 'hello'. This appears to be what you're trying to accomplish in your question.
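If the data really does have to stay in a set, one possible sketch is to walk over every ordered pair of distinct members and run the same substring test on each pair. The name my_set and the extra member "world" are only illustrative assumptions, not part of the original question.

```python
from itertools import permutations

my_set = {"hello", "lo", "world"}

# permutations(..., 2) yields every ordered pair of distinct members,
# so each element is tested as a substring of every other element.
pairs = sorted(
    (short, long_)
    for short, long_ in permutations(my_set, 2)
    if short in long_
)
print(pairs)  # [('lo', 'hello')]
```

This is still quadratic in the size of the set, so it only suits small collections.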

Related

Why does Python's set() return a set item instead of a list

I quite often use set() to remove duplicates from lists. After doing so, I always directly change it back to a list.
a = [0,0,0,1,2,3,4,5]
b = list(set(a))
Why does set() return a set item, instead of simply a list?
type(set(a)) == set # is true
Is there a use for set items that I have failed to understand?
Yes, sets have many uses. They have lots of nice operations documented here which lists don't have. One very useful difference is that membership testing (x in a) can be much faster than for a list.
Okay, by doubles you mean duplicates? set() will always return a set because a set is a data structure in Python, just like a list. When you call set(), you are creating a set object.
You can find the rest of the information about sets here:
https://docs.python.org/2/library/sets.html
As already mentioned, I won't go into why set does not return a list but like you stated:
I quite often use set() to remove doubles from lists. After doing so, I always directly change it back to a list.
You could use OrderedDict if you really hate going back to changing it to a list:
source_list = [0,0,0,1,2,3,4,5]
from collections import OrderedDict
print(OrderedDict((x, True) for x in source_list).keys())
OUTPUT:
odict_keys([0, 1, 2, 3, 4, 5])
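As a hedged alternative sketch: on Python 3.7 and later, a plain dict also preserves insertion order, so dict.fromkeys can do the same order-preserving de-duplication and hand back a list in one step. The variable name deduped is just illustrative.

```python
source_list = [0, 0, 0, 1, 2, 3, 4, 5]

# dict keys are unique and (from Python 3.7 on) keep insertion order,
# so the first occurrence of each value survives in its original position.
deduped = list(dict.fromkeys(source_list))
print(deduped)  # [0, 1, 2, 3, 4, 5]
```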
As said before, certain operations are faster on a set than on a list. The Python wiki has a TimeComplexity page that lists the cost of operations on the various data types. Note that with only a few elements in your list or set you will most probably not notice a difference, but it becomes more important as the number of elements grows.
Notice, for example, that in-place removal from a list is O(n), meaning that a list 10 times longer needs 10 times more time, while for a set, s.difference_update(t), where s is a set and t is a set containing the one element to be removed from s, is O(1), i.e. independent of the number of elements in s.
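A small sketch of the two removal styles being compared, using made-up data; it shows the calls themselves rather than measuring their speed.

```python
items_list = list(range(10))
items_set = set(items_list)

# list.remove scans from the front until it finds the value: O(n)
items_list.remove(3)

# set.discard is a hash lookup: O(1) on average
items_set.discard(3)

# difference_update removes every element of the other set in place
items_set.difference_update({4})

print(items_list)         # [0, 1, 2, 4, 5, 6, 7, 8, 9]
print(sorted(items_set))  # [0, 1, 2, 5, 6, 7, 8, 9]
```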

Python : Adding data to list

I am learning lists and trying to create a list and add data to it.
mylist=[]
mylist[0]="hello"
This generates Error.
Why can't we add members to lists like this, as we do with arrays in JavaScript?
Since these are also dynamic and we can add as many members and of any data type to it.
In javascript this works:
var ar=[];
ar[0]=333;
Why doesn't this work in Python, so that we can only use append() to add to a list?
mylist[0] = 'hello' is syntactic sugar for mylist.__setitem__(0, 'hello').
As per the docs for object.__setitem__(self, key, value):
The same exceptions should be raised for improper key values as for
the __getitem__() method.
The docs for __getitem__ state specifically what leads to IndexError:
if value outside the set of indexes for the sequence (after any
special interpretation of negative values), IndexError should be
raised.
As to the purpose behind this design decision, one can write several chapters to explain why list has been designed in this way. You should familiarise yourself with Python list indexing and slicing before making judgements on its utility.
Lists in Python are fundamentally different to arrays in languages like C. You do not create a list of a fixed size and assign elements to indexes in it. Instead you either create an empty list and append elements to it, or use a list-comprehension to generate a list from a type of expression.
In your case, you want to add to the end, so you must use the .append method:
mylist.append('hello')
#["hello"]
And an example of a list comprehension:
squares = [x**2 for x in range(10)]
#[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
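To tie the points above together, here is a minimal sketch of what happens on index assignment into an empty list, plus two common ways around it. The names slots and error_message are only illustrative.

```python
mylist = []

try:
    mylist[0] = "hello"          # index 0 does not exist yet
except IndexError as exc:
    error_message = str(exc)     # "list assignment index out of range"

# Option 1: grow the list at the end
mylist.append("hello")

# Option 2: pre-fill the list, after which index assignment works
slots = [None] * 3
slots[0] = "hello"

print(mylist)  # ['hello']
print(slots)   # ['hello', None, None]
```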

Adding Elements from a List of Lists to a Set?

I'm attempting to add elements from a list of lists into a set. For example if I had
new_list=[['blue','purple'],['black','orange','red'],['green']]
How would I receive
new_set=(['blue','purple'],['black','orange','red'],['green'])
I'm trying to do this so I can use intersection to find out what elements appear in 2 sets. I thought this would work...
results=set()
results2=set()
for element in new_list:
    results.add(element)
for element in new_list2:
    results2.add(element)
results3=results.intersection(results2)
but I keep receiving:
TypeError: unhashable type: 'list'
for some reason.
Convert the inner lists to tuples, as sets allow you to store only hashable(immutable) objects:
In [72]: new_list=[['blue','purple'],['black','orange','red'],['green']]
In [73]: set(tuple(x) for x in new_list)
Out[73]: set([('blue', 'purple'), ('black', 'orange', 'red'), ('green',)])
How would I receive
new_set=(['blue','purple'],['black','orange','red'],['green'])
Well, despite the misleading name, that's not a set of anything, that's a tuple of lists. To convert a list of lists into a tuple of lists:
new_set = tuple(new_list)
Maybe you wanted to receive this?
new_set=set([['blue','purple'],['black','orange','red'],['green']])
If so… you can't. A set cannot contain unhashable values like lists. That's what the TypeError is telling you.
If this weren't a problem, all you'd have to do is write:
new_set = set(new_list)
And anything more complicated you write will have exactly the same problem as just calling set, so there's no tricky way around it.
Of course you can have a set of tuples, since they're hashable. So, maybe you wanted this:
new_set=set([('blue','purple'),('black','orange','red'),('green',)])
That's easy too. Assuming your inner lists are guaranteed to contain nothing but strings (or other hashable values), as in your example it's just:
new_set = set(map(tuple, new_list))
Or, if you use a sort-based set class, you don't need hashable values, just fully-ordered values. For example:
new_set = sortedset(new_list)
Python doesn't come with such a thing in the standard library, but there are some great third-party implementations you can install, like blist.sortedset or bintrees.FastRBTree.
Of course sorted-set operations aren't quite as fast as hash operations in general, but often they're more than good enough. (For a concrete example, if you have 1 million items in the list, hashing will make each lookup 1 million times faster; sorting will only make it 50,000 times faster.)
Basically, any output you can describe or give an example of, we can tell you how to get that, or that it isn't a valid object you can get… but first you have to tell us what you actually want.
By the way, if you're wondering why lists aren't hashable, it's just because they're mutable. If you're wondering why most mutable types aren't hashable, the FAQ explains that.
Make the element a tuple before adding it to the set:
new_list=[['blue','purple'],['black','orange','red'],['green']]
new_list2=[['blue','purple'],['black','green','red'],['orange']]
results=set()
results2=set()
for element in new_list:
    results.add(tuple(element))
for element in new_list2:
    results2.add(tuple(element))
results3=results.intersection(results2)
print(results3)
results in:
set([('blue', 'purple')])
Set elements have to be hashable.
for adding lists to a set, instead use tuple
for adding sets to a set, instead use frozenset
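A short sketch of both substitutions, with made-up colour data. One design point worth noting: tuples keep element order, so ('blue', 'purple') and ('purple', 'blue') count as different members, whereas frozensets ignore order.

```python
list_of_lists = [["blue", "purple"], ["purple", "blue"], ["green"]]

# Lists become tuples: order matters, so the first two stay distinct.
as_tuples = {tuple(inner) for inner in list_of_lists}

# Lists become frozensets: order is ignored, so the first two collapse.
as_frozensets = {frozenset(inner) for inner in list_of_lists}

print(len(as_tuples))      # 3
print(len(as_frozensets))  # 2
print(frozenset({"purple", "blue"}) in as_frozensets)  # True
```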

iterable as comparison key in sorted()?

Let's say I want to sort rows and I want to resolve any ties with the next column, subsequent ties with the next-next column, etc.
In python words the equivalent of sorted(rows, key=itemgetter(1, 2, 3, 4, ...)).
I tried writing my own generator but sorted doesn't iterate over my generator as it does with the tuple itemgetter returns. Any advice?
For the reasons noted in the comments, you cannot sort a list of things that hasn't been created yet. Generators exist to yield results when they are asked for, so you can't sort an iterable that hasn't been iterated (as with list(generator())).
To put in more ordinary terms, I'm thinking of ten names but am not telling you what they are yet, please sort them into alphabetical order. You should respond "how can I sort them when you haven't given them to me?" and you'd be correct: you can't.
OK, here's what you say you want to do:
I want to sort rows and I want to resolve any ties with the next column, subsequent ties with the next-next column, etc.
Note, first, that the documentation for the key argument says the following:
key specifies a function of one argument that is used to extract a comparison key from each list element
So your itemgetter idea isn't quite right, since you want to move through the list only when a comparison is equal.
However, things are actually much easier than you think. Check out the Python docs (See also this SO question.):
Sequence types also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (For full details see Comparisons in the language reference.)
Which, I think, is exactly what you want if you just make sure that each row is an equal-length sequence (list or tuple).
(Aha, I just read the comment regarding the die-roll function producing the keys. Confusing -- not sure if the above is helpful in that case, but I'm not sure what you are asking actually makes sense...)
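The lexicographic comparison quoted above can be sketched with a few made-up rows. Sorting the rows directly breaks ties column by column, and itemgetter with several indices builds a tuple key that behaves the same way for the chosen columns.

```python
from operator import itemgetter

rows = [("b", 2, 1), ("a", 2, 9), ("a", 1, 5), ("a", 2, 3)]

# Tuples compare element by element, so ties in column 0 fall through
# to column 1, then to column 2.
by_all_columns = sorted(rows)

# itemgetter(1, 2) returns a (col1, col2) tuple for each row, so the sort
# orders by column 1 and breaks ties with column 2.
by_cols_1_2 = sorted(rows, key=itemgetter(1, 2))

print(by_all_columns)  # [('a', 1, 5), ('a', 2, 3), ('a', 2, 9), ('b', 2, 1)]
print(by_cols_1_2)     # [('a', 1, 5), ('b', 2, 1), ('a', 2, 3), ('a', 2, 9)]
```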

Using 'in' to match an attribute of Python objects in an array

I don't remember whether I was dreaming or not but I seem to recall there being a function which allowed something like,
foo in iter_attr(array of python objects, attribute name)
I've looked over the docs but this kind of thing doesn't fall under any obvious listed headers
Using a list comprehension would build a temporary list, which could eat all your memory if the sequence being searched is large. Even if the sequence is not large, building the list means iterating over the whole of the sequence before in could start its search.
The temporary list can be avoided by using a generator expression:
foo = 12
foo in (obj.id for obj in bar)
Now, as long as obj.id == 12 near the start of bar, the search will be fast, even if bar is infinitely long.
As @Matt suggested, it's a good idea to use hasattr if any of the objects in bar can be missing an id attribute:
foo = 12
foo in (obj.id for obj in bar if hasattr(obj, 'id'))
Are you looking to get a list of objects that have a certain attribute? If so, a list comprehension is the right way to do this.
result = [obj for obj in listOfObjs if hasattr(obj, 'attributeName')]
you could always write one yourself:
def iterattr(iterator, attributename):
    for obj in iterator:
        yield getattr(obj, attributename)
will work with anything that iterates, be it a tuple, list, or whatever.
I love Python; it makes stuff like this very simple and no more of a hassle than necessary, and in use stuff like this is hugely elegant.
No, you were not dreaming. Python has a pretty excellent list comprehension system that lets you manipulate lists pretty elegantly, and depending on exactly what you want to accomplish, this can be done a couple of ways. In essence, what you're doing is saying "For item in list if criteria.matches", and from that you can just iterate through the results or dump the results into a new list.
I'm going to crib an example from Dive Into Python here, because it's pretty elegant and they're smarter than I am. Here they're getting a list of files in a directory, then filtering the list for all files that match a regular expression criteria.
import os
import re
files = os.listdir(path)
test = re.compile(r"test\.py$", re.IGNORECASE)
files = [f for f in files if test.search(f)]
You could do this without regular expressions, for your example, for anything where your expression at the end returns true for a match. There are other options like using the filter() function, but if I were going to choose, I'd go with this.
Eric Sipple
The function you are thinking of is probably operator.attrgetter. For example, to get a list that contains the value of each object's "id" attribute:
import operator
ids = map(operator.attrgetter("id"), bar)
If you want to check whether the list contains an object with an id == 12, then a neat and efficient (i.e. doesn't iterate the whole list unnecessarily) way to do it is:
any(obj.id == 12 for obj in bar)
If you want to use 'in' with attrgetter while still retaining lazy iteration of the list (itertools.imap exists only in Python 2; in Python 3 the built-in map is already lazy):
import operator,itertools
foo = 12
foo in itertools.imap(operator.attrgetter("id"), bar)
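A hedged Python 3 sketch of the same two lazy checks. The Item class is a hypothetical stand-in for whatever objects bar holds; only the id attribute is assumed.

```python
from operator import attrgetter


class Item:
    """Hypothetical stand-in for the objects stored in `bar`."""

    def __init__(self, id):
        self.id = id


bar = [Item(3), Item(12), Item(7)]
foo = 12

# In Python 3, map() returns a lazy iterator, so `in` stops at the first match.
found_with_map = foo in map(attrgetter("id"), bar)

# any() with a generator expression short-circuits in the same way.
found_with_any = any(obj.id == foo for obj in bar)

print(found_with_map, found_with_any)  # True True
```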
What I was thinking of can be achieved using list comprehensions, but I thought that there was a function that did this in a slightly neater way.
i.e. 'bar' is a list of objects, all of which have the attribute 'id'
The mythical functional way:
foo = 12
foo in iter_attr(bar, 'id')
The list comprehension way:
foo = 12
foo in [obj.id for obj in bar]
In retrospect the list comprehension way is pretty neat anyway.
If you plan on searching anything of remotely decent size, your best bet is going to be to use a dictionary or a set. Otherwise, you basically have to iterate through every element of the iterator until you get to the one you want.
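One way to read that advice, sketched under the assumption that each id is unique and hashable: build a dictionary keyed by id once, then answer every later lookup with a hash-based test. SimpleNamespace is used here only as a stand-in for the real objects.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the objects in `bar`; only `id` matters here.
bar = [SimpleNamespace(id=3), SimpleNamespace(id=12), SimpleNamespace(id=7)]

# One pass over the data to build the index.
by_id = {obj.id: obj for obj in bar}

foo = 12
found = foo in by_id      # average O(1) membership test
match = by_id.get(foo)    # the matching object, or None if absent

print(found, match.id)  # True 12
```

Building the index costs one full pass, so this pays off mainly when the same collection is searched repeatedly.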
If this isn't necessarily performance sensitive code, then the list comprehension way should work. But note that it is fairly inefficient because it goes over every element of the iterator and then goes BACK over it again until it finds what it wants.
Remember, python has one of the most efficient hashing algorithms around. Use it to your advantage.
I think:
#!/bin/python
bar in dict(Foo)
Is what you are thinking of. When trying to see if a certain key exists within a dictionary in Python (Python's version of a hash table) there are two ways to check. The first is the has_key() method attached to the dictionary (Python 2 only; it was removed in Python 3) and the second is the example given above. It will return a boolean value.
That should answer your question.
And now a little off topic to tie this in to the list comprehension answer previously given (for a bit more clarity). List Comprehensions construct a list from a basic for loop with modifiers. As an example (to clarify slightly), a way to use the in dict language construct in a list comprehension:
Say you have a two dimensional dictionary foo and you only want the second dimension dictionaries which contain the key bar. A relatively straightforward way to do so would be to use a list comprehension with a conditional as follows:
#!/bin/python
baz = dict([(key, value) for key, value in foo.items() if bar in value])
Note the if bar in value at the end of the statement: this is a modifying clause which tells the list comprehension to only keep those key-value pairs which meet the conditional. In this case baz is a new dictionary which contains only the dictionaries from foo which contain bar (hopefully I didn't miss anything in that code example; you may want to take a look at the list comprehension documentation in the docs.python.org tutorial and at secnetix.de, both of which are good references if you have questions in the future).
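For a concrete picture, here is a small sketch of the same filter written as a dict comprehension, with made-up nested data. The name wanted_key plays the role of bar in the text above.

```python
foo = {
    "a": {"bar": 1},
    "b": {"qux": 2},
    "c": {"bar": 3, "qux": 4},
}
wanted_key = "bar"

# .items() yields (key, value) pairs; the trailing `if` keeps only the
# inner dictionaries that contain wanted_key.
baz = {key: value for key, value in foo.items() if wanted_key in value}

print(sorted(baz))  # ['a', 'c']
```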
