Adding Elements from a List of Lists to a Set? - python

I'm attempting to add elements from a list of lists into a set. For example if I had
new_list=[['blue','purple'],['black','orange','red'],['green']]
How would I receive
new_set=(['blue','purple'],['black','orange','red'],['green'])
I'm trying to do this so I can use intersection to find out what elements appear in 2 sets. I thought this would work...
results=set()
results2=set()
for element in new_list:
    results.add(element)
for element in new_list2:
    results2.add(element)
results3=results.intersection(results2)
but I keep receiving:
TypeError: unhashable type: 'list'
for some reason.

Convert the inner lists to tuples, because a set can only store hashable objects (which for built-in types generally means immutable ones):
In [72]: new_list=[['blue','purple'],['black','orange','red'],['green']]
In [73]: set(tuple(x) for x in new_list)
Out[73]: set([('blue', 'purple'), ('black', 'orange', 'red'), ('green',)])

How would I receive
new_set=(['blue','purple'],['black','orange','red'],['green'])
Well, despite the misleading name, that's not a set of anything, that's a tuple of lists. To convert a list of lists into a tuple of lists:
new_set = tuple(new_list)
Maybe you wanted to receive this?
new_set=set([['blue','purple'],['black','orange','red'],['green']])
If so… you can't. A set cannot contain unhashable values like lists. That's what the TypeError is telling you.
If this weren't a problem, all you'd have to do is write:
new_set = set(new_list)
And anything more complicated you write will have exactly the same problem as just calling set, so there's no tricky way around it.
Of course you can have a set of tuples, since they're hashable. So, maybe you wanted this:
new_set=set([('blue','purple'),('black','orange','red'),('green',)])
That's easy too. Assuming your inner lists are guaranteed to contain nothing but strings (or other hashable values), as in your example, it's just:
new_set = set(map(tuple, new_list))
Or, if you use a sort-based set class, you don't need hashable values, just fully-ordered values. For example:
new_set = sortedset(new_list)
Python doesn't come with such a thing in the standard library, but there are some great third-party implementations you can install, like blist.sortedset or bintrees.FastRBTree.
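Of course sorted-set operations aren't quite as fast as hash operations in general, but often they're more than good enough. (For a concrete example, if you have 1 million items in the list, a linear search needs up to 1 million comparisons per lookup, a hash lookup needs roughly one, and a binary search over sorted data needs about 20, since log2 of 1 million is about 20. So hashing will make each lookup roughly 1 million times faster; sorting will only make it about 50,000 times faster.)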
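To make the arithmetic concrete, here is a rough sketch that uses the standard library's bisect module as a stand-in for a sorted set. It is not how blist.sortedset or bintrees are implemented; the helper name contains is made up for illustration, and the code assumes Python 3.

```python
from bisect import bisect_left
import math

new_list = [['blue', 'purple'], ['black', 'orange', 'red'], ['green']]

# Lists of strings compare lexicographically, so they can be sorted
# even though they cannot be hashed.
sorted_rows = sorted(new_list)

def contains(sorted_seq, item):
    """Binary-search membership test (hypothetical helper)."""
    i = bisect_left(sorted_seq, item)
    return i < len(sorted_seq) and sorted_seq[i] == item

found = contains(sorted_rows, ['green'])
missing = contains(sorted_rows, ['pink'])

# Back-of-the-envelope comparison counts for 1 million items.
n = 1_000_000
linear_steps = n                         # worst case for a plain list scan
binary_steps = math.ceil(math.log2(n))   # binary search on sorted data
speedup = linear_steps // binary_steps
```

The trade-off is that the data has to stay sorted, so inserts cost more than they would with a hash-based set.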
Basically, any output you can describe or give an example of, we can tell you how to get that, or that it isn't a valid object you can get… but first you have to tell us what you actually want.
By the way, if you're wondering why lists aren't hashable, it's just because they're mutable. If you're wondering why most mutable types aren't hashable, the FAQ explains that.
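If you want to see the hashability rule in action, here is a small sketch, assuming Python 3. The helper is_hashable is invented for this example and is not part of the standard library.

```python
def is_hashable(value):
    """Return True if hash(value) succeeds (hypothetical helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

new_list = [['blue', 'purple'], ['black', 'orange', 'red'], ['green']]

# A list is unhashable; the same items as a tuple are hashable.
list_ok = is_hashable(new_list[0])
tuple_ok = is_hashable(tuple(new_list[0]))

# A tuple is only hashable if everything inside it is hashable too.
nested_ok = is_hashable((['blue'], 'purple'))

# So converting each inner list to a tuple is enough for this data.
new_set = set(map(tuple, new_list))
```

Note the one-element case: the inner list ['green'] becomes ('green',), with the trailing comma, not ('green'), which is just a string in parentheses.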

Make the element a tuple before adding it to the set:
new_list=[['blue','purple'],['black','orange','red'],['green']]
new_list2=[['blue','purple'],['black','green','red'],['orange']]
results=set()
results2=set()
for element in new_list:
    results.add(tuple(element))
for element in new_list2:
    results2.add(tuple(element))
results3=results.intersection(results2)
print results3
results in:
set([('blue', 'purple')])
Set elements have to be hashable.
To add lists to a set, convert them to tuples.
To add sets to a set, convert them to frozensets.
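One practical difference between the two, sketched below under the assumption that you are on Python 3: tuples keep the order of the inner items, while frozensets ignore it (and also drop duplicates inside each inner list). The second list here is made-up sample data with 'blue' and 'purple' swapped.

```python
new_list = [['blue', 'purple'], ['black', 'orange', 'red'], ['green']]
new_list2 = [['purple', 'blue'], ['black', 'green', 'red'], ['orange']]

# Tuples are order-sensitive: ('blue', 'purple') != ('purple', 'blue').
ordered_common = set(map(tuple, new_list)) & set(map(tuple, new_list2))

# Frozensets compare by content only, so the swapped pair still matches.
unordered_common = set(map(frozenset, new_list)) & set(map(frozenset, new_list2))
```

Which one you want depends on whether ['blue', 'purple'] and ['purple', 'blue'] should count as the same element.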

Related

Why does Python's set() return a set item instead of a list

I quite often use set() to remove duplicates from lists. After doing so, I always directly change it back to a list.
a = [0,0,0,1,2,3,4,5]
b = list(set(a))
Why does set() return a set item, instead of simply a list?
type(set(a)) == set # is true
Is there a use for set items that I have failed to understand?
Yes, sets have many uses. They have lots of nice operations documented here which lists don't have. One very useful difference is that membership testing (x in a) can be much faster than for a list.
Okay, by doubles you mean duplicates? set() will always return a set, because a set is a data structure in Python just like a list is. When you call set(), you are creating a set object.
You can find the rest of the information about sets here:
https://docs.python.org/2/library/sets.html
As already mentioned, I won't go into why set does not return a list but like you stated:
I quite often use set() to remove doubles from lists. After doing so, I always directly change it back to a list.
You could use OrderedDict if you really hate going back to changing it to a list:
source_list = [0,0,0,1,2,3,4,5]
from collections import OrderedDict
print(OrderedDict((x, True) for x in source_list).keys())
OUTPUT:
odict_keys([0, 1, 2, 3, 4, 5])
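If the goal is to end up with a list that has no duplicates and keeps the original order, a plain dict can do the same job on Python 3.7 or later, where dicts are guaranteed to preserve insertion order. This is only a sketch of that alternative, not something from the answers above.

```python
source_list = [0, 0, 0, 1, 2, 3, 4, 5]

# dict keys are unique and (on Python 3.7+) keep first-seen order.
unique_in_order = list(dict.fromkeys(source_list))

# Order is preserved even when the input is not sorted.
letters = list(dict.fromkeys(['b', 'a', 'b', 'c', 'a']))
```

By contrast, list(set(a)) makes no promise about the order of the result.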
As said before, certain operations are faster on a set than on a list. The Python wiki has a TimeComplexity page that gives the cost of operations on the various data types. Note that with only a few elements in your list or set you will most probably not notice a difference, but it becomes more important as the number of elements grows.
For example, removing an element in place from a list is O(n), meaning that a list 10 times longer needs about 10 times more time. For a set, s.difference_update(t), where s is a set and t is a set containing the one element to be removed from s, is O(1) on average, i.e. independent of the number of elements in s.
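A minimal sketch of the removal operations being compared, assuming Python 3. It only shows what each call does; the complexity figures in the comments are the usual average-case ones, not measurements.

```python
items = list(range(10))
as_set = set(items)

# list.remove scans for the value and shifts later elements: O(n).
items.remove(3)

# set.discard removes by hash lookup: O(1) on average.
as_set.discard(3)

# difference_update with a one-element set does the same kind of removal.
as_set.difference_update({4})

# Membership tests follow the same pattern: O(n) for a list, O(1) average for a set.
in_list = 7 in items
in_set = 7 in as_set
```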

Python : Adding data to list

I am learning lists and trying to create a list and add data to it.
mylist=[]
mylist[0]="hello"
This raises an error.
Why can't we add members to lists like this, the way we do with arrays in JavaScript?
Those are also dynamic, and we can add as many members of any data type as we like.
In javascript this works:
var ar=[];
ar[0]=333;
Why doesn't this work in Python, so that we can only use append() to add to a list?
mylist[0] = 'hello' is syntactic sugar for mylist.__setitem__(0, 'hello').
As per the docs for object.__setitem__(self, key, value):
The same exceptions should be raised for improper key values as for
the __getitem__() method.
The docs for __getitem__ states specifically what leads to IndexError:
if value outside the set of indexes for the sequence (after any
special interpretation of negative values), IndexError should be
raised.
As to the purpose behind this design decision, one can write several chapters to explain why list has been designed in this way. You should familiarise yourself with Python list indexing and slicing before making judgements on its utility.
Lists in Python are fundamentally different to arrays in languages like C. You do not create a list of a fixed size and assign elements to indexes in it. Instead you either create an empty list and append elements to it, or use a list-comprehension to generate a list from a type of expression.
In your case, you want to add to the end, so you must use the .append method:
mylist.append('hello')
#["hello"]
And an example of a list comprehension:
squares = [x**2 for x in range(10)]
#[0,1,4,9,16,25,36,49,64,81]
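To tie this back to the original snippet, here is a short sketch (assuming Python 3) showing that index assignment works as soon as the index exists, and how to get JavaScript-like "assign by position" behaviour by creating the slots up front.

```python
mylist = []

# Assigning to an index that does not exist yet raises IndexError.
try:
    mylist[0] = "hello"
except IndexError as exc:
    error_message = str(exc)

# append() creates the slot...
mylist.append("hello")

# ...and after that, assigning to index 0 is fine.
mylist[0] = "goodbye"

# If you know the size in advance, pre-fill the list and assign by index.
slots = [None] * 3
slots[2] = 333
```

Pre-filling with None is safe here because None is immutable; the three slots share one None object but that cannot cause surprises.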

Python, sets, Comparing two elements in the same set

If I have a python set and I want to find out if one element in the set is part of another element in the same set, how do I do it?
I've tried using indices but I run into the following:
mySet = {"hello", "lo"}
mySet[1] in mySet[0] #I expect to return true
TypeError: 'set' object does not support indexing
I haven't found the python docs to be particularly helpful in this situation because I don't know how to compare elements within a set.
BTW, this is my first Stackoverflow question ever. I tried to adhere to the best practices. If there is a way I can improve the question, please let me know. Thank you for your help!
Sets don't have order. The index of an element is effectively the element itself. If you do need sets (although I have suspicions another data structure may be suitable) then they are iterable, and you can compare each element with other elements, but this won't be terrific performance wise, eg:
mySet = {"hello", "lo"}
for item in mySet:
    for other_item in mySet.difference([item]):
        if item in other_item:
            print item, other_item
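'set' object does not support indexing.
That clearly states that you cannot index into a set with something like mySet[1].
To get a single element out of a set you can call mySet.pop(), but note that it removes and returns an arbitrary element; next(iter(mySet)) returns one without removing it.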
It looks like you're not actually trying to compare sets, but rather members of sets. The problem is you can't grab indexed members, because sets are an unordered (and as such unindexed) collection of elements.
You're trying to compare these two elements (strings). What you want is therefore a list or tuple:
>>> myTuple = ('hello', 'lo')
>>> myTuple[1] in myTuple[0]
True
This checks if the string 'lo' is a substring of 'hello'. This appears to be what you're trying to accomplish in your question.
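If you do want to keep the data in a set, one way to compare every element against every other element is itertools.permutations, which yields each ordered pair of distinct elements. This is a sketch assuming Python 3; the extra string "world" is sample data added for the example.

```python
from itertools import permutations

my_set = {"hello", "lo", "world"}

# Every ordered pair (shorter, longer) of distinct elements where the
# first string is a substring of the second. Sorted for a stable result,
# since sets have no defined iteration order.
substring_pairs = sorted(
    (shorter, longer)
    for shorter, longer in permutations(my_set, 2)
    if shorter in longer
)
```

This still does a quadratic number of comparisons, so it is only reasonable for small sets.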

Remove an item with certain property value from a list

Given a list of objects, each of which has a property named x, I want to remove from the list all the objects whose x property equals the value v.
One way to do it is to use a list comprehension: [item for item in mylist if item.x != v], though my list is small (usually fewer than 10 items). Another way is to iterate through the list in a loop and check every single item.
Is there a third way that is equally fast or even faster?
You can also use a generator or the filter function. Choose what you find the most readable; efficiency doesn't really matter at this point (especially not if you're dealing with just a few elements).
Create a new list using list comprehension syntax. I don't think you can do anything faster than that. It doesn't matter that your list is small, that's even better.
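For reference, here are the options mentioned above side by side. This is a sketch assuming Python 3; the Item class is a made-up stand-in for whatever objects are in your list.

```python
class Item:
    """Hypothetical object with an x attribute, for illustration only."""
    def __init__(self, x):
        self.x = x

mylist = [Item(1), Item(2), Item(1), Item(3)]
v = 1

# List comprehension: builds a new list without the matching items.
kept = [item for item in mylist if item.x != v]

# filter(): same result; wrap in list() because filter is lazy in Python 3.
filtered = list(filter(lambda item: item.x != v, mylist))

# Slice assignment: replaces the contents of the existing list object,
# which matters if other code holds a reference to mylist.
mylist[:] = kept
```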

Generate a list of distinct empty mutables

I need to initialize a list of defaultdicts. If they were, say, strings, this would be tidy:
list_of_dds = [string] * n
…but for mutables, you get right into a mess with that approach:
>>> x=[defaultdict(list)] * 3
>>> x[0]['foo'] = 'bar'
>>> x
[defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'}), defaultdict(<type 'list'>, {'foo': 'bar'})]
What I do want is an iterable of freshly-minted distinct instances of defaultdicts. I can do this:
list_of_dds = [defaultdict(list) for i in xrange(n)]
but I feel a little dirty using a list comprehension here. I think there's a better approach. Is there? Please tell me what it is.
Edit:
This is why I feel the list comprehension is suboptimal. I'm not usually the pre-optimization type, but I can't bring myself to ignore the speed difference here:
>>> timeit('x=[string.letters]*100', setup='import string')
0.9318461418151855
>>> timeit('x=[string.letters for i in xrange(100)]', setup='import string')
12.606678009033203
>>> timeit('x=[[]]*100')
0.890861988067627
>>> timeit('x=[[] for i in xrange(100)]')
9.716886043548584
Your approach using the list comprehension is correct. Why do you think it's dirty? What you want is a list of things whose length is defined by some base set. List comprehensions create lists based on some base set. What's wrong with using a list comprehension here?
Edit: The speed difference is a direct consequence of what you are trying to do. [[]]*100 is faster, because it only has to create one list. Creating a new list each time is slower, yeah, but you have to expect it to be slower if you actually want 100 different lists.
(It doesn't create a new string each time on your string examples, but it's still slower, because the list comprehension can't "know" ahead of time that all the elements are going to be the same, so it still has to reevaluate the expression every time. I don't know the internal details of the list comp, but it's possible there's also some list-resizing overhead because it doesn't necessarily know the size of the index iterable to start with, so it can't preallocate the list. In addition, note that some of the slowdown in your string example is due to looking up string.letters on every iteration. On my system using timeit.timeit('x=[letters for i in xrange(100)]', setup='from string import letters') instead --- looking up string.letters only once --- cuts the time by about 30%.)
The list comprehension is exactly what you should use.
The problem with the list multiplication is that the list containing a single mutable object is created and then you try to duplicate it. But by trying to duplicate the object from the object itself, the code used to create it is no longer relevant. Nothing you do with the object is going to do what you want, which is run the code used to create it N times, because the object has no idea what code was used to create it.
You could use copy.copy or copy.deepcopy to duplicate it, but that puts you right back in the same boat because then the call to copy/deepcopy just becomes the code you need to run N times.
A list comprehension is a very good fit here. What's wrong with it?
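A quick way to see the difference between the two approaches is to count distinct object identities. This sketch assumes Python 3, so it uses range where the question uses xrange.

```python
from collections import defaultdict

n = 3

# Multiplication copies the reference: n slots, one defaultdict.
shared = [defaultdict(list)] * n

# The comprehension re-runs defaultdict(list) for every slot.
distinct = [defaultdict(list) for _ in range(n)]

shared[0]['foo'].append('bar')
distinct[0]['foo'].append('bar')

shared_count = len({id(d) for d in shared})      # number of distinct objects
distinct_count = len({id(d) for d in distinct})

# The change shows up in every slot of `shared`, but only slot 0 of `distinct`.
leaked = 'foo' in shared[2]
isolated = 'foo' not in distinct[1]
```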
