Removing list item if not in another list - python

Removing list item if not in another list - python - python

Here is my situation.
I have a list of Person objects.
class Person():
def __init__(self, name="", age=):
self.name = name
self.uid = str( uuid.uuid4( ) )
self.age = age
My UI contains a treeview displaying these items. In some cases users can have a instance of the same person if they want. I highlight those bold to let the user know it's the same person.
THE PROBLEM
When a user deletes a tree node I then need to know if I should remove the actual object from the list. However if another instance of that object is being used then I shouldn't delete it.
My thoughts for a solution.
Before the delete operation takes place, which removes just the treenode items, I would collect all persons being used in the ui.
Next I would proceed with deleting the treeview items.
Next take another collection of objevst being used in the ui.
Laslty compare the two lists and delete persons not appearing in second list.
If I go this solution would I be best to do a test like
for p in reversed(original_list):
if p not in new_list:
original_list.remove(p)
Or should I collect the uid numbers instead to do the comparisons rather then the entire object?
The lists could be rather large.
Herr is the code with my first attempt at handling the remove operation. It saves out a json file when you close the app.
https://gist.github.com/JokerMartini/4a78b3c5db1dff8b7ed8
This is my function doing the deleting.
def delete_treewidet_items(self, ctrl):
global NODES
root = self.treeWidget.invisibleRootItem()
# delete treewidget items from gui
for item in self.treeWidget.selectedItems():
(item.parent() or root).removeChild(item)
# collect all uids used in GUI
uids_used = self.get_used_uids( root=self.treeWidget.invisibleRootItem() )
for n in reversed(NODES):
if n.uid not in uids_used:
NODES.remove(n)

You have not really posted enough code but from what I can gather:
import collections
import uuid
class Person():
def __init__(self, name="", age=69):
self.name = name
self.uid = str( uuid.uuid4( ) )
self.age = age
def __eq__(self, other):
return isinstance(other, Person) and self.uid == other.uid
def __ne__(self, other): return self != other # you need this
def __hash__(self):
return hash(self.uid)
# UI --------------------------------------------------------------------------
persons_count = collections.defaultdict(int) # belongs to your UI class
your_list_of_persons = [] # should be a set
def add_to_ui(person):
persons_count[person] += 1
# add it to the UI
def remove_from_ui(person):
persons_count[person] -= 1
if not persons_count[person]: your_list_of_persons.remove(person)
# remove from UI
So basically:
before the delete operation takes place, which removes just the treenode items, I would collect all persons being used in the ui.
No - you have this info always available as a module variable in your ui - the persons_count above. This way you don't have to copy lists around.
Remains the code that creates the persons - then your list (which contains distinct persons so should be a set) should be updated. If this is done in add_to_ui (makes sense) you should modify as:
def add_to_ui(name, age):
p = Person(name, age)
set_of_persons.add(p) # if already there won't be re-added and it's O(1)
persons_count[person] += 1
# add it to the UI
To take this a step further - you don't really need your original list - that is just persons_count.keys(), you just have to modify:
def add_to_ui(name, age):
p = Person(name, age)
persons_count[person] += 1
# add it to the UI
def remove_from_ui(person):
persons_count[person] -= 1
if not persons_count[person]: del persons_count[person]
# remove from UI
So you get the picture
EDIT: here is delete from my latest iteration:
def delete_tree_nodes_clicked(self):
root = self.treeWidget.invisibleRootItem()
# delete treewidget items from gui
for item in self.treeWidget.selectedItems():
(item.parent() or root).removeChild(item)
self.highlighted.discard(item)
persons_count[item.person] -= 1
if not persons_count[item.person]: del persons_count[item.person]
I have posted my solution (a rewrite of the code linked to the first question) in: https://github.com/Utumno/so_34104763/commits/master. It's a nice exercise in refactoring - have a look at the commit messages. In particular I introduce the dict here: https://github.com/Utumno/so_34104763/commit/074b7e659282a9896ea11bbef770464d07e865b7
Could use more work but it's a step towards the right direction I think - should be faster too in most operations and conserve memory

Not worrying too much about runtime or size of lists, you could use set-operations:
for p in set(original_list) - set(new_list):
original_list.remove(p)
Or filter the list:
new_original_list = [p for p in original_list if p in new_list]
But then again, why look at the whole list - when one item (or even a non-leaf node in a tree) is deleted, you know which item was deleted, so you could restrict your search to just that one.

You can compare objects using:
object identity
object equality
To compare objects identity you should use build in function id() or keyword is (that uses id()). From docs:
id function
Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
is operator
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value.
Example:
>>> p1 = Person('John')
>>> p2 = Person('Billy')
>>> id(p1) == id(p2)
False
>>> p1 is p2
False
To compare object equality you use == operator. == operator uses eq method to test for equality. If class does not define such method it falls back to comparing identity of objects.
So for:
Or should I collect the uid numbers instead to do the comparisons
rather then the entire object?
you would be doing the same thing since you have not defined eq in your class.
To filter lists do not modify list while you are iterating over it is bad. Guess what will be printed:
>>> a = [1, 2, 3]
>>> b = [1, 2]
>>> for item in a:
... if item in b:
... a.remove(item)
>>> a
[2, 3]
If you want to do this safely iterate over list from the back like:
>>> a = [1, 2, 3]
>>> b = [1, 2]
>>> for i in xrange(len(a) - 1, -1, -1):
... if a[i] in b:
... a.pop(i)
2
1
>>> a
[3]

Related

The zen of python applied to methods in classes

The Zen of python tells us:
There should be one and only one obvious way to do it.
This is difficult to put in practice when it comes to the following situation.
A class receives a list of documents.
The output is a dictionary per document with a variety of key/value pairs.
Every pair depends on a previous calculated one or even from other value/pairs of other dictionary of the list.
This is a very simplified example of such a class.
What is the “obvious” way to go? Every method adds a value/pair to every of the dictionaries.
class T():
def __init__(self,mylist):
#build list of dicts
self.J = [{str(i):mylist[i]} for i in range(len(mylist))]
# enhancement 1: upper
self.method1()
# enhancement 2: lower
self.J = self.static2(self.J)
def method1(self):
newdict = []
for i,mydict in enumerate(self.J):
mydict['up'] = mydict[str(i)].upper()
newdict.append(mydict)
self.J = newdict
#staticmethod
def static2(alist):
J = []
for i,mydict in enumerate(alist):
mydict['down'] = mydict[str(i)].lower()
J.append(mydict)
return J
#property
def propmethod(self):
J = []
for i,mydict in enumerate(self.J):
mydict['prop'] = mydict[str(i)].title()
J.append(mydict)
return J
# more methods extrating info out of every doc in the list
# ...
self.method1() is simple run and a new key/value pair is added to every dict.
The static method 2 can also be used.
and also the property.
Out of the three ways I discharge #property because I am not adding another attribute.
From the other two which one would you choose?
Remember the class will be composed by tens of this Methode that so not add attributes. Only Update (add keine pair values) dictionaries in a list.
I can not see the difference between method1 and static2.
thx.

How would I be able to return which instance belongs to the random number in my list. Without using a million if statements?

what I am trying to do, is returnthe instance, which range has the value from a random.randint() in a list.... Example...
class Testing:
def __init__(self, name, value):
self.name = name
self.value = value
randomtest = Testing('First', range(1, 50))
randomtest_2 = Testing('Second', range(50, 100))
selections = []
counter = 0
while counter < 2:
counter =+ 1
selector = random.randint(1, 100)
selections.append(selector)
But I don't want to use a million if statements to determine which index in the selections list it belongs to.. Like this:
if selections[0] in list(randomtest.value):
return True
elif selections[0] in list(randomtest_2.value):
return True
Your help is much appreciated, I am fairly new to programming and my head has just come to a stand still at the moment.

You can use a set for your selections object then check the intersection with set.intersection() method:
ex:
In [84]: a = {1, 2}
In [85]: a.intersection(range(4))
Out[85]: {1, 2}
and in your code:
if selections.intersection(randomtest.value):
return True
You can also define a hase_intersect method for your Testing class, in order to cehck if an iterable object has intersection with your obejct:
class Testing:
def __init__(self, name, value):
self.name = name
self.value = value
def hase_intersect(self, iterable):
iterable = set(iterable)
return any(i in iterable for i in self.value)
And check like this:
if randomtest.hase_intersect(selections):
return True
based on your comment, if you want to check the intersection of a spesific list against a set of objects you have to iterate over the
set of objects and check the intersection using aforementioned methods. But if you want to refuse iterating over the list of objects you should probably use a base claas
with an special method that returns your desire output but still you need to use an iteration to fild the name of all intended instances. Thus, if you certainly want to
create different objects you neend to at least use 1 iteration for this task.

How to remove duplicates in set for objects?

I have set of objects:
class Test(object):
def __init__(self):
self.i = random.randint(1,10)
res = set()
for i in range(0,1000):
res.add(Test())
print len(res) = 1000
How to remove duplicates from set of objects ?
Thanks for answers, it's work:
class Test(object):
def __init__(self, i):
self.i = i
# self.i = random.randint(1,10)
# self.j = random.randint(1,20)
def __keys(self):
t = ()
for key in self.__dict__:
t = t + (self.__dict__[key],)
return t
def __eq__(self, other):
return isinstance(other, Test) and self.__keys() == other.__keys()
def __hash__(self):
return hash(self.__keys())
res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
but how to save order ? Sets not support order. Can i use list instead set for example ?

Your objects must be hashable (i.e. must have __eq__() and __hash__() defined) for sets to work properly with them:
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
def __eq__(self, other):
return self.i == other.i
def __hash__(self):
return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan):
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
self.k = random.randint(1, 10)
self.j = random.randint(1, 10)
def __eq__(self, other):
return (self.i, self.k, self.j) == (other.i, other.k, other.j)
def __hash__(self):
return hash((self.i, self.k, self.j))

Your first question is already answered by Pavel Anossov.
But you have another question:
but how to save order ? Sets not support order. Can i use list instead set for example ?
You can use a list, but there are a few downsides:
You get the wrong interface.
You don't get automatic handling of duplicates. You have to explicitly write if foo not in res: res.append(foo). Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.
It's going to be a lot less efficient if the collection can get large. Basically, adding a new element, checking whether an element already exists, etc. are all going to be O(N) instead of O(1).
What you want is something that works like an ordered set. Or, equivalently, like a list that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list, then using unique_everseen from the itertools recipes to remove duplicates.
Or you could just keep a set and a list or elements by order (or a list plus a set of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set. Something like an OrderedSet akin to collections.OrderedDict.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set() to res = OrderedSet(), and you're done.

I think you can easily do what you want with a list as you asked in your first post since you defined the eq operator :
l = []
if Test(0) not in l :
l.append(Test(0))
My 2 cts ...

Pavel Anossov's answer is great for allowing your class to be used in a set with the semantics you want. However, if you want to preserve the order of your items, you'll need a bit more. Here's a function that de-duplicates a list, as long as the list items are hashable:
def dedupe(lst):
seen = set()
results = []
for item in lst:
if item not in seen:
seen.add(item)
results.append(item)
return results
A slightly more idiomatic version would be a generator, rather than a function that returns a list. This gets rid of the results variable, using yield rather than appending the unique values to it. I've also renamed the lst parameter to iterable, since it will work just as well on any iterable object (such as another generator).
def dedupe(iterable):
seen = set()
for item in iterable:
if item not in seen:
seen.add(item)
yield item

Python set with the ability to pop a random element

I am in need of a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but has the ability to return a random value. Previous questions asked on stackoverflow have answers that are things like:
import random
random.sample(mySet, 1)
But this is quite slow for large sets (it runs in O(n) time).
Other solutions aren't random enough (they depend on the internal representation of python sets, which produces some results which are very non-random):
for e in mySet:
break
# e is now an element from mySet
I coded my own rudimentary class which has constant time lookup, deletion, and random values.
class randomSet:
def __init__(self):
self.dict = {}
self.list = []
def add(self, item):
if item not in self.dict:
self.dict[item] = len(self.list)
self.list.append(item)
def addIterable(self, item):
for a in item:
self.add(a)
def delete(self, item):
if item in self.dict:
index = self.dict[item]
if index == len(self.list)-1:
del self.dict[self.list[index]]
del self.list[index]
else:
self.list[index] = self.list.pop()
self.dict[self.list[index]] = index
del self.dict[item]
def getRandom(self):
if self.list:
return self.list[random.randomint(0,len(self.list)-1)]
def popRandom(self):
if self.list:
index = random.randint(0,len(self.list)-1)
if index == len(self.list)-1:
del self.dict[self.list[index]]
return self.list.pop()
returnValue = self.list[index]
self.list[index] = self.list.pop()
self.dict[self.list[index]] = index
del self.dict[returnValue]
return returnValue
Are there any better implementations for this, or any big improvements to be made to this code?

I think the best way to do this would be to use the MutableSet abstract base class in collections. Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept a sequence, just like the set constructor does. MutableSet provides built-in definitions of all other set methods based on those methods. That way you get the full set interface cheaply. (And if you do this, addIterable is defined for you, under the name extend.)
discard in the standard set interface appears to be what you have called delete here. So rename delete to discard. Also, instead of having a separate popRandom method, you could just define popRandom like so:
def popRandom(self):
item = self.getRandom()
self.discard(item)
return item
That way you don't have to maintain two separate item removal methods.
Finally, in your item removal method (delete now, discard according to the standard set interface), you don't need an if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whether index == len(self.list) - 1 or not:
def discard(self, item):
if item in self.dict:
index = self.dict[item]
self.list[index], self.list[-1] = self.list[-1], self.list[index]
self.dict[self.list[index]] = index
del self.list[-1] # or in one line:
del self.dict[item] # del self.dict[self.list.pop()]

One approach you could take is to derive a new class from set which salts itself with random objects of a type derived from int.
You can then use pop to select a random element, and if it is not of the salt type, reinsert and return it, but if it is of the salt type, insert a new, randomly-generated salt object (and pop to select a new object).
This will tend to alter the order in which objects are selected. On average, the number of attempts will depend on the proportion of salting elements, i.e. amortised O(k) performance.

Can't we implement a new class inheriting from set with some (hackish) modifications that enable us to retrieve a random element from the list with O(1) lookup time? Btw, on Python 2.x you should inherit from object, i.e. use class randomSet(object). Also PEP8 is something to consider for you :-)
Edit:
For getting some ideas of what hackish solutions might be capable of, this thread is worth reading:
http://python.6.n6.nabble.com/Get-item-from-set-td1530758.html

Here's a solution from scratch, which adds and pops in constant time. I also included some extra set functions for demonstrative purposes.
from random import randint
class RandomSet(object):
"""
Implements a set in which elements can be
added and drawn uniformly and randomly in
constant time.
"""
def __init__(self, seq=None):
self.dict = {}
self.list = []
if seq is not None:
for x in seq:
self.add(x)
def add(self, x):
if x not in self.dict:
self.dict[x] = len(self.list)
self.list.append(x)
def pop(self, x=None):
if x is None:
i = randint(0,len(self.list)-1)
x = self.list[i]
else:
i = self.dict[x]
self.list[i] = self.list[-1]
self.dict[self.list[-1]] = i
self.list.pop()
self.dict.pop(x)
return x
def __contains__(self, x):
return x in self.dict
def __iter__(self):
return iter(self.list)
def __repr__(self):
return "{" + ", ".join(str(x) for x in self.list) + "}"
def __len__(self):
return len(self.list)

Yes, I'd implement an "ordered set" in much the same way you did - and use a list as an internal data structure.
However, I'd inherit straight from "set" and just keep track of the added items in an
internal list (as you did) - and leave the methods I don't use alone.
Maybe add a "sync" method to update the internal list whenever the set is updated
by set-specific operations, like the *_update methods.
That if using an "ordered dict" does not cover your use cases. (I just found that trying to cast ordered_dict keys to a regular set is not optmized, so if you need set operations on your data that is not an option)

If you don't mind only supporting comparable elements, then you could use blist.sortedset.

Hierarchy / Flyweight / Instancing Problem in Python

Here is the problem I am trying to solve, (I have simplified the actual problem, but this should give you all the relevant information). I have a hierarchy like so:
1.A
1.B
1.C
2.A
3.D
4.B
5.F
(This is hard to illustrate - each number is the parent, each letter is the child).
Creating an instance of the 'letter' objects is expensive (IO, database costs, etc), so should only be done once.
The hierarchy needs to be easy to navigate.
Children in the hierarchy need to have just one parent.
Modifying the contents of the letter objects should be possible directly from the objects in the hierarchy.
There needs to be a central store containing all of the 'letter' objects (and only those in the hierarchy).
'letter' and 'number' objects need to be possible to create from a constructor (such as Letter(**kwargs) ).
It is perfectably acceptable to expect that when a letter changes from the hierarchy, all other letters will respect the same change.
Hope this isn't too abstract to illustrate the problem.
What would be the best way of solving this? (Then I'll post my solution)
Here's an example script:
one = Number('one')
a = Letter('a')
one.addChild(a)
two = Number('two')
a = Letter('a')
two.addChild(a)
for child in one:
child.method1()
for child in two:
print '%s' % child.method2()

A basic approach will use builtin data types. If I get your drift, the Letter object should be created by a factory with a dict cache to keep previously generated Letter objects. The factory will create only one Letter object for each key.
A Number object can be a sub-class of list that will hold the Letter objects, so that append() can be used to add a child. A list is easy to navigate.
A crude outline of a caching factory:
>>> class Letters(object):
... def __init__(self):
... self.cache = {}
... def create(self, v):
... l = self.cache.get(v, None)
... if l:
... return l
... l = self.cache[v] = Letter(v)
... return l
>>> factory=Letters()
>>> factory.cache
{}
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>>
To fulfill requirement 6 (constructor), here is
a more contrived example, using __new__, of a caching constructor. This is similar to Recipe 413717: Caching object creation .
class Letter(object):
cache = {}
def __new__(cls, v):
o = cls.cache.get(v, None)
if o:
return o
else:
o = cls.cache[v] = object.__new__(cls)
return o
def __init__(self, v):
self.v = v
self.refcount = 0
def addAsChild(self, chain):
if self.refcount > 0:
return False
self.refcount += 1
chain.append(self)
return True
Testing the cache functionality
>>> l1 = Letter('a')
>>> l2 = Letter('a')
>>> l1 is l2
True
>>>
For enforcing a single parent, you'll need a method on Letter objects (not Number) - with a reference counter. When called to perform the addition it will refuse addition if the counter is greater than zero.
l1.addAsChild(num4)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.