The Zen of Python tells us:
There should be one-- and preferably only one --obvious way to do it.
This is difficult to put into practice in the following situation.
A class receives a list of documents.
The output is one dictionary per document, holding a variety of key/value pairs.
Every pair depends on a previously calculated one, or even on key/value pairs from other dictionaries in the list.
Below is a very simplified example of such a class.
What is the "obvious" way to go? Every method adds a key/value pair to each of the dictionaries.
class T():
    def __init__(self, mylist):
        # build list of dicts
        self.J = [{str(i): mylist[i]} for i in range(len(mylist))]
        # enhancement 1: upper
        self.method1()
        # enhancement 2: lower
        self.J = self.static2(self.J)

    def method1(self):
        newdict = []
        for i, mydict in enumerate(self.J):
            mydict['up'] = mydict[str(i)].upper()
            newdict.append(mydict)
        self.J = newdict

    @staticmethod
    def static2(alist):
        J = []
        for i, mydict in enumerate(alist):
            mydict['down'] = mydict[str(i)].lower()
            J.append(mydict)
        return J

    @property
    def propmethod(self):
        J = []
        for i, mydict in enumerate(self.J):
            mydict['prop'] = mydict[str(i)].title()
            J.append(mydict)
        return J

    # more methods extracting info out of every doc in the list
    # ...
self.method1() is simply run, and a new key/value pair is added to every dict.
The static method static2 can also be used,
and so can the property.
Of the three ways, I discard the property because I am not adding another attribute.
Of the other two, which one would you choose?
Remember that the class will be composed of tens of these methods, which do not add attributes. They only update the dictionaries in the list (by adding key/value pairs).
I cannot see the difference between method1 and static2.
Thanks.
Related
Let's say I have a simple class, with an attribute, x:
import random

class A:
    def __init__(self):
        self.x = random.randint(-5, 5)  # not the most efficient, but it serves its purpose well
I'll also have a list, with hundreds of instances of this class:
Az = []
for i in range(150):
    Az.append(A())
Now, let's say I want to loop through all the A instances in Az and run a function on those whose x attribute is less than one. This is one way, but alas, it is very inefficient:
for cls in Az:
    if cls.x < 1:
        func1(cls)  # some function that accepts an instance as a parameter and does something to it
So, to wrap it up, my question: How to optimize the speed of the checking?
Optimizing only the third step is tricky. Why not start at the second step, by saving the list indices of the instances whose attribute x is < 1?
Az = []
ids = []
for i in range(150):
    cls = A()
    if cls.x < 1:
        ids.append(i)
    Az.append(cls)
And then modify the third step:
for id in ids:
    func1(Az[id])
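A variation on the same idea, as a minimal sketch: instead of storing indices, keep references to the matching instances in a second list that is built once. Here func1 is a hypothetical stand-in for the question's function, and the sketch assumes x does not change after the instance is created (otherwise the filtered list would go stale).

```python
import random


class A:
    def __init__(self):
        self.x = random.randint(-5, 5)


def func1(obj):
    # Hypothetical stand-in for the question's func1: just mark the instance.
    obj.processed = True


Az = [A() for _ in range(150)]

# Filter once, up front; later passes touch only the matching instances.
selected = [a for a in Az if a.x < 1]

for a in selected:
    func1(a)
```

Whether this is actually faster depends on how often the filtered list is reused; for a single pass, the plain loop with the if test does the same amount of work.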
What I am trying to do is return the instance whose range contains a value produced by random.randint() and stored in a list. For example:
import random

class Testing:
    def __init__(self, name, value):
        self.name = name
        self.value = value

randomtest = Testing('First', range(1, 50))
randomtest_2 = Testing('Second', range(50, 100))
selections = []
counter = 0
while counter < 2:
    counter += 1
    selector = random.randint(1, 100)
    selections.append(selector)
But I don't want to use a million if statements to determine which instance each value in the selections list belongs to, like this:
if selections[0] in list(randomtest.value):
    return True
elif selections[0] in list(randomtest_2.value):
    return True
Your help is much appreciated. I am fairly new to programming and my head has just come to a standstill at the moment.
You can use a set for your selections object and then check the intersection with the set.intersection() method.
For example:
In [84]: a = {1, 2}
In [85]: a.intersection(range(4))
Out[85]: {1, 2}
And in your code:
if selections.intersection(randomtest.value):
    return True
You can also define a hase_intersect method for your Testing class, in order to check whether an iterable object intersects with your object:
class Testing:
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def hase_intersect(self, iterable):
        iterable = set(iterable)
        return any(i in iterable for i in self.value)
And check like this:
if randomtest.hase_intersect(selections):
    return True
Based on your comment: if you want to check the intersection of a specific list against a set of objects, you have to iterate over the set of objects and check the intersection using the aforementioned methods. If you want to avoid iterating over the list of objects, you could use a base class with a special method that returns your desired output, but you would still need an iteration to find the names of all the intended instances. So if you really want to create separate objects, you need at least one iteration for this task.
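The single iteration described above could look roughly like the following sketch. The helper find_owner and the list tests are hypothetical names introduced for illustration, and randint is limited to 1..99 here so that every selection falls inside one of the two ranges.

```python
import random


class Testing:
    def __init__(self, name, value):
        self.name = name
        self.value = value


tests = [Testing('First', range(1, 50)), Testing('Second', range(50, 100))]


def find_owner(selector, candidates):
    """Return the first candidate whose range contains selector, else None."""
    for candidate in candidates:
        # For integers, membership in a Python 3 range is a constant-time check.
        if selector in candidate.value:
            return candidate
    return None


selections = [random.randint(1, 99) for _ in range(2)]
owners = [find_owner(s, tests).name for s in selections]
```

Adding a third Testing instance then only means appending it to the list; no extra if/elif branch is needed.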
I have a set of objects:
import random

class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)

res = set()
for i in range(0, 1000):
    res.add(Test())
print(len(res))  # prints 1000
How do I remove duplicates from a set of objects?
Thanks for the answers, it works:
class Test(object):
    def __init__(self, i):
        self.i = i
        # self.i = random.randint(1,10)
        # self.j = random.randint(1,20)

    def __keys(self):
        t = ()
        for key in self.__dict__:
            t = t + (self.__dict__[key],)
        return t

    def __eq__(self, other):
        return isinstance(other, Test) and self.__keys() == other.__keys()

    def __hash__(self):
        return hash(self.__keys())

res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
But how do I preserve the order? Sets do not support ordering. Can I use a list instead of a set, for example?
Your objects must be hashable (i.e. must have __eq__() and __hash__() defined) for sets to work properly with them:
class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)

    def __eq__(self, other):
        return self.i == other.i

    def __hash__(self):
        return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan):
class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)
        self.k = random.randint(1, 10)
        self.j = random.randint(1, 10)

    def __eq__(self, other):
        return (self.i, self.k, self.j) == (other.i, other.k, other.j)

    def __hash__(self):
        return hash((self.i, self.k, self.j))
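To see the effect on the set from the question, here is a small self-contained sketch. The optional i argument is an assumption added for this example so that the behaviour can be checked with fixed values; it is not part of the original class.

```python
import random


class Test(object):
    def __init__(self, i=None):
        # Assumption for this sketch: allow a fixed value so results are checkable.
        self.i = random.randint(1, 10) if i is None else i

    def __eq__(self, other):
        return isinstance(other, Test) and self.i == other.i

    def __hash__(self):
        return hash(self.i)


res = set()
for _ in range(1000):
    res.add(Test())

distinct = len(res)  # at most 10, because i only takes the values 1..10
```

Note that the attributes used for hashing should not be changed while the object is stored in a set, since the set would then look for it under the wrong hash.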
Your first question is already answered by Pavel Anossov.
But you have another question:
But how do I preserve the order? Sets do not support ordering. Can I use a list instead of a set, for example?
You can use a list, but there are a few downsides:
You get the wrong interface.
You don't get automatic handling of duplicates. You have to explicitly write if foo not in res: res.append(foo). Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.
It's going to be a lot less efficient if the collection can get large. Basically, adding a new element, checking whether an element already exists, etc. are all going to be O(N) instead of O(1).
What you want is something that works like an ordered set. Or, equivalently, like a list that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list, then using unique_everseen from the itertools recipes to remove duplicates.
Or you could just keep a set and a list of the elements in order (or a list plus a set of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set. Something like an OrderedSet akin to collections.OrderedDict.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set() to res = OrderedSet(), and you're done.
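If only de-duplication in first-seen order is needed, rather than the full set API, a plain dict may already be enough: since Python 3.7, dict is documented to preserve insertion order. The sketch below is an alternative to the OrderedSet recipe, not a replacement for it; ordered_unique is a hypothetical helper name, and the items must be hashable.

```python
def ordered_unique(items):
    """Return the distinct items in first-seen order (items must be hashable)."""
    # dict keys are unique and, on Python 3.7+, keep their insertion order.
    return list(dict.fromkeys(items))


values = [2, 8, 3, 2, 4, 8, 5]
unique_values = ordered_unique(values)  # [2, 8, 3, 4, 5]
```

Membership tests against the intermediate dict stay O(1), so this avoids the O(N) cost of checking a list before each append.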
I think you can easily do what you want with a list, as you asked in your first post, since you defined the __eq__ operator:
l = []
if Test(0) not in l:
    l.append(Test(0))
My 2 cts ...
Pavel Anossov's answer is great for allowing your class to be used in a set with the semantics you want. However, if you want to preserve the order of your items, you'll need a bit more. Here's a function that de-duplicates a list, as long as the list items are hashable:
def dedupe(lst):
    seen = set()
    results = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            results.append(item)
    return results
A slightly more idiomatic version would be a generator, rather than a function that returns a list. This gets rid of the results variable, using yield rather than appending the unique values to it. I've also renamed the lst parameter to iterable, since it will work just as well on any iterable object (such as another generator).
def dedupe(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item
I am in need of a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but has the ability to return a random value. Previous questions asked on stackoverflow have answers that are things like:
import random
random.sample(mySet, 1)
But this is quite slow for large sets (it runs in O(n) time).
Other solutions aren't random enough (they depend on the internal representation of python sets, which produces some results which are very non-random):
for e in mySet:
    break
# e is now an element from mySet
I coded my own rudimentary class which has constant time lookup, deletion, and random values.
class randomSet:
    def __init__(self):
        self.dict = {}
        self.list = []

    def add(self, item):
        if item not in self.dict:
            self.dict[item] = len(self.list)
            self.list.append(item)

    def addIterable(self, item):
        for a in item:
            self.add(a)

    def delete(self, item):
        if item in self.dict:
            index = self.dict[item]
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                del self.list[index]
            else:
                self.list[index] = self.list.pop()
                self.dict[self.list[index]] = index
                del self.dict[item]

    def getRandom(self):
        if self.list:
            return self.list[random.randint(0, len(self.list)-1)]

    def popRandom(self):
        if self.list:
            index = random.randint(0, len(self.list)-1)
            if index == len(self.list)-1:
                del self.dict[self.list[index]]
                return self.list.pop()
            returnValue = self.list[index]
            self.list[index] = self.list.pop()
            self.dict[self.list[index]] = index
            del self.dict[returnValue]
            return returnValue
Are there any better implementations for this, or any big improvements to be made to this code?
I think the best way to do this would be to use the MutableSet abstract base class (collections.MutableSet in Python 2.7, collections.abc.MutableSet in Python 3). Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept a sequence, just like the set constructor does. MutableSet provides built-in definitions of the remaining methods and operators of its interface based on those five. That way you get most of the set interface cheaply. (And if you do this, an equivalent of addIterable is provided for you through the |= operator, __ior__.)
discard in the standard set interface appears to be what you have called delete here, so rename delete to discard. Also, instead of giving popRandom its own removal logic, you could define it like so:
def popRandom(self):
    item = self.getRandom()
    self.discard(item)
    return item
That way you don't have to maintain two separate item removal methods.
Finally, in your item removal method (delete now, discard according to the standard set interface), you don't need an if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whether index == len(self.list) - 1 or not:
def discard(self, item):
    if item in self.dict:
        index = self.dict[item]
        self.list[index], self.list[-1] = self.list[-1], self.list[index]
        self.dict[self.list[index]] = index
        del self.list[-1]                    # or in one line:
        del self.dict[item]                  # del self.dict[self.list.pop()]
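Putting these suggestions together, a minimal sketch could look like the following. It is written for Python 3 (collections.abc); the attribute names _index and _items and the method names get_random and pop_random are choices made for this example, not part of any standard interface.

```python
import random
from collections.abc import MutableSet


class RandomSet(MutableSet):
    """Set with O(1) add, discard, membership test and uniform random choice."""

    def __init__(self, iterable=()):
        self._index = {}  # item -> position of that item in self._items
        self._items = []  # the items, in arbitrary order
        for item in iterable:
            self.add(item)

    def __contains__(self, item):
        return item in self._index

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._index:
            self._index[item] = len(self._items)
            self._items.append(item)

    def discard(self, item):
        if item not in self._index:
            return
        position = self._index.pop(item)
        last = self._items.pop()
        # If the removed item was not the last one, move the last item
        # into the freed slot and record its new position.
        if position < len(self._items):
            self._items[position] = last
            self._index[last] = position

    def get_random(self):
        # Raises IndexError on an empty set, like random.choice on an empty list.
        return random.choice(self._items)

    def pop_random(self):
        item = self.get_random()
        self.discard(item)
        return item
```

With only these methods defined, the mixins from MutableSet supply remove, clear, the comparison operators and the in-place operators such as |=, so `s |= [4, 5]` adds both elements.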
One approach you could take is to derive a new class from set which salts itself with random objects of a type derived from int.
You can then use pop to select a random element, and if it is not of the salt type, reinsert and return it, but if it is of the salt type, insert a new, randomly-generated salt object (and pop to select a new object).
This will tend to alter the order in which objects are selected. On average, the number of attempts will depend on the proportion of salting elements, i.e. amortised O(k) performance.
Can't we implement a new class inheriting from set with some (hackish) modifications that enable us to retrieve a random element from the list with O(1) lookup time? Btw, on Python 2.x you should inherit from object, i.e. use class randomSet(object). Also PEP8 is something to consider for you :-)
Edit:
For getting some ideas of what hackish solutions might be capable of, this thread is worth reading:
http://python.6.n6.nabble.com/Get-item-from-set-td1530758.html
Here's a solution from scratch, which adds and pops in constant time. I also included some extra set functions for demonstrative purposes.
from random import randint

class RandomSet(object):
    """
    Implements a set in which elements can be
    added and drawn uniformly and randomly in
    constant time.
    """

    def __init__(self, seq=None):
        self.dict = {}
        self.list = []
        if seq is not None:
            for x in seq:
                self.add(x)

    def add(self, x):
        if x not in self.dict:
            self.dict[x] = len(self.list)
            self.list.append(x)

    def pop(self, x=None):
        if x is None:
            i = randint(0, len(self.list)-1)
            x = self.list[i]
        else:
            i = self.dict[x]
        self.list[i] = self.list[-1]
        self.dict[self.list[-1]] = i
        self.list.pop()
        self.dict.pop(x)
        return x

    def __contains__(self, x):
        return x in self.dict

    def __iter__(self):
        return iter(self.list)

    def __repr__(self):
        return "{" + ", ".join(str(x) for x in self.list) + "}"

    def __len__(self):
        return len(self.list)
Yes, I'd implement an "ordered set" in much the same way you did - and use a list as an internal data structure.
However, I'd inherit straight from "set" and just keep track of the added items in an
internal list (as you did) - and leave the methods I don't use alone.
Maybe add a "sync" method to update the internal list whenever the set is updated
by set-specific operations, like the *_update methods.
That is, if using an "ordered dict" does not cover your use cases. (I just found that casting ordered_dict keys to a regular set is not optimized, so if you need set operations on your data, that is not an option.)
If you don't mind only supporting comparable elements, then you could use blist.sortedset.
Here is the problem I am trying to solve, (I have simplified the actual problem, but this should give you all the relevant information). I have a hierarchy like so:
1.A
1.B
1.C
2.A
3.D
4.B
5.F
(This is hard to illustrate - each number is the parent, each letter is the child).
1. Creating an instance of the 'letter' objects is expensive (IO, database costs, etc.), so it should only be done once.
2. The hierarchy needs to be easy to navigate.
3. Children in the hierarchy need to have just one parent.
4. Modifying the contents of the letter objects should be possible directly from the objects in the hierarchy.
5. There needs to be a central store containing all of the 'letter' objects (and only those in the hierarchy).
6. It must be possible to create 'letter' and 'number' objects from a constructor (such as Letter(**kwargs)).
7. It is perfectly acceptable to expect that when a letter in the hierarchy changes, all other letters will reflect the same change.
Hope this isn't too abstract to illustrate the problem.
What would be the best way of solving this? (Then I'll post my solution)
Here's an example script:
one = Number('one')
a = Letter('a')
one.addChild(a)
two = Number('two')
a = Letter('a')
two.addChild(a)
for child in one:
    child.method1()
for child in two:
    print('%s' % child.method2())
A basic approach will use builtin data types. If I get your drift, the Letter object should be created by a factory with a dict cache to keep previously generated Letter objects. The factory will create only one Letter object for each key.
A Number object can be a sub-class of list that will hold the Letter objects, so that append() can be used to add a child. A list is easy to navigate.
A crude outline of a caching factory:
>>> class Letters(object):
...     def __init__(self):
...         self.cache = {}
...     def create(self, v):
...         l = self.cache.get(v, None)
...         if l:
...             return l
...         l = self.cache[v] = Letter(v)
...         return l
>>> factory=Letters()
>>> factory.cache
{}
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>>
To fulfill requirement 6 (constructor), here is a more contrived example of a caching constructor, using __new__. This is similar to Recipe 413717: Caching object creation.
class Letter(object):
    cache = {}

    def __new__(cls, v):
        o = cls.cache.get(v, None)
        if o is None:
            o = cls.cache[v] = object.__new__(cls)
            # Initialize here rather than in __init__: __init__ would run on
            # every Letter(v) call and reset refcount on the cached object.
            o.v = v
            o.refcount = 0
        return o

    def addAsChild(self, chain):
        if self.refcount > 0:
            return False
        self.refcount += 1
        chain.append(self)
        return True
Testing the cache functionality:
>>> l1 = Letter('a')
>>> l2 = Letter('a')
>>> l1 is l2
True
>>>
For enforcing a single parent, you'll need a method on Letter objects (not Number) - with a reference counter. When called to perform the addition it will refuse addition if the counter is greater than zero.
l1.addAsChild(num4)
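For a picture of how the pieces might fit together, here is a rough, self-contained sketch that combines the caching constructor with a list-based Number. It differs from the outline above in two labeled ways: it records the parent on the Letter instead of keeping a reference counter, and it puts the check in a Number.addChild method, matching the name used in the question's script. The attribute names name, parent and _cache are assumptions made for this example.

```python
class Letter(object):
    _cache = {}  # central store: one Letter object per name

    def __new__(cls, name):
        letter = cls._cache.get(name)
        if letter is None:
            letter = cls._cache[name] = object.__new__(cls)
            # Initialise once, at creation; an __init__ would re-run on every call.
            letter.name = name
            letter.parent = None
        return letter


class Number(list):
    """A parent node; its children are Letter objects."""

    def __init__(self, name):
        super(Number, self).__init__()
        self.name = name

    def addChild(self, letter):
        # Refuse letters that already belong to a different parent.
        if letter.parent is not None and letter.parent is not self:
            return False
        if letter.parent is None:
            letter.parent = self
            self.append(letter)
        return True


one = Number('one')
two = Number('two')
added_to_one = one.addChild(Letter('a'))  # True: 'a' had no parent yet
added_to_two = two.addChild(Letter('a'))  # False: same cached object, owned by one
```

Because Letter('a') always returns the same cached object, the second addChild call sees the parent set by the first and refuses the addition.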