I am writing a simple class to retrieve a signal x digitized at constant sampling rate Fs. Digitization begins at time t0. Given the signal length N = len(x), the sampling rate, and the initial time, the signal's time base is uniquely determined. I rarely need to access the time base, but I would like an easy means of doing so when needed. Below, I've implemented a minimal working example of my desired time-base functionality using a property() decorator:
import numpy as np

class Signal(object):
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0
        return

    @property
    def t(self):
        return self.t0 + (np.arange(len(self.x)) / self.Fs)
I'd like to know about the creation and "persistence" of the time-base property Signal.t. Take the example use case below:
x = np.arange(10)
Fs = 1.
t0 = 0.
sig = Signal(x, Fs, t0)
print(sig.t)
When is the time-base array t generated? During initialization, or dynamically when print(sig.t) is called? If the sig.t attribute is dynamically calculated, will it persist beyond the print command? (i.e. has memory been allocated to store the time base as an object attribute?).
While the above is a trivial example, my typical signals are very large, and I do not want the memory overhead of creating and storing the time base for every signal. I'd like an easy means of dynamically generating the time base on an as-needed basis; however, the time base should not persist as an object attribute after its one-off use (e.g. for creating a plot of the raw signal).
If the property() decorator does not provide this desired functionality (i.e. minimal memory overhead, ease of use on an as-needed basis), what should I be using? Simply a class method? Or is there a different, more optimal solution? Thanks!
Every time you access sig.t, the t function you decorated with @property is (re)run, and the result is used as the value of sig.t. That means the array is created on demand and never stored on the sig object.
You seem to want this, but I'd be wary about it. Property accesses are generally expected to be cheap, and this property isn't. Consider making an ordinary method instead.
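As a minimal sketch of that ordinary-method alternative (same class as the question's, with t as a plain method so the cost is explicit at each call site):

import numpy as np

class Signal(object):
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0

    def t(self):
        # Recomputed on every call; nothing is cached on the instance.
        return self.t0 + (np.arange(len(self.x)) / self.Fs)

Callers then write sig.t(), which signals that work is being done.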
When is the time-base array t generated?
When it is used, i.e. when you write print(sig.t).
If the sig.t attribute is dynamically calculated, will it persist beyond the print command? (i.e. has memory been allocated to store the time base as an object attribute?).
Nope. The next time your code references sig.t, a new object will be created.
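You can verify this yourself: two successive accesses produce two distinct arrays.

print(sig.t is sig.t)  # False: each access builds a fresh array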
If the property() decorator does not provide this desired functionality (i.e. minimal memory overhead, ease of use on an as-needed basis), what should I be using? Simply a class method? Or is there a different, more optimal solution? Thanks!
There are differing opinions here, I suspect... you can modify the code so that it caches the value and returns the same object on each call:
class Signal(object):
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0
        self._t = None
        return

    @property
    def t(self):
        if self._t is not None:
            return self._t
        self._t = self.t0 + (np.arange(len(self.x)) / self.Fs)
        return self._t
But here you have no way of telling the class that t should be re-computed unless you make a setter (or a deleter; see the sketch below)...
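A minimal sketch of the deleter option, reusing the cached version above:

import numpy as np

class Signal(object):
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0
        self._t = None

    @property
    def t(self):
        if self._t is None:
            self._t = self.t0 + (np.arange(len(self.x)) / self.Fs)
        return self._t

    @t.deleter
    def t(self):
        # del sig.t discards the cached array; the next access recomputes it
        self._t = None

del sig.t also frees the memory until t is next accessed.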
If t isn't going to change after initialization, then why not just make it a plain public attribute?
class Signal(object):
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0
        self.t = self.t0 + (np.arange(len(self.x)) / self.Fs)
In your example, the Signal.t value is generated dynamically every time you access the attribute t, because accessing it essentially calls Signal.t(). So you are not storing the value, just returning it.
Whenever the @property decorator is used in a class, it is quite common for that function to act as a "getter" for a "private" (not really) variable, and sometimes there is a "setter" for that "private" variable.
By "private" I really mean attributes whose names signal that they should not be accessed directly. However, in Python you can access any attribute, so there are no truly private variables; Python objects can be changed quite easily.
If you want to store your values, then you should do something like this.
import numpy as np

class Signal(object):
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0
        self._t = None
        return

    @property
    def t(self):
        if self._t is None:
            self._t = self.t0 + (np.arange(len(self.x)) / self.Fs)
        return self._t

    @t.setter
    def t(self, value):
        self._t = value
The above example will only calculate the value once and store it inside _t, but you get the point. When the @property decorator is used, there is usually an underlying variable that is used to retrieve and store the value. Hope that helps.
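As an aside: on Python 3.8 and newer, the standard library packages up exactly this compute-once-then-store pattern as functools.cached_property, so a sketch of the same class could read:

import numpy as np
from functools import cached_property

class Signal:
    def __init__(self, x, Fs, t0):
        self.x = x
        self.Fs = Fs
        self.t0 = t0

    @cached_property
    def t(self):
        # Computed on first access, then stored on the instance;
        # del sig.t discards the cached array so it can be recomputed.
        return self.t0 + (np.arange(len(self.x)) / self.Fs)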
Related
I'm programming an optimizer that has to run through several possible variants. The team wants to implement multithreading to get through those variants faster. This means I've had to put all my functions inside a thread class. My problem is with my call of the wrapper function:
import threading

class variant_thread(threading.Thread):
    def __init__(self, name, variant, frequencies, fit_vals):
        threading.Thread.__init__(self)
        self.name = name
        self.elementCount = variant
        self.frequencies = frequencies
        self.fit_vals = fit_vals

    def run(self):
        print("Running Variant:", self.elementCount)  # display thread currently running
        fitFunction = self.Wrapper_Function(self.elementCount)
        self.popt, pcov, self.infoRes = curve_fit_my(fitFunction, self.frequencies, self.fit_vals)

    def Optimize_Wrapper(self, frequencies, *params):  # wrapper which returns values in manner which optimizer can work with
        cut = int(len(frequencies)/2)  # <---- ERROR OCCURS HERE
        freq = frequencies[:cut]
        vals = (stuff happens here)
        return (stuff in proper form for optimizer)
I've cut out as much as I could to simplify the example, and I hope you can understand what's going on. Essentially, after the thread is created it calls the optimizer. The optimizer sends the list of frequencies and the parameters it wants to change to the Optimize_Wrapper function.
The problem is that Optimize_Wrapper takes the frequencies list and saves it to "self". This means that the "frequencies" variable becomes a single float value, as opposed to the list of floats it should be. Of course this throws an error when I try to take len(frequencies). Keep in mind I also need to use self later in the function, so I can't just create a static method.
I've never had the problem that a class method saved any values to "self". I know it has to be declared explicitly in Python, but anything I've ever passed to the class method always skips "self" and saves to my declared variables. What's going on here?
Don't pass instance variables to methods. They are already accessible through self. And be careful about which variable is which. The first parameter to Optimize_Wrapper is called "frequencies", but you call it as self.Wrapper_Function(self.elementCount) - so you have a self.frequencies and a frequencies ... and they are different things. Very confusing!
import threading

class variant_thread(threading.Thread):
    def __init__(self, name, variant, frequencies, fit_vals):
        threading.Thread.__init__(self)
        self.name = name
        self.elementCount = variant
        self.frequencies = frequencies
        self.fit_vals = fit_vals

    def run(self):
        print("Running Variant:", self.elementCount)  # display thread currently running
        fitFunction = self.Optimize_Wrapper()
        self.popt, pcov, self.infoRes = curve_fit_my(fitFunction, self.frequencies, self.fit_vals)

    def Optimize_Wrapper(self):  # wrapper which returns values in manner which optimizer can work with
        cut = int(len(self.frequencies)/2)  # <---- the error no longer occurs here
        freq = self.frequencies[:cut]
        vals = (stuff happens here)
        return (stuff in proper form for optimizer)
You don't have to subclass Thread to run a thread. It's frequently easier to define a function and have Thread call that function. In your case, you may be able to put the variant processing in a function and use a thread pool to run them. This would save all the tedious handling of the thread object itself.
def run_variant(name, variant, frequencies, fit_vals):
    cut = int(len(frequencies)/2)
    freq = frequencies[:cut]
    vals = (stuff happens here)
    proper_form = (stuff in proper form for optimizer)
    return curve_fit_my(proper_form, frequencies, fit_vals)

if __name__ == "__main__":
    variants = (make the variants)
    name = "name"
    frequencies = (make the frequencies)
    fit_vals = (make the fit_vals)

    from multiprocessing.pool import ThreadPool
    with ThreadPool() as pool:
        for popt, pcov, infoRes in pool.starmap(run_variant,
                ((name, variant, frequencies, fit_vals) for variant in variants)):
            # do the other work here
Is it possible to define an instance variable in a class as a function of another? I haven't gotten it to work unless I redefine the "function instance variable" all the time.
Basically you could have a scenario where you have one instance variable that is a list of integers, and want to have the sum of these as an instance variable, that automatically redefines every time the list is updated.
Is this possible?
class Example:
    list_variable = []
    sum_variable = sum(list_variable)

    def __init__(self, list_variable):
        self.list_variable = list_variable
        return
This will result in sum_variable = 0 unless you change it.
I understand that this is far from a major issue, you could either define sum_variable as a method or redefine it every time you change list_variable, I'm just wondering if it's possible to skip those things/steps.
Python offers the property decorator, which gives you exactly the attribute-access syntax of your example:
class Example:
    list_variable = []

    def __init__(self, list_variable):
        self.list_variable = list_variable
        return

    @property
    def sum_variable(self):
        return sum(self.list_variable)

e = Example(list_variable=[10, 20, 30])
e.sum_variable  # returns 60
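Because the sum is recomputed on every access, updates to the list are reflected automatically:

e.list_variable.append(40)
e.sum_variable  # now returns 100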
I have created an immutable data type in Python where millions of objects are created and released every second. After profiling the code thoroughly, it looks like the constructor is where most of the time is spent.
The solution I thought of was to use an object pool so that reference counting and memory allocation is done all at once. I have looked at solutions like this, where acquire and release methods need to be called explicitly.
However, the class I have implemented is similar to the Decimal class in Python, where objects are created and released automatically by numpy. For example, a short piece of code that uses my class will look like this (I use Decimal instead of my own class):
import numpy as np
from decimal import Decimal
x = np.array([[Decimal(1), Decimal(2)], [Decimal(3), Decimal(4)]])
y = np.array([[Decimal(5), Decimal(6)], [Decimal(7), Decimal(8)]])
z = (x * y) + (2 * x) - (y ** 2) + (x ** 3)
Because the class is immutable, numpy needs to create a new object for each operation and this slows down the whole code. Additionally, because numpy is the code that is creating these objects, I do not think that I can explicitly call methods such as acquire or release.
Is there a better implementation of an object pool or some other method where a lot of objects are created all at once and later, the released objects are automatically placed back in the pool? In other words, is there another solution that avoids frequent object creation and destruction?
P.S. I understand that this is not a good way of using numpy. This is one of the first steps in my design and hopefully, numpy will be used more efficiently in next steps.
Would something like this work?
class Pool():
    def __init__(self, type_, extra_alloc=1):
        self._objects = []
        self.type = type_
        self.extra_alloc = extra_alloc

    def allocate(self, size):
        self._objects.extend(object.__new__(self.type) for _ in range(size))

    def get_obj(self):
        print("Getting object")
        if not self._objects:
            self.allocate(self.extra_alloc)
        return self._objects.pop()

    def give_obj(self, obj):
        print("Object released")
        self._objects.append(obj)

class Thing():  # This can also be used as a base class
    pool = None

    def __new__(cls, *args):
        return cls.pool.get_obj()

    def __del__(self):
        self.pool.give_obj(self)

thing_pool = Pool(Thing)
Thing.pool = thing_pool
Thing()
# Getting object
# Object released
x = Thing()
# Getting object
del x
# Object released
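On CPython, where reference counting reclaims objects promptly, you can check that instances really are recycled rather than freshly allocated:

a = Thing()
first = id(a)
del a              # __del__ puts the object back into the pool
b = Thing()        # pops the same object back out
assert id(b) == first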
In the snippet below, how do I avoid recomputing the numpy variables mask, zbar, te, ro and rvol in the methods Get_mask, Get_Shell_Te, etc.? These variables are large arrays, and I have to define at least six more methods that reuse them. It looks like what I am doing is not a good idea and is slow.
import numpy as np

# this computes various quantities related to the shell in an object-oriented way
class Shell_Data:
    def __init__(self, data):
        self.data = data

    def Get_mask(self):
        zbar = self.data['z2a1']
        y = self.data['y']*1000
        self.mask = np.logical_and(zbar >= 16, zbar <= 19)
        return self.mask

    def Get_Shell_Te(self):
        self.mask = self.Get_mask()
        te = self.data['te'][self.mask]
        ro = self.data['ro'][self.mask]
        rvol = self.data['rvol'][self.mask]
        self.Shell_Te = np.sum(te*ro/rvol)/(np.sum(ro/rvol))
        print("Shell Temperature= %0.3f eV" % self.Shell_Te)
        return self.Shell_Te

    def Get_Shell_ro(self):
        mask = self.Get_mask()
        te = self.data['te'][mask]
        ro = self.data['ro'][mask]
        rvol = self.data['rvol'][mask]
        radk = self.data['radk'][mask]
        self.Shell_ro = np.sum(radk*ro/rvol)/np.sum(radk/rvol)
        return self.Shell_ro
zbar depends on self.data. If you update self.data, you likely have to re-compute it.
If you can make your data immutable, you can compute these values once, e.g. in the constructor.
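A sketch of that compute-once-in-the-constructor option, assuming self.data is never replaced afterwards:

import numpy as np

class Shell_Data:
    def __init__(self, data):
        self.data = data
        zbar = data['z2a1']
        # computed once up front, reused by every method via self.mask
        self.mask = np.logical_and(zbar >= 16, zbar <= 19)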
If you want to avoid calculating the mask data until it's actually required, you can cache the value, like so:
class Shell_Data(...):
    def __init__(self, ...):
        self.cached_mask = None
        ...

    # @property makes an access to self.mask
    # actually return the result of a call to self.mask()
    @property
    def mask(self):
        if self.cached_mask is None:  # Not yet calculated.
            self.cached_mask = self.computeMask()
        return self.cached_mask

    def computeMask(self):
        zbar = ...
        ...
        return np.logical_and(...)

    def someComputation(self):
        # The first access to self.mask will compute it.
        te = self.data['te'][self.mask]
        # The second access will just reuse the same result.
        ro = self.data['ro'][self.mask]
        ...
If you have to mutate self.data, you can cache the computed data, and re-calculate it only when self.data changes. E.g. if you had a setData() method for that, you could recalculate the mask in it, or just set self.cached_mask to None.
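A minimal sketch of that setData() idea (the method name is just for illustration):

class Shell_Data(...):
    ...
    def setData(self, data):
        self.data = data
        self.cached_mask = None  # invalidate; the next access to self.mask recomputes it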
(Also, read about instance variables again.
Every method receives the parameter named self, the instance of the object for which it is called. You can access all its instance variables as self.something, exactly the way you access instance variables (and methods) when the object is not called self. If you set an instance variable in one method, you can just access it in another (e.g. self.mask); there is no need to return it. And if you return something from a method, it's likely not worth storing as an instance variable, like self.mask.)
I have two classes - one which inherits from the other. I want to know how to cast to (or create a new variable of) the subclass. I have searched around a bit, and mostly 'downcasting' like this seems to be frowned upon, and there are some slightly dodgy workarounds like setting instance.__class__ - though this doesn't seem like a nice way to go.
e.g.
http://www.gossamer-threads.com/lists/python/python/871571
http://code.activestate.com/lists/python-list/311043/
Sub-question - is downcasting really that bad? If so, why?
I have a simplified code example below - basically I have some code that creates a Peak object after having done some analysis of x, y data. Outside this code I know that the data is PSD (power spectral density) data - so it has some extra attributes. How do I downcast from Peak to Psd_Peak?
"""
Two classes
"""
import numpy as np
class Peak(object) :
"""
Object for holding information about a peak
"""
def __init__(self,
index,
xlowerbound = None,
xupperbound = None,
xvalue= None,
yvalue= None
):
self.index = index # peak index is index of x and y value in psd_array
self.xlowerbound = xlowerbound
self.xupperbound = xupperbound
self.xvalue = xvalue
self.yvalue = yvalue
class Psd_Peak(Peak) :
"""
Object for holding information about a peak in psd spectrum
Holds a few other values over and above the Peak object.
"""
def __init__(self,
index,
xlowerbound = None,
xupperbound = None,
xvalue= None,
yvalue= None,
depth = None,
ampest = None
):
super(Psd_Peak, self).__init__(index,
xlowerbound,
xupperbound,
xvalue,
yvalue)
self.depth = depth
self.ampest = ampest
self.depthresidual = None
self.depthrsquared = None
def peakfind(xdata,ydata) :
'''
Does some stuff.... returns a peak.
'''
return Peak(1,
0,
1,
.5,
10)
# Find a peak in the data.
p = peakfind(np.random.rand(10),np.random.rand(10))
# Actually the data i used was PSD -
# so I want to add some more values tot he object
p_psd = ????????????
Edit:
Thanks for the contributions.... I'm afraid I was feeling rather downcast (geddit?) since the answers thus far seem to suggest I spend time hard-coding converters from one class type to another. I have come up with a more automatic way of doing this - basically looping through the attributes of the class and transferring them one to another. How does this smell to people - is it a reasonable thing to do, or does it spell trouble ahead?
def downcast_convert(ancestor, descendent):
    """
    automatic downcast conversion.....
    (NOTE - not type-safe -
    if ancestor isn't a super class of descendent, it may well break)
    """
    for name, value in vars(ancestor).items():
        # print("setting descendent", name, ":", value, "ancestor", name)
        setattr(descendent, name, value)
    return descendent
You don't actually "cast" objects in Python. Instead you generally convert them -- take the old object, create a new one, throw the old one away. For this to work, the class of the new object must be designed to take an instance of the old object in its __init__ method and do the appropriate thing (sometimes, if a class can accept more than one kind of object when creating it, it will have alternate constructors for that purpose).
You can indeed change the class of an instance by pointing its __class__ attribute to a different class, but that class may not work properly with the instance. Furthermore, this practice is IMHO a "smell" indicating that you should probably be taking a different approach.
In practice, you almost never need to worry about types in Python. (With obvious exceptions: for example, trying to add two objects. Even in such cases, the checks are as broad as possible; here, Python would check for a numeric type, or a type that can be converted to a number, rather than a specific type.) Thus it rarely matters what the actual class of an object is, as long as it has the attributes and methods that whatever code is using it needs.
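For example, here is a sketch of that conversion approach using the Peak/Psd_Peak classes from the question; the from_peak name is my own invention:

class Psd_Peak(Peak):
    ...  # __init__ as in the question

    @classmethod
    def from_peak(cls, peak, depth=None, ampest=None):
        # Build a new Psd_Peak from an existing Peak; the old Peak
        # can then be thrown away.
        return cls(peak.index,
                   peak.xlowerbound,
                   peak.xupperbound,
                   peak.xvalue,
                   peak.yvalue,
                   depth=depth,
                   ampest=ampest)

p_psd = Psd_Peak.from_peak(p, depth=111, ampest=222)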
See the following example. Also, be sure to obey the LSP (Liskov Substitution Principle).
class ToBeCastedObj:
    def __init__(self, *args, **kwargs):
        pass  # whatever you want to state

    # original methods
    # ...

class CastedObj(ToBeCastedObj):
    def __init__(self, *args, **kwargs):
        pass  # whatever you want to state

    @classmethod
    def cast(cls, to_be_casted_obj):
        casted_obj = cls()
        casted_obj.__dict__ = to_be_casted_obj.__dict__
        return casted_obj

    # new methods you want to add
    # ...
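Presumably used along these lines (value is a hypothetical attribute, purely for illustration):

obj = ToBeCastedObj()
obj.value = 42                # hypothetical attribute, just for illustration
casted = CastedObj.cast(obj)
print(casted.value)           # 42

Note that cast() assigns the original __dict__ rather than copying it, so the two objects share attribute storage; use dict(to_be_casted_obj.__dict__) if you want an independent copy.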
This isn't a downcasting problem (IMHO). peakfind() creates a Peak object - it can't be downcast because it's not a Psd_Peak object - and later you want to create a Psd_Peak object from it. In something like C++, you'd likely rely on the default copy constructor - but that's not going to work, even in C++, because your Psd_Peak class requires more parameters in its constructor. In any case, Python doesn't have a copy constructor, so you end up with the rather verbose (fred=fred, jane=jane) stuff.
A good solution may be to create an object factory and pass the type of Peak object you want to peakfind(), letting it create the right one for you.
def peak_factory(peak_type, index, *args, **kw):
    """Create Peak objects
    peak_type     Type of peak object wanted
                  (you could list types)
    index         index
                  (you could list params for the various types)
    """
    # optionally sanity check parameters here

    # create object of desired type and return
    return peak_type(index, *args, **kw)

def peakfind(peak_type, xdata, ydata, **kw):
    # do some stuff...
    return peak_factory(peak_type,
                        1,
                        0,
                        1,
                        .5,
                        10,
                        **kw)
# Find a peak in the data.
p = peakfind(Psd_Peak, np.random.rand(10), np.random.rand(10), depth=111, ampest=222)