I am quite new to Python and I have been facing a problem for which I could not find a direct answer here on stackoverflow (but I guess I am just not experienced enough to google for the correct terms). I hope you can help 😊
Consider this:
import numpy as np
class Data:
def __init__(self, data):
self.data = data
def get_dimensions(self):
return np.shape(self.data)
test = Data(np.random.random((20, 15)))
print(test.get_dimensions())
This gives me
(20, 15)
just as I wanted.
Now here is what I want to do:
During my data processing I will need to get the shape of my datasets quite often, especially within the class itself. However, I do not want to call numpy every time I do
self.get_dimensions()
as I think this would always go though the process of analysing the array. Is there a way to calculate the shape variable just once and then share it within the class so I save computation time?
My Problem is more complicated, as I need to first open files, read them and the from this get the shape of the data, so I really want to avoid doing this every time I want to get the shape...
I hope you see my problem
thanks!! 😊
EDIT:
My question has already been answered, however I wanted to ask a follow up question if this would also be efficient:
import numpy as np
class Data:
def __init__(self, data):
self.data = data
self.dimensions = self._get_dimensions()
def _get_dimensions(self):
return np.shape(self.data)
test = Data(np.random.random((20, 15)))
print(test.dimensions)
I ask this, because with the method you guys described I have to calculate it at least once somewhere, before I can get the dimensions. Would this way also always go through the calculation process, or store it just once?
Thanks again! 😊
Sure, you could do it like this:
class Data:
def __init__(self, data):
self.data = data
self.dimensions = None
def get_dimensions(self):
self.dimensions = (np.shape(self.data) if
self.dimensions is None else
self.dimensions)
return self.dimensions
If you ever need to modify self.data and recalculate self.dimensions, you could be served better with a keyword argument to specify whether you'd like to recalculate the result. Ex:
def get_dimensions(self, calculate=False):
self.dimensions = (np.shape(self.data)
if calculate or self.dimensions is None
else self.dimensions)
return self.dimensions
You can cache the result as a member variable (if I understood the question correctly):
import numpy as np
class Data:
def __init__(self, data):
self.data = data
self.result = None
def get_dimensions(self):
if not self.result:
self.result = np.shape(self.data)
return self.result
test = Data(np.random.random((20, 15)))
print(test.get_dimensions())
The shape of an array is directly stored on an array and is not a computed value. The shape of the array has to stored as the backing memory is a flat array. Thus (4, 4), (2, 8) and (16,) would have the same backing array. Without storing the shape, the array cannot perform indexing operations. numpy.shape is only really useful for acquiring the shape of array-like objects (such as lists or tuples).
shape = self.data.shape
I missed the last bit where you were concerned about some other large expensive computation that you haven't shown. The best solution is to cache the computed value the first time and return the cached value on later method calls.
To cope with additional computation
from random import random
class Data:
#property
def dimensions(self):
# Do a try/except block as the exception will only every be thrown the first
# time. Secondary invocations will work quicker and not require any checking.
try:
return self._dimensions
except AttributeError:
pass
# some complex computation
self._dimensions = random()
return self._dimensions
d = Data()
assert d.dimensions == d.dimensions
if your dimension is not changing you can do it more simple.
import numpy as np
class Data:
def __init__(self, data):
self.data = data
self.dimension=np.shape(self.data)
def get_dimensions(self):
return self.dimension
test = Data(np.random.random((20, 15)))
print(test.get_dimensions())
Related
OpenAI's baselines use the following code to return a LazyFrames instead of a concatenated numpy array to save memory. The idea is to take the advantage of the fact that a numpy array can be saved at different lists at the same time as lists only save a reference not object itself. However, in the implementation of LazyFrames, it further saves the concatenated numpy array in self._out, in that case if every LazyFrames object has been invoked at least once, it will always save a concatenated numpy array within it, which does not seem to save any memory at all. Then what's the point of LazeFrames? Or do I misunderstand anything?
class FrameStack(gym.Wrapper):
def __init__(self, env, k):
"""Stack k last frames.
Returns lazy array, which is much more memory efficient.
See Also
--------
baselines.common.atari_wrappers.LazyFrames
"""
gym.Wrapper.__init__(self, env)
self.k = k
self.frames = deque([], maxlen=k)
shp = env.observation_space.shape
self.observation_space = spaces.Box(low=0, high=255, shape=(shp[:-1] + (shp[-1] * k,)), dtype=env.observation_space.dtype)
def reset(self):
ob = self.env.reset()
for _ in range(self.k):
self.frames.append(ob)
return self._get_ob()
def step(self, action):
ob, reward, done, info = self.env.step(action)
self.frames.append(ob)
return self._get_ob(), reward, done, info
def _get_ob(self):
assert len(self.frames) == self.k
return LazyFrames(list(self.frames))
class LazyFrames(object):
def __init__(self, frames):
"""This object ensures that common frames between the observations are only stored once.
It exists purely to optimize memory usage which can be huge for DQN's 1M frames replay
buffers.
This object should only be converted to numpy array before being passed to the model.
You'd not believe how complex the previous solution was."""
self._frames = frames
self._out = None
def _force(self):
if self._out is None:
self._out = np.concatenate(self._frames, axis=-1)
self._frames = None
return self._out
def __array__(self, dtype=None):
out = self._force()
if dtype is not None:
out = out.astype(dtype)
return out
def __len__(self):
return len(self._force())
def __getitem__(self, i):
return self._force()[i]
def count(self):
frames = self._force()
return frames.shape[frames.ndim - 1]
def frame(self, i):
return self._force()[..., I]
I actually came here trying to understand how this saved any memory at all! But you mention that the lists store references to the underlying data, while the numpy arrays store copies of that data, and I think you are correct about that.
To answer your question, you are right! When _force is called, it populates the self._out item with a numpy array, thereby expanding the memory. But until you call _force (which is called in any of the API functions of the LazyFrame), self._out is None. So the idea is to not call _force (and therefore, don't call any of the LazyFrames methods) until you need the underlying data, hence the warning in its doc string of "This object should only be converted to numpy array before being passed to the model".
Note that when self._out gets populated by the array, it also clears the self._frames, so that it doesn't store duplicate information (thereby harming its whole purpose of storing only as much as it needs).
Also, in the same file, you'll find the ScaledFloatFrame which carries this doc string:
Scales the observations by 255 after converting to float.
This will undo the memory optimization of LazyFrames,
so don't use it with huge replay buffers.
I was asking myself if it is possible to turn the output of a class into a np.array within the class itself.
I created the following class:
class stats:
def __init__( self, x ):
self.age = x[:,0]
self.education = x[:,1]
self.married = x[:,2]
self.nodegree = x[:,3]
self.RE75 = x[:,4]
self.RE78 = x[:,5]
def Vector( self ):
age = [np.mean(self.age), st.stdev(self.age)]
education = [np.mean(self.education), st.stdev(self.education)]
married = [np.mean(self.married), st.stdev(self.married)]
nodegree = [np.mean(self.nodegree), st.stdev(self.nodegree)]
RE75 = [np.mean(self.RE75), st.stdev(self.RE75)]
RE78 = [np.mean(self.RE78), st.stdev(self.RE78)]
return [age, education, married, nodegree, RE75, RE78]
results1 is a numpy.ndarray of shape 156x6.
I basically want to compute the mean as well as standard deviation for each column of results1 using a class. I use numpy to compute the mean and statistics for the std.
When I am printing the output I get the following:
results1_stats = stats(results1)
print(results1_stats.Vector())
Output:
[[25.98076923076923, 7.299554695959556], [10.314102564102564, 2.0597666237347005], [0.1858974358974359, 0.39027677820527085], [0.7243589743589743, 0.448275807219502], [1490.7220884615383, 3296.5535502409775], [6136.320646794872, 8143.4659725229685]]
Apparently, the class is working as wanted (although there is probablly a more efficent way to code this up).
The problem is, that I would lilke to get the output in a np.array of shape 6x2 (or transposed) directly from the class itself. However, since I just began using classes I don't know if that is even possible.
Any help is appreciated :)
Thank you!
You can construct an numpy array using np.array(your_list_sequence). Additionally, you can use list comprehension to convert list of lists to numpy array. More info here.
Try this:
def get_stats(results):
return np.array([np.array([np.mean(results[:, column]), st.stdev(results[:, column])]) for column in range(6)])
your_new_np_array = get_status(results)
Although, if you want only stats array, having a function for this instead of class would be better and simpler. But, you can easily include that method in your class and get back your expected result.
I recently moved from Matlab to Python and want to transfer some Matlab code to Python. However an obstacle popped up.
In Matlab you can define a class with its methods and create nd-arrays of instances. The nice thing is that you can apply the class methods to the array of instances as long as the method is written so it can deal with the arrays. Now in Python I found that this is not possible: when applying a class method to a list of instances it will not find the class method. Below an example of how I would write the code:
class testclass():
def __init__(self, data):
self.data = data
def times5(self):
return testclass(self.data * 5)
classlist = [testclass(1), testclass(10), testclass(100)]
times5(classlist)
This will give an error on the times5(classlist) line. Now this is a simple example explaining what I want to do (the final class will have multiple numpy arrays as variables).
What is the best way to get this kind of functionality in Python? The reason I want to do this is because it allows batch operations and they make the class a lot more powerful. The only solution I can think of is to define a second class that has a list of instances of the first class as variables. The batch processing would need to be implemented in the second class then.
thanks!
UPDATE:
In your comment , I notice this sentence,
For example a function that takes the data of the first class in the list and substracts the data of all following classe.
This can be solved by reduce function.
class testclass():
def __init__(self, data):
self.data = data
def times5(self):
return testclass(self.data * 5)
from functools import reduce
classlist = [x.data for x in [testclass(1), testclass(10), testclass(100)]]
result = reduce(lambda x,y:x-y,classlist[1:],classlist[0])
print(result)
ORIGIN ANSWER:
In fact, what you need is List Comprehensions.
Please let me show you the code
class testclass():
def __init__(self, data):
self.data = data
def times5(self):
return testclass(self.data * 5)
classlist = [testclass(1), testclass(10), testclass(100)]
results = [x.times5() for x in classlist]
print(results)
My code is something like this:
def something():
"""this function will call the `get_array` multiple times until the
end of the array and is fixed (cannot change this function)"""
#skip some functions and initialization there
#only used to get the result of `get_array`
a.get_array()
In a separate file:
class a:
def __init__(self, count=-1):
self.count = count
def cal_array(self, parameters):
"""calculate array"""
return array
def get_array(self, parameters):
if self.count == -1:
self.cal_array(parameters)
# some methods to store array there
else:
self.count += 1
return array[self.count,:]
I want to know if there is a way to calculate the numpy array once and save it for future use (calling from another function).
I know I can save the numpy array to the disk or use a global dict to save the array, but I wonder if there is a more efficient way to do this.
I apologize in advance if there is an obvious solution to this question or it is a duplicate.
I have a class as follows:
class Kernel(object):
""" creates kernels with the necessary input data """
def __init__(self, Amplitude, random = None):
self.Amplitude = Amplitude
self.random = random
if random != None:
self.dims = list(random.shape)
def Gaussian(self, X, Y, sigmaX, sigmaY, muX=0.0, muY=0.0):
""" return a 2 dimensional Gaussian kernel """
kernel = np.zeros([X, Y])
theta = [self.Amplitude, muX, muY, sigmaX, sigmaY]
for i in range(X):
for j in range(Y):
kernel[i][j] = integrate.dblquad(lambda x, y: G2(x + float(i) - (X-1.0)/2.0, \
y + float(j) - (Y-1.0)/2.0, theta), \
-0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]
return kernel
It just basically creates a bunch of convolution kernels (I've only included the first).
I want to add an instance (method?) to this class so that I can use something like
conv = Kernel(1.5)
conv.Gaussian(9, 9, 2, 2).kershow()
and have the array pop up using Matplotlib. I know how to write this instance and plot it with Matplotlib, but I don't know how to write this class so that for each method I would like to have this additional ability (i.e. .kershow()), I may call it in this manner.
I think I could use decorators ? But I've never used them before. How can I do this?
The name of the thing you're looking for is function or method chaining.
Strings are a really good example of this in Python. Because a string is immutable, each string method returns a new string. So you can call string methods on the return values, rather than storing the intermediate value. For example:
lower = ' THIS IS MY NAME: WAYNE '.lower()
without_left_padding = lower.lstrip()
without_right_padding = without_left_padding.rstrip()
title_cased = without_right_padding.title()
Instead you could write:
title_cased = ' THIS IS MY NAME: WAYNE '.lower().lstrip().rstrip().title()
Of course really you'd just do .strip().title(), but this is an example.
So if you want a .kernshow() option, then you'll need to include that method on whatever you return. In your case, numpy arrays don't have a .kernshow method, so you'll need to return something that does.
Your options are mostly:
A subclass of numpy arrays
A class that wraps the numpy array
I'm not sure what is involved with subclassing the numpy array, so I'll stick with the latter as an example. Either you can use the kernel class, or create a second class.
Alex provided an example of using your kernel class, but alternatively you could have another class like this:
class KernelPlotter(object):
def __init__(self, kernel):
self.kernel = kernel
def kernshow(self):
# do the plotting here
Then you would pretty much follow your existing code, but rather than return kernel you would do return KernelPlotter(kernel).
Which option you choose really depends on what makes sense for your particular problem domain.
There's another sister to function chaining called a fluent interface that's basically function chaining but with the goal of making the interface read like English. For example you might have something like:
Kernel(with_amplitude=1.5).create_gaussian(with_x=9, and_y=9, and_sigma_x=2, and_sigma_y=2).show_plot()
Though obviously there can be some problems when writing your code this way.
Here's how I would do it:
class Kernel(object):
def __init__ ...
def Gaussian(...):
self.kernel = ...
...
return self # not kernel
def kershow(self):
do_stuff_with(self.kernel)
Basically the Gaussian method doesn't return a numpy array, it just stores it in the Kernel object to be used elsewhere in the class. In particular kershow can now use it. The return self is optional but allows the kind of interface you wanted where you write
conv.Gaussian(9, 9, 2, 2).kershow()
instead of
conv.Gaussian(9, 9, 2, 2)
conv.kershow()