I was wondering whether it is possible to turn the output of a class into an np.array within the class itself.
I created the following class:
import numpy as np
import statistics as st

class stats:
    def __init__(self, x):
        self.age = x[:,0]
        self.education = x[:,1]
        self.married = x[:,2]
        self.nodegree = x[:,3]
        self.RE75 = x[:,4]
        self.RE78 = x[:,5]
    def Vector(self):
        age = [np.mean(self.age), st.stdev(self.age)]
        education = [np.mean(self.education), st.stdev(self.education)]
        married = [np.mean(self.married), st.stdev(self.married)]
        nodegree = [np.mean(self.nodegree), st.stdev(self.nodegree)]
        RE75 = [np.mean(self.RE75), st.stdev(self.RE75)]
        RE78 = [np.mean(self.RE78), st.stdev(self.RE78)]
        return [age, education, married, nodegree, RE75, RE78]
results1 is a numpy.ndarray of shape 156x6.
I basically want to compute the mean as well as the standard deviation for each column of results1 using a class. I use NumPy to compute the mean and the statistics module for the standard deviation.
When I print the output, I get the following:
results1_stats = stats(results1)
print(results1_stats.Vector())
Output:
[[25.98076923076923, 7.299554695959556], [10.314102564102564, 2.0597666237347005], [0.1858974358974359, 0.39027677820527085], [0.7243589743589743, 0.448275807219502], [1490.7220884615383, 3296.5535502409775], [6136.320646794872, 8143.4659725229685]]
Apparently, the class works as intended (although there is probably a more efficient way to code this up).
The problem is that I would like to get the output as an np.array of shape 6x2 (or transposed) directly from the class itself. However, since I just started using classes, I don't know if that is even possible.
Any help is appreciated :)
Thank you!
You can construct a NumPy array with np.array(your_list_sequence). You can also use a list comprehension to build the list of lists and convert it to a NumPy array in one step.
Try this:
import numpy as np
import statistics as st
def get_stats(results):
    # one [mean, std] row per column -> array of shape (6, 2)
    return np.array([[np.mean(results[:, column]), st.stdev(results[:, column])] for column in range(6)])
your_new_np_array = get_stats(results1)
That said, if you only want the stats array, a plain function is simpler than a class. But you can easily put the same logic in a method of your class and get back the expected result.
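A minimal sketch of the same idea kept inside the class, assuming results1 is a 2-D array with one variable per column. It replaces the per-column loop with NumPy's axis argument; ddof=1 makes np.std compute the sample standard deviation, which is what statistics.stdev returns. The class name Stats, the method name as_array and the random stand-in data are hypothetical.

```python
import numpy as np


class Stats:
    """Column-wise summary statistics for a 2-D array (one variable per column)."""

    def __init__(self, x):
        self.x = np.asarray(x, dtype=float)

    def as_array(self):
        # Mean and sample standard deviation (ddof=1, like statistics.stdev)
        # of every column, stacked into an array of shape (n_columns, 2).
        means = self.x.mean(axis=0)
        stds = self.x.std(axis=0, ddof=1)
        return np.column_stack((means, stds))


# Hypothetical stand-in for results1: 156 rows, 6 columns.
rng = np.random.default_rng(0)
results1 = rng.normal(size=(156, 6))
print(Stats(results1).as_array().shape)  # (6, 2)
```

Use .T on the result if you prefer the 2x6 layout.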
I am trying to map the individual rows of a dataframe into a custom object. The dataframe consists of multiple molecules that interact with a specific target. Additionally, multiple molecular descriptors are given. A slice is given below:
Now I need to map each instance into a Molecule object defined as something like this:
class Molecule:
    allDescriptorKeys = []
    def __init__(self, smiles, target, values):
        self.smiles = smiles
        self.target = target
        self.d = {}
        for i in range(len(Molecule.allDescriptorKeys)):
            self.d[Molecule.allDescriptorKeys[i]] = values[i]
Where the allDescriptorKeys class variable is set from outside the class using:
def initdescriptorkeys(df):
    Molecule.allDescriptorKeys = df.keys().values
Now I need a class function readMolDescriptors that reads in the descriptors of a single molecule (row/instance), so that I can later use it in an external method that loops over the whole dataframe. I guess I need something like this:
def readMolDescriptors(self, index):
    smiles = df.iloc[index]["SMILES"]
    target = df.iloc[index]["Target"]
    values = df.iloc[index][2:-1]
    newMolecule = Molecule(smiles, target, values)
    return newMolecule
But of course this is not a class function since the df is defined outside the class. I have a hard time wrapping my head around this, probably easy, problem. Hope someone can help.
It seems that you want to build a class from which you build a new instance for each row of the dataframe, and after that you want to get rid of the dataframe and play with those Molecule instances alone. Consider this:
class Molecule:
    def __init__(self, data_row):
        ''' data_row: pd.Series. '''
        self.smiles = data_row['SMILES']
        # more self.xxx = data_row['xxx']
        self.d = data_row.to_dict()
With this you can create an object of Molecule using a data row. For example,
molecules = [Molecule(data_row) for index, data_row in df.iterrows()]
To access a certain descriptor (e.g. nAT) value from the first molecule, you may do
print(molecules[0].d['nAT'])
although you could also define a more dedicated method in the class to handle access like that.
Of course, if you want something like readMolDescriptors, here is my version:
def build_molecule_from_dataframe(df, index):
    return Molecule(df.loc[index])
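If you specifically want the "class function" from the question, a classmethod is one way to get it: it receives the class as cls and takes the dataframe as an explicit argument, so nothing depends on a global df. A sketch assuming the columns are named SMILES and Target as in the question; the method name from_dataframe and the sample data are hypothetical.

```python
import pandas as pd


class Molecule:
    def __init__(self, data_row):
        """data_row: a pandas Series holding one row of the dataframe."""
        self.smiles = data_row["SMILES"]
        self.target = data_row["Target"]
        self.d = data_row.to_dict()

    @classmethod
    def from_dataframe(cls, df, index):
        # Alternative constructor: the dataframe is passed in explicitly,
        # so the class does not need to know about a global df.
        return cls(df.loc[index])


# Hypothetical sample data; real column names will differ.
df = pd.DataFrame({
    "SMILES": ["CCO", "c1ccccc1"],
    "Target": ["T1", "T2"],
    "nAT": [9, 12],
})
molecules = [Molecule.from_dataframe(df, i) for i in df.index]
print(molecules[1].d["nAT"])  # 12
```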
I recently moved from Matlab to Python and want to transfer some Matlab code to Python. However an obstacle popped up.
In Matlab you can define a class with its methods and create nd-arrays of instances. The nice thing is that you can apply the class methods to the array of instances as long as the method is written so it can deal with arrays. In Python, I found that this is not possible: when applying a class method to a list of instances, it will not find the class method. Below is an example of how I would write the code:
class testclass():
    def __init__(self, data):
        self.data = data
    def times5(self):
        return testclass(self.data * 5)
classlist = [testclass(1), testclass(10), testclass(100)]
times5(classlist)
This will give an error on the times5(classlist) line. Now this is a simple example explaining what I want to do (the final class will have multiple numpy arrays as variables).
What is the best way to get this kind of functionality in Python? The reason I want to do this is because it allows batch operations and they make the class a lot more powerful. The only solution I can think of is to define a second class that has a list of instances of the first class as variables. The batch processing would need to be implemented in the second class then.
thanks!
UPDATE:
In your comment, I noticed this sentence:
For example a function that takes the data of the first class in the list and subtracts the data of all following classes.
This can be solved with the reduce function.
from functools import reduce
class testclass():
    def __init__(self, data):
        self.data = data
    def times5(self):
        return testclass(self.data * 5)
classlist = [x.data for x in [testclass(1), testclass(10), testclass(100)]]
result = reduce(lambda x, y: x - y, classlist[1:], classlist[0])
print(result)
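If you want reduce to work on the instances themselves rather than on the extracted .data values, one option (not part of the answer above) is to overload the - operator with __sub__. A sketch, assuming data supports subtraction:

```python
from functools import reduce


class testclass():
    def __init__(self, data):
        self.data = data

    def times5(self):
        return testclass(self.data * 5)

    def __sub__(self, other):
        # Subtracting two instances yields a new instance wrapping the difference.
        return testclass(self.data - other.data)


classlist = [testclass(1), testclass(10), testclass(100)]
# (1 - 10) - 100, evaluated left to right by reduce
result = reduce(lambda x, y: x - y, classlist)
print(result.data)  # -109
```

Because the result is still a testclass, you can keep calling its methods, for example result.times5().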
ORIGINAL ANSWER:
In fact, what you need is a list comprehension.
Here is the code:
class testclass():
    def __init__(self, data):
        self.data = data
    def times5(self):
        return testclass(self.data * 5)
classlist = [testclass(1), testclass(10), testclass(100)]
results = [x.times5() for x in classlist]
print([r.data for r in results])  # [5, 50, 500]
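The second-class approach mentioned in the question can be sketched as follows. The name TestBatch is hypothetical; it simply holds a list of testclass instances and forwards each batch operation to them, returning a new batch so that calls can be chained.

```python
import numpy as np


class testclass():
    def __init__(self, data):
        self.data = data

    def times5(self):
        return testclass(self.data * 5)


class TestBatch:
    """Hypothetical container that applies testclass methods to many instances."""

    def __init__(self, items):
        self.items = list(items)

    def times5(self):
        # Apply the per-instance method to every element, returning a new batch.
        return TestBatch(item.times5() for item in self.items)

    def data(self):
        # Collect the underlying data into a single NumPy array.
        return np.array([item.data for item in self.items])


batch = TestBatch([testclass(1), testclass(10), testclass(100)])
print(batch.times5().data().tolist())  # [5, 50, 500]
```

If every instance holds plain numbers, storing them in one NumPy array from the start and operating on that array is usually simpler and faster than a list of objects.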
I apologize in advance if there is an obvious solution to this question or it is a duplicate.
I have a class as follows:
import numpy as np
from scipy import integrate

class Kernel(object):
    """ creates kernels with the necessary input data """
    def __init__(self, Amplitude, random=None):
        self.Amplitude = Amplitude
        self.random = random
        if random is not None:  # `!= None` on an ndarray compares element-wise
            self.dims = list(random.shape)
    def Gaussian(self, X, Y, sigmaX, sigmaY, muX=0.0, muY=0.0):
        """ return a 2 dimensional Gaussian kernel """
        kernel = np.zeros([X, Y])
        theta = [self.Amplitude, muX, muY, sigmaX, sigmaY]
        for i in range(X):
            for j in range(Y):
                # G2 is the 2-D Gaussian function, defined elsewhere
                kernel[i][j] = integrate.dblquad(lambda x, y: G2(x + float(i) - (X-1.0)/2.0,
                                                                 y + float(j) - (Y-1.0)/2.0, theta),
                                                 -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]
        return kernel
It just basically creates a bunch of convolution kernels (I've only included the first).
I want to add an instance (method?) to this class so that I can use something like
conv = Kernel(1.5)
conv.Gaussian(9, 9, 2, 2).kershow()
and have the array pop up using Matplotlib. I know how to write this instance and plot it with Matplotlib, but I don't know how to write this class so that for each method I would like to have this additional ability (i.e. .kershow()), I may call it in this manner.
I think I could use decorators? But I've never used them before. How can I do this?
The name of the thing you're looking for is function or method chaining.
Strings are a really good example of this in Python. Because a string is immutable, each string method returns a new string. So you can call string methods on the return values, rather than storing the intermediate value. For example:
lower = ' THIS IS MY NAME: WAYNE '.lower()
without_left_padding = lower.lstrip()
without_right_padding = without_left_padding.rstrip()
title_cased = without_right_padding.title()
Instead you could write:
title_cased = ' THIS IS MY NAME: WAYNE '.lower().lstrip().rstrip().title()
Of course really you'd just do .strip().title(), but this is an example.
So if you want a .kershow() option, you'll need to include that method on whatever you return. In your case, NumPy arrays don't have a .kershow method, so you'll need to return something that does.
Your options are mostly:
A subclass of numpy arrays
A class that wraps the numpy array
I'm not sure what is involved with subclassing the numpy array, so I'll stick with the latter as an example. Either you can use the kernel class, or create a second class.
Alex provided an example of using your kernel class, but alternatively you could have another class like this:
import matplotlib.pyplot as plt
class KernelPlotter(object):
    def __init__(self, kernel):
        self.kernel = kernel
    def kershow(self):
        # plot the stored 2-D array as an image
        plt.imshow(self.kernel)
        plt.colorbar()
        plt.show()
Then you would pretty much follow your existing code, but rather than return kernel you would do return KernelPlotter(kernel).
Which option you choose really depends on what makes sense for your particular problem domain.
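For the first option (subclassing the NumPy array), adding methods only needs a bare subclass plus ndarray.view, which reinterprets an existing array as the subclass without copying. A minimal sketch; the closed-form Gaussian and the names KernelArray and gaussian_kernel are assumptions for illustration, not the dblquad-based kernel from the question.

```python
import numpy as np


class KernelArray(np.ndarray):
    """A plain ndarray with an extra kershow() method (hypothetical name)."""

    def kershow(self):
        # Import lazily so the class can be used without matplotlib installed.
        import matplotlib.pyplot as plt
        plt.imshow(np.asarray(self))
        plt.colorbar()
        plt.show()


def gaussian_kernel(size, sigma):
    # Simple closed-form Gaussian, used here instead of the dblquad integration.
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    # view() reinterprets the same data as a KernelArray without copying.
    return kernel.view(KernelArray)


k = gaussian_kernel(9, 2)
print(type(k).__name__, k.shape)  # KernelArray (9, 9)
```

Calling gaussian_kernel(9, 2).kershow() then plots the kernel while all normal array operations keep working.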
There's another sister to function chaining called a fluent interface that's basically function chaining but with the goal of making the interface read like English. For example you might have something like:
Kernel(with_amplitude=1.5).create_gaussian(with_x=9, and_y=9, and_sigma_x=2, and_sigma_y=2).show_plot()
Though obviously there can be some problems when writing your code this way.
Here's how I would do it:
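class Kernel(object):
    def __init__(self, Amplitude, random=None):
        ...  # as before
    def Gaussian(self, X, Y, sigmaX, sigmaY, muX=0.0, muY=0.0):
        self.kernel = ...  # build the array as before, but store it on the instance
        ...
        return self  # not kernel
    def kershow(self):
        do_stuff_with(self.kernel)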
Basically the Gaussian method doesn't return a numpy array, it just stores it in the Kernel object to be used elsewhere in the class. In particular kershow can now use it. The return self is optional but allows the kind of interface you wanted where you write
conv.Gaussian(9, 9, 2, 2).kershow()
instead of
conv.Gaussian(9, 9, 2, 2)
conv.kershow()
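A complete, runnable version of this return-self pattern is sketched below. To keep it self-contained it swaps the dblquad integration for a closed-form Gaussian evaluated at pixel centres, so the kernel values are an assumption and will differ slightly from the integrated ones; only the chaining mechanics are the point here.

```python
import numpy as np


class Kernel(object):
    def __init__(self, Amplitude):
        self.Amplitude = Amplitude
        self.kernel = None  # filled in by a kernel-building method

    def Gaussian(self, X, Y, sigmaX, sigmaY, muX=0.0, muY=0.0):
        # Closed-form Gaussian at pixel centres (stand-in for dblquad).
        x = np.arange(X) - (X - 1) / 2.0 - muX
        y = np.arange(Y) - (Y - 1) / 2.0 - muY
        xx, yy = np.meshgrid(x, y, indexing="ij")  # shape (X, Y)
        self.kernel = self.Amplitude * np.exp(
            -(xx**2) / (2.0 * sigmaX**2) - yy**2 / (2.0 * sigmaY**2))
        return self  # returning self is what enables chaining

    def kershow(self):
        import matplotlib.pyplot as plt
        plt.imshow(self.kernel)
        plt.colorbar()
        plt.show()
        return self


conv = Kernel(1.5)
k = conv.Gaussian(9, 9, 2, 2).kernel
print(k.shape)  # (9, 9)
```

With matplotlib installed, conv.Gaussian(9, 9, 2, 2).kershow() works exactly as requested in the question.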
I am quite new to Python and I have been facing a problem for which I could not find a direct answer here on stackoverflow (but I guess I am just not experienced enough to google for the correct terms). I hope you can help 😊
Consider this:
import numpy as np
class Data:
    def __init__(self, data):
        self.data = data
    def get_dimensions(self):
        return np.shape(self.data)
test = Data(np.random.random((20, 15)))
print(test.get_dimensions())
This gives me
(20, 15)
just as I wanted.
Now here is what I want to do:
During my data processing I will need to get the shape of my datasets quite often, especially within the class itself. However, I do not want to call numpy every time I do
self.get_dimensions()
as I think this would always go through the process of analysing the array. Is there a way to calculate the shape variable just once and then share it within the class so I save computation time?
My problem is more complicated, as I first need to open files, read them and then get the shape of the data from them, so I really want to avoid doing this every time I want to get the shape...
I hope you see my problem
thanks!! 😊
EDIT:
My question has already been answered, however I wanted to ask a follow up question if this would also be efficient:
import numpy as np
class Data:
    def __init__(self, data):
        self.data = data
        self.dimensions = self._get_dimensions()
    def _get_dimensions(self):
        return np.shape(self.data)
test = Data(np.random.random((20, 15)))
print(test.dimensions)
I ask this, because with the method you guys described I have to calculate it at least once somewhere, before I can get the dimensions. Would this way also always go through the calculation process, or store it just once?
Thanks again! 😊
Sure, you could do it like this:
class Data:
    def __init__(self, data):
        self.data = data
        self.dimensions = None
    def get_dimensions(self):
        self.dimensions = (np.shape(self.data) if
                           self.dimensions is None else
                           self.dimensions)
        return self.dimensions
If you ever need to modify self.data and recalculate self.dimensions, you could be served better with a keyword argument to specify whether you'd like to recalculate the result. Ex:
def get_dimensions(self, calculate=False):
    self.dimensions = (np.shape(self.data)
                       if calculate or self.dimensions is None
                       else self.dimensions)
    return self.dimensions
You can cache the result as a member variable (if I understood the question correctly):
import numpy as np
class Data:
    def __init__(self, data):
        self.data = data
        self.result = None
    def get_dimensions(self):
        if self.result is None:  # `not self.result` would also be true for the empty shape ()
            self.result = np.shape(self.data)
        return self.result
test = Data(np.random.random((20, 15)))
print(test.get_dimensions())
The shape of an array is stored directly on the array; it is not a computed value. The shape has to be stored because the backing memory is a flat array: arrays of shape (4, 4), (2, 8) and (16,) can all share the same backing buffer. Without the stored shape, the array could not perform indexing operations. numpy.shape is only really useful for getting the shape of array-like objects (such as lists or tuples).
shape = self.data.shape
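To illustrate that the shape is just metadata stored on the array: the three arrays below have different shapes but share one flat buffer, and reading .shape is a plain attribute lookup with nothing to recompute. A small sketch using only standard NumPy calls.

```python
import numpy as np

a = np.arange(16)
b = a.reshape(4, 4)   # new view: different shape, same data buffer
c = a.reshape(2, 8)

print(a.shape, b.shape, c.shape)  # (16,) (4, 4) (2, 8)
print(np.shares_memory(a, b), np.shares_memory(a, c))  # True True

b[0, 0] = 99          # writing through one view is visible in the others
print(a[0], c[0, 0])  # 99 99
```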
I missed the last bit where you were concerned about some other large expensive computation that you haven't shown. The best solution is to cache the computed value the first time and return the cached value on later method calls.
To cope with additional computation
from random import random
class Data:
    @property
    def dimensions(self):
        # Use a try/except block: the exception is only ever raised the first
        # time. Later calls return the cached value without any extra checks.
        try:
            return self._dimensions
        except AttributeError:
            pass
        # some complex computation
        self._dimensions = random()
        return self._dimensions
d = Data()
assert d.dimensions == d.dimensions
If your dimensions do not change, you can do it more simply:
import numpy as np
class Data:
    def __init__(self, data):
        self.data = data
        self.dimension = np.shape(self.data)
    def get_dimensions(self):
        return self.dimension
test = Data(np.random.random((20, 15)))
print(test.get_dimensions())
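On Python 3.8 and later, the standard library's functools.cached_property packages the same compute-once-then-store pattern as the try/except property above. A sketch; the computations counter is only there to show that the body runs once, and np.shape stands in for the expensive file reading mentioned in the question.

```python
from functools import cached_property

import numpy as np


class Data:
    def __init__(self, data):
        self.data = data
        self.computations = 0  # counts how often the expensive step runs

    @cached_property
    def dimensions(self):
        # Runs on first access only; the result is then stored in the
        # instance __dict__ and returned directly on later accesses.
        self.computations += 1
        return np.shape(self.data)


test = Data(np.random.random((20, 15)))
print(test.dimensions)  # (20, 15)
print(test.dimensions)  # (20, 15), served from the cache
```

If self.data changes, del test.dimensions clears the cached value so the next access recomputes it.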
I have a weird problem with an assignment I got. We are supposed to implement a matrix class. Well, it's not that hard, but Python just won't do as I tell it to. But I'm sure there is an explanation.
The problem is that, in the following code, I try to save values (provided in a list) into a matrix.
class simplematrix:
    matrix = [[]]
    def __init__(self, *args):
        lm = args[0]
        ln = args[1]
        values = args[2]
        self.matrix = [[0]*ln]*lm
        for m in range(lm):
            for n in range(ln):
                self.matrix[m][n] = values[(m*ln)+n]
vals = [0,1,2,3,4,5]
a = simplematrix(2,3,vals)
When I try to print the matrix, I expect to get [[0,1,2],[3,4,5]], which I get if I run it by hand, on a piece of paper. If I print the matrix from Python I get [[3,4,5],[3,4,5]] instead.
Can anyone tell me why Python acts like this, or if I made some stupid mistake somewhere? :)
The problem is in [[0]*ln]*lm. The result consists of lm references to the same list, so when you modify one row, all rows appear to change.
Try:
self.matrix = [[0]*ln for i in range(lm)]  # xrange(lm) on Python 2
The answers by Tim and aix correct your mistake, but that step isn't even necessary; you can do the whole thing in one line using a list comprehension:
self.matrix = [[values[m*ln+n] for n in range(ln)] for m in range(lm)]
You can also say:
vals = range(6)
as opposed to what you already have. This tidies up your code and makes it more Pythonic.
The problem is that self.matrix = [[0]*ln]*lm doesn't give you a list of lm separate sublists, but a list of lm references to the same single list [0]*ln.
Try
self.matrix = [[0]*ln for i in range(lm)]
(If you're on Python 2, use xrange(lm) instead).
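The aliasing described in these answers can be checked directly with the is operator. The sketch below contrasts the two constructions and then shows the one-line list-comprehension version of the class; note that it swaps *args for named parameters lm, ln and values, which is a change from the question's signature.

```python
rows_shared = [[0] * 3] * 2                  # two references to ONE inner list
rows_separate = [[0] * 3 for _ in range(2)]  # two distinct inner lists

print(rows_shared[0] is rows_shared[1])      # True
print(rows_separate[0] is rows_separate[1])  # False

rows_shared[0][0] = 7
rows_separate[0][0] = 7
print(rows_shared)    # [[7, 0, 0], [7, 0, 0]]
print(rows_separate)  # [[7, 0, 0], [0, 0, 0]]


class simplematrix:
    def __init__(self, lm, ln, values):
        # Each row is built by its own inner comprehension, so no row is shared.
        self.matrix = [[values[m * ln + n] for n in range(ln)] for m in range(lm)]


print(simplematrix(2, 3, [0, 1, 2, 3, 4, 5]).matrix)  # [[0, 1, 2], [3, 4, 5]]
```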