Matrix Implementation in Python

I am trying to implement a matrix of complex numbers in Python, but I am stuck at a particular point in the program. I have two modules, Matrix.py and Complex.py, and one test program, test.py. The module implementation is hosted on GitHub at https://github.com/Soumya1234/Math_Repository/tree/dev_branch and my test.py is given below:
from Matrix import *
from Complex import *
C_init = Complex(2, 0)
print C_init
m1 = Matrix(2, 2, C_init)
m1.print_matrix()
C2= Complex(3, 3)
m1.addValue(1, 1, C2)  # This is where all values of the matrix are getting
                       # changed. But I want only the (1,1)th value to be changed to C2
m1.print_matrix()
As mentioned in the comment, addValue(self, i, j, value) in Matrix.py is supposed to change the value at the (i, j)-th position only. Then why is the entire matrix getting replaced? What am I doing wrong?

If you don't want to implicitly make copies of init_value you could also change Matrix.addValue to this:
def addValue(self, i, j, value):
    self.matrix_list[i][j] = value
This is a little more in line with how your Matrix currently works. It's important to remember that a Complex object isn't implicitly copied, so matrix_list actually contains many references to the same object in memory; if you modify that object in place, the change shows up in every entry.
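To see concretely why rebinding a slot behaves differently from mutating the shared object, here is a minimal standalone sketch (the Complex stub below is a simplified stand-in, not the class from the repository):

class Complex:
    # simplified stand-in for the Complex class in Complex.py
    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

c = Complex(2, 0)
row = [c, c]            # both slots refer to the same object
row[0].real = 99        # in-place modification...
print(row[1].real)      # ...shows up in the other slot too: prints 99
row[0] = Complex(3, 3)  # rebinding the slot (what addValue above does)
print(row[1].real)      # leaves the other slot untouched: still 99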
Another tip - try to use the __init__ of Complex meaningfully. You could change this kind of thing:
def __sub__(self, complex_object):
    difference = Complex(0, 0)
    difference.real = self.real - complex_object.real
    difference.imag = self.imag - complex_object.imag
    return difference
To this:
def __sub__(self, other):
    return Complex(self.real - other.real,
                   self.imag - other.imag)
This is more concise, avoids temporary initialisations and variables, and I find it more readable. It might also benefit you to add some kind of .copy() method to Complex that returns a new Complex object with the same values.
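Such a copy method could be as simple as this (a sketch; the attribute names .real and .imag follow the post above):

def copy(self):
    # return a new, independent Complex with the same values
    return Complex(self.real, self.imag)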
On your methods for string representation - I'd recommend displaying the real and imaginary values as floats, not integers, because they should be real numbers. Here I've rounded them to 2 decimal places:
def __repr__(self):
    return "%.2f+j%.2f" % (self.real, self.imag)
Note also that you shouldn't actually need __str__ if it is supposed to do the same thing as __repr__; Python falls back to __repr__ when __str__ is missing. Your show method seems to do roughly the same thing yet again.
Also, in Python there are no private variables, so instead of getReal it's entirely possible to just access the attribute as .real. If you really need getter/setter methods, look into @property.
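Purely for illustration (this is not how the repository's Complex is written), a property-based accessor could look like this:

class Complex:
    def __init__(self, real, imag):
        self._real = real
        self._imag = imag

    @property
    def real(self):
        # read access: c.real
        return self._real

    @real.setter
    def real(self, value):
        # write access: c.real = 5.0
        self._real = value

    # imag would be handled the same way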
As you're already doing some overloading, I would also recommend exposing addValue through __setitem__, which I think is the natural fit for index assignment under Python's data model. If you do this:
def __setitem__(self, inds, value):
    i, j = inds
    self.matrix_list[i][j] = value
You could change the addValue in test.py to this:
m1[1, 1] = C2
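If you go this route, a matching __getitem__ (not in the original post, but a natural companion) lets you read entries the same way:

def __getitem__(self, inds):
    i, j = inds
    return self.matrix_list[i][j]

Then print(m1[1, 1]) reads the entry back.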

The problem is that in your matrix initialization method you assign the same value C_init to every entry of your matrix. Since you are not storing a copy of the value in each entry but the item itself, you run into a big problem later: the item stored at (0,0) is the same object as in all other entries, so you change all entries together when you only want to change one.
You have to modify your initialization method like this:
def __init__(self, x, y, init_value):
    self.row = x
    self.column = y
    self.matrix_list = [[Complex(init_value.getReal(), init_value.getComplex())
                         for i in range(y)] for j in range(x)]
This way you add entries with the same value to your matrix, but each entry is a separate object rather than yet another reference to the same one.
Furthermore: for practice this is a good example, but if you want to use the Matrix class to actually compute something, you should rather use NumPy arrays.
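For example (purely illustrating the NumPy suggestion, independent of the classes above), a 2x2 complex-valued matrix filled with a constant could be built like this:

import numpy as np

m = np.full((2, 2), 2 + 0j)   # 2x2 matrix, every entry 2+0j
m[1, 1] = 3 + 3j              # only the (1, 1) entry changes
print(m)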

Related

Dummy numpy array with all equal values for fast __getitem__

In my code I have a large 2D array from which, inside a double for-loop, I extract one element at a time.
Now, there are situations where the values inside this matrix are all equal. I'm trying to create a "dummy" array that, when sliced, always returns the same value without actually performing the __getitem__ lookup, which would be a useless waste of time.
A possible inelegant solution could be to use a lambda function and replace the __getitem__ with a __call__. Something like:
if <values are all equal>:
    x = lambda i, j: x0
else:
    x = lambda i, j: x_values[i, j]
Then I'd need to replace x[i, j] with x(i, j) inside the code, which would look something like this:
for i in range(max_i):
    for j in range(max_j):
        ...
        x(i, j)
        ...
I find this, however, somewhat unintuitive to read and cumbersome. What I dislike most is replacing x[i, j] with x(i, j), since it is less obvious that x(i, j) is a kind of indexing and not a real function call.
Another solution could be to code a class like:
class constant_array:
    def __init__(self, val):
        self.val = val

    def __getitem__(self, _):
        return self.val
Another issue with both this and the previous method is that x would no longer be a numpy.array, so calls like np.mean(x) would fail.
Is there a better way to create an object that when sliced with x[i,j] always return a constant value independently from i and j without having to change all the following code?
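One possibility along these lines (not part of the original post, offered only as a sketch) is numpy.broadcast_to, which returns a read-only view that behaves like a full array of the repeated value while storing it only once:

import numpy as np

max_i, max_j = 100, 200                   # example dimensions
x0 = 3.14
x = np.broadcast_to(x0, (max_i, max_j))   # looks like a full (max_i, max_j) array
print(x[2, 5])                            # normal indexing works: 3.14
print(np.mean(x))                         # NumPy functions work too: 3.14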

Finding an abstraction for repetitive code: Bootstrap analysis

Intro
There is a pattern that I use all the time in my Python code which analyzes
numerical data. All implementations seem overly redundant or very cumbersome or
just do not play nicely with NumPy functions. I'd like to find a better way to
abstract this pattern.
The Problem / Current State
One method of statistical error propagation is the bootstrap. It works by running the same analysis many times with slightly different inputs and looking at the distribution of the final results.
To compute the actual value of ams_phys, I have the following equation:
ams_phys = (amk_phys**2 - 0.5 * ampi_phys**2) / aB - amcr
All the values that go into that equation have a statistical error associated with them. These values are in turn computed from other equations. For instance, amk_phys is computed from this equation, where both numbers also have uncertainties:
amk_phys_dist = mk_phys / a_inv
The value of mk_phys is given as (494.2 ± 0.3) in a paper. What I do now is a parametric bootstrap: I generate R samples from a Gaussian distribution with mean 494.2 and standard deviation 0.3. This is what I store in mk_phys_dist:
mk_phys_dist = bootstrap.make_dist(494.2, 0.3, R)
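The bootstrap module itself is not shown in the post; a make_dist along these lines would match the description (a hypothetical reconstruction, with the actual value stored at index 0 as described further below):

import numpy as np

def make_dist(val, err, R):
    # index 0 holds the actual value, indices 1..R hold Gaussian samples
    return np.concatenate(([val], np.random.normal(val, err, R)))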
The same is done for a_inv, which is also quoted with an error in the literature. The above equation is then converted into a list comprehension to yield a new distribution:
amk_phys_dist = [mk_phys / a_inv
                 for a_inv, mk_phys in zip(a_inv_dist, mk_phys_dist)]
The first equation is then also converted into a list comprehension:
ams_phys_dist = [
    (amk_phys**2 - 0.5 * ampi_phys**2) / aB - amcr
    for ampi_phys, amk_phys, aB, amcr
    in zip(ampi_phys_dist, amk_phys_dist, aB_dist, amcr_dist)]
To get the end result in terms of (Value ± Error), I then take the average and
standard deviation of this distribution of numbers:
ams_phys_val, ams_phys_avg, ams_phys_err \
    = bootstrap.average_and_std_arrays(ams_phys_dist)
The actual value is supposed to be computed with the actual value coming in, not the mean of this bootstrap distribution. Before, I had the code replicated for that; now I keep the original value at the 0th position of the _dist arrays. The arrays therefore contain 1 + R elements, and the bootstrap.average_and_std_arrays function separates out that element.
This kind of line occurs for every number that I might want to quote in my
writing. I got annoyed by the writing and created a snippet for it:
$1_val, $1_avg, $1_err = bootstrap.average_and_std_arrays($1_dist)
The need for the snippet strongly told me that I need to do some refactoring.
Also the list comprehensions are always of the following pattern:
foo_dist = [... bar ...
            for bar in bar_dist]
It feels bad to write bar three times there.
The Class Approach
I have tried to turn those _dist things into a Boot class, such that I would not write ampi_dist and ampi_val but could just use ampi.val without having to explicitly call this average_and_std_arrays function and type a bunch of names for it.
import numpy as np

class Boot(object):
    def __init__(self, dist):
        self.dist = dist

    def __str__(self):
        return str(self.dist)

    @property
    def cen(self):
        return self.dist[0]

    @property
    def val(self):
        x = np.array(self.dist)
        return np.mean(x[1:], axis=0)

    @property
    def err(self):
        x = np.array(self.dist)
        return np.std(x[1:], axis=0)
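Used roughly like this (a sketch, assuming bootstrap.make_dist puts the central value at index 0 as described above):

R = 1000
mk_phys = Boot(bootstrap.make_dist(494.2, 0.3, R))
print(mk_phys.cen)   # the actual central value, 494.2
print(mk_phys.val)   # mean of the R bootstrap samples
print(mk_phys.err)   # standard deviation of the R bootstrap samples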
However, this still does not solve the problem of the list comprehensions. I
fear that I still have to repeat myself there three times. I could make the
Boot object inherit from list, such that I could at least write it like
this (without the _dist):
bar = Boot([... foo ... for foo in foo])
Magic Approach
Ideally all those list comprehensions would be gone such that I could just
write
bar = ... foo ...
where the dots mean some non-trivial operation. That can be simple arithmetic as above, but it could also be a call to a function that does not support being called with multiple values (the way NumPy functions do). For instance, the scipy.optimize.curve_fit function needs to be called a bunch of times:
popt_dist = [op.curve_fit(linear, mpi, diff)[0]
             for mpi, diff in zip(mpi_dist, diff_dist)]
One would have to write a wrapper for that because it does not automatically loop over lists of arrays.
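Such a wrapper could look roughly like this (a sketch, not from the original post; the name boot_map is made up):

def boot_map(func, *dists):
    # apply func element-wise across several parallel _dist lists,
    # returning a new _dist list with the same 1 + R entries
    return [func(*sample) for sample in zip(*dists)]

# equivalent to the curve_fit comprehension above:
# popt_dist = boot_map(lambda mpi, diff: op.curve_fit(linear, mpi, diff)[0],
#                      mpi_dist, diff_dist)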
Question
Do you see a way to abstract this process of running every transformation with 1 + R sets of data? I would like to get rid of those patterns and the huge number of variables in each namespace (_dist, _val, _avg, ...), as this makes passing them to functions rather tedious. Still, I need to have a lot of freedom in the ... foo ... part, where I need to call arbitrary functions.

Many independent pseudorandom graphs each with same arbitrary y for any input x

By 'graph' I mean 'function' in the mathematical sense, where you always find one unchanging y value per x value.
Python's random.Random class's seed behaves as the x-coordinate of a random graph and each new call to random.random() gives a new random graph with all new x-y mappings.
Is there a way to directly refer to random.Random's nth graph, or in other words, the nth value in a certain seed's series without calling random.random() n times?
I am making a set of classes that I call Transformers, which take any (x, y) coordinates as input and output another pair of (x, y) coordinates. Each transformer has two methods: transform and untransform. One of the transformers that I want adds a random value to the input y coordinate depending on the input x coordinate. Say that I then want this transformer to untransform(x, y); now I need to subtract the same value I added from y if x is the same. This can be done by setting the seed to the same value it had when I added to y, so acting like the x value. Now say that I want two different instances of the transformer that adds random values to y. My question is about my options for making this new random transformer give different values than the first one.
Since Python 3.4 apparently removes jumpahead, here's some code that implements a convenient pseudorandom dictionary.
from hashlib import sha256 as _sha256
from hmac import HMAC as _HMAC
from math import ldexp as _ldexp
from os import urandom as _urandom
from sys import byteorder as _byteorder
class PRF():
    def __init__(self):
        digestmod = _sha256
        self._h = _HMAC(_urandom(digestmod().block_size), digestmod=digestmod)

    def __getitem__(self, key):
        h = self._h.copy()
        h.update(repr(key).encode())
        b = h.digest()
        return _ldexp(int.from_bytes(b, _byteorder), len(b) * (-8))
Example usage:
>>> import prf
>>> f = prf.PRF()
>>> f[0]
0.5414241336009658
>>> f[1]
0.5238549618249061
>>> f[1000]
0.7476468534384274
>>> f[2]
0.899810590895144
>>> f[1]
0.5238549618249061
Is there a way to directly refer to random.Random's nth graph, or in other words, the nth value in a certain seed's series without calling random.random() n times?
Yes, sort of; you use Random.jumpahead(). There aren't really separate functions/graphs, though -- there's only one sequence generated by the PRNG -- but you can get into it at any point.
You seem to be still working on the same problem as your last question, and the code I posted in a comment there should cover this:
from random import Random
class IndependentRepeatableRandom(object):
    def __init__(self):
        self.randgen = Random()
        self.origstate = self.randgen.getstate()

    def random(self, val):
        self.randgen.jumpahead(int(val))
        retval = self.randgen.random()
        self.randgen.setstate(self.origstate)
        return retval
Well, you're probably going to need to come up with some more detailed requirements, but yes, there are ways:
pre-populate a dictionary with however many terms of the series you require for a given seed, then at run time simply look the nth term up.
if you're not fussed about the seed values and/or do not require some n terms for any given seed, then find an O(1) way of generating different seeds and only use the first term of each series.
Otherwise, you may want to stop using the built-in Python functionality and devise your own (more predictable) algorithm.
EDIT regarding the new info:
OK, so I also looked at your profile, and you are doing something (musical?) rather than anything crypto-related. If that's the case, it's unfortunately a mixed blessing: while you don't require security, you still won't want (audible) patterns appearing, so you probably do still need a strong PRNG.
One of the transformers that I want adds a random value to the input y
coordinate depending on the input x coordinate
It's not yet clear to me if there is actually any real requirement for y to depend upon x...
Now say that I want two different instances of the transformer that
adds random values to y. My question is about my options for making
this new random transformer give different values than the first one.
...because here I'm getting the impression that all you really require is for two different instances to be different in some random way.
But, assuming you have some object containing a tuple (x, y), you really do want a transform function that randomly varies y for the same x, and you want an untransform function that quickly undoes any transform operations, then why not just keep a stack of the state changes made throughout the lifetime of any single instance of the object? In the untransform implementation you then simply pop the last transformation off the stack, as sketched below.
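A minimal sketch of that idea (purely illustrative; the class and method names are made up, and the random offset is just one possible state change):

import random

class RandomOffsetTransformer:
    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._history = []              # stack of offsets applied so far

    def transform(self, x, y):
        offset = self._rng.random()     # random value added to y
        self._history.append(offset)
        return x, y + offset

    def untransform(self, x, y):
        offset = self._history.pop()    # undo the most recent transform
        return x, y - offset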

list() function of Python modifies its argument?

(I am quite a newbie in Python, so lots of things puzzle me even after reading the tutorial...)
Initially, I had the code like the following:
strings = ['>abc', 'qwertyu', '>def', 'zxcvbnm']
matrix = zip(*strings)
for member in matrix:
    print("".join(member))  # characters are printed as expected
-- which did what I expected. But then for some reason I wanted to determine the number of members in matrix; as len(matrix) gave an error, I decided to copy it by converting it to a list: mtxlist = list(matrix). Surprisingly, after this line the content of matrix seems to have changed - or at least I cannot use it the same way as above:
strings = ['>abc', 'qwertyu', '>def', 'zxcvbnm']
matrix = zip(*strings)
mtxlist = list(matrix)  # this assignment empties (?) the matrix
for member in matrix:
    print("".join(member))  # nothing printed
Can anybody explain what is going on there?
You're using Python 3, correct?
zip returns an iterator that can only be consumed once. If you want to use it more than once, then your options are:
Write zip(*strings) each time you need it.
matrix = tuple(zip(*strings))
(iterate matrix as many times as you like. This is the easy option. The downside is that if zip(*strings) is big then it uses a lot of memory that the iterator doesn't.)
matrix1, matrix2 = itertools.tee(zip(*strings))
(iterate each of matrix1 and matrix2 once. This is worse than the tuple in your usage, but it's useful if you want to partially consume matrix1, then use some of matrix2, more of matrix1, etc)
def matrix():
    return zip(*strings)

# or
matrix = lambda: zip(*strings)
(iterate but using matrix(), not matrix, as many times as you like. Doesn't use extra memory for a copy of the result like the tuple solution, but the syntax for using it is a bit annoying)
class ReusableIterable:
    def __init__(self, func):
        self.func = func

    def __iter__(self):
        return iter(self.func())

matrix = ReusableIterable(lambda: zip(*strings))
(iterate using matrix as many times as you like. Deals with the syntax annoyance, although you still have to beware that if you modify strings between iterations over matrix then you'll get different results.)

Empty zeroth element in array/list to eliminate repeated decrementing. Does this improve performance?

I am using Python to solve Project Euler problems. Many require caching the results of past calculations to improve performance, leading to code like this:
pastResults = [None] * 1000000

def someCalculation(integerArgument):
    # return result of a calculation performed on numberArgument
    # for example, summing the factorial or square of its digits

for eachNumber in range(1, 1000001):
    if pastResults[eachNumber - 1] is None:
        pastResults[eachNumber - 1] = someCalculation(eachNumber)
    # perform additional actions with pastResults[eachNumber - 1]
Would the repeated decrementing have an adverse impact on program performance? Would having an empty or dummy zeroth element (so the zero-based array emulates a one-based array) improve performance by eliminating the repeated decrementing?
pastResults = [None] * 1000001

def someCalculation(integerArgument):
    # return result of a calculation performed on numberArgument
    # for example, summing the factorial or square of its digits

for eachNumber in range(1, 1000001):
    if pastResults[eachNumber] is None:
        pastResults[eachNumber] = someCalculation(eachNumber)
    # perform additional actions with pastResults[eachNumber]
I also feel that emulating a one-based array would make the code easier to follow. That is why I do not make the range zero-based with for eachNumber in range(1000000) as someCalculation(eachNumber + 1) would not be logical.
How significant is the additional memory from the empty zeroth element? What other factors should I consider? I would prefer answers that are not confined to Python and Project Euler.
EDIT: Should be is None instead of is not None.
Not really an answer to the question regarding performance, but rather a general tip about caching previously calculated values. The usual way to do this is to use a map (a Python dict), as this allows you to use more complex keys instead of just integer numbers: floating point numbers, strings, or even tuples. Also, you won't run into problems in case your keys are rather sparse.
pastResults = {}

def someCalculation(integerArgument):
    if integerArgument not in pastResults:
        pastResults[integerArgument] = # calculation performed on numberArg.
    return pastResults[integerArgument]
Also, there is no need to perform the calculations "in order" using a loop. Just call the function for the value you are interested in, and the if statement will ensure that, when invoked recursively, the calculation is performed only once for each argument.
Ultimately, if you are using this a lot (as is clearly the case for Project Euler), you can define yourself a function decorator, like this one:
def memo(f):
    f.cache = {}
    def _f(*args, **kwargs):
        if args not in f.cache:
            f.cache[args] = f(*args, **kwargs)
        return f.cache[args]
    return _f
What this does is: it takes a function and defines another function that first checks whether the given parameters can be found in the cache, and otherwise calculates the result of the original function and puts it into the cache. Just add the @memo decorator to your function definitions and it will take care of caching for you.
@memo
def someCalculation(integerArgument):
    # function body
This is syntactic sugar for someCalculation = memo(someCalculation). Note, however, that this will not always work out well. First, the parameters have to be hashable (no lists or other mutable types); second, if you are passing parameters that are not relevant for the result (e.g., debugging flags), your cache can grow unnecessarily large, since all the parameters are used as the key.
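As a small self-contained illustration of the decorator above (the fibonacci function is just an example, not from the original post):

@memo
def fibonacci(n):
    # naive recursion; the recursive calls go through the memoized wrapper,
    # so each value is computed only once
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(100))   # returns instantly instead of recursing exponentially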
