In my code I have a large 2D array from which, inside a double for-loop, I extract one element at a time.
Now, there are situations where all the values inside this matrix are equal. I'm trying to create a "dummy" array that, when sliced, always returns the same value without actually performing the __getitem__ operation, which would be a useless waste of time.
A possible (but inelegant) solution would be to use a lambda function, replacing __getitem__ with __call__. Something like:
if <values are all equal>:
    x = lambda i, j: x0
else:
    x = lambda i, j: x_values[i, j]
Then I'd need to replace x[i,j] with x(i,j) inside the code, which would look something like:
for i in range(max_i):
    for j in range(max_j):
        ...
        x(i, j)
        ...
I find this, however, unintuitive to read and somewhat cumbersome. What I dislike most is replacing x[i,j] with x(i,j), as it is less obvious that x(i,j) is a sort of slicing and not a real function call.
Another solution could be to code a class like:
class constant_array:
    def __init__(self, val):
        self.val = val

    def __getitem__(self, _):
        return self.val
Another issue with both this and the previous method is that x would no longer be a numpy.ndarray, so calls like np.mean(x) would fail.
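(As an aside, numpy itself can produce this kind of object: np.broadcast_to builds a read-only view in which every element refers to one underlying scalar, and the result is still a real ndarray, so calls like np.mean keep working. A minimal sketch, with an arbitrary constant and shape:)

```python
import numpy as np

x0 = 3.5
# np.broadcast_to creates a read-only (100, 200) view in which every
# element is the same underlying scalar; no copies are made.
x = np.broadcast_to(x0, (100, 200))

print(x[5, 7])     # 3.5
print(np.mean(x))  # 3.5, works because x is a real ndarray
```

Indexing such a view still goes through __getitem__, but it is cheap, and no (100, 200) buffer is ever allocated.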
Is there a better way to create an object that, when sliced with x[i,j], always returns a constant value independently of i and j, without having to change all the code that follows?
I want to implement a function that can compute basic math operations on large arrays (that won't fit in RAM as a whole). Therefore I wanted to create a function that will process a given operation block by block over a selected axis. The main idea of this function is like this:
def process_operation(inputs, output, operation):
    shape = inputs[0].shape
    for index in range(shape[axis]):
        output[index, :] = inputs[0][index, :] + inputs[1][index, :]
but I want to be able to change the axis along which the blocks are sliced/indexed.
Is it possible to do the indexing in some dynamic way, without using the ':' syntactic sugar?
I found some help here, but so far it hasn't been much help:
Thanks
I think you could achieve what you want using Python's built-in slice type.
Under the hood, :-expressions used inside square brackets are transformed into instances of slice, but you can also use a slice to begin with. To iterate over different axes of your input you can use a tuple of slices of the correct length.
This might look something like:
def process_operation(inputs, output, axis=0):
    shape = inputs[0].shape
    for index in range(shape[axis]):
        my_slice = (slice(None),) * axis + (index,)
        output[my_slice] = inputs[0][my_slice] + inputs[1][my_slice]
I believe this should work with h5py datasets or memory-mapped arrays without any modifications.
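A quick sanity check with small in-memory numpy arrays (repeating the function so the snippet is self-contained; the h5py and memory-mapped cases should behave the same, since only indexing is used):

```python
import numpy as np

def process_operation(inputs, output, axis=0):
    shape = inputs[0].shape
    for index in range(shape[axis]):
        # (slice(None),) * axis is equivalent to writing ':' axis times
        my_slice = (slice(None),) * axis + (index,)
        output[my_slice] = inputs[0][my_slice] + inputs[1][my_slice]

a = np.arange(12.0).reshape(3, 4)
b = np.ones((3, 4))
out = np.empty((3, 4))

process_operation([a, b], out, axis=1)  # iterate block-by-block over columns
assert np.array_equal(out, a + b)
```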
Background on slice and __getitem__
slice works in conjunction with __getitem__ to evaluate the x[key] syntax. x[key] is evaluated in two steps:
1. If key contains any expressions such as :, i:j or i:j:k, these are de-sugared into slice instances.
2. key is passed to the __getitem__ method of the object x. This method is responsible for returning the correct value of x[key].
For example, the expressions:
x[2]
y[:, ::2]
are equivalent to:
x.__getitem__(2)
y.__getitem__((slice(None), slice(None, None, 2)))
You can explore how values are converted to slices using a class like the following:
class Sliceable:
    def __getitem__(self, key):
        print(key)

x = Sliceable()
x[::2]  # prints "slice(None, None, 2)"
I am trying to implement a Matrix of Complex numbers in Python. But I am stuck at a particular point in the program. I have two modules Matrix.py, Complex.py and one test program test.py. The module implementation is hosted in Github at https://github.com/Soumya1234/Math_Repository/tree/dev_branch and my test.py is given below
from Matrix import *
from Complex import *
C_init = Complex(2, 0)
print C_init
m1 = Matrix(2, 2, C_init)
m1.print_matrix()
C2= Complex(3, 3)
m1.addValue(1, 1, C2)  # This is where all values of the matrix are getting
                       # changed. But I want only the (1,1)th value to be changed to C2
m1.print_matrix()
As mentioned in the comment, addValue(self, i, j, value) in Matrix.py is supposed to change the value at the (i,j)-th position only. Then why is the entire matrix getting replaced? What am I doing wrong?
If you don't want to implicitly make copies of init_value you could also change Matrix.addValue to this:
def addValue(self, i, j, value):
    self.matrix_list[i][j] = value
This is a little more in line with how your Matrix currently works. It's important to remember that a Complex object can't implicitly make a copy of itself, so matrix_list actually holds many identical entries (pointers to one object in memory); if you modify that object in place, the change shows up everywhere.
Another tip - try to use the __init__ of Complex meaningfully. You could change this kind of thing:
def __sub__(self, complex_object):
    difference = Complex(0, 0)
    difference.real = self.real - complex_object.real
    difference.imag = self.imag - complex_object.imag
    return difference
To this:
def __sub__(self, other):
    return Complex(self.real - other.real,
                   self.imag - other.imag)
which is more concise, doesn't use temporary initialisations or variables, and which I find more readable. It might also benefit you to add some kind of .copy() method to Complex, which returns a new Complex object with the same values.
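A minimal sketch of such a copy method (the real and imag attributes are taken from your class; the stripped-down Complex here is only for illustration):

```python
class Complex:
    def __init__(self, real, imag):
        self.real = real
        self.imag = imag

    def copy(self):
        # Return a new, independent Complex with the same values.
        return Complex(self.real, self.imag)

c1 = Complex(2, 3)
c2 = c1.copy()
c2.real = 99
assert c1.real == 2  # c1 is unaffected by changes to the copy
```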
On your methods for string representation - I'd recommend displaying the real and imaginary values as floats, not integers, because they should be real numbers. Here I've rounded them to 2 decimal places:
def __repr__(self):
    return "%.2f+j%.2f" % (self.real, self.imag)
Note also that you shouldn't actually need __str__ if it would do the same thing as __repr__. show seems to be doing roughly the same thing, again.
Also, in Python, there are no truly private variables, so instead of getReal it's entirely possible to just access .real directly. If you really need getter/setter methods, look into @property.
As you're already doing some overloading, I would also recommend implementing addValue as __setitem__, which is how index assignment fits into Python's data model. If you do this:
def __setitem__(self, inds, value):
    i, j = inds
    self.matrix_list[i][j] = value
you could change the addValue call in test.py to this:
m1[1, 1] = C2
The problem is that, in your matrix initialization method, you add the same object C_init to all entries of your matrix. Since you are not just setting a value in each entry but storing the item itself, you get a big problem afterwards: the item stored at (0,0) is the same object as in all other entries, so when you want to change just one entry, you change all of them together.
You have to modify your initialization method like this:
def __init__(self, x, y, init_value):
    self.row = x
    self.column = y
    self.matrix_list = [[Complex(init_value.getReal(), init_value.getComplex())
                         for i in range(y)] for j in range(x)]
This way you add entries with the same value to your matrix, but each entry is no longer a reference to one and the same object.
Furthermore: just for practice this is a good example, but if you want to use the Matrix class to compute something, you should rather use numpy arrays.
I have a number of methods that are independent of each other but are needed collectively to compute an output. Thus, when a variable in any of the methods changes, all the methods are called during the computation, which is slow and expensive. Here is quick pseudo-code of what I have:
# o represents an origin variable
# valueA represents a variable which can change
def a(o, valueA):
    # calculations
    return resultA

def b(o, valueB):
    # calculations
    return resultB

def c(o, valueC1, valueC2):
    # calculations
    return resultC

def compute(A, B, C1, C2):
    one = self.a(o, A)
    two = self.b(one, B)
    three = self.c(two, C1, C2)
    return img
For example, when the value of C1 changes, calling compute recalculates all the methods despite a & b having no change. What I would like is some way of checking which of the values A, B, C1, C2 have changed between each call to compute.
I have considered keeping a list of the values and, on the next call, comparing it to the new values being passed to compute. E.g. 1st call: list = [1, 2, 3, 4]; 2nd call: list = [1, 3, 4, 5], so b & c need recalculating but a is the same. However, I am unsure how to go from the comparison to deciding which methods to call.
Some background on my particular application in case it is of use. I have a wxPython window with sliders that determine values for image processing and an image is drawn on each change of these sliders.
What is the best way to compare each call to compute and remove these wasted repeated computations?
If I had to solve this, I would use a dictionary, where the key is the valueX (or a tuple of values if there is more than one, as with C; a list can't be a dict key because it isn't hashable) and the value is the result of the function.
So you should have something like this:
{ valueA: resultA, valueB: resultB, (valueC1, valueC2): resultC }
To do that, the functions will have to store their results:
def a(o, valueA):
    [calcs]
    dic[valueA] = resultA
    return resultA

[...]

def c(o, valueC1, valueC2):
    [calcs]
    dic[(valueC1, valueC2)] = resultC
    return resultC
And in the function that computes, you can try to get the value for the parameters; if the value isn't there yet, calculate it:
def compute(A, B, C1, C2):
    one = dic[A] if A in dic else self.a(o, A)
    two = dic[B] if B in dic else self.b(one, B)
    three = dic[(C1, C2)] if (C1, C2) in dic else self.c(two, C1, C2)
    return img
P.S.: this is a "crude" implementation of the memoized functions that @holdenweb pointed to in his comment.
You could consider making the methods memoizing functions that use a dict to look up the results of previously stored computations (probably best in the class namespace to allow memoizing to optimize across all instances).
The memory requirements could be quite severe, however, if the methods are called with many arguments, in which case you might want to adopt a "publish and subscribe" pattern to try and make your computation more "systolic" (driven by changes in the data, loosely).
Those are a couple of approaches. I'm sure SO will think of more.
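For reference, the standard-library form of this memoization pattern is functools.lru_cache. A sketch with a stand-in computation, since the real bodies of a, b and c aren't shown (note that all arguments must be hashable):

```python
from functools import lru_cache

calls = []  # track how many real computations happen

@lru_cache(maxsize=None)
def a(o, valueA):
    calls.append(('a', valueA))  # only runs on a cache miss
    return o + valueA            # stand-in for the real calculation

a(1, 2)  # computed
a(1, 2)  # served from the cache, no second computation
assert calls == [('a', 2)]
```

The same decorator applied to b and c gives the "only recompute what changed" behaviour the question asks for, without manually managing a dictionary.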
Recently I read a problem to practice DP. I wasn't able to come up with a DP solution, so I tried a recursive one, which I later modified to use memoization. The problem statement is as follows:
Making Change. You are given n types of coin denominations of values
v(1) < v(2) < ... < v(n) (all integers). Assume v(1) = 1, so you can
always make change for any amount of money C. Give an algorithm which
makes change for an amount of money C with as few coins as possible.
[on problem set 4]
I got the question from here
My solution was as follows:
def memoized_make_change(L, index, cost, d):
    if index == 0:
        return cost
    if (index, cost) in d:
        return d[(index, cost)]
    count = cost // L[index]  # integer division
    val1 = memoized_make_change(L, index - 1, cost % L[index], d) + count
    val2 = memoized_make_change(L, index - 1, cost, d)
    x = min(val1, val2)
    d[(index, cost)] = x
    return x
This is how I've understood my solution to the problem. Assume that the denominations are stored in L in ascending order. As I iterate from the end to the beginning, I have a choice to either choose a denomination or not choose it. If I choose it, I then recurse to satisfy the remaining amount with lower denominations. If I do not choose it, I recurse to satisfy the current amount with lower denominations.
Either way, at a given function call, I find the best(lowest count) to satisfy a given amount.
Could I have some help in bridging the thought process from here onward to reach a DP solution? I'm not doing this as any HW, this is just for fun and practice. I don't really need any code either, just some help in explaining the thought process would be perfect.
[EDIT]
I recall reading that function calls are expensive, which is why bottom-up (iteration-based) solutions might be preferred. Is that possible for this problem?
Here is a general approach for converting memoized recursive solutions to "traditional" bottom-up DP ones, in cases where this is possible.
First, let's express our general "memoized recursive solution". Here, x represents all the parameters that change on each recursive call. We want this to be a tuple of positive integers - in your case, (index, cost). I omit anything that's constant across the recursion (in your case, L), and I suppose that I have a global cache. (But FWIW, in Python you should just use the lru_cache decorator from the standard library functools module rather than managing the cache yourself.)
To solve for(x):
    If x in cache: return cache[x]
    Handle base cases, i.e. where one or more components of x is zero
    Otherwise:
        Make one or more recursive calls
        Combine those results into `result`
        cache[x] = result
        return result
The basic idea in dynamic programming is simply to evaluate the base cases first and work upward:
To solve for(x):
    For y starting at (0, 0, ...) and increasing towards x:
        Do all the stuff from above
However, two neat things happen when we arrange the code this way:
As long as the order of y values is chosen properly (this is trivial when there's only one vector component, of course), we can arrange that the results for the recursive call are always in cache (i.e. we already calculated them earlier, because y had that value on a previous iteration of the loop). So instead of actually making the recursive call, we replace it directly with a cache lookup.
Since every component of y will use consecutively increasing values, and will be placed in the cache in order, we can use a multidimensional array (nested lists, or else a Numpy array) to store the values instead of a dictionary.
So we get something like:
To solve for(x):
    cache = multidimensional array sized according to x
    for i in range(first component of x):
        for j in ...:
            (as many loops as needed; better yet use `itertools.product`)
            If this is a base case, write the appropriate value to cache
            Otherwise, compute "recursive" index values to use, look up
            the values, perform the computation and store the result
    return the appropriate ("last") value from cache
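As a concrete illustration of this bottom-up pattern, here is the standard iterative DP for the coin-change problem itself (a sketch, not a literal translation of the memoized recursion above; it assumes, as the problem statement does, that v(1) = 1 so every amount is reachable):

```python
def make_change(denoms, C):
    """Fewest coins from denoms (which must include 1) summing to C."""
    # best[c] = fewest coins needed to make amount c; base case best[0] = 0
    best = [0] + [float('inf')] * C
    for c in range(1, C + 1):
        # every smaller amount is already in the table, so no recursion needed
        best[c] = min(best[c - d] for d in denoms if d <= c) + 1
    return best[C]

print(make_change([1, 5, 10, 25], 63))  # 6 coins: 25+25+10+1+1+1
```

Note how the cache lookups replace the recursive calls: by the time best[c] is computed, every best[c - d] it needs has already been filled in.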
I suggest considering the relationship between the value you are constructing and the values you need for it.
In this case you are constructing a value for (index, cost) based on:
index-1 and cost
index-1 and cost % L[index]
What you are searching for is a way of iterating over the choices such that you will always have precalculated everything you need.
In this case you can simply change the code to the iterative approach:
for each choice of index, from 0 upwards:
    for each choice of cost:
        compute the value corresponding to (index, cost)
In practice, I find that the iterative approach can be significantly faster (perhaps 4x) for simple problems, as it avoids the overhead of function calls and of checking the cache for preexisting values.
I have created an array in the way shown below, which represents 3 pairs of co-ordinates. My issue is that I don't seem to be able to find the index of a particular pair of co-ordinates within the array.
import numpy as np

R = np.random.uniform(size=(3, 2))
R
Out[5]:
array([[ 0.57150157,  0.46611662],
       [ 0.37897719,  0.77653461],
       [ 0.73994281,  0.7816987 ]])

R.index([0.57150157, 0.46611662])
The following is returned:
AttributeError: 'numpy.ndarray' object has no attribute 'index'
The reason I'm trying to do this is so I can extend a list, with the index of a co-ordinate pair, within a for-loop.
e.g.
v = []
for A in R:
    v.append(R.index(A))
I'm just not sure why the index function isn't working, and can't seem to find a way around it.
I'm new to programming so excuse me if this seems like nonsense.
index() is a method of the type list, not of numpy.ndarray. Try:
R.tolist().index(x)
where x is, for example, the third entry of R. This first converts your array into a list; then you can use index ;)
You can achieve the desired result by converting your inner arrays (the coordinates) to tuples.
R = map(lambda x: (x), R);
And then you can find the index of a tuple using R.index((number1, number2));
Hope this helps!
[Edit] To explain what's going on in the code above, the map function goes through (iterates) the items in the array R, and for each one replaces it with the return result of the lambda function.
So it's equivalent to something along these lines:
def someFunction(x):
    return (x)

for x in range(0, len(R)):
    R[x] = someFunction(R[x])
So it takes each item, does something to it, and puts it back in the list. I realized that it may not actually do what I thought it did (returning (x) doesn't seem to change a regular array to a tuple), but it does help your situation because, by iterating through it, Python makes a regular list out of the numpy array.
To actually convert to a tuple, the following code should work
R = map(tuple, R)
(credits to https://stackoverflow.com/a/10016379/2612012)
Numpy arrays don't have an index function, for a number of reasons. However, I think you're wanting something different.
For example, the code you mentioned:
v = []
for A in R:
v.append(R.index(A))
Would just be (assuming R has unique rows, for the moment):
v = range(len(R))
However, I think you might be wanting the built-in function enumerate. E.g.
for i, row in enumerate(R):
    # Presumably you're doing something else with "row"...
    v.append(i)
For example, let's say we wanted to know the indices where the sum of each row was greater than 1.
One way to do this would be:
v = []
for i, row in enumerate(R):
    if sum(row) > 1:
        v.append(i)
However, numpy also provides other ways of doing this, if you're working with numpy arrays. For example, the equivalent to the code above would be:
v, = np.where(R.sum(axis=1) > 1)
If you're just getting started with Python, focus on understanding the first example before worrying too much about the best way to do things with numpy. Just be aware that numpy arrays behave very differently from lists.
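For completeness, if the goal really is to locate a given coordinate pair, a vectorised comparison keeps everything in numpy (a sketch; the exact floating-point equality only works here because the pair is taken verbatim from R, so prefer np.isclose when the values come from elsewhere):

```python
import numpy as np

R = np.array([[0.57150157, 0.46611662],
              [0.37897719, 0.77653461],
              [0.73994281, 0.7816987 ]])

pair = [0.37897719, 0.77653461]
# Compare every row of R to the pair, then keep rows where both columns match.
idx, = np.where((R == pair).all(axis=1))
print(idx)  # [1]
```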