I would like to invert a bunch of tensors in a list using Cholesky decomposition in TensorFlow 2, but the resulting code is quite ugly. Is there a more elegant or Pythonic way to do something like this?
iMps = []
for Mp in Mps:
    cholMp = tf.linalg.cholesky(Mp)
    icholMp = tf.linalg.inv(cholMp)
    # axes=1 makes tensordot an ordinary matrix product
    iMp = tf.tensordot(tf.transpose(icholMp), icholMp, axes=1)
    iMps.append(iMp)
Is it possible to replace the for loop with something else? Mps is a list of tensors of different sizes (can I represent it as something else?). Is there any way to make this more elegant?
You can do this with Python's built-in map function.
I moved the loop body into a function and mapped it over the list:
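import tensorflow as tf

def inverse_tensors(Mp):
    cholMp = tf.linalg.cholesky(Mp)   # Mp = L @ L^T
    icholMp = tf.linalg.inv(cholMp)   # L^{-1}
    # Mp^{-1} = (L^{-1})^T @ L^{-1}: a matrix product (tensordot with axes=0 would be an outer product)
    iMp = tf.matmul(icholMp, icholMp, transpose_a=True)
    return iMp

iMps = list(map(inverse_tensors, Mps))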
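A list comprehension is an equally idiomatic replacement for the loop, and tf.linalg.cholesky_solve can build the inverse straight from the Cholesky factor: since Mp = L L^T, we have Mp^{-1} = L^{-T} L^{-1}, which is exactly what solving Mp X = I with the factor computes. A plain Python list is a reasonable container here, because tensors of different sizes cannot be stacked into one dense tensor. A minimal sketch (not the answer's method), assuming every matrix is symmetric positive-definite; cholesky_inverse and the example matrices are hypothetical:

```python
import tensorflow as tf

def cholesky_inverse(m: tf.Tensor) -> tf.Tensor:
    """Inverse of a symmetric positive-definite matrix via its Cholesky factor."""
    chol = tf.linalg.cholesky(m)  # lower-triangular L with m = L @ L^T
    identity = tf.eye(tf.shape(m)[-1], dtype=m.dtype)
    # Solve m @ X = I using L, so X = m^{-1}; inv(L) is never formed explicitly.
    return tf.linalg.cholesky_solve(chol, identity)

# Matrices of different sizes are fine: each one is processed independently.
Mps = [
    tf.constant([[4.0, 2.0], [2.0, 3.0]], dtype=tf.float64),
    tf.constant([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 5.0]], dtype=tf.float64),
]
iMps = [cholesky_inverse(Mp) for Mp in Mps]
```

Solving against the identity avoids the explicit triangular inverse and the extra transpose-multiply, which is usually the numerically preferred route.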
Hope this answers your question, Happy Learning!
I am using NumPy arrays instead of pandas for speed. However, I haven't managed to express my code with broadcasting, indexing, etc., so I use nested loops as below. It works, but it seems ugly and inefficient to me.
Basically, the step mydata[mydata[:,1]==i] imitates a pandas groupby; you can think of column 1 as a firm ID. Then, for each lookup array, the step all(np.isin(lookup[u],d[:,3])) checks whether all of its values occur in the selected firm's column 3. As I said at the beginning, I am not comfortable with this approach.
out = []
for i in np.unique(mydata[:, 1]):
    d = mydata[mydata[:, 1] == i]
    for u in range(0, len(lookup)):
        control = all(np.isin(lookup[u], d[:, 3]))
        if control:
            out.append(d[np.isin(d[:, 3], lookup[u])])
It takes about 0.27 seconds, but there must be cleverer alternatives.
I also tried Numba's jit(), but it did not work.
Could anyone help me with this?
Thanks in advance!
Fake data:
import numpy as np

a = np.repeat(np.arange(100) + 5000, np.random.randint(50, 100, 100))
b = np.random.randint(100, 200, len(a))
c = np.random.randint(10, 70, len(a))
index = np.arange(len(a))
mydata = np.vstack((index, a, b, c)).T
lookup = []
for i in range(0, 60):
    lookup.append(np.random.randint(10, 70, np.random.randint(3, 6)))
I had some trouble understanding the goal of your program, but I got a decent performance improvement by refactoring your inner for loop. I was able to compress your code to a few lines.
f = (
    lambda lk: out.append(d[np.isin(d[:, 3], lk)])
    if all(np.isin(lk, d[:, 3]))
    else None
)

out = []
for i in np.unique(mydata[:, 1]):
    d = mydata[mydata[:, 1] == i]  # f reads this global d when it is called
    list(map(f, lookup))
This produces the same output list you got before, and the code runs almost twice as fast (at least on my machine).
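Another option, not from the answer above and not benchmarked here: sort the rows by firm ID once, so each firm becomes a contiguous slice instead of being re-selected with a mask over the whole array, and replace the all(np.isin(...)) membership test with a Python set comparison. A minimal sketch under those assumptions; match_lookups and the demo data are hypothetical names:

```python
import numpy as np

def match_lookups(mydata: np.ndarray, lookup: list) -> list:
    """Same output as the nested loop in the question, computed per firm slice."""
    # Stable sort by firm id (column 1): each firm's rows become one contiguous
    # slice and keep their original relative order.
    order = np.argsort(mydata[:, 1], kind="stable")
    sorted_data = mydata[order]
    _, starts = np.unique(sorted_data[:, 1], return_index=True)
    bounds = list(starts) + [len(sorted_data)]

    # Convert each lookup array to a set once, instead of once per firm.
    lookup_sets = [set(lk.tolist()) for lk in lookup]

    out = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        d = sorted_data[start:stop]
        firm_values = set(d[:, 3].tolist())
        for lk, lk_set in zip(lookup, lookup_sets):
            # Equivalent to all(np.isin(lk, d[:, 3])), as a subset test.
            if lk_set <= firm_values:
                out.append(d[np.isin(d[:, 3], lk)])
    return out

# Tiny hand-made example; columns are (index, firm id, b, c).
demo_data = np.array([
    [0, 5000, 1, 10],
    [1, 5001, 2, 20],
    [2, 5000, 3, 11],
    [3, 5001, 4, 10],
    [4, 5000, 5, 12],
])
demo_lookup = [np.array([10, 11]), np.array([10, 20]), np.array([13])]
demo_out = match_lookups(demo_data, demo_lookup)
```

Here demo_out holds the rows with index 0 and 2 (firm 5000, first lookup) and the rows with index 1 and 3 (firm 5001, second lookup), in the same order the original loop would produce them.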
I discovered Halide (the language) a few weeks ago, and I enjoy trying to optimize parts of my code with it. Nonetheless, I am struggling to find an optimized implementation of a very basic image-processing task: normalization.
Basically, if I is my grayscale image, I just want:
I_norm = (I - min(I)) / (max(I) - min(I))
I came up with this code (using Halide's Python API, but it should be similar in C++):
import halide as hl

def normalize(input: hl.Buffer, height: int, width: int):
    low, high, norm_output = hl.Func('low'), hl.Func('high'), hl.Func('norm_output')
    x, y = hl.Var('x'), hl.Var('y')
    dom = hl.RDom([(0, width), (0, height)])
    low[hl._0] = hl.minimum(input[dom.x, dom.y])
    high[hl._0] = hl.maximum(input[dom.x, dom.y])
    norm_output[x, y] = (input[x, y] - low[0]) / hl.f32(high[0] - low[0])
    low.compute_root()
    high.compute_root()
    norm_output.compute_root().parallel(y).vectorize(x, 8)
    return norm_output
This code works quite well (it is the fastest version I could come up with), but as soon as I use it in a pyramid, for example like this:
from typing import List

import halide as hl

x, y = hl.Var('x'), hl.Var('y')

def get_structure(pyr: List, h: int, w: int, name: str) -> List:
    structure = [hl.Func('%s_%i' % (name, i)) for i in range(len(pyr))]
    norm_structure = [hl.Func('norm_%s_%i' % (name, i)) for i in range(len(pyr))]
    for lv, layer in enumerate(pyr):
        structure[lv][x, y] = some_function(layer)[x, y]  # returns an un-normalized "matrix"
    # apply my normalization function
    for lv, layer in enumerate(pyr):
        norm_structure[lv] = normalize(structure[lv], h, w)
    return norm_structure
everything becomes very slow.
Indeed, if I comment out these lines:
for lv, layer in enumerate(pyr):
    norm_structure[lv] = normalize(structure[lv], h, w)
and return structure instead, my overall pipeline runs in under 40 ms.
As soon as I add the normalization, it skyrockets to **0.
So the question is: how can a normalization be computed efficiently in Halide? Halide lets us do lots of very complex things very efficiently, yet a simple normalization over the whole domain is slow.
Note: I have also tried adding scheduling in get_structure(), for example:
for lv in range(len(pyr)):
    norm_structure[lv].compute_root().parallel(y, 4).vectorize(x, 4)
but it doesn't improve anything.
Also, I'm not satisfied with my code in another respect: in the best Halide version I've found, I loop over the image twice, once for the min and once for the max, and only then compute the normalization,
whereas if I wrote the loop by hand I would track both the min and the max in a single pass.
Note also that I've spent a lot of time looking for ways to optimize this, in the official Halide apps on GitHub and elsewhere, but I didn't find anything that helps build this simple function efficiently.
So, thank you in advance for the help!
I'm new to Python.
I want to make a calculator, and I am facing a problem right now.
Here's a simplified version of the code I am trying to write:
from math import *
input = "(2)(3)e(sqrt(49))pi" #This is an example of equation
equation = "(2)*(3)*e*(sqrt(49))*pi" #The output
How can I insert "*" between every ")(", ")e", "e(", and similar pairs in the equation, so that I can eval(equation) without typing "*" manually, just like in real-life math?
I tried code like this:
from math import *
input = "(2)(3)e(sqrt(49))pi"
input = input.replace(")(", ")*(")
input = input.replace(")e", ")*e")
input = input.replace("e(", "e*(")
input = input.replace(")pi", ")*pi")
#^^^I can loop this using for loop^^^
equation = input
print(eval(equation))
This only works for this particular equation. I could loop over the replacements, but that would be very inefficient: I don't want 49 iterations just to check whether "*" is needed between pairs of 7 different symbols.
The issue you will encounter here is that "e(" should be transformed to "e*(" but "sqrt(" should stay as-is. As comments have suggested, the best or "cleanest" solution would be to write a proper parser for your equation. You could put "calculator parser" into your favorite search engine for a quick solution, or, if you are interested in over-engineering but learning a lot, you could have a look at parser generators such as ANTLR.
If, for some reason, neither of those is an option, a quick-and-dirty solution could be this:
import re

def add_multiplication_symbols(equation: str) -> str:
    constants = ['e', 'pi']
    constants_re = '|'.join(f'(?:{re.escape(c)})' for c in constants)
    # ")" followed by "(" or a name/number: ")(" -> ")*(", ")e" -> ")*e"
    equation = re.sub(r'(\))(\(|\w+)', r'\1*\2', equation)
    # a listed constant followed by "(": "e(" -> "e*("
    equation = re.sub(f'({constants_re})' + r'(\()', r'\1*\2', equation)
    return equation
Then print(add_multiplication_symbols("(2)(3)e(sqrt(49))pi")) results in (2)*(3)*e*(sqrt(49))*pi.
The function uses the re module (regular expressions) to handle all constants in a single pattern. It works around the issue described above by listing the constants (e.g. "e" and "pi") by hand, so that "e(" gets a "*" while "sqrt(" does not.
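A step toward the "proper parser" route is to tokenize the string first and then decide, for each pair of adjacent tokens, whether a "*" is missing. This replaces the pairwise replacements with one pass and resolves "e(" versus "sqrt(" by whitelisting functions. A minimal sketch; the token pattern and the FUNCTIONS/CONSTANTS whitelists are assumptions, and names in neither list (for example a variable x) never get an implicit "*":

```python
import math
import re

# Tokens: numbers (12, 4.5), identifiers (sqrt, pi, e), or any other single
# non-space character such as ( ) + - * /.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|\S")

# Hypothetical whitelists; extend them with whatever your calculator supports.
FUNCTIONS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "log": math.log}
CONSTANTS = {"e": math.e, "pi": math.pi}

def ends_operand(tok: str) -> bool:
    return tok == ")" or tok in CONSTANTS or tok[0].isdigit()

def starts_operand(tok: str) -> bool:
    return tok == "(" or tok in CONSTANTS or tok in FUNCTIONS or tok[0].isdigit()

def insert_implicit_multiplication(expr: str) -> str:
    out = []
    for tok in TOKEN_RE.findall(expr):
        # A function name never ends an operand, so "sqrt(" stays untouched.
        if out and ends_operand(out[-1]) and starts_operand(tok):
            out.append("*")
        out.append(tok)
    return "".join(out)

def evaluate(expr: str) -> float:
    # Only the whitelisted names are visible to eval.
    names = {**FUNCTIONS, **CONSTANTS}
    return eval(insert_implicit_multiplication(expr), {"__builtins__": {}}, names)

result = insert_implicit_multiplication("(2)(3)e(sqrt(49))pi")
```

With this, result is "(2)*(3)*e*(sqrt(49))*pi", and "2pi" becomes "2*pi". Emptying __builtins__ limits what eval can reach, but it is not a real sandbox, so it should not be used on untrusted input.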
I am trying to load a ground-truth file with SciPy's loadmat; it returns a NumPy ndarray of type object (dtype='O').
From that object I can access each element, which are also ndarrays, but from there I am struggling to reach either the segmentation or the boundaries image.
I would like to turn this into a list of lists of numeric ndarrays. How can I do that?
Thanks in advance for any help
I found a way to fix my issue.
I do not think it is optimal but it works.
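from scipy.io import loadmat

def load_bsd_gt(filename):
    gt = loadmat(filename)
    gt = gt['groundTruth']  # object array of shape (1, number of annotations)
    cols = gt.shape[1]
    what = ['Segmentation', 'Boundaries']
    ret = list()
    for i in range(cols):
        tmp = list()
        for w in what:
            # annotation i; each field sits in a (1, 1) array, hence [0][0]
            tmp.append(gt[0][i][w][0][0][:])
        ret.append(tmp)
    return ret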
If someone has a better way to do it, please feel free to add a comment or an answer.
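The nested loops can also be collapsed into a single list comprehension. A minimal sketch, assuming loadmat's default options (struct_as_record=True, squeeze_me=False) and the groundTruth layout described above; unpack_ground_truth and the demo arrays are hypothetical, and the demo is built by hand to mimic that layout rather than read from a real file:

```python
import numpy as np
from scipy.io import loadmat

FIELDS = ("Segmentation", "Boundaries")

def unpack_ground_truth(gt: np.ndarray) -> list:
    """gt is loadmat(...)["groundTruth"]: a (1, n) object array of structs.

    Each struct gt[0, i] is a (1, 1) record array, so gt[0, i][field] is a
    (1, 1) object array and [0, 0] extracts the actual image.
    """
    return [[gt[0, i][field][0, 0] for field in FIELDS] for i in range(gt.shape[1])]

def load_bsd_gt(filename: str) -> list:
    return unpack_ground_truth(loadmat(filename)["groundTruth"])

# Hand-built stand-in for loadmat(...)["groundTruth"] with two annotations.
def _fake_struct(seg, bnd):
    rec = np.empty((1, 1), dtype=[("Segmentation", "O"), ("Boundaries", "O")])
    rec["Segmentation"][0, 0] = seg
    rec["Boundaries"][0, 0] = bnd
    return rec

seg_a = np.array([[1, 1], [2, 2]], dtype=np.uint16)
bnd_a = np.array([[0, 1], [0, 1]], dtype=np.uint8)
seg_b = np.array([[1, 2], [1, 2]], dtype=np.uint16)
bnd_b = np.array([[0, 0], [1, 1]], dtype=np.uint8)
demo_gt = np.empty((1, 2), dtype=object)
demo_gt[0, 0] = _fake_struct(seg_a, bnd_a)
demo_gt[0, 1] = _fake_struct(seg_b, bnd_b)
demo_result = unpack_ground_truth(demo_gt)
```

Newer SciPy versions also offer loadmat(..., simplify_cells=True), which turns MATLAB structs into dicts; whether that is more convenient depends on the file, so it is worth checking against your data.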
I am trying to understand whether there is a space, time, or programming advantage to storing data from a signal-processing system as a nested list in either:
1) data[channel][sample]
2) data[sample][channel]
I can write the processing for both, though I personally find 1) easier to write and index than 2).
However, 2) is how my local group more commonly programs and stores the data (either in Excel/CSV or from the data-gathering systems). It is easy to transpose:
dataA = list(map(list, zip(*dataB)))
I was wondering whether there are any storage, performance, or even module-compatibility issues with 1) compared with 2).
With 1), I can loop like this:
import matplotlib.pyplot

for R in dataA:
    for C in R:
        process_channel(C)
matplotlib.pyplot.loglog(dataA[0], dataA[i])
where dataA[0] is time or frequency and i is some other channel to plot
With 2):
for R in dataB:
    for C in R:
        process_sample(C)
matplotlib.pyplot.loglog([j[0] for j in dataB], [k[i] for k in dataB])
This looks worse stylistically. Maybe I am missing a list method that makes this easier? I have also written code that uses dicts, but that really breaks with general use, so I am less inclined to keep using dicts. The dict storage looks like
dataC = [{'f': 0.1, 'chnl1': 100.0}, {'f': 0.2, 'chnl1': 110.0}]
or similar. It seems that option 2) integrates better with everything else. However, I am trying to understand how best to write code for option 2) when you want to process over channels and then samples. Should I just transpose the matrix first, do the work in option 1) space, and transpose the results back, like this?
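import numpy

def smoothing(d, s):
    # d is in layout 2) (samples x channels); transpose to 1) (channels x samples)
    td = [list(col) for col in zip(*d)]
    nd = []
    for row in td:
        col = []
        # average non-overlapping blocks of s samples
        for i in range(0, len(row) - s + 1, s):
            col.append(sum(row[i:i + s]) / s)
        nd.append(col)
    # transpose back to layout 2)
    return numpy.transpose(nd)

dataA = smoothing(dataA, smooth_factor)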
While this construction works, transposing back and forth all the time looks inefficient.
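If the data are held in a 2-D NumPy array instead of nested lists, layout 2) can be processed along the sample axis directly, and the other layout is available as a transposed view, so nothing needs to be copied back and forth. A minimal sketch, assuming every channel has the same number of samples and that the smoothing is a non-overlapping block average as in the loop above; block_average and the demo data are hypothetical:

```python
import numpy as np

def block_average(data: np.ndarray, step: int) -> np.ndarray:
    """Average non-overlapping blocks of `step` samples for every channel.

    `data` uses layout 2): shape (n_samples, n_channels). A trailing partial
    block is dropped.
    """
    n_blocks = data.shape[0] // step
    trimmed = data[: n_blocks * step]
    # (n_blocks * step, C) -> (n_blocks, step, C), then average each block.
    return trimmed.reshape(n_blocks, step, data.shape[1]).mean(axis=1)

# Layout 2) as a 2-D array: rows are samples, columns are channels.
dataB = np.asarray([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]])
smoothed = block_average(dataB, 2)

# Column slices replace the [k[i] for k in dataB] comprehensions, and .T
# gives layout 1) as a view that shares memory with dataB (no copy).
freq_or_time = dataB[:, 0]
channel_1 = dataB[:, 1]
dataA_view = dataB.T
```

Plotting then becomes matplotlib.pyplot.loglog(dataB[:, 0], dataB[:, i]). Converting existing nested lists with np.asarray copies the data once; after that, switching between layouts with .T costs essentially nothing.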