First, let me show you the codez:
a = array([...])
for n in range(10000):
    func_curry = functools.partial(func, y=n)
    result = array(map(func_curry, a))
    do_something_else(result)
    ...
What I'm doing here is applying func to an array, changing the value of func's second parameter on every iteration. This is SLOOOOW (creating a new function every iteration surely does not help), and I also feel I've missed the pythonic way of doing it. Any suggestions?
Could a solution that gives me a 2D array be a good idea? I don't know, but maybe it is.
Answers to possible questions:
Yes, this is (using a broad definition), an optimization problem (do_something_else() hides this)
No, scipy.optimize hasn't worked because I'm dealing with boolean values and it never seems to converge.
Did you try numpy.vectorize?
...
vfunc_curry = vectorize(functools.partial(func, y=n))
result = vfunc_curry(a)
...
If a is of significant size the bottleneck should not be the creation of the function, but the duplication of the array.
Can you rewrite the function? If possible, you should write the function to take two numpy arrays a and numpy.arange(n). You may need to reshape to get the arrays to line up for broadcasting.
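A minimal sketch of what that could look like, assuming func can be rewritten in terms of numpy operations (func2 and the sample data here are hypothetical stand-ins, not your actual code):

import numpy as np

def func2(a, n):
    # hypothetical vectorized stand-in for func(x, y): whatever it computes,
    # expressed with numpy operations so it broadcasts element-wise
    return a + n

a = np.linspace(0.0, 1.0, 5)   # stand-in for your data
ns = np.arange(10000)

# Reshape so a is a row and ns is a column; broadcasting then produces a
# (10000, len(a)) result where row k equals func2(a, k) -- the 2D array you
# were wondering about.
result = func2(a[np.newaxis, :], ns[:, np.newaxis])
print(result.shape)   # (10000, 5)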
Related
I'm new to Python so I'm really struggling with this. I want to define a function, have a certain calculation done to it for an array of different values, store those newly calculated values in a new array, and then use those new values in another calculation. My attempt is this:
import numpy as np
from scipy.integrate import quad

radii = np.arange(10) #array of radius values

def rho(r):
    return (r**2)

for i in range(len(radii)):
    def M[r]: #new array by integrating over values from 0 to radii
        scipy.integrate.quad(rho(r), 0, radii[i])

def P(r):
    return (5*M[r]) #make new array using values from M[r] calculated above
Alright, this script is a bit of a mess, so let's unpack it. I've never used scipy.integrate.quad, but I looked it up and, after testing it, determined that those are valid arguments for quad. There are more efficient ways to do this, but in the interests of preservation I'll try to keep the overall structure of your script, just fixing the bugs and errors. So, as I understand it, you want to write this:
import numpy as np
from scipy.integrate import quad
# Here's where we start to make changes. First, we're going to define the function,
# taking in two parameters, r and the array radii.
# We don't need to specify data types, because Python is a dynamically-typed language.
# It is good practice to define your functions before the start of the program.
def M(r, radii):
    # We need somewhere to put the results, so preallocate an array the same length as radii.
    output = np.empty(len(radii))
    # The loop goes _inside_ the function, otherwise we're just defining the function M(r)
    # over and over again as a slightly different thing!
    for i in range(len(radii)):
        # Also note: since we imported quad from scipy.integrate, we only need to reference
        # quad, and in fact referencing scipy.integrate.quad just causes an error!
        # quad returns a (result, estimated_error) pair, so keep only the result.
        output[i] = quad(r, 0, radii[i])[0]
        # We can also multiply by 5 in this function, so we really only need one. Hell, we
        # don't actually _need_ a function at all, unless you're planning to reference it
        # multiple times in other parts of a larger program.
        output[i] *= 5
    return output
# You have a choice between doing the maths _inside_ the main function or in a lambda function like this, which is a bit more pythonic than a one-line normal function. Use it like so:
rho = lambda r: r**2
# Beginning of program (this is my example of what calling the function with an array called radii might look like)
radii = np.arange(10)
new_array = M(rho, radii)
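As a quick sanity check: the integral of r**2 from 0 to R is R**3/3, so each entry of new_array should come out to 5*R**3/3 for the corresponding radius:

print(new_array)
print(5 * radii**3 / 3)   # should match, up to quad's numerical tolerance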
If this solution is correct, please mark it as accepted.
I hope this helps!
I often find myself in the situation where I need to be sure that x is an array-like object, regardless of whether it comes to me as a float or as a list.
I ultimately need numpy arrays so I expected that np.array() could be a straightforward solution. But actually the brackets still remain a problem.
The best solution I have figured out is
def EnsureArray(x):
    if np.isscalar(x):
        return np.array([x])
    else:
        return np.array(x)
Is it OK, or is there something better (without defining my own function)?
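For example, this is the behaviour I'm after (assuming numpy is imported as np):

print(EnsureArray(3.5))         # [3.5]
print(EnsureArray([1.0, 2.0]))  # [1. 2.]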
I'm doing a DataCamp course on statistical thinking in Python. At one point in the course, the instructor advises initializing an empty array before filling it with random floats, e.g.
rand_nums = np.empty(100_000)
for i in range(100_000):
    rand_nums[i] = np.random.random()
In theory, is there any reason to initialize an empty array before filling it? Does it save on memory? What is the advantage of the code above compared to simply doing the following?
rand_nums = np.random.random(size=100_000)
There is absolutely no reason to do this. The second way is faster, more readable and semantically correct.
Besides that, np.empty does NOT actually initialize the array - it only allocates memory, so the array contains whatever arbitrary data was left over in that memory.
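You can see that for yourself with something like this (the exact values you get are unpredictable, which is the whole point):

import numpy as np

junk = np.empty(5)    # allocated but not initialized
print(junk)           # whatever bytes happened to be sitting in that memory
zeros = np.zeros(5)   # this one IS initialized
print(zeros)          # [0. 0. 0. 0. 0.]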
If all the code they provided is just like above, your way of initializing is better.
Their code might lead to something else later on
rand_nums = np.empty(100_000)
for i in range(100_000):
    rand_nums[i] = np.random.random()
    # maybe they will do something else in here later with rand_nums[i]
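For example (a purely hypothetical continuation, not from the course), a loop where each element depends on the previous one cannot be replaced by a single np.random.random(size=...) call, so the preallocate-and-fill pattern would earn its keep:

import numpy as np

walk = np.empty(100_000)
walk[0] = 0.0
for i in range(1, 100_000):
    # each entry builds on the last one, e.g. a random walk
    walk[i] = walk[i - 1] + np.random.random() - 0.5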
Alright, so I apologize ahead of time if I'm just asking something silly, but I really thought I understood how apply_along_axis worked. I just ran into something that might be an edge case that I just didn't consider, but it's baffling me. In short, this is the code that is confusing me:
import numpy as np

class Leaf(object):
    def __init__(self, location):
        self.location = location

    def __len__(self):
        return self.location.shape[0]

def bulk_leaves(child_array, axis=0):
    test = np.array([Leaf(location) for location in child_array]) # This is what I want
    check = np.apply_along_axis(Leaf, 0, child_array) # This returns an array of individual Leafs with the same shape as child_array
    return test, check

if __name__ == "__main__":
    test, check = bulk_leaves(np.random.rand(100, 50))
    test == check # False
I always feel silly using a list comprehension with numpy and then casting back to an array, but I'm just not sure of another way to do this. Am I missing something obvious?
The apply_along_axis is pure Python that you can look at and decode yourself. In this case it essentially does:
check = np.empty(child_array.shape, dtype=object)
for i in range(child_array.shape[1]):
    check[:, i] = Leaf(child_array[:, i])
In other words, it preallocates the container array, and then fills in the values with an iteration. That certainly is better than appending to the array, but rarely better than appending values to a list (which is what the comprehension is doing).
You could take the above template and adjust it to produce the array that you really want.
check = np.empty(child_array.shape[0], dtype=object)  # 1-D this time, one slot per row
for i in range(check.shape[0]):
    check[i] = Leaf(child_array[i, :])
In quick tests this iteration times the same as the comprehension. The apply_along_axis, besides being wrong, is slower.
The problem seems to be that apply_along_axis uses isscalar to determine whether the returned object is a scalar, but isscalar returns False for user-defined classes. The documentation for apply_along_axis says:
The shape of outarr is identical to the shape of arr, except along the axis dimension, where the length of outarr is equal to the size of the return value of func1d.
Since your class's __len__ returns the length of the array it wraps, numpy "expands" the resulting array into the original shape. If you don't define a __len__, you'll get an error, because numpy doesn't think user-defined types are scalars, so it will still try to call len on it.
As far as I can see, there is no way to make this work with a user-defined class. You can return 1 from __len__, but then you'll still get an Nx1 2D result, not a 1D array of length N. I don't see any way to make Numpy see a user-defined instance as a scalar.
There is a numpy bug about the apply_along_axis behavior, but surprisingly I can't find any discussion of the underlying issue that isscalar returns False for non-numpy objects. It may be that numpy just decided to punt and not guess whether user-defined types are vector or scalar. Still, it might be worth asking about this on the numpy list, as it seems odd to me that things like isscalar(object()) return False.
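For instance:

import numpy as np

print(np.isscalar(3.0))       # True
print(np.isscalar(object()))  # False - user-defined instances don't count as scalars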
However, if as you say you don't care about performance anyway, it doesn't really matter. Just use your first way with the list comprehension, which already does what you want.
Good day, I'm writing a Python module for some numeric work. Since there's a lot of stuff going on, I've been spending the last few days optimizing code to improve calculation times.
However, I have a question concerning Numba.
Basically, I have a class with some fields which are numpy arrays, which I initialize in the following way:
def __init__(self):
    a = numpy.arange(0, self.max_i, 1)
    self.vibr_energy = self.calculate_vibr_energy(a)

def calculate_vibr_energy(self, i):
    return numpy.exp(-self.harmonic * i - self.anharmonic * (i ** 2))
So, the code is vectorized, and using Numba's JIT results in some improvement. However, sometimes I need to access the calculate_vibr_energy function from outside the class, and pass a single integer instead of an array in place of i.
As far as I understand, if I use Numba's JIT on the calculate_vibr_energy, it will have to always take an array as an argument.
So, which of the following options is better:
1) Create a new function calculate_vibr_energy_single(i), which will only take a single integer number, and use Numba on it too
2) Replace all usages of the function that are similar to this one:
myclass.calculate_vibr_energy(1)
with this:
tmp = np.array([1])
myclass.calculate_vibr_energy(tmp)[0]
Or are there other, more efficient (or at least more Pythonic) ways of doing that?
I have only played a little with numba so far, so I may be mistaken, but as far as I understand it, using the "autojit" decorator should give you functions that can take arguments of any type.
See e.g. http://numba.pydata.org/numba-doc/dev/pythonstuff.html
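Something along these lines should illustrate the idea - a minimal sketch, using the lazily-compiled @numba.njit (which in current Numba plays the role autojit used to) and made-up values for harmonic and anharmonic:

import numpy as np
import numba

harmonic = 0.1        # placeholder constants, not from the question
anharmonic = 0.001

@numba.njit
def calculate_vibr_energy(i):
    return np.exp(-harmonic * i - anharmonic * (i ** 2))

# Lazy compilation specializes the function per argument type, so the same
# function accepts a single number...
print(calculate_vibr_energy(1.0))
# ...and a whole array, with no separate "_single" version needed.
print(calculate_vibr_energy(np.arange(10.0)))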