I am trying to work out how to apply two different functions to a NumPy array, each one to only half of the values. Here is the code I have been trying:
def hybrid_array(xs,height,center,fwhh):
    xs[xs<=center] = height*np.exp((-(xs[xs<=center]-center)**2)/(2*(fwhh/(2*np.sqrt(2*np.log(2))))**2))
    xs[xs>center] = height*1/(np.abs(1+((center-xs[xs>center])/(fwhh/2))**2))
    return xs
However, I am overwriting the array that is passed in. The usual trick of copying with a slice, i.e. the following, still changes b:
a = b[:]
c = hybrid_array(a,args)
If there is a better way of doing any part of what I am trying, I would be very grateful if you could let me know as I am still new to numpy arrays.
Thank you
Try copy.deepcopy to copy the array b onto a before calling your function.
import copy
a = copy.deepcopy(b)
c = hybrid_array(a,args)
Alternatively, you can use the copy method of the array
a = b.copy()
c = hybrid_array(a,args)
Note:
You may be wondering why I suggested copy.deepcopy when the copy method of a NumPy array is the easier route. Others may disagree, but here is my reasoning:
Using deepcopy makes it explicit that you intend a deep copy rather than a reference copy.
Not every Python data type supports a copy method. NumPy arrays do, which is good, but when you program with NumPy and plain Python together you may end up using various NumPy and non-NumPy types, not all of which support copy. To remain consistent, I prefer the first approach.
Copying a NumPy array a is done with a.copy(). In your application, however, there is no need to copy the old data. All you need is a new array of the same shape and dtype as the old one. You can use
result = numpy.empty_like(xs)
to create such an array. If you generally don't want your function to modify its parameter, you should do this inside the function, rather than requiring the caller to take care of this.
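Applied to the question's function, this approach might look like the sketch below. The formulas are taken from the question itself (the redundant np.abs is dropped, since the Lorentzian denominator is always positive); the signature matches the original.

```python
import numpy as np

def hybrid_array(xs, height, center, fwhh):
    # Allocate a fresh output array; the caller's xs is never modified.
    result = np.empty_like(xs, dtype=float)
    sigma = fwhh / (2 * np.sqrt(2 * np.log(2)))   # Gaussian sigma from FWHM
    left = xs <= center
    right = ~left
    # Gaussian on the left half of the data, Lorentzian on the right half
    result[left] = height * np.exp(-((xs[left] - center) ** 2) / (2 * sigma ** 2))
    result[right] = height / (1 + ((center - xs[right]) / (fwhh / 2)) ** 2)
    return result

xs = np.linspace(-2.0, 2.0, 5)
ys = hybrid_array(xs, 1.0, 0.0, 1.0)
```

The caller can pass an array directly without any defensive copy, since the function writes only into its own freshly allocated result.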
I'm quite puzzled by this simple piece of python code:
import random
import numpy as np
import matplotlib.pyplot as plt

data = np.arange(2500,8000,100)
logdata = np.zeros((len(data)))+np.nan
randata = logdata
for i in range(len(data)):
    logdata[i] = np.log(data[i])
    randata[i] = np.log(random.randint(2500,8000))
plt.plot(logdata,randata,'bo')
OK, I don't need a for loop in this specific instance (I'm just making a simple example), but what really confuses me is the role played by the initialisation of randata. I would expect that, by virtue of the for loop, randata would become a totally different array from logdata, but the two arrays turn out to be identical. I see from older discussions that the only way to prevent this is to initialise randata on its own, randata = np.zeros((len(data)))+np.nan, or to make a copy, randata = logdata.copy(), but I don't understand why randata is so deeply linked to logdata.
If I were to give the following commands
logdata = np.zeros((len(data)))+np.nan
randata = logdata
logdata = np.array([1,2,3])
print(randata)
then randata would still be an array of NaNs, unlike logdata. Why is that so?
Blckknght explains numpy assignment behavior in this post: Numpy array assignment with copy
B = A
This binds a new name B to the existing object already named A.
Afterwards they refer to the same object, so if you modify one in
place, you'll see the change through the other one too.
But to answer why they're "deeply linked" (or rather, point to the same location in memory), it's mostly because copying large arrays is computationally expensive. So in numpy, the assignment = operator references the same block of memory instead of creating a copy at every assignment. If different arrays are desired, we can allocate new memory explicitly using the copy() method. This gives us the efficiency of C/C++ (where avoiding copies is very common by passing around pointers and references) along with the ease-of-use of python (where pointers and references are not available).
I'd say this is a feature, not a bug.
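A few lines make the distinction between in-place modification and rebinding concrete:

```python
import numpy as np

A = np.zeros(3)
B = A               # B is a second name for the same array object
B[0] = 99           # in-place modification: visible through A as well
print(A[0])         # 99.0

B = np.ones(3)      # rebinding: B now names a different array
print(A[0])         # still 99.0 -- A is unaffected

C = A.copy()        # explicit copy: new memory is allocated
C[1] = 7
print(A[1])         # still 0.0
```

This is exactly why the questioner's second snippet leaves randata unchanged: logdata = np.array([1,2,3]) rebinds the name, while the for loop mutated the shared array in place.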
I have a function takes_ownership() that performs an operation on a Numpy array a in place and effectively relies on becoming the owner of the data. Is there a way to alter the passed array such that it no longer points to its original buffer?
The motivation is a code that processes huge chunks of data in place (due to performance constraints) and a common error is unaware users recycling their input arrays.
Note that the question is very specifically not "why is a different if I change b = a in my function". The question is "how can I make using the given array in place safer for unsuspecting users" when I must not make a copy.
def takes_ownership(a):
    b = a
    # looking for this step
    a.set_internal_buffer([])

    # if this were C++ std::vectors, I would be looking for
    #     b = np.array([])
    #     a.swap(b)

    # an expensive operation that invalidates a
    b.resize((6, 6), refcheck=False)
    # outside references to a no longer valid
    return b
a = np.random.randn(5, 5)
b = takes_ownership(a)
# array no longer has data so that users cannot mess up
assert a.shape == ()
NumPy has a copy function that will clone an array (although if this is an object array, i.e. not primitive, there might still be nested object references after cloning). That being said, this is a questionable design pattern and it would probably be better practice to rewrite your code in a way that does not rely on this condition.
Edit: If you can't copy the array, you will probably need to make sure that none of your other code modifies it in place (instead running immutable operations to produce new arrays). Doing things this way may cause more issues down the road, so I'd recommend refactoring so that it is not necessary.
You can use NumPy's copy, which will do exactly what you want.
Use b = np.copy(a) and no further changes are needed.
If you want to make a copy of an object, in general, so that you can call methods on that object then you can use the copy module.
From the linked page:
A shallow copy constructs a new compound object and then (to the
extent possible) inserts references into it to the objects found in
the original.
A deep copy constructs a new compound object and then, recursively,
inserts copies into it of the objects found in the original.
In this case, import copy in your code and then use b = copy.copy(a) to get a shallow copy (which I think should be good enough for NumPy arrays, but you'll want to check that yourself).
The hanging question here is why this is needed. Python passes object references to functions, and the assignment operator = does not call any constructor for the name on its left-hand side; it simply binds that name to the object produced by the right-hand side. So when you call a method through either name (using the dot operator .), a and b operate on the same object in memory unless you create a new object with an explicit copy.
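For a plain numeric NumPy array, a quick check suggests the shallow copy really is enough, because the numeric buffer itself gets duplicated:

```python
import copy
import numpy as np

a = np.arange(5)
b = copy.copy(a)     # shallow copy; for a numeric array this duplicates the buffer
b[0] = 100
print(a[0], b[0])    # 0 100 -- modifying b does not affect a
```

For object-dtype arrays the elements are still shared references, so copy.deepcopy would be needed there.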
I have a class in python3 that contains a few variables and represents a state.
During the program (simulation) I need to make a big amount of copies of this state so that I can change it and still have the previous information.
The problem is that deepcopy from the copy module is too slow. Would I be better off creating a method in that class to copy an object, which would create and return a new object and copy the values of each variable? Note: inside the object there is a 3D list as well.
Is there any other solution to this? The deepcopy is really too slow, it takes more than 99% of the execution time according to cProfile.
Edit: Would representing the 3D list and other lists as numpy arrays/matrix and copying them with numpy inside a custom copy function be the better way?
For people from the future having the same problem:
What I did was creating a method inside the class that would manually copy the information. I did not override deepcopy, maybe that would be cleaner, maybe not.
I tried with and without numpy for 2D and 3D lists, but appending 2 numpy arrays later in the code was much more time consuming than making a sum of 2 lists (which I did need to do for my specific program).
So I used:
my_list = list(map(list, my_list)) # for 2D list
my_list = [list(map(list, x)) for x in my_list] # for 3D list
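A minimal sketch of that manual-copy approach follows; the State class and its fields are made up for illustration, since the original class isn't shown in the question.

```python
class State:
    def __init__(self, score, grid):
        self.score = score   # a plain immutable value, no copy needed
        self.grid = grid     # 3D list (a list of 2D lists)

    def fast_copy(self):
        # Copy only what the simulation actually mutates; far cheaper than
        # copy.deepcopy, which recursively inspects every object it visits.
        new_grid = [[list(row) for row in plane] for plane in self.grid]
        return State(self.score, new_grid)

s = State(0, [[[0, 0], [0, 0]], [[0, 0], [0, 0]]])
t = s.fast_copy()
t.grid[0][0][0] = 1
print(s.grid[0][0][0])   # 0 -- the original state is untouched
```

The inner list(...) calls are equivalent to the list(map(list, x)) idiom shown above; both rebuild every nested list so no sublist is shared between the copies.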
I'm confused about how NumPy methods are applied to nd-arrays. For example:
import numpy as np
a = np.array([[1,2,2],[5,2,3]])
b = a.transpose()
a.sort()
Here the transpose() method does not change a but returns the transposed version of a, while the sort() method sorts a in place and returns None. Does anybody have an idea why this is, and what the purpose of this different behaviour is?
Because numpy authors decided that some methods will be in place and some won't. Why? I don't know if anyone but them can answer that question.
'in-place' operations have the potential to be faster, especially when dealing with large arrays, as there is no need to re-allocate and copy the entire array, see answers to this question
BTW, most if not all array methods have a module-level counterpart that returns a new array. For example, arr.sort has the function numpy.sort(arr), which accepts an array and returns a new, sorted one (much like the global sorted function versus list.sort()).
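The contrast between the two is easy to verify:

```python
import numpy as np

a = np.array([[1, 2, 2], [5, 2, 3]])
b = np.sort(a)        # module-level function: returns a new sorted array
ret = a.sort()        # method: sorts a in place and returns None
print(ret)            # None
print(np.array_equal(a, b))   # True -- both are now sorted row-wise
```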
In a Python class (OOP) methods which operate in place (modify self or its attributes) are acceptable, and if anything, more common than ones that return a new object. That's also true for built in classes like dict or list.
For example, in numpy we often recommend the list-append approach to building a new array:
In [296]: alist = []
In [297]: for i in range(3):
...: alist.append(i)
...:
In [298]: alist
Out[298]: [0, 1, 2]
This is common enough that we can readily write it as a list comprehension:
In [299]: [i for i in range(3)]
Out[299]: [0, 1, 2]
alist.sort operates in-place, sorted(alist) returns a new list.
In numpy methods that return a new array are much more common. In fact sort is about the only in-place method I can think of off hand. That and a direct modification of shape: arr.shape=(...).
A number of basic numpy operations return a view. That shares data memory with the source, but the array object wrapper is new. In fact even indexing an element returns a new object.
So while you ultimately need to check the documentation, it's usually safe to assume a numpy function or method returns a new object, as opposed to operating in-place.
More often users are confused by the numpy functions that have the same name as a method. In most of those cases the function makes sure the argument(s) is an array, and then delegates the action to its method. Also keep in mind that in Python operators are translated into method calls - + to __add__, [index] to __getitem__() etc. += is a kind of in-place operation.
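np.shares_memory makes the view behaviour described above easy to check, using the array from the question:

```python
import numpy as np

a = np.array([[1, 2, 2], [5, 2, 3]])
b = a.transpose()                  # new array object, but a view of a's data
print(np.shares_memory(a, b))      # True -- same underlying buffer
b[0, 0] = 42
print(a[0, 0])                     # 42 -- the change shows through a
```

So transpose() is neither an in-place operation nor a copy: it is a new wrapper around the same memory.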
I have a numpy array that I would like to share between a bunch of python processes in a way that doesn't involve copies. I create a shared numpy array from an existing numpy array using the sharedmem package.
import sharedmem as shm

def convert_to_shared_array(A):
    shared_array = shm.shared_empty(A.shape, A.dtype, order="C")
    shared_array[...] = A
    return shared_array
My problem is that each subprocess needs to access rows that are randomly distributed in the array. Currently I create a shared numpy array using the sharedmem package and pass it to each subprocess. Each process also has a list, idx, of rows that it needs to access. The problem is in the subprocess when I do:
#idx = list of randomly distributed integers
local_array = shared_array[idx,:]
# Do stuff with local array
It creates a copy of the array instead of just another view. The array is quite large, and rearranging it before sharing so that each process accesses a contiguous range of rows, as in
local_array = shared_array[start:stop,:]
takes too long.
Question: What are good solutions for sharing random access to a numpy array between python processes that don't involve copying the array?
The subprocesses need readonly access (so no need for locking on access).
Fancy indexing induces a copy, so if you want to avoid copies you need to avoid fancy indexing; there is no way around it.
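np.shares_memory shows the difference between the two kinds of indexing (a plain ndarray is used here rather than the sharedmem array, but the indexing rules are the same):

```python
import numpy as np

shared_array = np.arange(20).reshape(5, 4)

view = shared_array[1:4, :]          # basic slicing: a view, no data copied
print(np.shares_memory(shared_array, view))          # True

idx = [0, 2, 4]
local_array = shared_array[idx, :]   # fancy indexing: always a fresh copy
print(np.shares_memory(shared_array, local_array))   # False
```

Because basic slices can only describe strided access patterns, randomly scattered rows cannot be expressed as a view; any workaround has to reorder the data or the row lists so each process reads a contiguous block.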