I'm trying to write code that adds two arrays (element by element) and stores the result in a third array.
Basic Logic:
arr3[i] = arr1[i] + arr2[i]
For this, I have created two arrays, arr1 and arr2. Each sum of an element of arr1 and the matching element of arr2 should be appended to an initially empty array arr3.
code:
from numpy import append, array, int8
arr1 = array([1,2,3,4,5])
arr2 = array([2,4,6,8,10])
len = max(arr1.size,arr2.size)
arr3 = array([],dtype=int8)
for i in range(len):
    append(arr3,arr1[i]+arr2[i])
    print(arr1[i]+arr2[i])
    print(arr3[i])
print(arr3)
In this code, I'm able to refer to elements of arr1 and arr2 and add them, but I'm not able to append the data to arr3.
Can anyone please help me understand which mistake in the code prevents the data from being stored in arr3?
You can simply use
arr3 = arr1 + arr2
The reason why your code doesn't work is that append doesn't mutate the array, but returns a new one. You can simply modify your code like this:
for i in range(len):
    arr3 = append(arr3,arr1[i]+arr2[i])
Using this as the loop bound could give indexing errors:
max(arr1.size,arr2.size)
If the arrays differ in size, range over this value produces index values that are too large for the smaller array.
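To make the size-mismatch point concrete, here is a minimal sketch with two made-up arrays (short and long_ are hypothetical names, not from the question). It shows that zip simply stops at the shorter input, while the vectorized sum raises an error for mismatched shapes.

```python
import numpy as np

# Hypothetical inputs of different sizes (not from the question)
short = np.array([1, 2, 3])
long_ = np.array([10, 20, 30, 40, 50])

# zip stops at the end of the shorter array, so no index goes out of range
pairwise = [int(x + y) for x, y in zip(short, long_)]

# The vectorized sum raises instead of silently truncating
try:
    short + long_
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(pairwise)
print(mismatch_error)
```

Whether truncating or raising is the right behaviour depends on the use case; checking arr1.shape == arr2.shape up front is one way to make the choice explicit.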
The straightforward way of summing the two arrays is
In [79]: arr1 = np.array([1,2,3,4,5])
...: arr2 = np.array([2,4,6,8,10])
In [80]: arr1+arr2
Out[80]: array([ 3, 6, 9, 12, 15])
It makes optimal use of the numpy array definitions, and is fastest.
If you must iterate (for example because you want to learn from your mistakes), use something like (which is actually better if the inputs are lists, not arrays):
In [86]: alist = []
    ...: for x,y in zip(arr1,arr2):
    ...:     alist.append(x+y)
    ...:
In [87]: alist
Out[87]: [3, 6, 9, 12, 15]
or better yet as a list comprehension
In [88]: [x+y for x,y in zip(arr1,arr2)]
Out[88]: [3, 6, 9, 12, 15]
I'm using zip instead of the arr1[i] types of range indexing. It's more concise, and less likely to produce errors.
np.append, despite the name, is not a list append clone. Read, if necessary reread, the np.append docs.
append : ndarray
A copy of `arr` with `values` appended to `axis`. Note that
`append` does not occur in-place: a new array is allocated and
filled. If `axis` is None, `out` is a flattened array.
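As a small illustration of the docstring above (the array values are made up for the example), the following sketch shows that the original array is left untouched and that the result is a separate array:

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append returns a new array; arr itself is not modified
result = np.append(arr, 4)

print(arr)     # the original three elements
print(result)  # a new array with the extra element

# The two arrays do not share a data buffer
print(np.shares_memory(arr, result))
```

This is why the return value has to be assigned back (arr3 = append(arr3, ...)) for the loop in the question to have any effect.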
This does work, but is slower:
In [90]: arr3 = np.array([])
    ...: for x,y in zip(arr1,arr2):
    ...:     arr3 = np.append(arr3,x+y)
    ...:
In [91]: arr3
Out[91]: array([ 3., 6., 9., 12., 15.])
I would like to remove np.append, since it misleads far too many beginners.
Iteration like this is great for lists, but best avoided when working with numpy arrays. Learn to use the defined numpy operators and methods, and use elementwise iteration only as last resort.
First things first
Do not use the name of a built-in function as a variable name.
len is a built-in function in Python.
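A minimal sketch of what goes wrong when len is reused as a variable name. The function shadowed_len_demo is a hypothetical helper written only for this illustration; the shadowing is kept inside it so the built-in stays usable elsewhere.

```python
def shadowed_len_demo():
    # Inside this function, the name len now refers to an int, not the built-in
    len = 5
    try:
        len([1, 2, 3])  # attempts to call the int
    except TypeError as exc:
        return str(exc)
    return None


message = shadowed_len_demo()
print(message)
```

A neutral name such as n or size avoids the problem entirely.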
@sagi's answer is right. Writing the for loop means your code is not time-optimized.
But if you still want to understand where your code went wrong, check the array shape:
import numpy as np
arr3 = np.array([],dtype=np.int8)
print (arr3.shape)
>>> (0,)
You could instead create an empty array with the same shape as arr1 or arr2; in your problem they appear to have the same dimensions.
arr3 = np.empty(arr1.shape, dtype=arr1.dtype)
arr3[:] = arr1 + arr2
If you still insist on using the dreaded for loop and append, then use this:
list3 = []
for x, y in zip(arr1, arr2):
    list3.append(x+y)
arr3 = np.asarray(list3)
print(arr3)
>>> [ 3  6  9 12 15]
Cheers, good luck!!
Let's say we have a simple 1D ndarray. That is:
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
I want to get the first 3 and the last 2 values, so that the output would be [ 1 2 3 9 10].
I have already solved this by slicing both ends and concatenating the slices as follows:
b= a[:2]
c= a[-2:]
a=np.concatenate([b,c])
However I would like to know if there is a more direct way to achieve this using slices, such as a[:2 and -2:] for instance. As an alternative I already tried this :
a = a[np.r_[:2, -2:]]
but it does not seem to work. It returns only the first 2 values, that is, [1 2].
Thanks in advance!
Slicing a numpy array needs to be continuous AFAIK. The np.r_[-2:] does not work because it does not know how big the array a is. You could do np.r_[:2, len(a)-2:len(a)], but this will still copy the data since you are indexing with another array.
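As a hedged aside that is not part of the original answer: np.r_ expands a slice with an explicit stop via arange, so giving the negative range a stop of 0 yields the indices -2 and -1 without needing len(a). The sketch below assumes the array from the question. Like any index-array lookup, it still copies the data.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# -2:0 is expanded like arange(-2, 0), i.e. the negative indices -2 and -1
idx = np.r_[:3, -2:0]
b = a[idx]

print(idx)
print(b)
```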
If you want to avoid copying data or doing any concatenation operation you could use np.lib.stride_tricks.as_strided:
ds = a.dtype.itemsize
np.lib.stride_tricks.as_strided(a, shape=(2,2), strides=(ds * 8, ds)).ravel()
Output:
array([ 1, 2, 9, 10])
But since you want the first 3 and last 2 values the stride for accessing the elements will not be equal. This is a bit trickier, but I suppose you could do:
np.lib.stride_tricks.as_strided(a, shape=(2,3), strides=(ds * 8, ds)).ravel()[:-1]
Output:
array([ 1, 2, 3, 9, 10])
However, this is a potentially dangerous operation, because the last element is read from outside the allocated memory.
On reflection, I cannot find a way to do this operation without copying the data somehow. The numpy ravel in the code snippets above is forced to make a copy of the data. If you can live with using the shapes (2,2) or (2,3), it might work in some cases, but you should only read from a strided view, and this should be enforced by setting the keyword writeable=False.
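To check the copy-versus-view distinction without touching as_strided, here is a small sketch using np.shares_memory on the array from the question. The variable names are chosen for this illustration only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

head = a[:3]                  # basic slice: a view onto a's buffer
picked = a[[0, 1, 2, 8, 9]]   # indexing with a list of indices: a new array

slice_shares = np.shares_memory(a, head)
fancy_shares = np.shares_memory(a, picked)

print(slice_shares, fancy_shares)
```

A single contiguous slice shares memory with a, while the selection of both ends does not, which matches the conclusion that some copy is unavoidable here.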
You could try to access the elements with a list of indices.
import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9,10])
b = a[[0,1,2,8,9]] # b should now be array([ 1, 2, 3, 9, 10])
Obviously, if your array is too long, you would not want to type out all the indices.
In that case, you could build the index list with loops or comprehensions.
Something like this:
index_list = [i for i in range(3)] + [i for i in range(8, 10)]
b = a[index_list] # b should now be array([ 1, 2, 3, 9, 10])
Therefore, as long as you know where your desired elements are, you can access them individually.
I am new to Python. I have a numpy array,
and I am trying to insert an element at given positions in the array.
for x in index:
    output_result[x:x] = [300]
But the element is not getting added. index holds the positions where I want to add that element. Can anyone help me with this?
Are you maybe looking for something like this:
import numpy as np
a = np.zeros(10) # create numpy array with ten zeros
a = np.where(a == 0, 300, a) # substitute 300 where there are zeros in the array (I assume this is what you need)
print(a) # print generated array
print(type(a)) # print data type to show a numpy array was generated
or do you want to "append" a new element?
With a Python list, you can insert a value with:
In [104]: alist = [0,1,2,3]
In [105]: alist[1:1]=[300]
In [106]: alist
Out[106]: [0, 300, 1, 2, 3]
But this does not work with ndarray. The array size is fixed. The best you can do is create a new array, with original values and the new one(s).
np.insert is a function that can do that. Since the operation is not particularly efficient, it's best to do a whole set of inserts at once, rather than do it iteratively.
In [108]: np.insert(np.arange(4),1,300)
Out[108]: array([ 0, 300, 1, 2, 3])
In [109]: np.insert(np.arange(4),[1,2],[300,400])
Out[109]: array([ 0, 300, 1, 400, 2, 3])
(Even with a list, iterative insertion can be tricky, since each insertion changes the size of the list. The insertion point has to take that into account (unless you iterate from the end).)
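A minimal sketch of the point about insertion order, reusing the positions and values from the np.insert example above. Iterating from the largest position down means earlier insertions do not shift the positions still to be processed. The names positions and values are chosen for this illustration.

```python
import numpy as np

positions = [1, 2]
values = [300, 400]

# List version: insert from the end so pending positions stay valid
alist = [0, 1, 2, 3]
for pos, val in sorted(zip(positions, values), reverse=True):
    alist[pos:pos] = [val]

# Array version: one np.insert call handles all positions at once
arr = np.insert(np.arange(4), positions, values)

print(alist)
print(arr)
```

Both approaches interpret the positions relative to the original sequence, so they give the same result.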
I'm trying to do some calculation (mean, sum, etc.) on a list containing numpy arrays.
For example:
list = [array([2, 3, 4]),array([4, 4, 4]),array([6, 5, 4])]
How can I retrieve the mean (for example),
either as a list like [4,4,4] or as a numpy array like array([4,4,4])?
Thanks in advance for your help!
EDIT : Sorry, I didn't explain properly what I was aiming to do : I would like to get the mean of i-th index of the arrays. For example, for index 0 :
(2+4+6)/3 = 4
I don't want this :
(2+3+4)/3 = 3
Therefore the end result will be
[4,4,4] and not [3,4,5]
If L were a list of scalars, then calculating the mean could be done using the straightforward expression:
sum(L) / len(L)
Luckily, this works unchanged on lists of arrays:
L = [np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])]
sum(L) / len(L)
# array([4., 4., 4.])
For this example, this happens to be quite a bit faster than the numpy function np.mean:
from timeit import timeit
timeit(lambda: np.mean(L, axis=0))
# 13.708808058872819
timeit(lambda: sum(L) / len(L))
# 3.4780975924804807
You can use a for loop and iterate through the elements of your list, if your list is not too big:
mean = []
for i in range(len(list)):
    mean.append(np.mean(list[i]))
Given a 1d array a, np.mean(a) should do the trick.
If you have a 2d array and want the means for each one separately, specify np.mean(a, axis=1).
There are equivalent functions for np.sum, etc.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
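The effect of the axis argument can be sketched with the three arrays from the question, stacked into one 2d array (the name stacked is chosen for this illustration). axis=0 averages across the arrays at each index, which is what the question's EDIT asks for, while axis=1 averages within each array.

```python
import numpy as np

# The three arrays from the question as rows of one 2d array
stacked = np.array([[2, 3, 4], [4, 4, 4], [6, 5, 4]])

per_index = stacked.mean(axis=0)  # mean of the i-th elements across arrays
per_array = stacked.mean(axis=1)  # mean within each individual array

print(per_index)
print(per_array)
```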
You can use np.mean with axis=0:
import numpy as np
my_list = [np.array([2, 3, 4]),np.array([4, 4, 4]),np.array([6, 5, 4])]
np.mean(my_list,axis=0) # array([4., 4., 4.])
Note: Do not name your variable list, as it will shadow the built-in.
I have an array:
a = [1, 3, 5, 7, 29 ... 5030, 6000]
This array gets created from a previous process, and the length of the array could be different (it is depending on user input).
I also have an array:
b = [3, 15, 67, 78, 138]
(Which could also be completely different)
I want to use the array b to slice the array a into multiple arrays.
More specifically, I want the result arrays to be:
array1 = a[:3]
array2 = a[3:15]
...
arrayn = a[138:]
Where n = len(b) + 1.
My first thought was to create a 2D array slices with dimension (len(b), something). However we don't know this something beforehand so I assigned it the value len(a) as that is the maximum amount of numbers that it could contain.
I have this code:
slices = np.zeros((len(b), len(a)))
for i in range(1, len(b)):
    slices[i] = a[b[i-1]:b[i]]
But I get this error:
ValueError: could not broadcast input array from shape (518) into shape (2253412)
You can use numpy.split:
np.split(a, b)
Example:
np.split(np.arange(10), [3,5])
# [array([0, 1, 2]), array([3, 4]), array([5, 6, 7, 8, 9])]
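A slightly larger sketch, with made-up stand-ins for a and b, showing that np.split returns len(b) + 1 pieces of different lengths and that each piece matches the manual slice from the question:

```python
import numpy as np

# Stand-ins for the arrays in the question (values are made up)
a = np.arange(20)
b = [3, 15]

pieces = np.split(a, b)
lengths = [len(p) for p in pieces]

print(lengths)
print(np.array_equal(pieces[1], a[3:15]))
```

Because the pieces have different lengths, the result is a plain Python list of arrays rather than one 2d array.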
b.insert(0,0)  # prepend 0 so the first slice starts at the beginning of a
result = []
for i in range(1,len(b)):
    sub_list = a[b[i-1]:b[i]]
    result.append(sub_list)
result.append(a[b[-1]:])  # everything after the last index
You are getting the error because you are attempting to create a ragged array. This is not allowed in numpy.
An improvement on @Bohdan's answer:
from itertools import zip_longest
result = [a[start:end] for start, end in zip_longest(np.r_[0, b], b)]
The trick here is that zip_longest makes the final slice go from b[-1] to None, which is equivalent to a[b[-1]:], removing the need for special processing of the last element.
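The pairing that zip_longest produces can be inspected directly. This sketch uses small made-up stand-ins for a and b and collects the (start, end) pairs in a list named bounds, a name chosen for this illustration.

```python
from itertools import zip_longest

import numpy as np

# Small stand-ins for the arrays in the question
a = np.arange(10)
b = [3, 5]

# np.r_[0, b] has one more element than b, so the last end is filled with None
bounds = list(zip_longest(np.r_[0, b], b))
result = [a[start:end] for start, end in bounds]

print(bounds)
print(result)
```

The final pair has None as its end, and a[5:None] is the same as a[5:].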
Please do not select this. This is just a thing I added for fun. The "correct" answer is @Psidom's answer.
I'm dealing with arrays in Python, and this has raised a lot of questions...
1) I produce a list of lists by reading 4 columns from N files, storing 4 elements N times in a list. I then convert this list to a numpy array:
s = np.array(s)
and I ask for the shape of this array. The answer is correct:
print(s.shape)
#(N,4)
I then produce the mean of this Nx4 array:
s_m = sum(s)/len(s)
print(s_m.shape)
#(4,)
which I guess means that this array is a 1D array. Is this correct?
2) If I subtract the mean vector s_m from the rows of the array s, I can proceed in two ways:
residuals_s = s - s_m
or:
residuals_s = []
for i in range(len(s)):
    residuals_s.append([])
    tmp = s[i] - s_m
    residuals_s.append(tmp)
If I now ask for the shape of residuals_s in the two cases, I obtain two different answers. In the first case I obtain:
(N,4)
in the second:
(N,1,4)
can someone explain why there is an additional dimension?
You can get the mean using the numpy method (producing the same (4,) shape):
s_m = s.mean(axis=0)
s - s_m works because s_m is 'broadcasted' to the dimensions of s.
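A minimal sketch of that broadcasting step, using a small made-up (3,4) array in place of the (N,4) data. The explicit version built with np.tile is only there to show what broadcasting does implicitly.

```python
import numpy as np

# Made-up stand-in for the (N,4) array in the question
s = np.arange(12, dtype=float).reshape(3, 4)

s_m = s.mean(axis=0)   # shape (4,)
residuals = s - s_m    # s_m is broadcast across the rows of s

# Equivalent, but with the repetition written out by hand
explicit = s - np.tile(s_m, (s.shape[0], 1))

print(s_m.shape, residuals.shape)
print(np.allclose(residuals, explicit))
```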
If I run your second residuals_s I get a list containing empty lists and arrays:
[[],
array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
[],
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...
]
That does not convert to a (N,1,4) array, but rather a (M,) array with dtype=object. Did you copy and paste correctly?
A corrected iteration is:
residuals_s = []
for i in range(len(s)):
    residuals_s.append(s[i]-s_m)
This produces a simpler list of arrays:
[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ]),
array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ]),
...]
which converts to a (N,4) array.
Iteration like this usually is not needed. But if it is, appending to lists like this is one way to go. Another is to preallocate an array and assign rows:
residuals_s = np.zeros_like(s)
for i in range(s.shape[0]):
    residuals_s[i,:] = s[i]-s_m
I get your (N,1,4) with:
In [39]: residuals_s=[]
In [40]: for i in range(len(s)):
   ....:     residuals_s.append([])
   ....:     tmp = s[i] - s_m
   ....:     residuals_s[-1].append(tmp)
In [41]: residuals_s
Out[41]:
[[array([ 1.02649662, 0.43613824, 0.66276758, 2.0082684 ])],
[array([ 1.13000227, -0.94129685, 0.63411801, -0.383982 ])],
...]
In [43]: np.array(residuals_s).shape
Out[43]: (10, 1, 4)
Here the s[i]-s_m array is appended to an empty list, which has been appended to the main list. So it's an array within a list within a list. It's this intermediate list that produces the middle 1 dimension.
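The effect of the extra list level can be reproduced with a tiny made-up example. The names flat_list and nested_list are chosen for this illustration, and np.squeeze is shown as one possible way to drop the length-1 axis afterwards.

```python
import numpy as np

row = np.array([1.0, 2.0, 3.0, 4.0])

flat_list = [row, row]          # list of arrays
nested_list = [[row], [row]]    # each array wrapped in its own list

flat_shape = np.array(flat_list).shape
nested_shape = np.array(nested_list).shape

# Dropping the middle axis recovers the (N,4) layout
squeezed_shape = np.squeeze(np.array(nested_list), axis=1).shape

print(flat_shape, nested_shape, squeezed_shape)
```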
You are using a NumPy ndarray without using the functions in NumPy. sum() is a Python built-in function; you should use numpy.sum() instead.
I suggest you change your code to:
import numpy as np
np.random.seed(0)
s = np.random.randn(10, 4)
s_m = np.mean(s, axis=0, keepdims=True)
residuals_s = s - s_m
print(s.shape, s_m.shape, residuals_s.shape)
Using the mean() function with the axis and keepdims arguments will give you the correct result.