What is the best way to access two consecutive values in a numpy array?
example:
npdata = np.array([13, 15, 20, 25])
for i in range(len(npdata)):
    print(npdata[i] - npdata[i + 1])
This looks really messy, and it additionally needs exception handling for the last iteration of the loop, where i + 1 runs past the end of the array.
any ideas?
Thanks!
numpy provides the function diff for exactly this use case:
>>> import numpy
>>> x = numpy.array([1, 2, 4, 7, 0])
>>> numpy.diff(x)
array([ 1,  2,  3, -7])
Your snippet computes something closer to -numpy.diff(x).
How about range(len(npdata) - 1)?
Here's code (using a plain Python list, but it doesn't matter):
>>> ar = [1, 2, 3, 4, 5]
>>> for i in range(len(ar) - 1):
...     print(ar[i] + ar[i + 1])
...
3
5
7
9
As you can see, it successfully prints the sums of all consecutive pairs in the array, without needing any exception handling for the last iteration.
You can use ediff1d to get differences of consecutive elements. More generally, a[1:] - a[:-1] will give the differences of consecutive elements and can be used with other operators as well.
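For example, both forms give the consecutive differences of the question's npdata:
>>> import numpy as np
>>> a = np.array([13, 15, 20, 25])
>>> a[1:] - a[:-1]
array([2, 5, 5])
>>> np.ediff1d(a)
array([2, 5, 5])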
I am taking the Data Science course on DataCamp. In one of the examples there was no real explanation of numpy's addition rules. I am including a picture of the example and my question below. What I did not understand was how two arrays with different values can be added up and give a result like that.
[Image: DataCamp numpy example]
Python code:
In [1]:
np.array([True, 1, 2]) + np.array([3, 4, False])
Out[1]:
array([4, 5, 2])
You can think of a numpy 1d array as a list in python.
In fact you can see this if you cast to a list like this:
# cast to a list
a = np.array([True, 1, 2]).tolist()
b = np.array([3, 4, False]).tolist()
# print them out
print(a) # [1,1,2]
print(b) # [3,4,0]
which prints:
[1, 1, 2]
[3, 4, 0]
You are then just adding each element of the lists.
a[0]+b[0], a[1]+b[1], a[2]+b[2]
So the (numpy) result is this:
[4,5,2]
Because you are using numpy (which is a module in python), the plus (+) operation returns the result as a numpy array, the element-wise sum of both arrays. Note that the booleans are converted first: True counts as 1 and False as 0.
Note: numpy arrays are similar, but not identical to python lists.
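To see why this happens, note that numpy promotes the mixed booleans and integers to a single integer dtype when it builds each array, so the conversion happens before the addition ever runs. A quick check (the exact dtype, e.g. int64, depends on your platform):
>>> import numpy as np
>>> np.array([True, 1, 2]).dtype
dtype('int64')
>>> np.array([True, 1, 2])
array([1, 1, 2])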
I'm trying to do some calculations (mean, sum, etc.) on a list containing numpy arrays.
For example:
list = [array([2, 3, 4]), array([4, 4, 4]), array([6, 5, 4])]
How can I retrieve the mean (for example)?
In a list like [4, 4, 4], or a numpy array like array([4, 4, 4])?
Thanks in advance for your help!
EDIT: Sorry, I didn't explain properly what I was aiming for: I would like to get the mean of the i-th elements of the arrays. For example, for index 0:
(2+4+6)/3 = 4
I don't want this :
(2+3+4)/3 = 3
Therefore the end result would be [4, 4, 4], not [3, 4, 5].
If L were a list of scalars then calculating the mean could be done using the straightforward expression:
sum(L) / len(L)
Luckily, this works unchanged on lists of arrays, since the built-in sum adds the arrays element-wise:
L = [np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])]
sum(L) / len(L)
# array([4., 4., 4.])
For this example this happens to be quite a bit faster than the numpy function np.mean:
from timeit import timeit
timeit(lambda: np.mean(L, axis=0))
# 13.708808058872819
timeit(lambda: sum(L) / len(L))
# 3.4780975924804807
You can use a for loop and iterate through the elements of your array, if your list is not too big:
mean = []
for i in range(len(list)):
    mean.append(np.mean(list[i]))
Given a 1d array a, np.mean(a) should do the trick.
If you have a 2d array and want the means for each one separately, specify np.mean(a, axis=1).
There are equivalent functions for np.sum, etc.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html
You can use np.mean with axis=0:
import numpy as np
my_list = [np.array([2, 3, 4]), np.array([4, 4, 4]), np.array([6, 5, 4])]
np.mean(my_list, axis=0)  # array([4., 4., 4.])
Note: Do not name your variable list, as it will shadow the built-in.
I have a question about how to get the average of every other element of a list in Python.
Ex:
a = [1, 3, 4, 1, 5, 2]
In this case it needs to compute (1 + 4 + 5)/3 for the elements at even indices, and (3 + 1 + 2)/3 for those at odd indices. The new list would have the following values:
amean = [3.3333, 2]
So far I have managed to compute the first average, but I have no idea how to write a loop that goes back and starts averaging from the second element, (3 + 1 + 2)/3.
Here's a piece of what I have done so far:
import numpy as np
a = [1.,3.,4.,1., 5., 2.]
def altElement(my_list):
    b = my_list[:len(my_list):2]
    print(b)
    return np.mean(b)

print(altElement(a))
Does anyone have any idea how to create this loop?
import numpy as np
a = np.asarray([1, 3, 4, 1, 5, 2])
print(a[::2].mean())   # mean of elements at even indices (1st, 3rd, 5th values)
print(a[1::2].mean())  # mean of elements at odd indices (2nd, 4th, 6th values)
Output:
3.3333333333333335
2.0
Edit as per comment (every 24 elements):
import numpy as np
a = range(1, 73)
for i in zip(*[iter(a)] * 24):
    print(np.array(i).mean())
Output:
12.5
36.5
60.5
my_list[1::2].mean() will give you the mean of the other elements.
If you want pure Python and not Numpy:
mean = [sum(a[i::2]) / len(a[i::2]) for i in range(2)]
In Python 2 you would also want to add from __future__ import division (or map(float, a)) to avoid integer division; in Python 3, / already performs true division.
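For the example input this produces both means at once:
>>> a = [1., 3., 4., 1., 5., 2.]
>>> [sum(a[i::2]) / len(a[i::2]) for i in range(2)]
[3.3333333333333335, 2.0]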
Another approach: assuming that you have an even number of elements, you can reshape the array so that the odd elements appear in the first column and the even elements appear in the second column of a 2D array, then take the mean of each column:
b = np.array([a]).reshape(-1,2).mean(axis=0)
Example Output
>>> a = [1.,3.,4.,1., 5., 2.]
>>> b = np.array([a]).reshape(-1,2).mean(axis=0)
>>> b
array([ 3.33333333, 2. ])
The output is of course a NumPy array, so if you would rather have a list, simply invoke the tolist() method on the NumPy array:
>>> b.tolist()
[3.3333333333333335, 2.0]
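The same reshape idea also covers the every-24-elements variant from the earlier answer: reshape into rows of 24 and take row-wise means (axis=1) instead of column-wise ones. A quick sketch, assuming the length is a multiple of 24:
>>> a = np.arange(1, 73)
>>> a.reshape(-1, 24).mean(axis=1)
array([12.5, 36.5, 60.5])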
The following is an inefficient solution, but because the question is very basic, it may be useful to see the most basic approach before the more efficient ones achievable with numpy or a list comprehension.
a = [1, 3, 4, 1, 5, 2]
list_1 = []
list_2 = []
for idx, elem in enumerate(a):
    if idx % 2 == 0:
        list_1.append(elem)
    else:
        list_2.append(elem)
print("Mean of the first every-other elements: ", sum(list_1) / float(len(list_1)))
print("Mean of the second every-other elements: ", sum(list_2) / float(len(list_2)))
Suppose I have an array
a = np.array([1, 2, 1, 3, 3, 3, 0])
How can I (efficiently, Pythonically) find which elements of a are duplicates (i.e., non-unique values)? In this case the result would be array([1, 3, 3]) or possibly array([1, 3]) if efficient.
I've come up with a few methods that appear to work:
Masking
m = np.zeros_like(a, dtype=bool)
m[np.unique(a, return_index=True)[1]] = True
a[~m]
Set operations
a[~np.in1d(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]
This one is cute but probably illegal (as a isn't actually unique):
np.setxor1d(a, np.unique(a), assume_unique=True)
Histograms
u, i = np.unique(a, return_inverse=True)
u[np.bincount(i) > 1]
Sorting
s = np.sort(a, axis=None)
s[:-1][s[1:] == s[:-1]]
Pandas
s = pd.Series(a)
s[s.duplicated()]
Is there anything I've missed? I'm not necessarily looking for a numpy-only solution, but it has to work with numpy data types and be efficient on medium-sized data sets (up to 10 million in size).
Conclusions
Testing with a 10-million-element data set (on a 2.8 GHz Xeon):
a = np.random.randint(10**7, size=10**7)
The fastest is sorting, at 1.1s. The dubious xor1d is second at 2.6s, followed by masking and Pandas Series.duplicated at 3.1s, bincount at 5.6s, and in1d and senderle's setdiff1d both at 7.3s. Steven's Counter is only a little slower, at 10.5s; trailing behind are Burhan's Counter.most_common at 110s and DSM's Counter subtraction at 360s.
I'm going to use sorting for performance, but I'm accepting Steven's answer because the performance is acceptable and it feels clearer and more Pythonic.
Edit: discovered the Pandas solution. If Pandas is available it's clear and performs well.
As of numpy version 1.9.0, np.unique has an argument return_counts which greatly simplifies your task:
u, c = np.unique(a, return_counts=True)
dup = u[c > 1]
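# for the example a above, dup is now array([1, 3])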
This is similar to using Counter, except you get a pair of arrays instead of a mapping. I'd be curious to see how they perform relative to each other.
It's probably worth mentioning that even though np.unique is quite fast in practice due to its numpyness, it has worse algorithmic complexity than the Counter solution. np.unique is sort-based, so runs asymptotically in O(n log n) time. Counter is hash-based, so has O(n) complexity. This will not matter much for anything but the largest datasets.
I think this is most clearly done outside of numpy. You'll have to time it against your numpy solutions if you are concerned with speed.
>>> import numpy as np
>>> from collections import Counter
>>> a = np.array([1, 2, 1, 3, 3, 3, 0])
>>> [item for item, count in Counter(a).items() if count > 1]
[1, 3]
Note: this is similar to Burhan Khalid's answer, but the use of items without subscripting in the condition should be faster.
People have already suggested Counter variants, but here's one which doesn't use a listcomp:
>>> from collections import Counter
>>> a = [1, 2, 1, 3, 3, 3, 0]
>>> list((Counter(a) - Counter(set(a))).keys())
[1, 3]
[Posted not because it's efficient -- it's not -- but because I think it's cute that you can subtract Counter instances.]
For Python 2.7+
>>> import numpy
>>> from collections import Counter
>>> n = numpy.array([1,1,2,3,3,3,0])
>>> [x[0] for x in Counter(n).most_common() if x[1] > 1]
[3, 1]
Here's another approach using set operations that I think is a bit more straightforward than the ones you offer:
>>> indices = np.setdiff1d(np.arange(len(a)), np.unique(a, return_index=True)[1])
>>> a[indices]
array([1, 3, 3])
I suppose you're asking for numpy-only solutions, since if that's not the case, it's very difficult to argue with just using a Counter instead. I think you should make that requirement explicit though.
If a is made up of small integers you can use numpy.bincount directly:
import numpy as np
a = np.array([3, 2, 2, 0, 4, 3])
counts = np.bincount(a)
print(np.where(counts > 1)[0])
# [2 3]
This is very similar to your "histogram" method, which is the one I would use if a was not made up of small integers.
If the array is a sorted numpy array, then just do:
a = np.array([1, 2, 2, 3, 4, 5, 5, 6])
rep_el = a[np.diff(a) == 0]
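This gives one entry per repeated occurrence (a value appearing three times shows up twice); if you want each duplicated value only once, wrap the result in np.unique:
rep_el = np.unique(a[np.diff(a) == 0])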
I'm adding my solution to the pile for this 3-year-old question because none of the solutions fit what I wanted, or they used libraries besides numpy. This method finds both the indices of duplicates and the values for distinct sets of duplicates.
import numpy as np
A = np.array([1,2,3,4,4,4,5,6,6,7,8])
# Record the indices where each unique element occurs.
list_of_dup_inds = [np.where(A == u)[0] for u in np.unique(A)]
# Filter out non-duplicates.
list_of_dup_inds = [inds for inds in list_of_dup_inds if len(inds) > 1]
for inds in list_of_dup_inds:
    print(inds, A[inds])
# >> [3 4 5] [4 4 4]
# >> [7 8] [6 6]
>>> import numpy as np
>>> a = np.array([1, 2, 2, 2, 2, 3])
>>> uniques, uniq_idx, counts = np.unique(a, return_index=True, return_counts=True)
>>> duplicates = a[uniq_idx[counts >= 2]]  # <--- get duplicates
If you also want to get the orphans:
>>> orphans = a[uniq_idx[counts == 1]]
A combination of Pandas and NumPy, utilizing value_counts():
import pandas as pd
import numpy as np
arr = np.array(('a', 'b', 'b', 'c', 'a'))
pd.Series(arr).value_counts()
OUTPUT:
a    2
b    2
c    1
dtype: int64
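To go from the counts to the duplicated values themselves, you can filter the resulting Series (a small sketch building on the code above):
vc = pd.Series(arr).value_counts()
print(vc.index[vc > 1].values)  # ['a' 'b']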
I've been writing a program to brute-force check a sequence of numbers to look for Euler bricks, but the method that I came up with involves a triple loop. Since nested Python loops are notoriously slow, I was wondering if there is a better way to use numpy to create the arrays of values that I need.
# x = max side length of brick. User input.
for t in range(3, x):
    a = []; b = []; c = []
    for u in range(2, t):
        for v in range(1, u):
            a.append(t)
            b.append(u)
            c.append(v)
    a = np.array(a)
    b = np.array(b)
    c = np.array(c)
    ...
Is there a better way to generate the arrays of values, using numpy commands?
Thanks.
Example:
If x=10, when t=3 I want to get:
a=[3]
b=[2]
c=[1]
the first time through the loop. After that, when t=4:
a=[4, 4, 4]
b=[2, 3, 3]
c=[1, 1, 2]
The third time (t=5) I want:
a=[5, 5, 5, 5, 5, 5]
b=[2, 3, 3, 4, 4, 4]
c=[1, 1, 2, 1, 2, 3]
and so on, up to max side lengths around 5000 or so.
EDIT: Solution
a = np.array(3)
b = np.array(2)
c = np.array(1)
for i in range(4, x):  # Removing the (3, 2, 1) check from the code does not affect results.
    foo = np.arange(1, i - 1)
    foo2 = np.empty(len(foo))
    foo2.fill(i - 1)
    c = np.hstack((c, foo))
    b = np.hstack((b, foo2))
    a = np.empty(len(b))
    a.fill(i)
    ...
Works many times faster now. Thanks all.
Try using np.empty and .fill (http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.fill.html).
There are a couple of things that could help, but probably only for large values of x. For starters, in Python 2 use xrange instead of range, which saves creating a list you never need (in Python 3, range is already lazy). You could also create empty numpy arrays of the correct length and fill them up with the values as you go, instead of appending to a list and then converting it into a numpy array.
I believe this code will work (no python access right this second):
for t in range(3, x):
    size = (t - 2) * (t - 1) // 2  # number of (u, v) pairs: 1 + 2 + ... + (t - 2)
    a = np.zeros(size)
    b = np.zeros(size)
    c = np.zeros(size)
    idx = 0
    for u in range(2, t):
        for v in range(1, u):
            a[idx] = t
            b[idx] = u
            c[idx] = v
            idx += 1
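For completeness, here is a sketch of a fully vectorized way to build the three arrays for a given t, with no inner Python loops at all. The helper name sides is mine, not from the question; it assumes t >= 3:

import numpy as np

def sides(t):
    # For u = 2 .. t-1 the inner loop yields u - 1 values of v,
    # so the group sizes are 1, 2, ..., t - 2.
    sizes = np.arange(1, t - 1)
    total = sizes.sum()                    # (t - 2) * (t - 1) // 2 pairs in all
    b = np.repeat(np.arange(2, t), sizes)  # each u repeated u - 1 times
    starts = np.repeat(np.cumsum(sizes) - sizes, sizes)
    c = np.arange(total) - starts + 1      # v counts up 1 .. u-1 inside each group
    a = np.full(total, t)
    return a, b, c

For example, sides(5) returns ([5 5 5 5 5 5], [2 3 3 4 4 4], [1 1 2 1 2 3]), matching the t=5 case shown in the question.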