I am new to Python.
If I have an (m x n) array, how can I find which column has the most repetitions of a particular value, e.g. 1? Is there an easy operation for this, rather than writing iterative loops?
Welcome to python and numpy. You can get the column with the most 1s by first checking which values in your array are 1, then counting that along each column and finally taking the argmax. In code it looks something like this:
>>> import numpy as np
>>> (m, n) = (4, 5)
>>> a = np.zeros((m, n))
>>> a[2, 3] = 1.
>>>
>>> a_eq_1 = a == 1
>>> repetitions = a_eq_1.sum(axis=0)
>>> np.argmax(repetitions)
3
Or more compactly:
>>> np.argmax((a == 1).sum(axis=0))
3
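The same idea works for any target value. Below is a minimal sketch; the helper name column_with_most is made up for illustration, and np.count_nonzero is used here in place of .sum() to count the matches. Note that np.argmax returns the first such column when several columns tie.

```python
import numpy as np

def column_with_most(a, value):
    """Return the index of the column containing `value` most often.

    Hypothetical helper for illustration; ties resolve to the first column.
    """
    counts = np.count_nonzero(a == value, axis=0)  # matches per column
    return int(np.argmax(counts))

a = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 2, 1]])
best = column_with_most(a, 1)  # column 2 holds three 1s
```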
I have a series of 1D arrays of different lengths greater than 1.
I would like to find in s the numbers that appear together in more than one array, and how many arrays they appear together in.
import numpy as np
import pandas as pd
a=np.array([1,2,3])
b=np.array([])
c=np.array([2,3,4,5,6])
d=np.array([2,3,4,5,6,9,15])
e=np.array([5,6])
s=pd.Series([a,b,c,d,e])
In this example the desired outcome would be something like
{[2,3]:3, [5,6]:3, [2,3,4,5,6]:2}
The expected result does not need to be a dictionary but any structure that contains this information.
Also, I would have to do that for >200 series like s, so performance also matters to me.
I have tried
result=s.value_counts()
but I can't figure out how to proceed.
I think what you are missing here is a way to build the combinations of numbers present in each array, to then be able to count how many times each combination appears. To do that you can use stuff like the built-in itertools module:
from itertools import combinations
import numpy as np
a = np.array([1,2,3])
for c in combinations(a, 2):
    print(c)
>>> (1, 2)
>>> (1, 3)
>>> (2, 3)
So using this, you can then build a series for each length and check how many times each combination of length 2 happens, how many times each combination of length 3 happens and so on.
from itertools import combinations
import numpy as np
import pandas as pd
a=np.array([1,2,3])
b=np.array([])
c=np.array([2,3,4,5,6])
d=np.array([2,3,4,5,6,9,15])
e=np.array([5,6])
all_arrays = a, b, c, d, e
maxsize = max(array.size for array in all_arrays)
for length in range(2, maxsize+1):
    length_N_combs = pd.Series(x for array in all_arrays for x in combinations(array, length) if array.size >= length)
    counts = length_N_combs.value_counts()
    print(counts[counts>1])
From here you can format the output however you like. Note that you have to exclude arrays that are too short. I'm using a generator expression for a slight increase in efficiency, but note that this algorithm is not going to be cheap anyway: you need a lot of comparisons.
Generator expressions are a way to condense a generator function into a one-liner (and much more than that). In this case, the nested expression above is roughly equivalent to defining a generator function that yields from the iterator that combinations returns, and calling that function to build the pandas Series. Something like this will give you the same result:
def length_N_combs_generator(arrays, length):
    for array in arrays:
        if array.size >= length:
            yield from combinations(array, length)

for length in range(2, maxsize+1):
    s = pd.Series(length_N_combs_generator(all_arrays, length))
    counts = s.value_counts()
    print(counts[counts>1])
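If a dictionary is preferred over printed Series, the counting can also be sketched with collections.Counter instead of value_counts. This is only an illustrative variant, not part of the answer above: the function name shared_combinations is made up, and it assumes each array is sorted and free of duplicates (as in the example), since otherwise (2, 3) and (3, 2) would be counted as different keys.

```python
from collections import Counter
from itertools import combinations

import numpy as np

arrays = [
    np.array([1, 2, 3]),
    np.array([]),
    np.array([2, 3, 4, 5, 6]),
    np.array([2, 3, 4, 5, 6, 9, 15]),
    np.array([5, 6]),
]

def shared_combinations(arrays, length):
    """Count length-`length` combinations that occur in more than one array."""
    counts = Counter(
        tuple(int(v) for v in comb)      # plain ints give readable, hashable keys
        for array in arrays
        if array.size >= length          # skip arrays that are too short
        for comb in combinations(array, length)
    )
    return {comb: n for comb, n in counts.items() if n > 1}

pairs = shared_combinations(arrays, 2)
quintuples = shared_combinations(arrays, 5)
```

Here pairs maps (2, 3) and (5, 6) to 3, and every other pair drawn from 2..6 to 2.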
You can use set operations:
from itertools import combinations
from collections import Counter
s2 = s.apply(frozenset).sort_values(key=lambda s: s.str.len(), ascending=False)
c = Counter(x for a,b in combinations(s2, 2) if len(x:=a&b))
# increment all values by 1
for k in c:
    c[k] += 1
dict(c)
Output:
{frozenset({2, 3, 4, 5, 6}): 2, frozenset({2, 3}): 3, frozenset({5, 6}): 3}
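One caveat: the Counter above counts pairs of arrays, so adding 1 is not guaranteed to give the number of arrays for other inputs (three identical arrays, for instance, produce three pairs). A possible variant, sketched here as an assumption rather than as the answerer's method, takes the pairwise intersections as candidates and then counts directly how many arrays contain each candidate. Plain lists stand in for the numpy arrays, and the minimum group size of 2 is my reading of "appear together".

```python
from itertools import combinations

arrays = [[1, 2, 3], [], [2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 9, 15], [5, 6]]
sets = [frozenset(a) for a in arrays]

# Candidate groups: every pairwise intersection with at least two members.
candidates = {a & b for a, b in combinations(sets, 2) if len(a & b) >= 2}

# For each candidate, count the arrays that contain all of its members.
result = {group: sum(group <= s for s in sets) for group in candidates}
```

For the example data this reproduces the output shown above. It does one subset test per candidate and array, so it is a sketch to check results against rather than a tuned solution.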
How can I sum only the elements at a given list of indices of a numpy array? For example, if I have an array a = [1,2,3,4] and a list of indices to sum, indices = [0, 2], I want a fast operation that gives me the answer 4, because the values at index 0 and index 2 in a sum to 4.
You can use sum directly after indexing with indices:
import numpy as np
a = np.array([1,2,3,4])
indices = [0, 2]
a[indices].sum()
The accepted a[indices].sum() approach copies data and creates a new array, which might cause problems if the array is large. np.sum actually has a where argument to mask out elements, so you can just do
np.sum(a, where=[True, False, True, False])
This doesn't copy any of the array's data.
The mask array can be obtained by:
mask = np.full(4, False)
mask[np.array([0,2])] = True
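Putting the pieces together, here is a small sketch that compares the two approaches on the example array. It assumes a NumPy version whose np.sum accepts the where argument (1.17 or later). Two things are worth keeping in mind: the mask is itself a boolean array as long as a, so the approach is not allocation-free, and a mask cannot represent a repeated index.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
indices = [0, 2]

# Boolean mask with True at the positions to include in the sum.
mask = np.zeros(a.shape, dtype=bool)
mask[indices] = True

masked_total = np.sum(a, where=mask)   # sums a[0] and a[2]
indexed_total = a[indices].sum()       # fancy indexing builds a temporary array

# A repeated index is counted twice by fancy indexing but only once by the mask.
repeated = [0, 0, 2]
mask_rep = np.zeros(a.shape, dtype=bool)
mask_rep[repeated] = True
```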
Try:
>>> a = [1,2,3,4]
>>> indices = [0, 2]
>>> sum(a[i] for i in indices)
4
Faster
If you have a lot of numbers and you want high speed, then you need to use numpy:
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> a[indices]
array([1, 3])
>>> np.sum(a[indices])
4
Suppose I have an array, and I want to compute differences between elements at a distance Delta. I can use numpy.diff(Array[::Delta-1]), but this will not give all possible differences (from each possible starting point). To get them, I can think of something like this:
for j in range(Delta-1):
    NewDiff = numpy.diff(Array[j::Delta-1])
    if j==0:
        Diff = NewDiff
    else:
        Diff = numpy.hstack((Diff,NewDiff))
But I would be surprised if this is the most efficient way to do it. Any idea from those familiar with the most esoteric functionalities of numpy?
The following function returns a two-dimensional numpy array diff which contains the differences between all possible combinations of a list or numpy array a. For example, diff[3,2] would contain the result of a[3] - a[2] and so on.
import numpy as np

def difference_matrix(a):
    x = np.reshape(a, (len(a), 1))
    return x - x.transpose()
Update
It seems I misunderstood the question and you are only asking for the differences of array elements which are a certain distance d apart. 1)
This can be accomplished as follows:
>>> a = np.array([1,3,7,11,13,17,19])
>>> d = 2
>>> a[d:] - a[:-d]
array([6, 8, 6, 6, 6])
Have a look at the documentation to learn more about this notation.
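Wrapped up as a function, the slicing trick might look like the sketch below. The name lagged_diff is made up for illustration, and d is assumed to be at least 1 (with d = 0 the slice a[:-0] is empty and the subtraction fails). For d = 1 the result is the same as numpy.diff.

```python
import numpy as np

def lagged_diff(a, d):
    """Return a[i + d] - a[i] for every valid i (hypothetical helper, d >= 1)."""
    a = np.asarray(a)
    return a[d:] - a[:-d]

a = np.array([1, 3, 7, 11, 13, 17, 19])
two_apart = lagged_diff(a, 2)   # same values as in the session above
one_apart = lagged_diff(a, 1)   # identical to np.diff(a)
```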
But, the function for the difference matrix I've posted above shall not be in vain. In fact, the array you're looking for is a diagonal of the matrix that difference_matrix returns.
>>> a = [1,3,7,11,13,17,19]
>>> d = 2
>>> m = difference_matrix(a)
>>> np.diag(m, -d)
array([6, 8, 6, 6, 6])
1) Judging by your comment, this distance d is different than the Delta you seem to be using, with d = Delta - 1, so that the distance between an element and itself is 0, and its distance to the adjacent elements is 1.
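To make the relation to the loop from the question concrete, the sketch below runs both versions on the example array with Delta = 3 (so d = 2). The loop is written as a list comprehension here, which is my condensation of the original code. Both produce the same differences, only in a different order: the loop groups them by starting offset, while the slice keeps them in array order.

```python
import numpy as np

a = np.array([1, 3, 7, 11, 13, 17, 19])
Delta = 3
d = Delta - 1  # distance convention from the footnote

# Loop from the question: one np.diff per starting offset, then concatenate.
looped = np.hstack([np.diff(a[j::d]) for j in range(d)])

# Single slice subtraction.
sliced = a[d:] - a[:-d]
```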
I have 2 arrays, for the sake of simplicity let's say the original one is a random set of numbers:
import numpy as np
a=np.random.rand(N)
Then I sample and shuffle a subset from this array:
b = np.array([...])  # shuffled subset of a, size < N
The shuffling I do does not store the index values, so b is an unordered subset of a.
Is there an easy way to get the original indices of the elements of b, so that they can be put in the same order as in a? Say, if element 2 of b has index 4 in a, create an array of these assignments.
I could use a for loop checking element by element, but perhaps there is a more pythonic way.
Thanks
I think the most computationally efficient thing to do is to keep track of the indices that associate b with a as b is created.
For example, instead of sampling a, sample the indices of a:
import random
indices = random.sample(range(len(a)), k) # k < N
b = a[indices]
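The same idea can be sketched with numpy's own random generator instead of the random module. The sizes N and k and the seed are arbitrary choices for illustration. Because the indices are kept, putting b back into the order of a is just a matter of sorting them.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only to make the sketch repeatable
N, k = 10, 4                        # illustrative sizes, k < N
a = rng.random(N)

# Draw k distinct positions in random order, then index a with them.
indices = rng.choice(N, size=k, replace=False)
b = a[indices]

# indices[i] is the position in a of b[i]; sorting restores the order of a.
order_in_a = np.sort(indices)
b_in_a_order = b[np.argsort(indices)]
```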
On the off chance a happens to be sorted you could do:
>>> from numpy import array
>>> a = array([1, 3, 4, 10, 11])
>>> b = array([11, 1, 4])
>>> a.searchsorted(b)
array([4, 0, 2])
If a is not sorted you're probably best off going with something like @unutbu's answer.
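For reference, one common way to handle an unsorted a (not necessarily the approach in the answer referred to above) is to combine argsort with the sorter argument of searchsorted. The sketch assumes that the values in a are unique and that every element of b really occurs in a. With floats, the match relies on exact equality, which holds when b was sampled from a without modification.

```python
import numpy as np

a = np.array([10, 3, 11, 1, 4])      # unsorted, values assumed unique
b = np.array([11, 1, 4])             # every element assumed to be in a

order = np.argsort(a)                        # positions that would sort a
pos = np.searchsorted(a, b, sorter=order)    # positions within the sorted view
indices = order[pos]                         # map back to positions in a
```

Here indices comes out as [2, 3, 4], and a[indices] reproduces b.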
What is the best way to access two consecutive values in a numpy array?
Example:
npdata = np.array([13,15,20,25])
for i in range( len(npdata) ):
    print(npdata[i] - npdata[i+1])
This looks really messed up and additionally needs exception code for the last iteration of the loop.
Any ideas?
Thanks!
numpy provides a function diff for this basic use case:
>>> import numpy
>>> x = numpy.array([1, 2, 4, 7, 0])
>>> numpy.diff(x)
array([ 1, 2, 3, -7])
Your snippet computes something closer to -numpy.diff(x).
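Applied to the array from the question, the sign difference can be seen in this short sketch. The variable names forward and backward are only for illustration.

```python
import numpy as np

npdata = np.array([13, 15, 20, 25])

forward = np.diff(npdata)               # npdata[i+1] - npdata[i]
backward = npdata[:-1] - npdata[1:]     # npdata[i] - npdata[i+1], as in the question's loop
```

Neither form needs special handling for the last element, because both results have one element fewer than npdata.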
How about range(len(npdata) - 1) ?
Here's code (using a simple array, but it doesn't matter):
>>> ar = [1, 2, 3, 4, 5]
>>> for i in range(len(ar) - 1):
...     print(ar[i] + ar[i + 1])
...
3
5
7
9
As you can see it successfully prints the sums of all consecutive pairs in the array, without any exceptions for the last iteration.
You can use ediff1d to get differences of consecutive elements. More generally, a[1:] - a[:-1] will give the differences of consecutive elements and can be used with other operators as well.
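A short sketch of both suggestions on a small array follows. The choice of + and np.maximum as the "other operators" is just for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

diffs = np.ediff1d(a)                  # differences of consecutive elements
same_diffs = a[1:] - a[:-1]            # the slicing form gives the same values
pair_sums = a[1:] + a[:-1]             # any elementwise operator works
pair_max = np.maximum(a[1:], a[:-1])   # so do elementwise functions
```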