My problem is similar to the question asked here. The difference is that I need an algorithm that generates r-tuple permutations of a given list with repeated elements.
For example:
list1 = [1,1,1,2,2]
for i in permu(list1, 3):
    print i
[1,1,1]
[1,1,2]
[1,2,1]
[2,1,1]
[1,2,2]
[2,1,2]
[2,2,1]
It seems that itertools.permutations would work here with a simple filter added to remove the repeated tuples. However, in my real cases the lists are much longer than this example, and as you know, the number of raw tuples that itertools.permutations generates grows factorially with the length of the list.
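For reference, the naive filter-based approach looks something like this (fine for short inputs, but it still enumerates every duplicate permutation before discarding it):
import itertools

list1 = [1, 1, 1, 2, 2]
# Generate all r-permutations, deduplicate with a set. The set stays small,
# but the loop still visits every one of the factorially many raw tuples.
unique = set(itertools.permutations(list1, 3))
for p in sorted(unique):
    print(p)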
So far, what I have is below. This code does the described job, but it is not efficient.
import itertools

def generate_paths(paths, N=None):
    groupdxs = [i for i, group in enumerate(paths) for _ in range(len(group))]
    oldCombo = ()  # a tuple, so it can be compared with the tuples that permutations yields
    result = []
    for dxCombo in itertools.permutations(groupdxs, N):
        if dxCombo <= oldCombo:  # a simple filter to skip duplicates
            continue
        oldCombo = dxCombo
        parNumbers = partialCombinations(dxCombo, len(paths))
        if not parNumbers.count(0) >= len(paths)-1:  # skip combos whose nodes all come from the same path (same graph)
            groupTemps = []
            for groupInd in range(len(parNumbers)):
                groupTemp = [x for x in itertools.combinations(paths[groupInd], parNumbers[groupInd])]
                groupTemps.append(groupTemp)
            for parGroups in itertools.product(*groupTemps):
                iters = [iter(group) for group in parGroups]
                p = [next(iters[i]) for i in dxCombo]
                result.append(p)
    return result

def partialCombinations(combo, numGroups):
    tempCombo = list(combo)
    result = [0] * numGroups
    for x in tempCombo:
        result[x] += 1
    return result
In the first for loop, I need to generate all possible r-length tuples, which is what makes the algorithm slow. There is a good solution for full-length permutations (without the r parameter) in the link above. How can I adapt that algorithm to my case? Or is there a better way?
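(For reference, the counter-based idea from the linked answer can be adapted to r-length tuples roughly as follows; this is a sketch, not the linked code itself, and unique_permutations is an illustrative name:)
from collections import Counter

def unique_permutations(elements, r):
    # Recurse over the multiset of remaining counts, so each distinct
    # r-tuple is produced exactly once and duplicates never arise.
    counts = Counter(elements)
    def rec(prefix):
        if len(prefix) == r:
            yield tuple(prefix)
            return
        for x in counts:
            if counts[x] > 0:
                counts[x] -= 1
                prefix.append(x)
                yield from rec(prefix)
                prefix.pop()
                counts[x] += 1
    yield from rec([])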
I haven't thought this through very well for your case, but here's another approach.
Instead of giving large lists to permutations, you could give a small list that has no duplicates. You can use combinations_with_replacement to generate these smaller lists (you'll need to filter them to match the quantities of duplicates from your original input) and then get the permutations of each combination.
import itertools

possible_values = (1, 2)
n_positions = 3

sorted_combinations = itertools.combinations_with_replacement(possible_values, n_positions)

unique_permutations = set()
for combo in sorted_combinations:
    # TODO: Do filtering for acceptable combinations before passing to permutations.
    for p in itertools.permutations(combo):
        unique_permutations.add(p)

print "len(unique_permutations) = %i. It should be %i^%i = %i.\nPermutations:" % (len(unique_permutations), len(possible_values), n_positions, pow(len(possible_values), n_positions))
for p in unique_permutations:
    print p
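One possible implementation of the TODO filter above, assuming the goal is to respect the multiplicities of the original input (a sketch using collections.Counter, not part of the original answer):
from collections import Counter

available = Counter([1, 1, 1, 2, 2])  # multiplicities of the original list

def acceptable(combo):
    # Feasible only if no value is used more often than it appears in the input.
    return all(count <= available[value] for value, count in Counter(combo).items())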
Related
Given a list containing N sublists of various lengths, find all unique combinations of size k, selecting only one element from each sublist.
The order of the elements in the combination is not relevant: (a, b) = (b, a)
sample_k = 2
sample_list = [['B1','B2','B3'], ['T1','T2'], ['L1','L2','L3','L4']]
expected_output =
[
('B1', 'T1'),('B1', 'T2'),('B1', 'L1'),('B1', 'L2'),('B1', 'L3'),('B1', 'L4'),
('B2', 'T1'),('B2', 'T2'),('B2', 'L1'),('B2', 'L2'),('B2', 'L3'),('B2', 'L4'),
('B3', 'T1'),('B3', 'T2'),('B3', 'L1'),('B3', 'L2'),('B3', 'L3'),('B3', 'L4'),
('T1', 'L1'),('T1', 'L2'),('T1', 'L3'),('T1', 'L4'),
('T2', 'L1'),('T2', 'L2'),('T2', 'L3'),('T2', 'L4')
]
Extra points for a pythonic way of doing it
Speed/efficiency matters; the idea is to use this on a list with hundreds of sublists, each ranging from 5 to 50 in length
What I have been able to accomplish so far:
Using for and while loops to move pointers and build the answer; however, I am having a hard time figuring out how to include the k parameter so the size of the tuple combinations is set dynamically. (Not really happy about it.)
def build_combinations(lst):
    result = []
    count_of_lst = len(lst)
    for i, sublist in enumerate(lst):
        if i == count_of_lst - 1:
            continue
        else:
            for item in sublist:
                j = 0
                while i < len(lst)-1:
                    while j <= len(lst[i+1])-1:
                        comb = (item, lst[i+1][j])
                        result.append(comb)
                        j = j + 1
                    i = i + 1
                    j = 0
                i = 0
    return result
I've seen many similar questions on Stack Overflow, but none of them addresses the parameters the way I am trying to (one item from each sublist, with the size of the combinations as a parameter of the function).
I tried using itertools combinations, product, permutation and flipping them around without success. Whenever using itertools I have either a hard time using only one item from each list, or not being able to set the size of the tuple I need.
I tried NumPy, using arrays and a more math/matrix approach, but didn't get far. There's definitely a way of solving this with NumPy, which is why I tagged numpy as well.
You need to combine two itertools helpers, combinations to select the two unique ordered lists to use, then product to combine the elements of the two:
from itertools import combinations, product
sample_k = 2
sample_list = [['B1','B2','B3'], ['T1','T2'], ['L1','L2','L3','L4']]
expected_output = [pair
for lists in combinations(sample_list, sample_k)
for pair in product(*lists)]
print(expected_output)
If you want to get really fancy/clever/ugly, you can push all the work down to the C layer with:
from itertools import combinations, product, starmap, chain
sample_k = 2
sample_list = [['B1','B2','B3'], ['T1','T2'], ['L1','L2','L3','L4']]
expected_output = list(chain.from_iterable(starmap(product, combinations(sample_list, sample_k))))
print(expected_output)
That will almost certainly run meaningfully faster for huge inputs (especially if you can loop the results from chain.from_iterable directly rather than realizing them as a list), but it's probably not worth the ugliness unless you're really tight for cycles (I wouldn't expect much more than a 10% speed-up, but you'd need to benchmark to be sure).
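For example, the results can be consumed lazily instead of being realized as a list (reusing the definitions above):
for pair in chain.from_iterable(starmap(product, combinations(sample_list, sample_k))):
    print(pair)  # or any per-pair processing, applied as each pair is produced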
So I wrote a model that computes results over various parameters via a nested loop. Each computation returns a list of len(columns) = 10 elements, which is added to a list of lists (res).
Say I compute my results for some parameters len(alpha) = 2, len(gamma) = 2, rep = 3, where rep is the number of repetitions that I run. This yields results in the form of a list of lists like this:
res = [ [elem_1, ..., elem_10], ..., [elem_1, ..., elem_10] ]
I know that len(res) = len(alpha) * len(gamma) * repetitions = 12 and that each inner list has len(columns) = 10 elements. I also know that every 3rd list in res is going to be a repetition (which I know from the way I set up my nested loops to iterate over all parameter combinations, in fact I am using itertools).
I now want to average the result list of lists. What I need to do is take every (len(res) // repetitions) = 4th list, add them together element-wise, and divide by the number of repetitions (3). That sounded easier than it turned out to be, for me.
Here is my ugly attempt to do so:
# create a list of lists of lists, where the inner lists group the runs
# that share identical parameters alpha and gamma
res = [res[i::(len(res)//rep)] for i in range(len(res)//rep)]
avg_res = []
for i in res:
    result = []
    for j in zip(*i):
        result.append(sum(j))
    avg_res.append([i/repetitions for i in result])
print(len(avg_res), avg_res)
This actually yields what I want, but it is surely not the pythonic way to do it. It is ugly as hell, and five minutes later I can hardly make sense of my own code...
What would be the most pythonic way to do it? Thanks in advance!
In some cases pythonic code is a matter of style. One of its idioms is using a list comprehension instead of a loop, so writing result = [sum(j) for j in zip(*i)] is simpler than iterating over zip(*i) and appending.
On the other hand, a nested list comprehension looks more complex, so don't do:
avg_res = [[i/repetitions for i in [sum(j) for j in (zip(*j))]] for j in res]
You can write:
res = [res[i::(len(res)//rep)] for i in range(len(res)//rep)]
avg_res = []
for i in res:
    result = [sum(j) for j in zip(*i)]
    avg_res.append([i/repetitions for i in result])
print(len(avg_res), avg_res)
Another idiom, in programming in general (and in Python in particular), is naming operations with functions and using descriptive variable names to make the code more readable:
def sum_columns(list_of_rows):
    return [sum(col) for col in zip(*list_of_rows)]

def align_alpha_and_gamma(res):
    return [res[i::(len(res)//rep)] for i in range(len(res)//rep)]

aligned_lists = align_alpha_and_gamma(res)
avg_res = []
for aligned_list in aligned_lists:
    sums_of_columns = sum_columns(aligned_list)
    avg_res.append([sum_of_column/repetitions for sum_of_column in sums_of_columns])
print(len(avg_res), avg_res)
Of course, you can choose better names according to what you want to do in the code.
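If the result lists are rectangular and numpy is an option, the grouping and averaging can also be written in a few lines (a sketch, assuming the layout described in the question, where runs with identical parameters are spaced len(res)//rep apart):
import numpy as np

res_arr = np.asarray(res)           # shape: (rep * n_params, n_columns)
n_params = res_arr.shape[0] // rep  # parameter combinations per repetition
avg_res = res_arr.reshape(rep, n_params, -1).mean(axis=0)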
It was a bit hard to follow your instructions, but as I understand it, you want to sum over all elements in every N-th list and divide by the number of repetitions.
res = [list(range(i,i+10)) for i in range(10)]
N = 4
repetitions = 3
average_of_Nth_lists = sum([num for iter,item in enumerate(res) for num in item if iter%N==0])/repetitions
print(average_of_Nth_lists)
output:
85.0
explanation of the result: it equals sum(0..9) + sum(4..13) + sum(8..17) = 255 --> 255/3 = 85.0
I created res as a list of lists and iterated over every N-th list (in my case lists 1, 5, and 9; you can shift this to 4, 8, etc. by changing the offset in the code, or ask for help if you don't see where), then summed them up and divided by repetitions.
I have some strings,
['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
These strings partially overlap each other. If you manually overlapped them you would get:
SGALWDVPSPV
I want a way to go from the list of overlapping strings to the final compressed string in Python. I feel like this must be a problem that someone has solved already, and I am trying to avoid reinventing the wheel. The methods I can imagine are either brute force, or more complicated than I would like, involving biopython and sequence aligners. I have some simple short strings and just want to merge them properly in a simple way.
Does anyone have any advice on a nice way to do this in python? Thanks!
Here is a quick sorting solution:
s = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
new_s = sorted(s, key=lambda x:s[0].index(x[0]))
a = new_s[0]
b = new_s[-1]
final_s = a[:a.index(b[0])]+b
Output:
'SGALWDVPSPV'
This program sorts s by the value of the index of the first character of each element, in an attempt to find the string that will maximize the overlap distance between the first element and the desired output.
My proposed solution with a more challenging test list:
#strFrag = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
strFrag = ['ALWDVPS', 'SGALWDV', 'LWDVPSP', 'WDVPSPV', 'GALWDVP', 'LWDVPSP', 'ALWDVPS']

for repeat in range(0, len(strFrag)-1):
    bestMatch = [2, '', '']  # overlap score (minimum value 3), otherStr index, assembled str portion
    for otherStr in strFrag[1:]:
        for x in range(0, len(otherStr)):
            if otherStr[x:] == strFrag[0][:len(otherStr[x:])]:
                if len(otherStr)-x > bestMatch[0]:
                    bestMatch = [len(otherStr)-x, strFrag.index(otherStr), otherStr[:x]+strFrag[0]]
            if otherStr[:-x] == strFrag[0][-len(otherStr[x:]):]:
                if x > bestMatch[0]:
                    bestMatch = [x, strFrag.index(otherStr), strFrag[0]+otherStr[-x:]]
    if bestMatch[0] > 2:
        strFrag[0] = bestMatch[2]
        strFrag = strFrag[:bestMatch[1]]+strFrag[bestMatch[1]+1:]

print(strFrag)
print(strFrag[0])
Basically the code compares every string/fragment to the first in list and finds the best match (most overlap). It consolidates the list progressively, merging the best matches and removing the individual strings. Code assumes that there are no unfillable gaps between strings/fragments (Otherwise answer may not result in longest possible assembly. Can be solved by randomizing the starting string/fragment). Also assumes that the reverse complement is not present (poor assumption with contig assembly), which would result in nonsense/unmatchable strings/fragments. I've included a way to restrict the minimum match requirements (changing bestMatch[0] value) to prevent false matches. Last assumption is that all matches are exact. To enable flexibility in permitting mismatches when assembling the sequence makes the problem considerably more complex. I can provide a solution for assembling with mismatches upon request.
To determine the overlap of two strings a and b, you can check if any prefix of b is a suffix of a. You can then use that check in a simple loop, aggregating the result and slicing the next string in the list according to the overlap.
lst = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
def overlap(a, b):
    return max(i for i in range(len(b)+1) if a.endswith(b[:i]))

res = lst[0]
for s in lst[1:]:
    o = overlap(res, s)
    res += s[o:]
print(res) # SGALWDVPSPV
Or using reduce:
from functools import reduce # Python 3
print(reduce(lambda a, b: a + b[overlap(a,b):], lst))
This is probably not super-efficient, with complexity of about O(n k), with n being the number of strings in the list and k the average length per string. You can make it a bit more efficient by only testing whether the last char of the presumed overlap of b is the last character of a, thus reducing the amount of string slicing and function calls in the generator expression:
def overlap(a, b):
    # check the last char of the presumed overlap first to skip most slicing;
    # default=0 covers the no-overlap case
    return max((i for i in range(1, len(b)+1)
                if b[i-1] == a[-1] and a.endswith(b[:i])), default=0)
Here's my solution which borders on brute force from the OP's perspective. It's not bothered by order (threw in a random shuffle to confirm that) and there can be non-matching elements in the list, as well as other independent matches. Assumes overlap means not a proper subset but independent strings with elements in common at the start and end:
from collections import defaultdict
from random import choice, shuffle
def overlap(a, b):
    """ get the maximum overlap of a & b plus where the overlap starts """
    overlaps = []
    for i in range(len(b)):
        for j in range(len(a)):
            if a.endswith(b[:i + 1], j):
                overlaps.append((i, j))
    return max(overlaps) if overlaps else (0, -1)

lst = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV', 'NONSEQUITUR']
shuffle(lst)  # to verify order doesn't matter

overlaps = defaultdict(list)

while len(lst) > 1:
    overlaps.clear()
    for a in lst:
        for b in lst:
            if a == b:
                continue
            amount, start = overlap(a, b)
            overlaps[amount].append((start, a, b))
    maximum = max(overlaps)
    if maximum == 0:
        break
    start, a, b = choice(overlaps[maximum])  # pick one among equals
    lst.remove(a)
    lst.remove(b)
    lst.append(a[:start] + b)

print(*lst)
OUTPUT
% python3 test.py
NONSEQUITUR SGALWDVPSPV
%
Computes all the overlaps, combines the pair with the largest overlap into a single element replacing the original two, and starts the process over again until we're down to a single element or there are no overlaps left.
The overlap() function is horribly inefficient and likely can be improved but that doesn't matter if this isn't the type of matching the OP desires.
Once the peptides grow to around 20 amino acids, cdlane's code chokes and produces multiple incorrect answers of various amino acid lengths.
Try adding the AA sequence 'VPSGALWDVPS' (with or without the 'D') and the code starts to fail, because the N- and C-termini grow in ways that do not reflect what Adam Price is asking for. The output is 'SGALWDVPSGALWDVPSPV', which is incorrect despite the effort.
Tbh, imo there is only one 100% reliable answer, and that is to use BLAST via its protein search page, or BLAST in the BioPython package. Or adapt cdlane's code to handle AA gaps, substitutions, and additions.
Dredging up an old thread, but had to solve this myself today.
For this specific case, where the fragments are already in order and each is offset from the previous by the same amount (in this case 1), the following fairly simple concatenation works, though it might not be the world's most robust solution:
lst = ['SGALWDV', 'GALWDVP', 'ALWDVPS', 'LWDVPSP', 'WDVPSPV']
reference = "SGALWDVPSPV"
string = "".join([i[0] for i in lst] + [lst[-1][1:]])
reference == string
True
I am using two architecture programs, with visual programming plugins (Grasshopper for Rhino and Dynamo for Revit - for those that know / are interested)
Grasshopper contains a function called 'Jitter' that will shuffle a list; it has an input from 0.0 to 1.0 which controls the degree of shuffling - 0.0 results in no shuffling and 1.0 produces a complete shuffle.
The second of the programs (Dynamo) does not contain this functionality. It contains a shuffle module (which contains a seed value) however it is a complete random shuffle.
Ultimately the goal is to produce a series of solid and glazed panels with a slight random effect (but avoiding large clumps of solid or glazed elements - hence I want a "light shuffle").
I have written a code which will calculate the number of glazed(True) and solid(False) values required and then evenly distribute True and False values based on the number of items and percent specified.
I have checked out the random module reference however I'm not familiar with the various distributions as described.
Could someone help out or point me in the right direction if an existing function would achieve this.
(I have cheated slightly by adding True/False values alternately to make up the correct number of items within the list - list3 is the final list, list2 contains the repeating module of True/False values.)
Many thanks
import math
import random

percent = 30
items = 42

def remainder():
    remain = items % len(list2)
    list3.append(True)
    remain -= 1
    while remain > 0:
        list3.append(False)
        remain -= 1
    return list3

#find module of repeating True and False values
list1 = ([True] + [False] * int((100/percent)-1))
#multiply this list to nearest multiple based on len(items)
list2 = list1 * int(items/(100/percent))
# make a copy of list2
list3 = list2[:]
#add alternating true and false to match len(list3) to len(items)
remainder()
#an example of a completely shuffled list - which is not desired
shuffled = random.sample(list3, k=len(list3))
Here is an approach based on this paper, which proves a result about the mixing time needed to scramble a list using swaps of adjacent items:
from random import choice
from math import log

def jitter(items, percent):
    n = len(items)
    m = (n**2 * log(n))
    items = items[:]
    indices = list(range(n-1))
    for i in range(int(percent*m)):
        j = choice(indices)
        items[j], items[j+1] = items[j+1], items[j]
    return items
A test, each line showing the result of jitter with various percents being applied to the same list:
ls = list(('0'*20 + '1'*20)*2)
for i in range(11):
    p = i/10.0
    print(''.join(jitter(ls, p)))
Typical output:
00000000000000000000111111111111111111110000000000000000000011111111111111111111
00000000000000111100001101111011011111001010000100010001101000110110111111111111
00000000100100000101111110000110111101000001110001101001010101100011111111111110
00000001010010011011000100111010101100001111011100100000111010110111011001011111
00100001100000001101010000011010011011111011001100000111011011111011010101011101
00000000011101000110000110000010011001010110011111100100111101111011101100111110
00110000000001011001000010110011111101001111001001100101010011010111111011101100
01101100000100100110000011011000001101111111010100000100000110111011110011011111
01100010110100010100010100011000000001000101100011111011111011111011010100011111
10011100101000100010001100100000100111001111011011000100101101101010101101011111
10000000001000111101101011000011010010110011010101110011010100101101011110101110
I'm not sure how principled the above is, but it seems like a reasonable place to start.
There's no clear definition of what "degree of shuffling" (d) means, so you'll need to choose one. One option would be: "the fraction of items remaining unshuffled is (1-d)".
You could implement that as:
Produce a list of indices
Remove (1-d)*N of them
Shuffle the rest
Reinsert the ones removed
Use these to look up values from the original data
import random

def partial_shuffle(x, d):
    """
    x: data to shuffle
    d: fraction of data to shuffle (so a fraction (1-d) remains unshuffled)
    """
    n = len(x)
    dn = int(d*n)
    indices = list(range(n))
    random.shuffle(indices)
    ind_fixed, ind_shuff = indices[dn:], indices[:dn]
    # copy across the fixed values
    result = x[:]
    # shuffle the shuffled values
    for src, dest in zip(ind_shuff, sorted(ind_shuff)):
        result[dest] = x[src]
    return result
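A quick demonstration of the behavior (illustrative values, seeded for reproducibility):
random.seed(1)
print(partial_shuffle(list(range(20)), 0.3))  # roughly 30% of positions shuffled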
The other algorithms you're referring to are probably using the Fisher-Yates shuffle under the hood.
This O(n) shuffle starts with the first element of an array and swaps it with a random higher element, then swaps the second element with a random higher element, and so on.
Naturally, stopping this shuffle before you reach the last element at some fraction [0,1] would give a partially-randomized array, like you want.
Unfortunately, the effect of the foregoing is that all the "randomness" builds up on one side of the array.
Therefore, make a list of array indices, shuffle these completely, and then use the indices as an input to the Fisher-Yates algorithm to partially sort the original array.
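One possible reading of that suggestion, as a sketch (assumptions: positions are visited in a pre-shuffled order and the pass stops after a fraction d of them, so the randomness is spread across the whole array):
import random

def partial_fisher_yates(x, d):
    result = x[:]
    positions = list(range(len(result)))
    random.shuffle(positions)  # the visit order itself is randomized
    for pos in positions[:int(d * len(result))]:
        partner = random.randrange(len(result))  # random swap partner
        result[pos], result[partner] = result[partner], result[pos]
    return result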
I believe I found a more versatile, robust, and consistent way to implement this "adjustable shuffling" technique.
import random
import numpy as np

def acc_shuffle(lis, sr, array=False, exc=None):  # "sr" = shuffling rate
    if type(lis) != list:  # Make it compatible with shuffling (m, n) numpy.ndarrays
        arr = lis
        shape = arr.shape
        lis = list(arr.reshape(-1))
    lis = lis[:]  # copy, so changes applied to "lis" won't affect the original input
    indices = list(range(len(lis)))
    if exc is not None:  # Exclude any indices if necessary
        for ele in sorted(exc, reverse=True):
            del indices[ele]
    shuff_range = int(sr * len(lis) / 2)  # How much to shuffle (depends on shuffling rate)
    if shuff_range < 1:
        shuff_range = 1  # At least one shuffle (swap 2 elements)
    for _ in range(shuff_range):
        i = random.choice(indices)
        indices.remove(i)  # You can opt not to remove the indices for more flexibility
        j = random.choice(indices)
        indices.remove(j)
        lis[i], lis[j] = lis[j], lis[i]
    if array is True:
        return np.array(lis).reshape(shape)
    return lis
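An illustrative usage sketch, reusing the imports above (assumed behavior based on the code; works on both lists and 2-D arrays):
print(acc_shuffle(list(range(10)), 0.3))   # lightly shuffle a list
grid = np.arange(9).reshape(3, 3)
print(acc_shuffle(grid, 0.5, array=True))  # shuffle a 2-D array via its flattened form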
I am trying to write an algorithm that would pick N distinct items from a sequence at random, without knowing the size of the sequence in advance, and where it is expensive to iterate over the sequence more than once. For example, the elements of the sequence might be the lines of a huge file.
I have found a solution when N=1 (that is, "pick exactly one element at random from a huge sequence"):
import random
items = range(1, 10) # Imagine this is a huge sequence of unknown length
count = 1
selected = None
for item in items:
    if random.random() * count < 1:
        selected = item
    count += 1
But how can I achieve the same thing for other values of N (say, N=3)?
If your sequence is short enough that reading it into memory and randomly sorting it is acceptable, then a straightforward approach would be to just use random.shuffle:
import random
arr=[1,2,3,4]
# In-place shuffle
random.shuffle(arr)
# Take the first 2 elements of the now randomized array
print arr[0:2]
[1, 3]
Depending upon the type of your sequence, you may need to convert it to a list by calling list(your_sequence) on it, but this will work regardless of the types of the objects in your sequence.
Naturally, if you can't fit your sequence into memory or the memory or CPU requirements of this approach are too high for you, you will need to use a different solution.
Use reservoir sampling. It's a very simple algorithm that works for any N.
Here is one Python implementation, and here is another.
The simplest I've found is this answer on SO, improved a bit below:
import random
my_list = [1, 2, 3, 4, 5]
how_big = 2
new_list = random.sample(my_list, how_big)
# To preserve the order of the list, you could do:
randIndex = random.sample(range(len(my_list)), how_big)
randIndex.sort()
new_list = [my_list[i] for i in randIndex]
If you have Python 3.6+, you can use random.choices. Note that choices samples with replacement, so the same item can be picked more than once; use random.sample if the N items must be distinct.
from random import choices
items = range(1, 10)
new_items = choices(items, k = 3)
print(new_items)
[6, 3, 1]
#NPE is correct, but the implementations that are being linked to are sub-optimal and not very "pythonic". Here's a better implementation:
import random

def sample(iterator, k):
    """
    Samples k elements from an iterable object.

    :param iterator: an object that is iterable
    :param k: the number of items to sample
    """
    iterator = iter(iterator)  # next() below requires an iterator, not just an iterable
    # fill the reservoir to start
    result = [next(iterator) for _ in range(k)]
    n = k - 1
    for item in iterator:
        n += 1
        s = random.randint(0, n)
        if s < k:
            result[s] = item
    return result
Edit As #panda-34 pointed out the original version was flawed, but not because I was using randint vs randrange. The issue is that my initial value for n didn't account for the fact that randint is inclusive on both ends of the range. Taking this into account fixes the issue. (Note: you could also use randrange since it's inclusive on the minimum value and exclusive on the maximum value.)
The following will give you N random items from an array X (note that random.choice can pick the same element more than once, so this samples with replacement):
import random
list(map(lambda _: random.choice(X), range(N)))
It should be enough to accept or reject each new item just once, and, if you accept it, throw out a randomly chosen old item.
Suppose you have selected N items out of the K seen so far, and then you see the (K+1)-th item. Accept it with probability N/(K+1) and its probability is OK. Each current item got in with probability N/K, and gets thrown out with probability (N/(K+1))(1/N) = 1/(K+1), so it survives with probability K/(K+1); hence it remains selected with probability (N/K)(K/(K+1)) = N/(K+1), so its probability is OK too.
And yes I see somebody has pointed you to reservoir sampling - this is one explanation of how that works.
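A minimal sketch of that accept/reject scheme (Python 3; reservoir_sample is an illustrative name):
import random

def reservoir_sample(iterable, n):
    reservoir = []
    for k, item in enumerate(iterable, 1):  # k counts items seen so far
        if len(reservoir) < n:
            reservoir.append(item)  # the first n items are always kept
        elif random.random() < n / k:  # accept the k-th item with probability n/k
            reservoir[random.randrange(n)] = item  # throw out a random old item
    return reservoir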
As aix mentioned, reservoir sampling works. Another option is to generate a random number for every number you see and select the k numbers with the largest random keys.
To do it iteratively, maintain a min-heap of k (random number, number) pairs, and whenever you see a new number, insert it into the heap if its random key is greater than the smallest key in the heap.
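A sketch of that heap-based approach using heapq (illustrative; the function name is my own):
import heapq
import random

def sample_top_k(iterable, k):
    heap = []  # min-heap of (random_key, item); heap[0] has the smallest key
    for item in iterable:
        key = random.random()  # one random key per item seen
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))  # evict the smallest key
    return [item for _, item in heap]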
This was my answer to a duplicate question (closed before I could post) that was somewhat related ("generating random numbers without any duplicates"). Since it is a different approach than the other answers, I'll leave it here in case it provides additional insight.
from random import randint

random_nums = []
N = 5    # example: how many random numbers you want
r = 1    # example: lower bound of the number range
R = 100  # example: upper bound of the number range

x = 0
while x < N:
    random_num = randint(r, R)  # inclusive range
    if random_num in random_nums:
        continue
    else:
        random_nums.append(random_num)
        x += 1
The reason for the while loop over a for loop is that it makes it easy to avoid skipping: duplicates are rejected and redrawn, so if you hit 3 duplicates you still end up with N numbers rather than N-3.
There's an implementation in the numpy library.
Assuming that N is smaller than the length of the array, you'd have to do the following:
import numpy as np

# my_array is the array to be sampled from
assert N <= len(my_array)
indices = np.random.permutation(N)  # generates shuffled indices from 0 to N-1
sampled_array = my_array[indices]
If you need to sample the whole array and not just the first N positions, then you can use:
import random

# random.sample draws from a population, so sample indices from range(len(my_array))
sampled_array = my_array[random.sample(range(len(my_array)), N)]