I have an assignment to write a kfold(n, n_folds) function that is supposed to work exactly like sklearn.model_selection.KFold: divide the list of integers from 1 to n into n % n_folds groups of n // n_folds + 1 elements and n_folds - n % n_folds groups of n // n_folds elements (this part works well and is optimized), then for each group return the tuple (all integers from 1 to n that are not in the group, the group). My own implementation of this last step outputs correct answers but gets a time limit (TL) verdict.
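import numpy as np

def kfold(n, n_folds):
    elements = np.arange(0, n)
    # array_split already returns a list of arrays; wrapping a ragged
    # result in np.array raises an error in recent NumPy versions
    slices = np.array_split(elements, n_folds)
    ans = []
    for slice in slices:
        ans.append((np.setdiff1d(elements, slice), slice))
    return ans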
For example, kfold(10, 3) returns:
[([5, 6, 7, 8, 9, 10], [1, 2, 3, 4]), ([1, 2, 3, 4, 8, 9, 10], [5, 6, 7]), ([1, 2, 3, 4, 5, 6, 7], [8, 9, 10])]
I believe the problem is that my code is not fully vectorized: it uses a loop instead of NumPy methods. I've read the documentation of setdiff1d, setxor1d and similar functions. While they work well for a single slice, I cannot see a way to make them process all n_folds slices simultaneously. Is there a way to make these functions work? If there is a nice alternative solution, I'd like to hear about it too.
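Because np.array_split produces contiguous folds, one possible way to avoid np.setdiff1d is to compute each fold's start and stop index and build the training part from two slices. The following is only a sketch under that assumption: the name kfold_sliced is made up for illustration, the elements are numbered 1 to n to match the example output above, and it has not been checked against the assignment's time limit.

```python
import numpy as np

def kfold_sliced(n, n_folds):
    """Hypothetical helper: contiguous k-fold split built from slices only."""
    elements = np.arange(1, n + 1)
    # The first n % n_folds folds get one extra element
    sizes = np.full(n_folds, n // n_folds)
    sizes[: n % n_folds] += 1
    stops = np.cumsum(sizes)   # exclusive end index of each fold
    starts = stops - sizes     # start index of each fold
    return [
        (np.concatenate((elements[:a], elements[b:])), elements[a:b])
        for a, b in zip(starts, stops)
    ]

for train, test in kfold_sliced(10, 3):
    print(train.tolist(), test.tolist())
```

A Python-level loop over the folds remains, but each iteration does only slicing and one concatenation instead of the sort-based set difference.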
There is this question, 189. Rotate Array, on LeetCode. Its statement is "Given an array, rotate the array to the right by k steps, where k is non-negative."
To understand it better, here's an example: rotating [1, 2, 3, 4, 5, 6, 7] to the right by k = 3 steps gives [5, 6, 7, 1, 2, 3, 4].
So, my code for this is
for _ in range(k):
    j = nums[-1]
    nums.remove(nums[-1])
    nums.insert(0, j)
It cannot pass some of the test cases in it.
In the discussion panel, I found a code claiming it got submitted successfully that went like
for _ in range(k):
    nums.insert(0, nums.pop(-1))
I would like to know, what is the difference between these two and why my code isn't able to pass some of the test cases.
If you evaluate [].remove.__doc__ in a Python shell, you'll see that the purpose of list.remove is to:
Remove first occurrence of value. Raises ValueError if the value is
not present.
In your code, nums.remove(nums[-1]) does not remove the last item, but the first occurrence of the value of your last item.
E.g.
If you have a list nums = [2, 4, 8, 3, 4] and you call nums.remove(nums[-1]), the content of nums becomes [2, 8, 3, 4], not the [2, 4, 8, 3] that you're expecting.
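The difference can be seen side by side in a short snippet (the variable names removed, popped and last are only for illustration):

```python
nums = [2, 4, 8, 3, 4]

removed = nums.copy()
removed.remove(removed[-1])  # deletes the first 4, at index 1

popped = nums.copy()
last = popped.pop()          # deletes and returns the element at index -1

print(removed)  # [2, 8, 3, 4]
print(popped)   # [2, 4, 8, 3]
```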
Just use slicing:
>>> def rotate(l, n):
...     return l[-n:] + l[:-n]
...
>>> lst = [1, 2, 3, 4, 5, 6, 7]
>>> rotate(lst, 1)
[7, 1, 2, 3, 4, 5, 6]
>>> rotate(lst, 2)
[6, 7, 1, 2, 3, 4, 5]
>>> rotate(lst, 3)
[5, 6, 7, 1, 2, 3, 4]
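In your code, j = nums[-1], but nums.remove(nums[-1]) then removes the first occurrence of that value, which is not necessarily the last element.
In
for _ in range(k):
    nums.insert(0, nums.pop(-1))
nums.pop(-1) removes and returns the last element itself, and that element is inserted at index 0.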
The answer given by cesarv is good and simple, but on large arrays NumPy will likely perform better. Consider:
np.roll(lst, n)
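A minimal usage sketch, assuming NumPy is installed. Note that np.roll returns a new array and leaves the input list unchanged, so for the LeetCode problem the result would still have to be written back into nums.

```python
import numpy as np

lst = [1, 2, 3, 4, 5, 6, 7]
rolled = np.roll(lst, 3)   # shift elements to the right by 3, wrapping around

print(rolled.tolist())     # [5, 6, 7, 1, 2, 3, 4]
print(lst)                 # [1, 2, 3, 4, 5, 6, 7]
```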
The remove() method takes a single element as an argument and removes its first occurrence from the list.
The pop() method takes a single optional argument (an index). If it is not passed, the default index -1 (the index of the last item) is used.
If a test case contains the same value as the last item at an earlier index, your code will fail that test case.
Hence, to correct your code, replace remove with pop:
for _ in range(k):
    poppedElement = nums.pop()
    nums.insert(0, poppedElement)
or, to make it even more concise:
for _ in range(k):
    nums.insert(0, nums.pop())
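Each insert(0, ...) shifts every element of the list, so the loop does work proportional to len(nums) * k. As a sketch of an alternative that rotates in place with a single slice assignment (the name rotate_in_place is hypothetical and not taken from any of the answers):

```python
def rotate_in_place(nums, k):
    """Rotate nums to the right by k steps, modifying the list itself."""
    if not nums:
        return
    k %= len(nums)                       # full turns change nothing
    if k:
        nums[:] = nums[-k:] + nums[:-k]  # slice assignment keeps the same list object

nums = [1, 2, 3, 4, 5, 6, 7]
rotate_in_place(nums, 3)
print(nums)  # [5, 6, 7, 1, 2, 3, 4]
```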
I wrote a program that is supposed to perform a recursive merge sort. I first created a fusion method to merge two lists using indexes.
Here is the fusion method:
def Fusion (L,d,f,m):
    aux=[]
    i_d=d
    i_m=m
    while (i_d<=m-1 and i_m<=f-1):
        if(L[i_d]<=L[i_m]):
            aux.append(L[i_d])
            i_d+=1
        else:
            aux.append(L[i_m])
            i_m+=1
    while (i_d<=m-1):
        aux.append(L[i_d])
        i_d+=1
    while (i_m<=f-1):
        aux.append(L[i_m])
        i_m+=1
    for j in range(len(aux)):
        L[j]=aux[j]
This method works correctly, I tested it with two sorted lists.
Here is the merge sort method:
def sort_fusion(T,a,b):
    if (a<b):
        m=(a+b)//2
        sort_fusion(T,a,m)
        sort_fusion(T,m+1,b)
        Fusion(T,a,b,m)
To run the program
k1 =[3,1,0,9,1,2,6,8,5,7]
a=0
b=len(k1)
sort_fusion(k1,a,b)
Given the input [3, 1, 0, 9, 1, 2, 6, 8, 5, 7], the program produces [2, 5, 6, 6, 7, 8, 1, 8, 5, 7] as output.
I can't understand the program's behaviour. When I comment out the sort_fusion(T,m+1,b) instruction, it gives me [0, 1, 2, 3, 6, 8, 5, 7, 9, 1] as output. To me the algorithm looks fine when I trace through it.
Can anyone point out the problem?
L[j]=aux[j]
Here, you probably need
L[j + d] = aux[j]
The reason is that you are taking values from L, that start from the index d. When you write the auxiliary array back into L, you want to write the numbers to the same place.
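For reference, here is a sketch of the whole sort with that write-back fix applied. It goes one step beyond the suggestion above, which is an assumption of this sketch rather than part of the answer: since Fusion treats m as the first index of the right half and the program passes b = len(k1), the recursion is written with half-open ranges [a, b), so the right half starts at m instead of m + 1 and the base case is a range of at most one element.

```python
def fusion(L, d, f, m):
    """Merge the sorted runs L[d:m] and L[m:f] in place (f is exclusive)."""
    aux = []
    i_d, i_m = d, m
    while i_d < m and i_m < f:
        if L[i_d] <= L[i_m]:
            aux.append(L[i_d])
            i_d += 1
        else:
            aux.append(L[i_m])
            i_m += 1
    aux.extend(L[i_d:m])   # leftovers of the left run
    aux.extend(L[i_m:f])   # leftovers of the right run
    for j in range(len(aux)):
        L[d + j] = aux[j]  # write back starting at d, not at 0

def sort_fusion(T, a, b):
    """Sort T[a:b] in place (b is exclusive)."""
    if b - a > 1:
        m = (a + b) // 2
        sort_fusion(T, a, m)
        sort_fusion(T, m, b)  # right half starts at m in the half-open convention
        fusion(T, a, b, m)

k1 = [3, 1, 0, 9, 1, 2, 6, 8, 5, 7]
sort_fusion(k1, 0, len(k1))
print(k1)  # [0, 1, 1, 2, 3, 5, 6, 7, 8, 9]
```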
Premise: My question is not a duplicate of Cyclic rotation in Python . I am not asking how to resolve the problem or why my solution does not work, I have already resolved it and it works. My question is about another particular solution to the same problem I found, because I would like to understand the logic behind the other solution.
I came across the following cyclic array rotation problem (below the sources):
Cyclic rotation in Python
https://app.codility.com/programmers/lessons/2-arrays/cyclic_rotation/
An array A consisting of N integers is given. Rotation of the array means that each element is shifted right by one index, and the last element of the array is moved to the first place. For example, the rotation of array A = [3, 8, 9, 7, 6] is [6, 3, 8, 9, 7] (elements are shifted right by one index and 6 is moved to the first place).
The goal is to rotate array A K times; that is, each element of A will be shifted to the right K times.
which I managed to solve with the following Python code:
def solution(A , K):
    N = len(A)
    if N < 1 or N == K:
        return A
    K = K % N
    for x in range(K):
        tmp = A[N - 1]
        for i in range(N - 1, 0, -1):
            A[i] = A[i - 1]
        A[0] = tmp
    return A
Then, on the following website https://www.martinkysel.com/codility-cyclicrotation-solution/, I have found the following fancy solution to the same problem:
# Python 3 port: range and // replace the xrange and / of the original
def reverse(arr, i, j):
    for idx in range((j - i + 1) // 2):
        arr[i+idx], arr[j-idx] = arr[j-idx], arr[i+idx]

def solution(A, K):
    l = len(A)
    if l == 0:
        return []
    K = K%l
    reverse(A, l - K, l -1)
    reverse(A, 0, l - K -1)
    reverse(A, 0, l - 1)
    return A
Could someone explain to me how this particular solution works? (The author does not explain it on his website.)
My solution does not perform well for large A and K, where K < N, e.g.:
A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 1000
K = 1000
expectedResult = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 1000
res = solution(A, K) # 1455.05908203125 ms = almost 1.5 seconds
This is because, for K < N, my code has a time complexity of O(N * K), where N is the length of the array.
For large K and small N (K > N), my solution performs well thanks to the modulo operation K = K % N:
A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
K = 999999999999999999999999
expectedRes = [2, 3, 4, 5, 6, 7, 8, 9, 10, 1]
res = solution(A, K) # 0.0048828125 ms, because K is reduced to 9 thanks to K = K % N
The other solution, on the other hand, performs well in all cases, even when N > K, and has a complexity of O(N).
What is the logic behind that solution?
Thank you for the attention.
Let me first cover the base case with K < N. The idea is to split the array into two parts, A and B: A holds the first N-K elements and B the last K elements. The algorithm reverses A and B separately and finally reverses the full array (with the two parts already reversed). To handle the case K > N, note that rotating the array N times gives back the original array, so we can use the modulo operator to find where to split the array (performing only the rotations that matter and skipping the useless ones).
Graphical Example
A graphical step-by-step example can help in understanding the concept better. Note that:
The bold line indicates the splitting point of the array (K = 3 in this example);
The red array indicates the input and the expected output.
Starting from:
Note that the front of the final output should hold the last 3 letters; for now, we reverse them in place (first reverse of the algorithm):
Now reverse the first N-K elements (second reverse of the algorithm):
We already have the solution, but in the opposite direction; we can fix that by reversing the whole array (third and last reverse of the algorithm):
Here is the final output: the original array cyclically rotated with K = 3.
Code Example
Let's also walk through a step-by-step example in Python code, starting from:
A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
K = 22
N = len(A)
we find the splitting index:
K = K%N
# 2
because, in this case, the first 20 shifts are useless. Now we reverse the last K (2) elements of the original array:
reverse(A, N-K, N-1)
# [1, 2, 3, 4, 5, 6, 7, 8, 10, 9]
As you can see, 9 and 10 have been swapped. Now we reverse the first N-K elements:
reverse(A, 0, N-K-1)
# [8, 7, 6, 5, 4, 3, 2, 1, 10, 9]
And, finally, we reverse the full array:
reverse(A, 0, N-1)
# [9, 10, 1, 2, 3, 4, 5, 6, 7, 8]
Note that reversing an array has time complexity O(N).
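The three reversals rely on the identity reverse(reverse(A) + reverse(B)) = B + A. As a self-contained Python 3 sketch, the result can be compared against plain slicing; reverse is rewritten here with two moving indices, and the name rotate_right is chosen only for this illustration.

```python
def reverse(arr, i, j):
    """Reverse arr[i..j] (both ends inclusive) in place."""
    while i < j:
        arr[i], arr[j] = arr[j], arr[i]
        i += 1
        j -= 1

def rotate_right(A, K):
    n = len(A)
    if n == 0:
        return A
    K %= n
    reverse(A, n - K, n - 1)   # last K elements
    reverse(A, 0, n - K - 1)   # first N-K elements
    reverse(A, 0, n - 1)       # whole array
    return A

A = list(range(1, 11))
expected = A[-2:] + A[:-2]               # rotation by 22 % 10 == 2, via slicing
print(rotate_right(A, 22) == expected)   # True
```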
Here is a very simple solution in Ruby (it scored 100% on Codility).
Remove the last element of the array and insert it at the beginning.
def solution(a, k)
  if a.empty?
    return []
  end
  modified = a
  1.upto(k) do
    last_element = modified.pop
    modified = modified.unshift(last_element)
  end
  return modified
end
For a given exclude_list = [3, 5, 8], n = 30, k = 5
I'd like to pick 5 (k) random numbers between 1 and 30.
But I should not pick numbers in the exclude_list.
Suppose exclude_list and n could potentially be large.
When there's no need for exclusion, it is easy to get k random samples:
rand_numbers = sample(range(1, n), k)
So to get the answer, I could do
sample(set(range(1, n)) - set(exclude_list), k)
I read that range keeps one number in memory at a time.
I'm not quite sure how it affects the two lines above.
The first question is: does the following code put all n numbers in memory, or does it put one number at a time?
rand_numbers = sample(range(1, n), k)
The second question is: if the above code indeed puts one number at a time in memory, can I do something similar with the additional constraint of the exclusion list?
The docstring of sample notes:
To choose a sample in a range of integers, use range as an argument.
This is especially fast and space efficient for sampling from a
large population: sample(range(10000000), 60)
I can test this on my machine:
In [11]: sample(range(100000000), 3)
Out[11]: [70147105, 27647494, 41615897]
In [12]: list(range(100000000)) # crash/takes a long time
One way to sample with an exclude list efficiently is to use the same range trick but "hop over" the exclusions (we can do this in O(k * log(len(exclude_list))) with the bisect module):
import bisect
import random
def sample_excluding(n, k, excluding):
    # if we assume excluding is unique and sorted we can avoid the set usage...
    skips = [j - i for i, j in enumerate(sorted(set(excluding)))]
    s = random.sample(range(n - len(skips)), k)
    return [i + bisect.bisect_right(skips, i) for i in s]
and we can see it working:
In [21]: sample_excluding(10, 3, [2, 4, 7])
Out[21]: [6, 3, 9]
In [22]: sample_excluding(10, 3, [1, 2, 8])
Out[22]: [0, 4, 3]
In [23]: sample_excluding(10, 6, [1, 2, 8])
Out[23]: [0, 7, 9, 6, 3, 5]
Specifically we've done this without using O(n) memory:
In [24]: sample_excluding(10000000, 6, [1, 2, 8])
Out[24]: [1495143, 270716, 9490477, 2570599, 8450517, 8283229]
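As a quick sanity check (a sketch, not a proof), the function can be called repeatedly to confirm that excluded values never appear and that a sample contains no duplicates. The function is repeated here so the snippet runs on its own; the variables excluded and seen are only for illustration. Note that it draws from range(n), that is 0 to n - 1, as the outputs above show.

```python
import bisect
import random

def sample_excluding(n, k, excluding):
    skips = [j - i for i, j in enumerate(sorted(set(excluding)))]
    s = random.sample(range(n - len(skips)), k)
    return [i + bisect.bisect_right(skips, i) for i in s]

excluded = [1, 2, 8]
seen = set()
for _ in range(1000):
    picks = sample_excluding(10, 3, excluded)
    assert not set(picks) & set(excluded)  # no excluded value is returned
    assert len(set(picks)) == len(picks)   # no duplicates within one sample
    seen.update(picks)

print(sorted(seen))  # values observed over all trials
```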
For some reason, my code below isn't giving me the right answer for the sample variance of x = [7, 6, 8, 4, 2, 7, 6, 7, 6, 5]. The result should be 3.067, but I keep getting 11.044 and I have no idea why. Can someone help?
def var_method_0(x):
    n = len(x) # Number of samples
    mean=sum(x)/n
    variance=sum([((mean-i)**2) for i in range(n)])/(n-1)
    return variance
Try this:
def var_method(lst):
    n = len(lst)
    mean = sum(lst) / float(n)
    return sum((mean - x) ** 2 for x in lst) / float(n - 1)
I fixed two things:
Making sure that the divisions return float values (not a problem in Python 3.x, but it could mess things up in Python 2.x)
In the line where you calculate the actual variance, you don't want to iterate over the range, but over the actual values in the list of numbers
Apart from that, I did a bit of cleaning up and renaming to make the code more concise. Now it works as expected:
var_method([7, 6, 8, 4, 2, 7, 6, 7, 6, 5])
=> 3.066666666666667
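To see where 11.044 came from, the two loops can be compared directly and cross-checked against the standard library's statistics.variance, which also computes the sample variance with n - 1 in the denominator. The names by_values and by_indices are only for this illustration.

```python
import math
import statistics

x = [7, 6, 8, 4, 2, 7, 6, 7, 6, 5]
n = len(x)
mean = sum(x) / n

by_values = sum((mean - v) ** 2 for v in x) / (n - 1)          # iterates over the data
by_indices = sum((mean - i) ** 2 for i in range(n)) / (n - 1)  # iterates over 0..n-1

print(round(by_values, 3))    # 3.067
print(round(by_indices, 3))   # 11.044
print(math.isclose(by_values, statistics.variance(x)))  # True
```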