This question already has answers here:
Understanding slicing
(38 answers)
Closed 10 years ago.
The extended slice syntax in python has been explained to me as "a[n:m:k] returns every kth element from n to m".
This gives me a good idea what to expect when k is positive. But I'm lost on how to interpret a[n:m:k] for negative k. I know that a[::-1] reverses a, and that a[::-k] takes every kth element of the reversed a.
But how is this a generalization of the definition for k positive? I'd like to know how a[n:m:k] is actually defined, so that (for example) I can understand why:
"abcd"[-1:0:-1] = "dcb"
Is a[n:m:-k] reversing the sequence a, then taking the elements with original indices starting from n and ending one before m or something? I don't think so, because this pattern doesn't fit other values of n and m I've tried. But I'm at a loss to figure out how this is actually defined, and searching has gotten me nowhere.
[-1:0:-1] means: start from the index len(string)-1, move down towards index 0 (not included), taking a step of -1 each time (i.e. in reverse).
So, the following indexes are fetched:
le-1, le-1-1, le-1-1-1 .... 1 # le is len(string)
example:
In [24]: strs = 'foobar'
In [25]: le = len(strs)
In [26]: strs[-1:0:-1] # the first -1 is equivalent to len(strs)-1
Out[26]: 'raboo'
In [27]: strs[le-1:0:-1]
Out[27]: 'raboo'
The Python documentation (here's the technical one; the explanation for range() is a bit easier to understand) is more correct than the simplified "every kth element" explanation. The slicing parameters are aptly named
slice[start:stop:step]
so the slice starts at the location defined by start, stops before the location stop is reached, and moves from one position to the next by step items.
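One way to see how a negative step is resolved is slice.indices(), which reports the concrete start, stop and step Python uses for a sequence of a given length. A small sketch using the "abcd" example from the question (the variable names are only illustrative):

```python
s = "abcd"

# Resolve the slice -1:0:-1 against a sequence of length 4.
start, stop, step = slice(-1, 0, -1).indices(len(s))
print(start, stop, step)  # 3 0 -1

# The indices visited: begin at 3, move by -1, stop before reaching 0.
visited = list(range(start, stop, step))
print(visited)  # [3, 2, 1]
print("".join(s[i] for i in visited))  # dcb

# With a negative step, omitted bounds default to "last element" and
# "one before the first element", which is why a[::-1] reverses a.
print(slice(None, None, -1).indices(len(s)))  # (3, -1, -1)
```

So the negative-step case follows the same start/stop/step rule as the positive one; only the direction of travel and the default bounds change.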
Is there a simple Python equivalent to R's : operator to create a vector of numbers? I only found range().
Example:
vector_example <- 1:4
vector_example
Output:
[1] 1 2 3 4
You mention range(). That's the standard answer for Python's equivalent. It returns a sequence. If you want the equivalent in a Python list, just create a list from the sequence returned by range():
range_list = list(range(1,5))
Result:
[1, 2, 3, 4]
I don't know R, but from your example, it appears that the second operand of its : operator is inclusive, that is, that number is included in the resulting sequence. This is not true of Python's range() function: the second parameter passed to it is not included in the resulting sequence. So where you use 4 in your example, you want to use 5 with Python to get the same result.
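If the inclusive behaviour is what you are after, a minimal sketch of a wrapper could look like this. The name r_colon is hypothetical (it is not part of Python or any library), and it assumes integer endpoints:

```python
def r_colon(a, b):
    """Integers from a to b inclusive, in the spirit of R's a:b (hypothetical helper)."""
    # Count downwards when b is smaller than a, as R does for 4:1.
    step = 1 if b >= a else -1
    # range() excludes its stop value, so move the stop one step further.
    return list(range(a, b + step, step))


print(r_colon(1, 4))  # [1, 2, 3, 4]
print(r_colon(4, 1))  # [4, 3, 2, 1]
```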
I remember being frustrated by the lack of : to create sequences of consecutive numbers when I first switched from R to Python. In general, there is no direct equivalent to the : operator. Python sequences are more like R's seq() function.
While the base function range is alright, I personally prefer numpy.arange, as it is more flexible.
import numpy as np
# Create a simple array from 1 through 4
np.arange(1, 5)
# This is what I mean by "more flexible"
np.arange(1, 5).tolist()
Remember that Python lists and arrays are 0-indexed. As far as I know, all intervals are right-open too, so np.arange(a, b) will exclude b.
PS: There are other functions, such as numpy.linspace which may suit your needs.
I have tried to summarize the problem statement like this:
Given n, k and an array (a list) arr, where n = len(arr) and k is an integer in the range 1 to n inclusive.
For an array (or list) myList, The Unfairness Sum is defined as the sum of the absolute differences between all possible pairs (combinations with 2 elements each) in myList.
To explain: if myList = [1, 2, 5, 5, 6], then its unfairness sum (US) is computed as shown. Please note that elements are considered unique by their index in the list, not by their values:
US = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6|
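As a quick check of the definition, here is a direct (quadratic) sketch of the sum over all pairs. The function name unfairness_sum is only illustrative:

```python
from itertools import combinations


def unfairness_sum(values):
    """Sum of |a - b| over every pair of positions in values."""
    # combinations() pairs elements by position, so equal values at
    # different indices are still counted as separate elements.
    return sum(abs(a - b) for a, b in combinations(values, 2))


print(unfairness_sum([1, 2, 5, 5, 6]))  # 26
```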
If you actually need to look at the problem statement, It's HERE
My Objective
given n, k, arr(as described above), find the Minimum Unfairness Sum out of all of the unfairness sums of sub arrays possible with a constraint that each len(sub array) = k [which is a good thing to make our lives easy, I believe :) ]
what I have tried
well, there is a lot to be added in here, so I'll try to be as short as I can.
My first approach used itertools.combinations to get all the possible combinations and statistics.variance to check the spread of each one (yeah, I know I'm a mess).
Before you see the code below: do you think variance and unfairness sum are perfectly related (I know they are strongly related), i.e. does the sub array with minimum variance have to be the sub array with the MUS?
You only have to check the LetMeDoIt(n, k, arr) function. If you need MCVE, check the second code snippet below.
from itertools import combinations as cmb
from statistics import variance as varn

def LetMeDoIt(n, k, arr):
    v = []
    s = []
    subs = [list(x) for x in list(cmb(arr, k))]  # getting all sub arrays from arr in a list
    i = 0
    for sub in subs:
        if i != 0:
            var = varn(sub)  # the variance thingy
            if float(var) < float(min(v)):
                v.remove(v[0])
                v.append(var)
                s.remove(s[0])
                s.append(sub)
            else:
                pass
        elif i == 0:
            var = varn(sub)
            v.append(var)
            s.append(sub)
            i = 1
    final = []
    f = list(cmb(s[0], 2))  # getting list of all pairs (after determining sub array with least MUS)
    for r in f:
        final.append(abs(r[0]-r[1]))  # calculating the MUS in my messy way
    return sum(final)
The above code works fine for n<30 but raised a MemoryError beyond that.
In Python chat, Kevin suggested that I try a generator, which is memory efficient (it really is). But since a generator still produces those combinations on the fly as we iterate over them, it was estimated to take over 140 hours (:/) for n=50, k=8.
I posted the same as a question on SO HERE (you might wanna have a look to understand me properly - it has discussions and an answer by fusion which takes me to my second approach - a better one(i should say fusion's approach xD)).
Second Approach
from itertools import combinations as cmb

def myvar(arr):  # a function to calculate variance
    l = len(arr)
    m = sum(arr)/l
    return sum((i-m)**2 for i in arr)/l

def LetMeDoIt(n, k, arr):
    sorted_list = sorted(arr)  # i think sorting the array makes it easy to get the sub array with MUS quickly
    variance = None
    min_variance_sub = None
    for i in range(n - k + 1):
        sub = sorted_list[i:i+k]
        var = myvar(sub)
        if variance is None or var<variance:
            variance = var
            min_variance_sub=sub
    final = []
    f = list(cmb(min_variance_sub, 2))  # again getting all possible pairs in my messy way
    for r in f:
        final.append(abs(r[0] - r[1]))
    return sum(final)

def MainApp():
    n = int(input())
    k = int(input())
    arr = list(int(input()) for _ in range(n))
    result = LetMeDoIt(n, k, arr)
    print(result)

if __name__ == '__main__':
    MainApp()
This code works perfectly for n up to 1000 (maybe more), but terminates due to a timeout (5 seconds is the limit on the online judge :/ ) for n beyond 10000 (the biggest test case has n=100000).
=====
How would you approach this problem to take care of all the test cases in given time limits (5 sec) ? (problem was listed under algorithm & dynamic programming)
For your reference, you can have a look at:
successful submissions (py3, py2, C++, java) on this problem by other candidates, so that you can explain that approach for me and future visitors
an editorial by the problem setter explaining how to approach the question
a solution code by the problem setter himself (py2, C++)
input data (test cases) and expected output
Edit1 ::
For future visitors of this question, the conclusions I have till now are,
that variance and unfairness sum are not perfectly related (they are strongly related), which implies that among many lists of integers, the list with minimum variance doesn't always have to be the list with minimum unfairness sum. If you want to know why, I asked that as a separate question on math stack exchange HERE, where one of the mathematicians proved it for me xD (and it's worth taking a look, 'cause it was unexpected)
As far as the question is concerned overall, you can read answers by archer & Attersson below (still trying to figure out a naive approach to carry this out - it shouldn't be far by now though)
Thank you for any help or suggestions :)
You must work on your list SORTED and check only sublists of consecutive elements. This is because any sublist that skips over an element of the sorted list cannot have a lower unfairness sum than some sublist of consecutive elements.
For example if the list is
[1,3,7,10,20,35,100,250,2000,5000] and you want to check for sublists with length 3, then solution must be one of [1,3,7] [3,7,10] [7,10,20] etc
Any other sublist, e.g. [1,3,10], will have a higher unfairness sum: because 10>7, all its differences with the rest of the elements will be larger than those of 7
The same for [1,7,10] (non consecutive on the left side) as 1<3
Given that, you only have to check for consecutive sublists of length k which reduces the execution time significantly
Regarding coding, something like this should work:
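import itertools

def myvar(array):
    # unfairness sum: sum of |a - b| over all pairs
    return sum(abs(a - b) for a, b in itertools.combinations(array, 2))

def minsum(n, k, arr):
    arr = sorted(arr)  # only consecutive windows of the sorted list need checking
    res = myvar(arr[:k])  # start from the first window
    for i in range(1, n - k + 1):  # there are n - k + 1 windows in total
        res = min(res, myvar(arr[i:i+k]))
    return res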
I see this question still has no complete answer. I will write an outline of a correct algorithm which will pass the judge. I will not write the code, in order to respect the purpose of the Hackerrank challenge, and since we already have working solutions.
The original array must be sorted. This has a complexity of O(NlogN)
At this point you can check consecutive sub arrays as non-consecutive ones will result in a worse (or equal, but not better) "unfairness sum". This is also explained in archer's answer
The last step, finding the minimum "unfairness sum", can be done in O(N). You need to calculate the US for every consecutive k-long subarray. The mistake is recalculating it from scratch at every step, which costs O(k) per step and brings the complexity of this step to O(k*N). Each update can be done in O(1), as the editorial you posted shows, including the mathematical formulae. It requires first building a cumulative array after step 1 (done in O(N), with space complexity O(N) too).
It works but terminates due to time out for n<=10000.
(from comments on archer's question)
To explain step 3, think about k = 100. You are scrolling the N-long array and the first iteration, you must calculate the US for the sub array from element 0 to 99 as usual, requiring 100 passages. The next step needs you to calculate the same for a sub array that only differs from the previous by 1 element 1 to 100. Then 2 to 101, etc.
If it helps, think of it like a snake. One block is removed and one is added.
There is no need to perform the whole O(k) scrolling. Just figure the maths as explained in the editorial and you will do it in O(1).
So the final complexity will asymptotically be O(NlogN) due to the first sort.
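For future readers, here is a rough sketch of the three steps. The update rule is derived here by counting which pairs leave and enter the window, and is not copied from the editorial, so treat it as one possible formulation. The function name min_unfairness_sum and its variable names are illustrative. When the window moves from a[i..i+k-1] to a[i+1..i+k], the pairs involving a[i] are lost and the pairs involving a[i+k] are gained, and both can be expressed through the sum of the k-1 elements in between, which a prefix-sum array gives in O(1).

```python
from itertools import accumulate


def min_unfairness_sum(arr, k):
    """Minimum unfairness sum over all k-element subsets of arr (sketch)."""
    a = sorted(arr)                       # step 1: O(N log N)
    n = len(a)
    prefix = [0] + list(accumulate(a))    # prefix[t] = a[0] + ... + a[t-1]

    # First window in O(k): in a sorted window, the element at position j
    # is subtracted from k-1-j larger elements and has j smaller elements
    # subtracted from it, so its net weight is 2*j - k + 1.
    current = sum(x * (2 * j - k + 1) for j, x in enumerate(a[:k]))
    best = current

    # Slide the window one element at a time, O(1) per move.
    for i in range(n - k):
        inner = prefix[i + k] - prefix[i + 1]  # sum of a[i+1 .. i+k-1]
        # Drop pairs with a[i]:    -(inner - (k-1)*a[i])
        # Add pairs with a[i+k]:   +((k-1)*a[i+k] - inner)
        current += (k - 1) * (a[i] + a[i + k]) - 2 * inner
        best = min(best, current)
    return best


print(min_unfairness_sum([1, 2, 5, 5, 6], 3))  # 2, from the window [5, 5, 6]
```

After the sort, everything is linear, so the sort dominates the running time as described above.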
This question already has answers here:
How can I find the minimum index of the array in this case?
(3 answers)
Closed 3 years ago.
Is there a more optimized solution to solve the stated problem?
Given an array 'arr' of 'N' elements and a number 'M', find the least index 'z' at which the equation gets satisfied. [ ] is considered as floor().
Code:
counts=0
ans=0
while(ans==0):
    s=0
    for i in range(counts,len(arr)):
        s+=int(arr[i]/(i+1-counts))
        if(s>M):
            break
        if((i+1)==len(arr) and s<=M):
            print(counts)
            ans=1
    counts+=1
Explanation:
Check array from left to right. The first index that satisfies the condition is the answer. This is more optimized than considering from right to left.
If at any time during the calculation, 's' is deemed more than M, break the loop and consider the next. This is more optimized than calculating 's' completely.
Example:
INPUT:
N=3 M=3
arr=[1 2 3]
OUTPUT:
0
This would give the answer 0 since the 0th index contains the first element to satisfy the given relation.
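For reference, the same search can be wrapped in a function so it is easy to test. This is only a sketch of what the posted loop computes: it assumes the condition is that the sum of floor(arr[i] / (i + 1 - z)) for i from z to the end is at most M, that the values are non-negative integers, and the name least_index is made up for illustration.

```python
def least_index(arr, M):
    """Smallest start index z whose floored, position-weighted sum is <= M."""
    n = len(arr)
    for z in range(n):
        s = 0
        for i in range(z, n):
            s += arr[i] // (i + 1 - z)  # floor division, divisor starts at 1
            if s > M:
                break                   # early exit: this z cannot work
        else:
            # The inner loop finished without breaking, so s <= M.
            return z
    return None  # no start index satisfies the condition


print(least_index([1, 2, 3], 3))  # 0
```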
Thanks in advance.
If you're working with relatively small arrays, your algorithm is going to be fast enough. Minor improvements could be achieved by reorganizing the code a bit but nothing dramatic.
If you're working with very large arrays, then I would suggest you look into numpy. It is optimized for array wide operations and has impressive performance.
For example, you can divide all the elements in an array by their inverted position in one operation:
terms = arr / np.arange(len(arr),0,-1)
and then get cumulative sums and the first index in a single line
index = np.where(np.cumsum(terms) <= M)
This question already has answers here:
Understanding slicing
(38 answers)
Closed 4 years ago.
I'm new to coding and python. I've taken an intro comp sci class but I still feel out of my depth when trying to understand most code so please forgive me if this question seems poorly placed.
I'm taking an algorithms class on Edx, which has an automatic grader. Starter code is provided for each problem and contains a section like the one below. This section is particularly difficult for me to understand.
I believe what will be fed into the function I write is a list that will look like this [1:2,4:6,7:10], but I'm not really sure.
I'm hoping someone could help me understand this code, so I can design a function around the data.
import sys

if __name__ == '__main__':
    input = sys.stdin.read()
    data = list(map(int, input.split()))
    n = data[0]
    m = data[1]
    starts = data[2:2 * n + 2:2]
    ends = data[3:2 * n + 2:2]
    points = data[2 * n + 2:]
    #use fast_count_segments
    cnt = naive_count_segments(starts, ends, points)
    for x in cnt:
        print(x, end=' ')
Further, I don't really understand how to test this code on my own computer so that I can figure it out on my own. Any help would be much appreciated. Thanks in advance.
The array slice notation (arr[1:2:1]) selects a portion of a sequence. The notation consists of three colon-separated expressions representing the start, stop, and optional step of the resulting sequence.
An expression like data[2:2 * n + 2:2] signifies a start index of 2, a stop index equal to 2 * n + 2, and a step equal to 2. The result will be a sequence starting at index 2 (the third element), stopping just before index 2 * n + 2, and proceeding in increments of 2.
class slice(start, stop[, step]) Return a slice object representing
the set of indices specified by range(start, stop, step). The start
and step arguments default to None. Slice objects have read-only data
attributes start, stop and step which merely return the argument
values (or their default). They have no other explicit functionality;
however they are used by Numerical Python and other third party
extensions. Slice objects are also generated when extended indexing
syntax is used. For example: a[start:stop:step] or a[start:stop, i].
See itertools.islice() for an alternate version that returns an
iterator.
https://docs.python.org/3.4/library/functions.html#slice
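To make the starter code concrete, here is a small sketch that applies the same slices to a made-up input string instead of sys.stdin. The sample numbers are invented for illustration; the layout assumed is n, m, then n start/end pairs, then the points:

```python
# Made-up input: n=2 segments (0..5 and 7..10), m=3 points (1, 6, 11).
text = "2 3 0 5 7 10 1 6 11"

data = list(map(int, text.split()))
n = data[0]
m = data[1]

# The pairs occupy indices 2 .. 2*n+1; every second item from index 2
# is a start, every second item from index 3 is an end.
starts = data[2:2 * n + 2:2]
ends = data[3:2 * n + 2:2]

# Everything after the pairs is the list of points.
points = data[2 * n + 2:]

print(starts)  # [0, 7]
print(ends)    # [5, 10]
print(points)  # [1, 6, 11]
```

To run the real script locally, the same text can be saved to a file and redirected to standard input, for example `python script.py < input.txt`, where both file names are placeholders.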
I'm curious in Python why x[0] retrieves the first element of x while x[-1] retrieves the first element when reading in the reverse order. The syntax seems inconsistent to me since in the one case we're counting distance from the first element, whereas we don't count distance from the last element when reading backwards. Wouldn't something like x[-0] make more sense? One thought I have is that intervals in Python are generally thought of as inclusive with respect to the lower bound but exclusive for the upper bound, and so the index could maybe be interpreted as distance from a lower or upper bound element. Any ideas on why this notation was chosen? (I'm also just curious why zero indexing is preferred at all.)
The case for zero-based indexing in general is succinctly described by Dijkstra here. On the other hand, you have to think about how Python array indexes are calculated. In a statement such as
x = arr[index]
Python first resolves and evaluates index, and since -0 obviously evaluates to 0, it would be quite impossible to have arr[-0] indicate the last element.
y = -0 (??)
x = arr[y]
would hardly make sense.
EDIT:
Let's have a look at the following function:
def test():
    y = x[-1]
Assume x has been declared above in a global scope. Now let's have a look at the bytecode:
0 LOAD_GLOBAL 0 (x)
3 LOAD_CONST 1 (-1)
6 BINARY_SUBSCR
7 STORE_FAST 0 (y)
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
Basically the global variable x (more precisely, a reference to it) is pushed on the stack. Then the array index is evaluated and pushed on the stack. Then the instruction BINARY_SUBSCR runs, which implements TOS = TOS1[TOS] (where TOS means Top of Stack). Finally the top of the stack is popped into the variable y.
Since BINARY_SUBSCR handles negative array indices, and -0 is evaluated to 0 before being pushed onto the stack, it would take major (and unnecessary) changes to the interpreter to have arr[-0] indicate the last element of the array.
It's mostly for a couple of reasons:
Computers work with 0-based numbers
Older programming languages used 0-based indexing since they were low-level and closer to machine code
Newer, Higher-level languages use it for consistency and the same reasons
For more information: https://en.wikipedia.org/wiki/Zero-based_numbering#Usage_in_programming_languages
In many other languages that use 0-based indexes but do not support negative indexes the way Python does, accessing the last element of a list (array) requires finding the length of the list and subtracting 1, like so:
items[len(items) - 1]
In python the len(items) part can simply be omitted with support for negative index, consider:
>>> items = list(range(10))
>>> items[len(items) - 1]
9
>>> items[-1]
9
In python: 0 == -0, so x[0] == x[-0].
Why is sequence indexing zero based instead of one based? It is a choice the language designer has to make. Most languages I know of use 0-based indexing. XPath uses 1-based indexing for selection.
Using negative indexing is also a convention for the language. Not sure why it was chosen, but it allows for circling or looping the sequence by simple addition (subtraction) on the index.
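A short sketch of both points, using a plain list: a negative index -k refers to the same element as len(x) - k, and stepping one position back from index 0 lands on the last element without any special casing.

```python
items = list(range(10))
n = len(items)

# For 1 <= k <= n, items[-k] is the same element as items[n - k].
same = all(items[-k] == items[n - k] for k in range(1, n + 1))
print(same)  # True

# -0 is just 0, so items[-0] is the first element, not the last.
print(items[-0] == items[0])  # True

# "Previous neighbour" of every position, wrapping around at the start:
# for i == 0 the index i - 1 is -1, which is the last element.
previous = [items[i - 1] for i in range(n)]
print(previous)  # [9, 0, 1, 2, 3, 4, 5, 6, 7, 8]
```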