Divide and conquer algorithm does not work - python

I am trying to compute the average of a list of numbers using a "divide and conquer" algorithm, but the program I have written only works when the number of elements is a power of two (2, 4, 8, etc.).
The code is written in Python and is as follows:
def averageDyC(ini, end, notes):
    if ini == end:
        return notes[ini]
    mid = int((ini + end) / 2)
    x = averageDyC(ini, mid, notes)
    y = averageDyC(mid + 1, end, notes)
    return float((x + y) / 2)

notes = [1, 4, 7, 8, 9, 2, 4]
average = averageDyC(0, len(notes) - 1, notes)
This code displays 3.75, but the correct average is 3.5.
How can I change this code so that it works when the number of elements is not a power of 2?

I don't actually think this technique can work for finding the average of a list.
Let's look at a list of three numbers: [1,2,3]
Clearly, the average is 2.
However, if we divide the list in two, we get [1,2] and [3]. Their respective averages are 1.5 and 3.
If we then add those two together and divide the result by two, we get 2.25.
That's because 1.5 is the average of two numbers, and 3 is the average of one number, so we should really be weighting the 1.5 with 2 and the 3 with 1.
So the correct calculation would be (2(1.5) + 1(3)) / 3.
This same problem will occur whenever you divide an odd-length list into two halves and then give the two averages equal weighting.
For that reason, I don't think you can use this approach other than where you have a power of two length list.
EDIT: It's also worth noting that there are many good ways to get the average of a list (see this previous question). I haven't focused on these, as I believe your question is mainly about the specific algorithm you've used, not about finding the average in general.
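If you do want to keep the recursive, divide-and-conquer structure, the fix is exactly the weighting described above: have each call return the element count along with its partial average, so the parent can combine the two halves correctly. Here is a minimal sketch of that idea (my own code, not the poster's):

def average_dyc(ini, end, notes):
    if ini == end:
        return notes[ini], 1
    mid = (ini + end) // 2
    left_avg, left_n = average_dyc(ini, mid, notes)
    right_avg, right_n = average_dyc(mid + 1, end, notes)
    total_n = left_n + right_n
    # Weight each half by how many elements it covers before combining.
    return (left_avg * left_n + right_avg * right_n) / total_n, total_n

notes = [1, 4, 7, 8, 9, 2, 4]
average, _ = average_dyc(0, len(notes) - 1, notes)
print(average)  # 5.0, the same as sum(notes) / len(notes)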

Related

A median that doesn't divide the sum of two elements in cases when there is an even number of elements

Tried searching, but it did not turn up anything relevant. Let's say we have a Series with an even number of elements, and we want to calculate its median:
pd.Series([4, 6, 8, 10]).median()
Since we have an even number of elements, there's no element that is exactly in the middle, so instead the method performs the calculation: (6 + 8) / 2 = 7. However, for my purposes it is very important that the median is a number that already exists in the Series, it can't be something calculated from scratch. So I'd rather pick either 6 or 8 than use 7.
One of the possible solutions is to detect the fact that there is an even number of elements and, in such cases, add another element that is guaranteed to be the largest or the smallest, and then just delete it after I get the median. But this solution seems rather clumsy even for a case with one Series. And if we're dealing with a SeriesGroupBy object instead, where such median has to be calculated for each group separately, I can't even begin to imagine how to implement that.
It looks like there's no parameter in the median() method that makes it select one of the two nearest elements instead of dividing, and I can't find any alternative to the median() method that can do that either. Is implementing my own median function my only choice?
Instead of median() you should probably use the quantile() method (the default, q=0.5, is the median) and set interpolation to 'higher', 'lower', or 'nearest'.
E.g.
>>> pd.Series([4, 6, 8, 10]).quantile(q=0.5, interpolation='nearest')
8
>>> pd.Series([4, 6, 8, 10]).quantile(q=0.5, interpolation='higher')
8
>>> pd.Series([4, 6, 8, 10]).quantile(q=0.5, interpolation='lower')
6
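The question also mentions SeriesGroupBy; groupby objects expose quantile() as well, so the same interpolation argument should work per group. A small sketch (the column and group names here are just placeholders):

import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'value': [4, 6, 8, 10]})
# Per-group "median" restricted to values that actually occur in each group.
print(df.groupby('group')['value'].quantile(q=0.5, interpolation='lower'))
# a -> 4, b -> 8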
If you don't need to use pandas, you can do it by simply sorting the list and then getting the middle element. Use integer division to ensure that you get an actual index without a fraction.
def list_median(l):
    if len(l) == 0:
        return None  # or maybe raise an error
    return sorted(l)[(len(l) - 1) // 2]
Examples:
If the list length is 7 (odd), the median is at index 3, and (len(l) - 1) // 2 == 3.
If the list length is 8 (even), the median is between indexes 3 and 4, and (len(l) - 1) // 2 == 3, which is the first of those two indexes.

Is NDCG (normalized discounted cumulative gain) flawed? I have calculated a few alternative ranking quality measures, and I can't make heads or tails of it

I'm using python for a learning-to-rank problem, and I am evaluating my success using the following DCG and NDCG code (from http://nbviewer.ipython.org/github/ogrisel/notebooks/blob/master/Learning%20to%20Rank.ipynb )
import numpy as np

def dcg(relevances, rank=20):
    relevances = np.asarray(relevances)[:rank]
    n_relevances = len(relevances)
    if n_relevances == 0:
        return 0.
    discounts = np.log2(np.arange(n_relevances) + 2)
    return np.sum(relevances / discounts)

def ndcg(relevances, rank=20):
    best_dcg = dcg(sorted(relevances, reverse=True), rank)
    if best_dcg == 0:
        return 0.
    return dcg(relevances, rank) / best_dcg
Here are the NDCG values for the best and worst case scenarios for a list of 3 items with no duplicate ranks...
>>> ndcg(np.asarray([3,2,1]))
1.0
>>> ndcg(np.asarray([1,2,3]))
0.78999800424603583
We can use this metric to compare the two rankings and see which is better. If I calculate the worst case for a 4 item list, however...
>>> ndcg(np.asarray([1,2,3,4]))
0.74890302967841715
The 4 item list no longer seems comparable to the 3 item list.
I have also calculated two alternative NDCGs. NDCG2 compares the achieved DCG to both the best and the worst case...
def ndcg2(relevances, rank=20):
    best_dcg = dcg(sorted(relevances, reverse=True), rank)
    worst_dcg = dcg(sorted(relevances, reverse=False), rank)
    if best_dcg == 0:
        return 0.
    return (dcg(relevances, rank) - worst_dcg) / (best_dcg - worst_dcg)
NDCG3 randomizes my list of actual rankings 50 times, calculates the DCG for each, and compares my actual DCG to the average of those.
def ndcg3(relevances, rank=20):
    shuffled = np.copy(relevances)
    rands = []
    for i in range(50):
        np.random.shuffle(shuffled)
        rands.append(dcg(shuffled, rank))
    avg_rand_dcg = np.mean(np.asarray(rands))
    return dcg(relevances, rank) / avg_rand_dcg
Across my various lists, I get the following metrics...
NDCG: average is .87 (sounds good)
Spearman rank: around .25 (not amazing, but there is something there)
NDCG2: .58 (on average, slightly closer to the best dcg than it is to the worst)
NDCG3: 1.04 (slightly better than randomly sorted lists)
I honestly can't make heads or tails of these results. My NDCG values seem good, but are they really comparable across lists? Do the alternative metrics make more sense?
edit: In my first random comparison, I was not using np.copy(). As such, my random score was almost always .99. That is now fixed and results make more sense.
One thing that may mislead you is the way NDCG is normalized. Usually you have a number of documents to rank, but your NDCG is truncated at a smaller number of documents (for example NDCG#3). In your code, this is determined by the parameter 'rank'.
Let's say that you want to rank 5 documents with relevances R = [1, 2, 3, 4, 0], and compute NDCG#3. If your algorithm believes that the optimal order is [doc1, doc2, doc3, doc4, doc5], then you will have:
NDCG#3 = DCG([1, 2, 3]) / DCG([4, 3, 2])
and not
NDCG#3 = DCG([1, 2, 3]) / DCG([3, 2, 1]) # Incorrect
So in a sense, NDCG([1, 2, 3]) and NDCG([1, 2, 3, 4]) are not comparable. The numerators are pretty much the same, but the denominators are completely different. If you want NDCG to have an intuitive meaning, you have to set 'rank' smaller than or equal to your number of documents.
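To make that concrete, here is a small sketch reusing the dcg() function from the question: pass the full list of relevances in predicted order and let the rank parameter do the truncation, so the ideal DCG in the denominator is computed over all documents rather than just the top 3.

relevances_in_predicted_order = [1, 2, 3, 4, 0]  # doc1..doc5 in the order the model ranked them

num = dcg(relevances_in_predicted_order, rank=3)                        # DCG([1, 2, 3])
den = dcg(sorted(relevances_in_predicted_order, reverse=True), rank=3)  # DCG([4, 3, 2])
print(num / den)  # NDCG#3 normalized against the true ideal ordering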

Python 3: Lists and loops

I need help with my code when answering the following question.
An arithmetic progression is a sequence of numbers in which the distance (or difference) between any two successive numbers is the same. Thus, in the sequence 1, 3, 5, 7, ..., the distance is 2, while in the sequence 6, 12, 18, 24, ..., the distance is 6.
Given the positive integer distance and the non-negative integer n, create a list consisting of the arithmetic progression between (and including) 1 and n with a distance of distance. For example, if distance is 2 and n is 8, the list would be [1, 3, 5, 7].
Associate the list with the variable arith_prog.
Update: here is my progress so far:
arith_prog = []
for i in range(1, n, distance):
    arith_prog.append(n)
total = n + distance
While the suggestions made so far were helpful, I still haven't arrived at the correct solution turingscraft codelab is looking for.
The range function takes up to three arguments: start, stop, and step. You want
list(range(1, n, distance))
I'm responding to this as a homework question, since you seem to be indicating that's what it is:
First of all, you never initialize n. What starting value should it have?
Second, you don't need two loops here - all you need is one.
Third, why are you passing distance to range()? If you pass two arguments to range(), they're treated as a lower and an upper bound, respectively - and distance is probably not a bound.
The problem is where you have arith_prog.append(n). You need to replace .append(n) with .append(i), because we are adding each value in the range to the list. I just did this homework 15 minutes ago and that was one of the correct solutions. I made the same error you did.
Do something like this:
arith_prog = []
n = 5  # this is just for example, you can use anything you like or do an input
distance = 2  # this is also for example, change it to whatever you like
for i in range(1, n, distance):
    arith_prog.append(i)
print(arith_prog)  # for example this prints out [1, 3]
I also encountered this exercise on myprogramminglab. You were very close. Try this:
arith_prog = []
for i in range(1, n + 1, distance):
    arith_prog.append(i)
total = n + distance
Hope this helps.
Working through MPL and came across this problem, accepted answer below:
arith_prog = []
for i in range(1, n + 1, distance):
    arith_prog.append(i)

Is it possible to count the number of inversions using quicksort?

I have already solved the problem using mergesort; now I am wondering whether it is possible to count the inversions using quicksort. I have also coded a quicksort, but I don't know how to do the counting with it. Here is my mergesort code:
def Merge_and_Count(AL, AR):
    count = 0
    i = 0
    j = 0
    A = []
    for index in range(0, len(AL) + len(AR)):
        if i < len(AL) and j < len(AR):
            if AL[i] > AR[j]:
                A.append(AR[j])
                j = j + 1
                count = count + len(AL) - i
            else:
                A.append(AL[i])
                i = i + 1
        elif i < len(AL):
            A.append(AL[i])
            i = i + 1
        elif j < len(AR):
            A.append(AR[j])
            j = j + 1
    return (count, A)

def Sort_and_Count(Arrays):
    if len(Arrays) == 1:
        return (0, Arrays)
    list1 = Arrays[:len(Arrays) // 2]
    list2 = Arrays[len(Arrays) // 2:]
    (LN, list1) = Sort_and_Count(list1)
    (RN, list2) = Sort_and_Count(list2)
    (M, Arrays) = Merge_and_Count(list1, list2)
    return (LN + RN + M, Arrays)
Generally no, because during the partitioning, when you move a value to its correct side of the pivot, you don't know how many of the values you're moving it past are smaller than it and how many are larger. So, as soon as you do that you've lost information about the number of inversions in the original input.
I have come across this problem a few times. On the whole, I think it should still be possible to use quicksort to compute the inversion count, as long as we make some modifications to the original quicksort algorithm. (I have not verified this yet, sorry.)
Consider the array 3, 6, 2, 5, 4, 1. Suppose we use 3 as the pivot; the most-voted answer is right that the exchanges might mess up the order of the other numbers. However, we can do it differently by introducing a temporary array:
Iterate over the array for the first time. During the iteration, move all the numbers less than 3 into the temporary array. For each such number, also record how many numbers larger than 3 come before it. In this case, the number 2 has one such number (6) before it, and the number 1 has three (6, 5, 4) before it. This can be done with a simple count.
Then copy 3 into the temporary array.
Then iterate over the array again and move the numbers larger than 3 into the temporary array. At the end we get 2, 1, 3, 6, 5, 4.
How many inversion pairs are lost during this process (and therefore need to be counted now)? The number is the sum of all the counts from the first step, plus the count of numbers less than the pivot from the second step. With that, we have counted all the inversions in which one element is >= the pivot and the other is < the pivot. We can then deal with the left part and the right part recursively.
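Here is a rough sketch of that idea (my own interpretation of the description above, not verified against any reference): a stable partition around the first element as pivot, counting every pair in which a value >= the pivot appears before a value < the pivot, then recursing on both sides.

def count_inversions_quick(a):
    if len(a) <= 1:
        return 0, list(a)
    pivot = a[0]
    smaller = []   # values < pivot, in original order
    others = []    # values >= pivot (excluding the pivot itself), in original order
    cross = 0
    seen_ge = 1    # values >= pivot seen so far, starting with the pivot
    for x in a[1:]:
        if x < pivot:
            smaller.append(x)
            cross += seen_ge   # every >= pivot value before x forms an inversion with it
        else:
            others.append(x)
            seen_ge += 1
    left_count, smaller_sorted = count_inversions_quick(smaller)
    right_count, others_sorted = count_inversions_quick(others)
    return cross + left_count + right_count, smaller_sorted + [pivot] + others_sorted

print(count_inversions_quick([3, 6, 2, 5, 4, 1])[0])  # 10 inversions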

Sorting Technique Python

I'm trying to create a sorting technique that sorts a list of numbers. It works by comparing two numbers: the first number in the list, and the number that is 2^k - 1 positions further along.
2^k - 1 = [1, 3, 7, 15, 31, 63, ...]
For example, say I had the list [1, 4, 3, 6, 2, 10, 8, 19].
The length of this list is 8, so the program should find the largest number in the 2^k - 1 list that is less than 8, which in this case is 7.
So now it will compare the first number in the list (1) with the number 7 positions further along (19). If the first is greater than the second, they swap positions.
After this step, it will continue on to 4 and the number 7 positions after that, but that doesn't exist, so now it should compare 4 with the number 3 positions after it, because 3 is the next number down in the 2^k - 1 list.
So it should compare 4 with 2 and swap them if they are not in the right order. This should go on and on until it reaches 1 in the 2^k - 1 list, at which point the list will finally be sorted.
I need help getting started on this code.
So far, I've written a small piece of code that builds the 2^k - 1 list, but that's as far as I've gotten.
a = []
for i in range(10):
    a.append(2**(i+1) - 1)
print(a)
EXAMPLE:
Consider sorting the sequence V = 17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1. The skipping sequence 1, 3, 7, 15, ... yields r = 7 as the biggest value that fits, so looking at V, the first sparse subsequence = 17, 9. As we pass along V we produce 9, 4, 8, 2, 11, 5, 14, 17, 18, 12, 7, 1 after the first swap, and 9, 4, 8, 2, 1, 5, 14, 17, 18, 12, 7, 11 after using r = 7 completely. Using a = 3 (the next smaller term in the skipping sequence), the first sparse subsequence = 9, 2, 14, 12, which when applied to V gives 2, 4, 8, 9, 1, 5, 12, 17, 18, 14, 7, 11, and the remaining a = 3 sorts give 2, 1, 8, 9, 4, 5, 12, 7, 18, 14, 17, 11, and then 2, 1, 5, 9, 4, 8, 12, 7, 11, 14, 17, 18. Finally, with a = 1, we get 1, 2, 4, 5, 7, 8, 9, 11, 12, 14, 17, 18.
You might wonder, given that at the end we do a sort with no skips, why this might be any faster than simply doing that final step as the only step at the beginning. Think of it as a comb going through the sequence: in the earlier steps we are using coarse combs to get distant things in the right order, using progressively finer combs until at the end our fine-tuning is dealing with a nearly sorted sequence needing little adjustment.
p = 0
x = len(V)  # finding out the length of V to find the right value in a
for j in a:  # for every element in a (1, 3, 7, ...)
    if x >= j:  # if the length is greater than or equal to the current value being checked
        p = j  # sets p to j
So that finds the distance at which to compare the first number in the list, but now I need to write something that keeps doing that until the distance is out of range, so that it switches from 3 to 1 and then just checks the smaller distances until the list is sorted.
The sorting algorithm you're describing is actually called Combsort. In fact, the simpler bubblesort is a special case of combsort in which the gap is always 1 and never changes.
Since you're stuck on how to start this, here's what I recommend:
Implement the bubblesort algorithm first. The logic is simpler and makes it much easier to reason about as you write it.
Once you've done that, you have the important algorithmic structure in place, and from there it's just a matter of adding the gap-length calculation into the mix: compute the gap length with your particular formula, then modify the loop control index and the inner comparison index to use the calculated gap length.
After each iteration of the loop, you decrease the gap length (in effect making the comb finer) by some scaling amount.
The last step would be to experiment with different gap lengths and formulas to see how it affects algorithm efficiency.
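For reference, here is a rough sketch of a gap-based sort along the lines described in the question, using the 2^k - 1 skip sequence (this is my own illustration of the approach, not a tuned combsort):

def gap_sort(v):
    n = len(v)
    # Build the skip sequence 1, 3, 7, 15, ... up to the list length.
    gaps = []
    k = 1
    while 2**k - 1 <= n:
        gaps.append(2**k - 1)
        k += 1
    # Work from the largest gap down to 1, doing bubble-style passes at each gap.
    for gap in reversed(gaps):
        swapped = True
        while swapped:
            swapped = False
            for i in range(n - gap):
                if v[i] > v[i + gap]:
                    v[i], v[i + gap] = v[i + gap], v[i]
                    swapped = True
    return v

print(gap_sort([17, 4, 8, 2, 11, 5, 14, 9, 18, 12, 7, 1]))
# [1, 2, 4, 5, 7, 8, 9, 11, 12, 14, 17, 18]

The final gap of 1 is just a bubblesort, so the list is guaranteed to end up sorted; the larger gaps beforehand move far-away elements roughly into place so that last pass has little left to do.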
