finding n largest differences between two lists

I have two lists old and new, with the same number of elements.
I'm trying to write an efficient function that takes n as a parameter, compares the elements of two lists at the same locations (by index), finds n largest differences, and returns the indices of those n elements.
I was thinking this would be best solved by a value-sorted dictionary, but one isn't available in Python (and I'm not aware of any libraries that offer it). Perhaps there's a better solution?

Whenever you think "n largest", think heapq.
>>> import heapq
>>> import random
>>> l1 = [random.randrange(100) for _ in range(100)]
>>> l2 = [random.randrange(100) for _ in range(100)]
>>> heapq.nlargest(10, (((a - b), a, b) for a, b in zip(l1, l2)))
[(78, 99, 21), (75, 86, 11), (69, 90, 21), (69, 70, 1), (60, 86, 26), (55, 95, 40), (52, 56, 4), (48, 98, 50), (46, 80, 34), (44, 81, 37)]
This will find the x largest items in O(n log x) time, where n is the total number of items in the list; sorting does it in O(n log n) time.
It just occurred to me that the above doesn't actually do what you asked for. You want an index! Still very easy. I'll also use abs here in case you want the absolute value of the difference:
>>> heapq.nlargest(10, xrange(len(l1)), key=lambda i: abs(l1[i] - l2[i]))
[91, 3, 14, 27, 46, 67, 59, 39, 65, 36]
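Wrapped up as a function, the index-based call might look like the sketch below. It assumes Python 3 (range instead of xrange); the function name largest_diff_indices and the sample lists are made up for illustration.

```python
import heapq

def largest_diff_indices(old, new, n):
    """Return the indices of the n largest absolute differences, largest first."""
    if len(old) != len(new):
        raise ValueError("lists must have the same length")
    # nlargest ranks the indices by the difference found at each index
    return heapq.nlargest(n, range(len(old)), key=lambda i: abs(old[i] - new[i]))

old = [10, 20, 30, 40, 50]
new = [12, 5, 30, 90, 49]
print(largest_diff_indices(old, new, 2))  # [3, 1]
```

The differences here are 2, 15, 0, 50 and 1, so index 3 comes first and index 1 second.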

Assuming the number of elements in the lists isn't huge, you could just difference all of them, sort, and pick the first n:
print sorted((abs(x-y) for x,y in zip(old, new)), reverse=True)[:n]
This would be O(k log k) where k is the length of your original lists.
If n is significantly smaller than k, the best idea would be to use the nlargest function provided by the heapq module:
import heapq
print heapq.nlargest(n, (abs(x-y) for x,y in zip(old, new)))
This will be O(k log n) instead of O(k log k), which can be significant for k >> n.
Also, if your lists are really big, you'd probably be better off using itertools.izip instead of the regular zip function (in Python 3, zip is already lazy and izip no longer exists).
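The two snippets above return the differences themselves rather than their positions. If the indices are what you need, one possible variant of the sort-based approach is to sort the indices by their difference. This is a Python 3 sketch; the function name and sample data are illustrative only.

```python
def largest_diff_indices_sorted(old, new, n):
    """Sort all indices by absolute difference (descending) and keep the first n."""
    diffs = [abs(x - y) for x, y in zip(old, new)]
    order = sorted(range(len(diffs)), key=diffs.__getitem__, reverse=True)
    return order[:n]

old = [1, 5, 9, 3]
new = [4, 5, 1, 10]
print(largest_diff_indices_sorted(old, new, 2))  # [2, 3]
```

This is still O(k log k), so it only makes sense when the lists are small or when most of the ordering is needed anyway.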

From your question I think this is what you want:
In difference.py
l1 = [15,2,123,4,50]
l2 = [9,8,7,6,5]
l3 = zip(l1, l2)
def f(n):
    diff_val = 0
    index_val = 0
    l4 = l3[:n]
    for x,y in l4:
        if diff_val < abs(x-y):
            diff_val = abs(x-y)
            elem = (x, y)
            index_val = l3.index(elem)
    print "largest diff: ", diff_val
    print "index of values:", index_val
n = input("Enter value of n:")
f(n)
Execution:
[avasal#avasal ]# python difference.py
Enter value of n:4
largest diff: 116
index of values: 2
[avasal#avasal]#
If this is not what you want, consider elaborating on the question a little more.

>>> import itertools
>>> l = []
>>> for i in itertools.starmap(lambda x, y: abs(x-y), itertools.izip([1,2,3], [100,102,330])):
...     l.append(i)
...
>>> l
[99, 100, 327]
itertools comes in handy for repetitive tasks. starmap unpacks each tuple into positional arguments (*args) for the function. With the max function you can get the largest difference, and the index method will give you its position:
>>> l.index(max(l))
2

Here's a solution hacked together in numpy (disclaimer: I'm a novice in numpy, so there may be even slicker ways to do this). I didn't combine any of the steps, so it is very clear what each step is doing. The final value is an array of the indexes of the original lists in order of the highest delta. Picking the top n is simply sorted_ind[:n], and retrieving the values from each list or from the delta list is trivial.
I don't know how it compares in performance to the other solutions and it's obviously not going to show up with such a small data set, but it might be worth testing with your real data set as my understanding is that numpy is very very fast for numerical array operations.
Code
import numpy
list1 = numpy.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
list2 = numpy.array([9, 8, 7, 6, 5, 4, 3, 2, 1])
#Calculate the delta between the two lists
delta = numpy.abs(numpy.subtract(list1, list2))
print('Delta: '.ljust(20) + str(delta))
#Get a list of the indexes of the sorted order delta
sorted_ind = numpy.argsort(delta)
print('Sorted indexes: '.ljust(20) + str(sorted_ind))
#reverse sort
sorted_ind = sorted_ind[::-1]
print('Reverse sort: '.ljust(20) + str(sorted_ind))
Output
Delta: [8 6 4 2 0 2 4 6 8]
Sorted indexes: [4 3 5 2 6 1 7 0 8]
Reverse sort: [8 0 7 1 6 2 5 3 4]

Related

How to make list of lists, where you take the square and cube of each even number?

I would like to make a list of lists after squaring and cubing each even number.
Below is the code I have thus far:
def sq_cube(numbers):
    ls1 = []
    for i in numbers:
        if i%2 == 0:
            ls1.append(i)
        else:
            pass
    ls2square = [x**2 for x in ls1]
    ls3cube = [x**3 for x in ls1]
    ls4all = list(ls2square +ls3cube)
    return ls4all
RUN: sq_cube([1,2,3,4,6])
OUTPUT:[4, 16, 36, 8, 64, 216]
I would love my OUTPUT to be: [[4, 8], [16, 64], [36, 216]]
ls1: Here I filtered the list 1,2,3,4,6 down to its even numbers.
ls2square: Squared the even number in ls1.
ls3cube: Cubed the even numbers in ls1.
As you can see in my OUTPUT it gives both lists but it does not give each even
number its separate list where that even number was squared and cubed.

The problem comes from ls4all. Your code is
ls4all = list(ls2square +ls3cube)
The variable ls4all does not contain the desired list of lists, [[4, 8], [16, 64], [36, 216]], since it is the concatenation of ls2square and ls3cube. What we want instead is the list of pairs of corresponding elements of ls2square and ls3cube. To achieve that, you can create an iterator that outputs those pairs, for instance using zip.
zip works like this:
>>> list(zip([1,10], [2,20]))
[(1, 2), (10, 20)]
As you can see, it gathers together elements in [1,10] and [2,20], making pairs.
So you can use zip this way:
ls2square = [4, 16, 36]
ls3cube = [8, 64, 216]
ls4all = list([a,b] for a,b in zip(ls2square, ls3cube))
print(ls4all)
Anyway, a shorter answer would be:
>>> print([(k**2, k**3) for k in [1,2,3,4,6] if k % 2 == 0])
[(4, 8), (16, 64), (36, 216)]
UPDATE:
If your numbers are floats, k % 2 still works (for example, 2.0 % 2 evaluates to 0.0), but the squares and cubes come out as floats, and non-integer values such as 2.5 never pass the even test.
So, if you want to round each float to the closest integer first, just use the round function like so:
ls2square = [round(x)**2 for x in ls1]
ls3cube = [round(x)**3 for x in ls1]
The short answer above then becomes:
>>> print([(round(k)**2, round(k)**3) for k in [1.0, 2.0, 3.0, 4.0, 6.0] if round(k) % 2 == 0])
[(4, 8), (16, 64), (36, 216)]
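The one-liner above yields tuples, while the question asked for a list of lists. A minimal sketch of the same comprehension as a function that returns inner lists, assuming integer input, could be:

```python
def sq_cube(numbers):
    """Return [square, cube] pairs for every even number, in input order."""
    return [[k**2, k**3] for k in numbers if k % 2 == 0]

print(sq_cube([1, 2, 3, 4, 6]))  # [[4, 8], [16, 64], [36, 216]]
```

Building each pair directly avoids the two intermediate lists and the later zip step.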

Since you want the square and the cube of the same number to be in the same list, you could use this. Both lists (ls2square, ls3cube) have the same length, so instead of just adding them together you can put each pair of corresponding elements in a separate list: element 0 of ls2square goes with element 0 of ls3cube, and so on:
def sq_cube(numbers):
    ls1 = []
    for i in numbers:
        if i%2 == 0:
            ls1.append(i)
        else:
            pass
    ls2square = [x**2 for x in ls1]
    ls3cube = [x**3 for x in ls1]
    ls4all= [[ls2square[k],ls3cube[k]] for k in range(len(ls3cube))]
    return ls4all

Create a new list
Round the incoming list
Loop over every number to check if it is even
Then square and cube that number
def sq_cube(numbers):
    new_string = []
    numbers = [round(num) for num in numbers]
    for W in numbers:
        if W % 2 ==0:
            new_string.append([W**2,W**3])
    return new_string

How to add every nth entry in a python list to each other?

Let's say I have a python list:
[4,5,25,60,19,2]
How can I add every nth entry to each other?
e.g. I split the list into 3 entries [ 4,5 / 25,60 / 19,2 ], then add these entries in order to get a new list:
[4+25+19, 5+60+2]
Which gives me the sum:
[48, 67]
For a more complex example, let's say I have 2000 entries in my list. I want to add every 100th entry to the one before so I get 100 entries in the new list. Each entry would now be the sum of every 100th entry.

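Iteratively extract your slices and sum them up.
>>> l = [4,5,25,60,19,2]
>>> [sum(l[i::2]) for i in range(len(l) // 3)]
[48, 67]
Here 2 is the length of each chunk (len(l) // 3), which serves both as the slice step and as the number of sums. You may have to do a bit more to handle corner cases but this should be a good start for you.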
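The slicing idea generalizes if the chunk length is passed in explicitly. The following is only a sketch; the name strided_sums is made up, and it assumes the list length is a multiple of the chunk length (otherwise the leftover elements are simply added to the first few sums).

```python
def strided_sums(values, chunk_size):
    """Sum the elements that share the same position within each chunk."""
    # values[i::chunk_size] picks element i of every chunk
    return [sum(values[i::chunk_size]) for i in range(chunk_size)]

print(strided_sums([4, 5, 25, 60, 19, 2], 2))  # [48, 67]

# The 2000-entry case from the question: 20 chunks of 100 entries each
totals = strided_sums(list(range(2000)), 100)
print(len(totals))  # 100
```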

The itertools documentation has a recipe function called grouper; you can import it from more_itertools (needs manual install) or copy and paste it.
It works like this (recent versions of more_itertools take the iterable first; older versions used grouper(2, l)):
>>> from more_itertools import grouper
>>> l = [4,5,25,60,19,2]
>>> list(grouper(l, 2)) # 2 = len(l)/3
[(4, 5), (25, 60), (19, 2)]
You can transpose the output of grouper with zip and apply sum to each group.
>>> [sum(g) for g in zip(*grouper(l, 2))]
[48, 67]
I prefer this to manually fiddling with indices. In addition, it works with any iterable, not just lists. A generic iterable may not support indexing or slicing, but it will always be able to produce a stream of values.
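If you would rather not install more_itertools, a copy-and-paste version along the lines of the itertools recipe might look like this. It is a sketch for Python 3; column_sums is a made-up helper name, and padding the last chunk with 0 is an assumption that suits summing.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one with fillvalue."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def column_sums(iterable, n):
    """Sum the values that sit at the same position in each n-sized chunk."""
    return [sum(column) for column in zip(*grouper(iterable, n, fillvalue=0))]

print(column_sums([4, 5, 25, 60, 19, 2], 2))     # [48, 67]
print(column_sums(iter([4, 5, 25, 60, 19]), 2))  # [48, 65]
```

The second call passes a plain iterator with an odd number of items, which shows that no indexing or slicing is needed.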

Using the chunks function taken from here, you could write the following:
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
l = [4,5,25,60,19,2]
# zip(*...) transposes the chunks so that sum adds up matching positions
print([sum(e) for e in zip(*chunks(l, 2))])  # [48, 67]

There may be a smart sequence of list operations that you could use but I couldn't think of any. So instead I just wrote a loop that goes from 0 to n-1 and, within the confines of the list, adds the elements, going every n. So if n=3, you go 0, 3, 6, etc.; then 1, 4, 7, etc., and put each sum into the output list.
The code is attached below. Hope it helps.
list1 = [7, 6, -5.4, 6, -4, 55, -21, 45, 67, -9, -8, -7, 8, 9, 11, 110, -0.8, -9.8, 1.1]
n = 5
list2 = []
sum_elem = 0
for i in range(n):
    sum_elem = 0
    j = i
    while j < len( list1 ):
        sum_elem += list1[j]
        j += n
    list2.append(sum_elem)
print( list2 )

PYTHON - finding the maximum of every 10 integers in an array

I have a large array of integers, and I need to print the maximum of every 10 integers and its corresponding index in the array as a pair.
ex. (max_value, index of max_value in array)
I can successfully find the maximum value and the corresponding index within the first 10 integers, however I am having trouble looping through the entire array.
I have tried using:
a = some array of integers
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for i in split:
    j = max(i)
    k = i.index(max(i))
    print (j,k)
The issue with this method is that it splits my array into chunks of 10, so the max_values are correct, but the indexes are inaccurate (all of the indexes are between 0 and 9).
I need to find a way of doing this that doesn't split my array into chunks so that the original indices are retained. I'm sure there is an easier way of looping through to find max values but I can't seem to figure it out.

A small modification to your current code:
a = some array of integers
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for index, i in enumerate(split):
    j = max(i)
    k = i.index(max(i))
    print (j, k+10*index)

You need to count the number of elements that appear before the current window. This will do the job:
a=list(range(5,35))
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for ind,i in enumerate(split):
    j = max(i)
    k = i.index(j)
    print (j,k+ind*10)
This prints
(14, 9)
(24, 19)
(34, 29)
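The same bookkeeping can be done without building the chunks at all, by taking the maximum over a range of original indices. This is a Python 3 sketch; chunk_maxima is a hypothetical name, and it assumes the input supports indexing (a list or similar).

```python
def chunk_maxima(a, size=10):
    """Return (max_value, original_index) for each consecutive block of `size` items."""
    results = []
    for start in range(0, len(a), size):
        end = min(start + size, len(a))
        # Pick the index whose value is largest; ties resolve to the first occurrence
        best = max(range(start, end), key=a.__getitem__)
        results.append((a[best], best))
    return results

print(chunk_maxima(list(range(5, 35))))  # [(14, 9), (24, 19), (34, 29)]
```

Because the indices are never rebased, no offset has to be added back, and a shorter final block is handled by the min() call.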

Debugging with an example array, we find that split is a 2D list like this one:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]]
Every time the for loop runs, it goes through one of those inner lists in order: first the first inner list, then the second one, and so on. So every time the for loop moves on to the next list, we simply add 10. Since there can be more than two inner lists, we store the number we need to add in a variable and add 10 to it on every iteration:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
counter = 0
for i in split:
    j = max(i)
    k = i.index(max(i))
    print (j,k+counter)
    counter += 10

The toolz package has a partition_all function that divides a sequence up into equal-sized tuples, so you can do something like this.
import toolz
ns = list(range(25))
[max(sublist) for sublist in toolz.partition_all(10, ns)]
This will return [9, 19, 24].

You will need a loop to iterate through the list; however, we could change the loop over split so that it does what you want.
a = some array of integers
split = [a[i:i+10] for i in xrange(0, len(a), 10)]
for i in range(len(split)):
    #Now instead of being the list, i is the index, so we can use 10*i as a counter
    j = max(split[i])
    #j = max(i)
    k = split[i].index(j) + 10*i #replaced max(i) with j since we already calculated it.
    #k = i.index(max(i))
    print (j,k)
Though in the future, please consider a different name for your split list, since split is also the name of a string method in Python (str.split). Perhaps split_list or separated, or some other name that can't be confused with split().

numpy solution for arbitrary input:
import numpy as np
a = np.random.randint(1,21,40) #40 random numbers from 1 to 20
b = a.reshape([4,10]) #shape into chunks 10 numbers long
i = b.argsort()[:,-1] #take the index of the largest number (last number from argsort)
# from each chunk. (these don't take into account the reshape)
i += np.arange(0,40,10) #add back in index offsets due to reshape
out = zip(i, a[i]) #zip together indices and values

You could simplify this by only enumerating once and using zip to partition your list into groups:
n=10
for grp in zip(*[iter(enumerate(some_list))]*n):
    grp_max_ind, grp_mv=max(grp, key=lambda t: t[1])
    k=[t[1] for t in grp].index(grp_mv)
    print grp_mv, (grp_max_ind, k)
Use izip in Python 2 if you want a generator (or use Python 3):
from itertools import izip
for grp in izip(*[iter(enumerate(some_list))]*n):
    grp_max_ind, grp_mv=max(grp, key=lambda t: t[1])
    k=[t[1] for t in grp].index(grp_mv)
    print grp_mv, (grp_max_ind, k)
zip will truncate the last group if its length is not n.
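If the trailing partial group matters, one way to keep it is to pull groups off the enumerated iterator with islice instead of zip. This is a Python 3 sketch under that assumption; indexed_chunks and the sample data are illustrative names, not part of the answer above.

```python
from itertools import islice

def indexed_chunks(values, n):
    """Yield lists of (index, value) pairs, n at a time; the last list may be shorter."""
    pairs = enumerate(values)
    while True:
        chunk = list(islice(pairs, n))
        if not chunk:
            return
        yield chunk

data = [3, 9, 2, 7, 1, 8, 4]
for chunk in indexed_chunks(data, 3):
    index, value = max(chunk, key=lambda pair: pair[1])
    print(value, index)
# prints:
# 9 1
# 8 5
# 4 6
```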

An example using numpy. First let's generate some data, i.e., integers ranging from 1 to V and of length (number of values) L:
import numpy as np
V = 1000
L = 45 # method works with arrays not multiples of 10
a = np.random.randint(1, V, size=L)
Now solve the problem for sub-arrays of size N:
import numpy as np
N = 10 # example "split" size
sa = np.array_split(a, range(N, len(a), N))
sind = [np.argpartition(i, -1)[-1] for i in sa]
ind = [np.ravel_multi_index(i, (len(sa), N)) for i in enumerate(sind)]
vals = np.asarray(a)[np.asarray(ind)]
split_imax = zip(vals, ind) # <-- output

Finding the difference between consecutive numbers in a list (Python)

Given a list of numbers, I am trying to write a code that finds the difference between consecutive elements. For instance, A = [1, 10, 100, 50, 40] so the output of the function should be [0, 9, 90, 50, 10]. Here is what I have so far trying to use recursion:
def deviation(A):
    if len(A) < 2:
        return
    else:
        return [abs(A[0]-A[1])] + [deviation(A[1: ])]
The output I get, however, (using the above example of A as the input) is [9, [90, [50, [10, None]]]]. How do I properly format my brackets? (I've tried guessing and checking, but this is the closest I have gotten.) And how do I write this so that it subtracts the current element from the previous element without getting an index error for the first element? I still want the first element of the output list to be zero, but I do not know how to go about this using recursion, and for some reason that seems the best route to me.

You can do:
[y-x for x, y in zip(A[:-1], A[1:])]
>>> A = [1, 10, 100, 50, 40]
>>> [y-x for x, y in zip(A[:-1], A[1:])]
[9, 90, -50, -10]
Note that the difference will be negative if the right side is smaller, you can easily fix this (If you consider this wrong), I'll leave the solution for you.
Explanation:
The best explanation you can get is simply printing each part of the list comprehension.
A[:-1] returns the list without the last element: [1, 10, 100, 50]
A[1:] returns the list without the first element: [10, 100, 50, 40]
zip(A[:-1], A[1:]) returns [(1, 10), (10, 100), (100, 50), (50, 40)]
The last step is simply returning the difference in each tuple.
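Putting the pieces together for the output the question asked for (absolute differences with a leading zero) might look like the sketch below. It relies on zip stopping at the shorter input, so zip(A, A[1:]) pairs the same elements as zip(A[:-1], A[1:]); returning an empty list for empty input is an assumption.

```python
def deviation(A):
    """Absolute differences between consecutive elements, with a leading 0."""
    if not A:
        return []
    return [0] + [abs(y - x) for x, y in zip(A, A[1:])]

print(deviation([1, 10, 100, 50, 40]))  # [0, 9, 90, 50, 10]
```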

The simplest (laziest) solution is to use the numpy function diff:
>>> import numpy as np
>>> A = [1, 10, 100, 50, 40]
>>> np.diff(A)
array([ 9, 90, -50, -10])
If you want the absolute value of the differences (as you've implied by your question), then take the absolute value of the array with np.abs(np.diff(A)).

[abs(j-A[i+1]) for i,j in enumerate(A[:-1])]

You can do a list comprehension:
>>> A = [1, 10, 100, 50, 40]
>>> l=[A[0]]+A
>>> [abs(l[i-1]-l[i]) for i in range(1,len(l))]
[0, 9, 90, 50, 10]

For a longer recursive solution more in line with your original approach:
def deviation(A):
    if len(A) < 2:
        return []
    else:
        return [abs(A[0]-A[1])] + deviation(A[1:])
Your bracket issue is with your recursive call. Since you have your [deviation(A[1: ])] in its own [] brackets, every recursive call wraps its result in a new list, resulting in your many lists within lists.
In order to fix the None issue, just change your base case to an empty list []. Now your function will add 'nothing' to the end of your recursively made list, as opposed to the implicit None that comes with a bare return.

Actually, recursion is overkill:
def deviation(A):
    yield 0
    for i in range(len(A) - 1):
        yield abs(A[i+1] - A[i])
Example:
>>> A = [3, 5, 2]
>>> list(deviation(A))
[0, 2, 3]
EDIT: Yet another, even simpler and more efficient solution would be this:
def deviation(A):
    prev = A[0]
    for el in A:
        yield abs(el - prev)
        prev = el

What is the cleanest way to search a word into a list of strings? [duplicate]

In Python, how do you find the index of the first value greater than a threshold in a sorted list?
I can think of several ways of doing this (linear search, hand-written dichotomy,..), but I'm looking for a clean and reasonably efficient way of doing it. Since it's probably a pretty common problem, I'm sure experienced SOers can help!
Thanks!

Have a look at bisect.
import bisect
l = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
bisect.bisect(l, 55) # returns 7
Compare it with linear search:
timeit bisect.bisect(l, 55)
# 375ns
timeit next((i for i,n in enumerate(l) if n > 55), len(l))
# 2.24us
timeit next((l.index(n) for n in l if n > 55), len(l))
# 1.93us
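One detail to keep in mind is that bisect returns len(l) when no element exceeds the threshold. A small wrapper can make that case explicit; this is a sketch, and the function name and the choice of None as the "not found" value are assumptions.

```python
from bisect import bisect_right

def first_index_greater(sorted_values, threshold):
    """Return the index of the first value > threshold, or None if there is none."""
    i = bisect_right(sorted_values, threshold)
    return i if i < len(sorted_values) else None

squares = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(first_index_greater(squares, 55))   # 7
print(first_index_greater(squares, 49))   # 7
print(first_index_greater(squares, 100))  # None
```

The second call shows the strict comparison: 49 itself sits at index 6, so the first value greater than 49 is at index 7.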

You might get a better time than the enumerate/generator approach using itertools; I think itertools provides faster implementations of the underlying algorithms, for the performance mongers in all of us. But bisect may still be faster.
from itertools import dropwhile
threshold = 5
seq = [1,4,6,9,11]
# dropwhile skips values <= threshold; next() takes the first one that remains
first_val = next(dropwhile(lambda x: x<=threshold, seq))
result = seq.index(first_val)
I wonder about the difference between the bisect approach shown here and the one listed for your question in the doc examples, as far as idiom/speed. They show an approach for finding the value, but truncated to the first line, it returns the index. Note that bisect is just an alias for bisect_right, so the two calls behave identically: both return the insertion point to the right of any entries equal to x. Given that your list is sorted and you want greater-than, this is the search you need.
from bisect import bisect_right
def find_gt(a, x):
    'Find index of leftmost value greater than x'
    return bisect_right(a, x)
Interesting question.

Related Index and Val of the last element greater than a threshold
l = [1, 4, 9, 16, 25, 36, 49, 64, 100, 81, 100]
max((x,i) for i, x in enumerate(l) if x > 4)
(100, 10)
