I have a list such as:
results = [100, 100, -100, 100, -100, -100]
I would like to pair each number with the first unmatched occurrence of its opposite, so the first 100 would be matched with the first -100, the second 100 with the second -100, and so on.
I would like to have the positions as output, such as:
[0, 2], [1, 4], [3, 5]
i.e. [0, 2] represents results[0] and results[2], where the first occurrence of 100 is matched with the first occurrence of -100.
Edit: you can assume there will always be the same number of positive and negative values, and that the list will only contain one number and its opposite.
Any help would be appreciated.
For your simple case where the list only contains 2 integers (x and -x), you could simply zip() together the indexes:
indexes = [[],[]]
for i,x in enumerate(results):
    indexes[0].append(i) if x > 0 else indexes[1].append(i)
list(zip(*indexes))
Example:
>>> results = [100, 100, -100, 100, -100, -100]
>>> indexes = [[],[]]
>>> for i,x in enumerate(results): indexes[0].append(i) if x > 0 else indexes[1].append(i)
...
>>> list(zip(*indexes))
[(0, 2), (1, 4), (3, 5)]
Note: for small inputs, two separate list comprehensions (e.g. [i for i,x in enumerate(results) if x > 0]) may be faster than appending in a for loop.
IMO, the fastest approach (for large inputs) should be the following one (though, my solution doesn't assume that the input list contains just one value and its opposite, so it can be made even faster if that assumption is added):
x = [100, 300, -300, 100, -100, -100]
from collections import defaultdict, deque
unmatched_positives = defaultdict(deque)
solution=[]
for i, val in enumerate(x):
    if val > 0:
        unmatched_positives[val].append(i)
    else:
        solution.append( (unmatched_positives[-val].popleft(), i) )
print('Unsorted solution:', solution)
# If you need the result to be sorted
print('Sorted solution:', sorted(solution))
Output:
Unsorted solution: [(1, 2), (0, 4), (3, 5)]
Sorted solution: [(0, 4), (1, 2), (3, 5)]
This should work:
results = [100, 100, -100, 100, -100, -100]
solution = []
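for i, x in enumerate(results):
    # Check the type first: matched entries are overwritten with 'found',
    # and comparing a str with 0 raises TypeError on Python 3.
    if isinstance(x, int) and x > 0:
        y = results.index(-x)
        results[y] = 'found'
        solution.append([i, y])
print(solution)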
This would work as well for the general case in which different numbers occur:
solutions = []
for x in set(abs(x) for x in results):
    solutions += list(zip([i for i, x2 in enumerate(results) if x2 == x],
                          [i for i, x2 in enumerate(results) if x2 == x*-1]))
We can do this efficiently in two phases. In the analysis phase, we keep only the negative numbers, sort them by value and then by index, and group them by value, like:
from itertools import groupby
subresult = dict(map(lambda x:(x[0],iter(tuple(x[1]))),
groupby(sorted(filter(lambda x:x[1] < 0,enumerate(results)),
key=lambda x:x[::-1]),lambda x:x[1])
))
Or we can generate it step-by-step, like:
subresult = filter(lambda x:x[1] < 0,enumerate(results)) # filter negative values
subresult = sorted(subresult,key=lambda x:x[::-1]) # sort them on value and then on index
subresult = groupby(subresult,lambda x:x[1]) # group them on the value
subresult = map(lambda x:(x[0],iter(tuple(x[1]))),subresult) # construct a sequence of tuples (value, iterator over its (index, value) pairs)
subresult = dict(subresult) # make it a dictionary
This generates a dictionary:
{-100: <itertools._grouper object at 0x7fedfb523ef0>}
Next, in the construction phase, we iterate over all positive integers and always take the next opposite one from the subresult dictionary, like:
end_result = [[i,next(subresult[-v])[0]] for i,v in enumerate(results) if v > 0]
This generates:
>>> subresult = dict(map(lambda x:(x[0],iter(tuple(x[1]))),groupby(sorted(filter(lambda x:x[1] < 0,enumerate(results)),key=lambda x:x[::-1]),lambda x:x[1])))
>>> [[i,next(subresult[-v])[0]] for i,v in enumerate(results) if v > 0]
[[0, 2], [1, 4], [3, 5]]
Because of the dictionary lookup, and because each iterator keeps track of which index comes next, this usually works quite efficiently.
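The same two-phase idea can be sketched without sorting or groupby. This is only an illustration, not the answer's code: it swaps in a defaultdict of deques to hold the negative indices, and pair_opposites is a hypothetical helper name. It assumes every positive value has an unused opposite somewhere in the list.

```python
from collections import defaultdict, deque


def pair_opposites(values):
    """Pair each positive value's index with the next unused index of its opposite."""
    # Analysis phase: queue the indices of each negative value in list order.
    negatives = defaultdict(deque)
    for index, value in enumerate(values):
        if value < 0:
            negatives[value].append(index)
    # Construction phase: each positive value takes the oldest unused opposite index.
    return [[index, negatives[-value].popleft()]
            for index, value in enumerate(values) if value > 0]


print(pair_opposites([100, 100, -100, 100, -100, -100]))
# [[0, 2], [1, 4], [3, 5]]
```

Because enumerate already yields indices in increasing order, the queues are sorted by construction, which is why no explicit sort is needed here.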
How about this simple observation-based approach? Split the indices into two lists using list comprehensions, then just zip them in the order you want.
Using list comprehension
In [18]: neg_list = [idx for idx, el in enumerate(results) if el < 0]
In [19]: pos_list = [idx for idx, el in enumerate(results) if el > 0]
In [20]: neg_list
Out[20]: [2, 4, 5]
In [21]: pos_list
Out[21]: [0, 1, 3]
In [22]: list(zip(pos_list, neg_list))
Out[22]: [(0, 2), (1, 4), (3, 5)]
You can also change which index comes first in each pair by changing the order in which you zip the lists.
NumPy Version:
For larger lists (or arrays equivalently), the numpy version should be much faster.
In [30]: res = np.array(results)
In [38]: pos_idx = np.where(res > 0)[0]
In [39]: pos_idx
Out[39]: array([0, 1, 3])
In [40]: neg_idx = np.where(res < 0)[0]
In [42]: neg_idx
Out[42]: array([2, 4, 5])
In [44]: list(zip(pos_idx, neg_idx))
Out[44]: [(0, 2), (1, 4), (3, 5)]
# If you want to avoid using zip, then
# just use np.vstack and transpose the result
In [59]: np.vstack((pos_idx, neg_idx)).T
Out[59]:
array([[0, 2],
       [1, 4],
       [3, 5]])
P.S.: You could also use generator expressions to achieve the same result, but please note that a generator is exhausted once you have converted it to a list.
Using generator comprehension
In [24]: neg_gen = (idx for idx, el in enumerate(results) if el < 0)
In [25]: pos_gen = (idx for idx, el in enumerate(results) if el > 0)
In [27]: list(zip(pos_gen, neg_gen))
Out[27]: [(0, 2), (1, 4), (3, 5)]
# on 2nd run, there won't be any element in the generator.
In [28]: list(zip(pos_gen, neg_gen))
Out[28]: []
pos = {}
for i,item in enumerate(results ):
    if item < 0: continue
    if item not in pos:
        pos[item] = []
    pos[item].append(i)
[ [pos[-item].pop(0), i] for i,item in enumerate(results ) if item < 0]
[[0, 2], [1, 4], [3, 5]]
For the sample case where results only contains two different integers:
import numpy as np
results = np.array([100, 100, -100, 100, -100, -100])
output = list(zip(np.where(results > 0)[0], np.where(results < 0)[0]))
Output:
[(0, 2), (1, 4), (3, 5)]
Time is ~0.002 for results * 1000.
I have the following DataFrame:
df = pd.DataFrame({'index':[0,1,2,3,4,5,6,7,8,9,10], 'X':[0,0,1,1,0,0,1,1,1,0,0]})
df.set_index('index', inplace = True)
       X
index
0      0
1      0
2      1
3      1
4      0
5      0
6      1
7      1
8      1
9      0
10     0
What I need is to return a list of tuples showing the index value for the first and last instances of the 1s for each sequence of 1s (sorry if that's confusing). i.e.
Want:
[(2,3), (6,8)]
The first instance of the first 1 occurs at index point 2, then the last 1 in that sequence occurs at index point 3. The next 1 occurs at index point 6, and the last 1 in that sequence occurs at index point 8.
What I've tried:
I can grab the first one using numpy's argmax function. i.e.
x1 = np.argmax(df.values)
y1 = np.argmin(df.values[x1:])
(x1, x1 + y1 - 1)
Which will give me the first tuple, but iterating through seems messy and I feel like there's a better way.
You need more_itertools.consecutive_groups
import more_itertools as mit
def find_ranges(iterable):
    """Yield range of consecutive numbers."""
    for group in mit.consecutive_groups(iterable):
        group = list(group)
        if len(group) == 1:
            yield group[0]
        else:
            yield group[0], group[-1]
list(find_ranges(df['X'][df['X']==1].index))
Output:
[(2, 3), (6, 8)]
You can use a third-party library: more_itertools
loc with mit.consecutive_groups
import more_itertools as mit
x = [list(group) for group in mit.consecutive_groups(df.loc[df.X == 1].index)]
# [[2, 3], [6, 7, 8]]
Simple list comprehension:
x = [(i[0], i[-1]) for i in x]
# [(2, 3), (6, 8)]
An approach using numpy, adapted from a great answer by @Warren Weckesser:
def runs(a):
    isone = np.concatenate(([0], np.equal(a, 1).view(np.int8), [0]))
    absdiff = np.abs(np.diff(isone))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return [(i, j-1) for i, j in ranges]
runs(df.X.values)
# [(2, 3), (6, 8)]
Here's a pure pandas solution:
df.groupby(df['X'].eq(0).cumsum().mask(df['X'].eq(0)))\
.apply(lambda x: (x.first_valid_index(),x.last_valid_index()))\
.tolist()
Output:
[(2, 3), (6, 8)]
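For comparison, the same runs can be found with only the standard library. The following is a minimal sketch, assuming the index labels and the column values are available as plain sequences (for example via df.index and df['X']); one_runs is a hypothetical helper name, not part of any of the answers above.

```python
from itertools import groupby


def one_runs(labels, values):
    """Return (first_label, last_label) for each consecutive run of 1s."""
    result = []
    # groupby splits the (label, value) pairs into consecutive runs
    # according to whether the value equals 1.
    for is_one, group in groupby(zip(labels, values), key=lambda pair: pair[1] == 1):
        if is_one:
            run = [label for label, _ in group]
            result.append((run[0], run[-1]))
    return result


labels = list(range(11))
values = [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0]
print(one_runs(labels, values))  # [(2, 3), (6, 8)]
```

Unlike find_ranges above, this sketch returns a pair even for a run of length one, so every element of the result has the same shape.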
How do you take the sum of the last two columns when the first two columns match?
input:
M = [[1,1,3,5],
[1,1,4,6],
[1,2,3,7],
[1,2,6,6],
[2,1,0,8],
[2,1,3,5],
[2,2,9,6],
[2,2,3,4]]
output:
M = [[1,1,7,11],
[1,2,9,13],
[2,1,3,13],
[2,2,12,10]]
And can you do it with a for loop?
Assuming that two matching rows always follow each other, you could iterate over M[:-1] and check the current row's values against the next row's values:
M =[[1,1,3,5],[1,1,4,6],[1,2,3,7],[1,2,6,6],[2,1,0,8],[2,1,3,5],[2,2,9,6],[2,2,3,4]]
t=[]
for i,m in enumerate(M[:-1]):
    if m[0] == M[i+1][0] and m[1]==M[i+1][1]:
        t.append([m[0],m[1],m[2]+M[i+1][2],m[3]+M[i+1][3]])
print(t)
#[[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
If the order might be scrambled I'd use 2 for loops. The second one will check m against every other list after it (it doesn't need to check those earlier since they checked against it).
t=[]
for i,m in enumerate(M[:-1]):
    for n in M[i+1:]:
        if m[0] == n[0] and m[1]==n[1]:
            t.append([m[0],m[1],m[2]+n[2],m[3]+n[3]])
We can find the unique tuples in the first two columns and then iterate over those to find the sum of each column whose rows equal the tuple.
Not sure what the fastest solution is, but this is one option:
M =[[1,1,3,5],[1,1,4,6],[1,2,3,7],[1,2,6,6],[2,1,0,8],[2,1,3,5],[2,2,9,6],[2,2,3,4]]
ans = []
for vals in list(set((x[0], x[1]) for x in M)):
    ans.append([vals[0], vals[1], sum(res[2] for res in M if (res[0], res[1]) == vals), sum(res[3] for res in M if (res[0], res[1]) == vals)])
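Since the question asks for a for loop, the grouping can also be done in a single pass with a dict keyed on the first two columns. This is a sketch under two assumptions: every row has exactly four entries, and Python 3.7+ is used so that the dict keeps keys in first-seen order. sum_by_key is a hypothetical helper name.

```python
def sum_by_key(rows):
    """Sum columns 3 and 4 over all rows that share the same first two columns."""
    totals = {}
    for a, b, c, d in rows:
        # setdefault creates the running totals the first time a key is seen.
        sums = totals.setdefault((a, b), [0, 0])
        sums[0] += c
        sums[1] += d
    return [[a, b, c, d] for (a, b), (c, d) in totals.items()]


M = [[1, 1, 3, 5], [1, 1, 4, 6], [1, 2, 3, 7], [1, 2, 6, 6],
     [2, 1, 0, 8], [2, 1, 3, 5], [2, 2, 9, 6], [2, 2, 3, 4]]
print(sum_by_key(M))
# [[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
```

This does not require matching rows to be adjacent, and it handles keys that occur more than twice.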
A solution with list comprehension and itertools' groupby:
from itertools import groupby
M = [
    [1,1,3,5],
    [1,1,4,6],
    [1,2,3,7],
    [1,2,6,6],
    [2,1,0,8],
    [2,1,3,5],
    [2,2,9,6],
    [2,2,3,4],
]
print([
    [
        key[0],
        key[1],
        sum(x[2] for x in group),
        sum(x[3] for x in group),
    ]
    for key, group in [
        (key, list(group))
        for key, group in groupby(sorted(M), lambda x: (x[0], x[1]))
    ]
])
Result:
[[1, 1, 7, 11], [1, 2, 9, 13], [2, 1, 3, 13], [2, 2, 12, 10]]
With reduce it can be simplified to:
from itertools import groupby
from functools import reduce
M = [
    [1,1,3,5],
    [1,1,4,6],
    [1,2,3,7],
    [1,2,6,6],
    [2,1,0,8],
    [2,1,3,5],
    [2,2,9,6],
    [2,2,3,4],
]
print([
    reduce(
        lambda x, y: [y[0], y[1], y[2] + x[2], y[3] + x[3]],
        group,
        (0, 0, 0, 0),
    )
    for _, group in groupby(sorted(M), lambda x: (x[0], x[1]))
])
I want to calculate absolute difference between all elements in a set of integers. I am trying to do abs(x-y) where x and y are two elements in the set. I want to do that for all combinations and save the resulting list in a new set.
I want to calculate absolute difference between all elements in a set of integers (...) and save the resulting list in a new set.
You can use itertools.combinations:
from itertools import combinations
s = { 1, 4, 7, 9 }
{ abs(i - j) for i,j in combinations(s, 2) }
=>
set([8, 2, 3, 5, 6])
combinations returns the r-length tuples of all combinations in s without replacement, i.e.:
list(combinations(s, 2))
=>
[(9, 4), (9, 1), (9, 7), (4, 1), (4, 7), (1, 7)]
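Because a set has no guaranteed iteration order, the pairs above may come out in a different order on another run or Python version. A small sketch that sorts the set first, purely so the pairs are reproducible (the resulting set of differences is the same either way):

```python
from itertools import combinations

s = {1, 4, 7, 9}
# Sorting first only makes the order of the pairs deterministic.
pairs = list(combinations(sorted(s), 2))
differences = {abs(i - j) for i, j in pairs}

print(pairs)
# [(1, 4), (1, 7), (1, 9), (4, 7), (4, 9), (7, 9)]
print(sorted(differences))
# [2, 3, 5, 6, 8]
```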
As sets do not maintain order, you may use something like an ordered-set and iterate till last but one.
For completeness, here's a solution based on NumPy ndarrays and pdist():
In [69]: import numpy as np
In [70]: from scipy.spatial.distance import pdist
In [71]: s = {1, 4, 7, 9}
In [72]: set(pdist(np.array(list(s))[:, None], 'cityblock'))
Out[72]: {2.0, 3.0, 5.0, 6.0, 8.0}
Here is another solution based on numpy, which finds the minimum difference:
data = np.array([33,22,21,1,44,54])
minn = np.inf
index = np.array(range(data.shape[0]))
for i in range(data.shape[0]):
    to_sub = (index[:i], index[i+1:])
    temp = np.abs(data[i] - data[np.hstack(to_sub)])
    min_temp = np.min(temp)
    if min_temp < minn : minn = min_temp
print('Min difference is',minn)
Output: "Min difference is 1"
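If only the minimum difference is needed, a different technique avoids comparing every pair: after sorting, the closest two values are always neighbours. This is a pure-Python sketch of that idea, not the numpy loop above; min_abs_difference is a hypothetical helper name and it assumes at least two values.

```python
def min_abs_difference(values):
    """Smallest absolute difference between any two values (needs >= 2 values)."""
    ordered = sorted(values)
    # In sorted order the closest pair is adjacent, so one pass is enough.
    return min(b - a for a, b in zip(ordered, ordered[1:]))


print(min_abs_difference([33, 22, 21, 1, 44, 54]))  # 1
```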
Here is another way using combinations:
from itertools import combinations
def find_differences(lst):
    " Find all differences, min & max difference "
    d = [abs(i - j) for i, j in combinations(set(lst), 2)]
    return min(d), max(d), d
Test:
list_of_nums = [1, 9, 7, 13, 56, 5]
min_, max_, diff_ = find_differences(list_of_nums)
print(f'All differences: {diff_}\nMaximum difference: {max_}\nMinimum difference: {min_}')
Result:
All differences: [4, 6, 8, 12, 55, 2, 4, 8, 51, 2, 6, 49, 4, 47, 43]
Maximum difference: 55
Minimum difference: 2
I have a list:
numbers = [2,3,1,6,5]
And I must remove one minimum and one maximum number:
sorted(numbers)[1:-1]
This works, but I also want additional information: the positions of the removed numbers in the original list:
remains = sorted(numbers)[1:-1]
min_number_position = 2
max_number_position = 3
How to do it? Numbers can be repeated.
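Just use the min and max functions together with the list's index method to get the positions. Delete the higher position first, so that the first deletion does not shift the other position:
min_position = numbers.index(min(numbers))
max_position = numbers.index(max(numbers))
for position in sorted((min_position, max_position), reverse=True):
    del numbers[position]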
A pure Python solution that builds an argsorted list of indices (like the one numpy.argsort() returns). Example -
numbers = [2,3,1,6,5]
argsorted = sorted(range(len(numbers)),key=lambda x:numbers[x])
maxpos,minpos = argsorted[-1],argsorted[0]
remains = [numbers[i] for i in argsorted[1:-1]]
Demo -
>>> numbers = [2,3,1,6,5]
>>> argsorted = sorted(range(len(numbers)),key=lambda x:numbers[x])
>>> argsorted
[2, 0, 1, 4, 3]
>>> maxpos,minpos = argsorted[-1],argsorted[0]
>>> remains = [numbers[i] for i in argsorted[1:-1]]
>>> remains
[2, 3, 5]
>>> maxpos
3
>>> minpos
2
If you can use the numpy library, this can be done easily with array.argsort(). Example -
nnumbers = np.array(numbers)
nnumargsort = nnumbers.argsort()
minpos,maxpos = nnumargsort[[0,-1]]
remains = nnumbers[nnumargsort[1:-1]]
Demo -
In [136]: numbers = [2,3,1,6,5]
In [137]: nnumbers = np.array(numbers)
In [138]: nnumargsort = nnumbers.argsort()
In [139]: minpos,maxpos = nnumargsort[[0,-1]]
In [140]: remains = nnumbers[nnumargsort[1:-1]]
In [141]: remains
Out[141]: array([2, 3, 5])
In [142]: maxpos
Out[142]: 3
In [143]: minpos
Out[143]: 2
>>> import operator
>>> sorted(enumerate(numbers), key=operator.itemgetter(1))
[(2, 1), (0, 2), (1, 3), (4, 5), (3, 6)]
The rest is left as an exercise for the reader.
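One way to finish that exercise, as a sketch that assumes the list has at least two elements: unpack the first and last (position, value) pairs of the sorted result and keep the values in between. The variable names follow the question; middle and ranked are hypothetical names.

```python
from operator import itemgetter

numbers = [2, 3, 1, 6, 5]
# Pair each value with its original position, then sort by value.
ranked = sorted(enumerate(numbers), key=itemgetter(1))
# The first pair holds a minimum, the last pair a maximum; the rest remain.
(min_number_position, _), *middle, (max_number_position, _) = ranked
remains = [value for _, value in middle]

print(min_number_position, max_number_position, remains)
# 2 3 [2, 3, 5]
```

Because sorted is stable, repeated minimum values resolve to the earliest position and repeated maximum values to the latest one.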
You can use a function that returns the indices of the max and min with the list.index method:
>>> def func(li):
...     sorted_li=sorted(li)
...     return (li.index(sorted_li[0]),sorted_li[1:-1],li.index(sorted_li[-1]))
...
>>> min_number_position,remains,max_number_position=func(numbers)
>>> min_number_position
2
>>> remains
[2, 3, 5]
>>> max_number_position
3
In Python 3.x you can use unpacking assignment:
>>> def func(li):
...     mi,*re,ma=sorted(li)
...     return (li.index(mi),re,li.index(ma))
I want to iterate over a numpy array, starting at the index of the highest value and working through to the lowest value:
import numpy as np  # imports numpy package
elevation_array = np.random.rand(5,5)  # creates a random array 5 by 5
print(elevation_array)  # prints the array out
ravel_array = np.ravel(elevation_array)
sorted_array_x = np.argsort(ravel_array)
sorted_array_y = np.argsort(sorted_array_x)
sorted_array = sorted_array_y.reshape(elevation_array.shape)
for index, rank in np.ndenumerate(sorted_array):
    print(index, rank)
I want it to print out:
index of the highest value
index of the next highest value
index of the next highest value etc
If you want numpy doing the heavy lifting, you can do something like this:
>>> a = np.random.rand(100, 100)
>>> sort_idx = np.argsort(a, axis=None)
>>> np.column_stack(np.unravel_index(sort_idx[::-1], a.shape))
array([[13, 62],
       [26, 77],
       [81,  4],
       ...,
       [83, 40],
       [17, 34],
       [54, 91]], dtype=int64)
You first get an index that sorts the whole array, and then convert that flat index into pairs of indices with np.unravel_index. The call to np.column_stack simply joins the two arrays of coordinates into a single one, and could be replaced by the Python zip(*np.unravel_index(sort_idx[::-1], a.shape)) to get a list of tuples instead of an array.
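To make the flat-index conversion concrete, here is a pure-Python sketch of the same two steps for a 2-D grid. It deliberately does not use NumPy: divmod plays the role that np.unravel_index plays above. It assumes a non-empty rectangular list of lists, and descending_positions is a hypothetical helper name.

```python
def descending_positions(grid):
    """Return (row, col) positions of a 2-D grid, highest value first."""
    n_cols = len(grid[0])
    flat = [value for row in grid for value in row]
    # Step 1: flat indices ordered by value, highest first (an "argsort").
    order = sorted(range(len(flat)), key=flat.__getitem__, reverse=True)
    # Step 2: turn each flat index k into (k // n_cols, k % n_cols).
    return [divmod(k, n_cols) for k in order]


print(descending_positions([[2, 7], [1, 4]]))
# [(0, 1), (1, 1), (0, 0), (1, 0)]
```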
Try this:
>>> from operator import itemgetter
>>> a = np.array([[2, 7], [1, 4]])
>>> a
array([[2, 7],
       [1, 4]])
>>> sorted(np.ndenumerate(a), key=itemgetter(1), reverse=True)
[((0, 1), 7),
((1, 1), 4),
((0, 0), 2),
((1, 0), 1)]
You can iterate over this list if you wish. Essentially, I am telling sorted to order the elements of np.ndenumerate(a) according to the key itemgetter(1). itemgetter(1) gets the second element (index 1) from each of the tuples ((0, 1), 7), ((1, 1), 4), ... (i.e. the values) generated by np.ndenumerate(a).