Python: extending array keeps square bracket - python

I am trying to extend a list to add additional values but in the results it keeps displaying the end of the previous list.
def landmarksPoint():
    landmarkPoints = []
    # Check for range of landmarks (0 to 23) within the image, if all are displayed then continue to save the file.
    for n in range(pointNumber):
        # Split each line and column to save to text file and save to landmarkPoints Array.
        x = landmarks.part(n).x
        y = landmarks.part(n).y
        # Print each line for testing and append it to array.
        print("x:", x, " y:", y)
        landmarkPoints.append((x, y))
    return landmarkPoints
for hand in hands:
    landmarks = predictor(imageGray1, composite1)
    points1.append(landmarksPoint())
print(points1)
boundaryLoc = (1,1), (700,1), (1590, 1), (1590,500), (1590, 1190), (700, 1190), (1, 1190), (1,500)
points1.extend(boundaryLoc)
print(points1)
OUTPUT:
[[(992, 191), (1178, 337), (895, 702), (859, 873), (831, 991), (836, 514), (794, 627), (762, 768), (744, 900), (770, 396), (728, 479), (705, 586), (1213, 458), (690, 703), (773, 229), (803, 140), (1228, 147), (1281, 543), (1082, 471), (1027, 576), (996, 712), (970, 841), (933, 966), (922, 563)], (1, 1), (700, 1), (1590, 1), (1590, 500), (1590, 1190), (700, 1190), (1, 1190), (1, 500)]

The docs say that list.extend() extends the calling object with the contents of an argument that is an iterable.
So, points1.extend(boundaryLoc) extends the list points1 using the contents of the tuple boundaryLoc (you can verify that boundaryLoc is a tuple of tuples by examining the result of type(boundaryLoc)).
This means that each tuple contained within boundaryLoc will in effect be appended to points1, which is exactly what your output shows.
If you want to append a list of tuples to points1, you can do this:
boundaryLoc = [(1,1), (700,1), (1590, 1), (1590,500), (1590, 1190), (700, 1190), (1, 1190), (1,500)]
points1.append(boundaryLoc)
Note that we have explicitly made boundaryLoc a list (not a tuple) of tuples, and we use append() instead of extend().
If you really wanted to use extend(), you could do this:
points1.extend([boundaryLoc])
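To make the difference concrete, here is a small sketch contrasting the two calls. The short coordinate lists are made-up stand-ins for the real landmark data, and the names nested, flat and mixed are only for illustration. If the goal was a single flat list of (x, y) tuples with no inner brackets, calling extend() for the landmark list as well would produce that.

```python
# Made-up stand-ins for the landmark list and the boundary points.
landmarks = [(992, 191), (1178, 337)]
boundaryLoc = [(1, 1), (700, 1)]

# append() adds its argument as ONE element, so the result is nested.
nested = []
nested.append(landmarks)
nested.append(boundaryLoc)
print(nested)  # [[(992, 191), (1178, 337)], [(1, 1), (700, 1)]]

# extend() adds each item of the iterable, so the result is flat.
flat = []
flat.extend(landmarks)
flat.extend(boundaryLoc)
print(flat)  # [(992, 191), (1178, 337), (1, 1), (700, 1)]

# Mixing the two reproduces the shape of the output in the question.
mixed = []
mixed.append(landmarks)
mixed.extend(boundaryLoc)
print(mixed)  # [[(992, 191), (1178, 337)], (1, 1), (700, 1)]
```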

Related

Indices of all values in an array

I have a matrix A. I would like to generate the indices of all the values in this matrix.
A=np.array([[1,2,3],[4,5,6],[7,8,9]])
The desired output should look like:
[(0,0),(0,1),(0,2),(1,0),(1,1),(1,2),(2,0),(2,1),(2,2)]
You can use:
from itertools import product
list(product(*map(range, A.shape)))
This outputs:
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
Explanation:
A.shape gives the dimensions of the array. For each dimension, we create a range() that generates all of the numbers between 0 and the length of a given dimension. We use map() to perform this for each dimension of the array. Finally, we unpack all of these ranges into the arguments of itertools.product() to create the Cartesian product among all these ranges.
Notably, the use of list unpacking and map() means that this approach can handle ndarrays with an arbitrary number of dimensions. At the time of posting this answer, all of the other answers cannot be immediately extended to a non-2D array.
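As a quick illustration of the arbitrary-dimension claim, here is a sketch that applies the same expression to a three-dimensional shape. The shape tuple is a made-up example standing in for A.shape, so the snippet runs without NumPy.

```python
from itertools import product

# Hypothetical shape of a 3-D array; with NumPy this would be A.shape.
shape = (2, 1, 2)

# One range per dimension, then the Cartesian product of all ranges.
indices = list(product(*map(range, shape)))
print(indices)  # [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)]
```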
This should work.
indices = []
for i in range(len(A)):
    for j in range(len(A[i])):
        indices.append((i,j))
Here's a way of doing it using itertools.combinations:
from itertools import combinations
sorted(set(combinations(tuple(range(A.shape[0])) * 2, 2)))
combinations picks two elements from the tuple and pairs them, which produces duplicates, so the result is converted to a set to remove them and then sorted. Note that this relies on A being square, because only A.shape[0] is used.
This line of list comprehension works. It probably isn't as fast as using itertools, but it does work.
[(i,j) for i in range(len(A)) for j in range(len(A[i]))]
Using numpy only you can take advantage of ndindex
list(np.ndindex(A.shape))
or unravel_index:
list(zip(*np.unravel_index(np.arange(A.size), A.shape)))
Output:
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
NB. The second option lets you pass an order='C' (row-major) or order='F' (column-major) parameter to unravel_index to get a different ordering of the coordinates.
Example on A = np.array([[1,2,3],[4,5,6]])
order='C' (default):
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
order='F':
[(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]

Comparing X Y coordinates of the same list

I have a list of X Y tuple coordinates. I am trying to eliminate the coordinates that are very close to each other using the euclidean distance. However, the code so far does not perform as expected, especially as the number of coordinates increases.
So far, I have found online how to compare two lists of coordinates, but not the elements within the same list.
Hence, what I have done is split the list into its first element and the remainder, and then compare Euclidean distances between the two. If an element is within the proximity threshold, it is removed from the list. Then the list is updated and the procedure repeated. However, it does not perform as expected.
from scipy.spatial import distance
# List of coordinates.
xy = [(123, 2191), (44, 2700), (125, 2958), (41, 3368), (33, 4379), (78, 4434), (75, 5897), (50, 6220), (75, 7271), (80, 7274), (58, 8440), (60, 8440), (59, 8441), (32, 9699), (54, 9758), (58, 9759), (43, 10113), (64, 10252), (57, 12118), (61, 12120), (60, 14129), (61, 14129), (66, 15932), (68, 15933), (53, 17302), (57, 17304), (84, 20012), (84, 20013), (102, 20222), (49, 21257), (47, 21653), (56, 27042), (51, 28200), (48, 28201), (55, 28202), (65, 29366), (43, 29484), (67, 29808), (32, 30840), (31, 30842), (48, 36368), (48, 36369), (49, 36369), (21, 37518), (102, 37519)]
uni = []
for x in xy[:]:
    for i, j in enumerate(xy):
        if i == 0:
            new_xy = j # New List comprising of first element of the list
    remaining_xy = list(set(xy) - set(new_xy)) # rest of list converted into a separate list
    for m in remaining_xy:
        print(new_xy , m, distance.euclidean(new_xy , m))
        if distance.euclidean(new_xy ,m) < 1000: # If distance less than threshold, remove.
            remaining_xy.remove(m)
    xy = remaining_xy #reset xy
    remaining_xy = [] #reset remaining_xy
    uni.append(new_xy) # append unique values.
print(len((uni)), uni)
However, for example, the output shows
..., (53, 17302), (57, 17304), ...
Which does not satisfy the threshold.
For me your code is actually working. Maybe just change your last print statement to:
print(len(set(uni)), set(uni))
These outputs seem right for me. All coordinates in the set(uni) are more than 1000 apart from each other.
I get the following:
23 {(68, 15933), (58, 8440), (75, 7271), (51, 28200), (21, 37518), (61, 14129), (84, 20012), (65, 29366), (50, 6220), (49, 21257), (53, 17302), (41, 3368), (33, 4379), (64, 10252), (58, 9759), (56, 27042), (57, 12118), (78, 4434), (32, 30840), (31, 30842), (48, 36369), (48, 28201), (123, 2191)}
Update:
Unfortunately I haven't tested the complete output... I cannot directly find the issue in your code, but with a recursive function you will get the correct result you are looking for:
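def recursiveCoord(_coordinateList):
    if len(_coordinateList) > 1:
        xy_0 = _coordinateList[0]
        # Everything after the first coordinate, in the original order.
        remaining_xy = _coordinateList[1:]
        new_xy_list = []
        for coord in remaining_xy:
            dist = distance.euclidean(xy_0 ,coord)
            # Keep only coordinates at least 1000 away from xy_0.
            if dist >= 1000:
                new_xy_list.append(coord)
        return [xy_0] + recursiveCoord(new_xy_list)
    else:
        # Zero or one coordinate left: nothing to compare it against, so keep it.
        return list(_coordinateList)
Call it like this: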
uni = recursiveCoord(xy)
and you will get a list with all unique coordinates.
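For long lists, an iterative take on the same idea avoids deep recursion. The following is only a sketch: the function name filter_close and the five-point sample are illustrative, and math.dist (Python 3.8+) is swapped in for scipy's distance.euclidean so the snippet has no third-party dependency.

```python
from math import dist  # available from Python 3.8

def filter_close(points, threshold):
    """Keep a point only if it is at least `threshold` away
    from every point that has already been kept."""
    kept = []
    for p in points:
        if all(dist(p, q) >= threshold for q in kept):
            kept.append(p)
    return kept

# First five coordinates from the question.
sample = [(123, 2191), (44, 2700), (125, 2958), (41, 3368), (33, 4379)]
result = filter_close(sample, 1000)
print(result)  # [(123, 2191), (41, 3368), (33, 4379)]
```

The result depends on the order of the input, because earlier points get priority over later ones that are close to them.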

Remove duplicate unordered tuples from list

In a list of tuples, I want to have just one copy of a tuple where it may be (x, y) or (y, x).
So, in:
# pairs = list(itertools.product(range(3), range(3)))
pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
the result should be:
result = [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)] # updated pairs
This list of tuples is generated using itertools.product() but I want to remove the duplicates.
My working solution:
pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
result = []
for pair in pairs:
    a, b = pair
    # reordering in increasing order
    temp = (a, b) if a < b else (b, a)
    result.append(temp)
print(list(set(result))) # I could use sorted() but the order doesn't matter
How can this be improved?
You could use combinations_with_replacement
The code for combinations_with_replacement() can be also expressed as a subsequence of product() after filtering entries where the elements are not in sorted order (according to their position in the input pool)
import itertools
pairs = list(itertools.combinations_with_replacement(range(3), 2))
print(pairs)
>>> [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
Edit: I just realized that your solution matches mine. What you are doing is just fine. If you need to do this for a very large list, there are other options you may want to look into, such as a key-value store.
If you need to remove dupes more programmatically, you can use a function like this:
def set_reduce(pairs):
    new_pairs = set()
    for x, y in pairs:
        # Store each pair with the smaller element first.
        if x < y:
            new_pairs.add((x, y))
        else:
            new_pairs.add((y, x))
    return new_pairs
running this results in
>>>set_reduce(pairs)
set([(0, 1), (1, 2), (0, 0), (0, 2), (2, 2), (1, 1)])
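The same normalisation can be written more compactly as a set comprehension. This is just a sketch of an alternative spelling, not a different algorithm: sorting each pair puts the smaller element first, the set drops the duplicates, and the outer sorted() is only there to make the output order predictable.

```python
pairs = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

# Normalise each pair to (small, large), then let the set remove duplicates.
unique = sorted({tuple(sorted(pair)) for pair in pairs})
print(unique)  # [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```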
This is one solution which relies on sparse matrices. This works for the following reasons:
An entry in a matrix cannot contain two values. Therefore, uniqueness is guaranteed.
Selecting the upper triangle ensures that (0, 1) is preferred above (1, 0), and inclusion of both is not possible.
import numpy as np
from scipy.sparse import csr_matrix, triu
lst = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1),
(1, 2), (2, 0), (2, 1), (2, 2)]
# get row coords & col coords
d1, d2 = list(zip(*lst))
# set up sparse matrix inputs
row, col, data = np.array(d1), np.array(d2), np.array([1]*len(lst))
# get upper triangle of matrix including diagonal
m = triu(csr_matrix((data, (row, col))), 0)
# output coordinates
result = list(zip(*(m.row, m.col)))
# [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]

Closest pair of point by brute force

I do not know what is wrong with my code. I generate 100 random points and I want to find the closest pair of these points, but the result is wrong.
#Closest pair
from math import sqrt
from random import randint
arr1=[]
dist=0
p1=[]
p2=[]
min1=1000
for i in range(0, 100):
    arr1.append([randint(0,100),randint(0,100)])
print(arr1)
print("\n")
def dist(a,b):
    x=pow((a[0]-b[0]),2)
    y=pow((a[1]-b[1]),2)
    return sqrt(x+y)
for i in range(0, len(arr1)):
    for j in range(i+1, len(arr1)):
        dis=dist(arr1[i],arr1[j])
        if(dis<min1):
            min1=dis
            p1=arr1[i]
            p2=arr1[j+1]
print(p1,"",p2,min1)
#print (sorted(arr1))
Okay, you treat the pairs (1, 5) and (5, 1) as the same pair, which is correct. However, while looping j from i+1 to the end of the list you store arr1[j+1], which I think is wrong: when j is the last index (99) and that pair is the closest, you end up reading arr1[100], which is out of range.
As İhsan Cemil Çiçek mentions, the main problem with your code is that you have p2=arr1[j+1], which should be p2=arr1[j].
However, there are a couple of things you can do to make this code more efficient.
There's no need to take the square root for every distance test. For non-negative d1 and d2, if sqrt(d1) < sqrt(d2) then d1 < d2, so we can just test the squared distances, and we only need to do a single expensive square root calculation when we've found the minimum.
Python has an efficient min function, so there's no need to find the minimum manually. Normally, min does a simple comparison of the values you pass it, but you can also supply it with a key function which it will use to make the comparisons.
You can use the combinations function from the standard itertools module to produce pairs of items from your points list with a single loop. This doesn't save much time, but it's cleaner than having a double loop.
Also, it's a good idea to supply a seed value to the random number generator when developing code that produces random values. This makes it easier to test & debug your code because it makes the results reproducible.
In the code below I've increased the range of the coordinates, because with 100 points with coordinates in the range 0 to 100 there's a high chance of generating duplicate points. You might like to use a set instead of a list if you don't want duplicate points.
from math import sqrt
from random import seed, randint
from itertools import combinations
seed(17)
high = 1000
numpoints = 100
points = [(randint(0, high), randint(0, high)) for _ in range(numpoints)]
points.sort()
print(points, '\n')
def dist(t):
    # Squared distance between the two points in the pair t.
    a, b = t
    x = a[0] - b[0]
    y = a[1] - b[1]
    return x*x + y*y
t = min(combinations(points, 2), key=dist)
a, b = t
print('{} {}: {}'.format(a, b, sqrt(dist(t))))
output
[(9, 51), (18, 443), (19, 478), (21, 635), (27, 254), (50, 165), (52, 918), (55, 746), (70, 316), (95, 707), (112, 939), (113, 929), (126, 903), (132, 256), (143, 832), (145, 698), (154, 692), (187, 200), (197, 765), (201, 154), (203, 317), (217, 51), (244, 119), (257, 983), (258, 880), (264, 76), (273, 65), (279, 343), (296, 178), (325, 655), (326, 174), (338, 552), (340, 96), (363, 51), (368, 59), (381, 585), (383, 593), (393, 834), (411, 140), (412, 496), (419, 83), (485, 648), (491, 76), (513, 821), (519, 962), (534, 424), (539, 980), (545, 572), (549, 312), (555, 87), (564, 63), (566, 923), (568, 545), (570, 218), (577, 537), (592, 801), (618, 848), (655, 614), (673, 413), (674, 314), (677, 284), (702, 141), (702, 215), (721, 553), (732, 654), (749, 974), (762, 279), (764, 429), (766, 732), (770, 756), (771, 356), (784, 722), (789, 319), (792, 5), (805, 282), (810, 896), (821, 978), (824, 911), (826, 310), (830, 323), (831, 418), (832, 518), (836, 400), (859, 256), (862, 996), (866, 700), (879, 485), (888, 415), (903, 722), (930, 588), (931, 496), (938, 356), (942, 323), (942, 344), (948, 429), (967, 741), (980, 254), (982, 488), (982, 604), (983, 374)]
(381, 585) (383, 593): 8.246211251235321
It will only work for the first point; for every other point in the list you only check the remaining points (i+1 to n), not all points (the closest may also be in 0 to i).
You should use enumerate in the for loop. Right now you are checking point i only against the points that appear after it in the array; what about the points before it?
Also, you need to save the two points that meet the distance condition as the i and j entries, so why arr1[j+1]?
Try this, I think it should work:
from math import sqrt
from random import randint
arr1=[]
dist=0
p1=[]
p2=[]
min1=1000
for i in range(0, 100):
    arr1.append([randint(0,100),randint(0,100)])
print(arr1)
print("\n")
def dist(a,b):
    x=pow((a[0]-b[0]),2)
    y=pow((a[1]-b[1]),2)
    return sqrt(x+y)
for i,x in enumerate (arr1):
    for j,y in enumerate (arr1):
        if (x != y):
            dis=dist(arr1[i],arr1[j])
            if(dis<min1):
                min1=dis
                p1=arr1[i]
                p2=arr1[j]
print(p1,"",p2,min1)
print (sorted(arr1))

Is there a standard Python data structure that keeps things in sorted order?

I have a set of ranges that might look something like this:
[(0, 100), (150, 220), (500, 1000)]
I would then add a range, say (250, 400) and the list would look like this:
[(0, 100), (150, 220), (250, 400), (500, 1000)]
I would then try to add the range (399, 450), and it would error out because that overlapped (250, 400).
When I add a new range, I need to search to make sure the new range does not overlap an existing range. And no range will ever be in the list that overlaps another range in the list.
To this end, I would like a data structure that cheaply maintained its elements in sorted order, and quickly allowed me to find the element before or after a given element.
Is there a better way to solve this problem? Is there a data structure like that available in Python?
I know the bisect module exists, and that's likely what I will use. But I was hoping there was something better.
EDIT: I solved this using the bisect module. I had a link to the code since it was a bit longish. Unfortunately, paste.list.org turned out to be a bad place to put it because it's not there anymore.
It looks like you want something like bisect's insort_right/insort_left. The bisect module works with lists and tuples.
import bisect
l = [(0, 100), (150, 300), (500, 1000)]
bisect.insort_right(l, (250, 400))
print(l) # [(0, 100), (150, 300), (250, 400), (500, 1000)]
bisect.insort_right(l, (399, 450))
print(l) # [(0, 100), (150, 300), (250, 400), (399, 450), (500, 1000)]
You can write your own overlaps function, which you can use to check before using insort.
I assume you made a mistake with your numbers as (250, 400) overlaps (150, 300).
overlaps() can be written like so:
def overlaps(inlist, inrange):
    for lo, hi in inlist:
        # Two ranges overlap when each one starts before the other ends.
        if lo < inrange[1] and inrange[0] < hi:
            return True
    return False
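Putting the two pieces together, checking first and inserting only when the check passes could look like the sketch below. The helper names overlaps_any and add_range are hypothetical, and the sketch assumes that ranges which merely touch at an endpoint do not count as overlapping.

```python
import bisect

def overlaps_any(ranges, new):
    """True if `new` overlaps any (start, end) tuple in `ranges`."""
    return any(start < new[1] and new[0] < end for start, end in ranges)

def add_range(ranges, new):
    """Insert `new` at its sorted position, or raise ValueError on overlap."""
    if overlaps_any(ranges, new):
        raise ValueError(f"{new} overlaps an existing range")
    bisect.insort_right(ranges, new)

ranges = [(0, 100), (150, 220), (500, 1000)]
add_range(ranges, (250, 400))
print(ranges)  # [(0, 100), (150, 220), (250, 400), (500, 1000)]

try:
    add_range(ranges, (399, 450))
except ValueError as exc:
    print(exc)  # (399, 450) overlaps an existing range
```

The overlap check here scans the whole list, so each insertion is linear in the number of stored ranges.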
Use SortedDict from the SortedCollection.
A SortedDict provides the same methods as a dict. Additionally, a SortedDict efficiently maintains its keys in sorted order. Consequently, the keys method will return the keys in sorted order, the popitem method will remove the item with the highest key, etc.
I've used it - it works. Unfortunately I don't have the time now to do a proper performance comparison, but subjectively it seems to have become faster than the bisect module.
Cheap searching and cheap insertion tend to be at odds. You could use a linked list for the data structure. Then searching to find the insertion point for a new element is O(n), and the subsequent insertion of the new element in the correct location is O(1).
But you're probably better off just using a straightforward Python list. Random access takes constant time, so a binary search finds your spot in O(log n). Insertion in the correct location to maintain the sort is theoretically more expensive, O(n), because the elements after the insertion point have to be shifted. That shift is a fast block move in the underlying dynamic array, though, so you rarely pay a noticeable price unless the list gets very large.
Regarding checking for date range overlaps, I happen to have had the same problem in the past. Here's the code I use. I originally found it in a blog post, linked from an SO answer, but that site no longer appears to exist. I actually use datetimes in my ranges, but it will work equally well with your numeric values.
def dt_windows_intersect(dt1start, dt1end, dt2start, dt2end):
    '''Returns true if two ranges intersect. Note that if two
    ranges are adjacent, they do not intersect.
    Code based on:
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges
    '''
    if dt2end <= dt1start or dt2start >= dt1end:
        return False
    return dt1start <= dt2end and dt1end >= dt2start
Here are the unit tests to prove it works:
from datetime import datetime
from nose.tools import eq_, assert_equal, raises
class test_dt_windows_intersect():
    """
    test_dt_windows_intersect
    Code based on:
    http://beautifulisbetterthanugly.com/posts/2009/oct/7/datetime-intersection-python/
    http://stackoverflow.com/questions/143552/comparing-date-ranges
                    |-------------------|          compare to this one
    1                    |---------|               contained within
    2               |----------|                   contained within, equal start
    3                       |-----------|          contained within, equal end
    4               |-------------------|          contained within, equal start+end
    5         |------------|                       overlaps start but not end
    6                             |-----------|    overlaps end but not start
    7          |------------------------|          overlaps start, but equal end
    8               |-----------------------|      overlaps end, but equal start
    9          |------------------------------|    overlaps entire range
    10  |---|                                      not overlap, less than
    11      |-------|                              not overlap, end equal
    12                                      |---|  not overlap, bigger than
    13                                  |---|      not overlap, start equal
    """
    def test_contained_within(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30), datetime(2009,10,1,6,40),
        )
    def test_contained_within_equal_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0), datetime(2009,10,1,6,30),
        )
    def test_contained_within_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30), datetime(2009,10,1,7,0),
        )
    def test_contained_within_equal_start_and_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
        )
    def test_overlaps_start_but_not_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30), datetime(2009,10,1,6,30),
        )
    def test_overlaps_end_but_not_start(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,30), datetime(2009,10,1,7,30),
        )
    def test_overlaps_start_equal_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,30), datetime(2009,10,1,7,0),
        )
    def test_equal_start_overlaps_end(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,30),
        )
    def test_overlaps_entire_range(self):
        assert dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0), datetime(2009,10,1,8,0),
        )
    def test_not_overlap_less_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0), datetime(2009,10,1,5,30),
        )
    def test_not_overlap_end_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,5,0), datetime(2009,10,1,6,0),
        )
    def test_not_overlap_greater_than(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,30), datetime(2009,10,1,8,0),
        )
    def test_not_overlap_start_equal(self):
        assert not dt_windows_intersect(
            datetime(2009,10,1,6,0), datetime(2009,10,1,7,0),
            datetime(2009,10,1,7,0), datetime(2009,10,1,8,0),
        )
The bisect module may well be better than the following simple function:
li = [(0, 100), (150, 220), (250, 400), (500, 1000)]
def verified_insertion(x,L):
    u,v = x
    if v<L[0][0]:
        return [x] + L
    # x must start after the END of the last range
    elif u>L[-1][1]:
        return L + [x]
    else:
        for i,(a,b) in enumerate(L[0:-1]):
            # x must start after range i ends and end before range i+1 starts
            if b<u and v<L[i+1][0]:
                return L[0:i+1] + [x] + L[i+1:]
    return L
lo = verified_insertion((-10,-2),li)
lu = verified_insertion((102,140),li)
le = verified_insertion((222,230),li)
lee = verified_insertion((234,236),le) # <== le
la = verified_insertion((408,450),li)
ly = verified_insertion((2000,3000),li)
for w in (lo,lu,le,lee,la,ly):
    print(li, w, '', sep='\n')
The function returns a list without modifying the list passed as argument.
result
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(-10, -2), (0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (102, 140), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (222, 230), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (222, 230), (234, 236), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (408, 450), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000)]
[(0, 100), (150, 220), (250, 400), (500, 1000), (2000, 3000)]
To answer your question:
Is there a data structure like that available in Python?
No there is not. But you can easily build one yourself using a list as the basic structure and code from the bisect module to keep the list in order and check for overlaps.
class RangeList(list):
    """Maintain ordered list of non-overlapping ranges"""
    def add(self, range):
        """Add a range if no overlap else reject it"""
        lo = 0; hi = len(self)
        while lo < hi:
            mid = (lo + hi)//2
            if range < self[mid]: hi = mid
            else: lo = mid + 1
        if overlaps(range, self[lo]):
            print("range overlap, not added")
        else:
            self.insert(lo, range)
I leave the overlaps function as an exercise.
(This code is untested and may need some tweaking)
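One way the exercise might be filled in is sketched below. Since the stored ranges never overlap each other, only the two neighbours of the insertion point need checking. The class name SortedRanges is hypothetical, and the sketch assumes every range is a (start, end) tuple with start < end and that ranges touching at an endpoint are allowed.

```python
import bisect

class SortedRanges:
    """Sorted list of non-overlapping (start, end) tuples."""

    def __init__(self):
        self.ranges = []

    def add(self, new):
        """Insert `new`, or raise ValueError if it overlaps a neighbour."""
        start, end = new
        i = bisect.bisect_left(self.ranges, new)
        # Left neighbour must end at or before the new start.
        if i > 0 and self.ranges[i - 1][1] > start:
            raise ValueError(f"{new} overlaps {self.ranges[i - 1]}")
        # Right neighbour must start at or after the new end.
        if i < len(self.ranges) and self.ranges[i][0] < end:
            raise ValueError(f"{new} overlaps {self.ranges[i]}")
        self.ranges.insert(i, new)

rl = SortedRanges()
for r in [(0, 100), (500, 1000), (150, 220), (250, 400)]:
    rl.add(r)
print(rl.ranges)  # [(0, 100), (150, 220), (250, 400), (500, 1000)]

try:
    rl.add((399, 450))
except ValueError as exc:
    print(exc)  # (399, 450) overlaps (250, 400)
```

The search is O(log n); the list insertion itself still shifts the elements after the insertion point.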
