eliminating redundant tuples - python

If I have a list of tuples, where each tuple represents variables, a, b and c, how can I eliminate redundant tuples?
Redundant tuples are those where a and b are simply interchanged, but c is the same. So for this example:
tups = [(30, 40, 50), (40, 30, 50), (20, 48, 52), (48, 20, 52)]
my final list should contain only half of the entries. One possible output:
tups = [(30, 40, 50), (20, 48, 52)]
another
tups = [(40, 30, 50), (20, 48, 52)]
etc.
Is there an easy Pythonic way to do this?
I tried using sets, but (30, 40, 50) is different from (40, 30, 50). To me these are redundant and I'd just like to keep one of them (it doesn't matter which, but if I could pick I'd prefer the low-to-high order). If there were a way to sort the first 2 elements of each tuple, then using a set would work.
I am sure I could hack together a working solution (perhaps converting tuples to lists as intermediate step), but I just wanted to see if there's an easy and obvious way to do this that I'm not familiar with.
PS: This question is partially motivated by PE #39. But even aside from this PE problem, I am now just curious how (or if) this could be done easily.
Edit:
Just to provide a bit of context for those not familiar with PE #39: a, b, and c represent the sides of a right triangle, so I'm checking if a**2 + b**2 == c**2. Clearly the order of a and b doesn't matter.

set([(a,b,c) if a<b else (b,a,c) for a,b,c in tups])

From your question, it seems that the first two elements of your tuples form a sub-unit within the tuple. Therefore it would seem to make sense to restructure your data as a tuple of a tuple and a third number, where the first tuple is the first two numbers in sorted order. Then you can naturally use sets:
>>> newTups = [(tuple(sorted([a, b])), c) for a, b, c in tups]
>>> newTups
[((30, 40), 50), ((30, 40), 50), ((20, 48), 52), ((20, 48), 52)]
>>> set(newTups)
set([((20, 48), 52), ((30, 40), 50)])

tups = [(30, 40, 50), (40, 30, 50), (20, 48, 52), (48, 20, 52)]
no_duplicates = list(set(tuple(sorted(tup)) for tup in tups))
Of course, this assumes that the 3rd element of each tuple is always the largest element in that tuple. Otherwise, do this:
no_duplicates = list(set(tuple(sorted(tup[:2])) + (tup[2],) for tup in tups))
As WolframH suggested, the expression tuple(sorted(tup[:2])) + (tup[2],) can be written as tuple(sorted(tup[:2])) + tup[2:], which is advantageous because it can be generalized to tuple(sorted(tup[:i])) + tup[i:], where i can be any point that one wants to separate the sorted elements from the unsorted elements.
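The generalized expression above can be wrapped in a small helper. This is only a sketch: the function name dedupe_sorted_prefix and the parameter i are made up for illustration, and the final sorted() call is there just to make the output order predictable.

```python
def dedupe_sorted_prefix(tups, i=2):
    """Sort the first i elements of each tuple, then drop duplicates.

    The name and the parameter i are hypothetical, chosen for this sketch.
    """
    # Normalizing the prefix makes (30, 40, 50) and (40, 30, 50) compare equal,
    # so the set keeps only one of them.
    unique = {tuple(sorted(t[:i])) + t[i:] for t in tups}
    # Sets are unordered; sort the result so the output is reproducible.
    return sorted(unique)


tups = [(30, 40, 50), (40, 30, 50), (20, 48, 52), (48, 20, 52)]
unique = dedupe_sorted_prefix(tups)
print(unique)  # [(20, 48, 52), (30, 40, 50)]
```

Each surviving tuple has its first two values in low-to-high order, which matches the preference stated in the question.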

Convert each of your tuples into a frozenset and create a set of these frozensets.
tups = [(30, 40, 50), (40, 30, 50), (20, 48, 52), (48, 20, 52)]
frozen_sets = { frozenset(x) for x in tups }
tups2 = [tuple(x) for x in frozen_sets]
This works because frozenset([1,2,3]) == frozenset([3,1,2]), in contrast to tuples, where (1,2,3) != (3,1,2).
You have to convert the tuples into frozensets rather than simple sets because you get the following error when you try to make one set a member of another set:
TypeError: unhashable type: 'set'
frozensets are hashable, and so avoid this problem.
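One caveat worth checking before relying on the frozenset approach: a frozenset keeps neither order nor repeated values, so c is not guaranteed to end up last when converting back, and a tuple with a repeated value shrinks. A small sketch, using a made-up tuple (5, 5, 7) that is not from the question:

```python
# (5, 5, 7) is a hypothetical input added to show the effect of repeated values.
tups = [(30, 40, 50), (40, 30, 50), (5, 5, 7)]

frozen_sets = {frozenset(t) for t in tups}

# The two (30, 40, 50) variants merge as intended, but (5, 5, 7) collapses
# to a two-element frozenset, so the original triple cannot be recovered.
lengths = sorted(len(fs) for fs in frozen_sets)
print(lengths)  # [2, 3]
```

If repeated values or the position of c matter, sorting only the first two elements (as in the other answers) avoids both problems.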

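If you do not care about the order of the first two elements, you don't really want to use 3-tuples: just convert to a new data structure that discards the information you do not need. The inner pair has to be a frozenset, because a plain set is unhashable and cannot be stored inside a tuple that goes into a set.
result = {(frozenset((x[0], x[1])), x[2]) for x in tups}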

Related

Maintain a sum in a list comprehension?

this is my code
def width2colspec(widths):
    tupleback = []
    a = 0
    for w in widths:
        b = a + w
        tupleback.append((a, b))
        a = b
    return tupleback
eg:
widths=[15,9,50,10]
width2colspec(widths)
Result:
[(0, 15), (15, 24), (24, 74), (74,84)]
(first one always has to be a zero)
It works (though maybe it's not very elegant).
To practice, I tried to convert it into a one-line list comprehension, but I couldn't make it work. The closest I got was this:
widths=[15,9,50,10]
colspecs=list((widths[i],widths[i]+widths[i+1]) for i in range(len(widths)-1))
result:
[(15, 24), (9, 59), (50, 60)]
(it's not maintaining the sum through the loop, and the leading (0, 15) is missing)
So my question is, is it possible?
You can do this as a pure list comprehension, but it involves a lot of re-computation, so I wouldn't recommend actually doing it this way.
Start by building a list of all the sums (note that this is re-summing the same numbers over and over, so it's less efficient than your original code that keeps a running sum):
>>> [sum(widths[:i]) for i in range(len(widths)+1)]
[0, 15, 24, 74, 84]
and then iterate over that to produce your list of tuples:
>>> [tuple([sum(widths[:i]) for i in range(len(widths)+1)][i:i+2]) for i in range(len(widths))]
[(0, 15), (15, 24), (24, 74), (74, 84)]
Note that we're now re-computing all those sums again because we didn't assign the original list to a variable.
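The re-computation described above can be avoided with itertools.accumulate from the standard library, which keeps the running total in a single pass. This is a different technique from the pure list comprehension, shown here only as a sketch; the intermediate name edges is made up for illustration.

```python
from itertools import accumulate


def width2colspec(widths):
    # Running totals with a leading 0, e.g. [0, 15, 24, 74, 84] for the sample input.
    edges = [0, *accumulate(widths)]
    # Pair each edge with the next one to get (start, end) column specs.
    return list(zip(edges, edges[1:]))


print(width2colspec([15, 9, 50, 10]))  # [(0, 15), (15, 24), (24, 74), (74, 84)]
```

Each width is added exactly once, so the work grows linearly with the number of columns.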
If you're using Python 3.8 or later, you can use an assignment expression (utilizing the walrus operator :=) to do this in a single list comprehension. I'm "cheating" a little by initializing last via a default argument, but this isn't strictly necessary (I just wanted to make it a "one-liner").
def width2colspec(widths, last=0):
    return [(last, last := last+width) for width in widths]
I don't find it that confusing, to be honest.

Intersect dictionary values which are two-dimensional arrays

Here is my dictionary of n items.
{
"proceed": [[6,46] , [7,67], [12,217], [67,562], [67,89]],
"concluded": [[6,46] , [783,123], [121,521], [67,12351], [67,12351]],
...
}
Imagine a dictionary something like that, with n keys whose values are two-dimensional arrays.
I want to intersect all of them and get [6,46] as the result.
I tried something like this:
result=set.intersection(*map(set,output.values()))
However, it raised an error because the values are two-dimensional arrays.
Can someone please help me with how to do that?
Thanks.
So... sets don't work for lists because lists are not hashable. Instead you'll have to make them sets of tuples like so:
result = set.intersection(*({tuple(p) for p in v} for v in output.values()))
Edit: works in py version >= 2.7
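A minimal run of the expression above, assuming the dictionary is called output (as in the question's attempt) and holds only the two keys shown there; the elided entries are left out. The last step, converting the result back to lists, is an optional extra for illustration.

```python
# Only the two keys visible in the question; the real dictionary has n keys.
output = {
    "proceed": [[6, 46], [7, 67], [12, 217], [67, 562], [67, 89]],
    "concluded": [[6, 46], [783, 123], [121, 521], [67, 12351], [67, 12351]],
}

# Inner lists become tuples (hashable), each value becomes a set of tuples,
# and set.intersection keeps the pairs present under every key.
result = set.intersection(*({tuple(p) for p in v} for v in output.values()))

# Optional: turn the common pairs back into lists, in a predictable order.
as_lists = sorted(list(p) for p in result)
print(as_lists)  # [[6, 46]]
```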
I completely agree with @FHTMitchell's answer, but here is a bit more explanation, with an example, of why you can't build a set from lists and get TypeError: unhashable type instead.
Consider the values below:
x = {'concluded': [[6, 46], [783, 123], [121, 521], [67, 12351], [67, 12351]],
     'proceed': [[6, 46], [7, 1], [12, 217], [67, 562], [67, 89]]}
y = {'concluded': ((6, 46), (67, 12351), (121, 521), (783, 123)),
     'proceed': ((6, 46), (7, 1), (12, 217), (67, 89), (67, 562))}
x is a dictionary whose values are lists of lists; the main thing to note is that each value is stored as a list, which is mutable. In y the values are tuples of tuples, which are immutable (a frozenset of tuples would also work).
Now suppose you somehow managed to get your desired output [6,46]. Notice that it is a list whose elements live inside another list, so if you change the values as below:
x['proceed'][0][0] = 9
it will change [6, 46] to [9, 46] under the proceed key, and your output may or may not change with it, depending on how you iterated over and stored it.

Append difference of elements in tuples in list

If I have a list such as [(10, 22), (12, 50), (13, 15)] and would like to append the difference of these numbers so that the list would look like [(12, 10, 22), (38, 12, 50), (2, 13, 15)] how can I do this?
I have this line of code newList = [[???]+list(tup) for tup in list] but am not sure what to put where the question marks are to get what I want.
Thanks a lot
Tuples can't be modified (they are immutable), so you will have to create new tuples. It looks like you are prepending the difference rather than appending it.
newList = [(b-a, a,b) for (a,b) in oldList]
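Run against the list from the question, the comprehension gives the requested output. The names old_list and new_list below are just stand-ins for oldList and newList:

```python
old_list = [(10, 22), (12, 50), (13, 15)]

# Build a brand-new 3-tuple per pair: (difference, first, second).
new_list = [(b - a, a, b) for a, b in old_list]

print(new_list)  # [(12, 10, 22), (38, 12, 50), (2, 13, 15)]
```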

Sorting lists in dictionary

Could someone please explain how I could sort a list in dictionary? For example:
B = {'Mary': [(850, 1000), (9, 10), (1000, 3000), (250, 550)], 'john': [(500, 1000), (800,3000), (20, 100), (5, 36)]}
Using the 'sorted' function, how do I sort it in ascending order based on the first value in the list? Likewise, how do I sort it in ascending order based on the second value in the list?
Many thanks
I would iterate through your items, then in-place sort based on the first element of each tuple.
B = {
    'Mary': [(850, 1000), (9, 10), (1000, 3000), (250, 550)],
    'john': [(500, 1000), (800,3000), (20, 100), (5, 36)],
}
for item in B:
    B[item].sort(key=lambda i: i[0])
Output
{
'john': [(5, 36), (20, 100), (500, 1000), (800, 3000)],
'Mary': [(9, 10), (250, 550), (850, 1000), (1000, 3000)]
}
You have to use sorted's key argument. key is a function which takes an element of the iterable as an argument and returns the value on which the sort is based:
for e in B:
    B[e] = sorted(B[e], key=lambda x: x[Element_ID])
Element_ID is the index of the element on which you want to base your sort: 1 to sort by the second element, 0 to sort by the first.
EDIT:
Also, it would be faster to use the list's sort method instead of sorted, since it sorts in place (note that sort takes only keyword arguments, not the list itself):
for e in B:
    B[e].sort(key=lambda x: x[Element_ID])
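To make the Element_ID placeholder concrete, here is a sketch that sorts every list by the second value of each tuple. It uses operator.itemgetter(1) in place of the lambda, which is an equivalent alternative rather than what the answers above wrote, and it builds a new dictionary (by_second, a made-up name) instead of sorting in place.

```python
from operator import itemgetter

B = {
    'Mary': [(850, 1000), (9, 10), (1000, 3000), (250, 550)],
    'john': [(500, 1000), (800, 3000), (20, 100), (5, 36)],
}

# itemgetter(1) plays the role of lambda x: x[1], i.e. Element_ID = 1.
by_second = {name: sorted(pairs, key=itemgetter(1)) for name, pairs in B.items()}

print(by_second['john'])  # [(5, 36), (20, 100), (500, 1000), (800, 3000)]
```

Swapping itemgetter(1) for itemgetter(0) sorts by the first value instead.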

How do you construct an array suitable for numpy sorting?

I need to sort two arrays simultaneously, or rather I need to sort one of the arrays and bring the corresponding elements of its associated array along as I sort. That is, if the array is [(5, 33), (4, 44), (3, 55)] and I sort by the first field (labeled 'alpha' in the dtype below), then I want: [(3.0, 55.0) (4.0, 44.0) (5.0, 33.0)]. These are really big data sets and I need to sort first (for n log(n) speed) before I do some other operations. I don't know how to merge my two separate arrays in the proper manner to get the sort algorithm working, though. I think my problem is rather simple. I tried three different methods:
import numpy
x=numpy.asarray([5,4,3])
y=numpy.asarray([33,44,55])
dtype=[('alpha',float), ('beta',float)]
values=numpy.array([(x),(y)])
values=numpy.rollaxis(values,1)
#values = numpy.array(values, dtype=dtype)
#a=numpy.array(values,dtype=dtype)
#q=numpy.sort(a,order='alpha')
print "Try 1:\n", values
values=numpy.empty((len(x),2))
for n in range(len(x)):
    values[n][0]=y[n]
    values[n][1]=x[n]
print "Try 2:\n", values
#values = numpy.array(values, dtype=dtype)
#a=numpy.array(values,dtype=dtype)
#q=numpy.sort(a,order='alpha')
###
values = [(x[0], y[0]), (x[1],y[1]) , (x[2],y[2])]
print "Try 3:\n", values
values = numpy.array(values, dtype=dtype)
a=numpy.array(values,dtype=dtype)
q=numpy.sort(a,order='alpha')
print "Result:\n",q
I commented out the sort in the first and second tries because they raise errors. I knew the third one would work because it mirrors what I saw when I read the manual. Given the arrays x and y (which are very large; only small examples are shown), how do I construct the array (called values) that numpy.sort can be called on properly?
*** Zip works great, thanks. Bonus question: How can I later unzip the sorted data into two arrays again?
I think what you want is the zip function. If you have
x = [1,2,3]
y = [4,5,6]
then list(zip(x,y)) == [(1,4),(2,5),(3,6)] (on Python 2, zip already returns a list)
So your array could be constructed using
a = numpy.array(list(zip(x,y)), dtype=dtype)
For your bonus question: zip actually unzips too (the session below is Python 2, where range and zip return lists):
In [1]: a = range(10)
In [2]: b = range(10, 20)
In [3]: c = zip(a, b)
In [4]: c
Out[4]:
[(0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)]
In [5]: d, e = zip(*c)
In [6]: d, e
Out[6]: ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
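As an alternative to zip for large arrays, a structured array can be allocated once and filled field by field, which avoids building a Python list of tuples. The sketch below reuses x, y and dtype from the question; whether it is actually faster for a given data set is an assumption that would need measuring.

```python
import numpy

x = numpy.asarray([5, 4, 3])
y = numpy.asarray([33, 44, 55])
dtype = [('alpha', float), ('beta', float)]

# Allocate the structured array, then copy each input into its own field.
values = numpy.empty(len(x), dtype=dtype)
values['alpha'] = x
values['beta'] = y

# Sort the records by the 'alpha' field; 'beta' travels along with it.
q = numpy.sort(values, order='alpha')

# "Unzipping" is just reading the fields back out.
x_sorted, y_sorted = q['alpha'], q['beta']
print(q)
```

Reading q['alpha'] and q['beta'] also covers the bonus question without a second zip call.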
Simon suggested argsort as an alternative approach; I'd recommend it as the way to go. No messy merging, zipping, or unzipping: just access by index.
idx = numpy.argsort(x)
ans = [(x[i], y[i]) for i in idx]
zip() might be inefficient for large arrays. numpy.dstack() could be used instead of zip:
ndx = numpy.argsort(x)
values = numpy.dstack((x[ndx], y[ndx]))
I think you just need to specify the axis that you are sorting on when you have made your final ndarray. Alternatively argsort one of the original arrays and you'll have an index array that you can use to look up in both x and y, which might mean you don't need values at all.
(scipy.org seems to be unreachable right now or I would post you a link to some docs)
Given that your description doesn't quite match your code snippet it's hard to say with certainty, but I think you have over-complicated the creation of your numpy array.
I couldn't get a working solution using Numpy's sort function, but here's something else that works:
import numpy
x = [5,4,3]
y = [33,44,55]
r = numpy.asarray([(x[i],y[i]) for i in numpy.lexsort([x])])
lexsort returns the permutation of the array indices which puts the rows in sorted order. If you wanted your results sorted on multiple keys, e.g. by x and then by y, use numpy.lexsort([y, x]) instead: lexsort treats the last key in the sequence as the primary one.
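Because numpy.lexsort uses the last key in the sequence as the primary sort key, the key order is easy to get backwards. A small sketch with made-up data containing a tie in x shows the effect:

```python
import numpy

# Hypothetical data: x has a tie (4, 4) so the secondary key y matters.
x = numpy.array([5, 4, 4])
y = numpy.array([33, 44, 22])

# Primary key goes last: sort by x first, then break ties by y.
order = numpy.lexsort([y, x])

rows = [(int(x[i]), int(y[i])) for i in order]
print(rows)  # [(4, 22), (4, 44), (5, 33)]
```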
