Generating 3-tuples from a set of 2-tuples - python

In an earlier question:
Generating maximum number of 3-tuples from a list of 2-tuples
I got an answer from @AChampion that seems to work if the number of 2-tuples is divisible by 3. However, the solution fails if we have, for example, 10 2-tuples. After fumbling with it for a while, I'm under the impression that it is impossible to find a perfect solution for, say:
(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)
So I'm interested in finding one solution that minimizes the number of remainder tuples. In the example above the result could be:
(1,2,3) # derived from (1,2), (1,3), (2,3)
(1,4),(2,4),(3,4) # remainder tuples
The rule for generating a 3-tuple from three 2-tuples is:
(a,b), (b,c), (c,a) -> (a, b, c)
i.e. the 2-tuples form a cycle of length 3. The order of the elements in a 3-tuple is not important, i.e.:
(a,b,c) == (c,a,b)
I'm actually interested in the case where we have a number n:
a = []
for x in range(1, n+1):
    for y in range(1, n+1):
        if x != y:
            a.append((x, y))
# a = [ (1,2),...,(1,n), (2,1),(2,3),...,(2,n),...(n,1),...,(n,n-1) ]
From a, minimize the number of 2-tuples that are left over when producing 3-tuples. Each 2-tuple can only be used once.
I wrapped my brain around this for several hours but I can't seem to come up with an elegant solution (well, neither have I found an ugly one:-) for the general case. Any thoughts?

For this you need to create the set of number combinations that will be used as replacements. Then loop over your data looking for three items that are covered by one of those combinations, and replace them.
I have done this in several steps.
from itertools import combinations
# create replacements elements
number_combinations_raw = list(combinations(range(1, 5), 3))
# create proper number combinations
number_combinations = []
for item in number_combinations_raw:
    if (item[0] + 1 == item[1]) and (item[1] + 1 == item[2]):
        number_combinations.append(item)
# create test data
data = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
# reduce data
reduce_data = []
for number_set in number_combinations:
    count = 0
    merged_data = []
    for item in data:
        if (number_set[0] in item and number_set[1] in item) or (number_set[1] in item and number_set[2] in item) \
                or (number_set[0] in item and number_set[2] in item):
            merged_data.append(item)
            count += 1
    if count == 3:
        reduce_data.append((number_set, merged_data))
# delete merged elements from data list and add replacement
for item in data:
    for reduce_item in reduce_data:
        for element in reduce_item[1]:
            if element in data:
                data.remove(element)
                data = [reduce_item[0]] + data
# remove duplicated replaced elements
final_list = list(dict.fromkeys(data))
Output:
[(1, 2, 3), (1, 4), (2, 4)]
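For comparison, a greedy pass over all candidate triples can be sketched as below. This is only a heuristic under stated assumptions: pairs are treated as unordered (as in the question's (1,2),(1,3),(2,3) example), the helper names take_sides and greedy_triples are made up for illustration, and nothing here guarantees that the number of remainder tuples is minimal.

```python
from itertools import combinations


def take_sides(pool, triple):
    """Return pool minus one pair per side of triple, or None if a side is missing.

    Hypothetical helper: pairs are compared as unordered sets.
    """
    a, b, c = triple
    pool = list(pool)  # work on a copy so a failed attempt changes nothing
    for u, v in ((a, b), (b, c), (a, c)):
        match = next((p for p in pool if set(p) == {u, v}), None)
        if match is None:
            return None
        pool.remove(match)
    return pool


def greedy_triples(pairs):
    """Greedy heuristic (not guaranteed optimal): use each 2-tuple at most once."""
    remaining = list(pairs)
    nodes = sorted({x for p in remaining for x in p})
    triples = []
    for triple in combinations(nodes, 3):
        # keep taking the same triple while all three sides are still available
        while True:
            reduced = take_sides(remaining, triple)
            if reduced is None:
                break
            remaining = reduced
            triples.append(triple)
    return triples, remaining


triples, remainder = greedy_triples([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])
print(triples)    # [(1, 2, 3)]
print(remainder)  # [(1, 4), (2, 4), (3, 4)]
```

Because the triples are tried in a fixed order, a different order may leave fewer remainder tuples; an exact solution would need a search over the possible choices.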

Related

I'm not able to understand this code in tuple

init_tuple = [(0, 1), (1, 2), (2, 3)]
result = sum(n for _, n in init_tuple)
print(result)
The output for this code is 6. Could someone explain how it worked?
Your code extracts each tuple and sums all values in the second position (i.e. [1]).
If you rewrite it in loops, it may be easier to understand:
init_tuple = [(0, 1), (1, 2), (2, 3)]
result = 0
for (val1, val2) in init_tuple:
    result = result + val2
print(result)
The expression (n for _, n in init_tuple) is a generator expression. You can iterate on such an expression to get all the values it generates. In that case it reads as: generate the second component of each tuple of init_tuple.
(Note on _: The _ here stands for the first component of the tuple. It is common in Python to use this name when you don't care about the variable it refers to (i.e., if you don't plan to use it), as is the case here. Another way to write your generator would then be (tup[1] for tup in init_tuple).)
You can iterate over a generator expression using a for loop. For example:
>>> for x in (n for _, n in init_tuple):
...     print(x)
1
2
3
And of course, since you can iterate on a generator expression, you can sum it as you have done in your code.
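As a small illustration of the points above (a sketch, with variable names chosen only for this example): a generator expression produces its values once, and sum simply consumes them.

```python
init_tuple = [(0, 1), (1, 2), (2, 3)]

# A generator expression yields the second component of each tuple on demand.
gen = (n for _, n in init_tuple)
first_pass = list(gen)    # [1, 2, 3]
second_pass = list(gen)   # [] because the generator is already exhausted

# sum() consumes a fresh generator; indexing with tup[1] is an equivalent spelling.
total = sum(n for _, n in init_tuple)
total_by_index = sum(tup[1] for tup in init_tuple)
print(total, total_by_index)  # 6 6
```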
To get a better understanding, first look at this:
init_tuple = [(0, 1), (1, 2), (2, 3)]
total = 0  # named total so the built-in sum() is not shadowed
for x, y in init_tuple:
    total = total + y
print(total)
Now you can see that the code above calculates the sum of the second elements of the tuples. It is equivalent to your code, as both do the same job.
In the line
for x, y in init_tuple:
x holds the first value of each tuple and y holds the second. In the first iteration:
x = 0, y = 1,
then in the second iteration:
x = 1, y = 2, and so on.
In your case you don't need the first element of the tuple, so you just use _ instead of a named variable.

How to take intersection of 2 or more lists that includes tuples?

I want to compare tuples in two or more lists and print out their intersection. Every tuple has 25 elements (some of which are empty), and the number of tuples differs from list to list.
So far I have tried taking the intersection of two lists; the code I used is shown below:
res_final = set(tuple(x) for x in res).intersection(set(tuple(x) for x in res1))
output:
set()
(res and res1 are my lists)
Hope this example helps:
import numpy as np
np.random.seed(0) # random seed for repeatability
a_ = np.random.randint(15,size=(1000,2)) # create random data for tuples
b_ = np.random.randint(15,size=(1000,2)) # create random data for tuples
a, b = set(tuple(d) for d in a_), set(tuple(d) for d in b_) # set of tuples
intersection = a&b # intersection
print(intersection) # result
In the code, matrices of random variables are created, then the rows are converted to tuples. Then we get the set of tuples and finally the important part for you, the intersection of the tuples.
If your input looks something like this:
in_1 = [(1, 1), (2, 2), (3, 3)]
in_2 = [(4, 4), (5, 5), (1, 1)]
in_3 = [(6, 6), (7, 7), (1, 1)]
ins = [in_1, in_2, in_3]
then I think you can use itertools.combinations to find pairwise intersections, and then take a set from them in order to remove duplicates.
from itertools import combinations
intersected = []
for first, second in combinations(ins, 2):
    elems = set(first).intersection(set(second))
    intersected.extend(elems)
dedup_intersected = set(intersected)
print(dedup_intersected)
# {(1, 1)}
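Note that the pairwise approach collects tuples shared by at least two of the lists. If what you want is the tuples present in every list, a minimal sketch (reusing the in_1, in_2, in_3 sample data from above) is to intersect all the sets at once:

```python
in_1 = [(1, 1), (2, 2), (3, 3)]
in_2 = [(4, 4), (5, 5), (1, 1)]
in_3 = [(6, 6), (7, 7), (1, 1)]
ins = [in_1, in_2, in_3]

# set.intersection accepts any number of iterables after the first set
common = set(in_1).intersection(in_2, in_3)

# the same thing for a list of lists of arbitrary length
common_all = set.intersection(*(set(lst) for lst in ins))

print(common)      # {(1, 1)}
print(common_all)  # {(1, 1)}
```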

How to retain position values of original list after the elements of the list have been sorted into pairs (Python)?

sample = ['AAAA','CGCG','TTTT','AT$T','ACAC','ATGC','AATA']
Position = [0, 1, 2, 3, 4, 5, 6]
I have the above sample with positions associated with each element. I do several steps of filtering, the code of which is given here.
The steps in the elimination are:
#If each base is identical to itself eliminate those elements eg. AAAA, TTTT
#If there are more than 2 types of bases (i.e.' conversions'== 1 ) then eliminate those elements eg. ATGC
#Make pairs of all remaining combinations
#If a $ in the pair, then the corresponding base from the other pair is eliminated eg. (CGCG,AT$T) ==> (CGG, ATT) and (ATT, AAA)
#Remove all pairs where one of the elements has all identical bases eg. (ATT,AAA)
In the end, I have an output with different combinations of the above as shown below.
Final Output [['CGG','ATT'],['CGCG','ACAC'],['CGCG','AATA'],['ATT','ACC']]
I need to find a way such that I get the positions of these pairs with respect to the original sample as below.
Position = [[1,3],[1,4],[1,6],[3,4]]
You could convert the list to a list of tuples first
xs = ['AAAA', 'CGCG', 'TTTT', 'AT$T', 'ACAC', 'ATGC', 'AATA']
ys = [(i, x) for i,x in enumerate(xs)]
print(ys)
=> [(0, 'AAAA'), (1, 'CGCG'), (2, 'TTTT'), (3, 'AT$T'), (4, 'ACAC'), (5, 'ATGC'), (6, 'AATA')]
Then work with that as your input list instead, so that each element carries its original position through the filtering steps.
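A minimal sketch of how the indices can travel along with the strings is shown below. It only implements the first elimination step from the question (dropping elements whose bases are all identical); the helper name is_uniform is made up for this example, and the remaining filters would be applied to the (index, string) tuples in the same way.

```python
from itertools import combinations

xs = ['AAAA', 'CGCG', 'TTTT', 'AT$T', 'ACAC', 'ATGC', 'AATA']


def is_uniform(s):
    """Hypothetical helper: True if every character in s is the same."""
    return len(set(s)) == 1


# keep (index, string) tuples so the original position is never lost
kept = [(i, s) for i, s in enumerate(xs) if not is_uniform(s)]

# build the pairs and their positions from the same combinations
pairs = [[a, b] for (_, a), (_, b) in combinations(kept, 2)]
positions = [[i, j] for (i, _), (j, _) in combinations(kept, 2)]

print(pairs[0], positions[0])  # ['CGCG', 'AT$T'] [1, 3]
```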

Python: fast dictionary of big int keys

I have got a list of >10.000 int items. The values of the items can be very high, up to 10^27. Now I want to create all pairs of the items and calculate their sum. Then I want to look for different pairs with the same sum.
For example:
l[0] = 4
l[1] = 3
l[2] = 6
l[3] = 1
...
pairs[10] = [(0,2)] # 10 is the sum of the values of l[0] and l[2]
pairs[7] = [(0,1), (2,3)] # 7 is the sum of the values of l[0] and l[1] or l[2] and l[3]
pairs[5] = [(0,3)]
pairs[9] = [(1,2)]
...
The contents of pairs[7] is what I am looking for. It gives me two pairs with the same value sum.
I have implemented it as follows - and I wonder if it can be done faster. Currently, for 10.000 items it takes >6 hours on a fast machine. (As I said, the values of l and so the keys of pairs are ints up to 10^27.)
l = [4,3,6,1]
pairs = {}
for i in range( len( l ) ):
    for j in range(i+1, len( l ) ):
        s = l[i] + l[j]
        if s not in pairs:
            pairs[s] = []
        pairs[s].append((i,j))
# pairs = {9: [(1, 2)], 10: [(0, 2)], 4: [(1, 3)], 5: [(0, 3)], 7: [(0, 1), (2, 3)]}
Edit: I want to add some background, as asked by Simon Stelling.
The goal is to find Formal Analogies like
lays : laid :: says : said
within a list of words like
[ lays, lay, laid, says, said, foo, bar ... ]
I already have a function analogy(a,b,c,d) giving True if a : b :: c : d. However, I would need to check all possible quadruples created from the list, which would be a complexity of around O((n^4)/2).
As a pre-filter, I want to use the char-count property. It says that every char has the same count in (a,d) and in (b,c). For instance, in "layssaid" we have got 2 a's, and so we do in "laidsays"
So the idea until now was
for every word to create a "char count vector" and represent it as an integer (the items in the list l)
create all pairings in pairs and see if there are "pair clusters", i.e. more than one pair for a particular char count vector sum.
And it works, it's just slow. The complexity is down to around O((n^2)/2) but this is still a lot, and especially the dictionary lookup and insert is done that often.
There are the trivial optimizations like caching constant values in a local variable and using xrange instead of range:
pairs = {}
len_l = len(l)
for i in xrange(len_l):
    for j in xrange(i+1, len_l):
        s = l[i] + l[j]
        res = pairs.setdefault(s, [])
        res.append((i,j))
However, it is probably far wiser not to pre-calculate the list and instead to optimize the method at a conceptual level. What is the intrinsic goal you want to achieve? Do you really just want to calculate what you do? Or are you going to use that result for something else? What is that something else?
Just a hint: have a look at itertools.combinations.
This is not exactly what you are looking for (because it stores pairs of values, not of indexes), but it can be a starting point:
from itertools import combinations
pairs = {}
for (a, b) in combinations(l, 2):
    pairs.setdefault(a + b, []).append((a, b))
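To store index pairs rather than values, one option is to run combinations over enumerate(l). The sketch below is written for Python 3 and uses collections.defaultdict in place of setdefault; the name clusters is only illustrative. It is still O(n^2) in the number of items.

```python
from collections import defaultdict
from itertools import combinations

l = [4, 3, 6, 1]

pairs = defaultdict(list)
# each element of enumerate(l) is (index, value), so every pair carries both
for (i, a), (j, b) in combinations(enumerate(l), 2):
    pairs[a + b].append((i, j))

# keep only the sums that are reached by more than one pair
clusters = {s: p for s, p in pairs.items() if len(p) > 1}
print(clusters)  # {7: [(0, 1), (2, 3)]}
```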
The above comment from SimonStelling is correct; generating all possible pairs is just fundamentally slow, and there's nothing you can do about it aside from altering your algorithm. One function you can use from itertools is product (note that product(l, repeat=2) yields every ordered pair, including each item paired with itself, so it visits about twice as many pairs as the nested loops in the question; combinations(l, 2) avoids that). You can get some minor improvements from not creating extra variables or doing unnecessary list indexing, but under the hood these are all still O(n^2). Here's how I would do it:
from itertools import product
l = [4,3,6,1]
pairs = {}
for (m,n) in product(l,repeat=2):
    pairs.setdefault(m+n, []).append((m,n))
Finally, I have come up with my own solution, which takes just half the calculation time on average.
The basic idea: Instead of reading and writing into the growing dictionary n^2 times, I first collect all the sums in a list. Then I sort the list. Within the sorted list, I then look for same neighbouring items.
This is the code:
from operator import itemgetter
def getPairClusters( l ):
    # first, we just store all possible pairs sequentially
    # clustering will happen later
    pairs = []
    for i in xrange( len( l) ):
        for j in xrange(i+1, len( l ) ):
            pair = l[i] + l[j]
            pairs.append( ( pair, i, j ) )
    pairs.sort(key=itemgetter(0))
    # pairs = [ (4, 1, 3), (5, 0, 3), (7, 0, 1), (7, 2, 3), (9, 1, 2), (10, 0, 2)]
    # a list item of pairs now contains a tuple (like (4, 1, 3)) with
    # * the sum of two l items: 4
    # * the index of the two l items: 1, 3
    # now clustering starts
    # we want to find neighbouring items as
    # (7, 0, 1), (7, 2, 3)
    # (since 7=7)
    pairClusters = []
    # flag if we are within a cluster
    # while iterating over pairs list
    withinCluster = False
    # iterate over pair list
    for i in xrange(len(pairs)-1):
        if not withinCluster:
            if pairs[i][0] == pairs[i+1][0]:
                # if not within a cluster
                # and found 2 neighbouring same numbers:
                # init new cluster
                pairCluster = [ ( pairs[i][1], pairs[i][2] ) ]
                withinCluster = True
        else:
            # if still within cluster
            if pairs[i][0] == pairs[i+1][0]:
                pairCluster.append( ( pairs[i][1], pairs[i][2] ) )
            # else cluster has ended
            # (next neighbouring item has different number)
            else:
                pairCluster.append( ( pairs[i][1], pairs[i][2] ) )
                pairClusters.append(pairCluster)
                withinCluster = False
    # a cluster that runs up to the last item is still open here:
    # add the last item and close it
    if withinCluster:
        pairCluster.append( ( pairs[-1][1], pairs[-1][2] ) )
        pairClusters.append(pairCluster)
    return pairClusters
l = [4,3,6,1]
print getPairClusters(l)
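The same sort-then-scan idea can be sketched more compactly in Python 3 with itertools.groupby, which groups neighbouring items of the sorted list that share a sum. This is an illustrative rewrite, not the code that was timed above, and the name get_pair_clusters is chosen only for this example.

```python
from itertools import combinations, groupby
from operator import itemgetter


def get_pair_clusters(values):
    """Return lists of index pairs whose values add up to the same sum."""
    # (sum, i, j) for every pair i < j, sorted by the sum only
    sums = sorted(
        ((a + b, i, j) for (i, a), (j, b) in combinations(enumerate(values), 2)),
        key=itemgetter(0),
    )
    clusters = []
    # groupby yields one group per run of equal neighbouring sums
    for _, group in groupby(sums, key=itemgetter(0)):
        members = [(i, j) for _, i, j in group]
        if len(members) > 1:
            clusters.append(members)
    return clusters


print(get_pair_clusters([4, 3, 6, 1]))  # [[(0, 1), (2, 3)]]
```

Since groupby closes each group on its own, a cluster that ends at the last item of the sorted list needs no special handling.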

making list in python

When I executed the following Python script
list= (1,2,3,4,1,2,7,8)
for number in list:
    item1= number
    item2= list[list.index(item1)+2]
    couple= item1, item2
    print couple
(the goal is to link each number with the number two positions after it)
I obtained this result:
(1, 3)
(2, 4)
(3, 1)
(4, 2)
(1, 3)
(2, 4)
(and then the index goes out of range, but that is not the problem)
My question is: why is the number 1 in the fifth line still coupled to the number 3, and how can I make it couple to the number 7? The same goes for the number 2 in the sixth line, which should be coupled to the number 8.
Additional question:
What do I do if I only want a list of the couples that start with 1: [(1,3), (1,7)]?
list.index returns the offset of the first occurrence of the value in the list - thus if you do [1,1,1].index(1), the answer will always be 0, even though 1 and 2 are also valid answers.
Instead, try:
from itertools import islice, izip, ifilter
mylist = [1,2,3,4,1,2,7,8]
for pair in ifilter(lambda x: x[0]==1, izip(mylist, islice(mylist, 2, None))):
    print pair
results in
(1, 3)
(1, 7)
xs.index(x) gives you the index of the first occurrence of x in xs. So when you get to the second 1, .index gives you the index of the first 1.
If you need the index alongside the value, use enumerate: for i, number in enumerate(numbers): print number, numbers[i+2].
Note that I deliberately didn't use the name list. It's the name of a built-in, you shouldn't overwrite it. Also note that (..., ...) is a tuple (and therefore can't be changed), not a list (which is defined in square brackets [..., ...] and can be changed).
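A minimal Python 3 sketch of the enumerate approach is shown below, with the loop stopped two items early so that i + 2 never runs past the end. The names numbers, couples and ones are chosen for this example only.

```python
numbers = [1, 2, 3, 4, 1, 2, 7, 8]

couples = []
# numbers[:-2] drops the last two items, which have no partner two places ahead
for i, number in enumerate(numbers[:-2]):
    couples.append((number, numbers[i + 2]))

# keep only the couples that start with 1
ones = [couple for couple in couples if couple[0] == 1]

print(couples)  # [(1, 3), (2, 4), (3, 1), (4, 2), (1, 7), (2, 8)]
print(ones)     # [(1, 3), (1, 7)]
```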
You have duplicates in the list, so index always returns the first index.
Start your loop with for index in range(len(list) - 2) and look up list[index] and list[index + 2] instead.
You are using .index, which returns the first occurrence of number.
Consider:
# stop two items early so that number+2 stays inside the list
for number in range(len(list) - 2):
    item1= list[number]
    item2= list[number+2]
    couple= item1, item2
    print couple
>>> zip(lst, lst[2:])
[(1, 3), (2, 4), (3, 1), (4, 2), (1, 7), (2, 8)]
To get only pairs (1, X):
>>> [(a, b) for (a, b) in zip(lst, lst[2:]) if a == 1]
[(1, 3), (1, 7)]
Recommended reading:
http://docs.python.org/tutorial/datastructures.html
http://docs.python.org/howto/functional.html
