How to split list of tuples on condition? - python

I have a list of tuples:
my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
I need to split it into sublists of tuples, each starting with a tuple whose first item is 1:
[(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i')]
[(1,'j','k','l'), (2,'m','n','o')]
[(1,'p','q','r'), (2,'s','t','u')]

You're effectively computing a kind of "groupwhile" function: you want to split at every tuple that starts with a 1. This looks an awful lot like itertools.groupby, and if we keep a tiny bit of mutable state (the one_count counter in the example below), we can reuse the grouping logic already built into the standard library to get your desired result.
import itertools

# The inner function is just so that one_count will be initialized only
# as many times as we want to call this rather than exactly once via
# some kind of global variable.
def gen_count():
    def _cnt(t, one_count=[0]):
        if t[0] == 1:
            one_count[0] += 1
        return one_count[0]
    return _cnt

result = [list(g[1]) for g in itertools.groupby(my_list, key=gen_count())]
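The same grouping can be done without the mutable default argument: itertools.accumulate turns "does this tuple start with 1?" into a running group number, and groupby then groups on that number. This is a sketch; split_on_ones is a hypothetical name, and any tuples that appear before the first 1 end up in a leading group of their own.

```python
from itertools import accumulate, groupby

def split_on_ones(items):
    # Running count of tuples that start with 1: each tuple gets the number
    # of the most recent group opener, so all members of a group share it.
    ids = accumulate(1 if t[0] == 1 else 0 for t in items)
    pairs = zip(ids, items)
    # Consecutive pairs with the same group number form one sublist.
    return [[t for _, t in grp] for _, grp in groupby(pairs, key=lambda p: p[0])]
```

Calling split_on_ones(my_list) gives the three sublists shown in the question.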
A more traditional solution would be to iterate through your list and append intermediate outputs to a result list.
result = []
for i, *x in my_list:
    if i == 1:
        # a tuple starting with 1 opens a new group
        result.append([(i, *x)])
    else:
        # any other tuple joins the most recent group
        result[-1].append((i, *x))

Try this code. I assume a new group starts whenever the first element of the first tuple (1) is found again. I also assume the output is a list.
my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
ch = my_list[0][0]
groups = []  # avoid shadowing the built-in all()
st = 0
for i, t in enumerate(my_list):
    if t[0] == ch:
        if i != 0:
            groups.append(my_list[st:i])
        st = i
# the last group runs to the end of the list, so slice with [st:]
groups.append(my_list[st:])
print(groups)
Output
[
[(1, 'a', 'b', 'c'), (2, 'd', 'e', 'f'), (3, 'g', 'h', 'i')],
[(1, 'j', 'k', 'l'), (2, 'm', 'n', 'o')],
[(1, 'p', 'q', 'r'), (2, 's', 't', 'u')]
]
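A variant of the same slicing idea, sketched under the assumption that every group starts with a tuple whose first element is marker: collect all group start indices first, then slice between consecutive starts. Appending len(items) as a final boundary makes the last group run to the end of the list. split_at_starts is a hypothetical name, and any tuples before the first marker are dropped.

```python
def split_at_starts(items, marker=1):
    # indices of the tuples that open a new group
    starts = [i for i, t in enumerate(items) if t[0] == marker]
    # closing boundary so the final group extends to the end of the list
    bounds = starts + [len(items)]
    return [items[a:b] for a, b in zip(bounds, bounds[1:])]
```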

Related

Copying from one file to another file and pairing in python

I have a file (say, file.txt) which looks like:
A,B,C,32
D,E,F,65
G,H,I,76
J,K,L,98
M,N,O,55
J,K,L,98
S,T,U,46
G,H,I,76
Since there are 8 rows, I want to make 4 pairs. The first time it takes 2 rows, it should take the row with the highest 4th-column value, and the 2nd row can be any of the remaining 7 rows, but self-pairing is not allowed. That means [(J,K,L,98),(J,K,L,98)] can't be a pair. Suppose the first iteration has taken [(J,K,L,98),(S,T,U,46)] as a pair; then the remaining 6 rows participate in the next iteration.
Here is what I tried:
from random import choice
file_name = 'file.txt'
fp = open(file_name)
val = []
for line in fp.readlines():
    val.append(line.replace("\n","").split(","))
val = sorted(val, key=lambda x: x[-1], reverse=True)
item_list = []
for i in val:
    i = list(map(str, i))
    s = ','.join(i).replace("\n","")
    item_list.append(s)
print(*item_list,sep='\n')
list1=[]
for i in val:
    list1.append(i[-1])
print(list1)
p=max(list1)
print(max(list1))
final_list=[]
for i in val:
    if(i[-1]==p):
        final_list.append(i)
    elif(i[-1]==choice(list1)):
        final_list.append(i)
print(final_list)
Please help me out.
My interpretation of the problem is that you need to take the row from the file (table) with the highest numeric value in the 4th token. You then want to take another row at random and make the two into a pair (a list of tuples). The key rule is that no pair can contain two equal tuples. I propose this:
from random import choice
# handles each row from the file creating an appropriately structured tuple
def mfunc(s):
    t = s.strip().split(',')
    t[-1] = int(t[-1])
    return tuple(t)
# selects a random tuple from the given list ensuring that the selection cannot match the reference tuple
def gar(lst, ref=None):
    rv = choice([e for e in lst if e != ref]) if ref else choice(lst)
    lst.remove(rv)
    return rv
with open('file.txt') as infile:
    rows = sorted(list(map(mfunc, infile.readlines())), key=lambda x: x[-1])
output = []
t = rows[-1] # get row with highest value
rows.pop(-1) # then remove it
output.append([t, gar(rows, t)])
while len(rows) > 1:
    a = gar(rows)
    output.append([a, gar(rows, a)])
print(output)
Output (example):
[[('J', 'K', 'L', 98), ('M', 'N', 'O', 55)], [('S', 'T', 'U', 46), ('A', 'B', 'C', 32)], [('G', 'H', 'I', 76), ('D', 'E', 'F', 65)], [('G', 'H', 'I', 76), ('J', 'K', 'L', 98)]]
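One caveat with gar as written: if the only rows left are identical (for example the two ('G', 'H', 'I', 76) rows), choice() is handed an empty list and raises IndexError, so the code can fail on some runs. A sketch that avoids this dead end by swapping in rejection sampling: fix the top row first, then reshuffle the remaining rows until no pair contains two equal rows. make_pairs is a hypothetical name, and the rows are assumed to be already parsed into tuples with a numeric last field (for instance by mfunc).

```python
import random
from collections import Counter

def make_pairs(rows, rng=random):
    rows = list(rows)
    if len(rows) % 2:
        raise ValueError("need an even number of rows")
    if not rows:
        return []
    # A pairing with no two equal rows exists only if no row value
    # occurs more than len(rows) // 2 times.
    if Counter(rows).most_common(1)[0][1] > len(rows) // 2:
        raise ValueError("no pairing without equal rows is possible")
    top = max(rows, key=lambda r: r[-1])  # row with the highest last field
    rest = list(rows)
    rest.remove(top)
    while True:
        # Rejection sampling: reshuffle until every pair has two different rows.
        rng.shuffle(rest)
        pairs = [(top, rest[0])] + list(zip(rest[1::2], rest[2::2]))
        if all(a != b for a, b in pairs):
            return [list(p) for p in pairs]
```

Passing a seeded random.Random instance as rng makes a run reproducible, which is handy when testing.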

Why does this Python function skip from index 1 to 3 without iterating index 2 in a for loop?

I wrote a function to remove the parts that are duplicated between two strings. I first transform each string into a list, then iterate through the two lists to check whether the characters at the same position are the same. The problem is that while iterating,
the code skips an index (e.g. with list="index", the iterator jumps to 'd' right after 'i').
I've tried using the replace method for the string operation, but I did not get the result I want: replace removed parts that I want to keep.
def popp(s,t):
    s_lis=list(s)
    t_lis=list(t)
    ind=0
    for i,j in zip(s_lis,t_lis):
        if i==j:
            s_lis.pop(ind)
            t_lis.pop(ind)
        else:
            ind+=1
    return s_lis,t_lis
# test the code
print(popp('hackerhappy','hackerrank'))
expected result: ['h','p','p','y'] ['r','n','k']
actual result: ['k', 'r', 'h', 'a', 'p', 'p', 'y'], ['k', 'r', 'r', 'a', 'n', 'k']
To begin with, you should use itertools.zip_longest, which keeps pairing items until the longest input is exhausted (filling the gaps with None). You are using zip, which stops at the shortest input, and that is not what you want.
So in our case, it will be
print(list(zip_longest(s_lis, t_lis)))
#[('h', 'h'), ('a', 'a'), ('c', 'c'), ('k', 'k'), ('e', 'e'),
#('r', 'r'), ('h', 'r'), ('a', 'a'), ('p', 'n'), ('p', 'k'), ('y', None)]
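Then you should use separate lists to collect the non-matching characters rather than modifying the lists you are iterating over via s_lis.pop(ind). Popping an element shifts everything after it one position to the left, so the loop steps over the element that moved into the current position; that is why indexes appear to be skipped.
So if the characters in a tuple do not match, append each one that is not None.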
from itertools import zip_longest
def popp(s,t):
    s_lis = list(s)
    t_lis = list(t)
    s_res = []
    t_res = []
    #Use zip_longest to zip the two lists
    for i, j in zip_longest(s_lis, t_lis):
        #If the characters do not match, and they are not None, append them
        #to the list
        if i != j:
            if i is not None:
                s_res.append(i)
            if j is not None:
                t_res.append(j)
    return s_res, t_res
The output will look like:
print(popp('hackerhappy','hackerrank'))
#(['h', 'p', 'p', 'y'], ['r', 'n', 'k'])
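To see why the original loop appears to skip elements, here is a minimal sketch (not taken from either answer) that removes items from the very list it is iterating over. The list iterator advances by position, so after each removal the next element slides into the slot that was just visited and gets stepped over:

```python
letters = list("index")
seen = []
for ch in letters:
    seen.append(ch)
    letters.remove(ch)  # mutating the list that is being iterated over
print(seen)     # ['i', 'd', 'x']
print(letters)  # ['n', 'e']
```

This reproduces the jump from 'i' straight to 'd' described in the question.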
You could modify your code slightly:
def popp(s, t):
    s_lis = list(s)
    t_lis = list(t)
    s_res = []
    t_res = []
    # match each character. Stops once the
    # shortest list ends
    for i, j in zip(s_lis, t_lis):
        if i != j:
            s_res.append(i)
            t_res.append(j)
    # if s is longer, take rest of the string and
    # add it to residual
    if len(s) > len(t):
        for x in s_lis[len(t):]:
            s_res.append(x)
    if len(t) > len(s):
        for x in t_lis[len(s):]:
            t_res.append(x)
    print(s_res)
    print(t_res)
popp('hackerhappy','hackerrank')

'Clumping' a list in python

I've been trying to 'clump' a list.
I mean joining items together depending on the item in between, so ['d','-','g','p','q','-','a','v','i'] becomes ['d-g','p','q-a','v','i'] when 'clumped' around any '-'.
Here's my attempt:
def clump(List):
    box = []
    for item in List:
        try:
            if List[List.index(item) + 1] == "-":
                box.append("".join(List[List.index(item):List.index(item)+3]))
            else:
                box.append(item)
        except:
            pass
    return box
However, it outputs (for the example above)
['d-g', '-', 'g', 'p', 'q-a', '-', 'a', 'v']
because I have no idea how to skip the next two items.
Also, the code is a complete mess, mainly due to the try/except statement (I use it because otherwise I get an IndexError when it reaches the last item).
How can it be fixed (or completely rewritten)?
Thanks
Here's an O(n) solution that maintains a flag determining whether or not you are currently clumping. It then manipulates the last item in the list based on this condition:
def clump(arr):
    started = False
    out = []
    for item in arr:
        if item == '-':
            started = True
            out[-1] += item
        elif started:
            out[-1] += item
            started = False
        else:
            out.append(item)
    return out
In action:
In [53]: clump(x)
Out[53]: ['d-g', 'p', 'q-a', 'v', 'i']
This solution will fail if the first item in the list is a dash, but that seems like it should be an invalid input.
Here is a solution using re.sub
>>> import re
>>> l = ['d','-','g','p','q','-','a','v','i']
>>> re.sub(':-:', '-', ':'.join(l)).split(':')
['d-g', 'p', 'q-a', 'v', 'i']
And here is another solution using itertools.zip_longest
>>> from itertools import zip_longest
>>> l = ['d','-','g','p','q','-','a','v','i']
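>>> [x+y+z if y=='-' else x for w,x,y,z in zip_longest(['']+l[:-1], l, l[1:], l[2:], fillvalue='') if '-' not in (w, x)]
['d-g', 'p', 'q-a', 'v', 'i']
Here w is the previous item: an item is skipped when it is itself a '-' or when the item before it was a '-' (it has already been glued onto the previous clump).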
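The question's real sticking point was how to skip the next two items. A minimal sketch, assuming '-' is the separator: loop over an explicit iterator and call next() on it to consume the item after the dash, so the for loop never sees it. clump_iter is a hypothetical name; it also handles chains such as ['a','-','b','-','c'] (giving ['a-b-c']) and keeps a leading '-' as a plain item instead of raising IndexError.

```python
def clump_iter(items, sep='-'):
    out = []
    it = iter(items)
    for item in it:
        if item == sep and out:
            # Glue the separator and the following item onto the previous
            # entry; next(it, '') consumes that item so the loop skips it.
            out[-1] += sep + next(it, '')
        else:
            out.append(item)
    return out
```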

Append all possibilities from sequences with numbers

I have a question that is consuming my brain. Suppose the variable I stores a sequence, the variable II another one, and the variable III a third one. I represents the number 1, II the number 2 and III the number 3. Then I have a key variable made of random characters from these 3 sequences. Given that, I can easily translate the characters of this key variable into the corresponding numbers. In the example, x = 'afh', so it is the same as saying x = '123', because a OR b OR c = 1, and so on.
Now comes the complicated part:
When the key variable x is translated into numbers, character by character, I can also pick characters randomly back from the result. For example: x = '123', then I can return a list like ['a','e','f'], or ['b','d','i'], especially if I use random.choice(). What I haven't figured out yet is:
How can I append to a list ALL THE POSSIBLE VARIATIONS from the variables I, II, III? For example:
['adg','beh','cfi','aei','ceg',...]
I know how to print random combinations endlessly, but that way I get repetitions, and I don't want them. I want to append to a list exactly all the possible variations between I, II and III, because when they're translated into numbers, I can return any character from the corresponding sequence. I hope my example is self-explanatory. Thank you all very much for your attention!
I = 'abc' # 1
II = 'def' # 2
III = 'ghi' # 3
x = 'afh' # Random possibility: It could be an input.
L = []
LL = []
for i in range(len(x)):
    if x[i] in I:
        L.append(1)
    if x[i] in II:
        L.append(2)
    if x[i] in III:
        L.append(3)
for i in range(len(L)): # Here lies the mystery...
    if L[i] == 1:
        LL.append(I)
    if L[i] == 2:
        LL.append(II)
    if L[i] == 3:
        LL.append(III)
print(L)
print(LL)
The output is:
[1, 2, 3]
['abc', 'def', 'ghi']
Here's how I would rewrite your code. Lengthy if statements like yours are a big code smell. I put the sequences into a tuple and used a single loop. I also replaced the second loop with a list comprehension.
By the way, you could also simplify the indexing if you used zero-based indexing like a sensible person.
I = 'abc' # 1
II = 'def' # 2
III = 'ghi' # 3
x = 'afh' # Random possibility: It could be an input.
L = []
LL = []
lists = I, II, III
for c in x:
    for i, seq in enumerate(lists):
        if c in seq:
            L.append(i+1)
LL = [lists[i-1] for i in L]
print(L)
print(LL)
Also, be sure to check out the itertools module, and in particular the product function. It's not clear exactly what you mean, but product gives you every combination that takes one item from each of the given sequences.
Thank you very much, Antimony! The answer is exactly product() from itertools. The code with it is far simpler:
from itertools import product
I = 'abc' # 1
II = 'def' # 2
III = 'ghi' # 3
IV = product(I,II,III)
for i in IV:
    print(i)
And the output is exactly what I wanted, every possible combination:
('a', 'd', 'g')
('a', 'd', 'h')
('a', 'd', 'i')
('a', 'e', 'g')
('a', 'e', 'h')
('a', 'e', 'i')
('a', 'f', 'g')
('a', 'f', 'h')
('a', 'f', 'i')
('b', 'd', 'g')
('b', 'd', 'h')
('b', 'd', 'i')
('b', 'e', 'g')
('b', 'e', 'h')
('b', 'e', 'i')
('b', 'f', 'g')
('b', 'f', 'h')
('b', 'f', 'i')
('c', 'd', 'g')
('c', 'd', 'h')
('c', 'd', 'i')
('c', 'e', 'g')
('c', 'e', 'h')
('c', 'e', 'i')
('c', 'f', 'g')
('c', 'f', 'h')
('c', 'f', 'i')
In Python 3.2, a nested list comprehension gives the same combinations:
[(i,v,c) for i in I for v in II for c in III]
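To go from the key x straight to the strings the question asked for (such as 'adg'), a sketch: map each character of the key to the sequence that contains it, pass those sequences to product, and join each resulting tuple. all_variations and GROUPS are hypothetical names, and every character of the key is assumed to belong to one of the sequences.

```python
from itertools import product

I, II, III = 'abc', 'def', 'ghi'
GROUPS = (I, II, III)

def all_variations(key, groups=GROUPS):
    # Map each character of key to the sequence that contains it...
    seqs = [next(g for g in groups if c in g) for c in key]
    # ...then take one character from each sequence in every possible way.
    return [''.join(p) for p in product(*seqs)]
```

For x = 'afh' this returns 27 distinct strings, from 'adg' to 'cfi', with no repetitions.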

Multiple Tuple to Two-Pair Tuple in Python?

What is the nicest way of splitting this:
tuple = ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')
into this:
tuples = [('a', 'b'), ('c', 'd'), ('e', 'f'), ('g', 'h')]
Assuming that the input always has an even number of values.
zip() is your friend:
t = ('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')
list(zip(t[::2], t[1::2]))  # list() is needed in Python 3, where zip returns an iterator
[(tuple[a], tuple[a+1]) for a in range(0,len(tuple),2)]
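Or, based on the grouper recipe from the itertools documentation (which uses itertools.izip in Python 2; in Python 3 the built-in zip plays that role):
def group2(iterable):
    # two references to the same iterator, so zip pulls consecutive items
    args = [iter(iterable)] * 2
    return zip(*args)
tuples = [ab for ab in group2(tuple)]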
I present this code based on Peter Hoffmann's answer as a response to dfa's comment.
It is guaranteed to work whether or not your tuple has an even number of elements.
[(tup[i], tup[i+1]) for i in range(0, (len(tup)//2)*2, 2)]
The (len(tup)//2)*2 range parameter calculates the highest even number less than or equal to the length of the tuple, so an odd trailing element is simply left out. Floor division (//) is used so that this also works in Python 3, where / would produce a float that range() rejects.
The result of the method is a list, which can be converted to a tuple with the tuple() function.
Sample:
def inPairs(tup):
    return [(tup[i], tup[i+1]) for i in range(0, (len(tup)//2)*2, 2)]
# odd number of elements
print("Odd Set")
odd = list(range(5))
print(odd)
po = inPairs(odd)
print(po)
# even number of elements
print("Even Set")
even = list(range(4))
print(even)
pe = inPairs(even)
print(pe)
Output
Odd Set
[0, 1, 2, 3, 4]
[(0, 1), (2, 3)]
Even Set
[0, 1, 2, 3]
[(0, 1), (2, 3)]
Here's a general recipe for any-size chunk, if it might not always be 2:
def chunk(seq, n):
    return [seq[i:i+n] for i in range(0, len(seq), n)]
chunks = chunk(tuple, 2)
Or, if you enjoy iterators:
def iterchunk(iterable, n):
    it = iter(iterable)
    while True:
        chunk = []
        try:
            for i in range(n):
                chunk.append(next(it))
        except StopIteration:
            break
        finally:
            # yields every full chunk, plus a final partial one if any
            if len(chunk) != 0:
                yield tuple(chunk)
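For the iterator version, itertools.islice gives a shorter sketch that avoids the try/finally bookkeeping: keep taking up to n items from one shared iterator until nothing is left. chunked is a hypothetical name; on Python 3.12 and later, itertools.batched(iterable, n) yields tuples in the same way.

```python
from itertools import islice

def chunked(iterable, n):
    it = iter(iterable)
    while True:
        # islice takes up to n items from the shared iterator
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk
```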
