Copying from one file to another file and pairing in python - python

I have a file (say, file.txt) which looks like:
A,B,C,32
D,E,F,65
G,H,I,76
J,K,L,98
M,N,O,55
J,K,L,98
S,T,U,46
G,H,I,76
There are 8 rows, so I want to make 4 pairs. The first pair must contain the row with the highest value in the 4th column; its partner can be any of the remaining 7 rows, but self-pairing is not allowed, so [(J,K,L,98),(J,K,L,98)] can't be a pair. Suppose the first iteration picks [(J,K,L,98),(S,T,U,46)]; the next iteration then chooses from the remaining 6 rows.
Here is what I tried:
from random import choice
file_name = 'file.txt'
fp = open(file_name)
val = []
for line in fp.readlines():
    val.append(line.replace("\n","").split(","))
val = sorted(val, key=lambda x: x[-1], reverse=True)
item_list = []
for i in val:
    i = list(map(str, i))
    s = ','.join(i).replace("\n","")
    item_list.append(s)
print(*item_list,sep='\n')
list1=[]
for i in val:
    list1.append(i[-1])
print(list1)
p=max(list1)
print(max(list1))
final_list=[]
for i in val:
    if(i[-1]==p):
        final_list.append(i)
    elif(i[-1]==choice(list1)):
        final_list.append(i)
print(final_list)
Please help me out.

My interpretation of the problem is that you need to take the row from the file (table) with the highest numeric value in the 4th token. You then want to take another row at random and combine the two into a pair (a list of tuples). The key rule is that no pair can contain two equal tuples. I propose this:
from random import choice
# handles each row from the file creating an appropriately structured tuple
def mfunc(s):
    t = s.strip().split(',')
    t[-1] = int(t[-1])
    return tuple(t)
# selects a random tuple from the given list ensuring that the selection cannot match the reference tuple
def gar(lst, ref=None):
    rv = choice([e for e in lst if e != ref]) if ref else choice(lst)
    lst.remove(rv)
    return rv
with open('file.txt') as infile:
    rows = sorted(list(map(mfunc, infile.readlines())), key=lambda x: x[-1])
output = []
t = rows[-1] # get row with highest value
rows.pop(-1) # then remove it
output.append([t, gar(rows, t)])
while len(rows) > 1:
    a = gar(rows)
    output.append([a, gar(rows, a)])
print(output)
Output (example):
[[('J', 'K', 'L', 98), ('M', 'N', 'O', 55)], [('S', 'T', 'U', 46), ('A', 'B', 'C', 32)], [('G', 'H', 'I', 76), ('D', 'E', 'F', 65)], [('G', 'H', 'I', 76), ('J', 'K', 'L', 98)]]
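One caveat with random pairing: if the only rows left are equal to each other (the sample file contains J,K,L,98 and G,H,I,76 twice), no valid partner exists and choice raises IndexError on an empty list. The sketch below is one possible way around that and is not part of the answer above. It assumes the lines are already in memory rather than read from file.txt, and it simply retries the whole pairing when it hits a dead end. The names parse_row, try_pairing, pair_rows and the max_attempts parameter are made up for this illustration.

```python
import random

def parse_row(line):
    """Turn 'A,B,C,32' into ('A', 'B', 'C', 32)."""
    *labels, value = line.strip().split(',')
    return (*labels, int(value))

def try_pairing(rows, rng):
    """Attempt one random pairing; return None if a dead end is reached."""
    pool = sorted(rows, key=lambda r: r[-1])
    first = pool.pop()  # the first pair starts with the highest value
    pairs = []
    while True:
        # Partners must differ from the row they are paired with.
        candidates = [r for r in pool if r != first]
        if not candidates:
            return None
        partner = rng.choice(candidates)
        pool.remove(partner)
        pairs.append((first, partner))
        if len(pool) < 2:
            return pairs
        # Later pairs start with any remaining row, picked at random.
        first = pool.pop(rng.randrange(len(pool)))

def pair_rows(rows, rng=random, max_attempts=100):
    """Retry until a pairing without self-pairs is found (hypothetical helper)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to form a pair")
    for _ in range(max_attempts):
        pairs = try_pairing(rows, rng)
        if pairs is not None:
            return pairs
    raise ValueError("no valid pairing found")

lines = ["A,B,C,32", "D,E,F,65", "G,H,I,76", "J,K,L,98",
         "M,N,O,55", "J,K,L,98", "S,T,U,46", "G,H,I,76"]
rows = [parse_row(line) for line in lines]
pairs = pair_rows(rows, random.Random(0))  # seeded only to make runs repeatable
for pair in pairs:
    print(pair)
```

Retrying is the simplest fix; for inputs with many duplicates a smarter strategy would probably be needed, and pair_rows gives up with ValueError after max_attempts tries.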

Related

How to compare first and last index of a string in a list

input :['slx', 'poo', 'lan', 'ava', 'slur']
output:['s', 'o', 'l', 'a', 'r']
how do you compare the first and last index of a string in a list?
Thanks in advance!
I am assuming that you want to compare the characters and get the 'smaller' one (lexicographically). You can use list comprehension and min for that:
lst = ['slx', 'poo', 'lan', 'ava', 'slur']
output = [min(x[0], x[-1]) for x in lst]
print(output) # ['s', 'o', 'l', 'a', 'r']
Comparing two strings is done lexicographically: for example, 'banana' < 'car' evaluates to True, since "banana" comes before "car" in a dictionary. Likewise 's' < 'x', so min('s', 'x') is 's', which explains the first element of the output list.
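To make the comparison rules concrete, here is a small sketch (the variable names are only illustrative). It also shows two things that are easy to trip over: writing == True after a comparison creates a chained comparison, and ordering follows character code points, so uppercase letters sort before lowercase ones.

```python
words = ['slx', 'poo', 'lan', 'ava', 'slur']

# A bare comparison already yields a boolean.
plain = 'banana' < 'car'
print(plain)  # True

# Chained comparison: this means ('banana' < 'car') and ('car' == True).
chained = 'banana' < 'car' == True
print(chained)  # False

# Smaller of the first and last character of each word.
smallest = [min(w[0], w[-1]) for w in words]
print(smallest)  # ['s', 'o', 'l', 'a', 'r']

# Ordering is by code point, so 'Z' (90) is smaller than 'a' (97).
print(min('a', 'Z'))  # Z
```

Note that w[0] raises IndexError for an empty string, so this assumes every word has at least one character.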
s = ['slx', 'poo', 'lan', 'ava', 'slur']
print(list(map(lambda x: min(x[0], x[-1]), s)))
Assuming that you want the smaller of the first and last characters of each string in the list, you can do this:
words = ['slx', 'poo', 'lan', 'ava', 'slur']  # not named list, to avoid shadowing the built-in
result = []
for item in words:
    leng = len(item)
    if item[0] < item[leng-1]:
        result.append(item[0])
    else:
        result.append(item[leng-1])
print(result)

How to split list of tuples on condition?

I have list of tuples:
my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
I need to split it into sublists of tuples, each starting with a tuple whose first item is 1:
[(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i')]
[(1,'j','k','l'), (2,'m','n','o')]
[(1,'p','q','r'), (2,'s','t','u')]
You're effectively computing some kind of "groupwhile" function: you want to split at every tuple that starts with 1. This looks an awful lot like itertools.groupby, and if we keep a tiny bit of state (the one_count variable in the example) we can reuse the grouping/aggregation logic already built into the language to get your desired result.
import itertools
# The inner function is just so that one_count will be initialized only
# as many times as we want to call this rather than exactly once via
# some kind of global variable.
def gen_count():
    def _cnt(t, one_count=[0]):
        if t[0] == 1:
            one_count[0] += 1
        return one_count[0]
    return _cnt
result = [list(g[1]) for g in itertools.groupby(my_list, key=gen_count())]
A more traditional solution would be to iterate through your example and append intermediate outputs to a result set.
result = []
for i, *x in my_list:
    if i==1:
        result.append([(i, *x)])
    else:
        result[-1].append((i, *x))
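The running-count idea behind the groupby answer can also be written without a mutable default argument. The following is a sketch of one alternative, not the original author's method: itertools.accumulate produces the running count of tuples that start with the marker, and groupby then groups on that count. The function name split_at_marker and its marker parameter are made up here.

```python
from itertools import accumulate, groupby

def split_at_marker(items, marker=1):
    """Start a new sublist at every tuple whose first item equals marker.

    items must be a re-iterable sequence such as a list, because it is
    traversed once for the running count and once for the grouping.
    """
    # Running count of markers seen so far: 1, 1, 1, 2, 2, 3, 3 for the sample.
    group_ids = accumulate(int(t[0] == marker) for t in items)
    grouped = groupby(zip(group_ids, items), key=lambda pair: pair[0])
    return [[t for _, t in members] for _, members in grouped]

my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'),
           (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
result = split_at_marker(my_list)
for group in result:
    print(group)
```

Unlike the loop that appends to result[-1], this version does not fail when the list does not start with the marker: any leading tuples get count 0 and form their own group.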
Try this code. I assume a new group starts whenever the first item of the first tuple (1) appears again. I also assume the output is a list.
my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
ch = my_list[0][0]
groups = []  # not named all, to avoid shadowing the built-in
st = 0
for i, t in enumerate(my_list):
    if t[0] == ch:
        if i != 0:
            groups.append(my_list[st:i])
        st = i
# the last group runs to the end of the list
groups.append(my_list[st:])
print(groups)
Output (reformatted for readability)
[
[(1, 'a', 'b', 'c'), (2, 'd', 'e', 'f'), (3, 'g', 'h', 'i')],
[(1, 'j', 'k', 'l'), (2, 'm', 'n', 'o')],
[(1, 'p', 'q', 'r'), (2, 's', 't', 'u')]
]

why this python function skips index 1 to 3 without iterating index 2 when running in a for loop

I wrote a function to remove the parts that two strings have in common. I first convert each string to a list and then iterate through the two lists to check whether the characters at the same position are equal. The problem is that the iteration skips index 2 (e.g. for list="index", the iterator jumps to 'd' after 'i').
I also tried the string replace method, but it did not give the result I want: it removed parts that I want to keep.
def popp(s,t):
    s_lis=list(s)
    t_lis=list(t)
    ind=0
    for i,j in zip(s_lis,t_lis):
        if i==j:
            s_lis.pop(ind)
            t_lis.pop(ind)
        else:ind+=1
    return s_lis,t_lis
# test the code
print(popp('hackerhappy','hackerrank'))
expected result: ['h','p','p','y'] ['r','n','k']
actual result: ['k', 'r', 'h', 'a', 'p', 'p', 'y'], ['k', 'r', 'r', 'a', 'n', 'k']
To begin with, you should use itertools.zip_longest, which keeps going until the longest input is exhausted and fills the missing values with None. You are using zip, which stops as soon as the shortest input ends, and that is not what you want here.
So in our case, it will be
print(list(zip_longest(s_lis, t_lis)))
#[('h', 'h'), ('a', 'a'), ('c', 'c'), ('k', 'k'), ('e', 'e'),
#('r', 'r'), ('h', 'r'), ('a', 'a'), ('p', 'n'), ('p', 'k'), ('y', None)]
Then you should append the non-common characters to a separate list rather than modifying the list you are iterating over with s_lis.pop(ind). Removing an element shifts the remaining ones to the left, so the iterator skips one.
So if the characters in the tuple do not match, append each one that is not None:
from itertools import zip_longest
def popp(s,t):
    s_lis = list(s)
    t_lis = list(t)
    s_res = []
    t_res = []
    #Use zip_longest to zip the two lists
    for i, j in zip_longest(s_lis, t_lis):
        #If the characters do not match, and they are not None, append them
        #to the list
        if i != j:
            if i is not None:
                s_res.append(i)
            if j is not None:
                t_res.append(j)
    return s_res, t_res
The output will look like:
print(popp('hackerhappy','hackerrank'))
#(['h', 'p', 'p', 'y'], ['r', 'n', 'k'])
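The skipping itself is easy to reproduce in isolation. The sketch below is a stripped-down illustration, not the original popp function: a for loop keeps an internal position, so removing an element from the list during the loop moves the next element into a slot that has already been visited.

```python
# Mutating the list while looping over it: 'n' is never visited.
letters = list("index")
visited = []
for ch in letters:
    visited.append(ch)
    if ch == "i":
        letters.pop(0)  # everything shifts left by one position
print(visited)  # ['i', 'd', 'e', 'x']

# Looping over a copy (letters[:]) leaves the iteration order intact.
letters = list("index")
visited_copy = []
for ch in letters[:]:
    visited_copy.append(ch)
    if ch == "i":
        letters.pop(0)
print(visited_copy)  # ['i', 'n', 'd', 'e', 'x']
```

This matches the behaviour described in the question, where the iterator jumps from 'i' straight to 'd'.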
You could modify your code slightly:
def popp(s, t):
    s_lis = list(s)
    t_lis = list(t)
    s_res = []
    t_res = []
    # match each character. Stops once the
    # shortest list ends
    for i, j in zip(s_lis, t_lis):
        if i != j:
            s_res.append(i)
            t_res.append(j)
    # if s is longer, take rest of the string and
    # add it to residual
    if len(s) > len(t):
        for x in s_lis[len(t):]:
            s_res.append(x)
    if len(t) > len(s):
        for x in t_lis[len(s):]:
            t_res.append(x)
    print(s_res)
    print(t_res)
popp('hackerhappy','hackerrank')

Printing a dictionary in python with n elements per line

Given a .txt with about 200,000 lines of single words, I need to count how many times each letter appears as the first letter of a word. I have a dictionary with keys 'a' - 'z', with counts assigned to each of their values. I need to print them out in the form
a:10,978 b:7,890 c:12,201 d:9,562 e:6,008
f:7,095 g:5,660 (...)
The dictionary currently prints like this
[('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505), ('k'...
How do I remove the brackets & parentheses and print only 5 counts per line? Also, after I sorted the dictionary by keys, it started printing as key, value instead of key:value
def main():
    file_name = open('dictionary.txt', 'r').readlines()
    alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
    letter = {}
    for i in alphabet:
        letter[i]=0
    for n in letter:
        for p in file_name:
            if p.startswith(n):
                letter[n] = letter[n]+1
    letter = sorted(letter.items())
    print(letter)
main()
You could use the following. It loops through your list in groups of 5 elements and prints each group in the desired format.
letter = [('a', 10898), ('b', 9950), ('c', 17045), ('d', 10675), ('e', 7421), ('f', 7138), ('g', 5998), ('h', 6619), ('i', 7128), ('j', 1505)]
Replace print(letter) with the following:
for grp in range(0, len(letter), 5):
    print(' '.join(elm[0] + ':' + '{:,}'.format(elm[1]) for elm in letter[grp:grp+5]))
a:10,898 b:9,950 c:17,045 d:10,675 e:7,421
f:7,138 g:5,998 h:6,619 i:7,128 j:1,505
A collections.Counter dict will get the count of all the first letters on each line, then split into chunks and join:
from collections import Counter
with open('dictionary.txt') as f: # automatically closes your file
    # iterate once over the file object as opposed to storing 200k lines
    # and 26 iterations over the lines
    c = Counter(line[0] for line in f)
srt = sorted(c.items())
# create five element chunks from the sorted items
chunks = (srt[i:i+5] for i in range(0, len(srt), 5))
for chk in chunks:
    # format and join (names chosen so the Counter c is not shadowed)
    print(" ".join("{}:{:,}".format(letter, count) for letter, count in chk))
If you may have something other than letters a-z use isalpha in the loop:
c = Counter(line[0] for line in f if line[0].isalpha())
The ',' thousands-separator format specifier used in '{:,}' was added in Python 2.7.
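On Python 3.6 or later the same formatting can be written with f-strings. The sketch below is a small variation on the answers above, not a replacement for them: it counts first letters from an in-memory word list instead of dictionary.txt, and returns the lines rather than printing them directly. The helper name format_counts and the per_line parameter are assumptions made for this example.

```python
from collections import Counter

def format_counts(counts, per_line=5):
    """Return lines such as 'a:10,898 b:9,950' with per_line items each."""
    items = sorted(counts.items())
    return [
        " ".join(f"{letter}:{count:,}"
                 for letter, count in items[start:start + per_line])
        for start in range(0, len(items), per_line)
    ]

# Stand-in for the 200,000-line file; "if word" skips empty entries.
words = ["apple", "avocado", "banana", "cherry", "cranberry",
         "date", "elderberry", "fig", "grape"]
counts = Counter(word[0] for word in words if word)
lines = format_counts(counts)
for line in lines:
    print(line)
```

Returning a list of strings keeps the chunking logic separate from the output, which makes it straightforward to check or to write to a file instead.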

how to handle an empty list, python3

I have a function that creates a list of amino acids (aaCandidates), but this list can be empty. If it is empty, I would like to skip that step and continue with the next one.
My code is:
def magicfunction(referenceDistance,referenceAA):
    amminoacids = ('A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V')
    aaCandidates = list()
    for aa in amminoacids:
        if distance(aa,referenceAA) == referenceDistance:
            aaCandidates.append(aa)
        if not aaCandidates:
            break
    luckyAA = choice(aaCandidates)
    return(luckyAA)
I call this function in another file as follows:
for i in range(lenghtPairs):
    r1 = randrange(20)
    r2 = randrange(20)
    coppie.append([aminoacidi[r1], aminoacidi[r2]])
for i in range(lenghtPairs):
    dictionary = dict()
    frequenze = dict()
    if i == 0:
        a = randrange(20)
        b = randrange(20)
        pairs[0] = [aminoacids[a], aminoacids[b]]
    else:
        c = randrange(20)
        pairs[i][0] = aminoacids[c]
        distanceNeighbours = distance(pairs[i][0],pairs[i-1][0])
        aaChosen = magicfunction(distanceNeighbours,pairs[i-1][1])
        pairs[i][1] = aaChosen
    print(i + 1)
I tried the condition if not aaCandidates: break, but it didn't work:
  File "/.../lib/python3.4/random.py", line 255, in choice
    raise IndexError('Cannot choose from an empty sequence')
IndexError: Cannot choose from an empty sequence
Your list is empty, so random.choice() fails. You'll need to decide what to do instead when the list is empty, but do so outside the for loop, once the list has been fully built:
for aa in amminoacids:
    if distance(aa,referenceAA) == referenceDistance:
        aaCandidates.append(aa)
if not aaCandidates:
    return 'some default choice'
luckyAA = choice(aaCandidates)
return luckyAA
All that putting your break in the loop achieves is to ensure that nothing is going to be added to your list if the first aa was not a candidate.
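If there is no sensible default, another option is to return None and let the caller skip that step, which is closer to what the question asks for. The following is a minimal sketch of that pattern under stated assumptions: pick_matching and its predicate argument are hypothetical stand-ins for magicfunction and the distance check, and plain integers replace the amino acids so the example runs on its own.

```python
import random

def pick_matching(options, predicate, rng=random):
    """Return a random option that satisfies predicate, or None if none does."""
    candidates = [o for o in options if predicate(o)]
    if not candidates:
        return None  # signal "nothing to choose from" instead of raising
    return rng.choice(candidates)

chosen_values = []
for target in (1, 99, 3):
    chosen = pick_matching(range(5), lambda n: n == target)
    if chosen is None:
        continue  # empty candidate list: skip this step and move on
    chosen_values.append(chosen)
print(chosen_values)  # [1, 3]
```

Checking for None explicitly (rather than relying on truthiness) matters when a valid choice could itself be falsy, such as 0 or an empty string.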
