Extract patterns from tuples - python

I have a list of tuples
a=[('a', 0), ('c', 1), ('d', 0), ('b', 1), ('t',1), ('j',2), ('k',3), ('s', 4), ('l',1), ('y',1), ('r',2), ('b',3), ('k',4)]
I want output like
[[1,1,1,2,3,4],[1,1,2,3,4]]
and corresponding letters
[['c', 'b', 't', 'j', 'k', 's'], ['l', 'y', 'r', 'b', 'k']]
I need to remove 0's in between and the pattern always starts with 1

Using a simple loop and tracking the previous non-zero value:
letters = []
numbers = []
prev = 2
for l,n in a:
    if n == 0:
        continue
    elif prev > 1 and n == 1:
        letters.append([])
        numbers.append([])
    letters[-1].append(l)
    numbers[-1].append(n)
    prev = n
letters
# [['c', 'b', 't', 'j', 'k', 's'], ['l', 'y', 'r', 'b', 'k']]
numbers
# [[1, 1, 1, 2, 3, 4], [1, 1, 2, 3, 4]]
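The same loop can be wrapped in a function so it can be reused on other lists. This is only a sketch: the name split_runs is made up for illustration, and it assumes, as the question states, that the first non-zero value is a 1 (otherwise letters[-1] would fail on an empty list).

```python
def split_runs(pairs):
    """Group (letter, number) pairs into runs that each start at 1, skipping zeros."""
    letters, numbers = [], []
    prev = 2  # any value > 1, so that the first 1 opens a new group
    for letter, n in pairs:
        if n == 0:
            continue  # zeros are dropped and do not change prev
        if prev > 1 and n == 1:
            # a 1 that follows a larger value starts a new run
            letters.append([])
            numbers.append([])
        letters[-1].append(letter)
        numbers[-1].append(n)
        prev = n
    return letters, numbers


a = [('a', 0), ('c', 1), ('d', 0), ('b', 1), ('t', 1), ('j', 2), ('k', 3),
     ('s', 4), ('l', 1), ('y', 1), ('r', 2), ('b', 3), ('k', 4)]
letters, numbers = split_runs(a)
print(numbers)  # [[1, 1, 1, 2, 3, 4], [1, 1, 2, 3, 4]]
print(letters)  # [['c', 'b', 't', 'j', 'k', 's'], ['l', 'y', 'r', 'b', 'k']]
```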

Related

Adjusting the function to find location for more than one base

I created this function, which finds the positions of a base in a DNA sequence such as dna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A', 'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']. I need it to find more than one base at a time, like 'A''T'. Can anyone help?
def position(list, value):
    pos = []
    for n in range(len(list)):
        if list[n] == value:
            pos.append(n)
    return pos
You can work with the dna sequence as a string, and then use regex:
import re
dna_str = ''.join(dna)
pattern = r'AT'
pos = [(i.start(0), i.end(0)) for i in re.finditer(pattern, dna_str)]
print(pos)
[(10, 12), (14, 16), (17, 19), (22, 24), (29, 31)]
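One caveat with re.finditer is that it reports non-overlapping matches only, which matters for motifs that can overlap themselves (for example 'AA'). A possible workaround, sketched below, is to wrap the pattern in a zero-width lookahead. The helper name find_all is hypothetical, and it treats the motif as a literal string via re.escape.

```python
import re

dna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A',
       'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']


def find_all(motif, text):
    """Start index of every occurrence of a literal motif, overlapping ones included."""
    # A lookahead matches without consuming characters, so the scan
    # advances one position at a time and can report overlapping hits.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", text)]


print(find_all("AT", "".join(dna)))                    # [10, 14, 17, 22, 29]
print(find_all("AA", "AAAA"))                          # [0, 1, 2]
print([m.start() for m in re.finditer("AA", "AAAA")])  # [0, 2]
```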
Side note: it's best not to use built-in names for variables. list is not a keyword, but it is a Python built-in type, and naming a parameter list shadows it.
def position(l: list, values: list) -> list:
    pos = []
    for i, val in enumerate(l):
        if val in values:
            pos.append(i)
    return pos
You should definitely use Python's built-in functions. For instance, instead of position(list, value) you could use a list comprehension
[n for n,x in enumerate(dna) if x == 'A']
Finding a bigram could be reduced to the above if you consider pairs of letters:
[n for n,x in enumerate(zip(dna[:-1], dna[1:])) if x==('A','T')]
If instead you want to find the positions of either 'A' or 'T', you could just specify that as the condition
[n for n,x in enumerate(dna) if x in ('A', 'T')]
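The bigram idea extends to motifs of any length by comparing list slices. A minimal sketch, with positions_of as a hypothetical helper name; it works directly on the list of bases without joining it into a string.

```python
dna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A',
       'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']


def positions_of(bases, motif):
    """Indices at which the consecutive bases in `motif` occur in the list `bases`."""
    k = len(motif)
    target = list(motif)
    # Slide a window of width k over the list and compare it to the motif.
    return [i for i in range(len(bases) - k + 1) if bases[i:i + k] == target]


print(positions_of(dna, "AT"))   # [10, 14, 17, 22, 29]
print(positions_of(dna, "ATC"))  # [10, 22]
```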
Python will efficiently find a substring of a string starting from any point.
def positions(dnalist, substr):
    dna = "".join(dnalist) # make single string
    st = 0
    pos = []
    while True:
        a_pos = dna.find(substr, st)
        if a_pos < 0:
            return pos
        pos.append(a_pos)
        st = a_pos + 1
Test usage:
>>> testdna = ['A', 'G', 'C', 'G', 'T', 'A', 'G', 'T', 'C', 'G', 'A', 'T', 'C', 'A', 'A', 'T', 'T', 'A', 'T', 'A', 'C', 'G', 'A', 'T', 'C', 'G', 'G', 'G', 'T', 'A', 'T']
>>> positions(testdna, "AT")
[10, 14, 17, 22, 29]

Python how to revert the pattern of a list rearrangement

So I am rearranging a list based on an index pattern and would like to find a way to calculate the pattern I need to revert the list back to its original order.
For my example I am using a list of 5 items, since I can work out by hand the pattern needed to revert the list back to its original state.
However this isn't so easy when dealing with 100's of list items.
def rearrange(pattern: list, L: list):
    new_list = []
    for i in pattern:
        new_list.append(L[i-1])
    return new_list
print(rearrange([2,5,1,3,4], ['q','t','g','x','r']))
#['t', 'r', 'q', 'g', 'x']
and in order to set it back to the original pattern
I would use
print(rearrange([3,1,4,5,2],['t', 'r', 'q', 'g', 'x']))
#['q', 't', 'g', 'x', 'r']
What I am looking for is a way to calculate the pattern "[3,1,4,5,2]" from the above example whilst running the script, so that I can set the list back to its original order.
Using a larger example:
print(rearrange([18,20,10,11,13,1,9,12,16,6,15,5,3,7,17,2,19,8,14,4],['e','p','b','i','s','r','q','h','m','f','c','g','d','k','l','t','a','n','j','o']))
#['n', 'o', 'f', 'c', 'd', 'e', 'm', 'g', 't', 'r', 'l', 's', 'b', 'q', 'a', 'p', 'j', 'h', 'k', 'i']
but I need to know the pattern to use with this new list in order to return it to its original state.
print(rearrange([???],['n', 'o', 'f', 'c', 'd', 'e', 'm', 'g', 't', 'r', 'l', 's', 'b', 'q', 'a', 'p', 'j', 'h', 'k', 'i']))
#['e','p','b','i','s','r','q','h','m','f','c','g','d','k','l','t','a','n','j','o']
This is commonly called "argsort". But since you're using 1-based indexing, you're off-by-one. You can get it with numpy:
>>> pattern
[2, 5, 1, 3, 4]
>>> import numpy as np
>>> np.argsort(pattern) + 1
array([3, 1, 4, 5, 2])
Without numpy:
>>> [1 + i for i in sorted(range(len(pattern)), key=pattern.__getitem__)]
[3, 1, 4, 5, 2]
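To check this on the larger example from the question, the sorted-based version can be applied to the 20-item pattern and the result fed back into rearrange. This is just a round-trip sketch; invert is a hypothetical name, and rearrange is rewritten here as a comprehension with the same behaviour.

```python
def rearrange(pattern, items):
    """Apply a 1-based index pattern to a list."""
    return [items[i - 1] for i in pattern]


def invert(pattern):
    """1-based inverse of a 1-based permutation (an argsort shifted by one)."""
    return [1 + i for i in sorted(range(len(pattern)), key=pattern.__getitem__)]


pattern = [18, 20, 10, 11, 13, 1, 9, 12, 16, 6, 15, 5, 3, 7, 17, 2, 19, 8, 14, 4]
original = list("epbisrqhmfcgdkltanjo")

shuffled = rearrange(pattern, original)
restored = rearrange(invert(pattern), shuffled)
print(restored == original)  # True
```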
What about something like below:
def revert_pattern(pattern):
    pattern_i = [0]*len(pattern)
    for k in range(len(pattern)):
        pattern_i[pattern[k]-1] = k+1
    return pattern_i
print(revert_pattern([2, 5, 1, 3, 4]))
# [3, 1, 4, 5, 2]
Note: I followed your logic, but I recommend using 0 as the smallest index instead of 1, since 1-based indexing requires some extra +1/-1 adjustments that could be avoided.
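For comparison, here is roughly what the 0-based variant suggested in the note could look like. The names rearrange0 and invert0 are made up for this sketch, and the pattern is the question's [2, 5, 1, 3, 4] shifted down by one.

```python
def rearrange0(pattern, items):
    """Apply a 0-based index pattern to a list."""
    return [items[i] for i in pattern]


def invert0(pattern):
    """Inverse of a 0-based permutation, with no +1/-1 bookkeeping."""
    inverse = [0] * len(pattern)
    for new_pos, old_pos in enumerate(pattern):
        # the item taken from old_pos now sits at new_pos
        inverse[old_pos] = new_pos
    return inverse


pattern = [1, 4, 0, 2, 3]
items = ['q', 't', 'g', 'x', 'r']

shuffled = rearrange0(pattern, items)
print(shuffled)                                          # ['t', 'r', 'q', 'g', 'x']
print(invert0(pattern))                                  # [2, 0, 3, 4, 1]
print(rearrange0(invert0(pattern), shuffled) == items)   # True
```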
def rearrange(p, l):
    arr = [l[i - 1] for i in p]
    d = {v : i + 1 for i, v in enumerate(arr)}
    order = [d[k] for k in l]
    return arr, order
a = [2, 5, 1, 3, 4]
b = ['q', 't', 'g', 'x', 'r']
rearrange(a, b)
# (['t', 'r', 'q', 'g', 'x'], [3, 1, 4, 5, 2])
OR maybe
def revert(p):
    z = zip(p, list(range(len(p))))
    return [x + 1 for _, x in sorted(z)]
a = [2, 5, 1, 3, 4]
revert(a)
# [3, 1, 4, 5, 2]

How do I count letters occurring in a string without using a dictionary or list.count()?

I am trying to count each letter up without using count() or
dict().
I did write something but I am still having issues with my code.
myString = []
#countList = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
myString ="pynativepynvepynative"
countList = [len(myString)+1]
for i in range(len(myString)):
    #print("Here 0")
    for j in range(len(countList)):
        #print("Here 1")
        if i == countList[j]:
            #print("Here 1.1")
            countList[j+1] = (countList[j+1] + 1)
            break
        else:
            #print("Here 2")
            countList.append(myString[i])
            countList.append(1)
            break
print(countList)
Expected output:
['p', 3, 'y', 3, 'n', 3, 'a', 2, 't', 2, 'i', 2, 'v', 3, 'e', 3]
Actual output:
[22, 'p', 1, 'y', 1, 'n', 1, 'a', 1, 't', 1, 'i', 1, 'v', 1, 'e', 1, 'p', 1, 'y', 1, 'n', 1, 'v', 1, 'e', 1, 'p', 1, 'y', 1, 'n', 1, 'a', 1, 't', 1, 'i', 1, 'v', 1, 'e', 1]
What you can do is get the unique letters from the string, and then for each unique letter loop through the string to count its frequency.
def func_count(string):
    letter = []
    for char in string:
        if char not in letter:
            letter.append(char)
    res = []
    for let in letter:
        count = 0
        for char in string:
            if let == char:
                count+=1
        res.extend([let, count])
    # res = {a:b for a,b in zip(res[::2], res[1::2])}
    return res
string = "pynativepynvepynative"
solution = func_count(string)
print(solution)
output
['p', 3, 'y', 3, 'n', 3, 'a', 2, 't', 2, 'i', 2, 'v', 3, 'e', 3]
Edit: if you want the solution in dict form, add res = {a:b for a,b in zip(res[::2], res[1::2])} in the function.
I was able to get the right answer by modifying the code from my question.
My problem was that I did not know how to initialize countList properly.
countList = []
myString ="pynativepynvepynative"
for i in range(len(myString)):
    #print("Here 0")
    for j in range(len(countList)):
        #print("Here 1")
        #print(myString[i])
        #print(j)
        #print(countList[j])
        if myString[i] == countList[j]:
            #print("Here 1.1")
            #print(myString[i])
            countList[j+1] = (countList[j+1] + 1)
            break
    else :
        #print("Here 2")
        countList.append(myString[i])
        countList.append(1)
print(countList)
Actual output:
['p', 3, 'y', 3, 'n', 3, 'a', 2, 't', 2, 'i', 2, 'v', 3, 'e', 3]
Use collections.Counter, the dict subclass for counting objects, which makes this a one-liner:
from collections import Counter
c = Counter('pynativepynvepynative')
Counter({'p': 3, 'y': 3, 'n': 3, 'v': 3, 'e': 3, 'a': 2, 't': 2, 'i': 2})
(Technically this isn't a dict, it's a subclass of dict.)
You can get a list-of-tuple from it:
>>> c.most_common()
[('p', 3), ('y', 3), ('n', 3), ('v', 3), ('e', 3), ('a', 2), ('t', 2), ('i', 2)]
Lists or tuples are undesirable for counting things, because you want to be able to separately access/sort by the keys (objects that you're counting) and the values (counts). In theory you can do that on a list of lists/tuples, but it's a pain, and Counter already defines several of the methods you'll need.
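If the flat list format from the question is still wanted, it can be rebuilt from the Counter. This is a small sketch that relies on Counter keeping first-seen order, which holds on Python 3.7 and later; note that it does use a dict subclass, so it only applies when that restriction is lifted.

```python
from collections import Counter

c = Counter("pynativepynvepynative")

# Flatten the (letter, count) pairs into one list, in first-seen order.
flat = [x for pair in c.items() for x in pair]
print(flat)
# ['p', 3, 'y', 3, 'n', 3, 'a', 2, 't', 2, 'i', 2, 'v', 3, 'e', 3]
```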

How to define section number that 2d-array is divided to using Python?

I have this data structure:
It is a 2d-array that is divided into 3 sections. For each letter in the array I need to determine the section number. For example, letters a,b,c,d are in Section 1; e,f,g,h are in Section 2.
My code. Firstly, this 2d-array preparation:
from itertools import cycle
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']
#2d-array initialization
width, height = 3, 6
repRange = cycle(range(1, 3))
values = [0] * (width - 1)
array2d = [[next(repRange)] + values for y in range(height)]
#Filling array with letters:
m = 0
for i in range(height):
    for j in range(1,width):
        array2d[i][j] = letters[m]
        m+=1
#Printing:
for row in array2d:
    print(row)
Output:
[1, 'a', 'b']
[2, 'c', 'd']
[1, 'e', 'f']
[2, 'g', 'h']
[1, 'i', 'j']
[2, 'k', 'l']
Now I need to determine section number of each letter and save it along with the letter itself. I use defineSection function and save values in dictionary:
def defineSection(i, division, height):
    if i <= division:
        return 1
    elif division*2 >= i > division :
        return 2
    elif division*3 >= i > division*2 :
        return 3
dic = {}
for i in range(height):
    for j in range(1,width):
        section = defineSection(i+1, 2, height)
        dic.update({array2d[i][j] : section})
for item in dic.items():
    print(item)
Output:
('f', 2)
('b', 1)
('c', 1)
('e', 2)
('k', 3)
('g', 2)
('d', 1)
('a', 1)
('l', 3)
('h', 2)
('i', 3)
('j', 3)
It defined all section numbers for each letter correctly. But defineSection method is primitive and will not work if number of rows is bigger than 6.
I don't know how to implement defineSection method so that it defines Section number automatically taking into account only current Row number, division and number of rows in total.
Question: Is there some way I can simply determine section number without so many if-elif conditions and independently of total number of rows?
You can simplify your matrix creation code immensely. All you need is a single iterator over the letters; passing that same iterator to zip twice makes it consume two letters per row.
In [3]: from itertools import cycle
In [4]: letters = "abcdefghijkl"
In [5]: ranges = cycle(range(1,3))
In [6]: iter_letters = iter(letters)
In [7]: matrix = [[i,a,b] for i,a,b in zip(ranges,iter_letters,iter_letters)]
In [8]: matrix
Out[8]:
[[1, 'a', 'b'],
[2, 'c', 'd'],
[1, 'e', 'f'],
[2, 'g', 'h'],
[1, 'i', 'j'],
[2, 'k', 'l']]
As for assigning sections, note that a section is every two rows, which is four letters, so you can use simple floor division to "skip" counts.
In [9]: sections = {letter:(i//4 + 1) for i,letter in enumerate(letters)}
In [10]: sections
Out[10]:
{'a': 1,
'b': 1,
'c': 1,
'd': 1,
'e': 2,
'f': 2,
'g': 2,
'h': 2,
'i': 3,
'j': 3,
'k': 3,
'l': 3}
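The same floor division also works on row numbers, which is what the question's defineSection takes as input. A sketch under the assumption that rows are numbered from 1 and every section spans the same number of rows; define_section is a hypothetical replacement name, and the total height is no longer needed.

```python
def define_section(row, division):
    """1-based section number for a 1-based row, with `division` rows per section."""
    return (row - 1) // division + 1


letters = "abcdefghijkl"
letters_per_row = 2  # width - 1 in the question's layout

# Row of each letter (1-based), then the section of that row.
sections = {
    letter: define_section(i // letters_per_row + 1, 2)
    for i, letter in enumerate(letters)
}
print(sections['a'], sections['e'], sections['l'])  # 1 2 3
print(define_section(7, 2))                         # 4
```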

Python: Making a new list from a random list of letters

letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']#alphabet
bag_o_letters = []#letters to chose from
letter_count = [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1]#random indexes to chose from
for x in range(26):#loops through the random index of letter_count
    for i in range(letter_count[x]):#chooses the index
        bag_o_letters.append(letters[x])#appends the index of the letter to bag_o_letters
rack = []#list for the person to see
for a in range(7):#makes the list 7 letters long
    rack.append(bag_o_letters.pop(random.randint(0,len(letters)-1)))#appends the letter to rack(supposedly...)
print(rack)
In this code that you just read it should choose random letters and put 7 of those letters in a rack that the person can see. It shows a error that I've looked over many times, but I just can't see what is wrong.
I put comments on the side to understand the code.
It shows this error:
rack.append(bag_of_letters.pop(random.randint(0,len(letters)-1)))
IndexError: pop index out of range
Can someone please help?
After this code, I am going to make a input statement for the user to make a word from those letters.
The first time through the loop, you append one value to bag_of_letters, and then you try to pop an index of random.randint(0,len(letters)-1). It doesn't have that many elements to pop from yet. Instead of this approach, you can make a list of the required length and sample from it:
letters = ['a', ...]#alphabet
letter_count = [9, ...]#random indexes to chose from
bag_of_letters = [l for l,c in zip(letters, letter_count) for _ in range(c)]
...
rack = random.sample(bag_of_letters, 7)
You're choosing the index to pop from bag_of_letters based on the length of letters rather than the length of bag_of_letters, so the index can fall outside the bag.
You should instead do:
rack.append(bag_of_letters.pop(random.randint(0, len(bag_of_letters)-1)))
# ^^^^^^^^^^^^^^
However, there are likely to be more problems with your code. I'll suggest you use random.sample in one line of code or random.shuffle on a copy of the list, and then slice up till index 7. Both will give you 7 randomly selected letters:
import random
print(random.sample(letters, 7))
# ['m', 'u', 'l', 'z', 'r', 'd', 'x']
import random
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
letters_copy = letters[:]
random.shuffle(letters_copy)
print(letters_copy[:7])
# ['c', 'e', 'x', 'b', 'w', 'f', 'v']
The IndexError is expected:
pop(...)
L.pop([index]) -> item -- remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
You need to subtract 1 from the upper bound of the call to random.randint() after each pop(). Right now you are effectively doing this:
l = [1,2,3]
random_idx = 2
l.pop(random_idx)
# l == [1, 2]
random_idx = 2
l.pop(random_idx)
# IndexError: pop index out of range
So instead, pop() based on len(bag_o_letters) rather than len(letters).
Why not do something like this:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
letter_count = [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1]#random indexes to chose from
from random import shuffle
all_letters = list(''.join([l*c for l,c in zip(letters, letter_count)]))
shuffle(all_letters)
for i in range(len(all_letters) // 7):
    print(all_letters[i*7:(i+1)*7])
So I assume this is for something like scrabble? Your issue is that you're choosing a random index from your list of letters, not bag_o_letters. Maybe try this:
rack = []
for i in range(7):
    index = random.randint(0, len(bag_o_letters) - 1)
    rack.append(bag_o_letters.pop(index))
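Another way to avoid index arithmetic altogether is to shuffle the bag once and then deal letters off it. The sketch below assumes the bag should shrink as letters are drawn, as in a Scrabble-style game; the helper name draw is hypothetical.

```python
import random

letters = list("abcdefghijklmnopqrstuvwxyz")
letter_count = [9, 2, 2, 4, 12, 2, 3, 2, 9, 1, 1, 4, 2, 6, 8, 2, 1, 6, 4, 6, 4, 2, 2, 1, 2, 1]

# One entry per tile: 9 x 'a', 2 x 'b', and so on (98 tiles in total).
bag_o_letters = [l for l, c in zip(letters, letter_count) for _ in range(c)]
random.shuffle(bag_o_letters)


def draw(bag, n=7):
    """Remove and return up to n tiles from the front of an already shuffled bag."""
    rack = bag[:n]
    del bag[:n]
    return rack


rack = draw(bag_o_letters)
print(rack)                # 7 random letters
print(len(bag_o_letters))  # 91
```

Because the shuffle randomizes the order up front, taking a slice is equivalent to drawing at random, and later racks can be dealt from the same bag without repeating tiles.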
