How to do slicing in strings in Python?

I am trying to slice the string "abcdeeefghij" so that, whatever the input, the output is a list in which no element contains a repeated letter.
In this case: [abcde,e,efghij].
Another example: for the input "aaabcdefghiii", the expected output is [a,a,abcdefghi,i,i].
I also want to find the length of the longest element in the list. I tried the logic below:
max_str = max(len(sub_strings[0]),len(sub_strings[1]),len(sub_strings[2]))
print(max_str) #output - 6
This yields 6, but I presume this logic is not generic. Can someone suggest a generic way to print the length of the longest string?

Here is how:
s = "abcdeeefghij"
l = ['']
for c in s: # For each character in s
    if c in l[-1]: # If the character is already in the last string in l
        l.append('') # Start a new string in l
    l[-1] += c # Add the character to the last string, whether new or old
print(l)
Output:
['abcde', 'e', 'efghij']

Use a regular expression:
import re
rx = re.compile(r'(\w)\1+')
strings = ['abcdeeefghij', 'aaabcdefghiii']
lst = [[part for part in rx.split(item) if part] for item in strings]
print(lst)
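Which yields
[['abcd', 'e', 'fghij'], ['a', 'bcdefgh', 'i']]
Note that this collapses each run of repeated letters into a single element, so it differs from the output requested in the question.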

Loop over the characters in the input and start a new string whenever the character already appears in the last string of the output list; then append the character to the last string.
input_ = "aaabcdefghiii"
output = []
for char in input_:
    if not output or char in output[-1]:
        output.append("")
    output[-1] += char
print(output)

To avoid repeating a letter within a list element, you can greedily track which letters are already in the current element. Append the current element to your answer as soon as you detect a repeated letter.
from collections import defaultdict
s = input()
ans = []
d = defaultdict(int)
cur = ""
for i in s:
    if d[i]:
        ans.append(cur)
        cur = i # start again since there is a repetition
        d = defaultdict(int)
        d[i] = 1
    else:
        cur += i # append to cur since no repetition yet
        d[i] = 1
if cur: # handling the last part
    ans.append(cur)
print(ans)
An input of aaabcdefghiii produces ['a', 'a', 'abcdefghi', 'i', 'i'] as expected.
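The question also asks for a generic way to get the length of the longest piece, whatever the number of pieces. A minimal sketch, assuming the pieces are built with the same loop as in the answers above (the helper name split_on_repeat is hypothetical): max with key=len works on a list of any length.

```python
def split_on_repeat(text):
    """Split text so that no piece contains the same character twice."""
    parts = [""]
    for char in text:
        if char in parts[-1]:
            # The character repeats, so start a new piece
            parts.append("")
        parts[-1] += char
    return parts

parts = split_on_repeat("abcdeeefghij")
longest = max(parts, key=len)  # longest piece, however many pieces there are
print(parts)         # ['abcde', 'e', 'efghij']
print(len(longest))  # 6
```

If only the length is needed, max(len(part) for part in parts) gives the same number without keeping the string itself.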

Related

String manipulation with python and storing alphabets and digits in separate lists

For a given string s='ab12dc3e6' I want to put the letter groups and the digit groups into two different lists. That means the output I am trying to achieve is temp1=['ab','dc','e'] and temp2=['12','3','6'].
I am not able to do so with the following code. Can someone provide an efficient way to do it?
S = "ab12dc3e6"
temp=list(S)
x=''
temp1=[]
temp2=[]
for i in range(len(temp)):
    while i<len(temp) and (temp[i] and temp[i+1]).isdigit():
        x+=temp[i]
        i+=1
    temp1.append(x)
    if not temp[i].isdigit():
        break
You can also solve this without any imports:
S = "ab12dc3e6"
def get_adjacent_by_func(content, func):
    """Returns a list of adjacent runs of elements from content that fulfill func(...)"""
    result = [[]]
    for c in content:
        if func(c):
            # add to last inner list
            result[-1].append(c)
        elif result[-1]: # last inner list is filled
            # add new inner list
            result.append([])
    # return only non empty inner lists
    return [''.join(r) for r in result if r]
print(get_adjacent_by_func(S, str.isalpha))
print(get_adjacent_by_func(S, str.isdigit))
Output:
['ab', 'dc', 'e']
['12', '3', '6']
You can use a regex that groups letters and digits, then append each group to its list:
import re
S = "ab12dc3e6"
pattern = re.compile(r"([a-zA-Z]*)(\d*)")
temp1 = []
temp2 = []
for match in pattern.finditer(S):
    # extract words
    # don't append an empty match
    if match.group(1):
        temp1.append(match.group(1))
        print(match.group(1))
    # extract numbers
    # don't append an empty match
    if match.group(2):
        temp2.append(match.group(2))
        print(match.group(2))
print(temp1)
print(temp2)
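If the letter and digit groups only need to be collected, not paired up, a shorter variant is possible with re.findall. This is a sketch, assuming letters are limited to ASCII a-z and A-Z as in the pattern above:

```python
import re

S = "ab12dc3e6"
# Each findall call returns every non-overlapping run matching the pattern
temp1 = re.findall(r"[A-Za-z]+", S)  # runs of letters
temp2 = re.findall(r"\d+", S)        # runs of digits
print(temp1)  # ['ab', 'dc', 'e']
print(temp2)  # ['12', '3', '6']
```

Because + requires at least one character, there are no empty matches to filter out.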
Your code does nothing for isalpha, and you also run into an IndexError on
while i<len(temp) and (temp[i] and temp[i+1]).isdigit():
for i == len(temp)-1.
You can use itertools.takewhile together with the string methods str.isdigit and str.isalpha to split your string:
from itertools import takewhile, cycle
S = "ab12dc3e6"
# switch between the two test methods
c = cycle([str.isalpha, str.isdigit])
r = {}
while S:
    what = next(c) # get next method to use
    k = ''.join(takewhile(what, S))
    S = S[len(k):]
    r.setdefault(what.__name__, []).append(k)
print(r)
Output:
{'isalpha': ['ab', 'dc', 'e'],
'isdigit': ['12', '3', '6']}
This creates a dictionary in which each separate list is stored under the name of the function that produced it.
To get the lists, use r["isalpha"] or r["isdigit"].

Python - change text in string by random from a list

I want to write a loop that goes through each letter of each string in my list called original.
original = ['ABCD', 'DCBA', 'AAAA', 'AABB']
letters = ['A', 'B', 'C', 'D']
p = 1
for o in original: # loop through the original list
    for i in range(0,len(o)): # loop through each letter in selected list
        if random.randint(1,10) == p: # if this probability check is met
            # I want to change the current letter on the current index to
            # something else different from the letter list by random (maybe random.choice)
I'm new to Python, please can you advise.
I don't want to use a class or any library other than random, please.
First, the zero in
for i in range(0, len(o))
is redundant. You want to give random.choice a list of letters that includes everything in letters except the current letter. The fastest way I can think of to do this is with a set:
newletters = list(set(letters).difference(o[i]))
Now you have a list that includes all the letters in "letters" except for the letter at o[i].
To assign the letter (after you get it from random.choice), turn your "original" word into a list:
o_list = list(o)
and assign it as
l = random.choice(newletters)
o_list[i] = l
new_word = "".join(o_list)
As for actually inserting that new word back into your list of originals, you need to know the index of the old word. I would use enumerate for this:
import random
original = ['ABCD', 'DCBA', 'AAAA', 'AABB']
letters = ['A', 'B', 'C', 'D']
p = 1
for index, o in enumerate(original): # loop through the original list
    o_list = list(o) # build the list once per word so every change is kept
    for i in range(len(o)): # loop through each letter in selected list
        if random.randint(1,10) == p:
            newletters = list(set(letters).difference(o[i]))
            l = random.choice(newletters)
            o_list[i] = l
    new_word = "".join(o_list)
    original[index] = new_word
In Python, you cannot modify strings at all. You can get letters by index and select specific slices, but you cannot change them in place. To change the list, you can remove the old string with original.remove(o) and add the edited string with original.append('AB' + random.choice(letters) + 'C'). To be clear: list.append(element) adds an element to the end of the list, list.remove(element) removes the first matching element, and list.pop(index) removes the element at a given index. Again, you can never edit strings in Python; you can only create new ones. For example, new_string = old_string[:4] stores the characters of old_string up to, but not including, index 4 in the new string. Really hope I helped!
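To make the immutability point concrete, here is a small sketch. The helper replace_at is a hypothetical name used only for illustration; it builds a new string from two slices and leaves the original untouched.

```python
word = "ABCD"

try:
    word[1] = "X"  # item assignment is not supported on str
except TypeError as error:
    print("strings are immutable:", error)

def replace_at(text, index, new_char):
    """Return a copy of text with the character at index replaced."""
    return text[:index] + new_char + text[index + 1:]

new_word = replace_at(word, 1, "X")
print(new_word)  # AXCD
print(word)      # ABCD, the original string is unchanged
```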
Assuming you want to update original:
import random
original = ['ABCD', 'DCBA', 'AAAA', 'AABB']
letters = ['A', 'B', 'C', 'D']
p = 1
for i, o in enumerate(original):
    new_letters = [] # updated letters for word o
    for c in o:
        if random.randint(1,10) == p:
            t = letters[:] # copy of letters
            t.remove(c) # remove letter from copy (so letters remains unchanged)
            new_letters.append(random.choice(t)) # choice over remaining letters
        else:
            new_letters.append(c)
    original[i] = ''.join(new_letters) # convert new_letters list to string
                                       # and replace in original
print(original)

Get sequences from a file and store them into a list in python

Here is the code (I took it from the discussion "Translation DNA to Protein", but here I'm using an RNA file instead of a DNA file):
from itertools import takewhile
def translate_rna(sequence, d, stop_codons=('UAA', 'UGA', 'UAG')):
    start = sequence.find('AUG')
    # Take sequence from the first start codon
    trimmed_sequence = sequence[start:]
    # Split it into triplets
    codons = [trimmed_sequence[i:i + 3] for i in range(0, len(trimmed_sequence), 3)]
    # Take all codons until first stop codon
    coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3, codons)
    # Translate and join into string (d is the codon table passed in)
    protein_sequence = ''.join([d[codon] for codon in coding_sequence])
    # This line assumes there is always stop codon in the sequence
    return "{0}".format(protein_sequence)
Calling the translate_rna function:
sequence = ''
for line in open("to_rna", "r"):
    sequence += line.strip()
translate_rna(sequence, d)
My to_rna file looks like:
CCGCCCCUCUGCCCCAGUCACUGAGCCGCCGCCGAGGAUUCAGCAGCCUCCCCCUUGAGCCCCCUCGCUU
CCCGACGUUCCGUUCCCCCCUGCCCGCCUUCUCCCGCCACCGCCGCCGCCGCCUUCCGCAGGCCGUUUCC
ACCGAGGAAAAGGAAUCGUAUCGUAUGUCCGCUAUCCAG.........
The function translates only the first protein (from the first AUG to the first stop codon).
I think the problem is in this line:
# Take all codons until first stop codon
coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3 , codons)
My question is: how can I tell Python, after it finds the first AUG and stores the coding sequence in a list, to search for the next AUG in the RNA file and store that sequence in the next position?
As a result, I want to have a list like this:
['here_is_the_1st_coding_sequence', 'here_is_the_2nd_coding_sequence', ...]
PS: This is homework, so I can't use Biopython.
EDIT:
A simple way to describe the problem:
From this code:
from itertools import takewhile
lst = ['N', 'A', 'B', 'Z', 'C', 'A', 'V', 'V', 'Z', 'X']
ch = ''.join(lst)
stop = 'Z'
start = ch.find('A')
seq = takewhile(lambda x: x not in stop, ch)
I want to get this:
['AB', 'AVV']
EDIT 2:
For instance, from this string:
UUUAUGCGCCGCUAACCCAUGGUUCCCUAGUGGUCCUGACGCAUGUGA
I should get as result:
['AUGCGCCGC', 'AUGGUUCCC', 'AUG']
Looking at your basic code (I couldn't quite follow your main code), it looks like you just want to take the substring starting at the first occurrence of one string and split it on all occurrences of another. If that is wrong, please tell me and I can update accordingly.
To achieve this, Python has the built-in str.split(sub), which splits a string at every occurrence of sub, and str.index(sub), which returns the index of the first occurrence of sub. Example:
>>> ch = 'NABZCAVZX'
>>> ch[ch.index('A'):].split('Z')
['AB', 'CAV', 'X']
You can also specify substrings that are longer than one character:
>>> ch = 'NACBABQZCVEZTZCGE'
>>> ch[ch.index('AB'):].split('ZC')
['ABQ', 'VEZT', 'GE']
Using multiple delimiters:
>>> import re
>>> stop_codons = ['UAA','UGA','UAG']
>>> delim = re.compile('|'.join(stop_codons))
>>> ch = 'CCHAUAABEGTAUAAVEGTUGAVKEGUAABEGEUGABRLVBUAGCGGA'
>>> delim.split(ch)
['CCHA', 'BEGTA', 'VEGT', 'VKEG', 'BEGE', 'BRLVB', 'CGGA']
Note that there is no order preference among the delimiters: if a UGA appears ahead of a UAA, the string is still split on the UGA. I am not sure whether that is what you want.
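The split-based approach above does not restart at each AUG, so here is a minimal sketch of one way to get the list the question asks for. It assumes that codons are read in frame from each start codon, that the search for the next AUG resumes after the stop codon, and that a sequence without a trailing stop codon is still kept. The function name find_coding_sequences is hypothetical.

```python
def find_coding_sequences(rna, start_codon="AUG",
                          stop_codons=("UAA", "UGA", "UAG")):
    """Return every stretch from a start codon up to the next in-frame stop codon."""
    sequences = []
    pos = rna.find(start_codon)
    while pos != -1:
        end = pos
        # Walk codon by codon until a stop codon or an incomplete triplet
        while end + 3 <= len(rna) and rna[end:end + 3] not in stop_codons:
            end += 3
        sequences.append(rna[pos:end])
        # Assumption: resume the search just after the stop codon
        pos = rna.find(start_codon, end + 3)
    return sequences

rna = "UUUAUGCGCCGCUAACCCAUGGUUCCCUAGUGGUCCUGACGCAUGUGA"
print(find_coding_sequences(rna))
# ['AUGCGCCGC', 'AUGGUUCCC', 'AUG']
```

Each returned string can then be translated separately, for example by splitting it into triplets and looking each one up in the codon table.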

Append to List Nested in Dictionary

I am trying to append to lists nested in a dictionary so I can see which letters follow a letter. I have the desired result at the bottom I would like to get. Why is this not matching up?
word = 'google'
word_map = {}
word_length = len(word)
last_letter = word_length - 1
for index, letter in enumerate(word):
    if index < last_letter:
        if letter not in word_map.keys():
            word_map[letter] = list(word[index+1])
        if letter in word_map.keys():
            word_map[letter].append(word[index+1])
    if index == last_letter:
        word_map[letter] = None
print word_map
desired_result = {'g':['o', 'l'], 'o':['o', 'g'], 'l':['e'],'e':None}
print desired_result
Use the standard library to your advantage:
from itertools import izip_longest
from collections import defaultdict
s = 'google'
d = defaultdict(list)
for l1,l2 in izip_longest(s,s[1:],fillvalue=None):
    d[l1].append(l2)
print d
The first trick here is to yield the letters pair-wise (with a None at the end). That's exactly what we do with izip_longest(s,s[1:],fillvalue=None). From there, it's a simple matter of appending the second letter to the dictionary list which corresponds to the first character. The defaultdict allows us to avoid all sorts of tests to check if the key is in the dict or not.
if letter not in word_map.keys():
    word_map[letter] = list(word[index+1])
# now letter IS in word_map, so this also executes:
if letter in word_map.keys():
    word_map[letter].append(word[index+1])
You meant:
if letter not in word_map.keys():
    word_map[letter] = list(word[index+1])
else:
    word_map[letter].append(word[index+1])
Another thing: what if the last letter also occurs in the middle of the word?
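One way to handle that last case is sketched below in Python 3 (the code in this thread is Python 2). The function name followers is hypothetical, and the sketch assumes the last letter should map to None only when it never appears earlier in the word, so that followers already recorded for it are not overwritten.

```python
from collections import defaultdict

def followers(word):
    """Map each letter to the list of letters that directly follow it."""
    result = defaultdict(list)
    # Pair every letter with its successor
    for current, following in zip(word, word[1:]):
        result[current].append(following)
    # Assumption: the last letter gets None only if it has no followers yet
    if word and word[-1] not in result:
        result[word[-1]] = None
    return dict(result)

print(followers("google"))
# {'g': ['o', 'l'], 'o': ['o', 'g'], 'l': ['e'], 'e': None}
```

For a word such as "abca", the final 'a' keeps its existing list ['b'] and is not replaced by None.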

Python Function to return a list of common letters in first and last names

Question: DO NOT USE SETS IN YOUR FUNCTION: Uses lists to return a list of the common letters in the first and last names (the intersection) Prompt user for first and last name and call the function with the first and last names as arguments and print the returned list.
I can't figure out why my program is just printing "No matches" even if there are letter matches. Anything helps! Thanks a bunch!
Code so far:
import string
def getCommonLetters(text1, text2):
    """ Take two strings and return a list of letters common to
    both strings."""
    text1List = text1.split()
    text2List = text2.split()
    for i in range(0, len(text1List)):
        text1List[i] = getCleanText(text1List[i])
    for i in range(0, len(text2List)):
        text2List[i] = getCleanText(text2List[i])
    outList = []
    for letter in text1List:
        if letter in text2List and letter not in outList:
            outList.append(letter)
    return outList
def getCleanText(text):
    """Return letter in lower case stripped of whitespace and
    punctuation characters"""
    text = text.lower()
    badCharacters = string.whitespace + string.punctuation
    for character in badCharacters:
        text = text.replace(character, "")
    return text
userText1 = raw_input("Enter your first name: ")
userText2 = raw_input("Enter your last name: ")
result = getCommonLetters(userText1, userText2)
numMatches = len(result)
if numMatches == 0:
    print "No matches."
else:
    print "Number of matches:", numMatches
    for letter in result:
        print letter
Try this:
def CommonLetters(s1, s2):
    l1=list(''.join(s1.split()))
    l2=list(''.join(s2.split()))
    return [x for x in l1 if x in l2]
print CommonLetters('Tom','Dom de Tommaso')
Output:
['T', 'o', 'm']
for letter in text1List:
Here's your problem. text1List is a list, not a string. You iterate on a list of strings (['Bobby', 'Tables'] for instance) and you check if 'Bobby' is in the list text2List.
You want to iterate on every character of your string text1 and check if it is present in the string text2.
There's a few non-pythonic idioms in your code, but you'll learn that in time.
Follow-up: What happens if I type my first name in lowercase and my last name in uppercase? Will your code find any match?
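The points above can be sketched as follows, in Python 3 (the question uses Python 2). This is an illustration under stated assumptions, not the asker's required solution: it iterates over the characters of the first name, lowercases both names so that case does not matter, skips non-letters, and avoids sets as the assignment demands. The name common_letters is hypothetical.

```python
def common_letters(first, last):
    """Return the letters found in both names, without using sets."""
    first = first.lower()
    last = last.lower()
    common = []
    for letter in first:  # iterate over characters, not over words
        if letter.isalpha() and letter in last and letter not in common:
            common.append(letter)
    return common

print(common_letters("Tom", "DOM DE TOMMASO"))  # ['t', 'o', 'm']
```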
Prior to set() being the common idiom for duplicate removal in Python 2.5, you could use the conversion of a list to a dictionary to remove duplicates.
Here is an example:
def CommonLetters(s1, s2):
    d={}
    for l in s1:
        if l in s2 and l.isalpha():
            d[l]=d.get(l,0)+1
    return d
print CommonLetters('matteo', 'dom de tommaso')
This prints the count of the common letters like so:
{'a': 1, 'e': 1, 'm': 1, 't': 2, 'o': 1}
If you want to have a list of those common letters, just use the keys() method of the dictionary:
print CommonLetters('matteo', 'dom de tommaso').keys()
Which prints just the keys:
['a', 'e', 'm', 't', 'o']
If you want upper and lower case letters to match, add the logic to this line:
if l in s2 and l.isalpha():
