Characters in string replacement. clean(dna) [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
In data analytics, it is very common for data to come to us in a dirty form, with errors related to how it was transcribed or downloaded. Since we know any sequence of dna must consist of the four bases 'a', 'g', 't', 'c', any other letters appearing in dna must be a mistake. Write a function clean(dna) that returns a new DNA string in which every character that is not an A, C, G, or T is replaced with an N. For example, clean('goat') should return the string 'gnat'. You can assume dna is all lowercase, but don't assume anything about the nature of the wrong characters (e.g. they could even have been accidentally transcribed as numbers).
clean('') → ''
clean('agct7ttczttctgactgcaacgggcaatatgtctctxtgtggattaaaaaaagagtgtcygatagcagcttctgaactggttacctgcc') → 'agctnttcnttctgactgcaacgggcaatatgtctctntgtggattaaaaaaagagtgtcngatagcagcttctgaactggttacctgcc'
clean('gtgagtaaattaaaattttnttgacttaggtcactaaptactttaaccaatataggbatagcgcacagacagataaaaattacagagtac') → 'gtgagtaaattaaaattttnttgacttaggtcactaantactttaaccaatataggnatagcgcacagacagataaaaattacagagtac'
Constraints: use a for loop, and no imports.

I hope I am not doing your school work for you.
I see an answer using re.sub has already been posted, but you asked for a solution that uses only for loops:
def clean(text):
    cleaned_text = ""
    # Walk the string one character at a time.
    for i in range(0, len(text)):
        if text[i] in "agtc":
            # Valid base: keep it unchanged.
            cleaned_text = cleaned_text + text[i]
        else:
            # Anything else is replaced with "n".
            cleaned_text = cleaned_text + "n"
    return cleaned_text

print(clean("agct7ttczttctgactgcaacgggcaatatgtctctxtgtggattaaaaaaagagtgtcygatagcagcttctgaactggttacctgcc"))
# prints agctnttcnttctgactgcaacgggcaatatgtctctntgtggattaaaaaaagagtgtcngatagcagcttctgaactggttacctgcc
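
A more compact variant of the same idea, still without imports, might look like the sketch below. It assumes a generator expression counts as "using a for loop", which the assignment may or may not accept; the name valid_bases is only illustrative.

```python
def clean(dna):
    """Return dna with every character other than a, c, g, t replaced by 'n'."""
    valid_bases = set("acgt")
    # Keep valid bases as they are and substitute 'n' for everything else.
    return "".join(base if base in valid_bases else "n" for base in dna)


print(clean("goat"))  # prints gnat
```

Building the result with join avoids creating a new string on every iteration, which matters only for very long sequences.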

Use re.sub:
import re

dna = 'gtgagtaaattaaaattttnttgacttaggtcactaaptactttaaccaatataggbatagcgcacagacagataaaaattacagagtac'
# The question expects lowercase output, so match the lowercase bases and substitute 'n'.
dna = re.sub(r'[^acgt]', 'n', dna)
print(dna)

Related

python regex that accepts words made from characters of a string but never the original string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed yesterday.
I got an interesting assignment not too long ago, which was to make a regex that would take in a word. It would accept any word made from characters in the word, but never a word that had the original word in it.
Example:
Original string: "adam"
Strings it accepts: "ada", "a", "adm", "adaam", "amda"
Strings it doesn't accept: "adam", "aaadam", "amdadam"
I've been at this for quite a while and can't figure it out. Early on I also tried turning it into a DFA (deterministic finite automaton), hoping that seeing all the transitions would make it easier to work out.
Any ideas?
Here I made a regex for it:
https://regex101.com/r/eC2dwx/1
^((?!adam)[adam])+$
Explanation:
^ matches the start of the string and $ the end, so the whole string must match
()+ matches the group one or more times
(?!adam) is a negative lookahead: "adam" must not start at the current position
[adam] matches any single letter of adam
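
To try the pattern outside regex101, here is a minimal Python sketch. The helper build_pattern is hypothetical (it is not part of the answer above); it just builds the same pattern for an arbitrary word so the examples from the question can be checked.

```python
import re


def build_pattern(word):
    """Build a regex for strings over word's letters that never contain word."""
    # Character class with each distinct letter of the word, escaped for safety.
    letters = re.escape("".join(sorted(set(word))))
    # At every position: the word must not start here, then consume one allowed letter.
    return re.compile(rf"^((?!{re.escape(word)})[{letters}])+$")


pattern = build_pattern("adam")
for candidate in ["ada", "a", "adm", "adaam", "amda", "adam", "aaadam", "amdadam"]:
    verdict = "accepted" if pattern.match(candidate) else "rejected"
    print(f"{candidate}: {verdict}")
```

The first five candidates should be reported as accepted and the last three as rejected, matching the lists in the question.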

Finding a combination of characters and digits in a string python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I have a list of lists, and I'm attempting to loop through and check whether the string at a specific index in each of the inner lists contains "XY" with 4 digits immediately following. The "XY" could be in various locations in the string, so I'm struggling with the syntax beyond just using "XY" in row[5]. How do I add the digits after the "XY" to the check? Something that combines the "XY" and isdigit()? Am I stuck using the find function to return an index and then going from there?
You can use Python's regex module re with this pattern that matches XY and then four digits anywhere in the string.
import re

pattern = r'XY\d{4}'
my_list = [['XY0'],['XY1234','AB1234'],['XY1234','ABC123XY5678DEF6789']]
elem_to_check = 1
for row in my_list:
    # Skip rows that are too short to have the element we want to check.
    if len(row) > elem_to_check:
        for found in re.findall(pattern, row[elem_to_check]):
            print(f'{found} found in {row[elem_to_check]}')
Output:
XY5678 found in ABC123XY5678DEF6789
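
If only a yes/no check per row is needed, as the question's "XY" in row[5] suggests, re.search may be enough. The sketch below reuses the sample data from the answer; the helper names has_xy_code and rows_with_xy_code are made up for illustration.

```python
import re

# "XY" immediately followed by four digits, anywhere in the string.
XY_CODE = re.compile(r"XY\d{4}")


def has_xy_code(text):
    """Return True if text contains 'XY' followed directly by four digits."""
    return XY_CODE.search(text) is not None


def rows_with_xy_code(rows, index):
    """Return the rows whose element at index contains an XY code."""
    return [
        row for row in rows
        if len(row) > index and has_xy_code(row[index])
    ]


rows = [["XY0"], ["XY1234", "AB1234"], ["XY1234", "ABC123XY5678DEF6789"]]
print(rows_with_xy_code(rows, 1))
# prints [['XY1234', 'ABC123XY5678DEF6789']]
```

Note that this pattern also matches when more than four digits follow "XY" (it simply finds the first four), so a stricter pattern would be needed if exactly four digits are required.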

Can I slice a sentence while ignoring whitespaces? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Like the title says, can I slice a sentence while ignoring whitespaces in Python?
For example, if the last letter of a word is kept, the second letter of the following word needs to be kept next (I'm using [::2]). I also have to preserve punctuation, so split isn't really an option. Replacing whitespace isn't an option either, because I would have no way to put it back in the correct spot.
Sample input:
Myevmyozrtilets gwaaarkmv yuozub ubpi farfokm ctbhpe pientsfiydqe. zBmuvtk tahgelyu anlpsmo ttzevagrk yioquj awpyaoryts.
Expected output:
Memories warm you up from the inside. But they also tear you apart.
A sample implementation is below.
It takes punctuation into account (it looks like punctuation, like whitespace, is left out of the counting).
I'm sure you'd enjoy trying to implement it on your own, though.
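import string

f="Myevmyozrtilets gwaaarkmv yuozub ubpi farfokm ctbhpe pientsfiydqe. zBmuvtk tahgelyu anlpsmo ttzevagrk yioquj awpyaoryts."

def g(f):
    c=0  # counts letters only, so whitespace and punctuation don't shift the pattern
    for l in f:
        if l not in string.ascii_letters:
            # Whitespace and punctuation are always passed through.
            yield l
        else:
            # Keep every second letter, starting with the first.
            if c%2==0:
                yield l
            c+=1

print(''.join(g(f)))
# prints: Memories warm you up from the inside. But they also tear you apart.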

Looking for Words in a List with Similar Letters [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have a list with types of cheese and I want to be able to search for gouda by just writing "g" and "o" instead of writing the full name.
I've looked for solutions but none are exactly what I am looking for. Maybe this is something common, but I just started with Python a week ago and I don't know many of the terms.
For some reason this got cancelled, so I'm writing this paragraph so the person that answered can answer again.
Here is a link to another StackOverflow post I found: Link.
This explains what I think you are looking for in your problem.
This code will print gouda from the wordlist:
wordlist = ['gouda','miss','lake','que','mess']
letters = set('g')
for word in wordlist:
    # A non-empty intersection means the word shares at least one letter with the search.
    if letters & set(word):
        print(word)
All you have to do is put the letters you want to search for into the letters variable (inside set(...)), and the loop prints every word that shares at least one letter with it.
For example, I added gouda (your example) to this list. With letters set to 'g', the loop prints gouda, because it is the only word in the list that contains the letter 'g'.
The only downside is that the match is loose: if you enter 'ms', you get two results, miss and mess, because both contain 'm' and 's'. In some cases you will have to be more specific if you want only one word returned.
Note: this is not my code; I got it from the post linked above.
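
If the search should return only words that contain every letter typed (so that "g" and "o" together narrow things down), one possible variation is to test for a subset instead of an intersection. This is a sketch, not the linked post's code, and the function name words_with_all_letters is made up for illustration.

```python
def words_with_all_letters(words, letters):
    """Return the words that contain every character in letters."""
    required = set(letters)
    # issubset accepts any iterable, so each word can be passed in directly.
    return [word for word in words if required.issubset(word)]


wordlist = ['gouda', 'miss', 'lake', 'que', 'mess']
print(words_with_all_letters(wordlist, 'go'))  # prints ['gouda']
print(words_with_all_letters(wordlist, 'ms'))  # prints ['miss', 'mess']
```

Letter order is ignored here; if the letters must appear at the start of the word, word.startswith(letters) would be the check to use instead.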

How to create a strand count? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
How do I create a .py file from the information below so that it can be tested? The example is formatted as a dockets.py, but I need the original .py file, not test.py or dockets.py.
RNA Nucleotide Strand Count
Implement a function rna_strand_count(dna, strands) that takes two parameters, dna (a string) and strands (a list of strings), and returns a dictionary whose keys are the individual strings in strands and whose values are the number of occurrences of those strings in the RNA complement transcribed from the original DNA.
>>> rna_strand_count('AAAA', ['AA'])
{'AA': 3}
The function shall meet these conditions:
rna_count should be able to handle DNA strings in either upper or lowercase form and return the dictionary with keys in the form given
rna_count should also be robust to whitespace:
Whitespace can be assumed not to be part of a strand
For the purpose of counting strands, whitespace in the DNA string can be stripped
A complete set of unit tests for this function shall be included, using both the unittest and doctest modules.
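Maybe you want this:
import re

# Note: this takes a single strand string, not a list of strands as in the question.
def rna_strand_count(test_string, sub_string):
    # Strip spaces and normalise case so the match ignores both.
    test_string = test_string.replace(' ','').upper()
    sub_string = sub_string.replace(' ','').upper()
    first = sub_string[0]
    second = sub_string[1:]
    # The lookahead consumes only the first character, so overlapping matches are counted.
    return {sub_string:len(re.findall(r'{0}(?={1})'.format(first,second),test_string))}

print(rna_strand_count('AAAA','AA'))
# prints {'AA': 3}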
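
A version closer to the signature in the question (a list of strands, keys returned as given, a doctest included) could be sketched as follows. It assumes, as the answer above does, that counting happens directly on the cleaned input string; the DNA-to-RNA transcription step is left out because the question's own example ('AAAA' giving {'AA': 3}) does not pin down how it should work.

```python
import doctest


def rna_strand_count(dna, strands):
    """Count overlapping occurrences of each strand in dna.

    >>> rna_strand_count('AAAA', ['AA'])
    {'AA': 3}
    >>> rna_strand_count(' aa aa ', ['aa'])
    {'aa': 3}
    """
    # Normalise: drop all whitespace and ignore case.
    sequence = "".join(dna.split()).upper()
    counts = {}
    for strand in strands:
        target = "".join(strand.split()).upper()
        # Checking for a match at every offset counts overlapping occurrences.
        # An empty target is counted as zero occurrences.
        counts[strand] = sum(
            1 for i in range(len(sequence))
            if target and sequence.startswith(target, i)
        )
    return counts


if __name__ == "__main__":
    doctest.testmod()
```

The dictionary keys are the strands exactly as passed in, while matching is done on the upper-cased, whitespace-free copies. A unittest.TestCase with the same cases would cover the other half of the testing requirement.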
