Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
How to create a .py file using the information below to make it test. The example is a dockets.py formatted, but I need to original file with .py not test.py or dockets.py.
RNA Nucleotide Strand Count
Implement a function rna_strand_count(dna, strands) that takes two parameters, dna (a string) and strands (a list of strings), and returns a dictionary whose keys are the individual strings in strands and values are the number of occurances of those strings in the RNA complement transcribed from the original DNA.
>> rna_strand_count('AAAA', ['AA'])
{'AA': 3}
The function shall meet these conditions:
rna_count should be able to handle DNA strings in either upper or lowercase form and return the dictionary with keys in the form given
rna_count should also be robust to whitespace:
Whitespace can be assumed not to be part of a strand
For the purpose of counting strands, whitespace in the DNA string can be stripped
A complete set of unit tests for this function shall be included, using both the unittest and doctest modules.
May be you want this:
import re
def rna_strand_count(test_string,sub_string):
test_string = test_string.replace(' ','').upper()
sub_string = sub_string.replace(' ','').upper()
first = sub_string[0]
second = sub_string[1:]
return {sub_string:len(re.findall(r'{0}(?={1})'.format(first,second),test_string))}
print rna_strand_count('AAAA','AA')
Related
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 4 months ago.
Improve this question
I have a text containing a URL that needs to be reworked.
text='dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
I need to replace programmatically the id value (in this example 1812, which is unknown before the execution) with a fixed substring (e.g. 189). So the end result must be
'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":189}}'
As I'm programming in Python, I guess that I should use the regular expression (module re) to automatically replace that value between "id": and }} but I couldn't find one that works for this use case.
I assume you are always generating the same URL with that pattern, and the value to 'change' is always in {"id":X}. One way to solve this particular problem is with a positive lookbehind + re.sub replacement.
import re
pattern = re.compile(r"(?<=\"id\":)\d+")
string = "dfs:/?url=https://myserver/c12&ofg={\"tes\":{\"id\":1812}}"
print(pattern.sub("desired_value", string))
Generated output will contain desired_value in place of the 1812. A good explanation of what is happening is done in regex101 but a quick rep of what is happening in the pattern:
Matches any digit one or more times ONLY if behind has "id":, without consuming characters
what about simply splitting the string twice? eg.
my_string = 'dfs:/?url=https://myserver/c12&ofg={"tes":{"id":1812}}'
substring = my_string.split('"id":',1)[1]
substring = substring.split('}}')[0]
print(my_string.replace(substring, "189"))
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Improve this question
In data analytics, it is very common for data to come to us in a dirty form, with errors related to how it was transcribed or downloaded. Since we know any sequence of dna must consist of the four bases 'a', 'g', 't', 'c', any other letters appearing in dna must be a mistake. Write a function clean(dna) that returns a new DNA string in which every character that is not an A, C, G, or T is replaced with an N. For example, clean('goat') should return the string 'gnat'. You can assume dna is all lowercase, but don't assume anything about the nature of the wrong characters (e.g. they could even have been accidentally transcribed as numbers).
clean('') → ''
clean('agct7ttczttctgactgcaacgggcaatatgtctctxtgtggattaaaaaaagagtgtcygatagcagcttctgaactggttacctgcc') → 'agctnttcnttctgactgcaacgggcaatatgtctctntgtggattaaaaaaagagtgtcngatagcagcttctgaactggttacctgcc'
clean('gtgagtaaattaaaattttnttgacttaggtcactaaptactttaaccaatataggbatagcgcacagacagataaaaattacagagtac') → 'gtgagtaaattaaaattttnttgacttaggtcactaantactttaaccaatataggnatagcgcacagacagataaaaattacagagtac'
Using for loop
No import
I hope I am not doing your school work for you.
I see an answer has already been posted using .sub, but you asked for only for loops to be used
def clean(text):
cleaned_text=""
for i in range(0, len(text)):
if text[i] in "agtc":
cleaned_text=cleaned_text+text[i]
else:
cleaned_text=cleaned_text+"n"
return cleaned_text
print(clean("agct7ttczttctgactgcaacgggcaatatgtctctxtgtggattaaaaaaagagtgtcygatagcagcttctgaactggttacctgcc"))
# returns agctnttcnttctgactgcaacgggcaatatgtctctntgtggattaaaaaaagagtgtcngatagcagcttctgaactggttacctgcc
Use re.sub:
Import re
dna = 'gtgagtaaattaaaattttnttgacttaggtcactaaptactttaaccaatataggbatagcgcacagacagataaaaattacagagtac'
dna = re.sub(r'[^ACTG]','N',dna.upper())
print(dna)
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I need to create a python program vanilla (without library), which can compute text document similarities between different documents.
The program takes documents as an input and computes a dictionary (matrix) for words of the given input. Each document consists of a sentence and when a new document goes into the program, we need to compare it to the other documents in order to find similar documents. See example below:
Given text input:
input_text = ["Why I like music", "Beer and music is my favorite combination",
"The sun is shining", "How to dance in GTA5", ]
The sentences have to be transformed into vectors, see example:
Hope you can help.
Here some ideas:
use new_str = str.upper() so beer and Beer will be same (if you
need this)
use list = str.split() to make a list of the words
in your string.
use set = set(list) to get rid of double words
if needed.
start with an empty word_list. Copy the first set in the word_list. In the following steps you can loop over the entries in your set and check if they are part of your word_list.
for word in set:
if word not in word_list:
word_list.append(word)
Now you can make a multi-hot vector from your sentence. (1 if word_list[i] in sentence else 0)
Don't forget to make your multi-hot vectors longer (additional zeros) if you add a word to word_list.
last step: make a matrix from your vectors.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
With the following expected input:
[u'able,991', u'about,11', u'burger,15', u'actor,22']
How can I split each string by the comma and return the second half of the string as an int?
This is what I have so far:
def split_fileA(line):
# split the input line in word and count on the comma
<ENTER_CODE_HERE>
# turn the count to an integer
<ENTER_CODE_HERE>
return (word, count)
One of the first things you'll need in learning how to code, is to get to know the set of functions and types you have natively available to you. Python's built-in functions is a good place to start. Also get the habit of consulting the documentation for the stuff you use; it's a good habit. In this case you'll need split and int. Split does pretty much what it says, it splits a given string into multiple tokens, given a separator. You'll find several examples with a simple search in google. int, on the other hand, parses a string (one of the things it does) into a numeric value.
In your case, this is what it means:
def split_fileA(line):
# split the input line in word and count on the comma
word, count = line.split(',')
# turn the count to an integer
count = int(count)
return (word, count)
You won't get this much here in stackoverflow, has other users are often reluctant to do your homework for you. It seems to me that you are at the very beginning of learning how to code so I hope this helps you get started, but remember that learning is also about trial and error.
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
Improve this question
I am fairly new to python
One of the exercises I have been given is to create the python pseudo code for the following problem:
write an algorithm that given a dictionary (list) of words, finds up to 10 anagrams of a given word.
I'm stuck on ideas on how to solve this.
Currently I have (it's not even proper pseudo)
# Go through the words in the list
# put each letter in some sort of array
# Find words with the letters from this array
I guess this is way too generalistic, I have searched online for specific functions I could use but have not found any.
Any help on specific functions that would help, in making slightly more specified pseudo code?
Here is some help, without writing the code for you
#define a key method
#determine the key using each set of letters, such as the letters of a word in
#alphabetical order
#keyof("word") returns "dorw"
#keyof("dad") returns "add"
#keyof("add") returns "add"
#ingest the word set method
#put the word set into a dictionary which maps
#key->list of up to 10 angrams
#get angrams method
#accept a word as a parameter
#convert the word to its key
#look up the key in the dictionary
#return the up to 10 angrams
#test case: add "dad" and "add" to the word set.
# getting angrams for "dad" should return "dad" and "add"
#test case: add "palm" and "lamp" to the word set.
# getting angrams for "palm" should return "palm" and "lamp"
#consider storing 11 angrams in the list
#a01, a02, a03, a04, a05, a06, a07, a08, a09, a10, a11.
#Then if a01 is provided, you can return a02-a11, which is 10 angrams