I have a function which gets a string like "ABA?". The question mark is a wildcard which can be either A or B, and the passed string can have multiple wildcards. It should give back an array of strings with all possible solutions. My code is far too slow. I am new to Python, so it's a little difficult to find a good solution.
When "ABA?" is passed it should return ['ABAA', 'ABAB'].
from itertools import product

def possibilities(param):
    result = []
    for i in product(['A', 'B'], repeat=param.count('?')):
        string = param
        for p in [i]:
            for val in p:
                string = string.replace('?', str(val), 1)
        result.append(string)
    return result
You should use a generator so you can loop over the output. If you have a lot of wildcards, the complete list would require a lot of memory to store completely (it grows exponentially!)
import itertools

def possibilties(s):
    n_wildcard = s.count('?')
    for subs in itertools.product(['A', 'B'], repeat=n_wildcard):
        subs = iter(subs)
        # take the next substitution for every '?', keep all other characters
        yield ''.join([x if x != '?' else next(subs) for x in s])

for p in possibilties('A?BA?'):
    print(p)
This gives:
AABAA
AABAB
ABBAA
ABBAB
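To illustrate why the generator matters, here is a minimal sketch of the same idea with the alphabet and wildcard as parameters. The name expand_wildcards and its parameters are assumptions made for this example, not part of the original answer. Combined with itertools.islice, only the requested results are ever built:

```python
from itertools import islice, product

def expand_wildcards(pattern, alphabet="AB", wildcard="?"):
    """Lazily yield every string obtained by filling in each wildcard."""
    n_wildcards = pattern.count(wildcard)
    for combo in product(alphabet, repeat=n_wildcards):
        replacements = iter(combo)
        # one replacement is consumed per wildcard, other characters are kept
        yield "".join(next(replacements) if ch == wildcard else ch
                      for ch in pattern)

# 20 wildcards mean 2**20 possible strings, but only three are generated here.
first_three = list(islice(expand_wildcards("?" * 20), 3))
print(first_three)
```

Calling list(expand_wildcards("ABA?")) materialises everything at once, which is fine for short patterns.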
First off I'm using python.
I have a list of items called tier1 it looks like this.
tier1 = ['a1','a2','a3',..,'an']
I have 2 functions called functionA and functionZ.
They both take a string as their argument and produce a list output like this. The lists must be produced during execution time and are not available from the start. Only tier1 is available.
listOutput = functionA(tier1[0])
listOutput looks like this
listOutput = ['b1','b2','b3',..,'bn']
The next time functionA is used, on an item of listOutput, let's say 'b1', it will produce
listOutput = functionA('b1')
output:
listOutput = ['bc1','bc2','bc3',..,'bcn']
This time, when functionA is used on, let's say, 'bc1', it might come up empty, so functionZ is used on 'bc1' instead and the output is stored somewhere.
listOutput = functionA('bc1')
output
listOutput = []
So I use
listOutput = functionZ('bc1')
output
listOutput = ['series1','series2','series3',....,'seriesn']
Now I have to go back and try bc2, until bcn doing the same logic. Once that's done, I will use functionA on 'b2'. and so on.
The depth of each item is variable.
It looks something like this
As long as listOutput is not empty, functionA must be used on the listOutput items or tier1 items until it comes up empty. Then functionZ must be used on whichever item in the list on which functionA comes up empty.
After tier1, listOutput will also always be a list, which must also be cycled through one by one and the same logic must be used.
I am trying to make a recursive function based on this but I'm stuck.
So far I have,
def recursivefunction(idnum):  # idnum will be one of the list items from tier1 or the listOutputs produced
    listOutput = functionA(idnum)
    if not listOutput:
        return functionZ(idnum)
    else:
        return recursivefunction(listOutput)
But my functions return lists. How do I get them to go deeper into each list until functionZ is used and, once it's used, move on to the next item in the list?
Do I need to create a new kind of data structure?
I have no idea where to start, should I be looking to create some kind of class with linked lists?
The way I understand your problem:
there is an input list tier1, which is a list of strings
there are two functions, A and Z
A, when applied to a string, returns a list of strings
Z, when applied to a string, returns some value (type is unclear, assume list of string as well)
the algorithm:
for each element of tier1, apply A to the element
if the result is an empty list, apply Z to the element instead, no further processing
otherwise, if the result is not empty, apply the algorithm on the list
So, in Python:
from random import randint

# since you haven't shared what A and Z do,
# I'm just having them do random stuff that matches your description
def function_a(s):
    # giving it a 75% chance to be empty
    if randint(1, 4) != 1:
        return []
    else:
        # otherwise between 1 and 4 random strings from some selection
        return [['a', 'b', 'c'][randint(0, 2)] for _ in range(randint(1, 4))]
        # in the real case, I'm sure the result depends on `s` but it doesn't matter

def function_z(s):
    # between 0 and 4 random strings from some selection
    return [['x', 'y', 'z'][randint(0, 2)] for _ in range(randint(0, 4))]

def solution(xs):
    # this is really the answer to your question:
    rs = []
    for x in xs:
        # first compute A of x
        r = function_a(x)
        # if that's the empty list
        if not r:
            # then we want Z of x instead
            r = function_z(x)
        else:
            # otherwise, it's the same algorithm applied to all of r
            r = solution(r)
        # whatever the result, append it to rs
        rs.append(r)
    return rs

tier1 = ['a1', 'a2', 'a3', 'a4']
print(solution(tier1))
Note that function_a and function_z are just functions generating random results with the types of results you specified. You didn't share what the logic of A and Z really is, so it's hard to verify if the results are what you want.
However, the function solution does exactly what you say it should, if I understand your somewhat complicated explanation of it correctly.
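Because the random stand-ins make the output hard to check, here is a small sketch with deterministic stand-ins. The CHILDREN table and the "-series" suffixes are invented purely for illustration; they are assumptions, not the real functionA and functionZ:

```python
# Hypothetical fixed tree standing in for the real functionA.
CHILDREN = {"a1": ["b1", "b2"], "b1": ["bc1", "bc2"]}

def function_a(s):
    # items without an entry have no children, so the result is empty
    return CHILDREN.get(s, [])

def function_z(s):
    # hypothetical leaf result
    return [s + "-series1", s + "-series2"]

def solution(xs):
    rs = []
    for x in xs:
        r = function_a(x)
        # recurse into non-empty results, otherwise fall back to function_z
        rs.append(solution(r) if r else function_z(x))
    return rs

result = solution(["a1", "a2"])
print(result)
```

The nesting of the result mirrors the depth at which function_a first came up empty for each item.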
Given that the solution to your question is basically this:
def solution(xs):
    rs = []
    for x in xs:
        r = function_a(x)
        if not r:
            r = function_z(x)
        else:
            r = solution(r)
        rs.append(r)
    return rs
Which can even be rewritten to:
def solution_brief(xs):
    # pair every x with function_a(x) so function_z can still be applied to x
    return [solution_brief(r) if r else function_z(x)
            for x, r in ((x, function_a(x)) for x in xs)]
You should reexamine your problem description. The key with programming is understanding the problem and breaking it down to its essential steps. Once you've done that, code is quick to follow. Whether you prefer the first or second solution probably depends on experience and possibly on tiny performance differences.
By the way, any solution written as a recursive function can also be written iteratively. That's often preferable from a memory and performance perspective, but recursive functions can have the advantage of being very clean and simple and therefore easier to maintain.
Putting my coding where my mouth is, here's an iterative solution of the same problem, just for fun (not optimal by any means):
def solution_iterative(xs):
    if not xs:
        return xs
    rs = xs.copy()
    stack_rs = [rs]
    stack_is = [0]
    while stack_rs:
        r = function_a(stack_rs[-1][stack_is[-1]])
        if not r:
            stack_rs[-1][stack_is[-1]] = function_z(stack_rs[-1][stack_is[-1]])
            stack_is[-1] += 1
        else:
            stack_rs[-1][stack_is[-1]] = r
            stack_rs.append(r)
            stack_is.append(0)
        while stack_is and stack_is[-1] >= len(stack_rs[-1]):
            stack_is.pop()
            stack_rs.pop()
            if stack_is:
                stack_is[-1] += 1
    return rs
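If the nesting is not needed and only the functionZ result of each leaf matters, a single explicit stack is enough. This is a sketch under that assumption, which the question does not confirm; the name iter_leaf_results and the demo tree are hypothetical:

```python
def iter_leaf_results(tier1, function_a, function_z):
    """Yield (item, function_z(item)) for every item where function_a is empty."""
    # reversed so that items are popped in their original order
    stack = list(reversed(tier1))
    while stack:
        item = stack.pop()
        children = function_a(item)
        if children:
            # visit the children before the remaining siblings (depth first)
            stack.extend(reversed(children))
        else:
            yield item, function_z(item)

# Hypothetical stand-ins for demonstration only.
demo_tree = {"a1": ["b1", "b2"], "b1": ["bc1", "bc2"]}
leaves = list(iter_leaf_results(
    ["a1", "a2"],
    lambda s: demo_tree.get(s, []),
    lambda s: [s + "-series1"],
))
print(leaves)
```

Being a generator, it also lets the caller stop early without visiting the rest of the tree.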
This code is part of a challenge that requires the code to give back the permutations of a string with no duplicates. The code executes, but for some of the challenges it doesn't pass because of the time limit, and I don't know a way to make it execute faster.
from itertools import permutations as perm

def permutations(string):
    permList = list(perm(string))
    joinedList = [''.join(tups) for tups in permList]
    ans = []
    [ans.append(x) for x in joinedList if x not in ans]
    return ans
Again, the code runs for certain examples, but for examples with large strings and a lot of matches it takes too long and fails the challenge.
If you want to prevent duplicates, use a set, not lists. Your code takes forever because you're constantly scanning the list while you are inserting new data. With a set you get constant-time lookup and insertion instead.
And you can save on storage costs by using a generator expression rather than a list comprehension:
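from itertools import permutations as perm

def permutations(string):
    # generator expression: joined strings are produced one at a time
    permList = (''.join(p) for p in perm(string))
    result = set()
    for p in permList:
        result.add(p)
    return list(result)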
Ideally, you should keep the output in a generator for as long as possible; evaluating it takes time and space.
Here we maintain a set of seen elements in order to avoid yielding them again thus keeping each unique.
import itertools

def unique_permutations(seq):
    seen = set()
    for p in itertools.permutations(seq):
        if p not in seen:
            seen.add(p)
            yield p

for p in unique_permutations('aaab'):
    print(p)
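Both versions above still walk through every one of the n! raw permutations and discard the repeats. A different technique, sketched here as an alternative rather than as what either answer proposes, is to backtrack over character counts so that duplicates are never generated in the first place. The name distinct_permutations is an assumption for this example:

```python
from collections import Counter

def distinct_permutations(string):
    """Yield each distinct permutation of `string` exactly once, in sorted order."""
    counts = Counter(string)
    length = len(string)
    prefix = []

    def backtrack():
        if len(prefix) == length:
            yield "".join(prefix)
            return
        # try each distinct character that still has copies left
        for ch in sorted(counts):
            if counts[ch] > 0:
                counts[ch] -= 1
                prefix.append(ch)
                yield from backtrack()
                prefix.pop()
                counts[ch] += 1

    return backtrack()

print(list(distinct_permutations("aaab")))
```

For 'aaab' this visits 4 results instead of filtering 24 raw permutations, which is where the saving comes from on inputs with many repeated characters.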
I have a list called "self.__sequences" with some DNA sequences, and the following is part of that list:
['AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG\n', 'TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA\n', 'CTGAGTAAATCATATACTCAATGATTTTTTTATGTGTGTGCATGTGTGCTGTTGATATTCTTCAGTACCAAAACCCATCATCTTATTTGCATAGGGAAGT\n', 'CTGCCAGCACGCTGTCACCTCTCAATAACAGTGAGTGTAATGGCCATACTCTTGATTTGGTTTTTGCCTTATGAATCAGTGGCTAAAAATATTATTTAAT\n', 'ACTTATATTATGTTGACACTCAAAAATTTCAGAATTTGGAGTATTTTGAATTTCAGATTTTCTGATTAGGGATGTACCTGTACTTTTTTTTTTTTTTTTT\n', 'TTTGTTCTTTTTGTAATGGGGCCAGATGTCACTCATTCCACATGTAGTATCCAGATTGAAATGAAATGAGGTAGAACTGACCCAGGCTGGACAAGGAAGG\n', 'AAGAGGTAAAGGAAACAGACTGATGGCTGGAGAATTTGACAACGTATAAGAGAATCTGAGAATTCTTTTGAAAAATACTCAAATTTCCAGCCAAGATAGA\n', 'ACACTTGAGCATTAAGAGGAAACACCAAGGAAACAGATTTTAGGTCAAGAAAAAGAAGAGCTCTCTCATGTCAGAGCAGCCTAGAGCAGGAAAGTGCTGT\n', 'ACATCTATGCCCACCACACCTNGGTATGCANTGATGCTCATGAGATGGGAGGTGGCTACAGATTGCTCCATATAGAAATGTTACCTAGCATGTTAAAGAT\n']
I want to compute the GC content of each DNA sequence and return a dictionary mapping each DNA sequence to its GC content. For example, something like this:
{(AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG:0.5), (TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA:0.33)}
gc content= (Count(G) + Count(C)) / (Count(A) + Count(T) + Count(G) + Count(C))
I wrote the following code, but it gives me nothing!
def get_gc_content(self):
    for i in range (len(self.__sequence)):
        if seq[i] in self.__sequence:
            return (seq.count('G')+seq.count('C'))/float(seq.count('G')+seq.count('C')+seq.count('T')+seq.count('A'))
Can anyone help me to improve my code?
Assuming you analyze DNA (not RNA, etc) and strip() newlines and spaces from your sequences, seq.count('A') + seq.count('G') + seq.count('C') + seq.count('T') would always equal len(seq).
Note that seq.some_method_name operates on the whole sequence. You don't need the for loop that iterates over sequence elements at all.
The i in self.__sequence is always False (you pick an integer and see if it belongs to the sequence of four possible letters), so it does nothing.
The first return inside the loop will break the loop.
Here's a piece of code that seems to work:
def getContentOf(target_list, seq):
    # add a 1 for each nucleotide in target_list
    target_count = sum(1 for x in seq if x in target_list)
    return float(target_count) / len(seq)
Answers look sensible:
>>> getContentOf(['G', 'C'], 'AGCT')
0.5
>>> getContentOf(['G', 'C'], 'AGCTATAT')
0.25
>>> _
So what you need is something like {seq: getContentOf(['G', 'C'], seq)}
BTW the sequences you gave in your post seem to have different G+C content than your examples state.
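Putting the pieces together, a sketch that builds the requested dictionary could look like this. The helper names gc_content and gc_content_by_sequence are assumptions for illustration. The denominator follows the formula from the question (only A, T, G and C are counted), so ambiguous bases such as N are ignored rather than counted via len(seq):

```python
def gc_content(seq):
    """GC fraction of one sequence, counting only A, T, G and C."""
    seq = seq.strip().upper()  # drop the trailing newline seen in the list
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # guard against empty or all-ambiguous sequences
    return gc / total if total else 0.0

def gc_content_by_sequence(sequences):
    """Map each stripped sequence to its GC fraction."""
    return {seq.strip(): gc_content(seq) for seq in sequences}

print(gc_content_by_sequence(["AGCT\n", "AGCTATAT\n"]))
```

Inside the class from the question, the method body would presumably reduce to a call such as gc_content_by_sequence(self.__sequences), assuming that attribute holds the list shown.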
What about this:
self.myDict = {}

def create_dna_dict(self):
    for i in seq:
        if i in self.__sequence:
            self.myDict[i] = (seq.count('G') + seq.count('C')) / float(seq.count('G') + seq.count('C') + seq.count('T') + seq.count('A'))
A few things though:
Are you sure seq shouldn't be self.seq?
__sequence is a very odd variable name. It seems unconventional.
In the example of the dict you want, the dict syntax is incorrect.
I am quite sure that your dict, with its tuples and unquoted keys:
{(AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG:0.5), (TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA:0.33)}
should look like this, with the parentheses removed and the keys written as strings:
{"AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG":0.5, "TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA":0.33}
I have an arbitrary list of arbitrary (but uniform) lists of numbers. (They are the boundary coordinates of bins in an n-space whose corners I want to plot, but that's not important.) I want to generate a list of all the possible combinations. So: [[1,2], [3,4],[5,6]] produces [[1,3,5],[1,3,6],[1,4,5],[1,4,6],[2,3,5]...].
Can anyone help me improve this code? I don't like the isinstance() call, but I can't figure out a more python-ish way to append the elements on the first pass, when the first arg (pos) is a list of numbers as opposed to a list of lists.
def recurse(pos, vals):
    out = []
    for p in pos:
        pl = p if isinstance(p,list) else [p]
        for x in vals[0]:
            out.append(pl + [x])
    if vals[1:]:
        return recurse(out, vals[1:])
    else:
        return out

a = [[1,2,3],[4,5,6],[7,8,9],[11,12,13]]
b = recurse(a[0], a[1:])
Thank you.
From your example it seems all you want is
from itertools import product
a = [[1,2,3],[4,5,6],[7,8,9],[11,12,13]]
print(list(product(*a)))
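Note that product yields tuples. If lists are required, as in the example from the question, each tuple can be converted. A short sketch using the question's own input (the variable names bins and corners are illustrative):

```python
from itertools import product

bins = [[1, 2], [3, 4], [5, 6]]
# product yields tuples; convert each one to a list to match the requested output
corners = [list(combo) for combo in product(*bins)]
print(corners[:5])
```

The number of results is the product of the list lengths, here 2 * 2 * 2 = 8.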
Try with the itertools.product
import itertools
a = [[1,2,3],[4,5,6],[7,8,9],[11,12,13]]
iterator = itertools.product(*a)
result = list(iterator)  # consume the whole iterator, not just its first tuple
To be more Pythonic, you don't want to do type checking. Python is all about duck typing. What happens if you pass a tuple to the function (which should be more efficient)?
You could try
if type(p) != list:
    try:
        p = list(p)
    except TypeError:
        p = [p]
pl = p
When there is a library/module that does what you want, you should opt to use it (+1 to all those who mentioned itertools.product). However, if you're interested in an algorithm to accomplish this, a simple recursive enumeration will do: each call fixes one coordinate from the first list and recurses on the remaining lists.
answer = []

def recurse(points, curr=[]):
    if not points:
        # every list has contributed one coordinate: store the combination
        answer.append(curr)
        return
    else:
        for coord in points[0]:
            recurse(points[1:], curr + [coord])
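The same recursion can also be written as a generator, which avoids the module-level answer list and lets the caller consume combinations lazily. This is a sketch of one possible variant; the name cartesian and the prefix parameter are assumptions for the example:

```python
from itertools import product

def cartesian(lists, prefix=()):
    """Recursively yield every combination with one element from each list."""
    if not lists:
        # nothing left to choose from: the prefix is a complete combination
        yield list(prefix)
        return
    for value in lists[0]:
        yield from cartesian(lists[1:], prefix + (value,))

a = [[1, 2], [3, 4], [5, 6]]
combos = list(cartesian(a))
# sanity check against the library routine
matches_itertools = combos == [list(t) for t in product(*a)]
print(combos, matches_itertools)
```

Using an immutable tuple as the default for prefix sidesteps the usual pitfall of mutable default arguments.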
(This is out of interest in professional best practice and patterns, not a homework request.)
INPUT: any unordered sequence or generator of items; a function myfilter(item) that returns True if the filter condition is fulfilled.
OUTPUT: a (filter_true, filter_false) tuple of sequences of the original type, which contain the elements partitioned according to the filter, in original sequence order.
How would you express this without doing double filtering, or should I use double filtering? Maybe filter and a loop/generator/list comprehension with next could be the answer?
Should I drop the requirement of keeping the type, or just change the requirement to give a tuple of tuples/generators as the result? I cannot easily return a generator for generator input, or can I? (The requirements are self-made.)
Here is a test of the best candidate at the moment, offering two streams instead of a tuple:
import itertools as it
from sympy.ntheory import isprime as myfilter

mylist = range(1000001, 1010000, 2)
left, right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p, x in left if p)
filter_false = (x for p, x in right if not p)

print('Hundred primes and non-primes odd numbers')
print('\n'.join(" Prime %i, not prime %i" %
                (next(filter_true), next(filter_false))
                for i in range(100)))
Here is a way to do it which calls myfilter only once for each item and will also work if mylist is a generator:
import itertools as it
left,right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p,x in left if p)
filter_false = (x for p,x in right if not p)
Let's suppose that your problem is not memory but CPU: myfilter is heavy and you don't want to iterate over and filter the original dataset twice. Here are some single-pass ideas:
The simple and versatile version (memory-hungry):
filter_true = []
filter_false = []
for item in items:
    if myfilter(item):
        filter_true.append(item)
    else:
        filter_false.append(item)
The memory friendly version : (doesn't work with generators (unless used with list(items)))
while items:
    # pop() removes from the end, so items are visited in reverse order
    item = items.pop()
    if myfilter(item):
        filter_true.append(item)
    else:
        filter_false.append(item)
The generator friendly version :
while True:
    try:
        item = next(items)
        if myfilter(item):
            filter_true.append(item)
        else:
            filter_false.append(item)
    except StopIteration:
        break
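The loops above differ mainly in how they consume items. Since a plain for loop accepts lists and generators alike, they can be folded into one helper. A minimal sketch, with partition_to_lists as a hypothetical name:

```python
def partition_to_lists(predicate, items):
    """Split any iterable into (matching, non_matching) lists in one pass."""
    filter_true, filter_false = [], []
    for item in items:  # a for loop consumes lists and generators alike
        if predicate(item):
            filter_true.append(item)
        else:
            filter_false.append(item)
    return filter_true, filter_false

# works with a generator as input; the predicate runs once per item
evens, odds = partition_to_lists(lambda n: n % 2 == 0, (n for n in range(7)))
print(evens, odds)
```

The trade-off is that both result lists are fully built in memory, so this suits inputs that fit comfortably in RAM.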
The easy way (but less efficient) is to tee the iterable and filter both of them:
import itertools
left, right = itertools.tee( mylist )
filter_true = (x for x in left if myfilter(x))
filter_false = (x for x in right if not myfilter(x))
This is less efficient than the optimal solution, because myfilter will be called repeatedly for each element. That is, if you have tested an element in left, you shouldn't have to re-test it in right because you already know the answer. If you require this optimisation, it shouldn't be hard to implement: have a look at the implementation of tee for clues. You'll need a deque for each returned iterable which you stock with the elements of the original sequence that should go in it but haven't been asked for yet.
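The deque-based optimisation described above could be sketched as follows. This is one possible reading of that description, not the implementation of tee itself, and the name partition is an assumption. Each returned generator first drains its own queue, and otherwise pulls from the shared source, parking items that belong to the other side:

```python
from collections import deque

def partition(predicate, iterable):
    """Return (matching, non_matching) lazy iterators; predicate runs once per item."""
    source = iter(iterable)
    true_queue, false_queue = deque(), deque()

    def generate(own_queue, other_queue, wanted):
        while True:
            if own_queue:
                # items the other generator already tested and parked for us
                yield own_queue.popleft()
                continue
            for item in source:
                if bool(predicate(item)) == wanted:
                    yield item
                    break
                # belongs to the other side: park it there
                other_queue.append(item)
            else:
                return  # source exhausted and nothing is queued for us

    return (generate(true_queue, false_queue, True),
            generate(false_queue, true_queue, False))

calls = []
def is_even(n):
    calls.append(n)  # record calls to show each item is tested only once
    return n % 2 == 0

evens, odds = partition(is_even, iter(range(10)))
first_even, first_odd = next(evens), next(odds)
rest_odds, rest_evens = list(odds), list(evens)
```

As with tee, the queues grow when one side is consumed far ahead of the other, and the two iterators are not safe to advance from different threads.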
I think your best bet will be constructing two separate generators:
filter_true = (x for x in mylist if myfilter(x))
filter_false = (x for x in mylist if not myfilter(x))