Automatic dictionary generation - Python

I am trying to create a program which outputs all permutations of a string of length n whilst avoiding a defined substring, of length k. For example:
Derive all possible strings, up to a length of 5 characters, that can be generated from an initial empty set, where each step can go to either A or B, but the string must not contain the substring "AAB".
i.e. base case of [""] is the empty set.
The dictionary would be - A:{A}, B:{A,B}
From the empty set we can go to A, and we can go to B. We cannot go to a B after an A, but we can go to an A after a B. Both A and B can also go to themselves.
example output: a,b,aa,bb,ba,aaa,bbb,baa,bba ... etc
How would I go about prompting a user to define a substring to avoid, and from that generate a dictionary which abides by these rules?
Any help or clarification would be gratefully received.
Regards,
rkhad

The itertools module has a useful function called permutations():
(from http://docs.python.org/library/itertools.html#itertools.permutations)
itertools.permutations(iterable[, r])
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length of the
iterable and all possible full-length permutations are generated.
Permutations are emitted in lexicographic sort order. So, if the input
iterable is sorted, the permutation tuples will be produced in sorted
order.
List comprehensions provide an easy way to filter generated permutations like this, but beware that storing the permutations of a large string quickly produces a very large list. You may therefore want to use a set to remove duplicates. The sorted function is also useful if you intend to iterate through your "paths" in lexicographic order. Lastly, the in operator, when applied to strings, checks for a substring (x in y checks whether x is a substring of y).
>>> from itertools import permutations
>>> perms = [''.join(p) for p in permutations('AAAABBBB', 4)]
>>> len(perms)
1680
>>> len(set(perms))
16
>>> filtered = [p for p in sorted(set(perms)) if 'AB' not in p]
>>> filtered
['AAAA', 'BAAA', 'BBAA', 'BBBA', 'BBBB']
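As an alternative to permuting a padded string and de-duplicating, here is a minimal sketch that uses itertools.product with repeat, which yields each candidate string exactly once. The function name strings_without and its parameters are hypothetical, chosen only for illustration.

```python
from itertools import product

def strings_without(forbidden, alphabet, max_len):
    """Brute force: build every string of length 1..max_len and keep the clean ones."""
    found = []
    for length in range(1, max_len + 1):
        # product(alphabet, repeat=length) yields every tuple of that length once
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if forbidden not in candidate:
                found.append(candidate)
    return found

short_strings = strings_without("AB", "AB", 2)
```

This still examines every possible string, so it only suits small alphabets and short lengths.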
I'm working on my dissertation right now too, in the area of Formal Languages. The concept of substring membership can be represented by a very simple regular grammar which corresponds to a deterministic finite automaton. To jog your memory:
http://en.wikipedia.org/wiki/Regular_grammar
http://en.wikipedia.org/wiki/Finite-state_machine
When you look into these you will find that you need to keep track of the current "state" of your computation if you want it to have different "dictionaries" at different phases. I encourage you to read the Wikipedia articles and ask me some follow-up questions, as I'd be happy to help you work through this.
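One way to make the "state" idea concrete is sketched below. It assumes a non-empty forbidden substring and treats the state as the number of leading characters of that substring matched so far (the construction used in Knuth-Morris-Pratt style matching). The names build_transitions and strings_avoiding are hypothetical. Each row of the table plays the role of the per-state "dictionary" from the question.

```python
def build_transitions(forbidden, alphabet):
    """Return table[state][ch] -> next state, where state counts matched characters."""
    if not forbidden:
        raise ValueError("forbidden substring must be non-empty")
    k = len(forbidden)
    table = []
    for state in range(k):
        row = {}
        for ch in alphabet:
            candidate = forbidden[:state] + ch
            # Longest suffix of candidate that is also a prefix of forbidden
            n = min(len(candidate), k)
            while n > 0 and not candidate.endswith(forbidden[:n]):
                n -= 1
            row[ch] = n
        table.append(row)
    return table

def strings_avoiding(forbidden, alphabet, max_len):
    """List every string of length 1..max_len that never contains forbidden."""
    table = build_transitions(forbidden, alphabet)
    dead = len(forbidden)  # reaching this state means the substring appeared
    results = []
    frontier = [("", 0)]
    for _ in range(max_len):
        next_frontier = []
        for text, state in frontier:
            for ch in alphabet:
                nxt = table[state][ch]
                if nxt != dead:  # never extend into the forbidden state
                    next_frontier.append((text + ch, nxt))
        results.extend(text for text, _ in next_frontier)
        frontier = next_frontier
    return results

table_ab = build_transitions("AB", "AB")
```

For the forbidden substring "AB", the table says that state 1 (just saw an A) may only continue with A, while state 0 may continue with either letter, which matches the A:{A}, B:{A,B} dictionary in the question. The substring itself could be read from the user with input() before calling strings_avoiding.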

Related

Selecting multiple random sequences of N elements from M different-length lists in Python

I have X lists of elements, each list containing a different number of elements (without repetitions inside a single list). I want to generate (if possible, 500) sequences of 3 elements, where each element belongs to a different list, and sequences do not repeat. So something like:
X (in this case 4) lists of elements: [A1,A2], [B1,B2,B3,B4], [C1], [D1,D2,D3,D4,D5]
possible results: [A1,B2,D2], [B3,C1,D2], [A1,B2,C1]... (here 500 sequences are impossible, so would be less)
I think I know how to do it with a nasty loop: join all the lists, random.sample(len(l),3) from the joint list, if 2 indices belong to the same list repeat, if not, check if the sequence was not found before. But that would be very slow. I am looking for a more pythonic or more mathematically clever way.
Perhaps a better way would be to use random.sample([A,B,C,D], 3, p=[len(A), len(B), len(C), len(D)]), then for each sequence from it randomly select an element from each group in the sequence, then check if a new sequence generated in this way hasn't been generated before. But again, a lot of looping.
Any better ideas?
Check the itertools module (combinations and permutations in particular).
You can use random.choice() on the permutations of 3 lists taken from the X lists (thus selecting 3 lists), and then call random.choice() on each of those lists to pick one element from it (both from the random module).
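A minimal sketch of a related approach, assuming the total number of valid sequences is small enough to enumerate: build every sequence once with itertools.combinations and itertools.product, then draw from that pool without repetition using random.sample. The function name sample_sequences and its parameters are hypothetical, and the sketch assumes elements are distinct across lists and that each sequence keeps the order of the lists.

```python
import random
from itertools import combinations, product

def sample_sequences(lists, seq_len=3, wanted=500):
    """Pick up to `wanted` distinct sequences, one element from each of seq_len different lists."""
    pool = [
        combo
        for chosen in combinations(lists, seq_len)  # which lists take part
        for combo in product(*chosen)               # one element from each chosen list
    ]
    # random.sample never repeats a pool entry, so the sequences are unique
    return random.sample(pool, min(wanted, len(pool)))

lists = [
    ["A1", "A2"],
    ["B1", "B2", "B3", "B4"],
    ["C1"],
    ["D1", "D2", "D3", "D4", "D5"],
]
sequences = sample_sequences(lists)
```

With the four example lists only 78 sequences exist, so fewer than 500 come back. For very large inputs the pool would not fit in memory, and a draw-and-reject loop backed by a set of seen sequences would be the fallback.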

Manual product with unknown number of arguments

The following examples give the same result:
A.
product = []
for a in "abcd":
    for b in "xy":
        product.append((a,b))
B.
from itertools import product
list(product("abcd","xy"))
How can I calculate the cartesian product like in example A when I don't know the number of arguments n?
REASON I'm asking this:
Consider this piece of code:
allocations = list(product(*strategies.values()))
for alloc in allocations:
    PWC[alloc] = [a for (a,b) in zip(help,alloc) if coalitions[a] >= sum(b)]
The values of the strategies dictionary are lists of tuples, help is an auxiliary variable (a list with the same length as every alloc) and coalitions is another dictionary that assigns a numeric value to the tuples in help.
Since strategies values are sorted, I know that the if statement won't be true anymore after a certain alloc. Since allocations is a pretty big list, I would avoid tons of comparisons and tons of sums if I could use the example algorithm A.
You can do:
items = ["abcd","xy"]
from itertools import product
list(product(*items))
The list items can contain an arbitrary number of strings, and the call to product will provide you with the Cartesian product of those strings.
Note that you don't have to turn it into a list - you can iterate over it and stop when you no longer wish to continue:
for item in product(*items):
    print(item)
    if condition:
        break
If you just want to abort the allocations after you hit a certain condition, and you want to avoid generating all the elements from the cartesian product for those, then simply don’t make a list of all combinations in the first place.
itertools.product is lazy, which means that it generates only one tuple of the cartesian product at a time. So you never need to generate all elements, and you never need to compare the ones you skip. Just don’t call list() on the result, as that would iterate over the whole sequence and store all possible combinations in memory:
allocations = product(*strategies.values())
for alloc in allocations:
    PWC[alloc] = [a for (a,b) in zip(help,alloc) if coalitions[a] >= sum(b)]
    # check whether you can stop looking at more values from the cartesian product
    if someCondition(alloc):
        break
It’s just important to note how itertools.product generates the values, what pattern it follows. It’s basically equivalent to the following:
for a in firstIterable:
    for b in secondIterable:
        for c in thirdIterable:
            …
                for n in nthIterable:
                    yield (a, b, c, …, n)
So the rightmost iterable changes fastest and the leftmost changes slowest, like an odometer. Make sure that you order the iterables in a way that lets you specify a correct break condition.
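If a single break is not enough because whole branches should be skipped, the nested loops of example A can be written recursively for any number of iterables. The following is only a sketch: pruned_product and keep_prefix are hypothetical names, and keep_prefix stands in for whatever test decides that a partial tuple is still worth extending.

```python
def pruned_product(iterables, keep_prefix):
    """Yield tuples in the same order as itertools.product, skipping rejected branches."""
    pools = [tuple(it) for it in iterables]

    def extend(prefix):
        if len(prefix) == len(pools):
            yield prefix  # one item taken from every pool
            return
        for item in pools[len(prefix)]:
            candidate = prefix + (item,)
            # If the partial tuple is rejected, none of its extensions are generated
            if keep_prefix(candidate):
                yield from extend(candidate)

    yield from extend(())

everything = list(pruned_product(["ab", "xy"], lambda prefix: True))
without_b = list(pruned_product(["abcd", "xy"], lambda prefix: prefix[0] != "b"))
```

With a predicate that always returns True the output matches list(product(*iterables)). With a stricter predicate the inner loops for rejected prefixes never run, which is where the saved comparisons and sums would come from.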

Determine whether string contained within another string in python

I am looking to determine whether a string is fully contained at the start of any string in a list of other strings. For example, if I had the string cde, and the list of strings:
['ab', 'bce', 'cdef']
then it would be determined that cde is contained at the start of cdef.
I'm also looking to go the other way around, i.e. if I had the term abc, to identify that ab from the above list is contained at the start of it.
Now obviously this is trivial to set up with a for loop, checking each instance with the function startswith, however this does not scale to a very large list of possibilities to check.
While checking each instance is O(n) [and hence very slow if you have 100,000 possibilities], I am looking for a way of checking in O(1) ... it feels like if the "list" was pre-sorted, then I could simply extract the nearest match, but I am not sure how.
Clarification:
I am solely looking for cases where there is a perfect match at the start of the string (i.e. the whole of the search term is included).
I will be looking up multiple search terms (thus while initially sorting the data may not be quick, the sunk cost would save on subsequent look-throughs).
Ideally it would return every possible match (i.e. if cdef and cdefg were in the list, and I looked up cde, then both would be returned).
I use the term "list" loosely, as in a collection of terms.
It's not possible in O(1), since by definition you have to go over the entire array. If the array is sorted then you can do a binary search for your string, and then check if the element at that position starts with your string. That operation is O(log n).
import bisect
# return the index of the string starting with the prefix
# or None if no such string is in the list
def search(a, prefix):
    i = bisect.bisect_left(a, prefix)
    isAtStart = (i < len(a) and a[i].startswith(prefix))
    return i if isAtStart else None
search(['ab', 'bce', 'cdef'], 'bc')
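The question also asks for every match and for the reverse lookup. A sketch of both, under the assumption that the terms are kept in a sorted list for the first case and in a set for the second (all_with_prefix and terms_that_prefix are hypothetical names):

```python
import bisect

def all_with_prefix(sorted_terms, prefix):
    """Return every term in the sorted list that starts with prefix."""
    # In a sorted list, all terms sharing a prefix sit next to each other,
    # beginning at the position bisect_left finds for the prefix itself.
    i = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches

def terms_that_prefix(term_set, text):
    """Return every term in the set that text starts with."""
    # Only len(text) candidate prefixes exist, so test each with a set lookup
    return [text[:n] for n in range(1, len(text) + 1) if text[:n] in term_set]

forward = all_with_prefix(['ab', 'bce', 'cdef', 'cdefg'], 'cde')
backward = terms_that_prefix({'ab', 'bce', 'cdef'}, 'abc')
```

The forward lookup costs one binary search plus one step per match. The reverse lookup does not depend on the number of terms at all, only on the length of the search string.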

Calculating permutations without repetitions in Python

I have two lists of items:
A = 'mno'
B = 'xyz'
I want to generate all permutations, without replacement, simulating replacing all combinations of items in A with items in B, without repetition. e.g.
>>> do_my_permutation(A, B)
['mno', 'xno', 'mxo', 'mnx', 'xyo', 'mxy', 'xyz', 'zno', 'mzo', 'mnz', ...]
This is straightforward enough for me to write from scratch, but I'm aware of Python's standard itertools module, which I believe may already implement this. However, I'm having trouble identifying the function that implements this exact behavior. Is there a function in this module I can use to accomplish this?
Is this what you need?
["".join(elem) for elem in itertools.permutations(A+B, 3)]
and replace permutations with combinations if you want all orderings of the same three letters to be collapsed down into a single item (e.g. so that 'mxo' and 'mox' do not each individually appear in the output).
You're looking for itertools.permutations.
From the docs:
Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values.
To have only unique, lexically sorted, permutations, you can use this code:
import itertools
A = 'mno'
B = 'xyz'
s = {"".join(sorted(elem)) for elem in itertools.permutations(A+B, 3)}
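To see how the suggestions above relate to each other, this small sketch counts what each one produces for the A and B from the question. The variable names ordered, unordered and collapsed are only illustrative.

```python
from itertools import combinations, permutations

A = 'mno'
B = 'xyz'

# Every ordered selection of 3 distinct positions from the 6 letters
ordered = ["".join(p) for p in permutations(A + B, 3)]

# Order ignored: one entry per set of 3 letters
unordered = ["".join(c) for c in combinations(A + B, 3)]

# Sorting each permutation and de-duplicating collapses orderings the same way
collapsed = {"".join(sorted(p)) for p in permutations(A + B, 3)}
```

Here the permutations give 6 * 5 * 4 = 120 strings, while the combinations and the sorted set both give 20, so the set comprehension is a longer route to what combinations returns directly when the input letters are already in sorted order.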

Insert number to a list

I have an ordered dictionary like following:
source =([('a',[1,2,3,4,5,6,7,11,13,17]),('b',[1,2,3,12])])
I want to calculate the length of each key's value first, then calculate the sqrt of it, say it is L.
Append L to each item whose position can be divided by L without remainder, and append "1" to every other item.
For example, source['a'] = [1,2,3,4,5,6,7,11,13,17], and its length is 10.
Thus the sqrt of len(source['a']), rounded down, is 3.
Insert the number 3 at each position which can be divided exactly by 3 (e.g. position 3, position 6, position 9). If the position of the number cannot be divided exactly by 3, insert 1 after it.
To get a result like the following:
result=([('a',["1,1","2,1","3,3","4,1","5,1","6,3","7,1","11,1","13,3","17,1"]),('b',["1,1","2,2","3,1","12,2"])])
I don't know how to change the item in the list to a string pair. BTW, this is not my homework assignment. I was trying to build a boolean retrieval engine, and the source data is too big, so I just created a simple sample here to explain what I want to achieve :)
As this seems to be homework, I will try to help you with the part you are having trouble with:
I don't know how to change the item in the list to a string pair.
As the entire list needs to be updated, it's better to recreate it rather than update it in place, though the latter is possible as lists are mutable.
Consider a list
lst = [1,2,3,4,5]
To convert it to a list of strings, you can use a list comprehension
lst = [str(e) for e in lst]
You may also use the built-in map as map(str,lst), but you need to remember that in Py3.X, map returns a map object, so it needs to be handled accordingly
A condition in a comprehension is best expressed as a conditional expression
<TRUE-EXPRESSION> if <condition> else <FALSE-EXPRESSION>
To get the index of any item in a list, your best bet is to use the built-in enumerate
If you need to create a formatted string from a sequence of items, it's suggested to use the format method of strings
"{},{}".format(a,b)
The length of any sequence including a list can be calculated through the built-in len
You can use the operator ** with a fractional power, or use the math module and invoke the sqrt function, to calculate the square root
Now you just have to combine each of the above suggestions to solve your problem.
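One possible way the pieces could fit together is sketched here. It assumes positions are counted from 1 and that the square root is rounded down, which is what the expected output for 'b' in the question implies. The function name annotate is hypothetical, and math.isqrt requires Python 3.8 or later.

```python
import math
from collections import OrderedDict

def annotate(values):
    """Pair each value with L at positions divisible by L, and with 1 elsewhere."""
    step = math.isqrt(len(values))  # integer square root of the list length
    return [
        "{},{}".format(value, step if position % step == 0 else 1)
        for position, value in enumerate(values, start=1)  # 1-based positions
    ]

source = OrderedDict([('a', [1, 2, 3, 4, 5, 6, 7, 11, 13, 17]), ('b', [1, 2, 3, 12])])
result = OrderedDict((key, annotate(values)) for key, values in source.items())
```

The list for each key is rebuilt rather than modified in place, in line with the advice above.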
