How to generate the keys of a dictionary using permutations - python

I need to create a dictionary. The values could be left blank or zero, but I need the keys to be all the possible combinations of the characters ABCD with length k. For example, for k = 8:
lex = defaultdict(int)
lex = {
'AAAAAAAA':0,
'AAAAAAAB':0,
'AAAAAABB':0,
...}
So far I have tried something like this. I know it's wrong, but I have no idea how to make it work. I'm new to Python, so please bear with me.
mydiction = {}
mylist = []
mylist = itertools.permutations('ACTG', 8)
for keys in mydiction:
    mydiction[keys] = mylist.next()
print(mydiction)

You can do it in one line, but what you are looking for is combinations_with_replacement
from itertools import combinations_with_replacement
mydict = {"".join(key):0 for key in combinations_with_replacement('ACTG', 8)}

What you're describing isn't permutations, but combinations with replacement. There's a function for that in the itertools module as well.
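Note, however, that the two counts differ a lot: combinations_with_replacement('ACTG', 8) yields only 165 keys, because it ignores order, whereas the full set of length-8 strings over four letters has 4^8 = 65,536 members. That is still manageable for a dict, but the count quadruples with every extra character, so generating everything exhaustively stops scaling quickly.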
What's your use case? It's possible you just need to recognize combinations, rather than generating them all exhaustively. And each combination is intrinsically associated with a particular 16-bit integer index, so you could instead store and operate on that.
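As a rough sketch of the integer-index idea, the snippet below maps a string to its base-4 index and checks the two counts. The name kmer_to_index and the 'ACTG' ordering are assumptions made for illustration, not something the question specifies.

```python
from itertools import combinations_with_replacement, product

ALPHABET = 'ACTG'  # assumed letter order, borrowed from the question's attempt


def kmer_to_index(kmer, alphabet=ALPHABET):
    """Read kmer as a base-len(alphabet) number, most significant character first."""
    index = 0
    for ch in kmer:
        index = index * len(alphabet) + alphabet.index(ch)
    return index


# Order-insensitive selections versus every ordered string of length 8.
n_combos = sum(1 for _ in combinations_with_replacement(ALPHABET, 8))
n_strings = sum(1 for _ in product(ALPHABET, repeat=8))
print(n_combos, n_strings)        # 165 65536

print(kmer_to_index('AAAAAAAA'))  # 0
print(kmer_to_index('GGGGGGGG'))  # 65535, the largest 16-bit value
```

With an index like this, a plain list of 4**k counters can stand in for the dict if every key is needed anyway.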

Although the combinations_with_replacement function works perfectly fine, you will be generating a huge list of strings with a relatively high collision rate (around 20%).
What you are looking to do can be done using base-4 integers. They are faster to process and more memory efficient, and they have zero collisions (each number is its own hash), meaning a guaranteed O(1) look-up time even in the worst case.
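def num_to_hash(n, k, literals='ABCD'):
    # Read n as a k-digit base-4 number (two bits per character),
    # most significant digit first.
    return ''.join(literals[(n >> (k - x) * 2) & 3] for x in range(1, k + 1))

k = 2
# range(4**k) covers every k-character string, including the last one ('GG').
d = {num_to_hash(x, k, 'ACTG'): 0 for x in range(4**k)}
print(d)
output (shown one key per line, sorted, for readability):
{'AA': 0,
'AC': 0,
'AG': 0,
'AT': 0,
'CA': 0,
'CC': 0,
'CG': 0,
'CT': 0,
'GA': 0,
'GC': 0,
'GG': 0,
'GT': 0,
'TA': 0,
'TC': 0,
'TG': 0,
'TT': 0}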

Related

How to test for different values and missing keys in multiple dictionaries within a list?

I have a list of dictionaries that represent all of the attributes and values on a selection of nodes in Maya. I need to find any differences in the values as well as if there are attributes found on some but not all nodes.
node_dict = [{translateX: 0, translateY: 10, translateZ: 0}, {translateX: 0, translateY: 10, translateZ: 0}, {translateX: 0, translateY: 0, translateZ: 0}]
I need a way to iterate the list of dicts and return only the keys that are different. However, if one value is different, then all of those key values need to be returned.
desired output
diff_dict = {translateY: [10, 10, 0]}
My biggest issue is how to setup the for loops or whatever to test each against each other and report out. Hoping someone has an idea, been hitting this wall too long.
You can just change the representation and check for the condition like this:
node_dict = [{'translateX': 0, 'translateY': 10, 'translateZ': 0}, {'translateX': 0, 'translateY': 10, 'translateZ': 0}, {'translateX': 0, 'translateY': 0, 'translateZ': 0}]
result = {}
for x in node_dict:
    for key, value in x.items():
        if key in result:
            result[key].append(value)
        else:
            result[key] = [value]
result = { k:v for k,v in result.items() if not (v[1:] == v[:-1]) }
print(result)
This will print {'translateY': [10, 10, 0]}
The for loop iterates over the dictionaries and rebuilds them into a single dictionary whose keys are the keys of all the dictionaries and whose values are lists containing every value seen for that key. The last line keeps a key in the final result only if its list does not contain the same value throughout.
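The question also asks about attributes found on some nodes but not all. A minimal sketch of one way to cover that case is below. The function name diff_attributes, the extra 'visibility' attribute, and the use of None as a placeholder are assumptions for illustration; None only works if it never occurs as a real attribute value.

```python
MISSING = None  # assumption: None is never a genuine attribute value


def diff_attributes(nodes, missing=MISSING):
    """Return {attribute: [value per node]} for attributes that differ or are absent somewhere."""
    # Collect every attribute name once, keeping first-seen order.
    keys = {}
    for node in nodes:
        keys.update(dict.fromkeys(node))

    diffs = {}
    for key in keys:
        values = [node.get(key, missing) for node in nodes]
        # Keep the attribute if any node disagrees with the first one.
        if any(v != values[0] for v in values):
            diffs[key] = values
    return diffs


nodes = [
    {'translateX': 0, 'translateY': 10, 'translateZ': 0},
    {'translateX': 0, 'translateY': 10, 'translateZ': 0, 'visibility': 1},
    {'translateX': 0, 'translateY': 0, 'translateZ': 0},
]
print(diff_attributes(nodes))
# {'translateY': [10, 10, 0], 'visibility': [None, 1, None]}
```

Each list keeps one slot per node, so the position of a value still identifies which node it came from.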

Compare position of a specific item from an old to a new list

I have a list in python on the same form as the following:
list = [["player1", "team1", "pointsPlayer1"],["player2", "team2", "pointsPlayer2"]]
The list contains between 0 and 20 players.
My problem is that I want to be able to know how many places each player gains or loses after new points are given (after a round).
I know I'll have to make a copy of the list before points are given and then compare it to the updated list, but I don't know how to do that.
What you wrote is structurally similar to a NumPy array, which behaves much like a list of lists, and I'd suggest using a NumPy array for this. I don't know what you mean by gain, but here is a small snippet I quickly wrote to give you an idea. Does it work for you? (Assuming the array is entirely of type int.)
import numpy as np
players = np.zeros((20,3), dtype=int)
def get_gains(points):  # points is also a numpy array, one entry per player
    gains_dict = {}
    for i, point in enumerate(points):
        current_points = players[i][2]
        gain = current_points + point
        gains_dict[players[i][0]] = gain
        players[i][2] = gain  # store the new total back in the table
    return gains_dict
collections.Counter is a useful tool for this kind of task. It's not clear exactly what your inputs and desired output look like, but below is an example you should be able to adapt.
from collections import Counter
c = Counter()
# ROUND 1
c.update({'player1': 40, 'player2': 20, 'player3': 30})
# Counter({'player1': 40, 'player3': 30, 'player2': 20})
order1 = {k: idx for idx, (k, _) in enumerate(c.most_common())}
# {'player1': 0, 'player2': 2, 'player3': 1}
# ROUND 2
c.update({'player1': 10, 'player2': 40, 'player3': 15})
# Counter({'player1': 50, 'player2': 60, 'player3': 45})
order2 = {k: idx for idx, (k, _) in enumerate(c.most_common())}
# {'player1': 1, 'player2': 0, 'player3': 2}
# CHANGE IN ORDER
change = {k: order2[k] - order1[k] for k in c}
# {'player1': 1, 'player2': -2, 'player3': 1}
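The same comparison can be sketched directly on the list-of-lists layout from the question, without Counter. This assumes the third field holds numeric points and that ties keep their original order; the helper name positions and the sample scores are made up for illustration.

```python
def positions(table):
    """Map each player name to its 0-based place, highest points first."""
    ranked = sorted(table, key=lambda row: row[2], reverse=True)
    return {row[0]: place for place, row in enumerate(ranked)}


# Snapshot before the round, and the table after new points were added.
before = [["player1", "team1", 40], ["player2", "team2", 20], ["player3", "team1", 30]]
after = [["player1", "team1", 50], ["player2", "team2", 60], ["player3", "team1", 45]]

old, new = positions(before), positions(after)

# Positive means the player moved up that many places, negative means down.
gained = {name: old[name] - new[name] for name in new}
print(gained)
# {'player2': 2, 'player1': -1, 'player3': -1}
```

Note the sign convention: here a positive number is a gain in places, which is the opposite of the order2 - order1 difference above.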

Building up a counting function

I need to build up a counting function starting from a dictionary. The dictionary is a classical Bag_of_Words and looks as follows:
D={'the':5, 'pow':2, 'poo':2, 'row':2, 'bub':1, 'bob':1}
I need the function that for a given integer returns the number of words with at least that number of occurrences. In the example F(2)=4, all words but 'bub' and 'bob'.
First of all I build up the inverse dictionary of D:
ID={5:1, 2:3, 1:2}
I think I'm fine with that. Then here is the code:
values=list(ID.keys())
values.sort(reverse=True)
Lk=[]
Nw=0
for val in values:
    Nw=Nw+ID[val]
    Lk.append([Nw, val])
The code works fine but I do not like it. I would prefer to use a list comprehension to build up Lk; also, I really hate the Nw variable I have used. It does not seem Pythonic at all.
You can create a sorted array of your word counts, then find the insertion point with np.searchsorted to get how many items are on either side of it. np.searchsorted is very efficient and fast. If your dictionary doesn't change often, this call is basically free compared to other methods.
import numpy as np
def F(n, D):
    # Creating the array on every call is slow; if D doesn't change,
    # build and sort the array once outside the function.
    arr = np.array(list(D.values()))  # list() is needed on Python 3, where values() is a view
    arr.sort()
    L = len(arr)
    return L - np.searchsorted(arr, n) #this line does all the work...
What's going on:
First we take just the word counts (and convert them to a sorted array):
D = {"I'm": 12, "pretty": 3, "sure":12, "the": 45, "Donald": 12, "is": 3, "on": 90, "crack": 11}
vals = np.array(list(D.values()))
#vals = array([12,  3, 12, 45, 12,  3, 90, 11])
vals.sort()
#vals = array([ 3,  3, 11, 12, 12, 12, 45, 90])
Then, if we want to know how many values are greater than or equal to n, we simply find the length of the array from the first number greater than or equal to n onward. We do this by determining the leftmost index where n would be inserted (the insertion point) and subtracting that from the total number of positions (len).
# how many are >= 10?
# insertion point for value of 10..
#
#                | index: 2
#                v
# array([ 3,  3, 11, 12, 12, 12, 45, 90])
#find how many elements there are
#len(arr) = 8
#subtract.. 8 - 2 = 6 elements that are >= 10
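The same insertion-point idea can be sketched with the standard library's bisect module, for cases where numpy is not available. The helper name at_least is hypothetical; the data is the example dictionary used above.

```python
from bisect import bisect_left

D = {"I'm": 12, "pretty": 3, "sure": 12, "the": 45,
     "Donald": 12, "is": 3, "on": 90, "crack": 11}

# Sort once; reuse the sorted list for every query.
sorted_counts = sorted(D.values())   # [3, 3, 11, 12, 12, 12, 45, 90]


def at_least(n):
    """Number of words whose count is >= n."""
    # bisect_left returns the leftmost index where n could be inserted,
    # so everything from that index onward is >= n.
    return len(sorted_counts) - bisect_left(sorted_counts, n)


print(at_least(10))  # 6
print(at_least(12))  # 5
print(at_least(91))  # 0
```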
A fun little trick for counting things: True has a numerical value of 1 and False has a numerical value of 0. So we can do things like
sum(v >= k for v in D.values())
where k is the value you're comparing against.
collections.Counter() is an ideal choice for this. Use it on the dict.values() list. Also, unlike numpy, it does not need to be installed separately. Example:
>>> from collections import Counter
>>> D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}
>>> c = Counter(D.values())
>>> c
Counter({2: 3, 1: 2, 5: 1})
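Going one step further from the inverse dictionary, here is a sketch of how the running total could be built without a hand-maintained Nw variable, using itertools.accumulate. The variable names are illustrative, and this is only one possible way to get the list comprehension the question asks for.

```python
from collections import Counter
from itertools import accumulate

D = {'the': 5, 'pow': 2, 'poo': 2, 'row': 2, 'bub': 1, 'bob': 1}

inverse = Counter(D.values())            # occurrences -> number of words
values = sorted(inverse, reverse=True)   # [5, 2, 1]

# accumulate() yields the running sum that Nw tracked by hand.
running = accumulate(inverse[v] for v in values)
Lk = [[total, val] for total, val in zip(running, values)]
print(Lk)    # [[1, 5], [4, 2], [6, 1]]


def F(n):
    """Number of words with at least n occurrences."""
    return sum(count for val, count in inverse.items() if val >= n)


print(F(2))  # 4
```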

Number permutations in python iterative

I need to generate permutations of digits, where the number of digits in the result can be larger than the number of distinct digits. For my current purpose I need to generate permutations of the digits 0, 1, 2 to get numbers of up to 20 digits in length. For example, the first few permutations would be 0, 1, 2, 10, 11, 12, ... 1122, 1211.
There are existing answers using Iterator in Python here or here and those give the full permutation directly.
But I need to perform some tests on each permutation, and if I keep the entire list of permutations in memory it becomes too big; for 20 digits it comes to 3^20 permutations.
So my question is: can it be done without recursion, so that I can perform the tests on each permutation as it is generated?
Edit:
I'm looking at permutations with repetition. So for a number of, say, 20 digits, each digit can take a value from [0, 1, 2]. That's why the number of permutations in that case comes to 3^20.
What you are looking for is called a Cartesian product, not a permutation. Python's itertools module has a function, itertools.product(), that produces the desired result:
import itertools
for p in itertools.product(range(3), repeat=4):
    print(p)
Output is 3^4 lines:
(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 0, 2)
(0, 0, 1, 0)
(0, 0, 1, 1)
...
(2, 2, 2, 1)
(2, 2, 2, 2)
To produce output tuples with lengths from 1 to 4, use an additional iteration:
for l in range(1, 5):
    for p in itertools.product(range(3), repeat=l):
        print(p)
Finally, this works for string elements, too:
for i in range(5):
    for p in itertools.product(('0', '1', '2'), repeat=i):
        print(''.join(p), end=' ')
    print()
Output:
0 1 2 00 01 02 10 11 12 20 21 22 000 001 002 010 [...] 2220 2221 2222
Yes, your program could look like this:
import itertools
def perform_test(permutation):
    pass
# permutations() does not construct the entire list, but yields
# results one by one.
for permutation in itertools.permutations([1, 2, 3, 4, 5], 2):
    perform_test(permutation)
While there are ways to do this using itertools etc, here is a way that is a bit different from what you would normally do.
If you were to have a list of these permutations in order, what you would actually have is ternary numbers that represent their place in the list. e.g. list[4] is 11 which is 4 in ternary (3*1+1*1). So you could convert the index value that you want to test into ternary and that would produce the correct value.
Python can convert a string written in a given base to an integer (e.g. int("11", 3) returns 4), but the reverse conversion is not built in. There are lots of implementations out there though. Here is a good one (modified for your case):
def digit_to_char(digit):
    if digit < 10:
        return str(digit)
    return chr(ord('a') + digit - 10)
def perm(number):
    (d, m) = divmod(number, 3)
    if d > 0:
        return perm(d) + digit_to_char(m)
    return digit_to_char(m)
So if you wanted to find the 20th permutation, you could do perm(20), which would give you 202. So now you can just do a regular loop through the index values that you want. With no storage of big lists in memory.
permutation = 0
i = 0
while len(str(permutation)) < 20:
    permutation = perm(i)
    do_test(permutation)
    i += 1
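Since the question asks specifically for a non-recursive approach, here is a sketch of the same index-to-ternary idea written as a loop and wrapped in a generator. The names to_base3 and ternary_numbers are hypothetical; like perm() above, it produces no leading zeros.

```python
from itertools import islice


def to_base3(number):
    """Convert a non-negative integer to its base-3 digit string, iteratively."""
    if number == 0:
        return '0'
    digits = []
    while number > 0:
        number, remainder = divmod(number, 3)
        digits.append(str(remainder))
    # Remainders come out least significant first, so reverse them.
    return ''.join(reversed(digits))


def ternary_numbers(max_digits):
    """Yield every base-3 number with at most max_digits digits, one at a time."""
    for i in range(3 ** max_digits):  # range is lazy, nothing is stored
        yield to_base3(i)


# Only the first few values are pulled, so the full 3**20 sequence is never built.
print(list(islice(ternary_numbers(20), 6)))
# ['0', '1', '2', '10', '11', '12']
print(to_base3(20))
# 202
```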

Sort a complex Python dictionary by just one of its values

I am writing a little optimization tool for purchasing stamps at the post office.
In the process I am using a dictionary, which I am sorting according to what I learned in this other "famous" question:
Sort a Python dictionary by value
In my case my dictionary is mildly more complex:
- one four-item-tuple to make the key
- and another five-item-tuple to make the data.
The origin of this dictionary is an iteration, where each successful loop is adding one line:
MyDicco[A, B, C, D] = eval, post, number, types, over
This is just a tiny example of a trivial run, trying for 75 cents:
{
(0, 0, 1, 1): (22, 75, 2, 2, 0)
(0, 0, 0, 3): (31, 75, 3, 1, 0)
(0, 0, 2, 0): (2521, 100, 2, 1, 25)
(0, 1, 0, 0): (12511, 200, 1, 1, 125)
(1, 0, 0, 0): (27511, 350, 1, 1, 275)
}
So far I am using this code to sort (it is working):
MyDiccoSorted = sorted(MyDicco.items(), key=operator.itemgetter(1))
I am sorting by my evaluation-score, because the sorting is all about bringing the best solution to the top. The evaluation-score is just one datum out of a five-item-tuple (in the example those are the evaluation-scores: 22, 31, 2521, 12511 and 27511).
As you can see in the example above, it is sorting (as I want it) by the second tuple, index 1. But I had to (grumpily) bring my "evaluation-score" to the front of my second tuple. The code is obviously using the entire second-tuple for the sorting-process, which is heavy and not needed.
Here is my question: how can I sort more precisely? I do not want to sort by the entire second tuple of my dictionary; I want to target the first item precisely.
And ideally I would like to put this value back to its original position, namely to be the last item in the second tuple - and still sort by it.
I have read-up on and experimented with the syntax of operator.itemgetter() but have not managed to just "grab" the "first item of my second item".
https://docs.python.org/3/library/operator.html?highlight=operator.itemgetter#operator.itemgetter
(note: It is permissible to use tuples as keys and values, according to:
https://docs.python.org/3/tutorial/datastructures.html?highlight=dictionary
and those are working fine for my project; this question is just about better sorting)
For those who like a little background (you will yell at me that I should use some other method, but I am learning about dictionaries right now (which is one of the purposes of this project)):
This optimization is for developing countries, where often certain values of stamps are not available, or are limited in stock at any given post office. It will later run on Android phones.
We are doing regular mailings (yes, letters). Figuring out the exact postage for each destination with the available values and finding solutions with low stocks of certain values is a not-trivial process, if you consider six different destination-based-postages and hundreds of letters to mail.
There are other modules which help turning the theoretical optimum solution into something that can actually be purchased on any given day, by strategic dialog-guidance...
About my dictionary in this question:
I iterate over all reasonable (high enough to make the needed postage and only overpaying up to a fraction of one stamp) combinations of stamp-values.
Then I calculate a "success" value, which is based on the number of stamps needed (priority), the number of types needed (lower priority)(because purchasing different stamps takes extra time at the counter) and a very high penalty for paying-over. So lowest value means highest success.
I collect all reasonable "solutions" in a dictionary where the tuple of needed-stamps serves as the key, and another tuple of some results-data makes up the values. It is mildly over-defined because a human needs to read it at this phase in the project (for debugging).
If you are curious and want to read the example (first line):
The columns are:
number of stamps of 350 cents
number of stamps of 200 cents
number of stamps of 50 cents
number of stamps of 25 cents
evaluation-score
calculated applied postage
total number of stamps applied
total number of stamp-types
over-payment in cents if any
Or in words: (Assuming a postal service is offering existing stamps of 350, 200, 50 and 25 cents), I can apply postage of 75 cents by using 1x 50 cents and 1x 25 cents. This gives me a success-rating of 22 (the best in this list), postage is 75 cents, needing two stamps of two different values and having 0 cents overpayment.
You can just use a double index, something like this should work:
MyDiccoSorted = sorted(MyDicco.items(), key=lambda s: s[1][2])
Just set 2 to whatever the index is of the ID in the tuple.
I find it easier to use lambda expressions than to remember the various operator functions.
Assuming, for the moment, that your eval score is the 3rd item of your value tuple (i.e. (post, number, eval, types, over)):
MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][2])
Alternatively, you can create a named function to do the job:
def myKey(x): return x[1][2]
MyDiccoSorted = sorted(MyDicco.items(), key=myKey)
You can use a lambda expression instead of operator.itemgetter() to get the precise element to sort on. This assumes your eval is the first item in the tuple of values; otherwise, replace the 0 in x[1][0] with the index of the element you want. Example -
MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][0])
How this works -
dict.items() returns something similar to a list of tuples (though not exactly that in Python 3.x). Example -
>>> d = {1:2,3:4}
>>> d.items()
dict_items([(1, 2), (3, 4)])
Now, in the sorted() function, the key argument accepts a function object (which can be a lambda, operator.itemgetter(), which also returns a function, or any simple function). The function that you pass to key should accept one argument, which will be an element of the list being sorted.
That key function is then called with each element, and you are expected to return the value to sort the list on. An example to help you understand this -
>>> def foo(x):
...     print('x =',x)
...     return x[1]
...
>>> sorted(d.items(),key=foo)
x = (1, 2)
x = (3, 4)
[(1, 2), (3, 4)]
does this do what you need?
sorted(MyDicco.items(), key=lambda x: x[1][0])
index_of_evaluation_score = 0
MyDiccoSorted = sorted(MyDicco.items(), key=lambda key_value: key_value[1][index_of_evaluation_score])
Placing your evaluation score back at the end where you wanted it, you can use the following:
MyDicco = {
(0, 0, 1, 1): (75, 2, 2, 0, 22),
(0, 0, 0, 3): (75, 3, 1, 0, 31),
(0, 0, 2, 0): (100, 2, 1, 25, 2521),
(0, 1, 0, 0): (200, 1, 1, 125, 12511),
(1, 0, 0, 0): (350, 1, 1, 275, 27511)}
MyDiccoSorted = sorted(MyDicco.items(), key=lambda x: x[1][4])
print(MyDiccoSorted)
Giving:
[((0, 0, 1, 1), (75, 2, 2, 0, 22)), ((0, 0, 0, 3), (75, 3, 1, 0, 31)), ((0, 0, 2, 0), (100, 2, 1, 25, 2521)), ((0, 1, 0, 0), (200, 1, 1, 125, 12511)), ((1, 0, 0, 0), (350, 1, 1, 275, 27511))]
I think one of the things you might be looking for is a stable sort.
Sorting functions in Python are generally "stable" sorts. For example, if you sort:
1 4 6
2 8 1
1 2 3
2 1 8
by its first column, you'll get:
1 4 6
1 2 3
2 8 1
2 1 8
The order of rows sharing the same value in column 1 does not change. 1 4 6 is sorted before 1 2 3 because that was the original order of these rows before the column 1 sort. Sorting has been 'stable' since version 2.2 of Python. More details here.
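A small sketch of the stable-sort behaviour described above, using the same four rows. The two-pass variant at the end is an illustration of what stability makes possible, not something the original question uses.

```python
rows = [(1, 4, 6), (2, 8, 1), (1, 2, 3), (2, 1, 8)]

# Sort by the first column only; rows with equal keys keep their input order.
by_first = sorted(rows, key=lambda row: row[0])
print(by_first)
# [(1, 4, 6), (1, 2, 3), (2, 8, 1), (2, 1, 8)]

# Because the sort is stable, sorting by a secondary key first and then by
# the primary key gives the same result as sorting once on a tuple key.
two_pass = sorted(sorted(rows, key=lambda row: row[2]), key=lambda row: row[0])
one_pass = sorted(rows, key=lambda row: (row[0], row[2]))
print(two_pass == one_pass)
# True
```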
On another note I'm interested in how much you had to explain your code. That is a sign that the code would benefit from refactoring to make its purpose clearer.
Named tuples could be used to remove the hard-to-read tuple indices you see in many answers here, e.g. key=lambda x: x[1][0]-- what does that actually mean? What is it doing?
Here's a version using named tuples that helps readers (most importantly, you!) understand what your code is trying to do. Note how the lambda now explains itself much better.
from collections import namedtuple
StampMix = namedtuple('StampMix', ['c350', 'c200', 'c50', 'c25'])
Stats = namedtuple('Stats', ['score', 'postage', 'stamps', 'types', 'overpayment'])
data = {
(0, 0, 1, 1): (22, 75, 2, 2, 0),
(0, 0, 0, 3): (31, 75, 3, 1, 0),
(0, 0, 2, 0): (2521, 100, 2, 1, 25),
(0, 1, 0, 0): (12511, 200, 1, 1, 125),
(1, 0, 0, 0): (27511, 350, 1, 1, 275)
}
candidates = {}
for stampmix, stats in data.items():
    candidates[StampMix(*stampmix)] = Stats(*stats)
print(sorted(candidates.items(), key=lambda candidate: candidate[1].score))
You can see the benefits of this approach in the output:
$ python namedtuple.py
(prettied-up output follows...)
[
(StampMix(c350=0, c200=0, c50=1, c25=1), Stats(score=22, postage=75, stamps=2, types=2, overpayment=0)),
(StampMix(c350=0, c200=0, c50=0, c25=3), Stats(score=31, postage=75, stamps=3, types=1, overpayment=0)),
(StampMix(c350=0, c200=0, c50=2, c25=0), Stats(score=2521, postage=100, stamps=2, types=1, overpayment=25)),
(StampMix(c350=0, c200=1, c50=0, c25=0), Stats(score=12511, postage=200, stamps=1, types=1, overpayment=125)),
(StampMix(c350=1, c200=0, c50=0, c25=0), Stats(score=27511, postage=350, stamps=1, types=1, overpayment=275))
]
and it will help with your algorithms too. For example:
def score(stats):
    return stats.postage * stats.stamps * stats.types + 1000 * stats.overpayment
