Generating unique permutations in a predictable order with Python and Sympy - python

I have a list of elements in python. In the final code, the length of the list will be variable and the elements of my list much more lengthy, but I can demonstrate my question with three dummy elements.
Basically, for any list of length n, there will be n-1 identical elements, and 1 unique element.
So, in the example of three elements I have:
test = ['b', 'a', 'a']
For small cases such as 3, where I can verify the order of the elements by eye, I have been using a function from the SymPy module, as below:
from sympy.utilities.iterables import multiset_permutations
permutations = list(multiset_permutations(test))
However, once the cases become too large, I'm not certain that the order will be predictable, and the official documentation doesn't really clarify the issue for me.
Is there a way to generate these permutations in a predictable order such that I could know, for example:
permutations[0] = ['b', 'a', 'a']
permutations[1] = ['a', 'b', 'a']
permutations[2] = ['a', 'a', 'b']
Thank you for any help that can be given.

multiset_permutations sorts the elements before it generates the permutations, so the result does not depend on the order of the input items.
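Since the question states that each list holds exactly one unique element among n-1 identical ones, the desired order can also be built directly, without SymPy. The following is a minimal sketch under that assumption; the function name `single_unique_permutations` and its parameters are hypothetical, not part of any library.

```python
def single_unique_permutations(unique, common, n):
    """Return the n arrangements of one `unique` element among n-1 copies
    of `common`, with the unique element moving from index 0 to index n-1."""
    return [[common] * i + [unique] + [common] * (n - 1 - i) for i in range(n)]


permutations = single_unique_permutations('b', 'a', 3)
for p in permutations:
    print(p)
# ['b', 'a', 'a']
# ['a', 'b', 'a']
# ['a', 'a', 'b']
```

Here permutations[k] always has the unique element at index k, so the order is known by construction for any n.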

Related

set item at multiple indexes in a list

I am trying to find a way to use a list of indexes to set values at multiple places in a list (as is possible with numpy arrays).
I found that I can map __getitem__ and a list of indexes to return the values at those indexes:
# something like
a_list = ['a', 'b', 'c']
idxs = [0, 1]
get_map = map(a_list.__getitem__, idxs)
print(list(get_map)) # yields ['a', 'b']
However, when applying this same line of thought to __setitem__, the setting fails. This probably has something to do with pass-by-reference vs pass-by-value, which I have never fully understood no matter how many times I've read about it.
Is there a way to do this?
b_list = ['a', 'b', 'c']
idxs = [0, 1]
put_map = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
print(b_list) # want: ['YAY', 'YAY', 'c']
For my use case, I only want to set one value at multiple locations. Not multiple values at multiple locations.
EDIT: I know how to use list comprehension. I am trying to mimic numpy's capability to accept a list of indexes for both getting and setting items in an array, except for lists.
The difference between the get and set cases is that in the get case you are interested in the result of map itself, whereas in the set case you want a side effect. In Python 3, map returns a lazy iterator, so if you never consume it, the __setitem__ calls are never executed. Once you do consume it, b_list changes as expected.
>>> put_map = map(b_list.__setitem__, idxs, ['YAY', 'YAY'])
>>> b_list
['a', 'b', 'c']
>>> list(put_map)
[None, None]
>>> b_list
['YAY', 'YAY', 'c']
Having said that, the proper way for get would be a list comprehension and for set a simple for loop. That also has the advantage that you do not have to repeat the value to put in place n times.
>>> for i in idxs: b_list[i] = "YAY"
>>> [b_list[i] for i in idxs]
['YAY', 'YAY']
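The loop and the comprehension above can be wrapped in two small helpers to get something close to NumPy-style index lists on plain lists. This is only a sketch; the names `get_at` and `set_at` are made up for illustration.

```python
def get_at(items, indexes):
    """Return the elements of `items` found at each position in `indexes`."""
    return [items[i] for i in indexes]


def set_at(items, indexes, value):
    """Assign the same `value` at each position in `indexes`, in place."""
    for i in indexes:
        items[i] = value


b_list = ['a', 'b', 'c']
idxs = [0, 1]
set_at(b_list, idxs, 'YAY')
print(b_list)                # ['YAY', 'YAY', 'c']
print(get_at(b_list, idxs))  # ['YAY', 'YAY']
```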

How to arrange the output of set based on predefined list

list1=['f','l','a','m','e','s'] #This is the predefined list
list2=['e','e','f','a','s','a'] #This is the list with repetition
x=list(set(list2)) # I want to remove duplicates
print(x)
Here I want the variable x to retain the order which list1 has. For example, if at one instance set(list2) produces the output as ['e','f','a','s'], I want it to produce ['f','a','e','s'] (Just by following the order of list1).
Can anyone help me with this?
Construct a dictionary that maps characters to their position in list1. Use its get method as the sort-key.
>>> dict1 = dict(zip(list1, range(len(list1))))
>>> sorted(set(list2), key=dict1.get)
['f', 'a', 'e', 's']
This is one way using dictionary:
list1=['f','l','a','m','e','s'] #This is the predefined list
list2=['e','e','f','a','s','a'] #This is the list with repetition
x=list(set(list2)) # I want to remove duplicates
d = {key:value for value, key in enumerate(list1)}
x.sort(key=d.get)
print(x)
# ['f', 'a', 'e', 's']
Method index from the list class can do the job:
sorted(set(list2), key=list1.index)
What is best usually depends on the actual use. For this problem, the expected sizes of the lists determine which approach is most efficient. If most of the elements of list1 are kept, the following comprehension works well and has the additional benefit of being easy to read.
set2 = set(list2)
x = [i for i in list1 if i in set2]
It would also work without turning list2 into a set first. However, this would run much slower with a large list2.
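The approaches above agree as long as every element of list2 also appears in list1. The sketch below shows how they differ otherwise, using an extra element 'z' that is an assumption added purely for illustration. With a plain dict.get key, a missing element maps to None, which Python 3 cannot compare with integers, so the sort key here supplies a default position instead.

```python
list1 = ['f', 'l', 'a', 'm', 'e', 's']
list2 = ['e', 'e', 'f', 'a', 's', 'a', 'z']  # 'z' does not occur in list1

# Map each element of list1 to its position.
position = {ch: i for i, ch in enumerate(list1)}

# Sorting: unknown elements get a position past the end, so they sort last.
by_sort = sorted(set(list2), key=lambda ch: position.get(ch, len(list1)))

# Filtering: unknown elements are simply dropped.
wanted = set(list2)
by_filter = [ch for ch in list1 if ch in wanted]

print(by_sort)    # ['f', 'a', 'e', 's', 'z']
print(by_filter)  # ['f', 'a', 'e', 's']
```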

How to split a list on an element delimiter [duplicate]

Are there concise and elegant ways of splitting a list in Python into a list of sub-lists by a delimiting element, such that ['a', 'delim', 'b'] -> [['a'], ['b']]?
Here is the example:
ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
lspl = [] # an elegant python one-liner wanted on this line!
print(lspl) # want: [['a', 'b'], ['a', 'b'], ['a', 'b']]
Working examples that seem overly complex
I have surveyed documentation and related questions on stackoverflow - many referenced below - which did not answer my question, and am summarizing my research below: several approaches which do generate the desired output, but are verbose and intricate, and what is happening (splitting a list) is not immediately apparent -- you really have to squint.
Are there better ways? I am primarily interested in readability for beginners (e.g. teaching), canonical / 'Pythonic' approaches, and secondarily in the most efficient approaches (e.g. timeit speed). Ideally answers would address both Python 2.7 and 3.x.
with conditional .append()
Loop through the list and either append to the last output list or add a new output list. Based on an example that includes the delimiter, but altered to exclude it. I'm not sure how to make it a one-liner, or if that is even desirable.
lspl = [[]]
for i in ldat:
    if i==dlim:
        lspl.append([])
    else:
        lspl[-1].append(i)
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
with itertools.groupby
Combine itertools.groupby with a list comprehension. Many answers include delimiters; this is based on those that exclude delimiters.
import itertools
lspl = [list(y) for x, y in itertools.groupby(ldat, lambda z: z == dlim) if not x]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
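The conditional-append loop and the groupby one-liner give the same result for the sample data, but they differ when delimiters are adjacent or sit at the end of the list. The sketch below wraps each in a function to show this; the function names are hypothetical.

```python
import itertools


def split_append(items, delim):
    """Loop-and-append split: keeps empty sub-lists, much like str.split."""
    parts = [[]]
    for item in items:
        if item == delim:
            parts.append([])
        else:
            parts[-1].append(item)
    return parts


def split_groupby(items, delim):
    """groupby split: runs of delimiters collapse, so no empty sub-lists."""
    grouped = itertools.groupby(items, lambda x: x == delim)
    return [list(group) for is_delim, group in grouped if not is_delim]


data = ['a', 'c', 'c', 'b', 'c']
print(split_append(data, 'c'))   # [['a'], [], ['b'], []]
print(split_groupby(data, 'c'))  # [['a'], ['b']]
```

Which behaviour is preferable depends on whether empty segments carry meaning in the data.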
with slicing on indices
Some related questions have discussed how to use slicing after using .index() -- however answers usually focus on finding the first index only. One can extend this approach by first finding a list of indices and then looping through a self-zipped list to slice the ranges.
indices = [i for i, x in enumerate(ldat) if x == dlim]
lspl = [ldat[s+1:e] for s, e in zip([-1] + indices, indices + [len(ldat)])]
print(lspl) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]
However, like all the approaches I have found, this seems like a very complex way of enacting a simple split-on-delimiter operation.
Comparison to string splitting
By comparison and as a model only, here is a working, concise, and elegant way of splitting a string into a list of sub-strings by a delimiter.
sdat = 'abcabcab'
dlim = 'c'
sspl = sdat.split(dlim)
print(sspl) # prints: ['ab', 'ab', 'ab']
NOTE: I understand there is no split method on lists in Python, and I am not asking about splitting a string. I am also not asking about splitting element-strings into new elements.
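Or this, using string replacement and eval (it assumes the elements are simple strings and that the delimiter is neither the first nor the last element):
ldat = ['a','b','c','a','b','c','a','b']
dlim = 'c'
s = str(ldat).replace(", '%s', " % dlim, "],[")
result = list(eval(s)) # eval returns a tuple of lists here, so convert it to a list
print(result) # prints: [['a', 'b'], ['a', 'b'], ['a', 'b']]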

Why does pythons slice indexing give counter intuitive results? [duplicate]

Suppose I have a 2D list in Python, Data, and I want to create a slice of it that selects all the elements along the first index and a single one along the second.
eg.
Data = [[a,b,c],[d,e,f],[h,i,g]]
and I want the list;
raw_data = [b,e,i]
Why does doing something like;
raw_data = Data[:][1]
not give the desired output?
I have specified the whole first index and the 1 index for the second.
Instead I get the output that is;
raw_data = [d,e,f]
Which is what I would expect to get from;
raw_data = Data[1][:]
raw_data = [d,e,f]
So;
Data[1][:] == Data[:][1]
Which is not compatible with my mental model of how lists work in python.
Instead I have to use a loop to do it;
raw_data = []
for i in range(len(Data)):
    raw_data.append(Data[i][1])
So my question is, can anyone explain why Data[1][:] == Data[:][1]?
Thanks for reading!
lst[:] has no explicit start and no explicit end, so according to the Python documentation it returns a copy of the list running from the start to the end. In other words, it returns a copy of the same list you had before. So:
>>> Data = [['a','b','c'],['d','e','f'],['h','i','g']]
>>> Data[:]
[['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]
So when you say Data[:], that will evaluate to the same as a copy of Data, meaning that Data[:][1] essentially is just Data[1], which is [d,e,f]
If you do it the other way:
>>> Data[1]
['d', 'e', 'f']
>>> Data[1][:]
['d', 'e', 'f']
You get the second element in data, [d,e,f], then you use that same list slicing syntax as before to get that same list again.
To get what you want, I'd use a list comprehension:
>>> [x[1] for x in Data]
['b', 'e', 'i']
Simple as that.
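The explanation above can be checked with a short script. It is a sketch using the same sample values; the zip-based transpose at the end is an additional standard-library alternative that the answer does not mention.

```python
data = [['a', 'b', 'c'], ['d', 'e', 'f'], ['h', 'i', 'g']]

copy_of_data = data[:]              # shallow copy of the outer list
print(copy_of_data == data)         # True: same contents
print(copy_of_data is data)         # False: a new outer list
print(copy_of_data[1] is data[1])   # True: the inner lists are shared

row = data[:][1]                    # index 1 of the copy, same as data[1]
column = [r[1] for r in data]       # index 1 of every inner list
column_via_zip = list(zip(*data))[1]  # transpose, then take row 1 (a tuple)

print(row)             # ['d', 'e', 'f']
print(column)          # ['b', 'e', 'i']
print(column_via_zip)  # ('b', 'e', 'i')
```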
Vanilla Python doesn't have two dimensional arrays, but it does allow for extensions to implement them. You have a list of lists, which is somewhat different.
The solution to your problem is to use numpy which does have a 2d array type. You can then say data[:,1]
Why your example doesn't work as you expect: data[:] means "a copy of data" and so data[:][1] means the index 1 element of the copy of data, which is [d,e,f]
It's pretty obvious if you go through what's happening from left to right.
raw_data = Data[:]
will give you the entirety of Data, so the whole list of lists: [[a,b,c],[d,e,f],[h,i,g]]
raw_data = Data[:][1]
will then give you the element at index 1 in this list, which is [d,e,f].
On the other hand,
raw_data = Data[1]
will return the element at position 1 in data, which is also [d,e,f].
Applying [:] to this list then returns a copy of it in its entirety.
What you are trying to do is best done with a list comprehension, such as:
raw_data = [x[1] for x in Data]
This will give you a list of the second elements of all the lists in Data.

Python : DIY generalize this "all_subsets" function to any size subsets

Implementing a toy Apriori algorithm for a small-data association rule mine, I have a need for a function to return all subsets.
The length of the subsets is given by the parameter i. I need to generalize this function for any i. The cases i = 1 and i = 2 are trivial, and the general pattern can be seen: a list of tuples of length i, where an order is imposed to prevent duplicates.
def all_subsets(di,i):
    if i == 1:
        return di
    elif i == 2:
        return [(d1,d2) for d1 in di for d2 in di if d1 < d2]
    else:
        return [ ... ]
How can I generalize this i nested loops pattern in a concise manner, say using list comprehensions, generators or some "functional programming" concepts?
I was thinking of some kind of list of functions, but I don't really know how I can generalize i nested loops. Any hints or full answers will be treated as awesome.
Instead of rolling your own, you could use itertools.combinations().
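As a minimal sketch of that suggestion, the function below keeps the question's name all_subsets but delegates to itertools.combinations. Note that it returns tuples for every size, including size 1, which differs slightly from the question's i == 1 case.

```python
from itertools import combinations


def all_subsets(items, size):
    """Return all subsets of `items` with `size` elements, as tuples,
    in the order the elements appear in `items`."""
    return list(combinations(items, size))


pairs = all_subsets(['c', 'r', 'i', 's'], 2)
print(pairs)
# [('c', 'r'), ('c', 'i'), ('c', 's'), ('r', 'i'), ('r', 's'), ('i', 's')]
```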
You mention in a comment that the code here is opaque to you. But it's probably the best way of implementing a combinations function of the kind you're aiming for, and it's worth understanding, so I'll try to explain it in detail.
The basic idea is that given a sequence and a number of items to choose, we can represent each combination as a sequence of indices into the given sequence. So for example, say we have a list ['a', 'b', 'c', 'd', 'e'], and we want to generate all combinations of two values from that list.
Our first combination looks like this...
['a', 'b', 'c', 'd', 'e']
  ^    ^
...and is represented by the list of indices [0, 1]. Our next combination looks like this:
['a', 'b', 'c', 'd', 'e']
  ^         ^
And is represented by the list of indices [0, 2].
We keep moving the second caret forward, keeping the first in place, until the second caret reaches the end. Then we move the first caret to index 1 and "reset" the process by moving the second caret back to index 2.
['a', 'b', 'c', 'd', 'e']
       ^    ^
Then we repeat the process, moving the second caret forward until it reaches the end, and then moving the first forward by one and resetting the second.
Now we have to figure out how to do this by manipulating the list of indices. It turns out that this is quite simple. The final combination will look like this:
['a', 'b', 'c', 'd', 'e']
                 ^    ^
And the index representation of this will be [3, 4]. These are the maximum possible values for the indices, and are equal to i + n - r, where i is the position in the list, n is the number of values (5 in this case), and r is the number of choices (2 in this case). So as soon as a particular index reaches this value, it can go no higher, and will need to be "reset".
So with that in mind, here's a step-by-step analysis of the code:
def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
First, given input based on the above example, pool is our list of characters converted into a tuple, and n is simply the number of items in the pool.
    if r > n:
        return
We can't select more than n items from an n item list without replacement, so we simply return in that case.
    indices = list(range(r))  # a list, so the indices can be modified in place
Now we have our indices, initialized to the first combination ([0, 1]). So we yield it:
    yield tuple(pool[i] for i in indices)
Then we generate the remaining combinations, using an infinite loop.
    while True:
Inside the loop, we first step backwards through the list of indices, searching for an index that hasn't reached its maximum value yet. We use the formula described above (i + n - r) to determine the maximum value for a given index. If we find an index that hasn't reached its maximum value, then we break out of the loop.
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
If we don't find one, then all the indices are at their maximum values, and so we're done iterating. (This uses the little-known for-else construct; the else block is executed only if the for loop finishes without hitting break.)
        else:
            return
So now we know that index i needs to be incremented:
        indices[i] += 1
Additionally, the indices after i are all at their maximum values, and so need to be reset.
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
Now we have the next set of indices, so we yield another combination.
        yield tuple(pool[i] for i in indices)
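Put together, the fragments walked through above form the generator below. It is renamed combinations_sketch here (a name chosen for illustration) so that it can be compared against itertools.combinations, and it builds the indices with list(range(r)) so that they can be modified in Python 3.

```python
from itertools import combinations as reference_combinations


def combinations_sketch(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))              # first combination: [0, 1, ..., r-1]
    yield tuple(pool[i] for i in indices)
    while True:
        # Find the rightmost index that is not yet at its maximum, i + n - r.
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return                        # every index is at its maximum
        indices[i] += 1
        # Reset the indices to the right of i to follow it consecutively.
        for j in range(i + 1, r):
            indices[j] = indices[j - 1] + 1
        yield tuple(pool[i] for i in indices)


result = list(combinations_sketch('abcde', 2))
print(result[0], result[-1], len(result))  # ('a', 'b') ('d', 'e') 10
print(result == list(reference_combinations('abcde', 2)))  # True
```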
There are several variations on this approach; in another, instead of stepping backwards through the indices, you step forward, incrementing the first index that has a "gap" between it and the following index, and resetting the lower indices.
Finally, you could also define this recursively, although pragmatically, the recursive definition probably won't be as efficient.
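One possible recursive definition, as a sketch rather than a tuned implementation: pick each element in turn as the first item, then combine it with every (r-1)-combination of the elements that follow it. The name combinations_recursive is hypothetical.

```python
def combinations_recursive(items, r):
    """Yield r-length combinations of `items` as tuples, in input order."""
    items = list(items)
    if r == 0:
        yield ()          # exactly one way to choose nothing
        return
    for i, first in enumerate(items):
        # Only elements after position i may follow `first`.
        for rest in combinations_recursive(items[i + 1:], r - 1):
            yield (first,) + rest


triples = list(combinations_recursive(['a', 'b', 'c', 'd'], 3))
print(triples)
# [('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'), ('b', 'c', 'd')]
```

The repeated slicing copies the tail of the list at every level, which is one reason this version is likely to be slower than the iterative one.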
Then you are not doing Apriori.
In Apriori, you never enumerate all subsets of size k, except for k=1.
In any larger size, you construct the combinations according to Apriori-Gen.
That is much more efficient, and actually at least as easy as manually building all combinations.
Here is an example. Assuming the following itemsets were found frequent:
ABCD
ABCF
ABEF
ABDF
ACDF
BCDF
Then apriori will construct only one single candidate (by prefix rule!):
ABC + D - ABC + D + F
ABC + F /
And then it will next check whether the other subsets were also found frequent, i.e.
BCDF
ACDF
ABDF
Since all of them were in the previous round, this candidate survives and will be tested in the next linear scan over the data set.
Apriori is all about not having to check all subsets of size k, but only those that have a chance to be frequent, given the previous knowledge.
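The join-and-prune idea described above can be sketched as follows. This is an illustrative reading of the candidate generation step, not a reference implementation: the function name apriori_gen and the representation of itemsets as sorted tuples are assumptions.

```python
from itertools import combinations


def apriori_gen(frequent):
    """Build size k+1 candidates from frequent itemsets of size k.

    `frequent` holds equal-length itemsets whose items are in sorted order.
    """
    frequent = sorted(tuple(itemset) for itemset in frequent)
    frequent_set = set(frequent)
    k = len(frequent[0]) if frequent else 0
    candidates = []
    for a, b in combinations(frequent, 2):
        # Join step (prefix rule): the first k-1 items must be identical.
        if a[:-1] == b[:-1]:
            candidate = a + (b[-1],)
            # Prune step: every size-k subset must itself be frequent.
            if all(sub in frequent_set for sub in combinations(candidate, k)):
                candidates.append(candidate)
    return candidates


found = apriori_gen(["ABCD", "ABCF", "ABEF", "ABDF", "ACDF", "BCDF"])
print(found)  # [('A', 'B', 'C', 'D', 'F')]
```

With the six itemsets from the example, only ABCD and ABCF share a three-item prefix, so ABCDF is the single candidate, and it survives pruning because ABDF, ACDF and BCDF are all frequent.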
Okay, here is my own hand-rolled version:
def all_subsets(source,size):
    index = len(source)
    index_sets = [()]
    for sz in range(size):
        next_list = []
        for s in index_sets:
            si = s[len(s)-1] if len(s) > 0 else -1
            next_list += [s+(i,) for i in range(si+1,index)]
        index_sets = next_list
    subsets = []
    for index_set in index_sets:
        rev = [source[i] for i in index_set]
        subsets.append(rev)
    return subsets
Yields:
>>> Apriori.all_subsets(['c','r','i','s'],2)
[['c', 'r'], ['c', 'i'], ['c', 's'], ['r', 'i'], ['r', 's'], ['i', 's']]
