How to select all the permutations - Python

I want to get a list of all permutations (or the ranks of all permutations) where the ith element is k; the length, labeled n, is greater than k. A list of the integers 1..n should be permuted. How can this be done?
For the first element of the permutation it's trivial. But how does it work for the ith element? Iterating through all n! permutations is not an option.

First of all, notice that this problem can easily be transformed into just ranking/listing permutations. All that you need to do is write a function that takes a permutation of 1..(n-1) and transforms it into a permutation meeting your condition, and vice versa. (Going one way, increment every number in the permutation that is greater than or equal to k and insert k at the ith position. Going the other way, remove the k and decrement everything larger than k.)
But ranking/listing is a well-understood problem. See https://rosettacode.org/wiki/Permutations/Rank_of_a_permutation for solutions in multiple languages, including three in Python.
This idea can be extended to more conditions like the first one; you just need to write more general transforms first.
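For illustration, here is a minimal sketch of the two transforms in Python (the names lift and drop are made up for this example); combined with any ranking/listing routine for permutations of 1..(n-1), this handles the condition:

def lift(perm, i, k):
    # turn a permutation of 1..n-1 into a permutation of 1..n with k at index i (0-based)
    shifted = [x + 1 if x >= k else x for x in perm]  # make room for the value k
    return shifted[:i] + [k] + shifted[i:]

def drop(perm, k):
    # inverse transform: remove k and shift the larger values back down
    return [x - 1 if x > k else x for x in perm if x != k]

p = [2, 1, 4, 3]       # a permutation of 1..4
q = lift(p, 1, 3)      # [2, 3, 1, 5, 4] -- a permutation of 1..5 with 3 at index 1
assert drop(q, 3) == p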

Warning: the number of matching permutations is (n-1)!, and their total size is O(n*(n-1)!).
import itertools
i, k, n = 1, 5, 10  # pick these; i is a 0-based index
permutations = [p for p in itertools.permutations(range(1, n + 1)) if p[i] == k]


How can I scramble an array in linear time without having duplicate elements next to one another?

For example:
T = ['b', 'c', 'b', 'b', 'a', 'c'] # each element represents questions related to topics a, b, c
What I want is to make sure that no 2 questions from the same topic are next to one another (see T, where b, b are next to each other).
So I want to rearrange T in such a way that no two questions belonging to the same topic are next to each other, i.e. Tnew = [b, c, b, a, b, c].
But the condition is that we have to do it in linear time, i.e. O(n).
The algorithm that I thought of:
1) Create a dict or map to hold the occurrence count of each topic:
a --> 1
b --> 3
c --> 2
2) Now based on the counts we can create a new array such that:
A = [a, b, b, b, c, c]
3) Now perform unsorting of the array, which I believe runs in O(n).
(Unsorting is basically finding the midpoint and then merging the elements alternately from each half.)
Can someone please help me design pseudocode or an algorithm that can do this better on any input with k topics?
This is a random question that I am practicing for an exam.
There is an approach with time complexity O(n log c), where c is the number of unique items and n is the total number of items.
It works as follows:
Create a frequency table of the items. This is just a list of tuples that tells how many of each item are available. Let's store the tuples in the list as (quantity, item). This step is O(n). You could use collections.Counter in Python, or collections.defaultdict(int), or a vanilla dict.
Heapify the list of tuples into a max heap. This can be done in O(n). This heap has the items with the largest quantity at the front. You could use the heapq module in Python. Let's call this heap hp.
Have a list for the results called res.
Now run a loop while len(hp) > 0: and do as follows:
pop the 2 largest elements from the heap. O(log c) operation.
add one from each to res. Make sure you handle edge cases properly, if any.
decrement the quantity of both items. If their quantity > 0 push them back on the heap. O(log c) operation.
At the end, you could be left with one item that has no peer to interleave with. This can happen if the quantity of one item is larger than the sum of the quantities of all the other items. But there's no way around this; your input data must respect this condition.
One final note about time complexity: if the number of unique items is constant, we can drop the log c factor from the time complexity and consider the algorithm linear. This is mainly a matter of how we define things.
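If it helps, here is a rough sketch of this heap-based approach (the function name rearrange is just illustrative). heapq is a min heap, so quantities are stored negated to get max-heap behavior:

import heapq
from collections import Counter

def rearrange(items):
    counts = Counter(items)                       # O(n) frequency table
    hp = [(-qty, item) for item, qty in counts.items()]
    heapq.heapify(hp)                             # max heap via negated quantities
    res = []
    while len(hp) >= 2:
        q1, a = heapq.heappop(hp)                 # the two most frequent remaining items,
        q2, b = heapq.heappop(hp)                 # O(log c) per pop
        res.extend([a, b])                        # emit one of each
        if q1 + 1 < 0:
            heapq.heappush(hp, (q1 + 1, a))       # decrement and push back if any left
        if q2 + 1 < 0:
            heapq.heappush(hp, (q2 + 1, b))
    if hp:                                        # at most one item type left over
        qty, a = hp[0]
        res.extend([a] * -qty)                    # more than one copy left means the input
                                                  # had no valid arrangement to begin with
    return res

print(rearrange(['b', 'c', 'b', 'b', 'a', 'c']))  # e.g. ['b', 'c', 'b', 'a', 'b', 'c']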
Here's the O(n) solution (inspired in part by #user1984's answer):
Imagine you know how many of each element to insert, and have ordered these counts. Say we then decided to build up a solution by interleaving groups of elements incrementally. We start off with just our group of elements G0 with lowest frequency. Then, we take the next most popular group G1, and interleave these values into our existing list.
If we were to continue in this fashion, we could observe a few rules:
if the current group G has more elements than all other smaller groups combined plus one, then:
the result will have elements of G neighboring each other
regardless of its prior state, no elements from the smaller groups will neighbor each other (neither inter-group nor intra-group)
otherwise
the result will have no elements of G neighboring each other
regardless, G contains enough elements to separate the individual elements of the next smaller group, if positioned wisely.
With this in mind, we can see (recursively) that as long as we shift our interleaving to overlap any outstanding neighbor violations, we can guarantee that as long as G has fewer elements than the smaller groups combined plus two, the overall result absolutely will not have any neighbor violations.
Of course, the logic I outlined above for computing this poses some performance issues, so we're going to compute the same result in a slightly different, but equivalent, way. We'll instead insert the largest group first, and work our way smaller.
First, create a frequency table. You need to form an item: quantity mapping, so something like collections.Counter() in Python. This takes O(N) time.
Next, order this mapping by frequency. This can be done in O(N) time using counting sort, since the counts are integers bounded by N. Note there are c distinct elements, but c <= N, so O(c) is O(N).
After that, build a linked list of length N, with node values from [0, N) (ascending). This will help us track which indices to write into next.
For each item in our ordered mapping, iterate from 0 to the associated count (exclusive). Each iteration, remove the current element from the linked list ((re)starting at the head), and traverse two nodes forward in the linked list. Insert the item/number into the destination array at the index of each removed node. This will take O(N) time since we traverse ~2k nodes per item (where k is the group size), and the combined size of all groups is N, so ~2N traversals. Each removal can be performed in O(1) time, so O(N) for traversing and removing.
So all in all, this will take O(N) time, utilizing linked lists, hash tables (or O(1) access mappings of some sort), and counting sort.
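For what it's worth, here is a rough sketch of that procedure (the names rearrange_linear, nxt, prv are just illustrative); it assumes a feasible input, i.e. no item occurs more than ceil(N/2) times:

from collections import Counter

def rearrange_linear(items):
    n = len(items)
    counts = Counter(items)                              # O(N) frequency table

    # counting sort of the groups by frequency, largest group first: O(N)
    buckets = [[] for _ in range(n + 1)]
    for item, qty in counts.items():
        buckets[qty].append(item)
    groups = [(item, qty) for qty in range(n, 0, -1) for item in buckets[qty]]

    # doubly linked list over the indices 0..n-1, with node n acting as a sentinel
    nxt = [i + 1 for i in range(n)] + [0]                # nxt[n] is always the current head
    prv = [n] + [i for i in range(n - 1)] + [n - 1]

    def unlink(i):                                       # O(1) removal from the list
        nxt[prv[i]], prv[nxt[i]] = nxt[i], prv[i]

    res = [None] * n
    for item, qty in groups:
        cur = nxt[n]                                     # (re)start at the head
        for _ in range(qty):
            if cur == n:                                 # ran off the end: wrap to the head
                cur = nxt[n]
            target = cur
            cur = nxt[nxt[target]]                       # traverse two nodes forward
            unlink(target)
            res[target] = item                           # write into the freed index
    return res

print(rearrange_linear(['b', 'c', 'b', 'b', 'a', 'c']))  # ['b', 'c', 'b', 'a', 'b', 'c']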
The collections module can help. Using Counter gets you the number of occurrences of each question in O(n) time. Converting those into iterators in a deque will allow you to interleave the questions sequentially, but you need to process them in decreasing order of occurrences. Getting the counters in order of frequency would normally require a sort, which is O(n log n), but you can use a pigeonhole approach to group the iterators by common frequency in O(n) time, then go through the groups in reverse order of frequency. The maximum number of distinct frequencies is knowable and will be less than or equal to √(0.25+2n) - 0.5, which is O(√n).
There will be at most n iterators, so building the deque will be <= O(n). Going through the iterators until exhaustion will take at most 2n iterations:
T = ['b', 'c', 'b', 'b', 'a', 'c']
from collections import Counter, deque
from itertools import repeat

result = []
counts = Counter(T)                                    # count occurrences O(n)
common = [[] for _ in range(max(counts.values()))]     # frequency groups
for t, n in counts.items():
    common[n-1].append(repeat(t, n))                   # iterators by freq.
Q = deque(iq for cq in reversed(common) for iq in cq)  # queue iterators
while Q:                                               # 2n iterations or less
    iq = Q.popleft()                                   # O(1) - extract first iterator
    q = next(iq, None)                                 # O(1) - next question
    if not q: continue                                 #      - exhausted, removed from deque
    result.append(q)                                   # O(1) - add question to result
    Q.insert(1, iq)                                    # O(1) - put back iterator as 2nd
print(result)
['b', 'c', 'b', 'c', 'b', 'a']

find longest noncontiguous nondecreasing subsequence in array

I want to prove that finding the longest non-contiguous, non-decreasing subsequence of an array of size n cannot be done in O(n).
By "find" I mean know its length, and a list of the relevant indices.
Here is a solution in O(n log n).
Here is the Wikipedia article.
I want to convince myself that it can't be done any faster.
Here is a partial proof:
Assume this were possible faster than O(n log n); for simplicity, say O(n), but the argument holds for anything better than O(n log n).
We can merge two sorted arrays into a single sorted array composed of all elements of both in O(n1 + n2).
Given an array A, we could then find its longest non-contiguous non-decreasing subsequence in O(n).
If this sequence is shorter than n/2, then for reversed(A) it is of length at least n/2 [I need proof for that].
This way, we can split the array into sorted chunks, each time in O(n), being left with a remainder of size k that can also be split and sorted in O(k) + O(remainder), until we are left with one element, which is O(1).
Thus, sorting the array would take O(n).
Here is a solution in O(n + k log k), where k is the number of elements that are in an unsorted position.
Armed with the above algorithm, we can find which of the k elements are out of position by sorting with that algorithm in O(n + k log k) and traversing the unsorted and the sorted arrays together, finding the indices of the k (or fewer) differences (single pass, O(n)).
The rest of the indices define a sorted array which is also a non-consecutive subsequence of the input array, of maximal length, because by definition those indices are sorted and the other k aren't.
In general, this is O(n log n), but in case there is a need to search for a longest non-consecutive subsequence, it would make sense to optimize like this, because the problem will probably be such that k << n.

Limiting the number of combinations/permutations in Python

I was going to generate some combinations using itertools when I realized that as the number of elements increases, the time taken will increase exponentially. Can I limit or indicate the maximum number of permutations to be produced so that itertools would stop after that limit is reached?
What I mean to say is:
Currently i have
#big_list is a list of lists
permutation_list = list(itertools.product(*big_list))
Currently this permutation list has over 6 million permutations. I am pretty sure that if I add another list, this number would hit the billion mark.
What I really need is a significant number of permutations (let's say 5000). Is there a way to limit the size of the permutation_list that is produced?
You need to use itertools.islice, like this
itertools.islice(itertools.product(*big_list), 5000)
It doesn't create the entire list in memory, but it returns an iterator which consumes the actual iterable lazily. You can convert that to a list like this
list(itertools.islice(itertools.product(*big_list), 5000))
itertools.islice has many benefits, such as the ability to set start and step. The solutions below aren't that flexible, and you should use them only if start is 0 and step is 1. On the other hand, they don't require any imports.
You could create a tiny wrapper around itertools.product
it = itertools.product(*big_list)
pg = (next(it) for _ in range(5000)) # generator expression
(next(it) for _ in range(5000)) returns a generator not capable of producing more than 5000 values. Convert it to a list by using the list constructor
pl = list(pg)
or by wrapping the generator expression with square brackets (instead of round ones)
pl = [next(it) for _ in range(5000)] # list comprehension
Another solution, which is just as efficient as the first one, is
pg = (p for p, _ in zip(itertools.product(*big_list), range(5000)))
Works in Python 3+, where zip returns an iterator that stops when the shortest iterable is exhausted. Conversion to list is done as in the first solution.
You can try this method to get a particular number of permutations. The number of results a permutation produces is n!, where n stands for the number of elements in the list. For example, if you want to get only 2 results, you can try the following:
Use a temporary variable and limit it:
from itertools import permutations

m = ['a', 'b', 'c', 'd']
per = permutations(m)
temp = 1
for i in per:           # iterate lazily; wrapping per in list() would build all n! tuples first
    if temp <= 2:       # 2 is the limit set
        print(i)
        temp = temp + 1
    else:
        break

Given a list L labeled 1 to N, and a process that "removes" a random element from consideration, how can one efficiently keep track of min(L)?

The question is pretty much in the title, but say I have a list L
L = [1,2,3,4,5]
min(L) = 1 here. Now I remove 4. The min is still 1. Then I remove 2. The min is still 1. Then I remove 1. The min is now 3. Then I remove 3. The min is now 5, and so on.
I am wondering if there is a good way to keep track of the min of the list at all times without needing to do min(L) or scanning through the entire list, etc.
There is an efficiency cost to actually removing the items from the list because it has to move everything else over. Re-sorting the list each time is expensive, too. Is there a way around this?
To remove a random element you need to know what elements have not been removed yet.
To know the minimum element, you need to sort or scan the items.
A min heap implemented as an array neatly solves both problems. The cost to remove an item is O(log N) and the cost to find the min is O(1). The items are stored contiguously in an array, so choosing one at random is very easy, O(1).
The min heap is described on this Wikipedia page
BTW, if the data are large, you can leave them in place and store pointers or indexes in the min heap and adjust the comparison operator accordingly.
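As a rough sketch of how arbitrary removal works on an array-backed min heap (the helpers below are written out by hand for illustration; heapq itself only exposes push/pop):

import heapq
import random

def _sift_up(heap, i):
    # move heap[i] toward the root while it is smaller than its parent
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            break

def _sift_down(heap, i):
    # move heap[i] toward the leaves while it is larger than one of its children
    n = len(heap)
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def remove_at(heap, i):
    # remove heap[i] in O(log n): swap in the last element and re-sift it
    last = heap.pop()
    if i < len(heap):
        heap[i] = last
        if i > 0 and heap[i] < heap[(i - 1) // 2]:
            _sift_up(heap, i)
        else:
            _sift_down(heap, i)

L = [1, 2, 3, 4, 5]
heapq.heapify(L)                            # O(n)
while L:
    remove_at(L, random.randrange(len(L)))  # remove a random surviving element, O(log n)
    if L:
        print("min is now", L[0])           # the current minimum, O(1)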
Google for self-balancing binary search trees. Building one from the initial list takes O(n lg n) time, and finding and removing an arbitrary item will take O(lg n) (instead of O(n) for finding/removing from a simple list). The smallest item will always appear at the leftmost node of the tree.
This question may be useful. It provides links to several implementations of various balanced binary search trees. The advice there to use a hash table does not apply well to your case, since a hash table does not address maintaining a minimum item.
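If you would rather not implement the tree yourself, here is a sketch using the third-party sortedcontainers package (not in the standard library), which plays the same role:

from sortedcontainers import SortedList    # pip install sortedcontainers

L = SortedList([1, 2, 3, 4, 5])
for x in [4, 2, 1, 3]:
    L.remove(x)       # find and remove an arbitrary item, roughly O(log n)
    if L:
        print(L[0])   # smallest remaining item: prints 1, 1, 3, 5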
Here's a solution that needs O(N lg N) preprocessing time, O(lg N) per delete, and O(lg^2 N) per find-min query.
Preprocessing:
Step 1: sort L.
Step 2: for each item L[i], map L[i] -> i.
Step 3: build a Binary Indexed Tree or segment tree where, for every 1 <= i <= length of L, BIT[i] = 1, and keep the sums of the ranges.
Query type delete:
Step 1: if an item x is to be removed, find its index with a binary search on the sorted array L, or from the mapping. Set BIT[index[x]] = 0 and update all the ranges. Runtime: O(lg N).
Query type findMin:
Step 1: do a binary search over array L. For every mid, find the prefix sum on the BIT from 1 to mid. If that sum > 0, then we know some value <= L[mid] is still alive, so we set hi = mid - 1; otherwise we set lo = mid + 1. Runtime: O(lg^2 N).
Same can be done with Segment tree.
Edit: if I'm not wrong, each query can be processed in O(1) with a linked list.
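Here is a rough sketch of that Binary Indexed Tree approach (the class and function names are just illustrative), using the example from the question:

class BIT:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta):            # add delta at 1-based position i, O(lg N)
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):               # sum of positions 1..i, O(lg N)
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

L = [1, 2, 3, 4, 5]
sorted_L = sorted(L)                               # preprocessing: O(N lg N)
rank = {v: i + 1 for i, v in enumerate(sorted_L)}  # value -> 1-based sorted index
bit = BIT(len(L))
for i in range(1, len(L) + 1):
    bit.update(i, 1)                               # mark every item as alive

def delete(x):                                     # O(lg N)
    bit.update(rank[x], -1)

def find_min():                                    # O(lg^2 N): binary search on prefix sums
    lo, hi, ans = 1, len(L), None
    while lo <= hi:
        mid = (lo + hi) // 2
        if bit.prefix_sum(mid) > 0:                # some value <= sorted_L[mid-1] is alive
            ans, hi = mid, mid - 1
        else:
            lo = mid + 1
    return sorted_L[ans - 1] if ans else None

for x in [4, 2, 1, 3]:
    delete(x)
    print(find_min())                              # prints 1, 1, 3, 5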
If sorting isn't in your best interest, I would suggest only doing comparisons where you need to do them. If you remove elements that are not the old minimum, and you aren't inserting any new elements, there isn't a rescan necessary for a minimum value.
Can you give us some more information about the processing going on that you are trying to do?
Comment answer: you don't have to compute min(L). Just keep track of its index, and then only re-run the scan for min(L) when you remove at (or below) the old index (and make sure you track it accordingly).
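A tiny sketch of that idea (the variable names are just for illustration), using the example from the question:

L = [1, 2, 3, 4, 5]
alive = set(range(len(L)))                   # indices still in play
min_idx = min(alive, key=L.__getitem__)

def remove(i):
    global min_idx
    alive.discard(i)
    if i == min_idx and alive:               # only rescan when the old min is removed
        min_idx = min(alive, key=L.__getitem__)

for i in [3, 1, 0, 2]:                       # remove 4, 2, 1, 3 (by index)
    remove(i)
    if alive:
        print(L[min_idx])                    # prints 1, 1, 3, 5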
Your current approach of rescanning when the minimum is removed is O(1)-time in expectation for each removal (assuming every item is equally likely to be removed).
Given a list of n items, a rescan is necessary with probability 1/n, so the expected work at each step is n * 1/n = O(1).

Sum of maximal element from every possible subset of size 'k' of an array

I have a very large list comprising about 10,000 elements, and each element is an integer as big as 5 billion. I would like to find the sum of the maximal elements from every possible subset of size k (given by the user) of an array whose maximum size is 10,000 elements. The only solution that comes to mind is to generate each of the subsets (using itertools) and find its maximum element. But this would take an insane amount of time! What would be a Pythonic way to solve this?
Don't use Python, use mathematics first. This is a combinatorial problem: if you have an array S of n numbers (n large) and generate all possible subsets of size k, you want to calculate the sum of the maximal elements of the subsets.
Assuming the numbers are all distinct (though it also works if they are not), you can calculate exactly how often each one will appear as a subset maximum, and go on from there without ever actually constructing a subset. You should have taken it over to math.stackexchange.com; they'd have sorted you out in a jiffy. Here it is, but without the nice math notation:
Sort your array in increasing order and let S_1 be the smallest (first) number,
S_2 the next smallest, and so on. (Note: Indexing from 1).
S_n, the largest element, is obviously the maximal element of any subset
it is part of, and there are exactly (n-1 choose k-1) such subsets.
Of the subsets that don't contain S_n, there are (n-2 choose k-1)
subsets that contain S_{n-1}, in which it is the largest element.
Continue this until you come down to S_k, the k-th smallest number
(counting from the smallest), which will be the maximum of exactly one
subset: (k-1 choose k-1) = 1. Smaller numbers (S_1 to S_{k-1})
can never be maximal: Every set of k elements will contain something
larger.
Sum the above (n-k+1 terms), and there's your answer:
S_n*(n-1 choose k-1) + S_{n-1}*(n-2 choose k-1) + ... + S_k*(k-1 choose k-1)
Writing the terms from smallest to largest, this is just the sum
Sum(i=k..n) S_i * (i-1 choose k-1)
If we were on math.stackexchange you'd get it in the proper mathematical notation, but you get the idea.
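In Python, the closed form above is a couple of lines with math.comb (the function name below is just illustrative):

from math import comb
from itertools import combinations

def sum_of_subset_maxima(values, k):
    s = sorted(values)                     # S_1 <= S_2 <= ... <= S_n
    n = len(s)
    # S_i (1-indexed) is the maximum of exactly (i-1 choose k-1) subsets
    return sum(s[i - 1] * comb(i - 1, k - 1) for i in range(k, n + 1))

# sanity check against brute force on a small input
vals, k = [7, 1, 5, 3], 2
assert sum_of_subset_maxima(vals, k) == sum(max(c) for c in combinations(vals, k))
print(sum_of_subset_maxima(vals, k))       # 34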
