For example:
T = [b, c, b, b, a, c]  # each element represents a question on topic a, b, or c
What I want is to make sure that no two questions from the same topic are next to one another (see T above, where b, b are next to each other).
So I want to rearrange T in such a way that no two questions belonging to the same topic are next to each other, i.e. Tnew = [b, c, b, a, b, c].
But the condition is that we have to do it in linear time, i.e. O(n).
The algorithm that I thought of:
1) Create a dict or map to hold the occurrences of each topic:
a --> 1
b --> 3
c --> 2
2) Now, based on the counts, we can create a new array such that:
A = [a, b, b, b, c, c]
3) Now perform an "unsorting" of the array, which I believe runs in O(n).
(Unsorting basically means: find the midpoint and then merge the elements alternately from each half; see the sketch below.)
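For example, a rough sketch of what I mean by unsorting:

def unsort(A):
    # split at the midpoint and take elements alternately from each half
    mid = (len(A) + 1) // 2
    first, second = A[:mid], A[mid:]
    out = []
    for i in range(mid):
        out.append(first[i])
        if i < len(second):
            out.append(second[i])
    return out

# e.g. unsort(['a', 'b', 'b', 'b', 'c', 'c']) -> ['a', 'b', 'b', 'c', 'b', 'c']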
Can someone please help me design pseudocode or an algorithm that can do this correctly on any input with k topics?
This is a practice question I am working through for an exam.
There is an approach with time complexity O(n log c), where c is the number of unique items and n is the total number of items.
It works as follows:
Create a frequency table of the items. This is just a list of tuples telling you how many of each item are available. Let's store the tuples in the list as (quantity, item). This step is O(n). You could use collections.Counter in Python, or collections.defaultdict(int), or a vanilla dict.
Heapify the list of tuples into a max heap. This can be done in O(n). This heap has the items with the largest quantities at the front. You could use the heapq module in Python. Let's call this heap hp.
Have a list for the results called res.
Now run a loop while len(hp) > 0: and do as follows:
pop the 2 largest elements from the heap: an O(log c) operation.
add one of each to res. Make sure you handle the edge cases properly, if any.
decrement the quantity of both items. If a quantity is still > 0, push that item back onto the heap: an O(log c) operation.
At the end, you could be left with one item that has no peers to interleave with. This can happen if the quantity of one item is larger than the sum of the quantities of all the other items. There is no way around this; your input data must respect this condition.
One final note about time complexity: if the number of unique items is constant, we can drop the log c factor and consider the algorithm linear. This is mainly a matter of how we define things.
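A minimal sketch of this heap-based approach using heapq (which is a min heap, so quantities are negated; returning None for impossible inputs is just a choice made for this sketch):

import heapq
from collections import Counter

def rearrange(T):
    counts = Counter(T)                         # frequency table, O(n)
    hp = [(-qty, item) for item, qty in counts.items()]
    heapq.heapify(hp)                           # max heap via negated counts
    res = []
    while hp:
        if len(hp) == 1:                        # edge case: one item type left
            qty, item = hp[0]
            if -qty > 1:
                return None                     # more copies left than can be separated
            res.append(item)
            break
        q1, i1 = heapq.heappop(hp)              # two most frequent items, O(log c) each
        q2, i2 = heapq.heappop(hp)
        res.extend([i1, i2])                    # interleave one of each
        if q1 + 1 < 0:                          # decrement, push back if any remain
            heapq.heappush(hp, (q1 + 1, i1))
        if q2 + 1 < 0:
            heapq.heappush(hp, (q2 + 1, i2))
    return res

print(rearrange(['b', 'c', 'b', 'b', 'a', 'c']))  # e.g. ['b', 'c', 'b', 'a', 'b', 'c']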
Here's the O(n) solution (inspired in part by #user1984's answer):
Imagine you know how many of each element to insert, and have ordered these counts. Say we then decided to build up a solution by interleaving groups of elements incrementally. We start off with just our group of elements G0 with lowest frequency. Then, we take the next most popular group G1, and interleave these values into our existing list.
If we were to continue in this fashion, we could observe a few rules:
if the current group G has more elements than all other smaller groups combined plus one, then:
the result will have elements of G neighboring each other
regardless of its prior state, no elements from the smaller groups will neighbor each other (neither inter-group nor intra-group)
otherwise:
the result will have no elements of G neighboring each other
either way, G contains enough elements to separate individual elements of the next smaller group, if positioned wisely.
With this in mind, we can see (recursively) that if we shift our interleaving to overlap any outstanding neighbor violations, then as long as G has fewer elements than the smaller groups combined plus two, the overall result will not have any neighbor violations.
Of course, the logic I outlined above for computing this poses some performance issues, so we're going to compute the same result in a slightly different, but equivalent, way. We'll instead insert the largest group first, and work our way smaller.
First, create a frequency table. You need to form an item: quantity mapping, so something like collections.Counter() in Python. This takes O(N) time.
Next, order this mapping by frequency. This can be done in O(N) time using counting sort, since all counts are integers bounded by N. Note there are c distinct items, but c <= N, so O(c) is O(N).
After that, build a linked list of length N, with node values from [0, N) (ascending). This will help us track which indices to write into next.
For each item in our ordered mapping, iterate from 0 to the associated count (exclusive). Each iteration, remove the current element from the linked list ((re)starting at the head), and traverse two nodes forward in the linked list. Insert the item into the destination array at the index of each removed node. This takes O(N) time since we traverse ~2k nodes per item (where k is the group size), and the combined size of all groups is N, so ~2N traversals. Each removal can be performed in O(1) time, so O(N) overall for traversing and removing.
So all in all, this will take O(N) time, utilizing linked lists, hash tables (or O(1) access mappings of some sort), and counting sort.
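As a rough illustration of this approach (a sketch only: it uses plain index arithmetic in place of the explicit linked list of indices, which visits the same positions in the same order):

from collections import Counter

def interleave(T):
    counts = Counter(T)                        # O(n) frequency table
    n = len(T)
    buckets = [[] for _ in range(n + 1)]       # counting sort by frequency
    for item, cnt in counts.items():
        buckets[cnt].append(item)
    result = [None] * n
    idx = 0                                    # next index to fill
    for cnt in range(n, 0, -1):                # largest groups first
        for item in buckets[cnt]:
            for _ in range(cnt):
                result[idx] = item
                idx += 2                       # skip one position each time
                if idx >= n:
                    idx = 1                    # wrap around to the odd indices
    return result

print(interleave(['b', 'c', 'b', 'b', 'a', 'c']))  # ['b', 'c', 'b', 'c', 'b', 'a']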
The collections module can help. Using Counter gets you the number of occurrences of each question in O(n) time. Converting those into iterators held in a deque allows you to interleave the questions sequentially, but you need to process them in decreasing order of occurrences. Getting the counters in order of frequency would normally require a sort, which is O(n log n), but you can use a pigeonhole approach to group the iterators by common frequency in O(n) time, then go through the groups in reverse order of frequency. The maximum number of distinct frequencies is bounded by √(0.25 + 2n) − 0.5, which is O(√n).
There will be at most n iterators, so building the deque is O(n). Going through the iterators until exhaustion will take at most 2n iterations:
from collections import Counter, deque
from itertools import repeat

T = ['b', 'c', 'b', 'b', 'a', 'c']

result = []
counts = Counter(T)                                    # count occurrences O(n)
common = [[] for _ in range(max(counts.values()))]     # frequency groups
for t, n in counts.items():
    common[n - 1].append(repeat(t, n))                 # iterators grouped by frequency
Q = deque(iq for cq in reversed(common) for iq in cq)  # queue iterators, most frequent first
while Q:                                               # 2n iterations or less
    iq = Q.popleft()                                   # O(1) - extract first iterator
    q = next(iq, None)                                 # O(1) - next question
    if q is None: continue                             # exhausted, stays removed from deque
    result.append(q)                                   # O(1) - add question to result
    Q.insert(1, iq)                                    # O(1) - put back iterator as 2nd

print(result)
['b', 'c', 'b', 'c', 'b', 'a']
I want to get a list of all permutations (or the ranks of those permutations) of the integers 1..n in which the ith element is k, where n is greater than k. How can this be done?
For the first element of the permutation it's trivial. But how does it work for the ith element? Iterating through all n! permutations is not an option.
First of all, notice that this problem can easily be transformed into just ranking/listing permutations. All you need to do is write a function that takes a permutation of 1..(n-1) and transforms it into a permutation meeting your condition, and vice versa. (Going one way, increment every number in the permutation that is greater than or equal to k, then insert k in the ith position. Going the other way, remove the k and decrement everything larger than k.)
But ranking/listing is a well-understood problem. See https://rosettacode.org/wiki/Permutations/Rank_of_a_permutation for solutions in multiple languages, including three in Python.
This idea can be extended to more conditions like the first. You just need to write more general transforms first.
Warning: the number of such permutations is (n-1)!, and their total size is O(n*(n-1)!).
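A rough illustration of the transform described above (the function names are just for this sketch):

def insert_k(perm, i, k):
    # perm is a permutation of 1..(n-1); return a permutation of 1..n
    # whose element at position i (0-based) is k
    shifted = [v + 1 if v >= k else v for v in perm]
    return shifted[:i] + [k] + shifted[i:]

def remove_k(perm, i, k):
    # inverse transform: drop the k at position i and shift larger values back down
    rest = perm[:i] + perm[i + 1:]
    return [v - 1 if v > k else v for v in rest]

# e.g. insert_k([2, 1, 3], 1, 2) == [3, 2, 1, 4] and remove_k([3, 2, 1, 4], 1, 2) == [2, 1, 3]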
import itertools

i, k, n = 1, 5, 10  # pick these (note: i is a 0-based index here)
permutations = [p for p in itertools.permutations(range(1, n + 1)) if p[i] == k]
I read that the heapq.merge function is specifically used to merge 2 sorted arrays. Is its time complexity O(n)? If not, what is it, and why? Also, what is its space complexity?
I was solving the problem of merging 2 sorted arrays with 2 pointers and could achieve O(n) time complexity and O(n) space complexity.
If you are merging K sorted arrays and each array has N elements, then heapq.merge will perform the operation in O(NK log K) time: NK is the total number of elements across all K arrays and each element has to be compared, while log K is the cost of the bubble-down operations from the top of the heap (that is, the height of the heap).
In your case K = 2 and log(2) = 1, so in your special case it is O(n).
heapq.merge can be used to merge any number of sorted iterables. Its time complexity is O(N log K), where N is the total number of elements and K is the number of iterables (the items fed into the min heap for comparison).
The space complexity is O(K) because the minheap has K items at any given point in time during the execution.
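For example (heapq.merge returns a lazy iterator; only one element per input iterable sits in the heap at a time, hence the O(K) space):

import heapq

a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]

merged = list(heapq.merge(a, b, c))   # K = 3 sorted iterables, N = 9 elements
print(merged)                         # [1, 2, 3, 4, 5, 6, 7, 8, 9]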
I am currently working on a project that requires me to compute the largest anagram group in a dictionary using a trie, and the returned result must be in alphabetical order. The solution should compute the result in O(C) time and space complexity, where C represents the total number of characters in the dictionary.
I plan to construct the trie by creating a new array of size 26 at each node, where each index of the array represents a letter from A to Z, e.g. array[0] = A, array[1] = B, etc. I will also map each letter from A to Z to a unique prime so that I can compute a unique number for each anagram group.
During insertion, I insert the pointer to the next letter's array, together with the product of the current letter's prime and the previous letters' running product, at the corresponding index.
I am not certain whether my approach is correct, as I feel the space complexity probably does not meet the requirement; any suggestion or hint would be appreciated.
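Roughly, the structure would look something like this (just a sketch; the prime table is simply the first 26 primes, so that the product along a path is the same for every anagram of a word):

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43,
          47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

class TrieNode:
    def __init__(self):
        self.children = [None] * 26   # index 0 = 'a', index 1 = 'b', ...
        self.product = 1              # running product of primes along the path

def insert(root, word):
    node = root
    for ch in word:
        i = ord(ch) - ord('a')
        if node.children[i] is None:
            node.children[i] = TrieNode()
            node.children[i].product = node.product * PRIMES[i]
        node = node.children[i]
    return node.product               # identical for every anagram of word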
I have a very large list comprising about 10,000 elements, and each element is an integer as big as 5 billion. I would like to find the sum of the maximal elements from every possible subset of size k (given by the user) of this array, whose maximum size is 10,000 elements. The only solution that comes to mind is to generate each subset (using itertools) and find its maximum element, but that would take an insane amount of time! What would be a pythonic way to solve this?
Don't use Python, use mathematics first. This is a combinatorial problem: if you have an array S of n numbers (n large) and generate all possible subsets of size k, you want to calculate the sum of the maximal elements of those subsets.
Assuming the numbers are all distinct (though it also works if they are not), you can calculate exactly how often each will appear in a subset, and go on from there without ever actually constructing a subset. You should have taken it over to math.stackexchange.com, they'd have sorted you out in a jiffy. Here it is, but without the nice math notation:
Sort your array in increasing order and let S_1 be the smallest (first) number,
S_2 the next smallest, and so on. (Note: Indexing from 1).
S_n, the largest element, is obviously the maximal element of any subset
it is part of, and there are exactly (n-1 choose k-1) such subsets.
Of the subsets that don't contain S_n, there are (n-2 choose k-1)
subsets that contain S_{n-1}, in which it is the largest element.
Continue this until you come down to S_k, the k-th smallest number
(counting from the smallest), which will be the maximum of exactly one
subset: (k-1 choose k-1) = 1. Smaller numbers (S_1 to S_{k-1})
can never be maximal: Every set of k elements will contain something
larger.
Sum the above (n-k+1 terms), and there's your answer:
S_n*(n-1 choose k-1) + S_{n-1}*(n-2 choose k-1) + ... + S_k*(k-1 choose k-1)
Writing the terms from smallest to largest, this is just the sum
Sum(i=k..n) S_i * (i-1 choose k-1)
If we were on math.stackexchange you'd get it in the proper mathematical notation, but you get the idea.
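For what it's worth, a direct implementation of that sum (using math.comb, available in Python 3.8+), checked against brute force on a tiny input:

from math import comb
from itertools import combinations

def sum_of_subset_maxima(S, k):
    # Sum(i=k..n) S_i * (i-1 choose k-1), with S sorted in increasing order.
    # The math above is 1-indexed; with 0-based indices the term is S[i] * comb(i, k-1).
    S = sorted(S)
    n = len(S)
    return sum(S[i] * comb(i, k - 1) for i in range(k - 1, n))

S = [5, 1, 4, 2]
k = 2
assert sum_of_subset_maxima(S, k) == sum(max(c) for c in combinations(S, k))  # both give 25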