Nested loops-keep the number of iterations constant

By request by #mypetition I am editing my question, although I think that the astronomy details here are unimportant.
I have a file of the form:
a e q Q i lasc aper M H dist comment i_f i_h i_free
45.23710 0.1394 38.93105 51.54315 5.0300 19.9336 286.2554 164.9683 8.41 51.3773 warm 0.000 62.000 4.796
46.78620 0.1404 40.21742 53.35498 3.1061 148.9657 192.3009 337.5967 7.37 40.8789 cold 0.000 42.000 2.473
45.79450 0.1230 40.16178 51.42722 8.0695 104.6470 348.5004 32.9457 8.45 41.3089 warm 0.000 47.000 6.451
42.95280 0.0145 42.32998 43.57562 2.9273 126.3988 262.8777 163.4198 7.36 43.5518 cold 0.000 161.000 2.186
There are 1.6e6 lines in total. These are orbital elements. I need to compute the Minimum Orbit Intersection Distance (MOID) between each pair of orbits, e.g. line 1 with line 2, line 1 with line 3, and so forth until I reach the end of the file. Then I start from the second line and go to the end of the file, then start from the third line and again go to the end of the file, etc. Since I have 1.6e6 orbits, that would be ~1e12 orbit pairs.
I don't want to load all these 1e12 calculation on 1 cpu and wait forever, so I am planning to use a cluster and launch multiple serial jobs.
I need to iterate over 1.6e6 elements: I start with the first element and go to the end of the file, then start from the second and go to the end of the file, etc., until I finally start with T-1 and go to T. This results in ~10^12 iterations, and I am planning to split them into multiple jobs, where each job does C=10^7 calculations, so I can run them on a computer cluster.
I came up with the following nested loop:
for i in range(M, N):
    for j in range(i+1, T):
        ...  # compute the MOID of orbits i and j
where M=1 and changes according to the number of jobs that I will have. T=1.6e6 is constant (number of lines to iterate over). I want to find the index N, so that the total number of operations is C=10^7. Here is how I approached the problem:
[T-(N+1) + T-(M+1)]*(M-N+1)/2=C, because the number of operations is just the sum of the arithmetic series above. Solving this quadratic equation gives the roots. Here is the Python code for that:
import numpy as np
import math
C=1.0e7 # How many calculations per job do you want?
T=1.6e6 # How many orbits do you have?
M=1 # what is the starting index of outer loop?
# N = end index of outer loop (this is to be calculated!)
P=1
l=0
with open('indx.txt','w') as f:
    while P<T:
        l=l+1
        K=np.roots([-1,2*T,M**2-2*T*(M-1)-2*C])
        N=int(round(K[1]))
        f.write("%s %s\n" % (P,P+N))
        M=K[1]+1
        P=P+N+1
However, keeping the above solutions, updating M=M+N, I noticed that the condition C=10^7 is not satisfied. Here is a list of the first few indices.
M N
1 7
8 21
22 41
42 67
68 99
100 138
139 183
184 234
235 291
....
....
1583930 1588385
1588386 1592847
1592848 1597316
1597317 1601791
But if you look at the pair before the last, the loop over i=1592848 - 1597316 and j=i+1, T will produce more calculations than C=10^7, i.e. roughly (2685+7153)*4468/2 ~ 2.2e7.
Any idea how to solve this problem while keeping C=1e7 constant? That would give me the number of jobs (with similar running time) I need to run in order to iterate over 1.6e6 lines.
Hopefully, this explanation is enough according to #mypetition standards and am hoping to resolve the problem.
Your help will be highly appreciated!

I don't know if the nature of each job lends itself to a different kind of split, but if it does, you could use the same trick that Gauss used to come up with ∑1..n = n(n+1)/2.
The trick was to line up the sequence with a reversed copy of it:
1 2 3 4 5 6 7 8
8 7 6 5 4 3 2 1
-- -- -- -- -- -- -- --
9 9 9 9 9 9 9 9 = 8 * 9 is twice the sum so (8*9)/2 = ∑1..8 = 36
Based on this, if you split the pair of series down the middle, you will get 4 pairs of runs that will process the same number of elements:
1 2 3 4
8 7 6 5
-- -- -- --
9 9 9 9
So you would have 8 runs separated into 4 jobs. Each job would process n+1 (9) elements and compute two runs that have a complementary number of elements:
Job 1 would do run 8..8 and run 1..8 (length 1 and 8)
Job 2 would do run 7..8 and run 2..8
Job 3 would do run 6..8 and run 3..8
Job 4 would do run 5..8 and run 4..8
In more general terms:
Job i of (N+1)/2 does runs (N-i+1)..N and i..N
If individual runs can't be parallelized further, this should give you the optimal spread (practically the square root of the total process time)
in Python (pseudo code):
size = len(array)
for index in range((size+1)//2):
    launchJob(array, run1Start=index, run2Start=size-index-1)
note: you may want to adjust the starting points if you're not using zero based indexes.
note2: if you're not processing the last element on its own (i.e. N..N is excluded), one of your jobs will have N elements to process instead of N+1 and you will have to make an exception for that one
Adding more jobs will not significantly improve total processing time but if you want fewer parallel jobs, you can still keep them fairly equal by grouping pairs.
e.g. 2 jobs : [ 1,8,2,7 ] and [3,6,4,5] = 18 per job
Ideally your number of jobs should be a divider of the number of pairs. If not, you will still get a relatively balanced processing time by spreading extra pairs (or runs) evenly over the other jobs. If you choose to spread runs, select the ones in the middle of the list of pairs (because they will have individual processing times that are closer to each other).
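A minimal sketch of the pairing idea applied to the i/j loops from the question, assuming 0-based rows where row i is compared with rows i+1..T-1 (so row i costs T-1-i pair computations). The names paired_jobs and work are made up for illustration; they are not part of any library.

```python
def paired_jobs(T):
    """Yield tuples of outer-loop rows whose inner loops total T pairs.

    Row i (0-based) is compared with rows i+1..T-1, i.e. T-1-i pairs.
    The last row (T-1) has nothing after it, so it is never scheduled.
    """
    lo, hi = 0, T - 2
    while lo < hi:
        # (T-1-lo) + (T-1-hi) == T because lo + hi == T - 2
        yield (lo, hi)
        lo += 1
        hi -= 1
    if lo == hi:
        # Odd number of rows: the middle row is left without a partner.
        yield (lo,)


def work(rows, T):
    """Number of (i, j) pairs a job covering the given rows computes."""
    return sum(T - 1 - i for i in rows)


if __name__ == "__main__":
    T = 9  # small stand-in for the 1.6e6 orbits
    for job in paired_jobs(T):
        print(job, work(job, T))
```

To get fewer, larger jobs, several of these row pairs can be grouped together; every full pair contributes exactly T computations, so groups of equal size carry equal work.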

How to partition a list into similar average subsets

In a list of 300, I need to position 50 items, all repeated 6 times (300 total) in such a way that each item is within a certain range, and the average position of the item in the list is around the middle (150).
By each item in a certain range, I mean the 8 smaller subsets which are positions like:
1-36, 37-73, 74-110, 111-148, 149-186, 187-225, 226-262, 263-300. So, for example, item 1 could have positions 1, 38, 158, 198, 238, 271 in the list, with an average position of 150.6.
I'm trying to do this to automate a currently manual and time consuming process, but I'm having trouble figuring out the algorithm. My current thinking is for each item, randomly position the item into each segment, ensuring that if I choose the minimum position for each subsequent segment, the average cannot be higher than 150(+-2), if it is, randomize the previous position again until a number works. But thinking about it, it seems like it may not work and probably won't be fast. I'd really appreciate any help with this
(coding in Python if it matters)
EDIT:
to clarify, I am trying to position these items randomly, so for example, item1 would not appear 1st in all the subsets (I know that wouldn't make an avg of 150, just for clarification sake). In the example I supplied, item 1 would appear first in the first subset, second in the second subset and 9th in the 3rd. This is actually where I am having trouble

This is straightforward by construction. Let's refer to your 8 slices (subsets) in four pairs. Note that I've corrected the arithmetic on the slice boundaries.
A   1-37,    264-300   37 slots each   first & last
B   38-75,   226-263   38 slots each
C   76-113,  188-225   38 slots each
D   114-150, 151-187   37 slots each   middle pair
More specifically, we will pair these in reverse, mapping locations 1-300, 2-299, 3-298, etc. Each such pair of locations will receive the same item from the list of 50.
Now, we need sets of 6 slices in 3 pairs, distributed evenly. Each of these sets will omit one of our pairs above:
A B C items 1-12
A B D items 13-24
A C D items 25-36
B C D items 37-48
Since we allocate these in strict pairs, we will now have a mean of exactly 150.5 for each of the 48 objects, the optimum solution. Were the quantity of items divisible by 4, we could finish the allocation trivially. However ...
We now have items 49 & 50 remaining, which need 12 placements (6 mirrored pairs). Slices A & D each have 2 slots (one mirrored pair) open; B & C each have 4 slots (two mirrored pairs) open. We allocate these to sets ABC and BCD, finishing the construction.
Every item is allocated to 6 different slices, and has a mean position of 150.5, the mean of the entire collection of 300.
Response to OP comment
I never said they were to be placed in order of item number. Go ahead and do it that way, but only for the lower half of the slices (1-150).
Now, shuffle each of those partitions. Finally, make the upper half the mirror-image of the lower half. Problem solved -- maybe, depending on your definition of "random". The first half has high entropy, but the second half is entirely deterministic, given the first half.
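The construction can be sketched as follows. This is only an illustration: it assumes mirrored lower-half slice boundaries of 1-37, 38-75, 76-113 and 114-150, and the names LOWER, GROUPS and build are invented for the example. Each item is placed in a random free slot of three lower slices and copied to the mirrored position 301 - pos.

```python
import random

# Lower-half slices (1-based, inclusive). Position p mirrors to 301 - p.
LOWER = {"A": (1, 37), "B": (38, 75), "C": (76, 113), "D": (114, 150)}

# Which three slice pairs each block of items uses (assumed assignment).
GROUPS = [
    ("ABC", range(1, 13)),
    ("ABD", range(13, 25)),
    ("ACD", range(25, 37)),
    ("BCD", range(37, 49)),
    ("ABC", [49]),
    ("BCD", [50]),
]


def build(seed=None):
    """Return a list of 300 item numbers; every item has mean position 150.5."""
    rng = random.Random(seed)
    free = {}
    for name, (lo, hi) in LOWER.items():
        slots = list(range(lo, hi + 1))
        rng.shuffle(slots)          # random order within each slice
        free[name] = slots
    layout = [None] * 301           # index 0 is unused
    for letters, items in GROUPS:
        for item in items:
            for name in letters:
                pos = free[name].pop()
                layout[pos] = item          # lower half
                layout[301 - pos] = item    # mirrored upper half
    return layout[1:]


if __name__ == "__main__":
    layout = build(seed=1)
    positions = [p for p, item in enumerate(layout, start=1) if item == 1]
    print(positions, sum(positions) / len(positions))
```

As noted above, the upper half is fully determined by the lower half, so this is random only in a limited sense.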

How to optimise the code for very large values of N. (Time Limit: 1.0 sec(s) for each input file , Memory Limit: 256 MB )

Your task is to construct a tower in N days by following these conditions:
1. Every day you are provided with one disk of distinct size.
2. The disk with larger sizes should be placed at the bottom of the tower.
3. The disk with smaller sizes should be placed at the top of the tower.
The order in which tower must be constructed is as follows:
1. You cannot put a new disk on the top of the tower until all the larger disks that are given to you get placed.
Print N lines denoting the disk sizes that can be put on the tower on the day.
def Solve (arr):
    maxx =N
    s= []
    ss=[]
    for i in range(len(arr)):
        if ((arr[i] == maxx) or (maxx in s) ):
            ss.append(str(maxx) + " ")
            maxx-=1
            for k in sorted(s)[::-1]:
                if(k == maxx):
                    ss.append(str(k)+" ")
                    maxx-=1
                    del s[s.index(k)]
            ss.append("\n")
        else:
            s.append(arr[i])
            ss.append("\n")
    return ss
N = int(input())
arr = list(map(int, input().split()))
out_ = Solve(arr)
print("".join(out_))
Input format:
First line: N denoting the total number of disks that are given to you in the N subsequent days
Second line: N integers in which the integer "i" denotes the size of the disk that is given to you on the i th day
Note: All the disk sizes are distinct integers in the range of 1 to N
Output format
Print N lines. In the i th line, print the size of disks that can be placed on the top of the tower in descending order of the disk sizes.
If on the i th day no disks can be placed, then leave that line empty.
Constraints:
1 <= N <= 10^6
1 <= Size of a Disk <=N
Sample Input:
5
4 5 1 2 3
Sample Output:
5 4
3 2 1
Explanation:
On the first day, the disk of size 4 is given. But you cannot put the disk on the bottom of the tower as a disk of size 5 is still remaining.
On the second day, the disk of size 5 will be given so now disk of sizes 5 and 4 can be placed on the tower.
On the third and fourth day, disks cannot be placed on the tower as the disk of 3 needs to be given yet. Therefore, these lines are empty.
On the fifth day, all the disks of sizes 3, 2, and 1 can be placed on the top of the tower.

According to your sample inputs and outputs, you know all the sizes in advance. It means that you can sort the array first by decreasing values. Then, you just need to loop through the array to output them (skipping the value if it is the same as the previous one). You can initialize a variable that holds the previous size with a huge value, so that you don't need to check within the loop whether it has been initialized (this only saves a constant factor of execution time, but if speed is a big concern, it's still worth avoiding useless operations within loops).
The important thing is to think of the number of operations in the worst case (aka worst-case complexity), computed in terms of N. For the performance analysis, sorting an array of n items takes nlog(n) steps in the worst case. It's a bad idea to sort the data at each step of the loop. The sort in python has been optimized a lot, and even if TimSort can be much better when the data are already sorted (linear), your original algorithm complexity is still at least O(nlog(n) + (n-1)²) = O(n²). Sorting everything first, and avoiding the sort within the loop will give you a much faster algorithm, in O(n*log(n)) in the worst case, and O(n) if you are very lucky with the sizes.

If you paraphrase the solution as follows, the answer more or less falls out:
Wheel k can only be placed if wheel k+1 has been placed and k has arrived
So a solution in pseudocode:
Associate each wheel with its arrival day
Sort by descending wheel size
You now have a collection days of arrival days.
Let now=0
While days is not empty:
If now < days[0]: you cannot place a wheel today. Print an empty line. Increment now.
If now >= days[0]: you can place the next biggest wheel. Remove the first element of days and loop again.
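One way to turn the "placed only when everything larger has been placed" rule into linear-time code is sketched below. It swaps the sort for a boolean arrival table, which works because the sizes are distinct integers from 1 to N; build_tower and its variable names are illustrative, not taken from the question.

```python
def build_tower(sizes):
    """Return one output line per day for the disk-tower problem.

    sizes[d] is the disk handed over on day d; sizes are distinct and
    cover 1..N, so a boolean table indexed by size is enough.
    """
    n = len(sizes)
    arrived = [False] * (n + 1)   # arrived[s] is True once disk s was given
    next_needed = n               # largest disk that is not yet on the tower
    lines = []
    for size in sizes:
        arrived[size] = True
        placed = []
        # Place every disk that is now unblocked, largest first.
        while next_needed >= 1 and arrived[next_needed]:
            placed.append(str(next_needed))
            next_needed -= 1
        lines.append(" ".join(placed))
    return lines


if __name__ == "__main__":
    print("\n".join(build_tower([4, 5, 1, 2, 3])))
```

Each disk is marked once and placed once, so the total work is O(N) regardless of the arrival order. For N up to 10^6, reading input through sys.stdin and printing with a single join is likely to matter as well.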

I am getting "Terminated due to timeout" for a test case in which my loop executes 100000 times. Can anyone help me solve this problem?

Problem: I am getting "Terminated due to timeout" for a particular test case in which my loop executes 100000 times. Can anyone help me solve this problem?
QUES. You have an empty sequence, and you will be given queries. Each query is one of these three types:
1 x -Push the element x into the stack.
2 -Delete the element present at the top of the stack.
3 -Print the maximum element in the stack.
Input Format
The first line of input contains an integer N, . The next N lines each contain an above mentioned query. (It is guaranteed that each query is valid.)
Constraints
Output Format
For each type 3 query, print the maximum element in the stack on a new line.
Sample Input
10
1 97
2
1 20
2
1 26
1 20
2
3
1 91
3
Sample Output
26
91
//My Code
# Enter your code here. Read input from STDIN. Print output to STDOUT
stack=[]
top=-1
n=int(input())
for i in range(n):
    x=list(map(int,input().split()))
    if x[0]==1:
        top+=1
        stack.append(x[1])
    elif x[0]==2:
        top=top-1
        stack.pop()
    else:
        print(max(stack))
Test Case for terminated timeout:
100000
1 86627537
1 938778873
1 495914598
3
3
3
3
3
3
1 507065127
1 230961732
3
1 641113507
1 123729858
1 706231036
3
1 218881566
1 759861012
3{-truncated-}

Since you are always printing the max value, it probably makes sense to keep a separate sorted array. max(stack) scans the entire unsorted list on every type 3 query, so I would guess that this is what is "timing out" in this test case.
I'm guessing this is a lesson in algorithmic complexity. Reading the maximum from a structure that is kept sorted is O(1), whereas max() on a plain list is O(n) in the length of the list, and here it can be called up to 100000 times.
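A common alternative to a sorted array, sketched here under the assumption that queries arrive as tuples such as (1, 97), (2,) and (3,), is a second stack that records the running maximum. Note that this is a different technique from the sorted array suggested above; the function name process is made up for the example.

```python
def process(queries):
    """Answer max-stack queries in O(1) each.

    queries: iterable of tuples, (1, x) to push x, (2,) to pop,
             (3,) to report the current maximum.
    Returns the list of answers to the type 3 queries.
    """
    stack = []    # the values themselves
    maxima = []   # maxima[i] is the maximum of stack[0..i]
    answers = []
    for query in queries:
        if query[0] == 1:
            x = query[1]
            stack.append(x)
            maxima.append(x if not maxima else max(x, maxima[-1]))
        elif query[0] == 2:
            stack.pop()
            maxima.pop()
        else:
            answers.append(maxima[-1])
    return answers


if __name__ == "__main__":
    sample = [(1, 97), (2,), (1, 20), (2,), (1, 26),
              (1, 20), (2,), (3,), (1, 91), (3,)]
    print(process(sample))
```

Because both stacks grow and shrink together, the top of maxima is always the maximum of the current stack, and no scan is needed.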

Random Sudoku Generator

I'm trying to build a python script that generates a 9x9 block with numbers 1-9 that are unique along the rows, columns and within the 3x3 blocks - you know, Sudoku!
So, I thought I would start simple and get more complicated as I went. First I made it so it randomly populated each array value with a number 1-9. Then I made sure numbers along rows weren't replicated. Next, I wanted to do the same for rows & columns. I think my code is OK - it's certainly not fast but I don't know why it jams up.
import numpy as np
import random
#import pdb
#pdb.set_trace()
#Soduku solver!
#Number input
soduku = np.zeros(shape=(9,9))
for i in range(0,9,1):
    for j in range(0,9,1):
        while True:
            x = random.randint(1,9)
            if x not in soduku[i,:] and x not in soduku[:,j]:
                soduku[i,j] = x
                if j == 8: print(soduku[i,:])
                break
So it moves across the columns populating with random ints, drops a row and repeats. The most the code should really need to do is generate 9 numbers for each square if it's really unlucky - I think if we worked it out it would be less than 9*9*9 values needing generating. Something is breaking it!
Any ideas?!

I think what's happening is that your code is getting stuck in your while-loop. You test for the condition if x not in soduku[i,:] and x not in soduku[:,j], but what happens if this condition is not met? It's very likely that your code is running into a dead-end sudoku board (can't be solved with any values), and it's getting stuck inside the while-loop because the condition to break can never be met.
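One way to avoid that dead end is to compute the set of values that are still allowed in a cell and to back up when the set is empty, instead of drawing random numbers forever. The sketch below uses plain backtracking with a "most constrained cell first" choice; this is a swapped-in technique, not the asker's method, and the names candidates and fill are illustrative.

```python
import random


def candidates(grid, r, c):
    """Values not yet used in row r, column c or the 3x3 box of (r, c)."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [v for v in range(1, 10) if v not in used]


def fill(grid, rng):
    """Fill all zero cells in place; return False if the grid is a dead end."""
    empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not empties:
        return True
    # Work on the cell with the fewest remaining options first.
    r, c = min(empties, key=lambda rc: len(candidates(grid, *rc)))
    options = candidates(grid, r, c)
    rng.shuffle(options)
    for value in options:
        grid[r][c] = value
        if fill(grid, rng):
            return True
    grid[r][c] = 0   # nothing fits here: undo and let the caller try again
    return False


if __name__ == "__main__":
    board = [[0] * 9 for _ in range(9)]
    fill(board, random.Random())
    for row in board:
        print(row)
```

When options is empty the loop body never runs, so the function returns False immediately; that is the case in which the original while-loop spins forever.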

Generating it like this is very unlikely to work. There are many ways to generate 8 of the 9 3*3 squares that make it impossible to fill in the last square at all, so the loop hangs forever.
Another approach would be to fill in the numbers one at a time (so, all the 1s first, then all the 2s, etc.). It would be like the Eight queens puzzle, but with 9 queens. When you get to a position where it is impossible to place a number, restart.
Another approach would be to start all the squares at 9 and strategically decrement them somehow, e.g. first decrement all the ones that cannot be 9, excluding the 9s in the current row/column/square, then if they are all impossible or all possible, randomly decrement one.
You can also try to enumerate all sudoku boards, then invert the enumeration function with a random integer. I don't know how successful this would be, but it is the only method here where boards could be chosen with uniform randomness.

You are coming at the problem from a difficult direction. It is much easier to start with a valid Sudoku board and play with it to make a different valid Sudoku board.
An easy valid board is:
1 2 3 | 4 5 6 | 7 8 9
4 5 6 | 7 8 9 | 1 2 3
7 8 9 | 1 2 3 | 4 5 6
---------------------
2 3 4 | 5 6 7 | 8 9 1
5 6 7 | 8 9 1 | 2 3 4
8 9 1 | 2 3 4 | 5 6 7
---------------------
3 4 5 | 6 7 8 | 9 1 2
6 7 8 | 9 1 2 | 3 4 5
9 1 2 | 3 4 5 | 6 7 8
Having found a valid board you can make a new valid board by playing with your original.
You can swap any row of three 3x3 blocks with any other block row. You can swap any column of three 3x3 blocks with another block column. Within each block row you can swap single cell rows; within each block column you can swap single cell columns. Finally you can permute the digits so there are different digits in the cells as long as the permutation is consistent across the whole board.
None of these changes will make a valid board invalid.
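A minimal sketch of these transformations, assuming the seed board shown above. The helper names (base_board, shuffled_indices, random_board, is_valid) are invented for the example. Rows and columns are permuted only in ways that keep each band of three together, and the digits are relabelled consistently.

```python
import random


def base_board():
    """The seed board from above, built from a shift pattern."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]


def shuffled_indices(rng):
    """Permute the three bands, then the three lines inside each band."""
    bands = [0, 1, 2]
    rng.shuffle(bands)
    order = []
    for band in bands:
        inner = [0, 1, 2]
        rng.shuffle(inner)
        order.extend(3 * band + k for k in inner)
    return order


def random_board(seed=None):
    rng = random.Random(seed)
    board = base_board()
    rows = shuffled_indices(rng)      # block-row and row swaps
    cols = shuffled_indices(rng)      # block-column and column swaps
    digits = list(range(1, 10))
    rng.shuffle(digits)               # consistent relabelling of 1..9
    return [[digits[board[r][c] - 1] for c in cols] for r in rows]


def is_valid(board):
    want = set(range(1, 10))
    rows_ok = all(set(row) == want for row in board)
    cols_ok = all({board[r][c] for r in range(9)} == want for c in range(9))
    boxes_ok = all(
        {board[3 * br + i][3 * bc + j] for i in range(3) for j in range(3)} == want
        for br in range(3) for bc in range(3))
    return rows_ok and cols_ok and boxes_ok


if __name__ == "__main__":
    for row in random_board():
        print(" ".join(map(str, row)))
```

This reaches only the boards obtainable from the seed by these moves, so it does not sample uniformly from all valid Sudoku boards.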

I use permutations(range(1,10)) from itertools to create a list of all possible rows. Then I put rows into the sudoku from top to bottom, one by one. If a contradiction occurs, I try another row from the list. With this approach I can find some valid completed sudoku boards in a short time; it keeps generating completed boards within a minute.
Then I remove numbers from the valid completed board one by one at random positions. After removing each number, I check whether the puzzle still has a unique solution. If not, I restore the number and move on to the next random position. Usually I can remove 55~60 numbers from the board. This also takes less than a minute, so it is workable.
However, the first few completed boards generated this way all have 1,2,3,4,5,6,7,8,9 in the first row, so I shuffled the whole list. After shuffling, it becomes difficult to generate a completed sudoku board, and the approach fails.
A better approach may be the following. Collect some sudokus from the internet and complete them so that they can be used as seeds. Remove numbers from them as described in the second paragraph to get some puzzles. You can then use these sudokus to generate more by any of the following methods:
swap row 1 and row 3, or row 4 and row 6, or row 7 and row 9
similar method for columns
swap 3x3 blocks 1,4,7 with 3,6,9 or 1,2,3 with 7,8,9 correspondingly.
mirror the sudoku vertically or horizontally
rotate the sudoku by 90, 180 or 270 degrees
randomly permute the numbers on the board, for example 1->2, 2->3, 3->4, ..., 8->9, 9->1. Or you can swap just two of them, e.g. 1->2, 2->1. This also works.

Sum of array diagonal

I'm very new at this and have to do this for a project so keep that in mind.
I need to write a function sumOfDiagonal that has one parameter of type list.
The list is a 4x4 2-dimensional array of integers (4 rows and 4 columns of integers).
The function must return the sum of the integers in the diagonal positions from top right to bottom left.
I have not tried anything because I have no idea where to begin, so would appreciate some guidance.

Since you haven't specified a language (and this is probably classwork anyway), I'll have to provide pseudo-code. Given the 4x4 2d array, the basic idea is to use a loop specifying the index, and use that index to get the correct elements in both dimensions. Say we had the array:
        [][0]  [][1]  [][2]  [][3]
        -----  -----  -----  -----
[0][]     1      2      3      4
[1][]     5      6      7      8
[2][]     9     10     11     12
[3][]    13     14     15     16
and we wanted to sum the top-left-to-bottom-right diagonal (1+6+11+16)(1). That would be something like:
def sumOfDiagonal (arr, sz):
    sum = 0
    for i = 0 to sz - 1 inclusive:
        sum = sum + arr[i][i]
    return sum
That's using the normal means of accessing an array. If, as may be given the ambiguity in the question, your array is actually a list of some description (such as a linked list of sixteen elements), you'll just need to adjust how you get the "array" elements.
For example, a 16-element list would need to get nodes 0, 5, 10 and 15 so you could run through the list skipping four nodes after each accumulation.
By way of example, here's some Python code(2) for doing the top-left-to-bottom-right variant, which outputs 34 (1+6+11+16) as expected:
def sumOfDiagonals(arr):
    sum = 0
    for i in range(len(arr)):
        sum += arr[i][i]
    return sum
print(sumOfDiagonals([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]))
(1) To do top right to bottom left simply requires you to change the second term into sz - i - 1.
(2) Python is the ideal pseudo-code language when you want to be able to test your pseudo-code, provided you stay away from its more complex corners :-)
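For the direction the question actually asks about (top right to bottom left), footnote (1) translates to the following sketch. It assumes a square list of lists; the name sum_of_anti_diagonal is chosen only for this example.

```python
def sum_of_anti_diagonal(arr):
    """Sum the top-right-to-bottom-left diagonal of a square 2D list."""
    n = len(arr)
    total = 0
    for i in range(n):
        # Row i, counting columns from the right edge: n-1, n-2, ..., 0.
        total += arr[i][n - 1 - i]
    return total


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(sum_of_anti_diagonal(grid))  # 4 + 7 + 10 + 13 = 34
```

For this particular grid both diagonals happen to sum to 34, so a smaller asymmetric grid such as [[1, 2], [3, 4]] is a better check: the anti-diagonal gives 2 + 3 = 5 while the main diagonal gives 1 + 4 = 5 as well only by coincidence of symmetric values, so try [[1, 2], [0, 4]] (anti-diagonal 2, main diagonal 5).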
