I am given a table of teams A and B where for each pair of 2 players there is number. The rows represent players of players of team A and columns of players of the team B. If a number is positive, it means that the player A is better than the plyaer from the B team and vice versa if negative.
For example:
-710 415 527 -641 175 48
-447 -799 253 626 304 895
509 -523 -758 -678 -689 92
24 -318 -61 -9 174 255
487 408 696 861 -394 -67
Both teams know this table.
Now, what is done is that the team A reports 5 players, the team B can look at them and choose the best 5 players for them.
If we want to compere the teams we sum up the numbers on the given positions from the table knowing that each team has a captain who is counted twice (as if a team had 6 players and the captain is there twice), if the sum is positive, the team A is better.
The input are numbers a (the number of rows/players A) and b (columns/players B) and the table like this:
6
6
-54 -927 428 -510 911 93
-710 415 527 -641 175 48
-447 -799 253 626 304 895
509 -523 -758 -678 -689 92
24 -318 -61 -9 174 255
487 408 696 861 -394 -67
The output should be 1282.
So, what I did was that I put the numbers into a matrix like this:
a, b = int(input()), int(input())
matrix = [list(map(int,input().split())) for _ in range(a)]
I used a MinHeap and a MaxHeap for this. I put the rows into the MaxHeap because A team wants the biggest, then I get 5 best A players from it as follows:
for player, values in enumerate(matrix):
maxheap.enqueue(sum(values), player)
playersA = []
overallA = 0
for i in range(5):
ov, pl = maxheap.remove_max()
if i == 0: # it is a captain
playersA.append(pl)
overallA += ov
playersA.append(pl)
overallA += ov
The B team knowing the A players the uses the MinHeap to find its best 5 players:
for i in range(b):
player = []
ov = 0
for j in range(a): #take out a column of a matrix
player.append(matrix[j][i])
for rival in playersA: #counting only players already chosen by A
ov += player[rival]
minheap.enqueue(ov,i)
playersB = []
overallB = 0
for i in range(5):
ov, pl = minheap.remove_min()
if i == 0:
playersB.append(pl)
overallB += ov
playersB.append(pl)
overallB += ov
Having the players, then I count the sum from the matrix:
out = 0
for a in playersA:
for b in playersB:
out += matrix[a][b]
print(out)
However, this solution doesn't give the right solutions always. For example, it does for the input:
10
10
-802 -781 826 997 -403 243 -533 -694 195 182
103 182 -14 130 953 -900 43 334 -724 716
-350 506 184 691 -785 742 -303 -682 186 -520
25 -815 475 -407 -78 509 -512 714 898 243
758 -743 -504 -160 855 -792 -177 747 188 -190
333 -439 529 795 -500 112 625 -2 -994 282
824 498 -899 158 453 644 117 598 432 310
-799 594 933 -15 47 -687 68 480 -933 -631
741 400 979 -52 -78 -744 -573 -170 882 -610
-376 -928 -324 658 -538 811 -724 848 344 -308
But it doesn't for
11
11
279 475 -894 -641 -716 687 253 -451 580 -727 -509
880 -778 -867 -527 816 -458 -136 -517 217 58 740
360 -841 492 -3 940 754 -584 715 -389 438 -887
-739 664 972 838 -974 -802 799 258 628 3 815
952 -404 -273 -323 -948 674 687 233 62 -339 352
285 -535 -812 -452 -335 -452 -799 -902 691 195 -837
-78 56 459 -178 631 -348 481 608 -131 -575 732
-212 -826 -547 440 -399 -994 486 -382 -509 483 -786
-94 -983 785 -8 445 -462 -138 804 749 890 -890
-184 872 -341 776 447 -573 405 462 -76 -69 906
-617 704 292 287 464 -711 354 428 444 -42 45
So the question is: Can it be done like this or is there another fast algorithm ( O(n ** 2 ) / O(n ** 3) etc.), or I just gave to try all the possible combinations using brute force in O(n!) time complexity?
There is a way to do that with a polynomial complexity.
To show you why your solution doesn't work, let's consider an other simpler problem. Let's say each team only choose 2 players and there is no captain.
Let's also take a simple score matrix:
1 1 1 2 1
1 1 1 1 1
0 3 0 2 0
0 0 0 0 4
0 0 0 0 4
Here you can see that team A has no chance to win (as there are no negative numbers), but still they are going to try their best. Who should they pick?
Using your algorithm, team A should pick their best players and their ranking would be:
pa0 < pa1 = pa2 < pa3 = pa4
If they choose pa3 and pa4, who both have a score of 4 (which is bad, but not as bad as pa0's score of 6), team B will win by 8 (they will choose pb4 and an other player who doesn't matter).
On the other hand, if team A chose pa0 and pa1 (who are worse than pa3 and pa4 by your metric), the best team B can get is winning by 5 (if they choose pb3 and any other player)
Basically, your approximation fails to take into consideration that team B can only choose two players and thus can't take advantage of the pa0+pa1 weakness while it can easily exploit pa3+pa4's one.
A better solution would be for team A to evaluate each player's score only by taking into account their 2 worst scores (or 5 if 5 players are to be selected): this would make the ranking as such:
pa2 < pa3 = pa4 < pa0 < pa1
Still it would be an approximation: some combinations like pa2+pa3 are actually not as bad as they sound as, once again, the weaknesses are spread enough that team B can't exploit them all (although for this very example the approximation yields the best result).
What we really need to pick is not the two best players, but the best combination of two players, and sadly there is no way I know of other than trying all the $s!/(k!(s-k)!)$ combinations of k players among s (the size of the team). It is not so bad, though, as for k=2 that's only $s*(s-1)/2$ and for k=5 that's $s*(s-1)(s-2)(s-3)*(s-4)/5!$, which is still polynomial in complexity despite being in O(s^5). Adding a captain to the mix only multiplies the number of combinations by k. It also requires a twist on how to calculate the score but you should be able to find that.
Now that team A have selected their players, team B have the easy job to select theirs. This is way simpler as here each player can be chosen individually.
example of how this last algorithm should work with the score matrix provided in the beginning.
team A has 10 possible combinations: pa0+pa1, pa0+pa2, pa0+pa3, pa0+pa4, pa1+pa2, pa1+pa3, pa1+pa4, pa2+pa3, pa2+pa4, pa3+pa4. Their respective scores are: 5, 8, 7, 7, 7, 6, 6, 7, 7, 8.
The best combination is pa0+pa1, so that's what they send to team B.
Team B calculate each of its player's score against pa0+pa1: pb0:2, pb1:2, pb2:2, pb3:3, pb4:2. pb3 is the best, all the others are equals, thus team B sends pb3+pb4 (for example), and the "answer" is 5.
I have a huge dataset which contains coordinates of particles. In order to split the data into test and training set I want to divide the space into many subspaces; I did this with a for-loop in every direction (x,y,z) but when running the code it takes very long and is not efficient enough especially for large datasets:
particle_boxes = []
init = 0
final = 50
number_box = 5
for i in range(number_box):
for j in range(number_box):
for k in range(number_box):
index_particle = df_particles['X'].between(init+i*final, final+final*i)&df_particles['Y'].between(init+j*final, final+final*j)&df_particles['Z'].between(init+k*final, final+final*k)
particle_boxes.append(df_particles[index_particle])
where init and final define the box size, df_particles contains every particle coordinate (x,y,z).
After running this particle_boxes contains 125 (number_box^3) equal spaced subboxes.
Is there any way to write this code more efficiently?
Note on efficiency
I conducted a number of tests using other tricks and nothing changed substantially. This is roughly as good as any other technique I used.
I'm curious to see if anyone else comes up with something order of magnitude faster.
Sample data
np.random.seed([3, 1415])
df_particles = pd.DataFrame(
np.random.randint(250, size=(1000, 3)),
columns=['X', 'Y', 'Z']
)
Solution
Construct an array a that represents your boundaries
a = np.array([50, 100, 150, 200, 250])
Then use searchsorted to create the individual dimensional bins
x_bin = a.searchsorted(df_particles['X'].to_numpy())
y_bin = a.searchsorted(df_particles['Y'].to_numpy())
z_bin = a.searchsorted(df_particles['Z'].to_numpy())
Use groupby on the three bins. I used trickery to get that into a dict
g = dict((*df_particles.groupby([x_bin, y_bin, z_bin]),))
We can see the first zone
g[(0, 0, 0)]
X Y Z
30 2 36 47
194 0 34 45
276 46 37 34
364 10 16 21
378 4 15 4
429 12 34 13
645 36 17 5
743 18 36 13
876 46 11 34
and the last
g[(4, 4, 4)]
X Y Z
87 223 236 213
125 206 241 249
174 218 247 221
234 222 204 237
298 208 211 225
461 234 204 238
596 209 229 241
731 210 220 242
761 225 215 231
762 206 241 240
840 211 241 238
846 212 242 241
899 249 203 228
970 214 217 232
981 236 216 248
Instead of multiple nested for loops, consider one loop using itertools.product. But of course avoid any loops if possible as #piRSquared shows:
from itertools import product
particle_boxes = []
for i, j, k in product(range(number_box), range(number_box), range(number_box)):
index_particle = (df_particles['X'].between(init+i*final, final+final*i) &
df_particles['Y'].between(init+j*final, final+final*j) &
df_particles['Z'].between(init+k*final, final+final*k))
particle_boxes.append(df_particles[index_particle])
Alternatively, with list comprehension:
def sub_df(i, j, k)
index_particle = (df_particles['X'].between(init+i*final, final+final*i) &
df_particles['Y'].between(init+j*final, final+final*j) &
df_particles['Z'].between(init+k*final, final+final*k))
return df_particles[index_particle]
particle_boxes = [sub_df(i, j, k) for product(range(number_box), range(number_box), range(number_box))]
Have a look at train_test_split function available in the scikit-learn lib.
I think it is almost the kind of functionality that you need.
The code is consultable on Github.
Program description:
Find all the prime numbers between 1 and 4,027 and print them in a table which
"reads down", using as few rows as possible, and using as few sheets of paper
as possible. (This is because I have to print them out on paper to turn it in.) All numbers should be right-justified in their column. The height
of the columns should all be the same, except for perhaps the last column,
which might have a few blank entries towards its bottom row.
The plan for my first function is to find all prime numbers between the range above and put them in a list. Then I want my second function to display the list in a table that reads up to down.
2 23 59
3 29 61
5 31 67
7 37 71
11 41 73
13 43 79
17 47 83
19 53 89
ect...
This all I've been able to come up with myself:
def findPrimes(n):
""" Adds calculated prime numbers to a list. """
prime_list = list()
for number in range(1, n + 1):
prime = True
for i in range(2, number):
if(number % i == 0):
prime = False
if prime:
prime_list.append(number)
return prime_list
def displayPrimes():
pass
print(findPrimes(4027))
I'm not sure how to make a row/column display in Python. I remember using Java in my previous class and we had to use a for loop inside a for loop I believe. Do I have to do something similar to that?
Although I frequently don't answer questions where the original poster hasn't even made an attempt to solve the problem themselves, I decided to make an exception of yours—mostly because I found it an interesting (and surprisingly challenging) problem that required solving a number of somewhat tricky sub-problems.
I also optimized your find_primes() function slightly by taking advantage of some reatively well-know computational shortcuts for calculating them.
For testing and demo purposes, I made the tables only 15 rows high to force more than one page to be generated as shown in the output at the end.
from itertools import zip_longest
import locale
import math
locale.setlocale(locale.LC_ALL, '') # enable locale-specific formatting
def zip_discard(*iterables, _NULL=object()):
""" Like zip_longest() but doesn't fill out all rows to equal length.
https://stackoverflow.com/questions/38054593/zip-longest-without-fillvalue
"""
return [[entry for entry in iterable if entry is not _NULL]
for iterable in zip_longest(*iterables, fillvalue=_NULL)]
def grouper(n, seq):
""" Group elements in sequence into groups of "n" items. """
for i in range(0, len(seq), n):
yield seq[i:i+n]
def tabularize(width, height, numbers):
""" Print list of numbers in column-major tabular form given the dimensions
of the table in characters (rows and columns). Will create multiple
tables of required to display all numbers.
"""
# Determine number of chars needed to hold longest formatted numeric value
gap = 2 # including space between numbers
col_width = len('{:n}'.format(max(numbers))) + gap
# Determine number of columns that will fit within the table's width.
num_cols = width // col_width
chunk_size = num_cols * height # maximum numbers in each table
for i, chunk in enumerate(grouper(chunk_size, numbers), start=1):
print('---- Page {} ----'.format(i))
num_rows = int(math.ceil(len(chunk) / num_cols)) # rounded up
table = zip_discard(*grouper(num_rows, chunk))
for row in table:
print(''.join(('{:{width}n}'.format(num, width=col_width)
for num in row)))
def find_primes(n):
""" Create list of prime numbers from 1 to n. """
prime_list = []
for number in range(1, n+1):
for i in range(2, int(math.sqrt(number)) + 1):
if not number % i: # Evenly divisible?
break # Not prime.
else:
prime_list.append(number)
return prime_list
primes = find_primes(4027)
tabularize(80, 15, primes)
Output:
---- Page 1 ----
1 47 113 197 281 379 463 571 659 761 863
2 53 127 199 283 383 467 577 661 769 877
3 59 131 211 293 389 479 587 673 773 881
5 61 137 223 307 397 487 593 677 787 883
7 67 139 227 311 401 491 599 683 797 887
11 71 149 229 313 409 499 601 691 809 907
13 73 151 233 317 419 503 607 701 811 911
17 79 157 239 331 421 509 613 709 821 919
19 83 163 241 337 431 521 617 719 823 929
23 89 167 251 347 433 523 619 727 827 937
29 97 173 257 349 439 541 631 733 829 941
31 101 179 263 353 443 547 641 739 839 947
37 103 181 269 359 449 557 643 743 853 953
41 107 191 271 367 457 563 647 751 857 967
43 109 193 277 373 461 569 653 757 859 971
---- Page 2 ----
977 1,069 1,187 1,291 1,427 1,511 1,613 1,733 1,867 1,987 2,087
983 1,087 1,193 1,297 1,429 1,523 1,619 1,741 1,871 1,993 2,089
991 1,091 1,201 1,301 1,433 1,531 1,621 1,747 1,873 1,997 2,099
997 1,093 1,213 1,303 1,439 1,543 1,627 1,753 1,877 1,999 2,111
1,009 1,097 1,217 1,307 1,447 1,549 1,637 1,759 1,879 2,003 2,113
1,013 1,103 1,223 1,319 1,451 1,553 1,657 1,777 1,889 2,011 2,129
1,019 1,109 1,229 1,321 1,453 1,559 1,663 1,783 1,901 2,017 2,131
1,021 1,117 1,231 1,327 1,459 1,567 1,667 1,787 1,907 2,027 2,137
1,031 1,123 1,237 1,361 1,471 1,571 1,669 1,789 1,913 2,029 2,141
1,033 1,129 1,249 1,367 1,481 1,579 1,693 1,801 1,931 2,039 2,143
1,039 1,151 1,259 1,373 1,483 1,583 1,697 1,811 1,933 2,053 2,153
1,049 1,153 1,277 1,381 1,487 1,597 1,699 1,823 1,949 2,063 2,161
1,051 1,163 1,279 1,399 1,489 1,601 1,709 1,831 1,951 2,069 2,179
1,061 1,171 1,283 1,409 1,493 1,607 1,721 1,847 1,973 2,081 2,203
1,063 1,181 1,289 1,423 1,499 1,609 1,723 1,861 1,979 2,083 2,207
---- Page 3 ----
2,213 2,333 2,423 2,557 2,687 2,789 2,903 3,037 3,181 3,307 3,413
2,221 2,339 2,437 2,579 2,689 2,791 2,909 3,041 3,187 3,313 3,433
2,237 2,341 2,441 2,591 2,693 2,797 2,917 3,049 3,191 3,319 3,449
2,239 2,347 2,447 2,593 2,699 2,801 2,927 3,061 3,203 3,323 3,457
2,243 2,351 2,459 2,609 2,707 2,803 2,939 3,067 3,209 3,329 3,461
2,251 2,357 2,467 2,617 2,711 2,819 2,953 3,079 3,217 3,331 3,463
2,267 2,371 2,473 2,621 2,713 2,833 2,957 3,083 3,221 3,343 3,467
2,269 2,377 2,477 2,633 2,719 2,837 2,963 3,089 3,229 3,347 3,469
2,273 2,381 2,503 2,647 2,729 2,843 2,969 3,109 3,251 3,359 3,491
2,281 2,383 2,521 2,657 2,731 2,851 2,971 3,119 3,253 3,361 3,499
2,287 2,389 2,531 2,659 2,741 2,857 2,999 3,121 3,257 3,371 3,511
2,293 2,393 2,539 2,663 2,749 2,861 3,001 3,137 3,259 3,373 3,517
2,297 2,399 2,543 2,671 2,753 2,879 3,011 3,163 3,271 3,389 3,527
2,309 2,411 2,549 2,677 2,767 2,887 3,019 3,167 3,299 3,391 3,529
2,311 2,417 2,551 2,683 2,777 2,897 3,023 3,169 3,301 3,407 3,533
---- Page 4 ----
3,539 3,581 3,623 3,673 3,719 3,769 3,823 3,877 3,919 3,967 4,019
3,541 3,583 3,631 3,677 3,727 3,779 3,833 3,881 3,923 3,989 4,021
3,547 3,593 3,637 3,691 3,733 3,793 3,847 3,889 3,929 4,001 4,027
3,557 3,607 3,643 3,697 3,739 3,797 3,851 3,907 3,931 4,003
3,559 3,613 3,659 3,701 3,761 3,803 3,853 3,911 3,943 4,007
3,571 3,617 3,671 3,709 3,767 3,821 3,863 3,917 3,947 4,013
Currently I am iterating over one array and for each value in this array I am looking for the closest value at the corresponding point in another array that is within a region surrounding the corresponding point.
In summary: For any point in an array, how far away from a corresponding point in another array do you need to go to get the same value.
The code seems to work well for small arrays, however I am working now with 1024x768 arrays, leading me to wait a long time for each run....
Any help or advice would be greatly appreciated as I have been on this for a while!!
Example matrix in format Im using: np.array[[1,2],[3,4]]
#Distance to agreement
#Used later to define a region of pixels around a corresponding point
#to iterate over:
DTA = 26
#To account for noise in pixels - doesnt have to find the exact value,
#just one within +/-130 of it.
limit = 130
#Containers for all pixel value matches and also the smallest distance
#to pixel match
Dist = []
Dist_min = []
#Continer matrix for gamma pass/fail values
Dist_to_agree = np.zeros((i_size,j_size))
#i,j indexes the reference matrix (x), ii,jj indexes the measured
#matrix(y). Finds a match within the limits,
#appends the distance to the match into Dist.
#Then find the minimum distance to a match for that pixel and append it
#to dist_min
for i, k in enumerate(x):
for j, l in enumerate(k):
#added 10 packing to y matrix, so need to shift it by 10 in i&j
for ii in range((i+10)-DTA,(i+10)+DTA):
for jj in range((j+10)-DTA,(j+10)+DTA):
#If the pixel value is within a range to account for noise,
#let it be "found"
if (y[ii,jj]-limit) <= x[i,j] <= (y[ii,jj]+limit):
#Calculating distance
dist_eu = sqrt(((i)-(ii))**2 + ((j) - (jj))**2)
Dist.append(dist_eu)
#If a value cannot be found within the noise range,
#append 10 = instant fail.
else:
Dist.append(10)
try:
Dist_min.append(min(Dist))
Dist_to_agree[i,j] = min(Dist)
except ValueError:
pass
#Need to reset container or previous values will also be
#accounted for when finding minimum
Dist = []
print Dist_to_agree
First, you are getting the elements of x in k and l, but then throwing that away and indexing x again. So in place of x[i,j], you could just use l, which would be much faster (although l isn't a very meaningful name, something like xi and xij might be better).
Second, you are recomputing y[ii,jj]-limit and y[ii,jj]+limitevery time. If you have enough memory, you can-precomputer these:ym = y-limitandyp = y+limit`.
Third, appending to a list is slower than creating an array and setting the values for long lists vs. long arrays. You can also skip the entire else clause by pre-setting the default value.
Fourth, you are computing min(dist) twice, and further may be using the python version rather than the numpy version, the latter being faster for arrays (which is another reason to make dist and array).
However, the biggest speedup would be to vectorize the inner two loops. Here is my tests, with x=np.random.random((10,10)) and y=np.random.random((100,100)):
Your version takes 623 ms.
Here is my version, which takes 7.6 ms:
dta = 26
limit = 130
dist_to_agree = np.zeros_like(x)
dist_min = []
ym = y-limit
yp = y+limit
for i, xi in enumerate(x):
irange = (i-np.arange(i+10-dta, i+10+dta))**2
if not irange.size:
continue
ymi = ym[i+10-dta:i+10+dta, :]
ypi = yp[i+10-dta:i+10+dta, :]
for j, xij in enumerate(xi):
jrange = (j-np.arange(j+10-dta, j+10+dta))**2
if not jrange.size:
continue
ymij = ymi[:, j+10-dta:j+10+dta]
ypij = ypi[:, j+10-dta:j+10+dta]
imesh, jmesh = np.meshgrid(irange, jrange, indexing='ij')
dist = np.sqrt(imesh+jmesh)
dist[ymij > xij or xij < ypij] = 10
mindist = dist.min()
dist_min.append(mindist)
dist_to_agree[i,j] = mindist
print(dist_to_agree)
#Ciaran
Meshgrid is kinda a vectorized equivalent of two nested loops. Below are two equivalent ways of calculating the dist. One with loops and one with meshgrid+numpy vector operations. The second one is six times faster.
DTA=5
i=100
j=200
def func1():
dist1=np.zeros((DTA*2,DTA*2))
for ii in range((i+10)-DTA,(i+10)+DTA):
for jj in range((j+10)-DTA,(j+10)+DTA):
dist1[ii-((i+10)-DTA),jj-((j+10)-DTA)] =sqrt(((i)-(ii))**2 + ((j) - (jj))**2)
return dist1
def func2():
dist2=np.zeros((DTA*2,DTA*2))
ii, jj = meshgrid(np.arange((i+10)-DTA,(i+10)+DTA),
np.arange((j+10)-DTA,(j+10)+DTA))
dist2=np.sqrt((i-ii)**2+(j-jj)**2)
return dist2
This is how ii and jj matrices look after meshgrid operation
ii=
[[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]
[105 106 107 108 109 110 111 112 113 114]]
jj=
[[205 205 205 205 205 205 205 205 205 205]
[206 206 206 206 206 206 206 206 206 206]
[207 207 207 207 207 207 207 207 207 207]
[208 208 208 208 208 208 208 208 208 208]
[209 209 209 209 209 209 209 209 209 209]
[210 210 210 210 210 210 210 210 210 210]
[211 211 211 211 211 211 211 211 211 211]
[212 212 212 212 212 212 212 212 212 212]
[213 213 213 213 213 213 213 213 213 213]
[214 214 214 214 214 214 214 214 214 214]]
for loops are very slow in pure python and you have four nested loops which will be very slow. Cython does wonders to the for loop speed. You can also try vectorization. While I'm not sure I fully understand what you are trying to do, you may try to vectorize at last some of the operations. Especially the last two loops.
So instead of two ii,jj cycles over
y[ii,jj]-limit) <= x[i,j] <= (y[ii,jj]+limit)
you can do something like
ii, jj = meshgrid(np.arange((i+10)-DTA,(i+10)+DTA), np.arange((j+10)-DTA,(j+10)+DTA))
t=(y[(i+10)-DTA,(i+10)+DTA]-limit>=x[i,j]) & (y[(i+10)-DTA,(i+10)+DTA]+limit<=x[i,j])
Dist=np.sqrt((i-ii)**2)+(j-jj)**2))
np.min(Dist[t]) will have your minimum distance for element i,j
The numbapro compiler offers gpu Acceleration. Unfortunately it isn't free.
http://docs.continuum.io/numbapro/