I have an unsorted list of integers and two inputs:
- input no. 1: the sum I want to reach
- input no. 2: the maximum number of elements that may be used to reach it
The sum can't exceed the given value (input no. 1), but it may fall short of it by up to 10. The number of list elements used must be less than or equal to the given value (input no. 2).
from random import choice

def Diff(li1, li2):
    return (list(list(set(li1)-set(li2)) + list(set(li2)-set(li1))))

def find_the_elements(current_sum, wanted_sum, used_elements, max_number_of_elements, n_o_elements):
    solution = 0
    while solution != 1:
        elemnt=choice(Diff(elemts, used_elements))
        used_elements.append(elemnt)
        current_sum+=elemnt
        n_o_elements+=1
        if max_number_of_elements<=max_number_of_elements and current_sum in wanted_sum:
            return used_elements
        elif n_o_elements>max_number_of_elements or current_sum>wanted_sum.stop:
            return -1
        else:
            x=find_the_elements(current_sum=current_sum, wanted_sum=wanted_sum, used_elements=used_elements, n_o_elements=n_o_elements, max_number_of_elements=max_number_of_elements)
            if x!=-1:
                return used_elements
            elif x==-1:
                return -1

elemts = [535, 508, 456, 612, 764, 628, 530, 709, 676, 546, 579, 676,
564, 565, 742, 657, 577, 514, 650, 590, 621, 642, 684, 567, 670, 609, 571, 655, 681, 615, 617, 569, 656, 615,
542, 711, 777, 763, 663, 657, 532, 630, 636, 445, 495, 567, 603, 598, 629, 651, 608, 653, 669, 603, 655, 622,
578, 551, 560, 712, 642, 637, 545, 631, 479, 614, 710, 458, 615, 659, 636, 578, 629, 622, 584, 582, 650, 636,
693, 527, 577, 711, 601, 530, 1028, 683, 589, 590, 670, 409,582, 635, 558, 607, 648, 542, 726, 534, 540, 590, 649, 482, 664, 629, 555, 596, 613, 572, 516, 479, 562, 452,
586]
max_no_elements = int(input())
wanted_sum = int(input())
solution = -1
while solution == -1:
    solution = find_the_elements(current_sum=0, wanted_sum=range(wanted_sum - 10, wanted_sum + 1), used_elements=[], max_number_of_elements=max_no_elements, n_o_elements=0)
print(solution)
That's my solution, but I think I should approach it differently, because the real list is much bigger and each of its elements (integers) is 10-20x larger.
Recursion with memoization (i.e. dynamic programming) is probably the best approach for this:
def closeSum(A,S,N,p=0,memo=None):
    if not N: return [],0
    if memo is None: memo = dict()                 # memoization
    if (S,N,p) in memo: return memo[S,N,p]
    best,bestSum = [],0
    for i,a in enumerate(A[p:],p):                 # combine remaining elements for sum
        if a>S: continue                           # ignore excessive values
        if a == S: return [a],a                    # end on perfect match
        r = [a] + closeSum(A,S-a,N-1,i+1,memo)[0]  # extend sum to get closer
        sr = sum(r)
        if sr+10>=S and sr>bestSum:                # track best so far
            best,bestSum = r,sr
    memo[S,N,p]=(best,sum(best))                   # memoization
    return best,sum(best)
output:
elemts = [535, 508, 456, 612, 764, 628, 530, 709, 676, 546, 579, 676,
564, 565, 742, 657, 577, 514, 650, 590, 621, 642, 684, 567, 670, 609, 571, 655, 681, 615, 617, 569, 656, 615,
542, 711, 777, 763, 663, 657, 532, 630, 636, 445, 495, 567, 603, 598, 629, 651, 608, 653, 669, 603, 655, 622,
578, 551, 560, 712, 642, 637, 545, 631, 479, 614, 710, 458, 615, 659, 636, 578, 629, 622, 584, 582, 650, 636,
693, 527, 577, 711, 601, 530, 1028, 683, 589, 590, 670, 409,582, 635, 558, 607, 648, 542, 726, 534, 540, 590, 649, 482, 664, 629, 555, 596, 613, 572, 516, 479, 562, 452,
586]
closeSum(elemts,1001,3)
[456, 545], 1001
closeSum(elemts,5522,7)
[764, 742, 777, 763, 712, 1028, 726], 5512
closeSum(elemts,5522,10)
[535, 508, 456, 612, 764, 628, 530, 546, 409, 534], 5522
It works relatively fast when there is an exact match but still takes a while for the larger values/item counts when it doesn't.
Note that there is still room for some optimization such as keeping track of the total of remaining elements (from position p) and exiting if they can't add up to the target sum.
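The pruning idea mentioned above can be sketched roughly as follows. This is only an illustration, not the closeSum function itself: close_sum_pruned, its tolerance parameter and the dead_ends set are hypothetical names, the list is assumed to hold positive integers, and the search stops at the first sum that lands inside the allowed window rather than looking for the closest one.

```python
def close_sum_pruned(values, target, max_items, tolerance=10):
    """Return up to max_items values summing into [target - tolerance, target], or None.

    Hypothetical helper for illustration; assumes positive integers.
    """
    # suffix[p] holds sum(values[p:]), so the pruning test costs O(1)
    suffix = [0] * (len(values) + 1)
    for p in range(len(values) - 1, -1, -1):
        suffix[p] = suffix[p + 1] + values[p]

    dead_ends = set()  # (position, remaining, slots) states already known to fail

    def search(p, remaining, slots):
        if remaining <= tolerance:
            return []  # the values picked so far already land inside the window
        if slots == 0 or suffix[p] < remaining - tolerance:
            return None  # out of slots, or the leftover elements cannot close the gap
        if (p, remaining, slots) in dead_ends:
            return None
        for i in range(p, len(values)):
            if values[i] > remaining:
                continue  # would overshoot the target
            rest = search(i + 1, remaining - values[i], slots - 1)
            if rest is not None:
                return [values[i]] + rest
        dead_ends.add((p, remaining, slots))
        return None

    return search(0, target, max_items)


picked = close_sum_pruned([5, 8, 3, 12], target=20, max_items=2)
print(picked, sum(picked))
```

The suffix totals are computed once, so each call can give up immediately when even taking every remaining element would leave the sum more than the tolerance below the target.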
I hope you understand my question. If the list called nums contains even numbers, I want to print them.
import random
nums = [951, 402, 984, 651, 360, 69, 408, 319, 601, 485, 980, 507, 725, 547, 544,
615, 83, 165, 141, 501, 263, 617, 865, 575, 219, 390, 984, 592, 236, 105, 942, 941,
386, 462, 47, 418, 907, 344, 236, 375, 823, 566, 597, 978, 328, 615, 953, 345,
399, 162, 758, 219, 918, 237, 412, 566, 826, 248, 866, 950, 626, 949, 687, 217,
815, 67, 104, 58, 512, 24, 892, 894, 767, 553, 81, 379, 843, 831, 445, 742, 717,
958, 609, 842, 451, 688, 753, 854, 685, 93, 857, 440, 380, 126, 721, 328, 753, 470,
743, 527]
if nums % 2 == 0:
    for i in nums:
        print(i)
        if i == 3:
            b
        else:
            i += 1
Just rearrange this code:
# loop through the list
for i in nums:
    # check if number is even
    if i % 2 == 0:
        # print it
        print(i)
You are checking whether the list itself is divisible by 2 (which is impossible). Check each element instead:
f = [i for i in nums if i%2 == 0]
for i in f:
    print(i)
I have a list of strings that are 10 characters long.
final_list = ['ACTGCATGTC',
'CAACACAACG',
'TTCATGCCGA',
'AGCCGTGTAT',
'CAGTCACCAT',
'TCGTACGTGC',
'GAGATTGGTG',
'GCATGTTCCA',
...]
Full file
I would like to pick 384 from the 1389 total strings, so that the A,C,G and T characters are as equally represented as possible:
from collections import Counter, defaultdict
import pandas as pd

balance_df = pd.DataFrame.from_records(final_list)
pos_dict = defaultdict()
for i in range(0, len(balance_df.columns)):
    pos_dict[i] = Counter(balance_df[i])
pd.DataFrame.from_dict(pos_dict)
Ideally every letter should be represented 96 times at each position in the final 384 list.
     0    1    2    3    4    5    6    7    8    9
A  383  375  372  353  342  342  333  326  319  318
C  401  398  388  380  380  373  367  372  381  379
G  304  317  315  350  349  360  363  366  372  380
T  301  299  314  306  318  314  326  325  317  312
I attempted to do this by keeping track of accepted strings, building a list of the two most underrepresented characters at each position, and allowing only those characters to be added in the next iteration:
from heapq import nsmallest

compliance_dict = defaultdict(dict)
for s in range(0,10):
    #set up dict
    compliance_dict[s]['A'] = 0
    compliance_dict[s]['T'] = 0
    compliance_dict[s]['G'] = 0
    compliance_dict[s]['C'] = 0

def acceptable_balance(counts, str_to_add):
    allowed = defaultdict(list)
    for s in range(0,10):
        ratio_dict = defaultdict()
        total_row = sum(compliance_dict[s].values())
        if total_row == 0:
            allowed[s].extend(['A','T','C','G'])
        else:
            ratio_dict['A'] = compliance_dict[s].get('A')/total_row
            ratio_dict['T'] = compliance_dict[s].get('T')/total_row
            ratio_dict['G'] = compliance_dict[s].get('G')/total_row
            ratio_dict['C'] = compliance_dict[s].get('C')/total_row
            two_lowest = nsmallest(2,ratio_dict,key=lambda x: (ratio_dict.get(x),x))
            for al in two_lowest:
                allowed[s].append(al)
    reject = []
    for s in range(0,10):
        if str_to_add[s] in allowed[s]:
            reject.append(0)
        else:
            reject.append(1)
    if sum(reject) == 0:
        add = True
    else:
        add = False
    return add

def check_balance(count_dict, new_str):
    added = False
    if acceptable_balance(count_dict, new_str):
        for s in range(0,len(new_str)):
            #add count
            count_dict[s][new_str[s]] += 1
        added = True
    return added
First of all, there are 1.07e354 combinations, so brute forcing them is impossible.
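As a quick check of that order of magnitude, the count of ways to pick 384 strings out of 1389 can be computed exactly. This is a small sketch, assuming Python 3.8+ for math.comb; the variable names are illustrative only.

```python
import math

total_strings, chosen = 1389, 384
n_subsets = math.comb(total_strings, chosen)  # exact integer, no rounding

# The value is far too large for a float, so read the magnitude
# straight off the decimal digits instead of converting it.
digits = str(n_subsets)
print(f"{digits[0]}.{digits[1:3]}e{len(digits) - 1}")
```

The mantissa printed here is truncated rather than rounded, which is enough to confirm the scale of the search space.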
Any algorithm which depends on making future decisions based on what strings have been accepted so far is liable to be stuck in a local extremum. For example, what if the next string fits your criteria, but if you were to reject it and wait for the one after it, you'd get a perfect solution? And if you accept the next one, which you will do, the one after it may now be rejected anyway. In the worst case, based on your choices so far no available string will be good anymore and you won't be able to reach a solution.
Your method is very inflexible, because you'll reject any string that doesn't have one of the two most underrepresented bases at every position. You can't even reach a solution unless you relax the criterion considerably, e.g., by allowing a string as long as half of its bases are among the two most underrepresented for their positions. And even then the solution will be very suboptimal.
Solution
I propose an iterative metric minimisation approach. You choose any 384 strings and you leave the rest in a "pool". For each string in your chosen list, you substitute it with each one in the pool and measure whether this improves your metric. If it does, you make the switch. After you have gone through all 384 strings, if your metric has improved, you can begin the process again, else you have converged to a solution.
We can represent each string as a 4x10 table like the one in your question, with 1s in the appropriate places and 0 everywhere else. In fact, it's slightly more efficient if we have a flat array with 40 elements, but the idea is the same. After we sum all 384 such arrays, we get the equivalent of your pandas table. Since the mean is 96 by definition and you want as many elements as possible to be as close to 96 as possible, the standard deviation (SD) is the perfect metric.
import numpy as np

def decompose_strings(strings):
    decomposition = np.zeros((len(strings), 40,))
    strides = dict(zip('ATCG', range(4)))
    for i, string in enumerate(strings):
        for j, value in enumerate(string):
            decomposition[i,10 * strides[value] + j] = 1
    return decomposition

def minimise_variance(table, size):
    idx = list(np.random.choice(range(table.shape[0]), size, replace=False))
    chosen = idx
    pool = [i for i in range(table.shape[0]) if i not in idx]
    print('{0:>10s}{1:>10s}'.format('start', 'end'))
    print('-' * 20)
    std = table[chosen].sum(axis=0).std()
    while True:
        start_std = std
        for i, chosen_idx in enumerate(chosen):
            # for each `i`, the remaining `size` - 1 elements will sum up
            # to the same constant, so we should only calculate it once
            temp_sum = table[chosen].sum(axis=0) - table[chosen_idx]
            j_better = None
            for j, pool_idx in enumerate(pool):
                current_std = (temp_sum + table[pool_idx]).std()
                if current_std < std:
                    std = current_std
                    j_better = j
            if j_better is not None:
                chosen[i] = pool[j_better]
                pool[j_better] = chosen_idx
            else:
                chosen[i] = chosen_idx
        print('{0:10.6f}{1:10.6f}'.format(start_std, std))
        if start_std == std:
            break
    return chosen
And to run it
with open('final_list.txt') as f:
    data = f.read().split('\n')[:-1]
table = decompose_strings(data)
solution = minimise_variance(table, 384)
On average, a solution converges in 4 iterations, with each iteration taking 15 seconds on my machine.
Every solution will have a lot of table values with 96 and a few will be 95 or 97. In fact, each 95 will be paired with a 97, so that the mean can be 96. This means that the number of errors will always be an even number and in this case we can even calculate the SD with np.sqrt(errors / 40).
I collected the results from 200 runs and plotted a histogram of the number of errors (inverting the formula above to compute it from the SD).
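To make the relation between the SD and the error count concrete, here is a small sketch. It assumes the flat 40-element layout used by decompose_strings (index 10 * letter + position, with letters ordered A, T, C, G) and builds a count vector by hand instead of running the optimisation; the variable names are illustrative only.

```python
import numpy as np

# A converged 40-element count vector: 96 everywhere except
# two entries at 97 and two entries at 95.
counts = np.full(40, 96.0)
counts[20] += 1  # C at position 0 -> 97
counts[35] += 1  # G at position 5 -> 97
counts[10] -= 1  # T at position 0 -> 95
counts[15] -= 1  # T at position 5 -> 95

errors = int(np.sum(counts != 96))
sd = counts.std()  # population SD, as used in minimise_variance

# Every error deviates from the mean by exactly 1, so variance = errors / 40.
print(sd, np.sqrt(errors / 40))

# Inverting the formula recovers the error count from the SD.
recovered = round(40 * sd ** 2)
print(recovered)
```

The inversion only holds while every off-target entry is exactly one away from 96, which is the situation described above.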
EDIT
We can do better than that if we chain the solutions. We call the function again and ask it to start with the previously returned solution, but we swap one element for a new one and then let it converge. While it is true that by swapping a random element in we increase the SD and the new solution may even have a higher SD than the previous one, the SD seems to generally be confined in the 10-14 error range. Not only that, but it is very likely the new function call will converge within 2 iterations; one to find something new and one to confirm that there is nothing better.
# just change this
def minimise_variance(table, size):
    idx = list(np.random.choice(range(table.shape[0]), size, replace=False))

# to this
def minimise_variance(table, size, idx=None):
    if not idx:
        idx = list(np.random.choice(range(table.shape[0]), size, replace=False))
    else:
        idx = list(idx)
        # By shuffling the indices we ensure there is no bias
        # in which element is rotated out and which ones are
        # considered first for improvement.
        np.random.shuffle(idx)
        while True:
            switch_idx = np.random.choice(range(table.shape[0]))
            if switch_idx not in idx:
                # if we were to switch out the first element, it's likely
                # the old solution could be found again
                idx[-1] = switch_idx
                break
And run it like so
solutions = [minimise_variance(table, 384)]
for _ in range(1, 10):
    solutions.append(minimise_variance(table, 384, solutions[-1]))
I wrote a C version of this code and collected 100k runs.
There were 22 solutions with 4 errors, all fairly different from one another.
The sorted indices of one of them were
[3, 11, 28, 121, 123, 125, 132, 263, 264, 272, 292, 307, 314, 319, 334, 341, 350, 355, 365, 366, 371, 388, 390, 399, 401, 404, 425, 434, 441, 449, 458, 459, 474, 475, 480, 484, 485, 486, 487, 488, 489, 490, 496, 498, 499, 500, 501, 502, 504, 505, 507, 508, 512, 516, 517, 518, 519, 523, 525, 530, 534, 535, 540, 541, 544, 546, 548, 549, 551, 552, 555, 557, 558, 559, 560, 562, 563, 564, 566, 567, 569, 570, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 586, 587, 589, 591, 593, 600, 611, 633, 643, 647, 655, 658, 659, 665, 667, 668, 669, 672, 674, 679, 680, 683, 686, 693, 697, 715, 718, 720, 723, 724, 725, 729, 732, 735, 736, 737, 741, 742, 749, 751, 753, 755, 758, 760, 764, 765, 766, 767, 771, 772, 773, 775, 779, 780, 782, 783, 786, 787, 789, 790, 791, 798, 801, 806, 807, 808, 810, 811, 814, 816, 817, 820, 822, 823, 825, 826, 827, 830, 831, 832, 834, 835, 836, 840, 843, 845, 846, 847, 849, 850, 853, 855, 858, 867, 871, 874, 884, 887, 889, 897, 900, 905, 912, 915, 918, 941, 946, 956, 958, 959, 966, 971, 975, 976, 980, 984, 986, 988, 990, 991, 996, 999, 1001, 1003, 1011, 1013, 1015, 1016, 1017, 1018, 1020, 1028, 1029, 1032, 1036, 1037, 1038, 1039, 1041, 1042, 1045, 1046, 1047, 1048, 1049, 1050, 1055, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1069, 1071, 1072, 1074, 1075, 1076, 1077, 1078, 1080, 1083, 1084, 1085, 1087, 1089, 1091, 1093, 1095, 1098, 1099, 1103, 1107, 1109, 1110, 1113, 1118, 1119, 1124, 1125, 1126, 1127, 1128, 1130, 1133, 1135, 1136, 1138, 1140, 1141, 1142, 1145, 1146, 1149, 1150, 1152, 1153, 1154, 1156, 1157, 1158, 1159, 1160, 1161, 1162, 1163, 1164, 1165, 1166, 1167, 1169, 1170, 1171, 1173, 1175, 1176, 1178, 1179, 1180, 1181, 1182, 1183, 1184, 1185, 1187, 1188, 1189, 1190, 1191, 1192, 1194, 1196, 1198, 1199, 1201, 1203, 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1213, 1214, 1217, 1218, 1220, 1221, 1222, 1223, 1224, 1225, 1226, 1227, 1230, 1231, 1233, 1234, 1235, 1236, 1240, 1241, 1242, 1243, 1246, 1247, 
1250, 1255, 1257, 1258, 1259, 1260, 1262, 1265, 1266, 1267, 1268, 1276, 1279, 1321]
And its pandas table
    0   1   2   3   4   5   6   7   8   9
A  96  96  96  96  96  96  96  96  96  96
C  97  96  96  96  96  96  96  96  96  96
G  96  96  96  96  96  97  96  96  96  96
T  95  96  96  96  96  95  96  96  96  96
I have a 2D numpy array, z, in which I would like to assign values to nan based on the equation of a line +/- a width of 20. I am trying to implement the Raman 2nd scattering correction as it is done by the eem_remove_scattering method in the eemR package listed here:
https://cran.r-project.org/web/packages/eemR/vignettes/introduction.html
but the method isn't visible.
import numpy as np
ex = np.array([240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300,
305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365,
370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430,
435, 440, 445, 450])
em = np.array([300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324,
326, 328, 330, 332, 334, 336, 338, 340, 342, 344, 346, 348, 350,
352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376,
378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402,
404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 424, 426, 428,
430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454,
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480,
482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 504, 506,
508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532,
534, 536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558,
560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 584,
586, 588, 590, 592, 594, 596, 598, 600])
X, Y = np.meshgrid(ex, em)
z = np.sin(X) + np.cos(Y)
The equation that I would like to apply is em = - 2 ex/ (0.00036*ex-1) + 500.
I want every value in the array that intersects this line (+/- 20) to be set to NaN. It's simple enough to set a single element to NaN, but I haven't been able to find a Python function that applies this equation to the array and sets only the values that intersect the line to NaN.
The desired output would be a new array with the same dimensions as z, but with the values that intersect the line equivalent to nan. Any suggestions on how to proceed are greatly appreciated.
Use np.where in the form np.where( "condition for intersection", np.nan, z):
zi = np.where( np.abs(-2*X/(0.00036*X-1) + 500 - Y) <= 20, np.nan, z)
As a matter of fact, there are no intersections here because (0.00036*ex-1) is close to -1 for all your values, which makes - 2*ex/(0.00036*ex-1) close to 2*ex, and adding 500 brings this over any values you have in em. But in principle this works.
Also, I suspect that the goal you plan to achieve by setting those values to NaN would be better achieved by using a masked array.
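A masked-array version might look roughly like the sketch below. It rebuilds the grids from the question with np.arange, and mask_band, question_line and demo_line are hypothetical names; demo_line is an arbitrary line chosen only so that some cells actually get masked, and it has nothing to do with Raman scattering.

```python
import numpy as np

ex = np.arange(240, 455, 5)   # 240, 245, ..., 450
em = np.arange(300, 602, 2)   # 300, 302, ..., 600
X, Y = np.meshgrid(ex, em)
z = np.sin(X) + np.cos(Y)

def mask_band(z, X, Y, line, width=20):
    """Mask every cell whose emission value lies within `width` of line(X)."""
    return np.ma.masked_where(np.abs(line(X) - Y) <= width, z)

def question_line(x):
    # the equation given in the question
    return -2 * x / (0.00036 * x - 1) + 500

def demo_line(x):
    # arbitrary line, for illustration only
    return x + 100

zm = mask_band(z, X, Y, question_line)
zd = mask_band(z, X, Y, demo_line)

print(np.ma.count_masked(zm))   # cells hit by the question's line
print(np.ma.count_masked(zd))   # cells hit by the illustrative line
print(zd.mean())                # reductions skip the masked cells
z_nan = zd.filled(np.nan)       # convert to a plain NaN array if needed
```

Keeping the mask separate from the data means the original values in z are not overwritten, and filled(np.nan) still produces the NaN array when one is required.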
My (simple) code gives an error at the last line. What am I doing wrong?
The question:
Loop through and print out all even numbers from the numbers list in the same order they are received. Don't print any numbers that come after 237 in the sequence.
What am I doing wrong?
numbers = [951, 402, 984, 651, 360, 69, 408, 319, 601, 485, 980, 507, 725, 547, 544,
615, 83, 165, 141, 501, 263, 617, 865, 575, 219, 390, 984, 592, 236, 105, 942, 941,
386, 462, 47, 418, 907, 344, 236, 375, 823, 566, 597, 978, 328, 615, 953, 345,
399, 162, 758, 219, 918, 237, 412, 566, 826, 248, 866, 950, 626, 949, 687, 217,
815, 67, 104, 58, 512, 24, 892, 894, 767, 553, 81, 379, 843, 831, 445, 742, 717,
958, 609, 842, 451, 688, 753, 854, 685, 93, 857, 440, 380, 126, 721, 328, 753, 470,
743, 527]
# your code goes here
for number in numbers:
    if number <= 237 and number % 2 == 0:
        continue
    print numbers
You need to lose the continue; it moves the loop on to the next iteration instead of stopping it. I think you were looking for break (when you find 237).
Also print number rather than numbers, and call print() as a function for Python 3.
for number in numbers:
    if number == 237:
        break
    if number % 2 == 0:
        print(number)