Wrong number in binary search - python

I am making this binary search program in Python and I have a very long list (1000 numbers long).
You are supposed to type in a number yourself, and the program should give you the index of that number in the list (provided the number is in the list). However, the index it gives is rarely the exact index. Say the actual index of a number is 300; the program might say 301 or 302. It is very close, but not perfect. Sometimes, though, it does give the right answer. I would like it to give the exact correct answer every time, but I don't know what's wrong. Any help would be appreciated.
print ('BINARY SEARCH PROGRAMME\n')
data = [3079, 1006, 1965, 3275, 1466, 3498, 3606, 8140, 2854, 6241, 7216, 2973, 641, 9911, 7346, 5211, 4851, 4023, 9335, 6645, 7951, 1034, 581, 1585, 5519, 8386, 204, 3700, 1737, 8597, 4922, 7094, 4329, 8766, 1092, 1799, 1151, 8316, 4267, 2368, 9505, 2829, 1986, 3527, 8817, 1013, 6209, 7749, 8152, 3887, 1361, 702, 1888, 6807, 9101, 523, 1862, 46, 5094, 4799, 625, 946, 5684, 5832, 1650, 9902, 2727, 4896, 616, 166, 8065, 5033, 9533, 1833, 220, 9397, 6388, 9185, 4247, 3676, 7673, 2437, 5206, 2130, 4553, 2492, 2719, 6212, 5541, 7550, 4685, 4157, 2336, 5759, 5896, 1327, 4004, 4573, 3213, 9708, 1677, 9583, 8436, 4220, 3229, 1261, 1377, 8365, 2613, 407, 2115, 6429, 9410, 1029, 2224, 6753, 4099, 5966, 9341, 5764, 4381, 6461, 6834, 5089, 6518, 2402, 5685, 7705, 5188, 713, 7147, 6068, 3020, 5723, 4972, 5052, 7133, 8567, 7023, 8086, 9068, 1088, 9025, 7471, 4852, 2624, 7067, 3847, 9886, 8565, 1727, 79, 8563, 376, 9949, 2487, 2155, 9557, 6811, 9175, 9862, 998, 4009, 93, 6923, 4405, 3831, 3780, 6708, 1353, 144, 8801, 6855, 3826, 5253, 2582, 1925, 7419, 5580, 3956, 2320, 7855, 7046, 9992, 4184, 6648, 9706, 3002, 9395, 4265, 122, 8642, 3182, 3106, 2581, 8537, 6644, 1682, 7000, 858, 1686, 1518, 4461, 4728, 6195, 3391, 1929, 9055, 3814, 9391, 2527, 7888, 1464, 9455, 1516, 4464, 9784, 119, 16, 8145, 9683, 3530, 9457, 9310, 8551, 9214, 4501, 205, 9173, 8842, 9488, 4692, 993, 5452, 1807, 9405, 3272, 5295, 2984, 8835, 3534, 6436, 7429, 9729, 8975, 7513, 2453, 6935, 7481, 562, 69, 1169, 2752, 1280, 4829, 6147, 3380, 8296, 6546, 8083, 9585, 8494, 6915, 3320, 5439, 3536, 5309, 8652, 5747, 7073, 9624, 2946, 8055, 5261, 3083, 505, 639, 2859, 3582, 1407, 4027, 4048, 8329, 5454, 8805, 4079, 3192, 4978, 1298, 4873, 697, 3270, 2169, 2966, 1446, 5704, 3144, 9919, 9887, 7336, 2703, 8306, 2263, 5683, 4604, 5871, 6629, 8992, 3031, 4592, 6960, 7192, 1844, 8916, 8910, 7162, 1250, 4043, 7624, 6961, 1695, 2386, 1086, 7790, 3364, 8543, 8013, 2980, 4342, 8500, 453, 5320, 6409, 2923, 1531, 4505, 4089, 
7387, 6440, 8513, 8586, 9131, 1093, 2611, 2804, 7417, 4678, 8924, 8811, 2630, 36, 7957, 7247, 2554, 3052, 8584, 4969, 7557, 9030, 2890, 9299, 7128, 1016, 9978, 3131, 96, 295, 3446, 1571, 2067, 604, 5120, 2644, 693, 6377, 8815, 2473, 2722, 1530, 4299, 3136, 872, 5419, 4654, 5283, 5506, 3715, 1376, 2682, 4107, 4493, 9570, 8898, 3611, 800, 4652, 6895, 6730, 2477, 5802, 8596, 4439, 3061, 2658, 5254, 3940, 194, 5225, 3764, 7181, 8707, 8040, 7433, 3018, 123, 9087, 9019, 8189, 1837, 4200, 4123, 9456, 2465, 6562, 7352, 147, 8706, 9425, 1266, 4971, 8718, 3796, 8876, 6988, 3709, 5625, 3169, 5480, 6721, 274, 2036, 8236, 208, 7621, 805, 2903, 8663, 9986, 3404, 9463, 8103, 5047, 9355, 6120, 7207, 2112, 310, 4747, 3366, 6530, 9347, 7246, 4574, 912, 1555, 4755, 9674, 875, 9880, 4297, 4537, 2710, 8641, 1814, 535, 1870, 6033, 4327, 5011, 8549, 6199, 6676, 9913, 6483, 3470, 3096, 7152, 8338, 346, 9421, 8570, 8068, 9331, 7827, 7622, 472, 4514, 2605, 9647, 7864, 7021, 4797, 1144, 7324, 5835, 4435, 8542, 948, 9573, 7602, 52, 7805, 6322, 371, 3426, 3763, 1715, 2236, 1279, 9868, 9191, 7150, 4379, 4447, 4038, 667, 1265, 9630, 9814, 9592, 556, 8710, 6861, 5864, 114, 127, 4460, 4124, 1639, 2716, 4586, 8071, 109, 1766, 8121, 4120, 7955, 828, 14, 7154, 618, 9636, 9313, 8408, 4418, 1900, 3001, 9820, 811, 9386, 8941, 3681, 2442, 8752, 6477, 331, 597, 5087, 2475, 8951, 3040, 3607, 4305, 7730, 9953, 2991, 6795, 1454, 8171, 4494, 8120, 6554, 9521, 8487, 4240, 1064, 822, 5823, 5756, 5371, 3058, 7416, 788, 7498, 4352, 4142, 5719, 4091, 3316, 4578, 1708, 6888, 3977, 4820, 790, 9108, 430, 6924, 6152, 4409, 8383, 6734, 3601, 6316, 8643, 8903, 6607, 3679, 2921, 1792, 8456, 6913, 1297, 1091, 2083, 3203, 3033, 6159, 3839, 6934, 4688, 1758, 5082, 6251, 7856, 8717, 1315, 2329, 1187, 9631, 2333, 4041, 3840, 2429, 5730, 9152, 2614, 8461, 8544, 1176, 8759, 9598, 2041, 4377, 973, 3744, 4292, 8922, 5838, 6869, 3004, 2513, 1767, 3414, 9715, 9718, 8019, 5282, 9889, 3438, 975, 1081, 1780, 7909, 7376, 7605, 8708, 
9723, 1457, 3666, 4594, 5727, 9294, 6487, 2508, 3600, 9350, 8751, 9121, 5927, 8203, 8250, 3591, 1917, 4513, 9287, 1041, 9342, 7370, 3402, 6871, 4551, 1632, 2413, 6183, 2291, 4176, 7006, 9771, 9568, 5760, 6657, 4239, 6198, 2610, 9207, 8692, 7089, 7866, 4236, 5713, 9385, 7001, 43, 1343, 7958, 2608, 9377, 5075, 3990, 5124, 9728, 8519, 4623, 5613, 7981, 1832, 275, 4375, 1044, 8573, 9782, 738, 421, 2999, 7511, 3888, 1337, 390, 1987, 8735, 2953, 4237, 5234, 9070, 1693, 7769, 2085, 8698, 762, 7563, 7443, 7271, 3989, 7524, 6083, 5398, 2575, 5367, 5686, 870, 7331, 5659, 3363, 3539, 9061, 7802, 9910, 7520, 1707, 2532, 4479, 5517, 7975, 6177, 3116, 653, 9677, 7721, 9969, 6772, 507, 1763, 168, 1924, 6364, 9877, 9485, 2250, 1497, 1395, 5158, 7233, 229, 4119, 8190, 9836, 9654, 3580, 9801, 7668, 2846, 8636, 6397, 3560, 2460, 8509, 6371, 4144, 8850, 5878, 5266, 1211, 9678, 2371, 1123, 8015, 1161, 3242, 2518, 5718, 4643, 314, 3784, 4249, 8554, 9102, 9524, 4795, 2167, 5346, 5644, 5954, 7312, 6770, 9688, 4425, 664, 7364, 5614, 512, 7634, 8812, 367, 3957, 9498, 2463, 6825, 2886, 8610, 5255, 8345, 5850, 4231, 5748, 6248, 6787, 345, 9713, 5923, 6843, 318, 8491, 9841, 9792, 1009, 9603, 2956, 106, 5937, 3650, 4842, 9756, 1995, 1940, 2227, 9619, 7013, 2542, 223, 3345, 4216, 6021, 3465, 3868, 7731, 5851, 7782, 6973, 3503, 4859, 7390, 207, 1208, 3012, 5006, 5505, 2797, 9558, 1109, 1662, 9555, 2996, 5744, 8734, 5005, 2838, 8555, 7132, 5200, 1906, 1757, 9096, 9639, 4884, 3491, 8433, 9591, 7385, 746, 1335, 6941, 5160, 3341, 4893, 4344, 2537, 2212, 4017, 6433, 1441, 7085, 1652, 8133, 3084, 9812, 9218, 3589, 8748, 2747, 4325, 3720, 228, 3238, 6299, 9821, 6565, 3646, 5086, 1477, 3859, 9003, 5642, 9086, 9774, 8535, 736, 8508, 8488, 7035, 9398, 8932, 6175, 8309, 5769, 7191, 22, 6469, 290, 4244, 4078, 8072, 214, 2835, 6737, 4691, 4726, 1291, 1242, 9172, 1559, 1236, 7899, 4882, 1195, 5576, 504, 6652, 3746]
data = sorted(data)
indl = 0
roun = 0
inp = int(input('Enter number to search:\n'))
if inp in data:
    while len(data) > 1:
        if len(data) % 2 == 0:
            half = len(data) / 2
        elif len(data) % 2 == 1:
            half = int(round(float(len(data) / 2)))
        half = int(half)
        if inp < data[half]:
            data = data[:half]
        elif len(data) < 3 and inp == data[half]:
            indl = indl + half
            break
        else:
            while len(data) > half:
                data.remove(data[0])
            indl = indl + int(len(data))
else:
    print ('\nNumber not found\n')
print ('The given number is on index number:\n', indl)
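A likely cause of the drift: in the last branch the list is trimmed until `half` elements remain, and the offset `indl` is then increased by the length of what is left rather than by the number of elements that were dropped. For odd lengths those two numbers differ by one, so the error appears to accumulate over the iterations. A minimal sketch that avoids the problem by moving two bounds over the untouched list instead of shrinking it (the function name `binary_search` and the short sample list are illustrative, not from the original program):

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        mid = (low + high) // 2          # integer midpoint, no rounding needed
        if sorted_values[mid] == target:
            return mid
        if sorted_values[mid] < target:
            low = mid + 1                # target can only be to the right
        else:
            high = mid - 1               # target can only be to the left
    return -1


# First seven numbers of the list in the question, sorted
values = sorted([3079, 1006, 1965, 3275, 1466, 3498, 3606])
print(binary_search(values, 3275))  # 4
```

Because the list is never modified, the returned value is always an index into the sorted list. The standard library's `bisect.bisect_left` performs the same kind of search if you would rather not write the loop yourself.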

How to create a histogram from counts with bins spaced every 0.1

I have the following dataframe:
df = {'count1': [2.2336, 2.2454, 2.2538, 2.2716999999999996, 2.2798000000000003, 2.2843, 2.2906, 2.2969, 2.3223000000000003, 2.3282, 2.3356999999999997, 2.3544, 2.3651999999999997, 2.3727, 2.3775, 2.3823000000000003, 2.392, 2.4051, 2.4092, 2.4133, 2.4168000000000003, 2.4175, 2.4209, 2.4392, 2.4476, 2.456, 2.461, 2.4723, 2.4776, 2.4882, 2.4989, 2.5095, 2.5221999999999998, 2.5318, 2.5422, 2.5494, 2.559, 2.5654, 2.5814, 2.5878, 2.6238, 2.6178000000000003, 2.624, 2.6303, 2.6366, 2.6425, 2.6481999999999997, 2.6525, 2.6553, 2.663, 2.6712, 2.6898, 2.7051, 2.7144, 2.727, 2.7416, 2.7472, 2.7512, 2.7557, 2.7574, 2.7594000000000003, 2.7636, 2.7699000000000003, 2.7761, 2.7809, 2.7855, 2.7902, 2.7948000000000004, 2.7995, 2.8043, 2.815, 2.8249, 2.8352, 2.8455, 2.8708, 2.8874, 2.9004000000000003, 2.9301, 2.9399, 2.9513000000000003, 2.9634, 2.9745999999999997, 2.9852, 2.9959000000000002, 3.0037, 3.0093, 3.015, 3.0184, 3.0206, 3.0225, 3.0245, 3.0264, 3.0282, 3.0305999999999997, 3.0331, 3.0334, 3.0361, 3.0388, 3.0418000000000003, 3.0443000000000002, 3.0463, 3.0464, 3.0481, 3.0496999999999996, 3.0514, 3.0530999999999997, 3.0544000000000002, 3.0556, 3.0569, 3.0581, 3.0623, 3.0627, 3.0633000000000004, 3.0638, 3.0643000000000002, 3.0648, 3.0652, 3.0656999999999996, 3.0663, 3.0675, 3.0682, 3.0688, 3.0695, 3.0702, 3.0721, 3.0741, 3.0761, 3.078, 3.08, 3.082, 3.0839000000000003, 3.0859, 3.0879000000000003, 3.0898000000000003, 3.0918, 3.0938000000000003, 3.0994, 3.1050999999999997, 3.1144000000000003, 3.1613, 3.1649000000000003, 3.1752, 3.1869, 3.1899, 3.1925, 3.1976, 3.2001, 3.2051999999999996, 3.2098, 3.2123000000000004],
'count2': [3144, 3944, 7888, 4428, 68874, 5480, 56697, 20560, 8744, 91190, 352, 924, 1308611, 480, 51146, 170373, 58792, 11424, 1288673, 1845105, 401464, 657930, 1361172, 199373, 19753, 39082, 776, 7533, 9289, 36731, 53865, 100140, 59274, 35740, 2648, 144998, 78616, 848241, 34579, 216591, 22512, 4024, 17168, 1552, 13760, 8344, 65589, 43104, 44672, 917115, 16256, 4168, 29679, 22571, 7720, 452, 8836, 6888, 18578, 5148, 9289, 442, 214, 485, 3164, 1101, 1010, 9048, 293, 1628, 960, 517, 2362, 1262, 1524, 1173, 1348, 1288, 25568, 8416, 5792, 4944, 504, 4696, 2336, 458, 453, 1220, 1149, 6688, 6956, 7324, 7100, 7784, 5650, 5076, 5336, 6792, 5212, 4592, 5260, 1279, 654, 842, 990, 782, 1412, 1363, 935, 996, 775, 1471, 1525, 1398, 1097, 1082, 1668, 1007, 497, 598, 645, 698, 541, 504, 549, 540, 1568, 514, 578, 2906, 4360, 3916, 11944, 1434, 1589, 732, 641, 477, 307, 1884, 3232, 2408, 1016, 332, 139, 344, 4784, 1784, 1324, 204]}
df = pd.DataFrame(df)
I want to draw a bar plot from it, with count1 on the x axis and count2 on the y axis, using bins spaced 0.1 apart.
I used this:
plt.bar(x=df['count1'], y=df['count2'], width=0.1)
But it raises this error:
TypeError: bar() missing 1 required positional argument: 'height'
I'm trying to replicate this R code:
ggplot(df, aes(x= count1,
y= count2)) +
geom_col() +
ylim(0, 2000000) +
scale_x_binned()
That generates the following graph:
The TypeError occurs because plt.bar expects the bar heights in a height= argument rather than y=. To get a histogram from values and counts, you can use the weights= parameter of plt.hist.
To create bins with a width of 0.1, you can use np.arange(...,..., 0.1).
The rwidth=0.9 parameter makes the bars a bit narrower.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = {'count1': [2.2336, 2.2454, 2.2538, 2.2716999999999996, 2.2798000000000003, 2.2843, 2.2906, 2.2969, 2.3223000000000003, 2.3282, 2.3356999999999997, 2.3544, 2.3651999999999997, 2.3727, 2.3775, 2.3823000000000003, 2.392, 2.4051, 2.4092, 2.4133, 2.4168000000000003, 2.4175, 2.4209, 2.4392, 2.4476, 2.456, 2.461, 2.4723, 2.4776, 2.4882, 2.4989, 2.5095, 2.5221999999999998, 2.5318, 2.5422, 2.5494, 2.559, 2.5654, 2.5814, 2.5878, 2.6238, 2.6178000000000003, 2.624, 2.6303, 2.6366, 2.6425, 2.6481999999999997, 2.6525, 2.6553, 2.663, 2.6712, 2.6898, 2.7051, 2.7144, 2.727, 2.7416, 2.7472, 2.7512, 2.7557, 2.7574, 2.7594000000000003, 2.7636, 2.7699000000000003, 2.7761, 2.7809, 2.7855, 2.7902, 2.7948000000000004, 2.7995, 2.8043, 2.815, 2.8249, 2.8352, 2.8455, 2.8708, 2.8874, 2.9004000000000003, 2.9301, 2.9399, 2.9513000000000003, 2.9634, 2.9745999999999997, 2.9852, 2.9959000000000002, 3.0037, 3.0093, 3.015, 3.0184, 3.0206, 3.0225, 3.0245, 3.0264, 3.0282, 3.0305999999999997, 3.0331, 3.0334, 3.0361, 3.0388, 3.0418000000000003, 3.0443000000000002, 3.0463, 3.0464, 3.0481, 3.0496999999999996, 3.0514, 3.0530999999999997, 3.0544000000000002, 3.0556, 3.0569, 3.0581, 3.0623, 3.0627, 3.0633000000000004, 3.0638, 3.0643000000000002, 3.0648, 3.0652, 3.0656999999999996, 3.0663, 3.0675, 3.0682, 3.0688, 3.0695, 3.0702, 3.0721, 3.0741, 3.0761, 3.078, 3.08, 3.082, 3.0839000000000003, 3.0859, 3.0879000000000003, 3.0898000000000003, 3.0918, 3.0938000000000003, 3.0994, 3.1050999999999997, 3.1144000000000003, 3.1613, 3.1649000000000003, 3.1752, 3.1869, 3.1899, 3.1925, 3.1976, 3.2001, 3.2051999999999996, 3.2098, 3.2123000000000004],
'count2': [3144, 3944, 7888, 4428, 68874, 5480, 56697, 20560, 8744, 91190, 352, 924, 1308611, 480, 51146, 170373, 58792, 11424, 1288673, 1845105, 401464, 657930, 1361172, 199373, 19753, 39082, 776, 7533, 9289, 36731, 53865, 100140, 59274, 35740, 2648, 144998, 78616, 848241, 34579, 216591, 22512, 4024, 17168, 1552, 13760, 8344, 65589, 43104, 44672, 917115, 16256, 4168, 29679, 22571, 7720, 452, 8836, 6888, 18578, 5148, 9289, 442, 214, 485, 3164, 1101, 1010, 9048, 293, 1628, 960, 517, 2362, 1262, 1524, 1173, 1348, 1288, 25568, 8416, 5792, 4944, 504, 4696, 2336, 458, 453, 1220, 1149, 6688, 6956, 7324, 7100, 7784, 5650, 5076, 5336, 6792, 5212, 4592, 5260, 1279, 654, 842, 990, 782, 1412, 1363, 935, 996, 775, 1471, 1525, 1398, 1097, 1082, 1668, 1007, 497, 598, 645, 698, 541, 504, 549, 540, 1568, 514, 578, 2906, 4360, 3916, 11944, 1434, 1589, 732, 641, 477, 307, 1884, 3232, 2408, 1016, 332, 139, 344, 4784, 1784, 1324, 204]}
df = pd.DataFrame(df)
bin_start = np.trunc(df['count1'].min() * 10) / 10
bin_end = df['count1'].max() + 0.1
plt.style.use('ggplot')
plt.hist(x=df['count1'], weights=df['count2'], bins=np.arange(bin_start, bin_end, 0.1), rwidth=0.9)
plt.gca().get_yaxis().get_major_formatter().set_scientific(False)
plt.xlabel('count1')
plt.ylabel('count2')
plt.tight_layout()
plt.show()
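To see what the weighted histogram above computes, here is a plain-Python sketch of the same idea: every count1 value is assigned to a 0.1-wide bin and the matching count2 values are summed per bin. The helper name `weighted_bins` is made up for illustration, and only the first few values from the question are used; edge handling in matplotlib/NumPy may differ slightly (for example at the last bin edge).

```python
import math


def weighted_bins(values, weights, width=0.1):
    """Sum weights per bin; keys are bin indices k for [k*width, (k+1)*width)."""
    totals = {}
    for value, weight in zip(values, weights):
        # round() guards against float noise such as 2.3 / 0.1 == 22.999999999999996
        index = math.floor(round(value / width, 9))
        totals[index] = totals.get(index, 0) + weight
    return totals


values = [2.2336, 2.2454, 2.3223, 2.4051]   # sample of count1
weights = [3144, 3944, 8744, 11424]         # matching count2
totals = weighted_bins(values, weights)
print(totals)  # {22: 7088, 23: 8744, 24: 11424}
```

Key 22 stands for the bin from 2.2 to 2.3, key 23 for 2.3 to 2.4, and so on; the bar heights in the plot are these per-bin sums.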

Sorting of Co-ordinates and categorize by the position of co-ordinates and not by the score

coords score
[1018, 370, 1345, 370, 1345, 699, 1018, 699, 1018, 370] 0.9988
[1344, 366, 1669, 366, 1669, 690, 1344, 690, 1344, 366] 0.9985
[1341, 688, 1669, 688, 1669, 1012, 1341, 1012, 1341, 688] 0.9985
[2643, 49, 2972, 49, 2972, 362, 2643, 362, 2643, 49] 0.9984
[1018, 1020, 1341, 1020, 1341, 1342, 1018, 1342, 1018, 1020] 0.9984
[2321, 371, 2651, 371, 2651, 696, 2321, 696, 2321, 371] 0.9984
[2970, 1018, 3296, 1018, 3296, 1345, 2970, 1345, 2970, 1018] 0.9984
[1016, 696, 1342, 696, 1342, 1011, 1016, 1011, 1016, 696] 0.9984
[697, 371, 1020, 371, 1020, 693, 697, 693, 697, 371] 0.9984
[1341, 1017, 1668, 1017, 1668, 1348, 1341, 1348, 1341, 1017] 0.9984
[2975, 366, 3300, 366, 3300, 686, 2975, 686, 2975, 366] 0.9984
[2319, 701, 2645, 701, 2645, 1017, 2319, 1017, 2319, 701] 0.9984
[2976, 51, 3298, 51, 3298, 363, 2976, 363, 2976, 51] 0.9984
[2645, 1349, 2971, 1349, 2971, 1665, 2645, 1665, 2645, 1349] 0.9983
[2972, 1659, 3295, 1659, 3295, 1991, 2972, 1991, 2972, 1659] 0.9983
[1013, 1343, 1343, 1343, 1343, 1671, 1013, 1671, 1013, 1343] 0.9983
[3298, 47, 3619, 47, 3619, 359, 3298, 359, 3298, 47] 0.9983
[1676, 367, 1999, 367, 1999, 690, 1676, 690, 1676, 367] 0.9983
[2323, 50, 2644, 50, 2644, 366, 2323, 366, 2323, 50] 0.9983
[2000, 371, 2326, 371, 2326, 691, 2000, 691, 2000, 371] 0.9983
[2650, 372, 2971, 372, 2971, 690, 2650, 690, 2650, 372] 0.9983
[2972, 1348, 3298, 1348, 3298, 1664, 2972, 1664, 2972, 1348] 0.9982
[1019, 1671, 1344, 1671, 1344, 1986, 1019, 1986, 1019, 1671] 0.9982
[2648, 1021, 2971, 1021, 2971, 1340, 2648, 1340, 2648, 1021] 0.9982
[695, 690, 1017, 690, 1017, 1015, 695, 1015, 695, 690] 0.9982
[1998, 52, 2323, 52, 2323, 365, 1998, 365, 1998, 52] 0.9982
[1021, 49, 1342, 49, 1342, 361, 1021, 361, 1021, 49] 0.9982
[2317, 1344, 2645, 1344, 2645, 1666, 2317, 1666, 2317, 1344] 0.9982
[1343, 1670, 1667, 1670, 1667, 1988, 1343, 1988, 1343, 1670] 0.9982
[692, 47, 1019, 47, 1019, 364, 692, 364, 692, 47] 0.9982
[370, 370, 695, 370, 695, 695, 370, 695, 370, 370] 0.9981
[1344, 1347, 1674, 1347, 1674, 1673, 1344, 1673, 1344, 1347] 0.9981
[1670, 53, 1992, 53, 1992, 369, 1670, 369, 1670, 53] 0.9981
[1345, 51, 1667, 51, 1667, 365, 1345, 365, 1345, 51] 0.9981
[3301, 364, 3623, 364, 3623, 692, 3301, 692, 3301, 364] 0.9981
[2646, 692, 2973, 692, 2973, 1014, 2646, 1014, 2646, 692] 0.9981
[1672, 689, 1995, 689, 1995, 1015, 1672, 1015, 1672, 689] 0.9981
[374, 696, 695, 696, 695, 1017, 374, 1017, 374, 696] 0.9980
[1994, 695, 2323, 695, 2323, 1022, 1994, 1022, 1994, 695] 0.9980
[2321, 1667, 2645, 1667, 2645, 1993, 2321, 1993, 2321, 1667] 0.9980
[3300, 694, 3619, 694, 3619, 1016, 3300, 1016, 3300, 694] 0.9980
[372, 1021, 694, 1021, 694, 1337, 372, 1337, 372, 1021] 0.9980
[370, 1671, 691, 1671, 691, 1991, 370, 1991, 370, 1671] 0.9979
[2641, 1671, 2971, 1671, 2971, 1985, 2641, 1985, 2641, 1671] 0.9979
[2315, 1017, 2644, 1017, 2644, 1343, 2315, 1343, 2315, 1017] 0.9979
[694, 1022, 1016, 1022, 1016, 1339, 694, 1339, 694, 1022] 0.9979
[2000, 1672, 2322, 1672, 2322, 1994, 2000, 1994, 2000, 1672] 0.9978
[367, 50, 690, 50, 690, 365, 367, 365, 367, 50] 0.9978
[371, 1339, 692, 1339, 692, 1671, 371, 1671, 371, 1339] 0.9978
[691, 1341, 1016, 1341, 1016, 1668, 691, 1668, 691, 1341] 0.9977
[1996, 1350, 2319, 1350, 2319, 1675, 1996, 1675, 1996, 1350] 0.9977
[1673, 1020, 1996, 1020, 1996, 1348, 1673, 1348, 1673, 1020] 0.9976
[692, 1670, 1019, 1670, 1019, 1989, 692, 1989, 692, 1670] 0.9976
[2000, 1023, 2322, 1023, 2322, 1349, 2000, 1349, 2000, 1023] 0.9976
[1675, 1347, 1995, 1347, 1995, 1671, 1675, 1671, 1675, 1347] 0.9975
[3295, 1344, 3618, 1344, 3618, 1673, 3295, 1673, 3295, 1344] 0.9975
[1673, 1671, 1992, 1671, 1992, 1989, 1673, 1989, 1673, 1671] 0.9975
[3297, 1017, 3617, 1017, 3617, 1340, 3297, 1340, 3297, 1017] 0.9974
[3300, 1673, 3622, 1673, 3622, 1990, 3300, 1990, 3300, 1673] 0.9973
[3620, 51, 3940, 51, 3940, 361, 3620, 361, 3620, 51] 0.9972
[3625, 368, 3947, 368, 3947, 689, 3625, 689, 3625, 368] 0.9969
[3622, 699, 3944, 699, 3944, 1013, 3622, 1013, 3622, 699] 0.9969
[43, 697, 371, 697, 371, 1011, 43, 1011, 43, 697] 0.9967
[43, 1021, 372, 1021, 372, 1342, 43, 1342, 43, 1021] 0.9966
[3622, 1667, 3942, 1667, 3942, 1990, 3622, 1990, 3622, 1667] 0.9961
[3619, 1021, 3938, 1021, 3938, 1339, 3619, 1339, 3619, 1021] 0.9960
[45, 378, 372, 378, 372, 689, 45, 689, 45, 378] 0.9959
[3623, 1348, 3946, 1348, 3946, 1671, 3623, 1671, 3623, 1348] 0.9958
[46, 1667, 372, 1667, 372, 1989, 46, 1989, 46, 1667] 0.9957
[41, 1351, 367, 1351, 367, 1671, 41, 1671, 41, 1351] 0.9957
[43, 49, 370, 49, 370, 362, 43, 362, 43, 49] 0.9957
[2972, 695, 3299, 695, 3299, 1011, 2972, 1011, 2972, 695] 0.9638
Here is the DataFrame that I have. It has 72 rows x 2 columns. (The table above is a snippet of the DataFrame.) If you count the number of cells in this electroluminescence (EL) image of a solar module, you'll note that it has 72 photovoltaic cells.
Column 'coords' holds the coordinates of each polygon segment in the image.
Column 'score' is the accuracy score of the tile (whether it is placed in the desired position) corresponding to the coordinates. The score is not important here; it is simply an output of the model.
Let me explain where these polygon segmentations come from.
I have designed an image segmentation model that outputs the coordinate arrays above, but I have been asked to tag each segmented cell with an (x,y) identity.
The image segmentation model has no concept of location, so the initial result set is sorted by the probability that a cell has been correctly identified.
Now consider the top-left tile as (0,0). Move one cell to the right and that cell is tagged (0,1). Move down one cell from there and you would tag that cell (1,1), and so on.
Basically: How can I process these poly coordinates and end up with the (x,y) identity of each cell?
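One possible approach, sketched under the assumption that the cells form a regular, roughly axis-aligned grid and that the top-left cell is among the detections: take the bounding box of each polygon, estimate the typical cell width and height from the medians, and divide each box centre's offset from the top-left centre by that cell size. The function name `grid_positions` is hypothetical, and the four sample polygons are rows copied from the table above.

```python
from statistics import median


def grid_positions(polygons):
    """Map each flat [x0, y0, x1, y1, ...] polygon to a (row, col) grid cell."""
    boxes = []
    for coords in polygons:
        xs, ys = coords[0::2], coords[1::2]   # even entries are x, odd are y
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    # Typical cell size, robust to a few slightly larger or smaller boxes
    cell_w = median(x1 - x0 for x0, y0, x1, y1 in boxes)
    cell_h = median(y1 - y0 for x0, y0, x1, y1 in boxes)
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    origin_x = min(cx for cx, cy in centres)
    origin_y = min(cy for cx, cy in centres)
    return [
        (round((cy - origin_y) / cell_h), round((cx - origin_x) / cell_w))
        for cx, cy in centres
    ]


sample = [
    [43, 49, 370, 49, 370, 362, 43, 362, 43, 49],
    [367, 50, 690, 50, 690, 365, 367, 365, 367, 50],
    [370, 370, 695, 370, 695, 695, 370, 695, 370, 370],
    [45, 378, 372, 378, 372, 689, 45, 689, 45, 378],
]
print(grid_positions(sample))  # [(0, 0), (0, 1), (1, 1), (1, 0)]
```

The tuples follow the convention in the question: the first number increases when moving down, the second when moving right. If the module is rotated or the spacing is uneven, rounding to the nearest cell may no longer be reliable and a clustering step on the centre coordinates would probably be needed instead.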

Python - Removing Multiples and Finding Prime Numbers

I am starting to learn Python, and I chose to do this problem I found online (as I am just starting to do lists):
"Write a program that adds all integers from 2 to 100,000 to a list. Then, remove the multiples of 2 (but not 2), multiples of 3 (but not 3), and so on, up to and including the multiples of 100,000. Print the remaining values."
So far, the code below is what I have. I have gotten it to work up to 10,000 as a base, but not farther:
# FUNCTIONS
def fillList(list):
    for i in range(2, 10001):
        list.append(i)
    return list
def removeMultiples(list):
    for i in range(10000, 1, -1):
        remove = False
        for j in range(100, 1, -1):
            if i != j and i % j == 0:
                remove = True
        if remove == True:
            list.remove(i)
    return list
# main
def main():
    exampleList = []
    exampleList = fillList(exampleList)
    print(removeMultiples(exampleList))
# PROGRAM RUN
main()
This is the output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069, 1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123, 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 1217, 1223, 1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283, 1289, 1291, 1297, 1301, 1303, 1307, 1319, 1321, 1327, 1361, 1367, 1373, 1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439, 1447, 1451, 1453, 1459, 1471, 1481, 1483, 1487, 1489, 1493, 1499, 1511, 1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571, 1579, 1583, 1597, 1601, 1607, 1609, 1613, 1619, 1621, 1627, 1637, 1657, 1663, 1667, 1669, 1693, 1697, 1699, 1709, 1721, 1723, 1733, 1741, 1747, 1753, 1759, 1777, 1783, 1787, 1789, 1801, 1811, 1823, 1831, 1847, 1861, 1867, 1871, 1873, 1877, 1879, 1889, 1901, 1907, 1913, 1931, 1933, 1949, 1951, 1973, 1979, 1987, 1993, 1997, 1999, 2003, 2011, 2017, 2027, 2029, 2039, 2053, 2063, 2069, 2081, 2083, 2087, 2089, 2099, 2111, 2113, 2129, 2131, 2137, 2141, 2143, 2153, 2161, 2179, 2203, 2207, 2213, 2221, 2237, 2239, 2243, 2251, 2267, 2269, 2273, 2281, 2287, 2293, 2297, 2309, 2311, 2333, 2339, 2341, 2347, 2351, 2357, 2371, 2377, 2381, 2383, 2389, 2393, 2399, 2411, 2417, 2423, 2437, 2441, 2447, 2459, 2467, 2473, 
2477, 2503, 2521, 2531, 2539, 2543, 2549, 2551, 2557, 2579, 2591, 2593, 2609, 2617, 2621, 2633, 2647, 2657, 2659, 2663, 2671, 2677, 2683, 2687, 2689, 2693, 2699, 2707, 2711, 2713, 2719, 2729, 2731, 2741, 2749, 2753, 2767, 2777, 2789, 2791, 2797, 2801, 2803, 2819, 2833, 2837, 2843, 2851, 2857, 2861, 2879, 2887, 2897, 2903, 2909, 2917, 2927, 2939, 2953, 2957, 2963, 2969, 2971, 2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061, 3067, 3079, 3083, 3089, 3109, 3119, 3121, 3137, 3163, 3167, 3169, 3181, 3187, 3191, 3203, 3209, 3217, 3221, 3229, 3251, 3253, 3257, 3259, 3271, 3299, 3301, 3307, 3313, 3319, 3323, 3329, 3331, 3343, 3347, 3359, 3361, 3371, 3373, 3389, 3391, 3407, 3413, 3433, 3449, 3457, 3461, 3463, 3467, 3469, 3491, 3499, 3511, 3517, 3527, 3529, 3533, 3539, 3541, 3547, 3557, 3559, 3571, 3581, 3583, 3593, 3607, 3613, 3617, 3623, 3631, 3637, 3643, 3659, 3671, 3673, 3677, 3691, 3697, 3701, 3709, 3719, 3727, 3733, 3739, 3761, 3767, 3769, 3779, 3793, 3797, 3803, 3821, 3823, 3833, 3847, 3851, 3853, 3863, 3877, 3881, 3889, 3907, 3911, 3917, 3919, 3923, 3929, 3931, 3943, 3947, 3967, 3989, 4001, 4003, 4007, 4013, 4019, 4021, 4027, 4049, 4051, 4057, 4073, 4079, 4091, 4093, 4099, 4111, 4127, 4129, 4133, 4139, 4153, 4157, 4159, 4177, 4201, 4211, 4217, 4219, 4229, 4231, 4241, 4243, 4253, 4259, 4261, 4271, 4273, 4283, 4289, 4297, 4327, 4337, 4339, 4349, 4357, 4363, 4373, 4391, 4397, 4409, 4421, 4423, 4441, 4447, 4451, 4457, 4463, 4481, 4483, 4493, 4507, 4513, 4517, 4519, 4523, 4547, 4549, 4561, 4567, 4583, 4591, 4597, 4603, 4621, 4637, 4639, 4643, 4649, 4651, 4657, 4663, 4673, 4679, 4691, 4703, 4721, 4723, 4729, 4733, 4751, 4759, 4783, 4787, 4789, 4793, 4799, 4801, 4813, 4817, 4831, 4861, 4871, 4877, 4889, 4903, 4909, 4919, 4931, 4933, 4937, 4943, 4951, 4957, 4967, 4969, 4973, 4987, 4993, 4999, 5003, 5009, 5011, 5021, 5023, 5039, 5051, 5059, 5077, 5081, 5087, 5099, 5101, 5107, 5113, 5119, 5147, 5153, 5167, 5171, 5179, 5189, 5197, 5209, 5227, 5231, 5233, 5237, 5261, 5273, 
5279, 5281, 5297, 5303, 5309, 5323, 5333, 5347, 5351, 5381, 5387, 5393, 5399, 5407, 5413, 5417, 5419, 5431, 5437, 5441, 5443, 5449, 5471, 5477, 5479, 5483, 5501, 5503, 5507, 5519, 5521, 5527, 5531, 5557, 5563, 5569, 5573, 5581, 5591, 5623, 5639, 5641, 5647, 5651, 5653, 5657, 5659, 5669, 5683, 5689, 5693, 5701, 5711, 5717, 5737, 5741, 5743, 5749, 5779, 5783, 5791, 5801, 5807, 5813, 5821, 5827, 5839, 5843, 5849, 5851, 5857, 5861, 5867, 5869, 5879, 5881, 5897, 5903, 5923, 5927, 5939, 5953, 5981, 5987, 6007, 6011, 6029, 6037, 6043, 6047, 6053, 6067, 6073, 6079, 6089, 6091, 6101, 6113, 6121, 6131, 6133, 6143, 6151, 6163, 6173, 6197, 6199, 6203, 6211, 6217, 6221, 6229, 6247, 6257, 6263, 6269, 6271, 6277, 6287, 6299, 6301, 6311, 6317, 6323, 6329, 6337, 6343, 6353, 6359, 6361, 6367, 6373, 6379, 6389, 6397, 6421, 6427, 6449, 6451, 6469, 6473, 6481, 6491, 6521, 6529, 6547, 6551, 6553, 6563, 6569, 6571, 6577, 6581, 6599, 6607, 6619, 6637, 6653, 6659, 6661, 6673, 6679, 6689, 6691, 6701, 6703, 6709, 6719, 6733, 6737, 6761, 6763, 6779, 6781, 6791, 6793, 6803, 6823, 6827, 6829, 6833, 6841, 6857, 6863, 6869, 6871, 6883, 6899, 6907, 6911, 6917, 6947, 6949, 6959, 6961, 6967, 6971, 6977, 6983, 6991, 6997, 7001, 7013, 7019, 7027, 7039, 7043, 7057, 7069, 7079, 7103, 7109, 7121, 7127, 7129, 7151, 7159, 7177, 7187, 7193, 7207, 7211, 7213, 7219, 7229, 7237, 7243, 7247, 7253, 7283, 7297, 7307, 7309, 7321, 7331, 7333, 7349, 7351, 7369, 7393, 7411, 7417, 7433, 7451, 7457, 7459, 7477, 7481, 7487, 7489, 7499, 7507, 7517, 7523, 7529, 7537, 7541, 7547, 7549, 7559, 7561, 7573, 7577, 7583, 7589, 7591, 7603, 7607, 7621, 7639, 7643, 7649, 7669, 7673, 7681, 7687, 7691, 7699, 7703, 7717, 7723, 7727, 7741, 7753, 7757, 7759, 7789, 7793, 7817, 7823, 7829, 7841, 7853, 7867, 7873, 7877, 7879, 7883, 7901, 7907, 7919, 7927, 7933, 7937, 7949, 7951, 7963, 7993, 8009, 8011, 8017, 8039, 8053, 8059, 8069, 8081, 8087, 8089, 8093, 8101, 8111, 8117, 8123, 8147, 8161, 8167, 8171, 8179, 8191, 8209, 8219, 8221, 8231, 
8233, 8237, 8243, 8263, 8269, 8273, 8287, 8291, 8293, 8297, 8311, 8317, 8329, 8353, 8363, 8369, 8377, 8387, 8389, 8419, 8423, 8429, 8431, 8443, 8447, 8461, 8467, 8501, 8513, 8521, 8527, 8537, 8539, 8543, 8563, 8573, 8581, 8597, 8599, 8609, 8623, 8627, 8629, 8641, 8647, 8663, 8669, 8677, 8681, 8689, 8693, 8699, 8707, 8713, 8719, 8731, 8737, 8741, 8747, 8753, 8761, 8779, 8783, 8803, 8807, 8819, 8821, 8831, 8837, 8839, 8849, 8861, 8863, 8867, 8887, 8893, 8923, 8929, 8933, 8941, 8951, 8963, 8969, 8971, 8999, 9001, 9007, 9011, 9013, 9029, 9041, 9043, 9049, 9059, 9067, 9091, 9103, 9109, 9127, 9133, 9137, 9151, 9157, 9161, 9173, 9181, 9187, 9199, 9203, 9209, 9221, 9227, 9239, 9241, 9257, 9277, 9281, 9283, 9293, 9311, 9319, 9323, 9337, 9341, 9343, 9349, 9371, 9377, 9391, 9397, 9403, 9413, 9419, 9421, 9431, 9433, 9437, 9439, 9461, 9463, 9467, 9473, 9479, 9491, 9497, 9511, 9521, 9533, 9539, 9547, 9551, 9587, 9601, 9613, 9619, 9623, 9629, 9631, 9643, 9649, 9661, 9677, 9679, 9689, 9697, 9719, 9721, 9733, 9739, 9743, 9749, 9767, 9769, 9781, 9787, 9791, 9803, 9811, 9817, 9829, 9833, 9839, 9851, 9857, 9859, 9871, 9883, 9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
What could I change in my code so that it will print up to 100,000?
There is no integer limit in Python so what you're doing should work fine depending on your computer's memory and your own patience. If it works for 10,000 it will work for 100,000 given your logic. IMO, if you're just learning I suggest moving on to a new problem, you've basically already solved this one.
What could I change in my code so that it will print up to 100,000?
There are a few issues to consider. First, just changing the 10,000's to 100,000's is obviously insufficient. There's that 100 in there, which represents the square root of 10,000, so it needs to become 316 or so, i.e. the square root of 100,000. This leads to a larger issue: not hardcoding these values!
Given the (inefficient) algorithm you're using, the jump from 10,000 to 100,000 means an execution time jump from a second to nearly a minute and a half! So, your code may be fine with the larger range but possibly your patience isn't, and you forcibly quit the program before it has a chance to complete.
Let's address the above by making the code adjust itself to the range, simplify some of the logic, and make it more efficient:
def fillList(n):
    return list(range(2, n + 1))
def removeMultiples(numbers): # don't use *list* as a variable name!
    for i in range(len(numbers) - 1, -1, -1):
        number = numbers[i]
        for j in range(int(number ** 0.5), numbers[0] - 1, -1):
            if number % j == 0:
                del numbers[i]
                break
    return numbers
def main():
    exampleList = fillList(100000)
    print(removeMultiples(exampleList))
main()
By deleting our unwanted numbers by index, instead of by value which requires a search, we get the time down from nearly a minute and a half to just over one second:
> time python3 test.py
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
233, 239, 241, 251, ..., 99563, 99571, 99577, 99581, 99607, 99611, 99623,
99643, 99661, 99667, 99679, 99689, 99707, 99709, 99713, 99719, 99721,
99733, 99761, 99767, 99787, 99793, 99809, 99817, 99823, 99829, 99833,
99839, 99859, 99871, 99877, 99881, 99901, 99907, 99923, 99929, 99961,
99971, 99989, 99991]
1.153u 0.015s 0:01.18 98.3% 0+0k 0+0io 0pf+0w
>
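For comparison, the exercise as worded ("remove the multiples of 2, then of 3, and so on") is essentially the Sieve of Eratosthenes. A minimal sketch of that technique, which is not the approach used in the code above (the function name `sieve` is illustrative): instead of testing each number for divisors, it crosses out the multiples of every number that is still marked prime.

```python
def sieve(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]            # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Multiples below n*n were already crossed out by smaller primes
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


primes = sieve(100000)
print(len(primes), primes[:5], primes[-1])  # 9592 [2, 3, 5, 7, 11] 99991
```

Nothing is deleted from the middle of a list here, so there is no repeated shifting of elements.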

Can't get the pandas dataframe output without index

signals is a pandas DataFrame where I store buy and sell signals. With the get_buySignal function, I can find the dates on which a buy signal was generated and store them in the vector_buy array. Then, with the get_close function, I want to get the close prices (from the sh600004 DataFrame) of the 20 days before each 'buy day' and store them in vector_close.
I printed vector_close to check whether my code is correct, but I got very odd output. The output for the if i < 20 branch of the get_close function is an array that contains only close prices, but the output for the else branch contains both the close prices and the datetime index, as shown in the picture at the bottom.
vector_buy = []
vector_close = []
def get_buySignal():
    global vector_buy
    list_buy = []
    for i in range(0, len(signals.index)):
        if signals['positions'][i]==1.0:
            list_buy.append(i)
    vector_buy = np.array(list_buy)
def get_close():
    global vector_close
    close_list = []
    for i in vector_buy:
        if i < 20:
            close_list.append(sh600004['close'][range(0,i)])
            vector_close = np.array(close_list)
            print(vector_close)
        else:
            close_list.append(sh600004['close'][range(i-19,i)])
            vector_close = np.array(close_list)
            print(vector_close)
get_buySignal()
get_close()
Here's the output for vector_buy:
array([ 10, 37, 47, 57, 82, 94, 102, 148, 165, 179, 188,
201, 248, 260, 270, 272, 290, 299, 331, 350, 361, 373,
409, 417, 435, 449, 457, 465, 491, 511, 527, 536, 555,
571, 592, 609, 634, 661, 672, 706, 718, 735, 776, 794,
807, 838, 854, 870, 890, 907, 915, 934, 969, 1004, 1013,
1020, 1032, 1034, 1039, 1050, 1099, 1116, 1140, 1157, 1202, 1214,
1228, 1238, 1257, 1276, 1297, 1311, 1319, 1347, 1376, 1379, 1389,
1406, 1425, 1455, 1460, 1478, 1492, 1518, 1533, 1545, 1559, 1574,
1590, 1615, 1627, 1657, 1683, 1692, 1704, 1731, 1742, 1758, 1775,
1795, 1824, 1836, 1852, 1864, 1905, 1913, 1950, 1966, 1986, 1999,
2005, 2020, 2046, 2079, 2088, 2108, 2113, 2124, 2145, 2154, 2166,
2178, 2218, 2234, 2244, 2251, 2302, 2309, 2311, 2324, 2339, 2351,
2372, 2387, 2397, 2408, 2422, 2446, 2462])
Maybe you could simplify your code and use the power of pandas to solve your problem. If I understood correctly, you'd like to get all the indices of signals where signals['positions'] == 1. So you shouldn't use the for loop; try this instead:
list_buy = signals[signals['positions'] == 1].index
Converting this into a NumPy array is easy:
vector_buy = list_buy.to_numpy()  # on older pandas versions, use list_buy.values
To get the last 20 entries for vector_close, try something like this:
vector_close = sh600004['close'].tail(20)
or, for the first 20 entries:
vector_close = sh600004['close'].head(20)
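A minimal sketch of how the closes before each buy day could be collected with positional slicing. The sample frames below are made-up stand-ins for signals and sh600004, and closes_before and its window parameter are hypothetical names, not part of the question. Indexing a Series with a range returns another Series (index included), which is likely why the printed arrays show the datetime index; calling .to_numpy() on an iloc slice keeps only the prices.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the question's `signals` and `sh600004` frames.
dates = pd.date_range("2020-01-01", periods=60, freq="D")
sh600004 = pd.DataFrame({"close": np.arange(60, dtype=float)}, index=dates)
positions = np.zeros(60)
positions[[10, 37, 47]] = 1.0
signals = pd.DataFrame({"positions": positions}, index=dates)

# Integer row positions of the buy days (not the date labels).
vector_buy = np.flatnonzero(signals["positions"].to_numpy() == 1.0)

def closes_before(close, pos, window=20):
    """Return up to `window` close prices preceding row `pos` as a plain array."""
    start = max(0, pos - window)  # clamp so early buy days do not wrap around
    return close.iloc[start:pos].to_numpy()

# One array per buy day; early buy days yield shorter arrays.
vector_close = [closes_before(sh600004["close"], p) for p in vector_buy]

if __name__ == "__main__":
    for p, closes in zip(vector_buy, vector_close):
        print(p, len(closes))
```

Keeping vector_close as a list of arrays avoids forcing rows of different lengths into a single NumPy array.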

Why does my Python multiprocessing code return the same result with randomized numbers? [duplicate]

I'm analyzing a large graph, so I divide the graph into chunks, hoping that a multi-core CPU will make it faster. However, my model is randomized, so the results of each run usually won't be the same. I'm testing the idea and I get the same result all the time, so I'm wondering if my code is correct.
Here's my code
from multiprocessing import Process, Queue
# split a list into evenly sized chunks
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]
def multiprocessing_icm(queue, nodes):
    queue.put(independent_cascade_igraph(twitter_igraph, nodes, steps=1))
def dispatch_jobs(data, job_number):
    total = len(data)
    chunk_size = total // job_number  # integer division so range() gets an int step
    slice = chunks(data, chunk_size)
    jobs = []
    processes = []
    queue = Queue()
    for i, s in enumerate(slice):
        j = Process(target=multiprocessing_icm, args=(queue, s))
        jobs.append(j)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return queue
dispatch_jobs(['121817564', '121817564'], 2)
If you're wondering what independent_cascade_igraph is, here's the code:
import copy
import random
def independent_cascade_igraph(G, seeds, steps=0):
    # init activation probabilities
    for e in G.es():
        if 'act_prob' not in e.attributes():
            e['act_prob'] = 0.1
        elif e['act_prob'] > 1:
            raise Exception("edge activation probability:", e['act_prob'], "cannot be larger than 1")
    # perform diffusion
    A = copy.deepcopy(seeds) # prevent side effect
    if steps <= 0:
        # perform diffusion until no more nodes can be activated
        return _diffuse_all(G, A)
    # perform diffusion for at most "steps" rounds
    return _diffuse_k_rounds(G, A, steps)
def _diffuse_all(G, A):
    tried_edges = set()
    layer_i_nodes = [ ]
    layer_i_nodes.append([i for i in A]) # prevent side effect
    while True:
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
    return layer_i_nodes
def _diffuse_k_rounds(G, A, steps):
    tried_edges = set()
    layer_i_nodes = [ ]
    layer_i_nodes.append([i for i in A])
    while steps > 0 and len(A) < G.vcount():
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
        steps -= 1
    return layer_i_nodes
def _diffuse_one_round(G, A, tried_edges):
    activated_nodes_of_this_round = set()
    cur_tried_edges = set()
    for s in A:
        for nb in G.successors(s):
            if nb in A or (s, nb) in tried_edges or (s, nb) in cur_tried_edges:
                continue
            if _prop_success(G, s, nb):
                activated_nodes_of_this_round.add(nb)
            cur_tried_edges.add((s, nb))
    activated_nodes_of_this_round = list(activated_nodes_of_this_round)
    A.extend(activated_nodes_of_this_round)
    return A, activated_nodes_of_this_round, cur_tried_edges
def _prop_success(G, src, dest):
    '''
    act_prob = 0.1
    for e in G.es():
        if (src, dest) == e.tuple:
            act_prob = e['act_prob']
            break
    '''
    return random.random() <= 0.1
Here's the result of multiprocessing
[['121817564'], [1538, 1539, 4, 517, 1547, 528, 2066, 1623, 1540, 538, 1199, 31, 1056, 1058, 547, 1061, 1116, 1067, 1069, 563, 1077, 1591, 1972, 1595, 1597, 1598, 1088, 1090, 1608, 1656, 1098, 1463, 1105, 1619, 1622, 1111, 601, 1627, 604, 1629, 606, 95, 612, 101, 1980, 618, 1652, 1897, 1144, 639, 640, 641, 647, 650, 1815, 1677, 143, 1170, 1731, 660, 1173, 1690, 1692, 1562, 1563, 1189, 1702, 687, 689, 1203, 1205, 1719, 703, 1219, 1229, 1744, 376, 1746, 211, 1748, 213, 1238, 218, 221, 735, 227, 1764, 741, 230, 1769, 1258, 1780, 1269, 1783, 761, 763, 1788, 1789, 1287, 769, 258, 1286, 263, 264, 780, 1298, 1299, 1812, 473, 1822, 1828, 806, 811, 1324, 814, 304, 478, 310, 826, 1858, 1349, 326, 327, 1352, 329, 1358, 336, 852, 341, 854, 1879, 1679, 868, 2022, 1385, 1902, 1904, 881, 1907, 1398, 1911, 888, 1940, 1402, 1941, 1920, 1830, 387, 1942, 905, 1931, 1411, 399, 1426, 915, 916, 917, 406, 407, 1433, 1947, 1441, 419, 1445, 1804, 428, 1454, 1455, 948, 1973, 951, 1466, 443, 1468, 1471, 1474, 1988, 966, 1479, 1487, 976, 467, 1870, 2007, 985, 1498, 990, 1504, 1124, 485, 486, 489, 492, 2029, 2033, 1524, 1534, 2038, 1018, 1535, 510, 1125]]
[['121817564'], [1538, 1539, 4, 517, 1547, 528, 2066, 1623, 1540, 538, 1199, 31, 1056, 1058, 547, 1061, 1116, 1067, 1069, 563, 1077, 1591, 1972, 1595, 1597, 1598, 1088, 1090, 1608, 1656, 1098, 1463, 1105, 1619, 1622, 1111, 601, 1627, 604, 1629, 606, 95, 612, 101, 1980, 618, 1652, 1897, 1144, 639, 640, 641, 647, 650, 1815, 1677, 143, 1170, 1731, 660, 1173, 1690, 1692, 1562, 1563, 1189, 1702, 687, 689, 1203, 1205, 1719, 703, 1219, 1229, 1744, 376, 1746, 211, 1748, 213, 1238, 218, 221, 735, 227, 1764, 741, 230, 1769, 1258, 1780, 1269, 1783, 761, 763, 1788, 1789, 1287, 769, 258, 1286, 263, 264, 780, 1298, 1299, 1812, 473, 1822, 1828, 806, 811, 1324, 814, 304, 478, 310, 826, 1858, 1349, 326, 327, 1352, 329, 1358, 336, 852, 341, 854, 1879, 1679, 868, 2022, 1385, 1902, 1904, 881, 1907, 1398, 1911, 888, 1940, 1402, 1941, 1920, 1830, 387, 1942, 905, 1931, 1411, 399, 1426, 915, 916, 917, 406, 407, 1433, 1947, 1441, 419, 1445, 1804, 428, 1454, 1455, 948, 1973, 951, 1466, 443, 1468, 1471, 1474, 1988, 966, 1479, 1487, 976, 467, 1870, 2007, 985, 1498, 990, 1504, 1124, 485, 486, 489, 492, 2029, 2033, 1524, 1534, 2038, 1018, 1535, 510, 1125]]
But here's the example if I run independent_cascade_igraph twice:
independent_cascade_igraph(twitter_igraph, ['121817564'], steps=1)
[['121817564'],
[514,
1773,
1540,
1878,
2057,
1035,
1550,
2064,
1042,
533,
1558,
1048,
1054,
544,
545,
1061,
1067,
1885,
1072,
350,
1592,
1460,...
independent_cascade_igraph(twitter_igraph, ['121817564'], steps=1)
[['121817564'],
[1027,
2055,
8,
1452,
1546,
1038,
532,
1045,
542,
546,
1059,
549,
1575,
1576,
2030,
1067,
1068,
1071,
564,
573,
575,
1462,
584,
1293,
1105,
595,
599,
1722,
1633,
1634,
614,
1128,
1131,
1286,
621,
1647,
1648,
627,
636,
1662,
1664,
1665,
130,
1671,
1677,
656,
1169,
148,
1686,
1690,
667,
1186,
163,
1700,
1191,
1705,
1711,...
So, what I'm hoping to get out of this is if I have a list of 500 ids, I would like the first CPU to calculate the first 250 and the second CPU to calculate the last 250 and then merge the result. I'm not sure if I understand multiprocessing correctly.
As mentioned, e.g., in this SO answer, on *nix, child processes inherit the state of the RNG from the parent. Call random.seed() in every child process to initialize it yourself, either with a per-process seed or randomly.
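A minimal sketch of per-process seeding, assuming the standard random module and the "fork" start method (so *nix only). The names worker and run_seeded are hypothetical and not taken from the question. Whether children really start from identical RNG state depends on the Python version, the start method and the RNG library in use, so treat this as an illustration of the fix rather than a reproduction of the problem.

```python
import multiprocessing as mp
import random

def worker(queue, seed):
    # Give this process its own RNG state before any random draws.
    # Calling random.seed() with no argument would seed from OS entropy instead.
    random.seed(seed)
    queue.put((seed, random.random()))

def run_seeded(seeds):
    # "fork" is the *nix start method; it is not available on Windows.
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(queue, s)) for s in seeds]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    # One distinct seed per child; each child then draws from its own stream.
    print(run_seeded([1, 2]))
```

Passing explicit seeds keeps runs reproducible; switching to an argument-free random.seed() in worker gives different results on every run.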
I haven't read your program in detail, but my general feeling is that you probably have a random number generator seed problem. If you run the program twice on the same CPU, the random number generator's state will be different the second time you run it. But if you run it on two different CPUs, maybe your generators are initialized with the same default seed, thus giving the same results.
