I have the following dataframe:
df = {'count1': [2.2336, 2.2454, 2.2538, 2.2716999999999996, 2.2798000000000003, 2.2843, 2.2906, 2.2969, 2.3223000000000003, 2.3282, 2.3356999999999997, 2.3544, 2.3651999999999997, 2.3727, 2.3775, 2.3823000000000003, 2.392, 2.4051, 2.4092, 2.4133, 2.4168000000000003, 2.4175, 2.4209, 2.4392, 2.4476, 2.456, 2.461, 2.4723, 2.4776, 2.4882, 2.4989, 2.5095, 2.5221999999999998, 2.5318, 2.5422, 2.5494, 2.559, 2.5654, 2.5814, 2.5878, 2.6238, 2.6178000000000003, 2.624, 2.6303, 2.6366, 2.6425, 2.6481999999999997, 2.6525, 2.6553, 2.663, 2.6712, 2.6898, 2.7051, 2.7144, 2.727, 2.7416, 2.7472, 2.7512, 2.7557, 2.7574, 2.7594000000000003, 2.7636, 2.7699000000000003, 2.7761, 2.7809, 2.7855, 2.7902, 2.7948000000000004, 2.7995, 2.8043, 2.815, 2.8249, 2.8352, 2.8455, 2.8708, 2.8874, 2.9004000000000003, 2.9301, 2.9399, 2.9513000000000003, 2.9634, 2.9745999999999997, 2.9852, 2.9959000000000002, 3.0037, 3.0093, 3.015, 3.0184, 3.0206, 3.0225, 3.0245, 3.0264, 3.0282, 3.0305999999999997, 3.0331, 3.0334, 3.0361, 3.0388, 3.0418000000000003, 3.0443000000000002, 3.0463, 3.0464, 3.0481, 3.0496999999999996, 3.0514, 3.0530999999999997, 3.0544000000000002, 3.0556, 3.0569, 3.0581, 3.0623, 3.0627, 3.0633000000000004, 3.0638, 3.0643000000000002, 3.0648, 3.0652, 3.0656999999999996, 3.0663, 3.0675, 3.0682, 3.0688, 3.0695, 3.0702, 3.0721, 3.0741, 3.0761, 3.078, 3.08, 3.082, 3.0839000000000003, 3.0859, 3.0879000000000003, 3.0898000000000003, 3.0918, 3.0938000000000003, 3.0994, 3.1050999999999997, 3.1144000000000003, 3.1613, 3.1649000000000003, 3.1752, 3.1869, 3.1899, 3.1925, 3.1976, 3.2001, 3.2051999999999996, 3.2098, 3.2123000000000004],
'count2': [3144, 3944, 7888, 4428, 68874, 5480, 56697, 20560, 8744, 91190, 352, 924, 1308611, 480, 51146, 170373, 58792, 11424, 1288673, 1845105, 401464, 657930, 1361172, 199373, 19753, 39082, 776, 7533, 9289, 36731, 53865, 100140, 59274, 35740, 2648, 144998, 78616, 848241, 34579, 216591, 22512, 4024, 17168, 1552, 13760, 8344, 65589, 43104, 44672, 917115, 16256, 4168, 29679, 22571, 7720, 452, 8836, 6888, 18578, 5148, 9289, 442, 214, 485, 3164, 1101, 1010, 9048, 293, 1628, 960, 517, 2362, 1262, 1524, 1173, 1348, 1288, 25568, 8416, 5792, 4944, 504, 4696, 2336, 458, 453, 1220, 1149, 6688, 6956, 7324, 7100, 7784, 5650, 5076, 5336, 6792, 5212, 4592, 5260, 1279, 654, 842, 990, 782, 1412, 1363, 935, 996, 775, 1471, 1525, 1398, 1097, 1082, 1668, 1007, 497, 598, 645, 698, 541, 504, 549, 540, 1568, 514, 578, 2906, 4360, 3916, 11944, 1434, 1589, 732, 641, 477, 307, 1884, 3232, 2408, 1016, 332, 139, 344, 4784, 1784, 1324, 204]}
df = pd.DataFrame(df)
I want to draw a bar plot from it, with count1 on the x axis and count2 on the y axis, using bins that are 0.1 wide.
I used this:
plt.bar(x=df['count1'], y=df['count2'], width=0.1)
But it raises this error:
TypeError: bar() missing 1 required positional argument: 'height'
I'm trying to replicate this R code:
ggplot(df, aes(x= count1,
y= count2)) +
geom_col() +
ylim(0, 2000000) +
scale_x_binned()
That generates the following graph:
To get a histogram from values and counts, you can use the weights= parameter of plt.hist.
To create bins with a width of 0.1, you can use np.arange(...,..., 0.1).
The rwidth=0.9 parameter makes the bars a bit narrower.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = {'count1': [2.2336, 2.2454, 2.2538, 2.2716999999999996, 2.2798000000000003, 2.2843, 2.2906, 2.2969, 2.3223000000000003, 2.3282, 2.3356999999999997, 2.3544, 2.3651999999999997, 2.3727, 2.3775, 2.3823000000000003, 2.392, 2.4051, 2.4092, 2.4133, 2.4168000000000003, 2.4175, 2.4209, 2.4392, 2.4476, 2.456, 2.461, 2.4723, 2.4776, 2.4882, 2.4989, 2.5095, 2.5221999999999998, 2.5318, 2.5422, 2.5494, 2.559, 2.5654, 2.5814, 2.5878, 2.6238, 2.6178000000000003, 2.624, 2.6303, 2.6366, 2.6425, 2.6481999999999997, 2.6525, 2.6553, 2.663, 2.6712, 2.6898, 2.7051, 2.7144, 2.727, 2.7416, 2.7472, 2.7512, 2.7557, 2.7574, 2.7594000000000003, 2.7636, 2.7699000000000003, 2.7761, 2.7809, 2.7855, 2.7902, 2.7948000000000004, 2.7995, 2.8043, 2.815, 2.8249, 2.8352, 2.8455, 2.8708, 2.8874, 2.9004000000000003, 2.9301, 2.9399, 2.9513000000000003, 2.9634, 2.9745999999999997, 2.9852, 2.9959000000000002, 3.0037, 3.0093, 3.015, 3.0184, 3.0206, 3.0225, 3.0245, 3.0264, 3.0282, 3.0305999999999997, 3.0331, 3.0334, 3.0361, 3.0388, 3.0418000000000003, 3.0443000000000002, 3.0463, 3.0464, 3.0481, 3.0496999999999996, 3.0514, 3.0530999999999997, 3.0544000000000002, 3.0556, 3.0569, 3.0581, 3.0623, 3.0627, 3.0633000000000004, 3.0638, 3.0643000000000002, 3.0648, 3.0652, 3.0656999999999996, 3.0663, 3.0675, 3.0682, 3.0688, 3.0695, 3.0702, 3.0721, 3.0741, 3.0761, 3.078, 3.08, 3.082, 3.0839000000000003, 3.0859, 3.0879000000000003, 3.0898000000000003, 3.0918, 3.0938000000000003, 3.0994, 3.1050999999999997, 3.1144000000000003, 3.1613, 3.1649000000000003, 3.1752, 3.1869, 3.1899, 3.1925, 3.1976, 3.2001, 3.2051999999999996, 3.2098, 3.2123000000000004],
'count2': [3144, 3944, 7888, 4428, 68874, 5480, 56697, 20560, 8744, 91190, 352, 924, 1308611, 480, 51146, 170373, 58792, 11424, 1288673, 1845105, 401464, 657930, 1361172, 199373, 19753, 39082, 776, 7533, 9289, 36731, 53865, 100140, 59274, 35740, 2648, 144998, 78616, 848241, 34579, 216591, 22512, 4024, 17168, 1552, 13760, 8344, 65589, 43104, 44672, 917115, 16256, 4168, 29679, 22571, 7720, 452, 8836, 6888, 18578, 5148, 9289, 442, 214, 485, 3164, 1101, 1010, 9048, 293, 1628, 960, 517, 2362, 1262, 1524, 1173, 1348, 1288, 25568, 8416, 5792, 4944, 504, 4696, 2336, 458, 453, 1220, 1149, 6688, 6956, 7324, 7100, 7784, 5650, 5076, 5336, 6792, 5212, 4592, 5260, 1279, 654, 842, 990, 782, 1412, 1363, 935, 996, 775, 1471, 1525, 1398, 1097, 1082, 1668, 1007, 497, 598, 645, 698, 541, 504, 549, 540, 1568, 514, 578, 2906, 4360, 3916, 11944, 1434, 1589, 732, 641, 477, 307, 1884, 3232, 2408, 1016, 332, 139, 344, 4784, 1784, 1324, 204]}
df = pd.DataFrame(df)
bin_start = np.trunc(df['count1'].min() * 10) / 10
bin_end = df['count1'].max() + 0.1
plt.style.use('ggplot')
plt.hist(x=df['count1'], weights=df['count2'], bins=np.arange(bin_start, bin_end, 0.1), rwidth=0.9)
plt.gca().get_yaxis().get_major_formatter().set_scientific(False)
plt.xlabel('count1')
plt.ylabel('count2')
plt.tight_layout()
plt.show()
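To make explicit what weights= is doing, here is a minimal pure-Python sketch (not part of the original answer) that sums count2 within each 0.1-wide bin of count1. The helper name weighted_bins and the four-row sample are illustrative assumptions.

```python
import math
from collections import defaultdict

def weighted_bins(values, weights, width=0.1):
    """Sum the weights falling into each bin [k*width, (k+1)*width)."""
    totals = defaultdict(int)
    for value, weight in zip(values, weights):
        # round() guards against quotients such as 22.999999999 for a value on a bin edge
        index = math.floor(round(value / width, 9))
        totals[index] += weight
    # Return (left edge, total) pairs ordered by bin
    return [(round(index * width, 10), totals[index]) for index in sorted(totals)]

# Four rows taken from the question's data (values rounded to four decimals)
sample_x = [2.2336, 2.2454, 2.3223, 2.3282]
sample_y = [3144, 3944, 8744, 91190]
bins = weighted_bins(sample_x, sample_y)
print(bins)
```

The first two rows land in the bin starting at 2.2 and the last two in the bin starting at 2.3, so the totals are 7088 and 99934. As for the original error: plt.bar takes the bar heights as height, not y, so totals like these could be drawn with plt.bar(x=left_edges, height=totals, width=0.1, align='edge').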
I am making this binary search program in Python and I have a very long list (1000 numbers long).
You are supposed to type in a number yourself, and the program should give you the index of that number in the list (provided the number is in the list). However, the index given is rarely the exact index. Say the actual index of a number is 300; the program will maybe say 301 or 302. It is very close, but not perfect. Sometimes, though, it does give the right answer. I would like it to give the exact correct answer, but I don't know what's wrong. Any help would be appreciated.
print ('BINARY SEARCH PROGRAMME\n')
data = [3079, 1006, 1965, 3275, 1466, 3498, 3606, 8140, 2854, 6241, 7216, 2973, 641, 9911, 7346, 5211, 4851, 4023, 9335, 6645, 7951, 1034, 581, 1585, 5519, 8386, 204, 3700, 1737, 8597, 4922, 7094, 4329, 8766, 1092, 1799, 1151, 8316, 4267, 2368, 9505, 2829, 1986, 3527, 8817, 1013, 6209, 7749, 8152, 3887, 1361, 702, 1888, 6807, 9101, 523, 1862, 46, 5094, 4799, 625, 946, 5684, 5832, 1650, 9902, 2727, 4896, 616, 166, 8065, 5033, 9533, 1833, 220, 9397, 6388, 9185, 4247, 3676, 7673, 2437, 5206, 2130, 4553, 2492, 2719, 6212, 5541, 7550, 4685, 4157, 2336, 5759, 5896, 1327, 4004, 4573, 3213, 9708, 1677, 9583, 8436, 4220, 3229, 1261, 1377, 8365, 2613, 407, 2115, 6429, 9410, 1029, 2224, 6753, 4099, 5966, 9341, 5764, 4381, 6461, 6834, 5089, 6518, 2402, 5685, 7705, 5188, 713, 7147, 6068, 3020, 5723, 4972, 5052, 7133, 8567, 7023, 8086, 9068, 1088, 9025, 7471, 4852, 2624, 7067, 3847, 9886, 8565, 1727, 79, 8563, 376, 9949, 2487, 2155, 9557, 6811, 9175, 9862, 998, 4009, 93, 6923, 4405, 3831, 3780, 6708, 1353, 144, 8801, 6855, 3826, 5253, 2582, 1925, 7419, 5580, 3956, 2320, 7855, 7046, 9992, 4184, 6648, 9706, 3002, 9395, 4265, 122, 8642, 3182, 3106, 2581, 8537, 6644, 1682, 7000, 858, 1686, 1518, 4461, 4728, 6195, 3391, 1929, 9055, 3814, 9391, 2527, 7888, 1464, 9455, 1516, 4464, 9784, 119, 16, 8145, 9683, 3530, 9457, 9310, 8551, 9214, 4501, 205, 9173, 8842, 9488, 4692, 993, 5452, 1807, 9405, 3272, 5295, 2984, 8835, 3534, 6436, 7429, 9729, 8975, 7513, 2453, 6935, 7481, 562, 69, 1169, 2752, 1280, 4829, 6147, 3380, 8296, 6546, 8083, 9585, 8494, 6915, 3320, 5439, 3536, 5309, 8652, 5747, 7073, 9624, 2946, 8055, 5261, 3083, 505, 639, 2859, 3582, 1407, 4027, 4048, 8329, 5454, 8805, 4079, 3192, 4978, 1298, 4873, 697, 3270, 2169, 2966, 1446, 5704, 3144, 9919, 9887, 7336, 2703, 8306, 2263, 5683, 4604, 5871, 6629, 8992, 3031, 4592, 6960, 7192, 1844, 8916, 8910, 7162, 1250, 4043, 7624, 6961, 1695, 2386, 1086, 7790, 3364, 8543, 8013, 2980, 4342, 8500, 453, 5320, 6409, 2923, 1531, 4505, 4089, 
7387, 6440, 8513, 8586, 9131, 1093, 2611, 2804, 7417, 4678, 8924, 8811, 2630, 36, 7957, 7247, 2554, 3052, 8584, 4969, 7557, 9030, 2890, 9299, 7128, 1016, 9978, 3131, 96, 295, 3446, 1571, 2067, 604, 5120, 2644, 693, 6377, 8815, 2473, 2722, 1530, 4299, 3136, 872, 5419, 4654, 5283, 5506, 3715, 1376, 2682, 4107, 4493, 9570, 8898, 3611, 800, 4652, 6895, 6730, 2477, 5802, 8596, 4439, 3061, 2658, 5254, 3940, 194, 5225, 3764, 7181, 8707, 8040, 7433, 3018, 123, 9087, 9019, 8189, 1837, 4200, 4123, 9456, 2465, 6562, 7352, 147, 8706, 9425, 1266, 4971, 8718, 3796, 8876, 6988, 3709, 5625, 3169, 5480, 6721, 274, 2036, 8236, 208, 7621, 805, 2903, 8663, 9986, 3404, 9463, 8103, 5047, 9355, 6120, 7207, 2112, 310, 4747, 3366, 6530, 9347, 7246, 4574, 912, 1555, 4755, 9674, 875, 9880, 4297, 4537, 2710, 8641, 1814, 535, 1870, 6033, 4327, 5011, 8549, 6199, 6676, 9913, 6483, 3470, 3096, 7152, 8338, 346, 9421, 8570, 8068, 9331, 7827, 7622, 472, 4514, 2605, 9647, 7864, 7021, 4797, 1144, 7324, 5835, 4435, 8542, 948, 9573, 7602, 52, 7805, 6322, 371, 3426, 3763, 1715, 2236, 1279, 9868, 9191, 7150, 4379, 4447, 4038, 667, 1265, 9630, 9814, 9592, 556, 8710, 6861, 5864, 114, 127, 4460, 4124, 1639, 2716, 4586, 8071, 109, 1766, 8121, 4120, 7955, 828, 14, 7154, 618, 9636, 9313, 8408, 4418, 1900, 3001, 9820, 811, 9386, 8941, 3681, 2442, 8752, 6477, 331, 597, 5087, 2475, 8951, 3040, 3607, 4305, 7730, 9953, 2991, 6795, 1454, 8171, 4494, 8120, 6554, 9521, 8487, 4240, 1064, 822, 5823, 5756, 5371, 3058, 7416, 788, 7498, 4352, 4142, 5719, 4091, 3316, 4578, 1708, 6888, 3977, 4820, 790, 9108, 430, 6924, 6152, 4409, 8383, 6734, 3601, 6316, 8643, 8903, 6607, 3679, 2921, 1792, 8456, 6913, 1297, 1091, 2083, 3203, 3033, 6159, 3839, 6934, 4688, 1758, 5082, 6251, 7856, 8717, 1315, 2329, 1187, 9631, 2333, 4041, 3840, 2429, 5730, 9152, 2614, 8461, 8544, 1176, 8759, 9598, 2041, 4377, 973, 3744, 4292, 8922, 5838, 6869, 3004, 2513, 1767, 3414, 9715, 9718, 8019, 5282, 9889, 3438, 975, 1081, 1780, 7909, 7376, 7605, 8708, 
9723, 1457, 3666, 4594, 5727, 9294, 6487, 2508, 3600, 9350, 8751, 9121, 5927, 8203, 8250, 3591, 1917, 4513, 9287, 1041, 9342, 7370, 3402, 6871, 4551, 1632, 2413, 6183, 2291, 4176, 7006, 9771, 9568, 5760, 6657, 4239, 6198, 2610, 9207, 8692, 7089, 7866, 4236, 5713, 9385, 7001, 43, 1343, 7958, 2608, 9377, 5075, 3990, 5124, 9728, 8519, 4623, 5613, 7981, 1832, 275, 4375, 1044, 8573, 9782, 738, 421, 2999, 7511, 3888, 1337, 390, 1987, 8735, 2953, 4237, 5234, 9070, 1693, 7769, 2085, 8698, 762, 7563, 7443, 7271, 3989, 7524, 6083, 5398, 2575, 5367, 5686, 870, 7331, 5659, 3363, 3539, 9061, 7802, 9910, 7520, 1707, 2532, 4479, 5517, 7975, 6177, 3116, 653, 9677, 7721, 9969, 6772, 507, 1763, 168, 1924, 6364, 9877, 9485, 2250, 1497, 1395, 5158, 7233, 229, 4119, 8190, 9836, 9654, 3580, 9801, 7668, 2846, 8636, 6397, 3560, 2460, 8509, 6371, 4144, 8850, 5878, 5266, 1211, 9678, 2371, 1123, 8015, 1161, 3242, 2518, 5718, 4643, 314, 3784, 4249, 8554, 9102, 9524, 4795, 2167, 5346, 5644, 5954, 7312, 6770, 9688, 4425, 664, 7364, 5614, 512, 7634, 8812, 367, 3957, 9498, 2463, 6825, 2886, 8610, 5255, 8345, 5850, 4231, 5748, 6248, 6787, 345, 9713, 5923, 6843, 318, 8491, 9841, 9792, 1009, 9603, 2956, 106, 5937, 3650, 4842, 9756, 1995, 1940, 2227, 9619, 7013, 2542, 223, 3345, 4216, 6021, 3465, 3868, 7731, 5851, 7782, 6973, 3503, 4859, 7390, 207, 1208, 3012, 5006, 5505, 2797, 9558, 1109, 1662, 9555, 2996, 5744, 8734, 5005, 2838, 8555, 7132, 5200, 1906, 1757, 9096, 9639, 4884, 3491, 8433, 9591, 7385, 746, 1335, 6941, 5160, 3341, 4893, 4344, 2537, 2212, 4017, 6433, 1441, 7085, 1652, 8133, 3084, 9812, 9218, 3589, 8748, 2747, 4325, 3720, 228, 3238, 6299, 9821, 6565, 3646, 5086, 1477, 3859, 9003, 5642, 9086, 9774, 8535, 736, 8508, 8488, 7035, 9398, 8932, 6175, 8309, 5769, 7191, 22, 6469, 290, 4244, 4078, 8072, 214, 2835, 6737, 4691, 4726, 1291, 1242, 9172, 1559, 1236, 7899, 4882, 1195, 5576, 504, 6652, 3746]
data = sorted(data)
indl = 0
roun = 0
inp = int(input('Enter number to search:\n'))
if inp in data:
    while len(data) > 1:
        if len(data) % 2 == 0:
            half = len(data) / 2
        elif len(data) % 2 == 1:
            half = int(round(float(len(data) / 2)))
        half = int(half)
        if inp < data[half]:
            data = data[:half]
        elif len(data) < 3 and inp == data[half]:
            indl = indl + half
            break
        else:
            while len(data) > half:
                data.remove(data[0])
            indl = indl + int(len(data))
else:
    print ('\nNumber not found\n')
print ('The given number is on index number:\n', indl)
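The drift of one or two positions most likely comes from the slicing: for odd lengths the loop appears to discard a different number of elements than it adds to indl. A common way to sidestep this, sketched here as an assumption rather than a repair of the exact code above, is to leave the list intact and narrow a pair of indices instead. The function name binary_search and the five-element sample are illustrative.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # integer division, so mid is always a valid index
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be to the right of mid
        else:
            hi = mid - 1  # target can only be to the left of mid
    return -1

# First five numbers of the question's list, sorted
data = sorted([3079, 1006, 1965, 3275, 1466])
print(binary_search(data, 3079))
print(binary_search(data, 5000))
```

Because the list is never shortened, the returned position is always an index into the original sorted list. The standard library's bisect.bisect_left offers the same search if writing the loop by hand is not the point of the exercise.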
I have sequence data with the following properties.
1) Few or no repeated elements within a sequence
2) The length of each sequence is constant
3) The length of each sequence is far greater than the total number of sequences.
Let's say following are the sequences.
Input sequences:
A_input=np.array([1799, 2156, 2087, 1454, 515, 199, 1011, 3467, 4210, 3361, 2024,
4641, 497, 3845, 4136, 2978, 1371, 1953, 3611, 1349])
B_input=np.array([1350, 1129, 3681, 4487, 637, 1285, 3412, 1277, 892, 2009, 4401,
1329, 4300, 866, 2201, 3275, 4513, 346, 3164, 1262])
C_input=np.array([ 739, 77, 4818, 2759, 70, 121, 273, 1915, 103, 2983, 3709,
3354, 2856, 3391, 3379, 2593, 3924, 1768, 2650, 2721])
D_input=np.array([1845, 4673, 1419, 1323, 736, 4912, 2104, 2055, 3844, 3219, 2611,
1869, 1369, 1946, 3559, 1445, 3660, 554, 1579, 467])
E_input=np.array([1646, 4461, 944, 211, 3552, 3107, 4602, 3934, 4381, 4959, 4595,
4040, 4834, 2593, 1558, 2760, 1303, 824, 2856, 976])
Output regression:
A_output=0.4
B_output=0.8
C_output=0.1
D_output=0.2
E_output=0.3
I tried a few methods such as SGT, but most of them seem to work only for sequences that contain repetitions.
Is there any method in python to train and predict the regression output values of the input sequences of the specific properties discussed above?
I am starting to learn Python, and I chose to do this problem I found online (as I am just starting to do lists):
"Write a program that adds all integers from 2 to 100,000 to a list. Then, remove the multiples of 2 (but not 2), multiples of 3 (but not 3), and so on, up to and including the multiples of 100,000. Print the remaining values."
So far, the code below is what I have. I have gotten it to work up to 10,000 as a base, but not farther:
# FUNCTIONS
def fillList(list):
    for i in range(2, 10001):
        list.append(i)
    return list

def removeMultiples(list):
    for i in range(10000, 1, -1):
        remove = False
        for j in range(100, 1, -1):
            if i != j and i % j == 0:
                remove = True
        if remove == True:
            list.remove(i)
    return list

# main
def main():
    exampleList = []
    exampleList = fillList(exampleList)
    print(removeMultiples(exampleList))

# PROGRAM RUN
main()
This is the output:
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069, 1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123, 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 1217, 1223, 1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283, 1289, 1291, 1297, 1301, 1303, 1307, 1319, 1321, 1327, 1361, 1367, 1373, 1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439, 1447, 1451, 1453, 1459, 1471, 1481, 1483, 1487, 1489, 1493, 1499, 1511, 1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571, 1579, 1583, 1597, 1601, 1607, 1609, 1613, 1619, 1621, 1627, 1637, 1657, 1663, 1667, 1669, 1693, 1697, 1699, 1709, 1721, 1723, 1733, 1741, 1747, 1753, 1759, 1777, 1783, 1787, 1789, 1801, 1811, 1823, 1831, 1847, 1861, 1867, 1871, 1873, 1877, 1879, 1889, 1901, 1907, 1913, 1931, 1933, 1949, 1951, 1973, 1979, 1987, 1993, 1997, 1999, 2003, 2011, 2017, 2027, 2029, 2039, 2053, 2063, 2069, 2081, 2083, 2087, 2089, 2099, 2111, 2113, 2129, 2131, 2137, 2141, 2143, 2153, 2161, 2179, 2203, 2207, 2213, 2221, 2237, 2239, 2243, 2251, 2267, 2269, 2273, 2281, 2287, 2293, 2297, 2309, 2311, 2333, 2339, 2341, 2347, 2351, 2357, 2371, 2377, 2381, 2383, 2389, 2393, 2399, 2411, 2417, 2423, 2437, 2441, 2447, 2459, 2467, 2473, 
2477, 2503, 2521, 2531, 2539, 2543, 2549, 2551, 2557, 2579, 2591, 2593, 2609, 2617, 2621, 2633, 2647, 2657, 2659, 2663, 2671, 2677, 2683, 2687, 2689, 2693, 2699, 2707, 2711, 2713, 2719, 2729, 2731, 2741, 2749, 2753, 2767, 2777, 2789, 2791, 2797, 2801, 2803, 2819, 2833, 2837, 2843, 2851, 2857, 2861, 2879, 2887, 2897, 2903, 2909, 2917, 2927, 2939, 2953, 2957, 2963, 2969, 2971, 2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061, 3067, 3079, 3083, 3089, 3109, 3119, 3121, 3137, 3163, 3167, 3169, 3181, 3187, 3191, 3203, 3209, 3217, 3221, 3229, 3251, 3253, 3257, 3259, 3271, 3299, 3301, 3307, 3313, 3319, 3323, 3329, 3331, 3343, 3347, 3359, 3361, 3371, 3373, 3389, 3391, 3407, 3413, 3433, 3449, 3457, 3461, 3463, 3467, 3469, 3491, 3499, 3511, 3517, 3527, 3529, 3533, 3539, 3541, 3547, 3557, 3559, 3571, 3581, 3583, 3593, 3607, 3613, 3617, 3623, 3631, 3637, 3643, 3659, 3671, 3673, 3677, 3691, 3697, 3701, 3709, 3719, 3727, 3733, 3739, 3761, 3767, 3769, 3779, 3793, 3797, 3803, 3821, 3823, 3833, 3847, 3851, 3853, 3863, 3877, 3881, 3889, 3907, 3911, 3917, 3919, 3923, 3929, 3931, 3943, 3947, 3967, 3989, 4001, 4003, 4007, 4013, 4019, 4021, 4027, 4049, 4051, 4057, 4073, 4079, 4091, 4093, 4099, 4111, 4127, 4129, 4133, 4139, 4153, 4157, 4159, 4177, 4201, 4211, 4217, 4219, 4229, 4231, 4241, 4243, 4253, 4259, 4261, 4271, 4273, 4283, 4289, 4297, 4327, 4337, 4339, 4349, 4357, 4363, 4373, 4391, 4397, 4409, 4421, 4423, 4441, 4447, 4451, 4457, 4463, 4481, 4483, 4493, 4507, 4513, 4517, 4519, 4523, 4547, 4549, 4561, 4567, 4583, 4591, 4597, 4603, 4621, 4637, 4639, 4643, 4649, 4651, 4657, 4663, 4673, 4679, 4691, 4703, 4721, 4723, 4729, 4733, 4751, 4759, 4783, 4787, 4789, 4793, 4799, 4801, 4813, 4817, 4831, 4861, 4871, 4877, 4889, 4903, 4909, 4919, 4931, 4933, 4937, 4943, 4951, 4957, 4967, 4969, 4973, 4987, 4993, 4999, 5003, 5009, 5011, 5021, 5023, 5039, 5051, 5059, 5077, 5081, 5087, 5099, 5101, 5107, 5113, 5119, 5147, 5153, 5167, 5171, 5179, 5189, 5197, 5209, 5227, 5231, 5233, 5237, 5261, 5273, 
5279, 5281, 5297, 5303, 5309, 5323, 5333, 5347, 5351, 5381, 5387, 5393, 5399, 5407, 5413, 5417, 5419, 5431, 5437, 5441, 5443, 5449, 5471, 5477, 5479, 5483, 5501, 5503, 5507, 5519, 5521, 5527, 5531, 5557, 5563, 5569, 5573, 5581, 5591, 5623, 5639, 5641, 5647, 5651, 5653, 5657, 5659, 5669, 5683, 5689, 5693, 5701, 5711, 5717, 5737, 5741, 5743, 5749, 5779, 5783, 5791, 5801, 5807, 5813, 5821, 5827, 5839, 5843, 5849, 5851, 5857, 5861, 5867, 5869, 5879, 5881, 5897, 5903, 5923, 5927, 5939, 5953, 5981, 5987, 6007, 6011, 6029, 6037, 6043, 6047, 6053, 6067, 6073, 6079, 6089, 6091, 6101, 6113, 6121, 6131, 6133, 6143, 6151, 6163, 6173, 6197, 6199, 6203, 6211, 6217, 6221, 6229, 6247, 6257, 6263, 6269, 6271, 6277, 6287, 6299, 6301, 6311, 6317, 6323, 6329, 6337, 6343, 6353, 6359, 6361, 6367, 6373, 6379, 6389, 6397, 6421, 6427, 6449, 6451, 6469, 6473, 6481, 6491, 6521, 6529, 6547, 6551, 6553, 6563, 6569, 6571, 6577, 6581, 6599, 6607, 6619, 6637, 6653, 6659, 6661, 6673, 6679, 6689, 6691, 6701, 6703, 6709, 6719, 6733, 6737, 6761, 6763, 6779, 6781, 6791, 6793, 6803, 6823, 6827, 6829, 6833, 6841, 6857, 6863, 6869, 6871, 6883, 6899, 6907, 6911, 6917, 6947, 6949, 6959, 6961, 6967, 6971, 6977, 6983, 6991, 6997, 7001, 7013, 7019, 7027, 7039, 7043, 7057, 7069, 7079, 7103, 7109, 7121, 7127, 7129, 7151, 7159, 7177, 7187, 7193, 7207, 7211, 7213, 7219, 7229, 7237, 7243, 7247, 7253, 7283, 7297, 7307, 7309, 7321, 7331, 7333, 7349, 7351, 7369, 7393, 7411, 7417, 7433, 7451, 7457, 7459, 7477, 7481, 7487, 7489, 7499, 7507, 7517, 7523, 7529, 7537, 7541, 7547, 7549, 7559, 7561, 7573, 7577, 7583, 7589, 7591, 7603, 7607, 7621, 7639, 7643, 7649, 7669, 7673, 7681, 7687, 7691, 7699, 7703, 7717, 7723, 7727, 7741, 7753, 7757, 7759, 7789, 7793, 7817, 7823, 7829, 7841, 7853, 7867, 7873, 7877, 7879, 7883, 7901, 7907, 7919, 7927, 7933, 7937, 7949, 7951, 7963, 7993, 8009, 8011, 8017, 8039, 8053, 8059, 8069, 8081, 8087, 8089, 8093, 8101, 8111, 8117, 8123, 8147, 8161, 8167, 8171, 8179, 8191, 8209, 8219, 8221, 8231, 
8233, 8237, 8243, 8263, 8269, 8273, 8287, 8291, 8293, 8297, 8311, 8317, 8329, 8353, 8363, 8369, 8377, 8387, 8389, 8419, 8423, 8429, 8431, 8443, 8447, 8461, 8467, 8501, 8513, 8521, 8527, 8537, 8539, 8543, 8563, 8573, 8581, 8597, 8599, 8609, 8623, 8627, 8629, 8641, 8647, 8663, 8669, 8677, 8681, 8689, 8693, 8699, 8707, 8713, 8719, 8731, 8737, 8741, 8747, 8753, 8761, 8779, 8783, 8803, 8807, 8819, 8821, 8831, 8837, 8839, 8849, 8861, 8863, 8867, 8887, 8893, 8923, 8929, 8933, 8941, 8951, 8963, 8969, 8971, 8999, 9001, 9007, 9011, 9013, 9029, 9041, 9043, 9049, 9059, 9067, 9091, 9103, 9109, 9127, 9133, 9137, 9151, 9157, 9161, 9173, 9181, 9187, 9199, 9203, 9209, 9221, 9227, 9239, 9241, 9257, 9277, 9281, 9283, 9293, 9311, 9319, 9323, 9337, 9341, 9343, 9349, 9371, 9377, 9391, 9397, 9403, 9413, 9419, 9421, 9431, 9433, 9437, 9439, 9461, 9463, 9467, 9473, 9479, 9491, 9497, 9511, 9521, 9533, 9539, 9547, 9551, 9587, 9601, 9613, 9619, 9623, 9629, 9631, 9643, 9649, 9661, 9677, 9679, 9689, 9697, 9719, 9721, 9733, 9739, 9743, 9749, 9767, 9769, 9781, 9787, 9791, 9803, 9811, 9817, 9829, 9833, 9839, 9851, 9857, 9859, 9871, 9883, 9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 9973]
What could I change in my code so that it will print up to 100,000?
There is no integer limit in Python, so what you're doing should work, subject to your computer's memory and your own patience. If it works for 10,000, it will work for 100,000 given your logic. IMO, if you're just learning, I suggest moving on to a new problem; you've basically already solved this one.
What could I change in my code so that it will print up to 100,000?
There are a few issues to consider. First, just changing the 10,000s to 100,000s is obviously insufficient. There's that 100 in there that represents the square root of 10,000, so that needs to become 316 or so, i.e. the square root of 100,000. Which leads to a larger issue: not hardcoding these values!
Given the (inefficient) algorithm you're using, the jump from 10,000 to 100,000 means an execution time jump from a second to nearly a minute and a half! So, your code may be fine with the larger range but possibly your patience isn't, and you forcibly quit the program before it has a chance to complete.
Let's address the above by making the code adjust itself to the range, simplify some of the logic, and make it more efficient:
def fillList(n):
    return list(range(2, n + 1))

def removeMultiples(numbers):  # don't use *list* as a variable name!
    for i in range(len(numbers) - 1, -1, -1):
        number = numbers[i]
        for j in range(int(number ** 0.5), numbers[0] - 1, -1):
            if number % j == 0:
                del numbers[i]
                break
    return numbers

def main():
    exampleList = fillList(100000)
    print(removeMultiples(exampleList))

main()
By deleting our unwanted numbers by index, instead of by value which requires a search, we get the time down from nearly a minute and a half to just over one second:
> time python3 test.py
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149,
151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
233, 239, 241, 251, ..., 99563, 99571, 99577, 99581, 99607, 99611, 99623,
99643, 99661, 99667, 99679, 99689, 99707, 99709, 99713, 99719, 99721,
99733, 99761, 99767, 99787, 99793, 99809, 99817, 99823, 99829, 99833,
99839, 99859, 99871, 99877, 99881, 99901, 99907, 99923, 99929, 99961,
99971, 99989, 99991]
1.153u 0.015s 0:01.18 98.3% 0+0k 0+0io 0pf+0w
>
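The exercise as quoted describes the Sieve of Eratosthenes, so a further option, not part of the answer above, is to cross out multiples in a boolean table instead of testing each number by trial division. The following is a minimal sketch; the function name primes_up_to is an illustrative choice.

```python
def primes_up_to(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    candidate = 2
    while candidate * candidate <= limit:
        if is_prime[candidate]:
            # Multiples below candidate**2 were already crossed out by smaller primes
            for multiple in range(candidate * candidate, limit + 1, candidate):
                is_prime[multiple] = False
        candidate += 1
    return [n for n, flag in enumerate(is_prime) if flag]

primes = primes_up_to(100000)
print(len(primes), primes[:5], primes[-1])
```

Nothing is ever deleted from a list here, and the range limit appears in exactly one place, which also addresses the hardcoding concern raised earlier.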
I'm analyzing a large graph, so I divide it into chunks in the hope that a multi-core CPU will make the analysis faster. However, my model is randomized, so the results of different runs should generally differ. I'm testing the idea and I get the same result every time, so I'm wondering whether my code is correct.
Here's my code
from multiprocessing import Process, Queue

# split a list into evenly sized chunks
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

def multiprocessing_icm(queue, nodes):
    queue.put(independent_cascade_igraph(twitter_igraph, nodes, steps=1))

def dispatch_jobs(data, job_number):
    total = len(data)
    chunk_size = total / job_number
    slice = chunks(data, chunk_size)
    jobs = []
    processes = []
    queue = Queue()
    for i, s in enumerate(slice):
        j = Process(target=multiprocessing_icm, args=(queue, s))
        jobs.append(j)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return queue

dispatch_jobs(['121817564', '121817564'], 2)
If you're wondering what independent_cascade_igraph is, here's the code:
def independent_cascade_igraph(G, seeds, steps=0):
    # init activation probabilities
    for e in G.es():
        if 'act_prob' not in e.attributes():
            e['act_prob'] = 0.1
        elif e['act_prob'] > 1:
            raise Exception("edge activation probability:", e['act_prob'], "cannot be larger than 1")

    # perform diffusion
    A = copy.deepcopy(seeds)  # prevent side effect
    if steps <= 0:
        # perform diffusion until no more nodes can be activated
        return _diffuse_all(G, A)
    # perform diffusion for at most "steps" rounds
    return _diffuse_k_rounds(G, A, steps)

def _diffuse_all(G, A):
    tried_edges = set()
    layer_i_nodes = [ ]
    layer_i_nodes.append([i for i in A])  # prevent side effect
    while True:
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
    return layer_i_nodes

def _diffuse_k_rounds(G, A, steps):
    tried_edges = set()
    layer_i_nodes = [ ]
    layer_i_nodes.append([i for i in A])
    while steps > 0 and len(A) < G.vcount():
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
        steps -= 1
    return layer_i_nodes

def _diffuse_one_round(G, A, tried_edges):
    activated_nodes_of_this_round = set()
    cur_tried_edges = set()
    for s in A:
        for nb in G.successors(s):
            if nb in A or (s, nb) in tried_edges or (s, nb) in cur_tried_edges:
                continue
            if _prop_success(G, s, nb):
                activated_nodes_of_this_round.add(nb)
            cur_tried_edges.add((s, nb))
    activated_nodes_of_this_round = list(activated_nodes_of_this_round)
    A.extend(activated_nodes_of_this_round)
    return A, activated_nodes_of_this_round, cur_tried_edges

def _prop_success(G, src, dest):
    '''
    act_prob = 0.1
    for e in G.es():
        if (src, dest) == e.tuple:
            act_prob = e['act_prob']
            break
    '''
    return random.random() <= 0.1
Here's the result of multiprocessing
[['121817564'], [1538, 1539, 4, 517, 1547, 528, 2066, 1623, 1540, 538, 1199, 31, 1056, 1058, 547, 1061, 1116, 1067, 1069, 563, 1077, 1591, 1972, 1595, 1597, 1598, 1088, 1090, 1608, 1656, 1098, 1463, 1105, 1619, 1622, 1111, 601, 1627, 604, 1629, 606, 95, 612, 101, 1980, 618, 1652, 1897, 1144, 639, 640, 641, 647, 650, 1815, 1677, 143, 1170, 1731, 660, 1173, 1690, 1692, 1562, 1563, 1189, 1702, 687, 689, 1203, 1205, 1719, 703, 1219, 1229, 1744, 376, 1746, 211, 1748, 213, 1238, 218, 221, 735, 227, 1764, 741, 230, 1769, 1258, 1780, 1269, 1783, 761, 763, 1788, 1789, 1287, 769, 258, 1286, 263, 264, 780, 1298, 1299, 1812, 473, 1822, 1828, 806, 811, 1324, 814, 304, 478, 310, 826, 1858, 1349, 326, 327, 1352, 329, 1358, 336, 852, 341, 854, 1879, 1679, 868, 2022, 1385, 1902, 1904, 881, 1907, 1398, 1911, 888, 1940, 1402, 1941, 1920, 1830, 387, 1942, 905, 1931, 1411, 399, 1426, 915, 916, 917, 406, 407, 1433, 1947, 1441, 419, 1445, 1804, 428, 1454, 1455, 948, 1973, 951, 1466, 443, 1468, 1471, 1474, 1988, 966, 1479, 1487, 976, 467, 1870, 2007, 985, 1498, 990, 1504, 1124, 485, 486, 489, 492, 2029, 2033, 1524, 1534, 2038, 1018, 1535, 510, 1125]]
[['121817564'], [1538, 1539, 4, 517, 1547, 528, 2066, 1623, 1540, 538, 1199, 31, 1056, 1058, 547, 1061, 1116, 1067, 1069, 563, 1077, 1591, 1972, 1595, 1597, 1598, 1088, 1090, 1608, 1656, 1098, 1463, 1105, 1619, 1622, 1111, 601, 1627, 604, 1629, 606, 95, 612, 101, 1980, 618, 1652, 1897, 1144, 639, 640, 641, 647, 650, 1815, 1677, 143, 1170, 1731, 660, 1173, 1690, 1692, 1562, 1563, 1189, 1702, 687, 689, 1203, 1205, 1719, 703, 1219, 1229, 1744, 376, 1746, 211, 1748, 213, 1238, 218, 221, 735, 227, 1764, 741, 230, 1769, 1258, 1780, 1269, 1783, 761, 763, 1788, 1789, 1287, 769, 258, 1286, 263, 264, 780, 1298, 1299, 1812, 473, 1822, 1828, 806, 811, 1324, 814, 304, 478, 310, 826, 1858, 1349, 326, 327, 1352, 329, 1358, 336, 852, 341, 854, 1879, 1679, 868, 2022, 1385, 1902, 1904, 881, 1907, 1398, 1911, 888, 1940, 1402, 1941, 1920, 1830, 387, 1942, 905, 1931, 1411, 399, 1426, 915, 916, 917, 406, 407, 1433, 1947, 1441, 419, 1445, 1804, 428, 1454, 1455, 948, 1973, 951, 1466, 443, 1468, 1471, 1474, 1988, 966, 1479, 1487, 976, 467, 1870, 2007, 985, 1498, 990, 1504, 1124, 485, 486, 489, 492, 2029, 2033, 1524, 1534, 2038, 1018, 1535, 510, 1125]]
But here's what I get if I run independent_cascade_igraph twice:
independent_cascade_igraph(twitter_igraph, ['121817564'], steps=1)
[['121817564'],
[514,
1773,
1540,
1878,
2057,
1035,
1550,
2064,
1042,
533,
1558,
1048,
1054,
544,
545,
1061,
1067,
1885,
1072,
350,
1592,
1460,...
independent_cascade_igraph(twitter_igraph, ['121817564'], steps=1)
[['121817564'],
[1027,
2055,
8,
1452,
1546,
1038,
532,
1045,
542,
546,
1059,
549,
1575,
1576,
2030,
1067,
1068,
1071,
564,
573,
575,
1462,
584,
1293,
1105,
595,
599,
1722,
1633,
1634,
614,
1128,
1131,
1286,
621,
1647,
1648,
627,
636,
1662,
1664,
1665,
130,
1671,
1677,
656,
1169,
148,
1686,
1690,
667,
1186,
163,
1700,
1191,
1705,
1711,...
So, what I'm hoping to get out of this is if I have a list of 500 ids, I would like the first CPU to calculate the first 250 and the second CPU to calculate the last 250 and then merge the result. I'm not sure if I understand multiprocessing correctly.
As mentioned e.g. in this SO answer, in *nix child processes inherit the state of the RNG. Call random.seed() in every child process to initialize it yourself to a per-process seed, or randomly.
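A minimal sketch of that advice, assuming the workers use the standard random module as in the question. The names seeded_worker and run_jobs are illustrative, and the list of random draws stands in for the real independent_cascade_igraph call.

```python
import os
import random
from multiprocessing import Process, Queue

def seeded_worker(queue, nodes):
    # Reseed from OS entropy so this child does not reuse the RNG state
    # it inherited from the parent process on fork
    random.seed()
    draws = [random.random() for _ in nodes]  # stand-in for the real computation
    queue.put((os.getpid(), draws))

def run_jobs(chunks):
    queue = Queue()
    jobs = [Process(target=seeded_worker, args=(queue, chunk)) for chunk in chunks]
    for job in jobs:
        job.start()
    # Collect one result per job before joining, so no child blocks on a full pipe
    results = [queue.get() for _ in jobs]
    for job in jobs:
        job.join()
    return results
```

Calling run_jobs([['121817564'], ['121817564']]) from under an if __name__ == "__main__": guard should then return different draws for each process. Passing an explicit per-process value to random.seed() instead would keep the runs reproducible.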
I haven't read your program in detail, but my general feeling is that you probably have a random number generator seed problem. If you run the program twice on the same CPU, the random number generator's state will be different the second time you run it. But if you run it on two different CPUs, maybe your generators are initialized with the same default seed, thus giving the same results.