I have a list like this:
['|', 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, '|', 840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900, '|', 960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 
1045, 1046, 1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, '|']
And I'd like to convert it so that the elements between each pair of "|" symbols are put into a nested list instead.
So what I'd like it to look like is:
[[480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720], [840, 841, 842, 843, 844, 845, 846, 847, 848, 849, 850, 851, 852, 853, 854, 855, 856, 857, 858, 859, 860, 861, 862, 863, 864, 865, 866, 867, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 891, 892, 893, 894, 895, 896, 897, 898, 899, 900], [960, 961, 962, 963, 964, 965, 966, 967, 968, 969, 970, 971, 972, 973, 974, 975, 976, 977, 978, 979, 980, 981, 982, 983, 984, 985, 986, 987, 988, 989, 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019, 1020, 1021, 1022, 1023, 1024, 1025, 1026, 1027, 1028, 1029, 1030, 1031, 1032, 1033, 1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1046, 
1047, 1048, 1049, 1050, 1051, 1052, 1053, 1054, 1055, 1056, 1057, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1076, 1077, 1078, 1079, 1080, 1081, 1082, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1091, 1092, 1093, 1094, 1095, 1096, 1097, 1098, 1099, 1100, 1101, 1102, 1103, 1104, 1105, 1106, 1107, 1108, 1109, 1110, 1111, 1112, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140]]
You can use itertools.groupby for this. As the key for grouping, just test whether the current element is your separator, then discard all the segments where it is the separator.
>>> from itertools import groupby
>>> data = ["|", 1, 2, 3, 4, "|", 5, 6, "|", 7, 8, 9, "|"]
>>> [list(g) for k, g in groupby(data, key=lambda x: x != "|") if k]
[[1, 2, 3, 4], [5, 6], [7, 8, 9]]
This also works well whether or not your list starts and/or ends with a separator. Empty segments are discarded, though.
>>> data = [1, 2, "|", "|", 3, 4]
>>> [list(g) for k, g in groupby(data, key=lambda x: x != "|") if k]
[[1, 2], [3, 4]]
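If empty segments need to be kept rather than discarded, a plain loop is one alternative to groupby. The following is only a minimal sketch: the function name split_keep_empty is made up for illustration, and keeping empty segments is an assumption about what a reader might want, not something the question asks for.

```python
def split_keep_empty(items, sep="|"):
    """Split items on sep, keeping empty segments (similar to str.split)."""
    segments = [[]]
    for item in items:
        if item == sep:
            segments.append([])        # every separator starts a new segment
        else:
            segments[-1].append(item)  # otherwise extend the current segment
    return segments


print(split_keep_empty([1, 2, "|", "|", 3, 4]))  # [[1, 2], [], [3, 4]]
```

Note that with this variant a leading or trailing separator also produces an empty list at the start or end of the result.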
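superList = []
current = []  # named "current" to avoid shadowing the built-in list
for item in inputList:
    if item == '|':
        if current:  # skip empty segments
            superList.append(current)
            current = []
    else:
        current.append(item)
if current:  # keep a final segment that is not followed by a separator
    superList.append(current)
print(superList)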
I have the following dataframe:
df = {'count1': [2.2336, 2.2454, 2.2538, 2.2716999999999996, 2.2798000000000003, 2.2843, 2.2906, 2.2969, 2.3223000000000003, 2.3282, 2.3356999999999997, 2.3544, 2.3651999999999997, 2.3727, 2.3775, 2.3823000000000003, 2.392, 2.4051, 2.4092, 2.4133, 2.4168000000000003, 2.4175, 2.4209, 2.4392, 2.4476, 2.456, 2.461, 2.4723, 2.4776, 2.4882, 2.4989, 2.5095, 2.5221999999999998, 2.5318, 2.5422, 2.5494, 2.559, 2.5654, 2.5814, 2.5878, 2.6238, 2.6178000000000003, 2.624, 2.6303, 2.6366, 2.6425, 2.6481999999999997, 2.6525, 2.6553, 2.663, 2.6712, 2.6898, 2.7051, 2.7144, 2.727, 2.7416, 2.7472, 2.7512, 2.7557, 2.7574, 2.7594000000000003, 2.7636, 2.7699000000000003, 2.7761, 2.7809, 2.7855, 2.7902, 2.7948000000000004, 2.7995, 2.8043, 2.815, 2.8249, 2.8352, 2.8455, 2.8708, 2.8874, 2.9004000000000003, 2.9301, 2.9399, 2.9513000000000003, 2.9634, 2.9745999999999997, 2.9852, 2.9959000000000002, 3.0037, 3.0093, 3.015, 3.0184, 3.0206, 3.0225, 3.0245, 3.0264, 3.0282, 3.0305999999999997, 3.0331, 3.0334, 3.0361, 3.0388, 3.0418000000000003, 3.0443000000000002, 3.0463, 3.0464, 3.0481, 3.0496999999999996, 3.0514, 3.0530999999999997, 3.0544000000000002, 3.0556, 3.0569, 3.0581, 3.0623, 3.0627, 3.0633000000000004, 3.0638, 3.0643000000000002, 3.0648, 3.0652, 3.0656999999999996, 3.0663, 3.0675, 3.0682, 3.0688, 3.0695, 3.0702, 3.0721, 3.0741, 3.0761, 3.078, 3.08, 3.082, 3.0839000000000003, 3.0859, 3.0879000000000003, 3.0898000000000003, 3.0918, 3.0938000000000003, 3.0994, 3.1050999999999997, 3.1144000000000003, 3.1613, 3.1649000000000003, 3.1752, 3.1869, 3.1899, 3.1925, 3.1976, 3.2001, 3.2051999999999996, 3.2098, 3.2123000000000004],
'count2': [3144, 3944, 7888, 4428, 68874, 5480, 56697, 20560, 8744, 91190, 352, 924, 1308611, 480, 51146, 170373, 58792, 11424, 1288673, 1845105, 401464, 657930, 1361172, 199373, 19753, 39082, 776, 7533, 9289, 36731, 53865, 100140, 59274, 35740, 2648, 144998, 78616, 848241, 34579, 216591, 22512, 4024, 17168, 1552, 13760, 8344, 65589, 43104, 44672, 917115, 16256, 4168, 29679, 22571, 7720, 452, 8836, 6888, 18578, 5148, 9289, 442, 214, 485, 3164, 1101, 1010, 9048, 293, 1628, 960, 517, 2362, 1262, 1524, 1173, 1348, 1288, 25568, 8416, 5792, 4944, 504, 4696, 2336, 458, 453, 1220, 1149, 6688, 6956, 7324, 7100, 7784, 5650, 5076, 5336, 6792, 5212, 4592, 5260, 1279, 654, 842, 990, 782, 1412, 1363, 935, 996, 775, 1471, 1525, 1398, 1097, 1082, 1668, 1007, 497, 598, 645, 698, 541, 504, 549, 540, 1568, 514, 578, 2906, 4360, 3916, 11944, 1434, 1589, 732, 641, 477, 307, 1884, 3232, 2408, 1016, 332, 139, 344, 4784, 1784, 1324, 204]}
df = pd.DataFrame(df)
And I want to plot a bar plot from it, with count1 on the x axis and count2 on the y axis, using bins that are 0.1 wide.
I used this:
plt.bar(x=df['count1'], y=df['count2'], width=0.1)
But it returns me this error:
TypeError: bar() missing 1 required positional argument: 'height'
I'm trying to replicate this R code:
ggplot(df, aes(x= count1,
y= count2)) +
geom_col() +
ylim(0, 2000000) +
scale_x_binned()
That generates the following graph:
To get a histogram from values and counts, you can use the weights= parameter of plt.hist.
To create bins with a width of 0.1, you can use np.arange(...,..., 0.1).
The rwidth=0.9 parameter makes the bars a bit narrower.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = {'count1': [2.2336, 2.2454, 2.2538, 2.2716999999999996, 2.2798000000000003, 2.2843, 2.2906, 2.2969, 2.3223000000000003, 2.3282, 2.3356999999999997, 2.3544, 2.3651999999999997, 2.3727, 2.3775, 2.3823000000000003, 2.392, 2.4051, 2.4092, 2.4133, 2.4168000000000003, 2.4175, 2.4209, 2.4392, 2.4476, 2.456, 2.461, 2.4723, 2.4776, 2.4882, 2.4989, 2.5095, 2.5221999999999998, 2.5318, 2.5422, 2.5494, 2.559, 2.5654, 2.5814, 2.5878, 2.6238, 2.6178000000000003, 2.624, 2.6303, 2.6366, 2.6425, 2.6481999999999997, 2.6525, 2.6553, 2.663, 2.6712, 2.6898, 2.7051, 2.7144, 2.727, 2.7416, 2.7472, 2.7512, 2.7557, 2.7574, 2.7594000000000003, 2.7636, 2.7699000000000003, 2.7761, 2.7809, 2.7855, 2.7902, 2.7948000000000004, 2.7995, 2.8043, 2.815, 2.8249, 2.8352, 2.8455, 2.8708, 2.8874, 2.9004000000000003, 2.9301, 2.9399, 2.9513000000000003, 2.9634, 2.9745999999999997, 2.9852, 2.9959000000000002, 3.0037, 3.0093, 3.015, 3.0184, 3.0206, 3.0225, 3.0245, 3.0264, 3.0282, 3.0305999999999997, 3.0331, 3.0334, 3.0361, 3.0388, 3.0418000000000003, 3.0443000000000002, 3.0463, 3.0464, 3.0481, 3.0496999999999996, 3.0514, 3.0530999999999997, 3.0544000000000002, 3.0556, 3.0569, 3.0581, 3.0623, 3.0627, 3.0633000000000004, 3.0638, 3.0643000000000002, 3.0648, 3.0652, 3.0656999999999996, 3.0663, 3.0675, 3.0682, 3.0688, 3.0695, 3.0702, 3.0721, 3.0741, 3.0761, 3.078, 3.08, 3.082, 3.0839000000000003, 3.0859, 3.0879000000000003, 3.0898000000000003, 3.0918, 3.0938000000000003, 3.0994, 3.1050999999999997, 3.1144000000000003, 3.1613, 3.1649000000000003, 3.1752, 3.1869, 3.1899, 3.1925, 3.1976, 3.2001, 3.2051999999999996, 3.2098, 3.2123000000000004],
'count2': [3144, 3944, 7888, 4428, 68874, 5480, 56697, 20560, 8744, 91190, 352, 924, 1308611, 480, 51146, 170373, 58792, 11424, 1288673, 1845105, 401464, 657930, 1361172, 199373, 19753, 39082, 776, 7533, 9289, 36731, 53865, 100140, 59274, 35740, 2648, 144998, 78616, 848241, 34579, 216591, 22512, 4024, 17168, 1552, 13760, 8344, 65589, 43104, 44672, 917115, 16256, 4168, 29679, 22571, 7720, 452, 8836, 6888, 18578, 5148, 9289, 442, 214, 485, 3164, 1101, 1010, 9048, 293, 1628, 960, 517, 2362, 1262, 1524, 1173, 1348, 1288, 25568, 8416, 5792, 4944, 504, 4696, 2336, 458, 453, 1220, 1149, 6688, 6956, 7324, 7100, 7784, 5650, 5076, 5336, 6792, 5212, 4592, 5260, 1279, 654, 842, 990, 782, 1412, 1363, 935, 996, 775, 1471, 1525, 1398, 1097, 1082, 1668, 1007, 497, 598, 645, 698, 541, 504, 549, 540, 1568, 514, 578, 2906, 4360, 3916, 11944, 1434, 1589, 732, 641, 477, 307, 1884, 3232, 2408, 1016, 332, 139, 344, 4784, 1784, 1324, 204]}
df = pd.DataFrame(df)
bin_start = np.trunc(df['count1'].min() * 10) / 10
bin_end = df['count1'].max() + 0.1
plt.style.use('ggplot')
plt.hist(x=df['count1'], weights=df['count2'], bins=np.arange(bin_start, bin_end, 0.1), rwidth=0.9)
plt.gca().get_yaxis().get_major_formatter().set_scientific(False)
plt.xlabel('count1')
plt.ylabel('count2')
plt.tight_layout()
plt.show()
I have some files that need to be read. Based on their length, I have to choose some indices using a for loop. The problem is that in every cycle of the for loop, the values from the previous cycles are carried over into the next list, which is incorrect. How can I stop this repetition and get one separate list per cycle? The code that I have written is below:
import os
import pandas as pd
import numpy as np
# Function to find middle index
def find_midd(input_list): #middleIndex %2 != 0:
    middleIndex = int(float(len(input_list))) /2
    if middleIndex %2 == 0:
        return middleIndex
    else:
        return middleIndex - 0.5
file1= np.arange(0,1197)
file2= np.arange(0,1000)
file3= np.arange(0,1204)
file4= np.arange(0,1303)
file5= np.arange(0,1100)
file6= np.arange(0,1420)
file7= np.arange(0,999)
l = [f for f in sorted(os.listdir('.')) if f.startswith('file')]
NEW_middle_index = []  # initialised once, before the loop
for i, d in enumerate(l):
    df = pd.read_csv(d, sep=r"\s+", header=None)
    e = df.iloc[:,5]
    c = df.iloc[:,4]
    df1 = pd.DataFrame()
    middleIndex = find_midd(e)
    middleIndex = int(middleIndex)
    lis = [5,10,12, 15 , 17]
    for i in lis:
        ans = middleIndex+15*i
        NEW_middle_index.append(ans)
    print(NEW_middle_index)
The current wrong output is:
[641, 716, 746, 791, 821]
[641, 716, 746, 791, 821, 682, 757, 787, 832, 862]
[641, 716, 746, 791, 821, 682, 757, 787, 832, 862, 598, 673, 703, 748, 778]
[641, 716, 746, 791, 821, 682, 757, 787, 832, 862, 598, 673, 703, 748, 778, 675, 750, 780, 825, 855]
[641, 716, 746, 791, 821, 682, 757, 787, 832, 862, 598, 673, 703, 748, 778, 675, 750, 780, 825, 855, 707, 782, 812, 857, 887]
[641, 716, 746, 791, 821, 682, 757, 787, 832, 862, 598, 673, 703, 748, 778, 675, 750, 780, 825, 855, 707, 782, 812, 857, 887, 693, 768, 798, 843, 873]
The expected result:
[641, 716, 746, 791, 821]
[682, 757, 787, 832, 862]
[598, 673, 703, 748, 778]
[675, 750, 780, 825, 855]
[707, 782, 812, 857, 887]
[693, 768, 798, 843, 873]
In your code you keep appending to the same list, NEW_middle_index, which is never emptied, so each cycle's values pile up on top of the previous ones. To solve this, create a new empty list at the start of every iteration of the outer loop.
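A minimal sketch of that fix, assuming the files can be stood in for by their lengths. The lengths below are hypothetical, so the printed numbers are not meant to match the question's expected output. The only point is where the list is created.

```python
# Hypothetical lengths standing in for the files read in the question.
lengths = [1000, 1200]
multipliers = [5, 10, 12, 15, 17]

results = []
for n in lengths:
    middle = n // 2
    new_middle_index = []  # a fresh list for every file, so nothing carries over
    for m in multipliers:
        new_middle_index.append(middle + 15 * m)
    print(new_middle_index)
    results.append(new_middle_index)
```

Collecting each per-file list in results is optional; it is only there in case the separate lists are needed after the loop.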
I have an unsorted list of integers and two inputs:
- input no. 1: the sum I want to get
- input no. 2: the maximum number of elements that may be used to reach the sum
The sum can't be higher than the given value (input no. 1), but it may be up to 10 less. The number of list elements used must be less than or equal to the given value (input no. 2).
from random import choice
def Diff(li1, li2):
    return (list(list(set(li1)-set(li2)) + list(set(li2)-set(li1))))
def find_the_elements(current_sum, wanted_sum, used_elements, max_number_of_elements, n_o_elements):
    solution = 0
    while solution != 1:
        elemnt=choice(Diff(elemts, used_elements))
        used_elements.append(elemnt)
        current_sum+=elemnt
        n_o_elements+=1
        if n_o_elements<=max_number_of_elements and current_sum in wanted_sum:
            return used_elements
        elif n_o_elements>max_number_of_elements or current_sum>wanted_sum.stop:
            return -1
        else:
            x=find_the_elements(current_sum=current_sum, wanted_sum=wanted_sum, used_elements=used_elements, n_o_elements=n_o_elements, max_number_of_elements=max_number_of_elements)
            if x!=-1:
                return used_elements
            elif x==-1:
                return -1
elemts = [535, 508, 456, 612, 764, 628, 530, 709, 676, 546, 579, 676,
564, 565, 742, 657, 577, 514, 650, 590, 621, 642, 684, 567, 670, 609, 571, 655, 681, 615, 617, 569, 656, 615,
542, 711, 777, 763, 663, 657, 532, 630, 636, 445, 495, 567, 603, 598, 629, 651, 608, 653, 669, 603, 655, 622,
578, 551, 560, 712, 642, 637, 545, 631, 479, 614, 710, 458, 615, 659, 636, 578, 629, 622, 584, 582, 650, 636,
693, 527, 577, 711, 601, 530, 1028, 683, 589, 590, 670, 409,582, 635, 558, 607, 648, 542, 726, 534, 540, 590, 649, 482, 664, 629, 555, 596, 613, 572, 516, 479, 562, 452,
586]
max_no_elements = int(input())
wanted_sum = int(input())
solution = -1
while solution == -1:
    solution = find_the_elements(current_sum=0, wanted_sum=range(wanted_sum - 10, wanted_sum + 1), used_elements=[], max_number_of_elements=max_no_elements, n_o_elements=0)
print(solution)
That's my solution, but I think I should do it differently, because in reality I work with a much bigger list and each element (integer) in it is 10-20x bigger.
Recursion with memoization (i.e. dynamic programming) is probably the best approach for this:
def closeSum(A,S,N,p=0,memo=None):
    if not N: return [],0
    if memo is None: memo = dict() # memoization
    if (S,N,p) in memo: return memo[S,N,p]
    best,bestSum = [],0
    for i,a in enumerate(A[p:],p): # combine remaining elements for sum
        if a>S: continue # ignore excessive values
        if a == S: return [a],a # end on perfect match
        r = [a] + closeSum(A,S-a,N-1,i+1,memo)[0] # extend sum to get closer
        sr = sum(r)
        if sr+10>=S and sr>bestSum: # track best so far
            best,bestSum = r,sr
    memo[S,N,p]=(best,sum(best)) # memoization
    return best,sum(best)
output:
elemts = [535, 508, 456, 612, 764, 628, 530, 709, 676, 546, 579, 676,
564, 565, 742, 657, 577, 514, 650, 590, 621, 642, 684, 567, 670, 609, 571, 655, 681, 615, 617, 569, 656, 615,
542, 711, 777, 763, 663, 657, 532, 630, 636, 445, 495, 567, 603, 598, 629, 651, 608, 653, 669, 603, 655, 622,
578, 551, 560, 712, 642, 637, 545, 631, 479, 614, 710, 458, 615, 659, 636, 578, 629, 622, 584, 582, 650, 636,
693, 527, 577, 711, 601, 530, 1028, 683, 589, 590, 670, 409,582, 635, 558, 607, 648, 542, 726, 534, 540, 590, 649, 482, 664, 629, 555, 596, 613, 572, 516, 479, 562, 452,
586]
closeSum(elemts,1001,3)
[456, 545], 1001
closeSum(elemts,5522,7)
[764, 742, 777, 763, 712, 1028, 726], 5512
closeSum(elemts,5522,10)
[535, 508, 456, 612, 764, 628, 530, 546, 409, 534], 5522
It works relatively fast when there is an exact match but still takes a while for the larger values/item counts when it doesn't.
Note that there is still room for some optimization such as keeping track of the total of remaining elements (from position p) and exiting if they can't add up to the target sum.
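A rough sketch of the bookkeeping that optimization would need, assuming all elements are positive (as in the sample data). The helper names suffix_totals and can_still_reach are made up for illustration and are not part of closeSum. The tolerance of 10 comes from the question.

```python
from itertools import accumulate


def suffix_totals(A):
    """Return totals where totals[p] == sum(A[p:]) and totals[len(A)] == 0."""
    totals = list(accumulate(reversed(A), initial=0))  # initial= needs Python 3.8+
    totals.reverse()
    return totals


def can_still_reach(totals, p, S, tolerance=10):
    """True if the elements from position p onward could add up to at least S - tolerance."""
    return totals[p] >= S - tolerance


totals = suffix_totals([1, 2, 3])
print(totals)                          # [6, 5, 3, 0]
print(can_still_reach(totals, 1, 20))  # False: 2 + 3 is far below 20 - 10
```

The totals would be computed once for the whole list. A recursive call could then return early whenever can_still_reach is false for its p and S, instead of iterating over the remaining elements.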
I want to use form_state_data, which is a string, to reference the local variable of the same name inside the is_valid_phone function, as you can see in the print call.
form_state_data will always be a two-character state code that also exists as a local variable in is_valid_phone, holding that state's list of area codes as integers.
form_state_data = 'AL'
form_phone_data_sliced = 205
def is_valid_phone(form_state_data, form_phone_data_sliced):
    # Some codes are not correct.
    AL = [205, 251, 256, 334, 938]
    AK = [907]
    AZ = [480, 520, 602, 623, 928]
    AR = [479, 501, 870]
    CA = [209, 213, 279, 310, 323, 408, 415, 424, 442, 510, 530, 559, 562, 619, 626, 628, 650, 657, 661, 669, 707, 714, 747, 760, 805, 818, 820, 831, 858, 909, 916, 925, 949, 951]
    CO = [303, 719, 720, 970]
    CT = [203, 475, 860, 959]
    DE = [302]
    DC = [202]
    FL = [239, 305, 321, 352, 386, 407, 561, 727, 754, 772, 786, 813, 850, 863, 904, 941, 954]
    GA = [229, 404, 470, 478, 678, 706, 762, 770, 912]
    HI = [808]
    ID = [208, 986]
    IL = [217, 224, 309, 312, 331, 618, 630, 708, 773, 779, 815, 847, 872]
    IN = [219, 260, 317, 463, 574, 765, 812, 930]
    IA = [319, 515, 563, 641, 712]
    KS = [316, 620, 785, 913]
    KY = [270, 364, 502, 606, 859]
    LA = [225, 318, 337, 504, 985]
    ME = [207]
    MT = [339, 351, 413, 508, 617, 774, 781, 857, 978]
    NE = [308, 402, 531]
    NV = [702, 725, 775]
    NH = [603]
    NJ = [201, 551, 609, 640, 732, 848, 856, 862, 908, 973]
    NM = [505, 575]
    NY = [212, 315, 332, 347, 516, 518, 585, 607, 631, 646, 680, 716, 718, 838, 845, 914, 917, 929, 934]
    NC = [252, 336, 704, 743, 828, 910, 919, 980, 984]
    ND = [701]
    OH = [216, 220, 234, 330, 380, 419, 440, 513, 567, 614, 740, 937]
    OK = [405, 539, 580, 918]
    OR = [458, 503, 541, 971]
    MD = [240, 301, 410, 443, 667]
    MA = [218, 320, 507, 612, 651, 763, 952]
    MI = [228, 601, 662, 769]
    MN = [218, 320, 507, 612, 651, 763, 952]
    MS = [314, 417, 573, 636, 660, 816]
    MO = [406]
    PA = [215, 223, 267, 272, 412, 445, 484, 570, 610, 717, 724, 814, 878]
    RI = [401]
    SC = [803, 843, 854, 864]
    SD = [605]
    TN = [423, 615, 629, 731, 865, 901, 931]
    TX = [210, 214, 254, 281, 325, 346, 361, 409, 430, 432, 469, 512, 682, 713, 726, 737, 806, 817, 830, 832, 903, 915, 936, 940, 956, 972, 979]
    UT = [385, 435, 801]
    VT = [802]
    VA = [276, 434, 540, 571, 703, 757, 804]
    WA = [206, 253, 360, 425, 509, 564]
    WV = [304, 681]
    WI = [262, 414, 534, 608, 715, 920]
    WY = [307]
    print(form_phone_data_sliced in form_state_data)
is_valid_phone(form_state_data,form_phone_data_sliced)
You should use a dictionary to store the state codes. Below is an example of how to achieve this.
states = {
'AL': [205, 251, 256, 334, 938],
'AK': [907],
'AZ': [480, 520, 602, 623, 928],
'AR': [479, 501, 870],
'CA': [209, 213, 279, 310, 323, 408, 415, 424, 442, 510, 530, 559, 562, 619, 626, 628, 650, 657, 661, 669, 707, 714, 747, 760, 805, 818, 820, 831, 858, 909, 916, 925, 949, 951],
'CO': [303, 719, 720, 970],
'CT': [203, 475, 860, 959],
'DE': [302],
'DC': [202],
'FL': [239, 305, 321, 352, 386, 407, 561, 727, 754, 772, 786, 813, 850, 863, 904, 941, 954],
'GA': [229, 404, 470, 478, 678, 706, 762, 770, 912],
'HI': [808],
'ID': [208, 986],
'IL': [217, 224, 309, 312, 331, 618, 630, 708, 773, 779, 815, 847, 872],
'IN': [219, 260, 317, 463, 574, 765, 812, 930],
'IA': [319, 515, 563, 641, 712],
'KS': [316, 620, 785, 913],
'KY': [270, 364, 502, 606, 859],
'LA': [225, 318, 337, 504, 985],
'ME': [207],
'MT': [339, 351, 413, 508, 617, 774, 781, 857, 978],
'NE': [308, 402, 531],
'NV': [702, 725, 775],
'NH': [603],
'NJ': [201, 551, 609, 640, 732, 848, 856, 862, 908, 973],
'NM': [505, 575],
'NY': [212, 315, 332, 347, 516, 518, 585, 607, 631, 646, 680, 716, 718, 838, 845, 914, 917, 929, 934],
'NC': [252, 336, 704, 743, 828, 910, 919, 980, 984],
'ND': [701],
'OH': [216, 220, 234, 330, 380, 419, 440, 513, 567, 614, 740, 937],
'OK': [405, 539, 580, 918],
'OR': [458, 503, 541, 971],
'MD': [240, 301, 410, 443, 667],
'MA': [218, 320, 507, 612, 651, 763, 952],
'MI': [228, 601, 662, 769],
'MN': [218, 320, 507, 612, 651, 763, 952],
'MS': [314, 417, 573, 636, 660, 816],
'MO': [406],
'PA': [215, 223, 267, 272, 412, 445, 484, 570, 610, 717, 724, 814, 878],
'RI': [401],
'SC': [803, 843, 854, 864],
'SD': [605],
'TN': [423, 615, 629, 731, 865, 901, 931],
'TX': [210, 214, 254, 281, 325, 346, 361, 409, 430, 432, 469, 512, 682, 713, 726, 737, 806, 817, 830, 832, 903, 915, 936, 940, 956, 972, 979],
'UT': [385, 435, 801],
'VT': [802],
'VA': [276, 434, 540, 571, 703, 757, 804],
'WA': [206, 253, 360, 425, 509, 564],
'WV': [304, 681],
'WI': [262, 414, 534, 608, 715, 920],
'WY': [307],
}
is_valid_phone = lambda state, code : code in states[state]
print(is_valid_phone('AL', 205))
print(is_valid_phone('AL', 2000005))
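One thing to keep in mind with the lookup above is that states[state] raises a KeyError for a state code that is not in the dictionary. A minimal sketch of a more forgiving variant, assuming an unknown state should simply count as invalid. The STATES dictionary here is a two-entry subset used only for illustration.

```python
# Two-entry subset of the full dictionary, for illustration only.
STATES = {
    'AL': [205, 251, 256, 334, 938],
    'AK': [907],
}


def is_valid_phone(state, code):
    """Return True if code is listed for state; unknown states are treated as invalid."""
    return code in STATES.get(state, ())


print(is_valid_phone('AL', 205))      # True
print(is_valid_phone('AL', 2000005))  # False
print(is_valid_phone('ZZ', 205))      # False, instead of raising KeyError
```

Whether an unknown state should return False or raise an error depends on how the form is validated upstream.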
If you really want to keep these as named variables (bad practice), you can turn the function into a class and then look the attributes up by name by calling vars() on the class.
class Phone:
    AL = [1, 2]
print(vars(Phone)['AL'])
vars() returns the __dict__ of the object it is given: a mapping from attribute names (strings) to their values. AL is a class attribute, so it appears in vars(Phone) rather than in the vars() of an instance.
What you want to do, as people in the comments have said, is put all of the states into a dictionary and then use this code inside is_valid_phone:
return form_phone_data_sliced in states[form_state_data]
I'm analyzing a large graph, so I divide it into chunks, hoping that a multi-core CPU will make the analysis faster. However, my model is randomized, so the results of different runs are expected to differ. While testing the idea I get the same result every time, so I'm wondering whether my code is correct.
Here's my code
from multiprocessing import Process, Queue
# split a list into evenly sized chunks
def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]
def multiprocessing_icm(queue, nodes):
    queue.put(independent_cascade_igraph(twitter_igraph, nodes, steps=1))
def dispatch_jobs(data, job_number):
    total = len(data)
    chunk_size = total // job_number  # integer division, so range() gets an int step
    slice = chunks(data, chunk_size)
    jobs = []
    processes = []
    queue = Queue()
    for i, s in enumerate(slice):
        j = Process(target=multiprocessing_icm, args=(queue, s))
        jobs.append(j)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    return queue
dispatch_jobs(['121817564', '121817564'], 2)
If you're wondering what independent_cascade_igraph is, here's the code:
import copy
import random
def independent_cascade_igraph(G, seeds, steps=0):
    # init activation probabilities
    for e in G.es():
        if 'act_prob' not in e.attributes():
            e['act_prob'] = 0.1
        elif e['act_prob'] > 1:
            raise Exception("edge activation probability:", e['act_prob'], "cannot be larger than 1")
    # perform diffusion
    A = copy.deepcopy(seeds) # prevent side effect
    if steps <= 0:
        # perform diffusion until no more nodes can be activated
        return _diffuse_all(G, A)
    # perform diffusion for at most "steps" rounds
    return _diffuse_k_rounds(G, A, steps)
def _diffuse_all(G, A):
    tried_edges = set()
    layer_i_nodes = [ ]
    layer_i_nodes.append([i for i in A]) # prevent side effect
    while True:
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
    return layer_i_nodes
def _diffuse_k_rounds(G, A, steps):
    tried_edges = set()
    layer_i_nodes = [ ]
    layer_i_nodes.append([i for i in A])
    while steps > 0 and len(A) < G.vcount():
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
        steps -= 1
    return layer_i_nodes
def _diffuse_one_round(G, A, tried_edges):
    activated_nodes_of_this_round = set()
    cur_tried_edges = set()
    for s in A:
        for nb in G.successors(s):
            if nb in A or (s, nb) in tried_edges or (s, nb) in cur_tried_edges:
                continue
            if _prop_success(G, s, nb):
                activated_nodes_of_this_round.add(nb)
            cur_tried_edges.add((s, nb))
    activated_nodes_of_this_round = list(activated_nodes_of_this_round)
    A.extend(activated_nodes_of_this_round)
    return A, activated_nodes_of_this_round, cur_tried_edges
def _prop_success(G, src, dest):
    '''
    act_prob = 0.1
    for e in G.es():
        if (src, dest) == e.tuple:
            act_prob = e['act_prob']
            break
    '''
    return random.random() <= 0.1
Here's the result of multiprocessing
[['121817564'], [1538, 1539, 4, 517, 1547, 528, 2066, 1623, 1540, 538, 1199, 31, 1056, 1058, 547, 1061, 1116, 1067, 1069, 563, 1077, 1591, 1972, 1595, 1597, 1598, 1088, 1090, 1608, 1656, 1098, 1463, 1105, 1619, 1622, 1111, 601, 1627, 604, 1629, 606, 95, 612, 101, 1980, 618, 1652, 1897, 1144, 639, 640, 641, 647, 650, 1815, 1677, 143, 1170, 1731, 660, 1173, 1690, 1692, 1562, 1563, 1189, 1702, 687, 689, 1203, 1205, 1719, 703, 1219, 1229, 1744, 376, 1746, 211, 1748, 213, 1238, 218, 221, 735, 227, 1764, 741, 230, 1769, 1258, 1780, 1269, 1783, 761, 763, 1788, 1789, 1287, 769, 258, 1286, 263, 264, 780, 1298, 1299, 1812, 473, 1822, 1828, 806, 811, 1324, 814, 304, 478, 310, 826, 1858, 1349, 326, 327, 1352, 329, 1358, 336, 852, 341, 854, 1879, 1679, 868, 2022, 1385, 1902, 1904, 881, 1907, 1398, 1911, 888, 1940, 1402, 1941, 1920, 1830, 387, 1942, 905, 1931, 1411, 399, 1426, 915, 916, 917, 406, 407, 1433, 1947, 1441, 419, 1445, 1804, 428, 1454, 1455, 948, 1973, 951, 1466, 443, 1468, 1471, 1474, 1988, 966, 1479, 1487, 976, 467, 1870, 2007, 985, 1498, 990, 1504, 1124, 485, 486, 489, 492, 2029, 2033, 1524, 1534, 2038, 1018, 1535, 510, 1125]]
[['121817564'], [1538, 1539, 4, 517, 1547, 528, 2066, 1623, 1540, 538, 1199, 31, 1056, 1058, 547, 1061, 1116, 1067, 1069, 563, 1077, 1591, 1972, 1595, 1597, 1598, 1088, 1090, 1608, 1656, 1098, 1463, 1105, 1619, 1622, 1111, 601, 1627, 604, 1629, 606, 95, 612, 101, 1980, 618, 1652, 1897, 1144, 639, 640, 641, 647, 650, 1815, 1677, 143, 1170, 1731, 660, 1173, 1690, 1692, 1562, 1563, 1189, 1702, 687, 689, 1203, 1205, 1719, 703, 1219, 1229, 1744, 376, 1746, 211, 1748, 213, 1238, 218, 221, 735, 227, 1764, 741, 230, 1769, 1258, 1780, 1269, 1783, 761, 763, 1788, 1789, 1287, 769, 258, 1286, 263, 264, 780, 1298, 1299, 1812, 473, 1822, 1828, 806, 811, 1324, 814, 304, 478, 310, 826, 1858, 1349, 326, 327, 1352, 329, 1358, 336, 852, 341, 854, 1879, 1679, 868, 2022, 1385, 1902, 1904, 881, 1907, 1398, 1911, 888, 1940, 1402, 1941, 1920, 1830, 387, 1942, 905, 1931, 1411, 399, 1426, 915, 916, 917, 406, 407, 1433, 1947, 1441, 419, 1445, 1804, 428, 1454, 1455, 948, 1973, 951, 1466, 443, 1468, 1471, 1474, 1988, 966, 1479, 1487, 976, 467, 1870, 2007, 985, 1498, 990, 1504, 1124, 485, 486, 489, 492, 2029, 2033, 1524, 1534, 2038, 1018, 1535, 510, 1125]]
But here's what I get if I run independent_cascade_igraph twice:
independent_cascade_igraph(twitter_igraph, ['121817564'], steps=1)
[['121817564'],
[514,
1773,
1540,
1878,
2057,
1035,
1550,
2064,
1042,
533,
1558,
1048,
1054,
544,
545,
1061,
1067,
1885,
1072,
350,
1592,
1460,...
independent_cascade_igraph(twitter_igraph, ['121817564'], steps=1)
[['121817564'],
[1027,
2055,
8,
1452,
1546,
1038,
532,
1045,
542,
546,
1059,
549,
1575,
1576,
2030,
1067,
1068,
1071,
564,
573,
575,
1462,
584,
1293,
1105,
595,
599,
1722,
1633,
1634,
614,
1128,
1131,
1286,
621,
1647,
1648,
627,
636,
1662,
1664,
1665,
130,
1671,
1677,
656,
1169,
148,
1686,
1690,
667,
1186,
163,
1700,
1191,
1705,
1711,...
So, what I'm hoping to get out of this is if I have a list of 500 ids, I would like the first CPU to calculate the first 250 and the second CPU to calculate the last 250 and then merge the result. I'm not sure if I understand multiprocessing correctly.
As mentioned e.g. in this SO answer, on *nix child processes inherit the state of the RNG from the parent process. Call random.seed() in every child process to initialize it yourself, either to a per-process seed or randomly.
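A minimal sketch of what re-seeding in each child could look like. The names worker and run_workers are made up for illustration and stand in for multiprocessing_icm and dispatch_jobs. The draws from random.random() stand in for the randomized diffusion step.

```python
import os
import random
from multiprocessing import Process, Queue


def worker(queue, n_draws):
    # Re-seed from OS entropy (or the current time) so this process does not
    # reuse the generator state it inherited from the parent.
    random.seed()
    queue.put((os.getpid(), [random.random() for _ in range(n_draws)]))


def run_workers(n_workers=2, n_draws=3):
    queue = Queue()
    procs = [Process(target=worker, args=(queue, n_draws)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # read the results before joining
    for p in procs:
        p.join()
    return results
```

To try it, call run_workers() from under an if __name__ == "__main__": guard and compare the lists returned by the workers. Passing an explicit per-process value to random.seed() instead would make each worker's sequence reproducible.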
I haven't read your program in detail, but my general feeling is that you probably have a random number generator seed problem. If you run the program twice on the same CPU, the random number generator's state will be different the second time you run it. But if you run it on 2 different CPUs, maybe your generators are initialized with the same default seed, thus giving the same results.