For Example
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
x = first_interval[0] <= data <= second_interval[0]
y = first_interval[1] <= data <= second_interval[1] # and so on
I want to know how many numbers from data fall between 40-49, 50-59, 60-69 and so on
frequency = [4, 6] # 4 is x and 6 is y
Iterate over the bounds using zip, then filter the matching values with a list comprehension:
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
result = {}
for start, end in zip(first_interval, second_interval):
    result[(start, end)] = len([v for v in data if start <= v <= end])
print(result)
# {(40, 49): 4, (50, 59): 6, (60, 69): 10, (70, 79): 4, (80, 89): 4, (90, 99): 2}
print(result[(40, 49)])
# 4
The version with a list and len is easier to understand:
result[(start, end)] = len([v for v in data if start <= v <= end])
The following version performs better on larger inputs: because it uses a generator, it does not build the whole list only to discard it afterwards.
result[(start, end)] = sum(1 for v in data if start <= v <= end)
Another version does not use the predefined bounds and instead assumes fixed intervals of width 10. It performs much better because its complexity is O(n) rather than O(n*m) like the first one: you iterate over the values once, not once for each pair of bounds.
from collections import defaultdict

result = defaultdict(int)
for value in data:
    start = 10 * (value // 10)  # lower bound of the interval containing value
    result[(start, start + 9)] += 1
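As a rough check of the two approaches above, the following sketch counts the same sample data both ways and compares the results. It assumes every interval is exactly ten wide and starts on a multiple of 10, which is what the single-pass version relies on. The function names count_by_bounds and count_by_tens are made up for illustration.

```python
from collections import defaultdict

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def count_by_bounds(values, starts, ends):
    """O(n*m): scan all values once per (start, end) pair."""
    return {
        (start, end): sum(1 for v in values if start <= v <= end)
        for start, end in zip(starts, ends)
    }


def count_by_tens(values):
    """O(n): derive each value's interval from its tens digit (assumes width 10)."""
    counts = defaultdict(int)
    for v in values:
        start = 10 * (v // 10)
        counts[(start, start + 9)] += 1
    return dict(counts)


by_bounds = count_by_bounds(data, first_interval, second_interval)
by_tens = count_by_tens(data)
print(by_bounds == by_tens)  # True: both approaches agree on this data
```

One difference to keep in mind: the single-pass version only creates keys for intervals that contain at least one value, so an empty interval is absent from its result rather than mapped to 0.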
This may help you :
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
Data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
def find_occurence(start, end, data):
    counter = 0
    for i in data:
        if start <= i <= end:
            counter += 1
    return counter
print(find_occurence(first_interval[0], second_interval[0], Data))  # this gives you the answer for x; do the same for y
Note: start is the lower bound of the interval (inclusive).
end is the upper bound of the interval (inclusive).
We can use numpy.histogram with bins defined by:
first_interval as the left edges of the bins (each bin is open on the right)
max(second_interval) as the closing edge of the rightmost bin
Code
import numpy as np

# Generate counts and bins (rightmost edge given by max(second_interval))
frequency, bins = np.histogram(data, bins=first_interval + [max(second_interval)])
# Show results
for i in range(len(frequency)):
    if i < len(frequency) - 1:
        print(f'{bins[i]}-{bins[i+1]-1} : {frequency[i]}')  # bin does not include its right edge
    else:
        print(f'{bins[i]}-{bins[i+1]} : {frequency[i]}')  # last bin includes its right edge
Output
40-49 : 4
50-59 : 6
60-69 : 10
70-79 : 4
80-89 : 4
90-99 : 2
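For readers without NumPy, the same binning convention (every bin closed on the left and open on the right, except the last bin, which is closed on both sides) can be sketched with the standard-library bisect module. This is a swapped-in technique, not a description of how numpy.histogram works internally, and the helper name histogram_counts is hypothetical.

```python
from bisect import bisect_right

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]


def histogram_counts(values, edges):
    """Count values per bin; edges must be sorted in increasing order.

    Bin i covers edges[i] <= v < edges[i + 1], except the last bin,
    which also includes its right edge.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # value lies outside all bins
        # Index of the last edge that is <= v, clamped so that a value
        # equal to the final edge falls into the last bin.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


edges = first_interval + [max(second_interval)]
frequency = histogram_counts(data, edges)
print(frequency)  # [4, 6, 10, 4, 4, 2]
```

Each lookup is a binary search over the edges, so the bins do not need to have equal widths.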
I have two sets of data here:
data_feb = ['1st February', 45, 68, 70, 61, 54, 80, 72, 69, 73, 72, 58, 72, 64, 45, 42]
data_aug = ['1st August', 19, 27, 41, 42, 9, 14, 29, 34, 25, 29, 44, 43, 6, 17]
I loop over it to create another list here:
feb_numbers = []
aug_numbers = []
for i in data_feb:
    if type(i) == int:
        feb_numbers.append(i)
for i in data_aug:
    if type(i) == int:
        aug_numbers.append(i)
Then I have an algorithm to sort them:
feb_zero_to_ten = []
feb_ten_to_twenty = []
feb_twenty_to_thirty = []
feb_thirty_to_forty = []
feb_forty_to_fifty = []
feb_fifty_to_sixty = []
feb_sixty_to_seventy = []
feb_seventy_to_eighty = []
feb_eighty_to_ninety = []
feb_ninety_to_hundred = []
aug_zero_to_ten = []
aug_ten_to_twenty = []
aug_twenty_to_thirty = []
aug_thirty_to_forty = []
aug_forty_to_fifty = []
aug_fifty_to_sixty = []
aug_sixty_to_seventy = []
aug_seventy_to_eighty = []
aug_eighty_to_ninety = []
aug_ninety_to_hundred = []
# for loop to iterate over months numbers, sorting them into their correct columns by the 'tens' digit
for i, j in zip(feb_numbers, aug_numbers):
    if 0 <= i < 10 and 0 <= j < 10:
        feb_zero_to_ten.append(i)
        aug_zero_to_ten.append(j)
    elif 10 <= i < 20 and 10 <= j < 20:
        feb_ten_to_twenty.append(i)
        aug_ten_to_twenty.append(j)
    elif 20 <= i < 30 and 20 <= j < 30:
        feb_twenty_to_thirty.append(i)
        aug_twenty_to_thirty.append(j)
    elif 30 <= i < 40 and 30 <= j < 40:
        feb_thirty_to_forty.append(i)
        aug_thirty_to_forty.append(j)
    elif 40 <= i < 50 and 40 <= j < 50:
        feb_forty_to_fifty.append(i)
        aug_forty_to_fifty.append(j)
    elif 50 <= i < 60 and 50 <= j < 60:
        feb_fifty_to_sixty.append(i)
        aug_fifty_to_sixty.append(j)
    elif 60 <= i < 70 and 60 <= j < 70:
        feb_sixty_to_seventy.append(i)
        aug_sixty_to_seventy.append(j)
    elif 70 <= i < 80 and 70 <= j < 80:
        feb_seventy_to_eighty.append(i)
        aug_seventy_to_eighty.append(j)
    elif 80 <= i < 90 and 80 <= j < 90:
        feb_eighty_to_ninety.append(i)
        aug_eighty_to_ninety.append(j)
    elif 90 <= i < 100 and 90 <= j < 100:
        feb_ninety_to_hundred.append(i)
        aug_ninety_to_hundred.append(j)
This approach using zip() is not working. I am wondering whether this approach is worth pursuing at all. I am also trying to make this code as efficient as possible, so any pointers would be very helpful. Thank you.
data_feb = ['1st February', 45, 68, 70, 61, 54, 80, 72, 69, 73, 72, 58, 72, 64, 45, 42]
data_aug = ['1st August', 19, 27, 41, 42, 9, 14, 29, 34, 25, 29, 44, 43, 6, 17]
feb_numbers = [i for i in data_feb if isinstance(i, int)]
aug_numbers = [i for i in data_aug if isinstance(i, int)]
from itertools import groupby
[list(g) for k, g in groupby(sorted(aug_numbers), key=lambda x: x // 10)]
Output:
[[6, 9], [14, 17, 19], [25, 27, 29, 29], [34], [41, 42, 43, 44]]
You can use itertools groupby to group those numbers
Your approach is flawed. You test that i and j are BOTH in the same range, but if you look at your numbers you will see that this often does not happen: a pair like (45, 19) does not match any of the ifs. If you look at the logic you are trying to achieve, you will notice that you actually want to separate your numbers by their leading digit (the tens). An easy approach is to make buckets and fill them like this:
feb_buckets = [[] for item in range(10)]  # this makes a list of 10 buckets (lists)
aug_buckets = [[] for item in range(10)]
# handle each month separately: the lists have different lengths, so zip would drop values
for feb in feb_numbers:
    feb_buckets[feb // 10].append(feb)  # // is integer division (which rounds down)
for aug in aug_numbers:
    aug_buckets[aug // 10].append(aug)
Once you understand the logic, you can simplify the code even further by taking @ajay's approach and using itertools.groupby.
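A minimal sketch of that simplification, assuming the numbers are non-negative integers. The helper name group_by_tens is hypothetical. Note that groupby only merges adjacent items, so the input has to be sorted by the same key first.

```python
from itertools import groupby

data_feb = ['1st February', 45, 68, 70, 61, 54, 80, 72, 69, 73, 72, 58, 72, 64, 45, 42]
data_aug = ['1st August', 19, 27, 41, 42, 9, 14, 29, 34, 25, 29, 44, 43, 6, 17]


def group_by_tens(values):
    """Map each tens digit to the sorted list of integers that share it."""
    # Drop the leading label string and sort, because groupby only
    # groups items that are next to each other.
    numbers = sorted(v for v in values if isinstance(v, int))
    return {tens: list(group)
            for tens, group in groupby(numbers, key=lambda v: v // 10)}


feb_groups = group_by_tens(data_feb)
aug_groups = group_by_tens(data_aug)
print(feb_groups)
print(aug_groups)
```

Each month is processed on its own, so the two lists do not need to have the same length. Tens digits with no values simply do not appear as keys.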
Don't create all the lists by hand; just create a list of lists and then access them by index.
For numbers in the range 0 to 10 use feb[0], for 10 to 20 use feb[1], etc.
If you don't know whether the lists will have the same length, call the function find_in_range on each list separately.
You can use the code below for this:
data_feb = ['1st February', 45, 68, 70, 61, 54, 80, 72, 69, 73, 72, 58, 72, 64, 45, 42]
data_aug = ['1st August', 19, 27, 41, 42, 9, 14, 29, 34, 25, 29, 44, 43, 6, 17]
# Don't create all lists by hand; just create a list of lists and then access them by index
feb = [[] for i in range(10)]
aug = [[] for i in range(10)]
def find_in_range(in_list, out_list):
    for x in sorted(in_list[1:]):  # exclude the first element, because it is a string
        for i in range(10):
            if i*10 <= x < (i+1)*10:  # lower bound is inclusive so multiples of 10 are kept
                out_list[i].append(x)
find_in_range(data_feb, feb)
find_in_range(data_aug, aug)
print("Feb:", feb)
print("Aug:", aug)
This is the output:
Feb: [[], [], [], [], [42, 45, 45], [54, 58], [61, 64, 68, 69], [70, 72, 72, 72, 73], [80], []]
Aug: [[6, 9], [14, 17, 19], [25, 27, 29, 29], [34], [41, 42, 43, 44], [], [], [], [], []]
As you can see, the first four lists of feb are empty because data_feb contains no numbers between 0 and 39.
One good way to accomplish the sort is to build a list of lists. The final outputs are feb_sorted and aug_sorted. The i-th list holds the values in the range [i*10, (i+1)*10).
data_feb = ['1st February', 45, 68, 70, 61, 54, 80, 72, 69, 73, 72, 58, 72, 64, 45, 42]
data_aug = ['1st August', 19, 27, 41, 42, 9, 14, 29, 34, 25, 29, 44, 43, 6, 17]
feb_numbers = [x for x in data_feb if type(x) == int]
aug_numbers = [x for x in data_aug if type(x) == int]
GROUP_SIZE = 10 # 0-9, 10-19, 20-29....
feb_sorted = [[x for x in feb_numbers if x in range(i * 10, (i + 1) * 10)] for i in range(GROUP_SIZE)]
aug_sorted = [[x for x in aug_numbers if x in range(i * 10, (i + 1) * 10)] for i in range(GROUP_SIZE)]
print(feb_sorted)
print(aug_sorted)
The first part of your code can be made more efficient like so:
data_feb = ['1st February', 45, 68, 70, 61, 54, 80, 72, 69, 73, 72, 58, 72, 64, 45, 42]
data_aug = ['1st August', 19, 27, 41, 42, 9, 14, 29, 34, 25, 29, 44, 43, 6, 17]
data_feb = [x for x in data_feb if isinstance(x, int)]  # keep only the integers
data_aug = [x for x in data_aug if isinstance(x, int)]
For the second part it is unclear what you are trying to achieve. Can you try to give some more background? Why do you need this many lists? What do you mean by sorting them and why do you need them sorted in this way?
Please see the code listed below for clarification
def AvgCalc(test):
    return sum(test) / len(test)
test = [2, 4, 3, 10, 33]
answer = AvgCalc(test)
print("Avg is " + str(answer))
if test[0] > (answer * 1.2):
    print(test[0])
if test[1] > (answer * 1.2):
    print(test[1])
if test[2] > (answer * 1.2):
    print(test[2])
if test[3] > (answer * 1.2):
    print(test[3])
if test[4] > (answer * 1.2):
    print(test[4])
Try this
def AvgCalc( values ):
    avg = sum( values ) / len( values )
    print( "Average is: " + str( avg ) )
    print( [ q for q in values if q > avg*1.2 ] )
    return avg
For example
>>> x = [79, 46, 49, 6, 7, 23, 96, 1, 76, 33, 94, 59, 12, 73, 61, 41, 47, 97, 1, 82]
>>> AvgCalc( x )
Average is: 49.15
[79, 96, 76, 94, 59, 73, 61, 97, 82]
49.15
If you really need each value that is more than 20% above the average to be printed on a separate line, change the last print call in AvgCalc to
print( "\n".join( [ str(q) for q in values if q > avg*1.2 ] ) )
Output
>>> x = [79, 46, 49, 6, 7, 23, 96, 1, 76, 33, 94, 59, 12, 73, 61, 41, 47, 97, 1, 82]
>>> AvgCalc( x )
Average is: 49.15
79
96
76
94
59
73
61
97
82
49.15
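If the values above the threshold are needed for further processing rather than just printed, one option is to return them. The sketch below separates the calculation from the output. The function name values_above_average and the factor parameter are illustrative assumptions, with factor=1.2 standing in for the 20% threshold used above.

```python
def values_above_average(values, factor=1.2):
    """Return (average, items greater than average * factor)."""
    if not values:
        raise ValueError("values must not be empty")
    avg = sum(values) / len(values)
    return avg, [v for v in values if v > avg * factor]


x = [79, 46, 49, 6, 7, 23, 96, 1, 76, 33, 94, 59, 12, 73, 61, 41, 47, 97, 1, 82]
avg, above = values_above_average(x)
print("Average is:", avg)
for v in above:
    print(v)  # one value per line
```

The caller then decides how to present the result, and the same function can be reused with a different threshold by passing another factor.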
I have two arrays, and I want to loop through the second array and return only the sub-arrays whose first element is equal to an element of the first array.
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56, 23, 54], [10, 8, 52, 30, 15, 47, 109], [11, 81,
152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78], [18, 182, 25, 63, 96,
104, 74]]
I have two different arrays, a and b. I would like to find each of the sub-arrays(?) within b whose first value is equal to one of the values in array a, and collect them in a new array, c.
The result I am looking for is:
c = [[10, 8, 52, 30, 15, 47, 109],[11, 81, 152, 54, 112, 78, 167],[13, 82, 84, 63, 24, 26, 78]]
Does Python have a tool to do this in a way Excel has MATCH()?
I tried looping in a manner such as:
for i in a:
    if i in b:
        print(b)
But because there are other elements within the array, this way is not working. Any help would be greatly appreciated.
Further explanation of the problem:
a = [5, 6, 7, 9, 12]
I read in an Excel file using XLRD (b_csv_data):
Start  Count  Error  Constant  Result1  Result2  Result3  Result4
5      41     0      45        23       54       66       19
5.4    44     1      21        52       35       6        50
6      16     1      42        95       39       1        13
6.9    50     1      22        71       86       59       97
7      38     1      43        50       47       83       67
8      26     1      29        100      63       15       40
9      46     0      28        85       9        27       81
12     43     0      21        74       78       20       85
Next, I created a loop to read in a select number of rows. For simplicity, the file above only has a few rows. My current file has about 100 rows.
for r in range(1, 7):  # skipping headers and only wanting first few rows to start
    b_raw = b_csv_data.row_values(r)
    b = np.array(b_raw)  # I created this b numpy array from the line of code above
Use np.isin -
In [8]: b[np.isin(b[:,0],a)]
Out[8]:
array([[ 10, 8, 52, 30, 15],
[ 11, 81, 152, 54, 112],
[ 13, 82, 84, 63, 24]])
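With a sorted and stored as a NumPy array, we can also use np.searchsorted -
a = np.asarray(a)  # a must be a sorted NumPy array for a[idx] to work
idx = np.searchsorted(a,b[:,0])
idx[idx==len(a)] = 0  # keep out-of-range positions valid as indices
out = b[a[idx] == b[:,0]]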
If your array has a different number of elements per row, which makes it essentially an array of lists, you need to modify the slicing part. In that case, first collect the first element of each row -
b0 = [bi[0] for bi in b]
Then, use b0 in place of every b[:,0] in the methods posted earlier.
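For rows of unequal length that are kept as plain Python lists, the same first-element test can be sketched without NumPy. This is a substitute for the array-based methods above, not an equivalent in performance, and the sample rows are made up for illustration.

```python
a = [10, 11, 12, 13, 14]
# Rows of different lengths (illustrative data, not from the question)
b = [[9, 23, 45], [10, 8], [11, 81, 152, 54], [13], [18, 182, 25, 63, 96]]

a_set = set(a)              # set membership tests are O(1) on average
b0 = [bi[0] for bi in b]    # first element of every row
mask = [first in a_set for first in b0]
c = [row for row, keep in zip(b, mask) if keep]
print(c)  # [[10, 8], [11, 81, 152, 54], [13]]
```

The mask plays the role that the boolean array returned by np.isin plays in the NumPy version.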
Use list comprehension:
c = [l for l in b if l[0] in a]
Output:
[[10, 8, 52, 30, 15], [11, 81, 152, 54, 112], [13, 82, 84, 63, 24]]
If your lists or arrays are large, using numpy.isin can be significantly faster:
b[np.isin(b[:, 0], a), :]
Benchmark:
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56], [10, 8, 52, 30, 15], [11, 81, 152, 54, 112],
[13, 82, 84, 63, 24], [18, 182, 25, 63, 96]]
import timeit
import numpy as np

list_comp, np_isin = [], []
for i in range(1, 100):
    a_test = a * i  # repeat the lists to grow the input size
    b_test = b * i
    list_comp.append(timeit.timeit('[l for l in b_test if l[0] in a_test]', number=10, globals=globals()))
    a_arr = np.array(a_test)
    b_arr = np.array(b_test)
    np_isin.append(timeit.timeit('b_arr[np.isin(b_arr[:, 0], a_arr), :]', number=10, globals=globals()))
While the result is not clear-cut, I would recommend using a list comprehension if b has fewer than 100 elements. Otherwise, NumPy is the way to go.
You are doing it in reverse. It is better to loop through the elements of b and check whether each one is present in a. If so, print that element of b. See the code below.
a = [10, 11, 12, 13, 14]
b = [[9, 23, 45, 67, 56, 23, 54], [10, 8, 52, 30, 15, 47, 109], [11, 81, 152, 54, 112, 78, 167], [13, 82, 84, 63, 24, 26, 78], [18, 182, 25, 63, 96, 104, 74]]
for bb in b:  # if you want to check only the first element of b is in a
    if bb[0] in a:
        print(bb)
for bb in b:  # if you want to check if any element of b is in a
    for bbb in bb:
        if bbb in a:
            print(bb)
Output:
[10, 8, 52, 30, 15, 47, 109]
[11, 81, 152, 54, 112, 78, 167]
[13, 82, 84, 63, 24, 26, 78]
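One caveat with the second loop: a row that contains several values from a is printed once per match. A possible variant, sketched here with made-up sample rows, uses any() so that each matching row is reported only once.

```python
a = [10, 11, 12, 13, 14]
# Illustrative rows; the first one contains two values from a
b = [[10, 11, 5], [9, 23, 45], [7, 13, 2]]

a_set = set(a)  # faster membership tests than scanning the list
# any() stops at the first match, so each row is kept at most once
rows_with_any_match = [row for row in b if any(value in a_set for value in row)]
print(rows_with_any_match)  # [[10, 11, 5], [7, 13, 2]]
```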
I have a list of strings, it's like:
['25 32 49 50 61 72 78 41\n',
'41 51 69 72 33 81 24 66\n']
I want to convert this list of strings, to a list of lists of ints. So my list would be:
[[25, 32, 49, 50, 61, 72, 78, 41], [41, 51, 69, 72, 33, 81, 24, 66]]
I've been thinking over this for a while, and couldn't find a solution.
By the way, the list of strings, which I gave above, is populated using
open("file", "r").readlines()
Use split() to split each string into a list, and then use int() to convert the items to integers.
Using map() (in Python 3, wrap it in list() because map() returns an iterator):
In [10]: lis=['25 32 49 50 61 72 78 41\n',
   ....: '41 51 69 72 33 81 24 66\n']
In [11]: [list(map(int,x.split())) for x in lis]
Out[11]: [[25, 32, 49, 50, 61, 72, 78, 41], [41, 51, 69, 72, 33, 81, 24, 66]]
or using list comprehension:
In [14]: [[int(y) for y in x.split()] for x in lis]
Out[14]: [[25, 32, 49, 50, 61, 72, 78, 41], [41, 51, 69, 72, 33, 81, 24, 66]]
You can also create this list directly from your file; there is no need for readlines():
with open("file") as f:
    lis = [list(map(int, line.split())) for line in f]
print(lis)
[[25, 32, 49, 50, 61, 72, 78, 41], [41, 51, 69, 72, 33, 81, 24, 66]]
x = ['25 32 49 50 61 72 78 41\n', '41 51 69 72 33 81 24 66\n']
list(map(lambda elem: list(map(int, elem.split())), x))  # list() is needed in Python 3, where map() is lazy
Try this list comprehension:
b=[[int(x) for x in i.split()] for i in open("file", "r").readlines()]
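Putting the pieces together for Python 3, here is a small sketch that parses the lines and skips blank ones. io.StringIO stands in for the real file so the example is self-contained, and the helper name parse_int_rows is hypothetical.

```python
import io


def parse_int_rows(lines):
    """Convert an iterable of whitespace-separated number strings to lists of ints."""
    # split() with no arguments also discards the trailing newline;
    # the strip() check skips empty lines.
    return [[int(token) for token in line.split()]
            for line in lines if line.strip()]


# StringIO imitates an open text file: iterating over it yields lines
fake_file = io.StringIO("25 32 49 50 61 72 78 41\n\n41 51 69 72 33 81 24 66\n")
rows = parse_int_rows(fake_file)
print(rows)
# [[25, 32, 49, 50, 61, 72, 78, 41], [41, 51, 69, 72, 33, 81, 24, 66]]
```

With a real file, the same function can be called as parse_int_rows(f) inside a with open(...) as f: block, which also makes sure the file is closed afterwards.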