Bin sequential values in list - python

I have the list:
new_maks = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89, 91]
I want to bin the elements into runs where each element is exactly 1 greater than the previous one. My initial idea is to initialize two lists, bin_start and bin_end, and iterate through new_maks to check for sequential values.
bin_start = []
bin_end = []
counter = 0
for i in range(len(new_maks)):
    if new_maks[i] == new_maks[0]:
        bin_start.append(new_maks[i])
    elif (new_maks[i] - new_maks[i-1]) ==1:
        try:
            bin_end[counter] = new_maks[i]
        except:
            bin_end.append(new_maks[i])
    elif (new_maks[i] - new_maks[i-1]) >1:
        if new_maks[i] != new_maks[-1]:
            bin_start.append(new_maks[i])
            counter +=1
Which produces the desired result of:
bin_start= [75, 85]
bin_end = [81, 89]
Is there a simpler/vectorized way to achieve this result?

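Here's one aimed at performance, using NumPy tools (a must be a NumPy array) -
import numpy as np
def start_stop_seq1(a):
    # Pad the "difference == 1" mask so every run has a rising and a falling edge
    m = np.r_[False,np.diff(a)==1,False]
    # Positions where the mask flips are the first and last element of each run
    return a[m[:-1]!=m[1:]].reshape(-1,2).T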
Sample run -
In [34]: a # input array
Out[34]:
array([ 75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89, 91,
92, 93, 100, 101, 110])
In [35]: start_stop_seq1(a)
Out[35]:
array([[ 75, 85, 91, 100],
[ 81, 89, 93, 101]])
Alternative #1 : One liner with one more np.diff
We can go one step further to achieve compactness -
In [43]: a[np.diff(np.r_[False,np.diff(a)==1,False])].reshape(-1,2).T
Out[43]:
array([[ 75, 85, 91, 100],
[ 81, 89, 93, 101]])

A simpler way could be to use groupby and count:
from itertools import groupby, count
counter = count(1)
new_mask = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89]
generator = ((first, last) for key, (first, *_, last) in groupby(new_mask, key=lambda val: val - next(counter)))
bin_start, bin_end = zip(*generator)
print(bin_start)
print(bin_end)
Output
(75, 85)
(81, 89)
This is based on an old itertools recipe. If you fancy pandas, you could do something like this:
import pandas as pd
new_mask = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89]
s = pd.Series(data=new_mask)
result = s.groupby(s.values - s.index).agg(['first', 'last'])
bin_start, bin_end = zip(*result.itertuples(index=False))
print(bin_start)
print(bin_end)
Again, this is based on the principle that consecutive values increasing by 1 all have the same difference against a running sequence. As mentioned in the linked documentation:
The key to the solution is differencing with a range so that
consecutive numbers all appear in same group.
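As a minimal sketch of that principle which also copes with isolated values (such as the 91 in the question's list, which would break the `(first, *_, last)` unpacking), the running sequence can come from enumerate. The function name consecutive_runs and the min_len parameter are illustrative assumptions, not part of the recipe:

```python
from itertools import groupby

def consecutive_runs(values, min_len=2):
    """Return (first, last) for each run of values increasing by exactly 1."""
    runs = []
    # value - index is constant within a run of consecutive integers
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        items = [value for _, value in group]
        if len(items) >= min_len:  # skip isolated values such as 91
            runs.append((items[0], items[-1]))
    return runs

new_maks = [75, 76, 77, 78, 79, 80, 81, 85, 86, 87, 88, 89, 91]
runs = consecutive_runs(new_maks)
bin_start = [first for first, _ in runs]
bin_end = [last for _, last in runs]
print(bin_start)  # [75, 85]
print(bin_end)    # [81, 89]
```

Setting min_len=1 would instead report isolated values as runs of length one.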

Related

Given 100 random numbers with duplicates, I want to count how many numbers fall inside each interval in Python

For Example
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
x = first_interval[0] <= data <= second_interval[0]
y = first_interval[1] <= data <= second_interval[1] # and so on
I want to know how many numbers from data are between 40-49, 50-59, 60-69 and so on
frequency = [4, 6] # 4 is x and 6 is y
Iterate on the bounds using zip, then with a list comprehension you can filter the correct values
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
result = {}
for start, end in zip(first_interval, second_interval):
    result[(start, end)] = len([v for v in data if start <= v <= end])
print(result)
# {(40, 49): 4, (50, 59): 6, (60, 69): 10, (70, 79): 4, (80, 89): 4, (90, 99): 2}
print(result[(40, 49)])
# 4
The version with a list and len is easier to understand:
result[(start, end)] = len([v for v in data if start <= v <= end])
But the following version performs better on larger inputs: it uses a generator, so it doesn't build a whole list only to discard it afterwards:
result[(start, end)] = sum(1 for v in data if start <= v <= end)
Another version doesn't use the predefined bounds and is much faster, as its complexity is O(n) rather than the O(n*m) of the first one: you iterate once over the values, not over the values for each pair of bounds. It assumes the intervals are the decades 40-49, 50-59 and so on:
from collections import defaultdict
result = defaultdict(int)
for value in data:
    start = 10 * (value // 10)
    result[(start, start + 9)] += 1
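To check that the single-pass version agrees with the per-interval version on the question's data, here is a small self-contained sketch. The names per_interval and single_pass are illustrative only:

```python
from collections import defaultdict

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]

# O(n*m): scan all values once per interval
per_interval = {
    (start, end): sum(1 for v in data if start <= v <= end)
    for start, end in zip(first_interval, second_interval)
}

# O(n): derive the decade of each value directly
single_pass = defaultdict(int)
for value in data:
    start = 10 * (value // 10)
    single_pass[(start, start + 9)] += 1

print(per_interval == dict(single_pass))  # True
```

Note that the single-pass version only creates keys for decades that actually occur in the data, so the two dictionaries match here because every interval contains at least one value.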
This may help you :
first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
Data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65, 65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]
def find_occurence(start,end,data):
    counter = 0
    for i in data :
        if start<=i<=end :
            counter += 1
    return counter
print(find_occurence(first_interval[0],second_interval[0],Data)) # this gives you the answer for x; do the same for y
Note: start is where the interval begins and end is where it stops (both inclusive).
We can use numpy.histogram with bins defined by:
first_interval bins, but open on the right
max(second_interval) to determine the close of rightmost bin
Code
import numpy as np
# Generate counts and bins (right most edge given by max(second_interval))
frequency, bins = np.histogram(data, bins = first_interval + [max(second_interval)])
# Show Results
for i in range(len(frequency)):
    if i < len(frequency) - 1:
        print(f'{bins[i]}-{bins[i+1]-1} : {frequency[i]}') # frequency doesn't include right edge
    else:
        print(f'{bins[i]}-{bins[i+1]} : {frequency[i]}') # frequency includes right edge in last bin
Output
40-49 : 4
50-59 : 6
60-69 : 10
70-79 : 4
80-89 : 4
90-99 : 2
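For readers without NumPy, the same binning rule (each bin closed on the left and open on the right, with the last bin also closed on the right) can be sketched with the standard-library bisect module. This is a substitute technique, not what numpy.histogram does internally, and the function name count_in_bins is hypothetical:

```python
from bisect import bisect_right

def count_in_bins(data, starts, last_end):
    """Count values per bin; starts must be sorted, last_end closes the final bin."""
    counts = [0] * len(starts)
    for value in data:
        if starts[0] <= value <= last_end:
            # bisect_right gives the number of bin starts that are <= value
            counts[bisect_right(starts, value) - 1] += 1
    return counts

first_interval = [40, 50, 60, 70, 80, 90]
second_interval = [49, 59, 69, 79, 89, 99]
data = [40, 42, 47, 49, 50, 52, 55, 56, 57, 59, 60, 61, 63, 65, 65,
        65, 66, 68, 68, 69, 72, 74, 78, 79, 81, 85, 87, 88, 90, 98]

frequency = count_in_bins(data, first_interval, max(second_interval))
print(frequency)  # [4, 6, 10, 4, 4, 2]
```

Each lookup costs O(log m), so the whole count is O(n log m) and works for bins of unequal width.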

Sorting Algorithm output at end of pass 3

Given the following initially unsorted list:
[77, 101, 40, 43, 81, 129, 85, 144]
Which sorting algorithm produces the following list at the end of Pass Number 3? Is it Bubble, Insertion or Selection?
[40, 43, 77, 81, 85, 101, 129, 144]
Can someone give me a clue on how I can solve this, please?
Insertion sort only touches the leading items: after 3 passes the first 3 or 4 items (depending on how the passes are counted) are in order and the rest are unchanged. Selection sort would affect only the first 3 positions and the positions that held the 3 smallest (or greatest) items. Only bubble sort would swap other items around. The movement of items near the end of the list, such as 129, is a telltale sign that points to a bubble sort.
Note that this may be a trick question: every number that needs to be shifted is at most 2 positions off, except 101 (the 3rd largest), which reaches its final place after 2 passes. A bubble sort with an early-exit check therefore has the list sorted after 2 passes, and a 3rd pass would only confirm that no swaps remain. So, depending on how passes are counted, the answer could be "none of them".
Insertion sort:
def insertion_sort(array):
    for i in range(1, len(array)):
        key_item = array[i]
        j = i - 1
        while j >= 0 and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key_item
        print("Step",i,":",array)
    return array
data=[77, 101, 40, 43, 81, 129, 85, 144]
insertion_sort(data)
Output:
Step 1 : [77, 101, 40, 43, 81, 129, 85, 144]
Step 2 : [40, 77, 101, 43, 81, 129, 85, 144]
Step 3 : [40, 43, 77, 101, 81, 129, 85, 144]
Step 4 : [40, 43, 77, 81, 101, 129, 85, 144]
Step 5 : [40, 43, 77, 81, 101, 129, 85, 144]
Step 6 : [40, 43, 77, 81, 85, 101, 129, 144]
Step 7 : [40, 43, 77, 81, 85, 101, 129, 144]
Bubble sort:
def bubble_sort(array):
    n = len(array)
    for i in range(n):
        already_sorted = True
        for j in range(n - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                already_sorted = False
        if already_sorted:
            break
        print("Step:",n-j-1)
        print(array)
    return array
data = [77, 101, 40, 43, 81, 129, 85, 144]
bubble_sort(data)
Output:
Step: 1
[77, 40, 43, 81, 101, 85, 129, 144]
Step: 2
[40, 43, 77, 81, 85, 101, 129, 144]
Selection Sort:
def selectionSort(array, size):
    for step in range(size):
        min_idx = step
        for i in range(step + 1, size):
            if array[i] < array[min_idx]:
                min_idx = i
        (array[step], array[min_idx]) = (array[min_idx], array[step])
        print("step",step+1,":",end="")
        print(array)
data = [77, 101, 40, 43, 81, 129, 85, 144]
size = len(data)
selectionSort(data, size)
Output:
step 1 :[40, 101, 77, 43, 81, 129, 85, 144]
step 2 :[40, 43, 77, 101, 81, 129, 85, 144]
step 3 :[40, 43, 77, 101, 81, 129, 85, 144]
step 4 :[40, 43, 77, 81, 101, 129, 85, 144]
step 5 :[40, 43, 77, 81, 85, 129, 101, 144]
step 6 :[40, 43, 77, 81, 85, 101, 129, 144]
step 7 :[40, 43, 77, 81, 85, 101, 129, 144]
step 8 :[40, 43, 77, 81, 85, 101, 129, 144]
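The three traces can also be compared mechanically. The sketch below is one possible harness, assuming textbook versions of each algorithm with one outer-loop iteration counted as a pass and a bubble sort without an early exit (so that it always reaches a 3rd pass). The helper names are made up for illustration:

```python
from itertools import islice

def insertion_passes(items):
    a = list(items)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
        yield list(a)  # snapshot after each pass

def bubble_passes(items):
    a = list(items)
    for i in range(len(a) - 1):
        for j in range(len(a) - i - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
        yield list(a)  # no early exit, so every pass is reported

def selection_passes(items):
    a = list(items)
    for step in range(len(a) - 1):
        min_idx = min(range(step, len(a)), key=a.__getitem__)
        a[step], a[min_idx] = a[min_idx], a[step]
        yield list(a)

def state_after(passes, k):
    """Return the list as it looks at the end of pass k (1-based)."""
    return next(islice(passes, k - 1, None))

data = [77, 101, 40, 43, 81, 129, 85, 144]
target = [40, 43, 77, 81, 85, 101, 129, 144]
algorithms = {
    "insertion": insertion_passes,
    "bubble": bubble_passes,
    "selection": selection_passes,
}
matches = [name for name, fn in algorithms.items()
           if state_after(fn(data), 3) == target]
print(matches)  # ['bubble']
```

Under these counting assumptions only bubble sort reproduces the target list at the end of pass 3.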
You can find more guidance on implementing and running these algorithms at the link below:
https://realpython.com/sorting-algorithms-python/

How to generate a list of numbers in python

I am working on a Python algorithm and I am new to Python. I'd like to generate a list of numbers like 4, 7, 8, 11, 12, 13, 16, 17, 18, 19, 22, 23, 24, 25... with 2 for loops.
I've done some work to find some of the numbers and I am close to the result I want, which is to generate a list that contains these numbers.
My code is here:
for x in range(0, 6, 1):
    start_ind = int(((x+3) * (x+2)) / 2 + 1)
    print("start index is ", [start_ind], x)
    start_node = node[start_ind]
    for y in range(0, x):
        ind = start_ind + y + 1
        ind_list = node[ind]
        index = [ind_list]
        print(index)
Node is a list:
node = ['n%d' % i for i in range(0, 36, 1)]
What I received from this code is:
start index is [7] 1
['n8']
start index is [11] 2
['n12']
['n13']
start index is [16] 3
['n17']
['n18']
['n19']
start index is [22] 4
['n23']
['n24']
['n25']
['n26']
start index is [29] 5
['n30']
['n31']
['n32']
['n33']
['n34']
This seems to give the same list, and I think it's much clearer what's happening!
val=4
result=[]
for i in range(1,7):
    for j in range(val,val+i):
        val = val+1
        result.append(j)
    val = j+3
print(result)
I don't think you need a loop for this, let alone two:
import numpy as np
dif = np.ones(100, dtype = np.int32)
dif[np.cumsum(np.arange(14))] = 3
(1+np.cumsum(dif)).tolist()
output
[4, 7, 8, 11, 12, 13, 16, 17, 18, 19, 22, 23, 24, 25, 26, 29, 30, 31, 32, 33, 34, 37, 38, 39, 40, 41, 42, 43, 46, 47, 48, 49, 50, 51, 52, 53, 56, 57, 58, 59, 60, 61, 62, 63, 64, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 121, 122, 123, 124, 125, 126, 127, 128, 129]
ind_list = []
start_ind = 4
for x in range(0, 6):
    ind_list.append(start_ind)
    for y in range(1, x+1):
        ind_list.append(start_ind + y)
    start_ind = ind_list[-1]+3
print(ind_list)
You could probably use this. The printed list matches the numbers provided. It appends the first number of each block at the beginning of the outer loop, and the inner loop gets one iteration longer each time x increases. I'm assuming the number sequence is 4, 4+3, 4+3+1, 4+3+1+3, 4+3+1+3+1, 4+3+1+3+1+1, 4+3+1+3+1+1+3, ....
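Putting the pieces together: block x (counting from 0) holds x+1 consecutive numbers, and the question's own formula gives its first value as (x+2)(x+3)/2 + 1. A minimal sketch under that assumption, with a hypothetical function name staircase:

```python
def staircase(n_blocks):
    """Blocks of 1, 2, 3, ... consecutive integers, with a jump of 3 between blocks."""
    result = []
    for x in range(n_blocks):
        start = (x + 2) * (x + 3) // 2 + 1  # 4, 7, 11, 16, 22, 29, ...
        result.extend(range(start, start + x + 1))
    return result

print(staircase(4))  # [4, 7, 8, 11, 12, 13, 16, 17, 18, 19]
```

The values can then be used as indices into node, for example [node[i] for i in staircase(6)], as long as node is long enough.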

Getting values of each number to be not more than 90

I have code which generates random numbers and puts them in a list. The total of these numbers must equal a defined value (in this case 6066). The list must also contain a fixed count of numbers: I want 95 numbers to be generated randomly into a list, and the sum of these 95 numbers must equal 6066.
The code :
import random
def num(n, total):
    dividers = sorted(random.sample(range(1, total), n - 1))
    j= [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    return j
i=num(95,6066)
print (i)
The problem I'm facing is that I do not want any of the 95 numbers in the list to exceed 85. How do I do this?
I have tried:
import random
def num(n, total):
    dividers = sorted(random.sample(range(1, total), n - 1))
    j= [a - b for a, b in zip(dividers + [total], [0] + dividers)]
    for k in j:
        if k>85:
            j.remove(k)
    return j
i=num(95,6066)
print (i)
But this only removes the numbers which are more than 85 from the list. I need to have 95 numbers in the list, and they must total 6066.
One solution is to treat the problem as distributing 6066 items between 95 buckets, each with a capacity of 85: you loop over the items and each time choose a bucket that is not already full.
Here is a simple implementation. It won't be particularly fast, but it avoids the need to backtrack because there is no possibility of violating the rules (total sum incorrect or individual value exceeds the maximum).
Note that other solutions that are equally valid within your rules may have a different probability distribution, but you have not said anything about what probability distribution you require.
import random
def num(n, total, maxv):
    if total > n * maxv:
        raise ValueError("incompatible requirements")
    vals = [0 for _ in range(n)]
    not_full = list(range(n))
    for _ in range(total):
        index = random.choice(not_full)
        vals[index] += 1
        if vals[index] == maxv:
            not_full.remove(index)
    return vals
answer = num(95, 6066, 85)
print(answer)
print(max(answer))
print(sum(answer))
Gives:
[59, 59, 73, 63, 77, 58, 54, 71, 73, 67, 69, 67, 58, 79, 63, 59, 80, 58, 77, 64, 62, 64, 54, 50, 64, 72, 62, 69, 81, 61, 63, 50, 65, 56, 60, 51, 59, 61, 63, 56, 67, 69, 69, 64, 85, 66, 74, 66, 63, 63, 63, 68, 84, 66, 53, 82, 59, 66, 63, 58, 67, 58, 59, 58, 69, 56, 63, 61, 73, 58, 65, 60, 61, 53, 68, 51, 58, 57, 67, 60, 65, 73, 63, 59, 62, 49, 66, 59, 64, 56, 69, 58, 61, 67, 74]
85
6066
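If the linear-time list removal of a full bucket ever matters, one possible variant picks a position in the not-full list and drops a full bucket by swapping it with the last entry. This is only a sketch of the same bucket-filling idea. The name num_fast and the optional seed parameter are assumptions added for illustration:

```python
import random

def num_fast(n, total, maxv, seed=None):
    """Distribute `total` units over `n` buckets, none exceeding `maxv`."""
    if total > n * maxv:
        raise ValueError("incompatible requirements")
    rng = random.Random(seed)
    vals = [0] * n
    not_full = list(range(n))  # indices of buckets that can still grow
    for _ in range(total):
        pos = rng.randrange(len(not_full))
        index = not_full[pos]
        vals[index] += 1
        if vals[index] == maxv:
            # O(1) removal: overwrite with the last entry, then drop the tail
            not_full[pos] = not_full[-1]
            not_full.pop()
    return vals

answer = num_fast(95, 6066, 85)
print(len(answer), sum(answer), max(answer) <= 85)  # 95 6066 True
```

As with the version above, each unit is assigned uniformly among the buckets that are not yet full, so the constraints hold by construction.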

Python nested-list function to return the average score for each student's exams

I am trying to write a function named studentGrades(gradeList) that takes a nested list and returns a list with the average score for each student.
An example would be:
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
studentGrades(grades)
# Sample output below
>>> [90, 53, 64, 54, 70, 96]
I don't know how to do this using a nested loop. Any help or guidance is appreciated.
In case you need a one-liner:
[int(sum(i[1:])/len(i[1:])) for i in grades[1:]]
Output:
[90, 53, 64, 54, 70, 96]
As I suggested in the comments, you can also use a dictionary for that. Dictionaries are basically made for situations like these and if you want to use your data any further, they might prove beneficial.
You would first have to convert your current structure, which I assumed is fixed in the sense that you have a "header" in grades, and then lists of the form [name,points..]. You can do:
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
gradedict = {}
for rows in grades[1:]:
    gradedict[rows[0]] = rows[1:]
This initializes the dictionary. Then the following function:
def studentGrades(gradedict):
    avgdict = {}
    for key,lst in gradedict.items():
        avgdict[key] = sum(lst)/len(lst)
    return avgdict
Returns another dictionary with the corresponding averages. You could loop through the names, i.e. the keys, of either of those to print that. For example:
gradedict = studentGrades(gradedict)
for x in gradedict:
    print("student ",x, " achieved an average of: ",gradedict[x])
This is just to show you how to access the elements. You can of course also loop through keys and items together, as I did in the function.
Hope this helps.
You can do something like this:
def studentGrades(a):
    # return [sum(k[1:])//(len(k)-1) for k in a[1:]]
    # Or
    l = len(a[0]) - 1 # how many quizzes, thanks to @Mad Physicist
    # Integer division, so the result matches the integer output shown below
    return [sum(k[1:])//l for k in a[1:]]
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
final = studentGrades(grades)
print(final)
Output:
[90, 53, 64, 54, 70, 96]
Why not just use pandas?
import pandas as pd
df = pd.DataFrame(grades).set_index(0).iloc[1:].mean(axis=1).astype(int)
Output:
John 90
McVay 53
Rita 64
Ketan 54
Saranya 70
Min 96
or
list(df.values)
Output:
[90, 53, 64, 54, 70, 96]
grades= [['Student','Quiz 1','Quiz 2','Quiz 3','Final'],
['John', 100, 90, 80, 90],
['McVay', 88, 99, 11, 15],
['Rita', 45, 56, 67, 89],
['Ketan', 59, 61, 67, 32],
['Saranya', 73, 79, 83, 45],
['Min', 89, 97, 101, 100]]
def average(grades):
    for m in grades[1:]:
        t = m[1:]
        l = len(t)
        s = sum(t)
        yield s//l
list(average(grades))
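Since the question asks specifically for a nested loop, here is a minimal sketch that spells out both loops explicitly. It assumes the first row is a header and the first column holds the name, and it truncates the average with integer division to match the sample output:

```python
def studentGrades(gradeList):
    averages = []
    for row in gradeList[1:]:   # outer loop: one student per row, header skipped
        total = 0
        count = 0
        for score in row[1:]:   # inner loop: that student's scores, name skipped
            total += score
            count += 1
        averages.append(total // count)
    return averages

grades = [['Student', 'Quiz 1', 'Quiz 2', 'Quiz 3', 'Final'],
          ['John', 100, 90, 80, 90],
          ['McVay', 88, 99, 11, 15],
          ['Rita', 45, 56, 67, 89],
          ['Ketan', 59, 61, 67, 32],
          ['Saranya', 73, 79, 83, 45],
          ['Min', 89, 97, 101, 100]]

print(studentGrades(grades))  # [90, 53, 64, 54, 70, 96]
```

Replacing `total // count` with `total / count` would return the exact averages (for example 53.25 for McVay) instead of truncated integers.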
