Python find numbers between range in list or array

Python find numbers between range in list or array - python

I have a list with millions of numbers which are always increasing to the end, I need to find and return numbers within a specified range e.g. numbers greater than X but less than Y, the numbers in the list can change and the values I'm searching for change as well
I have been using this method, please note this is a basic example the numbers are not uniform or the same as shown below in my program
l = [i for i in range(2000000)]
nums = []
for element in l:
if element > 950004:
break
if element > 950000:
nums.append(element)
#[950001, 950002, 950003, 950004]
Although fast, I kind of need it to be a bit faster for what my program is doing, the numbers change a lot so I'm wondering if there's a better way to do this with a pandas series or a numpy array? but so far all I've done is make an example in numpy:
a = numpy.array(l,dtype=numpy.int64)
Would a pandas series be more functional? Making use of query()? what would be the best way to approach this with an array as opposed to a python list of python objects

Here is a solution using binary search. You are speaking of millions of numbers. Technically binary search will make the algorithm faster by decreasing the runtime complexity to O(log n) neglecting the final slicing step.
import bisect
l = [i for i in range(2000000)]
lower_bound = 950000
upper_bound = 950004
lower_bound_i = bisect.bisect_left(l, lower_bound)
upper_bound_i = bisect.bisect_right(l, upper_bound, lo=lower_bound_i)
nums = l[lower_bound_i:upper_bound_i]

The following are two implementations for binary search (based on code from here) - one which searches for an upper limit and one which searches for a lower limit. Does this work better for you?
def binary_search_upper(seq, limit):
min = 0
max = len(seq) - 1
while True:
if max < min:
return -1
m = (min + max) / 2
if m == (len(seq) -1) or (seq[m] <= limit and seq[m+1] > limit):
return m
elif seq[m] < limit:
min = m+1
else:
max = m - 1
def binary_search_lower(seq, limit):
min = 0
max = len(seq) - 1
while True:
if max < min:
return -1
m = (min + max) / 2
if m == 0 or (seq[m] >= limit and seq[m-1] < limit):
return m
elif seq[m] < limit:
min = m+1
else:
max = m - 1
l = [i for i in range(2000000)]
print binary_search_upper(l, 950004)
print binary_search_lower(l, 950000)

You could use numpy to get a subset of your list using a boolean slice.
import numpy as np
a = np.arange(2000000)
nums = a[(950000<a) & (a<=950004)]
nums
# returns
array([950001, 950002, 950003, 950004])

Related

Guidance on removing a nested for loop from function

I'm trying to write the fastest algorithm possible to return the number of "magic triples" (i.e. x, y, z where z is a multiple of y and y is a multiple of x) in a list of 3-2000 integers.
(Note: I believe the list was expected to be sorted and unique but one of the test examples given was [1,1,1] with the expected result of 1 - that is a mistake in the challenge itself though because the definition of a magic triple was explicitly noted as x < y < z, which [1,1,1] isn't. In any case, I was trying to optimise an algorithm for sorted lists of unique integers.)
I haven't been able to work out a solution that doesn't include having three consecutive loops and therefore being O(n^3). I've seen one online that is O(n^2) but I can't get my head around what it's doing, so it doesn't feel right to submit it.
My code is:
def solution(l):
if len(l) < 3:
return 0
elif l == [1,1,1]:
return 1
else:
halfway = int(l[-1]/2)
quarterway = int(halfway/2)
quarterIndex = 0
halfIndex = 0
for i in range(len(l)):
if l[i] >= quarterway:
quarterIndex = i
break
for i in range(len(l)):
if l[i] >= halfway:
halfIndex = i
break
triples = 0
for i in l[:quarterIndex+1]:
for j in l[:halfIndex+1]:
if j != i and j % i == 0:
multiple = 2
while (j * multiple) <= l[-1]:
if j * multiple in l:
triples += 1
multiple += 1
return triples
I've spent quite a lot of time going through examples manually and removing loops through unnecessary sections of the lists but this still completes a list of 2,000 integers in about a second where the O(n^2) solution I found completes the same list in 0.6 seconds - it seems like such a small difference but obviously it means mine takes 60% longer.
Am I missing a really obvious way of removing one of the loops?
Also, I saw mention of making a directed graph and I see the promise in that. I can make the list of first nodes from the original list with a built-in function, so in principle I presume that means I can make the overall graph with two for loops and then return the length of the third node list, but I hit a wall with that too. I just can't seem to make progress without that third loop!!

from array import array
def num_triples(l):
n = len(l)
pairs = set()
lower_counts = array("I", (0 for _ in range(n)))
upper_counts = lower_counts[:]
for i in range(n - 1):
lower = l[i]
for j in range(i + 1, n):
upper = l[j]
if upper % lower == 0:
lower_counts[i] += 1
upper_counts[j] += 1
return sum(nx * nz for nz, nx in zip(lower_counts, upper_counts))
Here, lower_counts[i] is the number of pairs of which the ith number is the y, and z is the other number in the pair (i.e. the number of different z values for this y).
Similarly, upper_counts[i] is the number of pairs of which the ith number is the y, and x is the other number in the pair (i.e. the number of different x values for this y).
So the number of triples in which the ith number is the y value is just the product of those two numbers.
The use of an array here for storing the counts is for scalability of access time. Tests show that up to n=2000 it makes negligible difference in practice, and even up to n=20000 it only made about a 1% difference to the run time (compared to using a list), but it could in principle be the fastest growing term for very large n.

How about using itertools.combinations instead of nested for loops? Combined with list comprehension, it's cleaner and much faster. Let's say l = [your list of integers] and let's assume it's already sorted.
from itertools import combinations
def div(i,j,k): # this function has the logic
return l[k]%l[j]==l[j]%l[i]==0
r = sum([div(i,j,k) for i,j,k in combinations(range(len(l)),3) if i<j<k])

#alaniwi provided a very smart iterative solution.
Here is a recursive solution.
def find_magicals(lst, nplet):
"""Find the number of magical n-plets in a given lst"""
res = 0
for i, base in enumerate(lst):
# find all the multiples of current base
multiples = [num for num in lst[i + 1:] if not num % base]
res += len(multiples) if nplet <= 2 else find_magicals(multiples, nplet - 1)
return res
def solution(lst):
return find_magicals(lst, 3)
The problem can be divided into selecting any number in the original list as the base (i.e x), how many du-plets we can find among the numbers bigger than the base. Since the method to find all du-plets is the same as finding tri-plets, we can solve the problem recursively.
From my testing, this recursive solution is comparable to, if not more performant than, the iterative solution.

This answer was the first suggestion by #alaniwi and is the one I've found to be the fastest (at 0.59 seconds for a 2,000 integer list).
def solution(l):
n = len(l)
lower_counts = dict((val, 0) for val in l)
upper_counts = lower_counts.copy()
for i in range(n - 1):
lower = l[i]
for j in range(i + 1, n):
upper = l[j]
if upper % lower == 0:
lower_counts[lower] += 1
upper_counts[upper] += 1
return sum((lower_counts[y] * upper_counts[y] for y in l))
I think I've managed to get my head around it. What it is essentially doing is comparing each number in the list with every other number to see if the smaller is divisible by the larger and makes two dictionaries:
One with the number of times a number is divisible by a larger
number,
One with the number of times it has a smaller number divisible by
it.
You compare the two dictionaries and multiply the values for each key because the key having a 0 in either essentially means it is not the second number in a triple.
Example:
l = [1,2,3,4,5,6]
lower_counts = {1:5, 2:2, 3:1, 4:0, 5:0, 6:0}
upper_counts = {1:0, 2:1, 3:1, 4:2, 5:1, 6:3}
triple_tuple = ([1,2,4], [1,2,6], [1,3,6])

Sum of array excluding the maximum and the minimum number

I have to add up excluding the minimum and maximum of any list of numbers.
This code work but it possible using a better solution only with 2 if and do not using a min/max function or another sum function pre existent?
def total(array):
sum = 0
min = array[0]
max = array[0]
for x in array:
if x>max:
if max != min:
sum += max
max = x
elif x == max:
pass
elif x < min:
if min != max:
sum += min
min = x
elif x == min:
pass
else:
sum+=x
print(min)
print(max)
return sum

This solution utilizes more off-the-shelf functions as opposed to fewer. That said, it only uses the traditional scientific computing stack so it's something you might want to become familiar with at some point depending on your goals. Most importantly, it makes the solution much more readable.
import numpy as np
import pandas as pd
xs = pd.Series([1,2,3,4,5,6,7,8,12])
xs[xs.between(np.min(xs),np.max(xs), inclusive=False)].sum()

Python Radix Sort

I'm trying to implement Radix sort in python.
My current program is not working correctly in that a list like [41,51,2,3,123] will be sorted correctly to [2,3,41,51,123], but something like [52,41,51,42,23] will become [23,41,42,52,51] (52 and 51 are in the wrong place).
I think I know why this is happening, because when I compare the digits in the tens place, I don't compare units as well (same for higher powers of 10).
How do I fix this issue so that my program runs in the fastest way possible? Thanks!
def radixsort(aList):
BASEMOD = 10
terminateLoop = False
temp = 0
power = 0
newList = []
while not terminateLoop:
terminateLoop = True
tempnums = [[] for x in range(BASEMOD)]
for x in aList:
temp = int(x / (BASEMOD ** power))
tempnums[temp % BASEMOD].append(x)
if terminateLoop:
terminateLoop = False
for y in tempnums:
for x in range(len(y)):
if int(y[x] / (BASEMOD ** (power+1))) == 0:
newList.append(y[x])
aList.remove(y[x])
power += 1
return newList
print(radixsort([1,4,1,5,5,6,12,52,1,5,51,2,21,415,12,51,2,51,2]))

Currently, your sort does nothing to reorder values based on anything but their highest digit. You get 41 and 42 right only by chance (since they are in the correct relative order in the initial list).
You should be always build a new list based on each cycle of the sort.
def radix_sort(nums, base=10):
result_list = []
power = 0
while nums:
bins = [[] for _ in range(base)]
for x in nums:
bins[x // base**power % base].append(x)
nums = []
for bin in bins:
for x in bin:
if x < base**(power+1):
result_list.append(x)
else:
nums.append(x)
power += 1
return result_list
Note that radix sort is not necessarily faster than a comparison-based sort. It only has a lower complexity if the number of items to be sorted is larger than the range of the item's values. Its complexity is O(len(nums) * log(max(nums))) rather than O(len(nums) * log(len(nums))).

Radix sort sorts the elements by first grouping the individual digits of the same place value. [2,3,41,51,123] first we group them based on first digits.
[[],[41,51],[2],[3,123],[],[],[],[],[],[]]
Then, sort the elements according to their increasing/decreasing order. new array will be
[41,51,2,3,123]
then we will be sorting based on tenth digit. in this case [2,3]=[02,03]
[[2,3],[],[123],[],[41],[51],[],[],[],[]]
now new array will be
[2,3,123,41,51]
lastly based on 100th digits. this time [2,3,41,51]=[002,003,041,051]
[[2,3,41,51],[123],[],[],[],[],[],[],[],[]]
finally we end up having [2,3,41,51,123]
def radixsort(A):
if not isinstance(A,list):
raise TypeError('')
n=len(A)
maxelement=max(A)
digits=len(str(maxelement)) # how many digits in the maxelement
l=[]
bins=[l]*10 # [[],[],.........[]] 10 bins
for i in range(digits):
for j in range(n): #withing this we traverse unsorted array
e=int((A[j]/pow(10,i))%10)
if len(bins[e])>0:
bins[e].append(A[j]) #adds item to the end
else:
bins[e]=[A[j]]
k=0 # used for the index of resorted arrayA
for x in range(10):#we traverse the bins and sort the array
if len(bins[x])>0:
for y in range(len(bins[x])):
A[k]=bins[x].pop(0) #remove element from the beginning
k=k+1

An algorithm for randomly generating integer partitions of a particular length, in Python?

I've been using the random_element() function provided by SAGE to generate random integer partitions for a given integer (N) that are a particular length (S). I'm trying to generate unbiased random samples from the set of all partitions for given values of N and S. SAGE's function quickly returns random partitions for N (i.e. Partitions(N).random_element()).
However, it slows immensely when adding S (i.e. Partitions(N,length=S).random_element()). Likewise, filtering out random partitions of N that are of length S is incredibly slow.
However, and I hope this helps someone, I've found that in the case when the function returns a partition of N not matching the length S, that the conjugate partition is often of length S. That is:
S = 10
N = 100
part = list(Partitions(N).random_element())
if len(part) != S:
SAD = list(Partition(part).conjugate())
if len(SAD) != S:
continue
This increases the rate at which partitions of length S are found and appears to produce unbiased samples (I've examined the results against entire sets of partitions for various values of N and S).
However, I'm using values of N (e.g. 10,000) and S (e.g. 300) that make even this approach impractically slow. The comment associated with SAGE's random_element() function admits there is plenty of room for optimization. So, is there a way to more quickly generate unbiased (i.e. random uniform) samples of integer partitions matching given values of N and S, perhaps, by not generating partitions that do not match S? Additionally, using conjugate partitions works well in many cases to produce unbiased samples, but I can't say that I precisely understand why.

Finally, I have a definitively unbiased method that has a zero rejection rate. Of course, I've tested it to make sure the results are representative samples of entire feasible sets. It's very fast and totally unbiased. Enjoy.
from sage.all import *
import random
First, a function to find the smallest maximum addend for a partition of n with s parts
def min_max(n,s):
_min = int(floor(float(n)/float(s)))
if int(n%s) > 0:
_min +=1
return _min
Next, A function that uses a cache and memoiziation to find the number of partitions
of n with s parts having x as the largest part. This is fast, but I think there's
a more elegant solution to be had. e.g., Often: P(N,S,max=K) = P(N-K,S-1)
Thanks to ante (https://stackoverflow.com/users/494076/ante) for helping me with this:
Finding the number of integer partitions given a total, a number of parts, and a maximum summand
D = {}
def P(n,s,x):
if n > s*x or x <= 0: return 0
if n == s*x: return 1
if (n,s,x) not in D:
D[(n,s,x)] = sum(P(n-i*x, s-i, x-1) for i in xrange(s))
return D[(n,s,x)]
Finally, a function to find uniform random partitions of n with s parts, with no rejection rate! Each randomly chosen number codes for a specific partition of n having s parts.
def random_partition(n,s):
S = s
partition = []
_min = min_max(n,S)
_max = n-S+1
total = number_of_partitions(n,S)
which = random.randrange(1,total+1) # random number
while n:
for k in range(_min,_max+1):
count = P(n,S,k)
if count >= which:
count = P(n,S,k-1)
break
partition.append(k)
n -= k
if n == 0: break
S -= 1
which -= count
_min = min_max(n,S)
_max = k
return partition

I ran into a similar problem when I was trying to calculate the probability of the strong birthday problem.
First off, the partition function explodes when given only modest amount of numbers. You'll be returning a LOT of information. No matter which method you're using N = 10000 and S = 300 will generate ridiculous amounts of data. It will be slow. Chances are any pure python implementation you use will be equally slow or slower. Look to making a CModule.
If you want to try python the approach I took as a combination of itertools and generators to keep memory usage down. I don't seem to have my code handy anymore, but here's a good impementation:
http://wordaligned.org/articles/partitioning-with-python
EDIT:
Found my code:
def partition(a, b=-1, limit=365):
if (b == -1):
b = a
if (a == 2 or a == 3):
if (b >= a and limit):
yield [a]
else:
return
elif (a > 3):
if (a <= b):
yield [a]
c = 0
if b > a-2:
c = a-2
else:
c = b
for i in xrange(c, 1, -1):
if (limit):
for j in partition(a-i, i, limit-1):
yield [i] + j

Simple approach: randomly assign the integers:
def random_partition(n, s):
partition = [0] * s
for x in range(n):
partition[random.randrange(s)] += 1
return partition

Subset sum Problem

recently I became interested in the subset-sum problem which is finding a zero-sum subset in a superset. I found some solutions on SO, in addition, I came across a particular solution which uses the dynamic programming approach. I translated his solution in python based on his qualitative descriptions. I'm trying to optimize this for larger lists which eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in python:
import random
from time import time
from itertools import product
time0 = time()
# create a zero matrix of size a (row), b(col)
def create_zero_matrix(a,b):
return [[0]*b for x in xrange(a)]
# generate a list of size num with random integers with an upper and lower bound
def random_ints(num, lower=-1000, upper=1000):
return [random.randrange(lower,upper+1) for i in range(num)]
# split a list up into N and P where N be the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
N_list = []
P_list = []
for x in A:
if x < 0:
N_list.append(x)
elif x > 0:
P_list.append(x)
return [sum(N_list), sum(P_list)]
# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
if n < 0:
return 0
try:
return table[n][m - N]
except:
return 0
# same definition as above
def set_element(table, n, m, N, value):
table[n][m - N] = value
# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)
# create a zero matrix of size m (row) by n (col)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1;
table = create_zero_matrix(m, n)
# set first element in index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)
# iterate through each table element
#for i in xrange(1, m): #row
# for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
#set_element(table, i, s, N, 1)
table[i][s - N] = 1
# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
s = s - A[i]
solution.append(A[i])
print "Solution: ",solution
time1 = time()
print "Time execution: ", time1 - time0

I'm not quite sure if your solution is exact or a PTA (poly-time approximation).
But, as someone pointed out, this problem is indeed NP-Complete.
Meaning, every known (exact) algorithm has an exponential time behavior on the size of the input.
Meaning, if you can process 1 operation in .01 nanosecond then, for a list of 59 elements it'll take:
2^59 ops --> 2^59 seconds --> 2^26 years --> 1 year
-------------- ---------------
10.000.000.000 3600 x 24 x 365
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other side, if you restrict the problem (to another) using bounds for the values of the numbers in the set, then the problem complexity reduces to polynomial time. But even then the memory space consumed will be a polynomial of VERY High Order.
The memory consumed will be much larger than the few gigabytes you have in memory.
And even much larger than the few tera-bytes on your hard drive.
( That's for small values of the bound for the value of the elements in the set )
May be this is the case of your Dynamic programing algorithm.
It seemed to me that you were using a bound of 1000 when building your initialization matrix.
You can try a smaller bound. That is... if your input is consistently consist of small values.
Good Luck!

Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero (activities):
subsets = {0: []}
for (activity, cost) in activities.iteritems():
old_subsets = subsets
subsets = {}
for (prev_sum, subset) in old_subsets.iteritems():
subsets[prev_sum] = subset
new_sum = prev_sum + cost
new_subset = subset + [activity]
if 0 == new_sum:
new_subset.sort()
return new_subset
else:
subsets[new_sum] = new_subset
return []
I spent a few minutes with it and it worked very well.

An interesting article on optimizing python code is available here. Basically the main result is that you should inline your frequent loops, so in your case this would mean instead of calling get_element twice per loop, put the actual code of that function inside the loop in order to avoid the function call overhead.
Hope that helps! Cheers

, 1st eye catch
def split_sum(A):
N_list = 0
P_list = 0
for x in A:
if x < 0:
N_list+=x
elif x > 0:
P_list+=x
return [N_list, P_list]
Some advices:
Try to use 1D list and use bitarray to reduce memory footprint at minimum (http://pypi.python.org/pypi/bitarray) so you will just change get / set functon. This should reduce your memory footprint by at lest 64 (integer in list is pointer to integer whit type so it can be factor 3*32)
Avoid using try - catch, but figure out proper ranges at beginning, you might found out that you will gain huge speed.

The following code works for Python 3.3+ , I have used the itertools module in Python that has some great methods to use.
from itertools import chain, combinations
def powerset(iterable):
s = list(iterable)
return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))
for i, combo in enumerate(powerset(nums), 1):
sum = 0
for num in combo:
sum += int(num)
if sum == inputSum:
print(combo)
The Input Output is as Follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')

Just change the values in your set w and correspondingly make an array x as big as the len of w then pass the last value in the subsetsum function as the sum for which u want subsets and you wl bw done (if u want to check by giving your own values).
def subsetsum(cs,k,r,x,w,d):
x[k]=1
if(cs+w[k]==d):
for i in range(0,k+1):
if x[i]==1:
print (w[i],end=" ")
print()
elif cs+w[k]+w[k+1]<=d :
subsetsum(cs+w[k],k+1,r-w[k],x,w,d)
if((cs +r-w[k]>=d) and (cs+w[k]<=d)) :
x[k]=0
subsetsum(cs,k+1,r-w[k],x,w,d)
#driver for the above code
w=[2,3,4,5,0]
x=[0,0,0,0,0]
subsetsum(0,0,sum(w),x,w,7)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python find numbers between range in list or array - python

You could use numpy to get a subset of your list using a boolean slice. import numpy as np a = np.arange(2000000) nums = a[(950000<a) & (a<=950004)] nums # returns array([950001, 950002, 950003, 950004])

Related

Guidance on removing a nested for loop from function

Sum of array excluding the maximum and the minimum number

Python Radix Sort

An algorithm for randomly generating integer partitions of a particular length, in Python?

Subset sum Problem

Categories

Resources