Operation on each possible pair of items in the list

I want to collect the distinct products of every pair of numbers in a range.
This works fine with small numbers, but not with larger ones. How can I optimize my solution?
# works fine with min = 10 & max = 99
# but not with these values:
min = 1000
max = 9999
seq = range(min, max + 1)
products = set()
for i in seq:
    for j in seq:
        p = i * j
        products.add(p)

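You can use NumPy to take the outer product and then keep the unique values. Note that np.outer materializes all 9000 * 9000 = 81 million products at once (roughly 650 MB as 64-bit integers) before np.unique sorts them.
import numpy as np
min_num = 1000
max_num = 9999
numbers = np.arange(min_num, max_num+1)
products = np.unique(np.outer(numbers, numbers))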

You can build the set directly with a comprehension. To optimize, compute each product only once by multiplying each number by itself and by the numbers after it, rather than by every number (computing both i*j and j*i only wastes time producing duplicate values):
lo = 1000
hi = 9999
prods = {i*j for i in range(lo,hi+1) for j in range(i,hi+1)}
print(len(prods)) # 20789643
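The same "only pairs with i <= j" idea can also be written with itertools.combinations_with_replacement, which yields exactly those pairs. A minimal sketch (the function names are illustrative), checked against the full double loop on the small 10..99 range from the question:

```python
from itertools import combinations_with_replacement

def distinct_products(lo, hi):
    """Distinct products i*j for lo <= i <= j <= hi.

    combinations_with_replacement yields each unordered pair once,
    including (i, i), so j*i is never computed after i*j.
    """
    seq = range(lo, hi + 1)
    return {i * j for i, j in combinations_with_replacement(seq, 2)}

def distinct_products_all_pairs(lo, hi):
    """Reference version that multiplies every ordered pair, as in the question."""
    seq = range(lo, hi + 1)
    return {i * j for i in seq for j in seq}

# Same result on the small range, with roughly half the multiplications
# (4095 unordered pairs instead of 8100 ordered ones).
assert distinct_products(10, 99) == distinct_products_all_pairs(10, 99)
print(len(distinct_products(10, 99)))
```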

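A nested list comprehension should be faster than the explicit loops, although it keeps every duplicate product (81 million entries for the 1000..9999 range); use a set comprehension instead if you only need the unique values: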
products = [i*j for i in seq for j in seq]
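Whether the comprehension is actually faster depends on the Python version and the machine, so it is worth measuring. A rough timeit sketch; the function names and the deliberately smaller range are assumptions chosen so it runs quickly:

```python
import timeit

def with_loops(seq):
    """The question's explicit double loop."""
    products = set()
    for i in seq:
        for j in seq:
            products.add(i * j)
    return products

def with_list_comprehension(seq):
    """The list-comprehension answer: keeps every duplicate product."""
    return [i * j for i in seq for j in seq]

def with_set_comprehension(seq):
    """Same comprehension, but building a set of unique products."""
    return {i * j for i in seq for j in seq}

if __name__ == "__main__":
    seq = range(100, 500)  # small range so the timing finishes quickly
    for func in (with_loops, with_list_comprehension, with_set_comprehension):
        seconds = timeit.timeit(lambda: func(seq), number=3)
        print(f"{func.__name__}: {seconds:.3f} s")
```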

Related

Find the maximum element by summing overlapping intervals

Say we are given the total size of the interval space, and an array of tuples, each giving the start and end indices of an interval along with a value to add over it. After applying all the additions, we would like to return the maximum element. How would I go about solving this efficiently?
Input format: n = size of the interval space; intervals = array of tuples, each containing a start index, an end index (1-based, inclusive) and the value to add to each element in that range.
E.g.:
Input: n = 5, intervals = [(1,2,100),(2,5,100),(3,4,100)]
Output: 200
So the array is initially [0,0,0,0,0]
At each iteration the following modifications will be made:
1) [100,100,0,0,0]
2) [100,200,100,100,100]
3) [100,200,200,200,100]
Thus the answer is 200.
All I've figured out so far is the brute-force solution of slicing the array and adding a value to the sliced portion. How can I do better? Any help is appreciated!

One way is to split each interval into a start event and an end event, recording how much is added to or subtracted from the running total at each. Once you sort these events by their position on the number line, you traverse them in order, adding a value when you enter an interval and subtracting it when you leave. Here is some code to do so:
from collections import defaultdict

def find_max_val(intervals):
    # Each interval becomes two events: +value at its start,
    # -value just past its (inclusive) end.
    operations = []
    for start, end, value in intervals:
        operations.append((start, value))
        operations.append((end + 1, -value))
    # Merge events that fall on the same position.
    unique_ops = defaultdict(int)
    for pos, delta in operations:
        unique_ops[pos] += delta
    sorted_keys = sorted(unique_ops)
    # Sweep left to right, keeping a running total and its maximum.
    curr_val = unique_ops[sorted_keys[0]]
    max_val = curr_val
    for key in sorted_keys[1:]:
        curr_val += unique_ops[key]
        max_val = max(max_val, curr_val)
    return max_val

intervals = [(1,2,100),(2,5,100),(3,4,100)]
print(find_max_val(intervals))
# Output: 200
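Since n is given, a difference array is another option (a different technique from the sorted-events sweep above): add the value at index start-1, subtract it at index end, then take a running prefix sum. This sketch assumes 1-based, inclusive start/end indexes as in the question's example; it needs no sorting and runs in O(n + len(intervals)). The function name is illustrative:

```python
from itertools import accumulate

def max_after_intervals(n, intervals):
    """Maximum element after adding each (start, end, value) to positions
    start..end (1-based, inclusive) of an all-zero array of length n."""
    diff = [0] * (n + 1)  # one extra slot so end == n needs no bounds check
    for start, end, value in intervals:
        diff[start - 1] += value  # value starts applying at index start-1
        diff[end] -= value        # and stops applying after index end-1
    # The prefix sums of diff[:n] rebuild the final array; positions not
    # covered by any interval stay 0 and are included in the maximum.
    return max(accumulate(diff[:n]))

print(max_after_intervals(5, [(1, 2, 100), (2, 5, 100), (3, 4, 100)]))  # 200
```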

Here is a brute-force version that reads n and then 3 intervals from standard input:
n = int(input())
x = [0]*n
for _ in range(3):
    s = int(input())  # start (1-based)
    e = int(input())  # end (inclusive)
    v = int(input())  # value
    # add the value to every position from s to e
    for k in range(s-1, e):
        x[k] += v
print(max(x))

You can use a list comprehension to do a lot of the work.
n = 5
intervals = [(1,2,100),(2,5,100),(3,4,100)]
# one row per interval: its value inside [start, end] (1-based, inclusive), 0 elsewhere
intlst = [[r[2] if r[0]-1 <= i <= r[1]-1 else 0 for i in range(n)] for r in intervals]
lst = [0]*n  # [0,0,0,0,0]
for ls in intlst:
    lst = [lst[i]+ls[i] for i in range(n)]
print(lst)
print(max(lst))
Output
[100, 200, 200, 200, 100]
200

Get indexes for Subsample of list of lists

I have several lists of data in Python:
a = [2,45,1,3]
b = [4,6,3,6,7,1,37,48,19]
c = [45,122]
total = [a,b,c]
I want to get n random indexes from them:
n = 7
# some code
result = [[1,3], [2,6,8], [0,1]] # or
result = [[0], [0,2,6,8], [0,1]] # or
result = [[0,1], [0,2,3,6,8], []] # or any other
The idea: pick random elements (that is, their indexes) from any of the lists, but the total count must be n.
So my idea is to generate random indexes:
import random
n = 7
total_len = sum(len(el) for el in total)
inds = random.sample(range(total_len), n)
But how do I then turn these into per-list indexes?
I thought about np.cumsum() and shifting the indices after that, but can't find an elegant solution...
P.S.
Actually, I need this for loading data from several CSV files using the skiprows option. My idea is to get indexes for every file, which lets me load only the necessary rows from each file.
So my real task:
I have several CSV files of different lengths and need to get n random rows from them.
My idea:
lengths = my_func_to_get_lengths_for_every_csv(paths) # list of lengths
# generate random subsample of indexes
skip = ...
for ind, fil in enumerate(paths):
    pd.read_csv(fil, skiprows=skip[ind])

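You could flatten the list first and then take your samples (note that this samples the values themselves, not their indexes within the original lists):
import random
total_flat = [item for sublist in total for item in sublist]
inds = random.sample(total_flat, k=n)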

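Is this what you mean? Here inds are the global indexes sampled as in the question:
relative_inds = []
min_bound = 0
for lst in total:
    # keep the sampled indexes that fall inside this list,
    # shifted to be relative to the list's first element
    relative_inds.append([i - min_bound for i in inds if min_bound <= i < min_bound + len(lst)])
    min_bound += len(lst)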
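The np.cumsum idea from the question also works without NumPy: running totals of the list lengths mark where each list ends, and bisect finds which list a global index falls into. The sketch below also shows one hypothetical way to turn the kept rows into skiprows lists for the CSV case, assuming each file has exactly one header line (line 0) followed by lengths[k] data rows. split_indexes and skip_rows are illustrative names, not an existing API:

```python
import random
from bisect import bisect_right
from itertools import accumulate

def split_indexes(lengths, inds):
    """Map global indexes (into the concatenation of the lists) to
    per-list index lists, given only each list's length."""
    ends = list(accumulate(lengths))  # running totals, like np.cumsum
    result = [[] for _ in lengths]
    for i in sorted(inds):
        k = bisect_right(ends, i)        # first list whose end is > i
        start = ends[k - 1] if k else 0  # global index of that list's first item
        result[k].append(i - start)
    return result

def skip_rows(lengths, keep):
    """Hypothetical CSV helper: 0-based file line numbers to skip, assuming
    one header line (line 0) followed by lengths[k] data rows per file."""
    skip = []
    for length, rows in zip(lengths, keep):
        wanted = set(rows)
        skip.append([r + 1 for r in range(length) if r not in wanted])
    return skip

total = [[2, 45, 1, 3], [4, 6, 3, 6, 7, 1, 37, 48, 19], [45, 122]]
lengths = [len(lst) for lst in total]
inds = random.sample(range(sum(lengths)), 7)
per_list = split_indexes(lengths, inds)
print(per_list)
print(skip_rows(lengths, per_list))
```

For very long files, building explicit skip lists can get large; pandas' read_csv also accepts a callable for skiprows, which avoids materializing them.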

How to save memory in python3?

I have a question about a MemoryError in Python 3.6:
import itertools
input_list = ['a','b','c','d']
group_to_find = list(itertools.product(input_list,input_list))
a = []
for i in range(len(group_to_find)):
    if group_to_find[i] not in a:
        a.append(group_to_find[i])
The error is raised at:
group_to_find = list(itertools.product(input_list,input_list))
MemoryError

You are creating a list, in full, from the Cartesian product of your input list, so in addition to input_list you now need len(input_list) ** 2 memory slots for all the results. You then filter that list down again into another list, a. All in all, for N items you need memory for 2N + (N * N) references. If N is 1000, that's 1 million and 2 thousand references; for N = 1 million, you need 1 million million plus 2 million references. Etc.
Your code doesn't need to create the group_to_find list at all, for two reasons:
1. You could just iterate and handle each pair individually:
a = []
for pair in itertools.product(input_list, repeat=2):
    if pair not in a:
        a.append(pair)
This is still going to be slow, because pair not in a has to scan the whole list to find matches. You do this once for each of the N * N pairs, and each check scans up to K pairs (where K is the number of distinct pairs, i.e. the square of the number of unique values in input_list, at most N * N), so that's N * N * K time spent checking for duplicates. You could use a = set() (with a.add(pair)) to make that faster. But see point 2.
2. Your end product in a is the exact same list of pairs that itertools.product() would produce anyway, unless your input values are not unique. You could just make those unique first:
a = itertools.product(set(input_list), repeat=2)
Again, don't put this in a list. Iterate over it in a loop and use the pairs it produces one by one.
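Both points can be combined in a lazy sketch. One assumption beyond the answer: dict.fromkeys is used instead of set() to drop duplicate inputs because it keeps first-seen order (dicts preserve insertion order in Python 3.7+), so the pairs come out in the same order as the question's loop, whereas set() gives an arbitrary order. The function names are illustrative:

```python
import itertools

def unique_pairs(input_list):
    """Point 2: dedupe the inputs first, then generate pairs lazily.

    No N*N list is built and no membership scan is needed.
    """
    unique = list(dict.fromkeys(input_list))  # first-seen order kept
    return itertools.product(unique, repeat=2)

def unique_pairs_seen(input_list):
    """Point 1 with a set: O(1) duplicate checks, but it still stores
    every distinct pair seen so far."""
    seen = set()
    for pair in itertools.product(input_list, repeat=2):
        if pair not in seen:
            seen.add(pair)
            yield pair

pairs = unique_pairs(['a', 'b', 'a', 'c'])
print(next(pairs))  # ('a', 'a')
```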

Python Radix Sort

I'm trying to implement Radix sort in python.
My current program is not working correctly in that a list like [41,51,2,3,123] will be sorted correctly to [2,3,41,51,123], but something like [52,41,51,42,23] will become [23,41,42,52,51] (52 and 51 are in the wrong place).
I think this happens because when I compare the digits in the tens place, I don't compare the units as well (the same goes for higher powers of 10).
How do I fix this issue so that my program runs in the fastest way possible? Thanks!
def radixsort(aList):
    BASEMOD = 10
    terminateLoop = False
    temp = 0
    power = 0
    newList = []
    while not terminateLoop:
        terminateLoop = True
        tempnums = [[] for x in range(BASEMOD)]
        for x in aList:
            temp = int(x / (BASEMOD ** power))
            tempnums[temp % BASEMOD].append(x)
            if terminateLoop:
                terminateLoop = False
        for y in tempnums:
            for x in range(len(y)):
                if int(y[x] / (BASEMOD ** (power+1))) == 0:
                    newList.append(y[x])
                    aList.remove(y[x])
        power += 1
    return newList

print(radixsort([1,4,1,5,5,6,12,52,1,5,51,2,21,415,12,51,2,51,2]))

Currently, your sort does nothing to reorder values based on anything but their highest digit. You get 41 and 42 right only by chance (since they are in the correct relative order in the initial list).
You should always build a new list on each pass of the sort:
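def radix_sort(nums, base=10):
    # LSD radix sort for non-negative integers
    result_list = []
    power = 0
    while nums:
        # distribute the remaining numbers by their digit at base**power
        bins = [[] for _ in range(base)]
        for x in nums:
            bins[x // base**power % base].append(x)
        # rebuild in bin order; numbers with no higher digits are finished
        nums = []
        for bucket in bins:
            for x in bucket:
                if x < base**(power+1):
                    result_list.append(x)
                else:
                    nums.append(x)
        power += 1
    return result_list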
Note that radix sort is not necessarily faster than a comparison-based sort. It only has lower complexity when the number of items to be sorted is larger than the range of the items' values: its complexity is O(len(nums) * log(max(nums))) rather than O(len(nums) * log(len(nums))).
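The log(max(nums)) factor is the number of passes: one per base-`base` digit of the largest value, and each pass touches every element once plus `base` bins. A small illustrative helper (not part of the answer) makes this concrete and shows that a larger base means fewer, but wider, passes:

```python
def radix_passes(max_value, base=10):
    """Digit passes an LSD radix sort needs for non-negative integers up to
    max_value, i.e. the number of base-`base` digits in max_value."""
    passes = 1
    while max_value >= base:
        max_value //= base
        passes += 1
    return passes

for base in (2, 10, 256):
    print(base, radix_passes(10**6, base))
# 2 20
# 10 7
# 256 3
```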

Radix sort sorts the elements by grouping them on the digits of the same place value, one place at a time, starting from the ones digit. Take [2,3,41,51,123]: first we group them by their ones digit.
[[],[41,51],[2],[3,123],[],[],[],[],[],[]]
Then we collect the elements from the bins in order. The new array will be
[41,51,2,3,123]
Then we sort based on the tens digit. In this case [2,3] = [02,03]:
[[2,3],[],[123],[],[41],[51],[],[],[],[]]
Now the new array will be
[2,3,123,41,51]
Lastly, we sort based on the hundreds digit. This time [2,3,41,51] = [002,003,041,051]:
[[2,3,41,51],[123],[],[],[],[],[],[],[],[]]
Finally we end up with [2,3,41,51,123].
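The walkthrough works because each pass is stable: elements that share the current digit keep the order produced by the lower digits, so the units never need to be compared again, which is the concern raised in the question. The sketch below reproduces the three snapshots using Python's stable sorted() with a digit key; it uses a comparison sort per pass purely for illustration (not the linear-time bucketing a real radix sort uses) and assumes non-negative integers:

```python
def lsd_passes(nums):
    """Return a snapshot of the list after each base-10 LSD pass.

    sorted() is stable, so elements that tie on the current digit keep
    the order produced by the earlier passes.
    """
    history = []
    place = 1                      # 1, 10, 100, ...: the digit being sorted on
    largest = max(nums, default=0)
    while True:
        nums = sorted(nums, key=lambda x: x // place % 10)
        history.append(nums)
        place *= 10
        if place > largest:        # no element has a digit at this place
            break
    return history

for snapshot in lsd_passes([2, 3, 41, 51, 123]):
    print(snapshot)
# [41, 51, 2, 3, 123]
# [2, 3, 123, 41, 51]
# [2, 3, 41, 51, 123]
```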
def radixsort(A):
    # LSD radix sort in base 10; assumes A is a list of non-negative integers
    if not isinstance(A, list):
        raise TypeError('radixsort expects a list')
    if not A:
        return A
    n = len(A)
    maxelement = max(A)
    digits = len(str(maxelement))  # how many digits in the maxelement
    for i in range(digits):
        # 10 independent bins ([l]*10 would make 10 references to one list)
        bins = [[] for _ in range(10)]
        for j in range(n):  # within this we traverse the array in its current order
            e = A[j] // 10**i % 10  # digit at place value 10**i, using integer maths
            bins[e].append(A[j])  # adds item to the end, keeping the pass stable
        k = 0  # index into A for writing the re-sorted elements back
        for b in bins:  # we traverse the bins in order and refill A
            for y in b:
                A[k] = y
                k += 1
    return A

Python - How to generate a binning index for a list?

I have 10 bins:
bins = [0,1,2,3,4,5,6,7,8,9]
I have a list of 25 values:
values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
I want to bin the values sequentially into the bins so each value is grouped into its bin:
binnedValues = [[10,0],[0,14,14],[123,235],[0,0,0],[0,0],[12,12,1235],[23,234],[15,15,23],[136,34,34],[37,45]]
As you can see, the number of values per bin is not always the same (since len(values) is not a multiple of len(bins)).
Also, I have lots of different values lists, all of different sizes, so I need to do this many times for the same number of bins but different lengths of values. The above is an example: the real number of bins is 10k, and the real len(values) ranges from ~10k to ~750k.
Is there a way to do this consistently? I need to maintain the order of the values, but split the values list evenly so that each bin gets a fair, even share of them.
I think I can use numpy.digitize, but having had a look, I can't see how to generate the 'binned' list.

Are you trying to split the list into lists of alternating size between 2 and 3 elements? That's doable, then.
from itertools import cycle
values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
splits = cycle([2,3])
bins = []
count = 0
while count < len(values):
    splitby = next(splits)  # Python 3: next(it) instead of it.next()
    bins.append(values[count:count+splitby])
    count += splitby
print(bins)
Edit:
Ah, I see what you're requesting... sort of. Something more like:
from itertools import cycle
from math import floor, ceil
values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
number_bins = 10
bins_lower = int(floor(len(values) / float(number_bins)))
bins_upper = int(ceil(len(values) / float(number_bins)))
splits = cycle([bins_lower, bins_upper])
bins = []
count = 0
while count < len(values):
    splitby = next(splits)
    bins.append(values[count:count+splitby])
    count += splitby
print(bins)
If you want more variety in bin sizes, you can add more numbers to splits.
Edit 2:
Ashwin's way, which is more concise without being harder to understand.
from itertools import cycle, islice
from math import floor, ceil
values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
number_bins = 10
bins_lower = int(floor(len(values) / float(number_bins)))
bins_upper = int(ceil(len(values) / float(number_bins)))
splits = cycle([bins_lower, bins_upper])
it = iter(values)
bins = [list(islice(it, next(splits))) for _ in range(number_bins)]
print(bins)
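The cycle-based versions above alternate between two sizes, which does not guarantee exactly number_bins even bins for every length (with 21 values and 10 bins the last bins come out short or empty). If the goal is exactly number_bins consecutive bins whose sizes differ by at most one, each value's bin index can be computed directly as i * number_bins // len(values), which also gives the "binning index" from the title. A sketch; bin_index and bin_values are illustrative names:

```python
def bin_index(length, number_bins):
    """Bin number for each position 0..length-1, assigned in order so that
    bin sizes differ by at most one."""
    return [i * number_bins // length for i in range(length)]

def bin_values(values, number_bins):
    """Split values, keeping their order, into exactly number_bins bins.
    Some bins are empty only when len(values) < number_bins."""
    bins = [[] for _ in range(number_bins)]
    for value, b in zip(values, bin_index(len(values), number_bins)):
        bins[b].append(value)
    return bins

values = [10,0,0,14,14,123,235,0,0,0,0,0,12,12,1235,23,234,15,15,23,136,34,34,37,45]
binned = bin_values(values, 10)
print([len(b) for b in binned])  # [3, 2, 3, 2, 3, 2, 3, 2, 3, 2]
```

With NumPy, numpy.array_split(values, number_bins) produces a similar even split, putting the larger chunks first. Either way it is a single pass over the data, which matters for 10k bins and up to ~750k values.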

Develop Reference
