Slice array using other array - python

I have two arrays of the same length that contain elements from 0 to 1. For example:
x = np.linspace(0,1,100)
y = np.random.permutation(x)
I grouped the elements of x in bins of width 0.1:
bins = np.arange(0,1,0.1)
x_bin = []
for i in range(1,10):
    x_bin.append(x[np.digitize(x,bins)==i])
Now I would like to slice y in groups which have the same lengths of the arrays in x_bin.
How can I do that?
A possible way is:
y0 = y[0:len(x_bin[0])]
and so on, but it is not very elegant.

This may be what you want to use as a more elegant solution than using loops:
l = [len(x) for x in x_bin] # get bin lengths
split_indices = np.cumsum(l) # sum up lengths for correct split indices
y_split = np.split(y, split_indices)
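I got the array lengths via a list comprehension and then split the NumPy array at the cumulative indices. Note that np.split returns one extra trailing piece holding whatever remains of y after the last index. This can be shortened to a single Python statement, but it is much easier to read this way.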
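For reference, here is a self-contained sketch of the approach above, reusing the question's setup. The names bin_index, lengths and y_groups are mine and only illustrative; exact bin sizes depend on floating-point edge effects, so none are stated here.

```python
import numpy as np

x = np.linspace(0, 1, 100)
y = np.random.permutation(x)

# Same binning as in the question: bin indices 1..9 are collected
bins = np.arange(0, 1, 0.1)
bin_index = np.digitize(x, bins)
x_bin = [x[bin_index == i] for i in range(1, 10)]

lengths = [len(b) for b in x_bin]      # size of each bin
split_indices = np.cumsum(lengths)     # running totals = split positions
y_split = np.split(y, split_indices)

# np.split returns len(split_indices) + 1 pieces; the last one holds the
# elements of y left over after the final split index.
y_groups = y_split[:len(x_bin)]

print([len(g) for g in y_groups])
print(lengths)
```

The two printed lists should match, since each group of y has the length of the corresponding entry of x_bin.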

A possible way is:
y0 = y[0:len(x_bin[0])]
and so on, but it is not very elegant.
Instead of using y0 = ..., y1 = ..., you can make a list of slices:
slices = []
start = 0
for b in x_bin:
    slices.append(y[start:start + len(b)])  # next len(b) elements of y
    start += len(b)
(the principle is to keep a running start offset and advance it by the length of each bin)
Instead of having y0, y1 and so on, you will have slices[0], slices[1] and so on.

Related

Calculation involving iterations over all combinations of values in two lists in Python [duplicate]

I need to make calculations using two lists each with 36 elements in them. The calculation must use one value in each list using all combinations. Example:
listx = [x1 , x2 , x3 , ... , x36]
listy = [y1 , y2 , y3 , ... , y36]
F(x,y) = ((y-x)*(a/b))+x
x and y in F(x,y) must assume all combinations inside listx and listy. Results should be a matrix of (36 x 36)
This is what I've tried so far:
listx = np.arange(-0.05,0.301,0.01)
listy = np.arange(-0.05,0.301,0.01)
for x in listx:
    for y in listy:
        F = ((y-x)*(a/b))+x
        print(F)
So I think the issue is that you are having trouble conceptualizing the grid that these solutions are supposed to be stored in. This calculation is good because it is an introduction to certain optimizations and additionally there are a few ways to do it. I'll show you the three I threw together.
First, you could do it with lists and loops, which is very inefficient (numpy is just to show the shape):
import numpy as np
x, y = [], []
length = 35
for i in range(length+1):
    x.append(i/length) # Normalizing over the range of the grid
    y.append(i/length) # to compare to later example
def func(x, y, a, b):
    return ((y-x)*(a/b))+x
a=b=1 # Set a value for a and b
row = []
for i in x:
    column = []
    for j in y:
        column.append(func(i,j,a,b))
    row.append(column)
print(row)
print(np.shape(row))
This will output a solution assuming a and b are known, and it is a 36x36 matrix. To make the matrix, we have to create a large memory space which I called row and smaller memory spaces that are recreated each iteration of the loop I called column. The inner-most loop appends the values to the column list, while the evaluated column lists are appended to the top level row list. It will then have a matrix-esque appearance even if it is just a list of lists.
A more efficient way to do this is to use numpy. First, we can keep the loops if you wish and do the calculation with numpy arrays:
import numpy as np
x = y = np.linspace(0,1,36)
result = np.zeros((len(x), len(y)))
F = lambda x,y,a,b: ((y-x)*(a/b))+x
a=b=1
for idx, i in enumerate(x):
    for jdx, j in enumerate(y):
        result[idx, jdx] = F(i,j,a,b) # plug in value at idx, jdx grid point
print(result)
print(result.shape)
So here we create the grid using linspace; I just chose values from 0 to 1 in 36 steps. After this, I create the array we will store the solutions in by making a numpy array with dimensions given by the lengths of the x and y arrays. Finally, the function is created with a lambda, which serves the same purpose as the def previously, just in one line. The loop is kept for now; it iterates over the values i, j and their indexes idx, jdx. The results are written into the allocated storage at each index with result[idx, jdx] = F(i,j,a,b).
We can do better, because numpy exists to help remove loops in calculations. Instead, we can utilize the meshgrid function to create a matrix and evaluate the function with it, as so:
import numpy as np
x = y = np.linspace(0,1,36)
X, Y = np.meshgrid(x,y)
F = lambda x,y,a,b: ((y-x)*(a/b))+x
a=b=1
result = F(X,Y,a,b) # Plug in grid directly
print(result.T)
print(result.shape)
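Here we pass the numpy arrays to meshgrid, which returns two 36x36 arrays holding the x and y coordinate of every grid point. Then we define the lambda function as before and pass the new X and Y to it. No preallocated result array or explicit loop is needed; the function is evaluated on the whole grid at once. Note that with meshgrid's default indexing the rows follow y and the columns follow x, which is why result.T is printed to match the loop version.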
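To check that the loop and meshgrid versions agree, here is a small sketch comparing them, plus a broadcasting variant that skips meshgrid entirely. The values a = 1 and b = 4 are my own assumption, picked only because a = b = 1 reduces F to y and would hide any orientation mix-up.

```python
import numpy as np

def F(x, y, a, b):
    return ((y - x) * (a / b)) + x

a, b = 1.0, 4.0                  # assumed values, not from the question
x = y = np.linspace(0, 1, 36)

# Loop version: looped[i, j] = F(x[i], y[j])
looped = np.zeros((len(x), len(y)))
for i, xi in enumerate(x):
    for j, yj in enumerate(y):
        looped[i, j] = F(xi, yj, a, b)

# meshgrid with default indexing="xy": X[i, j] = x[j], Y[i, j] = y[i],
# so the result is the transpose of the loop version.
X, Y = np.meshgrid(x, y)
meshed = F(X, Y, a, b)

# Broadcasting: a column of x against a row of y gives the loop layout directly.
broadcast = F(x[:, None], y[None, :], a, b)

print(np.allclose(looped, meshed.T))
print(np.allclose(looped, broadcast))
```

Passing indexing="ij" to np.meshgrid would be another way to get the loop layout without the transpose.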
It is good to practice using numpy for any calculation you want to do, because they can usually be done without loops.

Create an array in python from two different arrays with all possible combinations with having fixed selection from each array

I am trying to get the combinations of the two arrays with a fixed selection from each array.
Arrays:
X = ['A','B','C','D','E']
Y = ['1','2','3','4']
My condition for selection will be Sx = 3 and Sy = 2 (the output should have 3 elements from X and 2 elements from Y which are fixed)
The output should be similar to this with all possible combinations
XY = [('A','B','C','1','2'),('B','C','D','2','3'),....)]
How can I do that?
Use itertools:
import itertools
combos1 = itertools.combinations(X,r=Sx)
combos2 = itertools.combinations(Y,r=Sy)
prod1 = [a+b for a,b in itertools.product(combos1,combos2)]
This is computationally expensive: the number of results is C(len(X), Sx) * C(len(Y), Sy) (60 for the example above), so it may take a while for big alphabets.
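A runnable sketch of this approach on the question's data follows. The names combos_x, combos_y and XY are only illustrative, and math.comb is used just to confirm the expected number of results.

```python
import itertools
import math

X = ['A', 'B', 'C', 'D', 'E']
Y = ['1', '2', '3', '4']
Sx, Sy = 3, 2

combos_x = itertools.combinations(X, Sx)   # every 3-element pick from X
combos_y = itertools.combinations(Y, Sy)   # every 2-element pick from Y

# Pair every X pick with every Y pick and join the two tuples
XY = [cx + cy for cx, cy in itertools.product(combos_x, combos_y)]

print(len(XY))                                             # 60
print(len(XY) == math.comb(len(X), Sx) * math.comb(len(Y), Sy))
print(XY[0])                                               # ('A', 'B', 'C', '1', '2')
```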

Multiply two list of different sizes element wise without using libraries in python

#create a simple list in python
#list comprehension
x = [i for i in range(100)]
print (x)
#using loops
squares = []
for x in range(10):
    squares.append(x**2)
print (squares)
multiples = k*[z for z in x] for k in squares
So in the last line of code I am trying to multiply both lists. The problem is that the lists are not of the same size, and the k*[z for z in x] part is also incorrect.
For problems with iteration, I suggest checking out Loop Like A Native by Ned Batchelder and Looping like a Pro by David Baumgold.
Option 1
If you want to multiply them as far as the shortest list goes, zip is your friend:
multiples = [a * b for a, b in zip(x, squares)]
Option 2
If you want a matrix with the product, then you can do it like this
result = [
    [a * b for a in x]
    for b in squares
]
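Here is a small sketch of both options. The lists are shortened (5 and 3 elements instead of 100 and 10) so the output is readable, and pairwise and table are just illustrative names.

```python
x = list(range(5))                      # [0, 1, 2, 3, 4]
squares = [n ** 2 for n in range(3)]    # [0, 1, 4]

# Option 1: zip stops at the end of the shorter list
pairwise = [a * b for a, b in zip(x, squares)]
print(pairwise)   # [0, 1, 8]

# Option 2: one row per element of squares, one column per element of x
table = [[a * b for a in x] for b in squares]
print(table)      # [[0, 0, 0, 0, 0], [0, 1, 2, 3, 4], [0, 4, 8, 12, 16]]
```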
I don't quite understand what the desired output would be. As the expression stands now (once wrapped in brackets so that it parses), k*[z for z in x] repeats the list k times, so you would get a list of lists where the first is empty, the second has 100 elements, the third 400, the fourth 900, and so on.
One thing that's strange: The expression [z for z in x] defines a list that is identical to x. So, you might just write k*x
If you want to multiply the elements of both lists, you would have to write [[k*z for z in x] for k in squares]. This would lead to a list of 10 lists of 100 elements (or a 10x100-matrix) containing the products of your lists.
If you want to have one list of length 100 in the end that holds some kind of products, you will have to think about how to proceed with the shorter list.
EDIT: Or if you want to multiply them as far as possible until you reach the end of the shorter list, FRANCOIS CYRIL's solution is an elegant way to do so.
You can loop on each array to multiply element by element at the same position in a result array:
i = 0
arrayRes = []
while i < min(len(array1),len(array2)):
    arrayRes.append(array1[i]*array2[i])
    i+=1
Or do you prefer to multiply them, matrix way?
x = 0
arrayRes = []
while x < len(array1):
    arrayRes.append([])
    y = 0  # reset the column index for every row
    while y < len(array2):
        arrayRes[x].append(array1[x]*array2[y])
        y+=1
    x+=1

Splitting array of coordinates dependent on the Y value in Python

I have an array of coordinates, and I would like to split it into two arrays where there is a large gap in the Y value. The post "Split an array dependent on the array values in Python" does this based on the X value, and the method I use looks like this:
array = [[1,5],[3,5],[6,7],[8,7],[25,25],[26,50],.....]
n = len(array)
for i in range(n-1):
    if abs(array[i][0] - array[i+1][0]) >= 10:
        arr1 = array[:i+1]
        arr2 = array[i+1:]
I figured that when I want to split it dependent on the Y value I could just change:
if abs(array[i][0] - array[i+1][0]) to if abs(array[0][i] - array[0][i+1])
This does not work and I get IndexError: list index out of range.
I'm quite new to coding and I'm wondering why this does not work for finding gap in Y value when it works for finding the gap in the X value?
Also, how should I go about splitting the array depending on the Y value?
Any help is much appreciated!
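array[0][i] indexes into the first point only, which has just two entries, so it raises IndexError once i reaches 2. The Y value of point i is array[i][1], so you have to switch to this: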
array = [[1,5],[3,5],[6,7],[8,7],[25,25],[26,50]]
n = len(array)
for i in range(n-1):
    if abs(array[i][1] - array[i+1][1]) >= 10:
        arr1 = array[:i+1]
        arr2 = array[i+1:]
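Note that the loop above overwrites arr1 and arr2 each time it finds a gap, so with several large gaps only the last split survives. If you want every group, something like the following sketch may help. The function name split_on_y_gap and the min_gap parameter are my own, not from the question.

```python
def split_on_y_gap(points, min_gap=10):
    """Split points into groups wherever consecutive Y values differ by >= min_gap."""
    if not points:
        return []
    groups = [[points[0]]]
    for prev, curr in zip(points, points[1:]):
        if abs(curr[1] - prev[1]) >= min_gap:
            groups.append([])        # large jump in Y: start a new group
        groups[-1].append(curr)
    return groups

array = [[1, 5], [3, 5], [6, 7], [8, 7], [25, 25], [26, 50]]
groups = split_on_y_gap(array)
for g in groups:
    print(g)
```

For this sample data there are two jumps of at least 10 in Y (7 to 25 and 25 to 50), so three groups come back.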

Why isn't my implementation O(NlogN)?

I was implementing and testing answers to this SO question -
Given an array of integers find the number of all ordered pairs of elements in the array whose sum lies in a given range [a,b]
The answer with the most upvotes (currently) only provides a text description of an algorithm that should be O(NlogN):
Sort the array... .
For each element x in the array:
Consider the array slice after the element.
Do a binary search on this array slice for [a - x], call it y0. If no exact match is found, consider the closest match bigger than [a - x] as y0.
Output all elements (x, y) from y0 forwards as long as x + y <= b. ... If you only need to count the number of pairs, you can do it in O(nlogn). Modify the above algorithm so [b - x] (or the next smaller element) is also searched for.
My implementation:
import bisect
def ani(arr, a, b):
    # Sort the array (say in increasing order).
    arr.sort()
    count = 0
    for ndx, x in enumerate(arr):
        # Consider the array slice after the element
        after = arr[ndx+1:]
        # Do a binary search on this array slice for [a - x], call it y0
        lower = a - x
        y0 = bisect.bisect_left(after, lower)
        # If you only need to count the number of pairs
        # Modify the ... algorithm so [b - x] ... is also searched for
        upper = b - x
        y1 = bisect.bisect_right(after, upper)
        count += y1 - y0
    return count
When I plot Time versus N or some function of N I am seeing an exponential or N^2 response.
import random
import time
from timeit import Timer
# generate timings
T = list() # run-times
N = range(100, 10001, 100) # N
arr = [random.randint(-10, 10) for _ in range(1000000)]
print('start')
start = time.time()
for n in N:
    arr1 = arr[:n]
    t = Timer('ani(arr1, 5, 16)', 'from __main__ import arr1, ani')
    timing_loops = 100
    T.append(t.timeit(timing_loops) / timing_loops)
Is my implementation incorrect or is the author's claim incorrect?
Here are some plots of the data.
T vs N
T / NlogN vs N - one commenter thought this should NOT produce a linear plot - but it does.
T vs NlogN - I thought this should be linear if the complexity is NlogN but it is not.
If nothing else, this is your error:
for ndx, x in enumerate(arr):
    # Consider the array slice after the element
    after = arr[ndx+1:]
arr[ndx+1:] creates a copy of the list of length len(arr) - ndx - 1. That copy costs O(n) on every iteration, so your loop is O(n^2) overall.
Instead, use the lo and hi arguments to bisect.bisect.
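A minimal sketch of that change, assuming the rest of the algorithm stays as in the question: the search is restricted to the tail of the list by passing lo and hi positionally instead of slicing. The function name count_pairs_in_range is mine, and it sorts a copy rather than sorting in place.

```python
import bisect

def count_pairs_in_range(values, a, b):
    """Count pairs of elements whose sum lies in [a, b]."""
    arr = sorted(values)             # sorted copy, O(n log n)
    n = len(arr)
    count = 0
    for ndx, x in enumerate(arr):
        # Search only positions ndx+1 .. n-1, without copying the list
        first = bisect.bisect_left(arr, a - x, ndx + 1, n)
        last = bisect.bisect_right(arr, b - x, ndx + 1, n)
        count += last - first
    return count

print(count_pairs_in_range([1, 2, 3, 4, 5], 5, 7))  # 6
```

Each iteration now does two binary searches and no copying, so the loop is O(n log n), the same order as the sort.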
