I have a set of data points given as (x, y) coordinates. I'm trying to write a program that counts how many of those points fall between given x and y bounds...
So for example, let's say I have
(3,4)
(6,3)
(7,6)
(5,5)
(6,7)
and I can only count the data where 5<x<7 and 4<y<6
Then the coords this program would count are:
(5,5)
(7,6)
So I would get 2.
I can figure out how to do this if I only set one constraint... For example, if I just had a list of 1,2,3,4,5,6,7,8 and needed to count the numbers where 3<x<7, I could do that. However, I'm having trouble figuring out how to handle this when there are two constraints.
Thank you so much!
You can use an and to state that both conditions should be satisfied. For example:
sum(5<=x<=7 and 4<=y<=6 for x,y in coord_list)
with coord_list the list of coordinates. Note that in order to satisfy the fact that the count should be 2, you should use less than or equal (<=) operators instead of less than operators (<).
This produces:
>>> coord_list = [(3,4),(6,3),(7,6),(5,5),(6,7)]
>>> sum(5<=x<=7 and 4<=y<=6 for x,y in coord_list)
2
You can obtain the list of matching coordinates, for instance, using a list comprehension:
[(x,y) for x,y in coord_list if 5<=x<=7 and 4<=y<=6]
If the number of elements in a point is arbitrary, it is better to use indexing:
[t for t in coord_list if 5<=t[0]<=7 and 4<=t[1]<=6]
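If a point can have an arbitrary number of dimensions, one way to generalize this (a small sketch; the bounds list is a made-up helper structure) is to pair each element of a point with its own (low, high) bound:

coord_list = [(3, 4), (6, 3), (7, 6), (5, 5), (6, 7)]
bounds = [(5, 7), (4, 6)]  # hypothetical (low, high) bounds per dimension

count = sum(all(lo <= c <= hi for c, (lo, hi) in zip(t, bounds)) for t in coord_list)
print(count)  # 2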
I have two arrays:
X = np.linspace(0,100,101) and Y = np.linspace(0,100,15). How can I round the values of one array to the values of the other? That is, I want to end up with an array of size 101 in which each value of X has been replaced by the nearest value of Y.
Like this?
Y[np.absolute(X[:,np.newaxis]-Y).argmin(axis=1)]
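For completeness, a runnable version with the arrays from the question (the variable name snapped is made up):

import numpy as np

X = np.linspace(0, 100, 101)
Y = np.linspace(0, 100, 15)

# for every x in X, pick the element of Y with the smallest absolute difference
snapped = Y[np.absolute(X[:, np.newaxis] - Y).argmin(axis=1)]
print(snapped.shape)  # (101,)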
The simplest way is to use the X array to produce the Y array, like this:
Y = X[::len(X)//15]
But it will give you slightly biased numbers.
For this particular case you can also use the simple round function:
Y = np.array(list(map(round, Y)))
In general, you can solve this by searching for the minimal difference between the elements of the arrays:
Y = X[[abs(X-i).argmin() for i in Y]]
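Note the direction here: this snaps each element of Y to its nearest value in X, giving an array of size 15. For the size-101 result the question describes (each value of X snapped to its nearest Y), swap the roles of the arrays:

X_rounded = Y[[abs(Y - i).argmin() for i in X]]  # length 101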
I'll preface this with saying that I'm new to Python, but not new to OOP.
I'm using numpy.where to find the indices in n arrays at which a particular condition is met, specifically if the value in the array is greater than x.
What I want to do is find the indices at which all n arrays meet that condition - so in each array, at index y, the element is greater than x.
n0[y] > x
n1[y] > x
n2[y] > x
n3[y] > x
For example, if my arrays after using numpy.where were:
a = [0,1,2,3,4,5,6,7,8,9,10]
b = [0,2,4,6,8,10,12,14,16,18,20]
c = [0,2,3,5,7,11,13,17,19,23]
d = [0,1,2,3,5,8,13,21,34,55]
I want to get the output
[0,2]
I found the function numpy.isin, which seems to do what I want for just two arrays. I don't know how to go about expanding this to more than two arrays and am not sure if it's possible.
Here's the start of my code, in which I generate the indices meeting my criteria:
n = np.empty([0])
n = np.append(n, np.where(sensor[i] > x)[0])  # inside a loop over the sensor arrays
I'm a little stuck. I know I could create a new array with the same number of indices as my original arrays and set the values in it to true or false, but that would not be very efficient, and my original arrays are 25k+ elements long.
To find the intersection of n different arrays, first convert them all to sets. Then it is possible to apply set.intersection(). For the example with a, b, c and d, simply do:
set.intersection(*map(set, [a,b,c,d]))
This will result in a set {0, 2}.
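A complete example with the arrays from the question:

a = [0,1,2,3,4,5,6,7,8,9,10]
b = [0,2,4,6,8,10,12,14,16,18,20]
c = [0,2,3,5,7,11,13,17,19,23]
d = [0,1,2,3,5,8,13,21,34,55]

common = set.intersection(*map(set, [a, b, c, d]))
print(sorted(common))  # [0, 2]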
I have a very large 1D Python array x of somewhat repeating numbers, and along with it some data d of the same size.
x = np.array([48531, 62312, 23345, 62312, 1567, ..., 23345, 23345])
d = np.array([0 , 1 , 2 , 3 , 4 , ..., 99998, 99999])
In my context, "very large" refers to 10k...100k entries. Some of them are repeating, so the number of unique entries is about 5k...15k.
I would like to group them into bins. This should be done by creating two objects. One is a matrix buffer b of data items taken from d. The other is a vector v of unique x values to which each of the buffer columns refers. Here's an example:
v = [48531, 62312, 23345, 1567, ...]
b = [[0,    1,    2,     4,    ...],
     [X,    3,    ...,   ...,  ...],
     [...,  ...,  ...,   ...,  ...],
     [X,    X,    99998, X,    ...],
     [X,    X,    99999, X,    ...]]
Since the numbers of occurrences of each unique number in x vary some of the values in the buffer b are invalid (indicated by the capital X, i.e. "don't care").
It's very easy to derive v in numpy:
v, n = np.unique(x, return_counts=True) # yay, just 5ms
and we even get n, which is the number of valid entries within each column of b. Moreover, (np.max(n), v.shape[0]) gives the shape of the matrix b that needs to be allocated.
But how to efficiently generate b?
A for loop could help:
b = np.zeros((np.max(n), v.shape[0]))
for i in range(v.shape[0]):
    idx = np.flatnonzero(x == v[i])
    b[0:n[i], i] = d[idx]
This loop iterates over all columns of b and extracts the indices idx by identifying all the locations where x == v[i].
However I don't like the solution because of the rather slow for loop (taking about 50x longer than the unique command). I'd rather have the operation vectorized.
So one vectorized approach would be to create a matrix of indices where x == v and then run the nonzero() command on it along the columns. However, this matrix would require memory in the range of 150k x 15k, so about 8GB on a 32-bit system.
To me it seems rather odd that the np.unique operation can even efficiently return the inverse indices, so that x = v[inv_indices], but that there is no way to get the v-to-x assignment lists for each bin in v. These should come almost for free when the function is scanning through x. Implementation-wise, the only challenge would be the unknown size of the resulting index matrix.
Another way of phrasing this problem, assuming that the np.unique command is the method to use for binning:
Given the three arrays x, v, and inv_indices, where v contains the unique elements of x and x = v[inv_indices], is there an efficient way of generating the index vectors v_to_x[i] such that all(v[i] == x[v_to_x[i]]) for all bins i?
I shouldn't have to spend more time than on the np.unique command itself. And I'm happy to provide an upper bound for the number of items in each bin (say, e.g., 50).
Based on the suggestion from @user202729, I wrote this code:
from itertools import groupby

# T = number of unique values in x, K = maximum number of occurrences (bin size)
x_sorted_args = np.argsort(x)
x_sorted = x[x_sorted_args]
i = 0
v = -np.ones(T)
b = np.zeros((K, T))
for k, g in groupby(enumerate(x_sorted), lambda tup: tup[1]):
    groups = np.array(list(g))[:, 0]
    size = groups.shape[0]
    v[i] = k
    b[0:size, i] = d[x_sorted_args[groups]]
    i += 1
It runs in about 100 ms, which is a considerable speedup w.r.t. the original code posted above.
It first enumerates the values in x, adding the corresponding index information. Then the enumeration is grouped by the actual x value, which in fact is the second value of the tuple generated by enumerate().
The for loop iterates over all the groups, turning those iterators of tuples g into the groups matrix of size (size x 2), and then throws away the second column, i.e. the x values, keeping only the indices. This leads to groups being just a 1D array.
groupby() only works on sorted arrays.
Good work. I'm just wondering if we can do even better? Still a lot of unreasonable data copying seems to happen. Creating a list of tuples and then turning this into a 2D matrix just to throw away half of it still feels a bit suboptimal.
I received the answer I was looking for by rephrasing the question, see here: python: vectorized cumulative counting
by "cumulative counting" the inv_indices returned by np.unique() we receive the array indices of the sparse matrix so that
c = cumcount(inv_indices)
b[inv_indices, c] = d
Cumulative counting as proposed in the thread linked above is very efficient. Run times below 20 ms are very realistic.
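The linked thread has the full details; for reference, here is one possible vectorized cumcount (a sketch, not necessarily the exact implementation from that thread):

import numpy as np

def cumcount(labels):
    # for each position, count how many times that label has occurred before it
    order = np.argsort(labels, kind='stable')
    sorted_labels = labels[order]
    # start positions of each run of equal labels
    is_start = np.concatenate(([True], sorted_labels[1:] != sorted_labels[:-1]))
    run_starts = np.flatnonzero(is_start)
    run_lengths = np.diff(np.append(run_starts, len(labels)))
    counts_sorted = np.arange(len(labels)) - np.repeat(run_starts, run_lengths)
    out = np.empty(len(labels), dtype=np.intp)
    out[order] = counts_sorted
    return out

v, inv_indices, n = np.unique(x, return_inverse=True, return_counts=True)
c = cumcount(inv_indices)
b = np.zeros((len(v), n.max()))  # note: one row per unique value here
b[inv_indices, c] = d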
I am working on some molecular dynamics using Python, and the arrays tend to get pretty large. It would be helpful to have a quick check to see if certain vectors appear in the arrays.
After searching for a way to do this, I was surprised to see that this question doesn't seem to come up.
In particular,
if I have something like
import numpy as np
y = [[1,2,3], [1,3,2]]
x = np.array([[1,2,3],[3,2,1],[2,3,1],[10,5,6]])
and I want to see if the specific vectors from y are present in x (not just the elements), how would I do so?
Using something like
for i in y:
    if i in x:
        print(i)
will simply print every vector in y that matches x in at least one element position, rather than only exact row matches.
Thoughts?
If you want to check if ALL vectors in y are present in the array, you could try:
import numpy as np
y = [[1,2,3], [1,3,2]]
x = np.array([[1,2,3],[3,2,1],[2,3,1],[10,5,6]])
all(i in x for i in y)
# True
You don't explicitly give your expected output, but I infer that you want to see only [1, 2, 3] as the output from this program.
You get that output if you make x merely another list, rather than a NumPy array.
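For example (a quick check of that claim):

y = [[1, 2, 3], [1, 3, 2]]
x = [[1, 2, 3], [3, 2, 1], [2, 3, 1], [10, 5, 6]]  # a plain list, not np.array

for i in y:
    if i in x:    # exact list equality now
        print(i)  # prints only [1, 2, 3]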
The best strategy will depend on sizes and numbers. A quick solution is
[np.where(np.all(x==row, axis=-1))[0] for row in y]
# [array([0]), array([], dtype=int64)]
The result list gives for each row in y a possibly empty array of positions in x where the row occurs.
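If you only need a yes/no answer per row of y, you can test whether each of those index arrays is non-empty (a small sketch building on the line above; the variable names are made up):

matches = [np.where(np.all(x == row, axis=-1))[0] for row in y]
present = [m.size > 0 for m in matches]
# [True, False]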
I'm looking to create a program which randomly generates coins on an 8x8 grid. I've got two lists being created (one list for the X co-ordinates and one for the Y co-ordinates). Across these lists, the same pair of co-ordinates cannot appear twice. It's difficult to explain, so here's what I mean by example:
[1, 7, 4, **6**, 9, 2, 3, **6**, 8, 0] (list for the x co-ordinate)
[9, 3, 3, **1**, 2, 8, 0, **1**, 6, 1] (list for the y co-ordinate)
So, two lists are created. However, (6,1) appears twice. I don't want this. So, how would I allow for this in my code, to ensure that the duplicate is rejected and the numbers are regenerated into different co-ordinates? The code I have is below; I don't really know how to implement such a system!
def treasurePro():
    global coinListX, coinListY
    coinListX = []
    coinListY = []
    for x in range(10):
        num = randint(0,8)
        coinListX.append(num)
    print(coinListX)
    for x in range(10):
        num = randint(0,8)
        if num == 0 and coinListX[x] == 0:
            treasurePro() # goes back to the beginning to restart
        else:
            coinListY.append(num)
    print(coinListY)
Don't create two lists with coordinates, at least not initially. That only makes it harder to detect duplicates.
You could either create tuples with coordinates so you can detect duplicates, or even produce a range of integers that represent your coordinates in sequence, then sample from those. The latter is extremely efficient.
To create tuples, essentially you want to produce 8 unique pairs:
def treasurePro():
    coords = []
    while len(coords) < 8:
        coord = randint(0, 8), randint(0, 8)
        if coord not in coords:
            coords.append(coord)
    # now you have 8 unique pairs. split them out
    coinListX, coinListY = zip(*coords)
This isn't all that efficient, as the coord not in coords test has to scan the whole list, which grows with each new coordinate. For a large number of coordinates to pick, this can slow down significantly. To remedy that, you'd have to add an extra seen = set() object that you also add coordinates to and test against in the loop, as sketched below.
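That remedy could look like this (a sketch keeping the structure of the function above):

from random import randint

def treasurePro():
    coords = []
    seen = set()
    while len(coords) < 8:
        coord = randint(0, 8), randint(0, 8)
        if coord not in seen:  # O(1) membership test
            seen.add(coord)
            coords.append(coord)
    coinListX, coinListY = zip(*coords)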
There is a better way, however. Your board is 9x9 in size, so you have 81 unique coordinates. If you use random.sample() on a range() object (xrange() in Python 2), you can trivially create 8 unique values, then 'extract' a row and column number from each:
import random

def treasurePro():
    coords = random.sample(range(9 * 9), 8)  # use xrange in Python 2
    coinListX = [c // 9 for c in coords]
    coinListY = [c % 9 for c in coords]
Here random.sample() guarantees that you get 8 unique coordinates.
This is also far more efficient than generating all possible tuples up-front; using range() in Python 3 makes the above use O(K) memory, where K is the number of values you need to generate, while creating all coordinates up front would take O(N^2) memory (where N is the size of a board side).
You may want to store a list of (x, y) coordinates still rather than use two separate lists. Create one with coords = [(c // 9, c % 9) for c in coords].
Your board is small enough that you can simply generate all possibilities, take a sample, and then transpose into the desired separate lists for X and Y.
import random

possibilities = [(a, b) for a in range(10) for b in range(10)]
places = random.sample(possibilities, 10)
x, y = zip(*places)
You want to generate random coordinates, but you also want to reject any pair of coordinates that already appears in the list. (Incidentally, instead of two separate lists of integers, I would suggest using one list of ordered pairs, i.e., tuples of two integers.)

One way to reject duplicates would be to search the existing list for the new pair. This is O(n) and slower than it needs to be, though it would certainly work in your use case, where n can't exceed 64.

Another way would be to maintain a second data structure where you can look up each of the 64 cells in O(1) time, such as an 8x8 array of booleans. Indeed, you could use this one structure by itself; to get a list of the coordinates used, just traverse it.
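A minimal sketch of that boolean-grid idea (the function name and defaults are made up for illustration):

import random

def place_coins(n_coins=10, size=8):
    # `used` is a size x size grid of booleans, giving O(1) duplicate checks
    used = [[False] * size for _ in range(size)]
    coords = []
    while len(coords) < n_coins:
        x, y = random.randrange(size), random.randrange(size)
        if not used[x][y]:
            used[x][y] = True
            coords.append((x, y))
    return coords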
import random

cordX = list(range(10))
cordY = cordX[:]
random.shuffle(cordX)
random.shuffle(cordY)
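Note that because cordX and cordY are each a shuffled permutation of 0-9, pairing them index by index yields 10 distinct (x, y) pairs, so duplicates cannot occur. It is, however, a stronger restriction than the question asks for: no two coins will ever share a row or a column.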