Python Array Item Remove

Python Array Item Remove - python

I have some Python experience, but (obviously) not an overwhelming wealth of knowledge. In short, I need to create validation arrays for Neural Network testing using 1/5th of a main array. So, I am able to append i % 5 == 0 values to a test array. From there, I can still print every 5th value. Now, when I go to remove values, I get the out of index or range error; I know it is because the remove is changing the overall value of len(string_arr). However, I have been unsuccessful in figuring out a way to compensate.
Below is a 'dummy' program to solve what I need, but not the actual thing I am working on. I need 1/5th on a validation array, remove that 1/5th from the main array, and have 4/5th left on the main to train on. Below, I tried to appended to another array and remove those values in order to not mess up the len(string_arr)... did not work.
Thank you
english_list = open('file')
for word in english_list.readlines():
word = word[:-1].lower()
if len(word) == 6:
string_arr.append(word)
target_arr.append(0)
print(string_arr)
print(len(string_arr))
for i in range(len(string_arr)):
if i % 5 == 0:
test_arr.append(string_arr[i])
for i in range(len(string_arr)):
if i % 5 == 0:
one_more.append(string_arr[i])
for i in range(len(string_arr)):
if one_more[i] == string_arr:
string_arr.remove(one_more[i])
print(test_arr)
print(len(test_arr))
print(string_arr)
print(len(string_arr))

You can do it in a slightly more compact manner with fancy list indexing:
small_cut = string_arr[::5]
remaining = [j for sub in zip(string_arr[1::5], string_arr[2::5], string_arr[3::5], string_arr[4::5]) for j in sub]
This is assuming you're working with normal python lists; if you're working with something like a numPy array, there might be even easier approaches (potentially more efficient, too) to do what you want.

Related

Converting MatLab for loop with array code to python

I was given code in Matlab made by someone else and asked to convert to python. However, I do not know MatLab.This is the code:
for i = 1:nWind
[input(a:b,:), t(a:b,1)] = EulerMethod(A(:,:,:,i),S(:,:,i),B(:,i),n,scale(:,i),tf,options);
fprintf("%d\n",i);
for j = 1:b
vwa = generate_wind([input(j,9);input(j,10)],A(:,:,:,i),S(:,:,i),B(:,i),n,scale(:,i));
wxa(j) = vwa(1);
wya(j) = vwa(2);
end
% Pick random indexes for filtered inputs
rand_index = randi(tf/0.01-1,1,filter_size);
inputf(c:d,:) = input(a+rand_index,:);
wxf(c:d,1) = wxa(1,a+rand_index);
wyf(c:d,1) = wya(1,a+rand_index);
wzf(c:d,1) = 0;
I am confused on what [input(a:b,:), t(a:b,1)] mean and if wxf, wzf, wyf are part of the MatLab library or if it's made. Also, EulerMethod and generate_wind are seprate classes. Can someone help me convert this code to python?
The only thing I really changed so far is changing the for loop from:
for i = 1:nWind
to
for i in range(1,nWind):

There's several things to unpack here.
First, MATLAB indexing is 1-based, while Python indexing is 0-based. So, your for i = 1:nWind from MATLAB should translate to for i in range(0,nWind) in Python (with the zero optional). For nWind = 5, MATLAB would produce 1,2,3,4,5 while Python range would produce 0,1,2,3,4.
Second, wxf, wyf, and wzf are local variables. MATLAB is unique in that you can assign into specific indices at the same time variables are declared. These lines are assigning the first rows of wxa and wya (since their first index is 1) into the first columns of wxf and wyf (since their second index is 1). MATLAB will also expand an array if you assign past its end.
Without seeing the rest of the code, I don't really know what c and d are doing. If c is initialized to 1 before the loop and there's something like c = d+1; later, then it would be that your variables wxf, wyf, and wzf are being initialized on the first iteration of the loop and expanded on later iterations. This is a common (if frowned upon) pattern in MATLAB. If this is the case, you'd replicate it in Python by initializing to an empty array before the loop and using the array's extend() method inside the loop (though I bet it's frowned upon in Python, as well). But really, we need you to edit your question to include a, b, c, and d if you want to be sure this is really the case.
Third, EulerMethod and generate_wind are functions, not classes. EulerMethod returns two outputs, which you'd probably replicate in Python by returning a tuple.
[input(a:b,:), t(a:b,1)] = EulerMethod(...); is assigning the two outputs of EulerMethod into specific ranges of input and t. Similar concepts as in points 1 and 2 apply here.
Those are the answers to what you expressed confusion about. Without sitting down and doing it myself, I don't have enough experience in Python to give more Python-specific recommendations.

Remove array from a list if another array with the same entry at a certain position is already in said list

My question might sound a little confusing but what I am trying to do is find an efficient method to iterate over a list that contains arrays and remove arrays from the list if they have the same entries at certain positions. Let's say I have a list with 3x2 arrays and want to make the list unique regarding the last two elements in the bottom row for example. What I came up with so far is the following:
import numpy as np
my_array_list = [np.array([[1,2,3],[4,5,6]]), np.array([[9,8,7],[6,5,4]]),
np.array([[2,3,4],[5,6,7]]), np.array([[1,7,8],[0,5,6]])]
while i < len(my_array_list):
j = i + 1
while j < len(my_array_list):
if my_array_list[i][1,1] == my_array_list[j][1,1] and my_array_list[i][1,2] == my_array_list[j][1,2]:
del my_array_list[j]
else:
j += 1
i += 1
print(my_array_list)
>>> my_array_list = [np.array([[1,2,3],[4,5,6]]), np.array([[9,8,7],[6,5,4]]),
np.array([[2,3,4],[5,6,7]])] ## Since 5,6 is already in the last two columns of the first array the last array got deleted
This loop does what I want but the problem is that it is very slow. The data I am having will be generated from a Monte Carlo Simulation thus there will most likely be millions of arrays in my list. I was wondering if there was a faster way to do this like to somehow remember which combinations have already been encountered so I only have to loop over the list once and not len(my_array_list)-times for every single combination.
Thanks in advance, any help would be appreciated.
Cheers!

Iterating over elements, finding minima per each element

First time posting, so I apologize for any confusion.
I have two numpy arrays which are time stamps for a signal.
chan1,chan2 looks like:
911.05, 7.7
1055.6, 455.0
1513.4, 1368.15
4604.6, 3004.4
4970.35, 3344.25
13998.25, 4029.9
15008.7, 6310.15
15757.35, 7309.75
16244.2, 8696.1
16554.65, 9940.0
..., ...
and so on, (up to 65000 elements per chan. pre file)
Edit : The lists are already sorted but the issue is that they are not always equal in spacing. There are gaps that could show up, which would misalign them, so chan1[3] could be closer to chan2[23] instead of, if the spacing was qual chan2[2 or 3 or 4] : End edit
For each elements in chan1, I am interested in finding the closest neighbor in chan2, which is done with:
$ np.min(np.abs(chan2-chan1[i]))
and to keep track of positive or neg. difference:
$ index=np.where( np.abs( chan2-chan1[i]) == res[i])[0][0]
$ if chan2[index]-chan1[i] <0.0 : res[i]=res[i]*(-1.0)
Lastly, I create a histogram of all the differences, in a range I am interested in.
My concern is that I do this in the for loop. I usually try to avoid for loops when I can by utilizing the numpy arrays, as each operation can be performed on the entire array. However, in this case I am unable to find a solution or a build in function (which I understand run significantly faster than anything I can make).
The routine takes about 0.03 seconds per file. There are a few more things happening outside of the function but not a significant number, mostly plotting after everything is done, and a loop to read in files.
I was wondering if anyone has seen a similar problem, or is familiar enough with the python libraries to suggest a solution (maybe a build in function?) to obtain the data I am interested in? I have to go over hundred of thousands of files, and currently my data analysis is about 10 slower than data acquisition. We are also in the middle of upgrading our instruments to where we will be able to obtain data 10-100 times faster, and so the analysis speed is going to become an serious issue.
I would prefer not to use a cluster to brute force the problem, and not too familiar with parallel processing, although I would not mind dabbling in it. It would take me a while to write it in C, and I am not sure if I would be able to make it faster.
Thank you in advance for your help.
def gen_hist(chan1,chan2):
res=np.arange(1,len(chan1)+1,1)*0.0
for i in range(len(chan1)):
res[i]=np.min(np.abs(chan2-chan1[i]))
index=np.where( np.abs( chan2-chan1[i]) == res[i])[0][0]
if chan2[index]-chan1[i] <0.0 : res[i]=res[i]*(-1.0)
return np.histogram(res,bins=np.arange(time_range[0]-interval,\
time_range[-1]+interval,\
interval))[0]
After all the files are cycled through I obtain a plot of the data:
Example of the histogram

Your question is a little vague, but I'm assuming that, given two sorted arrays, you're trying to return an array containing the differences between each element of the first array and the closest value in the second array.
Your algorithm will have a worst case of O(n^2) (np.where() and np.min() are O(n)). I would tackle this by using two iterators instead of one. You store the previous (r_p) and current (r_c) value of the right array and the current (l_c) value of the left array. For each value of the left array, increment the right array until r_c > l_c. Then append min(abs(r_p - l_c), abs(r_c - l_c)) to your result.
In code:
l = [ ... ]
r = [ ... ]
i = 0
j = 0
result = []
r_p = r_c = r[0]
while i < len(l):
l_c = l[i]
while r_c < l and j < len(r):
j += 1
r_c = r[j]
r_p = r[j-1]
result.append(min(abs(r_c - l_c), abs(r_p - l_c)))
i += 1
This runs in O(n). If you need additional speed out of it, try writing it in C or running it in Cython.

Array: Ascending and Multiplication Table

How to arrange 5 integers using array in ascending order, no use of sort()
then 5x5 multiplication table using array too, like using list, array append.

I'm going to take a stab at what I believe you're asking, mostly because I hope it's educational. You're lucky I'm procrastinating studying at the moment.
Sorting, because who likes entropy anyway?
Bubbles!
Your first task is to look at the bubble sort, a sorting algorithm that's as simple to code as it is to understand. (It performs poorly with large arrays due to its O(n2) performance but is probably among the first sorts a lot of people encounter.) I highly, highly suggest you understand the algorithm before even thinking about looking at code.
How does it work?
Start at the beginning! Look at the first pair of numbers. If they're in the wrong order, swap them. Increment your starting position by 1 and repeat until the end of the array.
What would this look like in Python?
I'm glad you asked. You need to loop through the whole array and swap whenever appropriate. Thankfully Python makes swapping very easy, allowing you to pull tricks like a, b = b, a. We can (hopefully quickly) write down some code to do what we want:
def bubble_sort(array):
for i in xrange(len(array)):
for j in xrange(len(array) - i - 1):
if array[j] > array[j + 1]:
array[j], array[j + 1] = array[j + 1], array[j]
return array
This should be straightforward and follows the sorting procedure directly. Pass in an array (or list of numbers) and it will return an array sorted in ascending order.
Multiplication Table
I'm assuming you mean something like this table that you learn in first grade. The requirement I'm imposing on your vague wording is that we want to return a 2D array where the first row is multiples of 0, the second is multiples of 1, etc. This goes for the columns as well, since multiplication tables are symmetric between rows and columns. There are a number of possible approaches, but I'm only going to consider the one I personally find the most elegant and Pythonic. Python comes packed with great list comprehension, so why not make use of it? Try this:
table = [[x*y for x in xrange(6)] for y in xrange(6)]
This creates a 6x6 matrix, i.e. the multiplication table from 0–5. Take some time to really understand this code. I think that list comprehension is absolutely fundamental to Python and is something that sets it apart. If you look at the (i, j)th element of the array, you'll see that it equals ij. For example, table[3][2] == 6 is true.
I desperately hope you learned something useful from this. Next time you post a question, hopefully you'll give us more to work on.

Python - comparing elements of list with 'neighbour' elements

This may be more of an 'approach' or conceptual question.
Basically, I have a python a multi-dimensional list like so:
my_list = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
What I have to do is iterate through the array and compare each element with those directly surrounding it as though the list was layed out as a matrix.
For instance, given the first element of the first row, my_list[0][0], I need to know know the value of my_list[0][1], my_list[1][0] and my_list[1][1]. The value of the 'surrounding' elements will determine how the current element should be operated on. Of course for an element in the heart of the array, 8 comparisons will be necessary.
Now I know I could simply iterate through the array and compare with the indexed values, as above. I was curious as to whether there was a more efficient way which limited the amount of iteration required? Should I iterate through the array as is, or iterate and compare only values to either side and then transpose the array and run it again. This, however would ignore those values to the diagonal. And should I store results of the element lookups, so I don't keep determining the value of the same element multiple times?
I suspect this may have a fundamental approach in Computer Science, and I am eager to get feedback on the best approach using Python as opposed to looking for a specific answer to my problem.

You may get faster, and possibly even simpler, code by using numpy, or other alternatives (see below for details). But from a theoretical point of view, in terms of algorithmic complexity, the best you can get is O(N*M), and you can do that with your design (if I understand it correctly). For example:
def neighbors(matrix, row, col):
for i in row-1, row, row+1:
if i < 0 or i == len(matrix): continue
for j in col-1, col, col+1:
if j < 0 or j == len(matrix[i]): continue
if i == row and j == col: continue
yield matrix[i][j]
matrix = [[0,1,1,1,0,1], [1,1,1,0,0,1], [1,1,0,0,0,1], [1,1,1,1,1,1]]
for i, row in enumerate(matrix):
for j, cell in enumerate(cell):
for neighbor in neighbors(matrix, i, j):
do_stuff(cell, neighbor)
This has takes N * M * 8 steps (actually, a bit less than that, because many cells will have fewer than 8 neighbors). And algorithmically, there's no way you can do better than O(N * M). So, you're done.
(In some cases, you can make things simpler—with no significant change either way in performance—by thinking in terms of iterator transformations. For example, you can easily create a grouper over adjacent triplets from a list a by properly zipping a, a[1:], and a[2:], and you can extend this to adjacent 2-dimensional nonets. But I think in this case, it would just make your code more complicated that writing an explicit neighbors iterator and explicit for loops over the matrix.)
However, practically, you can get a whole lot faster, in various ways. For example:
Using numpy, you may get an order of magnitude or so faster. When you're iterating a tight loop and doing simple arithmetic, that's one of the things that Python is particularly slow at, and numpy can do it in C (or Fortran) instead.
Using your favorite GPGPU library, you can explicitly vectorize your operations.
Using multiprocessing, you can break the matrix up into pieces and perform multiple pieces in parallel on separate cores (or even separate machines).
Of course for a single 4x6 matrix, none of these are worth doing… except possibly for numpy, which may make your code simpler as well as faster, as long as you can express your operations naturally in matrix/broadcast terms.
In fact, even if you can't easily express things that way, just using numpy to store the matrix may make things a little simpler (and save some memory, if that matters). For example, numpy can let you access a single column from a matrix naturally, while in pure Python, you need to write something like [row[col] for row in matrix].
So, how would you tackle this with numpy?
First, you should read over numpy.matrix and ufunc (or, better, some higher-level tutorial, but I don't have one to recommend) before going too much further.
Anyway, it depends on what you're doing with each set of neighbors, but there are three basic ideas.
First, if you can convert your operation into simple matrix math, that's always easiest.
If not, you can create 8 "neighbor matrices" just by shifting the matrix in each direction, then perform simple operations against each neighbor. For some cases, it may be easier to start with an N+2 x N+2 matrix with suitable "empty" values (usually 0 or nan) in the outer rim. Alternatively, you can shift the matrix over and fill in empty values. Or, for some operations, you don't need an identical-sized matrix, so you can just crop the matrix to create a neighbor. It really depends on what operations you want to do.
For example, taking your input as a fixed 6x4 board for the Game of Life:
def neighbors(matrix):
for i in -1, 0, 1:
for j in -1, 0, 1:
if i == 0 and j == 0: continue
yield np.roll(np.roll(matrix, i, 0), j, 1)
matrix = np.matrix([[0,0,0,0,0,0,0,0],
[0,0,1,1,1,0,1,0],
[0,1,1,1,0,0,1,0],
[0,1,1,0,0,0,1,0],
[0,1,1,1,1,1,1,0],
[0,0,0,0,0,0,0,0]])
while True:
livecount = sum(neighbors(matrix))
matrix = (matrix & (livecount==2)) | (livecount==3)
(Note that this isn't the best way to solve this problem, but I think it's relatively easy to understand, and likely to illuminate whatever your actual problem is.)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.