How to select a certain amount of data as numpy array?

I have a total of 1000 txt files filled with data. I copied all of them into a single txt file and loaded it into my Python code as:
data = numpy.loadtxt(r'C:\data.txt')
This works so far. Now I need to select every 5th file from those 1000 txt files (i.e. 200 files) and load their combined content into a single variable. I am not sure how to do this.
Any help would be appreciated.

Why not load the files one at a time (assuming the files are named data-0000 through data-0999):
import numpy
datasets = []
for file_number in range(1000):
    datasets.append(numpy.loadtxt("c:\\data-%04d" % (file_number,)))
Then you can get every fifth file with: every_fifth_file = datasets[::5]. See also: Explain Python's slice notation
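If only every fifth file is needed, the other 800 files do not have to be read at all. Below is a minimal sketch of that idea. The helper name load_every_nth and the temporary demo files are assumptions made for illustration, not part of the original answer; it also assumes every file has the same number of columns.

```python
import os
import tempfile

import numpy as np


def load_every_nth(paths, step=5, start=0):
    """Load paths[start::step] with np.loadtxt and stack their rows."""
    selected = paths[start::step]
    # ndmin=2 keeps single-line files two-dimensional so vstack lines up.
    return np.vstack([np.loadtxt(p, ndmin=2) for p in selected])


# Demo with 10 small temporary files standing in for the 1000 real ones.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for k in range(10):
        path = os.path.join(tmp, "data-%04d.txt" % k)
        np.savetxt(path, np.full((2, 3), k))  # file k holds 2 rows of value k
        paths.append(path)
    combined = load_every_nth(paths, step=5)

print(combined.shape)  # (4, 3): files 0 and 5, two rows each
```

With start=0 this picks the 1st, 6th, 11th, ... file; pass start=4 to pick the 5th, 10th, 15th, ... instead.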

It is crucial for us to know whether the files have the same number of lines or not. If they do, you can proceed as you are already and use a slicing trick. If they don't, then you will need to load the files separately to achieve what you want: the positions where the files are delimited have already been lost in the merge.
Personally, I think David's suggestion is better in either case. But if you want to push ahead with slicing the big data array up, read on...
>>> import numpy as np
>>> n = 2 # number of lines in each file
>>> N = 5 # number of files
>>> x = np.eye(n*N, dtype=int) # fake example data
>>> x
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
>>> np.vstack([x[n*i:n*(i+1)] for i in range(N)[::2]]) # every second file
array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
>>> np.vstack([x[n*i:n*(i+1)] for i in range(N)[1::3]]) # every third file, skipping the first
array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])

By putting all your 1000 files in a single one, you simplified the operation of loading the data in NumPy (good point), but you lost the information about how many lines were in each of the initial files (bad point).
If you know that all your files have the same number of lines, great! With N files of m lines each, your array has a length of N*m. So data[:m] holds the lines of your first file, data[m:2*m] those of your second file, and so forth: your fifth file is data[4*m:5*m], your tenth data[9*m:10*m]. Of course, you could write a simple loop to collect the lines you want. But we can use the fact that the files all have the same number of lines: let's reshape the array!
If data has a shape of (N*m,d), where d is the number of columns of each file, you could reshape with:
data_reshaped = data.reshape(N,m,d)
or even simpler:
data.shape = (N, m, d)
Now, data is 3D, and data[k] holds the lines of file k+1. You can take every 5th entry with data[::5], which gives you an array of shape (N/5, m, d) (for N divisible by 5) containing files 1, 6, 11, ... Because that slice starts at index 0, use data[4::5] instead if you want files 5, 10, 15, ...
Note that this trick works only if the files have the same number of lines. If they don't, then you're stuck with finding the lines you want from a list of the number of lines in each file.
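A small sketch of the reshape trick, using made-up sizes (N, m and d below are hypothetical, and the merged array is faked with np.arange) so the result can be checked by eye:

```python
import numpy as np

N, m, d = 10, 3, 2                              # hypothetical: 10 files, 3 lines each, 2 columns
data = np.arange(N * m * d).reshape(N * m, d)   # stand-in for the merged array

per_file = data.reshape(N, m, d)   # per_file[k] is the block of lines from file k+1
picked = per_file[4::5]            # files 5 and 10 (zero-based indices 4 and 9)
combined = picked.reshape(-1, d)   # back to a 2-D array of lines

print(combined.shape)  # (6, 2)
```

Here combined holds the same rows as data[4*m:5*m] followed by data[9*m:10*m], which is what selecting the lines by hand would give.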

Related

Detect index of multiple maximum in a 2D array

Let's say I have a 2D array with a size of m x n elements.
Now, I want to get the indices of all maximums. So the result should be something like:
[(m1, n1), (m2, n2)] where m and n indicate the x and y coordinates of my maximums.
With only one maximum it's quite easy, but with more I'm getting stuck.
import numpy as np
pixel = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 189, 12, 0, 0, 1, 0, 0, 0, 0],
[0, 6, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 203, 9, 0],
[0, 0, 0, 0, 0, 0, 0, 12, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 5, 245, 0, 0, 0, 7, 4, 0],
[0, 0, 0, 0, 0, 0, 0, 250, 8, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
result = np.where(pixel == pixel.max())
print("cross detection at y:", result[0][0], "x:", result[1][0])
print(pixel)
Does somebody have an idea? It would be great, thanks!

Try this:
x, y = np.where(pixel == np.max(pixel))
This returns the row indices (x) and column indices (y) of all the elements equal to the maximum value.
Now, for your question, you can do
np.array((x, y)).T
to get one (row, column) pair per maximum. I took this last line from another question.

Try numpy.argmax, it returns the indices of the maximum values along an axis.
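One caveat: when the maximum occurs more than once, np.argmax reports only the first occurrence. A minimal sketch of the difference, using a small made-up array (not the pixel data from the question) in which the maximum appears three times:

```python
import numpy as np

pixel = np.array([[0, 7, 0],
                  [7, 1, 0],
                  [0, 0, 7]])

# argmax works on the flattened array and stops at the first maximum.
first = np.unravel_index(np.argmax(pixel), pixel.shape)

# argwhere returns one (row, column) pair for every element that matches.
coords = np.argwhere(pixel == pixel.max())
print(coords.tolist())  # [[0, 1], [1, 0], [2, 2]]
```

np.argwhere gives the same pairs as np.array(np.where(...)).T, just in a single call.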

place the mydata_array into the random location of Big_array of zeros

mydata is a NumPy array of shape (10,100,100), in the form (z,y,x), and I have created an empty array of shape (10,800,800). Now I need to place mydata at some random location in the empty array so that, if I plot the output, mydata appears at a random position within the (10,800,800) array.
I tried np.hstack() and np.vstack(), but they place mydata side by side. I need to place it at a random location.
How could I do this? Any suggestions, please.
Regards
Raj

Here's a demonstration of placing several copies of one array inside another, using slice indexing:
In [802]: out = np.zeros((10,10),int)
In [803]: src = np.arange(6).reshape(2,3)
In [804]: out
Out[804]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
One copy in the upper left:
In [805]: out[:2,:3] = src
In [806]: out
Out[806]:
array([[0, 1, 2, 0, 0, 0, 0, 0, 0, 0],
[3, 4, 5, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
....
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Several more copies:
In [808]: out[4:6, 6:9] = src
In [809]: out[1:3, 4:7] = src
In [810]: out
Out[810]:
array([[0, 1, 2, 0, 0, 0, 0, 0, 0, 0],
[3, 4, 5, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 3, 4, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 2, 0],
[0, 0, 0, 0, 0, 0, 3, 4, 5, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Just repeat that kind of action for a selection of random locations. Make sure that the slice ranges match the src shape, and that they lie within the dimensions of the target array.
While it may be possible to insert many copies at once (flattening may be needed), let's start with understanding how to insert one copy at a time.
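Applied to the shapes in the question, one random placement could look like the sketch below. The variable names, the seed and the use of np.random.default_rng are assumptions for illustration; the block keeps all 10 z-slices together and only the (y, x) offset is random.

```python
import numpy as np

rng = np.random.default_rng(0)        # seed chosen only for repeatability
mydata = np.ones((10, 100, 100))      # stand-in for the real (z, y, x) data
big = np.zeros((10, 800, 800))

_, h, w = mydata.shape
# The upper bound of integers() is exclusive, so the block always fits inside big.
y0 = rng.integers(0, big.shape[1] - h + 1)
x0 = rng.integers(0, big.shape[2] - w + 1)
big[:, y0:y0 + h, x0:x0 + w] = mydata
```

Repeating the last three lines places further copies; later copies overwrite earlier ones wherever they overlap.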
=========
@alvis' answer places the src items in shuffled order on one row of the out (or wrapped rows):
array([[2, 4, 5, 3, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
===================
Looped placement of multiple blocks:
def foo1(src, idx, NM):
    out = np.zeros(NM, dtype=src.dtype)
    n,m = src.shape
    for i,j in idx:
        out[i:i+n, j:j+m] = src
    return out
idx=np.array([[0,0],[1,4],[4,4],[8,7],[7,2]])
In [940]: out1 = foo1(src, idx, (10,10))
In [941]: out1
Out[941]:
array([[0, 1, 2, 0, 0, 0, 0, 0, 0, 0],
[3, 4, 5, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 3, 4, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 0, 3, 4, 5, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 2, 0, 0, 0, 0, 0],
[0, 0, 3, 4, 5, 0, 0, 0, 1, 2],
[0, 0, 0, 0, 0, 0, 0, 3, 4, 5]])
================
Placement of a block with advanced indexing (arrays instead of slices):
In [880]: I = np.array([1,1,1,2,2,2])
In [881]: J = np.array([3,4,5,3,4,5])
In [882]: out[I,J] = src.flat
In [883]: out
Out[883]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 2, 0, 0, 0, 0],
[0, 0, 0, 3, 4, 5, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
And for multiple blocks
def foo2(src, idx, NM):
    out = np.zeros(NM, dtype=src.dtype)
    n,m = src.shape
    ni = len(idx)
    IJ = [np.mgrid[i:i+n, j:j+m] for i,j in idx]
    IJ = np.concatenate(IJ, axis=1).reshape(2,-1)
    out[IJ[0,:], IJ[1,:]] = np.tile(src,(ni,1)).flat
    return out
In this small example the alternative is considerably slower (14x). For a (1000,1000) out it is still slower (6x). Most of the time is spent generating IJ.
This handles the I,J index calculation much faster (it still needs to be generalized), but it is still slower than the looped slicing:
def foo3(src, idx, NM):
    out = np.zeros(NM, dtype=src.dtype)
    n,m = src.shape
    ni = len(idx)
    I = np.repeat((idx[:,[0]]+np.arange(2)).flatten(),3)
    J = np.repeat((idx[:,[1]]+np.arange(3)),2,axis=0).flatten()
    out[I, J] = np.tile(src,(ni,1)).flat
    return out
This reminds me of work I did years ago to speed up the creation of a finite element stiffness matrix in MATLAB. There it was per-element stiffness blocks that needed to be placed in a large sparse global stiffness matrix.
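Returning to foo3: the literals 2 and 3 there are the row and column counts of src. One possible generalization is sketched below, with a looped version included only to check the result. The function names are made up for this sketch, and it assumes the blocks do not overlap (with overlapping blocks, which copy wins under advanced indexing is not something to rely on).

```python
import numpy as np


def place_blocks_loop(src, idx, shape):
    """Reference version: one slice assignment per block (same idea as foo1)."""
    out = np.zeros(shape, dtype=src.dtype)
    n, m = src.shape
    for i, j in idx:
        out[i:i + n, j:j + m] = src
    return out


def place_blocks_indexed(src, idx, shape):
    """foo3 with the literals 2 and 3 replaced by src's own n and m."""
    out = np.zeros(shape, dtype=src.dtype)
    n, m = src.shape
    ni = len(idx)
    # Row index of every target cell: each block row repeated m times.
    I = np.repeat((idx[:, [0]] + np.arange(n)).flatten(), m)
    # Column index of every target cell: each block's column range repeated n times.
    J = np.repeat(idx[:, [1]] + np.arange(m), n, axis=0).flatten()
    out[I, J] = np.tile(src, (ni, 1)).ravel()
    return out


src = np.arange(6).reshape(2, 3)
idx = np.array([[0, 0], [1, 4], [4, 4], [8, 7], [7, 2]])
same = np.array_equal(place_blocks_loop(src, idx, (10, 10)),
                      place_blocks_indexed(src, idx, (10, 10)))
print(same)  # True for these non-overlapping blocks
```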

According to your question, you don't need to preserve the position of elements along the first dimension of your array. For example, if there is one non-zero element a in the (100,100) matrix at z=0, and two elements b and c in the matrix at z=1, then in your output a, b and c can all appear at z=0. In this case I suggest the following solution:
import numpy as np
# replace this with your input data
mydata = np.ones((10,100,100))
mydata_large = np.zeros((10,800,800))
mydata_flatten = mydata.flatten()
mydata_large_f = mydata_large.flatten()
# shuffle all flat positions of the large array, then use the first len(mydata_flatten) of them
ind = np.arange(mydata_large_f.size)
np.random.shuffle(ind)
np.put(mydata_large_f,ind[:len(mydata_flatten)],mydata_flatten)
mydata_large = np.reshape(mydata_large_f, (10,800,800))

How can I manipulate a list of lists?

How can I iterate through a list of lists so that every "1" also turns its neighbours (top, top left, top right, bottom, bottom left and bottom right, currently 0) into "1", as shown below, turning list_1 into list_2?
list_1 =[[0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0]]
list_2 =[[0,0,0,0,0,0,0,0],
[0,0,1,1,1,0,0,0],
[0,0,1,1,1,0,0,0],
[0,0,1,1,1,0,0,0]]

This is a common operation known as "dilation" in image processing. Your problem is 2-dimensional, so you would be best served by using a more appropriate 2-d data structure than a list of lists, and an already available library function rather than reinventing the wheel.
Here is an example using a numpy ndarray and scipy's binary_dilation respectively:
>>> import numpy as np
>>> from scipy import ndimage
>>> a = np.array([[0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0]], dtype=int)
>>> ndimage.binary_dilation(a, structure=ndimage.generate_binary_structure(2, 2)).astype(a.dtype)
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]])

Use numpy, which is better suited to manipulating 2D lists in general. If you're doing image analysis, see @wim's answer. Otherwise, here is how you could manage it with numpy only.
> import numpy as np
> list_1 =[[0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0],
[0,0,0,1,0,0,0,0],
[0,0,0,0,0,0,0,0]]
> l = np.array(list_1) # convert the list into a numpy array
> pos = np.where(l==1) # get the position where the array is equal to one
> pos
(array([2]), array([3]))
# make a lambda function to limit the lower indexes:
get_low = lambda x: x-1 if x>0 else x
# get_high is not needed.
# slice the array around that position and set the value to one
> l[get_low(pos[0][0]):pos[0][0]+2,
    get_low(pos[1][0]):pos[1][0]+2] = 1
> l
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0]])
> corner
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1]])
> p = np.where(corner==1)
> corner[get_low(p[0][0]):p[0][0]+2,
    get_low(p[1][0]):p[1][0]+2] = 1
> corner
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 1, 1]])
HTH
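The slicing approach above handles a single 1. A numpy-only sketch that copes with any number of 1s, and with 1s on the border, is to OR together the nine shifted views of a zero-padded copy. The function name dilate3x3 is made up for this sketch, and it assumes an integer 0/1 input.

```python
import numpy as np


def dilate3x3(a):
    """Set every cell to 1 if it or any of its 8 neighbours is 1."""
    a = np.asarray(a)
    padded = np.pad(a, 1, mode="constant")  # zero border, so edges need no special case
    out = np.zeros_like(a)
    rows, cols = a.shape
    for dr in range(3):
        for dc in range(3):
            # Each shifted window lines one neighbour up with the cell itself.
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out


list_1 = [[0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0]]
list_2 = dilate3x3(list_1).tolist()
```

For the list_1 above this yields the list_2 from the question: a 3x3 block of ones in rows 1 to 3 and columns 2 to 4.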

Why do I get an error when my loop runs the 2nd time? TypeError: 'int' object has no attribute '__getitem__'

The aim of this code is to change the numbers on the board according to given moves.
This is a simplified excerpt from my code and I'd like the principle to stay the same.
It seems like the code runs through the first loop but then gives an error when it runs through it another time: TypeError: 'int' object has no attribute '__getitem__'
Help would be appreciated.
import numpy
board = numpy.array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 2, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]])
boardlist0 = [board]*2
boardlist1 = []
ind = 0
move = [[0,0], [7,4]]
for k in move:
    move = move[ind]
    boardlist0[ind][move[0]][move[1]] = 1
    boardlist1.append(boardlist0)
    ind += 1

ind = 0
move = [[0,0], [7,4]]
for k in move:
    move = move[ind]
    print(move)
prints
[0, 0]
0
On the second iteration, move equals 0. So
move[0]
raises a TypeError.
I'm not quite sure what the intention of your code is, but you could avoid the TypeError using k instead of move. (Below I've renamed move --> moves, and k --> move):
import numpy
board = numpy.array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 2, 0, 0, 0],
[0, 0, 0, 2, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]])
boardlist0 = [board]*2
moves = [[0,0], [7,4]]
for move, board in zip(moves, boardlist0):
    board[move[0], move[1]] = 1
for board in boardlist0:
    print(board)
Note that boardlist0 = [board]*2 makes a 2-element list where each element references the exact same object, board, not a copy of it. Thus, altering boardlist0[0] affects boardlist0[1], and vice versa. If instead you want two independent boards, use
boardlist0 = [board.copy() for i in range(2)]
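A small sketch of the difference, using a made-up 2x2 board rather than the 8x8 one from the question:

```python
import numpy as np

board = np.zeros((2, 2), dtype=int)

aliased = [board] * 2                        # two references to one array
aliased[0][0, 0] = 1
print(aliased[1][0, 0])                      # 1: the "other" board changed too

fresh = np.zeros((2, 2), dtype=int)
copies = [fresh.copy() for _ in range(2)]    # two independent arrays
copies[0][0, 0] = 1
print(copies[1][0, 0])                       # 0
```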

2d array of zeros

There is no array type in Python, but to emulate it we can use lists. I want to have a 2d array-like structure filled with zeros. My question is: what is the difference, if any, between these two expressions:
zeros = [[0 for i in xrange(M)] for j in xrange(M)]
and
zeros = [[0]*M]*N
Will zeros be the same? Which one is better in terms of speed and readability?

You should use numpy.zeros. If that isn't an option, you want the first version. In the second version, if you change one value, it will be changed elsewhere in the list -- e.g.:
>>> a = [[0]*10]*10
>>> a
[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
>>> a[0][0] = 1
>>> a
[[1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
This is because (as you read the expression from the inside out), you create a list of 10 zeros. You then create a list of 10 references to that initial list of 10 zeros.
Note that:
zeros = [ [0]*M for _ in range(N) ] # Use xrange if you're still stuck in the python2.x dark ages :).
will also work and it avoids the nested list comprehension. If numpy isn't on the table, this is the form I would use.

For Python 3 (no more xrange), the preferred answer is
zeros = [ [0] * N for _ in range(M)]
for an M x N array of zeros.

In the second case you create a list of references to the same list. If you have code like:
[lst] * N
where lst is a reference to a list, you will have the following list:
[lst, lst, lst, lst, ..., lst]
But because the result list contains references to the same object, if you change a value in one row it will be changed in all other rows.
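The sharing can be checked directly with the is operator. A short sketch, with M and N set to small hypothetical values:

```python
M, N = 3, 2   # hypothetical sizes: N rows of M zeros

shared = [[0] * M] * N                      # N references to ONE row
independent = [[0] * M for _ in range(N)]   # N distinct rows

print(shared[0] is shared[1])               # True
print(independent[0] is independent[1])     # False

shared[0][0] = 1
independent[0][0] = 1
print(shared)       # [[1, 0, 0], [1, 0, 0]]
print(independent)  # [[1, 0, 0], [0, 0, 0]]
```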

Zhe Hu's answer is the safer one and should have been the best answer. This is because if we use the accepted answer method
a = [[0] * 2] * 2
a[0][0] = 1
print(a)
will give the answer
[[1,0],[1,0]]
So even though you just want to update the first row first column value, all the values in the same column get updated. However
a = [[0] * 2 for _ in range(2)]
a[0][0] = 1
print(a)
gives the correct answer
[[1,0],[0,0]]
