I have a (2x3) raster file (2 columns, 3 rows) with the following values:
-5 -6
-3 -4
-1 -2
Normally, the .xyz GIS file format is column-organised, which can be represented by the following numpy array (coordinates refer to the lower-left corner):
col = numpy.array([[0,0,-1],[1,0,-2],[0,1,-3],[1,1,-4],[0,2,-5],[1,2,-6]])
Unfortunately, I have a row-organised structure (this data comes from https://www.opengeodata.nrw.de/). It can be represented by the following numpy array:
row = numpy.array([[0,0,-1],[0,1,-3],[0,2,-5],[1,0,-2],[1,1,-4],[1,2,-6]])
print (row)
[[ 0 0 -1]
[ 0 1 -3]
[ 0 2 -5]
[ 1 0 -2]
[ 1 1 -4]
[ 1 2 -6]]
I need to rearrange this row array into a col array. I am currently using this code:
rr = row.reshape(2, 3, 3)
stack = numpy.column_stack(rr[:,:,:])
new_col = stack.reshape(-1, 3)
print (new_col)
[[ 0 0 -1]
[ 1 0 -2]
[ 0 1 -3]
[ 1 1 -4]
[ 0 2 -5]
[ 1 2 -6]]
This works, but my question is: is this the best way to tackle this array transformation? I have little experience manipulating numpy arrays.
Thanks
Nicolas
You can use the transpose method to rearrange the axes.
import numpy
col = numpy.array([[0,0,-1],[1,0,-2],[0,1,-3],[1,1,-4],[0,2,-5],[1,2,-6]])
row = numpy.array([[0,0,-1],[0,1,-3],[0,2,-5],[1,0,-2],[1,1,-4],[1,2,-6]])
# New solution
new_col = row.reshape(2,3,3).transpose(1,0,2).reshape(-1,3)
print(numpy.array_equal(col, new_col))
This is faster than going through column_stack or hstack.
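The reshape/transpose idea generalizes to any grid size. The sketch below wraps it in a small helper, `row_to_col` (a hypothetical name, not from the answer), assuming the input holds exactly `nx * ny` points sorted by x first and then y, as in the question's `row` array:

```python
import numpy as np

def row_to_col(row, nx, ny):
    """Hypothetical helper: reorder an x-major (row-organised) XYZ array into
    y-major (column-organised) order.

    Assumes `row` holds exactly nx * ny points sorted by x first, then y,
    i.e. row[i * ny + j] is the point at (x_i, y_j).
    """
    # Shape (nx, ny, 3): axis 0 walks x, axis 1 walks y
    grid = row.reshape(nx, ny, 3)
    # Swap the x and y axes so y becomes the outer loop, then flatten back
    return grid.transpose(1, 0, 2).reshape(-1, 3)

row = np.array([[0, 0, -1], [0, 1, -3], [0, 2, -5],
                [1, 0, -2], [1, 1, -4], [1, 2, -6]])
col = row_to_col(row, nx=2, ny=3)
print(col)
```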
I think what you're doing is fine, but for readability I would use
stack = numpy.hstack(rr)
instead of
stack = numpy.column_stack(rr[:,:,:])
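If the number of points per axis is not known, or the grid might be incomplete, another option (not from the answers above) is to sort directly on the coordinates with `np.lexsort`, which treats its last key as the primary one. A minimal sketch, assuming column 0 is x and column 1 is y:

```python
import numpy as np

row = np.array([[0, 0, -1], [0, 1, -3], [0, 2, -5],
                [1, 0, -2], [1, 1, -4], [1, 2, -6]])

# np.lexsort sorts by the LAST key first: here y (column 1) is the primary key,
# and x (column 0) breaks ties within each y
order = np.lexsort((row[:, 0], row[:, 1]))
new_col = row[order]
print(new_col)
```

Because it only looks at the coordinates, this does not depend on the points arriving in a particular block layout.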
I need some help converting the following code into something more efficient that does not use iterrows().
for index, row in df.iterrows():
    alist = row['index_vec'].strip("[] ").split(",")
    blist = [int(i) for i in alist]
    for col in blist:
        df.loc[index, str(col)] = df.loc[index, str(col)] + 1
The above code basically reads a string under 'index_vec' column, parses and converts to integers, and then increments the associated columns by one for each integer. An example of the output is shown below:
Take the 0th row as an example. Its string value is "[370, 370, -1]". So the above code increments column "370" by 2 and column "-1" by 1. The output display is truncated so that only "-10" to "17" columns are shown.
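To make the 0th-row example concrete, here is the parsing step on its own, using `collections.Counter` for illustration (the loop itself does not use it). The resulting counts are exactly the increments the loop applies to columns "370" and "-1":

```python
from collections import Counter

s = "[370, 370, -1]"
# Same parsing as the loop: drop brackets/spaces, split on commas, convert to int
# (int() tolerates the leading space left after splitting on ",")
counts = Counter(int(tok) for tok in s.strip("[] ").split(","))
print(counts)  # Counter({370: 2, -1: 1})
```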
iterrows() is very slow on a large DataFrame, so I'd like some help speeding this up. Thank you.
You can also use apply with axis=1 to go row-wise, passing it a custom function:
Example starting df:
   index_vec      1201  370  -1
0  [370, -1, -1]     0    0   1
1  [1201, 1201]      0    1   1
import pandas as pd
df = pd.DataFrame({'index_vec': ["[370, -1, -1]", "[1201, 1201]"], '1201': [0, 0], '370': [0, 1], '-1': [1, 1]})
def add_counts(x):
counts = pd.Series(x['index_vec'].strip("[]").split(", ")).value_counts()
x[counts.index] = x[counts.index] + counts
return x
# apply returns a new DataFrame; assign it back, otherwise df is unchanged
df = df.apply(add_counts, axis=1)
print(df)
Outputs:
   index_vec      1201  370  -1
0  [370, -1, -1]     0    1   3
1  [1201, 1201]      2    1   1
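Let us do
# split into one token per row entry, stripping the spaces left after the commas
a = df['index_vec'].str.strip("[] ").str.split(",").explode().str.strip()
# count tokens per original row, aligned to df's rows and numeric columns
s = pd.crosstab(a.index, a).reindex(index=df.index, columns=df.columns.drop('index_vec'), fill_value=0)
df[s.columns] = df[s.columns] + s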
I have a numpy 2D array and I want to turn it into -1/1 values based on the following logic:
a. find the argmax() of each row
b. at the positions given by that 1D array (a), assign the value 1
c. everywhere else, assign the value -1
Example:
arr2D = np.random.randint(10, size=(3, 3))
idx = np.argmax(arr2D, axis=1)
# for example:
# arr2D = [[5 4 1]
#          [0 9 4]
#          [4 2 6]]
# idx = [0 1 2]
arr2D[idx] = 1
arr2D[~idx] = -1
what I get is this:
arr2D = [[-1 -1 -1]
[-1 -1 -1]
[-1 -1 -1]]
while I wanted:
arr2D = [[1 -1 -1]
[-1 1 -1]
[-1 -1 1]]
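For reference, this is why the attempt above turns everything into -1: `~` on an integer array is bitwise NOT rather than "all other positions", and a single integer index array selects whole rows rather than one element per row. A small reproduction on the example values:

```python
import numpy as np

arr2D = np.array([[5, 4, 1],
                  [0, 9, 4],
                  [4, 2, 6]])
idx = np.argmax(arr2D, axis=1)  # array([0, 1, 2])

# ~ on an integer array is bitwise NOT (~k == -k - 1), not a logical "everything else"
print(~idx)  # [-1 -2 -3]

# A single integer array indexes whole ROWS: arr2D[idx] hits rows 0, 1, 2 and
# arr2D[~idx] hits rows -1, -2, -3 (i.e. 2, 1, 0), so the second assignment wins everywhere
demo = arr2D.copy()
demo[idx] = 1
demo[~idx] = -1
print(demo)  # every entry is -1
```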
appreciate some help,
Thanks
Approach #1
Create a mask with those argmax -
mask = idx[:,None] == np.arange(arr2D.shape[1])
Then, use that mask to create the array of 1s and -1s -
out = 2*mask-1
Alternatively, we could use np.where -
out = np.where(mask,1,-1)
Approach #2
Another way to create the mask would be -
mask = np.zeros(arr2D.shape, dtype=bool)
mask[np.arange(len(idx)),idx] = 1
Then, get out using one of the methods as listed in approach #1.
Approach #3
One more way would be like so -
out = np.full(arr2D.shape, -1)
out[np.arange(len(idx)),idx] = 1
Alternatively, we could use np.put_along_axis for the assignment -
np.put_along_axis(out,idx[:,None],1,axis=1)
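A self-contained check, using the example array from the question, that the approaches above agree (approach #2's mask is compared against approach #1's):

```python
import numpy as np

arr2D = np.array([[5, 4, 1], [0, 9, 4], [4, 2, 6]])
idx = np.argmax(arr2D, axis=1)
rows = np.arange(len(idx))

# Approach #1: broadcasting idx against the column numbers builds the mask directly
mask = idx[:, None] == np.arange(arr2D.shape[1])
out1 = 2 * mask - 1            # True -> 1, False -> -1
out1b = np.where(mask, 1, -1)

# Approach #2: start from all False and set (row, argmax) to True
mask2 = np.zeros(arr2D.shape, dtype=bool)
mask2[rows, idx] = True

# Approach #3: fill with -1, then write 1 at the (row, argmax) positions
out3 = np.full(arr2D.shape, -1)
out3[rows, idx] = 1
out4 = np.full(arr2D.shape, -1)
np.put_along_axis(out4, idx[:, None], 1, axis=1)

expected = np.array([[1, -1, -1], [-1, 1, -1], [-1, -1, 1]])
print(all(np.array_equal(o, expected) for o in (out1, out1b, out3, out4)))
```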
I have a numpy array of size (4, 4, 6890), which stores 6890 4x4 matrices. I need to invert all of them, and I am currently doing it in a loop, which I know is bad practice:
T_inv = np.empty(T.shape)  # float buffer for the results
for i in range(0, T.shape[2]):
    T_inv[:,:,i] = np.linalg.inv(T[:,:,i])
How can I do it with a single call?
np.linalg.inv will do it, since it inverts stacked matrices over the last two axes of a (..., M, M) array, but you need to rearrange your axes first:
T_inv = np.moveaxis(np.linalg.inv(np.moveaxis(T, -1, 0)), 0, -1)
It might be better to construct T with T.shape == (6890, 4, 4) in the first place. That will help with broadcasting as well.
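A small sketch comparing the moveaxis one-liner against the original loop. It assumes randomly generated, diagonally dominant matrices (so they are guaranteed invertible) and uses 10 matrices instead of 6890 to keep it quick:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10  # small stand-in for 6890

# Entries in [0, 1) plus 4 on the diagonal: strictly diagonally dominant, hence invertible
T = rng.random((4, 4, n)) + 4 * np.eye(4)[:, :, None]

# Batched inverse: move the matrix index to the front, invert, move it back
T_inv = np.moveaxis(np.linalg.inv(np.moveaxis(T, -1, 0)), 0, -1)

# Reference: the original loop
T_inv_loop = np.empty_like(T)
for i in range(n):
    T_inv_loop[:, :, i] = np.linalg.inv(T[:, :, i])

print(np.allclose(T_inv, T_inv_loop))
```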
I'm not sure how to do it with numpy, but check this out:
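$$
\begin{bmatrix} A & 0 & 0 \\ 0 & B & 0 \\ 0 & 0 & C \end{bmatrix}
\begin{bmatrix} A^{-1} & 0 & 0 \\ 0 & B^{-1} & 0 \\ 0 & 0 & C^{-1} \end{bmatrix}
=
\begin{bmatrix} I & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & I \end{bmatrix}
$$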
Here A, B, C are matrices of the same size (for example 4x4), A^(-1), B^(-1), C^(-1) are their inverses, and I is the identity matrix.
So, what does this tell us? We can construct a large sparse block-diagonal matrix with all the (4x4) sub-matrices on its diagonal, take the inverse of that large matrix, and read the sub-matrices' inverses off the diagonal blocks.
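The block-diagonal idea can be checked on a tiny example. This is only an illustration: the big matrix is dense here, so inverting it costs on the order of (4n)^3 operations, and for 6890 blocks the dense 27560 x 27560 matrix alone would need roughly 6 GB of memory. The random, diagonally dominant blocks are an assumption made so that every block is invertible:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3  # number of 4x4 blocks, kept tiny on purpose
blocks = rng.random((n, 4, 4)) + 4 * np.eye(4)  # diagonally dominant, so invertible

# Assemble the (4n x 4n) block-diagonal matrix
big = np.zeros((4 * n, 4 * n))
for i in range(n):
    big[4 * i:4 * i + 4, 4 * i:4 * i + 4] = blocks[i]

big_inv = np.linalg.inv(big)

# Read each block's inverse off the diagonal of the big inverse
block_invs = np.array([big_inv[4 * i:4 * i + 4, 4 * i:4 * i + 4] for i in range(n)])
print(np.allclose(block_invs, np.linalg.inv(blocks)))
```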
I have created a numpy matrix with all elements initialized to zeros as shown:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
...
This represents a screenshot of a webpage, which is 1200 x 1000 in size.
I have identified a few rectangular regions of interest for different HTML objects, such as radio buttons, textboxes and dropdowns, within the screenshot image, and assigned the respective regions of the numpy matrix fixed values such as 1, 2 and 3.
So the resulting matrix looks roughly like this:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
...,
[[1 1 1 1]
[1 1 1 1]
[1 1 1 1]
...,
[0 0 0 0]
[2 2 2 2]
[0 0 0 0]
...,
I now want to prepare a dataset for a convolutional neural network from patches of the screenshot image. To improve the quality of the data supplied to the CNN, I want to filter the patches and pass on only those that contain the previously detected objects, i.e. textboxes, radio buttons etc. (radio buttons and dropdowns should be fully contained, and for buttons at least 50% of the region should be included in the patch). Any ideas how this can be done in Python?
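One possible way to do the filtering, sketched under assumptions: each detected object is kept as a bounding box `(kind, y0, x0, y1, x1)` with half-open pixel ranges, rather than only as labels in the matrix, since the label matrix alone cannot tell two adjacent objects of the same kind apart. Patches are square windows of side `size` taken every `stride` pixels, and a patch is kept if it contains enough of at least one object. `MIN_FRACTION`, `overlap_fraction` and `useful_patches` are hypothetical names; the thresholds follow the question (radio buttons and dropdowns fully, buttons at least 50%), and other kinds such as textboxes would need their own entry:

```python
# Hypothetical thresholds: minimum fraction of each object's box that a patch must contain
MIN_FRACTION = {"radiobutton": 1.0, "dropdown": 1.0, "button": 0.5}

def overlap_fraction(box, py, px, size):
    """Hypothetical helper: fraction of the box's area inside the square patch at (py, px)."""
    _, y0, x0, y1, x1 = box
    oy = max(0, min(y1, py + size) - max(y0, py))
    ox = max(0, min(x1, px + size) - max(x0, px))
    return (oy * ox) / ((y1 - y0) * (x1 - x0))

def useful_patches(image_shape, boxes, size, stride):
    """Hypothetical helper: yield top-left corners (py, px) of patches that contain
    at least one object above its kind's threshold."""
    height, width = image_shape
    for py in range(0, height - size + 1, stride):
        for px in range(0, width - size + 1, stride):
            if any(overlap_fraction(b, py, px, size) >= MIN_FRACTION[b[0]] for b in boxes):
                yield py, px

# Hypothetical example boxes on a 100 x 100 image
boxes = [("radiobutton", 10, 10, 20, 20), ("button", 50, 50, 70, 90)]
print(list(useful_patches((100, 100), boxes, size=32, stride=16)))
```

For the real screenshot, `image_shape` would be the shape of the label matrix described above.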
A very naive approach:
import numpy as np
maxY, maxX = np.shape(theMatrix)
for curY in range(0, maxY):
    for curX in range(0, maxX):
        print(theMatrix[curY, curX], end=" ")  # stay on the same line
    print(" ")  # end of row
You can just use the plot function to plot a 2D array. Take for example:
import numpy as np
import matplotlib.pyplot as pyplot
x = np.random.rand(3, 2)
which gives something like (the values are random):
array([[ 0.53255518, 0.04687357],
[ 0.4600085 , 0.73059902],
[ 0.7153942 , 0.68812506]])
If you call pyplot.plot(x, 'ro'), it plots the figure given below.
The row numbers are put on the x-axis and the values are plotted on the y-axis. But from the nature of your problem, I suspect you need the column numbers on the x-axis and the values on the y-axis. To do so, you can simply transpose your matrix.
pyplot.plot(x.T,'ro')
pyplot.show()
which now yields (for the same array) the figure given below.
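The row/column behaviour can be checked without opening a window. `pyplot.plot` draws one series per column of a 2D array, so `x` gives 2 series against the row numbers 0 to 2, and `x.T` gives 3 series against the column numbers 0 and 1. The `Agg` backend is used here only so the sketch runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no window needed
import matplotlib.pyplot as pyplot
import numpy as np

x = np.random.rand(3, 2)
lines = pyplot.plot(x, 'ro')      # one series per COLUMN of x, against row number
lines_t = pyplot.plot(x.T, 'ro')  # one series per ROW of x, against column number
print(len(lines), len(lines_t))   # 2 3
```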
Question
Is there a good way to transform a DataFrame with an n-level index into an n-D Numpy array (a.k.a n-tensor)?
Example
Suppose I set up a DataFrame like
from pandas import DataFrame, MultiIndex
index = range(2), range(3)
value = range(2 * 3)
frame = DataFrame(value, columns=['value'],
index=MultiIndex.from_product(index)).drop((1, 0))
print(frame)
which outputs
     value
0 0      0
  1      1
  2      2
1 1      4
  2      5
The index is a 2-level hierarchical index. I can extract a 2-D Numpy array from the data using
print(frame.unstack().values)
which outputs
[[ 0. 1. 2.]
[ nan 4. 5.]]
How does this generalize to an n-level index?
Playing with unstack(), it seems that it can only be used to massage the 2-D shape of the DataFrame, but not to add an axis.
I cannot use e.g. frame.values.reshape(x, y, z), since this would require that the frame contains exactly x * y * z rows, which cannot be guaranteed. This is what I tried to demonstrate by drop()ing a row in the above example.
Any suggestions are highly appreciated.
Edit. This approach is much more elegant (and two orders of magnitude faster) than the one I gave below.
import numpy as np
# create an empty array of NaN of the right dimensions
shape = tuple(map(len, frame.index.levels))
arr = np.full(shape, np.nan)
# fill it using Numpy's advanced indexing (the codes must be passed as a tuple)
arr[tuple(frame.index.codes)] = frame.values.flat
# ...or in Pandas < 0.24.0, use
# arr[tuple(frame.index.labels)] = frame.values.flat
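A self-contained run of the advanced-indexing approach on the 3-D example used in the original solution below (requires pandas >= 0.24 for `MultiIndex.codes`). The dropped row `(1, 0, 1)` ends up as the only NaN:

```python
import numpy as np
from pandas import DataFrame, MultiIndex

index = range(2), range(2), range(2)
frame = DataFrame(list(range(2 * 2 * 2)), columns=['value'],
                  index=MultiIndex.from_product(index)).drop((1, 0, 1))

# One axis per index level, sized by the number of values in that level
shape = tuple(len(level) for level in frame.index.levels)
arr = np.full(shape, np.nan)
# codes holds, per level, each row's position within that level
arr[tuple(frame.index.codes)] = frame.values.ravel()
print(arr)
```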
Original solution. Given a setup similar to above, but in 3-D,
from pandas import DataFrame, MultiIndex
from itertools import product
index = range(2), range(2), range(2)
value = range(2 * 2 * 2)
frame = DataFrame(value, columns=['value'],
index=MultiIndex.from_product(index)).drop((1, 0, 1))
print(frame)
we have
       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
  1 0      6
    1      7
Now, we proceed using the reshape() route, but with some preprocessing to ensure that the length along each dimension will be consistent.
First, reindex the data frame with the full Cartesian product of all dimensions. NaN values will be inserted as needed. This operation can be slow and consume a lot of memory, depending on the number of dimensions and on the size of the data frame.
levels = map(tuple, frame.index.levels)
index = list(product(*levels))
frame = frame.reindex(index)
print(frame)
which outputs
       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
    1    NaN
  1 0      6
    1      7
Now, reshape() will work as intended.
shape = tuple(map(len, frame.index.levels))  # reshape needs a tuple, not a map object
print(frame.values.reshape(shape))
which outputs
[[[ 0. 1.]
[ 2. 3.]]
[[ 4. nan]
[ 6. 7.]]]
The (rather ugly) one-liner is
frame.reindex(list(product(*map(tuple, frame.index.levels)))).values \
    .reshape(tuple(map(len, frame.index.levels)))
This can be done quite nicely using the Python xarray package which can be found here: http://xarray.pydata.org/en/stable/. It has great integration with Pandas and is quite intuitive once you get to grips with it.
If you have a multiindex series you can call the built-in method multiindex_series.to_xarray() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_xarray.html). This will generate a DataArray object, which is essentially a name-indexed numpy array, using the index values and names as coordinates. Following this you can call .values on the DataArray object to get the underlying numpy array.
If you need your tensor to conform to a set of keys in a specific order, you can also call .reindex(index_name = index_values_in_order) (http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html) on the DataArray. This can be extremely useful and makes working with the newly generated tensor much easier!