Multiply row in numpy array of fields with a list - python

Following on from this question:
Unexpectedly large array created with numpy.ones when setting names
When I multiply
a = np.ones([len(sectors),len(columns)])
a[0,:] *= [1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
It works fine.
When I try
columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
"Attrib", "Select", "Inter", "Total"]
a = np.ones((10,), dtype={"names":columns, "formats":["f8"]*len(columns)})
a[0] *= [1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
I get the error
TypeError: cannot convert to an int; scalar object is not a number
I would like to use field-names if possible. What am I doing wrong here?
Many thanks.

An element (row) of a can be modified by assigning it a tuple. Because lists convert easily to and from arrays, we can write:
In [162]: a = np.ones((10,), dtype={"names":columns, "formats":["f8"]*len(columns)})
In [163]: x=[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
In [164]: a[0]=tuple(np.array(x)*list(a[0]))
In [165]: a
Out[165]:
array([(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8),
...], dtype=[('Port Wt', '<f8'), ('Bench Wt', '<f8'),...
More generally you could write
a[i] = tuple(foo(list(a[i])))
Multiple values ('rows') of a can be set with a list of tuples.
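The tuple round trip above can be wrapped in a small helper. This is only a sketch: apply_to_row is a hypothetical name, not a NumPy function, and it assumes every field of the record is numeric.

```python
import numpy as np

columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
           "Attrib", "Select", "Inter", "Total"]
a = np.ones((10,), dtype={"names": columns, "formats": ["f8"] * len(columns)})
x = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]


def apply_to_row(arr, i, func):
    """Hypothetical helper: replace record i with func applied to its values."""
    values = np.array(arr[i].tolist())  # record -> tuple -> 1d float array
    arr[i] = tuple(func(values))        # records are assigned as tuples


# Multiply the first record element-wise by x.
apply_to_row(a, 0, lambda row: row * x)

# Several records at once: assign a list of tuples to a slice.
a[1:3] = [tuple(x), tuple(2 * v for v in x)]
```

The function passed in receives an ordinary 1d array, so any element-wise NumPy expression can be used there.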
An earlier Stack Overflow question on structured arrays (https://stackoverflow.com/a/26183332/901925) suggests another solution: create a partner 2d array that shares the same data buffer.
In [311]: a1 = np.empty((10,8)) # conventional 2d array
In [312]: a1.data = a.data # share the a's data buffer
In [313]: a1[0] *= x # do math on a1
In [314]: a1
Out[314]:
array([[ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8],
...
[ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1. ]])
In [315]: a
Out[315]:
array([(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8),
...
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)],
dtype=[('Port Wt', '<f8'), ('Bench Wt', '<f8'), ('Port Retn', '<f8'), ('Bench Retn', '<f8'), ('Attrib', '<f8'), ('Select', '<f8'), ('Inter', '<f8'), ('Total', '<f8')])
By sharing the data buffer, changes made to a1 affect a as well.
It might be better to view 2d a1 as the primary array, and a as a structured view. a could be constructed on the fly, as needed to display the data, access columns by name, or write to a csv file.
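A similar shared-buffer partner can probably be obtained without assigning to the data attribute, by reinterpreting the structured array with view. The sketch below assumes that every field is f8 and that the dtype has no padding; a2 is just an illustrative name.

```python
import numpy as np

columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
           "Attrib", "Select", "Inter", "Total"]
a = np.ones((10,), dtype={"names": columns, "formats": ["f8"] * len(columns)})
x = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]

# Reinterpret the same memory as plain float64 values, one row per record.
# This only makes sense because all fields share one dtype (assumption).
a2 = a.view(np.float64).reshape(len(a), -1)

# Ordinary 2d arithmetic on a2 writes straight into a's buffer.
a2[0] *= x
```

Since a2 is a view, no copy is made and a['Port Wt'], a['Total'] and the other fields reflect the change immediately.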

The rows of your array a are not numpy arrays; the closest things to them are possibly tuples:
>>> import numpy as np
>>> columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
... "Attrib", "Select", "Inter", "Total"]
>>> a = np.ones((10,), dtype={"names":columns, "formats":["f8"]*len(columns)})
>>> type(a[0,0])
IndexError: too many indices
>>> type(a[0][0])
numpy.float64
>>> type(a[0])
numpy.void
>>>
In contrast, the columns of a are ndarrays, and you can multiply them by a list of floats of the correct length (not the number of columns but the number of rows):
>>> type(a['Select'])
numpy.ndarray
>>> a['Select']*[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-27-fc8dc4596098> in <module>()
----> 1 a['Select']*[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8]
ValueError: operands could not be broadcast together with shapes (10,) (8,)
>>> a['Select']*[1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8, 0,0]
array([ 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 0. , 0. ])
>>>
Edit
In response to a comment from OP: «is it not possible to apply a function to a row in a named array of fields (or tuple) in numpy?»
The only way that I know of is
>>> a[0] = tuple(b*a[c][0] for b, c in zip([1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8],columns))
>>> print a
[(1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)]
>>>
but I'm not the most skilled numpy expert around... maybe one of the least skilled indeed
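Since each column a[name] is a view into a, another possibility is to update one record field by field, in place. This is a minimal sketch under the same setup as the question; factors is an illustrative name.

```python
import numpy as np

columns = ["Port Wt", "Bench Wt", "Port Retn", "Bench Retn",
           "Attrib", "Select", "Inter", "Total"]
a = np.ones((10,), dtype={"names": columns, "formats": ["f8"] * len(columns)})
factors = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]

# a[name] is a view of one column, so writing to its element 0
# changes the first record of a itself.
for name, factor in zip(columns, factors):
    a[name][0] *= factor
```

This keeps the field names in play, at the cost of a Python-level loop over the columns.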

Related

Change String to Float Python

I would like to convert a list that I read from a txt file into floats so I can do some calculations afterwards in Python. I have the following list:
['1,0,1.2', '2,-1.5,1.2', '3,-1.5,0', '4,0,0', '5,1.5,1.2']
And I would like it to look like this:
[1,0,1.2,2,-1.5,1.2,3,-1.5,0,4,0,0,5,1.5,1.2]
All of them being float type.
Thank you in advance
Two loops are needed here: an outer loop over the list and an inner loop over the split strings.
>>> new = [float(v) for inner in a for v in inner.split(",")]
>>> new
[1.0, 0.0, 1.2, 2.0, -1.5, 1.2, 3.0, -1.5, 0.0, 4.0, 0.0, 0.0, 5.0, 1.5, 1.2]
EDIT:
To make it accept any literal, for example to differentiate between int and float:
>>> from ast import literal_eval
>>> new = [literal_eval(v) for inner in a for v in inner.split(",")]
>>> new
[1, 0, 1.2, 2, -1.5, 1.2, 3, -1.5, 0, 4, 0, 0, 5, 1.5, 1.2]
You can do this with map:
In [1]: l = ['1,0,1.2', '2,-1.5,1.2', '3,-1.5,0', '4,0,0', '5,1.5,1.2']
In [2]: list(map(float, ','.join(l).split(',')))
Out[2]: [1.0, 0.0, 1.2, 2.0, -1.5, 1.2, 3.0, -1.5, 0.0, 4.0, 0.0, 0.0, 5.0, 1.5, 1.2]
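The same flattening can be sketched with itertools.chain, which avoids building the joined string. flatten_to_floats is a hypothetical helper name, and grouping the result back into rows of three is an assumption about what the later calculations need.

```python
from itertools import chain

rows = ['1,0,1.2', '2,-1.5,1.2', '3,-1.5,0', '4,0,0', '5,1.5,1.2']


def flatten_to_floats(lines, sep=","):
    """Hypothetical helper: split every line on sep and convert each token."""
    tokens = chain.from_iterable(line.split(sep) for line in lines)
    return [float(tok) for tok in tokens]


values = flatten_to_floats(rows)

# Optionally regroup into records of three numbers each (assumed layout).
records = [values[i:i + 3] for i in range(0, len(values), 3)]
```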

Adjusting the size of marker in legend in plt.fill

I want to plot a line against a dataset that is actually two lines that are very close together, so I am plotting it with plt.fill. Now, I want to adjust the width of the legend entry for the fill plot so that it is similar to the width of the line in plt.plot. I tried using linewidth=0.2, but with linewidth I can only change the width in the plt.plot legend.
I changed the original data to the area between these arrays, because I didn't want to upload the data file. Also changing the data of a measurement that I am comparing my theory against seems disingenuous, so I probably shouldn't do that.
import matplotlib.pyplot as plt
import numpy as np
a_seq = [0.,0.5, 1.5 ,2., 2.5, 3., 3.5, 4., 4., 4. ]
l1 = [0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0]
l2 = [0.0, 0.2, 1.1, 1.2, 1.2, 2.1, 3.2, 4.0, 4.0, 4.0]
plt.plot(np.arange(0,len(a_seq)),a_seq)
#plt.plot(l1)
#plt.plot(l2)
plt.fill_between( np.arange(0,len(l1)), l2, l1, alpha=0.5)
plt.legend(['line3','line4'])
plt.show()
You need to access the legend's handle objects through legendHandles (spelled legend_handles in recent Matplotlib releases) and then set the height of the patch with set_height, like so:
import matplotlib.pyplot as plt
import numpy as np
a_seq = [0.,0.5, 1.5 ,2., 2.5, 3., 3.5, 4., 4., 4. ]
l1 = [0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0]
l2 = [0.0, 0.2, 1.1, 1.2, 1.2, 2.1, 3.2, 4.0, 4.0, 4.0]
plt.plot(np.arange(0,len(a_seq)),a_seq)
plt.fill_between( np.arange(0,len(l1)), l2, l1, alpha=0.5)
legend = plt.legend(['line3','line4'])
legend.legendHandles[1].set_height(2.0)
plt.show()
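An alternative that does not depend on the handle attribute is to pass a proxy artist to the legend. This is a sketch, not the answer's method: it assumes a Line2D proxy with the band's color is an acceptable legend entry, and it selects the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

a_seq = [0., 0.5, 1.5, 2., 2.5, 3., 3.5, 4., 4., 4.]
l1 = [0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 4.0, 4.0]
l2 = [0.0, 0.2, 1.1, 1.2, 1.2, 2.1, 3.2, 4.0, 4.0, 4.0]
x = np.arange(len(a_seq))

fig, ax = plt.subplots()
(line,) = ax.plot(x, a_seq, color="C0")
ax.fill_between(x, l2, l1, color="C1", alpha=0.5)

# Proxy artist: a line in the band's color with the same width as the plot line.
band_proxy = Line2D([0], [0], color="C1", alpha=0.5,
                    linewidth=line.get_linewidth())
legend = ax.legend([line, band_proxy], ["line3", "line4"])
```

Both legend entries are then drawn as lines of equal width, and the width can be tuned through the linewidth argument of the proxy.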

How to calculate sigma_1 and sigma_2 with Covariance Matrix

I'm reading this article.
In the "Covariance matrix & SVD" section,
there are two \sigmas, which are \sigma_1 and \sigma_2.
Those values are 14.4 and 0.19, respectively.
How can I get these values?
I already calculated the covariance matrix with Numpy:
import numpy as np
a = np.array([[2.9, -1.5, 0.1, -1.0, 2.1, -4.0, -2.0, 2.2, 0.2, 2.0, 1.5, -2.5],
[4.0, -0.9, 0.0, -1.0, 3.0, -5.0, -3.5, 2.6, 1.0, 3.5, 1.0, -4.7]])
cov_mat = (a.shape[1] - 1) * np.cov(a)
print(cov_mat)
# b = np.std(a, axis=1)**0.5
b = (a.shape[1] - 1) * np.std(a, axis=1)**0.5
# b = np.std(cov_mat, axis=1)
# b = np.std(cov_mat, axis=1)**0.5
print(b)
The result is:
[[ 53.46 73.42]
[ 73.42 107.16]]
[15.98102431 19.0154037 ]
No matter what I do, I can't get 14.4 and 0.19.
Are they just wrong values?
Please help me. Thank you in advance.
I don't know why you "un-sampled" your covariance, but the original np.cov output is what you want to get the eigenvalues of:
np.linalg.eigvalsh(np.cov(a))
Out[]: array([ 0.19403958, 14.4077786 ])
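To connect this with the SVD mentioned in the question, the sketch below computes the eigenvalues of the covariance matrix and checks them against the squared singular values of the centered data divided by n - 1. The variable names are illustrative.

```python
import numpy as np

a = np.array([[2.9, -1.5, 0.1, -1.0, 2.1, -4.0, -2.0, 2.2, 0.2, 2.0, 1.5, -2.5],
              [4.0, -0.9, 0.0, -1.0, 3.0, -5.0, -3.5, 2.6, 1.0, 3.5, 1.0, -4.7]])
n = a.shape[1]

# np.cov already divides by n - 1, so no rescaling is needed.
cov = np.cov(a)
eigvals = np.linalg.eigvalsh(cov)  # returned in ascending order

# Same numbers via the SVD of the row-centered data.
centered = a - a.mean(axis=1, keepdims=True)
s = np.linalg.svd(centered, compute_uv=False)  # returned in descending order
eig_from_svd = np.sort(s ** 2 / (n - 1))

print(eigvals)
print(eig_from_svd)
```

Multiplying np.cov(a) by n - 1, as in the question, scales the eigenvalues by the same factor, which is why 14.4 and 0.19 did not appear.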

Plotting mesh data from vtk python using matplotlib

The following question makes use of vtk python, but what I am attempting to do should not require any knowledge of vtk, because I have converted the data I wish to plot into the numpy arrays described below. If anyone does know of an improvement to the way I go about processing the vtk data into numpy, please let me know!
I have some data that I have extracted using vtk python. The data consists of a 3D unstructured grid and has several 'blocks'. The block I am interested in is block0. The data is contained at each cell rather than at each point. I wish to plot a contourf plot of a scalar variable on this grid using matplotlib. In essence my problem comes down to the following:
Given a set of cell faces with known vertices in space and a known scalar field variable, create a contour plot as one would get if one had created a numpy.meshgrid and used plt.contourf/plt.pcolormesh etc. Basically I post process my vtk data like so:
numCells = block0.GetCells().GetNumberOfCells()
# Array of the 8 vertices that make up a cell in 3D
cellPtsArray = np.zeros((numCells,8,3))
# Array of the 4 vertices that make up a cell face
facePtsArray = np.zeros((numCells,4,3))
#Array to store scalar field value from each cell
valueArray = np.zeros((numCells,1))
for i in xrange(numCells):
    cell = block0.GetCell(i)
    numCellPts = cell.GetNumberOfPoints()
    for j in xrange(numCellPts):
        cellPtsArray[i,j,:] = block0.GetPoint(cell.GetPointId(j))
    valueArray[i] = block0.GetCellData().GetArray(3).GetValue(i)
    xyFacePts = cell.GetFaceArray(3)
    facePtsArray[i,:,:] = cellPtsArray[i,xyFacePts,:]
Now I wish to create a contour plot of this data (fill each cell in space according to an appropriate colormap of the scalar field variable). Is there a good built in function in matplotlib to do this? Note that I cannot use any form of automatic triangulation-the connectivity of the mesh is already specified by facePtsArray by the fact that connections between points of a cell have been ordered correctly (see my plot below)
Here is some test data:
import numpy as np
import matplotlib.pyplot as plt
# An example of the array containing the mesh information: In this case the
# dimensionality is (9,4,3) denoting 9 adjacent cells, each with 4 vertices and
# each vertex having (x,y,z) coordinates.
facePtsArray = np.asarray([[[0.0, 0.0, 0.0 ],
[1.0, 0.0, 0.0 ],
[1.0, 0.5, 0.0 ],
[0.0, 0.5, 0.0 ]],
[[0.0, 0.5, 0.0 ],
[1.0, 0.5, 0.0 ],
[1.0, 1.0, 0.0 ],
[0.0, 1.0, 0.0 ]],
[[0.0, 1.0, 0.0 ],
[1.0, 1.0, 0.0 ],
[1.0, 1.5, 0.0 ],
[0.0, 1.5, 0.0 ]],
[[1.0, 0.0, 0.0 ],
[2.0, -0.25, 0.0],
[2.0, 0.25, 0.0],
[1.0, 0.5, 0.0]],
[[1.0, 0.5, 0.0],
[2.0, 0.25, 0.0],
[2.0, 0.75, 0.0],
[1.0, 1.0, 0.0]],
[[1.0, 1.0, 0.0],
[2.0, 0.75, 0.0],
[2.0, 1.25, 0.0],
[1.0, 1.5, 0.0]],
[[2.0, -0.25, 0.0],
[2.5, -0.75, 0.0],
[2.5, -0.25, 0.0 ],
[2.0, 0.25, 0.0]],
[[2.0, 0.25, 0.0],
[2.5, -0.25,0.0],
[2.5, 0.25, 0.0],
[2.0, 0.75, 0.0]],
[[2.0, 0.75, 0.0],
[2.5, 0.25, 0.0],
[2.5, 0.75, 0.0],
[2.0, 1.25, 0.0]]])
valueArray = np.random.rand(9) # Scalar field values for each cell
plt.figure()
for i in xrange(9):
    plt.plot(facePtsArray[i,:,0], facePtsArray[i,:,1], 'ko-')
plt.show()
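One possible way to fill each cell according to its scalar value, without any triangulation, is a PolyCollection built directly from the face vertices. This is only a sketch: the two-cell mesh and the values below are made-up stand-ins for facePtsArray and valueArray, the z coordinate is simply dropped, and the Agg backend is selected so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PolyCollection

# Hypothetical mesh in the same (numCells, 4, 3) layout as facePtsArray.
face_pts = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                      [1.0, 0.5, 0.0], [0.0, 0.5, 0.0]],
                     [[1.0, 0.0, 0.0], [2.0, -0.25, 0.0],
                      [2.0, 0.25, 0.0], [1.0, 0.5, 0.0]]])
values = np.array([0.2, 0.8])  # one made-up scalar per cell

fig, ax = plt.subplots()
# One polygon per cell, using only the x and y coordinates of each vertex.
polys = PolyCollection(list(face_pts[:, :, :2]), array=values,
                       cmap="viridis", edgecolors="k")
ax.add_collection(polys)
ax.autoscale_view()          # fit the axes limits to the polygons
fig.colorbar(polys, ax=ax)   # color scale for the cell values
```

Each cell is drawn with a single flat color, so the result resembles pcolormesh on an irregular grid rather than a smooth contourf.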

How to return a view of several columns in numpy structured array

I can see several columns (fields) at once in a numpy structured array by indexing with a list of the field names, for example
import numpy as np
a = np.array([(1.5, 2.5, (1.0,2.0)), (3.,4.,(4.,5.)), (1.,3.,(2.,6.))],
dtype=[('x',float), ('y',float), ('value',float,(2,2))])
print a[['x','y']]
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]
print a[['x','y']].dtype
#[('x', '<f4') ('y', '<f4')])
But the problem is that it seems to be a copy rather than a view:
b = a[['x','y']]
b[0] = (9.,9.)
print b
#[(9.0, 9.0) (3.0, 4.0) (1.0, 3.0)]
print a[['x','y']]
#[(1.5, 2.5) (3.0, 4.0) (1.0, 3.0)]
If I only select one column, it's a view:
c = a['y']
c[0] = 99.
print c
#[ 99. 4. 3. ]
print a['y']
#[ 99. 4. 3. ]
Is there any way I can get the view behavior for more than one column at once?
I have two workarounds, one is to just loop through the columns, the other is to create a hierarchical dtype, so that the one column actually returns a structured array with the two (or more) fields that I want. Unfortunately, zip also returns a copy, so I can't do:
x = a['x']; y = a['y']
z = zip(x,y)
z[0] = (9.,9.)
You can create a dtype object that contains only the fields you want, and use numpy.ndarray() to create a view of the original array:
import numpy as np
strc = np.zeros(3, dtype=[('x', int), ('y', float), ('z', int), ('t', "i8")])
def fields_view(arr, fields):
    dtype2 = np.dtype({name:arr.dtype.fields[name] for name in fields})
    return np.ndarray(arr.shape, dtype2, arr, 0, arr.strides)
v1 = fields_view(strc, ["x", "z"])
v1[0] = 10, 100
v2 = fields_view(strc, ["y", "z"])
v2[1:] = [(3.14, 7)]
v3 = fields_view(strc, ["x", "t"])
v3[1:] = [(1000, 2**16)]
print(strc)
here is the output:
[(10, 0.0, 100, 0L) (1000, 3.14, 7, 65536L) (1000, 3.14, 7, 65536L)]
Building on @HYRY's answer, you could also use ndarray's getfield method:
def fields_view(array, fields):
    return array.getfield(np.dtype(
        {name: array.dtype.fields[name] for name in fields}
    ))
As of Numpy version 1.16, the code you propose will return a view. See 'NumPy 1.16.0 Release Notes->Future Changes->multi-field views return a view instead of a copy' on this page:
https://numpy.org/doc/stable/release/1.16.0-notes.html#multi-field-views-return-a-view-instead-of-a-copy
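The following sketch checks that behavior; it assumes NumPy 1.16 or later. The field names are illustrative, and repack_fields from numpy.lib.recfunctions is used only to show how a packed copy might be obtained when the padded view dtype is inconvenient.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.array([(1.5, 2.5, 7), (3.0, 4.0, 8), (1.0, 3.0, 9)],
             dtype=[('x', 'f8'), ('y', 'f8'), ('n', 'i8')])

# Multi-field indexing: a view on NumPy >= 1.16 (assumed version).
b = a[['x', 'y']]
b[0] = (9.0, 9.0)            # writes through to a

print(a['x'][0], a['y'][0])  # both changed if b is a view
print(b.dtype.itemsize)      # the view keeps the full record size

# A packed, independent copy containing only the selected fields.
packed = rfn.repack_fields(b)
print(packed.dtype.itemsize)
```

Because the view keeps the original offsets and item size, code that relies on a compact memory layout may want the repacked copy instead.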
I don't think there is an easy way to achieve what you want. In general, you cannot take an arbitrary view into an array. Try the following:
>>> a
array([(1.5, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
(3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
(1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])],
dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> a.view(float)
array([ 1.5, 2.5, 1. , 2. , 1. , 2. , 3. , 4. , 4. , 5. , 4. ,
5. , 1. , 3. , 2. , 6. , 2. , 6. ])
The float view of your record array shows you how the actual data is stored in memory. A view into this data has to be expressible as a combination of a shape, strides and offset into the above data. So if you wanted, for instance, a view of 'x' and 'y' only, you could do the following:
>>> from numpy.lib.stride_tricks import as_strided
>>> b = as_strided(a.view(float), shape=a.shape + (2,),
strides=a.strides + a.view(float).strides)
>>> b
array([[ 1.5, 2.5],
[ 3. , 4. ],
[ 1. , 3. ]])
The as_strided call does the same thing as the following, which is perhaps easier to understand:
>>> bb = a.view(float).reshape(a.shape + (-1,))[:, :2]
>>> bb
array([[ 1.5, 2.5],
[ 3. , 4. ],
[ 1. , 3. ]])
Either of these is a view into a:
>>> b[0,0] =0
>>> a
array([(0.0, 2.5, [[0.0, 2.0], [1.0, 2.0]]),
(3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
(1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])],
dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
>>> bb[2, 1] = 0
>>> a
array([(0.0, 2.5, [[0.0, 2.0], [1.0, 2.0]]),
(3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
(1.0, 0.0, [[2.0, 6.0], [2.0, 6.0]])],
dtype=[('x', '<f8'), ('y', '<f8'), ('value', '<f8', (2, 2))])
It would be nice if either of these could be converted into a record array, but numpy refuses to do so, for reasons that are not all that clear to me:
>>> b.view([('x',float), ('y',float)])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: new type not compatible with array.
Of course what works (sort of) for 'x' and 'y' would not work, for instance, for 'x' and 'value', so in general the answer is: it cannot be done.
In my case 'several columns' happens to be equal to two columns of the same data type, where I can use the following function to make a view:
def make_view(arr, fields, dtype):
    offsets = [arr.dtype.fields[f][1] for f in fields]
    offset = min(offsets)
    stride = max(offsets)
    return np.ndarray((len(arr), 2), buffer=arr, offset=offset, strides=(arr.strides[0], stride-offset), dtype=dtype)
I think this boils down to the same thing @Jamie said: it cannot be done in general, but for two columns of the same dtype it can. The result of this function is not a dict but a good old-fashioned numpy array.
