Numpy concatenating ints to string - python

I have a rather big numpy array of M*N ints, and I want to end up with an array of M strings, each containing the N corresponding ints concatenated.
I tried using a view but this is probably not the way to go with numpy.

Hope this is what you want:
numpy.apply_along_axis(numpy.array_str, 1, array)
(axis=1 so that each of the M rows is formatted into one string.) Look at the documentation of apply_along_axis
http://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_along_axis.html
and array_str
http://docs.scipy.org/doc/numpy/reference/generated/numpy.array_str.html
for a deeper understanding.

If you concatenate ints to strings, the result will no longer be a numpy array of ints; you end up with string elements instead.
Here is an example that concatenates 'a' to every entry of a numpy zero array:
>>> np.zeros(10)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> [str(i)+'a' for i in np.zeros(10)]
['0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a']
>>>

Without an example, your request is unclear. But here's one way of understanding it
In [13]: X=np.arange(12).reshape(3,4)
In [14]: np.array([''.join([str(i) for i in x]) for x in X])
Out[14]:
array(['0123', '4567', '891011'],
      dtype='|S6')
I have a 3x4 array; I convert each element to a string using str(i), and use join to concatenate the strings of a row into one longer string.
That's not a very satisfying answer, especially when joining '9' to '10'. Of course it could be refined by elaborating on the 'int' to 'string' formatting (i.e. fixed width), maybe adding delimiters in the 'join', etc.
In [21]: np.array([','.join(['{:*^8}'.format(i) for i in x]) for x in X])
Out[21]:
array(['***0****,***1****,***2****,***3****',
       '***4****,***5****,***6****,***7****',
       '***8****,***9****,***10***,***11***'],
      dtype='|S35')
A view would only work if what you want is some sort of string to bytes representation, or str to char.
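If you'd rather avoid the Python-level loop over rows, a possible sketch (not from the answers above) converts the ints to strings and concatenates column by column with np.char.add; it still loops over the N columns, but is vectorized over the M rows:

import numpy as np

X = np.arange(12).reshape(3, 4)

S = X.astype(str)                      # element-wise int -> string conversion
out = S[:, 0]
for col in range(1, S.shape[1]):
    out = np.char.add(out, S[:, col])  # vectorized per-row string concatenation
# out -> ['0123', '4567', '891011']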

Related

Calculate the mean over a mixed data structure

I have a list of lists that looks something like:
data = [
    [1., np.array([2., 3., 4.]), ...],
    [5., np.array([6., 7., 8.]), ...],
    ...
]
where each of the internal lists is the same length and contains the same data type/shape at each entry. I would like to calculate the mean over corresponding entries and return something of the same structure as the internal lists. For example, in the above case (assuming only two entries) I want the result to be:
[3., np.array([4., 5., 6.]), ...]
What is the best way to do this with Python?
data is a list, so a list comprehension seems like a natural option. Even if it were a numpy array, given that it's a jagged array, it wouldn't benefit from being wrapped in an ndarray anyway, so a list comp would still be the best option, in my opinion.
Anyway, use zip() to "transpose" data and call np.mean() in a loop to find the mean along the first axis.
[np.mean(x, axis=0) for x in zip(*data)]
# [3.0, array([4., 5., 6.]), array([[2., 2.], [2., 2.]])]
If you have a list exactly like the one shown in the example, you can do it with the following code.
First we declare some variables to store our results:
number_sum = 0
list_sum = np.array([0., 0., 0.])
It is important to initialize list_sum with one zero per entry of the inner arrays, and as floats so the sums are not truncated. That is, if each array in data contains 5 elements, it should be list_sum = np.array([0., 0., 0., 0., 0.]).
The next step is to sum all the elements in data: first we accumulate the scalar values, and then we add each element of the arrays, as follows:
for number, nparray in data:
    number_sum += number
    for index, item in enumerate(nparray):
        list_sum[index] += item
Since we know how the variable data is structured (each entry is made up of a scalar value and an np.array), we can do the addition that way. Be careful with the computational cost, though: for longer arrays this can get expensive, since two for loops are nested.
Finally, you can check that if you divide the sum of the elements by the length of data you get the desired value:
print(number_sum/len(data))
print(list_sum/len(data))
Now you just have to put those two new values into a new list. I hope it helps, greetings and good luck!
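Putting those pieces together, a minimal runnable sketch (assuming the two-row data from the question, with just the scalar and the length-3 array per entry) could look like this:

import numpy as np

data = [
    [1., np.array([2., 3., 4.])],
    [5., np.array([6., 7., 8.])],
]

number_sum = 0.0
list_sum = np.zeros(3)              # float accumulator, one slot per array entry

for number, nparray in data:
    number_sum += number            # accumulate the scalar values
    for index, item in enumerate(nparray):
        list_sum[index] += item     # accumulate each array entry

result = [number_sum / len(data), list_sum / len(data)]
print(result)                       # [3.0, array([4., 5., 6.])]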
The following works:
import numpy as np
data = [
    [1., np.array([2., 3., 4.]), np.array([[1., 1.], [1., 1.]])],
    [5., np.array([6., 7., 8.]), np.array([[3., 3.], [3., 3.]])],
]
number_of_samples = len(data)
number_of_elements = len(data[0])
means = []
for ielement in range(number_of_elements):
    mean_list = []
    for isample in range(number_of_samples):
        mean_list.append(data[isample][ielement])
    mean_list = np.stack(mean_list)
    mean = mean_list.mean(axis=0)
    means.append(mean)
print(means)
but it is a bit ugly, nests for loops, and does not seem very Pythonic. Any improvements over this are welcome.

Populate with new value(s) to a fixed-shape numpy array filled with zeros

Given that a numpy array is stored contiguously, if we try to append or extend to it then this happens not in-place but, instead, a new copy of the array is created with adequate 'room' for the append or extend to occur contiguously (see https://stackoverflow.com/a/13215559/3286832).
To avoid that, and assuming we are lucky enough to know the specific number of elements we expect the array to have, we can create a numpy array with a fixed size filled with zeros:
import numpy as np
a = np.zeros(shape=(100,)) # [0. 0. 0. ... 0. 0. 0.]
Say that we want to populate this array element by element with new values (e.g. provided by the user), editing the array in-place:
pos = 0
a[pos] = 0.002 # [0.002 0. 0. ... 0. 0. 0.]
pos = pos + 1
a[pos] = 0.101 # [0.002 0.101 0. ... 0. 0. 0.]
# etc.
pos = -1
a[pos] = 42.00 # [0.002 0.101 ... ... ... 42.]
Question:
Is there a way to keep track of the next available position pos (i.e. the first position not yet populated with an input value) instead of manually incrementing pos each time?
Is there a way of achieving this efficiently, preferably in numpy? Or is there a way of achieving it in another Python library (e.g. scipy or pandas)?
(I edited the question according to the comments and initial answers, which pointed out how unclearly my initial question was phrased - I hope it is clearer now.)
Actually, your question is still confusing to me. How do you define the new value you want to insert at the next position? Is it coming from outside of your code? Do you have all the new values for your array, or only part of them?
You can probably use numpy slices, which are designed exactly for fast in-place updates of an array; however, I'm not sure that is what you want to do.
Some samples for you:
>>> import numpy as np
>>> a = np.zeros(shape=(10,))
>>> a
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> a[3:6] += 1
>>> a
array([0., 0., 0., 1., 1., 1., 0., 0., 0., 0.])
>>> a[:4] += .001
>>> a
array([1.000e-03, 1.000e-03, 1.000e-03, 1.001e+00, 1.000e+00, 1.000e+00,
       0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00])
>>> a[3:5] = [2, 1]
>>> a
array([1.e-03, 1.e-03, 1.e-03, 2.e+00, 1.e+00, 1.e+00, 0.e+00, 0.e+00,
       0.e+00, 0.e+00])
>>>
If I understand you correctly, you need some kind of circular buffer. Python has collections.deque for this purpose.
Here is my custom implementation of a circular buffer using h5py, but you can change it to numpy.
Update: as already mentioned in the comments, it is impossible to track changes to an np.array out of the box. Instead, you can implement your own class and track all the necessary state there (see my implementation as an example, which concatenates arrays to extend their size). I'd suggest using a Python list if you need appending, or a deque if you need a fixed size. Both can then be converted to an np.array.
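For the simpler fixed-size case in the question (a pre-allocated array that just needs to remember the next free slot), a minimal sketch of such a wrapper class might look like this; FixedBuffer is a hypothetical name, not an existing library class:

import numpy as np

class FixedBuffer:
    def __init__(self, size):
        self.data = np.zeros(size)   # fixed-shape array, filled with zeros
        self.pos = 0                 # index of the next unpopulated element

    def append(self, value):
        if self.pos >= self.data.size:
            raise IndexError("buffer is full")
        self.data[self.pos] = value  # in-place write, no copy of the array
        self.pos += 1

buf = FixedBuffer(100)
buf.append(0.002)
buf.append(0.101)
# buf.data -> [0.002 0.101 0. ... 0. 0. 0.], buf.pos -> 2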

Adding arrays which may contain 'None'-entries

I have a question regarding the addition of numpy arrays.
Let's assume I have defined a function
def foo(a, b):
    return a + b
that takes two arrays of the same shape and simply returns their sum.
Now, I have to deal with the case that some of the entries may be None.
I would like to treat those entries as if they were float(0), such that
[1.0,None,2.0] + [1.0,2.0,2.0]
would add up to
[2.0,2.0,4.0]
Can you provide me with an already-implemented solution?
TIA
I suggest numpy.nan_to_num:
>>> np.nan_to_num(np.array([1.0, None, 2.0], dtype=float))
array([ 1., 0., 2.])
Then,
>>> def foo(a, b):
...     return np.nan_to_num(a) + np.nan_to_num(b)
...
>>> foo(np.array([1.0, None, 2.0], dtype=float), np.array([1.0, 2.0, 2.0], dtype=float))
array([ 2., 2., 4.])
Usually, the answer to this is to use an array of floats, rather than an array of arbitrary objects, and then use np.nan instead of None. NaN has well-defined semantics for arithmetic. (Also, using an array of floats instead of objects will make your code significantly more time and space efficient.)
Notice that you don't have to manually convert None to np.nan if you build the array with an explicit dtype of float or np.float64. Both of these are equivalent:
>>> a = np.array([1.0,np.nan,2.0])
>>> a = np.array([1.0,None,2.0],dtype=float)
Which means that if, for some reason, you really needed arrays of arbitrary objects with actual None in them, you could do that, and then convert it to an array of floats on the fly to get the benefits of NaN:
>>> a.astype(float) + b.astype(float)
At any rate, in this case, just using NaN isn't sufficient:
>>> a = np.array([1.0,np.nan,2.0])
>>> b = np.array([1.0,2.0,2.0])
>>> a + b
array([ 2., nan, 4.])
That's because the semantics of NaN are that the result of any operation with NaN returns NaN. But you want to treat it as 0.
But it does make the problem easy to solve. The simplest way to solve that is with the function nan_to_num:
>>> np.nan_to_num(a)
array([1., 0., 2.])
>>> np.nan_to_num(a) + np.nan_to_num(b)
array([2., 2., 4.])
You can use column_stack to concatenate both arrays along the second axis, then use np.nansum() to sum the items over that axis.
In [15]: a = np.array([1.0, None, 2.0], dtype=float)
# Using dtype here is necessary to convert None to np.nan
In [16]: b = np.array([1.0,2.0,2.0])
In [17]: np.nansum(np.column_stack((a, b)), 1)
Out[17]: array([2., 2., 4.])
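Wrapped into the foo function from the question, a hedged sketch of this approach could be:

import numpy as np

def foo(a, b):
    # None entries become nan when cast to float, and nansum treats nan as 0.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.nansum(np.column_stack((a, b)), axis=1)

print(foo([1.0, None, 2.0], [1.0, 2.0, 2.0]))   # [2. 2. 4.]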

How to slice a np.array using a list of unknown length?

I am probably using the wrong names/notation (and an answer probably exists on SO, but I can't find it). Please help me clarify so that I can update the post, and help people like me in the future.
I have an array A of unknown dimension n, and a list of indexes of unknown length l, where l<=n.
I want to be able to select the slice of A that corresponds to the indexes in l. I.e. I want:
A = np.zeros([3,4,5])
idx = [1,3]
B = # general command I am looking for
B_bad = A[idx] # shape = (2,4,5), not what I want!
B_manual = A[idx[0], idx[1]] # shape = (5), what I want, but as a general expression.
# it is okay that the indexes are in order i.e. 0, 1, 2, ...
You need a tuple:
>>> A[tuple(idx)]
array([0., 0., 0., 0., 0.])
>>> A[tuple(idx)].shape
(5,)
Indexing with a list doesn't have the same meaning. See numpy's documentation on indexing for more information.
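A small demonstration of the difference, using the arrays from the question:

import numpy as np

A = np.zeros([3, 4, 5])
idx = [1, 3]

# A tuple indexes one position per leading axis, leaving the remaining axes whole...
B = A[tuple(idx)]
print(B.shape)                                 # (5,)
print(np.array_equal(B, A[idx[0], idx[1]]))    # True

# ...while a list fancy-indexes along the first axis only.
print(A[idx].shape)                            # (2, 4, 5)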

How does numpy array typing interact with object?

I am currently trying to implement a datatype that stores floats in a numpy array. However, trying to build an array whose elements have this type but various lengths seems to break the code: it amounts to assigning a sequence to an array element, which is not possible.
One can bypass this by using the data type object instead of float. Why is that? How could one resolve this problem using floats without creating a sequence?
Example code that does not work.
from numpy import *
foo= dtype(float32, [])
x = array([[2., 3.], [3.]], dtype=foo)
Example code that does work:
from numpy import *
foo= dtype(float32, [])
x = array([[2., 3.], [3., 2.]], dtype=foo)
Example code that does work, I try to replicate for float:
from numpy import *
foo= dtype(object, [])
x = array([[2., 3.], [3.]], dtype=foo)
The object dtype in Numpy simply creates an array of pointers to Python objects. This means you lose the performance advantage you usually get from Numpy, but it's still sometimes useful to do this.
Your last example creates a one-dimensional Numpy array of length two, so that's two pointers to Python objects. Both of these objects happen to be lists, and Python lists have arbitrary dynamic length.
I don't know what you were trying to achieve with this, but note that
>>> np.dtype(np.float32, []) == np.float32
True
Arrays require the same number of elements for each row. So, if you feed a list of lists into numpy and all sublists have the same number of elements, it will happily convert it to an array. This is why your second example works.
If the sublists are not the same length, then each sublist is treated as a single object and you end up with a 1D array of objects. This is why your third example works. Your first example doesn't work because you try to cast a sequence of objects to floats, which isn't possible.
In short, you can't create an array of floats if your sublists are of different lengths. At best, you can create an array of 1D arrays, since they are still considered objects.
>>> x = np.array(list(map(np.array, [[2., 3.], [3.]])))
>>> x
array([array([ 2., 3.]), array([ 3.])], dtype=object)
>>> x[0]
array([ 2., 3.])
>>> x[0][1]
3.0
>>> # but you can't do this
>>> x[0,1]
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    x[0,1]
IndexError: too many indices for array
If you're bent on creating a float 2D array, you have to extend all your sublists to the same size with None, which will be converted to np.nan.
>>> lists = [[2., 3.], [3.]]
>>> max_len = max(map(len, lists))
>>> for i, sublist in enumerate(lists):
...     sublist = sublist + [None] * (max_len - len(sublist))
...     lists[i] = sublist
...
>>> np.array(lists, dtype=np.float32)
array([[ 2.,  3.],
       [ 3., nan]], dtype=float32)
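A possible alternative sketch for the padding step (not from the original answer) uses itertools.zip_longest to pad with NaN directly:

from itertools import zip_longest
import numpy as np

lists = [[2., 3.], [3.]]
padded = list(zip_longest(*lists, fillvalue=np.nan))  # transpose and pad short rows
arr = np.array(padded, dtype=np.float32).T            # transpose back: one row per sublist
# arr -> [[ 2.  3.], [ 3. nan]]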
