Converting an array to a float, how to reverse the process? - python

Suppose we start with an integer numpy array with integers between 0 and 99, i.e.
x = np.array([[1,2,3,1],[10,5,0,2]],dtype=int)
Now we want to represent rows in this array with a single unique value. One simple way to do this is representing it as a floating number. An intuitive way to do this is
rescale = np.power(10,np.arange(0,2*x.shape[1],2)[::-1],dtype=float)
codes = np.dot(x,rescale)
where we exploit that the integers have at most 2 digits. (I'm casting rescale as a float to avoid exceeding the maximum value of int in case the entries of x have more elements; this is not very elegant)
This returns
array([ 1020301., 10050002.])
How can this process be reversed to obtain x again?
I'm thinking of converting codes to a string, then split the string every 2nd entry. I'm not too familiar with these string operations, especially when they have to be executed on all entries of an array simultaneously. A problem is also that the first number has a varying number of digits, so trailing zeros have to be added in some way.
Maybe something simpler is possible using some divisions or rounding, or perhaps represting the rows of the array in a different manner. Important is that at least the initial conversion is fast and vectorized.
Suggestions are welcome.

First, you need to find the correct number of columns:
number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
Note that if your first column is always 0, there is no way with your code to know it even existed: [[0, 1], [0, 2]] -> [1., 2.] -> [[1], [2]] or [[0, 0, 0, 1], [0, 0, 0, 2]]. It might be something to consider.
Anyways, here is a mockup for the string way:
import math
from math import ceil

def decode_with_string(codes):
    number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
    str_format = '{:0%dd}' % (2 * number_of_cols)  # prepare to format numbers as strings
    return [[int(str_format.format(int(code))[2*i:2*i+2])  # extract the wanted digits
             for i in range(number_of_cols)]                # for all columns
            for code in codes]                              # for all rows
But you can also compute the numbers directly:
import math
from math import ceil, floor

def decode_direct(codes):
    number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
    return [[floor(code / (100 ** index)) % 100
             for index in range(number_of_cols - 1, -1, -1)]
            for code in codes]
Example:
>>> codes = [ 1020301., 10050002.]
>>> number_of_cols = max(ceil(math.log(v, 100)) for v in codes)
>>> print(number_of_cols)
4
>>> print(decode_with_string(codes))
[[1, 2, 3, 1], [10, 5, 0, 2]]
>>> print(decode_direct(codes))
[[1, 2, 3, 1], [10, 5, 0, 2]]
Here is a numpy solution (codes needs to be a numpy array for this one):
>>> codes = np.asarray(codes)
>>> divisors = np.power(0.01, np.arange(number_of_cols-1, -1, -1))
>>> x = np.mod(np.floor(divisors*codes.reshape((codes.shape[0], 1))), 100)
Finally, you say you use float in case of overflow of int. First, the mantissa of floating point numbers is also limited, so you don't eliminate the risk of overflow. Second, in Python 3, integers actually have unlimited precision.
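For reference, the encode and decode steps above can be combined into a round trip. This is only a sketch: the function names are mine, and I keep the base-100 weights from the question.

```python
import numpy as np

def encode(x):
    # weights 100**(cols-1), ..., 100**0: each entry gets its own 2-digit slot
    rescale = np.power(100, np.arange(x.shape[1])[::-1], dtype=float)
    return x @ rescale

def decode(codes, number_of_cols):
    # peel off two digits at a time: divide by the right power of 100, mod 100
    divisors = np.power(0.01, np.arange(number_of_cols - 1, -1, -1))
    return np.mod(np.floor(divisors * codes[:, None]), 100).astype(int)

x = np.array([[1, 2, 3, 1], [10, 5, 0, 2]])
codes = encode(x)                      # array([ 1020301., 10050002.])
assert (decode(codes, x.shape[1]) == x).all()
```

As noted above, this cannot recover leading all-zero columns, and the float mantissa (15-16 significant decimal digits) limits how many 2-digit columns survive exactly.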

You could exploit that Numpy stores its arrays as contiguous blocks in memory. So storing the memory block as a byte string and remembering the shape of the array should be sufficient:
import numpy as np
x = np.array([[1,2,3,1],[10,5,0,2]], dtype=np.uint8)  # 8 bits are enough for 2 digits
x_sh = x.shape
# flatten the array and convert it to a byte string
xs = x.ravel().tobytes()
# convert back and reshape:
y = np.reshape(np.frombuffer(xs, np.uint8), x_sh)
The reason for flattening the array first is that you don't need to pay attention to the storage order of 2D arrays (C or FORTRAN order). Of course, you could also generate a string for each row separately:
import numpy as np
x = np.array([[1,2,3,1],[10,5,0,2]], dtype=np.uint8)  # 8 bits are enough for 2 digits
# conversion:
xss = [xr.tobytes() for xr in x]
# conversion back:
y = np.array([np.frombuffer(xs, np.uint8) for xs in xss])

Since your numbers are between 0 and 99, you should pad each of them to 2 digits: 0 becomes "00", 5 becomes "05" and 50 becomes "50". That way, all you need to do is repeatedly divide your number by 100 and you'll get the values back.
If you want to be able to detect [0,0,0] (which is currently indistinguishable from [0] or [0,...,0]) as well, add a 1 in front of your number: 1000000 is [0,0,0] and 100 is [0]. When your division returns 1, you know you've finished.
You can easily construct a string with that information and cast it to a number afterwards.
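A minimal sketch of that scheme in plain Python (the sentinel 1 and the helper names are my own):

```python
def encode_row(row):
    # prepend a sentinel 1 so leading zeros survive: [0, 0, 3] -> 1000003
    code = 1
    for v in row:
        assert 0 <= v <= 99
        code = code * 100 + v
    return code

def decode_row(code):
    # repeatedly divide by 100; stop when only the sentinel 1 is left
    row = []
    while code > 1:
        code, v = divmod(code, 100)
        row.append(v)
    return row[::-1]

assert decode_row(encode_row([0, 0, 3])) == [0, 0, 3]
assert decode_row(encode_row([10, 5, 0, 2])) == [10, 5, 0, 2]
```

With the sentinel, the code for a row of length m always lies in [100**m, 2*100**m), so the row length can also be recovered without knowing it in advance.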

Function Failing at Large List Sizes

I have a question: Starting with a 1-indexed array of zeros and a list of operations, for each operation add a value to each array element between two given indices, inclusive. Once all operations have been performed, return the maximum value in the array.
Example: n = 10, Queries = [[1,5,3],[4,8,7],[6,9,1]]
The following is the state of the array after applying each operation in turn; indices 1-5 have 3 added to them, and so on:
[0,0,0, 0, 0,0,0,0,0, 0]
[3,3,3, 3, 3,0,0,0,0, 0]
[3,3,3,10,10,7,7,7,0, 0]
[3,3,3,10,10,8,8,8,1, 0]
Finally you output the max value in the final list:
[3,3,3,10,10,8,8,8,1, 0]
My current solution:
def Operations(size, Array):
    ResultArray = [0]*size
    Values = [[i.pop(2)] for i in Array]
    for index, i in enumerate(Array):
        # New values = sum of the current values in the results array and
        # the operation's value, broadcast over a slice of equal length
        ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
    Result = max(ResultArray)
    return Result
def main():
    nm = input().split()
    n = int(nm[0])
    m = int(nm[1])
    queries = []
    for _ in range(m):
        queries.append(list(map(int, input().rstrip().split())))
    result = Operations(n, queries)

if __name__ == "__main__":
    main()
Example input: The first line contains two space-separated integers n and m, the size of the array and the number of operations.
Each of the next m lines contains three space-separated integers a,b and k, the left index, right index and summand.
5 3
1 2 100
2 5 100
3 4 100
Compiler Error at Large Sizes:
Runtime Error
Currently this solution works for smaller final lists of length 4,000, but it fails on test cases where the length is 10,000,000. I do not know why this is the case, and I cannot provide the example input since it is so massive. Is there anything obvious as to why it would fail on larger cases?
I think the problem is that you create too many intermediate throwaway lists here:
ResultArray[i[0]-1:i[1]] = list(map(sum, zip(ResultArray[i[0]-1:i[1]], Values[index]*len(ResultArray[i[0]-1:i[1]]))))
The slice ResultArray[i[0]-1:i[1]] produces a new list, and you build it twice; one copy exists only to get the size, which is a complete waste of resources. You then build another list with Values[index]*len(...), and finally compile all of that into yet another list that is thrown away once it is assigned into the original. That makes four throwaway lists. For example, if the slice size is 5,000,000, you are making four of those, or 20,000,000 elements of extra space, 15,000,000 of which you don't really need; and if your original list already has 10,000,000 elements, well, just do the math...
You can get the same result as your list(map(...)) with a list comprehension:
[v + Values[index][0] for v in ResultArray[i[0]-1:i[1]]]
That uses two fewer lists, and we can eliminate one more by making it a generator expression, since slice assignment does not require a list specifically, just something iterable:
(v + Values[index][0] for v in ResultArray[i[0]-1:i[1]])
I don't know whether slice assignment internally builds a list first, but hopefully it doesn't; with that, we are back to just one extra list.
Here is an example:
>>> a=[0]*10
>>> a
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
>>> a[1:5] = (3+v for v in a[1:5])
>>> a
[0, 3, 3, 3, 3, 0, 0, 0, 0, 0]
>>>
We can reduce it to zero extra lists (assuming that internally it doesn't make one) by using itertools.islice:
>>> import itertools
>>> a[3:7] = (1+v for v in itertools.islice(a,3,7))
>>> a
[0, 3, 3, 4, 4, 1, 1, 0, 0, 0]
>>>
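Putting the generator idea back into the original function might look like this (a sketch, with the helpers renamed; whether slice assignment materialises the iterable internally is, as said above, up to the implementation):

```python
from itertools import islice

def operations(size, queries):
    result = [0] * size
    for a, b, k in queries:
        # add k over result[a-1:b] without building throwaway lists ourselves
        result[a - 1:b] = (v + k for v in islice(result, a - 1, b))
    return max(result)

# the example from the question: n = 10, queries = [[1,5,3],[4,8,7],[6,9,1]]
assert operations(10, [[1, 5, 3], [4, 8, 7], [6, 9, 1]]) == 10
```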

Assigning to all entries whose indices sum to some value

I have an array X of binary numbers and shape (2, 2, ..., 2), and would like to assign the value 1 to all entries whose indices sum to 0 modulo 2 and the value 0 to the rest.
For example, if we had X.shape = (2, 2, 2) then I would like to assign 1 to X[0, 0, 0], X[0, 1, 1], X[1, 0, 1], X[1, 1, 0] and 0 to the other 4 entries.
What is the most efficient way of doing this? I assume I should create this array with the np.bool datatype, so the solution should work with that in mind.
Here are a direct method and a tricksy one. The tricksy one uses bit packing and exploits certain repetitive patterns. For large n this gives a considerable speedup (>50× for n=19).
import functools as ft
import numpy as np

def direct(n):
    I = np.arange(2, dtype='u1')
    return ft.reduce(np.bitwise_xor, np.ix_(I[::-1], *(n-1)*(I,)))

def smartish(n):
    assert n >= 6
    b = np.empty(1<<(n-3), 'u1')
    b[[0, 3, 5, 6]] = 0b10010110
    b[[1, 2, 4, 7]] = 0b01101001
    i = b.view('u8')
    jp = 1
    for j in range(0, n-7, 2):
        i[3*jp:4*jp] = i[:jp]
        i[jp:3*jp].reshape(2, -1)[...] = 0xffff_ffff_ffff_ffff ^ i[:jp]
        jp *= 4
    if n & 1:
        i[jp:] = 0xffff_ffff_ffff_ffff ^ i[:jp]
    return np.unpackbits(b).reshape(n*(2,))
from timeit import timeit
assert np.all(smartish(19) == direct(19))
print(f"direct {timeit(lambda: direct(19), number=100)*10:.3f} ms")
print(f"smartish {timeit(lambda: smartish(19), number=100)*10:.3f} ms")
Sample run on a 2^19 box:
direct 5.408 ms
smartish 0.079 ms
Please note that these return uint8 arrays, for example:
>>> direct(3)
array([[[1, 0],
        [0, 1]],

       [[0, 1],
        [1, 0]]], dtype=uint8)
But these can be view-cast to bool at virtually zero cost:
>>> direct(3).view('?')
array([[[ True, False],
        [False,  True]],

       [[False,  True],
        [ True, False]]])
Explainer:
direct method: One straightforward way of checking bit parity is to xor the bits together. We need to do this in a "reducing" way: apply the binary operation xor to the first two operands, then to the result and the third operand, then to that result and the fourth operand, and so forth. This is what functools.reduce does.
Also, we don't want to do this just once but at each point of a 2^n grid. The numpy way of doing this is open grids. These can be generated from 1D axes using np.ix_ or in simple cases using np.ogrid. Note that we flip the very first axis to account for the fact that we want inverted parity.
smartish method. We make two main optimizations. 1) xor is a bitwise operation, meaning we get "64-way parallel computation" for free if we pack our bits into a 64-bit uint. 2) If we flatten the 2^n hypercube, then position m in the linear arrangement corresponds to cell (bit1, bit2, bit3, ...) in the hypercube, where bit1, bit2, etc. is the binary representation (with leading zeros) of m. Now note that if we have computed the parities of positions 0 .. 0b11..11 = 2^k - 1, then we can get the parities of 2^k .. 2^(k+1) - 1 by simply copying and inverting the already computed parities. For example, k = 2:
0b000, 0b001, 0b010, 0b011   (what we have)
0b100, 0b101, 0b110, 0b111   (what we need to compute)
  ^      ^      ^      ^
Since these two sequences differ only in the marked bit, it is clear that their digit sums differ by one and the parities are inverted.
As an exercise work out what can be said in a similar vein about the next 2^k entries and the 2^k entries after those.
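As a cross-check, the parity pattern can also be generated straight from the definition with np.indices (slower than either method above, but obviously correct; the helper name is mine):

```python
import numpy as np

def brute(n):
    # 1 where the index tuple sums to 0 mod 2, else 0
    return (np.indices(n * (2,)).sum(axis=0) % 2 == 0).astype('u1')

# matches the 3D example from the question
expected = np.array([[[1, 0], [0, 1]],
                     [[0, 1], [1, 0]]], dtype='u1')
assert (brute(3) == expected).all()
```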

Loop over clump_masked indices

I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of each value as well, so I can calculate its new value based on its neighbours. enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
    for i, y_i in y_enum[sl]:
        # simplified example calculation
        y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is a very ugly method IMO, and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code takes every window of 5 elements and computes its (masked) average, then uses those values to fill the gaps in the original array. If some index does not have any unmasked value close enough, it is simply left masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
                mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
                  constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
    data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
    mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The originally posted code has a few errors. Firstly, it both reads and writes values of y_filtered in the loop, so the results at later indices are affected by the previous iterations; this could be fixed by working on a copy of the original y_filtered. Secondly, [i-2:i+2] should probably be [max(i-2, 0):i+3], so that the window is symmetric and always starts at zero or later.
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
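For example, with a small made-up masked array (the values are illustrative only):

```python
import numpy as np
import numpy.ma as ma
from itertools import chain

y_filtered = ma.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                      mask=[0, 0, 1, 1, 0, 0])
masked_slices = ma.clump_masked(y_filtered)   # [slice(2, 4, None)]
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
    # masked neighbours are skipped by the masked-aware average
    y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])
# index 2 becomes the mean of 1, 2 and 5; index 3 then uses the filled value
```

Note that, as discussed above, each filled value feeds into the next iteration; work on a copy if you don't want that.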

Get bit on n position from all elements in ndarray in python

I have a 3D array of int32 values. I would like to transform each item of the array into its corresponding bit value at the nth position. My current approach is to loop through the whole array, but I think it can be done much more efficiently.
for z in range(0, dim[2]):
    for y in range(0, dim[1]):
        for x in range(0, dim[0]):
            byte = '{0:032b}'.format(array[z][y][x])
            array[z][y][x] = (int(byte, 2) >> n) & 1
Looking forward to your answers.
If you are dealing with large arrays, you are better off using numpy. Applying bitwise operations to a numpy array is much faster than applying them to Python lists.
import numpy as np
a = np.random.randint(1,65, (2,2,2))
print(a)
Out[12]:
array([[[37, 46],
[47, 34]],
[[ 3, 15],
[44, 57]]])
print((a >> 1) & 1)
Out[16]:
array([[[0, 1],
[1, 1]],
[[1, 1],
[0, 0]]])
Unless there is an intrinsic relation between the different points, you have no choice but to loop over them to discover their current values. So the best you can do will always be O(n^3).
What I don't get, however, is why you go through the hassle of converting a number to a 32-bit string and then back to an int.
If you want to check if the nth bit of a number is set, you would do the following:
power_n = 1 << (n - 1)
for z in range(0, dim[2]):
    for y in range(0, dim[1]):
        for x in range(0, dim[0]):
            array[z][y][x] = 0 if array[z][y][x] & power_n == 0 else 1
Note that in this example, I'm assuming that n is 1-indexed (the first bit is at n=1).
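To tie the two answers together, here is a sketch checking that this mask test agrees with the shift-based numpy version (the sample values are taken from the first answer):

```python
import numpy as np

n = 2                       # test the 2nd bit, 1-indexed as above
power_n = 1 << (n - 1)
a = np.array([[[37, 46], [47, 34]],
              [[ 3, 15], [44, 57]]])
bits = np.where(a & power_n == 0, 0, 1)   # 0 if the bit is unset, else 1
assert (bits == ((a >> (n - 1)) & 1)).all()
```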

How do I access the length of a square array in Python?

Say that I have an array map[x][y] where I don't know the height or width of the array. array.length() will return the overall height * width, but what if I want to know the height and width independently? Is that possible through Python's built-in utilities?
I imagine you're trying to code some kind of game on a two-dimensional map.
Python doesn't have multidimensional arrays in the C or C++ sense (which are, themselves, just syntax sugar around a 1D array.) Instead, Python has lists, which are a strictly one-dimensional affair.
You can fake a two-dimensional array by creating a list which contains other lists. Like so:
width = 10
height = 10
map = [ [None]*width for i in range(height) ]
And you can get the width and height by:
height = len(map)
width = len(map[0])
This will only give the expected result if every sublist of map is the same length, i.e. if the map is a rectangular list of lists. Python will not enforce this restriction for you (why would it?) so you will have to enforce it yourself.
As stated in the other answer, numpy has true N-dimensional arrays optimised for numerical math.
Sidenote: note the use of the list comprehension map = [ [None]*width for i in range(height)] as opposed to the following:
map = [ [None]*width ] * height
which has unexpected effects:
>>> a = [ [0] * 3 ] * 3 #create a 3x3 array of zeroes
>>> a
[[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]
>>> a[1][1] = 9 #change the number in the middle of the grid to '9'
>>> a
[[0, 9, 0], # wtf?
[0, 9, 0],
[0, 9, 0]] # wtf?
This is because the [list] * n operator for arrays doesn't make n new copies of list; instead, it makes n copies of a reference to list. Changing one of the list references will change them all.
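A quick sketch makes the aliasing visible:

```python
rows_shared = [[0] * 3] * 3                 # three references to one list
rows_fresh = [[0] * 3 for _ in range(3)]    # three independent lists

assert rows_shared[0] is rows_shared[1]     # same object
assert rows_fresh[0] is not rows_fresh[1]   # distinct objects

rows_fresh[1][1] = 9                        # the write stays local
assert rows_fresh == [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
```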
The array module provides only one-dimensional arrays. If you want multiple dimensions, take a look at the standard Python list. See http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange
Note that in the case of a list, you can get the "height" of the list (the number of "rows"), but the rows don't have to contain the same number of columns, so you can't get the width directly. You could find the shortest row, the longest row, the average row, etc., but you'll need to loop through the list to do it.
Depending on what you're doing, you might consider numpy instead.
Disclaimer: I don't know python at all, but figured I'd take a look :)
Looking at this page, it looks like you're after map.ndim
EDIT
Hmm, it looks like I stumbled across a library called NumPy
EDIT 2
map.ndim in your case would be 2, i.e. the number of dimensions in your array. map.shape will give you a tuple of integers describing the dimensions of your array, in your case (2, 2), so you would need map.shape[1] for the width of your array.
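For instance (the array here is just illustrative):

```python
import numpy as np

grid = np.zeros((3, 5))        # 3 rows, 5 columns
assert grid.ndim == 2          # number of dimensions
assert grid.shape == (3, 5)    # (height, width)
height, width = grid.shape
assert width == grid.shape[1] == 5
```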
