I am trying to port a portion of code written in a different language (an obscure one called Igor Pro by Wavemetrics, for those of you who have heard of it) to Python.
In this code, there is a conversion of a data type from a 16-bit integer (read in from a 16-bit, big endian binary file) to single-precision (32-bit) floating-point. In this program, the conversion is as follows:
Signed 16-bit integer:
print tmp
tmp[0]={-24160,18597,-24160,18597,-24160}
converted to 32-bit floating-point:
Redimension/S/E=1 tmp
print tmp
tmp[0]={339213,339213,5.79801e-41,0,0}
The /S flag/option indicates that the data type of tmp should be float32 instead of int16. However, I believe the important flag/option is /E=1, which is said to "Force reshape without converting or moving data."
In Python, the conversion is as follows:
>>> tmp[:5]
array([-24160, 18597, -24160, 18597, -24160], dtype=int16)
>>> tmp.astype('float32')
array([-24160., 18597., -24160., ..., 18597., -24160., 18597.], dtype=float32)
Which is what I expect, but I need to find a function/operation that emulates the /E=1 option in the original code above. Is there an obvious way in which -24160 and 18597 would both be converted to 339213? Does this have anything to do with byteswap or newbyteorder or something else?
import numpy
tmp=numpy.array([-24160,18597,-24160,18597,-24160, 0], numpy.int16)
tmp.dtype = numpy.float32
print tmp
Result:
[ 3.39213000e+05 3.39213000e+05 5.79801253e-41]
I had to add a zero to the list of values because there is an odd number of them. Five 16-bit values cannot be reinterpreted as 32-bit floats, since each float needs two of them.
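If you'd rather not pad by hand, a minimal sketch of the same idea (the trailing zero is just padding and is an assumption about what the extra value should be):
import numpy as np

tmp = np.array([-24160, 18597, -24160, 18597, -24160], np.int16)
if tmp.size % 2:                       # odd number of 16-bit values
    tmp = np.append(tmp, np.int16(0))  # pad so the buffer holds whole 32-bit words
tmp.dtype = np.float32                 # reinterpret the same bytes in place
print(tmp)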
Use view instead of astype:
In [9]: tmp=np.array([-24160, 18597, -24160, 18597, -24160, 18597], dtype=np.int16)
In [10]: tmp.view('float32')
Out[10]: array([ 339213., 339213., 339213.], dtype=float32)
.astype creates a copy of the array, cast to the new dtype.
.view returns a view of the array (with the same underlying data), with the data interpreted according to the new dtype.
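A quick way to see the difference (a small sketch; the variable names are just for illustration):
import numpy as np

tmp = np.array([-24160, 18597, -24160, 18597, -24160, 18597], dtype=np.int16)
copied = tmp.astype(np.float32)        # new buffer, values converted numerically
aliased = tmp.view(np.float32)         # same buffer, bytes reinterpreted
print(np.shares_memory(tmp, copied))   # False
print(np.shares_memory(tmp, aliased))  # True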
Is there an obvious way in which -24160 and 18597 would both be converted to 339213?
No, but neither is there any obvious way in which -24160 would convert to 339213 and 5.79801e-41 and 0.
It looks more like the conversion takes two input numbers to create one output (probably by concatenating the raw 2×16 bits to 32 bits and calling the result a float). In that case the pair -24160,18597 consistently becomes 339213, and 5.79801e-41 probably results from -24160,0 where the 0 is invented because we run out of inputs. Since 5.79801e-41 looks like it might be a single-precision denormal, this implies that the two 16-bit blocks are probably concatenated in little-endian order.
It remains to see whether you need to byte-swap each of the 16-bit inputs, but you can check that for yourself.
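A sketch of that check: view a pair as float32 both with and without swapping the bytes inside each 16-bit word, and see which result matches Igor's output.
import numpy as np

tmp = np.array([-24160, 18597], dtype=np.int16)
no_swap = tmp.view(np.float32)
swapped = tmp.byteswap().view(np.float32)  # swap the two bytes inside each 16-bit word first
print(no_swap)   # [ 339213.]  -- matches Igor, so no per-word byte swap is needed here
print(swapped)   # a different, meaningless value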
Related
I'm getting incorrect results from numpy calculations when the inputs are a numpy array with a 32-bit integer dtype but the outputs include larger numbers that require 64-bit representation.
Here's a minimal working example:
import numpy as np

arr = np.ones(5, dtype=int) * (2**24 + 300) # arr.dtype defaults to 'int32'
# Following comment from #hpaulj I changed the first line, which was originally:
# arr = np.zeros(5, dtype=int)
# arr[:] = 2**24 + 300
single_value_calc = 2**8 * (2**24 + 300)
numpy_calc = 2**8 * arr
print(single_value_calc)
print(numpy_calc[0])
# RESULTS
4295044096
76800
The desired output is that the numpy array contains the correct value of 4295044096, which requires 64 bits to represent it, i.e. I would have expected numpy arrays to automatically upcast from int32 to int64 when the output requires it, rather than maintaining a 32-bit output and wrapping back to 0 after the value of 2^32 is exceeded.
Of course, I can fix the problem manually by forcing int64 representation:
numpy_calc2 = 2**8 * arr.astype('int64')
but this is undesirable for general code, since the output will only need 64-bit representation (i.e. to hold large numbers) in some cases and not all. In my use case, performance is critical so forcing upcasting every time would be costly.
Is this the intended behaviour of numpy arrays? And if so, is there a clean, performant solution please?
Type casting and promotion in numpy is fairly complicated and occasionally surprising. This recent unofficial write-up by Sebastian Berg explains some of the nuances of the subject (mostly concentrating on scalars and 0d arrays).
Quoting from this document:
Python Integers and Floats
Note that python integers are handled exactly like numpy ones. They are, however, special in that they do not have a dtype associated with them explicitly. Value based logic, as described here, seems useful for python integers and floats to allow:
arr = np.arange(10, dtype=np.int8)
arr += 1
# or:
res = arr + 1
res.dtype == np.int8
which ensures that no upcast (for example with higher memory usage) occurs.
(emphasis mine.)
See also Allan Haldane's gist suggesting C-style type coercion, linked from the previous document:
Currently, when two dtypes are involved in a binary operation numpy's principle is that "the output dtype's range covers the range of both input dtypes", and when a single dtype is involved there is never any cast.
(emphasis again mine.)
So my understanding is that the promotion rules for numpy scalars and arrays differ, primarily because it's not feasible to check every element inside an array to determine whether casting can be done safely. Again from the former document:
Scalar based rules
Unlike arrays, where inspection of all values is not feasable, for scalars (and 0-D arrays) the value is inspected.
This would mean that you can either use np.int64 from the start to be safe (and if you're on linux then dtype=int will actually do this on its own), or check the maximum value of your arrays before suspect operations and determine if you have to promote the dtype yourself, on a case-by-case basis. I understand that this might not be feasible if you are doing a lot of calculations, but I don't believe there is a way around this considering numpy's current type promotion rules.
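As a hedged sketch of that kind of case-by-case guard (the threshold below is specific to multiplying by 2**8 and is only an example):
import numpy as np

arr = np.ones(5, dtype=np.int32) * (2**24 + 300)

# Promote only if multiplying by 2**8 could overflow int32 (example-specific check)
if arr.max() > np.iinfo(np.int32).max // 2**8:
    arr = arr.astype(np.int64)

print((2**8 * arr)[0])   # 4295044096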
Why does Python not cast long numbers to numpy floats when doing something like
a = np.array([10.0, 56.0]) + long(10**47)
The dtype of the variable a is object. I did not expect this when, during a maximum likelihood optimization problem, one fit parameter B was an integer and thus 10**B became a long.
Is this due to fear of precision loss?
I suspect this is because python is able to store arbitrarily long integers, and so numpy realizes that it can't safely cast the result to a known data type. Therefore, it falls back to treating the array as an array of python objects and adds elementwise using python's rules (which cast the result to a float).
You can see what the result type is by using np.result_type:
>>> np.result_type(np.array([10.0, 56.0]), long(10**47))
dtype('O')
Based on the documentation for np.result_type what happens is:
First, np.min_scalar_type() is called on each of the inputs:
>>> np.min_scalar_type(np.array([10.0, 56.0]))
dtype('float64')
>>> np.min_scalar_type(long(10**47))
dtype('O')
Second, the result is determined by combining these types using np.promote_types:
>>> np.promote_types(np.float64,np.dtype('O'))
dtype('O')
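If you'd rather end up with a float64 array than an object array (accepting the precision loss for a number this large), one sketch is to convert the long to a float yourself before the operation:
import numpy as np

a = np.array([10.0, 56.0]) + float(10**47)
print(a.dtype)   # float64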
Why is python inserting strange negative numbers into my array?
I am reading some numerical data from a plain text file like below:
fp=np.genfromtxt("mytextfile.txt", dtype=('int32'), delimiter='\r\n')
The file contains only positive numbers; it is formatted like below and there are 300000 of these numbers:
12345
45678
1056789
232323
6789010001
1023242556
When I print out the fp array after reading it in, the first half of the array is read correctly, but the last half contains strange negative numbers that aren't in my file at all.
How can I get it to read correctly what is in the file?
You told it the numbers are int32s, but at least some of your numbers, e.g. 6789010001 are larger than a signed 32 bit quantity can represent (6789010001 is larger than an unsigned 32 bit quantity can represent).
If all the numbers are positive, I'd suggest using uint64 as your data type (you should check that all numbers in the file are in fact less than 2**64, though).
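A sketch of that (assuming the same file layout as above and that every value fits in 64 bits):
import numpy as np

fp = np.genfromtxt("mytextfile.txt", dtype=np.uint64)
print(fp.min(), fp.max())   # quick sanity check: both should be positive and match the file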
You have data larger than int32 can hold; this causes overflow and makes the values negative.
NumPy's integers don't act like Python's integers; they behave like C integers.
Changing the dtype from int32 to int64 or object may help.
I have two arrays that are identical (by design, because I obtained the second one by doing an FFT and then an inverse FFT of the first one). However, when I write the first one to a .wav file I get a file that produces sound, but when I do the same with the second one I get no sound. Here is my code:
import numpy as np
from scipy.io import wavfile
from scipy.fftpack import fft
import scipy.fftpack as fftp   # assuming scipy's FFT was used; numpy.fft would behave the same way

fs, data = wavfile.read(filename)
a = data.T[0]
c = fft(a)
y2 = fftp.ifft(c)
y2 = np.array([int(round(i)) for i in y2.real])
Now when I try:
sum(y2==a)==len(a)
I get True, which means the two arrays are identical. The only difference is that one has "dtype=int16":
In [322]: a
Out[322]: array([ 1, 1, 1, ..., 21, 20, 21], dtype=int16)
In [321]: y2
Out[321]: array([ 1, 1, 1, ..., 21, 20, 21])
How do I convert the second array to a format where it produces a valid .wav file as well?
That "only difference" is a huge difference.
The WAV format, by default, stores samples as signed little-endian 16-bit integers. So, when you write an array of int16 values as raw data, you get a playable WAV file (on a little-endian system, at least).
But when you write an array of int32 values, you get nonsense: each number turns into 2 samples, one of which is the high word of your data and the other the low word. So you've got your original audio samples at half speed, interleaved with what's effectively random noise.
Or, alternatively, you can use a non-default WAV format. You didn't show enough of your code to show how you're handling this, but you can write WAV files in a variety of different formats, from 8-bit unsigned int to 32-bit float, and 32-bit signed ints are a valid format. WAV files can even handle compression (including MP3).
But less-common formats may not actually play with every tool; a lot of programs assume a WAV is 16-bit integers, and won't know what to do with anything else.
So, you're probably better off writing 16-bit integers.
Or maybe you're already doing that (writing 32-bit int values with the right header) and maybe your player is handling them properly.
But you're writing 32-bit int values between -32768 and 32767, which means you're only using 1/65536th of the dynamic range, so everything will be incredibly quiet. If you want to write 32-bit int values, you want to normalize them to the 32-bit int range, not the 16-bit int range.
The simplest solution to all of these problems is: convert the values back to int16 before writing them:
y3 = y2.astype(np.int16)
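Putting it together, a minimal sketch of the round trip with scipy (the filenames here are placeholders; the key step is converting back to int16 before writing):
import numpy as np
from scipy.io import wavfile
from scipy.fftpack import fft, ifft

fs, data = wavfile.read("input.wav")               # placeholder filename
a = data.T[0]                                      # first channel
y2 = np.round(ifft(fft(a)).real).astype(np.int16)  # back to 16-bit samples
wavfile.write("output.wav", fs, y2)                # 16-bit PCM, playable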
I am trying to convert a number stored as a list of ints to a float type. I got the number via a serial console and want to reassemble it into a float.
The way I would do it in C is something like this:
bit_data = ((int16_t)byte_array[0] << 8) | byte_array[1];
result = (float)bit_data;
What I tried to use in python is a much simpler conversion:
result = int_list[0]*256.0 + int_list[1]
However, this does not preserve the sign of the result, as the C code does.
What is the right way to do this in python?
UPDATE:
Python version is 2.7.3.
My byte array has a length of 2.
In the Python code, byte_array is a list of ints. I've renamed it to avoid misunderstanding. I cannot just use the float() function because it will not preserve the sign of the number.
I'm a bit confused by what data you have, and how it is represented in Python. As I understand it, you have received two unsigned bytes over a serial connection, which are now represented by a list of two python ints. This data represents a big endian 16-bit signed integer, which you want to extract and turn into a float. eg. [0xFF, 0xFE] -> -2 -> -2.0
import array, struct
two_unsigned_bytes = [255, 254] # represented by ints
byte_array = array.array("B", two_unsigned_bytes)
# change above to "b" if the ints represent signed bytes ie. in range -128 to 127
signed_16_bit_int, = struct.unpack(">h", byte_array)
float_result = float(signed_16_bit_int)
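For just two bytes you can also skip the array module and pack them directly; a short sketch, assuming the list holds unsigned byte values in big-endian order:
import struct

int_list = [255, 254]
result = float(struct.unpack(">h", struct.pack("BB", *int_list))[0])
print(result)   # -2.0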
I think what you want is the struct module.
Here's a round trip snippet:
import struct
sampleValue = 42.13
somebytes = struct.pack('=f', sampleValue)
print(somebytes)
result = struct.unpack('=f', somebytes)
print(result)
result may be surprising to you. unpack returns a tuple. So to get to the value you can do
result[0]
or modify the result setting line to be
result = struct.unpack('=f', somebytes)[0]
I personally hate that, so use the following instead
result, = struct.unpack('=f', somebytes) # tuple unpacking on assignment
The second thing you'll notice is that the value has extra digits of noise. That's because python's native floating point representation is double precision, so the rounding error from storing 42.13 as a 32-bit float becomes visible when it is widened back to a double.
(This is python3 btw, adjust for using old versions of python as appropriate)
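If you want the round trip to preserve the full precision of a Python float, a variant of the same sketch packs it as a double instead:
import struct

sampleValue = 42.13
somebytes = struct.pack('=d', sampleValue)   # 8-byte double instead of 4-byte float
result, = struct.unpack('=d', somebytes)
print(result)   # 42.13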
I am not sure I really understand what you are doing, but I think you got 4 bytes from a stream and know them to represent a float32 value. The way you are handling this suggests big-endian byte order.
Python has the struct package (https://docs.python.org/2/library/struct.html) to handle bytestreams.
import struct
stream = struct.pack(">f", 2/3.)
len(stream) # 4
reconstructed_float = struct.unpack(">f", stream)[0] # unpack returns a 1-tuple
Okay, so I think int_list isn't really just a list of ints. The ints are constrained to 0-255 and represent bytes that can be built into a signed integer. You then want to turn that into a float. The trick is to set the sign of the first byte properly and then proceed much like you did.
float(((byte_array[0] - 256) if byte_array[0] > 127 else byte_array[0])*256 + byte_array[1])
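For example, with the two bytes from earlier in the question (assuming they are the unsigned bytes of a big-endian int16):
byte_array = [255, 254]
print(float(((byte_array[0] - 256) if byte_array[0] > 127 else byte_array[0])*256 + byte_array[1]))   # -2.0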