Casting to a string for numeric sorting - python

I have a set of numbers and I need to generate a string-hash that is sortable for these numbers. The numbers can be integers or floats, for example:
-5.75E+100
-4
-1.74E-101
1.74E-101
5
9
11
52.3
5.75E+100
For integers and floats without exponents, I think this would be simple:
# whatever the padding needs to be
>>> sorted(map(lambda x: str(x).zfill(10), [-4, 5, 52.3]))
['-000000004', '0000000005', '00000052.3']
However, what would be a more comprehensive way to generate a string-hash here that would sort properly for the above list of numbers? I am fine prepending exponents, if necessary (or converting everything to an exponent, if required), and encoding negative numbers in complement code, if that's required too.

Every float object has a hex() method that converts it to a hex string. That's almost enough to make a sortable string, but there are a few problems.
First, negative numbers have a leading - but positive numbers don't have anything. You need to add a leading character to positive numbers.
Second, - comes after + in the sorting order. You need to replace one or the other to make the order correct.
Third, the exponent comes at the end of the string. It needs to get moved to the front of the string to make it more significant, but the sign needs to stay at the absolute front.
Fourth, the exponent is a variable number of digits. It needs to be zero filled so that it has a consistent size.
Putting it all together produces something like this:
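def sortable_string(number):
    hex_num = float(number).hex()
    if not hex_num.startswith('-'):
        hex_num = '+' + hex_num
    # '!' sorts before '+', so strings with a negative sign come first
    hex_num = hex_num.replace('-', '!')
    hex_parts = hex_num.split('p')
    # zero-fill the exponent digits on the left so they have a fixed width
    exponent = hex_parts[1][0] + hex_parts[1][1:].zfill(4)
    # Caveat: magnitudes are compared in ascending order regardless of sign,
    # so negative numbers and negative exponents are not yet ordered correctly.
    return hex_parts[0][0] + exponent + hex_parts[0][1:]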
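A different route to the same goal, offered as a sketch rather than as the method above: reinterpret the IEEE 754 double as an unsigned 64-bit integer and adjust its bits so that unsigned order matches numeric order. The name `sortable_key` is made up for this example. It assumes every input fits in a float (very large integers lose precision), that no NaN is present, and that -0.0 sorting just before 0.0 is acceptable.

```python
import struct

def sortable_key(number):
    """Return a 16-character hex string that sorts in numeric order."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack('>Q', struct.pack('>d', float(number)))
    if bits & (1 << 63):
        # Negative: invert every bit so larger magnitudes sort first.
        bits ^= (1 << 64) - 1
    else:
        # Non-negative: set the top bit so these sort after all negatives.
        bits |= 1 << 63
    return format(bits, '016x')

data = [5, -4, 5.75e100, 52.3, -1.74e-101, 11, 1.74e-101, 9, -5.75e100]
ordered = sorted(data, key=sortable_key)
print(ordered)
```

Because every key has the same length and uses only the characters 0-9 and a-f, plain string comparison is enough; no padding or sign handling is needed afterwards.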

You can try this,
nums = sorted(map(eval, data))
nums = list(map(str, nums))
nums
Output -
['-5.75E+100', '-4', '-1.74E-101', '1.74E-101', '5', '52.3', '5.75E+100']

Related

In Python - how to retrieve the binary code of negative integer from the memory? [duplicate]

Integers in Python are stored in two's complement, correct?
Although:
>>> x = 5
>>> bin(x)
0b101
And:
>>> x = -5
>>> bin(x)
-0b101
That's pretty lame. How do I get Python to give me the numbers in REAL binary bits, and without the 0b in front of it? So:
>>> x = 5
>>> bin(x)
0101
>>> y = -5
>>> bin(y)
1011
It works best if you provide a mask. That way you specify how far to sign extend.
>>> bin(-27 & 0b1111111111111111)
'0b1111111111100101'
Or perhaps more generally:
def bindigits(n, bits):
    s = bin(n & int("1"*bits, 2))[2:]
    return ("{0:0>%s}" % (bits)).format(s)
>>> print(bindigits(-31337, 24))
111111111000010110010111
In basic theory, the actual width of the number is a function of the size of the storage. If it's a 32-bit number, then a negative number has a 1 in the MSB of a set of 32. If it's a 64-bit value, then there are 64 bits to display.
But in Python, integer precision is limited only by the constraints of your hardware. On my computer, this actually works, but it consumes 9GB of RAM just to store the value of x. Anything higher and I get a MemoryError. If I had more RAM, I could store larger numbers.
>>> x = 1 << (1 << 36)
So with that in mind, what binary number represents -1? Python is well-capable of interpreting literally millions (and even billions) of bits of precision, as the previous example shows. In 2's complement, the sign bit extends all the way to the left, but in Python there is no pre-defined number of bits; there are as many as you need.
But then you run into ambiguity: does binary 1 represent 1, or -1? Well, it could be either. Does 111 represent 7 or -1? Again, it could be either. So does 111111111 represent 511, or -1... well, both, depending on your precision.
Python needs a way to represent these numbers in binary so that there's no ambiguity of their meaning. The 0b prefix just says "this number is in binary". Just like 0x means "this number is in hex". So if I say 0b1111, how do I know if the user wants -1 or 15? There are two options:
Option A: The sign bit
You could declare that all numbers are signed, and the left-most bit is the sign bit. That means 0b1 is -1, while 0b01 is 1. That also means that 0b111 is also -1, while 0b0111 is 7. In the end, this is probably more confusing than helpful particularly because most binary arithmetic is going to be unsigned anyway, and people are more likely to run into mistakes by accidentally marking a number as negative because they didn't include an explicit sign bit.
Option B: The sign indication
With this option, binary numbers are represented unsigned, and negative numbers have a "-" prefix, just like they do in decimal. This is (a) more consistent with decimal, (b) more compatible with the way binary values are most likely going to be used. You lose the ability to specify a negative number using its two's complement representation, but remember that two's complement is a storage implementation detail, not a proper indication of the underlying value itself. It shouldn't have to be something that the user has to understand.
In the end, Option B makes the most sense. There's less confusion and the user isn't required to understand the storage details.
To properly interpret a binary sequence as two's complement, there needs to be a length associated with the sequence. When you are working with low-level types that correspond directly to CPU registers, there is an implicit length. Since Python integers can have an arbitrary length, there really isn't an internal two's complement format. Since there isn't a length associated with a number, there is no way to distinguish between positive and negative numbers. To remove the ambiguity, bin() includes a minus sign when formatting a negative number.
Python's arbitrary length integer type actually uses a sign-magnitude internal format. The logical operations (bit shifting, and, or, etc.) are designed to mimic two's complement format. This is typical of multiple precision libraries.
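To make the point about length concrete, here is a small sketch that converts in both directions once a width has been chosen. The helper names `to_twos_complement` and `from_twos_complement` are invented for this illustration and are not part of the standard library.

```python
def to_twos_complement(value, bits):
    """Format value as a bits-wide two's complement bit string."""
    mask = (1 << bits) - 1              # e.g. 0b11111111 for bits=8
    return format(value & mask, '0{}b'.format(bits))

def from_twos_complement(text):
    """Read a bit string as a signed integer; its length is the width."""
    bits = len(text)
    value = int(text, 2)
    if text[0] == '1':                  # sign bit set: subtract 2**bits
        value -= 1 << bits
    return value

print(to_twos_complement(5, 4))         # 0101
print(to_twos_complement(-5, 4))        # 1011
print(from_twos_complement('111'))      # -1 when read as a 3-bit value
print(int('111', 2))                    # 7 when read as unsigned
```

The same three digits give two different numbers depending on whether a width is attached, which is the ambiguity the minus sign in bin() avoids.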
Here is a slightly more readable version of tylerl's answer. For example, say you want -2 in its 8-bit two's complement representation:
bin(-2 & (2**8-1))
2**8 stands for the ninth bit (256); subtract 1 from it and you have all the preceding bits set to one (255).
For 8- and 16-bit masks, you can replace (2**8-1) with 0xff or 0xffff. The hexadecimal version becomes less readable after that point.
If this is unclear, here is a regular function for it:
def twosComplement(value, bitLength):
    return bin(value & (2**bitLength - 1))
The two's complement of a negative number is the modulus minus the number's absolute value.
So I think the brief way to get the complement of -27 is:
bin((1<<32) - 27)  # 32 bit length: '0b11111111111111111111111111100101'
bin((1<<16) - 27)
bin((1<<8) - 27)  # 8 bit length: '0b11100101'
Not sure how to get what you want using the standard lib. There are a handful of scripts and packages out there that will do the conversion for you.
I just wanted to note the "why" , and why it's not lame.
bin() doesn't return binary bits. It converts the number to a binary string. The leading '0b' tells the interpreter that you're dealing with a binary number, as per the Python language definition. This way you can directly work with binary numbers, like this:
>>> 0b01
1
>>> 0b10
2
>>> 0b11
3
>>> 0b01 + 0b10
3
that's not lame. that's great.
http://docs.python.org/library/functions.html#bin
bin(x)
Convert an integer number to a binary string.
http://docs.python.org/reference/lexical_analysis.html#integers
Integer and long integer literals are described by the following lexical definitions:
bininteger ::= "0" ("b" | "B") bindigit+
bindigit ::= "0" | "1"
Use slices to get rid of unwanted '0b'.
bin(5)[2:]
'101'
or if you want digits,
tuple ( bin(5)[2:] )
('1', '0', '1')
or even
list( map( int, tuple( bin(5)[2:] ) ) )
[1, 0, 1]
tobin = lambda x, count=8: "".join(map(lambda y:str((x>>y)&1), range(count-1, -1, -1)))
e.g.
tobin(5) # => '00000101'
tobin(5, 4) # => '0101'
tobin(-5, 4) # => '1011'
Or as clear functions:
# Returns bit y of x (10 base). i.e.
# bit 2 of 5 is 1
# bit 1 of 5 is 0
# bit 0 of 5 is 1
def getBit(y, x):
    return str((x>>y)&1)
# Returns the first `count` bits of base 10 integer `x`
def tobin(x, count=8):
    shift = range(count-1, -1, -1)
    bits = map(lambda y: getBit(y, x), shift)
    return "".join(bits)
(Adapted from W.J. Van de Laan's comment)
I'm not entirely certain what you ultimately want to do, but you might want to look at the bitarray package.
def tobin(data, width):
    data_str = bin(data & (2**width-1))[2:].zfill(width)
    return data_str
You can use the Binary fractions package. This package implements TwosComplement with binary integers and binary fractions. You can convert binary-fraction strings into their twos complement and vice-versa
Example:
>>> from binary_fractions import TwosComplement
>>> TwosComplement.to_float("11111111111") # TwosComplement --> float
-1.0
>>> TwosComplement.to_float("11111111100") # TwosComplement --> float
-4.0
>>> TwosComplement(-1.5) # float --> TwosComplement
'10.1'
>>> TwosComplement(1.5) # float --> TwosComplement
'01.1'
>>> TwosComplement(5) # int --> TwosComplement
'0101'
To use this with Binary's instead of float's you can use the Binary class inside the same package.
PS: Shameless plug, I'm the author of this package.
For positive numbers, just use:
bin(x)[2:].zfill(4)
For negative numbers, it's a little different:
bin((eval("0b"+str(int(bin(x)[3:].zfill(4).replace("0","2").replace("1","0").replace("2","1"))))+eval("0b1")))[2:].zfill(4)
As a whole script, this is how it should look:
def binary(number):
    if number < 0:
        return bin((eval("0b"+str(int(bin(number)[3:].zfill(4).replace("0","2").replace("1","0").replace("2","1"))))+eval("0b1")))[2:].zfill(4)
    return bin(number)[2:].zfill(4)

x = int(input())
print(binary(x))
A modification on tylerl's very helpful answer that provides sign extension for positive numbers as well as negative (no error checking).
def to2sCompStr(num, bitWidth):
    num &= (2 << bitWidth-1)-1 # mask
    formatStr = '{:0'+str(bitWidth)+'b}'
    ret = formatStr.format(int(num))
    return ret
Example:
In [11]: to2sCompStr(-24, 18)
Out[11]: '111111111111101000'
In [12]: to2sCompStr(24, 18)
Out[12]: '000000000000011000'
No need, it already is. It is just python choosing to represent it differently. If you start printing each nibble separately, it will show its true colours.
checkNIB = '{0:04b}'.format
checkBYT = lambda x: '-'.join( map( checkNIB, [ (x>>4)&0xf, x&0xf] ) )
checkBTS = lambda x: '-'.join( [ checkBYT( ( x>>(shift*8) )&0xff ) for shift in reversed( range(4) ) if ( x>>(shift*8) )&0xff ] )
print( checkBTS(-0x0002) )
Output is simple:
>>>1111-1111-1111-1111-1111-1111-1111-1110
Python falls back to its usual representation when you ask for the two's complement of a single nibble, but it is still possible if you split the value into nibbles as above. Keep in mind that this works best with negative hex and binary integer literals, less so with plain numbers; with hex you can also set the byte size.
We can leverage the property of bit-wise XOR. Use bit-wise XOR to flip the bits and then add 1. Then you can use the python inbuilt bin() function to get the binary representation of the 2's complement. Here's an example function:
def twos_complement(input_number):
    print(bin(input_number)) # prints binary value of input
    mask = 2**(1 + len(bin(input_number)[2:])) - 1 # Calculate mask to do bitwise XOR operation
    twos_comp = (input_number ^ mask) + 1 # calculate 2's complement, for negative of input_number (-1 * input_number)
    print(bin(twos_comp)) # print 2's complement representation of negative of input_number.
I hope this solves your problem:
num = int(input("Enter number : "))
bin_num = bin(num)
binary = '0' + bin_num[2:]
print(binary)

In python, i want to thousands separate decimal numbers and display with exactly 2 decimal digits

Consider the following numbers:
1000.10
1000.11
1000.113
I would like to get these to print out in python as:
1,000.10
1,000.11
1,000.11
The following transformations almost do this, except that whenever the second digit to the right of the decimal point is a zero, the zero is elided and as a result that number doesn't line up properly.
This is my attempt:
for n in [1000.10, 1000.11, 1000.112]:
    nf = '%.2f' % n  # nf is a 2 digit decimal number, but a string
    nff = float(nf)  # nff is a float which the next transformation needs
    n_comma = f'{nff:,}'  # this puts the commas in
    print('%10s' % n_comma)
1,000.1
1,000.11
1,000.11
Is there a way to avoid eliding the ending zero in the first number?
You want the format specifier ',.2f'. The ',' part, as you noted, performs comma separation of thousands, while '.2f' specifies that two digits are to be retained after the decimal point:
print([f'{number:,.2f}' for number in [1000.10, 1000.11, 1000.112]])
Output:
['1,000.10', '1,000.11', '1,000.11']
You can simply use f'{n:,.2f}' to combine the thousands separator and the two-decimal-digit format specifiers:
for n in [1000.10, 1000.11, 1000.112]:
    print(f'{n:,.2f}')
Outputs
1,000.10
1,000.11
1,000.11
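If the numbers also need to line up in a column, as the '%10s' in the question suggests, a width can go into the same format specification. A short sketch; the width of 12 and the extra sample value are arbitrary choices for illustration:

```python
values = [1000.10, 1000.11, 1000.113, 1234567.5]

# '>12' right-aligns in 12 columns, ',' adds thousands separators,
# and '.2f' keeps exactly two digits after the decimal point.
lines = [f'{value:>12,.2f}' for value in values]
for line in lines:
    print(line)
```

Doing everything in one specification avoids the round trip through float() that dropped the trailing zero in the original attempt.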
You might be able to do it like this:
num = 100.0
print(str(num) + "0")
So you print the number as a string plus 0 at the end.
Update:
So that it doesn’t do this to all numbers, try doing something like:
text = str(num)
if len(text.split('.')[1]) == 1:
    # only one digit after the decimal point (.1, .2, and so on), so add the zero
    text += "0"
print(text)
So if the number has a zero at the end (its decimal values are .10, .20, .30, etc.), add one, and if not, don’t.

How to convert signed string to its Binary equivalent in Python?

I am using an itertools function to put values into a list. It takes each value as a str, not as an int. After that, I need to convert the values in the list to their binary equivalents. The problem arises when I need to convert a negative value, e.g. -5. My code treats the "-" as a character, but I need it to be treated as a negative sign on the numerical value that follows. Does the concept of an unsigned integer come into play?
My code is:
L3 = list(itertools.repeat("-1", 5))
file = open(filename, 'w')
L3_1 = []
for item in L3:
    x3 = bytes(item, "ascii")
    L3_1.append(' '.join(["{0:b}".format(x).zfill(8) for x in x3]))
for item in L3_1:
    file.write("%s\n" % item)
file.close()
It's not entirely clear what your problem is and what you want to achieve, so please correct me if I make wrong assumptions.
Anyways, converting integers to binary representation is easily done using bin. For example, bin(5) gives you '0b101'. Now, bin(-5) gives you '-0b101' - which is not what you expect when being used to binary from other languages, e.g. C.
The "problem" is that integers in python are not fixed size. There's no int16, int32, uint8 and such. Python will just add bits as it needs to. That means a negative number cannot be represented by its complement - 0b11111011 is not -5 as for int8, but 251. Since binaries are potentially infinite, there's no fixed position to place a sign bit. Thus, python has to add the explicit unary -. This is different from interpreting -5 as the strings "-" and "5".
If you want to get the binary representation for negative, fixed size integers, I think you have to do it by yourself. A function that does this could look like this:
def bin_int(number, size=8):
    max_val = int('0b' + ('1' * (size - 1)), 2)  # e.g. 0b01111111
    assert -max_val <= number <= max_val, 'Number out of range'
    if number >= 0:
        return bin(number)
    sign = int('0b1' + ('0' * size), 2)  # e.g. 0b100000000
    return bin(number + sign)
Now, to do what you initially wanted: write the binary representation of numbers to a file.
output_list = [1, -1, -5, -64, 0] # iterable *containing* integers
with open(filename, 'w') as output_file:  # with statement is safer for writing
    for number in output_list:
        output_file.write(bin_int(number) + '\n')
Or if you just want to check the result:
print([bin_int(number) for number in [1, -1, -5, -64, -127]])
# ['0b1', '0b11111111', '0b11111011', '0b11000000', '0b10000001']
Note that if you want to strip the 0b, you can do that via bin_int(number)[2:], e.g. output_file.write(bin_int(number)[2:] + '\n'). This removes the first two characters from the string holding the binary representation.
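Since the values in the question start out as strings such as "-1", one way to tie this together is to convert each string with int() first and then mask it to the desired width. This is a sketch that assumes an 8-bit two's complement result is wanted; `signed_str_to_bits` is a hypothetical helper name, not something from the question or the standard library.

```python
import itertools

def signed_str_to_bits(text, size=8):
    """Convert a decimal string like "-5" to a size-bit two's complement string."""
    number = int(text)                  # int() understands a leading minus sign
    low = -(1 << (size - 1))            # smallest value that fits, e.g. -128
    high = (1 << (size - 1)) - 1        # largest value that fits, e.g. 127
    if not low <= number <= high:
        raise ValueError('{} does not fit in {} bits'.format(number, size))
    return format(number & ((1 << size) - 1), '0{}b'.format(size))

values = list(itertools.repeat("-1", 5))
bit_strings = [signed_str_to_bits(item) for item in values]
print(bit_strings[0])                   # 11111111
print(signed_str_to_bits("-5"))         # 11111011
```

Each resulting string can then be written to the file one per line, as in the loop above.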

Set fixed length integer in python

After a bit of googling, nothing came up. I am manipulating sequence numbers for network packets and need the numbers to be of a fixed length. For example:
>>> 0000 + 1
1
Instead, I'd like the integer that is returned to be 0001. Are there any built-in commands for setting an integer of fixed length?
Edit: I do not need to print these integers, I need to actually manipulate them. I will need them to iterate but they must be fixed length so that they can be easily found in a networking protocol head file.
What you're asking doesn't make any sense. The integer 0011 and the integer 11 are exactly the same number.*
If you want to format them as strings to print them out or to search a text file, you can do that with, e.g., format(n, '04'). It doesn't matter whether you're formatting 11 or 0011, they're both the same number, and that number will format to the string '0011'.
If you want to convert them to big-endian 32-bit C-style unsigned integers, again, they're both the same number, and struct.pack('>I', n) will pack that number to the byte string b'\x00\x00\x00\x0b'.
If you want to add them modulo 10000, again, they're both the same number, and (n + 9990) % 10000 will give you 1.
No matter what operation you dream up, there will be no difference.
* Actually, in Python 2.x, number literals starting with 0 are treated as octal, not decimal, so 0011 is actually 9, not 11. And in 3.x numbers starting with 0 are a SyntaxError, to avoid the confusion caused by accidentally writing octal numbers. But forget all that. We're not talking about the Python number literals, we're talking about something even simpler here: the numbers themselves.
Numbers don't have a "length", they're just numbers. The representation of a number as text, in a string, has a length. To convert numbers to strings in Python, use the format() function:
x = 1
s = "{:04d}".format(x)
print(s)
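For the packet sequence-number case in the question, a sketch of keeping the counter as a plain integer and fixing the length only at the boundary might look like this. The four-digit width and the wrap at 10000 are assumptions taken from the examples above, not from any particular protocol.

```python
import struct

seq = 10

# Increment as an ordinary integer, wrapping to stay within four digits.
seq = (seq + 1) % 10000

as_text = format(seq, '04d')        # fixed-width decimal text for searching or printing
as_bytes = struct.pack('>I', seq)   # 4-byte big-endian unsigned integer

print(as_text)                      # 0011
print(as_bytes)                     # b'\x00\x00\x00\x0b'
print(int(as_text) == seq)          # True: parsing the text gives the number back
```

The integer itself never carries leading zeros; only its text or byte representation has a length.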

Convert a list of float to string in Python

I have a list of floats in Python and when I convert it into a string, I get the following
[1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
These floats have 2 digits after the decimal point when I created them (I believe so),
Then I used
str(mylist)
How do I get a string with 2 digits after the decimal point?
======================
Let me be more specific, I want the end result to be a string and I want to keep the separators:
"[1883.95, 1878.33, 1869.43, 1863.40]"
I need to do some string operations afterwards. For example +="!\t!".
Inspired by @senshin, the following code works, for example, but I think there is a better way:
msg = "["
for x in mylist:
    msg += '{:.2f}'.format(x)+','
msg = msg[0:len(msg)-1]
msg+="]"
print(msg)
Use string formatting to get the desired number of decimal places.
>>> nums = [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
>>> ['{:.2f}'.format(x) for x in nums]
['1883.95', '1878.33', '1869.43', '1863.40']
The format string {:.2f} means "print a fixed-point number (f) with two places after the decimal point (.2)". str.format will automatically round the number correctly (assuming you entered the numbers with two decimal places in the first place, in which case the floating-point error won't be enough to mess with the rounding).
If you want to keep full precision, the syntactically simplest/clearest way seems to be
mylist = list(map(str, mylist))
map(lambda n: '%.2f'%n, [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001])
map() invokes the callable passed in the first argument for each element in the list/iterable passed as the second argument.
Get rid of the ' marks:
>>> nums = [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
>>> '[{:s}]'.format(', '.join(['{:.2f}'.format(x) for x in nums]))
'[1883.95, 1878.33, 1869.43, 1863.40]'
['{:.2f}'.format(x) for x in nums] makes a list of strings, as in the accepted answer.
', '.join([list]) returns one string with ', ' inserted between the list elements.
'[{:s}]'.format(joined_string) adds the brackets.
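Wrapped up as a reusable function, the same idea might look like the following sketch. The name `format_floats` and its `places` parameter are made up for this example.

```python
def format_floats(values, places=2):
    """Return a list-like string with each float shown to `places` decimals."""
    spec = '{:.' + str(places) + 'f}'
    return '[' + ', '.join(spec.format(x) for x in values) + ']'

nums = [1883.95, 1878.3299999999999, 1869.4300000000001, 1863.4000000000001]
msg = format_floats(nums)
print(msg)          # [1883.95, 1878.33, 1869.43, 1863.40]
msg += "!\t!"       # the result is an ordinary string, so it can be extended
```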
str([round(i, 2) for i in mylist])
Using numpy you may do:
np.array2string(np.asarray(mylist), precision=2, separator=', ')
