I implemented one's complement addition of 16 bit integers in python, however I am trying to see if there is a better way to do it.
# This function returns a string of the bits (exactly 16 bits)
# for the number (in base 10 passed to it)
def get_bits(some_num):
    binar = bin(some_num)[2::]
    zeroes = 16 - len(binar)
    padding = zeroes*"0"
    binar = padding + binar
    return binar
# This function adds the numbers, and handles the carry over
# from the most significant bit
def add_bits(num1, num2):
    result = bin(int(num1,2) + int(num2,2))[2::]
    # There is no carryover
    if len(result) <= 16 :
        result = get_bits(int(result,2))
    # There is carryover
    else :
        result = result[1::]
        one = '0000000000000001'
        result = bin(int(result,2) + int(one,2))[2::]
        result = get_bits(int(result,2))
    return result
And now an example of running it would be:
print add_bits("1010001111101001", "1000000110110101")
returns :
0010010110011111
Is what I wrote safe as far as results go (note that I didn't do any negation here since that part is trivial; I am more interested in the intermediate steps)? Is there a better, more Pythonic way to do it?
Thanks for any help.
Converting back and forth between string and ints to do math is inefficient. Do the math in integers and use formatting to display binary:
MOD = 1 << 16
def ones_comp_add16(num1,num2):
    result = num1 + num2
    return result if result < MOD else (result+1) % MOD
n1 = 0b1010001111101001
n2 = 0b1000000110110101
result = ones_comp_add16(n1,n2)
print('''\
  {:016b}
+ {:016b}
------------------
  {:016b}'''.format(n1,n2,result))
Output:
  1010001111101001
+ 1000000110110101
------------------
  0010010110011111
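The same end-around carry can also be written with a mask and a shift instead of a comparison. This is only a sketch of that alternative, not part of the answer above: the name MASK16 is made up for illustration, and it assumes both operands already fit in 16 bits, so a single fold is enough.

```python
MASK16 = 0xFFFF  # hypothetical constant name: the low 16 bits


def ones_comp_add16(a, b):
    """One's complement addition of two 16-bit unsigned values."""
    total = a + b
    # Keep the low 16 bits and add the carry bit (0 or 1) back in.
    return (total & MASK16) + (total >> 16)


n1 = 0b1010001111101001
n2 = 0b1000000110110101
print('{:016b}'.format(ones_comp_add16(n1, n2)))
```

For the two operands from the question this prints 0010010110011111, the same result as the string-based version.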
Converting back and forth between numbers, lists of one-bit strings, and strings probably doesn't feel like a very Pythonic way to get started.
More specifically, converting an int to a sequence of bits by using bin(i)[2:] is pretty hacky. It may be worth doing anyway (e.g., because it's more concise or more efficient than doing it numerically), but even if it is, it would be better to wrap it in a function named for what it does (and maybe even add a comment explaining why you did it that way).
You've also got unnecessarily complexifying code in there. For example, to do the carry, you do this:
one = '0000000000000001'
result = bin(int(result,2) + int(one,2))[2::]
But you know that int(one,2) is just the number 1, unless you've screwed up, so why not just use 1, which is shorter, more readable and obvious, and removes any chance of screwing up?
And you're not following PEP 8 style.
So, sticking with your basic design of "use a string for the bits, use only the basic string operations that are unchanged from Python 1.5 through 3.5 instead of format, and do the basic addition on integers instead of on the bits", I'd write it something like this:
def to_bits(n):
    return bin(n)[2:]
def from_bits(n):
    return int(n, 2)
def pad_bits(b, length=16):
    # Parentheses, not brackets: slice the padded string, not a one-item list
    return ("0"*length + b)[-length:]
def add_bits(num1, num2):
    result = to_bits(from_bits(num1) + from_bits(num2))
    if len(result) <= 16: # no carry
        return pad_bits(result)
    return pad_bits(to_bits(from_bits(result[1:]) + 1))
But an even better solution would be to abstract out the string representation completely. Build a class that knows how to act like an integer, but also knows how to act like a sequence of bits. Or just find one on PyPI. Then your code becomes trivial. For example:
from bitstring import BitArray
def add_bits(n1, n2):
    """
    Given two BitArray values of the same length, return a BitArray
    of the same length that's the one's complement addition.
    """
    result = n1.uint + n2.uint
    if result >= (1 << n1.length):
        # Drop the carry bit (mod 2**length), then add it back at the low end
        result = result % (1 << n1.length) + 1
    return BitArray(uint=result, length=n1.length)
I'm not sure that bitstring is actually the best module for what you're doing. There are a half-dozen different bit-manipulating libraries on PyPI, all of which have different interfaces and different strengths and weaknesses; I just picked the first one that came up in a search and slapped together an implementation using it.
Background
I have a function called get_player_path that takes a list of strings, player_file_list, and an int value, total_players. For the sake of example, I have reduced the list of strings and set the int value to a very small number.
Each string in the player_file_list either has a year-date/player_id/some_random_file.file_extension or
year-date/player_id/IDATs/some_random_number/some_random_file.file_extension
Issue
What I am essentially trying to achieve here is to go through this list and store every unique year-date/player_id path in a set until its length reaches the value of total_players.
My current approach does not seem very efficient to me, and I am wondering whether I can speed up my function get_player_path in any way.
Code
def get_player_path(player_file_list, total_players):
    player_files_to_process = set()
    for player_file in player_file_list:
        player_file = player_file.split("/")
        file_path = f"{player_file[0]}/{player_file[1]}/"
        player_files_to_process.add(file_path)
        if len(player_files_to_process) == total_players:
            break
    return sorted(player_files_to_process)
player_file_list = [
"2020-10-27/31001804320549/31001804320549.json",
"2020-10-27/31001804320549/IDATs/204825150047/foo_bar_Red.idat",
"2020-10-28/31001804320548/31001804320549.json",
"2020-10-28/31001804320548/IDATs/204825150123/foo_bar_Red.idat",
"2020-10-29/31001804320547/31001804320549.json",
"2020-10-29/31001804320547/IDATs/204825150227/foo_bar_Red.idat",
"2020-10-30/31001804320546/31001804320549.json",
"2020-10-30/31001804320546/IDATs/123455150047/foo_bar_Red.idat",
"2020-10-31/31001804320545/31001804320549.json",
"2020-10-31/31001804320545/IDATs/597625150047/foo_bar_Red.idat",
]
print(get_player_path(player_file_list, 2))
Output
['2020-10-27/31001804320549/', '2020-10-28/31001804320548/']
Let's analyze your function first:
your loop should take linear time (O(n)) in the length of the input list, assuming the path lengths are bounded by a relatively "small" number;
the sorting takes O(n log(n)) comparisons.
Thus the sorting has the dominant cost when the list becomes big. You can micro-optimize your loop as much as you want, but as long as you keep that sorting at the end, your effort won't make much of a difference with big lists.
Your approach is fine if you're just writing a Python script. If you really needed performance with huge lists, you would probably be using some other language. Nonetheless, if you really care about performance (or just want to learn new stuff), you could try one of the following approaches:
replace the generic sorting algorithm with something specific for strings; see here for example
use a trie, removing the need for sorting; this could be theoretically better but probably worse in practice.
Just for completeness, as a micro-optimization, assuming the date has a fixed length of 10 characters:
def get_player_path(player_file_list, total_players):
    player_files_to_process = set()
    for player_file in player_file_list:
        end = player_file.find('/', 12) # <--- len(date) + len('/') + 1
        file_path = player_file[:end] # <---
        player_files_to_process.add(file_path)
        if len(player_files_to_process) == total_players:
            break
    return sorted(player_files_to_process)
If the IDs have fixed length too, as in your example list, then you don't need any split or find, just:
LENGTH = DATE_LENGTH + ID_LENGTH + 1 # 1 is for the slash between date and id
...
for player_file in player_file_list:
    file_path = player_file[:LENGTH]
    ...
EDIT: fixed the LENGTH initialization, I had forgotten to add 1
I'll leave this solution here which can be further improved, hope it helps.
player_file_list = (
"2020-10-27/31001804320549/31001804320549.json",
"2020-10-27/31001804320549/IDATs/204825150047/foo_bar_Red.idat",
"2020-10-28/31001804320548/31001804320549.json",
"2020-10-28/31001804320548/IDATs/204825150123/foo_bar_Red.idat",
"2020-10-29/31001804320547/31001804320549.json",
"2020-10-29/31001804320547/IDATs/204825150227/foo_bar_Red.idat",
"2020-10-30/31001804320546/31001804320549.json",
"2020-10-30/31001804320546/IDATs/123455150047/foo_bar_Red.idat",
"2020-10-31/31001804320545/31001804320549.json",
"2020-10-31/31001804320545/IDATs/597625150047/foo_bar_Red.idat",
)
def get_player_path(l, n):
    pfl = set()
    for i in l:
        i = "/".join(i.split("/")[0:2])
        if i not in pfl:
            pfl.add(i)
        if len(pfl) == n:
            return pfl
    if n > len(pfl):
        print("not enough matches")
        return
print(get_player_path(player_file_list, 2))
# {'2020-10-27/31001804320549', '2020-10-28/31001804320548'}
Use dict so that you don't have to sort it since your list is already sorted. If you still need to sort you can always use sorted in the return statement. Add import re and replace your function as follows:
def get_player_path(player_file_list, total_players):
    # Raw string so that \w is passed to the regex engine unchanged
    dct = {re.search(r'^\w+-\w+-\w+/\w+', pf).group(): 1 for pf in player_file_list}
    return [k for i, k in enumerate(dct.keys()) if i < total_players]
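The dict idea can be combined with the early exit from the question, so that the loop stops as soon as enough players have been seen instead of scanning the whole list. This is a rough sketch rather than a drop-in replacement: it assumes every entry contains at least two slashes, and it relies on dicts preserving insertion order (Python 3.7+), so the result is only sorted if the input already is.

```python
def get_player_path(player_file_list, total_players):
    seen = {}  # used as an insertion-ordered set
    for player_file in player_file_list:
        # maxsplit=2 avoids splitting the tail of long IDATs paths
        date, player_id, _rest = player_file.split("/", 2)
        seen.setdefault(f"{date}/{player_id}/", None)
        if len(seen) == total_players:
            break
    return list(seen)


files = [
    "2020-10-27/31001804320549/31001804320549.json",
    "2020-10-27/31001804320549/IDATs/204825150047/foo_bar_Red.idat",
    "2020-10-28/31001804320548/31001804320549.json",
    "2020-10-29/31001804320547/31001804320549.json",
]
print(get_player_path(files, 2))
```

If the input is not guaranteed to be ordered, wrapping the return value in sorted() restores the behaviour of the original function.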
I'm new to programming so I thought I'd ask here for help.
So when I use:
eval('12.5 + 3.2'),
it converts 12.5 and 3.2 into floats.
But I want them to be converted into the Decimal datatype.
I can use:
from decimal import Decimal
eval(Decimal(12.5) + Decimal(3.2))
But I can't do that in my program as I'm accepting user input.
I've found a solution but it uses regular expressions, which I'm not familiar with right now (and I can't find it again for some reason).
It would be great if someone could help me out. Thanks!
UPDATE: apparently the official docs have a recipe that does exactly what you're looking for. From https://docs.python.org/3/library/tokenize.html#examples:
from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP
from io import BytesIO
def decistmt(s):
    """Substitute Decimals for floats in a string of statements.
    >>> from decimal import Decimal
    >>> s = 'print(+21.3e-5*-.1234/81.7)'
    >>> decistmt(s)
    "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"
    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows). Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.
    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7
    Output from calculations with Decimal should be identical across all
    platforms.
    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = tokenize(BytesIO(s.encode('utf-8')).readline) # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval: # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result).decode('utf-8')
Which you can then use like so:
from decimal import Decimal
s = "12.5 + 3.2 + 1.0000000000000001 + (1.0 if 2.0 else 3.0)"
s = decistmt(s)
print(s)
print(eval(s))
Result:
Decimal ('12.5')+Decimal ('3.2')+Decimal ('1.0000000000000001')+(Decimal ('1.0')if Decimal ('2.0')else Decimal ('3.0'))
17.7000000000000001
Feel free to skip the rest of this answer, which is now only of interest to historians of half-correct solutions.
As far as I know, there's no easy way to "hook into" eval in order to change how it interprets float objects.
But if we use the ast module to convert your string into an abstract syntax tree before evaling it, then we can manipulate the tree to replace the floats with Decimal calls.
import ast
from decimal import Decimal
def construct_decimal_node(value):
    return ast.Call(
        func = ast.Name(id="Decimal", ctx=ast.Load()),
        args = [value],
        keywords = []
    )
class FloatLiteralReplacer(ast.NodeTransformer):
    def visit_Num(self, node):
        return construct_decimal_node(node)
s = '12.5 + 3.2'
node = ast.parse(s, mode="eval")
node = FloatLiteralReplacer().visit(node)
ast.fix_missing_locations(node) #add diagnostic information to the nodes we created
code = compile(node, filename="", mode="eval")
result = eval(code)
print("The type of the result of this expression is:", type(result))
print("The result of this expression is:", result)
Result:
The type of the result of this expression is: <class 'decimal.Decimal'>
The result of this expression is: 15.70000000000000017763568394
As you can see, the result is identical to what you would have gotten if you had calculated Decimal(12.5) + Decimal(3.2) directly.
But perhaps you're thinking "Why isn't the result 15.7?". This is because Decimal(3.2) is not exactly identical to 3.2. It's actually equal to 3.20000000000000017763568394002504646778106689453125. This is a hazard when it comes to initializing decimals using float objects -- the inaccuracy is already present. Better to use strings to create decimals, e.g. Decimal("3.2").
Maybe you're now thinking "Ok, so how do I turn 12.5 + 3.2 into Decimal("12.5") + Decimal("3.2")?". The quickest approach would be to modify construct_decimal_node so the Call's args is an ast.Str rather than an ast.Num:
import ast
from decimal import Decimal
def construct_decimal_node(value):
    return ast.Call(
        func = ast.Name(id="Decimal", ctx=ast.Load()),
        args = [ast.Str(str(value.n))],
        keywords = []
    )
class FloatLiteralReplacer(ast.NodeTransformer):
    def visit_Num(self, node):
        return construct_decimal_node(node)
s = '12.5 + 3.2'
node = ast.parse(s, mode="eval")
node = FloatLiteralReplacer().visit(node)
ast.fix_missing_locations(node) #add diagnostic information to the nodes we created
code = compile(node, filename="", mode="eval")
result = eval(code)
print("The type of the result of this expression is:", type(result))
print("The result of this expression is:", result)
Result:
The type of the result of this expression is: <class 'decimal.Decimal'>
The result of this expression is: 15.7
But take care: while I expect this approach to return good results most of the time, there is a corner case where it returns surprising results. In particular, when the expression contains a float literal written with more digits than a float can hold, so that str(f) no longer matches the text of the literal. In other words, the extra precision is lost when the literal is parsed, before Decimal ever sees it.
For example, if you changed s in the above code to "1.0000000000000001 + 0", the result would be 1.0. This is incorrect, since the result of Decimal("1.0000000000000001") + Decimal("0") is 1.0000000000000001.
I'm not sure how you could prevent this problem... By the time ast.parse has finished executing, the float literal has already been converted into a float object, and there's no obvious way to retrieve the string that was used to create it. Perhaps you could extract it from the expression string, but you'd basically have to reinvent Python's parser to do that.
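On Python 3.8 and later, one possible way around this is ast.get_source_segment, which returns the source text that a node was parsed from. The following is a sketch under that assumption and is not part of the original answer: it uses visit_Constant in place of the deprecated visit_Num, and the helper name eval_with_decimals is made up for illustration.

```python
import ast
from decimal import Decimal


class FloatLiteralReplacer(ast.NodeTransformer):
    def __init__(self, source):
        self.source = source

    def visit_Constant(self, node):
        if not isinstance(node.value, float):
            return node  # leave ints, strings, etc. untouched
        # Recover the literal exactly as it was typed, e.g. "1.0000000000000001"
        literal = ast.get_source_segment(self.source, node)
        call = ast.Call(
            func=ast.Name(id="Decimal", ctx=ast.Load()),
            args=[ast.Constant(value=literal)],
            keywords=[],
        )
        return ast.copy_location(call, node)


def eval_with_decimals(expression):
    """Hypothetical helper: evaluate an expression with floats as Decimals."""
    tree = ast.parse(expression, mode="eval")
    tree = FloatLiteralReplacer(expression).visit(tree)
    ast.fix_missing_locations(tree)
    code = compile(tree, filename="<expression>", mode="eval")
    return eval(code, {"Decimal": Decimal})


print(eval_with_decimals("1.0000000000000001 + 0"))
```

Because the Decimal is built from the original text rather than from the parsed float, the extra digits survive. As with any use of eval, this is only appropriate for input you trust.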
So we can generate a unique id with str(uuid.uuid4()), which is 36 characters long.
Is there another method to generate a unique ID which is shorter in terms of characters?
EDIT:
If ID is usable as primary key then even better
Granularity should be better than 1ms
This code could be distributed, so we can't assume time independence.
If this is for use as a primary key field in db, consider just using auto-incrementing integer instead.
str(uuid.uuid4()) is 36 chars but it has four useless dashes (-) in it, and it's limited to 0-9 a-f.
Better uuid4 in 32 chars:
>>> uuid.uuid4().hex
'b327fc1b6a2343e48af311343fc3f5a8'
Or just b64 encode and slice some urandom bytes (up to you to guarantee uniqueness):
>>> base64.b64encode(os.urandom(32))[:8]
b'iR4hZqs9'
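A middle ground between the two, sketched here as one possible option: keep all 128 bits of the UUID but encode them with URL-safe base64 instead of hex, which gives 22 characters. The function name short_uuid is made up for illustration.

```python
import base64
import uuid


def short_uuid():
    """Return a uuid4 as 22 URL-safe characters instead of 36."""
    raw = uuid.uuid4().bytes  # 16 random bytes
    # 16 bytes encode to 24 base64 characters, the last two being "=" padding
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


print(short_uuid())
```

The result carries exactly the same randomness as uuid4, only in a denser alphabet (A-Z, a-z, 0-9, - and _).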
TLDR
Most of the times it's better to work with numbers internally and encode them to short IDs externally. So here's a function for Python3, PowerShell & VBA that will convert an int32 to an alphanumeric ID. Use it like this:
int32_to_id(225204568)
'F2AXP8'
For distributed code use ULIDs: https://github.com/mdipierro/ulid
They are much longer but unique across different machines.
How short are the IDs?
It will encode about half a billion IDs in 6 characters so it's as compact as possible while still using only non-ambiguous digits and letters.
How can I get even shorter IDs?
If you want even more compact IDs/codes/Serial Numbers, you can easily expand the character set by just changing the chars="..." definition. For example if you allow all lower and upper case letters you can have 56 billion IDs within the same 6 characters. Adding a few symbols (like ~!@#$%^&*()_+-=) gives you 208 billion IDs.
So why didn't you go for the shortest possible IDs?
The character set I'm using in my code has an advantage: It generates IDs that are easy to copy-paste (no symbols so double clicking selects the whole ID), easy to read without mistakes (no look-alike characters like 2 and Z) and rather easy to communicate verbally (only upper case letters). Sticking to numeric digits only is your best option for verbal communication but they are not compact.
I'm convinced: show me the code
Python 3
def int32_to_id(n):
    if n==0: return "0"
    chars="0123456789ACEFHJKLMNPRTUVWXY"
    length=len(chars)
    result=""
    remain=n
    while remain>0:
        pos = remain % length
        remain = remain // length
        result = chars[pos] + result
    return result
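The reverse direction is only given for VBA in this answer. A Python counterpart might look like the sketch below; the encoder is repeated so the block runs on its own, and the module-level name CHARS is an assumption made for illustration.

```python
CHARS = "0123456789ACEFHJKLMNPRTUVWXY"  # same 28-character alphabet as above


def int32_to_id(n):
    if n == 0:
        return "0"
    result = ""
    while n > 0:
        n, pos = divmod(n, len(CHARS))
        result = CHARS[pos] + result
    return result


def id_to_int32(short_id):
    """Decode an ID produced by int32_to_id back to its integer."""
    result = 0
    for ch in short_id:
        # str.index raises ValueError for characters outside the alphabet
        result = result * len(CHARS) + CHARS.index(ch)
    return result


print(id_to_int32("F2AXP8"))
```

This prints 225204568, the number that was encoded in the usage example at the top of the answer.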
PowerShell
function int32_to_id($n){
$chars="0123456789ACEFHJKLMNPRTUVWXY"
$length=$chars.length
$result=""; $remain=[int]$n
do {
$pos = $remain % $length
$remain = [int][Math]::Floor($remain / $length)
$result = $chars[$pos] + $result
} while ($remain -gt 0)
$result
}
VBA
Function int32_to_id(n)
Dim chars$, length, result$, remain, pos
If n = 0 Then int32_to_id = "0": Exit Function
chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
length = Len(chars$)
result$ = ""
remain = n
Do While (remain > 0)
pos = remain Mod length
remain = Int(remain / length)
result$ = Mid(chars$, pos + 1, 1) + result$
Loop
int32_to_id = result
End Function
Function id_to_int32(id$)
Dim chars$, length, result, remain, pos, value, power
chars$ = "0123456789ACEFHJKLMNPRTUVWXY"
length = Len(chars$)
result = 0
power = 1
For pos = Len(id$) To 1 Step -1
result = result + (InStr(chars$, Mid(id$, pos, 1)) - 1) * power
power = power * length
Next
id_to_int32 = result
End Function
Public Sub test_id_to_int32()
Dim i
For i = 0 To 28 ^ 3
If id_to_int32(int32_to_id(i)) <> i Then Debug.Print "Error, i=", i, "int32_to_id(i)", int32_to_id(i), "id_to_int32('" & int32_to_id(i) & "')", id_to_int32(int32_to_id(i))
Next
Debug.Print "Done testing"
End Sub
Yes. Just use the current UTC millis. This number never repeats.
const uniqueID = new Date().getTime();
EDIT
If you have the rather rare requirement to produce more than one ID within the same millisecond, this method is of no use, as this number's granularity is 1ms.
I'm writing a program using bitarray
for example:
bytePerInt = sys.getsizeof(1)
class BitMap(object):
    def __init__(self,bits):
        self.bitsPerInt = 8*bytePerInt
        size = bits/self.bitsPerInt+1
        self.bitarray = [0]*size
    #set the bit of pos as 1
    def setBit(self,pos):
        index = pos/self.bitsPerInt
        shift = pos%self.bitsPerInt
        operator = self.bitarray[index]
        mask = 1<<shift
        operator|=mask
        self.bitarray[index] = operator
I want to compute the modulus with a bitwise AND instead of %, such as num&31 instead of num%32.
However, bytePerInt is 24 on my computer, so bitsPerInt is 24*8=192, which is not a power of 2. As a result, I can't AND with 191 to get the modulus, so what can I do?
Like the others, I'm not sure what you mean by "the essential element in the array is Int", but if you are creating a bit array of booleans (1 and 0), use bitarray.
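If you would rather keep a hand-rolled bitmap, note that sys.getsizeof(1) reports the memory footprint of the int object, not a usable bit width, and Python ints are arbitrary precision anyway. So you are free to pick any power-of-two word size yourself. A minimal sketch, assuming 32 bits per list element is acceptable (the constant and method names are made up for illustration):

```python
class BitMap:
    BITS_PER_WORD = 32        # chosen word size, a power of two
    SHIFT = 5                 # log2(32): pos >> 5 is pos // 32
    MASK = BITS_PER_WORD - 1  # pos & 31 is pos % 32

    def __init__(self, bits):
        # Round up so that every position below `bits` has a word
        self.words = [0] * ((bits + self.MASK) >> self.SHIFT)

    def set_bit(self, pos):
        self.words[pos >> self.SHIFT] |= 1 << (pos & self.MASK)

    def test_bit(self, pos):
        return bool(self.words[pos >> self.SHIFT] & (1 << (pos & self.MASK)))


bitmap = BitMap(100)
bitmap.set_bit(37)
print(bitmap.test_bit(37), bitmap.test_bit(36))
```

Position 37 lands in word 1 at bit 5, so the example prints True False.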
I'm rewriting some code from Ruby to Python. The code is for a Perceptron, listed in section 8.2.6 of Clever Algorithms: Nature-Inspired Programming Recipes. I've never used Ruby before and I don't understand this part:
def test_weights(weights, domain, num_inputs)
correct = 0
domain.each do |pattern|
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
output = get_output(weights, input_vector)
correct += 1 if output.round == pattern.last
end
return correct
end
Some explanation: num_inputs is an integer (2 in my case), and domain is a list of arrays: [[1,0,1], [0,0,0], etc.]
I don't understand this line:
input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
It creates an array with 2 values, where the value at each index k stores pattern[k].to_f, but what is pattern[k].to_f?
Try this:
input_vector = [float(pattern[i]) for i in range(num_inputs)]
pattern[k].to_f
converts pattern[k] to a float.
I'm not a Ruby expert, but I think it would be something like this in Python:
def test_weights(weights, domain, num_inputs):
    correct = 0
    for pattern in domain:
        output = get_output(weights, pattern[:num_inputs])
        if round(output) == pattern[-1]:
            correct += 1
    return correct
There is plenty of scope for optimising this: if num_inputs is always one less than the length of the lists in domain then you may not need that parameter at all.
Be careful about doing line by line translations from one language to another: that tends not to give good results no matter what languages are involved.
Edit: since you said you don't think you need to convert to float you can just slice the required number of elements from the domain value. I've updated my code accordingly.