Limitations to .format in Python

So I am fairly new to Python as I am sure will become apparent.
Anyways, is there a limit to the number of arguments I can pass when using .format?
I have a list of 8000 numbers that need to replace existing numbers in a long input in various places in the input. At the moment, I am planning on doing this:
text = """ very long input with many {0}..{1}..{8000} in various places """
file = open('new_input', 'w')
file.write(text.format(x,x1,x2,....x8000))
Any advice would be much appreciated!

As wim notes, you could do it with argument unpacking, but if you actually passed them as individually named positional arguments (x0, x1, ...), it wouldn't work; there is a limit of 255 explicitly provided individual arguments.
Demonstration:
>>> globals().update(('x{}'.format(i), i) for i in range(8000))
>>> codestr = '("{{}}"*8000).format({})'.format(', '.join('x{}'.format(i) for i in range(8000)))
>>> eval(codestr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1
SyntaxError: more than 255 arguments
The limit is due to how the CALL_FUNCTION opcode is defined; it's encoded as a single byte indicating the opcode, then one byte for the number of positional arguments, and one for the number of keyword arguments. While in theory it could handle up to 510 total arguments, they actually impose a combined limit of 255 arguments, presumably for consistency. So you can't actually call a function with more than 255 total arguments without involving * or ** unpacking.
This is all technically an implementation detail, by the way; there is no language requirement that it work this way, so it could change in a future release of CPython (the reference interpreter), and it may already behave differently in other interpreters (most of which don't produce or use CPython bytecode anyway). In fact, CPython 3.7 removed the 255-argument limit.
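A small sketch to check which behavior your interpreter has: it builds and compiles, at run time, a call with 300 literal positional arguments and no `*` unpacking. `count_args` is a made-up helper for the demonstration. On CPython 3.7 and later the call should be accepted; on older CPython versions `eval` raises the `SyntaxError` shown above.

```python
import sys

def count_args(*args):
    """Hypothetical helper: report how many positional arguments arrived."""
    return len(args)

# Source text for a call with 300 explicit positional arguments (no * unpacking).
call_src = "count_args({})".format(", ".join(["0"] * 300))

try:
    result = eval(call_src)
except SyntaxError as exc:  # "more than 255 arguments" on CPython < 3.7
    result = exc

print(sys.version_info[:2], result)
```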

I'm not aware of any hard limit, and 8000 is not that big anyway, so I don't think it should be a problem.
Example with positional templating:
>>> text = "{} "*8000
>>> text = text.format(*range(8000))
>>> '{' in text
False
>>> text[:50]
'0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 '
>>> text[-50:]
'7990 7991 7992 7993 7994 7995 7996 7997 7998 7999 '
Example with name templating:
>>> s = ' '.join(['{{x{}}}'.format(i) for i in range(8000)])
>>> d = {'x{}'.format(i):i for i in range(8000)}
>>> s[:25] + '...' + s[-24:]
'{x0} {x1} {x2} {x3} {x4} ... {x7997} {x7998} {x7999}'
>>> s = s.format(**d)
>>> s[:25] + '...' + s[-24:]
'0 1 2 3 4 5 6 7 8 9 10 11...7995 7996 7997 7998 7999'
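Another option, sketched here under the assumption that the replacement numbers live in a single list: `str.format` field names can index into one argument (`{0[17]}`), so the whole list is passed as one positional argument and the argument count never grows with the list. The list contents and the template below are made up for illustration.

```python
# Hypothetical stand-in for the 8000 replacement numbers.
values = list(range(100, 8100))

# Each field indexes into positional argument 0, i.e. into the list itself.
template = "start {0[0]} ... middle {0[4000]} ... end {0[7999]}"

text = template.format(values)  # exactly one argument, however long the list is
print(text)  # start 100 ... middle 4100 ... end 8099
```

Writing the result with `with open('new_input', 'w') as f: f.write(text)` also makes sure the file gets closed, which the snippet in the question never does.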

Related

How to calculate the verification digit of the Tax ID in the country of Paraguay (calcular digito verificador del RUC)

In the country of Paraguay (South America) each taxpayer has a Tax ID (called RUC: Registro Único del Contribuyente) assigned by the government (Ministerio de Hacienda, Secretaría de Tributación).
This RUC is a number followed by a verification digit (dígito verificador), for example 123456-0. The government tells you the verification digit when you request your RUC.
Is there a way for me to calculate the verification digit based on the RUC? Is it a known formula?
In my case, I have a database of suppliers and customers, collected over the years by several employees of the company.
Now I need to run checks to see if all the RUCs were entered correctly or if there are typing mistakes.
My preference would be a Python solution, but I'll take whatever solutions I get to point me in the right direction.
Edit: This is a self-answer to share knowledge that took me hours/days to find. I marked this question as "answer your own question" (don't know if that changes anything).
The verification digit of the RUC is calculated using a formula very similar (but not identical) to a method called Modulo 11; at least, that is what I gathered from reading the following tech sites (content is in Spanish):
https://www.yoelprogramador.com/funncion-para-calcular-el-digito-verificador-del-ruc/
http://groovypy.wikidot.com/blog:02
https://es.wikipedia.org/wiki/C%C3%B3digo_de_control#M.C3.B3dulo_11
I analyzed the solutions provided in the mentioned pages and ran my own tests against a list of RUCs and their known verification digits, which led me to a final formula that returns the expected output, but which is DIFFERENT from the solutions in the mentioned links.
The final formula I got to calculate the verification digit of the RUC is shown in this example (80009735-1):
Multiply each digit of the RUC (without considering the verification digit) by a factor based on the position of the digit within the RUC (starting from the right side of the RUC) and sum all the results of these multiplications:
RUC:               8        0        0        0        9        7        3        5
Position:          7        6        5        4        3        2        1        0
Multiplications:   8x(7+2)  0x(6+2)  0x(5+2)  0x(4+2)  9x(3+2)  7x(2+2)  3x(1+2)  5x(0+2)
Results:           72       0        0        0        45       28       9        10
Sum of results: 164
Divide the sum by 11 and use the remainder of the division to determine the verification digit:
If the remainder is greater than 1, the verification digit is 11 - remainder
If the remainder is 0 or 1, the verification digit is 0
In our example:
Sum of results: 164
Division: 164 / 11 ==> quotient 14, remainder 10
Verification digit: 11 - 10 ==> 1
Here is my Python version of the formula:
def calculate_dv_of_ruc(input_str):
    # ensure that we have a string
    if not isinstance(input_str, str):
        input_str = str(input_str)
    # try to convert to 'int' to validate that it contains only digits.
    # I suspect that this is faster than checking each char independently
    int(input_str)
    the_sum = 0
    for i, c in enumerate(reversed(input_str)):
        the_sum += (i + 2) * int(c)
    base = 11
    _, rem = divmod(the_sum, base)
    if rem > 1:
        dv = base - rem
    else:
        dv = 0
    return dv
Testing this function, it returns the expected results and raises an error when the input contains characters other than digits:
>>> calculate_dv_of_ruc(80009735)
1
>>> calculate_dv_of_ruc('80009735')
1
>>> calculate_dv_of_ruc('80009735A')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 8, in calculate_dv_of_ruc
ValueError: invalid literal for int() with base 10: '80009735A'
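Since the goal was to audit RUCs already stored in a database, here is a minimal sketch of a checker for values stored as `number-dv` (e.g. `80009735-1`). It reimplements the same factor (position + 2), modulo 11 rule described above; `ruc_check_digit` and `is_valid_ruc` are hypothetical names, and the single-digit, hyphen-separated storage format is an assumption about your data.

```python
import re

def ruc_check_digit(number: str) -> int:
    """Verification digit using the factor (position + 2), modulo 11 rule above."""
    total = sum((i + 2) * int(c) for i, c in enumerate(reversed(number)))
    rem = total % 11
    return 11 - rem if rem > 1 else 0

def is_valid_ruc(value: str) -> bool:
    """True if value looks like '<digits>-<digit>' and the digit matches."""
    match = re.fullmatch(r"([0-9]+)-([0-9])", value.strip())
    if match is None:
        return False  # malformed entry, e.g. letters or a missing hyphen
    number, dv = match.groups()
    return ruc_check_digit(number) == int(dv)

for entry in ["80009735-1", "80009735-2", "8000973A-1"]:
    print(entry, is_valid_ruc(entry))
```

Using `[0-9]` instead of `\d` keeps non-ASCII digits from slipping through the pattern.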

Unexpected Python for-loop behaviour?

Can someone explain what happened in the second run? Why did I get a stream of 9s when the code should have given an error?
>>> for __ in range(10): #first run
...     print(__)
...
0
1
2
3
4
5
6
7
8
9
This was the second run
>>> for __ in range(10): #second run
...     print(_)
...
9
9
9
9
9
9
9
9
9
9
>>> exit()
After this, when I ran the code a third time, the same code executed as expected and gave the error below. I realize that this question has no practical use, but I would really like to know why it happened.
NameError: name '_' is not defined
The _ variable is set by the interactive Python interpreter and always holds the last non-None result of any expression statement that has been run.
From the Reserved classes of identifiers reference:
The special identifier _ is used in the interactive interpreter to store the result of the last evaluation; it is stored in the builtins module.
and from sys.displayhook():
If value is not None, this function prints repr(value) to sys.stdout, and saves value in builtins._. [...] sys.displayhook is called on the result of evaluating an expression entered in an interactive Python session.
Here, that result was 9, from an expression you must have run before the code you shared.
The NameError indicates you restarted the Python interpreter and had not yet run an expression statement that produced a non-None value:
>>> _
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name '_' is not defined
>>> 3 * 3
9
>>> _
9
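To see the mechanism outside an interactive session, you can call the display hook yourself. This sketch calls `sys.displayhook` directly (normally only the interactive interpreter does that) and captures what it prints; the last non-`None` value ends up in `builtins._`.

```python
import builtins
import contextlib
import io
import sys

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    sys.displayhook(9)     # what the REPL does after evaluating `3 * 3`
    sys.displayhook(None)  # None results are neither printed nor stored

print(repr(captured.getvalue()))  # '9\n'
print(builtins._)                 # 9
```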

Counting the real number of arguments in python

Is there any way to count the real number of arguments passed to a function in Python, even when some default values are set? I'm trying to write a function which replaces a certain range of a text file with 0 (or an offset value), but it doesn't work because Python returns the number of parameters, including those that were not passed.
The input file is like this:
foo.txt
0
1
2
3
4
5
6
7
8
9
10
11
Here is the code:
import os
import numpy as np
from inspect import signature

def substitute_with(FILE, start, end=10, offset=0):
    sig = signature(substitute_with)
    params = sig.parameters
    print('len(params): %d' % len(params))
    filename, file_extension = os.path.splitext(FILE)
    file_i = FILE
    file_o = filename + '_0' + file_extension
    Z = np.loadtxt(file_i)
    with open(file_o, "w") as fid:
        length_Z = len(Z)
        print('length_Z: %d' % length_Z)
        if(len(params) < 3): # gives me 4, but actually, I need 2 here!
            end=length_Z
        X = np.concatenate([np.ones((start)), np.zeros((end-start)), np.ones((length_Z-end))])
        Y = np.concatenate([np.zeros((start)), np.ones((end-start)), np.zeros((length_Z-end))])*offset
        A=Z.T*X+Y
        for i in range(0, length_Z):
            fid.write('%d\n' % (A[i]))

#substitute_with('foo.txt',4,8)
#substitute_with('foo.txt',4,8,2)
substitute_with('foo.txt',4)
This works only when the 3rd argument, end, is passed. Without it, entries 4 through the end (11) are supposed to be replaced with 0, but in reality only 4 through 9 are.
I reluctantly set a default value (=10) for end; otherwise Python raises the following error:
TypeError: substitute_with() missing 1 required positional argument: 'end'
So, how would you guys solve this? Please don't tell me to check the length of the file first and then give it to the function. It should be done inside the function. In MATLAB, 'nargin' returns the real number of arguments, so this kind of logic would work easily. If python cannot do this, it's gonna be a shame.
Just use None as the default for end:
def substitute_with(FILE, start, end=None, offset=0):
    ...
    if end is None:
        end = length_Z
    ...

Modify function at runtime (pulling local variable out)

Imagine this simple function, which computes a modified value, modified, from a variable default:
default = 0
def modify():
    modified = default + 1
    print(modified) # replace with OS call, I can't see the output
modify() # 1
default # 0
disassembled:
import dis
dis.dis(modify)
2 0 LOAD_GLOBAL 0 (default)
3 LOAD_CONST 1 (1)
6 BINARY_ADD
7 STORE_FAST 0 (modified)
3 10 LOAD_GLOBAL 1 (print)
13 LOAD_FAST 0 (modified)
16 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
19 POP_TOP
20 LOAD_CONST 0 (None)
23 RETURN_VALUE
I can't change the function modify(), but I know what's in it, either directly (I can see the code) or indirectly (disassembly). What I need is to get the value of the modified variable, so I thought maybe there is a way to remove specific parts (print(modified)) of the function through the dis module, but I didn't find anything.
Is there any way to remove probably everything except RETURN_VALUE after 16 CALL_FUNCTION and replace it with e.g. return modified? Or is there any other way to pull a local variable out without actually executing the last line(s)?
As a possible solution I see 3 ways:
pulling disassembled codes and creating my own function (or inplace) according to them with removing the code I don't want (everything after 16 ...)
modifying the function's return value, so that it returns modified (that unfortunately calls the OS function)
manually recreating the function according to the source code
I'd like to avoid the second way, which is probably easier than the first one, but I must avoid the third way, so... is there any way to solve my problem?
There is a 4th option: replace the print() global:
printed = []
print = lambda *args: printed.extend(args)
modify()
del print
modified = printed[0]
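A slightly more robust sketch of the same idea uses `unittest.mock.patch` on `builtins.print`: the replacement is undone automatically, even if modify() raises. It assumes modify() looks print up normally (no module-level print shadowing the builtin) and that the value of interest is the first argument of its last print call.

```python
from unittest import mock

default = 0

def modify():
    modified = default + 1
    print(modified)  # stands in for the OS call from the question

# print is not a module global here, so modify() falls back to builtins.print,
# which the patch swaps for a MagicMock until the with-block ends.
with mock.patch("builtins.print") as fake_print:
    modify()

modified = fake_print.call_args[0][0]  # first positional arg of the last call
print(modified)  # 1
```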
It is otherwise possible to produce modified bytecode, but this can easily lead to bugs that blow up the interpreter (there is zero protection from invalid bytecode), so be warned.
You can create a new function object with a new code object with updated bytecode; based on the offsets in the dis you showed, I manually created new bytecode that would return the local variable at index 0:
>>> altered_bytecode = modify.__code__.co_code[:8] + bytes(
...     [dis.opmap['LOAD_FAST'], 0,      # load local variable 0 onto the stack
...      dis.opmap['RETURN_VALUE'], 0])  # and return it (the 0 is its unused argument byte)
>>> dis.dis(altered_bytecode)
0 LOAD_GLOBAL 0 (0)
2 LOAD_CONST 1 (1)
4 BINARY_ADD
6 STORE_FAST 0 (0)
8 LOAD_FAST 0 (0)
10 RETURN_VALUE
RETURN_VALUE returns the object at the top of the stack; all I did was inject a LOAD_FAST opcode to load what modified references onto the stack.
You'd have to create a new code object, then a new function object wrapping the code object, to make this callable:
>>> code = type(modify.__code__)
>>> function = type(modify)
>>> ocode = modify.__code__
>>> new_modify = function(
... code(ocode.co_argcount, ocode.co_kwonlyargcount, ocode.co_nlocals, ocode.co_stacksize,
... ocode.co_flags, altered_bytecode,
... ocode.co_consts, ocode.co_names, ocode.co_varnames, ocode.co_filename,
... 'new_modify', ocode.co_firstlineno, ocode.co_lnotab, ocode.co_freevars,
... ocode.co_cellvars),
... modify.__globals__, 'new_modify', modify.__defaults__, modify.__closure__)
>>> new_modify()
1
This does, obviously, require some understanding of how Python bytecode works in the first place; the dis module does contain descriptions of the various codes, and the dis.opmap dictionary lets you map back to byte values.
There are a few modules out there that try to make this easier; take a look at byteplay, the bytecode module of the pwnypack project or several others, if you want to explore this further.
I can also heartily recommend you watch the Playing with Python Bytecode presentation given by Scott Sanderson and Joe Jevnik at PyCon 2016, and play with their codetransformer module. Highly entertaining and very informative.
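On Python 3.8 and later, the long positional CodeType(...) call (whose argument list changes between versions) can be avoided with `code.replace()`, which copies a code object and changes only the fields you name. This sketch only renames the code object, which is safe on any 3.8+ version; patching co_code itself stays version specific, since the bytecode format keeps changing.

```python
import types

default = 0

def modify():
    modified = default + 1
    print(modified)

# Copy the code object, overriding just co_name (Python 3.8+).
new_code = modify.__code__.replace(co_name="new_modify")

# Wrap it in a new function object that shares the original globals.
new_modify = types.FunctionType(new_code, modify.__globals__, "new_modify",
                                modify.__defaults__, modify.__closure__)

new_modify()  # prints 1, exactly like modify()
```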

modifying python bytecode

I was wondering how to modify bytecode, then recompile that code so I can use it in Python as a function. I've been trying:
a = """
def fact():
    a = 8
    a = 0
"""
c = compile(a, '<string>', 'exec')
w = c.co_consts[0].co_code
dis(w)
which decompiles to:
0 LOAD_CONST 1 (1)
3 STORE_FAST 1 (1)
6 LOAD_CONST 2 (2)
9 STORE_FAST 1 (1)
12 LOAD_CONST 0 (0)
15 RETURN_VALUE
Supposing I want to get rid of the instructions at offsets 0 and 3, I call:
x = c.co_consts[0].co_code[6:16]
dis(x)
which results in :
0 LOAD_CONST 2 (2)
3 STORE_FAST 1 (1)
6 LOAD_CONST 0 (0)
9 RETURN_VALUE
My problem is what to do with x: if I try exec x, I get 'expected string without null bytes', and I get the same for exec w;
trying to compile x results in: compile() expected string without null bytes.
I'm not sure what the best way to proceed is. Maybe I need to create some kind of code object, but I'm not sure how; I'm assuming it must be possible (cf. byteplay, Python assemblers, et al.).
I'm using python 2.7.10, but I'd like it to be future compatible (Eg python 3) if it's possible.
Update: For sundry reasons I have started writing a Cross-Python-version assembler. See https://github.com/rocky/python-xasm. It is still in very early beta. See also bytecode.
As far as I know there is no other currently-maintained Python assembler. PEAK's Bytecode Disassembler was developed for Python 2.6, and later modified to support early Python 2.7.
It looks pretty cool, judging from the documentation, but it relies on other PEAK libraries, which might be problematic.
I'll go through the whole example to give you a feel for what you'd have to do. It is not pretty, but then you should expect that.
Basically, after modifying the bytecode you need to create a new types.CodeType object. You need a new one because many of the attributes of a code object are, for good reason, read-only; for example, the interpreter may have cached some of their values.
Once you have the new code object, you can wrap it in a function, or pass it directly to exec or eval.
Or you can write it to a bytecode file. Alas, the code format has changed between Python versions 1.3, 1.5, 2.0, 3.0, 3.8, and 3.10. And by the way, so have the optimizations and the bytecodes. In fact, starting with Python 3.6 they are wordcodes, not bytecodes.
So here is what you'd have to do for your example:
a = """
def fact():
    a = 8
    a = 0
    return a
"""
c = compile(a, '<string>', 'exec')
fn_code = c.co_consts[0] # Pick up the function code from the main code
from dis import dis
dis(fn_code)
print("=" * 30)
x = fn_code.co_code[6:16] # modify bytecode
import types
opt_fn_code = types.CodeType(fn_code.co_argcount,
                             # fn_code.co_posonlyargcount, Add this in Python 3.8+
                             # fn_code.co_kwonlyargcount, Add this in Python 3
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             x,  # fn_code.co_code: this you changed
                             fn_code.co_consts,
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,  # In general, you should adjust this
                             fn_code.co_freevars,
                             fn_code.co_cellvars)
dis(opt_fn_code)
print("=" * 30)
print("Result is", eval(opt_fn_code))
# Now let's change the value of what's returned
co_consts = list(opt_fn_code.co_consts)
co_consts[-1] = 10
opt_fn_code = types.CodeType(fn_code.co_argcount,
                             # fn_code.co_posonlyargcount, Add this in Python 3.8+
                             # fn_code.co_kwonlyargcount, Add this in Python 3
                             fn_code.co_nlocals,
                             fn_code.co_stacksize,
                             fn_code.co_flags,
                             x,  # fn_code.co_code: this you changed
                             tuple(co_consts),  # this is now changed too
                             fn_code.co_names,
                             fn_code.co_varnames,
                             fn_code.co_filename,
                             fn_code.co_name,
                             fn_code.co_firstlineno,
                             fn_code.co_lnotab,  # In general, you should adjust this
                             fn_code.co_freevars,
                             fn_code.co_cellvars)
dis(opt_fn_code)
print("=" * 30)
print("Result is now", eval(opt_fn_code))
When I ran this here is what I got:
3 0 LOAD_CONST 1 (8)
3 STORE_FAST 0 (a)
4 6 LOAD_CONST 2 (0)
9 STORE_FAST 0 (a)
5 12 LOAD_FAST 0 (a)
15 RETURN_VALUE
==============================
3 0 LOAD_CONST 2 (0)
3 STORE_FAST 0 (a)
4 6 LOAD_FAST 0 (a)
9 RETURN_VALUE
==============================
('Result is', 0)
3 0 LOAD_CONST 2 (10)
3 STORE_FAST 0 (a)
4 6 LOAD_FAST 0 (a)
9 RETURN_VALUE
==============================
('Result is now', 10)
Notice that the line numbers haven't changed even though I removed a couple of lines of code. That is because I didn't update fn_code.co_lnotab.
If you now want to write a Python bytecode file from this, here is what you'd do:
co_consts = list(c.co_consts)
co_consts[0] = opt_fn_code
c1 = types.CodeType(c.co_argcount,
                    # c.co_posonlyargcount, Add this in Python 3.8+
                    # c.co_kwonlyargcount, Add this in Python3
                    c.co_nlocals,
                    c.co_stacksize,
                    c.co_flags,
                    c.co_code,
                    tuple(co_consts),
                    c.co_names,
                    c.co_varnames,
                    c.co_filename,
                    c.co_name,
                    c.co_firstlineno,
                    c.co_lnotab,  # In general, You should adjust this
                    c.co_freevars,
                    c.co_cellvars)
from struct import pack
with open('/tmp/testing.pyc', 'wb') as fp:
    fp.write(pack('Hcc', 62211, '\r', '\n'))  # Python 2.7 magic number
    # In Python 3.7+ the 4-byte PEP 552 flags field goes here
    import time
    fp.write(pack('I', int(time.time())))
    # In Python 3.3+ the source size mod 2**32 goes here
    import marshal
    fp.write(marshal.dumps(c1))
To simplify writing the boilerplate bytecode above, I've added a routine to xasm called write_pycfile().
Now to check the results:
$ uncompyle6 /tmp/testing.pyc
# uncompyle6 version 2.9.2
# Python bytecode 2.7 (62211)
# Disassembled from: Python 2.7.12 (default, Jul 26 2016, 22:53:31)
# [GCC 5.4.0 20160609]
# Embedded file name: <string>
# Compiled at: 2016-10-18 05:52:13
def fact():
    a = 0
# okay decompiling /tmp/testing.pyc
$ pydisasm /tmp/testing.pyc
# pydisasm version 3.1.0
# Python bytecode 2.7 (62211) disassembled from Python 2.7
# Timestamp in code: 2016-10-18 05:52:13
# Method Name: <module>
# Filename: <string>
# Argument count: 0
# Number of locals: 0
# Stack size: 1
# Flags: 0x00000040 (NOFREE)
# Constants:
# 0: <code object fact at 0x7f815843e4b0, file "<string>", line 2>
# 1: None
# Names:
# 0: fact
2 0 LOAD_CONST 0 (<code object fact at 0x7f815843e4b0, file "<string>", line 2>)
3 MAKE_FUNCTION 0
6 STORE_NAME 0 (fact)
9 LOAD_CONST 1 (None)
12 RETURN_VALUE
# Method Name: fact
# Filename: <string>
# Argument count: 0
# Number of locals: 1
# Stack size: 1
# Flags: 0x00000043 (NOFREE | NEWLOCALS | OPTIMIZED)
# Constants:
# 0: None
# 1: 8
# 2: 10
# Local variables:
# 0: a
3 0 LOAD_CONST 2 (10)
3 STORE_FAST 0 (a)
4 6 LOAD_CONST 0 (None)
9 RETURN_VALUE
$
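The header written above is the Python 2.7 layout. For Python 3.7+ (PEP 552) the header is 16 bytes: the interpreter's magic number, a 4-byte flags field (0 for a timestamp-based pyc), the source mtime, and the source size mod 2**32. Here is a sketch for the running interpreter; `write_pyc` is a hypothetical helper (not the xasm routine and not a standard library function), and the usage part writes into a temporary directory.

```python
import importlib.util
import marshal
import os
import struct
import tempfile
import time

def write_pyc(code, path, source_size=0):
    """Write code as a timestamp-based .pyc for the running interpreter (3.7+)."""
    header = (importlib.util.MAGIC_NUMBER                         # 4 bytes, version specific
              + struct.pack("<I", 0)                              # PEP 552 flags: timestamp-based
              + struct.pack("<I", int(time.time()) & 0xFFFFFFFF)  # source mtime
              + struct.pack("<I", source_size & 0xFFFFFFFF))      # source size mod 2**32
    with open(path, "wb") as fp:
        fp.write(header)
        fp.write(marshal.dumps(code))

# Usage: write a module code object, then read it back past the 16-byte header.
module_code = compile("def fact():\n    return 10\n", "<string>", "exec")
path = os.path.join(tempfile.mkdtemp(), "testing.pyc")
write_pyc(module_code, path)

with open(path, "rb") as fp:
    data = fp.read()
namespace = {}
exec(marshal.loads(data[16:]), namespace)
print(namespace["fact"]())  # 10
```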
An alternative approach to optimization is to work at the Abstract Syntax Tree (AST) level. compile() accepts an AST in place of source text, and the resulting code object can be run with eval or exec; you can also dump the AST. You could also write the tree back out as Python source using the Python module astor.
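As a sketch of that AST route, the pass below removes an assignment of a constant that is immediately overwritten by another constant assignment to the same name, which is what slicing the bytecode achieved for fact() above. `DropDeadConstantStore` is a made-up, deliberately narrow transformer, not a general optimizer; `ast.unparse` needs Python 3.9+.

```python
import ast

source = """
def fact():
    a = 8
    a = 0
    return a
"""

class DropDeadConstantStore(ast.NodeTransformer):
    """Hypothetical pass: drop `name = <constant>` when the very next statement
    is also `name = <constant>`, so the first value is never read."""

    @staticmethod
    def _const_store(stmt):
        # Return the target name if stmt is `name = <constant>`, else None.
        if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and isinstance(stmt.value, ast.Constant)):
            return stmt.targets[0].id
        return None

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        kept = []
        for stmt, nxt in zip(node.body, node.body[1:] + [None]):
            name = self._const_store(stmt)
            if name is not None and name == self._const_store(nxt):
                continue  # overwritten right away: safe to drop
            kept.append(stmt)
        node.body = kept
        return node

tree = ast.fix_missing_locations(DropDeadConstantStore().visit(ast.parse(source)))

namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)  # compile() accepts the AST directly
print(namespace["fact"]())  # 0
print(ast.unparse(tree))    # the function source without `a = 8`
```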
Note however that some kinds of optimization like tail-recursion elimination might leave bytecode in a form that it can't be transformed in a truly faithful way to source code. See my pycon2018 Columbia Lightning Talk for a video I made which eliminates tail recursion in bytecode to get an idea of what I'm talking about here.
If you want to be able to debug and single-step bytecode instructions, see my bytecode interpreter and its bytecode debugger.
