programmatically transform python code via lib2to3


I'd like to transform all occurrences of "assert_equal(a, b)" in a Python module to "assert a == b" using lib2to3 from Python's standard library. I wrote a script that takes the source as input:
# convert.py
# convert assert_equal(a, b) to assert a == b
from lib2to3 import refactor
refac = refactor.RefactoringTool(['fix_assert_equal'])
result = refac.refactor_string('assert_equal(123, 456)\n', 'assert equal')
print(result)
and the actual fixer in a separate module:
# fix_assert_equal.py, in same folder as convert.py
from lib2to3 import fixer_base, pygram, pytree, pgen2
import ast
import logging
grammar = pygram.python_grammar
logger = logging.getLogger("RefactoringTool")
driver = pgen2.driver.Driver(grammar, convert=pytree.convert, logger=logger)
dest_tree = driver.parse_string('assert a == b\n')
class FixAssertEqual(fixer_base.BaseFix):
    BM_compatible = True
    PATTERN = """
    power< 'assert_equal'
        trailer<
            '('
            arglist<
                obj1=any ','
                obj2=any
            >
            ')'
        >
    >
    """
    def transform(self, node, results):
        assert results
        obj1 = results["obj1"]
        obj1 = obj1.clone()
        obj1.prefix = ""
        obj2 = results["obj2"]
        obj2 = obj2.clone()
        obj2.prefix = ""
        prefix = node.prefix
        dest_tree2 = dest_tree.clone()
        node = dest_tree2.children[0].children[0].children[1]
        node.children[0] = obj1
        node.children[2] = obj2
        dest_tree2.prefix = prefix
        return dest_tree2
However this produces the output assert123 ==456 instead of assert 123 == 456. Any idea how to fix this?

I found the solution: obj2.prefix should not be set to "". A node's prefix holds the whitespace that precedes it, so clearing it removes the space before the object.

Related

My Merkle Tree calculation does not match the actual one

I am studying Bitcoin.
https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees
After reading the page above, I implemented the Merkle root calculation in Python.
Using the API below, I collected all transactions in block 641150 and calculated the Merkle root.
https://www.blockchain.com/explorer/api/blockchain_api
The expected value is:
67a637b1c49d95165b3dd3177033adbbbc880f6da3620498d451ee0976d7b1f4
(https://www.blockchain.com/btc/block/641150)
The value I calculated is:
f2a2207a1e8360b75729fd2f23659b1b79b14940b6e4982a985cf6aa6f941ad7
What is wrong?
My Python code is:
from hashlib import sha256
import requests, json
base_url = 'https://blockchain.info/rawblock/'
block_hash = '000000000000000000042cef688cf40b4a70ac814e4222e6646bd6bb79d18168'
end_point = base_url + block_hash
def reverse(hex_be):
    bytes_be = bytes.fromhex(hex_be)
    bytes_le = bytes_be[::-1]
    hex_le = bytes_le.hex()
    return hex_le
def dhash(hash):
    return sha256(sha256(hash.encode('utf-8')).hexdigest().encode('utf-8')).hexdigest()
def culculate_merkle(hash_list):
    if len(hash_list) == 1:
        return dhash(hash_list[0])
    hashed_list = list(map(dhash, hash_list))
    if len(hashed_list) % 2 == 1:
        hashed_list.append(hashed_list[-1])
    parent_hash_list = []
    it = iter(hashed_list)
    for hash1, hash2 in zip(it, it):
        parent_hash_list.append(hash1 + hash2)
    hashed_list = list(map(dhash, hash_list))
    return culculate_merkle(parent_hash_list)
data = requests.get(end_point)
jsondata = json.loads(data.text)
tx_list = list(map(lambda tx_object: tx_object['hash'], jsondata['tx']))
markleroot = '67a637b1c49d95165b3dd3177033adbbbc880f6da3620498d451ee0976d7b1f4'
tx_list = list(map(reverse, tx_list))
output = culculate_merkle(tx_list)
output = reverse(output)
print(output)
Result:
$ python merkleTree.py
f2a2207a1e8360b75729fd2f23659b1b79b14940b6e4982a985cf6aa6f941ad7
I expect the following output:
67a637b1c49d95165b3dd3177033adbbbc880f6da3620498d451ee0976d7b1f4

I solved it myself.
First:
There was a mistake in the dhash calculation method.
Second:
The hashes obtained from the API had already had dhash applied once.
⇒ My code did not account for this and applied dhash to them again in the first round.
The complete source code is summarized in the following blog post (in Japanese):
https://tec.tecotec.co.jp/entry/2022/12/25/000000
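The two points above can be sketched as follows. This is a minimal sketch, not the code from the linked blog post: it assumes the transaction hashes are given in the byte order block explorers display (reversed relative to the internal order), and the name merkle_root is only illustrative. The double SHA-256 is applied to raw bytes rather than to hex text, and it is first applied when pairs are combined, not to the transaction hashes themselves.

```python
from hashlib import sha256


def dhash(data: bytes) -> bytes:
    """Double SHA-256 over raw bytes (not over a hex string)."""
    return sha256(sha256(data).digest()).digest()


def merkle_root(txids_hex):
    """Merkle root of hex transaction hashes given in display byte order.

    Assumption: each input is already a transaction hash, so no extra
    dhash is applied to the leaves.
    """
    if not txids_hex:
        raise ValueError("need at least one transaction hash")
    # Convert display-order hex into internal-order raw bytes.
    level = [bytes.fromhex(t)[::-1] for t in txids_hex]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        level = [dhash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    # Convert back to display order.
    return level[0][::-1].hex()
```

With a single transaction the root is that transaction's hash; with an odd number of hashes the last one is paired with itself.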

A field in a class doesn't initialize in a Python function

I created a function that checks whether a path is a directory.
The path depends on a class field named "wafer_num".
The function returns False, because the path it gets is:
/usr/local/insight/results/images/toolsDB/lauto_ptest_s2022-08-09/w<class 'int'>
instead of /usr/local/insight/results/images/toolsDB/lauto_ptest_s2022-08-09/w followed by the integer.
The whole path is constant up to the integer that follows the 'w' character.
For example, if the wafer_num field is 2, then the path will be:
/usr/local/insight/results/images/toolsDB/lauto_ptest_s2022-08-09/w2
I'm initializing the class from another file. The initialization works, because many other functions in my code depend on this field.
I'm attaching the function that is causing trouble, along with the class. Many thanks:
# This is the other file that call a constructive function in my code:
import NR_success_check_BBs as check_BBs
# list of implemented recipes:
sanity = 'sanity_' + time_now + '_Ver_5.1'
R2M_delayer = 'R2M_E2E_delayer_NR_Ver_5.1'
R2M_cross_section = 'R2M_E2E_NR_Ver_5.1'
# ***********************************************************************************************
# ***********************************************************************************************
# the function set_recipe_name will create an object that contain all the data of that recipe
check_BBs.set_recipe_name(R2M_delayer)
# this function contain all the necessary functions to check the object.
check_BBs.success_check()
# EOF
# -------------------------------------------------------
# This is the code file [with constructive function attached] :
import datetime
import logging
import os.path
from pathlib import Path
import time
time_now = str(datetime.datetime.now()).split()[0]
log_file_name_to_write = 'test.log'
log_file_name_to_read = 'NR_' + time_now + '.log'
logging.basicConfig(level=logging.INFO, filename=log_file_name_to_write, filemode='a',format='%(asctime)s - %(levelname)s - %(message)s')
R2M_RecipeRun_path = '/usr/local/disk2/unix_nt/R2M/RecipeRun/'
R2M_dir = '/usr/local/disk2/unix_nt/R2M'
metro_callback_format = 'Metro_Reveal_'
class recipe:
    name = ''
    script_name = ''
    wafer_num = int
    images = 0
    metrology_images = 0
images_output_dir_path = '/usr/local/insight/results/images/toolsDB/lauto_ptest_s/' + time_now + 'w' + str(recipe.wafer_num)
def set_recipe_name(name):
    recipe.name = name
    logging.info("Analyzing the recipe: " + recipe.name)
    if recipe.name == 'sanity_' + time_now + '_Ver_5.1':
        recipe.script_name = 'Sanity_CS.py'
        recipe.wafer_num = 1
    elif recipe.name == 'R2M_E2E_NR_Ver_5.1':
        recipe.script_name = 'R2M.py'
        recipe.wafer_num = 2
    elif recipe.name == 'R2M_E2E_delayer_NR_Ver_5.1':
        recipe.script_name = 'R2M_delayer.py'
        recipe.wafer_num = 3
# ***********************************************************************************************
# ***********************************************************************************************
# This is the function that makes my trouble:
def is_results_images_directory_exist():
    images_directory_path = Path('/usr/local/insight/results/images/toolsDB/lauto_ptest_s' + time_now + '/w' + wafer_num)
    print(images_directory_path)
    is_directory = os.path.isdir(images_directory_path)
    print(is_directory)
    if not is_directory:
        logging.error("There is no images directory for the current recipe at results")
        return False
    return True
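A possible cause, assuming the path string is built when the module is imported: at that point wafer_num still holds the type int itself, and str(int) is "<class 'int'>", which matches the path shown in the question. A minimal sketch that builds the path only when it is needed, after the recipe has been set, might look like this. The names Recipe, set_wafer_num and images_directory are illustrative and not from the original code, and PurePosixPath is used only so the sketch behaves the same on any operating system.

```python
from pathlib import PurePosixPath

# Base path copied from the question; the date string is appended to it.
BASE = '/usr/local/insight/results/images/toolsDB/lauto_ptest_s'


class Recipe:
    # None marks "not set yet"; assigning the type `int` here would make
    # str(Recipe.wafer_num) evaluate to "<class 'int'>".
    wafer_num = None


def set_wafer_num(num):
    """Hypothetical stand-in for the assignment done in set_recipe_name()."""
    Recipe.wafer_num = num


def images_directory(date_str):
    """Build the path at call time so it sees the current wafer_num."""
    if Recipe.wafer_num is None:
        raise ValueError("wafer_num has not been set yet")
    return PurePosixPath(BASE + date_str) / ('w' + str(Recipe.wafer_num))


set_wafer_num(3)
print(images_directory('2022-08-09'))
```

The design choice here is to compute the path inside a function instead of storing it in a module-level variable, so later changes to wafer_num are always reflected.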

Can I use abstract methods to import file-specific formatting of (Python) pandas data?

I have a class FileSet with a method _process_series, which contains a bunch of if-elif blocks doing filetag-specific formatting of different pandas.Series:
elif filetag == "EntityA":
    ps[filetag+"_Id"] = str(ps[filetag+"_Id"]).strip()
    ps[filetag+"_DateOfBirth"] = str(pd.to_datetime(ps[filetag+"_DateOfBirth"]).strftime('%Y-%m-%d')).strip()
    ps[filetag+"_FirstName"] = str(ps[filetag+"_FirstName"]).strip().capitalize()
    ps[filetag+"_LastName"] = str(ps[filetag+"_LastName"]).strip().capitalize()
    ps[filetag+"_Age"] = relativedelta(datetime.today(), datetime.strptime(ps[filetag+"_DateOfBirth"], "%Y-%m-%d")).years
return ps
I'd like to define an abstract format method in the class and keep these blocks of formatting in separate modules that are imported when _process_series is called for a given filetag. Forgive the pseudo-code, but something like:
for tag in filetag:
    from my_formatters import tag+'_formatter' as fmt
    ps = self.format(pandas_series, fmt)
return ps
And the module would contain the formatting block:
# my_formatters.EntityA_formatter
ps[filetag+"_Id"] = str(ps[filetag+"_Id"]).strip()
ps[filetag+"_DateOfBirth"] = str(pd.to_datetime(ps[filetag+"_DateOfBirth"]).strftime('%Y-%m-%d')).strip()
ps[filetag+"_FirstName"] = str(ps[filetag+"_FirstName"]).strip().capitalize()
ps[filetag+"_LastName"] = str(ps[filetag+"_LastName"]).strip().capitalize()
ps[filetag+"_Age"] = relativedelta(datetime.today(), datetime.strptime(ps[filetag+"_DateOfBirth"], "%Y-%m-%d")).years
return ps

You can put a function in its own .py file and import it. If each file defines a function with the same name, you can call it the same way whichever module you loaded.
Here is f1.py:
def gimme():
    return 'format 1'
Here is f2.py:
def gimme():
    return 'format 2'
Then your main file:
module_names = ['f1','f2']
for module_name in module_names:
    import_test = __import__(module_name)
    result = import_test.gimme()
    print(result)
Which gives the output:
format 1
format 2

Why not use a star import together with globals():
from my_formatters import *
for tag in filetag:
    fmt = globals()[tag + '_formatter']
    ps = self.format(pandas_series, fmt)
return ps
I converted your pseudocode to real code.
From the globals documentation:
Return a dictionary representing the current global symbol table. This is always the dictionary of the current module (inside a function or method, this is the module where it is defined, not the module from which it is called).

Your pseudocode could be made into real code like so:
import my_formatters
for tag in filetag:
    fmt = getattr(my_formatters, tag + '_formatter')
    ps = self.format(pandas_series, fmt)
return ps
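The getattr lookup can also be combined with importlib.import_module, which loads a module by name at run time. The following is a minimal sketch under stated assumptions: the my_formatters module is simulated in memory so the example runs on its own, a plain dict stands in for the pandas Series, and get_formatter is a hypothetical helper name.

```python
import importlib
import sys
import types

# Stand-in for a real my_formatters.py file (assumption for this sketch).
_module = types.ModuleType("my_formatters")


def EntityA_formatter(record):
    """Hypothetical formatter: tidy up the first-name field of a record."""
    out = dict(record)
    out["EntityA_FirstName"] = str(out["EntityA_FirstName"]).strip().capitalize()
    return out


_module.EntityA_formatter = EntityA_formatter
sys.modules["my_formatters"] = _module


def get_formatter(tag, module_name="my_formatters"):
    """Look up '<tag>_formatter' in the named module."""
    module = importlib.import_module(module_name)
    try:
        return getattr(module, tag + "_formatter")
    except AttributeError:
        raise KeyError("no formatter for filetag %r" % tag) from None


fmt = get_formatter("EntityA")
print(fmt({"EntityA_FirstName": "  alice "}))
```

Raising KeyError for an unknown filetag keeps the failure explicit instead of silently skipping the formatting step.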

Fixing faulty unicode strings

A faulty unicode string is one that has accidentally encoded bytes in it.
For example:
Text: שלום, Windows-1255-encoded: \x99\x8c\x85\x8d, Unicode: u'\u05e9\u05dc\u05d5\u05dd', Faulty Unicode: u'\x99\x8c\x85\x8d'
I sometimes bump into such strings when parsing ID3 tags in MP3 files. How can I fix these strings? (e.g. convert u'\x99\x8c\x85\x8d' into u'\u05e9\u05dc\u05d5\u05dd')

You could convert u'\x99\x8c\x85\x8d' to '\x99\x8c\x85\x8d' using the latin-1 encoding:
In [9]: x = u'\x99\x8c\x85\x8d'
In [10]: x.encode('latin-1')
Out[10]: '\x99\x8c\x85\x8d'
However, it seems like this is not a valid Windows-1255-encoded string. Did you perhaps mean '\xf9\xec\xe5\xed'? If so, then
In [22]: x = u'\xf9\xec\xe5\xed'
In [23]: x.encode('latin-1').decode('cp1255')
Out[23]: u'\u05e9\u05dc\u05d5\u05dd'
converts u'\xf9\xec\xe5\xed' to u'\u05e9\u05dc\u05d5\u05dd' which matches the desired unicode you posted.
If you really want to convert u'\x99\x8c\x85\x8d' into u'\u05e9\u05dc\u05d5\u05dd', then this happens to work:
In [27]: u'\x99\x8c\x85\x8d'.encode('latin-1').decode('cp862')
Out[27]: u'\u05e9\u05dc\u05d5\u05dd'
The above encoding/decoding chain was found using this script:
guess_chain_encodings.py
"""
Usage example: guess_chain_encodings.py "u'баба'" "u'\xe1\xe0\xe1\xe0'"
"""
import six
import argparse
import binascii
import zlib
import utils_string as us
import ast
import collections
import itertools
import random
encodings = us.all_encodings()
Errors = (IOError, UnicodeEncodeError, UnicodeError, LookupError,
          TypeError, ValueError, binascii.Error, zlib.error)
def breadth_first_search(text, all = False):
    seen = set()
    tasks = collections.deque()
    tasks.append(([], text))
    while tasks:
        encs, text = tasks.popleft()
        for enc, newtext in candidates(text):
            if repr(newtext) not in seen:
                if not all:
                    seen.add(repr(newtext))
                newtask = encs+[enc], newtext
                tasks.append(newtask)
                yield newtask
def candidates(text):
    f = text.encode if isinstance(text, six.text_type) else text.decode
    results = []
    for enc in encodings:
        try:
            results.append((enc, f(enc)))
        except Errors as err:
            pass
    random.shuffle(results)
    for r in results:
        yield r
def fmt(encs, text):
    encode_decode = itertools.cycle(['encode', 'decode'])
    if not isinstance(text, six.text_type):
        next(encode_decode)
    chain = '.'.join( "{f}('{e}')".format(f = func, e = enc)
                     for enc, func in zip(encs, encode_decode) )
    return '{t!r}.{c}'.format(t = text, c = chain)
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('start', type = ast.literal_eval, help = 'starting unicode')
    parser.add_argument('stop', type = ast.literal_eval, help = 'ending unicode')
    parser.add_argument('--all', '-a', action = 'store_true')
    args = parser.parse_args()
    min_len = None
    for encs, text in breadth_first_search(args.start, args.all):
        if min_len is not None and len(encs) > min_len:
            break
        if type(text) == type(args.stop) and text == args.stop:
            print(fmt(encs, args.start))
            min_len = len(encs)
if __name__ == '__main__':
    main()
Running
% guess_chain_encodings.py "u'\x99\x8c\x85\x8d'" "u'\u05e9\u05dc\u05d5\u05dd'" --all
yields
u'\x99\x8c\x85\x8d'.encode('latin_1').decode('cp862')
u'\x99\x8c\x85\x8d'.encode('charmap').decode('cp862')
u'\x99\x8c\x85\x8d'.encode('rot_13').decode('cp856')
etc.
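For Python 3, where every str is already Unicode, the same two repairs can be sketched as below. The helper name repair_mojibake and its parameter names are illustrative, and the sketch assumes the bytes were wrongly decoded as latin-1, as in the examples above.

```python
def repair_mojibake(text, wrong="latin-1", right="cp1255"):
    """Undo a wrong decode: recover the original bytes, then decode properly.

    `wrong` is the codec assumed to have been applied by mistake and
    `right` is the codec the bytes were really written in.
    """
    return text.encode(wrong).decode(right)


# Windows-1255 bytes that were read as latin-1.
print(repair_mojibake('\xf9\xec\xe5\xed'))
# The string from the question, which round-trips through cp862 instead.
print(repair_mojibake('\x99\x8c\x85\x8d', right='cp862'))
```

Both calls print the same Hebrew word, u'\u05e9\u05dc\u05d5\u05dd'. latin-1 works as the intermediate step because it maps every code point from 0 to 255 to the byte with the same value.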

Parse xml file while a tag is missing

I am trying to parse an XML file. The text inside tags is parsed successfully (or so it seems), but I also want to output the text that is not enclosed in any of those tags, and the following program just ignores it.
from xml.etree.ElementTree import XMLTreeBuilder
class HtmlLatex: # The target object of the parser
    out = ''
    var = ''
    def start(self, tag, attrib): # Called for each opening tag.
        pass
    def end(self, tag): # Called for each closing tag.
        if tag == 'i':
            self.out += self.var
        elif tag == 'sub':
            self.out += '_{' + self.var + '}'
        elif tag == 'sup':
            self.out += '^{' + self.var + '}'
        else:
            self.out += self.var
    def data(self, data):
        self.var = data
    def close(self):
        print(self.out)
if __name__ == '__main__':
    target = HtmlLatex()
    parser = XMLTreeBuilder(target=target)
    text = ''
    with open('input.txt') as f1:
        text = f1.read()
    print(text)
    parser.feed(text)
    parser.close()
A part of the input I want to parse:
<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+(2<i>l</i><sub>2</sub>+<i>l</i><sub>1</sub>) <i>m</i><sup>2</sup>+(<i>l</i><sub>2</sub><sup>2</sup>+2<i>l</i><sub>1</sub> <i>l</i><sub>2</sub>+<i>l</i><sub>1</sub><sup>2</sup>) <i>m</i>) /(<i>m</i><sup>3</sup>+(3<i>l</i><sub>2</sub>+2<i>l</i><sub>1</sub>) ) }.</p>

Have a look at BeautifulSoup, a Python library for parsing, navigating and manipulating HTML and XML. It has a handy interface and might solve your problem.

Here's a pyparsing version - I hope the comments are sufficiently explanatory.
src = """<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+(2<i>l</i><sub>2</sub>+<i>l</i><sub>1</sub>) """ \
"""<i>m</i><sup>2</sup>+(<i>l</i><sub>2</sub><sup>2</sup>+2<i>l</i><sub>1</sub> <i>l</i><sub>2</sub>+""" \
"""<i>l</i><sub>1</sub><sup>2</sup>) <i>m</i>) /(<i>m</i><sup>3</sup>+(3<i>l</i><sub>2</sub>+""" \
"""2<i>l</i><sub>1</sub>) ) }.</p>"""
from pyparsing import makeHTMLTags, anyOpenTag, anyCloseTag, Suppress, replaceWith
# set up tag matching for <sub> and <sup> tags
SUB,endSUB = makeHTMLTags("sub")
SUP,endSUP = makeHTMLTags("sup")
# all other tags will be suppressed from the output
ANY,endANY = map(Suppress,(anyOpenTag,anyCloseTag))
SUB.setParseAction(replaceWith("_{"))
SUP.setParseAction(replaceWith("^{"))
endSUB.setParseAction(replaceWith("}"))
endSUP.setParseAction(replaceWith("}"))
transformer = (SUB | endSUB | SUP | endSUP | ANY | endANY)
# now use the transformer to apply these transforms to the input string
print(transformer.transformString(src))
Gives
p_{0} = (m^{3}+(2l_{2}+l_{1}) m^{2}+(l_{2}^{2}+2l_{1} l_{2}+l_{1}^{2}) m) /(m^{3}+(3l_{2}+2l_{1}) ) }.
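The same conversion can also be sketched with the standard library alone. In xml.etree.ElementTree, text that follows a closing tag is stored in that element's tail attribute, so the text outside the tags is not lost as long as both text and tail are read. This is a minimal sketch on a shortened input, and the function name to_latex is illustrative.

```python
import xml.etree.ElementTree as ET


def to_latex(elem):
    """Flatten an element to text, wrapping <sub> and <sup> content in LaTeX."""
    # elem.text is the text before the first child; each child's tail is
    # the text between that child's closing tag and the next tag.
    inner = (elem.text or "") + "".join(
        to_latex(child) + (child.tail or "") for child in elem
    )
    if elem.tag == "sub":
        return "_{" + inner + "}"
    if elem.tag == "sup":
        return "^{" + inner + "}"
    return inner  # <p>, <i> and any other tag: keep only the content


src = "<p><i>p</i><sub>0</sub> = (<i>m</i><sup>3</sup>+2<i>l</i><sub>1</sub>)</p>"
print(to_latex(ET.fromstring(src)))  # prints: p_{0} = (m^{3}+2l_{1})
```

Building the whole tree first avoids the problem in the question, where each call to data() overwrote the previously stored text.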
