I want to write a tool in Python to prepare a simulation study by creating for each simulation run a folder and a configuration file with some run-specific parameters.
study/
study.conf
run1
run.conf
run2
run.conf
The tool should read the overall study configuration from a file including (1) static parameters (key-value pairs), (2) lists for iteration parameters, and (3) some small code snippets to calculate further parameters from the previous ones. The latter are run specific depending on the permutation of the iteration parameters used.
Before writing the run.conf files from a template, I need to run some code like this to determine the specific key-value pairs from the code snippets for that run
code = compile(code_str, 'foo.py', 'exec')
rv=eval(code, context, { })
However, as this is confirmed by the Python documentation, this just leads to a None as return value.
The code string and context dictionary in the example are filled elsewhere. For this discussion, this snippet should do it:
code_str="""import math
math.sqrt(width**2 + height**2)
"""
context = {
'width' : 30,
'height' : 10
}
I have done this before in Perl and Java+JavaScript. There, you just give the code snippet to some evaluation function or script engine and get in return a value (object) from the last executed statement -- not a big issue.
Now, in Python I struggle with the fact that eval() is too narrow just allowing one statement and exec() doesn't return values in general. I need to import modules and sometimes do some slightly more complex calculations, e.g., 5 lines of code.
Isn't there a better solution that I don't see at the moment?
During my research, I found some very good discussions about Pyhton eval() and exec() and also some tricky solutions to circumvent the issue by going via the stdout and parsing the return value from there. The latter would do it, but is not very nice and already 5 years old.
The exec function will modify the global parameter (dict) passed to it. So you can use the code below
code_str="""import math
Result1 = math.sqrt(width**2 + height**2)
"""
context = {
'width' : 30,
'height' : 10
}
exec(code_str, context)
print (context['Result1']) # 31.6
Every variable code_str created will end up with a key:value pair in the context dictionary. So the dict is the "object" like you mentioned in JavaScript.
Edit1:
If you only need the result of the last line in code_str and try to prevent something like Result1=..., try the below code
code_str="""import math
math.sqrt(width**2 + height**2)
"""
context = { 'width' : 30, 'height' : 10 }
lines = [l for l in code_str.split('\n') if l.strip()]
lines[-1] = '__myresult__='+lines[-1]
exec('\n'.join(lines), context)
print (context['__myresult__'])
This approach is not as robust as the former one, but should work for most case. If you need to manipulate the code in a sophisticated way, please take a look at the Abstract Syntax Trees
Since this whole exec() / eval() thing in Python is a bit weird ... I have chose to re-design the whole thing based on a proposal in the comments to my question (thanks #jonrsharpe).
Now, the whole study specification is a .py module that the user can edit. From there, the configuration setup is directly written to a central object of the whole package. On tool runs, the configuration module is imported using the code below
import imp
# import the configuration as a module
(path, name) = os.path.split(filename)
(name, _) = os.path.splitext(name)
(file, filename, data) = imp.find_module(name, [path])
try:
module = imp.load_module(name, file, filename, data)
except ImportError as e:
print(e)
sys.exit(1)
finally:
file.close()
I came across similar needs, and finally figured out a approach by playing with ast:
import ast
code = """
def tf(n):
return n*n
r=tf(3)
{"vvv": tf(5)}
"""
ast_ = ast.parse(code, '<code>', 'exec')
final_expr = None
for field_ in ast.iter_fields(ast_):
if 'body' != field_[0]: continue
if len(field_[1]) > 0 and isinstance(field_[1][-1], ast.Expr):
final_expr = ast.Expression()
final_expr.body = field_[1].pop().value
ld = {}
rv = None
exec(compile(ast_, '<code>', 'exec'), None, ld)
if final_expr:
rv = eval(compile(final_expr, '<code>', 'eval'), None, ld)
print('got locals: {}'.format(ld))
print('got return: {}'.format(rv))
It'll eval instead of exec the last clause if it's an expression, or have all execed and return None.
Output:
got locals: {'tf': <function tf at 0x10103a268>, 'r': 9}
got return: {'vvv': 25}
Related
Example is better than long speech:
import inspect
def foo(param1, lambda_ref):
_ = param1
print(str(inspect.getsource(lambda_ref)))
foo(param1=0,
lambda_ref=lambda:
1 +
2)
output :
lambda_ref=lambda:
The wanted output would be :
foo(lambda_ref=lambda:
1 +
2)
Or just the lambda code itself, but if one get all lines at least its possible to just keep whats needed. It works if param0 is not here.
Obviously in my case lambda code is much more longer so that its easier to read and understand if a line break is inserted just after the lambda: keyword. Is that a bug ? get_source() doc is pretty vague about corner cases.
EDIT:
bug is still there if getsourceslines() is used instead of getsource(). the output is :
([' lambda_ref=lambda:\n'], 10)
Find out by myself that the inspect package consider that "lambdas always end at the first NEWLINE" line 957 of inspect.py. Cannot tell if its a python rule or over interpreted by inspect.
I'm using Python's xml.parsers.expat to parse some xml data, but sometimes the tags doesn't seem to be parsed correctly. I wonder if anyone knows if why setting buffer_text and / or buffer_size might actually change whether the data body gets parsed correctly?
A sanitized example, used to parse Azure blob list:
from xml.parsers.expat import ParserCreate, ExpatError, errors
class MyParser:
def __init__(self, xmlstring):
self.current_blob_name = None
self.current_blob_hash = None
self.get_hash = False
self.get_filename = False
self.xmlstring = xmlstring
self.p = ParserCreate()
#### LINE A ####
self.p.buffer_text = True
#### LINE B ####
# self.p.buffer_size = 24
self.p.StartElementHandler = self._start_element
self.p.EndElementHandler = self._end_element
self.p.CharacterDataHandler = self._char_data
def _reset_current_blob(self):
self.current_blob_name = None
self.current_blob_hash = None
# 3 handler functions
def _start_element(self, name, attrs):
if name == 'Blob':
self._reset_current_blob()
elif name == 'Name':
self.get_filename = True
elif name == 'Content-MD5':
self.get_hash = True
def _end_element(self, name):
if name == 'Blob':
...
elif name == 'Name':
self.get_filename = False
elif name == 'Content-MD5':
self.get_hash = False
def _char_data(self, data):
if self.get_hash is True:
self.current_blob_hash = data
if self.get_filename is True:
try:
...
except Exception as e:
print('Error parsing blob name from Azure.')
print(e)
def run(self):
self.p.Parse(self.xmlstring)
Most of the time things work just fine. However, sometimes after I upload a new item to Azure and that the xml returned changes, The error "Error parsing blob name from Azure" will get triggered, since some operations in the try block gets unexpected data. Upon further inspection, I observe that one potential error happens when data is supposed to be abcdefg_123/hijk/test.jpg, only abcdefg got passed into _char_data, while _123/hijk/test.jpg gets passed into another item. There is nothing wrong with the xml returned from Azure though.
I've experimented with a few things:
Added LINE A (buffer_text = True). It fixed the issue this time but I do not know if it is just luck (e.g. whether it would break again if I upload a new item to Azure);
If both LINE A and LINE B are added - things break again (I've also experimented with different buffer sizes; all seem to break the parser);
The way it breaks is deterministic: I can rerun the code all I want and it is always the exact element that gets parsed incorrectly;
If I remove that element from the original xml data, depending on how much other data I remove along the way, it could lead to another element being parsed incorrectly, or it could lead to a "fix" / no error. It does look like it has something to do with the total length of the data returned from Azure. My data is quite long (3537860 characters at this moment), and if I remove a fixed length from anywhere in the xml data, once the length reach a certain value, no more errors occur.
Unfortunately I am not able to share the raw xml I am parsing since it would take too much effort to sanitize, but hopefully this contains enough information! Let me know if I can add anything else.
Thanks!
Thanks to Walter:
https://bugs.python.org/msg385104
Just a guess, but the buffer size might be so small that the text that you expect gets passed via two calls to _char_data(). You should refactor your code the simply collect all the text in _char_data() and act on it in the _end_element() handler.
So this probably isn't a bug in xml.parsers.expat.
This fix is working for me so far. What I did not understand is that I thought the default buffer was 24, but if I set it to say 48 (a larger buffer), I would again see the issue (before this fix). In any case, collecting all data in the _char_data() section does make sense to me.
So, I have a function which basically does this:
import os
import json
import requests
from openpyxl import load_workbook
def function(data):
statuslist = []
for i in range(len(data[0])):
result = performOperation(data[0][i])
if result in satisfying_results:
print("its okay")
statuslist.append("Pass")
else:
print("absolutely not okay")
statuslist.append("Fail" + result)
return statuslist
Then, I invoke the function like this (I've added error handling to check what will happen after stumbling upon the reason for me asking this question), and was actually amazed by the results, as the function returns None, and then executes:
statuslist = function(data)
print(statuslist)
try:
for i in range(len(statuslist)):
anotherFunction(i)
print("Confirmation that it is working")
except TypeError:
print("This is utterly nonsense I think")
The output of the program is then as follows:
None
This is utterly nonsense I think
its okay
its okay
its okay
absolutely not okay
its okay
There is only single return statement at the end of the function, the function is not recursive, its pretty straightforward and top-down(but parses a lot of data in the meantime).
From the output log, it appears that the function first returns None, and then is properly executed. I am puzzled, and I were unable to find any similar problems over the internet (maybe I phrase the question incorrectly).
Even if there were some inconsistency in the code, I'd still expect it to return [] instead.
After changing the initial list to statuslist = ["WTF"], the return is [].
To rule out the fact that I have modified the list in some other functions performed in the function(data), I have changed the name of the initial list several times - the results are consistently beyond my comprehension
I will be very grateful on tips in debugging the issue. Why does the function return the value first, and is executed after?
While being unable to write the code which would at the same time present what happened in my code in full spectrum, be readable, and wouldn't interfere with no security policies of the company, I have re-wrote it in a simpler form (the original code has been written while I had 3 months of programming experience), and the issue does not reproduce anymore. I guess there had be some level of nesting of functions that I have misinterpreted, and this re-written code, doing pretty much the same, correctly returns me the expected list.
Thank you everyone for your time and suggestions.
So, the answer appears to be: You do not understand your own code, make it simpler.
Just trying to learn and I"m wondering if multiprocessing would speed
up this for loop ,.. trying to compare
alexa_white_list(1,000,000 lines) and
dnsMISP (can get up to 160,000 lines)
Code checks each line in dnsMISP and looks for it in alexa_white_list.
if it doesn't see it, it adds it to blacklist.
Without mp_handler function the code works fine but it takes
around 40-45 minutes. For brevity, I've omitted all the other imports and
the function that pulls down and unzips the alexa white list.
The below gives me the following error -
File "./vetdns.py", line 128, in mp_handler
p.map(dns_check,dnsMISP,alexa_white_list)
NameError: global name 'dnsMISP' is not defined
from multiprocessing import Pool
def dns_check():
awl = []
blacklist = []
ctr = 0
dnsMISP = open(INPUT_FILE,"r")
dns_misp_lines = dnsMISP.readlines()
dnsMISP.close()
alexa_white_list = open(outname, 'r')
alexa_white_list_lines = alexa_white_list.readlines()
alexa_white_list.close()
print "converting awl to proper format"
for line in alexa_white_list_lines:
awl.append(".".join(line.split(".")[-2:]).strip())
print "done"
for host in dns_misp_lines:
host = host.strip()
host = ".".join(host.split(".")[-2:])
if not host in awl:
blacklist.append(host)
file_out = open(FULL_FILENAME,"w")
file_out.write("\n".join(blacklist))
file_out.close()
def mp_handler():
p = Pool(2)
p.map(dns_check,dnsMISP,alexa_white_list)
if __name__ =='__main__':
mp_handler()
If I label it as global etc I still get the error. I'd appreciate any
suggestions!!
There's no need for multiprocessing here. In fact this code can be greatly simplified:
def get_host_form_line(line):
return line.strip().split(".", 1)[-1]
def dns_check():
with open('alexa.txt') as alexa:
awl = {get_host_from_line(line) for line in alexa}
blacklist = []
with open(INPUT_FILE, "r") as dns_misp_lines:
for line in dns_misp_lines:
host = get_host_from_line(line)
if host not in awl:
blacklist.append(host)
with open(FULL_FILENAME,"w") as file_out:
file_out.write("\n".join(blacklist))
Using a set comprehension to create your Alexa collection has the advantage of being O(1) lookup time. Sets are similar to dictionaries. They are pretty much dictionaries that only have keys with no values. There is some additional overhead in memory and the initial creation time will likely be slower since the values you put in to a set need to be hashed and hash collisions dealt with but the increase in performance you gain from the faster look ups should make up for it.
You can also clean up your line parsing. split() takes an additional parameter that will limit the number of times the input is split. I'm assuming your lines look something like this:
http://www.something.com and you want something.com (if this isn't the case let me know)
It's important to remember that the in operator isn't magic. When you use it to check membership (is an element in the list) what it's essentially doing under the hood is this:
for element in list:
if element == input:
return True
return False
So every time in your code you did if element in list your program had to iterate across each element until it either found what you were looking for or got to the end. This was probably the biggest bottleneck of your code.
You tried to read a variable named dnsMISP to pass as an argument to Pool.map. It doesn't exist in local or global scope (where do you think it's coming from?), so you got a NameError. This has nothing to do with multiprocessing; you could just type a line with nothing but:
dnsMISP
and have the same error.
First off, I found the following two similar questions:
Passing Structure to Windows API in python ctypes
ctypes and passing a by reference to a function
The first does not have an accepted answer, and I do not think that I'm doing anything in separate processes. The second simply points out pointer() and byref(), both of which I have tried using to no avail.
Now, on to my question:
I am trying to call the function WERReportCreate with my own pReportInformation (which is a pointer to a struct whose first data value is its own size). This fails in various ways, depending on how I go about it, but I'm not sure how to do it correctly. It is complicated by the fact that one of the requirements is that the structure know it's own size, which I'm not sure how to programatically determine (though if that was the only issue, I think I would have guessed the right value by now). The relevant information from the WER API is shown below:
HRESULT WINAPI WerReportCreate(
__in PCWSTR pwzEventType,
__in WER_REPORT_TYPE repType,
__in_opt PWER_REPORT_INFORMATION pReportInformation,
__out HREPORT *phReportHandle
);
(full info at http://msdn.microsoft.com/en-us/library/windows/desktop/bb513625%28v=vs.85%29.aspx)
typedef struct _WER_REPORT_INFORMATION {
DWORD dwSize;
HANDLE hProcess;
WCHAR wzConsentKey[64];
WCHAR wzFriendlyEventName[128];
WCHAR wzApplicationName[128];
WCHAR wzApplicationPath[MAX_PATH];
WCHAR wzDescription[512];
HWND hwndParent;
} WER_REPORT_INFORMATION, *PWER_REPORT_INFORMATION;
(full info at http://msdn.microsoft.com/en-us/library/windows/desktop/bb513637%28v=vs.85%29.aspx)
This is the code that I have tried:
import ctypes
import ctypes.wintypes
class ReportInfo( ctypes.Structure):
_fields_ = [ ("dwSize", ctypes.wintypes.DWORD),
("hProcess", ctypes.wintypes.HANDLE),
("wzConsentKey", ctypes.wintypes.WCHAR * 64),
("wzFriendlyEventName", ctypes.wintypes.WCHAR * 128),
("wzApplicationName", ctypes.wintypes.WCHAR * 128),
("wzApplicationPath", ctypes.wintypes.WCHAR * ctypes.wintypes.MAX_PATH),
("wzDescription", ctypes.wintypes.WCHAR * 512),
("hwndParent", ctypes.wintypes.HWND) ]
def genReportInfo():
import os
size = 32 #Complete SWAG, have tried many values
process = os.getpid()
parentwindow = ctypes.windll.user32.GetParent(process)
werreportinfopointer = ctypes.POINTER(ReportInfo)
p_werinfo = werreportinfopointer()
p_werinfo = ReportInfo(size, process, "consentkey", "friendlyeventname", "appname", "apppath", "desc", parentwindow)
return p_werinfo
if __name__ == '__main__':
reporthandle = ctypes.wintypes.HANDLE()
res = ctypes.wintypes.HRESULT()
### First pass NULL in as optional parameter to get default behavior ###
p_werinfo = None
res = ctypes.windll.wer.WerReportCreate(u'pwzEventType', 2, p_werinfo, ctypes.byref(reporthandle))
print "Return Code",res,"\nHandle",reporthandle #Return Code 0, reporthandle is correct (verified by submitting report in a different test)
p_werinfo = genReportInfo() # Create our own struct
### Try Again Using Our Own Struct (via 'byref') ###
res = ctypes.windll.wer.WerReportCreate(u'pwzEventType', 2, ctypes.byref(p_werinfo), ctypes.byref(reporthandle))
print "Return Code",res,"\nHandle",reporthandle #Return Code Nonzero, reporthandle is None
### Try Again Using Our Own Struct (directly) ###
res = ctypes.windll.wer.WerReportCreate(u'pwzEventType', 2, p_werinfo, ctypes.byref(reporthandle))
print "Return Code",res,"\nHandle",reporthandle #Exception Occurs, Execution halts
And this is the output I get:
Return Code 0
Handle c_void_p(26085328)
Return Code -2147024809
Handle c_void_p(None)
Traceback (most recent call last):
File "test.py", line 40, in <module>
res = ctypes.windll.wer.WerReportCreate(u'pwzEventType', s.byref(reporthandle))
WindowsError: exception: access violation writing 0x0000087C
The fact that it works when I pass in a null, but not when I actually pass my (reference to my?) structure suggests to me I have one of three problems: I am not creating the structure correctly (I'm not certain the wzConsentKey is correctly defined), or I am not correctly figuring out the struct's size (I'm actually using struct.calcsize with various options to get initial guesses, and adding and subtracting 1 randomly), or I am not correctly passing the (reference to the?) structure.
Here is where I've hit a deadend. Any help would be appreciated (as well as suggestions for how to improve the clarity, formatting, or quality of my question; this is my first post).
The general short answer to the posted question is: Be sure you are putting the correct information into the structure, other than that, the provided code is a good example of creating and passing a structure. Here's what solved my problem specifically:
There were two issues with the provided code: First, as Mark Tolonen pointed out, I was passing an incorrect size. Using ctypes.sizeof(ReportInfo) solved that problem. The second issue was that I was using a process ID where a process handle was required. Using OpenProcess to obtain a valid process handle in place of my "process" argument solved the second problem.
To anyone debugging similar issues in the future, printing the HRESULTS out as hex numbers rather than integers to make better sense of the return codes:
print "Return Code %08x" % (res & 0xffffffff)
This, in my case, produced the following results:
Return Code 80070057
for my original error, and
Return Code 80070006
for the second error. Using the information at http://msdn.microsoft.com/en-us/library/bb446131.aspx , I saw the first half was metadata, the second half was my actual error code.
After converting the Error Code part of the Hex number back to decimal, I used http://msdn.microsoft.com/en-us/library/bb202810.aspx to determine that
Error Code 87 (57 in hex) meant "Parameter Incorrect" (size was wrong)
and
Error Code 6 (6 in hex) meant "The Handle is Invalid" (I was passing in a process ID).
You can use ctypes.sizeof(ReportInfo) to obtains the size in bytes of the structure.
Simply create the ReportInfo instance with genReportInfo. You don't need a pointer at this point:
def genReportInfo():
import os
size = ctypes.sizeof(ReportInfo)
process = os.getpid()
parentwindow = ctypes.windll.user32.GetParent(process)
return ReportInfo(size, process, "consentkey", "friendlyeventname", "appname", "apppath", "desc", parentwindow)
Call WerReportCreate like this. byref passes the pointer to the ReportInfo instance.
werinfo = genReportInfo()
res = ctypes.windll.wer.WerReportCreate(u'pwzEventType', 2, ctypes.byref(werinfo), ctypes.byref(reporthandle))
I think that will work for you. I don't have wer.dll so I can't test it.