I am trying to write a program that initializes a dictionary by web scraping, then serializes the dictionary using pickle so that the program doesn't need to scrape again after the first run. My issue is that after calling pickle.dump(someDict, dic_file), no data is written to the file and the program terminates. The code I am using is below:
if not Path(getcwd() + "\\dic.pickle").is_file():
    someDict = funcToScrapeDict()
    with open("dic.pickle", "wb") as dic_file:
        pickle.dump(someDict, dic_file)
else:
    with open("dic.pickle", "rb") as dic_file:
        someDict = pickle.load(dic_file)
<<lots more code here processing someDict>>
So if the pickle file already exists it will just jump to unpickling.
I know that my scraping function works inside the if branch because I test-printed its result immediately before calling pickle.dump(someDict, dic_file); termination happens immediately after that call, with no bytes written to the file (though the file is created) and no error messages.
I am on Windows 10 using Python 3.7.1.
I also increased the recursion limit because of a previous runtime error, and I tried using absolute paths, with no luck.
[EDIT] Also worth noting that I tried this exact implementation outside the scope of my problem with a manually created dictionary of equal size (280 entries), and it worked fine.
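For comparison, the same caching pattern can be written with pathlib alone (func_to_scrape_dict here is just a stand-in for the scraper, which is not shown):

```python
import pickle
from pathlib import Path

CACHE = Path("dic.pickle")

def func_to_scrape_dict():
    # stand-in for the real scraping function
    return {"example": 1}

if CACHE.is_file():
    # cache exists: skip scraping and unpickle
    with CACHE.open("rb") as dic_file:
        some_dict = pickle.load(dic_file)
else:
    # first run: scrape, then pickle the result
    some_dict = func_to_scrape_dict()
    with CACHE.open("wb") as dic_file:
        pickle.dump(some_dict, dic_file)
```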
When I run the code below, I get different results when I run it as a .py than when I run it as an .exe built with pyinstaller.
import win32com.client
import os
ConfigMacroName = "test.xls"
xl=win32com.client.Dispatch("Excel.Application")
Configmacrowb = xl.Workbooks.Open(os.getcwd()+ "\\Completed\\" + ConfigMacroName)
SlotPlansheet = Configmacrowb.Sheets("SlotPlan")
Header = SlotPlansheet.Rows(1)
SOcol = Header.Find('SO', LookAt=1).Column #I used LookAt=1 which is equivalent to LookAt:=xlWhole in excel VBA
SOlinecol = Header.Find('SO Line').Column
print("SO is " + str(SOcol) + "\nSo line is " + str(SOlinecol))
SlotPlansheet = None
Configmacrowb.Close(False)
Configmacrowb = None
xl.Quit()
xl = None
[Screenshot: the Excel input]
[Screenshot: the output when run as .py]
[Screenshot: the output when run as .exe]
The .py output is the correct output I need. In the .exe there are duplicate values, since both Find() calls resolve to column B. As a temporary workaround I can loop through the header and check each cell myself.
But I use the Find() function a lot, so I don't know whether my other programs are also affected by this inconsistency.
Try changing the object creation line to:
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
In my experience, the win32com.client.Dispatch() function can sometimes cause issues in that it does not guarantee the same result every time it runs. The caller doesn't know if they have an early- or late-bound object. If you have no cached makepy files then you will get a late-bound IDispatch automation interface, but if win32com finds an early-bound interface then it will use it (even if it wasn't your programme that created it). Hence code that ran fine previously may stop working.
Unless you have a good reason to be indifferent, I think it is better to be explicit and choose win32com.client.gencache.EnsureDispatch() or win32com.client.dynamic.Dispatch() for early- or late-binding respectively. I generally choose the EnsureDispatch() route, as it is quicker, enforces case-sensitivity, and gives access to any constants in the type library (eg win32com.client.constants.xlWhole) rather than relying on 'magic' integers.
Also, in the past, I have experienced odd behaviour around indexing (eg this SO question), and this was cured by deleting any gencache wrappers (see below).
Add this line to your debug code:
print('Cache directory:', win32com.client.gencache.GetGeneratePath())
This will tell you where the gencache early-binding python files are being generated, and where win32com.client.Dispatch() will look for any cached wrapper files to attempt early-binding. If you want to clear the cache of generated files, just delete the contents of this directory. It will be interesting to see if the OP's two routes have the same directory.
There is a Python script start_test.py.
There is a second Python script simple_test.py.
# pseudo code:
start_test.py --calls--> subprocess(python.exe simple_test.py, args_simple_test[])
The Python interpreter for both scripts is the same. So instead of opening a new instance, I want to run simple_test.py directly from start_test.py. I need to preserve the sys.argv environment. A nice-to-have would be to actually enter the following code section in simple_test.py:
# file: simple_test.py
if __name__ == '__main__':
    some_test_function()
Most importantly, the approach should be universal and not depend on the content of simple_test.py.
This setup would provide two benefits:
The call is much less resource intensive
The whole stack of simple_test.py can be debugged with pycharm
So, how do I execute the call of a python script, from a python script, without starting a new subprocess?
"Executing a script" is a somewhat blurry term.
Typically the if __name__== "__main__": part does the argument (sys.argv) decoding and then calls a worker function with explicit parameters. For clarity: It should not do anything else, since this additional work can't be called without creating a new process causing all the overhead you are trying to avoid.
You simply bypass that and call this implementing routine directly.
So you end up with start_test.py containing something like:
from simple_test import worker
# ...
worker(typed_arg1, typed_arg2)
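A minimal sketch of that split, with illustrative names for the worker function and its parameters (guarded so the file can also be imported without arguments):

```python
# file: simple_test.py (sketch)
import sys

def worker(name, count):
    # the actual test logic, taking typed parameters instead of raw argv
    return "%s ran %d times" % (name, count)

if __name__ == "__main__" and len(sys.argv) > 2:
    # only argv decoding happens here; nothing else
    print(worker(sys.argv[1], int(sys.argv[2])))
```

start_test.py then does `from simple_test import worker` and calls `worker("demo", 3)` directly: no subprocess is created, and PyCharm can step straight into the call.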
I have a Python script that runs 24 hours a day.
A module from this script uses variable values that I wish to change from time to time, without having to stop the script, edit the module file, and launch it again (I need to avoid interruptions as much as I can).
I thought about storing the variables in a separate file, and the module would, when needed, fetch the new values from the file and use them.
Pickle seemed like a solution, but it is not human-readable and therefore not easily editable. Maybe a JSON file, or another .py file that I import over and over?
Another advantage of doing so, for me, is that in case of interruption (eg. server restart), I can resume the script with the latest variable values if I load them from a separate file.
Is there a recommended way of doing such things ?
Something along these lines:
# variables file:
variable1 = 10
variable2 = 25

# main file:
import time

while True:
    import variables  # note: Python caches modules after the first import
    print('Sum:', variables.variable1 + variables.variable2)
    time.sleep(60)
An easy way to maintain a text file with variables would be the YAML format. This answer explains how to use it; basically:
import yaml

with open("vars.yaml", "r") as stream:
    docs = list(yaml.safe_load_all(stream))  # safe_load_all avoids executing arbitrary YAML tags
If you have more than a few variables, it may be good to check the file's modification time and only re-load the variables when the file has actually changed.
import os
last_updated = os.path.getmtime('vars.yaml')
Finally, since you want to avoid interrupting the script, it may be good to have it catch any errors in the YAML file and warn the user, instead of just throwing an exception and dying. But also remember that "errors should never pass silently". The best approach here depends on your use-case.
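Putting those pieces together, here is a sketch of a reload helper that only re-reads the file when its mtime has changed and keeps the old values on any error. It uses JSON from the standard library to stay self-contained; the file name and variable names are made up for the example, and the same structure works with YAML:

```python
import json
import os

CONFIG_PATH = "vars.json"  # illustrative file name

_last_mtime = 0.0
_vars = {}

def load_vars():
    """Re-read the config only when its mtime changed; keep old values on any error."""
    global _last_mtime, _vars
    try:
        mtime = os.path.getmtime(CONFIG_PATH)
        if mtime != _last_mtime:
            with open(CONFIG_PATH) as f:
                _vars = json.load(f)
            _last_mtime = mtime
    except (OSError, ValueError) as exc:
        # warn instead of dying, per the advice above
        print("config reload failed, keeping old values:", exc)
    return _vars
```

The long-running loop then calls load_vars() each iteration and always gets the latest values that parsed successfully.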
I am working on ExtendScript code (Adobe After Effects, but it is basically just JavaScript) which needs to iterate over tens of thousands of file names on a server. This is extremely slow in ExtendScript, but I can accomplish what I need in just a few seconds with Python, which is my preferred language anyway. So I would like to run a Python file and get an array back into ExtendScript. I'm able to run my Python file and pass an argument (the root folder) by creating and executing a batch file, but how would I pass the result (an array) back into ExtendScript? I suppose I could write out a .csv and read it back in, but that seems a bit "hacky".
In After Effects you can use the "system" object's callSystem() method. This gives you access to the system's shell so you can run any script from the code. So, you can write your python script that echos or prints the array and that is essentially what is returned by the system.callSystem() method. It's a synchronous call, so it has to complete before the next line in ExtendScript executes.
The actual code might be something like:
var stdOut = system.callSystem("python my-python-script.py")
The following code is written in the morph.py file:
with open("morph.py", "r+") as f:
    old = f.read()  # read everything in the file
    f.seek(0, 2)    # seek to the end of the file
    f.write("1")    # append "1", so the last line "print a" becomes "print a1"
a = "BAD"
a1 = "Worked"
print a
The idea is that the morph.py file will rewrite itself, and the text "Worked" will be printed.
This is not the case; I think it has to do with how the Python interpreter loads files. The only explanation that makes sense is that the whole file is loaded first and then run.
Can somebody shed some light? Is it even possible to have self-morphing code in Python?
Partially related question:
Self decompressing and executing code in python
Not in the way you're trying to do it.
Before Python starts executing any piece of code, it compiles it into a bytecode representation, which is much faster to execute than reading line-by-line. This means that after Python has compiled the file, no changes to the file will be reflected in currently-running code.
However, you can manually load code from strings by using compile, exec, or eval. You can use this to create a program that is passed its own source code, alters and returns it, and executes the modified source code.
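A tiny illustration of that compile/exec route (the source string here is made up for the example):

```python
# Take some source, modify it, and execute the modified version in-process.
source = 'a = "BAD"\n'
modified = source.replace('"BAD"', '"Worked"')  # the "self-morphing" step

namespace = {}
exec(compile(modified, "<morphed>", "exec"), namespace)
print(namespace["a"])
```

Because the modified string is compiled fresh, the change takes effect immediately, unlike editing the file that is currently running.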
When I run the file the first time it outputs:
BAD
When I run it a second time it outputs:
Worked
Any subsequent times it will give an error:
... name 'a11' is not defined
When you run python on a file, it loads the file, converts it to bytecode, and then executes the bytecode. By the time your code modifies the file, the conversion has already happened, so the change has no effect on the current run.