So I'm new to Python and I need some help on how to improve my life. I learned Python for work and need to cut my workload a little. I have three different scripts which I run around 5 copies of at the same time all the time, they read XML data and add in information etc... However, when I make a change to a script I have to change the 5 other files too, which is annoying after a while. I can't just run the same script 5 times because each file needs some different parameters which I store as variables at the start in every script (different filepaths...).
But I'm sure theres a much better way out there?
A very small example:
script1.py
xml.open('c:\file1.xls')
while True:
do script...
script2.py
xml.open('c:\file2.xls')
while True:
do exactley the same script...
etc...
You'll want to learn about Python functions and modules.
A function is the solution to your problem: it bundles some functionality and allows you to call it to run it, with only minor differences passed as a parameter:
def do_something_with_my_sheet(name):
xml.open(name)
while True:
do script...
Elsewhere in your script, you can just call the function:
do_something_with_my_sheet('c:\file1.xls')
Now, if you want to use the same function from multiple other scripts, you can put the function in a module and import it from both scripts. For example:
This is my_module.py:
def do_something_with_my_sheet(name):
xml.open(name)
while True:
do script...
This is script1.py:
import my_module
my_module.do_something_with_my_sheet('c:\file1.xls')
And this could be script2.py (showing a different style of import):
from my_module import do_something_with_my_sheet
do_something_with_my_sheet('c:\file2.xls')
Note that the examples above assume you have everything sitting in a single folder, all the scripts in one place. You can separate stuff for easier reuse by putting your module in a package, but that's beyond the scope of this answer - look into it if you're curious.
You only need one script, that takes the name of the file as an argument:
import sys
xml.open(sys.argv[1])
while True:
do script...
Then run the script. Other variables can be passed as additional arguments, accessed via sys.argv[2], etc.
If there are many such parameters, it may be easier to save them in a configuration file, the pass the name of the configuration file as the single argument. Your script would then parse the file for all the information it needs.
For example, you might have a JSON file with contents like
{
"filename": "c:\file1.xls",
"some_param": 6,
"some_other_param": True
}
and your script would look like
import json
import sys
with open(sys.argv[1]) as f:
config = json.load(f)
xml.open(config['filename'])
while True:
do stuff using config['some_param'] and config['some_other_param']
Related
I have couple questions regarding running code from multiple files from one .py file
When I import the Files with
from [subfolder] import [First scriptname - without .py]
from [subfolder] import [Second scriptname - without .py]
the scripts start running instantly, like if my scripts have a print code it would look like this after running the combined-scripts file
print("Hi Im Script One")
print("Hi Im Script Two")
now I could put them in functions and run the functions but my files also have some variables that are not inside functions, the question I have is if there is a way to not start the script automatically after import?
Also what happens with variables inside these scripts, are they usable throughout the combined file or do i need to state them with something like "global"?
is there a way to not start the script automaticly after import?
This is what python's if __name__ == "__main__": is for.
Anything outside that (imports, etc.) will be run when the .py file is imported.
what happens with variables inside these scripts, are they usable troughout the combined file or do i need to state them with something like "global"?
They may be best put inside a class, or you can also (not sure how Pythonic/not this is):
from [FirstScriptName_without_.py] import [className_a], [functionName_b], [varName_c]
I want to make my own programming language based on python which will provide additional features that python wasn't provide, for example to make multiline anonymous function with custom syntax. I want my programming language is so simple to be used, just import my script, then I read the script file which is imported my script, then process it's code and stop anymore execution of the script which called my script to prevent error on syntax...
Let say there are 2 py file, main.py and MyLanguage.py
The main.py imported MyLanguage.py
Then how to get the main.py file from MyLanguage.py if main.py can be another name(Dynamic Name)?
Additional information:
I using python 3.4.4 on Windows 7
Like Colonder, I believe the project you have in mind is far more difficult than you imagine.
But, to get you started, here is how to get the main.py file from inside MyLanguage.py. If your importing module looks like this
# main.py
import MyLanguage
if __name__ == "__main__":
print("Hello world from main.py")
and the module it is importing looks like this, in Python 3:
#MyLanguage.py
import inspect
def caller_discoverer():
print('Importing file is', inspect.stack()[-1].filename)
caller_discoverer()
or (edit) like this, in Python 2:
#MyLanguage.py
import inspect
def caller_discoverer():
print 'Importing file is', inspect.stack()[-1][1]
caller_discoverer()
then the output you will get when you run main.py is
Importing file is E:/..blahblahblah../StackOverflow-3.6/48034902/main.py
Hello world from main.py
I believe this answers the question you asked, though I don't think it goes very far towards achieving what you want. The reason for my scepticism is simple: the import statement expects a file containing valid Python, and if you want to import a file with your own non-Python syntax, then you are going to have to do some very clever stuff with import hooks. Without that, your program will simply fail at the import statement with a syntax error.
Best of luck.
I recently discovered (Much to my surprised) you can call command line args in files other than the one that is explicitly called when you enter it.
So, you can run python file1.py abc in command line, and use sys.argv[1] to get the string 'abc' from within file2.py or file3.py.
I still feel like this shouldn't work, but I'm glad it does, since it saved me a lot of trouble.
But now I'd really appreciate an answer as to why/how this works. I had assumed that sys.argv[1] would be local to each file.
As for the how/why, sys is only imported once (when python starts up). When sys is imported, it's argv member gets populated with the commandline arguements. Subsequent import statements return the same sys module object so no matter where you import sys from, you'll always get the same object and therefore sys.argv will always be the same list no matter where you reference it in your application.
Whether you should be doing commandline parsing in more than one place is a different question. Generally, my answer would be "NO" unless you are only hacking together a script to work for the next 2 or 3 days. Anything that you expect to last should do all it's parsing up front (probably with a robust argument parser like argparse) and pass the data necessary for the various functions/classes to them from it's entry point.
I have written three simple scripts (which I will not post here, as they are part of my dissertation research) that are all in working order.
What I would like to do now is write a "batch-processing" script for them. I have many (read as potentially tens of thousands) of data files on which I want these scripts to act.
My questions about this process are as follows:
What is the most efficient way to go about this sort of thing?
I am relatively new to programming. Is there a simple way to do this, or is this a very complex endeavor?
Before anyone downvotes this question as "unresearched" or whatever negative connotation comes to mind, PLEASE just offer help. I have spent days reading documentation and following leads from Google searches, and it would be most appreciated if a human being could offer some input.
If you just need to have the scripts run, probably a shell script would be the easiest thing.
If you want to stay in Python, the best way would be to have a main() (or somesuch) function in each script (and have each script importable), have the batch script import the subscript and then run its main.
If staying in Python:
- your three scripts must have the .py ending to be importable
- they should either be in Python's search path, or the batch control script can set the path
- they should each have a main function (or whatever name you choose) that will activate that script
For example:
batch_script
import sys
sys.path.insert(0, '/location/of/subscripts')
import first_script
import second_script
import third_script
first_script.main('/location/of/files')
second_script.main('/location/of/files')
third_script.main('/location/of/files')
example sub_script
import os
import sys
import some_other_stuff
SOMETHING_IMPORTANT = 'a value'
def do_frobber(a_file):
...
def main(path_to_files):
all_files = os.listdir(path_to_files)
for file in all_files:
do_frobber(os.path.join(path_to_files, file)
if __name__ == '__main__':
main(sys.argv[1])
This way, your subscript can be run on its own, or called from the main script.
You can write a batch script in python using os.walk() to generate a list of the files and then process them one by one with your existing python programs.
import os, re
for root, dir, file in os.walk(/path/to/files):
for f in file:
if re.match('.*\.dat$', f):
run_existing_script1 root + "/" file
run_existing_script2 root + "/" file
If there are other files in the directory you might want to add a regex to ensure you only process the files you're interested in.
EDIT - added regular expression to ensure only files ending ".dat" are processed.
(Assume that: application start-up time is absolutely critical; my application is started a lot; my application runs in an environment in which importing is slower than usual; many files need to be imported; and compilation to .pyc files is not available.)
I would like to concatenate all the Python source files that define a collection of modules into a single new Python source file.
I would like the result of importing the new file to be as if I imported one of the original files (which would then import some more of the original files, and so on).
Is this possible?
Here is a rough, manual simulation of what a tool might produce when fed the source files for modules 'bar' and 'baz'. You would run such a tool prior to deploying the code.
__file__ = 'foo.py'
def _module(_name):
import types
mod = types.ModuleType(name)
mod.__file__ = __file__
sys.modules[module_name] = mod
return mod
def _bar_module():
def hello():
print 'Hello World! BAR'
mod = create_module('foo.bar')
mod.hello = hello
return mod
bar = _bar_module()
del _bar_module
def _baz_module():
def hello():
print 'Hello World! BAZ'
mod = create_module('foo.bar.baz')
mod.hello = hello
return mod
baz = _baz_module()
del _baz_module
And now you can:
from foo.bar import hello
hello()
This code doesn't take account of things like import statements and dependencies. Is there any existing code that will assemble source files using this, or some other technique?
This is very similar idea to tools being used to assemble and optimise JavaScript files before sending to the browser, where the latency of multiple HTTP requests hurts performance. In this Python case, it's the latency of importing hundreds of Python source files at startup which hurts.
If this is on google app engine as the tags indicate, make sure you are using this idiom
def main():
#do stuff
if __name__ == '__main__':
main()
Because GAE doesn't restart your app every request unless the .py has changed, it just runs main() again.
This trick lets you write CGI style apps without the startup performance hit
AppCaching
If a handler script provides a main()
routine, the runtime environment also
caches the script. Otherwise, the
handler script is loaded for every
request.
I think that due to the precompilation of Python files and some system caching, the speed up that you'll eventually get won't be measurable.
Doing this is unlikely to yield any performance benefits. You're still importing the same amount of Python code, just in fewer modules - and you're sacrificing all modularity for it.
A better approach would be to modify your code and/or libraries to only import things when needed, so that a minimum of required code is loaded for each request.
Without dealing with the question, whether or not this technique would boost up things at your environment, say you are right, here is what I would have done.
I would make a list of all my modules e.g.
my_files = ['foo', 'bar', 'baz']
I would then use os.path utilities to read all lines in all files under the source directory and writes them all into a new file, filtering all import foo|bar|baz lines since all code is now within a single file.
Of curse, at last adding the main() from __init__.py (if there is such) at the tail of the file.