Backing up/copying an entire folder tree in batch or python? - python

I'm trying to copy an entire directory from one location to another via Python every 7 days, essentially to make a backup...
The destination folder/tree may or may not exist, so the script needs to create the folder if it doesn't exist; that's why I assumed distutils is better suited than shutil.
Note: would batch or some other language be better suited for this job?
The following code:
import distutils
distutils.dir_util.copy_tree(r"C:\Users\A\Desktop\Test", r"C:\Users\A\Desktop\test_new", preserve_mode=1, preserve_times=1, preserve_symlinks=0, update=1, verbose=0, dry_run=0)
Returns:
Traceback (most recent call last):
File "C:\Users\A\Desktop\test.py", line 2, in <module>
distutils.dir_util.copy_tree("C:\Users\A\Desktop\test", "C:\Users\A\Desktop\test2", preserve_mode=1, preserve_times=1, preserve_symlinks=0, update=1, verbose=0, dry_run=0)
AttributeError: 'module' object has no attribute 'dir_util'
What am I doing wrong?
Thanks in advance
- Hyflex

You need to import dir_util specifically to access its functions:
from distutils import dir_util
If there are other modules in that package that you need, add them to the line, separated by commas. Only import the modules you need.
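For example, a minimal sketch using the paths from the question (note the raw strings, so the backslashes aren't treated as escape sequences):
from distutils import dir_util
# copies the whole tree, creating the destination folder if it doesn't exist
dir_util.copy_tree(r"C:\Users\A\Desktop\Test", r"C:\Users\A\Desktop\test_new", update=1)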

For Unix/Linux, I suggest 'rsync'.
For Windows: xcopy
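If you'd rather drive those tools from Python, here is a sketch using subprocess (the paths are from the question; the flags shown are common choices, check each tool's documentation):
import subprocess
# Windows: /E copies subdirectories (including empty ones), /I treats the target as a directory, /Y suppresses overwrite prompts
subprocess.check_call(["xcopy", r"C:\Users\A\Desktop\Test", r"C:\Users\A\Desktop\test_new", "/E", "/I", "/Y"])
# Unix/Linux: -a preserves permissions, times and symlinks
subprocess.check_call(["rsync", "-a", "/home/user/Test/", "/home/user/test_new/"])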

I've been attempting essentially the same thing to back up what I write on a plethora of virtual machines.
I ran into the same problem you did with distutils. From what I can tell, the Python community uses the distutils package to standardize how modules are built and distributed. I think they're still in the thick of it, though, as everything I've seen relating to it seems more complicated, not less. Hopefully I'm just seeing the churn that happens in the middle of a big change.
But I did figure out how to get it working. To use distutils.dir_util.copy_tree(),
>>> from distutils import dir_util
>>> dir_util.copy_tree("/home/user/backing_up/temp", "/home/user/backing_up/other")
['/home/user/backing_up/other/stuff.txt'] # return value: the list of files copied
If you feel like it's worthwhile, you can import distutils.core and make the longer call to distutils.dir_util.copy_tree().
>>> import distutils.core
>>> distutils.dir_util.copy_tree("/home/user/backing_up/temp", "/home/user/backing_up/other")
['/home/user/backing_up/other/stuff.txt'] # return value: the list of files copied
(I know, I know, there are subtle differences between "import module.submodule" and "from module import submodule" but that's not the intent of the question and so long as you're importing the correct stuff and calling the functions appropriately, it doesn't make a difference.)
Like you, I also explicitly stated that I wanted the defaults for preserve_mode and preserve_times, but I didn't touch the other arguments. Everything worked as expected once I imported and called the function the way it wanted me to.
Now that my backup script works, I realize I should have written it in Bash, since I plan on having it run whenever the machine enters a specific runlevel. I'm using a wrapper instead for now, even though I should just rewrite it.
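Since the original question wants the copy to run every 7 days: the usual approach is to let the OS scheduler (cron on Unix, Task Scheduler on Windows) invoke the script, but a self-contained sketch that only copies when a week has passed could look like this (the marker-file path is an assumption):
import os, time
from distutils import dir_util

MARKER = os.path.expanduser("~/.last_backup")   # hypothetical marker file recording the last run
SEVEN_DAYS = 7 * 24 * 60 * 60

# only copy if the marker is missing or older than seven days
if not os.path.exists(MARKER) or time.time() - os.path.getmtime(MARKER) > SEVEN_DAYS:
    dir_util.copy_tree("/home/user/backing_up/temp", "/home/user/backing_up/other", update=1)
    with open(MARKER, "w") as f:
        f.write(str(time.time()))   # rewriting the file refreshes its mtime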

Related

python calling function from another file while using import from the main file

I'm trying to use multiple files in my program and I ran into a problem.
I have two files: main.py, nu.py
The main file is:
import numpy
import nu
def numarray():
    numpy.array(some code goes here)
    nu.createarray()
The nu file is:
def createarray():
    numpy.array(some code goes here)
When I run main I get an error:
File "D:\python\nu.py", line 2, in createarray
numpy.array(some code goes here)
NameError: name 'numpy' is not defined
numpy is just an example; I'm using about six imports.
As far as I can see, I have to import all modules in all files, but that creates a problem where certain modules can't be loaded twice: it just hangs.
What am I doing wrong, and how can I properly import functions from another file while using modules imported in the main file?
I hope I explained it well.
Thanks for helping!
I've been using Python for years and importing from other files is still a headache...
The problem here is that you are not importing numpy in nu.py.
But, as you say, sometimes it's a little annoying to have to import all libraries in all files.
One last thing: how do you get the error that a module cannot be imported twice? Can you give me an example?
In each separate Python script, if you use a module you need to import it in that script to access it. So you will need to import numpy in your nu.py script, like below.
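A sketch of the fixed nu.py (the placeholder body is from the question):
import numpy
def createarray():
    numpy.array(some code goes here)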
If possible, try to keep the use of a module within one script so you don't have to import the same module multiple times, although this won't always be appropriate.

What does ctypes.cdll.LoadLibrary really do?

I am a little confused by ctypes.cdll.LoadLibrary in Python 3.
Consider the following: I have a file named '_iterative.cpython-36m-x86_64-cygwin.dll', and I wrote a Python script named '_iterative.py' to load it:
import ctypes
api = ctypes.cdll.LoadLibrary("_iterative.cpython-36m-x86_64-cygwin.dll")
The weird part is that when I type those commands in the Python REPL and list api's __dir__ and the current module's __dir__, the result is different from what I get when I use import.
To be clearer, see the screenshots (one taken using REPL commands, one using import).
can anyone explain why?
Because import means something more than merely loading a DLL.
LoadLibrary does exactly that: it just loads a binary library in a way that may let you call something from it (no guarantees).
Hence, with import you may (if the importee provides it) get something (like those dir() or globals() entries) beyond a bare handle.
On the other hand, LoadLibrary does not require the library you want to use to be Pythonic.
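For contrast, a minimal sketch of what a raw LoadLibrary handle gives you (libm and cos are used here only as a well-known example; the library path is platform-specific):
import ctypes
libm = ctypes.CDLL("libm.so.6")           # same mechanism as ctypes.cdll.LoadLibrary(...)
libm.cos.restype = ctypes.c_double        # you must declare the signature yourself
libm.cos.argtypes = [ctypes.c_double]
print(libm.cos(0.0))                      # 1.0
With import, by contrast, Python runs the module's initialization code and hands you ordinary Python objects, which is why the two __dir__ listings in the screenshots differ.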

Unable to split up large Python module

In Python 2.7, I'm getting 'module' has no attribute and/or 'name' is not defined errors when I try to split up a large Python file.
(I have already read similar posts and the Python modules documentation)
Say you have a python file that is structured like this:
<imports>
<50 global variables defined>
<100 lengthy functions that each use most or all of the globals defined above, and also call each other>
<main() that calls some of the functions and uses the globals>
So I can easily categorize groups of functions together, create a python file for each group, and put them there. The problem is whenever I try to call any of them from the main python file, I get the errors listed above. I think the problem is related to circular dependencies. Since all of the functions rely on the globals, and each other, they are circularly dependent.
If I have main_file.py, group_of_functions_1.py, and group_of_functions_2.py,
main_file.py will have:
import group_of_functions_1.py
import group_of_functions_2.py
and group_of_functions_1.py will have
import main_file.py
import group_of_functions_2.py
and group_of_functions_2.py will have
import main_file.py
import group_of_functions_1.py
Regardless of whether I use "import package_x" or "from package_x import *" the problem remains.
If I take the route of getting rid of the globals, then most of the functions will need 50 parameters passed around, which then also need to be returned.
What is the right way to clean this up?
One of the sources of your errors is likely the following:
import group_of_functions_1.py
import group_of_functions_2.py
When importing, you don't add .py to the end of the module name. Do this instead:
import group_of_functions_1
import group_of_functions_2
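That fixes the import syntax; the circular dependency on the globals is a separate problem. A common pattern for breaking it (a sketch, not part of this answer; the names are illustrative) is to move the shared globals into their own module that everything imports, so no function module needs to import main_file:
# config.py - holds the shared globals
counter = 0
base_path = "/data"

# group_of_functions_1.py
import config
def do_work():
    config.counter += 1    # read and write shared state through the module attribute

# main_file.py
import config
import group_of_functions_1
Since config imports nothing from the other modules, the cycle disappears.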

Python: Importing an "import file"

I am importing a lot of different scripts, so the top of my file gets cluttered with import statements, e.g.:
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
...
Is there a way to move all of these somewhere else and then all I have to do is import that file instead so it's just one clean import?
I strongly advise against what you want to do. It is the classic global include file mistake. Although here only one module imports all the others (as opposed to all modules importing a global one), the point remains: if there's a valid reason for all those modules to be collected under a common name, fine. If there's no reason, they should be kept as separate imports. The reason is documentation. If I open your file and see only one import, I get no information about what is imported or where it comes from. If, on the other hand, I see the list of imports, I know at a glance what is needed and what is not.
Also, there's another important mistake I assume you are making. When you say
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
I assume you are importing, for example, a class, like this
from somewhere.fileA import MyClass
this is wrong. This alternative is much better:
from somewhere import fileA
<later>
a=fileA.MyClass()
Why? Two reasons: first, namespacing. If you have two modules each defining a class named MyClass, you would have a clash. Second, documentation. Suppose you use the first option and I find the following line in your code:
a=MyClass()
now I have no idea where this MyClass comes from, and I will have to grep around all your files to find it. Having it qualified with the module name lets me immediately understand where it comes from, and immediately find, via a quick search, where stuff coming from the fileA module is used in your program.
Final note: when you say "fileA" you are making a mistake. These are modules (or packages), not files. Modules map to files and packages map to directories, but they may also map to egg files, and you can even create a module that has no file at all. This is about naming concepts correctly, and it's a side issue.
Of course there is; just create a file called myimports.py in the same directory as your main file and put your imports there. Then you can simply use from myimports import * in your main script.
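A sketch of that layout (fileA and MyClass are from the question; helper_function is a hypothetical name):
# myimports.py
from somewhere.fileA import MyClass
from somewhere.fileB import helper_function   # hypothetical name

# main.py
from myimports import *
a = MyClass()
Note that this inherits the drawbacks described in the answer above: a reader of main.py can no longer tell where MyClass comes from.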

python refresh/reload

This is a very basic question - but I haven't been able to find an answer by searching online.
I am using python to control ArcGIS, and I have a simple python script, that calls some pre-written code.
However, when I make a change to the pre-written code, it does not appear to result in any change. I import this module, and have tried refreshing it, but nothing happens.
I've even moved the file it calls to another location, and the script still works fine. One thing I did yesterday was add the folder containing all my Python files to the sys path (using sys.path.append('path')), and I wonder if that made a difference.
Thanks in advance, and sorry for the sloppy terminology.
It's unclear what you mean by "refresh", but Python's normal behavior is that you need to restart the software for it to take a new look at a Python module and reread it.
If your changes aren't picked up even after a restart, then this is due to one of two errors:
The timestamp on the .pyc file is incorrect and set to some time in the future.
You are actually editing the wrong file.
You can re-read a module without restarting the software by using the reload() command. Note that any variable pointing to anything in the module will need to be reimported after the reload. Something like this:
import themodule
from themodule import AClass
reload(themodule)
from themodule import AClass
One way to do this is to call reload.
Example: Here is the contents of foo.py:
def bar():
    return 1
In an interactive session, I can do:
>>> import foo
>>> foo.bar()
1
Then in another window, I can change foo.py to:
def bar():
    return "Hello"
Back in the interactive session, calling foo.bar() still returns 1, until I do:
>>> reload(foo)
<module 'foo' from 'foo.py'>
>>> foo.bar()
'Hello'
Calling reload is one way to ensure that your module is up-to-date even if the file on disk has changed. It's not necessarily the most efficient (you might be better off checking the last modification time on the file or using something like pyinotify before you reload), but it's certainly quick to implement.
One reason that Python doesn't read from the source module every time is that loading a module is (relatively) expensive -- what if you had a 300kb module and you were just using a single constant from the file? Python loads a module once and keeps it in memory, until you reload it.
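A sketch of the modification-time check mentioned above, using the foo module from this example (assumes foo.py is in the current directory):
import os
import foo
last_mtime = os.path.getmtime('foo.py')
# ... later, before calling into foo again ...
if os.path.getmtime('foo.py') > last_mtime:
    last_mtime = os.path.getmtime('foo.py')
    reload(foo)   # importlib.reload(foo) on Python 3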
If you are running in an IPython shell, then there are some magic commands that exist.
The IPython docs cover this feature, the autoreload extension.
Originally, I found this solution from Jonathan March's blog posting on this very subject (see point 3 from that link).
Basically all you have to do is the following, and changes you make are reflected automatically after you save:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import MODULE
In [4]: my_class = MODULE.MyClass()
In [5]: my_class.printham()
Out[5]: ham
In [6]: # make changes to printham and save
In [7]: my_class.printham()
Out[7]: hamlet
I used the following when importing all objects from within a module to ensure web2py was using my current code:
import buttons
import table
reload(buttons)
reload(table)
from buttons import *
from table import *
I'm not really sure that is what you mean, so don't hesitate to correct me. You are importing a module - let's call it mymodule.py - in your program, but when you change its contents, you don't see the difference?
Python will not look for changes in mymodule.py each time it is used; it loads it the first time, compiles it to bytecode, and keeps the result in memory. It will normally also save the compiled bytecode (mymodule.pyc). The next time you start your program, it will check whether mymodule.py is more recent than mymodule.pyc, and recompile if necessary.
If you need to, you can reload the module explicitly:
import mymodule
[... some code ...]
if userAskedForRefresh:
    reload(mymodule)
Of course, it is more complicated than that, and you may get side effects depending on what your program does with the other module, for example if variables depend on classes defined in mymodule.
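A short sketch of one such side effect (Widget is a hypothetical class name standing in for any class in mymodule):
import mymodule
obj = mymodule.Widget()           # hypothetical class
reload(mymodule)                  # importlib.reload(mymodule) on Python 3
isinstance(obj, mymodule.Widget)  # False: obj still points at the old class object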
Alternatively, you could use the execfile() function (Python 2 only), or exec(), eval(), and compile().
I had the exact same issue creating a geoprocessing script for ArcGIS 10.2. I had a Python toolbox script, a tool script, and a common script. A Dev/Test/Prod parameter in the tool controlled which version of the code was run: Dev ran the code from the dev folder, Test from the test folder, and Prod from the prod folder. Changes to the common dev script would not run when the tool was run from ArcCatalog; closing ArcCatalog made no difference. Even though I selected Dev or Test, it would always run from the prod folder.
Adding reload(myCommonModule) to the tool script resolved this issue.
The details differ between Python versions.
The following shows an example for Python 3.4 or above:
import hello
hello.hello_world()   # calls the hello_world function
HI !!
# now changes are made to hello.py and a reload is needed
import importlib
importlib.reload(hello)
hello.hello_world()
How are you?
For earlier Python versions like 2.x, use the built-in reload() function as stated above.
Better yet, use IPython, as it provides the autoreload feature.
