How to zip the contents of a whole directory? - python

I'm using python 2.7 and try to archive a directory:
the object is to: archive the whole folder's content
subject: C:\User\blah\blah\parentfolder
structure:C...parentfolder\ 1.docx
2.xlxs
4.pdf
subfolder\5.pdf
6.txt
Now I know that shutil.make_archive() can zip me the folder, but it turned out to be like this
C:\\User\\blah\\blah\\zipfile_name\\parentfolder\ 1.docx
2.xlxs
4.pdf
subfolder\5.pdf
6.txt
but I want a to skip the parentfolder layer in the zipfile_name folder, that is:(This is where the problem is different, I did learn shutil.make_archive from that question, but as I open the shutil module, I found out that make_archive has four intakes. I didn't know what makes them different and used the four input one, that's where I had my question. The previous one didn't not point out why it used just three inputs, and the other answers were using very complicated self-defined functions or so. The answer I chose in this problem compared and explained the difference, which I really appreciate. As a non-computer science majored, I think this kind of details help a lot of people.)
C:\\User\\blah\\blah\\zipfile_name\\ 1.docx
2.xlxs
4.pdf
subfolder\5.pdf
6.txt
if I use os.path.walk and so on to write every file to zipfile_name, it seems it will be super complicated, as I will need to first 1.list everything in the folder--while:1 file?--write into zip folder while:0 directory?--make_directory , and list all items, and while loop again this process( do until no directory exists in the last directory)
Plus I don't know how to make a directory in a .zip folder. So all this just seems complicated.
So is there a way to change a folder into an archive folder while keeping the folder structure?

It's just a matter of how you pass the information about the parentfolder folder in the parameters of make_archive.
Consider the following two examples:
shutil.make_archive('myarchive', 'zip', '/User/blah/blah', 'parentfolder')
This will create the zip archive myarchive.zip. If you open myarchive.zip you will first see the folder parentfolder and if you open that you will see the contents of the original parentfolder from which the archive was made.
shutil.make_archive('myarchive', 'zip', '/User/blah/blah/parentfolder')
This will create the zip archive myarchive.zip. If you open myarchive.zip you will immediately see the contents of the original parentfolder from which the archive was made. The folder name parentfolder will not appear in the archive.
The parameters are documented in https://docs.python.org/2/library/shutil.html, though to be honest it is not easy for me to see where the behavior I illustrated above is documented. I found it by trial and error
(Python 2.7.14 on Windows).

Related

Put contents of a folder into a zip using python

I need to put the contents of a folder into a ZIP archive (In my case a JAR but they're the same thing).
Though I cant just do the normal adding every file individually, because for one... they change each time depending on what the user put in and it's something like 1,000 files anyway!
I'm using ZipFile, it's the only really viable option.
Basically my code makes a temporary folder, and puts all the files in there, but I can't find a way to add the contents of the folder and not just the folder itself.
Python allrady support zip and rar archife. Personaly I learn it form here https://docs.python.org/3/library/zipfile.html
And to get all files form that folder try somethign like
for r, d, f in os.walk(path):
for file in f:
#enter code here

Incorrect file reading when using os.walk in python3

I am crawling through folders using the os.walk() method. In one of the folders, there is a large number of files, around 100,000 of them. The files look like: p_123_456.zip. But they are read as p123456.zip. Indeed, when I open windows explorer to browse the folder, for the first several seconds the files look like p123456.zip, but then change their appearance to p_123_456.zip. This is a strange scenario.
Now, I can't use time.sleep() because all folders and and files are being read into python variables in the looping line. Here is a snippet of the code:
for root, dirs, files in os.walk(srcFolder):
os.chdir(root)
for file in files:
shutil.copy(file, storeFolder)
In the last line, I get a file not found exception, saying that the file p123456.zip does not exist. Has anyone run into this mysterious issue? Anyway to bypass this? What is the cause of this? Thank you.
You don't seem to be concatenating the actual folder name with the filenames. Try changing your code to:
for root, dirs, files in os.walk(srcFolder):
for file in files:
shutil.copy(os.path.join(root, file), storeFolder)
os.chdir should be avoided like the plague. For one thing - if the changes suceeeds, it won't be the directory from which you are running your os.walk anymore - and then, a second chdir on another folder will fail (either stop your porgram or change you to an unexpected folder).
Just add the folder name as prefixes, and don't try using chdir.
Moreover, as for the comment from ShadowRanger above, os.walk officially breaks if you chdir inside its iteration - https://docs.python.org/3/library/os.html#os.walk - that is likely the root of the problem you had.

Python's os.walk() fails in Windows when there are long filenames

I use python os.walk() to get files and dirs in some directories, but there're files whose names are too long(>300), os.walk() return nothing, use onerror I get '[Error 234] More data is available'. I tried to use yield, but also get nothing and shows 'Traceback: StopIteration'.
OS is windows, code is simple. I have tested with a directory, if there's long-name file, problem occur, while if rename the long-name files with short names, code can get correct result.
I can do nothing for these directories, such as rename or move the long-name files.
Please help me to solve the problem!
def t(a):
for root,dirs,files in os.walk(a):
print root,dirs,files
t('c:/test/1')
In Windows file names (including path) can not be greater than 255 characters, so the error you're seeing comes from Windows, not from Python - because somehow you managed to create such big file names, but now you can't read them. See this post for more details.
The only workaround I can think of is to map the the folder to the specific directory. This will make the path way shorter. e.g. z:\myfile.xlsx instead of c:\a\b\c\d\e\f\g\myfile.xlsx

How to copy all files with a particular prefix to new directories with python?

Suppose I have the following directory structure:
C:\Test
C:\Test\2009
C:\Test\2009\files\Artists
C:\Test\2009\files\Artists\SnoopDog
C:\Test\2009\files\Artists\SnoopDog\albums.txt
C:\Test\2009\files\Artists\SnoopDog\albums.jpg
C:\Test\2009\files\Artists\SnoopDog\hobbies.doc
C:\Test\2009\files\Artists\SmashMouth\albums.txt
C:\Test\2009\files\Artists\SmashMouth\hobbies.doc
C:\Test\2010\files\Artists\SnoopDog\albums.txt
C:\Test\2010\files\Artists\SnoopDog\albums.jpg
C:\Test\2010\files\Artists\SnoopDog\hobbies.doc
The following is the directory structure I want as a goal:
C:\ToDirectory\
C:\ToDirectory\2009\albums\SnoopDog_albums.txt
C:\ToDirectory\2009\albums\SnoopDog_albums.jpg
C:\ToDirectory\2009\albums\SmashMouth_albums.txt
C:\ToDirectory\2009\albums\SmashMouth_albums.jpg
C:\ToDirectory\2009\hobbies\SmashMouth_hobbies.doc
C:\ToDirectory\2009\hobbies\SnoopDog_hobbies.doc
C:\ToDirectory\2010\albums\SnoopDog_albums.txt
C:\ToDirectory\2010\albums\SnoopDog_albums.jpg
Assuming C:\Test contains all the files and C:\ToDirectory starts off as being an empty directory.
What is the most efficient way to have a function where I simply give its source directory C:\Test and a target directory ToDirectory and the script goes to the lowest level of C:\Test, goes through each file in the directory and checks whether the filename (ignoring extension) is a durectiry in the ToDirectory structure, if not, create it and copy the file into it with the parent directory appended at the beginning of its name with python?
I am using os.listdir and os.isdir in a series of nexted loops, but it appears to be very lengthy and though it does it job, appears inefficient...
try pythons high level file operations library ( import shutil )
Also consider using pythons excellent directory walking function ( from os import walk )
You shouldn't need to copy the files but just rename the directories and filenames.
You can achieve shorter code using recursion instead of nesting loops explicitly.
Recursion is not always more efficient but your code will look a lot better!.

With Python's 'tarfile', how can I get the top-most directory in a tar archive?

I am wanting to upload a theme archive to a django web module and wanting to pull the name of the top-most directory in the archive to use as the theme's name. The archive will always be a tar-gzip format and will always have only one folder at the top level (though other files may exist parallel to it) with the various sub-directories containing templates, css, images etc. in what ever order suits the theme best.
Currently, based on the very useful code from MegaMark16, my tool uses the following method:
f = tarfile.open(fileobj=self.theme_file, mode='r:gz')
self.name = f.getnames()[0]
Where self.theme_file is a full path to the uploaded file. This works fine as long as the order of the entries in the tarball happens to be correct, but in many cases it is not. I can certainly loop through the entire archive and manually check for the proper 'name' characteristics, but I suspect that there is a more elegant and rapid approach. Any suggestions?
You'll want to use a method called commonprefix.
Sample code would be something to the effect of:
archive = tarfile.open(filepath, mode='r')
print os.path.commonprefix(archive.getnames())
Where the printed value would be the 'topmost directory in the archive'--or, your theme name.
Edit: upon further reading of your specs, though, this approach may not yield your desired result if you have files that are siblings to the 'topmost directory', as the common prefix would then just be .; this would only work if ALL files, indeed, had that common prefix of your theme name.
All sub directories have a '/' so you can do something like this
self.name = [name for name in f.getnames() if '/' not in name][0] and optimize with other tricks.

Categories

Resources