Different File Paths in Python ZipFile Depending on .write() vs .writestr() - python

I just wanted to ask quickly if the behavior I'm seeing in Python's zipfile module is expected. I wanted to put together a zip archive. For reasons I don't think I need to get into, I was adding some files using ZipFile.writestr() and others using ZipFile.write(). I was writing some files to a zip subdirectory called /scripts and others to a zip subdirectory called /data.
For /data, I originally did this:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        zipFile.write(name, f'/data/{root_name}')
This worked fine and produced a working archive that I could extract. So far, so good. To write text files to the /script subdirectory, I used:
zipFile.writestr(f'/script/{scriptname}', fileBytes)
Again, so far so good.
Now it gets odd... I wanted to extract files in /data/. So I looked for paths in zipFile.namelist() starting with /data. My code kept missing the files in /data/, however. Doing some more digging, I noticed that the files written using .writestr had a slash at the start of the zipfile path like this: "/scripts/myscript.py". The files written using .write did not have a slash at the start of the path, so the data file paths looked like this: "data/mydata.pickle".
I changed my code to use .writestr() for the data files:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        with open(name, mode='rb') as extracted_file:
            zipFile.writestr(f'/data/{root_name}', extracted_file.read())
Voila, the data files now have slashes at the start of the path. I'm not sure why, however, as I'm providing the same file path either way, and I wouldn't expect using one method versus another would change the paths.
Is this supposed to work this way? Am I missing something obvious here?
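As far as I can tell from CPython's zipfile source, the difference is expected: .write() builds its entry through ZipInfo.from_file(), which normalizes the archive name and strips any drive letter and leading path separators, while .writestr() called with a plain string stores that name as given. The sketch below reproduces the difference in memory; the function name demo_names and the file names are made up for illustration.

```python
import io
import os
import tempfile
import zipfile


def demo_names():
    """Return the archive names produced by write() and writestr()."""
    buffer = io.BytesIO()  # in-memory archive, nothing is left on disk
    with tempfile.TemporaryDirectory() as tmpdirname:
        # Hypothetical data file, created only for this demonstration.
        src = os.path.join(tmpdirname, "mydata.pickle")
        with open(src, "wb") as f:
            f.write(b"payload")
        with zipfile.ZipFile(buffer, "w") as zf:
            # write(): the leading slash in arcname is stripped.
            zf.write(src, "/data/mydata.pickle")
            # writestr(): the name is stored exactly as passed.
            zf.writestr("/script/myscript.py", b"print('hi')")
            return zf.namelist()


print(demo_names())
```

Passing names without a leading slash to both methods (for example data/... and script/...) keeps the entries consistent; the zipfile documentation also says archive names should be relative to the archive root.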

Related

Traversing a file-system directory structure

I am trying to do some work within some files in a directory. The basic structure of what I'm trying to work with is folder -> sub-folders -> files I need to access. data holds hundreds of subfolders, I am trying to access each one, find the file within them that ends in 'params', and for now just read the contents. My code is below:
import os
for sub_folder in os.scandir('data'):
    os.chdir(sub_folder)
    for file in os.scandir(sub_folder):
        print(file.name)
        if(file.name.endswith('params')):
            with open(file.name, 'r') as f:
                data = f.read()
I'm getting a FileNotFoundError, where it's telling me that the path 'data\\\run.0' doesn't exist. I have confirmed that 'run.0' is the first sub folder within data, so where I'm confused is how the path doesn't actually exist.
I know the error is happening when I attempt to change directories, so I'm suspecting the way that I am traversing the data folder is not a correct way of doing so. I understand that os.scandir gives a DirEntry object, which is what the variable sub_folder will be but is this not a valid input for the change directory function?
You can use os.walk, but I prefer to use glob; see "How to use Glob() function to find files recursively in Python?"
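A likely cause of the error (my reading of the snippet, not something confirmed in the thread) is that each DirEntry holds a path relative to the original working directory, such as data\run.0, so once os.chdir() has moved into that folder the same relative path no longer resolves. A minimal sketch that avoids os.chdir() entirely and opens files through DirEntry.path follows; the function name read_params_files is hypothetical.

```python
import os


def read_params_files(data_dir):
    """Return {file path: contents} for every '*params' file one level down."""
    contents = {}
    with os.scandir(data_dir) as sub_folders:
        for sub_folder in sub_folders:
            if not sub_folder.is_dir():
                continue  # skip stray files directly inside data_dir
            with os.scandir(sub_folder.path) as entries:
                for entry in entries:
                    if entry.is_file() and entry.name.endswith('params'):
                        # entry.path already includes the sub-folder,
                        # so no change of working directory is needed.
                        with open(entry.path, 'r') as f:
                            contents[entry.path] = f.read()
    return contents
```

Calling read_params_files('data') from the directory that contains data would then return the contents keyed by path.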

How can I read files with similar names on python, rename them and then work with them?

I've already posted here with the same question, but sadly I couldn't come up with a solution (some of you gave me awesome answers, but most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
So, I'm using a program called GMAT to get some outputs (.txt files with numerical values). These outputs have different names, but because I'm using them for more than one thing I'm getting something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch
examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you try to fetch files with similar names (at least a re-occurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os
all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # possibly split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library searches for files using * wildcards, which makes it possible to find files that match a specific pattern. It lists all matching files in a directory (or in multiple directories if you include a * wildcard as a directory component). When you iterate over the files, you can either work directly with their contents (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you need to generate a new path, so you would have to write the create_new_path function that takes the old path and returns a new one.
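If the goal is only to refer to the files by short names in later functions, renaming on disk may not be needed. As a minimal sketch (collect_outputs and its key scheme are hypothetical, not from the question), a dictionary can be built once with glob, returned from a function, and used anywhere afterwards:

```python
import glob
import os


def collect_outputs(directory, pattern='GMAT*.txt'):
    """Map each matching file's stem (e.g. 'GMATd_1') to its path."""
    paths = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # 'some/dir/GMATd_1.txt' -> 'GMATd_1'
        stem = os.path.splitext(os.path.basename(path))[0]
        paths[stem] = path
    return paths
```

With outputs = collect_outputs('./path'), a later function can open outputs['GMATd_1'] without repeating the full path.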
Since Python 3.4 you can use the standard-library pathlib module instead of os or glob.
from pathlib import Path
import shutil
for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
You can use:
import os
path = '.....'   # path where these files are located
path1 = '.....'  # path where you want to store the renamed files
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(path + "/" + file, path1 + "/" + str(i) + ".txt")
        i += 1
This renames every .txt file in the source folder to 1.txt, 2.txt, 3.txt, ..., n.txt and places it in path1.

Writing zipfile in Python 3.6 without absolute path

I am trying to write a zip file using Python's zipfile module that starts at a certain subfolder but still maintains the tree structure from that subfolder. For example, if I pass "C:\Users\User1\OneDrive\Documents", the zip file will contain everything from Documents onward, with all of Documents' subfolders maintained within Documents. I have the following code:
import zipfile
import os
import datetime
def backup(src, dest):
    """Backup files from src to dest."""
    base = os.path.basename(src)
    now = datetime.datetime.now()
    newFile = f'{base}_{now.month}-{now.day}-{now.year}.zip'
    # Set the current working directory.
    os.chdir(dest)
    if os.path.exists(newFile):
        os.unlink(newFile)
        newFile = f'{base}_{now.month}-{now.day}-{now.year}_OVERWRITE.zip'
    # Write the zipfile and walk the source directory tree.
    with zipfile.ZipFile(newFile, 'w') as zip:
        for folder, _ , files in os.walk(src):
            print(f'Working in folder {os.path.basename(folder)}')
            for file in files:
                zip.write(os.path.join(folder, file),
                          arcname=os.path.join(
                              folder[len(os.path.dirname(folder)) + 1:], file),
                          compress_type=zipfile.ZIP_DEFLATED)
    print(f'\n---------- Backup of {base} to {dest} successful! ----------\n')
I know I have to use the arcname parameter for zipfile.write(), but I can't figure out how to get it to maintain the tree structure of the original directory. The code as it is now writes every subfolder to the first level of the zip file, if that makes sense. I've read several posts suggesting I use os.path.relpath() to chop off the root, but I can't seem to figure out how to do it properly. I am also aware that this post looks similar to others on Stack Overflow. I have read those other posts and cannot figure out how to solve this problem.
The arcname parameter sets the exact path within the zip file for the file you are adding. Your issue is that when you build the path for arcname, you use the wrong value to compute the length of the prefix to remove. Specifically:
arcname=os.path.join(folder[len(os.path.dirname(folder)) + 1:], file)
Should be changed to:
arcname=os.path.join(folder[len(src):], file)
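An alternative to slicing by length is the os.path.relpath() approach mentioned in the question. The sketch below is one way to do it, assuming the top-level folder itself (for example Documents) should appear inside the archive and that the zip file is written outside the source tree; zip_tree is a hypothetical name, not the asker's backup function.

```python
import os
import zipfile


def zip_tree(src, zip_path):
    """Zip everything under src, keeping src's own folder name as the root."""
    src = os.path.abspath(src)
    parent = os.path.dirname(src)  # paths are stored relative to this
    with zipfile.ZipFile(zip_path, 'w',
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for folder, _, files in os.walk(src):
            for file in files:
                full_path = os.path.join(folder, file)
                # e.g. C:\Users\User1\Documents\sub\b.txt -> Documents\sub\b.txt
                archive.write(full_path,
                              arcname=os.path.relpath(full_path, parent))
```

To drop the top-level folder from the archive instead, compute the relative path against src rather than its parent.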

First Practice Project in Automate the Boring Stuff with Python, Ch. 9

So my friend and I have been having a problem with the first practice project of the above chapter of Automate the Boring Stuff with Python. The prompt goes: "Write a program that walks through a folder tree and searches for files with a certain file extension (such as .pdf or .jpg). Copy these files from whatever location they are in to a new folder."
To simplify, we are trying to write a program that copies all of the .jpg files out of My Pictures to another directory. Here's our code:
#! python3
# moveFileType looks in My Pictures and copies .jpg files to my Python folder
import os, shutil
def moveFileType(folder):
    for folderName, subfolders, filenames in os.walk(folder):
        for subfolder in subfolders:
            for filename in filenames:
                if filename.endswith('.jpg'):
                    shutil.copy(folder + filename, '<destination>')
moveFileType('<source>')
We keep getting an error along the lines of "FileNotFoundError: [Errno 2] No such file or directory".
Edit: I added a "\" to the end of my source path (I'm not sure if that is what you meant, @Jacob H), and was able to copy all of the .jpg files in that directory, but received an error when it tried to copy a file within a subfolder of that directory. I added a for loop for subfolder in subfolders and I no longer get any errors, but it doesn't actually look in the subfolders for .jpg files.
There is a more fundamental problem with your code. When you use os.walk() it will already loop through every directory for you, so looping manually through the subfolders is going to produce the same results multiple times.
The other, more immediate problem is that os.walk() yields bare file names with no directory component, so you need to join each one with its directory. As written, you drop the directory that os.walk() reports (folderName), so files it finds down in a subdirectory are looked up in the wrong place.
Here's a quick attempt at fixing your code:
def moveFileType(folder):
    for folderName, subfolders, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith('.jpg'):
                shutil.copy(os.path.join(folderName, filename), '<destination>')
Making the function accept a destination parameter as a second argument, instead of hardcoding <destination>, would make it a lot more useful for the future.
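A minimal sketch of that suggestion follows. The name copy_file_type and the extension parameter are my own additions, and the sketch assumes the destination lies outside the source tree (otherwise os.walk() would visit the copies) and that file names do not collide, since a later file with the same name overwrites an earlier one.

```python
import os
import shutil


def copy_file_type(folder, destination, extension='.jpg'):
    """Copy every file under folder ending in extension into destination."""
    os.makedirs(destination, exist_ok=True)  # create the target if missing
    copied = []
    for folder_name, _subfolders, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(extension):
                source_path = os.path.join(folder_name, filename)
                # shutil.copy returns the path of the newly created file
                copied.append(shutil.copy(source_path, destination))
    return copied
```

Because the paths are built with os.path.join(), it no longer matters whether the source path ends with a separator.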
Make sure the source path ends with a path separator, because the original code builds each file path by plain concatenation (folder + filename). When I tested your code, I wrote
moveFileType('/home/anum/Pictures')
and I got the error:
IOError: [Errno 2] No such file or directory:
When I wrote
moveFileType('/home/anum/Pictures/')
the code worked perfectly.
Try that; I hope it solves your problem. I'm using Python 2.7.
Here's the redefined code for walking into subfolders and copying .jpg files from there as well.
import os, shutil
def moveFileType(folder):
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith('.jpg'):
                image_path = os.path.join(root, file)  # full path of each .jpg image
                print('location: ' + image_path)
                shutil.copy(image_path, '/home/anum/Documents/Stackoverflow questions')
moveFileType('/home/anum/Pictures/')

Having trouble understanding directory navigation with os.walk

I'm relatively new to python and I'm trying my hand at a weekend project. I want to navigate through my music directories and get the artist name of each music file and export that to a csv so that I can upgrade my music collection (a lot of it is from when I was younger and didn't care about quality).
Anyway, I'm trying to get the path of each music file in its respective directory, so I can pass it to id3 tag reading module to get the artist name.
Here is what I'm trying:
import os
def main():
    for subdir, dirs, files in os.walk(dir):
        for file in files:
            if file.endswith(".mp3") or file.endswith(".m4a"):
                print(os.path.abspath(file))
However, .abspath() doesn't do what I think it should. If I have a directory like this:
music
--1.mp3
--2.mp3
--folder
----a.mp3
----b.mp3
----c.mp3
----d.m4a
----e.m4a
and I run my code, I get this output:
C:\Users\User\Documents\python_music\1.mp3
C:\Users\User\Documents\python_music\2.mp3
C:\Users\User\Documents\python_music\a.mp3
C:\Users\User\Documents\python_music\b.mp3
C:\Users\User\Documents\python_music\c.mp3
C:\Users\User\Documents\python_music\d.m4a
C:\Users\User\Documents\python_music\e.m4a
I'm confused why it doesn't show the 5 files being inside of a folder.
Aside from that, am I even going about this in the easiest or best way? Again, I'm new to python so any help is appreciated.
You are passing just the filename to os.path.abspath(), which has no context but your current working directory.
Join the path with the subdir parameter:
print(os.path.join(subdir, file))
From the os.path.abspath() documentation:
On most platforms, this is equivalent to calling the function normpath() as follows: normpath(join(os.getcwd(), path)).
so if your current working directory is C:\Users\User\Documents\python_music all your files are joined relative to that.
But os.walk gives you the correct location to base filenames off instead; from the documentation:
For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
dirpath is a string, the path to the directory. [...] filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
Emphasis mine.
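Putting the quoted advice together, a small sketch of the corrected loop might look like this. The function name find_music_files and the extensions parameter are illustrative choices, not part of the question's code.

```python
import os


def find_music_files(top, extensions=('.mp3', '.m4a')):
    """Return the path of every music file under top, subfolders included."""
    found = []
    for subdir, _dirs, files in os.walk(top):
        for file in files:
            # str.endswith accepts a tuple of suffixes
            if file.endswith(extensions):
                # Join with subdir, the directory os.walk is currently in,
                # instead of resolving against the working directory.
                found.append(os.path.join(subdir, file))
    return found
```

Each returned path begins with top, so passing an absolute top yields absolute paths that can be handed to a tag-reading module.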
