How to exclude specific subfolders from the generated tar file? - python

I am using Python 3 with the tarfile module to compress some folders (with subfolders). What I need is to exclude a couple of subfolders from the final tar file.
For example, say my folders looked like:
dir/
├── subdirA
│   ├── subsubdirA1
│   │   └── fileA11.txt
│   │   └── fileA12.txt
│   ├── subsubdirA2
│   │   └── fileA21.txt
│   │   └── fileA22.txt
│   └── fileA.txt
├── subdirB
│   ├── subsubdirB1
│   │   └── fileB11.txt
│   │   └── fileA12.txt
│   ├── subsubdirB2
│   │   └── fileB21.txt
│   │   └── fileB22.txt
│   └── fileB.txt
└── main.txt
Now, say I want to include everything in dir/ except the contents of subsubdirA2 and subsubdirB2. Based on this answer, I have tried:
EXCLUDE_FILES = ['/subdirA/subsubdirA2', '/subdirB/subsubdirB2']
mytarfile.add(..., filter=lambda x: None if x.name in EXCLUDE_FILES else x)
Or:
EXCLUDE_FILES = ['/subdirA/subsubdirA2/*', '/subdirB/subsubdirB2/*']
mytarfile.add(..., filter=lambda x: None if x.name in EXCLUDE_FILES else x)
Or:
EXCLUDE_FILES = ['/subdirA/subsubdirA2/*.*', '/subdirB/subsubdirB2/*.*']
mytarfile.add(..., filter=lambda x: None if x.name in EXCLUDE_FILES else x)
I also tried variants of the three options above where the subfolder paths started without /, with dir or with /dir. None of them worked: every time, everything within dir was included.
How could I correctly exclude specific subfolders from a tar file I want to generate? If a different module/library is required instead of tarfile, that is fine.

I didn't find a way to do this with tarfile directly, but you can run the tar command from Python with subprocess, like this:
import subprocess
exclude = ['dir/subdirA/subsubdirA2', 'dir/subdirA/subsubdirA1', 'dir/subdirA/text.tx']
excludeline = ''
for x in exclude:
    excludeline += ' --exclude ' + x
# cmd holds the tar command; recent GNU tar versions apply --exclude
# only to the paths that follow it, so the options go before dir
cmd = 'tar -czvf dir.tar.gz' + excludeline + ' dir'
print(cmd)
process = subprocess.Popen(cmd, shell=True, stdin=None, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() drains stdout and stderr together, so neither pipe can fill up and block
out, err = process.communicate()
# with -v, tar lists every file it added
for line in out.decode("utf-8").splitlines():
    print(line)
Where cmd has this value in this example:
cmd = tar -czvf dir.tar.gz --exclude dir/subdirA/subsubdirA2 --exclude dir/subdirA/subsubdirA1 --exclude dir/subdirA/text.tx dir
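Building the command as a list and passing it to subprocess.run avoids shell=True, so paths containing spaces or shell metacharacters need no quoting. A minimal sketch, assuming GNU tar is on PATH; build_tar_cmd is a hypothetical helper name:

```python
import os
import shutil
import subprocess


def build_tar_cmd(archive, source, excludes):
    """Return a tar argument list; the --exclude options come before source."""
    cmd = ['tar', '-czf', archive]
    cmd += ['--exclude=' + pattern for pattern in excludes]
    cmd.append(source)
    return cmd


cmd = build_tar_cmd('dir.tar.gz', 'dir',
                    ['dir/subdirA/subsubdirA2', 'dir/subdirB/subsubdirB2'])
print(cmd)

# Only run tar if it is installed and the folder to archive exists here.
if shutil.which('tar') and os.path.isdir('dir'):
    subprocess.run(cmd, check=True)
```

Each list element reaches tar as exactly one argument, so no string concatenation or escaping is involved.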

I think the EXCLUDE_FILES that you are using should be matched against the file names with pattern matching. Here is how I would do that:
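import re
EXCLUDE_FILES = ['/subdirA/subsubdirA2/*', '/subdirB/subsubdirB2/*']
pattern = '(?:%s)' % '|'.join(EXCLUDE_FILES)  # form one alternation pattern string
To filter against the pattern, use re.search rather than re.match: re.match only matches at the start of the name, and the member names typically begin with the top-level folder (e.g. dir/subdirA/subsubdirA2).
mytarfile.add(..., filter=lambda x: None if re.search(pattern, x.name) else x)
A member is excluded if its name contains a match for any of the patterns in EXCLUDE_FILES. Since add() does not descend into an excluded directory, its contents are left out as well. Hope this helps.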
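The plain name comparison from the question can also work with tarfile alone, as long as the names match what tarfile stores: TarInfo.name is relative to the arcname passed to add(), uses forward slashes and has no leading /, and returning None for a directory keeps add() from recursing into it. A minimal sketch under those assumptions; EXCLUDE_DIRS, exclude_filter and the temporary demo tree are illustrative names, not part of the question:

```python
import os
import tarfile
import tempfile

# Directories to leave out, written the way TarInfo.name spells them:
# relative to the archive root, forward slashes, no leading "/".
EXCLUDE_DIRS = {'dir/subdirA/subsubdirA2', 'dir/subdirB/subsubdirB2'}


def exclude_filter(tarinfo):
    # Returning None drops the member; for a directory, add() then
    # never recurses into it, so its whole subtree is skipped.
    if tarinfo.name in EXCLUDE_DIRS:
        return None
    return tarinfo


def build_demo_tree(base):
    # Recreate a small version of the tree from the question.
    for sub in ('subdirA/subsubdirA1', 'subdirA/subsubdirA2',
                'subdirB/subsubdirB1', 'subdirB/subsubdirB2'):
        os.makedirs(os.path.join(base, 'dir', sub))
        with open(os.path.join(base, 'dir', sub, 'file.txt'), 'w') as f:
            f.write('x')
    with open(os.path.join(base, 'dir', 'main.txt'), 'w') as f:
        f.write('x')


with tempfile.TemporaryDirectory() as base:
    build_demo_tree(base)
    archive = os.path.join(base, 'dir.tar.gz')
    with tarfile.open(archive, 'w:gz') as tar:
        # arcname='dir' makes every member name start with 'dir',
        # wherever the folder actually lives on disk.
        tar.add(os.path.join(base, 'dir'), arcname='dir', filter=exclude_filter)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())

for name in names:
    print(name)
```

If the names still don't match in your own setup, printing tarinfo.name inside the filter shows exactly what each member is called.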

Related

How do I match the file name from different directories and replace the partial filename with the actual filename?

So I have a slightly complicated issue that I need some help with :(
In Directory 1, I have the filenames as follows:
00HFP.mp4
0AMBV.mp4
2D5GN.mp4
3HVKR.mp4
3IJGQ.mp4
In Directory 2, I did some processing to the mp4s and got some output files:
_0HFP.usd
_AMBV.usd
_D5GN.usd
_HVKR.usd
_IJGQ.usd
For some reason, the programme I'm using replaces the first number/character with an underscore for some files. Other files are generally left alone. But I need the filenames to match :( How do I do a mass renaming (over 500 files) based on this partial naming using a Python script? For example, _0HFP.usd should become 00HFP.usd, since there's a 00HFP.mp4 file in Directory 1.
Please help :( Thank you!
I tried this (as suggested by Corralien), but it still doesn't work for me :(
import pathlib
dir1 = pathlib.Path('./mnt/d/Downloads/Charades_v1_480/charades_18Jan/done/')
dir2 = pathlib.Path('./mnt/d/Downloads/Charades_v1_480/charades_18Jan_anim/pt-charades-output/')
print('i am here')
for f1 in dir1.glob('*.mp4'):
    print(f1)
    f2 = dir2 / f'_{f1.stem[1:]}.usd'
    if f2.exists():
        f2.rename(dir2 / f'{f1.stem}.usd')
Suppose the following directories:
Dir1
├── 00HFP.mp4
├── 0AMBV.mp4
├── 2D5GN.mp4
├── 3HVKR.mp4
└── 3IJGQ.mp4
Dir2
├── _0HFP.usd
├── _AMBV.usd
├── _D5GN.usd
├── _HVKR.usd
└── _IJGQ.usd
Try:
import pathlib
dir1 = pathlib.Path('./Dir1')
dir2 = pathlib.Path('./Dir2')
for f1 in dir1.glob('*.mp4'):
    f2 = dir2 / f'_{f1.stem[1:]}.usd'
    if f2.exists():
        f2.rename(dir2 / f'{f1.stem}.usd')
After processing:
Dir1
├── 00HFP.mp4
├── 0AMBV.mp4
├── 2D5GN.mp4
├── 3HVKR.mp4
└── 3IJGQ.mp4
Dir2
├── 00HFP.usd
├── 0AMBV.usd
├── 2D5GN.usd
├── 3HVKR.usd
└── 3IJGQ.usd
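If some .usd files may have no counterpart, or two .mp4 stems could differ only in their first character (for example 00HFP and 10HFP), a slightly more defensive variant can index the .mp4 stems by everything after the first character and skip anything ambiguous instead of guessing. A sketch; fix_usd_names and the dry_run flag are hypothetical names, and the temporary folders only recreate the example above:

```python
import pathlib
import tempfile


def fix_usd_names(dir1, dir2, dry_run=False):
    """Rename _XXXX.usd files in dir2 after the matching .mp4 stem in dir1."""
    # Index the .mp4 stems by everything after their first character,
    # e.g. '0HFP' -> ['00HFP'].
    by_tail = {}
    for f1 in pathlib.Path(dir1).glob('*.mp4'):
        by_tail.setdefault(f1.stem[1:], []).append(f1.stem)

    renamed = []
    # sorted() turns the glob into a list before anything is renamed.
    for f2 in sorted(pathlib.Path(dir2).glob('_*.usd')):
        matches = by_tail.get(f2.stem[1:], [])
        if len(matches) != 1:
            continue  # no counterpart, or ambiguous: leave the file alone
        target = f2.with_name(matches[0] + f2.suffix)
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            f2.rename(target)
        renamed.append((f2.name, target.name))
    return renamed


with tempfile.TemporaryDirectory() as base:
    dir1 = pathlib.Path(base, 'Dir1')
    dir2 = pathlib.Path(base, 'Dir2')
    dir1.mkdir()
    dir2.mkdir()
    for stem in ['00HFP', '0AMBV', '2D5GN', '3HVKR', '3IJGQ']:
        (dir1 / f'{stem}.mp4').touch()
        (dir2 / f'_{stem[1:]}.usd').touch()
    print(fix_usd_names(dir1, dir2, dry_run=True))  # preview only
    fix_usd_names(dir1, dir2)
    result = sorted(p.name for p in dir2.iterdir())
    print(result)
```

One thing worth checking in the attempt above: a path such as './mnt/d/...' is resolved relative to the current working directory, whereas a mounted Windows drive is usually reached as '/mnt/d/...'; if dir1 points at a folder that doesn't exist, glob() simply yields nothing.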

python Pathlib, how do I remove leading directories to get relative paths?

Let's say I have this directory structure.
├── root1
│   └── root2
│       ├── bar
│       │   └── file1
│       ├── foo
│       │   ├── file2
│       │   └── file3
│       └── zoom
│           └── z1
│               └── file41
I want to isolate path components relative to root1/root2, i.e. strip out the leading root part, giving relative directories:
bar/file1
foo/file3
zoom/z1/file41
The root depth can be arbitrary, and the files, the nodes of this tree, can also reside at different levels.
This code does it, but I am looking for the pythonic Pathlib way to do it.
from pathlib import Path
import os
#these would come from os.walk or some glob...
file1 = Path("root1/root2/bar/file1")
file2 = Path("root1/root2/foo/file3")
file41 = Path("root1/root2/zoom/z1/file41")
root = Path("root1/root2")
#take out the root prefix by string replacement.
for file_ in [file1, file2, file41]:
    #is there a PathLib way to do this?🤔
    file_relative = Path(str(file_).replace(str(root),"").lstrip(os.path.sep))
    print(" %s" % (file_relative))
TLDR: use Path.relative_to:
Path("a/b/c").relative_to("a/b") # returns PosixPath('c')
Full example:
from pathlib import Path
# these would come from os.walk or some glob...
file1 = Path("root1/root2/bar/file1")
file2 = Path("root1/root2/foo/file3")
file41 = Path("root1/root2/zoom/z1/file41")
root = Path("root1/root2")
for file_ in [file1, file2, file41]:
    # relative_to strips the root prefix, no string replacement needed
    file_relative = file_.relative_to(root)
    print(" %s" % (file_relative))
Prints (on Windows, hence the backslashes):
bar\file1
foo\file3
zoom\z1\file41
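Unlike str.replace, which would also strip the root text if it happened to appear later in a path, relative_to compares whole path components, and it raises ValueError for a path that is not under the root. A sketch that checks first with is_relative_to (Python 3.9+); strip_root is a hypothetical helper, and PurePosixPath is used only so the output has forward slashes on every OS:

```python
from pathlib import PurePosixPath


def strip_root(paths, root):
    """Return each path relative to root, or None if it is not under root."""
    result = []
    for p in paths:
        # relative_to() alone would raise ValueError for a path outside root.
        result.append(p.relative_to(root) if p.is_relative_to(root) else None)
    return result


root = PurePosixPath("root1/root2")
paths = [PurePosixPath("root1/root2/bar/file1"),
         PurePosixPath("root1/root2/zoom/z1/file41"),
         PurePosixPath("elsewhere/file5")]
for p, rel in zip(paths, strip_root(paths, root)):
    print(p, "->", rel)
```

With Path objects coming from os.walk or glob, the same calls work unchanged; the separator in the printed result then follows the operating system.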

Directory compression in python

I will try to explain it with an example.
abc
└── test
    ├── dir1
    ├── dir2
    └── not_for_zipping.txt
I want to compress all directories in the test dir (in this example, dir1 and dir2).
Right now I do it like this:
import os
import shutil
directory = dlg.lineEdit_zipfile_path2.text()  # this should be the path to the test dir (.../abc/test/)
arr = os.listdir(directory)
for item in arr:
    allfiles2zip = directory + item
    try:
        shutil.make_archive(item, 'zip', allfiles2zip)
    except OSError:
        pass
It seems to work, but the archives for all directories (dir1 and dir2) are created in .../abc:
abc
├── dir1.zip
├── dir2.zip
└── test
    ├── dir1
    ├── dir2
    └── not_for_zipping.txt
but I would like to get those files in the selected path (directory), .../abc/test:
abc
└── test
    ├── dir1
    ├── dir2
    ├── not_for_zipping.txt
    ├── dir1.zip
    └── dir2.zip
Do you have any idea how I can change it?
By the way, is there a better way to do this?
You can include the path in the archive's base name:
shutil.make_archive('test/' + item, 'zip', ...)
Alternatively, you can change the working folder before compressing:
old_folder = os.getcwd()
os.chdir('test')
shutil.make_archive(item, 'zip', ...)
os.chdir(old_folder)
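Both suggestions come down to what make_archive's base_name is: it decides where the .zip is written, while root_dir decides what goes into it. A sketch that zips every subdirectory of a chosen folder into that same folder and skips plain files; zip_subdirs is a hypothetical name and the temporary tree only mirrors the example:

```python
import os
import shutil
import tempfile


def zip_subdirs(directory):
    """Create <directory>/<name>.zip for every subdirectory of directory."""
    created = []
    # sorted() takes a snapshot, so the new .zip files are not visited.
    for entry in sorted(os.listdir(directory)):
        src = os.path.join(directory, entry)
        if not os.path.isdir(src):
            continue  # skip plain files such as not_for_zipping.txt
        # base_name sets where the archive goes; root_dir is what gets zipped.
        created.append(shutil.make_archive(os.path.join(directory, entry),
                                           "zip", root_dir=src))
    return created


with tempfile.TemporaryDirectory() as abc:
    test = os.path.join(abc, "test")
    for d in ("dir1", "dir2"):
        os.makedirs(os.path.join(test, d))
        open(os.path.join(test, d, "data.txt"), "w").close()
    open(os.path.join(test, "not_for_zipping.txt"), "w").close()
    zip_subdirs(test)
    listing = sorted(os.listdir(test))
    abc_listing = sorted(os.listdir(abc))
    print(listing)
```

No os.chdir() is needed this way. To keep the folder name inside each archive (dir1/data.txt instead of data.txt), pass root_dir=directory, base_dir=entry instead.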

Rename part of filenames in sub-folders python

I have to rename images in a main directory that contains sub-folders. The script I'm using right now does part of the job, but not exactly what I need, and I can't find a way to do it properly. This is what I have now:
maindir  # my example origin
├── Sub1
│   ├── example01.jpg
│   ├── example02.jpg
│   └── example03.jpg
└── Sub2
    ├── example01.jpg
    ├── example02.jpg
    └── example03.jpg
My script does this:
maindir
├── Sub1
│   ├── Sub1_example01.jpg
│   ├── Sub1_example02.jpg
│   └── Sub1_example03.jpg
└── Sub2
    ├── Sub2_example01.jpg
    ├── Sub2_example02.jpg
    └── Sub2_example03.jpg
What I would like is to replace the letters in my filenames with my sub-folder name and keep the original numbers of my jpgs:
maindir
├── Sub1
│   ├── Sub1_01.jpg
│   ├── Sub1_02.jpg
│   └── Sub1_03.jpg
└── Sub2
    ├── Sub2_01.jpg
    ├── Sub2_02.jpg
    └── Sub2_03.jpg
Here is the code I am using:
from os import walk, path, rename
parent = ("F:\\PS\\maindir")
for dirpath, _, files in walk(parent):
    for f in files:
        rename(path.join(dirpath, f), path.join(dirpath, path.split(dirpath)[-1] + '_' + f))
What do I have to change here to get that result?
Instead of this line:
rename(path.join(dirpath, f), path.join(dirpath, path.split(dirpath)[-1] + '_' + f))
generate a new name using str.replace:
newf = f.replace("example", path.basename(dirpath) + "_")
then
rename(path.join(dirpath, f), path.join(dirpath, newf))
Of course, if you don't know the extension or the "prefix" of the input file, and only want to keep the number and extension, there's a way:
import re
number = (re.findall(r"\d+", f) or ['000'])[0]
This extracts the first number from the name, and if none is found, uses 000.
Then rebuild newf from the folder name, the extracted number and the original extension (splitext keeps the leading dot):
newf = "{}_{}{}".format(path.basename(dirpath), number, path.splitext(f)[1])
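Putting the pieces together, here is one possible complete version. Two deliberate deviations from the answer, named plainly: it takes the last run of digits from the name without its extension (so a second run finds 01 in Sub1_01.jpg and leaves the file alone instead of picking up the 1 from Sub1), and it skips a file when the new name is already taken, because on POSIX os.rename silently replaces an existing file. rename_with_folder is a hypothetical name and the temporary tree only mirrors the example:

```python
import os
import re
import tempfile


def rename_with_folder(parent):
    """Rename every file to <folder>_<last number in name><extension>."""
    for dirpath, _, files in os.walk(parent):
        folder = os.path.basename(dirpath)
        for f in files:
            stem, ext = os.path.splitext(f)
            numbers = re.findall(r"\d+", stem)
            number = numbers[-1] if numbers else "000"
            newf = "{}_{}{}".format(folder, number, ext)
            target = os.path.join(dirpath, newf)
            if newf == f or os.path.exists(target):
                continue  # already renamed, or the new name is taken
            os.rename(os.path.join(dirpath, f), target)


with tempfile.TemporaryDirectory() as base:
    maindir = os.path.join(base, "maindir")
    for sub in ("Sub1", "Sub2"):
        os.makedirs(os.path.join(maindir, sub))
        for n in ("01", "02", "03"):
            open(os.path.join(maindir, sub, "example" + n + ".jpg"), "w").close()
    rename_with_folder(maindir)
    rename_with_folder(maindir)  # a second run changes nothing
    result = {sub: sorted(os.listdir(os.path.join(maindir, sub)))
              for sub in ("Sub1", "Sub2")}
    print(result)
```

Files without any digits all map to <folder>_000, so only the first of them is renamed; the rest are left untouched rather than overwritten.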

Python. Rename files in subdirectories

Could you please help me modify the script below so that it also renames files in subdirectories?
def change():
    path = e.get()
    for filename in os.walk(path):
        for ele in filename:
            if type(ele) == type([]) and len(ele)!=0:
                for every_file in ele:
                    if every_file[0:6].isdigit():
                        number = every_file[0:6]
                        name = every_file[6:]
                        x = int(number)+y
                        newname = (str(x) + name)
                        os.rename(os.path.join(path, every_file), os.path.join(path, newname))
I don't know what constraints you have on file names, so I wrote a general script just to show you how to change their names in a given folder and all its subfolders.
The test folder has the following tree structure:
~/test$ tree
.
├── bye.txt
├── hello.txt
├── subtest
│   ├── hey.txt
│   ├── lol.txt
│   └── subsubtest
│       └── good.txt
└── subtest2
    └── bad.txt
3 directories, 6 files
As you can see all files have .txt extension.
The following script renames all of them:
import os
def main():
    path = "/path/toyour/folder"
    count = 1
    for root, dirs, files in os.walk(path):
        for i in files:
            os.rename(os.path.join(root, i), os.path.join(root, "changed" + str(count) + ".txt"))
            count += 1
if __name__ == '__main__':
    main()
The count variable is useful only to have different names for every file; probably you can get rid of it.
After executing the script, the folder looks like this:
~/test$ tree
.
├── changed1.txt
├── changed2.txt
├── subtest
│   ├── changed4.txt
│   ├── changed5.txt
│   └── subsubtest
│       └── changed6.txt
└── subtest2
    └── changed3.txt
3 directories, 6 files
I think the problem in your code is that you join every file name with path (the top-level folder) instead of with the root that os.walk yields for the folder the file is actually in.
Hope this helps.
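Applied to the code from the question, that could look like the sketch below: it keeps the 6-digit-prefix logic but unpacks os.walk's (root, dirs, files) tuples and joins with root. The e.get() entry widget and the global y are replaced by hypothetical path and offset parameters so the function stands on its own. As in the original, leading zeros are lost (000123 + 1 becomes 124), and a new name that collides with an existing file in the same folder is not checked for:

```python
import os
import tempfile


def change(path, offset):
    """Add offset to the 6-digit prefix of every file under path, subfolders included."""
    for root, _dirs, files in os.walk(path):
        for every_file in files:
            if every_file[0:6].isdigit():
                number = every_file[0:6]
                name = every_file[6:]
                newname = str(int(number) + offset) + name
                # Join with root (the folder that holds the file), not with path.
                os.rename(os.path.join(root, every_file), os.path.join(root, newname))


with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "sub", "subsub"))
    for rel in ("100001_a.txt", os.path.join("sub", "200005_b.txt"),
                os.path.join("sub", "subsub", "300009_c.txt"), "notes.txt"):
        open(os.path.join(base, rel), "w").close()
    change(base, 10)
    found = sorted(os.path.relpath(os.path.join(r, f), base).replace(os.sep, "/")
                   for r, _, fs in os.walk(base) for f in fs)
    print(found)
```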
