Faster way to package a folder into a file with Python

I would like to package a folder into a single file; I do not need compression. All the alternatives I tried were slow.
I have tried:
The zipfile library with ZIP_STORED (no compression)
import os
import zipfile

output_filename = "folder.zip"
source_dir = "folder"
with zipfile.ZipFile(output_filename, 'w', zipfile.ZIP_STORED) as zipf:
    # walk the tree and add each file, stored without compression
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            zipf.write(os.path.join(root, name))
The tarfile library, also using mode "w" to open the file for writing without compression
import tarfile
import os

output_filename = "folder.tar"
source_dir = "folder"
with tarfile.open(output_filename, "w") as tar:
    tar.add(source_dir, arcname=os.path.basename(source_dir))
But both still take ~3-5 minutes to package a folder that is ~5GB and has < 10 files in it.
I am using a Linux machine.
Is there a faster way?

I am not sure whether it is much faster, but if you are running Linux you could try the tar command:
import time
import os

start = time.time()
os.system("tar -cvf name.tar /path/to/directory")
end = time.time()
print("Elapsed time: %s" % (end - start,))
If you also need compression, you can gzip the archive after the first command:
os.system("gzip name.tar")
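As a variation on the answer above, here is a sketch that calls the same system tar through subprocess.run instead of os.system (pack_dir is a hypothetical helper name, not part of any library); passing -z gzips the archive in the same pass instead of running gzip afterwards:

```python
import subprocess

def pack_dir(source_dir, output_path, compress=False):
    # Invoke the system tar; -z gzips the archive in the same pass.
    # check=True raises CalledProcessError if tar exits non-zero.
    flags = "-czf" if compress else "-cf"
    subprocess.run(["tar", flags, output_path, source_dir], check=True)
    return output_path
```

Passing the command as a list of arguments avoids the shell quoting issues os.system has with paths containing spaces.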

Related

Python MacOS Loop Files Get File Info

I am trying to loop through all mp3 files in my directory on macOS Monterey, and for each file get its "More Info" attributes, like Title, Duration, Authors, etc. I found a post saying to use xattr, but when I create a variable with xattr it doesn't show any properties or attributes of the files. This is Python 3.9 with the xattr package.
import os
import xattr

directory = os.getcwd()
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        print(f)
        x = xattr.xattr(f)
        xs = x.items()
xattr does not read mp3 metadata or tags; it reads metadata that the filesystem itself stores for a particular file, not the metadata/tags stored inside the file.
In order to get the data you need, you have to read the mp3 file itself with a library that supports reading its ID3 tags, for example eyed3.
Here's a small example:
from pathlib import Path
import eyed3

root_directory = Path(".")
for filename in root_directory.rglob("*.mp3"):
    mp3data = eyed3.load(filename)
    if mp3data.tag is not None:
        print(mp3data.tag.artist)
        print(mp3data.info.time_secs)

How to unzip many files with python

I have more than 1000 zip files in the same folder, all following the same naming convention.
Example name: output_0aa3199eca63522b520ecfe11a4336eb_20210122_181742
How can I unzip them using Python?
Try this and let me know if it worked.
import os
import zipfile

path = 'path/to/your/zip/files'
os.chdir(path)
for file in os.listdir('.'):
    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall('.')
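Note that the loop above will fail if the folder contains anything that is not a zip archive, and extracting everything into one directory can overwrite files with the same name. A sketch that guards both cases (unzip_all is a hypothetical helper name); since the files in the question have no .zip extension, it uses zipfile.is_zipfile() instead of checking the extension, and extracts each archive into its own subfolder:

```python
import os
import zipfile

def unzip_all(folder, dest):
    # Extract every zip archive in folder into its own subdirectory of dest.
    # is_zipfile() inspects the file contents, so extension-less archives work
    # and non-archives are silently skipped.
    os.makedirs(dest, exist_ok=True)
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and zipfile.is_zipfile(path):
            target = os.path.join(dest, os.path.splitext(name)[0])
            with zipfile.ZipFile(path) as zf:
                zf.extractall(target)
```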

How to convert tar.gz file to zip using Python only?

Does anybody have any code for converting a tar.gz file into a zip using only Python? I have been facing many issues with tar.gz, as mentioned in How can I read a tar.gz file using pandas read_csv with the gzip compression option?
You would have to use the tarfile module, with mode 'r|gz' for reading.
Then use zipfile for writing.
import tarfile, zipfile

tarf = tarfile.open(name='mytar.tar.gz', mode='r|gz')
zipf = zipfile.ZipFile(file='myzip.zip', mode='a', compression=zipfile.ZIP_DEFLATED)
for m in tarf:
    f = tarf.extractfile(m)
    fl = f.read()
    fn = m.name
    zipf.writestr(fn, fl)
tarf.close()
zipf.close()
You can use is_tarfile() to check for a valid tar file.
Perhaps you could also use shutil, but I think it cannot work in memory.
PS: From the brief testing that I performed, you may have issues with members m that are directories.
If so, you may have to use isdir(), or even first get the info on each member with tarf.getmembers() and then open the tar.gz file again for transferring to zip, since you cannot do it after tarf.getmembers() (you cannot seek backwards in a stream).
This just fixes a couple of tiny issues in the above answer, makes sure the mtime is preserved, and makes sure compression happens on all the files. All credit to the above for the simple answer.
from datetime import datetime
import sys
from tarfile import open
from zipfile import ZipFile, ZIP_DEFLATED, ZipInfo

compresslevel = 9
compression = ZIP_DEFLATED

with open(name=sys.argv[1], mode='r|gz') as tarf:
    with ZipFile(file=sys.argv[2], mode='w', compression=compression, compresslevel=compresslevel) as zipf:
        for m in tarf:
            if not m.isfile():
                # skip directories and other non-file types
                continue
            mtime = datetime.fromtimestamp(m.mtime)
            print(f'{mtime} - {m.name}')
            zinfo = ZipInfo(
                filename=m.name,
                date_time=(mtime.year, mtime.month, mtime.day, mtime.hour, mtime.minute, mtime.second)
            )
            f = tarf.extractfile(m)
            fl = f.read()
            zipf.writestr(zinfo, fl, compress_type=compression, compresslevel=compresslevel)
print('done.')

Why won't it expand both tar.gz files?

I have two tar.gz files, 2014_SRS.tar.gz and 2013_SRS.tar.gz. Each of the files contains a folder called SRS, which is full of text files. I downloaded these from an ftp server. I want to unzip them automatically in Python. This is my code:
import re
import ftplib
import os
import time
import tarfile
import sys
print('1')
tar = tarfile.open('2014_SRS.tar.gz')
tar.extractall()
tar.close()
print('2')
tar = tarfile.open('2013_SRS.tar.gz')
tar.extractall()
tar.close()
print('3')
This code only opens the second file. How do I fix it to open both files?
Also, I tried using a for loop to run through the whole directory. The code is shown below.
for i in os.listdir(os.getcwd()):
    if i.endswith(".tar.gz"):
        tar = tarfile.open(i, "r:gz")
        tar.extractall()
        tar.close()
However, this gave me an EOFError. In addition, before I ran this bit of code, I was able to unzip both files manually. After I run it and it gives me the error, I can no longer unzip the 2014_SRS file manually. How do I fix this?
While this may not answer your specific question as to why both files could not be unzipped with your code, the following is one way to unzip a list of tar.gz files.
import tarfile, glob

srcDir = "/your/src/directory"
dstDir = "/your/dst/directory"
for f in glob.glob(srcDir + "/*.gz"):
    t = tarfile.open(f, "r:gz")
    for member in t.getmembers():
        t.extract(member, dstDir)
    t.close()
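The EOFError reported in the question usually points at a truncated or corrupt archive (for example from an interrupted ftp download). A sketch that validates each archive with tarfile.is_tarfile() before extracting and collects the ones that fail (extract_valid_archives is a hypothetical helper name):

```python
import glob
import os
import tarfile

def extract_valid_archives(src_dir, dst_dir):
    # Extract every readable .tar.gz in src_dir into dst_dir,
    # skipping corrupt files; returns the list of paths that failed.
    bad = []
    for path in glob.glob(os.path.join(src_dir, "*.tar.gz")):
        if not tarfile.is_tarfile(path):
            bad.append(path)
            continue
        try:
            with tarfile.open(path, "r:gz") as t:
                t.extractall(dst_dir)
        except (tarfile.TarError, EOFError):
            bad.append(path)
    return bad
```

Any file returned in the bad list is worth re-downloading, since a truncated gzip stream cannot be repaired locally.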

How to get .avi files length

I am trying to loop over a directory of subfolders where every folder contains one .avi file whose length in seconds I want to retrieve.
I've found PyMedia (http://pymedia.org/) and I understand it could possibly help me achieve this, but I cannot find anything about avi duration/length in the documentation.
How would I be able to do that? Also, if there is a different library of some sort, I'd like to know as well.
Edit: added my final solution that works, thanks to J.F. Sebastian
import sys
import glob
import os
from hachoir_core.cmd_line import unicodeFilename
from hachoir_core.i18n import getTerminalCharset
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser

path = "z:\*"
for fpath in glob.glob(os.path.join(path, '*avi')):
    filename = fpath
    filename, real_filename = unicodeFilename(filename), filename
    parser = createParser(filename, real_filename=real_filename)
    metadata = extractMetadata(parser)
    print fpath
    print("Duration (hh:mm:ss.f): %s" % metadata.get('duration'))
    print '\n'
You could use hachoir-metadata to extract avi duration from a file:
#!/usr/bin/env python
import sys
# $ pip install hachoir-{core,parser,metadata}
from hachoir_core.cmd_line import unicodeFilename
from hachoir_core.i18n import getTerminalCharset
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser
filename = sys.argv[1]
charset = getTerminalCharset()
filename, real_filename = unicodeFilename(filename, charset), filename
parser = createParser(filename, real_filename=real_filename)
metadata = extractMetadata(parser)
print("Duration (hh:mm:ss.f): %s" % metadata.get('duration'))
It uses pure Python RIFF parser to extract info from avi file.
Example:
$ get-avi-duration.py test.avi
Duration (hh:mm:ss.f): 0:47:03.360000
Here's ffmpeg's output for comparison:
$ ffmpeg -i test.avi |& grep -i duration
Duration: 00:47:03.36, start: 0.000000, bitrate: 1038 kb/s
To print info about all avi files in a directory tree:
#!/usr/bin/env python
import os
import sys
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser

def getinfo(rootdir, extensions=(".avi", ".mp4")):
    if not isinstance(rootdir, unicode):
        rootdir = rootdir.decode(sys.getfilesystemencoding())
    for dirpath, dirs, files in os.walk(rootdir):
        dirs.sort()  # traverse directories in sorted order
        files.sort()
        for filename in files:
            if filename.endswith(extensions):
                path = os.path.join(dirpath, filename)
                yield path, extractMetadata(createParser(path))

for path, metadata in getinfo(u"z:\\"):
    if metadata.has('duration'):
        print(path)
        print(" Duration (hh:mm:ss.f): %s" % metadata.get('duration'))
If your server is running any UNIX operating system, you can use ffmpeg to do this. Usually a plain ffmpeg -i myvideo.avi will print the full video details.
There's also a Python wrapper for ffmpeg which will probably return the video details as a dictionary or list.
EDIT:
I've also found a nice ffmpeg tool called ffprobe which can output the length of a video without additional fuss.
ffprobe -loglevel error -show_streams inputFile.avi | grep duration | cut -f2 -d=
Not sure if there is a platform-independent way to do this, but if you only need this to work on Windows, then MediaInfo (below) has a command-line interface which you can use to output details about video files, which could then be parsed to get the information. Not the prettiest solution, but it looks like it should work.
http://mediainfo.sourceforge.net/en
