I'm reorganizing a large MP3 library for my friend's MP3 player and need to set each file's Title ID3 tag to the same value as its file name. Doing this through Windows Properties takes forever, so I was wondering if anyone has an idea of how to write a Python script that does this for all MP3s in a directory in rapid succession, or can at least point me to a library that can be installed on Windows.
Look at this:
ID3 Tagging in Python
id3reader
Also Dive Into Python uses MP3 ID3 tags as an example.
Don't forget about PyPI - the Python Package Index.
Here is a Python script I wrote to do this: https://gitlab.com/tomleo/id3_folder_rename
#! /usr/bin/python
import os
import re
import glob
import subprocess
from mutagen.easyid3 import EasyID3
path = os.getcwd()
fpath = u"%s/*.mp3" % path
files = glob.glob(fpath)
for fname in files:
    _track = EasyID3(fname)
    track_num = _track.get('tracknumber')[0]
    track_title = re.sub(r'/', '_', _track.get('title')[0])
    if '/' in track_num:
        track_num = track_num.split('/')[0]
    if len(track_num) == 1:
        track_num = "0%s" % track_num
    _valid_fname = u"%s/%s %s.mp3" % (path, track_num, track_title)
    if fname != _valid_fname:
        subprocess.call(["/bin/mv", fname, _valid_fname])
It uses the mutagen Python library to parse the ID3 info. You'll have to tweak the subprocess call to make it work on Windows, but this should give you an idea of how to do it. Hope this helps.
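The script above goes from tags to file names; the question asks for the opposite direction (Title tag from file name). A minimal sketch of that, assuming the same mutagen library (`pip install mutagen`) and assuming "title" means the file name without its `.mp3` extension. The function names `title_from_filename` and `set_titles_from_filenames` are hypothetical, not part of mutagen.

```python
from pathlib import Path


def title_from_filename(path):
    """Return the file name without folder or extension, e.g. 'Song.mp3' -> 'Song'."""
    return Path(path).stem


def set_titles_from_filenames(directory):
    """Set the ID3 Title of every .mp3 in `directory` to its file name (assumes mutagen)."""
    # Imported here so title_from_filename() works even without mutagen installed
    from mutagen.easyid3 import EasyID3
    from mutagen.id3 import ID3NoHeaderError

    for mp3 in sorted(Path(directory).glob("*.mp3")):
        try:
            tags = EasyID3(str(mp3))
        except ID3NoHeaderError:
            # The file has no ID3 tag yet: start from an empty one
            tags = EasyID3()
        tags["title"] = title_from_filename(mp3)
        # Passing the path lets mutagen create the tag if the file had none
        tags.save(str(mp3))
```

Usage would be something like `set_titles_from_filenames(r"D:\Music")`; the folder is only an example.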
Related
I am trying to loop through all MP3 files in my directory on macOS Monterey and, for every iteration, get the file's "More Info" attributes, like Title, Duration, Authors, etc. I found a post saying to use xattr, but when I create a variable with xattr it doesn't show any properties or attributes of the files. This is in Python 3.9 with the xattr package:
import os
import xattr
directory = os.getcwd()
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    # checking if it is a file
    if os.path.isfile(f):
        print(f)
        x = xattr.xattr(f)
        xs = x.items()
xattr does not read MP3 metadata or tags. It reads the extended attributes that the filesystem stores for a file, not the metadata/tags stored inside the file itself.
To get the data you need, you have to read the MP3 file itself with a library that supports reading ID3 tags, for example eyed3.
Here's a small example:
from pathlib import Path
import eyed3
root_directory = Path(".")
for filename in root_directory.rglob("*.mp3"):
    mp3data = eyed3.load(str(filename))
    # eyed3.load() returns None if the file is not recognized as audio
    if mp3data is None:
        continue
    if mp3data.tag is not None:
        print(mp3data.tag.artist)
    print(mp3data.info.time_secs)
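To see why xattr finds nothing: an ID3v2 tag is stored inside the MP3 file, as a 10-byte header followed by the tag frames at the very start of the file (the older ID3v1 tag sits in the last 128 bytes instead). The stdlib-only sketch below only detects and sizes the ID3v2 header, as an illustration; `parse_id3v2_header` and `read_id3v2_header` are hypothetical helpers, and decoding the actual frames is what eyed3 or mutagen are for.

```python
def parse_id3v2_header(data):
    """Return (major_version, revision, tag_size) for an ID3v2 header, or None.

    The header is 10 bytes: b"ID3", version (2 bytes), flags (1 byte) and a
    4-byte "syncsafe" size in which only the low 7 bits of each byte are used.
    The size counts the tag data after the header.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision = data[3], data[4]
    size = 0
    for byte in data[6:10]:
        size = (size << 7) | (byte & 0x7F)
    return major, revision, size


def read_id3v2_header(path):
    """Read just the first 10 bytes of a file and parse its ID3v2 header (if any)."""
    with open(path, "rb") as f:
        return parse_id3v2_header(f.read(10))
```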
As the title states, I am writing a library that contains a data package with several configuration files.
The configuration files contain hard-coded paths to other configuration files, which I would like to change at installation time so that the new hard-coded paths point to where the library is actually installed.
I tried different approaches that work well on Windows, but not on Unix-based platforms (e.g. Ubuntu).
My setup.py code:
import atexit
import os
import sys
import fileinput
import fnmatch
import glob
from setuptools import setup
from setuptools.command.develop import develop
from setuptools.command.install import install
from setuptools.command.egg_info import egg_info
LIB_NAME = "namsim"
NAMSIM_DATA_DIRECTORY = "data"
NAMSIM_CONF_DIRECTORY = "default_namsim_conf"
def post_install_operations(lib_path):
    # TODO: workaround to exit in library creation process
    # NOTE: Debian/Ubuntu system Pythons install packages into 'dist-packages',
    # so this 'site-packages' check can return early there
    if lib_path is None or 'site-packages' not in lib_path:
        return
    # set conf path and normalize backslashes to forward slashes
    conf_dir_path = os.path.join(lib_path, NAMSIM_DATA_DIRECTORY, NAMSIM_CONF_DIRECTORY)
    conf_dir_path = conf_dir_path.replace(os.sep, '/')
    # change paths in all conf .xml files
    file_pattern = "*.xml"
    for path, dirs, files in os.walk(conf_dir_path):
        for filename in fnmatch.filter(files, file_pattern):
            full_file_path = os.path.join(path, filename)
            print(full_file_path)
            # replace stub with the actual path
            stub_name = 'STUB_PATH'
            # Read in the file
            with open(full_file_path, 'r') as file:
                file_data = file.read()
            print(file_data)
            # Replace the stub with the (forward-slash) conf path
            file_data = file_data.replace(stub_name, conf_dir_path)
            print(file_data)
            # Write the file out again
            with open(full_file_path, 'w') as file:
                file.write(file_data)
def post_install_decorator(command_subclass):
    """A decorator for classes subclassing one of the setuptools commands.
    It modifies the run() method so that it will change the configuration paths.
    """
    orig_run = command_subclass.run
    def modified_run(self):
        def find_module_path():
            for p in sys.path:
                if os.path.isdir(p) and LIB_NAME in os.listdir(p):
                    return os.path.join(p, LIB_NAME)
        orig_run(self)
        lib_path = find_module_path()
        post_install_operations(lib_path)
    command_subclass.run = modified_run
    return command_subclass
@post_install_decorator
class CustomDevelopCommand(develop):
    pass
@post_install_decorator
class CustomInstallCommand(install):
    pass
@post_install_decorator
class CustomEggInfoCommand(egg_info):
    pass
# atexit.register(all_done)  # all_done is not defined in this snippet
setup(
    name="namsim",
    version="1.0.0",
    author="Barak David",
    license="MIT",
    keywords="Name similarity mock-up library.",
    packages=['namsim', 'namsim.wrapper', 'namsim.data'],
    package_data={'namsim.data': ['default_namsim_conf/*']},
    include_package_data=True,
    cmdclass={
        'develop': CustomDevelopCommand,
        'install': CustomInstallCommand,
        'egg_info': CustomEggInfoCommand
    }
)
Picture of my library source tree:
To be clear, the original namsim_config.xml contains the text:
STUB_PATH/conf/multiplier_config.xml
My goal is for the text to be changed after installation to:
{actual lib installation path}/conf/multiplier_config.xml
Some additional information:
I tried the above code on both Python 2.7 and 3.x.
On Windows I get the expected result, but not on Unix-based platforms.
I use the "python setup.py sdist" command on Windows to create the library, and I install the resulting tar.gz on the different platforms.
I also tried using the atexit module to change the configurations before process termination, but I got the same result.
Thank you.
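One way to avoid depending on which setuptools command actually runs on each platform is to fill in the placeholder when the configuration is read rather than at install time, resolving the directory relative to the installed module. This is a sketch of that alternative (Python 3), not a fix for the code above; `expand_stub` and `load_conf` are hypothetical helpers, and the `data/default_namsim_conf` layout is assumed from the constants in `setup.py`.

```python
import os

STUB_NAME = "STUB_PATH"


def expand_stub(text, conf_dir):
    """Replace every STUB_PATH placeholder in `text` with `conf_dir` (forward slashes)."""
    return text.replace(STUB_NAME, conf_dir.replace(os.sep, "/"))


def load_conf(filename, conf_dir=None):
    """Read a conf file and expand STUB_PATH at read time instead of install time."""
    if conf_dir is None:
        # Assumes this module lives in the namsim package next to data/default_namsim_conf
        package_dir = os.path.dirname(os.path.abspath(__file__))
        conf_dir = os.path.join(package_dir, "data", "default_namsim_conf")
    with open(os.path.join(conf_dir, filename), encoding="utf-8") as f:
        return expand_stub(f.read(), conf_dir)
```

With this approach the XML files in the package keep their `STUB_PATH` text, and nothing has to be rewritten during installation.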
I have been given a Python programming project, and I wanted to ask how I can give relative directory paths to the generated files in Python so that they can be opened on other machines, since absolute paths won't work on every PC.
If you have write access to the folder containing your script then you can use something like
import sys, os
if __name__ == '__main__':
    myself = sys.argv[0]
else:
    myself = __file__
myself = os.path.abspath(myself)
whereami = os.path.dirname(myself)
print(myself)
print(whereami)
datadir = os.path.join(whereami, 'data')
if not os.path.exists(datadir):
    os.mkdir(datadir)
datafile = os.path.join(datadir, 'foo.txt')
with open(datafile, 'w') as f:
    f.write('Hello, World!\n')
with open(datafile) as f:
    print(f.read())
In the file that has the script, you want to do something like this:
import os
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'relative/path/to/file/you/want')
This will give you the absolute path to the file you're looking for, irrespective of the machine you are running your code on.
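The same idea with pathlib (Python 3.4+), as a small sketch; `data_path` is a hypothetical helper name, and the optional `base` argument is added here only to make the function easy to test.

```python
from pathlib import Path


def data_path(*parts, base=None):
    """Return an absolute path built from `parts`, relative to this script's folder.

    `base` overrides the starting folder (handy for tests); by default it is the
    directory containing this file, so the result follows the project folder
    wherever it is copied.
    """
    if base is None:
        base = Path(__file__).resolve().parent
    return Path(base).resolve().joinpath(*parts)
```

For example, `data_path("data", "foo.txt")` points at `data/foo.txt` next to the script on any machine.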
You can also refer to these links for more information:
Link1
Link2
For more specific information, please make your question more specific, i.e. post the code you have tried along with your inputs and expected outputs.
I have thousands of PDF files on my computer whose names range from a0001.pdf to a3621.pdf, and inside each there is a title, e.g. "aluminum carbonate" for a0001.pdf, "aluminum nitrate" for a0002.pdf, etc., which I'd like to extract to rename my files.
I use this program to rename a file:
import os
path=r"C:\Users\YANN\Desktop\..."
old='string 1'
new='string 2'
def rename(path,old,new):
    for f in os.listdir(path):
        os.rename(os.path.join(path, f), os.path.join(path, f.replace(old, new)))
rename(path,old,new)
I would like to know whether there is a way to extract the title embedded in each PDF file and use it to rename the file.
Installing the package
This cannot be solved with plain Python. You will need an external package such as pdfrw, which allows you to read PDF metadata. The installation is quite easy using the standard Python package manager pip.
On Windows, first make sure you have a recent version of pip using the shell command:
python -m pip install -U pip
On Linux:
pip install -U pip
On both platforms, then install the pdfrw package using
pip install pdfrw
The code
I combined the approaches of zeebonk and user2125722 to write something very compact and readable that is close to your original code:
import os
from pdfrw import PdfReader
path = r'C:\Users\YANN\Desktop'
def renameFileToPDFTitle(path, fileName):
    fullName = os.path.join(path, fileName)
    # Extract pdf title from pdf file (missing entries come back as None)
    info = PdfReader(fullName).Info
    newName = info.Title if info is not None else None
    # Skip files without a title in their metadata
    if not newName:
        return
    # Remove surrounding brackets that some pdf titles have
    newName = newName.strip('()') + '.pdf'
    newFullName = os.path.join(path, newName)
    os.rename(fullName, newFullName)
for fileName in os.listdir(path):
    # Rename only pdf files
    fullName = os.path.join(path, fileName)
    if not os.path.isfile(fullName) or not fileName.endswith('.pdf'):
        continue
    renameFileToPDFTitle(path, fileName)
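PDF titles often contain characters that Windows forbids in file names (`<>:"/\|?*` and control characters such as newlines), and some files have no title at all, so `os.rename` can fail on them. A defensive sketch for building the new name; `title_to_filename` and its `max_len` default of 120 are hypothetical choices, and the parenthesis stripping mirrors what the answers here do for pdfrw titles.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def title_to_filename(title, max_len=120):
    """Turn a PDF title into a safe 'Title.pdf' file name, or None if nothing usable is left."""
    if not title:
        return None
    title = str(title).strip()
    # Titles read with pdfrw often come back wrapped in parentheses
    if title.startswith("(") and title.endswith(")"):
        title = title[1:-1]
    title = FORBIDDEN.sub("", title)
    # Truncate, then drop trailing spaces/dots, which Windows also rejects
    title = title[:max_len].strip().rstrip(".").strip()
    return title + ".pdf" if title else None
```

Inside `renameFileToPDFTitle`, the result would replace the `newName.strip('()') + '.pdf'` step, with a `None` result meaning "leave this file alone".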
What you need is a library that can actually read PDF files. For example pdfrw:
In [8]: from pdfrw import PdfReader
In [9]: reader = PdfReader('example.pdf')
In [10]: reader.Info.Title
Out[10]: 'Example PDF document'
You can use the pdfminer library to parse the PDFs. The info property contains the title of the PDF. Here is what a sample info looks like:
[{'CreationDate': "D:20170110095753+05'30'", 'Producer': 'PDF-XChange Printer V6 (6.0 build 317.1) [Windows 10 Enterprise x64 (Build 10586)]', 'Creator': 'PDF-XChange Office Addin', 'Title': 'Python Basics'}]
Then we can extract the Title using the properties of a dictionary. Here is the whole code (including iterating all the files and renaming them):
import os
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1
from pdfminer.utils import decode_text
for i in range(1, 3622):
    # a0001.pdf ... a3621.pdf: zero-pad the number to 4 digits
    file_name = "a" + str(i).zfill(4) + ".pdf"
    with open(file_name, 'rb') as fp:
        parser = PDFParser(fp)
        doc = PDFDocument(parser)
        metadata = doc.info  # The "Info" metadata: a list of dicts
        print(metadata)
        # Resolve indirect references while the file is still open
        title = resolve1(metadata[0].get("Title")) if metadata else None
    if not title:
        continue
    # PDF text strings are raw bytes (PDFDocEncoding or UTF-16BE); decode to text
    if isinstance(title, bytes):
        title = decode_text(title)
    os.rename(file_name, title + ".pdf")
You can look at just the metadata using the Ghostscript tool pdf_info.ps. It used to ship with Ghostscript but is still available at https://r-forge.r-project.org/scm/viewvc.php/pkg/inst/ghostscript/pdf_info.ps?view=markup&root=tm
Building on Ciprian Tomoiagă's suggestion of using pdfrw, I've uploaded a script which also:
renames files in sub-directories
adds a command-line interface
handles when file name already exists by appending a random string
strips any character which is not alphanumeric from the new file name
replaces non-ASCII characters (such as á è í ò ç...) with ASCII equivalents (a e i o c) in the new file name
allows you to set the root dir and limit the length of the new file name from command-line
shows a progress bar and, after the script has finished, shows some statistics
does some error handling
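For the "file name already exists" case in the list above, one deterministic alternative to appending a random string is to append a counter, as in `Title (1).pdf`. A sketch; `unique_path` is a hypothetical helper, and its `exists` parameter is there only so the existence check can be swapped out in tests.

```python
from pathlib import Path


def unique_path(path, exists=Path.exists):
    """Return `path` if it is free, else the first free 'name (n).ext' next to it."""
    path = Path(path)
    if not exists(path):
        return path
    n = 1
    while True:
        candidate = path.with_name(f"{path.stem} ({n}){path.suffix}")
        if not exists(candidate):
            return candidate
        n += 1
```

Note that checking and then renaming is not atomic; if another process can create files in the same folder at the same time, a collision is still possible.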
As TextGeek mentioned, unfortunately not all files have the title metadata, so some files won't be renamed.
Repository: https://github.com/favict/pdf_renamefy
Usage:
After downloading the files, install the dependencies by running pip:
$pip install -r requirements.txt
and then to run the script:
$python -m renamefy <directory> <filename maximum length>
...where directory is the full path in which you would like to look for PDF files, and filename maximum length is the length at which the file name will be truncated in case the title is too long or was incorrectly set in the file.
Both parameters are optional. If none is provided, the directory is set to the current directory and filename maximum length is set to 120 characters.
Example:
$python -m renamefy C:\Users\John\Downloads 120
I used it on Windows, but it should work on Linux too.
Feel free to copy, fork and edit as you see fit.
I had some issues with the solutions above, so here is my recipe:
from pathlib import Path
from pdfrw import PdfReader
import re
path_to_files = Path(r"C:\Users\Malac\Desktop\articles\Downloaded")
# Exclude windows forbidden chars for name <>:"/\|?*
# Newlines \n and backslashes will be removed anyway
exclude_chars = '[<>:"/|?*]'
for i in path_to_files.glob("*.pdf"):
    try:
        title = PdfReader(i).Info.Title
    except Exception:
        # print(f"File {i} not renamed.")
        # Skip this file, otherwise a stale title from the previous file is reused
        continue
    # Some files have no title at all
    if not title:
        continue
    # For some reason, titles are returned in brackets - remove brackets if around titles
    if title.startswith("("):
        title = title[1:]
    if title.endswith(")"):
        title = title[:-1]
    title = re.sub(exclude_chars, "", title)
    title = re.sub(r"\\", "", title)
    title = re.sub("\n", "", title)
    # Some names are just ()
    if not title:
        continue
    try:
        # Append the suffix instead of with_suffix(), which would cut titles containing dots
        final_path = path_to_files / (title + ".pdf")
        if final_path.exists():
            continue
        i.rename(final_path)
    except Exception:
        # print(f"Name {i} incorrect.")
        pass
Once you have installed it, open the app and go to the Download folder. You will see your downloaded files there. Just long press the file you wish to rename and the Rename option will appear at the bottom.
I am trying to loop over a directory of subfolders where every folder contains one .avi file, and I want to retrieve each file's length in seconds.
I've found PyMedia http://pymedia.org/ and I understand it could possibly help me achieve this, but I cannot find anything about AVI duration/length in the documentation.
How would I be able to do that? Also, if there is a different library of some sort, I'd like to know as well.
Edit: Added my final solution that works thanks to J.F. Sebastian
import sys
import glob
import os
from hachoir_core.cmd_line import unicodeFilename
from hachoir_core.i18n import getTerminalCharset
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser
path = r"z:\*"
for fpath in glob.glob(os.path.join(path, '*.avi')):
    filename = fpath
    filename, real_filename = unicodeFilename(filename), filename
    parser = createParser(filename, real_filename=real_filename)
    metadata = extractMetadata(parser)
    print(fpath)
    print("Duration (hh:mm:ss.f): %s" % metadata.get('duration'))
    print('\n')
You could use hachoir-metadata to extract avi duration from a file:
#!/usr/bin/env python
import sys
# $ pip install hachoir-{core,parser,metadata}
from hachoir_core.cmd_line import unicodeFilename
from hachoir_core.i18n import getTerminalCharset
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser
filename = sys.argv[1]
charset = getTerminalCharset()
filename, real_filename = unicodeFilename(filename, charset), filename
parser = createParser(filename, real_filename=real_filename)
metadata = extractMetadata(parser)
print("Duration (hh:mm:ss.f): %s" % metadata.get('duration'))
It uses a pure-Python RIFF parser to extract info from the AVI file.
Example:
$ get-avi-duration.py test.avi
Duration (hh:mm:ss.f): 0:47:03.360000
Here's ffmpeg's output for comparison:
$ ffmpeg -i test.avi |& grep -i duration
Duration: 00:47:03.36, start: 0.000000, bitrate: 1038 kb/s
To print info about all avi files in a directory tree:
#!/usr/bin/env python
import os
import sys
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser
def getinfo(rootdir, extensions=(".avi", ".mp4")):
    if not isinstance(rootdir, unicode):
        rootdir = rootdir.decode(sys.getfilesystemencoding())
    for dirpath, dirs, files in os.walk(rootdir):
        dirs.sort() # traverse directories in sorted order
        files.sort()
        for filename in files:
            if filename.endswith(extensions):
                path = os.path.join(dirpath, filename)
                yield path, extractMetadata(createParser(path))
for path, metadata in getinfo(u"z:\\"):
    if metadata.has('duration'):
        print(path)
        print(" Duration (hh:mm:ss.f): %s" % metadata.get('duration'))
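hachoir reads the duration from the AVI headers with its RIFF parser. For illustration only, here is a stdlib-only Python 3 sketch of the same idea: the AVI main header (the `avih` chunk) stores the microseconds per frame and the total frame count, and their product is the duration. It is an approximation under assumptions: it expects the `avih` chunk within the first few KB, as in ordinary AVI files, and for OpenDML (AVI 2.0) files larger than 1 GB the frame count in `avih` may cover only the first RIFF segment. `avi_duration_seconds` and `avi_file_duration` are hypothetical names.

```python
import struct


def avi_duration_seconds(header):
    """Estimate an AVI file's duration in seconds from its leading bytes.

    Reads the main AVI header ('avih'): dwMicroSecPerFrame is its first
    32-bit little-endian field and dwTotalFrames its fifth.
    """
    if header[:4] != b"RIFF" or header[8:12] != b"AVI ":
        raise ValueError("not a RIFF/AVI file")
    pos = header.find(b"avih", 12)
    # Need the 8-byte chunk header plus the first five 4-byte fields
    if pos < 0 or len(header) < pos + 8 + 20:
        raise ValueError("avih chunk not found")
    usec_per_frame, _max_bps, _padding, _flags, total_frames = struct.unpack_from(
        "<5I", header, pos + 8
    )
    return usec_per_frame * total_frames / 1_000_000


def avi_file_duration(path, header_bytes=8192):
    """Read only the start of the file; the avih chunk sits near the beginning."""
    with open(path, "rb") as f:
        return avi_duration_seconds(f.read(header_bytes))
```

Combined with `os.walk` as in the `getinfo` generator above, `avi_file_duration(path)` gives seconds directly, which is what the question asked for.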
If your server runs a Unix-like operating system, you can use ffmpeg to do this. Running ffmpeg -i myvideo.avi will usually print full details about the video.
There's also a Python wrapper for ffmpeg, which will probably return video details as a dictionary or list.
EDIT:
I've also found a nice ffmpeg tool called ffprobe, which can output the length of a video without additional fuss:
ffprobe -loglevel error -show_streams inputFile.avi | grep duration | cut -f2 -d=
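Calling ffprobe from Python can be sketched with `subprocess` (Python 3.7+ for `capture_output`). This assumes ffprobe is installed and on PATH, and it asks for the container-level duration (`-show_entries format=duration`) printed as a bare number, instead of grepping the per-stream output. `ffprobe_duration` and `parse_ffprobe_seconds` are hypothetical helper names; some files report `N/A`, in which case `float()` raises `ValueError`.

```python
import subprocess


def parse_ffprobe_seconds(output):
    """Convert ffprobe's bare duration output (e.g. '2823.360000\\n') to float seconds."""
    return float(output.strip())


def ffprobe_duration(path, ffprobe="ffprobe"):
    """Return the container duration of `path` in seconds, as reported by ffprobe."""
    result = subprocess.run(
        [ffprobe, "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1",
         str(path)],
        capture_output=True, text=True, check=True,
    )
    return parse_ffprobe_seconds(result.stdout)
```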
I'm not sure if there is a platform-independent way to do this, but if you only need it to work on Windows, MediaInfo (below) appears to have a command-line interface that outputs details about video files, which you could then parse to get the information. It's not the prettiest solution, but it looks like it should work.
http://mediainfo.sourceforge.net/en