Removing hyperlinks in PowerPoint with python-pptx - python

I am quite new to XML and the python-pptx module. I want to remove a single hyperlink that is present on every slide.
My attempt so far has been to retrieve my files, change them to zip format, and unzip them into separate folders.
I then locate the following element: <a:hlinkClick r:id="RelId4">
and remove it, while also removing the corresponding <Relationship> entry in the slide's .rels file.
I then rezip, change the extension back to .pptx, and the file loads fine. I am now trying to replicate this in Python so I can set up an ongoing automation.
My attempt:
from pathlib import Path
import zipfile as zf
from pptx import Presentation
import re
import xml.etree.ElementTree as ET

path = 'mypath'

ppts = [files for files in Path(path).glob('*.pptx')]
for file in ppts:
    file.rename(file.with_suffix('.zip'))

zip_files = [files for files in Path(path).glob('*.zip')]
for zips in zip_files:
    with zf.ZipFile(zips, 'r') as zip_ref:
        zip_ref.extractall(Path(path).joinpath('zipFiles', zips.stem))
I then do some further filtering and end up with the XML files from the rels folder and the ppt/slides folder.
It's here that I get stuck: I can read the XML with the ElementTree module, but I cannot find the relevant tag to remove.
for file in normal_xmls:
    tree = ET.parse(file).getroot()
    y = tree.findall('a')
    print(y)
This yields nothing. I also tried the python-pptx module, but .Action.Hyperlink doesn't seem to be a complete feature, unless I am misunderstanding the API.
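A likely reason the findall above returns nothing is that DrawingML elements are namespace-qualified, so ElementTree needs the full namespace URI (or a prefix map). A minimal sketch of a namespace-aware search over the same normal_xmls list, assuming the standard a: namespace URI:

import xml.etree.ElementTree as ET

# the a: prefix maps to the DrawingML "main" namespace
NSMAP = {'a': 'http://schemas.openxmlformats.org/drawingml/2006/main'}

for file in normal_xmls:          # same list as above
    root = ET.parse(file).getroot()
    # .// searches the whole tree; the a: prefix is resolved via NSMAP
    links = root.findall('.//a:hlinkClick', NSMAP)
    print(file, len(links))

Removing a matched element still needs a handle on its parent (ElementTree has no parent pointers), and the matching <Relationship> entry in the slide's .rels file has to go too, as in the manual approach above.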

To remove a hyperlink from a shape (the kind where clicking on the shape navigates somewhere), set the hyperlink address to None:
shape.click_action.hyperlink.address = None
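If python-pptx covers your case, a loop over every slide and shape along these lines may be all you need. This is a minimal sketch, not a tested solution: it assumes the link is either a shape-level click action or a text-run hyperlink, and it overwrites each file in place (path is the same placeholder folder as in the question).

from pathlib import Path
from pptx import Presentation

path = 'mypath'  # same placeholder folder as in the question

for pptx_file in Path(path).glob('*.pptx'):
    prs = Presentation(str(pptx_file))
    for slide in prs.slides:
        for shape in slide.shapes:
            # clear a shape-level click action, where the shape type supports one
            click_action = getattr(shape, 'click_action', None)
            if click_action is not None:
                click_action.hyperlink.address = None
            # if the link lives on text instead, clear run-level hyperlinks too
            if shape.has_text_frame:
                for paragraph in shape.text_frame.paragraphs:
                    for run in paragraph.runs:
                        run.hyperlink.address = None
    prs.save(str(pptx_file))  # overwrites in place; save to a copy if preferred

This should also spare you the manual zip surgery, since clearing the address removes the link reference rather than leaving a dangling relationship id.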

Related

Recreating Folder Structure and moving documents into the correct folders

We use a DMS that spits out a document report CSV that looks like the one below. The tool that comes with the DMS to export files only lets you export documents, not folders. I am trying to create a PowerShell or Python script to recreate the folders and move the documents into them.
The report does not include file extensions, so they vary.
Column A is the path of the folder from the DMS.
Column B is the file name without extension.
Document Report Example
You would do something like this:
# Python program to explain the shutil.copyfile() method
# importing the shutil module
import shutil
# Source path
source = "/home/User/Documents/file.txt"
# Destination path (must differ from the source, or copyfile() raises SameFileError)
destination = "/home/User/Documents/file_copy.txt"
# Copy the content of source to destination
shutil.copyfile(source, destination)
In your particular case you would put this into a loop so that you can handle each row of the report...
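A minimal sketch of that loop, assuming the report is a two-column CSV named report.csv (folder path, then bare file name), that the exported documents all sit in one flat folder, and that every path below is a placeholder for your own:

import csv
import shutil
from pathlib import Path

EXPORT_DIR = Path('/data/dms_export')      # placeholder: where the DMS dumped the documents
TARGET_ROOT = Path('/data/restructured')   # placeholder: root of the rebuilt folder tree

with open('report.csv', newline='') as fh:  # placeholder report name
    reader = csv.reader(fh)
    next(reader, None)                      # skip the header row, if there is one
    for row in reader:
        folder_path, file_name = row[0], row[1]
        dest_dir = TARGET_ROOT / folder_path.strip('/\\')
        dest_dir.mkdir(parents=True, exist_ok=True)    # recreate the folder structure
        # the report has no extensions, so match on the name and keep whatever extension exists
        for match in EXPORT_DIR.glob(file_name + '.*'):
            shutil.copyfile(match, dest_dir / match.name)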

Delete particular mp3 files in a directory that have no metadata (title, artist, etc.)?

I want to delete the mp3 files in a directory that have no metadata (title, artist, etc.). I tried sorting the files and deleting them manually, but that didn't work as the number of files is huge (~30K). Is there any Python script to accomplish this task?
This example script is not complete, but I hope it is a good starting point for you.
import os
import glob

from mutagen.easyid3 import EasyID3
from mutagen.id3 import ID3NoHeaderError

mp3_files_list = glob.glob("path/to/your/mp3-files-folder/*.mp3")  # example path: /home/user/Downloads/mp3/*.mp3

for mp3_file in mp3_files_list:
    try:
        audio = EasyID3(mp3_file)
    except ID3NoHeaderError:        # file has no ID3 tag at all
        os.remove(mp3_file)
        continue
    if not audio.get('title'):      # title tag missing or empty
        os.remove(mp3_file)
    elif not audio.get('artist'):   # artist tag missing or empty
        os.remove(mp3_file)
    # etc. tag checks
Mutagen module documentation: https://mutagen.readthedocs.io/en/latest/

How to import every docx file in a folder into Python?

I'm pretty new to Python and I'm using the Python-docx module to manipulate some docx files.
I'm importing the docx files using this code:
doc = docx.Document('filename.docx')
The thing is that I need to work with many docx files, and in order to avoid writing the same line of code for each file, I was wondering: if I create a folder in my working directory, is there a way to import all the docx files in a more efficient way?
Something like:
import docx
from glob import glob

def edit_document(path):
    document = docx.Document(path)
    # --- do things on document ---

for path in glob("./*.docx"):
    edit_document(path)
You'll need to adjust the glob expression to suit.
There are plenty of other ways to do that part, like os.walk() if you want to recursively descend directories, but this is maybe a good place to start.
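For instance, a minimal sketch of the os.walk() variant, assuming the docx files live somewhere under a placeholder ./documents folder and reusing edit_document() from the snippet above:

import os

for dirpath, dirnames, filenames in os.walk('./documents'):    # placeholder root folder
    for filename in filenames:
        if filename.endswith('.docx'):
            edit_document(os.path.join(dirpath, filename))     # edit_document() from above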

Walking through a directory path and opening files with trimesh

I have the following code:
import os
import trimesh

# Core settings
rootdir = 'path'
extension = ".zip"

for root, dirs, files in os.walk(rootdir):
    if not root.endswith(".zip"):
        for file in files:
            if file.endswith(".stl"):
                mesh = trimesh.load(file)
And I get the following error:
ValueError: File object passed as string that is not a file!
When I open the files one by one, however, it works. What could be the reason?
That's because file is just the filename, not the full file path.
Fix that by using os.path.join with the containing directory:
mesh = trimesh.load(os.path.join(root,file))
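Put back into the loop from the question, the fix would look something like this (a sketch; rootdir is still your placeholder path, and the unused .zip check is left out for brevity):

import os
import trimesh

rootdir = 'path'   # placeholder, as in the question

for root, dirs, files in os.walk(rootdir):
    for file in files:
        if file.endswith(".stl"):
            # join the containing directory with the bare filename
            mesh = trimesh.load(os.path.join(root, file))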
This is not a direct answer to your question. However, you might be interested in noting that there is now a less complicated paradigm for this situation. It involves using the pathlib module.
I don't use trimesh. I will process pdf documents instead.
First, you can identify all of the pdf files in a directory and its subdirectories recursively with just a single line.
>>> from pathlib import Path
>>> path = Path('C:/Quantarctica2')
>>> for item in path.glob('**/*.pdf'):
...     item
...
WindowsPath('C:/Quantarctica2/Quantarctica-Get_Started.pdf')
WindowsPath('C:/Quantarctica2/Quantarctica2_GetStarted.pdf')
WindowsPath('C:/Quantarctica2/Basemap/Terrain/BEDMAP2/tc-7-375-2013.pdf')
WindowsPath('C:/Quantarctica2/Scientific/Glaciology/ALBMAP/1st_ReadMe_ALBMAP_LeBrocq_2010_EarthSystSciData.pdf')
WindowsPath('C:/Quantarctica2/Scientific/Glaciology/ASAID/Bindschadler2011TC_GroundingLines.pdf')
WindowsPath('C:/Quantarctica2/Software/CIA_WorldFactbook_Antarctica.pdf')
WindowsPath('C:/Quantarctica2/Software/CIA_WorldFactbook_SouthernOcean.pdf')
WindowsPath('C:/Quantarctica2/Software/QGIS-2.2-UserGuide-en.pdf')
You will have noticed that (a) the complete paths are made available, and (b) the paths are available within object instances. Fortunately, it's easy to recover the full paths using str.
>>> import fitz
>>> for item in path.glob('**/*.pdf'):
...     doc = fitz.Document(str(item))
...
This line shows that the final pdf document has been loaded as a fitz document, ready for subsequent processing.
>>> doc
fitz.Document('C:\Quantarctica2\Software\QGIS-2.2-UserGuide-en.pdf')
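Applied to the .stl files from the question, the same pattern would look something like this (again a sketch; rootdir stands in for your real path):

from pathlib import Path
import trimesh

rootdir = Path('path')                   # placeholder, as in the question

for item in rootdir.glob('**/*.stl'):    # recursive, like os.walk
    mesh = trimesh.load(str(item))       # full path, passed as a string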

MetaData of downloaded zipped file

import zipfile
import urllib
from io import BytesIO

url = 'http://www.test.com/test.zip'
z = zipfile.ZipFile(BytesIO(urllib.urlopen(url).read()))
z.extractall(path='D:')
I am using the above code to download a zipped file from a URL, and it downloads and extracts all the files to a specified drive just fine.
Is there a way I can get the metadata of all the files extracted from z, for example
file names, file sizes, file extensions, etc.?
ZipFile objects actually have built-in tools for this that you can use without even extracting anything. infolist() returns a list of ZipInfo objects that you can read certain information out of, including the full file name and the uncompressed size.
import os
import zipfile
import urllib
from io import BytesIO

url = 'http://www.test.com/test.zip'
z = zipfile.ZipFile(BytesIO(urllib.urlopen(url).read()))

info = z.infolist()
data = []
for obj in info:
    name = os.path.splitext(obj.filename)
    data.append((name[0], name[1], obj.file_size))  # (name, extension, uncompressed size)
I also used os.path.splitext to separate out the file's name from its extension, since you asked for the file type separately from the name.
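If you need more than the name, extension, and size, ZipInfo objects expose a few more fields; a small sketch over the same z object, using attributes from the standard zipfile module:

for obj in z.infolist():
    print(obj.filename)        # full path inside the archive
    print(obj.file_size)       # uncompressed size in bytes
    print(obj.compress_size)   # compressed size in bytes
    print(obj.date_time)       # (year, month, day, hour, minute, second)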
I don't know of a built-in way to do that using the zipfile module; however, it is easily done using os.path:
import os

EXTRACT_PATH = "D:"

z = zipfile.ZipFile(BytesIO(urllib.urlopen(url).read()))
z.extractall(path=EXTRACT_PATH)

extracted_files = [os.path.join(EXTRACT_PATH, filename) for filename in z.namelist()]
for extracted_file in extracted_files:
    # All metadata operations here, such as:
    print(os.path.getsize(extracted_file))
