Python docx AttributeError: 'WindowsPath' object has no attribute 'seek' - python

I want to insert about 250 images with their filename into a docx-file.
My test.py file:
from pathlib import Path
import docx
from docx.shared import Cm
filepath = r"C:\Users\Admin\Desktop\img"
document = docx.Document()
for file in Path(filepath).iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(Path(file).absolute(), width=Cm(15.0))
document.save('test.docx')
After debugging I got this error:
Exception has occurred: AttributeError
'WindowsPath' object has no attribute 'seek'
File "C:\Users\Admin\Desktop\test.py", line 10, in <module>
document.add_picture(Path(file).absolute(), width=Cm(15.0))
How can I avoid this error?

Have you tried using io.FileIO?
from io import FileIO
from pathlib import Path
import docx
from docx.shared import Cm
filepath = r"C:\Users\Admin\Desktop\img"
document = docx.Document()
for file in Path(filepath).iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(FileIO(Path(file).absolute(), "rb"), width=Cm(15.0))
document.save('test.docx')
I encountered the same error using PyPDF2 when passing a file path to PdfFileReader. When I wrapped the PDF file in FileIO, like so: FileIO(pdf_path, "rb"), the error went away and I was able to process the file successfully.

You need to convert the Path object to a string before passing it to add_picture. Wrapping it in Path again still yields a Path, so convert the result with str:
for file in Path(filepath).iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(str(Path(file).absolute()), width=Cm(15.0))

In my case, replacing each '\' with '/' in the path did the trick. Ex: "C:/Users/Admin/Desktop/img"
(I believe this is probably what wrapping it in FileIO does, but in my case wrapping it didn't work.)
You can also achieve that using
os.path.join(mydir, myfile)
as explained here https://stackoverflow.com/a/2953843/11126742

Simply cast the path object to string:
for file in Path(filepath).iterdir():
    path_str = str(Path(file).absolute())
    document.add_picture(path_str, width=Cm(15.0))
The problem with passing a WindowsPath object seems to be that document.add_picture does not know how to use it to open a file. It accepts either a path string or an open file object, and seek is a method of file objects, which is why the error mentions it.
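To make the mechanism concrete, here is a minimal sketch of how such an error can arise. The function load_image below is hypothetical and is not python-docx's actual code; it only assumes the "a string is a path, anything else is a stream" style of check described in this thread.

```python
from pathlib import Path


def load_image(image_or_stream):
    """Hypothetical loader: a str is treated as a path, anything else as a stream."""
    if isinstance(image_or_stream, str):
        with open(image_or_stream, "rb") as f:
            return f.read()
    # Anything that is not a str is assumed to be an open binary stream.
    image_or_stream.seek(0)
    return image_or_stream.read()


try:
    # A Path is not a str, so it falls through to the stream branch.
    load_image(Path("picture.png"))
    error_message = ""
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

Passing str(path) instead takes the first branch, which is why casting to a string avoids the error.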

The problem is within python-docx (still) as of the current version 0.8.11 (from 31/03/2022): it assumes that anything which is not a string must be a file-like object. This is an unfortunate limitation of docx's design, surely a holdover from pre-pathlib days. Path objects have an open method, so they could be handled as easily as str if they weren't being filtered out by an is_string test.
So in order to work around it, you need to pass in a string. Fortunately, pathlib has good coverage for this. Change your loop to pass in the file name as a string. Also, you're already using Path, so skip the raw strings for your filepath:
filepath = Path("C:/Users/Admin/Desktop/img")
# filepath = Path(r"C:\Users\Admin\Desktop\img") # alternatively, gives same results
document = docx.Document()
for file in filepath.iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(file.as_posix(), width=Cm(15.0))
Additionally, if you want to scrub relative pathing, use resolve() rather than absolute().
In this case, however, you know the files exist. You set up an absolute filepath, so the full path is guaranteed and there is no need for resolve() (or absolute()).
If instead your filepath was relative, you could resolve it once to avoid the overhead of resolving each file that comes out of iterdir():
filepath = Path("Desktop/img")
# filepath = Path(r"Desktop\img") # alternatively, gives same results
document = docx.Document()
full_filepath = filepath.resolve() # to Path("C:/Users/Admin/Desktop/img")
# filepath = filepath.resolve() # is also safe
for file in full_filepath.iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(file.as_posix(), width=Cm(15.0))
But when it's not certain: resolve() will remove any '..' that may enter into your paths. Behavior on Windows can be unpredictable when the location doesn't exist, but as long as the file (including dirs) already exists, resolve() will give you a full, absolute path. If the file doesn't exist, then it will bizarrely only add a full path on Windows if there are relative steps like '..' in the path. On the other hand, absolute() never scrubs '..' but will always add a root path on Windows, so if you need to be sure, you could call absolute() first and then resolve(), and lastly as_posix() for a string: file.absolute().resolve().as_posix()
But be warned: absolute() is not documented, so its behavior could change or be removed without warning.
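As others have written, you can also use str(file). On Windows, str(file) returns the path with backslashes while file.as_posix() returns it with forward slashes, so the two strings are not equal there, but both name the same file and either one works as a path string.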
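A quick way to compare the two string forms is PureWindowsPath, which models Windows path behaviour on any platform. This is only an illustrative sketch; the file name photo.png is a made-up example and not from the question.

```python
from pathlib import PureWindowsPath

# PureWindowsPath applies Windows path rules without touching the file system.
p = PureWindowsPath(r"C:\Users\Admin\Desktop\img") / "photo.png"

as_text = str(p)         # native Windows form, backslash separators
as_posix = p.as_posix()  # same path with forward slashes

print(as_text)
print(as_posix)
```

Both strings are plain str objects, so either can be handed to an API that only accepts path strings.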

Related

Accommodating variable filename for python script

I have to read multiple files which I will be treating as input for my Python script. But the input files may have variable names depending on the time they were generated.
File1: RM_Sales_Japan_2011201920191124194200.xlsx
File2: RM_Volume_Australia_201120192019154321194200.xlsx
How can I accommodate these changes while reading a file, instead of specifying the exact filename every time we run the script?
Things I tried:
I have used the method below in my previous scripts because they had only one file with a known extension:
xlsxfile = "*.xlsx"
filelocation = "/user/script/" + xlsxfile
But with multiple files with the same extension, I am not sure how to define this.
EDIT1:
I was trying to get more clarity on using glob with read_excel. Please see my example code below:
import os
import glob
import pandas as pd
os.chdir ('D:\\Users\\RMoharir\\Downloads\\Smart Spend\\Input')
fls=glob.glob("Medical*.*")
df1 = pd.read_excel(fls, parse_cols = 'A:H', skiprows = 10, header = None)
But this gives me an error:
ValueError: Invalid file path or buffer object type: <class 'list'>
Any help is appreciated.
If you simply need to find all the files that match a given pattern in a directory, os and re modules have you covered.
import os
import re
files = os.listdir()
for file in files:
    if re.match(r".*\.xlsx$", file):
        print(file)
This short program prints every file in the current directory whose name ends with .xlsx. If you need to match a more complicated pattern, you may need to read up on regular expressions.
Note that os.listdir takes an optional path argument; if it is not given, it looks in the current working directory, which is normally the directory the program was run from.
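As an alternative to a regular expression, a glob pattern can select the files by their fixed prefix. The sketch below is only an illustration: the helper name find_reports and the pattern "RM_*.xlsx" are assumptions based on the two example filenames in the question, and the demonstration uses empty placeholder files in a temporary directory.

```python
import tempfile
from pathlib import Path


def find_reports(folder, pattern="RM_*.xlsx"):
    """Return the files in folder whose names match pattern, sorted by name."""
    return sorted(Path(folder).glob(pattern))


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "RM_Sales_Japan_2011201920191124194200.xlsx",
        "RM_Volume_Australia_201120192019154321194200.xlsx",
        "notes.txt",
    ):
        (Path(tmp) / name).touch()
    matched_names = [p.name for p in find_reports(tmp)]

print(matched_names)
```

The ValueError in the question says that read_excel was given a list, so each matched path would be passed to pd.read_excel one at a time inside a loop rather than passing the whole list.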

Is it possible to get the path of a tempfile in Python 3

I was wondering if it was possible to get the file path of a temporary file made using the tempfile library. Basically, I'm trying to make a function that intakes some data, and generates a temporary csv file based off of said data. I was wondering if there was a way to get the path of this temporary file?
Use tempfile.NamedTemporaryFile to create a temporary file with a name, and then use the .name attribute of the object.
Note that there are platform-specific limitations on how this name can be used. The documentation says:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
tempfile.NamedTemporaryFile has a .dir property which will give you what you want.
EDIT: No, it is not .name, @Barmar, but looking through the source code for tempfile, I don't see a .dir property either. However, you can use the .name property in conjunction with os.path's dirname function as follows:
with tempfile.NamedTemporaryFile(suffix='.csv', prefix=os.path.basename(__file__)) as tf:
    tf_directory = os.path.dirname(tf.name)
This works just fine to get the full path with the filename
file = tempfile.NamedTemporaryFile()
filename = file.name
The output:
/tmp/tmp_sl4v9ps
Anyway, if you need the path of the tempfile directory you can use tempfile.gettempdir()
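Putting the answers together for the use case in the question, a function that writes data to a temporary CSV file and returns its path could look roughly like this. The function name write_temp_csv and the sample rows are made up for illustration. The sketch assumes delete=False is acceptable, which sidesteps the platform limitation on reopening a file that is still open, at the cost of having to remove the file yourself.

```python
import csv
import os
import tempfile


def write_temp_csv(rows):
    """Write rows to a temporary CSV file and return its full path."""
    # delete=False keeps the file after it is closed, so the path stays
    # usable on every platform; the caller removes the file when finished.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".csv", newline="", delete=False
    ) as tf:
        csv.writer(tf).writerows(rows)
        return tf.name


csv_path = write_temp_csv([["id", "value"], [1, "a"]])

# The path can be reopened like any other file once the writer has closed it.
with open(csv_path, newline="") as f:
    rows_read = list(csv.reader(f))

os.remove(csv_path)  # clean up explicitly because delete=False was used
print(csv_path)
```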

Walking through directory path and opening them with trimesh

I have the following code:
import os
import trimesh
# Core settings
rootdir = 'path'
extension = ".zip"
for root, dirs, files in os.walk(rootdir):
    if not root.endswith(".zip"):
        for file in files:
            if file.endswith(".stl"):
                mesh = trimesh.load(file)
And I get the following error:
ValueError: File object passed as string that is not a file!
When I open the files one by one, however, it works. What could be the reason?
That's because file is only the filename, not the full file path.
Fix that by using os.path.join with the containing directory:
mesh = trimesh.load(os.path.join(root,file))
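The same fix can be checked without trimesh by collecting the joined paths first. This is a minimal sketch; the helper name find_files and the sample file names are made up for illustration, and the demonstration runs in a temporary directory.

```python
import os
import tempfile


def find_files(rootdir, suffix=".stl"):
    """Return the full paths of all files under rootdir ending with suffix."""
    found = []
    for root, dirs, files in os.walk(rootdir):
        for name in files:
            if name.endswith(suffix):
                # name is relative to root, so join them to get a usable path.
                found.append(os.path.join(root, name))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "parts"))
    for rel in ("a.stl", os.path.join("parts", "b.stl"), "readme.txt"):
        open(os.path.join(tmp, rel), "w").close()
    stl_files = [os.path.relpath(p, tmp) for p in find_files(tmp)]

print(stl_files)
```

Each path returned by find_files could then be passed to trimesh.load in place of the bare filename.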
This is not a direct answer to your question. However, you might be interested in noting that there is now a less complicated paradigm for this situation. It involves using the pathlib module.
I don't use trimesh. I will process pdf documents instead.
First, you can identify all of the pdf files in a directory and its subdirectories recursively with just a single line.
>>> from pathlib import Path
>>> path = Path('C:/Quantarctica2')
>>> for item in path.glob('**/*.pdf'):
... item
...
WindowsPath('C:/Quantarctica2/Quantarctica-Get_Started.pdf')
WindowsPath('C:/Quantarctica2/Quantarctica2_GetStarted.pdf')
WindowsPath('C:/Quantarctica2/Basemap/Terrain/BEDMAP2/tc-7-375-2013.pdf')
WindowsPath('C:/Quantarctica2/Scientific/Glaciology/ALBMAP/1st_ReadMe_ALBMAP_LeBrocq_2010_EarthSystSciData.pdf')
WindowsPath('C:/Quantarctica2/Scientific/Glaciology/ASAID/Bindschadler2011TC_GroundingLines.pdf')
WindowsPath('C:/Quantarctica2/Software/CIA_WorldFactbook_Antarctica.pdf')
WindowsPath('C:/Quantarctica2/Software/CIA_WorldFactbook_SouthernOcean.pdf')
WindowsPath('C:/Quantarctica2/Software/QGIS-2.2-UserGuide-en.pdf')
You will have noticed that (a) the complete paths are made available, and (b) the paths are available within object instances. Fortunately, it's easy to recover the full paths using str.
>>> import fitz
>>> for item in path.glob('**/*.pdf'):
... doc = fitz.Document(str(item))
...
This line shows that the final pdf document has been loaded as a fitz document, ready for subsequent processing.
>>> doc
fitz.Document('C:\Quantarctica2\Software\QGIS-2.2-UserGuide-en.pdf')

Move file to a new folder that happens to have the same file name in Python

In my script I very rarely run into this problem, but it just happened: I'm trying to move a file to a new folder that already contains a file with the same name. My current code uses the shutil.move method, but it errors out on the duplicate file name. I was hoping I could use a simple if statement to check whether the source is already in the destination and change the name slightly, but I can't get that to work either. I also read another post on here that used the distutils module for this issue, but that one gives me an attribute error. Any other ideas people may have for this?
I added some sample code below. There is already a file called 'file.txt' in the 'C:\data\new' directory. The error given is 'Destination path already exists'.
import shutil
myfile = r"C:\data\file.txt"
newpath = r"C:\data\new"
shutil.move(myfile, newpath)
You can just check that the file exists with os.path.exists and then remove it if it does.
import os
import shutil
myfile = r"C:\data\file.txt"
newpath = r"C:\data\new"
# Build the path the file would have in the new folder and check whether it exists.
check_existence = os.path.join(newpath, os.path.basename(myfile))
if os.path.exists(check_existence):
    os.remove(check_existence)
shutil.move(myfile, newpath)
In Python 3.4 and later you can try the pathlib module. This is just an example, so you can rewrite it to be more efficient or to use variables:
import pathlib
import shutil
myfile = r"C:\data\file.txt"
newpath = r"C:\data\new"
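# Check the destination file itself, not just the folder; building the path
# from the raw strings also avoids the "\n" escape in a plain "C:\data\new".
p = pathlib.Path(newpath) / pathlib.Path(myfile).name
if not p.exists():
    shutil.move(myfile, newpath)
# Use an else: here to handle your edge case.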
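If the existing file should be kept rather than removed, the "change the name slightly" idea from the question can be sketched as follows. The helper name move_without_overwrite and the "_1", "_2" suffix scheme are assumptions made for illustration, not something from the answers above; the demonstration runs in a temporary directory.

```python
import os
import shutil
import tempfile


def move_without_overwrite(src, dest_dir):
    """Move src into dest_dir, adding a numeric suffix if the name is taken."""
    stem, ext = os.path.splitext(os.path.basename(src))
    target = os.path.join(dest_dir, stem + ext)
    counter = 1
    # Try file_1.txt, file_2.txt, ... until an unused name is found.
    while os.path.exists(target):
        target = os.path.join(dest_dir, f"{stem}_{counter}{ext}")
        counter += 1
    shutil.move(src, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    new_dir = os.path.join(tmp, "new")
    os.makedirs(new_dir)
    src = os.path.join(tmp, "file.txt")
    # Create the source file and a clashing file in the destination folder.
    for path in (src, os.path.join(new_dir, "file.txt")):
        with open(path, "w") as f:
            f.write("data")
    moved_name = os.path.basename(move_without_overwrite(src, new_dir))
    remaining = sorted(os.listdir(new_dir))

print(moved_name)
```

Note that the existence check and the move are separate steps, so another process could create the same name in between; for a single script moving its own files this is usually acceptable.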

How to append a record in a file

I have tried to append a record on the next line in the file using the following code (please note that the file has been created already). But it does not insert any records at all; the file remains empty.
with open(utmppath+'/'+tmpfile, "a") as myfile:
    myfile.write(record+'\n')
myfile.close()
Any suggestion would be great. Thanks
Check additionally if you set your path correctly:
import os
path = utmppath+'/'+tmpfile
assert os.path.isfile(path), path
The assertion checks whether the file exists and raises an AssertionError if you used a wrong path. The path itself is included in the error message thanks to the variable after the comma.
I also recommend building paths with os.path.join and os.path.abspath. os.path.join concatenates path components correctly for you and os.path.abspath creates an absolute path.
path = os.path.join(utmppath, tmpfile)
Let's say the desired file is in the same directory as your script and is called your_output.txt. Then you can use this:
path = os.path.abspath(os.path.join(os.path.dirname(__file__), 'your_output.txt'))
By the way, __file__ gives you the path of your script file.
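For reference, appending with a joined path can be verified in isolation like this. It is a minimal sketch: the helper name append_record and the file name log.txt are made up, and a temporary directory stands in for utmppath.

```python
import os
import tempfile


def append_record(directory, filename, record):
    """Append record as a new line to directory/filename and return the path."""
    path = os.path.join(directory, filename)
    # Mode "a" creates the file if needed and always writes at the end.
    with open(path, "a") as f:
        f.write(record + "\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    append_record(tmp, "log.txt", "first")
    log_path = append_record(tmp, "log.txt", "second")
    with open(log_path) as f:
        lines = f.read().splitlines()

print(lines)
```

If this works but the original script still leaves the file empty, the path being written to is probably not the file being inspected, which is what the assertion above helps to detect.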
