I have been using gdal from the command line to convert an asc file to a GeoJSON output. I can do this successfully:
gdal_polygonize.py input.asc -f "GeoJSON" output.json
Now I wish to use Python and follow this process for a range of files.
import gdal
import glob
for file in glob.glob("dir/*.asc"):
    new_name = file[:-4] + ".json"
    gdal.Polygonize(file, "-f", "GeoJSON", new_name)
However, for exactly the same file I get the following error: TypeError: in method 'Polygonize', argument 1 of type 'GDALRasterBandShadow *'
Why does the command line version work and the python version not?
The easiest way to find out what is wrong with your call to gdal.Polygonize is to check the documentation for that function, which you can find in the C algorithms API reference. Admittedly, GDAL's documentation isn't the most coherent or easy to navigate, and this is doubly true for the mapping of the C API to Python.
GDALPolygonize
GDALPolygonize (GDALRasterBandH  hSrcBand,
                GDALRasterBandH  hMaskBand,
                OGRLayerH        hOutLayer,
                int              iPixValField,
                char **          papszOptions,
                GDALProgressFunc pfnProgress,
                void *           pProgressArg
               )
You can see that the first two arguments are RasterBand types. The output type is an OGRLayer, and there are other (in this case unnecessary) options.
To use gdal.Polygonize() you will need to open your input file with gdal, get a raster band, and pass that into the function. Similarly, you will need to create a new geojson vector file, and pass its layer into the function.
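The steps just described can be sketched roughly as follows. This is a minimal sketch, not a drop-in replacement for gdal_polygonize.py: it assumes the GDAL Python bindings are importable as osgeo, that the first raster band is the one you want, and that the output has no spatial reference attached. The function names, the layer name "polygonized" and the field name "DN" are illustrative choices.

```python
import os


def geojson_name(raster_path):
    """Swap the raster's extension for .json, e.g. dem.asc -> dem.json."""
    return os.path.splitext(raster_path)[0] + ".json"


def polygonize_to_geojson(raster_path, out_path=None, field_name="DN"):
    # Imported here so the helper above stays usable without GDAL installed.
    from osgeo import gdal, ogr

    out_path = out_path or geojson_name(raster_path)

    src_ds = gdal.Open(raster_path)
    if src_ds is None:
        raise RuntimeError("could not open " + raster_path)
    band = src_ds.GetRasterBand(1)  # corresponds to hSrcBand

    driver = ogr.GetDriverByName("GeoJSON")
    dst_ds = driver.CreateDataSource(out_path)
    layer = dst_ds.CreateLayer("polygonized", srs=None)  # corresponds to hOutLayer
    layer.CreateField(ogr.FieldDefn(field_name, ogr.OFTInteger))

    # Arguments: source band, mask band (none), output layer, index of the
    # field that receives the pixel value, options, progress callback.
    gdal.Polygonize(band, None, layer, 0, [], callback=None)

    # Dropping the references flushes and closes the datasets.
    dst_ds = None
    src_ds = None
    return out_path
```

For a batch of files you would call polygonize_to_geojson(f) for each f in glob.glob("dir/*.asc").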
Using subprocess
As an alternative, you could employ python's subprocess module to call the same command-line program that you already know.
import subprocess
import glob
import os
for f in glob.glob("dir/*.asc"):  # avoid reusing the name "file"
    out_file = os.path.splitext(f)[0] + ".json"
    # glob already returns paths that include "dir/", so f can be passed as is
    cmdline = ["gdal_polygonize.py", f, "-f", "GeoJSON", out_file]
    subprocess.call(cmdline)
I want to insert about 250 images with their filename into a docx-file.
My test.py file:
from pathlib import Path
import docx
from docx.shared import Cm
filepath = r"C:\Users\Admin\Desktop\img"
document = docx.Document()
for file in Path(filepath).iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(Path(file).absolute(), width=Cm(15.0))
document.save('test.docx')
After debugging I got this error:
Exception has occurred: AttributeError
'WindowsPath' object has no attribute 'seek'
  File "C:\Users\Admin\Desktop\test.py", line 10, in <module>
    document.add_picture(Path(file).absolute(), width=Cm(15.0))
How can I avoid this error?
Have you tried using io.FileIO?
from io import FileIO
from pathlib import Path
import docx
from docx.shared import Cm
filepath = r"C:\Users\Admin\Desktop\img"
document = docx.Document()
for file in Path(filepath).iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(FileIO(Path(file).absolute(), "rb"), width=Cm(15.0))
document.save('test.docx')
I encountered the same error using PyPDF2 when passing a file path to PdfFileReader. When I wrapped the PDF file in FileIO like so FileIO(pdf_path, "rb") the error went away and I was able to process the file successfully.
You need to convert the Path object to a string before passing it to add_picture.
for file in Path(filepath).iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(str(Path(file).absolute()), width=Cm(15.0))
In my case, replacing '\' with '/' in the path did the trick, e.g. "C:/Users/Admin/Desktop/img"
(which I believe is probably what wrapping it in FileIO does, although wrapping it in FileIO did not work in my case).
You can also achieve that using
os.path.join(mydir, myfile)
as explained here https://stackoverflow.com/a/2953843/11126742
Simply cast the path object to string:
for file in Path(filepath).iterdir():
    path_str = str(Path(file).absolute())
    document.add_picture(path_str, width=Cm(15.0))
The problem with passing a WindowsPath object seems to be that document.add_picture does not know how to open a file from it; seek is a method of a file object.
The problem is (still) within python-docx as of the current version 0.8.11 (from 31/03/2022): the assumption is that if the argument is not a string, it must be a file-like object. This is an unfortunate limitation of the docx design, surely a holdover from pre-pathlib days, as Path objects have an open method that yields a file object directly, and would work as well as str if they weren't being filtered out with an is_string test.
So in order to work around it, you need to pass in a string. Fortunately, pathlib has good coverage for this. Change your loop to pass in the file name as a string. Also, you're already using Path, so skip the raw strings for your filepath:
filepath = Path("C:/Users/Admin/Desktop/img")
# filepath = Path(r"C:\Users\Admin\Desktop\img") # alternatively, gives same results
document = docx.Document()
for file in filepath.iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(file.as_posix(), width=Cm(15.0))
Additionally, if you want to scrub relative pathing, do not use absolute(); use resolve().
In this case, however, you know the files exist. You set up an absolute filepath, so the full path is guaranteed and there is no need for resolve() (or absolute()).
If instead your filepath was relative, you could resolve it once to avoid the overhead of handling each file that comes out of iterdir():
filepath = Path("Desktop/img")
# filepath = Path(r"Desktop\img") # alternatively, gives same results
document = docx.Document()
full_filepath = filepath.resolve() # to Path("C:/Users/Admin/Desktop/img")
# filepath = filepath.resolve() # is also safe
for file in full_filepath.iterdir():
    # paragraph = document.add_paragraph(Path(file).resolve().stem)
    document.add_picture(file.as_posix(), width=Cm(15.0))
But when it's not certain: resolve() will remove any '..' that may enter into your paths. Behavior on Windows can be unpredictable when the location doesn't exist, but as long as the file (including dirs) already exists, resolve() will give you a full, absolute path. If the file doesn't exist, then it will bizarrely only add a full path on Windows if there are relative steps like '..' in the path. On the other hand, absolute() never scrubs '..' but will always add a root path on Windows, so if you need to be sure, you could call absolute() first and then resolve(), and lastly as_posix() for a string: file.absolute().resolve().as_posix()
But be warned: absolute() was not documented before Python 3.11, so on older versions its behavior is not guaranteed and could change without warning.
As others have written, you can also use str(file). Note that on Windows str(file) uses backslashes while file.as_posix() uses forward slashes, so the two strings are only identical on POSIX systems; Windows accepts either form as a file path.
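How str() and as_posix() relate can be checked on any operating system with pathlib's pure path classes, which do no filesystem access. The sketch below is only an illustration; the file name photo.png and the POSIX directory are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\Users\Admin\Desktop\img") / "photo.png"
posix = PurePosixPath("/home/admin/img") / "photo.png"

# Windows flavour: str() keeps backslashes, as_posix() switches to forward slashes.
print(str(win))        # C:\Users\Admin\Desktop\img\photo.png
print(win.as_posix())  # C:/Users/Admin/Desktop/img/photo.png

# POSIX flavour: both spellings are the same string.
print(str(posix) == posix.as_posix())  # True
```

Either string form is a plain str, which is what add_picture expects when it is not given a file object.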
Is there a way of converting SCAD files to STL format efficiently in Python? I have around 3000 files to be converted to STL. Plus, there are some different formats.
I tried searching the internet for libraries but could not find a suitable one (I am using Windows). Does anyone have any idea?
You can run OpenSCAD from the command line (see its documentation)
and build each command in Python (example in Python 3):
from os import listdir
from subprocess import call
files = listdir('.')
for f in files:
    if f.endswith('.scad'):  # pick up every .scad file in the directory
        of = f[:-len('.scad')] + '.stl'  # name of the output .stl file
        call(["openscad", "-o", of, f])  # run the openscad command directly
In Python 3.5 and higher, subprocess.run() is the recommended replacement for subprocess.call().
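With subprocess.run() and pathlib, the same loop might look like the sketch below. It assumes an openscad executable is on the PATH (on Windows you may need to pass the full path to openscad.exe through the executable parameter); the function names are illustrative.

```python
import subprocess
from pathlib import Path


def openscad_command(scad_path, executable="openscad"):
    """Build the argument list for converting one .scad file to .stl."""
    scad_path = Path(scad_path)
    stl_path = scad_path.with_suffix(".stl")
    return [executable, "-o", str(stl_path), str(scad_path)]


def convert_directory(directory, executable="openscad"):
    """Convert every .scad file in a directory; return the files that failed."""
    failed = []
    for scad_path in sorted(Path(directory).glob("*.scad")):
        result = subprocess.run(openscad_command(scad_path, executable))
        if result.returncode != 0:
            failed.append(scad_path)
    return failed
```

Collecting failures instead of stopping at the first one is a design choice that suits a batch of a few thousand files: you can rerun only the ones that did not convert.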
I have a folder full of files and they don't have an extension. How can I check file types? I want to check the file type and change the filename accordingly. Let's assume a function filetype(x) returns a file type like png. I want to do this:
files = os.listdir(".")
for f in files:
    os.rename(f, f + "." + filetype(f))
How do I do this?
There are Python libraries that can recognize files based on their content (usually a header / magic number) and that don't rely on the file name or extension.
If you're addressing many different file types, you can use python-magic. That's just a Python binding for the well-established magic library. This has a good reputation and (small endorsement) in the limited use I've made of it, it has been solid.
There are also libraries for more specialized file types. For example, the Python standard library has the imghdr module that does the same thing just for image file types (note that imghdr was deprecated in Python 3.11 and removed in 3.13).
If you need dependency-free (pure Python) file type checking, see filetype.
The Python Magic library provides the functionality you need.
You can install the library with pip install python-magic and use it as follows:
>>> import magic
>>> magic.from_file('iceland.jpg')
'JPEG image data, JFIF standard 1.01'
>>> magic.from_file('iceland.jpg', mime=True)
'image/jpeg'
>>> magic.from_file('greenland.png')
'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'
>>> magic.from_file('greenland.png', mime=True)
'image/png'
The Python code in this case is calling to libmagic beneath the hood, which is the same library used by the *NIX file command. Thus, this does the same thing as the subprocess/shell-based answers, but without that overhead.
On unix and linux there is the file command to guess file types. There's even a windows port.
From the man page:
File tests each argument in an attempt to classify it. There are three
sets of tests, performed in this order: filesystem tests, magic number
tests, and language tests. The first test that succeeds causes the
file type to be printed.
You would need to run the file command with the subprocess module and then parse the results to figure out an extension.
edit: Ignore my answer. Use Chris Johnson's answer instead.
In the case of images, you can use the imghdr module.
>>> import imghdr
>>> imghdr.what('8e5d7e9d873e2a9db0e31f9dfc11cf47') # You can pass a file name or a file object as first param. See doc for optional 2nd param.
'png'
Python 2 imghdr doc
Python 3 imghdr doc
import subprocess as sub
# pass the command as a list; a single string would need shell=True
p = sub.Popen(['file', 'yourfile.txt'], stdout=sub.PIPE, stderr=sub.PIPE)
output, errors = p.communicate()
print(output)
As Steven pointed out, subprocess is the way to go. You can capture the command's output as shown above.
You can also install the official file binding for Python, a library called file-magic (it does not use ctypes, like python-magic).
It's available on PyPI as file-magic and on Debian as python-magic. For me this library is the best to use since it's available on PyPI and on Debian (and probably other distributions), making the process of deploying your software easier.
I've blogged about how to use it, also.
With the newer subprocess library, you can now use the following code (*nix only solution):
import subprocess
filename = 'your_file'
# pass the arguments as a list so file names with spaces need no quoting
cmd = ['file', '--mime-type', filename]
result = subprocess.check_output(cmd)
mime_type = result.split()[-1].decode()  # check_output returns bytes in Python 3
print(mime_type)
You can also use this code (pure Python, checking the first 3 bytes of the file header):
# body of a function; MEDIA_ROOT and pathfile come from the surrounding code
full_path = os.path.join(MEDIA_ROOT, pathfile)
try:
    with open(full_path, "rb") as fh:
        image_data = fh.read()
except IOError:
    return "Incorrect Request :( !!!"
header_byte = image_data[0:3].hex()  # bytes.hex() replaces Python 2's .encode("hex")
if header_byte == '474946':
    return "image/gif"
elif header_byte == '89504e':
    return "image/png"
elif header_byte == 'ffd8ff':
    return "image/jpeg"
else:
    return "binary file"
without any package install [and update version]
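The header-byte idea can be combined with the rename loop from the question. The following is a minimal sketch using only the standard library; the signature table covers just a handful of formats, and the names SIGNATURES, sniff_extension and add_extensions are made up for this example.

```python
import os

# Leading bytes ("magic numbers") of a few common formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
]


def sniff_extension(path):
    """Return an extension guessed from the file's first bytes, or None."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    return None


def add_extensions(folder):
    """Rename extensionless files in folder; return {old name: new name}."""
    renamed = {}
    for name in sorted(os.listdir(folder)):
        src = os.path.join(folder, name)
        # Skip directories and files that already have an extension.
        if not os.path.isfile(src) or os.path.splitext(name)[1]:
            continue
        ext = sniff_extension(src)
        if ext is not None:
            os.rename(src, src + ext)
            renamed[name] = name + ext
    return renamed
```

Files whose header is not in the table are left untouched, so the function is safe to rerun after extending SIGNATURES.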
This only works on Linux, but with the "sh" Python module you can simply call any shell command:
https://pypi.org/project/sh/
pip install sh
import sh
sh.file("/root/file")
Output:
/root/file: ASCII text
This code lists all files of a given type in a given folder, recursively:
import magic
import glob
from os.path import isfile
ROOT_DIR = 'backup'
WANTED_EXTENSION = 'sqlite'
for filename in glob.iglob(ROOT_DIR + '/**', recursive=True):
    if isfile(filename):
        extension = magic.from_file(filename, mime=True)
        if WANTED_EXTENSION in extension:
            print(filename)
https://gist.github.com/izmcm/6a5d6fa8d4ec65fd9851a1c06c8946ac
I have a python module that has a variety of data files, (a set of csv files representing curves) that need to be loaded at runtime. The csv module works very well
# curvefile = "ntc.10k.csv"
raw = csv.reader(open(curvefile, 'rb'), delimiter=',')
But if I import this module into another script, I need to find the full path to the data file.
/project
    /shared
        curve.py
        ntc.10k.csv
        ntc.2k5.csv
    /apps
        script.py
I want the script.py to just refer to the curves by basic filename, not with full paths. In the module code, I can use:
pkgutil.get_data("curve", "ntc.10k.csv")
which works very well at finding the file, but it returns the contents of the csv file already read in, whereas csv.reader requires the file handle itself. Is there any way to make these two modules play well together? They're both standard library modules, so I wasn't really expecting problems. I know I can start splitting the pkgutil binary file data, but then I might as well not be using the csv library.
I know I can just use this in the module code and forget about pkgutil, but it seems like pkgutil is really exactly what this is for.
this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, curvefile)
raw = csv.reader(open(DATA_PATH, "rb"))
I opened up the source code to get_data, and it is trivial to have it return the path to the file instead of the loaded file. This module should do the trick. Use the keyword as_string=True to return the file read into memory, or as_string=False, to return the path.
import os, sys
from pkgutil import get_loader
def get_data_smart(package, resource, as_string=True):
    """Rewrite of pkgutil.get_data() that actually lets the user determine if data should
    be returned read into memory (aka as_string=True) or just return the file path.
    """
    loader = get_loader(package)
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    mod = sys.modules.get(package) or loader.load_module(package)
    if mod is None or not hasattr(mod, '__file__'):
        return None
    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    if as_string:
        return loader.get_data(resource_name)
    else:
        return resource_name
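It's not ideal, especially for very large files, but you can use StringIO to turn a string into a file-like object with a read() method, which csv.reader can handle. In Python 3, get_data() returns bytes, so decode them first.
import csv
import pkgutil
from io import StringIO
csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
csvio = StringIO(csvdata.decode("utf-8"))  # assumes the file is UTF-8 encoded
raw = csv.reader(csvio)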
Over 10 years after the question was asked, I came here via Google and went down the rabbit hole of the other answers. Nowadays this seems to be more straightforward. Below is my implementation using the standard library's importlib.resources, which returns the filesystem path to the package's resource as a string. It should work with Python 3.7+, the version that introduced importlib.resources.
import importlib.resources
import os
def get_data_file_path(package: str, resource: str) -> str:
    """
    Returns the filesystem path of a resource marked as package
    data of a Python package installed.
    :param package: string of the Python package the resource is
        located in, e.g. "mypackage.module"
    :param resource: string of the filename of the resource (do not
        include directory names), e.g. "myfile.png"
    :return: string of the full (absolute) filesystem path to the
        resource if it exists.
    :raises ModuleNotFoundError: In case the package `package` is not found.
    :raises FileNotFoundError: In case the file in `resource` is not
        found in the package.
    """
    # Guard against non-existing files, or else importlib.resources.path
    # may raise a confusing TypeError.
    if not importlib.resources.is_resource(package, resource):
        raise FileNotFoundError(f"Python package '{package}' resource '{resource}' not found.")
    with importlib.resources.path(package, resource) as resource_path:
        return os.fspath(resource_path)
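On Python 3.9 and later, importlib.resources.files() offers another route that hands csv.reader an open file directly, which is what the question was after. This is a sketch under a few assumptions: the CSV files live inside an importable package (a directory with an __init__.py), they are UTF-8 encoded, and the function name read_package_csv is made up for this example.

```python
import csv
from importlib.resources import files


def read_package_csv(package, resource):
    """Return the rows of a CSV file bundled inside an importable package."""
    data_file = files(package).joinpath(resource)
    # newline="" is what the csv module recommends for files it reads.
    with data_file.open("r", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle))
```

A caller would then write something like rows = read_package_csv("shared", "ntc.10k.csv"), where "shared" stands for whatever package holds the curve files.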
Another way is to use json.loads() along with bytes.decode(). Because get_data() returns the data as bytes, you need to convert it to a string in order to process it as JSON:
import json
import pkgutil
data_file = pkgutil.get_data('test.testmodel', 'data/test_data.json')
length_data_file = len(json.loads(data_file.decode()))
I have a python web form with two options - File upload and textarea. I need to take the values from each and pass them to another command-line program. I can easily pass the file name with file upload options, but I am not sure how to pass the value of the textarea.
I think what I need to do is:
Generate a unique file name
Create a temporary file with that name in the working directory
Save the values passed from textarea into the temporary file
Execute the commandline program from inside my python module and pass it the name of the temporary file
I am not sure how to generate a unique file name. Can anybody give me some tips on how to generate a unique file name? Any algorithms, suggestions, and lines of code are appreciated.
Thanks for your concern
I didn't think your question was very clear, but if all you need is a unique file name...
import uuid
unique_filename = str(uuid.uuid4())
If you want to make temporary files in Python, there's a module called tempfile in Python's standard libraries. If you want to launch other programs to operate on the file, use tempfile.mkstemp() to create files, and os.fdopen() to access the file descriptors that mkstemp() gives you.
Incidentally, you say you're running commands from a Python program? You should almost certainly be using the subprocess module.
So you can quite merrily write code that looks like:
import subprocess
import tempfile
import os
(fd, filename) = tempfile.mkstemp()
try:
    tfile = os.fdopen(fd, "w")
    tfile.write("Hello, world!\n")
    tfile.close()
    subprocess.Popen(["/bin/cat", filename]).wait()
finally:
    os.remove(filename)
Running that, you should find that the cat command worked perfectly well, but the temporary file was deleted in the finally block. Be aware that you have to delete the temporary file that mkstemp() returns yourself - the library has no way of knowing when you're done with it!
(Edit: I had presumed that NamedTemporaryFile did exactly what you're after, but that might not be so convenient - the file gets deleted immediately when the temp file object is closed, and having other processes open the file before you've closed it won't work on some platforms, notably Windows. Sorry, fail on my part.)
The uuid module would be a good choice. I prefer to use uuid.uuid4().hex as a random filename because it returns a hex string without dashes.
import uuid
filename = uuid.uuid4().hex
The output should look like this:
>>> import uuid
>>> uuid.uuid4()
UUID('20818854-3564-415c-9edc-9262fbb54c82')
>>> str(uuid.uuid4())
'f705a69a-8e98-442b-bd2e-9de010132dc4'
>>> uuid.uuid4().hex
'5ad02dfb08a04d889e3aa9545985e304' # <-- this one
Maybe you need a unique temporary file?
import tempfile
f = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
print(f.name)
f.close()
f is an open file object. delete=False means the file is not deleted when it is closed.
If you need control over the name of the file, there are optional prefix=... and suffix=... arguments that take strings. See https://docs.python.org/3/library/tempfile.html.
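Putting the pieces together for the original use case (save the textarea text to a temporary file, hand the file name to a command-line program, then clean up) might look like the sketch below. The function name run_with_temp_text and the "form-" prefix are illustrative, and the command is whatever program you need to run, given as an argument list.

```python
import os
import subprocess
import tempfile


def run_with_temp_text(text, command, suffix=".txt"):
    """Write text to a temp file, run command + [file name], return its stdout."""
    # delete=False keeps the file on disk after it is closed, so the other
    # program can open it by name (this also works on Windows).
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=suffix, prefix="form-", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(text)
        tmp_name = tmp.name
    try:
        completed = subprocess.run(
            command + [tmp_name], capture_output=True, text=True, check=True
        )
        return completed.stdout
    finally:
        os.remove(tmp_name)  # we are responsible for deleting the file
```

Closing the file before launching the other program avoids the platform-specific problems with opening a temporary file twice.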
You can use the datetime module:
import datetime
now = datetime.datetime.now()  # take one timestamp so the date and time parts agree
uniq_filename = str(now.date()) + '_' + str(now.time()).replace(':', '.')
Note that I am using replace since colons are not allowed in filenames on many operating systems.
That's it. This gives you a unique filename as long as no two names are generated at the same instant (the timestamp has microsecond resolution).
In case you need short unique IDs as your filename, try shortuuid. It uses lowercase and uppercase letters and digits, and leaves out similar-looking characters such as l, 1, I, O and 0.
>>> import shortuuid
>>> ui = shortuuid.uuid()
>>> ui
'Tw8VgM47kSS5iX2m8NExNa'
>>> len(ui)
22
compared to
>>> import uuid
>>> unique_filename = str(uuid.uuid4())
>>> len(unique_filename)
36
>>> unique_filename
'2d303ad1-79a1-4c1a-81f3-beea761b5fdf'
I came across this question, and I will add my solution for those who may be looking for something similar. My approach was just to make a random file name from ascii characters. It will be unique with a good probability.
from random import sample
from string import digits, ascii_uppercase, ascii_lowercase
from tempfile import gettempdir
from os import path
def rand_fname(suffix, length=8):
    chars = ascii_lowercase + ascii_uppercase + digits
    fname = path.join(gettempdir(), 'tmp-'
                      + ''.join(sample(chars, length)) + suffix)
    return fname if not path.exists(fname) \
        else rand_fname(suffix, length)
This can be done using the unique function in ufp.path module.
import ufp.path
ufp.path.unique('./test.ext')
If a 'test.ext' file already exists in the current path, the ufp.path.unique function returns './test (d1).ext'.
To create a unique file path when the file already exists, use the random package to generate a new name for the file. You can refer to the code below.
import os
import random
import string
def getUniquePath(folder, filename):
    path = os.path.join(folder, filename)
    # splitext only splits off the final extension, so dots elsewhere in the path are safe
    root, ext = os.path.splitext(path)
    while os.path.exists(path):
        path = root + ''.join(random.choice(string.ascii_lowercase) for i in range(10)) + ext
    return path
Now you can use this path to create file accordingly.
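Checking that a path is free and creating the file later leaves a small window in which another process can take the same name. One way to close that window is to create the file in exclusive mode and retry on a collision. The sketch below is one possible approach, with the function name create_unique_file and the retry count chosen for illustration.

```python
import os
import uuid


def create_unique_file(folder, suffix=".txt", attempts=10):
    """Create a new, empty, uniquely named file in folder and return its path."""
    for _ in range(attempts):
        path = os.path.join(folder, uuid.uuid4().hex + suffix)
        try:
            # Mode "x" creates the file and fails if it already exists,
            # so the existence check and the creation are a single step.
            with open(path, "x"):
                pass
        except FileExistsError:
            continue  # name was taken, try another one
        return path
    raise RuntimeError("could not create a unique file")
```

Because the file exists by the time the function returns, a second caller can never be handed the same path.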