With Python I want to transform Joomla ini language files to SQL. However, the Joomla ini files are missing any section header (for example: [translations]).
RawConfigParser almost does the job, but it demands a section, so I construct a temp file with a 'dummy' section named [ALL]:
fout = tempfile.NamedTemporaryFile(delete=True)
fin = file(self._inFilename, "r")
fout.write("[ALL]\n")
for f in fin.read():
    fout.write(f)
config = ConfigParser.RawConfigParser(allow_no_value=True)
config.read(fout.name)
for c in config.items("ALL"):
    self._ini2sql(unicode(c[0]).upper(), unicode('de'), unicode(c[1][1:-1]))
However... this is definitely not the most elegant solution. Any tips to make this more Pythonic?
You could use a StringIO instead of creating an actual file:
from cStringIO import StringIO
import shutil

data = StringIO()
data.write('[ALL]\n')
with open(self._inFilename, 'r') as f:
    shutil.copyfileobj(f, data)
data.seek(0)
config.readfp(data)  # config as constructed in your code
You can use StringIO instead, which keeps the content in RAM:
import cStringIO

fout = cStringIO.StringIO()
fout.write("[ALL]\n")
with open(self._inFilename) as fobj:
    fout.write(fobj.read())
fout.seek(0)

config = ConfigParser.RawConfigParser(allow_no_value=True)
config.readfp(fout)
Please note that there are a couple of improvements here compared to your code which are worth learning:
Always close a file safely. This is what the with statement does.
You are iterating over every single character of the input and writing them one at a time. This is unnecessary and a serious performance drawback.
As an alternative to ConfigParser I would really recommend the configobj library, which has a much cleaner and more pythonic API (and does not require a default section). Example:
from configobj import ConfigObj
config = ConfigObj('myConfigFile.ini')
config.get('key1')
config.get('key2')
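ConfigObj instances also behave like dictionaries, so plain indexing works as well, and changes can be written back to the file. A brief illustration, assuming the same myConfigFile.ini as above:
config['key1']                # dict-style access
config['key3'] = 'new value'  # add or change a value
config.write()                # save the changes back to myConfigFile.ini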
Reading a .ini file in the current directory:
import configparser
import os

ini_file = configparser.ConfigParser()
ini_file_path = os.path.join(os.path.dirname(__file__), "filename.ini")
ini_file.read(ini_file_path)  # ini_file can now be used like a dict of sections
print(ini_file["key1"])  # looks up the *section* named "key1"; use ini_file["key1"]["option"] for a single value
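On Python 3 the temp-file / StringIO workaround from the earlier answers can also be avoided entirely: configparser.read_string() accepts a string, so the dummy section header can simply be prepended to the file contents. A small sketch, assuming a sectionless Joomla file named language.ini (a placeholder name):
import configparser

with open("language.ini", encoding="utf-8") as f:
    text = f.read()

config = configparser.ConfigParser(allow_no_value=True)
config.read_string("[ALL]\n" + text)  # prepend the dummy section
for key, value in config.items("ALL"):
    print(key, value)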
Related
An existing Python package requires a file path as an input parameter for a method, so that it can parse the file from that path. I want to use this very specific Python package in a cloud environment where I can't write files to the hard drive. I don't have direct control over the code in the existing Python package, and it's not easy to switch to another environment where I could write files to the hard drive. So I'm looking for a solution that writes a file to an in-memory file path and lets the parser read directly from that path. Is this possible in Python? Or are there any other solutions?
Example Python code that works by using the hard drive, which should be changed so that no hard drive is used:
temp_filepath = "./temp.txt"
with open(temp_filepath, "wb") as file:
    file.write(b"some binary data")
model = Model()
model.parse(temp_filepath)
Example Python code that uses a memory filesystem to store the file, but which does not let the parser read the file from the memory filesystem:
from fs import open_fs
temp_filepath = "./temp.txt"
with open_fs('osfs://~/') as home_fs:
    home_fs.writetext(temp_filepath, "some binary data")
model = Model()
model.parse(temp_filepath)
You're probably looking for StringIO or BytesIO from io:
import io

with io.BytesIO() as tmp:
    tmp.write(content)
    # to continue working, rewind the file pointer
    tmp.seek(0)
    # work with tmp
pathlib may also be helpful here.
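If the package's parse method happens to accept a file-like object as well as a path (worth checking, since the question only guarantees a path-based API), the buffer can be handed over directly. A minimal sketch, reusing the hypothetical Model from the question:
import io

data = io.BytesIO()
data.write(b"some binary data")
data.seek(0)          # rewind before handing the buffer to the parser

model = Model()
model.parse(data)     # only works if parse() accepts file-like objects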
I would like to create a temporary file with a specific name (and, if possible, with a specific extension).
Example:
- mytempfile.txt
- mytempfile2.xml
I've been reading about the tempfile library, but as far as I know I can only set the following parameters:
(mode='w+b', buffering=None, encoding=None, newline=None, suffix=None, prefix=None, dir=None)
The most secure way to do what you are asking for is the following. As Dan points out, there is no need to specify any name for the file; I am only using suffix and prefix because the OP asked for them in the question.
import os
import tempfile as tfile

fd, path = tfile.mkstemp(suffix=".txt", prefix="abc")  # can use anything
try:
    with os.fdopen(fd, 'w') as tmpo:
        # do stuff with temp file
        tmpo.write('something here')
finally:
    os.remove(path)
To understand more about the security aspects of this approach, you can refer to this link.
If you can't use os and still need to perform these actions, then consider using the following code.
import tempfile as tfile
temp_file=tfile.NamedTemporaryFile(mode="w",suffix=".xml",prefix="myname")
a=temp_file.name
temp_file.write("12")
temp_file.close()
a will give you the complete path to the file, e.g.:
'/tmp/mynamesomething.xml'
In case you don't want the file to be deleted at the end, use:
temp_file=tfile.NamedTemporaryFile(delete=False) #along with other parameters of course.
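A small sketch of that variant end to end, reusing the names from the snippet above; because of delete=False the file survives close() and has to be removed manually:
import os
import tempfile as tfile

temp_file = tfile.NamedTemporaryFile(mode="w", suffix=".xml",
                                     prefix="myname", delete=False)
try:
    temp_file.write("12")
    temp_file.close()
    print(temp_file.name)   # e.g. /tmp/mynamesomething.xml
finally:
    os.remove(temp_file.name)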
I have a working python program that reads in a number of large netCDF files using the Dataset command from the netCDF4 module. Here is a snippet of the relevant parts:
from netCDF4 import Dataset
import glob
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*')):
    ncin = Dataset(infile, 'r')
    ncin.close()
I want to modify this to read in netCDF files that are gzipped. The files themselves were gzipped after creation; they are not internally compressed (i.e., the files are *.nc.gz). If I were reading in gzipped text files, the command would be:
from netCDF4 import Dataset
import glob
import gzip
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*.gz')):
    f = gzip.open(infile, 'rb')
    file_content = f.read()
    f.close()
After googling around for maybe half an hour and reading through the netCDF4 documentation, the only way I can come up with to do this for netCDF files is:
from netCDF4 import Dataset
import glob
import os
infile_root = 'start_of_file_name_'
for infile in sorted(glob.iglob(infile_root + '*.gz')):
    os.system('gzip -d ' + infile)
    ncin = Dataset(infile[:-3], 'r')
    ncin.close()
    os.system('gzip ' + infile[:-3])
Is it possible to read gzip files with the Dataset command directly? Or without otherwise calling gzip through os?
Reading datasets from memory is supported since netCDF4-1.2.8 (Changelog):
import netCDF4
import gzip
with gzip.open('test.nc.gz') as gz:
    with netCDF4.Dataset('dummy', mode='r', memory=gz.read()) as nc:
        print(nc.variables)
See the description of the memory parameter in the Dataset documentation.
Because netCDF4-python wraps the C netCDF4 library, you're out of luck as far as using the gzip module to pass in a file-like object. The only option, as suggested by @tdelaney, is to use gzip to extract to a temporary file.
If you happen to have any control over the creation of these files, NetCDF version 4 files support zlib compression internally, so using gzip is superfluous. It might also be worth converting the files from version 3 to version 4 if you need to repeatedly process these files.
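A rough, untested sketch of such a conversion with netCDF4-python, copying everything into a new NETCDF4 file with internal zlib compression turned on (the file names are placeholders):
import netCDF4

with netCDF4.Dataset('input_v3.nc', 'r') as src, \
     netCDF4.Dataset('output_v4.nc', 'w', format='NETCDF4') as dst:
    # copy global attributes
    dst.setncatts({att: src.getncattr(att) for att in src.ncattrs()})
    # copy dimensions, keeping unlimited dimensions unlimited
    for name, dim in src.dimensions.items():
        dst.createDimension(name, None if dim.isunlimited() else len(dim))
    # copy variables with zlib compression enabled
    for name, var in src.variables.items():
        out = dst.createVariable(name, var.datatype, var.dimensions,
                                 zlib=True, complevel=4)
        out.setncatts({att: var.getncattr(att) for att in var.ncattrs()
                       if att != '_FillValue'})
        out[:] = var[:]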
Since I just had to solve the same problem, here is a ready-made solution:
import gzip
import os
import shutil
import tempfile
import netCDF4

def open_netcdf(fname):
    if fname.endswith(".gz"):
        infile = gzip.open(fname, 'rb')
        tmp = tempfile.NamedTemporaryFile(delete=False)
        shutil.copyfileobj(infile, tmp)
        infile.close()
        tmp.close()
        data = netCDF4.Dataset(tmp.name)
        os.unlink(tmp.name)
    else:
        data = netCDF4.Dataset(fname)
    return data
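For example, with the same glob pattern as in the question, the helper could be used like this:
import glob

for infile in sorted(glob.iglob('start_of_file_name_*')):
    ncin = open_netcdf(infile)
    # ... work with ncin ...
    ncin.close()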
I have a problem with compression in Python.
I know I should call the ZIP_DEFLATED method when writing to make the zip file compressed, but it does not work for me.
I have 3 PDF documents in the C:\zip directory.
When I run the following code it works just fine:
import os, sys
from zipfile import ZipFile, ZIP_DEFLATED

list = os.listdir(r'C:\zip')
file = ZipFile('test.zip', 'w')
for item in list:
    file.write(item)
file.close()
It makes the test.zip file without the compression.
When I change the fourth row to this:
file = ZipFile('test.zip','w', compression = ZIP_DEFLATED)
It also makes the test.zip file without the compression.
I also tried to change the write method to give it the compress_ type argument:
file.write(item, compress_type = ZIP_DEFLATED)
But that doesn't work either.
I use Python version 2.7.4 with Win7.
I tried the code on another computer (same circumstances, Win7 and Python 2.7.4), and it made the test.zip file compressed just like it should.
I know the zlib module should be available; when I run this:
import zlib
it doesn't return an error. Also, if there were something wrong with the zlib module, the code at the top should have returned an error too, so I suspect that zlib isn't the problem.
By default the zipfile module only stores data; to compress it you can do this:
import zipfile
try:
    import zlib
    mode = zipfile.ZIP_DEFLATED
except ImportError:
    mode = zipfile.ZIP_STORED

zip = zipfile.ZipFile('zipfilename', 'w', mode)
zip.write(item)
zip.close()
In case you get here as I did, I'll add something.
If you use ZipInfo objects, they always override the compression method specified when creating the ZipFile, which then has no effect.
So either you set their compression method (it is not a parameter on the constructor, you must set the attribute) or specify the compression method when calling write (or writestr), as in the snippet below; a sketch of the attribute variant follows it.
import io
import zlib
from zipfile import ZipFile, ZipInfo, ZIP_DEFLATED

def write_things():
    zip_buffer = io.BytesIO()
    with ZipFile(file=zip_buffer, mode="w", compression=ZIP_DEFLATED) as zipper:
        # Get some data to write
        fname, content, zip_ts = get_file_data()
        file_object = ZipInfo(fname, zip_ts)
        zipper.writestr(file_object, content)  # Surprise, no compression
        # This is required to get compression
        # zipper.writestr(file_object, content, compress_type=ZIP_DEFLATED)
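The other option mentioned above, setting the attribute on the ZipInfo itself, would look roughly like this (a sketch reusing the names from the snippet; get_file_data is still a hypothetical helper):
file_object = ZipInfo(fname, zip_ts)
file_object.compress_type = ZIP_DEFLATED   # set the attribute; it is not a constructor parameter
zipper.writestr(file_object, content)      # compressed this time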
If I have 1000+ PDF files that need to be merged into one PDF:
from PyPDF2 import PdfReader, PdfWriter

writer = PdfWriter()
for i in range(1000):
    filepath = f"my/pdfs/{i}.pdf"
    reader = PdfReader(open(filepath, "rb"))
    for page in reader.pages:
        writer.add_page(page)

with open("document-output.pdf", "wb") as fh:
    writer.write(fh)
When I execute the above code and it reaches reader = PdfReader(open(filepath, "rb")), I get an error message:
IOError: [Errno 24] Too many open files:
I think this is a bug. If not, what should I do?
I recently came across this exact same problem, so I dug into PyPDF2 to see what's going on, and how to resolve it.
Note: I am assuming that filename is a well-formed file path string. Assume the same for all of my code
The Short Answer
Use the PdfFileMerger() class instead of the PdfFileWriter() class. I've tried to make the following resemble your code as closely as I could:
from PyPDF2 import PdfFileMerger, PdfFileReader

[...]

merger = PdfFileMerger()

for filename in filenames:
    merger.append(PdfFileReader(file(filename, 'rb')))

merger.write("document-output.pdf")
The Long Answer
The way you're using PdfFileReader and PdfFileWriter keeps each file open, and eventually causes Python to raise IOError 24. To be more specific, when you add a page to the PdfFileWriter, you are adding references to the page in the open PdfFileReader (hence the noted IOError if you close the file). Python detects that the file is still referenced and doesn't do any garbage collection / automatic file closing despite re-using the file handle. The files remain open until PdfFileWriter no longer needs access to them, which is at writer.write(fh) in your code.
To solve this, create copies in memory of the content, and allow the file to be closed. I noticed in my adventures through the PyPDF2 code that the PdfFileMerger() class already has this functionality, so instead of re-inventing the wheel, I opted to use it instead. I learned, though, that my initial look at PdfFileMerger wasn't close enough, and that it only created copies in certain conditions.
My initial attempts looked like the following and resulted in the same IO problems:
merger = PdfFileMerger()

for filename in filenames:
    merger.append(filename)

merger.write(output_file_path)
Looking at the PyPDF2 source code, we see that append() requires fileobj to be passed, and then uses the merge() function, passing in its last page as the new file's position. merge() does the following with fileobj (before opening it with PdfFileReader(fileobj)):
if type(fileobj) in (str, unicode):
    fileobj = file(fileobj, 'rb')
    my_file = True
elif type(fileobj) == file:
    fileobj.seek(0)
    filecontent = fileobj.read()
    fileobj = StringIO(filecontent)
    my_file = True
elif type(fileobj) == PdfFileReader:
    orig_tell = fileobj.stream.tell()
    fileobj.stream.seek(0)
    filecontent = StringIO(fileobj.stream.read())
    fileobj.stream.seek(orig_tell)
    fileobj = filecontent
    my_file = True
We can see that the append() option does accept a string, and when doing so, assumes it's a file path and creates a file object at that location. The end result is the exact same thing we're trying to avoid: a PdfFileReader() object holding a file open until the file is eventually written!
However, if we either make a file object of the file path string or a PdfFileReader (see Edit 2) object of the path string before it gets passed into append(), it will automatically create a copy for us as a StringIO object, allowing Python to close the file.
I would recommend the simpler merger.append(file(filename, 'rb')), as others have reported that a PdfFileReader object may stay open in memory, even after calling writer.close().
Hope this helped!
EDIT: I assumed you were using PyPDF2, not PyPDF. If you aren't, I highly recommend switching, as PyPDF is no longer maintained with the author giving his official blessings to Phaseit in developing PyPDF2.
If for some reason you cannot swap to PyPDF2 (licensing, system restrictions, etc.) then PdfFileMerger won't be available to you. In that situation you can re-use the code from PyPDF2's merge function (provided above) to create a copy of the file as a StringIO object, and use that in your code in place of the file object.
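A rough sketch of that fallback, in the same Python 2 idiom as the rest of this answer (filenames is assumed to be a list of paths, and the page-copying calls are the old pyPdf API):
from StringIO import StringIO
from pyPdf import PdfFileReader, PdfFileWriter

output = PdfFileWriter()
for filename in filenames:
    with open(filename, 'rb') as f:
        buf = StringIO(f.read())   # in-memory copy, so the real file can be closed here
    reader = PdfFileReader(buf)
    for page_num in range(reader.getNumPages()):
        output.addPage(reader.getPage(page_num))

with open('document-output.pdf', 'wb') as out:
    output.write(out)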
EDIT 2: Previous recommendation of using merger.append(PdfFileReader(file(filename, 'rb'))) changed based on comments (Thanks #Agostino).
The pdfrw package reads each file all in one go, so will not suffer from the problem of too many open files. Here is an example concatenation script.
The relevant part -- assumes inputs is a list of input filenames, and outfn is an output file name:
from pdfrw import PdfReader, PdfWriter

writer = PdfWriter()
for inpfn in inputs:
    writer.addpages(PdfReader(inpfn).pages)
writer.write(outfn)
Disclaimer: I am the primary pdfrw author.
The problem is that you are only allowed to have a certain number of files open at any given time. There are ways to change this (http://docs.python.org/3/library/resource.html#resource.getrlimit), but I don't think you need this.
What you could try is closing the files in the for loop:
output = PdfFileWriter()

for file in filenames:
    f = open(file, 'rb')
    input = PdfFileReader(f)
    # Some code
    f.close()
I have written this code to help with the answer:
import sys
import os
import PyPDF2

merger = PyPDF2.PdfFileMerger()

# get the PDF files and the path
path = sys.argv[1]
pdfs = sys.argv[2:]
os.chdir(path)

# iterate over the documents
for pdf in pdfs:
    try:
        # if the doc exists then merge it
        if os.path.exists(pdf):
            input = PyPDF2.PdfFileReader(open(pdf, 'rb'))
            merger.append((input))
        else:
            print(f"problem with file {pdf}")
    except:
        print("can't merge !! sorry")
    else:
        print(f" {pdf} Merged !!! ")

merger.write("Merged_doc.pdf")
In this, I have used PyPDF2.PdfFileMerger and PyPDF2.PdfFileReader, instead of explicitly converting the file names to file objects.
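For example, if the script above were saved as merge_pdfs.py (a name I'm assuming), it could be invoked as:
python merge_pdfs.py /path/to/pdfs doc1.pdf doc2.pdf doc3.pdf
which changes into /path/to/pdfs, merges the listed documents in order, and writes Merged_doc.pdf there.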
It may be just what it says: you are opening too many files.
You can explicitly use f = file(filename) ... f.close() in the loop, or use the with statement, so that each opened file is properly closed.