Creating a unicode XML file in Python 2.7.1

I am trying to write some data out into a unicode XML file with the following statement:
filepath = 'G:\Kodi EPG\ChannelGuide.xml'
with open(filepath, "w", encoding = 'UTF-8') as xml_file:
    xml_file.write(file_blanker)
xml_file.close
...but am getting the following error:
Traceback (most recent call last):
  File "G:\Python27\Kodi\Sky TV Guide Scraper.py", line 35, in <module>
    class tv_guide:
  File "G:\Python27\Kodi\Sky TV Guide Scraper.py", line 47, in tv_guide
    with open(filepath, "w", encoding = 'UTF-8') as xml_file:
TypeError: 'encoding' is an invalid keyword argument for this function
I have seen this given as an accepted answer to a question here, but that was for Python 3.x. Is the syntax slightly different for version 2?
Thanks

Yes, the syntax differs in Python 2 with respect to the encoding argument.
Python2 open description:
open(name[, mode[, buffering]])
Python3 open description:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
As you can see, in Python 2.7 open doesn't accept the encoding argument, hence the TypeError.
However, you can use the io module (part of the standard library since Python 2.6) to open your files. io.open lets you specify the encoding and behaves like Python 3's built-in open, so the same code also works on Python 3. For example:
import io
filepath = r'G:\Kodi EPG\ChannelGuide.xml'
with io.open(filepath, "w", encoding = 'UTF-8') as xml_file:
    xml_file.write(file_blanker)
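Note that you don't have to close your files explicitly when using the with statement. Also note that in Python 2 a file opened with io.open in text mode only accepts unicode objects, so file_blanker must be unicode (for example a u'...' literal); writing a plain str raises a TypeError.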
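A minimal, self-contained sketch of the answer's approach, runnable on Python 2.6+ and 3. The temporary path and the sample XML content are assumptions standing in for the question's G:\ path and its file_blanker variable; the raw bytes are read back to confirm the non-ASCII character was stored as UTF-8.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the question's file_blanker. Under Python 2 it must be
# unicode (hence the u prefix), because io.open in text mode refuses plain str.
# The escaped e-acute keeps the source file itself pure ASCII.
file_blanker = u'<?xml version="1.0" encoding="UTF-8"?>\n<tv>caf\u00e9</tv>\n'

# Temporary path standing in for the Windows path from the question.
filepath = os.path.join(tempfile.mkdtemp(), 'ChannelGuide.xml')

# Same call as in the answer; leaving the with block closes the file.
with io.open(filepath, 'w', encoding='UTF-8') as xml_file:
    xml_file.write(file_blanker)

# Read the raw bytes back to see what actually landed on disk.
with io.open(filepath, 'rb') as raw_file:
    raw = raw_file.read()
```

Reading in binary mode here is only a check; normal code would reopen the file with io.open(filepath, encoding='UTF-8').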

Related

Python 3: error message when I try to open a PDF

I'm having issues with code that had been working for weeks.
The problem comes from this part of my code:
ifile = open('0_Inputs/CompaniesList.csv', "r", encoding = 'utf-8')
I got the following message:
TypeError: open() got an unexpected keyword argument 'encoding'
If I try:
ifile = open('0_Inputs/CompaniesList.csv', "r")
then I get a different error:
OSError: cannot identify image file '0_Inputs/CompaniesList.csv'
I'm using from PyPDF2 import PdfFileReader, PdfFileWriter, but I don't know if there's a conflict between libraries.
Thank you!
You should read the file with the csv module instead, opening it with newline='' as the csv documentation recommends:
import csv
with open('0_Inputs/CompaniesList.csv', newline='') as ifile:
    for row in csv.reader(ifile):
        print(row)
More information: https://docs.python.org/3/library/csv.html
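The OSError: cannot identify image file message is the one Pillow's Image.open raises for input it cannot read, which suggests (an assumption, since the question only shows the PyPDF2 import) that the name open in that script no longer refers to the built-in, for example after a star import such as from PIL.Image import *. This minimal sketch reproduces both errors with a hypothetical image_open function and shows that io.open still accepts encoding=.

```python
import io
import os
import tempfile


def image_open(fp, mode='r'):
    # Hypothetical stand-in for an image library's open(): no encoding
    # parameter, and it rejects any file that is not an image.
    raise OSError('cannot identify image file %r' % fp)


def reproduce_errors(path):
    # Rebinding the name "open" mimics what a star import can silently do
    # at module level.
    open = image_open
    errors = []
    try:
        open(path, 'r', encoding='utf-8')
    except TypeError as exc:
        errors.append(str(exc))
    try:
        open(path, 'r')
    except OSError as exc:
        errors.append(str(exc))
    return errors


# Build a small CSV so the sketch is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), 'CompaniesList.csv')
with io.open(csv_path, 'w', encoding='utf-8') as f:
    f.write(u'name\nExample Co\n')

shadow_errors = reproduce_errors(csv_path)

# io.open is the same function as Python 3's built-in open, but it is reached
# through the io module, so a rebound "open" name does not affect it.
with io.open(csv_path, 'r', encoding='utf-8') as ifile:
    header = ifile.readline().strip()
```

If shadowing is the cause, replacing the star import with from PIL import Image and calling Image.open explicitly should leave the built-in open intact.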

Python3 write gzip file - memoryview: a bytes-like object is required, not 'str'

I want to write a file. Based on the name of the file this may or may not be compressed with the gzip module. Here is my code:
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wb') as fd:
    print('blah blah blah'.encode(), file=fd)
I'm opening the writable file in binary mode and encoding my string to be written. However I get the following error:
  File "/usr/lib/python3.5/gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Why is my object not a bytes? I get the same error if I open the file with 'w' and skip the encoding step. I also get the same error if I remove the '.gz' from the filename.
I'm using Python 3.5 on Ubuntu 16.04.
For me, changing the gzip mode to 'wt' did the job: I could write the original string without encoding it to bytes first (tested on Python 3.5 and 3.7 on Ubuntu 16).
From the Python 3 gzip documentation: "... The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode..."
import gzip
filename = 'output.gz'
opener = gzip.open if filename.endswith('.gz') else open
with opener(filename, 'wt') as fd:
    print('blah blah blah', file=fd)
!zcat output.gz
> blah blah blah
You can convert it to bytes like this:
import gzip
filename = 'output.gz'
with gzip.open(filename, 'wb') as fd:
    fd.write('blah blah blah'.encode('utf-8'))
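print is a relatively complex function. It always writes str to the file, and that str is not the object you pass but the result of rendering the arguments (calling str() on each one and adding separators and the line ending). A bytes argument is therefore rendered as the text b'blah blah blah', which a binary-mode file rejects.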
If you have bytes already, you can use fd.write(bytes) directly and take care of adding a newline if you need it.
If you don't have bytes, make sure fd is opened to receive text.
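To see that behaviour without a gzip file, this sketch substitutes in-memory streams (io.StringIO for a text-mode file, io.BytesIO for a binary-mode one); the substitution is an assumption for illustration, since both reject or accept str and bytes in the same way as the files in the question.

```python
import io

# Text-mode target: print() renders the bytes object with str(), so the
# literal b'...' notation ends up in the output instead of the raw bytes.
text_buf = io.StringIO()
print(b'blah blah blah', file=text_buf)
rendered = text_buf.getvalue()

# Binary-mode target: print() still hands a str to write(), which a binary
# stream rejects with a TypeError, as in the question.
binary_buf = io.BytesIO()
error_message = None
try:
    print(b'blah blah blah', file=binary_buf)
except TypeError as exc:
    error_message = str(exc)

# Writing the bytes directly, and adding the newline yourself, works.
binary_buf.write(b'blah blah blah' + b'\n')
```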
You can also serialize it using pickle: first serialize the object with pickle, then compress the resulting bytes with gzip.
To save the object:
import gzip, pickle
filename = 'serialized_object.gz'
obj = {'message': 'blah blah blah'}  # placeholder for the object you want to save
# serialize the object to bytes
serialized_obj = pickle.dumps(obj)
# write the bytes through a gzip-compressed binary stream
with gzip.open(filename, 'wb') as f:
    f.write(serialized_obj)
To load the object:
import gzip, pickle
filename = 'serialized_object.gz'
with gzip.open(filename, 'rb') as f:
    serialized_obj = f.read()
# de-serialize the object
obj = pickle.loads(serialized_obj)
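Because gzip.open returns a file-like object, pickle can also write to it and read from it directly with pickle.dump and pickle.load, skipping the intermediate bytes. A minimal sketch with a hypothetical object and a temporary file name:

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical object to persist.
obj = {'title': 'blah blah blah', 'values': [1, 2, 3]}
filename = os.path.join(tempfile.mkdtemp(), 'serialized_object.gz')

# pickle.dump writes straight into the binary gzip stream.
with gzip.open(filename, 'wb') as f:
    pickle.dump(obj, f)

# pickle.load reads from the decompressing stream in the same way.
with gzip.open(filename, 'rb') as f:
    restored = pickle.load(f)
```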

Korean txt file encoding with utf-8

I'm trying to process a Korean text file with Python, but it fails when I try to open the file as UTF-8.
#!/usr/bin/python
#-*- coding: utf-8 -*-
f = open('tag.txt', 'r', encoding='utf=8')
s = f.readlines()
z = open('tagresult.txt', 'w')
y = z.write(s)
z.close
=============================================================
Traceback (most recent call last):
  File "C:\Users\******\Desktop\tagging.py", line 5, in <module>
    f = open('tag.txt', 'r', encoding='utf=8')
TypeError: 'encoding' is an invalid keyword argument for this function
[Finished in 0.1s]
==================================================================
And when I simply open a Korean text file encoded in UTF-8, the characters come out garbled, like this. What can I do?
\xc1\xc1\xbe\xc6\xc1\xf6\xb4\xc2\n',
'\xc1\xc1\xbe\xc6\xc7\xcf\xb0\xc5\xb5\xe7\xbf\xe4\n',
'\xc1\xc1\xbe\xc6\xc7\xcf\xbd\xc3\xb4\xc2\n',
'\xc1\xcb\xbc\xdb\xc7\xd1\xb5\xa5\xbf\xe4\n',
'\xc1\xd6\xb1\xb8\xbf\xe4\
I don't know Korean and don't have a sample string to try, but here is some advice:
1.
f = open('tag.txt', 'r', encoding='utf=8')
You have a typo here: it should be utf-8, not utf=8. The exception itself, however, comes from running the script under Python 2, whose built-in open() has no encoding parameter at all.
The default mode of open() is 'r', so you don't have to specify it.
2. Don't just call open(); use a with statement (a context manager) so the file is closed for you. Since s comes from readlines() and is a list, write it with writelines():
with open('tagresult.txt', 'w') as f:
    f.writelines(s)
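The escaped bytes in the question may explain the broken output on their own: 0xC1 can never occur in valid UTF-8, and pairs such as \xc1\xc1 and \xb4\xc2 fall in the Hangul range of EUC-KR/CP949. Assuming the file is really CP949 (the Windows Korean code page) rather than UTF-8, a quick check on the first bytes shown looks like this:

```python
# First bytes of the first line printed in the question.
sample = b'\xc1\xc1\xbe\xc6\xc1\xf6\xb4\xc2'

# 0xC1 is never a valid UTF-8 byte, so decoding as UTF-8 must fail.
try:
    sample.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# CP949 (a superset of EUC-KR) decodes the same bytes without error,
# two bytes per Hangul syllable.
decoded = sample.decode('cp949')
```

If the check behaves this way on the real file, opening it with encoding='cp949' (through io.open on Python 2) instead of 'utf-8' should give readable text.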
In Python 2, the built-in open function does not take an encoding parameter. Instead, you read a line and decode it to unicode. This article on the kitchen (as in kitchen sink) modules provides details and some lightweight utilities for working with unicode in Python 2.x.
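A minimal sketch of that approach, assuming the input really is UTF-8 and runnable on Python 2.6+ and 3. The file names tag.txt and tagresult.txt come from the question; the sample Hangul text (escaped to keep the source ASCII) and the temporary directory are made up.

```python
import io
import os
import tempfile

# Hypothetical input: sample Hangul text saved as UTF-8 bytes.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'tag.txt')
dst = os.path.join(workdir, 'tagresult.txt')
with open(src, 'wb') as f:
    f.write(u'\uc548\ub155\n\ud14c\uc2a4\ud2b8\n'.encode('utf-8'))

# Binary mode yields byte strings on Python 2 and 3 alike; decode each
# line to unicode explicitly.
with open(src, 'rb') as f:
    lines = [raw.decode('utf-8') for raw in f]

# io.open accepts encoding= on both versions and encodes the unicode on write.
with io.open(dst, 'w', encoding='utf-8') as out:
    out.writelines(lines)
```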

Syntax error for "with open(file, 'r+'):"

I have a bunch of .csh files in a directory, and I want to open them one by one, search for "//" and replace it with "/" using a Python script. How do I do that?
I tried:
import os
for file in os.listdir("./"):
    if file.endswith(".csh"):
        with open(file, 'r+'):
            data = file.read().replace("//", "/")
            f.write(data)
            f.close()
But it gives me:
  File "script.py", line 4
    with open(file, 'r+'):
^
SyntaxError: invalid syntax
You are using an old version of Python. The with statement was introduced in Python 2.5 and is always available from 2.6 onwards; in 2.5 it had to be enabled via
from __future__ import with_statement
It is best to upgrade: to 2.7 if you need to stay on the 2.x line, or otherwise to 3.4.
Note that you also need to change your code according to the answer by Avinash Raj, capturing the file object in a variable via as f. file.read() will not work, because file is still just the file name string.
Change your code to:
import os
for file in os.listdir("./"):
    if file.endswith(".csh"):
        with open(file, 'r+') as f:
            data = f.read()
            f.seek(0)
            with open(file, 'w+') as w:
                dat = data.replace("//", "/")
                w.write(dat)
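The answer opens each file twice. A variant that does the whole rewrite through the single r+ handle (read, seek back to the start, write, then truncate the leftover tail) is sketched here; collapse_double_slashes and the demo directory are hypothetical names, and io.open is used so the same code runs on Python 2.6+ and 3.

```python
import io
import os
import tempfile


def collapse_double_slashes(directory):
    # Hypothetical helper: rewrites every .csh file in directory in place,
    # replacing "//" with "/" through one r+ handle per file.
    for name in os.listdir(directory):
        if name.endswith('.csh'):
            path = os.path.join(directory, name)
            with io.open(path, 'r+') as f:
                data = f.read().replace('//', '/')
                f.seek(0)      # go back to the start before overwriting
                f.write(data)
                f.truncate()   # the new text is shorter, so cut off the old tail


# Usage on a throwaway directory with one sample script.
demo_dir = tempfile.mkdtemp()
with io.open(os.path.join(demo_dir, 'build.csh'), 'w') as f:
    f.write(u'setenv ROOT /opt//tools\n')
collapse_double_slashes(demo_dir)
```

The truncate() call matters: without it, the last characters of the original text would remain at the end of the file.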

Writing XML to file corrupts files in python

I'm attempting to write the contents of an xml.dom.minidom object to a file. The simple idea is to use the writexml method:
import codecs
from xml.dom import minidom
def write_xml_native():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    f = codecs.open('codified.xml', mode='w', encoding='utf-8')
    # Using native writexml() method to write
    xmldoc.writexml(f, encoding="utf=8")
    f.close()
The problem is that it corrupts the non-Latin text in the file. The other way is to get the text string and write it to the file explicitly:
def write_xml():
    # Building DOM from XML
    xmldoc = minidom.parse('semio2.xml')
    # Opening file for writing UTF-8, which is XML's default encoding
    f = codecs.open('codified3.xml', mode='w', encoding='utf-8')
    # Writing XML in UTF-8 encoding, as recommended in the documentation
    f.write(xmldoc.toxml("utf-8"))
    f.close()
This results in the following error:
Traceback (most recent call last):
  File "D:\Projects\Semio\semioparser.py", line 45, in <module>
    write_xml()
  File "D:\Projects\Semio\semioparser.py", line 42, in write_xml
    f.write(xmldoc.toxml(encoding="utf-8"))
  File "C:\Python26\lib\codecs.py", line 686, in write
    return self.writer.write(data)
  File "C:\Python26\lib\codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 2064: ordinal not in range(128)
How do I write an XML text to file? What is it I'm missing?
EDIT: The error is fixed by adding a decode call:
f.write(xmldoc.toxml("utf-8").decode("utf-8"))
But Russian characters are still corrupted.
The text looks fine when viewed in the interpreter; it only gets corrupted when written to the file.
Hmm, though this should work:
xml = minidom.parse("test.xml")
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
Alternatively, you may try:
with codecs.open("test.xml", "r", "utf-8") as inp:
    xml = minidom.parseString(inp.read().encode("utf-8"))
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
Update: if you construct the XML from a unicode string, you should encode it before passing it to the minidom parser, like this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import xml.dom.minidom as minidom
xml = minidom.parseString(u"<ru>Тест</ru>".encode("utf-8"))
with codecs.open("out.xml", "w", "utf-8") as out:
    xml.writexml(out)
Try this:
with open("codified.xml", "w", encoding="utf-8") as f:
    f.write(xmldoc.toxml("utf-8").decode("utf-8"))
This works for me (under Python 3, though).
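For Python 3, a minimal sketch of the full round trip, reusing the "Тест" sample from the update above (written with escapes): the text stream encodes to UTF-8, and writexml declares the same encoding, spelled utf-8 (the question's first snippet passes "utf=8"). The temporary output path is an assumption. On Python 2, minidom writes byte strings for the XML declaration, so codecs.open as in the answers above is the safer choice there.

```python
import io
import os
import tempfile
from xml.dom import minidom

# Sample document with Cyrillic text; parseString is given UTF-8 bytes.
xmldoc = minidom.parseString(u'<ru>\u0422\u0435\u0441\u0442</ru>'.encode('utf-8'))

# Hypothetical output location in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'codified.xml')

# Open a text stream that encodes to UTF-8 and declare the same encoding
# in the XML prolog.
with io.open(out_path, 'w', encoding='utf-8') as out:
    xmldoc.writexml(out, encoding='utf-8')

# Parse the file back to confirm the text survived the round trip.
reparsed = minidom.parse(out_path)
text = reparsed.documentElement.firstChild.data
```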
