Python - write Xml (formatted) - python

I wrote this python script in order to create Xml content and i would like to write this "prettified" xml to a file (50% done):
My script so far:
data = ET.Element("data")
project = ET.SubElement(data, "project")
project.text = "This project text"
rawString = ET.tostring(data, "utf-8")
reparsed = xml.dom.minidom.parseString(rawString)
cleanXml = reparsed.toprettyxml(indent=" ")
# This prints the prettified xml i would like to save to a file
print cleanXml
# This part does not work, the only parameter i can pass is "data"
# But when i pass "data" as a parameter, a xml-string is written to the file
tree = ET.ElementTree(cleanXml)
tree.write("config.xml")
The error i get when i pass cleanXml as parameter:
Traceback (most recent call last):
File "app.py", line 45, in <module>
tree.write("config.xml")
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 817, in write
self._root, encoding, default_namespace
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 876, in _namespaces
iterate = elem.getiterator # cET compatibility
AttributeError: 'unicode' object has no attribute 'getiterator'
Anybody knows how i can get my prettified xml to a file ?
Thanks and Greetings!

The ElementTree constructor can be passed a root element and a file. To create an ElementTree from a string, use ElementTree.fromstring.
However, that isn't what you want. Just open a file and write the string directly:
with open("config.xml", "w") as config_file:
config_file.write(cleanXml)

Related

Python Configparser. Whitespace causes AttributeError

I recieve some files with .ini file with them. I have to recieve file names from [FILES] section.
Sometimes there is an extra witespace in another section of .ini-file which raises exception in ConfigParser module
The example of "bad" ini-file:
[LETTER]
SUBJECT=some text
some text
and text with whitespace in the beggining
[FILES]
0=file1.txt
1=file2.doc
My code(Python 3.7):
import configparser
def get_files_from_ini_file(info_file):
ini = configparser.ConfigParser(allow_no_value=True)
ini.read(info_file) # ERROR is here
if ini.has_section("FILES"):
pocket_files = [ini.get("FILES", i) for i in ini.options("FILES")]
return pocket_files
print(get_files_from_ini_file("D:\\bad.ini"))
Traceback (most recent call last):
File "D:/test.py", line 10, in <module>
print(get_files_from_ini_file("D:\\bad.ini"))
File "D:/test.py", line 5, in get_files_from_ini_file
ini.read(info_file) # ERROR
File "C:\Users\ap\AppData\Local\Programs\Python\Python37-32\lib\configparser.py", line 696, in read
self._read(fp, filename)
File "C:\Users\ap\AppData\Local\Programs\Python\Python37-32\lib\configparser.py", line 1054, in _read
cursect[optname].append(value)
AttributeError: 'NoneType' object has no attribute 'append'
I can't influence on files I recieve so that is there any way to ignore this error? In fact I need only [FILES] section to parse.
Have tried empty_lines_in_values=False with no result
May be that's invalid ini file and I should write my own parser?
If you only need the "FILES" part, a simple way is to:
open the file and read into a string
get the part after "[FILES]" using .split() method
add "[FILES]" before the string
use the configparser read_string method on the string
This is a hacky solution but it should work:
import configparser
def get_files_from_ini_file(info_file):
with open(info_file, 'r') as file:
ini_string = file.read()
useful_part = "[FILES]" + ini_string.split("[FILES]")[-1]
ini = configparser.ConfigParser(allow_no_value=True)
ini.read_string(useful_part) # ERROR is here
if ini.has_section("FILES"):
pocket_files = [ini.get("FILES", i) for i in ini.options("FILES")]
return pocket_files
print(get_files_from_ini_file("D:\\bad.ini"))

serialize a text file into a protobuf message

I have a serialized protobuf message that I can simply read and save in plain text in python with something like this:
import MyMessage
import sys
FilePath = sys.argv[1]
T = MyMessage.MyType()
f = open(FilePath, 'rb')
T.ParseFromString(f.read())
f.close()
print(T)
I can save this to a plain txt file and do what I want to do.
Now I need to do the inverse operation, i.e. reading the simple plain text file, already formatted in the right way, and save it as a protobuf message
import MyMessage
import sys
FilePath = sys.argv[1]
input = open("./input.txt", 'r')
T = MyMessage.MyType()
T.ParseFrom(inputText.readlines())
output.write(T.SerializeToString())
input.close()
output.close()
This fails with
Traceback (most recent call last):
File "MyFile.py", line 13, in <module>
T.ParseFromString(input.readlines())
File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\message.py", line 199, in ParseFromString
return self.MergeFromString(serialized)
File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\site-packages\google\protobuf\internal\python_message.py", line 1142, in MergeFromString
serialized = memoryview(serialized)
TypeError: memoryview: a bytes-like object is required, not 'list'
I am not a python nor a protobuf expert, so I guess I am missing something trivial...
Any help?
Thanks :)
print(x) calls str(x), which for protobufs uses the human-readable "text format" representation.
To read back from that format, you can use the google.protobuf.text_format module:
from google.protobuf import text_format
def parse_my_type(file_path):
with open(file_path, 'r') as f:
return text_format.Parse(f.read(), MyMessage.MyType())

parse Dutch NDW xml

I am trying to parse the XML file from the Dutch NDW which contains every minute the trafficspeed on many Dutch motorways. I use this example file: http://www.ndw.nu/downloaddocument/e838c62446e862f5b6230be485291685/Reistijden.zip
I am trying to parse the traveltime data in variables with Python but i am struggling.
from xml.etree import ElementTree
import urllib2
url = "http://weburloffile.nl/ndw/Reistijden.xml"
response = urllib2.urlopen(url)
namespaces = {
'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
'a': 'http://datex2.eu/schema/2/2_0'
}
dom = ElementTree.fromstring(response.read)
names = dom.findall(
'soap:Envelope'
'/a:duration',
namespaces,
)
#print names
for duration in names:
print(duration.text)
I get this new error
Traceback (most recent call last):
File "test.py", line 9, in <module>
dom = ElementTree.fromstring(response.read)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1311, in XML
parser.feed(text)
File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1651, in feed
self._parser.Parse(data, 0)
TypeError: Parse() argument 1 must be string or read-only buffer, not instancemethod
How to parse this (complex) xml correctly?
-- changed it into read as suggested by comment
The problem isn't the XML parsing; it's that you are using the response object incorrectly. urllib2.urlopen returns a file-like object that does not have a content attribute. Instead, you should be calling read on it:
dom = ElementTree.fromstring(response.read())

Python XML Parser: junk after document element

I'm learning Python at work. I've got a large XML file with data similar to this:
testData3.xml File
<r><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c></c><c></c><c>something1</c><c>something1</c></r>
<r><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c></c><c></c><c>something2</c><c>something2</c></r>
I have copied an XML parser out of one of my Python books that works in gathering the data when the data file contains only one line. As soon as I add a second line of data, the script fails when it runs.
Python script that I'm running (xmlReader.py):
from xml.dom.minidom import parse, Node
xmltree = parse('testData3.xml')
for node1 in xmltree.getElementsByTagName('c'):
for node2 in node1.childNodes:
if node2.nodeType == Node.TEXT_NODE:
print(node2.data)
I'm looking for some help on how to write the loop so that my xmlReader.py continues through the entire file instead of just one line. I get the following errors when I run this script:
Errors during execution:
xxxx#xxxx:~/xxxx/xxxx> python xmlReader.py
Traceback (most recent call last):
File "xmlReader.py", line 2, in <module>
xmltree = parse('testData3.xml')
File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/minidom.py", line 1915, in parse
return expatbuilder.parse(file)
File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 926, in parse
result = builder.parseFile(fp)
File "/usr/lib64/python2.6/site-packages/_xmlplus/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: junk after document element: line 2, column 0
xxxx#xxxx:~/xxxx/xxxx>
The problem is that your example data is not valid XML. A valid XML document should have a single root element; this is true for a single line of the file, where <r> is the root element, but not true when you add a second line, because each line is contained within a separate <r> element, but there is no global parent element in the file.
Either construct valid XML, for example:
<root>
<r><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c>something1</c><c></c><c></c><c>something1</c><c>something1</c></r>
<r><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c>something2</c><c></c><c></c><c>something2</c><c>something2</c></r>
</root>
or parse the file line by line:
from xml.dom.minidom import parseString
f = open('testData3.xml'):
for line in f:
xmltree = parseString(line)
...
f.close()

How to extract information from json?

i am trying to extract from json data some information. On the following code, I first extract the part of json data that contains the information i want and i store it in a file. Then i am trying to open this file and i get the error that follows my code. Can you help me find where i am wrong?
import json
import re
input_file = 'path'
text = open(input_file).read()
experience = re.findall(r'Experience":{"positionsMpr":{"showSection":true," (.+?),"visible":true,"find_title":"Find others',text)
output_file = open ('/home/evi.nastou/Documenten/LinkedIn_data/Alewijnse/temp', 'w')
output_file.write('{'+experience[0]+'}')
output_file.close()
text = open('path/temp')
input_text = text.read()
data = json.load(input_text)
positions = json.dumps([s['companyName'] for s in data['positions']])
print positions
Error:
Traceback (most recent call last):
File "test.py", line 13, in <module>
data = json.load(input_text)
File "/home/evi.nastou/.pythonbrew/pythons/Python-2.7.2/lib/python2.7/json/__init__.py", line 274, in load
return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'
You want to use json.loads() (note the s), or pass in the file object instead of the result of .read():
text = open('path/temp')
data = json.load(text)
json.load() takes an open file object, but you were passing it a string; json.loads() takes a string.

Categories

Resources