I am writing a script that creates an XML file and writes data to it. It fails with "ImportError: No module named etree.ElementTree".
I have read the Stack Overflow question "Python 2.5.4 - ImportError: No module named etree.ElementTree" and this tutorial: https://stackabuse.com/reading-and-writing-xml-files-in-python/. I still do not understand what the solution is. I tried replacing
"from elementtree import ElementTree"
to
"from xml.etree import ElementTree"
It still did not work.
#!/usr/bin/python
import xml.etree.ElementTree as xml
root = xml.Element("FOLDER")
child = xml.Element("File")
root.append(child)
fn = xml.SubElement(child, "PICTURE")
fn.text = "he32dh32rf43hd23"
md5 = xml.SubElement(child, "CONTENT")
md5.text = "he32dh32rf43hd23"
tree = xml.ElementTree(root)
with open("xml.xml", "w") as fh:
    tree.write(fh)
I expected the data to be written to the XML file, but I received the error shown below:
File "./xml.py", line 2, in <module>
import xml.etree.ElementTree as xml
File "/root/Desktop/virustotal/testxml/xml.py", line 2, in <module>
import xml.etree.ElementTree as xml
ImportError: No module named etree.ElementTree
import xml.etree.ElementTree as xml
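and avoid the path conflict: the traceback shows that your own script is named xml.py, so "import xml" finds your script instead of the standard library's xml package. Rename the script (and delete any leftover xml.pyc next to it). If xml is meant to be your own package, make sure it has an __init__.py file in its folder and give it a name that does not clash with the standard library.
Then it will work.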
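A minimal sketch of the same script once the name clash is gone, assuming Python 3 and assuming the script is saved under a name other than xml.py. The output location in the temporary directory and the file name folder_output.xml are hypothetical choices for illustration. Printing xml.__file__ is a quick way to check which xml is actually being imported.

```python
import os
import tempfile
import xml  # should resolve to the standard library, not a local xml.py
import xml.etree.ElementTree as ET

# If this prints a path inside your own project folder, a local xml.py
# (or xml.pyc) is shadowing the standard-library package.
print(xml.__file__)


def build_tree():
    """Build the FOLDER/File/PICTURE+CONTENT structure from the question."""
    root = ET.Element("FOLDER")
    child = ET.SubElement(root, "File")
    ET.SubElement(child, "PICTURE").text = "he32dh32rf43hd23"
    ET.SubElement(child, "CONTENT").text = "he32dh32rf43hd23"
    return ET.ElementTree(root)


# Hypothetical output path; passing a path lets ElementTree handle the encoding.
out_path = os.path.join(tempfile.gettempdir(), "folder_output.xml")
build_tree().write(out_path, encoding="utf-8", xml_declaration=True)
```

Passing a file name to write() instead of an already opened text file avoids having to match the file mode to the encoding.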
The etree API is provided by both ElementTree and lxml. The two are similar, but ElementTree has been reported to have bugs in Python 2.7 while working well in Python 3.
I see you are using Python 2.7, so lxml should work fine for you.
Try this:
from lxml import etree
from io import StringIO
# In case you need to read XML; xml_file is assumed to hold the document as a unicode string.
tree = etree.parse(StringIO(xml_file))
print(tree.getroot())
StringIO comes from Python's standard io package.
etree.parse() expects a filename or a file-like object, so StringIO is necessary when the XML is a big string rather than a file on disk: it wraps the string so the parser can read it like a file.
All the writing operations are the same for both libraries.
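The role of StringIO can be sketched with the standard-library xml.etree.ElementTree, which is swapped in here for lxml as an assumption so the example runs without extra packages (Python 3 is also assumed). The sample document reuses the tag names from the question. It compares parsing through a file-like wrapper with parsing the string directly.

```python
import io
import xml.etree.ElementTree as ET

# Sample document for illustration only.
xml_text = "<FOLDER><File><PICTURE>he32dh32rf43hd23</PICTURE></File></FOLDER>"

# parse() wants a filename or file-like object, so wrap the string in StringIO.
tree = ET.parse(io.StringIO(xml_text))
root_from_stream = tree.getroot()

# fromstring() accepts the string directly and returns the root element.
root_from_string = ET.fromstring(xml_text)

print(root_from_stream.tag, root_from_string.tag)
```

Both calls produce an equivalent root element; the wrapper only matters because parse() reads from something file-like.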
I am new to Python. I have installed the signxml package and am implementing an XML signature process.
Link to the Python package: https://pypi.org/project/signxml/
My XML file is being generated, but the XML signature differs slightly from what I need. I was able to match most of it, but I do not know how to match the following part.
Can anyone please help me with that?
The part that differs is the following tag:
<Signature>
It should look like this instead:
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
When I searched the signxml core file, I found the following note:
To specify the location of an enveloped signature within **data**, insert a
``<ds:Signature Id="placeholder"></ds:Signature>`` element in **data** (where
"ds" is the "http://www.w3.org/2000/09/xmldsig#" namespace). This element will
be replaced by the generated signature, and excised when generating the digest.
How can I change my code to get this output?
The following is my Python code:
from lxml import etree
import xml.etree.ElementTree as ET
from signxml import XMLSigner, XMLVerifier
import signxml
el = ET.parse('example.xml')
root = el.getroot()
#cert = open("key/public.pem").read()
key = open("key/private.pem").read()
signed_root = XMLSigner(method=signxml.methods.enveloped,signature_algorithm='rsa-sha512',digest_algorithm="sha512").sign(root, key=key)
tree = ET.ElementTree(signed_root)
#dv = tree.findall(".//DigestValue");
#print(dv);
tree.write("new_signed_file.xml")
The code above takes an XML file, signs it digitally, and writes the result to a new file.
Can anyone please guide me on where and what I should change to meet this requirement?
I am assuming you are using the Python signxml package.
Go to your Python installation and open the file Python\Lib\site-packages\signxml\__init__.py
In that __init__.py file, make the following change.
Find the following code:
def _unpack(self, data, reference_uris):
    sig_root = Element(ds_tag("Signature"), nsmap=self.namespaces)
Replace it with the following code:
def _unpack(self, data, reference_uris):
    #sig_root = Element(ds_tag("Signature"), nsmap=self.namespaces)
    sig_root = Element(ds_tag("Signature"), xmlns="http://www.w3.org/2000/09/xmldsig#")
After making this change, re-compile your Python signxml package.
Then regenerate the signed XML file.
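To make the target output concrete, here is a minimal sketch of what the two forms of the tag mean. It uses only the standard-library xml.etree.ElementTree rather than signxml or lxml; that swap is an assumption so the example runs without extra packages, and it is not signxml's own API. The SignedInfo child is a placeholder, not a real signature. An element in the XML-DSig namespace can be written with a generated prefix or with that namespace as the default, and register_namespace with an empty prefix asks ElementTree for the second form.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# Build a namespaced element; the child is a placeholder for illustration.
signature = ET.Element("{%s}Signature" % DS_NS)
ET.SubElement(signature, "{%s}SignedInfo" % DS_NS)

# With no prefix registered for this namespace, ElementTree generates one.
with_generated_prefix = ET.tostring(signature, encoding="unicode")

# Registering the empty prefix makes this namespace the default on output.
# Note: register_namespace changes a process-wide mapping.
ET.register_namespace("", DS_NS)
with_default_namespace = ET.tostring(signature, encoding="unicode")

print(with_generated_prefix)
print(with_default_namespace)
```

Both strings describe the same namespaced element; only the serialization differs, which is why a namespace-aware verifier treats them alike.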
Hello :) This is my first python program but it doesn't work.
What I want to do:
import an XML file and grab only Example.swf from
<page id="Example">
<info>
<title>page 1</title>
</info>
<vector_file>Example.swf</vector_file>
</page>
(the text inside <vector_file>)
then download the associated file from a website (https://website.com/.../.../Example.swf)
then rename it 1.swf (or page 1.swf)
and loop until I reach the last file, at the end of the page (Exampleaa_idontknow.swf → 231.swf)
convert all the files to PDF
What I have done (but it is useless, because of AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'xpath'):
import re
import urllib.request
import requests
import time
import requests
import lxml
import lxml.html
import os
from xml.etree import ElementTree as ET
DIR="C:/Users/mypath.../"
for filename in os.listdir(DIR):
    if filename.endswith(".xml"):
        with open(file=DIR+".xml",mode='r',encoding='utf-8') as file:
            _tree = ET.fromstring(text=file.read())
            _all_metadata_tags = _tree.xpath('.//vector_file')
            for i in _all_metadata_tags:
                print(i.text + '\n')
    else:
        print("skipping for filename")
First of all, you need to make up your mind about what module you're going to use. lxml or xml? Import only one of them. lxml has more features, but it's an external dependency. xml is more basic, but it is built-in. Both modules share a lot of their API, so they are easy to confuse. Check that you're looking at the correct documentation.
For what you want to do, the built-in module is good enough. However, it does not support the .xpath() method; the method you are looking for is called .findall().
Then you need to remember never to parse XML files by opening them as plain text files, reading them into a string, and parsing that string. Not only is this wasteful, it's fundamentally the wrong thing to do. XML parsers have built-in automatic encoding detection. This mechanism makes sure you never have to worry about file encodings, but you have to use it, too.
It's not only better, but also less code to write: use ET.parse() and pass a filename.
import os
from xml.etree import ElementTree as ET
DIR = r'C:\Users\mypath'
for filename in os.listdir(DIR):
    if not filename.lower().endswith(".xml"):
        print("skipping for filename")
        continue
    fullname = os.path.join(DIR, filename)
    tree = ET.parse(fullname)
    for vector_file in tree.findall('.//vector_file'):
        print(vector_file.text + '\n')
If you only expect a single <vector_file> element per file, or if you only care for the first such element, use .find() instead of .findall():
vector_file = tree.find('.//vector_file')
if vector_file is None:
    print('Nothing found')
else:
    print(vector_file.text + '\n')
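The next step in the question, turning each <vector_file> name into a download URL and a numbered target name, could be sketched as follows. Several things here are assumptions for illustration: Python 3, the wrapping <pages> root element, the second sample page, the BASE_URL value, and the helper name plan_downloads. The sample is parsed from an inline string only to keep the example self-contained; for real files, ET.parse() with a filename remains the better choice. No download is performed here.

```python
import xml.etree.ElementTree as ET

# Inline sample; the <pages> wrapper and the second page are assumed.
SAMPLE = """<pages>
  <page id="Example">
    <info><title>page 1</title></info>
    <vector_file>Example.swf</vector_file>
  </page>
  <page id="Other">
    <info><title>page 2</title></info>
    <vector_file>Other.swf</vector_file>
  </page>
</pages>"""

BASE_URL = "https://website.example/files/"  # hypothetical base URL


def plan_downloads(root, base_url):
    """Return (source_url, target_name) pairs, numbering files from 1."""
    plan = []
    for number, node in enumerate(root.iter("vector_file"), start=1):
        if node.text:  # skip empty <vector_file/> elements
            plan.append((base_url + node.text.strip(), "%d.swf" % number))
    return plan


plan = plan_downloads(ET.fromstring(SAMPLE), BASE_URL)
for url, target in plan:
    print(url, "->", target)
```

Each pair could then be handed to a downloader such as urllib.request.urlretrieve(url, target), which keeps the XML handling separate from the network code.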
I am currently trying to code a searching program, which makes use of a program I've already written. It refuses to get to the second print statement.
print("Relevance: ")
# import sqlite3
import Breakdown.py as bd
import re, nltk
from nltk.corpus import wordnet
# from sqlite3 import Error
from autocorrect import spell
print("Input line: ")
The file structure looks like this:
However, I can't work out why it can't get past that import section.
This is somewhat important.
Thanks.
Just write:
import Breakdown as bd
Python will import the Breakdown.py file as a module. If you use the following instead, Python treats Breakdown as a package and looks for a submodule named "py" inside it:
import Breakdown.py as bd
... which I don't think is the case here.
You should put the Breakdown.py file in the path where you're starting Python or in one of the directories where Python looks for libraries:
import sys
for p in sys.path:
    print(p)
and use import Breakdown (no .py).
Or else add the folder containing the module to sys.path with:
sys.path.append('/your/foldername')
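A self-contained sketch of the sys.path approach, assuming Python 3. The temporary folder, the module name breakdown_demo and its shout function are hypothetical stand-ins for Breakdown.py; they are created on the fly so the example runs anywhere.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder with a small module in it (stand-in for Breakdown.py).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "breakdown_demo.py"), "w") as fh:
    fh.write("def shout(text):\n    return text.upper()\n")

# Make the folder importable, then import by module name (no ".py").
if module_dir not in sys.path:
    sys.path.append(module_dir)
importlib.invalidate_caches()  # make sure the import system sees the new file

bd = importlib.import_module("breakdown_demo")  # same effect as: import breakdown_demo as bd
print(bd.shout("input line"))
```

The module is addressed by its name only; the file extension never appears in the import.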
I am trying to parse an XML file using Python's ElementTree.
The XML file looks like this:
<App xmlns="test attribute">
<name>sagar</name>
</App>
Parser Code:
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import Element
import xml.etree.ElementTree as etree
def parser():
    eleTree = etree.parse('app.xml')
    eleRoot = eleTree.getroot()
    print("Tag:"+str(eleRoot.tag)+"\nAttrib:"+str(eleRoot.attrib))
if __name__ == "__main__":
    parser()
Output:
[sagar#linux Parser]$ python test.py
Tag:{test attribute}App <------------- It should print only "App"
Attrib:{}
When I remove the "xmlns" attribute or rename it to something else, eleRoot.tag prints the correct value.
Why is ElementTree unable to parse the tag properly when it has an "xmlns" attribute? Am I missing some prerequisite for parsing XML of this format with ElementTree?
Your XML uses the attribute xmlns, which defines a default XML namespace. XML namespaces are used to solve naming conflicts and are expected to have a valid URI as their value. The value "test attribute" is not a valid URI, which appears to be troubling the parsing of your XML by etree.
For more information on xml namespaces see XML Namespaces at W3 Schools.
Edit:
After looking into the issue further, it appears that the fully qualified name of an element in Python's ElementTree has the form {namespace_url}tag_name. This means that, because you defined the default namespace "test attribute", the fully qualified name of your "App" tag is in fact {test attribute}App, which is what your program prints.
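A short sketch of working with the {namespace}tag form, assuming Python 3 and using the XML from the question as an inline string. The prefix "a" in the mapping is an arbitrary label chosen for illustration; it only has to match the prefix used in the search path.

```python
import xml.etree.ElementTree as ET

# The document from the question, inlined for a self-contained example.
root = ET.fromstring('<App xmlns="test attribute"><name>sagar</name></App>')

# The tag carries the namespace in braces.
qualified_tag = root.tag

# Strip the "{...}" part to get the local name.
local_tag = qualified_tag.rsplit("}", 1)[-1]

# Children live in the same default namespace, so searches must say so.
namespaces = {"a": "test attribute"}  # "a" is an arbitrary prefix
name = root.find("a:name", namespaces)

print(qualified_tag)
print(local_tag)
print(name.text)
```

A plain root.find("name") returns None for this document, because the child's real tag is {test attribute}name.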
Source
I have more than 5000 XML files in multiple sub directories named f1, f2, f3, f4,...
Each folder contains more than 200 files. At the moment I want to extract all the files using BeautifulSoup only; I have already tried lxml, ElementTree and minidom, but I am struggling to get it done with BeautifulSoup.
I can extract a single file in a subdirectory, but I am not able to get all the files through BeautifulSoup.
I have checked the below posts:
XML parsing in Python using BeautifulSoup (Extract Single File)
Parsing all XML files in directory and all subdirectories (This is minidom)
Reading 1000s of XML documents with BeautifulSoup (Unable to get the files through this post)
Here is the code which I have written to extract a single file:
from bs4 import BeautifulSoup
file = BeautifulSoup(open('./Folder/SubFolder1/file1.XML'),'lxml-xml')
print(file.prettify())
When I try to get all files in all folders I am using the below code:
from bs4 import BeautifulSoup
file = BeautifulSoup('//Folder/*/*.XML','lxml-xml')
print(file.prettify())
Then I only get the XML version declaration and nothing else. I know that I have to use a for loop, but I am not sure how to use it to parse all the files.
I know that it will be very slow, but for the sake of learning I want to use BeautifulSoup to parse all the files. If a for loop is not recommended, I would be grateful for a better solution, but only with BeautifulSoup.
Regards,
If I understood you correctly, then you do need to loop through the files, as you had already thought:
from bs4 import BeautifulSoup
from pathlib import Path
for filepath in Path('./Folder').glob('*/*.XML'):
    with filepath.open() as f:
        soup = BeautifulSoup(f,'lxml-xml')
    print(soup.prettify())
pathlib is just one approach to handling paths, on a higher level using objects. You could achieve the same with glob and string paths.
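The equivalence between the two approaches can be sketched without BeautifulSoup, using only the standard library and assuming Python 3. The temporary folder layout (f1, f2, each holding one a.XML file) is hypothetical and created on the fly. The example only collects paths; each path would then be opened and handed to the parser.

```python
import glob
import os
import tempfile
from pathlib import Path

# Build a throwaway layout: <base>/f1/a.XML and <base>/f2/a.XML.
base = Path(tempfile.mkdtemp())
for sub in ("f1", "f2"):
    (base / sub).mkdir()
    (base / sub / "a.XML").write_text("<doc/>", encoding="utf-8")

# Object-based: Path.glob yields Path objects.
via_pathlib = sorted(str(p) for p in base.glob("*/*.XML"))

# String-based: glob.glob returns plain path strings.
via_glob = sorted(glob.glob(os.path.join(str(base), "*", "*.XML")))

for path in via_pathlib:
    print(path)
```

Either list can drive the same loop; what matters is that the parser receives an opened file or its contents, not the path string itself.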
Use glob.glob to find the XML documents:
import glob
from bs4 import BeautifulSoup
for filename in glob.glob('//Folder/*/*.XML'):
    # Open each file; passing the filename string would parse the path itself.
    with open(filename) as f:
        content = BeautifulSoup(f, 'lxml-xml')
    print(content.prettify())
note: don't shadow the builtin function/class file.
Read the BeautifulSoup Quick Start