Using "x:" in an XML element name - python

I'm trying to create an XML file that needs to be sent to a server, with this format:
<x:Envelope xmlns:x="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:sen2="http://www.some_url.com">
<x:Header>
I'm working with Python 3 and the lxml library, but when trying to create the element I get an error.
Test code:
def authSendDataExtraccion():
    envelope = etree.Element("x:Envelope")
    debug_XML(envelope)
Result:
ValueError: Invalid tag name 'x:Envelope'
How can I use the ":" character in the element and attribute names?

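Put the element in the namespace by writing the tag in Clark notation ({namespace-uri}localname), and pass an nsmap that binds the x prefix to the same URI. The nsmap alone only declares the prefix; it does not place the element in the namespace:
envelope = etree.Element("{http://schemas.xmlsoap.org/soap/envelope/}Envelope", nsmap={'x': 'http://schemas.xmlsoap.org/soap/envelope/'})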
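The same "{uri}localname" tag form can be tried without lxml. The sketch below swaps in the standard library's xml.etree.ElementTree, which has no nsmap argument, so the prefix is bound with register_namespace() instead. The Header child is only there to mirror the format in the question.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Bind the "x" prefix to the SOAP namespace for serialization
# (stdlib counterpart of lxml's nsmap argument).
ET.register_namespace("x", SOAP_NS)

# Clark notation: "{namespace-uri}localname" instead of "x:Envelope".
envelope = ET.Element("{%s}Envelope" % SOAP_NS)
ET.SubElement(envelope, "{%s}Header" % SOAP_NS)

xml_text = ET.tostring(envelope, encoding="unicode")
print(xml_text)
```

The serializer writes the x: prefix itself; the code never puts a colon in a tag name.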

Related

Extracting a substring from a str in Python

I have to extract a date from a str. The value comes from an XML file, but the tag I need is nested inside another tag, which doesn't allow me to extract the date directly. So I fetch the outer tag, convert it to a str, and then try to pull the date out with a regular expression. When I convert the XML object to a str and search it with re, I get an error. Here's my code:
valor_uno = datos.getElementsByTagName("cbc:Description")[0]
vlr = str(valor_uno)
valor = re.search('<cbc:DueDate>(.+?)</cbc:DueDate>', vlr).group(1)
print(valor)
With that code, I get the following error:
AttributeError: 'NoneType' object has no attribute 'group'
This is a small piece of the XML, showing where the cbc:DueDate tag sits:
<cbc:IssueDate>2022-06-16</cbc:IssueDate>
<cbc:IssueTime>17:44:00-05:00</cbc:IssueTime>
<cbc:DueDate>2022-07-16</cbc:DueDate>
<cbc:InvoiceTypeCode>01</cbc:InvoiceTypeCode>
How can I extract the date (2022-07-16)?
EDIT: This is roughly the XML format I have:
<cac:Attachment>
<cbc:Description>
......
<cbc:IssueDate>2022-06-16</cbc:IssueDate>
<cbc:IssueTime>17:44:00-05:00</cbc:IssueTime>
<cbc:DueDate>2022-07-16</cbc:DueDate>
<cbc:InvoiceTypeCode>01</cbc:InvoiceTypeCode>
......
</cbc:Description>
</cac:Attachment>

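One likely cause of the 'NoneType' error above: calling str() on an xml.dom.minidom element returns its repr (something like "<DOM Element: cbc:Description at 0x...>"), not its markup, so the regular expression has nothing to match. The sketch below assumes datos is a minidom document and that cbc:DueDate is a real child element of cbc:Description. The namespace URIs are placeholders invented for the example.

```python
import re
from xml.dom import minidom

# Hypothetical sample; the urn:example:* namespace URIs are placeholders.
SAMPLE = """<cac:Attachment xmlns:cac="urn:example:cac" xmlns:cbc="urn:example:cbc">
  <cbc:Description>
    <cbc:IssueDate>2022-06-16</cbc:IssueDate>
    <cbc:DueDate>2022-07-16</cbc:DueDate>
  </cbc:Description>
</cac:Attachment>"""

datos = minidom.parseString(SAMPLE)
valor_uno = datos.getElementsByTagName("cbc:Description")[0]

# str() gives the node's repr, not its XML, so the regex finds nothing there.
as_repr = str(valor_uno)

# Option 1: serialize the node with toxml() and run the regex on that.
as_markup = valor_uno.toxml()
match = re.search(r"<cbc:DueDate>(.+?)</cbc:DueDate>", as_markup)
due_date_regex = match.group(1) if match else None

# Option 2: skip the regex and walk the DOM.
due_nodes = valor_uno.getElementsByTagName("cbc:DueDate")
due_date_dom = due_nodes[0].firstChild.data if due_nodes else None

print(due_date_regex, due_date_dom)
```

If the description actually holds the inner XML as text or CDATA rather than as child elements, only the regex route applies, and checking that match is not None before calling group() avoids the AttributeError.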
Adding a value to a SubElement in ElementTree (Python)

I am looking for a supported way in Python to pass a value as the text of a sub-element. For example, I have the URL below, which I set up as a global because it is referenced in many places.
global appurl
appurl = 'http://%s/adminapi/application' % ipaddr
Now I need to achieve an XML format of this type in Python:
<application name="TEST">
<refURL>http://<ipaddr>/adminapi/application/TEST</refURL>
</application>
In Python I wrote something like this:
application_e = etree.SubElement(doc,'application', name='TEST')
refURL_e = etree.SubElement(application_e,'refURL')
Application = "TEST"
ApplicationURL = "appurl/%s" % Application
refURL_e.text = ApplicationURL
How do I append the value of Application (i.e. TEST) to the appurl I defined globally, and assign the result as the text of refURL, which is a sub-element of application?
ApplicationURL = "{}/{}".format(appurl, Application) worked for me. The original line put the literal text "appurl" inside the string instead of the variable's value. I just had to declare appurl as a global.
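A minimal end-to-end sketch of that fix, using the standard library's ElementTree. The IP address, the applications root element and the build_application helper are made up for the example. Note that a function only needs a global statement to assign to a module-level name; reading appurl works without it.

```python
import xml.etree.ElementTree as ET

ipaddr = "192.0.2.10"  # hypothetical address for the example
appurl = "http://%s/adminapi/application" % ipaddr  # module-level, read-only here

def build_application(doc, name):
    """Append <application name=...><refURL>...</refURL></application> to doc."""
    application_e = ET.SubElement(doc, "application", name=name)
    ref_url_e = ET.SubElement(application_e, "refURL")
    # Interpolate the variable's value, not the literal text "appurl".
    ref_url_e.text = "{}/{}".format(appurl, name)
    return application_e

doc = ET.Element("applications")  # hypothetical root element
build_application(doc, "TEST")
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```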

How to extract text from element where the parent element is having attribute style="display:none" using Selenium

I want to extract the phone number from this div. The div has style="display:none", so I cannot access its children. Please help me get the phone number from the div.
I guess we need to change display:none to visibility:visible. How can I do this in Python Selenium?
Edit
I have tried the code below, as suggested in the first answer, but it throws the error shown after it:
email_div = browser.find_element_by_class_name("returnemail")
email_div_contents = browser.execute_script('return arguments[0].innerHTML', email_div)
telephone = email_div_contents.find_element_by_class_name('reply-tel-number').get_attribute('textContent')
AttributeError: 'str' object has no attribute 'find_element_by_class_name'
As per the documentation, execute_script() returns:
The command's JSON response loaded into a dictionary object.
In this case the script returns arguments[0].innerHTML, so the value that comes back is a plain Python str holding the element's markup, not a WebElement. Hence, when you invoke find_element_by_class_name() on that result as follows:
email_div_contents.find_element_by_class_name('reply-tel-number').get_attribute('textContent')
the following error is raised:
AttributeError: 'str' object has no attribute 'find_element_by_class_name'
To remove the style="display:none" attribute from the desired element and extract the phone number, you can use the following solution:
element = driver.find_element_by_xpath("//div[@class='returnemail js-only']")
driver.execute_script("arguments[0].removeAttribute('style')", element)
tel_number = element.find_element_by_xpath("./aside/ul//li//p[@class='reply-tel-number']").get_attribute("innerHTML")
The problem in your code is here:
email_div = browser.find_element_by_class_name("returnemail")
email_div_contents = browser.execute_script('return arguments[0].innerHTML', email_div)
email_div_contents.find_element_by_class_name()
email_div_contents is a string containing the HTML of email_div, not a web element, and you can't call find_element_by_class_name() on a string.
That's why you got the error:
'str' object has no attribute 'find_element_by_class_name'
You can always call get_attribute() to fetch an attribute value, whether the element is visible or not.
To get the text content of an invisible element, you can use get_attribute('innerText'):
phone_number = driver.find_element_by_css_selector("div.returnemail .reply-tel-number").get_attribute('innerText')
Roughly speaking, element.text behaves like element.get_attribute('innerText'), except that it respects what the user can see: if the element is not visible on the page, element.text returns an empty string, even when element.get_attribute('innerText') returns a non-empty one. Conceptually:
# Simplified illustration of the behaviour, not Selenium's actual source
@property
def text(self):
    if self.is_displayed():
        return self.get_attribute('innerText')
    else:
        return ''

Retrieve XML parent and child attributes using Python and lxml

I'm trying to process an XML file using XPath in Python / lxml.
I can pull out the values at a particular level of the tree using this code:
from lxml import etree

file_name = input('Enter the file name, including .xml extension: ')  # User inputs file name
print('Parsing ' + file_name)
parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
r = tree.xpath('/dataimport/programmelist/programme')
print(len(r))
with open(file_name + '.log', 'w', encoding='utf-8') as f:
    for r in tree.xpath('/dataimport/programmelist/programme'):
        progid = r.get("id")
        print(progid)
It returns a list of values as expected. I also want to return the value of a 'child' (where it exists), but I can't work out how (I can only get it to work as a separate list, but I need to maintain the link between them).
Note: I will be writing the values out to a log file, but since I haven't been successful in getting everything out that I want, I haven't added the 'write out' code yet.
This is the structure of the XML:
<dataimport dtdversion="1.1">
<programmelist>
<programme id="eid-273168">
<imageref idref="img-1844575"/>
How can I get Python to return the id + idref?
The previous examples I have worked with had namespaces, but this file doesn't.
The xpath() method returns a list of elements, and each element has its own xpath() method, so you can call it again relative to each programme to get its idref values:
for r in tree.xpath('/dataimport/programmelist/programme'):
    progid = r.get("id")
    ref_list = r.xpath('imageref/@idref')
    print(progid, ref_list)
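To keep each id linked to its idref (or to None where there is no imageref child), the pairs can be collected in a single pass. This sketch swaps in the standard library's ElementTree so it runs without lxml. The stdlib has no xpath() and no attribute steps such as @idref, so it uses find() plus get() instead. The second programme element is invented to show the missing-child case.

```python
import xml.etree.ElementTree as ET

# Sample based on the structure in the question; the second
# <programme> (no imageref child) is a hypothetical addition.
SAMPLE = """<dataimport dtdversion="1.1">
  <programmelist>
    <programme id="eid-273168">
      <imageref idref="img-1844575"/>
    </programme>
    <programme id="eid-000000"/>
  </programmelist>
</dataimport>"""

root = ET.fromstring(SAMPLE)

pairs = []
for programme in root.findall("programmelist/programme"):
    imageref = programme.find("imageref")  # None when the child is absent
    idref = imageref.get("idref") if imageref is not None else None
    pairs.append((programme.get("id"), idref))

for progid, idref in pairs:
    print(progid, idref)
```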

Python - ElementTree- cannot use absolute path on element

I'm getting this error in ElementTree when I try to run the code below:
SyntaxError: cannot use absolute path on element
My XML document looks like this:
<Scripts>
<Script>
<StepList>
<Step>
<StepText>
</StepText>
<StepText>
</StepText>
</Step>
</StepList>
</Script>
</Scripts>
Code:
import xml.etree.ElementTree as ET

def search():
    root = ET.parse(INPUT_FILE_PATH)
    for target in root.findall("//Script"):
        print target.attrib['name']
        print target.findall("//StepText")
I'm on Python 2.6 on Mac. Am I using XPath syntax wrong?
Basically I want to show each Script element's name attribute if it contains a StepText element with certain text.
It turns out I needed to say target.findall(".//StepText"). A path that starts with "/" is treated as absolute, and an Element only accepts paths relative to itself, so the leading "." is required.
Updated working code:
def search():
    root = ET.parse(INPUT_FILE_PATH)
    for target in root.findall("//Script"):
        stepTexts = target.findall(".//StepText")
        for stepText in stepTexts:
            if FIND.lower() in stepText.text.lower():
                print target.attrib['name'],' -- ',stepText.text
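For reference, here is a Python 3 sketch of the same search. The sample XML, the name attributes and the FIND value are made up for the example. It also guards against empty StepText elements, whose .text is None, and shows that an absolute path on an Element still raises the SyntaxError from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the document in the question.
SAMPLE = """<Scripts>
  <Script name="login">
    <StepList><Step>
      <StepText>Open the page</StepText>
      <StepText>Click Submit</StepText>
    </Step></StepList>
  </Script>
  <Script name="empty">
    <StepList><Step><StepText></StepText></Step></StepList>
  </Script>
</Scripts>"""

FIND = "submit"  # hypothetical search term

def search(root, needle):
    """Return (script name, step text) pairs whose text contains needle."""
    hits = []
    for script in root.iter("Script"):
        # ".//" searches the descendants of this element only.
        for step_text in script.findall(".//StepText"):
            text = step_text.text or ""  # empty elements have text None
            if needle.lower() in text.lower():
                hits.append((script.get("name"), text))
    return hits

root = ET.fromstring(SAMPLE)
hits = search(root, FIND)
print(hits)

# An absolute path on an Element is rejected.
try:
    root.findall("//StepText")
    absolute_path_error = None
except SyntaxError as exc:
    absolute_path_error = str(exc)
print(absolute_path_error)
```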
