I'm trying to extract text data from this xml file but I don't know why my code not working. How do I get this phone number? Please have a look at this XML file and my code format as well.I'm trying to extract data from this tag Thank you in advance :)
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:voc="urn:hl7-org:v3/voc" xmlns:sdtc="urn:hl7-org:sdtc" xsi:schemaLocation="CDA.xsd">
<realmCode code="US"/>
<languageCode code="en-US"/>
<recordTarget>
<patientRole>
<addr use="HP">
<streetAddressLine>3345 Elm Street</streetAddressLine>
<city>Aurora</city>
<state>CO</state>
<postalCode>80011</postalCode>
<country>US</country>
</addr>
<telecom value="tel:+1(303)-554-8889" use="HP"/>
<patient>
<name use="L">
<given>Janson</given>
<given>J</given>
<family>Example</family>
</name>
</patient>
</patientRole>
</recordTarget>
</ClinicalDocument>
Here is my python code
import xml.etree.ElementTree as ET
tree = ET.parse('country.xml')
root = tree.getroot()
print(root)
for country in root.findall('patientRole'):
number = country.get('telecom')
print(number)
Your XML document has namespace specified, so it becomes something like:
for country in tree.findall('.//{urn:hl7-org:v3}patientRole'):
number = country.find('{urn:hl7-org:v3}telecom').attrib['value']
print(number)
Output:
tel:+1(303)-554-8889
Related
My xml file:
<?xml version='1.0' encoding='UTF-8'?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CstmrCdtTrfInitn>
<CtgyPurp>. // ---->i want to change this tag
<Cd>SALA</Cd> //-----> no change
</CtgyPurp> // ----> i want to change this tag
</CstmrCdtTrfInitn>
</Document>
I want to make a change in the xml file:
<CtgyPurp></CtgyPurp> change in <newName></newName>
I know how to change the value within a tag but not how to change/modify the tag itself with lxml
Something like this should work - note the treatment of namespaces:
from lxml import etree
ctg = """[your xml above"]"""
doc = etree.XML(ctg.encode())
ns = {"xx": "urn:iso:std:iso:20022:tech:xsd:pain.001.001.03"}
target = doc.xpath('//xx:CtgyPurp',namespaces=ns)[0]
target.tag = "newName"
print(etree.tostring(doc).decode())
Output:
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:pain.001.001.03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<CstmrCdtTrfInitn>
<newName>. // ---->i want to change this tag
<Cd>SALA</Cd> //-----> no change
</newName> // ----> i want to change this tag
</CstmrCdtTrfInitn>
</Document>
I have a scenario where the data is extracted from oracle in the form of CSV and then it should be transformed to desired XML format.
Input CSV File:
Id,SubID,Rank,Size
1,123,1,0.1
1,234,2,0.2
2,456,1,0.1
2,123,2,0.2
Expected XML output:
<AA_ITEMS>
<Id ID="1">
<SubId ID="123">
<Rank>1</Rank>
<Size>0.1</Size>
</SubId>
<SubId ID="234">
<Rank>2</Rank>
<Size>0.2</Size>
</SubId>
</Id>
<Id ID="2">
<SubId ID="456">
<Rank>1</Rank>
<Size>0.1</Size>
</SubId>
<SubId ID="123">
<Rank>2</Rank>
<Size>0.2</Size>
</SubId>
</Id>
Note: The CSV file is a daily load and contains around 150K to 200K records
Please assist. Thanks in advance
There are a couple of ways to approach it and though some people dislike building xml from a template, I believe it works best:
from itertools import groupby
from lxml import etree
csv_string = """[your csv ab0ve]
"""
#first deal with the csv
#split it into lines and discard the headers
lines = csv_string.splitlines()[1:]
#group the lines by the first character
grpfunc = lambda x: x[0]
grps = [list(group) for key, group in groupby(lines, grpfunc)]
#now convert the whole thing into xml:
xml_string = """
<AA_ITEMS>
"""
for grp in grps:
elem = f' <Id ID="{grp[0][0]}">'
for g in grp:
entry = g.split(',')
#create an entry template:
id_tmpl = f"""
<SubId ID="{entry[1]}">
<Rank>{entry[2]}</Rank>
<Size>{entry[3]}</Size>
</SubId>
"""
elem+=id_tmpl
#close elem
elem+="""</Id>
"""
xml_string+=elem
#close the xml string
xml_string += """</AA_ITEMS>"""
#finally, show that the output is well formed xml:
print(etree.tostring(etree.fromstring(xml_string)).decode())
The output should be your expected xml.
I am trying to parse following xml data from a file with python for print only the elements with tag "zip-code" with his attribute name
<response status="success" code="19"><result total-count="1" count="1">
<address>
<entry name="studio">
<zip-code>14407</zip-code>
<description>Nothing</description>
</entry>
<entry name="mailbox">
<zip-code>33896</zip-code>
<description>Nothing</description>
</entry>
<entry name="garage">
<zip-code>33746</zip-code>
<description>Tony garage</description>
</entry>
<entry name="playstore">
<url>playstation.com</url>
<description>game download</description>
</entry>
<entry name="gym">
<zip-code>33746</zip-code>
<description>Getronics NOC subnet 2</description>
</entry>
<entry name="e-cigars">
<url>vape.com/24</url>
<description>vape juices</description>
</entry>
</address>
</result></response>
The python code that I am trying to run is
from xml.etree import ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
items = root.iter('entry')
for item in items:
zip = item.find('zip-code').text
names = (item.attrib)
print(' {} {} '.format(
names, zip
))
However it fails once it gets to the items without "zip-code" tag.
How I could make this run?
Thanks in advance
As #AmitaiIrron suggested, xpath can help here.
This code searches the document for element named zip-code, and pings back to get the parent of that element. From there, you can get the name attribute, and pair with the text from zip-code element
for ent in root.findall(".//zip-code/.."):
print(ent.attrib.get('name'), ent.find('zip-code').text)
studio 14407
mailbox 33896
garage 33746
gym 33746
OR
{ent.attrib.get('name') : ent.find('zip-code').text
for ent in root.findall(".//zip-code/..")}
{'studio': '14407', 'mailbox': '33896', 'garage': '33746', 'gym': '33746'}
Your loop should look like this:
# Find all <entry> tags in the hierarchy
for item in root.findall('.//entry'):
# Try finding a <zip-code> child
zipc = item.find('./zip-code')
# If found a child, print data for it
if zipc is not None:
names = (item.attrib)
print(' {} {} '.format(
names, zipc.text
))
It's all a matter of learning to use xpath properly when searching through the XML tree.
If you have no problem using regular expressions, the following works just fine:
import re
file = open('file.xml', 'r').read()
pattern = r'name="(.*?)".*?<zip-code>(.*?)<\/zip-code>'
matches = re.findall(pattern, file, re.S)
for m in matches:
print("{} {}".format(m[0], m[1]))
and produces the result:
studio 14407
mailbox 33896
garage 33746
aystore 33746
How can i parse unstructured xml file? i need to get data inside patient tag and title using elementTree.
<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<templateId root="2.16.840.1.113883.10.20.22.1.1"/>
<id extension="4b78219a-1d02-4e7c-9870-dc7ce3b8a8fb" root="1.2.840.113619.21.1.3214775361124994304.5.1"/>
<code code="34133-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="Summarization of episode note"/>
<title>Summary</title>
<effectiveTime value="20170919160921ddfdsdsdsd31-0400"/>
<confidentialityCode code="N" codeSystem="2.16.840.dwdwddsd1.113883.5.25"/>
<recordTarget>
<patientRole><id extension="0" root="1.2.840.113619.21.1.3214775361124994304.2.1.1.2"/>
<addr use="HP"><streetAddressLine>addd2 </streetAddressLine><city>fgfgrtt</city><state>tr</state><postalCode>121213434</postalCode><country>rere</country></addr>
<patient>
<name><given>fname</given><family>lname</family></name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1" displayName="Female"/>
<birthTime value="19501025"/>
<maritalStatusCode code="M" codeSystem="2434.16.840.1.143434313883.5.2" displayName="M"/>
<languageCommunication>
<languageCode code="eng"/>
<proficiencyLevelCode nullFlavor="NI"/>
<preferenceInd value="true"/>
</languageCommunication>
</patient>
i want given name , family name , gender and title.
Using BeautifulSoup bs4 and lxml parser library to scrape xml data.
from bs4 import BeautifulSoup
xml_data = '''<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<templateId root="2.16.840.1.113883.10.20.22.1.1"/>
<id extension="4b78219a-1d02-4e7c-9870-dc7ce3b8a8fb" root="1.2.840.113619.21.1.3214775361124994304.5.1"/>
<code code="34133-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="Summarization of episode note"/>
<title>Summary</title>
<effectiveTime value="20170919160921ddfdsdsdsd31-0400"/>
<confidentialityCode code="N" codeSystem="2.16.840.dwdwddsd1.113883.5.25"/>
<recordTarget>
<patientRole><id extension="0" root="1.2.840.113619.21.1.3214775361124994304.2.1.1.2"/>
<addr use="HP"><streetAddressLine>addd2 </streetAddressLine><city>fgfgrtt</city><state>tr</state><postalCode>121213434</postalCode><country>rere</country></addr>
<patient>
<name><given>fname</given><family>lname</family></name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1" displayName="Female"/>
<birthTime value="19501025"/>
<maritalStatusCode code="M" codeSystem="2434.16.840.1.143434313883.5.2" displayName="M"/>
<languageCommunication>
<languageCode code="eng"/>
<proficiencyLevelCode nullFlavor="NI"/>
<preferenceInd value="true"/>
</languageCommunication>
</patient>'''
soup = BeautifulSoup(xml_data, "lxml")
title = soup.find("title")
print(title.text.strip())
patient = soup.find("patient")
given = patient.find("given").text.strip()
family = patient.find("family").text.strip()
gender = patient.find("administrativegendercode")['displayname'].strip()
print(given)
print(family)
print(gender)
O/P:
Summary
fname
lname
Female
Install library dependency:
pip3 install beautifulsoup4==4.7.1
pip3 install lxml==4.3.3
Or you can simply use lxml. Here is tutorial that I used: https://lxml.de/tutorial.html
But it should be similar to:
from lxml import etree
root = etree.Element("patient")
print(root.find("given"))
print(root.find("family"))
print(root.find("give"))
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record">
<AvailableCharts>
<Accelerometer>true</Accelerometer>
<Velocity>false</Velocity>
</AvailableCharts>
<Trics>
<Trick>
<EndOffset>PT2M21.835S</EndOffset>
<Values>
<TrickValue>
<Acceleration>26.505801694441629</Acceleration>
<Rotation>0.023379150593228679</Rotation>
</TrickValue>
</Values>
</Trick>
</Trics>
<Values>
<SensorValue>
<accelx>-3.593643144</accelx>
<accely>7.316485176</accely>
</SensorValue>
<SensorValue>
<accelx>0.31103436</accelx>
<accely>7.70408184</accely>
</SensorValue>
</Values>
</ArrayOfRecord>
I am only interested in 'accelx' and 'accely' value in this data and need to create a csv out of it.
Update: The code given below breaks when I change the second row with the following. Nothing is displayed because of this;
<ArrayOfRecord xmlns:i="http://www.w3.org/2001/XMLSchema-instance" i:type="Record" xmlns="http://schemas">
The following code works:
import xml.etree.ElementTree as etree
tree = etree.parse(r"C:\Users\data.xml")
root = tree.getroot()
val_of_interest = root.findall("./Values/SensorValue")
for sensor_val in val_of_interest:
print sensor_val.find('accelx').text
print sensor_val.find('accely').text