I would like to get an element from the JSON tree
dataMod = data['products'][VARIABLE]['bla']['bla']
but I ran into an issue when one of the elements in this tree has a VARIABLE inside it. Is there any clean way of skipping it? Like:
dataMod = data['products'][*]['bla']['bla']
ANSWER:
for p in data['products']:
    skipPLU = data['products'][p]
    productPLU = skipPLU['bla']['bla']
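For reference, here's a minimal runnable sketch of that loop with made-up data (the 'bla' keys are just the placeholders from the question):

data = {
    'products': {
        'plu-1': {'bla': {'bla': 'first'}},
        'plu-2': {'bla': {'bla': 'second'}},
    }
}

# Iterating over a dict yields its keys, so every product is visited
# no matter what the VARIABLE key actually is.
for p in data['products']:
    print(data['products'][p]['bla']['bla'])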
Is this what you want?
head = data['products']
for skip in head:
    for item in head[skip]:
        print(head[skip][item]['bla']['bla'])
This will get all data under data['products'][VARIABLE]
I'm iterating through a nested JSON tree with a Pandas DataFrame. The issue I'm having is more or less simple to solve, but I'm out of ideas. When I'm traversing through the nested JSON tree, I get to a part where I can't get out of it and continue on another branch (i.e. when I reach Placeholder 1 I can't return and continue with Placeholder 2; see the JSON below). Here is my code so far:
def recursiveImport(df):
    for row, _ in enumerate(df):
        # Get ID, Name, Type
        id = df['ID'].values[row]
        name = df['Name'].values[row]
        type = df['Type'].values[row]
        # Iterate through Value
        if type == 'struct':
            for i in df.at[row, 'Value']:
                df = pd.json_normalize(i)
                recursiveImport(df)
        elif type != 'struct':
            value = df['Value'].values[row]
            print(f'Value: {value}')
            return
data = pd.read_json('work_gmt.json', orient='records')
print(data)
recursiveImport(data)
And the (minified) data I'm using for this is below (you can use an online JSON viewer to get a better look):
[{"ID":11,"Name":"Data","Type":"struct","Value":[[{"ID":0,"Name":"humidity","Type":"u32","Value":0},{"ID":0,"Name":"meta","Type":"struct","Value":[{"ID":0,"Name":"height","Type":"e32","Value":[0,0]},{"ID":0,"Name":"voltage","Type":"u16","Value":0},{"ID":0,"Name":"Placeholder 1","Type":"u16","Value":0}]},{"ID":0,"Name":"Placeholder 2","Type":"struct","Value":[{"ID":0,"Name":"volume","Type":"struct","Value":[{"ID":0,"Name":"volume profile","Type":"struct","Value":[{"ID":0,"Name":"upper","Type":"u8","Value":0},{"ID":0,"Name":"middle","Type":"u8","Value":0},{"ID":0,"Name":"down","Type":"u8","Value":0}]}]}]}]]}]
I tried using an indexed approach and keeping track of each branch, but that didn't work for me. Perhaps I have to use a stack/queue to keep track? Thanks in advance!
Cheers!
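For what it's worth, one way around the dead end is to recurse over the parsed JSON itself instead of re-normalizing a DataFrame at each level; the call stack then unwinds naturally past Placeholder 1 and continues on to Placeholder 2. A minimal sketch (not the original code; it assumes the structure shown above):

import json

def walk(entries, depth=0):
    """Visit every node; recurse into 'struct' values, print leaves."""
    for entry in entries:
        # The top-level "Value" wraps its children in an extra list.
        if isinstance(entry, list):
            walk(entry, depth)
        elif entry['Type'] == 'struct':
            walk(entry['Value'], depth + 1)
        else:
            print(' ' * 2 * depth, entry['Name'], entry['Value'])

with open('work_gmt.json') as f:
    walk(json.load(f))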
I am pulling down an XML file using BeautifulSoup with this code:
dlink = r'https://www.sec.gov/Archives/edgar/data/1040188/000104018820000126/primary_doc.xml'
dreq = requests.get(dlink).content
dsoup = BeautifulSoup(dreq, 'lxml')
There is a level I'm trying to access and then place the elements into a dictionary. I've got it working with this code:
if dsoup.otherincludedmanagerscount.text != '0':
    inclmgr = []
    for i in dsoup.find_all('othermanagers2info'):
        for m in i.find_all('othermanager2'):
            for o in m.find_all('othermanager'):
                imd = {}
                if o.cik:
                    imd['cik'] = o.cik.text
                if o.form13ffilenumber:
                    imd['file_no'] = o.form13ffilenumber.text
                imd['name'] = o.find('name').text
                inclmgr.append(imd)
    comp_dict['incl_mgr'] = inclmgr
I assume it's easier to use the .children or .descendants generators, but every time I run it, I get an error. Is there a way to iterate over only tags using the BeautifulSoup generators?
Something like this?
for i in dsoup.othermanagers2info.children:
    imd['cik'] = i.cik.text
AttributeError: 'NavigableString' object has no attribute 'cik'
Assuming othermanagers2info is a single item, you can produce the same results using one for loop:
for i in dsoup.find('othermanagers2info').find_all('othermanager'):
    imd = {}
    if i.cik:
        imd['cik'] = i.cik.text
    if i.form13ffilenumber:
        imd['file_no'] = i.form13ffilenumber.text
    imd['name'] = i.find('name').text
    inclmgr.append(imd)
comp_dict['incl_mgr'] = inclmgr
You can also do for i in dsoup.find('othermanagers2info').findChildren(). However, this will produce different results (unless you add additional code): it will flatten the list and include both parent and child items. You can also pass in a node name.
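As for the AttributeError in the question: the .children generator yields the whitespace NavigableString nodes between tags as well as the tags themselves, so filtering on bs4's Tag type lets you use the generators directly (an untested sketch against the same document structure):

from bs4 import Tag

inclmgr = []
for child in dsoup.find('othermanagers2info').children:
    if not isinstance(child, Tag):
        continue  # skip the whitespace NavigableStrings between tags
    mgr = child.find('othermanager')  # child is an <othermanager2>
    imd = {}
    if mgr.cik:
        imd['cik'] = mgr.cik.text
    if mgr.form13ffilenumber:
        imd['file_no'] = mgr.form13ffilenumber.text
    imd['name'] = mgr.find('name').text
    inclmgr.append(imd)
comp_dict['incl_mgr'] = inclmgr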
I am currently working on a script to scrape data from ClinicalTrials.gov. To do this I have written the following script:
def clinicalTrialsGov(id):
    url = "https://clinicaltrials.gov/ct2/show/" + id + "?displayxml=true"
    data = BeautifulSoup(requests.get(url).text, "lxml")
    studyType = data.study_type.text
    if studyType == 'Interventional':
        allocation = data.allocation.text
        interventionModel = data.intervention_model.text
        primaryPurpose = data.primary_purpose.text
        masking = data.masking.text
    enrollment = data.enrollment.text
    officialTitle = data.official_title.text
    condition = data.condition.text
    minAge = data.eligibility.minimum_age.text
    maxAge = data.eligibility.maximum_age.text
    gender = data.eligibility.gender.text
    healthyVolunteers = data.eligibility.healthy_volunteers.text
    armType = []
    intType = []
    for each in data.findAll('intervention'):
        intType.append(each.intervention_type.text)
    for each in data.findAll('arm_group'):
        armType.append(each.arm_group_type.text)
    citedPMID = tryExceptCT(data, '.results_reference.PMID')
    citedPMID = data.results_reference.PMID
    print(citedPMID)
    return officialTitle, studyType, allocation, interventionModel, primaryPurpose, masking, enrollment, condition, minAge, maxAge, gender, healthyVolunteers, armType, intType
However, this script won't always work, as not all studies will contain every item (i.e. an AttributeError will occur when an element is missing). To resolve this, I could simply wrap each statement in a try/except, like this:
try:
    studyType = data.study_type.text
except:
    studyType = ""
but that seems like a bad way to implement it. What's a better/cleaner solution?
This is a good question. Before I address it, let me say that you should consider changing the second parameter to the BeautifulSoup (BS) constructor from lxml to xml. Otherwise, BS does not flag the parsed markup as XML (to verify this for yourself, access the is_xml attribute on the data variable in your code).
You can avoid generating an error when attempting to access a non-existent element by passing a list of desired element names to the find_all() method:
subset = ['results_reference','allocation','interventionModel','primaryPurpose','masking','enrollment','eligibility','official_title','arm_group','condition']
tag_matches = data.find_all(subset)
Then, if you want to get a specific element from the list of Tags without iterating through it, you can convert it to a dict using the Tag names as keys:
tag_dict = {tag.name: tag for tag in tag_matches}
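From there, a missing element simply comes back as None from dict.get(), which avoids the per-field try/except; a small sketch of the idea (field names taken from the subset list above):

# Each lookup returns None instead of raising when the study lacks the element.
allocation_tag = tag_dict.get('allocation')
allocation = allocation_tag.text if allocation_tag is not None else ''

enrollment_tag = tag_dict.get('enrollment')
enrollment = enrollment_tag.text if enrollment_tag is not None else ''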
I am using QGraphicsWebView and trying to iterate over QWebElements. At first I tried:
frame = self.page().mainFrame()
doc = frame.documentElement()
h = frame.findFirstElement("head")
b = frame.findFirstElement("body")
elements = h.findAll("link")
for d in elements:
    print d.tagName()
So you see what I thought, but later on I found that the elements are in a QWebElementCollection, not in a list. Please help me with iterating over the DOM tree.
A QWebElement's findAll() method returns a QWebElementCollection, which can be converted to a QList instance with its toList() method. To iterate over a list of matched elements, you could use:
body_element = frame.findFirstElement("body")
for el in body_element.findAll("div").toList():
    print el.tagName()
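Alternatively, you can walk the collection by index without converting it, via QWebElementCollection's count() and at() methods (an untested sketch mirroring the C++ API):

divs = body_element.findAll("div")
for i in range(divs.count()):
    print divs.at(i).tagName()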
I'm reading instrument data from a specialty server that delivers the info in XML format. The code I've written is:
from lxml import etree as ET

xmlDoc = ET.parse('http://192.168.1.198/Bench_read.xml')
print ET.tostring(xmlDoc, pretty_print=True)

dmtCount = xmlDoc.xpath('//dmt')
print(len(dmtCount))

dmtVal = []
for i in range(1, len(dmtCount)):
    dmtVal[i:0] = xmlDoc.xpath('./address/text()')
    dmtVal[i:1] = xmlDoc.xpath('./status/text()')
    dmtVal[i:2] = xmlDoc.xpath('./flow/text()')
    dmtVal[i:3] = xmlDoc.xpath('./dp/text()')
    dmtVal[i:4] = xmlDoc.xpath('./inPressure/text()')
    dmtVal[i:5] = xmlDoc.xpath('./actVal/text()')
    dmtVal[i:6] = xmlDoc.xpath('./temp/text()')
    dmtVal[i:7] = xmlDoc.xpath('./valveOnPercent/text()')
print dmtVal
And the results I get are:
$python XMLparse2.py
<response>
<heartbeat>0x24</heartbeat>
<dmt node="1">
<address>0x21</address>
<status>0x01</status>
<flow>0.000000</flow>
<dp>0.000000</dp>
<inPressure>0.000000</inPressure>
<actVal>0.000000</actVal>
<temp>0x00</temp>
<valveOnPercent>0x00</valveOnPercent>
</dmt>
<dmt node="2">
<address>0x32</address>
<status>0x01</status>
<flow>0.000000</flow>
<dp>0.000000</dp>
<inPressure>0.000000</inPressure>
<actVal>0.000000</actVal>
<temp>0x00</temp>
<valveOnPercent>0x00</valveOnPercent>
</dmt>
</response>
...Starting to parse XML nodes
2
[]
...Done
Sooo, nothing is coming out. I've tried using /value in place of the /text() in the xpath call, but the results are unchanged. Is my problem:
1) An incorrect xpath command in the for loop? or
2) A problem in the way I've structured list variable dmtVal ? or
3) Something else I'm missing completely?
I'd welcome any suggestions! Thanks in advance...
dmtVal[i:0] is the syntax for slicing.
You probably wanted indexing: dmtVal[i][0]. But that also wouldn't work.
You don't typically loop over the indices of a list in Python; you loop over its elements instead.
So, you'd use
for element in some_list:
rather than
for i in xrange(len(some_list)):
    element = some_list[i]
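And on the rare occasion you need the index as well, enumerate() gives you both at once:

for i, element in enumerate(some_list):
    print i, element  # index and value together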
The way you handle your xpaths is also wrong.
Something like this should work (not tested):
from lxml import etree as ET

xml_doc = ET.parse('http://192.168.1.198/Bench_read.xml')
dmts = xml_doc.xpath('//dmt')

dmt_val = []
for dmt in dmts:
    values = []
    values.append(dmt.xpath('./address/text()'))
    # do this for all values
    # making this a loop would be a good idea
    dmt_val.append(values)
print dmt_val
Counting <dmt/> tags and then iterating over them by index is both inefficient and un-Pythonic. Apart from that, you are using the wrong syntax (a slice instead of an index) for indexing lists. In fact, you don't need to index the values at all; to do it the Pythonic way, use list comprehensions.
Here's a slightly modified version of what stranac suggested:
from lxml import etree as ET
xmlDoc = ET.parse('http://192.168.1.198/Bench_read.xml')
print ET.tostring(xmlDoc, pretty_print=True)
response = xmlDoc.getroot()
tags = (
    'address',
    'status',
    'flow',
    'dp',
    'inPressure',
    'actVal',
    'temp',
    'valveOnPercent',
)
dmtVal = []
for dmt in response.iter('dmt'):
    val = [dmt.xpath('./%s/text()' % tag) for tag in tags]
    dmtVal.append(val)
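Note that xpath() always returns a list, so each entry of val above is itself a (possibly empty) list of strings. If you'd rather have plain strings, findtext() returns the first match's text directly; a sketch of that variation:

for dmt in response.iter('dmt'):
    # findtext() gives the text of the first matching child,
    # or the default when the tag is absent
    val = [dmt.findtext(tag, default='') for tag in tags]
    dmtVal.append(val)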
Can you explain this:
dmtVal[i:0]
If the iteration starts with a count of 0 and increments over time, you're not actually storing anything in the list.
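For the record, slice assignment does insert items; the list stayed empty because the relative XPath matched nothing from the document root, so an empty list was spliced in on every pass. A quick interpreter session shows both cases:

>>> lst = []
>>> lst[1:0] = ['0x21']   # slice assignment inserts the items
>>> lst
['0x21']
>>> lst[2:0] = []         # xpath() returned [], so nothing is added
>>> lst
['0x21']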