Extract and plot XML data using Python

I'm trying to analyze my XML file. I would like to extract the X, Y and Z data for advanced analysis later, and then plot all the values.
Here is what the XML file looks like:
<UserPosition>
<X>-12.2934008394709</X>
<Y>52.488259963403273</Y>
<Z>-0.92276278637695341</Z>
</UserPosition>
This is my code:
from lxml import etree
import matplotlib.pyplot as plt
import numpy as np
# Read xml files
PostX = []
PostY = []
Thikness = []
tree = etree.parse("XMLFILE.xml")
for UserPosition in tree.xpath("/cResult/measure/lMeasuredItem/cMeasureItem/UserPosition/X"):
    PostX.append(UserPosition.text)
print PostX
I'm getting this:
['-12.2934008394709', '-9.1133008238197366', '-5.9329608027622784', '-2.7523007917339029',
Any help to get proper values for analysis?

Is there any reason you can't change
PostX.append(UserPosition.text)
to
PostX.append(float(UserPosition.text))
Otherwise, it would be helpful to see how all your X, Y and Z values (or at least more of them) are structured in this .xml file.
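If the whole file follows the structure implied by your XPath, a minimal sketch for pulling all three coordinates out as floats and plotting them might look like this (the plot call is just one plausible choice; adjust to whatever analysis you have in mind):
from lxml import etree
import matplotlib.pyplot as plt

tree = etree.parse("XMLFILE.xml")
base = "/cResult/measure/lMeasuredItem/cMeasureItem/UserPosition"
# convert each text node to float so the values are usable numerically
post_x = [float(e.text) for e in tree.xpath(base + "/X")]
post_y = [float(e.text) for e in tree.xpath(base + "/Y")]
thickness = [float(e.text) for e in tree.xpath(base + "/Z")]

plt.plot(post_x, post_y)   # or plt.plot(thickness), depending on the analysis
plt.show()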

Python XML handling. Subset Etree / remove elements / split etree in two parts based on tags

I have an issue where I loop through an XML response and populate lists, but once in a while the response has unexpected data which needs to be treated differently. I am struggling to split the relevant data into two categories. The main tag here, "familierelasjon", can in theory repeat indefinitely.
I'd like to have two etrees as output: one containing all data where the parent "familierelasjon" has the child "relatertPersonUtenFolkeregisteridentifikator", and one etree where it does not. See the XML example at the end.
I'm using the data to populate lists that are appended, and since the data format and content for the two cases are different, the irregular responses kill my logic. The lists would generate a "Length of values (x) does not match length of index (y)" error. If I were able to split the data before looping through, I could write two different logics for handling the data based on which tree I'm processing. Is there a simple way to do this? The BASE64 string contains the XML example data attached below.
Simplified code just for some context and the xml example data:
import base64 # Encoding/decoding Base64
import pandas as pd # Create and manipulate tables/dataframes
#import numpy as np # Formatting and array operations
from lxml import etree # Xml parsing
#from datetime import datetime as datetime
#import datetime as dt
#from dateutil.relativedelta import relativedelta
enc_family_str_main = str('ICAgIDxmYW1pbGllcmVsYXNqb24+CiAgICAJPHJlbGF0ZXJ0UGVyc29uVXRlbkZvbGtlcmVnaXN0ZXJpZGVudGlmaWthdG9yPgogICAgCSAgPG5hdm4+CiAgICAJCTxmb3JuYXZuPkNsb3duLUNsb3duPC9mb3JuYXZuPgogICAgCQk8ZXR0ZXJuYXZuPkNsb3duaWU8L2V0dGVybmF2bj4KICAgIAkgIDwvbmF2bj4KICAgIAkgIDxmb2Vkc2Vsc2RhdG8+MjAyMi0wMS0wMTwvZm9lZHNlbHNkYXRvPgogICAgCSAgPHN0YXRzYm9yZ2Vyc2thcD5GVUtPRjwvc3RhdHNib3JnZXJza2FwPgogICAgCTwvcmVsYXRlcnRQZXJzb25VdGVuRm9sa2VyZWdpc3RlcmlkZW50aWZpa2F0b3I+CiAgICAJPHJlbGF0ZXJ0UGVyc29uc1JvbGxlPmVrdGVmZWxsZUVsbGVyUGFydG5lcjwvcmVsYXRlcnRQZXJzb25zUm9sbGU+CiAgICAgIDwvZmFtaWxpZXJlbGFzam9uPgogICAgICA8ZmFtaWxpZXJlbGFzam9uPgogICAgCTxyZWxhdGVydFBlcnNvbj4xMjM0NTY3ODkxMDwvcmVsYXRlcnRQZXJzb24+CiAgICAJPHJlbGF0ZXJ0UGVyc29uc1JvbGxlPmJhcm48L3JlbGF0ZXJ0UGVyc29uc1JvbGxlPgogICAgCTxtaW5Sb2xsZUZvclBlcnNvbj5mYXI8L21pblJvbGxlRm9yUGVyc29uPgogICAgICA8L2ZhbWlsaWVyZWxhc2pvbj4=')
"""
######## Parse and create loopable xml trees ########
"""
dec_family_str_main = base64.b64decode(enc_family_str_main + '='*(-len(enc_family_str_main) % 4))
parser = etree.XMLParser(recover=True)
# DSF children info main
tree_main = etree.fromstring(dec_family_str_main, parser=parser)
full_ssn = []
applicant = []
relation = []
#populate lists by looping through xml tree data
for element in tree_main.iter():
    if element.tag == 'relatertPerson':
        full_ssn.append(element.text)
        applicant.append('Main')
    if element.tag == 'relatertPersonsRolle':
        relation.append(element.text)
#if co_applicant == 1:
#    for element in tree_co.iter():
#        if element.tag == 'relatertPerson':
#            full_ssn.append(element.text)
#            applicant.append('Co')
#        if element.tag == 'relatertPersonsRolle':
#            relation.append(element.text)
#create dataframe and fill with list data
df = pd.DataFrame()
df['applicant'] = applicant
df['SSN'] = full_ssn
df['relation'] = relation
XML example simplified:
<familierelasjon>
  <relatertPersonUtenFolkeregisteridentifikator>
    <navn>
      <fornavn>Clown-Clown</fornavn>
      <etternavn>Clownie</etternavn>
    </navn>
    <foedselsdato>2022-01-01</foedselsdato>
    <statsborgerskap>FUKOF</statsborgerskap>
  </relatertPersonUtenFolkeregisteridentifikator>
  <relatertPersonsRolle>ektefelleEllerPartner</relatertPersonsRolle>
</familierelasjon>
<familierelasjon>
  <relatertPerson>12345678910</relatertPerson>
  <relatertPersonsRolle>barn</relatertPersonsRolle>
  <minRolleForPerson>far</minRolleForPerson>
</familierelasjon>
I'd like the following output, so that I have two subsets of trees that I can run my iteration logic on.
Tree1:
<familierelasjon>
  <relatertPersonUtenFolkeregisteridentifikator>
    <navn>
      <fornavn>Clown-Clown</fornavn>
      <etternavn>Clownie</etternavn>
    </navn>
    <foedselsdato>2022-01-01</foedselsdato>
    <statsborgerskap>FUKOF</statsborgerskap>
  </relatertPersonUtenFolkeregisteridentifikator>
  <relatertPersonsRolle>ektefelleEllerPartner</relatertPersonsRolle>
</familierelasjon>
+ all other <familierelasjon> elements that contain <relatertPersonUtenFolkeregisteridentifikator>
Tree2:
<familierelasjon>
  <relatertPerson>12345678910</relatertPerson>
  <relatertPersonsRolle>barn</relatertPersonsRolle>
  <minRolleForPerson>far</minRolleForPerson>
</familierelasjon>
+ all other <familierelasjon> elements that do NOT contain <relatertPersonUtenFolkeregisteridentifikator>
PS I am stupid and not a programmer. I just type stuff and see what happens.
If I understand you correctly, this will get you what you want. Note that, instead of ElementTree, I'm using lxml because of its better XPath support.
Assuming your sample XML is this:
fams = """<root>
  <familierelasjon>
    <relatertPersonUtenFolkeregisteridentifikator>
      <navn>
        <fornavn>Clown1-Clown1</fornavn>
        <etternavn>Clownie1</etternavn>
      </navn>
      <foedselsdato>2022-01-01</foedselsdato>
      <statsborgerskap>FUKOF1</statsborgerskap>
    </relatertPersonUtenFolkeregisteridentifikator>
    <relatertPersonsRolle>ektefelleEllerPartner1</relatertPersonsRolle>
  </familierelasjon>
  <familierelasjon>
    <relatertPerson>12345678910</relatertPerson>
    <relatertPersonsRolle>barn1</relatertPersonsRolle>
    <minRolleForPerson>far1</minRolleForPerson>
  </familierelasjon>
  <familierelasjon>
    <relatertPersonUtenFolkeregisteridentifikator>
      <navn>
        <fornavn>Clown2-Clown2</fornavn>
        <etternavn>Clownie2</etternavn>
      </navn>
      <foedselsdato>3022-01-01</foedselsdato>
      <statsborgerskap>FUKOF2</statsborgerskap>
    </relatertPersonUtenFolkeregisteridentifikator>
    <relatertPersonsRolle>ektefelleEllerPartner2</relatertPersonsRolle>
  </familierelasjon>
  <familierelasjon>
    <relatertPerson>1111111</relatertPerson>
    <relatertPersonsRolle>barn2</relatertPersonsRolle>
    <minRolleForPerson>far2</minRolleForPerson>
  </familierelasjon>
</root>"""
You can use the following:
from lxml import etree

tree1 = etree.fromstring('<root></root>')
tree2 = etree.fromstring('<root></root>')
# you need a root element for a well-formed xml file
source = etree.fromstring(fams)
# first run will get you elements where the parent "familierelasjon"
# has the child "relatertPersonUtenFolkeregisteridentifikator"
for fr in source.xpath('//familierelasjon[relatertPersonUtenFolkeregisteridentifikator]'):
    tree1.append(fr)
# second run will get you the opposite
for nfr in source.xpath('//familierelasjon[not(.//relatertPersonUtenFolkeregisteridentifikator)]'):
    tree2.append(nfr)
print(etree.tostring(tree1).decode())
print('----------------------')
print(etree.tostring(tree2).decode())
The output should match your expected output.
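One caveat worth adding: in lxml, append moves an element that already has a parent, so after these two loops the matched familierelasjon nodes are gone from source. If you need the source tree to stay intact, copy each node first; a minimal tweak:
import copy

for fr in source.xpath('//familierelasjon[relatertPersonUtenFolkeregisteridentifikator]'):
    tree1.append(copy.deepcopy(fr))  # copy, so source keeps its own node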

Converting pixels into wavelength using 2 FITS files

I am new to Python and FITS image files, so I am running into issues. I have two FITS files: the first is pixels/counts and the second (the calibration file) is pixels/wavelength. I need to convert pixels/counts into wavelength/counts, and once this is done, output wavelength/counts as a new FITS file for further analysis. So far I have managed to put the required data into arrays, as shown in the code below.
import numpy as np
from astropy.io import fits
# read the images
image_file = ("run_1.fits")
image_calibration = ("cali_1.fits")
hdr = fits.getheader(image_file)
hdr_c = fits.getheader(image_calibration)
# print headers
sp = fits.open(image_file)
print('\n\nHeader of the spectrum :\n\n', sp[0].header, '\n\n')
sp_c = fits.open(image_calibration)
print('\n\nHeader of the spectrum :\n\n', sp_c[0].header, '\n\n')
# generation of arrays with the wavelengths and counts
count = np.array(sp[0].data)
wave = np.array(sp_c[0].data)
I do not understand how to save two separate arrays into one FITS file. I tried an alternative approach by creating a list, as shown in this code:
file_list = fits.open(image_file)
calibration_list = fits.open(image_calibration)
image_data = file_list[0].data
calibration_data = calibration_list[0].data
# make a list to hold images
img_list = []
img_list.append(image_data)
img_list.append(calibration_data)
# list to numpy array
img_array = np.array(img_list)
# save the array as fits - image cube
fits.writeto('mycube.fits', img_array)
However, I could only save it as a cube, which is not correct because I just need the wavelength and counts data. Also, I lost all the headers in the newly created FITS file. To say I am lost is an understatement! Could someone point me in the right direction please? Thank you.
I am still working on this problem. I have now managed (I think) to produce a FITS file containing the wavelength and counts using this website:
https://www.mubdirahman.com/assets/lecture-3---numerical-manipulation-ii.pdf
This is my code:
# Making a Primary HDU (required):
primaryhdu = fits.PrimaryHDU(flux)  # makes a header
# or, if you have a header that you've created:
# primaryhdu = fits.PrimaryHDU(arr1, header=head1)
# If you have additional extensions:
secondhdu = fits.ImageHDU(wave)
# Making a new HDU List:
hdulist1 = fits.HDUList([primaryhdu, secondhdu])
# Writing the file:
hdulist1.writeto("filename.fits", overwrite=True)
image = ("filename.fits")
hdr = fits.open(image)
image_data = hdr[0].data
wave_data = hdr[1].data
I am sure this is not the correct format for wavelength/counts. I need both wavelength and counts to be contained in hdr[0].data
If you are working with spectral data, it might be useful to look into specutils which is designed for common tasks associated with reading/writing/manipulating spectra.
It's common to store spectral data in FITS files using tables, rather than images. For example you can create a table containing wavelength, flux, and counts columns, and include the associated units in the column metadata.
The docs include an example of how to create a generic "FITS table" writer with wavelength and flux columns. You could start from this example and modify it to suit your exact needs (which can vary quite a bit from case to case, which is probably why a "generic" FITS writer is not built in).
You might also be able to use the fits-wcs1d format.
If you prefer not to use specutils, that example still might be useful as it demonstrates how to create an Astropy Table from your data and output it to a well-formatted FITS file.
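To make the table suggestion concrete, here is a minimal sketch using astropy.table, assuming wave and count from your code can be flattened into two 1-D arrays of equal length (the unit string is a placeholder; use whatever your calibration actually produces):
from astropy.table import Table
from astropy.io import fits

# assuming `wave` and `count` are matching 1-D arrays
spec = Table([wave, count], names=('wavelength', 'counts'))
spec['wavelength'].unit = 'Angstrom'  # placeholder; set the real unit

spec.write('spectrum_table.fits', format='fits', overwrite=True)

# reading it back: both columns live in the same table HDU (extension 1)
with fits.open('spectrum_table.fits') as hdul:
    data = hdul[1].data
    print(data['wavelength'][:5], data['counts'][:5])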

How to modify a set of concatenated traces in one file to a set of

I have a set of traces in one folder Folder_Traces:
Trace1.npy
Trace2.npy
Trace3.npy
Trace4.npy
...
In my code, I must concatenate all traces and put them in one file. Each trace is a table. The big file where I put all my traces is a table containing a set of tables. This file looks like this: All_Traces=[[Trace1],[Trace2],[Trace3],...[Tracen]]
import numpy as np
import matplotlib.pyplot as plt
sbox=( 0x63,0x7c,0x77,0x7b,0xf2,0x6b..........)
hw = [bin(x).count("1") for x in range(256)]
print (sbox)
print ([hw[s] for s in sbox])
# Start calculating template
# 1: load data
tempTraces = np.load(r'C:\\Users\\user\\2016.06.01-09.41.16_traces.npy')
tempPText = np.load(r'C:\\Users\\user\\2016.06.01-09.41.16_textin.npy')
tempKey = np.load(r'C:\\Users\\user\\2016.06.01-09.41.16_keylist.npy')
print (tempPText)
print (len(tempPText))
print (tempKey)
print (len(tempKey))
plt.plot(tempTraces[0])
plt.show()
tempSbox = [sbox[tempPText[i][0] ^ tempKey[i][0]] for i in range(len(tempPText))]
print (sorted(tempSbox))
So, what I need is to use all my trace files without concatenation, because concatenation causes many memory problems. In other words, I want to replace this line: tempTraces = np.load(r'C:\\Users\\user\\2016.06.01-09.41.16_traces.npy') with the path to my folder directly, then load each trace and run the necessary analysis. How can I do that?
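A minimal sketch of the per-file loading pattern the question describes (the folder path is hypothetical; adjust it to the real Folder_Traces location):
import numpy as np
from pathlib import Path

trace_dir = Path(r'C:\Users\user\Folder_Traces')  # hypothetical folder path

for trace_file in sorted(trace_dir.glob('*.npy')):
    # mmap_mode='r' maps the file lazily instead of reading it all into RAM,
    # avoiding the memory problems of one big concatenated array
    trace = np.load(trace_file, mmap_mode='r')
    # ... per-trace analysis goes here, e.g. plt.plot(trace) ...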

Python input from a <pre> tag

So I am writing some code in Python 2.7 to pull some information from a website, pull the relevant data from that set, and then format that data in a way that is more useful. Specifically, I want to take information from an HTML <pre> tag, put it into a file, turn the information in the file into an array (using numpy), and then do my analysis from that. I am stuck on the "put into a file" part. It seems that when I put it into a file, it ends up as a 1x1 matrix or something, and so it won't do what I hope it will. On an attempt previous to the code sample below, the error I got was: IndexError: index 5 is out of bounds for axis 0 with size 0. I had the index on the array just to test whether it would produce output from what I have so far.
Here is my code so far:
#Pulling data from GFS lamps
from lxml import html
import requests
import numpy as np
ICAO = raw_input("What station would you like GFS lamps data for? ")
page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') #stores class of //pre html element in list Lamp
gfsLamps = open('ICAO', 'w') #stores text of Lamp into a new file
gfsLamps.write(Lamp[0])
array = np.genfromtxt('ICAO') #puts file into an array
array[5]
You can use KOGD as the ICAO to test this. As is, I get ValueError: Some errors were detected, and it lists lines 2-23 (Got 26 columns instead of 8). What is the first step that I am doing wrong for what I want to do? Or am I just going about this all wrong?
The problem isn't in the putting-data-into-the-file part, it's getting it out using genfromtxt. The problem is that genfromtxt is a very rigid function that mostly needs complete data, unless you specify lots of options to skip columns and rows. Use this one instead:
arrays = [np.array(map(str, line.split())) for line in open('ICAO')]
The arrays variable will contain an array for each line, holding each individual element of that line separated by whitespace. For example, if your line has the following data:
a b cdef 124
the array for this line will be:
['a','b','cdef','124']
arrays will contain an array like this for each line, which you can process further as you wish.
So the complete code is:
from lxml import html
import requests
import numpy as np
ICAO = raw_input("What station would you like GFS lamps data for? ")
page = requests.get('http://www.nws.noaa.gov/cgi-bin/lamp/getlav.pl?sta=' + ICAO)
tree = html.fromstring(page.content)
Lamp = tree.xpath('//pre/text()') #stores class of //pre html element in list Lamp
gfsLamps = open('ICAO', 'w') #stores text of Lamp into a new file
gfsLamps.write(Lamp[0])
gfsLamps.close()
array = [np.array(map(str, line.split())) for line in open('ICAO')]
print array
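For completeness, genfromtxt itself can sometimes be coaxed past ragged rows with its own options; a hedged sketch (reading everything as strings and skipping malformed lines):
import numpy as np

# invalid_raise=False skips rows whose column count doesn't match,
# instead of raising the "Got 26 columns instead of 8" ValueError
array = np.genfromtxt('ICAO', dtype=str, invalid_raise=False)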

python program to export numpy/lists in svmlight format

Is there any way to export a Python array into SVMlight format?
There is one in scikit-learn:
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.dump_svmlight_file.html
It's basic but it works both for numpy arrays and scipy.sparse matrices.
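A short usage sketch (the array values here are made up):
import numpy as np
from sklearn.datasets import dump_svmlight_file

X = np.array([[0.0, 1.5, 0.0],
              [2.0, 0.0, 3.0]])   # features; a scipy.sparse matrix also works
y = np.array([0, 1])              # one label per row

dump_svmlight_file(X, y, 'data.svmlight', zero_based=False)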
I wrote this totally un-optimized script a while ago, maybe it can help! Data and labels must be in two separate numpy arrays.
def save_svmlight_data(data, labels, data_filename, data_folder=''):
    output = open(data_folder + data_filename, 'w')  # avoid shadowing the built-in `file`
    for i, x in enumerate(data):
        indexes = x.nonzero()[0]
        values = x[indexes]
        label = '%i' % (labels[i])
        pairs = ['%i:%f' % (indexes[j] + 1, values[j]) for j in xrange(len(indexes))]
        sep_line = [label]
        sep_line.extend(pairs)
        sep_line.append('\n')
        line = ' '.join(sep_line)
        output.write(line)
    output.close()  # close so the data is actually flushed to disk
The svmlight-loader module can load an svmlight file into a numpy array. I don't think anything exists for the other direction, but the module is probably a good starting point for extending its functionality.
