I have an application that takes an xml file as input and converts this data into a specific data structure. I would like to write a test for this application, but instead of using an external xml file I would like to define the xml data inside the test file and then pass this data to the function, so originally my idea was to do something like this:
data = pd.DataFrame([#insert data here])
in_memory_xml = io.BytesIO()
xml_file = original.to_xml(in_memory_xml)
my_function(xml_file)
However, pandas DataFrame objects do not have a "to_xml" function, so the xml data needs to be defined differently.Is there good way to solve this problem that doesn't involve the use of an external xml file?
It can be done by just converting the string to xml using lxml: https://kite.com/python/examples/5415/lxml-load-xml-from-a-string-into-an-%60elementtree%60
You can try the below:
data=pd.read_csv('temps.csv',delimiter=r"\s+")
def myfunction(row):
myxml = ['<item>']
for field in row.index:
myxml.append(' <field name="{0}">{1}</field>'.format(field, row[field]))
myxml.append('</item>')
return '\n'.join(myxml)
final_xml='\n'.join(data.apply(myfunction, axis=1))
print(final_xml)
Related
I'm working on a Flask app that I'd like to have store xml files in a database. I'd like to use flask-sqlalchemy. I've seen that in regular old sqlalchemy it is possible to use the LONGTEXT type. I believe this would work for my use case.
I would like to know (1) if LONGTEXT would be the best way store xml files and, if so, (2) how to use LONGTEXT within the flask-sqlalchemy syntax.
What should {insert-name-here} be in the code below? Will I need to install additional dependencies to use whatever is suggested?
xml_column = db.Column(db.{insert-name-here})
I use python some time.
I use [xml.etree.ElementTree] package to read or write xml data in python.
I use it like:
'''
#import
Import xml.etree.ElementTree as ET
#xml file
_xml = 'c:/.../test.xml'
#read
tree = ET.parse(_xml)
root = tree.getroot()
h_data = root.findall('h')
#write
root = ET.Element('test')
tree = ET.ElementTree(root)
tree.write(_xml), encoding='utf-8', xml_declaration=1)
'''
More you can see documents.
Xml file can save as a txt, but best the encoding is utf8.
I think xml data is not best for python.
The best is json data.
Hope I can help you.
python lxml can be used to extract text (e.g., with xpath) from XML files without having to fully parse XML. For example, I can do the following which is faster than BeautifulSoup, especially for large input. I'd like to have some equivalent code for JSON.
from lxml import etree
tree = etree.XML('<foo><bar>abc</bar></foo>')
print type(tree)
r = tree.xpath('/foo/bar')
print [x.tag for x in r]
I see http://goessner.net/articles/JsonPath/. But I don't see an example python code to extract some text from a json file without having use json.load(). Could anybody show me an example? Thanks.
I'm assuming you don't want to load the entire JSON for performance reasons.
If that's the case, perhaps ijson is what you need. I used it to search huge JSON files (>8gb) and it works well.
However, you will have to implement the search code yourself.
I have almost finished Quiz application in GUI and i'm trying to make a leaderboard so it will show all the names in Label.
But how can this be possible? do i need to create a text file, from where i can import the names? if so:
How can i parse text into files using Python? (For this instance from QLineEdit)
How can i import its data? (In bash scripting something equivalent to grep)
Should i make another python file and add array variables into them?
How can i append data to arrays other file?
import leaderboards #Import leaderboards.py
in leaderboards board.append("name1") #Should it be something like this?
Otherwise how can i do this with Json or some other database scripts?
(without accessing http protocol, game to be offline)
What do i need to do to make leaderboard data file in Json? (can i do it with simple arrays)
How can i parse data to Json file?
How can i import Json file to Python and print it?
Would be very thankful for explanation, for instance Json file name is Data.json.
Can i incude compile Json files in PyQt resources file?
Also if i compile Python to Executable, how can i include this data files?
Sorry for making it too general, i couldn't find specific questions regarding to mine.
2 methods below, reads and writes to json file.
The file structure is similar to a dict.
import json
def json_load_file(json_file):
with open(json_file) as json_file:
json_data = json.load(json_file)
return json_data
def json_dump_to_file(json_file, json_dict):
with open(json_file, 'w') as outfile:
json.dump(json_dict, outfile, indent=4)
After that when a game is ended update the dict, and save it, just an example:
def update_board(json_file, latest_game_score):
leaderboard_dict = json_load_file(json_file)
do_stuff-> update the dict if required (example curr_score> score in file)
when board is updated call->
json_dump_to_file("/root/board.json",leaderboard_dict)
I am using h5py to save data (float numbers), in groups. In addition to the data itself, I need to include an additional file (an .xml file, containing necessary information) within the hdf5. How do i do this? Is my approach wrong?
f = h5py.File('filename.h5')
f.create_dataset('/data/1',numpy_array_1)
f.create_dataset('/data/2',numpy_array_2)
.
.
my h5 tree should look thus:
/
/data
/data/1 (numpy_array_1)
/data/2 (numpy_array_2)
.
.
/morphology.xml (?)
One option is to add it as a variable-length string dataset.
http://code.google.com/p/h5py/wiki/HowTo#Variable-length_strings
E.g.:
import h5py
xmldata = """<xml>
<something>
<else>Text</else>
</something>
</xml>
"""
# Write the xml file...
f = h5py.File('test.hdf5', 'w')
str_type = h5py.new_vlen(str)
ds = f.create_dataset('something.xml', shape=(1,), dtype=str_type)
ds[:] = xmldata
f.close()
# Read the xml file back...
f = h5py.File('test.hdf5', 'r')
print f['something.xml'][0]
If you just need to attach the XML file to the hdf5 file, you can add it as an attribute to the hdf5 file.
xmlfh = open('morphology.xml', 'rb')
h5f.attrs['xml'] = xmlfh.read()
You can access the xml file then like this:
h5f.attrs['xml']
Notice, also, that you can't store attributes larger than 64K, you may want to compress the file before attaching. You can have a look at compressing libraries in the standard library of Python.
However, this doesn't make the information in the XML file very accessible. If you want to associate the metadata of each dataset to some metadata in the XML file, you could map it as you need using an XML library like lxml. You can also add each field of the XML data as a separate attribute so that you can query datasets by XML field, this all depends on what you have in the XML file. Try to think about how you would like to retrieve the data later.
You may also want to create groups for each xml file with its datasets and put it all in a single hdf5 file. I don't know how large are the files you are managing, YMMV.
Let's say I have an existing trivial XML file named 'MyData.xml' that contains the following:
<?xml version="1.0" encoding="utf-8" ?>
<myElement>foo</myElement>
I want to change the text value of 'foo' to 'bar' resulting in the following:
<?xml version="1.0" encoding="utf-8" ?>
<myElement>bar</myElement>
Once I am done, I want to save the changes.
What is the easiest and simplest way to accomplish all this?
Use Python's minidom
Basically you will take the following steps:
Read XML data into DOM object
Use DOM methods to modify the document
Save new DOM object to new XML document
The python spec should hold your hand rather nicely though this process.
This is what I wrote based on #Ryan's answer:
from xml.dom.minidom import parse
import os
# create a backup of original file
new_file_name = 'MyData.xml'
old_file_name = new_file_name + "~"
os.rename(new_file_name, old_file_name)
# change text value of element
doc = parse(old_file_name)
node = doc.getElementsByTagName('myElement')
node[0].firstChild.nodeValue = 'bar'
# persist changes to new file
xml_file = open(new_file_name, "w")
doc.writexml(xml_file, encoding="utf-8")
xml_file.close()
Not sure if this was the easiest and simplest approach but it does work. (#Javier's answer has less lines of code but requires non-standard library)
For quick, non-critical XML manipulations, i really like P4X. It let's you write like this:
import p4x
doc = p4x.P4X (open(file).read)
doc.myElement = 'bar'
You also might want to check out Uche Ogbuji's excellent XML Data Binding Library, Amara:
http://uche.ogbuji.net/tech/4suite/amara
(Documentation here:
http://platea.pntic.mec.es/~jmorilla/amara/manual/)
The cool thing about Amara is that it turns an XML document in to a Python object, so you can just do stuff like:
record = doc.xml_create_element(u'Record')
nameElem = doc.xml_create_element(u'Name', content=unicode(name))
record.xml_append(nameElem)
valueElem = doc.xml_create_element(u'Value', content=unicode(value))
record.xml_append(valueElem
(which creates a Record element that contains Name and Value elements (which in turn contain the values of the name and value variables)).