Python: Generating strings dynamically (based on some templates)

I am trying to generate specific XML strings from my module, based on the type of string requested by the code using my module. Each of these XML strings contains some data that gets generated dynamically. For example, many of these XML strings have a cookie field that is generated using another function.
I started out with a Python dictionary that initializes all the XML strings with the dynamic field (i.e. the cookie) pre-populated (so, not exactly dynamic), and then I call into the dictionary to get the relevant XML strings.
The problem with this approach is that the cookies expire every hour, so after an hour the strings returned by the module contain expired values. What I would ideally like is some form of generator function (not sure if that's even possible in this case) that returns correctly formed strings as and when they are requested, based on the msg_type requested (as in the example below). Each of the XML strings saved in this dict is in a unique format, so I can't exactly have some sort of common template XML generator.
As an example, the dict that I have defined looks similar to the get_msg dictionary here:
get_msg["msg_value_1"] = """<ABC cookie=""" + getCookie() + """ >
<XYZ """ + foo_name +""">
</XYZ>
</ABC>"""
get_msg["msg_value_2"] = """<ABC cookie=""" + getCookie() + """ >
<some text """ + bar_name + """>
</XYZ>
</ABC>"""
What would be a good approach to generate these XML strings on the fly, with getCookie() being invoked for every fresh msg request? Any input would be appreciated.

In Python, functions are first-class objects. This means you can pass a function as an argument to another function.
def get_msg(function_to_call_to_get_injection_bit, tag_name_function,
            cookie_function):
    tagname = tag_name_function()
    injection = function_to_call_to_get_injection_bit()
    cookie = cookie_function()
    return ('<ABC cookie="%s">' % (cookie,) +
            '<%s %s></%s></ABC>' % (tagname, injection, tagname))
def get_injection():
    return foo_name

def get_tag_name_1():
    return "XYZ"

def get_tag_name_2():
    return "SomeText"

get_msg(get_injection, get_tag_name_1, getCookie)
get_msg(get_injection, get_tag_name_2, getCookie)
You just call get_msg every time you want a message, and pass it the function(s) that generate the portion of the message not related to the cookie.
From your question it's not entirely clear what the issue is. But this isn't really a "generator function": I don't think you want to return a function (which is possible), you want to return the XML string and just customize how it's built.
As BenDundee commented above, it'd probably also be better to use an XML library to build XML, instead of constructing the strings by hand. Python has several options built in, and more available outside (like the wonderful lxml library).
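As a rough illustration of that suggestion, here is a minimal sketch using the standard-library xml.etree.ElementTree; getCookie, foo_name and bar_name are the names from the question, while the tag layout and the name attribute are only assumptions:
import xml.etree.ElementTree as ET

def build_msg(tag_name, inner_attrs):
    # The document is rebuilt on every call, so getCookie() is re-invoked
    # and the cookie can never be stale.
    root = ET.Element('ABC', cookie=getCookie())
    ET.SubElement(root, tag_name, inner_attrs)
    return ET.tostring(root)

msg_value_1 = build_msg('XYZ', {'name': foo_name})
msg_value_2 = build_msg('SomeText', {'name': bar_name})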

Related

How do I properly format this API call?

I am making a telegram chatbot and can't figure out how to take out the [{' from the output.
import requests

def tether(bot, update):
    tetherCall = "https://api.omniexplorer.info/v1/property/31"
    tetherCallJson = requests.get(tetherCall).json()
    tetherOut = tetherCallJson['issuances'][:1]
    update.message.reply_text("Last printed tether: " + str(tetherOut) + " Please take TXID and paste it in this block explorer to see more info: https://www.omniexplorer.info/search")
My user will see this as a response: [{'grant': '25000000.00000000', 'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}]
This looks like a list with a single dict in it:
[{'grant': '25000000.00000000',
'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}]
You should be able to access the dict by indexing the list with [0]…
tetherOut[0]
# {'grant': '25000000.00000000',
# 'txid': 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'}
…and if you want to get a particular value from the dict you can index by its name, e.g.
tetherOut[0]['txid']
# 'f307bdf50d90c92278265cd92819c787070d6652ae3c8af46fa6a96278589b03'
Be careful chaining these things, though. If tetherOut is an empty list, tetherOut[0] will generate an IndexError. You'll probably want to catch that (and the KeyError that an invalid dict key will generate).
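A minimal sketch of what that could look like inside the handler (the wording of the fallback reply is just an assumption):
try:
    txid = tetherOut[0]['txid']
    reply = "Last printed tether TXID: " + txid
except (IndexError, KeyError):
    # Empty issuances list or an unexpected payload shape
    reply = "Sorry, no issuance data is available right now."
update.message.reply_text(reply)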

Representation of python dictionaries with unicode in database queries

I have a problem that I would like to know how to efficiently tackle.
I have data that is JSON-formatted (used with dumps / loads) and contains unicode.
This is part of a protocol implemented with JSON to send messages. So messages will be sent as strings and then loaded into Python dictionaries. This means that the representation, as a Python dictionary, afterwards will look something like:
{u"mykey": u"myVal"}
Handling such structures is no problem for the system in itself, but the problem appears when I make a database query to store such a structure.
I'm using pyOrient towards OrientDB. The command ends up something like:
"CREATE VERTEX TestVertex SET data = {u'mykey': u'myVal'}"
Which will end up in the data field getting the following values in OrientDB:
{'_NOT_PARSED_': '_NOT_PARSED_'}
I'm assuming this problem relates to other cases as well when you wish to make a query or somehow represent a data object containing unicode.
How could I efficiently get a representation of this data, of arbitrary depth, to be able to use it in a query?
To clarify even more, this is the string the db expects:
"CREATE VERTEX TestVertex SET data = {'mykey': 'myVal'}"
If I'm simply stating the wrong problem/question and should handle it some other way, I'm very much open to suggestions. But what I want to achieve is an efficient way to use Python 2.7 to build a DB query against OrientDB (using pyorient) that specifies an arbitrary data structure. The data property being set is of the OrientDB type EMBEDDEDMAP.
Any help greatly appreciated.
EDIT1:
More explicitly stating that the first code block shows the object as a dict AFTER being dumped / loaded with json to avoid confusion.
Dargolith:
OK, based on your last response it seems you are simply looking for code that dumps a Python expression in a way that lets you control how unicode and other data types print. Here is a very simple function that provides this control. There are ways to make this function more efficient (for example, by using a string buffer rather than doing all of the recursive string concatenation happening here). Still, this is a very simple function, and as it stands its execution is probably still dominated by your DB lookup.
As you can see in each of the 'if' statements, you have full control of how each data type prints.
def expr_to_str(thing):
    if hasattr(thing, 'keys'):
        pairs = ['%s:%s' % (expr_to_str(k), expr_to_str(v)) for k, v in thing.iteritems()]
        return '{%s}' % ', '.join(pairs)
    if hasattr(thing, '__setslice__'):
        parts = [expr_to_str(ele) for ele in thing]
        return '[%s]' % (', '.join(parts),)
    if isinstance(thing, basestring):
        return "'%s'" % (str(thing),)
    return str(thing)

print "dumped: %s" % expr_to_str({'one': 33, 'two': [u'unicode', 'just a str', 44.44, {'hash': 'here'}]})
outputs:
dumped: {'two':['unicode', 'just a str', 44.44, {'hash':'here'}], 'one':33}
I went on to use json.dumps() as sobolevn suggested in the comment. I didn't think of that one at first since I wasn't really using json in the driver. It turned out however that json.dumps() provided exactly the formats I needed on all the data types I use. Some examples:
>>> json.dumps('test')
'"test"'
>>> json.dumps(['test1', 'test2'])
'["test1", "test2"]'
>>> json.dumps([u'test1', u'test2'])
'["test1", "test2"]'
>>> json.dumps({u'key1': u'val1', u'key2': [u'val21', 'val22', 1]})
'{"key2": ["val21", "val22", 1], "key1": "val1"}'
If you need to take more control of the format, quotes or other things regarding this conversion, see the reply by Dan Oblinger.
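Putting that together for the original query, the construction could look like this (a minimal sketch; client is assumed to be an already-connected pyorient client):
import json

data = {u"mykey": u"myVal"}
query = "CREATE VERTEX TestVertex SET data = " + json.dumps(data)
# query is now: CREATE VERTEX TestVertex SET data = {"mykey": "myVal"}
client.command(query)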

Parse an html page with XML in python

I'm trying to parse this XML code from an HTML page with Python:
<weather>
<loc mobiurl="http://foreca.mobi/?lon=-8.6110&lat=41.1496&source=navi/" url="http://foreca.com/?lon=-8.6110&lat=41.1496&source=navi/">
<obs station="Porto / Pedras Rubras" dist="11 km NW" dt="2013-03-06 17:00:00" t="14" tf="14" s="d320" wn="S" ws="8" p="997" rh="94" v="5000"/>
<fc dt="2013-03-07" tx="16" tn="11" s="d220"/>
<fc dt="2013-03-08" tx="15" tn="10" s="d220"/>
<fc dt="2013-03-09" tx="15" tn="10" s="d220"/>
</loc>
</weather>
I want to get the information in the dt, s, tx and tn fields, but I don't know how to do it with XML functions. I tried to read the HTML file and then create an array to store the contents of those fields, but I can't get it working.
Is there any easy way to get the data with Python?
Some HTML scraping is easily done with pyparsing, using that library's makeHTMLTags method (makeHTMLTags returns a pair of expressions, for opening and closing tags, but in your example, only the opening tag is needed):
from pyparsing import makeHTMLTags

fcTag = makeHTMLTags("fc")[0]
tagAttrs = 'dt s tx tn'.split()
for match in fcTag.searchString(htmltext):
    print ' '.join("%s:%s" % (attr, match[attr]) for attr in tagAttrs)
Prints:
dt:2013-03-07 s:d220 tx:16 tn:11
dt:2013-03-08 s:d220 tx:15 tn:10
dt:2013-03-09 s:d220 tx:15 tn:10
This makes it easy to incorporate this fragment parser with pyparsing's other features, such as run-time parse actions, semantic checking, etc.
EDIT
If you want all the dt's, s's, etc. in their own respective lists (in Python, we call them "lists", not "vectors"), do this:
dtArray = []
sArray = []
txArray = []
tnArray = []
for match in fcTag.searchString(htmltext):
    dtArray.append(match.dt)
    sArray.append(match.s)
    txArray.append(match.tx)
    tnArray.append(match.tn)
    print ' '.join("%s:%s" % (attr, match[attr]) for attr in tagAttrs)
I've seen code like this before, and it is a poor data structure pattern. You access the value of the i'th entry of the original table by getting dtArray[i], sArray[i], etc.
Please consider instead one of the several structured types offered by Python. You have several to choose from:
A. Use dicts.
fcArray = []
for match in fcTag.searchString(htmltext):
    fcArray.append(dict((attr, match[attr]) for attr in tagAttrs))
Now to get at the i'th entry, just get fc = fcArray[i], and access the fc['dt'], fc['s'] etc. values from that dict.
B. Use namedtuples.
from collections import namedtuple

FCData = namedtuple("FCData", tagAttrs)
fcArray = []
for match in fcTag.searchString(htmltext):
    fcArray.append(FCData(*(match[attr] for attr in tagAttrs)))
You again use fc = fcArray[i] to get the i'th entry, but now you access the values using fc.dt, fc.s, etc. I find this form to be cleaner-looking than the dict form, but there are some restrictions. All the tag names have to be legal Python identifiers, so if you have a tag "rise/run", then you can't use a namedtuple. Also, namedtuples are immutable - you can't take an existing FCData fc and assign into its dt field with fc.dt = "new datetime value". dicts on the other hand would allow this.
C. Use objects. The simplest is a "bag"-type object that creates empty object instances, which you then add attributes to through simple assignment or setattr calls:
class FCData(object): pass

fcArray = []
for match in fcTag.searchString(htmltext):
    fc = FCData()
    for attr in tagAttrs:
        setattr(fc, attr, match[attr])
    fcArray.append(fc)
You get the i'th entry with fc = fcArray[i], and like the namedtuple, you get the attributes using fc.dt and so on. But you can also modify the attributes if need be, and the assignment fc.dt = "new datetime value" would work.
D. Just use the objects created by pyparsing's searchString method.
fcArray = fcTag.searchString(htmltext)
pyparsing returns ParseResults, and it combines the behavior of both dicts and namedtuples. Just like before you access the i'th entry with fc = fcArray[i]. You can read the dt attribute with fc.dt or fc['dt']. You can read fc.dt, but you can't assign to it, just like the namedtuple. You can assign to fc['dt'], just like the dict.
If you can extract just the weather tags easily, you can use the xml.etree.ElementTree API which comes with Python.
import xml.etree.ElementTree as ET

tree = ET.fromstring(weatherdata)
for fcelem in tree.findall('.//fc'):
    print fcelem.attrib['tx'], fcelem.attrib['tn']
If you want to extract it from the HTML document, then it depends on how well-formed the HTML is. If it is a XHTML document, the ElementTree API can handle it fine.
Otherwise, you'll need to switch to an HTML parser instead. You could install the lxml library; that library supports the same ElementTree API but has a dedicated HTML parser included.
You could also use BeautifulSoup for an alternate HTML API. In fact, lxml and BeautifulSoup can work in concert giving you a choice of APIs for your tasks; use whichever is easier for you.
Both lxml and BeautifulSoup are external libraries.
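As a rough sketch of the BeautifulSoup route (html_doc stands for the page source; the attribute names come from the sample above):
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
for fc in soup.find_all('fc'):
    print fc['dt'], fc['s'], fc['tx'], fc['tn']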

Putting together Haystack records without templates

The Haystack documentation (link below) makes this statement:
Additionally, we're providing use_template=True on the text field.
This allows us to use a data template (rather than error prone
concatenation) to build the document the search engine will use in
searching.
How would one go about using concatenation to build the document? I couldn't find an example.
It may have something to do with overriding the prepare method (second link). But in the example given in the documentation the prepare method is used together with a template, so the two might also be orthogonal.
https://github.com/toastdriven/django-haystack/blob/master/docs/tutorial.rst
http://django-haystack.readthedocs.org/en/latest/searchindex_api.html#advanced-data-preparation
You can see how it works in the Haystack source. Basically, the default implementation of the prepare method on SearchField (the base class for Haystack's fields) calls prepare_template if use_template is True.
If you don't want to use a template, you can indeed use concatenation - it's as simple as just joining the data you want together, separated by something (here I've used a newline):
def prepare_myfield(self, obj):
    return obj.field1 + '\n' + obj.field2
etc.
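Putting that in context, here is a minimal sketch of a SearchIndex that builds its document by concatenation rather than a template; the NoteIndex name and the title/body fields are assumptions, not something from the question:
from haystack import indexes
from myapp.models import Note  # assumed model

class NoteIndex(indexes.SearchIndex, indexes.Indexable):
    # No use_template=True: the document is assembled in prepare_text instead.
    text = indexes.CharField(document=True)

    def get_model(self):
        return Note

    def prepare_text(self, obj):
        # Join whichever fields should be searchable, separated by newlines.
        return '\n'.join([obj.title, obj.body])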

Python String parse

I'm working on a data packet retrieval system which will take a packet and process the various parts of the packet, based on a system of tags [similar to HTML tags].
[text based files only, no binary files].
Each part of the packet is contained between two identical tags, and here is a sample packet:
"<PACKET><HEAD><ID><ID><SEQ><SEQ><FILENAME><FILENAME><HEAD><DATA><DATA><PACKET>"
The entire packet is contained within the <PACKET><PACKET> tags.
All meta-data is contained within the <HEAD><HEAD> tags and the filename from which the packet is part of is contained within the, you guessed it, the <FILENAME><FILENAME> tags.
Lets say, for example, a single packet is received and stored in a temporary string variable called sTemp.
How do you efficiently retrieve, for example, only the contents of a single pair of tags, for example the contents of the <FILENAME><FILENAME> tags?
I was hoping for such functionality as saying getTagFILENAME( packetX ), which would return the textual string contents of the <FILENAME><FILENAME> tags of the packet.
Is this possible using Python?
Any suggestions or comments appreciated.
If the packet format effectively uses XML-looking syntax (i.e., if the "closing tags" actually include a slash), the xml.etree.ElementTree module could be used.
This library is part of the Python Standard Library, starting in Py2.5. I find it a very convenient one for dealing with this kind of data. It provides many ways to read and to modify this kind of tree structure. Thanks to the generic nature of XML languages and to the XML awareness built into the ElementTree library, the packet syntax could evolve easily, for example to support repeating elements or element attributes.
Example:
>>> import xml.etree.ElementTree
>>> myPacket = '<PACKET><HEAD><ID>123</ID><SEQ>1</SEQ><FILENAME>Test99.txt</FILENAME></HEAD><DATA>spam and cheese</DATA></PACKET>'
>>> xt = xml.etree.ElementTree.fromstring(myPacket)
>>> wrk_ele = xt.find('HEAD/FILENAME')
>>> wrk_ele.text
'Test99.txt'
>>>
Something like this?
import re

def getPacketContent(code, packetName):
    match = re.search('<' + packetName + '>(.*?)<' + packetName + '>', code)
    return match.group(1) if match else ''

# usage
code = "<PACKET><HEAD><ID><ID><SEQ><SEQ><FILENAME><FILENAME><HEAD><DATA><DATA><PACKET>"
print(getPacketContent(code, 'HEAD'))
print(getPacketContent(code, 'SEQ'))
As mjv points out, there's not the least sense in inventing an XML-like format if you can just use XML.
But: If you're going to use XML for your packet format, you need to really use XML for it. You should use an XML library to create your packets, not just to parse them. Otherwise you will come to grief the first time one of your field values contains an XML markup character.
You can, of course, write your own code to do the necessary escaping, filter out illegal characters, guarantee well-formedness, etc. For a format this simple, that may be all you need to do. But going down that path is a way to learn things about XML that you perhaps would rather not have to learn.
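For instance, a minimal sketch of building a packet with the standard-library ElementTree, so that escaping of markup characters is handled for you (the element names follow the question; the exact layout is an assumption):
import xml.etree.ElementTree as ET

def build_packet(packet_id, seq, filename, data):
    packet = ET.Element('PACKET')
    head = ET.SubElement(packet, 'HEAD')
    ET.SubElement(head, 'ID').text = str(packet_id)
    ET.SubElement(head, 'SEQ').text = str(seq)
    ET.SubElement(head, 'FILENAME').text = filename
    ET.SubElement(packet, 'DATA').text = data
    return ET.tostring(packet)

# Markup characters in field values are escaped automatically:
print build_packet(123, 1, 'Test99.txt', 'spam & <cheese>')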
If using an XML library to create your packets is a problem, you're probably better off defining a custom format (and I'd define one that didn't look anything like XML, to keep people from getting ideas) and building a parser for it using pyparsing.
