Getting name of attributes with Scrapy XPATH - python

I'm trying to get both the keys and the values of the attributes of some tag in an XML file (using Scrapy and XPath).
The tag is something like:
<element attr1="value1" attr2="value2" ...>
I don't know the keys "attr1", "attr2" and so on, and they can change between two elements. I haven't figured out how to get both keys and values with XPath; is there any other good practice for doing that?

Short version
>>> for element in selector.xpath('//element'):
...     attributes = []
...     # loop over all attribute nodes of the element
...     for index, attribute in enumerate(element.xpath('@*'), start=1):
...         # use XPath's name() string function on each attribute,
...         # using their position
...         attribute_name = element.xpath('name(@*[%d])' % index).extract_first()
...         # Scrapy's extract() on an attribute returns its value
...         attributes.append((attribute_name, attribute.extract()))
...
>>> attributes  # list of (attribute name, attribute value) tuples
[(u'attr1', u'value1'), (u'attr2', u'value2')]
>>> dict(attributes)
{u'attr2': u'value2', u'attr1': u'value1'}
>>>
Long version
XPath has a name(node-set?) function to get node names (an attribute is a node, an attribute node):
The name function returns a string containing a QName representing the expanded-name of the node in the argument node-set that is first in document order. (...) If the argument is omitted, it defaults to a node-set with the context node as its only member.
(source: http://www.w3.org/TR/xpath/#function-name)
>>> import scrapy
>>> selector = scrapy.Selector(text='''
... <html>
... <element attr1="value1" attr2="value2">some text</element>
... </html>''')
>>> selector.xpath('//element').xpath('name()').extract()
[u'element']
(Here, I chained name() on the result of the //element selection, to apply the function to all selected element nodes. A handy feature of Scrapy selectors.)
One would like to do the same with attribute nodes, right? But it does not work:
>>> selector.xpath('//element/@*').extract()
[u'value1', u'value2']
>>> selector.xpath('//element/@*').xpath('name()').extract()
[]
>>>
Note: I don't know if it's a limitation of lxml/libxml2, which Scrapy uses under the hood, or if the XPath specs disallow it. (I don't see why it would.)
What you can do, though, is use the name(node-set) form, i.e. with a non-empty node-set as its argument. If you read the part of the XPath 1.0 spec quoted above carefully, you'll see that, as with other string functions, name(node-set) only takes into account the first node in the node-set (in document order):
>>> selector.xpath('//element').xpath('@*').extract()
[u'value1', u'value2']
>>> selector.xpath('//element').xpath('name(@*)').extract()
[u'attr1']
>>>
Attribute nodes also have positions, so you can loop over all attributes by their position. Here we have 2 (the result of count(@*) on the context node):
>>> for element in selector.xpath('//element'):
...     print element.xpath('count(@*)').extract_first()
...
2.0
>>> for element in selector.xpath('//element'):
...     for i in range(1, 2+1):
...         print element.xpath('@*[%d]' % i).extract_first()
...
value1
value2
>>>
Now, you can guess what we can do: call name() for each @*[i]
>>> for element in selector.xpath('//element'):
...     for i in range(1, 2+1):
...         print element.xpath('name(@*[%d])' % i).extract_first()
...
attr1
attr2
>>>
If you put all this together, and assume that @* will get you the attributes in document order (not stated in the XPath 1.0 spec, I think, but it's what I see happening with lxml), you end up with this:
>>> attributes = []
>>> for element in selector.xpath('//element'):
...     for index, attribute in enumerate(element.xpath('@*'), start=1):
...         attribute_name = element.xpath('name(@*[%d])' % index).extract_first()
...         attributes.append((attribute_name, attribute.extract()))
...
>>> attributes
[(u'attr1', u'value1'), (u'attr2', u'value2')]
>>> dict(attributes)
{u'attr2': u'value2', u'attr1': u'value1'}
>>>

I'm trying to get both keys and values of attributes of some tag in a XML file (using scrapy and xpath).
You need @*, which means "any attribute". The XPath expression //element/@* will give you all the attributes of element elements and, with the attributes, their values.
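A minimal sketch of that expression (assuming Scrapy is installed, and using markup like the question's):
import scrapy

# @* selects the attribute nodes; extract() returns their values,
# but not their names -- see the longer answer above for recovering the names too
sel = scrapy.Selector(text='<element attr1="value1" attr2="value2"/>', type='xml')
print(sel.xpath('//element/@*').extract())  # ['value1', 'value2']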

Related

strange error using AttrDict on Python 2.7

I am getting a weird recurring error using AttrDict 2.0 on Python 2.7. The weird part is that transitive assignment seems to break, but only when using AttrDict.
What's happening is that I want to instantiate a new list on an object if it doesn't exist and then append data to it.
If I use AttrDict, the list somehow gets transformed into a tuple and I get an exception.
from attrdict import AttrDict
class Test(object):
    pass
try:
    for cls_ in [Test, AttrDict]:
        foo = cls_()
        print ("\ntesting with class %s" % (cls_))
        #this
        chk = foo.li = getattr(foo, "li", None) or []
        print(" type(chk):%s, id(chk):%s" % (type(chk), id(chk)))
        print(" type(foo.li):%s, id(foo.li):%s" % (type(foo.li), id(foo.li)))
        foo.li.append(3)
        print (" success appending with class %s: foo.li:%s" % (cls_, foo.li))
except (Exception,) as e:
    # pdb.set_trace()
    raise
Now check out the output, when I use the Test class vs when I use AttrDict.
testing with class <class '__main__.Test'>
type(chk):<type 'list'>, id(chk):4465207704
type(foo.li):<type 'list'>, id(foo.li):4465207704
success appending with class <class '__main__.Test'>: foo.li:[3]
With the custom Test class, as expected, chk and foo.li are both lists and have the same id. append works.
Looking at the pass using AttrDict, id does not match and foo.li is a tuple rather than a list.
testing with class <class 'attrdict.dictionary.AttrDict'>
type(chk):<type 'list'>, id(chk):4465207848
type(foo.li):<type 'tuple'>, id(foo.li):4464595080
Traceback (most recent call last):
  File "test_attrdict2.py", line 25, in <module>
    test()
  File "test_attrdict2.py", line 18, in test
    foo.li.append(3)
AttributeError: 'tuple' object has no attribute 'append'
Is attrdict assignment actually returning some kind of property/accessor object that gets changed the 2nd time you access it?
Took @abartnet's suggestion:
from attrdict import AttrDict
a = AttrDict()
a.li = []
print(a.li)
output:
()
OK, but even if that points to some weird behavior on AttrDict's end, how is it the transitive assignment does not assign the tuple as well?
reworked:
from attrdict import AttrDict
a = AttrDict()
b = a.li = []
print("a.li:", a.li)
print("b:",b)
output:
('a.li:', ())
('b:', [])
This is part of the automatic recursiveness of AttrDict, which is explained better in the inline help (which you can find here in the source) than in the README:
If a values which is accessed as an attribute is a Sequence-type (and is not a string/bytes), it will be converted to a _sequence_type with any mappings within it converted to Attrs.
In other words, in order to auto-convert any dict or other mappings recursively inside your AttrDict to AttrDict values when doing attribute access, it also converts all sequences to (by default) tuple. This is a little weird, but appears to be intentional and somewhat-documented behavior, not a bug.
>>> a = AttrDict()
>>> a._sequence_type
tuple
>>> a.li = []
>>> a.li
()
The more flexible AttrMap type lets you specify the sequence type, and documents that you can disable this recursive remapping stuff by passing None:
>>> a = AttrMap(sequence_type=None)
>>> a.li = []
>>> a.li
[]
But of course AttrMap isn't a dict (although it is a collections.abc.MutableMapping, and more generally it duck-types as a dict-like type).
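A quick check of that last claim; this is a sketch assuming Python 3 and the attrdict package described above (on Python 2 the ABC lives in collections rather than collections.abc):
from collections.abc import MutableMapping  # Python 3; use collections on Python 2
from attrdict import AttrMap

a = AttrMap(sequence_type=None)
print(isinstance(a, dict))            # False: AttrMap does not subclass dict
print(isinstance(a, MutableMapping))  # True: it still behaves like a mapping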
OK, but even if that points to some weird behavior on AttrDict's end, how is it the transitive assignment does not assign the tuple as well?
Because that's not how chained assignment works. Oversimplifying a bit:
target1 = target2 = value
… is not equivalent to this:
target2 = value
target1 = target2
… but to this:
target2 = value
target1 = value
The best way to understand why that's true: targets aren't expressions, and therefore don't have values. Sure, often the exact same sequence of tokens would be valid as an expression elsewhere in the grammar, but that sequence of tokens never gets evaluated as an expression anywhere in an assignment statement—otherwise, simple things like d['spam'] = 'eggs' would have to raise an exception if d['spam'] didn't exist.
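A small sketch (the Logging class is hypothetical, just a dict that reports assignments) that makes this visible:
class Logging(dict):
    def __setitem__(self, key, value):
        print("assigning %r = %r" % (key, value))
        super(Logging, self).__setitem__(key, value)

d1 = Logging()
d2 = Logging()
d1['a'] = d2['b'] = []
# assigning 'a' = []
# assigning 'b' = []
# Both targets are assigned the original [] directly; neither assignment
# reads a value back from the other target.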
Also, a.li = [] doesn't actually assign tuple([]) anywhere; it actually stores the [] internally, and does the tuple(…) later, when you try to access a.li. You can't really tell that for sure without reading the source, but when you consider that a['li'] gives you [] rather than (), it pretty much has to be true. And, in fact:
>>> li = []
>>> a.li = li
>>> a['li'] is li
True

python - lxml: enforcing a specific order for attributes

I have an XML writing script that outputs XML for a specific 3rd party tool.
I've used the original XML as a template to make sure that I'm building all the correct elements, but the final XML does not appear like the original.
I write the attributes in the same order, but lxml is writing them in its own order.
I'm not sure, but I suspect that the 3rd party tool expects attributes to appear in a specific order, and I'd like to resolve this issue so I can see if it's the attribute order that's making it fail, or something else.
Source element:
<FileFormat ID="1" Name="Development Signature" PUID="dev/1" Version="1.0" MIMEType="text/x-test-signature">
My source script:
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", ID = str(db.ID), Name = db.name, PUID="fileSig/{}".format(str(db.ID)), Version = "", MIMEType = "")
My resultant XML:
<FileFormat MIMEType="" PUID="fileSig/19" Version="" Name="Printer Info File" ID="19">
Is there a way of constraining the order they are written?
It looks like lxml serializes attributes in the order you set them:
>>> from lxml import etree as ET
>>> x = ET.Element("x")
>>> x.set('a', '1')
>>> x.set('b', '2')
>>> ET.tostring(x)
'<x a="1" b="2"/>'
>>> y= ET.Element("y")
>>> y.set('b', '2')
>>> y.set('a', '1')
>>> ET.tostring(y)
'<y b="2" a="1"/>'
Note that when you pass attributes using the ET.SubElement() constructor, Python constructs a dictionary of keyword arguments and passes that dictionary to lxml. This loses any ordering you had in the source file, since Python's dictionaries are unordered (or, rather, their order is determined by string hash values, which may differ from platform to platform or, in fact, from execution to execution).
OrderedDict of attributes
As of lxml 3.3.3 (perhaps also in earlier versions) you can pass an OrderedDict of attributes to the lxml.etree.(Sub)Element constructor and the order will be preserved when using lxml.etree.tostring(root):
sig.fileformat = etree.SubElement(sig.fileformats, "FileFormat", OrderedDict([("ID",str(db.ID)), ("Name",db.name), ("PUID","fileSig/{}".format(str(db.ID))), ("Version",""), ("MIMEType","")]))
Note that the ElementTree API (xml.etree.ElementTree) does not preserve attribute order even if you provide an OrderedDict to the xml.etree.ElementTree.(Sub)Element constructor!
UPDATE: Also note that using the **extra parameter of the lxml.etree.(Sub)Element constructor for specifying attributes does not preserve attribute order:
>>> from lxml.etree import Element, tostring
>>> from collections import OrderedDict
>>> root = Element("root", OrderedDict([("b","1"),("a","2")])) # attrib parameter
>>> tostring(root)
b'<root b="1" a="2"/>' # preserved
>>> root = Element("root", b="1", a="2") # **extra parameter
>>> tostring(root)
b'<root a="2" b="1"/>' # not preserved
Attribute ordering and readability
As the commenters have mentioned, attribute order has no semantic significance in XML, which is to say it doesn't change the meaning of an element:
<tag attr1="val1" attr2="val2"/>
<!-- means the same thing as: -->
<tag attr2="val2" attr1="val1"/>
There is an analogous characteristic in SQL, where column order doesn't change the meaning of a table definition. XML attributes and SQL columns are a set (not an ordered set), so all that can "officially" be said about either is whether a given attribute or column is present in the set.
That said, the order these things appear in definitely makes a difference to human readability, and in situations where constructs like this are authored and read as text (e.g. source code), a careful ordering makes a lot of sense to me.
Typical parser behavior
Any XML parser that treated attribute order as significant would be out of compliance with the XML standard. That doesn't mean it can't happen, but in my experience it is certainly unusual. Still, depending on the provenance of the tool you mention, it's a possibility that may be worth testing.
As far as I know, lxml has no mechanism for specifying the order attributes appear in serialized XML, and I would be surprised if it did.
In order to test the behavior I'd be strongly inclined to just write a text-based template to generate enough XML to test it out:
id = 1
name = 'Development Signature'
puid = 'dev/1'
version = '1.0'
mimetype = 'text/x-test-signature'
template = ('<FileFormat ID="%d" Name="%s" PUID="%s" Version="%s" '
            'MIMEType="%s">')
xml = template % (id, name, puid, version, mimetype)
I have seen order matter where the consumer of the XML is expecting canonicalized XML. Canonical XML specifies that the attributes be sorted:
in increasing lexicographic order with namespace URI as the primary
key and local name as the secondary key (an empty namespace URI is
lexicographically least). (section 2.6 of https://www.w3.org/TR/xml-c14n2/)
So if your application is expecting the kind of order you would get out of canonical XML, lxml does support output in canonical form using the method= argument to tostring() (see heading C14N of https://lxml.de/api.html).
For example:
from lxml import etree as ET
element = ET.Element('Test', B='beta', Z='omega', A='alpha')
val = ET.tostring(element, method="c14n")
print(val)
lxml uses libxml2 under the hood. It preserves attribute order, which means for an individual element you can sort them like this:
x = etree.XML('<x a="1" b="2" d="4" c="3"><y></y></x>')
sorted_attrs = sorted(x.attrib.items())
x.attrib.clear()
x.attrib.update(sorted_attrs)
Not very helpful if you want them all sorted though. If you want them all sorted you can use the c14n2 output method (XML Canonicalisation Version 2):
>>> x = etree.XML('<x a="1" b="2" d="4" c="3"><y></y></x>')
>>> etree.tostring(x, method="c14n2")
b'<x a="1" b="2" c="3" d="4"><y></y></x>'
That will sort the attributes. Unfortunately it has the downside of ignoring pretty_print, which isn't great if you want human-readable XML.
If you use c14n2 then lxml will use custom Python serialisation code to write the XML, which calls sorted(x.attrib.items()) itself for all attributes. If you don't, then it will instead call into libxml2's xmlNodeDumpOutput() function, which doesn't support sorting attributes but does support pretty-printing.
Therefore the only solution is to manually walk the XML tree and sort all the attributes, like this:
from lxml import etree
x = etree.XML('<x a="1" b="2" d="4" c="3"><y z="1" a="2"><!--comment--></y></x>')
for el in x.iter(etree.Element):
    sorted_attrs = sorted(el.attrib.items())
    el.attrib.clear()
    el.attrib.update(sorted_attrs)
etree.tostring(x, pretty_print=True)
# b'<x a="1" b="2" c="3" d="4">\n <y a="2" z="1">\n <!--comment-->\n </y>\n</x>\n'
You can encapsulate each string in a wrapper class that gives an explicit order when compared, but still gives the underlying value when you print it or use it as a string.
Here is an example:
class S:
    def __init__(self, _idx, _obj):
        self._obj = (_idx, _obj)
    def get_idx(self):
        return self._obj[0]
    def __le__(self, other):
        return self._obj[0] <= other.get_idx()
    def __lt__(self, other):
        return self._obj[0] < other.get_idx()
    def __str__(self):
        return self._obj[1].__str__()
    def __repr__(self):
        return self._obj[1].__repr__()
    def __eq__(self, other):
        if isinstance(other, str):
            return self._obj[1] == other
        elif isinstance(other, S):
            return self._obj[0] == other.get_idx() and self.__str__() == other.__str__()
        else:
            return self._obj[0] == other.get_idx() and self._obj[1] == other
    def __add__(self, other):
        return self._obj[1] + other
    def __hash__(self):
        return self._obj[1].__hash__()
    def __getitem__(self, item):
        return self._obj[1].__getitem__(item)
    def __radd__(self, other):
        return other + self._obj[1]
list_sortable = ['c', 'b', 'a']
list_not_sortable = [S(0, 'c'), S(0, 'b'), S(0, 'a')]
print("list_sortable ---- Before sort ----")
for ele in list_sortable:
    print(ele)
print("list_not_sortable ---- Before sort ----")
for ele in list_not_sortable:
    print(ele)
list_sortable.sort()
list_not_sortable.sort()
print("list_sortable ---- After sort ----")
for ele in list_sortable:
    print(ele)
print("list_not_sortable ---- After sort ----")
for ele in list_not_sortable:
    print(ele)
running result:
list_sortable ---- Before sort ----
c
b
a
list_not_sortable ---- Before sort ----
c
b
a
list_sortable ---- After sort ----
a
b
c
list_not_sortable ---- After sort ----
c
b
a
dict_sortable ---- After sort ----
a 3
b 2
c 1
dict_not_sortable ---- After sort ----
c 1
b 2
a 3

Python: Counter not taking class name of namedtuple into account ... how do I fix?

>>> Employee = namedtuple("Employee", "name")
>>> Patient = namedtuple("Patient", "name")
>>> e = Employee("Mike")
>>> p = Patient("Mike")
>>> Counter([e, p])
Counter({Employee(name='Mike'): 2})
Why doesn't the Counter differentiate between the two classes of namedtuple?
Namedtuples are, as the name implies, tuples. They are compared elementwise. Since both of your tuples have "Mike" as the first (and only) element, they are equal. It doesn't matter that they're different classes; only the contents are compared.
If you want to take account of the class itself in comparison, you'd have to write your own wrapper class. (One simple possibility would be to make a wrapper that includes the class name as an element of the tuple, so employee-Mike would become ("Employee", "Mike") and patient-Mike would be ("Patient", "Mike").)
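For example, here is a minimal sketch (the typed() helper is hypothetical, not part of collections) that folds the class name into the counted key:
from collections import Counter, namedtuple

Employee = namedtuple("Employee", "name")
Patient = namedtuple("Patient", "name")

def typed(record):
    # prepend the class name so equal-looking tuples of different types stay distinct
    return (type(record).__name__,) + tuple(record)

e = Employee("Mike")
p = Patient("Mike")
print(Counter(typed(r) for r in [e, p]))
# Counter({('Employee', 'Mike'): 1, ('Patient', 'Mike'): 1})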

What are the arguments of ElementTree.SubElement used for?

I have looked at the documentation here:
http://docs.python.org/dev/library/xml.etree.elementtree.html#xml.etree.ElementTree.SubElement
The parent and tag arguments seem clear enough, but in what format do I put the attribute name and value? I couldn't find any previous example. What format is the **extra argument?
I receive an error when trying to call SubElement itself, saying that it is not defined. Thank you.
SubElement is a function of the ElementTree module (not of Element) which lets you create child elements of an Element.
attrib takes a dictionary containing the attributes of the element you want to create.
**extra is used for additional keyword arguments; those will be added as attributes to the element.
Example:
>>> import xml.etree.ElementTree as ET
>>>
>>> parent = ET.Element("parent")
>>>
>>> myattributes = {"size": "small", "gender": "unknown"}
>>> child = ET.SubElement(parent, "child", attrib=myattributes, age="10" )
>>>
>>> ET.dump(parent)
<parent><child age="10" gender="unknown" size="small" /></parent>
>>>
If you look further down on the same page you linked to, where it deals with class xml.etree.ElementTree.Element(tag, attrib={}, **extra), it tells you how the extra arguments work, e.g.:
from xml.etree import ElementTree as ET
a = ET.Element('root-node', tag='This is an extra that sets a tag')
b = ET.SubElement(a, 'nested-node-1')
c = ET.SubElement(a, 'nested-node-2')
d = ET.SubElement(c, 'innermost-node')
ET.dump(a)
This also shows you how SubElement works: you simply tell it which element (which can itself be a subelement) you want to attach the new element to. For the future, supply some code too so it's easier to see what you're doing/want.

Convert Variable Name to String?

I would like to convert a python variable name into the string equivalent as shown. Any ideas how?
var = {}
print ??? # Would like to see 'var'
something_else = 3
print ??? # Would print 'something_else'
TL;DR: Not possible. See 'conclusion' at the end.
There is a usage scenario where you might need this. I'm not implying there are no better ways of achieving the same functionality.
This would be useful in order to 'dump' an arbitrary list of dictionaries in case of error, in debug modes and other similar situations.
What would be needed is the reverse of the eval() function:
get_indentifier_name_missing_function()
which would take an identifier ('variable', 'dictionary', etc.) as an argument, and return a
string containing the identifier's name.
Consider the following current state of affairs:
random_function(argument_data)
If one is passing an identifier name ('function','variable','dictionary',etc) argument_data to a random_function() (another identifier name), one actually passes an identifier (e.g.: <argument_data object at 0xb1ce10>) to another identifier (e.g.: <function random_function at 0xafff78>):
<function random_function at 0xafff78>(<argument_data object at 0xb1ce10>)
From my understanding, only the memory address is passed to the function:
<function at 0xafff78>(<object at 0xb1ce10>)
Therefore, one would need to pass a string as an argument to random_function() in order for that function to have the argument's identifier name:
random_function('argument_data')
Inside random_function() (defined as def random_function(first_argument):), one would use the already supplied string 'argument_data' to:
serve as an 'identifier name' (to display, log, string split/concat, whatever)
feed the eval() function in order to get a reference to the actual identifier, and therefore, a reference to the real data:
print("Currently working on", first_argument)
some_internal_var = eval(first_argument)
print("here comes the data: " + str(some_internal_var))
Unfortunately, this doesn't work in all cases. It only works if the random_function() can resolve the 'argument_data' string to an actual identifier. I.e. If argument_data identifier name is available in the random_function()'s namespace.
This isn't always the case:
# main1.py
import some_module1
argument_data = 'my data'
some_module1.random_function('argument_data')
# some_module1.py
def random_function(first_argument):
    print("Currently working on", first_argument)
    some_internal_var = eval(first_argument)
    print("here comes the data: " + str(some_internal_var))
######
Expected results would be:
Currently working on: argument_data
here comes the data: my data
Because argument_data identifier name is not available in the random_function()'s namespace, this would yield instead:
Currently working on argument_data
Traceback (most recent call last):
  File "~/main1.py", line 6, in <module>
    some_module1.random_function('argument_data')
  File "~/some_module1.py", line 4, in random_function
    some_internal_var = eval(first_argument)
  File "<string>", line 1, in <module>
NameError: name 'argument_data' is not defined
Now, consider the hypothetical usage of a get_indentifier_name_missing_function() which would behave as described above.
Here's some dummy Python 3.0 code:
# main2.py
import some_module2
some_dictionary_1 = { 'definition_1':'text_1',
                      'definition_2':'text_2',
                      'etc':'etc.' }
some_other_dictionary_2 = { 'key_3':'value_3',
                            'key_4':'value_4',
                            'etc':'etc.' }
#
# more such stuff
#
some_other_dictionary_n = { 'random_n':'random_n',
                            'etc':'etc.' }
for each_one_of_my_dictionaries in ( some_dictionary_1,
                                     some_other_dictionary_2,
                                     ...,
                                     some_other_dictionary_n ):
    some_module2.some_function(each_one_of_my_dictionaries)
# some_module2.py
def some_function(a_dictionary_object):
    for _key, _value in a_dictionary_object.items():
        print( get_indentifier_name_missing_function(a_dictionary_object) +
               " " +
               str(_key) +
               " = " +
               str(_value) )
######
Expected results would be:
some_dictionary_1 definition_1 = text_1
some_dictionary_1 definition_2 = text_2
some_dictionary_1 etc = etc.
some_other_dictionary_2 key_3 = value_3
some_other_dictionary_2 key_4 = value_4
some_other_dictionary_2 etc = etc.
......
......
......
some_other_dictionary_n random_n = random_n
some_other_dictionary_n etc = etc.
Unfortunately, get_indentifier_name_missing_function() would not see the 'original' identifier names (some_dictionary_1, some_other_dictionary_2, ..., some_other_dictionary_n). It would only see the a_dictionary_object identifier name.
Therefore the real result would rather be:
a_dictionary_object definition_1 = text_1
a_dictionary_object definition_2 = text_2
a_dictionary_object etc = etc.
a_dictionary_object key_3 = value_3
a_dictionary_object key_4 = value_4
a_dictionary_object etc = etc.
......
......
......
a_dictionary_object random_n = random_n
a_dictionary_object etc = etc.
So, the reverse of the eval() function won't be that useful in this case.
Currently, one would need to do this:
# main2.py same as above, except:
for each_one_of_my_dictionaries_names in ( 'some_dictionary_1',
                                           'some_other_dictionary_2',
                                           '...',
                                           'some_other_dictionary_n' ):
    some_module2.some_function( { each_one_of_my_dictionaries_names :
                                  eval(each_one_of_my_dictionaries_names) } )
# some_module2.py
def some_function(a_dictionary_name_object_container):
    for _dictionary_name, _dictionary_object in a_dictionary_name_object_container.items():
        for _key, _value in _dictionary_object.items():
            print( str(_dictionary_name) +
                   " " +
                   str(_key) +
                   " = " +
                   str(_value) )
######
In conclusion:
Python passes only memory addresses as arguments to functions.
Strings representing the name of an identifier, can only be referenced back to the actual identifier by the eval() function if the name identifier is available in the current namespace.
A hypothetical reverse of the eval() function, would not be useful in cases where the identifier name is not 'seen' directly by the calling code. E.g. inside any called function.
Currently one needs to pass to a function:
the string representing the identifier name
the actual identifier (memory address)
This can be achieved by passing both the 'string' and eval('string') to the called function at the same time. I think this is the most 'general' way of solving this chicken-and-egg problem across arbitrary functions, modules, and namespaces, without using corner-case solutions. The only downside is the use of the eval() function, which may easily lead to insecure code. Care must be taken not to feed the eval() function with just about anything, especially unfiltered external input data.
Totally possible with the python-varname package (python3):
from varname import nameof
s = 'Hey!'
print (nameof(s))
Output:
s
Install:
pip3 install varname
Or get the package here:
https://github.com/pwwang/python-varname
I searched for this question because I wanted a Python program to print assignment statements for some of the variables in the program. For example, it might print "foo = 3, bar = 21, baz = 432". The print function would need the variable names in string form. I could have provided my code with the strings "foo","bar", and "baz", but that felt like repeating myself. After reading the previous answers, I developed the solution below.
The globals() function behaves like a dict with variable names (in the form of strings) as keys. I wanted to retrieve from globals() the key corresponding to the value of each variable. The method globals().items() returns a list of tuples; in each tuple the first item is the variable name (as a string) and the second is the variable value. My variablename() function searches through that list to find the variable name(s) that corresponds to the value of the variable whose name I need in string form.
The function itertools.ifilter() does the search by testing each tuple in the globals().items() list with the function lambda x: var is globals()[x[0]]. In that function x is the tuple being tested; x[0] is the variable name (as a string) and x[1] is the value. The lambda function tests whether the value of the tested variable is the same as the value of the variable passed to variablename(). In fact, by using the is operator, the lambda function tests whether the name of the tested variable is bound to the exact same object as the variable passed to variablename(). If so, the tuple passes the test and is returned by ifilter().
The itertools.ifilter() function actually returns an iterator which doesn't return any results until it is called properly. To get it called properly, I put it inside a list comprehension [tpl[0] for tpl ... globals().items())]. The list comprehension saves only the variable name tpl[0], ignoring the variable value. The list that is created contains one or more names (as strings) that are bound to the value of the variable passed to variablename().
In the uses of variablename() shown below, the desired string is returned as an element in a list. In many cases, it will be the only item in the list. If another variable name is assigned the same value, however, the list will be longer.
>>> def variablename(var):
...     import itertools
...     return [tpl[0] for tpl in
...             itertools.ifilter(lambda x: var is x[1], globals().items())]
...
>>> var = {}
>>> variablename(var)
['var']
>>> something_else = 3
>>> variablename(something_else)
['something_else']
>>> yet_another = 3
>>> variablename(something_else)
['yet_another', 'something_else']
As long as it's a variable and not a second class, this works for me:
def print_var_name(variable):
    for name in globals():
        if eval(name) == variable:
            print name
foo = 123
print_var_name(foo)
>>>foo
this happens for class members:
class xyz:
    def __init__(self):
        pass
member = xyz()
print_var_name(member)
>>>member
and this for classes (as an example):
abc = xyz
print_var_name(abc)
>>>abc
>>>xyz
So for classes it gives you the name AND the properties.
This is not possible.
In Python, there really isn't any such thing as a "variable". What Python really has are "names" which can have objects bound to them. It makes no difference to the object what names, if any, it might be bound to. It might be bound to dozens of different names, or none.
Consider this example:
foo = 1
bar = 1
baz = 1
Now, suppose you have the integer object with value 1, and you want to work backwards and find its name. What would you print? Three different names have that object bound to them, and all are equally valid.
In Python, a name is a way to access an object, so there is no way to work with names directly. There might be some clever way to hack the Python bytecodes or something to get the value of the name, but that is at best a parlor trick.
If you know you want print foo to print "foo", you might as well just execute print "foo" in the first place.
EDIT: I have changed the wording slightly to make this more clear. Also, here is an even better example:
foo = 1
bar = foo
baz = foo
In practice, Python reuses the same object for integers with common values like 0 or 1, so the first example should bind the same object to all three names. But this example is crystal clear: the same object is bound to foo, bar, and baz.
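A quick check of that detail (small-integer caching is a CPython implementation detail, not a language guarantee):
foo = 1
bar = 1
baz = 1
print(foo is bar is baz)  # True on CPython: the cached int object 1 is bound to all three names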
Technically the information is available to you, but as others have asked, how would you make use of it in a sensible way?
>>> x = 52
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__',
'x': 52, '__doc__': None, '__package__': None}
This shows that the variable name is present as a string in the globals() dictionary.
>>> globals().keys()[2]
'x'
In this case it happens to be the third key, but there's no reliable way to know where a given variable name will end up
>>> for k in globals().keys():
...     if not k.startswith("_"):
...         print k
...
x
>>>
You could filter out system variables like this, but you're still going to get all of your own items. Just running that code above created another variable "k" that changed the position of "x" in the dict.
But maybe this is a useful start for you. If you tell us what you want this capability for, more helpful information could possibly be given.
By using the unpacking operator:
>>> def tostr(**kwargs):
...     return kwargs
...
>>> var = {}
>>> something_else = 3
>>> tostr(var=var, something_else=something_else)
{'var': {}, 'something_else': 3}
You somehow have to refer to the variable you want to print the name of. So it would look like:
print varname(something_else)
There is no such function, but if there were it would be kind of pointless. You have to type out something_else, so you might as well just type quotes to the left and right of it to print the name as a string:
print "something_else"
What are you trying to achieve? There is absolutely no reason to ever do what you describe, and there is likely a much better solution to the problem you're trying to solve.
The most obvious alternative to what you request is a dictionary. For example:
>>> my_data = {'var': 'something'}
>>> my_data['something_else'] = 'something'
>>> print my_data.keys()
['var', 'something_else']
>>> print my_data['var']
something
Mostly as a challenge, I implemented your desired output. Do not use this code, please!
#!/usr/bin/env python2.6
class NewLocals:
    """Please don't ever use this code.."""
    def __init__(self, initial_locals):
        self.prev_locals = list(initial_locals.keys())
    def show_new(self, new_locals):
        output = ", ".join(list(set(new_locals) - set(self.prev_locals)))
        self.prev_locals = list(new_locals.keys())
        return output
# Set up
eww = None
eww = NewLocals(locals())
# "Working" requested code
var = {}
print eww.show_new(locals()) # Outputs: var
something_else = 3
print eww.show_new(locals()) # Outputs: something_else
# Further testing
another_variable = 4
and_a_final_one = 5
print eww.show_new(locals()) # Outputs: another_variable, and_a_final_one
Does Django not do this when generating field names?
http://docs.djangoproject.com/en/dev//topics/db/models/#verbose-field-names
Seems reasonable to me.
I think this is a cool solution and I suppose the best you can get. But do you see any way to handle the ambiguous results your function may return?
As "is" operator behaves unexpectedly with integers shows, low integers and strings of the same value get cached by Python, so your variablename function might provide ambiguous results with high probability.
In my case, I would like to create a decorator that adds a new variable to a class using the variable name I pass it:
def inject(klass, dependency):
    klass.__dict__["__" + variablename(dependency)] = dependency
But if your method returns ambigous results, how can I know the name of the variable I added?
any_var = "myvarcontent"
myvar = "myvarcontent"
@inject(myvar)
class myclasss():
    def myclass_method(self):
        print self.__myvar  # I can not be sure that this variable will be set...
Maybe if I also check the locals list I could at least remove the "dependency" variable from the list, but this will not be a reliable result.
Here is a succinct variation that lets you specify any dictionary.
The issue with using dictionaries to find anything is that multiple variables can have the same value. So this code returns a list of possible variables.
def varname(var, dir=locals()):
    return [key for key, val in dir.items() if id(val) == id(var)]
I don't know whether it's right or not, but it worked for me:
def varname(variable):
    for name in list(globals().keys()):
        expression = f'id({name})'
        if id(variable) == eval(expression):
            return name
It is possible to a limited extent. The answer is similar to the solution by @tamtam.
The given example makes the following assumptions:
You are searching for a variable by its value
The variable has a distinct value
The value is in the global namespace
Example:
testVar = "unique value"
varNameAsString = [k for k,v in globals().items() if v == "unique value"]
#
# the variable "varNameAsString" will contain all the variable name that matches
# the value "unique value"
# for this example, it will be a list of a single entry "testVar"
#
print(varNameAsString)
Output : ['testVar']
You can extend this example for any other variable/data type
I'd like to point out a use case for this that is not an anti-pattern, and there is no better way to do it.
This seems to be a missing feature in python.
There are a number of functions, like patch.object, that take the name of a method or property to be patched or accessed.
Consider this:
patch.object(obj, "method_name", new_reg)
This can potentially start "false succeeding" when you change the name of a method. I.e. you can ship a bug you thought you were testing for, simply because of a bad method-name refactor.
Now consider: varname. This could be an efficient, built-in function. But for now it can work by iterating an object or the caller's frame:
Now your call can be:
patch.member(obj, obj.method_name, new_reg)
And the patch function can call:
varname(var, obj=obj)
This would: assert that the var is bound to the obj and return the name of the member. Or, if the obj is not specified, use the caller's stack frame to derive it, etc.
It could be made an efficient built-in at some point, but here's a definition that works. I deliberately didn't support builtins; easy to add, though:
Feel free to stick this in a package called varname.py, and use it in your patch.object calls:
patch.object(obj, varname(obj, obj.method_name), new_reg)
Note: this was written for python 3.
import inspect

def _varname_dict(var, dct):
    key_name = None
    for key, val in dct.items():
        if val is var:
            if key_name is not None:
                raise NotImplementedError("Duplicate names not supported %s, %s" % (key_name, key))
            key_name = key
    return key_name

def _varname_obj(var, obj):
    key_name = None
    for key in dir(obj):
        val = getattr(obj, key)
        equal = val is var
        if equal:
            if key_name is not None:
                raise NotImplementedError("Duplicate names not supported %s, %s" % (key_name, key))
            key_name = key
    return key_name

def varname(var, obj=None):
    if obj is None:
        if hasattr(var, "__self__"):
            return var.__name__
        caller_frame = inspect.currentframe().f_back
        try:
            ret = _varname_dict(var, caller_frame.f_locals)
        except NameError:
            ret = _varname_dict(var, caller_frame.f_globals)
    else:
        ret = _varname_obj(var, obj)
    if ret is None:
        raise NameError("Name not found. (Note: builtins not supported)")
    return ret
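A small usage sketch with hypothetical names (Example, payload), assuming the varname() defined above is in scope:
class Example:
    def target_method(self):
        pass

e = Example()
print(varname(e.target_method))  # 'target_method': bound methods expose __self__, so __name__ is returned

payload = {"unique": "object"}
print(varname(payload))          # 'payload': identity lookup in the caller's frame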
This will work for simple data types (str, int, float, list, etc.):
>>> def my_print(var_str):
...     print var_str + ':', globals()[var_str]
...
>>> a = 5
>>> b = ['hello', ',world!']
>>> my_print('a')
a: 5
>>> my_print('b')
b: ['hello', ',world!']
It's not very Pythonesque but I was curious and found this solution. You need to duplicate the globals dictionary since its size will change as soon as you define a new variable.
def var_to_name(var):
    # noinspection PyTypeChecker
    dict_vars = dict(globals().items())
    var_string = None
    for name in dict_vars.keys():
        if dict_vars[name] is var:
            var_string = name
            break
    return var_string

if __name__ == "__main__":
    test = 3
    print(f"test = {test}")
    print(f"variable name: {var_to_name(test)}")
which returns:
test = 3
variable name: test
To get the variable name of var as a string:
var = 1000
var_name = [k for k,v in locals().items() if v == var][0]
print(var_name) # ---> outputs 'var'
Thanks @restrepo, this was exactly what I needed to create a standard save_df_to_file() function. For this, I made some small changes to your tostr() function. Hope this will help someone else:
def variabletostr(**df):
    variablename = list(df.keys())[0]
    return variablename

variabletostr(df=0)
The original question is pretty old, but I found an almost solution with Python 3. (I say almost because I think you can get close to a solution but I do not believe there is a solution concrete enough to satisfy the exact request).
First, you might want to consider the following:
objects are a core concept in Python, and they may be assigned a variable, but the variable itself is a bound name (think pointer or reference) not the object itself
var is just a variable name bound to an object and that object could have more than one reference (in your example it does not seem to)
in this case, var appears to be in the global namespace, so you can use the conveniently named globals() builtin
different name references to the same object will all share the same id, which can be checked by running the id() builtin like so: id(var)
This function grabs the global variables and filters out the ones matching the content of your variable.
def get_bound_names(target_variable):
    '''Returns a list of bound object names.'''
    return [k for k, v in globals().items() if v is target_variable]
The real challenge here is that you are not guaranteed to get back the variable name by itself. It will be a list, but that list will contain the variable name you are looking for. If your target variable (bound to an object) is really the only bound name, you could access it this way:
bound_names = get_bound_names(target_variable)
var_string = bound_names[0]
Possible for Python >= 3.8 (with the f'{var=}' string)
Not sure if this could be used in production code, but in Python 3.8 (and up) you can use the f-string debugging specifier. Add = at the end of an expression, and it will print both the expression and its value:
my_salary_variable = 5000
print(f'{my_salary_variable = }')
Output:
my_salary_variable = 5000
To uncover this magic here is another example:
param_list = f'{my_salary_variable=}'.split('=')
print(param_list)
Output:
['my_salary_variable', '5000']
Explanation: when you put '=' after your variable in an f-string, it returns a string with the variable name, '=', and its value. Split it with .split('=') and you get a list of 2 strings: [0] is your variable name, and [1] is the string representation of its value.
Pick up [0] element of the list if you need variable name only.
my_salary_variable = 5000
param_list = f'{my_salary_variable=}'.split('=')
print(param_list[0])
Output:
my_salary_variable
or, in one line
my_salary_variable = 5000
print(f'{my_salary_variable=}'.split('=')[0])
Output:
my_salary_variable
Works with functions too:
def my_super_calc_foo(number):
return number**3
print(f'{my_super_calc_foo(5) = }')
print(f'{my_super_calc_foo(5)=}'.split('='))
Output:
my_super_calc_foo(5) = 125
['my_super_calc_foo(5)', '125']
Process finished with exit code 0
This module works for converting variables names to a string:
https://pypi.org/project/varname/
Use it like this:
from varname import nameof
variable=0
name=nameof(variable)
print(name)
# output: variable
Install it by:
pip install varname
print "var"
print "something_else"
Or did you mean something_else?
