Move an entire element in with lxml.etree - python

Within lxml, is it possible, given an element, to move the entire thing elsewhere in the XML document without having to read all of its children and recreate it? My best example would be changing parents. I've rummaged around the docs a bit but haven't had much luck. Thanks in advance!

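.append(), .insert() and similar operations do that by default: adding an element that already sits in a tree moves it, together with its whole subtree, rather than copying it.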
>>> from lxml import etree
>>> tree = etree.XML('<a><b><c/></b><d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_d.append(node_b)
>>> etree.tostring(tree) # complete 'b'-branch is now under 'd', after 'e'
'<a><d><e><f/></e><b><c/></b></d></a>'
>>> node_f = tree.xpath('/a/d/e/f')[0] # Nothing stops us from moving it again
>>> node_f.append(node_b) # Now 'b' and its child are under 'f'
>>> etree.tostring(tree)
'<a><d><e><f><b><c/></b></f></e></d></a>'
Be careful when moving nodes that have tail text. In lxml, tail text belongs to the node and moves around with it. (Also, when you delete a node, its tail text is deleted too.)
>>> tree = etree.XML('<a><b><c/></b>TAIL<d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_d.append(node_b)
>>> etree.tostring(tree)
'<a><d><e><f/></e><b><c/></b>TAIL</d></a>'
Sometimes that is the desired effect, but sometimes you will need something like this:
>>> tree = etree.XML('<a><b><c/></b>TAIL<d><e><f/></e></d></a>')
>>> node_b = tree.xpath('/a/b')[0]
>>> node_d = tree.xpath('/a/d')[0]
>>> node_a = tree.xpath('/a')[0]
>>> # Manually move text
>>> node_a.text = node_b.tail
>>> node_b.tail = None
>>> node_d.append(node_b)
>>> etree.tostring(tree) # Now the TAIL text stays in its old place
'<a>TAIL<d><e><f/></e><b><c/></b></d></a>'
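The manual version above assigns the tail to node_a.text, which works there because 'b' is the first child of 'a' and 'a' has no text of its own. A slightly more general sketch follows. It assumes Python 3 and uses a hypothetical helper name, move_keep_tail, that is not part of lxml. The idea is to hand the tail to the previous sibling if there is one, otherwise to the parent's text, and only then move the element.

```python
from lxml import etree

def move_keep_tail(elem, new_parent):
    """Move elem under new_parent, leaving elem's tail text where it was.

    Hypothetical helper for illustration only, not an lxml API.
    """
    tail = elem.tail
    if tail:
        prev = elem.getprevious()
        if prev is not None:
            # The text now follows the previous sibling instead.
            prev.tail = (prev.tail or '') + tail
        else:
            parent = elem.getparent()
            if parent is not None:
                # elem was the first child: the text joins the parent's text.
                parent.text = (parent.text or '') + tail
        elem.tail = None
    new_parent.append(elem)  # append() moves the element, it does not copy it

tree = etree.XML('<a>x<b><c/></b>TAIL<d><e><f/></e></d></a>')
move_keep_tail(tree.find('b'), tree.find('d'))
print(etree.tostring(tree).decode())
```

Here the existing text 'x' is kept and 'TAIL' is appended to it rather than overwriting it.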

You could use the .append() or .insert() methods to add a subelement to an existing element:
>>> from lxml import etree
>>> from_ = etree.fromstring("<from/>")
>>> to = etree.fromstring("<to/>")
>>> to.append(from_)
>>> etree.tostring(to)
'<to><from/></to>'
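The example above only shows .append(). As a small sketch of .insert() (assuming Python 3, with an element layout made up for illustration), the first argument is the position inside the new parent, and an element that already lives in the tree is moved rather than copied:

```python
from lxml import etree

root = etree.XML('<a><b/><c/><d><e/></d></a>')
node_e = root.find('d/e')

# Insert 'e' as the first child of 'a'; it disappears from under 'd'.
root.insert(0, node_e)
print(etree.tostring(root).decode())
```

After the call, node_e.getparent() is the root element and 'd' is empty.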

Related

Algorithm to split the values of a list into a specific format

Can you help me with my algorithm in Python to parse a list, please?
List = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU', 'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_TITI_TUTU', 'PPPP_TOTO_EHEH_OOOO_AAAAA', 'PPPP_TOTO_EHEH_IIII_SSSS_RRRR']
From each item in this list, I need the last two words (PARENT_CHILD). For example, for PPPP_TOTO_TATA_TITI_TUTU, I only keep TITI_TUTU.
When there are duplicates, that is, when my list contains both PPPP_TOTO_TATA_TITI_TUTU and PPPP_TOTO_EHEH_TITI_TUTU, I would get TITI_TUTU twice. In that case I want to include the GRANDPARENT for each of them: TATA_TITI_TUTU and EHEH_TITI_TUTU.
As long as the names are duplicated, we take the level above.
But if I add the GRANDPARENT for EHEH_TITI_TUTU, I also want it added for every item that has EHEH in its name, so instead of OOOO_AAAAA I would like to have EHEH_OOOO_AAAAA and EHEH_IIII_SSSS_RRRR.
My final list =
['ZZZZ_XXXX', 'TATA_TITI_TUTU', 'MMMM_TITI_TUTU', 'EHEH_TITI_TUTU', 'EHEH_OOOO_AAAAA', 'EHEH_IIII_SSSS_RRRR']
Thank you in advance.
Here is the code I started to write:
json_paths = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU',
              'PPPP_TOTO_EHEH_TITI_TUTU', 'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_OOOO_AAAAA']
cols_name = []
for path in json_paths:
    acc=2
    col_name = '_'.join(path.split('_')[-acc:])
    tmp = cols_name
    while col_name in tmp:
        acc += 1
        idx = tmp.index(col_name)
        cols_name[idx] = '_'.join(json_paths[idx].split('_')[-acc:])
        col_name = '_'.join(path.split('_')[-acc:])
        tmp = ['_'.join(item.split('_')[-acc:]) for item in json_paths].pop()
    cols_name.append(col_name)
    print(cols_name.index(col_name), col_name)
cols_name
help ... with ... algorithm
Use a dictionary as the initial container while iterating.
The keys will be the PARENT_CHILD strings and the values will be lists of grandparents.
>>> import collections
>>> s = 'PPPP_TOTO_TATA_TITI_TUTU'
>>> d = collections.defaultdict(list)
>>> *_,grandparent,parent,child = s.rsplit('_',maxsplit=3)
>>> d['_'.join([parent,child])].append(grandparent)
>>> d
defaultdict(<class 'list'>, {'TITI_TUTU': ['TATA']})
>>> s = 'PPPP_TOTO_EHEH_TITI_TUTU'
>>> *_,grandparent,parent,child = s.rsplit('_',maxsplit=3)
>>> d['_'.join([parent,child])].append(grandparent)
>>> d
defaultdict(<class 'list'>, {'TITI_TUTU': ['TATA', 'EHEH']})
After iterating, determine whether any value contains multiple grandparents.
If it does, prepend each grandparent to the parent_child.
Additionally, find all the other parent_child entries that have one of these grandparents and prepend the grandparent to them as well. To make this easier, build a second dictionary during iteration: {grandparent: [list_of_children], ...}.
If the parent_child has only one grandparent, use it as-is.
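The steps above can be sketched roughly as follows. This is one possible reading of the outline, not a definitive implementation. The function name shorten is made up for illustration, a set of "needed" grandparents stands in for the second dictionary, and every path is assumed to have at least three '_'-separated words.

```python
import collections

def shorten(paths):
    """Return one shortened name per path (hypothetical helper)."""
    grandparents = collections.defaultdict(list)  # parent_child -> [grandparent, ...]
    parts = []
    for path in paths:
        # Keep only the last three words; anything before them is discarded.
        *_, grandparent, parent, child = path.rsplit('_', maxsplit=3)
        parent_child = parent + '_' + child
        grandparents[parent_child].append(grandparent)
        parts.append((grandparent, parent_child))

    # Grandparents that were needed to tell duplicated parent_child names apart.
    needed = {gp for gps in grandparents.values() if len(gps) > 1 for gp in gps}

    # Prepend the grandparent wherever it is one of the "needed" ones.
    return [gp + '_' + pc if gp in needed else pc for gp, pc in parts]

paths = ['PPPP_YYYY_ZZZZ_XXXX', 'PPPP_TOTO_TATA_TITI_TUTU',
         'PPPP_TOTO_MMMM_TITI_TUTU', 'PPPP_TOTO_EHEH_TITI_TUTU',
         'PPPP_TOTO_EHEH_OOOO_AAAAA', 'PPPP_TOTO_EHEH_IIII_SSSS_RRRR']
result = shorten(paths)
print(result)
```

This sketch only ever looks one level up. For the last path the grandparent of SSSS_RRRR is IIII, not EHEH, so it yields SSSS_RRRR rather than the EHEH_IIII_SSSS_RRRR asked for in the question. Matching that output would need an extra pass over words further up the path. Names that are still duplicated after adding the grandparent are not resolved either.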
Instead of splitting each string, the information could be extracted with a regular expression:
pattern = r'^.*?_([^_]*)_([^_]*_[^_]*)$'
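As a quick illustration of what that pattern captures (a sketch; the loop and variable names are only for demonstration), group 1 is the grandparent and group 2 is the parent_child pair:

```python
import re

pattern = r'^.*?_([^_]*)_([^_]*_[^_]*)$'

pairs = []
for s in ('PPPP_TOTO_TATA_TITI_TUTU', 'PPPP_YYYY_ZZZZ_XXXX'):
    match = re.match(pattern, s)
    if match:  # strings with fewer than three underscores do not match
        grandparent, parent_child = match.groups()
        pairs.append((grandparent, parent_child))
        print(grandparent, parent_child)
```

Unlike the rsplit approach, the pattern needs at least four words, so a three-word string such as 'A_B_C' returns no match and has to be handled separately.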

Breadth First Search traversal for LXML Files Python

I am trying to perform a breadth-first search (BFS) traversal on an XML file.
A depth-first search algorithm is shown in the lxml documentation at https://lxml.de/3.3/api.html#lxml-etre. However, I need help adapting this code to perform a BFS.
Below is the code given in the documentation:
>>> root = etree.XML('<root><a><b/><c/></a><d><e/></d></root>')
>>> print(etree.tostring(root, pretty_print=True, encoding='unicode'))
<root>
  <a>
    <b/>
    <c/>
  </a>
  <d>
    <e/>
  </d>
</root>
>>> queue = deque([root])
>>> while queue:
...     el = queue.popleft()  # pop next element
...     queue.extend(el)      # append its children
...     print(el.tag)
root
a
d
b
c
e
I need help adapting it for BFS traversal. Below is the code I tried to write, but it doesn't work correctly. Can someone please help?
My code:
>>> from collections import deque
>>> d = deque([root])
>>> while d:
...     el = d.pop()
...     d.extend(el)
...     print(el.tag)
Thank you.
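Your BFS implementation is popping from the wrong end of the queue. d.pop() removes the most recently added element (the right end), which makes the deque behave like a stack and gives a depth-first order. Use popleft() rather than pop():
d = deque([root])
while d:
    el = d.popleft()
    d.extend(el)
    print(el.tag)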
This can also be implemented with XPath:
>>> root = etree.XML('<root><a><b/><c><f/></c></a><d><e/></d></root>')
>>> queue = deque([root])
>>> while queue:
...     el = queue.popleft()
...     queue.extend(el.xpath('./child::*'))
...     print(el.tag)
...
root
a
d
b
c
e
f
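If the depth of each element is also useful, the same queue idea can be wrapped in a generator. The sketch below is an illustration only: the name bfs is made up, and it uses the standard library's xml.etree.ElementTree so it runs without lxml. Iterating over an lxml element yields its children in the same way, so the loop should carry over unchanged.

```python
from collections import deque
import xml.etree.ElementTree as ET

def bfs(root):
    """Yield (element, depth) pairs in breadth-first order."""
    queue = deque([(root, 0)])
    while queue:
        el, depth = queue.popleft()  # take from the left: oldest entry first
        yield el, depth
        # Iterating over an element yields its direct children.
        queue.extend((child, depth + 1) for child in el)

root = ET.fromstring('<root><a><b/><c><f/></c></a><d><e/></d></root>')
order = [(el.tag, depth) for el, depth in bfs(root)]
print(order)
```

The tags come out level by level: root, then a and d, then b, c and e, and finally f.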

Python list items encoding

Why does the encoding change in Python 2.7 when I iterate over the items of a list?
test_list = ['Hafst\xc3\xa4tter', 'asbds#ages.at']
Printing the list:
print(test_list)
gets me this output:
['Hafst\xc3\xa4tter', 'asbds#ages.at']
So far, so good. But why is it that when I iterate over the list, such as:
for item in test_list:
    print(item)
I get this output:
Hafstätter
asbds#ages.at
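Why does the encoding change (does it?), and how can I change the encoding within the list?
The encoding isn't changing; these are just two different ways of displaying a string. print(test_list) displays the repr() of each item, which shows the non-ASCII bytes as escape codes for debugging, while print(item) writes the string's bytes as they are and your terminal decodes them: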
>>> test_list = ['Hafst\xc3\xa4tter', 'asbds#ages.at']
>>> print(test_list)
['Hafst\xc3\xa4tter', 'asbds#ages.at']
>>> for item in test_list:
...     print(item)
...
Hafstätter
asbds#ages.at
But they are equivalent:
>>> 'Hafst\xc3\xa4tter' == 'Hafstätter'
True
If you want to see lists displayed with the non-debugging output, you have to generate it yourself:
>>> print("['"+"', '".join(item for item in test_list) + "']")
['Hafstätter', 'asbds#ages.at']
There is a reason for the debugging output:
>>> a = 'a\xcc\x88'
>>> b = '\xc3\xa4'
>>> a
'a\xcc\x88'
>>> print a,b # should look the same, if not it is the browser's fault :)
ä ä
>>> a==b
False
>>> [a,b] # In a list you can see the difference by default.
['a\xcc\x88', '\xc3\xa4']
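For comparison, here is a rough Python 3 counterpart of the session above. It is a sketch rather than part of the original answer, since the thread is about Python 2.7 byte strings. In Python 3 the two byte sequences correspond to the text strings 'a\u0308' and '\u00e4'. ascii() gives the escaped debugging view, and unicodedata.normalize shows that the two differ only in normalization form.

```python
import unicodedata

decomposed = 'a\u0308'   # 'a' followed by COMBINING DIAERESIS
precomposed = '\u00e4'   # LATIN SMALL LETTER A WITH DIAERESIS, one code point

print(decomposed, precomposed)         # usually rendered identically
print(decomposed == precomposed)       # they are different strings
print(ascii(decomposed), ascii(precomposed))  # escaped view shows the difference

# After NFC normalization the two compare equal.
same_after_nfc = unicodedata.normalize('NFC', decomposed) == precomposed
print(same_after_nfc)

# Printing a list uses repr() of each item, not str().
items = ['Hafst\u00e4tter', 'asbds#ages.at']
list_uses_repr = str(items) == '[' + ', '.join(repr(i) for i in items) + ']'
print(list_uses_repr)
```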

create multiple ppt slides with python

I want to create the slides based on a list, but somehow I can't grasp the logic. My code right now looks like this:
totalSheets = [0, 1, 2]
totalSlides = ['slide', 'slide2', 'slide3']
prs = Presentation()
blank_slide_layout = prs.slide_layouts[6]
for sheet, slide in zip(totalSheets, totalSlides):
    sheetExcel = excelFile.sheet_by_index(sheet)
    slide = prs.slides.add_slide(blank_slide_layout)
The part I get wrong is the slide variable. I was wondering whether I can do something like slide(n) and just do n += 1. Thanks in advance for any help.
If you actually want to pick a random item from the list, you can use this:
>>> import random
>>> totalSlides = ['slide', 'slide2', 'slide3']
>>> random.choice(totalSlides)
'slide3'
>>> random.choice(totalSlides)
'slide'
To pick multiple items from the list, you can try this:
>>> import random
>>> totalSlides = ['slide', 'slide2', 'slide3']
>>> random.sample(totalSlides, len(totalSlides))
['slide2', 'slide3', 'slide']
>>> random.sample(totalSlides, len(totalSlides))
['slide3', 'slide', 'slide2']
>>> random.sample(totalSlides, len(totalSlides))
['slide2', 'slide', 'slide3']
>>> random.sample(totalSlides, len(totalSlides))
['slide3', 'slide2', 'slide']

Python Copying part of string

I have this line:
Server:x.x.x.x # U:100 # P:100 # Pre:0810 # Tel:xxxxxxxxxx
and I want to copy the value 0810, which comes after Pre:.
How can I do that?
You could use the re module ('re' stands for regular expressions).
This solution assumes that your Pre: field will always have four digits.
If the length of the number varies, you could replace the {4} (expect exactly four) with + (expect one or more).
>>> import re
>>> x = "Server:x.x.x.x # U:100 # P:100 # Pre:0810 # Tel:xxxxxxxxxx"
>>> num = re.findall(r'Pre:(\d{4})', x)[0] # re.findall returns a list
>>> print num
0810
You can read about it here:
https://docs.python.org/2/howto/regex.html
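A small Python 3 sketch of the same idea, using the + quantifier mentioned above and handling the case where the field is missing. The function name extract_pre is made up for illustration. re.search returns None when nothing matches, which avoids the IndexError that indexing an empty findall() result would raise.

```python
import re

def extract_pre(line):
    """Return the digits after 'Pre:' or None if the field is absent."""
    match = re.search(r'Pre:(\d+)', line)
    return match.group(1) if match else None

line = "Server:x.x.x.x # U:100 # P:100 # Pre:0810 # Tel:xxxxxxxxxx"
print(extract_pre(line))
print(extract_pre("Server:x.x.x.x # U:100"))
```

The value comes back as a string, so the leading zero in 0810 is preserved.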
As usual in these cases, the best answer depends upon how your strings will vary, and we only have one example to generalize from.
Anyway, you could use string methods like str.split to get at it directly:
>>> s = "Server:x.x.x.x # U:100 # P:100 # Pre:0810 # Tel:xxxxxxxxxx"
>>> s.split()[6].split(":")[1]
'0810'
But I tend to prefer more general solutions. For example, depending on how the format changes, something like
>>> d = dict(x.split(":") for x in s.split(" # "))
>>> d
{'Pre': '0810', 'P': '100', 'U': '100', 'Tel': 'xxxxxxxxxx', 'Server': 'x.x.x.x'}
which makes a dictionary of all the values, after which you could access any element:
>>> d["Pre"]
'0810'
>>> d["Server"]
'x.x.x.x'
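One caveat with the dictionary approach: x.split(":") produces more than two pieces if a value itself contains a colon, and dict() then raises a ValueError. A hedged variant in Python 3 limits the split to the first colon. The function name parse_fields and the Time field are made up for illustration.

```python
def parse_fields(line, sep=" # "):
    """Split 'key:value' fields into a dict, splitting each field only once."""
    return dict(field.split(":", 1) for field in line.split(sep))

s = "Server:x.x.x.x # U:100 # P:100 # Pre:0810 # Tel:xxxxxxxxxx"
fields = parse_fields(s)
print(fields["Pre"])

# A value containing ':' survives because of the maxsplit argument.
with_colon = parse_fields("Time:12:30 # Pre:0810")
print(with_colon)
```

A field with no colon at all would still raise a ValueError, so this assumes every field has the key:value shape shown in the question.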
