Text of XML element not reassigned to new value - Python

I have some XML defined in a string, my_xml.
I then tried to duplicate an element several times and change some of the text values.
my_xml = """<root><foo><bar>spamm.xml</bar></foo></root>"""
from xml.etree import ElementTree as et
tree = et.fromstring(my_xml)
el = list(tree)[0].copy()
tree.insert(0, el)
tree.insert(0, el)
cnt = 0
elements = [elem for elem in tree.iter() if elem.text is not None]
for elem in elements:
    if cnt != 0:
        print elem.text[:4]+str(cnt)+elem.text[5:]
        elem.text = elem.text[:4]+str(cnt)+elem.text[5:]  # strange behaviour
    cnt += 1
print et.tostring(tree)
Why does the line elem.text = elem.text[:4]+str(cnt)+elem.text[5:] not give each element its own new text value?
Expected output
<root>
<foo><bar>spamm.xml</bar></foo>
<foo><bar>spamm1.xml</bar></foo>
<foo><bar>spamm2.xml</bar></foo>
</root>
Actual output
<root>
<foo><bar>spam2.xml</bar></foo>
<foo><bar>spam2.xml</bar></foo>
<foo><bar>spam2.xml</bar></foo>
</root>

The problem is in your copy phase:
You should make a new copy for each insertion; otherwise both inserted elements are the same object.
You should use copy.deepcopy(), because a shallow copy still shares the child bar element with the original.
I use Python 3, where the copy() method wasn't available, so I had to use the copy module. I call deepcopy once per inserted element (otherwise you're copying only once) to make sure all references are duplicated.
The part of the code I changed (better with a loop):
import copy
tree = et.fromstring(my_xml)
for _ in range(2):
    el = copy.deepcopy(list(tree)[0])
    tree.insert(0, el)
result:
<root><foo><bar>spamm.xml</bar></foo><foo><bar>spam1.xml</bar></foo><foo><bar>spam2.xml</bar></foo></root>
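To see why the shallow copy misbehaves, it may help to check object identity directly. The sketch below is only an illustration in Python 3: it uses copy.copy() as a stand-in for the question's el.copy(), and it swaps the question's slicing for rsplit() (my substitution, not part of the original answer), because elem.text[:4] + ... + elem.text[5:] replaces the fifth character instead of inserting the counter.

```python
import copy
from xml.etree import ElementTree as et

my_xml = "<root><foo><bar>spamm.xml</bar></foo></root>"
tree = et.fromstring(my_xml)
original_foo = tree[0]

# Shallow copy: a new <foo>, but it holds the very same <bar> child object.
shallow = copy.copy(original_foo)
shares_child = shallow[0] is original_foo[0]

# Deep copy: the <bar> child is duplicated as well.
deep = copy.deepcopy(original_foo)
independent_child = deep[0] is not original_foo[0]

# Insert two independent copies, then number every <bar> except the first.
for _ in range(2):
    tree.insert(0, copy.deepcopy(tree[0]))

for cnt, elem in enumerate(tree.iter("bar")):
    if cnt != 0:
        # Insert the counter before the extension instead of slicing.
        stem, ext = elem.text.rsplit(".", 1)
        elem.text = "{}{}.{}".format(stem, cnt, ext)

result = et.tostring(tree, encoding="unicode")
print(shares_child, independent_child)
print(result)
```

With independent copies, each bar element keeps its own text, so the three values end up different.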

Related

Get match and unmatch items from two lists

I have to compare two lists for the matching and non matching elements and print them out. I have tried the below code:
list1 = ["prefencia","santro ne prefence"]
I'm fetching all the text from a webpage using Selenium's getText() method. The text is stored in a string variable, which is then assigned to list2:
str = "Centro de prefencia de lilly ac"
list2 = []
list2 = str
for item in list1:
    if item in list2:
        print("match:", item)
    else:
        print("no_match:", item)
Result of the above code:
match:prefencia
It seems the in keyword is working like a "contains" check. I want to search for an exact match between the elements in list1 and the elements in list2.
There is at least one problem here:
list1 = ["prefencia","santro ne prefence"]
str = "Centro de prefencia de lilly ac"
list2 = []
list2 = str
Do you want the list2 variable to be a list? Right now you set it to an empty list and, on the next line, rebind it to the string str, so item in list2 performs a substring test.
How about something like this (I'm guessing at what you want to do):
list1 = ["prefencia","santro ne prefence"]
mystr = "Centro de prefencia de lilly ac"
list2 = mystr.split(' ')  # splits your string into a list of words
for item in list1:
    if item in list2:
        print("match:", item)
    else:
        print("no_match:", item)
But if you split your string into a list of words, you'll never get an exact match for multi-word phrases such as "santro ne prefence".
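One possible way around the multi-word limitation is to keep the page text as a single string and search it with a regular expression that uses \b word boundaries, so whole words and whole phrases match but fragments of words do not. This is a minimal sketch under that assumption; the helper name find_matches and the variable page_text are made up for illustration.

```python
import re

def find_matches(phrases, text):
    """Split phrases into those found as whole words/phrases in text and the rest."""
    matched, unmatched = [], []
    for phrase in phrases:
        # \b anchors the phrase at word boundaries; re.escape protects
        # any regex metacharacters the phrase might contain.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            matched.append(phrase)
        else:
            unmatched.append(phrase)
    return matched, unmatched

list1 = ["prefencia", "santro ne prefence"]
page_text = "Centro de prefencia de lilly ac"

matched, unmatched = find_matches(list1, page_text)
for item in matched:
    print("match:", item)
for item in unmatched:
    print("no_match:", item)
```

Unlike a plain substring test, a fragment such as "pref" would be reported as no_match here, because it is followed by more word characters in the text.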

Breadth First Search traversal for LXML Files Python

I am working on performing a breadth-first search (BFS) traversal on an XML File.
A depth-first search algorithm is shown in the lxml documentation at https://lxml.de/3.3/api.html#lxml-etre. However, I need help applying BFS based on this code.
Below is the code given in the documentation:
>>> root = etree.XML('<root><a><b/><c/></a><d><e/></d></root>')
>>> print(etree.tostring(root, pretty_print=True, encoding='unicode'))
<root>
  <a>
    <b/>
    <c/>
  </a>
  <d>
    <e/>
  </d>
</root>
>>> queue = deque([root])
>>> while queue:
...     el = queue.popleft()  # pop next element
...     queue.extend(el)      # append its children
...     print(el.tag)
root
a
d
b
c
e
I need help adapting it for BFS traversal. Below is the code I tried to write, but it doesn't work correctly. Can someone please help?
My Code:
>>> from collections import deque
>>> d = deque([root])
>>> while d:
...     el = d.pop()
...     d.extend(el)
...     print(el.tag)
Thank You
Your implementation of BFS is currently popping from the wrong end of your queue. You should use popleft() rather than pop().
d = deque([root])
while d:
    el = d.popleft()
    d.extend(el)
    print(el.tag)
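The difference between the two ends of the deque can be checked side by side. The sketch below is an assumption-laden illustration: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without third-party packages (iterating an element yields its direct children in both), and the function names bfs_tags and lifo_tags are made up.

```python
from collections import deque
from xml.etree import ElementTree as ET

def bfs_tags(root):
    """Visit elements level by level: take from the left (FIFO)."""
    order = []
    queue = deque([root])
    while queue:
        el = queue.popleft()   # oldest element first
        queue.extend(el)       # iterating an element yields its children
        order.append(el.tag)
    return order

def lifo_tags(root):
    """Same loop with pop(): newest element first, so it dives depth-first."""
    order = []
    stack = deque([root])
    while stack:
        el = stack.pop()
        stack.extend(el)
        order.append(el.tag)
    return order

root = ET.fromstring("<root><a><b/><c/></a><d><e/></d></root>")
print(bfs_tags(root))
print(lifo_tags(root))
```

With pop(), the most recently added child is handled before its siblings' turn comes, which is why the question's version does not produce level order.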
It can also be implemented with XPath:
>>> root = etree.XML('<root><a><b/><c><f/></c></a><d><e/></d></root>')
>>> queue = deque([root])
>>> while queue:
...     el = queue.popleft()
...     queue.extend(el.xpath('./child::*'))
...     print(el.tag)
...
root
a
d
b
c
e
f

python 3 list comprehension with if clause referring to list

I have this code:
from collections import Counter
xtralist = ["df","cvbcb","df"]
kont = []
b = Counter(xtralist)
for item in xtralist:
    if item not in kont:
        print(b[item])
        kont.append(item)
The kont list is only there to see if the printing for that item has been done before. It works but is too slow for large xtralist, so I tried this:
[(print(b[item] and kont.append(item)) for item in xtralist if item not in kont]
which doesn't work. I am sure there are smarter ways, but how can I do this with a list comprehension?
set is indeed the way to go. If the order is important though, you'll need to use a list and a set:
xtralist = ["df","cvbcb","df"]
already_seen = set()
for item in xtralist:
    if item not in already_seen:
        print(item)
        already_seen.add(item)
It outputs:
df
cvbcb
If you want to display the number of occurrences, you can modify your code slightly:
from collections import Counter
xtralist = ["df","cvbcb","df"]
kont = set()
b = Counter(xtralist)
for item in xtralist:
    if item not in kont:
        print("%s * %d" % (item, b[item]))
        kont.add(item)
It outputs:
df * 2
cvbcb * 1
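If a comprehension is still wanted, one option is to iterate over the Counter itself rather than over xtralist, which removes the need for a separate "seen" collection. This sketch assumes Python 3.7 or later, where a Counter (being a dict subclass) remembers the order in which keys were first inserted; the name lines is made up for illustration.

```python
from collections import Counter

xtralist = ["df", "cvbcb", "df"]
counts = Counter(xtralist)

# Each distinct item appears exactly once in counts, in first-seen order,
# so no membership test against a "seen" list or set is required.
lines = ["%s * %d" % (item, n) for item, n in counts.items()]
print("\n".join(lines))
```

The comprehension only builds the strings; printing happens once afterwards, which avoids using a comprehension purely for its side effects.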
All you need to do is enclose your xtralist into a set so that it doesn't have any duplicate elements, and then just print each item in the set:
xtralist = ["df", "cvbcb", "df"]
print('\n'.join(set(xtralist)))
df
cvbcb
To dig into the code a little bit, check out each step:
>>> xtralist = ["df", "cvbcb", "df"]
>>> xtralist
['df', 'cvbcb', 'df']
>>> set(xtralist)
{'df', 'cvbcb'}
>>> '\n'.join(set(xtralist))
'df\ncvbcb'
>>> print('\n'.join(set(xtralist)))
df
cvbcb
However, note that a set is unordered, so if order is important, you'll have to iterate through your list; a little modification to what you were trying would work; you don't need a Counter at all:
xtralist = ["df", "cvbcb", "df"]
kont = []
for item in xtralist:
    if item not in kont:
        print(item)
        kont.append(item)
df
cvbcb

Group elements of a list based on repetition of values

I am really new to Python and I am having an issue figuring out the problem below.
I have a list like:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
And I want to group the elements based on the occurrence of elements that start with 'testOne'.
Expected Result:
new_list=[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]
Just start a new list at every testOne.
>>> new_list = []
>>> for item in my_list:
...     if item.startswith('testOne:'):
...         new_list.append([])
...     new_list[-1].append(item)
...
>>> new_list
[['testOne:100', 'testTwo:88', 'testThree:76'], ['testOne:78', 'testTwo:88'], ['testOne:73', 'testTwo:66', 'testThree:90']]
Not a cool one-liner, but this works also with more general labels:
result = [[]]
seen = set()
for entry in my_list:
    test, val = entry.split(":")
    if test in seen:
        result.append([entry])
        seen = {test}
    else:
        result[-1].append(entry)
        seen.add(test)
Here, we are keeping track of the test labels we've already seen in a set and starting a new list whenever we encounter a label we've already seen in the same list.
Alternatively, assuming the lists always start with testOne, you could just start a new list whenever the label is testOne:
result = []
for entry in my_list:
    test, val = entry.split(":")
    if test == "testOne":
        result.append([entry])
    else:
        result[-1].append(entry)
It'd be nice to have an easy one liner, but I think it'd end up looking a bit too complicated if I tried that. Here's what I came up with:
# Create a list of the starting indices:
ind = [i for i, e in enumerate(my_list) if e.split(':')[0] == 'testOne']
# Create a list of slices using pairs of indices:
new_list = [my_list[i:j] for (i, j) in zip(ind, ind[1:] + [None])]
Not very sophisticated but it works:
my_list = ['testOne:100', 'testTwo:88', 'testThree:76', 'testOne:78', 'testTwo:88', 'testOne:73', 'testTwo:66', 'testThree:90']
splitting_word = 'testOne'
new_list = list()
partial_list = list()
for item in my_list:
    if item.startswith(splitting_word) and partial_list:
        new_list.append(partial_list)
        partial_list = list()
    partial_list.append(item)
new_list.append(partial_list)
Another approach uses string operations. First, join the list into a single string with the delimiter |:
step1 = "|".join(my_list)
Then split that string on 'testOne':
step2 = step1.split("testOne")
Finally, prepend "testOne" to each chunk again and split on | to get the result, dropping empty strings:
new_list = [[part for part in ('testOne' + chunk).split("|") if part] for chunk in step2[1:]]

Refresh a list content with another list in Python

How would I extend the content of a given list with another given list without using the method .extend()? I imagine that I could use something with dictionaries.
Code
>>> tags =['N','O','S','Cl']
>>> itags =[1,2,4,3]
>>> anew =['N','H']
>>> inew =[2,5]
I need a function which returns the refreshed lists
tags =['N','O','S','Cl','H']
itags =[3,2,4,3,5]
When an element is already in the list, its number is added to the existing value in the other list. If I use the extend() method, the element N will appear in the list tags twice:
>>> tags.extend(anew)
>>> itags.extend(inew)
>>> print tags,itags
['N','O','S','Cl','N','H'] [1,2,4,3,2,5]
You probably want a Counter for this.
from collections import Counter
tags = Counter({"N":1, "O":2, "S": 4, "Cl":3})
new = Counter({"N": 2, "H": 5})
tags = tags + new
print tags
output:
Counter({'H': 5, 'S': 4, 'Cl': 3, 'N': 3, 'O': 2})
If the order of elements matters, I'd use collections.Counter like so:
from collections import Counter
tags = ['N','O','S','Cl']
itags = [1,2,4,3]
new = ['N','H']
inew = [2,5]
cnt = Counter(dict(zip(tags, itags))) + Counter(dict(zip(new, inew)))
out = tags + [el for el in new if el not in tags]
iout = [cnt[el] for el in out]
print(out)
print(iout)
If the order does not matter, there is a simpler way to obtain out and iout:
out = cnt.keys()
iout = cnt.values()
If you don't have to use a pair of lists, then working with Counter directly is a natural fit for your problem.
If you need to maintain the order, you may want to use an OrderedDict instead of a Counter:
from collections import OrderedDict
tags = ['N','O','S','Cl']
itags = [1,2,4,3]
new = ['N','H']
inew = [2,5]
od = OrderedDict(zip(tags, itags))
for x, i in zip(new, inew):
    od[x] = od.setdefault(x, 0) + i
print od.keys()
print od.values()
On Python 3.x, use list(od.keys()) and list(od.values()).
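Since the question asks for a function that returns the refreshed lists, the dictionary idea can be wrapped up as below. This is only a sketch: the function name merge_counts is made up, and it assumes Python 3.7 or later, where a plain dict keeps insertion order, so OrderedDict is not strictly needed.

```python
def merge_counts(tags, itags, new, inew):
    """Return refreshed (tags, itags) with the amounts for repeated tags summed."""
    totals = dict(zip(tags, itags))      # existing tag -> amount, in order
    for tag, amount in zip(new, inew):
        # Known tags are summed in place; unknown tags are appended at the end.
        totals[tag] = totals.get(tag, 0) + amount
    return list(totals.keys()), list(totals.values())

tags, itags = merge_counts(['N', 'O', 'S', 'Cl'], [1, 2, 4, 3], ['N', 'H'], [2, 5])
print(tags)
print(itags)
```

The input lists are left unmodified; the function builds and returns new lists.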
