I am reading a file and looking for a particular string in it, like this:
template = open('/temp/template.txt','r')
new_elements = ["movie1","movies2"]
for i in template.readlines():
    if "movie" in i:
        print "replace me"
This works, but I would like to replace the matching lines with the elements from new_elements. I am assuming that the number of matching lines will always equal the number of elements in the new_elements list. I just don't know how to iterate over new_elements while looking for lines to replace.
Cheers
One way is to make new_elements an iterator:
template = open('/temp/template.txt','r')
new_elements = iter(["movie1","movies2"])
for i in template.readlines():
    if "movie" in i:
        print "replace line with", new_elements.next()
You haven't said how you want to do the replacement (whether you want to write it to a new file, for example), but this will fit into whatever code you use.
What you're looking for is pretty simple: the .pop() method of lists. This will get the next entry in the list, and remove it; your next call to that function will return the next item. No need to do any iteration at all.
Without a parameter, it will pop the last element. If you want to pop the first, you can use new_elements.pop(0), although this will be slower than popping from the end.
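As a rough sketch of both suggestions above, the snippet below replaces matching lines using an iterator, and then again using a queue that is popped from the front. The sample lines are made up and stand in for template.readlines(); collections.deque is swapped in for the plain list so that taking items from the front stays cheap. It is written to run on Python 2.6+ and Python 3.

```python
from collections import deque

# Made-up sample data standing in for template.readlines().
lines = ["title\n", "movie placeholder A\n", "footer\n", "movie placeholder B\n"]

# Variant 1: an iterator; next() hands out one replacement per matching line.
new_elements = iter(["movie1", "movies2"])
replaced = [next(new_elements) + "\n" if "movie" in line else line
            for line in lines]

# Variant 2: a deque; popleft() removes from the front without shifting
# the remaining items the way list.pop(0) does.
queue = deque(["movie1", "movies2"])
replaced_with_deque = [queue.popleft() + "\n" if "movie" in line else line
                       for line in lines]
```

Either result can then be written to a new file or printed, depending on how the replacement is meant to be stored.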
I am trying to clean a text field of its duplicate items (each item is on a new line in the text field). My logic: call get() on the text field, insert into a list, and then run an admittedly slow series of nested loops to check for duplicates and then repopulate the text field.
Could someone please help evaluate my logic and tell me why this isn't working?
def checkDup(self):
    clean = []
    dirty = O1.get("1.0", END+'-1c').split("\n")
    for i in dirty[1:]:
        if i not in clean:
            clean.append(i)
            clean.append("\n")
    O1.delete("1.0", END)
    O1.insert(END, clean)
I would have used the same logic, with for loops to check for duplicates. There may be something better out there, but at our level I think it's a good start.
Reviewing your code:
for i in dirty[1:]:
Why do you start after the first item in your list? Does it need to be excluded? If so, note that you are deleting it anyway with:
O1.delete('1.0', END)
You may need to change your code to O1.delete('2.0', END) if you need to keep the first line.
if i not in clean:
    clean.append(i)
    clean.append('\n')
Here, you are creating a longer list in which every newline is a separate member of the list. Interesting. I messed up this portion myself; after testing, I see your results are only half as weird as mine.
Last line: you are pushing your corrected list directly into the widget, which causes a weird result.
O1.insert(END, clean)
Fix that one this way: O1.insert(END, ''.join(clean)). This joins your list into a single string containing the newlines you inserted earlier, putting all the text in the right place.
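For comparison, here is a minimal sketch of the duplicate check on its own, without the widget. The function name dedupe_lines and the sample text are made up for illustration; a set is used to remember lines already seen, which avoids the nested scan and keeps the newline out of the list entirely.

```python
def dedupe_lines(text):
    """Return text with repeated lines dropped, keeping first occurrences in order."""
    seen = set()
    clean = []
    for line in text.split("\n"):
        if line not in seen:
            seen.add(line)
            clean.append(line)
    # Join once at the end instead of storing "\n" as list members.
    return "\n".join(clean)

cleaned = dedupe_lines("apple\nbanana\napple\ncherry\nbanana")
```

In the widget code, the string returned by O1.get("1.0", END+'-1c') would be passed in, and the result inserted after the delete call.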
If I have a list of strings and want to eliminate leading and trailing whitespaces from it, how can I use .strip() effectively to accomplish this?
Here is my code (python 2.7):
for item in myList:
    item = item.strip()
    print item
for item in myList:
    print item
The changes don't persist from one loop to the next. I tried using map as suggested here (https://stackoverflow.com/a/7984192), but it did not work for me. Please help.
Note, this question is useful:
An answer does not exist already
It covers a mistake someone new to programming / python might make
Its title covers search cases both general (how to update values in a list) and specific (how to do this with .strip()).
It addresses previous work, in particular the map solution, which would not work for me.
I'm guessing you tried:
map(str.strip, myList)
That creates a new list and returns it, leaving the original list unchanged. If you want to interact with the new list, you need to assign it to something. You could overwrite the old value if you want.
myList = map(str.strip, myList)
You could also use a list comprehension:
myList = [item.strip() for item in myList]
Many consider this a more "pythonic" style than map.
I'm answering my own question here in the hopes that it saves someone from the couple hours of searching and experimentation it took me.
As it turns out the solution is fairly simple:
index = 0
for item in myList:
    myList[index] = item.strip()
    index += 1
for item in myList:
    print "'"+item+"'"
Single quotes are concatenated at the beginning/end of each list item to aid detection of trailing/leading whitespace in the terminal. As you can see, the strings will now be properly stripped.
To update the values in the list we need to access each element via its index and assign the stripped string back to it. The reason is that the loop variable item is just a name bound to each element in turn: item = item.strip() rebinds that name to a new string (strings are immutable, so strip() cannot change the original in place) and leaves the list itself untouched.
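A small sketch of the difference, using made-up sample strings. The first loop only rebinds the loop variable; the second uses enumerate to get the index and writes the stripped value back into the list.

```python
my_list = ["  alpha ", "beta  ", "  gamma"]

# Rebinding the loop variable does not touch the list.
for item in my_list:
    item = item.strip()
unchanged = list(my_list)  # snapshot: still contains the padded strings

# Assigning through the index replaces each element in place.
for index, item in enumerate(my_list):
    my_list[index] = item.strip()
```

enumerate does the same job as the manual index counter, without the separate index += 1 step.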
I have two arrays. If an element of the array received from the client also exists in the other array, it should be deleted from that other array. This works when the client array has just a single element, but not when it has more than one.
This is the code:
projects = ['xmas','easter','mayday','newyear','vacation']
for i in self.get_arguments('del[]'):
    try:
        if i in projects:
            print 'PROJECTS', projects
            print 'DEL', self.get_arguments('del[]')
            projects.remove(i)
    except ValueError:
        pass
self.get_arguments('del[]') returns an array from the client side in the format:
[u'xmas , newyear, mayday']
So it is read as one element, not three, because only one unicode string is present.
How can I get this to delete multiple elements?
EDIT: I've had to make the list into one with several individual elements.
How about filter?
projects = filter(lambda a: a not in self.get_arguments('del[]'), projects)
Could try something uber pythonic like a list comprehension:
new_list = [i for i in projects if i not in array_two]
Here array_two stands for the list of names to delete. You'd have to overwrite your original projects, which isn't the most elegant, but this should work.
The reason this doesn't work is that remove just removes the first element that matches. You could fix that by just repeatedly calling remove until it doesn't exist anymore—e.g., by changing your if to a while, like this:
while i in projects:
    print 'PROJECTS', projects
    print 'DEL', self.get_arguments('del[]')
    projects.remove(i)
But in general, using remove is a bad idea—especially when you already searched for the element. Now you're just repeating the search so you can remove it. Besides the obvious inefficiency, there are many cases where you're going to end up trying to delete the third instance of i (because that's the one you found) but actually deleting the first instead. It just makes your code harder to reason about. You can improve both the complexity and the efficiency by just iterating over the list once and removing as you go.
But even this is overly complicated—and still inefficient, because every time you delete from a list, you're moving all the other elements of the list. It's almost always simpler to just build a new list from the values you want to keep, using filter or a list comprehension:
arguments = set(self.get_arguments('del[]'))
projects = [project for project in projects if project not in arguments]
Making arguments into a set isn't essential here, but it's conceptually cleaner (you don't care about the order of the arguments, or need to retain any duplicates) and more efficient (a set tests membership in constant time on average, instead of comparing against each element).
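Since the question shows the client sending a single comma-separated string rather than three separate values, the names probably need to be split apart before any of the filters above can match. A minimal sketch, assuming that format; the variable received is a stand-in for the return value of self.get_arguments('del[]').

```python
projects = ['xmas', 'easter', 'mayday', 'newyear', 'vacation']

# Stand-in for self.get_arguments('del[]'), in the format shown in the question.
received = [u'xmas , newyear, mayday']

# Split every received string on commas and strip the surrounding spaces.
to_delete = set(
    name.strip()
    for argument in received
    for name in argument.split(',')
    if name.strip()
)

# Build a new list from the projects we want to keep.
projects = [project for project in projects if project not in to_delete]
```

This also keeps working if the client later sends the names as separate elements, because each element is split independently.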
I am reading from file x, which contains individual data items separated from each other by blank lines. I want to calculate tf_idf_vectorizer() for each individual item. So I need to remove all members of tweets whenever the code finds a new line (\n). I got an error for the bold line in my code.
def load_text():
    file=open('x.txt', 'r')
    tweets = []
    all_matrix = []
    for line in file:
        if line in ['\n', '\r\n']:
            all_matrix.append(tf_idf_vectorizer(tweets))
            **for i in tweets: tweets.remove(i)**
        else:
            tweets.append(line)
    file.close()
    return all_matrix
You can make tweets an empty list again with a simple assignment.
tweets = []
If you actually need to empty out the list in-place, the way you do it is either:
del tweets[:]
… or …
tweets[:] = []
In general, you can delete or replace any subslice of a list in this way; [:] is just the subslice that means "the whole list".
However, since nobody else has a reference to tweets, there's really no reason to empty out the list; just create a new empty list, and bind tweets to that, and let the old list become garbage to be cleaned up:
tweets = []
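The difference between emptying in place and rebinding only shows up when something else holds a reference to the same list. A short sketch with made-up values, where alias plays the role of that other reference:

```python
tweets = ["a", "b"]
alias = tweets            # a second name for the same list object
del tweets[:]             # empties the list in place
after_del = list(alias)   # the alias sees the now-empty list

tweets = ["c", "d"]
alias = tweets
tweets = []               # rebinds the name; the old list is untouched
```

After the rebinding, alias still holds ["c", "d"] while tweets is a fresh empty list.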
Anyway, there are two big problems with this:
for i in tweets: tweets.remove(i)
First, when you want to remove a specific element, you should never use remove. That has to search the list to find a matching element—which is wasteful (since you already know which one you wanted), and also incorrect if you have any duplicates (there could be multiple matches for the same element). Instead, use the index. For example, del tweets[index]. You can use the enumerate function to get the indices. The same thing is true for lots of other list, string, etc. functions—don't use index, find, etc. with a value when you could get the index directly.
Second, if you remove the first element, everything else shifts one position toward the front. So, first you remove element #0. Then, when the loop moves on to position #1, that's not the original element #1, but the original #2, which has shifted into its place. The result is that you skip every other element, and the loop ends as soon as its position passes the (new) end of the list, leaving about half the elements behind. In general, avoid mutating a list while iterating over it; if you must mutate it, it's only safe to do so from the right, not the left (and it's still tricky to get right).
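The skipping is easy to see on a small made-up list. The sketch below records which elements the loop actually visits while it removes them:

```python
tweets = ["a", "b", "c", "d"]
visited = []

for tweet in tweets:
    visited.append(tweet)   # record what the loop actually sees
    tweets.remove(tweet)    # shifts the remaining items toward the front
```

Only "a" and "c" are visited, and "b" and "d" are still in the list when the loop ends.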
The right way to remove elements one by one from the left is:
while tweets:
del tweets[0]
However, this will be pretty slow, because you keep having to re-adjust the list after each removal. So it's still better to go from the right:
while tweets:
del tweets[-1]
But again, there's no need to go one by one when you can just do the whole thing at once, or not even do it, as explained above.
You should never try to remove items from a list while iterating over that list. If you want a fresh, empty list, just create one.
tweets = []
Otherwise you may not actually remove all the elements of the list, as I suspect you noticed.
You could also re-work the code to be:
from itertools import groupby
def load_tweet(filename):
    with open(filename) as fin:
        tweet_blocks = (g for k, g in groupby(fin, lambda line: bool(line.strip())) if k)
        return [tf_idf_vectorizer(list(tweets)) for tweets in tweet_blocks]
This groups the file into runs of non-blank lines and blank lines. Where the lines aren't blank, we build a list from them to pass to the vectorizer inside a list-comp. This means that we're not having references to lists hanging about, nor are we appending one-at-a-time to lists.
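To see the grouping by itself, here is a sketch that leaves out tf_idf_vectorizer and only collects the blocks. The helper name split_blocks and the sample text are made up, and io.StringIO stands in for the open file.

```python
from io import StringIO
from itertools import groupby

def split_blocks(lines):
    """Collect runs of non-blank lines; blank lines act as separators."""
    return [list(group)
            for has_text, group in groupby(lines, lambda line: bool(line.strip()))
            if has_text]

# Made-up sample input standing in for the contents of x.txt.
sample = StringIO(u"tweet one\ntweet two\n\ntweet three\n\n\ntweet four\n")
blocks = split_blocks(sample)
```

Unlike the loop in the question, this also yields the final block when the file does not end with a blank line, and consecutive blank lines do not produce empty blocks.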
I have some XML that is generated by a script that may or may not have empty elements. I was told that now we cannot have empty elements in the XML. Here is an example:
<customer>
    <govId>
        <id>#</id>
        <idType>SSN</idType>
        <issueDate/>
        <expireDate/>
        <dob/>
        <state/>
        <county/>
        <country/>
    </govId>
    <govId>
        <id/>
        <idType/>
        <issueDate/>
        <expireDate/>
        <dob/>
        <state/>
        <county/>
        <country/>
    </govId>
</customer>
The output should look like this:
<customer>
    <govId>
        <id>#</id>
        <idType>SSN</idType>
    </govId>
</customer>
I need to remove all the empty elements. You'll note that my code took out the empty stuff in the "govId" sub-element, but didn't take out anything in the second. I am using lxml.objectify at the moment.
Here is basically what I am doing:
root = objectify.fromstring(xml)
for customer in root.customers.iterchildren():
    for e in customer.govId.iterchildren():
        if not e.text:
            customer.govId.remove(e)
Does anyone know of a way to do this with lxml objectify or is there an easier way period? I would also like to remove the second "govId" element in its entirety if all its elements are empty.
First of all, the problem with your code is that you are iterating over customers, but not over govIds. On the third line you take the first govId for every customer, and iterate over its children. So, you'd need another for loop for the code to work like you intended it to.
This small sentence at the end of your question then makes the problem quite a bit more complex: I would also like to remove the second "govId" element in its entirety if all its elements are empty.
This means that, unless you want to hard-code a check for just one level of nesting, you need to recursively check whether an element and its children are empty. Like this, for example:
def recursively_empty(e):
    if e.text:
        return False
    return all((recursively_empty(c) for c in e.iterchildren()))
Note: Python 2.5+ because of the use of the all() builtin.
You then can change your code to something like this to remove all the elements in the document that are empty all the way down.
# Walk over all elements in the tree and remove all
# nodes that are recursively empty
context = etree.iterwalk(root)
for action, elem in context:
    parent = elem.getparent()
    if recursively_empty(elem):
        parent.remove(elem)
Sample output:
<customer>
    <govId>
        <id>#</id>
        <idType>SSN</idType>
    </govId>
</customer>
One thing you might want to do is refine the condition if e.text: in the recursive function. Currently this will consider None and the empty string as empty, but not whitespace like spaces and newlines. Use str.strip() if that's part of your definition of "empty".
Edit: As pointed out by @Dave, the recursive function could be improved by using a generator expression:
return all((recursively_empty(c) for c in e.getchildren()))
This will not evaluate recursively_empty(c) for all the children at once, but evaluate it for each one lazily. Since all() will stop iteration upon the first False element, this could mean a significant performance improvement.
Edit 2: The expression can be further optimized by using e.iterchildren() instead of e.getchildren(). This works with the lxml etree API and the objectify API.
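For readers without lxml installed, the same idea can be sketched with the standard library's xml.etree.ElementTree instead. This is a swapped-in API, not the lxml code above: ElementTree has no getparent(), so the sketch prunes from the parent downward. The helper name prune_empty and the shortened sample document are made up, and whitespace-only text is treated as empty here, which is an assumption about what "empty" should mean.

```python
import xml.etree.ElementTree as ET

def recursively_empty(elem):
    """True if elem and all of its descendants carry no non-whitespace text."""
    if elem.text and elem.text.strip():
        return False
    return all(recursively_empty(child) for child in elem)

def prune_empty(elem):
    """Remove every recursively empty child of elem, then descend into the rest."""
    for child in list(elem):  # copy, because we remove while looping
        if recursively_empty(child):
            elem.remove(child)
        else:
            prune_empty(child)

# Shortened version of the document from the question.
xml = ("<customer><govId><id>#</id><idType>SSN</idType><issueDate/><dob/></govId>"
       "<govId><id/><idType/></govId></customer>")
root = ET.fromstring(xml)
prune_empty(root)
result = ET.tostring(root, encoding="unicode")
```

Iterating over list(elem) rather than elem itself avoids removing children from the sequence being looped over.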