I have a list of URLs in an open CSV which I have ordered alphabetically, and now I would like to iterate through the list and check for duplicate URLs. In a second step, the duplicate should then be removed from the list, but I am currently stuck on the checking part which I have tried to solve with a nested for-loop as follows:
for i in short_urls:
    first_url = i
    for s in short_urls:
        second_url = s
    if i == s:
        print "duplicate"
    else:
        print "all good"
The print statements will obviously be replaced once the nested for-loop is working. Currently, the list contains a few duplicates, but my nested loop does not seem to work correctly as it does not recognise any of the duplicates.
My question is: are there better ways to do perform this exercise, and what is the problem with the current nested for-loop?
Many thanks :)
By construction, your method is faulty, even if you indent the if/else block correctly. For the sake of argument, imagine short_urls were [1, 2, 3]. The outer for loop picks out 1 to compare against the whole list. It will think it has found a duplicate when the inner for loop reaches the first element, which is that same 1. Essentially, every element will be tagged as a duplicate, and if you plan on removing duplicates, you'll end up with an empty list.
The better solution is to call set(short_urls) to get a set of your urls with the duplicates removed. If you want a list (as opposed to a set) of urls with the duplicates removed, you can convert the set back into a list with list(set(short_urls)).
In other words:
short_urls = ['google.com', 'twitter.com', 'google.com']
duplicates_removed_list = list(set(short_urls))
print(duplicates_removed_list) # Prints ['google.com', 'twitter.com'] (a set is unordered, so the order may differ)
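Because a set has no defined order, list(set(...)) can shuffle the URLs. If the original order matters, one possible sketch, assuming Python 3.7 or later (where dicts keep insertion order), is:

```python
short_urls = ['google.com', 'twitter.com', 'google.com']

# dict keys are unique and keep first-seen order (Python 3.7+),
# so this drops later duplicates without reordering the list.
deduplicated = list(dict.fromkeys(short_urls))
print(deduplicated)  # ['google.com', 'twitter.com']
```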
if i == s:
is not inside the second for loop. You missed an indentation
for i in short_urls:
    first_url = i
    for s in short_urls:
        second_url = s
        if i == s:
            print "duplicate"
        else:
            print "all good"
EDIT: Also, you are comparing every element of the list with every element of the same list. This means you compare the element at position 0 with the element at position 0, which is obviously the same.
What you need to do is start the second for loop at the position after the one reached in the first for loop.
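As a rough illustration of starting the inner loop one position later, here is a minimal sketch. The function name find_duplicates and the sample URLs are made up for the example:

```python
def find_duplicates(urls):
    """Return each value that occurs more than once, in first-seen order."""
    duplicates = []
    for position, first_url in enumerate(urls):
        # Only look at the elements *after* the current one, so an
        # element is never compared with itself.
        for second_url in urls[position + 1:]:
            if first_url == second_url and first_url not in duplicates:
                duplicates.append(first_url)
    return duplicates


short_urls = ['a.com', 'a.com', 'b.com', 'c.com', 'c.com']  # example data
print(find_duplicates(short_urls))  # ['a.com', 'c.com']
```

Since the list in the question is already sorted, comparing each element only with its immediate neighbour would also work and avoids the nested loop.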
For every element in a for loop, I want to check if the element satisfies some condition. If yes, I want to do something to it; if no, I want to add it to the end of the list and do it later.
I'm aware that modifying loops while looping is bad, but I want to do it anyway, and want to know how to do it correctly.
for i in list:
    if something(i):
        do(i)
    else:
        #append to end of list, do later
        list.remove(i)
        list.append(i)
This piece of code mostly works, but causes me to skip the element after the removed i while iterating. How do I work around that?
I'm pretty sure there is no correct way to modify a list while looping over a list. In particular, removing objects from a list will cause issues (see strange result when removing item from a list).
I would consider making additional lists, one for your results, one temp list for the elements you have left to process. Note that in my example, tmp1 is your input list.
tmp1 = values_to_process  # your input list
tmp2 = []
results = []
while tmp1:
    for i in tmp1:
        if something(i):
            results.append(i)
            do(i)
        else:
            # not ready yet: keep it for the next pass
            tmp2.append(i)
    tmp1 = tmp2
    tmp2 = []
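A variation on the same idea is to use a collections.deque as a work queue, which avoids mutating a list while a for loop is walking it. This is only a sketch: process_with_retries, is_ready, handle and max_attempts are hypothetical names, and the attempt cap is an added assumption so the loop cannot run forever if an item never becomes ready.

```python
from collections import deque


def process_with_retries(items, is_ready, handle, max_attempts=3):
    """Handle ready items now and push the rest to the back of the queue."""
    queue = deque((item, 1) for item in items)
    done, gave_up = [], []
    while queue:
        item, attempt = queue.popleft()
        if is_ready(item):
            handle(item)
            done.append(item)
        elif attempt < max_attempts:
            # Not ready yet: retry after everything currently queued.
            queue.append((item, attempt + 1))
        else:
            gave_up.append(item)
    return done, gave_up


# Example: a number is "ready" once its predecessor has been handled.
processed = {0}
done, gave_up = process_with_retries(
    [3, 1, 2],
    is_ready=lambda n: n - 1 in processed,
    handle=processed.add,
)
print(done, gave_up)  # [1, 2, 3] []
```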
Realized just after asking that I don't even need to remove if I'm only ever going to iterate through the list once - if I just append, I'll be able to operate on the element once everything else is finished.
In other words, this works:
for i in list:
    if something(i):
        do(i)
    else:
        #append to end of list, do later
        list.append(i)
One of the questions for an assignment I'm doing consists of looking within a nested list consisting of "an ultrashort story and its author" to find a string that was inputted by a user. I'm not too sure how to go about this; here is the assignment brief below if anyone would like more clarification. There are also more questions I'm not too sure about, e.g. "find all stories by a certain author". Some explanations, or a pointer in the right direction, would be greatly appreciated :)
list = []
mylist = [['a','b','c'],['d','e','f']]
string = input("String?")
if string in [elem for sublist in mylist for elem in sublist] == True:
    list.append(elem)
This is just an example of something I've tried; the list above is similar enough to the one I'm actually using for the question. I've been going through different methods of iterating over nested lists and adding matching items to another list. The above code is just one example of an attempt I've made at this process.
""" the image above states that the data is in the
form of a list of sublists, with each sublist containing
two strings
"""
stories = [
    ['story string 1', 'author string 1'],
    ['story string 2', 'author string 2']
]
""" find stories that contain a given string
"""
stories_with_substring = []
substring = 'some string' # search string
for story, author in stories:
    # if the substring is not in the story, a ValueError is raised
    try:
        story.index(substring)
        stories_with_substring.append((story, author))
    except ValueError:
        continue
""" find stories by a given author
"""
stories_by_author = []
target_author = 'first last'
for story, author in stories:
    if author == target_author:
        stories_by_author.append((story, author))
This line here
for story, author in stories:
'Unpacks' the array. It's equivalent to
for pair in stories:
    story = pair[0]
    author = pair[1]
Or to go even further:
i = 0
while i < len(stories):
    pair = stories[i]
    story = pair[0]
    author = pair[1]
    i += 1  # move on to the next pair, otherwise the loop never ends
I'm sure you can see how useful this is when dealing with lists that contain lists/tuples.
You may need to call .lower() on some of the strings if you want the search to be case insensitive
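For instance, a case-insensitive search could look like the sketch below. The sample stories and the function name find_stories are invented for illustration; it uses the in operator rather than index(), which avoids the try/except.

```python
# Made-up sample data in the same [story, author] shape as above.
stories = [
    ['The Last Train left at dawn.', 'Ann Lee'],
    ['Nobody saw the train arrive.', 'Bo Chen'],
    ['A quiet morning.', 'Ann Lee'],
]


def find_stories(stories, substring):
    """Return (story, author) pairs whose story contains substring, ignoring case."""
    needle = substring.lower()
    return [(story, author) for story, author in stories
            if needle in story.lower()]


matches = find_stories(stories, 'TRAIN')
print(len(matches))  # 2
```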
You can do a few things here. Your example showed the use of a list comprehension, so let's focus on some other aspects of this problem.
Recursion
You can define a function that iterates through all the items in the top level list. Assuming you know for sure all items are either strings or more lists, you can use type() to check if each item is another list, or is a string. If it's a string, do your search - if it's a list, have your function call itself. Let's look at an example. Please note that we should never use variables named list or string - list is a built-in type and string is a standard library module, and we don't want to accidentally overwrite them!
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
    results = []
    # iterate over the parameter my_list, not the global mylist,
    # otherwise every recursive call restarts from the top
    for i in my_list:
        if type(i) == list:
            items = find_nested_items(i, my_input)
            results += items
        elif my_input in i:
            results.append(i)
    return results
We're doing a few things here:
Creating an empty list named results
Iterating through the top level items of my_list
If one of those items is another list, we have our function call itself - at some point this will trigger the condition where an item is not a list, and will eventually return the results from that. For now, we assume the results we're getting back are going to be correct, so we concatenate those results to our top level results list
If the item is not a list, we simply check for the existence of our input and if so, add it to our results list
This kind of recursion is typically very safe, because it's inherently limited by our data structure. It can't run forever unless the data structure itself is infinitely deep.
Generators
Next, let's look at a much cooler feature of Python 3: generators. Right now, we're doing all the work of collecting the results in one go. If we later on want to iterate through those results, we need to iterate over them separately.
Instead of doing that, we can define a generator. This works almost the same, practically speaking, but instead of collecting the results in one loop and then using them in a second, we can collect and use each result all within a single loop. A generator "yields" a value, then stops until it is called the next time. Let's modify our example to make it a generator:
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
    # iterate over the parameter my_list, not the global mylist
    for i in my_list:
        if type(i) == list:
            yield from find_nested_items(i, my_input)
        elif my_input in i:
            yield i
You'll notice this version is a fair bit shorter. There's no need to hold items in a temporary list - each item is "yielded", which means it's passed directly to the caller to use immediately, and the caller will stop our generator until it needs the next value.
yield from basically does the same recursion, it simply sets up a generator within a generator to return those nested items back up the chain to the caller.
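To show how such a generator is consumed, here is a small self-contained sketch. The sample list nested is made up, and isinstance() is used in place of type() purely as a stylistic choice:

```python
def find_nested_items(my_list, my_input):
    """Yield every string in a nested list that contains my_input."""
    for item in my_list:
        if isinstance(item, list):
            # Delegate to a nested generator for the sublist.
            yield from find_nested_items(item, my_input)
        elif my_input in item:
            yield item


nested = [['apple', 'banana'], ['cherry', ['grape', 'pear']]]  # sample data

# Each match is produced lazily, one at a time, as the loop asks for it.
for match in find_nested_items(nested, 'a'):
    print(match)

# If a list is needed after all, list() collects every yielded value.
matches = list(find_nested_items(nested, 'a'))
```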
These are some good techniques to try - please give them a go!
This seems like a fairly straightforward problem but I can't seem to find an efficient way to do it. I have a list of lists like this:
list = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
I want to get a list of unique entries by the third item in each child list (the 'id'), i.e. the end result should be
unique_entries = [['abc','def','123'],['ghi','jqk','456']]
What is the most efficient way to do this? I know I can use set to get the unique ids, and then loop through the whole list again. However, there are more than 2 million entries in my list and this is taking too long. Appreciate any pointers you can offer! Thanks.
How about this: create a set that keeps track of ids already seen, and only append sublists whose ids have not been seen.
l = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
seen = set()
new_list = []
for sl in l:
    if sl[2] not in seen:
        new_list.append(sl)
        seen.add(sl[2])
print(new_list)
Result:
[['abc', 'def', '123'], ['ghi', 'jqk', '456']]
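The same first-entry-wins behaviour can also be sketched with a dict keyed by the id, assuming Python 3.7 or later so that the dict keeps insertion order. The function name unique_by_key and the key_index parameter are made up for this example:

```python
def unique_by_key(rows, key_index=2):
    """Keep the first row seen for each value found at key_index."""
    first_seen = {}
    for row in rows:
        # setdefault only stores the row if the key is not present yet.
        first_seen.setdefault(row[key_index], row)
    return list(first_seen.values())


entries = [['abc', 'def', '123'], ['abc', 'xyz', '123'], ['ghi', 'jqk', '456']]
unique_entries = unique_by_key(entries)
print(unique_entries)  # [['abc', 'def', '123'], ['ghi', 'jqk', '456']]
```

Like the set version, this makes a single pass over the list, with one hash lookup per entry.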
Another approach is a nested loop, although it is much slower on large lists. Start a result list containing the first sublist, then iterate over the remaining sublists from index 1. For each one, use an inner loop over the result list to check whether its third element already appears as the third element of a stored sublist. If it does not, append the sublist to the result list; otherwise use the "continue" keyword to move on to the next one. Finally, print the result list.
import os
os.chdir('G:\\f5_automation')
r = open('G:\\f5_automation\\uat.list.cmd.txt')
#print(r.read().replace('\n', ''))
t = r.read().split('\n')
for i in range(len(t)):
    if ('inherited' or 'device-group' or 'partition' or 'template' or 'traffic-group') in t[i]:
        t.pop(i)
        print(i,t[i])
In the above code, I get an index error at line 9: 'if ('inherited' or 'device-group'...etc.
I really don't understand why. How can my index be out of range if it's the perfect length by using len(t) as my range?
The goal is to pop any indexes from my list that contain any of those substrings. Thank you for any assistance!
This happens because you are editing the list while looping through it.
You first get the length, which is 10 for example, and then you loop 10 times. But as soon as you've deleted one item, the list is only 9 long.
A way around this is to create a new list of things you want to keep and use that one instead.
I've slightly edited your code and done something similar.
t = ['inherited', 'cookies', 'device-group']
interesting_things = []
for i in t:
    if i not in ['inherited', 'device-group', 'partition', 'template', 'traffic-group']:
        interesting_things.append(i)
        print(i)
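Note that the not in check above only matches a line that is exactly equal to one of the keywords. Since the question asks about lines that contain one of them as a substring, a possible sketch uses any(); the sample lines here are invented:

```python
keywords = ['inherited', 'device-group', 'partition', 'template', 'traffic-group']

# Made-up sample lines standing in for the contents of the file.
lines = [
    'ltm pool web { partition Common }',
    'ltm node 10.0.0.1',
    'inherited-traffic-group true',
]

# Keep a line only if none of the keywords occurs anywhere inside it.
kept = [line for line in lines
        if not any(keyword in line for keyword in keywords)]
print(kept)  # ['ltm node 10.0.0.1']
```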
Let's say len(t) == 5.
We'll process i taking values [0,1,2,3,4]
After we process i = 0, we pop one value from t. len(t) == 4 now. This would mean an error if we get to i = 4. However, we're still going to try to go up to 4 because our range was already initialised to go up to 4.
Next (i = 1) step ensures an error on i = 3.
Next (i = 2) step ensures an error on i = 2, but that is already processed.
Next (i = 3) step yields an error.
Instead, you should do something like this:
while t:
    element = t.pop()
    print(element)
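On a side note, you should keep the substrings you are checking for in a set:
qualities_we_need = {'inherited', 'device-group', 'partition'} # put all your qualities here
And then in the loop, test each one against the element (set(element) would split the string into single characters, so an intersection would not work here):
if any(quality in element for quality in qualities_we_need):
    print(element)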
If you need indexes you could either use one more variable to keep track of index of value we're currently processing, or use enumerate()
As many people said in the comments, there are several problems with your code.
The or operator evaluates its operands for truthiness and returns the first one that is truthy (from left to right). So your parenthesised expression evaluates to 'inherited', since any non-empty string is truthy. As a result, even if your for loop was working, you would only be popping elements that contain 'inherited'.
The for loop is not working, though. That happens because the size of the list you are iterating over changes as you loop, and you will get an index-out-of-range error as soon as an element of the list actually contains 'inherited' and gets popped.
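The behaviour of or can be checked in isolation. This is a small sketch with an invented sample line, not the original file contents:

```python
# A chain of non-empty strings joined by `or` collapses to the first one.
chained = ('inherited' or 'device-group' or 'partition')
print(chained)  # inherited

line = 'set partition Common'  # made-up sample line

# So this only ever tests for 'inherited':
print(chained in line)  # False

# Testing every keyword needs an explicit check per keyword:
found = any(word in line for word in ('inherited', 'device-group', 'partition'))
print(found)  # True
```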
So, take a look at this:
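import os
os.chdir('G:\\f5_automation')
r = open('G:\\f5_automation\\uat.list.cmd.txt')
# print(r.read().replace('\n', ''))  # keep commented out: a first read() would exhaust the file
t = r.read().split('\n')
t_dupl = t[:]
keywords = ['inherited', 'device-group', 'partition', 'template', 'traffic-group']
for i, items in enumerate(t_dupl):
    # substring test: does this line contain any of the keywords?
    if any(keyword in items for keyword in keywords):
        print(i, items)
        t.remove(items)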
By duplicating the original list, we can use its items as a "pool" of items to pick from and modify the list we are actually interested in.
Finally, know that the pop() method returns the item it removes from the list and this is something you do not need in your example. remove() works just fine for you.
As a side note, you can probably replace your first 5 lines of code with this:
with open('G:\\f5_automation\\uat.list.cmd.txt', 'r') as r:
    t = r.readlines()
The advantage of using the with statement is that it automatically closes the file when the block ends. Also, instead of reading the whole file and splitting it on line breaks, you can use the built-in readlines() method, which does almost the same thing; the difference is that readlines() keeps the trailing '\n' on each line.
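To see the difference between readlines() and splitting the text yourself, here is a small sketch that writes a temporary file first, so it does not depend on the G: drive path from the question:

```python
import os
import tempfile

# Create a throwaway two-line file to read back.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\n')
    path = tmp.name

with open(path) as handle:
    with_newlines = handle.readlines()  # each item keeps its '\n'

with open(path) as handle:
    without_newlines = handle.read().splitlines()  # line endings stripped

os.remove(path)

print(with_newlines)     # ['first\n', 'second\n']
print(without_newlines)  # ['first', 'second']
```

If the substring checks should not be affected by line endings, the splitlines() form (or calling strip() on each line) is the safer choice.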
I wanted to remove the word "hello" from this array, but I get the "index out of bounds" error. I checked the range of len(token); it was (0,5).
Here is the code:
token=['hi','hello','how','are','you']
stop='hello'
for i in range(len(token)):
    if(token[i]==stop):
        del(token[i])
You're getting an index out of bounds exception because you are deleting an item from an array you're iterating over.
After you delete that item, len(token) is 4, but your for loop is iterating 5 times (5 being returned from the initial len(token)).
There are two ways to solve this. The better way would be to simply call
token.remove(stop)
This way won't require iterating over the list, and will automatically remove the first item in the array with the value of stop.
From the documentation:
list.remove(x): Remove the first item from the list whose value is x. It is an error if there is no such item.
Given this information, you may want to check if the list contains the target element first to avoid throwing a ValueError:
if stop in token:
    token.remove(stop)
If the element can exist multiple times in the list, utilizing a while loop will remove all instances of it:
while stop in token:
    token.remove(stop)
If you need to iterate over the array for some reason, the other way would be to add a break after del(token[i]), like this:
for i in range(len(token)):
    if(token[i]==stop):
        del(token[i])
        break
It's not recommended to delete a list element while iterating over the same list. I'm not sure what you intend, but you could create a new list without the stop:
token=['hi','hello','how','are','you']
stop='hello'
new_tokens = []
for i in range(len(token)):
    if(token[i]!=stop):
        new_tokens.append(token[i])
or create a list with everything until stop is reached:
token=['hi','hello','how','are','you']
stop='hello'
new_tokens = []
for i in range(len(token)):
    if(token[i]!=stop):
        new_tokens.append(token[i])
    else:
        break
But never delete elements from a list you are iterating over because then the length of the list is modified but the range is not.
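The first variant above, a new list without the stop word, can also be written as a list comprehension. This is a minimal sketch; the extra 'hello' at the end is added to show that every occurrence is dropped:

```python
token = ['hi', 'hello', 'how', 'are', 'you', 'hello']
stop = 'hello'

# Build a new list instead of deleting from the one being read.
new_tokens = [word for word in token if word != stop]
print(new_tokens)  # ['hi', 'how', 'are', 'you']
```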
The reason you are getting this error is twofold:
You are using the Python anti-pattern range(len(sequence)). You should use for index, value in enumerate(sequence).
You are mutating a sequence as you iterate across it.
The call to range(len(...)) is only evaluated once, so when the loop starts it evaluates to range(5). Once you remove your stop word, the list no longer has 5 elements, so token[4] results in an IndexError.
Once you delete an item, there are no longer as many items in the list as there were originally, so i will get too big. Also, if you delete the item at index i, then the element that used to be at i+1 will now be at index i, but your code won't check it, since it goes ahead and increments i.
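If the deletion really has to happen in place and by index, one way around both problems is to walk the indexes from the end towards the start, so a deletion never shifts an element that has not been visited yet. A minimal sketch, with a second 'hello' added to the sample list:

```python
token = ['hi', 'hello', 'hello', 'how', 'are', 'you']
stop = 'hello'

# Count down from the last index to 0. Deleting index i only moves the
# elements after i, and those have already been checked.
for i in range(len(token) - 1, -1, -1):
    if token[i] == stop:
        del token[i]

print(token)  # ['hi', 'how', 'are', 'you']
```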
Use a break statement after deleting, because you are modifying the same list that you are iterating over.
for i in range(len(token)):
    if(token[i]==stop):
        del(token[i])
        break
I don't disagree with any of the other answers, but the most Pythonic solution is to get rid of the loop entirely and replace it with one line:
token.remove(stop)
That will remove the first occurrence of 'hello' from the list.