Get unique entries in list of lists by an item - python

This seems like a fairly straightforward problem but I can't seem to find an efficient way to do it. I have a list of lists like this:
list = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
I want to get a list of unique entries by the third item in each child list (the 'id'), i.e. the end result should be
unique_entries = [['abc','def','123'],['ghi','jqk','456']]
What is the most efficient way to do this? I know I can use set to get the unique ids, and then loop through the whole list again. However, there are more than 2 million entries in my list and this is taking too long. Appreciate any pointers you can offer! Thanks.

How about this: create a set that keeps track of the ids already seen, and only append sublists whose id has not been seen yet.
l = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
seen = set()
new_list = []
for sl in l:
    if sl[2] not in seen:
        new_list.append(sl)
        seen.add(sl[2])
print(new_list)
Result:
[['abc', 'def', '123'], ['ghi', 'jqk', '456']]
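The same idea can be wrapped in a reusable function. This is only a minimal sketch: the name unique_by_index and its key_index parameter are made up for illustration and are not part of the answer above. A membership test on a set is a hash lookup that takes constant time on average, so the whole pass should be roughly linear in the number of sublists.

```python
def unique_by_index(rows, key_index):
    """Keep the first row seen for each distinct value at key_index.

    Hypothetical helper for illustration only.
    """
    seen = set()      # key values encountered so far
    result = []       # rows kept, in their original order
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [['abc', 'def', '123'], ['abc', 'xyz', '123'], ['ghi', 'jqk', '456']]
unique_entries = unique_by_index(rows, 2)
print(unique_entries)
```

Only the first sublist for each id is kept; later sublists with the same id are dropped.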

Another approach is a nested loop. Before the loops, create a result list that holds the first sublist. The outer loop then iterates over the original list starting from index 1; for each sublist, the inner loop runs over the result list from index 0 and checks whether the sublist's third element matches the third element of any sublist already kept. If no match is found, append the sublist to the result list; otherwise continue with the next sublist. Finally, print the result list. Note that this compares every sublist against everything kept so far, so it is quadratic in the worst case and will be slow for millions of entries.
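A rough sketch of that nested-loop idea, assuming the id is always the third element. The variable names are chosen here for illustration, and a for/else is used in place of the "continue" described above. It is shown mainly for comparison, since a set lookup avoids the inner loop entirely.

```python
rows = [['abc', 'def', '123'], ['abc', 'xyz', '123'], ['ghi', 'jqk', '456']]

# Start with the first sublist (the slice also copes with an empty input).
unique_rows = rows[:1]

for candidate in rows[1:]:
    for kept in unique_rows:
        if kept[2] == candidate[2]:
            break                      # id already present, skip this sublist
    else:
        # The inner loop finished without a break: the id is new.
        unique_rows.append(candidate)

print(unique_rows)
```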

Related

How to add up values of the "sublists" within a list of lists

I have a list of lists in my script:
list = [[1,2],
        [4,3],
        [6,2],
        [1,6],
        [9,2],
        [6,5]]
I am looking for a solution to sum up the contents of each "sublist" within the list of lists. The desired output would be:
new_list = [3,7,8,7,11,11]
I know about combining ALL of these lists into one which would be:
new_list = [27,20]
But that's not what I'm looking to accomplish.
I need to combine the two values within these "sublists" and have them remain as their own entry in the main list.
I would also greatly appreciate it if it was also explained how you solved the problem rather than just handing me the solution. I'm trying to learn python so even a minor explanation would be greatly appreciated.
Using Python 3.7.4
Thanks, Riftie.
The "manual" solution uses a for loop:
new_list = []
for sub_list in list:
    new_list.append(sum(sub_list))
or, as a list comprehension:
new_list = [sum(sub_list) for sub_list in list]
The for loop iterates through the elements of list. In your case, list is a list of lists, so every element is itself a list. That means that while iterating, sub_list is a simple list. To get the sum of a list I used the built-in sum() function. You can of course iterate manually and sum every element:
new_list = []
for sub_list in list:
    sum_val = 0
    for element in sub_list:
        sum_val = sum_val + element
    new_list.append(sum_val)
but there is no need for that.
For larger numeric data, numpy is a common alternative: it treats a list of lists as an array and lets you sum along an axis of your choice. Since you are learning basic Python, it is probably too soon to learn numpy. Just keep in mind that there is a package for handling multi-dimensional arrays.
Edit: I've seen the other suggested solution. Both will work, but I believe this one is more accessible for someone who is learning to program for the first time. Using a list comprehension is great and correct, but may be a bit confusing while first learning. Also, as suggested, calling your variable list is a bad idea because it shadows the builtin list type. Better names would be "my_list", "tmp_list" or something else.
Use a list comprehension. Also avoid using builtin names as variable names; in your case you overrode the builtin list.
# a, b -> sequence unpacking
summed = [a + b for a, b in lst] # where lst is your list of lists
# if the inner lists contain variable number of elements, a more
# concise solution would be
summed2 = [sum(seq) for seq in lst]
Read more about the powerful list comprehension here.
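To make the difference between the two results in the question concrete, here is a small sketch using only built-ins. The name lst is assumed here in place of list. Summing each sublist gives the per-row totals the asker wants, while summing across sublists (by transposing with zip) gives the [27, 20] result they did not want.

```python
lst = [[1, 2], [4, 3], [6, 2], [1, 6], [9, 2], [6, 5]]

# One total per sublist ("row").
row_sums = [sum(row) for row in lst]

# zip(*lst) regroups the values by position, giving one total per "column".
column_sums = [sum(column) for column in zip(*lst)]

print(row_sums)     # [3, 7, 8, 7, 11, 11]
print(column_sums)  # [27, 20]
```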

How do I remove the element with fewest sets in a list and only keep the one with the most?

I have a list where each element contains an unknown number of sets. (The sets in the list vary depending on choices the user makes in the program.) Now I want to remove all the elements with the fewest number of sets and only keep the one or ones that contain the most sets.
My list can look like this:
[{'Chocolate'}, {'Chocolate'}, {'JellyBean', 'Chips'}]
In this case, I would have wanted to keep just the last element because it contains two sets and the rest only one set. But sometimes there are several elements with the highest number of sets and then I want to keep them all.
I have tried to do something like:
if min(len(list)) != max(len(list)):
    list.remove(min(len(list)))
but Python just says "'int' object is not iterable" and I can understand why but not how to think instead.
Would be very thankful if someone helped me!
You'll need to iterate over the list once to determine the max length and then again to find the elements you want to keep. There are fancier, more efficient solutions, but they are harder to understand.
example_list = [{'Chocolate'}, {'Chocolate'}, {'JellyBean', 'Chips'}]
# First pass: find the largest element size
max_length = 0
for item in example_list:
    if len(item) > max_length:
        max_length = len(item)
# Second pass: keep only the elements of that size
new_list = []
for item in example_list:
    if len(item) == max_length:
        new_list.append(item)
You might be able to use remove instead, but removing items from a list while you are iterating over it shifts the remaining items, so the loop can skip elements.
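A small sketch of what goes wrong when removing during iteration, using the data from the question (the variable names are assumptions made for this example). After the first removal the remaining items shift left, so the loop never visits the second {'Chocolate'}.

```python
items = [{'Chocolate'}, {'Chocolate'}, {'JellyBean', 'Chips'}]

# Risky: mutating the list while looping over it.
for item in items:
    if len(item) < 2:
        items.remove(item)

print(items)  # one of the single-item sets is still there

# Safer: build a new list and leave the original untouched.
original = [{'Chocolate'}, {'Chocolate'}, {'JellyBean', 'Chips'}]
kept = [s for s in original if len(s) >= 2]
print(kept)
```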
Why not create a new list that contains only the elements with the maximum length? Try the following:
list_of_sets= [{'Chocolate'}, {'Chocolate', 'Candy'}, {'JellyBean', 'Chips'}]
max_len=max(len(s) for s in list_of_sets)
final_list = [s for s in list_of_sets if len(s)==max_len]
final_list
Result:
[{'Candy', 'Chocolate'}, {'Chips', 'JellyBean'}]

Compare items in list with nested for-loop

I have a list of URLs in an open CSV which I have ordered alphabetically, and now I would like to iterate through the list and check for duplicate URLs. In a second step, the duplicate should then be removed from the list, but I am currently stuck on the checking part which I have tried to solve with a nested for-loop as follows:
for i in short_urls:
    first_url = i
    for s in short_urls:
        second_url = s
    if i == s:
        print("duplicate")
    else:
        print("all good")
The print statements will obviously be replaced once the nested for-loop is working. Currently, the list contains a few duplicates, but my nested loop does not seem to work correctly as it does not recognise any of the duplicates.
My question is: are there better ways to perform this exercise, and what is the problem with the current nested for-loop?
Many thanks :)
Your method is faulty by construction, even if you indent the if/else block correctly. For the sake of argument, imagine short_urls were [1, 2, 3]. The outer for loop picks out 1 to compare against the list. In the inner for loop, the first element it encounters is that same 1, so it thinks it has found a duplicate. Essentially, every element will be tagged as a duplicate, and if you then remove the duplicates you will end up with an empty list.
The better solution is to call set(short_urls) to get a set of your urls with the duplicates removed. If you want a list (as opposed to a set) of urls with the duplicates removed, you can convert the set back into a list with list(set(short_urls)). Note that a set is unordered, so the resulting list may not keep the original order.
In other words:
short_urls = ['google.com', 'twitter.com', 'google.com']
duplicates_removed_list = list(set(short_urls))
print(duplicates_removed_list)  # Prints the two unique urls, e.g. ['google.com', 'twitter.com'] (order may vary)
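If you also need to know which urls were duplicated, or want to keep the original order, something along these lines may help. It is a sketch that relies only on the standard library; the variable names are assumptions.

```python
from collections import Counter

short_urls = ['google.com', 'twitter.com', 'google.com']

# Count occurrences, then report every url that appears more than once.
counts = Counter(short_urls)
duplicates = sorted(url for url, n in counts.items() if n > 1)

# Remove duplicates while keeping the first occurrence of each url in order.
seen = set()
deduplicated = []
for url in short_urls:
    if url not in seen:
        seen.add(url)
        deduplicated.append(url)

print(duplicates)    # ['google.com']
print(deduplicated)  # ['google.com', 'twitter.com']
```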
The if i == s: check is not inside the second for loop. You missed a level of indentation:
for i in short_urls:
    first_url = i
    for s in short_urls:
        second_url = s
        if i == s:
            print("duplicate")
        else:
            print("all good")
EDIT: Also, you are comparing every element of the list with every element of the same list. This means the element at position 0 is compared with the element at position 0, which is obviously the same.
What you need to do is start the second for loop at the position after the one reached in the first for loop.
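One way to sketch that, with made-up example urls: the inner loop starts at i + 1, so an element is never compared with itself and each pair is checked only once. A url that appears three or more times would be reported more than once, and the approach is still quadratic, so treat it as an illustration of the fix and not as the fastest option.

```python
short_urls = ['a.com', 'b.com', 'a.com', 'c.com', 'b.com']  # example data

duplicates = []
for i in range(len(short_urls)):
    # Only look at positions after i.
    for j in range(i + 1, len(short_urls)):
        if short_urls[i] == short_urls[j]:
            duplicates.append(short_urls[i])

print(duplicates)  # ['a.com', 'b.com']
```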

Python: creating empty lists within empty lists?

EDIT: so I figured out that if I declare this at the beginning it works fine:
RelayPlaceHolder = [[],[],[],[],[],[],[],[],[]]
Why can't something like this create the same sort of empty containers? The number of empty lists might change:
for SwimTeams in SwimTeamList:
    empty = []
    RelayPlaceHolder.append(empty)
this was my old question...
I have a list of lists of further lists of single dictionaries:
TeamAgeGroupGender[team#][swimmer#][their dictionary with a key such as {"free":19.05}]
I have a loop that, for every team in the first level of my lists, loops through every swimmer within that team's list and adds the swim that corresponds to the key "free" to a new list called RelayPlaceHolder[teamid][***the thing I just added***]
for SwimTeams in SwimTeamList:
    empty = []
    RelayPlaceHolder.append(empty)
    teamid = SwimTeamList.index(SwimTeams)
    print(SwimTeams)
    print(teamid)
    for swimmers in TeamAgeGroupGender[teamid]:
        swimmerID = TeamAgeGroupGender[teamid].index(swimmers)
        RelayPlaceHolder[teamid].append({SwimTeams:TeamAgeGroupGender[teamid][swimmerID]["Free"]})
    print(RelayPlaceHolder[teamid][0])
Desired:
RelayPlaceHolder[0][*** list of swims that go with this team#0 ***]
RelayPlaceHolder[1][*** list of swims that go with team#1, ie the next teamid in the loop***]
For some reason, my loop is only adding swims to RelayPlaceHolder[0], even the swims from team#1. I tried using print to troubleshoot: the teamid index and swim team names change just fine from #0 to #1, yet RelayPlaceHolder[teamid].append still adds to the #0 list and not the #1 list. I also know this because later code fails to find the correct key in RelayPlaceHolder[1] (because it's turning up empty). I'm not sure why my loop is failing. I've used a similar structure in other loops...
Thank you.
As commented by @doukremt: a concise syntax if you need to define an arbitrary number of lists is:
[[] for i in range(some_number)]
If you need to do it more often, you can implement it in a function:
>>> lists = lambda x: [[] for i in range(x)]
>>> lists(3)
[[], [], []]
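One reason to prefer the comprehension over the shorter-looking [[]] * n is that the comprehension creates a separate list for each slot, while multiplication repeats a reference to one and the same inner list. A quick sketch (the variable names are chosen for illustration):

```python
# Each slot gets its own, independent list.
independent = [[] for _ in range(3)]
independent[0].append('x')
print(independent)  # [['x'], [], []]

# Every slot refers to the same inner list, so one append shows up everywhere.
shared = [[]] * 3
shared[0].append('x')
print(shared)       # [['x'], ['x'], ['x']]
```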

Indexing According to Number in the Names of Objects in a List in Python

Apologies for my title not being the best. Here is what I am trying to accomplish:
I have a list:
list1 = [a0_something, a2_something, a1_something, a4_something, a3_something]
I have another list whose entries are tuples that include a name, such as:
list2 = [(x1,y1,z1,'bob'),(x2,y2,z2,'alex')...]
The 0th name in the second list corresponds to a0_something, and the name in the 1st entry of the second list corresponds to a1_something. Basically, the second list is in the right order but the first list isn't.
The program I am working with has a setName function. I would like to do this:
a0_something.setName(list2[0][4])
and so on with a loop.
So that I can really just say
for i in range(len(list1)):
    a(i)_something.setName(list2[i][4])
Is there any way I can refer to that number in a#_something so that I can iterate with a loop?
No.
Variable names have no meaning in run-time. (Unless you're doing introspection, which I guarantee you is something you should not be doing.)
Use a proper list such that:
lst = [a0_val, a1_val, a2_val, a3_val, a4_val]
and then address it by lst[0].
Alternatively, if those names have meanings, use a dict where:
dct = {
    'a0' : a0_val,
    'a1' : a1_val,
    # ...
}
and use it with dct['a0'].
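With such a dict, the number can be turned into a key inside the loop. The sketch below is only an illustration: the Item class is a hypothetical stand-in for whatever objects the asker's program provides (all that is assumed is a setName method), and the tuples in list2 are invented sample data.

```python
class Item(object):
    """Hypothetical stand-in for the program's objects with a setName method."""

    def __init__(self):
        self.name = None

    def setName(self, name):
        self.name = name


# The order in which the dict is written does not matter; the key does.
dct = {'a0': Item(), 'a2': Item(), 'a1': Item()}

# Invented sample data; the name is the last (4th) element of each tuple.
list2 = [(0.0, 0.0, 0.0, 'bob'), (1.0, 1.0, 1.0, 'alex'), (2.0, 2.0, 2.0, 'carol')]

for i, entry in enumerate(list2):
    dct['a%d' % i].setName(entry[3])  # build the key 'a0', 'a1', ... from i

print(sorted((key, item.name) for key, item in dct.items()))
```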
The enumerate function lets you get the value and the index of the current item. So, for your example, you could do:
for i, asomething in enumerate(list1):
    asomething.setName(list2[i][3])
Since each tuple in list2 has length 4, its final element is at index 3 (you could also use -1).
