I have a list of lists of lists that looks like this:
[[[1], ['apple'], ['AAA']],
 [[2], ['banana'], ['BBB']],
 [[3], ['orange'], ['CCC']],
 [[4], ['pineapple'], ['AAA']],
 [[5], ['tomato'], ['ABC']]]
Probably the wrong terminology, but: I want to find duplicates in the third column, add that row's second column item to the first instance of the duplicates, and then remove the duplicate row.
So using the example: I want to iterate through the list, find the duplicate value 'AAA', add 'pineapple' after 'apple' and remove the (second level) list containing the second instance of 'AAA'.
The list I want to end up with should look like:
[[[1], ['apple', 'pineapple'], ['AAA']],
 [[2], ['banana'], ['BBB']],
 [[3], ['orange'], ['CCC']],
 [[5], ['tomato'], ['ABC']]]
I tried the following but I can't figure out how to do this..
seen = set()
for l in final:
    if l[2] not in seen:  # TypeError: unhashable type: 'list'
        # Here I want to add value to first instance
        seen.add(l[2])
        # Remove list
This will do what you're asking for... but I seriously wonder whether you can't change your data structure. It's strange and hard to work with!
newList = []
lookup = {}
for l in final:
    if l[2][0] not in lookup:
        lookup[l[2][0]] = l
        newList.append(l)
    else:
        lookup[l[2][0]][1].append(l[1][0])
print newList
The reason you were getting the TypeError is that you were using l[2] instead of l[2][0]. Remember, l[2] is a list. What you want is to grab the item inside that list (index 0 in this case) and check whether that is in lookup. The lookup dict replaces the seen set from your example because it can also give you back the row that a duplicate l[2][0] corresponds to, since your data structure currently isn't set up to do something like final['AAA']. However, this isn't ideal, and I'd strongly recommend changing the data structure if possible.
Something else to think about...
Currently, because your items are all essentially lists within lists, the current algorithm will essentially change the nested objects (lists) you were working with, because of object mutability. This means that while final would contain the same objects it did originally, those objects will have changed (in this case with ['apple', 'pineapple']).
If you want to prevent that from happening, look into using the copy module. Specifically, use the copy.deepcopy function to copy all objects (even through the nesting).
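As a minimal sketch of that idea (the names new_list and row_copy are illustrative, and the sample data is the list from the question), each first-seen row could be deep-copied before it is stored, so that later appends only touch the copy:

```python
import copy

final = [
    [[1], ['apple'], ['AAA']],
    [[2], ['banana'], ['BBB']],
    [[3], ['orange'], ['CCC']],
    [[4], ['pineapple'], ['AAA']],
    [[5], ['tomato'], ['ABC']],
]

new_list = []
lookup = {}
for row in final:
    key = row[2][0]
    if key not in lookup:
        # Deep-copy the row so that later appends do not modify `final`.
        row_copy = copy.deepcopy(row)
        lookup[key] = row_copy
        new_list.append(row_copy)
    else:
        # Merge the duplicate's fruit into the copied first instance.
        lookup[key][1].append(row[1][0])

print(new_list[0])  # [[1], ['apple', 'pineapple'], ['AAA']]
print(final[0])     # [[1], ['apple'], ['AAA']]
```

The cost is extra memory for the copies, which is probably negligible for a list of this size.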
Edit:
w0lf's version (Improved readability)
newList = []
lookup = {}
for l in final:
    row_no, fruit, code = l
    unique_id = code[0]  # because `code` is a one element list
    if unique_id not in lookup:
        lookup[unique_id] = l
        newList.append(l)
    else:
        lookup[unique_id][1].extend(fruit)
print(newList)
Also note: He remembered to do print(newList) instead of print newList for Py3k users. Since the question is tagged for Python 3, that's the way to go.
A list is an unhashable type, i.e. you cannot add it (as is) to data structures that rely on hashing (like a Python dictionary or set). Strings, however, are hashable.
I'd do
seen.add(str(l[2]))
This will solve the TypeError.
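A short sketch of the difference, using one row from the question (the variable error_message is only there for illustration). It also shows the inner string as an alternative key, which is what the other answer uses:

```python
row = [[1], ['apple'], ['AAA']]
seen = set()

try:
    seen.add(row[2])       # row[2] is a list, which cannot be hashed
except TypeError as exc:
    error_message = str(exc)

seen.add(str(row[2]))      # the string "['AAA']" is hashable
seen.add(row[2][0])        # so is the inner string 'AAA'

print(error_message)
print(sorted(seen))
```

Note that str(row[2]) stores the text of the whole list, brackets and quotes included, so membership checks must use the same conversion.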
Related
I have a list in Python as
list_data = [('a','b',5),('aa','bb',50)]
and some variables:
a = ('a','b','2')
c = ('aaa','bbb','500')
Now how can I search if a is already there in list_data?
If yes add 2 to the value of a, if not append to list_data?
The result should be as
list_data = [('a','b',7),('aa','bb',50),('aaa','bbb','500')]
Actually, this question is a good way to demonstrate several Pythonic ways of doing things. So let's see what we can do.
In order to check if something is in a Python list you can just use the in operator:
if a in list_data:
    do_stuff()
But what you ask is a bit different. You want to do something like a search by multiple keys, if I understand correctly. In this case you can 'trim' your tuple by discarding the last entry.
Slicing is handy for this:
value_trimmed = value[:-1]
Now you can make a list of trimmed tuples:
list_trimmed = []
for item in list_data:
    list_trimmed.append(item[:-1])
And then search there:
if a[:-1] in list_trimmed:
    do_smth()
This list can be constructed in a less verbose way using a list comprehension:
list_trimmed = [item[:-1] for item in list_data]
To find exactly where your item is, you can use the index() method of list:
list_trimmed.index(a[:-1])
This will return the index of the first occurrence of a[:-1] in list_trimmed, or raise a ValueError if it can't be found. We can avoid explicitly checking whether the item is in the list, and do the insertion only if the exception is caught.
Your full code will look like this:
list_data = [('a','b',5), ('aa','bb',50)]
values_to_find = [('a','b','2'), ('aaa','bbb','500')]
list_trimmed = [item[:-1] for item in list_data]
for val in values_to_find:
    val_trimmed = val[:-1]
    try:
        ind = list_trimmed.index(val_trimmed)
        src_tuple = list_data[ind]
        # we can't edit tuple inplace, since they are immutable in python
        list_data[ind] = (src_tuple[0], src_tuple[1], src_tuple[2]+2)
    except ValueError:
        list_data.append(val)
print list_data
Of course, if speed or memory efficiency is your main concern, this code is not very appropriate, but you haven't mentioned these in your question, and that is not what Python is really about, in my opinion.
Edit:
You haven't specified what happens when you check for ('aaa','bbb','500') a second time: should we use the updated list and increment the matching tuple's last element, or should we stick to the original list and insert another copy?
If we use the updated list, it is not clear how to handle incrementing the string '500' by 2 (we can convert it to an integer, but you should have constructed your query appropriately in the first place).
Or maybe you meant to add the last element of the tuple being searched to the tuple in the list, if found? Please edit your question to make it clear.
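For what it's worth, here is a rough sketch of that last interpretation using a dict keyed by the first two elements. It assumes the last element of each searched tuple is numeric and should be added as an integer, which the question does not confirm; the names totals and result are made up for the example.

```python
list_data = [('a', 'b', 5), ('aa', 'bb', 50)]
values_to_find = [('a', 'b', '2'), ('aaa', 'bbb', '500')]

# Map (first, second) -> running total.
# Dicts preserve insertion order in Python 3.7+.
totals = {item[:-1]: item[-1] for item in list_data}

for val in values_to_find:
    key = val[:-1]
    amount = int(val[-1])  # assumption: the last element is numeric
    totals[key] = totals.get(key, 0) + amount

# Rebuild the list of tuples from the dict.
result = [key + (total,) for key, total in totals.items()]
print(result)  # [('a', 'b', 7), ('aa', 'bb', 50), ('aaa', 'bbb', 500)]
```

Unlike the expected result in the question, the new entry ends up as the integer 500 rather than the string '500'. Dict lookups also avoid the linear scan that index() performs on every search.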
I have been trying to develop a graph structure that will link entities according to co-mentioned features between them, e.g. 2 places are linked if co-mentioned in an article.
I have managed to do so, but I have been having problems iteratively populating an edge with new information while keeping what is already there.
My approach (since I haven't found anything related anywhere) is to append existing information to a list, append the new link in the list and assign that list to the appropriate feature.
temp = []
if G.has_edge(i[z],i[j]):
    temp.append(G[i[z]][i[j]]['article'])
    temp.append(url[index])
    G[i[z]][i[j]]['article'] = temp
else:
    print "Create edge!"
    G.add_edge(i[z],i[j], article=url)
del temp[:]
As you can see above, since there are many links to populate, I defined a dedicated list (temp) and loaded into it the old contents of a link's attribute called article. If the link does not exist, I create it and add as its first value the url that "brought" the 2 places together.
My problem is that, although I empty the list each time so that it is empty when a new pair comes in, when I try to see a link's urls I get something like this:
{'article': [[...], u'http://www.huffingtonpost.co.uk/.../']
It seems like I am keeping only the last link as each time I delete the temporary list's contents but I cannot find a better way to do so without declaring an unnecessary bunch of temp lists.
Any ideas?
Thank you for your time.
TL/DR summary: change your entire snippet to
if G.has_edge(i[z],i[j]):
    G[i[z]][i[j]]['article'].append(url[index])
else:
    G.add_edge(i[z],i[j], article=[url])
Here's what's going on:
When you create the edge the first time you use
G.add_edge(i[z],i[j], article=url)
So it's a string. But later when you do
G[i[z]][i[j]]['article'] = temp
you've defined temp to be a list whose first element is G[i[z]][i[j]]['article']. So G[i[z]][i[j]]['article'] is now a list with two elements, the first of which is the old value for G[i[z]][i[j]]['article'] (a string) and the second of which is the new url (also a string).
Your problem comes at the later steps:
From then on, it's exactly the same thing. G[i[z]][i[j]]['article'] is again a list with two elements, the first of which is its old value (a list) and the second is the new url (a string). So you've got a nested list.
Let's trace through with three urls: 'a', 'b', and 'c', and I'll use E to abbreviate G[i[z]][i[j]]['article']. First time through, you get E='a'. Second time through you get E=['a', 'b']. Third time through it gives E=[['a','b'],'c']. So it always makes E[0] the former value of E, and E[1] the new url.
Two choices:
1) you can handle the creation of temp differently if you've got a string or a list. This is the bad choice.
2) Better: make it a list the whole time through and then don't even deal with temp. Try creating the edge as (..., article=[url]) and then just use G[i[z]][i[j]]['article'].append(url) instead of defining temp.
So your code would be
if G.has_edge(i[z],i[j]):
    G[i[z]][i[j]]['article'].append(url[index])
else:
    G.add_edge(i[z],i[j], article=[url])
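The 'a', 'b', 'c' trace and the list-from-the-start fix can be reproduced without a graph library. In this sketch a plain dict stands in for the edge's attribute dictionary, and the two helper functions are hypothetical names invented for the illustration:

```python
def add_url_nested(edge, url):
    """Mimic the temp-list approach: wrap the old value and the new url."""
    if 'article' in edge:
        edge['article'] = [edge['article'], url]
    else:
        edge['article'] = url  # first value is stored as a bare string


def add_url_flat(edge, url):
    """Keep 'article' a list from the start and append to it."""
    if 'article' in edge:
        edge['article'].append(url)
    else:
        edge['article'] = [url]


nested, flat = {}, {}
for url in ['a', 'b', 'c']:
    add_url_nested(nested, url)
    add_url_flat(flat, url)

print(nested['article'])  # [['a', 'b'], 'c']
print(flat['article'])    # ['a', 'b', 'c']
```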
A separate thing that could also cause you problems is the call
del temp[:]
This should cause behavior different from what I think you're describing, so I suspect your actual code differs a bit from what you posted. When you set G[i[z]][i[j]]['article'] = temp and then do del temp[:], the two names refer to one and the same list, so del temp[:] also empties G[i[z]][i[j]]['article']. Consider the following:
temp = []
temp.append(1)
print temp
> [1]
L = temp
print L
> [1]
del temp[:]
print L
> []
I think all your previous URLs are in your new list. They are in the [...].
You must use extend instead of append when you get the existing list from the edge.
temp = []
temp.append([1, 2, 3])
temp.append(4)
print(temp)
You will get:
[[1, 2, 3], 4]
But if you do:
temp = []
temp.extend([1, 2, 3])
temp.append(4)
print(temp)
You get:
[1, 2, 3, 4]
EDIT: so I figured out that if I declare this at the beginning it works fine:
RelayPlaceHolder = [[],[],[],[],[],[],[],[],[]]
Why can't something like this create the same sort of empty containers? The number of empty lists might change:
for SwimTeams in SwimTeamList:
    empty = []
    RelayPlaceHolder.append(empty)
this was my old question...
I have a list of lists of further lists of single dictionaries:
TeamAgeGroupGender[team#][swimmer#][their dictionary with a key such as {"free":19.05}]
I have a loop that, for every team in the first level of my lists, loops through every swimmer within that team's list and adds the swim that corresponds to the key "free" to a new list called RelayPlaceHolder[teamid][***the thing I just added***]
for SwimTeams in SwimTeamList:
    empty = []
    RelayPlaceHolder.append(empty)
    teamid = SwimTeamList.index(SwimTeams)
    print SwimTeams
    print teamid
    for swimmers in TeamAgeGroupGender[teamid]:
        swimmerID = TeamAgeGroupGender[teamid].index(swimmers)
        RelayPlaceHolder[teamid].append({SwimTeams:TeamAgeGroupGender[teamid][swimmerID]["Free"]})
    print RelayPlaceHolder[teamid][0]
Desired:
RelayPlaceHolder[0][*** list of swims that go with this team#0 ***]
RelayPlaceHolder[1][*** list of swims that go with team#1, ie the next teamid in the loop***]
For some reason, my loop is only adding swims to RelayPlaceHolder[0], even the swims from team#1. I tried using print to troubleshoot: the teamid index and swim team names change just fine from #0 to #1, but my RelayPlaceHolder[teamid].append is still adding to the #0 list and not the #1 list. I also know this because later code fails to find the correct key in RelayPlaceHolder[1] (because it's turning up empty). I'm not sure why my loop is failing. I've used a similar structure in other loops...
Thank you.
As commented by @doukremt: A concise syntax if you need to define an arbitrary number of lists is:
[[] for i in range(some_number)]
If you need to do it more often, you can implement it in a function:
>>> lists = lambda x: [[] for i in range(x)]
>>> lists(3)
[[], [], []]
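As a sketch of how the pre-created lists might then be filled, here is a version with made-up sample data in the shape described in the question (the team names and times are hypothetical). It uses enumerate rather than list.index to get the team number; index() returns the position of the first equal item, so it could point at the wrong slot if two entries compare equal, which is one possible cause of swims landing in the wrong list.

```python
# Hypothetical sample data in the shape described in the question.
swim_team_list = ['Sharks', 'Dolphins']
team_age_group_gender = [
    [{'Free': 19.05}, {'Free': 20.10}],  # swimmers of team 0
    [{'Free': 18.75}],                   # swimmers of team 1
]

# One independent empty list per team.
relay_place_holder = [[] for _ in range(len(swim_team_list))]

for team_id, team in enumerate(swim_team_list):
    for swimmer in team_age_group_gender[team_id]:
        relay_place_holder[team_id].append({team: swimmer['Free']})

print(relay_place_holder)
```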
I am creating a program for a high school course and our teacher is very specific about what is allowed in our programs. We use Python 2.x and he only allows if statements, while loops, functions, boolean values, and lists. I am working on a project that will print the reversal of a string, then print the same reversal again without the numbers in it, but I cannot figure it out. Help please. What I have so far is this:
def reverse_str(string):
    revstring =('')
    length=len(string)
    i = length - 1
    while i>=0:
        revstring = revstring + string[i]
        i = i - 1
    return revstring
def strip_digits(string):
    l = [0,1,2,3,4,5,6,7,8,9]
    del (l) rev_string
string = raw_input("Enter a string->")
new_str = rev_str(string)
print new_str
I cannot figure out how to use "del" properly. How do I delete any of the items in the list from the reversed string? Thanks.
In general, you have two options for a task like this:
1) Iterate through the items in your list, deleting the ones that you do not want to keep.
2) Iterate through the items in your list, copying the ones that you do want to keep to a new list. Return the new list.
Now, although I would normally prefer option (2), that won't help with your specific question about del. To delete an item at index x from a list a, the following syntax will do it:
del a[x]
That will shift all the elements past index x to the left to close the gap left by deleting the element. You will have to take this shift into account if you're iterating through all the items in the list.
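A minimal sketch of the deleting approach, restricted to if statements, while loops, functions, and lists. It assumes the string is first converted to a list of characters with list(), since del cannot remove characters from a string directly:

```python
def strip_digits(string):
    chars = list(string)  # work on a list of single characters
    i = 0
    while i < len(chars):
        if chars[i] in '0123456789':
            # After del, the next item slides into index i,
            # so i must not advance here.
            del chars[i]
        else:
            i = i + 1
    # Rebuild a string from the remaining characters.
    result = ''
    j = 0
    while j < len(chars):
        result = result + chars[j]
        j = j + 1
    return result

print(strip_digits('a1b22c'))  # abc
```

Advancing i only when nothing was deleted is what accounts for the shift; incrementing it unconditionally would skip the second of two adjacent digits.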
The str type in Python is immutable (it cannot be altered in place) and does not support item deletion with the del statement.
Map the characters of the string to a list, delete the elements you want, and reconstruct the string.
OR
Iterate through the string elements whilst building a new one, omitting numbers.
Correct usage of del is:
>>> a = [1, 2, 3]
>>> del a[1]
>>> a
[1, 3]
You could iterate back over the string, copying it again but not copying the digits... It would be interesting for you to also figure out the Pythonic way to do it with everything you're not allowed to use. Both methods are good to know.
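A sketch of the copying approach under the same restrictions (the function name reverse_without_digits is made up for the example). It walks the string backwards and skips digits while building the new string:

```python
def reverse_without_digits(string):
    result = ''
    i = len(string) - 1
    while i >= 0:
        # Copy the character only if it is not a digit.
        if string[i] not in '0123456789':
            result = result + string[i]
        i = i - 1
    return result

print(reverse_without_digits('abc123'))  # cba
```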
The .append function adds elements to the end of the list.
How can I add elements to the list? In reverse? So that index zero is new value, and the old values move up in index?
What append does
[a,b,c,d,e]
what I would like.
[e,d,c,b,a]
Thank you very much.
Suppose you have a list a, a = [1, 2, 3]
Now suppose you wonder what kinds of things you can do to that list:
dir(a)
Hmmmm... wonder what this insert thingy does...
help(a.insert)
Insert object before index, you say? Why, that sounds a lot like what I want to do! If I want to insert something at the beginning of the list, that would be before index 0. What object do I want to insert? Let's try 7...
a.insert(0, 7)
print a
Well, look at that 7 right at the front of the list!
TL;DR: dir() will let you see what's available, help() will show you how it works, and then you can play around with it and see what it does, or Google up some documentation since you now know what the feature you want is called.
It would be more efficient to use a deque (double-ended queue) for this. Inserting at index 0 is costly in lists, since each element must be shifted over, which takes O(N) time. In a deque the same operation is O(1).
>>> from collections import deque
>>> x = deque()
>>> x.appendleft('a')
>>> x.appendleft('b')
>>> x
deque(['b', 'a'])
Use somelist.insert(0, item) to place item at the beginning of somelist, shifting all other elements down. Note that for large lists this is a very expensive operation. Consider using deque instead if you will be adding items to or removing items from both ends of the sequence.
Using Python's list insert command with 0 for the position value will insert the value at the head of the list, thus inserting in reverse order:
your_list.insert(0, new_item)
You can do
your_list=['New item!!']+your_list
But the insert method works as well.
lst=["a","b","c","d","e","f"]
lst_rev=[]
lst_rev.extend(lst[::-1])  # extend, so the reversed items are not nested as one element
print(lst_rev)
Here's an example of how to add elements in a list in reverse order:
liste1 = [1,2,3,4,5]
liste2 = list()
for i in liste1:
    liste2.insert(0,i)
Use the following (assuming x is what you want to prepend):
your_list = [x] + your_list
or:
your_list.insert(0, x)