Search tuple elements within a list - Python

I have a list in Python as
list_data = [('a','b',5),('aa','bb',50)]
and some variables:
a = ('a','b','2')
c = ('aaa','bbb','500')
How can I check whether a is already in list_data?
If it is, add 2 to the last value of the matching tuple; if not, append the tuple to list_data (as with c).
The result should be:
list_data = [('a','b',7),('aa','bb',50),('aaa','bbb','500')]

Actually, this question is a good way to demonstrate several Pythonic ways of doing things, so let's see what we can do.
To check whether something is in a Python list, you can just use the in operator:
if a in list_data:
    do_stuff()
But what you're asking is a bit different. If I understand correctly, you want to do something like a search by multiple keys. In this case, you can 'trim' your tuple by discarding its last entry.
Slicing is handy for this:
value_trimmed = value[:-1]
Now you can make a list of trimmed tuples:
list_trimmed = []
for item in list_data:
    list_trimmed.append(item[:-1])
And then search there:
if a[:-1] in list_trimmed:
    do_smth()
This list can be constructed in a less verbose way using a list comprehension:
list_trimmed = [item[:-1] for item in list_data]
To find exactly where your item is, you can use the list's index() method:
list_trimmed.index(a[:-1])
This returns the index of the first occurrence of a[:-1] in list_trimmed, or raises a ValueError if it can't be found. So we can skip explicitly checking whether the item is in the list, and do the insertion only if the exception is caught.
Your full code will look like this:
list_data = [('a','b',5), ('aa','bb',50)]
values_to_find = [('a','b','2'), ('aaa','bbb','500')]
list_trimmed = [item[:-1] for item in list_data]
for val in values_to_find:
    val_trimmed = val[:-1]
    try:
        ind = list_trimmed.index(val_trimmed)
        src_tuple = list_data[ind]
        # we can't edit a tuple in place, since tuples are immutable in Python
        list_data[ind] = (src_tuple[0], src_tuple[1], src_tuple[2]+2)
    except ValueError:
        list_data.append(val)
print(list_data)
Of course, if speed or memory efficiency is your main concern, this code is not very appropriate, but you haven't mentioned these in your question, and in my opinion that is not really what Python is about.
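If speed does become a concern, one common alternative is to build a dict from each trimmed key to its position, so every lookup is a hash lookup instead of the linear scan that index() performs. A minimal sketch, assuming the trimmed keys are hashable; merge_values and its increment parameter are hypothetical names, not part of the answer above:

```python
def merge_values(list_data, values_to_find, increment=2):
    """Increment the last element of matching tuples, or append new ones.

    Two tuples "match" when everything except their last element is equal.
    list_data is modified in place and also returned.
    """
    # Build the lookup table once: trimmed key -> position in list_data.
    positions = {}
    for i, item in enumerate(list_data):
        # setdefault keeps the first occurrence, like list.index() would.
        positions.setdefault(item[:-1], i)

    for val in values_to_find:
        ind = positions.get(val[:-1])
        if ind is not None:
            src_tuple = list_data[ind]
            # Tuples are immutable, so build a new one with the updated last element.
            list_data[ind] = src_tuple[:-1] + (src_tuple[-1] + increment,)
        else:
            # Like the code above, appended tuples are not added to positions.
            list_data.append(val)
    return list_data


list_data = [('a', 'b', 5), ('aa', 'bb', 50)]
print(merge_values(list_data, [('a', 'b', '2'), ('aaa', 'bbb', '500')]))
# [('a', 'b', 7), ('aa', 'bb', 50), ('aaa', 'bbb', '500')]
```

The dict is built in one pass, so the cost of each search no longer grows with the length of list_data.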
Edit:
You haven't specified what happens when you check for ('aaa','bbb','500') a second time: should we use the updated list and increment the matching tuple's last element, or should we stick to the original list and insert another copy?
If we use the updated list, it is not clear how to handle incrementing the string '500' by 2 (we could convert it to an integer, but you should have constructed your query appropriately in the first place).
Or maybe you meant to add the last element of the searched tuple to the matching tuple in the list, if found? Please edit your question to make it clear.
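Under that last interpretation, a sketch could look like the following. It assumes the searched tuple's last element is always a numeric string and converts it with int(); unlike the expected result in the question, it also stores appended values as integers, so a repeated lookup could later be incremented.

```python
list_data = [('a', 'b', 5), ('aa', 'bb', 50)]
values_to_find = [('a', 'b', '2'), ('aaa', 'bbb', '500')]

for val in values_to_find:
    # Assumption: val[-1] is a numeric string such as '2' or '500'.
    amount = int(val[-1])
    for ind, item in enumerate(list_data):
        if item[:-1] == val[:-1]:
            # Tuples are immutable, so replace the whole tuple.
            list_data[ind] = item[:-1] + (item[-1] + amount,)
            break
    else:
        # The inner loop found no match (no break), so append a new tuple.
        list_data.append(val[:-1] + (amount,))

print(list_data)
# [('a', 'b', 7), ('aa', 'bb', 50), ('aaa', 'bbb', 500)]
```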

Related

How to search for strings within nested lists

One of the questions for an assignment I'm doing consists of looking within a nested list consisting of "an ultrashort story and its author." to find a string that was input by a user. I'm not too sure how to go about this; the assignment brief is below if anyone would like more clarification. There are also more questions I'm not too sure about, e.g. "find all stories by a certain author". Some explanations, or a point in the right direction, would be greatly appreciated :)
list = []
mylist = [['a','b','c'],['d','e','f']]
string = input("String?")
if string in [elem for sublist in mylist for elem in sublist] == True:
    list.append(elem)
This is just an example of something I've tried; the list above is similar enough to the one I'm actually using for the question. I've been going through different methods of iterating over nested lists and adding matching items to another list, and the code above is just one example of an attempt I've made at this process.
""" the image above states that the data is in the
form of an list of sublists, with each sublist containing
two strings
"""
stories = [
['story string 1', 'author string 1'],
['story string 2', 'author string 2']
]
""" find stories that contain a given string
"""
stories_with_substring = []
substring = 'some string' # search string
for story, author in stories:
# if the substring is not in the story, a ValueError is raised
try:
story.index(substring)
stories_with_substring.append((story, author))
except ValueError:
continue
""" find stories by a given author
"""
stories_by_author = []
target_author = 'first last'
for story, author in stories:
if author == target_author:
stories_by_author.append((story, author))
This line here:
for story, author in stories:
'unpacks' each two-item sublist into story and author. It's equivalent to:
for pair in stories:
    story = pair[0]
    author = pair[1]
Or to go even further:
i = 0
while i < len(stories):
    pair = stories[i]
    story = pair[0]
    author = pair[1]
    i += 1
I'm sure you can see how useful this is when dealing with lists that contain lists/tuples.
You may need to call .lower() on the strings you compare if you want the search to be case-insensitive.
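As a sketch of that, the in operator can replace the try/except around str.index(), and lowering both sides makes the match case-insensitive. The helper name find_stories and the sample stories are hypothetical:

```python
def find_stories(stories, substring):
    """Return (story, author) pairs whose story contains substring, ignoring case."""
    needle = substring.lower()
    # The in operator tests for a substring without raising an exception.
    return [(story, author) for story, author in stories
            if needle in story.lower()]


stories = [
    ['The Cat Sat', 'author one'],
    ['a dog ran', 'author two'],
]
print(find_stories(stories, 'cat'))  # [('The Cat Sat', 'author one')]
```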
You can do a few things here. Your example showed the use of a list comprehension, so let's focus on some other aspects of this problem.
Recursion
You can define a function that iterates through all the items in the top-level list. Assuming you know for sure that every item is either a string or another list, you can use type() to check whether each item is a list or a string. If it's a string, do your search; if it's a list, have your function call itself. Let's look at an example. Please note that we should avoid naming variables list or string: list is a built-in type and string is a standard-library module, and we don't want to accidentally shadow them!
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
    results = []
    for i in my_list:  # iterate over the parameter, not the global mylist
        if type(i) == list:
            items = find_nested_items(i, my_input)
            results += items
        elif my_input in i:
            results.append(i)
    return results
We're doing a few things here:
Creating an empty list named results
Iterating through the top level items of my_list
If one of those items is another list, we have our function call itself - at some point this will trigger the condition where an item is not a list, and will eventually return the results from that. For now, we assume the results we're getting back are going to be correct, so we concatenate those results to our top level results list
If the item is not a list, we simply check for the existence of our input and if so, add it to our results list
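This kind of recursion is typically very safe, because it's inherently limited by our data structure. It can't run forever unless the data structure itself is infinitely deep (for example, a list that contains itself), although very deep nesting can still exceed Python's recursion limit and raise an error.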
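To see the recursion go more than one level deep, here is the same function (iterating over its my_list parameter) applied to a hypothetical, more deeply nested list:

```python
def find_nested_items(my_list, my_input):
    results = []
    for i in my_list:
        if type(i) == list:
            # Recurse into the sublist and merge its matches into ours.
            results += find_nested_items(i, my_input)
        elif my_input in i:
            results.append(i)
    return results


nested = [['apple', ['pineapple', ['grape']]], 'banana', ['apricot']]
print(find_nested_items(nested, 'ap'))
# ['apple', 'pineapple', 'grape', 'apricot']
```

Each nested list adds one level of recursion, and the results come back in the order the items appear.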
Generators
Next, let's look at a much cooler feature of Python: generators. Right now, we're collecting all the results in one go, so if we later want to iterate through those results, we need a second, separate loop.
Instead of doing that, we can define a generator. This works almost the same, practically speaking, but instead of collecting the results in one loop and then using them in a second, we can collect and use each result within a single loop. A generator "yields" a value, then pauses until the next value is requested. Let's modify our example to make it a generator:
mylist = [['a','b','c'],['d','e','f']]
def find_nested_items(my_list, my_input):
    for i in my_list:  # iterate over the parameter, not the global mylist
        if type(i) == list:
            yield from find_nested_items(i, my_input)
        elif my_input in i:
            yield i
You'll notice this version is a fair bit shorter. There's no need to hold items in a temporary list: each item is "yielded", which means it's passed directly to the caller to use immediately, and our generator stays paused until the caller asks for the next value.
yield from (available since Python 3.3) performs the same recursion: it delegates to the generator for the nested list, passing each of its items back up the chain to the caller.
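Because results are produced lazily, the caller can loop over them directly, collect them, or stop after the first one. A short usage sketch (Python 3.3+ for yield from), with the function iterating over its my_list parameter:

```python
def find_nested_items(my_list, my_input):
    for i in my_list:
        if type(i) == list:
            # Delegate to the generator for the nested list.
            yield from find_nested_items(i, my_input)
        elif my_input in i:
            yield i


mylist = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Use each match as soon as it is produced.
for match in find_nested_items(mylist, 'e'):
    print(match)

# Collect everything when a list really is needed.
all_matches = list(find_nested_items(mylist, 'b'))

# Take only the first match (or None); the rest of the list is never searched.
first = next(find_nested_items(mylist, 'd'), None)
```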
These are some good techniques to try - please give them a go!

Python list - string formatting as list indices

Depending on a condition, I need to get a value from one function or another. I'm trying to put it inside a simple if ... else statement. I tried to use %s string formatting, but it doesn't work. Here is the code, to make it clearer what I'm trying to do:
if condition:
    item = my_list['%s']
else:
    item = my_other_list['%s']
# now I do something with this value:
print item % 3
This way, I tried to print the 3rd value of one list or the other, depending on whether the condition was True or False. This returned an error complaining that the list indices were strings. So I tried wrapping it in int(), which didn't help.
How should I do it? The problem is that I only get the index value after I declare what item is.
EDIT
Here is some more information:
I have a for loop that goes through ~1000 elements and processes them. If the condition is True, it calls one function; if it is False, another. I don't want to check the same condition 1000 times, because I know it won't change during the loop; I would like to check it once and apply the chosen method to all of the elements.
More code:
if self.dlg.comboBox_3.currentIndex == 0:
    item = QCustomTableWidgetItem(str(round((sum(values['%s'])/len(values['%s'])),2)))
else:
    item = QCustomTableWidgetItem(str(round(sum(values['%s'],2))))
for row in range(len(groups)):
    group = QTableWidgetItem(str(groups[row]))
    qTable.setItem(row,0,group)
    qTable.setItem(row,1,item % row)
This is the actual code. Note the '%s' and '% row'. I used a simplified version before so as not to distract from the actual problem, but I think this is needed. I'm sorry if that wasn't a good decision.
You have a fairly large misconception about how list indexing works. Indexing always happens at the moment the expression is evaluated, so inside your if statement Python is already trying to index either list with the literal string "%s", which can't possibly work.
There is no need to do this. You can just pick the list in the if statement, and then index it directly:
if condition:
    list_to_slice = my_list
else:
    list_to_slice = my_other_list
# now I do something with this value:
print(list_to_slice[3])
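The same idea answers the edited question: check the condition once by choosing a function rather than a value, and apply the row index only inside the loop, where it is known. A minimal Python 3 sketch; average, total, use_average and the sample data are hypothetical stand-ins for the Qt code, and values is assumed to be a list of number lists indexed by row:

```python
def average(values):
    """Mean of a list of numbers, rounded to 2 decimal places."""
    return round(sum(values) / len(values), 2)


def total(values):
    """Sum of a list of numbers, rounded to 2 decimal places."""
    return round(sum(values), 2)


# Hypothetical stand-ins for the question's data: one label and one
# list of numbers per table row.
groups = ['group A', 'group B']
values = [[1, 2, 3, 4], [10, 20]]
use_average = True  # stands in for the combo-box check in the question

# Decide once, outside the loop, which function to use.
aggregate = average if use_average else total

for row in range(len(groups)):
    # The row index is applied here, where it is actually known.
    print(groups[row], aggregate(values[row]))
```

In the Qt code, the table item would then be created inside the loop from aggregate(...) rather than once before it.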
Short answer:
'%s' is a string, while a list index must be an integer.
Use int(string) only if you are sure the string represents an integer (if not, it will raise a ValueError).
A list is made up of multiple data values that are referenced by an index.
So if I define my list like so:
my_list = ['apples', 'orange', 'peaches']
If I want to reference something in the list, I do it like this:
print(my_list[0])
The expected output for this line of code would be "apples".
To add something new to a list, you use a built-in method of the list object, such as append():
my_list.append("foo")
The new list would then look like this:
['apples', 'orange', 'peaches', 'foo']
I hope this helps.
I'd suggest wrapping it in a function like this:
def get_item(index, list1, list2):
    # condition is assumed to be defined before this function is called
    if condition:
        return list1[index]
    else:
        return list2[index]
print(get_item(3, my_list, my_other_list))
Here is a compact way to do it:
source = my_list if condition else my_other_list
print(source[2])
This binds a variable source to either my_list or my_other_list depending on the condition. Then the 3rd element of the selected list is accessed using an integer index. This method has the advantage that source is still bound to the list should you need to access other elements in the list.
Another way, similar to yours, is to get the element directly:
index = 2
if condition:
    item = my_list[index]
else:
    item = my_other_list[index]
print(item)

finding first item in a list whose first item in a tuple is matched

I have a list of several thousand unordered tuples that are of the format
(mainValue, (value, value, value, value))
Given a main value (which may or may not be present), is there a 'nice' way, other than iterating through every item and incrementing a counter, to produce a list of the indexes of the matching tuples, like this:
index = 0
for destEntry in destList:
    if destEntry[0] == sourceMatch:
        destMatches.append(index)
    index = index + 1
So I can compare the sub values against another set, and remove the best match from the list if necessary.
This works fine, but just seems like python would have a better way!
Edit:
As mentioned in the question: while writing the original question, I realised that I could use a dictionary keyed on the first value instead (in fact this list is inside another dictionary), but I still wanted to know how to do it with the tuples as they are.
With a list comprehension, your for loop can be reduced to this expression:
destMatches = [i for i,destEntry in enumerate(destList) if destEntry[0] == sourceMatch]
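You can also use the built-in filter() function [1] to filter your data. Note that this returns the matching tuples themselves rather than their indexes:
destMatches = filter(lambda destEntry:destEntry[0] == sourceMatch, destList)
[1] In Python 3, filter is a class and returns a lazy filter object; wrap it in list() if you need a list.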
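Since the edit mentions a dictionary, here is a sketch of that route for the tuple list as it stands: one pass builds a dict from each mainValue to all of its indexes, after which each lookup is a single dict access. build_index and the sample data are hypothetical, and the dict must be rebuilt (or adjusted) after removing entries from the list, because later positions shift.

```python
from collections import defaultdict


def build_index(dest_list):
    """Map each mainValue to the list of positions where it appears."""
    index = defaultdict(list)
    for i, dest_entry in enumerate(dest_list):
        index[dest_entry[0]].append(i)
    return index


dest_list = [
    ('x', (1, 2, 3, 4)),
    ('y', (5, 6, 7, 8)),
    ('x', (9, 10, 11, 12)),
]
index = build_index(dest_list)  # one pass over the list
print(index.get('x', []))  # [0, 2]
print(index.get('z', []))  # [] because 'z' is not a mainValue
```

Using .get() rather than index['z'] avoids inserting an empty entry for values that are not present.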

Python - List not converting to Tuple in order to Sort

def mkEntry(file1):
    for line in file1:
        lst = (line.rstrip().split(","))
        print("Old", lst)
        print(type(lst))
        tuple(lst)
        print(type(lst)) #still showing type='list'
        sorted(lst, key=operator.itemgetter(1, 2))
def main():
    openFile = 'yob' + input("Enter the year <Do NOT include 'yob' or .'txt' : ") + '.txt'
    file1 = open(openFile)
    mkEntry(file1)
main()
TextFile:
Emma,F,20791
Tom,M,1658
Anthony,M,985
Lisa,F,88976
Ben,M,6989
Shelly,F,8975
and I get this output:
IndexError: string index out of range
I am trying to convert lst from a list to a tuple so that I will be able to sort F before M, and the smallest number to the largest. Around line 7, it's still printing type list instead of type tuple, and I don't know why.
print(type(lst))
tuple(lst)
print(type(lst)) #still showing type='list'
You're not changing what lst refers to. You create a new tuple with tuple(lst) and immediately throw it away because you don't assign it to anything. You can do:
lst = tuple(lst)
Note that this will not fix your program. Notice that your sort operation is happening once per line of your file, which is not what you want. Try collecting each line into one sequence of tuples and then doing the sort.
Firstly, you are not saving the tuple you created anywhere:
tup = tuple(lst)
Secondly, there is no point in making it a tuple before sorting it - in fact, a list could be sorted in place as it's mutable, while a tuple would need another copy (although that's fairly cheap, the items it contains aren't copied).
Thirdly, the IndexError has nothing to do with whether it's a list or tuple, nor whether it is sorted. It most likely comes from the itemgetter, because there's a list item that doesn't have three entries in turn - for instance, the strings "F" or "M".
Fourthly, the sort you're doing, but not saving anywhere, is done on each individual line, not the table of data. Considering this means you're comparing a name, a number, and a gender, I rather doubt it's what you intended.
It's completely unclear why you're trying to convert data types, and the code doesn't match the structure of the data. How about going back to the overall plan and working out what you want done? It may well be that something like Python's csv module would help considerably.
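Putting those points together, a sketch that reads the whole file into one list of tuples and sorts it once could look like this. It uses csv.reader and operator.itemgetter from the standard library; read_entries is a hypothetical helper, and io.StringIO stands in for the real yob file.

```python
import csv
import io
import operator


def read_entries(file_obj):
    """Parse 'name,gender,count' rows into (name, gender, count) tuples."""
    entries = []
    for row in csv.reader(file_obj):
        if not row:  # skip blank lines
            continue
        name, gender, count = row
        # Convert the count so it sorts numerically rather than as text.
        entries.append((name, gender, int(count)))
    return entries


sample = io.StringIO("Emma,F,20791\nTom,M,1658\nAnthony,M,985\n"
                     "Lisa,F,88976\nBen,M,6989\nShelly,F,8975\n")
entries = read_entries(sample)

# Sort the whole table once: gender first (F before M), then count ascending.
entries.sort(key=operator.itemgetter(1, 2))
for entry in entries:
    print(entry)
```

With a real file, you would call read_entries inside `with open(openFile, newline='') as file1:`. Here itemgetter(1, 2) is applied to whole rows, so it no longer indexes into single strings like "F".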

What is the fastest way to add data to a list without duplication in python (2.5)

I have about half a million items that need to be placed in a list. I can't have duplicates, and if an item is already there I need to get its index. So far I have:
if Item in List:
    ItemNumber=List.index(Item)
else:
    List.append(Item)
    ItemNumber=List.index(Item)
The problem is that as the list grows it gets progressively slower until at some point it just isn't worth doing. I am limited to python 2.5 because it is an embedded system.
You can use a set (built into CPython since version 2.4) to efficiently look up duplicate values. If you really need an indexed system as well, you can use both a set and a list.
Doing your lookups using a set will remove the overhead of if Item in List, but not that of List.index(Item).
Please note ItemNumber=List.index(Item) will be very inefficient to do after List.append(Item). You know the length of the list, so your index can be retrieved with ItemNumber = len(List)-1.
To completely remove the overhead of List.index (because that method will search through the list - very inefficient on larger sets), you can use a dict mapping Items back to their index.
I might rewrite it as follows:
# earlier in the program, NOT inside the loop
Dup = {}
# inside your loop to add items:
if Item in Dup:
    ItemNumber = Dup[Item]
else:
    List.append(Item)
    Dup[Item] = ItemNumber = len(List)-1
If you really need to keep the data in an array, I'd use a separate dictionary to keep track of duplicates. This requires twice as much memory, but won't slow down significantly.
existing = dict()
if Item in existing:
    ItemNumber = existing[Item]
else:
    ItemNumber = existing[Item] = len(List)
    List.append(Item)
However, if you don't need to preserve the order of items (or their indexes), you should just use a set instead. This will take less memory than keeping both a list and a dictionary, yet lookups will be as fast as with a dictionary.
Items = set()
# ...
Items.add(Item) # will do nothing if Item is already added
Both of these require that your object is hashable. In Python, most types are hashable unless they are a container whose contents can be modified. For example, lists are not hashable because you can modify their contents, but tuples are hashable because you cannot (as long as every item they contain is itself hashable).
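A quick way to check whether a value can go into a set or be used as a dict key is to call hash() on it. A small sketch (the helper name is_hashable is hypothetical); note the last case, where a tuple is unhashable because it contains a list:

```python
def is_hashable(value):
    """Return True if value can be used as a set member or dict key."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


print(is_hashable(('a', 1)))    # True: a tuple of hashable items
print(is_hashable(['a', 1]))    # False: lists are mutable
print(is_hashable(('a', [1])))  # False: the tuple contains a list
```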
If you were trying to store values that aren't hashable, there isn't a fast general solution.
You can improve the check a lot:
check = set(List)
for Item in NewList:
    if Item in check:
        ItemNumber = List.index(Item)
    else:
        ItemNumber = len(List)
        List.append(Item)
        check.add(Item)  # keep the set in sync with the list
Or, even better, if order is not important you can do this:
oldlist = set(List)
addlist = set(AddList)
newlist = list(oldlist | addlist)
And if you need to loop over the items that were duplicated:
for item in (oldlist & addlist):
    pass # do stuff
