Problems with ordering elements according to their occurrences - python

I have to create a function that takes a string and a list of strings as input and returns a list of the indices of the strings that contain that string. I have done that, but I should then sort the indices according to the number of occurrences of the string in each of those strings. How can I do that? This is my code:
I added the 'count' under 'if' to count the occurrences. How can I use it to sort the indices accordingly?

You can add a list of per-string counts to your function:
def function(s, lst):
    l = []
    counts = []
    for i in range(len(lst)):
        if s in lst[i]:
            counts += [lst[i].count(s)]
            l += [i]
    return l, counts
Here counts is a list in which each entry is the number of occurrences of s in the corresponding matching string of your input list. The function now returns two lists in a tuple: the first element is l and the second is counts. Note that i=-1 is redundant here, as i takes its values from range, and assigning a value to it before the loop doesn't change its value inside the loop.
You can now sort the first list based on the second list using a line modified from this post,
out_fun = function(s,inp)
out = [x for x,_ in sorted(zip(out_fun[0],out_fun[1]), key = lambda x: x[1], reverse=True)]
inp is the list of strings, for example inp = ["hello", "cure", "access code"]. out_fun is the return tuple of two lists from the function function. s is the string of interest - here as in your original example it is 'c'.
This line first creates a sequence of tuples using zip, where the first element of each tuple comes from the list of indices and the second from the list of occurrence counts. It then sorts the tuples by their second element in reverse order (largest first). The list comprehension keeps only the first element of each tuple in the sorted result, which gives the index list in the new order.
If you have questions about this solution, feel free to ask. You have a Python 2.7 tag; in Python 3.x zip returns a lazy zip object rather than a list, but sorted accepts any iterable, so the line above works unchanged.
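To see what each stage of that line produces, here is a small trace using the example input from this answer. The variable names pairs and ranked are only illustrative, and the two input lists are written out by hand rather than computed.

```python
inp = ["hello", "cure", "access code"]

indices = [1, 2]  # positions of the strings that contain "c"
counts = [1, 3]   # "c" occurs once in "cure" and three times in "access code"

# zip pairs each index with its count.
pairs = list(zip(indices, counts))  # [(1, 1), (2, 3)]

# Sort the pairs by count, largest first.
ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)  # [(2, 3), (1, 1)]

# Keep only the index from each pair.
out = [index for index, _ in ranked]
print(out)  # [2, 1]
```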
This is a more concise version of your program:
def function(s, lst):
    t = [(i, x.count(s)) for i, x in enumerate(lst) if s in x]
    return t
It uses a list comprehension to create and return a list of tuples t, where the first element of each tuple is the index of a string that contains s and the second is the count. This is not necessarily more efficient; that would need to be checked. But it's a clean one-liner that, at least to me, is more readable.
The list of tuples can then be sorted in a similar way to the previous program, based on the second tuple element, i.e. the count:
out_fun = function(s,inp)
out = [x for x,_ in sorted(out_fun, key = lambda x: x[1], reverse=True)]
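If you prefer a single call, the two steps can be folded into one function. This is only a sketch; the name indices_by_count is made up for illustration and is not from the question.

```python
def indices_by_count(s, lst):
    """Indices of the strings in lst that contain s, most occurrences first."""
    # Build (index, count) pairs for the strings that contain s.
    pairs = [(i, text.count(s)) for i, text in enumerate(lst) if s in text]
    # Sort by count, largest first, then drop the counts.
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in ranked]


inp = ["hello", "cure", "access code"]
print(indices_by_count("c", inp))  # [2, 1]
```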

Related

Python: replace values of sublist, with values looked up from another sublist without indexing

Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset is too large to do this manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
Each row of mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]), and these are associated with two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a third list, newlist, which copies mainlist but replaces the identifiers with those from replacelist[x][2], replacelist[x][3].
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note that newlist keeps the same row order as mainlist.
Attempt
What has me totally stumped on a simple problem is I feel I can't use basic list comprehension [i for i in replacelist if i in mainlist] as the order within a pair changes, and if I sorted(list) then I lose information about what to replace the lists with. Current solution (with commented blanks):
newlist = []
for k in replacelist:
    for i in mainlist:
        if k[0] and k[1] in i:
            # retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It'll help if you had replacelist as a dict:
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
# Map each unordered identifier pair to a dict of old identifier -> new identifier.
replacements = {frozenset(r[:2]): dict(zip(r[:2], r[2:])) for r in replacelist}
newlist = []
for *ids, val1, val2 in mainlist:  # starred unpacking requires Python 3
    reps = replacements[frozenset(ids)]
    newlist.append([reps[ids[0]], reps[ids[1]], val1, val2])
First, transform both lists into dictionaries:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]), item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]): item[2:] for item in replacelist}
# Now it is trivial to create a list with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The key point is that a dictionary removes the cost of searching the replacement list for each row. With a list you have to scan the whole list for every item, so the running time grows with the square of the list length. With Python dictionaries, a lookup takes constant time on average and does not depend on the amount of data.
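To make the comparison concrete, here is a rough sketch that builds the same result both ways on a three-row subset of the question's data. The function names replace_by_scan and replace_by_dict are hypothetical, and both versions emit the replacement identifiers in replacelist column order, which is what the question's expected output for row 2 shows.

```python
mainlist = [["MH75", "QF12", 0, 38],
            ["JQ59", "QR21", 105, 191],
            ["JQ61", "SQ48", 186, 284]]
replacelist = [["MH75", "QF12", "BA89", "QR29"],
               ["QR21", "JQ59", "VA51", "MH52"],
               ["SQ48", "JQ61", "QF87", "QF63"]]


def replace_by_scan(main, repl):
    """Quadratic version: scan repl again for every row of main."""
    out = []
    for row in main:
        for cand in repl:
            # Compare the identifier pairs as sets so their order is ignored.
            if set(cand[:2]) == set(row[:2]):
                out.append(cand[2:] + row[2:])
                break
    return out


def replace_by_dict(main, repl):
    """Linear version: build the lookup once, then one lookup per row."""
    lookup = {frozenset(cand[:2]): cand[2:] for cand in repl}
    return [lookup[frozenset(row[:2])] + row[2:] for row in main]


newlist = replace_by_dict(mainlist, replacelist)
print(newlist[2])  # ['QF87', 'QF63', 186, 284]
```

Both functions return the same list; only the amount of work per row differs.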

Put average of nested list values into new list

I have the following list:
x = [(27.3703703703704, 2.5679012345679, 5.67901234567901,
6.97530864197531, 1.90123456790123, 0.740740740740741,
0.440136054421769, 0.867718446601942),
(25.2608695652174, 1.73913043478261, 6.07246376811594,
7.3768115942029, 1.57971014492754, 0.710144927536232,
0.4875, 0.710227272727273)]
I'm looking for a way to get the average of each of the lists nested within the main list, and create a new list of the averages. So in the case of the above list, the output would be something like:
[[26.315],[2.145],[5.87],etc...]
I would like to apply this formula regardless of the amount of lists nested within the main list.
I assume you have a list of tuples of one-element lists and want the average of the unpacked elements of each tuple, collected in a new list. If that's not what you're looking for, this won't work.
result = [sum([sublst[0] for sublst in tup])/len(tup) for tup in x]
EDIT to match changed question
result = [sum(tup)/len(tup) for tup in x]
EDIT to match your even-further changed question
result = [[sum(tup)/len(tup)] for tup in x]
An easy way to achieve this is:
means = []  # define a new empty list
for sublist in x:  # iterate over the tuples in your list
    means.append([sum(sublist)/len(sublist)])  # put the mean of the sublist in the means list
This will work no matter how many sublists are in your list.
I would advise you read a bit on list comprehensions:
https://docs.python.org/2/tutorial/datastructures.html
It looks like you're looking for the zip function:
[sum(l)/len(l) for l in zip(*x)]
zip groups the elements of the tuples or lists by position, which looks like what you want for your averages. Then you just use sum()/len() to compute the average of each group.
*x notation means pass the list as though it were individual arguments, i.e. as if you called: zip(x[0], x[1], ..., x[len(x)-1])
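The answers in this thread read the question in two different ways, so a small comparison may help. The numbers below are made up for illustration and are not the question's data: averaging each tuple gives one mean per row, while zip(*x) gives one mean per position across the tuples.

```python
x = [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0)]  # illustrative data

# One mean per tuple (row-wise).
row_means = [sum(t) / len(t) for t in x]

# One mean per position across all tuples (column-wise), via zip(*x).
col_means = [sum(col) / len(col) for col in zip(*x)]

print(row_means)  # [2.0, 4.0]
print(col_means)  # [2.0, 3.0, 4.0]
```

The question's sample output (26.315 is the mean of 27.37... and 25.26...) corresponds to the column-wise reading.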
r = [[sum(i)/len(i)] for i in x]

How do I work out how many lists are in a list?

So if list:
a = [[1],[2],[3],[4],[5]]
How can I get Python to return the number of lists in list 'a'?
The type of the items in a list doesn't matter; the number of items in the list is len(a).
If there can be items other than lists in a and you want to find out how many lists there are, as opposed to the other kinds of items, try:
sum(isinstance(item, list) for item in a)
You can use the len function to count the elements in your list object.
If the list holds elements of several types, you can select the list-type elements using the built-in filter function. In Python 3, filter returns an iterator, so wrap it in list before calling len.
>>> a = [[1],[2],[3],[4],[5]]
>>> len(list(filter(lambda x: isinstance(x, list), a)))
5
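The difference between the two counts only shows up when the outer list is mixed, so here is a short sketch with an illustrative list that is not from the question.

```python
a = [[1], "two", [3], 4, [5]]  # illustrative mixed list

# Total number of items, whatever their type.
total_items = len(a)

# isinstance returns True or False; True counts as 1 when summed.
list_items = sum(isinstance(item, list) for item in a)

print(total_items, list_items)  # 5 3
```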

How do I remove partial duplicates from an unsorted list in Python?

I have a large list
[[1,.., ..],[2,...,...],[5,...,...],[1,...,...]]
I need to remove all elements that have the same first value, keeping only one of them.
How can I do it most efficiently?
Keep a set of the first values seen so far, and only keep sublists if their first value isn't in the set.
Because set.add always returns None, keys.add(sublist[0]) or sublist is the same as None or sublist which is the same as sublist, so it doesn't affect what gets kept in the list, while allowing you to add values to the set inside a list comprehension.
keys = set()
biglist = [keys.add(sublist[0]) or sublist
           for sublist in biglist
           if sublist[0] not in keys]
del keys  # if you don't need it any more
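If the keys.add(...) or sublist trick feels too dense, the same idea can be written as a plain loop. This is a sketch with illustrative data; the names seen and unique are not from the answer.

```python
biglist = [[1, "a"], [2, "b"], [5, "c"], [1, "d"]]  # illustrative data

seen = set()
unique = []
for sublist in biglist:
    key = sublist[0]
    if key not in seen:  # first time this first value appears
        seen.add(key)
        unique.append(sublist)

print(unique)  # [[1, 'a'], [2, 'b'], [5, 'c']]
```

Like the comprehension, this keeps the first sublist for each first value and preserves the original order.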
If the order of the list does not matter, you can try this:
dict([(sublist[0], sublist) for sublist in biglist]).values()
or
dict([(sublist[0], sublist) for sublist in reversed(biglist)]).values()
The difference is that the second one keeps the first sublist found for each first value, while the first one keeps the last.

Python: How can I get all the elements in a list before the longest element?

I have a list, e.g.
l = ['abc34','def987','ghij','klmno','pqrstuvwxyz1234567','98765','43','210abc']
How can I get all the elements in the list before the occurrence of the longest element and not the ones that come after?
This is one way:
l = ['abc34','def987','ghij','klmno','pqrstuvwxyz1234567','98765','43','210abc']
new_list = l[:l.index(max(l, key=len))]
This works:
lst = ['abc34','def987','ghij','klmno','pqrstuvwxyz1234567','98765','43','210abc']
idx, maxLenStr = max(enumerate(lst), key=lambda x:len(x[1]))
sublist = lst[:idx]
It only iterates through the list once for determining the maximum length, whereas using max() and then index() iterates twice over all the elements. It also stores the string with the maximum length in maxLenStr and the index where it was found in idx, just in case.
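Either approach can be wrapped in a small helper that also copes with an empty list, which max would otherwise reject with a ValueError. This is a sketch; before_longest is a made-up name.

```python
def before_longest(items):
    """Return the items that come before the first longest element."""
    if not items:
        return []
    # max over the indices returns the first index whose element is longest.
    idx = max(range(len(items)), key=lambda i: len(items[i]))
    return items[:idx]


lst = ['abc34', 'def987', 'ghij', 'klmno', 'pqrstuvwxyz1234567', '98765', '43', '210abc']
print(before_longest(lst))  # ['abc34', 'def987', 'ghij', 'klmno']
```

When several elements share the maximum length, max returns the first one, so the slice stops at the earliest of them.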
