Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset is too large to do this manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]), and these are associated with two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a third list, newlist, which copies mainlist but replaces the identifiers with those from replacelist[x][2], replacelist[x][3].
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note that newlist keeps the same row order as mainlist.
Attempt
What has me stumped on this seemingly simple problem is that I can't use a basic list comprehension such as [i for i in replacelist if i in mainlist], because the order within a pair changes, and if I use sorted(list) I lose the information about which replacement goes with which identifier. Current solution (with commented blanks):
newlist = []
for k in replacelist:
    for i in mainlist:
        if k[0] in i and k[1] in i:
            # retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
            pass
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It helps to have replacelist as a dict:
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
# Map each unordered pair of identifiers to a {old_id: new_id} dict
replacements = {frozenset(r[:2]): dict(zip(r[:2], r[2:])) for r in replacelist}
newlist = []
for *ids, val1, val2 in mainlist:
    reps = replacements[frozenset(ids)]
    newlist.append([reps[ids[0]], reps[ids[1]], val1, val2])
The first thing to do is transform both lists into dictionaries:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]),item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]):item[2:] for item in replacelist}
# Now it is trivial to create a list with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The key point is that a dictionary cuts down the search time on the replacement list. With a list you have to scan the whole list for each item, so the running time grows with the square of the list length. With Python dictionaries, lookup time is constant on average and does not depend on the data length.
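As a minimal, self-contained sketch of the lookup described above, the following uses three rows of the question's sample data. The name lookup is illustrative and not from the original answers. It iterates over mainlist directly, so row order is preserved without an OrderedDict, and the replacement identifiers are emitted in the order they appear in replacelist, which matches the expected newlist[2] shown in the question.

```python
# Three sample rows taken from the question.
mainlist = [["MH75", "QF12", 0, 38],
            ["JQ59", "QR21", 105, 191],
            ["JQ61", "SQ48", 186, 284]]
replacelist = [["MH75", "QF12", "BA89", "QR29"],
               ["QR21", "JQ59", "VA51", "MH52"],
               ["SQ48", "JQ61", "QF87", "QF63"]]

# Key each replacement by the unordered pair, so ("JQ61", "SQ48") and
# ("SQ48", "JQ61") hit the same dictionary entry.
# "lookup" is an illustrative name, not part of the original answers.
lookup = {frozenset(row[:2]): row[2:] for row in replacelist}

# One dictionary lookup per row of mainlist instead of a scan of replacelist.
newlist = [lookup[frozenset(row[:2])] + row[2:] for row in mainlist]

print(newlist[2])  # ['QF87', 'QF63', 186, 284]
```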
I have the following list:
x = [(27.3703703703704, 2.5679012345679, 5.67901234567901,
6.97530864197531, 1.90123456790123, 0.740740740740741,
0.440136054421769, 0.867718446601942),
(25.2608695652174, 1.73913043478261, 6.07246376811594,
7.3768115942029, 1.57971014492754, 0.710144927536232,
0.4875, 0.710227272727273)]
I'm looking for a way to get the average of each of the lists nested within the main list, and create a new list of the averages. So in the case of the above list, the output would be something like:
[[26.315],[2.145],[5.87],etc...]
I would like to apply this formula regardless of the number of lists nested within the main list.
I assume your list of tuples of one-element lists is looking for the sum of each unpacked element inside the tuple, and a list of those options. If that's not what you're looking for, this won't work.
result = [sum([sublst[0] for sublst in tup])/len(tup) for tup in x]
EDIT to match changed question
result = [sum(tup)/len(tup) for tup in x]
EDIT to match your even-further changed question
result = [[sum(tup)/len(tup)] for tup in x]
An easy way to achieve this is:
means = []  # Defines a new empty list
for sublist in x:  # Iterates over the tuples in your list
    means.append([sum(sublist)/len(sublist)])  # Puts the mean of the sublist in the means list
This will work no matter how many sublists are in your list.
I would advise you read a bit on list comprehensions:
https://docs.python.org/2/tutorial/datastructures.html
It looks like you're looking for the zip function:
[sum(l)/len(l) for l in zip(*x)]
zip groups the elements at the same position across a collection of tuples or lists, which looks like what you want for your averages. Then you just use sum()/len() to compute the average of each group.
*x notation means pass the list as though it were individual arguments, i.e. as if you called: zip(x[0], x[1], ..., x[len(x)-1])
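To make the difference between the two readings of the question concrete, here is a small sketch with made-up data (the values and the names row_means and col_means are illustrative). Averaging each tuple gives one mean per row, while averaging over zip(*x) gives one mean per position.

```python
# Illustrative data: two tuples of three floats each.
x = [(1.0, 2.0, 3.0),
     (3.0, 4.0, 5.0)]

# Mean of each tuple (row-wise).
row_means = [sum(t) / len(t) for t in x]

# Mean of each position across all tuples (column-wise), via zip(*x).
col_means = [sum(c) / len(c) for c in zip(*x)]

print(row_means)  # [2.0, 4.0]
print(col_means)  # [2.0, 3.0, 4.0]
```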
r = [[sum(i)/len(i)] for i in x]
So if list:
a = [[1],[2],[3],[4],[5]]
How can I get Python to return the number of lists in list 'a'?
The type of the items in a list doesn't matter; the number of items in the list is len(a).
If there can be items other than lists in a and you want to find out how many lists there are, as opposed to the other kinds of items, try:
sum(isinstance(item, list) for item in a)
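A short sketch of the two counts on a mixed list, with made-up contents. It relies on True counting as 1 when summed.

```python
# Illustrative list mixing lists with other kinds of items.
a = [[1], 2, "three", [4, 5], (6,)]

# Number of items of any type.
total_items = len(a)

# Number of items that are lists; each True from isinstance adds 1.
list_items = sum(isinstance(item, list) for item in a)

print(total_items)  # 5
print(list_items)   # 2
```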
You can use the len function to count the elements in your list object.
If the list contains elements of several types, you can keep only the list-type elements with the built-in filter function.
>>> a = [[1],[2],[3],[4],[5]]
>>> len(list(filter(lambda x: isinstance(x, list), a)))
5
I have a large list
[[1,.., ..],[2,...,...],[5,...,...],[1,...,...]]
I need to remove all elements that share a first value with an earlier element (keeping only one per first value).
How to do it most efficiently?
Keep a set of the first values seen so far, and only keep sublists if their first value isn't in the set.
Because set.add always returns None, keys.add(sublist[0]) or sublist is the same as None or sublist which is the same as sublist, so it doesn't affect what gets kept in the list, while allowing you to add values to the set inside a list comprehension.
keys = set()
biglist = [keys.add(sublist[0]) or sublist
           for sublist in biglist
           if sublist[0] not in keys]
del keys # if you don't need it any more
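If the `or` trick inside the comprehension feels too clever, the same idea can be written as an explicit loop. This is a sketch with a hypothetical helper name, unique_by_first, and made-up data. It keeps the first sublist seen for each first value and preserves the original order.

```python
def unique_by_first(rows):
    """Keep only the first row seen for each distinct first value."""
    seen = set()   # first values encountered so far
    kept = []
    for row in rows:
        key = row[0]
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Illustrative data: the first value 1 appears twice.
biglist = [[1, "a"], [2, "b"], [5, "c"], [1, "d"]]
print(unique_by_first(biglist))  # [[1, 'a'], [2, 'b'], [5, 'c']]
```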
If the sequence of the list does not matter, you can try this:
dict([(sublist[0], sublist) for sublist in biglist]).values()
or
dict([(sublist[0], sublist) for sublist in reversed(biglist)]).values()
The difference is that the second one keeps the first sublist for each first value, while the first one keeps the last.
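A small sketch with made-up data shows which sublist survives in each variant. The names keep_last and keep_first are illustrative. A later entry with the same key overwrites the earlier one, so iterating in reverse leaves the earliest sublist in place.

```python
# Illustrative data: the first value 1 appears twice.
biglist = [[1, "a"], [2, "b"], [5, "c"], [1, "d"]]

# Forward pass: the later [1, 'd'] overwrites [1, 'a'].
keep_last = dict((row[0], row) for row in biglist)

# Reverse pass: [1, 'a'] is inserted last, so it overwrites [1, 'd'].
keep_first = dict((row[0], row) for row in reversed(biglist))

print(keep_last[1])   # [1, 'd']
print(keep_first[1])  # [1, 'a']
```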
I have a list, e.g.
l = ['abc34','def987','ghij','klmno','pqrstuvwxyz1234567','98765','43','210abc']
How can I get all the elements in the list before the occurrence of the longest element, and not the ones that come after?
This is one way:
l = ['abc34','def987','ghij','klmno','pqrstuvwxyz1234567','98765','43','210abc']
new_list = l[:l.index(max(l, key=len))]
This works:
lst = ['abc34','def987','ghij','klmno','pqrstuvwxyz1234567','98765','43','210abc']
idx, maxLenStr = max(enumerate(lst), key=lambda x:len(x[1]))
sublist = lst[:idx]
It only iterates through the list once for determining the maximum length, whereas using max() and then index() iterates twice over all the elements. It also stores the string with the maximum length in maxLenStr and the index where it was found in idx, just in case.
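The single-pass version can be wrapped in a small function. This is a sketch with a hypothetical name, before_longest, that also guards against an empty list, a case the answers above do not cover. When several elements share the maximum length, max returns the first one it encounters, so the slice stops at the first longest element.

```python
def before_longest(items):
    """Return the items that precede the first longest element."""
    if not items:
        return []  # max() would raise ValueError on an empty sequence
    # One pass over (index, item) pairs, comparing by item length.
    idx, _ = max(enumerate(items), key=lambda pair: len(pair[1]))
    return items[:idx]

lst = ['abc34', 'def987', 'ghij', 'klmno', 'pqrstuvwxyz1234567', '98765', '43', '210abc']
print(before_longest(lst))  # ['abc34', 'def987', 'ghij', 'klmno']
```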