How can I extract elements from a list of lists and create another one in Python? I want to get from this:
all_list = [['1 2 3 4','2 3 4 5'],['2 4 4 5', '3 4 5 5' ]]
a new list like this:
list_of_lists = [[('3','4'),('4','5')], [('4','5'),('5','5')]]
Following is what I did, and it doesn't work.
for i in xrange(len(all_lists)):
    newlist=[]
    for l in all_lists[i]:
        mylist = l.split()
        score1 = float(mylist[2])
        score2 = mylist[3]
        temp_list = (score1, score2)
        newlist.append(temp_list)
    list_of_lists.append(newlist)
Please help. Many thanks in advance.
You could use a nested list comprehension. (This assumes you want the last two "scores" out of each string):
[[tuple(s.split()[-2:]) for s in sublist] for sublist in all_list]
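As a quick check, here is that comprehension run against the data from the question. This is only a sketch; the loop variable names `s` and `sublist` are my own choice and not from the original post.

```python
all_list = [['1 2 3 4', '2 3 4 5'], ['2 4 4 5', '3 4 5 5']]

# For each inner list, split every string on whitespace and keep
# the last two fields as a tuple of strings.
list_of_lists = [[tuple(s.split()[-2:]) for s in sublist] for sublist in all_list]

print(list_of_lists)
# [[('3', '4'), ('4', '5')], [('4', '5'), ('5', '5')]]
```

The values stay strings here, which matches the output asked for in the question.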
It could work almost as-is if you filled in the value for mylist -- right now it's undefined.
Hint: use the split function on the strings to break them up into their components, and you can fill mylist with the result.
Hint 2: Make sure that newlist is set back to an empty list at some point.
Adding to eruciform's answer.
First remark: you don't need to generate the indices for all_list. You can just iterate over it directly:
for sublist in all_list:
    for item in sublist:
        # magic stuff
Second remark: you can make your string splitting much more succinct by slicing the list:
values = item.split()[-2:] # select last two numbers.
Reducing it further with map or a list comprehension, you can convert all the items to float on the fly:
# make a list from the last two elements of the split list.
values = [float(n) for n in item.split()[-2:]]
And tuplify the resulting list with the tuple built-in:
values = tuple([float(n) for n in item.split()[-2:]])
In the end you can collapse it all to one big list comprehension as sdolan shows.
Of course you can manually index into the results as well and create a tuple, but usually it's more verbose, and harder to change.
I took some liberties with your variable names: values would be temp_list in your example.
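Putting the remarks above together, a minimal sketch of the full loop could look like this. It assumes you want floats (as in the later steps of this answer) rather than the strings shown in the question, and the names `sublist` and `newlist` are illustrative.

```python
all_list = [['1 2 3 4', '2 3 4 5'], ['2 4 4 5', '3 4 5 5']]

list_of_lists = []
for sublist in all_list:
    newlist = []  # start a fresh list for every inner list
    for item in sublist:
        # last two whitespace-separated fields, converted to float
        values = tuple(float(n) for n in item.split()[-2:])
        newlist.append(values)
    list_of_lists.append(newlist)

print(list_of_lists)
# [[(3.0, 4.0), (4.0, 5.0)], [(4.0, 5.0), (5.0, 5.0)]]
```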
I'm trying to convert elements in a nested list to strings before joining them with " | " using the join operator. However, the order of the last 2 elements in my list keeps getting reversed when I get the output from the generator
I've tried running the program multiple times but it always comes out reversed.
numbers = [[1,2,3],[4,5,6],[7,8,9]]
for i in numbers:
    print(i)
    num = {str(x) for x in i}
    print(num)
Expected Output is [1,2,3].
Actual output is [1,3,2] consistently.
Any help with this would be much appreciated! :)
This isn't a generator, it's a set comprehension. The result is a set, and sets are unordered.
Use a list comprehension:
num = [str(x) for x in i]
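For completeness, a small sketch of the whole task from the question (convert to strings, then join with " | "). The `rows` list is my own addition, there only to collect the results.

```python
numbers = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

rows = []
for row in numbers:
    num = [str(x) for x in row]   # a list keeps the original order
    rows.append(" | ".join(num))

print(rows)
# ['1 | 2 | 3', '4 | 5 | 6', '7 | 8 | 9']
```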
I want to know if you have a list of strings such as:
l = ['ACGAAAG', 'CAGAAGC', 'ACCTGTT']
How do you convert it to:
O = 'ACGAAAG'
P = 'CAGAAGC'
Q = 'ACCTGTT'
Can you do this without knowing the number of items in a list? You have to store them as variables.
(The variables don't matter.)
Welcome to SE!
Structure Known
If you know the length of the list, you can simply unpack it:
O, P, Q = l
Structure Unknown
Unpack your list using a for loop. Do your work on each string inside the loop. For the below, I am simply printing each one:
for element in l:
    print(element)
Good luck!
If you don't know the number of items beforehand, a list is the right structure to keep the items in.
You can, though, cut off the first few known items, and leave the unknown tail as a list:
a, b, *rest = ["ay", "bee", "see", "what", "remains"]
print("%r, %r, rest is %r" % (a, b, rest))
a,b,c = my_list
This works as long as the number of elements in the list equals the number of variables you unpack into. It actually works with any iterable: tuple, list, set, etc.
If the list is longer, you can always access the first 3 elements, if that is what you want:
a = my_list[0]
b = my_list[1]
c = my_list[2]
or in one line
a, b, c = my_list[0], my_list[1], my_list[2]
Even better, with slice notation you can get a sublist of the right size containing the first 3 elements:
a, b, c = my_list[:3]
those would work as long as the list is at least of size 3, or the numbers of variables you want
you can also use the extended unpack notation
a, b, c, *the_rest = my_list
the_rest will be a list with everything in the list other than the first 3 elements, and again the list needs to be of size 3 or more.
That pretty much covers all the ways to extract a certain number of items.
Now, depending on what you are going to do with them, you may be better off with a regular loop:
for item in my_list:
    # do something with the current item, like printing it
    print(item)
In each iteration, item takes the value of one element of the list, so you can process one item at a time.
If you want to take 3 items at a time in each iteration, there are several ways to do it.
For example:
for i in range(3, len(my_list) + 1, 3):
    a, b, c = my_list[i-3:i]
    print(a, b, c)
There are more fun constructs, like:
it = [iter(my_list)]*3
for a, b, c in zip(*it):
    print(a, b, c)
and others in the itertools module.
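As one possible itertools-based variant, here is a sketch built on `itertools.islice`. The helper name `chunks` is hypothetical (it is not a standard library function). Unlike the zip trick, it also yields a shorter final chunk instead of dropping leftover items.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of up to `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:  # iterator exhausted
            return
        yield chunk

my_list = list(range(1, 9))        # [1, 2, ..., 8]
result = list(chunks(my_list, 3))
print(result)
# [[1, 2, 3], [4, 5, 6], [7, 8]]
```

Because the last chunk may be shorter, unpack into `a, b, c` only when you know the length is a multiple of 3.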
But you said something interesting: "so that every term is assigned to a variable". That is the wrong approach. You don't want an unknown number of variables running around; that gets messy very fast. Work with the list instead. If you want to do some work with each element, there are plenty of ways to do it, such as a list comprehension:
my_new_list = [ some_fun(x) for x in my_list ]
or the old way:
my_new_list = []
for x in my_list:
    my_new_list.append( some_fun(x) )
or, if you need to work with more than 1 item at a time, combine that with some of the above.
I do not know if your use case requires the strings to be stored in different variables. It usually is a bad idea.
But if you do need it, you can use the exec builtin, which takes the string representation of a Python statement and executes it.
list_of_strings = ['ACGAAAG', 'CAGAAGC', 'ACCTGTT']
# Dynamically generate variable names equivalent to the column names in an Excel sheet (A, B, C, ..., Z, AA, AB, ..., AAA, ...)
variable_names = ['A', 'B', 'C']  # in this specific case
for vn, st in zip(variable_names, list_of_strings):
    exec('{} = "{}"'.format(vn, st))
Test it out: when this runs at module level, print(A, B, C) will output the three strings, and you can use A, B and C as variables in the rest of the program.
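If separate variables are not strictly required, a dictionary keyed by generated names avoids exec altogether. The sketch below is only one way to do it: `column_names` is a hypothetical helper that produces Excel-style names, and `named` is an illustrative variable name.

```python
from itertools import count, islice, product
from string import ascii_uppercase

def column_names():
    """Yield A, B, ..., Z, AA, AB, ... indefinitely (hypothetical helper)."""
    for width in count(1):
        for letters in product(ascii_uppercase, repeat=width):
            yield "".join(letters)

list_of_strings = ['ACGAAAG', 'CAGAAGC', 'ACCTGTT']

# zip stops at the shorter input, so the infinite generator is safe here.
named = dict(zip(column_names(), list_of_strings))
print(named)
# {'A': 'ACGAAAG', 'B': 'CAGAAGC', 'C': 'ACCTGTT'}

print(list(islice(column_names(), 25, 28)))
# ['Z', 'AA', 'AB']
```

You then look values up with `named['A']` instead of a bare `A`, and the code works for any number of strings.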
Description
I have two lists of lists which are derived from CSVs (minimal working example below). The real dataset is too large to do this manually.
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
mainlist contains a pair of identifiers (mainlist[x][0], mainlist[x][1]), and these are associated with two integers (mainlist[x][2] and mainlist[x][3]).
replacelist is a second list of lists which also contains the same pairs of identifiers (but not in the same order within a pair, or across rows). All sublist pairs are unique. Importantly, replacelist[x][2],replacelist[x][3] corresponds to a replacement for replacelist[x][0],replacelist[x][1], respectively.
I need to create a new third list, newlist which copies mainlist but replaces the identifiers with those from replacelist[x][2],replacelist[x][3]
For example, given:
mainlist[2] is: [JQ61,SQ48,186,284]
The matching pair in replacelist is
replacelist[4]: [SQ48,JQ61,QF87,QF63]
Therefore the expected output is
newlist[2] = [QF87,QF63,186,284]
More clearly put:
if replacelist = [[A, B, C, D]]
A is replaced with C, and B is replaced with D.
but it may appear in mainlist as [[B, A]]
Note newlist row position uses the same as mainlist
Attempt
What has me totally stumped on a simple problem is that I feel I can't use a basic list comprehension such as [i for i in replacelist if i in mainlist], because the order within a pair changes, and if I use sorted(list) I lose the information about what to replace the identifiers with. Current solution (with commented blanks):
newlist = []
for k in replacelist:
    for i in mainlist:
        if k[0] and k[1] in i:
            # retrieve mainlist order, then use some kind of indexing to check a series of nested if statements to work out positional replacement.
As you can see, this solution is clearly inefficient and I can't work out the best way to perform the final step in a few lines.
I can add more information if this is not clear
It'll help if you turn replacelist into a dict:
mainlist = [["MH75","QF12",0,38], ["JQ59","QR21",105,191], ["JQ61","SQ48",186,284], ["SQ84","QF36",0,123], ["GA55","VA63",80,245], ["MH98","CX12",171,263]]
replacelist = [["MH75","QF12","BA89","QR29"], ["QR21","JQ59","VA51","MH52"], ["GA55","VA63","MH19","CX84"], ["SQ84","QF36","SQ08","JQ65"], ["SQ48","JQ61","QF87","QF63"], ["MH98","CX12","GA34","GA60"]]
# key: the unordered pair of identifiers; value: a mapping from each old identifier to its replacement
replacements = {frozenset(r[:2]): dict(zip(r[:2], r[2:])) for r in replacelist}
newlist = []
for *ids, val1, val2 in mainlist:
    reps = replacements[frozenset(ids)]
    newlist.append([reps[ids[0]], reps[ids[1]], val1, val2])
The first thing to do is transform both lists into dictionaries:
from collections import OrderedDict
maindct = OrderedDict((frozenset(item[:2]), item[2:]) for item in mainlist)
replacedct = {frozenset(item[:2]): item[2:] for item in replacelist}
# Now it is trivial to create a list with the desired output:
output_list = [replacedct[key] + maindct[key] for key in maindct]
The big deal here is that a dictionary eliminates the search for matching rows in the replacement list. With a list you have to scan the whole list for each item, so the running time grows with the square of the list length. With Python dictionaries, lookup time is constant on average and does not depend on the data length.
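A self-contained sketch of the frozenset-key idea, on a three-row subset of the question's data. It assumes the replacement identifiers should be emitted in the order they appear in replacelist, which is what the question's example for newlist[2] shows.

```python
mainlist = [["MH75", "QF12", 0, 38],
            ["JQ59", "QR21", 105, 191],
            ["JQ61", "SQ48", 186, 284]]
replacelist = [["MH75", "QF12", "BA89", "QR29"],
               ["QR21", "JQ59", "VA51", "MH52"],
               ["SQ48", "JQ61", "QF87", "QF63"]]

# A frozenset is hashable and ignores order, so ("JQ61", "SQ48") and
# ("SQ48", "JQ61") map to the same dictionary key.
replacedct = {frozenset(row[:2]): row[2:] for row in replacelist}

# Walk mainlist in its own order; each lookup is a single dict access.
newlist = [replacedct[frozenset(row[:2])] + row[2:] for row in mainlist]

print(newlist[2])
# ['QF87', 'QF63', 186, 284]
```

If each identifier must instead be swapped for its own counterpart regardless of position, store a small old-to-new mapping per pair rather than the two-element list.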
I have the following list:
x = [(27.3703703703704, 2.5679012345679, 5.67901234567901,
6.97530864197531, 1.90123456790123, 0.740740740740741,
0.440136054421769, 0.867718446601942),
(25.2608695652174, 1.73913043478261, 6.07246376811594,
7.3768115942029, 1.57971014492754, 0.710144927536232,
0.4875, 0.710227272727273)]
I'm looking for a way to get the average of each of the lists nested within the main list, and create a new list of the averages. So in the case of the above list, the output would be something like:
[[26.315],[2.145],[5.87],etc...]
I would like to apply this formula regardless of the number of lists nested within the main list.
I assume you have a list of tuples of one-element lists and want the mean of the unpacked elements of each tuple, collected into a list. If that's not what you're looking for, this won't work.
result = [sum([sublst[0] for sublst in tup])/len(tup) for tup in x]
EDIT to match changed question
result = [sum(tup)/len(tup) for tup in x]
EDIT to match your even-further changed question
result = [[sum(tup)/len(tup)] for tup in x]
An easy way to achieve this is:
means = [] # Defines a new empty list
for sublist in x: # iterates over the tuples in your list
    means.append([sum(sublist)/len(sublist)]) # Put the mean of the sublist in the means list
This will work no matter how many sublists are in your list.
I would advise you read a bit on list comprehensions:
https://docs.python.org/2/tutorial/datastructures.html
It looks like you're looking for the zip function:
[sum(l)/len(l) for l in zip(*x)]
zip groups the elements of several tuples or lists by position, which looks like what you want for your averages. Then you just use sum()/len() to compute the average of each group.
The *x notation passes the items of the list as individual arguments, i.e. as if you had called zip(x[0], x[1], ..., x[len(x)-1]).
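To make the difference from the per-tuple answers concrete, here is a small sketch on made-up data (the values in `x` are illustrative and not from the question). The zip version averages by position across the tuples, while the plain loop averages within each tuple.

```python
x = [(1.0, 2.0, 3.0),
     (3.0, 4.0, 5.0)]

# zip(*x) yields (1.0, 3.0), (2.0, 4.0), (3.0, 5.0): one group per position
column_means = [sum(col) / len(col) for col in zip(*x)]

# iterating over x directly averages each tuple on its own
row_means = [sum(row) / len(row) for row in x]

print(column_means)  # [2.0, 3.0, 4.0]
print(row_means)     # [2.0, 4.0]
```

The sample output in the question (26.315 is the mean of 27.37... and 25.26...) corresponds to the position-wise variant.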
r = [[sum(i)/len(i)] for i in x]
I have no clue about Python and started to use it on some files. I managed to find out how to do all the things that I need, except for 2 things.
1st
>>> line = ['0', '1', '2', '3', '4', '5', '6']
>>> #prints all elements of line as expected
>>> print string.join(line)
0 1 2 3 4 5 6
>>> #prints the first two elements as expected
>>> print string.join(line[0:2])
0 1
>>> #expected to print the first, second, fourth and sixth element;
>>> #Raises an exception instead
>>> print string.join(line[0:2:4:6])
SyntaxError: invalid syntax
I want this to work similar to awk '{ print $1 $2 $5 $7 }'. How can I accomplish this?
2nd
how can I delete the last character of the line? There is an additional ' that I don't need.
This assumes the join is only there to produce a nice string to print or store as the result (I use a comma as separator; in the OP's example it would be the default separator of string.join).
line = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print ','.join (line[0:2])
A,B
print ','.join (line[i] for i in [0,1,2,4,5,6])
A,B,C,E,F,G
What you are doing in both cases is extracting a sublist from the initial list. The first one uses a slice, the second one uses a generator expression (written much like a list comprehension). As others said, you could also have accessed the elements one by one; the syntaxes above are merely shorthand for:
print ','.join ([line[0], line[1]])
A,B
print ','.join ([line[0], line[1], line[2], line[4], line[5], line[6]])
A,B,C,E,F,G
I believe a short tutorial on list slices could be helpful:
l[x:y] is a 'slice' of list l. It will get all elements between position x (included) and position y (excluded). Positions starts at 0. If y is out of list or missing, it will include all list until the end. If you use negative numbers you count from the end of the list. You can also use a third parameter like in l[x:y:step] if you want to 'jump over' some items (not take them in the slice) with a regular interval.
Some examples:
l = range(1, 100) # create a list of 99 integers from 1 to 99
l[:] # resulting slice is a copy of the list
l[0:] # another way to get a copy of the list
l[0:99] # as we know the number of items, we could also do that
l[0:0] # a new empty list (remember y is excluded)
l[0:1] # a new list that contains only the first item of the old list
l[0:2] # a new list that contains only the first two items of the old list
l[0:-1] # a new list that contains all the items of the old list, except the last
l[0:len(l)-1] # same as above but less clear
l[0:-2] # a new list that contains all the items of the old list, except the last two
l[0:len(l)-2] # same as above but less clear
l[1:-1] # a new list with first and last item of the original list removed
l[-2:] # a list that contains the last two items of the original list
l[0::2] # odd numbers
l[1::2] # even numbers
l[2::3] # multiples of 3
If the rules to get items are more complex, you'll use a list comprehension instead of a slice, but that's another subject. That's what I use in my second join example.
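A runnable sketch of a few of these slices on a shorter list, so the results fit on one line. It is written for Python 3, where range() has to be wrapped in list() to get an actual list.

```python
l = list(range(1, 11))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

copy = l[:]              # equal to l, but a distinct list object
print(l[0:2])            # [1, 2]
print(l[:-1])            # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(l[-2:])            # [9, 10]
print(l[0::2])           # [1, 3, 5, 7, 9]
print(l[1::2])           # [2, 4, 6, 8, 10]
print(l[2::3])           # [3, 6, 9]
print(l[5:100])          # [6, 7, 8, 9, 10]  (an out-of-range end is clipped)
```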
You don't want to use join for that. If you just want to print some bits of a list, then specify the ones you want directly:
print '%s %s %s %s' % (line[0], line[1], line[4], line[6])
Assuming that the line variable should contain a line of cells, separated by commas...
You can use map for that:
line = "0,1,2,3,4,5,6"
cells = line.split(",")
indices=[0,1,4,6]
selected_elements = map( lambda i: cells[i], indices )
print ",".join(selected_elements)
The map function applies the given function to each of the indices in the list argument, on the fly. (Reorder the indices to your liking.)
You could use the following using list comprehension :
indices = [0,1,4,6]
Ipadd = string.join([line[i] for i in xrange(len(line)) if i in indices])
Note: you could also use:
Ipadd = string.join([line[i] for i in indices])
but the two forms only give the same result when indices is sorted and has no repetitions, since this form follows the order of indices.
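For readers on Python 3, where the string.join function no longer exists, the same selection can be sketched with str.join. The operator.itemgetter variant is an alternative I am adding here; it is not part of the original answer.

```python
from operator import itemgetter

line = ['0', '1', '2', '3', '4', '5', '6']
indices = [0, 1, 4, 6]   # awk's $1 $2 $5 $7, counted from zero

# Generator expression: pick the wanted positions in the order given.
picked = " ".join(line[i] for i in indices)

# itemgetter with several indices returns a tuple of the selected items.
same = " ".join(itemgetter(*indices)(line))

print(picked)  # 0 1 4 6
```

Note that itemgetter called with a single index returns the bare item rather than a one-element tuple, so the generator form is the safer choice when the number of indices can vary.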
Answer to the second question:
If your string is contained in myLine, just do:
myLine = myLine[:-1]
to remove the last character.
Or you could also use rstrip():
myLine = myLine.rstrip("'")
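The two options behave differently when the stray quote is absent, which may matter if not every line has it. A short sketch with made-up sample strings:

```python
my_line = "0 1 2 3 4 5 6'"

sliced = my_line[:-1]            # always drops exactly one character
stripped = my_line.rstrip("'")   # drops trailing quotes only, however many there are

print(sliced)    # 0 1 2 3 4 5 6
print(stripped)  # 0 1 2 3 4 5 6

# On a line without the stray quote the two differ:
clean = "0 1 2 3 4 5 6"
print(clean[:-1] == clean)          # False, the final '6' was cut off
print(clean.rstrip("'") == clean)   # True
```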
>>> token = ':'
>>> s = '1:2:3:4:5:6:7:8:9:10'
>>> sp = s.split(token)
>>> token.join(filter(bool, map(lambda i: i in [0,2,4,6] and sp[i] or False, range(len(sp)))))
'1:3:5:7'
l = []
l.extend(line[0:2])
l.append(line[4]) # fifth field (awk $5)
l.append(line[6]) # seventh field (awk $7)
string.join(l)
Alternatively
"{l[0]} {l[1]} {l[4]} {l[6]}".format(l=line)
Please see PEP 3101 and stop using the % operator for string formatting.