I have one list, like so,
a = ['dog','cat','mouse']
I want to build a list of all pairwise combinations of the list elements, which looks like this:
ans = ['cat-dog', 'cat-mouse','dog-mouse']
This is what I came up with,
a = ['dog','cat','mouse']
ans = []
for l in a:
    t = [sorted([l,x]) for x in a if x != l]
    ans.extend([x[0]+'-'+x[1] for x in t])
print sorted(set(ans))
Is there a simpler, more Pythonic way?
How important is the ordering?
>>> a = ['dog','cat','mouse']
>>> from itertools import combinations
>>> ['-'.join(el) for el in combinations(a, 2)]
['dog-cat', 'dog-mouse', 'cat-mouse']
Or, to match your example:
>>> ['-'.join(el) for el in combinations(sorted(a), 2)]
['cat-dog', 'cat-mouse', 'dog-mouse']
The itertools module:
>>> import itertools
>>> map('-'.join, itertools.combinations(a, 2))
['dog-cat', 'dog-mouse', 'cat-mouse']
itertools is surely the way to go here. If you want to do it with built-ins only, use:
a = ['dog','cat','mouse']
ans = [x + '-' + y for x in a for y in a if x < y]
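If the input may contain duplicates, or you want the pairs in a predictable alphabetical order regardless of the input order, one option is to deduplicate and sort before calling combinations(). This is a small sketch; the helper name pair_names and its sep parameter are hypothetical, chosen for illustration.

```python
from itertools import combinations

def pair_names(items, sep='-'):
    """Return every unordered pair of distinct items, joined with sep.

    Duplicates are dropped and the items sorted first, so each pair
    appears exactly once and the pairs come out in alphabetical order.
    """
    unique_sorted = sorted(set(items))
    return [sep.join(pair) for pair in combinations(unique_sorted, 2)]

print(pair_names(['dog', 'cat', 'mouse']))
```

Compared with the x < y comprehension, this fixes the order of the pairs themselves, not just the order within each pair.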
I'm trying to speed up the following Python code:
import numpy as np
mag_list = []
for j in range(4,len(var_s),3):
    mag_list.append(float(var_s[j]))
mag_list = [value for value in mag_list if value != 99.]
med_mag = np.median(mag_list)
Is there a nice way to combine the two loops (the for loop and the list comprehension) into one? As it stands, it is really slow. What I need is to take every third entry of the var_s list, starting with the fifth, skip any entry equal to 99, and compute the median of the remaining values.
Thanks!
You could probably try:
mag_list = [value for value in var_s[4::3] if value != 99.]
Depending on what var_s is, you might do better with itertools.islice(var_s, 4, None, 3), but you would definitely need to time it to know.
Perhaps you'd do even better if you stuck with numpy the whole way:
vs = np.array(var_s[4::3],dtype=np.float64) #could slice after array conversion too ...
med_mag = np.median(vs[vs!=99.])
Again, this would need to be timed to see how it performed relative to the others.
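If you'd rather avoid NumPy, Python 3's statistics.median can do the same job with a single comprehension over the slice. This sketch assumes, as the question's float() call suggests, that var_s may hold numeric strings, so each entry is converted before it is compared with the 99 sentinel. The function name and its start, step and sentinel parameters are hypothetical.

```python
from statistics import median

def median_of_every_third(var_s, start=4, step=3, sentinel=99.0):
    """Median of var_s[start::step], after converting each entry to
    float and dropping entries equal to the sentinel value."""
    values = [v for v in map(float, var_s[start::step]) if v != sentinel]
    # statistics.median raises StatisticsError if values is empty
    return median(values)

print(median_of_every_third(['0', '0', '0', '0', '1.5', '0', '0', '99', '0', '0', '2.5']))
```

Converting before comparing matters if var_s holds strings: '99' != 99. is always true, so an unconverted sentinel would never be filtered out.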
mag_list = filter(lambda x: x != 99, var_s[4::3])
OK, here are some timeit trials, all in Python 2.7.2:
The setup:
>>> from random import seed, random
>>> from timeit import Timer
>>> from itertools import islice, ifilter, imap
>>> seed(1234); var_s = [random() for _ in range(100)]
Using a for loop:
>>> def using_for_loop():
...     mag_list = []
...     for j in xrange(4, len(var_s), 3):
...         value = float(var_s[j])
...         if value != 99: mag_list.append(value)
...
>>> Timer(using_for_loop).timeit()
11.596584796905518
Using map and filter:
>>> def using_map_filter():
...     map(float, filter(lambda x: x != 99, var_s[4::3]))
...
>>> Timer(using_map_filter).timeit()
8.643505096435547
Using islice, imap, ifilter:
>>> def using_itertools():
...     list(imap(float, ifilter(lambda x: x != 99, islice(var_s, 4, None, 3))))
...
>>> Timer(using_itertools).timeit()
11.311019897460938
Using a list comprehension and islice:
>>> def using_list_comp():
...     [float(v) for v in islice(var_s, 4, None, 3) if v != 99]
...
>>> Timer(using_list_comp).timeit()
8.52650499343872
>>>
In these trials, the list comprehension with islice was the fastest, followed very closely by map and filter; the explicit for loop and the all-itertools pipeline were both noticeably slower.
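To rerun a similar comparison under Python 3, note that xrange, imap and ifilter no longer exist, and map() and filter() now return lazy iterators, so they must be wrapped in list() or no work is done. This is a sketch: the function names mirror the ones above, the seed and list size are copied from the setup, and the repetition count is an arbitrary choice. No timings are claimed here; they will depend on the machine and interpreter.

```python
from itertools import islice
from random import random, seed
from timeit import timeit

seed(1234)
var_s = [random() for _ in range(100)]

def using_for_loop():
    mag_list = []
    for j in range(4, len(var_s), 3):
        value = float(var_s[j])
        if value != 99:
            mag_list.append(value)
    return mag_list

def using_map_filter():
    # list() forces the lazy Python 3 iterators so the work actually happens
    return list(map(float, filter(lambda x: x != 99, var_s[4::3])))

def using_list_comp():
    return [float(v) for v in islice(var_s, 4, None, 3) if v != 99]

if __name__ == '__main__':
    for func in (using_for_loop, using_map_filter, using_list_comp):
        # number=10_000 is an arbitrary repetition count
        print(func.__name__, timeit(func, number=10_000))
```

Returning the lists (instead of discarding them as the Python 2 versions did) also makes it easy to check that all three approaches produce the same result before comparing their speed.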
mag_list = []
for j in range(4,len(var_s),3):
    value = float(var_s[j])
    if value != 99:
        mag_list.append(value)
med_mag = np.median(mag_list)
I need to concatenate each item of one list with each item of another list. In my case the items are strings (paths, to be exact). After the concatenation I want a list of every possible combination.
Example:
list1 = ['Library/FolderA/', 'Library/FolderB/', 'Library/FolderC/']
list2 = ['FileA', 'FileB']
I want to obtain a list like this:
[
'Library/FolderA/FileA',
'Library/FolderA/FileB',
'Library/FolderB/FileA',
'Library/FolderB/FileB',
'Library/FolderC/FileA',
'Library/FolderC/FileB'
]
Thank you!
In [11]: [d+f for (d,f) in itertools.product(list1, list2)]
Out[11]:
['Library/FolderA/FileA',
'Library/FolderA/FileB',
'Library/FolderB/FileA',
'Library/FolderB/FileB',
'Library/FolderC/FileA',
'Library/FolderC/FileB']
or, slightly more portably (and perhaps robustly):
In [16]: [os.path.join(*p) for p in itertools.product(list1, list2)]
Out[16]:
['Library/FolderA/FileA',
'Library/FolderA/FileB',
'Library/FolderB/FileA',
'Library/FolderB/FileB',
'Library/FolderC/FileA',
'Library/FolderC/FileB']
You can use a list comprehension:
>>> [d + f for d in list1 for f in list2]
['Library/FolderA/FileA', 'Library/FolderA/FileB', 'Library/FolderB/FileA', 'Library/FolderB/FileB', 'Library/FolderC/FileA', 'Library/FolderC/FileB']
You may want to use os.path.join() instead of simple concatenation though.
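A sketch of the os.path.join() suggestion combined with itertools.product: os.path.join only inserts a separator when the folder part doesn't already end with one, so it also copes with folder names that lack the trailing slash. The helper name all_paths is hypothetical; on Windows, any separator it inserts would be a backslash.

```python
import os
from itertools import product

def all_paths(folders, files):
    """Every folder/file combination, joined with os.path.join."""
    return [os.path.join(folder, name) for folder, name in product(folders, files)]

list1 = ['Library/FolderA/', 'Library/FolderB/', 'Library/FolderC/']
list2 = ['FileA', 'FileB']
print(all_paths(list1, list2))
```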
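The standard-library itertools module defines a product() function for this. It yields (folder, file) tuples, so each pair still needs to be joined:
import itertools
result = [''.join(pair) for pair in itertools.product(list1, list2)]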
A plain for loop can do this easily:
list1 = ['Library/FolderA/', 'Library/FolderB/', 'Library/FolderC/']
list2 = ['FileA', 'FileB']
my_list = []
for x in list1:
    for y in list2:
        my_list.append(x + y)
You can also just print them:
list1 = ['Library/FolderA/', 'Library/FolderB/', 'Library/FolderC/']
list2 = ['FileA', 'FileB']
for x in list1:
    for y in list2:
        print x + y
I'm doing this, but it feels like it could be done with much less code. It is Python, after all. Starting with a list, I split that list into subsets based on a string prefix.
# Splitting a list into subsets
# expected outcome:
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
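def func(l, newlist=None, index=0):
    # use None instead of a mutable default, so separate calls don't share one list
    if newlist is None:
        newlist = []
    newlist.append([i for i in l if i.startswith('sub_%s' % index)])
    # create a new list without the items in newlist
    l = [i for i in l if i not in newlist[index]]
    if len(l):
        index += 1
        func(l, newlist, index)
    return newlist
print func(mylist)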
You could use itertools.groupby:
>>> import itertools
>>> mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
>>> for k,v in itertools.groupby(mylist,key=lambda x:x[:5]):
...     print k, list(v)
...
sub_0 ['sub_0_a', 'sub_0_b']
sub_1 ['sub_1_a', 'sub_1_b']
or exactly as you specified it:
>>> [list(v) for k,v in itertools.groupby(mylist,key=lambda x:x[:5])]
[['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
Of course, the usual caveat applies: make sure your list is sorted by the same key you use to group. You might also need a slightly more complicated key function for real-world data.
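To make the sorting caveat concrete: groupby only merges adjacent items, so an unsorted list has to be sorted by the same key first. This sketch also uses a key that splits on '_' instead of slicing x[:5], so a prefix such as 'sub_10' isn't cut short. The helper names are hypothetical, and the data is assumed to look like 'sub_<n>_<suffix>'.

```python
from itertools import groupby

def prefix_key(item):
    # 'sub_10_a' -> ('sub', 10); the number is compared as an int,
    # so 'sub_2' sorts before 'sub_10'
    name, number, _ = item.split('_', 2)
    return name, int(number)

def split_by_prefix(items):
    """Group items that share the same 'name_number' prefix."""
    # sorted() is stable, so items keep their original relative
    # order inside each group
    ordered = sorted(items, key=prefix_key)
    return [list(group) for _, group in groupby(ordered, key=prefix_key)]

print(split_by_prefix(['sub_1_b', 'sub_0_a', 'sub_10_a', 'sub_1_a', 'sub_0_b']))
```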
In [28]: mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
In [29]: lis=[]
In [30]: for x in mylist:
   ....:     i = x.split("_")[1]
   ....:     try:
   ....:         lis[int(i)].append(x)
   ....:     except IndexError:
   ....:         lis.append([])
   ....:         lis[-1].append(x)
   ....:
In [31]: lis
Out[31]: [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
Use itertools' groupby:
from itertools import groupby
def get_field_sub(x):
    return x.split('_')[1]
mylist = sorted(mylist, key=get_field_sub)
[ (x, list(y)) for x, y in groupby(mylist, get_field_sub)]
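If sorting first is inconvenient, a swapped-in alternative (not groupby) is to collect the items into a dict keyed by the prefix field. Since Python 3.7, dicts keep insertion order, so the groups come out in the order their key is first seen. This is a sketch; the function name is hypothetical, and the default field index 1 follows get_field_sub above.

```python
from collections import defaultdict

def group_by_field(items, index=1, sep='_'):
    """Bucket items by one sep-separated field, without sorting."""
    groups = defaultdict(list)
    for item in items:
        groups[item.split(sep)[index]].append(item)
    # dicts preserve insertion order (Python 3.7+), so groups appear
    # in the order their key was first seen
    return list(groups.values())

print(group_by_field(['sub_0_a', 'sub_1_a', 'sub_0_b', 'sub_1_b']))
```

Unlike groupby, this merges items with the same key even when they are not adjacent, at the cost of building an intermediate dict.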
I'm trying to find all the modes of a list, to be able to work with bimodal lists etc.
My attempts so far:
testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]
from collections import Counter
def modal_1(xs):
    cntr = Counter(xs).most_common()
    val,count = cntr[0]
    return (v for v,c in cntr if c is count)
print(list(modal_1(testlist)))
>>> [3, 4]
-- or perhaps something like --
from itertools import takewhile
def modal_2(xs):
    cntr = Counter(xs).most_common()
    val,count = cntr[0]
    return takewhile(lambda x: x[1] is count, cntr)
print(list(modal_2(testlist)))
>>> [(3, 7), (4, 7)]
Please don't answer with "use numpy" etc.
Note: Counter(xs).most_common(1) returns only one of the n modal values: if there are two, it returns just the first. Which is a shame, because that would make this a whole lot easier.
OK, I was actually quite surprised that one of my original options turned out to be a good way to do this. For anyone wanting to find all n modal numbers in a list, I would suggest the following options. Both of the next two functions work well on lists with over 1000 values.
All of them produce (number, count) tuples, where count is identical for every tuple. I think it is better to have this and then parse it however you like. (modal_3 and modal_4 return lazy iterators, hence the list() calls.)
using takewhile:
from collections import Counter
from itertools import takewhile
def modal_3(xs):
    counter = Counter(xs).most_common()
    mx = counter[0][1]
    return takewhile(lambda x: x[1] == mx, counter)
print(list(modal_3(testlist)))
>>> [(3, 7), (4, 7)]
using groupby:
from collections import Counter
from itertools import groupby
from operator import itemgetter
def modal_4(xs):
    container = Counter(xs)
    return next(groupby(container.most_common(), key=itemgetter(1)))[1]
print(list(modal_4(testlist)))
>>> [(3, 7), (4, 7)]
And finally, the version I found the most Pythonic and the fastest:
def modal_5(xs):
    def _mode(xs):
        # yield entries only while their count matches the top count
        for x in xs:
            if x[1] != xs[0][1]:
                break
            yield x
    counter = Counter(xs).most_common()
    return list(_mode(counter))
Thank you to everyone for the help and information.
I think your second example is best, with some minor modification:
from itertools import takewhile
from collections import Counter
def modal(xs):
    counter = Counter(xs).most_common()
    _, count = counter[0]
    return takewhile(lambda x: x[1] == count, counter)
The change here is to use == rather than is. is checks identity, not equality: it happens to work for some values because CPython caches small integers behind the scenes, but it won't be true all of the time and shouldn't be relied upon here.
>>> a = 1
>>> a is 1
True
>>> a = 300
>>> a is 300
False
>>> testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]
>>> dic={x:testlist.count(x) for x in set(testlist)}
>>> [x for x in dic if dic[x]==max(dic.values())]
[3, 4]
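The dict-comprehension version calls testlist.count() once per distinct value and recomputes max(dic.values()) for every key, so it does a lot of repeated work on longer lists. Here is a sketch of the same idea that counts once with Counter and computes the maximum once. Returning an empty list for empty input, instead of raising, is an assumption about what callers want, and the name all_modes is hypothetical.

```python
from collections import Counter

def all_modes(xs):
    """Return every value that occurs most often, in first-seen order."""
    counts = Counter(xs)
    if not counts:
        return []  # assumption: an empty input has no modes
    top = max(counts.values())  # computed once, not once per key
    # compare counts with ==, never with is
    return [value for value, c in counts.items() if c == top]

testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]
print(all_modes(testlist))
```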
What? takewhile but no groupby?
>>> from collections import Counter
>>> testlist = [1,2,3,3,2,1,4,2,2,3,4,3,3,4,5,3,2,4,55,6,7,4,3,45,543,4,53,4,53,234]
>>> cntr = Counter(testlist)
>>> from itertools import groupby
>>> list(x[0] for x in next(groupby(cntr.most_common(), key=lambda x:x[1]))[1])
[3, 4]
Consider this:
list = 2*[2*[0]]
for y in range(0,2):
    for x in range(0,2):
        if x ==0:
            list[x][y]=1
        else:
            list[x][y]=2
print list
Result:
[[2,2],[2,2]]
Why isn't the result [[1,1],[2,2]]?
Because you are creating a list that holds two references to the same sublist:
>>> L = 2*[2*[0]]
>>> id(L[0])
3078300332L
>>> id(L[1])
3078300332L
so changes to L[0] also affect L[1], because they are the same list.
The usual way to do what you want would be:
>>> L = [[0]*2 for x in range(2)]
>>> id(L[0])
3078302124L
>>> id(L[1])
3078302220L
Notice that L[0] and L[1] are now distinct.
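A small check that makes the difference visible: with the multiplied list both rows are the same object, while the comprehension builds a fresh row on every iteration, so the question's loop then produces [[1, 1], [2, 2]]. The names aliased and grid are just illustrative.

```python
aliased = 2 * [2 * [0]]             # one inner list, referenced twice
grid = [[0] * 2 for _ in range(2)]  # a new inner list per iteration

print(aliased[0] is aliased[1])  # True: both rows are the same object
print(grid[0] is grid[1])        # False: independent rows

# the loop from the question, run against the independent rows
for y in range(2):
    for x in range(2):
        grid[x][y] = 1 if x == 0 else 2

aliased[0][0] = 5  # writing through one row...
print(aliased)     # ...shows up in both rows: [[5, 0], [5, 0]]
print(grid)        # [[1, 1], [2, 2]]
```

The inner [0] * 2 is safe because ints are immutable; only the outer multiplication of a list that contains a mutable list causes the sharing.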
Alternatively, to build the result you want directly:
>>> [[x,x] for x in xrange(1,3)]