I'm trying to create a list from a permutation of a str object. However the resultant list has duplicates. I have the following code:
from itertools import permutations
a = permutations('144')
b = [''.join(i) for i in a]
print(b)
What am I doing wrong? I'm getting the following:
['144', '144', '414', '441', '414', '441']
You are not doing anything wrong. That is the expected result, because there are duplicate characters in your input string.
If all you are interested in is the distinct permutations, pass the results through set. If you NEED a list instead, pass the set through list again.
Example:
from itertools import permutations
a = permutations('144')
b = set(''.join(i) for i in a)  # generator expression, so no square brackets
c = list(b)  # a is an iterator and is already exhausted here, so reuse b
print(b)
print(c)
Protip: use generator expressions wherever possible
You have duplicate elements (characters) in your string. The function permutations will not distinguish between them.
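You will have no duplicates if you iterate over the permutations of the set of the characters. Note that these are permutations of the distinct characters only ('1' and '4'), so each result is shorter than the original string, e.g.: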
from itertools import permutations
a = permutations(set('144'))
b = [''.join(i) for i in a]
print(b)
Or, if you want the distinct permutations of the full string, duplicate characters included, you can take the set of the permutations:
from itertools import permutations
a = permutations('144')
b = [''.join(i) for i in set(a)]
print(b)
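If the order of the results matters, one possible alternative to set is dict.fromkeys, which drops duplicates while keeping first-seen order. This is only a minimal sketch; the name unique_permutations is made up for illustration and is not part of itertools.

```python
from itertools import permutations

def unique_permutations(s):
    """Return the distinct permutations of s as strings, in first-seen order.

    Hypothetical helper for illustration; not part of itertools.
    """
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(''.join(p) for p in permutations(s)))

print(unique_permutations('144'))  # ['144', '414', '441']
```

Like the set-based versions, this still generates every permutation first, so it is only reasonable for short strings.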
I have a list of strings as such:
['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg', ...]
In reality, there are more entries than 2 per number but this is the general format. I would like to split this list into list of lists as such:
[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg'], ...]
These sub-lists being based on the integer after the underscore. My current method to do so is to first sort the list based on the numbers as shown in the first list sample above and then iterate through each index and copy the values into new lists if it matches the value of the previous integer.
I am wondering if there is a simpler more pythonic way of performing this task.
Try:
import re
lst = ["text_1.jpg", "othertext_1.jpg", "text_2.jpg", "othertext_2.jpg"]
r = re.compile(r"_(\d+)\.jpg")
out = {}
for val in lst:
num = r.search(val).group(1)
out.setdefault(num, []).append(val)
print(list(out.values()))
Prints:
[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
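Similar solution to @Andrej's, using itertools.groupby (which only groups adjacent items, so the list must already be ordered by number):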
import itertools
import re
def find_number(s):
    # the re module caches compiled patterns,
    # so compiling explicitly first is optional
    return re.search(r'_(\d+)\.jpg', s).group(1)
l = ['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg']
res = [list(v) for k, v in itertools.groupby(l, find_number)]
print(res)
#[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
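Because itertools.groupby only groups consecutive items, the groupby approach relies on the list already being ordered by number. A hedged sketch for unsorted input, sorting by the extracted integer first; the helper names file_number and group_by_number are assumptions made up for this example.

```python
import itertools
import re

def file_number(name):
    # Hypothetical helper: the integer between '_' and '.jpg'
    return int(re.search(r'_(\d+)\.jpg', name).group(1))

def group_by_number(names):
    # Sort first so that equal numbers are adjacent, then group them
    ordered = sorted(names, key=file_number)
    return [list(group) for _, group in itertools.groupby(ordered, key=file_number)]

shuffled = ['text_2.jpg', 'text_1.jpg', 'othertext_2.jpg', 'othertext_1.jpg']
print(group_by_number(shuffled))
# [['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
```

Converting the match to int makes 'text_10.jpg' sort after 'text_2.jpg', which a plain string comparison would not do.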
I have two lists a and b:
a = ['146769015', '163081689', '172235774', ...]
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'], ['StackOverflow (163081689)'], ...]
What I'm trying to do is to check if the elements of list a are in list b, and if they are, how many times they appear.
In this case the output should be:
'146769015':2
'163081689':1
I've already tried the set() function but that does not seem to work
print(set(a)&set(b))
And I get this:
print(set(a)&set(b))
TypeError: unhashable type: 'list'
Is it possible to do what I want?
Thank you all.
When you perform set(a) & set(b), you're trying to see which elements both lists share. There are a few errors in your logic.
First, your first list is made up of strings, while your second list is made up of lists. Lists are not hashable, which is why set(b) raises the TypeError.
Second, the elements of your second list are never equal to those of your first list, because the first has only digits, and the second has digits and letters.
Third, even if you extract only the numbers, the intersection of both sets tells you which numbers are in both, but not how many times they occur.
A good approach might be to extract the numbers from your second list and then count the occurrences of those that are present in list a:
from collections import Counter
import re
a=['146769015', '163081689', '172235774']
b=[['StackOverflow (146769015)'],['StackOverflow (146769015)'],['StackOverflow (163081689)']]
numbs = [re.search(r'\d+', elem[0]).group(0) for elem in b]
cnt = Counter()
for n in numbs:
    if n in a:
        cnt[n] += 1
print(cnt)
Output:
Counter({'146769015': 2, '163081689': 1})
I'll leave as homework to you to research what are dictionaries and Counters.
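As a variation on the Counter approach, the sketch below wraps the steps in a function and looks IDs up in a set instead of a list. It assumes the ID is always the run of digits inside parentheses, as in the sample data; the names ID_PATTERN and count_ids are hypothetical and chosen for this example only.

```python
import re
from collections import Counter

# Assumption: every ID is the run of digits inside parentheses
ID_PATTERN = re.compile(r'\((\d+)\)')

def count_ids(ids, nested):
    """Count how often each ID from ids appears in the nested list of strings."""
    wanted = set(ids)  # set membership tests are faster than list lookups
    counts = Counter()
    for sublist in nested:
        for text in sublist:
            match = ID_PATTERN.search(text)
            if match and match.group(1) in wanted:
                counts[match.group(1)] += 1
    return counts

a = ['146769015', '163081689', '172235774']
b = [['StackOverflow (146769015)'], ['StackOverflow (146769015)'], ['StackOverflow (163081689)']]
print(dict(count_ids(a, b)))  # {'146769015': 2, '163081689': 1}
```

Matching the whole ID rather than testing for a substring avoids counting '123' inside '(1234)'.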
It's tricky because each string in a is only a substring of the strings in b; otherwise I think you could use a Counter from collections and look up each element of a as a key.
Instead, you can flatten the list and loop over it with a nested loop.
from collections import defaultdict
flat_list = [item for sublist in b for item in sublist]
c = defaultdict(int)
for string in a:
    for string2 in flat_list:
        if string in string2:
            c[string] += 1
You can use a dictionary:
a=['146769015', '163081689', '172235774']
b=[['StackOverflow (146769015)'],['StackOverflow (146769015)'],['StackOverflow (163081689)']]
c = {}
for s in a:
    for d in b:
        for i in d:
            if s in i:
                if s not in c:
                    c[s] = 1
                else:
                    c[s] += 1
print(c)
Output:
{'146769015': 2, '163081689': 1}
I have two lists of strings:
letters = ['abc', 'def', 'ghi']
numbers = ['123', '456']
I want to loop over them to build a list containing every combination of a letters element with a numbers element, rather than pairing them element by element, so zip() doesn't work here.
Desired outcome:
result = ['abc123', 'def123', 'ghi123', 'abc456', 'def456', 'ghi456']
The order of the elements in the result is irrelevant.
Any ideas?
You can try a list comprehension with two nested for loops, over numbers and then letters:
print([l+n for n in numbers for l in letters])
# ['abc123', 'def123', 'ghi123', 'abc456', 'def456', 'ghi456']
You can also use nested for loops:
out = []
for n in numbers:
    for l in letters:
        out.append(l+n)
print(out)
# ['abc123', 'def123', 'ghi123', 'abc456', 'def456', 'ghi456']
For more details on list comprehension, see either the doc or this related topic.
Take the product of numbers and letters (rather than letters and numbers), but then join the resulting tuples in reverse order.
>>> from itertools import product
>>> [''.join([y, x]) for x, y in product(numbers, letters)]
['abc123', 'def123', 'ghi123', 'abc456', 'def456', 'ghi456']
For 2-tuples, y + x would be sufficient rather than using ''.join.
The product of two lists is just the set of all possible tuples created by taking an element from the first list and an element from the second list, in that order.
>>> list(product(numbers, letters))
[('123', 'abc'), ('123', 'def'), ('123', 'ghi'), ('456', 'abc'), ('456', 'def'), ('456', 'ghi')]
Given your list of prefixes (letters) and your list of suffixes (numbers) that have to be combined:
letters = ['abc', 'def', 'ghi']
numbers = ['123', '456']
Basic
The first solution that comes to mind (especially if you are new to Python) is using nested loops:
result = []
for s in letters:
    for n in numbers:
        result.append(s+n)
and since, as you said, order is irrelevant, the following is also a valid solution:
result = []
for n in numbers:
    for s in letters:
        result.append(s+n)
The main downside of both is that you have to initialize the result variable beforehand, which is a bit clumsy.
Advanced
If you switch to a list comprehension, you can eliminate that extra line:
result = [s+n for n in numbers for s in letters]
Expert
Mathematically speaking, you are creating the Cartesian product of letters and numbers. Python provides a function for exactly that purpose, itertools.product (which, by the way, also eliminates the double for):
from itertools import product
result = [''.join(p) for p in product(letters, numbers)]
This may look like overkill in your particular example, but as soon as results are built from more components it makes a big difference: every approach presented here except itertools.product needs one more nested loop per additional component.
For illustration, I conclude with an example that loops over prefixes, infixes, and postfixes:
print([''.join(p) for p in product('ab', '+-', '12')])
that gives this output:
['a+1', 'a+2', 'a-1', 'a-2', 'b+1', 'b+2', 'b-1', 'b-2']
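To illustrate how this scales, here is a minimal sketch that accepts any number of component sequences by unpacking them into itertools.product. The function name combine is hypothetical, introduced only for this example.

```python
from itertools import product

def combine(*parts):
    """Join one element from each sequence in parts, for every combination."""
    # product(*parts) behaves like one nested for loop per sequence
    return [''.join(p) for p in product(*parts)]

letters = ['abc', 'def', 'ghi']
numbers = ['123', '456']

print(combine(letters, numbers))
# ['abc123', 'abc456', 'def123', 'def456', 'ghi123', 'ghi456']
print(combine('ab', '+-', '12'))
# ['a+1', 'a+2', 'a-1', 'a-2', 'b+1', 'b+2', 'b-1', 'b-2']
```

The last sequence varies fastest, so the order differs from the desired outcome in the question, which is acceptable there because order is stated to be irrelevant.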
Thank you for your answers! I simplified the case, so all of the above solutions work well, however in the real problem I'm working on I want to add more lines of code in between that would iterate through both lists. I completed it nesting those for loops:
for letter in letters:
    for number in numbers:
        print(letter+number)
        # many many lines of more code
Anyway, thanks a lot for your help!
I have two strings:
a ='hellowww'
b ='world'
Expected Output
c = 'hweolrllodwww'
My code:
for x,y in zip(a,b):
    print(x,y)
It's not working in my case.
Note: the two strings may not have the same length.
zip stops as soon as the shortest iterable is exhausted. You can use chain and zip_longest from the itertools module instead:
from itertools import chain, zip_longest
res = ''.join(chain.from_iterable(zip_longest(a, b, fillvalue='')))
# 'hweolrllodwww'
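A short side-by-side sketch of the difference, using the strings from the question; the variable names truncated and full are chosen here for illustration.

```python
from itertools import chain, zip_longest

a = 'hellowww'
b = 'world'

# zip stops at the end of the shorter string, so the trailing 'www' is lost
truncated = ''.join(x + y for x, y in zip(a, b))

# zip_longest pads the shorter string with fillvalue instead of stopping
full = ''.join(chain.from_iterable(zip_longest(a, b, fillvalue='')))

print(truncated)  # hweolrllod
print(full)       # hweolrllodwww
```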
I am trying to "merge" a string (a) and a list of strings (b):
a = '1234'
b = ['+', '-', '']
to get the desired output (c):
c = '1+2-34'
The characters in the desired output string alternate in terms of origin between string and list. Also, the list will always contain one element less than characters in the string. I was wondering what the fastest way to do this is.
What I have so far is the following:
c = a[0]
for i in range(len(b)):
    c += b[i] + a[1:][i]
print(c) # prints -> 1+2-34
But I kind of feel like there is a better way to do this.
You can use itertools.zip_longest to zip the two sequences; it keeps iterating even after the shorter sequence has run out of elements. Once that happens, you'll start getting None back for it, so just consume the rest of the numerical characters.
>>> from itertools import chain
>>> from itertools import zip_longest
>>> ''.join(i+j if j else i for i,j in zip_longest(a, b))
'1+2-34'
As @deceze suggested in the comments, you can also pass a fillvalue argument to zip_longest, which will insert empty strings instead of None. I'd suggest this approach since it's a bit more readable.
>>> ''.join(i+j for i,j in zip_longest(a, b, fillvalue=''))
'1+2-34'
A further optimization suggested by @ShadowRanger is to remove the temporary string concatenations (i+j) and replace them with an itertools.chain.from_iterable call instead:
>>> ''.join(chain.from_iterable(zip_longest(a, b, fillvalue='')))
'1+2-34'
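The chain.from_iterable version can be wrapped in a small function, sketched below under the assumption that the inputs are a string and a list of strings as in the question. The name merge is hypothetical. Because join receives each piece unchanged, list entries longer than one character also work.

```python
from itertools import chain, zip_longest

def merge(digits, operators):
    """Interleave the characters of digits with the strings in operators."""
    # fillvalue='' pads the shorter input, so the last digit is not dropped
    return ''.join(chain.from_iterable(zip_longest(digits, operators, fillvalue='')))

print(merge('1234', ['+', '-', '']))   # 1+2-34
print(merge('1234', ['+', '**', '']))  # 1+2**34
```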