Python : list of strings to list of unique characters [closed] - python

I have a list of strings
ll = ['abc', 'abd', 'xyz', 'xzk']
I want a list of unique characters across all strings in the given list.
For ll, output should be
['a','b','c','d','x','y','z','k']
Is there a clean way to do this?

You want to produce a set of the letters:
{l for word in ll for l in word}
You can always convert that back to a list:
list({l for word in ll for l in word})
Demo:
>>> ll = ['abc', 'abd', 'xyz', 'xzk']
>>> {l for word in ll for l in word}
{'b', 'a', 'x', 'k', 'd', 'c', 'z', 'y'}
You can also use itertools.chain.from_iterable() to provide a single iterator over all the characters:
from itertools import chain
set(chain.from_iterable(ll))
If you must have a list that reflects the order of the first occurrence of the characters, you can use a collections.OrderedDict() object instead of a set, then extract the keys with list():
from collections import OrderedDict
from itertools import chain
list(OrderedDict.fromkeys(chain.from_iterable(ll)))
Demo:
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(chain.from_iterable(ll)))
['a', 'b', 'c', 'd', 'x', 'y', 'z', 'k']
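On Python 3.7 and later, plain dict objects also keep insertion order, so the same idea can be sketched without OrderedDict. This is a minimal variant, assuming you only target those versions; the name unique_chars is just for illustration:

```python
from itertools import chain

ll = ['abc', 'abd', 'xyz', 'xzk']

# dict.fromkeys keeps the first occurrence of each key, in insertion order
# (guaranteed by the language from Python 3.7 onwards).
unique_chars = list(dict.fromkeys(chain.from_iterable(ll)))
print(unique_chars)
```

On older interpreters the OrderedDict version remains the safer choice.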

I do not know the simplest way to do this, but I know one way:
words = ['abc', 'abd', 'xyz', 'xzk']  # avoid naming a variable "list": it shadows the built-in
new = set()
for word in words:
    for letter in word:
        new.add(letter)
print(new)
This is an easy way for a beginner because it doesn't need any modules which you probably don't know how to use yet.

Here's an inefficient way that preserves the order. It's ok when the total number of chars is small, otherwise, you should use Martijn's OrderedDict approach.
ll = ['abc', 'abd', 'xyz', 'xzk']
s = ''.join(ll)
print(sorted(set(s), key=s.index))
output
['a', 'b', 'c', 'd', 'x', 'y', 'z', 'k']
Here's an alternative way to preserve the order which is less compact, but more efficient than the previous approach.
ll = ['abc', 'abd', 'xyz', 'xzk']
d = {c: i for i, c in enumerate(reversed(''.join(ll)))}
print(sorted(d, reverse=True, key=d.get))
output
['a', 'b', 'c', 'd', 'x', 'y', 'z', 'k']
Using s.index as the key function is inefficient because it has to perform a linear scan on the s string for each character that it sorts, whereas my d dict can get the index of each character in O(1). I use the reversed iterator because we want earlier chars to overwrite later duplicates of the same char, and using reversed is a little more efficient than building a new string with [::-1].
Creating the d dict is only slightly slower than creating set(s). It may be a little faster than using OrderedDict, and it certainly uses less RAM.
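To see why the reversed iterator matters, here is a small trace on a short string. It is only an illustrative sketch; the sample string and the name first_seen_order are made up for the example:

```python
s = 'abcabd'

# Iterating in reverse means the entry written last for each character
# comes from its earliest position in s, so it overwrites later duplicates.
d = {c: i for i, c in enumerate(reversed(s))}
print(d)

# A larger stored index means the character first appeared earlier in s,
# so sorting in descending order restores first-occurrence order.
first_seen_order = sorted(d, reverse=True, key=d.get)
print(first_seen_order)
```

Each lookup through d.get is a dict access, which is what avoids the repeated linear scans of s.index.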

Consider using a set()
s = set()
for word in ll:
    for letter in word:
        s.add(letter)
Now s should have all the unique letters. You can convert s to a list using list(s).

You can use itertools for that:
import itertools
ll = ['abc', 'abd', 'xyz', 'xzk']
set(itertools.chain(*[list(x) for x in ll]))
{'a', 'b', 'c', 'd', 'k', 'x', 'y', 'z'}

l2 = list()
for i in ll:
    for j in i:
        l2.append(j)
[''.join(i) for i in set(l2)]
output:
['a', 'c', 'b', 'd', 'k', 'y', 'x', 'z']

Just another one...
>>> set().union(*ll)
{'d', 'a', 'y', 'k', 'c', 'x', 'b', 'z'}
Wrap list(...) around it if needed, though why would you.

This is a function you can call with your list; it returns all the unique letters. I also added a print call at the end:
lst = ['abc', 'abd', 'xyz', 'xzk']
def uniqueLetters(lst1):
    unique = set()
    for word in lst1:
        for letter in word:
            unique.add(letter)
    return unique
print(uniqueLetters(lst))
To store the unique letters in a variable, call the function like so:
uniqueLetters123 = uniqueLetters(lst)
And you can replace lst with your list name.

Related

Replace numbers with letters and offer all permutations

I need to determine all possible letter combinations of a string that has numbers when converting numbers into possible visually similar letters.
Using the dictionary:
number_appearance = {
    '1': ['l', 'i'],
    '2': ['r', 'z'],
    '3': ['e', 'b'],
    '4': ['a'],
    '5': ['s'],
    '6': ['b', 'g'],
    '7': ['t'],
    '8': ['b'],
    '9': ['g', 'p'],
    '0': ['o', 'q']}
I want to write a function that takes an input and creates all possible letter combinations. For example:
import re
text = 'l4t32'
def convert_numbers(text):
    return re.sub('[0-9]', lambda x: number_appearance[x[0]][0], text)
I want the output to be a list with all possible permutations:
['later', 'latbr', 'latbz', 'latez']
The function above works if you are just grabbing the first letter in each list from number_appearance, but I'm trying to figure out the best way to iterate through all possible combinations. Any help would be much appreciated!
As an upgrade from your own answer, I suggest the following:
import itertools
def convert_numbers(text):
    all_items = [number_appearance.get(char, [char]) for char in text]
    return [''.join(elem) for elem in itertools.product(*all_items)]
The improvements are that:
it doesn't convert text to a list (there is no need for that)
you don't need regex
it will still work if you decide instead that you also want to add other characters on top of numbers
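For reference, the suggestion above can be put together into a self-contained script along these lines. It is a sketch that reuses the dictionary from the question; characters without an entry in the dictionary are passed through unchanged:

```python
import itertools

number_appearance = {
    '1': ['l', 'i'], '2': ['r', 'z'], '3': ['e', 'b'], '4': ['a'],
    '5': ['s'], '6': ['b', 'g'], '7': ['t'], '8': ['b'],
    '9': ['g', 'p'], '0': ['o', 'q'],
}

def convert_numbers(text):
    # One list of candidate letters per position; non-digits map to themselves.
    all_items = [number_appearance.get(char, [char]) for char in text]
    # product picks one candidate per position, in every possible way.
    return [''.join(elem) for elem in itertools.product(*all_items)]

print(convert_numbers('l4t32'))
```

Because each position is handled independently, a digit that occurs twice in the input can be replaced by different letters at each occurrence.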
import itertools
import re
def convert_num_appearance(text):
    string_characters = [character for character in text]
    all_items = []
    for item in string_characters:
        if re.search('[a-zA-Z]', item):
            all_items.append([item])
        elif re.search(r'\d', item):
            all_items.append(number_appearance[item])
    return [''.join(elem) for elem in itertools.product(*all_items)]
I would break down the problem like so:
1. First, create a function that can do the replacement for a given set of replacement letters. My input specification is a sequence of letters, where the first letter is the replacement for the '0' character, next for 1 etc. This allows me to use the index in that sequence to determine the character being replaced, while generating a plain sequence rather than a dict or other complex structure. To do the replacement, I will use the built-in translate method of the original string. That requires a dictionary as described in the documentation, which I can easily build with a dict comprehension, or with the provided helper method str.maketrans (a static method of the str type).
2. Use itertools.product to generate those sequences.
3. Use a list comprehension to apply the replacement for each sequence.
Thus:
from itertools import product
def replace_digits(original, replacement):
    # translation = {ord(str(i)): c for i, c in enumerate(replacement)}
    translation = str.maketrans('0123456789', ''.join(replacement))
    print(translation)
    return original.translate(translation)
replacements = product(
    ['o', 'q'], ['l', 'i'], ['r', 'z'], ['e', 'b'], ['a'],
    ['s'], ['b', 'g'], ['t'], ['b'], ['g', 'p']
)
[replace_digits('14732', r) for r in replacements]
(You will notice there are duplicates in the result; this is because of variant replacements for symbols that don't appear in the input.)
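One possible way to avoid those duplicates is to build the product only over the digits that actually occur in the input. The following is a sketch of that idea, not part of the answer above; the function name replace_present_digits is hypothetical:

```python
from itertools import product

number_appearance = {
    '1': ['l', 'i'], '2': ['r', 'z'], '3': ['e', 'b'], '4': ['a'],
    '5': ['s'], '6': ['b', 'g'], '7': ['t'], '8': ['b'],
    '9': ['g', 'p'], '0': ['o', 'q'],
}

def replace_present_digits(text):
    # Only the digits that occur in the text contribute to the product.
    digits = sorted({ch for ch in text if ch in number_appearance})
    results = []
    for combo in product(*(number_appearance[d] for d in digits)):
        # str.maketrans accepts a dict mapping single characters to strings.
        table = str.maketrans(dict(zip(digits, combo)))
        results.append(text.translate(table))
    return results

print(replace_present_digits('l4t32'))
```

As with the translate-based version, a digit that appears more than once is replaced by the same letter everywhere within a single result.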

How do I create a new list with a nested list comprehension?

Say I have a list of words
word_list = ['cat','dog','rabbit']
and I want to end up with a list of letters (not including any repeated letters), like this:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
Without a list comprehension the code would look like this:
letter_list = []
for a_word in word_list:
    for a_letter in a_word:
        if a_letter not in letter_list:
            letter_list.append(a_letter)
print(letter_list)
Is there a way to do this with a list comprehension?
I have tried
letter_list = [a_letter for a_letter in a_word for a_word in word_list]
but I get a
NameError: name 'a_word' is not defined
error. I have seen answers for similar problems, but they usually iterate over a nested collection (list or tuple). Is there a way to do this from a non-nested list like a_word?
Trying
letter_list = [a_letter for a_letter in [a_word for a_word in word_list]]
Results in the initial list: ['cat','dog','rabbit']
And trying
letter_list = [[a_letter for a_letter in a_word] for a_word in word_list]
Results in:[['c', 'a', 't'], ['d', 'o', 'g'], ['r', 'a', 'b', 'b', 'i', 't']], which is closer to what I want except it's nested lists. Is there a way to do this and have just the letters be in letter_list?
Update. How about this:
word_list = ['cat','dog','rabbit']
new_list = [letter for letter in ''.join(word_list)]
new_list = sorted(set(new_list), key=new_list.index)
print(new_list)
Output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
word_list = ['cat','dog','rabbit']
letter_list = list(set([letter for word in word_list for letter in word]))
This works and removes the duplicate letters, but the order is not preserved. If you want to keep the order you can do this.
from collections import OrderedDict
word_list = ['cat','dog','rabbit']
letter_list = list(OrderedDict.fromkeys("".join(word_list)))
You can do it by using a list comprehension:
l = [j for i in word_list for j in i]
print(l)
output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't']
You can use a list comprehension. It is faster than looping in cases like yours when you call .append on each iteration, as explained by this answer.
But if you want to keep only unique letters (i.e. without repeating any letter), you can use a set comprehension by changing the square brackets [] to curly braces {} as in
letter_set = {letter for word in word_list for letter in word}
This way you avoid checking the partial list on every iteration to see if the letter is already part of the set. Instead you make use of Python's built-in hashing and make your code a lot faster.
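The for clauses in a comprehension are written in the same order as the equivalent nested loops, with the outer loop first. Putting the inner clause first is what produces the NameError from the question. A short side-by-side sketch (the names flat and flat_comp are just for illustration):

```python
word_list = ['cat', 'dog', 'rabbit']

# Nested loops: the loop over words is the outer one.
flat = []
for word in word_list:
    for letter in word:
        flat.append(letter)

# The comprehension lists its for clauses in that same order.
flat_comp = [letter for word in word_list for letter in word]

# Curly braces give a set, which drops duplicates but not in any fixed order.
letter_set = {letter for word in word_list for letter in word}
print(flat_comp)
print(letter_set)
```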
Another solution:
>>> s = set()
>>> word_list = ['cat', 'dog', 'rabbit']
>>> [c for word in word_list for c in word if (c not in s, s.add(c))[0]]
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
This will test whether the letter is already in the set or not, and it will unconditionally add it to the set (having no effect if it is already present). The None returned from s.add is stored in the temporary tuple but otherwise ignored. The first element of the temporary tuple (that is, the result of the c not in s) is used to filter the items.
This relies on the fact that the elements of the temporary tuple are evaluated from left to right.
Could be considered a bit hacky :-)

How can I get the list to split how i want automatically?

I have some code here:
lsp_rows = ['a', 'b', 'c', 'd', 'e', 'b', 'c', 'd', 'e', 'a', 'c',
'd', 'e', 'a', 'b', 'd', 'e', 'a', 'b', 'c', 'e', 'a',
'b', 'c', 'd']
n = int(width/length)
x = [a+b+c+d+e for a,b,c,d,e in zip(*[iter(lsp_rows)]*n)]
Currently, this will split my list "lsp_rows" in groups of 5 all the time as my n = 5. But I need it to split differently depending on "n" as it will change depending on the values of width and length.
So if n is 4, I need the list to split into groups of 4.
I can see that the problem is with the "a+b+c+d+e for a,b,c,d,e" part, and I don't know how to make this change without manual input. Is there a way to solve this?
If you could explain as thoroughly as possible I'd really appreciate it, as I'm pretty new to Python. Thanks in advance!
With strings only you can:
[''.join(t) for t in zip(*[iter(lsp_rows)]*n)]
Or slightly more succinct and possibly less memory usage:
map(''.join, zip(*[iter(lsp_rows)]*n))
The answer provided by @hpaulj is more useful in the general case.
And, on the off-chance that you're just trying to generate the cycles of a string, the following will produce the same output.
s = 'abcde'
[s[i:] + s[:i] for i in range(len(s))]
I believe this will generalize your expression to n items:
import functools
import operator
[functools.reduce(operator.add,abc) for abc in zip(*[iter(x)]*n)]
though I'd still like to see a test case.
For example if x is a list of lists, the result is a list of x flattened.
A list of numbers or a string look better:
In [394]: [functools.reduce(operator.add,abc) for abc in zip(*[iter('abcdefghij')]*4)]
Out[394]: ['abcd', 'efgh']
In [395]: [functools.reduce(operator.add,abc) for abc in zip(*[iter('abcdefghij')]*5)]
Out[395]: ['abcde', 'fghij']
In [396]: [functools.reduce(operator.add,abc) for abc in zip(*[iter(range(20))]*5)]
Out[396]: [10, 35, 60, 85]
with your list of characters
In [400]: [functools.reduce(operator.add,abc) for abc in zip(*[iter(lsp_rows)]*5)]
Out[400]: ['abcde', 'bcdea', 'cdeab', 'deabc', 'eabcd']
In [401]: [functools.reduce(operator.add,abc) for abc in zip(*[iter(lsp_rows)]*6)]
Out[401]: ['abcdeb', 'cdeacd', 'eabdea', 'bceabc']
All these imports can be replaced with join if the items are strings.
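For string items, the join-based version can be wrapped in a small helper. This is a sketch only; the function name chunks is an assumption, and the list is the one from the question written as a string for brevity:

```python
def chunks(items, n):
    # [it] * n repeats the *same* iterator n times, so zip pulls
    # n consecutive items from it for each tuple it builds.
    it = iter(items)
    return [''.join(group) for group in zip(*[it] * n)]

lsp_rows = list('abcdebcdeacdeabdeabceabcd')

print(chunks(lsp_rows, 5))
print(chunks(lsp_rows, 6))
```

Note that zip stops at the shortest input, so any items left over after the last full group are dropped, as in the 'abcdefghij' example with n = 4 above.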

Function that retrieves and returns letters from a list of lists

I'm writing a function that needs to go through a list of lists, collect all letters uppercase or lowercase and then return a list with 1 of each letter that it found in order. If the letter appears multiple times in the list of lists the function only has to report the first time it sees the letter.
For example, if the list of lists was [['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']] then the function output should return ["M","N","g","B"].
The code I have so far seems like it could work but it doesn't seem to be working. Any help is appreciated
def get_symbols(lot):
    symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    newlot = []
    for i in lot:
        if i == symbols:
            newlot.append(symbols)
            return newlot
        else:
            return None
To build on your existing code:
import string
def get_symbols(lot):
    symbols = string.ascii_lowercase + string.ascii_uppercase
    newlot = []
    for sublot in lot:
        for x in sublot:
            if x in symbols and x not in newlot:
                newlot.append(x)
    return newlot
print(get_symbols([['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']]))
Using string gets us the letters a little more neatly. We then loop over each list provided (each sublot of the lot), and then for each element (x), we check if it is both in our list of all letters and not in our list of found letters. If this is the case, we add it to our output.
There are a few things wrong with your code. You are using return in the wrong place, looping only over the outer list (not over the items in the sublists) and you were appending symbols to newlot instead of the matched item.
def get_symbols(lot):
    symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' # You should define this OUTSIDE of the function
    newlot = []
    for i in lot: # You are iterating over the outer list only here
        if i == symbols: # == does not check if an item is in a list, use `in` here
            newlot.append(symbols) # You are appending symbols which is the alphabet
            return newlot # This will cause your function to exit as soon as the first iteration is over
        else:
            return None # No need for this
You can use a double for loop and use in to check if the character is in symbols and isn't already in newlot:
l = [['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']]
symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def get_symbols(lot):
    newlot = []
    for sublist in lot:
        for i in sublist:
            if i in symbols and i not in newlot:
                newlot.append(i)
    return newlot
This is the output for your list:
>>> get_symbols(l)
['M', 'N', 'g', 'B']
This can also be done using chain, OrderedDict and isalpha, as follows:
>>> from collections import OrderedDict
>>> from itertools import chain
>>> data = [['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']]
>>> temp = OrderedDict.fromkeys(chain.from_iterable(data))
>>> [x for x in temp if x.isalpha()]
['M', 'N', 'g', 'B']
>>>
chain.from_iterable serves the same purpose as concatenating all the sublists into one.
As the order is relevant, OrderedDict serves the same purpose as a set by removing duplicates, with the added bonus of preserving the order of the first instance of each object added. The fromkeys class method creates a dictionary with the given keys and the same value for each, which by default is None; as we don't care about the values, for our purposes it is an ordered set.
Finally, isalpha tells you whether the string is a letter or not.
You can also take a look at the unique_everseen recipe. Because itertools is your best friend, I recommend putting all those recipes in a file that is always at hand; they are always helpful.
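For readers who have not seen that recipe, here is a simplified sketch of the idea. It leaves out the optional key argument of the version in the itertools documentation, so treat it as an approximation rather than the official recipe:

```python
from itertools import chain

def unique_everseen(iterable):
    # Yield each item the first time it is seen, preserving order.
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

data = [['.', 'M', 'M', 'N', 'N'], ['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.', 'g']]

# Flatten the sublists, keep only letters, then drop repeats.
letters = (x for x in chain.from_iterable(data) if x.isalpha())
symbols = list(unique_everseen(letters))
print(symbols)
```

Since it is a generator, it also works on inputs that are too large to hold in memory at once, as long as the set of distinct items fits.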

With variable length list of variable length strings, how do I create all combinations

In python 3, I have a variable length list, where each element of the list is a variable length string. Something like this
['TO', 'G', 'ZDO', 'DEO', 'SGT', 'D', 'Z', 'FT', 'OV']
and I want to iterate over every possible combination of words where the letters that make up the word are from the strings in the list, and the length of the word is the same as the length of the list. So something like this
TGZDSDZFO
TGZDSDZFV
TGZDSDZTO
...
OGOOTDZTO
OGOOTDZTV
I am having trouble coming up with a generic solution for n-sized list.
>>> (''.join(s) for s in itertools.product(*['TO', 'G', 'ZDO', 'DEO', 'SGT', 'D', 'Z', 'FT', 'OV']))
<generator object <genexpr> at 0x7f2a46468f00>
>>> # to demonstrate:
...
>>> list(itertools.islice((''.join(s) for s in itertools.product(*['TO', 'G', 'ZDO', 'DEO', 'SGT', 'D', 'Z', 'FT', 'OV'])), 3))
['TGZDSDZFO', 'TGZDSDZFV', 'TGZDSDZTO']
As others have suggested, itertools is perhaps the simplest/easiest way to solve this. If you are looking to write your own algorithm however (i.e. reimplement what itertools does under the hood), then take a look at this:
def allPerms(L, sofar=''):
    if not L:
        print(sofar)
    else:
        for char in L[0]:
            allPerms(L[1:], sofar+char)
Output:
In [97]: L = ['TO', 'G', 'ZDO', 'DEO', 'SGT', 'D', 'Z', 'FT', 'OV']
In [98]: allPerms(L)
TGZDSDZFO
TGZDSDZFV
TGZDSDZTO
TGZDSDZTV
TGZDGDZFO
TGZDGDZFV
TGZDGDZTO
TGZDGDZTV
TGZDTDZFO
TGZDTDZFV
TGZDTDZTO
TGZDTDZTV
TGZESDZFO
TGZESDZFV
TGZESDZTO
TGZESDZTV
TGZEGDZFO
TGZEGDZFV
TGZEGDZTO
TGZEGDZTV
--- truncated ---
EDIT:
As @njzk2 points out, python3's yield-from does a fantastic job of making the output usable:
def allPerms(L, sofar=''):
    if not L: yield sofar
    else:
        for char in L[0]: yield from allPerms(L[1:], sofar+char)
Output:
In [118]: for i in allPerms(L): print(i)
TGZDSDZFO
TGZDSDZFV
TGZDSDZTO
TGZDSDZTV
TGZDGDZFO
TGZDGDZFV
TGZDGDZTO
TGZDGDZTV
TGZDTDZFO
TGZDTDZFV
TGZDTDZTO
TGZDTDZTV
TGZESDZFO
TGZESDZFV
TGZESDZTO
TGZESDZTV
TGZEGDZFO
TGZEGDZFV
TGZEGDZTO
TGZEGDZTV
TGZETDZFO
TGZETDZFV
TGZETDZTO
--- truncated ---
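As a sanity check, the recursive generator can be compared with itertools.product on the list from the question. This is a sketch; the generator is renamed all_words here purely for illustration:

```python
from itertools import product

def all_words(parts, sofar=''):
    # Base case: no strings left, so the word built so far is complete.
    if not parts:
        yield sofar
    else:
        # Pick each character of the first string, then recurse on the rest.
        for char in parts[0]:
            yield from all_words(parts[1:], sofar + char)

L = ['TO', 'G', 'ZDO', 'DEO', 'SGT', 'D', 'Z', 'FT', 'OV']

recursive = list(all_words(L))
via_product = [''.join(p) for p in product(*L)]
print(len(recursive), recursive[:3])
```

Both approaches produce the words in the same order, and the total count is the product of the string lengths.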
You could use the itertools module to create the permutations of your desired length.
Combine all the words into one string and use it in the permutations function:
import itertools
lst = ['TO', 'G', 'ZDO', 'DEO', 'SGT', 'D', 'Z', 'FT', 'OV']
length = len(lst)
combined = ''.join(lst)
all_perms = itertools.permutations(combined, length)
# this will give you something like [('T', 'O', ...), (...),]
print([''.join(x) for x in all_perms])
