Getting strings from list using python - python

Hi, I am new to Python. I am trying to delete some unwanted characters and reformat my lists.
My lists are:
List =
['2', '4a.', 'D', '__|5.', 'E|6.', 'F', '|7.', 'G', '—|8.']
['9', '10.', "QRS(q,r", 's)', '11.', 'TUV/', '12.', "XYZ:"]
I want to get the lists as follows:
['D', 'E', 'F', 'G']
["QRS(q,r,s)", 'TUV/', "XYZ:"]
Here I want to delete the numbers and the alphanumeric items.
There are two challenges:
In the first list I have 'E|6.' and I want to keep only the string 'E'.
In the second list I have "QRS(q,r", 's)' and I want them joined into the single string "QRS(q,r,s)".
Can anyone please help me out? Thanks in advance.

First you will need to differentiate between a special character and an alphabetic character. For this you need a fixed set of alphabetic characters:
import string
string.ascii_lowercase  # the lowercase ASCII letters 'a' to 'z'
Then you will need to iterate through each string in the lists, character by character, and keep the letters (upper case or lower case).
import string
List = ['2', '4a.', 'D', '__|5.', 'E|6.', 'F', '|7.', 'G', '—|8.'],['9', '10.', "QRS(q,r", 's)', '11.', 'TUV/', '12.', "XYZ:"]
what_you_need = []
for h in List:          # each of the two lists
    for i in h:         # each string in the list
        for j in i:     # each character in the string
            if j.upper() in string.ascii_lowercase.upper():
                what_you_need.append(j)
print(what_you_need)

You can try a regular expression.
See the example below:
import re
l1 = []
a = ['2', '4a.', 'D', '__|5.', 'E|16.', 'F', '|7.', 'G', '—|8.']
for idx, ele in enumerate(a):
    if '(' in ele:
        # join a string split inside parentheses with the piece that follows it
        l1.append(ele + ',' + a[idx+1])
        continue
    elif ')' in ele:
        # closing piece, already joined above
        pass
    elif any([i.isdigit() for i in ele]):
        # numbered item: keep the first capital letter, if there is one
        g = re.findall(r"([A-Z])", ele)
        if g:
            l1.append(g[0])
    else:
        l1.append(ele)
In the same way you can prepare a regular expression for the other list.
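A minimal sketch that handles both lists from the question in one pass. The function name clean_tokens is hypothetical, and the sketch assumes two things that the sample data suggests but the question does not state: a string was split at a comma whenever an opening parenthesis is left unclosed, and the wanted part of a numbered item is a run of capital letters at its start.

```python
import re

def clean_tokens(tokens):
    """Keep letter-bearing items; re-join pieces split inside parentheses."""
    merged = []
    for tok in tokens:
        # An unclosed "(" in the previous item means this item continues it.
        if merged and merged[-1].count("(") > merged[-1].count(")"):
            merged[-1] += "," + tok
        else:
            merged.append(tok)

    result = []
    for tok in merged:
        if any(ch.isdigit() for ch in tok):
            # Numbered item such as 'E|6.': keep only leading capitals, if any.
            m = re.match(r"[A-Z]+", tok)
            if m:
                result.append(m.group())
        elif any(ch.isalpha() for ch in tok):
            result.append(tok)
    return result

first = ['2', '4a.', 'D', '__|5.', 'E|6.', 'F', '|7.', 'G', '\u2014|8.']
second = ['9', '10.', "QRS(q,r", 's)', '11.', 'TUV/', '12.', "XYZ:"]
print(clean_tokens(first))   # ['D', 'E', 'F', 'G']
print(clean_tokens(second))  # ['QRS(q,r,s)', 'TUV/', 'XYZ:']
```

If the real data contains numbered items with lower-case text that should be kept, the `[A-Z]+` pattern would need to be adjusted.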

Related

Replace numbers with letters and offer all permutations

I need to determine all possible letter combinations of a string that has numbers when converting numbers into possible visually similar letters.
Using the dictionary:
number_appearance = {
    '1': ['l', 'i'],
    '2': ['r', 'z'],
    '3': ['e', 'b'],
    '4': ['a'],
    '5': ['s'],
    '6': ['b', 'g'],
    '7': ['t'],
    '8': ['b'],
    '9': ['g', 'p'],
    '0': ['o', 'q']}
I want to write a function that takes an input and creates all possible letter combinations. For example:
import re
text = 'l4t32'
def convert_numbers(text):
    return re.sub('[0-9]', lambda x: number_appearance[x[0]][0], text)
I want the output to be a list with all possible permutations:
['later', 'latbr', 'latbz', 'latez']
The function above works if you are just grabbing the first letter in each list from number_appearance, but I'm trying to figure out the best way to iterate through all possible combinations. Any help would be much appreciated!
As an upgrade from your own answer, I suggest the following:
import itertools
def convert_numbers(text):
    # digits map to their look-alike letters; any other character maps to itself
    all_items = [number_appearance.get(char, [char]) for char in text]
    return [''.join(elem) for elem in itertools.product(*all_items)]
The improvements are that:
it doesn't convert text to a list (there is no need for that)
you don't need regex
it will still work if you decide instead that you also want to add other characters on top of numbers
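For reference, a self-contained run of the convert_numbers function suggested in this answer, using the number_appearance dictionary from the question. Nothing here is new apart from the print call; note that the results come out in the order itertools.product generates them, which differs from the order listed in the question.

```python
import itertools

number_appearance = {
    '1': ['l', 'i'], '2': ['r', 'z'], '3': ['e', 'b'], '4': ['a'],
    '5': ['s'], '6': ['b', 'g'], '7': ['t'], '8': ['b'],
    '9': ['g', 'p'], '0': ['o', 'q'],
}

def convert_numbers(text):
    # Each character contributes a list of choices: look-alike letters for
    # a digit, or the character itself for anything else.
    all_items = [number_appearance.get(char, [char]) for char in text]
    # product picks one choice per position, in every possible combination.
    return [''.join(elem) for elem in itertools.product(*all_items)]

print(convert_numbers('l4t32'))
# ['later', 'latez', 'latbr', 'latbz']
```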
import itertools
import re
def convert_num_appearance(text):
    string_characters = [character for character in text]
    all_items = []
    for item in string_characters:
        if re.search('[a-zA-Z]', item):
            all_items.append([item])
        elif re.search(r'\d', item):
            all_items.append(number_appearance[item])
    return [''.join(elem) for elem in itertools.product(*all_items)]
I would break down the problem like so:
First, create a function that can do the replacement for a given set of replacement letters. My input specification is a sequence of letters, where the first letter is the replacement for the '0' character, next for 1 etc. This allows me to use the index in that sequence to determine the character being replaced, while generating a plain sequence rather than a dict or other complex structure. To do the replacement, I will use the built-in translate method of the original string. That requires a dictionary as described in the documentation, which I can easily build with a dict comprehension, or with the provided helper method str.maketrans (a static method of the str type).
Use itertools.product to generate those sequences.
Use a list comprehension to apply the replacement for each sequence.
Thus:
from itertools import product
def replace_digits(original, replacement):
    # Equivalent dict comprehension:
    # translation = {ord(str(i)): c for i, c in enumerate(replacement)}
    translation = str.maketrans('0123456789', ''.join(replacement))
    # print(translation)  # uncomment to inspect the mapping
    return original.translate(translation)
replacements = product(
    ['o', 'q'], ['l', 'i'], ['r', 'z'], ['e', 'b'], ['a'],
    ['s'], ['b', 'g'], ['t'], ['b'], ['g', 'p']
)
[replace_digits('14732', r) for r in replacements]
(You will notice there are duplicates in the result; this is because of variant replacements for symbols that don't appear in the input.)
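One way to drop those duplicates while keeping first-seen order is to pass the results through dict.fromkeys, which relies on dictionaries preserving insertion order (guaranteed from Python 3.7). This is a sketch layered on the answer's replace_digits; the names DIGIT_OPTIONS and unique_variants are hypothetical.

```python
from itertools import product

# Replacement letters for '0' through '9', in that order.
DIGIT_OPTIONS = [
    ['o', 'q'], ['l', 'i'], ['r', 'z'], ['e', 'b'], ['a'],
    ['s'], ['b', 'g'], ['t'], ['b'], ['g', 'p'],
]

def replace_digits(original, replacement):
    # Map each digit character to the letter chosen for it.
    translation = str.maketrans('0123456789', ''.join(replacement))
    return original.translate(translation)

def unique_variants(text):
    variants = (replace_digits(text, r) for r in product(*DIGIT_OPTIONS))
    # dict keys are unique and keep insertion order, so this de-duplicates.
    return list(dict.fromkeys(variants))

print(len(unique_variants('14732')))  # 8 distinct strings out of 64 generated
```

The full product is still generated, so this trades some wasted work for simplicity; building the product only over the digits that occur in the input avoids the duplicates in the first place.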

How to split a string by spaces and remove non-ASCII characters?

When I am given a string like "Ready[[[, steady, go!", I want to turn it into a list like this: [Ready, steady, go!]. Currently, the best I could do is two list comprehensions, but I couldn't figure out a way to combine them.
text_list = [i for i in text.split()]
output: ['Ready[[[,', 'steady,', 'go!']
clean_list = [x for x in list(text) if x in string.ascii_letters]
output: ['R', 'e', 'a', 'd', 'y', 's', 't', 'e', 'a', 'd', 'y', 'g', 'o']
clean_list does succeed in removing non-ASCII letters but literally turns every single character into a list element. text_list keeps the format intact but does not remove non-ASCII characters. How do I combine the two logics to give me the output that I want?
This should work:
import re, string
# filter out all unwanted characters using regex
pattern = re.compile(f"[^{string.ascii_letters} !]")
filtered = pattern.sub('', "Ready[[[, steady, go!")
# split
result = filtered.split()
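The two comprehensions from the question can also be combined directly, without a regex: split first, then filter the characters inside each piece. A minimal sketch, assuming that letters and "!" are the only characters worth keeping (as in the expected output); the names ALLOWED and split_and_clean are hypothetical.

```python
import string

# Characters to keep: ASCII letters plus the exclamation mark.
ALLOWED = set(string.ascii_letters + "!")

def split_and_clean(text):
    # Split on whitespace, then drop disallowed characters from each piece.
    cleaned = ("".join(ch for ch in token if ch in ALLOWED)
               for token in text.split())
    # Discard pieces that end up empty (for example a lone ",").
    return [token for token in cleaned if token]

print(split_and_clean("Ready[[[, steady, go!"))
# ['Ready', 'steady', 'go!']
```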

How do I create a new list with a nested list comprehension?

Say I have a list of words
word_list = ['cat','dog','rabbit']
and I want to end up with a list of letters (not including any repeated letters), like this:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
Without a list comprehension the code would look like this:
letter_list=[]
for a_word in word_list:
    for a_letter in a_word:
        if a_letter not in letter_list:
            letter_list.append(a_letter)
print(letter_list)
Is there a way to do this with a list comprehension?
I have tried
letter_list = [a_letter for a_letter in a_word for a_word in word_list]
but I get a
NameError: name 'a_word' is not defined
error. I have seen answers for similar problems, but they usually iterate over a nested collection (list or tuple). Is there a way to do this from a non-nested list like a_word?
Trying
letter_list = [a_letter for a_letter in [a_word for a_word in word_list]]
Results in the initial list: ['cat','dog','rabbit']
And trying
letter_list = [[a_letter for a_letter in a_word] for a_word in word_list]
Results in: [['c', 'a', 't'], ['d', 'o', 'g'], ['r', 'a', 'b', 'b', 'i', 't']], which is closer to what I want except that it is a list of nested lists. Is there a way to do this and have just the letters be in letter_list?
Update. How about this:
word_list = ['cat','dog','rabbit']
new_list = [letter for letter in ''.join(word_list)]
new_list = sorted(set(new_list), key=new_list.index)
print(new_list)
Output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
word_list = ['cat','dog','rabbit']
letter_list = list(set([letter for word in word_list for letter in word]))
This works and removes the duplicate letters, but the order is not preserved. If you want to keep the order you can do this.
from collections import OrderedDict
word_list = ['cat','dog','rabbit']
letter_list = list(OrderedDict.fromkeys("".join(word_list)))
You can do it with a list comprehension (this version keeps the repeated letters):
l=[j for i in word_list for j in i ]
print(l)
output:
['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't']
You can use a list comprehension. It is faster than looping in cases like yours when you call .append on each iteration, as explained by this answer.
But if you want to keep only unique letters (i.e. without repeating any letter), you can use a set comprehension by changing the square brackets [] to curly braces {}, as in
letter_set = {letter for word in word_list for letter in word}
This way you avoid checking the partial list on every iteration to see if the letter is already present. Instead you make use of Python's built-in hashing and make your code a lot faster. Note that a set does not preserve the order of the letters.
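The order of the for clauses is what caused the NameError in the question: in a comprehension they are read left to right, exactly like nested loops written top to bottom, so the outer loop must come first. A short illustration using the question's word_list (the variable names flat and looped are only for this example):

```python
word_list = ['cat', 'dog', 'rabbit']

# Outer loop first, inner loop second, same as the nested for statements.
flat = [letter for word in word_list for letter in word]

# The equivalent explicit loops, for comparison.
looped = []
for word in word_list:
    for letter in word:
        looped.append(letter)

# The same clause order applies to a set comprehension.
letter_set = {letter for word in word_list for letter in word}

print(flat == looped)      # True
print(sorted(letter_set))  # ['a', 'b', 'c', 'd', 'g', 'i', 'o', 'r', 't']
```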
Another solution:
>>> s = set()
>>> word_list = ['cat', 'dog', 'rabbit']
>>> [c for word in word_list for c in word if (c not in s, s.add(c))[0]]
['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']
This will test whether the letter is already in the set or not, and it will unconditionally add it to the set (having no effect if it is already present). The None returned from s.add is stored in the temporary tuple but otherwise ignored. The first element of the temporary tuple (that is, the result of the c not in s) is used to filter the items.
This relies on the fact that the elements of the temporary tuple are evaluated from left to right.
Could be considered a bit hacky :-)

Function that retrieves and returns letters from a list of lists

I'm writing a function that needs to go through a list of lists, collect all letters uppercase or lowercase and then return a list with 1 of each letter that it found in order. If the letter appears multiple times in the list of lists the function only has to report the first time it sees the letter.
For example, if the list of lists was [['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']] then the function output should return ["M","N","g","B"].
The code I have so far seems like it could work but it doesn't seem to be working. Any help is appreciated
def get_symbols(lot):
    symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    newlot = []
    for i in lot:
        if i == symbols:
            newlot.append(symbols)
            return newlot
        else:
            return None
To build on your existing code:
import string
def get_symbols(lot):
    symbols = string.ascii_lowercase + string.ascii_uppercase
    newlot = []
    for sublot in lot:
        for x in sublot:
            if x in symbols and x not in newlot:
                newlot.append(x)
    return newlot
print(get_symbols([['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']]))
Using string gets us the letters a little more neatly. We then loop over each list provided (each sublot of the lot), and then for each element (x), we check if it is both in our list of all letters and not in our list of found letters. If this is the case, we add it to our output.
There are a few things wrong with your code. You are using return in the wrong place, looping only over the outer list (not over the items in the sublists) and you were appending symbols to newlot instead of the matched item.
def get_symbols(lot):
    symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' # You should define this OUTSIDE of the function
    newlot = []
    for i in lot: # You are iterating over the outer list only here
        if i == symbols: # == does not check if an item is in a list, use `in` here
            newlot.append(symbols) # You are appending symbols which is the alphabet
            return newlot # This will cause your function to exit as soon as the first iteration is over
        else:
            return None # No need for this
You can use a double for loop and use in to check if the character is in symbols and isn't already in newlot:
l = [['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']]
symbols = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
def get_symbols(lot):
    newlot = []
    for sublist in lot:
        for i in sublist:
            if i in symbols and i not in newlot:
                newlot.append(i)
    return newlot
This is the output for your list:
>>> get_symbols(l)
['M', 'N', 'g', 'B']
This can also be done using chain, OrderedDict and isalpha, as follows:
>>> from collections import OrderedDict
>>> from itertools import chain
>>> data = [['.', 'M', 'M', 'N', 'N'],['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.','g']]
>>> temp = OrderedDict.fromkeys(chain.from_iterable(data))
>>> [x for x in temp if x.isalpha()]
['M', 'N', 'g', 'B']
>>>
chain.from_iterable serves the same purpose as concatenating all the sublists into one.
As the order is relevant, OrderedDict serves the same purpose as a set by removing duplicates, with the added bonus of preserving the order in which each object was first added. The fromkeys class method creates a dictionary with the given keys and the same value for each, which by default is None; as we don't care about the values, for our purpose it is an ordered set.
Finally, isalpha tells you whether the string is a letter or not.
You can also take a look at the unique_everseen recipe. Because itertools is your best friend, I recommend putting all those recipes in a file that is always at hand; they are always helpful.
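A sketch of how the unique_everseen idea applies here. The generator below is a simplified version of the recipe in the itertools documentation (without its key argument), written out so the example is self-contained; get_symbols reuses the name from the question.

```python
from itertools import chain

def unique_everseen(iterable):
    """Yield each element the first time it is seen, preserving order."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

def get_symbols(lot):
    # Flatten the sublists lazily and keep only alphabetic strings.
    letters = (x for x in chain.from_iterable(lot) if x.isalpha())
    return list(unique_everseen(letters))

data = [['.', 'M', 'M', 'N', 'N'], ['.', '.', '.', '.', 'g'], ['B', 'B', 'B', '.', 'g']]
print(get_symbols(data))  # ['M', 'N', 'g', 'B']
```

Unlike the `x not in newlot` check on a list, the membership test against a set stays fast as the number of distinct symbols grows.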

Split list into list of lists by regex

I want to split a character list into a list of lists, where the split point is defined by successful Regex match.
For instance, say I have an input list:
["file1","A","B","C","file2","D","E","F","G","H","I"]
I want to produce:
[["file1","A","B","C"],["file2","D","E","F","G","H","I"]]
Where the split points, being file1 and file2 were identified by a successful match to
re.search("file[0-9]+",<TEST STRING>)
It is NOT known in advance, the number of items between each split point, nor is it known how many 'fileXXX' terms are in the original vector.
In reality, my regex matches are a lot more complicated than this, but that is not the concern. What I need help with, if someone would be so kind, is the Pythonic way to execute the split logic.
This assumes the first element is a proper header. If not, you will need to add some defensive checks.
import re
result = []
pattern = re.compile(r'^file.*')
for el in input_list:
    if pattern.match(el):
        # a header starts a new row
        row = []
        result.append(row)
    row.append(el)
The following should work quite nicely:
import re
input_list = ["file1","A","B","C","file2","D","E","F","G","H","I"]
output_list = []
for item in input_list:
    if re.match("file[0-9]+", item):
        output_list.append([item])
    else:
        output_list[-1].append(item)
print(output_list)
Gives the following result:
[['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
Note, this assumes the first item is a match.
Update
A second approach could be:
input_list = ["1", "2", "file1","A","B","C","file2","D","E","F","G","H","I"]
output_list = []
for item in input_list:
    if re.match("file[0-9]+", item) or len(output_list) == 0:
        output_list.append([item])
    else:
        output_list[-1].append(item)
print(output_list)
This would also cope with the non initial match case:
[['1', '2'], ['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
You can find the indexes of the items matching file\d:
indeces = list(i for i,val in enumerate(my_list) if match(r'file\d', val))
And then simply group by these indexes (the slicing below assumes exactly two split points):
output = [my_list[indeces[0]:indeces[1]], my_list[indeces[1]:]]
>>> from re import match
>>> my_list = ["file1","A","B","C","file2","D","E","F","G","H","I"]
>>> indeces = list(i for i,val in enumerate(my_list) if match(r'file\d', val))
>>> [my_list[indeces[0]:indeces[1]], my_list[indeces[1]:]]
[['file1', 'A', 'B', 'C'], ['file2', 'D', 'E', 'F', 'G', 'H', 'I']]
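Since the question says the number of split points is not known in advance, the index approach can be generalized by pairing each matching index with the next one. A minimal sketch; the function name split_at_matches and its default pattern are assumptions for illustration. Items that appear before the first match are dropped.

```python
import re

def split_at_matches(items, pattern=r"file\d+"):
    # Indexes of the items that start a new group.
    starts = [i for i, val in enumerate(items) if re.match(pattern, val)]
    # Each group ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(items)]
    return [items[s:e] for s, e in zip(starts, ends)]

my_list = ["file1", "A", "B", "C", "file2", "D", "E", "file3", "F"]
print(split_at_matches(my_list))
# [['file1', 'A', 'B', 'C'], ['file2', 'D', 'E'], ['file3', 'F']]
```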
