Splitting strings into lists and splitting again - python

I want to split the string
" 510 -9999999 9 99 12 5 [3, 0] [] [6] "
(which contains a more or less random amount of whitespace between the entries) into its component parts, including the lists within the string. I can get to this
['510', '-9999999', '9', '99', '12', '5', '[3,', '0]', '[]', '[6]']
through using split and replace. However, I then want to reconstitute the lists within the original string so that I can get to
['510', '-9999999', '9', '99', '12', '5', '[3,0]', '[]', '[6]'].
The real problem is that this string is one of many, and the lists may contain many components or none, so I have to deal with this in a general way.
I could potentially search for '[', then search for ']' to close up the list but, as I don't know the length of any of the lists going in, this seems an inefficient way of doing things.
Any help greatly appreciated!

There is always regex, but you can do it on the cheap like this:
>>> import shlex
>>> s = " 510 -9999999 9 99 12 5 [3, 0] [] [6] "
>>> shlex.split(s.replace('[','"[').replace(']',']"'))
['510', '-9999999', '9', '99', '12', '5', '[3, 0]', '[]', '[6]']
The proper solution would be to use the pyparsing module or, even better, to control the input source so that it gives you something more sensible like JSON.
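The regex route mentioned at the start of this answer could look roughly like the sketch below. It assumes the lists are never nested, and the name split_with_lists is made up for illustration.

```python
import re

def split_with_lists(s):
    # Hypothetical helper: a token is either a whole [...] group
    # (no nesting assumed) or a run of non-whitespace characters.
    tokens = re.findall(r'\[[^\]]*\]|\S+', s)
    # Drop the spaces inside bracketed tokens, e.g. '[3, 0]' -> '[3,0]'.
    # Other tokens contain no spaces, so replace() leaves them alone.
    return [t.replace(' ', '') for t in tokens]

s = " 510 -9999999 9 99 12 5 [3, 0] [] [6] "
print(split_with_lists(s))
# ['510', '-9999999', '9', '99', '12', '5', '[3,0]', '[]', '[6]']
```

The bracket alternative comes first in the pattern so that a '[' always starts a list token rather than a plain one.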

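If lists aren't nested, you can try this:
import re
def mysplit(a):
    # Strip the spaces inside every [...] group, then split on whitespace.
    # str.split() with no argument also ignores leading and trailing spaces,
    # which re.split(' +', ...) would turn into empty strings.
    return re.sub(r'\[(.*?)\]', lambda m: '[{}]'.format(m.group(1).replace(' ', '')), a).split()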

If lists can't be nested, then I think it is possible to preprocess the string with:
s = " 510 -9999999 9 99 12 5 [3, 0] [] [6] "
opened = False  # True while we are between '[' and ']'
s_new = ""
for i in s:
    if i == "[":
        opened = True
    if i == "]":
        opened = False
    # Keep every character except the spaces inside brackets.
    if not opened or i != " ":
        s_new += i
And then split it into a list:
l = s_new.split()
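If the lists could be nested after all, the same idea might be extended by replacing the boolean flag with a depth counter. This is only a sketch under that assumption; the function name and the nested sample input are invented for illustration.

```python
def split_keep_lists(s):
    # Hypothetical generalisation of the flag-based loop:
    # depth counts how many '[' are currently open.
    depth = 0
    kept = []
    for ch in s:
        if ch == '[':
            depth += 1
        elif ch == ']':
            depth = max(depth - 1, 0)  # tolerate a stray ']'
        # Skip spaces only while we are inside at least one list.
        if depth > 0 and ch == ' ':
            continue
        kept.append(ch)
    return ''.join(kept).split()

print(split_keep_lists(" 510 -9999999 9 99 12 5 [3, 0] [] [6] "))
# ['510', '-9999999', '9', '99', '12', '5', '[3,0]', '[]', '[6]']
print(split_keep_lists(" 510 [3, [1, 2]] [] "))
# ['510', '[3,[1,2]]', '[]']
```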

Related

How to add digit in the beginning of string array in python?

e.g. I have this array
list=['123', '4', '56']
I want to add '0' to the beginning of each element of the list that has only 1 or 2 digits, so the output will be:
list=['123', '004', '056']
Use zfill method:
In [1]: '1'.zfill(3)
Out[1]: '001'
In [2]: '12'.zfill(3)
Out[2]: '012'
Using a list comprehension, we can prepend the string '00' to each number in the list, then retain the final 3 characters only:
list = ['123', '4', '56']
output = [('00' + x)[-3:] for x in list]
print(output) # ['123', '004', '056']
As per How to pad zeroes to a string?, you should use str.zfill:
mylist = ['123', '4', '56']
output = [x.zfill(3) for x in mylist]
Alternatively, you could (though I don't know why you would) use str.rjust
output = [x.rjust(3, "0") for x in mylist]
or string formatting:
output = [f"{x:0>3}" for x in mylist]
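One difference between these options that may matter: str.zfill is aware of a leading sign, while rjust and the format spec above pad blindly. The sketch below assumes the strings might carry a minus sign, which the question's data does not; '-4' is a made-up input.

```python
values = ['4', '-4']  # '-4' is a made-up signed input, not from the question

# zfill inserts the zeros after the sign.
print([v.zfill(3) for v in values])       # ['004', '-04']
# rjust and the '0>3' format spec pad on the left of the whole string.
print([v.rjust(3, '0') for v in values])  # ['004', '0-4']
print([f"{v:0>3}" for v in values])       # ['004', '0-4']
```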
I'd say this is a very easy-to-read example, though not the shortest.
Basically, we iterate through the list's elements and check whether the length is 2 or 1. Based on that, we add the correct number of '0' characters:
lst=['123', '4', '56']
new = []
for numString in lst:
    if len(numString) == 2:
        new.append('0'*1+numString)
    elif len(numString) == 1:
        new.append('0'*2+numString)
    else:
        new.append(numString)
print(new)
I also had to include the list comprehension version, but it is barely readable, which is why I gave the example above:
lst=['123', '4', '56']
new = ['0'*1+numString if len(numString) == 2 else '0'*2+numString if len(numString) == 1 else numString for numString in lst]
print(new)
output
['123', '004', '056']
Another option is to convert each element to an integer, format it with leading zeros, and store the resulting string back at the same position in the same list:
lst = ["3", "45", "111"]
n = len(lst)
for i in range(0, n):
    lst[i] = f"{int(lst[i]):03}"

how to find the matching pattern for an input list and then replace the found pattern with the proper pattern conversion using python

Note that the trailing number in each pattern (for example, the 048 in FBXASC048) is meant to be the ASCII code of a digit (0-9).
Input example list: ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
Result example: ['0009Car', '5002Toy', '2004Human']
What is the proper way to search an input list for any of these patterns:
num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
and then replace each pattern found with the corresponding item in the conversion list (not at random, because each element in the pattern list corresponds to exactly one element in conv_list):
conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
This is the solution I have in mind. It has two parts.
1st part: find the ASCII codes [48, 49, 50, 51, 52, 53, 54, 55, 56, 57] and replace them with the matching decimal digits (0-9). This gives a new list, called input_modi_list, in which the ASCII codes have been replaced with digits.
2nd part: a second pass that removes the fixed prefix 'FBXASC0' using the replace function:
new_list3 = []
for x in input_modi_list:
    y = x.replace('FBXASC0', '')
    new_list3.append(y)
So new_list3 will hold the combined result of the two parts above.
I don't know whether there is a simpler or better solution, maybe using regex.
Also note that I have no idea how to replace ASCII codes with digits for a list of items.
I think this should do the trick:
import re
input_list = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
# Raw string, so that \d reaches the regex engine unchanged.
pattern = re.compile(r'FBXASC(\d{3,3})')
def decode(match):
    return chr(int(match.group(1)))
result = [re.sub(pattern, decode, item) for item in input_list]
print(result)
Now, there is some explanation due:
1- The pattern object is a regular expression that will match any part of a string that starts with 'FBXASC' and ends with 3 digits (0-9). (The \d means digit, and {3,3} means that it should occur at least 3 and at most 3 times, i.e. exactly 3 times.) Also, the parentheses around \d{3,3} mean that the three digits matched will be stored for later use (explained in the next part).
2- The decode function receives a match object, uses .group(1) to extract the first matched group (which in our case is the three digits matched by \d{3,3}), then uses the int function to parse the string into an integer (for example, converting '048' to 48), and finally uses the chr function to find which character has that ASCII code (for example, chr(48) will return '0', and chr(65) will return 'A').
3- The final part applies the re.sub function to all elements of the list, which replaces each occurrence of the pattern you described (FBXASC followed by three digits) with its corresponding ASCII character.
You can see that this solution is not limited only to your specific examples. Any number can be used as long as it has a corresponding ASCII character recognized by the chr function.
But, if you do want to limit it just to the 48-57 range, you can simply modify the decode function:
def decode(match):
    ascii_code = int(match.group(1))
    if ascii_code >= 48 and ascii_code <= 57:
        return chr(ascii_code)
    else:
        return match.group(0)  # returns the entire matched text - no modification
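To see what the restricted version changes, here is a small self-contained sketch. The input 'FBXASC065Name' is made up to show a code outside the 48-57 range, and the pattern is written as \d{3}, which is equivalent to \d{3,3}.

```python
import re

pattern = re.compile(r'FBXASC(\d{3})')

def decode(match):
    ascii_code = int(match.group(1))
    if 48 <= ascii_code <= 57:
        return chr(ascii_code)
    # Codes outside the digit range are left exactly as they were.
    return match.group(0)

# 'FBXASC065Name' is a hypothetical item whose code (65, 'A') is not a digit.
items = ['FBXASC048009Car', 'FBXASC065Name']
print([pattern.sub(decode, item) for item in items])
# ['0009Car', 'FBXASC065Name']
```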
This is how I would do it.
Make the regex pattern by simply joining the strings with |:
>>> num_ascii = ['FBXASC048', 'FBXASC049', 'FBXASC050', 'FBXASC051', 'FBXASC052', 'FBXASC053', 'FBXASC054', 'FBXASC055', 'FBXASC056', 'FBXASC057']
>>> conv_list = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> regex_pattern = '|'.join(num_ascii)
>>> regex_pattern
'FBXASC048|FBXASC049|FBXASC050|FBXASC051|FBXASC052|FBXASC053|FBXASC054|FBXASC055|FBXASC056|FBXASC057'
Make a look-up dictionary by simply zipping the two lists:
>>> conv_table = dict(zip(num_ascii, conv_list))
>>> conv_table
{'FBXASC048': '0', 'FBXASC049': '1', 'FBXASC050': '2', 'FBXASC051': '3', 'FBXASC052': '4', 'FBXASC053': '5', 'FBXASC054': '6', 'FBXASC055': '7', 'FBXASC056': '8', 'FBXASC057': '9'}
Iterate over the data and replace the matched string with the corresponding digit:
>>> import re
>>> result = []
>>> for item in ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']:
...     m = re.match(regex_pattern, item)
...     matched_string = m[0]
...     digit = (conv_table[matched_string])
...     print(f'replacing {matched_string} with {digit}')
...     result.append(item.replace(matched_string, digit))
...
replacing FBXASC048 with 0
replacing FBXASC053 with 5
replacing FBXASC050 with 2
>>> result
['0009Car', '5002Toy', '2004Human']
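Note that re.match only looks at the start of each string, and the loop above would fail on an item with no match, because m would be None. A possible variant, sketched here rather than taken from the answer, lets re.sub do the look-up through a callback; the convert function is a name invented for this example.

```python
import re

num_ascii = ['FBXASC0%d' % code for code in range(48, 58)]  # 'FBXASC048' .. 'FBXASC057'
conv_list = [str(digit) for digit in range(10)]
conv_table = dict(zip(num_ascii, conv_list))

# re.escape is not strictly needed for these strings, but it is a safe habit
# when building a pattern out of literal text.
regex = re.compile('|'.join(map(re.escape, num_ascii)))

def convert(item):
    # Hypothetical helper: replace every matched code with its digit;
    # items without a match are returned unchanged.
    return regex.sub(lambda m: conv_table[m.group(0)], item)

data = ['FBXASC048009Car', 'FBXASC053002Toy', 'FBXASC050004Human']
print([convert(item) for item in data])
# ['0009Car', '5002Toy', '2004Human']
```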

Separate each item of a list in an specific way

I have an input, which is a tuple of strings, encoded in the a1z26 cipher: numbers from 1 to 26 represent the letters of the alphabet, hyphens separate letters within the same word, and spaces separate words.
For example:
8-9 20-8-5-18-5 should translate to 'hi there'
Let's say that the last example is a tuple in a var called string
string = ('8-9','20-8-5-18-5')
The first thing I find logical is to convert the tuple into a list using
string = list(string)
so now
string = ['8-9','20-8-5-18-5']
The problem now is that when I iterate over the list to compare it with a dictionary that holds the translated values, double-digit numbers are processed one digit at a time. So, for example, instead of translating '20' it translates '2' and then '0', and the resulting string says 'hi bheahe' (2 = b, 1 = a and 8 = h).
So I need a way to convert the list above to the following list:
['8','-','9',' ','20','-','8','-','5','-','18','-','5']
I've already tried various approaches using list(), join() and split(), but they all end up giving me the same problem.
To sum up, I need to turn any given list (converted from the input tuple) into a list of tokens that takes double-digit numbers, spaces and hyphens into account.
This is what I've got so far (the last version I wrote). The input is further up in the code (string):
a1z26 = {'1':'A', '2':'B', '3':'C', '4':'D', '5':'E', '6':'F', '7':'G', '8':'H', '9':'I', '10':'J', '11':'K', '12':'L', '13':'M', '14':'N', '15':'O', '16':'P', '17':'Q', '18':'R', '19':'S', '20':'T', '21':'U', '22':'V', '23':'W', '24':'X', '25':'Y', '26':'Z', '-':'', ' ' : ' ', ', ' : ' '}
translation = ""
code = list(string)
numbersarray1 = code
numbersarray2 = ', '.join(numbersarray1)
for char in numbersarray2:
    if char in a1z26:
        translation += a1z26[char]
There's no need to convert the tuple to a list. Tuples are iterable too.
I don't think the list you name is what you actually want. You probably want a 2d iterable (not necessarily a list, as you'll see below we can do this in one pass without generating an intermediary list), where each item corresponds to a word and is a list of the character numbers:
[[8, 9], [20, 8, 5, 18, 5]]
From this, you can convert each number to a letter, join the letters together to form the words, then join the words with spaces.
To do this, you need to pass a parameter to split, to tell it how to split your input string. You can achieve all of this with a one liner:
plaintext = ' '.join(''.join(num_to_letter[int(num)] for num in word.split('-'))
                     for word in ciphertext.split(' '))
This does exactly the splitting procedure as described above, and then for each number looks into the dict num_to_letter to do the conversion.
Note that you don't even need this dict. You can use the fact that A-Z in unicode is contiguous so to convert 1-26 to A-Z you can do chr(ord('A') + num - 1).
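A minimal sketch of that dict-free variant, assuming the input has already been joined into a single space-separated string (for the tuple from the question, ' '.join(string) would do that). The function name a1z26_decode is made up for illustration.

```python
def a1z26_decode(ciphertext):
    # Hypothetical helper: split into words on spaces, into numbers on
    # hyphens, and map 1..26 onto 'A'..'Z' via their code points.
    return ' '.join(
        ''.join(chr(ord('A') + int(num) - 1) for num in word.split('-'))
        for word in ciphertext.split(' ')
    )

print(a1z26_decode('8-9 20-8-5-18-5'))              # HI THERE
print(a1z26_decode(' '.join(('8-9', '20-8-5-18-5'))))  # HI THERE
```

Because each number is taken whole from the split, '20' is never broken into '2' and '0'.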
You don't really need the hyphens, am I right?
I suggest the following approach:
a = '- -'.join(string).split('-')
Now a is ['8', '9', ' ', '20', '8', '5', '18', '5']
You can then convert each number to the proper character using your dictionary
b = ''.join([a1z26[i] for i in a])
Now b is equal to HI THERE
I think it's better to use regular expressions here.
Example:
import re
...
src = ('8-9', '20-8-5-18-5')
res = [match for tmp in src for match in re.findall(r"([0-9]+|[^0-9]+)", tmp + " ")][:-1]
print(res)
Result:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Here is a solution using regex:
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
print(data)
output
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
If you want to get 'hi there' from the input string, here is a method (I am assuming the output letters should all be uppercase):
import re
string = '8-9 20-8-5-18-5'
exp = re.compile(r'[0-9]+|[^0-9]+')
data = exp.findall(string)
new_str = ''
for token in data:
    if token.isdigit():
        # 1 -> 'A', 2 -> 'B', ... because ord('A') is 65
        new_str += chr(int(token) + 64)
    else:
        new_str += token
result = new_str.replace('-', '')
print(result)
output:
HI THERE
You could also try this itertools solution:
from itertools import chain
from itertools import zip_longest
def separate_list(lst, delim, sep=" "):
    result = []
    for x in lst:
        chars = x.split(delim)  # 1
        pairs = zip_longest(chars, [delim] * (len(chars) - 1), fillvalue=sep)  # 2, 3
        result.extend(list(chain.from_iterable(pairs)))  # 4
    return result[:-1]  # 5
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
Output:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Explanation of the above code (the numbers match the comments):
1. Split each string by the delimiter '-'.
2. Create the interspersing delimiters.
3. Create pairs of characters and separators with itertools.zip_longest.
4. Extend the result list with the flattened pairs using itertools.chain.from_iterable.
5. Remove the trailing ' ' that was added to the result list.
You could also create your own intersperse generator function and apply it twice:
from itertools import chain
def intersperse(iterable, delim):
    it = iter(iterable)
    yield next(it)
    for x in it:
        yield delim
        yield x
def separate_list(lst, delim, sep=" "):
    return list(
        chain.from_iterable(
            intersperse(
                (intersperse(x.split(delim), delim=delim) for x in lst), delim=[sep]
            )
        )
    )
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
# ['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']

How to remove specific strings from a list

From the following list, how can I remove the elements ending with Text?
My expected result is a=['1,2,3,4']
My list is a=['1,2,3,4,5Text,6Text']
Should I use endswith to go about this problem?
Split on commas, then filter on strings that are only digits:
a = [','.join(v for v in a[0].split(',') if v.isdigit())]
Demo:
>>> a=['1,2,3,4,5Text,6Text']
>>> [','.join(v for v in a[0].split(',') if v.isdigit())]
['1,2,3,4']
It looks as if you really wanted to work with lists of more than one element though, at which point you could just filter:
a = ['1', '2', '3', '4', '5Text', '6Text']
a = list(filter(str.isdigit, a))  # list() is needed on Python 3, where filter returns an iterator
or, using a list comprehension (more suitable for Python 3 too):
a = ['1', '2', '3', '4', '5Text', '6Text']
a = [v for v in a if v.isdigit()]
Use str.endswith to filter out such items:
>>> a = ['1,2,3,4,5Text,6Text']
>>> [','.join(x for x in a[0].split(',') if not x.endswith('Text'))]
['1,2,3,4']
Here str.split splits the string at ',' and returns a list:
>>> a[0].split(',')
['1', '2', '3', '4', '5Text', '6Text']
Now filter out items from this list and then join them back using str.join.
Try this. It works whatever text the items end with.
a = ['1,2,3,4,5Text,6Text']
a = a[0].split(',')
li = []
for v in a:
    try:
        li.append(int(v))  # keep only the items that parse as integers
    except ValueError:
        pass
print(li)

How can I split a list into smaller lists?

I have a list:
lists = (['1','2','3','S','3','4','S','4','6','7'])
And I want to split the list into smaller lists every time 'S' appears, dropping the 'S', to get something like:
([['1','2','3'],['3','4'],['4','6','7']])
My code:
def x (lists):
    empty = ''
    list = []
    for x in lists:
        if x == empty:
            list[-1:].append(x)
        else:
            list.append([x])
    return (list)
I tried something like this, but I am quite new to Python and I'm getting nowhere. Nothing fancy please; how would I fix what I have?
Try itertools.groupby():
>>> from itertools import groupby
>>> lists = ['1','2','3','S','3','4','S','4','6','7']
>>> [list(g[1]) for g in groupby(lists, lambda i:i!='S') if g[0]]
[['1', '2', '3'], ['3', '4'], ['4', '6', '7']]
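Maybe something like list(map(list, ''.join(lists).split('S'))) (on Python 3, map returns an iterator, hence the list() call).
Alternatively, [list(s) for s in ''.join(lists).split('S')]
Both rely on every element being a single character.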
Well, may be funny, but this should work:
[s.split('#') for s in '#'.join(lists).split('#S#')]
Instead of the '#' any character can be used if it's unlikely to appear in lists.
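Since the question asks for nothing fancy, a plain loop close in spirit to the asker's attempt might look like the sketch below. The function name split_on and its sep parameter are made up for illustration.

```python
def split_on(items, sep='S'):
    # Hypothetical helper: start with one empty sub-list and open
    # a new one every time the separator shows up.
    result = [[]]
    for item in items:
        if item == sep:
            result.append([])
        else:
            result[-1].append(item)
    return result

lists = ['1', '2', '3', 'S', '3', '4', 'S', '4', '6', '7']
print(split_on(lists))
# [['1', '2', '3'], ['3', '4'], ['4', '6', '7']]
```

Unlike the groupby version, this keeps empty sub-lists when 'S' appears first, last, or twice in a row; whether that is wanted depends on the data.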
