Split and retain character of split python - python

I want to split a string based on four delimiters [a, p, :, -] and additionally convert substrings that contain only digits into integers.
import re
DATA = "12:30pm-12:00am"
print (re.split('[-:ap]',DATA))
Input string : "12:30pm-12:00am"
Desired output array:
[ 12, ":", 30, "pm", "-", 12, ":", 00, "am"]
[Full disclosure] This is from a coderbyte challenge. I am sorry if this is so noob it offends you, thank you for your patience.

Start with this; it gives you the desired tokens, as strings (in Python 3, filter returns an iterator, hence the list call):
list(filter(None, re.split('(-|:|am|pm)', '12:30pm-12:00am')))
['12', ':', '30', 'pm', '-', '12', ':', '00', 'am']
Note that the input is a string, so the numbers come back as strings, whereas your post shows them as integers.
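To get integers as in the question's desired output, one more pass over the tokens is enough. This is a minimal sketch, not the only way to do it; the helper name split_keep_delims is made up for illustration. Note that int('00') is 0, so the minutes come back as 0 rather than 00.

```python
import re

def split_keep_delims(text):
    """Split on -, :, am, pm, keep the delimiters, and convert digit runs to int."""
    # The capturing group makes re.split return the delimiters as well.
    tokens = [t for t in re.split(r'(-|:|am|pm)', text) if t]
    return [int(t) if t.isdigit() else t for t in tokens]

print(split_keep_delims("12:30pm-12:00am"))
# [12, ':', 30, 'pm', '-', 12, ':', 0, 'am']
```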

Related

Extract digits from string with consecutive digit characters

I cannot use regular expressions or libraries :(. I need to extract all digits from an alphanumeric string. Each consecutive sequence of digits (call it a "temperature") is preceded by a +, -, or * and is treated as a single number (all are integers, no floats). Other non-digit characters in the string can be ignored. I need to extract each "temperature" into a data structure.
Example String "BARN21+77-48CDAIRY87+56-12" yields [21, 77, 48, 87, 56, 12]
The data string can be many orders of magnitude larger.
All solutions I can find assume there is only one sequence of digits (temperature) in the string, or that the temperatures are separated by a space/delimiter. I was able to get it working by iterating through the string, adding a space before and after each digit sequence, and then using split, but that feels like cheating. I wonder if you professionals distort data for a happy solution??
incoming data "BARN21+77-48CDAIRY87+56-12"
temp is what I change data to
temp = "BARN* 21 + 77 - 48 DAIRY* 87 + 56 - 12"
result = [int(i)
for i in temp.split()
if i.isdigit()]
print("The result ", result)
The result [21, 77, 48, 87, 56, 12]
Here is a version which does not use regular expressions:
inp = "BARN21+77-48CDAIRY87+56-12"
inp = ''.join(' ' if not ch.isdigit() else ch for ch in inp).strip()
nums = inp.split()
print(nums) # ['21', '77', '48', '87', '56', '12']
If regex is available to you, you can use re.findall with the pattern \d+:
import re
inp = "BARN21+77-48CDAIRY87+56-12"
nums = re.findall(r'\d+', inp)
print(nums) # ['21', '77', '48', '87', '56', '12']
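If you would rather not rewrite the string at all, a single pass that accumulates digits also works and returns integers directly. This is only a sketch under the assumption that the input uses plain ASCII digits; the function name extract_ints is hypothetical.

```python
def extract_ints(text):
    """Collect each run of consecutive ASCII digits as one integer."""
    numbers = []
    current = ""  # digits of the number currently being read
    for ch in text:
        if ch in "0123456789":
            current += ch
        elif current:
            # A non-digit ends the current run.
            numbers.append(int(current))
            current = ""
    if current:
        # The string ended while still inside a run of digits.
        numbers.append(int(current))
    return numbers

print(extract_ints("BARN21+77-48CDAIRY87+56-12"))
# [21, 77, 48, 87, 56, 12]
```

It makes one pass over the data and builds no intermediate copy of the string, which may matter if the input is very large.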

python split string using alphabet to get only numerical values

Using python 3.8
Given str = A11B11C32D34,....
I want to split it into [11, 11, 32, 34 ...], that is, split on the letters. How could I do this?
Thanks in advance!
Check with
s= 'A11B11C32D34'
s
Out[388]: 'A11B11C32D34'
import re
re.findall(r'\d+', s)
Out[390]: ['11', '11', '32', '34']
I might also suggest using a regex split approach here:
inp = "A11B11C32D34"
nums = [x for x in re.split(r'\D+', inp) if x]
print(nums) # ['11', '11', '32', '34']
The idea here is to split the string on any run of one or more non-digit characters. I also use a list comprehension to remove any leading/trailing empty entries in the output from re.split, which can arise when the string starts or ends with a non-digit character.
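To see why the filtering step matters, here is a small sketch that prints the raw re.split result first and then converts the remaining pieces to integers, which is what the question asked for. The variable names raw and nums are just for illustration.

```python
import re

inp = "A11B11C32D34"

# Splitting on runs of non-digits leaves an empty string where the
# input starts (or ends) with a non-digit character.
raw = re.split(r'\D+', inp)
print(raw)  # ['', '11', '11', '32', '34']

# Drop the empty pieces and convert the rest to integers.
nums = [int(x) for x in raw if x]
print(nums)  # [11, 11, 32, 34]
```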

Separate each item of a list in an specific way

I have an input, which is a tuple of strings, encoded in the a1z26 cipher: numbers from 1 to 26 represent letters of the alphabet, hyphens separate the letters within a word, and spaces separate words.
For example:
8-9 20-8-5-18-5 should translate to 'hi there'
Let's say that the last example is a tuple in a var called string
string = ('8-9','20-8-5-18-5')
The first thing I find logical is to convert the tuple into a list using
string = list(string)
so now
string = ['8-9','20-8-5-18-5']
The problem now is that when I iterate over the list to compare it with a dictionary that holds the translated values, each digit of a double-digit number is treated separately, so instead of translating '20', for example, it translates '2' and then '0', and the resulting string says 'hi bheahe' (2 = b, 1 = a and 8 = h)
so I need a way to convert the list above to the following
list
['8','-','9',' ','20','-','8','-','5','-','18','-','5',]
I've already tried various codes using
list(),
join() and
split()
But it ends up giving me the same problem.
To sum up, I need to make any given list (converted from the input tuple) into a list of characters that takes into account double digit numbers, spaces and hyphens altogether
This is what I've got so far. (The last I wrote) The input is further up in the code (string)
a1z26 = {'1':'A', '2':'B', '3':'C', '4':'D', '5':'E', '6':'F', '7':'G', '8':'H', '9':'I', '10':'J', '11':'K', '12':'L', '13':'M', '14':'N', '15':'O', '16':'P', '17':'Q', '18':'R', '19':'S', '20':'T', '21':'U', '22':'V', '23':'W', '24':'X', '25':'Y', '26':'Z', '-':'', ' ' : ' ', ', ' : ' '}
translation = ""
code = list(string)
numbersarray1 = code
numbersarray2 = ', '.join(numbersarray1)
for char in numbersarray2:
    if char in a1z26:
        translation += a1z26[char]
There's no need to convert the tuple to a list. Tuples are iterable too.
I don't think the list you name is what you actually want. You probably want a 2d iterable (not necessarily a list, as you'll see below we can do this in one pass without generating an intermediary list), where each item corresponds to a word and is a list of the character numbers:
[[8, 9], [20, 8, 5, 18, 5]]
From this, you can convert each number to a letter, join the letters together to form the words, then join the words with spaces.
To do this, you need to pass a parameter to split, to tell it how to split your input string. You can achieve all of this with a one liner:
plaintext = ' '.join(''.join(num_to_letter[int(num)] for num in word.split('-'))
for word in ciphertext.split(' '))
This does exactly the splitting procedure as described above, and then for each number looks into the dict num_to_letter to do the conversion.
Note that you don't even need this dict. You can use the fact that A-Z in unicode is contiguous so to convert 1-26 to A-Z you can do chr(ord('A') + num - 1).
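A minimal sketch of that dict-free variant, assuming every number is in the range 1 to 26 and the input is well formed (no validation is done). The function name decode_a1z26 is made up for this example.

```python
def decode_a1z26(ciphertext):
    """Decode e.g. '8-9 20-8-5-18-5' into 'HI THERE'."""
    words = []
    for word in ciphertext.split(' '):
        # 1 maps to 'A', 2 to 'B', ..., 26 to 'Z'.
        letters = [chr(ord('A') + int(num) - 1) for num in word.split('-')]
        words.append(''.join(letters))
    return ' '.join(words)

# The question's input is a tuple of words, so join it with spaces first.
string = ('8-9', '20-8-5-18-5')
print(decode_a1z26(' '.join(string)))  # HI THERE
```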
You don't really need the hyphens, am I right?
I suggest the following approach:
a = '- -'.join(string).split('-')
Now a is ['8', '9', ' ', '20', '8', '5', '18', '5']
You can then convert each number to the proper character using your dictionary
b = ''.join([a1z26[i] for i in a])
Now b is equal to HI THERE
I think it's better to use regular expressions here.
Example:
import re
...
src = ('8-9', '20-8-5-18-5')
res = [match for tmp in src for match in re.findall(r"([0-9]+|[^0-9]+)", tmp + " ")][:-1]
print(res)
Result:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Here is a solution using regex:
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
print(data)
output
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
If you want to get "hi there" from the input string, here is a method (I am assuming all characters should be uppercase):
import re
string = '8-9 20-8-5-18-5'
exp=re.compile(r'[0-9]+|[^0-9]+')
data= exp.findall(string)
new_str =''
for i in range(len(data)):
    if data[i].isdigit():
        new_str+=chr(int(data[i])+64)
    else:
        new_str+=data[i]
result = new_str.replace('-','')
output:
HI THERE
You could also try this itertools solution:
from itertools import chain
from itertools import zip_longest
def separate_list(lst, delim, sep=" "):
    result = []
    for x in lst:
        chars = x.split(delim) # 1
        pairs = zip_longest(chars, [delim] * (len(chars) - 1), fillvalue=sep) # 2, 3
        result.extend(list(chain.from_iterable(pairs))) # 4
    return result[:-1] # 5
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
Output:
['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
Explanation of above code:
1. Split each string by the delimiter '-'.
2. Create the interspersing delimiters.
3. Create pairs of characters and separators with itertools.zip_longest.
4. Extend the result list with the flattened pairs using itertools.chain.from_iterable.
5. Remove the trailing ' ' that was added to the result list.
You could also create your own intersperse generator function and apply it twice:
from itertools import chain
def intersperse(iterable, delim):
    it = iter(iterable)
    yield next(it)
    for x in it:
        yield delim
        yield x
def separate_list(lst, delim, sep=" "):
    return list(
        chain.from_iterable(
            intersperse(
                (intersperse(x.split(delim), delim=delim) for x in lst), delim=[sep]
            )
        )
    )
print(separate_list(["8-9", "20-8-5-18-5"], delim="-"))
# ['8', '-', '9', ' ', '20', '-', '8', '-', '5', '-', '18', '-', '5']
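One caveat with the intersperse generator above: on an empty iterable, next(it) raises StopIteration inside the generator, which Python 3.7+ turns into a RuntimeError. If empty input is possible in your case, a guarded variant along these lines may be safer. This is a sketch of one way to handle it, not part of the original answer.

```python
def intersperse(iterable, delim):
    """Yield the items of iterable with delim between consecutive items."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        # Empty input: finish the generator without yielding anything.
        return
    yield first
    for x in it:
        yield delim
        yield x

print(list(intersperse([], "-")))          # []
print(list(intersperse(["8", "9"], "-")))  # ['8', '-', '9']
```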

Python regular expression split string into numbers and text/symbols

I would like to split a string into sections of numbers and sections of text/symbols
my current code doesn't include negative numbers or decimals, and behaves weirdly, adding an empty list element on the end of the output
import re
mystring = 'AD%5(6ag 0.33--9.5'
newlist = re.split('([0-9]+)', mystring)
print (newlist)
current output:
['AD%', '5', '(', '6', 'ag ', '0', '.', '33', '--', '9', '.', '5', '']
desired output:
['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']
Your issue is that the regex matches one or more digits and uses them as the delimiter; because the pattern is wrapped in a capturing group, the digits are also added to the resulting list, along with the parts before and after each match. So if the string ends with digits, the split adds an empty string at the end of the resulting list.
You may split with a regex that matches float or integer numbers with an optional minus sign and then remove empty values:
result = re.split(r'(-?\d*\.?\d+)', mystring)
result = list(filter(None, result))
To match negative/positive numbers with exponents, use
r'([+-]?\d*\.?\d+(?:[eE][-+]?\d+)?)'
The -?\d*\.?\d+ regex matches:
-? - an optional minus
\d* - 0+ digits
\.? - an optional literal dot
\d+ - one or more digits.
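Putting that pattern to work on the string from the question, a minimal sketch could look like this. The names NUMBER and split_numbers are hypothetical; a list comprehension is used to drop the empty strings.

```python
import re

# Pattern from the answer above: optional minus, optional integer part,
# optional dot, then at least one digit. The capturing group keeps the
# matched numbers in the re.split output.
NUMBER = re.compile(r'(-?\d*\.?\d+)')

def split_numbers(text):
    """Split text into number and non-number pieces, dropping empty strings."""
    return [piece for piece in NUMBER.split(text) if piece]

print(split_numbers('AD%5(6ag 0.33--9.5'))
# ['AD%', '5', '(', '6', 'ag ', '0.33', '-', '-9.5']
```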
Unfortunately, re.split() does not offer an "ignore empty strings" option. However, to retrieve your numbers, you could easily use re.findall() with a different pattern:
import re
string = "AD%5(6ag0.33-9.5"
rx = re.compile(r'-?\d+(?:\.\d+)?')
numbers = rx.findall(string)
print(numbers)
# ['5', '6', '0.33', '-9.5']
As mentioned here before, there is no option to ignore the empty strings in re.split() but you can easily construct a new list the following way:
import re
mystring = "AD%5(6ag0.33--9.5"
newlist = [x for x in re.split(r'(-?\d+\.?\d*)', mystring) if x != '']
print(newlist)
output:
['AD%', '5', '(', '6', 'ag', '0.33', '-', '-9.5']

Splitting strings into lists and splitting again

I want to split the string
" 510 -9999999 9 99 12 5 [3, 0] [] [6] "
(which contains a varying amount of whitespace between the entries) into its component parts, including the lists within the string. I can get to this
['510', '-9999999', '9', '99', '12', '5', '[3,', '0]', '[]', '[6]']
through using split and replace. However, I then want to reconstitute the lists within the original string so that I can get to
['510', '-9999999', '9', '99', '12', '5', '[3,0]', '[]', '[6]'].
The real problem is that this string is one of many, and the lists may contain many components or none, so I have to deal with this in a general way.
I could potentially search for '[', then search for ']' to close up the list but, as I don't know the length of any of the lists going in, this seems an inefficient way of doing things.
Any help greatly appreciated!
There is always regex, but you can do it on the cheap like this
>>> import shlex
>>> shlex.split(s.replace('[','"[').replace(']',']"'))
['510', '-9999999', '9', '99', '12', '5', '[3, 0]', '[]', '[6]']
The proper solution would be to use pyparsing module, or even better to control the input source to give you something more sensible like json.
If lists aren't nested, you can try this:
import re
def mysplit(a):
    # Remove spaces inside each [...] group, then split on runs of spaces.
    return re.split(' +', re.sub(r'\[(.*?)\]', lambda m: '[{}]'.format(m.group(1).replace(' ', '')), a.strip()))
If lists can't be nested, then I think it is possible to preprocess the string with:
s = " 510 -9999999 9 99 12 5 [3, 0] [] [6] "
opened = False
s_new = ""
for i in s:
    if i == "[":
        opened = True
    if i == "]":
        opened = False
    if not opened or (opened and i != " "):
        s_new += i
And then split it into a list:
l = s_new.split()
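For completeness, here is a sketch of the regex route mentioned above. It assumes the lists are not nested and that each list starts its own whitespace-separated token; the function name split_keeping_lists is made up for illustration.

```python
import re

def split_keeping_lists(text):
    """Split on whitespace, but keep each [...] group together as one token."""
    # A token is either a bracketed group (no nesting) or a run of non-whitespace.
    tokens = re.findall(r'\[[^\]]*\]|\S+', text)
    # Remove spaces inside bracketed groups so '[3, 0]' becomes '[3,0]'.
    return [t.replace(' ', '') if t.startswith('[') else t for t in tokens]

s = " 510 -9999999 9 99 12 5 [3, 0] [] [6] "
print(split_keeping_lists(s))
# ['510', '-9999999', '9', '99', '12', '5', '[3,0]', '[]', '[6]']
```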
