Regex Dollar Amount with Spaces - python

I'm looking for an expression that will return $55.66 from this $ 55 66
Note: the amount of spaces between the $ and number could vary.
It will also need to work if the value is less than $10, e.g. something like $ 6 05

For the simple case you've described, you could just split and concatenate the string.
s = '$ 55 66'
s = s.split()
print(s[0] + s[1] + '.' + s[2])
>>> $55.66
To support commas
s = '$ 424 552 66'
s = s.split()
print(s[0] + ','.join(s[1:-1]) + '.' + s[-1])
>>> $424,552.66

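This regular expression finds dollar amounts that are already written with a comma or a period, such as $55.66 (it does not match the space-separated form), where s is the input string:
re.findall(r'(\$?\d+[,\.]\d+)', s)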
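Since the question asks for a regular expression, here is a minimal regex-only sketch. The helper name normalize_amount is made up for illustration, and it assumes the input has exactly one group of dollars followed by a two-digit cents group (thousands groups such as $ 424 552 66 are not handled).

```python
import re

def normalize_amount(text):
    """Turn '$ 55 66' into '$55.66' (hypothetical helper, not from the answers)."""
    # \$\s*   : the dollar sign and any amount of whitespace after it
    # (\d+)   : the dollars
    # \s+     : the gap that stands in for the decimal point
    # (\d{2}) : the cents, assumed to be exactly two digits
    return re.sub(r'\$\s*(\d+)\s+(\d{2})\b', r'$\1.\2', text)

print(normalize_amount('$ 55 66'))    # $55.66
print(normalize_amount('$    6 05'))  # $6.05
```

Text that does not fit the assumed shape is returned unchanged.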

Related

How to format a string of nine digits in Python?

I have a set of strings such as "024764108", "002231531", "005231329"; each has exactly 9 digits. I want to add a - after each group of 3 digits. The result I want is:
"024-764-108", "002-231-531", "005-231-329".
How can I express this in Python?
Here is a dynamic solution:
In [41]: df
Out[41]:
num
0 024764108
1 002231531
2 005231329
3 012345678901234
In [42]: df.num.str.extractall(r'(\d{3})').groupby(level=0)[0].apply('-'.join)
Out[42]:
0 024-764-108
1 002-231-531
2 005-231-329
3 012-345-678-901-234
Name: 0, dtype: object
If you are using Python 3.6 or later, you could consider f-strings, which let you do some processing within the string.
f'{string[:3]}-{string[3:6]}-{string[6:]}'
Another option would be to split your string into chunks of three characters and then join the resulting list.
split_string = [string[i: i + 3] for i in range(0, len(string), 3)]
formatted_number = '-'.join(split_string)
The first line creates a list of substrings of length 3; the second joins the elements of that list with a '-' character in between.
There is probably a better way to do this, but you can use slicing ([]) to split the string into sections of 3.
old_str = "024764108"
new_str = old_str[:3] + '-' + old_str[3:6] + '-' + old_str[6:]
Easy solution:
number = "024764108"
new_number = number[:3] + '-' + number[3:6]+ '-' + number[6:]
Consider this code, which uses string slicing. The expression that converts the string to your format is string[0:3] + "-" + string[3:6] + "-" + string[6:9]
Here is your updated method and some test cases. It only accepts inputs that are exactly 9 characters long; anything else returns None.
def format_digitstring(string: str):
    if len(string) != 9:
        return None
    return string[0:3] + "-" + string[3:6] + "-" + string[6:9]
s1 = "024764108"
s2 = "002231531"
s3 = "005231329"
s4 = "00112341"
print(format_digitstring(s1))
print(format_digitstring(s2))
print(format_digitstring(s3))
print(format_digitstring(s4))
Output:
024-764-108
002-231-531
005-231-329
None
This also works:
import re
s='024764108'
print(('{}-'*2+'{}').format(*re.findall('(...)',s)))
Or, if you want to do it on every row, you can use pandas' apply function.
This uses a positive lookahead: (\d{3})(?=\d) matches three digits that are followed by another digit, and the replacement '\1-' adds a '-' after those three digits.
import re
number="024764108"
re.sub(r'(\d{3})(?=\d)',r'\1-',number)
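As a rough sketch, the lookahead substitution can be wrapped in a small function so the group size and separator are adjustable. The name group_digits and its parameters are illustrative only, and sep is assumed to be a plain string without backslashes.

```python
import re

def group_digits(digits, size=3, sep='-'):
    """Insert sep after every `size` digits that are followed by another digit."""
    # The lookahead (?=\d) prevents a trailing separator at the end of the string.
    pattern = r'(\d{%d})(?=\d)' % size
    return re.sub(pattern, r'\1' + sep, digits)

print(group_digits('024764108'))        # 024-764-108
print(group_digits('012345678901234'))  # 012-345-678-901-234
```

Unlike the fixed-slice answers, this does not require the input to be exactly 9 digits long.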

Reverse a string in Python but keep numbers in original order

I can reverse a string using the [::-1] syntax. Take note of the example below:
text_in = 'I am 25 years old'
rev_text = text_in[::-1]
print(rev_text)
Output:
dlo sraey 52 ma I
How can I reverse only the letters while keeping the numbers in order?
The desired result for the example is 'dlo sraey 25 ma I'.
Here's an approach with re:
>>> import re
>>> text_in = 'I am 25 years old'
>>> ''.join(s if s.isdigit() else s[::-1] for s in reversed(re.split(r'(\d+)', text_in)))
'dlo sraey 25 ma I'
>>>
>>> text_in = 'Iam25yearsold'
>>> ''.join(s if s.isdigit() else s[::-1] for s in reversed(re.split(r'(\d+)', text_in)))
'dlosraey25maI'
Using split() and join() along with str.isdigit() to identify numbers :
>>> s = 'I am 25 years old'
>>> s1 = s.split()
>>> ' '.join([ ele if ele.isdigit() else ele[::-1] for ele in s1[::-1] ])
=> 'dlo sraey 25 ma I'
NOTE : This only works with numbers that are space separated. For others, check out timegeb's answer using regex.
Here is a step by step approach:
text_in = 'I am 25 years old'
text_seq = list(text_in) # make a list of characters
text_nums = [c for c in text_seq if c.isdigit()] # extract the numbers
num_ndx = 0
revers = []
for idx, c in enumerate(text_seq[::-1]): # for each char in the reversed text
    if c.isdigit(): # if it is a number
        c = text_nums[num_ndx] # replace it by the number not reversed
        num_ndx += 1
    revers.append(c) # if not a number, preserve the reversed order
print(''.join(revers)) # output the final string
Output :
dlo sraey 25 ma I
You can do it in a straightforward, Pythonic way like this:
def rev_except_digit(text_in):
    rlist = text_in[::-1].split() # Reverse the whole string and split into a list
    for i in range(len(rlist)): # Reverse only the numbers again
        if rlist[i].isdigit():
            rlist[i] = rlist[i][::-1]
    return ' '.join(rlist)
Test:
Original: I am 25 years 345 old 290
Reverse: 290 dlo 345 sraey 25 ma I
You can find split(), the other string methods, and slicing ([::-1]) in the official Python documentation.
text = "I am 25 years old"
new_text = ''
text_rev = text[::-1]
for i in text_rev.split():
    if not i.isdigit():
        new_text += i + " "
    else:
        new_text += i[::-1] + " "
print(new_text)
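The "reverse everything, then flip the numbers back" idea can also be sketched with a single re.sub call. The function name reverse_keep_numbers is made up here. Because it works on digit runs rather than on space-separated words, it should also cope with numbers that are glued to letters.

```python
import re

def reverse_keep_numbers(text):
    """Reverse the whole string, then un-reverse each run of digits."""
    reversed_text = text[::-1]
    # Each match is a reversed number such as '52'; flipping it restores '25'.
    return re.sub(r'\d+', lambda m: m.group()[::-1], reversed_text)

print(reverse_keep_numbers('I am 25 years old'))  # dlo sraey 25 ma I
print(reverse_keep_numbers('Iam25yearsold'))      # dlosraey25maI
```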

Python: how to replace characters from i-th to j-th matches?

For example, if I have:
"+----+----+---+---+--+"
is it possible to replace the second through fourth + with -?
If I have
"+----+----+---+---+--+"
and I want to have
"+-----------------+--+"
I have to replace the 2nd through 4th + with -. Is it possible to achieve this with a regex, and how?
If you can assume the first character is always a +:
string = '+' + re.sub(r'\+', r'-', string[1:], count=3)
Lop off the first character of your string and sub() the first three + characters, then add the initial + back on.
If you can't assume the first + is the first character of the string, find it first:
prefix = string.index('+') + 1
string = string[:prefix] + re.sub(r'\+', r'-', string[prefix:], count=3)
I would rather iterate over the string, and then replace the pluses according to what I found.
secondIndex = 0
fourthIndex = 0
count = 0
for i, c in enumerate(string):
    if c == '+':
        count += 1
        if count == 2 and secondIndex == 0:
            secondIndex = i
        elif count == 4 and fourthIndex == 0:
            fourthIndex = i
string = string[:secondIndex] + '-'*(fourthIndex-secondIndex+1) + string[fourthIndex+1:]
Test:
+----+----+---+---+--+
+-----------------+--+
I split the string into an array of strings using the character to replace as the separator.
Then rejoin the array, in sections, using the required separators.
example_str="+----+----+---+---+--+"
swap_char="+"
repl_char='-'
ith_match=2
jth_match=4
list_of_strings = example_str.split(swap_char)
# the slices use jth_match + 1 so that the jth match itself is replaced too
new_string = ( swap_char.join(list_of_strings[0:ith_match]) + repl_char +
               repl_char.join(list_of_strings[ith_match:jth_match + 1]) +
               swap_char + swap_char.join(list_of_strings[jth_match + 1:]) )
print (example_str)
print (new_string)
running it gives :
$ python ./python_example.py
+----+----+---+---+--+
+-----------------+--+
with regex? Yes, that's possible.
^(\+-+){1}((?:\+[^+]+){3})
explanation:
^
(\+-+){1} # read + and some -'s until 2nd +
( # group 2 start
(?:\+[^+]+){3} # read +, followed by non-plus'es, in total 3 times
) # group 2 end
testing:
$ cat test.py
import re
pattern = r"^(\+-+){1}((?:\+[^+]+){3})"
tests = ["+----+----+---+---+--+"]
for test in tests:
    m = re.search(pattern, test)
    if m:
        print (test[0:m.start(2)] +
               "-" * (m.end(2) - m.start(2)) +
               test[m.end(2):])
Adjusting is simple:
^(\+-+){1}((?:\+[^+]+){3})
        ^              ^
the '1' indicates that you're reading up to the 2nd '+'
the '3' indicates that you're reading up to the 4th '+'
these are the only 2 changes you need to make, the group number stays the same.
Run it:
$ python test.py
+-----------------+--+
This is pythonic.
import re
s = "+----+----+---+---+--+"
idx = [ i.start() for i in re.finditer(r'\+', s) ][1:-2]
''.join([ j if i not in idx else '-' for i,j in enumerate(s) ])
However, if your string is constant and you want to keep it simple:
print (s)
print ('+' + re.sub(r'\+---', '----', s)[1:])
Output:
+----+----+---+---+--+
+-----------------+--+
Using only list comprehensions:
s1="+----+----+---+---+--+"
indexes = [i for i,x in enumerate(s1) if x=='+'][1:4]
s2 = ''.join([e if i not in indexes else '-' for i,e in enumerate(s1)])
print(s2)
+-----------------+--+
I saw you already found a solution, but I do not like regex so much, so maybe this will help someone else! :-)
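For a more general take on "replace the i-th through j-th matches", one possible sketch is to pass a counting callback to re.sub. The function name replace_matches and its parameters are illustrative only, and the code needs Python 3 because of nonlocal.

```python
import re

def replace_matches(text, pattern, repl, first, last):
    """Replace matches number first..last (1-based, inclusive) with repl."""
    seen = 0

    def swap(match):
        nonlocal seen
        seen += 1
        # Matches outside the requested range are put back unchanged.
        return repl if first <= seen <= last else match.group()

    return re.sub(pattern, swap, text)

print(replace_matches('+----+----+---+---+--+', r'\+', '-', 2, 4))
# +-----------------+--+
```

Because the replacement comes from a function, repl is inserted literally, without backreference processing.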

How to add numbers to a string with Python or GREL

I have >4000 numbers in a column that need to be manipulated.
They look like this:
040 413 560 89 or 0361 223240
How do I put them into the following format:
+49 (040) 41356089 or +49 (0361) 223240
They all need to have the same country dialling code, +49, followed by the respective area code in brackets. Some are already in the correct format.
We can split the string into groups:
>>> groups = '040 413 560 89'.split()
>>> groups
['040', '413', '560', '89']
We can take the first group as the city code and join the remaining groups into one string:
>>> city, number = groups[0], ''.join(groups[1:])
>>> city, number
('040', '41356089')
We can format a new string:
>>> '+49 ({}) {}'.format(city, number)
'+49 (040) 41356089'
We can check if a number already starts with +:
>>> '+49 (040) 41356089'.startswith('+')
True
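Putting those steps together might look like the sketch below. The function name to_international and the country_code parameter are made up for illustration, and it assumes every unformatted number is non-empty and has its area code as the first space-separated group.

```python
def to_international(raw, country_code='+49'):
    """Format '040 413 560 89' as '+49 (040) 41356089' (hypothetical helper)."""
    raw = raw.strip()
    # Numbers that already start with '+' are treated as correctly formatted.
    if raw.startswith('+'):
        return raw
    groups = raw.split()
    city, number = groups[0], ''.join(groups[1:])
    return '{} ({}) {}'.format(country_code, city, number)

for raw in ['040 413 560 89', '0361 223240', '+49 (040) 41356089']:
    print(to_international(raw))
```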
Do it like this:
ls_alreadycorrected = ['(', ')', '+49']
str_in = '040 413 560 89'  # or apply to a list
if any(flag in str_in for flag in ls_alreadycorrected):
    str_out = str_in  # already in the target format
else:
    how_many_spaces = str_in.count(' ')
    str_in = str_in.replace(' ', '')
    if how_many_spaces > 2:
        str_out = '+49 (' + str_in[:3] + ') ' + str_in[-8:]
    else:
        str_out = '+49 (' + str_in[:4] + ') ' + str_in[-6:]
This only works if you have just these two types of phone numbers. For a list of numbers, put this at the top instead of the str_in assignment and indent the rest under it:
for number in list_of_numbers:
    str_in = number
Cheers
You can do this.
phone = "456789"
cod = 123
final = str(cod) + phone
Result is "123456789"

Pythonic way to eval all octal values in a string as integers

So I've got a string that looks like "012 + 2 - 01 + 24" for example. I want to be able to quickly (less code) evaluate that expression...
I could use eval() on the string, but I don't want 012 to be represented in octal form (10), I want it to be represented as an int (12).
My solution for this works, but it is not elegant. I am sort of assuming that there is a really good pythonic way to do this.
My solution:
#expression is some string that looks like "012 + 2 - 01 + 24"
atomlist = []
for atom in expression.split():
    if "+" not in atom and "-" not in atom:
        atomlist.append(int(atom))
    else:
        atomlist.append(atom)
#print atomlist
evalstring = ""
for atom in atomlist:
    evalstring+=str(atom)
#print evalstring
num = eval(evalstring)
Basically, I tear apart the string, find the numbers in it and turn them into ints, and then rebuild the string with the ints (essentially removing leading 0's except where 0 is a number on its own).
How can this be done better?
I'd be tempted to use regular expressions to remove the leading zeroes:
>>> re.sub(r'\b0+(?!\b)', '', '012 + 2 + 0 - 01 + 204 - 0')
'12 + 2 + 0 - 1 + 204 - 0'
This removes zeroes at the start of every number, except when the number consists entirely of zeroes:
the first \b matches a word (token) boundary;
the 0+ matches one or more consecutive zeroes;
the (?!\b) (negative lookahead) inhibits matches where the sequence of zeroes is followed by a token boundary.
One advantage of this approach over split()-based alternatives is that it doesn't require spaces in order to work:
>>> re.sub(r'\b0+(?!\b)', '', '012+2+0-01+204-0')
'12+2+0-1+204-0'
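To see how the boundary and the negative lookahead interact, here is a small sketch that runs the same pattern over a few edge cases. The helper name strip_leading_zeros is invented for this example. Note the last case: the pattern treats the digits after a decimal point as a separate token, so it is only a safe fit for integer expressions.

```python
import re

LEADING_ZEROS = re.compile(r'\b0+(?!\b)')

def strip_leading_zeros(expression):
    """Remove leading zeroes from every integer token in the expression."""
    return LEADING_ZEROS.sub('', expression)

for token in ['0', '000', '007', '100', '1.05']:
    print(token, '->', strip_leading_zeros(token))
# 0 -> 0
# 000 -> 0
# 007 -> 7
# 100 -> 100
# 1.05 -> 1.5   (caveat: the zero after the decimal point is stripped too)
```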
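You can do this in one line using lstrip() to strip off any leading zeros (note that a token consisting only of zeros, such as a lone 0, would be stripped to an empty string):
>>> s = '012 + 2 - 01 + 24'
>>> eval("".join(token.lstrip('0') for token in s.split()))
37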
I'd like to do it this way:
>>> s = '012 + 2 + 0 - 01 + 204 - 0'
>>> ' '.join(str(int(x)) if x.isdigit() else x for x in s.split())
'12 + 2 + 0 - 1 + 204 - 0'
Use float() if you want to handle them too :)
int does not assume that a leading zero indicates an octal number:
In [26]: int('012')
Out[26]: 12
Accordingly, you can safely evaluate the expression with the following code:
from operator import add, sub
from collections import deque
def mapper(item, opmap = {'+': add, '-': sub}):
    try: return int(item)
    except ValueError: pass
    return opmap[item]
stack = deque()
# "if item" filters out empty strings between whitespace sequences
for item in (mapper(item) for item in "012 + 2 - 01 + 24".split(' ') if item):
    if stack and callable(stack[-1]):
        f = stack.pop()
        stack.append(f(stack.pop(), item))
    else: stack.append(item)
print(stack.pop())
Not a one-liner, but it is safe, because you control all of the functions which can be executed.
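If the expressions only ever contain + and -, as in the question, a shorter eval-free sketch is to read the string as a list of signed integers and sum them. The function name eval_sum is made up, and the approach is an assumption-laden shortcut: anything that is not a signed integer (for example * or parentheses) is silently ignored rather than rejected.

```python
import re

def eval_sum(expression):
    """Evaluate an expression made of integers joined by + and - only."""
    # Drop whitespace so each sign sits directly in front of its number.
    compact = re.sub(r'\s+', '', expression)
    # int() reads '012' as 12 and accepts a leading sign such as '-01'.
    terms = re.findall(r'[+-]?\d+', compact)
    return sum(int(term) for term in terms)

print(eval_sum('012 + 2 - 01 + 24'))  # 37
```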
