I am working with Grammatical Evolution (GE) on Python 3.7.
My grammar generates executable strings in the format:
np.where(<variable> <comparison_sign> <constant>, (<probability1>), (<probability2>))
However, the string can get quite complex, with several nested np.where calls.
In some cases <constant> contains leading zeros, which makes the executable string raise an error. GE is expected to generate expressions containing leading zeros, so I have to detect and remove them.
An example of a possible solution containing leading zeros:
"np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"
Problem:
There are two types of numbers containing leading zeros: int and float.
Suppose I detect "02" in the string. If I replace all occurrences of "02" with "2", the float "01.5025" will also be changed to "01.525", which must not happen.
I've made several attempts with different re patterns, but couldn't solve it.
To detect that an executable string contains leading zeros, I use:
try:
    _ = eval(expression)
except SyntaxError:
    new_expression = fix_expressions(expression)
I need help building the fix_expressions Python function.
You could try to come up with a regular expression for numbers with leading zeros and then replace the leading zeros.
import re
def remove_leading_zeros(string):
    return re.sub(r'([^\.^\d])0+(\d)', r'\1\2', string)
print(remove_leading_zeros("np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"))
# output: np.where(x < 2, np.where(x > 1.5025, (0.9), (0.5)), (1))
The remove_leading_zeros function finds all occurrences of [^\.^\d]0+\d and removes the zeros. [^\.^\d]0+\d translates to: a character that is neither a dot nor a digit, followed by at least one zero, followed by a digit. (The second ^ inside the brackets is a literal caret, so carets are excluded too; [^.\d] would be enough.) The parentheses in the regex denote capture groups, which are used to preserve the character before the leading zeros and the digit after them.
Regarding Csaba Toth's comment:
The problem with 02+03*04 is that there is a zero at the beginning of the string.
One can modify the regex so that the first capture group also matches the beginning of the string:
r"(^|[^\.^\d])0+(\d)"
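A variant of the same idea, as a minimal sketch of the fix_expressions function the question asks for: it uses a negative lookbehind instead of a capture group, so the start of the string needs no special case, and it leaves digits alone when they follow a letter or underscore (for example a variable named x01). The pattern, and the assumption that such identifiers should stay unchanged, are not part of the original answer.

```python
import re

# Zeros that start a number: not preceded by a letter, digit, underscore or dot,
# and followed by at least one more digit (so "0" and "0.9" are left alone).
LEADING_ZEROS = re.compile(r'(?<![\w.])0+(?=\d)')

def fix_expressions(expression):
    # Delete the matched zeros; everything else is kept as-is.
    return LEADING_ZEROS.sub('', expression)

print(fix_expressions("np.where(x < 02, np.where(x > 01.5025, (0.9), (0.5)), (1))"))
# np.where(x < 2, np.where(x > 1.5025, (0.9), (0.5)), (1))
print(fix_expressions("02+03*04"))
# 2+3*4
```

Because the lookahead requires a digit after the zeros, a value such as 00.5 keeps one zero and becomes 0.5.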
You can remove leading 0's in a string using .lstrip()
str_num = "02.02025"
print("Initial string: %s \n" % str_num)
str_num = str_num.lstrip("0")
print("Removing leading 0's with lstrip(): %s" % str_num)
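Note that lstrip works on a single numeric string, not on a number embedded in a longer expression, and that it also strips the only zero of "0" or "0.5". A small sketch of a wrapper that puts one zero back in those cases (strip_leading_zeros is a hypothetical helper name):

```python
def strip_leading_zeros(num_str):
    stripped = num_str.lstrip("0")
    # "0" becomes "" and "0.5" becomes ".5" after lstrip; restore a single zero
    if not stripped or stripped.startswith("."):
        stripped = "0" + stripped
    return stripped

for s in ["02.02025", "000", "0.5", "42"]:
    print(s, "->", strip_leading_zeros(s))
```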
I'm printing a table - several rows containing various variable types -
example:
print('{:10s} ${:12.0f} {:10.1f}%'.format(ID,value,change))
first      $      324681        2.4%
where the integers 10, 12, and 10 provide the column spacing I want.
But I want to have the $ amounts printed with a comma separator, thus:
print('{:10s} ${:,.0f} {:10.1f}%'.format(ID,value,change))
first      $324,681        2.4%
But this loses the '12' spaces allowed for the second item.
But when I try
print('{:10s} ${:,12.0f} {:10.1f}%'.format(ID,value,change))
I get "ValueError: Invalid format specifier"
How can I get both the commas and control over the column spacing?
Python 3.6 running in Spyder.
This should do the trick:
print('{:10s} ${:1,d} {:10.1f}%'.format('first', 324681, 2.4))
OUTPUT:
first      $324,681        2.4%
If you are content to have the specified total width but the dollar in a fixed column possibly separated from the digits, then you could just do what you were doing, but with the 12 before the ,.
>>> value = 324681
>>> ID = "first"
>>> change = 2.4
>>> print('{:10s} ${:12,.0f} {:10.1f}%'.format(ID,value,change))
first      $     324,681        2.4%
If you want the numbers to follow immediately after the $, then you can format it as a string without any padding, and then use the string in a fixed-width format specifier:
>>> print('{:10s} {:13s} {:10.1f}%'.format(ID ,'${:,.0f}'.format(value), change))
first      $324,681             2.4%
or:
>>> print('{:10s} {:>13s} {:10.1f}%'.format(ID ,'${:,.0f}'.format(value), change))
first           $324,681        2.4%
(The width specifier is increased to 13 here, because the $ sign itself is in addition to the 12 characters used for the number.)
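The reason '{:,12.0f}' fails is that the format specification has a fixed order: the width comes first, then the , grouping option, then the .precision, then the type. A small sketch that wraps the two variants above in helper functions (the function names are hypothetical; the column widths are the ones used above):

```python
def row_padded_number(label, value, change):
    # width (12) goes before the comma: spaces may separate "$" from the digits
    return '{:10s} ${:12,.0f} {:10.1f}%'.format(label, value, change)

def row_attached_dollar(label, value, change, width=13):
    # format the amount first, then right-align the finished string
    amount = '${:,.0f}'.format(value)
    return '{:10s} {:>{w}s} {:10.1f}%'.format(label, amount, change, w=width)

print(row_padded_number('first', 324681, 2.4))
print(row_attached_dollar('first', 324681, 2.4))
```

Both rows come out with the same total length, so the columns after the amount stay aligned whichever variant is used.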
I have this string:
abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123
What can I use to add a 0 to the beginning of each number in this string? So how can I turn that string into something like:
abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
I've tried playing around with regex, but I'm unsure how to use it effectively to get the result I want. It should match only values made up entirely of digits, so something like 1234abc567 must not become 01234abc567, because it has letters in it. Each value is always separated by a comma.
Use re.sub,
re.sub(r'(^|,)(\d)', r'\g<1>0\2', s)
or
re.sub(r'(^|,)(?=\d)', r'\g<1>0', s)
or
re.sub(r'\b(\d)', r'0\1', s)
Try the following:
re.sub(r'\b(\d+)\b', r'0\g<1>', s)
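If only fields that consist entirely of digits should be prefixed (so that 1234abc567 is left alone), the field boundaries can be spelled out explicitly. A minimal sketch, assuming commas are the only delimiter, as stated in the question; add_leading_zero is a hypothetical name:

```python
import re

def add_leading_zero(s):
    # (?<![^,]) : preceded by a comma or the start of the string
    # (?![^,])  : followed by a comma or the end of the string
    return re.sub(r'(?<![^,])(\d+)(?![^,])', r'0\1', s)

print(add_leading_zero("abc,12345,abc,abc,abc,abc,12345,98765443,xyz,zyx,123"))
# abc,012345,abc,abc,abc,abc,012345,098765443,xyz,zyx,0123
print(add_leading_zero("1234abc567,12"))
# 1234abc567,012
```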
If the numbers are always separated by commas in your string, you can use basic list methods to achieve the result you want.
Let's say your string is called x
y = x.split(',')
x = ''
for i in y:
    if i.isdigit():
        i = '0' + i
    x = x + i + ','
x = x[:-1]  # drop the trailing comma added by the last iteration
What this piece of code does is the following:
Splits your string into pieces depending on where you have commas and returns a list of the pieces.
Checks if the pieces are actually numbers, and if they are a 0 is added using string concatenation.
Finally your string is rebuilt by concatenating the pieces along with the commas.
How can I automatically place dots by separating it with 3 digits in a group beginning from the right?
Example:
in: 1234; out 1.234
in: 12345678; out 12.345.678
You are looking for a thousands separator. Format your number with the format() function, using commas as the thousands separator, then replace the commas with dots:
>>> format(1234, ',').replace(',', '.')
'1.234'
>>> format(12345678, ',').replace(',', '.')
'12.345.678'
Here the ',' format signals that the decimal number should be formatted with a thousands-separator (see the Format Specification Mini-language).
The same can be achieved in a wider string format with the str.format() method, where placeholders in the template are replaced with values:
>>> 'Some label for the value: {:,}'.format(1234).replace(',', '.')
'Some label for the value: 1.234'
but then you run the risk of accidentally replacing other commas in the output string too!
Your other option would be to use the locale-dependent 'n' format, but that requires your machine to be configured for a locale that sets the right LC_NUMERIC options.
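One way to avoid touching other characters in the template is to format the number on its own and only then insert it into the text. A minimal sketch, assuming integer input (dotted is a hypothetical helper name):

```python
def dotted(number):
    # Group the digits with commas, then swap the commas for dots.
    # Meant for integers: a float's decimal point would be left as a dot too.
    return format(number, ',').replace(',', '.')

print('Items, total: {}'.format(dotted(12345678)))
# Items, total: 12.345.678
```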
Here is a simple solution:
>>> a = 12345678
>>> "{:,}".format(a)
'12,345,678'
>>> "{:,}".format(a).replace(",", ".")
'12.345.678'
>>>
This uses the .format method of a string to add the comma separators and then the .replace method to change those commas to periods.
I'm trying to have Python transform an input and apply string-format rules to produce a repeatable output, sort of like a smart ETL function, if you will. Case in point: I will be receiving numerical data from geographically dispersed clients, and that data needs to be transformed into a repeatable format so it can be consumed by our legacy financial engine.
For example, I might receive numerical data such as:
input = 123,456,789.4533
This input needs to be reformatted to an output of 26 digits, depicted as (17)(9): the first 17 digits are the digits to the left of the decimal point, zero-padded on the left, and the last 9 are the digits to the right of the decimal point, zero-padded on the right. Transformed, it would look like:
output = 00000000123456789453300000
Now, there might be times where the input data would look like this:
123456789.4533
123.456.789,4533 (european currency)
What would be the best way to perform this in Python?
You can do it with regular expressions
import re

inputs = ['123,456,789.4533', '123456789.4533', '123,456,789,4533', '123.456.789,4533']
for value in inputs:
    # the digits after the last '.' or ',' form the decimal part
    decimal = re.search(r'(?<=[.,])\d+$', value).group()
    # everything before that last separator is the integer part
    integer = re.search(r'.*(?=[.,]\d+$)', value).group()
    # drop the grouping separators, keeping digits only
    integer = ''.join(ch for ch in integer if ch.isdigit())
    print(integer.rjust(17, '0') + decimal.ljust(9, '0'))
prints:
00000000123456789453300000
00000000123456789453300000
00000000123456789453300000
00000000123456789453300000
If you're absolutely sure the decimal separator will be present, you can do it like this:
separator = re.match(r'.*(\D)\d*$', input).group(1)
integer_part, decimal_part = (re.sub(r'\D', '', x) for x in input.rsplit(separator, 1))
If you're not, you must know what the separator is beforehand, or your problem will be undecidable (what does 123,456 mean? 123456e0 in American notation, or 123456e-3 in European notation?)
Once you have the integer part and the decimal part, you can pad them the way you need to:
output = integer_part.zfill(17) + decimal_part.ljust(9, '0')
Explanation:
To find what the separator is, I used a regular expression to capture the last non-digit character in the input;
Splitting the string on that separator gives the integer and decimal parts; removing any remaining non-digits from them leaves only digits.
>>> def transfer(input, euro=False):
...     part1, _, part2 = input.partition(',' if euro else '.')
...     # keep only the digits of each part
...     part1 = ''.join(filter(str.isdigit, part1))
...     part2 = ''.join(filter(str.isdigit, part2))
...     return part1.rjust(17, '0') + part2.ljust(9, '0')
...
>>> transfer('123456789.4533')
'00000000123456789453300000'
>>> transfer('123.456.789,4533', True)
'00000000123456789453300000'
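None of the snippets above check that the value actually fits the (17)(9) layout: rjust, zfill and ljust pad but never truncate, so an over-long input silently produces more than 26 digits. A hedged sketch that adds that check, assuming the last '.' or ',' is always the decimal separator (to_fixed_width and its parameters are hypothetical names):

```python
import re

def to_fixed_width(text, int_width=17, frac_width=9):
    # Assumption: the last '.' or ',' is the decimal separator;
    # any earlier ones are grouping separators and are dropped.
    parts = re.split(r'[.,]', text.strip())
    if len(parts) < 2:
        raise ValueError('no decimal separator in %r' % text)
    integer_part, decimal_part = ''.join(parts[:-1]), parts[-1]
    if not (integer_part + decimal_part).isdigit():
        raise ValueError('unexpected characters in %r' % text)
    if len(integer_part) > int_width or len(decimal_part) > frac_width:
        raise ValueError('%r does not fit (%d)(%d)' % (text, int_width, frac_width))
    return integer_part.zfill(int_width) + decimal_part.ljust(frac_width, '0')

for sample in ['123,456,789.4533', '123456789.4533', '123.456.789,4533']:
    print(to_fixed_width(sample))
```

Raising an error is only one possible policy; whether to reject or round such values depends on what the legacy engine expects.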