What's the difference between "%0.6X" and "%06X" in Python?
def formatTest():
    print "%0.6X" % 1024
    print "%06X" % 1024

if __name__ == '__main__':
    formatTest()
The result is:
000400
000400
https://docs.python.org/2/library/stdtypes.html#string-formatting
A conversion specifier contains two or more characters and has the following components, which must occur in this order:
The '%' character, which marks the start of the specifier.
Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).
Conversion flags (optional), which affect the result of some conversion types.
Minimum field width (optional). If specified as an '*' (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.
Precision (optional), given as a '.' (dot) followed by the precision. If specified as '*' (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.
Length modifier (optional).
Conversion type.
So the documentation doesn't clearly state what the interaction of width versus precision is. Let's explore some more.
>>> '%.4X' % 1024
'0400'
>>> '%6.4X' % 1024
'  0400'
>>> '%#.4X' % 1024
'0x0400'
>>> '%#8.4X' % 1024
'  0x0400'
>>> '%#08.4X' % 1024
'0x000400'
Curious. It appears that width (the part before the .) controls the whole field and space-pads by default, unless the 0 flag is given. Precision (the part after the .) controls only the digits themselves, and always 0-pads.
Let's take a look at new-style formatting. It's the future! (And by future I mean it's available now and has been for many years.)
https://docs.python.org/2/library/string.html#format-specification-mini-language
width is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.
When no explicit alignment is given, preceding the width field by a zero ('0') character enables sign-aware zero-padding for numeric types. This is equivalent to a fill character of '0' with an alignment type of '='.
The precision is a decimal number indicating how many digits should be displayed after the decimal point for a floating point value formatted with 'f' and 'F', or before and after the decimal point for a floating point value formatted with 'g' or 'G'. For non-number types the field indicates the maximum field size - in other words, how many characters will be used from the field content. The precision is not allowed for integer values.
Much more clearly specified! {0:06X} is valid, {0:0.6X} is not.
>>> '{0:06x}'.format(1024)
'000400'
>>> '{0:0.6x}'.format(1024)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Precision not allowed in integer format specifier
06 means that if the data passed in is fewer than 6 digits long, it will be prepended with 0s to fill that width. The x denotes the type of data; in this case the format string is expecting a hexadecimal number.
1024 in hexadecimal is 400, which is why you get 000400 as your result.
For 0.6x, the . introduces the precision. So %0.6x means:
% - the start of the format specification.
A 0 flag, which means that numeric values are padded with 0 rather than spaces to fill the field.
A ., which is the precision modifier. The number after it (6) is the precision, i.e. the minimum number of digits to produce.
Finally, the x, which is the conversion type, in this case hexadecimal.
Since the precision alone already forces six zero-padded digits here, the results of both operations are the same.
These happen to be equivalent. That doesn't mean you can always ignore the ., though; the reason for the equivalence is pretty specific and doesn't generalize.
0.6X specifies a precision of 6, whereas 06X specifies a minimum field width of 6. The documentation doesn't say what a precision does for the X conversion type, but Python follows the behavior of printf here, where a precision for X is treated as a minimum number of digits to print.
With 0.6X, the formatting produces at least 6 digits, adding leading zeros if necessary. With 06X, the formatting produces a result at least 6 characters wide; that result would normally be padded with spaces, but the 0 flag says to pad with zeros instead. Overall, the behavior works out to be the same.
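For instance, here is a quick sketch (with toy values of my own) of where the two specifications stop being interchangeable: for a signed decimal the sign counts toward the width but not the precision, and for strings the precision truncates while the width only pads.
>>> '%0.6d' % -42      # precision counts digits only; the sign is extra
'-000042'
>>> '%06d' % -42       # width counts the sign too, so one fewer zero
'-00042'
>>> '%.3s' % 'abcdef'  # precision truncates strings...
'abc'
>>> '%6s' % 'abcdef'   # ...while width only pads, never truncates
'abcdef'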
Related
I'm dividing a very long number into a much smaller number. Both are of type decimal.Decimal().
The result is coming out in scientific notation. How do I stop this? I need to print the number in full.
>>> decimal.getcontext().prec
50
>>> val
Decimal('1000000000000000000000000')
>>> units
Decimal('1500000000')
>>> units / val
Decimal('1.5E-15')
The precision is kept internally - you just have to explicitly call for the number of decimal places you want at the point you are exporting your decimal value to a string.
So, if you are going to print the value, or insert it into an HTML template, the first step is to use the string format method (or f-strings) with enough decimal places to cover the whole number:
In [29]: print(f"{units/val:.50f}")
0.00000000000000150000000000000000000000000000000000
Unfortunately, the string-format mini-language has no way to eliminate the redundant zeros on the right-hand side by itself. (The left side can be padded with "0", " ", or whatever custom character one wants, but all of the precision after the decimal separator is rendered as trailing 0s.)
Since finding the least significant non-zero digit is complicated (otherwise we could use a parameter extracted from the number instead of the "50" for the precision in the format expression), the simpler thing is to remove those zeros after the formatting takes place, with the string .rstrip method:
In [30]: print(f"{units/val:.50f}".rstrip("0"))
0.0000000000000015
In short, this seems to be the only way to go: at every interface point where the number leaves the core of the program for an output where it is represented as a string, format it with an excess of precision in fixed-point notation and strip out the trailing zeros:
return template.render(number=f"{number:.50f}".rstrip("0"), ...)
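If you want to wrap that pattern up, a small helper along these lines also takes care of the dangling "." you would otherwise get for whole numbers (to_plain_string is just a name I made up for this sketch; it assumes Python 3.6+ for the f-string):
from decimal import Decimal

def to_plain_string(number, max_places=50):
    # Fixed-point format with an excess of precision, then strip the
    # redundant trailing zeros and any dangling decimal point.
    return f"{number:.{max_places}f}".rstrip("0").rstrip(".")

print(to_plain_string(Decimal("1500000000") / Decimal("1000000000000000000000000")))
# 0.0000000000000015
print(to_plain_string(Decimal("2")))
# 2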
Render the decimal into a formatted string with a float type-indicator {:,f}, and it will display just the right number of digits to express the whole number, regardless of whether it is a very large integer or a very large decimal.
>>> val
Decimal('1000000000000000000000000')
>>> units
Decimal('1500000000')
>>> "{:,f}".format(units / val)
'0.0000000000000015'
# very large decimal integer, formatted as float-type string, appears without any decimal places at all when it has none! Nice!
>>> "{:,f}".format(units * val)
'1,500,000,000,000,000,000,000,000,000,000,000'
You don't need to specify the decimal places. It will display only as many as required to express the number, omitting that trail of useless zeros that appear after the final decimal digit when the decimal is shorter than a fixed format width. And you don't get any decimal places if the number has no fraction part.
Very large numbers are therefore accommodated without having to second-guess how large they will be. And you don't have to second-guess whether they will have decimal places either.
Any specified thousands separator {:,f} will likewise only have effect if it turns out that the number is a large integer instead of a long decimal.
Proviso
Decimal(), however, has this idea of significant places, by which it will add trailing zeros if it thinks you want them.
The idea is that it intelligently handles situations where you might be dealing with currency digits such as £ 10.15. To use the example from the documentation:
>>> decimal.Decimal('1.30') + decimal.Decimal('1.20')
Decimal('2.50')
It makes no difference if you format the Decimal() - you still get the trailing zero if the Decimal() deems it to be significant:
>>> "{:,f}".format( decimal.Decimal('1.30') + decimal.Decimal('1.20'))
'2.50'
The same thing happens (perhaps for some good reason?) when you treat thousands and fractions together:
>>> decimal.Decimal(2500) * decimal.Decimal('0.001')
Decimal('2.500')
Remove significant trailing zeros with the Decimal().normalize() method:
>>> (2500 * decimal.Decimal('0.001')).normalize()
Decimal('2.5')
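Combining the two ideas, you can normalize() first and then format with {:,f}, so the spurious trailing zeros disappear while the output still avoids scientific notation (a short sketch, assuming decimal is imported as above):
>>> total = decimal.Decimal(2500) * decimal.Decimal('0.001')
>>> total
Decimal('2.500')
>>> "{:,f}".format(total.normalize())
'2.5'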
I have been reading the Python documentation on the format operator '%' and have encountered some questions.
A conversion specifier contains two or more characters and has the following components, which must occur in this order:
1. The '%' character, which marks the start of the specifier.
2. Mapping key (optional), consisting of a parenthesised sequence of characters (for example, (somename)).
3. Conversion flags (optional), which affect the result of some conversion types.
4. Minimum field width (optional). If specified as an '*' (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision.
5. Precision (optional), given as a '.' (dot) followed by the precision. If specified as '*' (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision.
6. Length modifier (optional).
7. Conversion type.
A length modifier (h, l, or L) may be present, but is ignored as it is not necessary for Python – so e.g. %ld is identical to %d.
These are the two parts that I don't understand:
For the minimum field width, what does "If specified as an '*' (asterisk), the actual width is read from the next element of the tuple in values, and the object to convert comes after the minimum field width and optional precision." mean?
Similarly, for the precision, what does "If specified as '*' (an asterisk), the actual precision is read from the next element of the tuple in values, and the value to convert comes after the precision." mean?
And for the length modifier (h, l, L), what does each of them do to the formatting?
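To illustrate what the '*' placeholders do in practice, here is a small sketch with toy values of my own: the width and the precision are read from the argument tuple, and the value to convert comes after them, so the first two lines below are equivalent. The length modifier, as the documentation says, is simply ignored.
>>> '%*.*f' % (10, 3, 3.14159)   # width 10 and precision 3 come from the tuple
'     3.142'
>>> '%10.3f' % 3.14159           # the same thing with the numbers written literally
'     3.142'
>>> '%ld' % 42                   # %ld is identical to %d
'42'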
From unicodedata doc:
unicodedata.digit(chr[, default]) Returns the digit value assigned to the character chr as integer. If no such value is defined, default is returned, or, if not given, ValueError is raised.
unicodedata.numeric(chr[, default]) Returns the numeric value assigned to the character chr as float. If no such value is defined, default is returned, or, if not given, ValueError is raised.
Can anybody explain me the difference between those two functions?
Here one can read the implementation of both functions, but from a quick look it is not evident to me what the difference is, because I'm not familiar with the CPython implementation.
EDIT 1:
An example that shows the difference would be nice.
EDIT 2:
Examples useful to complement the comments and the spectacular answer from #user2357112:
import unicodedata

print(unicodedata.digit('1'))    # Decimal digit one.
print(unicodedata.digit('١'))    # ARABIC-INDIC DIGIT ONE.
print(unicodedata.digit('¼'))    # Not a digit, so "ValueError: not a digit" will be raised.
print(unicodedata.numeric('Ⅱ'))  # Roman numeral two.
print(unicodedata.numeric('¼'))  # Fraction representing one quarter.
Short answer:
If a character represents a decimal digit, so things like 1, ¹ (SUPERSCRIPT ONE), ① (CIRCLED DIGIT ONE), ١ (ARABIC-INDIC DIGIT ONE), unicodedata.digit will return the digit that character represents as an int (so 1 for all of these examples).
If the character represents any numeric value, so things like ⅐ (VULGAR FRACTION ONE SEVENTH) and all the decimal digit examples, unicodedata.numeric will give that character's numeric value as a float.
For technical reasons, more recent digit characters like 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) may raise a ValueError from unicodedata.digit.
Long answer:
Unicode characters all have a Numeric_Type property. This property can have 4 possible values: Numeric_Type=Decimal, Numeric_Type=Digit, Numeric_Type=Numeric, or Numeric_Type=None.
Quoting the Unicode standard, version 10.0.0, section 4.6,
The Numeric_Type=Decimal property value (which is correlated with the General_Category=Nd property value) is limited to those numeric characters that are used in decimal-radix numbers and for which a full set of digits has been encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range.
Numeric_Type=Decimal characters are thus decimal digits fitting a few other specific technical requirements.
Decimal digits, as defined in the Unicode Standard by these property assignments, exclude some characters, such as the CJK ideographic digits (see the first ten entries in Table 4-5), which are not encoded in a contiguous sequence. Decimal digits also exclude the compatibility subscript and superscript digits, to prevent simplistic parsers from misinterpreting their values in context. (For more information on superscript and subscripts, see Section 22.4, Superscript and Subscript Symbols.) Traditionally, the Unicode Character Database has given these sets of noncontiguous or compatibility digits the value Numeric_Type=Digit, to recognize the fact that they consist of digit values but do not necessarily meet all the criteria for Numeric_Type=Decimal. However, the distinction between Numeric_Type=Digit and the more generic Numeric_Type=Numeric has proven not to be useful in implementations. As a result, future sets of digits which may be added to the standard and which do not meet the criteria for Numeric_Type=Decimal will simply be assigned the value Numeric_Type=Numeric.
So Numeric_Type=Digit was historically used for other digits not fitting the technical requirements of Numeric_Type=Decimal, but they decided that wasn't useful, and digit characters not meeting the Numeric_Type=Decimal requirements have just been assigned Numeric_Type=Numeric since Unicode 6.3.0. For example, 🄌 (DINGBAT NEGATIVE CIRCLED SANS-SERIF DIGIT ZERO) introduced in Unicode 7.0 has Numeric_Type=Numeric.
Numeric_Type=Numeric is for all characters that represent numbers and don't fit in the other categories, and Numeric_Type=None is for characters that don't represent numbers (or at least, don't under normal usage).
All characters with a non-None Numeric_Type property have a Numeric_Value property representing their numeric value. unicodedata.digit will return that value as an int for characters with Numeric_Type=Decimal or Numeric_Type=Digit, and unicodedata.numeric will return that value as a float for characters with any non-None Numeric_Type.
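To make the int-versus-float distinction concrete, here is a short session (a sketch; the exact behaviour for a given character can depend on the Unicode database bundled with your Python build):
>>> import unicodedata
>>> unicodedata.digit('1'), unicodedata.numeric('1')    # Numeric_Type=Decimal
(1, 1.0)
>>> unicodedata.digit('①'), unicodedata.numeric('①')    # CIRCLED DIGIT ONE, Numeric_Type=Digit
(1, 1.0)
>>> unicodedata.numeric('¼')                            # Numeric_Type=Numeric: numeric value only
0.25
>>> unicodedata.digit('¼', None) is None                # no digit value is defined
True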
I can't quite understand the difference between the two print statements below for the number I am trying to express in scientific notation. I thought the bottom one was supposed to allow 2 spaces for the printed result and move the decimal place 4 times, but the result I get does not corroborate that understanding. And as for the first one, what does 4e mean?
>>> print('{:.4e}'.format(3454356.7))
3.4544e+06
>>> print('{:2.4}'.format(3454356.7))
3.454e+06
All help greatly appreciated.
In the first example, .4e means 4 decimal places in scientific notation. You can see that by trying:
>>> print('{:.4e}'.format(3454356.7))
3.4544e+06
>>> print('{:.5e}'.format(3454356.7))
3.45436e+06
>>> print('{:.6e}'.format(3454356.7))
3.454357e+06
In the second example, .4 means 4 significant figures, and 2 is the minimum field width; since the result is already wider than 2 characters, the width has no visible effect here.
>>> print('{:2.4}'.format(3454356.7))
3.454e+06
>>> print('{:2.5}'.format(3454356.7))
3.4544e+06
>>> print('{:2.6}'.format(3454356.7))
3.45436e+06
Testing with a different value in place of the 2:
>>> print('-{:20.6}'.format(3454356.7))
-         3.45436e+06
You can learn more from the Python documentation on format.
If you want to produce a float, you will have to specify the float type:
>>> '{:2.4f}'.format(3454356.7)
'3454356.7000'
Otherwise, if you don't specify a type, Python will choose g as the type, for which the precision means the number of significant figures, counting the digits before and after the decimal point together. And since you have a precision of 4, it will only display 4 significant digits, falling back to scientific notation so it doesn't add false precision.
The precision is a decimal number indicating how many digits should be displayed after the decimal point for a floating point value formatted with 'f' and 'F', or before and after the decimal point for a floating point value formatted with 'g' or 'G'. For non-number types the field indicates the maximum field size - in other words, how many characters will be used from the field content. The precision is not allowed for integer values.
(source, emphasis mine)
Finally, note that the width (the 2 in the above format string) covers the full result, including the digits before the decimal point, the digits after it, the decimal point itself, and the components of the scientific notation. The above result would have a width of 12, so in this case the width in the format string is simply ignored.
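To actually see the width do something, you have to ask for more characters than the result needs (a quick sketch with the same toy value):
>>> '{:12.4}'.format(3454356.7)   # width 12: numbers are right-aligned, so the padding goes on the left
'   3.454e+06'
>>> '{:2.4}'.format(3454356.7)    # width 2 is narrower than the result, so it is ignored
'3.454e+06'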
How to truncate a string using str.format in Python? Is it even possible?
There is a width parameter mentioned in the Format Specification Mini-Language:
format_spec ::= [[fill]align][sign][#][0][width][,][.precision][type]
...
width ::= integer
...
But specifying it apparently only works for padding, not truncating:
>>> '{:5}'.format('aaa')
'aaa  '
>>> '{:5}'.format('aaabbbccc')
'aaabbbccc'
So it's really more of a minimum width than a width.
I know I can slice strings, but the data I process here is completely dynamic, including the format string and the args that go in. I cannot just go and explicitly slice one.
Use .precision instead:
>>> '{:5.5}'.format('aaabbbccc')
'aaabb'
According to the documentation of the Format Specification Mini-Language:
The precision is a decimal number indicating how many digits should be displayed after the decimal point for a floating point value formatted with 'f' and 'F', or before and after the decimal point for a floating point value formatted with 'g' or 'G'. For non-number types the field indicates the maximum field size - in other words, how many characters will be used from the field content. The precision is not allowed for integer values.
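Since your format string and arguments are dynamic anyway, you can also supply the maximum length at run time through nested replacement fields instead of baking it into the string (a sketch; max_len is just a made-up name):
>>> max_len = 5
>>> '{:.{n}}'.format('aaabbbccc', n=max_len)     # truncate to at most n characters
'aaabb'
>>> '{:{n}.{n}}'.format('aaabbbccc', n=max_len)  # pad and truncate to exactly n
'aaabb'
>>> '{:{n}.{n}}'.format('ab', n=max_len)
'ab   '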
You may truncate with the precision parameter alone:
>>> '{:.1}'.format('aaabbbccc')
'a'
The width parameter sets the padded (minimum) size:
>>> '{:3}'.format('ab')
'ab '