Say I have an arbitrary Pint quantity q. Is there a way to display its units in symbol short form, instead of as a full-length word?
In other words, how would I code unit_symbol() such that it returns "m", not "meter"; "kg", not "kilogram"; etc.? Is there a way to retrieve the short-form unit symbol that is synonymous with the quantity's current unit?
import pint
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity
def unit_symbol(q: pint.Quantity) -> str:
    # Intended to return "m", not "meter"
    # "kg" not "kilogram"
    # etc.
    # ???
    return q.units  # returns long-form unit, "meter", "kilogram" etc. :-(

q = Q_(42, ureg.m)
print(unit_symbol(q))  # "meter"... whereas I would like "m"
The above obviously fails to achieve this; it returns the long-form unit.
You can use '~' as a spec for the unit formatting:
q = Q_(42, "m") / Q_(1, "second")
print(format(q, '~')) # 42.0 m / s
print(format(q.u, '~')) # m / s
This feature is apparently undocumented, but can be inferred from the source code for Unit.__format__ (search for "~" on that page to quickly navigate to the relevant piece of code).
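For completeness, the '~' modifier can also be combined with an ordinary numeric format spec in an f-string (a quick illustration; the exact spacing of the output may vary slightly between Pint versions):
q = Q_(42, "m") / Q_(1, "second")
print(f"{q:~}")      # 42.0 m / s
print(f"{q:.2f~P}")  # 42.00 m/s  ('~' for short symbols, 'P' for pretty print)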
I found UnitRegistry.get_symbol(),
ureg.get_symbol(str(q.units)) # "m"
but it seems a bit clunky: converting the unit to a string, then parsing that string again...
Also, this fails for composite units, e.g.
q = Q_(42, "m") / Q_(1, "second")
ureg.get_symbol(str(q.units))
# UndefinedUnitError: 'meter / second' is not defined in the unit registry
Use ureg.default_format = '~' if you want the short notation by default. These are also valid options for short units: ~L (LaTeX), ~H (HTML) and ~P (Pretty print).
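For example (a minimal sketch; the exact output spacing may differ between Pint versions):
ureg.default_format = '~'                   # short symbols everywhere by default
print(Q_(42, "m") / Q_(1, "second"))        # 42.0 m / s
print(format(Q_(9.81, "m/s**2"), '~P'))     # 9.81 m/s²  (pretty print, per-call override)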
Basically the title. In pint, is there a way to define the default string formatting per dimension or per unit, instead of 'across the board'?
Stated more precisely: I want to format a quantity's numerical value (i.e., magnitude), based on its physical unit.
Here is what I tried, based on the code shown in the docs:
from pint import UnitRegistry, Unit
ureg = UnitRegistry()
# Specific format for km won't stick...
ureg.default_format = ".0f~"
ureg.km.default_format = ".2fP"
ureg.km.default_format # '.0f~'
# ...as also seen here:
dist = 3 * ureg("km")
time = 500 * ureg("s")
print(f"{dist}, {time}")
# wanted: 3.00 kilometer, 500 s
# got: 3 km, 500 s
Especially when dealing with prices, it's practical to be able to set a two-decimal default for that unit, while all other units keep a different default format.
PS: I know it's possible to set a default formatting on an individual quantity (e.g. dist.default_format = '.2f~'), but that's too specific for my use case. I want all quantities with the unit 'km' to be displayed with 2 decimals.
I have constructed quite the hacky solution:
from pint import UnitRegistry, Quantity
ureg = UnitRegistry()
# Setting specific formats:
formatdict = {ureg.km: '.2fP'} # extend as required
Quantity.default_format = property(lambda self: formatdict.get(self.u, ".0f~"))
# Works:
dist = 3 * ureg("km")
time = 500 * ureg("s")
print(f"{dist}, {time}") # 3.00 kilometer, 500 s
This works, but I'd be surprised if there isn't a better solution.
EDIT
It only works in a limited sense. ureg.default_format gets changed as well, which prohibits its use in e.g. a pandas.DataFrame:
ureg.default_format # <property at 0x21148ed5ae0>
I'm new to python.
I want to make a calculator and I am facing a problem right now.
Here's a simplified version of the code I am trying to write:
from math import *
input = "(2)(3)e(sqrt(49))pi"    # This is an example equation
equation = "(2)*(3)*e*(sqrt(49))*pi"    # The desired output
How can I add "*" between every ")(", ")e", "e(", and so on, based on the equation, so that I can eval(equation) without having to put "*" in manually, just like in real-life math?
I have tried to do it by making a code like this:
from math import *
input = "(2)(3)e(sqrt(49))pi"
input = input.replace(")(", ")*(")
input = input.replace(")e", ")*e")
input = input.replace("e(", "e*(")
input = input.replace(")pi", ")*pi")
#^^^I can loop this using for loop^^^
equation = input
print(eval(equation))
This obviously only works for this particular equation. I could loop the replacement method, but that would be very inefficient: I don't want 49 iterations just to check whether 7 different symbols need "*" between them or not.
The issue you will encounter here is that "e(" should be transformed to "e*(" but "sqrt(" should stay. As comments have suggested, the best or "cleanest" solution would be to write a proper parser for your equation. You could put "calculator parser" into your favorite search engine for a quick solution, or if you are interested in over-engineering but learning a lot, you could have a look at parser generators such as ANTLR.
If, for some reason, neither of those are an option, a quick-and-dirty solution could be this:
import re
def add_multiplication_symbols(equation: str) -> str:
    constants = ['e', 'pi']
    constants_re = '|'.join(f'(?:{re.escape(c)})' for c in constants)
    equation = re.sub(r'(\))(\(|\w+)', r'\1*\2', equation)
    equation = re.sub(f'({constants_re})' + r'(\()', r'\1*\2', equation)
    return equation
Then print(add_multiplication_symbols("(2)(3)e(sqrt(49))pi")) results in (2)*(3)*e*(sqrt(49))*pi.
The function makes use of the re module (regular expressions) to group the cases for all constants together. It tries to work around the issue I described above by defining a set of constant variables (e.g. "e" and "pi") by hand.
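If you also want implicit multiplication after a digit (e.g. "2(3)" or "2pi"), the same idea extends with two more substitutions. This is an untested sketch building on the function above; note that it would mangle scientific notation such as 2e3 and names ending in a digit such as log2:
import re

def add_multiplication_symbols_v2(equation: str) -> str:
    constants = ['e', 'pi']
    constants_re = '|'.join(f'(?:{re.escape(c)})' for c in constants)
    equation = re.sub(r'(\))(\(|\w+)', r'\1*\2', equation)                # ")(", ")e", ")pi"
    equation = re.sub(f'({constants_re})' + r'(\()', r'\1*\2', equation)  # "e(", "pi("
    equation = re.sub(r'(\d)(\()', r'\1*\2', equation)                    # "2(" -> "2*("  (breaks names like log2)
    equation = re.sub(r'(\d)(' + constants_re + r')', r'\1*\2', equation) # "2pi" -> "2*pi" (breaks 2e3)
    return equation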
I'm new to programming so I thought I'd ask here for help.
So when I use:
eval('12.5 + 3.2'),
it converts 12.5 and 3.2 into floats.
But I want them to be converted into the Decimal datatype.
I can use:
from decimal import Decimal
eval(Decimal(12.5) + Decimal(3.2))
But I can't do that in my program as I'm accepting user input.
I've found a solution but it uses regular expressions, which I'm not familiar with right now (and I can't find it again for some reason).
It would be great if someone could help me out. Thanks!
UPDATE: apparently the official docs have a recipe that does exactly what you're looking for. From https://docs.python.org/3/library/tokenize.html#examples:
from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP
from io import BytesIO
def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print(+21.3e-5*-.1234/81.7)'
    >>> decistmt(s)
    "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"

    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows). Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.

    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7

    Output from calculations with Decimal should be identical across all
    platforms.

    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = tokenize(BytesIO(s.encode('utf-8')).readline)  # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result).decode('utf-8')
Which you can then use like so:
from decimal import Decimal
s = "12.5 + 3.2 + 1.0000000000000001 + (1.0 if 2.0 else 3.0)"
s = decistmt(s)
print(s)
print(eval(s))
Result:
Decimal ('12.5')+Decimal ('3.2')+Decimal ('1.0000000000000001')+(Decimal ('1.0')if Decimal ('2.0')else Decimal ('3.0'))
17.7000000000000001
Feel free to skip the rest of this answer, which is now only of interest to historians of half-correct solutions.
As far as I know, there's no easy way to "hook into" eval in order to change how it interprets float objects.
But if we use the ast module to convert your string into an abstract syntax tree before evaling it, then we can manipulate the tree to replace the floats with Decimal calls.
import ast
from decimal import Decimal

def construct_decimal_node(value):
    return ast.Call(
        func=ast.Name(id="Decimal", ctx=ast.Load()),
        args=[value],
        keywords=[]
    )

class FloatLiteralReplacer(ast.NodeTransformer):
    def visit_Num(self, node):
        return construct_decimal_node(node)

s = '12.5 + 3.2'
node = ast.parse(s, mode="eval")
node = FloatLiteralReplacer().visit(node)
ast.fix_missing_locations(node)  # add location information to the nodes we created
code = compile(node, filename="", mode="eval")
result = eval(code)
print("The type of the result of this expression is:", type(result))
print("The result of this expression is:", result)
Result:
The type of the result of this expression is: <class 'decimal.Decimal'>
The result of this expression is: 15.70000000000000017763568394
As you can see, the result is identical to what you would have gotten if you had calculated Decimal(12.5) + Decimal(3.2) directly.
But perhaps you're thinking "Why isn't the result 15.7?". This is because Decimal(3.2) is not exactly identical to 3.2. It's actually equal to 3.20000000000000017763568394002504646778106689453125. This is a hazard of initializing decimals from float objects -- the inaccuracy is already present. It's better to use strings to create decimals, e.g. Decimal("3.2").
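To see the difference directly:
from decimal import Decimal
print(Decimal(3.2))    # 3.20000000000000017763568394002504646778106689453125
print(Decimal("3.2"))  # 3.2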
Maybe you're now thinking "Ok, so how do I turn 12.5 + 3.2 into Decimal("12.5") + Decimal("3.2")?". The quickest approach would be to modify construct_decimal_node so the Call's args is an ast.Str rather than an ast.Num:
import ast
from decimal import Decimal

def construct_decimal_node(value):
    return ast.Call(
        func=ast.Name(id="Decimal", ctx=ast.Load()),
        args=[ast.Str(str(value.n))],
        keywords=[]
    )

class FloatLiteralReplacer(ast.NodeTransformer):
    def visit_Num(self, node):
        return construct_decimal_node(node)

s = '12.5 + 3.2'
node = ast.parse(s, mode="eval")
node = FloatLiteralReplacer().visit(node)
ast.fix_missing_locations(node)  # add location information to the nodes we created
code = compile(node, filename="", mode="eval")
result = eval(code)
print("The type of the result of this expression is:", type(result))
print("The result of this expression is:", result)
Result:
The type of the result of this expression is: <class 'decimal.Decimal'>
The result of this expression is: 15.7
But take care: while I expect this approach to return good results most of the time, there is a corner case where it returns surprising results: when the expression contains a float literal written with more precision than a float can represent, so that the digits you typed are already lost before str() ever sees the value.
For example, if you changed s in the above code to "1.0000000000000001 + 0", the result would be 1.0. This is incorrect, since the result of Decimal("1.0000000000000001") + Decimal("0") is 1.0000000000000001.
I'm not sure how you could prevent this problem... By the time ast.parse has finished executing, the float literal has already been converted into a float object, and there's no obvious way to retrieve the string that was used to create it. Perhaps you could extract it from the expression string, but you'd basically have to reinvent Python's parser to do that.
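A quick way to see why: the extra digits are discarded when the literal is parsed, long before any AST transformation runs.
print(1.0000000000000001 == 1.0)    # True: the literal is rounded at parse time
print(float("1.0000000000000001"))  # 1.0
(This is exactly the case the tokenize-based recipe in the UPDATE above handles, since it works on the literal's text before it is ever converted to a float.)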
I have looked on this website for something similar, and attempted to debug using previous answers, and failed.
I'm testing a module (which I did not write) that raises the grade value of a course's grade, e.g. from a B- to a B, but never across base letter levels (i.e., never B+ to A-).
The original module is called transcript.py
I'm testing it in my own testtranscript.py
I'm testing that module by importing it: 'import transcript' and 'import cornelltest'
I have ensured that all files are in the same folder/directory.
The function raise_grade is defined in transcript.py (there are multiple functions in this module, but raise_grade is the only one giving me any trouble).
ti is in the form ('class name', 'gradeval')
There's already another function converting floats to letter strings and back (i.e., 3.0 --> B).
def raise_grade(ti):
    """Raise gradeval of transcript line ti by a non-noticeable amount."""
    # value of the base letter grade, e.g., 4 (or 4.0) for a 4.3
    bval = int(ti.gradeval)
    print 'bval is:"' + str(bval) + '"'
    # part after decimal point in raised grade, e.g., 3 (or 3.0) for a 4.3
    newdec = min(int((ti.gradeval + .3)*10) % 10, 3)
    print 'newdec is:"' + str(newdec) + '"'
    # get result by adding the two values together, after shifting newdec one
    # decimal place
    newval = bval + round(newdec/10.0, 1)
    ti.gradeval = newval
    print 'newval is:"' + str(newval) + '"'
I will probably get rid of the print later.
When I run testtranscript, which imports transcript:
def test_raise():
    """test raise_grade"""
    testobj = transcript.Titem('CS1110','B-')
    transcript.raise_grade('CS1110','B-')
    cornelltest.assert_floats_equal(3.0, transcript.lettergrade_to_val("B-"))
I get this from the cmd shell:
TypeError: raise_grade takes exactly 1 argument (2 given)
Edit1: So now I see that I am giving it two parameters when raise_grade(ti) takes just one, but perhaps it would shed more light if I posted the rest of the code. I'm still stuck on why I now get a "'str' object has no attribute 'gradeval'" error.
LETTER_LIST = ['B', 'A']

# List of valid modifiers to base letter grades.
MODIFIER_LIST = ['-','+']

def lettergrade_to_val(lg):
    """Returns: numerical value of letter grade lg.

    The usual numerical scheme is assumed: A+ -> 4.3, A -> 4.0, A- -> 3.7, etc.

    Precondition: lg is a 1 or 2-character string consisting of a "base" letter
    in LETTER_LIST optionally followed by a modifier in MODIFIER_LIST."""
    # if LETTER_LIST or MODIFIER_LIST change, the implementation of
    # this function must change.

    # get value of base letter. Trick: index in LETTER_LIST is shifted from value
    bv = LETTER_LIST.index(lg[0]) + 3

    # Trick with indexing in MODIFIER_LIST to get the modifier value
    return bv + ((MODIFIER_LIST.index(lg[1]) - .5)*.3/.5 if (len(lg) == 2) else 0)
class Titem(object):
    """A Titem is an 'item' on a transcript, like "CS1110 A+"

    Instance variables:
        course [string]: course name. Always at least 1 character long.
        gradeval [float]: the numerical equivalent of the letter grade.
            Valid letter grades are 1 or 2 chars long, and consist
            of a "base" letter in LETTER_LIST optionally followed
            by a modifier in MODIFIER_LIST.

            We store values instead of letter grades to facilitate
            calculations of GPA later.

            (In "real" life, one would write a function that,
            when displaying a Titem, would display the letter
            grade even though the underlying representation is
            numerical, but we're keeping things simple for this
            lab.)
    """

    def __init__(self, n, lg):
        """Initializer: A new transcript line with course (name) n, gradeval
        the numerical equivalent of letter grade lg.

        Preconditions: n is a non-empty string.
            lg is a string consisting of a "base" letter in LETTER_LIST
            optionally followed by modifier in MODIFIER_LIST.
        """
        # assert statements that cause an error when preconditions are violated
        assert type(n) == str and type(lg) == str, 'argument type error'
        assert (len(n) >= 1 and 0 < len(lg) <= 2 and lg[0] in LETTER_LIST and
                (len(lg) == 1 or lg[1] in MODIFIER_LIST)), 'argument value error'
        self.course = n
        self.gradeval = lettergrade_to_val(lg)
Edit2: I understand the original problem now... but it seems that the original author got the code wrong, since raise_grade doesn't work properly for grade values going from 3.7 to 4.0: bval truncates the original float with int(), which gives the wrong base value in this case.
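To make Edit2 concrete, here is a quick check (not part of the lab code, assuming transcript is imported as in the question) using an A- grade, whose gradeval is 3.7:
ti = transcript.Titem('CS1110', 'A-')  # lettergrade_to_val('A-') == 3.7
transcript.raise_grade(ti)
print(ti.gradeval)                     # 3.0 rather than 4.0: int(3.7) truncates the base to 3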
You are calling the function incorrectly; you should be passing the testobj:
def test_raise():
    """test raise_grade"""
    testobj = transcript.Titem('CS1110','B-')
    transcript.raise_grade(testobj)
    ...
The raise_grade function is expecting a single argument ti which has a gradeval attribute, i.e. a Titem instance.
I wrote a simple procedure to calculate the average test coverage of some specific packages in a Java project. The raw data, in a huge HTML file, looks like this:
<body>
package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>
package pkg2 <line_coverage>22/222,<branch_coverage>55/555<end>
package pkg3 <line_coverage>33/333,<branch_coverage>66/666<end>
...
</body>
Given the specified packages "pkg1" and "pkg3", for example, the average line coverage is:
(11+33)/(111+333)
and average branch coverage is:
(44+66)/(444+666)
I wrote the following procedure to get the result, and it works well. But how can I implement this calculation in a functional style? Something like "(x, y) for x in ... for y in ... if ...". I know a little Erlang, Haskell and Clojure, so solutions in those languages are also appreciated. Thanks a lot!
from __future__ import division
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0

for line in datafile:
    for pkg in core_pkgs:
        ptn = re.compile('.*'+pkg+'.*'+'>(\d+)/(\d+).*>(\d+)/(\d+).*')
        match = ptn.match(line)
        if match is not None:
            cvln, tlln, cvbh, tlbh = match.groups()
            covered_lines += int(cvln)
            total_lines += int(tlln)
            covered_branches += int(cvbh)
            total_branches += int(tlbh)

print 'Line coverage:', '{:.2%}'.format(covered_lines / total_lines)
print 'Branch coverage:', '{:.2%}'.format(covered_branches/total_branches)
Down below you can find my Haskell solution. I will try to explain the important points I went through as I wrote it.
First you will find that I created a data structure for coverage data. It's generally a good idea to create data structures to represent whatever data you want to handle. This is in part because it makes it easier to design your code when you can think in terms of whatever you are designing – closely related to functional programming philosophies, and in part because it can eliminate a few bugs where you think you are doing something but are in actuality doing something else.
Related to the point before: The first thing I do is to convert the string-represented data into my own data structure. When you are doing functional programming, you are often doing things in "sweeps." You don't have a single function that converts data to your format, filters out the unwanted data and summarises the result. You have three different functions for each of those tasks, and you do them one at a time!
This is because functions are very composable, i.e. if you have three different ones, you can stick them together to form a single one if you want to. If you start with a single one, it is very difficult to take it apart to form three different ones.
The actual workings of the conversion function is actually quite uninteresting unless you are specifically doing Haskell. All it does is try to match each string with a regex, and if it succeeds, it adds the coverage data to the resulting list.
Again, mad composition is about to happen. I don't create a function to loop over a list of coverages and sum them up. I create a single function to sum two coverages, because I know I can use it together with the specialised fold loop (which is sort of like a for loop on steroids) to summarise all coverages in a list. There's no need for me to reinvent the wheel and create a loop myself.
Besides, my sumCoverages function works with a lot of specialised loops, so I don't have to write a ton of functions, I just stick my single function into a ton of pre-made library functions!
In the main function you will see what I mean by programming in "sweeps" or "passes" over the data. First I convert it to the internal format, then I filter out the unwanted data, then I summarise the remaining data. These are completely independent computations. That's functional programming.
You will also notice that I use two specialised loops there, filter and fold. This means that I don't have to write any loops myself, I just stick in a function to those standard library loops and let those take it from there.
import Data.Maybe (catMaybes)
import Data.List (foldl')
import Text.Printf (printf)
import Text.Regex (matchRegex, mkRegex)

corePkgs = ["d", "f"]

stats = [
    "d>11/23d>34/89d",
    "e>25/65e>13/25e",
    "f>36/92f>19/76"
  ]

format = mkRegex ".*(\\w+).*>([0-9]+)/([0-9]+).*>([0-9]+)/([0-9]+).*"

-- It might be a good idea to define a datatype for coverage data.
-- A bit of coverage data is defined as the name of the package it
-- came from, the lines covered, the total amount of lines, the
-- branches covered and the total amount of branches.
data Coverage = Coverage String Int Int Int Int

-- Then we need a way to convert the string data into a list of
-- coverage data. We do this by regex. We try to match on each
-- string in the list, and then we choose to keep only the successful
-- matches. Returned is a list of coverage data that was represented
-- by the strings.
convert :: [String] -> [Coverage]
convert = catMaybes . map match
  where match line = do
          [name, cl, tl, cb, tb] <- matchRegex format line
          return $ Coverage name (read cl) (read tl) (read cb) (read tb)

-- We need a way to summarise two coverage data bits. This can of course also
-- be used to summarise entire lists of coverage data, by folding over it.
sumCoverage (Coverage nameA clA tlA cbA tbA) (Coverage nameB clB tlB cbB tbB) =
  Coverage (nameA ++ nameB ++ ",") (clA + clB) (tlA + tlB) (cbA + cbB) (tbA + tbB)

main = do
  -- First we need to convert the strings to coverage data
  let coverageData = convert stats
      -- Then we want to filter out only the relevant data
      relevantData = filter (\(Coverage name _ _ _ _) -> name `elem` corePkgs) coverageData
      -- Then we need to summarise it, but we are only interested in the numbers
      Coverage _ cl tl cb tb = foldl' sumCoverage (Coverage "" 0 0 0 0) relevantData
  -- So we can finally print them!
  printf "Line coverage: %.2f\n" (fromIntegral cl / fromIntegral tl :: Double)
  printf "Branch coverage: %.2f\n" (fromIntegral cb / fromIntegral tb :: Double)
Here are some quickly-hacked, untested ideas applied to your code:
import numpy as np
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0

for pkg in core_pkgs:
    ptn = re.compile('.*'+pkg+'.*'+'>(\d+)/(\d+).*>(\d+)/(\d+).*')
    matches = map(ptn.match, datafile)  # map(function, iterable)
    statsList = [map(int, match.groups()) for match in matches if match]
    # statsList is a list of [cvln, tlln, cvbh, tlbh]
    stats = np.array(statsList)
    # sum each column across the matched lines
    covered_lines, total_lines, covered_branches, total_branches = stats.sum(axis=0)
Well, as you can see I haven't bothered to finish off the remaining loop, but I think the point is made by now. There's certainly a lot more than one way to do this; I elected to show off map() (which some will say makes this less efficient, and it probably does), as well as NumPy to get the (admittedly light) math done.
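For reference, here is one way to finish that thought in plain Python 3 without NumPy, keeping the matching in a small helper and doing the summing with zip and sum (a sketch only; the regex is the same one from the question):
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')

def extract(pkg, line):
    match = re.match(r'.*' + pkg + r'.*>(\d+)/(\d+).*>(\d+)/(\d+).*', line)
    return [int(g) for g in match.groups()] if match else None

rows = [extract(pkg, line) for line in datafile for pkg in core_pkgs]
rows = [row for row in rows if row is not None]
cl, tl, cb, tb = (sum(col) for col in zip(*rows))
print('Line coverage: {:.2%}'.format(cl / tl))    # 40.87%
print('Branch coverage: {:.2%}'.format(cb / tb))  # 32.12%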
This is the corresponding Clojure solution:
(defn extract-data
  "extract 4 integers from a string line according to a package name"
  [pkg line]
  (map read-string
       (rest (first
              (re-seq
               (re-pattern
                (str pkg ".*>(\\d+)/(\\d+).*>(\\d+)/(\\d+)"))
               line)))))

(defn scan-lines-by-pkg
  "scan all string lines and extract all data as integer sequences
  according to package names"
  [pkgs lines]
  (filter seq (for [pkg pkgs
                    line lines]
                (extract-data pkg line))))

(defn sum-data
  "add all data in valid lines together"
  [pkgs lines]
  (apply map + (scan-lines-by-pkg pkgs lines)))

(defn get-percent
  [covered all]
  (str (format "%.2f" (float (/ (* covered 100) all))) "%"))

(defn get-cov
  [pkgs lines]
  {:line-cov (apply get-percent (take 2 (sum-data pkgs lines)))
   :branch-cov (apply get-percent (drop 2 (sum-data pkgs lines)))})

(get-cov ["d" "f"] ["abc" "d>11/23d>34/89d" "e>25/65e>13/25e" "f>36/92f>19/76"])
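With the sample data above, this call should return {:line-cov "40.87%", :branch-cov "32.12%"}, in line with the Python version shown earlier.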