This question is a bit long; please bear with me.
I have a data structure with elements like this {x1, x2, x3, x4, x5}:
{0 0 0 0 0, 0 0 0 1 0, 1 1 1 1 0,.....}
They represent all the TRUE rows of a truth table; any 5-bit string not present in the set corresponds to a FALSE row. But I don't have the Boolean function corresponding to this set.
I have seen this question, but all the answers there assume the Boolean function is given, which is not the case here.
I need to build a ROBDD, and then a ZDD, from the given set data structure, preferably with available Python packages like these.
Any advice from experts? I am sure a lot has been done along these lines.
With the Python package dd, which can be installed using the package manager pip with pip install dd, it is possible to convert the set of variable assignments where the Boolean function is TRUE to a binary decision diagram.
The following example in Python assumes that the assignments where the function is TRUE are given as a set of strings.
from dd import autoref as _bdd

# assignments where the Boolean function is TRUE
data = {'0 0 0 0 0', '0 0 0 1 0', '1 1 1 1 0'}
# variable names
vrs = [f'x{i}' for i in range(1, 6)]
# convert the assignments to dictionaries
assignments = list()
for e in data:
    tpl = e.split()
    assignment = {k: bool(int(v)) for k, v in zip(vrs, tpl)}
    assignments.append(assignment)
# initialize a BDD manager
bdd = _bdd.BDD()
# declare variables
bdd.declare(*vrs)
# create binary decision diagram
u = bdd.false
for assignment in assignments:
    u |= bdd.cube(assignment)
# to confirm
satisfying_assignments = list(bdd.pick_iter(u))
print(satisfying_assignments)
For a faster implementation of BDDs, and for an implementation of ZDDs using the C library CUDD, the Cython module extensions dd.cudd and dd.cudd_zdd can be installed as follows:
pip download dd --no-deps
tar xzf dd-*.tar.gz
cd dd-*
python setup.py install --fetch --cudd --cudd_zdd
For this (small) example there is no practical speed difference between the pure Python module dd.autoref and the Cython module dd.cudd.
The above binary decision diagram (BDD) can be copied to a zero-suppressed binary decision diagram (ZDD) with the following code:
from dd import _copy
from dd import cudd_zdd
# initialize a ZDD manager
zdd = cudd_zdd.ZDD()
# declare variables
zdd.declare(*vrs)
# copy the BDD to a ZDD
u_zdd = _copy.copy_bdd(u, zdd)
# confirm
satisfying_assignments = list(zdd.pick_iter(u_zdd))
print(satisfying_assignments)
The module dd.cudd_zdd was added in dd == 0.5.6, so the above installation requires downloading the distribution of dd >= 0.5.6, either from PyPI, or from the GitHub repository.
I need to cross-validate some R code in Python. My code contains lots of pseudo-random number generation, so, for easier comparison, I decided to use rpy2 to generate those values in my Python code "from R".
As an example, in R, I have:
set.seed(1234)
runif(4)
[1] 0.1137034 0.6222994 0.6092747 0.6233794
In python, using rpy2, I have:
import rpy2.robjects as robjects
set_seed = robjects.r("set.seed")
runif = robjects.r("runif")
set_seed(1234)
print(runif(4))
[1] 0.1137034 0.6222994 0.6092747 0.6233794
as expected (the values match). However, I see strange behavior with the R sample function (the equivalent of numpy.random.choice).
As the simplest reproducible example, I have in R:
set.seed(1234)
sample(5)
[1] 1 3 2 4 5
while in python I have:
sample = robjects.r("sample")
set_seed(1234)
print(sample(5))
[1] 4 5 2 3 1
The results are different. Could anyone explain why this happens and/or provide a way to get similar values in R and python using the R sample function?
If you print the value of the R function RNGkind() in both situations, I suspect you won't get the same answer. The Python result looks like the default output, while your R result looks like the old buggy output.
For example, in R:
set.seed(1234, sample.kind = "Rejection")
sample(5)
#> [1] 4 5 2 3 1
set.seed(1234, sample.kind = "Rounding")
#> Warning in set.seed(1234, sample.kind = "Rounding"): non-uniform 'Rounding'
#> sampler used
sample(5)
#> [1] 1 3 2 4 5
set.seed(1234, sample.kind = "default")
sample(5)
#> [1] 4 5 2 3 1
Created on 2021-01-15 by the reprex package (v0.3.0)
So it looks to me as though you are still using the old "Rounding" method in your R session. You probably saved a workspace a long time ago and have reloaded it since. Don't do that; start with a clean workspace each session.
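If instead you want the Python side to reproduce those old values, a sketch along these lines should work (untested, using the same rpy2 setup as in the question). The R argument name sample.kind contains a dot, so it is passed via dictionary unpacking:
import rpy2.robjects as robjects

set_seed = robjects.r("set.seed")
sample = robjects.r("sample")

# request the pre-3.6.0 "Rounding" sampler explicitly (R emits a warning)
set_seed(1234, **{'sample.kind': 'Rounding'})
print(sample(5))
# expected to match the old R session: [1] 1 3 2 4 5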
Maybe give this a shot (Stack Overflow answer from here). Quoting the answer: "The p argument corresponds to the prob argument in the sample() function."
import numpy as np
np.random.choice(a, size=None, replace=True, p=None)
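For example, a rough numpy analogue of R's sample(5) is a draw of 1..5 without replacement. Note this reproduces only the semantics, not R's random stream, so the values will not match R's even with the same seed:
import numpy as np

# a permutation of 1..5, like R's sample(5)
rng = np.random.default_rng(1234)
print(rng.choice(np.arange(1, 6), size=5, replace=False))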
I'm coming from an R background, where it is trivial to do str(X) to get a text representation of a data structure and its values. Is there similar functionality in Python?
Note that in Python type(X) does not return the data structure- just the class of X which is not what I am looking for.
print(X) returns the values but not their types (the num in the R code below).
Example using R of my goal:
> df<-data.frame(1,2,3)
> str(df)
'data.frame': 1 obs. of 3 variables:
$ X1: num 1
$ X2: num 2
$ X3: num 3
There are many ways to print something represented as a common data structure in Python. I prefer pprint - https://docs.python.org/2/library/pprint.html.
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(df) # df in your case
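For example, with a hypothetical nested structure in place of df:
import pprint

data = {'X1': [1.0], 'X2': [2.0], 'X3': [3.0], 'nested': {'a': list(range(20))}}
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(data)
# small values stay on one line; anything too wide is broken across
# lines and indented, which makes the nesting visible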
Another option is to use the json serializer:
import json
print(json.dumps(df, sort_keys=True, indent=4))
If you are really asking about a pandas DataFrame, you can use any library for pretty-printing a df:
https://pypi.python.org/pypi/tabulate
https://pypi.python.org/pypi/PrettyTable
pprint works with a DataFrame too.
Another option is the native pandas property df.shape - http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.shape.html
But if you come from R (and RStudio), maybe you will like Jupyter http://jupyter.org/ - it displays everything you need in a good, pretty, interactive way. A little tutorial: http://songhuiming.github.io/pages/2017/04/02/jupyter-and-pandas-display/
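For instance, a quick sketch with tabulate (assuming df is a pandas DataFrame; both packages are third-party):
import pandas as pd
from tabulate import tabulate

df = pd.DataFrame({'X1': [1], 'X2': [2], 'X3': [3]})
print(tabulate(df, headers='keys', tablefmt='psql'))
print(df.dtypes)  # per-column types, roughly the information R's str() shows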
I wrote a simple procedure to calculate the average test coverage of some specific packages in a Java project. The raw data, in a huge HTML file, looks like this:
<body>
package pkg1 <line_coverage>11/111,<branch_coverage>44/444<end>
package pkg2 <line_coverage>22/222,<branch_coverage>55/555<end>
package pkg3 <line_coverage>33/333,<branch_coverage>66/666<end>
...
</body>
Given the specified packages "pkg1" and "pkg3", for example, the average line coverage is:
(11+33)/(111+333)
and average branch coverage is:
(44+66)/(444+666)
I wrote the following procedure to get the result, and it works well. But how can I implement this calculation in a functional style? Something like "(x,y) for x in ... for b in ... if ...". I know a little Erlang, Haskell and Clojure, so solutions in those languages are also appreciated. Thanks a lot!
from __future__ import division
import re

datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0
for line in datafile:
    for pkg in core_pkgs:
        ptn = re.compile('.*' + pkg + '.*' + '>(\d+)/(\d+).*>(\d+)/(\d+).*')
        match = ptn.match(line)
        if match is not None:
            cvln, tlln, cvbh, tlbh = match.groups()
            covered_lines += int(cvln)
            total_lines += int(tlln)
            covered_branches += int(cvbh)
            total_branches += int(tlbh)
print 'Line coverage:', '{:.2%}'.format(covered_lines / total_lines)
print 'Branch coverage:', '{:.2%}'.format(covered_branches / total_branches)
Down below you can find my Haskell solution. I will try to explain the important points I went through as I wrote it.
First you will find that I created a data structure for coverage data. It's generally a good idea to create data structures to represent whatever data you want to handle. This is in part because it makes it easier to design your code when you can think in terms of whatever you are designing – closely related to functional programming philosophies, and in part because it can eliminate a few bugs where you think you are doing something but are in actuality doing something else.
Related to the point before: The first thing I do is to convert the string-represented data into my own data structure. When you are doing functional programming, you are often doing things in "sweeps." You don't have a single function that converts data to your format, filters out the unwanted data and summarises the result. You have three different functions for each of those tasks, and you do them one at a time!
This is because functions are very composable, i.e. if you have three different ones, you can stick them together to form a single one if you want to. If you start with a single one, it is very difficult to take it apart to form three different ones.
The actual workings of the conversion function are quite uninteresting unless you are specifically doing Haskell. All it does is try to match each string with a regex, and if it succeeds, it adds the coverage data to the resulting list.
Again, mad composition is about to happen. I don't create a function to loop over a list of coverages and sum them up. I create a single function to sum two coverages, because I know I can use it together with the specialised fold loop (which is sort of like a for loop on steroids) to summarise all coverages in a list. There's no need for me to reinvent the wheel and create a loop myself.
Besides, my sumCoverage function works with a lot of specialised loops, so I don't have to write a ton of functions; I just stick my single function into a ton of pre-made library functions!
In the main function you will see what I mean by programming in "sweeps" or "passes" over the data. First I convert it to the internal format, then I filter out the unwanted data, then I summarise the remaining data. These are completely independent computations. That's functional programming.
You will also notice that I use two specialised loops there, filter and fold. This means that I don't have to write any loops myself, I just stick in a function to those standard library loops and let those take it from there.
import Data.Maybe (catMaybes)
import Data.List (foldl')
import Text.Printf (printf)
import Text.Regex (matchRegex, mkRegex)

corePkgs = ["d", "f"]

stats = [
    "d>11/23d>34/89d",
    "e>25/65e>13/25e",
    "f>36/92f>19/76"
  ]

format = mkRegex ".*(\\w+).*>([0-9]+)/([0-9]+).*>([0-9]+)/([0-9]+).*"

-- It might be a good idea to define a datatype for coverage data.
-- A bit of coverage data is defined as the name of the package it
-- came from, the lines covered, the total amount of lines, the
-- branches covered and the total amount of branches.
data Coverage = Coverage String Int Int Int Int

-- Then we need a way to convert the string data into a list of
-- coverage data. We do this by regex. We try to match on each
-- string in the list, and then we choose to keep only the successful
-- matches. Returned is a list of coverage data that was represented
-- by the strings.
convert :: [String] -> [Coverage]
convert = catMaybes . map match
  where match line = do
          [name, cl, tl, cb, tb] <- matchRegex format line
          return $ Coverage name (read cl) (read tl) (read cb) (read tb)

-- We need a way to summarise two coverage data bits. This can of course also
-- be used to summarise entire lists of coverage data, by folding over it.
sumCoverage (Coverage nameA clA tlA cbA tbA) (Coverage nameB clB tlB cbB tbB) =
  Coverage (nameA ++ nameB ++ ",") (clA + clB) (tlA + tlB) (cbA + cbB) (tbA + tbB)

main = do
  -- First we need to convert the strings to coverage data
  let coverageData = convert stats
      -- Then we want to filter out only the relevant data
      relevantData = filter (\(Coverage name _ _ _ _) -> name `elem` corePkgs) coverageData
      -- Then we need to summarise it, but we are only interested in the numbers
      Coverage _ cl tl cb tb = foldl' sumCoverage (Coverage "" 0 0 0 0) relevantData
  -- So we can finally print them!
  printf "Line coverage: %.2f\n" (fromIntegral cl / fromIntegral tl :: Double)
  printf "Branch coverage: %.2f\n" (fromIntegral cb / fromIntegral tb :: Double)
Here are some quickly-hacked, untested ideas applied to your code:
import numpy as np
import re
datafile = ('abc', 'd>11/23d>34/89d', 'e>25/65e>13/25e', 'f>36/92f>19/76')
core_pkgs = ('d', 'f')
covered_lines, total_lines, covered_branches, total_branches = 0, 0, 0, 0
for pkg in core_pkgs:
    ptn = re.compile('.*' + pkg + '.*' + '>(\d+)/(\d+).*>(\d+)/(\d+).*')
    matches = map(ptn.match, datafile)
    statsList = [list(map(int, match.groups())) for match in matches if match]
    # statsList is a list of [cvln, tlln, cvbh, tlbh] rows, one per matching line
    stats = np.array(statsList)
    # sum each column over the matching lines
    covered_lines, total_lines, covered_branches, total_branches = stats.sum(axis=0)
Well, as you can see I haven't bothered to finish off the remaining loop, but I think the point is made by now. There's certainly a lot more than one way to do this; I elected to show off map() (which some will say makes this less efficient, and it probably does), as well as NumPy to get the (admittedly light) math done.
This is the corresponding Clojure solution:
(defn extract-data
  "extract 4 integers from a string line according to a package name"
  [pkg line]
  (map read-string
       (rest (first
              (re-seq
               (re-pattern
                (str pkg ".*>(\\d+)/(\\d+).*>(\\d+)/(\\d+)"))
               line)))))

(defn scan-lines-by-pkg
  "scan all string lines and extract all data as integer sequences
  according to package names"
  [pkgs lines]
  (filter seq (for [pkg pkgs
                    line lines]
                (extract-data pkg line))))

(defn sum-data
  "add all data in valid lines together"
  [pkgs lines]
  (apply map + (scan-lines-by-pkg pkgs lines)))

(defn get-percent
  [covered all]
  (str (format "%.2f" (float (/ (* covered 100) all))) "%"))

(defn get-cov
  [pkgs lines]
  {:line-cov (apply get-percent (take 2 (sum-data pkgs lines)))
   :branch-cov (apply get-percent (drop 2 (sum-data pkgs lines)))})

(get-cov ["d" "f"] ["abc" "d>11/23d>34/89d" "e>25/65e>13/25e" "f>36/92f>19/76"])
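For the example data, this call should return {:line-cov "40.87%", :branch-cov "32.12%"}, i.e. (11+36)/(23+92) and (34+19)/(89+76) expressed as percentages.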
I have tried looking through multiple statistics modules for Python but can't seem to find any that support one-way ANOVA post hoc tests.
One-way ANOVA can be run like this:
from scipy import stats
f_value, p_value = stats.f_oneway(data1, data2, data3, data4, ...)
This is one-way ANOVA, and it returns the F value and the p value.
There is a significant difference if the p value is below the significance level you set.
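For instance, a minimal sketch with made-up numbers (the data is only for illustration):
from scipy import stats

data1 = [22.1, 23.5, 21.8, 24.0]
data2 = [28.4, 27.9, 29.1, 26.5]
data3 = [22.7, 23.1, 21.9, 22.4]

f_value, p_value = stats.f_oneway(data1, data2, data3)
# reject the null hypothesis of equal group means at the 5% level
if p_value < 0.05:
    print('at least one group mean differs')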
The Tukey-Kramer HSD test can be used like this:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
print pairwise_tukeyhsd(Data, Group)
This is a multiple comparison test.
The output looks like this:
Multiple Comparison of Means - Tukey HSD,FWER=0.05
================================================
group1 group2 meandiff lower upper reject
------------------------------------------------
0 1 -35.2153 -114.8741 44.4434 False
0 2 46.697 -40.4993 133.8932 False
0 3 -7.5709 -87.49 72.3482 False
1 2 81.9123 5.0289 158.7956 True
1 3 27.6444 -40.8751 96.164 False
2 3 -54.2679 -131.4209 22.8852 False
------------------------------------------------
Please refer to this site for how to set the arguments.
The tukeyhsd of statsmodels doesn't return p values.
So if you want the p values, calculate them from the output values above, or use R.
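For completeness, a minimal, self-contained sketch of the call itself (made-up measurements and group labels, not the data behind the table above):
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

data = np.array([10.1, 11.3, 9.8, 15.2, 14.8, 15.9, 10.5, 10.9, 9.7])
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

result = pairwise_tukeyhsd(data, groups, alpha=0.05)
print(result)  # prints a summary table like the one above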
I think the library Pyvttbl returns an ANOVA table including post-hoc tests (i.e., Tukey HSD). In fact, what is neat about Pyvttbl is that you can also carry out repeated-measures ANOVA.
See the doc for Anova1way here
I'm not sure what to call what I'm looking for, so if I failed to find this question elsewhere, I apologize. In short, I am writing Python code that will interface directly with the Linux kernel. It's easy to get the required values from the include header files and write them into my source:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
It's easy to use these values when constructing structs to send to the kernel. However, they are of almost no help in resolving the values in the kernel's responses.
If I put the values into a dict, I presume I would have to scan all the values in the dict to look up the key for each item in each struct from the kernel. There must be a simpler, more efficient way.
How would you do it? (feel free to retitle the question if it's way off)
If you want to use two dicts, you can try this to create the inverted dict:
b = {v: k for k, v in a.iteritems()}
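For example, with a hypothetical subset of the constants (.items() is the Python 3 spelling):
a = {'IFA_UNSPEC': 0, 'IFA_ADDRESS': 1, 'IFA_LOCAL': 2}
b = {v: k for k, v in a.items()}

print(b[1])  # -> 'IFA_ADDRESS'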
Your solution leaves a lot of repeated work to the person creating the file. That is a source of error (you actually have to write each name three times). If you have a file where you need to update those from time to time (say, when new kernel releases come out), you are destined to introduce an error sooner or later. That was just a long way of saying: your solution violates DRY.
I would change your solution to something like this:
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {globals()[x]:x for x in dir() if x.startswith('IFA_') or x.startswith('__IFA_')}
This way the values dict is generated automatically. You might want to (or have to) change the condition in the if clause, according to whatever else is in that file. Maybe something like the following; that version takes away the need to list prefixes in the if clause, but it would fail if you had other stuff in the file.
values = {globals()[x]:x for x in dir() if not x.endswith('__')}
You could of course do something more sophisticated there, e.g. check for accidentally repeated values.
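For instance, a small untested sketch of such a check, assuming the same module layout as above:
from collections import Counter

# count how often each constant value occurs among the IFA_ names
counts = Counter(globals()[x] for x in dir()
                 if x.startswith('IFA_') or x.startswith('__IFA_'))
duplicates = [value for value, n in counts.items() if n > 1]
if duplicates:
    raise ValueError('duplicate constant values: %r' % duplicates)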
What I ended up doing is leaving the constant values in the module and creating a dict. The module is if_addr.py (the values are from linux/if_addr.h), so when constructing structs to send to the kernel I can use if_addr.IFA_LABEL, and I resolve responses with if_addr.values[2]. I'm hoping this is the most straightforward approach, so that when I have to look at this again in a year+ it's easy to understand :p
IFA_UNSPEC = 0
IFA_ADDRESS = 1
IFA_LOCAL = 2
IFA_LABEL = 3
IFA_BROADCAST = 4
IFA_ANYCAST = 5
IFA_CACHEINFO = 6
IFA_MULTICAST = 7
__IFA_MAX = 8
values = {
    IFA_UNSPEC: 'IFA_UNSPEC',
    IFA_ADDRESS: 'IFA_ADDRESS',
    IFA_LOCAL: 'IFA_LOCAL',
    IFA_LABEL: 'IFA_LABEL',
    IFA_BROADCAST: 'IFA_BROADCAST',
    IFA_ANYCAST: 'IFA_ANYCAST',
    IFA_CACHEINFO: 'IFA_CACHEINFO',
    IFA_MULTICAST: 'IFA_MULTICAST',
    __IFA_MAX: '__IFA_MAX',
}
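Hypothetical usage from another module, assuming the file above is saved as if_addr.py:
import if_addr

# constructing a struct to send to the kernel
attr_type = if_addr.IFA_LABEL      # 3

# resolving a value found in a kernel response
print(if_addr.values[2])           # 'IFA_LOCAL'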