Scala: recursively modify lists of elements/lists - python

I was hoping someone could provide me with some basic code help in Scala. I've written some demo code in Python.
Consider a list of elements, where an element can hold either an integer or a list of other elements. I'd like to recursively examine this structure and modify it while keeping the overall structure.
To represent this in python, I've made each 'element' a dictionary with one key ('i' for item). The value corresponding to that key is either an int, or a list of similar dicts. Thus,
lst = [{'i': 1}, {'i': 2}, {'i': [{'i': 5}, {'i': 6}]}, {'i': 3}]
def recurse(x):
    if isinstance(x, list):
        return [recurse(a) for a in x]
    else:
        if isinstance(x['i'], list):
            return dict(i=[recurse(a) for a in x['i']])
        else:
            return dict(i=(x['i'] + 1))
print "Input:"
for i in lst:
    print i
print "\nResult:\n%s" % recurse(lst)
>>>
Input:
{'i': 1}
{'i': 2}
{'i': [{'i': 5}, {'i': 6}]}
{'i': 3}
Result:
[{'i': 2}, {'i': 3}, {'i': [{'i': 6}, {'i': 7}]}, {'i': 4}]
I understand it's a bit of a weird way to go about doing things, but the data I have been provided is structured like that. I think my issue is that python lets you return different types from the same function, while I don't believe Scala does.
Also for the record, the Scala elements are represented as Elem(4), or Elem(List(Elem(3)..., so I assume pattern matching can come into it somewhat.

I would rather not call that a list of lists, as that does not tell you what those lists contain. The structure is a tree, more precisely a leafy tree, where data lives only in the leaves. That would be:
sealed trait Tree[+A]
case class Node[+A](children: Tree[A]*) extends Tree[A]
case class Leaf[+A](value: A) extends Tree[A]
Then add a method map to apply a function to each value in the tree:
sealed trait Tree[+A] {
  def map[B](f: A => B): Tree[B]
}
case class Node[+A](children: Tree[A]*) extends Tree[A] {
  def map[B](f: A => B) = Node(children.map(_.map(f)): _*)
}
case class Leaf[+A](value: A) extends Tree[A] {
  def map[B](f: A => B) = Leaf(f(value))
}
Then your input is :
val input = Node(Leaf(1), Leaf(2), Node(Leaf(5), Leaf(6)), Leaf(3))
And if you call input.map(_ + 1) you get your output
The result display is somewhat ugly because of the varargs Tree[A]*. You can improve it by adding to Node: override def toString = "Node(" + children.mkString(", ") + ")"
You may prefer to define the method in one place only, either outside the classes or directly in Tree. Here it is as a method in Tree:
def map[B](f: A => B): Tree[B] = this match {
  case Node(children @ _*) => Node(children.map(_.map(f)): _*)
  case Leaf(v) => Leaf(f(v))
}
Working the untyped way, as in Python, is not very Scala-like, but it can be done:
def recurse(x: Any): Any = x match {
  case list: List[_] => list.map(recurse(_))
  case value: Int => value + 1
}
(This puts values directly in the list. Your map (dictionary) with key "i" complicates things and forces you to accept a compiler warning, because the match needs a cast that cannot be checked at runtime, namely that the map has String keys: case map: Map[String, _])
Using case Elem(content: Any) seems to me to give no extra safety compared to putting values directly in a List, while being much more verbose. It is also not noticeably terser than a tree, yet it has none of the safety and clarity that come from calling it a tree and distinguishing nodes from leaves.
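For comparison with the Python demo in the question, the same leafy-tree idea can be sketched in Python. This is only an illustration, assuming Python 3.7+ dataclasses; the class names Leaf and Node mirror the Scala answer, and everything else is made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    value: int

    def map(self, f: Callable[[int], int]) -> "Leaf":
        # A leaf holds the data, so apply f directly.
        return Leaf(f(self.value))


@dataclass(frozen=True)
class Node:
    children: Tuple[Union["Node", Leaf], ...]

    def map(self, f: Callable[[int], int]) -> "Node":
        # A node holds no data; rebuild it from its mapped children.
        return Node(tuple(child.map(f) for child in self.children))


tree = Node((Leaf(1), Leaf(2), Node((Leaf(5), Leaf(6))), Leaf(3)))
result = tree.map(lambda n: n + 1)
print(result)
```

Both classes return their own type from map, so the shape of the input is preserved while every leaf value changes.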

Well, here's something that works, though it is a little ugly:
def make(o: Any) = Map('i' -> o) // make: Any => Map[Char,Any]
val lst = List(make(1),make(2),make(List(make(5),make(6))),make(3)) // List[Any]
def recurce(x: Any): Any = x match {
  case l: List[_] => l map recurce
  case m: Map[Char,_] =>
    val a = m('i')
    a match {
      case n: Int => make(n + 1)
      case l: List[_] => make(l map recurce)
    }
}
Example :
scala> recurce(lst)
res9: Any = List(Map((i,2)), Map((i,3)), Map((i,List(Map(i -> 6), Map(i -> 7)))), Map((i,4)))

This solution is more type-safe without going as far as a tree (which isn't a bad solution, but someone already posted it :). It uses a list whose elements are either an Int or a List of Int. As such, it can only have two levels; if you want more, make a tree.
val lst = List(Left(1), Left(2), Right(List(5, 6)), Left(3))
def recurse(lst: List[Either[Int, List[Int]]]): List[Either[Int, List[Int]]] = lst match {
  case Nil => Nil
  case Left(n) :: tail => Left(n + 1) :: recurse(tail)
  case Right(l) :: tail => Right(l map (_ + 1)) :: recurse(tail)
}

Related

How to preserve quotes if the value is a character and remove the quotes if it is a number when creating a dictionary?

I need to be able to convert any string in the format
"var1=var a;var2=var b"
to a dictionary and I managed to do that as follows.
a = "a=b;b=2"
def str_to_conf_dict(input_str):
    return dict(u.split('=') for u in input_str.split(';'))
b = str_to_conf_dict(a)
result= {'a': 'b', 'b': '2'}
But the values in the dictionary have quotes regardless of whether var a and var b are numbers or letters.
I know that the values are always going to be a mix of characters and numbers (int/float/negatives). How would I keep the quotes if the value is a character and remove the quotes if it is a number?
It is crucial to have the quotes only on characters, because I will pass the values to functions that only work if this criterion is met, and there is no way to modify that end.
Create a separate function for converting the value to its proper type.
Take a look at How to convert list of strings to their correct Python types?, where the answers use either ast.literal_eval or json.loads (amongst other solutions) to deserialize a string to a suitable Python object:
import json
def convert(val):
    try:
        return json.loads(val)
    except ValueError:
        return val # return as-is
Then apply that function on each of the values from the original string.
def str_to_conf_dict(input_str):
    d = {}
    for pair in input_str.split(";"):
        k, v = pair.split("=")
        d[k] = convert(v)
    return d
s = "a=b;b=2;c=-3;d=xyz;e=4ajkl;f=3.14"
print(str_to_conf_dict(s))
{'a': 'b', 'b': 2, 'c': -3, 'd': 'xyz', 'e': '4ajkl', 'f': 3.14}
All the numbers (ints, floats, and negative ones) should be converted to numbers, while others are retained as-is (as strings, with quotes).
If you want to (unnecessarily) force it into a one-liner (for some reason), you'll need to set up a {key : convert(value)} dictionary comprehension. You can either .split twice to get each item of the pair:
def str_to_conf_dict(input_str):
    return {
        pair.split('=')[0]: convert(pair.split('=')[1])
        for pair in input_str.split(';')
    }
Or pair up the items from the .split('=') output. You can take inspiration from the pairwise recipe from the itertools package, or implement something simpler if you know the format is always going to be key=val:
def get_two(iterable):
    yield iterable[0], iterable[1]
def str_to_conf_dict(input_str):
    return {
        k: convert(v)
        for pair in input_str.split(';')
        for k, v in get_two(pair.split('='))
    }
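One thing to be aware of with the json.loads-based convert above: it also turns JSON literals such as "true" and "null" into True and None. If only numbers should lose their quotes, a stricter variant could try int and then float. The sketch below is one possible take under that assumption; convert_number is a hypothetical name, and note that float also accepts strings such as "nan" and "inf".

```python
def convert_number(val):
    """Return val as int or float when it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            continue
    return val


def str_to_conf_dict(input_str):
    # partition splits on the first '=' only, so values may contain '='.
    pairs = (pair.partition("=") for pair in input_str.split(";"))
    return {key: convert_number(value) for key, _, value in pairs}


conf = str_to_conf_dict("a=b;b=2;c=-3;f=3.14;g=true;h=x=y")
print(conf)
```

Here "true" stays a string, and a value containing "=" no longer breaks the unpacking.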

Reusing function results in Python dict

I have the following (very simplified) dict. The get_details function is an API call that I would like to avoid doing twice.
ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': item.get_details()[0].e,
        'h': [func_h(detail) for detail in item.get_details()],
    } for item in items]
}
I could of course rewrite the code like this:
b = []
for item in items:
    details = item.get_details()
    b.append({
        'c': item.c,
        'e': details[0].e,
        'h': [func_h(detail) for detail in details],
    })
ret = {
    'a': a,
    'b': b
}
but would like to use the first approach since it seems more pythonic.
You could use an intermediary generator to extract the details from your items. Something like this:
ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': details[0].e,
        'h': [func_h(detail) for detail in details],
    } for (item, details) in ((item, item.get_details()) for item in items)]
}
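To check that the intermediary generator really calls get_details once per item, here is a small self-contained sketch. The Item and Detail classes and the body of func_h are hypothetical stand-ins for the question's objects, written for Python 3.

```python
class Detail:
    def __init__(self, e):
        self.e = e


class Item:
    """Hypothetical stand-in for the question's item objects."""

    def __init__(self, c, values):
        self.c = c
        self._values = values
        self.calls = 0  # counts how often the "API" is hit

    def get_details(self):
        self.calls += 1
        return [Detail(v) for v in self._values]


def func_h(detail):
    # Placeholder for the question's func_h.
    return detail.e * 10


items = [Item("x", [1, 2]), Item("y", [3])]
a = "some value"

ret = {
    "a": a,
    "b": [
        {
            "c": item.c,
            "e": details[0].e,
            "h": [func_h(detail) for detail in details],
        }
        # The inner generator pairs each item with one get_details() result.
        for item, details in ((item, item.get_details()) for item in items)
    ],
}

print(ret["b"])
print([item.calls for item in items])
```

Each item's call counter ends at 1, so the expensive call is not repeated.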
I don't find the second one particularly un-pythonic; you have a complex initialization, and you shouldn't expect to boil it down to a single simple expression. That said, you don't need the temporary list b; you can work directly with ret['b']:
ret = {
    'a': a,
    'b': []
}
for item in items:
    details = item.get_details()
    d = details[0]
    ret['b'].append({
        'c': item.c,
        'e': d.e,
        'h': map(func_h, details)
    })
This is also a case where I would choose map over a list comprehension. (If this were Python 3, you would need to wrap that in an additional call to list.)
I wouldn't try too hard to be more pythonic if it means looking like your first approach. I would take your second approach a step further, and just use a separate function:
ret = {
    'a': a,
    'b': get_b_from_items(items)
}
I think that's as clean as it can get. Use comments/docstrings to indicate what 'b' is, test the function, and then the next person who comes along can quickly read and trust your code. I know you know how to write the function, but for the sake of completeness, here's how I would do it:
# and add this in where you want it
def get_b_from_items(items):
    """Return a list of (your description here)."""
    result = []
    for item in items:
        details = item.get_details()
        result.append({
            'c': item.c,
            'e': details[0].e,
            'h': [func_h(detail) for detail in details],
        })
    return result
That is plenty pythonic (note the docstring-- very pythonic), and very readable. And of course, it has the advantage of being slightly more granularly testable, complex logic abstracted away from the higher level logic, and all the other advantages of using functions.

Building sub-dict from large dict using recursion

I have a dictionary that links various species in a parent-daughter decay chain. For example:
d = {
'A':{'daughter':['B']},
'B':{'daughter':['C']},
'C':{'daughter':['D']},
'D':{'daughter':[None]},
'E':{'daughter':['F']},
'F':{'daughter':['G']},
'G':{'daughter':['H']},
'H':{'daughter':[None]}
}
In this dictionary, the top level key is the 'parent' and the 'daughter' (i.e. what the parent decays to in the chain) is defined as a key:value item in the dictionary attached to the parent key. When None is given for the daughter, that is considered to be the end of the chain.
I want a function to return a sub dictionary containing the items in the chain according to the users input for the starting parent. I would also like to know the position of each item in the chain. In the sub-dictionary this can be a second field ('position').
For example, if the user wants to start the chain at 'A', I would like the function to return:
{'A':{'position':1, 'daughter':['B']},
'B':{'position':2, 'daughter':['C']},
'C':{'position':3, 'daughter':['D']},
'D':{'position':4, 'daughter':[None]}}
Similarly, if the starting value was 'E' I would like it to return:
{'E':{'position':1, 'daughter':['F']},
'F':{'position':2, 'daughter':['G']},
'G':{'position':3, 'daughter':['H']},
'H':{'position':4, 'daughter':[None]}}
This is relatively easy when the linking is one-to-one i.e. one item decays into another, into another etc.
If I now use a more complex example, as below, you can see that 'B' actually decays into both 'C' and 'D' and from there onwards the chains are separate.
A => B => C => E => G and A => B => D => F => H
d = {
'A':{'daughter':['B']},
'B':{'daughter':['C', 'D']},
'C':{'daughter':['E']},
'D':{'daughter':['F']},
'E':{'daughter':['G']},
'F':{'daughter':['H']},
'G':{'daughter':[None]},
'H':{'daughter':[None]}
}
In this case I would like a function to return the following output. You'll notice that, because the two chains diverge, the position values are close to the level in the hierarchy (e.g. C = 3 and D = 4) but not exactly the same. I don't want to follow the C chain all the way down, and then repeat for the D chain.
{'A':{'position':1, 'daughter':['B']},
'B':{'position':2, 'daughter':['C', 'D']},
'C':{'position':3, 'daughter':['E']},
'D':{'position':4, 'daughter':['F']},
'E':{'position':5, 'daughter':['G']},
'F':{'position':6, 'daughter':['H']},
'G':{'position':7, 'daughter':[None]},
'H':{'position':8, 'daughter':[None]}
}
Any thoughts? The function should be able to cope with more than one diversion in the chain.
Mark
If you don't want to go all the way down from C, then breadth-first search may help.
def bfs(d, start):
    answer = {}
    queue = [start]
    head = 0
    while head < len(queue):
        # Fetch the next unprocessed element from the queue
        now = queue[head]
        answer[now] = {
            'position': head+1,
            'daughter': d[now]['daughter']
        }
        # Add daughters to the queue
        for nxt in d[now]['daughter']:
            if nxt is None:
                continue
            queue.append(nxt)
        head += 1
    return answer
d = {
'A': {'daughter': ['B']},
'B': {'daughter': ['C', 'D']},
'C': {'daughter': ['E']},
'D': {'daughter': ['F']},
'E': {'daughter': ['G']},
'F': {'daughter': ['H']},
'G': {'daughter': [None]},
'H': {'daughter': [None]}
}
print(bfs(d, 'A'))
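The version above assumes each species is reached along exactly one path. If two branches can decay into the same daughter, that daughter would be queued twice and its position overwritten. A possible variation, sketched here with collections.deque and a seen set, guards against that; decay_chain and the sample data are made up for illustration, and the ordering relies on Python 3.7+ dicts keeping insertion order.

```python
from collections import deque


def decay_chain(d, start):
    """Breadth-first walk from start, numbering species in visit order."""
    answer = {}
    queue = deque([start])
    seen = {start}
    while queue:
        now = queue.popleft()
        daughters = d[now]["daughter"]
        answer[now] = {"position": len(answer) + 1, "daughter": daughters}
        for nxt in daughters:
            # None marks the end of a chain; skip species already queued.
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return answer


# Hypothetical data: C and D both decay to E, so E is reachable twice.
d = {
    "A": {"daughter": ["B"]},
    "B": {"daughter": ["C", "D"]},
    "C": {"daughter": ["E"]},
    "D": {"daughter": ["E"]},
    "E": {"daughter": [None]},
}

chain = decay_chain(d, "A")
for name, info in chain.items():
    print(name, info)
```

With this data E appears once, at position 5, even though both C and D list it as a daughter.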

Mixing List and Dict in Python

I was wondering how you mix a list and a dict together in Python. I know that in PHP, I can do something like this:
$options = array(
"option1",
"option2",
"option3" => array("meta1", "meta2", "meta3"),
"option4"
);
The problem is that Python has a different bracket for each kind of collection: () for tuple, [] for list, and {} for dict. There doesn't seem to be any way to mix them, and I keep getting syntax errors.
I am using Python 2.7 now. Please advise how to do it correctly.
Much thanks,
Rufas
Update 1:
I'll slightly elaborate what I'm trying to do. I am trying to write a simple python script to do some API requests here:
http://www.diffbot.com/products/automatic/article/
The relevant part is the fields query parameters. It is something like ...&fields=meta,querystring,images(url,caption)... . So the above array can be written as (in PHP)
$fields = array(
'meta',
'querystring',
'images' => array('url', 'caption')
);
And the $fields will be passed to a method for processing. The result will be returned, like this:
$json = diffbot->get("article", $url, $fields);
The thing is - I have no problem in writing it in PHP, but when I try to write it in Python, the thing is not as easy as it seems...
You can do it this way:
options = {
"option1": None,
"option2": None,
"option3": ["meta1", "meta2", "meta3"],
"option4": None,
}
But options is a dictionary in this case.
If you need the order in the dictionary you can use OrderedDict.
How can you use OrderedDict?
from collections import OrderedDict
options = OrderedDict([
("option1", None),
("option2", None),
("option3", ["meta1", "meta2", "meta3"]),
("option4", None),
])
print options["option3"]
print options.items()[2][1]
print options.items()[3][1]
Output:
['meta1', 'meta2', 'meta3']
['meta1', 'meta2', 'meta3']
None
Here you can access options either using keys (like option3), or indexes (like 2 and 3).
Disclaimer. I must stress that this solution is not a one-to-one mapping between PHP and Python. PHP is another language, with other data structures, other semantics, etc. You can't do a one-to-one mapping between the data structures of Python and PHP. Please also consider the answer of Hyperboreus (I gave +1 to him). It shows another way to mix lists and dictionaries in Python. Please also read our discussion below.
Update1.
How can you process such structures?
You must check which type a value in each case has.
If it is a list (type(v) == type([])) you can join it;
otherwise you can use it as it is.
Here I convert the structure to a URL-like string:
options = {
    "option1": None,
    "option2": None,
    "option3": ["meta1", "meta2", "meta3"],
    "option4": "str1",
}
res = []
for (k,v) in options.items():
    if v is None:
        continue
    if type(v) == type([]):
        res.append("%s=%s" % (k,"+".join(v)))
    else:
        res.append("%s=%s" % (k,v))
print "&".join(res)
Output:
option4=str1&option3=meta1+meta2+meta3
This seems to do the same thing:
options = {0: 'option1',
           1: 'option2',
           2: 'option4',
           'option3': ['meta1', 'meta2', 'meta3'] }
More in general:
[] denote lists, i.e. ordered collections: [1, 2, 3] or [x ** 2 for x in [1, 2, 3]]
{} denote sets, i.e. unordered collections of unique (hashable) elements, and dictionaries, i.e. mappings between unique (hashable) keys and values: {1, 2, 3}, {'a': 1, 'b': 2}, {x: x ** 2 for x in [1, 2, 3]}
() denote (among other things) tuples, i.e. immutable ordered collections: (1, 2, 3)
() also denote generator expressions: (x ** 2 for x in (1, 2, 3))
You can mix them any way you like (as long as elements of a set and keys of a dictionary are hashable):
>>> a = {(1,2): [2,2], 2: {1: 2}}
>>> a
{(1, 2): [2, 2], 2: {1: 2}}
>>> a[1,2]
[2, 2]
>>> a[1,2][0]
2
>>> a[2]
{1: 2}
>>> a[2][1]
2
I'm pretty sure there were 3 answers to my question, and although one of them received a -1 vote, it was the closest to what I want. It is very strange that it is gone now that I want to pick it as the "accepted answer" :(
To recap, the removed answer suggested I should do this:
options = [
"option1",
"option2",
{"option3":["meta1", "meta2", "meta3"]},
"option4"
]
And that fits nicely with how I want to process each item in the list. I just loop through all the values and check the type of each one. If it is a string, it is processed as normal. But when it is a dict/list, it is handled differently.
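The loop-and-check-type idea could look roughly like the sketch below, which renders such a mixed list in the fields=meta,querystring,images(url,caption) style quoted earlier in the question. build_fields is a hypothetical helper written for this illustration and is not part of any Diffbot client.

```python
def build_fields(options):
    """Render a mixed list of strings and dicts as a fields parameter."""
    parts = []
    for option in options:
        if isinstance(option, dict):
            # A dict entry maps a field name to its list of sub-fields.
            for name, subfields in option.items():
                parts.append("%s(%s)" % (name, ",".join(subfields)))
        else:
            parts.append(option)
    return ",".join(parts)


fields = ["meta", "querystring", {"images": ["url", "caption"]}]
print(build_fields(fields))
```

Because the outer container is a list, the order of the options is preserved without needing OrderedDict.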
Ultimately, I managed to make it work and I get what I want.
Special thanks to Igor Chubin and Hyperboreus for providing suggestions and ideas for me to test and discover the answer I've been looking for. Greatly appreciated.
Thank you!
Rufas

pack multiple variables of different datatypes in list/array python

I have multiple variables that I need to pack together as one record and hold sequentially, as in an array or list. This needs to be done in Python, and I am still in my Python infancy.
E.g. in Python:
a = Tom
b = 100
c = 3.14
d = {'x':1, 'y':2, 'z':3}
All of the above should go in one sequential data structure. For the sake of clarity, here is a similar implementation as I would have written it in C++:
struct node
{
    string a;
    int b;
    float c;
    map<char, int> d; // just as an example for dictionary in python
};
vector<node> v; // looking for something like this which can be iterable
If someone could give me a similar implementation for storing, iterating over and modifying the contents, it would be really helpful. Any pointers in the right direction are also welcome.
Thanks
You can either use a dictionary like Michael proposes (but then you need to access the contents of v with v['a'], which is a little cumbersome), or you can use the equivalent of C++'s struct: a named tuple:
import collections
node = collections.namedtuple('node', 'a b c d')
# Tom = ...
v = node(Tom, 100, 3.14, {'x':1, 'y':2, 'z':3})
print v # node(a=…, b=100, c=3.14, d={'x':1, 'y':2, 'z':3})
print v.c # 3.14
print v[2] # 3.14 (works too, but is less meaningful and robust than an attribute name like v.c)
This is similar to, but simpler than defining your own class: type(v) == node, etc. Note however, as volcano pointed out, that the values stored in a namedtuple cannot be changed (a namedtuple is immutable).
If you indeed need to modify the values inside your records, the best option is a class:
class node(object):
    def __init__(self, *arg_list):
        for (name, arg) in zip('a b c d'.split(), arg_list):
            setattr(self, name, arg)
v = node(1, 20, 300, "Eric")
print v.d # "Eric"
v.d = "Ajay" # Works
The last option, which I do not recommend, is indeed to use a list or a tuple, like ATOzTOA mentions: elements must be accessed in a not-so-legible way: node[3] is less meaningful than node.last_name; also, you cannot easily change the order of the fields, when using a list or a tuple (whereas the order is immaterial if you access a named tuple or custom class attributes).
Multiple node objects are customarily put in a list, the standard Python structure for such a purpose:
all_nodes = [node(…), node(…),…]
or
all_nodes = []
for … in …:
all_nodes.append(node(…))
or
all_nodes = [node(…) for … in …]
etc. The best method depends on how the various node objects are created, but in many cases a list is likely to be the best structure.
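As a rough sketch of the storing, iterating and modifying steps together: on Python 3.7+ (newer than the Python 2 code in this thread), a dataclass gives a mutable record with named fields. The field names a, b, c, d follow the question; the class name Node and the second record are assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    a: str
    b: int
    c: float
    # default_factory gives each record its own dict instead of a shared one.
    d: Dict[str, int] = field(default_factory=dict)


# Store the records in a plain list.
all_nodes = [
    Node("Tom", 100, 3.14, {"x": 1, "y": 2, "z": 3}),
    Node("Ann", 200, 2.72),
]

# Iterate and modify in place; dataclass fields are mutable by default.
for node in all_nodes:
    node.b += 1
    node.d["w"] = 0

print(all_nodes[0])
```

Unlike a namedtuple, the fields can be reassigned, while access by name (node.b) stays as readable as with the struct in the question.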
Note, however, that if you need to store something akin to a spreadsheet table and need speed and facilities for accessing its columns, you might be better off with NumPy's record arrays, or a package like Pandas.
You could put all the values in a dictionary, and have a list of these dictionaries.
{'a': a, 'b': b, 'c': c, 'd': d}
Otherwise, if this data is something that could be represented by a class, for example a 'Person'; create a class of type Person and create an object of that class with your data:
http://docs.python.org/2/tutorial/classes.html
Just use lists:
a = "Tom"
b = 100
c = 3.14
d = {'x':1, 'y':2, 'z':3}
data = [a, b, c, d]
print data
for item in data:
    print item
Output:
['Tom', 100, 3.14, {'y': 2, 'x': 1, 'z': 3}]
Tom
100
3.14
{'y': 2, 'x': 1, 'z': 3}
