Change json.dumps behaviour : customize serialization - python

Imagine, I've got a dict {"a": "hello", "b": b"list"}
'a' is a string
'b' is a byte string
I would like to serialize the dict into the "json"(*) string --> '{"a": "hello", "b": list}'
(*) : not really json compliant
For that, I've written this function, and it works:
import json
def stringify(obj):
    def my(obj):
        if isinstance(obj, bytes):
            return "<:<:%s:>:>" % obj.decode()
    return json.dumps(obj, default=my).replace('"<:<:', "").replace(':>:>"', "")
(the "<:<:" & ":>:>" are just added before serialization, to be replaced, post json serialization, to obtain the desired result)
It's a little bit hacky, using string substitution to obtain the result ... but it works ;-)
I ask myself, and you, whether it can be done in a better, more Pythonic way ...
Do you have any ideas?
EDIT
I would like to rewrite my stringify, in a better way, with assertions :
assert stringify( dict(a="hello",b=b"byte") ) == '{"a": "hello", "b": byte}'
assert stringify( ["hello", b"world"] ) == '["hello", world]'
assert stringify( "hello" ) == '"hello"'
assert stringify( b"world" ) == "world"

To achieve your desired output, i.e. '{"a": "hello", "b": list}', you will need to make some ugly but fair cosmetic changes, such as reconstructing the dictionary text yourself. The plain dictionary {"a": "hello", "b": list} makes no sense as a Python value (this specific example does, but only because list is a built-in; if it were "mymethod" or anything else, it wouldn't).
def stringify(input_dict: dict) -> str:
    parts = []
    for k, v in input_dict.items():
        if isinstance(v, bytes):
            rendered = v.decode()  # bytes are emitted without quotes
        else:
            rendered = f'"{v}"'
        parts.append(f'"{k}": {rendered}')
    return '{' + ', '.join(parts) + '}'
Here we are literally reconstructing the dictionary's text from plain characters. It is not that intuitive, but it nonetheless works as intended and leaves the input dictionary unchanged.
Your solution does work, but it would break if one of the values in the dictionary contained the special marker characters <:<: or :>:>.
Running this code:
d = {"a": "hello", "b": b"list"}
serialized_dict = stringify(d)
print(serialized_dict)
Output:
{"a": "hello", "b": list}
The result is of type str and is NOT valid JSON.
Edit - a more generic stringify function.
We can do it smarter by making the function call itself recursively: if we encounter an atomic object (e.g. int, str, etc.) we return it with quotation marks, otherwise (i.e. bytes) we return it without quotation marks.
def generic_stringify(input_generic_object):
    if isinstance(input_generic_object, dict):
        # Same idea as the stringify function above, but recursing into each value.
        return '{' + ', '.join([f'"{k}": {generic_stringify(v)}' for k, v in input_generic_object.items()]) + '}'
    elif isinstance(input_generic_object, list):
        return '[' + ', '.join([generic_stringify(v) for v in input_generic_object]) + ']'
    elif isinstance(input_generic_object, bytes):
        return input_generic_object.decode()
    else:
        return f'"{input_generic_object}"'
Here we return the bytes decoded if the type is bytes and return it with quotation marks if it is of type str:
print(generic_stringify(dict(a="hello", b=b"byte")))
print(generic_stringify(["hello", b"world", {"c": b"list"}]))
print(generic_stringify("hello"))
print(generic_stringify(b"world"))
Outputs:
{"a": "hello", "b": byte}
["hello", world, {"c": list}]
"hello"
world
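A variation on the recursive approach, offered as a sketch rather than a definitive implementation: handle bytes and containers by hand and delegate every other value to json.dumps, so that strings are escaped properly and numbers, booleans and None keep their JSON form. The name stringify comes from the question; treating tuples like lists and assuming that dictionary keys are strings are assumptions made here.

```python
import json

def stringify(obj):
    """Serialize obj like json.dumps, but emit bytes values unquoted."""
    if isinstance(obj, bytes):
        return obj.decode()
    if isinstance(obj, dict):
        # Assumption: keys are strings, so json.dumps(k) yields a quoted key.
        items = (f"{json.dumps(k)}: {stringify(v)}" for k, v in obj.items())
        return "{" + ", ".join(items) + "}"
    if isinstance(obj, (list, tuple)):
        return "[" + ", ".join(stringify(v) for v in obj) + "]"
    # Everything else (str, int, float, bool, None) is left to json.dumps.
    return json.dumps(obj)

print(stringify({"a": "hello", "b": b"byte"}))
```

Because no marker strings are substituted after the fact, values that happen to contain <:<: or :>:> pass through untouched.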

Related

Is there any way to get x in "x for y in x for z"

For example, the following code:
dict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
string = "foo in the city"
if any(x in string for x in dict):
    print(dict[x])
Is there any way to get x? (or get dict[x] through another method?)
A few points need to be clarified before giving the answer:
any() accepts any iterable, including the generator expression used here, but it only returns True or False. The loop variable x is local to that generator expression, so it is not defined when print(dict[x]) runs.
A list comprehension is a convenient way to filter the items of an iterable and collect the results in a list, which can then be modified or printed as needed.
Moving on to the original solution:
dict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
string = "foo in the city"
list_final = [x for x in dict if x in string]
A good approach is to get the output as a list (in case there is more than one match):
print(list_final)
If you want a single value (as a string), which fits only a few cases such as this one:
print(list_final[0])
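Putting "x in string" inside brackets makes the condition easier to read, but it does not make x available afterwards: x is still local to the generator expression, so print(dict[x]) raises a NameError.
So you would get
dict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
string = "foo in the city"
if any((x in string) for x in dict):
    print(dict[x])  # NameError: name 'x' is not defined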
dict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
string = "foo in the city"
print([dict[x] for x in string.split() if x in dict])
# Output:
# ['a']
It is not clear what you are trying to achieve.
The solution you are searching for could be:
res = [x for x in dict if x in string]
or
[print(x) for x in dict if x in string]
but the last one is not a good way to use a list comprehension.
P.S. You should not use dict as the name of a dictionary.
Assuming you want to get all values where their keys are inside your string, you can do the following:
mydict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
string = "foo in the city"
matches = [x for x in mydict if x in string]
for match in matches:
    print(mydict[match])
Depending on your needs, I think this is what you want.
dict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
s = "foo in the city"
result = [dict[x] for x in s.split() if dict.get(x,0) != 0]
print(result)
If you just want to print:
[print(dict[x]) for x in s.split() if dict.get(x,0) != 0]
Output:
a
I think you want to check whether any of the words in string (please do not name variables after their type) is in the dict. If so, maybe this helps:
# make it a list
words = string.split(' ')
x = 'foo'  # x must be defined here; the x inside any() is local to the generator
if any(x in words for x in dict):
    print(dict[x])
# output:
# a
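If only the first matching key is needed, one option is next() with a default, which stops at the first key found in the string and makes that key available afterwards. This is a minimal sketch; the names mydict, first_match and key are illustrative and not from the question.

```python
mydict = {
    "foo": "a",
    "bar": "b",
    "see": "c",
}
string = "foo in the city"

# next() returns the first key contained in the string,
# or the default (None) if no key matches.
first_match = next((key for key in mydict if key in string), None)

if first_match is not None:
    print(mydict[first_match])
```

Unlike any(), this keeps the matching key itself, so no variable has to leak out of the generator expression.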

Print empty string if variable is None [duplicate]

This question already has answers here:
Python: most idiomatic way to convert None to empty string?
I'm trying to make a templated string that will print values of a given dict. However, the key may or may not exist in the dict. If it doesn't exist, I'd like it to return an empty string instead.
To demonstrate what I mean, let's say I have a dict:
test_dict = { 1: "a", 2: "b"}
And a template string:
'{} {} {}'.format(test_dict.get(1), test_dict.get(2), test_dict.get(3))
I'd like the following output:
'a b '
But instead I get:
'a b None'
Use the dictionary's get method. It lets you specify a value to return if the key is not found:
'{} {} {}'.format(test_dict.get(1, ''), test_dict.get(2, ''), test_dict.get(3, ''))
One way would be to get the length of the dict and put the same number of placeholders inside the template:
In [27]: test_dict = { 1: "a", 2: "b"}
In [28]: ' '.join(['{}'] * len(test_dict))
Out[28]: '{} {}'
In [29]: ' '.join(['{}'] * len(test_dict)).format(*test_dict.values())
Out[29]: 'a b'
Note that this is basically the same as ' '.join(test_dict.values()); it just shows the template string explicitly as an example.
UPDATES PER OP COMMENT
You can use the string library to help here. See the script below, which uses your test_dict:
#https://stackoverflow.com/a/51359690
from string import Formatter
class NoneAsEmptyFormatter(Formatter):
    def get_value(self, key, args, kwargs):
        v = super().get_value(key, args, kwargs)
        # Replace None with an empty string so the placeholder renders as nothing.
        return '' if v is None else v
fmt = NoneAsEmptyFormatter()
test_dict = { 1: "a", 2: "b"}
test_str = fmt.format('{} {} {}', test_dict.get(1), test_dict.get(2), test_dict.get(3))
print(test_str)
We build a quick NoneAsEmptyFormatter class and use it to format the values coming from the dict.
Re your comment,
Now that you mention the extra space though, is there a way to remove the placeholder completely if key doesn't exist?
Yes, this is possible. Just make a list of values, filter out any Nones, then join the result:
In [3]: values = map(test_dict.get, [1, 2, 3])
In [4]: ' '.join(v for v in values if v is not None)
Out[4]: 'a b'
Or if order is not important, or if you're using Python 3.7+ and you want to preserve insertion order, you can skip some steps:
In [5]: ' '.join(test_dict.values())
Out[5]: 'a b'
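The approaches above can be folded into one small helper, sketched here under the assumption that both a missing key and an explicit None should render as an empty string. The name format_values and its parameters are hypothetical, not part of any library.

```python
def format_values(template, mapping, keys):
    """Fill template with mapping[key] for each key, using '' for missing or None values."""
    values = (mapping.get(key) for key in keys)
    return template.format(*("" if v is None else v for v in values))

test_dict = {1: "a", 2: "b"}
result = format_values("{} {} {}", test_dict, [1, 2, 3])
print(repr(result))
```

The template keeps its three placeholders, so the trailing space from the question's desired output is preserved.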

How do I convert a list of strings into dict where only a certain type at an unknown index can become the keys?

I have a list of strings that looks, like this:
myList = [
"this 1 is my string",
"a nice 2 string",
"string is 3 so nice"
]
I'd like to convert this list into a dict that looks like this:
{
"1": "this is my string",
"2": "a nice string",
"3": "string is so nice"
}
I don't know how to do this.
Only the integer can become the key but everything else must become the value, thank you.
import re
myDict = {}
for element in myList:
    # Find the first number using regex.
    key = re.findall(r'\d+', element)[0]
    # Get index of number.
    index = element.index(key)
    # Create new string with the number and its trailing space removed.
    new_element = element[:index] + element[index + len(key) + 1:]
    # Add to dict.
    myDict[key] = new_element
If you have multiple numbers in a line, this takes the first number as the key for the dict:
>>> import re
>>> d = {}
>>> for line in myList:
...     match = re.search(r'\d+', line)
...     if match:
...         num = match.group()
...         newline = line.partition(num) # control over the partition
...         newline = newline[0].strip() + ' '.join(newline[2:])
...         d[num] = newline
...
>>>
>>> d
{'1': 'this is my string', '3': 'string is so nice', '2': 'a nice string'}
The simplest way to do this without installing any external dependencies is to use the findall function from the re module.
from re import findall
def list_to_dict(lst):
    result = {}
    for value in lst:
        match = findall(r"\d+", value)
        if len(match) > 0:
            # Remove the number, then collapse the double space it leaves behind.
            result[match[0]] = value.replace(match[0], "").replace("  ", " ")
    return result
If you wanted to, you could replace the 0 index with another index, although you should only do this if you are certain you know where the integer's index is.
Then using your list:
my_list = [
"this 1 is my string",
"a nice 2 string",
"string is 3 so nice"
]
You'd call the function, like this:
print(list_to_dict(my_list))
Which should output this dict:
{'1': 'this is my string', '2': 'a nice string', '3': 'string is so nice'}
Good luck.
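A regex-free variation, given as a sketch that assumes the number always appears as a separate whitespace-delimited word: split each line into words, take the first all-digit word as the key, and join the rest. The helper name list_to_dict mirrors the answer above; the details are illustrative.

```python
def list_to_dict(lines):
    """Map the first all-digit word of each line to the remaining words."""
    result = {}
    for line in lines:
        words = line.split()
        key = next((w for w in words if w.isdigit()), None)
        if key is None:
            continue  # no number in this line, so there is no key
        words.remove(key)  # removes only the first occurrence
        result[key] = " ".join(words)
    return result

my_list = [
    "this 1 is my string",
    "a nice 2 string",
    "string is 3 so nice",
]
result = list_to_dict(my_list)
print(result)
```

Splitting and re-joining also normalizes the spacing, so no double spaces are left where the number used to be.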

How to configure ruamel.yaml.dump output?

With this data structure:
d = {
(2,3,4): {
'a': [1,2],
'b': 'Hello World!',
'c': 'Voilà!'
}
}
I would like to get this YAML:
%YAML 1.2
---
[2,3,4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'
Unfortunately I get this format:
$ print ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2))
%YAML 1.2
---
? !!python/tuple
- 2
- 3
- 4
: a:
  - 1
  - 2
  b: Hello World!
  c: !!python/str 'Voilà!'
I cannot configure the output I want even with safe_dump. How can I do that without manual regex work on the output?
The only ugly solution I found is something like:
def rep(x):
    return repr([int(y) for y in re.findall('^\??\s*-\s*(\d+)', x.group(0), re.M)]) + ":\n"
print re.sub('\?(\s*-\s*(\w+))+\s*:', rep,
    ruamel.yaml.dump(d, default_flow_style=False, line_break=1, explicit_start=True, version=(1,2)))
New ruamel.yaml API
You cannot get what you want using ruamel.yaml.dump(), but with the new API, which has
a few more controls, you can come very close.
import sys
import ruamel.yaml
d = {
    (2,3,4): {
        'a': [1,2],
        'b': 'Hello World!',
        'c': 'Voilà!'
    }
}
def prep(d):
    if isinstance(d, dict):
        needs_restocking = False
        for idx, k in enumerate(d):
            if isinstance(k, tuple):
                needs_restocking = True
            try:
                if 'à' in d[k]:
                    d[k] = ruamel.yaml.scalarstring.SingleQuotedScalarString(d[k])
            except TypeError:
                pass
            prep(d[k])
        if not needs_restocking:
            return
        # Re-insert all items so that tuple keys can be replaced by CommentedKeySeq.
        items = list(d.items())
        for (k, v) in items:
            d.pop(k)
        for (k, v) in items:
            if isinstance(k, tuple):
                k = ruamel.yaml.comments.CommentedKeySeq(k)
            d[k] = v
    elif isinstance(d, list):
        for item in d:
            prep(item)
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.version = (1, 2)
prep(d)  # modifies d in place and returns None
yaml.dump(d, sys.stdout)
which gives:
%YAML 1.2
---
[2, 3, 4]:
  a:
    - 1
    - 2
  b: Hello World!
  c: 'Voilà!'
There is still no simple way to suppress the space after the commas between the sequence items, so you cannot get [2,3,4] instead of [2, 3, 4] without some major effort.
Original answer:
You cannot get exactly what you want as output using ruamel.yaml.dump() without major rework of the internals.
The output you like has indentation 2 for the values of the top-level mapping (keys a, b, etc.) and indentation 4 for the elements of the sequence that is the value for the a key (with the - pushed in 2 positions). That would at least require differentiating between indentation levels for mappings and sequences (if not for individual collections), and that is non-trivial.
Your sequence output is compacted from the ", " (comma, space) that a "normal" flow style emits to just a ",". IIRC this cannot currently be influenced by any parameter, and since you have little contextual knowledge when emitting a collection, it is difficult to "not include the spaces when emitting a sequence that is a key". An additional option to dump() would require changes in several of the source files and classes.
Less difficult issues, with an indication of the solution:
Your tuple has to magically convert to a sequence to get rid of the tag !!python/tuple. As you don't want to affect all tuples, this is IMO best done by making a subclass of tuple and representing it as a sequence (optionally representing such a tuple as a list only if it is actually used as a key). You can use comments.CommentedKeySeq for that (assuming ruamel.yaml>=0.12.14, it has the proper representation support when using ruamel.yaml.round_trip_dump()).
Your key is, when tested before emitting, not a simple key, and as such it gets a '? ' (question mark, space) to indicate a complex mapping key. You would have to change the emitter so that the SequenceStartEvent starts a simple key (if it has flow style and not block style). An additional issue is that such a SequenceStartEvent will then be "tested" to have a style attribute (which might indicate an explicit need for '?' on the key). This requires changing emitter.py:Emitter.check_simple_key() and emitter.py:Emitter.expect_block_mapping_key().
Your scalar string value for c gets quotes, whereas your scalar string value for b doesn't. You can only get that kind of difference in output in ruamel.yaml by making them different types, e.g. by making the value for c a scalarstring.SingleQuotedScalarString() (and using round_trip_dump()).
If you do:
import sys
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedKeySeq
assert ruamel.yaml.version_info >= (0, 12, 14)
data = CommentedMap()
data[CommentedKeySeq((2, 3, 4))] = cm = CommentedMap()
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = ruamel.yaml.scalarstring.SingleQuotedScalarString('Voilà!')
ruamel.yaml.round_trip_dump(data, sys.stdout, explicit_start=True, version=(1, 2))
you will get:
%YAML 1.2
---
[2, 3, 4]:
  a:
  - 1
  - 2
  b: Hello World!
  c: 'Voilà!'
which, apart from the now consistent indentation level of 2, the extra spaces in the flow style sequence, and the required use of the round_trip_dump, will get you as close to what you want without major rework.
Whether the above code is ugly as well or not is of course a matter of taste.
The output will, not incidentally, round-trip correctly when loaded using ruamel.yaml.round_trip_load(preserve_quotes=True).
If control over the quotes is not needed, and neither is the order of your mapping keys important, then you can also patch the normal dumper:
def my_key_repr(self, data):
    if isinstance(data, tuple):
        # Represent tuple keys as plain flow-style sequences.
        return self.represent_sequence(u'tag:yaml.org,2002:seq', data,
                                       flow_style=True)
    return ruamel.yaml.representer.SafeRepresenter.represent_key(self, data)
ruamel.yaml.representer.Representer.represent_key = my_key_repr
Then you can use a normal sequence:
data = {}
data[(2, 3, 4)] = cm = {}
cm['a'] = [1, 2]
cm['b'] = 'Hello World!'
cm['c'] = 'Voilà!'
ruamel.yaml.dump(data, sys.stdout, allow_unicode=True, explicit_start=True, version=(1, 2))
will give you:
%YAML 1.2
---
[2, 3, 4]:
  a: [1, 2]
  b: Hello World!
  c: Voilà!
Please note that you need to explicitly allow unicode in your output (the default with round_trip_dump()) using allow_unicode=True.
¹ Disclaimer: I am the author of ruamel.yaml.

How to convert a malformed string to a dictionary?

I have a string s (note that the a and b are not enclosed in quotation marks, so it can't directly be evaluated as a dict):
s = '{a:1,b:2}'
I want to convert this variable to a dict like this:
{'a':1,'b':2}
How can I do this?
This will work with your example:
import ast
def elem_splitter(s):
    return s.split(':',1)
s = '{a:1,b:2}'
s_no_braces = s.strip()[1:-1] #s.translate(None,'{}') is more elegant, but can fail if you can have strings with '{' or '}' enclosed.
elements = (elem_splitter(ss) for ss in s_no_braces.split(','))
d = dict((k,ast.literal_eval(v)) for k,v in elements)
Note that this will fail if you have a string formatted as:
'{s:"foo,bar",ss:2}' #comma in string is a problem for this algorithm
or:
'{s,ss:1,v:2}'
but it will pass a string like:
'{s ss:1,v:2}' #{"s ss":1, "v":2}
You may also want to modify elem_splitter slightly, depending on your needs:
def elem_splitter(s):
    k,v = s.split(':',1)
    return k.strip(),v # maybe v.strip() also?
*Somebody else might cook up a better example using more of the ast module, but I don't know its internals very well, so I doubt I'll have time to write that answer.
As your string is malformed both as JSON and as a Python dict, you can use neither json.loads nor ast.literal_eval to convert the data directly.
In this particular case, you have to translate it to a Python dictionary manually, using what you know about the input data:
>>> foo = '{a:1,b:2}'
>>> dict(e.split(":") for e in foo.strip("{}").split(","))
{'a': '1', 'b': '2'}
As Tim pointed out, I missed the fact that the values should be integers. Here is an alternative implementation:
>>> {k: int(v) for e in foo.strip("{}").split(",")
...     for k, v in [e.split(":")]}
{'a': 1, 'b': 2}
import re,ast
regex = re.compile('([a-z])')
ast.literal_eval(regex.sub(r'"\1"', s))
out:
{'a': 1, 'b': 2}
EDIT:
If you happen to have something like {foo1:1,bar:2} add an additional capture group to the regex:
regex = re.compile(r'(\w+)(:)')
ast.literal_eval(regex.sub(r'"\1"\2', s))
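The same key-quoting idea can also feed the standard json module. The following is only a sketch: it assumes keys are bare words that directly follow "{" or ",", and that every value is already valid JSON (for example numbers). The name parse_loose_dict is hypothetical, and string values that themselves contain a comma followed by a word and a colon would be mangled.

```python
import json
import re

def parse_loose_dict(text):
    """Quote bare keys such as a in '{a:1}', then parse the result as JSON."""
    # A key is a run of word characters that follows '{' or ',' and precedes ':'.
    quoted = re.sub(r'([{,]\s*)(\w+)\s*:', r'\1"\2":', text)
    return json.loads(quoted)

result = parse_loose_dict('{a:1,b:2}')
print(result)
```

Anchoring the pattern on the preceding "{" or "," keeps it from touching word characters that appear elsewhere in a numeric value.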
You can do it simply with this:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
to_int = lambda x: int(x) if x.isdigit() else x
d = dict((to_int(i) for i in pair.split(":", 1)) for pair in content.split(","))
For simplicity I've omitted exception handling if the string doesn't contain a valid specification, and also this version doesn't strip whitespace, which you may want. If the interpretation you prefer is that the key is always a string and the value is always an int, then it's even easier:
s = "{a:1,b:2}"
content = s[s.index("{")+1:s.index("}")]
d = dict((k.strip(), int(v)) for k, v in (pair.split(":", 1) for pair in content.split(",")))
As a bonus, this version also strips whitespace from the key to show how simple it is.
import simplejson
s = '{a:1,b:2}'
a = simplejson.loads(s)  # raises JSONDecodeError as written: JSON requires double-quoted keys
print(a)
