Following up on my other question: I now need a way to crunch JSON down to one line, e.g.
{"node0":{
"node1":{
"attr0":"foo",
"attr1":"foo bar",
"attr2":"value with long spaces"
}
}}
I would like to crunch this down to a single line:
{"node0":{"node1":{"attr0":"foo","attr1":"foo bar","attr2":"value with long spaces"}}}
by removing insignificant whitespace and preserving the whitespace inside the values. Is there a library to do this in Python?
EDIT
Thank you both, drdaeman and Eli Courtwright, for the super quick responses!
http://docs.python.org/library/json.html
>>> import json
>>> json.dumps(json.loads("""
... {"node0":{
... "node1":{
... "attr0":"foo",
... "attr1":"foo bar",
... "attr2":"value with long spaces"
... }
... }}
... """))
'{"node0": {"node1": {"attr2": "value with long spaces", "attr0": "foo", "attr1": "foo bar"}}}'
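The output above still has a space after every colon and comma. A minimal sketch, assuming the goal is the fully compact form shown in the question: the standard json module accepts a separators argument to json.dumps that removes those spaces. The variable names pretty and compact are only illustrative.

```python
import json

pretty = """
{"node0":{
    "node1":{
        "attr0":"foo",
        "attr1":"foo  bar",
        "attr2":"value with    long        spaces"
    }
}}
"""

# separators=(",", ":") drops the space json.dumps normally puts after
# each comma and colon; whitespace inside string values is left alone.
compact = json.dumps(json.loads(pretty), separators=(",", ":"))
print(compact)
```

On Python 3.7+ the key order of the input is kept, because dicts preserve insertion order.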
In Python 2.6:
import json
print json.loads( json_string )
Basically, when you use the json module to parse json, then you get a Python dict. If you simply print a dict and/or convert it to a string, it'll all be on one line. Of course, in some cases the Python dict will be slightly different than the json-encoded string (such as with booleans and nulls), so if this matters then you can say
import json
print json.dumps( json.loads(json_string) )
If you don't have Python 2.6 then you can use the simplejson module. In this case you'd simply say
import simplejson
print simplejson.loads( json_string )
I have the following simple program:
package main
import (
"fmt"
yaml "gopkg.in/yaml.v2"
)
type Test struct {
SomeStringWithQuotes string `yaml:"someStringWithQuotes"`
SomeString string `yaml:"someString"`
SomeHexValue string `yaml:"someHexValue"`
}
func main() {
t := Test{
SomeStringWithQuotes: "\"Hello World\"",
SomeString: "Hello World",
SomeHexValue: "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7",
}
yamlBytes, _ := yaml.Marshal(t)
fmt.Print(string(yamlBytes))
}
This prints the following and obviously demonstrates that Go makes decisions on when to quote a string or not:
someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
However, when I try to read this YAML using the following Python script:
import yaml
yaml_str = """
someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
"""
print(yaml.safe_load(yaml_str))
It parses the Hex value as an integer. If I now serialize back to YAML using this code:
import yaml
import sys
yaml_str = """
someStringWithQuotes: '"Hello World"'
someString: Hello World
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
"""
print(yaml.dump(yaml.safe_load(yaml_str)))
I get:
someHexValue: 1272966107484048169783147972546098614451903325111
someString: Hello World
someStringWithQuotes: '"Hello World"'
How can I best make sure that the Hex format is preserved? Unfortunately, I personally don't have any influence on the code on the Go side (but a Go-side solution is still welcome for other people who try to do similar things).
You can load and dump that output in Python while preserving the hex value using ruamel.yaml (disclaimer: I am the author of that Python package):
import sys
import ruamel.yaml
yaml_str = """\
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
someString: Hello World
someStringWithQuotes: '"Hello World"'
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
someHexValue: 0xDEF9C64256DEE61EBF5B212238DF11C7E532E3B7
someString: Hello World
someStringWithQuotes: '"Hello World"'
The actual output of Go is incorrect. If you were to output the string "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7" using Python, you would see that it outputs
that string with quotes (I am using ruamel.yaml here, but this works the same for PyYAML):
import sys
import ruamel.yaml
data = dict(someHexValue="0xDef9C64256DeE61ebf5B212238df11C7E532e3B7")
yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)
which gives:
someHexValue: '0xDef9C64256DeE61ebf5B212238df11C7E532e3B7'
That this string needs quoting is determined by representing the
string "plain" (i.e. without quotes) and then trying to resolve it
to make sure the original type (string) is returned. This is not the
case, as it is found to be an integer, so the representer part of the
dumping process decides that quotes are necessary. (If you ever look
at the loading and dumping code and wonder why the resolver is used
by both: this is the reason the dumper needs access to resolver.py as
well.)
This works the same way for strings like "True" and "2019-02-08", which also get quoted (in order
not to "confuse" them with a boolean or a date).
This is a rather expensive computational process, and there are of course other ways
of determining whether quotes are needed.
In Go, this works in the same way, but there is an error in the relevant code in resolve.go:
intv, err := strconv.ParseInt(plain, 0, 64)
if err == nil {
if intv == int64(int(intv)) {
return yaml_INT_TAG, int(intv)
} else {
return yaml_INT_TAG, intv
}
}
From the documentation for ParseInt:
If base == 0, the base is implied by the string's prefix: base 16 for "0x", base 8 for "0", and base 10 otherwise.
The problem is of course that neither YAML nor Python puts any
restriction on the size of an integer, but in Go integers are restricted to 64
bits. So in the above, ParseInt returns an error and Go thinks that the
string doesn't need quoting. (I reported this as a
bug in the go-yaml
library.)
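The size mismatch can be checked from Python with nothing but the standard library. This is only a sketch of the arithmetic, not the go-yaml code: int(plain, 0) infers base 16 from the "0x" prefix, much like ParseInt with base 0, and INT64_MAX is a name introduced here for illustration.

```python
# The hex string from the question, parsed with the base inferred
# from its "0x" prefix.
plain = "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7"
value = int(plain, 0)

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold

print(value.bit_length())  # number of bits needed to store the value
print(value > INT64_MAX)   # True means it cannot fit in an int64
```

Python integers are arbitrary precision, so the parse succeeds here, whereas a 64-bit parse has to fail with a range error.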
The Go Marshal
function doesn't seem to have a flag to enforce quoting, as you can
do by setting yaml.default_style = '"' in ruamel.yaml.
Go interprets that hex string as a number.
someHexValue: 0xDef9C64256DeE61ebf5B212238df11C7E532e3B7
If that is the YAML it produces, then Python is right to treat it as a number.
A band-aid for this in Python is to convert it back to hex using
hex(1272966107484048169783147972546098614451903325111)
Here is the yaml spec that treats that hex as a number
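One caveat with the hex() band-aid, shown as a small sketch using only built-ins: the round trip recovers the digits but not the mixed upper/lower case of the original string, which matters if the casing carries meaning (an assumption about the data, not something stated in the question).

```python
original = "0xDef9C64256DeE61ebf5B212238df11C7E532e3B7"

as_int = int(original, 0)  # what a YAML loader hands back
restored = hex(as_int)     # hex() always emits lowercase digits

print(restored == original)          # the mixed case is lost
print(restored == original.lower())  # only the lowercase form matches
```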
I am reading this text from a CSV file in Python.
Hi there,
This is a test.
and storing it into a variable text.
I am trying to write this variable in a JSON file with json.dump(), but it is being transformed into:
' \ufeffHi there,\n\n\xa0\n\nThis is a test.
How can I make my JSON file look like the one below?
{
"text": "Hi there,
This is a test."
}
JSON does not allow real line-breaks. If you still want to use them, you will have to make your own "json" writer.
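To illustrate the point, here is a minimal sketch with the standard json module. The sample text is a simplified stand-in for the CSV content (without the BOM and non-breaking space). The line breaks are written as the two-character escape \n inside the file, and they come back as real line breaks when the file is loaded again.

```python
import json

text = "Hi there,\n\nThis is a test."

# json.dumps escapes each line break as a backslash followed by "n".
encoded = json.dumps({"text": text}, indent=2)
print(encoded)

# Loading the JSON again restores the original line breaks.
decoded = json.loads(encoded)["text"]
print(decoded == text)
```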
Edit: Here's a function that takes a Python dict (which you can get using json.loads()) and prints it the way you need:
def print_wrong_json(dict_object):
    print('{')
    print(',\n'.join(['"{}": "{}"'.format(key, dict_object[key]) for key in dict_object]))
    print('}')
Well, it can be done, as user1308345 shows in his answer, but it wouldn't be valid JSON anymore and you would probably run into issues later when deserializing it.
But if you really want to do it and still want valid JSON, you could split the string at the line breaks and serialize the pieces as an array, as suggested in this answer: https://stackoverflow.com/a/7744658/1466757
Then your JSON would look similar to this:
{
"text": [
"Hi there,",
"",
"",
"",
"this is a test."
]
}
After deserializing it, you would have to put the line breaks back in.
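A rough sketch of that round trip, assuming plain "\n" line breaks and using only the standard library (the sample text is a simplified stand-in for the CSV content):

```python
import json

text = "Hi there,\n\n\n\nThis is a test."

# Serialize: one array element per line, so the file stays valid JSON
# and each line of text sits on its own line when indented.
encoded = json.dumps({"text": text.split("\n")}, indent=2)
print(encoded)

# Deserialize: join the pieces to put the line breaks back in.
restored = "\n".join(json.loads(encoded)["text"])
print(restored == text)
```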
I get a string which resembles JSON and I'm trying to convert it to valid JSON using python.
It looks like this example, but the real data gets very long:
{u'key':[{
u'key':u'object',
u'something':u'd\xfcabc',
u'more':u'\u2023more',
u'boolean':True
}]
}
So there are also a lot of special characters, as well as the "wrong" boolean which should be just lowercase letters.
I don't have any influence over the data I get, I just have to parse it somehow and extract some stuff from it.
I tried to replace the special characters and everything and force it to be a valid JSON, but it is not at all elegant and I could easily forget to replace one type of special character.
You can use literal_eval from the ast module for this.
ast.literal_eval(yourString)
You can then convert the resulting object back to JSON.
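Put together, the two steps might look like this. It is a sketch on Python 3, where the u'' prefixes are still accepted; the names raw, data and as_json are only illustrative.

```python
import ast
import json

# A raw string, so the backslash escapes reach literal_eval untouched.
raw = r"""{u'key':[{
u'key':u'object',
u'something':u'd\xfcabc',
u'more':u'\u2023more',
u'boolean':True
}]
}"""

data = ast.literal_eval(raw)  # Python dict, with True as a real bool
as_json = json.dumps(data)    # valid JSON: double quotes, lowercase true
print(as_json)
```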
The JSON spec only allows JavaScript-style literals (true and false for booleans, null where Python uses None, and so on); undefined is not part of JSON.
The string in this question is the repr of a Python object, so as #florian-dreschsler says, you should use literal_eval from the ast module:
>>> import ast
>>> json_string = """
... {u'key':[{
... u'key':u'object',
... u'something':u'd\xfcabc',
... u'more':u'\u2023more',
... u'boolean':True, #this property fails with json module
... u'null':None, #this property too
... }]
... }
... """
>>> ast.literal_eval(json_string)
{u'key': [{u'boolean': True, u'null': None, u'something': u'd\xfcabc', u'key': u'object', u'more': u'\u2023more'}]}
So in Python I make a dictionary with a JSON-like structure:
>>> a = {"name":'nikhil',"age":25}
Now I check whether a is valid JSON using http://jsonlint.com/ and it reports that it is valid.
Now I do:
>>> b = simplejson.dumps(a)
>>> b
'{"age": 25, "name": "nikhil"}'
Now I do:
>>> c = simplejson.loads(b)
>>> c
{'age': 25, 'name': 'nikhil'}
Now when I check whether c is valid JSON, I get an error.
Why is simplejson not able to convert the JSON string back to valid JSON, when I started with valid JSON?
You are confusing JSON with Python here. b is a JSON-formatted string, c is a Python object.
Python syntax just happens to look a lot like JSON (JavaScript) in that respect.
Python strings can use either ' or ", depending on the contents; JSON always uses ". You entered a using double quotes for the keys and single quotes for the one string value; if you ask Python to echo it back, you'll find it is shown with only single quotes.
Python booleans are True or False, JSON uses true and false.
The JSON 'empty' value is null, Python uses None instead.
See the Encoders and Decoders section of the json module documentation for an overview of how JSON and Python objects are mapped.
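The difference is easy to see side by side. This sketch uses the standard json module instead of simplejson, on the assumption that dumps and loads behave the same for this data:

```python
import json

a = {"name": 'nikhil', "age": 25}

b = json.dumps(a)  # a str holding JSON text
c = json.loads(b)  # a Python dict again

print(b)        # JSON text: always double quotes
print(repr(c))  # Python repr: single quotes, not JSON

# Booleans and None are translated on the way out.
print(json.dumps({"flag": True, "missing": None}))
```

Pasting repr(c) into a JSON validator fails because it is Python syntax; pasting b succeeds.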
I have unicode u"{'code1':1,'code2':1}" and I want it in dictionary format.
I want it in {'code1':1,'code2':1} format.
I tried unicodedata.normalize('NFKD', my_data).encode('ascii','ignore') but it returns string not dictionary.
Can anyone help me?
You can use the built-in ast module:
import ast
d = ast.literal_eval("{'code1':1,'code2':1}")
Help on function literal_eval in module ast:
literal_eval(node_or_string)
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
You can use literal_eval. You may also want to be sure you are creating a dict and not something else. Instead of assert, use your own error handling.
from ast import literal_eval
from collections.abc import MutableMapping
my_dict = literal_eval(my_str_dict)
assert isinstance(my_dict, MutableMapping)
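One way the "own error handling" could look, as a sketch only: parse_dict_literal is a hypothetical helper name, and reporting every failure as ValueError is a design choice made here, not something literal_eval prescribes.

```python
from ast import literal_eval


def parse_dict_literal(text):
    """Parse a string holding a Python dict literal, or raise ValueError."""
    try:
        value = literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        # literal_eval raises one of these for anything that is not
        # a plain Python literal.
        raise ValueError("not a valid Python literal: %r" % (text,)) from exc
    if not isinstance(value, dict):
        raise ValueError("expected a dict, got %s" % type(value).__name__)
    return value


print(parse_dict_literal(u"{'code1':1,'code2':1}"))
```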
EDIT: Turns out my assumption was incorrect; because the keys are not wrapped in double-quote marks ("), the string isn't JSON. See here for some ways around this.
I'm guessing that what you have might be JSON, a.k.a. JavaScript Object Notation.
You can use Python's built-in json module to do this:
import json
result = json.loads(u"{'code1':1,'code2':1}") # will NOT work; see above
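To make the failure concrete, here is a small sketch that tries json.loads first and falls back to ast.literal_eval when the text is not JSON. The fallback is just one possible workaround and is reasonable only because literal_eval accepts nothing but literals.

```python
import ast
import json

text = u"{'code1':1,'code2':1}"

try:
    result = json.loads(text)
except ValueError:
    # json.JSONDecodeError is a subclass of ValueError; it is raised
    # here because JSON requires double-quoted keys.
    result = ast.literal_eval(text)

print(result)
```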
I was getting a unicode error when reading JSON from a file, so this is what worked for me:
import ast
import json

job1 = {}
with open('hostdata2.json') as f:
    job1 = json.loads(f.read())
# Before the conversion, the type is <type 'unicode'> (Python 2) rather than a dict
print(type(job1))
job1 = ast.literal_eval(job1)
print("printing type after ast")
print(type(job1))
# this should print <type 'dict'>
for each in job1:
    print(each)
print("printing keys")
print(job1.keys())
print("printing values")
print(job1.values())
You can use the built-in eval function to convert the string to a Python object:
>>> string_dict = u"{'code1':1, 'code2':1}"
>>> eval(string_dict)
{'code1': 1, 'code2': 1}
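If the string comes from an untrusted source, eval will run whatever expression it contains. A short sketch of the difference, where the hostile string is a made-up example: ast.literal_eval returns the same dict for the input above but refuses anything that is not a literal.

```python
import ast

string_dict = u"{'code1':1, 'code2':1}"
print(ast.literal_eval(string_dict))  # same dict that eval returns

# Hypothetical malicious input: eval() would execute this call,
# literal_eval rejects it because a function call is not a literal.
hostile = "__import__('os').getcwd()"
try:
    ast.literal_eval(hostile)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```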