How to correctly save "\u**" to JSON in Python

I have a dictionary:
data = {"data": "\u512b"}
When I dump that to JSON:
import json
print json.dumps(data)
I got: {"data": "\\u512b"}
What should I do to get exactly {"data": "\u512b"}?
NOTE: If I add u before the string so that it becomes u'\u512b', the extra \ no longer shows up. Please also tell me why.

You can do some hacking.
import json
data = {"data": "\u512b"}
s = json.dumps(data)
print(s.replace(r'\u', 'u'))
print(type(s.replace(r'\u', 'u')))
Output:
{"data": "\u512b"}
<type 'str'>

My guess is that you are just confused by the output of the Python interpreter, which displays the string generated by json.dumps with its own \ escape character prepended to the \ character in the string. The JSON string itself contains exactly one \, as you want (IIUC):
>>> data = {"data": "\u512b"}
>>> data
{'data': '\u512b'}
>>> import json
>>> json.dumps(data)
'{"data": "\\u512b"}'
>>> print(json.dumps(data))
{"data": "\u512b"}
>>> json.dump(data, open('data.json', 'w'))
>>> ^Z
C:\opt\Console2>type data.json
{"data": "\u512b"}
This is entirely independent of JSON in fact, as the following example shows:
>>> s = "s\\u"
>>> s
's\\u'
>>> print(s, len(s)) # length of s is 3, not 4
s\u 3
HTH!
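As a minimal sketch of the distinction discussed above, assuming Python 3: the names escaped and literal are made up for illustration. A normal string literal turns \u512b into one character, while a raw string keeps the backslash as data, which is also what a plain "\u512b" byte string does in Python 2 (only the u'...' form interprets the escape there). That would explain why adding the u prefix makes the extra backslash disappear.

```python
import json

escaped = {"data": "\u512b"}   # escape sequence: one character, U+512B
literal = {"data": r"\u512b"}  # raw string: six characters, first one is a backslash

print(len(escaped["data"]), len(literal["data"]))  # 1 6

# The single character is written as a JSON \u escape (ensure_ascii defaults to True).
print(json.dumps(escaped))  # {"data": "\u512b"}

# The literal backslash has to be escaped itself, so it is doubled in the JSON text.
print(json.dumps(literal))  # {"data": "\\u512b"}

# ensure_ascii=False keeps the character itself instead of a \u escape.
unescaped = json.dumps(escaped, ensure_ascii=False)

# Either way, the data survives a round trip unchanged.
print(json.loads(json.dumps(escaped)) == escaped)  # True
print(json.loads(json.dumps(literal)) == literal)  # True
```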

Related

how to dynamically create array variable in json

In Python, I can use the code below to dynamically create an array that stores n IP addresses.
for i in range(12):
    addr[i] = "10.1.1." + str(i)
    print addr[i]
I want similar code for a JSON file. Is it possible to define variables with a loop in a JSON file?
You don't "loop over a json file" - this makes no sense actually. You create a json string representation of a Python object - usually a dict - and eventually write it to a file (or send it as HTTP response content etc), or create a Python object from a json string:
>>> import json
>>> addr = ["10.1.1.{}".format(i) for i in range(12)]
>>> print(repr(json.dumps(addr)))
'["10.1.1.0", "10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5", "10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"]'
>>> print(repr(json.dumps({"addr": addr})))
'{"addr": ["10.1.1.0", "10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5", "10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"]}'
If you have an existing json string (from a file or any other source) and want to update it, you just parse it (with json.loads) to a Python object, update the python object, and dump it back to json. Assuming you have the following json string (either from a file, http response or whatever, doesn't matter):
>>> source = '{"foo": "bar", "addr": ["10.1.1.0", "10.1.1.1", "10.1.1.2"]}'
and you want to change "foo"'s value to "baaz", add addresses up to 10.1.1.11 and add a "frobnicate: true" key=>value pair:
>>> import json
>>> data = json.loads(source)
>>> data["foo"] = "baaz"
>>> for i in range(12):
...     ip = "10.1.1.{}".format(i)
...     if ip not in data["addr"]:
...         data["addr"].append(ip)
...
>>> data["frobnicate"] = True
>>> updated_json = json.dumps(data)
>>> print(repr(updated_json))
'{"foo": "baaz", "addr": ["10.1.1.0", "10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.4", "10.1.1.5", "10.1.1.6", "10.1.1.7", "10.1.1.8", "10.1.1.9", "10.1.1.10", "10.1.1.11"], "frobnicate": true}'
>>>
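The "write it to a file" step mentioned above could look roughly like this. It is only a sketch: the file name addresses.json and the temporary directory are assumptions made so the example is self-contained.

```python
import json
import os
import tempfile

# Build the list in Python; JSON itself has no loops or variables.
addr = ["10.1.1.{}".format(i) for i in range(12)]

# Hypothetical file name, placed in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "addresses.json")

# Serialize the Python object to the file.
with open(path, "w") as fh:
    json.dump({"addr": addr}, fh, indent=2)

# Read it back into a Python object.
with open(path) as fh:
    loaded = json.load(fh)

print(loaded["addr"][-1])  # 10.1.1.11
```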

Read unicode encoded JSON and access it using Python

I have a very large amount of unicode JSON data in the following format
{u'completed': True, u'entries': [{u'absolute_time':
u'2017-05-17T10:41:52Z', u'command': None, u'level':
u'NORMAL',......
It has JSON objects nested within JSON objects. I am unable to read and parse it because of the encoding. I tried the following code.
Could someone please tell me how to parse it and convert it to a normal JSON object?
with open(r"inp.json", 'r') as jsonData:
    jsonToPython = json.load(jsonData) #gives error here itself
    #jsonData = ast.literal_eval(jsonData)
    print(json.dumps(jsonToPython))
    #print (jsonToPython)
The file contains the repr of a Python object rather than JSON (note the u'...' prefixes, True and None), so json.load rejects it. You can try to load the (stringified) Python object using ast:
>>> import ast
>>> #obj = open(r"inp.json", 'r').read()
>>> obj = "{u'completed': True, u'entries': [{u'absolute_time': u'2017-05-17T10:41:52Z'}]}"
>>> ast.literal_eval(obj)
{'completed': True, 'entries': [{'absolute_time': '2017-05-17T10:41:52Z'}]}
>>>
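To get from there to a "normal JSON object", one option is to feed the result of ast.literal_eval to json.dumps. The sketch below assumes Python 3 and that the input holds only Python literals. The helper name python_repr_to_json is made up for illustration.

```python
import ast
import json

def python_repr_to_json(text):
    """Parse the repr of a Python literal and re-serialize it as JSON."""
    obj = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    return json.dumps(obj)

dumped = "{u'completed': True, u'entries': [{u'command': None, u'level': u'NORMAL'}]}"
as_json = python_repr_to_json(dumped)
print(as_json)
# {"completed": true, "entries": [{"command": null, "level": "NORMAL"}]}
```

The resulting string can then be written to a file and read back with json.load as usual.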

How to convert an string to json in Python

I'm trying to convert a string into JSON output, either from local data or from BeautifulSoup output. For example:
#! /usr/bin/python
data = ('Hello')
print data
I need to convert this Hello into JSON output.
How can I do that?
Is this possible?
Check out the json module in Python https://docs.python.org/2/library/json.html
import json
json.dumps({"hello": 0}, sort_keys=True)
You can use the json module to encode Python objects as JSON, e.g.
>>> import json
>>> data = ('Hello')
>>> json.dumps(data)
'"Hello"'
>>> data = ('Hello', 'There')
>>> json.dumps(data)
'["Hello", "There"]'
>>> data = {'message': 'Hello'}
>>> json.dumps(data)
'{"message": "Hello"}'

In python, is there a way to extract a embedded json string?

So I'm parsing a really big log file with some embedded json.
So I'll see lines like this
foo="{my_object:foo, bar:baz}" a=b c=d
The problem is that the internal JSON can have spaces, but outside of the JSON, spaces act as tuple delimiters (except where they have unquoted strings. Huzzah for whatever idiot thought that was a good idea). As a result, I'm not sure how to figure out where the end of the JSON string is without reimplementing large portions of a JSON parser.
Is there a json parser for Python where I can give it '{"my_object":"foo", "bar":"baz"} asdfasdf', and it can return ({'my_object' : 'foo', 'bar':'baz'}, 'asdfasdf') or am I going to have to reimplement the json parser by hand?
Found a really cool answer. Use json.JSONDecoder's scan_once function
In [30]: import json
In [31]: d = json.JSONDecoder()
In [32]: my_string = 'key="{"foo":"bar"}"more_gibberish'
In [33]: d.scan_once(my_string, 5)
Out[33]: ({u'foo': u'bar'}, 18)
In [37]: my_string[18:]
Out[37]: '"more_gibberish'
Just be careful with the start index. Starting one character later parses only the first string inside the object:
In [38]: d.scan_once(my_string, 6)
Out[38]: (u'foo', 11)
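The documented counterpart of scan_once is JSONDecoder.raw_decode, which also returns the parsed value together with the index where parsing stopped. A minimal sketch under that approach, where the helper name extract_json is made up for illustration:

```python
import json

def extract_json(text, start=0):
    """Parse one JSON value beginning at text[start]; return (value, rest)."""
    decoder = json.JSONDecoder()
    # raw_decode returns the value and the index just past its last character.
    value, end = decoder.raw_decode(text, start)
    return value, text[end:]

value, rest = extract_json('{"my_object":"foo", "bar":"baz"} asdfasdf')
print(value)       # {'my_object': 'foo', 'bar': 'baz'}
print(repr(rest))  # ' asdfasdf'

# With a start offset, as in the scan_once example above.
value2, rest2 = extract_json('key="{"foo":"bar"}"more_gibberish', 5)
print(value2, repr(rest2))  # {'foo': 'bar'} '"more_gibberish'
```

Note that raw_decode raises json.JSONDecodeError (a ValueError subclass) if no valid JSON value starts at the given index, so the caller still has to know where the embedded JSON begins.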
Match everything around it with a regular expression.
>>> import re
>>> re.search('^foo="(.*)" a=.+ c=.+$', 'foo="{my_object:foo, bar:baz}" a=b c=d').group(1)
'{my_object:foo, bar:baz}'
Use shlex and json.
Something like:
import shlex
import json
def decode_line(line):
    decoded = {}
    fields = shlex.split(line)
    for f in fields:
        k, v = f.split('=', 1)
        if k == "foo":
            v = json.loads(v)
        decoded[k] = v
    return decoded
This does assume that the JSON inside the quotes is quoted properly.
Here's a short example program that uses the above:
import pipes
testdict = {"hello": "world", "foo": "bar"}
line = 'foo=' + pipes.quote(json.dumps(testdict)) + ' a=b c=d'
print line
print decode_line(line)
With output:
foo='{"foo": "bar", "hello": "world"}' a=b c=d
{'a': 'b', 'c': 'd', 'foo': {u'foo': u'bar', u'hello': u'world'}}
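The example above is Python 2 (print statement, pipes.quote). A rough Python 3 equivalent, offered as a sketch rather than a drop-in replacement, can use shlex.quote instead. The json_keys parameter is an assumption added here to name which fields carry JSON.

```python
import json
import shlex

def decode_line(line, json_keys=("foo",)):
    """Split a key=value log line and JSON-decode the fields named in json_keys."""
    decoded = {}
    for field in shlex.split(line):          # honours shell-style quoting
        key, value = field.split("=", 1)     # split on the first '=' only
        decoded[key] = json.loads(value) if key in json_keys else value
    return decoded

testdict = {"hello": "world", "foo": "bar"}
line = "foo=" + shlex.quote(json.dumps(testdict)) + " a=b c=d"
print(line)               # foo='{"hello": "world", "foo": "bar"}' a=b c=d
print(decode_line(line))  # {'foo': {'hello': 'world', 'foo': 'bar'}, 'a': 'b', 'c': 'd'}
```

As with the original, this only works if the JSON field is quoted in a way shlex can undo.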

Python: json.loads returns items prefixing with 'u'

I'll be receiving a JSON encoded string from Objective-C, and I am decoding a dummy string (for now) like the code below. My output comes out with character 'u' prefixing each item:
[{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}...
How is JSON adding this Unicode character? What's the best way to remove it?
mail_accounts = []
da = {}
try:
    s = '[{"i":"imap.gmail.com","p":"aaaa"},{"i":"imap.aol.com","p":"bbbb"},{"i":"333imap.com","p":"ccccc"},{"i":"444ap.gmail.com","p":"ddddd"},{"i":"555imap.gmail.com","p":"eee"}]'
    jdata = json.loads(s)
    for d in jdata:
        for key, value in d.iteritems():
            if key not in da:
                da[key] = value
            else:
                da = {}
                da[key] = value
        mail_accounts.append(da)
except Exception, err:
    sys.stderr.write('Exception Error: %s' % str(err))
print mail_accounts
The u- prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.
For example, try this:
print mail_accounts[0]["i"]
You won't see a u.
Everything is cool, man. The 'u' is a good thing, it indicates that the string is of type Unicode in python 2.x.
http://docs.python.org/2/howto/unicode.html#the-unicode-type
The d3 print below is the one you are looking for (which is the combination of dumps and loads) :)
Having:
import json
d = """{"Aa": 1, "BB": "blabla", "cc": "False"}"""
d1 = json.loads(d) # Produces a dictionary out of the given string
d2 = json.dumps(d) # Produces a string out of a given dict or string
d3 = json.dumps(json.loads(d)) # 'dumps' gets the dict from 'loads' this time
print "d1: " + str(d1)
print "d2: " + d2
print "d3: " + d3
Prints:
d1: {u'Aa': 1, u'cc': u'False', u'BB': u'blabla'}
d2: "{\"Aa\": 1, \"BB\": \"blabla\", \"cc\": \"False\"}"
d3: {"Aa": 1, "cc": "False", "BB": "blabla"}
The 'u' prefix on each string signifies that the object is a Unicode string (Python 2's unicode type).
If you want to remove the 'u' prefix from your object, you can do this. It is only reliable when the data is ASCII-only and contains no true, false, or null values, which ast.literal_eval cannot parse:
import json, ast
jdata = ast.literal_eval(json.dumps(jdata)) # Converts unicode strings to plain str
Let's check it in the Python shell:
>>> import json, ast
>>> jdata = [{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}]
>>> jdata = ast.literal_eval(json.dumps(jdata))
>>> jdata
[{'i': 'imap.gmail.com', 'p': 'aaaa'}, {'i': '333imap.com', 'p': 'bbbb'}]
Unicode is an appropriate type here. The JSONDecoder documentation describes the conversion table and states that JSON string objects are decoded into Unicode objects.
From 18.2.2. Encoders and Decoders:
JSON             Python
==================================
object           dict
array            list
string           unicode
number (int)     int, long
number (real)    float
true             True
false            False
null             None
"encoding determines the encoding used to interpret any str objects decoded by this instance (UTF-8 by default)."
The u prefix means that those strings are unicode rather than 8-bit strings. The best way to not show the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert from unicode to 8-bit, so simply loop recursively over the result and convert unicode to str. However, it is probably best just to leave the strings as unicode.
I kept running into this problem when trying to capture JSON data in the log with the Python logging library, for debugging and troubleshooting purposes. Getting the u character is a real nuisance when you want to copy the text and paste it into your code somewhere.
As everyone will tell you, this is because it is a Unicode representation, and it could come from the fact that you’ve used json.loads() to load in the data from a string in the first place.
If you want the JSON representation in the log, without the u prefix, the trick is to use json.dumps() before logging it out. For example:
import json
import logging
# Prepare the data
json_data = json.loads('{"key": "value"}')
# Log normally and get the Unicode indicator
logging.warning('data: {}'.format(json_data))
# Output: WARNING:root:data: {u'key': u'value'}
# Dump to a string before logging and get clean output!
logging.warning('data: {}'.format(json.dumps(json_data)))
# Output: WARNING:root:data: {"key": "value"}
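For comparison, here is a small sketch of the same idea, assuming Python 3, where strings are Unicode by default and the u prefix no longer appears. The repr of the parsed data is still Python syntax rather than JSON, so json.dumps remains the way to get copy-pastable JSON.

```python
import json

accounts = json.loads('[{"i": "imap.gmail.com", "p": "aaaa"}]')

as_repr = repr(accounts)        # Python literal syntax, single quotes
as_json = json.dumps(accounts)  # valid JSON, double quotes

print(as_repr)           # [{'i': 'imap.gmail.com', 'p': 'aaaa'}]
print(as_json)           # [{"i": "imap.gmail.com", "p": "aaaa"}]
print(accounts[0]["i"])  # imap.gmail.com
```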
Try this (encode an individual string value, since the dict itself has no encode method):
mail_accounts[0]["i"].encode("ascii")
Just replace the u' with a single quote in the printed representation...
print(str(mail_accounts).replace("u'", "'"))
