Convert a JSON string to a Python object - Python

Is it possible to convert a JSON string (e.g. the one returned from the Twitter search JSON service) to simple string objects? Here is a small representation of the data returned from the JSON service:
{
    "results": [...],
    "max_id": 1346534,
    "since_id": 0,
    "refresh_url": "?since_id=26202877001&q=twitter",
    ...
}
Let's say that I somehow store the result in some variable, say obj. I am looking to get the appropriate values as follows:
print obj.max_id
print obj.since_id
I've tried using simplejson.load() and json.load() but it gave me an error saying 'str' object has no attribute 'read'

I've tried using simplejson.load() and json.load() but it gave me an error saying 'str' object has no attribute 'read'
To load from a string, use json.loads() (note the 's').
More efficiently, skip the step of reading the response into a string, and just pass the response to json.load().
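A minimal sketch of both approaches (the sample string, URL, and response handling are illustrative assumptions, not from the original post). Note that json.loads() returns a plain dict, so values are read with obj["max_id"] rather than obj.max_id:
import json
import urllib.request
from types import SimpleNamespace

raw = '{"max_id": 1346534, "since_id": 0}'
obj = json.loads(raw)         # parse from a string
print(obj["max_id"])          # 1346534

# Parse straight from an HTTP response (a file-like object),
# skipping the intermediate string; the URL here is hypothetical:
with urllib.request.urlopen("https://example.com/search.json") as resp:
    obj = json.load(resp)     # json.load() calls resp.read() itself

# If attribute-style access (obj.max_id) is really wanted, one option:
ns = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))
print(ns.max_id)              # 1346534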

If you don't know whether the data will arrive as a file or a string, use:
import io
import json

youMagicData = {
    "results": [...],
    "max_id": 1346534,
    "since_id": 0,
    "refresh_url": "?since_id=26202877001&q=twitter",
    # ...
}
# viewing from the center out:
# youMagicData (dict) -> json.dumps() (JSON text) -> StringIO (file object) -> json.load()
magicJsonData = json.load(io.StringIO(json.dumps(youMagicData)))
print(magicJsonData)
This works really fast in my program, and it would work here too.
From https://docs.python.org/3/library/io.html#text-i-o:
json.load() from the built-in json library expects a file object; it does not check what it is passed, it just calls the read() method on it, because a file object only gives up its data when you call read(). Since the built-in string class has no read() method, we need a wrapper, and io.StringIO is that wrapper: it wraps the string in an in-memory file object. Here is my low-detail rebuild: https://gist.github.com/fenderrex/843d25ff5b0970d7e90e6c1d7e4a06b1
So at the end of all that, it's like writing a RAM file and JSON-parsing it out in one line.

Related

How to cast a mmap.mmap Python object into a string?

I'm making a Bottle-powered program and I use the yield keyword with mmap.mmap objects to send multiple mapped files in the output stream, as in this code:
for mapping in mappings:
    yield mapping
This doesn't work out of the box because Bottle wants a string (see the Iterables and generators section of its documentation), and when I use str(mapping), it returns a representation of the object itself, not the content.
So, I want to convert the mmap object into a string that contains the file content.
I'm thinking like a C programmer who just wants to hand over a raw pointer.
After googling for hours, I found out that simply using bytes(mapping) works.
for mapping in mappings:
    yield bytes(mapping)
This makes a single copy of the mapped data into a bytes object, with no extra temporary copies along the way.
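A minimal, self-contained sketch of this pattern (the Bottle app setup, route, and file paths are assumptions for illustration, not from the original post):
import mmap
from bottle import Bottle

app = Bottle()

@app.route('/files')
def send_files():
    paths = ['a.bin', 'b.bin']  # hypothetical files to stream
    for path in paths:
        with open(path, 'rb') as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
                # bytes() copies the mapped content, so the yielded chunk
                # stays valid after the mapping is closed
                yield bytes(mapping)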

Use of .digest() in hashing?

What is the use of .digest() in this statement? Why do we use it? I searched on Google (and in the documentation) but I am still not able to figure it out.
train_hashes = [hashlib.sha1(x).digest() for x in train_dataset]
What I found is that it converts the hash to a string. Am I right or wrong?
The .digest() method returns the actual digest the hash is designed to produce.
It is a separate method because the hashing API is designed to accept data in multiple pieces:
hash = hashlib.sha1()
for chunk in large_amount_of_data:
    hash.update(chunk)
final_digest = hash.digest()
The above code creates a hashing object without passing any initial data in, then uses the hash.update() method to feed in chunks of data in a loop. This helps avoid having to load all of the data into memory at once, so you can hash anything between 1 byte and the entire Google index, if you ever had access to something that large.
If hashlib.sha1(x) produced the digest directly you could never add additional data to hash first. Moreover, there is also an alternative method of accessing the digest, as a hexadecimal string using the hash.hexdigest() method (equivalent to hash.digest().hex(), but more convenient).
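A quick sketch of that equivalence (the sample data is made up):
import hashlib

h = hashlib.sha1(b"some sample data")
print(h.digest())     # the raw 20-byte digest, as bytes
print(h.hexdigest())  # the same digest as a 40-character hex string
assert h.hexdigest() == h.digest().hex()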
The code you found uses the fact that the constructor of the hash object also accepts data; since that's all of the data you wanted to hash, you can call .digest() immediately.
The module documentation covers it this way:
There is one constructor method named for each type of hash. All return a hash object with the same simple interface. For example: use sha256() to create a SHA-256 hash object. You can now feed this object with bytes-like objects (normally bytes) using the update() method. At any point you can ask it for the digest of the concatenation of the data fed to it so far using the digest() or hexdigest() methods.
(bold emphasis mine).

How to check whether a JSON object is an array

I'm new to Python. I came across this issue while sending a JSON ArrayList object from Java to Python. When sending the JSON object from Java, the JSON structure of the ArrayList is
[{'firstObject' : 'firstVal'}]
but when I receive it in Python I get the value as
{'listName': {'firstObject': 'firstVal'}}
When I pass more than one object in the array, like this:
[{'firstObject' : 'firstVal'}, {'secondObject' : 'secondVal'}]
I receive the JSON on the Python end as
{'listName': [{'firstObject': 'firstVal'}, {'secondObject': 'secondVal'}]}
I couldn't figure out why this is happening. Can anyone suggest either a way to make the first case an array object, or a way to figure out whether a JSON variable is an array type?
Whenever you use the load (or loads) function from the json module, you typically get either a dict or a list object. To make sure you get a list instead of a dict containing listName, you could do the following:
import json
jsonfile = open(...) # <- your json file
json_obj = json.load(jsonfile)
if isinstance(json_obj, dict) and 'listName' in json_obj:
    json_obj = json_obj['listName']
That should give you the desired result.
The json module in Python does not change the structure:
assert type(json.loads('[{"firstObject": "firstVal"}]')) == list
If you see {'listName': {'firstObject': 'firstVal'}}, then something (either in Java or in your Python application code) is changing the output/input.
Note: it is easy to unpack the 'listName' value as shown in Fawers' answer, but you should not do that; fix the upstream code that produces the wrong values instead.
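To address the second half of the question directly, here is a small sketch of checking whether a parsed JSON value is an array (a Python list) or an object (a Python dict):
import json

parsed = json.loads('[{"firstObject": "firstVal"}]')
if isinstance(parsed, list):
    print("top-level JSON array")
elif isinstance(parsed, dict):
    print("top-level JSON object")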

Python JSON module not throwing exception for invalid JSON

So, valid JSON must be an object or an array, correct? I was expecting the following code to throw an exception, but it does not:
>>> import json
>>> json.loads("245235")
245235
That's not invalid JSON.* A number is a valid JSON type, just like an object; see http://en.wikipedia.org/wiki/JSON#Data_types.2C_syntax_and_example. Any of these types can appear on its own, although object and array are probably the most common top-level types.
*according to the Python implementation
EDIT:
As pointed out in a deleted answer (not sure why it was deleted), the Python docs note that the JSON RFC does require the top-level value to be of array or object type, but that the json module doesn't enforce this. Since a lot of what I know about JSON has come from working with the Python json module, I don't know how portable this behavior is.
As requested, this is noted at http://docs.python.org/2/library/json.html#standard-compliance:
This module does not comply with the RFC in a strict fashion,
implementing some extensions that are valid JavaScript but not valid
JSON. In particular:
Top-level non-object, non-array values are accepted and output;
Infinite and NaN number values are accepted and output;
Repeated names within an object are accepted, and only the value of the last name-value pair is used.
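A quick sketch exercising those extensions with the standard json module:
import json

print(json.loads("245235"))            # 245235: a top-level number is accepted
print(json.loads("Infinity"))          # inf: valid JavaScript, not strict JSON
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}: the last duplicate key wins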
JSON data can have a wide range of types, including strings, numbers and booleans.

Elegant way to store dictionary permanently with Python?

I'm currently expensively parsing a file, which generates a dictionary of ~400 key-value pairs and is seldom updated. Previously I had a function which parsed the file, wrote it to a text file in dictionary syntax (i.e. dict = {'Adam': 'Room 430', 'Bob': 'Room 404'}), and copied and pasted it into another function whose sole purpose was to return that parsed dictionary.
Hence, in every file where I would use that dictionary, I would import that function and assign it to a variable, which is now that dictionary. I'm wondering if there's a more elegant way to do this that does not involve explicitly copying and pasting code around? Using a database seems unnecessary, and the text file gave me the benefit of seeing whether the parsing was done correctly before adding it to the function. But I'm open to suggestions.
Why not dump it to a JSON file, and then load it from there where you need it?
import json
with open('my_dict.json', 'w') as f:
    json.dump(my_dict, f)

# elsewhere...
with open('my_dict.json') as f:
    my_dict = json.load(f)
Loading from JSON is fairly efficient.
Another option would be to use pickle, but unlike JSON, the files it generates aren't human-readable so you lose out on the visual verification you liked from your old method.
Why mess with all these serialization methods? It's already written to a file as a Python dict (although with the unfortunate name 'dict'). Change your program to write out the data with a better variable name - maybe 'data', or 'catalog', and save the file as a Python file, say data.py. Then you can just import the data directly at runtime without any clumsy copy/pasting or JSON/shelve/etc. parsing:
from data import catalog
JSON is probably the right way to go in many cases, but there might be an alternative. It looks like your keys and your values are always strings; is that right? You might consider using dbm/anydbm. These are "databases" but they act almost exactly like dictionaries. They're great for cheap data persistence.
>>> import anydbm
>>> dict_of_strings = anydbm.open('data', 'c')
>>> dict_of_strings['foo'] = 'bar'
>>> dict_of_strings.close()
>>> dict_of_strings = anydbm.open('data')
>>> dict_of_strings['foo']
'bar'
If the keys are all strings, you can use the shelve module
A shelf is a persistent, dictionary-like object. The difference with
“dbm” databases is that the values (not the keys!) in a shelf can be
essentially arbitrary Python objects — anything that the pickle module
can handle. This includes most class instances, recursive data types,
and objects containing lots of shared sub-objects. The keys are
ordinary strings.
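A minimal sketch of the shelve approach (the filename and keys are made up for illustration):
import shelve

with shelve.open('rooms') as db:  # the file may gain a suffix, depending on the dbm backend
    db['Adam'] = 'Room 430'
    db['Bob'] = 'Room 404'

with shelve.open('rooms') as db:
    print(db['Adam'])  # 'Room 430'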
JSON would be a good choice if you need to use the data from other languages.
If storage efficiency matters, use pickle (or cPickle in Python 2, for an execution performance gain). As Amber pointed out, you can also dump/load via JSON. It will be human-readable, but takes more disk space.
I suggest you consider using the shelve module since your data-structure is a mapping.
That was my answer to a similar question titled If I want to build a custom database, how could I? There's also a bit of sample code in another answer of mine promoting its use for the question How to get a object database?
ActiveState has a highly rated PersistentDict recipe which supports csv, json, and pickle output file formats. It's pretty fast since all three of those formats are implemented in C (although the recipe itself is pure Python), so the fact that it reads the whole file into memory when it's opened might be acceptable.
JSON (or YAML, or whatever) serialisation is probably better, but if you're already writing the dictionary to a text file in python syntax, complete with a variable name binding, you could just write that to a .py file instead. Then that python file would be importable and usable as is. There's no need for the "function which returns a dictionary" approach, since you can directly use it as a global in that file. e.g.
# generated.py
please_dont_use_dict_as_a_variable_name = {'Adam': 'Room 430', 'Bob': 'Room 404'}
rather than:
# manually_copied.py
def get_dict():
    return {'Adam': 'Room 430', 'Bob': 'Room 404'}
The only difference is that manually_copied.get_dict gives you a fresh copy of the dictionary every time, whereas generated.please_dont_use_dict_as_a_variable_name[1] is a single shared object. This may matter if you're modifying the dictionary in your program after retrieving it, but you can always use copy.copy or copy.deepcopy to create a new copy if you need to modify one independently of the others.
[1] dict, list, str, int, map, etc. are generally viewed as bad variable names. The reason is that these are already defined as built-ins and are used very commonly. So if you give something a name like that, at the least it's going to cause cognitive dissonance for people reading your code (including you, after you've been away for a while), as they have to keep in mind that "dict doesn't mean what it normally does here". It's also quite likely that at some point you'll get an infuriating-to-solve bug where Python reports that dict objects aren't callable (or something), because some piece of code is trying to use the type dict but is getting the dictionary object you bound to the name dict instead.
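A small sketch of that shared-object pitfall and the copy fix (the module and variable names come from the generated.py example above):
import copy
from generated import please_dont_use_dict_as_a_variable_name as catalog

my_copy = copy.deepcopy(catalog)  # an independent copy; mutations stay local
my_copy['Adam'] = 'Room 431'
print(catalog['Adam'])            # still 'Room 430' for every other importer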
On the JSON front, there is also something called simplejson. The first time I used JSON in Python, the built-in json library didn't work for me / I couldn't figure it out. simplejson was... easier to use.
