Check if a file is JSON-loadable - Python

I have two types of txt files: one is saved in an arbitrary format of the form

Header
key1 value1
key2 value2

and the other is a simple JSON dump stored as

with open(filename, "w") as outfile:
    json.dump(json_data, outfile)

From a dialog window, the user can load either of these two files, but my loader needs to be able to distinguish between type 1 and type 2 and send the file to the correct load routine.
# Pseudocode
def load(filename):
    if filename is json-loadable:
        json_loader(filename)
    else:
        other_loader(filename)
The easiest way I can think of is to use a try/except block as
def load(filename):
    try:
        data = json.load(open(filename))
        process_data(data)
    except:
        other_loader(filename)
but I do not like this approach, since there is roughly a 50/50 chance of failure in the try/except block, and as far as I know try/except is slow when an exception is actually raised.
So is there a simpler and more convenient way of checking whether the file is in JSON format or not?

You can do something like this:
import json
import logging

logger = logging.getLogger(__name__)

def convert(tup):
    """
    Convert a JSON string to a Python dict.
    """
    try:
        tup_json = json.loads(tup)
        return tup_json
    except ValueError as error:  # includes JSONDecodeError
        logger.error(error)
        return None

converted = convert(<string_that_needs_to_be_converted_to_json>)
if converted:
    <do_your_logic>
else:
    <if_string_is_not_convertible>

If the top-level data you're dumping is an object, you could check whether the first character is {, or [ if it's an array. That's only valid if the header of the other format will never start with those characters. It's also not foolproof, because it doesn't guarantee that your data is well-formed JSON.
On the other hand, your existing solution is fine, and much clearer and more robust.
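As a rough sketch of that first-character check (looks_like_json is a hypothetical helper name, and json_loader/other_loader are the routines from the question's pseudocode):

def looks_like_json(filename):
    # Peek at the first non-whitespace character: JSON objects start with '{'
    # and arrays with '['. This is only a heuristic, not full validation.
    with open(filename) as f:
        for chunk in iter(lambda: f.read(64), ''):
            stripped = chunk.lstrip()
            if stripped:
                return stripped[0] in '{['
    return False

def load(filename):
    if looks_like_json(filename):
        json_loader(filename)
    else:
        other_loader(filename)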

Related

What is the most Pythonic way to convert a valid json file to a string?

Below is what I'm doing currently; just wondering if there is a better way.
with open("sample.json", "r") as fp:
json_dict = json.load(fp)
json_string = json.dumps(json_dict)
with open("sample.json") as f:
json_string = f.read()
No need to parse and unparse it.
If you need to raise an exception on invalid JSON, you can parse the string and skip the work of unparsing it:
with open("sample.json") as f:
json_string = f.read()
json.loads(json_string) # Raises an exception if the JSON is invalid.
A JSON file is just a regular file. You open() it and read() it, which gives you a str. If you want to make sure it contains valid JSON, put the load part of the above code in a try/except block.
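For example, a minimal sketch of that (the specific error handling here is my own choice):

import json

with open("sample.json") as f:
    json_string = f.read()

try:
    json.loads(json_string)  # validate only; keep the raw string
except ValueError as error:  # invalid JSON
    print("sample.json does not contain valid JSON:", error)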
I don't know if it's Pythonic or just pointless, but you could also do this if validation is part of your requirements:
import json
# I'm fully aware of the missing "with" or "close" in the line below
json_string = json.dumps(json.load(open('sample.json')))
Otherwise, user2357112 already said it: "No need to parse and unparse it."
You are doing it right. You can probably find libraries with different implementations tuned for performance or memory usage, but the Python standard library is reliable, covers most cases, is assured to be compatible with other platforms, and is simple. It cannot get more Pythonic than that.
The way you're doing it is probably fine if you just need it to raise an Exception for invalid JSON. However, if you want to make sure that you're not changing the file at all, you could try something like this:
import json

with open("sample.json") as fp:
    json_string = fp.read()
json.loads(json_string)
It will still raise a ValueError if the file contains invalid JSON, and you know that you haven't changed the data at all. If you're wondering what might change otherwise, off the top of my head: the order of dict items and whitespace, not to mention what happens if there are duplicate keys in the JSON.
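As a quick illustration of those differences (the sample string is mine, not from the original post):

import json

raw = '{"b": 1,   "a": 2, "a": 3}'
round_tripped = json.dumps(json.loads(raw))
print(round_tripped)  # {"b": 1, "a": 3} - whitespace collapsed, duplicate key "a" silently lost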

Saving a Queue to a File

My goal is to have a text file that allows me to append data to the end and retrieve and remove the first data entry from the file. Essentially I want to use a text file as a queue (first in, first out). I thought of two ways to accomplish this, but I am unsure which way is more Pythonic and efficient. The first way is to use the json library.
import json

def add_to_queue(item):
    q = retrieve_queue()
    q.append(item)
    write_to_queue(q)

def pop_from_queue():
    q = retrieve_queue()
    write_to_queue(q[1:])
    return q[0]

def write_to_queue(data):
    with open('queue.txt', 'w') as file_pointer:
        json.dump(data, file_pointer)

def retrieve_queue():
    try:
        with open('queue.txt', 'r') as file_pointer:
            return json.load(file_pointer)
    except (IOError, ValueError):
        return []
Seems pretty clean, but it requires serialization/deserialization of all of the json data every time I write/read, even though I only need the first item in the list.
The second option is to call readlines() and writelines() to retrieve and to store the data in the text file.
def add_to_queue(item):
    with open('queue.txt', 'a') as file_pointer:
        file_pointer.write(item + '\n')

def pop_from_queue():
    with open('queue.txt', 'r+') as file_pointer:
        lines = file_pointer.readlines()
        file_pointer.seek(0)
        file_pointer.truncate()
        file_pointer.writelines(lines[1:])
        return lines[0].strip()
Both of them work fine, so my question is: what is the recommended way to implement a "text file queue"? Is using json "better" (more Pythonic/faster/more memory-efficient) than reading and writing the file myself? Both of these solutions seem rather complicated given the simplicity of the problem; am I missing a more obvious way to do this?
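For reference, a minimal usage sketch; it drives either implementation above the same way (the sample items are made up):

# Works the same with the JSON-based or the readlines()-based version.
add_to_queue("first")
add_to_queue("second")

print(pop_from_queue())  # "first"
print(pop_from_queue())  # "second"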

What are the appropriate argument/return types for a function to take binary files/streams/filenames and convert them to readable text format?

I have a function that's intended to take a binary file format and convert it to a readable text format, e.g.:
def textualize(binary_stuff):
    # magic to turn binary stuff into text
    return text_stuff
There are a few different types I could accept as input or produce as output, and I'm unsure what to use. Here are some options and corresponding objections I can think of:
- Take a bytes object as input and return a string.
  Problematic if, say, the input originates from a huge file that now has to be read into memory.
- Take a file-like object as input, read it, and return a string.
  Relies on the caller to open the file in the right mode.
  The asymmetry of this disturbs me for reasons I can't quite put a finger on.
- Take two file-like objects; read from one and write to the other instead of returning anything.
  Again relies on the caller to open the files in the right mode.
  Makes the most common cases (named file to named file, or bytes to string) more unwieldy than they need to be.
- Take two filenames and handle opening stuff myself.
  What if the caller wants to convert data that isn't in a named file?
- Accept multiple possible input types.
  Possibly complicated to program.
  Still leaves the question of what to return.
Is there an established Right Thing to do for conversions like this? Are there additional tradeoffs I'm missing?
You could do this the way the json module does it: one function for strings and another for files. And leave the opening and closing of files to the caller -- that gives the caller more flexibility. You could then use functools.singledispatch to dispatch between your functions, e.g.:
from functools import singledispatch
from io import BytesIO, StringIO, IOBase, TextIOBase

@singledispatch
def textualise(input, output):
    if not isinstance(input, IOBase):
        raise TypeError(input)
    if not isinstance(output, TextIOBase):
        raise TypeError(output)
    data = input.read().decode("utf-8")
    output.write(data)
    output.flush()

@textualise.register(bytes)
def textualise_bytes(bytes_):
    input = BytesIO(bytes_)
    output = StringIO()
    textualise(input, output)
    return output.getvalue()

@textualise.register(str)
def textualise_filenames(in_filename, out_filename):
    with open(in_filename, "rb") as input, open(out_filename, "wt") as output:
        textualise(input, output)

s = textualise(b"some text")
assert s == "some text"
textualise("inputfile.txt", "outputfile.txt")
I would personally avoid the third form, since bytes objects are also valid filenames. For example, textualise(b"inputfile.txt", "outputfile.txt") would get dispatched to the wrong function (textualise_bytes).

Skipping broken JSONs in Python

I am reading JSON from the database and parsing it using python.
cur1.execute("Select JSON from t1")
dataJSON = cur1.fetchall()
for row in dataJSON:
    jsonparse = json.loads(row)
The problem is that some of the JSONs I'm reading are broken.
I would like my program to skip a JSON if it's not valid and, if it is, go ahead and parse it. Right now my program crashes once it encounters a broken JSON.
T1 has several JSONs that I'm reading one by one.
Update
You're getting an "expecting string or buffer" error - you need to be using row[0], as the results will be 1-tuples and you want to take the first and only column.
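In other words, something like this (a sketch based on the loop from the question):

for row in dataJSON:
    jsonparse = json.loads(row[0])  # row is a 1-tuple; take its only column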
If you did want to check for bad JSON, you can put a try/except around it:
for row in dataJSON:
    try:
        jsonparse = json.loads(row)
    except Exception as e:
        pass
Now, instead of using Exception as above, use the type of exception that's actually occurring, so that you don't capture errors unrelated to JSON loading... (It's probably ValueError.)
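For example, a sketch that only skips JSON parsing failures (on Python 3, json.JSONDecodeError is a subclass of ValueError, so catching ValueError covers it):

for row in dataJSON:
    try:
        jsonparse = json.loads(row[0])
    except ValueError:  # json.JSONDecodeError on Python 3
        continue  # skip the broken JSON and move on to the next row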
If you just want to silently ignore errors, you can wrap json.loads in a try..except block:
try: jsonparse = json.loads(row)
except: pass
Try this:
def f(x):
    try:
        return json.loads(x)
    except:
        pass

json_df = pd.DataFrame()
json_df = df.join(df["error"].apply(lambda x: f(x)).apply(pd.Series))
After loading the JSON, I also wanted to convert each key-value pair from the JSON into a new column (one per JSON key), so I used apply(pd.Series) as well. If your goal is only to parse each row of the data frame column as JSON, try this with that part removed.
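A small, self-contained sketch of that approach (the df contents and the "error" column name are made up for illustration):

import json
import pandas as pd

def f(x):
    try:
        return json.loads(x)
    except ValueError:
        return None

df = pd.DataFrame({"error": ['{"code": 404, "msg": "not found"}', 'not valid json']})
json_df = df.join(df["error"].apply(f).apply(pd.Series))
# json_df gains "code" and "msg" columns; the broken row is left as NaN.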

How to compare values in an existing dictionary and update the dictionary back to a file?

I am making a utility of sorts with a dictionary. What I am trying to achieve is this:
for each XML file that I parse, the existing dictionary is loaded from a file (output.dict), compared/updated for the current key, and stored back along with the existing values. I tried with has_key() and got an AttributeError; it does not work.
Since I am trying one file at a time, it creates multiple dictionaries and I am unable to compare them. This is where I am stuck.
def createUpdateDictionary(servicename, xmlfile):
    dictionary = {}
    if path.isfile == 'output.dict':
        dictionary.update (eval(open('output.dict'),'r'))

    for event, element in etree.iterparse(xmlfile):
        dictionary.setdefault(servicename, []).append(element.tag)

    f = open('output.dict', 'a')
    write_dict = str(dictionary2)
    f.write(write_dict)
    f.close()
(here servicename is nothing but the xmlfile name split on '.', which forms the key, and the values are nothing but the elements' tag names)
def createUpdateDictionary(servicename, xmlfile):
    dictionary = {}
    if path.isfile == 'output.dict':
        dictionary.update (eval(open('output.dict'),'r'))
There is a typo, as the 'r' argument belongs to open(), not eval(). Furthermore, you cannot evaluate a file object as returned by open(); you have to read() the contents first.
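With those two fixes applied, that line would look something like this (eval() is still discouraged, as explained below):

dictionary.update(eval(open('output.dict', 'r').read()))  # 'r' goes to open(); read() before eval()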
f = open('output.dict', 'a')
write_dict = str(dictionary2)
f.write(write_dict)
f.close()
Here, you are appending the string representation to the file. The string representation is not guaranteed to represent the dictionary completely. It is meant to be readable by humans to allow inspection, not to persist the data.
Moreover, since you are using 'a' to append the data, you are storing multiple copies of the updated dictionary in the file. Your file might look like:
{}{"foo": []}{"foo": [], "bar":[]}
This is clearly not what you want; you won't even be able to eval() it later (syntax error!).
Since eval() will execute arbitrary Python code, it is considered evil and you really should not use it for object serialization. Either use pickle, which is the standard way of serialization in Python, or use json, which is a human-readable standard format supported by other languages as well.
import json

def createUpdateDictionary(servicename, xmlfile):
    with open('output.dict', 'r') as fp:
        dictionary = json.load(fp)

    # ... process XML, update dictionary ...

    with open('output.dict', 'w') as fp:
        json.dump(dictionary, fp)
