Cannot print data off a JSON file [duplicate] - python

This question already has answers here:
How should I write a Windows path in a Python string literal?
(5 answers)
Closed 6 months ago.
I have a JSON file called data.json and I am trying to print the data inside that JSON file. The JSON file got created by the command:
git log --pretty="format:{"commit":"%h", "merge":"%p", "author":"%an", "title":"%s", "body":"%b"}",>"C:\test_temp\data.json"
I am trying to print the data inside the file with the function parse_json_file, but I get an error that says IOError: [Errno 22] invalid mode ('r') or filename "C:\test_temp\data.json"
json_directory = "C:\test_temp\data.json"
def parse_json_file(json_directory):
    with open(json_directory) as f:
        data = json.load(f)
        print(data)
The JSON file already exists, but I am not sure why it cannot be read.
Also, the data generated in the JSON file is not properly formatted: the dictionary entries are not surrounded by " " even though I specified them in the git log command. Will that cause a problem when I try to parse the JSON file?

Maybe try:
json_directory = "C:\\test_temp\\data.json"
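The doubled backslashes matter because, in a normal Python string literal, \t is read as a tab character, so the original path never pointed at the test_temp folder. A small sketch of the usual ways to write the path (only the directory part is used for the broken variant; nothing is opened here):

```python
# In a normal string literal "\t" is the escape sequence for a tab,
# so this value contains a tab instead of backslash + "t".
broken = "C:\test_temp"

# Three common ways to spell the path from the question correctly:
escaped = "C:\\test_temp\\data.json"   # escape every backslash
raw = r"C:\test_temp\data.json"        # raw string: backslashes stay literal
forward = "C:/test_temp/data.json"     # Windows also accepts forward slashes

print(repr(broken))
print(escaped == raw)
```

Any of the last three can be passed to open().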

Your command is producing invalid JSON, so your json.load call will never succeed.
You need to escape the quotes. What you have supplied (as you can see from Stack Overflow's syntax highlighting) is actually a series of strings that your shell concatenates together.
In Bash on OS X, escaping the quotes looks like:
git log --pretty="format:{\"commit\":\"%h\", \"merge\":\"%p\", \"author\":\"%an\", \"title\":\"%s\", \"body\":\"%b\"}"
You could also enclose the entire argument to pretty with single quotes, as follows:
git log --pretty='format:{"commit":"%h", "merge":"%p", "author":"%an", "title":"%s", "body":"%b"}',>"C:\test_temp\data.json"
Once your json generation command is corrected, I suspect your script will succeed so long as the paths are correct.
If you try to correct the command as I have recommended and it does not work, please post the JSON file you are generating, as well as the shell you are using.
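Even with the quoting fixed, a hand-built JSON template can still break when a commit title or body contains a double quote or a newline. One alternative, sketched below under some assumptions, is to have git separate the fields with control characters and let Python's json module do the quoting. The names parse_git_log, git_log_as_json, FIELD_SEP and RECORD_SEP are made up for this example; %x1f and %x1e use git's %xNN placeholder, which emits the byte with that hex code.

```python
import json
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator, unlikely to appear in commit text
RECORD_SEP = "\x1e"  # ASCII record separator
FIELDS = ["commit", "merge", "author", "title", "body"]
# Same fields as the question's format string, separated by control bytes.
GIT_FORMAT = "%h%x1f%p%x1f%an%x1f%s%x1f%b%x1e"


def parse_git_log(raw):
    """Turn separator-delimited git log output into a list of dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")  # git puts a newline between records
        if not record:
            continue
        values = record.split(FIELD_SEP)
        commits.append(dict(zip(FIELDS, values)))
    return commits


def git_log_as_json(repo_dir="."):
    """Run git log in repo_dir and return its history as a JSON string."""
    raw = subprocess.run(
        ["git", "log", "--pretty=format:" + GIT_FORMAT],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return json.dumps(parse_git_log(raw), indent=2)
```

git_log_as_json needs git on the PATH and a repository in repo_dir; its return value can be written to data.json and read back with json.load.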

Related

How to validate a Json file in python without using raises? [duplicate]

I have around 2000 JSON files which I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format. (Error: ValueError: No JSON object could be decoded) In turn, I can't read it into my program.
I am currently doing something like the below:
for files in folder:
    with open(files) as f:
        data = json.load(f)  # It causes an error at this part
I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free/cheap alternative for fixing all of these files offline, i.e. I just run the program on the folder containing all the JSON files and it formats them as required?
SOLVED using @reece's comment:
import os
import simplejson
invalid_json_files = []
read_json_files = []
def parse():
    for files in os.listdir(os.getcwd()):
        with open(files) as json_file:
            try:
                simplejson.load(json_file)
                read_json_files.append(files)
            except ValueError as e:
                print("JSON object issue: %s" % e)
                invalid_json_files.append(files)
    print(invalid_json_files, len(read_json_files))
Turns out that I was saving a file which is not in JSON format in my working directory which was the same place I was reading data from. Thanks for the helpful suggestions.
The built-in json module can be used as a validator:
import json
def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None # or: raise
You can make it work with files by using:
with open(filename) as f:
    return json.load(f)
instead of json.loads, and you can include the filename in the error message as well.
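Put together, the file-based variant might look like the sketch below. The name parse_file is an assumption made for illustration.

```python
import json


def parse_file(filename):
    """Return the parsed JSON from filename, or None if it is not valid JSON."""
    try:
        with open(filename) as f:
            return json.load(f)
    except ValueError as e:
        # json.JSONDecodeError is a subclass of ValueError on Python 3.5+
        print('invalid json in %s: %s' % (filename, e))
        return None
```

Returning None keeps a loop over many files going; re-raise instead if an invalid file should stop the run.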
On Python 3.3.5, for {test: "foo"}, I get:
invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
and on 2.7.6:
invalid json: Expecting property name: line 1 column 2 (char 1)
This is because the correct json is {"test": "foo"}.
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
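A minimal sketch of that skipped.txt idea, assuming all the files sit in one folder. The function name list_invalid_json and the one-name-per-line layout of the report are assumptions for this example.

```python
import json
import os


def list_invalid_json(folder, report_name="skipped.txt"):
    """Write the names of files in folder that fail to parse as JSON to a report file."""
    skipped = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip sub-folders and a report left over from an earlier run.
        if not os.path.isfile(path) or name == report_name:
            continue
        try:
            with open(path) as f:
                json.load(f)
        except ValueError:
            skipped.append(name)
    with open(os.path.join(folder, report_name), "w") as report:
        report.write("\n".join(skipped))
    return skipped
```

The invalid files themselves are left untouched, so they can be inspected and fixed by hand afterwards.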
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom JSON parser that fixes common errors. If you do, put the originals under source control (or archive them) so you can review the changes the automated tool makes as a sanity check. Ambiguous cases should be fixed by hand.
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
try:
    with open(filename) as f:
        json.load(f)
except ValueError:  # raised by the json module when the input is not valid JSON
    print("%s is not valid JSON" % filename)
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually, tokenize it, and try to detect and fix errors as you go. However, I'm sure there are cases where an error cannot be fixed automatically, and it would be better to raise an error and ask the user to fix their files.
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However, I am not sure whether it would be a good idea to fix all errors, since you'd then have to assume your fixes are what the user actually wants. If it's a missing comma or an extra trailing comma, that might be OK, but there may be cases where it is ambiguous what the user wants.
Here is a full python3 example for the next novice python programmer that stumbles upon this answer. I was exporting 16000 records as json files. I had to restart the process several times so I needed to verify that all of the json files were indeed valid before I started importing into a new system.
I am no python programmer so when I tried the answers above as written, nothing happened. Seems like a few lines of code were missing. The example below handles files in the current folder or a specific folder.
verify.py
import json
import os
import sys
from os.path import isfile, join
# check if a folder name was specified
if len(sys.argv) > 1:
    folder = sys.argv[1]
else:
    folder = os.getcwd()
# lists to hold invalid and valid files
invalid_json_files = []
read_json_files = []
def parse():
    # loop through the folder
    for files in os.listdir(folder):
        # check if the combined path and filename is a file
        if isfile(join(folder, files)):
            # open the file
            with open(join(folder, files)) as json_file:
                # try reading the json file using the json interpreter
                try:
                    json.load(json_file)
                    read_json_files.append(files)
                except ValueError as e:
                    # if the file is not valid, print the error
                    # and add the file to the list of invalid files
                    print("JSON object issue: %s" % e)
                    invalid_json_files.append(files)
    print(invalid_json_files)
    print(len(read_json_files))
parse()
Example:
python3 verify.py
or
python3 verify.py somefolder
tested with python 3.7.3
It was not clear to me how to provide the path to the folder, so here is an answer that includes that option.
import glob
import pandas as pd
path = r'C:\Users\altz7\Desktop\your_folder_name' # use your path
all_files = glob.glob(path + "/*.json")
data_list = []
invalid_json_files = []
for filename in all_files:
    try:
        df = pd.read_json(filename)
        data_list.append(df)
    except ValueError:
        invalid_json_files.append(filename)
print("Files in correct format: {}".format(len(data_list)))
print("Not readable files: {}".format(len(invalid_json_files)))
# Optionally build a single pandas DataFrame from the readable files:
# df = pd.concat(data_list, axis=0, ignore_index=True)

IBM Personality Insights Syntax Errors

I'm trying to learn the basics of how the IBM Watson Personality Insights API works before it shuts down at the end of this year. I have a basic text file that I want analyzed, but I'm having trouble getting the code to run properly. I have been trying to follow the instructions on the official site, but I'm stuck. What am I doing wrong? (I have blotted out my key in the code below.)
from ibm_watson import PersonalityInsightsV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
authenticator = IAMAuthenticator('BlottedOutKey')
personality_insights = PersonalityInsightsV3(
    version='2017-10-13',
    authenticator=authenticator
)
personality_insights.set_service_url('https://api.us-west.personality-insights.watson.cloud.ibm.com')
with open(join(C:\Users\AWaywardShepherd\Documents\Data Science Projects\TwitterScraper-master\TwitterScraper-master\snscrape\python-wrapper\Folder\File.txt), './profile.json')) as profile_json:
    profile = personality_insights.profile(
        profile_json.read(),
        content_type='text/plain',
        consumption_preferences=True,
        raw_scores=True
    ).get_result()
print(json.dumps(profile, indent=2))
I keep getting the following nondescript syntax error:
File "<ipython-input-1-1c7761f3f3ea>", line 11
with open(join(C:\Users\AWaywardShepherd\Documents\Data Science Projects\TwitterScraper-master\TwitterScraper-master\snscrape\python-wrapper\Folder\File.txt), './profile.json')) as profile_json:
^ SyntaxError: invalid syntax
There is so much wrong with that open line.
join (presumably os.path.join) expects its path components as separate string arguments, which it joins into a single path.
In Python, a string literal must be enclosed in quotes (paths are just strings!).
You are only passing one value into join, which makes it redundant.
The second parameter for open should be a mode, not a file name.
It looks like you are trying to join a directory with a file name, but for that to work the directory shouldn't end with a file name.
The brackets don't match: you have 2 opening brackets and 3 closing brackets.
In Python you use os.path.join to join path strings together. Normally this would be a directory and a filename, for example taking the current working directory and joining it with a filename.
import os
file = os.path.join(os.getcwd(), 'profile.json')
In your code you are only passing in one string, so there is no need to use join.
With open, you pass in the filename and the mode. The mode would be something like 'r', indicating read mode. So the code with the join becomes:
import os
with open(os.path.join(os.getcwd(), 'profile.json'), 'r') as profile_json:
    profile_text = profile_json.read()  # pass this text on to personality_insights.profile(...)
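Applied to a full Windows path like the one in the question, the same idea can be sketched as follows. The helper name read_text and the folder and file names in the usage comment are placeholders, not the asker's real paths.

```python
import os


def read_text(folder, filename):
    """Join a folder and a filename into one path and return the file's text."""
    path = os.path.join(folder, filename)  # both arguments are quoted strings
    with open(path, 'r') as f:             # the second argument to open is the mode
        return f.read()


# Usage on Windows: a raw string keeps the backslashes literal.
# text = read_text(r'C:\Users\you\Documents\Folder', 'File.txt')
```

The returned text is what would then be passed to personality_insights.profile in place of profile_json.read().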

How to copy a math output to clipboard in Python?

I'm super new to Python so I'm wondering if someone can help me out or link me to an appropriate post that explains this?
What I would like to do is
9999**9999
in Python Terminal, then copy the output directly to my clipboard or sent to a file.
I tried in Batch using
py 9999**9999 >>pythonoutput.txt
but only got an error of
python.exe: can't open file '9999**9999': [Errno 22] Invalid argument
and not sure how I could make that work either.
Any ideas? Cheers
Here's how to write (append) to a file:
obj = open("yourfile.txt", "a+")  # open your file in append mode (use 'w' to write, 'r' to read)
obj.write("your chars, numbers or whatever here")  # call this as many times as you want before closing
obj.close()  # close the file once you're done
Try using:
python -c "print(9999**9999)" > outfile.txt
You might want to use py instead of python there, since py is the command you ran on your system.
Sending the result to a file is much easier than sending it to the clipboard.
In the Python terminal, you can do this:
with open("/home/my/output", "w") as file:  # open a file object for writing
    file.write(str(9999**9999))  # write the content
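Two things the answers above leave open are the clipboard part of the question and the size of the number. The sketch below covers both under stated assumptions: recent Python versions (3.11 and later) refuse by default to convert integers longer than 4300 digits to text, so the limit is lifted first, and the clipboard helper assumes Windows with the clip command on the PATH. All three function names are made up for this example.

```python
import subprocess
import sys


def power_as_text(base, exponent):
    """Return base**exponent as a decimal string, even when it has thousands of digits."""
    # Newer Pythons cap int-to-str conversion at 4300 digits by default;
    # passing 0 lifts that cap. Older versions do not have this function.
    if hasattr(sys, "set_int_max_str_digits"):
        sys.set_int_max_str_digits(0)
    return str(base ** exponent)


def save_to_file(text, path):
    """Write text to path, replacing any existing content."""
    with open(path, "w") as f:
        f.write(text)


def copy_to_clipboard_windows(text):
    """Pipe text into the Windows clip command (assumes clip.exe is on the PATH)."""
    subprocess.run(["clip"], input=text, text=True, check=True)


# Usage:
# digits = power_as_text(9999, 9999)
# save_to_file(digits, "pythonoutput.txt")
# copy_to_clipboard_windows(digits)
```

From a Windows command prompt, piping the one-liner's output into clip instead of redirecting it to a file should have the same effect.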

How to validate the syntax of a Python script? [duplicate]

This question already has answers here:
How can I check the syntax of Python script without executing it?
(9 answers)
Closed 6 years ago.
I just want the simplest possible way for my Python script to ask "is the Python code which I just generated syntactically valid Python?"
I tried:
try:
    import py_compile
    x = py_compile.compile(generatedScriptPath, doraise=True)
    pass
except py_compile.PyCompileError, e:
    print str(e)
    pass
But even with a file containing invalid Python, the exception is not thrown and afterwards x == None.
There is no need to use py_compile. Its intended use is to write a bytecode file from the given source file. In fact, it will fail if you don't have permission to write in the directory, so you could end up with some false negatives.
To just parse, and thus validate the syntax, you can use the ast module to parse the contents of the file, or directly call the compile built-in function.
import ast
def is_valid_python_file(fname):
    with open(fname) as f:
        contents = f.read()
    try:
        ast.parse(contents)
        #or compile(contents, fname, 'exec', ast.PyCF_ONLY_AST)
        return True
    except SyntaxError:
        return False
Be sure to not execute the file, since if you cannot trust its contents (and if you don't even know whether the file contains valid syntax I doubt you can actually trust the contents even if you generated them) you could end up executing malicious code.
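If the generated code is still in memory as a string, the built-in compile function can check it without touching the disk and can also report where the error is. A small sketch; the name check_syntax and the "<generated>" label are assumptions for this example.

```python
def check_syntax(source, filename="<generated>"):
    """Return None if source parses, else a short description of the syntax error."""
    try:
        # compile() parses the source and builds a code object but does not run it.
        compile(source, filename, "exec")
    except SyntaxError as e:
        return "%s, line %s: %s" % (filename, e.lineno, e.msg)
    return None


print(check_syntax("x = 1\n"))
print(check_syntax("def f(:\n    pass\n"))
```

As with ast.parse, nothing in the checked source is executed.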

How to Accept Command Line Arguments With Python Using < [duplicate]

This question already has answers here:
Python command line 'file input stream'
(3 answers)
Closed 8 years ago.
Is it possible to run a python script and feed in a file as an argument using <? For example, my script works as intended using the following command python scriptname.py input.txt and the following code stuffFile = open(sys.argv[1], 'r').
However, what I'm looking to do, if possible, is use this command line syntax: python scriptname.py < input.txt. Right now, running that command gives me only one argument, so I likely have to adjust my code in my script, but am not sure exactly how.
I have an automated system processing this command, so it needs to be exact. If that's possible with a Python script, I'd greatly appreciate some help!
< file is handled by the shell: the file doesn't get passed as an argument. Instead it becomes the standard input of your program, i.e., sys.stdin.
When you use the < operator in a shell, the shell opens the file and connects its contents to your script's stdin.
However, there is a Python module that can do both. It's called fileinput.
https://docs.python.org/2/library/fileinput.html
It was shown in this post
How do you read from stdin in Python?
You can use the sys module's stdin attribute as a file like object.
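A minimal sketch that supports both invocations, python scriptname.py input.txt and python scriptname.py < input.txt. The helper name open_input and its stdin parameter are assumptions for this example; the standard library's fileinput.input() gives similar behaviour, reading the files named in sys.argv[1:] and falling back to standard input when there are none.

```python
import sys


def open_input(argv, stdin=None):
    """Return the file named in argv[1] if present, otherwise standard input."""
    if len(argv) > 1:
        return open(argv[1], 'r')
    return sys.stdin if stdin is None else stdin


# Usage inside scriptname.py:
# stuffFile = open_input(sys.argv)
# for line in stuffFile:
#     sys.stdout.write(line)
```

With the < form, sys.argv holds only the script name, so the function falls through to sys.stdin.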
