Python can't open JSON file, giving JSONDecodeError - python

I want to open a JSON file using Python in my project, but I constantly get the following error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is the code:
import json
with open("../data.txt") as json_file:
    data = json.load(json_file)
I have a really simple text file with JSON formatted data in it. This is the data.txt file:
{
"data": [
{
"day": "22/04/2020 15:35",
"viewcount": "1"
},
{
"day": "22/04/2020 20:51",
"viewcount": "2"
}
]
}

I tried your source code with the JSON data exactly as shown, and it runs with no problems at all.
I'd suggest checking the contents of the file in binary form, e.g. with a utility such as hexdump, to see how it begins:
$ hexdump data.txt
0000000 0a7b 2020 2020 6422 7461 2261 203a 5b20
0000010 200a 2020 2020 2020 2020 2020 2020 2020
0000020 2020 0a7b 2020 2020 2020 2020 2020 2020
...
Or use the file utility to check the encoding, as described in the following post: https://unix.stackexchange.com/questions/11602/how-can-i-test-the-encoding-of-a-text-file-is-it-valid-and-what-is-it

The Python code was not the problem.
The problem was that the file had not been saved in UTF-8 encoding.
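If re-saving the file is not an option, the encoding can also be handled on the Python side. The following is a minimal sketch, assuming the file starts with a byte-order mark (as files saved as "Unicode" or "UTF-8 with BOM" by some editors often do). The helper name load_json_detect_bom is made up for illustration; it is not part of the json module.

```python
import codecs
import json


def load_json_detect_bom(path):
    """Load JSON from a file, picking the codec from a leading byte-order mark.

    Hypothetical helper: falls back to UTF-8 when no BOM is present.
    """
    with open(path, "rb") as f:
        raw = f.read()
    # Test UTF-32 first: the UTF-32 LE mark begins with the UTF-16 LE mark.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        encoding = "utf-32"
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding = "utf-16"
    else:
        # "utf-8-sig" strips a UTF-8 BOM if there is one and is plain UTF-8 otherwise.
        encoding = "utf-8-sig"
    return json.loads(raw.decode(encoding))
```

With this, data = load_json_detect_bom("../data.txt") would replace the open/json.load pair from the question. A file in a legacy code page without a BOM would still need its encoding passed to open() explicitly.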


How to split up git log output into a list of commits in python? [duplicate]

Given git log output like this:
commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)
Author: Slim Shady
Date: Sun Sep 18 19:53:42 2022 -0700
ci: remove debugging line github action script
commit body
commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)
Author: Slim Shady
Date: Sun Sep 18 19:41:20 2022 -0700
feat: read and write IDs
commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874
Author: Slim Shady
Date: Sun Sep 18 17:41:03 2022 -0700
feat: new hook to allow custom tags
I'd like to turn that into a list in Python, with each element containing a single commit (including hash, author, body, etc.).
I've tried using re.split(r"commit \w{40}", git_log), but it doesn't keep the hash in the output.
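You could also use a positive lookahead to split your data, so that the delimiter is matched but not consumed:
import re
with open('git_log.txt', 'r') as f:
    data = f.read()
# filter(None, ...) drops the empty string produced before the first commit
res = list(filter(None, re.split(r"(?=commit \w{40})", data)))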
Output:
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
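One caveat with this pattern: it splits wherever "commit " followed by 40 word characters appears, which could in principle happen inside a commit message. A slightly stricter sketch, assuming hashes are 40 lowercase hex digits and that commit headers begin at the start of a line (the sample log is shortened from the question):

```python
import re

# Shortened sample; message lines are indented the way git log prints them.
git_log = (
    "commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343\n"
    "Author: Slim Shady\n"
    "\n"
    "    ci: remove debugging line github action script\n"
    "\n"
    "    commit body\n"
    "\n"
    "commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9\n"
    "Author: Slim Shady\n"
    "\n"
    "    feat: read and write IDs\n"
)

# ^ with re.MULTILINE only matches at the start of a line, and the lookahead
# keeps the "commit <hash>" header in the chunk that follows the split point.
pattern = re.compile(r"^(?=commit [0-9a-f]{40}\b)", flags=re.MULTILINE)

# Splitting on an empty match requires Python 3.7 or later.
commits = [chunk for chunk in pattern.split(git_log) if chunk]
```

Each element of commits then starts with its own "commit <hash>" line, and the indented "commit body" line stays inside the first commit.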
You need to put the split pattern in a capture group to allow it to be part of the output:
# filter(None, ...) to remove empty strings
>>> res = filter(None, re.split(r'(commit \w{40})', inp))
# Join items in group of two to handle the split between a commit line and rest of its body
>>> output = ["".join(item) for item in zip(*[res] * 2)]
>>> output
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
But if you do have control over the git log output, you could format it differently and parse it without regex:
git log --pretty=format:'"%H"%x09"%an"%x09"%ad"%x09"%B"' > output.csv
Then:
>>> import csv
>>> with open("output.csv") as f:
... items = list(csv.reader(f, delimiter='\t'))
...
>>> items[0]
["19e0f017ac832238f5a800dd3ea7a5966b3c1343", "Slim Shady", "Sun Sep 18 19:53:42 2022 -0700", "ci: remove debugging line github action script"]
Another option is to use a library such as GitPython (https://gitpython.readthedocs.io/en/stable/), which gives you access to commits as Python objects.

calling values from a json file like a dictionary

Warning: beginner here:
So I am reading in a text file that is in the form of a JSON file. Since the JSON file is just like a dictionary, I want to address parts of the JSON like I would a dictionary, but I don't know how to do this. This is the little bit that I have:
code:
with open("trump.txt","r") as lines:
    for line in lines:
        print(line)
What this prints:
{"created_at":"Wed Sep 27 01:19:39 +0000 2017","id":912849180741087232,"id_str":"912849180741087232","text":"RT #TheRickWilson: I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump ca\u2026","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":66914769,"id_str":"66914769","name":"Kathy","screen_name":"mydoggigi","location":"Earth","url":null,"description":"Love politics, Grandchildren & PSU #StillWithHer #NotMyPresident Blocked by Susan Sarandon, Glenn Greenwald, Joel Osteen and Joe Scarborough!!\ud83d\ude0e #TheResistance","translator_type":"none","protected":false,"verified":false,"followers_count":5878,"friends_count":5973,"listed_count":143,"favourites_count":110285,"statuses_count":138191,"created_at":"Wed Aug 19 04:55:41 +0000 2009","utc_offset":-14400,"time_zone":"Eastern Time (US & 
Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/66914769/1504225271","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Sep 27 01:08:45 +0000 2017","id":912846439964987392,"id_str":"912846439964987392","text":"I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump can't deliver. Sad!","source":"\u003ca href=\"http://twitter.com/download/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":19084896,"id_str":"19084896","name":"Rick Wilson","screen_name":"TheRickWilson","location":"Florida and points beyond","url":"http://facebook.com/therickwilson","description":"GOP Media Guy, Dad, Husband, Pilot, Hunter, Writer. I make ads and do politics. Daily Beast columnist. 
Everything Trump Touches Dies.","translator_type":"none","protected":false,"verified":true,"followers_count":238578,"friends_count":3518,"listed_count":4235,"favourites_count":48094,"statuses_count":250609,"created_at":"Fri Jan 16 20:50:17 +0000 2009","utc_offset":-14400,"time_zone":"America/New_York","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"1A1B1F","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_tile":true,"profile_link_color":"445555","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"252429","profile_text_color":"666666","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/19084896/1504722796","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":5,"reply_count":50,"retweet_count":100,"favorite_count":456,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"TheRickWilson","name":"Rick Wilson","id":19084896,"id_str":"19084896","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1506475179263"}
So how can I do something as simple as the following in my code?
dict["created_at"]="Wed Sep 27 01:19:39 +0000 2017"
Try this:
import json
with open('file.json') as file:
    data = json.load(file)
# data is now a regular Python dict, so data["created_at"] works
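json.load expects the whole file to be a single JSON document. If the file instead holds one JSON object per line, as the loop in the question suggests it might, each line can be parsed on its own with json.loads. A minimal sketch under that assumption (read_json_lines is a hypothetical helper name):

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line of a file with one JSON object per line."""
    with open(path, encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:  # skip blank lines between records
                yield json.loads(line)


# Usage, with the file name from the question:
# for tweet in read_json_lines("trump.txt"):
#     print(tweet["created_at"])            # top-level key
#     print(tweet["user"]["screen_name"])   # nested objects are dicts too
#     print(tweet.get("place"))             # JSON null becomes None
```

After json.loads, every JSON object is an ordinary dict, so indexing, .get() and iteration all work as usual.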

I need to format some content and give output to a text file [duplicate]

Here is the content that I am going to format:
{
"url":"https://www.w3schools.com/",
"originalUrl":"https://www.w3schools.com/",
"applications":[
{
"name":"EdgeCast",
"confidence":"100",
"version":"",
"icon":"EdgeCast.png",
"categories":[
"CDN"
]
},
,
{
"name":"Google Analytics",
"confidence":"100",
"version":"UA",
"icon":"Google Analytics.svg",
"categories":[
"Analytics"
]
},
{
"name":"Microsoft ASP.NET",
"confidence":"50",
"version":"",
"icon":"Microsoft ASP.NET.png",
"categories":[
"Web Frameworks"
]
},
{
"name":"IIS",
"confidence":"25",
"version":"",
"icon":"IIS.png",
"categories":[
"Web Servers"
]
},
{
"name":"Windows Server",
"confidence":"25",
"version":"",
"icon":"Microsoft.svg",
"categories":[
"Operating Systems"
]
}
]
}
When I run a Python script containing the snippet below, the content is displayed correctly in the terminal:
for criteria in d['applications']:
    for key, value in criteria.iteritems():
        print key, 'is:', value
    print ''
It gives the following output into the terminal:
confidence is: 100
version is:
name is: EdgeCast
categories is: [u'CDN']
icon is: EdgeCast.png
confidence is: 100
version is: UA
name is: Google Analytics
categories is: [u'Analytics']
icon is: Google Analytics.svg
confidence is: 50
version is:
name is: Microsoft ASP.NET
categories is: [u'Web Frameworks']
icon is: Microsoft ASP.NET.png
I need to write this output, exactly as it is, to a text file. I should be able to pass several values at once when writing to the file.
You need to do file I/O in Python. Open the file in write mode and write each formatted line to it:
with open("your-file-name", 'w') as out:
    for criteria in d['applications']:
        # iteritems() is Python 2, as in the question; on Python 3 use items()
        for key, value in criteria.iteritems():
            out.write("{} is: {}\n".format(key, value))
There are a couple of concepts here: opening the file in write mode, and using string formatting to build each line from several variables.
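On Python 3, print itself can write to a file through its file argument, which keeps the code close to the terminal version and accepts several values in one call. A sketch, assuming d has the structure shown in the question (write_applications is a hypothetical helper name):

```python
def write_applications(d, path):
    """Write each application's key/value pairs to a text file, one pair per line."""
    with open(path, "w", encoding="utf-8") as out:
        for criteria in d["applications"]:
            for key, value in criteria.items():
                # print joins its arguments with spaces, just as on the terminal
                print(key, "is:", value, file=out)
            print(file=out)  # blank line between applications
```

Calling write_applications(d, "output.txt") produces lines such as "name is: EdgeCast". Keys appear in the order they have in the dict, so the order may differ from the Python 2 output in the question.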

Extract JSON Data in Python - Example Code Included

I am brand new to using JSON data and fairly new to Python. I am struggling to parse the following JSON data in Python in order to import it into a SQL Server database. I already have a program that will import the parsed data into SQL Server using pyodbc; however, I can't for the life of me figure out how to correctly parse the JSON data into a Python dictionary.
I know there are a number of threads that address this issue, however I was unable to find any examples of the same JSON data structure. Any help would be greatly appreciated as I am completely stuck on this issue. Thank you SO! Below is a cut of the JSON data I am working with:
{
"data":
[
{
"name": "Mobile Application",
"url": "https://www.example-url.com",
"metric": "users",
"package": "example_pkg",
"country": "USA",
"data": [
[ 1396137600000, 5.76 ],
[ 1396224000000, 5.79 ],
[ 1396310400000, 6.72 ],
....
[ 1487376000000, 7.15 ]
]
}
],"as_of":"2017-01-22"}
Again, I apologize if this thread is repetitive, however as I mentioned above, I was not able to work out the logic from other threads as I am brand new to using JSON.
Thank you again for any help or advice in regard to this.
import json
with open("C:\\Pathyway\\7Park.json") as json_file:
    data = json.load(json_file)
    assert data["data"][0]["metric"] == "users"
The above code results in the following error:
Traceback (most recent call last):
File "JSONpy", line 10, in <module>
data = json.load(json_file)
File "C:\json\__init__.py", line 291, in load
**kw)
File "C:\json\__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "C:\json\decoder.py", line 367, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 7 column 1 (char 23549 - 146249)
Assuming the data you've described (less the ... ellipsis) is in a file called j.json, this code parses the JSON document into a Python object:
import json
with open("j.json") as json_file:
    data = json.load(json_file)
    assert data["data"][0]["metric"] == "users"
From your error message it seems possible that your file is not a single JSON document, but a sequence of JSON documents separated by newlines. If that is the case, then this code might be more helpful:
import json
with open("j.json") as json_file:
    for line in json_file:
        data = json.loads(line)
        print(data["data"][0]["metric"])
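If the documents are not cleanly separated one per line (the error above reports extra data spanning several lines), json.JSONDecoder.raw_decode can read documents off a string one after another. A sketch under that assumption; iter_json_documents is a hypothetical helper name, and the sample text is invented to mirror the structure in the question:

```python
import json


def iter_json_documents(text):
    """Yield each JSON document found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed object and the index where that document ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two documents, the second one spread over several lines (invented sample).
sample = (
    '{"data": [{"metric": "users", "data": [[1396137600000, 5.76]]}], "as_of": "2017-01-22"}\n'
    '{\n  "data": [\n    {"metric": "sessions", "data": [[1396224000000, 5.79]]}\n  ]\n}\n'
)

metrics = [doc["data"][0]["metric"] for doc in iter_json_documents(sample)]
```

For a real file, the text would come from reading the whole file into a string first, which is fine for files of the size in the traceback but may not suit very large inputs.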

auto-generation of python file info(author, create date etc.)

I'm using PyDev in Eclipse. When I create a new .py file, file info (author, creation date, etc.) is generated like below:
"""
Created on Fri Oct 10 13:50:18 2014
@author: XXXX
"""
How can I change the format?
Go to Window > Preferences > PyDev > Editor > Templates and edit the Empty template.
