I am trying to read a .json file in Python.
Here is my Python code:
import pandas as pd
df_idf = pd.read_json('/home/lazzydevs/Data/datajs.json',lines = True)
print("Schema:\n\n",df_idf.dtypes)
print("Number of questions,columns=",df_idf.shape)
I checked my JSON file, and it is valid.
Here is my .json file:
[{
"id": "4821394",
"title": "Serializing a private struct - Can it be done?",
"body": "\u003cp\u003eI have a public class that contains a private struct. The struct contains properties (mostly string) that I want to serialize. When I attempt to serialize the struct and stream it to disk, using XmlSerializer, I get an error saying only public types can be serialized. I don't need, and don't want, this struct to be public. Is there a way I can serialize it and keep it private?\u003c/p\u003e",
"answer_count": "1",
"comment_count": "0",
"creation_date": "2011-01-27 20:19:13.563 UTC",
"last_activity_date": "2011-01-27 20:21:37.59 UTC",
"last_editor_display_name": "",
"owner_display_name": "",
"owner_user_id": "163534",
"post_type_id": "1",
"score": "0",
"tags": "c#|serialization|xml-serialization",
"view_count": "296"
},{
"id": "3367882",
"title": "How do I prevent floated-right content from overlapping main content?",
"body": "\u003cp\u003eI have the following HTML:\u003c/p\u003e\n\n\u003cpre\u003e\u003ccode\u003e\u0026lt;td class='a'\u0026gt;\n \u0026lt;img src='/images/some_icon.png' alt='Some Icon' /\u0026gt;\n \u0026lt;span\u0026gt;Some content that's waaaaaaaaay too long to fit in the allotted space, but which can get cut off.\u0026lt;/span\u0026gt;\n\u0026lt;/td\u0026gt;\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003cp\u003eIt should display as follows:\u003c/p\u003e\n\n\u003cpre\u003e\u003ccode\u003e[Some content that's wa [ICON]]\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003cp\u003eI have the following CSS:\u003c/p\u003e\n\n\u003cpre\u003e\u003ccode\u003etd.a span {\n overflow: hidden;\n white-space: nowrap;\n z-index: 1;\n}\n\ntd.a img {\n display: block;\n float: right;\n z-index: 2;\n}\n\u003c/code\u003e\u003c/pre\u003e\n\n\u003cp\u003eWhen I resize the browser to cut off the text, it cuts off at the edge of the \u003ccode\u003e\u0026lt;td\u0026gt;\u003c/code\u003e rather than before the \u003ccode\u003e\u0026lt;img\u0026gt;\u003c/code\u003e, which leaves the \u003ccode\u003e\u0026lt;img\u0026gt;\u003c/code\u003e overlapping the \u003ccode\u003e\u0026lt;span\u0026gt;\u003c/code\u003e content. I've tried various \u003ccode\u003epadding\u003c/code\u003e and \u003ccode\u003emargin\u003c/code\u003es, but nothing seemed to work. Is this possible?\u003c/p\u003e\n\n\u003cp\u003eNB: It's \u003cem\u003every\u003c/em\u003e difficult to add a \u003ccode\u003e\u0026lt;td\u0026gt;\u003c/code\u003e that just contains the \u003ccode\u003e\u0026lt;img\u0026gt;\u003c/code\u003e here. If it were easy, I'd just do that :)\u003c/p\u003e",
"accepted_answer_id": "3367943",
"answer_count": "2",
"comment_count": "2",
"creation_date": "2010-07-30 00:01:50.9 UTC",
"favorite_count": "0",
"last_activity_date": "2012-05-10 14:16:05.143 UTC",
"last_edit_date": "2012-05-10 14:16:05.143 UTC",
"last_editor_display_name": "",
"last_editor_user_id": "44390",
"owner_display_name": "",
"owner_user_id": "1190",
"post_type_id": "1",
"score": "2",
"tags": "css|overflow|css-float|crop",
"view_count": "4121"
}]
Now I am trying to read the JSON file in Python, but every time it shows this error:
Traceback (most recent call last):
File "/home/lazzydevs/Desktop/tfstack.py", line 4, in <module>
df_idf = pd.read_json('/home/lazzydevs/Data/datajs.json',lines = True)
File "/home/lazzydevs/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 592, in read_json
result = json_reader.read()
File "/home/lazzydevs/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 715, in read
obj = self._get_object_parser(self._combine_lines(data.split("\n")))
File "/home/lazzydevs/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 739, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/home/lazzydevs/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 849, in parse
self._parse_no_numpy()
File "/home/lazzydevs/.local/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1093, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None
ValueError: Expected object or value
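I have checked many posts, but nothing works. I don't know what the problem is.
The following code works on my machine. Your file is a single JSON array spread over several lines, not line-delimited JSON (one object per line), so lines=True should be dropped: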
import pandas as pd
df_idf = pd.read_json('/home/lazzydevs/Data/datajs.json')
print("Schema:\n\n",df_idf.dtypes)
print("Number of questions,columns=",df_idf.shape)
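If you are not sure which layout a file has, a quick check with the standard json module can help before choosing the lines argument. This is only a sketch: detect_json_layout is a hypothetical helper name (not part of pandas or json), and the two sample strings stand in for real file contents.

```python
import json

def detect_json_layout(text):
    """Hypothetical helper: classify text as one JSON document,
    line-delimited JSON, or neither."""
    try:
        json.loads(text)          # the whole text parses as a single value
        return "document"
    except json.JSONDecodeError:
        pass
    lines = [ln for ln in text.splitlines() if ln.strip()]
    try:
        for ln in lines:          # every non-empty line must parse on its own
            json.loads(ln)
    except json.JSONDecodeError:
        return "invalid"
    return "lines" if lines else "invalid"

array_text = '[{"id": "1"}, {"id": "2"}]'     # like the file in the question
jsonl_text = '{"id": "1"}\n{"id": "2"}'       # one object per line

print(detect_json_layout(array_text))
print(detect_json_layout(jsonl_text))
```

A result of "document" suggests calling pd.read_json(path) without lines=True, while "lines" suggests keeping lines=True.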
I am looking to format a text file from an API request output. So far my code looks like this:
import requests
url = 'http://URLhere.com'
headers = {'tokenname': 'tokenhash'}
response = requests.get(url, headers=headers)
with open('newfile.txt', 'w') as outf:
    outf.write(response.text)
This creates a text file, but the output is all on one line.
What I am trying to do is:
Start a new line every time the code reaches a certain word such as "id", "status", or "closed_at", but unfortunately I have not been able to figure this out.
I am also trying to count how many "id" entries there are in the file, but I think the script cannot handle it because of the formatting.
The output is as follows:
{
[
{
"id": 12345,
"status": "open or close",
"closed_at": null,
"created_at": "yyyy-mm-ddTHH:MM:SSZ",
"due_date": "yyyy-mm-dd",
"notes": null,
"port": [pnumber
],
"priority": 1,
"identifiers": [
"12345"
],
"last_seen_time": "yyyy-mm-ddThh:mm:ss.sssZ",
"scanner_score": 1.0,
"fix_id": 12345,
"scanner_vulnerabilities": [
{
"port": null,
"external_unique_id": "12345",
"open": false
}
],
"asset_id": 12345
This continues on one line with the same names but for different assets.
This code:
with open('text.txt') as text_file:
    data = text_file.read()
print('\n'.join(data.split(',')))
Gives this output:
"{[{"id":12345
"status":"open or close"
"closed_at":null
"created_at":"yyyy-mm-ddTHH:MM:SSZ"
"due_date":"yyyy-mm-dd"
"notes":null
"port":[pnumber]
"priority":1
"identifiers":["12345"]
"last_seen_time":"yyyy-mm-ddThh:mm:ss.msmsmsZ"
"scanner_score":1.0
"fix_id":12345
"scanner_vulnerabilities":[{"port":null
"external_unique_id":"12345"
"open":false}]
"asset_id":12345"
And then to write it to a new file:
output = data.split(',')
with open('new.txt', 'w') as write_file:
    for line in output:
        write_file.write(line + '\n')
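Splitting on commas also breaks up values that contain commas themselves. If the API response is valid JSON, an alternative is to parse it and let the json module do the formatting and counting. The sketch below assumes a response shaped like a list of records; sample_text and count_key are made up for illustration and are not from the original post.

```python
import json

def count_key(node, key):
    """Count how often `key` appears as a dictionary key, at any depth."""
    if isinstance(node, dict):
        return sum((k == key) + count_key(v, key) for k, v in node.items())
    if isinstance(node, list):
        return sum(count_key(item, key) for item in node)
    return 0

# Stand-in for response.text (assumed to be valid JSON).
sample_text = (
    '[{"id": 12345, "status": "open", '
    '"scanner_vulnerabilities": [{"port": null, "open": false}], '
    '"asset_id": 12345}, '
    '{"id": 67890, "status": "closed"}]'
)

records = json.loads(sample_text)
pretty = json.dumps(records, indent=4)   # one key per line, nested by indent
id_count = count_key(records, "id")      # exact match, so "asset_id" is ignored

print(pretty)
print("number of ids:", id_count)
```

With requests, response.json() can replace json.loads(response.text), and pretty can be written to the file with outf.write(pretty).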
I am stuck here again... I have a file named "data.json" and I want to open it with python but I am getting errors.
>>> import json
>>> data=json.load(open("data.json"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Angel\AppData\Local\Programs\Python\Python38-32\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
File "C:\Users\Angel\AppData\Local\Programs\Python\Python38-32\lib\json\__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "C:\Users\Angel\AppData\Local\Programs\Python\Python38-32\lib\json\decoder.py", line 340,
in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 4912995)
>>>
According to the Python JSON documentation:
If the data being deserialized is not a valid JSON document, a JSONDecodeError will be raised.
Without knowing the content of your file, it is hard to say what is wrong, but I suspect that the text in your file is not a valid JSON document or, more likely (judging by other answers about the "Extra data" error), that "data.json" contains more than one JSON object.
For example, using your code:
This file works correctly
{ "name":"John", "age":30, "car":null }
but this one
{ "name":"John", "age":30, "car":null }
{ "name":"John", "age":30, "car":null }
raises the same error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\a\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py",
line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "C:\Users\a\AppData\Local\Programs\Python\Python37-32\lib\json\__init__.py",
line 348, in loads
return _default_decoder.decode(s)
File "C:\Users\a\AppData\Local\Programs\Python\Python37-32\lib\json\decoder.py",
line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 6 column 1 (char 55)
If the file holds two or more records, you have to either reformat it as shown below or load it record by record.
To reformat it, wrap the records in an array, like this:
{
"foo" : [
{"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
{"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
]
}
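The other option mentioned above, loading the file record by record, could look roughly like this when each record sits on its own line. load_json_lines is a hypothetical helper name, and the sample list stands in for an open file.

```python
import json

def load_json_lines(lines):
    """Parse an iterable of text lines, one JSON value per non-empty line."""
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line of the input is malformed.
            raise ValueError(f"line {number}: {err}") from err
    return records

# Stand-in for an open file; a file object yields lines the same way.
sample = [
    '{"name": "John", "age": 30, "car": null}\n',
    '\n',
    '{"name": "Jane", "age": 25, "car": null}\n',
]
records = load_json_lines(sample)
print(len(records), records[0]["name"])
```

For a real file, pass the file object directly: with open("data.json") as fh: records = load_json_lines(fh).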
I am running the following code:
import json
addrsfile = open("C:\\Users\\file.json", "r")
addrJson = json.loads(addrsfile.read())
addrsfile.close()
if addrJson:
    print("yes")
But it gives me the following error:
Traceback (most recent call last):
File "C:/Users/Mayur/Documents/WebPython/Python_WebServices/test.py", line 9, in <module>
addrJson = json.loads(addrsfile.read())
File "C:\Users\Mayur\Anaconda3\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Users\Mayur\Anaconda3\lib\json\decoder.py", line 342, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 190)
Can anyone help me, please?
The JSON file looks like this:
{"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null}
{"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
You have two records in your JSON file, and json.loads() cannot decode more than one top-level value. You need to parse the file record by record.
See "Python json.loads shows ValueError: Extra data".
Or you need to reformat your JSON so that the records sit inside an array:
{
"foo" : [
{"name": "XYZ", "address": "54.7168,94.0215", "country_of_residence": "PQR", "countries": "LMN;PQRST", "date": "28-AUG-2008", "type": null},
{"name": "OLMS", "address": null, "country_of_residence": null, "countries": "Not identified;No", "date": "23-FEB-2017", "type": null}
]
}
A file with a single top-level object like this is acceptable, but there cannot be several top-level objects.
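If the file cannot be reformatted and the records are not neatly one per line, json.JSONDecoder.raw_decode can pull the top-level values out one after another. This is a sketch under that assumption; iter_json_values is a hypothetical name and the sample text is shortened from the records in the question.

```python
import json

def iter_json_values(text):
    """Yield each top-level JSON value found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value

# Two top-level objects; the second one spans two lines.
text = '{"name": "XYZ", "type": null}\n{"name": "OLMS",\n "type": null}'
values = list(iter_json_values(text))
print(len(values), values[1]["name"])
```

Unlike line-by-line parsing, this also handles records that are pretty-printed across several lines, at the cost of reading the whole file into memory first.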
I was parsing JSON from a REST API call and got this error. It turned out the API had become "fussier" (e.g., about the order of parameters) and so was returning malformed results. Check that you are getting what you expect :)
This error can also show up if there are parts of your string that json.loads() does not recognize. In this example string, the bare Python literal False is not valid JSON, so an error is raised at character 27 (char 27).
string = """[{"Item1": "One", "Item2": False}, {"Item3": "Three"}]"""
My solution would be to use str.replace() to turn these items into JSON strings before parsing:
import json
string = """[{"Item1": "One", "Item2": False}, {"Item3": "Three"}]"""
string = string.replace("False", '"False"')
dict_list = json.loads(string)
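Note that the replacement above turns the value into the string "False" rather than a boolean. If a real boolean is wanted, one possible variation is to map the Python literal names to their JSON spellings instead. This is a sketch with a loud caveat: the regex also rewrites the words True, False, and None when they appear inside string values, so it is only safe when the data is known not to contain them.

```python
import json
import re

string = """[{"Item1": "One", "Item2": False}, {"Item3": "Three"}]"""

# Python literal name -> JSON literal (assumption: these words never
# occur inside the string values of the data being fixed).
replacements = {"True": "true", "False": "false", "None": "null"}
pattern = re.compile(r"\b(True|False|None)\b")

fixed = pattern.sub(lambda m: replacements[m.group(1)], string)
dict_list = json.loads(fixed)
print(dict_list)
```

If the text is really the repr of a Python object rather than JSON, ast.literal_eval(string) from the standard library may be a better fit than patching it into JSON.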
I'm using the following python script to read and parse a json file
import json
with open('testdata.json', 'r') as raw_data:
    content = json.load(raw_data)
print(content)
that has data like:
{"grp":"1"; "total":"10"}
{"event":"run", "timestamp":"2010-01-30 10:00:40", "id": "200", "distance": "5"}
{"event":"walk", "timestamp":"2010-01-31 18:46:00", "id": "200", "disrance": "2"}
I'm getting the error:
Traceback (most recent call last):
File "readdata.py", line 4, in <module>
content = json.load(raw_data)
File "/usr/lib/python2.7/json/__init__.py", line 290, in load
**kw)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 3 column 1 (char 93 - 187)
If I have one row of data it works, but with two or more rows I get the error.
I can't see anything that is causing this problem.
The SO syntax highlighter solved your issue.
"distance': "5"}
^
Change this to double quotes
But there are many other issues. Here is a valid version of your JSON file:
[
{"grp":1, "total":10},
{"event":"run", "timestamp":"2010-01-30 10:00:40", "id": "200", "distance": "5"},
{"event":"walk", "timestamp":"2010-01-31 18:46:00", "id": "200", "disrance": "2"}
]
Note the " around each key, the , between key:value pairs, and the , between elements of the list.
You can validate your JSON using tools like jsonlint.com.
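A similar check can be done locally with the json module, which reports where parsing stopped. The sketch below assumes Python 3.5 or later, where json.JSONDecodeError exists (on Python 2.7, as in the question, a plain ValueError is raised instead). validate_json is a hypothetical helper name.

```python
import json

def validate_json(text):
    """Return 'valid' or a short message saying where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid: {err.msg} at line {err.lineno}, column {err.colno}"
    return "valid"

bad = '{"grp":"1"; "total":"10"}'    # semicolon instead of a comma
good = '[{"grp": 1, "total": 10}]'

print(validate_json(bad))
print(validate_json(good))
```

The reported line and column point at the first character the parser could not accept, which is usually at or just after the actual mistake.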
After contacting a server I get the following strings as response
{"kind": "t2", "data": {"has_mail": null, "name": "shadyabhi", "created": 1273919273.0, "created_utc": 1273919273.0, "link_karma": 1343, "comment_karma": 301, "is_gold": false, "is_mod": false, "id": "425zf", "has_mod_mail": null}}
which is stored as type 'str' in my script.
Now, when I try to decode it using json.dumps(mystring, sort_keys=True, indent=4), I get this.
"{\"kind\": \"t2\", \"data\": {\"has_mail\": null, \"name\": \"shadyabhi\", \"created\": 1273919273.0, \"created_utc\": 1273919273.0, \"link_karma\": 1343, \"comment_karma\": 301, \"is_gold\": false, \"is_mod\": false, \"id\": \"425zf\", \"has_mod_mail\": null}}"
which should really be like this
shadyabhi#archlinux ~ $ echo '{"kind": "t2", "data": {"has_mail": "null", "name": "shadyabhi", "created": 1273919273.0, "created_utc": 1273919273.0, "link_karma": 1343, "comment_karma": 299, "is_gold": "false", "is_mod": "false", "id": "425zf", "has_mod_mail": "null"}}' | python2 -mjson.tool
{
"data": {
"comment_karma": 299,
"created": 1273919273.0,
"created_utc": 1273919273.0,
"has_mail": "null",
"has_mod_mail": "null",
"id": "425zf",
"is_gold": "false",
"is_mod": "false",
"link_karma": 1343,
"name": "shadyabhi"
},
"kind": "t2"
}
shadyabhi#archlinux ~ $
So, what is it that's going wrong?
You need to load it before you can dump it. Try this:
import json
data = json.loads(returnFromWebService)
print(json.dumps(data, sort_keys=True, indent=4))
To add a bit more detail: you're receiving a string, and then asking the json library to dump it to a string. That doesn't make a great deal of sense. What you need to do first is put the data into a more meaningful container. By calling loads you take the string value of the return and parse it into an actual Python dictionary. Then you can pass that data to dumps, which outputs a string using your requested formatting.
You have things backwards. If you want to convert a string to a data structure you need to use json.loads(thestring). json.dumps() is for converting a data structure to a json encoded string.
You are supposed to dump an object (like a dictionary), which then becomes a string, not the other way round.
Use json.loads() instead.
You want json.loads. The dumps method is for going the other way (dumping an object to a json string).
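To see the difference side by side, here is a small sketch. mystring is a shortened, made-up version of the response from the question.

```python
import json

# Stand-in for the server response, held as a str.
mystring = '{"kind": "t2", "data": {"name": "shadyabhi", "is_gold": false, "has_mail": null}}'

# dumps on a str encodes the string itself: it adds surrounding quotes
# and escapes the inner ones, which is the output seen in the question.
double_encoded = json.dumps(mystring)

# loads first, then dumps: parse into Python objects, then re-serialize
# with the requested formatting.
data = json.loads(mystring)
pretty = json.dumps(data, sort_keys=True, indent=4)

print(double_encoded)
print(pretty)
```

After loads, JSON false and null become Python False and None, so values can also be inspected directly, for example data["data"]["name"].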