I suspect I am facing an issue because my JSON data is wrapped in square brackets ([]).
I am processing Pub/Sub data which is in the format below.
[{"ltime":"2022-04-12T11:33:00.970Z","cnt":199,"fname":"MYFILENAME","data":[{"NAME":"N1","ID":11.4,"DATE":"2005-10-14 00:00:00"},{"NAME":"M1","ID":25.0,"DATE":"2005-10-14 00:00:00"}]
I am successfully processing/extracting all the fields except 'data'. I need to create a JSON file from 'data' in Cloud Storage.
When I use msg['data'] I get the following:
[{"NAME":"N1","ID":11.4,"DATE":"2005-10-14 00:00:00"},{"NAME":"M1","ID":25.0,"DATE":"2005-10-14 00:00:00"}]
Sample code:
pubsub_msg = base64.b64decode(event['data']).decode('utf-8')
pubsub_data = json.loads(pubsub_msg)
json_file = pubsub_data['fname'] + '.json'
json_content = pubsub_data['data']
try:
    << call function >>
except Exception as e:
    << Error >>
Below is the error I am getting:
2022-04-12T17:35:01.761Zmyapp-pipelineeopcm2kjno5h Error in uploading json document to gcs: [{"NAME":"N1","ID":11.4,"DATE":"2005-10-14 00:00:00"},{"NAME":"M1","ID":25.0,"DATE":"2005-10-14 00:00:00"}] could not be converted to bytes
I am not sure whether the issue is because of the square brackets [].
Correct me if I am wrong, and please help me get the exact JSON data for creating the file.
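The brackets themselves are likely fine; the error suggests the upload step received a Python list where it expected a str or bytes, since pubsub_data['data'] is a list after json.loads. A minimal sketch of a likely fix, serializing 'data' back to a JSON string before the upload (the google-cloud-storage calls and the bucket name here are assumptions, not taken from the original code):

import json
from google.cloud import storage

# 'data' is a Python list after json.loads, so serialize it back to a JSON string.
json_content = json.dumps(pubsub_data['data'])

client = storage.Client()
bucket = client.bucket('my-bucket')  # hypothetical bucket name
blob = bucket.blob(json_file)
blob.upload_from_string(json_content, content_type='application/json')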
I have a JSON file hosted locally in my Django directory. It is read into a view in views.py like so:
def Stops(request):
    json_data = open(finders.find('JSON/myjson.json'))
    data1 = json.load(json_data)  # deserialises it
    data2 = json.dumps(data1)     # json formatted string
    json_data.close()
    return JsonResponse(data2, safe=False)
Using JsonResponse without (safe=False) returns the following error:
TypeError: In order to allow non-dict objects to be serialized set the safe parameter to False.
Similarly, using json.loads(json_data.read()) instead of json.load gives this error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is confusing to me - I have validated the JSON using an online validator. When the JSON is sent to the frontend with safe=False, the resulting object that arrives is a string, even after calling .json() on it in JavaScript like so:
fetch("/json").then(response => {
    return response.json();
}).then(data => {
    console.log("data ", data);  // <---- This logs a string to console
    ...
However, going another step and calling JSON.parse() on the string converts it to a JSON object that I can use as intended:
data = JSON.parse(data);
console.log("jsonData", data);  // <---- This logs a JSON object to console
But this solution doesn't strike me as a complete one.
At this point I believe the most likely explanation is that something is wrong with the source JSON (in the file's character encoding?). Either that, or json.dumps() is not doing what I think it should, or I am misunderstanding Django's JsonResponse function.
I've reached the limit of my knowledge on this subject. If you have any wisdom to impart, I would really appreciate it.
EDIT: As in the answer below by Abdul, I was re-encoding the JSON into a string with the json.dumps(data1) line.
Working code looks like:
def Stops(request):
    json_data = open(finders.find('JSON/myjson.json'))
    data = json.load(json_data)  # deserialises it
    json_data.close()
    return JsonResponse(data, safe=False)  # pass the python object here
Let's see the following lines of your code:
json_data = open(finders.find('JSON/myjson.json'))
data1 = json.load(json_data) # deserialises it
data2 = json.dumps(data1) # json formatted string
You open a file and get a file pointer in json_data, parse its content and get a Python object in data1, and then turn it back into a JSON string and store it in data2. Somewhat redundant, right? Next you pass this JSON string to JsonResponse, which will serialize it to JSON again, meaning you end up with a JSON string inside a JSON string.
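A minimal illustration of that double encoding, outside Django (the second json.dumps call here stands in for what JsonResponse does internally):

import json

data1 = {"stops": ["A", "B"]}
data2 = json.dumps(data1)  # '{"stops": ["A", "B"]}' -- already a str
body = json.dumps(data2)   # roughly what JsonResponse then does to that str
print(body)                # "{\"stops\": [\"A\", \"B\"]}" -- a string inside a string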
Try the following code instead:
def Stops(request):
    json_data = open(finders.find('JSON/myjson.json'))
    data = json.load(json_data)  # deserialises it
    json_data.close()
    return JsonResponse(data, safe=False)  # pass the python object here
Note: function names in Python should ideally be in snake_case, not PascalCase; hence instead of Stops you should use stops. See PEP 8 -- Style Guide for Python.
I am trying to decrypt my data using Google Protocol Buffers in Python.
simple.proto file:
syntax = "proto3";

message SimpleMessage {
    string deviceID = 1;
    string timeStamp = 2;
    string data = 3;
}
After that, I generated the Python file using the protoc command:
protoc --proto_path=./ --python_out=./ simple.proto
My Python code is below:
import json
import simple_pb2
import base64
encryptedData = 'iOjEuMCwic2VxIjoxODEsInRtcyI6IjIwMjEtMDEtMjJUMTQ6MDY6MzJaIiwiZGlkIjoiUlFI'
t2 = bytes(encryptedData, encoding='utf8')
print(encryptedData)
data = base64.b64decode(encryptedData)
test = simple_pb2.SimpleMessage()
v1 = test.ParseFromString(data)
While executing the above code I get this error: google.protobuf.message.DecodeError: Wrong wire type in tag.
What am I doing wrong? Can anyone help?
Your data is not "encrypted", it's just base64-encoded. If you use your example code and inspect your data variable, then you get:
import base64
data = base64.b64decode(b'eyJ2ZXIiOjEuMCwic2VxIjoxODEsInRtcyI6IjIwMjEtMDEtMjJUMTQ6MDY6MzJaIiwiZGlkIjoiUlFIVlRKRjAwMDExNzY2IiwiZG9wIjoxLjEwMDAwMDAyMzg0MTg1NzksImVyciI6MCwiZXZ0IjoiVE5UIiwiaWdzIjpmYWxzZSwibGF0IjoyMi45OTI0OTc5OSwibG5nIjo3Mi41Mzg3NDgyOTk5OTk5OTUsInNwZCI6MC4wfQo=')
print(data)
> b'{"ver":1.0,"seq":181,"tms":"2021-01-22T14:06:32Z","did":"RQHVTJF00011766","dop":1.1000000238418579,"err":0,"evt":"TNT","igs":false,"lat":22.99249799,"lng":72.538748299999995,"spd":0.0}\n'
Which is evidently a piece of JSON data, not a binary-serialized protocol buffer, which is what ParseFromString expects. Also, looking at the names and types of the fields, it looks like this payload just doesn't match the proto definition you've shown.
There are certainly ways to parse JSON into a proto, and even to control the field names in that transformation, but not even the number of fields matches directly. So you first need to define what you want: what proto message would you expect this JSON object to represent?
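For reference, a minimal sketch of the JSON-to-proto route via the json_format module; it assumes a proto whose field names actually match the JSON keys (the values below are placeholders, and the payload above would not fit SimpleMessage as defined):

from google.protobuf import json_format

import simple_pb2

msg = simple_pb2.SimpleMessage()
# Parse only succeeds when the JSON keys match the proto field names.
json_format.Parse(
    '{"deviceID": "dev-1", "timeStamp": "2021-01-22T14:06:32Z", "data": "payload"}',
    msg)
print(msg.deviceID)  # dev-1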
I am just getting started with the Python BigQuery API (https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/bigquery) after briefly trying out pandas-gbq (https://github.com/pydata/pandas-gbq) and realizing that it does not support the RECORD type, i.e. no nested fields.
Now I am trying to upload nested data to BigQuery. I managed to create the table with the respective schema; however, I am struggling with the upload of the JSON data.
from google.cloud import bigquery
from google.cloud.bigquery import Dataset
from google.cloud.bigquery import LoadJobConfig
from google.cloud.bigquery import SchemaField
SCHEMA = [
    SchemaField('full_name', 'STRING', mode='required'),
    SchemaField('age', 'INTEGER', mode='required'),
    SchemaField('address', 'RECORD', mode='REPEATED', fields=(
        SchemaField('test', 'STRING', mode='NULLABLE'),
        SchemaField('second', 'STRING', mode='NULLABLE')
    ))
]
table_ref = client.dataset('TestApartments').table('Test2')
table = bigquery.Table(table_ref, schema=SCHEMA)
table = client.create_table(table)
When trying to upload a very simple JSON file to BigQuery I get a rather ambiguous error:
400 Error while reading data, error message: JSON table encountered
too many errors, giving up. Rows: 1; errors: 1. Please look into the
error stream for more details.
Besides the fact that it makes me kind of sad that it's giving up on me :), that error description obviously does not really help...
Please find below how I try to upload the JSON and the sampledata.
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

with open('testjson.json', 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',  # Must match the destination dataset location.
        job_config=job_config)  # API request

job.result()  # Waits for table load to complete.
print('Loaded {} rows into {}:{}.'.format(
    job.output_rows, dataset_id, table_id))
This is my JSON object:
"[{'full_name':'test','age':2,'address':[{'test':'hi','second':'hi2'}]}]"
A JSON example would be wonderful, as this seems to be the only way to upload nested data, if I am not mistaken.
I have been reproducing your scenario using your same code and the JSON content you shared, and I suspect the issue is only that you are defining the JSON content between quotation marks (" or '), while it should not be quoted at all.
The correct format is the one that #ElliottBrossard already shared with you in his answer:
{'full_name':'test', 'age':2, 'address': [{'test':'hi', 'second':'hi2'}]}
If I run your code using that content in the testjson.json file, I get the response Loaded 1 rows into MY_DATASET:MY_TABLE and the content gets loaded into the table. If, instead, I use the format below (which you are using, according to your question and the comments in the other answer), I get the result google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
"{'full_name':'test','age':2,'address':[{'test':'hi','second':'hi2'}]}"
Additionally, you can go to the Jobs page in the BigQuery UI (following the link https://bigquery.cloud.google.com/jobs/YOUR_PROJECT_ID), where you will find more information about the failed load job. As an example, when I run your code with the wrong JSON format, the error message there is more relevant:
error message: JSON parsing error in row starting at position 0: Value encountered without start of object
It indicates that it could not find any valid start of a JSON object (i.e. an opening brace { at the very beginning of the object).
TL;DR: remove the quotation marks around your JSON object and the load job should work.
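As a side note, a minimal sketch (not from the original question) of producing that newline-delimited file from Python; each row is one JSON object on its own line, with no enclosing quotes or list:

import json

rows = [{"full_name": "test", "age": 2,
         "address": [{"test": "hi", "second": "hi2"}]}]

with open("testjson.json", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")  # one object per line; no surrounding "" or []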
I think your JSON content should be:
{'full_name':'test','age':2,'address':[{'test':'hi','second':'hi2'}]}
(No brackets.) As an example using the command-line client:
$ echo "{'full_name':'test','age':2,'address':[{'test':'hi','second':'hi2'}]}" \
> example.json
$ bq query --use_legacy_sql=false \
"CREATE TABLE tmp_elliottb.JsonExample (full_name STRING NOT NULL, age INT64 NOT NULL, address ARRAY<STRUCT<test STRING, second STRING>>);"
$ bq load --source_format=NEWLINE_DELIMITED_JSON \
tmp_elliottb.JsonExample example.json
$ bq head tmp_elliottb.JsonExample
+-----------+-----+--------------------------------+
| full_name | age | address |
+-----------+-----+--------------------------------+
| test | 2 | [{"test":"hi","second":"hi2"}] |
+-----------+-----+--------------------------------+
In Python I'm getting an error:
Exception: (<type 'exceptions.AttributeError'>,
AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)
Given this Python code:
def getEntries(self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub

    request = urllib2.Request(url + '.json', None,
                              {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen(request)
    jsonStr = response.read()

    return json.load(jsonStr)['data']['children']
What does this error mean and what did I do to cause it?
The problem is that to json.load you should pass a file-like object with a read function defined. So either use json.load(response) or json.loads(response.read()).
OK, this is an old thread, but: I had the same issue, and my problem was that I used json.load instead of json.loads. With loads, json has no problem loading any kind of dictionary from a string.
Official documentation
json.load - Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.
json.loads - Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.
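A minimal side-by-side sketch of the two (the example.json file and its contents are assumed here):

import json

with open("example.json") as f:
    from_file = json.load(f)          # load: takes a file-like object with .read()

from_string = json.loads('{"a": 1}')  # loads: takes a str/bytes containing JSON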
You need to open the file first. This doesn't work:
json_file = json.load('test.json')
But this works:
f = open('test.json')
json_file = json.load(f)
If you get a Python error like this:
AttributeError: 'str' object has no attribute 'some_method'
You probably poisoned your object accidentally by overwriting it with a string.
How to reproduce this error in python with a few lines of code:
#!/usr/bin/env python
import json

def foobar(json):
    msg = json.loads(json)

foobar('{"batman": "yes"}')
Run it, which prints:
AttributeError: 'str' object has no attribute 'loads'
But change the name of the variable, and it works fine:
#!/usr/bin/env python
import json

def foobar(jsonstring):
    msg = json.loads(jsonstring)

foobar('{"batman": "yes"}')
This error is raised when you try to invoke a method on a string that doesn't define it. Strings have a few methods, but not the one you are invoking. So stop trying to invoke a method which str does not define, and start looking for where you poisoned your object.
AttributeError("'str' object has no attribute 'read'",)
This means exactly what it says: something tried to find a .read attribute on the object that you gave it, and you gave it an object of type str (i.e., you gave it a string).
The error occurred here:
json.load(jsonStr)['data']['children']
Well, you aren't looking for read anywhere, so it must happen in the json.load function that you called (as indicated by the full traceback). That is because json.load is trying to .read the thing that you gave it, but you gave it jsonStr, which currently names a string (which you created by calling .read on the response).
Solution: don't call .read yourself; the function will do this, and is expecting you to give it the response directly so that it can do so.
You could also have figured this out by reading the built-in Python documentation for the function (try help(json.load)) or for the entire module (try help(json)), or by checking the documentation for those functions on http://docs.python.org.
Instead of json.load(), use json.loads() and it will work. Example:
import json

stringJson = '{"event_type": "affected_element_added"}'
data = json.loads(stringJson)
print(data)
So, don't use json.load(data.read()); use json.loads(data.read()):
def findMailOfDev(fileName):
    with open(fileName, 'r') as file:  # closes the file automatically
        data = file.read()
    data = json.loads(data)
    return data['mail']
Use the json.loads() function (note the s at the end); it was just a mistake I only realized after searching for the error. Try this:
def getEntries(self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub

    request = urllib2.Request(url + '.json', None,
                              {'User-Agent': 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen(request)
    jsonStr = response.read()

    return json.loads(jsonStr)['data']['children']
Open the file as a text file first:
json_data = open("data.json", "r")
Now load it into a dict:
dict_data = json.load(json_data)
If you need to convert a string to JSON, use the loads() method instead of load(). The load() function is used to load data from a file, so use loads() to convert a string to a JSON object:
j_obj = json.loads('{"label": "data"}')
I am writing code that gathers some statistics about ontologies. As input I have a folder of files; some are RDF/XML, some are Turtle or N-Triples.
My problem is that when I try to parse a file using the wrong format, then even if I parse it with the correct format afterwards, it fails.
Here the test file is in Turtle format. If I parse it with the Turtle format first, all is fine. But if I first parse it with the wrong format, the first error is understandable (file:///test:1:0: not well-formed (invalid token)), but the error for the second attempt is Unknown namespace prefix : owl. As I said, when I parse with the correct format first, I don't get the namespace error.
Please help; after two days I'm getting desperate.
query = 'SELECT DISTINCT ?s ?o WHERE { ?s ?p owl:Ontology . ?s rdfs:comment ?o}'
data = open("test", "r")
g = rdflib.Graph("IOMemory")

try:
    result = g.parse(file=data, format="xml")
    relations = g.query(query)
    print(" graph has %s statements." % len(g))
except:
    print "bad1"
    e = sys.exc_info()[1]
    print e

try:
    result = g.parse(file=data, format="turtle")
    relations = g.query(query)
    print(" graph has %s statements." % len(g))
except:
    print "bad2"
    e = sys.exc_info()[1]
    print e
The problem is that g.parse reads some part of the input stream of data first, only to figure out afterwards that it is not XML. The second call (with the turtle format) then continues to read from the input stream after the point where the previous attempt stopped. The part read by the first parser is lost to the second one.
If your test file is small, the XML parser might have read it all, leaving an "empty" rest. It seems the turtle parser did not complain; it just read in nothing. Only the query in the next statement failed to find anything owl-like in it, as the graph is empty. (I have to admit I cannot reproduce this part; the turtle parser does complain in my case, but maybe I have a different version of rdflib.)
To fix it, try to reopen the file: either reorganize the code so you have a data = open("test", "r") every time you call result = g.parse(file=data, format="(some format)"), or call data.seek(0) in the except: clause, like:
for format in 'xml', 'turtle':
    try:
        print 'reading', format
        result = g.parse(data, format=format)
        print 'success'
        break
    except Exception:
        print 'failed'
        data.seek(0)
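Alternatively, a sketch of the first suggestion, reopening the file for each attempt instead of seeking (assuming the same test file as above):

for fmt in ('xml', 'turtle'):
    try:
        with open('test', 'r') as fresh:  # fresh file object for every attempt
            g.parse(file=fresh, format=fmt)
        print('parsed as %s' % fmt)
        break
    except Exception as e:
        print('failed as %s: %s' % (fmt, e))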