I am trying to create a Python script that can take a JSON object and insert it into a headless Couchbase server. I have been able to successfully connect to the server and insert some data. I'd like to be able to specify the path of a JSON object and upsert that.
So far I have this:
from couchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError
import json
cb = Bucket('couchbase://XXX.XXX.XXX?password=XXXX')
print cb.server_nodes
#tempJson = json.loads(open("myData.json","r"))
try:
    result = cb.upsert('healthRec', {'record': 'bob'})
    # result = cb.upsert('healthRec', {'record': tempJson})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
print(cb.get('healthRec').value)
I know that the first commented-out line, the one that loads the JSON, is incorrect: json.loads() expects a string, not a file object. Can anyone help?
Thanks!
Figured it out:
with open('myData.json', 'r') as f:
    data = json.load(f)
try:
    result = cb.upsert('healthRec', {'record': data})
except CouchbaseError as e:
    print "Couldn't upsert", e
    raise
I am looking into using cbdocloader, but this was my first step getting this to work. Thanks!
I know that you've found a solution that works for you in this instance but I thought I'd correct the issue that you experienced in your initial code snippet.
json.loads() takes a string as input and decodes that JSON string into a Python object (a dictionary for a JSON object, or whatever custom object your object_hook returns). Your original snippet failed because you passed it a file handle instead of a string.
There is actually a method json.load() which works as expected, as you have used in your eventual answer.
You would have been able to use it as follows (if you wanted something slightly less verbose than the with statement):
tempJson = json.load(open("myData.json","r"))
As Kirk mentioned, though, if you have a large number of JSON documents to insert, it might be worth taking a look at cbdocloader, as it will handle all of this boilerplate code for you (with appropriate error handling and other functionality).
This readme covers the uses of cbdocloader and how to format your data correctly to allow it to load your documents into Couchbase Server.
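To make the json.load() / json.loads() distinction concrete, here is a minimal sketch. It uses an in-memory io.StringIO in place of myData.json so it runs without a file on disk; the sample payload is invented for illustration.

```python
import io
import json

# Hypothetical sample payload standing in for the contents of myData.json.
raw_text = '{"name": "bob", "visits": [1, 2, 3]}'

# json.loads() parses a string that is already in memory.
from_string = json.loads(raw_text)

# json.load() reads from a file-like object; StringIO stands in for open(...).
with io.StringIO(raw_text) as handle:
    from_file = json.load(handle)

# Passing a file-like object to json.loads() raises TypeError.
try:
    json.loads(io.StringIO(raw_text))
    loads_accepts_handle = True
except TypeError:
    loads_accepts_handle = False

print(from_string == from_file, loads_accepts_handle)
```

Both calls produce the same dictionary; the only difference is whether the input is a string or something with a read() method.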
Related
I am trying to get all of the data stored in this json
as a dictionary that I can load and access. I am still new to writing spiders, but I believe I need something like
response.xpath().extract()
and then json.load().split() to get an element from it.
But the exact syntax I am not sure of, since there are so many elements in this file.
You can use re_first() to extract the JSON from the JavaScript code and then parse it with loads() from the json module:
import json
d = response.xpath('//script[contains(., "windows.PAGE_MODEL")]/text()').re_first(r'(?s)windows.PAGE_MODEL = (.+?\});')
data = json.loads(d)
property_id = data['propertyData']['id']
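The same extraction can be tried outside a spider with the standard re module. This is only a sketch: script_text is a made-up stand-in for the script body (the real page is not shown in the question), and the pattern is the one from the answer with the dot escaped.

```python
import json
import re

# Hypothetical script body; the real page's contents are not shown in the question.
script_text = 'windows.PAGE_MODEL = {"propertyData": {"id": 12345, "tags": ["a", "b"]}};'

# (?s) lets "." match newlines; the lazy group stops at the first "};".
match = re.search(r'(?s)windows\.PAGE_MODEL = (.+?\});', script_text)
data = json.loads(match.group(1)) if match else None

property_id = data['propertyData']['id'] if data else None
print(property_id)
```

Because the group is lazy, it would cut the JSON short if a string value inside it happened to contain "};", so treat the pattern as a starting point rather than a general parser.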
You're right, it pretty much works like you suggested in your question.
You can check the script tags for 'windows.PAGE_MODEL' with a simple xpath query.
Please try the following code in the callback for your request:
d = response.xpath('//script[text()[contains(., "windows.PAGE_MODEL")]]/text()').get()
from json import loads
data = loads(d)
I am using the Censys API in Python to programmatically look through hosts and grab information about them. The Censys website says it returns JSON-formatted data, and it looks like JSON-formatted data, but I can't seem to figure out how to turn the API response into a JSON object. However, if I write the response to a JSON file and load it, it works fine. Any ideas?
Update: Figured out issue is with nested json that the api returns. Looking for libraries to flatten it.
Main.py
c = censys.ipv4.CensysIPv4(api_id=UID, api_secret=SECRET)
for result in c.search("autonomous_system.asn:15169 AND tags.raw:iot", max_records=1):
    hostIPS.append(result["ip"])
for host in hostIPS:
    for details in c.view(host):
        # test = json.dumps(details)
        # test = json.load(test)
        # data = json.load(details)
        data = json.loads(details)
        print(data)
You don't need to convert it to an object: the client has already run it through json.loads, so what you get back is a parsed Python object rather than a JSON string. See the implementation here: https://github.com/censys/censys-python/blob/master/censys/base.py
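Since the update mentions flattening the nested result, here is one possible sketch of a small recursive flattener in plain Python. The details record is invented for illustration and is not real Censys output; flatten() is a hypothetical helper, not part of any library.

```python
import json

def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted keys."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            items.update(flatten(child, prefix + str(key) + "."))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            items.update(flatten(child, prefix + str(index) + "."))
    else:
        # Leaf value: drop the trailing dot from the accumulated prefix.
        items[prefix[:-1]] = value
    return items

# Hypothetical record shaped like an already-parsed API response (a dict).
details = {"ip": "192.0.2.1", "location": {"country": "US"}, "ports": [80, 443]}

flat = flatten(details)
# json.dumps() goes the other way, from a dict to a JSON string.
as_text = json.dumps(details)
print(flat)
```

Note that empty dicts and lists simply disappear in this version; whether that is acceptable depends on what the flattened data is used for.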
I have a Python3 Flask app using Flask-Session (which adds server-side session support) and configured to use the filesystem type.
Under the hood, this type uses the Werkzeug class werkzeug.contrib.cache.FileSystemCache (Werkzeug cache documentation).
The raw cache files look like this if opened:
J¬».].Äï;}î(å
_permanentîàå
respondentîåuuidîåUUIDîìî)Åî}î(åintîät˙ò∑flŒºçLÃ/∆6jhåis_safeîhåSafeUUIDîìîNÖîRîubåSECTIONS_VISITEDî]îåcurrent_sectionîKåSURVEY_CONTENTî}î(å0î}î(ås_idîås0îånameîåWelcomeîådescriptionîåîå questionsî]î}î(ås_idîhåq_idîhåq_constructîhåq_textîhå
q_descriptionîhåq_typeîhårequiredîhåoptions_rowîhåoptions_row_alpha_sortîhåreplace_rowîhåoptions_colîhåoptions_col_codesîhåoptions_col_alpha_sortîhåcond_continue_rules_rowîhåq_meta_notesîhuauå1î}î(hås1îhå Screeningîhå[This section determines if you fit into the target group.îh]î(}î(hh/håq1îh hh!å9Have you worked on a product in this field before?
The items stored in the session can be seen a bit above:
- current_section should be an integer, e.g., 0
- SECTIONS_VISITED should be an array of integers, e.g., [0,1,2]
- SURVEY_CONTENT format should be an object with structure like below
{
    'item1': {
        'label': string,
        'questions': [{}]
    },
    'item2': {
        'label': string,
        'questions': [{}]
    }
}
Some of these values are visible in the excerpt above. For example, the text "This section determines if you fit into the target group." is the value of one label. The names after questions are the keys found in each question object (e.g., q_text) together with their values; "Have you worked on a product in this field before?" is the value of q_text.
I need to retrieve data from the stored cache files in a way that I can read them without all the extra characters like å.
I tried using Werkzeug like this, where the item 9c3c48a94198f61aa02a744b16666317 is the name of the cache file I want to read. However, it was not found in the cache directory.
from werkzeug.contrib.cache import FileSystemCache
cache_dir="flask_session"
mode=0o600
threshold=20000
cache = FileSystemCache(cache_dir, threshold=threshold, mode=mode)
item = "9c3c48a94198f61aa02a744b16666317"
print(cache.has(item))
data = cache.get(item)
print(data)
What ways are there to read the cache files?
I opened a GitHub issue in Flask-Session, but that's not really been actively maintained in years.
For context, there was a brief period when my web app could not write to the database, but the data I need was also being saved in the session. So right now the only way to retrieve that data is to get it from these files.
EDIT:
Thanks to Tim's answer I solved it using the following:
import pickle
obj = []
with open(file_name, "rb") as fileOpener:
    while True:
        try:
            obj.append(pickle.load(fileOpener))
        except EOFError:
            break
print(obj)
I needed to load all pickled objects in the file, so I combined Tim's solution with the one here for loading multiple objects: https://stackoverflow.com/a/49261333/11805662
Without this, I was just seeing the first pickled item.
Also, in case anyone has the same problem, I needed to use the same python version as my Flask app (related post). If I didn't, then I would get the following error:
ValueError: unsupported pickle protocol: 4
You can decode the data with pickle. Pickle is part of the Python standard library.
import pickle
with open("PATH/TO/SESSION/FILE", "rb") as f:  # pickle data must be read in binary mode
    data = pickle.load(f)
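A single pickle.load() call returns only the first object in a file. As a self-contained sketch of reading several objects written back to back, the example below first creates a temporary file with two pickled values and then reads them all. The file layout (a timestamp followed by the session payload) and the sample values are assumptions for illustration, not a description of the actual cache files. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins: an expiry timestamp followed by the session payload.
expires_at = 1700000000
session = {"current_section": 0, "SECTIONS_VISITED": [0, 1, 2]}

# Write both objects back to back into one temporary file.
path = os.path.join(tempfile.mkdtemp(), "session_file")
with open(path, "wb") as f:
    pickle.dump(expires_at, f)
    pickle.dump(session, f)

# Read objects back one at a time until the file is exhausted.
objects = []
with open(path, "rb") as f:
    while True:
        try:
            objects.append(pickle.load(f))
        except EOFError:
            break

print(objects)
```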
I collected some tweets from the Twitter API and stored them in MongoDB. Exporting the data to a JSON file worked without any issues, but when I wrote a Python script to read the JSON and convert it to CSV, I got this traceback:
json.decoder.JSONDecodeError: Extra data: line 367 column 1 (char 9745)
So, after digging around the internet I was pointed to check the actual JSON data in an online validator, which I did. This gave me the error of:
Multiple JSON root elements
from the site https://jsonformatter.curiousconcept.com/
Here are pictures of the 1st/2nd object beginning/end of the file:
or a link to the data here
Now, the problem is, I haven't found anything on the internet of how to handle that error. I'm not sure if it's an error with the data I've collected, exported, or if I just don't know how to work with it.
My end game with these tweets is to make a network graph. I was looking at either Networkx or Gephi, which is why I'd like to get a csv file.
Robert Moskal is right. If you can address the issue at the source by passing the --jsonArray flag to mongoexport, the problem becomes much easier. If you can't address it at the source, read the points below.
The code below will extract you the individual json objects from the given file and convert them to python dictionaries.
You can then apply your CSV logic to each individual dictionary.
If you are using the csv module, I would suggest the unicodecsv module instead, as it handles the Unicode data in your JSON objects.
import json
with open('path_to_your_json_file', 'r') as infile:
    json_block = []
    for line in infile:
        json_block.append(line)
        # A line starting with '}' closes the current top-level object.
        if line.startswith('}'):
            json_dict = json.loads(''.join(json_block))
            json_block = []
            print(json_dict)
If you want to convert it to CSV using pandas you can use the below code:
import json
import pandas as pd

with open('path_to_your_json_file', 'r') as infile:
    json_block = []
    dictlist = []
    for line in infile:
        json_block.append(line)
        # A line starting with '}' closes the current top-level object.
        if line.startswith('}'):
            json_dict = json.loads(''.join(json_block))
            dictlist.append(json_dict)
            json_block = []

# Build the DataFrame from the collected dictionaries.
df = pd.DataFrame(dictlist)
df.to_csv('out.csv', encoding='utf-8')
If you want to flatten out the JSON object, you can use the pandas.io.json.json_normalize() method (exposed as pandas.json_normalize() in pandas 1.0 and later).
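An alternative that does not depend on where the closing brace falls is json.JSONDecoder.raw_decode(), which parses one value and reports where it stopped. The following is a minimal sketch; iter_json_objects is a hypothetical helper name and the sample string is made up to mimic a file with multiple root elements.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between values; raw_decode does not skip it itself.
        if text[pos].isspace():
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical export containing two root objects.
sample = '{\n  "id": 1,\n  "text": "first"\n}\n{\n  "id": 2,\n  "text": "second"\n}\n'
tweets = list(iter_json_objects(sample))
print(len(tweets))
```

Each yielded dictionary can then be passed to the CSV logic. This version reads the whole file into memory, which may not suit very large exports.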
Elaborating on @MYGz's suggestion to use --jsonArray:
Your post doesn't show how you exported the data from mongo. If you use the following via the terminal, you will get valid json from mongodb:
mongoexport --collection=somecollection --db=somedb --jsonArray --out=validfile.json
Replace somecollection, somedb and validfile.json with your target collection, target database, and desired output filename respectively.
The following, by contrast, will NOT give you the results you are looking for:
mongoexport --collection=somecollection --db=somedb --out=validfile.json
because:
By default mongoexport writes data using one JSON document for every
MongoDB document. Ref
A bit of a late reply, and I am not sure this was available at the time the question was posted. Anyway, there is now a simple way to import the mongoexport JSON data:
df = pd.read_json(filename, lines=True)
mongoexport writes each document as a JSON object on its own line, rather than writing the whole file as a single JSON value.
I am very new to Python and am not very familiar with the data structures in Python.
I am writing an automatic JSON parser in Python, the JSON message is read into a dictionary using Ultra-JSON:
jsonObjs = ujson.loads(data)
Now, if I try something like:
jsonObjs[param1][0][param2] it works fine
However, I need to get the path from an external source (I read it from the DB). We initially thought we would just store this in the DB:
myPath = [param1][0][param2]
and then try to access:
jsonObjs[myPath]
But after a couple of failures I realized I'm trying to access:
jsonObjs[[param1][0][param2]]
Is there a way to fix this without parsing myPath?
Many thanks for your help and advice
Store the keys in a format that preserves type information, e.g. JSON, and then use reduce() to perform recursive accesses on the structure.
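As a rough sketch of that idea, the path can be stored as a JSON array and walked with functools.reduce and operator.getitem. The names get_by_path, json_objs and stored_path are hypothetical, the sample message is invented, and the standard json module is used here in place of ujson to keep the example self-contained.

```python
import json
from functools import reduce
from operator import getitem

def get_by_path(obj, path):
    """Follow a sequence of keys/indexes into a nested structure."""
    return reduce(getitem, path, obj)

# Hypothetical message and stored path; the path is kept as a JSON array so
# that string keys and integer indexes keep their types.
json_objs = json.loads('{"param1": [{"param2": "value"}]}')
stored_path = '["param1", 0, "param2"]'

result = get_by_path(json_objs, json.loads(stored_path))
print(result)
```

A missing key or index raises KeyError or IndexError, just as the hand-written jsonObjs[param1][0][param2] would, so callers may want to catch those.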