Creating Python Active Resource object from json file - python

So for testing purposes, I am trying to create a python ActiveResource object from a json file (I want the object to have attributes from the json file). More specifically I am using the ShopifyResource from (https://github.com/Shopify/shopify_python_api), which extends the ActiveResource object.
I looked through the source code and found some functions that I thought would be of some use:
(https://github.com/Shopify/shopify_python_api/blob/master/shopify/base.py)
from pyactiveresource.activeresource import ActiveResource
import shopify.mixins as mixins
class ShopifyResource(ActiveResource, mixins.Countable):
    _format = formats.JSONFormat
    def _load_attributes_from_response(self, response):
        if response.body.strip():
            self._update(self.__class__.format.decode(response.body))
where _update is from ActiveResource (https://github.com/Shopify/pyactiveresource/blob/master/pyactiveresource/activeresource.py)
def _update(self, attributes):
    """Update the object with the given attributes.
    Args:
      attributes: A dictionary of attributes.
    Returns:
      None
    """
    if not isinstance(attributes, dict):
        return
    for key, value in six.iteritems(attributes):
        if isinstance(value, dict):
            klass = self._find_class_for(key)
            attr = klass(value)
        elif isinstance(value, list):
            klass = None
            attr = []
            for child in value:
                if isinstance(child, dict):
                    if klass is None:
                        klass = self._find_class_for_collection(key)
                    attr.append(klass(child))
                else:
                    attr.append(child)
        else:
            attr = value
        # Store the actual value in the attributes dictionary
        self.attributes[key] = attr
So then I tried to do the following:
order = Order()
with open("file.json") as json_file:
    x = json.loads(json_file.read())
order._update(x)
Here Order extends ShopifyResource (which extends ActiveResource). If I am not mistaken, x should be a dictionary, which is an appropriate parameter for the _update() function.
Yet I get the following output:
Traceback (most recent call last):
File "/home/vineet/Documents/project/tests/test_sync.py", line 137, in testSaveOrder1
self.getOrder()
File "/home/vineet/Documents/project/tests/tests/test_sync.py", line 113, in getOrder
order._update(x)
File "/home/vineet/Documents/project/venv/lib/python3.6/site-packages/pyactiveresource/activeresource.py", line 962, in _update
attr.append(klass(child))
File "/home/vineet/Documents/project/venv/lib/python3.6/site-packages/shopify/base.py", line 126, in __init__
prefix_options, attributes = self.__class__._split_options(attributes)
File "/home/vineet/Documents/project/venv/lib/python3.6/site-packages/pyactiveresource/activeresource.py", line 466, in _split_options
if key in cls._prefix_parameters():
File "/home/vineet/Documents/project/venv/lib/python3.6/site-packages/pyactiveresource/activeresource.py", line 720, in _prefix_parameters
for match in template.pattern.finditer(path):
TypeError: cannot use a string pattern on a bytes-like object
I even tried the following:
order._update(order._format.decode(json_file.read()))
But that didn't work since 'str' object has no attribute 'decode'.

It seems you are unsure whether x has the correct format. Print it and check.
Also, use
x = json.load(json_file)
instead of
x = json.loads(json_file.read())
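To illustrate the suggestion above, here is a minimal sketch that parses the same JSON text both ways and checks that the result is a dict. It uses an in-memory io.StringIO in place of file.json, and the sample payload is made up for illustration.

```python
import io
import json

# Hypothetical payload standing in for the contents of file.json.
raw = '{"id": 1, "line_items": [{"title": "widget"}]}'

# json.load reads from a file-like object directly.
with io.StringIO(raw) as json_file:
    from_load = json.load(json_file)

# json.loads parses a string that has already been read.
with io.StringIO(raw) as json_file:
    from_loads = json.loads(json_file.read())

print(type(from_load).__name__)   # dict
print(from_load == from_loads)    # True
```

Both calls give the same dictionary, so the switch to json.load is a simplification rather than a fix for the TypeError in the traceback.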

Related

How to extract attribute value from a tag in BeautifulSoup

I am trying to extract the value of an attribute from a tag (in this case, TD). The code is as follows (the HTML document is loaded correctly; self.data contains string with HTML data, this method is part of a class):
def getLine(self):
    dat = BeautifulSoup(self.data, "html.parser")
    tags = dat.find_all("tr")
    for current in tags:
        line = current.findChildren("td", recursive=False)
        for currentLine in line:
            # print (currentLine)
            clase = currentLine["class"] # <-- PROBLEMATIC LINE
            if clase is not None and "result" in clase:
                valor = Line()
                valor.name = line.text
The error is in the line clase = currentLine["class"]. I just need to check the tag element has this attribute and do things in case it has the value "result".
File "C:\DataProgb\urlwrapper.py", line 43, in getLine
clase = currentLine["class"] #Trying to extract attribute class
\AppData\Local\Programs\Python\Python39\lib\site-packages\bs4\element.py", line 1519, in __getitem__
return self.attrs[key]
KeyError: 'class'
It should work, because it's just an element. I don't understand this error. Thanks.
The main issue is that you access the attribute key directly, which raises a KeyError if the attribute is not present:
currentLine["class"]
Instead, use get(), which returns None when the attribute is missing:
currentLine.get("class")
From the docs - get(key[, default]):
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
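A minimal sketch of the difference, using plain dictionaries as stand-ins for the attribute mappings of two td tags (the names with_class, without_class and has_result_class are made up for illustration). BeautifulSoup normally stores class as a list of strings, which is what the stand-in mimics.

```python
# Stand-ins for tag.attrs of a <td class="result"> and a bare <td>.
with_class = {"class": ["result"]}
without_class = {}

def has_result_class(attrs):
    # .get() returns None instead of raising KeyError when "class" is absent.
    classes = attrs.get("class")
    return classes is not None and "result" in classes

print(has_result_class(with_class))     # True
print(has_result_class(without_class))  # False

# Direct indexing raises KeyError for the missing attribute.
try:
    without_class["class"]
except KeyError as exc:
    print("KeyError:", exc)
```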

pymongo error: filter must be an instance of dict, bson.son.SON, or other type that inherits from collections.Mapping

I think the query is correct, but I still get an error.
findQ = {"fromid": wordid}, {"toid":1}
res= self.db.wordhidden.find(findQ)
However, find_one(findQ) works, so I can't see what is wrong.
I use python 3.6 and pymongo.
Here is my code:
def getallhiddenids(self,wordids,urlids):
    l1={}
    for wordid in wordids:
        findQ = {"fromid": wordid}, {"toid":1}
        res= self.db.wordhidden.find(findQ)
        for row in res: l1[row[0]]=1
    for urlid in urlids:
        findQ = {"toid": urlid}, {"fromid":1}
        res= self.db.hiddenurl.find(findQ)
This is the error:
Traceback (most recent call last):
File "C:\Users\green\Desktop\example.py", line 9, in <module>
neuralnet.trainquery([online], possible, notspam)
File "C:\Users\green\Desktop\nn.py", line 177, in trainquery
self.setupnetwork(wordids,urlids)
File "C:\Users\green\Desktop\nn.py", line 105, in setupnetwork
self.hiddenids=self.getallhiddenids(wordids,urlids)
File "C:\Users\green\Desktop\nn.py", line 93, in getallhiddenids
res= self.db.wordhidden.find(findQ)
File "C:\Users\green\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pymongo\collection.py", line 1279, in find
return Cursor(self, *args, **kwargs)
File "C:\Users\green\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pymongo\cursor.py", line 128, in __init__
validate_is_mapping("filter", spec)
File "C:\Users\green\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pymongo\common.py", line 400, in validate_is_mapping
"collections.Mapping" % (option,))
TypeError: filter must be an instance of dict, bson.son.SON, or other type that inherits from collections.Mapping
find_one(findQ) works
The error occurs because PyMongo's find() requires a dictionary or a bson.son object as its filter. What you have passed in is a Python tuple of the form ({"fromid": wordid}, {"toid":1}). You can correct this by invoking find() with the two dictionaries as separate arguments:
db.wordhidden.find({"fromid": wordid}, {"toid": 1})
Technically, your invocation of find_one() does not work either. It only appears to, because find_one() alters the filter parameter (see find_one() L1006-1008), which effectively turns your tuple filter into:
{'_id': ({"fromid": wordid}, {"toid":1}) }
The above would (should) not return any matches in your collection.
As an alternative to what you're doing, you could store the filter and the projection in two variables, for example:
filterQ = {"fromid": wordid}
projectionQ = {"toid": 1}
cursor = db.wordhidden.find(filterQ, projectionQ)
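The tuple problem can be reproduced without a database. The sketch below uses a hypothetical stand-in function named find that only validates its filter argument the way the error message describes; it is not PyMongo's implementation, and the wordid value is made up.

```python
from collections.abc import Mapping

def find(filter=None, projection=None):
    """Hypothetical stand-in for Collection.find; only validates the filter."""
    if filter is not None and not isinstance(filter, Mapping):
        raise TypeError("filter must be a mapping, got %s" % type(filter).__name__)
    return filter, projection

wordid = 42  # made-up value

# Two dicts separated by a comma form a tuple, not a dict.
findQ = {"fromid": wordid}, {"toid": 1}
print(type(findQ).__name__)  # tuple

try:
    find(findQ)
except TypeError as exc:
    print(exc)

# Unpacking the tuple passes the two dicts as separate arguments.
filter_doc, projection_doc = find(*findQ)
print(filter_doc, projection_doc)
```

If you prefer to keep a single findQ variable, unpacking it with *findQ at the call site is one option. Two named variables, as in the answer above, are arguably easier to read.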

pymongo typeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument

I was trying to migrate data from SQL Server to MongoDB, but I got the TypeError below in the last phase, while importing the data into MongoDB.
mongoImp = dbo.insert_many(jArray)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/collection.py", line 710, in insert_many
blk.ops = [doc for doc in gen()]
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/collection.py", line 702, in gen
common.validate_is_document_type("document", document)
File "/home/lrsa/.local/lib/python2.7/site-packages/pymongo/common.py", line 407, in validate_is_document_type
"collections.MutableMapping" % (option,))
TypeError: document must be an instance of dict, bson.son.SON, bson.raw_bson.RawBSONDocument, or a type that inherits from collections.MutableMapping
I have also checked type(jArray), which is str. I tried converting the data to a list as well, but without success.
My Code:
import pyodbc
import json
import collections
import pymongo
from bson import json_util
odbcArray = []
mongoConStr = '192.168.10.107:36006'
sqlConStr = 'DRIVER={MSSQL-NC1311};SERVER=tcp:192.168.10.103,57967;DATABASE=AdventureWorks;UID=testuser;PWD=testuser'
mongoConnect = pymongo.MongoClient(mongoConStr)
sqlConnect = pyodbc.connect(sqlConStr)
dbo = mongoConnect.eaedw.sqlData
dbDocs = dbo.find()
sqlCur = sqlConnect.cursor()
sqlCur.execute("""
SELECT TOP 2 BusinessEntityID,Title, Demographics, rowguid, ModifiedDate
FROM Person.Person
""")
tuples = sqlCur.fetchall()
for tuple in tuples:
    doc = collections.OrderedDict()
    doc['id'] = tuple.BusinessEntityID
    doc['title'] = tuple.Title
    doc['dgrap'] = tuple.Demographics
    doc['rowi'] = tuple.rowguid
    doc['mtime'] = tuple.ModifiedDate
    odbcArray.append(doc)
jArray = json.dumps(odbcArray, default=json_util.default)
mongoImp = dbo.insert_many(jArray)
mongoConnect.close()
sqlConnect.close()
Check out the bulk insert example on MongoDB's webpage. Skip the json.dumps call (which turns your array of documents into a JSON-formatted string) and insert odbcArray directly:
mongoImp = dbo.insert_many(odbcArray)
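To see why the string is rejected, here is a small sketch with made-up rows in place of the pyodbc result set. It only checks types and does not connect to SQL Server or MongoDB.

```python
import collections
import json
from collections.abc import MutableMapping

# Made-up rows standing in for the pyodbc result set.
rows = [(1, "Mr."), (2, "Ms.")]

odbcArray = []
for row_id, title in rows:
    doc = collections.OrderedDict()
    doc["id"] = row_id
    doc["title"] = title
    odbcArray.append(doc)

jArray = json.dumps(odbcArray)

# json.dumps produces one string, so iterating it yields characters.
print(type(jArray).__name__)  # str
print(all(isinstance(d, MutableMapping) for d in odbcArray))  # True
print(any(isinstance(ch, MutableMapping) for ch in jArray))   # False
```

Every element of odbcArray is a mapping, which is what the error message asks for, while every element of jArray is a single character.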

“TypeError: 'unicode' object does not support item assignment” in dicts when scraping via scrapy pipeline

I'm trying to build a dictionary of keywords and put it into a scrapy item.
'post_keywords':{1: 'midwest', 2: 'i-70',}
The point is that this will all go inside a json object later on down the road. I've tried initializing a new blank dictionary first, but that doesn't work.
Pipeline code:
tag_count = 0
for word, tag in blob.tags:
    if tag == 'NN':
        tag_count = tag_count+1
        nouns.append(word.lemmatize())
keyword_dict = dict()
key = 0
for item in random.sample(nouns, tag_count):
    word = Word(item)
    key=key+1
    keyword_dict[key] = word
item['post_keywords'] = keyword_dict
Item:
post_keywords = scrapy.Field()
Output:
Traceback (most recent call last):
File "B:\Mega Sync\Programming\job_scrape\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "B:\Mega Sync\Programming\job_scrape\cl_tech\cl_tech\pipelines.py", line 215, in process_item
item['post_noun_phrases'] = noun_phrase_dict
TypeError: 'unicode' object does not support item assignment
It SEEMS like pipelines behave weirdly, like they don't want to run all the code in the pipeline UNLESS all the item assignments check out, which makes it so that my initialized dictionaries aren't created or something.
Thanks to MarkTolonen for the help.
My mistake was using the variable name 'item' for two different things: the Scrapy item and the loop variable.
This works:
for thing in random.sample(nouns, tag_count):
    word = Word(thing)
    key = key+1
    keyword_dict[key] = word
item['post_keywords'] = keyword_dict
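The name clash can be reproduced without Scrapy or TextBlob. In this sketch a plain dict stands in for the Scrapy item and the noun list is made up. On Python 3 the message mentions 'str' rather than 'unicode'.

```python
import random

item = {"title": "example post"}  # stand-in for the Scrapy item
nouns = ["midwest", "i-70"]

# Reusing the name `item` as the loop variable rebinds it to a string.
for item in random.sample(nouns, len(nouns)):
    pass

try:
    item["post_keywords"] = {1: "midwest"}
    failed = False
except TypeError as exc:
    failed = True
    print(exc)
```

After the loop, item refers to the last sampled word, so the assignment fails exactly as in the traceback.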

json: TypeError: '<li>...spam...</li>' is not JSON serializable

Within the following class I am trying to save some state information to a json file, however when I attempt to save a dictionary I come across a TypeError: '<li>...stuff...</li>' is not JSON serializable
class Save(object):
    def __init__(self, MainFrameDict):
        super(Save, self).__init__()
        self.MainFrameDict = MainFrameDict
        import pdb; pdb.set_trace()
        self.writeJson()
    def writeJson(self):
        self.json_state_file = os.path.join(self.MainFrameDict['item_folder'],
                                            self.MainFrameDict['itemNumber']+'.json')
        with open(self.json_state_file,'wb') as f:
            json.dump(self.MainFrameDict['currentItemInfo'], f)
        self.printJsonStateFile()
        #import pdb; pdb.set_trace()
    def printJsonStateFile(self):
        with open(self.json_state_file,'rb') as f:
            json_data = json.load(f)
Within the dictionary that I am trying to save:
(Pdb) print(self.MainFrameDict['currentItemInfo'].keys())
['image_list', 'description', 'specs']
(Pdb) print(self.MainFrameDict['currentItemInfo']['description'])
Run any time in the comfort of your own home. Deluxe treadmill
features 9 programs; plug in your MP3 player to rock your workout!
<ul>
<li>Horizon T101 deluxe treadmill</li>
<li>55" x 20" treadbelt</li>
<li>9 programs</li>
<li>Fan</li>
<li>Motorized incline to 10%</li>
<li>Up to 10 mph</li>
<li>Surround speakers are compatible with your MP3 player (not
included)</li>
<li>71"L x 33"W x 55"H</li>
<li>Wheels for mobility</li>
<li>Folds for storage</li>
<li>Weight limit: 300 lbs.</li>
<li>Assembly required</li>
<li>Limited warranty</li>
<li>Made in USA</li>
</ul>
(Pdb) print type(self.MainFrameDict['currentItemInfo']['description'])
<class 'bs4.BeautifulSoup'>
The traceback that I am trying to figure out:
Traceback (most recent call last):
File "display_image.py", line 242, in onNewItemButton
Save(MainFrame.__dict__)
File "display_image.py", line 20, in __init__
self.writeJson()
File "display_image.py", line 24, in writeJson
json.dump(self.MainFrameDict['currentItemInfo'], f)
File "C:\Python27\Lib\json\__init__.py", line 189, in dump
for chunk in iterable:
File "C:\Python27\Lib\json\encoder.py", line 434, in _iterencode
for chunk in _iterencode_dict(o, _current_indent_level):
File "C:\Python27\Lib\json\encoder.py", line 408, in _iterencode_dict
for chunk in chunks:
File "C:\Python27\Lib\json\encoder.py", line 442, in _iterencode
o = _default(o)
File "C:\Python27\Lib\json\encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
TypeError: Run any time in the comfort of your own home. Deluxe treadmill
features 9 programs; plug in your MP3 player to rock your workout!
<ul>
<li>Horizon T101 deluxe treadmill</li>
<li>55" x 20" treadbelt</li>
<li>9 programs</li>
<li>Fan</li>
<li>Motorized incline to 10%</li>
<li>Up to 10 mph</li>
<li>Surround speakers are compatible with your MP3 player (not
included)</li>
<li>71"L x 33"W x 55"H</li>
<li>Wheels for mobility</li>
<li>Folds for storage</li>
<li>Weight limit: 300 lbs.</li>
<li>Assembly required</li>
<li>Limited warranty</li>
<li>Made in USA</li>
</ul>
is not JSON serializable
Docs/Posts looked at:
https://docs.python.org/2/library/json.html
What is the correct JSON content type?
Python serializable objects json
is not JSON serializable
Python sets are not json serializable
How to overcome "datetime.datetime not JSON serializable"?
JSON datetime between Python and JavaScript
JSON serialization of Google App Engine models
I am not sure if this is because it is nested, or if there is an issue with encoding/decoding. What exactly am I looking for and what am I not understanding? Is there a way to determine the encoding of an item?
What is the type of self.MainFrameDict['currentItemInfo']['description']?
It's not a str, int, float, list, tuple, bool or None, so json doesn't know what to do with it. You'll need to convert it to one of those types...
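As a rough sketch of that conversion: the Markup class below is a hypothetical stand-in for the bs4.BeautifulSoup object from the question, and the dictionary contents are made up. The idea is simply to turn the value into a str before serializing.

```python
import json

class Markup(object):
    """Hypothetical stand-in for a bs4.BeautifulSoup object."""
    def __init__(self, html):
        self.html = html
    def __str__(self):
        return self.html

info = {"description": Markup("<li>9 programs</li>"), "specs": ["Fan"]}

# Serializing the object as-is fails, as in the question.
try:
    json.dumps(info)
except TypeError as exc:
    print("TypeError:", exc)

# Convert the offending value to a string before serializing.
info["description"] = str(info["description"])
encoded = json.dumps(info)
print(encoded)
```

Passing default=str to json.dumps is another option if you would rather not modify the dictionary in place.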
You can try this; it works for me. It can encode any kind of Python object into JSON, such as a dictionary, list, tuple, or an instance of a normal class, by falling back to str() for types that json cannot handle.
import json
class DatetimeEncoder(json.JSONEncoder):
    def default(self, obj):
        try:
            # Let the base encoder try first (pass self, not obj, to super)
            return super(DatetimeEncoder, self).default(obj)
        except TypeError:
            # Fall back to the string form for unsupported types
            return str(obj)
class JsonSerializable(object):
    def toJson(self):
        return json.dumps(self.__dict__, cls=DatetimeEncoder)
    def __repr__(self):
        return self.toJson()
class Utility(JsonSerializable):
    def __init__(self, result = object, error=False, message=''):
        self.result=result
        self.error=error
        self.message=message
Finally, create a Utility instance like this and convert it to JSON:
jsone = Utility()
jsone.result=result # any kind of object
jsone.error=True # only bool value
jsone.toJson()
