Troubles merging two json urls - python

I am getting the error shown below. When I try running
pip3 install --upgrade json
in an attempt to resolve the error, the module cannot be found.
The segment of code I am working with follows the error. Some further direction on the code itself would also be appreciated.
Error:
Traceback (most recent call last):
  File "Chicago_cp.py", line 18, in <module>
    StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
TypeError: 'NoneType' object is not callable
Script:
#!/usr/bin/python
import json
from bs4 import BeautifulSoup
import urllib3
http = urllib3.PoolManager()
# Define Merge
def Merge(dict1, dict2):
    res = {**dict1, **dict2}
    return res
# Open the URL and the screen name
StopWork__url = "someJsonUrl"
Violation_url = "anotherJsonUrl"
StopWork_response = http.request('GET', StopWork__url)
StopWork_data = json.load(BeautifulSoup(StopWork_response.data,'lxml'))
Violation_response = http.request('GET', Violation_url)
Violation_data = json.load(BeautifulSoup(Violation_response.data,'lxml'))
dict3 = Merge(StopWork_data,Violation_data)
print (dict1)

json.load expects a file object or something else with a read method. A BeautifulSoup object has no read method, but it lets you ask for any attribute and treats the name as a child tag to look up, i.e. a <read> tag in this case. When it doesn't find one it returns None, and calling None raises the error. Here's a demo:
import json
from bs4 import BeautifulSoup
soup = BeautifulSoup("<p>hi</p>", "html5lib")
assert soup.read is None
assert soup.blablabla is None
assert json.loads is not None
json.load(soup)
Output:
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    json.load(soup)
  File "/usr/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
TypeError: 'NoneType' object is not callable
If the URL is returning JSON then you don't need BeautifulSoup at all because that's for parsing HTML and XML. Just use json.loads(response.data).
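A minimal sketch of that approach, assuming both endpoints return a JSON object at the top level. The helper name merge_json_payloads and the sample payloads are hypothetical; with urllib3 you would pass StopWork_response.data and Violation_response.data as the two arguments.

```python
import json


def merge_json_payloads(raw_first, raw_second):
    """Parse two JSON documents (str or bytes) and merge them into one dict.

    Assumes both documents are JSON objects; keys from the second
    payload overwrite duplicate keys from the first.
    """
    first = json.loads(raw_first)
    second = json.loads(raw_second)
    if not isinstance(first, dict) or not isinstance(second, dict):
        raise TypeError("both payloads must be JSON objects to merge as dicts")
    return {**first, **second}


# Hypothetical payloads standing in for the bytes in response.data
stop_work = b'{"permit": "A1", "status": "stopped"}'
violation = b'{"status": "open", "inspector": "J. Doe"}'

merged = merge_json_payloads(stop_work, violation)
print(merged)
```

If the endpoints return JSON arrays rather than objects, concatenating the two lists would be the analogous merge.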

Related

unable to load json data

I am unable to load the JSON data and get the error shown below.
My code is:
import requests
import json
url = 'https://172.28.1.220//actifio/api/info/lsjobhistory?sessionid=cafc8f31-fb39-4020-8172-e8f0085004fd'
ret=requests.get(url,verify=False)
data=json.load(ret)
print(data)
The error:
Traceback (most recent call last):
  File "pr.py", line 7, in <module>
    data=json.load(ret)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 293, in load
    return loads(fp.read(),
AttributeError: 'Response' object has no attribute 'read'
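You don't actually need the json module here. json.load expects a file-like object with a read method, which a requests Response doesn't have; the Response already provides a json() method.
Try this: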
import requests
url = 'https://172.28.1.220//actifio/api/info/lsjobhistory?sessionid=cafc8f31-fb39-4020-8172-e8f0085004fd'
ret = requests.get(url,verify=False)
data = ret.json()
print(data)
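The difference between the two json functions can be sketched without any network access. json.load reads from a file-like object, while json.loads parses a str or bytes value directly. The sample text and the NoRead class below are made up for illustration.

```python
import io
import json

text = '{"jobs": [1, 2, 3]}'  # hypothetical response body

# json.loads parses a string (or bytes) directly.
from_string = json.loads(text)

# json.load needs an object with a .read() method, such as an open file
# or an in-memory buffer.
from_file_like = json.load(io.StringIO(text))


# An object without .read() fails the same way a requests Response does.
class NoRead:
    pass


try:
    json.load(NoRead())
except AttributeError as exc:
    error_message = str(exc)

print(from_string, from_file_like, error_message)
```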

Attribute Error : 'Response' object has no attribute 'css'

When I tried this, I got AttributeError: 'Response' object has no attribute 'css'.
I tried this code:
response.css('h1.ctn-article-title::text').extract()
Can anyone help, please?
I'm trying to get the text "Update Primary Care", which is the title, from the markup below:
Update Primary Care
CME
Here is my entire session:
response = requests.get(url, headers = headers)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'requests' is not defined
import requests
response = requests.get(url, headers = headers)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'url' is not defined
url = 'somethingurl'
response = requests.get(url, headers = headers)
response.css('h1.ctn-article-title::text').extract()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'Response' object has no attribute 'css'
response.css('h1').extract()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'Response' object has no attribute 'css'
response.css('h1.ctn-article-title::text').extract()
As Tarun pointed out in the comments: You are mixing scrapy and requests code.
If you want to create a scrapy response from requests response you can try:
from scrapy.http import TextResponse
import requests
url = 'http://stackoverflow.com'
resp = requests.get(url)
resp = TextResponse(body=resp.content, url=url)
resp.xpath('//div')
# works!
See the docs for requests.Response and scrapy.http.TextResponse objects.
In this case the line where your error occurs expects a Scrapy response object, not a plain requests Response. Create a Scrapy response instead of the requests one to resolve the error.
More specifically, use an HtmlResponse because your response is HTML and not plain text. HtmlResponse is a subclass of TextResponse, so it inherits the missing css method.
Add this line to your code and remove any imports of requests from other packages:
from scrapy.http import Request

line = line.strip() TypeError: 'NoneType' object is not callable

I am trying to find all the numbers in a list from an HTML page using BeautifulSoup:
import urllib
from BeautifulSoup import *
import re
line = None
url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
# Retrieve all of the anchor tags
tags = soup('span')
for line in tags:
    line = line.strip()
    numlist = re.findall('[0-9]+' , tags)
    print numlist
I'm getting a traceback:
Traceback (most recent call last):
  File "C:\Documents and Settings\mea388\Desktop\PythonSchool\new 12.py", line 14, in <module>
    line = line.strip()
TypeError: 'NoneType' object is not callable
I cannot understand why I'm getting a traceback.
That's because you are calling strip on a BeautifulSoup Tag object. A Tag has no strip method, so the attribute lookup is treated as a search for a child <strip> tag and returns None, which is then called.
Change line 14 to:
line = line.string.strip()
However, be aware that line.string can still be None when the tag you are searching for has multiple sub-elements. See the documentation for the .string attribute in Beautiful Soup.

Why can't I retrieve a URL from an XPath query?

I have the following script, to find an image on a page and download it:
from lxml import html
import urllib
import urllib2
url = 'http://www.example.com/pages/page0987/'
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()
tree = html.fromstring(data)
src = tree.xpath('/html/body/div[2]/div[4]/div/div/img/@src')
urllib.urlretrieve(src, "local-filename.jpg")
I fetch a webpage, locate an <img> element on it with an XPath query, read the src attribute of that element, and then try to download the image from that URL.
But something is wrong; Python says:
Traceback (most recent call last):
  File "C:\Users\Sergey\Desktop\dlImg.py", line 15, in <module>
    urllib.urlretrieve(src, "local-filename.jpg")
  File "C:\Python27\lib\urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 228, in retrieve
    url = unwrap(toBytes(url))
  File "C:\Python27\lib\urllib.py", line 1060, in unwrap
    url = url.strip()
AttributeError: 'list' object has no attribute 'strip'
Your tree.xpath() query returns a list, not a single match. At the very least index for the first item:
urllib.urlretrieve(src[0], "local-filename.jpg")
or use a loop over the results. Take into account that the list can be empty as well (no matches found).
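One way to guard against the empty case is a small helper. This is only a sketch in Python 3 syntax: first_match is a hypothetical name, and the lists below stand in for what tree.xpath(...) might return.

```python
def first_match(matches, default=None):
    """Return the first item of an XPath-style result list, or a default."""
    return matches[0] if matches else default


# Stand-ins for the result of tree.xpath('.../img/@src')
found = ["http://www.example.com/images/picture.jpg"]
not_found = []

src = first_match(found)
missing = first_match(not_found)

if missing is None:
    print("no <img> matched the XPath expression")
print(src)
```

Checking for None before calling urlretrieve keeps the download step from failing on an empty result.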

beautifulsoup title from resultset object

I have been at this for a while, but can't seem to get the text value of an element of a resultset object using beautifulsoup. Here is the method that is failing:
def __getNameOfProduct(self, product):
    #product is of type bs4.resultset...
    for value in product:
        print value.find_all("div",class_="proddisc").title.string
It's my own markup, so I don't have a URL (I am working through a tutorial), but here is the error I am getting:
Traceback (most recent call last):
  File "ctd.py", line 64, in <module>
    main()
  File "ctd.py", line 60, in main
    p.getItemsInStock()
  File "ctd.py", line 26, in getItemsInStock
    return self.__returnItemDetailAsDictionary(itemDetail)
  File "ctd.py", line 32, in __returnItemDetailAsDictionary
    nameOfProduct = self.__getNameOfProduct(product)
  File "ctd.py", line 44, in __getNameOfProduct
    print value.find_all("div",class_="proddisc").title.string
AttributeError: 'ResultSet' object has no attribute 'title'
Any help would be very much appreciated.
Thanks!
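The way you are accessing the attribute is only valid for a single object, not for the collection of objects that find_all returns.
As I understand your requirement, this should work:
import urllib
from bs4 import BeautifulSoup
html = urllib.urlopen("http://yoursite.com")
soup = BeautifulSoup(html)
# "class" is a Python keyword, so it must be quoted as a dictionary key
prodisc_div = soup.findAll('div', attrs={"class": "proddisc"})
for each in prodisc_div:
    # each is a single Tag here, so per-element access works
    print each.get("title")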
