I am trying to return a list of completed items in a given category using the eBay Finding API. My code seems to be working, but the results are very limited (about 100 items). I assumed there would be some limit on how far back the API goes, but even a few days should return thousands of results for this category. Am I missing something in the code, or is this just a limitation of the eBay API? I did make sure I was using production and not the sandbox.
I have since realized that the query returns multiple pages, up to the maximum of 100 items per page and 100 pages. I am now running into issues with the date filtering. I have read the item filter reference on the eBay site, but I am still not getting the results I expect. In the query below I am trying to pull only items that ended yesterday, yet the results include items from today. Is there a better way to specify the date filters?
from ebaysdk.finding import Connection as finding
from bs4 import BeautifulSoup
import os
import csv

api = finding(appid=<my appid>, config_file=None)

response = api.execute(
    'findCompletedItems', {
        'categoryId': '214',
        'keywords': 'prizm',
        'endTimeFrom': '2020-02-03T00:00:00.000Z',
        'endTimeTo': '2020-02-04T00:00:00.000Z',
        'paginationInput': {
            'entriesPerPage': '100',
            'pageNumber': '1'
        },
        'sortOrder': 'EndTimeSoonest'
    }
)

soup = BeautifulSoup(response.content, 'lxml')
totalitems = int(soup.find('totalentries').text)
items = soup.find_all('item')

for item in response.reply.searchResult.item:
    print(item.itemId)
    print(item.listingInfo.endTime)
I finally figured this out. I needed to add additional code for the item filters. The working code is below.
from ebaysdk.finding import Connection as finding
from bs4 import BeautifulSoup
import os
import csv

api = finding(appid=<my appid>, config_file=None)

response = api.execute(
    'findCompletedItems', {
        'categoryId': '214',
        'keywords': 'prizm',
        'itemFilter': [
            {'name': 'EndTimeFrom', 'value': '2020-02-03T00:00:00.000Z'},
            {'name': 'EndTimeTo', 'value': '2020-02-04T00:00:00.000Z'}
            # {'name': 'MinPrice', 'value': '200', 'paramName': 'Currency', 'paramValue': 'GBP'},
            # {'name': 'MaxPrice', 'value': '400', 'paramName': 'Currency', 'paramValue': 'GBP'}
        ],
        'paginationInput': {
            'entriesPerPage': '100',
            'pageNumber': '100'
        },
        'sortOrder': 'EndTimeSoonest'
    }
)

soup = BeautifulSoup(response.content, 'lxml')
totalitems = int(soup.find('totalentries').text)
items = soup.find_all('item')

for item in response.reply.searchResult.item:
    print(item.itemId)
    print(item.listingInfo.endTime)
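Since findCompletedItems returns at most 100 entries per page (and at most 100 pages), pulling everything inside the date window means walking pageNumber. A rough, untested sketch of that loop, reusing the same filters as above:

page = 1
all_items = []
while True:
    response = api.execute('findCompletedItems', {
        'categoryId': '214',
        'keywords': 'prizm',
        'itemFilter': [
            {'name': 'EndTimeFrom', 'value': '2020-02-03T00:00:00.000Z'},
            {'name': 'EndTimeTo', 'value': '2020-02-04T00:00:00.000Z'}
        ],
        'paginationInput': {'entriesPerPage': '100', 'pageNumber': str(page)},
        'sortOrder': 'EndTimeSoonest'
    })
    result = response.reply.searchResult
    if not hasattr(result, 'item'):
        break  # no items on this page
    all_items.extend(result.item)
    total_pages = int(response.reply.paginationOutput.totalPages)
    if page >= total_pages or page >= 100:  # the API stops at page 100
        break
    page += 1
print(len(all_items))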
I'm trying to scrape some data from a site called laced.co.uk, and I'm a tad confused about what's going wrong. I'm new to this, so please try to explain it simply if possible! Here is my code:
from bs4 import BeautifulSoup
import requests
url = "https://www.laced.co.uk/products/nike-dunk-low-retro-black-white?size=7"
result = requests.get(url)
doc = BeautifulSoup(result.text, "html.parser")
prices = doc.find_all(text=" £195 ")
print(prices)
Thank you! (The price at the time of posting was £195; it showed as the size 7 "buy now" price on the page.)
The price is loaded within a <script> tag on the page:
<script>
  typeof(dataLayer) != "undefined" && dataLayer.push({
    'event': 'eec.productDetailImpression',
    'page': {
      'ecomm_prodid': 'DD1391-100'
    },
    'ecommerce': {
      'detail': {
        'actionField': {'list': 'Product Page'},
        'products': [{
          'name': 'Nike Dunk Low Retro Black White',
          'id': 'DD1391-100',
          'price': '195.0',
          'brand': 'Nike',
          'category': 'Dunk, Dunk Low, Mens Nike Dunks',
          'variant': 'White',
          'list': 'Product Page',
          'dimension1': '195.0',
          'dimension2': '7',
          'dimension3': '190',
          'dimension4': '332'
        }]
      }
    }
  });
</script>
You can use a regular expression pattern to search for the price. Note, there's no need for BeautifulSoup:
import re
import requests
url = "https://www.laced.co.uk/products/nike-dunk-low-retro-black-white?size=7"
result = requests.get(url)
price = re.search(r"'price': '(.*?)',", result.text).group(1)
print(f"£ {price}")
I am sending POST data using the requests library in Python.
I can't get the result of the form; I wonder if it's because the request is too fast.
The form's action URL is the same page. When I fill in the form manually and submit it, the result appears in a div on the page, but when I use requests in Python, the result div is empty even though the response status code is 200. What should I do to obtain the result?
My code is below:
import requests
import time
from time import sleep

url = "https://******"
data = {
    'year': '1973',
    'month': '03',
    'name': 'chae'
}
res = requests.post(url, data)
print(res)  # status code 200
print(res.text)
Any advice would be appreciated.
This code gives me the information:
import requests
import time
from time import sleep

url = "https://efine.go.kr/licen/truth/licenTruth.do"
# data = {'checkPage': '1', 'flag': '', 'regYear': '1973', 'regMonth': '03', 'regDate': '01', 'name': '채승완',
#         'licenNo0': '11', 'licenNo1': '91', 'licenNo2': '822161', 'licenNo3': '12'}
data = {
    "checkPage": "2",
    "flag": "searchPage",
    "regYear": "1973",
    "regMonth": "03",
    "regDate": "01",
    "name": "채승완",
    "licenNo0": "11",
    "licenNo1": "91",
    "licenNo2": "822161",
    "licenNo3": "12",
    "ghostNo": "2161",
}
res = requests.post(url, data=data)
print(res.text)
# result contains "전산 자료와 일치 합니다. 식별번호가 일치하지 않습니다."
# ("It matches the computerized records. The identification number does not match.")
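The extra fields (checkPage, flag, ghostNo and the four licenNo parts) are what the page's own form posts alongside the visible inputs; copying the request from the browser's network tab is the easiest way to see them. If you would rather collect any hidden inputs programmatically, here is a rough sketch; the selector and form structure are assumptions about the page, not something verified:

import requests
from bs4 import BeautifulSoup

url = "https://efine.go.kr/licen/truth/licenTruth.do"
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")

# Start from whatever hidden inputs the form already carries (assumed structure),
# then fill in the user-visible fields on top of them.
data = {inp.get("name"): inp.get("value", "")
        for inp in soup.select("form input[type=hidden]") if inp.get("name")}
data.update({"regYear": "1973", "regMonth": "03", "regDate": "01", "name": "채승완"})

res = requests.post(url, data=data)
print(res.status_code)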
Try using requests.Session() instead of calling requests directly; it worked for me.
I've always done it this way. Please check out https://requests.readthedocs.io/en/master/user/advanced/ and let me know if it helps.
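A minimal sketch of what that looks like, assuming the same URL and form fields as the answer above:

import requests

url = "https://efine.go.kr/licen/truth/licenTruth.do"
data = {"regYear": "1973", "regMonth": "03", "regDate": "01", "name": "채승완"}

# A Session keeps cookies between requests, so the POST carries whatever
# session state the site set when the page was first loaded.
with requests.Session() as s:
    s.get(url)  # pick up any session cookies first
    res = s.post(url, data=data)
    print(res.status_code)
    print(res.text)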
I have a rather basic bit of code. Basically, it sends an API request to a locally hosted server and gets back a JSON string. I take that string, crack it apart, pull out what I need, build a dictionary, and export it as an XML file with an .nfo extension.
The issue is that bits of the source data are sometimes missing; Season, for example, is absent fairly frequently, and that breaks the data mapping. I need a way to handle this: for some fields I may want to exclude the data, and for others I need a sane default value.
#!/bin/env python
import os
import requests
import re
import json
import dicttoxml
import xml.dom.minidom
from xml.dom.minidom import parseString

# Grab Shoko Auth Key
apiheaders = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
}
apidata = '{"user": "Default", "pass": "", "device": "CLI"}'
r = requests.post('http://192.168.254.100:8111/api/auth',
                  headers=apiheaders, data=apidata)
key = json.loads(r.text)['apikey']

# Grabbing Episode Data
EpisodeHeaders = {
    'accept': 'text/plain',
    'apikey': key
}
EpisodeParams = (
    ('filename', "FILE HERE"),
    ('pic', '1'),
)
fileinfo = requests.get(
    'http://192.168.254.100:8111/api/ep/getbyfilename',
    headers=EpisodeHeaders, params=EpisodeParams)

# Mapping Data from Shoko to Jellyfin NFO
string = json.loads(fileinfo.text)
print(string)
eplot = json.loads(fileinfo.text)['summary']
etitle = json.loads(fileinfo.text)['name']
eyear = json.loads(fileinfo.text)['year']
episode = json.loads(fileinfo.text)['epnumber']
season = json.loads(fileinfo.text)['season']
aid = json.loads(fileinfo.text)['aid']
seasonnum = season.split('x')

# Create Dictionary From Mapped Data
show = {
    "plot": eplot,
    "title": etitle,
    "year": eyear,
    "episode": episode,
    "season": seasonnum[0],
}
Here is some example output when the code crashes
{'type': 'ep', 'eptype': 'Credits', 'epnumber': 1, 'aid': 10713, 'eid': 167848,
'id': 95272, 'name': 'Opening', 'summary': 'Episode Overview not Available',
'year': '2014', 'air': '2014-11-23', 'rating': '10.00', 'votes': '1',
'art': {'fanart': [{'url': '/api/v2/image/support/plex_404.png'}],
'thumb': [{'url': '/api/v2/image/support/plex_404.png'}]}}
Traceback (most recent call last):
File "/home/fletcher/Documents/Shoko-Jellyfin-NFO/Xml3.py", line 48, in <module>
season = json.loads(fileinfo.text)['season']
KeyError: 'season'
The solution, based on what Mahori suggested, worked perfectly.
eplot = json.loads(fileinfo.text).get('summary', None)
etitle = json.loads(fileinfo.text).get('name', None)
eyear = json.loads(fileinfo.text).get('year', None)
episode = json.loads(fileinfo.text).get('epnumber', None)
season = json.loads(fileinfo.text).get('season', '1x1')
aid = json.loads(fileinfo.text).get('aid', None)
This is a fairly common scenario in web development, where you cannot always assume the other party will send every key.
The standard way around it is to use get instead of indexing by key name.
season = json.loads(fileinfo.text).get('season', None)
# you can change None to any default value here
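As a follow-up, the response only needs to be decoded once; a small variant of the same idea that parses the JSON a single time and then uses .get() for every field (the defaults are just illustrative):

info = json.loads(fileinfo.text)  # decode the response body once

eplot = info.get('summary')
etitle = info.get('name')
eyear = info.get('year')
episode = info.get('epnumber')
season = info.get('season', '1x1')  # fall back to season 1 when the key is missing
aid = info.get('aid')

seasonnum = season.split('x')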
I have a structure:
[
    # If it is a comment (parent comment)
    {
        'commentParentId': '',
        'parentId': '',
        'posted': '28/02/2019',
        'author': {
            'id': '125379',
            'name': 'david',
        },
        'content': 'i need help'
    },
    # If it is a comment reply
    {
        'commentParentId': 'abcdedf',
        'parentId': '253654',
        'posted': '28/02/2019',
        'author': {
            'id': '458216',
            'name': 'david',
        },
        'content': 'i need help'
    },
    ........................
]
I want to scrape comments and comment replies.
If it is a comment, commentParentId and parentId are null.
If it is a comment reply, commentParentId and parentId take the ID of the comment that was replied to.
I am scraping comments using Selenium, like this:
import requests
from bs4 import BeautifulSoup
import json
from datetime import datetime
from selenium import webdriver

# Execute Web link
url = "https://genvita.vn/thu-thach/7-ngay-detox-da-dep-dang-thon-nguoi-khoe-qua-soc-len-den-8-trieu-dong"
driver_path = 'F:/chromedriver.exe'
browser = webdriver.Chrome(executable_path=driver_path)
browser.get(url)

confirm_write = input("Input ok to scrape data: ")

# I want to load all comments (click 'Xem Thêm' / 'See more') before the data is scraped
if confirm_write == 'ok':
    getID = browser.find_element_by_css_selector("div[class='media-body-replies']")
    getChildID = getID.find_elements_by_css_selector('data-comment-id')
    # Get ID
    for childID in getChildID:
        print(childID.get_attribute('data-comment-id'))
But my code is not working.
A comment and a comment reply have the same class and the same id; the only difference is that replies are nested inside an element with the class media-body-replies.
But using that, as above, is not working.
If I use getChildID = browser.find_elements_by_css_selector('data-comment-id')
I get all of the IDs, parent comments and replies alike (and likewise their content), and I cannot separate comments from replies.
Thank You
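One possible way to separate the two sets, assuming replies are exactly the nodes nested inside div.media-body-replies (the selectors are guesses based on the description above, not verified against the page):

# Every element carrying a data-comment-id attribute (comments and replies alike)
all_nodes = browser.find_elements_by_css_selector('[data-comment-id]')

# Only the elements nested inside the replies container
reply_nodes = browser.find_elements_by_css_selector('div.media-body-replies [data-comment-id]')

reply_ids = {n.get_attribute('data-comment-id') for n in reply_nodes}
all_ids = {n.get_attribute('data-comment-id') for n in all_nodes}
comment_ids = all_ids - reply_ids  # what remains are the parent comments

print('parent comments:', comment_ids)
print('replies:', reply_ids)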
I am new to Python, Scrapy, and JSON. I am trying to scrape the JSON response from the URL in the code below, but it is showing an error. The code I used is:
import scrapy
import json
import re

class BlackSpider(scrapy.Spider):
    name = 'black'
    start_urls = ['https://appworld.blackberry.com/cas/content/2360/reviews/2.17.2?page=1&pagesize=100&sortby=newest&callback=_content_2360_reviews_2_17_2&_=1499161778751']

    def parse(self, response):
        data = re.findall('(\{.+\})\);', response.body_as_unicode())
        a = json.loads(data[0])
        item = MyItem()
        item["Reviews"] = a["reviews"][4]["review"]
        return item
The error it is showing is:
ValueError("No JSON object could be decoded")
The response you are getting is a JavaScript function call with some JSON inside it:
_content_2360_reviews_2_17_2(\r\n{"some":"json"}]});\r\n
To extract the data, you can use a simple regex solution:
import re
import json
data = re.findall('(\{.+\})\);', response.body_as_unicode())
json.loads(data[0])
It translates to: select everything between { and } that ends with );
Edit: these are the results I'm getting with this:
{'platform': None,
'reviews': [{'createdDate': '2017-07-04',
'model': 'London',
'nickname': 'aravind14-92362',
'rating': 6,
'review': 'Very bad ',
'title': 'My WhatsApp no update '}],
'totalReviews': 569909,
'version': '2.17.2'}
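An equivalent approach, if you would rather not rely on the regex, is to strip the JSONP callback wrapper by slicing between the first '(' and the last ')'; a sketch, assuming the response always has that shape:

def parse(self, response):
    body = response.body_as_unicode()
    # Keep only the JSON between the callback's opening '(' and closing ')'
    payload = body[body.index('(') + 1 : body.rindex(')')]
    data = json.loads(payload)
    self.logger.info(data['totalReviews'])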