Amazon Advertising API ItemSearch: Get more than 10 pages / 100 results? - python

I'm trying to use the Python wrapper for the Amazon Advertising API (http://pypi.python.org/pypi/python-amazon-product-api/), but when I try to do an ItemSearch and try to access the 11th page, I get something along the lines of:
The value you specified for ItemPage is invalid. ... between 1 and 10
I know that to avoid this problem I could just run another search query, but is there a way to start a search query on a certain page? Or is there a way (for books) to set a boundary on the publication year? I just need some way to make my search results smaller so that I don't run into this error. Right now this is how I'm calling it:
results = api.item_search('Books', ResponseGroup='Images,Large',
                          AssociateTag='qwerty', Publisher=kwd)
Where kwd is just a publisher name obtained from a file.

Amazon will only give you the first 10 pages (as of API version 2011-08-01). You can run slightly different searches to collect more results, but each of those searches is likewise restricted to its first 10 pages.
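For the publishing-year idea: the Books search index accepts a Power search, so you can narrow each query by publication year and run one query per year. Below is a rough sketch with python-amazon-product-api; the exact Power syntax ("pubdate: during <year>") and the year range are assumptions to verify against the ItemSearch documentation.

# Sketch: one narrower query per publication year, so each query stays
# within the 10-page / 100-result cap. The Power syntax is an assumption;
# check the ItemSearch docs for the exact pubdate format.
all_pages = []
for year in range(2005, 2012):                      # example year range
    pages = api.item_search(
        'Books',
        ResponseGroup='Images,Large',
        AssociateTag='qwerty',
        Publisher=kwd,
        Power='pubdate: during %d' % year,          # assumed Power-search field
    )
    all_pages.append(pages)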

Related

Number of orders matching parameters?

I'm trying to set up some automation in my life here and I've hit a bit of a snag.
I'm using the WooCommerce API and I've utilized the woocommerce python library.
I'm able to get a max of 100 orders for a given date range. So I've done the following:
wcapi.get("orders?per_page=100&before=2023-02-01T23:59:59&after=2023-01-01T00:00:00").json()
It appears 100 is the max you can set for "per_page". I can then use the "page" argument to get page 2 and so on. I know how to loop through the pages (rough sketch below), but I can't find anything in the documentation that tells me how many orders fit my before/after parameters. So am I just left paging through until I receive an error?
reports/orders/totals appears to ignore any parameters given to it, so there is no way to do any date filtering.
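The paging loop I have in mind looks roughly like this (a sketch; it stops when a page comes back short or malformed, since the total count isn't known up front). Store URL and credentials are placeholders:

from woocommerce import API

# Sketch of the paging loop described above: request page after page of
# orders in the date range and stop when a page comes back shorter than
# per_page (or isn't a list of orders at all).
wcapi = API(
    url="https://example.com",        # placeholder store URL
    consumer_key="ck_xxx",            # placeholder credentials
    consumer_secret="cs_xxx",
    version="wc/v3",
)

per_page = 100
page = 1
orders = []
while True:
    batch = wcapi.get(
        "orders?per_page=%d&page=%d"
        "&before=2023-02-01T23:59:59&after=2023-01-01T00:00:00"
        % (per_page, page)
    ).json()
    if not isinstance(batch, list) or not batch:   # error payload or empty page
        break
    orders.extend(batch)
    if len(batch) < per_page:                      # last (partial) page
        break
    page += 1

print(len(orders))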

Google Custom Search JSON API date filter not returning expected results?

I have been using Google's Custom Search JSON API for higher-education research for a while, but have recently run into some issues with the "exactTerms" and "dateRestrict" parameters (https://developers.google.com/custom-search/v1/reference/rest/v1/cse.siterestrict/list).
My first question is: Should I expect the same results from the API as I should from using Google's advanced search in my browser?
For example, when I search "University of Utah tuition cost" in-browser, I get the same results as when I run the HTTP request "https://www.googleapis.com/customsearch/v1?key=MY_KEY&cx=MY_PROGRAMMABLE_SEARCH_ENGINE&q=university%20of%20utah%20tuition%20cost"
When I search ""University of Utah tuition cost"" (double-quotes to imply exact word or phrase search) in-browser, I get the same or very similar results as when I run the request "https://www.googleapis.com/customsearch/v1?key=MY_KEY&cx=MY_PROGRAMMABLE_SEARCH_ENGIN&exactTerms=university%20of%20utah%20tuition%20cost"
However, when I combine a search with a date filter (under Tools, selecting "past year" or "past month", etc.), I do not get the same results as when I run "https://www.googleapis.com/customsearch/v1?key=MY_KEY&cx=MY_PROGRAMMABLE_SEARCH_ENGINE&exactTerms=university%20of%20utah%20tuition%20cost&dateRestrict=y[1]". The browser search gives 0 results, whereas my API search gives at least 10 results.
Is my date filter even working? Should I expect the same (or similar) results to a normal Google browser search?
Thank you!
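For reference, here is the same date-restricted request built with the requests library (a sketch; whether dateRestrict wants the literal y[1] from the docs' notation or the unbracketed y1 form is part of what I'm unsure about):

import requests

# Sketch of the exactTerms + dateRestrict request described above.
# MY_KEY and MY_PROGRAMMABLE_SEARCH_ENGINE are placeholders.
params = {
    "key": "MY_KEY",
    "cx": "MY_PROGRAMMABLE_SEARCH_ENGINE",
    "exactTerms": "university of utah tuition cost",
    "dateRestrict": "y1",   # "past year"; the docs write this as y[number]
}
resp = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
data = resp.json()
print(data.get("searchInformation", {}).get("totalResults"))
print([item["link"] for item in data.get("items", [])])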

Getting the number of results of a google search using python in 2020?

There were solutions provided before, but they don't work anymore:
extract the number of results from google search
For example, the code above doesn't work anymore because the result count doesn't even seem to be in the response: there is no resultStats ID. In my browser the count appears under the id "result-status", but that id doesn't exist in the response either.
I don't want to use Google's API because of the tight daily search limit, and I need to search for thousands of words every day. What is the solution for me?

Twitter scraping in Python

I have to scrape tweets from Twitter for a specific user (#salvinimi), from January 2018. The issue is that there are a lot of tweets in this range of time, and so I am not able to scrape all the ones I need!
I tried multiple solutions:
1)
pip install twitterscraper
from twitterscraper import query_tweets_from_user as qtfu
tweets = qtfu(user='matteosalvinimi')
With this method, I get only a few tweets (roughly 500-600) instead of all of them... Do you know why?
2)
!pip install twitter_scraper
from twitter_scraper import get_tweets
tweets = []
for i in get_tweets('matteosalvinimi', pages=100):
tweets.append(i)
With this method I get an error -> "ParserError: Document is empty"...
If I set "pages=40", I get the tweets without errors, but not all of them. Do you know why?
Three things about the first issue you encounter:
First of all, every API has its limits, and one like Twitter's can be expected to monitor usage and eventually stop a user from retrieving data if they ask for more than the limits allow. Trying to work around the API's limitations might not be the best idea and could get you banned from the site (I'm guessing here, as I don't know Twitter's policy on the matter). That said, the documentation for the library you're using states:
With Twitter's Search API you can only sent 180 Requests every 15 minutes. With a maximum number of 100 tweets per Request this means you can mine for 4 x 180 x 100 = 72.000 tweets per hour.
By using TwitterScraper you are not limited by this number but by your internet speed/bandwith and the number of instances of TwitterScraper you are willing to start.
Second, the function you're using, query_tweets_from_user(), has a limit argument which you can set to an integer. One thing to try is changing that argument and seeing whether you get what you want.
Finally, if the above does not work, you could split your time range into two, three or more subsets, collect the data separately, and merge it afterwards (see the sketch below).
The second issue you mention might be due to many different things, so I'll take a broad guess. Either pages=100 is too high and somewhere along the way the program or the API fails to retrieve the data, or you're asking for a hundred pages when fewer than a hundred actually exist, which leaves the program trying to parse an empty document.
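To illustrate the second and third points, here is a rough sketch that splits the period into month-sized chunks and merges the results. It assumes your twitterscraper version exposes limit, begindate and enddate on query_tweets; check the signature of your installed version:

import datetime as dt
from twitterscraper import query_tweets

# Sketch: query smaller date ranges one at a time and merge the results,
# instead of asking for the whole period in one call. The limit/begindate/
# enddate keyword arguments are assumed to exist in this library version.
start = dt.date(2018, 1, 1)
end = dt.date(2018, 12, 31)
step = dt.timedelta(days=30)

all_tweets = []
chunk_start = start
while chunk_start < end:
    chunk_end = min(chunk_start + step, end)
    all_tweets.extend(
        query_tweets(
            "from:matteosalvinimi",
            limit=None,              # or an integer cap per chunk
            begindate=chunk_start,
            enddate=chunk_end,
        )
    )
    chunk_start = chunk_end

print(len(all_tweets))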

speeding up dynamic web page generation in python django

I am learning Django and am building a price comparison web app using the Amazon Advertising API. My overall workflow is as follows:
the user submits a keyword for the query
the keyword is searched on Amazon to list some items, currently 10
each item has a unique number assigned by Amazon; that number is submitted to Amazon again to extract additional information about the item, such as a thumbnail and price. This is done in a for-loop.
Information from the first and second queries to Amazon is listed on the web page and returned to the user.
The flow is working now, but I find the web app quite slow on the Django development server: I search for a book, it takes more than 10 seconds to extract the information for the first 10 items, and all the pictures are displayed at once.
I am aware of Amazon's requirement that we cannot make more than 1 request per second, but given that other price comparison websites work much faster, I am wondering if there is any programmatic means of optimizing this.
I have thought of several ways:
caching product information
speeding up the for-loop in step 3
display the text on the page to the user first, and load the pictures later.
Method 1 is doable; I am not sure whether method 2 or 3 is feasible.
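For method 1, this is roughly what I have in mind using Django's low-level cache framework (a sketch; fetch_item_details() is a hypothetical helper that wraps the per-item Amazon lookup):

from django.core.cache import cache

def get_item_details(asin):
    """Return cached product details, hitting Amazon only on a cache miss.

    Sketch for method 1: fetch_item_details() is a hypothetical helper that
    performs the per-item Amazon lookup (thumbnail, price, ...).
    """
    key = "amazon-item-%s" % asin
    details = cache.get(key)
    if details is None:
        details = fetch_item_details(asin)   # hypothetical Amazon call
        cache.set(key, details, 60 * 60)     # cache for one hour
    return details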
Can anyone give any hints?
