Wikipedia API Wrapper WPTOOLS - python

Usually, when using wptools, an API wrapper for Python, I get a 'mostviewed' dictionary under 'data' when I do something like this:
site = wptools.site('en.wikipedia.org') or
blah = site.get_info('en.wikipedia.org') or
fdgfdg = site.top('en.wikipedia.org')
The dictionary returned under 'data' contains the most viewed pages for, in this case, the English version of wikipedia.org,
but lately the dictionary is empty (I changed nothing in the original version, and I created a new project to test from scratch, but got nothing). What can I do to regain this functionality cleanly and quickly in Python 3.7?

This seemed to resolve itself within 24 hours, so I assumed either:
A. I had hit some rate limit, as caching wasn't happening in the development version, or
B. wptools or the Wikipedia API was undergoing some change.
...but I can't see any rate limits for this usage, and the issue is occurring again.
Here is the Python code I am using, which results in an empty list ([]) being returned:
wptools_site_object = wptools.site('en.wikipedia.org')
wptools_site_object.top('en.wikipedia.org')
mostviewedtopics = wptools_site_object.data['mostviewed']
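As a sanity check, the underlying data can also be queried directly from the Wikimedia pageviews REST API, independently of wptools. This is only a rough sketch: the 'pageviews/top' route and the response keys come from the REST API documentation, and the User-Agent string is a placeholder.

import requests
from datetime import date, timedelta

# The "top viewed" data usually lags by a day or two.
day = date.today() - timedelta(days=2)
url = ('https://wikimedia.org/api/rest_v1/metrics/pageviews/top/'
       'en.wikipedia/all-access/{:%Y/%m/%d}'.format(day))
resp = requests.get(url, headers={'User-Agent': 'example-script/0.1'})
resp.raise_for_status()
# items[0]['articles'] is a list of {'article', 'views', 'rank'} entries.
for entry in resp.json()['items'][0]['articles'][:10]:
    print(entry['rank'], entry['article'], entry['views'])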

Repeat a Python Function - execute the same number of times as entries in a list

Recently I found a Python module for one of our COTS products (Tenable) here at the office. It was developed to help access the product's API. I have it installed in my lab alongside Python 3.9.10, and based on my observations it is working OK. No problems executing simple code.
This product hosts file repositories that manage uploads from thousands of users. The Python module contains a specific function for making edits to those repositories. That function is very specific about the types of Python objects it accepts. For example, the id number has to be an integer, the name has to be a string, the allowed IPs have to be a list, etc.
The exact challenge I am facing here is that I need to perform edits on 12 repositories. The data is stored in an external JSON file that I access via json.load.
I can successfully perform an edit (using information in the JSON) on a single repository.
This is how I did it:
x = sc.repositories.edit(repository_id=(repo_id[-1]), name=repo_name[-1], allowed_ips=(ip_list[-1]))
sc.repositories.edit is defined in the module. repo_id, repo_name, and ip_list are lists of data that come from the JSON. I then used the position [-1] to tell Python to plug in the last entry of each list. The API treats this as a PATCH.
This seems to work as expected. I can log into the web application and see the changes.
...but is there a way to repeat this function until it has sent the updates for all 12 repositories?
I can successfully send this using a [0], [-1], or other position. But the module won't accept slices and I don't know how to loop something this complicated.
Thanks in advance.
If I understood you correctly, you could use zip, or even just a simple range-based for-loop:
for current_repo_id, current_repo_name, current_ips in zip(repo_id, repo_name, ip_list):
    x = sc.repositories.edit(
        repository_id=current_repo_id,
        name=current_repo_name,
        allowed_ips=current_ips
    )
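If the external JSON is a list of repository records, the three lists can be built directly from it before the loop runs. This is only a sketch: the file name and the "id", "name", and "allowed_ips" keys are assumptions about how your JSON is laid out.

import json

with open('repositories.json') as fh:
    records = json.load(fh)

repo_id = [int(r['id']) for r in records]             # edit() expects an integer
repo_name = [str(r['name']) for r in records]         # ... a string
ip_list = [list(r['allowed_ips']) for r in records]   # ... and a list

Those three lists then drop straight into the zip loop above.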

How does searching with pip work?

Yes, I'm dead serious with this question. How does searching with pip work?
The documentation of the keyword search refers to a "pip search reference" at https://pip.pypa.io/en/stable/user_guide/#searching-for-packages, which is anything but a reference.
I can't work out from my search attempts how searching works. For example, if I search for "exec" I get a variety of results, such as exec-pypeline (0.4.2) - an incredible python package. I even get results whose package names have nothing to do with "exec", as long as the term "exec" is in the description.
But strangely I don't see one of my own packages in the list, even though one of them contains "exec" in its name. That alone would lead to the conclusion that pip (at least) searches for complete search terms in the package description (which my package doesn't have).
But building on that assumption, if I search for other terms that do appear in the package description, I don't get my package listed either. And that applies to other packages as well: e.g. if I search for "projects" I don't get flask-macros in the result set, even though the term "projects" clearly exists in the description of flask-macros. As this contradicts the assumption above, that is clearly not how searching works.
And interestingly, I can search for "macro" and get flask-macros as a result, but if I search for "macr", flask-macros is not found.
So how exactly is searching performed by pip? Where can a suitable reference be found for this?
pip search looks for a substring contained in the distribution name or the distribution summary. I cannot see this documented anywhere; I found it by following the command in the source code directly.
The code for the search feature, which dates from Feb 2010, is still using an old xmlrpc_client. There is issue395 to change this, open since 2011, since the XML-RPC API is now considered legacy and should not be used. Somewhat surprisingly, the endpoint was not deprecated in the pypi-legacy to warehouse move, as the legacy routes are still there.
flask-macros did not show up in a search for "project" because this is too common a search term. Only 100 results are returned; this is a hardcoded limit in the Elasticsearch view which handles requests to those PyPI search routes. Note that this was reduced from 1000 fairly recently in PR3827.
Code to do a search with an API client directly:
import xmlrpc.client
client = xmlrpc.client.ServerProxy('https://pypi.org/pypi')
query = 'project'
results = client.search({'name': query, 'summary': query}, 'or')
print(len(results), 'results returned')
for result in sorted(results, key=lambda data: data['name'].lower()):
    print(result)
edit: The 100 result limit is now documented here.

Dota 2 API (dota2api) library in Python

Does anyone have experience with the Dota 2 API library in Python called 'dota2api'? I wish to pull a list of 200 recent games filtered by various criteria. I'm using the get_match_history() request (see link). Here's my code:
import dota2api
key = '<key>'
api = dota2api.Initialise(key)
match_list = api.get_match_history(matches_requested=200)
I haven't specified any filters yet, since I can't even get the matches_requested argument to work. When I run this code, I get exactly 100 matches. In fact, no matter how I specify the matches_requested argument, I always get 100 matches.
Does anyone know if I'm specifying the argument wrong, or is there some other reason why it isn't working as intended?
Thanks in advance.
For such rarely used libraries it is hard to get an answer here.
I have found this issue on the library's Github:
You can't get more than 500 matches through get_match_history, it's limited by the Valve API. One approach you can take is to alternate hero_id: request with account_id, hero_id and start_at_match_id (None if it's the first request) assigned, and this way you can get at least 500 matches of each hero from that account_id.
Probably that has since changed, and now the parameter is ignored by the API completely. Try creating a new issue on the GitHub repository.
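A rough pagination sketch along the lines of that suggestion is below. The response keys ('matches', 'match_id') follow the Steam GetMatchHistory payload, and the exact keyword arguments accepted may differ between dota2api versions, so treat this as an outline rather than tested code.

import dota2api

api = dota2api.Initialise('<key>')

collected = []
start_at = None
while len(collected) < 200:
    if start_at is None:
        batch = api.get_match_history(matches_requested=100)
    else:
        batch = api.get_match_history(matches_requested=100,
                                       start_at_match_id=start_at)
    matches = batch['matches']
    if not matches:
        break
    collected.extend(matches)
    # the next page starts just below the oldest match seen so far
    start_at = matches[-1]['match_id'] - 1

print(len(collected), 'matches collected')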

HTML/Python Page Cataloging System

I've been taking a Coursera course from Duke University on Chemistry. I happen to take a lot of notes when watching the video lectures, and, well, ... there's a lot. I must have around 30 pages already, in no particular order, in my binder, and some form of sorting would be cool. I've come up with some Python 3.3 code, but it's incomplete. My goal: Be able to search() the database with a keyword, and be returned a list of the pages containing that word.
Anyway, here's my checklist:
√ Debug/Finish search() function
Add lookup() function
(Eventually) integrate full code into HTML...?
The lookup() function is my priority right now; in its finished state, it'll return which binder and section a given page is in.
Any help or comments would be really helpful.   
Thanks!
EDIT: Here is the syncing Dropbox file.
Try a framework called Django. It uses Python to render HTML, combined with data retrieved from a SQL database.
You can combine a Django project with the code you are creating right now.
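Independent of any framework, the search() and lookup() parts can be sketched in plain Python first. This is only a sketch: the nested layout (binder -> section -> page numbers with keywords) is an assumed way of organizing the notes, not the asker's actual data.

notes = {
    'Binder 1': {
        'Thermodynamics': {12: ['enthalpy', 'entropy'], 13: ['gibbs free energy']},
        'Kinetics': {21: ['rate law', 'activation energy']},
    },
}

def search(keyword):
    """Return the page numbers whose keyword lists contain the given word."""
    hits = []
    for binder, sections in notes.items():
        for section, pages in sections.items():
            for page, keywords in pages.items():
                if any(keyword.lower() in k.lower() for k in keywords):
                    hits.append(page)
    return hits

def lookup(page_number):
    """Return (binder, section) for a given page, or None if it is not filed."""
    for binder, sections in notes.items():
        for section, pages in sections.items():
            if page_number in pages:
                return binder, section
    return None

print(search('enthalpy'))   # [12]
print(lookup(21))           # ('Binder 1', 'Kinetics')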

xgoogle python library is not working any more?

I was using the xgoogle Python library for one of my projects. It was working fine until recently, but now I am not getting the result set that I used to get before. If anyone who has used this library, written by Peteris Krumins, has faced a similar situation, can you please suggest a workaround?
The presence of BeautifulSoup.py hints that this library uses web scraping to get its result.
A common problem with this is that it will easily break when the design/layout of the page being scraped changes. And the problem you see seems to coincide with the new search results layout that Google introduced just recently.
Another problem is that it often is against the terms of service of the site being scraped. And according to point 5.3 of the Google Terms Of Service it actually is:
You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) [...]
A better idea would be to use the Custom Search API.
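A minimal sketch of what a Custom Search call looks like follows; it assumes the requests library, and the API key and search engine id are placeholders you would create in the Google developer console.

import requests

API_KEY = '<your-api-key>'
ENGINE_ID = '<your-search-engine-id>'

resp = requests.get(
    'https://www.googleapis.com/customsearch/v1',
    params={'key': API_KEY, 'cx': ENGINE_ID, 'q': 'python web scraping'},
)
resp.raise_for_status()
for item in resp.json().get('items', []):
    print(item['title'], item['link'])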
Peteris Krumins's product xgoogle looks to be extremely useful, both to me and, I imagine, to many others.
https://github.com/pkrumins/xgoogle
For me the current version, 1.3, is not working.
I tried a new install from GitHub, ran the examples and nothing is returned.
Adding a debugger to the source code and tracing the data captured in a query to the point where it disappears, the problem occurs in search.py, in the subroutine "_extract_results", at this parser command:
results = soup.findAll('li', {'class': 'g'})
The soup object has material in it, but the findAll call fails to return anything.
It looks like it is searching for list items ('li' elements with class 'g'), and if there are none it returns nothing.
I am unsure what HTML you would need to match to get a result.
If anyone knows how to make this work, I am very interested.
A little more googling and it appears xgoogle is no longer supported or working.
Part of the trouble is that Google changes the layout of its results pages every so often, so any scraping software that assumes a fixed layout is in time doomed to failure.
There are, however, other search engines that are installed locally and thus provide a results layout that is less likely to change with upgrades, and will not change at all if you don't upgrade.
I am currently investigating YaCy. It is easy to install and can be pointed at specific sites if you want.
