How does searching with pip work? - python

Yes, I'm dead serious with this question. How does searching with pip work?
The documentation of the keyword search refers to a "pip search reference" at https://pip.pypa.io/en/stable/user_guide/#searching-for-packages which is everything but a reference.
I can't conclude from search attempts how searching works. E.g. if I search for "exec" I get a variety of results such as exec-pypeline (0.4.2) - an incredible python package. I even get results with package names that have nothing to do with "exec" as long as the term "exec" is in the description.
But strangely I don't see one of my own packages in the list though one of the packages contains exec in it's name. That alone now would lead us to the conclusion that pip (at least) searches for complete search terms in the package description (which my package doesn't have).
But building on that assumption if I search for other terms that are provided in the package description I don't get my package listed either. And that applies to other packages as well: E.g. if I search for "projects" I don't get flask-macros in the result set though the term "projects" clearly exists in the description of flask-macros. So as this contradicts the assumption above this is clearly not the way how searching works.
And interestingly I can search for "macro" and get "flask-macros" as a result, but if I search for "macr" "flask-macros" is not found.
So how exactly is searching performed by pip? Where can a suitable reference be found for this?

pip search looks for substring contained in the distribution name or the distribution summary. I can not see this documented anywhere, and found it by following the command in the source code directly.
The code for the search feature, which dates from Feb 2010, is still using an old xmlrpc_client. There is issue395 to change this, open since 2011, since the XML-RPC API is now considered legacy and should not be used. Somewhat surprisingly, the endpoint was not deprecated in the pypi-legacy to warehouse move, as the legacy routes are still there.
flask-macros did not show up in a search for "project" because this is too common a search term. Only 100 results are returned, this is a hardcoded limit in the elasticsearch view which handles the requests to those PyPI search routes. Note that this was reduced from 1000 fairly recently in PR3827.
Code to do a search with an API client directly:
import xmlrpc.client
client = xmlrpc.client.ServerProxy('https://pypi.org/pypi')
query = 'project'
results = client.search({'name': query, 'summary': query}, 'or')
print(len(results), 'results returned')
for result in sorted(results, key=lambda data: data['name'].lower()):
print(result)
edit: The 100 result limit is now documented here.

Related

Scatterplot in seaborn 0.8

I'm on a remote server where I have access to seaborn=0.8.1 and have no way of updating the package. I'd like to use seaborn.scatterplot, but I'm getting an error that the function does not exist.
Either the function did not exist in the older version, or it had a different name. I am unable to find earlier versions of the documentation (numpy does a great job of providing manuals for earlier versions), so I am kind of stuck here.
How do I find out the API for earlier versions of seaborn, and whether there are older alternatives to scatterplot?
You can use the GitHub repo to find documentation of older versions - just navigate to the correct tag that corresponds to your version, and enter the doc directory -
https://github.com/mwaskom/seaborn/tree/v0.8.1/doc
It's very common for documentation to be stored alongside the code and versioned along with it.
It seems the scatterplot function was added in 0.9 to seaborn/relational.py, which has the line:
__all__ = ["relplot", "scatterplot", "lineplot"], so "seaborn.scatterplot" gets "created" when seaborn/__init__.py performs from .relational import *.
I'm not sure if there's a similar function in v0.8.1. A quick search shows that class class _RegressionPlotter(_LinearPlotter) has a method called scatterplot in version 0.8.1, but I don't think it's the same (not familiar enough with Seaborn to know).

Wikipedia API Wrapper WPTOOLS

Usually, when using wptools, an API wrapper for Python I get a 'mostviewed' dictionary under 'data' when I do something like this:
site = wptools.site('en.wikipedia.org') or
blah = site.get_info('en.wikipedia.org') or
fdgfdg= site.top('en.wikipedia.org')
The dictionary (under data) returned contains the most viewed pages for, in this case the English version of wikipedia.org
but lately, the dictionary is empty (I changed nothing in the original version and created a new project to test from scratch but nothing). What can I do to regain this functionality cleanly and quickly in Python 3.7?
This seemed to resolve within 24 hours so I assumed A. I had hit some rate limit as caching wasn't happening under development version. or
B. wptools or wikipedia API was undergoing some change.
...but I can't see any rate limits for this usage and issue ocurring again.
Here is Python code I am using resulting in an empty list returned [ ]
wptools_site_object = wptools.site('en.wikipedia.org')
wptools_site_object.top('en.wikipedia.org')
mostviewedtopics = wptools_site_object.data['mostviewed']

Dota 2 API (dota2api) library in Python

Does anyone have experience with the Dota 2 API library in Python called 'dota2api'? I wish to pull a list of 200 recent games filtered by various criteria. I'm using the get_match_history() request (see link). Here's my code:
import dota2api
key = '<key>'
api = dota2api.Initialise(key)
match_list = api.get_match_history(matches_requested=200)
I haven't specified any filters yet, since I can't even get the matches_requested argument to work. When I run this code, I get exactly 100 matches. I fact, no matter how I specify the matches_requested argument, I allways get 100 matches.
Does anyone know if I'm specifying the argument wrong or some other reason why it's working as intended?
Thanks in advance.
For such rarely used libraries it is hard to get an answer here.
I have found this issue on the library's Github:
You can't get more than 500 matches through get_match_history, it's
limited by valve api. One approach you can do is alternate hero_id,
like, requesting with account_id, hero_id and start_at_match_id (none
if first request), values assigned, this way you can get at least 500
matches of each hero from that account_id.
Probably that has since changed and now the parameter is ignored by the API completely. Try creating a new issue on the Github.

Plugin not working on MusicBrainz v1.2

As an exercise for learning, I am trying to write a plugin for MusicBrainz that matches the albumartistsort to albumartist and artistsort to artist, as opposed to the (apparently) default of Last Name, First Name format it is currently using.
I am just learning about Python and therefore I am trying to use another plugin as a guide, but some important changes need to be done and that's where I probably screwed up.
When I try installing the plug in, it doesn't appear in the plugin list although it is copied to the plugin folder; and the .pyo file is not generated. I am guessing this is due to a compilation error, but I haven't been able to include whatever I need so I can use the picard module (don't know where to find it nor import it) so I can test in my python interpreter.
This is the code I have:
PLUGIN_NAME = "Sort Artist and Album Artist"
PLUGIN_AUTHOR = "Kevin Hernandez"
PLUGIN_DESCRIPTION = "Sorts artist/album artist by name as in Artist/Album Artist field instead of Last, First"
PLUGIN_VERSION = "0.1"
PLUGIN_API_VERSIONS = ["0.9.0", "0.10", "0.15", "0.16"]
from picard.metadata import register_album_metadata_processor
import re
def copy_albumartist_to_albumartistsort(tagger, metadata, release):
match = re.search($not($eq(metadata["albumartistsort"],metadata["albumartist"])))
if match:
metadata["albumartistsort"] = metadata["albumartist"]
def copy_artist_to_artistsort(tagger, metadata, release):
match = re.search($not($eq(metadata["artistsort"],metadata["artist"])))
if match:
metadata["artistsort"] = metadata["artist"]
register_album_metadata_processor(copy_albumartist_to_albumartistsort)
register_album_metadata_processor(copy_artist_to_artistsort)
and I also tried defining the functions as:
def copy_albumartist_to_albumartistsort(tagger, metadata, release):
metadata["albumartistsort"] = metadata["albumartist"]
def copy_artist_to_artistsort(tagger, metadata, release):
metadata["artistsort"] = metadata["artist"]
I must point out that I don't fully understand when these are called. I believe the plugin documentation here, here and here is not enough to follow the plugins they have there (e.g. the search and match methods they use in different plugins with re are not explained in the documentation links I am referring to.
If there's a more thorough documentation for it, you can pinpoint what I am doing wrong in my code, or know how to include the picard module in an interpreter (where to find it AND how to include it), then your comments are very much appreciated and valid answers to this question.
I think your biggest problem is that you're mixing up the plugin API with the tagger scripting language.
Tagger scripts are written in a simple custom language; plugins are written in Python. You can't mix and match syntax between the two languages. In particular:
match = re.search($not($eq(metadata["albumartistsort"],metadata["albumartist"])))
That $not, $eq, etc. doesn't mean anything in Python. If you want to check whether things are equal, you use the == operator. If you want to use re.search, you use regular expression syntax. And so on.
Also, your code has to be valid Python, with valid indentation. Your code, at least as posted here.
But let's go through your questions one by one:
I am trying to write a plugin for MusicBrainz that matches the albumartistsort to albumartist and artistsort to artist, as opposed to the (apparently) default of Last Name, First Name format it is currently using.
There are very few automatic defaults in MusicBrainz. Each artist has a name and a sort name, in the database, entered by a human user and verified by other users. You can see this from the web interface. For example, go to David Bowie, and in the "Artist Information" panel on the right side, you'll see "Sort name: Bowie, David". If you're not used to using MusicBrainz's web interface, you should explore it before trying to expand Picard.
When I try installing the plug in, it doesn't appear in the plugin list although it is copied to the plugin folder; and the .pyo file is not generated. I am guessing this is due to a compilation error
Yep. If you run Picard from the command-line with the -d flag, it will show you the errors instead of just silently disabling your plugin, so you don't have to guess. This is documented under Troubleshooting. (If you're on a Mac, the path will be something like /Applications/MusicBrainz Picard.app/Contents/MacOS/MusicBrainz Picard; I think the docs don't explain that because it's standard OS X app bundle stuff.)
but I haven't been able to include whatever I need so I can use the picard module (don't know where to find it nor import it) so I can test in my python interpreter.
You really can't test it in your interpreter. Picard bundles its own custom-built Python interpreter, rather than using your system Python. In that custom interpreter, the picard package is on the sys.path, but in your system Python interpreter, it's not. And trying to import that package and use things out of it while not actually running the Picard GUI wouldn't be a good idea anyway.
If you really want to explore what's in the picard package, download the source and run a local build of the code. But you really shouldn't need to do that. You shouldn't need any functions beyond those documented in the API, and if you want to debug things, you want to debug them in the right context, which generally means adding print functions and/or using the logging module in your code.
I must point out that I don't fully understand when these are called.
At some point after each album is downloaded from the MusicBrainz server, all registered album processor functions get called with the album, and all registered track processor function get called with each track on the album.
Notice that an album processor isn't going to be able to change track-level fields like the track artist sort; you will need a track processor for that.
e.g. the search and match methods they use in different plugins with re are not explained in the documentation links I am referring to.
That's because they're part of the Python standard library, which is documented as part of the standard Python docs—in this case, see re.
You're expected to know the basics of Python before being able to write a Picard plugin.
Meanwhile, I'm not sure what you were trying to write here, but it looks like a really convoluted attempt to say "if these two fields aren't equal, make them equal". Which does the same thing as "unconditionally make them equal". So, why even bother with the regexp and the if condition?
So, your functions could be simplified to:
def copy_albumartist_to_albumartistsort(tagger, metadata, release):
metadata["albumartistsort"] = metadata["albumartist"]
def copy_artist_to_artistsort(tagger, metadata, release, track):
metadata["artistsort"] = metadata["artist"]
register_album_metadata_processor(copy_albumartist_to_albumartistsort)
register_track_metadata_processor(copy_artist_to_artistsort)
However, you really don't need a plugin here at all. You should be able to write this whole thing as a trivial tagger script:
$set(artistsort,%artist%)
$set(albumartistsort,%albumartist%)

xgoogle python library is not working any more?

I was using the xgoogle python library for one of my projects. It was working fine till recently. I am not getting the resultset that I used to get before. If anyone who has used this library written by Peter Krummins, faced a similar situation, can you please suggest a work around ?
The presence of BeautifulSoup.py hints that this library uses web scraping to get its result.
A common problem with this is that it will easily break when the design/layout of the page being scraped changes. And the problem you see seems to coincide with the new search results layout that Google introduced just recently.
Another problem is that it often is against the terms of service of the site being scraped. And according to point 5.3 of the Google Terms Of Service it actually is:
You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) [...]
A better idea would be to use the Custom Search API.
Peter Krumin's product xgoogle looks to be extremely useful both to me and I image many others.
https://github.com/pkrumins/xgoogle
For me the current version is 1.3 is not working.
I tried a new install from GitHub, ran the examples and nothing is returned.
Adding a debugger to the source code and tracing the data captured in a query to its disappearance the problem occurs in a routine called search.py subroutine "_extract_results" at a parser command
results = soup.findAll('li', {'class': 'g'})
The soup object has material in it but the "findAll" fails to return anything.
Looks like its searching for lists and if there are none it returns nothing.
I am unsure what html you would try to match to get a result.
If anyone knows how to make the is work I am very interested.
A little more googling and it appears xgoogle is no longer supported or works.
Part of the trouble is that Google changes the layout of its results pages every so often and so any scraping software that assumes some standard layout is in time doomed to failure.
There are however other search engines that are locally installed and thus provide a results layout that are less likely change with upgrades and will not change at all if you don't upgrade.
I am currently investigating Yacy. Easy to install and can be pointed at specific sites if you want.

Categories

Resources