Right way to publish authors on PyPI from setuptools

I currently use setuptools to build my Python package, and I have declared the two authors this way in my pyproject.toml file:
authors = [
{name = "X Y", email = "x.y@tt.net"},
{name = "Z H", email = "z.h@tt.net"},
]
Everything works and I can publish the package on PyPI, but only the first author is shown. How can I display both authors?
I have tried the following syntax:
authors = ["X Y <x.y@tt.net>, Z H <z.h@tt.net>"]
But I have the following error
ValueError: invalid pyproject.toml config: `project.authors[{data__authors_x}]`.
configuration error: `project.authors[{data__authors_x}]` must be object
Notice that I specify:
[build-system]
requires = ["setuptools","numpy","scipy","wheel"]
build-backend = "setuptools.build_meta"

Your original notation is the correct one:
authors = [
{name = "X Y", email = "x.y@tt.net"},
{name = "Z H", email = "z.h@tt.net"},
]
but there are some issues that are out of your control.
On one hand it is not entirely clear how this should translate into the Core Metadata notation, which is the notation used inside the distribution artifacts (wheel), and which is then extracted and displayed by PyPI.
On the other hand, the build back-ends (setuptools included) are not explicit about how they transform from pyproject.toml notation to Core Metadata notation, and they tend to silently pick the first item of the list and ignore the following ones.
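To make the intended mapping concrete, here is a minimal sketch of how the pyproject.toml metadata specification (first reference below) describes turning the authors table into Core Metadata fields: an entry with only a name goes into Author, an entry with an email goes into Author-email (as "name <email>" when both are given), and multiple values are joined with commas. The function name authors_to_core_metadata is hypothetical, and this is not the code of setuptools or of any other back-end.

```python
def authors_to_core_metadata(authors):
    """Map pyproject.toml author tables to Core Metadata fields (sketch)."""
    names = []   # entries that only have a name -> "Author"
    emails = []  # entries that have an email -> "Author-email"
    for entry in authors:
        name = entry.get("name")
        email = entry.get("email")
        if name and email:
            emails.append(f"{name} <{email}>")
        elif email:
            emails.append(email)
        elif name:
            names.append(name)

    fields = {}
    if names:
        fields["Author"] = ", ".join(names)
    if emails:
        fields["Author-email"] = ", ".join(emails)
    return fields


authors = [
    {"name": "X Y", "email": "x.y@tt.net"},
    {"name": "Z H", "email": "z.h@tt.net"},
]
print(authors_to_core_metadata(authors))
```

With both name and email given for every author, everything ends up in a single Author-email field, so a back-end that keeps only the first list item loses the second author.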
References:
https://packaging.python.org/en/latest/specifications/declaring-project-metadata/#authors-maintainers
https://packaging.python.org/en/latest/specifications/core-metadata/#author
https://packaging.python.org/en/latest/specifications/core-metadata/#author-email
https://discuss.python.org/t/the-author-maintainer-distinction-problem-and-pep-621/4562
https://discuss.python.org/t/pep-621-round-3/5472/72
https://discuss.python.org/t/pep-621-round-3/5472/86
https://discuss.python.org/t/pep-621-round-3/5472/91

Related

How to Extract Versions from Software Packages

I'm trying to extract the version number from software packages hosted on SourceForge based on this Stack Overflow post. Specifically, I'm using the Release API and the "best_release.json" call. I have the following examples:
7-zip: https://sourceforge.net/projects/sevenzip/best_release.json
KeePass: https://sourceforge.net/projects/keepass/best_release.json
OpenOffice.org:
https://sourceforge.net/projects/openofficeorg.mirror/best_release.json
Using the following code snippet:
import requests
"""
Un/comment the following lines to change the project name and test
different responses.
"""
proj = "keepass"
# proj = "sevenzip"
# proj = "openofficeorg.mirror"
r = requests.get(f'https://sourceforge.net/projects/{proj}/best_release.json')
json_resp = r.json()
print(json_resp['release']['filename'])
I receive the respective results for each package:
7-Zip: /7-Zip/22.00/7z2200-linux-x86.tar.xz
KeePass: /KeePass 2.x/2.51.1/KeePass-2.51.1.zip
Openoffice.org: /extended/iso/en/OOo_3.3.0_Win_x86_install_en-US_20110219.iso
I'm wondering how I can extract the file versions from these disparate packages. Looking at the results, one can see that there are different naming conventions. For example, 7-Zip puts the file version as "22.00" in the second directory level. KeePass, however, puts it in the second directory level as well as the filename itself. OpenOffice.org puts it inside the filename.
Is there a way to do some sort of fuzzy match that can attempt to extract a "best guess" file version given a filename?
I thought of using regular expressions, re. For example, I can use the (\d+) capture group to capture one or more digits, as demonstrated here. However, this would also capture text such as "x86," which I don't want. I just desire some text that looks closest to a version number, but I'm unsure how to do this.
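One possible heuristic, sketched below, is to look only for dotted runs of digits (such as 22.00 or 2.51.1) that do not start in the middle of a word, and then prefer the candidate with the most components. The name guess_version and the tie-breaking rule are assumptions made for illustration; this is a best guess, not a general version parser, and it returns None for names such as 7z2200 that contain no dotted number.

```python
import re

# A dotted run of digits that is not glued to a preceding letter, digit or dot,
# so "x86" and the "2200" in "7z2200" are not candidates.
VERSION_RE = re.compile(r"(?<![A-Za-z0-9.])\d+(?:\.\d+)+")


def guess_version(path):
    """Return the most version-like dotted number in path, or None."""
    candidates = VERSION_RE.findall(path)
    if not candidates:
        return None
    # Prefer the candidate with the most components; the first one wins ties.
    return max(candidates, key=lambda v: v.count("."))


for filename in (
    "/7-Zip/22.00/7z2200-linux-x86.tar.xz",
    "/KeePass 2.x/2.51.1/KeePass-2.51.1.zip",
    "/extended/iso/en/OOo_3.3.0_Win_x86_install_en-US_20110219.iso",
):
    print(filename, "->", guess_version(filename))
```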

PEP508: why either version requirement or URL but not both?

When configuring install_requires=[...] in a setup.py file, we can specify either version numbers:
package >= 1.2.3
or a source:
package @ git+https://git.example.com/some/path/to/package@master#egg=package
But I did not manage to specify both; I got an error for everything I tried.
Looking at the PEP 508, it looks like it is intended:
specification = wsp* ( url_req | name_req ) wsp*
where wsp* just means optional whitespace.
Did I get it correctly that it is not possible to write something like this?
package >= 1.2.3 @ git+https://...
What is the reason for this decision?

I believe this is because a Python package fetched from a URL or GitHub offers no way to choose among historical builds, the way packages stored on PyPI do.
A GitHub or URL reference points to a single snapshot of the code. You can approximate specific versions if the repository has tags or release branches, by updating the URL to reference them:
git+https://git.example.com/some/path/to/package@master#egg=package
git+https://git.example.com/some/path/to/package@develop#egg=package
git+https://git.example.com/some/path/to/package@1.4.2#egg=package

search for all variations of a package name with python

Given a package name, I want to search known websites to see if that package exists. The problem is that package names can be written in upper case, title case, lower case and so on.
For instance, suppose I have a package called: lp-Solve-1.0.tar.gz
Generally, I can find this package on a site like this:
https://www.example.com/packages/l/lp-Solve/lp-Solve-1.0.tar.gz
Other times, the website(s) may have the package named with different letter case:
https://www.example.com/packages/L/lp-solve/lp-solve-1.0.tar.gz
or like this:
https://www.example.com/packages/L/LP-SOLVE/LP-SOLVE-1.0.tar.gz
As you can see, there are many different ways the package can be named on a website.
I have no clue where to begin with this.
I was using this code:
#!/usr/bin/env python2.7
import requests

# website1, website2, ... and packageName are placeholders to fill in.
listOfWebsites = [website1, website2, website3, website4]  # and so on
goodWebsites = []
for eachWebsite in listOfWebsites:
    genURL = eachWebsite + "/" + packageName
    res = requests.head(genURL)
    if res.ok:
        goodWebsites.append(genURL)
but where I'm stuck is how to search the websites for all the possible names a package could be called.
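One way to start, sketched below, is to generate a small set of case variants of the name and build one candidate URL per variant, which the loop above could then probe with requests.head. The helper names name_variants and candidate_urls are hypothetical, and the URL layout (base/first letter/name/name-version.tar.gz) is only assumed from the three example URLs in the question; real sites may use other layouts.

```python
def name_variants(name):
    """Return distinct case variants of a package name, original first."""
    variants = [name, name.lower(), name.upper(), name.title(), name.capitalize()]
    # dict.fromkeys removes duplicates while keeping the order.
    return list(dict.fromkeys(variants))


def candidate_urls(base_url, name, version, ext=".tar.gz"):
    """Build candidate download URLs for every case variant of name."""
    urls = []
    for variant in name_variants(name):
        # The index directory may use either case of the first letter.
        for first in dict.fromkeys((variant[0].lower(), variant[0].upper())):
            urls.append(f"{base_url}/{first}/{variant}/{variant}-{version}{ext}")
    return urls


for url in candidate_urls("https://www.example.com/packages", "lp-Solve", "1.0"):
    print(url)
```

This only covers the common spellings; if a site lists its packages in an index page, comparing names with .lower() on both sides is likely more reliable than guessing URLs.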

Replacing a variable name in text with the value of that variable

I have a template that uses placeholders for the varying content that will be filled in. Suppose the template has:
"This article was written by AUTHOR, who is solely responsible for its content."
The author's name is stored in the variable author.
So I of course do:
wholeThing = wholeThing.replace('AUTHOR', author)
The problem is I have 10 of these self-named variables, and it would be more economical if I could do something like this, using only 4 for brevity:
def(self-replace):
...
return
wholeThing = wholeThing.self-replace('AUTHOR', 'ADDR', 'PUBDATE', 'MF_LINK')

With Python 3.6+, you may find formatted string literals (PEP 498) efficient:
# data from @bohrax
d = {"publication": "article", "author": "Me"}
template = f"This {d['publication']} was written by {d['author']}, who is solely responsible for its content."
print(template)
This article was written by Me, who is solely responsible for its content.

Sounds like what you need is string formatting, something like this:
def get_sentence(author, pub_date):
    return "This article was written by {}, who is solely responsible for its content. This article was published on {}.".format(author, pub_date)
Assuming you are parsing the variables that make up the string iteratively, you can call this function with the arguments needed and get the string returned.
str.format() can be used on any string and takes any number of arguments, as long as the string has a {} placeholder for each of them. I suggest you play around with it in the interpreter or an IPython notebook to get familiar with it.

If you have control over the templates I would use str.format and a dict containing the variables:
>>> template = "This {publication} was written by {author}, who is solely responsible for its content."
>>> variables = {"publication": "article", "author": "Me"}
>>> template.format(**variables)
'This article was written by Me, who is solely responsible for its content.'
It is easy to extend this to a list of strings:
templates = [
"String with {var1}",
"String with {var2}",
]
variables = {
"var1": "value for var1",
"var2": "value for var2",
}
replaced = [template.format(**variables) for template in templates]
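If the templates cannot be changed and keep the bare upper-case placeholders from the question (AUTHOR, ADDR, ...), a single regular-expression pass over a dict of values is one option. This is only a sketch: the function name fill_placeholders is hypothetical, and it assumes every placeholder is a whole word made of letters, digits or underscores.

```python
import re


def fill_placeholders(text, values):
    """Replace each whole-word placeholder in text with its value from values."""
    if not values:
        return text
    # Longest keys first so a longer placeholder is tried before a shorter one.
    keys = sorted(values, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda match: values[match.group(0)], text)


whole_thing = "This article was written by AUTHOR, who is solely responsible for its content."
print(fill_placeholders(whole_thing, {"AUTHOR": "Me"}))
```

Because of the word boundaries, a placeholder is only replaced when it stands alone, so AUTHOR inside a longer word such as AUTHORS is left untouched.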

Writing metadata to a pdf using pyobjc

I'm trying to write metadata to a pdf file using the following python code:
from Foundation import *
from Quartz import *
url = NSURL.fileURLWithPath_("test.pdf")
pdfdoc = PDFDocument.alloc().initWithURL_(url)
assert pdfdoc, "failed to create document"
print "reading pdf file"
attrs = {}
attrs[PDFDocumentTitleAttribute] = "THIS IS THE TITLE"
attrs[PDFDocumentAuthorAttribute] = "A. Author and B. Author"
PDFDocumentTitleAttribute = "test"
pdfdoc.setDocumentAttributes_(attrs)
pdfdoc.writeToFile_("mynewfile.pdf")
print "pdf made"
This appears to work fine (no errors on the console); however, when I examine the metadata of the new file, it is as follows:
PdfID0: 242b7e252f1d3fdd89b35751b3f72d3
PdfID1: 242b7e252f1d3fdd89b35751b3f72d3
NumberOfPages: 4
and the original file had the following metadata:
InfoKey: Creator
InfoValue: PScript5.dll Version 5.2.2
InfoKey: Title
InfoValue: Microsoft Word - PROGRESS ON THE GABION HOUSE Compressed.doc
InfoKey: Producer
InfoValue: GPL Ghostscript 8.15
InfoKey: Author
InfoValue: PWK
InfoKey: ModDate
InfoValue: D:20101021193627-05'00'
InfoKey: CreationDate
InfoValue: D:20101008152350Z
PdfID0: d5fd6d3960122ba72117db6c4d46cefa
PdfID1: 24bade63285c641b11a8248ada9f19
NumberOfPages: 4
So the problems are, it is not appending the metadata, and it is clearing the previous metadata structure. What do I need to do to get this to work? My objective is to append metadata that reference management systems can import.

Mark is on the right track, but there are a few peculiarities that should be accounted for.
First, he is correct that pdfdoc.documentAttributes is an NSDictionary that contains the document metadata. You would like to modify that, but note that documentAttributes gives you an NSDictionary, which is immutable. You have to convert it to an NSMutableDictionary as follows:
attrs = NSMutableDictionary.alloc().initWithDictionary_(pdfdoc.documentAttributes())
Now you can modify attrs as you did. There is no need to write PDFDocument.PDFDocumentTitleAttribute as Mark suggested; that won't work. PDFDocumentTitleAttribute is declared as a module-level constant, so just use it as you did in your own code.
Here is the full code that works for me:
from Foundation import *
from Quartz import *
url = NSURL.fileURLWithPath_("test.pdf")
pdfdoc = PDFDocument.alloc().initWithURL_(url)
attrs = NSMutableDictionary.alloc().initWithDictionary_(pdfdoc.documentAttributes())
attrs[PDFDocumentTitleAttribute] = "THIS IS THE TITLE"
attrs[PDFDocumentAuthorAttribute] = "A. Author and B. Author"
pdfdoc.setDocumentAttributes_(attrs)
pdfdoc.writeToFile_("mynewfile.pdf")

DISCLAIMER: I'm utterly new to Python, but an old hand at PDF.
To avoid smashing all the existing attributes, you need to start attrs with pdfdoc.documentAttributes, not {}. setDocumentAttributes is almost certainly an overwrite rather than a merge (given your output here).
Second, all the PDFDocument*Attribute constants are part of PDFDocument. My Python ignorance is undoubtedly showing, but shouldn't you be referencing them as attributes rather than as bare variables? Like this:
attrs[PDFDocument.PDFDocumentTitleAttribute] = "THIS IS THE TITLE"
That you can assign to PDFDocumentTitleAttribute leads me to believe it's not a constant.
If I'm right, your code will have tried to assign numerous values to a null key in attrs. My Python is weak, so I don't know how you'd check that. Examining attrs prior to calling pdfdoc.setDocumentAttributes_() should be revealing.
