I am currently working on a tool for Python developers who use the Spyder IDE. We want a consistent code format, but the developers are not willing to change to IDEs that have plugins to do this automatically.
I have been testing the YAPF library, and am looking for a way to have the code automatically formatted in this style whenever a commit or push happens in GitLab.
Do I need some kind of workflow? Is this considered similar to CI pipelines? I am unsure how to tackle this.
Any feedback is helpful, and greatly appreciated.
"I have been testing the YAPF library, and am looking for a way to have the code automatically formatted in this style whenever a commit or push happens in GitLab."
You don't want to do that: you don't want your CI environment to modify commits, because either (a) you'll end up duplicating every commit with a second "fix the formatting" commit, or (b) you'll make the history of your GitLab repository diverge from the history of the person who submitted the change (if you replace the commit).
The typical solution here is to update your CI to reject pull-/merge-requests that don't meet your formatting standards. At the same time, provide developers with a tool they can run locally that will apply the correct formatting to files before they submit a change request.
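For example, a minimal check could run as a GitLab CI job whose script section looks something like this (a sketch: it assumes YAPF reads your style from a .style.yapf or setup.cfg in the repository, and that your YAPF version exits non-zero from --diff when reformatting is needed):

    # script: section of a .gitlab-ci.yml job.
    # yapf --diff prints the changes it would make and exits non-zero
    # if any file is not already formatted, failing the pipeline
    # without modifying the commit.
    pip install yapf
    yapf --diff --recursive .

Locally, developers can run the same tool in-place before they push: yapf --in-place --recursive .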
If your developers are committing directly to the repository, rather than operating through merge requests, your options are far more limited (at that point, you have a social problem, not a technical problem).
I feel like I am doing something wrong. We have some projects that produce pip packages in CI whenever we push a commit. I am using setuptools_scm to produce a version number based upon the last tag. I have two problems that I am struggling to solve.
Let's say we have a scenario where two developers are working in two different feature branches. Whenever either of them commits their code, our CI produces a new pip package and pushes it to a development pypi server. The version contains information about the previous tag and the commit hash, but it doesn't contain any information about the feature branch that produced it. If I look at the pypi server I will see packages from both developers. As far as I can see, I can't tell which packages came from which feature branch without significant effort.
If someone wants to test out the feature branch, then they need to figure out the exact version number produced by setuptools_scm - something like package-0.1.dev41+gabcdef12. This is painful to communicate every time someone pushes a new commit. It would be nice if the branch name were somehow part of the version. (Something like package-0.1.branch.dev41+gabcdef12. Then the user could do a pip install package==0.1.branch to get the latest from my branch. But I see that this is not a valid version.)
I've looked at https://the-hitchhikers-guide-to-packaging.readthedocs.io/en/latest/specification.html and the various PEPs that it references. The only place where I could reasonably put a branch name would be in the local version segment. This would solve the first problem, since I could easily see which feature branch each package came from. But it doesn't help me with testing out a feature branch.
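For illustration, one CI-side way to get the branch into the local segment without touching setup.py might be setuptools_scm's SETUPTOOLS_SCM_PRETEND_VERSION override (a sketch only; the branch-name sanitization here is crude, python -m setuptools_scm needs a reasonably recent setuptools_scm, and python -m build needs the build package):

    # Compute the default version, append a sanitized branch name to
    # the local segment, and override what setuptools_scm reports:
    BASE="$(python -m setuptools_scm)"    # e.g. 0.1.dev41+gabcdef12
    BRANCH="$(git rev-parse --abbrev-ref HEAD | tr -cs 'a-zA-Z0-9' '.')"
    export SETUPTOOLS_SCM_PRETEND_VERSION="${BASE}.${BRANCH%.}"
    python -m build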
I know that I could produce an alpha/beta/rc tag and use that. But this doesn't map to their intended use. An rc would normally have several commits from many feature branches that were merged since the last release, not a new rc for every commit on a feature branch.
I know that I'm not the only one using git and pip packages. Since I can't find a solution to the problem, I worry that I might be thinking about it wrong. Are there commonly used or standardized ways to handle these issues?
For those who may come across this in the future, I think the best solution is to not package feature branches. Pip allows us to install from a feature branch via the pip install git+${REPO_URL}@branch syntax. This syntax works on the command line, in requirements.txt files, and with tools like pip-compile. The user can tie themselves to the head of a particular branch or to a specific commit.
The syntax is not the easiest to remember, but it does a very effective job of allowing me to share a feature branch. When I want to make a release, I can then tag the repo and create a package that is more publicly consumable.
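For example (the repository URL, package, and branch names here are placeholders):

    # Track the head of a feature branch (re-resolved on each install):
    pip install "git+https://gitlab.example.com/team/package.git@my-feature"

    # Or pin to an exact commit:
    pip install "git+https://gitlab.example.com/team/package.git@abcdef12"

    # The same direct-reference spec works as a line in requirements.txt:
    # package @ git+https://gitlab.example.com/team/package.git@my-feature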
I use pre-commit to check my Python code style before committing to git.
It works very well. But now I am working on an old project, and most of my changes only touch a few lines of a file. Yet pre-commit always asks me to fix all the PEP 8 issues in the whole file.
Is there a way to make pre-commit check only the code I just modified, not the whole file?
There is not, in general. pre-commit operates at the file level and does not parse the underlying tool's output in any way.
If an underlying tool supports partial changes then you can use that individual tool's features, but doing so at the framework level is difficult, error-prone (there is no standardization in tool output), and most of all slow (re-parsing output and diffs is not cheap).
When adapting to an older code base, you're best off taking a first pass that improves the code quality and then enabling your linting tools. Often you can leverage a code formatter such as autopep8 or black to make adoption easier.
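A sketch of that one-time cleanup pass (the formatter choice and commit message are illustrative):

    # Reformat the whole code base once, in its own commit, so later
    # diffs are not drowned in style fixes:
    pip install black
    black .
    git commit -am "Apply black formatting"

    # Then enable the hooks for all future commits:
    pre-commit install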
Disclaimer: I'm the author of pre-commit.
I'm new to development and trying to upload small projects that I've worked on to my GitHub profile. These projects are not dependent on each other.
My issue is that some of them are small single-file projects. Sort of like mini challenges that I've solved. So I'm thinking of grouping them together under one repo called "Python programming", for example.
Is this a good practice?
If yes, how should I go about it in Git, and
how can I still have a README file showing up for each mini project?
If no, what would you recommend doing?
GitHub will render a README file for every folder you visit, so when using just a single repository, one solution would be to create one subfolder per “subproject”, each of which can then have its own README file.
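For instance, a layout along these lines (names purely illustrative) gives every mini project its own rendered README:

    python-programming/
    ├── README.md            # top-level overview and index
    ├── challenge-one/
    │   ├── README.md
    │   └── solution.py
    └── challenge-two/
        ├── README.md
        └── solution.py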
But before going that route, you should think about if those small projects actually belong together. That’s ultimately what should decide whether you want to put them all in the same repository or whether you want to split it up into several repositories.
Some things to consider for that decision:
If the projects do not depend on one another, do they still relate to one another? For example, are those projects part of a bigger programming challenge like Project Euler and you’re just collecting all your solutions? Then a single repository might make more sense.
What is the chance for individual projects to grow into bigger things? Many things start very small but can eventually grow into real things that justify their own repository. At that point, you might even get others to contribute.
Does it make sense for those individual files to share a history? Are the files even going to be edited once they are “done”? I.e. is this just a collection of finished things, or are they actually ongoing experiments?
Ultimately, it comes down to your personal choice. But GitHub, as the repository host, should not be driving your decision. You should create Git repositories locally as it makes sense to you. If that means you have just a single one, that’s fine. If that means you create lots of them, that’s also fine.
Unfortunately, the GitHub UI is not really made for small one-off projects. The repository list is just too unorganized for that. If you decide to create many small repositories, I advise you to add some prefix for categorization within your GitHub profile, so you know what each one is about.
A good alternative for one-off projects, especially when it’s just a single file (or a few), is a Gist. Gists were born as a way to share code snippets, but under the hood every Gist is actually a full Git repository. Of course, Gists do not offer the tools normal repositories on GitHub have (e.g. issues, pull requests, wikis). But for what you describe, you probably need none of those. Then, Gists are a fine way to share simple things without adding full repositories to your profile. And you can still clone them (the remote URL is git@gist.github.com:<gist-id>.git) and have a full history and support for multiple files if you need those.
Commonly, you'll see that the top level of the repo contains the README file, maybe a setup.py and some other project metadata, and perhaps a tests folder. Then there will be a folder that shares a name with the repo. Inside of that folder is the code that's intended to be the core content of the module/package/script.
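A typical layout of that shape might look like this (all names illustrative):

    myproject/
    ├── README.md
    ├── setup.py
    ├── tests/
    │   └── test_core.py
    └── myproject/
        ├── __init__.py
        └── core.py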
It's also not unusual to see different organization, particularly with very small projects or single-file scripts.
For the specific case you mention, do whatever you like. What you propose sounds totally reasonable to me. I would not want to have a separate repo for all the challenges I solve!
I usually use a gist for trivial items I don't necessarily want to make a repo for, including coding challenges. So I would offer that as an alternative. Do whatever suits you best.
I work on a Python web application with 20+ dependency packages. I'd love to somehow get feeds of updates for all of these packages, so that I could look at their changelogs and update quickly if the package fixes important bugs or potential security vulnerabilities. Is there a way for me to do this without hunting down the project homepage RSS feed (if one exists) for all 20 packages individually?
Ideally I'd like to be able to slurp our requirements.txt file and construct a feed automatically, but I'd settle for just manually curating a list of RSS feeds or manually subscribing some email address to a bunch of email lists.
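Since PyPI publishes a per-project releases feed at https://pypi.org/rss/project/<name>/releases.xml, one crude approach would be a script along these lines (a sketch: it assumes simple requirement lines, GNU grep/sed, and strips extras and version specifiers naively):

    # Emit one PyPI RSS feed URL per requirement in requirements.txt:
    grep -vE '^\s*(#|$)' requirements.txt \
      | sed -E 's/[<>=!~;[ ].*//' \
      | while read -r pkg; do
          echo "https://pypi.org/rss/project/${pkg}/releases.xml"
        done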
Maybe https://requires.io/ is interesting to you?
It shows the status of all your dependencies and alerts you for security/outdated ones as well.
See here for example https://requires.io/github/edx/edx-platform/requirements/?branch=master
GitHub seems to have launched this feature (for other languages too!) since I posted this. See here.
If the Python packages are on PyPI, you can use libraries.io to select packages you want release emails for. This answer to a related question has additional details.
I am looking for a better patch tool than the ones built into Mercurial, or a visual tool to help me edit patches so that they will be accepted by Mercurial or GNU patch.
The Mercurial wiki has a topic, HandlingRejects, which shows how easy it is to make patching fail. I am looking to implement a Mercurial-based workflow for version control that is built around feature branches and relies on exporting and reviewing patches before they are integrated. In my initial testing of this idea, the weakest link in my "patch out" and "review, accept, and modify" workflow is the way that patch rejection shuts me down.
Here is a common case where Mercurial patch imports fail on me:
Trivial changes to both upstream repoA and feature branch repoB, where a single line is added somewhere on both branches. Since both branches have a history, it should be possible for a merge tool to see that "someone added one line in repoA, and someone else added one line in repoB". In the case of patch imports, though, this results in a patch import reject and a .rej file turd in your repository which you have to manually repair (by editing the .rej file until it can be applied).
The wiki page above mentions the mpatch tool that is found here. I am looking for other, better merge tools that (a) work with Mercurial, and (b) can handle the trivial case noted in the HandlingRejects wiki page above. Note that mpatch does not work for my purposes; it seems I need something that is more of a rebase tool than a patch tool, and in my case, I may have to make the patch tool syntax-aware (and thus specific to a single programming language).
I am looking for tools that work on Windows, either natively or via something like Cygwin. I am not using a Unix/Linux environment, although I am comfortable with Linux/Unix style tools.
I am not currently using the mq extension, just exporting ranges of changes using hg export and importing them using hg import; the rest of the work is my own invention. However, I have tagged this mq, as mq users will be familiar with this .rej handling problem.
A related question here shows ways of resolving such problems while using TortoiseHg.
Emacs is quite capable of handling .rej files.
However, I try to use hg pull --rebase whenever practical. Often I find myself wanting to rebase some lineage of patches onto another changeset that I've already pulled. In these cases I just strip the changesets and pull them in again from .hg/strip-backup, allowing me to use --rebase.
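In concrete terms, that strip-and-repull dance looks roughly like this (a sketch: the changeset ID is hypothetical, the exact backup bundle name varies by Mercurial version, and both the strip and rebase extensions must be enabled):

    # Remove the changesets from the local repo; hg strip writes them
    # to a backup bundle before deleting anything:
    hg strip --rev 1234abcd
    # Pull them back out of the backup bundle, rebasing onto the new parent:
    hg pull --rebase .hg/strip-backup/1234abcd-backup.hg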