Self update a python script posted on github - python

I have made a bot that tracks the earnings of a mining rig, but the script runs at a friend's house, so whenever I update it I have to send him the new version. My idea is to upload the program to GitHub and have the script periodically check whether the repo that contains it has been modified and, if so, update itself.
I have tried pulling the whole repo, launching the new version and deleting the old files, but I don't think that's a good idea (if I have to deal with big files, for example).
I also found a PyPI package called selfupdate (https://pypi.org/project/selfupdate/), but there is not much documentation and I couldn't figure out how to make it work.
How can I make the script update itself by pulling the newer version from GitHub?
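A minimal sketch of the idea described above, assuming the bot is a single file in a public repo (the raw URL is a placeholder): it re-downloads the raw file and restarts itself whenever the content has changed.

import os
import sys
import requests

RAW_URL = "https://raw.githubusercontent.com/<user>/<repo>/master/bot.py"  # placeholder

def update_and_restart():
    # Fetch the latest version of this file from GitHub.
    latest = requests.get(RAW_URL, timeout=10).text
    with open(__file__) as f:
        current = f.read()
    if latest != current:
        # Overwrite this script and restart the process so the new code runs.
        with open(__file__, "w") as f:
            f.write(latest)
        os.execv(sys.executable, [sys.executable] + sys.argv)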

Related

Converting python program to a usable exe safety

I have written a Python program with a GUI. Now I want to use this tool in my company.
For this I need an .exe so others can use it.
I know my own code, but now I have to compile it to an .exe file with a third-party tool like PyInstaller or pyuic.
How can I be sure these open source tools are safe to use in my company, without risking that hackers have infiltrated them?
Is there any official way or tool to make a usable Windows program from a .py file?
The "official way" to load PyInstaller is done via the pip command.
Open source does not mean everybody can edit the code that ist officialy distributed. If you would edit a copy of a printed law in your house does not mean that the law changes for everyone. Official commits are reviewed and checked against malicious edits.
Did you ever question other Python packages you loaded into your machine? They are distributed the same way.
Malicious actors will sometimes make clones of packages and publish them with a similar name in order to get lucky when people makea typo in the pip command. This is something you should always check.

How to implement an autoupdate on exe files [duplicate]

I wrote 2-3 plugins for pyLoad.
Sometimes they change, and I let users know on the forum that there's a new version.
To avoid that, I'd like to give my scripts an auto self-update function.
https://github.com/Gutz-Pilz/pyLoad-stuff/blob/master/FileBot.py
Is something like that easy to set up?
Or can someone point me in a direction?
Thanks in advance!
It is possible, with some caveats. But it can easily become very complicated. Before you know it, your auto-update "feature" will be bigger than the original code!
First you need to have a URL that always contains the latest version. Since you are using GitHub, raw.githubusercontent.com might do very well.
Have your code download the latest version from that URL (e.g. using requests), and compare the version with that in the current code. For this purpose I would recommend a simple integer version number, so you don't need any complicated parsing logic.
However, you might want to consider only running that check once per day, or once per week. If you do it every time your file is run, the server might get hammered! So now you have to save a file with the date when the check was last done, and read that to see if it is time to run the check again. This file will need to be saved in a location that you can access on every platform your code is liable to run on. That in itself can be a challenge.
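A rough sketch of those two steps, assuming an integer VERSION in the code, a latest-version number published at a raw.githubusercontent.com URL (placeholder below), and a small timestamp file used to throttle the check; all names are illustrative.

import os
import time
import requests

VERSION = 3  # simple integer version of the running code
VERSION_URL = "https://raw.githubusercontent.com/<user>/<repo>/master/VERSION"  # placeholder
STAMP_FILE = os.path.expanduser("~/.plugin_last_check")
CHECK_INTERVAL = 24 * 60 * 60  # at most once per day

def update_available():
    # Skip the check if it already ran recently.
    if os.path.exists(STAMP_FILE) and time.time() - os.path.getmtime(STAMP_FILE) < CHECK_INTERVAL:
        return False
    with open(STAMP_FILE, "w"):
        pass  # touch the stamp file to record this check
    latest = int(requests.get(VERSION_URL, timeout=10).text.strip())
    return latest > VERSION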
If it is just a single python file, which is installed as the user that is running it, updating is relatively easy. But if the original was installed as root in the global Python directory and your script is running as a nonprivileged user it will be difficult. Especially if it is running as a plugin and cannot ask the user for (temporary) root credentials to install the file.
And what are you going to do if a newer version has more dependencies outside the standard library?
Last but not least, as a sysadmin I don't really like auto-updating software. Especially for critical system infrastructure, I like to be able to estimate the consequences before an update.

Using setuptools, how can I download external data upon installation?

I'd like to create some ridiculously-easy-to-use pip packages for loading common machine-learning datasets in Python. (Yes, some stuff already exists, but I want it to be even simpler.)
What I'd like to achieve is this:
User runs pip install dataset
pip downloads the dataset, say via wget http://mydata.com/data.tar.gz. Note that the data does not reside in the python package itself, but is downloaded from somewhere else.
pip extracts the data from this file and puts it in the directory that the package is installed in. (This isn't ideal, but the datasets are pretty small, so let's assume storing the data here isn't a big deal.)
Later, when the user imports my module, the module automatically loads the data from the specific location.
This question is about bullets 2 and 3. Is there a way to do this with setuptools?
As alluded to by Kevin, Python package installs should be completely reproducible, and any potential external-download issues should be pushed to runtime. This therefore shouldn't be handled with setuptools.
Instead, to avoid burdening the user, consider downloading the data in a lazy way, upon load. Example:
import os

DATA_DIR = os.path.join(os.path.dirname(__file__), 'data')  # wherever the package keeps its data

def download_data(url='http://...'):
    # Download; extract data to disk.
    # Raise an exception if the link is bad, or we can't connect, etc.
    pass

def load_data():
    if not os.path.exists(DATA_DIR):
        download_data()
    data = read_data_from_disk(DATA_DIR)
    return data
We could then describe download_data in the docs, but the majority of users would never need to bother with it. This is somewhat similar to the behavior in the imageio module with respect to downloading necessary decoders at runtime, rather than making the user manage the external downloads themselves.
Note that the data does not reside in the python package itself, but is downloaded from somewhere else.
Please do not do this.
The whole point of Python packaging is to provide a completely deterministic, repeatable, and reusable means of installing exactly the same thing every time. Your proposal has the following problems at a minimum:
The end user might download your package on computer A, stick it on a thumb drive, and then install it on computer B which does not have internet.
The data on the web might change, meaning that two people who install the same exact package get different results.
The website that provides the data might cease to exist or unwisely change the URL, meaning people who still have the package won't be able to use it.
The user could be behind an internet filter, and you might get a useless "this page is blocked" HTML file instead of the dataset you were expecting.
Instead, you should either include your data with the package (using the package_data or data_files arguments to setup()), or provide a separate top-level function in your Python code to download the data manually when the user is ready to do so.
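A minimal sketch of the first option, assuming a hypothetical mypackage package whose files live under mypackage/data/:

from setuptools import setup

setup(
    name='mypackage',  # placeholder name
    version='0.1',
    packages=['mypackage'],
    # ship everything under mypackage/data/ inside the package itself
    package_data={'mypackage': ['data/*']},
)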
Python packaging guidelines say that installing a package should not execute arbitrary Python code. This means that you may not be able to download anything during the installation process.
If you want to download some additional data, do it after you install the package; for example, when your package is imported you could download this data and cache it somewhere so it is not downloaded again on every import.
This question is rather old, but I want to add that downloading external data at installation time is of course much better than forcing a download of external content at runtime.
The original problem is that you cannot package arbitrary content into a Python package if it exceeds the maximum size limit of the package registry. This size limit effectively breaks the relationship between the packaged Python code and the data it operates on. Suddenly, things that belong together have to be separated, and the package creator has to take care of versioning and availability of the external data. If the size limit is respected, everything is installed at installation time and the discussion ends here. I want to stress that data and algorithms belong together and are normally installed at the same time, not at some later date; that's the whole point of package integrity. If you cannot install a package because the external content cannot be downloaded, you want to know that at installation time.
In the light of Docker & friends, downloading data at runtime makes a container non-reproducible and forces the download of the external content at each start of the container, unless you additionally add the download path to a Docker volume. But then you need to know exactly where this content is downloaded, and the user/Dockerfile creator has to know more unnecessary details. There are further issues with using volumes in that regard.
Moreover, content fetched at runtime cannot be cached automatically by Docker, i.e. you have to fetch it again after every docker build.
Then again, one could argue that you should provide a function or executable script that downloads this external content, and that the user should run it directly after installation. Again, the user of the package needs to know more than necessary, because someone or some committee proclaims that executing Python code or downloading external content at installation time is not "recommended".
But forcing the user to run an extra script directly after installing a package is effectively the same as downloading the content in a post-installation step, just more user-unfriendly. Given how popular machine learning is today, and how model sizes and the popularity of ML keep growing, this argument means that in the near future there will be a lot of scripts to execute just to download models for a handful of Python package dependencies.
The only time I see a benefit in an extra script is when you can choose between several different versions of the external content, because then the user is intentionally involved in that decision.
But coming back to the runtime on-demand lazy model download, where the user doesn't need to run an extra script: let's assume the user builds the container, all tests pass on CI, and he/she distributes it to Docker Hub or any other container registry and starts production. Nobody wants random failures at that point, because a successfully installed package intermittently downloads content, e.g. after a maintenance task such as cleaning up Docker volumes, or when containers are scheduled onto new k8s nodes and the first request to a web app times out because the external content is always fetched at startup. Or the content is not fetched at all, because the external URL is in maintenance mode. That's a nightmare!
If reasonably sized Python packages were allowed, the whole problem would be much less of an issue. In contrast, the biggest Ruby gems (i.e. packages in the Ruby ecosystem) are over 700 MB, and of course downloading external content at installation time is allowed there.

Contributing to a repository on GitHub on a new branch

Say someone owns a repository with only a master branch, hosting code that is compatible with Python 2.7.x. I would like to contribute to that repository with my own changes on a new branch new_branch, to offer a variant of the repository that is compatible with Python 3.
I followed the steps here:
I forked the repository on GitHub on my account
I cloned my fork on my local machine
I created a new branch new_branch locally
I made the relevant changes
I committed and pushed the changes to my own fork on GitHub
I went to the GitHub page of the official repository in the browser and opened a pull request
The above worked, but it opened a pull request from "my_account:new_branch" to "official_account:master". This is not what I want, since Python 2.7.x and Python 3 are incompatible with each other. What I would like to do is create a PR to a new branch on the official repository (e.g. one with the same name, "new_branch"). How can I do that? Is this possible at all?
You really don't want to do things this way. But first I'll explain how to do it, then I'll come back to explain why not to.
Using Pull Requests at GitHub has a pretty good overview, in particular the section "Changing the branch range and destination repository." It's easiest if you use a topic branch, and have the upstream owner create a topic branch of the same name; then you just pull down the menu where it says "base: master" and the choice will be right there, and he can just click the "merge" button and have no surprises.
So, why don't you want to do things this way?
First, it doesn't fit the GitHub model. Topic branches that live forever in parallel with the master branch and have multiple forks make things harder to maintain and visualize.
Second, you need both a git URL and an https URL for your code. You need people to be able to share links, pip install from the top of the tree, just clone the repo instead of cloning and then checking out a different branch, etc. This all means your code has to be on the master branch.
Third, if you want people to be able to install your 3.x version from PyPI, find docs at readthedocs, etc., you need a single project with a single source tree. Most such sites have a single latest version, not a latest version for each Python version, and definitely not multiple variations of the same version. (You could instead completely fork the project and create a separate foo3 project. But it's much easier for people to be able to pip install foo than to have them try that, fail, come to SO and ask why it doesn't work, and get told they probably have Python 3 and need to pip install foo3 instead.)
How do you merge two versions into a single package? The porting docs should have the most up-to-date advice, but briefly: If it's at all possible to create a single codebase that runs on both versions, that's ideal; if not, and if you can't make things work by running 2to3 or 3to2 at install time, create a parallel directory for the 3.x code (e.g., a foo3 alongside foo) and pick the appropriate directory at install time. (You can always start with that and gradually work toward a unified codebase.)
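For the parallel-directory variant, a rough setup.py sketch that picks the source tree at install time (using the foo/foo3 names from the example above):

import sys
from setuptools import setup

# Install the 3.x tree under Python 3, the 2.x tree otherwise.
source_dir = 'foo3' if sys.version_info[0] >= 3 else 'foo'

setup(
    name='foo',
    version='1.0',
    packages=['foo'],
    package_dir={'foo': source_dir},
)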

pulling and integrating remote changes with pygit2

I have the following problem. I'm writing a script which searches a folder for repositories, looks up the remotes on the net and pulls all new data into the repository, notifying me about new changes. The main idea is clear. I'm using Python 2.7 on Windows 7 x64, using pygit2 to access the git features. The command line supports the simple command "git pull 'origin'", but the git API is more complicated and I don't see the way. Okay, I came this far:
import pygit2

orepository = pygit2.Repository("path/to/repository/.git")
oremote = orepository.remotes[0]
result = oremote.fetch()
This code retrieves the new objects and downloads them into the repository, but it doesn't update the master branch or check out the new data. Inspecting the repository with TortoiseGit, I see that nothing was checked out; even the new log messages don't appear in the log. I still need to use the git pull command to refresh the repository and working copy. Now my question: what do I need to do to achieve all that using pygit2? I mean, I download the changes by fetching them, but what do I need to do then? I want to update the master branch and the working copy too...
Thank you in advance for helping me with my problem.
Best Regards.
Remote.fetch() does not update the files in the workdir because that's very far from its job. If you want to update the current branch and checkout those files, you need to also perform those steps, via Repository.create_reference() or Reference.target= depending on what data you have at the time, and then e.g. Repository.checkout_head() if you did decide to update.
git-pull is a script that performs very many different steps depending on the configuration and flags passed. When you're writing a tool to simulate it over multiple repositories, you need to figure out what it is that you want to do, rather than hoping everything is set up just so that git-pull won't surprise you.
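Putting the steps named above together, a fast-forward-only sketch (assuming the branch to update is master tracking origin/master, that forcing the checkout is acceptable, and noting that the exact reference API varies a bit between pygit2 versions):

import pygit2

repo = pygit2.Repository("path/to/repository/.git")
repo.remotes[0].fetch()

# Point the local branch at the commit the remote branch now points to.
remote_ref = repo.lookup_reference("refs/remotes/origin/master")
local_ref = repo.lookup_reference("refs/heads/master")
local_ref.set_target(remote_ref.target)  # older pygit2: local_ref.target = remote_ref.target

# Update the files in the working copy to match the moved branch.
repo.checkout_head(strategy=pygit2.GIT_CHECKOUT_FORCE)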
