I wrote 2-3 plugins for pyLoad.
Sometimes they change, and I let users know on the forum that there's a new version.
To avoid that, I'd like to give my scripts an automatic self-update function.
https://github.com/Gutz-Pilz/pyLoad-stuff/blob/master/FileBot.py
Is something like that easy to set up?
Or can someone point me in the right direction?
Thanks in advance!
It is possible, with some caveats. But it can easily become very complicated. Before you know it, your auto-update "feature" will be bigger than the original code!
First you need a URL that always contains the latest version. Since you are using GitHub, raw.githubusercontent.com might do very well.
Have your code download the latest version from that URL (e.g. using requests), and compare the version with that in the current code. For this purpose I would recommend a simple integer version number, so you don't need any complicated parsing logic.
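A minimal sketch of that comparison, assuming the script declares its version as a module-level integer named __version__. That name, the URL placeholder and the helper names are illustrative assumptions, not something taken from the plugin. It uses urllib from the standard library instead of requests, purely to avoid an extra dependency.

```python
import re
import urllib.request

# Hypothetical: the script declares its own version as a plain integer.
__version__ = 3

# Hypothetical placeholder; point this at the raw file in your own repository.
LATEST_URL = "https://raw.githubusercontent.com/<user>/<repo>/master/<script>.py"

# Matches a line of the form "__version__ = 12" and nothing else on that line.
VERSION_RE = re.compile(r"^__version__\s*=\s*(\d+)\s*$", re.MULTILINE)


def parse_version(source):
    """Return the integer version declared in `source`, or None if absent."""
    match = VERSION_RE.search(source)
    return int(match.group(1)) if match else None


def fetch_latest_source(url=LATEST_URL, timeout=10):
    """Download the latest script text (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def update_available(remote_source, local_version=__version__):
    """True only if the remote text declares a strictly higher version."""
    remote_version = parse_version(remote_source)
    return remote_version is not None and remote_version > local_version
```

Because update_available() returns False when no version line is found, an error page served in place of the script is not mistaken for an update.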
However, you might want to consider only running that check once per day, or once per week. If you do it every time your file is run, the server might get hammered! So now you have to save a file with the date when the check was last done, and read that to see if it is time to run the check again. This file will need to be saved in a location that you can access on every platform your code is liable to run on. That in itself can be a challenge.
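The rate limiting could look roughly like this. It is a sketch only: the one-day interval, the function names and the idea of storing a bare timestamp in a small text file are assumptions, and choosing a stamp-file location that works on every platform is left to the caller.

```python
import os
import time

CHECK_INTERVAL = 24 * 60 * 60  # one day, in seconds (assumed default)


def should_check(stamp_path, interval=CHECK_INTERVAL, now=None):
    """True if no check is recorded or the last one is older than `interval`."""
    now = time.time() if now is None else now
    try:
        with open(stamp_path) as stamp:
            last_check = float(stamp.read().strip())
    except (OSError, ValueError):
        return True  # missing or corrupt stamp file: check again
    return now - last_check >= interval


def record_check(stamp_path, now=None):
    """Store the time of the check that was just performed."""
    now = time.time() if now is None else now
    os.makedirs(os.path.dirname(os.path.abspath(stamp_path)), exist_ok=True)
    with open(stamp_path, "w") as stamp:
        stamp.write(repr(now))
```

A caller would test should_check() first, run the update check only if it returns True, and then call record_check().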
If it is just a single python file, which is installed as the user that is running it, updating is relatively easy. But if the original was installed as root in the global Python directory and your script is running as a nonprivileged user it will be difficult. Especially if it is running as a plugin and cannot ask the user for (temporary) root credentials to install the file.
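For the easy case (a single file in a directory the current user can write to), the replacement step might be sketched as follows. The function name is hypothetical, and the sketch deliberately gives up when it lacks write permission instead of trying to obtain root credentials.

```python
import os
import shutil
import tempfile


def install_update(new_source, target_path):
    """Replace `target_path` with `new_source`; return False if not permitted."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Without write access to the directory (e.g. a root-owned install)
    # a non-privileged plugin cannot update itself, so report failure.
    if not os.access(target_dir, os.W_OK):
        return False
    # Write to a temporary file in the same directory, then swap it in, so a
    # failed write never leaves a half-written script behind.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_source)
        if os.path.exists(target_path):
            # Keep the permission bits of the file being replaced.
            shutil.copymode(target_path, tmp_path)
        os.replace(tmp_path, target_path)
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False
    return True
```

The running process keeps executing the old code; the new version only takes effect the next time the file is loaded.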
And what are you going to do if a newer version has more dependencies outside the standard library?
Last but not least, as a sysadmin I don't really like auto-updating software. Especially for critical system infrastructure, I like to be able to estimate the consequences before an update.
I have created a Python-based tool for my teammates that groups similar JIRA tickets, which makes it easier to pick the highest-priority one first. The problem is that every time I make changes, I have to ask people to get the latest version from the Perforce server. So I am looking for a mechanism where, whenever anyone uses the tool, a pop-up appears saying "New version available, please install".
Can anyone help me achieve that?
On startup, or periodically while running, you could have the tool query your Perforce server and check the latest version. If it doesn't match the version currently running, then you would show the popup, and maybe provide a download link.
I'm not personally familiar with Perforce, but in Git for example you could check the hash of the most recent commit. You could even just include a file with a version number that you manually increment every time you push changes.
I have an idea: you can use the requests module to fetch a page on your website that contains the latest version number.
Then read the version on the user's computer and compare it with the official version. If it is different from or lower than the official version, pop up a window to remind the user to update.
You could keep the latest version number of the tool on your server and have your tool check it periodically against its own version number. If the version on the server is higher, your tool needs to be updated, and you can tell the user accordingly or show a pop-up recommending an update.
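The comparison itself can be sketched like this, assuming dotted version strings such as "1.4.10". The format, the function names and the message text are illustrative assumptions; how the latest version is fetched and how the pop-up is displayed depend on your server and GUI toolkit.

```python
def version_tuple(text):
    """Turn a dotted version such as '1.4.10' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def update_notice(current, latest):
    """Return a message for the pop-up, or None if the tool is up to date."""
    if version_tuple(latest) > version_tuple(current):
        return "New version available: {} (you have {}). Please install it.".format(
            latest, current
        )
    return None
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.4.9" sorts after "1.4.10".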
I'd like to create some ridiculously-easy-to-use pip packages for loading common machine-learning datasets in Python. (Yes, some stuff already exists, but I want it to be even simpler.)
What I'd like to achieve is this:
1. User runs pip install dataset
2. pip downloads the dataset, say via wget http://mydata.com/data.tar.gz. Note that the data does not reside in the python package itself, but is downloaded from somewhere else.
3. pip extracts the data from this file and puts it in the directory that the package is installed in. (This isn't ideal, but the datasets are pretty small, so let's assume storing the data here isn't a big deal.)
4. Later, when the user imports my module, the module automatically loads the data from the specific location.
This question is about bullets 2 and 3. Is there a way to do this with setuptools?
As alluded to by Kevin, Python package installs should be completely reproducible, and any potential external-download issues should be pushed to runtime. This therefore shouldn't be handled with setuptools.
Instead, to avoid burdening the user, consider downloading the data in a lazy way, upon load. Example:
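import os

DATA_DIR = 'data'  # placeholder; choose wherever the data should live

def download_data(url='http://...'):
    # Download; extract data to disk.
    # Raise an exception if the link is bad, or we can't connect, etc.
    pass  # placeholder body

def load_data():
    # Only hit the network the first time; afterwards read the local copy.
    if not os.path.exists(DATA_DIR):
        download_data()
    data = read_data_from_disk(DATA_DIR)  # placeholder reader, not defined here
    return data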
We could then describe download_data in the docs, but the majority of users would never need to bother with it. This is somewhat similar to the behavior in the imageio module with respect to downloading necessary decoders at runtime, rather than making the user manage the external downloads themselves.
"Note that the data does not reside in the python package itself, but is downloaded from somewhere else."
Please do not do this.
The whole point of Python packaging is to provide a completely deterministic, repeatable, and reusable means of installing exactly the same thing every time. Your proposal has the following problems at a minimum:
The end user might download your package on computer A, stick it on a thumb drive, and then install it on computer B which does not have internet.
The data on the web might change, meaning that two people who install the same exact package get different results.
The website that provides the data might cease to exist or unwisely change the URL, meaning people who still have the package won't be able to use it.
The user could be behind an internet filter, and you might get a useless "this page is blocked" HTML file instead of the dataset you were expecting.
Instead, you should either include your data with the package (using the package_data or data_files arguments to setup()), or provide a separate top-level function in your Python code to download the data manually when the user is ready to do so.
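If you go the explicit-download route, one way to guard against changed data or a "this page is blocked" response is to ship the expected checksum inside the package and verify the download against it. This is an added suggestion rather than part of the answer above, and the names below are hypothetical.

```python
import hashlib


class ChecksumError(ValueError):
    """Raised when downloaded bytes do not match the expected digest."""


def verify_sha256(data, expected_hex):
    """Return `data` unchanged if its SHA-256 digest matches, else raise."""
    actual_hex = hashlib.sha256(data).hexdigest()
    if actual_hex != expected_hex.lower():
        raise ChecksumError(
            "expected {} but got {}".format(expected_hex, actual_hex)
        )
    return data
```

The download function would call verify_sha256() on the fetched bytes before extracting anything, so two users of the same package version either get identical data or a clear error.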
Python packaging guidelines state that installing a package should never require executing Python code. This means that you may not be able to download anything during the installation process.
If you want to download some additional data, do it after you install the package. For example, when your package is imported you could download this data and cache it somewhere, so that it is not downloaded again on every import.
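A rough sketch of such a cache follows. The cache directory, the function names and the injectable fetch parameter are assumptions made for illustration.

```python
import os
import urllib.request

# Hypothetical cache location; any directory the user can write to will do.
CACHE_DIR = os.path.join(os.path.expanduser("~"), ".cache", "mydataset")


def fetch_url(url):
    """Default fetcher: return the raw bytes served at `url`."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def cached_file(url, filename, cache_dir=CACHE_DIR, fetch=fetch_url):
    """Return the path of a local copy of `url`, downloading it only once."""
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        data = fetch(url)
        # Write to a temporary name first so an interrupted download is
        # never mistaken for a complete cached file.
        tmp_path = path + ".part"
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, path)
    return path
```

Passing the fetcher as a parameter keeps the caching logic testable without network access.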
This question is rather old, but I want to add that downloading external data at installation time is of course much better than forcing the download of external content at runtime.
The underlying problem is that one cannot put arbitrary content into a Python package if it exceeds the maximum size limit of the package registry. This size limit effectively breaks the relationship between the packaged Python code and the data it operates on. Suddenly things that belong together have to be separated, and the package creator needs to take care of versioning and availability of the external data. If the content fit within the size limit, everything would be installed at installation time and the discussion would end there. I want to stress that data and algorithms belong together and are normally installed at the same time, not at some later date. That's the whole point of package integrity. If a package cannot be installed because its external content cannot be downloaded, you want to know at installation time.
In the light of Docker & friends, downloading data at runtime makes a container non-reproducible and forces the download of the external content at each start of the container unless you additionally add the path where the data is downloaded to a Docker volume. But then you need to know where exactly this content is downloaded and the user/Dockerfile creator has to know more unnecessary details. There are more issues in using volumes in that regard.
Moreover, content fetched at runtime cannot be cached automatically by Docker, i.e. you need to fetch every time after a docker build.
Then again, one could argue that one should provide a function or executable script that downloads this external content, and that the user should run this script directly after installation. Again the user of the package needs to know more than necessary, because someone or some committee proclaims that executing Python code or downloading external content at installation time is not "recommended".
But forcing the user to run an extra script directly after installation of a package is factually the same as downloading the content directly inside a post-installation step, just more user-unfriendly. Thinking about how popular machine learning is today, the growing size of models and popularity of ML in the future, there will be a lot of scripts to be executed for just a handful of Python package dependencies for model downloads in the near future according to this argumentation.
The only time I see a benefit in an extra script is when you can choose between several different versions of the external content, but then one intentionally involves the user in that decision.
But coming back to the lazy, on-demand download at runtime, where the user doesn't need to run an extra script: let's assume the user packages the container, all tests pass on the CI, and he/she pushes it to Docker Hub or any other container registry and starts production. Nobody wants random failures because a successfully installed package intermittently downloads content, e.g. after a maintenance task such as cleaning up Docker volumes, or when containers are scheduled on new k8s nodes and the first request to a web app times out because external content is always fetched at startup. Or the content is not fetched at all, because the external URL is in maintenance mode. That's a nightmare!
If reasonably sized Python packages were allowed, the whole problem would be much less of an issue. In contrast, the biggest Ruby gems (i.e. packages in the Ruby ecosystem) are over 700 MB, and of course downloading external content at installation time is allowed there.
Would it be possible to create a python module that lazily downloads and installs submodules as needed? I've worked with "subclassed" modules that mimic real modules, but I've never tried to do so with downloads involved. Is there a guaranteed directory that I can download source code and data to, that the module would then be able to use on subsequent runs?
To make this more concrete, here is the ideal behavior:
User runs pip install magic_module and the lightweight magic_module is installed to their system.
User runs the code import magic_module.alpha
The code goes to a predetermined URL, is told that there is an "alpha" subpackage, and is then given the URLs of the alpha.py and alpha.csv files.
The system downloads these files to somewhere that it knows about, and then loads the alpha module.
On subsequent runs, the user is able to take advantage of the downloaded files to skip the server trip.
At some point down the road, the user could run import magic_module.alpha; magic_module.alpha._upgrade() from the command line to clear the cache and get the latest version.
Is this possible? Is this reasonable? What kinds of problems will I run into with permissions?
Doable, certainly. The core feature will probably be import hooks. The relevant module would be importlib in python 3.
"Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include [...] modules that are loaded from a database over a network."
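To give an idea of what such a hook looks like in Python 3, here is a minimal sketch of a meta path finder that serves module source from an in-memory dictionary. The dictionary stands in for whatever would be downloaded and cached; the class name and the module name alpha_demo are made up for the example, and none of the download, caching or security concerns are handled.

```python
import importlib.abc
import importlib.util
import sys


class SourceFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose source text is held in a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source code string

    def find_spec(self, fullname, path, target=None):
        # Only claim modules we have source for; let other finders handle the rest.
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        exec(compile(source, module.__name__, "exec"), module.__dict__)


# Appended last, so regular installed modules still take precedence.
finder = SourceFinder({"alpha_demo": "GREETING = 'hello from alpha'\n"})
sys.meta_path.append(finder)

import alpha_demo

print(alpha_demo.GREETING)  # prints "hello from alpha"
```

A real implementation would replace the dictionary lookup with a cache lookup plus a download on a miss, which is where the caveats about errors, security and versioning come in.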
Convenient, probably not. The import machinery is one of the parts of python that has seen several changes over releases. It's undergoing a full refactoring right now, with most of the existing things being deprecated.
Reasonable, well it's up to you. Here are some caveats I can think of:
Tricky to get right, especially if you have to support several python versions.
What about error handling? Should applications be prepared for imports to fail under normal circumstances? Should they degrade gracefully? Or just crash and spew a traceback?
Security? Basically you're downloading code from someplace. How do you ensure the connection is not being hijacked?
How about versioning? If you update some of the remote modules, how can you make the application download the correct version?
Dependencies? Pushing of security updates? Permissions management?
Summing it up, you'll have to solve most of the issues of a package manager, along with securing downloads and permission issues of course. All those issues are tricky to begin with and easy to get wrong, with dire consequences.
So with all that in mind, it really comes down to how much resources you deem worth investing into that, and what value that adds over a regular use of readily available tools such as pip.
(the permission question cannot really be answered until you come up with a design for your package manager)
I'm not sure if I'm even asking this question correctly. I just built my first real program and I want to make it available to people in my office. I'm not sure if I will have access to the shared server, but I was hoping I could simply package the program (I hope I'm using this term correctly) and upload it to a website for my coworkers to download.
I know how to zip a file, but something tells me it's a little more complicated than that :) In fact, some of the people in my office who need the program installed do not have python on their computers already, and I would rather avoid asking everyone to install python before downloading my .py files from my hosting server.
So, is there an easy way to package my program, along with python and the other dependencies, for simple distribution from a website? I tried searching for the answer but I can't find exactly what I'm looking for. Oh, and since this is the first time I have done this- are there any precautions I need to take when sharing these files so that everything runs smoothly?
PyInstaller or py2exe can package your Python program.
PyInstaller is actively maintained. py2exe has not been updated for at least a year. I've used each with success.
Also there is cx_Freeze which I have not used.
Take a look at http://www.py2exe.org/
I have a Windows program that I made with python and py2exe. I'd like to create an updating feature so that the software can be readily updated.
What are common ways of going about this?
If you think your code might benefit others, you could put it up on PyPI. Then having different versions is just updating your package, or telling your clients to use easy_install to get the latest version. This doesn't push updates, though.
You can try Esky, which is an auto-update framework for managing different versions, including fetching new versions and rolling back partial updates. It can be found on PyPI.
That said, I haven't used Esky. If you wish to roll your own auto-update feature, you might want to look at Boxed Dice to see how they got around to it.
When you package an app with py2exe, the result is usually a single executable (perhaps with some data files). The simplest way to update it is to prompt the user to download and install a new version every once in a while (how you check with a server that such a new version exists is a different question).
If you want to reduce the size of the download, applications commonly resort to breaking themselves up into multiple DLLs and updating only the relevant DLLs. When you have a Python application you don't have DLLs, but you have an even easier option: you can keep most of your app's logic outside the exe in .pyc files, and update just some of these .pyc files.
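Deciding which files to fetch could be sketched as a comparison of two manifests that map file names to content hashes. The manifest idea and the function names are assumptions for illustration, not something py2exe provides.

```python
import hashlib


def build_manifest(files):
    """Map each file name to the SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def files_to_update(local_manifest, remote_manifest):
    """Names that are new or changed on the server, in sorted order."""
    return sorted(
        name
        for name, digest in remote_manifest.items()
        if local_manifest.get(name) != digest
    )
```

The server would publish its manifest next to the files, and the client would download only the names returned by files_to_update(). Keep in mind that .pyc files are specific to the Python version that produced them, so the updates must be compiled with the same interpreter version that is bundled in the exe.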
Now, mind you, .pyc files are easily "decompilable" into Python (a somewhat obfuscated version of your original code), but having an exe made with py2exe isn't much safer, because py2exe is open-source software and packs all the same files inside the exe anyway.
To conclude, my suggestion is don't bother. How large can your application be? With today's fast connections, it's easier to just make the user download a whole new version than to invest a lot of time into building partial-update functionality into your program.