On my company PC, I do not have full permissions to install Python packages (usually this has to be requested from IT for approval, which is very painful and takes a very long time).
I am thinking of asking my manager to invest in Anaconda Enterprise so that the security aspect of open-source Python use is no longer an issue. However, there is another consideration: my boss is looking to move to the cloud, and I was wondering whether Anaconda Enterprise can be used interchangeably on-premise (offline from the cloud, i.e., no use of cloud storage or cloud compute resources) and, when needed for big data processing, switched to 'cloud mode' by connecting to any of AWS, GCP, or Azure to rent GPU instances? Any advice welcome.
Yes, that can be a good approach for your company. I used it in many projects on GCP and IBM Cloud over Debian 7, 8 and 9, and it works well. Depending on your needs, you can also create a package channel with the Enterprise version and manage the permissions over your packages. It also has a deployment tool where you can manage and audit the deployments for projects and APIs, as well as track the deployments and assign them to owners.
You can switch your server nodes to different servers, or add and remove nodes as needed. Depending on your environment, working with those can be difficult at the beginning, but it is pretty good once implemented.
Below are some links where you can see more information about what I'm talking about:
using-anaconda-enterprise
conda-offline-install-update
server-nodes
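As a rough illustration of the offline/private-channel workflow described in those links, a client machine could be pointed only at an internal channel. This is just a sketch; the channel URL and package names below are placeholders, not anything specific to Anaconda Enterprise:

```python
# Sketch only: install packages from a private conda channel, ignoring the
# public defaults. The channel URL is a hypothetical internal mirror.
import subprocess

PRIVATE_CHANNEL = "https://conda.internal.example.com/prod"

def install_from_private_channel(packages):
    cmd = [
        "conda", "install", "--yes",
        "--channel", PRIVATE_CHANNEL,
        "--override-channels",  # don't fall back to public channels
    ] + list(packages)
    subprocess.run(cmd, check=True)

install_from_private_channel(["numpy", "pandas"])
```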
Depending on your preferences, it may not be necessary to use Anaconda Enterprise on GCP. If your boss is looking to move to the cloud, then GCP has some great options for analyzing big data. Using the AI Platform you can deploy a new instance and choose R, Python, CUDA, TensorFlow, etc. Once the instance is deployed you can start your data preprocessing: install whatever libraries you desire (NumPy, SciPy, Pandas, Matplotlib, etc.) and start your data manipulation.
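To make "start your data manipulation" concrete, here is the kind of minimal preprocessing you might run on such an instance; the file name and column names are made up for the example:

```python
# Minimal preprocessing sketch; input file and columns are placeholders.
import numpy as np
import pandas as pd

df = pd.read_csv("my_dataset.csv")
df = df.dropna(subset=["target"])              # drop rows without a label
df["feature_log"] = np.log1p(df["feature"])    # simple feature transformation
train = df.sample(frac=0.8, random_state=42)   # rough 80/20 split
test = df.drop(train.index)
print(train.shape, test.shape)
```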
If you use something like Jupyter Notebook, you can prepare your work offline before entering the GCP platform to run the model training.
Oh, also GCP has many labs to test out their Data Science platform.
https://www.qwiklabs.com/quests/43
GCP has many free promos these days; below is a link to one.
GCP - Build your cloud skills for free with Google Cloud
Step by step usage for AI Platform
I've almost run out of space on my C: drive, and I'm currently working for myself remotely. I want to purchase cloud storage that will act as a mounted drive, so that I can do the following:
Store all of my Python projects along with any other files
Run my Python scripts in VS Code (or any IDE) straight from the drive
Create virtual environments for my Python projects that will be stored on the drive (see the sketch after this list)
Set up APIs, from Python scripts stored on this drive, to other programs (e.g. GA or Heroku) so I can push and pull data as required
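For the virtual-environment point, here is a minimal sketch of what that would look like in plain Python, assuming the cloud drive is exposed as a normal mounted path (the P: drive below is hypothetical); whether this performs acceptably over a network mount is exactly the open question:

```python
# Sketch: create a venv at an arbitrary path, e.g. a mounted cloud drive.
# The mount point is hypothetical; performance over a network mount may vary.
import venv
from pathlib import Path

env_path = Path("P:/projects/my_project/.venv")   # hypothetical mounted-drive path
venv.EnvBuilder(with_pip=True).create(env_path)
print("Interpreter:", env_path / "Scripts" / "python.exe")  # Windows layout
```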
I just purchased OneDrive thinking I'd be able to do this, but according to the answer in this SO post it's not a good idea. This article describes the exact behaviour I'm after, and pCloud looks like a good option given its security, but I can't find much material on its compatibility with Python.
Google Cloud, AWS and Azure are all out of my price range and look too complex for what I'm after. My cloud computing knowledge is fairly limited but I was wondering if anyone has any experience of running scripts in Python from the cloud (from pulling data from a warehouse to hosting an application in the public domain) that isn't using one of the big cloud computing companies?
After having worked with it for a while, I would like to understand how Colab really works and whether it is safe to work with confidential data in it.
A bit of context. I understand the differences between Python, IPython and Jupyter Notebook described here, and I would summarize them as follows: Python is a programming language and can be installed like any other application (e.g. with sudo apt-get). IPython is an interactive command-line terminal for Python and can be installed with pip, the standard package manager for Python, which lets you install and manage additional packages written in Python that are not part of the Python standard library. Jupyter Notebook adds a web interface on top and can use several kernels or backends, IPython being one of them.
What about Colab? It is my understanding that when using Colab, I get a VM from Google with Python pre-installed, as well as many other libraries (a.k.a. packages) like pandas or matplotlib. These packages are all installed in the base Python installation.
Colab VMs come with some ephemeral storage. This is equivalent to instance storage in AWS, so it will be lost when the VM runtime is interrupted, i.e. our VM is stopped (or would you rather say... terminated?) by Google. I believe that if I were to upload my confidential data there, it would not be in my private subnet...
Mounting our Drive is hence the equivalent of using an EBS volume in AWS. An EBS volume is a network-attached drive, so the data in it will persist even if the VM runtime is interrupted. EBS volumes can, however, be attached to only one EC2 instance... but I can mount my Drive to several Colab sessions. It is not exactly clear to me what these sessions are...
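For reference, this is the usual way the mount is done in a Colab notebook; running it is what triggers the authorization step discussed below:

```python
# Standard Colab Drive mount; prompts for authorization to your Google account.
from google.colab import drive

drive.mount('/content/drive')
# Files are then visible under /content/drive/MyDrive/ for the lifetime of the runtime.
```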
Some users would like to create virtual environments in Colab and it looks like mounting the drive is a way to get around it.
When mounting our Drive to Colab, we need to authenticate because we are giving the IP of the Colab VM access to our private subnet. Hence, if we had some confidential data, by using Colab the data would not be leaving our private company subnet...?
IIUC, the last paragraph asks the question: "Can I use IP-based authentication to restrict access to data in Colab?"
The answer is no: network address filtering cannot provide meaningful access restrictions in Colab.
Colab is a service rather than a machine. Colab backends do not have fixed IP addresses or a fixed IP address range. By analogy, there's no list of IP addresses for restricting access to a particular set of Google Drive users since, of course, Google Drive users don't have a fixed IP address. Colab users and backends are similar.
Instead of attempting to restrict access to IPs, you'll want to restrict access to particular Google accounts, perhaps using typical Drive file ACLs.
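As a sketch of the account-based approach (the file ID and e-mail address are placeholders, and credential setup is omitted), you can grant a specific Google account access with the Drive v3 API instead of filtering by IP:

```python
# Sketch: grant one Google account read access to a Drive file rather than
# trying to restrict by IP. FILE_ID and the address are placeholders.
from googleapiclient.discovery import build

def share_with_user(creds, file_id, email):
    service = build("drive", "v3", credentials=creds)
    permission = {"type": "user", "role": "reader", "emailAddress": email}
    service.permissions().create(
        fileId=file_id,
        body=permission,
        sendNotificationEmail=False,
    ).execute()

# share_with_user(creds, "FILE_ID", "colleague@example.com")
```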
I need to share Jupyter notebooks with my colleagues. My company is pretty strict about data, sharing, etc., so it needs to be on our private cloud. We use AWS. SageMaker is OK, but we also want to share the same environment I set up for my notebook. We don't have a huge budget for Domino Data Labs. Any methods or tools, even if low cost, you would recommend? Really appreciate the help.
Background on what I tried:
I don't even know where to start..
Our DevOps guys don't have the bandwidth to do this for a while, but it's crushing us because we have a deliverable due soon.
Do you know how to back up machine learning models in Azure Machine Learning Studio during idle time when a subscription is not purchased? I would preferably back those models up in an Azure DB/DWH on other accounts or instances of Azure. Is it actually possible to copy a model's flow to another location, or to share it with other users?
I would appreciate the answer.
Based on my understanding, I think you want to export your experiments in Azure Machine Learning Studio locally as a file or some other type of resource. There seems to be no way to do this via official Azure features, but I found a third-party tool named azuremlps, a PowerShell module for Azure ML Studio from a MSFT employee. You can try the cmdlet Export-AmlExperimentGraph to export a specific experiment graph to a file in JSON format.
Hope it helps.
There's no way to back up the experiments you've created directly, but you can share the models with other users in two ways.
Share publicly through the Gallery: everyone can see the experiment you've created.
Share privately: only people who have the link to your published experiment can see it.
Use the 'Publish to Gallery' operation, as shown below, for the above task.
I want to remove as much complexity as I can from administering Python on Amazon EC2, following some truly awful experiences with hosting providers who claim support for Python. I am looking for some guidance on which AMI to choose so that I have a stable and easily managed environment which already includes Python and, ideally, an Apache web server and a database.
I am agnostic about the Python version, web server, DB and OS, as I am still early enough in my development cycle that I can influence those choices. Cost is not a consideration (within bounds), so Windows will work fine if it means easy administration.
Anyone have any practical experience or recommendations they can share?
Try the Ubuntu EC2 images. Python 2.7 is installed by default. The rest you just apt-get install, and optionally create an image once the baseline is the way you want it (or just maintain a script that installs all the pieces and run it after you create the base Ubuntu instance).
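If you go the script route, a rough boto3 sketch of launching an Ubuntu AMI with a user-data script that installs the baseline looks like this; the AMI ID, key pair, region and package list are placeholders for your own choices:

```python
# Sketch: launch an Ubuntu instance and install the baseline via user-data.
# AMI ID, key name, region and packages are placeholders.
import boto3

USER_DATA = """#!/bin/bash
apt-get update -y
apt-get install -y python3-pip apache2 postgresql
pip3 install flask psycopg2-binary
"""

ec2 = boto3.client("ec2", region_name="us-east-1")
ec2.run_instances(
    ImageId="ami-xxxxxxxx",     # placeholder Ubuntu AMI for your region
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key",           # placeholder key pair
    UserData=USER_DATA,
)
```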
If you can get by with using the Amazon provided ones, I'd recommend it. I tend to use ami-84db39ed.
Honestly though, if you plan on leaving this running all the time, you would probably save a bit of money by just going with a VPS. Amazon tends to be cheaper if you are turning the service on and off over time.