Azure Machine Learning Studio designer - "create new version" unexpected when registering a data set


I am trying to register a data set from a Python step (Execute Python Script) in the Azure Machine Learning Studio designer. Here is my code:
import pandas as pd
from azureml.core import Workspace, Run, Dataset
def azureml_main(dataframe1 = None, dataframe2 = None):
    run = Run.get_context()
    ws = run.experiment.workspace
    ds = Dataset.from_pandas_dataframe(dataframe1)
    ds.register(workspace = ws,
                name = "data set name",
                description = "example description",
                create_new_version = True)
    return dataframe1,
I get an error saying that "create_new_version" in the ds.register line was an unexpected keyword argument. However, this keyword appears in the documentation and I need it to keep track of new versions of the file.
If I remove the argument, I get a different error: "Local data source path not supported for this operation", so it still does not work. Any help is appreciated. Thanks!

Update
Sharing the OP's solution here for easier discovery:
import pandas as pd
from azureml.core import Workspace, Run, Dataset
def azureml_main(dataframe1 = None, dataframe2 = None):
    run = Run.get_context()
    ws = run.experiment.workspace
    datastore = ws.get_default_datastore()
    ds = Dataset.Tabular.register_pandas_dataframe(
        dataframe1, datastore, 'data_set_name',
        description = 'data set description.')
    return dataframe1,
Original answer
Sorry you're struggling. You're very close!
A few things may be the culprit here.
It looks like you're using Dataset.from_pandas_dataframe() from the older Dataset API, which has been deprecated. I recommend trying Dataset.Tabular.register_pandas_dataframe() instead of Dataset.from_pandas_dataframe().
This part is more conjecture, but there may also be limitations on registering datasets within an "Execute Python Script" (EPS) module, because:
the workspace object might not have the right permissions
you might not be able to use the register_pandas_dataframe method inside the EPS module, but might have better luck saving the dataframe to parquet first and then calling Dataset.Tabular.from_parquet_files
Hopefully something works here!
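A minimal sketch of the parquet route mentioned above, which also keeps create_new_version=True. It assumes azureml-core (SDK v1) is available in the designer's Python environment, that the default datastore is a blob datastore (so upload_files works), and that pyarrow or fastparquet is installed for DataFrame.to_parquet. The helper names and the designer_datasets folder are hypothetical. Because a registered dataset version points at files in the datastore rather than copying them, each upload goes to a fresh folder so that older versions keep their own data.

```python
import uuid


def unique_target_path(target_dir: str, name: str) -> str:
    """Hypothetical helper: a fresh datastore folder for each upload."""
    return f"{target_dir}/{name}/{uuid.uuid4().hex}"


def register_dataframe_versioned(dataframe, workspace, name, target_dir="designer_datasets"):
    """Save `dataframe` to parquet, upload it, and register it as a new dataset version.

    Assumes azureml-core (SDK v1) plus a parquet engine (pyarrow or fastparquet).
    Imports are local so this module can be imported without azureml installed.
    """
    import os
    import tempfile

    from azureml.core import Dataset

    datastore = workspace.get_default_datastore()
    folder = unique_target_path(target_dir, name)

    with tempfile.TemporaryDirectory() as local_dir:
        local_file = os.path.join(local_dir, f"{name}.parquet")
        dataframe.to_parquet(local_file)
        # upload to the default (blob) datastore under the fresh folder
        datastore.upload_files(files=[local_file], target_path=folder, overwrite=False)

    dataset = Dataset.Tabular.from_parquet_files(path=[(datastore, f"{folder}/{name}.parquet")])
    # register() on a TabularDataset accepts create_new_version
    return dataset.register(workspace=workspace, name=name, create_new_version=True)
```

Inside azureml_main you would call it as register_dataframe_versioned(dataframe1, run.experiment.workspace, 'data_set_name') and still return dataframe1, as before.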

Related

Create a Java UDF that uses the geoip2 library with the database in an S3 bucket

Correct me if I'm wrong, but my understanding of UDFs in Snowpark is that you can send a UDF from your IDE and it will be executed inside Snowflake. I have a staged database file called GeoLite2-City.mmdb in an S3 bucket on my Snowflake account, and I would like to use it to retrieve information about an IP address. So my strategy was to:
1. Register a UDF that returns a response string, from my IDE (PyCharm).
2. Create a main function that simply queries the database about the IP address and gives me a response.
The problem is: how can the UDF and my code see the staged file at
s3://path/GeoLite2-City.mmdb
in my bucket? I simply referred to it by name (with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:), assuming it would eventually be found, since
stage_location='@AWS_CSV_STAGE' is the same place where the UDF will be saved. But I'm not sure I understand exactly what the stage_location option refers to.
At the moment I get the following error:
"Cannot add package geoip2 because Anaconda terms must be accepted by ORGADMIN to use Anaconda 3rd party packages. Please follow the instructions at https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#using-third-party-packages-from-anaconda."
Am I importing geoip2.database correctly in order to use it with Snowpark and a UDF?
Do I import it by writing session.add_packages('geoip2')?
Thank you for clearing up my doubts.
The instructions I'm following for geoip2 are here:
https://geoip2.readthedocs.io/en/latest/
My code:
from snowflake.snowpark import Session
import geoip2.database
from snowflake.snowpark.functions import col
import logging
from snowflake.snowpark.types import IntegerType, StringType
logger = logging.getLogger()
logger.setLevel(logging.INFO)
session = None
user = '*********'
password = '*********'
account = '*********'
warehouse = '*********'
database = '*********'
schema = '*********'
role = '*********'
print("Connecting")
cnn_params = {
    "account": account,
    "user": user,
    "password": password,
    "warehouse": warehouse,
    "database": database,
    "schema": schema,
    "role": role,
}
def first_udf(ip: str) -> str:
    # one string argument in, one string out, matching input_types/return_type below
    with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:
        response = reader.city(ip)
        return response.country.iso_code
try:
    print('session..')
    session = Session.builder.configs(cnn_params).create()
    session.add_packages('geoip2')
    session.udf.register(
        func=first_udf
        , return_type=StringType()
        , input_types=[StringType()]
        , is_permanent=True
        , name='SNOWPARK_FIRST_UDF'
        , replace=True
        , stage_location='@AWS_CSV_STAGE'
    )
    session.sql("SELECT SNOWPARK_FIRST_UDF('203.0.113.0')").show()
except Exception as e:
    print(e)
finally:
    if session:
        session.close()
        print('connection closed..')
print('done.')
UPDATE
I'm trying to solve it with a Java UDF, since I already have the 'geoip2-2.8.0.jar' library in my staging area. If I could import its methods to get the country of an IP, it would be perfect; the problem is that I don't know exactly how to do it. I'm trying to follow these instructions: https://maxmind.github.io/GeoIP2-java/.
I want to query the database, get the country's ISO code as output, and do it in a Snowflake worksheet.
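CREATE OR REPLACE FUNCTION GEO()
returns varchar not null
language java
imports = ('@AWS_CSV_STAGE/lib/geoip2-2.8.0.jar', '@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
handler = 'Geo.test'
as
$$
import java.io.File;
import java.net.InetAddress;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;
class Geo {
    // note: geoip2's own dependencies (maxmind-db, Jackson) must also be staged and listed in imports
    public static String test() throws Exception {
        // Snowflake copies the files listed in imports into this directory
        String importDir = System.getProperty("com.snowflake.import_directory");
        File database = new File(importDir + "GeoLite2-City.mmdb");
        try (DatabaseReader reader = new DatabaseReader.Builder(database).build()) {
            InetAddress ipAddress = InetAddress.getByName("128.101.101.101");
            CityResponse response = reader.city(ipAddress);
            // return the ISO code: a UDF must return its value, not print it
            return response.getCountry().getIsoCode();
        }
    }
}
$$;
SELECT GEO();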

This will be more complicated than it looks:
To use session.add_packages('geoip2') in Snowflake you need to accept the Anaconda terms. This is easy if you can ask your account admin.
But then you can only get the packages that Anaconda has added to Snowflake in this way. The list is https://repo.anaconda.com/pkgs/snowflake/, and I don't see geoip2 there yet.
So you will need to package your own Python code (until Anaconda sees enough requests for geoip2 in the wishlist). I described the process here: https://medium.com/snowflake/generating-all-the-holidays-in-sql-with-a-python-udtf-4397f190252b.
But wait! GeoIP2 is not pure Python, so you will need to wait until Anaconda packages the C extension libmaxminddb. That will be harder: as you can see, its docs don't offer a straightforward route like other pip-installable C libraries.
So this will be complicated.
There are alternative paths, like a commercial provider of this functionality (as I describe here: https://medium.com/snowflake/new-in-snowflake-marketplace-monetization-315aa90b86c).
There are other approaches that get this done without a paid dataset, but I haven't written about them yet; someone else might before I get to it.
Btw, years ago I wrote something like this for BigQuery (https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds), but today I was notified that Google recently deleted the tables that I had shared with the world (https://twitter.com/matthew_hensley/status/1598386009129058315).
So it's time to rebuild in Snowflake. But who (me?) and when is still a question.
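Separately from the packaging problem, the question of how a UDF sees a staged file has a more direct answer: stage_location only says where Snowpark uploads the UDF's own code, while files the UDF needs at run time are passed through the imports argument, and Snowflake copies them into a directory that a Python UDF can read from sys._xoptions["snowflake_import_directory"]. The sketch below assumes geoip2 has somehow been made available to the UDF (which, as explained above, is the hard part) and reuses the stage and file names from the question; COUNTRY_ISO_CODE and the helper names are hypothetical.

```python
import os
import sys


def staged_file_path(filename: str, fallback_dir: str = ".") -> str:
    """Local path of a file shipped to a Snowflake Python UDF via `imports`.

    Inside Snowflake the import directory is exposed through sys._xoptions;
    when that option is absent (e.g. running locally) fall back to fallback_dir.
    """
    import_dir = sys._xoptions.get("snowflake_import_directory", fallback_dir)
    return os.path.join(import_dir, filename)


def register_country_udf(session):
    """Register a permanent UDF that looks up an IP's country ISO code (hypothetical)."""
    from snowflake.snowpark.types import StringType

    def country_iso_code(ip: str) -> str:
        # imports inside the UDF body run on the Snowflake side;
        # the path logic mirrors staged_file_path, inlined so the pickled
        # UDF does not depend on this module being importable there
        import os
        import sys
        import geoip2.database

        import_dir = sys._xoptions.get("snowflake_import_directory", ".")
        db_path = os.path.join(import_dir, "GeoLite2-City.mmdb")
        with geoip2.database.Reader(db_path) as reader:
            return reader.city(ip).country.iso_code

    return session.udf.register(
        func=country_iso_code,
        return_type=StringType(),
        input_types=[StringType()],
        name="COUNTRY_ISO_CODE",
        is_permanent=True,
        replace=True,
        stage_location="@AWS_CSV_STAGE",  # where the UDF code itself is uploaded
        imports=["@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb"],  # copied to the import directory
    )
```

Once registered, it would be called like any SQL function, e.g. session.sql("SELECT COUNTRY_ISO_CODE('128.101.101.101')").show().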

How to use helper files in PyCharm

I am trying to follow along with a project by Mike Smales, "Sound Classification using Deep Learning". In it, the author wrote a helper file called wavfilehelper.py:
wavfilehelper.py code:
import struct
class WavFileHelper():
    def read_file_properties(self, filename):
        # use a context manager so the file is closed after reading
        with open(filename, "rb") as wave_file:
            riff = wave_file.read(12)  # RIFF header, not used below
            fmt = wave_file.read(36)   # start of the "fmt " chunk that follows it
        num_channels_string = fmt[10:12]
        num_channels = struct.unpack('<H', num_channels_string)[0]
        sample_rate_string = fmt[12:16]
        sample_rate = struct.unpack("<I", sample_rate_string)[0]
        bit_depth_string = fmt[22:24]
        bit_depth = struct.unpack("<H", bit_depth_string)[0]
        return (num_channels, sample_rate, bit_depth)
In his main program he imports the helper like this:
from helpers.wavfilehelper import WavFileHelper
wavfilehelper = WavFileHelper()
However, when I run this block of code in PyCharm, it complains "ModuleNotFoundError: No module named 'helpers.wavfilehelper'". How can I get this helper file to work in the PyCharm environment? Do I have to put the wavehelper.py file in a special folder for it to be found?
Any help will be greatly appreciated!

It is important to look at (and quote in your question) the actual error messages! In this case, which line is in error? It is not the instantiation line but the import: Python is unable to find the module on your machine (using its system paths).
Earlier in the article, the author talks about downloading his files from GitHub (to your machine). Did you follow that step?
Web.Ref: further information about solving this error
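If the files are on disk but the import still fails, the usual cause is that the directory containing the helpers folder is not on Python's module search path. A minimal sketch, assuming the layout implied by the import line (a helpers folder holding wavfilehelper.py next to the main script); ensure_on_path is a hypothetical helper name. In PyCharm, marking the project folder as a Sources Root (right-click, Mark Directory as) has a similar effect for run configurations.

```python
import os
import sys

# Assumed layout, matching `from helpers.wavfilehelper import WavFileHelper`:
#   project_root/
#       main.py
#       helpers/
#           __init__.py        # optional on Python 3.3+, harmless to add
#           wavfilehelper.py   # defines WavFileHelper


def ensure_on_path(project_root: str) -> str:
    """Put project_root at the front of sys.path so `helpers` is importable."""
    root = os.path.abspath(project_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Usage at the top of main.py, before the helpers import:
# ensure_on_path(os.path.dirname(os.path.abspath(__file__)))
# from helpers.wavfilehelper import WavFileHelper
```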

How to assign a 2D LibreOffice Calc named range to a Python variable (I can do it in LibreOffice Basic)

I can't seem to find a simple answer to this question. I have this working in LibreOffice Basic:
NamedRange = ThisComponent.NamedRanges.getByName("transactions_detail")
RefCells = NamedRange.getReferredCells()
Set MainRange = RefCells.getDataArray()
Then I iterate over MainRange and pull out the rows I am interested in.
Can I do something similar in a Python macro? Can I assign a 2D named range to a Python variable, or do I have to iterate over the range to assign the individual cells?
I am new to Python but hope to convert my iteration-intensive macro function to Python in the hope of making it faster.
Any help would be much appreciated.
Thanks.

LibreOffice can be manipulated from Python with the pyuno library. The pyuno documentation is unfortunately incomplete, but going through this tutorial may help.
To get started:
Python-UNO, the library for communicating via UNO, is already on LibreOffice's Python path. To initialize your context, type the following lines in your Python shell:
import socket  # only needed on win32-OOo3.0.0
import uno
# get the uno component context from the PyUNO runtime
localContext = uno.getComponentContext()
# create the UnoUrlResolver
resolver = localContext.ServiceManager.createInstanceWithContext(
    "com.sun.star.bridge.UnoUrlResolver", localContext)
# connect to the running office (started with --accept="socket,host=localhost,port=2002;urp;")
ctx = resolver.resolve("uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
smgr = ctx.ServiceManager
# get the central desktop object
desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)
# access the current document (your Calc spreadsheet)
model = desktop.getCurrentComponent()
Then, to get a named range and access its data as an array, go through getReferredCells() as in your Basic code:
NamedRange = model.NamedRanges.getByName("Test Name")
MainRange = NamedRange.getReferredCells().getDataArray()
However, I am unsure whether this will result in a noticeable performance gain.
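Inside a Python macro (rather than an external script connecting over a socket), the document comes from XSCRIPTCONTEXT, which LibreOffice's Python script provider injects. A sketch under that assumption: one getDataArray() call pulls the whole range into a tuple of row tuples, and the filtering then happens in plain Python. The macro name, the filter column and the "Groceries" value are hypothetical placeholders for your own criteria.

```python
def named_range_rows(doc, name):
    """Return a named range's values as a tuple of row tuples (one UNO call)."""
    return doc.NamedRanges.getByName(name).getReferredCells().getDataArray()


def rows_where(rows, column, value):
    """Keep rows whose cell at index `column` equals `value` (plain Python, no UNO)."""
    return [row for row in rows if row[column] == value]


def extract_transactions(*args):
    """Hypothetical macro; XSCRIPTCONTEXT exists only when LibreOffice runs it."""
    doc = XSCRIPTCONTEXT.getDocument()  # noqa: F821 (injected by LibreOffice)
    main_range = named_range_rows(doc, "transactions_detail")
    # hypothetical criterion: first column equals "Groceries"
    wanted = rows_where(main_range, 0, "Groceries")
    # process `wanted` here
    return wanted


# restrict which functions LibreOffice lists as runnable macros
g_exportedScripts = (extract_transactions,)
```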

Install issues with 'lr_utils' in python

I am trying to complete some homework in a DeepLearning.ai course assignment.
When I try the assignment on the Coursera platform everything works fine; however, when I do the same imports on my local machine, I get an error:
ModuleNotFoundError: No module named 'lr_utils'
I have tried resolving the issue by installing lr_utils but to no avail.
There is no mention of this module online, and I've started to wonder whether it's proprietary to deeplearning.ai.
Or can this issue be resolved some other way?

You will be able to find lr_utils.py and all the other .py files (and thus the code inside them) required by the assignments:
Go to the first assignment (i.e. Python Basics with numpy), which you can always access whether you are a paid user or not.
Then click the 'Open' button in the menu bar above.
Then you can include the code of the modules directly in your code.

As per the answer above, lr_utils is part of the deep learning course and is a utility to load the data sets. It should work readily with the paid version of the course, but in case you 'lost' access to it, I noticed this GitHub project has lr_utils.py as well as some data sets:
https://github.com/andersy005/deep-learning-specialization-coursera/tree/master/01-Neural-Networks-and-Deep-Learning/week2/Programming-Assignments
Note:
The Chinese website links did not work when I looked at them; maybe the server storing the files expired. I did see that this GitHub project had some datasets as well as the lr_utils file.
EDIT: The link no longer seems to work. Maybe this one will do?
https://github.com/knazeri/coursera/blob/master/deep-learning/1-neural-networks-and-deep-learning/2-logistic-regression-as-a-neural-network/lr_utils.py

Download the datasets from the answer above,
and use this code (it's better than the version above since it closes the files after use):
import numpy as np
import h5py
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])
    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])
        classes = np.array(test_dataset["list_classes"][:])
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

"lr_utils" is not an official library or anything like that.
Its purpose is to fetch the dataset required for the course.
Option 1 (didn't work for me): go to this page, where there is Python code for downloading the dataset and creating "lr_utils".
I had a problem fetching data from the provided URL (but you can at least try running it; maybe it will work).
Option 2 (worked for me): in the comments on the same page there are links for manually downloading the dataset and "lr_utils.py":
link for dataset download
link for lr_utils.py script download
Remember to extract the dataset after downloading it, and put the dataset folder and "lr_utils.py" in the same folder as the Python script that uses them (the script with the line "import lr_utils").

The way I fixed this problem:
Click File -> Open -> you will see the lr_utils.py file (it does not matter whether you have the paid or free version of the course).
Open the lr_utils.py file in Jupyter Notebook, click File -> Download (store it in your own folder), and rerun the imports. It will work like magic.
I did the same for the datasets folder.

You can download the train and test datasets directly here: https://github.com/berkayalan/Deep-Learning/tree/master/datasets
Then add this code to the beginning of your script:
import numpy as np
import h5py
import os
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels
    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

I faced a similar problem and followed these steps:
1. Import the following libraries:
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
2. Download train_catvnoncat.h5 and test_catvnoncat.h5 from either of the links below:
[https://github.com/berkayalan/Neural-Networks-and-Deep-Learning/tree/master/datasets]
or
[https://github.com/JudasDie/deeplearning.ai/tree/master/Improving%20Deep%20Neural%20Networks/Week1/Regularization/datasets]
3. Create a folder named datasets and paste these two files into it.
[Note: the datasets folder and your source code file should be in the same directory]
4. Run the following code:
def load_dataset():
    # 'datasets/' matches the folder created in step 3
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])
    with h5py.File('datasets/test_catvnoncat.h5', "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])
        classes = np.array(test_dataset["list_classes"][:])
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
5. Load the data:
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# check the datasets
print(len(train_set_x_orig))
print(len(test_set_x_orig))
Your data set is ready. You may check the lengths of train_set_x_orig and test_set_x_orig; for me they were 209 and 50.

I could download the dataset directly from the Coursera page.
Once you open the Coursera notebook, go to File -> Open, which displays the notebook's file browser.
There the notebooks and datasets are listed; you can go to the datasets folder and download the data required for the assignment. lr_utils.py is also available for download.

Below is the code; just save it in a file named "lr_utils.py" and you can use it.
import numpy as np
import h5py
def load_dataset():
    train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = np.array(train_dataset["train_set_x"][:]) # your train set features
    train_set_y_orig = np.array(train_dataset["train_set_y"][:]) # your train set labels
    test_dataset = h5py.File('datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = np.array(test_dataset["test_set_x"][:]) # your test set features
    test_set_y_orig = np.array(test_dataset["test_set_y"][:]) # your test set labels
    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes
If your code file cannot find your newly created lr_utils.py file, add this code:
import sys
sys.path.append("full path of the directory where you saved lr_utils.py")

Here is where to get the dataset, as @ThinkBonobo mentioned:
https://github.com/andersy005/deep-learning-specialization-coursera/tree/master/01-Neural-Networks-and-Deep-Learning/week2/Programming-Assignments/datasets
Write an lr_utils.py file as in @StationaryTraveller's answer above, and put it in any directory on sys.path.
def load_dataset():
    with h5py.File('datasets/train_catvnoncat.h5', "r") as train_dataset:
        ....
But make sure you remove the 'datasets/' prefix if your .h5 files are not inside a datasets folder, because then the data file's path is simply train_catvnoncat.h5.
Restart the kernel, and good luck.

I would add that you can save the lr_utils script to disk and import it as a module with importlib.util, as follows.
The code below comes from the general thread about importing functions from external files into the current session:
How to import a module given the full path?
import importlib.util
### Source load_dataset() function from a file
# Specify a module name (any name should work) and the local path to the lr_utils.py script on your PC:
util_script = importlib.util.spec_from_file_location("utils function", "D:/analytics/Deep_Learning_AI/functions/lr_utils.py")
# Make a module
load_utils = importlib.util.module_from_spec(util_script)
# Execute it on the fly
util_script.loader.exec_module(load_utils)
# Then you can use load_dataset() from the 'module' called load_utils specified above
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_utils.load_dataset()
# This can be a general way of loading user-specified modules, so I did the same for the rest of the neural network functions and put them into a separate file to keep my script clean.
# Just remember that Python treats it like a module, so you need to prefix the function name with the 'module' name, e.g.:
# d = nnet_utils.model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1000, learning_rate = 0.005, print_cost = True)
nnet_script = importlib.util.spec_from_file_location("utils function", "D:/analytics/Deep_Learning_AI/functions/lr_nnet.py")
nnet_utils = importlib.util.module_from_spec(nnet_script)
nnet_script.loader.exec_module(nnet_utils)
That has been the most convenient way for me to source functions from different files in Python so far.
I come from an R background, where you can call the one-line function source() to bring an external script's contents into your current session.
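The three importlib steps above can be wrapped in one small function, which gets closer to R's one-line source(). A minimal sketch; the name source and the default of deriving the module name from the file name are choices made here, not part of importlib. Unlike R's source(), it returns a module object rather than defining the names in your current session.

```python
import importlib.util
import os


def source(path, module_name=None):
    """Load a .py file as a module object, roughly like R's source()."""
    if module_name is None:
        # e.g. ".../lr_utils.py" -> "lr_utils"
        module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Usage with the path from the answer above:
# load_utils = source("D:/analytics/Deep_Learning_AI/functions/lr_utils.py")
# train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_utils.load_dataset()
```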

The answers above didn't help me; some links had expired.
lr_utils is not a pip library but a file that sits alongside the notebook on the Coursera website.
You can click "Open", and it'll open a file explorer where you can download everything you want to run in another environment.
(I did this in a browser.)

This is how I solved mine: I copied the lr_utils file and pasted it into my notebook, then downloaded the datasets by zipping them and extracting the archive locally, using the following code. Note: run the code in the Coursera notebook, and select only the zipped file in the directory to download.
import zipfile  # standard library, no pip install needed
with zipfile.ZipFile('datasets/train_catvnoncat_h5.zip', mode='w') as zf:
    zf.write('datasets/train_catvnoncat.h5')
    zf.write('datasets/test_catvnoncat.h5')
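Once the archive is downloaded, the same standard-library zipfile module can unpack it on your machine. A minimal sketch; extract_datasets is a hypothetical helper name. Because zf.write() stores each member under the path it was given (datasets/...), extracting into your project folder recreates the datasets/ subfolder that load_dataset() expects.

```python
import zipfile


def extract_datasets(zip_path, dest_dir="."):
    """Unpack the archive into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


# Usage, from the folder that holds lr_utils.py and your notebook:
# extract_datasets("train_catvnoncat_h5.zip")
```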

GeoModel Usage in Google App Engine

I am trying to do a proximity_fetch with the GeoModel class for Google App Engine. The entity I want to use it for is an ndb model, and I am not sure what I need to download and import versus what I can just import from Google in my Python code. The websites seem a little outdated, and I was wondering if anyone had more pertinent information. This is what I have so far; it tells me that Location has no attribute proximity_fetch, which I know, but I am not sure how I should define it in the Location(ndb.Model) class.
g = geocoders.Google()
# exactly_one=True returns a single (place, (lat, lng)) result, which the unpacking expects
place, (lat, lng) = g.geocode(inputlocation, exactly_one=True)
bound = 20
upper = lat + bound
lower = lat - bound
left = lng - bound
right = lng + bound
locations = []
if lat and lng:
    locations = Location.proximity_fetch(
        Location.query(),
        geotypes.Point(lat, lng),
        max_results=50,
        max_distance=500000)
Also, when I try to import geomodel and geotypes, which seem pretty vital for this, I get an import error, and I am not sure where to get them from.
Any help or examples would be greatly appreciated!

You should first check out the latest code from the SVN repository. You can find information about this at http://code.google.com/p/geomodel/source/checkout
After you have the code locally, inside the main directory there is a directory called geo. Copy this directory into your GAE project. Then, in your code, import what you need from this package. For example:
from geo import geomodel
Now, regarding your Location model: to be able to execute a proximity_fetch query, your model should extend the model that geomodel provides, called GeoModel. Thus, you should have something like this:
class Location(GeoModel):
    ....
Notice that GeoModel is built on the "old" GAE datastore layer db (it subclasses db.Model), not the ndb layer used in your code, so combining it with ndb.Model in one class is likely to fail; defining Location as a db model, with Location.all() as the base query, is the safer route.
For more information on how to use geomodel, also take a look at the demos included in the code you got from SVN, in the demos directory.
Hope this helps!

from geo import geotypes
There is a full example here: http://code.google.com/p/geomodel/source/browse/trunk/demos/pubschools/handlers/service.py
http://code.google.com/p/geomodel/source/browse/#svn/trunk/demos/pubschools
results = PublicSchool.proximity_fetch(
    base_query,
    center, max_results=max_results, max_distance=max_distance)
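As a sanity check on roughly what a proximity query is meant to return (entities within max_distance meters of the center, nearest first, capped at max_results), here is a plain-Python sketch that does the same ranking in memory with the haversine great-circle formula. This is not geomodel's geocell-based implementation, which avoids loading every entity; the in-memory version is only practical for small lists, and the function names and the (name, lat, lng) input shape are hypothetical.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_within(
    points: Iterable[Tuple[str, float, float]],
    center: Tuple[float, float],
    max_results: int = 50,
    max_distance: float = 500000,
) -> List[Tuple[float, str]]:
    """Return (distance_m, name) pairs within max_distance, nearest first."""
    lat0, lng0 = center
    scored = []
    for name, lat, lng in points:
        d = haversine_m(lat0, lng0, lat, lng)
        if d <= max_distance:
            scored.append((d, name))
    scored.sort()
    return scored[:max_results]
```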
