I am trying to load a TF1 model using TensorFlow Hub, following this guide.
This model comes with a SentencePiece model:
spm_path = m.signatures['spm_path']
<tensorflow.python.eager.wrap_function.WrappedFunction at 0x1341129e8>
If I execute this function:
spm_path()
{'default': <tf.Tensor: id=5905, shape=(), dtype=string, numpy=b'SAVEDMODEL-ASSET'>}
However, if I pass the output b'SAVEDMODEL-ASSET' to load my SentencePiece model, I get the following error:
sp.Load(b'SAVEDMODEL-ASSET')
OSError: Not found: "SAVEDMODEL-ASSET": No such file or directory Error #2
The issue is that I am not sure where this asset is located - where does TF Hub store downloaded modules?
I can find the following: os.environ['TFHUB_CACHE_DIR'] = '/tmp/tfhub', but this is not enough for me to locate the actual file on my machine and pass in the correct path.
It's a bit clunky but here's one way to do it:
uselite = hub.load("https://tfhub.dev/google/universal-sentence-encoder-lite/2")
sp = sentencepiece.SentencePieceProcessor()
sp.load(uselite.asset_paths[0].asset_path.numpy())
asset_paths contains only one item, and it's the path to the SPM model.
I have the same problem, trying to load the assets needed for tokenization for the ALBERT model. I found that there is a list of assets in m.asset_paths. These are Asset objects, and you can access the path with the .asset_path property. The problem is that you need to check the paths of the assets to find the one that you need. Maybe there is a better way, but I don't know it.
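For illustration, here is a minimal sketch of that approach, using the universal-sentence-encoder-lite module from the answer above; the check for a .model extension and the file name in the comment are only assumptions and will differ from module to module:
import tensorflow_hub as hub
import sentencepiece

m = hub.load("https://tfhub.dev/google/universal-sentence-encoder-lite/2")

# Inspect every asset bundled with the SavedModel and pick the SentencePiece one.
spm_file = None
for asset in m.asset_paths:
    path = asset.asset_path.numpy().decode("utf-8")
    print(path)  # e.g. .../assets/universal_encoder_8k_spm.model
    if path.endswith(".model"):
        spm_file = path

sp = sentencepiece.SentencePieceProcessor()
sp.Load(spm_file)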
I am using FloPy to load an existing MODFLOW-USG model.
load_model = flopy.modflow.Modflow.load('HTHModel', model_ws='model_ws', version='mfusg',
                                        exe_name='exe_name', verbose=True, check=False)
In the process of loading the LPF package, the output shows that hk and hani have been successfully loaded, and then the following error is reported:
loading bas6 package file...
adding Package: BAS6
BAS6 package load...success
loading lpf package file...
loading IBCFCB, HDRY, NPLPF...
loading LAYTYP...
loading LAYAVG...
loading CHANI...
loading LAYVKA...
loading LAYWET...
loading hk layer 1...
loading hani layer 1...
D:\Anaconda\program\lib\site-packages\flopy\utils\util_array.py in parse_control_record(line, current_unit, dtype, ext_unit_dict, array_format)
   3215         locat = int(line[0:10].strip())
ValueError: invalid literal for int() with base 10: '-877.0
How can I solve this kind of problem?
By the way, I created this model by using the "save native text copy" function in GMS. FloPy can read the other contents of the LPF package normally, and the error occurs in the part that reads the [ANGLEX(NJAG)] data.
I compared the LPF file with the MODFLOW-USG input and output description, and it meets the format requirements of the input file.
I am a newbie to Python and FloPy, and this problem has confused me a lot. Thank you very much for providing me with some reference information, whether it is about Python, FloPy, MODFLOW-USG or GMS.
Can you upload your lpf file? Then I can check this out. But at first glance, that "'" before the -877.0 looks suspect - is that in the lpf file?
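In the meantime, a quick way to check that yourself is to scan the LPF file for the suspect character; this is just a sketch, and 'HTHModel.lpf' is a placeholder for whatever your file is actually called:
# Print any LPF lines containing a stray apostrophe, with their line numbers,
# so the offending record can be inspected.
with open('HTHModel.lpf') as f:  # placeholder file name
    for lineno, line in enumerate(f, start=1):
        if "'" in line:
            print(lineno, line.rstrip())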
I'm loading this object detection model in Python. I can load it with the following lines of code:
import tflite_runtime.interpreter as tflite
model_path = 'path_to_model_file.tf'
interpreter = tflite.Interpreter(model_path)
I'm able to perform inference on this without any problem. However, labels are supposed to be included in the metadata, according to the model's documentation, but I can't extract them.
The closest I got was when following this:
from tflite_support import metadata as _metadata

displayer = _metadata.MetadataDisplayer.with_model_file(model_path)
export_json_file = "extracted_metadata.json"
json_file = displayer.get_metadata_json()
# Optional: write out the metadata as a json file
with open(export_json_file, "w") as f:
    f.write(json_file)
but the very first line of code fails with this error: AttributeError: 'int' object has no attribute 'tobytes'.
How can I extract the labels?
If you only care about the label file, you can simply run a command like unzip model_path on Linux or Mac. A TFLite model with metadata is essentially a zip file. See the public introduction for more details.
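If you prefer to stay in Python, here is a minimal sketch of the same idea; the model file name is just an example, and the name of the label file inside the archive varies by model:
import zipfile

model_path = "lite-model_ssd_mobilenet_v1_1_metadata_2.tflite"  # example file name

# A TFLite model with metadata is a valid zip archive, so the packed associated
# files (e.g. the label map) can be listed and extracted directly.
with zipfile.ZipFile(model_path) as archive:
    print(archive.namelist())  # e.g. ['labelmap.txt'] - name varies by model
    archive.extractall("extracted_assets")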
Your code snippet to extract metadata works on my end. Make sure to double check model_path. It should be a string, such as "lite-model_ssd_mobilenet_v1_1_metadata_2.tflite".
If you'd like to read label files in an Android app, here is the sample code to do so.
I am building many models using Pyomo, and from what I understand, Pyomo reformulates models before solving them.
I want to know exactly what the model looks like when it gets passed to the solvers ipopt and couenne.
From what I see here, it is not clear to me how to get the NL file from a script (though I see how to get it from the command line).
Here is how I am solving the models in pyomo:
from pyomo.environ import SolverFactory

ipopt_solver = SolverFactory('ipopt')
ipopt_results_solver = ipopt_solver.solve(my_model, tee=True)
print(ipopt_results_solver)

couenne_solver = SolverFactory('couenne')
couenne_results_solver = couenne_solver.solve(my_model, tee=True)
print(couenne_results_solver)
How do I get the nl file just before solving? (and I assume it is just as easy to spit out another format other than nl).
If you just want the NL file, you can call the write method on the model with a filename that ends with .nl (e.g., my_model.write('junk.nl')).
If you want to tell the solver object to not delete the temporary solver files so that you can access them after the solve, you should add keepfiles=True to the solve call. This will print the location of the temporary solver files. If you need to access them from the script, I believe the NL filename can be found as one of the entries in the _problem_files list attribute on the solver object. The log filename is stored on the _log_file attribute.
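Putting both suggestions together, a minimal sketch might look like the following; my_model and the solver name are the ones from the question, and _problem_files and _log_file are private attributes, so their exact contents may differ between Pyomo versions:
from pyomo.environ import SolverFactory

# 1) Write the NL file directly; the format is inferred from the .nl extension.
my_model.write('my_model.nl')

# 2) Or keep the temporary files the solver plugin generates during solve().
ipopt_solver = SolverFactory('ipopt')
results = ipopt_solver.solve(my_model, tee=True, keepfiles=True)  # prints the temp-file location

print(ipopt_solver._problem_files)  # should include the NL file path
print(ipopt_solver._log_file)       # path to the solver log file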
I learned how to customize Stanford NER (Named Entity Recognizer) in Java from here:
http://nlp.stanford.edu/software/crf-faq.shtml#a
But I am developing my project in Python, and I need to train my classifier with some custom entities.
I searched a lot for a solution but could not find any. Any idea? If it is not possible, is there any other way to train my classifier with custom entities, e.g., with NLTK or other Python libraries?
EDIT: Code addition
This is what I did to set up and test Stanford NER which worked nicely:
from nltk.tag.stanford import StanfordNERTagger
path_to_model = "C:\..\stanford-ner-2016-10-31\classifiers\english.all.3class.distsim.crf.ser"
path_to_jar = "C:\..\stanford-ner-2016-10-31\stanford-ner.jar"
nertagger=StanfordNERTagger(path_to_model, path_to_jar)
query="Show me the best eye doctor in Munich"
print(nertagger.tag(query.split()))
This code worked successfully. Then I downloaded the sample austen.prop file and both the jane-austen-emma-ch1.tsv and jane-austen-emma-ch2.tsv files and put them in a custom folder inside the Stanford NER folder. I modified the jane-austen-emma-ch1.tsv file with my custom entity tags. The austen.prop file points to the jane-austen-emma-ch1.tsv file. I then modified the above code as follows, but it is not working:
from nltk.tag.stanford import StanfordNERTagger
path_to_model = "C:\..\stanford-ner-2016-10-31\custom/austen.prop"
path_to_jar = "C:\..\stanford-ner-2016-10-31\stanford-ner.jar"
nertagger=StanfordNERTagger(path_to_model, path_to_jar)
query="Show me the best eye doctor in Munich"
print(nertagger.tag(query.split()))
But this code is producing the following error:
Exception in thread "main" edu.stanford.nlp.io.RuntimeIOException: java.io.StreamCorruptedException: invalid stream header: 236C6F63
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1507)
    at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3017)
Caused by: java.io.StreamCorruptedException: invalid stream header: 236C6F63
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:808)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:301)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1462)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1494)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifierNoExceptions(AbstractSequenceClassifier.java:1505)
    ... 1 more
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : ['C:\\Program Files\\Java\\jdk1.8.0_111\\bin\\java.exe', '-mx1000m', '-cp', 'C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\stanford-ner-3.7.0-javadoc.jar;C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\stanford-ner-3.7.0-sources.jar;C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\stanford-ner-3.7.0.jar;C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\stanford-ner.jar;C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\lib\\joda-time.jar;C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\lib\\jollyday-0.4.9.jar;C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31\\lib\\stanford-ner-resources.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', 'C:/Users/HP/Desktop/Downloads1/Compressed/stanford-ner-2016-10-31/stanford-ner-2016-10-31/custom/austen.prop', '-textFile', 'C:\\Users\\HP\\AppData\\Local\\Temp\\tmppk8_741f', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']
The Stanford NER classifier is a Java program. NLTK's module is only an interface to the Java executable. So you train a model exactly as you did before (or as you saw done in the link you provide).
In your code, you are confusing the training of a model with its use to chunk new text. The .prop file contains instructions for training a new model; it is not itself a model. This is what I recommend:
Forget about Python/NLTK for the moment, and train a new model from the Windows command line (CMD prompt or whatever): follow the how-to you mention in your question to generate a serialized model (a .ser.gz file) named ner-model.ser.gz, or whatever you decide to call it, from your .prop file.
In your python code, set the path_to_model variable to point to the .ser file you generated in step 1.
If you really want to control the training process from python, you could use the subprocess module to issue the appropriate command line commands. But it sounds like you don't really need this; just try to understand what these steps do so that you can carry them out properly.
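If you do go the subprocess route, here is a minimal sketch of what I mean; the training command is the one from the Stanford CRF FAQ, the "C:\.." paths are placeholders as in the question, and ner-model.ser.gz is assumed to be the serializeTo target set in austen.prop:
import subprocess
from nltk.tag.stanford import StanfordNERTagger

# Step 1: train a CRF model from the .prop file (the CRF FAQ training command).
subprocess.run(
    ["java", "-cp", r"C:\..\stanford-ner-2016-10-31\stanford-ner.jar",
     "edu.stanford.nlp.ie.crf.CRFClassifier",
     "-prop", r"C:\..\stanford-ner-2016-10-31\custom\austen.prop"],
    check=True,
)

# Step 2: point the NLTK wrapper at the serialized model the trainer wrote
# (the serializeTo path in austen.prop), not at the .prop file itself.
path_to_model = r"C:\..\stanford-ner-2016-10-31\custom\ner-model.ser.gz"
path_to_jar = r"C:\..\stanford-ner-2016-10-31\stanford-ner.jar"
nertagger = StanfordNERTagger(path_to_model, path_to_jar)
print(nertagger.tag("Show me the best eye doctor in Munich".split()))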
I have installed v0.8.0 of tensorflow using pip, but when I try any of the skflow examples, they all fail due to
AttributeError: 'module' object has no attribute 'datasets'
which is a result of this:
from tensorflow.contrib import learn
### Training data
# Downloads, unpacks and reads DBpedia dataset.
dbpedia = learn.datasets.load_dataset('dbpedia')
Several people have encountered this. Please install the latest version, e.g., one of the recent nightly builds.
Run this from the command line:
pip3 install --upgrade http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl
I've found a less annoying way around this problem is to just download and load the data manually. It's quite easy, here is how I did it.
import pandas
from tensorflow.contrib import learn
# Downloads, unpacks and reads DBpedia dataset.
## dbpedia = learn.datasets.load_dataset('dbpedia')
## BUT THAT ABOVE FUNCTION DOESN'T WORK SO....
## MANUALLY DOWNLOAD THE DATA FROM THIS LINK:
## https://googledrive.com/host/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M/dbpedia_csv.tar.gz
## MANUALLY UNPACK THE DATA BY DOUBLE CLICKING IT
## make sure the paths are correct
## LOAD IT LIKE YOU WOULD A REGULAR CSV FILE.
train = pandas.read_csv('dbpedia_csv/train.csv', header=None)
X_train, y_train = train[2], train[0]
test = pandas.read_csv('dbpedia_csv/test.csv', header=None)
X_test, y_test = test[2], test[0]
Hi, I seem to have the same issue. I traced it to ~/skflow/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/__init__.py, which does not have dbpedia as a dataset, although the GitHub version of it does. I am using TensorFlow version 0.8.0.