torchserve model not running and giving a load of errors - python

I ran the following command:
torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json,./bert_model/vocab.txt" --handler "./handler.py"
I created all the required files, then created a new directory (the model store) and copied the model archive into it.
Then I executed the following command:
torchserve --start --model-store model_store --models bert=bert.mar
It then displayed a slew of errors.
Here is my error text. It is too long and repetitive, so I posted it on Pastebin:
error

I would suggest lowering the number of workers per model (Default workers per model: 12); right now it is set to the maximum number your machine is assumed to be able to handle.
How?
Go to your config.properties file and add the following line (it sets the workers per model to 2):
default_workers_per_model=2
Then, when you start TorchServe, add the --ts-config option to point to the location of your config.properties file:
torchserve --start \
--model-store ./deployment/model-store \
--ts-config ./deployment/config.properties \
--models bert=bert.mar
Let me know if this solves the error.
Note: you can add other parameters to the config.properties file as well, such as:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
default_workers_per_model=2
number_of_netty_threads=1
netty_client_threads=1
prefer_direct_buffer=true
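Once the server is up with the lowered worker count, a quick way to confirm that the model registered and responds is to hit TorchServe's management and inference REST endpoints. A minimal sketch, assuming the default ports shown above; the request body is a placeholder, since the exact payload depends on what your handler.py expects:
import requests

# Management API: list registered models and inspect the bert workers
print(requests.get("http://127.0.0.1:8081/models").json())
print(requests.get("http://127.0.0.1:8081/models/bert").json())

# Inference API: send a sample request to the "bert" model
resp = requests.post(
    "http://127.0.0.1:8080/predictions/bert",
    data="Hello world".encode("utf-8"),  # placeholder payload
)
print(resp.status_code, resp.text)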

Related

Azure IOTEdge - Raspberry Pi 4 (standard_init_linux.go:207: exec user process caused "exec format error")

I am trying to follow the https://github.com/Azure/ai-toolkit-iot-edge/blob/master/IoT%20Edge%20anomaly%20detection%20tutorial/00-anomaly-detection-tutorial.ipynb tutorial. The one difference is that I am pushing to a Raspberry Pi 4. The edgeHub and edgeAgent modules start fine, but my machinelearningmodule container fails.
sudo docker logs -f machinelearningmodule
standard_init_linux.go:207: exec user process caused "exec format error"
I have looked at this GitHub issue, which suggests using arm64v8/python images: https://github.com/emqx/emqx-docker/issues/108.
However, when I put this into the code...
image_config = ContainerImage.image_configuration(
    runtime="python",
    execution_script="iot_score.py",
    conda_file="myenv.yml",
    tags={'area': "iot", 'type': "classification"},
    description="IOT Edge anomaly detection demo",
    base_image='arm64v8/python'
)
I get the following error:
Step 2/25 : FROM arm64v8/python
no matching manifest for linux/amd64 in the manifest list entries
latest: Pulling from arm64v8/python
2020/02/18 17:48:41 Container failed during run: acb_step_0. No retries remaining.
So I'm guessing that is a dead end. Any suggestions on where to go from here?
P.S. I also tried this:
https://stackoverflow.com/questions/59000007/standard-init-linux-go207-exec-user-process-caused-exec-format-error
but sadly that doesn't work either.

How to link interactive problems (w.r.t. CodeJam)?

I'm not sure if it's allowed to ask for help (if not, I don't mind not getting an answer until the competition period is over).
I was solving the interactive problem (Dat Bae) on CodeJam. Locally, I can run the judge (testing_tool.py) and my program (<name>.py) separately and copy-paste the I/O manually. However, I assume I need to find a way to do this automatically.
Edit: To make it clear, I want every output of file x to be fed as input to file y, and vice versa.
Some details:
I've used sys.stdout.write / sys.stdin.readline instead of print / input throughout my program
I tried running interactive_runner.py, but I can't figure out how to use it.
I tried running it on their server, with my program in the first tab and the judge file in the second. It always throws a TLE error.
I can't find any tutorial on this either; any help will be appreciated! :/
The usage is documented in comments inside the scripts:
interactive_runner.py
# Run this as:
# python interactive_runner.py <cmd_line_judge> -- <cmd_line_solution>
#
# For example:
# python interactive_runner.py python judge.py 0 -- ./my_binary
#
# This will run the first test set of a python judge called "judge.py" that
# receives the test set number (starting from 0) via command line parameter
# with a solution compiled into a binary called "my_binary".
testing_tool.py
# Usage: `testing_tool.py test_number`, where the argument test_number
# is 0 for Test Set 1 or 1 for Test Set 2.
So use them like this:
python interactive_runner.py python testing_tool.py 0 -- ./dat_bae.py
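If the solution still times out once the two processes are wired together, the usual culprit is buffering: every query the solution writes must be flushed before it blocks waiting for the judge's reply. A rough sketch of the interaction loop; the query format below is made up and is not the actual Dat Bae protocol:
import sys

def ask(query):
    sys.stdout.write(query + "\n")
    sys.stdout.flush()  # without the flush the judge never sees the query and the run times out
    return sys.stdin.readline().strip()

t = int(sys.stdin.readline())
for _ in range(t):
    reply = ask("1 2 3")  # placeholder query, not the real protocol
    # ... use reply to decide the next query or the final answer ...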

OSError 105: No buffer space - Zeroconf

I'm using a NanoPi M1 (Allwinner H3 board) running a Yocto-based OS. On my first encounter with python-zeroconf,
>>> from zeroconf import Zeroconf, ServiceBrowser
>>> zero = Zeroconf()
I'm getting the error:
File "/usr/lib/python3.5/site-packages/zeroconf.py", line 1523, in __init__
socket.inet_aton(_MDNS_ADDR) + socket.inet_aton(i))
OSError: [Errno 105] No buffer space available
This error doesn't arise when I run it in Raspbian (on an RPi).
I've searched for fixes to similar errors in Home Assistant, but none give a good overview of the real problem, let alone the solution.
Update the net.ipv4.igmp_max_memberships sysctl value to something greater than zero.
Execute the following commands on the terminal:
$ sysctl -w net.ipv4.igmp_max_memberships=20 (or any other value greater than zero)
and
$ sysctl -w net.ipv4.igmp_max_msf=10
Then, restart the avahi-daemon
systemctl restart avahi-daemon
You can verify the existing values of the above keys using sysctl net.ipv4.igmp_max_memberships.
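After raising the limit and restarting avahi-daemon, re-running the original snippet should construct Zeroconf() without the error; a quick sanity check, assuming python-zeroconf is installed as before:
from zeroconf import Zeroconf

zero = Zeroconf()  # should no longer raise OSError: [Errno 105] No buffer space available
print("Zeroconf initialised successfully")
zero.close()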
An addition to Neelotpal's answer:
This post includes a nice solution proposal with all options to check for this problem:
# Bigger buffers (to make 40Gb more practical). These are maximums, but the default is unaffected.
net.core.wmem_max=268435456
net.core.rmem_max=268435456
net.core.netdev_max_backlog=10000
# Avoids problems with multicast traffic arriving on non-default interfaces
net.ipv4.conf.default.rp_filter=0
net.ipv4.conf.all.rp_filter=0
# Force IGMP v2 (required by CBF switch)
net.ipv4.conf.all.force_igmp_version=2
net.ipv4.conf.default.force_igmp_version=2
# Increase the ARP cache table
net.ipv4.neigh.default.gc_thresh3=4096
net.ipv4.neigh.default.gc_thresh2=2048
net.ipv4.neigh.default.gc_thresh1=1024
# Increase number of multicast groups permitted
net.ipv4.igmp_max_memberships=1024
I don't suggest blindly copying these values; instead, systematically test which one is limiting your resources:
use sysctl <property> to get the currently set value
verify whether the property is currently running at its limit by checking system stats (a small sketch of this check is given at the end of this answer)
change the configuration as described by Neelotpal with sysctl -w, or by editing /etc/sysctl.conf directly and reloading it via sysctl -p
In my case, increasing net.ipv4.igmp_max_memberships did the trick:
I checked the current value with sysctl net.ipv4.igmp_max_memberships, which was 20
I checked how many memberships there are with netstat -gn, realizing that my numerous Docker containers take up most of them
finally I increased the value in sysctl.conf, and it worked
And of course it is also good to read up on those properties to understand what they actually do, for example on sysctl-explorer.net.
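A small sketch of the "are we at the limit" check mentioned in the list above, assuming a Linux host with net-tools (netstat) installed; it compares the configured igmp_max_memberships limit with the number of IPv4 multicast groups currently joined:
import subprocess

# current kernel limit
with open("/proc/sys/net/ipv4/igmp_max_memberships") as f:
    limit = int(f.read())

# count IPv4 multicast group memberships reported by `netstat -gn`
out = subprocess.check_output(["netstat", "-gn"]).decode()
joined = 0
for line in out.splitlines():
    parts = line.split()
    if parts and parts[-1].count(".") == 3:  # crude test for an IPv4 group address
        joined += 1

print("IPv4 multicast memberships in use: {} / limit {}".format(joined, limit))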

Ansible ERROR! no action detected in task

I am trying to run a playbook: https://github.com/Datanexus/dn-cassandra
Among the different deployment scenarios listed there, I am going for the multi-node Cassandra setup described here: deployment scenarios.
I have setup a static inventory file.
cassandra-seed-01 ansible_ssh_host=192.168.0.17 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='keys/id_rsa'
cassandra-seed-02 ansible_ssh_host=192.168.0.18 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='keys/id_rsa'
cassandra-non-seed-01 ansible_ssh_host=192.168.0.22 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='keys/id_rsa'
[cassandra_seed]
192.168.0.17
192.168.0.18
[cassandra]
192.168.0.22
However, when I try running the playbook, it throws the following error:
ERROR! no action detected in task
The error appears to have been in '/home/laumair/workspace/dn-cassandra/provision-cassandra.yml': line 21, column 9, but may be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
# then, build the seed and non-seed host groups
- include_role:
^ here
I would appreciate any direction with this error, as I have tried solutions for similar errors but had no luck so far.
include_role is available since Ansible 2.2.
Please upgrade your Ansible installation.

How to Set spark.sql.parquet.output.committer.class in pyspark

I'm trying to set spark.sql.parquet.output.committer.class and nothing I do seems to get the setting to take effect.
I'm trying to have many threads write to the same output folder, which would work with org.apache.spark.sql.parquet.DirectParquetOutputCommitter since it wouldn't use the _temporary folder. I'm getting the following error, which is how I know it's not working:
Caused by: java.io.FileNotFoundException: File hdfs://path/to/stuff/_temporary/0/task_201606281757_0048_m_000029/some_dir does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:849)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:382)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
Note the call to org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob, the default class.
I've tried the following, based on other SO answers and searches:
sc._jsc.hadoopConfiguration().set(key, val) (this does work for settings like parquet.enable.summary-metadata)
dataframe.write.option(key, val).parquet
Adding --conf "spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter" to the spark-submit call
Adding --conf "spark.sql.parquet.output.committer.class"=" org.apache.spark.sql.parquet.DirectParquetOutputCommitter" to the spark-submit call.
That's all I've been able to find, and nothing works. It looks like it's not hard to set in Scala but appears impossible in Python.
The approach in this comment definitively worked for me:
16/06/28 18:49:59 INFO ParquetRelation: Using user defined output committer for Parquet: org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
It was a log message lost in the flood of output that Spark produces, and the error I was seeing was unrelated. It's all moot anyway, since DirectParquetOutputCommitter has been removed from Spark.
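For reference, on the Spark versions that still shipped the class, a SQL conf like this could also be set from PySpark through the SQLContext before any Parquet write; a rough sketch under that assumption, not necessarily the exact approach from the linked comment:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="committer-example")
sqlContext = SQLContext(sc)

# set the committer before any Parquet write is issued
sqlContext.setConf(
    "spark.sql.parquet.output.committer.class",
    "org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter",
)

df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("hdfs://path/to/output")  # placeholder path
Whether the setting actually took effect shows up as the INFO line quoted above ("Using user defined output committer for Parquet: ...").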
