I am trying to run the playbook from https://github.com/Datanexus/dn-cassandra.
Of the deployment scenarios listed there, I am going for the multi-node Cassandra setup described in the repository's deployment scenarios.
I have set up a static inventory file:
cassandra-seed-01 ansible_ssh_host=192.168.0.17 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='keys/id_rsa'
cassandra-seed-02 ansible_ssh_host=192.168.0.18 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='keys/id_rsa'
cassandra-non-seed-01 ansible_ssh_host=192.168.0.22 ansible_ssh_port=22 ansible_ssh_user='root' ansible_ssh_private_key_file='keys/id_rsa'
[cassandra_seed]
192.168.0.17
192.168.0.18
[cassandra]
192.168.0.22
However, when I try to run the playbook, it throws the following error:
ERROR! no action detected in task
The error appears to have been in
'/home/laumair/workspace/dn-cassandra/provision-cassandra.yml': line
21, column 9, but may be elsewhere in the file depending on the exact
syntax problem.
The offending line appears to be:
# then, build the seed and non-seed host groups
- include_role:
^ here
I would appreciate any direction on this error. I have tried the solutions for similar errors, but with no luck so far.
include_role is available since Ansible 2.2.
Please upgrade your Ansible installation.
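A quick way to confirm the installed version is the first line of "ansible --version". The sketch below is a hypothetical helper (not part of Ansible) that parses that line and compares it with 2.2; it assumes the two output formats described in its docstring.

```python
import re
import subprocess

# include_role is available since Ansible 2.2, per the answer above.
MIN_VERSION = (2, 2)


def parse_ansible_version(version_output):
    """Extract (major, minor) from the first line of `ansible --version`.

    Older releases print e.g. "ansible 2.1.1.0"; ansible-core 2.11+ prints
    e.g. "ansible [core 2.12.1]". Both contain the first "N.N" we match.
    """
    first_line = version_output.strip().splitlines()[0]
    match = re.search(r"(\d+)\.(\d+)", first_line)
    if match is None:
        raise ValueError(f"unrecognised version line: {first_line!r}")
    return int(match.group(1)), int(match.group(2))


def supports_include_role(version_output):
    """True if the reported version is at least 2.2."""
    return parse_ansible_version(version_output) >= MIN_VERSION


def installed_supports_include_role():
    """Run the local `ansible` binary (assumed to be on PATH) and check it."""
    result = subprocess.run(["ansible", "--version"], capture_output=True,
                            text=True, check=True)
    return supports_include_role(result.stdout)
```

Calling installed_supports_include_role() on the control machine returns False for the 2.1.x series, which is the case the error message above points to.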
We have the following basic EKSPodOperator task on MWAA (Airflow version 2.2.2):
start_pod = EKSPodOperator(
    aws_conn_id="eks-connection",
    task_id='start_pod',
    namespace="airflow",
    cluster_name="eks-data-stg",
    in_cluster=False,
    service_account_name="airflow-sa",
    image='amazon/aws-cli:latest',
    cmds=['sh', '-c', 'echo Test Airflow; date'],
    labels={'demo': 'hello_world'},
    get_logs=True,
    # Delete the pod when it reaches its final state, or the execution is interrupted.
    is_delete_operator_pod=True,
)
This fails with the following error:
airflow.exceptions.AirflowConfigException: `[logging] logging_level` should not be 'fatal'. Possible values: CRITICAL, FATAL, ERROR, WARN, WARNING, INFO, DEBUG.
We traced this back to the following issue (https://github.com/apache/airflow/issues/21421), which is fixed in a newer version of the operator than the one we have.
Unfortunately, we are not able to override the airflow-providers-amazon package.
Has anyone found a way around this bug by either overriding the dependency or fixing the operator?
I have a Glue job (a Python shell job). When I try to run it, I get the following error:
Job Name : xxxxx Job Run Id : yyyyyy failed to execute with exception Internal service error : Invalid input provided
It is not specific to my code; I get it even if the script is just:
import boto3
print('loaded')
I get the error right after clicking the Run job option. What is the issue here?
It happened to me too, but the same job works in a different account.
The AWS documentation does not explain this error in much detail:
The input provided was not valid.
I suspect this is an Amazon issue, as Quatermass mentioned.
Same issue here in eu-west-2 yesterday; it's working now. This only happened with Python shell jobs, not PySpark ones, and the job runs didn't get as far as outputting any log streams. I can only assume it was an AWS issue that they've now fixed without issuing a service announcement.
I think Quatermass is right; the jobs started working again the next day without any changes.
I too received this super helpful error message.
What worked for me was explicitly setting properties like worker type, number of workers, Glue version and Python version.
In Terraform code:
resource "aws_glue_job" "my_job" {
  name              = "my_job"
  role_arn          = aws_iam_role.glue.arn
  worker_type       = "Standard"
  number_of_workers = 2
  glue_version      = "4.0"

  command {
    script_location = "s3://my-bucket/my-script.py"
    python_version  = "3"
  }

  default_arguments = {
    "--enable-job-insights"       = "true"
    "--additional-python-modules" = "boto3==1.26.52,pandas==1.5.2,SQLAlchemy==1.4.46,requests==2.28.2"
  }
}
Update
After doing some more digging, I realised that what I needed was a Python shell script Glue job, not an ETL (Spark) job. By choosing this flavour of job, setting the Python version to 3.9 and "ticking the box" for Glue's pre-installed analytics libraries, my script, incidentally, had access to all the libraries I needed.
My Terraform code ended up looking like this:
resource "aws_glue_job" "my_job" {
  name         = "my-job"
  role_arn     = aws_iam_role.glue.arn
  glue_version = "1.0"
  max_capacity = 1

  connections = [
    aws_glue_connection.redshift.name
  ]

  command {
    name            = "pythonshell"
    script_location = "s3://my-bucket/my-script.py"
    python_version  = "3.9"
  }

  default_arguments = {
    "--enable-job-insights" = "true"
    "--library-set"         = "analytics"
  }
}
Note that I have switched to using Glue version 1.0. I arrived at this after some trial and error, and could not find this explicitly stated as the compatible version for pythonshell jobs… but it works!
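For anyone creating the job with boto3 instead of Terraform, the same pythonshell configuration maps onto keyword arguments of the Glue client's create_job call. This is a sketch, not the answer author's code: pythonshell_job_kwargs is a hypothetical helper. It sizes the job with MaxCapacity, as the Terraform above does, rather than WorkerType/NumberOfWorkers, because Glue does not accept MaxCapacity together with those two settings.

```python
def pythonshell_job_kwargs(name, role_arn, script_location, connection_names=()):
    """Build create_job(**kwargs) arguments mirroring the Terraform resource above.

    Hypothetical helper; values are copied from the answer's Terraform code.
    """
    kwargs = {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "1.0",  # the version the answer arrived at by trial and error
        # Python shell jobs are sized with MaxCapacity (DPUs), so no
        # WorkerType/NumberOfWorkers are set here.
        "MaxCapacity": 1.0,
        "Command": {
            "Name": "pythonshell",
            "ScriptLocation": script_location,
            "PythonVersion": "3.9",
        },
        "DefaultArguments": {
            "--enable-job-insights": "true",
            "--library-set": "analytics",  # Glue's pre-installed analytics libraries
        },
    }
    if connection_names:
        # The Glue API wraps connection names in a {"Connections": [...]} object.
        kwargs["Connections"] = {"Connections": list(connection_names)}
    return kwargs
```

It would then be passed as boto3.client("glue").create_job(**pythonshell_job_kwargs(...)), with your own job name, role ARN, S3 path and connection names.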
Well, in my case I get this error from time to time without any clear reason. The only thing that seems to trigger it is modifying a job parameter and saving the changes: as soon as I save and try to run the job, I usually get this error, and the only way I have found to fix it is to delete the job and re-create it. Has anybody solved this issue another way? The accepted answer says the job simply began to work again without any manual action, which suggests the problem was a bug in AWS that has since been fixed.
I was facing a similar issue while invoking my job from a workflow. I solved it by adding WorkerType, GlueVersion and NumberOfWorkers to the job before adding the job to the workflow. The job failed consistently before this change and succeeded after it.
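A minimal sketch of the same fix applied to an existing job with boto3, assuming a Spark (glueetl) job that is fetched with get_job and rewritten with update_job. UpdateJob overwrites the whole job definition, so the helper copies a small, hand-picked set of fields (an assumption; extend it for your job) and then sets GlueVersion, WorkerType and NumberOfWorkers explicitly. MaxCapacity is deliberately dropped because it cannot be combined with WorkerType and NumberOfWorkers. Both function names are hypothetical.

```python
# Fields copied from the existing definition. All are valid JobUpdate keys;
# the selection itself is an assumption, extend it for your job.
CARRY_OVER = ("Description", "Role", "Command", "DefaultArguments",
              "Connections", "MaxRetries", "Timeout")


def explicit_capacity_update(existing_job, glue_version, worker_type,
                             number_of_workers):
    """Build a JobUpdate dict with GlueVersion, WorkerType and NumberOfWorkers set.

    existing_job is the "Job" dict returned by glue.get_job(JobName=...).
    MaxCapacity is not in CARRY_OVER, so it is dropped: Glue does not accept
    it together with WorkerType/NumberOfWorkers.
    """
    update = {key: existing_job[key] for key in CARRY_OVER if key in existing_job}
    update["GlueVersion"] = glue_version
    update["WorkerType"] = worker_type
    update["NumberOfWorkers"] = number_of_workers
    return update


def apply_explicit_capacity(glue_client, job_name, **capacity):
    """Fetch the job, then overwrite it with the explicit capacity settings."""
    job = glue_client.get_job(JobName=job_name)["Job"]
    glue_client.update_job(JobName=job_name,
                           JobUpdate=explicit_capacity_update(job, **capacity))
```

For example: apply_explicit_capacity(boto3.client("glue"), "my_job", glue_version="4.0", worker_type="G.1X", number_of_workers=2). The values are placeholders; pick the worker type your Glue version supports.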
When I try to reload firewalld, it reports:
Error: COMMAND_FAILED: 'python-nftables' failed: internal:0:0-0: Error: Could not process rule: Numerical result out of range
JSON blob:
{"nftables": [{"metainfo": {"json_schema_version": 1}}, {"add": {"chain": {"family": "inet", "table": "firewalld", "name": "filter_IN_policy_allow-host-ipv6"}}}]}
I don't know why this happens, and searching Google hasn't turned up a solution.
I had the same error message. I enabled verbose debug logging in firewalld and tailed the logs to a file for a deeper dive. In my case, the exception originated in "nftables.py" on line 361.
Exception:
2022-01-23 14:00:23 DEBUG3: <class 'firewall.core.nftables.nftables'>: calling python-nftables with JSON blob: {"nftables": [{"metainfo": {"json_schema_version": 1}}, {"add": {"chain": {"family": "inet", "table": "firewalld", "name": "filter_IN_policy_allow-host-ipv6"}}}]}
2022-01-23 14:00:23 DEBUG1: Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/firewall/core/fw.py", line 888, in rules
backend.set_rule(rule, self._log_denied)
File "/usr/lib/python3.6/site-packages/firewall/core/nftables.py", line 390, in set_rule
self.set_rules([rule], log_denied)
File "/usr/lib/python3.6/site-packages/firewall/core/nftables.py", line 361, in set_rules
raise ValueError("'%s' failed: %s\nJSON blob:\n%s" % ("python-nftables", error, json.dumps(json_blob)))
ValueError: 'python-nftables' failed: internal:0:0-0: Error: Could not process rule: Numerical result out of range
Line 361 in "nftables.py":
self._loader(config.FIREWALLD_POLICIES, "policy")
Why this is a problem:
Basically, nftables is a backend service and firewalld is a frontend service, and they depend on each other to function. Each time you restart firewalld, it has to reconcile the backend (in this case, nftables). At some point during that reconciliation, a conflict occurs in the Python code. Unfortunately, the only real solution will likely have to come from improvements to nftables in how it populates policies into chains and tables.
A work-around:
The good news is that if, like me, you don't use IPv6, you can simply disable the policy rather than solve the underlying issue. The work-around steps are below.
Work-around Steps:
The proper way to remove the policy is the command "firewall-cmd --delete-policy=allow-host-ipv6 --permanent", but I ran into other errors and Python exceptions when I tried it. Since I don't care about IPv6, I manually deleted the policy's XML configuration files and restarted the firewalld service:
rm /usr/lib/firewalld/policies/allow-host-ipv6.xml
rm /etc/firewalld/policies/allow-host-ipv6.xml
systemctl restart firewalld
Side Note:
Once I fixed this conflict, I also had some additional conflicts between nftables/iptables/fail2ban that had to be cleared up. For that I just used the command "fail2ban-client unban --all" to make fail2ban wipe clean all of the chains it added to iptables.
I have an AWS Glue job written in Python that pulls in the spark-xml library (through the Dependent jars path). I'm using spark-xml_2.11-0.2.0.jar. When I try to output my DataFrame to XML I get an error. The code I'm using is:
applymapping1.toDF().repartition(1).write.format("com.databricks.spark.xml").save("s3://glue.xml.output/Test.xml")
The error I get is:
File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/pyspark.zip/pyspark/sql/readwriter.py", line 550, in save
File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o75.save.
: java.lang.AbstractMethodError: com.databricks.spark.xml.DefaultSource15.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/Dataset;)Lorg/apache/spark/sql/sources/BaseRelation;
    at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
    ...
If I change it to CSV, it works fine:
applymapping1.toDF().repartition(1).write.format("com.databricks.spark.csv").save("s3://glue.xml.output/Test.xml")
Note: When using CSV I don't have to import spark-xml. I think spark-csv is included in AWS Glue's Spark environment.
Any suggestions to what to try?
I've tried various versions of spark-xml:
spark-xml_2.11-0.2.0
spark-xml_2.11-0.3.1
spark-xml_2.10-0.2.0
This question is very similar to (but not an exact duplicate of) "Why does elasticsearch-spark 5.5.0 give AbstractMethodError when submitting to YARN cluster?", which also deals with AbstractMethodError.
Quoting the javadoc of java.lang.AbstractMethodError:
Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.
That pretty much explains what you are experiencing (note the part that starts with "this error can only occur at run time").
I think it's a Spark version mismatch in play here.
Given com.databricks.spark.xml.DefaultSource15 in the stack trace and the change that does the following:
Remove the separated DefaultSource15 due to compatibility in Spark 1.5+
This removes DefaultSource15 and merge it into DefaultSource. This was separated for compatibility in Spark 1.5+ . In master and spark-xml 0.4.x, it dropped 1.x support.
You should make sure that the version of Spark in AWS Glue's Spark environment matches the Spark version your spark-xml build targets. The latest version of spark-xml, 0.4.1, was released on 6 Nov 2016.
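A minimal sketch of that check, based on the reasoning above rather than on any official compatibility table. Assumptions, labeled in the code too: spark-xml releases before 0.4 still carry the Spark 1.x DefaultSource15 class, so they are flagged for Spark 2.x; and the jar's Scala suffix (the "_2.11" in spark-xml_2.11-0.2.0.jar) must equal the runtime's Scala binary version. The runtime Spark version is available in PySpark as spark.version; the Scala version has to be looked up for your environment. The helper names are hypothetical.

```python
import re

# Artifact names look like spark-xml_<scala binary version>-<spark-xml version>.jar,
# e.g. spark-xml_2.11-0.2.0.jar.
JAR_RE = re.compile(r"spark-xml_(\d+\.\d+)-(\d+)\.(\d+)\.(\d+)\.jar$")


def parse_spark_xml_jar(filename):
    """Return (scala_binary_version, (major, minor, patch)) from a jar name."""
    m = JAR_RE.search(filename)
    if m is None:
        raise ValueError(f"not a spark-xml jar name: {filename!r}")
    return m.group(1), tuple(int(g) for g in m.group(2, 3, 4))


def compatibility_problems(jar_filename, spark_version, scala_version):
    """List likely mismatches between a spark-xml jar and the runtime.

    Assumption: spark-xml < 0.4 targets Spark 1.x (it still ships
    DefaultSource15), so it is flagged when running on Spark 2.x or later.
    """
    jar_scala, xml_version = parse_spark_xml_jar(jar_filename)
    problems = []
    spark_major = int(spark_version.split(".")[0])
    if spark_major >= 2 and xml_version < (0, 4, 0):
        shown = ".".join(map(str, xml_version))
        problems.append(f"spark-xml {shown} predates 0.4 (Spark 1.x era)")
    if jar_scala != scala_version:
        problems.append(f"jar built for Scala {jar_scala}, runtime uses Scala {scala_version}")
    return problems
```

With a hypothetical Spark 2.2.1 / Scala 2.11 runtime, all three jars tried in the question would be flagged, while spark-xml_2.11-0.4.1.jar would return an empty list.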
I'm trying to set spark.sql.parquet.output.committer.class and nothing I do seems to get the setting to take effect.
I'm trying to have many threads write to the same output folder, which would work with org.apache.spark.sql.parquet.DirectParquetOutputCommitter, since it wouldn't use the _temporary folder. I'm getting the following error, which is how I know it's not working:
Caused by: java.io.FileNotFoundException: File hdfs://path/to/stuff/_temporary/0/task_201606281757_0048_m_000029/some_dir does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:849)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:382)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
at org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:151)
Note the call to org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob, the default class.
I've tried the following, based on other SO answers and searches:
sc._jsc.hadoopConfiguration().set(key, val) (this does work for settings like parquet.enable.summary-metadata)
dataframe.write.option(key, val).parquet
Adding --conf "spark.hadoop.spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter" to the spark-submit call
Adding --conf "spark.sql.parquet.output.committer.class"=" org.apache.spark.sql.parquet.DirectParquetOutputCommitter" to the spark-submit call.
That's all I've been able to find, and nothing works. It looks like it's not hard to set in Scala but appears impossible in Python.
The approach in this comment definitely worked for me:
16/06/28 18:49:59 INFO ParquetRelation: Using user defined output committer for Parquet: org.apache.spark.sql.execution.datasources.parquet.DirectParquetOutputCommitter
It was a lost log message in the flood that Spark gives, and the error I was seeing was unrelated. It's all moot anyway, since the DirectParquetOutputCommitter has been removed from Spark.
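For reference, the package in the log line above (org.apache.spark.sql.execution.datasources.parquet) differs from the org.apache.spark.sql.parquet package used in the attempts listed in the question. The sketch below only builds spark-submit arguments as a list so that each --conf key=value pair stays a single argument with no stray spaces; it is not the approach from the linked comment, which is not shown here. The script name my_job.py is hypothetical, and since DirectParquetOutputCommitter was removed from Spark, this only applies to older Spark releases that still include it.

```python
import shlex

COMMITTER_KEY = "spark.sql.parquet.output.committer.class"
# Class name as it appears in the "Using user defined output committer" log line.
COMMITTER_CLASS = ("org.apache.spark.sql.execution.datasources.parquet."
                   "DirectParquetOutputCommitter")


def spark_submit_argv(script, confs):
    """Build a spark-submit argument list; each --conf value is one argv entry."""
    argv = ["spark-submit"]
    for key, value in confs.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(script)
    return argv


argv = spark_submit_argv("my_job.py", {COMMITTER_KEY: COMMITTER_CLASS})
# shlex.join (Python 3.8+) renders the list as a copy-pasteable shell command.
print(shlex.join(argv))
```

Passing the list directly to subprocess.run avoids shell quoting altogether, which is the usual source of the stray leading space seen in the last attempt in the question.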