Apache Storm streamparse in Windows - Python

I'm a newbie to Apache Storm.
I'm trying to run Apache Storm + streamparse on Windows 10, so I followed the quickstart guide:
(http://streamparse.readthedocs.io/en/master/quickstart.html)
First, I installed Python 3.5 and JDK 1.8.0_131.
Second, I downloaded Apache Storm 1.1.0 and extracted it.
Third, I installed Zookeeper 3.3.6.
Then I set the Windows environment variables like this:
JAVA_HOME=D:\dev\jdk1.8.0_131
STORM_HOME=D:\dev\apache-storm-1.1.0
LEIN_ROOT=D:\dev\leiningen-2.7.1-standalone
Path = %STORM_HOME%\bin;%JAVA_HOME%\bin;D:\Program Files\Python35;D:\Program Files\Python35\Lib\site-packages\;D:\Program Files\Python35\Scripts\;
PATHEXT=.PY;
Then, in cmd, I ran:
"zkServer.cmd"
"storm nimbus"
"storm supervisor"
"storm ui"
Those all start fine.
But when I run
"sparse quickstart wordcount"
"cd wordcount"
"sparse run"
I get the following traceback in cmd:
Traceback (most recent call last):
File "d:\program files\python35\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "d:\program files\python35\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\Program Files\Python35\Scripts\sparse.exe\__main__.py", line 9, in <module>
File "d:\program files\python35\lib\site-packages\streamparse\cli\sparse.py", line 71, in main
if os.getuid() == 0 and not os.getenv('LEIN_ROOT'):
AttributeError: module 'os' has no attribute 'getuid'
So I modified line 71 of sparse.py to "if not os.getenv('LEIN_ROOT'):".
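For reference, a minimal sketch of a more Windows-friendly version of that check (my edit simply removed the os.getuid() part; this variant keeps the root check on platforms that actually have os.getuid):

# sparse.py, around line 71 (os is already imported there):
# os.getuid() exists only on POSIX, so guard it instead of deleting the check.
if hasattr(os, 'getuid') and os.getuid() == 0 and not os.getenv('LEIN_ROOT'):
    ...  # original body of the check, not shown above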
In d:\dev\wordcount, jps shows:
2768 Supervisor
13396 QuorumPeerMain
1492 nimbus
8388 Flux
1016 core
12220 Jps
This is the log:
2017-07-18 15:07:02.731 o.a.s.z.Zookeeper main [INFO] Staring ZK Curator
2017-07-18 15:07:02.732 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl main [INFO] Starting
2017-07-18 15:07:02.733 o.a.s.s.o.a.z.ZooKeeper main [INFO] Initiating client connection, connectString=localhost:2000/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState#491f8831
2017-07-18 15:07:02.738 o.a.s.s.o.a.z.ClientCnxn main-SendThread(0:0:0:0:0:0:0:1:2000) [INFO] Opening socket connection to server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2000. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 15:07:02.740 o.a.s.s.o.a.z.ClientCnxn main-SendThread(0:0:0:0:0:0:0:1:2000) [INFO] Socket connection established to 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2000, initiating session
2017-07-18 15:07:02.730 o.a.s.s.o.a.z.s.NIOServerCnxn NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [WARN] caught end of stream exception
org.apache.storm.shade.org.apache.zookeeper.server.ServerCnxn$EndOfStreamException: Unable to read additional data from client sessionid 0x15d5485539a000b, likely client has closed socket
at org.apache.storm.shade.org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) [storm-core-1.1.0.jar:1.1.0]
at org.apache.storm.shade.org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) [storm-core-1.1.0.jar:1.1.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2017-07-18 15:07:02.745 o.a.s.s.o.a.z.s.NIOServerCnxn NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [INFO] Closed socket connection for client /127.0.0.1:3925 which had sessionid 0x15d5485539a000b
2017-07-18 15:07:02.747 o.a.s.s.o.a.z.s.NIOServerCnxnFactory NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [INFO] Accepted socket connection from /0:0:0:0:0:0:0:1:3928
2017-07-18 15:07:02.748 o.a.s.s.o.a.z.s.ZooKeeperServer NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [INFO] Client attempting to establish new session at /0:0:0:0:0:0:0:1:3928
2017-07-18 15:07:02.925 o.a.s.s.o.a.z.s.ZooKeeperServer SyncThread:0 [INFO] Established session 0x15d5485539a000c with negotiated timeout 20000 for client /0:0:0:0:0:0:0:1:3928
2017-07-18 15:07:02.925 o.a.s.s.o.a.z.ClientCnxn main-SendThread(0:0:0:0:0:0:0:1:2000) [INFO] Session establishment complete on server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2000, sessionid = 0x15d5485539a000c, negotiated timeout = 20000
2017-07-18 15:07:02.925 o.a.s.s.o.a.c.f.s.ConnectionStateManager main-EventThread [INFO] State change: CONNECTED
2017-07-18 15:07:02.948 o.a.s.l.Localizer main [INFO] Reconstruct localized resource: C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242\supervisor\usercache
2017-07-18 15:07:02.949 o.a.s.l.Localizer main [WARN] No left over resources found for any user during reconstructing of local resources at: C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242\supervisor\usercache
2017-07-18 15:07:02.950 o.a.s.d.s.Supervisor main [INFO] Starting Supervisor with conf {topology.builtin.metrics.bucket.size.secs=60, nimbus.childopts=-Xmx1024m, ui.filter.params=null, storm.cluster.mode=local, storm.messaging.netty.client_worker_threads=1, logviewer.max.per.worker.logs.size.mb=2048, supervisor.run.worker.as.user=false, topology.max.task.parallelism=null, topology.priority=29, zmq.threads=1, storm.group.mapping.service=org.apache.storm.security.auth.ShellBasedGroupsMapping, transactional.zookeeper.root=/transactional, topology.sleep.spout.wait.strategy.time.ms=1, scheduler.display.resource=false, topology.max.replication.wait.time.sec=60, drpc.invocations.port=3773, supervisor.localizer.cache.target.size.mb=10240, topology.multilang.serializer=org.apache.storm.multilang.JsonSerializer, storm.messaging.netty.server_worker_threads=1, nimbus.blobstore.class=org.apache.storm.blobstore.LocalFsBlobStore, resource.aware.scheduler.eviction.strategy=org.apache.storm.scheduler.resource.strategies.eviction.DefaultEvictionStrategy, topology.max.error.report.per.interval=5, storm.thrift.transport=org.apache.storm.security.auth.SimpleTransportPlugin, zmq.hwm=0, storm.group.mapping.service.params=null, worker.profiler.enabled=false, storm.principal.tolocal=org.apache.storm.security.auth.DefaultPrincipalToLocal, supervisor.worker.shutdown.sleep.secs=3, pacemaker.host=localhost, storm.zookeeper.retry.times=5, ui.actions.enabled=true, zmq.linger.millis=0, supervisor.enable=true, topology.stats.sample.rate=0.05, storm.messaging.netty.min_wait_ms=100, worker.log.level.reset.poll.secs=30, storm.zookeeper.port=2000, supervisor.heartbeat.frequency.secs=5, topology.enable.message.timeouts=true, supervisor.cpu.capacity=400.0, drpc.worker.threads=64, supervisor.blobstore.download.thread.count=5, task.backpressure.poll.secs=30, drpc.queue.size=128, topology.backpressure.enable=false, supervisor.blobstore.class=org.apache.storm.blobstore.NimbusBlobStore, storm.blobstore.inputstream.buffer.size.bytes=65536, topology.shellbolt.max.pending=100, drpc.https.keystore.password=, nimbus.code.sync.freq.secs=120, logviewer.port=8000, topology.scheduler.strategy=org.apache.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy, topology.executor.send.buffer.size=1024, resource.aware.scheduler.priority.strategy=org.apache.storm.scheduler.resource.strategies.priority.DefaultSchedulingPriorityStrategy, pacemaker.auth.method=NONE, storm.daemon.metrics.reporter.plugins=[org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter], topology.worker.logwriter.childopts=-Xmx64m, topology.spout.wait.strategy=org.apache.storm.spout.SleepSpoutWaitStrategy, ui.host=0.0.0.0, storm.nimbus.retry.interval.millis=2000, nimbus.inbox.jar.expiration.secs=3600, dev.zookeeper.path=/tmp/dev-storm-zookeeper, topology.acker.executors=null, topology.fall.back.on.java.serialization=true, topology.eventlogger.executors=0, supervisor.localizer.cleanup.interval.ms=600000, storm.zookeeper.servers=[localhost], nimbus.thrift.threads=64, logviewer.cleanup.age.mins=10080, topology.worker.childopts=null, topology.classpath=null, supervisor.monitor.frequency.secs=3, nimbus.credential.renewers.freq.secs=600, topology.skip.missing.kryo.registrations=true, drpc.authorizer.acl.filename=drpc-auth-acl.yaml, pacemaker.kerberos.users=[], storm.group.mapping.service.cache.duration.secs=120, blobstore.dir=C:\Users\BigTone\AppData\Local\Temp\d43926be-beb0-44af-9620-1d547b57a96d, topology.testing.always.try.serialize=false, 
nimbus.monitor.freq.secs=10, storm.health.check.timeout.ms=5000, supervisor.supervisors=[], topology.tasks=null, topology.bolts.outgoing.overflow.buffer.enable=false, storm.messaging.netty.socket.backlog=500, topology.workers=1, pacemaker.base.threads=10, storm.local.dir=C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242, worker.childopts=-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump, storm.auth.simple-white-list.users=[], topology.disruptor.batch.timeout.millis=1, topology.message.timeout.secs=30, topology.state.synchronization.timeout.secs=60, topology.tuple.serializer=org.apache.storm.serialization.types.ListDelegateSerializer, supervisor.supervisors.commands=[], nimbus.blobstore.expiration.secs=600, logviewer.childopts=-Xmx128m, topology.environment=null, topology.debug=false, topology.disruptor.batch.size=100, storm.disable.symlinks=false, storm.messaging.netty.max_retries=300, ui.childopts=-Xmx768m, storm.network.topography.plugin=org.apache.storm.networktopography.DefaultRackDNSToSwitchMapping, storm.zookeeper.session.timeout=20000, drpc.childopts=-Xmx768m, drpc.http.creds.plugin=org.apache.storm.security.auth.DefaultHttpCredentialsPlugin, storm.zookeeper.connection.timeout=15000, storm.zookeeper.auth.user=null, storm.meta.serialization.delegate=org.apache.storm.serialization.GzipThriftSerializationDelegate, topology.max.spout.pending=null, storm.codedistributor.class=org.apache.storm.codedistributor.LocalFileSystemCodeDistributor, nimbus.supervisor.timeout.secs=60, nimbus.task.timeout.secs=30, drpc.port=3772, pacemaker.max.threads=50, storm.zookeeper.retry.intervalceiling.millis=30000, nimbus.thrift.port=6627, storm.auth.simple-acl.admins=[], topology.component.cpu.pcore.percent=10.0, supervisor.memory.capacity.mb=3072.0, storm.nimbus.retry.times=5, supervisor.worker.start.timeout.secs=120, storm.zookeeper.retry.interval=1000, logs.users=null, storm.cluster.metrics.consumer.publish.interval.secs=60, worker.profiler.command=flight.bash, transactional.zookeeper.port=null, drpc.max_buffer_size=1048576, pacemaker.thread.timeout=10, task.credentials.poll.secs=30, blobstore.superuser=BigTone, drpc.https.keystore.type=JKS, topology.worker.receiver.thread.count=1, topology.state.checkpoint.interval.ms=1000, supervisor.slots.ports=[1027, 1028, 1029], topology.transfer.buffer.size=1024, storm.health.check.dir=healthchecks, topology.worker.shared.thread.pool.size=4, drpc.authorizer.acl.strict=false, nimbus.file.copy.expiration.secs=600, worker.profiler.childopts=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder, topology.executor.receive.buffer.size=1024, backpressure.disruptor.low.watermark=0.4, nimbus.task.launch.secs=120, storm.local.mode.zmq=false, storm.messaging.netty.buffer_size=5242880, storm.cluster.state.store=org.apache.storm.cluster_state.zookeeper_state_factory, worker.heartbeat.frequency.secs=1, storm.log4j2.conf.dir=log4j2, ui.http.creds.plugin=org.apache.storm.security.auth.DefaultHttpCredentialsPlugin, storm.zookeeper.root=/storm, topology.tick.tuple.freq.secs=null, drpc.https.port=-1, storm.workers.artifacts.dir=workers-artifacts, supervisor.blobstore.download.max_retries=3, task.refresh.poll.secs=10, storm.exhibitor.port=8080, task.heartbeat.frequency.secs=3, pacemaker.port=6699, storm.messaging.netty.max_wait_ms=1000, 
topology.component.resources.offheap.memory.mb=0.0, drpc.http.port=3774, topology.error.throttle.interval.secs=10, storm.messaging.transport=org.apache.storm.messaging.netty.Context, topology.disable.loadaware.messaging=false, storm.messaging.netty.authentication=false, topology.component.resources.onheap.memory.mb=128.0, topology.kryo.factory=org.apache.storm.serialization.DefaultKryoFactory, worker.gc.childopts=, nimbus.topology.validator=org.apache.storm.nimbus.DefaultTopologyValidator, nimbus.seeds=[localhost], nimbus.queue.size=100000, nimbus.cleanup.inbox.freq.secs=600, storm.blobstore.replication.factor=3, worker.heap.memory.mb=768, logviewer.max.sum.worker.logs.size.mb=4096, pacemaker.childopts=-Xmx1024m, ui.users=null, transactional.zookeeper.servers=null, supervisor.worker.timeout.secs=30, storm.zookeeper.auth.password=null, storm.blobstore.acl.validation.enabled=false, client.blobstore.class=org.apache.storm.blobstore.NimbusBlobStore, storm.thrift.socket.timeout.ms=600000, supervisor.childopts=-Xmx256m, topology.worker.max.heap.size.mb=768.0, ui.http.x-frame-options=DENY, backpressure.disruptor.high.watermark=0.9, ui.filter=null, ui.header.buffer.bytes=4096, topology.min.replication.count=1, topology.disruptor.wait.timeout.millis=1000, storm.nimbus.retry.intervalceiling.millis=60000, topology.trident.batch.emit.interval.millis=50, storm.auth.simple-acl.users=[], drpc.invocations.threads=64, java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib, ui.port=8080, storm.exhibitor.poll.uripath=/exhibitor/v1/cluster/list, storm.messaging.netty.transfer.batch.size=262144, logviewer.appender.name=A1, nimbus.thrift.max_buffer_size=1048576, storm.auth.simple-acl.users.commands=[], drpc.request.timeout.secs=600}
2017-07-18 15:07:03.115 o.a.s.d.s.Slot main [WARN] SLOT DESKTOP-PDE9HPE:1027 Starting in state EMPTY - assignment null
2017-07-18 15:07:03.115 o.a.s.d.s.Slot main [WARN] SLOT DESKTOP-PDE9HPE:1028 Starting in state EMPTY - assignment null
2017-07-18 15:07:03.115 o.a.s.d.s.Slot main [WARN] SLOT DESKTOP-PDE9HPE:1029 Starting in state EMPTY - assignment null
2017-07-18 15:07:03.115 o.a.s.l.AsyncLocalizer main [INFO] Cleaning up unused topologies in C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242\supervisor\stormdist
2017-07-18 15:07:03.115 o.a.s.d.s.Supervisor main [INFO] Starting supervisor with id 93ef51bb-5109-4f38-907f-495ccc7f552d at host DESKTOP-PDE9HPE.
2017-07-18 15:07:03.146 o.a.s.d.nimbus main [WARN] Topology submission exception. (topology name='topologies\wordcount') #error {
:cause nil
:via
[{:type org.apache.storm.generated.InvalidTopologyException
:message nil
:at [org.apache.storm.daemon.nimbus$validate_topology_name_BANG_ invoke nimbus.clj 1320]}]
:trace
[[org.apache.storm.daemon.nimbus$validate_topology_name_BANG_ invoke nimbus.clj 1320]
[org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782 submitTopologyWithOpts nimbus.clj 1643]
[org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782 submitTopology nimbus.clj 1726]
[sun.reflect.NativeMethodAccessorImpl invoke0 NativeMethodAccessorImpl.java -2]
[sun.reflect.NativeMethodAccessorImpl invoke NativeMethodAccessorImpl.java 62]
[sun.reflect.DelegatingMethodAccessorImpl invoke DelegatingMethodAccessorImpl.java 43]
[java.lang.reflect.Method invoke Method.java 498]
[clojure.lang.Reflector invokeMatchingMethod Reflector.java 93]
[clojure.lang.Reflector invokeInstanceMethod Reflector.java 28]
[org.apache.storm.testing$submit_local_topology invoke testing.clj 310]
[org.apache.storm.LocalCluster$_submitTopology invoke LocalCluster.clj 49]
[org.apache.storm.LocalCluster submitTopology nil -1]
[org.apache.storm.flux.Flux runCli Flux.java 207]
[org.apache.storm.flux.Flux main Flux.java 98]]}
So I changed the topology name to "wordcount" in tmp.yaml.
storm jar D:\dev\wordcount\_build\wordcount-0.0.1-SNAPSHOT-standalone.jar org.apache.storm.flux.Flux --local --no-splash --sleep 9223372036854775807 c:\users\bigtone\appdata\local\temp\tmpwodnya.yaml
02:23:26.894 [Thread-18-count_bolt2222222-executor[2 2]] ERROR org.apache.storm.util - Async loop died!
java.lang.RuntimeException: Error when launching multilang subprocess
Traceback (most recent call last):
File "d:\dev\python27\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "d:\dev\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "D:\dev\Python27\Scripts\streamparse_run.exe\__main__.py", line 9, in <module>
File "d:\dev\python27\lib\site-packages\streamparse\run.py", line 45, in main
cls(serializer=args.serializer).run()
File "d:\dev\python27\lib\site-packages\pystorm\bolt.py", line 68, in __init__
super(Bolt, self).__init__(*args, **kwargs)
File "d:\dev\python27\lib\site-packages\pystorm\component.py", line 211, in __init__
signal.signal(rdb_signal, remote_pdb_handler)
TypeError: an integer is required
How can I fix this?
Thanks for any tips in advance!

Related

Airflow Scheduler fails with error `sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (2006, 'MySQL server has gone away')`

I'm using Airflow 2.5.1 and setting up the Airflow scheduler (using LocalExecutor) plus the webserver on a Debian 9 instance.
The MySQL DB is on another instance, and I checked with ping and airflow db check that the connection to the MySQL server is successful. I even ran airflow db init from this instance and it created all the tables successfully.
When I start the scheduler with the airflow scheduler command, I get the following error.
airflow-scheduler.err log below:
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (2006, 'MySQL server has gone away')
(Background on this error at: https://sqlalche.me/e/14/e3q8)
[2023-02-09 01:12:34 +0530] [15451] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-02-09 01:12:34 +0530] [15451] [ERROR] Retrying in 1 second.
[2023-02-09 01:12:35 +0530] [15451] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-02-09 01:12:35 +0530] [15451] [ERROR] Retrying in 1 second.
[2023-02-09 01:12:36 +0530] [15451] [ERROR] Connection in use: ('0.0.0.0', 8793)
[2023-02-09 01:12:36 +0530] [15451] [ERROR] Retrying in 1 second.
[2023-02-09 01:12:37 +0530] [15451] [ERROR] Can't connect to ('0.0.0.0', 8793)
airflow-scheduler.log below:
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (2006, 'MySQL server has gone away')
(Background on this error at: https://sqlalche.me/e/14/e3q8)
2023-02-09 01:12:33,040 INFO - Shutting down LocalExecutor; waiting for running tasks to finish. Signal again if you don't want to wait.
2023-02-09 01:12:34,066 INFO - Sending Signals.SIGTERM to group 15492. PIDs of all processes in the group: [15492]
2023-02-09 01:12:34,066 INFO - Sending the signal Signals.SIGTERM to group 15492
2023-02-09 01:12:34,158 INFO - Process psutil.Process(pid=15492, status='terminated', exitcode=0, started='01:12:32') (15492) terminated with exit code 0
2023-02-09 01:12:34,159 INFO - Exited execute loop
Any idea why this is happening?
[Update]
Connection in use: ('0.0.0.0', 8793) was caused by processes left over from a previous run. I have killed those processes, but I'm still getting the MySQL server has gone away error.
airflow-scheduler.err log:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/airflow_venv/bin/airflow", line 8, in <module>
sys.exit(main())
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/__main__.py", line 39, in main
args.func(args)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/cli/cli_parser.py", line 52, in command
return func(*args, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/utils/cli.py", line 108, in wrapper
return f(*args, **kwargs)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/cli/commands/scheduler_command.py", line 68, in scheduler
_run_scheduler_job(args=args)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/cli/commands/scheduler_command.py", line 43, in _run_scheduler_job
job.run()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 258, in run
self._execute()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 759, in _execute
self._run_scheduler_loop()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 840, in _run_scheduler_loop
self.adopt_or_reset_orphaned_tasks()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/utils/session.py", line 75, in wrapper
return func(*args, session=session, **kwargs)
File "/usr/local/lib/python3.7/contextlib.py", line 119, in __exit__
next(self.gen)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/airflow/utils/session.py", line 36, in create_session
session.commit()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 1451, in commit
self._transaction.commit(_to_root=self.future)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/orm/session.py", line 836, in commit
trans.commit()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2459, in commit
self._do_commit()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2649, in _do_commit
self._connection_commit_impl()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2620, in _connection_commit_impl
self.connection._commit_impl()
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1091, in _commit_impl
self._handle_dbapi_exception(e, None, None, None, None)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2125, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
raise exception
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
self.engine.dialect.do_commit(self.connection)
File "/home/airflow/airflow_venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit
dbapi_connection.commit()
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (2006, 'MySQL server has gone away')
(Background on this error at: https://sqlalche.me/e/14/e3q8)
When running the LocalExecutor, the scheduler also starts a process to serve log files, by default on port 8793. The error tells you something is already running on port 8793, so the log server can't start and returns an error. You've probably already got a scheduler running.
The port is configurable via AIRFLOW__LOGGING__WORKER_LOG_SERVER_PORT. For more information, see https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#worker-log-server-port.
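As a quick way to confirm that, a small check like the one below (a hypothetical helper, not an Airflow API) tells you whether the default worker log server port 8793 is already taken before you start the scheduler:

import socket

def port_in_use(port, host="0.0.0.0"):
    # Try to bind the port; failure means another process already holds it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False

print(port_in_use(8793))  # True -> a stale scheduler/log server process is probably still running

If it reports True, stop the leftover process (as you did in your update) or move the log server to another port via the variable above.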

How to start a Ray cluster on one local server using a YAML config file, without Docker

Could anyone help me start Ray on a local server using a config file?
My current local server runs Ray successfully with the command below:
ray start --head --node-ip-address 127.0.0.1 --port 6379 --dashboard-host 0.0.0.0 --dashboard-port 8265 --gcs-server-port 8075 --object-manager-port 8076 --node-manager-port 8077 --min-worker-port 10002 --max-worker-port 19999
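For what it's worth, a minimal sketch of how a head node started this way can be checked from Python (assuming the address and port used in the command above):

import ray

# Attach to the manually started head node and print what it advertises.
ray.init(address="127.0.0.1:6379")
print(ray.cluster_resources())
ray.shutdown()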
But now I need to move it to a config file so that another service can control it. I tried creating a cluster.yaml file and starting it with the command ray up cluster.yaml, but it fails with an error.
The content of the cluster.yaml file is:
cluster_name: default
max_workers: 1
upscaling_speed: 1.0
idle_timeout_minutes: 5
provider:
  type: local
  head_ip: 0.0.0.0
  worker_ips:
    - 127.0.0.1
auth:
  ssh_user: root
  ssh_private_key: ~/.ssh/id_rsa
file_mounts: {}
cluster_synced_files: []
file_mounts_sync_continuously: False
rsync_exclude:
  - "**/.git"
  - "**/.git/**"
rsync_filter:
  - ".gitignore"
initialization_commands: []
setup_commands: []
head_setup_commands: []
worker_setup_commands: []
head_start_ray_commands:
  - ray stop
  - ray start --head --port 6379 --dashboard-host '0.0.0.0' --dashboard-port 8265 --gcs-server-port 8075 --object-manager-port 8076 --node-manager-port 8077 --min-worker-port 10002 --max-worker-port 19999 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
  - ray stop
  - ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
head_node: {}
worker_nodes: {}
But it fails to start with the error below:
<1/1> Setting up head node
Prepared bootstrap config
2022-01-27 05:52:26,910 INFO node_provider.py:103 -- ClusterState: Writing cluster state: ['127.0.0.1', '0.0.0.0']
[1/7] Waiting for SSH to become available
Running `uptime` as a test.
Fetched IP: 0.0.0.0
05:52:33 up 3:43, 0 users, load average: 0.10, 0.18, 0.16
Shared connection to 0.0.0.0 closed.
Success.
[2/7] Processing file mounts
[3/7] No worker file mounts to sync
[4/7] No initialization commands to run.
[5/7] Initalizing command runner
[6/7] No setup commands to run.
[7/7] Starting the Ray runtime
Shared connection to 0.0.0.0 closed.
Local node IP: 192.168.1.18
2022-01-27 05:52:37,541 INFO services.py:1340 -- View the Ray dashboard at http://192.168.1.18:8265
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/ray/node.py", line 240, in __init__
self.redis_password)
File "/usr/local/lib/python3.7/dist-packages/ray/_private/services.py", line 324, in wait_for_node
raise TimeoutError("Timed out while waiting for node to startup.")
TimeoutError: Timed out while waiting for node to startup.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/ray", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/ray/scripts/scripts.py", line 1989, in main
return cli()
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/ray/scripts/scripts.py", line 633, in start
ray_params, head=True, shutdown_at_exit=block, spawn_reaper=block)
File "/usr/local/lib/python3.7/dist-packages/ray/node.py", line 243, in __init__
"The current node has not been updated within 30 "
Exception: The current node has not been updated within 30 seconds, this could happen because of some of the Ray processes failed to startup.
Shared connection to 0.0.0.0 closed.
2022-01-27 05:53:07,718 INFO node_provider.py:103 -- ClusterState: Writing cluster state: ['127.0.0.1', '0.0.0.0']
New status: update-failed
!!!
SSH command failed.
!!!
Failed to setup head node.

Gunicorn autorestart causing errors

I am running gunicorn in async mode behind an nginx reverse proxy. Both are in separate Docker containers on the same VM in the host network, and everything runs fine as long as I don't configure max_requests to autorestart workers after a certain number of requests. With autorestart configured, the restart of workers is not handled correctly, throwing errors and causing failed responses. I need this setting to work around memory leaks and to prevent gunicorn and other application components from crashing.
Gunicorn log:
2020-08-07 06:55:23 [1438] [INFO] Autorestarting worker after current request.
2020-08-07 06:55:23 [1438] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/opt/mapproxy/lib/python3.5/site-packages/gunicorn/workers/base_async.py", line 65, in handle
util.reraise(*sys.exc_info())
File "/opt/mapproxy/lib/python3.5/site-packages/gunicorn/util.py", line 625, in reraise
raise value
File "/opt/mapproxy/lib/python3.5/site-packages/gunicorn/workers/base_async.py", line 38, in handle
listener_name = listener.getsockname()
OSError: [Errno 9] Bad file descriptor
Gunicorn is running with the following configuration:
bind = '0.0.0.0:8081'
worker_class = 'eventlet'
workers = 8
timeout = 60
no_sendfile = True
max_requests = 1000
max_requests_jitter = 500

gunicorn threads getting killed silently

gunicorn version 19.9.0
Got the following gunicorn config:
accesslog = "access.log"
worker_class = 'sync'
workers = 1
worker_connections = 1000
timeout = 300
graceful_timeout = 300
keepalive = 300
proc_name = 'server'
bind = '0.0.0.0:8080'
name = 'server.py'
preload = True
log_level = "info"
threads = 7
max_requests = 0
backlog = 100
As you can see, the server is configured to run 7 threads.
The server is started with:
gunicorn -c gunicorn_config.py server:app
Here are the number of lines and thread IDs from our log file at the beginning (with the last line being the thread of the main server):
10502 140625414080256
10037 140624842843904
9995 140624859629312
9555 140625430865664
9526 140624851236608
9409 140625405687552
2782 140625422472960
6 140628359804736
So 7 threads are processing the requests. (Already we can see that thread 140625422472960 is processing substantially fewer requests than the other threads.)
But after the lines examined above, thread 140625422472960 just vanishes and the log file only has:
19602 140624859629312
18861 140625405687552
18766 140624851236608
18765 140624842843904
12523 140625414080256
2111 140625430865664
(excluding the main thread here)
From the server logs we could see that the thread received a request and started processing it, but never finished. The client received no response either.
There is no error/warning in the log file, nor in stderr.
And after running the app a little longer, two more threads are gone:
102 140624842843904
102 140624851236608
68 140624859629312
85 140625405687552
How to debug this?
Digging further into the stderr logs, I finally found an exception stack trace like this:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
req = six.next(parser)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
super(Request, self).__init__(cfg, unreader)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
unused = self.parse(self.unreader)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
self.headers = self.parse_headers(data[:idx])
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
[2018-11-04 17:57:55 +0330] [31] [ERROR] Socket error processing request.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/gunicorn/workers/sync.py", line 134, in handle
req = six.next(parser)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/parser.py", line 41, in __next__
self.mesg = self.mesg_class(self.cfg, self.unreader, self.req_count)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 181, in __init__
super(Request, self).__init__(cfg, unreader)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 54, in __init__
unused = self.parse(self.unreader)
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 230, in parse
self.headers = self.parse_headers(data[:idx])
File "/usr/local/lib/python3.6/site-packages/gunicorn/http/message.py", line 74, in parse_headers
remote_addr = self.unreader.sock.getpeername()
OSError: [Errno 107] Transport endpoint is not connected
This is due to this gunicorn bug.
An interim solution until this bug is fixed is to monkey patch gunicorn as done by asantoni.
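A minimal sketch of what such a monkey patch can look like (my own illustration of the approach, not asantoni's exact code), guarding the sync worker's handle against the ENOTCONN error shown in the traceback above:

import errno

from gunicorn.workers.sync import SyncWorker

_original_handle = SyncWorker.handle

def _patched_handle(self, listener, client, addr):
    try:
        _original_handle(self, listener, client, addr)
    except OSError as exc:
        if exc.errno != errno.ENOTCONN:
            raise
        # The client was already gone before the headers could be read; drop
        # the request quietly instead of letting the exception propagate.
        self.log.debug("Ignoring ENOTCONN from disconnected client %r", addr)

SyncWorker.handle = _patched_handle

With preload = True as in the config above, importing this (for example at the top of server.py) applies the patch before the workers start handling requests.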

Twisted plugin needs to fail fast if port is taken

I have a twistd plugin that listens on a port and does very simple things. The problem is that when I start it, if the port is not available it just sits there with the process running but doing nothing. I need the process to exit immediately in this case so the larger system can notice and deal with the problem.
I have code like this:
from twisted.application import internet
from twisted.internet import endpoints, reactor
from twisted.web import server
from twisted.web.resource import Resource

def makeService(options):
    root = Resource()  # Not what I actually have...
    factory = server.Site(root)
    server_string = b'tcp:{0}:interface={1}'.format(options['port'], options['interface'])
    endpoint = endpoints.serverFromString(reactor, server_string)
    service = internet.StreamServerEndpointService(endpoint, factory)
    return service
This results in:
[2016-12-19T11:42:21-0600] [info] [3082] [-] Log opened.
[2016-12-19T11:42:21-0600] [info] [3082] [-] twistd 15.5.0 (/home/matthew/code-venvs/wgcbap/bin/python 2.7.6) starting up.
[2016-12-19T11:42:21-0600] [info] [3082] [-] reactor class: twisted.internet.epollreactor.EPollReactor.
[2016-12-19T11:42:21-0600] [critical] [3082] [-] Unhandled Error
Traceback (most recent call last):
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/scripts/_twistd_unix.py", line 394, in startApplication
service.IService(application).privilegedStartService()
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/application/service.py", line 278, in privilegedStartService
service.privilegedStartService()
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/application/internet.py", line 352, in privilegedStartService
self._waitingForPort = self.endpoint.listen(self.factory)
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/endpoints.py", line 457, in listen
interface=self._interface)
--- <exception caught here> ---
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 121, in execute
result = callable(*args, **kw)
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 478, in listenTCP
p.startListening()
File "/home/matthew/code-venvs/wgcbap/local/lib/python2.7/site-packages/twisted/internet/tcp.py", line 984, in startListening
raise CannotListenError(self.interface, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on 127.0.0.1:9999: [Errno 98] Address already in use.
And it continues to run, doing nothing....
Adding a line service._raiseSynchronously = True just above the return works, but seems to be undocumented and feels dirty.
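In context, that workaround looks like this (the same makeService as above, with only the one undocumented line added):

from twisted.application import internet
from twisted.internet import endpoints, reactor
from twisted.web import server
from twisted.web.resource import Resource

def makeService(options):
    root = Resource()  # Not what I actually have...
    factory = server.Site(root)
    server_string = b'tcp:{0}:interface={1}'.format(options['port'], options['interface'])
    endpoint = endpoints.serverFromString(reactor, server_string)
    service = internet.StreamServerEndpointService(endpoint, factory)
    # Private/undocumented flag: re-raise the CannotListenError synchronously
    # at startup so twistd exits instead of idling with no listener.
    service._raiseSynchronously = True
    return service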
Is there an approved way to do this?
