Apache Storm + streamparse on Windows (Python)
I'm a newbie with Apache Storm. I'm trying to run Apache Storm + streamparse on Windows 10, following this quickstart:
(http://streamparse.readthedocs.io/en/master/quickstart.html)
First, I installed Python 3.5 and JDK 1.8.0_131.
Second, I downloaded Apache Storm 1.1.0 and extracted it.
Third, I downloaded ZooKeeper 3.3.6 and extracted it.
Then I set the Windows environment variables like this:
JAVA_HOME=D:\dev\jdk1.8.0_131
STORM_HOME=D:\dev\apache-storm-1.1.0
LEIN_ROOT=D:\dev\leiningen-2.7.1-standalone
Path = %STORM_HOME%\bin;%JAVA_HOME%\bin;D:\Program Files\Python35;D:\Program Files\Python35\Lib\site-packages\;D:\Program Files\Python35\Scripts\;
PATHEXT=.PY;
Then, in cmd, I ran:
"zkServer.cmd"
"storm nimbus"
"storm supervisor"
"storm ui"
Those all start fine. But when I run
"sparse quickstart wordcount"
"cd wordcount"
"sparse run"
I get this traceback in cmd:
Traceback (most recent call last):
File "d:\program files\python35\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "d:\program files\python35\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\Program Files\Python35\Scripts\sparse.exe\__main__.py", line 9, in <module>
File "d:\program files\python35\lib\site-packages\streamparse\cli\sparse.py", line 71, in main
if os.getuid() == 0 and not os.getenv('LEIN_ROOT'):
AttributeError: module 'os' has no attribute 'getuid'
So I modified line 71 of sparse.py to "if not os.getenv('LEIN_ROOT'):" to work around it.
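Rather than deleting the root check outright, a more portable patch would only run it on platforms where os.getuid() actually exists (it is POSIX-only, which is why it blows up on Windows). A minimal sketch of that idea, assuming the check's intent is "warn when running as root without LEIN_ROOT set" (the function name here is mine, not streamparse's):

```python
import os

def running_as_root_without_lein_root():
    # os.getuid() exists only on POSIX; guard it so the check is simply
    # skipped on Windows instead of raising AttributeError.
    return (hasattr(os, 'getuid')
            and os.getuid() == 0
            and not os.getenv('LEIN_ROOT'))
```
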
After that, running jps in d:\dev\wordcount shows:
2768 Supervisor
13396 QuorumPeerMain
1492 nimbus
8388 Flux
1016 core
12220 Jps
This is the log:
2017-07-18 15:07:02.731 o.a.s.z.Zookeeper main [INFO] Staring ZK Curator
2017-07-18 15:07:02.732 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl main [INFO] Starting
2017-07-18 15:07:02.733 o.a.s.s.o.a.z.ZooKeeper main [INFO] Initiating client connection, connectString=localhost:2000/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState#491f8831
2017-07-18 15:07:02.738 o.a.s.s.o.a.z.ClientCnxn main-SendThread(0:0:0:0:0:0:0:1:2000) [INFO] Opening socket connection to server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2000. Will not attempt to authenticate using SASL (unknown error)
2017-07-18 15:07:02.740 o.a.s.s.o.a.z.ClientCnxn main-SendThread(0:0:0:0:0:0:0:1:2000) [INFO] Socket connection established to 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2000, initiating session
2017-07-18 15:07:02.730 o.a.s.s.o.a.z.s.NIOServerCnxn NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [WARN] caught end of stream exception
org.apache.storm.shade.org.apache.zookeeper.server.ServerCnxn$EndOfStreamException: Unable to read additional data from client sessionid 0x15d5485539a000b, likely client has closed socket
at org.apache.storm.shade.org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) [storm-core-1.1.0.jar:1.1.0]
at org.apache.storm.shade.org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) [storm-core-1.1.0.jar:1.1.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2017-07-18 15:07:02.745 o.a.s.s.o.a.z.s.NIOServerCnxn NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [INFO] Closed socket connection for client /127.0.0.1:3925 which had sessionid 0x15d5485539a000b
2017-07-18 15:07:02.747 o.a.s.s.o.a.z.s.NIOServerCnxnFactory NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [INFO] Accepted socket connection from /0:0:0:0:0:0:0:1:3928
2017-07-18 15:07:02.748 o.a.s.s.o.a.z.s.ZooKeeperServer NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000 [INFO] Client attempting to establish new session at /0:0:0:0:0:0:0:1:3928
2017-07-18 15:07:02.925 o.a.s.s.o.a.z.s.ZooKeeperServer SyncThread:0 [INFO] Established session 0x15d5485539a000c with negotiated timeout 20000 for client /0:0:0:0:0:0:0:1:3928
2017-07-18 15:07:02.925 o.a.s.s.o.a.z.ClientCnxn main-SendThread(0:0:0:0:0:0:0:1:2000) [INFO] Session establishment complete on server 0:0:0:0:0:0:0:1/0:0:0:0:0:0:0:1:2000, sessionid = 0x15d5485539a000c, negotiated timeout = 20000
2017-07-18 15:07:02.925 o.a.s.s.o.a.c.f.s.ConnectionStateManager main-EventThread [INFO] State change: CONNECTED
2017-07-18 15:07:02.948 o.a.s.l.Localizer main [INFO] Reconstruct localized resource: C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242\supervisor\usercache
2017-07-18 15:07:02.949 o.a.s.l.Localizer main [WARN] No left over resources found for any user during reconstructing of local resources at: C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242\supervisor\usercache
2017-07-18 15:07:02.950 o.a.s.d.s.Supervisor main [INFO] Starting Supervisor with conf {topology.builtin.metrics.bucket.size.secs=60, nimbus.childopts=-Xmx1024m, ui.filter.params=null, storm.cluster.mode=local, storm.messaging.netty.client_worker_threads=1, logviewer.max.per.worker.logs.size.mb=2048, supervisor.run.worker.as.user=false, topology.max.task.parallelism=null, topology.priority=29, zmq.threads=1, storm.group.mapping.service=org.apache.storm.security.auth.ShellBasedGroupsMapping, transactional.zookeeper.root=/transactional, topology.sleep.spout.wait.strategy.time.ms=1, scheduler.display.resource=false, topology.max.replication.wait.time.sec=60, drpc.invocations.port=3773, supervisor.localizer.cache.target.size.mb=10240, topology.multilang.serializer=org.apache.storm.multilang.JsonSerializer, storm.messaging.netty.server_worker_threads=1, nimbus.blobstore.class=org.apache.storm.blobstore.LocalFsBlobStore, resource.aware.scheduler.eviction.strategy=org.apache.storm.scheduler.resource.strategies.eviction.DefaultEvictionStrategy, topology.max.error.report.per.interval=5, storm.thrift.transport=org.apache.storm.security.auth.SimpleTransportPlugin, zmq.hwm=0, storm.group.mapping.service.params=null, worker.profiler.enabled=false, storm.principal.tolocal=org.apache.storm.security.auth.DefaultPrincipalToLocal, supervisor.worker.shutdown.sleep.secs=3, pacemaker.host=localhost, storm.zookeeper.retry.times=5, ui.actions.enabled=true, zmq.linger.millis=0, supervisor.enable=true, topology.stats.sample.rate=0.05, storm.messaging.netty.min_wait_ms=100, worker.log.level.reset.poll.secs=30, storm.zookeeper.port=2000, supervisor.heartbeat.frequency.secs=5, topology.enable.message.timeouts=true, supervisor.cpu.capacity=400.0, drpc.worker.threads=64, supervisor.blobstore.download.thread.count=5, task.backpressure.poll.secs=30, drpc.queue.size=128, topology.backpressure.enable=false, supervisor.blobstore.class=org.apache.storm.blobstore.NimbusBlobStore, 
storm.blobstore.inputstream.buffer.size.bytes=65536, topology.shellbolt.max.pending=100, drpc.https.keystore.password=, nimbus.code.sync.freq.secs=120, logviewer.port=8000, topology.scheduler.strategy=org.apache.storm.scheduler.resource.strategies.scheduling.DefaultResourceAwareStrategy, topology.executor.send.buffer.size=1024, resource.aware.scheduler.priority.strategy=org.apache.storm.scheduler.resource.strategies.priority.DefaultSchedulingPriorityStrategy, pacemaker.auth.method=NONE, storm.daemon.metrics.reporter.plugins=[org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter], topology.worker.logwriter.childopts=-Xmx64m, topology.spout.wait.strategy=org.apache.storm.spout.SleepSpoutWaitStrategy, ui.host=0.0.0.0, storm.nimbus.retry.interval.millis=2000, nimbus.inbox.jar.expiration.secs=3600, dev.zookeeper.path=/tmp/dev-storm-zookeeper, topology.acker.executors=null, topology.fall.back.on.java.serialization=true, topology.eventlogger.executors=0, supervisor.localizer.cleanup.interval.ms=600000, storm.zookeeper.servers=[localhost], nimbus.thrift.threads=64, logviewer.cleanup.age.mins=10080, topology.worker.childopts=null, topology.classpath=null, supervisor.monitor.frequency.secs=3, nimbus.credential.renewers.freq.secs=600, topology.skip.missing.kryo.registrations=true, drpc.authorizer.acl.filename=drpc-auth-acl.yaml, pacemaker.kerberos.users=[], storm.group.mapping.service.cache.duration.secs=120, blobstore.dir=C:\Users\BigTone\AppData\Local\Temp\d43926be-beb0-44af-9620-1d547b57a96d, topology.testing.always.try.serialize=false, nimbus.monitor.freq.secs=10, storm.health.check.timeout.ms=5000, supervisor.supervisors=[], topology.tasks=null, topology.bolts.outgoing.overflow.buffer.enable=false, storm.messaging.netty.socket.backlog=500, topology.workers=1, pacemaker.base.threads=10, storm.local.dir=C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242, worker.childopts=-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log 
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump, storm.auth.simple-white-list.users=[], topology.disruptor.batch.timeout.millis=1, topology.message.timeout.secs=30, topology.state.synchronization.timeout.secs=60, topology.tuple.serializer=org.apache.storm.serialization.types.ListDelegateSerializer, supervisor.supervisors.commands=[], nimbus.blobstore.expiration.secs=600, logviewer.childopts=-Xmx128m, topology.environment=null, topology.debug=false, topology.disruptor.batch.size=100, storm.disable.symlinks=false, storm.messaging.netty.max_retries=300, ui.childopts=-Xmx768m, storm.network.topography.plugin=org.apache.storm.networktopography.DefaultRackDNSToSwitchMapping, storm.zookeeper.session.timeout=20000, drpc.childopts=-Xmx768m, drpc.http.creds.plugin=org.apache.storm.security.auth.DefaultHttpCredentialsPlugin, storm.zookeeper.connection.timeout=15000, storm.zookeeper.auth.user=null, storm.meta.serialization.delegate=org.apache.storm.serialization.GzipThriftSerializationDelegate, topology.max.spout.pending=null, storm.codedistributor.class=org.apache.storm.codedistributor.LocalFileSystemCodeDistributor, nimbus.supervisor.timeout.secs=60, nimbus.task.timeout.secs=30, drpc.port=3772, pacemaker.max.threads=50, storm.zookeeper.retry.intervalceiling.millis=30000, nimbus.thrift.port=6627, storm.auth.simple-acl.admins=[], topology.component.cpu.pcore.percent=10.0, supervisor.memory.capacity.mb=3072.0, storm.nimbus.retry.times=5, supervisor.worker.start.timeout.secs=120, storm.zookeeper.retry.interval=1000, logs.users=null, storm.cluster.metrics.consumer.publish.interval.secs=60, worker.profiler.command=flight.bash, transactional.zookeeper.port=null, drpc.max_buffer_size=1048576, pacemaker.thread.timeout=10, task.credentials.poll.secs=30, blobstore.superuser=BigTone, drpc.https.keystore.type=JKS, 
topology.worker.receiver.thread.count=1, topology.state.checkpoint.interval.ms=1000, supervisor.slots.ports=[1027, 1028, 1029], topology.transfer.buffer.size=1024, storm.health.check.dir=healthchecks, topology.worker.shared.thread.pool.size=4, drpc.authorizer.acl.strict=false, nimbus.file.copy.expiration.secs=600, worker.profiler.childopts=-XX:+UnlockCommercialFeatures -XX:+FlightRecorder, topology.executor.receive.buffer.size=1024, backpressure.disruptor.low.watermark=0.4, nimbus.task.launch.secs=120, storm.local.mode.zmq=false, storm.messaging.netty.buffer_size=5242880, storm.cluster.state.store=org.apache.storm.cluster_state.zookeeper_state_factory, worker.heartbeat.frequency.secs=1, storm.log4j2.conf.dir=log4j2, ui.http.creds.plugin=org.apache.storm.security.auth.DefaultHttpCredentialsPlugin, storm.zookeeper.root=/storm, topology.tick.tuple.freq.secs=null, drpc.https.port=-1, storm.workers.artifacts.dir=workers-artifacts, supervisor.blobstore.download.max_retries=3, task.refresh.poll.secs=10, storm.exhibitor.port=8080, task.heartbeat.frequency.secs=3, pacemaker.port=6699, storm.messaging.netty.max_wait_ms=1000, topology.component.resources.offheap.memory.mb=0.0, drpc.http.port=3774, topology.error.throttle.interval.secs=10, storm.messaging.transport=org.apache.storm.messaging.netty.Context, topology.disable.loadaware.messaging=false, storm.messaging.netty.authentication=false, topology.component.resources.onheap.memory.mb=128.0, topology.kryo.factory=org.apache.storm.serialization.DefaultKryoFactory, worker.gc.childopts=, nimbus.topology.validator=org.apache.storm.nimbus.DefaultTopologyValidator, nimbus.seeds=[localhost], nimbus.queue.size=100000, nimbus.cleanup.inbox.freq.secs=600, storm.blobstore.replication.factor=3, worker.heap.memory.mb=768, logviewer.max.sum.worker.logs.size.mb=4096, pacemaker.childopts=-Xmx1024m, ui.users=null, transactional.zookeeper.servers=null, supervisor.worker.timeout.secs=30, storm.zookeeper.auth.password=null, 
storm.blobstore.acl.validation.enabled=false, client.blobstore.class=org.apache.storm.blobstore.NimbusBlobStore, storm.thrift.socket.timeout.ms=600000, supervisor.childopts=-Xmx256m, topology.worker.max.heap.size.mb=768.0, ui.http.x-frame-options=DENY, backpressure.disruptor.high.watermark=0.9, ui.filter=null, ui.header.buffer.bytes=4096, topology.min.replication.count=1, topology.disruptor.wait.timeout.millis=1000, storm.nimbus.retry.intervalceiling.millis=60000, topology.trident.batch.emit.interval.millis=50, storm.auth.simple-acl.users=[], drpc.invocations.threads=64, java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib, ui.port=8080, storm.exhibitor.poll.uripath=/exhibitor/v1/cluster/list, storm.messaging.netty.transfer.batch.size=262144, logviewer.appender.name=A1, nimbus.thrift.max_buffer_size=1048576, storm.auth.simple-acl.users.commands=[], drpc.request.timeout.secs=600}
2017-07-18 15:07:03.115 o.a.s.d.s.Slot main [WARN] SLOT DESKTOP-PDE9HPE:1027 Starting in state EMPTY - assignment null
2017-07-18 15:07:03.115 o.a.s.d.s.Slot main [WARN] SLOT DESKTOP-PDE9HPE:1028 Starting in state EMPTY - assignment null
2017-07-18 15:07:03.115 o.a.s.d.s.Slot main [WARN] SLOT DESKTOP-PDE9HPE:1029 Starting in state EMPTY - assignment null
2017-07-18 15:07:03.115 o.a.s.l.AsyncLocalizer main [INFO] Cleaning up unused topologies in C:\Users\BigTone\AppData\Local\Temp\50d72755-68f7-4d97-86e8-a1e4c1035242\supervisor\stormdist
2017-07-18 15:07:03.115 o.a.s.d.s.Supervisor main [INFO] Starting supervisor with id 93ef51bb-5109-4f38-907f-495ccc7f552d at host DESKTOP-PDE9HPE.
2017-07-18 15:07:03.146 o.a.s.d.nimbus main [WARN] Topology submission exception. (topology name='topologies\wordcount') #error {
:cause nil
:via
[{:type org.apache.storm.generated.InvalidTopologyException
:message nil
:at [org.apache.storm.daemon.nimbus$validate_topology_name_BANG_ invoke nimbus.clj 1320]}]
:trace
[[org.apache.storm.daemon.nimbus$validate_topology_name_BANG_ invoke nimbus.clj 1320]
[org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782 submitTopologyWithOpts nimbus.clj 1643]
[org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10782 submitTopology nimbus.clj 1726]
[sun.reflect.NativeMethodAccessorImpl invoke0 NativeMethodAccessorImpl.java -2]
[sun.reflect.NativeMethodAccessorImpl invoke NativeMethodAccessorImpl.java 62]
[sun.reflect.DelegatingMethodAccessorImpl invoke DelegatingMethodAccessorImpl.java 43]
[java.lang.reflect.Method invoke Method.java 498]
[clojure.lang.Reflector invokeMatchingMethod Reflector.java 93]
[clojure.lang.Reflector invokeInstanceMethod Reflector.java 28]
[org.apache.storm.testing$submit_local_topology invoke testing.clj 310]
[org.apache.storm.LocalCluster$_submitTopology invoke LocalCluster.clj 49]
[org.apache.storm.LocalCluster submitTopology nil -1]
[org.apache.storm.flux.Flux runCli Flux.java 207]
[org.apache.storm.flux.Flux main Flux.java 98]]}
So I changed the topology name to "wordcount" in the tmp yaml file and ran again:
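For context on why the rename helps: the InvalidTopologyException above comes from Storm's topology-name validation, which (as far as I understand from nimbus.clj; treat the exact character set as an assumption) rejects names containing path-like characters such as the backslash in "topologies\wordcount". A sketch of an equivalent client-side check:

```python
import re

# Characters Storm's validate-topology-name! appears to reject
# (assumption based on the nimbus.clj behavior seen in the log).
INVALID = re.compile(r'[./:\\]')

def is_valid_topology_name(name):
    # Empty names and names with path-like characters are rejected.
    return bool(name) and not INVALID.search(name)

print(is_valid_topology_name(r'topologies\wordcount'))  # False
print(is_valid_topology_name('wordcount'))              # True
```
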
storm jar D:\dev\wordcount\_build\wordcount-0.0.1-SNAPSHOT-standalone.jar org.apache.storm.flux.Flux --local --no-splash --sleep 9223372036854775807 c:\users\bigtone\appdata\local\temp\tmpwodnya.yaml
02:23:26.894 [Thread-18-count_bolt2222222-executor[2 2]] ERROR org.apache.storm.util - Async loop died!
java.lang.RuntimeException: Error when launching multilang subprocess
Traceback (most recent call last):
File "d:\dev\python27\lib\runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "d:\dev\python27\lib\runpy.py", line 72, in _run_code
exec code in run_globals
File "D:\dev\Python27\Scripts\streamparse_run.exe\__main__.py", line 9, in <module>
File "d:\dev\python27\lib\site-packages\streamparse\run.py", line 45, in main
cls(serializer=args.serializer).run()
File "d:\dev\python27\lib\site-packages\pystorm\bolt.py", line 68, in __init__
super(Bolt, self).__init__(*args, **kwargs)
File "d:\dev\python27\lib\site-packages\pystorm\component.py", line 211, in __init__
signal.signal(rdb_signal, remote_pdb_handler)
TypeError: an integer is required
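My guess at what happens in pystorm's component.py (an assumption, not confirmed from its source): it looks up the remote-pdb debug signal by name with getattr, and on Windows signal.SIGUSR1 does not exist, so the lookup yields None and signal.signal(None, handler) raises "TypeError: an integer is required" on Python 2. A sketch of a guarded version:

```python
import signal

# On Windows, SIGUSR1 is not defined; getattr with a default avoids
# passing None into signal.signal().
rdb_signal = getattr(signal, 'SIGUSR1', None)
if rdb_signal is not None:
    # Only register the handler on platforms that support the signal.
    signal.signal(rdb_signal, lambda signum, frame: None)
```
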
How can I fix this? Thanks in advance for any tips!