I need to print headers around some strings. You can see it working fine here, but if a string is really long I would rather split it than print a much longer header.
+===================================================+
| Running: sdt_test                                 |
| Skipping:inquiry/"Inq VPD C0" mem/"Maint In Cmnd" |
+===================================================+
sh: /net/flanders/export/ws/ned/proto/bin/sdt_test: No such file or directory
+=====================+
| Running: dtd_tester |
+=====================+
sh: /net/flanders/export/ws/ned/proto/bin/dtd_tester: No such file or directory
+===============+
| Running: pssm |
+===============+
sh: /net/flanders/export/ws/ned/proto/bin/pssm: No such file or directory
+==============+
| Running: psm |
+==============+
sh: /net/flanders/export/ws/ned/proto/bin/psm: No such file or directory
+===============================================================================================================================================================================================================================================================================================================================================+
| Running: ssm |
| Skipping:"Secondary Subset Manager Tests"/"SSSM_3 Multi Sequence" "Secondary Subset Manager Tests"/"SSSM_2 Steady State" "Secondary Subset Manager Tests"/"SSSM_4 Test Abort" "Secondary Subset Manager Tests"/"SSSM_6 Test extend" "Secondary Subset Manager Tests"/"SSSM_9 exceptions" "Secondary Subset Manager Tests"/"SSSM_11 failed io" |
+===============================================================================================================================================================================================================================================================================================================================================+
It appears fine there, but I would like the SSM test split at a certain number of characters, maybe 100, or just on the whitespace between suites.
I'm really not sure how to do this; this is the code that currently does it:
# calculate lengths to make sure the header is the correct length
l1 = len(x)
l2 = len(y)
skip = False
if 'disable=' in test and 'disable="*"' not in test:
    skip = True
# if the entire test suite is to be disabled or not run
if disable:
    headerBreak = "+" + "="*(l1+12) + "+"
    print headerBreak
    print "| Skipping: %s |" % x
# if the test suite will be executed
else:
    if skip == False:
        l2 = 0
    headerBreak = "+" + "="*(max(l1,l2)+11) + "+"
    print headerBreak
    print "| Running: %s" % x, ' '*(l2-l1) + '|'
    # if some suites are disabled but some are still running
    if skip:
        print "| Skipping:%s |" % y
print headerBreak
sys.stdout.flush()
You can use the textwrap module to simplify this.
For example, if the max width were 44:
>>> import textwrap
>>> max_width = 44
>>> header='''Skipping:"Secondary Subset Manager Tests"/"SSSM_3 Multi Sequence" "Secondary Subset Manager Tests"/"SSSM_2 Steady State" "Secondary Subset Manager Tests"/"SSSM_4 Test Abort" "Secondary Subset Manager Tests"/"SSSM_6 Test extend" "Secondary Subset Manager Tests"/"SSSM_9 exceptions" "Secondary Subset Manager Tests"/"SSSM_11 failed io"'''
>>> h = ["Running: ssm"] + textwrap.wrap(header, width=max_width-4)
>>> maxh = len(max(h, key=len))
>>> print "+=" + "="*maxh + "=+"
+==========================================+
>>> for i in h:
...     print "| " + i.ljust(maxh) + " |"
...
| Running: ssm                              |
| Skipping:"Secondary Subset Manager        |
| Tests"/"SSSM_3 Multi Sequence"            |
| "Secondary Subset Manager Tests"/"SSSM_2  |
| Steady State" "Secondary Subset Manager   |
| Tests"/"SSSM_4 Test Abort" "Secondary     |
| Subset Manager Tests"/"SSSM_6 Test        |
| extend" "Secondary Subset Manager         |
| Tests"/"SSSM_9 exceptions" "Secondary     |
| Subset Manager Tests"/"SSSM_11 failed     |
| io"                                       |
>>> print "+=" + "="*maxh + "=+"
+==========================================+
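If you need this in more than one place, the same idea fits in a small helper. A sketch, assuming the box contents are passed in as a list (print_boxed and its parameter names are my own; y is the skip string from the question):

import textwrap

def print_boxed(lines, max_width=100):
    # wrap each logical line so it fits between the box borders
    wrapped = []
    for line in lines:
        wrapped.extend(textwrap.wrap(line, width=max_width - 4))
    inner = len(max(wrapped, key=len))
    print "+=" + "=" * inner + "=+"
    for line in wrapped:
        print "| " + line.ljust(inner) + " |"
    print "+=" + "=" * inner + "=+"

print_boxed(["Running: ssm", "Skipping:" + y], max_width=100)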
I'm trying to make a tic tac toe game in Python but whenever I try to print the board, I get a weird error. Here is my code:
import os
from colorama import Fore
board = " a b c\n | | \n1 - | - | - \n _____|_____|_____\n | | \n2 - | - | - \n _____|_____|_____\n | | \n3 - | - | - \n | | "
def board():
    print(board)

board()
And this is the error:
SIGQUIT: quit
PC=0x7f1a62b22792 m=0 sigcode=128
signal arrived during cgo execution
goroutine 1 [syscall, locked to thread]:
runtime.cgocall(0x4bb660, 0xc000066d90)
runtime/cgocall.go:156 +0x5c fp=0xc000066d68 sp=0xc000066d30 pc=0x40651c
main._Cfunc_PyRun_InteractiveLoopFlags(0x7f1a62be9800, 0x55555691b800, 0x0)
_cgo_gotypes.go:418 +0x4c fp=0xc000066d90 sp=0xc000066d68 pc=0x4b8e4c
main.Python.REPL.func2(0x55555691b800)
github.com/replit/prybar/languages/python3/main.go:122 +0x66 fp=0xc000066dd0 sp=0xc000066d90 pc=0x4bacc6
main.Python.REPL({})
github.com/replit/prybar/languages/python3/main.go:122 +0x99 fp=0xc000066e10 sp=0xc000066dd0 pc=0x4bac19
main.(*Python).REPL(0x5d56d0)
<autogenerated>:1 +0x2a fp=0xc000066e20 sp=0xc000066e10 pc=0x4bb36a
github.com/replit/prybar/utils.Language.REPL({{0x5075d0, 0x5d56d0}, {0x7ffd00c3ac79, 0x4e1680}})
github.com/replit/prybar/utils/language.go:100 +0x5d fp=0xc000066e78 sp=0xc000066e20 pc=0x4b825d
github.com/replit/prybar/utils.DoCli({0x5075d0, 0x5d56d0})
github.com/replit/prybar/utils/utils.go:77 +0x3d7 fp=0xc000066f60 sp=0xc000066e78 pc=0x4b8a97
main.main()
github.com/replit/prybar/languages/python3/generated_launch.go:7 +0x27 fp=0xc000066f80 sp=0xc000066f60 pc=0x4b8bc7
<function board at 0x7f5f840e2280>
I haven't seen an error like this before and am wondering what it means and how to fix it. Thanks for the help!
The problem is that you are naming everything "board".
When you do "def board()" after "board = ...", you rebind the name board to the function, so it no longer refers to the string. Inside the function, print(board) then prints the function object itself, which is where the <function board at 0x7f5f840e2280> line comes from.
The long SIGQUIT traceback is produced by the Go-based REPL wrapper your environment runs (note the github.com/replit/prybar frames), not by your Python code. Give the string and the function different names.
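A minimal fixed version, dropping the unused imports and giving the function its own name (print_board is my choice; any name other than board works):

board = " a b c\n | | \n1 - | - | - \n _____|_____|_____\n | | \n2 - | - | - \n _____|_____|_____\n | | \n3 - | - | - \n | | "

def print_board():
    # the name no longer shadows the string, so this prints the board
    print(board)

print_board()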
I am solving a problem using Spark running on my local machine.
I am reading a Parquet file from the local disk and storing it in a DataFrame.
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
spark = SparkSession.builder \
    .config("spark.driver.memory", "4g") \
    .config("spark.executor.memory", "4g") \
    .config("spark.driver.maxResultSize", "2g") \
    .getOrCreate()

content = spark.read.parquet('./files/file')
So the content DataFrame contains around 500k rows, i.e.
+-----------+----------+
|EMPLOYEE_ID|MANAGER_ID|
+-----------+----------+
| 100| 0|
| 101| 100|
| 102| 100|
| 103| 100|
| 104| 100|
| 105| 100|
| 106| 101|
| 101| 101|
| 101| 101|
| 101| 101|
| 101| 102|
| 101| 102|
. .
. .
. .
I wrote this code to give each EMPLOYEE_ID an EMPLOYEE_LEVEL according to the hierarchy:
# Assign EMPLOYEE_LEVEL 1 when MANAGER_ID is 0, else NULL
content_df = content.withColumn("EMPLOYEE_LEVEL", when(col("MANAGER_ID") == 0, 1).otherwise(lit(None)))
level_df = content_df.select("*").filter("EMPLOYEE_LEVEL = 1")
level = 1
while True:
    ldf = level_df
    temp_df = content_df.join(
        ldf,
        ((ldf["EMPLOYEE_LEVEL"] == level) &
         (ldf["EMPLOYEE_ID"] == content_df["MANAGER_ID"])),
        "left") \
        .withColumn("EMPLOYEE_LEVEL", ldf["EMPLOYEE_LEVEL"] + 1) \
        .select("EMPLOYEE_ID", "MANAGER_ID", "EMPLOYEE_LEVEL") \
        .filter("EMPLOYEE_LEVEL IS NOT NULL") \
        .distinct()
    if temp_df.count() == 0:
        break
    level_df = level_df.union(temp_df)
    level += 1
It runs, but execution is very slow, and after some period of time it gives this error:
Py4JJavaError: An error occurred while calling o383.count.
: java.lang.OutOfMemoryError: Java heap space
at scala.collection.immutable.List.$colon$colon(List.scala:117)
at scala.collection.immutable.List.$plus$colon(List.scala:220)
at org.apache.spark.sql.catalyst.expressions.String2TrimExpression.children(stringExpressions.scala:816)
at org.apache.spark.sql.catalyst.expressions.String2TrimExpression.children$(stringExpressions.scala:816)
at org.apache.spark.sql.catalyst.expressions.StringTrim.children(stringExpressions.scala:948)
at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:351)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:595)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1822/0x0000000100d21040.apply(Unknown Source)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.TraversableLike$$Lambda$61/0x00000001001d2040.apply(Unknown Source)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:595)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1822/0x0000000100d21040.apply(Unknown Source)
at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1148)
at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1147)
at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:486)
at org.apache.spark.sql.catalyst.trees.TreeNode$$Lambda$1822/0x0000000100d21040.apply(Unknown Source)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1122)
at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1121)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:486)
I tried many solutions, including increasing driver and executor memory; using cache() and persist() on the DataFrame also didn't work for me.
I am using Spark 3.2.1.
Any help will be appreciated.
Thank you.
I figured out the problem. This error is related to Spark's DAG mechanism: Spark uses DAG lineage to track a series of transformations, and when an algorithm iterates, the lineage can grow quickly and hit the memory limit. So breaking the lineage is necessary when implementing iterative algorithms.
There are mainly two ways to do this: 1. add a checkpoint; 2. recreate the DataFrame.
I modified my code as below, which just adds a checkpoint to break the lineage, and it works for me.
# Note: checkpoint() requires a checkpoint directory, e.g.
# spark.sparkContext.setCheckpointDir('/tmp/spark-checkpoints')
epoch_cnt = 0
previous_kernel_cnt = 0  # must be initialised before the loop
while True:
    print('hahaha1')
    print('cached df', len(spark.sparkContext._jsc.getPersistentRDDs().items()))
    singer_pairs_undirected_ungrouped = singer_pairs_undirected.join(
            old_song_group_kernel,
            on=singer_pairs_undirected['src'] == old_song_group_kernel['id'],
            how='left') \
        .filter(F.col('id').isNull()) \
        .select('src', 'dst')
    windowSpec = Window.partitionBy("src").orderBy(F.col("song_group_id_cnt").desc())
    singer_pairs_vote = singer_pairs_undirected_ungrouped.join(
            old_song_group_kernel,
            on=singer_pairs_undirected_ungrouped['dst'] == old_song_group_kernel['id'],
            how='inner') \
        .groupBy('src', 'song_group_id') \
        .agg(F.count('song_group_id').alias('song_group_id_cnt')) \
        .withColumn('song_group_id_cnt_rnk', F.row_number().over(windowSpec)) \
        .filter(F.col('song_group_id_cnt_rnk') == 1)
    singer_pairs_vote_output = singer_pairs_vote.select('src', 'song_group_id') \
        .withColumnRenamed('src', 'id')
    print('hahaha5')
    new_song_group_kernel = old_song_group_kernel.union(singer_pairs_vote_output) \
        .select('id', 'song_group_id').dropDuplicates().persist().checkpoint()
    print('hahaha9')
    current_kernel_cnt = new_song_group_kernel.count()
    print('hahaha2')
    old_song_group_kernel.unpersist()
    print('hahaha3')
    old_song_group_kernel = new_song_group_kernel
    epoch_cnt += 1
    print('epoch rounds: ', epoch_cnt)
    print('previous kernel count: ', previous_kernel_cnt)
    print('current kernel count: ', current_kernel_cnt)
    if current_kernel_cnt <= previous_kernel_cnt:
        print('Iteration done !')
        break
    print('hahaha4')
    previous_kernel_cnt = current_kernel_cnt
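Applied back to the EMPLOYEE_LEVEL loop from the question, the same lineage-breaking fix would look roughly like this (a sketch only; the checkpoint directory path is an arbitrary choice, and content_df/level_df are as defined in the question):

spark.sparkContext.setCheckpointDir('/tmp/spark-checkpoints')

level = 1
while True:
    ldf = level_df
    temp_df = content_df.join(
        ldf,
        ((ldf["EMPLOYEE_LEVEL"] == level) &
         (ldf["EMPLOYEE_ID"] == content_df["MANAGER_ID"])),
        "left") \
        .withColumn("EMPLOYEE_LEVEL", ldf["EMPLOYEE_LEVEL"] + 1) \
        .select("EMPLOYEE_ID", "MANAGER_ID", "EMPLOYEE_LEVEL") \
        .filter("EMPLOYEE_LEVEL IS NOT NULL") \
        .distinct() \
        .checkpoint()  # materializes the result and truncates the lineage
    if temp_df.count() == 0:
        break
    level_df = level_df.union(temp_df)
    level += 1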
I'm currently using code from OpenAI baselines to train a model, using the following code in my train.py:
from baselines.common import tf_util as U
import tensorflow as tf
import gym, logging
from visak_dartdeepmimic import VisakDartDeepMimicArgParse

def train(env, initial_params_path,
          save_interval, out_prefix, num_timesteps, num_cpus):
    from baselines.ppo1 import mlp_policy, pposgd_simple
    sess = U.make_session(num_cpu=num_cpus).__enter__()
    U.initialize()

    def policy_fn(name, ob_space, ac_space):
        print("Policy with name: ", name)
        policy = mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                      hid_size=64, num_hid_layers=2)
        saver = tf.train.Saver()
        if initial_params_path is not None:
            print("Tried to restore from ", initial_params_path)
            saver.restore(tf.get_default_session(), initial_params_path)
        return policy

    def callback_fn(local_vars, global_vars):
        iters = local_vars["iters_so_far"]
        saver = tf.train.Saver()
        if iters % save_interval == 0:
            saver.save(sess, out_prefix + str(iters))

    pposgd_simple.learn(env, policy_fn,
                        max_timesteps=num_timesteps,
                        callback=callback_fn,
                        timesteps_per_actorbatch=2048,
                        clip_param=0.2, entcoeff=0.0,
                        optim_epochs=10, optim_stepsize=3e-4, optim_batchsize=64,
                        gamma=1.0, lam=0.95, schedule='linear',
                        )
    env.close()
This is based on the code that OpenAI itself provides in the baselines repository.
It works fine, except that I get some pretty weird-looking learning curves, which I suspect are due to some of the hyperparameters passed to the learn function causing performance to decay / high variance as training goes on (though I don't know for certain).
Anyway, to confirm this hypothesis, I'd like to retrain the model, but not from scratch: I'd like to start it off from a high point, say iteration 1600, for which I have a saved model lying around (having saved it with saver.save in callback_fn).
So now I call the train function, but this time I provide it with an initial_params_path pointing to the save prefix for iteration 1600. By my understanding, the call to saver.restore in policy_fn should reset the model to where it was at iteration 1600 (and I've confirmed that the load routine runs, using the print statement).
However, in practice I find that it's almost like nothing gets loaded. For instance, if I got statistics like
----------------------------------
| EpLenMean | 74.2 |
| EpRewMean | 38.7 |
| EpThisIter | 209 |
| EpisodesSoFar | 662438 |
| TimeElapsed | 2.15e+04 |
| TimestepsSoFar | 26230266 |
| ev_tdlam_before | 0.95 |
| loss_ent | 2.7640965 |
| loss_kl | 0.09064759 |
| loss_pol_entpen | 0.0 |
| loss_pol_surr | -0.048767302 |
| loss_vf_loss | 3.8620138 |
----------------------------------
for iteration 1600, then for iteration 1 of the new trial (ostensibly using 1600's parameters as a starting point), I get something like
----------------------------------
| EpLenMean | 2.12 |
| EpRewMean | 0.486 |
| EpThisIter | 7676 |
| EpisodesSoFar | 7676 |
| TimeElapsed | 12.3 |
| TimestepsSoFar | 16381 |
| ev_tdlam_before | -4.47 |
| loss_ent | 45.355236 |
| loss_kl | 0.016298374 |
| loss_pol_entpen | 0.0 |
| loss_pol_surr | -0.039200217 |
| loss_vf_loss | 0.043219414 |
----------------------------------
which is back to square one (this is around where my models trained from scratch start)
The funny thing is that I know the model is being saved properly, at least, since I can actually replay it using eval.py:
from baselines.common import tf_util as U
from baselines.ppo1 import mlp_policy, pposgd_simple
import numpy as np
import tensorflow as tf
from visak_dartdeepmimic import VisakDartDeepMimicArgParse  # needed for the parser below

class PolicyLoaderAgent(object):
    """The world's simplest agent!"""
    def __init__(self, param_path, obs_space, action_space):
        self.action_space = action_space
        self.actor = mlp_policy.MlpPolicy("pi", obs_space, action_space,
                                          hid_size=64, num_hid_layers=2)
        U.initialize()
        saver = tf.train.Saver()
        saver.restore(tf.get_default_session(), param_path)

    def act(self, observation, reward, done):
        action2, unknown = self.actor.act(False, observation)
        return action2

if __name__ == "__main__":
    parser = VisakDartDeepMimicArgParse()
    parser.add_argument("--params-prefix", required=True, type=str)
    args = parser.parse_args()
    env = parser.get_env()
    U.make_session(num_cpu=1).__enter__()
    U.initialize()
    agent = PolicyLoaderAgent(args.params_prefix, env.observation_space, env.action_space)
    while True:
        ob = env.reset(0, pos_stdv=0, vel_stdv=0)
        reward, done = 0, False  # initialised before the first act() call
        while not done:
            action = agent.act(ob, reward, done)
            ob, reward, done, _ = env.step(action)
            env.render()
and I can clearly see that it has learned something compared to an untrained baseline. The loading logic is the same across both files (or rather, if there's a mistake there, then I can't find it), so it seems probable that train.py correctly loads the model and then, due to something in the pposgd_simple.learn function, promptly forgets about it.
Could anyone shed some light on this situation?
Not sure if this is still relevant, since the baselines repository has changed quite a bit since this question was posted, but it seems that you are not actually initialising the variables before restoring them. Try moving the call to U.initialize() inside your policy_fn:
def policy_fn(name, ob_space, ac_space):
    print("Policy with name: ", name)
    policy = mlp_policy.MlpPolicy(name=name, ob_space=ob_space,
                                  ac_space=ac_space, hid_size=64, num_hid_layers=2)
    saver = tf.train.Saver()
    if initial_params_path is not None:
        print("Tried to restore from ", initial_params_path)
        U.initialize()
        saver.restore(tf.get_default_session(), initial_params_path)
    return policy
I am a newbie to pyparsing; I have been reading the examples, looking around here, and trying some things out. I created a grammar and provided a buffer. I do, however, have a heavy background in lex/yacc from the old days.
I have a general question or two.
I'm currently seeing
ParseException: Expected end of line (at char 7024), (line 213, col:2)
and then it terminates
Because of the nature of my buffer, newlines have meaning, so I did:
ParserElement.setDefaultWhitespaceChars('')  # <-- zero-length string
Does this error mean that somewhere in my productions there is a rule looking for a LineEnd(), and that rule happens to somehow be 'last'?
The location where it dies is the end of the file. I tried using parseFile, but my file contains chars > ord(127), so instead I load it into memory, filter out all chars > ord(127), and then call parseString.
I tried turning on verbose_stacktrace=True for some of the elements of my grammar where I thought the problem originated.
Is there a better way to track down the exact ParserElement being matched when an error like this occurs? Or can I get a stack of the most recently recognized productions?
Edit: my crash is this:
[centos#new-host /tmp/sample]$ ./zooparser.py
!(zooparser.py) TEST test1: valid message type START
Ready to roll
Parsing This message: ( ignore leading>>> and trailing <<< ) >>>
ZOO/STATUS/FOOD ALLOCATION//
TOPIC/BIRD FEED IS RUNNING LOW//
FREE/WE HAVE DISCOVERED MOTHS INFESTED THE BIRDSEED AND IT IS NO
LONGER USABLE.//
<<<
Match {Group:({Group:({Group:({[LineEnd]... "ZOO" Group:({[LineEnd]... "/" [Group:({{{W:(abcd...) | LineEnd | "://" | " " | W:(!##$...) | ":"}}... ["/"]...})]... {W:(abcd...) | LineEnd | "://" | " " | W:(!##$...)}}) "//"}) Group:({LineEnd "TOPIC" {Group:({[LineEnd]... Group:({"/" {W:(abcd...) | Group:({W:(abcd...) [{W:(abcd...)}...]... W:(abcd...)}) | Group:({{{"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ'"}... | Group:({{"0123456789"}... ":"})} {W:(abcd...) | Group:({W:(abcd...) [{W:(abcd...)}...]... W:(abcd...)})}}) | "-"}})})}... [LineEnd]... "//"})}) [Group:({LineEnd "FREE" Group:({[LineEnd]... "/" [Group:({{{W:(abcd...) | LineEnd | "://" | " " | W:(!##$...) | ":"}}... ["/"]...})]... {W:(abcd...) | LineEnd | "://" | " " | W:(!##$...)}}) "//"})]...}) [LineEnd]... StringEnd} at loc 0(1,1)
Match Group:({Group:({[LineEnd]... "ZOO" Group:({[LineEnd]... "/" [Group:({{{W:(abcd...) | LineEnd | "://" | " " | W:(!##$...) | ":"}}... ["/"]...})]... {W:(abcd...) | LineEnd | "://" | " " | W:(!##$...)}}) "//"}) Group:({LineEnd "TOPIC" {Group:({[LineEnd]... Group:({"/" {W:(abcd...) | Group:({W:(abcd...) [{W:(abcd...)}...]... W:(abcd...)}) | Group:({{{"ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZ'"}... | Group:({{"0123456789"}... ":"})} {W:(abcd...) | Group:({W:(abcd...) [{W:(abcd...)}...]... W:(abcd...)})}}) | "-"}})})}... [LineEnd]... "//"})}) at loc 0(1,1)
Match Group:({[LineEnd]... "ZOO" Group:({[LineEnd]... "/" [Group:({{{W:(abcd...) | LineEnd | "://" | " " | W:(!##$...) | ":"}}... ["/"]...})]... {W:(abcd...) | LineEnd | "://" | " " | W:(!##$...)}}) "//"}) at loc 0(1,1)
Exception raised:None
Exception raised:None
Exception raised:None
Traceback (most recent call last):
File "./zooparser.py", line 319, in <module>
test1(pgm)
File "./zooparser.py", line 309, in test1
test(pgm, zooMsg, 'test1: valid message type' )
File "./zooparser.py", line 274, in test
tokens = zg.getTokensFromBuffer(fileName)
File "./zooparser.py", line 219, in getTokensFromBuffer
tokens = self.text.parseString(filteredBuffer,parseAll=True)
File "/usr/local/lib/python2.7/site-packages/pyparsing-1.5.7-py2.7.egg/pyparsing.py", line 1006, in parseString
raise exc
pyparsing.ParseException: Expected end of line (at char 148), (line:8, col:2)
[centos#new-host /tmp/sample]$
source: see http://prj1.y23.org/zoo.zip
pyparsing takes a different view toward parsing than lex/yacc does. You have to let the classes do some of the work. Here's an example in your code:
self.columnHeader = OneOrMore(self.aucc) \
                  | OneOrMore(nums) \
                  | OneOrMore(self.blankCharacter) \
                  | OneOrMore(self.specialCharacter)
You are equating OneOrMore with the '+' character of a regex. In pyparsing, this is true for ParseElements, but at the character level, pyparsing uses the Word class:
self.columnHeader = Word(self.aucc + nums + self.blankCharacter + self.specialCharacter)
OneOrMore works with ParseElements, not characters. Look at:
OneOrMore(nums)
nums is the string "0123456789", so OneOrMore(nums) will match "0123456789", "01234567890123456789", etc., but not "123". That is what Word is for. OneOrMore will accept a string argument, but will implicitly convert it to a Literal.
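To see the difference concretely, here is a quick demonstration of the two behaviors just described:

from pyparsing import OneOrMore, Word, nums

# OneOrMore promotes the string to a Literal, so it only matches
# whole repetitions of exactly "0123456789"
print OneOrMore(nums).parseString("01234567890123456789")
# -> ['0123456789', '0123456789']

# Word(nums) matches a run of one or more characters drawn from "0123456789"
print Word(nums).parseString("123")
# -> ['123']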
This is a fundamental difference between using pyparsing and lex/yacc, and I think is the source of much of the complexity in your code.
Some other suggestions:
Your code has some premature optimizations in it - you write:
aucc = ''.join(set([alphas.upper(),"'"]))
Assuming that this will be used for defining Words, just do:
aucc = alphas.upper() + "'"
There is no harm in having duplicate characters in aucc, Word will convert this to a set internally.
Write a BNF for what you want to parse. It does not have to be as rigorous as you would need with lex/yacc. From your samples, it looks something like:
# sample
ZOO/STATUS/FOOD ALLOCATION//
TOPIC/BIRD FEED IS RUNNING LOW//
FREE/WE HAVE DISCOVERED MOTHS INFESTED THE BIRDSEED AND IT IS NO
LONGER USABLE.//
parser :: header topicEntry+
header :: "ZOO" sep namedValue
namedValue :: uppercaseWord sep valueBody
valueBody :: (everything up to //)
topicEntry :: topicHeader topicBody
topicHeader :: "TOPIC" sep valueBody
topicBody :: freeText
freeText :: "FREE" sep valueBody
sep :: "/"
Converting to pyparsing, this looks something like:
SEP = Literal("/")
BODY_TERMINATOR = Literal("//")
FREE_,TOPIC_,ZOO_ = map(Keyword,"FREE TOPIC ZOO".split())
uppercaseWord = Word(alphas.upper())
valueBody = SkipTo(BODY_TERMINATOR) # adjust later, but okay for now...
freeText = FREE_ + SEP + valueBody
topicBody = freeText
topicHeader = TOPIC_ + SEP + valueBody
topicEntry = topicHeader + topicBody
namedValue = uppercaseWord + SEP + valueBody
zooHeader = ZOO_ + SEP + namedValue
parser = zooHeader + OneOrMore(topicEntry)
(valueBody will have to get more elaborate when you add support for '://' embedded within a value, but save that for Round 2.)
Don't make things super complicated until you get at least some simple stuff working.
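As a quick sanity check, here is a self-contained version of the sketch run against the sample message. One caveat: SkipTo does not consume its target by default, so the Suppress(BODY_TERMINATOR) that eats each trailing // is my addition, as are the Suppress wrappers on the separators:

from pyparsing import Keyword, OneOrMore, SkipTo, Suppress, Word, alphas

SEP = Suppress("/")
BODY_TERMINATOR = Suppress("//")
FREE_, TOPIC_, ZOO_ = map(Keyword, "FREE TOPIC ZOO".split())

uppercaseWord = Word(alphas.upper())
# SkipTo stops just before its target, so match the terminator explicitly
valueBody = SkipTo("//") + BODY_TERMINATOR

namedValue = uppercaseWord + SEP + valueBody
zooHeader = ZOO_ + SEP + namedValue
topicHeader = TOPIC_ + SEP + valueBody
freeText = FREE_ + SEP + valueBody
topicBody = freeText
topicEntry = topicHeader + topicBody
parser = zooHeader + OneOrMore(topicEntry)

sample = """ZOO/STATUS/FOOD ALLOCATION//
TOPIC/BIRD FEED IS RUNNING LOW//
FREE/WE HAVE DISCOVERED MOTHS INFESTED THE BIRDSEED AND IT IS NO
LONGER USABLE.//"""

print parser.parseString(sample, parseAll=True)
# -> ['ZOO', 'STATUS', 'FOOD ALLOCATION', 'TOPIC', 'BIRD FEED IS RUNNING LOW',
#     'FREE', 'WE HAVE DISCOVERED ... USABLE.']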
Below is the simplest example of my issue:
When a request is made, it prints Request via GET <__main__.MainHandler object at 0x104041e10> and then the request remains open. Good! However, when you make another request, the MainHandler.get method is not called until the first connection has finished.
How can I get multiple requests into the get method while having them remain long-polling? I'm passing arguments with each request that will get different results from a pub/sub via Redis. The issue is that I only get one connection in at a time. What's wrong? And why is this blocking other requests?
import tornado.ioloop
import tornado.web
import os

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        print 'Request via GET', self

if __name__ == '__main__':
    application = tornado.web.Application([
        (r"/", MainHandler)])
    try:
        application.listen(int(os.environ.get('PORT', 5000)))
        tornado.ioloop.IOLoop.instance().start()
    except KeyboardInterrupt:
        tornado.ioloop.IOLoop.instance().stop()
Diagram left: as described in the issue above; the requests are not handled in the fashion shown in the right diagram.
Diagram right: I need the requests (a-d) to be handled by the RequestHandler and then wait for the pub/sub to announce their data.
a b c d
+ + + + ++ a b c d
| | | | || + + + +
| | | | || | | | |
| | | | || | | | |
| | | | || | | | |
| v v v || | | | |
+---|-----------------------------+ || +-----|----|---|---|------------------+
| | | || | | | | | |
| + RequestHandler| || | + + + + RequestHan. |
| | | || | | | | | |
+---|-----------------------------+ || +-----|----|---|---|------------------+
+---|-----------------------------+ || +-----|----|---|---|------------------+
| | | || | | | | | |
| + Sub/Pub Que | || | v + v v Que |
| | | || | | |
+---|-----------------------------+ || +----------|--------------------------+
+---|-----------------------------+ || +----------|--------------------------+
| || |
| Finished || | Finished
v || v
||
||
||
||
||
||
||
++
If this is achievable with another programming language, please let me know.
Thank you for your help!
From http://www.tornadoweb.org/en/stable/web.html#tornado.web.asynchronous:
tornado.web.asynchronous(method)
...
If this decorator is given, the response is not finished when the
method returns. It is up to the request handler to call self.finish()
to finish the HTTP request. Without this decorator, the request is
automatically finished when the get() or post() method returns.
You have to finish the get method explicitly:
import tornado.ioloop
import tornado.web
import tornado.options

from tornado.options import define, options

define("port", default=8000, help="run on the given port", type=int)

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        print 'Request via GET', self
        self.finish()

if __name__ == '__main__':
    application = tornado.web.Application([
        (r"/", MainHandler)])
    try:
        application.listen(options.port)
        tornado.ioloop.IOLoop.instance().start()
    except KeyboardInterrupt:
        tornado.ioloop.IOLoop.instance().stop()
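For the actual long-polling case, the idea is to not call self.finish() until the pub/sub delivers data. A minimal sketch of that pattern (the waiters list and the announce hook are my own illustrative names; the Redis wiring is omitted):

import tornado.ioloop
import tornado.web

waiters = []  # handlers whose connections are still open

class PollHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # return without finishing; the connection stays open
        waiters.append(self)

def announce(message):
    # call this when the pub/sub publishes data
    for handler in waiters:
        handler.write(message)
        handler.finish()
    del waiters[:]

if __name__ == '__main__':
    application = tornado.web.Application([(r"/", PollHandler)])
    application.listen(8000)
    tornado.ioloop.IOLoop.instance().start()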