I have a neat little Python script that I would like to port to Ruby, and I think it's highlighting my noobishness at Ruby. I'm getting an error saying there is an unexpected END, but I don't see how this can be so. Perhaps there is a keyword that requires an END, or something that doesn't want an END, that I forgot about. Here is all of the code leading up to the offending line. The offending line is marked with a comment.
begin
  require base64
  require base32
rescue LoadError
  puts "etext requires base32. use 'gem install --remote base32' and try again"
end
# Get a string from a text file from disk
filename = ARGV.first
textFile = File.open(filename)
text = textFile.read()
mailType = "text only" # set the default mailType
#cut the email up by sections
textList1 = text.split(/\n\n/)
header = textList1[0]
if header.match (/MIME-Version/)
  mailType = "MIME"
end
#If mail has no attachments, parse as text-only. This is the class that does this
class TextOnlyMailParser
  def initialize(textList)
    a = 1
    body = ""
    header = textList[0]
    #parsedEmail = Email.new(header)
    while a < textList.count
      body += ('\n' + textList[a] + '\n')
      a += 1
    end
    #parsedEmail.body = body
  end
end
def separate(text,boundary = nil)
  # returns list of strings and lists containing all of the parts of the email
  if !boundary #look in the email for "boundary= X"
    text.scan(/(?<=boundary=).*/) do |bound|
      textList = recursiveSplit(text,bound)
    end
    return textList
  end
  if boundary
    textList = recursiveSplit(text,boundary)
  end
end
def recursiveSplit(chunk,boundary)
  if chunk.is_a? String
    searchString = "--" + boundary
    ar = cunk.split(searchString)
    return ar
  elsif chunk.is_a? Array
    chunk do |bit|
      recursiveSplit(bit,boundary);
    end
  end
end
class MIMEParser
  def initialize(textList)
    #textList = textList
    #nestedItems = []
    newItem = NestItem.new(self)
    newItem.value = #textList[0]
    newItem.contentType = "Header"
    #nestedItems.push(newItem)
    #setup parsed email
    #parsedEmail = Email.new(newItem.value)
    self._constructNest
  end
  def checkForContentSpecial(item)
    match = item.value.match (/Content-Disposition: attachment/)
    if match
      filename = item.value.match (/(?<=filename=").+(?=")/)
      encoding = item.value.match (/(?<=Content-Transfer-Encoding: ).+/)
      data = item.value.match (/(?<=\n\n).*(?=(\n--)|(--))/m)
      dataGroup = data.split(/\n/)
      dataString = ''
      i = 0
      while i < dataGroup.count
        dataString += dataGroup[i]
        i ++
      end #<-----THIS IS THE OFFENDING LINE
      #parsedEmail.attachments.push(Attachment.new(filename,encoding,dataString))
    end
Your issue is the i ++ line. Ruby has no pre- or post-increment/decrement operators, so the line does not mean what it does in other languages. Ruby reads i ++ as i + followed by a unary +, keeps looking for the operand on the next line, and finds end there, which is why the error is reported as an unexpected END. I can't personally account for why i++ evaluates in IRB while i ++ does not perform any action.
Instead, replace your ++ operators with += 1, making that last while loop:
while i < dataGroup.count
  dataString += dataGroup[i]
  i += 1
end
But also think about the Ruby way: if you're just concatenating the pieces into a string, why not use dataString = dataGroup.join instead of looping with a while construct?
I'm trying to extract some words between two delimiters. It works for files where the script finds these delimiters, but for the other files the code extracts the whole file.
Example:
File 00.txt:
'bqukfkb saved qshfqs illjQNqdj iohqsijqsd qsoiqsdqs'
File 01.txt:
'jkhjkl dbdqs ihnzqid Bad value okkkk SPAN sfsdf didjsfsdf'
I want to open two or more files like these and extract only the words between:
'Bad value' and 'SPAN'.
My code works for the file 01.txt, but not for 00.txt (I think it's because it doesn't find the delimiters, so it prints everything). How can I fix it?
import datetime
from tkinter import Tk, filedialog

def get_path():  # return the path of the selected file(s)
    root = Tk()
    i = datetime.datetime.now()
    day = i.day
    month = i.month
    # raw string so the backslashes in the Windows path are not read as escapes
    root.filename = filedialog.askopenfilenames(initialdir = r"Z:\SGI\SYNCBBG",title = "Select your files",filetypes = (("Fichier 1","f6365tscf.SCD*"+str(month)+str(day)+".1"),("all files",".*")))
    root.withdraw()
    return (root.filename)

def extraction_error(file):
    f=open(file,'r')
    file=f.read()
    f.close()
    start = file.find('Bad value') +9
    end = file.find('SPAN', start)
    return(file[start:end])

paths=get_path()
cpt=len(paths)
for x in range(0,cpt):
    print(extraction_error(paths[x]))
Output : saved qshfqs illjQNqdj iohqsijqsd qsoiqsdq
okkkk
So in this case I just want to extract 'okkkk' and not print ' saved....' for the other file.
Thanks in advance for your help
In your extraction_error function, you may want to test if the two key words can be found:
start = file.find('Bad value') # remove + 9 here, put it later
end = file.find('SPAN', start)
if start != -1 and end != -1: # test if key words can be found, -1 for not found:
    return(file[start+9:end])
else:
    return ""
You're printing something because you add 9 to start before checking it. find returns -1 when the string is not found, so start becomes 8 and end becomes -1, and you end up printing the slice [8:-1]. I would add an if statement before the print statement:
start = file.find('Bad value')
end = file.find('SPAN', start)
if start != -1 and end != -1:
    print(file[start + 9: end])
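To make the arithmetic concrete, here is a small sketch that replays the question's logic on the contents of 00.txt. The variable names text and leaked are only illustrative.

```python
# Contents of 00.txt from the question: neither delimiter is present.
text = "bqukfkb saved qshfqs illjQNqdj iohqsijqsd qsoiqsdqs"

start = text.find("Bad value") + 9  # find() gives -1, so start is 8
end = text.find("SPAN", start)      # not found either, so end is -1
leaked = text[start:end]            # text[8:-1]: everything but the edges

print(start, end)
print(repr(leaked))
```

The slice drops the first eight characters and the last one, which matches the unwanted output shown in the question.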
str.find() returns -1 if the argument is not found in the string. Example:
print("abcd".find("e")) # -1
You can just check the result before the return. Add the offset only after the check, otherwise start can never be -1:
start = file.find('Bad value')
end = file.find('SPAN', start)
if start == -1 or end == -1:
    return '' # Or None
return(file[start + 9:end])
Using re:
import re

def get_text(text):
    pattern = r'.+(Bad value)(.+)(SPAN).+'
    r = re.match(pattern, text)
    if r is not None and len(r.groups()) == 3:
        print(r.groups()[1])

lines = [
    'jkhjkl dbdqs ihnzqid Bad value okkkk SPAN sfsdf didjsfsdf'
    ,'ghghujh']
for line in lines:
    get_text(line)
Output:
okkkk
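The find-based answers above can be folded into one reusable helper. This is only a sketch: the name extract_between and the choice to return an empty string when a delimiter is missing are assumptions, not part of the original answers. It uses len(start_marker) instead of the hard-coded 9.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or '' if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)  # skip past the marker itself
    end = text.find(end_marker, start)
    if end == -1:
        return ""
    return text[start:end].strip()


samples = [
    "bqukfkb saved qshfqs illjQNqdj iohqsijqsd qsoiqsdqs",
    "jkhjkl dbdqs ihnzqid Bad value okkkk SPAN sfsdf didjsfsdf",
]
for sample in samples:
    print(repr(extract_between(sample, "Bad value", "SPAN")))
```

Checking each find() result before adding the offset is what keeps the "not found" case from turning into a valid-looking slice.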
I made a Python script to encrypt plaintext files using the symmetric-key algorithm described in this video. I then created a second script to decrypt the encrypted message. Here is the original text:
I came, I saw, I conquered.
Here is the text after being encrypted and decrypted:
I came, I saw, I conquerdd.
Almost perfect, except for a single letter. For longer texts, there will be multiple letters which are just off, i.e. the numerical representation of the character which appears is one lower than that of the original character. I have no idea why this is.
Here's how my scripts work. First, I generated a random sequence of digits -- my PAD -- and saved it in the text file "pad.txt". I won't show the code because it is so straightforward. I then saved the text which I want to be encrypted in "text.txt". Next, I run the encryption script, which encrypts the text and saves it in the file "encryptedText.txt":
#!/usr/bin/python3.4
import string

def getPad():
    padString = open("pad.txt","r").read()
    pad = padString.split(" ")
    return pad

def encrypt(textToEncrypt,pad):
    encryptedText = ""
    # last two elements are not used because they don't show up well in text files.
    possibleChars = string.printable[:98]
    for i in range(len(textToEncrypt)):
        char = textToEncrypt[i]
        if char in possibleChars:
            num = possibleChars.index(char)
        else:
            return False
        encryptedNum = num + int(pad[(i)%len(pad)])
        if encryptedNum >= len(possibleChars):
            encryptedNum = encryptedNum - len(possibleChars)
        encryptedChar = possibleChars[encryptedNum]
        encryptedText = encryptedText + encryptedChar
    return encryptedText

if __name__ == "__main__":
    textToEncrypt = open("text.txt","r").read()
    pad = getPad()
    encryptedText = encrypt(textToEncrypt,pad)
    if not encryptedText:
        print("""An error occurred during the encryption process. Confirm that \
there are no forbidden symbols in your text.""")
    else:
        open("encryptedText.txt","w").write(encryptedText)
Finally, I decrypt the text with this script:
#!/usr/bin/python3.4
import string

def getPad():
    padString = open("pad.txt","r").read()
    pad = padString.split(" ")
    return pad

def decrypt(textToDecrypt,pad):
    trueText = ""
    possibleChars = string.printable[:98]
    for i in range(len(textToDecrypt)):
        encryptedChar = textToDecrypt[i]
        encryptedNum = possibleChars.index(encryptedChar)
        trueNum = encryptedNum - int(pad[i%len(pad)])
        if trueNum < 0:
            trueNum = trueNum + len(possibleChars)
        trueChar = possibleChars[trueNum]
        trueText = trueText + trueChar
    return trueText

if __name__ == "__main__":
    pad = getPad()
    textToDecrypt = open("encryptedText.txt","r").read()
    trueText = decrypt(textToDecrypt,pad)
    open("decryptedText.txt","w").write(trueText)
Both scripts seem very straightforward, and they obviously work almost perfectly. However, every once in a while there is an error and I cannot see why.
I found the solution to this problem. It turns out that every character that was not decrypted properly was encrypted to \r, which my text editor changed to a \n for whatever reason. Removing \r from the list of possible characters fixed the issue.
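Whatever the text editor did, Python's own text mode may produce the same symptom: with the default newline handling, a lone \r read from a file comes back as \n. The sketch below is an illustration, not the original scripts. The helper round_trip is a hypothetical name, and it writes with newline="" so that only the read side is being tested. It also shows that \r sits exactly one position after \n in string.printable, which fits the "one lower" observation in the question.

```python
import os
import string
import tempfile


def round_trip(text, newline=None):
    """Write text untranslated, then read it back with the given newline mode."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", newline="") as fh:
            fh.write(text)
        with open(path, "r", newline=newline) as fh:
            return fh.read()
    finally:
        os.remove(path)


chars = string.printable[:98]
print(chars.index("\n"), chars.index("\r"))   # adjacent positions
print(repr(round_trip("a\rb")))               # default mode translates \r
print(repr(round_trip("a\rb", newline="")))   # newline="" leaves it alone
```

Under that assumption, either dropping \r from the alphabet (as the answer does) or opening the encrypted file with newline="" would avoid the mismatch.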
I am writing a function for deleting selected text (in a special way) when vim is running in a ssh session:
python << EOF
def delSelection():
    buf = vim.current.buffer
    (lnum1, col1) = buf.mark('<')
    (lnum2, col2) = buf.mark('>')
    # get selected text
    # lines = vim.eval('getline({}, {})'.format(lnum1, lnum2))
    # lines[0] = lines[0][col1:]
    # lines[-1] = lines[-1][:col2+1]
    # selected = "\n".join(lines) + "\n"
    # passStrNc(selected)
    # delete selected text
    lnum1 -= 1
    lnum2 -= 1
    firstSeletedLine = buf[lnum1]
    firstSeletedLineNew = buf[lnum1][:col1]
    lastSelectedLine = buf[lnum2]
    lastSelectedLineNew = buf[lnum2][(col2 + 1):]
    newBuf = ["=" for i in range(lnum2 - lnum1 + 1)]
    newBuf[0] = firstSeletedLineNew
    newBuf[-1] = lastSelectedLineNew
    print(len(newBuf))
    print(len(buf[lnum1:(lnum2 + 1)]))
    buf[lnum1:(lnum2 + 1)] = newBuf
EOF
function! DelSelection()
python << EOF
delSelection()
EOF
endfunction
python << EOF
import os
sshTty = os.getenv("SSH_TTY")
if sshTty:
    cmd6 = "vnoremap d :call DelSelection()<cr>"
    vim.command(cmd6)
EOF
Apparently vim is calling the function on every line selected, which defeats the whole purpose of the function. How should I do this properly?
That's because the : automatically inserts the '<,'> range when issued in visual mode. The canonical way to clear that is by prepending <C-u> to the mapping:
cmd6 = "vnoremap d :<C-u>call DelSelection()<cr>"
Alternatively, you can append the range keyword to the :function definition, cf. :help a:firstline.
OK, I got it. I just need to add an Esc key before calling the function:
python << EOF
import os
sshTty = os.getenv("SSH_TTY")
if sshTty:
    cmd6 = "vnoremap d <esc>:call DelSelection()<cr>"
    vim.command(cmd6)
EOF
I have a text file, which is structured as follows:
segmentA {
content Aa
content Ab
content Ac
....
}
segmentB {
content Ba
content Bb
content Bc
......
}
segmentC {
content Ca
content Cb
content Cc
......
}
I know how to search for certain strings through the whole text file, but how can I restrict the search to a certain segment, for example "segmentC"? I need something like a regular expression to tell the script:
If the text begins with "segmentC {", search for a certain string until the first "}" appears.
Does anyone have an idea?
Thanks in advance!
Not a RegEx solution ...but would do the work!
def SearchStuff(lines,sstr):
    i=0
    # strip() because readlines() keeps the trailing newline on each line
    while(lines[i].strip()!='}'):
        #Do stuffff .....for e.g.
        if 'Ca' in lines[i]:
            return lines[i]
        i+=1

def main(search_str):
    f=open('file.txt','r')
    lines = f.readlines()
    f.close()
    for line in lines:
        if search_str in line:
            index = lines.index(line)
            break
    lines = lines[index+1:]
    print(SearchStuff(lines,search_str))

search_str = 'segmentC' #set this string accordingly
main(search_str)
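For comparison, here is a minimal regex sketch of what the question asks for: find the named segment, take everything up to the first closing brace, and search only inside it. The function name find_in_segment and the sample text are assumptions for illustration, and the pattern assumes segments are not nested.

```python
import re


def find_in_segment(text, segment, needle):
    """Return the lines inside `segment { ... }` that contain `needle`."""
    # Non-greedy (.*?) stops at the first "}" after the opening brace.
    pattern = re.escape(segment) + r"\s*\{(.*?)\}"
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        return []
    body = match.group(1)
    return [line.strip() for line in body.splitlines() if needle in line]


sample = """segmentA {
content Aa
content Ab
}
segmentC {
content Ca
content Cb
}
"""
print(find_in_segment(sample, "segmentC", "Ca"))
```

re.DOTALL lets the dot match newlines, so the segment body can span several lines.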
Depending on the complexity you are looking for, solutions range from a simple state machine with line-based pattern searching to a full lexer.
Line-based search
The example below assumes that you are only looking for one segment and that segmentC { and the closing } each sit on a line of their own.
def parsesegment(fh):
    # Yields all lines inside "segmentC"
    state = "out"
    for line in fh:
        line = line.strip() # in case there are whitespaces around
        if state == "out":
            if line.startswith("segmentC {"):
                state = "in"
            continue # nothing to yield outside the segment
        elif state == "in":
            if line.startswith("}"):
                state = "out"
                break # only one segment is wanted, so stop here
            # Work on the specific lines here
            yield line

with open(...) as fh:
    for line in parsesegment(fh):
        print(line) # do something
Simple Lexer
If you need more flexibility, you can design a simple lexer/parser pair. For example, the following code makes no assumption about how the syntax is laid out across lines. It also ignores unknown patterns, which a typical lexer does not (normally it should raise a syntax error):
import re

class ParseSegment:
    # Dictionary of patterns per state
    # Tuples are (token name, pattern, state change command)
    _regexes = {
        "out": [
            ("open", re.compile(r"segment(?P<segment>\w+)\s+\{"), "in")
        ],
        "in": [
            ("close", re.compile(r"\}"), "out"),
            # Here an example of what you could want to match
            ("content", re.compile(r"content\s+(?P<content>\w+)"), None)
        ]
    }

    def lex(self, source, initpos = 0):
        pos = initpos
        end = len(source)
        state = "out"
        while pos < end:
            for token_name, reg, state_chng in self._regexes[state]:
                # Try to get a match
                match = reg.match(source, pos)
                if match:
                    # Advance according to how much was matched
                    pos = match.end()
                    # yield a token if it has a name
                    if token_name is not None:
                        # Yield token name, the full matched part of source
                        # and the match grouped according to (?P<tag>) tags
                        yield (token_name, match.group(), match.groupdict())
                    # Switch state if requested
                    if state_chng is not None:
                        state = state_chng
                    break
            else:
                # No match, advance by one character
                # This is particular to that lexer, usually no match means
                # the input file has an error in the syntax and lexer should
                # yield an exception
                pos += 1

    def parse(self, source, initpos = 0):
        # This is an example of use of the lexer with a parser
        # This converts the input file into a dictionary. Keys are segment
        # names, and values are list of contents.
        segments = {}
        cur_segment = None
        # Use lexer to get tokens from source
        for token, fullmatch, groups in self.lex(source, initpos):
            # On open, create the list of content in segments
            if token == "open":
                cur_segment = groups["segment"]
                segments[cur_segment] = []
            # On content, ensure we know the segment and add content to the
            # list
            elif token == "content":
                if cur_segment is None:
                    raise RuntimeError("Content found outside a segment")
                segments[cur_segment].append(groups["content"])
            # On close, set the current segment to unknown
            elif token == "close":
                cur_segment = None
            # ignore unknown tokens, we could raise an error instead
        return segments

def main():
    with open("...", "r") as fh:
        data = fh.read()
    lexer = ParseSegment()
    segments = lexer.parse(data)
    print(segments)
    return 0

if __name__ == '__main__':
    main()
Full Lexer
If you need even more flexibility and reusability, you will have to create a full parser. There is no need to reinvent the wheel: have a look at this list of language parsing modules, and you will probably find one that suits you.
With a lexer, once it encounters a keyword, how can I get it to grab the rest of the line and return it as a string? When it reaches the end of the line, it should return everything that followed the keyword on that line.
Here is the line I'm looking at:
description here is the rest of my text to collect
Thus, when the lexer encounters description, I would like "here is the rest of my text to collect" returned as a string.
I have the following defined, but it seems to be throwing an error:
states = (
    ('bcdescription', 'exclusive'),
)

def t_bcdescription(t):
    r'description '
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')

def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.valiue.count('\n')
    t.lexer.begin('INITIAL')
    return t
This is part of the error being returned:
File "/Users/me/Coding/wm/wm_parser/ply/lex.py", line 393, in token
raise LexError("Illegal character '%s' at index %d" % (lexdata[lexpos],lexpos), lexdata[lexpos:])
ply.lex.LexError: Illegal character ' ' at index 40
Finally, if I wanted this functionality for more than one token, how could I accomplish that?
Thanks for your time
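There is no big problem with your code. I copied it, fixed the t.valiue typo, declared tokens, and added a t_bcdescription_content rule that consumes the text inside the bcdescription state, and it works well: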
import ply.lex as lex

states = (
    ('bcdescription', 'exclusive'),
)
tokens = ("BCDESCRIPTION",)

def t_bcdescription(t):
    r'\bdescription\b'
    t.lexer.code_start = t.lexer.lexpos
    t.lexer.level = 1
    t.lexer.begin('bcdescription')

def t_bcdescription_close(t):
    r'\n'
    t.value = t.lexer.lexdata[t.lexer.code_start:t.lexer.lexpos+1]
    t.type="BCDESCRIPTION"
    t.lexer.lineno += t.value.count('\n')
    t.lexer.begin('INITIAL')
    return t

def t_bcdescription_content(t):
    r'[^\n]+'

lexer = lex.lex()
data = 'description here is the rest of my text to collect\n'
lexer.input(data)
while True:
    tok = lexer.token()
    if not tok: break
    print(tok)
and the result is:
LexToken(BCDESCRIPTION,' here is the rest of my text to collect\n',1,50)
So maybe you can check other parts of your code.
If you want this functionality for more than one token, you can simply capture words, and when a word that is one of those tokens appears, start capturing the rest of the line with the code above.
It is not obvious why you need to use a lexer/parser for this without further information.
>>> x = 'description here is the rest of my text to collect'
>>> a, b = x.split(' ', 1)
>>> a
'description'
>>> b
'here is the rest of my text to collect'
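The same split idea may extend to several keywords without a lexer. A minimal sketch follows, in which the keyword set and the function name split_keyword are hypothetical, chosen only for illustration:

```python
KEYWORDS = {"description", "title"}  # hypothetical keyword set


def split_keyword(line):
    """Return (keyword, rest_of_line) if the line starts with a known keyword."""
    keyword, _, rest = line.rstrip("\n").partition(" ")
    if keyword in KEYWORDS:
        return keyword, rest
    return None


print(split_keyword("description here is the rest of my text to collect\n"))
print(split_keyword("unknown some other text\n"))
```

str.partition always returns three parts, so a line that holds only a keyword yields an empty rest instead of raising an error.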