Python Regex on File Read Input - python

So I'm reading from a file like so:
f = open("log.txt", 'w+r')
f.read()
So f now has a bunch of lines, but I'm mainly concerned with whether it contains a number followed by a specific string (for example, "COMPLETE" would be the string).
How exactly would you go about checking this?
I thought it'd be something like:
r.search(['[0-9]*'+,"COMPLETE")
but that doesn't seem to work. Maybe it's my regex that's wrong (I'm pretty terrible at it),
but basically it just needs to check the entire string (which spans multiple lines and contains \n's) for a number (specifically 200) and the word COMPLETE (in caps).
Edit: for reference, here is what the log file looks like:
Using https
Sending install data... DONE
200 COMPLETE
<?xml version="1.0" encoding="UTF-8"?>
<SolutionsRevision version="185"/>
I just need to make sure it says "200" and COMPLETE

Regular expressions are overkill here if you're just looking for "200 COMPLETE". Just do this:
if "200 COMPLETE" in log:
    pass  # Do something

You should use Rafe's answer instead of a regex to search for "200 COMPLETE" in the file content. However, your current code won't work for reading the file: using "w+r" as the mode will truncate your file. You need to do something like this:
with open("log.txt", "r") as f:
    log = f.read()
if "200 COMPLETE" in log:
    pass  # Do something

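It should be something like this (note that the module is re, and the pattern is a raw string so \s is not treated as a string escape):
m = re.search(r'[0-9]+\s+COMPLETE', line)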
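For completeness, here is a minimal Python 3 sketch of the regex variant, run against the sample log from the question. The names STATUS_PATTERN, has_complete_status and sample_log are made up for this illustration; the pattern pins the number to 200 rather than accepting any digits.

```python
import re

# Matches the literal status code 200, some whitespace, then the word COMPLETE.
# \b keeps "1200 COMPLETE" or "200 COMPLETED" from matching.
STATUS_PATTERN = re.compile(r"\b200\s+COMPLETE\b")


def has_complete_status(text: str) -> bool:
    """Return True if '200 COMPLETE' appears anywhere in the (multi-line) text."""
    return STATUS_PATTERN.search(text) is not None


# The log contents shown in the question.
sample_log = (
    "Using https\n"
    "Sending install data... DONE\n"
    "200 COMPLETE\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<SolutionsRevision version="185"/>\n'
)

print(has_complete_status(sample_log))
```

For a fixed string like this, the plain `in` test from the other answers is still the simpler choice; the regex only earns its keep if the spacing between the number and the word can vary.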

Related

Run python files and change strings with 1 file

Let's say I have a folder which contains the following files:
f1.py
f2.py
f3.py
In f1.py I have this code:
#O = "Random string"
print("ABCD")
#P = "Random string"
but in f2.py and f3.py I have this code:
#M = "Random string"
print("EFGH")
#Z = "Random string"
I want to change the strings in the print call in f2.py and f3.py to the string printed in f1.py, and then run all the files in the folder after changing the strings, all from f1.py.
It would help to have more context on why you want to do this.
This is possible, but in 99% of cases it's not a good idea to write self-modifying code, though it can be a lot of fun.
Strictly speaking, you are not writing self-modifying code here, but rather one piece of code that modifies other files. That is also rarely to be recommended.
What's more usual is that one script analyzes / parses f1.py, extracts the data, and writes it into a file (e.g. a JSON file),
and f2.py and f3.py read the data from that file and then print it.
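A minimal sketch of that shared-data-file idea, in Python 3. The file name shared.json, the "message" key and the two helper functions are assumptions made up for this example; a temporary directory is used so the snippet does not touch real files.

```python
import json
import os
import tempfile


def save_message(path: str, message: str) -> None:
    """What f1.py would do: store the string in a small JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"message": message}, f)


def load_message(path: str) -> str:
    """What f2.py and f3.py would do: read the string back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["message"]


# Hypothetical location for the shared file.
config_path = os.path.join(tempfile.mkdtemp(), "shared.json")

save_message(config_path, "ABCD")
print(load_message(config_path))
```

With this layout none of the .py files is ever rewritten; only the data file changes.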
Is there a particular reason you want code that modifies other Python files?
If you really want f2.py and f3.py to be modified, there is another solution, called templating (you can, for example, use Jinja).
In this case you have two template files, f2.py.template and f3.py.template.
You write a script that parses f1.py, extracts the data, and creates f2.py from f2.py.template and the extracted data (and likewise f3.py from f3.py.template).
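To illustrate the templating idea without installing anything, here is a sketch that swaps in string.Template from the standard library in place of a full template engine. The placeholder name $message and the render function are assumptions for this example; the template text mirrors f2.py from the question.

```python
from string import Template

# Stand-in for the contents of f2.py.template; $message is the placeholder.
F2_TEMPLATE = Template(
    '#M = "Random string"\n'
    'print("$message")\n'
    '#Z = "Random string"\n'
)


def render(template: Template, message: str) -> str:
    """Fill the placeholder and return the generated source text."""
    return template.substitute(message=message)


generated = render(F2_TEMPLATE, "ABCD")
print(generated)
```

The generated text would then be written to f2.py, so the template stays untouched and f2.py can be regenerated at any time.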
If you're really 100% sure that you want what you ask for, then yes, it is possible:
You write a script that opens and reads f1.py line by line, looks for the line starting with "#O = ", then remembers the next line.
Then it reads f2.py line by line and writes each line to another file (e.g. next_version_of_f2.py) until it encounters the line #M = "Random string" in f2.py. At that point the marker line is written out, the desired print line is written out, the original print line from f2.py is read and ignored, and then all the remaining lines are read and written.
Then you close f2.py and next_version_of_f2.py, rename f2.py to f2.py.old, and rename next_version_of_f2.py to f2.py.
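The steps above can be sketched roughly as follows in Python 3. The function names, the ".new" suffix for the intermediate file and the temporary directory are all assumptions for this illustration; the marker prefixes "#O = " and "#M = " come from the question.

```python
import os
import tempfile


def line_after_marker(lines, marker_prefix):
    """Return the line that follows the first line starting with marker_prefix."""
    for i, line in enumerate(lines[:-1]):
        if line.startswith(marker_prefix):
            return lines[i + 1]
    raise ValueError("marker not found: " + marker_prefix)


def replace_after_marker(lines, marker_prefix, new_line):
    """Copy lines, replacing the line right after each marker with new_line."""
    out = []
    replace_next = False
    for line in lines:
        if replace_next:
            out.append(new_line)  # the old print line is read and dropped
            replace_next = False
        else:
            out.append(line)
            replace_next = line.startswith(marker_prefix)
    return out


def rewrite_file(path, marker_prefix, new_line):
    """Write the new version next to the file, keep a .old backup, swap it in."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    new_path = path + ".new"
    with open(new_path, "w", encoding="utf-8") as f:
        f.write("\n".join(replace_after_marker(lines, marker_prefix, new_line)) + "\n")
    os.replace(path, path + ".old")
    os.replace(new_path, path)


# Demo in a scratch directory so no real files are modified.
workdir = tempfile.mkdtemp()
f1_path = os.path.join(workdir, "f1.py")
f2_path = os.path.join(workdir, "f2.py")
with open(f1_path, "w", encoding="utf-8") as f:
    f.write('#O = "Random string"\nprint("ABCD")\n#P = "Random string"\n')
with open(f2_path, "w", encoding="utf-8") as f:
    f.write('#M = "Random string"\nprint("EFGH")\n#Z = "Random string"\n')

with open(f1_path, encoding="utf-8") as f:
    wanted = line_after_marker(f.read().splitlines(), "#O = ")
rewrite_file(f2_path, "#M = ", wanted)

with open(f2_path, encoding="utf-8") as f:
    print(f.read())
```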
This is certainly possible but probably inadvisable.
Editing code should typically be a separate action from executing it (even when we use a single tool that can do both, like a lot of modern IDEs).
It suggests a poor workflow. If you want f2.py to print("ABCD"), then it should be written that way.
It's confusing. In order to understand what f2.py does, you have to mentally model the entirety of f1.py and f2.py, and there's no indication of this in f2.py.
It invites all kinds of difficult-to-debug situations. What happens if f1.py is run twice at the same time? What if two different versions of f1.py are run at the same time? Or if I happen to be reading f2.py when you run f1.py? Or if I'm editing f2.py and save my changes while you're running f1.py?
It's a security problem. For f1.py to edit f2.py, the user (shell, web-server, or other surface) calling f1.py has to have edit permissions on f2.py. That means that if they can get f1.py to do something besides what you intended (specifically, if they can get their own text in place of "ABCD"), then they can get arbitrary code execution in everyone else's runtime!
Note that it's perfectly fine to have code that generates or edits other code. The problem is when a program (possibly spanning multiple files) edits its own source.
gelonida discusses some options, which are fine and appropriate for certain contexts such as managing user-specific configuration or building documents. That said, if you're familiar with functions, variables, imports, and other basics of computer science, then you may not even need a config.json file or a template engine.
Reconsider the end result you're trying to accomplish. Do some more research/reading, and if you're still stuck start a new question about the bigger-picture task.
Complaints about XYZ problems are old hat, so here's how to do what you want, even though it's awful and you really shouldn't.
f1.py
import os
import re
import sys

#O = "Random string"
print("ABCD")
#P = "Random string"

# Matches a marker comment, a print of four capital letters, and a closing marker comment.
selector = re.compile(
    r'#([A-Z]) = \"Random string\"\nprint\(\"([A-Z]{4})\"\)\n#([A-Z]) = \"Random string\"')
template = '''#{} = "Random string"
print("{}")
#{} = "Random string"'''

own_file_name = os.path.abspath(__file__)
own_directory = os.path.dirname(own_file_name)

def read_file(name: str) -> str:
    with open(name, 'r') as f:
        return f.read()

# search rather than match: the block is not at the very start of this file.
find_replacement = selector.search(read_file(own_file_name))
# Group 2 is the printed string; groups 1 and 3 are the marker letters.
replacement = find_replacement.group(2) if find_replacement else False
if not replacement:
    sys.exit(-1)

def make_replacement(reg_match) -> str:
    return template.format(reg_match.group(1), replacement, reg_match.group(3))

# os.scandir yields entries with is_file() and path; os.listdir returns plain strings.
for dir_entry in os.scandir(own_directory):
    if dir_entry.is_file():
        original = read_file(dir_entry.path)
        with open(dir_entry.path, 'w') as out_file:
            out_file.write(selector.sub(make_replacement, original))

# this will cause an infinite loop, but you technically asked for it :)
for dir_entry in os.scandir(own_directory):
    if dir_entry.is_file():
        exec(read_file(dir_entry.path))
I want to be clear that the above is a joke. I haven't tested it, and I desperately hope it won't actually solve any problems for you.

Taking String arguments for a function without quotes

I've got a function meant to download a file from a URL and write it to a disk, along with imposing a particular file extension. At present, it looks something like this:
import requests
import os

def getpml(url, filename):
    psc = requests.get(url)
    outfile = os.path.join(os.getcwd(), filename + '.pml')
    f = open(outfile, 'w')
    f.write(psc.content)
    f.close()
    try:
        with open(outfile) as f:
            print "File Successfully Written"
    except IOError as e:
        print "I/O Error, File Not Written"
    return
When I try something like
getpml('http://www.mysite.com/data.txt','download') I get the appropriate file sitting in the current working directory, download.pml. But when I feed the function the same arguments without the ' symbol, Python says something to the effect of "NameError: name 'download' is not defined" (the URL produces a syntax error). This even occurs if, within the function itself, I use str(filename) or things like that.
I'd prefer not to have to input the arguments of the function in with quote characters - it just makes entering URLs and the like slightly more difficult. Any ideas? I presume there is a simple way to do this, but my Python skills are spotty.
No, that cannot be done. When you are typing Python source code you have to type quotes around strings. Otherwise Python can't tell where the string begins and ends.
It seems like you have a more general misunderstanding too. Calling getpml(http://www.mysite.com) without quotes isn't calling it with "the same argument without quotes". There simply isn't any argument there at all. It's not like there are "arguments with quotes" and "arguments without quotes". Python isn't like speaking a natural human language where you can make any sound and it's up to the listener to figure out what you mean. Python code can only be made up of certain building blocks (object names, strings, operators, etc.), and URLs aren't one of those.
You can call your function differently:
data = """\
http://www.mysite.com/data1.txt download1
http://www.mysite.com/data2.txt download2
http://www.mysite.com/data3.txt download3
"""
for line in data.splitlines():
    url, filename = line.strip().split()
    getpml(url, filename)
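Another way to avoid typing quotes, sketched here in Python 3 as an assumption about how the script might be used, is to take the URL and file name from the command line, where every argument already arrives as a string. The parse_args helper is made up for this example, and getpml is only referenced in a comment.

```python
import argparse


def parse_args(argv):
    """Parse a URL and an output name; both come back as plain strings."""
    parser = argparse.ArgumentParser(description="Download a URL to <filename>.pml")
    parser.add_argument("url", help="address of the file to download")
    parser.add_argument("filename", help="output name without the .pml extension")
    return parser.parse_args(argv)


# Simulates running:  python getpml.py http://www.mysite.com/data.txt download
args = parse_args(["http://www.mysite.com/data.txt", "download"])
print(args.url, args.filename)

# In a real script this would be:
#     args = parse_args(sys.argv[1:])
#     getpml(args.url, args.filename)
```

Inside Python source the quotes are still required; this only moves the typing to the shell, where most URLs can be entered bare (ones containing characters such as & or ? may still need shell quoting).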

reading output from a text file

For instance, the following test.py script can be displayed at ../cgi-bin/test.py, but is there a way I could display the same output from a text file (like test.txt) instead of test.py, so that the URL would be something like ../test.txt?
--- test.py ---
def printxt():
    print "Content-Type: text/plain"
    print """my text here..."""
If you point your browser to www.yoursite.com/yourtext.txt, you'll discover that most browsers are fully capable of displaying text files without any additional code.
You appear to be asking whether web servers can serve, and web browsers can display, text files. The answer is yes. Put the text file underneath DOCUMENTROOT and it's all good.
Or did I misunderstand the question?
If you want to use Python to do this in the context of a larger program (the example you just gave would be useless if that's all that you wanted it to do), you can simply use the standard:
file = open("filename.txt", "r")
for line in file:
    print line
You already know about the Content-type line which of course would be the same.
If the file is small enough to be read into memory all at once without causing problems, you can use "print file.read()" in order to have file.read() read the entire file as a single string, then print that out.
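Putting the pieces together, a Python 3 version of the idea might look like the sketch below. The build_response function and the temporary test.txt are assumptions for this example; the blank line after the header is what separates CGI headers from the body.

```python
import os
import tempfile


def build_response(path: str) -> str:
    """Return a plain-text CGI response whose body is the file's contents."""
    with open(path, encoding="utf-8") as f:
        body = f.read()
    # Header line, then an empty line, then the body.
    return "Content-Type: text/plain\n\n" + body


# Hypothetical text file standing in for test.txt.
text_path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("my text here...\n")

response = build_response(text_path)
print(response, end="")
```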

Python - writing lines from file into IRC buffer

Ok, so I am trying to write a Python script for XCHAT that will allow me to type "/hookcommand filename" and then will print that file line by line into my irc buffer.
EDIT: Here is what I have now
__module_name__ = "scroll.py"
__module_version__ = "1.0"
__module_description__ = "script to scroll contents of txt file on irc"
import xchat, random, os, glob, string

def gg(ascii):
    ascii = glob.glob("F:\irc\as\*.txt")
    for textfile in ascii:
        f = open(textfile, 'r')

def gg_cb(word, word_eol, userdata):
    ascii = gg(word[0])
    xchat.command("msg %s %s"%(xchat.get_info('channel'), ascii))
    return xchat.EAT_ALL

xchat.hook_command("gg", gg_cb, help="/gg filename to use")
Well, your first problem is that you're referring to a variable ascii before you define it:
ascii = gg(ascii)
Try making that:
ascii = gg(word[0])
Next, you're opening each file returned by glob... only to do absolutely nothing with them. I'm not going to give you the code for this: please try to work out what it's doing or not doing for yourself. One tip: the xchat interface is an extra complication. Try to get it working in plain Python first, then connect it to xchat.
There may well be other problems - I don't know the xchat api.
When you say "not working", try to specify exactly how it's not working. Is there an error message? Does it do the wrong thing? What have you tried?
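As a starting point for the "plain Python first" step, here is a hedged Python 3 sketch with no xchat involved. The helper names, the temporary directory and the art.txt file are made up for this example; how each returned line would be sent to the channel is left out, since that depends on the xchat API.

```python
import glob
import os
import tempfile


def find_text_file(directory: str, name: str):
    """Return the path of <name>.txt inside directory, or None if it is missing."""
    matches = glob.glob(os.path.join(directory, name + ".txt"))
    return matches[0] if matches else None


def read_lines(path: str):
    """Return the file's non-blank lines without their trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Scratch directory standing in for the folder of text files.
art_dir = tempfile.mkdtemp()
with open(os.path.join(art_dir, "art.txt"), "w", encoding="utf-8") as f:
    f.write("line one\n\nline two\n")

path = find_text_file(art_dir, "art")
lines = read_lines(path)
for line in lines:
    print(line)
```

Unlike the gg function in the question, both helpers return something, so the caller has actual text to work with. When writing the real directory as a string literal, a raw string such as r"F:\irc\as" avoids sequences like \a being read as escape characters.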

trouble parsing XML in python

I'm trying to query a database, then convert the file-like object it returns to an XML document. Here's what I've been doing:
>>> import urllib, xml.dom.minidom
>>> query = "http://sbol.bhi.washington.edu/openrdf-sesame/repositories/sbol_test?query=select%20distinct%20%3Fname%20%3Ffeaturename%20where%20%7B%3Fpart%20%3Chttp%3A%2F%2Fsbol.bhi.washington.edu%2Frdf%2Fsbol.owl%23annotation%3E%20%3Fannotation%3B%3Chttp%3A%2F%2Fsbol.bhi.washington.edu%2Frdf%2Fsbol.owl%23status%3E%20'Available'%3B%3Chttp%3A%2F%2Fsbol.bhi.washington.edu%2Frdf%2Fsbol.owl%23name%3E%20%3Fname.%3Fannotation%20%3Chttp%3A%2F%2Fsbol.bhi.washington.edu%2Frdf%2Fsbol.owl%23feature%3E%20%3Ffeature.%3Ffeature%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23type%3E%20%3Chttp%3A%2F%2Fsbol.bhi.washington.edu%2Frdf%2Fsbol.owl%23binding%3E%3B%3Chttp%3A%2F%2Fsbol.bhi.washington.edu%2Frdf%2Fsbol.owl%23name%3E%20%3Ffeaturename%7D"
>>> raw_result = urllib.urlopen(query)
>>> xml_result = xml.dom.minidom.parse(raw_result)
That last command gives me
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 4
Almost the same thing happens if I use xml.etree.ElementTree to do the parsing. I think they both use Expat. The weird part is, if instead of loading the file in python I just paste the query into Firefox, the resulting file can be read in perfectly well using open(path_to_file, "r").
Any ideas what this could be?
UPDATE:
This is the first line of the file:
<?xml version='1.0' encoding='UTF-8'?>
However, that may not be what's in raw_result... that's what you get after downloading query-result.srx and changing the extension to .txt. The file extension doesn't matter, does it? Also, I'm pretty new to this whole XML thing: why is column 4 the 8th character?
Your server is picky about the accept header in deciding what to send back and in which format. The following should work:
In [265]: import urllib2
In [266]: req = urllib2.Request(query, headers={'Accept':'application/xml'})
In [267]: rsp = urllib2.urlopen(req)
In [268]: doc = xml.dom.minidom.parse(rsp)
In [269]: doc.toxml()[:64]
Out[269]: u'<?xml version="1.0" ?><sparql xmlns="http://www.w3.org/2005/spar'
Note the accept header in urllib2.Request.
Any chance you could post the XML snippet? The parser indicates that the error occurs on the very first line. My guess is that the formatting is off or being reported incorrectly, which causes Expat to raise an exception right off the bat.
My guess is that the first line violates something in the rules for well-formed XML anyway. For reference, you might compare against http://en.wikipedia.org/wiki/XML
Looks like something is wrong with your XML file, right about line 1, column 4.
I tried this, and what I got doesn't look like XML to me. Here are the first eight characters, as Alex suggested:
>>> raw_result.read(8)
'BRTR\x00\x00\x00\x03'
It seems that the RDF server is delivering plain text to your urllib.urlopen call.
You should be able to get the XML response by setting the right header:
Accept: application/sparql-results+xml, */*;q=0.5
You will have to read the openRDF protocol specification for details, since openRDF supports more than one result format.
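For readers on Python 3, where urllib2 became urllib.request, the same idea can be sketched as below. The helper names and the example.org URL are placeholders made up for this illustration, and no network request is actually sent; the byte strings are the ones quoted in this thread.

```python
import urllib.request
from xml.dom import minidom

# Accept value suggested in the answer above.
ACCEPT = "application/sparql-results+xml, */*;q=0.5"


def build_request(url: str) -> urllib.request.Request:
    """Create a request that asks the server for SPARQL results as XML."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT})


def looks_like_xml(payload: bytes) -> bool:
    """Cheap sanity check before handing the bytes to an XML parser."""
    return payload.lstrip().startswith(b"<")


req = build_request("http://example.org/query")  # placeholder URL
print(req.get_header("Accept"))

# The first bytes the server returned without the header are not XML.
print(looks_like_xml(b"BRTR\x00\x00\x00\x03"))

# A well-formed document parses without an ExpatError.
doc = minidom.parseString(b"<?xml version='1.0' encoding='UTF-8'?><sparql/>")
print(doc.documentElement.tagName)

# With network access one would continue with:
#     with urllib.request.urlopen(req) as rsp:
#         doc = minidom.parse(rsp)
```

Peeking at the first few bytes, as was done with raw_result.read(8), is a quick way to tell a parser problem from a content-negotiation problem.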
