Python read strings from file, preserving variables to be printed - python

I am making a Python script that will choose a response at random from a list.
To fill this list I want to read strings from a file. The strings will look something like this:
"This number is " + str(num) + ", this is good"
"Oh no the number is " + str(num) + ", this is good"
Obviously these are read from the file as strings, so if I printed one of them it would come out as you see it here and won't have the value for "num" substituted. Is there any way to read these strings from a file while keeping the ability to substitute variables (like a raw format), like how it would work if my code did
list.append("This number is " + str(num) + ", this is good")
The reason I want to read from a file is because I will have many different strings and they may change so I would rather not hard code them into the program (keep in mind the example strings are very basic)
Thanks

You could use the format specification mini-language, and then call .format on your strings before displaying them.
strings.txt:
This number is {num} this is good
Oh no the number is {num} this is good
main.py:
import random
with open("strings.txt") as file:
    # splitlines() avoids the empty trailing entry that split("\n") leaves
    possible_strings = file.read().splitlines()
number = 23
s = random.choice(possible_strings)
print(s.format(num=number))
Possible output:
This number is 23 this is good

Use something in your file to indicate a substitution is needed, and then make those substitutions.
For example, if you need to put in the value of num, your text could use {{num}} where the substitution is needed. Then use regex to find such substrings, and replace them with the desired values.
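A minimal sketch of that idea, assuming placeholders are written as {{name}} and the values arrive in a dictionary. The helper name `fill_placeholders` is hypothetical, not something from the question:

```python
import re

# Matches placeholders such as {{num}} and captures the name inside.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def fill_placeholders(template, values):
    """Replace each {{name}} in template with str(values[name])."""
    def lookup(match):
        name = match.group(1)
        # Leave unknown placeholders untouched rather than raising.
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(lookup, template)

line = "Oh no the number is {{num}}, this is good"
print(fill_placeholders(line, {"num": 23}))
```

Passing a function to `sub` keeps the lookup in one place, so adding more variables only means adding keys to the dictionary.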

Related

Basic Input/Output in Python

I am using notepad++ to write code for python and have a variable that I need to add 1 to for my next question. I am new to coding and would like to know how to achieve this. I would also like to phrase the question so that the answer (variable plus 1) is placed between text. Below, in my next line I would like it to read (for instance if the number is 3) How often do the 4 of you visit?
I have tried different ways of framing my variable +1 within parentheses and quotation marks but at best, when run, it shows exactly what I wrote not the answer to the equation.
famnumber = input ("How many of your family members still live there?")
I would like the answer to appear within text as noted above if possible.
Here is some code:
famadd = float(famnumber) + (1)
print ("Do all (famadd) of you get together often?")
There are several ways to do this. Note that whatever you get from input is a string, so famnumber += 1 won't work: you can't add a number to a string. First turn the numeral string into an actual number; int() converts the input text into an integer. Then, to include the value in a new string for your next question, use %d ('d' for 'decimal integer'). An integer makes more sense than a float here, since people don't report family members in fractions (likewise, you don't want to say something like 'How often do the 4.0 of you meet?').
famnumber = input("How many of your family members still live there? ")
new_number = int(famnumber) + 1
next_question = input("How often do the %d of you meet? " % new_number)
Another way to accomplish the same thing is to convert 'famnumber' itself from a string to an integer, then back into a string to join it into the sentence. Personally I'd go with the previous method, but this should give you an idea of some of the other things you can do in Python:
famnumber = input("How many of your family members still live there? ")
famnumber = int(famnumber)
famnumber += 1
next_question = input("How often do the " + str(famnumber) + " of you meet? ")
Also, while Notepad++ is a great text editor, if you're planning on doing a lot of Python scripting and writing, you may want to consider instead using an IDE, such as PyCharm, or IDLE which is included in the Python package. Tools like this make it easier for yourself to read and run your code.
I think you are looking for this kind of code:
famnumber = input("How many of your family members still live there?")
incremented_number = int(famnumber) + 1
next_number = input("How often do the " + str(incremented_number) + " of you visit? ")
print(next_number)
In the second line simply cast the input to int and increment it by one.
In the third line, put the variable where you want it to be shown, joined to the surrounding text with + signs. You have to convert it with str() first, because + cannot concatenate an integer to a string. You can verify the type of the variable next_number (a string, since it comes from input) by adding the line print(type(next_number)).

I want to read in strings to the new line character in Python 2.7

I have a long text file that I am trying to pull certain strings out of. The lengths of these strings vary within the text file, but they are always located after certain identifiers. So for example, say my text file looks like this:
junk text...
Name:
Age:
Robert
twenty
four.
junk text...
I always know that the "Robert" string is located at "Age:\n\n" but I am not sure how long it is only that it will end at a "\n\n" and the same principle with the "twenty four." string. I have tried using
namepos1 = string.find("Age:")
namepos2 = namepos1 + 6
This gives the starting location of the string I want, but I do not know how to save it into a variable such that it always captures the whole string up to the two newline characters. If it were a set length and not variable, I think I could use:
name = string[namepos2:length]
but any help would be greatly appreciated. I may have to go about doing it completely different, but this is the first way I have thought about it and tried to do it.
Thanks!
You could do this by finding "Age:" and then moving forward two lines. If you want the entire section of text after the "junk", and you know how long that text is, this would also work:
lookup = 'Age:'
lines = []
with open('C:/Users/Luke/Desktop/Summer 2016/Programs/untitled5.txt') as myFile:
    for num, line in enumerate(myFile, 1):
        if lookup in line:
            # num is 1-based, so list index num + 1 is two lines below the match
            lines.append(num + 1)
with open('C:/Users/Luke/Desktop/Summer 2016/Programs/untitled5.txt') as ofile:
    line = ofile.readlines()
interestinglines = ''
for i in range(len(lines)):
    interestinglines += line[lines[i]] + '\n'
You may need to tinker with it a bit, but I believe this should reproduce mostly what you're looking for. The '\n' is added onto line[lines[i]] so that you can save the result to a new file.
After you found the location in string, you can split the String by \n\n and get the first item.
s = file_str[namepos2 :]
name = s.split('\n\n')[0]
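Putting the two steps together, a minimal sketch. The `file_str` sample is made up to stand in for the real file contents, the helper name `text_after` is hypothetical, and the marker is assumed to be followed immediately by one blank line:

```python
def text_after(text, marker, terminator="\n\n"):
    """Return the text between marker + terminator and the next terminator."""
    start = text.find(marker)
    if start == -1:
        return None  # marker not present
    # Skip past the marker and the blank line that follows it.
    start += len(marker) + len(terminator)
    return text[start:].split(terminator)[0]

# Made-up sample standing in for the contents of the real file.
file_str = "junk text...\n\nAge:\n\nRobert\n\ntwenty four.\n\njunk text...\n"
name = text_after(file_str, "Age:")
print(name)
```

Using len(marker) + len(terminator) instead of a hard-coded 6 means the same function works for other identifiers.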

Function creation - "Undefined name" - Python

I'm writing some code that reads words from a text file and sorts them into a dictionary. It actually all runs fine, but for reference here it is:
def find_words(file_name, delimiter=" "):
    """
    A function for finding the number of individual words, and the most popular words, in a given file.
    The process will stop at any line in the file that starts with the word 'finish'.
    If there is no finish point, the process will go to the end of the file.
    Inputs: file_name: Name of file you want to read from, e.g. "mywords.txt"
            delimiter: The way the words in the file are separated e.g. " " or ", "
                     : Delimiter will default to " " if left blank.
    Output: Dictionary with all the words contained in the given file, and how many times each word appears.
    """
    words = []
    dictt = {}
    with open(file_name, 'r') as wordfile:
        for line in wordfile:
            words = line.split(delimiter)
            if words[0] == "finish":
                break
            # This next part is for filling the dictionary
            # and correctly counting the amount of times each word appears.
            for i in range(len(words)):
                a = words[i]
                if a == "\n" or a == "":
                    continue
                elif a not in dictt:
                    dictt[words[i]] = 1
                else:
                    dictt[words[i]] = int(dictt.get(a)) + 1
    return dictt
The problem is that it only works if the arguments are given as string literals, e.g, this works:
test = find_words("hello.txt", " " )
But this doesn't:
test = find_words(hello.txt, )
The error message is undefined name 'hello'
I don't know how to alter the function arguments such that I can enter them without speech marks.
Thanks!
Simple, you define that name:
class hello:
    txt = "hello.txt"
But joking aside, all the argument values in a function call are expressions. If you want to pass a string literally you'll have to make a string literal, using the quotes. Python is not a text preprocessor like m4 or cpp, and expects the entire program text to follow its syntax.
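To illustrate the point about expressions: a bare name in a call is looked up as a variable, so it only works if that name has been bound first. A small sketch, where `show` is a hypothetical stand-in for find_words:

```python
def show(file_name):
    """Stand-in for find_words: just returns the argument it received."""
    return file_name

# A string literal is an expression that evaluates to the string itself.
print(show("hello.txt"))

# A bare name is also an expression; it evaluates to whatever it is bound to.
path = "hello.txt"
print(show(path))

# An unbound name fails while the arguments are being evaluated,
# before the function is ever called.
try:
    show(hello.txt)
except NameError as err:
    print("NameError:", err)
```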
So it turns out I just misunderstood what was being asked. I've had it clarified by the course leader now.
As I am now fully aware, a function definition needs to be told when a string is being entered, hence the quote marks being required.
I admit full ignorance over my depth of understanding of how it all works - I thought you could pretty much put any assortment of letters and/or numbers in as an argument and then you can manipulate them within the function definition.
My ignorance may stem from the fact that I'm quite new to Python, having learned my coding basics on C++ where, if I remember correctly (it was well over a year ago), functions are defined with each argument being specifically set up as their type, e.g.
int max(int num1, int num2)
Whereas in Python you don't quite do it like that.
Thanks for the attempts at help (and ridicule!)
Problem is sorted now.

Splitting a string in Python 2.7

I want to know how to allow multiple inputs in Python.
Ex: If a message is "!comment postid customcomment"
I want to be able to take that post ID, put that somewhere, and then the customcomment, and put that somewhere else.
Here's my code:
import fb
token="access_token_here"
facebook=fb.graph.api(token)
#__________ Later on in the code: __________
elif msg.startswith('!comment '):
    postid = msg.replace('!comment ','',1)
    send('Commenting...')
    facebook.publish(cat="comments", id=postid, message="customcomment")
    send('Commented!')
I can't seem to figure it out.
Thank you in advance.
I can't quite tell what you are asking but it seems that this will do what you want.
Assuming that
msg = "!comment postid customcomment"
you can use the built-in string method split to turn the string into a list of strings, using " " as a separator and a maximum number of splits of 2:
msg_list=msg.split(" ",2)
The zeroth index will contain "!comment", so you can ignore it:
postid = msg_list[1]  # or int(msg_list[1]) if you need a numerical input
message = msg_list[2]
If you don't limit split and just use the default behavior (ie msg_list=msg.split()), you would have to rejoin the rest of the strings separated by spaces. To do so you can use the built-in string method join which does just that:
message=" ".join(msg_list[2:])
and finally
facebook.publish(cat="comments", id=postid, message=message)
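As a self-contained sketch of the splitting step, leaving out the Facebook call: the helper name `parse_comment` and the sample message are made up for illustration.

```python
def parse_comment(msg):
    """Split '!comment <postid> <text>' into (postid, text), or None if malformed."""
    # At most 2 splits, so spaces inside the comment text are preserved.
    parts = msg.split(" ", 2)
    if len(parts) != 3 or parts[0] != "!comment":
        return None
    return parts[1], parts[2]

print(parse_comment("!comment 12345 nice post, thanks!"))
```

Returning None for a malformed message lets the caller reply with a usage hint instead of hitting an IndexError.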

Verify CSV against given format

I am expecting users to upload a CSV file of max size 1MB to a web form that should fit a given format similar to:
"<String>","<String>",<Int>,<Float>
That will be processed later. I would like to verify that the file fits the specified format, so that the program that later uses the file doesn't receive unexpected input and there are no security concerns (say, some injection attack against the parsing script that does some calculations and a db insert).
(1) What would be the best way to go about doing this that would be fast and thorough? From what I've researched I could go the path of regex or something more like this. I've looked at the Python csv module, but that doesn't appear to have any built-in verification.
(2) Assuming I go for a regex, can anyone direct me towards the best way to do this? Do I match for illegal characters and reject on that? (eg. no '/' '\' '<' '>' '{' '}' etc.) or match on all legal eg. [a-zA-Z0-9]{1,10} for the string component? I'm not too familiar with regular expressions so pointers or examples would be appreciated.
EDIT:
Strings should contain no commas or quotes it would just contain a name (ie. first name, last name). And yes I forgot to add they would be double quoted.
EDIT #2:
Thanks for all the answers. Cutplace is quite interesting but is a standalone. Decided to go with pyparsing in the end because it gives more flexibility should I add more formats.
Pyparsing will process this data, and will be tolerant of unexpected things like spaces before and after commas, commas within quotes, etc. (csv module is too, but regex solutions force you to add "\s*" bits all over the place).
from pyparsing import *
integer = Regex(r"-?\d+").setName("integer")
integer.setParseAction(lambda tokens: int(tokens[0]))
floatnum = Regex(r"-?\d+\.\d*").setName("float")
floatnum.setParseAction(lambda tokens: float(tokens[0]))
dblQuotedString.setParseAction(removeQuotes)
COMMA = Suppress(',')
validLine = dblQuotedString + COMMA + dblQuotedString + COMMA + \
        integer + COMMA + floatnum + LineEnd()
tests = """\
"good data","good2",100,3.14
"good data" , "good2", 100, 3.14
bad, "good","good2",100,3.14
"bad","good2",100,3
"bad","good2",100.5,3
""".splitlines()
for t in tests:
    print t
    try:
        print validLine.parseString(t).asList()
    except ParseException, pe:
        print pe.markInputline('?')
        print pe.msg
    print
Prints
"good data","good2",100,3.14
['good data', 'good2', 100, 3.1400000000000001]
"good data" , "good2", 100, 3.14
['good data', 'good2', 100, 3.1400000000000001]
bad, "good","good2",100,3.14
?bad, "good","good2",100,3.14
Expected string enclosed in double quotes
"bad","good2",100,3
"bad","good2",100,?3
Expected float
"bad","good2",100.5,3
"bad","good2",100?.5,3
Expected ","
You will probably be stripping those quotation marks off at some future time, pyparsing can do that at parse time by adding:
dblQuotedString.setParseAction(removeQuotes)
If you want to add comment support to your input file, say a '#' followed by the rest of the line, you can do this:
comment = '#' + restOfLine
validLine.ignore(comment)
You can also add names to these fields, so that you can access them by name instead of index position (which I find gives more robust code in light of changes down the road):
validLine = dblQuotedString("key") + COMMA + dblQuotedString("title") + COMMA + \
        integer("qty") + COMMA + floatnum("price") + LineEnd()
And your post-processing code can then do this:
data = validLine.parseString(t)
print "%(key)s: %(title)s, %(qty)d in stock at $%(price).2f" % data
print data.qty*data.price
I'd vote for parsing the file, checking you've got 4 components per record, that the first two components are strings, the third is an int (checking for NaN conditions), and the fourth is a float (also checking for NaN conditions).
Python would be an excellent tool for the job.
I'm not aware of any libraries in Python to deal with validation of CSV files against a spec, but it really shouldn't be too hard to write.
import csv
import math

def check_csv(path):
    with open(path) as f:
        for row in csv.reader(f):
            if len(row) != 4:
                print('Invalid row length.')
                return False
            try:
                my_int = int(row[2])
                my_float = float(row[3])
            except ValueError:
                print('Bad int or float found')
                return False
            # float() accepts the text 'nan', so reject it explicitly
            if math.isnan(my_float):
                print('Bad float found')
                return False
    print('All good!')
    return True

check_csv('data.csv')
Here's a small snippet I made:
import csv
f = csv.reader(open("test.csv"))
for value in f:
    value[0] = str(value[0])
    value[1] = str(value[1])
    value[2] = int(value[2])
    value[3] = float(value[3])
If you run that with a file that doesn't have the format you specified, you'll get an exception:
$ python valid.py
Traceback (most recent call last):
File "valid.py", line 8, in <module>
i[2] = int(i[2])
ValueError: invalid literal for int() with base 10: 'a3'
You can then make a try-except ValueError to catch it and let the users know what they did wrong.
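A sketch of what that try-except might look like, written for Python 3 and collecting every bad row instead of stopping at the first. The function name `find_bad_rows` and the in-memory sample are assumptions for illustration:

```python
import csv
import io

def find_bad_rows(lines):
    """Return a list of (row_number, message) for rows that fail conversion."""
    problems = []
    for row_number, row in enumerate(csv.reader(lines), 1):
        if len(row) != 4:
            problems.append((row_number, "expected 4 fields, got %d" % len(row)))
            continue
        try:
            int(row[2])
            float(row[3])
        except ValueError as err:
            # The exception text names the offending value.
            problems.append((row_number, str(err)))
    return problems

# In-memory sample standing in for an uploaded file.
sample = io.StringIO('"a","b",1,2.5\n"c","d",a3,4.0\n"e","f",7\n')
for row_number, message in find_bad_rows(sample):
    print("row %d: %s" % (row_number, message))
```

Reporting the row number alongside the message gives users something concrete to fix in their file.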
There can be a lot of corner-cases for parsing CSV, so you probably don't want to try doing it "by hand". At least start with a package/library built-in to the language that you're using, even if it doesn't do all the "verification" you can think of.
Once you get there, then examine the fields for your list of "illegal" chars, or examine the values in each field to determine they're valid (if you can do so). You also don't even need a regex for this task necessarily, but it may be more concise to do it that way.
You might also disallow embedded \r or \n, \0 or \t. Just loop through the fields and check them after you've loaded the data with your csv lib.
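One way that loop could look, as a Python 3 sketch. The set of forbidden characters is taken from the list above, while the name `has_forbidden_chars` and the sample data are made up:

```python
import csv
import io

# Characters rejected inside any field: CR, LF, NUL, and tab.
FORBIDDEN = set("\r\n\0\t")

def has_forbidden_chars(row):
    """True if any field in the parsed row contains a forbidden character."""
    return any(ch in FORBIDDEN for field in row for ch in field)

# In-memory sample; the second row hides a tab inside a quoted field.
sample = io.StringIO('"ok","fine",1,2.0\n"tab\there","x",1,2.0\n')
for row in csv.reader(sample):
    print(row, has_forbidden_chars(row))
```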
Try Cutplace. It verifies that tabular data conforms to an interface control document.
Ideally, you want your filtering to be as restrictive as possible - the fewer things you allow, the fewer potential avenues of attack. For instance, a float or int field has a very small number of characters (and very few configurations of those characters) which should actually be allowed. String filtering should ideally be restricted to only what characters people would have a reason to input - without knowing the larger context it's hard to tell you exactly which you should allow, but at a bare minimum the string match regex should require quoting of strings and disallow anything that would terminate the string early.
Keep in mind, however, that some names may contain things like single quotes ("O'Neil", for instance) or dashes, so you couldn't necessarily rule those out.
Something like...
/"[a-zA-Z' -]+"/
...would probably be ideal for double-quoted strings which are supposed to contain names. You could replace the + with a {x,y} length min/max if you wanted to enforce certain lengths as well.
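As a rough sketch of how that pattern could be extended to a whole line in Python 3: the 40-character limit, the exact number formats, and the name `is_valid_line` are assumptions, not requirements from the question.

```python
import re

# One name field: double-quoted; letters, apostrophes, spaces, and dashes only.
# The 1-40 length range is an arbitrary illustrative limit.
NAME = "\"[a-zA-Z' -]{1,40}\""
# Whole line: two names, an integer, and a float, separated by commas.
LINE = re.compile(r"%s,%s,-?\d+,-?\d+\.\d+" % (NAME, NAME))

def is_valid_line(line):
    """True if the entire line matches the expected four-field layout."""
    return LINE.fullmatch(line) is not None

print(is_valid_line('"Mary","O\'Neil",100,3.14'))
print(is_valid_line('"bad<script>","x",100,3.14'))
```

Using fullmatch rather than search means trailing junk after the float is rejected too.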
