Replacing multiple words in a string from different data sets in Python

Essentially, I have a Python script that loads a number of files. Each file contains a list, and these lists are used to generate strings. For example: "Just been to see $film% in $location%, I'd highly recommend it!" I need to replace the $film% and $location% placeholders with a random element from their respective imported lists.
I'm very new to Python and have picked up most of it quite easily, but strings in Python are immutable, so handling this sort of task is different from other languages I've used.
Here is the code as it stands. I've tried adding a while loop, but it still only replaces the first instance of a replaceable word and leaves the rest.
#!/usr/bin/python
import random

def replaceWord(string):
    #Find Variable Type
    if "url" in string:
        varType = "url"
    elif "film" in string:
        varType = "film"
    elif "food" in string:
        varType = "food"
    elif "location" in string:
        varType = "location"
    elif "tvshow" in string:
        varType = "tvshow"
    #LoadVariableFile
    fileToOpen = "/prototype/default_" + varType + "s.txt"
    var_file = open(fileToOpen, "r")
    var_array = var_file.read().split('\n')
    #Get number of possible variables
    numberOfVariables = len(var_array)
    #ChooseRandomElement
    randomElement = random.randrange(0,numberOfVariables)
    #ReplaceWord
    oldValue = "$" + varType + "%"
    newString = string.replace(oldValue, var_array[randomElement], 1)
    return newString

testString = "Just been to see $film% in $location%, I'd highly recommend it!"
Test = replaceWord(testString)
This would give the following output: Just been to see Harry Potter in $location%, I'd highly recommend it!
I have tried using while loops, counting the number of words to replace in the string, and so on; however, it still only changes the first word. It also needs to be able to replace multiple instances of the same "variable" type in the same string, so if there are two occurrences of $film% in a string, it should replace both with a random element from the loaded file.

The following program may be somewhat closer to what you are trying to accomplish. Please note that documentation has been included to help explain what is going on. The templates are a little different than yours but provide customization options.
#! /usr/bin/env python3
import random

PATH_TEMPLATE = './prototype/default_{}s.txt'

def main():
    """Demonstrate the StringReplacer class with a test string."""
    replacer = StringReplacer(PATH_TEMPLATE)
    text = "Just been to see {film} in {location}, I'd highly recommend it!"
    result = replacer.process(text)
    print(result)

class StringReplacer:
    """StringReplacer(path_template) -> StringReplacer instance"""

    def __init__(self, path_template):
        """Initialize the instance attributes of the class."""
        self.path_template = path_template
        self.cache = {}

    def process(self, text):
        """Automatically discover text keys and replace them at random."""
        keys = self.load_keys(text)
        result = self.replace_keys(text, keys)
        return result

    def load_keys(self, text):
        """Discover what replacements can be made in a string."""
        keys = {}
        while True:
            try:
                text.format(**keys)
            except KeyError as error:
                # format() reports one missing key at a time
                key = error.args[0]
                self.load_to_cache(key)
                keys[key] = ''
            else:
                return keys

    def load_to_cache(self, key):
        """Warm up the cache as needed in preparation for replacements."""
        if key not in self.cache:
            with open(self.path_template.format(key)) as file:
                # Strip whitespace, drop blank lines, remove duplicates
                unique = set(filter(None, map(str.strip, file)))
            self.cache[key] = tuple(unique)

    def replace_keys(self, text, keys):
        """Build a dictionary of random replacements and run formatting."""
        for key in keys:
            keys[key] = random.choice(self.cache[key])
        new_string = text.format(**keys)
        return new_string

if __name__ == '__main__':
    main()

varType is assigned in only one branch of your if-elif chain, after which the interpreter leaves the chain. You have to check every type and act on each one that is present. One way is to set flags for the parts of the sentence you want to change, using independent if statements rather than elif. It would go like this:
url_to_change = False
film_to_change = False
if "url" in string:
    url_to_change = True
if "film" in string:
    film_to_change = True
if url_to_change:
    change_url()
if film_to_change:
    change_film()
If you want to change all occurrences, you could use a for loop. Just do something like this in the part where you are swapping a word:
for word in sentence.split():
    if word == 'url':
        change_word()
Having said this, I'd recommend a couple of improvements. Push the changes into separate functions; your code will be easier to manage.
For example, a function that loads the items to pick from at random could be:
def load_variable_file(file_name):
    fileToOpen = "/prototype/default_" + file_name + "s.txt"
    var_file = open(fileToOpen, "r")
    var_array = var_file.read().split('\n')
    var_file.close()
    return var_array
Instead of
if "url" in string:
    varType = "url"
you could do:
def change_url(sentence):
    var_array = load_variable_file("url")
    numberOfVariables = len(var_array)
    randomElement = random.randrange(0,numberOfVariables)
    oldValue = "$url%"
    return sentence.replace(oldValue, var_array[randomElement], 1)

if "url" in sentence:
    sentence = change_url(sentence)
And so on. You could push part of what I've put into change_url() into a separate function, since it would be used by all such functions (just like loading data from a file). I deliberately did not change everything; I hope you get my point. As you can see, with clearly named functions you can write less code, split it into logical, reusable parts, and drop most of the comments.

A few points about your code:
You can replace randrange with random.choice, as you just want to select an item from a list.
You can iterate over your types and assign each replacement back to the same name, so you keep all your replacements. Calling replace without a limit (the third parameter) swaps every occurrence for the same value; replacing one occurrence at a time in a loop, as below, gives each occurrence its own random choice.
readlines() does what you want after open: it reads the file and stores the lines in a list (each line keeps its trailing newline, so strip it).
Return the new string after going through all the possible replacements.
Something like this:
#!/usr/bin/python
import random

def replaceWord(string):
    # Known variable types
    types = ("url", "film", "food", "location", "tvshow")
    for t in types:
        tag = "$" + t + "%"
        if tag in string:
            # Load the variable file, dropping newlines and blank lines
            fileToOpen = "/prototype/default_" + t + "s.txt"
            with open(fileToOpen) as f:
                var_array = [line.strip() for line in f.readlines() if line.strip()]
            while tag in string:
                choice = random.choice(var_array)
                string = string.replace(tag, choice, 1)
                # Avoid reusing an entry (assumes the file has at least
                # as many entries as there are tags of this type)
                var_array.remove(choice)
    return string

testString = "Just been to see $film% in $location%, I'd highly recommend it!"
new = replaceWord(testString)
print(new)
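As an alternative to looping over the known types, the same idea can be sketched with a single regular-expression pass, where a callback picks a fresh random value for every placeholder it meets. This is only a sketch under assumptions: CHOICES is a hypothetical in-memory stand-in for the per-type files in the question, and fill_placeholders, PLACEHOLDER and pick are illustrative names that do not appear in the original code.

```python
import random
import re

# Hypothetical in-memory stand-in for the per-type files in the question.
CHOICES = {
    "film": ["Harry Potter", "Inception"],
    "location": ["London", "Leeds"],
}

# Matches placeholders of the form $type% and captures the type name.
PLACEHOLDER = re.compile(r"\$(\w+)%")

def fill_placeholders(template, choices=CHOICES):
    """Replace every $type% placeholder with a random entry for that type."""
    def pick(match):
        var_type = match.group(1)
        if var_type not in choices:
            return match.group(0)  # leave unknown placeholders untouched
        return random.choice(choices[var_type])
    # re.sub calls pick once per match, so repeated tags get independent picks.
    return PLACEHOLDER.sub(pick, template)

print(fill_placeholders("Just been to see $film% in $location%, then $film% again!"))
```

Because the callback runs once per match, two $film% tags in the same string are chosen independently, which is the behaviour the question asks for.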

Related

It doesn't give anything. Can someone help me with dictionaries, objects and classes in Python?

I am trying to print frequencies of all words in a text.
I wanna print all keys according to their sorted values.
Namely, I wanna print the frequencies from most frequent to least frequent.
Here is my code:
freqMap = {}
class analysedText(object):
    def __init__(self, text):
        # remove punctuation
        formattedText = text.replace('.', '').replace('!', '').replace('?', '').replace(',', '')
        # make text lowercase
        formattedText = formattedText.lower()
        self.fmtText = formattedText
    def freqAll(self):
        wordList = self.fmtText.split(' ')
        freqMap = {}
        for word in set(wordList):
            freqMap[word] = wordList.count(word)
        return freqMap
mytexte = str(input())
my_text = analysedText(mytexte)
my_text.freqAll()
freqKeys = freqMap.keys()
freqValues = sorted(freqMap.values())
a = 0
for i in freqValues:
    if i == a:
        pass
    else:
        for key in freqKeys:
            if freqMap[key] == freqValues[i]:
                print(key,": ", freqValues[i])
    a = i

Your function freqAll returns a value that you are not catching.
It should be:
counts = my_text.freqAll()
Then you use the counts variable in the rest of your code.

The freqAll method of your class returns freqMap, but you never store the result. You are therefore processing the empty freqMap dict that was created before the class declaration. Try replacing
my_text.freqAll()
using
freqMap = my_text.freqAll()
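Once the returned dictionary is captured, the remaining goal is printing words from most to least frequent. A minimal sketch of that step, assuming the standard library's collections.Counter is acceptable in place of the hand-built dictionary; word_frequencies is a hypothetical helper name, not part of the original code.

```python
from collections import Counter

def word_frequencies(text):
    """Return (word, count) pairs, most frequent first."""
    # Strip the same punctuation the question removes, then lowercase.
    for mark in ".!?,":
        text = text.replace(mark, "")
    words = text.lower().split()
    # most_common() sorts by count, highest first.
    return Counter(words).most_common()

for word, count in word_frequencies("The cat saw the dog. The dog ran!"):
    print(word, ":", count)
```

Using split() without an argument also avoids counting empty strings when the text contains consecutive spaces.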

How to dynamically replace chars in a string with chars in another string having same structure based on condition?

I have a graph of dependencies where there's parent and child nodes. Child nodes have a # sign indicating that char/number is the same as the parent node. I understand the title might be weird, let me give you an example:
Initial reference variable:
ref = '12345.1.1'
Strings that will need replacing within:
example1 = '#.1.2'
example2 = '#.#.3'
Outcome after conversion/replacing (this is what I need help with):
# Make some magic, replace #'s with matching parent digits to get this output on string variables above:
example1 = '12345.1.2'
example2 = '12345.1.3'
In essence, how do I replace each # character (if present) with the matching digits of its "parent"? I guess it could work using replace or a regex, but if there are any built-in methods that would work, I'd be happy to know.
Thanks in advance.

ref = '12345.1.1'
example1 = '#.1.2'
example2 = '#.#.3'
def replace(text, ref='12345.1.1', split='.', placeholder='#'):
    ref = ref.split(split)
    text = text.split(split)
    return split.join(txt1 if txt2 == placeholder else txt2
                      for txt1, txt2 in zip(ref, text))
print(replace(example1))
print(replace(example2))
print(replace('#.#.#'))
output
12345.1.2
12345.1.3
12345.1.1

A bit cumbersome, but this should do it:
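import re
class Replacement:
    def __init__(self, ref):
        self.ref = ref.split(".")
        self.counter = 0
    def repl(self, match):
        # Called for every dot-separated segment, so counter tracks the position
        position = self.counter
        self.counter += 1
        if match.group() == "#":
            return self.ref[position]
        return match.group()
example1 = '#.1.2'
example2 = '#.#.3'
for example in [example1, example2]:
    r = Replacement(ref='12345.1.1')
    # Match whole segments (not just "#") so a "#" after a literal
    # segment still maps to the right position in ref
    result = re.sub(r"[^.]+", r.repl, example)
    print(result)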
Output
12345.1.2
12345.1.3
Note that you need to create a new Replacement object or restart the counter for each example in your input data.

Concise function
def replace_char(string: str, ref: str, place_holder: str = '#', split: str = '.') -> str:
    # ref is passed in explicitly instead of being read from a global
    return split.join(parent if child == place_holder else child
                      for parent, child in zip(ref.split(split), string.split(split)))
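One caveat with the zip-based answers: zip stops at the shorter input, so a child with fewer or more segments than its parent is silently truncated. The sketch below adds an explicit length check; fill_from_parent is a hypothetical name, and raising ValueError is an assumed policy rather than something the question specifies.

```python
def fill_from_parent(child, parent, sep=".", placeholder="#"):
    """Replace placeholder segments in child with the parent's segments."""
    parent_parts = parent.split(sep)
    child_parts = child.split(sep)
    # zip() would silently drop extra segments, so check lengths first.
    if len(child_parts) != len(parent_parts):
        raise ValueError(
            "child %r and parent %r have different segment counts" % (child, parent)
        )
    return sep.join(
        p if c == placeholder else c
        for p, c in zip(parent_parts, child_parts)
    )

print(fill_from_parent("#.1.2", "12345.1.1"))
print(fill_from_parent("#.#.3", "12345.1.1"))
```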

NameError in Python - not sure if the class or the function is causing the error

I have a text file describing resumes where each line looks like:
name university sex filename
So one line would say something like
John Texas M resume1.doc
The file has standard formatting and does not contain any errors. There are four possible names and four possible universities, randomized to create 64 resumes. I'm trying to write a program that reads through the text file, creates a resume object with attributes for the name, university, sex, and filename, and adds these objects to a list of resume objects. I have a lot of experience in C++, but this is my first Python program and I'm getting thrown off by an error:
  File "mycode.py", line 142, in <module>
    resumes()
  File "mycode.py", line 65, in resumes
    r = resume(name,uni,sex,filename)
NameError: global name "name" is not defined
My code looks like:
class resume:
    def __init__(self, name, uni, sex, filename)
        self.name = name
        self.uni = uni
        self.sex = sex
        self.filename = filename
mylist[]
def resumes():
    f = open("resumes.txt",'r')
    for line in f:
        for word in line.split():
            if word == ("John" or "Fred" or "Jim" or "Michael"):
                name = word
            elif word == ("Texas" or "Georgia" or "Florida" or "Montana"):
                uni = word
            elif word == "M":
                sex = word
            elif re.match(r'\w\.doc',word):
                filename = word
        r = resume(name,uni,sex,filename)
        mylist.insert(r)
I'm not sure if the error is in the class or the function. My computer isn't showing any syntax errors but I'm new to this so if there are, please feel free to tell me how to fix them.
I've tried defining name, uni, etc. outside the "for word in line.split()" loop but the program still had an issue with the line "r = resume(name,uni,sex,filename)" so I'm not sure what the issue is. I've read through other answers about NameError but I'm new to Python and couldn't figure out the equivalent problem in my code.

The NameError is caused by undefined variables in cases where no values are found in the text file. Define them within the function before you try to assign values from the text file to them:
def resumes():
    f = open("resumes.txt",'r')
    for line in f:
        name = ""
        uni = ""
        sex = ""
        filename = ""
        for word in line.split():
            ...
You can also pre-define the variables in your class initialization by using keyword arguments if you like (this isn't the cause of the NameError though):
class resume:
    def __init__(self, name="", uni="", sex="", filename=""):
        self.name = name
        self.uni = uni
        self.sex = sex
        self.filename = filename
Defining a list in Python is done by typing mylist = [], not mylist[]. Also, at the moment, the list would be defined in the global namespace, which is generally discouraged. Instead, you can make resumes return a list and assign this value to mylist:
def resumes():
    resume_list = []
    f = open("resumes.txt",'r')
    for line in f:
        for word in line.split():
            # (these == comparisons are discussed further down)
            if word == ("John" or "Fred" or "Jim" or "Michael"):
                name = word
            elif word == ("Texas" or "Georgia" or "Florida" or "Montana"):
                uni = word
            elif word == "M":
                sex = word
            elif re.match(r'\w\.doc',word):
                filename = word
        r = resume(name,uni,sex,filename)
        resume_list.append(r)
    return resume_list
Then you can do the following anywhere in your code:
mylist = resumes()
Remember to close files after opening them; in your case, by calling f.close() after processing all the lines. Even better, have Python manage it automatically by using the with context manager, so you don't have to call f.close():
def resumes():
    with open("resumes.txt",'r') as f:
        for line in f:
            ...
Typically, you'd use append rather than insert when working with lists. insert takes two arguments (position/index, and the element to insert) so mylist.insert(r) should raise a TypeError: insert() takes exactly 2 arguments (1 given). Instead, do mylist.append(r) to insert r after the last element in the list.
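As johnrsharpe pointed out in the comments, your word comparisons probably aren't doing what you expect. The expression ("John" or "Fred" or "Jim" or "Michael") evaluates to just "John", so only that name ever matches. See this example: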
>>> word = "John"
>>> word == ("John" or "Fred" or "Jim" or "Michael")
True
>>> word = "Fred"
>>> word == ("John" or "Fred" or "Jim" or "Michael")
False
>>>
Instead, use a tuple or a set and the keyword in to check if word equals any of the four names:
>>> word = "John"
>>> word in {"John", "Fred", "Jim", "Michael"}
True
>>> word = "Fred"
>>> word in {"John", "Fred", "Jim", "Michael"}
True
>>>
>>> type({"John", "Fred", "Jim", "Michael"})
<type 'set'>
>>>
Finally, as Daniel pointed out, remember the colon, :, after function definitions such as def __init__(...)
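Putting the points above together, a corrected version might look like the sketch below. Several details are assumptions rather than part of the original code: the function takes any iterable of lines (so it can be tried without a file), the class is renamed Resume, "F" is accepted alongside "M", and the filename check uses re.fullmatch with \w+ because the original pattern \w\.doc, applied with re.match, only matches a single character before ".doc" and so would not match resume1.doc.

```python
import re

NAMES = {"John", "Fred", "Jim", "Michael"}
UNIVERSITIES = {"Texas", "Georgia", "Florida", "Montana"}

class Resume:
    def __init__(self, name="", uni="", sex="", filename=""):
        self.name = name
        self.uni = uni
        self.sex = sex
        self.filename = filename

def parse_resumes(lines):
    """Build a list of Resume objects from an iterable of text lines."""
    resume_list = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Reset the fields for every line so nothing is left undefined.
        name = uni = sex = filename = ""
        for word in line.split():
            if word in NAMES:
                name = word
            elif word in UNIVERSITIES:
                uni = word
            elif word in {"M", "F"}:  # "F" is an assumption
                sex = word
            elif re.fullmatch(r"\w+\.doc", word):
                filename = word
        resume_list.append(Resume(name, uni, sex, filename))
    return resume_list

resumes = parse_resumes(["John Texas M resume1.doc", "Fred Georgia M resume2.doc"])
print(resumes[1].name, resumes[1].filename)
```

To read from the real file, the same function can be called as parse_resumes(f) inside a with open("resumes.txt") as f: block, since a file object is itself an iterable of lines.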

Your code is throwing a NameError because at some point in the iteration of your file, some word variable doesn't fulfill any of the conditionals in this line of your function: if word == ("John" or "Fred" or "Jim" or "Michael"):, and name doesn't get defined.
The simplest way to work around this error is to assign default values to your variables outside the scopes of your class and function (or within the scope of your function):
name = "name"
uni = "uni"
sex = "sex"
filename = "filename"
class resume:
    # rest of your code
As an alternative, you could include conditional checks within your function for your variables; if the variable isn't yet defined, assign it a default value:
if "name" not in locals():
    name = "name"
r = resume(name,uni,sex,filename)
Finally, you'll want to append a colon to this line, from this:
def __init__(self, name, uni, sex, filename)
to this:
def __init__(self, name, uni, sex, filename):
Also change the line where you initialize mylist from this:
mylist[]
to this:
mylist = []
and change:
mylist.insert(r)
to:
mylist.append(r)

Python equivalent of Fortran list-directed input

I'd like to be able to read data from an input file in Python, similar to the way that Fortran handles a list-directed read (i.e. read (file, *) char_var, float_var, int_var).
The tricky part is that the way Fortran handles a read statement like this is very "forgiving" as far as the input format is concerned. For example, using the previous statement, this:
"some string" 10.0, 5
would be read the same as:
"some string", 10.0
5
and this:
"other string", 15.0 /
is read the same as:
"other string"
15
/
with the value of int_var retaining the same value as before the read statement. And trickier still this:
"nother string", , 7
will assign the values to char_var and int_var but float_var retains the same value as before the read statement.
Is there an elegant way to implement this?

That is indeed tricky. I found it easier to write a pure-Python state-based tokenizer than to come up with a regular expression to parse each line (though it is possible).
I've used the link provided by Vladimir as the spec. The tokenizer has some doctests that pass.
def tokenize(line, separator=',', whitespace="\t\n\x20", quote='"'):
    """
    >>> tokenize('"some string" 10.0, 5')
    ['some string', '10.0', '5']
    >>> tokenize(' "other string", 15.0 /')
    ['other string', '15.0', '/']
    >>> tokenize('"nother string", , 7')
    ['nother string', '', '7']
    """
    inside_str = False
    token_started = False
    token = ""
    tokens = []
    separated = False
    just_added = False
    for char in line:
        if char in quote:
            if not inside_str:
                inside_str = True
            else:
                inside_str = False
                tokens.append(token)
                token = ""
                just_added = True
            continue
        if char in (whitespace + separator) and not inside_str:
            if token:
                tokens.append(token)
                token = ""
                just_added = True
            elif char in separator:
                if not just_added:
                    tokens.append("")
                just_added = False
            continue
        token += char
    if token:
        tokens.append(token)
    return tokens
class Character(object):
    def __init__(self, length=None):
        self.length = length
    def __call__(self, text):
        if self.length is None:
            return text
        if len(text) > self.length:
            return text[:self.length]
        # Pad the text with spaces up to the fixed length
        return "{{:{}}}".format(self.length).format(text)

def make_types(types, default_value):
    return types, [default_value] * len(types)

def fortran_reader(file, types, default_char="/", default_value=None, **kw):
    types, results = make_types(types, default_value)
    tokens = []
    while True:
        tokens = []
        # Keep reading lines until there is one token per expected value
        while len(tokens) < len(results):
            try:
                line = next(file)
            except StopIteration:
                # End of input: finish the generator (raising StopIteration
                # inside a generator is an error on Python 3.7+)
                return
            tokens += tokenize(line, **kw)
        for i, (type_, token) in enumerate(zip(types, tokens)):
            # Empty fields and '/' keep the previous value
            if not token or token in default_char:
                continue
            results[i] = type_(token)
        changed_types = yield(results)
        if changed_types:
            types, results = make_types(changed_types, default_value)
I have not tested this thoroughly, except for the tokenizer.
It is designed to work in a Python for statement if the same fields are repeated over and over again, or it can be used with the generator's send method to change the types of the values to be read on each iteration.
Please test, and e-mail me (address at my profile) some testing file. If there is indeed nothing similar, maybe this deserves some polishing and could be published on PyPI.

Since I was not able to find a solution to this problem, I decided to write my own solution.
The main drivers are a reader class, and a tokenizer. The reader gets one line at a time from the file, passes it to the tokenizer, and assigns to the variables it is given, getting the next line as necessary.
class FortranAsciiReader(file):  # subclasses the Python 2 built-in file type
    def read(self, *args):
        """
        Read from file into the given objects
        """
        num_args = len(args)
        num_read = 0
        encountered_slash = False
        # If line contained '/' or read into all variables, we're done
        while num_read < num_args and not encountered_slash:
            line = self.readline()
            if not line:
                raise Exception()
            values = tokenize(line)
            # Assign elements one-by-one into args, skipping empty fields and stopping at a '/'
            for val in values:
                if val == '/':
                    encountered_slash = True
                    break
                elif val == '':
                    num_read += 1
                else:
                    args[num_read].assign(val)
                    num_read += 1
                if num_read == num_args:
                    break
The tokenizer splits the line into tokens in accordance with the way that Fortran performs list directed reads, where ',' and white space are separators, tokens may be "repeated" via 4*token, and a / terminates input.
My implementation of the tokenizer is a bit long to reproduce here, and I also included classes to transparently provide the functionality of the basic Fortran intrinsic types (i.e. Real, Character, Integer, etc.). The whole project can be found on my github account, currently at https://github.com/bprichar/PyLiDiRe. Thanks jsbueno for inspiration for the tokenizer.
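The repeat form mentioned above (4*token) can be handled as a post-processing step on an already tokenized line. The following is only a sketch of that idea, not the code from the linked project: expand_repeats is a hypothetical helper, it assumes the tokens are plain strings, and it cannot tell a quoted string that happens to look like 3*x from a real repeat, which a full tokenizer would need to track.

```python
import re

# A repeat token is a positive count, an asterisk, then an optional value.
_REPEAT = re.compile(r"(\d+)\*(.*)")

def expand_repeats(tokens):
    """Expand Fortran-style 'r*c' tokens into r copies of c."""
    expanded = []
    for token in tokens:
        match = _REPEAT.fullmatch(token)
        if match:
            count, value = int(match.group(1)), match.group(2)
            # 'r*' with nothing after the asterisk yields r empty (null) fields.
            expanded.extend([value] * count)
        else:
            expanded.append(token)
    return expanded

print(expand_repeats(["4*1.5", "abc", "2*"]))
```

The empty strings produced for a bare count can then be treated like any other empty field, that is, the corresponding variable keeps its previous value.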

Why does my program add ('', ' to the name of my file?

Here is my code (sorry for the messy code):
def main():
    pass
if __name__ == '__main__':
    main()
from easygui import *
import time
import os
import random
import sys
##multenterbox(msg='Fill in values for the fields.', title=' ', fields=(), values=())
msg = "Enter your personal information"
title = "Credit Card Application"
fieldNames = ["First name",'Last name','email',"Street Address","City","State","ZipCode",'phone','phone 2)']
fieldValues = [] # we start with blanks for the values
fieldValues = multenterbox(msg,title, fieldNames)
# make sure that none of the fields was left blank
def make(x):
    xys = x,".acc"
    xyzasd = str(xys)
    tf = open(xyzasd,'a+')
    tf.writelines(lifes)
    tf.writelines("\n")
    tf.writelines("credits = 0")
    tf.close
def add(x):
    nl = "\n"
    acc = ".acc"
    xy = x + acc
    exyz = xy
    xyz = exyz
    xxx = str(xyz)
    tf = open('accounts.dat',"a+")
    tf.writelines(nl)
    tf.writelines(xxx)
    tf.close
while 1:
    if fieldValues == None: break
    errmsg = ""
    for i in range(len(fieldNames)-1):
        if fieldValues[i].strip() == "":
            errmsg += ('"%s" is a required field.\n\n' % fieldNames[i])
    if errmsg == "":
        break # no problems found
    fieldValues = multenterbox(errmsg, title, fieldNames, fieldValues)
names = enterbox(msg= ('confirm FIRST name and the FIRST LETTER of the persons LAST name'))
##txt = "acc"
##na = str(name)
##name = (names)
life = ( str(fieldValues))
lifes = life,'\n'
herro = ("Reply was: %s" % str(fieldValues))
correct = buttonbox(msg=(herro,'\n is that correct'),choices = ('yes','no','cancel'))
if correct == "yes":
    make(names)
    add(names)
elif correct == "no":
    os.system('openacc.py')
    time.sleep(0.5)
    sys.exit()
else:
    os.system('cellocakes-main.py')
    sys.exit()
os.system('cellocakes-main.py')
I don't know what the problem is. I'm sorry about how sloppily it is programmed; I use a whiteboard to help me out, and I'm still new to programming (I'm only 13). Personally, I think the issue is in the syntax of the def add area, but because I'm still new I can't see it. I'm hoping a more experienced programmer can help me out.

This answer does not directly address your question.
Alas, comment fields are STILL not able to hold formatted code, so I chose this way.
def main():
    pass
if __name__ == '__main__':
    main()
This is a nice coding pattern, but the way you use it makes it useless.
It is supposed to prevent the code from running when the file is imported as a module rather than executed as a script.
It is not bad to use it always, but then put your code inside the main() function instead of adding it below.
fieldNames = ["First name",'Last name','email',"Street Address","City","State","ZipCode",'phone','phone 2)']
There is one ) too many.
fieldValues = [] # we start with blanks for the values
fieldValues = multenterbox(msg,title, fieldNames)
The second line makes the first one useless, as you don't use fieldValues in-between.
It would be different if you expected multenterbox() to fail and would want [] as a default value.
def make(x):
    xys = x,".acc"
    xyzasd = str(xys)
    tf = open(xyzasd,'a+')
    tf.writelines(lifes)
    tf.writelines("\n")
    tf.writelines("credits = 0")
    tf.close
You were already told about this: x, ".acc" creates a tuple, not a string. To create a string, use x + ".acc".
Besides, your close call is no call, because it is missing the (). It just references the method and discards it, so the file is never explicitly closed.
A better way to write this would be (please name your variables appropriately)
with open(xyzs, 'a+') as tf:
    tf.writelines(lifes)
    tf.writelines("\n")
    tf.writelines("credits = 0")
The with statement automatically closes the file, even if an error occurs.
Besides, you use writelines() wrong: it is supposed to take a sequence of strings and write each element to the file. As it doesn't add newlines in between, the result looks the same, but when you pass it a single string it writes each character separately, which is a little less efficient. Use write() for a single string.
Additionally, you access the global variable lifes from within the function. You should only do such things if it is absolutely necessary.
def add(x):
Here the same remarks hold as above, plus
xy = x + acc
exyz = xy
xyz = exyz
xxx = str(xyz)
why that? Just use xy; the two assignments do nothing useful and the str() call is useless as well, as you already have a string.
for i in range(len(fieldNames)-1):
    if fieldValues[i].strip() == "":
        errmsg += ('"%s" is a required field.\n\n' % fieldNames[i])
Better:
for name, value in zip(fieldNames, fieldValues):
    if not value.strip(): # means: empty
        errmsg += '"%s" is a required field.\n\n' % name
Then:
life = ( str(fieldValues))
makes a string from a list.
lifes = life,'\n'
makes a tuple from these 2 strings.
os.system('openacc.py')
os.system('cellocakes-main.py')
Please avoid os.system(); it is not formally deprecated, but the subprocess module is the recommended replacement.

The problem of the question is here:
# assign the tuple (x, ".acc") to xys
xys = x,".acc"
# now xyzasd is the tuple converted to a string, thus
# making the name of your file into '("content of x", ".acc")'
xyzasd = str(xys)
# and open file named thus
tf = open(xyzasd,'a+')
What you wanted to do is:
# use proper variable and function names!
def make_account(account):
    filename = account + '.acc'
    the_file = open(filename, 'a+')
    ....
On the other hand there are other problems with your code, for example the
def main():
    pass
if __name__ == '__main__':
    main()
is utterly useless.
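To make the tuple-versus-string point concrete, here is a small self-contained sketch. It is an illustration under assumptions, not the asker's program: account_filename and make_account are hypothetical helpers, the details string is made up, and a temporary directory is used so that running it leaves no files behind.

```python
import os
import tempfile

def account_filename(account):
    """Join the account name and extension into one string."""
    return account + ".acc"

def make_account(directory, account, details):
    """Append the account details to '<account>.acc' inside directory."""
    path = os.path.join(directory, account_filename(account))
    # The with statement closes the file even if a write fails.
    with open(path, "a+") as account_file:
        account_file.write(details + "\n")
        account_file.write("credits = 0\n")
    return path

# The comma builds a tuple, and str() of a tuple keeps the parentheses and quotes.
print(str(("bob", ".acc")))      # ('bob', '.acc')
print(account_filename("bob"))   # bob.acc

with tempfile.TemporaryDirectory() as workdir:
    created = make_account(workdir, "bob", "['Bob', 'Smith']")
    print(os.path.basename(created))  # bob.acc
```

The first print shows where the stray parentheses and quotes in the file name come from; concatenating with + avoids them.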
