Python, working with strings

I need to write a program for my class that reads a messed-up text from a file and gives it a book form, turning the input:
This is programing story , for programmers . One day a variable
called
v comes to a bar and ordred some whiskey, when suddenly
a new variable was declared .
a new variable asked : " What did you ordered? "
into the output:
This is programing story,
for programmers. One day
a variable called v comes
to a bar and ordred some
whiskey, when suddenly a
new variable was
declared. A new variable
asked: "what did you
ordered?"
I am a total beginner at programming, and my code is here:
def vypis(t):
    cely_text = ''
    for riadok in t:
        cely_text += riadok.strip()
    a = 0
    for i in range(0,80):
        if cely_text[0+a] == " " and cely_text[a+1] == " ":
            cely_text = cely_text.replace (" ", " ")
        a+=1
    d=0
    for c in range(0,80):
        if cely_text[0+d] == " " and (cely_text[a+1] == "," or cely_text[a+1] == "." or cely_text[a+1] == "!" or cely_text[a+1] == "?"):
            cely_text = cely_text.replace (" ", "")
        d+=1
def vymen(riadok):
    for ch in riadok:
        if ch in '.,":':
            riadok = riadok[ch-1].replace(" ", "")
x = int(input("Zadaj x"))
t = open("text.txt", "r")
v = open("prazdny.txt", "w")
print(vypis(t))
This code deletes some spaces, and I have tried to delete the spaces before signs like " .,_?", but this did not work. Why? Thanks for help :)

You want to do quite a lot of things, so let's take them in order:
Let's get the text in a nice text form (a list of strings):
>>> with open('text.txt', 'r') as f:
...     lines = f.readlines()
>>> lines
['This is programing story , for programmers . One day a variable',
'called', 'v comes to a bar and ordred some whiskey, when suddenly ',
' a new variable was declared .',
'a new variable asked : " What did you ordered? "']
You have newlines all around the place. Let's replace them by spaces and join everything into a single big string:
>>> text = ' '.join(line.replace('\n', ' ') for line in lines)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'
Now we want to remove any multiple spaces. We split by space, tabs, etc... and keep only the non-empty words:
>>> words = [word for word in text.split() if word]
>>> words
['This', 'is', 'programing', 'story', ',', 'for', 'programmers', '.', 'One', 'day', 'a', 'variable', 'called', 'v', 'comes', 'to', 'a', 'bar', 'and', 'ordred', 'some', 'whiskey,', 'when', 'suddenly', 'a', 'new', 'variable', 'was', 'declared', '.', 'a', 'new', 'variable', 'asked', ':', '"', 'What', 'did', 'you', 'ordered?', '"']
Let us join our words by spaces... (only one this time)
>>> text = ' '.join(words)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'
We now want to remove all the <SPACE>., <SPACE>, etc...:
>>> for char in (',', '.', ':', '"', '?', '!'):
...     text = text.replace(' ' + char, char)
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. a new variable asked:" What did you ordered?"'
OK, the work is not done, as the " are still messed up, the upper case is not set, etc. You can still incrementally update your text. For the upper case, consider for instance:
>>> sentences = text.split('.')
>>> sentences
['This is programing story, for programmers', ' One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared', ' a new variable asked:" What did you ordered?"']
See how you can fix it ?
The trick is to take only string transformations such that:
A correct sentence is UNCHANGED by the transformation
An incorrect sentence is IMPROVED by the transformation
This way you can compose them and improve your text incrementally.
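The two rules above can be illustrated with a few small transformations, each of which leaves already-correct text untouched. This is only a sketch: the function names (collapse_spaces, tighten_punctuation, capitalize_sentences, compose) are made up for illustration, and the capitalization rule assumes plain ASCII sentences ending in '.', '?' or '!'.

```python
import re

def collapse_spaces(text):
    # Any run of whitespace (spaces, tabs, newlines) becomes one space
    return ' '.join(text.split())

def tighten_punctuation(text):
    # Remove whitespace that precedes a punctuation sign
    return re.sub(r'\s+([,.:;?!])', r'\1', text)

def capitalize_sentences(text):
    # Upper-case a lower-case letter at the start of the text
    # or right after a sentence-ending sign followed by whitespace
    return re.sub(r'(^|[.?!]\s+)([a-z])',
                  lambda m: m.group(1) + m.group(2).upper(),
                  text)

def compose(text, steps):
    # Apply each transformation in order
    for step in steps:
        text = step(text)
    return text

STEPS = (collapse_spaces, tighten_punctuation, capitalize_sentences)

messy = 'a  new variable was declared .   it asked : why ?'
clean = compose(messy, STEPS)
print(clean)
```

Because each step is a no-op on text that is already correct, running compose a second time on clean should return it unchanged, which is a handy sanity check when you add more rules.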
Once you have a nicely formatted text, like this:
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. A new variable asked: "what did you ordered?"'
You have to define similar syntactic rules for printing it out in book format. Consider for instance the function:
>>> def prettyprint(text):
...     return '\n'.join(text[i:i+50] for i in range(0, len(text), 50))
It will print each line with an exact length of 50 characters (the last line may be shorter):
>>> print(prettyprint(text))
This is programing story, for programmers. One day
a variable called v comes to a bar and ordred som
e whiskey, when suddenly a new variable was declar
ed. A new variable asked: "what did you ordered?"
Not bad, but it can be better. Just as we previously juggled with text, lines, sentences and words to match the syntactic rules of the English language, we want to do exactly the same to match the syntactic rules of printed books.
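If breaking lines only between words is all you need, the standard library's textwrap module already does that. The sketch below is an alternative to the fixed-width slicing above, not part of the approach described in this answer; the width of 25 is an assumption read off the expected output in the question.

```python
import textwrap

text = ('This is programing story, for programmers. One day a variable '
        'called v comes to a bar and ordred some whiskey, when suddenly '
        'a new variable was declared. A new variable asked: '
        '"what did you ordered?"')

# Break the text between words so that no line exceeds 25 characters
book = textwrap.fill(text, width=25)
print(book)
```

textwrap.fill never splits a word unless the word itself is longer than the width, so the result reads like a narrow book column.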
In that case, both the English language and printed books work on the same units: words, arranged in sentences. This suggests we might want to work on these directly. A simple way to do that is to define your own objects:
>>> class Sentence(object):
...     def __init__(self, content, punctuation):
...         self.content = content
...         self.endby = punctuation
...     def pretty(self):
...         nice = []
...         content = self.content.pretty()
...         # A sentence starts with a capital letter
...         nice.append(content[0].upper())
...         # The rest has already been prettified by the content
...         nice.extend(content[1:])
...         # Do not forget the punctuation sign
...         nice.append(self.endby)
...         return ''.join(nice)
>>> class Paragraph(object):
...     def __init__(self, sentences):
...         self.sentences = sentences
...     def pretty(self):
...         # Separating our sentences by a single space
...         return ' '.join(sentence.pretty() for sentence in self.sentences)
etc... This way you can represent your text as:
>>> Paragraph(
...     Sentence(
...         Propositions([Proposition(['this',
...                                    'is',
...                                    'programming',
...                                    'story']),
...                       Proposition(['for',
...                                    'programmers'])],
...                      ','),
...         '.'),
...     Sentence(...
etc...
Converting from a string (even a messed up one) to such a tree is relatively straightforward as you only break down to the smallest possible elements. When you want to print it in book format, you can define your own book methods on each element of the tree, e.g. like this, passing around the current line, the output lines and the current offset on the current line:
class Proposition(object):
    ...
    def book(self, line, lines, offset, line_length):
        for word in self.words:
            # A word costs its own length plus one separating space,
            # unless it is the first word on the line
            needed = len(word) + (1 if line else 0)
            if line and offset + needed > line_length:
                lines.append(' '.join(line))
                line = []
                offset = 0
                needed = len(word)
            line.append(word)
            offset += needed
        return line, lines, offset
    ...
class Propositions(object):
    ...
    def book(self, line, lines, offset, line_length):
        line, lines, offset = self.Proposition1.book(line, lines, offset, line_length)
        if offset + len(self.punctuation) > line_length:
            # Need to move the punctuation sign, together with the last word,
            # to a new line
            word = line.pop()
            lines.append(' '.join(line))
            line = [word + self.punctuation]
            offset = len(word + self.punctuation)
        else:
            line[-1] += self.punctuation
            offset += len(self.punctuation)
        line, lines, offset = self.Proposition2.book(line, lines, offset, line_length)
        return line, lines, offset
And work your way up to Sentence, Paragraph, Chapter...
This is a very simplistic implementation (and actually a non-trivial problem) which does not take into account syllabification or justification (which you would probably like to have), but this is the way to go.
Note that I did not mention the string module, string formatting or regular expressions, which are the tools to use once you have defined your syntactic rules or transformations. These are extremely powerful tools, but the most important thing here is to know exactly which algorithm transforms an invalid string into a valid one. Once you have some working pseudocode, regexps and format strings can help you implement it with less pain than plain character iteration. In my previous example of a tree of words, for instance, regexps can tremendously ease the construction of the tree, and Python's string formatting functions can make writing the book or pretty methods much easier.

To strip the multiple spaces you could use a simple regex substitution.
import re
cely_text = re.sub(r' +', ' ', cely_text)
Then for punctuation you could run a similar sub (note the raw string, so that the backslash reaches the regex engine untouched):
cely_text = re.sub(r' +([,.:])', r'\g<1>', cely_text)
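Put together, and extended to the stray spaces inside quotation marks, the substitutions might look like the sketch below. The function name tidy and the quote rule are assumptions added for illustration; the quote rule only works when double quotes come in properly paired, non-nested form.

```python
import re

def tidy(cely_text):
    # Collapse every run of whitespace (including newlines) into one space
    cely_text = re.sub(r'\s+', ' ', cely_text).strip()
    # Trim the spaces just inside a pair of double quotes
    cely_text = re.sub(r'"\s*(.*?)\s*"', r'"\1"', cely_text)
    # Drop the spaces in front of punctuation signs
    cely_text = re.sub(r' +([,.:;?!])', r'\1', cely_text)
    return cely_text

print(tidy('a new   variable asked : " What did you ordered? "'))
```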

Related

Translate paragraph in python

I am trying to translate a Paragraph from english to my local language which I have written the code as:
def translate(inputvalue):
    # inputvalue is an array of English paragraphs
    translatedData = []
    trans = Translator()
    for i in inputvalue:
        # add a space after '.' or ',' where one is missing
        sentence = re.sub(r'(?<=[.,])(?=[^\s])', r' ', i)
        # translate from English to my local language, Urdu
        t = trans.translate(sentence, src='en', dest='ur')
        # append the translated text to translatedData
        translatedData.append(t.text)
    # finally, call DisplayOutput to print the translated data
    DisplayOutput.output(translatedData)
The problem I am facing is that my local language is written from right to left, and googletrans is not giving proper output. It puts periods, commas and untranslated words at the beginning or at the end. For example:
I am 6 years old. I love to draw cartoons, animals, and plants. I do not have ADHD.
it would translate this sentence as:
میری عمر 6 سال ہے،. مجھے کارٹون جانور اور پودے کھینچنا پسند ہےمجھے ADHD 6نہیں ہے.
As you can see, it could not translate ADHD, as it is just an abbreviation, so it puts it at the beginning of the sentence; the same goes for periods, numbers and commas.
How should I translate the text so that it does not get mixed up like that?
One idea is to put the sentence into another array, like:
['I am', '6', 'years old', '.', 'I love to draw cartoons',',', 'animals',',', 'and plants','.', 'I do not have', 'ADHD','.']
I have no idea how to build this type of array, but I believe it can solve the problem, as I could then translate only the parts that contain English words and append the results to a string.
Kindly help me generate this type of array, or suggest any other solution.

string = "I am 6 years old. I love to draw cartoons, animals, and plants. I do not have ADHD."
arr = []
substring = ""
alpha = None
for char in string:
    # Letters and spaces belong to text chunks; everything else is a separator
    alpha = char.isalpha() or char == " "
    if substring.replace(" ", "").isalpha():
        if alpha:
            substring += char
        else:
            arr.append(substring)
            substring = char
    else:
        if alpha:
            arr.append(substring)
            substring = char
        else:
            substring += char
arr.append(substring)  # keep the final chunk, e.g. the closing "."
# Drop empty chunks and trim the spaces left around the others
arr = [chunk.strip() for chunk in arr if chunk.strip()]
print(arr)
Loop through each character in the string, then check if it is a letter or not a letter with ".isalpha()". Then depending on the conditions of the current substring, you append to it or create a new one.
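The same chunking can probably be expressed more compactly with a regular expression. This is a sketch under the assumption that a "chunk" is either a run of letter-only words, a run of digits, or a single punctuation character; split_chunks is a made-up name.

```python
import re

def split_chunks(sentence):
    # 1st alternative: letter-only words separated by whitespace
    # 2nd alternative: a run of digits
    # 3rd alternative: one character that is neither a word character nor whitespace
    pattern = r'[^\W\d_]+(?:\s+[^\W\d_]+)*|\d+|[^\w\s]'
    return re.findall(pattern, sentence)

chunks = split_chunks("I am 6 years old. I love to draw cartoons, animals, "
                      "and plants. I do not have ADHD.")
print(chunks)
```

Note that, like the loop above, this keeps 'I do not have ADHD' as one chunk. Splitting off abbreviations such as ADHD would need an extra rule, for example treating all-uppercase words as their own chunk.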

Replace a word in a String by indexing without "string replace function" -python

Is there a way to replace a word within a string without using a "string replace function," e.g., string.replace(string,word,replacement).
[out] = forecast('This snowy weather is so cold.','cold','awesome')
out => 'This snowy weather is so awesome.'
Here the word cold is replaced with awesome.
This is from my MATLAB homework, which I am trying to do in Python. When doing this in MATLAB we were not allowed to use strrep().
In MATLAB, I can use strfind to find the index and work from there. However, I noticed that there is a big difference between lists and strings. Strings are immutable in Python, so I will likely have to import some module to convert the string to a different data type that I can work with the way I want, without using a string replace function.

just for fun :)
st = 'This snowy weather is so cold .'.split()
given_word = 'awesome'
for i, word in enumerate(st):
    if word == 'cold':
        st[i] = given_word  # overwrite the word in place
        break  # stop after the first match
print(' '.join(st))

Here's another answer that might be closer to the solution you described using MATLAB:
st = 'This snow weather is so cold.'
given_word = 'awesome'
word_to_replace = 'cold'
n = len(word_to_replace)
index_of_word_to_replace = st.find(word_to_replace)
print(st[:index_of_word_to_replace] + given_word + st[index_of_word_to_replace+n:])
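The slicing idea above handles a single occurrence and assumes the word is present (str.find returns -1 otherwise). A slightly more general sketch, which replaces every occurrence and leaves the string unchanged when nothing is found, could look like this; replace_word is a hypothetical helper name.

```python
def replace_word(text, old, new):
    if not old:
        return text  # nothing to search for
    parts = []
    start = 0
    while True:
        idx = text.find(old, start)
        if idx == -1:
            # No further occurrence: keep the remainder and stop
            parts.append(text[start:])
            break
        # Keep the text before the match, then the replacement
        parts.append(text[start:idx])
        parts.append(new)
        start = idx + len(old)
    return ''.join(parts)

print(replace_word('This snowy weather is so cold.', 'cold', 'awesome'))
```

Like str.replace, this works on substrings, so 'cold' inside 'colder' would also be replaced.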

You can convert your string into a list object, find the index of the word you want to replace and then replace the word.
sentence = "This snowy weather is so cold"
# Split the sentence into a list of the words
words = sentence.split(" ")
# Get the index of the word you want to replace
word_to_replace_index = words.index("cold")
# Replace the target word with the new word based on the index
words[word_to_replace_index] = "awesome"
# Generate a new sentence
new_sentence = ' '.join(words)

Using Regex and a list comprehension.
import re
def strReplace(sentence, toReplace, toReplaceWith):
    return " ".join([re.sub(toReplace, toReplaceWith, i) if re.search(toReplace, i) else i for i in sentence.split()])
print(strReplace('This snowy weather is so cold.', 'cold', 'awesome'))
Output:
This snowy weather is so awesome.

Why is my RegEx code replacing some strings, but not others?

I have abstracts of academic articles. Sometimes, the abstract will contain lines like "PurposeThis article explores...." or "Design/methodology/approachThe design of our study....". I call terms like "Purpose" and "Design/methodology/approach" labels. I want the string to look like this: [label][:][space]. For example: "Purpose: This article explores...."
The code below gets me the result I want when the original string has a space between the label and the text (e.g. "Purpose This article explores...."). But I don't understand why it doesn't also work when there is no space. May I ask what I need to do to the code below so that the labels are formatted the way I want, even when the original text has no space between the label and the text? Note that I imported re.sub.
def clean_abstract(my_abstract):
    labels = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach', 'Methodology/approach' 'Findings', 'Research limitations/implications', 'Research limitations/Implications' 'Practical implications', 'Social implications', 'Originality/value']
    for i in labels:
        cleaned_abstract = sub(i, i + ': ', cleaned_abstract)
    return cleaned_abstract

Code
See code in use here
labels = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach', 'Methodology/approach', 'Findings', 'Research limitations/implications', 'Research limitations/Implications', 'Practical implications', 'Social implications', 'Originality/value']
strings = ['PurposeThis article explores....', 'Design/methodology/approachThe design of our study....']
print([l + ": " + s.split(l)[1].lstrip() for l in labels for s in strings if l in s])
Results
[
'Purpose: This article explores....',
'Design/methodology/approach: The design of our study....'
]
Explanation
Using the logic from this post.
print([...]) prints the list built by the comprehension
l + ": " + s.split(l)[1].lstrip() creates our strings
l is explained below
": " is a literal colon followed by a space
s.split(l)[1].lstrip() Split s on l, take the part after the label, and remove any whitespace from its left side
for l in labels Loops over labels setting l to the value upon each iteration
for s in strings Loops over strings setting s to the value upon each iteration
if l in s If l is found in s
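For labels that sit inside a longer abstract, a regex substitution may be closer to what the question asks for. The sketch below is a rewrite of clean_abstract, not the asker's code. It assumes that the labels never occur as ordinary words in the body text (a sentence containing "Findings" would also get a colon), and it adds the commas that are missing from the labels list in the question.

```python
import re

LABELS = ['Purpose', 'Design/methodology/approach', 'Methodology/Approach',
          'Methodology/approach', 'Findings',
          'Research limitations/implications',
          'Research limitations/Implications', 'Practical implications',
          'Social implications', 'Originality/value']

def clean_abstract(abstract, labels=LABELS):
    # Longest labels first, so a longer label wins over a shorter one
    # that happens to be its prefix
    ordered = sorted(labels, key=len, reverse=True)
    alternatives = '|'.join(re.escape(label) for label in ordered)
    # A label, an optional colon that is already there, and any whitespace
    pattern = re.compile('(' + alternatives + r'):?\s*')
    # Rewrite each match as "label: "
    return pattern.sub(lambda m: m.group(1) + ': ', abstract)

print(clean_abstract('PurposeThis article explores....'))
```

Because an existing colon and space are consumed by the pattern, running the function on text that is already formatted should leave it unchanged.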

Python search and replace

I have written two functions in Python.
When I run replace(), it looks at the data structure named replacements. It takes each key, iterates through the document and, when it matches a key to a word in the document, replaces the word with the value.
Because I also have the reverse mappings ('stopped' changes to 'suspended' and 'suspended' changes to 'stopped', depending on what is in the text file), it seems that as it goes through the file, some words are changed and then changed back, so in effect no changes are made.
When I run replace2(), I take each word from the text document and check whether it is a key in replacements. If it is, I replace it. What I have noticed, though, is that when I run this, suspended (which contains the substring "ended") ends up as "suspfinished".
Is there an easier way to iterate through the text file and only change each word once, if found? I think replace2() does what I want it to do, although I'm losing phrases, but it also seems to pick up substrings, which it should not, as I did use the split() function.
def replace():
    fileinput = open('tennis.txt').read()
    out = open('tennis.txt', 'w')
    for i in replacements.keys():
        fileinput = fileinput.replace(i, replacements[i])
        print(i, " : ", replacements[i])
    out.write(fileinput)
    out.close
def replace2():
    fileinput = open('tennis.txt').read()
    out = open('tennis.txt', 'w')
    #for line in fileinput:
    for word in fileinput.split():
        for i in replacements.keys():
            print(i)
            if word == i:
                fileinput = fileinput.replace(word, replacements[i])
    out.write(fileinput)
    out.close
replacements = {
    'suspended' : 'stopped',
    'stopped' : 'suspended',
    'due to' : 'because of',
    'ended' : 'finished',
    'finished' : 'ended',
    '40' : 'forty',
    'forty' : '40',
    'because of' : 'due to' }
the match ended due to rain a mere 40 minutes after it started. it was
suspended because of rain.

Improved version of rawbeans's answer, which didn't work as expected because some of your replacement keys contain multiple words.
Tested with your example line, it outputs: the match finished because of rain a mere forty minutes after it started. it was stopped due to rain.
import re
def replace2():
    with open('tennis.txt') as f:
        fileinput = f.read()
    # Try the replacement keys first (as whole words), then any other word,
    # then any single non-word character (spaces, punctuation, newlines)
    wordpats = '|'.join(re.escape(key) for key in replacements)
    pattern = r'\b(?:{0})\b|\w+|\W'.format(wordpats)
    words = re.findall(pattern, fileinput)
    # Each token is looked up exactly once, so nothing is swapped back
    output = "".join(replacements.get(x, x) for x in words)
    with open('tennisout.txt', 'w') as out:
        out.write(output)
replacements = {
    'suspended' : 'stopped',
    'stopped' : 'suspended',
    'due to' : 'because of',
    'ended' : 'finished',
    'finished' : 'ended',
    '40' : 'forty',
    'forty' : '40',
    'because of' : 'due to' }
if __name__ == '__main__':
    replace2()

is there an easier way to iterate through the text file and only change the word once, if found?
There's a much simpler way:
output = " ".join(replacements.get(x, x) for x in fileinput.split())
out.write(output)

To account for punctuation, use a regular expression instead of split():
output = " ".join(replacements.get(x, x) for x in re.findall(r"[\w']+|[.,!?;]", fileinput))
out.write(output)
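This way, punctuation is kept as separate tokens, so it no longer prevents a match and is still present in the final string. Note that joining with " " also puts a space before each punctuation mark, and that multi-word keys such as 'due to' are never matched, because every token is a single word. See this post for an explanation and potential caveats.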
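Another way to change each word only once is a single re.sub pass with a callback, which also copes with multi-word keys and leaves spacing and punctuation alone. This is a sketch rather than the method of either answer; swap_words is a made-up name and the dictionary is copied from the question.

```python
import re

replacements = {
    'suspended': 'stopped',
    'stopped': 'suspended',
    'due to': 'because of',
    'ended': 'finished',
    'finished': 'ended',
    '40': 'forty',
    'forty': '40',
    'because of': 'due to',
}

def swap_words(text, table):
    # Longest keys first, so a phrase is preferred over a shorter key
    keys = sorted(table, key=len, reverse=True)
    # \b keeps 'ended' from matching inside 'suspended'
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
    # The text is scanned once; replaced text is never scanned again
    return pattern.sub(lambda m: table[m.group(0)], text)

line = ('the match ended due to rain a mere 40 minutes after it started. '
        'it was suspended because of rain.')
print(swap_words(line, replacements))
```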

Problems with nested loops…

I’m going to explain in detail what I want to achieve.
I have 2 programs about dictionaries.
The code for program 1 is here:
import re
words = {'i':'jeg','am':'er','happy':'glad'}
text = "I am happy.".split()
translation = []
for word in text:
    word_mod = re.sub('[^a-z0-9]', '', word.lower())
    punctuation = word[-1] if word[-1].lower() != word_mod[-1] else ''
    if word_mod in words:
        translation.append(words[word_mod] + punctuation)
    else:
        translation.append(word)
translation = ' '.join(translation).split('. ')
print('. '.join(s.capitalize() for s in translation))
This program has the following advantages:
You can write more than one sentence
You get the first letter capitalized after “.”
The program appends the untranslated word to the output (“translation = []”)
Here is the code for program 2:
words = {('i',): 'jeg', ('read',): 'leste', ('the', 'book'): 'boka'}
max_group = len(max(words))
text = "I read the book".lower().split()
translation = []
position = 0
while text:
    for m in range(max_group - 1, -1, -1):
        word_mod = tuple(text[:position + m])
        if word_mod in words:
            translation.append(words[word_mod])
            text = text[position + m:]
    position += 1
translation = ' '.join(translation).split('. ')
print('. '.join(s.capitalize() for s in translation))
With this code you can translate idiomatic expressions, or “the book” to “boka”.
Here is how the program processes the text.
This is the output:
1
('i',)
['jeg']
['read', 'the', 'book']
0
()
1
('read', 'the')
0
('read',)
['jeg', 'leste']
['the', 'book']
1
('the', 'book')
['jeg', 'leste', 'boka']
[]
0
()
Jeg leste boka
What I want is to implement some of the code from program 1 in program 2.
I have tried many times with no success…
Here is my dream…:
If I change the text to the following…:
text = "I read the book. I read the book! I read the book? I read the book.".lower().split()
I want the output to be:
Jeg leste boka. Jeg leste boka! Jeg leste boka? Jeg leste boka.
So please, tweak your brain and help me with a solution…
I appreciate any reply very much!
Thank you very much in advance!

My solution flow would be something like this:
dict = ...
max_group = len(max(dict))
input = ...
textWPunc = input.lower().split()
textOnly = [re.sub('[^a-z0-9]', '', x) for x in input.lower().split()]
translation = []
while textOnly:
    for m in [max_group..0]:
        if textOnly[:m] in words:
            check for punctuation here using textWPunc[:m]
            if punctuation present in textWPunc[:m]:
                Append translated words + punctuation
            else:
                Append only translated words
            textOnly = textOnly[m:]
            textWPunc = textWPunc[m:]
join translation to finish
The key part is that you keep two parallel versions of the text: one that you check for words to translate, and another that you check for punctuation whenever your translation search comes up with a hit. To check for punctuation, I fed the word group that I was examining into re.sub() like so: re.sub('[a-z0-9]', '', wordGroup), which strips out all letters and digits and leaves only the punctuation.
The last thing is that your indexing with the position variable looks kind of weird to me. Since you're truncating the source list as you go, I'm not sure it's really necessary. Just check the leftmost x words as you go instead of using the position variable.
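A runnable version of that flow might look like the sketch below. It is one possible reading of the pseudocode, not the answerer's actual code: it walks an index instead of truncating the lists, keeps only the punctuation attached to the last word of a matched group, and assumes ASCII letters and digits. The function name translate and its parameters are made up for illustration.

```python
import re

def translate(text, words):
    max_group = max(len(key) for key in words)
    raw = text.split()                                          # words with punctuation
    clean = [re.sub(r'[^a-z0-9]', '', w.lower()) for w in raw]  # words only
    out = []
    i = 0
    while i < len(raw):
        # Try the longest group first, never reaching past the end of the text
        for m in range(min(max_group, len(raw) - i), 0, -1):
            key = tuple(clean[i:i + m])
            if key in words:
                # Punctuation attached to the last word of the group
                punct = re.sub(r'[A-Za-z0-9]', '', raw[i + m - 1])
                out.append(words[key] + punct)
                i += m
                break
        else:
            # No group matched: keep the original word untranslated
            out.append(raw[i])
            i += 1
    result = ' '.join(out)
    # Capitalize the first letter of every sentence
    return re.sub(r'(^|[.!?]\s+)([a-z])',
                  lambda mt: mt.group(1) + mt.group(2).upper(), result)

words = {('i',): 'jeg', ('read',): 'leste', ('the', 'book'): 'boka'}
print(translate("I read the book. I read the book! I read the book? I read the book.", words))
```

Untranslated words are passed through with their original case and punctuation, as in program 1.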
