How can I make the period string sit next to "scientist" (no space)? - Python

variable1 = "I"
variable2 = "plan"
variable3 = "to"
variable4 = "think"
variable5 = "like"
variable6 = "a"
variable7 = "computer"
variable8 = "scientist"
variable9 = "."
print(variable1, variable2, variable3, variable4, variable5, variable6, variable7, variable8, variable9)
I got:
I plan to think like a computer scientist .
I hoped to get:
I plan to think like a computer scientist.

You can use two print() calls: one to print the first 8 words separated by spaces, then another to print the period. Use end='' in the first call so it won't move to a new line afterwards.
print(variable1, variable2, variable3, variable4, variable5, variable6, variable7, variable8, end='')
print(variable9)

There are tons of ways to do this. One simple, ad hoc way is:
' '.join(['I', 'plan', 'to', 'think', 'like', 'a', 'computer', 'scientist', '.']).replace(' .', '.')
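Another option, as a sketch: join the first eight words with spaces and concatenate the period directly, so no space precedes it.

words = [variable1, variable2, variable3, variable4,
         variable5, variable6, variable7, variable8]
print(' '.join(words) + variable9)
# I plan to think like a computer scientist.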

Related

Regex to find Capital Words in a series while allowing 'and', 'to', etc. to be between them

So say I have a string such as:
Hello There what have You Been Doing.
I am Feeling Pretty Good and I Want to Keep Smiling.
I'm looking for the result:
['Hello There', 'You Been Doing', 'I am Feeling Pretty Good and I Want to Keep Smiling']
After a long time of head scratching which later evolved into head slamming, I turned to the internet for my answers. So far, I've managed to find the following:
r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)"
The above works, but it clearly does not allow for 'and', 'to', 'for', 'am' (these are the only four I'm looking for) to be in the middle of the words, and I cannot figure out how to add that in there. I'm assuming I have to use the pipe to do that, but where exactly do I put that group in?
I've also tried the answers over here, but they didn't end up working for me either.
If you are able to enumerate the words you're OK with being uncapitalized in the middle of a capitalized sequence, I would use an alternation to represent them:
\b(?:and|or|but|to|am)\b
And use that alternation to match a sequence of capitalized words and accepted uncapitalized words, which must start with a capitalized word:
[A-Z][a-z]*(?:\s(?:[A-Z][a-z]*|(?:and|or|but|to|am)\b))*
If you are OK with any word of three letters or fewer (including words like 'owl' or 'try', but not words like 'what') being uncapitalized, you can use the following:
[A-Z][a-z]*(?:\s(?:[A-Z][a-z]*|[a-z]{1,3}\b))*
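For a quick check, here is the first pattern applied to the sample string with re.findall (a sketch):

import re

s = ('Hello There what have You Been Doing. '
     'I am Feeling Pretty Good and I Want to Keep Smiling.')
pattern = r'[A-Z][a-z]*(?:\s(?:[A-Z][a-z]*|(?:and|or|but|to|am)\b))*'
print(re.findall(pattern, s))
# ['Hello There', 'You Been Doing',
#  'I am Feeling Pretty Good and I Want to Keep Smiling']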
I guess the below works too, with itertools.groupby:
from itertools import groupby
s = 'Hello There what have You Been Doing. I am Feeling Pretty Good and I Want to Keep Smiling.'
[' '.join(list(g)) for k, g in groupby(s.split(), lambda x: x[0].islower() and x not in ['and', 'to']) if not k]
Output:
['Hello There',
'You Been Doing. I',
'Feeling Pretty Good and I Want to Keep Smiling.']

How to break a sentence in Python on the full stop '.'?

I am writing a script in Python in which I have the following string:
a = "write This is mango. write This is orange."
I want to break this string into sentences and then add each sentence as an item of a list so it becomes:
list = ['write This is mango.', 'write This is orange.']
I have tried using TextBlob, but it does not read it correctly (it reads the whole string as one sentence).
Is there a simple way of doing it?
One approach is re.split with a positive lookbehind assertion:
>>> import re
>>> a = "write This is mango. write This is orange."
>>> re.split(r'(?<=\w\.)\s', a)
['write This is mango.', 'write This is orange.']
If you want to split on more than one separator, say . and ,, then use a character set in the assertion:
>>> a = "write This is mango. write This is orange. This is guava, and not pear."
>>> re.split(r'(?<=\w[,\.])\s', a)
['write This is mango.', 'write This is orange.', 'This is guava,', 'and not pear.']
On a side note, you should not use list as the name of a variable, as this will shadow the builtin list.
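To see why, a quick illustration of the pitfall (a sketch):

list = ['write This is mango.']  # shadows the builtin list
# list('abc')  # would now raise TypeError: 'list' object is not callable
del list                         # the builtin is reachable again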
This should work. Check out the .split() function here: http://www.tutorialspoint.com/python/string_split.htm
a = "write This is mango. write This is orange."
print(a.split('.', 1))
You should look into NLTK for Python.
Here's a sample from NLTK.org:
>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
('Thursday', 'NNP'), ('morning', 'NN')]
For your case you can do:
import nltk
a = "write This is mango. write This is orange."
tokens = nltk.word_tokenize(a)
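Note that word_tokenize splits the string into words, not sentences. NLTK also provides nltk.sent_tokenize for sentence splitting; be aware, though, that its Punkt model can stumble when a sentence starts with a lowercase word, which is plausibly why TextBlob (built on the same model) read the whole string as one sentence. A sketch on more conventionally capitalized text:

import nltk
# The Punkt tokenizer data may need a one-time download: nltk.download('punkt')

text = "This is mango. This is orange."
print(nltk.sent_tokenize(text))
# ['This is mango.', 'This is orange.']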
You know about string.split? It can take a multicharacter split criterion:
>>> "wer. wef. rgo.".split(". ")
['wer', 'wef', 'rgo.']
But it's not very flexible about things like the amount of white space. If you can't control how many spaces come after the full stop, I recommend regular expressions (import re). For that matter, you could just split on "." and clean up the white space at the front of each sentence and the empty item that you will get after the last ".".
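A sketch of that cleanup: split on '.', strip the leading whitespace, drop the empty trailing piece, and re-attach the full stop.

a = "write This is mango. write This is orange."
sentences = [s.strip() + '.' for s in a.split('.') if s.strip()]
print(sentences)
# ['write This is mango.', 'write This is orange.']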
a.split('.') seems like a simple way of doing it, but you will eventually run into problems.
For example suppose you have
a = 'What is the price of the orange? \
It costs $1.39. \
Thank you! \
See you soon Mr. Meowgi.'
Then result = a.split('.') would return:
result[0] = 'What is the price of the orange? It costs $1'
result[1] = '39'
result[2] = ' Thank you! See you soon Mr'
result[3] = ' Meowgi'
(plus an empty string as the final item, from the trailing '.')
I am also not factoring in:
code snippets,
e.g. 'The problem occurred when I ran the ./sen_split function. "a.str(" did not have a closing bracket.'
possible company names,
e.g. 'I work for the Node.js company'
etc.
This eventually boils down to English syntax. I would recommend looking into the nltk module, as Mike Tung pointed out.

CSV translate data within rows

I hope someone can point me in the right direction.
What would be an efficient way to translate data within a row[x]?
For example, I want to convert the following: street, avenue, road, court to st, ave, rd, ct.
I was thinking of using a dictionary, the reason being that sometimes the first letter will be capitalized and sometimes it won't, i.e.: {'ave': ['Avenue', 'avenue', 'AVENUE', 'av', 'AV']}
Having said that, could I also do something (prior to translating) like converting all the data to lower case (in the original csv file) to prevent working with data that contains mixed caps?
This is for csv files with anywhere between 500-1000 lines.
Thank you
Edit: I should add that the row[x] string would be something like '123 main street', and that is what I'm looking to translate to '123 main st'.
Edit #2:
mydict = {'avenue': 'ave', 'street': 'st', 'road': 'rd', 'court': 'ct'}
add1 = '123 MAIN ROAD'
newadd1 = []
for i in add1.lower().split(' '):
    newtext = mydict.get(i.lower(), i)
    newadd1.append(newtext)
print(' '.join(newadd1))
Thank you everyone
The way I would tackle it would be, as you suggested, constructing a dictionary. For example, say that I want any form of "Avenue" to be displayed as "Ave":
mapper = {'ave': 'Ave', 'avenue': 'Ave', 'av': 'Ave', 'st': 'Street', 'street': 'Street', ...}
and then use it with every word in the address field as follows:
word = mapper.get(word.lower(), word)
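Putting it together, a minimal sketch (the particular abbreviations in this mapper are illustrative):

mapper = {'street': 'St', 'avenue': 'Ave', 'road': 'Rd', 'court': 'Ct'}

def normalize(address):
    # Look each word up in lower case; keep it unchanged if unknown.
    return ' '.join(mapper.get(word.lower(), word) for word in address.split())

print(normalize('123 Main STREET'))  # 123 Main St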

Python break list values into sub-components and maintain key

Hello, I have a list as follows:
['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.']
My goal is a result as follows:
['2925729','Patrick did not shake our hands nor ask our names'], ['2925729', 'He greeted us promptly and politely, but it seemed routine.']
Any pointers would be very much appreciated.
>>> t = ['2925729', 'Patrick did not shake our hands nor ask our names. He greeted us promptly and politely, but it seemed routine.']
>>> [ [t[0], a + '.'] for a in t[1].rstrip('.').split('.')]
[['2925729', 'Patrick did not shake our hands nor ask our names.'], ['2925729', ' He greeted us promptly and politely, but it seemed routine.']]
If you have a large dataset and want to conserve memory, you may want to create a generator instead of a list:
g = ([t[0], a + '.'] for a in t[1].rstrip('.').split('.'))
for key, sentence in g:
    # do processing
Generators do not create lists all at once. They create each element as you access it. This is only helpful if you don't need the whole list at once.
ADDENDUM: You asked about making dictionaries if you have multiple keys:
>>> data = ['1', 'I think. I am.'], ['2', 'I came. I saw. I conquered.']
>>> dict([ [t[0], t[1].rstrip('.').split('.')] for t in data ])
{'1': ['I think', ' I am'], '2': ['I came', ' I saw', ' I conquered']}

Python, working with strings

I need to construct a program for my class which will read some messed-up text from a file and put this text into book form, so from the input:
This is programing story , for programmers . One day a variable
called
v comes to a bar and ordred some whiskey, when suddenly
a new variable was declared .
a new variable asked : " What did you ordered? "
into the output:
This is programing story,
for programmers. One day
a variable called v comes
to a bar and ordred some
whiskey, when suddenly a
new variable was
declared. A new variable
asked: "what did you
ordered?"
I am a total beginner at programming, and my code is here:
def vypis(t):
    cely_text = ''
    for riadok in t:
        cely_text += riadok.strip()
    a = 0
    for i in range(0, 80):
        if cely_text[0+a] == " " and cely_text[a+1] == " ":
            cely_text = cely_text.replace("  ", " ")
        a += 1
    d = 0
    for c in range(0, 80):
        if cely_text[0+d] == " " and (cely_text[a+1] == "," or cely_text[a+1] == "." or cely_text[a+1] == "!" or cely_text[a+1] == "?"):
            cely_text = cely_text.replace(" ", "")
        d += 1

def vymen(riadok):
    for ch in riadok:
        if ch in '.,":':
            riadok = riadok[ch-1].replace(" ", "")

x = int(input("Zadaj x"))
t = open("text.txt", "r")
v = open("prazdny.txt", "w")
print(vypis(t))
This code deleted some spaces, and I have tried to delete the spaces before signs like ".,!?", but that did not work. Why? Thanks for the help :)
You want to do quite a lot of things, so let's take them in order:
Let's get the text in a nice text form (a list of strings):
>>> with open('text.txt', 'r') as f:
... lines = f.readlines()
>>> lines
['This is programing story , for programmers . One day a variable',
'called', 'v comes to a bar and ordred some whiskey, when suddenly ',
' a new variable was declared .',
'a new variable asked : " What did you ordered? "']
You have newlines all over the place. Let's replace them with spaces and join everything into a single big string:
>>> text = ' '.join(line.replace('\n', ' ') for line in lines)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'
Now we want to remove any multiple spaces. We split by space, tabs, etc... and keep only the non-empty words:
>>> words = [word for word in text.split() if word]
>>> words
['This', 'is', 'programing', 'story', ',', 'for', 'programmers', '.', 'One', 'day', 'a', 'variable', 'called', 'v', 'comes', 'to', 'a', 'bar', 'and', 'ordred', 'some', 'whiskey,', 'when', 'suddenly', 'a', 'new', 'variable', 'was', 'declared', '.', 'a', 'new', 'variable', 'asked', ':', '"', 'What', 'did', 'you', 'ordered?', '"']
Let us join our words by spaces... (only one this time)
>>> text = ' '.join(words)
>>> text
'This is programing story , for programmers . One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared . a new variable asked : " What did you ordered? "'
We now want to remove all the spaces before '.', ',', etc.:
>>> for char in (',', '.', ':', '"', '?', '!'):
... text = text.replace(' ' + char, char)
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. a new variable asked:" What did you ordered?"'
OK, the work is not done, as the quotes are still messed up, the capitalization is not set, etc. You can still incrementally update your text. For the capitalization, consider for instance:
>>> sentences = text.split('.')
>>> sentences
['This is programing story, for programmers', ' One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared', ' a new variable asked:" What did you ordered?"']
See how you can fix it?
The trick is to use only string transformations such that:
a correct sentence is UNCHANGED by the transformation;
an incorrect sentence is IMPROVED by the transformation.
This way you can compose them and improve your text incrementally.
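As a small sketch of that property with one such transformation:

def fix(s):
    # Removing a space before a comma leaves a correct sentence
    # unchanged and improves an incorrect one.
    return s.replace(' ,', ',')

print(fix('Correct, already.'))     # Correct, already.
print(fix('Needs fixing , here.'))  # Needs fixing, here.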
Once you have a nicely formatted text, like this:
>>> text
'This is programing story, for programmers. One day a variable called v comes to a bar and ordred some whiskey, when suddenly a new variable was declared. A new variable asked: "what did you ordered?"'
You have to define similar syntactic rules for printing it out in book format. Consider for instance the function:
>>> def prettyprint(text):
... return '\n'.join(text[i:i+50] for i in range(0, len(text), 50))
It will print each line with an exact length of 50 characters:
>>> print(prettyprint(text))
This is programing story, for programmers. One day
a variable called v comes to a bar and ordred som
e whiskey, when suddenly a new variable was declar
ed. A new variable asked: "what did you ordered?"
Not bad, but it can be better. Just like we previously juggled with text, lines, sentences and words to match the syntactic rules of the English language, we want to do exactly the same to match the syntactic rules of printed books.
In that case, both the English language and printed books work on the same units: words, arranged in sentences. This suggests we might want to work on these directly. A simple way to do that is to define your own objects:
>>> class Sentence(object):
...     def __init__(self, content, punctuation):
...         self.content = content
...         self.endby = punctuation
...     def pretty(self):
...         nice = []
...         content = self.content.pretty()
...         # A sentence starts with a capital letter
...         nice.append(content[0].upper())
...         # The rest has already been prettified by the content
...         nice.extend(content[1:])
...         # Do not forget the punctuation sign
...         nice.append('.')
...         return ''.join(nice)
>>> class Paragraph(object):
...     def __init__(self, sentences):
...         self.sentences = sentences
...     def pretty(self):
...         # Separating our sentences by a single space
...         return ' '.join(sentence.pretty() for sentence in self.sentences)
etc... This way you can represent your text as:
>>> Paragraph(
...     Sentence(
...         Propositions([Proposition(['this',
...                                    'is',
...                                    'programming',
...                                    'story']),
...                       Proposition(['for',
...                                    'programmers'])],
...                      ','),
...         '.'),
...     Sentence(...
etc...
Converting from a string (even a messed up one) to such a tree is relatively straightforward as you only break down to the smallest possible elements. When you want to print it in book format, you can define your own book methods on each element of the tree, e.g. like this, passing around the current line, the output lines and the current offset on the current line:
class Proposition(object):
...
    def book(self, line, lines, offset, line_length):
        for word in self.words:
            if offset + len(word) > line_length:
                lines.append(' '.join(line))
                line = []
                offset = 0
            line.append(word)
            offset += len(word) + 1  # account for the word and the space after it
        return line, lines, offset
...
class Propositions(object):
...
    def book(self, line, lines, offset, line_length):
        line, lines, offset = self.Proposition1.book(line, lines, offset, line_length)
        if offset + len(self.punctuation) + 1 > line_length:
            # Need to move the punctuation sign, together with the
            # last word, to a new line
            word = line.pop()
            lines.append(' '.join(line))
            line = [word + self.punctuation + ' ']
            offset = len(word + self.punctuation + ' ')
        line, lines, offset = self.Proposition2.book(line, lines, offset, line_length)
        return line, lines, offset
And work your way up to Sentence, Paragraph, Chapter...
This is a very simplistic implementation (and actually a non-trivial problem) which does not take into account syllabification or justification (which you would probably like to have), but this is the way to go.
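As an aside, for plain fixed-width, word-aware line filling (without the sentence tree above), the standard library's textwrap module already does the job; a quick sketch:

import textwrap

text = ('This is programing story, for programmers. One day a variable '
        'called v comes to a bar and ordred some whiskey, when suddenly '
        'a new variable was declared. A new variable asked: "what did you ordered?"')
print(textwrap.fill(text, width=50))

Unlike the naive prettyprint above, this wraps at word boundaries instead of cutting words in the middle.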
Note that I did not mention the string module, string formatting or regular expressions, which are tools to use once you can define your syntactic rules or transformations. These are extremely powerful tools, but the most important thing here is to know exactly the algorithm that transforms an invalid string into a valid one. Once you have some working pseudocode, regexps and format strings can help you achieve it with less pain than plain character iteration (in my previous example of a tree of words, for instance, regexps can tremendously ease the construction of the tree, and Python's powerful string formatting functions can make the writing of the book or pretty methods much easier).
To strip the multiple spaces you could use a simple regex substitution:
import re
cely_text = re.sub(' +', ' ', cely_text)
Then for the punctuation you could run a similar substitution:
cely_text = re.sub(r' +([,.:])', r'\g<1>', cely_text)
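A quick end-to-end check of those two substitutions (a sketch):

import re

cely_text = 'This  is   programing story , for programmers .'
cely_text = re.sub(' +', ' ', cely_text)               # collapse runs of spaces
cely_text = re.sub(r' +([,.:])', r'\g<1>', cely_text)  # drop spaces before , . :
print(cely_text)  # This is programing story, for programmers.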
