I have a large text in which three people are talking.
I read that text into a string variable in Python.
The text looks like this:
JOHN: hello
MIKE: hello john
SARAH: hello guys
Imagine a long conversation between three people. I want to split the text into lists like
john = []
mike = []
sarah = []
and I want the list john to contain every sentence john said.
Can anyone help me with the code I need?
See if this is enough to get you started.
john, mike, sarah = [], [], []

# go through the text line by line and route each line by its speaker prefix
for line in text.splitlines():
    if line.startswith('JOHN'):
        john.append(line)
    elif line.startswith('MIKE'):
        mike.append(line)
    elif line.startswith('SARAH'):
        sarah.append(line)
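If you also want to drop the NAME: prefix so each list holds only what was said, a small variation on the loop above might work (assuming every line follows the NAME: text format shown in the question):

for line in text.splitlines():
    speaker, _, said = line.partition(':')
    if speaker == 'JOHN':
        john.append(said.strip())
    elif speaker == 'MIKE':
        mike.append(said.strip())
    elif speaker == 'SARAH':
        sarah.append(said.strip())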
I have the following code:
output = requests.get(url=url, auth=oauth, headers=headers, data=payload)
output_data = output.content
type(output_data)
<class 'bytes'>
output_data
Squeezed Text (3632 Lines)
When looking at the squeezed text, I have some values that look like this:
Steve likes to walk his dog. Steve says to John "I like \n Pineapple, oranges, \n and pizza.\n" and then he went to bed \n.
John likes his beer cold.\n
Sally likes her teeth brushed with a bottle of jack.\n
How can I remove the \n characters, but ONLY if it is contained within double quotes, so that my results look like this:
Steve likes to walk his dog. Steve says to John "I like Pineapple, oranges, and pizza." and then he went to bed \n.
John likes his beer cold.\n
Sally likes her teeth brushed with a bottle of jack.\n
I know how to remove \n characters, but I am not sure how to do this if I only want to remove the values if they are contained within double quotes.
Here is what I have tried:
I found this, and used this code:
my_text = re.sub(r'"\\n"','',my_text)
But it doesn't seem to be working.
I might be complicating it a bit, but something like this might work:
parts = content.split("\"")
for i, part in enumerate(parts):
    if i % 2:  # the odd-numbered chunks are the ones inside double quotes
        parts[i] = part.replace("\n", "")
content = "\"".join(parts)
Figured it out.
Steps:
Convert bytes to String
Create the pattern for Regex
Use regex to format the values.
Step 1:
my_text = my_text.decode("utf-8")
Step 2:
pattern = re.compile(r'".*?"',re.DOTALL)
Step 3:
my_text = pattern.sub(lambda x:x.group().replace('\n',''),my_text)
This solves my problem.
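Putting the three steps together as one runnable sketch (the sample bytes here are made up; in the real case my_text starts as output.content from the request above):

import re

# Step 1: bytes -> str
my_text = b'He said "I like \n pizza.\n" and left.\n'.decode("utf-8")

# Step 2: non-greedy pattern for anything between double quotes;
# DOTALL lets the match span newlines
pattern = re.compile(r'".*?"', re.DOTALL)

# Step 3: strip newlines only inside each quoted match
my_text = pattern.sub(lambda m: m.group().replace('\n', ''), my_text)
print(my_text)  # the newlines inside the quotes are gone; the one outside remains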
I am still learning coding, so please forgive me if this is basic. I have a set of code that asks the user how many students they want to input (X) and then asks for basic information about each of those X students (first and last name, age). However, when I print it out, it all comes out as one long line. I have seen a few ways to print each character on a new line, but not each new set of data. I would like to separate it by student. What I currently have is:
Name: Age:
Tiger woods 40, karen woods 33, charlie brown 44
What I would like is:
Name: Age:
Tiger Woods 40
Karen Woods 33
Charlie Brown 44
This is the code I am currently working with:
list.append(firstname)
list.append(lastname)
list.append(age)
print("name: age:")
print(", ".join(map(str, list)))
The simplest way is to use \n in a string to start a new line. For example, if you have this print statement:
print("Hello world!\nHow are you today?")
It would display as:
Hello world!
How are you today?
Thus, you should simply be able to replace this:
print(", ".join(map(str, list)))
with this:
print("\n".join(map(str, list)))
or something of that nature.
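One thing to watch out for: with the flat list from the question (first name, last name, and age appended one after another), joining on "\n" puts every field on its own line. To get one student per line, you could group the items in threes first; a small sketch with the question's data (my_list stands for the list the question calls list, renamed to avoid shadowing the built-in):

my_list = ["Tiger", "Woods", 40, "Karen", "Woods", 33, "Charlie", "Brown", 44]
students = [my_list[i:i + 3] for i in range(0, len(my_list), 3)]
print("Name: Age:")
for first, last, age in students:
    print(first, last, age)
# Tiger Woods 40
# Karen Woods 33
# Charlie Brown 44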
Try using a list of tuples instead of a list of separate values:
list.append((firstname, lastname, age))
You can then access the fields of each entry through a sub-index:
your_list[0][0]  # first name of the first entry
your_list[0][2]  # age of the first entry
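A minimal sketch of how that could look end to end (students is a placeholder name for the list; the sample values come from the question):

students = []
students.append(("Tiger", "Woods", 40))
students.append(("Karen", "Woods", 33))
students.append(("Charlie", "Brown", 44))

print("Name: Age:")
for firstname, lastname, age in students:
    print(firstname, lastname, age)
# Tiger Woods 40
# Karen Woods 33
# Charlie Brown 44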
I have a data frame (let's call it 'littletext') that has a column with sentences within each row. I also have another table (let's call it 'littledict') that I would like to use as a reference by which to find and replace words and/or phrases within each row of 'littletext'.
Here are my two data frames. I am hard-coding them in this example but will load them as csv files in "real life":
import pandas as pd

raw_text = {
    "text": ["Hello, world!", "Hello, how are you?", "This world is funny!"],
    "col2": [0, 1, 1]}
littletext = pd.DataFrame(raw_text, index=pd.Index(['A', 'B', 'C'], name='letter'),
                          columns=pd.Index(['text', 'col2'], name='attributes'))

raw_dict = {
    "key": ["Hello", "This", "funny"],
    "replacewith": ["Hi", "That", "hilarious"]}
littledict = pd.DataFrame(raw_dict, index=pd.Index(['a', 'b', 'c'], name='letter'),
                          columns=pd.Index(['key', 'replacewith'], name='attributes'))
print(littletext) # ignore 'col2' since it is irrelevant in this discussion
text col2
A Hello, world! 0
B Hello, how are you? 1
C This world is funny! 1
print(littledict)
key replacewith
a Hello Hi
b This That
c funny hilarious
I would like 'littletext' to be modified as per below, wherein Python looks at more than one word within each sentence of my 'littletext' table (dataframe) and replaces multiple words, acting on all rows. The final product should be that 'Hello' has been replaced by 'Hi' in lines A and B, and that 'This' was replaced with 'That' and 'funny' with 'hilarious', both within line C:
text col2
A Hi, world! 0
B Hi, how are you? 1
C That world is hilarious! 1
Here are two attempts that I have tried, neither of which works. They do not generate errors; they just do not modify 'littletext' as I described above. Attempt #1 'technically' works, but it is inefficient and therefore useless for large-scale jobs, because I would have to anticipate and hard-code every full sentence I want to replace. Attempt #2 simply does not change anything at all.
My two Attempts that do NOT work are:
Attempt #1: this is not helpful because to use it, I would have to program entire sentences to replace other sentences, which is pointless:
littletext['text'].replace({'Hello, world!': 'Hi there, world.', 'This world is funny!': 'That world is hilarious'})
Attempt #1 returns:
Out[125]:
0 Hi there, world.
1 Hello, how are you?
2 That world is hilarious
Name: text, dtype: object
Attempt #2: this attempt is closer to the mark but returns no changes whatsoever:
for key in littledict:
    littletext = littletext.replace(key, littledict[key])
Attempt #2 returns:
text col2
0 Hello, world! 0
1 Hello, how are you? 1
2 This world is funny! 1
I have scoured the internet, including YouTube, Udemy, etc., but to no avail. Numerous 'tutorial' sites only cover individual text examples, not entire columns of sentences like the one I am showing, and are therefore useless for scaling up to industry-size projects. I am hoping someone can graciously shed light on this, since this kind of text manipulation is commonplace in many industry settings.
My humble thanks and appreciation to anyone who can help!!
Convert littledict to a dict so you can build a regex from its keys, then use that regex with .str.replace() to substitute the values you need, as follows:
s = dict(zip(littledict.key, littledict.replacewith))
littletext['text'].str.replace('|'.join(s), lambda x: s[x.group()], regex=True)
0 Hi, world!
1 Hi, how are you?
2 That world is hilarious!
Name: text, dtype: object
You were pretty close with the first attempt. You can create the dictionary from littledict with 'key' as the index and use regex=True.
print (littletext['text']
       .replace(littledict.set_index('key')['replacewith'].to_dict(),
                regex=True)
       )
0 Hi, world!
1 Hi, how are you?
2 That world is hilarious!
Name: text, dtype: object
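To keep the changes, assign the result back to the column (a small follow-up sketch using the same names as above):

mapping = littledict.set_index('key')['replacewith'].to_dict()
littletext['text'] = littletext['text'].replace(mapping, regex=True)
print(littletext)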
I need help figuring out how to turn a simple user input like
a = input('Enternumber: ')
and, if the user were to input, say,
hello bob Jeff Lexi Ava
How am I supposed to have the computer turn that into a list like,
hello
bob
Jeff
Lexi
Ava
If someone has the code, could they please explain what they are doing? (This is Python.)
Use the split method.
my_string = 'hello bob Jeff Lexi Ava'
print(my_string.split()) # ['hello', 'bob', 'Jeff', 'Lexi', 'Ava']
To print each on a line:
for word in my_string.split():
    print(word)
I have a text that goes like this:
text = "All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood."
How do I write a function hedging(text) that processes my text and produces a new version that inserts the word "like" after every third word of the text?
The outcome should be like that:
text2 = "All human beings like are born free like and equal in like..."
Thank you!
Instead of giving you something like
solution=' like '.join(map(' '.join, zip(*[iter(text.split())]*3)))
I'm posting general advice on how to approach the problem. The "algorithm" is not particularly "pythonic", but it is hopefully easy to understand:
words = split text into words
number of words processed = 0
for each word in words:
    output word
    number of words processed += 1
    if number of words processed is divisible by 3 then
        output "like"
Let us know if you have questions.
You could go with something like this:
' '.join([n + ' like' if i % 3 == 2 else n for i, n in enumerate(text.split())])
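Wrapped in the hedging(text) function the question asks for (a minimal sketch built on the one-liner above):

def hedging(text):
    # insert "like" after every third word
    return ' '.join(n + ' like' if i % 3 == 2 else n
                    for i, n in enumerate(text.split()))

text2 = hedging(text)
# "All human beings like are born free like and equal in like ..."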