Accumulating Characters in Python

So I have this text file, and it goes like this... (just a bit of it):
"The truest love that ever heart
Felt at its kindled core
Did through each vein in quickened start
The tide of being pour
Her coming was my hope each day
Her parting was my pain
The chance that did her steps delay
Was ice in every vein
I dreamed it would be nameless bliss
As I loved loved to be
And to this object did I press
As blind as eagerly
But wide as pathless was the space
That lay our lives between
And dangerous as the foamy race
Of ocean surges green
And haunted as a robber path
Through wilderness or wood
For Might and Right and Woe and Wrath
Between our spirits stood
I dangers dared I hindrance scorned
I omens did defy
Whatever menaced harassed warned
I passed impetuous by
On sped my rainbow fast as light
I flew as in a dream
For glorious rose upon my sight
That child of Shower and Gleam"
Now, I want to calculate the total length of the words without the letter 'e' in each line of text. So the first line should give 4, then 5, then 17, etc.
My current code is
for line in open("textname.txt"):
line_strip = line.strip()
line_strip_split = line_strip.split()
for word in line_strip_split:
if "e" not in word:
word_e = word
print (len(word_e))
My explanation is: strip each line of surrounding whitespace, then split it so it becomes ['Felt','at','its','kindled','core'], etc. We split so that we can look at each word individually when removing the words with 'e'. So we keep the words without 'e', then print the length of the string.
HOWEVER, this prints the length of each word on its own line instead of adding together all the qualifying words in each line, so the answer becomes "4 / 2 / 3".

Try this:
for line in open("textname.txt"):
line_strip = line.strip()
line_strip_split = line_strip.split()
words_with_no_e = []
for word in line_strip_split:
if "e" not in word:
# Adding words without e to a new list
words_with_no_e.append(word)
# ''.join() will returns all the elements of array concatenated
# len() will count the length
print(len(''.join(words_with_no_e)))
It appends all the words without 'e' in each line to a new list, then concatenates them and prints the length of the result.
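For what it's worth, a shorter variant under the same assumption (a file called textname.txt in the working directory) sums the word lengths directly with a generator expression:
# Minimal sketch: per-line total length of the words that contain no 'e'
with open("textname.txt") as poem:
    for line in poem:
        print(sum(len(word) for word in line.split() if "e" not in word))
For the sample text this prints 4, 5, 17, ... on separate lines, matching the expected totals.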

Related

Create a list of alphabetically sorted UNIQUE words and display the first N words in python

I am new to Python, apologies for a simple question. My task is the following:
Create a list of alphabetically sorted unique words and display the first 5 words
I have a text variable, which contains a lot of text.
I did
test = text.split()
sorted(test)
As a result, I receive a list which starts with symbols like $ and numbers.
How do I get to the words and print the first N of them?
I'm assuming by "word", you mean strings that consist of only alphabetical characters. In such a case, you can use .filter to first get rid of the unwanted strings, turn it into a set, sort it and then print your stuff.
text = "$1523-the king of the 521236 mountain rests atop the king mountain's peak $#"
# Keep only the strings made up entirely of letters
words = filter(lambda x: x.isalpha(), text.split(' '))
# The first 5 sorted unique words
sorted(set(words))[:5]
Output-
['atop', 'king', 'mountain', 'of', 'peak']
But the problem with this is that it will still ignore words like mountain's, because of that pesky '. A regex solution might actually be far better in such a case-
For now, we'll be going for this regex - ^[A-Za-z']+$, which means the string must only contain alphabets and ', you may add more to this regex according to what you deem as "words". Read more on regexes here.
We'll be using re.match instead of .isalpha this time.
import re

WORD_PATTERN = re.compile(r"^[A-Za-z']+$")
text = "$1523-the king of the 521236 mountain rests atop the king mountain's peak $#"
# Keep only the strings that match the word pattern
words = filter(lambda x: bool(WORD_PATTERN.match(x)), text.split(' '))
# The first 5 sorted unique words
sorted(set(words))[:5]
Output-
['atop', 'king', 'mountain', "mountain's", 'of']
Keep in mind, however, that this gets tricky when you have a string like hi! What's your name?. Here hi! and name? are really words, except they are not fully alphabetic. The trick is to split the text in such a way that you get hi instead of hi! and name instead of name? in the first place.
Unfortunately, a true word split is far outside the scope of this question. I suggest taking a look at this question.
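One common workaround (not from the answer above, just a standard re idiom) is to pull the words out with re.findall instead of splitting on whitespace, so trailing punctuation never becomes part of the token:
import re

text = "hi! What's your name?"
# Extract runs of letters/apostrophes directly, so '!' and '?' are never attached to the words
words = re.findall(r"[A-Za-z']+", text)
print(sorted(set(words)))   # ["What's", 'hi', 'name', 'your']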
I am a newbie here, apologies for mistakes. Thank you.
test = '''The coronavirus outbreak has hit hard the cattle farmers in Pabna and Sirajganj as they are now getting hardly any customer for the animals they prepared for the last year targeting the Eid-ul-Azha this year.
Normally, cattle traders flock in large numbers to the belt -- one of the biggest cattle producing areas of the country -- one month ahead of the festival, when Muslims slaughter animals as part of their efforts to honour Prophet Ibrahim's spirit of sacrifice.
But the scene is different this year.'''
test = test.lower().split()
test2 = sorted([j for j in test if j.isalpha()])
print(test2[:5])
You can slice the sorted list up to position 5:
sorted(test)[:5]
or if looking only for words
sorted([i for i in test if i.isalpha()])[:5]
or by regex (this needs import re)
sorted([i for i in test if re.search(r"[a-zA-Z]", i)])[:5]
By slicing the list you get all elements up to a specific index, in this case 5.

Text Indexing and Slicing

I am supposed to transform each sentence such that we only keep the words between the third and the third-last word (inclusive) and skip every second word on the way.
Text in jane_eyre_sentences.txt:
My feet they are sore and my limbs they are weary
Long is the way and the mountains are wild
Soon will the twilight close moonless and dreary
Over the path of the poor orphan child
My code is shown below:
for line in open("jane_eyre_sentences.txt"):
line_strip = line.rstrip()
words = line_strip.split()
if len(words)%2 == 0:
print(" ".join(words[2:-4:2]), ""+ "".join(words[-3]))
else:
print(" ".join(words[2:-3:2]),""+ "".join(words[-3]))
My Output:
they sore my they
the and mountains
the moonless
path poor
Expected Output:
they sore my they
the and mountains
the close
path the
You are appending the wrong word for lines with an even number of words. You must change this line
print(" ".join(words[2:-4:2]), ""+ "".join(words[-3]))
to
print(" ".join(words[2:-4:2]), ""+ "".join(words[-4]))
You can also get rid of the unnecessary empty string and the second join as it is a single word anyway:
print(" ".join(words[2:-4:2]), words[-4])

How to return to original formatting

I have broken the lines of a text file down into individual words to check whether they are in a dictionary. I now want to return/print the words back in their original lines.
I have tried editing the positions in my loop, as I know I already have the lines broken down. I thought that maybe I have to use a pop or remove function. I cannot use a swap function.
def replace_mode(text_list, misspelling):
    for line in text_list:
        word = line.split(' ')
        for element in word:
            if element in misspelling.keys():
                print(misspelling[element], end=(' '))
            else:
                print(element, end=(' '))
It is printing in a single line:
"joe and his family went to the zoo the other day the zoo had many animals including an elephant the elephant was being too dramatic though after they walked around joe left the zoo"
I want the processed text to be back in its original format (4 lines):
joe and his family went to the zoo the other day
the zooo had many animals including an elofent
the elaphant was being too dramati though
after they walked around joe left the zo
Add this line right after your last print(element, end=(' ')) statement, at the same level of indentation as the for element in word: loop:
print()
This will print a newline at the end of each of the original lines, right after you've finished processing every word from that line but before you've moved on to the next line.
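Putting the question's function and that one-line fix together looks like this (a sketch of the combined code, not a verbatim quote from either post; misspelling is assumed to be a dict mapping misspelled words to their corrections):
def replace_mode(text_list, misspelling):
    for line in text_list:
        word = line.split(' ')
        for element in word:
            if element in misspelling:   # .keys() is not needed for a membership test
                print(misspelling[element], end=' ')
            else:
                print(element, end=' ')
        print()   # newline after each original line, so the 4-line layout is preserved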

Splitting elements within a list and separate strings, then counting the length

If I have several lines of text, such as
"Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child taking up her elders in that manner.
Be seated somewhere; and until you can speak pleasantly, remain silent."
I mounted into the window- seat: gathering up my feet, I sat cross-legged, like a Turk; and, having drawn the red moreen curtain nearly close, I was shrined in double retirement.
and I want to split the string (the sentences) for each line on the ";" punctuation, I would do:
for line in open("jane_eyre_sentences.txt"):
words = line.strip("\n")
words_split = words.split(";")
However, now I get lists of strings such as:
["Jane, I don't like cavillers or questioners", "besides, there is something truly forbidding in a child taking up her elders in that manner."]
["Be seated somewhere", "and until you can speak pleasantly, remain silent."]
["I mounted into the window- seat: gathering up my feet, I sat cross-legged, like a Turk", "and, having drawn the red moreen curtain nearly close, I was shrined in double retirement."]
So it has now created two separate elements in each list.
How would I actually separate these elements further?
I know I need a 'for' loop because it needs to process all the lines. I thought I would need another 'split' call, but I have tried "\n" as well as ',' and it will not give an answer; Python says "AttributeError: 'list' object has no attribute 'split'". What does this mean?
Once I have separated everything into individual strings, I want to calculate the length of each string, so I would do len(), etc.
You can iterate through the list of created words like this:
for line in open("jane_eyre_sentences.txt"):
words = line.strip("\n")
for sentence_part in words.split(";"):
print(sentence_part) # will print the elements of the list
print(len(sentence_part) # will print the length of the sentence parts
Alternatively, if you just need the length of each of the parts:
for line in open("jane_eyre_sentences.txt"):
words = line.strip("\n")
sentence_part_lengths = [len(sentence_part) for sentence_part in words.split(";")]
Edit: With further information from your second post.
for count, line in enumerate(open("jane_eyre_sentences.txt")):
    words = line.strip("\n")
    if ";" in words:
        wordssplit = words.split(";")
        number_of_words_per_split = [(x, len(x.split())) for x in wordssplit]
        print("Line {}: ".format(count), number_of_words_per_split)

How to use text.split() and retain blank (empty) lines

New to Python, need some help with my program. I have code which takes in an unformatted text document, does some formatting (sets the page width and the margins), and outputs a new text document. My entire code works fine except for this function, which produces the final output.
Here is the segment of the problem code:
def process(document, pagewidth, margins, formats):
    res = []
    onlypw = []
    pwmarg = []
    count = 0
    marg = 0
    for segment in margins:
        for i in range(count, segment[0]):
            res.append(document[i])
        text = ''
        foundmargin = -1
        for i in range(segment[0], segment[1]+1):
            marg = segment[2]
            text = text + '\n' + document[i].strip(' ')
        words = text.split()
Note: segment[0] means the beginning of the document and segment[1] means the end of the document, if you are wondering about the range. My problem is that when I copy text to words (in words = text.split()) it does not retain my blank lines. The output I should be getting is:
This is my substitute for pistol and ball. With a
philosophical flourish Cato throws himself upon his sword; I
quietly take to the ship. There is nothing surprising in
this. If they but knew it, almost all men in their degree,
some time or other, cherish very nearly the same feelings
towards the ocean with me.
There now is your insular city of the Manhattoes, belted
round by wharves as Indian isles by coral reefs--commerce
surrounds it with her surf.
And what my current output looks like:
This is my substitute for pistol and ball. With a
philosophical flourish Cato throws himself upon his sword; I
quietly take to the ship. There is nothing surprising in
this. If they but knew it, almost all men in their degree,
some time or other, cherish very nearly the same feelings
towards the ocean with me. There now is your insular city of
the Manhattoes, belted round by wharves as Indian isles by
coral reefs--commerce surrounds it with her surf.
I know the problem happens when I copy text to words, since it doesn't keep the blank lines. How can I make sure it copies the blank lines plus the words?
Please let me know if I should add more code or more detail!
First split on at least 2 consecutive newlines, then split into words:
import re
paragraphs = re.split('\n\n+', text)
words = [paragraph.split() for paragraph in paragraphs]
You now have a list of lists, one per paragraph; process these per paragraph, after which you can rejoin the whole thing into new text with double newlines inserted back in.
I've used re.split() to support paragraphs being delimited by more than 2 newlines; you could use a simple text.split('\n\n') if there are ever only going to be exactly 2 newlines between paragraphs.
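A minimal sketch of that round trip, using textwrap.fill as a stand-in for the question's own wrapping logic (the function name reflow and the width of 60 are just placeholders here):
import re
import textwrap

def reflow(text, pagewidth=60):
    # Split into paragraphs on runs of 2+ newlines
    paragraphs = re.split('\n\n+', text)
    # Wrap each paragraph independently, then restore the blank lines between them
    wrapped = [textwrap.fill(' '.join(p.split()), width=pagewidth) for p in paragraphs]
    return '\n\n'.join(wrapped)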
Use a regexp to find the words and the blank lines rather than using split:
import re

m = re.compile(r'(\S+|\n\n)')
words = m.findall(text)
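For illustration, a small sketch of how that token list keeps the paragraph breaks as explicit '\n\n' entries (the sample text here is made up), so they can be re-inserted when rebuilding the output:
import re

text = "first paragraph here\n\nsecond paragraph here"
tokens = re.compile(r'(\S+|\n\n)').findall(text)
print(tokens)   # ['first', 'paragraph', 'here', '\n\n', 'second', 'paragraph', 'here']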
