I want to print the first letter of every word while the line breaks stay the same, but when I turn the list into a string it's all written on one line, like "I w t w t f l o e i w l s s". I want the output to look like this: "I w t \n w t f l \n o e i \n w l \n s s".
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
list_of_words = r.split()
m = [x[0] for x in list_of_words]
string = ' '.join([str(item) for item in m])
print(string)
What you are doing is splitting the whole text in a single go, so you lose the information about which word belongs to which line. You need to build a list of lists to preserve the line structure.
Calling split() with no argument means splitting on any whitespace, i.e. both ' ' and '\n'.
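To see the difference concretely, here is a minimal sketch (the sample text is just an illustration):

```python
text = "I want to\nwrite the first letter"

# No argument: split() breaks on ANY whitespace, including '\n',
# so the line structure is lost.
print(text.split())
# ['I', 'want', 'to', 'write', 'the', 'first', 'letter']

# Splitting on '\n' first keeps one sub-list per line.
print([line.split() for line in text.split('\n')])
# [['I', 'want', 'to'], ['write', 'the', 'first', 'letter']]
```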
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
list_of_words = [i.split() for i in r.split('\n')]
m = [[y[0] for y in x] for x in list_of_words]
string = '\n'.join([' '.join(x) for x in m])
print(string)
I w t
w t f l
o e i
w l
s s
Via regexp
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
import re
string = re.sub(r"(.)\S+(\s)", r"\1\2", r + " ")[:-1]
print(string)
Output:
I t
w t f l
o e i
w l
s s
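A short note on how the substitution works: (.) captures the first character of a word, \S+ swallows the rest of it (it requires at least one more character, so one-letter words like "I" are left alone), and (\s) captures the whitespace that follows, which is how the '\n' characters survive. The trailing " " is appended so the last word also matches, and [:-1] removes it again. A one-line sketch:

```python
import re

line = "of every item"
# keep the first char and the following whitespace, drop the rest of each word
print(re.sub(r"(.)\S+(\s)", r"\1\2", line + " ")[:-1])
# o e i
```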
What you're doing is taking the first letter of each word in the list and joining them, without keeping track of the \n characters in the string.
You could do this instead:
list_of_words = r.split('\n')
m = [[x[0] for x in y.split()] for y in list_of_words]
for i in m:
    string = ' '.join(i)
    print(string)
Output
I w t
w t f l
o e i
w l
s s
Here is a solution using a while loop:
r = '''I want to
write the first letter
of every item
while linebreak
stay same'''
total_lines = len(r.splitlines())
line_no = 0
while line_no < total_lines:
    words_line = r.splitlines()[line_no]
    list_of_words = words_line.split()
    m = [x[0] for x in list_of_words]
    print(' '.join([str(item) for item in m]))
    line_no = line_no + 1
Since many valid methods were already provided, here's a neat comprehension-based way of doing the same task without str.split(), which creates intermediate lists in memory (not that it represents any problem in this case).
This method takes advantage of str.isspace() to do the whole job in one line:
string = "".join([string[i] for i in range(len(string)) if string[i].isspace() or string[i-1].isspace() or i == 0])
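To make the one-liner less cryptic: it keeps the very first character, every whitespace character, and every character that directly follows whitespace, so both ' ' and '\n' survive intact. A minimal demonstration (the sample text is illustrative):

```python
string = "I want to\nwrite the first letter"

# keep char 0, all whitespace, and any char preceded by whitespace
result = "".join([string[i] for i in range(len(string))
                  if string[i].isspace() or string[i-1].isspace() or i == 0])
print(result)
# I w t
# w t f l
```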
Related
I'm trying to get this code:
line = input("String: ")
letters = ""
words = line.split()
for word in words:
    letters = letters + word[0]
print(" ".join(letters))
String = I have a dog
Result: I h a d
To print the result like this:
I
h
a
d
Change your print statement to this one:
print("\n".join(letters))
\n is the newline character, which is why each letter ends up on its own line.
Just out of curiosity, you can reduce the code to this:
line = input("String: ")
letters = map(lambda word: word[0], line.split())
print("\n".join(letters))
On top of #Capies' answer, you can simplify the code to:
>>> x = "what is foo bar"
>>> "\n".join([i[0] for i in x.split()])
'w\ni\nf\nb'
Which prints as
>>> print('w\ni\nf\nb')
w
i
f
b
Explanation:
[i[0] for i in x.split()]
i[0] is the first letter of each word in x
for i in x.split() iterates over the words in the string x
You then join the letters with the newline character \n using "\n".join
Using a list comprehension:
[print(word[0]) for word in input().split()]
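A side note: a list comprehension used only for its side effect also builds a throwaway list of None values. Two arguably cleaner equivalents (using a literal string in place of input() for this sketch):

```python
line = "I have a dog"  # stands in for input("String: ")

# plain loop
for word in line.split():
    print(word[0])

# or unpack the letters into a single print call
print(*[word[0] for word in line.split()], sep="\n")
```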
This is the given input: John plays chess and l u d o. I want the output to be in the format given below:
John plays chess and ludo.
I have tried a regular expression for removing spaces, but it doesn't work for me.
import re
sentence='John plays chess and l u d o'
sentence = re.sub(r"\s+", "", sentence, flags=re.UNICODE)
print(sentence)
I expected the output John plays chess and ludo.
But the output I got is Johnplayschessandludo.
This should work! In essence, the solution extracts the single characters out of the sentence, joins them into a word, and appends that word back to the remaining sentence.
s = 'John plays chess and l u d o'
chars = []
idx = 0
#Get the word which is divided into single characters
while idx < len(s)-1:
    #This will get the single characters around single spaces
    if s[idx-1] == ' ' and s[idx].isalpha() and s[idx+1] == ' ':
        chars.append(s[idx])
    idx += 1
#This will get the single character if it is present as the last item
if s[len(s)-2] == ' ' and s[len(s)-1].isalpha():
    chars.append(s[len(s)-1])
#Create the word out of single character
join_word = ''.join(chars)
#Get the other words
old_words = [item for item in s.split() if len(item) > 1]
#Form the final string
res = ' '.join(old_words + [join_word])
print(res)
The output will then look like
John plays chess and ludo
The code above won't maintain the order of the words while solving the problem.
For example, try entering the sentence "John plays c h e s s and ludo".
Try this instead if a word split into whitespace-separated letters can appear at any position in the text:
sentence = "John plays c h e s s and ludo"
sentence_list = sentence.split()
index = [index for index, item in enumerate(sentence_list) if len(item) == 1]
join_word = "".join([item for item in sentence_list if len(item) == 1])
if index != []:
    list(map(lambda x: sentence_list.pop(index[0]), index[:-1]))
    sentence_list[index[0]] = join_word
    sentence = " ".join(sentence_list)
else:
    sentence  # nothing to merge; the sentence stays unchanged
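For completeness, here is an alternative sketch that preserves word order by merging each run of consecutive single-letter tokens into one word (the function name is my own, not from the answers above):

```python
def merge_single_letters(sentence):
    out, run = [], []
    for tok in sentence.split():
        if len(tok) == 1 and tok.isalpha():
            run.append(tok)              # part of a spaced-out word
        else:
            if run:                      # flush the finished run
                out.append(''.join(run))
                run = []
            out.append(tok)
    if run:                              # flush a trailing run
        out.append(''.join(run))
    return ' '.join(out)

print(merge_single_letters("John plays c h e s s and ludo"))
# John plays chess and ludo
print(merge_single_letters("John plays chess and l u d o"))
# John plays chess and ludo
```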
Good afternoon guys,
Today I have been asked to write the following function:
def compareurl(url1,url2,enc,n)
This function compares two urls and return a list containing:
[word,occ_in_url1,occ_in_u2]
where:
word ---> a word of length n
occ_in_url1 ---> number of times word occurs in url1
occ_in_url2 ---> number of times word occurs in url2
So I started writing the function; this is what I have written so far:
def compare_url(url1,url2,enc,n):
    from urllib.request import urlopen
    with urlopen('url1') as f1:
        readpage1 = f1.read()
        decodepage1 = readpage1.decode('enc')
    with urlopen('url2') as f2:
        readpage2 = f2.read()
        decodepage2 = readpage2.decode('enc')
    all_lower1 = decodepage1.lower()
    all_lower2 = decodepage2.lower()
    import string
    all_lower1nopunctuation = "".join(l for l in all_lower1 if l not in string.punctuation)
    all_lower2nopunctuation = "".join(l for l in all_lower2 if l not in string.punctuation)
    for word1 in all_lower1nopunctuation:
        if len(word1) == k:
            all_lower1nopunctuation.count(word1)
    for word2 in all_lower2nopunctuation:
        if len(word2) == k:
            all_lower2opunctuation.count(word2)
    return(word1,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
    return(word2,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
But this code doesn't work the way I thought; actually, it doesn't work at all.
I would also like to:
sort the returning list decreasingly (from the word which return the most times)
if 2 words occurs the same number of times, they must be returned in
alphabetical order
There are some typos in your code (watch out for those in the future), but there are also some Python problems (or things that can be improved).
First of all, your imports should come at the top of the file:
from urllib.request import urlopen
import string
You should call urlopen with a string, and that's what you are doing, but this string is 'url1' and not 'http://...'. You don't use variables inside quotes:
with urlopen(url1) as f1: #remove quotes
readpage1 = f1.read()
decodepage1 = readpage1.decode(enc) #remove quotes
with urlopen(url2) as f2: #remove quotes
readpage2 = f2.read()
decodepage2 = readpage2.decode(enc) #remove quotes
You need to improve your all_lower1nopunctuation initialization. You are turning stackoverflow.com into stackoverflowcom, when it should actually become stackoverflow com:
#all_lower1nopunctuation = "".join(l for l in all_lower1 if l not in string.punctuation)
#the if statement should be after 'l' and before 'for'
#you should include 'else' to replace the punctuation with a space
all_lower1nopunctuation = ''.join(l if l not in string.punctuation
else ' ' for l in all_lower1)
all_lower2nopunctuation = ''.join(l if l not in string.punctuation
else ' ' for l in all_lower2)
Merge both for loops into one. Also, add each found word to a set (a collection of unique elements).
all_lower1nopunctuation.count(word1) returns the number of times word1 appears in all_lower1nopunctuation; it doesn't increment a counter.
for word1 in all_lower1nopunctuation iterates over characters, because all_lower1nopunctuation is a string (not a list). Turn it into a list of words with .split(' ').
.replace('\n', '') removes all line breaks, which would otherwise be counted as parts of words.
#for word1 in all_lower1nopunctuation:
#    if len(word1) == k: #also, this should be == n, not == k
#        all_lower1nopunctuation.count(word1)
#for word2 in all_lower2nopunctuation:
#    if len(word2) == k:
#        all_lower2opunctuation.count(word2)
word_set = set([])
for word in all_lower1nopunctuation.replace('\n', '').split(' '):
    if len(word) == n and word in all_lower2nopunctuation:
        word_set.add(word) #set uses .add() instead of .append()
Now that you have a set of words that appear in both urls, you need to store how many times each word occurs in each url.
The following code builds the list of tuples you asked for:
count_list = []
for final_word in word_set:
    count_list.append((final_word,
                       all_lower1nopunctuation.count(final_word),
                       all_lower2nopunctuation.count(final_word)))
Returning means the function is finished and the interpreter continues wherever it was before the function was called, so whatever comes after a return statement is never reached (as RemcoGerlich said).
Your code will always stop at the first return, so you need to merge both returns into one:
#return(word1,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
#return(word2,all_lower1nopunctuation.count(word1),all_lower2opunctuation.count(word2))
return(count_list) # which contains a list of tuples with all words and its counts
TL;DR
from urllib.request import urlopen
import string

def compare_url(url1, url2, enc, n):
    with urlopen(url1) as f1:
        readpage1 = f1.read()
        decodepage1 = readpage1.decode(enc)
    with urlopen(url2) as f2:
        readpage2 = f2.read()
        decodepage2 = readpage2.decode(enc)
    all_lower1 = decodepage1.lower()
    all_lower2 = decodepage2.lower()
    all_lower1nopunctuation = ''.join(l if l not in string.punctuation
                                      else ' ' for l in all_lower1)
    all_lower2nopunctuation = ''.join(l if l not in string.punctuation
                                      else ' ' for l in all_lower2)
    word_set = set([])
    for word in all_lower1nopunctuation.replace('\n', '').split(' '):
        if len(word) == n and word in all_lower2nopunctuation:
            word_set.add(word)
    count_list = []
    for final_word in word_set:
        count_list.append((final_word,
                           all_lower1nopunctuation.count(final_word),
                           all_lower2nopunctuation.count(final_word)))
    return count_list

url1 = 'https://www.tutorialspoint.com/python/list_count.htm'
url2 = 'https://stackoverflow.com/a/128577/7067541'
for word_count in compare_url(url1, url2, 'utf-8', 5):
    print(word_count)
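One thing the question asked for that the code above does not do is sorting: the list should be ordered by decreasing occurrence count, with alphabetical order breaking ties. Assuming "occurs the most" means the combined count across both pages, one way is:

```python
# example data in the (word, occ_in_url1, occ_in_url2) shape built above
count_list = [('hello', 2, 3), ('alpha', 4, 2), ('bravo', 4, 2)]

# sort by descending total count, then alphabetically to break ties
count_list.sort(key=lambda t: (-(t[1] + t[2]), t[0]))
print(count_list)
# [('alpha', 4, 2), ('bravo', 4, 2), ('hello', 2, 3)]
```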
So I have to create a program that reads every third letter of a message and prints the letters separated by spaces. My code creates the spaces, but it also leaves a space after the last letter. This is my code:
msg = input("Message? ")
length = len(msg)
for i in range (0, length, 3):
    x = msg[i]
    print(x, end=" ")
My output was:
Message?
I enter:
cxohawalkldflghemwnsegfaeap
I get back
c h a l l e n g e
when the output isn't meant to have a trailing " " after the e.
I have read that adding print(" ".join(x)) should give me the output I need, but when I put it in it just gives me an error. Please and thank you.
In Python, strings belong to a kind of data structure called sequences. Sequences support slicing, which is a simple and elegant way of expressing things like "from the nth", "up to the nth" and "every nth". The syntax is sequence[from_index:to_index:stride]. One does not even need a for loop for this.
We can get every 3rd character easily by omitting from_index and to_index and using a stride of 3:
>>> msg = input("Message? ")
cxohawalkldflghemwnsegfaeap
>>> every_3th = msg[::3]
>>> every_3th
'challenge'
Now we just need to insert spaces after each letter. separator.join(iterable) joins the elements of iterable in order with the given separator in between. A string is an iterable whose elements are its individual characters.
Thus we can do:
>>> answer = ' '.join(every_3th)
>>> answer
'c h a l l e n g e'
For the final code we can omit intermediate variables and have still a quite readable two-liner:
>>> msg = input('Message? ')
>>> print(' '.join(msg[::3]))
Try
>>> print(" ".join([msg[i] for i in range(0, len(msg), 3)]))
c h a l l e n g e
I have a document with some lines that have spaced-out letters which I want to remove.
The problem is that the strings don't all follow the same rules: some have just one space between the letters as well as between the words, and some have two or three spaces between the words.
Examples:
"H e l l o g u y s"
"H e l l o  g u y s"
"H e l l o   g u y s"
all of the above should be converted to --> "Hello guys"
"T h i s i s P a g e 1" --> "This is Page 1"
I wrote a script that removes every second space, unless the next character is numeric or a capital letter. It works almost OK, since the text being processed is German and words almost always begin with capital letters... almost.
Anyway, I'm not satisfied with it, so I'm asking whether there is a neat function for my problem.
text = text.strip() # remove spaces from start and end
out = text
if text.count(' ') >= (len(text)/2)-1:
    out = ''
    idx = 0
    for c in text:
        if c != ' ' or re.match('[0-9]|\s|[A-Z0-9ÄÜÖ§€]', text[idx+1]) or (idx > 0 and text[idx-1] == '-'):
            out += c
        idx += 1
    text = out
Not the most original answer, but I've noticed that your problem almost matches this one.
I have taken unutbu's answer and slightly modified it to solve your problem with enchant. If you have another dictionary, you can use that instead.
import enchant
d = enchant.Dict("en_US") # or de_DE
def find_words(instring, prefix = ''):
    if not instring:
        return []
    if (not prefix) and (d.check(instring)):
        return [instring]
    prefix, suffix = prefix + instring[0], instring[1:]
    solutions = []
    # Case 1: prefix in solution
    if d.check(prefix):
        try:
            solutions.append([prefix] + find_words(suffix, ''))
        except ValueError:
            pass
    # Case 2: prefix not in solution
    try:
        solutions.append(find_words(suffix, prefix))
    except ValueError:
        pass
    if solutions:
        return sorted(solutions,
                      key = lambda solution: [len(word) for word in solution],
                      reverse = True)[0]
    else:
        raise ValueError('no solution')
inp = "H e l l o g u y s T h i s i s P a g e 1"
newInp = inp.replace(" ", "")
print(find_words(newInp))
This outputs:
['Hello', 'guys', 'This', 'is', 'Page', '1']
The linked page certainly is a good starting point for some pragmatic solutions. However, I think a proper solution should use n-grams. This solution could be modified to make use of multiple whitespaces as well, since they might indicate the presence of a word boundary.
Edit:
You can also have a look at Generic Human's solution using a dictionary with relative word frequencies.
You can check whether a word is an English word and then split the words. You could use a dedicated spellchecking library like PyEnchant.
For example:
import enchant
d = enchant.Dict("en_US")
d.check("Hello")
This is a good starting point, but there is still the problem of "Expertsexchange".
Converting "H e l l o g u y s" (single spaces everywhere) might be very hard, or outside the scope of this site. But if you want to convert strings like "H e l l o  g u y s", where the number of spaces between words differs from the spaces between letters, you can use the following code:
>>> import re
>>> s1="H e l l o  g u y s"
>>> s2="H e l l o   g u y s"
>>> ' '.join([''.join(i.split()) for i in re.split(r' {2,}',s2)])
'Hello guys'
>>> ' '.join([''.join(i.split()) for i in re.split(r' {2,}',s1)])
'Hello guys'
This code uses the regular expression ' {2,}' to split the string wherever two or more spaces occur (i.e. at the word boundaries), then removes the remaining single spaces inside each chunk.
This is an algorithm that could do it. Not battle-tested, but just an idea.
d = ['this', 'is', 'page', 'hello', 'guys']
m = ["H e l l o g u y s", "T h i s i s P a g e 1", "H e l l o g u y s", "H e l l o g u y s"]
j = ''.join(m[0].split()).lower()
temp = []
fix = []
for i in j:
    temp.append(i)
    s = ''.join(temp)
    if s in d:
        fix.append(s)
        del temp[:]
    if i.isdigit():
        fix.append(i)
print(' '.join(fix))
This prints hello guys for m[0]; running j over the other supplied inputs likewise yields this is page 1.
Extending
You can use this dictionary which has words on each line, convert it to a list and play around from there.
Issues
As Martjin suggested, what would you do when you encounter "E x p e r t s e x c h a n g e". Well, in such scenarios, using n-gram probabilities would be an appropriate solution. For this you would have to look into NLP (Natural Language Processing) but I assume you don't want to go that far.
You cannot do this in general: a text where valid word boundaries are represented the same way as the spaces you want to remove is, in effect, the same as a text with no spaces at all.
So your problem "reduces" to re-inserting word-boundary spaces into a text with no spaces, which is just as impossible: even with a dictionary containing every valid word (which you do not have), you can either go for a greedy match and insert too few spaces, or go for a non-greedy match and insert too many.
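A toy illustration of that greedy pitfall, with tiny hypothetical dictionaries (the function and word lists are mine, purely for demonstration):

```python
def greedy_segment(text, words):
    # Split text into dictionary words, always taking the longest match.
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in words:
                out.append(text[i:j])
                i = j
                break
        else:
            return None                     # stuck: no word starts here
    return out

# Greedy happens to succeed here...
print(greedy_segment('abc', {'ab', 'c'}))        # ['ab', 'c']
# ...but greedily taking 'ab' can also paint you into a corner,
# even though 'a' + 'bc' would be a valid segmentation:
print(greedy_segment('abc', {'ab', 'a', 'bc'}))  # None
```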