Merge 2 elements together if the element contains '\n' - python

I have very messy data. I am noticing a pattern: wherever an element ends with '\n', it needs to be merged with the single element before it.
Sample list:
ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
ls
What I tried:
print([s for s in ls if "\n" in s[-1]])
>>> ['world \n', 'is john \n'] # gave elements that ends with \n
How do I merge each element that ends with '\n' with the one element before it? I am looking for output like this:
['hello world \n', 'my name is john \n', 'How are you?','I am \n doing well']

If you are reducing a list, one readable approach is the reduce function.
functools.reduce(func, iter, [initial_value]) cumulatively performs an operation on all of the iterable's elements and therefore can't be applied to infinite iterables.
First of all, you need some kind of structure to accumulate results. I use a tuple with two elements: a buffer holding the strings concatenated until a "\n" is found, and the list of results. See the initial struct (1).
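from functools import reduce

ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']

def combine(x, y):
    joined = x[0] + " " + y if x[0] else y   # no leading space when the buffer is empty
    if y.endswith('\n'):
        return ("", x[1] + [joined])   #<-- buffer to list
    else:
        return (joined, x[1])          #<-- on buffer

t = reduce(combine, ls, ("", []))          #<-- see initial struct (1)
result = t[1] + [t[0]] if t[0] else t[1]   #<-- add buffer if not empty
Result:
['hello world \n', 'my name is john \n', 'How are you? I am \n doing well']
Note that everything after the last element ending in "\n" stays in the buffer, so the two trailing elements come out joined as one string.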
(1) Explained initial struct: you use a tuple to store buffer string until \n and a list of already cooked strings:
("",[])
Means:
("__ buffer string not yet added to list __", [ __result list ___ ] )
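To see how the (buffer, results) tuple evolves step by step, here is a minimal sketch that records every intermediate state with itertools.accumulate (its initial keyword needs Python 3.8+). The combine function below is a variant written only for this illustration: it skips the separating space when the buffer is empty, which is an assumption and not necessarily what the code above does.

```python
from itertools import accumulate

def combine(acc, item):
    # acc is the (buffer, results) tuple described in (1)
    buffer, results = acc
    joined = f"{buffer} {item}" if buffer else item
    if item.endswith("\n"):
        return ("", results + [joined])   # flush the buffer into the results
    return (joined, results)              # keep accumulating in the buffer

ls = ['hello', 'world \n', 'my name']
states = list(accumulate(ls, combine, initial=("", [])))
for state in states:
    print(state)
```

The last state is what reduce would return for the same input; the earlier states show the buffer filling and being flushed.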

I wrote this out as a plain loop so it is simple to understand, rather than making it more complex as a list comprehension.
This will work for any number of words until you hit a \n character, and it cleans up the remainder of your input as well.
ls_out = []  # your outgoing ls
out = ''     # keeps your words to use
for i in range(0, len(ls)):
    if '\n' in ls[i]:  # check for the ending word, if so, add it to output and reset
        out += ls[i]
        ls_out.append(out)
        out = ''
    else:  # otherwise add to your current word list
        out += ls[i]
if out:  # check for remaining words in out if total ls doesn't end with \n
    ls_out.append(out)
You may need to add spaces when you string concatenate but I am guessing that it is just with your example. If you do, make this edit:
out += ' ' + ls[i]
Edit:
If you want to only grab the one before and not multiple before, you could do this:
ls_out = []
for i in range(0, len(ls)):
    if ls[i].endswith('\n'):  # check ending only
        if i > 0 and not ls[i-1].endswith('\n'):  # check previous string
            out = ls[i-1] + ' ' + ls[i]  # concatenate together
        else:
            out = ls[i]  # no previous string to merge with
    elif i + 1 < len(ls) and ls[i+1].endswith('\n'):  # next one will grab this so skip
        continue
    else:
        out = ls[i]  # next one won't so add this one in
    ls_out.append(out)

You can solve it with a regular expression, using the re module.
import re
ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
new_ls = []
for i in range(len(ls)):
    concat_word = ''  # reset the concat word to ''
    if re.search(r"\n$", str(ls[i])):  # matching the \n at the end of the word
        if i > 0:
            concat_word = str(ls[i-1]) + ' ' + str(ls[i])  # appending to the previous word
        else:
            concat_word = str(ls[i])  # in case the first word in the list has \n
        new_ls.append(concat_word)
    elif re.search(r'\n', str(ls[i])):  # matching the \n anywhere in the word
        concat_word = str(ls[i])
        new_ls.extend([str(ls[i-1]), concat_word])  # keeps the word before the "anywhere" match separate
print(new_ls)
This returns the output
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']

Assuming the first element doesn't end with \n:
res = []
for el in ls:
    if el.endswith("\n"):
        res[-1] = res[-1] + " " + el
    else:
        res.append(el)
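A self-contained sketch of the same "glue onto the previous element" idea as a reusable function. The function name, the sep parameter, and the guard against merging into an element that already ends with a newline are assumptions added for illustration; they are one way to read "merge with the single element before".

```python
def merge_trailing_newlines(items, sep=" "):
    """Glue each item ending in a newline onto the one item before it."""
    merged = []
    for item in items:
        # Merge only if there is a previous element and it is still "open",
        # i.e. it does not itself end with a newline.
        if item.endswith("\n") and merged and not merged[-1].endswith("\n"):
            merged[-1] = merged[-1] + sep + item
        else:
            merged.append(item)
    return merged

ls = ['hello', 'world \n', 'my name', 'is john \n', 'How are you?', 'I am \n doing well']
print(merge_trailing_newlines(ls))
```

Because of the guard, a first element that ends with '\n', or two such elements in a row, are kept as they are instead of raising an error or chaining together.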

Try this:
lst = []
for i in range(len(ls)):
    if "\n" in ls[i][-1]:
        lst.append((ls[i-1] + ' ' + ls[i]))
        lst.remove(ls[i-1])
    else:
        lst.append(ls[i])
lst
Result:
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']

Related

Wrong output in function

Hi, I am totally new to programming and have just jumped into it.
The problem I am trying to solve is to write a function that standardizes an address given as input.
Example:
def standardize_address(a):
    numbers = []
    letters = []
    a.replace('_', ' ')
    for word in a.split():
        if word.isdigit():
            numbers.append(int(word))
        elif word.isalpha():
            letters.append(word)
    s = f"{numbers} {letters}"
    return s
Can someone help explain my error and give me a "pro" programmer's solution and a "noob" (myself) solution?
This is what I should print:
a = 'New_York 10001'
s = standardize_address(a)
print(s)
and the output should be:
10001 New York
Right now my output is:
[10001] ['New', 'York']
Issues
Strings are immutable, so you need to keep the result of replace: do a = a.replace('_', ' ') or chain it before the split call.
You need to concatenate the lists into one with numbers + letters, then join the elements with " ".join().
Don't convert the numeric words to int; that's useless and would force you to convert them back to str for " ".join().
def standardize_address(a):
    numbers = []
    letters = []
    for word in a.replace('_', ' ').split():
        if word.isdigit():
            numbers.append(word)
        elif word.isalpha():
            letters.append(word)
    return ' '.join(numbers + letters)
Improve
In fact you want to sort the words by the isdigit condition, so you can express that with sorted and an appropriate key:
def standardize_address(value):
    return ' '.join(sorted(value.replace('_', ' ').split(),
                           key=str.isdigit, reverse=True))
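The sorted version relies on two properties of Python's sorting: False sorts before True, and the sort stays stable even with reverse=True, so words keep their original relative order inside each group. A small check of that behaviour; the second address is a made-up sample used only for illustration:

```python
def standardize_address(value):
    words = value.replace('_', ' ').split()
    # Digit-only words get key True and move to the front with reverse=True;
    # stability keeps the original order within each group.
    return ' '.join(sorted(words, key=str.isdigit, reverse=True))

first = standardize_address('New_York 10001')
second = standardize_address('Baker_Street 221 London 42')
print(first)
print(second)
```

Unlike the loop version, this keeps words that are neither purely digits nor purely letters; they are treated like the letter words.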
numbers and letters are both lists of strings, and if you format them they'll be rendered with []s and ''s appropriately. What you want to do is to replace this:
s = f"{numbers} {letters}"
return s
with this:
return ' '.join(numbers + letters)
numbers + letters is the combined list of number-strings and letter-strings, and ' '.join() takes that list and turns it into a string by putting ' ' between each item.
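The difference between formatting the lists and joining them can be seen side by side in this short sketch (the variable names as_fstring and as_joined are only for illustration):

```python
numbers = ['10001']
letters = ['New', 'York']

# Formatting a list inserts its repr, brackets and quotes included.
as_fstring = f"{numbers} {letters}"

# Joining works on the items themselves.
as_joined = ' '.join(numbers + letters)

print(as_fstring)
print(as_joined)
```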

Combining continuous new lines based on a list in Python

I am trying to create a function that will combine consecutive newlines in a list in Python.
For example, I have a list:
I want my function to output this:
I have a function already however its output is wrong:
final_list = list()
for sentence in test:
    if sentence != "\r\n":
        print(sentence)
        final_list.append(sentence)
    else:
        #Check if the next sentence is a newline as well
        curr_idx = test.index(sentence)
        curr_sentence = sentence
        next_idx = curr_idx + 1
        next_sentence = test[next_idx]
        times = 0
        while next_sentence == "\r\n":
            times += 1
            combine_newlines = next_sentence * times
            next_idx += 1
            next_sentence = test[next_idx]
            continue
        final_list.append(combine_newlines+ "\r\n")
Thank you very much!
You can use itertools.groupby to group consecutive items of the same value, and then join the grouped items for output:
from itertools import groupby
print([''.join(g) for _, g in groupby(test)])
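A runnable sketch of the groupby idea on a shortened sample list. One caveat: plain groupby also merges identical neighbouring items that are not newlines. The collapse_newlines helper below is a hypothetical variant that groups on "is this a newline?" instead, so only newline runs are joined.

```python
from itertools import groupby

test = ['I will be out', '\r\n', '\r\n', 'Thanks,', '\r\n', 'Dave', '\r\n', '\r\n']

# Plain groupby: every run of equal items is joined.
combined = [''.join(group) for _, group in groupby(test)]
print(combined)

def collapse_newlines(items, newline='\r\n'):
    """Join runs of newline items; leave all other items untouched."""
    out = []
    for is_newline, group in groupby(items, key=lambda s: s == newline):
        group = list(group)
        out.extend([''.join(group)] if is_newline else group)
    return out

print(collapse_newlines(['a', 'a', '\r\n', '\r\n']))
```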
test = [
    'I will be out',
    'I will have limited',
    '\r\n',
    '\r\n',
    '\r\n',
    'Thanks,',
    '\r\n',
    'Dave',
    '\r\n',
    '\r\n',
]
final_list = []
for e in test:
    if e != '\r\n':
        final_list.append(e)
    else:
        if len(final_list) == 0 or not final_list[-1].endswith('\r\n'):
            final_list.append(e)
        else:
            final_list[-1] += e
print(final_list)
prints
['I will be out', 'I will have limited', '\r\n\r\n\r\n', 'Thanks,', '\r\n', 'Dave', '\r\n\r\n']
Explanation: Iterate over your items e in test, if they are not equal to \r\n, simply append them to the final_list, otherwise either append them to final_list or extend the last element in final_list based on if it ends with \r\n.

How can I insert multiple values into a list? (python)

In Python, I want to add spaces to a string. It does not matter whether the result is a string or a list, but I would like to get every case of where the spaces can be placed in the string.
First, I tried it like this, turning the string into a list:
sent = 'apple is good for health'
sent.split(' ')
sentence = []
b = ''
for i in sent:
    c = ''
    for j in sent[(sent.index(i)):]:
        c += j
    sentence.append((b+' '+c).strip())
    b += i
sentence
In this case, each resulting string contains only one space.
I also tried
for i in range(len(sent)):
    sent[i:i] = [' ']
and another attempt used ' '.join(sent[i:])
but the results are the same.
How can I get
'apple isgoodforhealth', 'appleis goodforhealth', 'appleisgood forhealth', 'appleisgoodfor health', 'appleis goodfor health', 'apple isgood forhealth', 'appleisgood forhealth' ...
like this?
I really want to get the number of all cases.
My take on the problem utilizing itertools.combinations.
import itertools
sent = 'apple is good for health'
sent = sent.split(' ')
# Get indices for spaces
N = range(len(sent) - 1)
for i in N:
    # Get all combinations where the space suits
    # Note that this doesn't include the option of no spaces at all
    for comb in itertools.combinations(N, i + 1):
        # Add space to the end of each word
        # with index contained in the combination
        listsent = [s + " " if j in comb else s for j, s in enumerate(sent)]
        # Make the result a string or count the combinations if you like
        tempsent = "".join(listsent)
        print(tempsent)
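As for the number of cases: a sentence of n words has n - 1 gaps, and each gap either gets a space or not, so there are 2 ** (n - 1) variants if the no-space string is counted and 2 ** (n - 1) - 1 if it is not. A minimal sketch that enumerates them with itertools.product; the function name space_variants is made up for this example, and it assumes the sentence has at least one word.

```python
from itertools import product

def space_variants(sentence):
    """Return every way of keeping or dropping the space in each gap."""
    words = sentence.split()
    variants = []
    # One choice ("" or " ") per gap between neighbouring words.
    for gaps in product(["", " "], repeat=len(words) - 1):
        pieces = [words[0]]
        for gap, word in zip(gaps, words[1:]):
            pieces.append(gap + word)
        variants.append("".join(pieces))
    return variants

variants = space_variants('apple is good for health')
print(len(variants))
```

Dropping 'appleisgoodforhealth' from the list leaves the variants that contain at least one space.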
Use join:
sent = 'apple is good for health'
sent = sent.split(' ')
start_index = 0
last_index = len(sent)
for i in range(len(sent)-1):
    first_word = "".join(sent[start_index:i+1])
    second_word = "".join(sent[i+1:last_index])
    print(first_word + " " + second_word)
This gives the single-space outputs, i.e. 'apple isgoodforhealth', 'appleis goodforhealth', 'appleisgood forhealth', etc.

Removing And Re-Inserting Spaces

What is the most efficient way to remove spaces from a text and then, after the necessary function has been performed, re-insert the previously removed spacing?
Take this example below, here is a program for encoding a simple railfence cipher:
from string import ascii_lowercase
string = "Hello World Today"
string = string.replace(" ", "").lower()
print(string[::2] + string[1::2])
This outputs the following:
hlooltdyelwrdoa
This is because it must remove the spacing prior to encoding the text. However, if I now want to re-insert the spacing to make it:
hlool tdyel wrdoa
What is the most efficient way of doing this?
As mentioned by one of the other commenters, you need to record where the spaces came from and then add them back in:
from string import ascii_lowercase
string = "Hello World Today"
# Get list of spaces
spaces = [i for i,x in enumerate(string) if x == ' ']
string = string.replace(" ", "").lower()
# Set string with ciphered text
ciphered = (string[::2] + string[1::2])
# Reinsert spaces
for space in spaces:
    ciphered = ciphered[:space] + ' ' + ciphered[space:]
print(ciphered)
You could use str.split to help you out. When you split on spaces, the lengths of the remaining segments will tell you where to split the processed string:
broken = string.split(' ')
sizes = list(map(len, broken))
You'll need the cumulative sum of the sizes, which gives the end index of each segment:
from itertools import accumulate, chain
cs = list(accumulate(sizes))
Now you can reinstate the spaces:
processed = ''.join(broken).lower()
processed = processed[::2] + processed[1::2]
chunks = [processed[start:end] for start, end in zip(chain([0], cs), cs)]
result = ' '.join(chunks)
This solution is not especially straightforward or efficient, but it does avoid explicit loops.
Using list and join operations:
random_string = "Hello World Today"
space_position = [pos for pos, char in enumerate(random_string) if char == ' ']
random_string = random_string.replace(" ", "").lower()
random_string = list(random_string[::2] + random_string[1::2])
for index in space_position:
    random_string.insert(index, ' ')
random_string = ''.join(random_string)
print(random_string)
I think this might help:
string = "Hello World Today"
nonSpaceyString = string.replace(" ", "").lower()
randomString = nonSpaceyString[::2] + nonSpaceyString[1::2]
spaceSet = [i for i, x in enumerate(string) if x == " "]
for index in spaceSet:
    randomString = randomString[:index] + " " + randomString[index:]
print(randomString)
string = "Hello World Today"
# get the indices of the spaces
index = [i for i in range(len(string)) if string[i]==' ']
# store the non-space characters
data = [i for i in string.lower() if i!=' ']
# apply the cipher described in the question
result = data[::2]+data[1::2]
# insert the spaces back at the positions they had in the original string
for i in index:
    result.insert(i, ' ')
# build the final string
solution = ''.join(result)
print(solution)
# output: hlool tdyel wrdoa
You can make a new string with this small yet simple (kind of) code.
Note that this doesn't use any libraries, which might make it slower, but less confusing.
def weird_string(string):  # get input value
    spaceless = ''.join([c for c in string if c != ' '])  # get spaceless version
    skipped = spaceless[::2] + spaceless[1::2]  # get new unique 'code'
    result = list(skipped)  # get list of one letter strings
    for i in range(len(string)):  # loop over strings
        if string[i] == ' ':  # if a space 'was' here
            result.insert(i, ' ')  # add the space back
    # end for
    s = ''.join(result)  # join the results back
    return s  # return the result
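The answers above all follow the same pattern: remember the space positions, transform the spaceless text, then put the spaces back at the same indices. A minimal sketch that separates the pattern from the cipher; the names transform_keeping_spaces and rail_fence are made up for this example, and it assumes the transform returns a string of the same length as its input.

```python
def transform_keeping_spaces(text, transform):
    """Apply transform to text without spaces, then restore the spaces."""
    positions = [i for i, ch in enumerate(text) if ch == " "]
    chars = list(transform(text.replace(" ", "")))
    # Inserting in ascending order puts each space back at its original index.
    for pos in positions:
        chars.insert(pos, " ")
    return "".join(chars)

def rail_fence(s):
    # The two-rail cipher from the question: even indices, then odd indices.
    s = s.lower()
    return s[::2] + s[1::2]

print(transform_keeping_spaces("Hello World Today", rail_fence))
```

Passing the transform as an argument keeps the space bookkeeping in one place, so a different length-preserving cipher can be swapped in without touching it.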

Removing whitespace from the end of string while keeping space in the middle of each letter

The goal of this code is to take a bunch of letters and print the first letter and every third letter after that for the user. What's the easiest way to remove the whitespace at the end of the output here while keeping all the spaces in the middle?
msg = input('Message? ')
for i in range(0, len(msg), 3):
    print(msg[i], end = ' ')
str_object.rstrip() will return a copy of str_object without trailing whitespace. Just do
msg = input('Message? ').rstrip()
For what it's worth, you can replace your loop by string slicing:
print(*msg[::3], sep=' ')
>>> n = ' hello '
>>> n.rstrip()
' hello'
>>> n.lstrip()
'hello '
>>> n.strip()
'hello'
What about?
msg = input('Message? ')
output = ' '.join(msg[::3]).rstrip()
print(output)
You can use at least 2 methods:
1) Slicing method:
print(" ".join(msg[0::3]))
2) List comprehension (more readable/powerful):
print(" ".join([letter for i,letter in enumerate(msg) if i%3==0]))
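A short sketch of why the join approach needs no cleanup in the common case: the separator is only placed between items, never after the last one. The function name every_third and the sample messages are made up for illustration.

```python
def every_third(msg):
    # Take the first character and every third one after it,
    # separated by single spaces.
    return " ".join(msg[::3])

out = every_third("abcdefghij")
print(repr(out))

# If a selected character is itself a space, the result can still end
# in whitespace, which is where a final rstrip() helps.
padded = every_third("abc ")
print(repr(padded), repr(padded.rstrip()))
```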
