How to concatenate line space string in Python

I have some text in Python that is spread over separate lines, and I want to get it back as one string with a literal "\n" after each item. When I write this code
txt="""a
b
c
d"""
txt = str.join(" ", txt.splitlines())
x = txt.split()
s = ""
for item in x:
    s += item + "\ n"
print(s)
it gives me the response on one line, but only because I put a space between \ and n:
a\ nb\ nc\ nd\ n
But if I take out the space between \ and n, I get the response back as
a
b
c
d
I want a single blob: the response should come back as one string on one line.

Great, thank you. I just had to use two backslashes (\\) and it worked.
txt="""a
b
c
d"""
txt = str.join(" ", txt.splitlines())
x = txt.split()
s = ""
for item in x:
    s += item + "\\n"
print(s)
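The same result can be had without the explicit loop. This is only a minimal sketch, assuming the goal is one line of text in which every item is followed by a literal backslash and the letter n. The raw string r"\n" is the same value as "\\n".

```python
txt = """a
b
c
d"""

# r"\n" is a raw string: two characters, a backslash and the letter n,
# not a newline. It is the same value as "\\n".
s = "".join(line + r"\n" for line in txt.splitlines())
print(s)  # a\nb\nc\nd\n
```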

Related

Concatenate the single characters in texts

I have a list of company names, some of which contain abbreviations, e.g.:
compNames = ['Costa Limited', 'D B M LTD']
I need to convert the text in compNames to a matrix of token counts using the following, but it does not output columns for D, B and M in 'D B M LTD':
count_vect = CountVectorizer(analyzer='word')
count_vect.fit_transform(compNames).toarray()
What is the best way to concatenate the single characters in a text?
e.g. 'D B M LTD' to 'DBM LTD'
import re
string = 'D B M LTD'
print(re.sub("([^ ]) ", r"\1", re.sub(" ([^ ]{2,})", r"  \1", string)))
Awkward, but it should work. The inner substitution introduces an additional space in front of LTD, and the outer one then replaces "D " with "D", "B " with "B" and so on.
Here is a short function that splits a string on whitespace into a list, iterates over the list, builds up a temporary string from the elements of length 1, and appends that temporary string to a new list when an element of length greater than one is encountered.
import re
a = 'D B M LTD'
def single_concat(s):
    out = []
    tmp = ''
    for x in re.split(r'\s+', s):
        if len(x) == 1:
            tmp += x
        else:
            if tmp:
                out.append(tmp)
            out.append(x)
            tmp = ''
    return ' '.join(out)
single_concat(a)
# returns:
'DBM LTD'
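One thing to watch for: as written, single_concat only flushes tmp when a longer word follows, so single letters at the very end of a name would be dropped. A sketch of a variant that flushes once more after the loop (single_concat_flush is just an illustrative name, not part of the answer above):

```python
def single_concat_flush(s):
    out = []
    tmp = ''
    for x in s.split():
        if len(x) == 1:
            # collect consecutive single-character words
            tmp += x
        else:
            if tmp:
                out.append(tmp)
                tmp = ''
            out.append(x)
    # flush single characters that end the string
    if tmp:
        out.append(tmp)
    return ' '.join(out)

print(single_concat_flush('D B M LTD'))  # DBM LTD
print(single_concat_flush('ACME C O'))   # ACME CO
```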
import re
s = "D B M LTD"
first_part = ''
for chunk in re.compile(r"([A-Z]{1})\s").split(s):
    if len(chunk) == 1:
        first_part += chunk
    elif len(chunk) > 1:
        last_part = chunk
print(first_part + " " + last_part)
Prints DBM LTD.
import re
string = 'D B M LTD'
print(re.sub(r"\+", r"", re.sub(r"\+(\w\B)", r" \1", re.sub(r"(\b\w) ", r"\1+", string))))
I'm using the + character as a temporary marker, assuming there are no + characters in the string. If there are, use some other character that doesn't occur.
Look, no re:
def mingle(s):
    """ SO: 49692941 """
    l = s.split()
    r = []
    t = []
    for e in l:
        if len(e) == 1:
            t.append(e)
        else:
            j = "".join(t)
            r.append( j )
            r.append( e )
            t = []
    return " ".join(r)
print( mingle('D B M LTD') )
prints
DBM LTD
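For comparison, a single substitution with lookarounds also appears to handle this case. This is a sketch rather than one of the answers above: it removes a space only when it sits between two one-character words, assuming the names are made of plain word characters. The function name join_single_chars is made up for illustration.

```python
import re

def join_single_chars(name):
    # (?<=\b\w)  the space follows a word character that starts a word
    # (?=\w\b)   the space precedes a word character that ends a word
    # Both hold only when the words on either side are one character long.
    return re.sub(r"(?<=\b\w) (?=\w\b)", "", name)

compNames = ['Costa Limited', 'D B M LTD']
print([join_single_chars(n) for n in compNames])
# ['Costa Limited', 'DBM LTD']
```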

Python find similar sequences in string

I want code that returns the total length of all similar sequences in two strings. I wrote the following code, but it only counts one of them:
from difflib import SequenceMatcher
a='Apple Banana'
b='Banana Apple'
def similar(a,b):
    c = SequenceMatcher(None,a.lower(),b.lower()).get_matching_blocks()
    return sum( [c[i].size if c[i].size>1 else 0 for i in range(0,len(c)) ] )
print(similar(a,b))
and the output will be
6
I expect it to be: 11
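get_matching_blocks() does not return every common substring. It first finds the longest contiguous match, here 'banana' (length 6), and then only looks for further matches to the left and to the right of that match in both strings, so the blocks it reports always appear in the same order in a and b. 'apple' comes after 'banana' in one string and before it in the other, so it is never reported. Hence the result is 6.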
Try this instead:
def similar(a,b):
    c = 'something' # Initialize this to anything to make the while loop condition pass for the first time
    sum = 0
    while(len(c) != 1):
        c = SequenceMatcher(lambda x: x == ' ',a.lower(),b.lower()).get_matching_blocks()
        sizes = [i.size for i in c]
        i = sizes.index(max(sizes))
        sum += max(sizes)
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]
    return sum
This "subtracts" the matching part of the strings, and matches them again, until len(c) is 1, which would happen when there are no more matches left.
However, this script doesn't ignore spaces. In order to do that, I used the suggestion from this other SO answer: just preprocess the strings before you pass them to the function like so:
a = 'Apple Banana'.replace(' ', '')
b = 'Banana Apple'.replace(' ', '')
You can include this part inside the function too.
If we edit your code like this, it will tell us where the 6 is coming from:
from difflib import SequenceMatcher
a='Apple Banana'
b='Banana Apple'
def similar(a,b):
    c = SequenceMatcher(None,a.lower(),b.lower()).get_matching_blocks()
    for block in c:
        print("a[%d] and b[%d] match for %d elements" % block)
similar(a,b)
a[6] and b[0] match for 6 elements
a[12] and b[12] match for 0 elements
I made a small change to your code and it is working like a charm, thanks @Antimony
def similar(a,b):
    a=a.replace(' ', '')
    b=b.replace(' ', '')
    c = 'something' # Initialize this to anything to make the while loop condition pass for the first time
    sum = 0
    i = 2
    while(len(c) != 1):
        c = SequenceMatcher(lambda x: x == ' ',a.lower(),b.lower()).get_matching_blocks()
        sizes = [i.size for i in c]
        i = sizes.index(max(sizes))
        sum += max(sizes)
        a = a[0:c[i].a] + a[c[i].a + c[i].size:]
        b = b[0:c[i].b] + b[c[i].b + c[i].size:]
    return sum
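The "subtract the match and match again" idea can also be written around find_longest_match, which avoids shadowing the built-in sum. This is a hedged sketch, not the code from the answers: similar_total and its min_size parameter are illustrative names, and min_size=2 is an assumption that mirrors the "size > 1" filter in the question. Note that cutting a match out joins the remaining pieces, so in other inputs this can create matches that were not contiguous in the original strings.

```python
from difflib import SequenceMatcher

def similar_total(a, b, min_size=2):
    # Ignore spaces and case, as in the answers above.
    a = a.replace(' ', '').lower()
    b = b.replace(' ', '').lower()
    total = 0
    while True:
        # Longest common contiguous block of the current strings.
        m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
        if m.size < min_size:
            break
        total += m.size
        # Cut the matched block out of both strings and try again.
        a = a[:m.a] + a[m.a + m.size:]
        b = b[:m.b] + b[m.b + m.size:]
    return total

print(similar_total('Apple Banana', 'Banana Apple'))  # 11
```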

Python - Split a string into list after a certain number of special characters

I have a Python program which makes a SOAP request to a server, and it works fine:
I get the answer from the server, parse it, clean it, and when I am done, I end up with a string like this:
name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|...
Basically, it is a string with values delimited by "|". I also know the structure of the database I am requesting, so I know that it has 6 columns and various rows. I basically need to split the string after every 6th "|" character, to obtain something like:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|...
Can you tell me how to do that in Python? Thank you!
Here's a functional-style solution.
s = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
for row in map('|'.join, zip(*[iter(s.split('|'))] * 6)):
    print(row + '|')
output
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|
For info on how zip(*[iter(seq)] * rowsize) works, please see the links at Splitting a list into even chunks.
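As a rough illustration of why that idiom groups items into rows: the list holds the same iterator object several times, so each tuple that zip builds consumes consecutive items from it. The small example below uses a row size of 3 and made-up field names, and shows that a trailing incomplete group is dropped.

```python
fields = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

it = iter(fields)
# zip pulls from the same iterator three times per row
rows_explicit = list(zip(it, it, it))

# [iter(fields)] * 3 is a list holding the same iterator object three times
rows_idiom = list(zip(*[iter(fields)] * 3))

print(rows_explicit)                # [('a', 'b', 'c'), ('d', 'e', 'f')]
print(rows_explicit == rows_idiom)  # True
```

The leftover 'g' is discarded because zip stops as soon as it cannot fill a complete tuple.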
data = "name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|"
splits = data.split('|')
splits = list(filter(None, splits)) # Filter empty strings
row_len = 6
rows = ['|'.join(splits[i:i + row_len]) + '|' for i in range(0, len(splits), row_len)]
print(rows)
>>> ['name|value|value_name|default|seq|last_modify|', 'record_type|1|Detail|0|0|20150807115904|', 'zero_out|0|No|0|0|20150807115911|', 'out_ind|1|Partially ZeroOut|0|0|20150807115911|']
How about this:
a = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
b = a.split('|')
c = [b[6*i:6*(i+1)] for i in range(len(b)//6)] # this is a very workable form of data storage
print('\n'.join('|'.join(i) for i in c)) # produces your desired output
# prints:
# name|value|value_name|default|seq|last_modify
# record_type|1|Detail|0|0|20150807115904
# zero_out|0|No|0|0|20150807115911
# out_ind|1|Partially ZeroOut|0|0|20150807115911
Here is a flexible generator approach:
def splitOnNth(s,d,n, keep = False):
    i = s.find(d)
    j = 1
    while True:
        while i > 0 and j%n != 0:
            i = s.find(d,i+1)
            j += 1
        if i < 0:
            yield s
            return #end generator
        else:
            yield s[:i+1] if keep else s[:i]
            s = s[i+1:]
            i = s.find(d)
            j = 1
#test runs, showing `keep` in action:
test = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
for s in splitOnNth(test,'|',6,True): print(s)
print('')
for s in splitOnNth(test,'|',6): print(s)
Output:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|
name|value|value_name|default|seq|last_modify
record_type|1|Detail|0|0|20150807115904
zero_out|0|No|0|0|20150807115911
out_ind|1|Partially ZeroOut|0|0|20150807115911
There are really many ways to do it. Even with a loop:
a = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904' \
'|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
new_a = []
ind_start, ind_end = 0, 0
for i in range(a.count('|')// 6):
    for i in range(6):
        ind_end = a.index('|', ind_end+1)
    print(a[ind_start:ind_end + 1])
    new_a.append(a[ind_start:ind_end+1])
    ind_start = ind_end+1
The print is just there to show the results; you can remove it:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|
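Yet another option is a regular expression that matches "n fields, each followed by the delimiter". This is a sketch under the assumption that the delimiter is a single character; split_every_nth is a hypothetical helper name. Any trailing fields that do not fill a complete row are left out.

```python
import re

def split_every_nth(s, delim='|', n=6):
    d = re.escape(delim)
    # n repetitions of: any run of non-delimiter characters, then the delimiter
    pattern = r'(?:[^%s]*%s){%d}' % (d, d, n)
    return re.findall(pattern, s)

sample = 'name|value|seq|record_type|1|0|zero_out|0|0|'
for row in split_every_nth(sample, n=3):
    print(row)
# name|value|seq|
# record_type|1|0|
# zero_out|0|0|
```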

Python: Count character in string which are following each other

I have a string in which I want to count the consecutive occurrences of # and replace each run with a number, to create an increment.
For example:
rawString = 'MyString1_test##_edit####'
for x in range(5):
    output = doConvertMyString(rawString)
    print(output)
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
Assuming that the number of # is not fixed and that rawString is user input containing only string.ascii_letters + string.digits + '_' + '#', how can I do that?
Here is my test so far:
rawString = 'MyString1_test##_edit####'
incrDatas = {}
key = '#'
counter = 1
for x in range(len(rawString)):
    if rawString[x] != key:
        counter = 1
        continue
    else:
        if x > 0:
            if rawString[x - 1] == key:
                counter += 1
            else:
                pass
                # ???
You may use zfill in the re.sub replacement to pad runs of # of any length. The #+ regex pattern matches one or more # symbols. m.group() is the text the regex matched, so each run of #s is replaced with the incremented x, converted to a string and zero-padded to the same width as the run.
import re
rawString = 'MyString1_test##_edit####'
for x in range(5):
    output = re.sub(r"#+", lambda m: str(x+1).zfill(len(m.group())), rawString)
    print(output)
Result of the demo:
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
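If the goal is a callable that hands out the next name each time, as the doConvertMyString placeholder in the question suggests, the same re.sub replacement can be wrapped in a closure. This is only a sketch: make_converter and its start parameter are made-up names, and it assumes the counter should begin at 1.

```python
import re
from itertools import count

def make_converter(template, start=1):
    counter = count(start)
    def convert():
        n = next(counter)
        # Replace each run of '#' with n, zero-padded to the run's width.
        return re.sub(r"#+", lambda m: str(n).zfill(len(m.group())), template)
    return convert

convert = make_converter('MyString1_test##_edit####')
first = convert()
second = convert()
print(first)   # MyString1_test01_edit0001
print(second)  # MyString1_test02_edit0002
```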
The code below converts the rawString to a format string, using groupby in a list comprehension to find groups of hashes. Each run of hashes is converted into a format directive to print a zero-padded integer of the appropriate width, runs of non-hashes are simply joined back together.
This code works on Python 2.6 and later.
from itertools import groupby
def convert(template):
    return ''.join(['{{x:0{0}d}}'.format(len(list(g))) if k else ''.join(g)
        for k, g in groupby(template, lambda c: c == '#')])
rawString = 'MyString1_test##_edit####'
fmt = convert(rawString)
print(repr(fmt))
for x in range(5):
    print(fmt.format(x=x))
output
'MyString1_test{x:02d}_edit{x:04d}'
MyString1_test00_edit0000
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
How about this:
rawString = 'MyString1_test##_edit####'
splitString = rawString.split('_')
for i in range(10): # you may put any count
    print('%s_%s%02d_%s%04d' % (splitString[0], splitString[1][0:4], i, splitString[2][0:4], i, ))
You can try this naive (and probably not most efficient) solution. It assumes that the number of '#' is fixed.
rawString = 'MyString1_test##_edit####'
for i in range(1, 6):
    temp = rawString.replace('####', str(i).zfill(4)).replace('##', str(i).zfill(2))
    print(temp)
MyString1_test01_edit0001
MyString1_test02_edit0002
MyString1_test03_edit0003
MyString1_test04_edit0004
MyString1_test05_edit0005
test_string = 'MyString1_test##_edit####'
def count_hash(raw_string):
    str_list = list(raw_string)
    hash_count = str_list.count("#") + 1
    for num in range(1, hash_count):
        new_string = raw_string.replace("####", "000" + str(num))
        new_string = new_string.replace("##", "0" + str(num))
        print(new_string)
count_hash(test_string)
It's a bit clunky, and only works for # counts of less than 10, but seems to do what you want.
EDIT: By "only works" I mean that you'll get extra characters with the fixed number of # symbols inserted
EDIT2: amended code

pattern finding in a string python

I am trying to create a modified LZW that finds patterns of words inside a string. My problem is that the first element is '' and the last one is not checked against the list. I took the pseudo-code from here: https://www.cs.duke.edu/csed/curious/compression/lzw.html . Here is my script for compression:
string = 'this is a test a test this is pokemon'
diction = []
x = ""
count = 0
for c in string.split():
    print (c)
    print (x)
    #x = x + " " + c
    if x in diction:
        x += " " + c
        #print("debug")
    else:
        #print(x)
        diction.append(x)
        x = c
    count +=1
    #print(count)
print (diction)
I tried to fix the second problem by appending a random word to the end of the string, but I don't think that's the best solution.
For the first problem I tried defining the variable "x" as str or None, but then I get <class 'str'> inside the list.
The linked pseudo-code works on characters, whereas splitting the string gives a list of words.
To avoid the empty string in the dictionary and to handle the last element:
string = 'this is a test a test this is pokemon'
diction = []
x = ""
count = 0
for c in string.split():
    print (c)
    if x+" "+c in diction:
        x += " " + c
    else:
        diction.append(x+" "+c)
        x = c
    count +=1
print (diction)
But perhaps you would like something like :
string = 'this is a test a test this is pokemon'
diction = []
x = ""
count = 0
for c in string:
    print (c)
    if x+c in diction:
        x += c
    else:
        diction.append(x+c)
        x = c
    count +=1
print (diction)
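Note that the word-level version above builds its first key as "" + " " + "this", so the dictionary starts with ' this' (leading space). A small sketch of the same loop that only adds the separator when there is a current phrase is shown below. It is not a full LZW encoder (no output codes are produced), and word_phrases is just an illustrative name.

```python
def word_phrases(text):
    phrases = []   # the dictionary, in insertion order
    current = ""   # phrase matched so far
    for word in text.split():
        # Only add the separator when there is something to extend.
        candidate = current + " " + word if current else word
        if candidate in phrases:
            current = candidate
        else:
            phrases.append(candidate)
            current = word
    return phrases

print(word_phrases('this is a test a test this is pokemon'))
# ['this', 'this is', 'is a', 'a test', 'test a', 'a test this', 'this is pokemon']
```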
I'm not sure what the code is meant to do, but to fix the issues that you mentioned I think you could do this:
string = 'this is a test a test this is pokemon'
diction = []
x = None
count = 0
for c in string.split():
    if x in diction:
        x += " " + c
    else:
        if x: diction.append(x)
        x = c
    count += 1
if not x in diction: diction.append(x)
print (diction)
The output for that code would be:
['this', 'is', 'a', 'test', 'a test', 'this is', 'pokemon']
