Text Merging - How to do this in Python? (R source)

I have tried several ways to translate this to Python, but none worked, mainly because I keep getting this error:
'str' object does not support item assignment
R can do it with the following code:
f<-0
text<- c("foo", "btextr", "cool", "monsttex")
for (i in 1:length(text)) {
  f[i] <- paste(text[i], text[i+1], sep = "_")
}
f
The output is:
"foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
I would really appreciate help doing the same in Python. Thanks.

In R your output would have been (next time please put this in the question):
> f
[1] "foo_btextr" "btextr_cool" "cool_monsttex" "monsttex_NA"
In Python strings are immutable. So you'll need to create new strings, e.g.:
new_strings = []
text = ['foo', 'btextr', 'cool', 'monsttex']
for i, t in enumerate(text):
    try:
        new_strings.append(text[i] + '_' + text[i+1])
    except IndexError:
        new_strings.append(text[i] + '_NA')
Which results in:
>>> new_strings
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']
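The error from the question can be reproduced in a few lines. This is only a minimal sketch, assuming the failing attempt started from a string (mirroring f<-0 in R) and then assigned to f[i]; the question does not show the actual Python code, so the variable names here are illustrative.

```python
# Assumption: the failing attempt indexed into a string, as in f[i] = ...
f = "0"
try:
    f[0] = "foo_btextr"   # strings cannot be modified in place
except TypeError as exc:
    message = str(exc)    # "'str' object does not support item assignment"

# A list, by contrast, can grow as results are produced.
f = []
f.append("foo_btextr")
```

Collecting the new strings in a list, as in the loop above, avoids the problem entirely.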

This works:
>>> from itertools import zip_longest
>>>
>>> f = ['foo', 'btextr', 'cool', 'monsttex']
>>>
>>> ['_'.join(i) for i in zip_longest(f, f[1:], fillvalue='NA')]
['foo_btextr', 'btextr_cool', 'cool_monsttex', 'monsttex_NA']
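To see why the one-liner works, it may help to look at the pairs before they are joined. The sketch below uses the same list; zip_longest pads the shorter input (f[1:]) with the fill value, which is where the trailing 'NA' comes from. The names pairs and merged are illustrative.

```python
from itertools import zip_longest

f = ['foo', 'btextr', 'cool', 'monsttex']

# Pair each item with its successor; the last item is paired with 'NA'.
pairs = list(zip_longest(f, f[1:], fillvalue='NA'))
# pairs == [('foo', 'btextr'), ('btextr', 'cool'),
#           ('cool', 'monsttex'), ('monsttex', 'NA')]

# Join each pair with an underscore.
merged = ['_'.join(pair) for pair in pairs]
```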

Related

Make List to String Python

I want to convert list data to a string.
My list data looks like this:
[['data1'],['data2'],['data3']]
I want to convert it to a string like this:
"[data1] [data2] [data3]"
I tried to use join like this:
data=[['data1'],['data2'],['data3']]
list=" ".join(data)
But I get an error like this:
string= " ".join(data)
TypeError: sequence item 0: expected string, list found
Can somebody help me?

Depending on how closely you want the output to conform to your sample, you have a few options, shown here in ascending order of complexity:
>>> data=[['data1'],['data2'],['data3']]
>>> str(data)
"[['data1'], ['data2'], ['data3']]"
>>> ' '.join(map(str, data))
"['data1'] ['data2'] ['data3']"
>>> ' '.join(map(str, data)).replace("'", '')
'[data1] [data2] [data3]'
Keep in mind that, if your given sample of data doesn't match your actual data, these methods may or may not produce the desired results.
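One concrete case where the caveat bites is data that itself contains an apostrophe. The following is a small sketch with made-up sample data (not from the question) comparing the replace-based approach with formatting each element directly.

```python
# Hypothetical sample data containing an apostrophe.
data = [["it's"], ['data2']]

# The replace-based approach also strips the apostrophe inside the data,
# and leaves the double quotes that repr() chose for the first element.
via_replace = ' '.join(map(str, data)).replace("'", '')

# Formatting each element directly leaves the contents untouched.
via_format = ' '.join('[{}]'.format(sub[0]) for sub in data)
```

Here via_replace ends up as ["its"] [data2], while via_format is [it's] [data2].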

Have you tried?
data=[['data1'],['data2'],['data3']]
t = map(lambda x : str(x), data)
print(" ".join(t))
Live demo - https://repl.it/BOaS

In Python 3.x, the elements of the iterable passed to str.join() have to be strings.
The error you are getting - TypeError: sequence item 0: expected string, list found - occurs because the elements of the list you pass to str.join() are lists (data is a list of lists).
If you only have a single element per sublist, you can simply do -
" ".join(['[{}]'.format(x[0]) for x in data])
Demo -
>>> data=[['data1'],['data2'],['data3']]
>>> " ".join(['[{}]'.format(x[0]) for x in data])
'[data1] [data2] [data3]'
If the sublists can have multiple elements and you want those elements separated by a comma in the output, you can use a list comprehension inside str.join() to build the list of strings. Example -
" ".join(['[{}]'.format(','.join(x)) for x in data])
For a delimiter other than ',', use it in '<delimiter>'.join(x).
Demo -
>>> data=[['data1'],['data2'],['data3']]
>>> " ".join(['[{}]'.format(','.join(x)) for x in data])
'[data1] [data2] [data3]'
For multiple elements in sublist -
>>> data=[['data1','data1.1'],['data2'],['data3','data3.1']]
>>> " ".join(['[{}]'.format(','.join(x)) for x in data])
'[data1,data1.1] [data2] [data3,data3.1]'
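The inner ','.join(x) has the same restriction as the outer join: it only accepts strings. If the sublists might hold numbers or other objects (an assumption; the question only shows strings), converting each element with str first is one way around it. A minimal sketch with made-up data:

```python
# Hypothetical data mixing numbers and strings.
data = [[1, 2], ['data2'], [3.5]]

# map(str, x) converts every element before the inner join.
result = " ".join('[{}]'.format(','.join(map(str, x))) for x in data)
```

With this data, result is '[1,2] [data2] [3.5]'.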

>>> import re
>>> l = [['data1'], ['data2'], ['data3']]
>>> s = ""
>>> for i in l:
...     s += re.sub(r"\'", "", str(i))
...
>>> s
'[data1][data2][data3]'

How about this?
data = [['data1'], ['data2'], ['data3']]
result = " ".join('[' + a[0] + ']' for a in data)
print(result)

How about this:
In [13]: a = [['data1'],['data2'],['data3']]
In [14]: import json
In [15]: temp = " ".join([json.dumps(x) for x in a]).replace("\"", "")
In [16]: temp
Out[16]: '[data1] [data2] [data3]'

Try the following. The nested list can be flattened with reduce, which is a first step toward the string:
from functools import reduce
data = [['data1'], ['data2'], ['data3']]
print(list(reduce(lambda x,y : x+y, data)))
output: ['data1', 'data2', 'data3']
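The reduce call above yields a flat list rather than the string asked for in the question. One possible way to finish the job, sketched here with the illustrative names flat and result, is to wrap each item in brackets and join with spaces.

```python
from functools import reduce

data = [['data1'], ['data2'], ['data3']]

# Flatten the list of lists by concatenating the sublists.
flat = reduce(lambda x, y: x + y, data)

# Wrap each item in brackets and join with single spaces.
result = " ".join("[{}]".format(item) for item in flat)
```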

Moving parts of string around python

I have a string, well, several actually. The strings are simply:
string.a.is.this
or
string.a.im
in that fashion.
and what I want to do is turn those strings into:
this.is.a.string
and
im.a.string
What I've tried:
new_string = string.split('.')
new_string = (new_string[3] + '.' + new_string[2] + '.' + new_string[1] + '.' + new_string[0])
Which works fine for making:
string.a.is.this
into
this.is.a.string
but gives me an 'out of range' error if I try it on:
string.a.im
yet if I do:
new_string = (new_string[2] + '.' + new_string[1] + '.' + new_string[0])
that works fine to make:
string.a.im
into
im.a.string
but obviously does not work for:
string.a.is.this
since it is not set up for 4 indices. I was trying to figure out how to make the extra index optional, or find any other workaround or a better method. Thanks.

You can use str.join, str.split, and [::-1]:
>>> mystr = 'string.a.is.this'
>>> '.'.join(mystr.split('.')[::-1])
'this.is.a.string'
>>> mystr = 'string.a.im'
>>> '.'.join(mystr.split('.')[::-1])
'im.a.string'
>>>
To explain better, here is a step-by-step demonstration with the first string:
>>> mystr = 'string.a.is.this'
>>>
>>> # Split the string on .
>>> mystr.split('.')
['string', 'a', 'is', 'this']
>>>
>>> # Reverse the list returned above
>>> mystr.split('.')[::-1]
['this', 'is', 'a', 'string']
>>>
>>> # Join the strings in the reversed list, separating them by .
>>> '.'.join(mystr.split('.')[::-1])
'this.is.a.string'
>>>
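Since there are several strings to process, the same steps can be wrapped in a small helper. This is just a sketch; the function name reverse_dotted and the sep parameter are made up for illustration, and reversed() is used here in place of the [::-1] slice.

```python
def reverse_dotted(text, sep="."):
    """Return text with its sep-separated parts in reverse order."""
    return sep.join(reversed(text.split(sep)))


# Works regardless of how many parts each string has.
results = [reverse_dotted(s) for s in ["string.a.is.this", "string.a.im"]]
```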

You could also do it with Python's re module:
>>> import re
>>> mystr = 'string.a.is.this'
>>> parts = re.findall(r'[^.]+', mystr)
>>> '.'.join(parts[::-1])
'this.is.a.string'

Why wont this loop work

def cut(path):
    test = str(foundfiles)
    newList = [s for s in test if test.endswith('.UnitTests.vbproj')]
    for m in newList:
        print m
    return newList
This function goes through foundfiles, which is a list of 20+ files from a folder that I have already collected. I need to pick out of that list every file that ends in ".UnitTests.vbproj". However, I can't get it working. Any advice would be greatly appreciated!
Edit1: This is my code now, and I get an AttributeError message box saying that 'tuple' object has no attribute 'endswith'
def cut(path):
    test = foundfiles
    newList = [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
    for m in newList:
        print m
    return newList

You turned the list into a string. Looping over test gives you individual characters instead:
>>> foundfiles = ['foo', 'bar']
>>> for c in str(foundfiles):
...     print c
...
[
'
f
o
o
'
,
'
b
a
r
'
]
There is no need to turn foundfiles into a string. You also need to test the elements of the list, not test:
newList = [s for s in foundfiles if s.endswith('.UnitTests.vbproj')]
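The 'tuple' object has no attribute 'endswith' error from the edit suggests that the entries of foundfiles are tuples rather than strings. The question does not show what those tuples contain, so the sketch below assumes, purely for illustration, that each entry is a (directory, filename) pair; the index would need adjusting to match the real data.

```python
# Hypothetical data: each entry is assumed to be a (directory, filename) tuple.
foundfiles = [
    ('src', 'App.vbproj'),
    ('tests', 'App.UnitTests.vbproj'),
    ('tests', 'Core.UnitTests.vbproj'),
]

suffix = '.UnitTests.vbproj'

# Test the filename element of each tuple rather than the tuple itself.
matches = [entry for entry in foundfiles if entry[1].endswith(suffix)]
```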

I really don't know what the type of your 'foundfiles' is.
Maybe this will help you:
import os

def cut(path):
    newlist = []
    # Walk the directory tree and collect every matching file path.
    for parent, dirnames, filenames in os.walk(path):
        for name in filenames:
            full_path = os.path.join(parent, name)
            if full_path.endswith('.UnitTests.vbproj'):
                newlist.append(full_path)
    return newlist

Finding the index number for a string of words

I'm creating a program in python that will go through a list of sentences and find the words in capitals within the sentences. I've used a findall function to acquire the capitals at the moment.
Here is an example of the output I am receiving at the minute:
line 0: the dog_SUBJ bit_VERB the cat_OBJ
['S'] ['U'] ['B'] ['J'] [] ['V'] ['E'] ['R'] ['B'] [] ['O'] ['B'] ['J']
However, I want the output to be full words, like so:
['SUBJ'] [] ['VERB'] [] ['OBJ']
I also want the indices of the words, like so:
['SUBJ'] [0]
['VERB'] [1]
['OBJ'] [2]
Is it possible to do this? I've seen the above done before in the terminal and I think that 'index' is used or something similar?
Here's my code below (as far as I have got):
import re, sys
f = open('findallEX.txt', 'r')
lines = f.readlines()
ii=0
for l in lines:
    sys.stdout.write('line %s: %s' %(ii, l))
    ii = ii + 1
    results = []
    for s in l:
        results.append(re.findall('[A-Z]+', s))
Thanks! Any help would be greatly appreciated!

Something like:
>>> s = 'the dog_SUBJ bit_VERB the cat_OBJ'
>>> import re
>>> from itertools import count
>>> zip(re.findall('[A-Z]+', s), count())
[('SUBJ', 0), ('VERB', 1), ('OBJ', 2)]
Format as appropriate...
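The single letters in the question's output come from looping over the characters of each line (for s in l) and calling findall on one character at a time. Calling findall once per line returns the full words. The sketch below applies that to a list of lines and uses enumerate for the indices, which also works on Python 3, where zip no longer returns a list. The in-memory lines list stands in for the file contents, which are not shown in the question.

```python
import re

# Stand-in for f.readlines(); the real file contents are not shown.
lines = ['the dog_SUBJ bit_VERB the cat_OBJ\n']

results = []
for line in lines:
    # One findall call per line returns whole runs of capitals.
    tags = re.findall(r'[A-Z]+', line)
    # Pair each tag with its position among the tags on that line.
    results.append([(tag, i) for i, tag in enumerate(tags)])
```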

Python findall in a string

There must be an easier way, or a built-in function, to do what this code does:
#!/usr/bin/env python
string = "test [*string*] test [*st[ *ring*] test"
points = []
result = string.find("[*")
new_string = string[result+1:]
while result != -1:
    points.append(result)
    new_string = new_string[result+1:]
    result = new_string.find("[*")
print points
Any ideas?

import re
string = "test [*string*] test [*st[ *ring*] test"
points = [m.start() for m in re.finditer(r'\[\*', string)]

It looks like you're trying to get the indices in the string that match '[*'...
indices=[i for i in range(len(string)-1) if string[i:i+2] == '[*']
But this output is different than what your code will produce. Can you verify that your code does what you want?
Also note that string is the name of a python module in the standard library -- while it isn't used very often, it's probably a good idea to avoid using it as a variable name. (don't use str either)
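Another option that stays close to the original loop is to pass a start offset to str.find, so the positions it returns always refer to the original string. This is a sketch only; the function name find_all and the variable text are made up for illustration.

```python
def find_all(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume the search one character past the previous match.
        start = text.find(pattern, start + 1)
    return positions


text = "test [*string*] test [*st[ *ring*] test"
points = find_all(text, "[*")
```

Because the search resumes one character after each match, overlapping occurrences are reported as well.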

>>> from functools import reduce  # required on Python 3
>>> indexes = lambda str_, pattern: reduce(
... lambda acc, x: acc + [acc[-1] + len(x) + len(pattern)],
... str_.split(pattern), [-len(pattern)])[1:-1]
>>> indexes('123(456(', '(')
[3, 7]
>>> indexes('', 'x')
[]
>>> indexes("test [*string*] test [*st[ *ring*] test", '[*')
[5, 21]
>>> indexes('1231231','1')
[0, 3, 6]
