How do you get rid of the first word of the string? I want to get rid of the number and get the rest as a whole string.
Input text is:
1456208278 Hello world start
What I wanted for output was:
'Hello world start'
Here was my approach:
if isfile('/directory/text_file'):
    with open('/directory/test_file', 'r') as f:
        lines = f.readlines()
    try:
        first = str((lines[0].strip().split()))
        final = first.split(None, 1)[1].strip("]")
        print final
    except Exception as e:
        print str(e)
The output of the code was:
'Hello', 'world', 'start'
I do not want " ' " for every single string.
Alternative solution with the partition() method:
In [41]: '1456208278 Hello world start'.partition(' ')[2]
Out[41]: 'Hello world start'
If you split and then join, you may lose spaces that are relevant for your application. Just search for the first space and slice the string from the next character (I think it is also more efficient).
s = '1456208278 Hello world start'
s[s.index(' ') + 1:]
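To make the whitespace point concrete, here is a small sketch comparing the three approaches on an input with a run of spaces. The sample string is made up for illustration. It also shows one difference worth knowing: index() raises ValueError when the string has no space at all, while partition() simply returns an empty remainder.

```python
s = '1456208278 Hello   world start'  # hypothetical input with three spaces in a row

joined = ' '.join(s.split()[1:])   # split/join collapses the run of spaces
sliced = s[s.index(' ') + 1:]      # slicing keeps the rest untouched
parted = s.partition(' ')[2]       # partition keeps it untouched as well

print(repr(joined))
print(repr(sliced))
print(repr(parted))

# With no space in the string, partition() gives '' but index() raises.
no_space = '1456208278'
print(repr(no_space.partition(' ')[2]))
try:
    no_space[no_space.index(' ') + 1:]
except ValueError as exc:
    print('index() failed:', exc)
```

Which behaviour is preferable depends on whether a line consisting of a single word should count as an error or as an empty message.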
EDIT
Your code is way too complex for the task: first you split the line, getting a list, then you convert the list to a string, this means that you will get ' and ] in the string. Then you have to split again and clean the stuff. It's overly complex :)
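A short trace of the intermediate values may help show where the quotes come from. This is only a sketch: the variable line stands in for lines[0] from the question, and the names tokens and as_text are introduced here for illustration.

```python
line = '1456208278 Hello world start\n'  # stand-in for lines[0]

tokens = line.strip().split()   # a list of four strings
as_text = str(tokens)           # the list's repr, brackets and quotes included
final = as_text.split(None, 1)[1].strip(']')

print(as_text)
print(final)
```

Once the list has been turned into its repr, the quotes and commas are ordinary characters in the string, which is why they survive the second split.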
Another approach you could use is to split and join, but as I said earlier, you may lose spaces:
s = '1456208278 Hello world start'
t1 = s.split() # ['1456208278', 'Hello', 'world', 'start']
t2 = t1[1:] # ['Hello', 'world', 'start']
s2 = ' '.join(t2)
or more concisely
s2 = ' '.join(s.split()[1:])
This approach could be better if you want to use a comma to separate the tokens, e.g.
s3 = ', '.join(s.split()[1:])
will produce
s3 = 'Hello, world, start'
Related
I have a task where I need to fetch N words before and after every occurrence of a substring (which could be multiple words) in a string. I initially considered using str.split(" ") and working with the list, but the issue is that the substring I'm fetching can span multiple words.
I've tried using str.partition and it's very close to doing exactly what I want, but it only gets the first keyword.
Code:
text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
part = text.partition("Hello")
part = list(map(str.strip, part))
Output:
['', 'Hello', "World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"]
This gets me exactly what I need for the first keyword. I have enough to then get the prior and posterior words. Unfortunately, this fails me when the substring I'm looking for is repeating.
If the output could instead be a list of list partitions then I could actually make it work. How should I approach this?
text = "Hello World how are you doing Hello is the keyword I'm trying to get Hello is a repeating word"
def recursive_partition(text, pattern):
    if not text:
        return []  # return a list so the concatenation below also works when text ends with pattern
    tmp = text.partition(pattern)
    if tmp and tmp[1]:
        return [tmp[0]] + [tmp[1]] + recursive_partition(tmp[2], pattern)
    else:
        return [tmp[0]]
res = recursive_partition(text, "Hello")
print(res) # ['', 'Hello', ' World how are you doing ', 'Hello', " is the keyword I'm trying to get ", 'Hello', ' is a repeating word']
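Going one step further towards the original goal (N words before and after each occurrence), one possible sketch uses re.finditer instead of partitioning. The function name context_words is made up for illustration, and the sketch assumes n is at least 1 and that words are separated by whitespace.

```python
import re


def context_words(text, phrase, n):
    """Return (words_before, words_after) for every occurrence of phrase.

    Assumes n >= 1; words are whatever str.split() yields.
    """
    results = []
    for match in re.finditer(re.escape(phrase), text):
        before = text[:match.start()].split()[-n:]  # last n words before the hit
        after = text[match.end():].split()[:n]      # first n words after the hit
        results.append((before, after))
    return results


text = ("Hello World how are you doing Hello is the keyword "
        "I'm trying to get Hello is a repeating word")
for before, after in context_words(text, "Hello", 2):
    print(before, after)
```

re.escape is used so that a multi-word phrase containing characters such as "." or "?" is matched literally rather than as a pattern.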
I have a string
str1='This Python is good Good python'
I want to remove duplicate words, keeping the first occurrence and ignoring case; e.g. good and Good are considered the same, as are Python and python. The output should be
output='This Python is good'
Following a rather traditional approach:
str1 = 'This Python is good Good python'
words_seen = set()
output = []
for word in str1.split():
    if word.lower() not in words_seen:
        words_seen.add(word.lower())
        output.append(word)
output = ' '.join(output)
print(output) # This Python is good
A caveat: it would not preserve word boundaries consisting of multiple spaces: 'python  puppy' would become 'python puppy'.
A very ugly short version:
words_seen = set()
output = ' '.join(word for word in str1.split() if not (word.lower() in words_seen or words_seen.add(word.lower())))
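A variation on the same idea, sketched under the assumption of Python 3.7 or later (where dict preserves insertion order): store the first spelling of each word under its case-folded key. The function name dedupe_words is made up here, and casefold() is swapped in for lower() as a slightly more thorough case-insensitive comparison.

```python
def dedupe_words(text):
    """Keep the first occurrence of each word, comparing case-insensitively."""
    first_seen = {}
    for word in text.split():
        # setdefault only stores the word if its folded form is not a key yet
        first_seen.setdefault(word.casefold(), word)
    return ' '.join(first_seen.values())


print(dedupe_words('This Python is good Good python'))
```

Like the loop version, this collapses runs of whitespace because it goes through split() and join().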
One approach might be to use regular expressions to remove any word for which we can find a duplicate. The catch is that regex engines move from start to end of a string. Since you want to retain the first occurrence, we can reverse the string, and then do the cleanup.
import re
str1 = 'This Python is good Good python'
str1_reverse = ' '.join(reversed(str1.split(' ' )))
str1_reverse = re.sub(r'\s*(\w+)\s*(?=.*\b\1\b)', ' ', str1_reverse, flags=re.I)
str1 = ' '.join(reversed(str1_reverse.strip().split(' ' )))
print(str1) # This Python is good
I'm not sure how to do this. I have a string and need to drop its first part. When print(result.text) runs, it prints "#PERSONSTWITTER their message", and I need to remove the leading "#PERSONSTWITTER".
At first I had it remove everything from the #. I ran into a problem: the username can be any number of characters (#PERSONSTWITTER2, #PERSONSTWITTER12, etc.), so the names don't all have the same length. Now I'm not sure what to do. Any help would be great!
So all I need is to isolate "their message" and not the username.
for s in twt:
    sn = s.user.screen_name
    m = "#%s MESSAGE" % (sn)
    s = api.update_status(m, s.id)
    #time.sleep(5)
for result in twt:
    print(result.text)
You can use regular expressions:
import re
s = "#PERSONSTWITTER their message"
new_s = re.sub(r'^\S+', '', s)[1:]  # drop the first run of non-space characters, then the space after it
Output:
'their message'
You may filter out the words starting with # using str.startswith:
>>> s = "#PERSONSTWITTER their message. #ANOTHERWRITER their another message."
>>> ' '.join(word for word in s.split() if not word.startswith('#'))
'their message. their another message.'
Here I'm first splitting your sentence into words, keeping only the words that do not start with #, and then joining them back together.
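One thing to keep in mind with the filtering approach is that it drops every word starting with #, not just the leading username. A small sketch with a made-up message shows the difference from removing only the first word:

```python
s = "#PERSONSTWITTER their message about #python"  # hypothetical tweet text

# Filtering removes every '#' word, including the one inside the message.
filtered = ' '.join(word for word in s.split() if not word.startswith('#'))

# Partitioning on the first space removes only the leading word.
first_only = s.partition(' ')[2]

print(repr(filtered))
print(repr(first_only))
```

If the message itself may contain # words that should be kept, removing only the first word is probably the safer choice.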
Use the .split() method to convert the string into a list of strings formed by splitting the original at (by default) whitespace.
Then use the .join method to join all the elements from index 1 onwards in the list together, separated again by a space.
s = "#PERSONSTWITTER their message"
' '.join(s.split()[1:])
# --> 'their message'
An alternative approach would be to just find the index of the first space and slice from there onwards:
s = "#PERSONSTWITTER their message"
s[s.index(' ')+1:]
# --> 'their message'
Note that we had to add 1 to the index because s.index(' ') is the position of the space itself, and we want the slice to start at the character after it.
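If some tweets might not start with a username at all, a slightly more defensive sketch could check the first word before removing it. The helper name strip_leading_mention and the marker parameter are made up for this example, and the behaviour for a string that consists only of a username (returned unchanged) is an assumption about what is wanted.

```python
def strip_leading_mention(text, marker='#'):
    """Remove the first word only if it starts with marker and text follows it."""
    head, sep, tail = text.partition(' ')
    if head.startswith(marker) and sep:
        return tail.lstrip()
    return text  # nothing to strip, or no message after the username


print(strip_leading_mention("#PERSONSTWITTER their message"))
print(strip_leading_mention("no username here"))
```

Because it uses partition() rather than index(), it does not raise when the text contains no space.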
s = "#PERSONSTWITTER their message"
split_s = s.split(' ')
message = ' '.join( split_s[1:] )
I want to split a string that I have.
Let's say the string is hello how are you.
I want to print only how are (meaning start after hello and finish after are).
My code for now just starts after hello, but prints all the rest.
I want to avoid the you.
ReadJSONFile=JSONResponseFile.read() # this is the txt file with the line
print ReadJSONFile.split('hello',1)[1] # this gives me everything after hello
You could use string slicing:
>>> s = "hello how are you"
>>> s[6:13]
'how are'
Combine two str.split calls:
>>> s = 'hello how are you'
>>> s.split('hello', 1)[-1]
' how are you'
>>> s.split('hello', 1)[-1].split('you', 1)[0]
' how are '
>>> s.split('hello', 1)[-1].split('you', 1)[0].strip() # remove surrounding spaces
'how are'
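The same "everything between two markers" idea can also be sketched with a regular expression. This swaps in the re module rather than str.split; the helper name between_regex is made up, and returning None when either marker is missing is an assumption about the desired behaviour.

```python
import re


def between_regex(s, start, end):
    """Return the text between the first start and the next end, or None."""
    # re.escape makes the markers literal; (.*?) is non-greedy so it stops
    # at the first end marker, and the \s* on each side trims whitespace.
    pattern = re.escape(start) + r'\s*(.*?)\s*' + re.escape(end)
    match = re.search(pattern, s, flags=re.DOTALL)
    return match.group(1) if match else None


print(repr(between_regex('hello how are you', 'hello', 'you')))
```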
If you have the start and end indices, you can extract a slice of the string by using slice notation:
s = 'Hello how are you'
# you want from index 6 (h) to 12 (e)
print(s[6:12+1])
This should help (using index and slicing):
>>> h = 'hello how are you'
>>> start = h.index('hello')+len('hello')
>>> end = h.index('you')
>>> h[start:end].strip()
'how are'
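One caveat with index() is that it raises ValueError when a marker is missing. A possible sketch using str.find(), which returns -1 instead, is shown here; the function name text_between and the choice to return None on a missing marker are assumptions made for this example.

```python
def text_between(s, start_marker, end_marker):
    """Return the stripped text between the two markers, or None if either is missing."""
    start = s.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)        # move past the start marker
    end = s.find(end_marker, start)   # only look for the end marker after it
    if end == -1:
        return None
    return s[start:end].strip()


print(repr(text_between('hello how are you', 'hello', 'you')))
print(repr(text_between('hello how are', 'hello', 'you')))
```

Searching for the end marker from start onwards also avoids picking up an occurrence that appears before the start marker.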
I have a line that I want to split into three parts:
line4 = 'http://www.example.org/lexicon#'+synset_offset+' http://www.monnetproject.eu/lemon#gloss '+gloss+''
The variable gloss contains full sentences, which I don't want to be split. How do I stop this from happening?
The final 3 split parts should be:
'http://www.example.org/lexicon#'+synset_offset+'
http://www.monnetproject.eu/lemon#gloss
'+gloss+''
after running triple = line4.split()
I'm struggling to understand, but why not just create a list to start with:
line4 = [
    'http://www.example.org/lexicon#' + synset_offset,
    'http://www.monnetproject.eu/lemon#gloss',
    gloss
]
Simplified example - instead of joining them all together, then splitting them out again, just join them properly in the first place:
a = 'hello'
b = 'world'
c = 'i have spaces in me'
d = ' '.join((a,b,c)) # <- correct way
# hello world i have spaces in me
print ' '.join(d.split(' ', 2)) # take joined, split out again making sure not to split `c`, then join back again!?
If they all begin with "http" you could split them using http as the delimiter; otherwise you could do it in two steps:
First extract the first URL from the string by using the space or http as the delimiter, then split what is left once more:
firstSplit = line4.split(' ', 1)  # [first URL, everything else]
firstString = firstSplit.pop(0)  # pop the first URL
secondSplit = ' '.join(firstSplit)  # join the rest back into one string
secondString, thirdString = secondSplit.split(' ', 1)  # split the remaining two at the first space
>>> synset_offset = "foobar"
>>> gloss = "This is a full sentence."
>>> line4 = 'http://www.example.org/lexicon#'+synset_offset+' http://www.monnetproject.eu/lemon#gloss '+gloss
>>> line4.split(None, 2)  # split on whitespace, at most 2 splits
['http://www.example.org/lexicon#foobar', 'http://www.monnetproject.eu/lemon#gloss', 'This is a full sentence.']
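To see why limiting the number of splits matters, here is a small sketch contrasting a plain split() with a split limited to two cuts. The values of synset_offset and gloss are the placeholder values from the session above, and the names subject, predicate and obj are made up for illustration.

```python
synset_offset = "foobar"               # placeholder value
gloss = "This is a full sentence."     # placeholder value
line4 = ('http://www.example.org/lexicon#' + synset_offset +
         ' http://www.monnetproject.eu/lemon#gloss ' + gloss)

everything = line4.split()                       # also cuts the gloss into words
subject, predicate, obj = line4.split(None, 2)   # at most two cuts, gloss stays whole

print(len(everything))
print(repr(obj))
```

Passing None as the separator keeps the default whitespace splitting while still allowing the maximum number of splits to be given positionally.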
Not sure what you're trying to do here. If in general you're looking to avoid splitting a keyword, you should do:
>>> line[:line.index(keyword)].split() + [line[line.index(keyword):line.index(keyword)+len(keyword)]] + line[line.index(keyword)+len(keyword):].split()
If the gloss (or whatever keyword part) of the string is the end part, the final slice will just be an empty string ''; splitting an empty string gives an empty list, so nothing extra ends up in the result.
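The same idea can be sketched a little more compactly with str.partition, which returns the text before the keyword, the keyword itself, and the text after it in one call. The function name split_keeping and the fallback to a plain split when the keyword is absent are assumptions made for this example.

```python
def split_keeping(line, keyword):
    """Split line on whitespace but keep keyword together as one item."""
    before, found, after = line.partition(keyword)
    if not found:
        return line.split()  # keyword absent: ordinary split
    # ''.split() is [], so a keyword at the very start or end adds nothing extra
    return before.split() + [found] + after.split()


print(split_keeping('a b full sentence here c d', 'full sentence here'))
print(split_keeping('a b full sentence here', 'full sentence here'))
```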