I'm not sure how to do this. I have a string and I need to remove the first part of it. When print(result.text) runs, it prints "#PERSONSTWITTER their message", and I need to remove the leading "#PERSONSTWITTER".
At first I had it remove everything from the # onward, but I ran into a problem: the username can be any number of characters (#PERSONSTWITTER2, #PERSONSTWITTER12, etc.), so the usernames don't all have the same length. Now I'm not sure what to do. Any help would be great!
So all I need is to isolate "their message", without the username.
for s in twt:
    sn = s.user.screen_name
    m = "#%s MESSAGE" % (sn)
    s = api.update_status(m, s.id)
    #time.sleep(5)
for result in twt:
    print(result.text)
You can use regular expressions:
import re
s = "#PERSONSTWITTER their message"
new_s = re.sub(r'^\S+', '', s)[1:]  # drop the first run of non-space characters, then the space after it
Output:
'their message'
You can filter out the words starting with # using str.startswith:
>>> s = "#PERSONSTWITTER their message. #ANOTHERWRITER their another message."
>>> ' '.join(word for word in s.split() if not word.startswith('#'))
'their message. their another message.'
Here I'm first splitting your sentence into words, keeping only the words that do not start with #, and then joining them back together.
Use the .split() method to convert the string into a list of strings, formed by splitting the original at (by default) whitespace.
Then use the .join method to join all the elements from index 1 onwards in the list together, separated again by a space.
s = "#PERSONSTWITTER their message"
' '.join(s.split()[1:])
# --> 'their message'
An alternative approach would be to find the index of the first space and slice from there onwards:
s = "#PERSONSTWITTER their message"
s[s.index(' ')+1:]
# --> 'their message'
Note that we had to add 1 to the index so that the slice starts after the space rather than at the space itself.
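A minimal sketch of a variant that tolerates input without any space, using str.partition instead of index. The helper name strip_first_word and the choice to return an empty string when no message follows the username are assumptions for illustration, not part of the original answers:

```python
def strip_first_word(text):
    """Hypothetical helper: return everything after the first space."""
    # partition splits at the first space and returns (head, sep, tail).
    # If there is no space, sep and tail are both empty strings.
    head, sep, tail = text.partition(' ')
    return tail if sep else ''


print(strip_first_word("#PERSONSTWITTER their message"))  # their message
print(repr(strip_first_word("#PERSONSTWITTER")))          # ''
```

Unlike s.index(' '), which raises ValueError when no space is present, partition never raises here.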
s = "#PERSONSTWITTER their message"
split_s = s.split(' ')
message = ' '.join( split_s[1:] )
I'm trying to find a way to pull the $30K out of the following string. I always want the first occurrence of a $ (dollar sign), but I want the full value that goes with it.
text = 'My 82 Benchmark $30K 1000m S7 $23'
text_string = (text.split())
text_string
Output = ['My', '82', 'Benchmark', '$30K', '1000m', 'S7', '$23']
I have tried this code
for i in text_string:
    if('$' in i) :
        print ("Element Exists")
This seems to know that it exists but not which one it exists in
You can easily get what you want by using re module
import re
To get the first occurrence you can do:
m = re.search('([$][0-9]+K?)', text)
print(m.group(0))
And if you want all occurrences, you can get a list of every match with:
re.findall('([$][0-9]+K?)', text)
Hope it helps.
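One caveat with the re.search call above: it returns None when nothing matches, so calling .group(0) on the result would raise AttributeError. A small sketch that guards against that; the function name first_dollar_amount is hypothetical and the pattern is the same one used in the answer:

```python
import re


def first_dollar_amount(text):
    """Hypothetical helper: return the first $-prefixed value, or None."""
    # A literal $, one or more digits, and an optional trailing K.
    m = re.search(r'[$][0-9]+K?', text)
    return m.group(0) if m else None


text = 'My 82 Benchmark $30K 1000m S7 $23'
print(first_dollar_amount(text))         # $30K
print(first_dollar_amount('no prices'))  # None
```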
You could split on the "$" before splitting on spaces:
"$"+text.split("$",1)[1].split()[0]
A list comprehension is a little overkill here; use a generator expression with next instead:
next((x for x in text.split() if '$' in x), '')
In code:
text = 'My 82 Benchmark $30K 1000m S7 $23'
print(next((x for x in text.split() if '$' in x), ''))
# $30K
You have done everything correctly, you just need to print the variable.
If you just want the first value, then you can use break to escape the loop.
This stops the loop once it reaches the desired value.
The fixed code below prints $30K.
text = 'My 82 Benchmark $30K 1000m S7 $23'
text_string = text.split()
print(text_string)
for i in text_string:
    if '$' in i:
        print(i)
        break
I also removed the extra brackets in your code, and the space between print and the opening (.
Update
I will also try to explain what Tom Karzes said, in a small answer here.
Tom Karzes suggested using a list comprehension. With a list comprehension, the code is:
strings = [s for s in text_string if '$' in s]
if len(strings) > 0:
    print(strings[0])
A list comprehension is a concise way of building lists. Here you go through all the strings in the text_string list and collect only the strings that contain a $ character (see if '$' in s).
This gives you a list in which every string contains a $ character. You can then check that its length is greater than 0 using len, and print the first item in the list.
If you just need the first element that has a $ symbol in it then you can simply break out of the loop.
text = 'My 82 Benchmark $30K 1000m S7 $23'
text_string = text.split()
print(text_string)
for i in text_string:
    if '$' in i:
        print(i)
        break
I'm writing a program in Python that will split off the contents after the last space in a string. For example, if a user enters "this is a test", I want it to return "test". I'm stuck on how to do this.
Easy and efficient with str.rsplit.
>>> x = 'this is a test'
>>> x.rsplit(None, 1)[-1] # Splits at most once from right on whitespace runs
'test'
Alternate:
>>> x.rpartition(' ')[-1] # Splits at the last actual space found
'test'
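The two calls above differ when the string ends in whitespace. A short sketch of that difference; the sample string with two trailing spaces is an assumption chosen for illustration:

```python
x = 'this is a test  '  # note the two trailing spaces

# rsplit(None, 1) skips trailing whitespace before splitting.
last_by_rsplit = x.rsplit(None, 1)[-1]

# rpartition(' ') splits at the very last space, so the tail is empty here.
last_by_rpartition = x.rpartition(' ')[-1]

print(repr(last_by_rsplit))      # 'test'
print(repr(last_by_rpartition))  # ''
```

If user input may carry trailing spaces, rsplit(None, 1) or stripping the string first is probably the safer choice.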
string = "this is a test"
lastWord = string.rsplit()[-1]
print(lastWord)
# prints: test
A fast way that avoids building a list of all the words, using str.rpartition:
>>> "this is a test".rpartition(' ')[-1]
'test'
>>> help(str.rpartition)
Help on method_descriptor:
rpartition(...)
    S.rpartition(sep) -> (head, sep, tail)

    Search for the separator sep in S, starting at the end of S, and return
    the part before it, the separator itself, and the part after it. If the
    separator is not found, return two empty strings and S.
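The "separator is not found" case in the help text means a one-word string still works. A minimal sketch, where the helper name last_word is hypothetical:

```python
def last_word(text):
    """Hypothetical helper: return the text after the last space."""
    head, sep, tail = text.rpartition(' ')
    # When no space is found, head and sep are '' and tail is the whole string.
    return tail


print(last_word('this is a test'))  # test
print(last_word('test'))            # test
print(repr(last_word('')))          # ''
```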
I am trying to get the users who are mentioned in an article, that is, the words starting with the # symbol, and then wrap < and > around them.
WHAT I TRIED:
def getUsers(content):
    users = []
    l = content.split(' ')
    for user in l:
        if user.startswith('#'):
            users.append(user)
    return users
old_string = "Getting and replacing mentions of users. #me #mentee #you #your #us #usa #wo #world #word #wonderland"
users = getUsers(old_string)
new_array = old_string.split(' ')
for mention in new_array:
    for user in users:
        if mention == user and len(mention) == len(user):
            old_string = old_string.replace(mention, '<' + user + '>')
print(old_string)
print(users)
The code is behaving strangely. It wraps the shared prefix of words that start with the same letters and leaves the rest of each longer word outside the brackets, as shown in the output below:
RESULT:
Getting and replacing mentions of users. <#me> <#me>ntee <#you> <#you>r <#us> <#us>a <#wo> <#wo>rld <#wo>rd <#wo>nderland
['#me', '#mentee', '#you', '#your', '#us', '#usa', '#wo', '#world', '#word', '#wonderland']
EXPECTED RESULT:
Getting and replacing mentions of users. <#me> <#mentee> <#you> <#your> <#us> <#usa> <#wo> <#world> <#word> <#wonderland>
['#me', '#mentee', '#you', '#your', '#us', '#usa', '#wo', '#world', '#word', '#wonderland']
Process finished with exit code 0
Why is this happening, and how can I do this the right way?
Why this happens: when you split the string, you put in several checks to make sure you are looking at the right user. For example, you have #me and #mentee, so for the user #me the comparison matches the first and not the second.
However, when you call replace, it operates on the whole string. So when you ask it to replace #me with <#me>, it knows nothing about your careful split; it just looks for #me anywhere in the string and replaces it. #mentee also contains #me, so it gets replaced too.
Two (well, three) choices: one is to add spaces around it, to gate it (like #parchment wrote).
The second is to use your split: instead of replacing in the original string, replace the local piece. The simplest way to do this is with enumerate:
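The substring behaviour of str.replace can be seen in isolation with a two-word sample (the shortened string is an assumption for illustration):

```python
old_string = "#me #mentee"

# str.replace substitutes every occurrence of the substring,
# including the '#me' at the start of '#mentee'.
replaced = old_string.replace('#me', '<#me>')
print(replaced)  # <#me> <#me>ntee
```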
new_array = old_string.split(' ')
for index, mention in enumerate(new_array):
    for user in users:
        if mention == user and len(mention) == len(user):
            #We won't replace this in old_string, we'll replace the current entry
            #old_string = old_string.replace(a, '<' + user + '>')
            new_array[index] = '<%s>'%user
new_string = ' '.join(new_array)
Third way... this is a bit more complex, but what you really want is for any instance of '#anything' to be replaced with <#anything> (perhaps with whitespace?). You can do this in one shot with re.sub:
new_string = re.sub(r'(#\w+)', r'<\g<0>>', old_string)
My previous answer was based entirely on correcting the problems in your current code. But, there is a better way to do this, which is using regular expressions.
import re
old_string = re.sub(r'(#\w+)\b', r'<\1>', old_string)
For more information, see the documentation on the re module.
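A short sketch of the regex approach applied to text with punctuation next to the names. The function name wrap_mentions and the sample sentence are hypothetical; the pattern assumes a mention is a # followed by word characters:

```python
import re


def wrap_mentions(text):
    """Hypothetical helper: wrap every #word in angle brackets."""
    # \w+ is greedy, so it consumes the whole name and stops at
    # punctuation or whitespace. \g<0> inserts the entire match.
    return re.sub(r'#\w+', r'<\g<0>>', text)


print(wrap_mentions("Hi #me, #mentee and #you."))
# Hi <#me>, <#mentee> and <#you>.
```

Because the match always covers the full name, #me can never be replaced inside #mentee.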
Because #me occurs first in your array, your code replaces the #me in #mentee.
Simplest way to fix that is to add a space after the username that you want to be replaced:
old_string = old_string.replace(mention + ' ', '<' + user + '> ')
# Note the added space after mention and the added space after '>'
A new problem occurs, though. The last word is not wrapped, because there's no space after it. A very simple way to fix it would be:
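old_string = old_string + ' '   # add a trailing space so the last mention matches
# ... your replacement loop goes here ...
old_string = old_string[:-1]    # remove the trailing space again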
This should work, as long as there isn't any punctuation (like commas) next to the usernames.
def wrapUsers(content):
    L = content.split()
    newL = []
    for word in L:
        if word.startswith('#'): word = '<'+word+'>'
        newL.append(word)
    return " ".join(newL)
I have a line that I want to split into three parts:
line4 = 'http://www.example.org/lexicon#'+synset_offset+' http://www.monnetproject.eu/lemon#gloss '+gloss+''
The variable gloss contains full sentences, which I don't want to be split. How do I stop this from happening?
The final 3 split parts should be:
'http://www.example.org/lexicon#'+synset_offset+'
http://www.monnetproject.eu/lemon#gloss
'+gloss+''
after running triple = line4.split()
I'm struggling to understand, but why not just create a list to start with:
line4 = [
    'http://www.example.org/lexicon#' + synset_offset,
    'http://www.monnetproject.eu/lemon#gloss',
    gloss
]
Simplified example - instead of joining them all together, then splitting them out again, just join them properly in the first place:
a = 'hello'
b = 'world'
c = 'i have spaces in me'
d = ' '.join((a,b,c)) # <- correct way
# hello world i have spaces in me
print(' '.join(d.split(' ', 2))) # take joined, split out again making sure not to split `c`, then join back again!?
If they all begin with "http", you could split them using http as the delimiter; otherwise you could do it in two steps:
First extract the first URL from the string by using the space or http as the delimiter, then split the rest:
firstSplit = line4.split(' ', 1)         # split once, at the first space
firstString = firstSplit.pop(0)          # pop the first URL
secondSplit = ' '.join(firstSplit)       # join the rest back into one string
secondSplit.split(' ', 1)                # split the remaining two parts
>>> synset_offset = "foobar"
>>> gloss = "This is a full sentence."
>>> line4 = 'http://www.example.org/lexicon#'+synset_offset+' http://www.monnetproject.eu/lemon#gloss '+gloss
>>> line4.split(None, 2)  # split on whitespace at most twice, so gloss stays whole
['http://www.example.org/lexicon#foobar', 'http://www.monnetproject.eu/lemon#gloss', 'This is a full sentence.']
Not sure what you're trying to do here. If in general you're looking to avoid splitting a keyword, you should do:
>>> line[:line.index(keyword)].split() + [line[line.index(keyword):line.index(keyword)+len(keyword)]] + line[line.index(keyword)+len(keyword):].split()
If the gloss (or whatever the keyword part is) comes at the end of the string, the final slice is just an empty string ''; splitting it gives an empty list, so nothing extra is appended.
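The same idea can be sketched with str.partition, which returns the text before the keyword, the keyword itself, and the text after it. The helper name split_around and the sample line are hypothetical:

```python
def split_around(line, keyword):
    """Hypothetical helper: split on whitespace but keep `keyword` intact."""
    before, found, after = line.partition(keyword)
    if not found:
        # Keyword absent: fall back to an ordinary whitespace split.
        return line.split()
    return before.split() + [found] + after.split()


line = 'a b keep this together c d'
print(split_around(line, 'keep this together'))
# ['a', 'b', 'keep this together', 'c', 'd']
```

Unlike line.index(keyword), partition does not raise when the keyword is missing.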
I am trying to figure out how to remove the first character of each word in a string.
My program reads in a string.
Suppose the input is:
this is demo
My intention is to remove the first character of each word of the string, that is
t, i and d, leaving his s emo.
I have tried:
Using a for loop to traverse the string.
Checking for spaces in the string using the isspace() function.
Storing the index of the letter that comes after the space, i = char + 1, where char is the index of the space.
Then trying to remove the empty space using str_replaced = str[i:].
But it removed the entire string except the last one.
List comprehensions are your friend. This is the most basic version, in just one line:
s = "this is demo"
print(" ".join([x[1:] for x in s.split(" ")]))
output:
his s emo
In case the input string can have not only spaces, but also newlines or tabs, I'd use regex.
In [1]: inp = '''Suppose we have a
   ...: multiline input...'''
In [2]: import re
In [3]: print(re.sub(r'(?<=\b)\w', '', inp))
uppose e ave
ultiline nput...
You can simply use a Python list comprehension:
s = 'this is demo'
mstr = ' '.join([w[1:] for w in s.split(' ')])
The mstr variable will then contain the value 'his s emo'.
This method is a bit long, but easy to understand. The loop starts at index 1, which skips the first letter of the string. The flag variable records whether the previous character was a space. If it was, the current letter is skipped.
s = "alpha beta charlie"
t = ""
flag = 0
for x in range(1,len(s)):
    if(flag==0):
        t+=s[x]
    else:
        flag = 0
    if(s[x]==" "):
        flag = 1
print(t)
output
lpha eta harlie
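The same flag idea can be sketched as a reusable function that treats any whitespace (spaces, tabs, newlines) as a word separator and keeps that whitespace unchanged. The function name drop_first_letters is hypothetical:

```python
def drop_first_letters(text):
    """Hypothetical helper: remove the first character of every word."""
    result = []
    at_word_start = True  # the first character of the text starts a word
    for ch in text:
        if ch.isspace():
            result.append(ch)      # keep whitespace unchanged
            at_word_start = True   # the next non-space character starts a word
        elif at_word_start:
            at_word_start = False  # skip the first character of the word
        else:
            result.append(ch)
    return "".join(result)


print(drop_first_letters("alpha beta charlie"))  # lpha eta harlie
print(drop_first_letters("this is demo"))        # his s emo
```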