I'm trying to find a way to get the $30K out of the following string. I always want the first occurrence of a $ (dollar sign), but with the full value brought through.
text = 'My 82 Benchmark $30K 1000m S7 $23'
text_string = (text.split())
text_string
Output = ['My', '82', 'Benchmark', '$30K', '1000m', 'S7', '$23']
I have tried this code
for i in text_string:
    if('$' in i) :
        print ("Element Exists")
This seems to know that a $ exists, but not which element it exists in.
You can easily get what you want by using the re module:
import re
To get the first occurrence you can do:
m = re.search('([$][0-9]+K?)', text)
print(m.group(0))
And if you want all occurrences you can do:
re.findall('([$][0-9]+K?)', text)
This gives a list with all matches. Hope it helps.
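One caveat with the re.search approach: when the text contains no $ at all, re.search returns None and m.group(0) raises an AttributeError. A minimal sketch that guards against that, assuming the same pattern is good enough for your values (first_dollar_value and DOLLAR_PATTERN are hypothetical names):

```python
import re

# Same pattern as above, written as a raw string; \$ matches a literal dollar sign
DOLLAR_PATTERN = r"\$[0-9]+K?"

def first_dollar_value(text):
    # Hypothetical helper: first match such as "$30K", or None if nothing matches
    m = re.search(DOLLAR_PATTERN, text)
    return m.group(0) if m else None

text = 'My 82 Benchmark $30K 1000m S7 $23'
print(first_dollar_value(text))               # $30K
print(re.findall(DOLLAR_PATTERN, text))       # ['$30K', '$23']
print(first_dollar_value('no dollars here'))  # None
```

Values with decimals or other suffixes (for example $1.5M) would need a wider pattern.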
You could split on the "$" before splitting on spaces:
"$"+text.split("$",1)[1].split()[0]
A list comprehension is a little overkill here; use a generator expression:
next((x for x in text.split() if '$' in x), '')
In code:
text = 'My 82 Benchmark $30K 1000m S7 $23'
print(next((x for x in text.split() if '$' in x), ''))
# $30K
You have done everything correctly, you just need to print the variable.
If you just want the first value, then you can use break to escape the loop.
This stops the loop once it reaches the desired value.
Here is the fixed code; it prints $30K.
text = 'My 82 Benchmark $30K 1000m S7 $23'
text_string = text.split()
print(text_string)
for i in text_string:
    if '$' in i:
        print(i)
        break
I also removed the extra parentheses in your code, and the space between print and the opening (.
Update
I will also try to explain what Tom Karzes said in a short answer here.
Tom Karzes suggested using a list comprehension. With a list comprehension, the code is:
strings = [s for s in text_string if '$' in s]
if len(strings) > 0:
    print(strings[0])
A list comprehension is a fast way of making lists. Here you go through all the strings in the text_string list and collect only the ones that have a $ character in them (see if '$' in s).
This way you get a list of strings that all contain a $ character. You can then check that its length is more than 0 using len, and print the first item in the list.
If you just need the first element that has a $ symbol in it, then you can simply break out of the loop.
text = 'My 82 Benchmark $30K 1000m S7 $23'
text_string = text.split()
print(text_string)
for i in text_string:
    if '$' in i:
        print(i)
        break
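If no element contains a $, the break loop above simply prints nothing. Python's for...else clause runs its else branch only when the loop finishes without hitting break, which gives a natural place to handle that case. A small sketch (the message text is just an example):

```python
text = 'My 82 Benchmark $30K 1000m S7 $23'

result = None
for word in text.split():
    if '$' in word:
        result = word
        break
else:
    # Only reached when the loop ran to completion without a break,
    # i.e. no element contained a '$'
    print('No element contains $')

print(result)  # $30K
```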
Related
my problem is:
I have a list which, after numerous cleanup steps, features elements that look like '455XYZ455'. I'm trying to remove everything from the X onwards, but the elements are inside a list. The code that produces this list is the following:
check = [re.sub(r'\W', '', i) for i in content]
# print(check)
check2 = [re.sub('[aclassnewpagehref]', '', i) for i in check]
# print(check2)
check3 = [re.sub('[/<=""]', '', i) for i in check2]
# print(check3)
check4 = [item for item in check3 if item != '']
print(check4)
As expected, it gives me a lot of elements like '455XYZ455', as above. I just want the '455', but these are inside a list.
Being a complete beginner in Python, I am entirely stuck.
Thank you for reading and perhaps helping me!
You can capture the first digits in group 1 that you want to keep and remove the rest starting from X.
\A(\d+)X.*\Z
Explanation
\A Start of string
(\d+)X Capture 1+ digits in group 1, then match X
.*\Z Match any char 0+ times and assert the end of the string
For example
import re
final = ["455XYZ455", "455XYZ455"]
for item in final:
    print(re.sub(r"\A(\d+)X.*\Z", r"\1", item))
Output
455
455
You can do this just by using split, without regex. Suppose the string '455XYZ455' is in variable a.
s = a.split('X')[0]
Here a is split on 'X', which returns the list of parts before and after each 'X'. As you need just the part before the first 'X', I've assigned the first element of the list to s. Note that split is case-sensitive, so the separator must be the uppercase 'X'.
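Since the question is about a whole list (check4), the same split can be applied to every element with a list comprehension. A sketch with made-up sample values; split('X', 1) stops after the first X, and [0] keeps what comes before it:

```python
# Made-up sample values standing in for check4
items = ['455XYZ455', '123XYZ456', '789']

# Keep the part before the first uppercase 'X'; items without an 'X' pass through unchanged
trimmed = [item.split('X', 1)[0] for item in items]
print(trimmed)  # ['455', '123', '789']
```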
This is what I came up with before getting stuck (NB: source of the text: The Economist):
import random
import re
text = 'One calculation by a film consultant implies that half of Hollywood productions with budgets over one hundred million dollars lose money.'
nbofwords = len(text.split())
words = text.split()
randomword = random.choice(words)
randomwordstr = str(randomword)
Step 1 works: delete the random word from the original text
replaced1 = re.sub(randomwordstr, '', text)
replaced2 = re.sub('  ', ' ', replaced1)
Step 2 works: select a defined number of random words
nbofsamples = 3
randomitems = random.choices(population=words, k=nbofsamples)
This gives, e.g., ['over', 'consultant', 'One']
Step 3 works: delete one element of that list of random words from the original text, using its index
replaced3 = re.sub(randomitems[1], '', text)
replaced4 = re.sub('  ', ' ', replaced3)
This deletes the word 'consultant'.
Step 4 fails: delete all the elements of that list of random words from the original text, using their indices
The best I can figure out is:
replaced5 = re.sub(randomitems[0], '', text)
replaced6 = re.sub(randomitems[1], '', replaced5)
replaced7 = re.sub(randomitems[2], '', replaced6)
replaced8 = re.sub('  ', ' ', replaced7)
print(replaced8)
It works (all 3 words have been deleted), but it is clumsy and inefficient (I would have to rewrite it if I changed the nbofsamples variable).
How can I iterate over my list of random words (step 2) to delete those words from the original text?
Thanks in advance
To delete the words in a list from a string, just use a for loop. It iterates through each item in the list, assigning the item's value to whatever variable you choose (here I used i, but it can be any valid variable name), and executes the loop body until there are no more items in the list. Here's the bare-bones version of a for loop:
items = []
for i in items:
    print(i)
In your case you want to remove the words in the list from a string, so just plug the variable i into the same method you've been using to remove the words. You also need a variable that is updated on every iteration; otherwise each pass would start again from the original string and only the last word in the list would end up removed. After the loop you can print the result. This code works for a list of any length:
r = text
for i in randomitems:
    r = re.sub(i, '', r)
print(r)
Note that as long as you do not use any regular expressions but replace just simple strings by others (or nothing), you don't need re:
for r in randomitems:
    text = text.replace(r, '')
print(text)
To replace only the first occurrence, you can pass the maximum number of replacements as the third argument of replace:
text = text.replace(r, '', 1)
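Both approaches above delete substrings, not whole words: if random.choices happens to pick the word 'a', every letter a in the sentence disappears as well, and random.choices can also pick the same word twice, so fewer words may disappear than requested. Since the question mentions deleting words by their index, here is a sketch that swaps in a different technique: sample distinct positions with random.sample and rebuild the sentence from the remaining words (remove_random_words is a hypothetical name):

```python
import random

text = ('One calculation by a film consultant implies that half of Hollywood '
        'productions with budgets over one hundred million dollars lose money.')

def remove_random_words(text, k, rng=random):
    # Hypothetical helper: drop k randomly chosen words, by position
    words = text.split()
    # random.sample picks k *distinct* indices (no repeats, unlike random.choices)
    drop = set(rng.sample(range(len(words)), k))
    # Keep every word whose index was not picked; whole words only,
    # so picking 'a' never touches the letter a inside other words
    return ' '.join(w for i, w in enumerate(words) if i not in drop)

nbofsamples = 3
print(remove_random_words(text, nbofsamples))
```

Because the result is rebuilt with ' '.join, no separate double-space cleanup step is needed.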
Not sure how to do this. I have a string whose first part I need gone. When print(result.text) runs, it prints "#PERSONSTWITTER their message", and I need to remove the first part, "#PERSONSTWITTER".
At first I had it remove everything from the #, but I ran into a problem: the person's username could be any number of letters (#PERSONSTWITTER2, #PERSONSTWITTER12, etc.), so they don't all have the same number of characters. Now I'm not sure what to do. Any help would be great!
So all I need is to isolate "their message" and not the username.
for s in twt:
    sn = s.user.screen_name
    m = "#%s MESSAGE" % (sn)
    s = api.update_status(m, s.id)
    #time.sleep(5)
for result in twt:
    print(result.text)
You can use regular expressions:
import re
s = "#PERSONSTWITTER their message"
new_s = re.sub(r'^\S+', '', s)[1:]
Output:
'their message'
You may filter out the words starting with # using str.startswith:
>>> s = "#PERSONSTWITTER their message. #ANOTHERWRITER their another message."
>>> ' '.join(word for word in s.split() if not word.startswith('#'))
'their message. their another message.'
Here I'm first splitting your sentence into words, keeping only the words not starting with #, and then joining them back together.
Use the .split() method to convert the string into a list of strings formed from splitting the original at (by default) spaces.
Then use the .join method to join all the elements from index 1 onwards in the list, separated again by a space.
s = "#PERSONSTWITTER their message"
' '.join(s.split()[1:])
# --> 'their message'
An alternative approach would be to find the index of the first space and slice from there onwards:
s = "#PERSONSTWITTER their message"
s[s.index(' ')+1:]
# --> 'their message'
Note that we add 1 to the index so that the slice starts after the space instead of including it.
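s.index(' ') raises a ValueError if the string has no space at all (for example a tweet that is only a username). A minimal sketch using str.partition, which splits at the first separator only and returns empty strings instead of raising; strip_first_word is a hypothetical name:

```python
def strip_first_word(s):
    # Hypothetical helper: partition returns (before, sep, after) for the first ' ';
    # when there is no space, after is '' instead of raising like s.index(' ')
    _, _, rest = s.partition(' ')
    return rest

print(strip_first_word("#PERSONSTWITTER their message"))  # their message
print(repr(strip_first_word("#PERSONSTWITTER12")))        # ''
```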
s = "#PERSONSTWITTER their message"
split_s = s.split(' ')
message = ' '.join( split_s[1:] )
I am trying to figure out how to remove the first character of each word in a string.
My program reads in a string.
Suppose the input is :
this is demo
My intention is to remove the first character of each word of the string, that is, t, i, and d, leaving his s emo.
I have tried:
Using a for loop and traversing the string.
Checking for spaces in the string using the isspace() function.
Storing the index of the letter encountered after the space, i = char + 1, where char is the index of the space.
Then, trying to remove the empty space using str_replaced = str[i:].
But it removed the entire string except the last word.
List comprehensions are your friend. This is the most basic version, in just one line:
s = "this is demo"
print(" ".join([x[1:] for x in s.split(" ")]))
output:
his s emo
In case the input string can have not only spaces, but also newlines or tabs, I'd use regex.
In [1]: inp = '''Suppose we have a
...: multiline input...'''
In [2]: import re
In [3]: print(re.sub(r'(?<=\b)\w', '', inp))
uppose e ave
ultiline nput...
You can simply use a list comprehension:
s = 'this is demo'
mstr = ' '.join([w[1:] for w in s.split(' ')])
Then the mstr variable will contain 'his s emo'.
This method is a bit long, but easy to understand. The flag variable records whether the previous character was a space. If it was, the current letter must be removed. The loop starts at index 1, so the first letter of the string is skipped as well.
s = "alpha beta charlie"
t = ""
flag = 0
for x in range(1, len(s)):
    if flag == 0:
        t += s[x]
    else:
        flag = 0
    if s[x] == " ":
        flag = 1
print(t)
output
lpha eta harlie
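The same idea as the flag variable can be written by pairing each character with the one before it, which removes the need for explicit state. This sketch also treats tabs and newlines as word separators because it uses str.isspace(); drop_first_letters is a hypothetical name, and runs of several spaces are kept as-is rather than handled exactly like the flag loop:

```python
def drop_first_letters(s):
    # Keep a character unless it starts a word: it is not whitespace and
    # the character before it is whitespace (or it is the first character).
    # zip(' ' + s, s) pairs each character with its predecessor.
    return ''.join(
        c for prev, c in zip(' ' + s, s)
        if c.isspace() or not prev.isspace()
    )

print(drop_first_letters("alpha beta charlie"))  # lpha eta harlie
print(drop_first_letters("this is demo"))        # his s emo
```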
I'm looking for a clean way to get a set (list, array, whatever) of words starting with # inside a given string.
In C#, I would write
var hashtags = input
.Split (' ')
.Where (s => s[0] == '#')
.Select (s => s.Substring (1))
.Distinct ();
What is comparatively elegant code to do this in Python?
EDIT
Sample input: "Hey guys! #stackoverflow really #rocks #rocks #announcement"
Expected output: ["stackoverflow", "rocks", "announcement"]
With @inspectorG4dget's answer, if you want no duplicates, you can use a set comprehension instead of a list comprehension.
>>> tags="Hey guys! #stackoverflow really #rocks #rocks #announcement"
>>> {tag.strip("#") for tag in tags.split() if tag.startswith("#")}
set(['announcement', 'rocks', 'stackoverflow'])
Note that { } syntax for set comprehensions only works starting with Python 2.7.
If you're working with older versions, feed the list comprehension ([ ]) output to the set function, as suggested by @Bertrand.
[i[1:] for i in line.split() if i.startswith("#")]
This version is safe with empty strings (as I have read such concerns in the comments), since startswith does not fail on them the way word[0] does. Note that a lone "#" still produces an empty tag. Also, as in Bertrand Marron's code, it's better to turn this into a set as follows (to avoid duplicates and for O(1) average lookup time):
set([i[1:] for i in line.split() if i.startswith("#")])
The findall method of regular expression objects can get them all at once:
>>> import re
>>> s = "this #is a #string with several #hashtags"
>>> pat = re.compile(r"#(\w+)")
>>> pat.findall(s)
['is', 'string', 'hashtags']
>>>
I'd say
hashtags = [word[1:] for word in line.split() if word[0] == '#']
Edit: this will create a set without any duplicates.
set(hashtags)
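A set removes duplicates but does not keep the order of the expected output ["stackoverflow", "rocks", "announcement"]. If order matters, dict.fromkeys is a common idiom: since Python 3.7, dicts preserve insertion order, so converting the keys back to a list keeps the first occurrences in order. A sketch using the question's sample input:

```python
text = "Hey guys! #stackoverflow really #rocks #rocks #announcement"

# dict keys keep insertion order (guaranteed since Python 3.7),
# so dict.fromkeys drops duplicates without losing the original order
hashtags = list(dict.fromkeys(
    word[1:] for word in text.split() if word.startswith('#')
))
print(hashtags)  # ['stackoverflow', 'rocks', 'announcement']
```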
There are some problems with the answers presented here:
1. {tag.strip("#") for tag in tags.split() if tag.startswith("#")} and [i[1:] for i in line.split() if i.startswith("#")] won't work if you have a hashtag like '#one#two#'.
2. re.compile(r"#(\w+)") won't work for many Unicode languages (even using re.UNICODE).
I have seen more ways to extract hashtags, but found none of them handling all cases, so I wrote some small Python code to handle most of them. It works for me.
def get_hashtagslist(string):
    ret = []
    s = ''
    hashtag = False
    for char in string:
        if char == '#':
            hashtag = True
            if s:
                ret.append(s)
                s = ''
            continue
        # take only the prefix of the hashtag in case it contains one of these chars (e.g. on '#happy,but i..' it takes only 'happy')
        if hashtag and char in [' ', '.', ',', '(', ')', ':', '{', '}'] and s:
            ret.append(s)
            s = ''
            hashtag = False
        if hashtag:
            s += char
    if s:
        ret.append(s)
    return set(ret)
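For comparison, a regex sketch that approximates get_hashtagslist: it takes # followed by any run of characters that are not whitespace, #, or one of the same delimiters the function checks. A negated character class accepts any character not listed (including the combining marks used in many scripts), which sidesteps the \w limitation mentioned above. get_hashtags_regex and HASHTAG_RE are hypothetical names, and this version treats all whitespace as a delimiter, not just ' ', so results can differ slightly from the function:

```python
import re

# '#' followed by 1+ chars that are not whitespace, '#', or . , ( ) : { }
HASHTAG_RE = re.compile(r'#([^\s#.,():{}]+)')

def get_hashtags_regex(string):
    # Hypothetical helper: findall returns the captured group (text after '#')
    return set(HASHTAG_RE.findall(string))

print(sorted(get_hashtags_regex('#one#two#')))       # ['one', 'two']
print(sorted(get_hashtags_regex('#happy,but i..')))  # ['happy']
```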
Another option is a regex:
import re
inputLine = "Hey guys! #stackoverflow really #rocks #rocks #announcement"
re.findall(r'(?i)\#\w+', inputLine)  # includes the #
re.findall(r'(?i)(?<=\#)\w+', inputLine)  # excludes the #