I have a string:
some_string = "I rode my bike 100' North toward the train station"
I want to change the (' North ) part to (' N ), so that that part reads as (...my bike 100' N toward the...) etc.
Right now I'm trying:
some_string = some_string.replace("' North ", "' N ")
But it just stays the same.
I don't want to use anything tricky like .replace('orth', '') because I want it to work with longer sentences that might include instances of 'North' but no apostrophe nearby.
Why isn't my first method working?
Please help!
EDIT:
So I am getting that first string by searching within another string.
Python, for some reason, returns it so that the apostrophe is a different kind of apostrophe!? Presumably to distinguish it from the single quotes that are not escaped.
some_string = '’'
^ It looks like that (copied and pasted it). Where does that come from? How would I type it out using my keyboard? Wtf!
EDIT 2:
I am getting the first string from Adobe PDF. I think it is formatted as a "fancy quote" that you get by holding down Alt and typing 0146 on number pad!!!
In Python (and in most high-level programming languages), strings are immutable: you cannot change one in place. You can, however, produce a new string.
So, to achieve your goal, here is my suggestion:
some_string = "I rode my bike 100' North toward the train station"
some_string = some_string.replace("' North ", "' N ") # assign the new string to the old string
print(some_string)
Output: "I rode my bike 100' N toward the train station"
This will work if you assign a new variable.
some_string = "I rode my bike 100' North toward the train station"
new_string = some_string.replace("' North ", "' N ")
print(new_string)
>> I rode my bike 100' N toward the train station
Strings are immutable, meaning they can't be changed; likewise, this method returns a new string, it does not edit in place. You simply need to assign the code you already have to a variable. You can even assign the output of your code above to the same variable, such that the variable name now points to a different string (i.e. the one you want.)
Try
some_string=some_string.replace("' North ", "' N ")
instead.
Note that the documentation, https://docs.python.org/3/library/stdtypes.html, says:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
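A small illustration of the quoted behaviour, in case it helps (the sample sentence is made up for this example): replace returns a new string, leaves the original untouched, and the optional count limits how many occurrences are replaced.

```python
# Made-up sample text, used only to show the behaviour described in the docs.
text = "North by North West"

all_replaced = text.replace("North", "N")     # every occurrence is replaced
first_only = text.replace("North", "N", 1)    # only the first occurrence is replaced

print(all_replaced)  # N by N West
print(first_only)    # N by North West
print(text)          # North by North West (the original string is unchanged)
```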
I call bs on that edit, it works properly, see https://ideone.com/8Mossq
some_string = "I rode my bike 100' North toward the train station"
some_string = some_string.replace("' North ", "' N ")
print(some_string)
Result:
I rode my bike 100' N toward the train station
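Following up on EDIT 2 in the question: if the text really contains the "fancy" apostrophe (U+2019, right single quotation mark) instead of the plain ASCII one, the search string never matches, which would explain why nothing changes. A minimal sketch, assuming that is the only unusual character involved; normalize_apostrophes is a hypothetical helper name, not something from the question:

```python
# U+2019 is the "right single quotation mark" that often shows up in text copied from PDFs.
CURLY_APOSTROPHE = "\u2019"

def normalize_apostrophes(text):
    """Return a copy of text with curly apostrophes replaced by plain ASCII ones."""
    return text.replace(CURLY_APOSTROPHE, "'")

# The sentence from the question, but with the curly apostrophe as it came out of the PDF.
some_string = "I rode my bike 100\u2019 North toward the train station"

unchanged = some_string.replace("' North ", "' N ")  # no match, so nothing is replaced
fixed = normalize_apostrophes(some_string).replace("' North ", "' N ")

print(unchanged == some_string)  # True
print(fixed)                     # I rode my bike 100' N toward the train station
```

Alternatively, you could keep the curly apostrophe and search for "\u2019 North " directly.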
I have this homework to do (no libraries allowed) and I've underestimated this problem:
let's say we have a list of strings: str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]
What we know for sure is that every string in str_list is contained in master_string, that they appear in order, and that they have no punctuation (all thanks to earlier checks I've made).
Then we have the string: master_string = "'Come, my head's free at last!' said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
What I must do is find the sequences of at least k strings (in this case k = 2) from str_list that are contained in master_string. However, I underestimated the fact that each string in str_list can contain more than one word, so master_string.split() won't get me anywhere: it would mean asking something like if "my head's" == "my", which is of course false.
I was thinking about concatenating the strings one at a time and searching in master_string.strip(".,:;!?"), but if I find matching sequences I absolutely need to take them directly from master_string, because I need the punctuation in the result variable. That basically means taking slices directly from master_string, but how is that possible? Is it even possible, or do I have to change my approach? This is driving me totally crazy, especially because no libraries are allowed.
In case you're wondering, the expected result here would be:
["my head's free at last!", "into alarm in another moment,"] (because both satisfy the condition of at least k strings from str_list), and "neck" would be saved in a discard_list since it doesn't satisfy that condition (it mustn't be discarded with .pop() because I need to do other things with the discarded strings).
Follows my solution:
Try to extend each string in str_list, based on the master_string and a finite set of punctuation characters (e.g. my head’s -> my head’s free at last!; free -> free at last!).
Keep only the substrings that have been extended at least k times.
Remove redundant substrings (e.g. free at last! is already present with my head’s free at last!).
This is the code:
str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]
master_string = "‘Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
punctuation_characters = ".,:;!?" # list of punctuation characters
k = 1
def extend_string(current_str, successors_num = 0) :
    # check if the next token is a punctuation mark
    for punctuation_mark in punctuation_characters :
        if current_str + punctuation_mark in master_string :
            return extend_string(current_str + punctuation_mark, successors_num)
    # check if the next token is a proper successor
    for successor in str_list :
        if current_str + " " + successor in master_string :
            return extend_string(current_str + " " + successor, successors_num+1)
    # cannot extend the string anymore
    return current_str, successors_num
extended_strings = []
for s in str_list :
    extended_string, successors_num = extend_string(s)
    if successors_num >= k : extended_strings.append(extended_string)
extended_strings.sort(key=len) # sorting by ascending length
result_list = []
for es in extended_strings :
    result_list = list(filter(lambda s2 : s2 not in es, result_list))
    result_list.append(es)
print(result_list) # result: ['my head’s free at last!', 'into alarm in another moment,']
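The question also asks for a discard_list holding the strings that did not make it into any result. One possible way to build it, sketched here on top of the result printed above (this step is an addition, not part of the code above): keep every entry of str_list that is not contained in any of the final results.

```python
# Inputs copied from the code above; result_list is the list it prints.
str_list = ["my head’s", "free", "at last", "into alarm", "in another moment", "neck"]
result_list = ["my head’s free at last!", "into alarm in another moment,"]

# A string is discarded when none of the final results contains it.
# Nothing is popped from str_list, so the original list stays available.
discard_list = [s for s in str_list if not any(s in r for r in result_list)]

print(discard_list)  # ['neck']
```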
I've got two different versions: number 1 gives you neck :(, but number 2 doesn't give you as much. Here's number 1:
master_string = "Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]
new_str = ''
for word in str_list:
    if word in master_string:
        new_str += word + ' '
print(new_str)
and here's number 2:
master_string = "Come, my head’s free at last!’ said Alice in a tone of delight, which changed into alarm in another moment, when she found that her shoulders were nowhere to be found: all she could see, when she looked down, was an immense length of neck, which seemed to rise like a stalk out of a sea of green leaves that lay far below her."
str_list = ["my head's", "free", "at last", "into alarm", "in another moment", "neck"]
new_str = ''
for word in str_list:
    if word in master_string:
        new_word = word.split(' ')
        if len(new_word) == 2:
            new_str += word + ' '
print(new_str)
Say I have a sentence such as:
The bird flies at night and has a very large wing span.
My goal is to split the string so that the result comes out to be:
and has a very large wing
I've tried using split(), however, my efforts have not been successful. How can I split the string into pieces, and delete the beginning part of the string and the end part?
import re
text = 'The bird flies at night and has a very large wing span.'
l = re.split(r'.+?(?=and)|(?<=wing).+?', text)[1]
out:
and has a very large wing
I guess this is the best way to do what you want:
s = "The bird flies at night and has a very large wing span."
and_position = s.find("and") # return the first index of "and" in the string
wing_position = s.find("wing") # similar to the above
result = s[and_position:wing_position+4] # this is called python's slice
If you're not familiar with Python's slice notation, it's worth reading up on.
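The same idea can be wrapped in a small function so the +4 is not hard-coded. This is only a sketch: between is a hypothetical helper name, and it assumes you want an empty string when either marker is missing.

```python
def between(text, start_marker, end_marker):
    """Return the slice of text from start_marker through end_marker, inclusive."""
    start = text.find(start_marker)
    if start == -1:
        return ""  # assumption: an empty string signals "start marker not found"
    end = text.find(end_marker, start)  # only look for the end marker after the start
    if end == -1:
        return ""
    return text[start:end + len(end_marker)]

s = "The bird flies at night and has a very large wing span."
result = between(s, "and", "wing")
print(result)  # and has a very large wing
```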
The following gives a syntax error:
my eyes = 'Brown' my_hair = 'Brown'
print "Hes got %s and %s hair" % (my_eyes, my_hair)
The only way this seems to work is if I put Brown, Brown in the last parenthesis.
You're incorrectly assigning, you should try to unpack the tuple of strings into two variables. In addition, Python variables can not contain spaces so you'll want to use an underscore for eyes.
my_eyes, my_hair = 'Brown', 'Brown' # unpacking tuple here
Also, I suggest you use the format method, which is the newer style. The % style is older, although it still works.
print "He's got {0} and {1} hair".format(my_eyes, my_hair)
The problem turned out to be that the period at the end of the print statement was outside of the parenthesis. This now works: % (eyes, hair). The format version also works now.
Here are your variables:
name = "some name"
Age = 57
Height = 64
Weight = 135
Eyes = "brown"
Teeth = "white"
Hair = "brown"
To print a string with variables, use str.format.
print "Let's talk about {}".format(name)
print "She's {} inches tall".format(Height)
... So on
Make sure that your variables contain no spaces. They're case sensitive too :)
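For comparison, here are the two formatting styles from this thread side by side. The sketch uses Python 3 syntax (print as a function), which is an assumption, since the snippets above use the Python 2 print statement; the sentence wording is slightly adapted.

```python
# Unpack a tuple of strings into two variables (no spaces in the names).
my_eyes, my_hair = "Brown", "Brown"

# Older %-style formatting: the values go in a tuple after the % operator.
percent_style = "He's got %s eyes and %s hair" % (my_eyes, my_hair)

# str.format: the values are passed as arguments and referenced by position.
format_style = "He's got {0} eyes and {1} hair".format(my_eyes, my_hair)

print(percent_style)  # He's got Brown eyes and Brown hair
print(format_style)   # He's got Brown eyes and Brown hair
```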
I'm trying to execute a bunch of code only if the string I'm searching contains a comma.
Here's an example set of rows that I would need to parse (name is a column header for this tab-delimited file, and the column (annoyingly) contains the name, degree, and area of practice):
name
Sam da Man J.D.,CEP
Green Eggs Jr. Ed.M.,CEP
Argle Bargle Sr. MA
Cersei Lannister M.A. Ph.D.
My issue is that some of the rows contain a comma, which is followed by an acronym which represents an "area of practice" for the professional and some do not.
My code relies on the principle that each line contains a comma, and I will now have to modify the code in order to account for lines where there is no comma.
def parse_ieca_gc(s):
    ########################## HANDLE NAME ELEMENT ###############################
    degrees = ['M.A.T.','Ph.D.','MA','J.D.','Ed.M.', 'M.A.', 'M.B.A.', 'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'B.S.Ed.', 'M.D.']
    degrees_list = []
    # separate area of practice from name and degree and bind this to var 'area'
    split_area_nmdeg = s['name'].split(',')
    area = split_area_nmdeg.pop() # when there is no area of practice and hence no comma, this pops out the name + deg and leaves an empty list, that's why 'print split_area_nmdeg' returns nothing and 'area' returns the name and deg when there's no comma
    print 'split area nmdeg'
    print area
    print split_area_nmdeg
    # Split the name and deg by spaces. If there's a deg, it will match with one of elements and will be stored deg list. The deg is removed name_deg list and all that's left is the name.
    split_name_deg = re.split('\s',split_area_nmdeg[0])
    for word in split_name_deg:
        for deg in degrees:
            if deg == word:
                degrees_list.append(split_name_deg.pop())
    name = ' '.join(split_name_deg)
    # area of practice
    category = area
re.search() and re.match() both do not work, it appears, because they return instances and not a boolean, so what should I use to tell if there's a comma?
The easiest way in python to see if a string contains a character is to use in. For example:
if ',' in s['name']:
if re.match(...) is not None :
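Use that instead of looking for a boolean. re.match returns a match object on success and None on failure. Note that re.match only matches at the start of the string, so use re.search if the comma can appear anywhere.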
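To make the difference between these options concrete, here is a short sketch using one of the sample rows from the question (Python 3 syntax is an assumption here; the question's code is Python 2):

```python
import re

name = "Sam da Man J.D.,CEP"  # sample row from the question

# The in operator gives a plain boolean.
found_with_in = "," in name

# re.search scans the whole string; compare against None to get a boolean.
found_with_search = re.search(",", name) is not None

# re.match only tries at the very start of the string, so it misses this comma.
found_with_match = re.match(",", name) is not None

print(found_with_in, found_with_search, found_with_match)  # True True False
```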
You are already searching for a comma. Just use the results of that search:
split_area_nmdeg = s['name'].split(',')
if len(split_area_nmdeg) > 1: # a comma was found, so the split produced at least two pieces
    print "Your old code goes here"
else:
    print "Your new code goes here"
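Another option, sketched here as an alternative to split and pop, is str.partition, which always returns three pieces and tells you whether the separator was found. split_area is a hypothetical helper name, and returning None for a missing area of practice is an assumption about what you want.

```python
def split_area(raw_name):
    """Split 'name degree,AREA' into (name_and_degree, area); area is None without a comma."""
    name_deg, separator, area = raw_name.partition(",")
    if not separator:
        # No comma in the row, so there is no area of practice.
        return name_deg.strip(), None
    return name_deg.strip(), area.strip()

with_area = split_area("Sam da Man J.D.,CEP")
without_area = split_area("Argle Bargle Sr. MA")

print(with_area)     # ('Sam da Man J.D.', 'CEP')
print(without_area)  # ('Argle Bargle Sr. MA', None)
```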
I'm trying to parse the title tag in an RSS 2.0 feed into three different variables for each entry in that feed. Using ElementTree I've already parsed the RSS so that I can print each title [minus the trailing )] with the code below:
feed = getfeed("http://www.tourfilter.com/dallas/rss/by_concert_date")
for item in feed:
print repr(item.title[0:-1])
I include that because, as you can see, the item.title is a repr() data type, which I don't know much about.
A particular repr(item.title[0:-1]) printed in the interactive window looks like this:
'randy travis (Billy Bobs 3/21'
'Michael Schenker Group (House of Blues Dallas 3/26'
The user selects a band and I hope to, after parsing each item.title into 3 variables (one each for band, venue, and date... or possibly an array or I don't know...) select only those related to the band selected. Then they are sent to Google for geocoding, but that's another story.
I've seen some examples of regex and I'm reading about them, but it seems very complicated. Is it? I thought maybe someone here would have some insight as to exactly how to do this in an intelligent way. Should I use the re module? Does it matter that the output is currently is repr()s? Is there a better way? I was thinking I'd use a loop like (and this is my pseudoPython, just kind of notes I'm writing):
list = bandRaw,venue,date,latLong
for item in feed:
    parse item.title for bandRaw, venue, date
    if bandRaw == str(band)
        send venue name + ", Dallas, TX" to google for geocoding
        return lat,long
        list = list + return character + bandRaw + "," + venue + "," + date + "," + lat + "," + long
    else
In the end, I need to have the chosen entries in a .csv (comma-delimited) file looking like this:
band,venue,date,lat,long
randy travis,Billy Bobs,3/21,1234.5678,1234.5678
Michael Schenker Group,House of Blues Dallas,3/26,4321.8765,4321.8765
I hope this isn't too much to ask. I'll be looking into it on my own, just thought I should post here to make sure it got answered.
So, the question is, how do I best parse each repr(item.title[0:-1]) in the feed into the 3 separate values that I can then concatenate into a .csv file?
Don't let regex scare you off... it's well worth learning.
Given the examples above, you might try putting the trailing parenthesis back in, and then using this pattern:
import re
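s = 'Michael Schenker Group (House of Blues Dallas 3/26)' # title with the trailing parenthesis put back
pat = re.compile(r'([\w\s]+)\(([\w\s]+)(\d+/\d+)\)')
info = pat.match(s)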
print info.groups()
('Michael Schenker Group ', 'House of Blues Dallas ', '3/26')
To get at each group individually, just call them on the info object:
print info.group(1) # or info.groups()[0]
print '"%s","%s","%s"' % tuple(g.strip() for g in info.groups()) # strip the trailing spaces captured by [\w\s]+
"Michael Schenker Group","House of Blues Dallas","3/26"
The hard thing about regex in this case is making sure you know all the known possible characters in the title. If there are non-alpha chars in the 'Michael Schenker Group' part, you'll have to adjust the regex for that part to allow them.
The pattern above breaks down as follows, which is parsed left to right:
([\w\s]+) : Match any word or space characters (the plus symbol indicates that there should be one or more such characters). The parentheses mean that the match will be captured as a group. This is the "Michael Schenker Group " part. If there can be numbers and dashes here, you'll want to modify the pieces between the square brackets, which are the possible characters for the set.
\( : A literal parenthesis. The backslash escapes the parenthesis, since otherwise it counts as a regex command. This is the "(" part of the string.
([\w\s]+) : Same as the one above, but this time matches the "House of Blues Dallas " part. In parentheses so they will be captured as the second group.
(\d+/\d+) : Matches the digits 3 and 26 with a slash in the middle. In parentheses so they will be captured as the third group.
\) : Closing parenthesis for the above.
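One thing to watch with the pattern broken down above: because \w also matches digits, the greedy [\w\s]+ takes the first digit of a two-digit month. The sketch below shows that on a made-up date and compares it with a looser variant. The loose pattern is a hypothetical alternative, not part of the answer, and the 12/26 date is invented for the demonstration (Python 3 syntax assumed).

```python
import re

# The pattern from the breakdown above, written as a raw string.
strict = re.compile(r'([\w\s]+)\(([\w\s]+)(\d+/\d+)\)')

# Hypothetical looser variant: lazy groups allow punctuation in the names,
# whitespace is kept out of the groups, and the closing parenthesis is optional.
loose = re.compile(r'(.+?)\s*\((.+?)\s+(\d+/\d+)\)?$')

title = 'Michael Schenker Group (House of Blues Dallas 12/26)'  # made-up date

strict_groups = strict.match(title).groups()
loose_groups = loose.match(title).groups()

print(strict_groups)  # ('Michael Schenker Group ', 'House of Blues Dallas 1', '2/26')
print(loose_groups)   # ('Michael Schenker Group', 'House of Blues Dallas', '12/26')

# The loose variant also accepts titles whose trailing parenthesis was cut off.
trimmed_groups = loose.match('randy travis (Billy Bobs 3/21').groups()
print(trimmed_groups)  # ('randy travis', 'Billy Bobs', '3/21')
```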
The python intro to regex is quite good, and you might want to spend an evening going over it http://docs.python.org/library/re.html#module-re. Also, check Dive Into Python, which has a friendly introduction: http://diveintopython3.ep.io/regular-expressions.html.
EDIT: See zacherates below, who has some nice edits. Two heads are better than one!
Regular expressions are a great solution to this problem:
>>> import re
>>> s = 'Michael Schenker Group (House of Blues Dallas 3/26'
>>> re.match(r'(.*) \((.*) (\d+/\d+)', s).groups()
('Michael Schenker Group', 'House of Blues Dallas', '3/26')
As a side note, you might want to look at the Universal Feed Parser for handling the RSS parsing as feeds have a bad habit of being malformed.
Edit
In regards to your comment... The strings occasionally being wrapped in "s rather than 's has to do with the fact that you're using repr. The repr of a string is usually delimited with 's, unless that string contains one or more 's, where instead it uses "s so that the 's don't have to be escaped:
>>> "Hello there"
'Hello there'
>>> "it's not its"
"it's not its"
Notice the different quote styles.
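The same behaviour can be checked outside the interactive prompt by calling repr directly. A small sketch in Python 3 syntax (an assumption; the thread itself uses Python 2):

```python
plain = "Hello there"
with_apostrophe = "it's not its"

plain_repr = repr(plain)                      # delimited with single quotes
apostrophe_repr = repr(with_apostrophe)       # delimited with double quotes instead

print(plain_repr)        # 'Hello there'
print(apostrophe_repr)   # "it's not its"

# The quotes belong to the representation only; the string itself has none.
print(plain)             # Hello there
```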
Regarding the repr(item.title[0:-1]) part, not sure where you got that from but I'm pretty sure you can simply use item.title. All you're doing is removing the last char from the string and then calling repr() on it, which does nothing.
Your code should look something like this:
import re
import geocoders # from GeoPy
us = geocoders.GeocoderDotUS()
import feedparser # from www.feedparser.org
feedurl = "http://www.tourfilter.com/dallas/rss/by_concert_date"
feed = feedparser.parse(feedurl)
lines = []
for entry in feed.entries:
    m = re.search(r'(.*) \((.*) (\d+/\d+)\)', entry.title)
    if m:
        bandRaw, venue, date = m.groups()
        if band == bandRaw: # band is the name the user selected
            place, (lat, lng) = us.geocode(venue + ", Dallas, TX")
            lines.append(",".join([band, venue, date, str(lat), str(lng)])) # join needs strings, so convert the coordinates
result = "\n".join(lines)
EDIT: replaced list with lines as the var name. list is a builtin and should not be used as a variable name. Sorry.
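Since the end goal is a .csv file, the standard csv module may be worth considering instead of joining strings by hand, because it takes care of quoting if a venue name ever contains a comma. A minimal sketch in Python 3, writing to an in-memory buffer so it runs on its own; the rows reuse the placeholder values from the question and are not real coordinates.

```python
import csv
import io

# Placeholder rows taken from the example output in the question.
rows = [
    ("randy travis", "Billy Bobs", "3/21", 1234.5678, 1234.5678),
    ("Michael Schenker Group", "House of Blues Dallas", "3/26", 4321.8765, 4321.8765),
]

buffer = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["band", "venue", "date", "lat", "long"])  # header line
writer.writerows(rows)  # numbers are converted to text automatically

csv_text = buffer.getvalue()
print(csv_text)
```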