I was wondering why I should use something like this:
name = "Doe"
surname = "John"
print("He is {0} {1}".format(surname, name))
Instead of:
name = "Doe"
surname = "John"
print("He is " + surname + " " + name)
For starters, try doing this with +:
>>> concatenate_me = (1, 2, 99999, 100, 600, 80)
>>> '{0} {0} {2} {2} {1} {2} {3} {5} {5} {4} {0} {2}'.format(*concatenate_me)
'1 1 99999 99999 2 99999 100 80 80 600 1 99999'
.format() benefits:
It uses placeholders such as {0}, {1}, and {2}. The arguments passed to .format() are substituted into the placeholders by position, which lets you reuse an argument several times, as in the example above.
Each replacement field can carry a format specification after a colon (:). The specification controls properties such as width, alignment, and precision for that substitution, and there's a whole mini-language for it.
Additionally, .format is a method, so you can pass it around as an argument when needed. The feature was introduced as "advanced string formatting" (PEP 3101) because it is much more powerful than simple concatenation.
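As a rough illustration of the points above, here is a minimal sketch showing positional reuse, keyword arguments, and a few format specifications. The variable names are made up for the example; only str.format itself comes from the discussion.

```python
# Reuse positional arguments in any order.
values = (1, 2, 99999)
reused = "{0} {0} {2} {1}".format(*values)

# Keyword arguments make the template self-describing.
keyword = "He is {surname} {name}".format(name="Doe", surname="John")

# Format specifications follow the colon: alignment, width, precision.
padded = "{0:>8}|{1:<6}|{2:^7}".format("right", "left", "mid")
money = "{:,.2f}".format(1234567.891)
zero_padded = "{:08.3f}".format(3.14159)

print(reused)       # 1 1 99999 2
print(keyword)      # He is John Doe
print(padded)
print(money)        # 1,234,567.89
print(zero_padded)  # 0003.142
```

Doing the same with + would mean calling str() on every number and handling padding by hand.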
You can do some pretty wild and flexible things if you really want using the .format function as well, for instance:
>>> import sys
>>> 'Python {0.version_info[0]:!<13.2%}'.format(sys)
'Python 300.00%!!!!!!'
And one further example with a dictionary, to display its ability to take keyword arguments:
>>> my_dict = {'adjective': 'cool', 'function': 'format'}
>>> "Look how awesome my {adjective} Python {function} skills are!".format(**my_dict)
'Look how awesome my cool Python format skills are!'
There are further examples and uses in the Python docs.
format is much more powerful, and as you can see in the other answer, you can do a lot of cool things with it. However, I would like to add that format is not the fastest (at least in Python 3.4 on Ubuntu 14.04). For simple formatting, plus notation is faster. For example:
import timeit
print(timeit.timeit("name = \"Doe\"; surname = \"John\"; 'He is {0} {1}'.format(surname, name)", number=100000))
# 0.04642631400201935
print(timeit.timeit("name = \"Doe\"; surname = \"John\"; \"He is\" + surname + \" \" + name", number=100000))
# 0.01718082799925469
I am using following python regex code to analyze values from the To field of an email:
import re
PATTERN = re.compile(r'''((?:[^(;|,)"']|"[^"]*"|'[^']*')+)''')
list = PATTERN.split(raw)[1::2]
The list should output the name and address of each recipient, using either "," or ";" as the separator. If these characters appear within quotes, they are to be ignored, because they are part of the name, often: "Last Name, First Name"
Most of the time this works well, however in the following case I am getting unexpected behaviour:
"Some Name | Company Name" <name#example.com>
In this case it is splitting on the "|" character. Even though when I check the pattern on regex tester websites, it selects the name and address as a whole. What am I doing wrong?
Example input would be:
"Some Name | Company Name" <name1#example.com>, "Some Other Name | Company Name" <name2#example.com>, "Last Name, First Name" <name3#example.com>
This is not a direct answer to your question but to the problem you seem to be solving and therefore maybe still helpful:
To parse emails I always make extensive use of Python's email library.
In your case you could use something like this:
from email.utils import getaddresses
from email import message_from_string
msg = message_from_string(str_with_msg_source)
tos = msg.get_all('to', [])
ccs = msg.get_all('cc', [])
resent_tos = msg.get_all('resent-to', [])
resent_ccs = msg.get_all('resent-cc', [])
all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs)
for (name, address) in all_recipients:
    # do some postprocessing on name or address if necessary
    pass
This always took reliable care of splitting names and addresses in mail headers in my cases.
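If you only have the raw To value rather than a whole message, getaddresses can be called on it directly. The following is a minimal sketch, assuming the addresses use a real "@" (the examples in the question show "#", presumably as obfuscation) and that recipients are comma-separated; the header string is made up from the question's sample input.

```python
from email.utils import getaddresses

# Hypothetical header value based on the sample input in the question,
# with "@" in place of the "#" shown there.
to_header = ('"Some Name | Company Name" <name1@example.com>, '
             '"Some Other Name | Company Name" <name2@example.com>, '
             '"Last Name, First Name" <name3@example.com>')

# getaddresses takes a list of header values and returns (name, address) pairs.
recipients = getaddresses([to_header])

for name, address in recipients:
    print(name, "->", address)
```

Commas and pipes inside the quoted display names are left alone. Semicolon-separated lists are not the standard form for this header, so they may need normalizing before parsing.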
You can use a much simpler regex using look arounds to split the text.
r'(?<=>)\s*,\s*(?=")'
Regex Explanation
\s*,\s* matches , which is surrounded by zero or more spaces (\s*)
(?<=>) Look behind assertion. Checks if the , is preceded by a >
(?=") Look ahead assertion. Checks if the , is followed by a "
Test
>>> re.split(r'(?<=>)\s*,\s*(?=")', string)
['"Some Name | Company Name" <name1#example.com>', '"Some Other Name | Company Name" <name2#example.com>', '"Last Name, First Name" <name3#example.com>']
Corrections
Case 1 In the above example, we used a single delimiter, the comma. If you wish to split on more than one delimiter, you can use a character class
r'(?<=>)\s*[,;]\s*(?=")'
[,;] Character class, matches , or ;
Case 2 As mentioned in comments, if the address part is missing, all we need to do is to add " to the look behind
Example
>>> string = '"Some Other Name | Company Name" <name2#example.com>, "Some Name, Nothing", "Last Name, First Name" <name3#example.com>'
>>> re.split(r'(?<=(?:>|"))\s*[,;]\s*(?=")', string)
['"Some Other Name | Company Name" <name2#example.com>', '"Some Name, Nothing"', '"Last Name, First Name" <name3#example.com>']
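Putting both cases together, a self-contained sketch might look like the following. It writes the look-behind as the character class [>"], which should behave the same as (?:>|") here; the sample string is invented to mix both delimiters and is not from the question.

```python
import re

# Split on "," or ";" only when preceded by > or " and followed by ".
SPLIT_RE = re.compile(r'(?<=[>"])\s*[,;]\s*(?=")')

# Hypothetical input mixing both delimiters and a name without an address.
raw = ('"Some Name | Company Name" <name1#example.com>; '
       '"Some Name, Nothing", '
       '"Last Name, First Name" <name3#example.com>')

recipients = SPLIT_RE.split(raw)
for recipient in recipients:
    print(recipient)
```

Note that this relies on every display name being quoted; an unquoted name after a delimiter would not be split off.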
The following gives a syntax error:
my eyes = 'Brown' my_hair = 'Brown'
print "Hes got %s and %s hair" % (my_eyes, my_hair)
The only way this seems to work is if I put Brown, Brown in the last parenthesis.
You're assigning incorrectly: you should unpack a tuple of strings into the two variables. In addition, Python variable names cannot contain spaces, so you'll want to use an underscore in my_eyes.
my_eyes, my_hair = 'Brown', 'Brown' # unpacking tuple here
Also, I suggest you use the format method, which is more flexible. The % style is older, though it is still supported.
print "He's got {0} and {1} hair".format(my_eyes, my_hair)
The problem turned out to be that the period at the end of the print statement was outside of the parenthesis. This now works: % (eyes, hair). The format version also works now.
Here are your variables:
name = "some name"
Age = 57
Height = 64
Weight = 135
Eyes = "brown"
Teeth = "white"
Hair = "brown"
To print a string with variables, use str.format.
print "Let's talk about {}".format(name)
print "She's {} inches tall".format(Height)
... and so on
Make sure that your variables contain no spaces. They're case sensitive too :)
I'm sure this is simple, but I'm not good with regexps or string manipulation and I want to learn :)
I have a string that I get using snimpy. It looks like this:
ARRIS DOCSIS 3.0 Touchstone WideBand Cable Modem <<HW_REV: 1; VENDOR: Arris Interactive, L.L.C.; BOOTR: 1.2.1.62; SW_REV: 7.3.123; MODEL: CM820A>>
I want to be able to look into that string and use that info in an if statement to then print some stuff. I want to see if the model is a CM820A and then check the firmware version (SW_REV); if it's not the right version I want to print the version, otherwise I move on to the next string I get from my loop.
host.sysDescr is what returns the above string. As of now I know how to find all the CM820A modems, but I get sloppy when I try to verify the firmware version.
sysdesc = host.sysDescr
if "CM820A" in str(sysdesc):
    if "7.5.125" not in str(sysdesc):
        print("Modem CM820A " + modem + " at version " + version)
        print(" Sysdesc = " + sysdesc)
    if "7.5.125" in sysdesc:
        print("Modem CM820A " + modem + " up to date")
Right now I am able to see if the CM820A has the right version easily but I can't print only the version of the bad modems. I was only able to print the whole string which contains a lot of useless info. I just want to print the SW_REV value from that string.
Question
I need help with how to do this then I will understand better and be able to rewrite this whole thing which I currently am using only to learn python but I want to put to practice for useful purposes.
All you need is split(). You can split your string on a separator character, for example:
>>> l = s.split(';')
>>> l
['ARRIS DOCSIS 3.0 Touchstone WideBand Cable Modem <<HW_REV: 1', ' VENDOR: Arris Interactive, L.L.C.', ' BOOTR: 1.2.1.62', ' SW_REV: 7.3.123', ' MODEL: CM820A>>']
>>> for i in l:
...     if 'BOOTR' in i:
...         print i.split(':')
...
[' BOOTR', ' 1.2.1.62']
So then you can get the second element easily with indexing !
This answer will simply explain how to retrieve your desired information.
You will need to perform multiple splits on your data.
First, I notice that your string's information is subdivided by semi-colons.
so:
description_list = sysdesc.split(";")
will create a list of your major sections. since the sysdesc string has a standard format, you can then access the proper substring:
sub_string = description_list[3]
now, split the substring with the colon:
revision_list = sub_string.split(":")
now, just reference:
revision_list[1]
whenever you want to print it.
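The split steps above can be folded into a small helper that does not depend on the position of each field. This is only a sketch: parse_sysdescr is a hypothetical name, the sample string is copied from the question, and "7.5.125" is the target version the question's code checks for.

```python
def parse_sysdescr(sysdescr):
    """Return the 'KEY: value' pairs found between << and >> as a dict."""
    start = sysdescr.find("<<")
    end = sysdescr.rfind(">>")
    if start == -1 or end == -1:
        return {}
    fields = {}
    for part in sysdescr[start + 2:end].split(";"):
        key, sep, value = part.partition(":")
        if sep:  # skip pieces that have no colon
            fields[key.strip()] = value.strip()
    return fields


sysdesc = ("ARRIS DOCSIS 3.0 Touchstone WideBand Cable Modem "
           "<<HW_REV: 1; VENDOR: Arris Interactive, L.L.C.; "
           "BOOTR: 1.2.1.62; SW_REV: 7.3.123; MODEL: CM820A>>")

info = parse_sysdescr(sysdesc)
if info.get("MODEL") == "CM820A" and info.get("SW_REV") != "7.5.125":
    print("Modem CM820A at version " + info["SW_REV"])
```

Looking fields up by name avoids the hard-coded index 3 and strips the leading space that a plain split(":") leaves on the value.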
d = {
"key": "Impress the playing crowd with these classic "
"Playing Cards \u00a9 Personalized Coasters.These beautiful"
" coasters are made from glass, and measure approximately 4\u201d x 4\u201d (inches)"
".Great to look at, and lovely to the touch.There are 4 coasters in a set.We have "
"created this exclusive design for all card lovers.Each coaster is a different suit, "
"with the underneath.Make your next Bridge, or Teen Patti session uber-personal!"
"Will look great on the bar, or any tabletop.Gift Designed for: Couples, Him, "
"HerOccasion:Diwali, Bridge, Anniversary, Birthday"}
I have tried the replace function on it, but it didn't work.
s = d[key].replace('\u00a9','')
If you want to remove all non-ASCII characters from a string, you can use string.encode("ascii", "ignore").
It tries to encode the string to ASCII, and the second parameter, "ignore", tells it to skip characters it can't convert (every non-ASCII character) instead of raising an exception, as it would without that parameter. The result contains only the characters that could be converted, so all non-ASCII characters are removed.
Example usage :
unicodeString = "Héllò StàckOvèrflow"
print(unicodeString.encode("ascii", "ignore")) # prints b'Hll StckOvrflow' on Python 3
More info : str.encode() and Unicode in the Python documentation.
d['key'].decode('unicode-escape').encode('ascii', 'ignore')
is what you are looking for
>>> d = {
... "key": "Impress the playing crowd with these classic "
... "Playing Cards \u00a9 Personalized Coasters.These beautiful"
... " coasters are made from glass, and measure approximately 4\u201d x 4\u201d (inches)"
... ".Great to look at, and lovely to the touch.There are 4 coasters in a set.We have "
... "created this exclusive design for all card lovers.Each coaster is a different suit, "
... "with the underneath.Make your next Bridge, or Teen Patti session uber-personal!"
... "Will look great on the bar, or any tabletop.Gift Designed for: Couples, Him, "
... "HerOccasion:Diwali, Bridge, Anniversary, Birthday"}
>>> d['key'].decode('unicode-escape').encode('ascii', 'ignore')
'Impress the playing crowd with these classic Playing Cards Personalized Coasters.These beautiful coasters are made from glass, and measure approximately 4 x 4 (inches).Great to look at, and lovely to the touch.There are 4 coasters in a set.We have created this exclusive design for all card lovers.Each coaster is a different suit, with the underneath.Make your next Bridge, or Teen Patti session uber-personal!Will look great on the bar, or any tabletop.Gift Designed for: Couples, Him, HerOccasion:Diwali, Bridge, Anniversary, Birthday'
>>>
In order to remove characters represented by unicode escape sequences, you need to use a unicode string.
For example,
s = d['key'].replace(u'\u00a9', '')
However, as people have mentioned in comments, removing the copyright symbol might be a very bad idea, though it depends on what you're actually doing with the string.
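For comparison, here is how the two approaches might look on Python 3, where every str is already Unicode and no u prefix or decode('unicode-escape') step is needed. This is a sketch with a shortened, made-up sample of the text from the question.

```python
# Shortened sample based on the text in the question.
text = ("Playing Cards \u00a9 Personalized Coasters, "
        "approximately 4\u201d x 4\u201d")

# Option 1: remove only the copyright sign and keep everything else.
no_copyright = text.replace("\u00a9", "")

# Option 2: drop every non-ASCII character, then decode the bytes back to str.
ascii_only = text.encode("ascii", "ignore").decode("ascii")

print(no_copyright)
print(ascii_only)
```

Option 2 also removes the inch marks, and both options leave a double space where the copyright sign used to be.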
I'm trying to parse the title tag in an RSS 2.0 feed into three different variables for each entry in that feed. Using ElementTree I've already parsed the RSS so that I can print each title [minus the trailing )] with the code below:
feed = getfeed("http://www.tourfilter.com/dallas/rss/by_concert_date")
for item in feed:
    print repr(item.title[0:-1])
I include that because, as you can see, the item.title is a repr() data type, which I don't know much about.
A particular repr(item.title[0:-1]) printed in the interactive window looks like this:
'randy travis (Billy Bobs 3/21'
'Michael Schenker Group (House of Blues Dallas 3/26'
The user selects a band and I hope to, after parsing each item.title into 3 variables (one each for band, venue, and date... or possibly an array or I don't know...) select only those related to the band selected. Then they are sent to Google for geocoding, but that's another story.
I've seen some examples of regex and I'm reading about them, but it seems very complicated. Is it? I thought maybe someone here would have some insight as to exactly how to do this in an intelligent way. Should I use the re module? Does it matter that the output is currently repr()s? Is there a better way? I was thinking I'd use a loop like (and this is my pseudoPython, just kind of notes I'm writing):
list = bandRaw,venue,date,latLong
for item in feed:
    parse item.title for bandRaw, venue, date
    if bandRaw == str(band)
        send venue name + ", Dallas, TX" to google for geocoding
        return lat,long
        list = list + return character + bandRaw + "," + venue + "," + date + "," + lat + "," + long
    else
In the end, I need to have the chosen entries in a .csv (comma-delimited) file looking like this:
band,venue,date,lat,long
randy travis,Billy Bobs,3/21,1234.5678,1234.5678
Michael Schenker Group,House of Blues Dallas,3/26,4321.8765,4321.8765
I hope this isn't too much to ask. I'll be looking into it on my own, just thought I should post here to make sure it got answered.
So, the question is, how do I best parse each repr(item.title[0:-1]) in the feed into the 3 separate values that I can then concatenate into a .csv file?
Don't let regex scare you off... it's well worth learning.
Given the examples above, you might try putting the trailing parenthesis back in, and then using this pattern:
import re
pat = re.compile(r'([\w\s]+)\(([\w\s]+)(\d+/\d+)\)')
info = pat.match(s)
print info.groups()
('Michael Schenker Group ', 'House of Blues Dallas ', '3/26')
To get at each group individually, just call them on the info object. The first two groups end with a trailing space, so strip them before printing:
print info.group(1) # or info.groups()[0]
print '"%s","%s","%s"' % tuple(g.strip() for g in info.groups())
"Michael Schenker Group","House of Blues Dallas","3/26"
The hard thing about regex in this case is making sure you know all the known possible characters in the title. If there are non-alpha chars in the 'Michael Schenker Group' part, you'll have to adjust the regex for that part to allow them.
The pattern above breaks down as follows, which is parsed left to right:
([\w\s]+) : Match any word or space characters (the plus symbol indicates that there should be one or more such characters). The parentheses mean that the match will be captured as a group. This is the "Michael Schenker Group " part. If there can be numbers and dashes here, you'll want to modify the pieces between the square brackets, which are the possible characters for the set.
\( : A literal parenthesis. The backslash escapes the parenthesis, since otherwise it counts as a regex command. This is the "(" part of the string.
([\w\s]+) : Same as the one above, but this time matches the "House of Blues Dallas " part. In parentheses so they will be captured as the second group.
(\d+/\d+) : Matches the digits 3 and 26 with a slash in the middle. In parentheses so they will be captured as the third group.
\) : Closing parenthesis for the above.
The Python intro to regex is quite good, and you might want to spend an evening going over it: http://docs.python.org/library/re.html#module-re. Also, check Dive Into Python, which has a friendly introduction: http://diveintopython3.ep.io/regular-expressions.html.
EDIT: See zacherates below, who has some nice edits. Two heads are better than one!
Regular expressions are a great solution to this problem:
>>> import re
>>> s = 'Michael Schenker Group (House of Blues Dallas 3/26'
>>> re.match(r'(.*) \((.*) (\d+/\d+)', s).groups()
('Michael Schenker Group', 'House of Blues Dallas', '3/26')
As a side note, you might want to look at the Universal Feed Parser for handling the RSS parsing as feeds have a bad habit of being malformed.
Edit
In regards to your comment... The strings occasionally being wrapped in "s rather than 's has to do with the fact that you're using repr. The repr of a string is usually delimited with 's, unless that string contains one or more 's, where instead it uses "s so that the 's don't have to be escaped:
>>> "Hello there"
'Hello there'
>>> "it's not its"
"it's not its"
Notice the different quote styles.
Regarding the repr(item.title[0:-1]) part, not sure where you got that from but I'm pretty sure you can simply use item.title. All you're doing is removing the last char from the string and then calling repr() on it, which does nothing.
Your code should look something like this:
import re
import geocoders # from GeoPy
import feedparser # from www.feedparser.org
us = geocoders.GeocoderDotUS()
feedurl = "http://www.tourfilter.com/dallas/rss/by_concert_date"
feed = feedparser.parse(feedurl)
lines = []
for entry in feed.entries:
    m = re.search(r'(.*) \((.*) (\d+/\d+)\)', entry.title)
    if m:
        bandRaw, venue, date = m.groups()
        if band == bandRaw:  # band is the name the user selected
            place, (lat, lng) = us.geocode(venue + ", Dallas, TX")
            # lat and lng are numbers, so convert them before joining
            lines.append(",".join([band, venue, date, str(lat), str(lng)]))
result = "\n".join(lines)
EDIT: replaced list with lines as the var name. list is a builtin and should not be used as a variable name. Sorry.
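For the CSV part of the question, the csv module may be a safer choice than joining with commas by hand, since it quotes fields that themselves contain commas. Below is a minimal Python 3 sketch that leaves out the feed and the geocoding: the titles are hard-coded from the examples in the question, and parse_title is a hypothetical helper name.

```python
import csv
import io
import re

# Same pattern as in the answer above, with the closing parenthesis kept.
TITLE_RE = re.compile(r'(.*) \((.*) (\d+/\d+)\)')


def parse_title(title):
    """Return (band, venue, date), or None if the title does not match."""
    m = TITLE_RE.match(title)
    return m.groups() if m else None


titles = [
    "randy travis (Billy Bobs 3/21)",
    "Michael Schenker Group (House of Blues Dallas 3/26)",
    "not a concert title",
]

buf = io.StringIO()  # stands in for a file opened with open(path, "w", newline="")
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["band", "venue", "date"])
for title in titles:
    parsed = parse_title(title)
    if parsed:
        writer.writerow(parsed)

csv_text = buf.getvalue()
print(csv_text)
```

Titles that do not match the pattern are skipped rather than raising an error, which seems reasonable for feed data that may be malformed.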