Python3 .format() align usage - python

Can anyone help me to change the writing of these lines?
I want to get my code to be more elegant using .format(), but I don't really know how to use it.
print("%3s %-20s %12s" %("Id", "State", "Population"))
print("%3d %-20s %12d" %
(state["id"],
state["name"],
state["population"]))

Your format is easily translated to the str.format() formatting syntax:
print("{:>3s} {:20s} {:>12s}".format("Id", "State", "Population"))
print("{id:3d} {name:20s} {population:12d}".format(**state))
Note that left-alignment is achieved by prefixing the width with <, not -, and default alignment for strings is to left-align, so a > is needed for the header strings and the < can be omitted, but otherwise the formats are closely related.
This extracts the values directly from the state dictionary by using the keys in the format itself.
You may as well just use the actual output result of the first format directly:
print(" Id State                  Population")
Demo:
>>> state = {'id': 15, 'name': 'New York', 'population': 19750000}
>>> print("{:>3s} {:20s} {:>12s}".format("Id", "State", "Population"))
 Id State                  Population
>>> print("{id:3d} {name:20s} {population:12d}".format(**state))
 15 New York                 19750000

You can write:
print("{id:>3s} {state:20s} {population:>12s}".format(id='Id', state='State', population='Population'))
print("{id:>3d} {state:20s} {population:>12d}".format(id=state['id'], state=state['name'], population=state['population']))
Note that you have to use > to right-align as the items are left-aligned by default. You can also name the items in the formatted string which makes it more readable to see what value goes where.
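If you are on Python 3.6 or newer, the same format specs also work inside f-strings. This is only a sketch of that alternative, not something the answers above rely on; the state dictionary is the sample data from the demo.

```python
# Sample data taken from the demo above.
state = {"id": 15, "name": "New York", "population": 19750000}

# The part after the colon is the same format spec that str.format() uses:
# > right-aligns, < left-aligns, the number is the minimum field width.
header = f"{'Id':>3} {'State':<20} {'Population':>12}"
row = f"{state['id']:>3d} {state['name']:<20} {state['population']:>12d}"

print(header)
print(row)
```

Both lines come out 37 characters wide (3 + 20 + 12 plus two separating spaces), so the columns line up.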

Related

Need to check all my variables for an ampersand, I put variables in a list and check using a for loop, but Python only changes value inside list

I've got the following code, but unfortunately it only changes the value inside the list. Is there any way I can change the value outside the list, so it can be used later in the script?
street_number = "100 & 102"
street_name = "Fake Street"
suburb = "Faketown"
allvariables = [street_number, street_name, suburb]
ampersand = "&"
ampersand_escape = "&amp;"
for i, item in enumerate(allvariables):
    if isinstance(item, str):
        if ampersand in item:
            allvariables[i] = item.replace(ampersand, ampersand_escape)
print(allvariables) # -> ['100 &amp; 102', 'Fake Street', 'Faketown']
print(street_number) # -> 100 & 102
The only alternative I can imagine is checking each variable individually, but I've got a LOT of variables that need to be checked so it would take forever:
if ampersand in street_number:
    street_number.replace(ampersand,ampersand_escape)
if ampersand in street_name:
    street_name.replace(ampersand,ampersand_escape)
if ampersand in suburb:
    suburb.replace(ampersand,ampersand_escape)
But that seems extremely time consuming. Thank you in advance for your help!
P.S. just in case - I need to do a few more checks besides the ampersands
Each variable in python (for instance, street_number) is just a reference to something. In this case, street_number is a reference to a string, namely "100 & 102".
When you write allvariables = [street_number, street_name, suburb], you are simply creating a list whose elements are initialized from the variables. Position 0 of the list refers to the same string "100 & 102" that street_number refers to, but there is no ongoing linkage to the variable street_number itself.
So if you update allvariables[0] to be '100 &amp; 102', the list slot now refers to a new string, and this has no effect on the value referenced by the variable street_number.
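A small sketch of that behaviour, using the is operator to check whether two names refer to the same object. AMP_ESCAPE, shared_before and shared_after are names made up for this illustration.

```python
AMP_ESCAPE = "&amp;"  # illustrative name for the escaped ampersand

street_number = "100 & 102"
allvariables = [street_number]

# The list slot and the variable refer to the very same string object.
shared_before = allvariables[0] is street_number

# replace() builds a new string; only the list slot is rebound to it.
allvariables[0] = allvariables[0].replace("&", AMP_ESCAPE)
shared_after = allvariables[0] is street_number

print(shared_before, shared_after)  # True False
print(street_number)                # still the original string
```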
One way to get the result I think you want would be this:
street_number = "100 & 102"
street_name = "Fake Street"
suburb = "Faketown"
allvariableNames = ['street_number', 'street_name', 'suburb']
ampersand = "&"
ampersand_escape = "&amp;"
ampIndices = [i for i, item in enumerate(allvariableNames) if isinstance(eval(item), str) and ampersand in eval(item)]
for i in ampIndices:
    exec(f'{allvariableNames[i]} = {allvariableNames[i]}.replace(ampersand, ampersand_escape)')
print(', '.join(f"'{eval(item)}'" for item in allvariableNames)) # -> '100 &amp; 102', 'Fake Street', 'Faketown'
print(street_number)
Output:
'100 &amp; 102', 'Fake Street', 'Faketown'
100 &amp; 102
Explanation:
instead of initializing a list using the variables you have in mind, initialize a list with the names of these variables as strings
build a list of the indices into the variable name list for which the value of the variable (obtained using the eval() function) contains the search pattern
use exec() to execute a python statement that uses the string name of the variable to update the variable's value by replacing the search pattern with the new string &amp;
It seems like all your variables are related to each other, so using a dictionary to store the variables might be a good idea. Like a list, you can loop over it, but unlike a list, you can give its members names. Here's some example code:
address = {
    "street_number": "100 & 102",
    "street_name": "Fake Street",
    "suburb": "Faketown",
}
ampersand = "&"
ampersand_escape = "&amp;"
for (item, value) in address.items():
    if isinstance(value, str):
        if ampersand in value:
            address[item] = value.replace(ampersand,ampersand_escape)
print(address)
Strings in Python are immutable which means that once created they cannot be changed. Only a new string can be created. So what you want to do is to store the newly created string back in the same variable.
for example
s = "hello"
s.upper() #does not change s.. only creates a new string and discards it
s = s.upper() # creates the new string but then overrides the value of s
Also, putting strings in a list and then replacing the list's elements won't affect the variables the strings originally came from.
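Since the question mentions a few more checks besides the ampersands, one way to combine the dictionary idea with the "assign the new string back" rule is to put all checks into one function and rebuild the dictionary from it. This is just a sketch: clean() and AMP_ESCAPE are made-up names, and the strip() call only stands in for whatever extra checks are actually needed.

```python
AMP_ESCAPE = "&amp;"  # illustrative name for the escaped ampersand

def clean(value):
    """Run every check on one value; non-strings pass through unchanged."""
    if not isinstance(value, str):
        return value
    value = value.replace("&", AMP_ESCAPE)
    value = value.strip()  # placeholder for a further check (assumption)
    return value

address = {
    "street_number": "100 & 102",
    "street_name": "Fake Street ",
    "suburb": "Faketown",
}

# Build a new dict with the cleaned values and rebind the name to it.
address = {key: clean(value) for key, value in address.items()}
print(address["street_number"])
```

Later code then reads address["street_number"] instead of a separate street_number variable.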

how do i use string.replace() to replace only when the string is exactly matching

I have a dataframe with a list of poorly spelled clothing types. I want them all in the same format. For example, I have "trous", "trouse" and "trousers", and I would like to replace the first two with "trousers".
I have tried using string.replace. It gets the first "trous" and changes it to "trousers" as it should, and when it gets to "trouse" it also works, but when it gets to "trousers" it makes "trousersersers"! I think it's matching every string that contains "trous" or "trouse", including "trousers" itself, and changing them.
Is there a way I can limit string.replace to look for exactly "trous"?
Here's what I've tried so far. As you can see, I have a good few changes to make. Most of them work OK, but it's the likes of trousers and t-shirts, which have a few similar changes to be made, that are causing the upset.
newTypes=[]
for string in types:
    underwear = string.replace(('UNDERW'), 'UNDERWEAR').replace('HANKY', 'HANKIES').replace('TIECLI', 'TIECLIPS').replace('FRAGRA', 'FRAGRANCES').replace('ROBE', 'ROBES').replace('CUFFLI', 'CUFFLINKS').replace('WALLET', 'WALLETS').replace('GIFTSE', 'GIFTSETS').replace('SUNGLA', 'SUNGLASSES').replace('SCARVE', 'SCARVES').replace('TROUSE ', 'TROUSERS').replace('SHIRT', 'SHIRTS').replace('CHINO', 'CHINOS').replace('JACKET', 'JACKETS').replace('KNIT', 'KNITWEAR').replace('POLO', 'POLOS').replace('SWEAT', 'SWEATERS').replace('TEES', 'T-SHIRTS').replace('TSHIRT', 'T-SHIRTS').replace('SHORT', 'SHORTS').replace('ZIP', 'ZIP-TOPS').replace('GILET ', 'GILETS').replace('HOODIE', 'HOODIES').replace('HOODZIP', 'HOODIES').replace('JOGGER', 'JOGGERS').replace('JUMP', 'SWEATERS').replace('SWESHI', 'SWEATERS').replace('BLAZE ', 'BLAZERS').replace('BLAZER ', 'BLAZERS').replace('WC', 'WAISTCOATS').replace('TTOP', 'T-SHIRTS').replace('TROUS', 'TROUSERS').replace('COAT', 'COATS').replace('SLIPPE', 'SLIPPERS').replace('TRAINE', 'TRAINERS').replace('DECK', 'SHOES').replace('FLIP', 'SLIDERS').replace('SUIT', 'SUITS').replace('GIFTVO', 'GIFTVOUCHERS')
    newTypes.append(underwear)
types = newTypes
Assuming you're okay with not using string.replace(), you can simply do this:
lst = ["trousers", "trous" , "trouse"]
for i in range(len(lst)):
    if "trous" in lst[i]:
        lst[i] = "trousers"
print(lst)
# Prints ['trousers', 'trousers', 'trousers']
This checks if the shortest substring, trous, is part of the string, and if so converts the entire string to trousers.
Use a dict for string to be replaced:
d={
    'trous': 'trousers',
    'trouse': 'trousers',
    # ...
}
newtypes=[d.get(string,string) for string in types]
d.get(string,string) will return string if string is not in d.
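If an entry can contain more than one word, the dict lookup above won't match it. A possible variation (a sketch only, with a made-up REPLACEMENTS table and normalise() helper) is to use a regular expression with \b word boundaries, so that an abbreviation is only replaced when it is a whole word.

```python
import re

# Hypothetical mapping; extend it with the other abbreviations as needed.
REPLACEMENTS = {
    "TROUS": "TROUSERS",
    "TROUSE": "TROUSERS",
    "TSHIRT": "T-SHIRTS",
}

# \b only matches at a word boundary, so TROUS cannot match inside TROUSERS.
# Longer keys are tried first so TROUSE wins over TROUS.
keys = sorted(REPLACEMENTS, key=len, reverse=True)
pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

def normalise(text):
    """Replace whole-word abbreviations and leave everything else alone."""
    return pattern.sub(lambda m: REPLACEMENTS[m.group(1)], text)

types = ["TROUS", "TROUSE", "TROUSERS", "TROUSE BLUE", "TSHIRT"]
newtypes = [normalise(t) for t in types]
print(newtypes)
```

Already correct values such as "TROUSERS" pass through unchanged, so running it twice does no harm.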

Formatting Python output into rows

So, I'm still sort of new to programming, and I'm trying to format the output of some arrays in Python. I'm finding it hard to wrap my head around some of the aspects of formatting.
I have a few arrays that I want to print, in the format of a table.
headings = ["Name", "Age", "Favourite Colour"]
names = ["Barry", "Eustace", "Clarence", "Razputin", "Harvey"]
age = [39, 83, 90, 15, 23]
favouriteColour = ["Green", "Baby Pink", "Sky Blue", "Orange", "Crimson"]
I want the output to look like this: (where the column widths are a little more than the max length in that column)
Name      Age  Favourite Colour
Barry     39   Green
Eustace   83   Baby Pink
Clarence  90   Sky Blue
Razputin  15   Orange
Harvey    23   Crimson
I tried to do this:
mergeArr = [headings, names, age, favouriteColour]
but (I think) that won't print the headings in the right place?
I tried this:
merge = [names, age, favouriteColour]
col_width = max(len(str(element)) for row in merge for element in row) + 2
for row in merge:
    print("".join(str(element).ljust(col_width) for element in row))
but that prints the data of each object in columns, rather than rows.
Help is appreciated! Thanks.
You'd print the heading on its own (the one with name, age, favourite colour).
Then you use the code you have, but with:
rows = zip(names, age, favouriteColour)
for row in rows...
You might also look into the tabulate package for nicely formatted tables.
Just adding the extra formatting:
ll = [headings] + list(zip(names, age, favouriteColour))
for l in ll:
    print("{:<10}\t{:<3}\t{:<16}".format(*l))
# Name Age Favourite Colour
# Barry 39 Green
# Eustace 83 Baby Pink
# Clarence 90 Sky Blue
# Razputin 15 Orange
# Harvey 23 Crimson
The parts in curly braces are replacement fields from Python's format string syntax, while the TABs serve as delimiters. In sum, the .format() method looks for those curly braces inside the string to determine which values inside the container l go where and how those values should be formatted. For example, in the case of the headers, the following is what's happening:
headings = ["Name", "Age", "Favourite Colour"]
print("{:<10}\t{:<3}\t{:<16}".format(*headings))
We use the asterisk (*) in front of the list to unpack the elements inside that list.
The first curly brace is for the string "Name", and it is formatted with :<10 which means that it is adjusted to the left and padded with extra space characters, if the length of the string is less than 10. In essence, it will print all characters in a given string and add extra spaces to the right of that string.
The second curly brace is for "Age" and is formatted with :<3.
The third curly brace is for "Favourite Colour" and is formatted with :<16.
All those strings are delimited with the TAB character.
The combination of the above steps inside the print function yields:
# Name Age Favourite Colour
I hope this proves useful.
Use zip(*iterables):
print(headings)
for row in zip(names, age, favouriteColour):
    print(row) # formatting is up to you :)
Jacob Krall is perfectly correct about using zip to combine your lists. Once you've done that, though, if you want your columns to align nicely (assuming you are using Python 3.x), take a look at the str.format() method. It lets you specify field widths and alignment, and you can pass the formatted string straight to print().
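Putting the zip and field-width suggestions together, here is one possible sketch that works out each column's width from its longest entry plus two spaces, as the question asks. The names table, widths and lines are made up for the illustration.

```python
headings = ["Name", "Age", "Favourite Colour"]
names = ["Barry", "Eustace", "Clarence", "Razputin", "Harvey"]
age = [39, 83, 90, 15, 23]
favouriteColour = ["Green", "Baby Pink", "Sky Blue", "Orange", "Crimson"]

# zip(names, age, favouriteColour) turns the three columns into rows.
table = [headings] + list(zip(names, age, favouriteColour))

# zip(*table) turns the rows back into columns, so each width can be
# the longest cell in that column plus two spaces of padding.
widths = [max(len(str(cell)) for cell in column) + 2 for column in zip(*table)]

lines = [
    "".join(str(cell).ljust(width) for cell, width in zip(row, widths)).rstrip()
    for row in table
]
print("\n".join(lines))
```

With this data the widths come out as 10, 5 and 18, so nothing has to be hard-coded if the lists change.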

Ambiguity in parsing csv file

I am trying to parse a csv file with the following contents:
# country,title1,title2,type
GB,Fast Friends,Burn Notice, S:4, E:2,episode,
SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,
The expected output is:
['SE', 'The Spiderwick Chronicles', '"SPIDERWICK CHRONICLES, THE"', 'movie']
['GB', 'Fast Friends', 'Burn Notice, S:4, E:2', 'episode']
The problem is, the commas in the 'title' fields are not escaped. I tried using csvreader as well as doing string and regex parsing, but was unable to get unambiguous matches.
Is it possible at all to parse this file accurately with unescaped commas on half of the fields? Or, does it require that a new csv be created?
You may be able to play a trick if you can make the assumption that all commas will appear in title2. Otherwise, you have ambiguous data.
strings = ['SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,'
          ,'GB,Fast Friends,Burn Notice, S:4, E:2,episode,'
          ]
for string in strings:
    xs = string.split(',')
    country = xs[0]
    title1 = xs[1]
    # everything between title1 and the type belongs to title2;
    # joining with ',' puts the original commas back
    title2 = ','.join(xs[2:-2])
    mtype = xs[-2]
    print([country, title1, title2, mtype])
Output:
['SE', 'The Spiderwick Chronicles', '"SPIDERWICK CHRONICLES, THE"', 'movie']
['GB', 'Fast Friends', 'Burn Notice, S:4, E:2', 'episode']
You can use RegEx (import re) - see documentation
Match for (\".*\",)|(.*,)
This way you're looking either for [quoted string,] or [any string,].
If there are commas in the fields, I would save the Excel sheet as a text file with the fields separated by tabs instead.
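The "all stray commas belong to title2" trick can also be combined with the csv module, which already handles the properly quoted rows. The following is only a sketch under that same assumption; parse_row() and its n_fields parameter are made up for the example.

```python
import csv
import io

def parse_row(line, n_fields=4):
    """Parse one line, folding surplus fields into title2 (index 2).

    Assumption: unescaped commas only ever occur in title2.
    """
    fields = next(csv.reader(io.StringIO(line)))
    if fields and fields[-1] == "":
        fields = fields[:-1]  # drop the empty field after the trailing comma
    extra = len(fields) - n_fields
    if extra > 0:
        # Merge title2 and the surplus pieces back into one field.
        fields[2:3 + extra] = [",".join(fields[2:3 + extra])]
    return fields

rows = [
    parse_row('GB,Fast Friends,Burn Notice, S:4, E:2,episode,'),
    parse_row('SE,The Spiderwick Chronicles,"SPIDERWICK CHRONICLES, THE",movie,'),
]
for row in rows:
    print(row)
```

Note that csv.reader strips the surrounding double quotes from the quoted title, unlike the plain split() approach. If commas can also show up in title1, the data really is ambiguous and no parser can fix that.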

Python parsing

I'm trying to parse the title tag in an RSS 2.0 feed into three different variables for each entry in that feed. Using ElementTree I've already parsed the RSS so that I can print each title [minus the trailing )] with the code below:
feed = getfeed("http://www.tourfilter.com/dallas/rss/by_concert_date")
for item in feed:
    print(repr(item.title[0:-1]))
I include that because, as you can see, the item.title is a repr() data type, which I don't know much about.
A particular repr(item.title[0:-1]) printed in the interactive window looks like this:
'randy travis (Billy Bobs 3/21'
'Michael Schenker Group (House of Blues Dallas 3/26'
The user selects a band and I hope to, after parsing each item.title into 3 variables (one each for band, venue, and date... or possibly an array or I don't know...) select only those related to the band selected. Then they are sent to Google for geocoding, but that's another story.
I've seen some examples of regex and I'm reading about them, but it seems very complicated. Is it? I thought maybe someone here would have some insight as to exactly how to do this in an intelligent way. Should I use the re module? Does it matter that the output is currently is repr()s? Is there a better way? I was thinking I'd use a loop like (and this is my pseudoPython, just kind of notes I'm writing):
list = bandRaw,venue,date,latLong
for item in feed:
    parse item.title for bandRaw, venue, date
    if bandRaw == str(band)
        send venue name + ", Dallas, TX" to google for geocoding
        return lat,long
        list = list + return character + bandRaw + "," + venue + "," + date + "," + lat + "," + long
    else
In the end, I need to have the chosen entries in a .csv (comma-delimited) file looking like this:
band,venue,date,lat,long
randy travis,Billy Bobs,3/21,1234.5678,1234.5678
Michael Schenker Group,House of Blues Dallas,3/26,4321.8765,4321.8765
I hope this isn't too much to ask. I'll be looking into it on my own, just thought I should post here to make sure it got answered.
So, the question is, how do I best parse each repr(item.title[0:-1]) in the feed into the 3 separate values that I can then concatenate into a .csv file?
Don't let regex scare you off... it's well worth learning.
Given the examples above, you might try putting the trailing parenthesis back in, and then using this pattern:
import re
s = 'Michael Schenker Group (House of Blues Dallas 3/26)'
pat = re.compile(r'([\w\s]+)\(([\w\s]+)(\d+/\d+)\)')
info = pat.match(s)
print(info.groups())
('Michael Schenker Group ', 'House of Blues Dallas ', '3/26')
To get at each group individually, just call group() on the info object (strip() removes the trailing space the first two groups capture):
print(info.group(1)) # or info.groups()[0]
print('"%s","%s","%s"' % tuple(g.strip() for g in info.groups()))
"Michael Schenker Group","House of Blues Dallas","3/26"
The hard thing about regex in this case is making sure you know all the possible characters in the title. If there are non-alphanumeric chars in the 'Michael Schenker Group' part, you'll have to adjust the regex for that part to allow them.
The pattern above breaks down as follows, which is parsed left to right:
([\w\s]+) : Match any word or space characters (the plus symbol indicates that there should be one or more such characters). The parentheses mean that the match will be captured as a group. This is the "Michael Schenker Group " part. \w already covers letters, digits and the underscore; if there can be dashes, apostrophes or other punctuation here, you'll want to modify the pieces between the square brackets, which are the possible characters for the set.
\( : A literal parenthesis. The backslash escapes the parenthesis, since otherwise it counts as a regex command. This is the "(" part of the string.
([\w\s]+) : Same as the one above, but this time matches the "House of Blues Dallas " part. In parentheses so they will be captured as the second group.
(\d+/\d+) : Matches the digits 3 and 26 with a slash in the middle. In parentheses so they will be captured as the third group.
\) : Closing parenthesis for the above.
The python intro to regex is quite good, and you might want to spend an evening going over it http://docs.python.org/library/re.html#module-re. Also, check Dive Into Python, which has a friendly introduction: http://diveintopython3.ep.io/regular-expressions.html.
EDIT: See zacherates below, who has some nice edits. Two heads are better than one!
Regular expressions are a great solution to this problem:
>>> import re
>>> s = 'Michael Schenker Group (House of Blues Dallas 3/26'
>>> re.match(r'(.*) \((.*) (\d+/\d+)', s).groups()
('Michael Schenker Group', 'House of Blues Dallas', '3/26')
As a side note, you might want to look at the Universal Feed Parser for handling the RSS parsing as feeds have a bad habit of being malformed.
Edit
In regards to your comment... The strings occasionally being wrapped in "s rather than 's has to do with the fact that you're using repr. The repr of a string is usually delimited with 's, unless that string contains one or more 's, where instead it uses "s so that the 's don't have to be escaped:
>>> "Hello there"
'Hello there'
>>> "it's not its"
"it's not its"
Notice the different quote styles.
Regarding the repr(item.title[0:-1]) part, not sure where you got that from but I'm pretty sure you can simply use item.title. All you're doing is removing the last char from the string and then calling repr() on it, which does nothing.
Your code should look something like this:
import re
import geocoders # from GeoPy
us = geocoders.GeocoderDotUS()
import feedparser # from www.feedparser.org
feedurl = "http://www.tourfilter.com/dallas/rss/by_concert_date"
feed = feedparser.parse(feedurl)
lines = []
for entry in feed.entries:
    m = re.search(r'(.*) \((.*) (\d+/\d+)\)', entry.title)
    if m:
        bandRaw, venue, date = m.groups()
        if band == bandRaw:
            place, (lat, lng) = us.geocode(venue + ", Dallas, TX")
            # lat and lng are numbers, so convert them before joining
            lines.append(",".join([band, venue, date, str(lat), str(lng)]))
result = "\n".join(lines)
EDIT: replaced list with lines as the var name. list is a builtin and should not be used as a variable name. Sorry.
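For the final .csv step, the standard csv module can do the joining and any quoting for you. Here is a minimal sketch that leaves out the feed download and the geocoding entirely; the titles list is made-up sample data in the format shown in the question, and TITLE_RE and parse_title() are illustrative names.

```python
import csv
import io
import re

# Band, then "(venue date)"; named groups keep the three parts readable.
TITLE_RE = re.compile(r"^(?P<band>.*) \((?P<venue>.*) (?P<date>\d+/\d+)\)$")

def parse_title(title):
    """Return (band, venue, date), or None if the title doesn't fit."""
    m = TITLE_RE.match(title)
    return m.group("band", "venue", "date") if m else None

# Sample titles in the format from the question (not fetched from the feed).
titles = [
    "randy travis (Billy Bobs 3/21)",
    "Michael Schenker Group (House of Blues Dallas 3/26)",
    "a title that does not follow the pattern",
]

buffer = io.StringIO()  # swap for open("out.csv", "w", newline="") to write a file
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["band", "venue", "date"])
for title in titles:
    parsed = parse_title(title)
    if parsed:
        writer.writerow(parsed)

csv_text = buffer.getvalue()
print(csv_text)
```

The lat and long columns would be appended to each row in the same way once the geocoder has returned them.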
