So, I'm still sort of new to programming, and I'm trying to format the output of some arrays in Python. I'm finding it hard to wrap my head around some of the aspects of formatting.
I have a few arrays that I want to print, in the format of a table.
headings = ["Name", "Age", "Favourite Colour"]
names = ["Barry", "Eustace", "Clarence", "Razputin", "Harvey"]
age = [39, 83, 90, 15, 23]
favouriteColour = ["Green", "Baby Pink", "Sky Blue", "Orange", "Crimson"]
I want the output to look like this: (where the column widths are a little more than the max length in that column)
Name        Age    Favourite Colour
Barry       39     Green
Eustace     83     Baby Pink
Clarence    90     Sky Blue
Razputin    15     Orange
Harvey      23     Crimson
I tried to do this:
mergeArr = [headings, names, age, favouriteColour]
but (I think) that won't print the headings in the right place?
I tried this:
mergeArr = [names, age, favouriteColour]
col_width = max(len(str(element)) for row in mergeArr for element in row) + 2
for row in mergeArr:
    print("".join(str(element).ljust(col_width) for element in row))
but that prints the data of each object in columns, rather than rows.
Help is appreciated! Thanks.
You'd print the heading on its own (the one with name, age, favourite colour).
Then you use the code you have, but with:
rows = zip(names, age, favouriteColour)
for row in rows...
You might also look into the tabulate package for nicely formatted tables.
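Here is a minimal sketch of the zip-based approach suggested above, assuming each column should be padded to its own longest entry plus two spaces. The helper name format_table is hypothetical and only used for illustration; it is not part of the question's code.

```python
headings = ["Name", "Age", "Favourite Colour"]
names = ["Barry", "Eustace", "Clarence", "Razputin", "Harvey"]
age = [39, 83, 90, 15, 23]
favouriteColour = ["Green", "Baby Pink", "Sky Blue", "Orange", "Crimson"]


def format_table(headings, *columns):
    """Return the table as a list of lines, one per row (hypothetical helper)."""
    # zip(*columns) turns the per-column lists into per-row tuples.
    rows = [headings] + [list(row) for row in zip(*columns)]
    # One width per column: the longest cell in that column, plus 2 for spacing.
    widths = [max(len(str(cell)) for cell in col) + 2 for col in zip(*rows)]
    return [
        "".join(str(cell).ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]


for line in format_table(headings, names, age, favouriteColour):
    print(line)
```

Computing one width per column (rather than one width for the whole table) keeps the narrow Age column from being padded as wide as Favourite Colour.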
Just adding the extra formatting:
ll = [headings] + list(zip(names, age, favouriteColour))
for l in ll:
    print("{:<10}\t{:<3}\t{:<16}".format(*l))
# Name Age Favourite Colour
# Barry 39 Green
# Eustace 83 Baby Pink
# Clarence 90 Sky Blue
# Razputin 15 Orange
# Harvey 23 Crimson
The parts in curly braces are replacement fields from Python's format-string syntax, while the TABs serve as delimiters. In sum, the .format() method looks for those curly braces inside the string to determine which values from the container l go where and how they should be formatted. For example, in the case of the headers, the following is what's happening:
headings = ["Name", "Age", "Favourite Colour"]
print("{:<10}\t{:<3}\t{:<16}".format(*headings))
We use the asterisk (*) in front of the list to unpack the elements inside that list.
The first curly brace is for the string "Name", and it is formatted with :<10, which means that it is left-aligned and padded with space characters if the string is shorter than 10 characters. In essence, it prints every character of the string and then adds spaces to its right until the field is 10 wide.
The second curly brace is for "Age" and is formatted with :<3.
The third curly brace is for "Favourite Colour" and is formatted with :<16.
All those strings are delimited with the TAB character.
The combination of the above steps inside the print function yields:
# Name Age Favourite Colour
I hope this proves useful.
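To make the padding described above visible, here is a small sketch that formats the same string with different alignment options. The variable names are only illustrative; the format specs themselves ("<", ">", "^" and an optional fill character) come from Python's standard format specification.

```python
# Each spec is [fill][align][width]; "<" is left, ">" is right, "^" is centre.
left = "{:<10}".format("Name")
right = "{:>10}".format("Name")
centre = "{:^10}".format("Name")
dotted = "{:.<10}".format("Name")  # "." used as the fill character

for text in (left, right, centre, dotted):
    # repr() makes the padding spaces visible in the output.
    print(repr(text))
```

Every result is exactly 10 characters long; only the position of the padding changes.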
Use zip(*iterables):
print(headings)
for row in zip(names, age, favouriteColour):
    print(row) # formatting is up to you :)
Jacob Krall is perfectly correct about using zip to combine your lists. Once you've done that, though, if you want your columns to align nicely (assuming you are using Python 3.x), take a look at the str.format() method, whose result you can pass to print(). It lets you specify field widths in your output.
I've written a program which takes in the name and age of multiple entries separated by a comma, then separates the letters from the digits and compares each name with a predefined set/list.
If an entry doesn't match the predefined data, the program sends an "incorrect entry" message along with the element which didn't match.
Here's the code:
from string import digits
print("enter name and age")
order=input("Seperate entries using a comma ',':")
order1=order.strip()
order2=order1.replace(" ","")
order_sep=order2.split()
removed_digits=str.maketrans('','',digits)
names=order.translate(removed_digits)
print(names)
names1=names.split(',')
names_list=['abby','chris','john','cena']
names_list=set(names_list)
for name in names1:
    if name not in names_list:
        print(f"{name}:doesnt match with predefined data")
The problem I'm having is that even when I enter chris or john, the program treats them as if they don't belong to the predefined list.
sample input : ravi 19,chris 20
output:ravi ,chris
ravi :doesnt match with predefined data
chris :doesnt match with predefined data
I also have another issue: I've written a part to eliminate whitespace, but I don't know why it doesn't eliminate it.
sample input:ravi , chris
ravi :doesnt match with predefined data
()chris :doesnt match with predefined data
There's a space where I've put the parentheses.
Any suggestion to tackle this problem and/or improve this code is appreciated!
I think some of the parts can be simplified, especially when removing the digits. As long as the input is entered with a space between the name and age, you can use split() twice. First to separate the entries with split(',') and next to separate out the ages with split(). It makes comparisons easier later if you store the names by themselves with no punctuation or whitespace around them. To print the names out from an iterable, you can use the str.join() function. Here is an example:
print("enter name and age")
order = input("Seperate entries using a comma ',': ")
names1 = [x.split()[0] for x in order.split(',')]
print(', '.join(names1))
names_list=['abby', 'chris', 'john', 'cena']
for name in names1:
    if name not in names_list:
        print(f"{name}:doesnt match with predefined data")
This will give the desired output:
enter name and age
Seperate entries using a comma ',': ravi 19, chris 20
ravi, chris
ravi:doesnt match with predefined data
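As for why the original code rejected chris: one likely reading is that translate() was applied to order rather than to the whitespace-free order2, so each name kept the space that used to separate it from the age ('chris ' is not 'chris'). Below is a minimal sketch that keeps the digit-stripping idea but strips whitespace per name. The helper name parse_names and the constant KNOWN_NAMES are hypothetical.

```python
from string import digits

KNOWN_NAMES = {"abby", "chris", "john", "cena"}


def parse_names(order):
    """Split 'name age, name age' input into bare names (hypothetical helper)."""
    remove_digits = str.maketrans("", "", digits)
    # Drop the digits first, then strip the whitespace left around each name.
    return [part.translate(remove_digits).strip() for part in order.split(",")]


names = parse_names("ravi 19, chris 20")
print(names)
for name in names:
    if name not in KNOWN_NAMES:
        print(f"{name}: doesn't match the predefined data")
```

Stripping each piece after the split also handles input such as "ravi , chris", where the spaces sit around the comma.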
I am currently working on a project where I need to specify whether a device is 2G or not, based on the bands listed in the Bands column. For example,
Device ID |Bands|2G(New added column)
123 |GSM 1800, GSM 700 |
124 | GSM 1800, GSM 700, GSM 1, LTE TDD |
125 | TD-SCDMA,1 SIM |
126 |GSM850 (GSM800),WCDMA FDD Band I,WCDMA FDD Band VIII,2 SIM |
So if the column "Bands" contains only "GSM" bands then the device is 2G (Y); otherwise, N.
I have tried using the re module but I am stuck at some point.
import re
import csv
...
two_G_only = []
...
with open('filepath.txt', "rU") as f:
    reader = csv.DictReader(f, delimiter = "|")
    for row in reader:
        ...
        ...
        if 'GSM' in row['Bands']:
            gsm_only = " ".join(re.findall("[a-zA-Z]+", row['Bands']))
            # I'm stuck here because I don't know how to test whether there is only GSM or something else
        else:
            two_G_only.append('N')
        ...
        ...
What do I need for the result
Device ID | Bands | 2G
123 | GSM 1800, GSM 700 | Y
124 |GSM 1800, GSM 700, GSM 1, LTE TDD | N
125 |TD-SCDMA,1 SIM | N
126 |GSM850 (GSM800),WCDMA FDD Band I,WCDMA FDD Band VIII,2 SIM|N
Thank you in advance; do comment if my question is unclear. I have already searched the solutions given on the site, but I am fairly sure none of the existing questions covers the same problem/concept.
You show data separated into columns with tabs or spaces, but your code indicates that you are using a vertical bar (|) as delimiter. I'm not sure which is right, but that's your problem.
Your condition, as I understand it, is to look at the various subfields in the second column, delimited with commas, and return one value (true) if each and every one of the subfields contains the text string 'GSM' anywhere in the subfield, but to return a different value (false) if at least one of the subfields DOES NOT contain that string. Right?
Let us then presume you have your csv reader in reader, as shown in your example. The for-row-in loop is correct, because you want to do this computation separately for every row.
for row in reader:
Within that loop, you need access to the Bands column:
bands = row['Bands']
In order to examine the subfields, let's use the basic str.split function, splitting the subfields by commas:
subfields = bands.split(',')
Now, let's convert that list of strings into a list of boolean values, and use Python's built-in any function to evaluate the entire list. We will do this with a list comprehension:
if any( [ ('GSM' not in band) for band in subfields ] ):
    _2g_or_not_2g = 'N'
else:
    _2g_or_not_2g = 'Y'
This if-statement will do roughly what it says: it will match if any one of the bands fails to contain 'GSM'.
There are some other ways you could write this code. For example, you could make the negative test into a positive test by using the Python all function. This would reverse the "sense" of the if statement, and switch the arms:
if all( [ 'GSM' in band for band in subfields ] ):
    _2g_or_not_2g = 'Y'
else:
    _2g_or_not_2g = 'N'
Also, you could use the ... if condition modifier on the list comprehension to filter the list down to a smaller list.
Finally, of course, you can start merging the expressions into one another - replace subfields with the actual split expression, etc.
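Putting those pieces together, here is a minimal end-to-end sketch. It assumes the file really is pipe-delimited with a column literally named Bands; the SAMPLE text and the helper name is_2g_only are hypothetical stand-ins for the real file and code.

```python
import csv
import io

# Hypothetical sample data standing in for the real pipe-delimited file.
SAMPLE = """Device ID|Bands
123|GSM 1800, GSM 700
124|GSM 1800, GSM 700, GSM 1, LTE TDD
125|TD-SCDMA,1 SIM
126|GSM850 (GSM800),WCDMA FDD Band I,WCDMA FDD Band VIII,2 SIM
"""


def is_2g_only(bands):
    """Return 'Y' if every comma-separated band mentions GSM, else 'N'."""
    subfields = [band.strip() for band in bands.split(",")]
    return "Y" if all("GSM" in band for band in subfields) else "N"


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter="|")
two_G_only = [is_2g_only(row["Bands"]) for row in reader]
print(two_G_only)
```

Note that an entry such as "2 SIM" counts as a non-GSM subfield under this rule, which is what the expected output in the question implies for devices 125 and 126.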
One thing to notice is that on every row, the bands are each separated by a comma.
You can take advantage of this.
The split() function can give you a list of strings for that row, each containing the name of a single band.
Now the problem is much simpler: If any individual band is missing the substring 'GSM', then that row is disqualified: Return 'N'.
If none of the bands in that row are disqualified (i.e. all contain 'GSM' in the name), then return 'Y' for the row.
You can use the find() function to see if a string contains a given substring.
For example 'LTE TDD'.find('GSM') returns the value -1, because it does not.
Notice that you do not even need to remove the device id - it can be part of the item that includes the first band. Keeping it simple: all you want to know is if, on any given row, all blocks of text (segregated by commas) contain the substring 'GSM' ..or not.
def is_GSM(bands):
    for band in bands:
        if band.find('GSM') == -1:
            return 'N'
    return 'Y'

for row in reader:
    # assumes each row is a raw line of text (a str)
    bands = row.split(',')
    two_G_only.append(is_GSM(bands))
```
def is_GSM(bands):
    for band in bands:
        if band.find('GSM') == -1:
            # "GSM" wasn't in the band name
            return 'N'
    # we looked at all the bands, and did not find a disqualifier..
    # This row must be "GSM" only 2G bands.
    return 'Y'

for row in reader:
    # assumes each row is a raw line of text (a str), not a csv.DictReader dict
    # not needed: first strip off the device_id in this row's string.
    # row = row[3:]
    bands = row.split(',')
    # ie: bands = ['124 GSM 1800', ' GSM 700', ' GSM 1', ' LTE TDD']
    # send this list to is_GSM(), and append the result
    two_G_only.append(is_GSM(bands))
```
If I have these names:
bob = "Bob 1"
james = "James 2"
longname = "longname 3"
And printing these gives me:
Bob 1
James 2
longname 3
How can I make sure that the numbers would be aligned (without using \t or tabs or anything)? Like this:
Bob     1
James   2
longname3
This is a good use for a format string, which can specify a width for a field to be filled with a character (including spaces). But, you'll have to split() your strings first if they're in the format at the top of the post. For example:
"{: <10}{}".format(*bob.split())
# output: 'Bob       1'
The < means left-align, and the space before it is the fill character used to pad the "empty" part of the field; it doesn't have to be a space. 10 is the field width, and the : separates the (empty) argument name from the format spec, so that <10 isn't read as the name of the argument to insert.
Based on your example, it looks like you want the width to be based on the longest name. In which case you don't want to hardcode 10 like I just did. Instead you want to get the longest length. Here's a better example:
names_and_nums = [x.split() for x in (bob, james, longname)]
longest_length = max(len(name) for (name, num) in names_and_nums)
format_str = "{: <" + str(longest_length) + "}{}"
for name, num in names_and_nums:
    print(format_str.format(name, num))
See: Format specification docs
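On Python 3.6 or later, the same idea can be sketched with an f-string, which lets the computed width be nested inside the format spec instead of being concatenated into the format string. This assumes each entry is exactly "name number", as in the question; the variable names pairs, width and lines are only illustrative.

```python
bob = "Bob 1"
james = "James 2"
longname = "longname 3"

pairs = [entry.split() for entry in (bob, james, longname)]
width = max(len(name) for name, _ in pairs)

# {name:<{width}} nests the computed width inside the format spec.
lines = [f"{name:<{width}}{num}" for name, num in pairs]
for line in lines:
    print(line)
```

Add 1 to width if you would rather keep a space between the longest name and its number.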
Given a list of actors, with their character name in brackets, separated by either a semi-colon (;) or comma (,):
Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda];
Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily];
Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist];
Alfie Bass [Harry]
How would I parse this into a list of two-tuples in the form of [(actor, character),...]?
--> [('Shelley Winters', 'Ruby'), ('Millicent Martin', 'Siddie'),
('Denholm Elliott', 'Mr. Smith; abortionist')]
I originally had:
actors = [item.strip().rstrip(']') for item in re.split('\[|,|;',data['actors'])]
data['actors'] = [(actors[i], actors[i + 1]) for i in range(0, len(actors), 2)]
But this doesn't quite work, as it also splits up items within brackets.
You can go with something like:
>>> re.findall(r'(\w[\w\s\.]+?)\s*\[([\w\s;\.,]+)\][,;\s$]*', s)
[('Shelley Winters', 'Ruby'),
('Millicent Martin', 'Siddie'),
('Julia Foster', 'Gilda'),
('Jane Asher', 'Annie'),
('Shirley Ann Field', 'Carla'),
('Vivien Merchant', 'Lily'),
('Eleanor Bron', 'Woman Doctor'),
('Denholm Elliott', 'Mr. Smith; abortionist'),
('Alfie Bass', 'Harry')]
One can also simplify some things with .*?:
re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', s)
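For reference, here is a self-contained sketch that runs the simplified pattern on the sample from the question. It assumes the cast list arrives as one string (here named data, a hypothetical variable) and that no character name contains a closing bracket.

```python
import re

# The sample cast list from the question, as a single string.
data = (
    "Shelley Winters [Ruby]; Millicent Martin [Siddie]; Julia Foster [Gilda];\n"
    "Jane Asher [Annie]; Shirley Ann Field [Carla]; Vivien Merchant [Lily];\n"
    "Eleanor Bron [Woman Doctor], Denholm Elliott [Mr. Smith; abortionist];\n"
    "Alfie Bass [Harry]"
)

# Group 1: actor name (lazy, so it stops before the bracket).
# Group 2: everything inside the brackets, separators included.
pairs = re.findall(r'(\w.*?)\s*\[(.*?)\][,;\s$]*', data)

for actor, character in pairs:
    print(actor, "->", character)
```

Because the separators are only consumed after a closing bracket, the semi-colon inside "Mr. Smith; abortionist" stays part of the character name.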
actorList = []
dataList = []
inputData = inputData.replace("];", "\n")
inputData = inputData.replace("],", "\n")
inputData = inputData[:-1]
for line in inputData.split("\n"):
    # strip() removes the spaces left around each name
    actorList.append(line.partition("[")[0].strip())
    dataList.append(line.partition("[")[2].strip())
togetherList = zip(actorList, dataList)
This is a bit of a hack, and I'm sure you can clean it up from here. I'll walk through this approach just to make sure you understand what I'm doing.
I am replacing both the ]; and the ], with a newline, which I will later use to split up every pair into its own line. Assuming your content isn't filled with erroneous ]; or ], sequences, this should work. However, you'll notice the last line will have a ] at the end because it didn't need a comma or semi-colon. Thus, I slice it off with the third line.
Then, just using the partition function on each line that we created within your input string, we assign the left part to the actor list, the right part to the data list and ignore the bracket (which is at position 1).
After that, Python's very useful zip function should finish the job for us by associating the ith element of each list together into a list of matched tuples.
I'm trying to parse the title tag in an RSS 2.0 feed into three different variables for each entry in that feed. Using ElementTree I've already parsed the RSS so that I can print each title [minus the trailing )] with the code below:
feed = getfeed("http://www.tourfilter.com/dallas/rss/by_concert_date")
for item in feed:
    print(repr(item.title[0:-1]))
I include that because, as you can see, the item.title is a repr() data type, which I don't know much about.
A particular repr(item.title[0:-1]) printed in the interactive window looks like this:
'randy travis (Billy Bobs 3/21'
'Michael Schenker Group (House of Blues Dallas 3/26'
The user selects a band and I hope to, after parsing each item.title into 3 variables (one each for band, venue, and date... or possibly an array or I don't know...) select only those related to the band selected. Then they are sent to Google for geocoding, but that's another story.
I've seen some examples of regex and I'm reading about them, but it seems very complicated. Is it? I thought maybe someone here would have some insight as to exactly how to do this in an intelligent way. Should I use the re module? Does it matter that the output is currently repr()s? Is there a better way? I was thinking I'd use a loop like (and this is my pseudoPython, just kind of notes I'm writing):
list = bandRaw,venue,date,latLong
for item in feed:
    parse item.title for bandRaw, venue, date
    if bandRaw == str(band)
        send venue name + ", Dallas, TX" to google for geocoding
        return lat,long
        list = list + return character + bandRaw + "," + venue + "," + date + "," + lat + "," + long
    else
In the end, I need to have the chosen entries in a .csv (comma-delimited) file looking like this:
band,venue,date,lat,long
randy travis,Billy Bobs,3/21,1234.5678,1234.5678
Michael Schenker Group,House of Blues Dallas,3/26,4321.8765,4321.8765
I hope this isn't too much to ask. I'll be looking into it on my own, just thought I should post here to make sure it got answered.
So, the question is, how do I best parse each repr(item.title[0:-1]) in the feed into the 3 separate values that I can then concatenate into a .csv file?
Don't let regex scare you off... it's well worth learning.
Given the examples above, you might try putting the trailing parenthesis back in, and then using this pattern:
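import re
s = 'Michael Schenker Group (House of Blues Dallas 3/26)'  # example title with the trailing parenthesis put back
pat = re.compile(r'([\w\s]+)\(([\w\s]+)(\d+/\d+)\)')
info = pat.match(s)
print(info.groups())
('Michael Schenker Group ', 'House of Blues Dallas ', '3/26')
To get at each group individually, just call them on the info object:
print(info.group(1))  # or info.groups()[0]
# strip() drops the trailing space captured by the first two groups
print('"%s","%s","%s"' % (info.group(1).strip(), info.group(2).strip(), info.group(3)))
"Michael Schenker Group","House of Blues Dallas","3/26"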
The hard thing about regex in this case is making sure you know all the known possible characters in the title. If there are non-alpha chars in the 'Michael Schenker Group' part, you'll have to adjust the regex for that part to allow them.
The pattern above breaks down as follows, which is parsed left to right:
([\w\s]+) : Match any word or space characters (the plus symbol indicates that there should be one or more such characters). The parentheses mean that the match will be captured as a group. This is the "Michael Schenker Group " part. If there can be numbers and dashes here, you'll want to modify the pieces between the square brackets, which are the possible characters for the set.
\( : A literal parenthesis. The backslash escapes the parenthesis, since otherwise it counts as a regex command. This is the "(" part of the string.
([\w\s]+) : Same as the one above, but this time matches the "House of Blues Dallas " part. In parentheses so they will be captured as the second group.
(\d+/\d+) : Matches the digits 3 and 26 with a slash in the middle. In parentheses so they will be captured as the third group.
\) : Closing parenthesis for the above.
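The breakdown above can also be written into the pattern itself using re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch, assuming the title has its closing parenthesis restored; the sample string s is taken from the question.

```python
import re

pat = re.compile(r"""
    ([\w\s]+)    # group 1: band name (word and space characters)
    \(           # a literal opening parenthesis
    ([\w\s]+)    # group 2: venue name
    (\d+/\d+)    # group 3: date, digits/digits
    \)           # a literal closing parenthesis
""", re.VERBOSE)

s = "Michael Schenker Group (House of Blues Dallas 3/26)"
band, venue, date = pat.match(s).groups()

# The first two groups keep a trailing space, so strip them before use.
print(band.strip(), "|", venue.strip(), "|", date)
```

One caveat with this pattern: \w also matches digits, so with a two-digit month the venue group can absorb the month's first digit. Requiring a space before the date, as the next answer does, avoids that.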
The python intro to regex is quite good, and you might want to spend an evening going over it http://docs.python.org/library/re.html#module-re. Also, check Dive Into Python, which has a friendly introduction: http://diveintopython3.ep.io/regular-expressions.html.
EDIT: See zacherates below, who has some nice edits. Two heads are better than one!
Regular expressions are a great solution to this problem:
>>> import re
>>> s = 'Michael Schenker Group (House of Blues Dallas 3/26'
>>> re.match(r'(.*) \((.*) (\d+/\d+)', s).groups()
('Michael Schenker Group', 'House of Blues Dallas', '3/26')
As a side note, you might want to look at the Universal Feed Parser for handling the RSS parsing as feeds have a bad habit of being malformed.
Edit
In regards to your comment... The strings occasionally being wrapped in "s rather than 's has to do with the fact that you're using repr. The repr of a string is usually delimited with 's, unless that string contains one or more 's, where instead it uses "s so that the 's don't have to be escaped:
>>> "Hello there"
'Hello there'
>>> "it's not its"
"it's not its"
Notice the different quote styles.
Regarding the repr(item.title[0:-1]) part, I'm not sure where you got that from, but I'm pretty sure you can simply use item.title. All you're doing is removing the last char from the string and then calling repr() on it, which only changes how the string is displayed (it adds the quotes).
Your code should look something like this:
import re
from geopy import geocoders  # from GeoPy
import feedparser  # from www.feedparser.org

us = geocoders.GeocoderDotUS()
feedurl = "http://www.tourfilter.com/dallas/rss/by_concert_date"
feed = feedparser.parse(feedurl)
lines = []
for entry in feed.entries:
    m = re.search(r'(.*) \((.*) (\d+/\d+)\)', entry.title)
    if m:
        bandRaw, venue, date = m.groups()
        if band == bandRaw:  # band is the name the user selected
            place, (lat, lng) = us.geocode(venue + ", Dallas, TX")
            # lat and lng are numbers, so convert them before joining
            lines.append(",".join([band, venue, date, str(lat), str(lng)]))
result = "\n".join(lines)
EDIT: replaced list with lines as the var name. list is a builtin and should not be used as a variable name. Sorry.
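If you would rather not build the comma-separated lines by hand, the standard csv module can do the quoting for you. Here is a minimal sketch of just the parse-and-write step, with no feed download or geocoding; the names titles, TITLE_RE and parse_title are hypothetical, and the sample titles come from the question.

```python
import csv
import io
import re

# Sample titles taken from the question, with the closing parenthesis kept.
titles = [
    "randy travis (Billy Bobs 3/21)",
    "Michael Schenker Group (House of Blues Dallas 3/26)",
]

TITLE_RE = re.compile(r"(.*) \((.*) (\d+/\d+)\)")


def parse_title(title):
    """Return (band, venue, date), or None if the title has another shape."""
    m = TITLE_RE.match(title)
    return m.groups() if m else None


rows = [parsed for parsed in map(parse_title, titles) if parsed]

# Write to an in-memory buffer; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["band", "venue", "date"])
writer.writerows(rows)
print(buffer.getvalue())
```

csv.writer also takes care of quoting a venue name that happens to contain a comma, which a plain ",".join() would not.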