I have a list:
grades = ['doe john 100 90 80 90', 'miller sally 70 90 60 100 80', 'smith jakob 45 55 50 58', 'white jack 85 95 65 80 75']
I want to be able to break that list so the output would be:
['doe john 100 90 80 90']
['miller sally 70 90 60 100 80']
['smith jakob 45 55 50 58']
['white jack 85 95 65 80 75']
Additionally, I would like to split the elements in the list so it looks like:
['doe', 'john', '100', '90', '80', '90']
['miller', 'sally', '70', '90', '60', '100', '80']
['smith', 'jakob', '45', '55', '50', '58']
['white', 'jack', '85', '95', '65', '80', '75']
I'm not really sure how to go about doing this or if this is even possible as I'm just starting to learn python. Any ideas?
for i, l in enumerate(grades):
    grades[i] = l.split()
OR
final = [l.split() for l in grades]
See Split string on whitespace in Python
This can be done quickly with .split() in a list comprehension.
grades = ['doe john 100 90 80 90', 'miller sally 70 90 60 100 80', 'smith jakob 45 55 50 58', 'white jack 85 95 65 80 75']
grades = [grade.split() for grade in grades]
print(grades)
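Once split, the numeric fields are still strings. If you want to compute with them (a hypothetical next step, not part of the question), cast them to int first:

```python
grades = ['doe john 100 90 80 90', 'miller sally 70 90 60 100 80']
split_grades = [g.split() for g in grades]

# The first two fields are the name; the rest are scores (still strings).
for row in split_grades:
    name = ' '.join(row[:2])
    scores = [int(s) for s in row[2:]]
    print(name, sum(scores) / len(scores))
```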
I've been working on a lab project that takes in some TSV values from a "grade book", runs some calculations, and finally outputs them to a new report.txt file. But whenever I get to the end and print out my report.txt, the format of the file doesn't match what I'd expect from a row-by-row TSV file and is instead a giant list.
So my question is: how can I convert my list into a proper line-by-line TSV like the one below?
Barrett Edan 70 45 59 F
Bradshaw Reagan 96 97 88 A
Charlton Caius 73 94 80 B
Mayo Tyrese 88 61 36 D
Stern Brenda 90 86 45 C
Averages: midterm1 83.40, midterm2 76.60, final 61.60
My input is commented out in the StudentInfo.tsv= below
My current output is listed below the code.
# TODO: Declare any necessary variables here.
import csv
studentgrades=[]
s_grades=[]
all_grades=[]
all_grades_computed=[]
# TODO: Read a file name from the user and read the tsv file here.
#StudentInfo.tsv= Barrett Edan 70 45 59
# Bradshaw Reagan 96 97 88
# Charlton Caius 73 94 80
# Mayo Tyrese 88 61 36
# Stern Brenda 90 86 45
with open('StudentInfo.tsv', 'r') as file:
    studentgrades = csv.reader(file, delimiter='\t')
    sgrades = list(studentgrades)
for row in sgrades:
    avg2 = 0
    rowstr = ' '
    for i in row[2:]:
        avg2 += int(i)
        #print(i)
    #print(avg2)
    studentavg = float(int(avg2)/len(row[2:]))
    if studentavg >= 90:
        s_grades.append('A')
    elif 80 <= studentavg < 90:
        s_grades.append('B')
    elif 70 <= studentavg < 80:
        s_grades.append('C')
    elif 60 <= studentavg < 70:
        s_grades.append('D')
    else:
        s_grades.append('F')
    print('{} average: {:.2f}'.format(rowstr.join(row[0:2]), studentavg))
    #print('test length', len(row[2:]))
    #print(row[2:])
print()
print(s_grades)
print(sgrades)
# for i in s_grades:
#     sgrades.append(i)
# print(sgrades)
for i in range(len(s_grades)):
    all_grades.append(str(sgrades[i])+str(s_grades[i]))
print(all_grades)
print()
# TODO: Compute student grades and exam averages, then output results to a text file here.
student1=sgrades[0]
student2=sgrades[1]
student3=sgrades[2]
student4=sgrades[3]
student5=sgrades[4]
print('Averages:',end=' ')
midterm1=(int(student1[2])+int(student2[2])+int(student3[2])+int(student4[2])+int(student5[2]))/len(sgrades)
#print(student1[2])
print('midterm1','{:.2f},'.format(midterm1),end=' ')
midterm2=(int(student1[3])+int(student2[3])+int(student3[3])+int(student4[3])+int(student5[3]))/len(sgrades)
#print(student1[3])
print('midterm2','{:.2f},'.format(midterm2),end=' ')
final=(int(student1[4])+int(student2[4])+int(student3[4])+int(student4[4])+int(student5[4]))/len(sgrades)
print('final','{:.2f}'.format(final))
all_grades_computed=['Averages:','midterm1','{:.2f},'.format(midterm1),'midterm2','{:.2f},'.format(midterm2), 'final','{:.2f}'.format(final)]
with open('report.txt', 'w+') as report:
    csv_writer = csv.writer(report, delimiter='\t')
    csv_writer.writerow(all_grades)
    csv_writer.writerow(all_grades_computed)
with open('report.txt', 'r') as report:
    reports = csv.reader(report, delimiter='\t')
    reportlist = list(reports)
print(reportlist)
Barrett Edan average: 58.00
Bradshaw Reagan average: 93.67
Charlton Caius average: 82.33
Mayo Tyrese average: 61.67
Stern Brenda average: 73.67
['F', 'A', 'B', 'D', 'C']
[['Barrett', 'Edan', '70', '45', '59'], ['Bradshaw', 'Reagan', '96', '97', '88'], ['Charlton', 'Caius', '73', '94', '80'], ['Mayo', 'Tyrese', '88', '61', '36'], ['Stern', 'Brenda', '90', '86', '45']]
["['Barrett', 'Edan', '70', '45', '59']F", "['Bradshaw', 'Reagan', '96', '97', '88']A", "['Charlton', 'Caius', '73', '94', '80']B", "['Mayo', 'Tyrese', '88', '61', '36']D", "['Stern', 'Brenda', '90', '86', '45']C"]
Averages: midterm1 83.40, midterm2 76.60, final 61.60
[["['Barrett', 'Edan', '70', '45', '59']F", "['Bradshaw', 'Reagan', '96', '97', '88']A", "['Charlton', 'Caius', '73', '94', '80']B", "['Mayo', 'Tyrese', '88', '61', '36']D", "['Stern', 'Brenda', '90', '86', '45']C"], ['Averages:', 'midterm1', '83.40,', 'midterm2', '76.60,', 'final', '61.60']]
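The giant-list output above comes from `str(sgrades[i]) + str(s_grades[i])`, which turns each row list into a string, and from passing the whole `all_grades` list to a single `writerow` call. A minimal sketch of the fix (using sample rows in place of the real StudentInfo.tsv data): keep each row as a list, append the letter grade as one more field, and write one row per `writerow`.

```python
import csv

# Sample data standing in for the parsed StudentInfo.tsv rows.
sgrades = [['Barrett', 'Edan', '70', '45', '59'],
           ['Bradshaw', 'Reagan', '96', '97', '88']]
s_grades = ['F', 'A']

# Append the letter grade as an extra field; the rows stay lists.
all_grades = [row + [grade] for row, grade in zip(sgrades, s_grades)]

with open('report.txt', 'w', newline='') as report:
    writer = csv.writer(report, delimiter='\t')
    writer.writerows(all_grades)  # one tab-separated line per student
```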
I have a data frame that looks like this:
import pandas as pd

data = {'State': ['24', '24', '24',
                  '24', '24', '24', '24', '24', '24', '24', '24', '24'],
        'County code': ['001', '001', '001',
                        '001', '002', '002', '002', '002', '003', '003', '003', '003'],
        'TT code': ['123', '123', '123',
                    '123', '124', '124', '124', '124', '125', '125', '125', '125'],
        'BLK code': ['221', '221', '221',
                     '221', '222', '222', '222', '222', '223', '223', '223', '223'],
        'Age Code': ['1', '1', '2', '2', '2', '2', '2', '2', '2', '1', '2', '1']}
df = pd.DataFrame(data)
Essentially I want to keep only the TT codes where the Age Code is 2 and there are no 1's. So I just want the data frame where:
'State': ['24', '24', '24', '24'],
'County code': ['002','002','002','002',],
'TT code': ['124','124','124','124',],
'BLK code': ['222','222','222','222'],
'Age Code': ['2','2','2','2']
is there a way to do this?
IIUC, you want to keep only the TT groups where there are only Age groups with value '2'?
You can use groupby.transform('all') on the boolean Series:
df[df['Age Code'].eq('2').groupby(df['TT code']).transform('all')]
output:
  State County code TT code BLK code Age Code
4    24         002     124      222        2
5    24         002     124      222        2
6    24         002     124      222        2
7    24         002     124      222        2
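An equivalent spelling (a variation, not part of the original answer) uses groupby.filter, which is usually slower than transform but may read more naturally:

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['24'] * 12,
    'County code': ['001'] * 4 + ['002'] * 4 + ['003'] * 4,
    'TT code': ['123'] * 4 + ['124'] * 4 + ['125'] * 4,
    'BLK code': ['221'] * 4 + ['222'] * 4 + ['223'] * 4,
    'Age Code': ['1', '1', '2', '2', '2', '2', '2', '2', '2', '1', '2', '1'],
})

# Keep only the TT groups in which every Age Code equals '2'.
out = df.groupby('TT code').filter(lambda g: g['Age Code'].eq('2').all())
print(out)
```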
This should work.
df111['Age Code'] = "2"
I am just wondering why strings were chosen for values that are integers.
This question already has answers here:
Printing Lists as Tabular Data
(20 answers)
Closed 1 year ago.
I have the following 2D list:
table = [['Position', 'Club', 'MP', 'GD', 'Points'],
['1', 'Man City', '38', '51', '86'],
['2', 'Man Utd', '38', '29', '74'],
['3', 'Liverpool', '38', '26', '69'],
['4', 'Chelsea', '38', '22', '67'],
['5', 'Leicester', '38', '18', '66']]
I am wanting to print it so that the format is as following:
Position       Club           MP             GD             Points
1              Man City       38             51             86
2              Man Utd        38             29             74
3              Liverpool      38             26             69
4              Chelsea        38             22             67
5              Leicester      38             18             66
My issue is with the even spacing. My attempt at solving this was using:
for i in range(len(table)):
    print(*table[i], sep=" "*(15-len(table[i])))
However, I realised that the problem is that len(table[i]) gives the number of items in each row, rather than the length of each individual item, which is what it would take to make the spacing even - I think.
How can I get my desired format? And is my approach okay or is there a much better way of approaching this?
I have looked at this - 2D list output formatting - which helped with some aspects, but I don't think it addresses the even-spacing problem.
Any help would be much appreciated, thank you!
You can use str.format for formatting the table (documentation):
table = [
["Position", "Club", "MP", "GD", "Points"],
["1", "Man City", "38", "51", "86"],
["2", "Man Utd", "38", "29", "74"],
["3", "Liverpool", "38", "26", "69"],
["4", "Chelsea", "38", "22", "67"],
["5", "Leicester", "38", "18", "66"],
]
format_string = "{:<15}" * 5
for row in table:
    print(format_string.format(*row))
Prints:
Position       Club           MP             GD             Points
1              Man City       38             51             86
2              Man Utd        38             29             74
3              Liverpool      38             26             69
4              Chelsea        38             22             67
5              Leicester      38             18             66
You should use the string formatting approach, but the problem with your current solution is that you need to consider the length of each item in the row. So you want something like:
for row in table:
    for item in row:
        print(item, end=' '*(15 - len(item)))
    print()
Note, you almost never want to use for index in range(len(some_list)):, instead, iterate directly over the list. Even in the cases where you do need the index, you almost certainly would end up using enumerate instead of range. Here's the equivalent using ranges... but again, you wouldn't do it this way, it isn't Pythonic:
for i in range(len(table)):
    for j in range(len(table[i])):
        print(table[i][j], end=' '*(15 - len(table[i][j])))
    print()
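Both answers hard-code a width of 15. If you'd rather derive each column's width from the data itself (a variation, not part of the original answers), compute the longest item per column first:

```python
table = [['Position', 'Club', 'MP', 'GD', 'Points'],
         ['1', 'Man City', '38', '51', '86'],
         ['2', 'Man Utd', '38', '29', '74']]

# Width of each column = length of the longest item in that column.
widths = [max(len(row[col]) for row in table) for col in range(len(table[0]))]
for row in table:
    print('  '.join(item.ljust(w) for item, w in zip(row, widths)))
```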
Question
How can I run through the string so that, when the locationRegex condition is met, its output is added to a dictionary, any subsequent numbers from numberRegex are added to the same dictionary, and a new entry is created when the next location arrives? As shown in the desired output.
Code
import re
# Text to check
text = "Italy Roma 20 40 10 4902520 10290" \
"Italy Milan 20 10 49 20 1030" \
"Germany Berlin 20 10 10 10 29 490" \
"Germany Frankfurt 20 0 0 0 0" \
"Luxemburg Luxemburg 20 10 49"
# regex to find location
locationRegex = re.compile(r'[A-Z]\w+\s[A-Z]\w+')
# regex to find numbers
numberRegex = re.compile(r'[0-9]+')
# Desired output
locations = {'Italy Roma': {'numbers': [10, 40, 10, 4902520]},
'Italy Milan': {'numbers': [20, 10, 49, 20, 1030]}}
What I have tried
I have run the regexes against the string with re.findall; however, I have the issue of assigning the numbers to the locations, as they sit in two separate pots of locations and numbers.
Use a single regex to split the text into chunks, use groups within the regex to separate the data (note the parentheses), and finally use split to break the number string on the spaces:
import re
text = (
"Italy Roma 20 40 10 4902520 10290"
"Italy Milan 20 10 49 20 1030"
"Germany Berlin 20 10 10 10 29 490"
"Germany Frankfurt 20 0 0 0 0"
"Luxemburg Luxemburg 20 10 49"
)
line_regex = re.compile(r"([A-Z]\w+\s[A-Z]\w+) ([0-9 ]+)")
loc_dict = {}
for match in re.finditer(line_regex, text):
    print(match.group(1))
    print(match.group(2))
    loc_dict[match.group(1)] = {"numbers": match.group(2).split(" ")}
print(loc_dict)
The dict will be:
{'Italy Roma': {'numbers': ['20', '40', '10', '4902520', '10290']},
'Italy Milan': {'numbers': ['20', '10', '49', '20', '1030']},
'Germany Berlin': {'numbers': ['20', '10', '10', '10', '29', '490']},
'Germany Frankfurt': {'numbers': ['20', '0', '0', '0', '0']},
'Luxemburg Luxemburg': {'numbers': ['20', '10', '49']}}
Note that you should check for edge cases: no numbers, cities with a space in the name and so on.
Cheers!
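If you want the numbers as ints, as in the desired output shown in the question, you can map int over the split pieces (a sketch using the same regex as the answer above):

```python
import re

text = ("Italy Roma 20 40 10 4902520 10290"
        "Italy Milan 20 10 49 20 1030")

line_regex = re.compile(r"([A-Z]\w+\s[A-Z]\w+) ([0-9 ]+)")
# Build the dict in one comprehension, casting each number to int.
loc_dict = {m.group(1): {"numbers": [int(n) for n in m.group(2).split()]}
            for m in line_regex.finditer(text)}
print(loc_dict)
```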
I have a data dictionary like this:
data = {
    'new_value': [
        '100', '100',
        '250', '250',
        '250', '50',
        '90', '90',
        '90', '90'
    ],
    'prev_value': [
        'None', 'None',
        'None', 'None',
        'None', 'None',
        'None', 'None',
        'None', 'None'
    ]
}
  new_value prev_value
0       100       None
1       100       None
2       250       None
3       250       None
4       250       None
5        50       None
6        90       None
7        90       None
8        90       None
9        90       None
And I would expect to get another dictionary exp_result like this:
exp_result = {
    'new_value': [
        '100', '100',
        '250', '250',
        '250', '50',
        '90', '90',
        '90', '90'
    ],
    'prev_value': [
        '100', '100',
        '100', '100',
        '100', '250',
        '50', '50',
        '50', '50'
    ]
}
  new_value prev_value
0       100        100
1       100        100
2       250        100
3       250        100
4       250        100
5        50        250
6        90         50
7        90         50
8        90         50
9        90         50
I tried the pandas.Series.shift() function, but my data isn't periodic, and now I have no idea.
The idea is to convert all values except the last of each consecutive run to missing values with Series.mask, then use Series.shift to move those values into the next group, forward-fill the missing values from previous values with ffill, and finally replace the leading missing values with the originals using fillna:
import pandas as pd

df = pd.DataFrame(data)
m = df['new_value'].shift(-1).eq(df['new_value'])
df['prev_value'] = df['new_value'].mask(m).shift().ffill().fillna(df['new_value'])
print(df)
  new_value prev_value
0       100        100
1       100        100
2       250        100
3       250        100
4       250        100
5        50        250
6        90         50
7        90         50
8        90         50
9        90         50
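To see what each step of the chain contributes, the intermediates can be kept as separate named variables (a sketch using the same new_value data as above):

```python
import pandas as pd

df = pd.DataFrame({'new_value': ['100', '100', '250', '250', '250',
                                 '50', '90', '90', '90', '90']})

m = df['new_value'].shift(-1).eq(df['new_value'])  # True everywhere except at the last row of each run
last_of_run = df['new_value'].mask(m)              # only the last value of each run survives
shifted = last_of_run.shift()                      # push that value down into the next run
df['prev_value'] = shifted.ffill().fillna(df['new_value'])
print(df)
```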