Display list sorted alphabetically - python

We're trying to write a function that takes some data as input (an ID number, a name, and a number of columns containing the grades for different assignments), sorts it alphabetically by name, and then displays it with an added column showing the final grade (which we calculate with another function we made). We've tried the following code, but can't get it to work. The error message is "names = GRADESdata[:,1].tolist() TypeError: string indices must be integers".
Can anyone help us figure out how to get it working?
def listOfgrades(GRADESdata):
    names = GRADESdata[:,1].tolist()
    names = names.sort(names)
    assignments = GRADESdata[:,2::]
    final_grades = computeFinalGrades(GRADESdata)
    final_grades = np.array(final_grades.reshape(len(final_grades),1))
    List_of_grades = np.hstack((GRADESdata, final_grades))
    NOofColumns = np.size(GRADESdata,axis = 1)
    display = np.zeros(NOofColumns)
    for i in names:
        display = np.vstack((display,GRADESdata[GRADESdata[:,1] == i]))
    grades = display[1::,2:-1]
    gradesfinal = display[1::,-1]
    #Column titles
    c = {"Student ID": GRADESdata[1::,0], "Name": GRADESdata[1::,1]}
    for i in range(GRADESdata.shape[1]):
        c["Assign.{}".format(i+1)] = GRADESdata[:,i]
    c["Final grade"] = final_grades
    d = pd.DataFrame(c)
    print(d.to_string())
    display = np.array([student_list, names, assignments, final_grades])
    return display
The expected output is something like this (with the data below, of course):
ID number Name Assignment 1 Assignment 2 Final Grade
EDIT: the data input is a .csv file containing the following columns: ID number, Name, Assignment 1, Assignment 2, etc.

The comma in
names = GRADESdata[:,1].tolist()
is not valid here. GRADESdata is a string, and a string index must be an integer or a slice; [:,1] passes a tuple, which only works on something like a NumPy array.

From looking at .tolist(), I assume the data structure you're supposed to use is numpy.ndarray.
I managed to replicate the error with the following code:
print("12354"[:,1].tolist())
which makes sense if you're using a file name as input - and that's your mistake.
In order to fix this problem, you need to implement a string parser at the beginning or outside the function.
Add the following to your code at the beginning:
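# At this point GRADESdata is still the file name
with open(GRADESdata, "r") as file:
    data = file.read()
list1 = data.splitlines()  # split into lines, whatever the line separator is
list2 = [e.split(",") for e in list1 if e]  # skip empty lines (e.g. a trailing newline)
GRADESdata = np.array(list2)  # np is numpy, as in the rest of your code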
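As a rough illustration of the whole task (parse, sort by name, append a final-grade column), here is a minimal sketch that uses only the standard library. The sample rows are made up, and compute_final_grade is a hypothetical stand-in (a plain mean) for the computeFinalGrades function, which is not shown in the question.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of the .csv file
SAMPLE_CSV = """ID number,Name,Assignment 1,Assignment 2
s003,Carol,7,10
s001,Alice,12,10
s002,Bob,4,7
"""


def compute_final_grade(grades):
    # Placeholder only: a plain mean. The real computeFinalGrades is not shown.
    return sum(grades) / len(grades)


def list_of_grades(csv_file):
    """Return the rows sorted by name, with a final-grade column appended."""
    reader = csv.reader(csv_file)
    header = next(reader)                       # first line holds the column titles
    rows = [row for row in reader if row]       # skip empty lines
    rows.sort(key=lambda row: row[1].lower())   # column 1 is the name
    table = [header + ["Final grade"]]
    for row in rows:
        grades = [float(g) for g in row[2:]]    # columns 2 onward are the grades
        table.append(row + [str(compute_final_grade(grades))])
    return table


table = list_of_grades(io.StringIO(SAMPLE_CSV))
for line in table:
    print("\t".join(line))
```

With a real file, open(filename, newline="") could be passed in place of the io.StringIO object.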

Related

Unable to store values in an array from for loop in python

First I tried storing the values from a list named 'data' directly in an array variable 'c' using a loop, but 'None' got printed.
for i in data:
    print(i['name'])
    c=i['name']
Here print(i['name']) worked perfectly and the output appeared.
This is the working output
Then I printed c in order to print the values generated by the loop. The output came out as None.
print(c)
Then I tried another way, storing the values and indexing the array at the same time in a for loop. An error occurred which I was unable to resolve.
for i in data:
    b[c]=i['name']
    c=c+1
The error that appeared is as follows-
I have tried two ways; if there is any other way, please help me out, as I am new to Python.
It looks like the variable 'data' is a list of dictionaries.
If you want to add each name from those dictionaries to a list:
# create a new list variable
names = []
for i in data:
    name = i['name']
    print(name)
    # add the name to the list
    names.append(name)
# output the new list
print(names)
Assuming your data object here is a list like [{"name": "Mr. Green", ...}, {"name": "Mr. Blue", ...}].
If your goal is to end up with c == ["Mr. Green", "Mr. Blue"], then you're looking for something like:
c = []
for i in data:
    c.append(i['name'])
print(c)
Or you can accomplish tasks like these using a list comprehension:
c = [i['name'] for i in data]
print(c)
The first code example you posted iterates through the items in data and reassigns c to each item's name value; it does not add them to a list ("array"). Without knowing more about the code you ran to produce the screenshot and/or the contents of data, it's hard to say why print(c) produces None. I'd guess the last item in data is something like {"name": None, ...}, which is possible if the data comes from JSON and the value is null. Small note: I'd generally use .get("name") here instead, so that your program doesn't blow up if an item is missing a "name" key entirely.
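To make the .get("name") note concrete, here is a small sketch. The data list is made up for illustration: one item has a name, one has None (as a JSON null would load), and one has no "name" key at all.

```python
# Hypothetical data: a normal item, a None value, and a missing key
data = [{"name": "Mr. Green"}, {"name": None}, {"colour": "blue"}]

# .get returns None instead of raising KeyError when the key is missing
names = [item.get("name") for item in data]
print(names)

# Keep only the items that actually have a name
named = [item["name"] for item in data if item.get("name") is not None]
print(named)
```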
For your second code example, the error is different, but I think it comes from a similar misunderstanding: lists in Python work differently from primitives (things like numbers and strings). For the interpreter to know that b or c are supposed to be lists ("arrays"), they need to be created as lists, and lists have their own syntax and methods for mutation. For example, like arrays in other languages, lists are indexed by position, so b[c] = <something> only works if c is an integer and that position already exists. Something similar to your second example that also produces a list of names like mine above would be:
b = [None] * len(data)
c = 0
for i in data:
    b[c]=i['name']
    c=c+1
Note that if you only initialize b = [], you get an IndexError: list assignment index out of range on the initial assignment of b[0] = "some name" because the list is of size 0.
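A short sketch of that difference, using a made-up data list: assigning into an empty list fails, while assigning into a pre-sized list works. The use of enumerate here is just one way to get the counter without maintaining it by hand.

```python
# Hypothetical data for illustration
data = [{"name": "Mr. Green"}, {"name": "Mr. Blue"}]

# Index assignment into an empty list raises IndexError
b = []
try:
    b[0] = data[0]["name"]
except IndexError as exc:
    error = str(exc)
    print("IndexError:", error)

# Pre-sizing the list makes every position exist before it is assigned
b = [None] * len(data)
for c, item in enumerate(data):
    b[c] = item["name"]
print(b)
```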
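Add
b = []
above your first line of code, since the error is saying (correctly) that you have not defined the list you are appending to. Then use b.append(i['name']) rather than b[c] = i['name'], because index assignment into an empty list raises an IndexError.
I personally would use a list comprehension here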
b = [obj['name'] for obj in data]
where obj is i as you have defined it.

How can I add two or more elements to a list from a single input in Python?

What my script is doing now is adding elements to a list. For example, if the user types "JO", I will add "John" to the list. What I want to do now is that, if the user types "2 JO", I add two elements to the list: "John" and "John".
This is what the database looks like now:
Sample database copy.png
This is the code now:
import pandas
data = pandas.read_excel("Sample database copy.xlsx")
name = dict(zip(data["Abbreviation"],data["Name"]))
list1 = []
incoming_msg = input("Please type what you want to add: ")
list1.append(name[incoming_msg])
I need to do it all from the same input, I cannot ask separately for quantity and abbreviation. I wanted to know if there is any library that can do this somehow easily because I am a beginner coder. If there is no library but you have any idea how I could solve it, it would be awesome as well.
Thank you so much in advance!
You can use str.split() to split the string on whitespace into a list, then use the first element to multiply a one-element list containing the value from the dictionary, and add the result to the output list. See the code:
name = dict(zip(data["Abbreviation"],data["Name"]))
list1 = []
incoming_msg = input('Please type what you want to add: ')
incoming_msg = incoming_msg.split() # split the string by space
if len(incoming_msg) == 2: # if there are two elements in the list (number and name)
    list1 += [name[incoming_msg[1]]] * int(incoming_msg[0])
else:
    list1.append(name[incoming_msg[0]])
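The same idea can be wrapped in a small function that also rejects input it does not understand. This is only a sketch: the name dictionary below is a made-up stand-in for the one built from the Excel file, and expand is a hypothetical helper name.

```python
# Hypothetical stand-in for dict(zip(data["Abbreviation"], data["Name"]))
name = {"JO": "John", "MA": "Mary"}


def expand(message, lookup):
    """Turn "JO" or "2 JO" into a list of one or more names."""
    parts = message.split()
    if len(parts) == 2 and parts[0].isdigit():
        count, abbreviation = int(parts[0]), parts[1]
    elif len(parts) == 1:
        count, abbreviation = 1, parts[0]
    else:
        raise ValueError("expected 'ABBR' or 'N ABBR', got %r" % message)
    # Multiplying a one-element list repeats that element count times
    return [lookup[abbreviation]] * count


list1 = []
list1 += expand("2 JO", name)
list1 += expand("MA", name)
print(list1)
```

An unknown abbreviation still raises KeyError here; whether to catch that depends on how the script should react to typos.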

Best search algorithm to find 'similar' strings in excel spreadsheet

I am trying to figure out the most efficient way of finding values similar to a specific cell in a specified column (not all columns) of an Excel .xlsx document. The code I have currently assumes the strings are unsorted. However, the file I am using and the files I will be using all have strings sorted from A-Z. So instead of doing a linear search, I wonder what other search algorithm I could use (e.g. binary search), and how to fix my code.
So far I have created a function, find(). Before the function runs, the program takes a value from the user's input, which is then set as the sheet name. I print out all available sheet names in the Excel document just to help the user. I created an empty list, results, to store the results. I created a for loop that iterates through only column A, because I only want to iterate through a custom column. I created a variable called start that is the current coordinate in column A (e.g. A1 or A400); this changes depending on the iteration the loop is on. I created a variable called next that gets compared with start. next is technically just start + 1; however, since I can't add 1 to a string, I concatenate and type cast everything so that the iteration becomes a range from A1 to A100, or however many cells are in column A.
My function getVal() is called with two parameters: the coordinate of the cell and the worksheet we are working from. The value returned from getVal() is passed to my function similar(), which just calls SequenceMatcher() from difflib and returns the percentage of how similar two strings are. E.g. similar("hello", "helloo") returns about 90.9. If the two strings are more than 40 percent similar, the value is appended to the results list.
def setSheet(ws):
    sheet = wb[ws]
    return sheet
def getVal(coordinate, worksheet):
    value = worksheet[coordinate].value
    return value
def similar(first, second):
    percent = SequenceMatcher(None, first, second).ratio() * 100
    return percent
def find():
    column = "A"
    print("\n")
    print("These are all available sheets: ", wb.sheetnames)
    print("\n")
    name = input("What sheet are we working out of> ")
    results = []
    ws = setSheet(name)
    for i in range(1, ws.max_row):
        temp = str(column + str(i))
        x = ws[temp]
        start = ws[x].coordinate
        y = str(column + str(i + 1))
        next = ws[y].coordinate
        if(similar(getVal(start,ws), getVal(next,ws)) > 40):
            results.append(getVal(start))
    return results
This is some nasty looking code so I do apologize in advance. The expected results should just be a list of strings that are "similar".
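One possible simplification, offered as a sketch rather than a definitive fix: read the column into a plain list of strings first, then compare each value with the one that follows it. Because the column is sorted, strings that share a prefix sit next to each other, so a single pass over neighbours may be enough. The column_a list below is made up; with openpyxl the values would come from the worksheet cells instead. find_similar_neighbours is a hypothetical name.

```python
from difflib import SequenceMatcher


def similar(first, second):
    """Return how similar two strings are, as a percentage (0-100)."""
    return SequenceMatcher(None, first, second).ratio() * 100


def find_similar_neighbours(values, threshold=40):
    """Return pairs of adjacent values that are more than threshold percent similar."""
    results = []
    # zip pairs each value with the next one, so no coordinate arithmetic is needed
    for current, following in zip(values, values[1:]):
        if similar(current, following) > threshold:
            results.append((current, following))
    return results


# Hypothetical, already-sorted contents of column A
column_a = ["anna", "anne", "bob", "zed"]
pairs = find_similar_neighbours(column_a)
print(pairs)
```

This only catches similar strings that end up adjacent after sorting; strings that differ in their first characters would still need a wider comparison.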

Sorting on list values read into a list from a file

I am trying to write a routine to read values from a text file (names and scores) and then be able to sort the values A-Z by name, highest to lowest by score, etc. I am able to sort the data, but only by character position in the string, which is no good where names are different lengths. This is the code I have written so far:
ClassChoice = input("Please choose a class to analyse Class 1 = 1, Class 2 = 2")
if ClassChoice == "1":
    Classfile = open("Class1.txt",'r')
else:
    Classfile = open("Class2.txt",'r')
ClassList = [line.strip() for line in Classfile]
ClassList.sort(key=lambda s: s[x])
print(ClassList)
This is an example of one of the data files (Each piece of data is on a separate line):
Bob,8,7,5
Fred,10,9,9
Jane,7,8,9
Anne,6,4,8
Maddy,8,5,5
Jim, 4,6,5
Mike,3,6,5
Jess,8,8,6
Dave,4,3,8
Ed,3,3,4
I can sort on the name, but not on score 1, 2 or 3. Something obvious probably but I have not been able to find an example that works in the same way.
Thanks
How about something like this?
indexToSortOn = 0 # will sort on the first integer value of each line
classChoice = ""
while not classChoice.isdigit():
    classChoice = input("Please choose a class to analyse (Class 1 = 1, Class 2 = 2) ")
classFile = "Class%s.txt" % classChoice
with open(classFile, 'r') as fileObj:
    classList = [line.strip() for line in fileObj]
classList.sort(key=lambda s: int(s.split(",")[indexToSortOn+1]))
print(classList)
The key is to specify in the key function that you pass in what part of each string (the line) you want to be sorting on:
classList.sort(key=lambda s: int(s.split(",")[indexToSortOn+1]))
The conversion to an integer is important, as it ensures the sort is numeric instead of lexicographic (e.g. 100 > 2, but "100" < "2").
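A tiny sketch of that difference, using made-up score strings:

```python
scores = ["100", "2", "30"]

# Strings are compared character by character, so "100" sorts before "2"
as_text = sorted(scores)
print(as_text)

# key=int compares the numeric values instead
as_numbers = sorted(scores, key=int)
print(as_numbers)
```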
I think I understand what you are asking. I am not a sort expert, but here goes:
Assuming you would like the ability to sort the lines by either the name, the first int, second int or third int, you have to realize that when you are creating the list, you aren't creating a two dimensional list, but a list of strings. Due to this, you may wish to consider changing your lambda to something more like this:
ClassList.sort(key=lambda s: str(s).split(',')[x])
This assumes that the x is defined as one of the fields in the line with possible values 0-3.
The one issue I see with this is that the fields are still strings, so list.sort() compares them character by character and will place Fred's score of 10 before a score of 2. Converting the field with int() before comparing gives a numeric sort.
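Putting both points together, here is a sketch of a key function that sorts on any column, treating column 0 as a name and the others as integers. The lines are a subset of the sample data from the question, and sort_key is a hypothetical helper name.

```python
# A few lines from the sample file in the question
lines = ["Bob,8,7,5", "Fred,10,9,9", "Jane,7,8,9", "Jim, 4,6,5", "Ed,3,3,4"]


def sort_key(column):
    """Build a key function for the given column (0 = name, 1-3 = scores)."""
    def key(line):
        field = line.split(",")[column].strip()   # strip handles "Jim, 4,6,5"
        # Names compare as text, scores compare as numbers
        return field.lower() if column == 0 else int(field)
    return key


by_name = sorted(lines, key=sort_key(0))
by_first_score_desc = sorted(lines, key=sort_key(1), reverse=True)
print(by_name)
print(by_first_score_desc)
```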

Splitting json data in python

I'm trying to manipulate a list of items in Python, but I'm getting the error "AttributeError: 'list' object has no attribute 'split'".
I understand that a list does not support .split, but I don't know what else to do. Below is a copy-paste of the relevant part of my code.
tourl = 'http://data.bitcoinity.org/chart_data'
tovalues = {'timespan':'24h','resolution':'hour','currency':'USD','exchange':'all','mining_pool':'all','compare':'no','data_type':'price_volume','chart_type':'line_bar','smoothing':'linear','chart_types':'ccacdfcdaa'}
todata = urllib.urlencode(tovalues)
toreq = urllib2.Request(tourl, todata)
tores = urllib2.urlopen(toreq)
tores2 = tores.read()
tos = json.loads(tores2)
tola = tos["data"]
for item in tola:
    ting = item.get("values")
    ting.split(',')[2]  # <----- ERROR
    print(ting)
To understand what I'm trying to do, you will also need to see the JSON data. ting outputs this:
[
[1379955600000L, 123.107310846774], [1379959200000L, 124.092526428571],
[1379962800000L, 125.539504822835], [1379966400000L, 126.27024617931],
[1379970000000L, 126.723474983766], [1379973600000L, 126.242406356837],
[1379977200000L, 124.788410570987], [1379980800000L, 126.810084904632],
[1379984400000L, 128.270580796748], [1379988000000L, 127.892411269036],
[1379991600000L, 126.140579640523], [1379995200000L, 126.513705084746],
[1379998800000L, 128.695124951923], [1380002400000L, 128.709738051044],
[1380006000000L, 125.987767097378], [1380009600000L, 124.323433535528],
[1380013200000L, 123.359378559603], [1380016800000L, 125.963250678733],
[1380020400000L, 125.074618194444], [1380024000000L, 124.656345088853],
[1380027600000L, 122.411303435449], [1380031200000L, 124.145747100372],
[1380034800000L, 124.359452274881], [1380038400000L, 122.815357211394],
[1380042000000L, 123.057706915888]
]
[
[1379955600000L, 536.4739135], [1379959200000L, 1235.42506637],
[1379962800000L, 763.16329656], [1379966400000L, 804.04579319],
[1379970000000L, 634.84689741], [1379973600000L, 753.52716718],
[1379977200000L, 506.90632968], [1379980800000L, 494.473732950001],
[1379984400000L, 437.02095093], [1379988000000L, 176.25405034],
[1379991600000L, 319.80432715], [1379995200000L, 206.87212398],
[1379998800000L, 638.47226435], [1380002400000L, 438.18036666],
[1380006000000L, 512.68490443], [1380009600000L, 904.603705539997],
[1380013200000L, 491.408088450001], [1380016800000L, 670.275397960001],
[1380020400000L, 767.166941339999], [1380024000000L, 899.976089609997],
[1380027600000L, 1243.64963909], [1380031200000L, 1508.82429811],
[1380034800000L, 1190.18854705], [1380038400000L, 546.504592349999],
[1380042000000L, 206.84883264]
]
And ting[0] outputs this:
[1379955600000L, 123.187067936508]
[1379955600000L, 536.794013499999]
What I'm really trying to do is add up the values from ting[0] through ting[24] that come AFTER the second comma. This made me try a split, but that does not work.
You already have a list; the commas are put there by Python to delimit the values only when printing the list.
Just access element 2 directly:
print ting[2]
This prints:
[1379962800000, 125.539504822835]
Each of the entries in item['values'] (so ting) is a list of two numbers, a timestamp and a value, so you can address each of those with index 0 and 1:
>>> print ting[2][0]
1379962800000
>>> print ting[2][1]
125.539504822835
To get a list of all the second values, you could use a list comprehension:
second_vals = [t[1] for t in ting]
When you load the data with json.loads, it is already parsed into a real list that you can slice and index as normal. If you want the data starting with the third element, just use ting[2:]. (If you just want the third element by itself, just use ting[2].)
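Adding the values up then needs no string handling at all. A minimal sketch in Python 3 syntax, using the first three pairs from the question's output as sample data; which positions to include in the sum is an assumption about what was wanted:

```python
# First three [timestamp, value] pairs from the question's output
ting = [
    [1379955600000, 123.107310846774],
    [1379959200000, 124.092526428571],
    [1379962800000, 125.539504822835],
]

# Sum the second value of every pair
total = sum(pair[1] for pair in ting)
print(total)

# Sum only from the third pair onward, if that is what was meant
total_from_third = sum(pair[1] for pair in ting[2:])
print(total_from_third)
```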
