Hi, I am new to coding and have just started. I have seen many examples of the same error, but I am unsure how to apply the fixes to my code. I am trying to sort a text file in order of score. This is my current code:
    ScoresFile = open("Top Scores.txt","r")
    newScoresRec = []
    ScoresRec = []
    for row in ScoresFile:
        ScoresRec = row.split(",")
        username = ScoresRec[0]
        Bestscores = int(ScoresRec[1])
        newScoresRec.append(username)
        newScoresRec.append(Bestscores)
        ScoresRec.append(newScoresRec)
        newScoresRec = []
    sortedTable = sorted(ScoresRec,key=lambda x:x[1])
    for n in range (len(sortedTable)):
        print(sortedTable[n][0],sortedTable[n][1])
    ScoresFile.close()
The text file is just in the simple format of:
'username','score'-
example: BO15,78
Any help would be greatly appreciated thanks.
Try this and let me know if it works:
    scores_rec = []
    with open("Top Scores.txt", "r") as scores_file:
        lines = scores_file.readlines()
    for line in lines:
        s_line = line.rstrip().split(",")
        scores_rec.append([s_line[0], int(s_line[1])])
    sorted_table = sorted(scores_rec, key=lambda x: x[1])
    for item in sorted_table:
        print(item[0], item[1])
Part of the issue is that you're appending back into the split list,
ScoresRec = row.split(",")
...
ScoresRec.append(newScoresRec)
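and not building a list of lists. At the end of the loop, ScoresRec holds only the last line of the file (split into strings) plus the [username, score] list you appended to it, and that is what gets sorted.
Therefore, you are sorting a list made of a couple of strings and a list. The key x[1] returns a single character for each string and an int for the nested list, and Python 3 refuses to order a str against an int, thus the error.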
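To see this for yourself, here is a small sketch that rebuilds what ScoresRec looks like after the last row and tries the same sort. The row value comes from the example in the question; the variable names keys and error are only for illustration.

```python
# Reproduce what the question's loop leaves in ScoresRec after the last row.
row = "BO15,78\n"
scores_rec = row.split(",")        # ['BO15', '78\n']
scores_rec.append(["BO15", 78])    # the [username, score] list lands inside the split list

# The sort key x[1] picks a character from each string and the int from the list.
keys = [x[1] for x in scores_rec]
print(keys)

try:
    sorted(scores_rec, key=lambda x: x[1])
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
else:
    error = None
```

The keys are a mix of str and int, which is why the sort fails under Python 3.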
Text file (file.txt):
"MORETHANSMALLER",2
"BIGGER",10
"SMALLER",1
"UNDERBIGGER",9
"MIDDLE",5
Code (csv_reader.py):
    import csv
    with open('file.txt', newline='') as csvfile:
        rows = list(csv.reader(csvfile, delimiter=',', quotechar='"'))
    rows.sort(key = lambda x: int(x[1]))
    print('From 1 to 10:')
    print(rows)
    rows.reverse()
    print('From 10 to 1:')
    print(rows)
Result:
From 1 to 10:
[['SMALLER', '1'], ['MORETHANSMALLER', '2'], ['MIDDLE', '5'], ['UNDERBIGGER', '9'], ['BIGGER', '10']]
From 10 to 1:
[['BIGGER', '10'], ['UNDERBIGGER', '9'], ['MIDDLE', '5'], ['MORETHANSMALLER', '2'], ['SMALLER', '1']]
Don't parse CSV files manually. Use python CSV libraries. That will help you to avoid traps with quotes.
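One example of such a trap, as a rough sketch: a quoted field that itself contains a comma. The sample data below is made up for illustration and is read from memory with io.StringIO rather than from a file.

```python
import csv
import io

# Hypothetical sample: the first username contains a comma inside the quotes.
data = '"SMITH, J",10\n"BIGGER",2\n'

# Manual splitting cuts the quoted field into two pieces.
manual = [line.split(",") for line in data.splitlines()]

# csv.reader respects the quotes and returns two fields per row.
parsed = list(csv.reader(io.StringIO(data)))
parsed.sort(key=lambda row: int(row[1]))

print(manual[0])
print(parsed)
```

With the manual split the score is no longer at index 1 for the first row, so sorting on x[1] would break; with csv.reader it stays where you expect it.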
I am trying to solve a problem in which I have to read a .csv file into a list of lists (list1) and then use the map function to extract the desired output from list1 into another list (list2).
the csv file contains data like
Last name, First name, Final, Grade
Alfalfa, Aloysius,49, D-
Alfred, University,48, D+
After reading the .csv into a list, I have to check each student's marks to decide whether the student is selected, by using map on list1.
This is my code:
    import csv
    from curses.ascii import isdigit
    def selection(lis):
        for x in lis:
            if(x.isdigit() and int(x) > 50):
                return lis
    list1=[]
    list2=[]
    with open('D:\C++\Programs\Advanced Programming\grades.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader)
        for line in csv_reader:
            list1.append(line)
    for i in list1:
        r = map(selection, i)
        R = list(r)
        list2.append(R)
    print(list1)
    print(list2)
list1 prints correctly
[['Alfalfa', ' Aloysius', '49', ' D-'], ['Alfred', ' University', '48', ' D+']....]
But my list2 is printing
[[None, None, None, None], [None, None, None, None].....]
I do not understand how to use map on a list of lists. Why is it printing None? Please help me solve it.
Just update your selection function to:
    def selection(lis):
        return lis if (lis.isdigit() and int(lis) > 50) else None
map sends a single item to the function, not the entire list.
Output will be
[['boo', 'foo', '53', 'a']]
[[None, None, '53', None]]
You can return something other than None from the else branch of the selection function if you want.
Issue #1
The first problem is that you are going a little too deep. The code
    for i in list1:
        r = map(selection, i)
is sending individual list items to selection when you are expecting the entire list to be sent to selection. You want to change that code to just be
r = map(selection, list1)
How this results in a bunch of Nones
If you add some print statements to debug what is going on, you would see that
for i in list1: makes i a single list such as ['Alfalfa', ' Aloysius', '49', ' D-']
Then map sends each item from our list i to the selection function. So def selection(lis): doesn't actually receive a list; it receives an item from the list such as Alfalfa or 49.
Then we have for x in lis: where we check each character in Alfalfa to see if it is a number greater than 50. Is A greater than 50? No. Is l greater than 50? No. And so on. The for loop finishes, and whenever a function finishes without returning anything, None is returned. Map then moves on to the next item in the list until it gets to a number such as 51, where it will check each character, a 5 and a 1 in this case. Since 5 is not greater than 50 and 1 is not greater than 50, we continue on. As you can see, you will never end up with a number greater than 50, since we are in a little too deep, checking each individual character instead of the whole item.
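A quick way to check what map actually hands to your function is a small probe like the one below. The helper name show is made up for this sketch; the row is the first one from the question.

```python
def show(item):
    # Report the type and value that map passes in on each call.
    return type(item).__name__, item

row = ['Alfalfa', ' Aloysius', '49', ' D-']
rows = [row]

per_item = list(map(show, row))    # map over one row: one call per string
per_row = list(map(show, rows))    # map over the list of rows: one call per row

print(per_item)
print(per_row)
```

Mapping over a single row calls the function with strings, while mapping over the list of rows calls it with whole rows, which is what the selection function was written to expect.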
Issue #2
The second problem is you want to use filter instead of map to ignore anything that returns None. filter will loop over each item sending each item to selection and will add the item to our list r only if selection returns something true.
The following code does what you want.
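    import csv
    def selection(lis):
        # lis is one row; keep it if any field is a number greater than 50
        for x in lis:
            if(x.isdigit() and int(x) > 50):
                return True
        return False
    list1=[]
    list2=[]
    # raw string so the backslashes in the Windows path are taken literally
    with open(r'D:\C++\Programs\Advanced Programming\grades.csv', 'r') as csv_file:
        csv_reader = csv.reader(csv_file)
        next(csv_reader)
        for line in csv_reader:
            list1.append(line)
    # filter passes each row of list1 to selection
    r = filter(selection, list1)
    list2 = list(r)
    print(list1)
    print(list2)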
How I'd solve this type of problem
Below is how I would go about solving this problem. I'm using list comprehension instead of map or filter.
    import csv
    csv_file = r'D:\C++\Programs\Advanced Programming\grades.csv'
    with open(csv_file) as fh:
        csv_reader = csv.reader(fh)
        headers = next(csv_reader)
        data = list(csv_reader)
    # each x is a row; the Final mark is its third field
    passing_grades = [x for x in data if x[2].strip().isdigit() and int(x[2]) > 50]
    print('passing grades: ', passing_grades)
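If you want to try this without the file on disk, here is a self-contained sketch that runs on an in-memory copy of the sample rows. The third data row and the helper name passed are made up so that at least one row passes; it assumes the Final mark is always the third column.

```python
import csv
import io

# In-memory stand-in for grades.csv; the last row is invented for the example.
text = (
    "Last name, First name, Final, Grade\n"
    "Alfalfa, Aloysius,49, D-\n"
    "Alfred, University,48, D+\n"
    "Example, Student,53, C\n"
)

def passed(row):
    # The Final mark is the third column; strip stray spaces before testing it.
    final = row[2].strip()
    return final.isdigit() and int(final) > 50

reader = csv.reader(io.StringIO(text))
next(reader)                       # skip the header row
data = list(reader)

with_filter = list(filter(passed, data))
with_comprehension = [row for row in data if passed(row)]

print(with_filter)
```

Both filter and the list comprehension keep the same rows here; which one to use is mostly a matter of taste.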
I apologize for the confusing title. I'm very new to Python and here's what I'm trying to achieve:
I'm parsing a file file.txt that has data like this (and other stuff):
file.txt:
...
a = (
1
2
3 )
...
I need to store this type of data in 2 parts:
name = "a"
value = {"(", "1", "2", "3 )"}
^ each line is an element of the list
I'm parsing the file line by line as shown in the snippet below and I can't change that. I'm not sure how to store the data this way by looking ahead a few lines, storing their values and then skipping them so that they're not processed twice. I want the 2 variables name and value populated when the loop is at the first line "a = "
    with open("file.txt") as fp:
        for line in fp:
            ...
Thanks for the help.
I suggest using a dictionary:
    txt=open(r"file.txt","r").readlines()
    dictionary=dict()
    for i in range(len(txt)):
        if "=" in txt[i]:
            name,values=txt[i].split()[0],[txt[i].split()[-1]]
            dictionary[name],i={"name":name},i+1
            while True:
                values.append(txt[i])
                if ")" in txt[i]:
                    break
                i=i+1
            values=[value.replace("\n","") for value in values]
            dictionary[name].update({"values":values})
            i=i-1
        i=i+1
>>dictionary["a"]
Out[40]: {'name': 'a', 'values': ['(', '1', '2', '3 )']}
>>dictionary["b"]
Out[45]: {'name': 'b', 'values': ['(', '3', '4', '6 )']}
So, you parse the file line by line. Whenever you find an equal sign "=" in a line, the text before the "=" is the name value you want. Then the next line is the first element of the list, the line after that the second element, and so on. When a line contains the character ")", it is the last value of the list.
See the Python str.find method for this. Try to understand the concept and the coding shouldn't be hard.
[u'a']
['(', '1', '2', '3', ')']
Is this what you need?
Then you can follow these lines of code:
    name = []
    value = []
    with open("file.txt") as fp:
        for line in fp:
            words = line.split()
            if ('(') in words:
                # Python 2: decode gives the unicode name shown above
                name.append(words[0].decode('utf-8'))
                value.append('(')
            else:
                for entry in words:
                    value.append(entry)
    print (name)
    print (value)
If the file is not too large, read whole file into memory then use a while loop to do a finer-grained control:
    # python3
    with open("file.txt") as f:
        lines = f.readlines()
    index = 0
    while index < len(lines):
        line = lines[index]
        # do something here; move index forward by more than one to skip lines
        index += 1
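As a rough sketch of what the "do something here" part could look like for the format in the question: the function name parse_blocks is made up, and it assumes each block starts on a line containing "=" and ends on the first line containing ")".

```python
def parse_blocks(lines):
    """Collect (name, values) pairs from lines such as 'a = (' ... '3 )'."""
    pairs = []
    index = 0
    while index < len(lines):
        line = lines[index].strip()
        if "=" in line:
            name, first = (part.strip() for part in line.split("=", 1))
            values = [first]
            # Consume the following lines until one contains the closing ')'.
            while ")" not in values[-1] and index + 1 < len(lines):
                index += 1
                values.append(lines[index].strip())
            pairs.append((name, values))
        index += 1
    return pairs

sample = ["...\n", "a = (\n", "1\n", "2\n", "3 )\n", "...\n"]
print(parse_blocks(sample))
```

Because the index is advanced inside the inner loop, the lines that belong to a block are not processed a second time by the outer loop.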
Else, if only last value contains ')', do this:
    with open('file.txt') as f:
        pairs = []
        for line in f:
            if '=' not in line:
                continue  # skip the other stuff in the file
            values = []
            name, value = line.strip().split('=', 1)
            name = name.strip()
            values.append(value.strip())
            while True:
                line = next(f)
                values.append(line.strip())
                if ')' in line:
                    break
            pairs.append((name, values))
I have a data file with a certain amount of rows and columns that I import. I want to store the values of each row in a list and finally create a list consisting of the lists of each row, e.g. a simplified version:
Input:
1 2 3
4 5 6
7 8 9
And as outcome I want
[[1,2,3],[4,5,6],[7,8,9]]
My code atm:
    result = []
    col1 = []
    for line in lines[1:]:
        # split the line into fields based on white space
        fields = line.split()
        # convert the text to numbers, make list of values in row k
        while k < real:
            col = float(fields[k])
            col1.append(col)
            k+=1
        else:
            result.append(col1) #make list of lists of values in rows
            k=0 #Reset k for other loop using k
            del col1[:] #Delete temp list
    print result
For some reason after del col1[:], result also gets emptied. Any idea why this is?
Any suggestions on how to do this in a more simplified way are always welcome! As you'll probably have noticed, I'm not that experienced with python.
Note that in my real case I have a data-file with 100 columns and 108k rows.
Thanks in advance!
The Answer(s)
Using Python 2.x it's as simple as
list_of_lists = [map(int,l.split()) for l in open('data.txt').readlines()]
but in Python 3.x the map builtin returns an iterator, not a list, so it has to be written using a list comprehension (LC)
lol = [[int(s) for s in l.split()] for l in open('data.txt').readlines()]
BTW, the second possibility works as well in Python 2.x, so from a compatibility POV it could be the preferred approach.
Why does it work?
Let's focus on the second answer. Our list of lists (LOL) is built using a nested list comprehension: the outer one produces a list of the objects produced by the inner one, i.e., lists, hence a LOL as requested...
The fundamental concept is that you do not need an explicit loop over the lines of a file, because every file object, as returned from the open builtin, has a .readlines method that returns a list of lines, each line represented by a string terminated by the linefeed character.
The elements of this list (the lines) can be split in individual elements using the .split method of strings --- by default split acts on whitespace, so it follows your requirements and we can write, using a LC
[l.split() for l in open('data.txt').readlines()]
obtaining the following LOL
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']],
as you can see we are close to our target, but the elements of the inner lists are not numbers but textual representations of numbers, i.e., strings.
We have to introduce a further step, that is converting strings to numbers. We have two choices, the builtins int and float, in your case it seems that you want integers so we want int, a function that accepts a single argument (that's not exactly true) either a number or a string.
If we pass to int the outcome of l.split() an error will be raised, because l.split() doesn't return a string but a list of strings... we have to 1. unpack the elements of the lists and 2. pack back the results into a list, in other words it is again a LC!
[int(s) for s in l.split()] # -> [1, 2, 3] for the first line, etc
Let's put the pieces together and you have your answer:
lol = [[int(s) for s in l.split()] for l in open('data.txt').readlines()]
It's really easy (if you already knew all the stuff I tried to explain, that is...)
You could use csv module.
import csv
    with open('file') as f:
        reader = csv.reader(f, delimiter=" ")
        print([i for i in reader])
Output:
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Easy:
    with open("/tmp/f") as f:
        m = [row.split() for row in f.read().split("\n") if row]
    print(m)
Output:
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
    with open("data.txt") as inf:
        # skip header row
        next(inf, "")
        # parse data
        result = [[float(f) for f in line.split()] for line in inf]
results in
[[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0],
[7.0, 8.0, 9.0]]
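As for why result ended up empty after del col1[:] in the question's loop: result.append(col1) stores a reference to the same list object, not a copy, so clearing col1 in place clears every entry of result as well. A minimal sketch of the effect, with made-up names (shared, tmp, fresh) and two short rows:

```python
rows = [[1.0, 2.0], [3.0, 4.0]]

# Reusing one temporary list: every entry of `shared` is the same object.
shared = []
tmp = []
for row in rows:
    tmp.extend(row)
    shared.append(tmp)    # appends a reference, not a copy
    del tmp[:]            # empties the one list that `shared` points to

# Building a fresh list per row keeps the values.
fresh = []
for row in rows:
    fresh.append(list(row))

print(shared)
print(fresh)
```

Creating a new list for each row (as the list comprehensions in the answers do) avoids the problem entirely.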
Add type casting.
>>> file_path = '/home/Desktop/123.csv'
>>> import csv
>>> with open(file_path) as fp:
...     reader = csv.reader(fp, delimiter=" ")
...     tmp = [i for i in reader]
...     result = []
...     for i in tmp:
...         result.append([int(j) for j in i])
...
>>> print result
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
    file_list = []
    f = open('file.txt', 'r')
    for line in f:
        # split each row on whitespace so it becomes its own list
        file_list.append(line.split())
    f.close()
I'm trying to export a numpy array to different csv files using a function, based on the second value of each line of the data array. My goal is to export data to the same file if the second value of each line is equal. So far I can create different files, but I also export all the data to each file. The second part of the problem is that I don't have a maximum number of options for the second values in my array. This is the code that I've written:
    a = np.array([(2,"Ana",9),(5,"Maria",4),(6,"Joao",3),
                  (1,"Ana",4)])
    export_path = r"c:"
    def export(array_values):
        names = []
        for i in xrange(len(array_values)):
            names.append(array_values[i][1])
        names = sorted(set(names))
        for i in xrange(len(array_values)):
            for j in xrange(len(names)):
                if array_values[i][1] == names[j]:
                    name = "..."
                    export_file_path = os.path.join(export_path,name + ".csv")
                    myfile = open(export_file_path, 'wb')
                    wr = csv.writer(myfile, quoting=csv.QUOTE_NONE)
                    wr.writerows(array_values)
    export(a)
Thanks in advance for your help.
Ivo
itertools.groupby() makes this easy.
    import itertools as it
    import numpy as np
    from operator import itemgetter
    import csv
    a = np.array([(2,"Ana",9),(5,"Maria",4),(6,"Joao",3), (1,"Ana",4)])
    name = itemgetter(1)
    a = sorted(a, key = name)
    for k, g in it.groupby(a, name):
        filename = k + '.csv'
        with open(filename, 'wb') as f:
            writer = csv.writer(f)
            writer.writerows(list(g))
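For anyone on Python 3, a rough equivalent might look like the sketch below. It uses plain tuples instead of a numpy array and writes into a temporary folder; out_dir and written are names made up for the example, so swap in your own export path.

```python
import csv
import itertools
import os
import tempfile
from operator import itemgetter

rows = [(2, "Ana", 9), (5, "Maria", 4), (6, "Joao", 3), (1, "Ana", 4)]
by_name = itemgetter(1)

out_dir = tempfile.mkdtemp()    # hypothetical output folder for this sketch
written = {}
# groupby only merges adjacent items, so sort by the same key first.
for name, group in itertools.groupby(sorted(rows, key=by_name), key=by_name):
    path = os.path.join(out_dir, name + ".csv")
    group_rows = list(group)
    # Python 3: open in text mode with newline="" instead of 'wb'.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(group_rows)
    written[name] = group_rows

print(sorted(os.listdir(out_dir)))
```

Each file then holds only the rows that share the same second value, and the number of distinct names does not need to be known in advance.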
Alright:
    def export(A):
        _v2=set(A[:,1])
        for item in _v2:
            sub_A=A[A[:,1]==item][:,[0,2]]
            with open(item+'.txt', 'w') as f:
                wr = csv.writer(f, quoting=csv.QUOTE_NONE)
                wr.writerows(sub_A)
This should generate 3 files, e.g. Ana.txt (since the name already becomes the file name, we can drop it from the file contents):
2,9
1,4
First, figure out how many files you will need:
>>> unq, unq_idx = np.unique(a[:, 1], return_inverse=True)
>>> unq
array(['Ana', 'Joao', 'Maria'],
dtype='|S5')
>>> unq_idx
array([0, 2, 1, 0])
You can now loop over the groups, extract the corresponding rows, and save them:
    for j, name in enumerate(unq):
        sub_a = a[unq_idx == j]
        # sub_a holds the lines that have name in the 2nd column
        ...
Here are a couple of pointers that might be useful ...
Sort the initial array:
In [53]: a1 = sorted(a, key=lambda x:x[1])
In [54]: a1
Out[54]:
[array(['2', 'Ana', '9'],
dtype='|S5'),
array(['1', 'Ana', '4'],
dtype='|S5'),
array(['6', 'Joao', '3'],
dtype='|S5'),
array(['5', 'Maria', '4'],
dtype='|S5')]
Next, you can select the values whose second item matches that of the first entry, like so:
In [55]: filter(lambda x: x[1] == a1[0][1] , a1)
Out[55]:
[array(['2', 'Ana', '9'],
dtype='|S5'),
array(['1', 'Ana', '4'],
dtype='|S5')]
Save these values in a file with name a1[0][1]
Filter out the rest of the values (name this new list a1, which is not shown below):
In [56]: filter(lambda x: x[1] != a1[0][1] , a1)
Out[56]:
[array(['6', 'Joao', '3'],
dtype='|S5'),
array(['5', 'Maria', '4'],
dtype='|S5')]
Repeat until you get an empty list. You can use either recursion or a normal loop. Anything is fine.
Hope this helps.
I have a csv file. Each column represents a parameter and contains a few values (e.g. 1, 2, 3, 5) repeated hundreds of times.
I want to write a Python program that reads each column and stores its content in a dictionary {column_header: list_numbers} (without repeating the numbers).
I tried to adapt the example given in the python documentation:
    def getlist(file):
        content = dict()
        with open(file, newline = '') as inp:
            my_reader = reader(inp, delimiter = ' ')
            for col in zip(*my_reader):
                l = []
                for k in col:
                    if k not in l:
                        l.append(k)
                    print(k) # for debugging purposes
                content[col[0]] = l
I was expecting, by printing k, to see each element of the column. Instead, I get several columns at a time.
Any idea about what is wrong?
It looks like you are almost there. I'd use a set to detect repeated numbers (more efficient):
    from csv import reader

    def getlist(file):
        content = {}
        with open(file, newline = '') as inp:
            my_reader = reader(inp, delimiter = ' ')
            for col in zip(*my_reader):
                content[col[0]] = l = []
                seen = set()
                for k in col[1:]:
                    if k not in seen:
                        l.append(k)
                        seen.add(k)
        return content
Make sure you get your delimiter right; if the above doesn't work for you then print() may show you whole rows with the delimiters still in them, as strings.
If, say, your file uses , as a delimiter instead, the output would look something like:
{'a,b,c,d': ['0,1,2,3', '1,2,3,4']}
while configuring the correct delimiter would give you:
{'d': ['3', '4'], 'c': ['2', '3'], 'b': ['1', '2'], 'a': ['0', '1']}
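Here is a small sketch you can run without a file to watch the transposition happen. The sample text and column names are made up, and it assumes a single space as the delimiter.

```python
import csv
import io

# Hypothetical space-delimited sample with repeated values in each column.
text = "a b\n1 5\n2 5\n1 3\n"

reader = csv.reader(io.StringIO(text), delimiter=" ")
content = {}
# zip(*rows) turns rows into columns: ('a', '1', '2', '1') and ('b', '5', '5', '3').
for col in zip(*reader):
    header, values = col[0], col[1:]
    seen = set()
    unique = []
    for value in values:
        if value not in seen:
            seen.add(value)
            unique.append(value)
    content[header] = unique

print(content)
```

If the delimiter passed to csv.reader does not match the file, each row comes back as a single field and zip(*reader) yields just one "column" holding the whole lines.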
Does the following python script work for you?
    import csv
    test_file = 'test.csv'
    with open(test_file, newline='') as f:
        csv_file = csv.DictReader(f, delimiter=',')
        for line in csv_file:
            print(line['x'])