Getting elements of a CSV file - Python

I have a csv file. Each column represents a parameter and contains a few distinct values (e.g. 1, 2, 3, 5) repeated hundreds of times.
I want to write a Python program that reads each column and stores its content in a dictionary {column_header: list_numbers}, without repeating the numbers.
I tried to adapt the example given in the python documentation:
from csv import reader

def getlist(file):
    content = dict()
    with open(file, newline='') as inp:
        my_reader = reader(inp, delimiter=' ')
        for col in zip(*my_reader):
            l = []
            for k in col:
                if k not in l:
                    l.append(k)
                    print(k)  # for debugging purposes
            content[col[0]] = l
I was expecting, by printing k, to see each element of the column. Instead, I get several columns at a time.
Any idea about what is wrong?

It looks like you are almost there. I'd use a set to detect repeated numbers (more efficient):
from csv import reader

def getlist(file):
    content = {}
    with open(file, newline='') as inp:
        my_reader = reader(inp, delimiter=' ')
        for col in zip(*my_reader):
            content[col[0]] = l = []
            seen = set()
            for k in col[1:]:
                if k not in seen:
                    l.append(k)
                    seen.add(k)
    return content
Make sure you get your delimiter right; if the above doesn't work for you, print() may show you whole rows with the delimiters still in them, as single strings.
Say your file uses , as the delimiter instead; the output would then look something like:
{'a,b,c,d': ['0,1,2,3', '1,2,3,4']}
while configuring the correct delimiter would give you:
{'d': ['3', '4'], 'c': ['2', '3'], 'b': ['1', '2'], 'a': ['0', '1']}
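A quick way to sanity-check the delimiter is to look at the first row the reader yields. A minimal sketch using an in-memory file (the sample data here is hypothetical):

```python
import csv
import io

data = "a,b,c,d\n0,1,2,3\n1,2,3,4\n"

# Wrong delimiter: the whole header line comes back as a single field.
wrong = next(csv.reader(io.StringIO(data), delimiter=" "))
print(wrong)   # ['a,b,c,d']

# Correct delimiter: the header splits into four fields.
right = next(csv.reader(io.StringIO(data), delimiter=","))
print(right)   # ['a', 'b', 'c', 'd']
```

If the first row prints as one long string containing commas, the delimiter is wrong.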

Does the following Python script work for you?
import csv

test_file = 'test.csv'
with open(test_file, newline='') as f:
    csv_file = csv.DictReader(f, delimiter=',')
    for line in csv_file:
        print(line['x'])

Related

Python - '<' not supported between instances of 'int' and 'str'

Hi, I am new to coding and I've just started. I have seen many examples of the same error but I am unsure how to apply them to my code. I am trying to sort a text file in order of score. This is my current code:
ScoresFile = open("Top Scores.txt", "r")
newScoresRec = []
ScoresRec = []
for row in ScoresFile:
    ScoresRec = row.split(",")
    username = ScoresRec[0]
    Bestscores = int(ScoresRec[1])
    newScoresRec.append(username)
    newScoresRec.append(Bestscores)
    ScoresRec.append(newScoresRec)
    newScoresRec = []
sortedTable = sorted(ScoresRec, key=lambda x: x[1])
for n in range(len(sortedTable)):
    print(sortedTable[n][0], sortedTable[n][1])
ScoresFile.close()
The text file is just in the simple format of 'username','score', for example:
BO15,78
Any help would be greatly appreciated thanks.
Try this and let me know if it works:
scores_rec = []
with open("Top Scores.txt", "r") as scores_file:
    lines = scores_file.readlines()
    for line in lines:
        s_line = line.rstrip().split(",")
        scores_rec.append([s_line[0], int(s_line[1])])
sorted_table = sorted(scores_rec, key=lambda x: x[1])
for item in sorted_table:
    print(item[0], item[1])
Part of the issue is that you're appending back into the split list,
    ScoresRec = row.split(",")
    ...
    ScoresRec.append(newScoresRec)
rather than building a list of lists. At the end of the loop you are therefore sorting only the last line of the file, with your [username, score] list appended to it.
That means you are sorting a list containing a few strings plus one inner list, and the sort key ends up comparing a character from a string with an int, hence the error.
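For reference, the error can be reproduced with a tiny example along those lines (the data is hypothetical, mirroring what ScoresRec looks like after the loop):

```python
# ScoresRec after the loop is the last split row plus an appended list:
scores_rec = ["BO15", "78\n", ["BO15", 78]]

try:
    sorted(scores_rec, key=lambda x: x[1])
except TypeError as e:
    # Comparing the int key 78 with a str key raises the reported error.
    print(e)
```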
Text file (file.txt):
"MORETHANSMALLER",2
"BIGGER",10
"SMALLER",1
"UNDERBIGGER",9
"MIDDLE",5
Code (csv_reader.py):
import csv

with open('file.txt', newline='') as csvfile:
    rows = list(csv.reader(csvfile, delimiter=',', quotechar='"'))

rows.sort(key=lambda x: int(x[1]))
print('From 1 to 10:')
print(rows)
rows.reverse()
print('From 10 to 1:')
print(rows)
Result:
From 1 to 10:
[['SMALLER', '1'], ['MORETHANSMALLER', '2'], ['MIDDLE', '5'], ['UNDERBIGGER', '9'], ['BIGGER', '10']]
From 10 to 1:
[['BIGGER', '10'], ['UNDERBIGGER', '9'], ['MIDDLE', '5'], ['MORETHANSMALLER', '2'], ['SMALLER', '1']]
Don't parse CSV files manually; use Python's csv module. That will help you avoid traps with quoting.
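A quick illustration of that quoting trap (the row here is hypothetical): a comma inside a quoted field breaks a naive split() but is handled correctly by csv:

```python
import csv
import io

raw = '"Smith, John",42\n'

# Naive split: the quoted comma produces a bogus third field.
print(raw.rstrip().split(","))             # ['"Smith', ' John"', '42']

# csv.reader respects the quotes and yields two fields.
print(next(csv.reader(io.StringIO(raw))))  # ['Smith, John', '42']
```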

Skip multiple lines while parsing file in python and storing their values

I apologize for the confusing title. I'm very new to Python and here's what I'm trying to achieve:
I'm parsing a file file.txt that has data like this (and other stuff):
file.txt:
...
a = (
1
2
3 )
...
I need to store this type of data in 2 parts:
name = "a"
value = ["(", "1", "2", "3 )"]
^ each line is an element of the list
I'm parsing the file line by line, as shown in the snippet below, and I can't change that. I'm not sure how to store the data this way: I need to look ahead a few lines, store their values, and then skip them so that they're not processed twice. I want the two variables name and value populated when the loop is at the first line, "a = (".
with open("file.txt") as fp:
    for line in fp:
        ...
Thanks for the help.
I suggest using a dictionary:
txt = open(r"file.txt", "r").readlines()
dictionary = dict()
for i in range(len(txt)):
    if "=" in txt[i]:
        name, values = txt[i].split()[0], [txt[i].split()[-1]]
        dictionary[name], i = {"name": name}, i + 1
        while True:
            values.append(txt[i])
            if ")" in txt[i]:
                break
            i = i + 1
        values = [value.replace("\n", "") for value in values]
        dictionary[name].update({"values": values})
>>> dictionary["a"]
{'name': 'a', 'values': ['(', '1', '2', '3 )']}
>>> dictionary["b"]
{'name': 'b', 'values': ['(', '3', '4', '6 )']}
So, you parse the file line by line. Whenever you find an equals sign "=" in a line, the token before the "=" is the name you want. The next line is then the first element of the list, the line after that the second element, and so on; when a line contains the char ")" it is the last value of the list.
See Python's str.find method (or the in operator) for this. Try to understand the concept and the coding shouldn't be hard.
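A minimal sketch of that scan, operating on an in-memory list of lines rather than a file (the sample lines mirror the question's example):

```python
lines = ["a = (", "1", "2", "3 )"]

results = {}
i = 0
while i < len(lines):
    if "=" in lines[i]:
        # the token before '=' is the name; the rest starts the value list
        name, rest = lines[i].split("=", 1)
        name = name.strip()
        values = [rest.strip()]
        # keep consuming lines until one contains ')'
        while ")" not in lines[i]:
            i += 1
            values.append(lines[i].strip())
        results[name] = values
    i += 1

print(results)   # {'a': ['(', '1', '2', '3 )']}
```

Advancing i inside the inner while is what skips the value lines so they are not processed twice by the outer loop.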
['a']
['(', '1', '2', '3', ')']
Is this what you need?
Then you can follow these lines of code:
name = []
value = []
with open("file.txt") as fp:
    for line in fp:
        words = line.split()
        if '(' in words:
            name.append(words[0])
            value.append('(')
        else:
            for entry in words:
                value.append(entry)
print(name)
print(value)
If the file is not too large, read the whole file into memory, then use a while loop for finer-grained control:
# python3
with open("file.txt") as f:
    lines = f.readlines()

index = 0
while True:
    # do something here
Otherwise, if only the last value of each group contains ')', do this:
with open('file.txt') as f:
    pairs = []
    for line in f:
        values = []
        name, value = line.strip().split('=')
        name = name.strip()
        values.append(value.strip())
        while True:
            line = next(f)
            values.append(line.strip())
            if ')' in line:
                break
        pairs.append((name, values))

Python: read file, store rows as list and create list of rows

I have a data file with a certain amount of rows and columns that I import. I want to store the values of each row in a list and finally create a list consisting of the lists of each row, e.g. a simplified version:
Input:
1 2 3
4 5 6
7 8 9
And as outcome I want
[[1,2,3],[4,5,6],[7,8,9]]
My code atm:
result = []
col1 = []
for line in lines[1:]:
    # split the line into fields based on white space
    fields = line.split()
    # convert the text to numbers, make list of values in row k
    while k < real:
        col = float(fields[k])
        col1.append(col)
        k += 1
    else:
        result.append(col1)  # make list of lists of values in rows
        k = 0  # reset k for other loop using k
        del col1[:]  # delete temp list
print result
For some reason after del col1[:], result also gets emptied. Any idea why this is?
Any suggestions on how to do this in a more simplified way are always welcome! As you'll probably have noticed, I'm not that experienced with python.
Note that in my real case I have a data-file with 100 columns and 108k rows.
Thanks in advance!
The Answer(s)
Using Python 2.x it's as simple as
list_of_lists = [map(int, l.split()) for l in open('data.txt').readlines()]
but in Python 3.x the map builtin returns an iterator, not a list, so it has to be written using a list comprehension (LC)
lol = [[int(s) for s in l.split()] for l in open('data.txt').readlines()]
BTW, the second possibility works as well in Python 2.x, so from a compatibility POV it may be the preferred approach.
Why does it work?
Let's focus on the second answer: our list of lists (LOL) is built using a nested list comprehension, the outer one producing a list of the objects produced by the inner one, i.e., lists; hence a LOL, as requested...
The fundamental concept is that you don't need an explicit loop over the lines of the file, because every file object, as returned by the open builtin, has a .readlines method that returns a list of lines, each line being a string terminated by the linefeed character.
The elements of this list (the lines) can be split in individual elements using the .split method of strings --- by default split acts on whitespace, so it follows your requirements and we can write, using a LC
[l.split() for l in open('data.txt').readlines()]
obtaining the following LOL
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']],
as you can see we are close to our target, but the elements of the inner lists are not numbers: they are textual representations of numbers, i.e., strings.
We have to introduce a further step: converting strings to numbers. We have two choices, the builtins int and float; in your case it seems that you want integers, so we want int, a function that accepts a single argument (that's not exactly true), either a number or a string.
If we pass to int the outcome of l.split() an error will be raised, because l.split() doesn't return a string but a list of strings... we have to 1. unpack the elements of the lists and 2. pack back the results into a list, in other words it is again a LC!
[int(s) for s in l.split()] # -> [1, 2, 3] for the first line, etc
Let's put the pieces together and you have your answer:
lol = [[int(s) for s in l.split()] for l in open('data.txt').readlines()]
It's really easy (if you already knew all the stuff I tried to explain, that is...)
You could use the csv module.
import csv

with open('file') as f:
    reader = csv.reader(f, delimiter=" ")
    print([i for i in reader])
Output:
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Easy:
with open("/tmp/f") as f:
    m = [row.split() for row in f.read().split("\n") if row]
print(m)
Output:
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
with open("data.txt") as inf:
    # skip header row
    next(inf, "")
    # parse data
    result = [[float(f) for f in line.split()] for line in inf]
results in
[[1.0, 2.0, 3.0],
 [4.0, 5.0, 6.0],
 [7.0, 8.0, 9.0]]
Add type casting.
>>> file_path = '/home/Desktop/123.csv'
>>> import csv
>>> with open(file_path) as fp:
...     reader = csv.reader(fp, delimiter=" ")
...     tmp = [i for i in reader]
...     result = []
...     for i in tmp:
...         result.append([int(j) for j in i])
...
>>> print result
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
file_list = []
with open('file.txt', 'r') as f:
    for line in f:
        file_list.append([line])

How can I export Python multidimensional numpy array to different files based on second value of each line?

I'm trying to export a numpy array to different csv files using a function, based on the second value of each line of the data array. My goal is to export lines to the same file when their second value is equal. So far I can create different files, but I also export all the data to each file. The second part of the problem is that I don't have a maximum number of options for the second values in my array. This is the code that I've done:
a = np.array([(2, "Ana", 9), (5, "Maria", 4), (6, "Joao", 3), (1, "Ana", 4)])
export_path = r"c:"

def export(array_values):
    names = []
    for i in xrange(len(array_values)):
        names.append(array_values[i][1])
    names = sorted(set(names))
    for i in xrange(len(array_values)):
        for j in xrange(len(names)):
            if array_values[i][1] == names[j]:
                name = "..."
                export_file_path = os.path.join(export_path, name + ".csv")
                myfile = open(export_file_path, 'wb')
                wr = csv.writer(myfile, quoting=csv.QUOTE_NONE)
                wr.writerows(array_values)

export(a)
Thanks in advance for your help.
Ivo
itertools.groupby() makes this easy.
import itertools as it
import numpy as np
from operator import itemgetter
import csv

a = np.array([(2, "Ana", 9), (5, "Maria", 4), (6, "Joao", 3), (1, "Ana", 4)])

name = itemgetter(1)
a = sorted(a, key=name)
for k, g in it.groupby(a, name):
    filename = k + '.csv'
    with open(filename, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows(list(g))
Alright:
def export(A):
    _v2 = set(A[:, 1])
    for item in _v2:
        sub_A = A[A[:, 1] == item][:, [0, 2]]
        with open(item + '.txt', 'w') as f:
            wr = csv.writer(f, quoting=csv.QUOTE_NONE)
            wr.writerows(sub_A)
This should generate three files; e.g. Ana.txt (since the name already becomes the file name, we can drop it from the final csv file) would contain:
2,9
1,4
First, figure out how many files you will need:
>>> unq, unq_idx = np.unique(a[:, 1], return_inverse=True)
>>> unq
array(['Ana', 'Joao', 'Maria'],
dtype='|S5')
>>> unq_idx
array([0, 2, 1, 0])
You can now loop over the groups, extract the corresponding rows, and save them:
for j, name in enumerate(unq):
    sub_a = a[unq_idx == j]
    # sub_a holds the lines that have `name` in the 2nd column
    ...
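To make that concrete, here is one way the loop could collect each group, gathering rows in a dict instead of writing files so the sketch stays self-contained (the sample array is the one from the question; note numpy stores it as an all-string array):

```python
import numpy as np

a = np.array([(2, "Ana", 9), (5, "Maria", 4), (6, "Joao", 3), (1, "Ana", 4)])

unq, unq_idx = np.unique(a[:, 1], return_inverse=True)

groups = {}
for j, name in enumerate(unq):
    sub_a = a[unq_idx == j]        # rows whose 2nd column equals `name`
    groups[str(name)] = sub_a.tolist()

print(groups["Ana"])   # [['2', 'Ana', '9'], ['1', 'Ana', '4']]
```

In the real function, each sub_a would be passed to csv.writer's writerows for the file named after `name`.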
Here are a couple of pointers that might be useful ...
Sort the initial array:
In [53]: a1 = sorted(a, key=lambda x:x[1])
In [54]: a1
Out[54]:
[array(['2', 'Ana', '9'],
dtype='|S5'),
array(['1', 'Ana', '4'],
dtype='|S5'),
array(['6', 'Joao', '3'],
dtype='|S5'),
array(['5', 'Maria', '4'],
dtype='|S5')]
Next, you can filter out the values whose second item is the same, like so:
In [55]: filter(lambda x: x[1] == a1[0][1] , a1)
Out[55]:
[array(['2', 'Ana', '9'],
dtype='|S5'),
array(['1', 'Ana', '4'],
dtype='|S5')]
Save these values in a file with name a1[0][1]
Filter out the rest of the values (name this new list a1, which is not shown below):
In [56]: filter(lambda x: x[1] != a1[0][1] , a1)
Out[56]:
[array(['6', 'Joao', '3'],
dtype='|S5'),
array(['5', 'Maria', '4'],
dtype='|S5')]
Repeat until you find an empty list. You can use either recursion or a normal loop; anything is fine.
Hope this helps.
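The repeat-until-empty idea above can be sketched with plain Python lists standing in for the numpy rows (the sample rows mirror the question's array):

```python
rows = [["2", "Ana", "9"], ["1", "Ana", "4"],
        ["6", "Joao", "3"], ["5", "Maria", "4"]]

groups = {}
while rows:
    name = rows[0][1]
    # keep the rows matching the first name, then filter them out
    groups[name] = [r for r in rows if r[1] == name]
    rows = [r for r in rows if r[1] != name]

print(sorted(groups))   # ['Ana', 'Joao', 'Maria']
```

Each pass removes one complete group, so the loop terminates when every name has been written out.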

Python - Importing strings into a list, into another list :)

Basically I want to read strings from a text file, put them in lists three by three, and then put all those three by three lists into another list. Actually let me explain it better :)
Text file (just an example, I can structure it however I want):
party
sleep
study
--------
party
sleep
sleep
-----
study
sleep
party
---------
etc
From this, I want Python to create a list that looks like this:
List1 = [['party','sleep','study'],['party','sleep','sleep'],['study','sleep','party']etc]
But it's super hard. I was experimenting with something like:
test2 = open('test2.txt', 'r')
List = []
for line in 'test2.txt':
    a = test2.readline()
    a = a.replace("\n", "")
    List.append(a)
    print(List)
But this just does horrible, horrible things. How do I achieve this?
If you want to group the data in chunks of 3 and the data in the text file is not separated by any delimiter, you need to read the file sequentially and group the lines using one of the well-known grouper idioms:
import pprint

with open("test.txt") as fin:
    data = list(map(list, zip(*[map(str.strip, fin)] * 3)))
pprint.pprint(data)
[['party', 'sleep', 'study'],
 ['party', 'sleep', 'sleep'],
 ['study', 'sleep', 'party']]
Steps of Execution
Create a Context Manager with the file object.
Strip each line. (Remove newline)
Using zip on the iterator list of size 3, ensures that the items are grouped as tuples of three items
Convert tuples to list
Convert the generator expression to a list.
Since these are all lazy iterators, the whole thing is done in a single pass.
Instead, if your data is separated and grouped by a delimiter such as ------, you can use the itertools.groupby solution (note the separator rows in the sample vary in length, so the key matches on the prefix):
import pprint
from itertools import groupby

class Key(object):
    def __init__(self, sep):
        self.sep = sep
        self.count = 0
    def __call__(self, line):
        if line.startswith(self.sep):
            self.count += 1
        return self.count

with open("test.txt") as fin:
    data = [[e for e in v if not e.startswith("---")]
            for k, v in groupby(map(str.strip, fin), key=Key("---"))]
pprint.pprint(data)
[['party', 'sleep', 'study'],
 ['party', 'sleep', 'sleep'],
 ['study', 'sleep', 'party']]
Steps of Execution
Create a Key class that increments a counter whenever the separator is encountered. Each call returns the current counter value, incrementing it first when the line is a separator.
Create a Context Manager with the file object.
Strip each line. (Remove newline)
Group the data using itertools.groupby and using your custom key
Remove the separator from the grouped data and create a list of the groups.
You can try with this:
res = []
tmp = []
for i, line in enumerate(open('file.txt'), 1):
    tmp.append(line.strip())
    if i % 3 == 0:
        res.append(tmp)
        tmp = []
print(res)
I've assumed that you don't have the dashes.
Edit:
Here is an example for when you have the dashes (counting from 1 so that every fourth line is the separator):
res = []
tmp = []
for i, line in enumerate(open('file.txt'), 1):
    if i % 4 == 0:
        # every fourth line is a separator: flush the current group
        res.append(tmp)
        tmp = []
        continue
    tmp.append(line.strip())
print(res)
First big problem:
for line in 'test2.txt':
gives you
't', 'e', 's', 't', '2', '.', 't', 'x', 't'
You need to loop through the file you open:
for line in test2:
Or, better:
with open("test2.txt", 'r') as f:
    for line in f:
Next, you need to do one of two things:
If the line contains "-----", create a new sub-list (myList.append([]))
Otherwise, append the line to the last sub-list in your list (myList[-1].append(line))
Finally, your print at the end should not be so far indented; currently, it prints for every line, rather than just when the processing is complete.
List.append(a)
print(List)
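Those two rules can be sketched like this, using sample lines inline instead of a file; the separator check is a loose prefix match since the dashes in the example vary in length:

```python
lines = ["party", "sleep", "study", "--------",
         "party", "sleep", "sleep", "-----"]

my_list = [[]]
for line in lines:
    if line.startswith("-"):
        my_list.append([])           # separator: start a new sub-list
    else:
        my_list[-1].append(line)     # otherwise extend the current one

my_list = [g for g in my_list if g]  # drop any empty trailing group
print(my_list)   # [['party', 'sleep', 'study'], ['party', 'sleep', 'sleep']]
```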
Perhaps a better structure for your file would be:
party,sleep,study
party,sleep,sleep
...
Now each line is a sub-list:
for line in f:
    myList.append(line.split(','))
