I am trying to use Python's built-in filter function in order to extract data from certain columns in a CSV. Is this a good use of the filter function? Would I have to define the data in these columns first, or would Python somehow already know which columns contain what data?
Python advertises itself as "batteries included", so for most everyday tasks someone has probably already provided a solution.
CSV is one of them: there is a built-in csv module.
tablib is also a very good third-party module, especially if you're dealing with non-ASCII data.
For the behaviour you described in the comment, this will do:
import csv

with open('some.csv', newline='') as f:  # Python 2: open('some.csv', 'rb')
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # drop the second column
        print(", ".join(row))
The filter function is intended to select from a list (or in general, any iterable) those elements which satisfy a certain condition. It's not really intended for index-based selection. So although you could use it to pick out specified columns of a CSV file, I wouldn't recommend it. Instead you should probably use something like this:
import csv

with open(filename, newline='') as f:  # Python 2: open(filename, 'rb')
    for record in csv.reader(f):
        do_something_with(record[0], record[2])
Depending on what exactly you are doing with the records, it may be better to create an iterator over the columns of interest:
with open(filename, newline='') as f:  # Python 2: open(filename, 'rb')
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator (while the file is still open)
or, if you need non-sequential processing, perhaps a list:
with open(filename, newline='') as f:  # Python 2: open(filename, 'rb')
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
# do something with the list
I'm not sure what you mean by defining the data in the columns. The data are defined by the CSV file.
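If the file happens to start with a header row, the column names can be taken from the file itself. The following is only a sketch under that assumption: the sample data and the column names name, age and city are made up for illustration, and io.StringIO stands in for a real file.

```python
import csv
import io

# Hypothetical sample data; a real script would open a file instead.
sample = io.StringIO("name,age,city\nAda,36,London\nAlan,41,Wilmslow\n")

# DictReader uses the first row as the field names.
reader = csv.DictReader(sample)

# Pick the columns of interest by name rather than by index.
pairs = [(row["name"], row["city"]) for row in reader]
print(pairs)
```

Without a header row, plain csv.reader with numeric indexes, as shown above, is the simpler choice.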
By comparison, here's a case in which you would want to use filter: suppose your CSV file contains numeric data, and you need to build a list of the records in which the numbers are in strictly increasing order within the row. You could write a function to determine whether a list of numbers is in strictly increasing order:
def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))
(see the itertools documentation for a definition of pairwise). Then you can use this as the condition in filter:
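with open(filename, newline='') as f:  # Python 2: open(filename, 'rb')
    # In Python 3 filter is lazy, so build the list while the file is open
    the_list = list(filter(strictly_increasing, csv.reader(f)))
# do something with the list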
Of course, the same thing could, and usually would, be implemented as a list comprehension:
with open(filename, newline='') as f:  # Python 2: open(filename, 'rb')
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
# do something with the list
so there's little reason to use filter in practice.
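For completeness, here is a self-contained sketch of the filter example. The pairwise helper follows the recipe in the itertools documentation (Python 3.10 and later also ship it as itertools.pairwise); the sample rows are made up for illustration.

```python
import csv
import io
from itertools import tee


def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)


def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))


# Hypothetical numeric data; io.StringIO stands in for an open file.
sample = io.StringIO("1,2,3\n3,2,1\n1,1,2\n5,8,13\n")

increasing = list(filter(strictly_increasing, csv.reader(sample)))
print(increasing)
```

Only the first and last rows survive, because the other two contain a pair that is not strictly increasing.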
Related
I have a csv file too large to fit in the memory of my machine. So I want to open the csv file and then read its rows one at a time. I basically want to make a python generator that yields single rows from the csv.
Thanks in advance! :)
with open(filename, "r") as file:
    for line in file:
        doanything()
Python is lazy whenever possible. File objects are iterators: they do not load the entire file, but hand you one line at a time as you loop over them.
Solution:
You can use the chunksize parameter of the pandas read_csv function:
import pandas as pd

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    print(type(chunk))  # each chunk is a DataFrame
    # CODE HERE
Set chunksize to 1 to get a single row at a time.
My personal preference for doing this is csv.DictReader.
You create the reader object with whatever parameters you need. To access the file one row at a time, you either call next on it or iterate over it, and each row comes back as a dictionary mapping the field names to the values in your csv file.
e.g.
import csv

csvfile = open('names.csv')
my_reader = csv.DictReader(csvfile)
first_row = next(my_reader)  # fetch a single row on demand
for row in my_reader:  # or loop over the remaining rows
    print([(k, v) for k, v in row.items()])
csvfile.close()
See the linked docs for parameter usage etc - it's fairly straightforward.
python generator that yields single rows from the csv.
This sounds like you want csv.reader from the built-in csv module. You will get one list for each line in the file.
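A minimal sketch of such a generator follows. The function name iter_rows and the temporary sample file are made up for illustration; csv.reader reads lazily, so only the current row is held in memory.

```python
import csv
import os
import tempfile


def iter_rows(path):
    """Yield one parsed row (a list of strings) at a time."""
    with open(path, newline="") as f:
        yield from csv.reader(f)


# Create a small throwaway file so the sketch can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    path = tmp.name

rows = iter_rows(path)
first = next(rows)   # nothing is read until a row is requested
rest = list(rows)    # exhausting the generator also closes the file
os.remove(path)

print(first, rest)
```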
At the moment, I know how to write a list to one row in csv, but when there're multiple rows, they are still written in one row. What I would like to do is write the first list in the first row of csv, and the second list to the second row.
Code:
for i in range(10):
    final = [i*1, i*2, i*3]
    with open('0514test.csv', 'a') as file:
        file.write(','.join(str(x) for x in final))
You need to add a line break at the end of every row. You can do so by writing, for example:
file.write('\n')
The csv module from the standard library provides objects to read and write CSV files. In your case, you could do:
import csv

for i in range(10):
    final = [i*1, i*2, i*3]
    with open("0514test.csv", "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(final)
Using this module is often safer in real-life situations because it takes care of all the CSV machinery (adding quote characters like " or ', handling the cases in which your separator is also present in your data, etc.)
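To illustrate the point about quoting, here is a small sketch. The field values are made up, and io.StringIO stands in for a real file.

```python
import csv
import io

fields = ["plain", "has,comma", 'has "quotes"']

buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
line = buffer.getvalue()
print(line)

# Reading the line back recovers the original fields unchanged.
round_trip = next(csv.reader(io.StringIO(line)))
```

A plain ','.join(fields) would have produced four columns for the same data, because the comma inside the second field would be read as a separator.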
I would like to write a simple dict to CSV for later reuse in a Pythonic manner.
The approaches I have found online do not seem elegant and Pythonic to me, so I am looking for a better solution. For example:
How do I write a Python dictionary to a csv file?
How to export dictionary as CSV using Python?
dic = {1:"a",
2:"b",
3:"c"}
Output:
1,a
2,b
3,c
The Q&As you're referring to are different. They assume that the keys form the first row and are the same for all data (hence the advice to use DictWriter).
Here, your dictionary is treated as a list of key/value pairs. Before Python 3.7 a dict does not guarantee any order, so if order matters, sort the items (the sort only ever compares keys, since they're unique).
import csv
dic = {1:"a",
2:"b",
3:"c"}
with open("out.csv", "w", newline="") as f:  # python 2: open("out.csv","wb")
    cw = csv.writer(f)  # default separator is ",", no need for more params
    cw.writerows(sorted(dic.items()))
This approach is very fast because you're not looking up the values by key (no need), stable (the output is the same every time thanks to sorted), and needs no explicit loop thanks to writerows. If the order doesn't matter, replace the last line with:
cw.writerows(dic.items())
The csv module automatically converts items to strings when writing, so it's not an issue when the data type is integer, as in your example.
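Since the question mentions later reuse, here is a sketch of the full round trip. It uses io.StringIO in place of a real file, and it assumes the keys are integers as in the example: everything comes back from csv.reader as a string, so the keys have to be converted explicitly.

```python
import csv
import io

dic = {1: "a", 2: "b", 3: "c"}

buffer = io.StringIO()  # stands in for a file opened with newline=""
csv.writer(buffer).writerows(sorted(dic.items()))

# Read the pairs back; each row is a two-element list of strings.
buffer.seek(0)
restored = {int(key): value for key, value in csv.reader(buffer)}
print(restored)
```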
You can do this easily with pandas:
import pandas as pd

df = pd.DataFrame(sorted(dic.items()))
# header=False keeps the default column labels (0, 1) out of the file
df.to_csv("your_file.csv", index=False, header=False)
Try this:
import csv
dic = {1:"a",
2:"b",
3:"c"}
with open('output.csv', 'w', newline='') as f:  # Python 2: open('output.csv', 'wb')
    writer = csv.writer(f)
    for key in dic:
        writer.writerow([key, dic[key]])
Maybe a more pythonic way:
with open('dic.csv', 'w', newline='') as csv_file:  # Python 2: open('dic.csv', 'wb')
    csv.writer(csv_file).writerows(dic.items())
This is my code that adds the data to the CSV file known as studentScores.csv
myfile = open("studentScores.csv", "a+")
newRecord = Score, Name, Gender, FormGroup, Percentage
myfile.write(str(newRecord))
myfile.write("\n")
myfile.close()
As a part of my task, I need to alphabetise the data in the CSV. I have searched and searched, but I am unable to find a solution that works for me. I am pretty new to Python, so the simplest solution will be appreciated.
import csv
from operator import itemgetter
with open('studentScores.csv', 'r', newline='') as f:
    data = [line for line in csv.reader(f)]
newRecord = [Score, Name, Gender, FormGroup, Percentage]
data.append(newRecord)
data.sort(key=itemgetter(1))  # 1 being the column number (Name)
with open('studentScores.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)
First of all, this uses functions from the csv module for properly parsing and creating CSV syntax. Secondly, it reads all existing entries into data, appends the new record, sorts all records, then dumps them back to the file.
If you're using a header row in your CSV file to name the columns, look at DictReader and DictWriter, which allow you to handle columns by name rather than by number (e.g. in the sorting step).
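A sketch of that variant, assuming the file starts with a header row. The header and records below are made up, and io.StringIO stands in for the files.

```python
import csv
import io

# Hypothetical file contents with a header row.
source = io.StringIO(
    "Score,Name,Gender\n"
    "12,Zoe,F\n"
    "17,Adam,M\n"
    "9,Mia,F\n"
)

reader = csv.DictReader(source)
# Sort by column name instead of column number.
records = sorted(reader, key=lambda row: row["Name"])

target = io.StringIO()
writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
writer.writeheader()  # the header stays on top, outside the sort
writer.writerows(records)

names = [record["Name"] for record in records]
print(target.getvalue())
```

A side benefit is that the header row never takes part in the sort, which it would if it were read as an ordinary row with csv.reader.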
Please help me write data horizontally in a csv file.
The following code writes each character on a separate row:
import csv

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows('jgjh')
but I need the whole string written in a single row.
The csv module deals in sequences; use writer.writerow() (singular) and give it a list of one column:
writer.writerow(['jgjh'])
The .writerow() method will take each element in the sequence you give it and make it one column. Best to give it a list of columns, and ['jgjh'] makes one such column.
.writerows() (plural) expects a sequence of such rows; for your example you'd have to wrap the one row into another list to make it a series of rows, so writer.writerows([['jgjh']]) would also achieve what you want.
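The difference between the three calls can be checked with a short sketch. The helper name written is made up for illustration, and io.StringIO stands in for the file.

```python
import csv
import io


def written(method_name, data):
    """Return the text produced by calling the given writer method on data."""
    buffer = io.StringIO()
    getattr(csv.writer(buffer), method_name)(data)
    return buffer.getvalue()


per_character = written("writerows", "jgjh")    # a string is a sequence of rows
single_row = written("writerow", ["jgjh"])      # one row with one column
wrapped = written("writerows", [["jgjh"]])      # a list containing that one row

print(repr(per_character), repr(single_row), repr(wrapped))
```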