Issue with the CSV module in Python

I created a program to write a simple .csv (code below):
import csv

opencsv = open('agentstatus.csv', 'w')
a = csv.writer(opencsv)
data = [[agents125N],
        [okstatusN],
        [warningstatusN],
        [criticalstatusN],
        [agentdisabledN],
        [agentslegacyN]]
a.writerows(data)
opencsv.close()
The .csv looks like this (it has empty rows in between, but that's not a problem):
36111
96
25887
10128
7
398
Now I am trying to read the .csv and store each of these numbers in a variable, but without success; see below an example for the number 36111:
import csv

with open('agentstatus.csv', 'r') as csvfile:
    f = csv.reader(csvfile)
    for row in f:
        firstvalue = row[0]
However, I get the error:
line 6, in <module>
firstvalue = row[0]
IndexError: list index out of range
Could you help me out here?

Your file contains empty lines, so you need to check the length of each row:
values = []
for row in f:
    if len(row) > 0:
        values.append(row[0])
values is now ['36111', '96', '25887', '10128', '7', '398']
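Since the csv module always reads cells back as strings, a common follow-up is converting them to ints while filtering the blank rows. A minimal self-contained sketch (the filename matches the question; the write step just recreates a file of that shape):

```python
import csv

# Recreate a file shaped like the one in the question (the blank
# rows in the original come from opening without newline='')
with open('agentstatus.csv', 'w', newline='') as f:
    csv.writer(f).writerows([[36111], [96], [25887], [10128], [7], [398]])

with open('agentstatus.csv', newline='') as csvfile:
    # Skip empty rows and convert each first cell to an int
    numbers = [int(row[0]) for row in csv.reader(csvfile) if row]
```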

At the moment you're writing 6 rows into a csv file, with each row containing one column. To make a single row with six columns, you need to use a list of values, not each value in its own list.
i.e. change
data = [[agents125N], [okstatusN], [warningstatusN], [criticalstatusN], [agentdisabledN], [agentslegacyN]]
to
data = [[agents125N, okstatusN, warningstatusN, criticalstatusN, agentdisabledN, agentslegacyN]]
(a list containing one list of six values). Writing this with csv.writerows will result in
36111,96,25887,10128,7,398
row[1] in your reading loop will then return '96'.
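A minimal sketch of that change end to end (the values are the ones from the question; the variable names are the asker's, the filename here is made up):

```python
import csv

agents125N, okstatusN, warningstatusN = 36111, 96, 25887
criticalstatusN, agentdisabledN, agentslegacyN = 10128, 7, 398

# One row with six columns instead of six one-column rows
data = [[agents125N, okstatusN, warningstatusN,
         criticalstatusN, agentdisabledN, agentslegacyN]]

with open('agentstatus_row.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)

with open('agentstatus_row.csv', newline='') as f:
    row = next(csv.reader(f))
```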

Related

Sorting csv file with over 171 columns, but some rows have fewer of them?

I have a 1 GB CSV file with around 1 million records; each row has 171 columns. I did some research and came up with the code below. I have reduced the file to 5 MB for testing purposes, but it still has 171 columns. The code works fine as long as the sorting column index is below 50 (even 49 works), but I have columns with index 151 and 153 and want to sort the file on those.
Error:
When I give it an index of 50 or above, it throws the error:
data.sort(key=operator.itemgetter(*sort_key_columns))
IndexError: list index out of range
My Code:
import csv
import operator

def sort_csv(csv_filename, sort_key_columns):
    data = []
    with open(csv_filename, 'r') as f:
        for row in csv.reader(f):
            data.append(row)
    data.sort(key=operator.itemgetter(*sort_key_columns))
    with open(csv_filename, 'w', newline='') as f:
        csv.writer(f).writerows(data)

sort_csv('Huge_Complete_B2B_file_1_1.csv', [49])
You can handle the short rows by writing your own version of operator.itemgetter, based on the equivalent code shown in its online documentation.
The custom version below simply supplies a specified value for any items that are missing. This causes the row to be sorted as though it held that value at the indexed position.
Note that this assumes all the missing items should use the same default MISSING value. If that's not the case, it could be enhanced to allow a different default for each index passed to it, which would likely require an additional argument.
import csv
import operator

def my_itemgetter(*indexes, MISSING=''):
    if len(indexes) == 1:
        index = indexes[0]
        def getter(obj):
            try:
                return obj[index]
            except IndexError:
                return MISSING
    else:
        def getter(obj):
            try:
                return tuple(obj[index] for index in indexes)
            except IndexError:
                return tuple(obj[index] if index < len(obj) else MISSING
                             for index in indexes)
    return getter

def sort_csv(csv_filename, sort_key_columns):
    with open(csv_filename, 'r', newline='') as f:
        data = [row for row in csv.reader(f)]
    data.sort(key=my_itemgetter(*sort_key_columns))
    with open(csv_filename, 'w', newline='') as f:
        csv.writer(f).writerows(data)

sort_csv('Huge_Complete_B2B_file_1_1.csv', [0, 171])
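To see why the default matters, here is a tiny self-contained illustration of the same idea (safe_get is a hypothetical helper mirroring my_itemgetter, not part of the answer's code):

```python
# Rows of unequal length; sorting on column 1 directly would raise
# IndexError for the short row.
def safe_get(row, index, missing=''):
    # Like my_itemgetter: fall back to a default when the index is
    # past the end of the row
    return row[index] if index < len(row) else missing

rows = [['b', '2'], ['a'], ['c', '1']]
# The short row sorts as though it held '' in column 1
rows.sort(key=lambda r: safe_get(r, 1))
```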
It seems that one of your files contains a truncated row with fewer than 51 columns.
If you don't care about your input being corrupt, you could filter such rows out while reading the input and sort it, in one line:
def sort_csv(csv_filename, sort_key_columns):
    with open(csv_filename, 'r') as f:
        data = sorted((row for row in csv.reader(f) if len(row) >= 171),
                      key=operator.itemgetter(*sort_key_columns))
    # then write the file

How do I find the column headers of various types of csv files in a folder?

I have an issue where I need to intake different files with different column locations. One file's columns might start 4 rows down, whereas another file's columns might start on row one.
One file might look like this:
This
is
a
column 1, column 2, column 3, column 4
Another might have columns like this on row 1:
column 1, column 2, column 3
I need to get a list of every file's column headers. I consider a row a column header if it has more than 3 items. If I'm using the csv module, how can I write this?
I have something like:
temprow = next(csvfile)
for value in temprow:
    if value == '':
        temprow = next(csvfile)
    if len(value) > 3:
        header = temprow
    else:
        header = temprow
This is not quite working, as it also returns rows that contain a single string.
Try this:
with open('yourfile.csv', 'r') as f:
    for line in f:  # iterate over each line
        if "," in line:  # the header line should contain a comma
            header = line
            break  # break the loop when the header line is found
print(header)
Output:
column 1, column 2, column 3, column 4
According to the specifications in your post, this code works. It returns the first row in a .csv file that has 4 or more elements ('greater than 3 items').
import csv

headers = []  # Column header rows will be appended to this list
files = ['./test']  # Insert files here
for f in files:  # Loop over files
    with open(f, 'r') as fh:  # Open file
        reader = csv.reader(fh, delimiter=',')  # Create reader
        for row in reader:  # Loop over rows
            if len(row) >= 4:  # Criteria for appending to headers
                headers.append(row)
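As a quick check of the "more than 3 items" rule, the same criterion can be run against an in-memory sample shaped like the question's first file (the sample string here is illustrative):

```python
import csv
import io

# Junk lines first, then a header row with 4 cells, as in the question
sample = "This\nis\na\ncolumn 1, column 2, column 3, column 4\n"

header = None
for row in csv.reader(io.StringIO(sample)):
    if len(row) >= 4:  # first row with more than 3 cells is the header
        header = row
        break
```

Note that csv.reader keeps the spaces after the commas, so the cells come back as 'column 1', ' column 2', and so on.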

Reformat .csv in python: count commas in a row and insert line break after specific number of commas

I'm new to Python and looking for a script that reformats a .csv file. My .csv files contain rows that are not formatted correctly. They look similar to this:
id,author,text,date,id,author,
text,date
id,author,text,date
id,author,text,date
It's supposed to have "id,author,text,date" on each line. So my idea was to count the commas in each row and, when a specific number is reached (in this example 4), insert the remainder at the beginning of the next row. What I have so far counts the fields in each row:
import csv

with open("test.csv") as f:
    r = csv.reader(f)  # create rows split on commas
    for row in r:
        com_count = 0
        com_count += len(row)
        print(com_count)
Thanks for your help!
We're going to build a generator that yields entries and then build the new rows from that:
import csv

with open('oldfile.csv', newline='') as old:
    r = csv.reader(old)
    num_cols = int(input("How many columns: "))
    entry_generator = (entry for row in r for entry in row)
    with open('newfile.csv', 'w+', newline='') as newfile:
        w = csv.writer(newfile)
        while True:
            try:
                w.writerow([next(entry_generator) for _ in range(num_cols)])
            except StopIteration:
                break
This will not work if you have a row that is missing entries.
If you want to handle getting the column width programmatically, you can either wrap this in a function that takes a width as input, or use the first row of the csv as a canonical length
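The same reflow can also be written with itertools.islice. This variant additionally drops the empty field left by the stray trailing comma in the question's sample (that empty-field filter is an assumption about the data; the filenames follow the answer above):

```python
import csv
import itertools

# Recreate the mis-wrapped sample from the question
with open('oldfile.csv', 'w', newline='') as f:
    f.write('id,author,text,date,id,author,\ntext,date\nid,author,text,date\n')

with open('oldfile.csv', newline='') as old:
    # Flatten all fields, skipping empties left by trailing commas
    entries = (e for row in csv.reader(old) for e in row if e != '')
    # Re-chunk into rows of 4 until the generator is exhausted
    fixed = list(iter(lambda: list(itertools.islice(entries, 4)), []))

with open('newfile.csv', 'w', newline='') as new:
    csv.writer(new).writerows(fixed)
```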

Remove columns + keep certain rows in multiple large .csv files using python

Hello, I'm really new here as well as in the world of Python.
I have some (~1000) .csv files, each containing ~1,800,000 rows of information. The files are in the following form:
5302730,131841,-0.29999999999999999,NULL,2013-12-31 22:00:46.773
5303072,188420,28.199999999999999,NULL,2013-12-31 22:27:46.863
5350066,131841,0.29999999999999999,NULL,2014-01-01 00:37:21.023
5385220,-268368577,4.5,NULL,2014-01-01 03:12:14.163
5305752,-268368587,5.1900000000000004,NULL,2014-01-01 03:11:55.207
So, I would like, for all of the files:
(1) to remove the 4th (NULL) column
(2) to keep in every file only certain rows (depending on the value of the first column, e.g. 5302730: keep only the rows containing that value)
I don't know if this is even possible, so any answer is appreciated!
Thanks in advance.
Have a look at the csv module.
You can use the csv.reader function to generate an iterator of lines, with each line's cells as a list.
import csv

with open("filename.csv") as f:
    for line in csv.reader(f):
        # Remove the 4th column; remember Python starts counting at 0
        line = line[:3] + line[4:]
        if line[0] == "thevalueforthefirstcolumn":
            dosomethingwith(line)
If you wish to do this sort of operation with CSV files more than once, with different parameters for the column to skip, the key column and the filter value, you can use something like this:
import csv

def read_csv(filename, column_to_skip=None, key_column=0, key_filter=None):
    data_from_csv = []
    with open(filename, newline='') as csvfile:
        csv_reader = csv.reader(csvfile)
        for row in csv_reader:
            # Skip data in a specific column
            if column_to_skip is not None:
                del row[column_to_skip]
            # Filter out rows where the key doesn't match
            if key_filter is not None:
                key = row[key_column]
                if key_filter != key:
                    continue
            data_from_csv.append(row)
    return data_from_csv

def write_csv(filename, data_to_write):
    with open(filename, 'w', newline='') as csvfile:
        csv_writer = csv.writer(csvfile)
        for row in data_to_write:
            csv_writer.writerow(row)

data = read_csv('data.csv', column_to_skip=3, key_filter='5302730')
write_csv('data2.csv', data)
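Since the question mentions ~1000 files, the same filter-and-rewrite step can be looped over a folder with glob. A sketch under stated assumptions: the directory name and the `_filtered` output suffix are made up, and this streaming variant avoids holding a whole 1,800,000-row file in memory:

```python
import csv
import glob

def filter_file(in_path, out_path, key='5302730', drop_col=3):
    # Keep only rows whose first column matches key, drop the NULL
    # column, and stream the result straight to the output file
    with open(in_path, newline='') as src, \
         open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row and row[0] == key:
                del row[drop_col]
                writer.writerow(row)

for path in glob.glob('data_dir/*.csv'):
    filter_file(path, path.replace('.csv', '_filtered.csv'))
```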

Reading from CSV and filtering columns

I have a CSV file.
There are a fixed number of columns and an unknown number of rows.
The information I need is always in the same 2 columns but not in the same row.
When column 6 has a 17-character value, I also need to get the data from column 0.
This is an example row from the CSV file:
E4:DD:EF:1C:00:4F, 2012-10-08 11:29:04, 2012-10-08 11:29:56, -75, 9, 18:35:2C:18:16:ED,
You could open the file and go through it line by line. Split the line, and if element 6 has 17 characters, append element 0 to your result list.
res = []
with open(file_name, 'r') as f:
    for line in f:
        L = line.split(',')
        # Column 6 is index 5; strip() removes the surrounding spaces
        if len(L[5].strip()) == 17:
            res.append(L[0])
Now res holds all the column 0 values where column 6 has 17 characters.
You can use the csv module to read csv files, and you can provide the delimiter/dialect you need (comma, pipe, tab, etc.) when creating the reader.
The csv reader takes care of giving you each row/record as a list of column values. If you want to access a csv record/row as a dict, you can use DictReader and its methods.
import csv

res = []
with open('simple.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        # Indexes start at 0, so the 6th column is index 5.
        # strip() trims the spaces around the column value.
        # Checking len(row) > 5 handles uneven/short rows.
        if len(row) > 5 and len(row[5].strip()) == 17:
            res.append(row[0])

# Result: column 0 values where column 6 has a 17-character value
print(res)
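The DictReader variant mentioned above could look like this. Since the file has no header row, the fieldnames here are invented for illustration:

```python
import csv
import io

# One row shaped like the example in the question
sample = ("E4:DD:EF:1C:00:4F, 2012-10-08 11:29:04, 2012-10-08 11:29:56,"
          " -75, 9, 18:35:2C:18:16:ED,\n")
fields = ['mac', 'first_seen', 'last_seen', 'rssi', 'count', 'peer', 'extra']

res = []
for rec in csv.DictReader(io.StringIO(sample), fieldnames=fields):
    # Column 6 ('peer') with 17 characters after stripping spaces
    if rec['peer'] and len(rec['peer'].strip()) == 17:
        res.append(rec['mac'])
```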
