Select specific columns from CSV file - python

My code reads the first 28 columns of a pipe-delimited text file and formats/removes some data. How can I select specific columns? The columns I want are 0 to 25, and column 28. What is the best approach?
Thanks in advance!
import csv
import os
my_file_name = os.path.abspath('NVG.txt')
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    writer.writerow(next(cr)[:28])
    for line in (r[0:28] for r in cr):
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[11]= line[11][:5]
            writer.writerow(line)
# no explicit close() calls are needed: the with block closes both files

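Have a look at pandas. read_csv can load only the columns you need; pass sep='|' because the file is pipe-delimited.
import pandas as pd
usecols = list(range(26)) + [28]
data = pd.read_csv(my_file_name, sep='|', usecols=usecols)
You can also conveniently write the data back to a new file (index=False keeps the row index out of the output):
data.to_csv(cleaned_file, index=False)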

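Exclude columns 26 and 27 from each row by filtering on the column position. Use enumerate() rather than row.index(x), because index() returns the first matching value and gives the wrong position when a row contains duplicate values:
for row in cr:
    content = [value for i, value in enumerate(row) if i not in (26, 27)]
    # work with the selected columns in content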
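If you would rather stay with the csv module, one option is to keep an explicit list of the indices you want and apply it to the header and to every data row. This is only a sketch: the in-memory sample with 29 fields per row stands in for the real NVG.txt, and KEEP and select_columns are names made up for the illustration.

```python
import csv
import io

# Indices to keep: columns 0 through 25, plus column 28.
KEEP = list(range(26)) + [28]

def select_columns(row, keep=KEEP):
    """Return only the fields of row whose index is listed in keep."""
    return [row[i] for i in keep]

# Stand-in for the real file: a header and one data row, 29 fields each.
sample = io.StringIO(
    "|".join(f"h{i}" for i in range(29)) + "\n"
    + "|".join(f"v{i}" for i in range(29)) + "\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(sample, delimiter="|"):
    # The header and the data rows go through the same selection.
    writer.writerow(select_columns(row))

result = out.getvalue().splitlines()
```

Each output row then has 27 fields, and columns 26 and 27 are gone. Note that row[i] raises IndexError on a row with fewer than 29 fields, so short rows would need separate handling.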

Related

Python CSV, Combining multiple columns into one column using CSV

I've been trying to figure out a way to combine all the columns in a CSV file into one column.
import csv
with open('test.csv') as f:
    reader = csv.reader(f)
    with open('output.csv', 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [' '.join([row[0], row[1]])] + row[2:]
            writer.writerow(new_row)
This worked to combine the first two columns, but I've been having trouble trying to loop it and get the rest of the columns into just one.
Pass the whole row to ' '.join(), since each row is already a list of strings.
import csv
with open('test.csv', newline='') as f:
    reader = csv.reader(f)
    with open('output.csv', 'w', newline='') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [' '.join(row)] # <---- CHANGED HERE
            writer.writerow(new_row)
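To check the idea without touching any files, here is a small sketch that runs the same join on in-memory data. The sample contents and the merge_columns name are invented for the example.

```python
import csv
import io

def merge_columns(row, sep=" "):
    """Collapse every field of a row into a single-element row."""
    return [sep.join(row)]

# In-memory stand-in for test.csv.
source = io.StringIO("a,b,c\n1,2,3\n")
merged = [merge_columns(row) for row in csv.reader(source)]
```

Here merged ends up as [['a b c'], ['1 2 3']], so each output row has exactly one column.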

Generate a new csv file and order data in ascending numeric order?

I have written code that applies the given regex to every postcode in the 'import_data.csv' file. It then generates a new CSV file, 'failed_validation.csv', which contains all the postcodes that fail validation. Both files have the following structure:
row_id postcode
134534 AABC 123
243534 AACD 4PQ
534345 QpCD 3DR
... ...
Following is my code:
import csv
import re
regex = r"(GIR\s0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9]((BR|FY|HA|HD|HG|HR|HS|HX|JE|LD|SM|SR|WC|WN|ZE)[0-9])[0-9])|([A-PR-UWYZ][A-HK-Y](AB|LL|SO)[0-9])|(WC[0-9][A-Z])|(([A-PR-UWYZ][0-9][A-HJKPSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))\s[0-9][ABD-HJLNP-UW-Z]{2})"
codes = []
with open('../import_data.csv','r') as f:
    r = csv.reader(f, delimiter=',')
    for row in r:
        if not(re.findall(regex, row[1])):
            codes.append([row[0],row[1]])
with open('failed_validation.csv','w',newline='') as fp:
    a = csv.writer(fp)
    a.writerows(codes)
The code works fine, but I want the postcodes in the new file to be ordered by row_id in ascending numeric order. I know how to generate a new file with Python, but I don't know how to sort the data inside that file.
This will do it and preserve the header row:
import csv
import re
regex = r"(GIR\s0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9]((BR|FY|HA|HD|HG|HR|HS|HX|JE|LD|SM|SR|WC|WN|ZE)[0-9])[0-9])|([A-PR-UWYZ][A-HK-Y](AB|LL|SO)[0-9])|(WC[0-9][A-Z])|(([A-PR-UWYZ][0-9][A-HJKPSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))\s[0-9][ABD-HJLNP-UW-Z]{2})"
codes = []
with open('import_data.csv', 'r', newline='') as fp:
    reader = csv.reader(fp, delimiter=',')
    header = next(reader)
    for row in reader:
        if not re.findall(regex, row[1]):
            codes.append([row[0],row[1]])
with open('failed_validation.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(header)
    # sort by row_id as a number; plain sorted(codes) would compare the ids as strings
    writer.writerows(sorted(codes, key=lambda row: int(row[0])))
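Sort your codes list before writing to the file. With the code from the question, the header row should be the first entry in codes (it does not match the regex), so set it aside and sort the remaining rows by the numeric value of row_id:
header = codes[0]
codes = sorted(codes[1:], key=lambda row: int(row[0]))
with open('failed_validation.csv','w',newline='') as fp:
    a = csv.writer(fp)
    a.writerow(header)
    a.writerows(codes)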
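A note on why the sort key matters: csv.reader returns every field as a string, so sorting rows without a key compares the row_id values character by character. The sketch below shows the difference on a few rows; the row with id 9 is made up to make the effect visible.

```python
# Rows as csv.reader would return them: every field is a string.
rows = [
    ["243534", "AACD 4PQ"],
    ["9", "ZZ9 9ZZ"],        # hypothetical short id, added for the demo
    ["134534", "AABC 123"],
]

# Text order: "9" sorts last because "9" > "2" > "1" as characters.
by_text = sorted(rows)

# Numeric order: convert row_id to int for the comparison only.
by_number = sorted(rows, key=lambda row: int(row[0]))
```

by_text puts 9 after 243534, while by_number puts it first. If all ids happen to have the same number of digits, the two orders agree, which is why the problem can go unnoticed.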

Delete blank columns from header row

I'm pretty new to Python and I'm having trouble deleting the header columns after the 25th column. There are 8 extra columns that have no data, so I'm trying to delete them. Columns 1-25 hold roughly 50,000k of data and the rest of the columns are blank. How would I do this? My code can currently clean up the file, but I can't delete the headers in the first row after column 25.
Thanks
import csv
my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    writer.writerow(next(cr)) #I think this is why is not working
    for line in (r[0:25] for r in cr):
        #del line [26:32]
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[11]= line[11][:5]
            writer.writerow(line)
You've found the line with the problem. All you have to do is write only the headers you want. next(cr) reads the header line, but you pass the entire line to writer.writerow(), so all of its columns end up in the output.
Instead of
writer.writerow(next(cr))
you want:
writer.writerow(next(cr)[:25])
([:25] and [0:25] are the same in Python)
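One way to keep the header and the data rows from drifting apart is to define the slice once and apply it to every row, header included. A rough sketch, using an in-memory sample with 33 fields per row in place of the real file; the KEEP name is an assumption for the example.

```python
import csv
import io

KEEP = slice(0, 25)  # equivalent to [:25]

# Stand-in for NVG.txt: a header and two data rows, 33 fields each.
sample = io.StringIO(
    "\n".join("|".join(f"{name}{i}" for i in range(33)) for name in ("h", "a", "b"))
    + "\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(sample, delimiter="|"):
    # The first row is the header; it gets trimmed exactly like the data rows.
    writer.writerow(row[KEEP])

lines = out.getvalue().splitlines()
```

Every line in lines has 25 fields, so the header can no longer be wider than the data.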

Add a new column to a csv file in python

I am trying to add a column to a csv file that combines strings from two other columns. Whenever I try this I either get an output csv with only the new column or an output with all of the original data and not the new column.
This is what I have so far:
with open(filename) as csvin:
    readfile = csv.reader(csvin, delimiter=',')
    with open(output, 'w') as csvout:
        writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
        for row in readfile:
            result = [str(row[10]) + ' ' + str(row[11])]
            writefile.writerow(result)
Any help would be appreciated.
No input to test, but try this. Your current approach writes only the new value and drops the existing data in each row. extend takes the list that represents each row and adds another item to it, which is equivalent to adding a column.
import csv
with open(filename) as csvin:
    readfile = csv.reader(csvin, delimiter=',')
    with open(output, 'w') as csvout:
        writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
        for row in readfile:
            row.extend([str(row[10]) + ' ' + str(row[11])])
            writefile.writerow(row)
I assume that glayne wants to combine column 10 and 11 into one.
In my approach, I concentrate on how to transform a single row first:
def transform_row(input_row):
    output_row = input_row[:]  # copy, so the caller's row is not modified
    output_row[10:12] = [' '.join(output_row[10:12])]
    return output_row
Once tested to make sure that it works, I can move on to replace all rows:
import csv
# in Python 3 the csv module needs text-mode files opened with newline=''
with open('data.csv', newline='') as inf, open('out.csv', 'w', newline='') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    writer.writerows(transform_row(row) for row in reader)
Note that I use the writerows() method to write multiple rows in one statement.
The snippet below combines the strings in columns 10 and 11 of each row and appends the result to the end of the row:
import csv
input_path = 'test.csv'   # renamed from input to avoid shadowing the built-in
output_path = 'output.csv'
with open(input_path, newline='') as csvin:
    readfile = csv.reader(csvin, delimiter=',')
    with open(output_path, 'w', newline='') as csvout:
        writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
        for row in readfile:
            result = row + [row[10] + row[11]]
            writefile.writerow(result)
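The answers above differ in what they do with the original two columns: some append the combined value as an extra column, one replaces columns 10 and 11 with it. A small sketch that puts the two behaviours side by side on a single made-up row (the function names are assumptions for the illustration):

```python
def append_combined(row, left=10, right=11, sep=" "):
    """Keep the row as is and add the combined value as a new last column."""
    return row + [sep.join((row[left], row[right]))]

def replace_with_combined(row, left=10, right=11, sep=" "):
    """Replace columns left..right with one combined column."""
    out = list(row)  # copy, so the input row is left untouched
    out[left:right + 1] = [sep.join(out[left:right + 1])]
    return out

row = [f"c{i}" for i in range(13)]  # sample row with 13 fields
appended = append_combined(row)
replaced = replace_with_combined(row)
```

appended has 14 fields with 'c10 c11' at the end, while replaced has 12 fields with 'c10 c11' at index 10. Which one you want depends on whether the original columns should survive.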

Referring to CSV header string to adjust column format

I'm somewhat new to Python and CSV processing, and I couldn't find a solution for what I'm looking for. When I open a specific CSV file in Excel, I have a column called "rate" that is in percent. I'm dividing all the values in this column by 100. At the moment I refer to this column by position: row[6] = percentToFloat(row[6]). My question is whether it's possible to address the column by its header name rather than by its number.
with open(input) as inFile:
    reader = csv.reader(inFile)
    next(reader)  # skip the header row (reader.next() in Python 2)
    with open(output, 'w') as outFile:
        writer = csv.writer(outFile)
        for row in reader:
            if len(row)>1: #skips empty rows
                row[6] = percentToFloat(row[6])
                writer.writerow(row)
You could use a DataFrame from pandas:
import pandas as pd
df = pd.read_csv('data.csv')  # the first row is used as the header by default
print(df)
print(df.rate)
print(df.rate / 100.0)
Use csv.DictReader:
reader = csv.DictReader(inFile)
Now you can use row['column_name'] instead of row[6] in your code.
Use csv.DictReader instead of csv.reader, and csv.DictWriter to write the rows back out. Each row is then a dict keyed by header name, so no column index is needed.
with open(input) as inFile:
    reader = csv.DictReader(inFile)  # consumes the header row itself
    with open(output, 'w', newline='') as outFile:
        writer = csv.DictWriter(outFile, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:  # DictReader skips blank lines on its own
            row['rate'] = percentToFloat(row['rate'])
            writer.writerow(row)
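For completeness, here is a self-contained sketch of the DictReader/DictWriter round trip on in-memory data. The percent_to_float function is a hypothetical stand-in for the question's percentToFloat, whose real implementation isn't shown, and the sample rows are invented.

```python
import csv
import io

def percent_to_float(value):
    """Hypothetical stand-in for percentToFloat: '25' or '25%' -> 0.25."""
    return float(value.strip().rstrip("%")) / 100.0

# In-memory stand-in for the input file, including one blank line.
source = io.StringIO("name,rate\nalpha,25\n\nbeta,7.5%\n")
out = io.StringIO()

reader = csv.DictReader(source)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
for row in reader:
    # Address the column by its header name instead of by position.
    row["rate"] = percent_to_float(row["rate"])
    writer.writerow(row)

converted = out.getvalue().splitlines()
```

The header line is written by writeheader(), the blank input line is skipped by DictReader, and the rate column is converted without ever mentioning index 6. If the column moves in a later version of the file, the code keeps working as long as the header is still called rate.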
