I have a csv file where I need to delete the second and the third row and 3rd to 18th column. I was able to do get it to work in two steps, which produced an interim file. I am thinking that there must be a better and more compact way to do this. Any suggestions would be really appreciated.
Also, if I want to remove multiple ranges of columns, how do I specify in this code. For example, if I want to remove columns 25 to 29, in addition to columns 3 to 18 already specified, how would I add to the code? Thanks
remove_from = 2
remove_to = 17
with open('file_a.csv', 'rb') as infile, open('interim.csv', 'wb') as outfile:
reader = csv.reader(infile)
writer = csv.writer(outfile)
for row in reader:
del row[remove_from : remove_to]
writer.writerow(row)
with open('interim.csv', 'rb') as infile, open('file_b.csv', 'wb') as outfile:
reader = csv.reader(infile)
writer = csv.writer(outfile)
writer.writerow(next(reader))
reader.next()
reader.next()
for row in reader:
writer.writerow(row)
Here is a pandas approach:
Step 1, creating a sample dataframe
import pandas as pd
# Create sample CSV-file (100x100)
df = pd.DataFrame(np.arange(10000).reshape(100,100))
df.to_csv('test.csv', index=False)
Step 2, doing the magic
import pandas as pd
import numpy as np
# Read first row to determine size of columns
size = pd.read_csv('test.csv',nrows=0).shape[1]
#want to remove columns 25 to 29, in addition to columns 3 to 18 already specified,
# Ok so let's create an array with the length of dataframe deleting the ranges
ranges = np.r_[3:19,25:30]
ar = np.delete(np.arange(size),ranges)
# Now let's read the dataframe
# let us also skip rows 2 and 3
df = pd.read_csv('test.csv', skiprows=[2,3], usecols=ar)
# And output
dt.to_csv('output.csv', index=False)
And the proof:
Related
I want to go through large CSV files and if there is missing data I want to remove that row completely, This is only row specific so if there is a cell that = 0 or has no value then I want to remove the entire row. I want this to happen for all the columns so if any column has a black cell it should delete the row, and return the corrected data in a corrected csv.
import csv
with open('data.csv', 'r') as csvfile:
csvreader = csv.reader(csvfile)
for row in csvreader:
print(row)
if not row[0]:
print("12")
This is what I found and tried but it doesnt not seem to be working and I dont have any ideas about how to aproach this problem, help please?
Thanks!
Due to the way in which CSV reader presents rows of data, you need to know how many columns there are in the original CSV file. For example, if the CSV file content looks like this:
1,2
3,
4
Then the lists return by iterating over the reader would look like this:
['1','2']
['3','']
['4']
As you can see, the third row only has one column whereas the first and second rows have 2 columns albeit that one is (effectively) empty.
This function allows you to either specify the number of columns (if you know them before hand) or allow the function to figure it out. If not specified then it is assumed that the number of columns is the greatest number of columns found in any row.
So...
import csv
DELIMITER = ','
def valid_column(col):
try:
return float(col) != 0
except ValueError:
pass
return len(col.strip()) > 0
def fix_csv(input_file, output_file, cols=0):
if cols == 0:
with open(input_file, newline='') as indata:
cols = max(len(row) for row in csv.reader(indata, delimiter=DELIMITER))
with open(input_file, newline='') as indata, open(output_file, 'w', newline='') as outdata:
writer = csv.writer(outdata, delimiter=DELIMITER)
for row in csv.reader(indata, delimiter=DELIMITER):
if len(row) == cols:
if all(valid_column(col) for col in row):
writer.writerow(row)
fix_csv('original.csv', 'fixed.csv')
maybe like this
import csv
with open('data.csv', 'r') as csvfile:
csvreader = csv.reader(csvfile)
data=list(csvreader)
data=[x for x in data if '' not in x and '0' not in x]
you can then rewrite the the csv file if you like
Instead of using csv, you should use Pandas module, something like this.
import pandas as pd
df = pd.read_csv('file.csv')
print(df)
index = 1 #index of the row that you want to remove
df = df.drop(index)
print(df)
df.to_csv('file.csv')
I have to read from an csv all columns starting from the forth column. For example from row[4] until the end of line. Any ideas how I achieve that?
for col in range(4,row):
covid.addCell(row[col])
csv.reader returns rows as lists, so you can slice each row to get a list made of the columns from the fourth to the final column
import csv
with open('mycsv.csv', newline='') as f:
reader = csv.reader(f)
for row in reader:
chunk = row[3:]
# do something with chunk (which is a list)
What have you tried so far? Where is your code/attempt?
import pandas as pd
df = pd.read_csv("XXX.csv")
df_selected = df.iloc[ :, 3: ] # from 4th column, reminder: in python we count from 0
Here's a sample csv file;
out_gate,uless_col,in_gate,n_con
p,x,x,1
p,x,y,1
p,x,z,1
a_a,u,b,1
a_a,s,b,3
a_b,e,a,2
a_b,l,c,4
a_c,e,a,5
a_c,s,b,5
a_c,s,b,3
a_c,c,a,4
a_d,o,c,2
a_d,l,c,3
a_d,m,b,2
p,y,x,1
p,y,y,1
p,y,z,3
I want to remove the useless columns (2nd column) and useless rows (first three and last three rows) and create a new csv file and then save this new one. and How can I deal with the csv file that has more than 10 useless columns and useless rows?
(assuming useless rows are located only on the top or the bottom lines not scattered in the middle)
(and I am also assuming all the rows we want to use has its first element name starting with 'a_')
Can I get solution without using numpys or pandas as well? thanks!
Assuming that you have one or more unwanted columns and the wanted rows start with "a_".
import csv
with open('filename.csv') as infile:
reader = csv.reader(infile)
header = next(reader)
data = list(reader)
useless = set(['uless_col', 'n_con']) # Let's say there are 2 useless columns
mask, new_header = zip(*[(i,name) for i,name in enumerate(header)
if name not in useless])
#(0,2) - column mask
#('out_gate', 'in_gate') - new column headers
new_data = [[row[i] for i in mask] for row in data] # Remove unwanted columns
new_data = [row for row in new_data if row[0].startswith("a_")] # Remove unwanted rows
with open('filename.csv', 'w') as outfile:
writer = csv.writer(outfile)
writer.writerow(new_header)
writer.writerows(new_data)
You can try this:
import csv
data = list(csv.reader(open('filename.csv')))
header = [data[0][0]]+data[0][2:]
final_data = [[i[0]]+i[2:] for i in data[1:]][3:-3]
with open('filename.csv', 'w') as f:
write = csv.writer(f)
write.writerows([header]+final_data)
Output:
out_gate,in_gate,n_con
a,b,1
a,b,3
b,a,2
b,c,4
c,a,5
c,b,5
c,b,3
c,a,4
d,c,2
d,c,3
d,b,2
Below solution uses Pandas.
As the pandas dataframe drop function suggests, you can do the following:
import pandas as pd
df = pd.read_csv("csv_name.csv")
df.drop(columns=['ulesscol'])
Above code is considering dropping columns, you can drop rows by index as:
df.drop([0, 1])
Alternatively, don't read in the column in the first place:
df = pd.read_csv("csv_name.csv",
usecols=["out_gate", "in_gate", "n_con"])
I have a code to read csv file by row
import csv
with open('example.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
for row in readCSV:
print(row)
print(row[0])
But i want only selected columns what is the technique could anyone give me a script?
import csv
with open('example.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
column_one = [row[0] for row in readCSV ]
Will give you list of values from the first column. That being said - you'll have to read the entire file anyway.
You can't do that, because files are written byte-by-byte to your filesystem. To know where one line ends, you will have to read all the line to detect the presence of a line-break character. There's no way around this in a CSV.
So you'll have to read all the file -- but you can choose which parts of each row you want to keep.
I would definitely use pandas for that.
However, in plain python this one of the way to do it.
In this example I am extracting the content of row 3, column 4.
import csv
target_row = 3
target_col = 4
with open('yourfile.csv', 'rb') as csvfile:
reader = csv.reader(csvfile)
n = 0
for row in reader:
if row == target_row:
data = row.split()[target_col]
break
print data
read_csv in pandas module can load a subset of columns.
Assume you only want to load columns 1 and 3 in your .csv file.
import pandas as pd
usecols = [1, 3]
df = pd.read_csv('example.csv',usecols=usecols, sep=',')
Here is Doc for read_csv.
In addition, if your file is big, you can read the file piece by piece by specifying chucksize in read_csv
I have a csv which contains 38 colums of data, all I want to find our how to do is, divide column 11 by column by column 38 and append this data tot he end of each row. Missing out the title row of the csv (row 1.)
If I am able to get a snippet of code that can do this, I will be able to manipulate the same code to perform lots of similar functions.
My attempt involved editing some code that was designed for something else.
See below:
from collections import defaultdict
class_col = 11
data_col = 38
# Read in the data
with open('test.csv', 'r') as f:
# if you have a header on the file
# header = f.readline().strip().split(',')
data = [line.strip().split(',') for line in f]
# Append the relevant sum to the end of each row
for row in xrange(len(data)):
data[row].append(int(class_col)/int(data_col))
# Write the results to a new csv file
with open('testMODIFIED2.csv', 'w') as nf:
nf.write('\n'.join(','.join(row) for row in data))
Any help will be greatly appreciated. Thanks SMNALLY
import csv
with open('test.csv', 'rb') as old_csv:
csv_reader = csv.reader(old_csv)
with open('testMODIFIED2.csv', 'wb') as new_csv:
csv_writer = csv.writer(new_csv)
for i, row in enumerate(csv_reader):
if i != 0:
row.append(float(row[10]) / float(row[37]))
csv_writer.writerow(row)
Use pandas:
import pandas
df = pandas.read_csv('test.csv') #assumes header row exists
df['FRACTION'] = 1.0*df['CLASS']/df['DATA'] #by default new columns are appended to the end
df.to_csv('out.csv')