Is there a way to remove a row from a CSV file without rewriting the entire thing?
Currently, I am using a dictionary db that holds the data, including the row I want to delete. First I read the column headers, then I completely rewrite every row in the CSV except the row with the ID I want to delete. Is there a way to do this without having to rewrite everything?
import csv

def remove_from_csv(file_name, id, db):
    with open(file_name, "r") as f:
        reader = csv.reader(f)
        i = next(reader)  # grab the header row
    with open(file_name, 'w') as f:
        writer = csv.writer(f, lineterminator='\n')
        writer.writerow(i)
        for i in db:
            if id != i:
                for j in db[i]:
                    writer.writerow([i, j, db[i][j]])
A way I have done this in the past is to use a pandas DataFrame and its drop function, based on the row index or label.
For example:
import pandas as pd
df = pd.read_csv('yourFile.csv')
newDf = df.drop('rowLabel')
or by index position:
newDf = df.drop(df.index[indexNumber])
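Note that drop returns a new DataFrame rather than modifying the file; to persist the change you still write everything back out. A minimal sketch (the file name and index position are placeholders):

import pandas as pd

df = pd.read_csv('yourFile.csv')
df = df.drop(df.index[2])               # drop the third data row by position
df.to_csv('yourFile.csv', index=False)  # rewrite the file without the pandas index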
I want to go through large CSV files and, if there is missing data, remove that row completely. This is row-specific: if a cell equals 0 or has no value, the entire row should be removed. I want this to apply across all columns, so if any column has a blank cell the row is deleted, and the corrected data is returned in a corrected CSV.
import csv

with open('data.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)
        if not row[0]:
            print("12")
This is what I found and tried, but it does not seem to be working, and I don't have any ideas about how to approach this problem. Help please?
Thanks!
Due to the way in which the csv reader presents rows of data, you need to know how many columns there are in the original CSV file. For example, if the CSV file content looks like this:
1,2
3,
4
Then the lists returned by iterating over the reader would look like this:
['1','2']
['3','']
['4']
As you can see, the third row only has one column, whereas the first and second rows have two columns, albeit that one of them is (effectively) empty.
This function allows you to either specify the number of columns (if you know it beforehand) or let the function figure it out. If not specified, it is assumed that the number of columns is the greatest number of columns found in any row.
So...
import csv

DELIMITER = ','

def valid_column(col):
    try:
        return float(col) != 0
    except ValueError:
        pass
    return len(col.strip()) > 0

def fix_csv(input_file, output_file, cols=0):
    if cols == 0:
        with open(input_file, newline='') as indata:
            cols = max(len(row) for row in csv.reader(indata, delimiter=DELIMITER))
    with open(input_file, newline='') as indata, open(output_file, 'w', newline='') as outdata:
        writer = csv.writer(outdata, delimiter=DELIMITER)
        for row in csv.reader(indata, delimiter=DELIMITER):
            if len(row) == cols:
                if all(valid_column(col) for col in row):
                    writer.writerow(row)

fix_csv('original.csv', 'fixed.csv')
Maybe like this:
import csv

with open('data.csv', 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)

data = [x for x in data if '' not in x and '0' not in x]
You can then rewrite the CSV file if you like.
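For example, a minimal sketch of that rewrite (overwriting the same data.csv is an assumption):

import csv

# Write the filtered rows back out, replacing the original file
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(data)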
Instead of using csv, you should use the Pandas module, something like this:
import pandas as pd

df = pd.read_csv('file.csv')
print(df)

index = 1  # index of the row that you want to remove
df = df.drop(index)
print(df)

df.to_csv('file.csv', index=False)  # index=False stops pandas from adding an extra index column
Trying to read in a CSV, add a row at the bottom, and delete a row at the top. I have not been able to find a way to delete a row in the DictWriter object without converting to a list, deleting the row in the list, then writing it out using csv.writer.
There should be a better way than reading/writing twice.
Python 3.8, Ubuntu.
Thanks.
stime = get_time_str()
new_dict = {'Time': stime, 'Queries': Querycounter.value}
Querycounter.value = 0

# list of column names
field_names = ['Time', 'Queries']

# Open the CSV file in append mode and
# append the new queries count at the end of the file
with open(AlpacaQueriesCSVfile, 'a') as f_object:
    dictwriter_object = DictWriter(f_object, fieldnames=field_names)
    dictwriter_object.writerow(new_dict)

# Open using csv.reader, delete the row(s).
with open(AlpacaQueriesCSVfile, "r") as f:
    reader = csv.reader(f, delimiter=",")
    data = list(reader)  # should be a better way of doing this by deleting rows in the dictwriter_object above....later
row_count = len(data)
if row_count > 2880:
    logger.debug('Deleting row from Queries.csv ')
    to_skip = row_count - 2880
    del data[1:to_skip]  # leave first row
    with open(QueriesCSVfile, 'w') as f:
        write = csv.writer(f)
        write.writerows(data)
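One way to avoid building the whole list just to slice it is collections.deque, which keeps only the last N data rows in a single pass. A minimal sketch (the file name and the 2880-row cap are taken from the question; it still rewrites the file once, since lines cannot be deleted from the front of a text file in place):

import csv
from collections import deque

MAX_ROWS = 2880  # assumed row cap from the question

with open('Queries.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)                  # keep the header row
    tail = deque(reader, maxlen=MAX_ROWS)  # older rows are discarded automatically

with open('Queries.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(tail)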
What I want to do is actually as it is written in the title.
import csv

with open(path, "r+", newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    list_of_column_names = []
    num_cols = len(next(csv_reader))
    for i in range(num_cols):
        list_of_column_names.append(i)
    fields = list_of_column_names

with open('example.csv', "r+", newline='') as writeFile:
    csvwriter = csv.DictWriter(writeFile, delimiter=',', lineterminator='\n', fieldnames=fields)
    writeFile.seek(0, 0)
    csvwriter.writeheader()
I want to enumerate the columns, which initially don't have any column names. But when I run the code, it replaces the data in the first row. For example:
example.csv:
a,b
c,d
e,f
what I want:
0,1
a,b
c,d
e,f
what happens after running the code:
0,1
c,d
e,f
Is there a way to prevent this from happening?
There's no magical way to insert a line into an existing text file.
The following is how I would think of doing this, and your code already covers steps 2-4. Also, I wouldn't mess with DictWriter, since you're not trying to convert a Python dict to CSV (I can see you using it to write the header, but that's easy enough to do with the regular reader/writer):
1. Open a new file for writing.
2. Read the first row of your CSV.
3. Interpret the column indexes as the header.
4. Write the header.
5. Write the first row.
6. Read/write the rest of the rows.
7. Move the new file back to the old file, overwriting it (not shown).
Here's what that looks like in code:
import csv

with open('output.csv', 'w', newline='') as out_f:
    writer = csv.writer(out_f)
    with open('input.csv', newline='') as in_f:
        reader = csv.reader(in_f)
        # Read the first row
        first_row = next(reader)
        # Count the columns in first row; equivalent to your `for i in range(len(first_row)): ...`
        header = [i for i, _ in enumerate(first_row)]
        # Write header and first row
        writer.writerow(header)
        writer.writerow(first_row)
        # Write rest of rows
        for row in reader:
            writer.writerow(row)
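The final move-and-overwrite step (step 7 above) might look like this (a sketch; os.replace overwrites the destination atomically when both paths are on the same filesystem):

import os

os.replace('output.csv', 'input.csv')  # replace the original with the fixed copy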
I have an array LiveTick = ['ted3m index','US0003m index','USGG3m index'] and I am reading a CSV file, book1.csv. I have to find the rows in the CSV which contain those values.
For example, the 15th row might contain ted3m index 500 | 600 and the 20th row US0003m index 800 | 900, and so on.
I then have to get the values contained in each matching row and parse them for every value in the array LiveTick. How do I proceed? Below is my sample code:
with open('C:\\blp\\book1.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        for list in LiveTick:
            if list in row:
                print('Found: {}'.format(row))
You can use pandas; it's pretty fast and will do all the reading, writing and filtering for you out of the box:
import pandas as pd
df = pd.read_csv('C:\\blp\\book1.csv')
filtered_df = df[df['your_column_name'].isin(LiveTick)]
# now you can save it
filtered_df.to_csv('C:\\blp\\book_filtered.csv')
You have the right idea, but there are a few improvements you can make:
- Instead of a nested for loop which doesn't short-circuit, use any to compare the first column against multiple values.
- Write to your csv as you go along instead of just printing. This is memory-efficient, as you hold only one line in memory at any one time.
- Define outf as an open file object in your with statement.
- Do not shadow the built-in list. Use another identifier, e.g. i, for elements of LiveTick.
Here's a demo:
with open('in.csv', 'r') as f, open('out.csv', 'w', newline='') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf, delimiter=',')
    for row in reader:
        if any(i in row[0] for i in LiveTick):
            writer.writerow(row)
I have a file TAB.csv with many columns. I would like to choose one column, without its header (the index of that column is 3), from the CSV file, then create a new text file NEW.txt and write that column there (without the header).
The code below reads that column, but with the header. How do I omit the header and save that column in a new text file?
import csv

with open('TAB.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row[3]
This is the solution #tmrlvi was talking about: it skips the first row (the header) via the next function:
import csv

with open('TAB.csv', 'rb') as input_file:
    reader = csv.reader(input_file)
    output_file = open('output.csv', 'w')
    next(reader, None)
    for row in reader:
        row_str = row[3]
        output_file.write(row_str + '\n')
    output_file.close()
Try this:
import csv

with open('TAB.csv', 'rb') as f, open('out.txt', 'wb') as g:
    reader = csv.reader(f)
    next(reader)  # skip header
    g.writelines(row[3] + '\n' for row in reader)
enumerate is a nice function that returns tuples. It enables you to view the index while running over an iterator.
import csv

with open('NEW.txt', 'wb') as outfile:
    with open('TAB.csv', 'rb') as f:
        reader = csv.reader(f)
        for index, row in enumerate(reader):
            if index > 0:
                outfile.write(row[3])
                outfile.write("\n")
Another solution would be to read one line from the file (in order to skip the header).
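A sketch of that variant (Python 3 style file handling assumed here; the file name and column index are taken from the question):

import csv

with open('TAB.csv', newline='') as f:
    f.readline()  # consume the header line so csv.reader starts at the data
    reader = csv.reader(f)
    for row in reader:
        print(row[3])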
It's an old question, but I would like to add an answer using the Pandas library: it's better to use Pandas for such tasks instead of writing your own code. The code with Pandas is as simple as:
import pandas as pd
reader = pd.read_csv('TAB.csv', header=None)
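The column selection and write-out would then be a one-liner; a sketch of that step (column index 3 and the file names come from the question; iloc[1:] skips the original header row, which header=None keeps as a data row):

# Take column index 3, drop the original header row, write without index or header
reader[3].iloc[1:].to_csv('NEW.txt', index=False, header=False)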