I am using pandas to load, modify and save CSV files. Pandas and its DataFrame functionality are really only a workaround for me, as I do not need them; I only use pandas because I have to modify my CSV file.
I need to remove certain lines (rows). My current code is as follows:
import pandas as pd
test=pd.read_csv('myfile.csv', sep=';', skiprows=[0,1,3,4,6])
test.to_csv('myoutputfile.csv', index=False, sep=';')
I would like to manipulate the CSV file directly. I know that with import csv and next(reader) I could, for example, skip the first row. However, I need to skip these specific rows: skiprows=[0,1,3,4,6], and I don't know how to do this. So is there a way to modify the CSV files without using pandas and save the changes?
Generally, if you have a list of rows to skip, you can use something like this:
import csv

skiprows = (0, 1, 3, 4, 6)

with open('myfile.csv', newline='') as fp_in:
    reader = csv.reader(fp_in, delimiter=';')
    rows = [row for i, row in enumerate(reader) if i not in skiprows]

with open('myoutputfile.csv', 'w', newline='') as fp_out:
    writer = csv.writer(fp_out, delimiter=';')  # keep the same delimiter on output
    writer.writerows(rows)
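If the input is too large to hold in memory comfortably, the same filter can be streamed row by row instead of collecting everything in a list first. A sketch, with the function name made up and the skip list taken from the question:

```python
import csv

def filter_rows(src, dst, skiprows, delimiter=';'):
    # Copy src to dst, dropping the 0-based line numbers in skiprows,
    # without ever holding the whole file in memory.
    skiprows = set(skiprows)  # O(1) membership test per row
    with open(src, newline='') as fp_in, \
            open(dst, 'w', newline='') as fp_out:
        reader = csv.reader(fp_in, delimiter=delimiter)
        writer = csv.writer(fp_out, delimiter=delimiter)
        for i, row in enumerate(reader):
            if i not in skiprows:
                writer.writerow(row)
```

Calling filter_rows('myfile.csv', 'myoutputfile.csv', [0, 1, 3, 4, 6]) should then reproduce the pandas version above.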
So I have several tables in CSV format, and I am using Python and the csv module. I want to extract a particular value, let's say column=80, row=109.
Here is a random example:
import csv
with open('hugetable.csv', 'r') as file:
    rows = list(csv.reader(file))
    print(rows[109][80])
I am doing this many times with large tables, and I would like to avoid loading the whole table into a list (line 3 above) just to ask for a single value. Is there a way to open the file, load the specific value and close it again? Would this process be more efficient than what I have done above?
Thanks for all the answers, all answers so far work pretty well.
You could try reading the file without the csv library:
row = 108
column = 80

with open('hugetable.csv', 'r') as file:
    header = next(file)
    for _ in range(row - 1):
        _ = next(file)
    line = next(file)
    print(line.strip().split(',')[column])
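Splitting the raw line breaks down if any cell contains a quoted comma. csv.reader combined with itertools.islice still parses properly while only ever materializing one row; a sketch (the function name is made up, indices are 0-based, and no header row is assumed):

```python
import csv
from itertools import islice

def read_cell(path, row, column):
    # Lazily advance the reader to the wanted 0-based row;
    # earlier rows are parsed but immediately discarded.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        target = next(islice(reader, row, row + 1))
    return target[column]
```

read_cell('hugetable.csv', 109, 80) then returns the requested value. Note that it still has to scan the file up to that row, which is unavoidable with variable-length CSV lines.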
You can try pandas to load only certain columns of your CSV file:
import pandas as pd
pd.read_csv('foo.csv',usecols=["column1", "column2"])
You could use pandas to load it:
import pandas as pd
text = pd.read_csv('Book1.csv', sep=',', header=None, skiprows= 100, nrows=3)
print(text[50])
Tried different ways. The closest way that may fit my need is the following code:
with open('list.csv', 'r') as reader, open('list-history.csv', 'a') as writer:
    for row in reader:
        writer.writerow(row)
I'm using 'a' and tried 'w' as well but no luck.
The result is no output at all.
Any suggestion, please? Thanks.
There should be an error with a stack trace here: writer.writerow(row).
open() returns a file object, which doesn't have a .writerow() method; normally you would use the .write(buffer) method instead.
Example
with open('list.csv', 'r') as reader, open('list-history.csv', 'a') as writer:
    for row in reader:
        writer.write(row)
For me it works well with test csv files. But it doesn't merge them, just appends content of one file to another one.
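If the files share a header row and the goal is a merge rather than a raw byte-for-byte append, one sketch (function and file names made up) is to copy the header from the first file only and skip it in every later one:

```python
import csv

def merge_csvs(paths, out_path):
    # Write the header once, then append only the data rows
    # of each subsequent file.
    with open(out_path, 'w', newline='') as fp_out:
        writer = csv.writer(fp_out)
        for n, path in enumerate(paths):
            with open(path, newline='') as fp_in:
                reader = csv.reader(fp_in)
                header = next(reader)     # consume the header row
                if n == 0:
                    writer.writerow(header)
                writer.writerows(reader)  # remaining data rows
```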
If both CSV files have the same column names, Python's pandas module can help. Example code snippet:
import pandas as pd

df1 = pd.read_csv("csv1.csv")
df2 = pd.read_csv("csv2.csv")
df1 = pd.concat([df1, df2], ignore_index=True)  # append() returned a new frame (and is removed in pandas 2.0)
df1.to_csv("new.csv", index=False)
I'm new to Python. I'm trying to read all the .csv files from one folder, and I must add the third column (Dataset 1) from all files to a new .csv file (or Excel file). I have no problem working with one file and editing it (reading, cutting rows and columns, adding columns and doing simple statistics).
This is an example of one of my CSV files (linked as an image), and I have more than 2000 of them, each one with 1123 rows!
This should be fairly easy with something like the csv library, if you don't want to get into learning dataframes.
import os
import csv

new_data = []

for filename in os.listdir('./csv_dir'):
    if filename.endswith('.csv'):
        with open('./csv_dir/' + filename, mode='r') as curr_file:
            reader = csv.reader(curr_file, delimiter=',')
            for row in reader:
                new_data.append(row[2])  # Or whichever column you need

with open('./out_dir/output.txt', mode='w') as out_file:
    for row in new_data:
        out_file.write('{}\n'.format(row))
Your new_data will contain all 2000 × 1123 values.
This may not be the most efficient way to do this, but it'll get the job done and grab each CSV. You'll need to do the work of making sure the CSV files have the correct structure, or adding in checks in the code for validating the columns before appending to new_data.
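That validation step could look something like this (a sketch; the function name and the expected width of 3 columns are arbitrary assumptions):

```python
import csv

def collect_third_column(path, expected_width=3):
    # Collect row[2] only from rows with the expected number of
    # fields; report the 0-based line numbers of malformed rows.
    values, bad_lines = [], []
    with open(path, newline='') as f:
        for i, row in enumerate(csv.reader(f)):
            if len(row) == expected_width:
                values.append(row[2])
            else:
                bad_lines.append(i)
    return values, bad_lines
```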
Maybe try
csv_file = csv.reader(open(path, "r",), delimiter=",")
csv_file1 = csv.reader(open(path, "r",), delimiter=",")
csv_file2 = csv.reader(open(path, "r",), delimiter=",")
and then read like
for row in csv_file:
    ...  # your code here
for row in csv_file1:
    ...  # your code here
for row in csv_file2:
    ...  # your code here
I wanted to delete specific rows (i.e. rows 0 to 33) from every single .csv file in my directory, but I have 224 separate .csv files that need to be processed. I would be happy if you could show me how to do this with one piece of code.
I think you can use glob and pandas to do this quite easily. I'm not sure whether you want to write over your original files, which is something I never recommend, so be careful, as this code will do that.
import os
import glob
import pandas as pd

os.chdir(r'yourdir')
allFiles = glob.glob("*.csv")  # match your csvs

for file in allFiles:
    df = pd.read_csv(file)
    df = df.iloc[33:, ]  # read from row 34 onwards
    df.to_csv(file, index=False)  # index=False avoids writing an extra index column
    print(f"{file} has removed rows 0-33")
or something along those lines..
This is a simple combination of two separate tasks.
First, you need to loop through all the csv files in a folder. See this StackOverflow answer for how to do that.
Next, within that loop, for each file, you need to modify the csv by removing rows. See this answer for how to read a csv, write a csv, and omit certain rows based on a condition.
One final aspect is that you want to omit certain line numbers. A good way to do this is with the enumerate function.
So code such as this will give you the line numbers.
import csv

with open('first.csv', 'r', newline='') as input_file, \
        open('first_edit.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    for i, row in enumerate(csv.reader(input_file)):  # parse rows instead of iterating raw lines
        if i > 33:
            writer.writerow(row)
Iterate over CSV files and use Pandas to remove the top 34 rows of each file then save it to an output directory.
Try this code after installing pandas:
from pathlib import Path
import pandas as pd

source_dir = Path('path/to/source/directory')
output_dir = Path('path/to/output/directory')

for file in source_dir.glob('*.csv'):
    df = pd.read_csv(file)
    df.drop(df.head(34).index, inplace=True)
    df.to_csv(output_dir.joinpath(file.name), index=False)
I made the CSV file in my Python code itself and am going to append the next data to it, but this error is coming up:
io.UnsupportedOperation: not readable
The code I tried is:
df.to_csv('timepass.csv', index=False)

with open(r'timepass.csv', 'a') as f:
    writer = csv.reader(f)
    your_list = list(writer)
    print(your_list)
I want to append the next data and store it in the same CSV file, so that the CSV file has both the previous and the current data.
So please help me figure this out.
Thanks in advance...
It is simple, just try this:
import pandas as pd
df = pd.read_excel("NSTT.xlsx","Sheet1") #reading Excel
print(df) #Printing data frame
df.to_excel("new.xlsx") #Writing Dataframe into New Excel file
Now if you want to append data to the same file, open it with an ExcelWriter in append mode (this requires the openpyxl engine); passing "a" as the second positional argument of to_excel would just set the sheet name:
with pd.ExcelWriter("new.xlsx", mode="a") as writer:
    df.to_excel(writer, sheet_name="Sheet2")
And there is no need to put the data into a list, as you can access a DataFrame directly much like a list; you only have to specify the location.
Please check this.
You can use pandas in python to read csv and write csv:
import pandas as pd
df = pd.read_csv("csv file")
print(df)
Try:
with open(r'timepass.csv', 'r') as f:
    reader = list(csv.reader(f))
    print(reader)
Here you are opening your file as 'r', which means read-only, and assigning the list contents to reader with list(csv.reader(f)). Your earlier code opened the file with 'a', which the documentation describes as follows:
'a' opens the file for appending; any data written to the file is automatically added to the end
A file opened this way does not support read().
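If you genuinely need to read and append through the same handle, mode 'a+' supports both; reads just have to seek back first, because 'a+' positions the file pointer at the end. A minimal sketch with a throwaway file:

```python
import csv
import os
import tempfile

# Throwaway file for illustration.
fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)  # we only need the name
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows([['a', '1'], ['b', '2']])

with open(path, 'a+', newline='') as f:
    f.seek(0)                            # jump back to the start to read
    rows = list(csv.reader(f))           # existing content
    csv.writer(f).writerow(['c', '3'])   # writes still go to the end
```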
And if you want to append data to the CSV file from a different list, use with open as 'a' together with a writer:
with open('lake.csv', 'a', newline='') as f:
    csv.writer(f).writerow([1, 2, 3])  # dummy list [1, 2, 3]
Or use the pandas.DataFrame.to_csv method directly from your new dataframe, with header=False so as not to append the headers again:
df.to_csv('timepass.csv', index=False)
df_new.to_csv(r'timepass.csv', mode='a', header=False) #once you have updated your dataframe, you can directly append it to the same csv file
you can use pandas for appending two CSVs quickly:
import pandas as pd

dataframe1 = pd.read_csv("a.csv")
dataframe2 = pd.read_csv("b.csv")
dataframe1 = pd.concat([dataframe1, dataframe2], ignore_index=True)  # append() is deprecated; ignore_index also replaces reset_index
dataframe1.to_csv("a.csv", index=False)