How to add a header for a CSV file - python

I have a CSV file whose first row contains only one attribute, so that row cannot serve as the header. I rewrite the data into a new CSV file, whose format is shown in the screenshot below. It contains 5 columns, and I would like to add column1, column2, column3, column4, and column5 as the headers for this CSV file.
I tried to use pandas to add a header to this CSV file, but it does not work at all. Here is my code:
with open("ex_fts.csv",'r') as f:
    with open("updated_test.csv",'w') as f1:
        next(f) # skip header line
        for line in f:
            f1.write(line)
a = df.to_csv("updated_test.csv", header=["Letter", "Number", "Symbol","a","as"], index=False)
print(a)

Just write the header once, before writing the rows:
columnNames = ["Letter", "Number", "Symbol", "a", "as"]
with open("ex_fts.csv", 'r') as f:
    with open("updated_test.csv", 'w') as f1:
        # Write new header
        f1.write(','.join(columnNames) + '\n')
        next(f)  # skip header line
        for line in f:
            f1.write(line)
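If you would rather let the csv module take care of quoting and line endings, a minimal sketch could look like the following. The function name add_header, the skip_first_line flag, and the demo file contents are made up for illustration; only the file names and column names come from the question.

```python
import csv
import os
import tempfile


def add_header(src_path, dst_path, column_names, skip_first_line=True):
    """Copy src_path to dst_path, writing column_names as the first row."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(column_names)
        if skip_first_line:
            next(reader, None)  # discard the old first line, if there is one
        writer.writerows(reader)


# Demo on throwaway files (hypothetical contents, not the asker's data)
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "ex_fts.csv")
dst_path = os.path.join(tmp_dir, "updated_test.csv")
with open(src_path, "w", newline="") as f:
    f.write("stray first line\r\na,1,!,x,y\r\nb,2,?,p,q\r\n")

add_header(src_path, dst_path, ["Letter", "Number", "Symbol", "a", "as"])

with open(dst_path, newline="") as f:
    result_rows = list(csv.reader(f))
```

Passing skip_first_line=False keeps the original first line as data, which may be what you want if the file never had a header at all.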

There appear to be 6 columns in your data, so I've used generic names for the header; replace them with the real column names.
import pandas as pd
header = ','.join(["Column1", "Column2", "Column3", "Column4", "Column5", "Column6"])
with open("ex_fts.csv", 'r') as f:
    with open("updated_test.csv", 'w') as f1:
        f1.write(header + '\n')
        for line in f:
            # convert tab separators to commas
            data = line.replace('\t', ',')
            f1.write(data)
df = pd.read_csv('updated_test.csv', index_col=None)
print(df)
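One caveat with a plain replace('\t', ',') is that a field which already contains a comma would be split into two columns. A rough sketch of an alternative, assuming the input really is tab-separated, is to parse it with csv.reader and let csv.writer add quotes where needed. The function name tsv_to_csv_with_header and the sample data are hypothetical.

```python
import csv
import io


def tsv_to_csv_with_header(src, dst, column_names):
    """Read tab-separated rows from src and write them to dst as CSV with a header."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(column_names)
    writer.writerows(reader)


# In-memory demo; with real files, open them with newline="" instead
tab_data = io.StringIO("a\t1\tx, y\nb\t2\tz\n")
out = io.StringIO()
tsv_to_csv_with_header(tab_data, out, ["Column1", "Column2", "Column3"])
csv_text = out.getvalue()
```

The field "x, y" comes out quoted, so it stays in a single column when the file is read back.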

Related

writing a text file to a csv file

I have a text file that contains a sentence in each line. Some lines are also empty.
sentence 1
sentence 2
empty line
I want to write the content of this file in a csv file in a way that the csv file has only one column and in each row the corresponding sentence is written. This is what I have tried:
import csv
f = open('data 2.csv', 'w')
with f:
    writer = csv.writer(f)
    for row in open('data.txt', 'r'):
        writer.writerow(row)
import pandas as pd
df = pd.read_csv('data 2.csv')
Supposing that I have three sentences in my text file, I want a csv file to have one column with 3 rows. However, when I run the code above, I will get the output below:
[1 rows x 55 columns]
It seems that each character in the sentences is written in one cell and all sentences are written in one row. How should I fix this problem?
So you want to load a text file into a single column of a dataframe, one line per dataframe row. It can be done directly:
import pandas as pd
with open('data.txt') as file:
    df = pd.DataFrame((line.strip() for line in file), columns=['text'])
You can even filter empty lines at read time with filter:
with open('data.txt') as file:
    df = pd.DataFrame(filter(lambda x: len(x) > 0, (line.strip() for line in file)),
                      columns=['text'])
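In your code, each line is passed to writer.writerow() as a plain string. writerow() expects a sequence of fields, so it treats the string as a sequence of characters and writes one character per cell. Wrap each line in a one-element list instead:
import csv
f = open('data 2.csv', 'w', newline='')
with f:
    writer = csv.writer(f)
    with open('data.txt', 'r') as text_file:
        for row in text_file:
            # strip the trailing newline and pass a one-element list
            writer.writerow([row.strip()])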
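To see why the characters end up in separate cells, here is a small in-memory comparison of passing a string versus a one-element list to writerow(). The sample sentences are placeholders, and io.StringIO stands in for the real output file.

```python
import csv
import io

lines = ["sentence 1\n", "sentence 2\n", "sentence 3\n"]

# A bare string is iterated character by character
wrong = io.StringIO()
csv.writer(wrong, lineterminator="\n").writerow("abc")  # writes a,b,c

# A one-element list produces exactly one cell per row
right = io.StringIO()
writer = csv.writer(right, lineterminator="\n")
for line in lines:
    writer.writerow([line.strip()])

wrong_text = wrong.getvalue()
right_text = right.getvalue()
```

Here wrong_text holds one row with three cells, while right_text holds three rows with one cell each.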

writing the rows of a csv file to another csv file

I want to write the rows of a csv file to another csv file. I want to change the content of each row as well in a way that if the row is empty, it remains empty and if it is not, any spaces at the beginning and end of the string are omitted. The original csv file has one column and 65422771 rows.
I have written the following to write the rows of the original csv file to the new one:
import csv
csvfile = open('data.csv', 'r')
with open('data 2.csv', "w+") as csv_file1:
    writer = csv.writer(csv_file1)
    count = 0
    for row in csvfile:
        row = row.replace('"', '')
        count += 1
        print(count)
        if row.strip() == '':
            writer.writerow('\n')
        else:
            writer.writerow(row)
However, when the new csv file is made, it is shown that it has 130845543 rows (= count)! The size of the new csv file is also 2 times the size of the original one. How can I create the new csv file with exactly the same number of rows but with the mentioned changes made to them?
Try this:
import csv
with open('data.csv', 'r', newline='') as file:
    # csv.reader yields an empty list for a blank line, so keep those rows empty
    rows = [[row[0].strip()] if row else [] for row in csv.reader(file)]
with open('data_out.csv', "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(rows)
Also, as @tripleee mentioned, your file is quite large, so you may want to read and write it in chunks. You can use pandas for that:
import pandas as pd
chunksize = 10_000
for chunk in pd.read_csv('data.csv', chunksize=chunksize, header=None):
    chunk[0] = chunk[0].str.strip()
    chunk.to_csv("data_out.csv", mode="a", header=False, index=False)
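If pandas is not available, the csv module alone can also stream the file one row at a time, so memory use stays flat regardless of the row count. The sketch below is only an illustration: the function name strip_rows is made up, and it assumes that a row containing nothing but whitespace should be written back as an empty line.

```python
import csv
import io


def strip_rows(src, dst):
    """Copy rows from src to dst, stripping each field; return the row count."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for row in reader:
        cleaned = [field.strip() for field in row]
        # Write a truly empty line when nothing is left after stripping
        writer.writerow(cleaned if any(cleaned) else [])
        count += 1
    return count


# In-memory demo; with real files, open both with newline=""
src = io.StringIO('  hello \n\n"  quoted, text "\n   \n')
dst = io.StringIO()
row_count = strip_rows(src, dst)
out_text = dst.getvalue()
```

The input and output have the same number of rows, and blank rows stay blank.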

Replace csv header without deleting the other rows

I want to replace the header row of a CSV file, text.csv.
header_list = ['column_1', 'column_2', 'column_3']
The header will look like this:
column_1, column_2, column_3
Here is my code:
import csv
with open('text.csv', 'w') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header_list)
The header of the csv file was replaced correctly. However, the rest of the rows in the csv file were deleted. How do I replace only the header leaving the other rows intact?
I am using Python 3.6.
Here is a proper way to do it using the csv module.
csv.DictReader reads each row of a CSV file into a dict. It takes an optional fieldnames argument which, if set, applies a custom header; the original header is then no longer treated as a header but as an ordinary data row. So all you need to do is read your CSV file with csv.DictReader and write the data with csv.DictWriter. You will have to drop the first row from the reader, because it contains the old header, and write the new header. It does make sense to write the new data to a separate file, though.
import csv
header = ["column_1", "column_2", "column_3"]
with open('text.csv', 'r', newline='') as fp:
    reader = csv.DictReader(fp, fieldnames=header)
    # use newline='' to avoid adding an extra CR at the end of each line
    with open('output.csv', 'w', newline='') as fh:
        writer = csv.DictWriter(fh, fieldnames=reader.fieldnames)
        writer.writeheader()
        header_mapping = next(reader)  # drop the old header row
        writer.writerows(reader)
Use this:
import csv
header_list = ['column_1', 'column_2', 'column_3']
mystring = ",".join(header_list)
def line_prepender(filename, line):
    with open(filename, 'r+') as csvfile:
        content = csvfile.read()
        csvfile.seek(0, 0)
        csvfile.write(line.rstrip('\r\n') + '\n' + content)
line_prepender("text.csv", mystring)
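Note that prepending a line keeps the old first line in the file as a data row. If the old header should disappear and the file should be updated under its original name, one possible sketch is to write to a temporary file in the same directory and then swap it in with os.replace. The function name replace_header and the demo contents are hypothetical.

```python
import csv
import os
import tempfile


def replace_header(path, new_header):
    """Rewrite the CSV at path so that its first row becomes new_header."""
    dir_name = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=dir_name, delete=False, suffix=".tmp"
    ) as tmp:
        reader = csv.reader(src)
        writer = csv.writer(tmp)
        next(reader, None)  # drop the old header row
        writer.writerow(new_header)
        writer.writerows(reader)
    # Both files are closed here, so the swap also works on Windows
    os.replace(tmp.name, path)


# Demo on a throwaway file (made-up contents)
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "text.csv")
with open(demo_path, "w", newline="") as f:
    f.write("old_a,old_b,old_c\r\n1,2,3\r\n4,5,6\r\n")

replace_header(demo_path, ["column_1", "column_2", "column_3"])

with open(demo_path, newline="") as f:
    replaced_rows = list(csv.reader(f))
```

Because the original is only replaced after the new file has been written completely, an error halfway through leaves text.csv untouched.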

Delete blank columns from header row

I'm pretty new to Python and I'm having trouble deleting the header columns after the 25th column. There are 8 extra columns that have no data, so I'm trying to delete them. Columns 1-25 have about 50,000k of data and the rest of the columns are blank. How would I do this? My code so far is able to clean up the file, but I can't delete the headers in row[0] after column 25.
Thanks
import csv
my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']
with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    writer.writerow(next(cr)) #I think this is why is not working
    for line in (r[0:25] for r in cr):
        #del line [26:32]
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[11]= line[11][:5]
            writer.writerow(line)
You've found the line with the problem: all you have to do is write only the headers you want. next(cr) reads the header line, but you pass the entire line to writer.writerow().
Instead of
writer.writerow(next(cr))
you want:
writer.writerow(next(cr)[:25])
([:25] and [0:25] are the same in Python)
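Since the header and the data rows need the same cut, another option is to slice every row, header included, in one loop. This is only a sketch: the function name keep_first_columns and the tiny sample are made up, and the filtering on remove_words from the question is left out to keep it short.

```python
import csv
import io


def keep_first_columns(src, dst, n_columns, delimiter="|"):
    """Copy rows from src to dst, keeping only the first n_columns of each row."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:  # the header is just the first row, so it is trimmed too
        writer.writerow(row[:n_columns])


# In-memory demo with 2 real columns followed by blank ones
sample_in = io.StringIO("id|name||\n1|abc||\n")
sample_out = io.StringIO()
keep_first_columns(sample_in, sample_out, 2)
trimmed_text = sample_out.getvalue()
```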

Python Read Text File Column by Column

So I have a text file that looks like this:
1,989785345,"something 1",,234.34,254.123
2,234823423,"something 2",,224.4,254.123
3,732847233,"something 3",,266.2,254.123
4,876234234,"something 4",,34.4,254.123
...
I'm running this code right here:
file = open("file.txt", 'r')
readFile = file.readline()
lineID = readFile.split(",")
print lineID[1]
This lets me break up the content in my text file by "," but what I want to do is separate it into columns because I have a massive number of IDs and other things in each line. How would I go about splitting the text file into columns and call each individual row in the column one by one?
You have a CSV file, so use the csv module to read it:
import csv
with open('file.txt', newline='') as csvfile:  # on Python 2, open with 'rb' instead
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)  # each row is a list of column values
This still gives you data by row, but with the zip() function you can transpose this to columns instead:
import csv
with open('file.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for column in zip(*reader):
        print(column)  # each column is a tuple of values
Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.
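If only one column is needed, a generator can pull it out row by row without transposing the whole file, which avoids the memory cost of zip(*reader). The helper name iter_column is hypothetical; the sample rows are the first two lines from the question.

```python
import csv
import io


def iter_column(src, index):
    """Yield the value at position index from each row, one row at a time."""
    for row in csv.reader(src):
        if len(row) > index:  # skip blank or short rows
            yield row[index]


sample = io.StringIO(
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
)
ids = list(iter_column(sample, 1))
```

With a real file, pass the open file object instead of the io.StringIO and loop over iter_column(...) directly rather than building a list.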
