I have a text file that contains one sentence per line. Some lines are empty.
sentence 1
sentence 2
empty line
I want to write the content of this file to a csv file so that the csv file has a single column, with one sentence per row. This is what I have tried:
import csv

f = open('data 2.csv', 'w')
with f:
    writer = csv.writer(f)
    for row in open('data.txt', 'r'):
        writer.writerow(row)
import pandas as pd
df = pd.read_csv('data 2.csv')
Supposing that I have three sentences in my text file, I want the csv file to have one column with 3 rows. However, when I run the code above, I get the output below:
[1 rows x 55 columns]
It seems that each character of the sentences is written to its own cell and all sentences end up in one row. How should I fix this problem?
So you want to load a text file into a single column of a dataframe, one line per dataframe row. It can be done directly:
import pandas as pd

with open('data.txt') as file:
    df = pd.DataFrame((line.strip() for line in file), columns=['text'])
You can even filter empty lines at read time with filter:
with open('data.txt') as file:
    df = pd.DataFrame(filter(lambda x: len(x) > 0, (line.strip() for line in file)),
                      columns=['text'])
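If the end goal is an actual CSV file on disk, as in the question, the dataframe can be written straight back out with to_csv. A minimal sketch combining the two ideas above (filter(None, ...) is just a shorter way to drop empty strings):

import pandas as pd

with open('data.txt') as file:
    df = pd.DataFrame(filter(None, (line.strip() for line in file)), columns=['text'])

# index=False drops the pandas index column; header=False keeps the file
# to just the sentences (leave it out if you want a 'text' header row)
df.to_csv('data 2.csv', index=False, header=False)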
The problem in your code is not how you read the file but what you pass to writerow(): it expects a sequence of fields, and a string is a sequence of characters, so every character lands in its own cell. Strip the newline and wrap each line in a one-element list so the whole sentence goes into a single cell (skipping empty lines so you end up with one row per sentence):
import csv

with open('data 2.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    with open('data.txt', 'r') as text_file:
        for row in text_file.readlines():
            row = row.strip()
            if row:                     # keep only non-empty lines
                writer.writerow([row])  # one-element list -> whole sentence in one cell
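Reading the file back should then show one column and three rows for a three-sentence input. A quick check, assuming pandas is available (header=None because the file has no header row):

import pandas as pd

df = pd.read_csv('data 2.csv', header=None)
print(df.shape)  # expected (3, 1) for three sentences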
I am looking to remove rows from a csv file if they contain specific strings in their row.
I'd like to be able to create a new output file versus overwriting the original.
I need to remove any rows that contain "py-board" or "coffee"
Example Input:
173.20.1.1,2-base
174.28.2.2,2-game
174.27.3.109,xyz-b13-coffee-2
174.28.32.8,2-play
175.31.4.4,xyz-102-o1-py-board
176.32.3.129,xyz-b2-coffee-1
177.18.2.8,six-jump-walk
Expected Output:
173.20.1.1,2-base
174.28.2.2,2-game
174.28.32.8,2-play
177.18.2.8,six-jump-walk
I tried this
Deleting rows with Python in a CSV file
import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board" or if row[1] != "coffee":
            writer.writerow(row)
and I tried this
import csv
with open('input_csv_file.csv', 'rb') as inp, open('purged_csv_file', 'wb') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[1] != "py-board":
            if row[1] != "coffee":
                writer.writerow(row)
and this
if row[1][-8:] != "py-board":
    if row[1][-8:] != "coffee-1":
        if row[1][-8:] != "coffee-2":
but got this error
File "C:\testing\syslogyamlclean.py", line 6, in <module>
for row in csv.reader(inp):
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
I would actually not use the csv package for this goal. This can be achieved easily using standard file reading and writing.
Try this code (I have written some comments to make it self-explanatory):
# We open the source file and get its lines
with open('input_csv_file.csv', 'r') as inp:
    lines = inp.readlines()

# We open the target file in write mode
with open('purged_csv_file.csv', 'w') as out:
    # We go line by line, writing to the target file
    # only if the original line does not include the
    # strings 'py-board' or 'coffee'
    for line in lines:
        if 'py-board' not in line and 'coffee' not in line:
            out.write(line)
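If you would rather keep the csv module, the same filtering works once the files are opened in text mode (which is what the error in the question is complaining about) and the test is a substring check on the second column rather than an exact comparison. A sketch:

import csv

with open('input_csv_file.csv', 'r', newline='') as inp, \
     open('purged_csv_file.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        # keep the row only if neither string appears in the second column
        if 'py-board' not in row[1] and 'coffee' not in row[1]:
            writer.writerow(row)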
# pandas helps to read and manipulate .csv files
import pandas as pd
import numpy as np   # needed for np.logical_not below

# read the .csv file (the data has no header row)
df = pd.read_csv('input_csv_file.csv', sep=',', header=None)
df
0 1
0 173.20.1.1 2-base
1 174.28.2.2 2-game
2 174.27.3.109 xyz-b13-coffee-2
3 174.28.32.8 2-play
4 175.31.4.4 xyz-102-o1-py-board
5 176.32.3.129 xyz-b2-coffee-1
6 177.18.2.8 six-jump-walk
# filter rows
result = df[np.logical_not(df[1].str.contains('py-board') | df[1].str.contains('coffee'))]
print(result)
0 1
0 173.20.1.1 2-base
1 174.28.2.2 2-game
3 174.28.32.8 2-play
6 177.18.2.8 six-jump-walk
# save to result.csv file
result.to_csv('result.csv', index=False, header=False)
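The same filter can be written a bit more compactly with the ~ operator and a single regex, which also avoids the extra numpy import. A sketch:

import pandas as pd

df = pd.read_csv('input_csv_file.csv', sep=',', header=None)

# ~ negates the boolean mask; the regex matches either string in column 1
result = df[~df[1].str.contains('py-board|coffee')]
result.to_csv('result.csv', index=False, header=False)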
Currently I have a CSV file that has only one attribute in its first row, so that row cannot serve as the header for this csv file. I then rewrite it to generate a new CSV file. The data format of this CSV file is like the screenshot below. It contains 5 columns, and I would like to add column1, column2, column3, column4, and column5 as the headers for this CSV file.
I tried to use pandas to give a header to this csv file, but it does not work at all. Here is my code to add a header to this csv file:
with open("ex_fts.csv",'r') as f:
with open("updated_test.csv",'w') as f1:
next(f) # skip header line
for line in f:
f1.write(line)
a = df.to_csv("updated_test.csv", header=["Letter", "Number", "Symbol","a","as"], index=False)
print(a)
Just write the header once, before copying the rows:

columnNames = ["Letter", "Number", "Symbol", "a", "as"]

with open("ex_fts.csv", 'r') as f:
    with open("updated_test.csv", 'w') as f1:
        # Write the new header
        f1.write(','.join(columnNames) + '\n')
        next(f)  # skip the old single-attribute first line
        for line in f:
            f1.write(line)
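If the file really is comma-separated with the five columns you name, pandas alone can also do the whole job. A sketch, where skiprows=1 drops the stray single-attribute first line (the one your next(f) skips) and names= supplies the new header:

import pandas as pd

columnNames = ["Letter", "Number", "Symbol", "a", "as"]

df = pd.read_csv("ex_fts.csv", skiprows=1, header=None, names=columnNames)
df.to_csv("updated_test.csv", index=False)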
There appears to be 6 columns in your data, so I've used generic names for the header - replace them with the real column names.
import pandas as pd

header = ','.join(["Column1", "Column2", "Column3", "Column4", "Column5", "Column6"])

with open("ex_fts.csv", 'r') as f:
    with open("updated_test.csv", 'w') as f1:
        f1.write(header + '\n')
        for line in f:
            # the source appears to be tab-separated, so convert tabs to commas
            data = line.replace('\t', ',')
            f1.write(data)

df = pd.read_csv('updated_test.csv', index_col=None)
print(df)
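If the source is tab-separated, as the replace('\t', ',') above suggests, pandas can also read it directly and write it back out comma-separated, skipping the manual conversion. A sketch:

import pandas as pd

names = ["Column1", "Column2", "Column3", "Column4", "Column5", "Column6"]

df = pd.read_csv("ex_fts.csv", sep="\t", header=None, names=names)
df.to_csv("updated_test.csv", index=False)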
I want to write the rows of a csv file to another csv file, changing the content of each row as well: if the row is empty it should stay empty, and if it is not, any spaces at the beginning and end of the string should be removed. The original csv file has one column and 65422771 rows.
I have written the following to write the rows of the original csv file to the new one:
import csv
csvfile = open('data.csv', 'r')
with open('data 2.csv', "w+") as csv_file1:
    writer = csv.writer(csv_file1)
    count = 0
    for row in csvfile:
        row = row.replace('"', '')
        count += 1
        print(count)
        if row.strip() == '':
            writer.writerow('\n')
        else:
            writer.writerow(row)
However, when the new csv file is created, it turns out to have 130845543 rows (= count)! The size of the new csv file is also twice the size of the original one. How can I create the new csv file with exactly the same number of rows, but with the changes described above applied to them?
Try this:
import csv

with open('data.csv', 'r') as file:
    # keep empty rows as empty rows, strip whitespace from the rest
    rows = [[row[0].strip()] if row else [] for row in csv.reader(file)]

with open('data_out.csv', "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(rows)
Also, as #tripleee mentioned, your file is quite large so you may want to read / write it in chunks. You can use pandas for that.
import pandas as pd

chunksize = 10_000

for chunk in pd.read_csv('data.csv', chunksize=chunksize, header=None):
    chunk[0] = chunk[0].str.strip()
    chunk.to_csv("data_out.csv", mode="a", header=False, index=False)
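With 65 million rows, a plain line-by-line pass is another option: it keeps memory flat, preserves empty rows exactly, and avoids any extra quoting from the csv module. A minimal sketch, assuming each line is a single unquoted value:

with open('data.csv', 'r') as src, open('data_out.csv', 'w') as dst:
    for line in src:
        # strip leading/trailing whitespace; an empty input row becomes
        # an empty output row, so the row count is unchanged
        dst.write(line.strip() + '\n')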
Hi, I'm writing a simple script to copy a set of rows from a csv file and paste them N times into another file.
I'm not able to write the result into the other file.
Please find the code below:
import csv
for i in range(2):
    with open('C:\\Python\\CopyPaste\\result2.csv', 'r') as fp:
        data = fp.readlines()
        fp.close()
    with open('C:\\Python\\CopyPaste\\mydata.csv', 'w') as mycsvfile:
        thedatawriter = csv.writer(mycsvfile)
        for row in data:
            thedatawriter.writerow(row)
Assuming that the format of the input and output CSV files is the same, just read the input file into a string and then write it to an output file N times:
N = 3

with open('C:\\Python\\CopyPaste\\result2.csv', 'r') as infile,\
     open('C:\\Python\\CopyPaste\\mydata.csv', 'w') as outfile:
    data = infile.read()  # read the entire contents of the input file into data
    for i in range(N):
        outfile.write(data)
The above answers the question literally; however, it will replicate the header row N times, which is probably not what you want. You can do this instead:
import csv
N = 3

with open('C:\\Python\\CopyPaste\\result2.csv', 'r', newline='') as infile,\
     open('C:\\Python\\CopyPaste\\mydata.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerow(next(reader))  # read the header line and write it once
    data = list(reader)            # read the rest of the input file
    for i in range(N):
        writer.writerows(data)
This code reads the first row from the input file as the header, and writes it once to the output CSV file. Then the remaining rows are read from the input file into the data list, and replicated N times in the output file.
I guess your question is: read a .csv file and then write its data to another .csv file N times?
If my understanding is right, my suggestion would be to use the pandas library, which is very convenient.
Something like:
import pandas as pd
df = pd.read_csv('origin.csv')
df.to_csv('output.csv')
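To actually repeat the rows N times with pandas while keeping the header only once, one option is pd.concat. A sketch, using the N = 3 from the answer above:

import pandas as pd

N = 3
df = pd.read_csv('origin.csv')

# stack N copies of the rows, then write them out under a single header
pd.concat([df] * N, ignore_index=True).to_csv('output.csv', index=False)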
So I have a text file that looks like this:
1,989785345,"something 1",,234.34,254.123
2,234823423,"something 2",,224.4,254.123
3,732847233,"something 3",,266.2,254.123
4,876234234,"something 4",,34.4,254.123
...
I'm running this code right here:
file = open("file.txt", 'r')
readFile = file.readline()
lineID = readFile.split(",")
print lineID[1]
This lets me break up the content of my text file by "," but what I want to do is separate it into columns, because I have a massive number of IDs and other things in each line. How would I go about splitting the text file into columns and then going through the values of a column one by one?
You have a CSV file, use the csv module to read it:
import csv

with open('file.txt', 'rb') as csvfile:   # in Python 3, open with newline='' instead of 'rb'
    reader = csv.reader(csvfile)
    for row in reader:
        print(row[1])  # each row is a list of fields, so row[1] is the second column
This still gives you data by row, but with the zip() function you can transpose this to columns instead:
import csv

with open('file.txt', 'rb') as csvfile:
    reader = csv.reader(csvfile)
    for column in zip(*reader):
        print(column)  # each column is a tuple of every value in that column
Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.
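If only a couple of columns are actually needed, collecting them directly avoids holding the whole transposed file in memory. A minimal sketch in Python 3 syntax; the field positions follow the sample lines in the question, and the list names are made up for illustration:

import csv

ids = []      # second field, e.g. 989785345
values = []   # fifth field, e.g. 234.34

with open('file.txt', newline='') as csvfile:
    for row in csv.reader(csvfile):
        ids.append(row[1])
        values.append(float(row[4]))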