Python CSV writerow to specific column in already opened file - python

I am struggling with the csv module and the writerow method.
NOTE: I have simplified the code as much as I could; I am asking for understanding.
I have provided a Minimal, Complete, and Verifiable example as best I could.
WHAT I'VE GOT:
Three tables in the database:
MODEL_test - contains the data on which the algorithm will learn
my_prediction - contains the unseen data to which the algorithm will be applied
OUT_predictions - contains the output of the algorithm's predict method
In the first step, I create a new CSV file and keep it open until the iteration for the current algorithm is finished. Before the training iterations start, I append rows with the first 7 values from the unseen-data table to the CSV file, so the data won't be duplicated. Then, after each algorithm iteration, I want to append the OUT_prediction values to the already opened file.
CODE:
import csv
import datetime
import sqlite3  # needed for the database connection below


def export_to_csv():
    ldb = sqlite3.connect('database.db')
    c = ldb.cursor()
    table_name = 'my_predictions'
    training_size = 3
    now = datetime.datetime.now()
    file_name = str.format('my_predictions {}', now.strftime("%Y-%m-%d %H %M %S"))
    export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + \
                     ['OUTPUT ' + str(n) for n in range(1, training_size + 1)]

    with open('archived/' + file_name + '.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(export_columns)
        output_writer = csv.DictWriter(csv_file, fieldnames=export_columns)

        for o in range(1, 500):  # <-- write all unseen data from the database to the csv
            c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
            fetch_one = c.fetchone()
            writer.writerow(fetch_one[1:7])

        for t in range(training_size):  # <-- for each iteration write output to the csv
            # some machine learning training code
            prediction = [0, 0, 1, 1, 0, 1]  # <-- sample output from predictions
            combined_set = list(map(str, prediction))

            ids = 1
            for each in combined_set:
                c.execute(str.format('INSERT INTO OUTPUT_prediction VALUES ({})',
                                     ",".join(["?" for _ in range(1, len([ids] + [int(each)]) + 1)])),
                          [ids] + [int(each)])
                ids += 1
            ldb.commit()

            for o in range(1, 500):  # <-- write output from the last prediction iteration to a specific column
                c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
                fetch_output = c.fetchone()
                output_writer.writeheader()
                output_writer.writerow({'OUTPUT ' + str(t + 1): fetch_output[-1]})  # <-- columns remain empty
WHAT IS THE PROBLEM
When the code finishes and I open the file, I can see that the OUTPUT columns remain empty.
CSV IMAGE
EDIT: I don't want to use pandas and to_csv because they are very slow. Sometimes my unseen data has 1 million lines, and a single iteration using to_csv takes half an hour.

I know what I've done wrong and have found a solution for this situation, but I'm not satisfied with it. When I try to add a new column in w mode, new data is always written at the end of the file. When I set csv_file.seek(0), the old data is overwritten.
I have also tried reopening the file in r+ mode and setting csv_file.seek(0), but got the same outcome.
I will use xlwings for this task because it gives me more control, but I still don't know how it will affect the input speed. My goal is to prepare a summary report with the unseen data, the output of each iteration, and statistical information.
SOLUTION (with r+):
now = datetime.datetime.now()
file_name = str.format('my_predictions {}', now.strftime("%Y-%m-%d %H %M %S"))
export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + \
                 ['OUTPUT ' + str(n) for n in range(1, training_size + 1)]

with open('archived/' + file_name + '.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(export_columns)

    for o in range(1, 500):
        c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
        fetch_one = c.fetchone()
        writer.writerow(fetch_one[1:7])

for t in range(training_size):
    # some machine learning training code
    prediction = [0, 0, 1, 1, 0, 1]  # <-- sample output from predictions
    combined_set = list(map(str, prediction))

    # ids = 1
    #
    # for each in combined_set:
    #     c.execute(str.format('INSERT INTO OUTPUT_prediction VALUES ({})',
    #               ",".join(["?" for _ in range(1, len([ids] + [int(each)]) + 1)])), [ids] + [int(each)])
    #
    #     ids += 1
    #
    # ldb.commit()

    with open('archived/' + file_name + '.csv', 'r+', newline='') as csv_file:
        writer = csv.writer(csv_file)
        csv_input = csv.reader(csv_file)
        rows = list(csv_input)
        writer.writerow(export_columns)

        for row, o in zip(rows, combined_set):
            row += [o]
            writer.writerow(row)
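Since a CSV file can only grow row by row, a column cannot be appended in place; the usual workaround is to read the file back, extend every row, and rewrite the whole file. The following is only a rough sketch of that idea, under the assumption that one iteration's predictions fit in memory; append_column is a hypothetical helper name, not part of the code above:

import csv

def append_column(path, header, values):
    # read back everything that is already in the file, header row included
    with open(path, 'r', newline='') as csv_file:
        rows = list(csv.reader(csv_file))

    # extend the header and the data rows; rows beyond len(values) stay untouched
    rows[0].append(header)
    for row, value in zip(rows[1:], values):
        row.append(value)

    # rewrite the file from the start so the new column lines up with the old data
    with open(path, 'w', newline='') as csv_file:
        csv.writer(csv_file).writerows(rows)

With something like this, the OUTPUT placeholders would not need to be written in advance; each iteration could call append_column('archived/' + file_name + '.csv', 'OUTPUT ' + str(t + 1), combined_set). Rewriting the file still costs a full pass per call, so for a million rows it may be faster to collect the outputs of all iterations first and rewrite only once.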

Related

What am I missing: list index out of range

import hashlib
import csv
import glob


def hash(text):
    return hashlib.sha256(text.encode('UTF-8')).hexdigest()


def hash_file(input_file_name, output_file_name):
    with open(input_file_name, newline='') as f_input, open(output_file_name, 'w', newline='') as f_output:
        csv_input = csv.reader(f_input)
        csv_output = csv.writer(f_output)
        csv_output.writerow(next(csv_input))  # Copy the header row to the output
        count = 0
        print(count)
        for customer_email in csv_input:
            csv_output.writerow([hash(customer_email[0])])
            count = count + 1
            print(str(count) + " - " + customer_email[0])
    f_input.close()
    f_output.close


mylist = [f for f in glob.glob("*.csv")]

for file in mylist:
    i_file_name = file
    o_file_name = "hashed-" + file
    hash_file(i_file_name, o_file_name)
I'm trying the above code and I keep getting a list index out of range error. I have about 15 CSV files in which I would like to hash the email address. It gets the first CSV file and keeps iterating through it until I get the error message. Any help would be appreciated.
There was a blank line in my input that was causing the error
list index out of range means that you are trying to access a value in a list by an index the list does not reach; the largest valid index is the length of the list minus 1.
For example:
my_list = ['a', 'b']
The length of my_list is 2, and 2 - 1 = 1, which means you can only access the list up to index 1, like this:
my_list[0]
my_list[1]
If you try to access my_list[2], it will raise the list index out of range error, as in your case.
To avoid the error you can check the length first:
index_to_access = 2
len(my_list) - 1 >= index_to_access and my_list[index_to_access]
In your case it probably happens here:
csv_output.writerow([hash(customer_email[0])])
count = count + 1
print(str(count) + " - " + customer_email[0])
Change it to:
customer_email = len(customer_email) - 1 >= 0 and customer_email[0] or ''
csv_output.writerow([hash(customer_email)])
count = count + 1
print(str(count) + " - " + customer_email)
Actually, there are many ways to handle that error.
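Since the asker found that a blank line in the input was the actual cause, another way to avoid the error, sketched here against the same loop as in the question, is simply to skip empty rows (csv.reader yields a blank line as an empty list):

for customer_email in csv_input:
    if not customer_email:  # blank line in the input -> empty list, so skip it
        continue
    csv_output.writerow([hash(customer_email[0])])
    count = count + 1
    print(str(count) + " - " + customer_email[0])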

Python --- Write the line in r-th test file / write all train files except r-th one

This is the problem: Write a script that reads in a dataset, and a number k, and randomly divides the dataset into k equal folds. Then it outputs k different training and testing files. Each of the k splits will use one of the k folds as a test set, and the rest as the training set. Name the output files like: train_1.txt, train_2.txt ... train_k.txt
#!/bin/python
import random

k = int(input("Enter k: "))
f = open("colon-cancer.txt", "r")

for i in range(k):
    i_str = str(i+1)
    file_name_train = 'train_' + i_str + '.txt'
    file_name_test = 'test_' + i_str + '.txt'
    f1 = open(file_name_train, 'w')
    f2 = open(file_name_test, 'w')
    f1.close()
    f2.close()

for line in f:
    r = random.randint(1, k)
I don't know how to create a for loop to write the line into the r-th test file, or how to write it to all the train files except the r-th one.
Does anyone know how to solve it?
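One possible way to finish this, kept close to the setup in the question (a hedged sketch, not a tested solution; the colon-cancer.txt input and the train_/test_ file names are taken from the code above), is to append each line to the randomly chosen test file and to every other train file:

import random

k = int(input("Enter k: "))

with open("colon-cancer.txt", "r") as f:
    for line in f:
        r = random.randint(1, k)  # fold that receives this line as test data
        for i in range(1, k + 1):
            if i == r:
                # the line goes into the r-th test file
                with open('test_' + str(i) + '.txt', 'a') as test_file:
                    test_file.write(line)
            else:
                # every other fold gets the line as training data
                with open('train_' + str(i) + '.txt', 'a') as train_file:
                    train_file.write(line)

Reopening the files in append mode for every line is slow but simple; keeping 2*k file handles open for the whole loop would avoid that. Note also that random.randint does not guarantee exactly equal folds; shuffling the line indices and slicing them into k chunks would, at the cost of drifting further from the original structure.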

File data binding with column names

I have files with hundreds and thousands of rows of data, but they have no column headers.
I am trying to go through every file, build the data row by row, and store it in a list; after that I want to assign the values to columns. But here I am confused about what to do, because there are around 60 values in every row, plus some extra columns that already have a value assigned and that should be added to every row.
Code so far:
import re
import glob

filenames = glob.glob("/home/ashfaque/Desktop/filetocsvsample/inputfiles/*.txt")

columns = []
with open("/home/ashfaque/Downloads/coulmn names.txt", encoding="ISO-8859-1") as f:
    file_data = f.read()
    lines = file_data.splitlines()
    for l in lines:
        columns.append(l.rstrip())

total = {}
for name in filenames:
    modified_data = []
    with open(name, encoding="ISO-8859-1") as f:
        file_data = f.read()
        lines = file_data.splitlines()
        for l in lines:
            if len(l) >= 1:
                modified_data.append(re.split(': |,', l))

    rows = []
    i = len(modified_data)
    x = 0
    while i > 60:
        r = lines[x:x+59]
        x = x + 60
        i = i - 60
        rows.append(r)

    z = len(modified_data)
    while z >= 60:
        z = z - 60

    if z > 1:
        last_columns = modified_data[-z:]
        x = []
        for l in last_columns:
            if len(l) > 1:
                del l[0]
                x.append(l)
            elif len(l) == 1:
                x.append(l)

    for row in rows:
        for vl in x:
            row.append(vl)

    for r in rows:
        for i in range(0, len(r)):
            if len(r) >= 60:
                total.setdefault(columns[i], []).append(r[i])
In another script I have separated the rows of 60 values and the last 5 to 15 columns that should be added to every row, but again I am confused about how to bind all the data together.
The data should look like this after binding:
outputdata.xlsx
Input data file:
inputdata.txt
What am I missing here? Any tool?
I believe that your issue can be resolved by taking the input file and turning it into a CSV file which you can then import into whatever program you like.
I wrote a small generator that would read a file a line at a time and return a row after a certain number of lines, in this case 60. In that generator, you can make whatever modifications to the data as you need.
Then with each generated row, I write it directly to the csv. This should keep the memory requirements for this process pretty low.
I didn't understand what you were doing with the regex split, but it would be simple enough to add it to the generator.
import csv

OUTPUT_FILE = "/home/ashfaque/Desktop/File handling/outputfile.csv"
INPUT_FILE = "/home/ashfaque/Desktop/File handling/inputfile.txt"


# This is a generator that will pull only num number of items into
# memory at a time, before it yields the row.
def get_rows(path, num):
    row = []
    with open(path, "r", encoding="ISO-8859-1") as f:
        for n, l in enumerate(f):
            # apply whatever transformations that you need to here.
            row.append(l.rstrip())
            if (n + 1) % num == 0:
                # if rows need padding then do it here.
                yield row
                row = []


with open(OUTPUT_FILE, "w") as output:
    csv_writer = csv.writer(output)
    for r in get_rows(INPUT_FILE, 60):
        csv_writer.writerow(r)
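On the regex split mentioned above: if the re.split(': |,', l) call from the question is still needed, one place it could go, purely as a hedged sketch assuming the split fields should simply be flattened into the row, is inside the generator:

import re

def get_rows(path, num):
    row = []
    with open(path, "r", encoding="ISO-8859-1") as f:
        for n, l in enumerate(f):
            # split each line on ': ' or ',' as in the question, and
            # flatten the resulting fields into the current row
            row.extend(re.split(': |,', l.rstrip()))
            if (n + 1) % num == 0:
                yield row
                row = []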

Writing data into CSV file

I have code that is basically doing this:
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
    for i in range(0, 4):
        row1.append(i+count)
    count = count + 1
    print(row1)
    writer.writerows(row1)
    row1[:] = []
I'm creating some lists and I want to map each value to a column, like this.
The error "iterable expected" showed up. How can I do that?
@roganjosh is right: what you need in order to write one row at a time is writerow:
import csv

myFile = open("aaa.csv", "w", newline="")
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
    for i in range(0, 4):
        row1.append(i+count)
    count = count + 1
    print(row1)
    writer.writerow(row1)
    row1[:] = []
myFile.close()  # Don't forget to close your file
You probably need to call the method .writerow() instead of the plural .writerows(), because you are writing a single line to the file on each call.
The plural method is for writing multiple lines to the file at once.
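As a small illustration of the difference between the two methods (a hedged sketch with made-up data):

import csv

with open('demo.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([1, 2, 3])             # one row: an iterable of cell values
    writer.writerows([[4, 5, 6],           # many rows: an iterable of row iterables
                      [7, 8, 9]])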
Or you could also restructure your code like this to write all the lines at the end:
import csv

row_list = []
for j in range(2):
    row = [j+i for i in range(4)]
    row_list.append(row)
# row_list = [
#     [j+i for i in range(4)]
#     for j in range(2)]

with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(row_list)
It's much simpler and easier to manipulate tabular data in pandas -- is there a reason you don't want to use pandas?
import pandas as pd

df = pd.DataFrame()
for i in range(4):
    df[i] = range(i, i+4)
# Any other data wrangling
df.to_csv("file.csv")

Remove specified elements from a list in a txt file

I have a text file with a 1825 x 51 table. I am attempting to read the text file and write the table to a new text file while removing certain columns. I couldn't figure out how to delete a whole column because each row is a string, so I am attempting to go into each row and use an if/else statement to determine whether the element is in the desired range. If it is, write it to the output; otherwise delete it.
if i in range(1,3)+[14]+range(20,27)+range(33,38)+[43]+[45]:
    newNum = data[i]
    data[i] = newNum
else:
    delete.data
This is my first time using python so any help would be greatly appreciated!
code from comments
with open(inputfilepath, 'r') as f:
    outputfile = open(outputfilepath, 'w+')
    inSection = False
    for line in f:
        if begin in line: inSection = True
        if inSection:
            keep = range(1,3)+[14]+range(20,27)+range(33,38)+[43]+[45]
            tmp = []
            spl = line.split(' ')
            for idx in keep:
                tmp.extend(spl[idx])
            outputfile.write('%s\n' % ' '.join(tmp))
        if end in line: inSection = False
I would probably go with something like this:
from __future__ import print_function


def process(infile, keep):
    for line in infile:
        fields = line.split()
        yield ' '.join([_ for i, _ in enumerate(fields) if i in keep])


def main(infile, outfile):
    # The following line (taken from your example) will not work in Python 3 as
    # you cannot "add" ranges to lists. In Python 3 you would need to write:
    # >>> [14] + list(range(20, 27))
    keep = range(1, 3) + [14] + range(20, 27) + range(33, 38) + [43] + [45]
    for newline in process(infile, keep):
        print(newline, file=outfile)


if __name__ == '__main__':
    with open('so.txt') as infile, open('output.txt', 'w') as outfile:
        main(infile, outfile)
keep = [1, 2, 3, 14, 20, 21, 22, 23, 24, 25, 26, 27, 33, 34, 35, 36, 37, 38, 43, 45]

with open('textfile') as fin, open('textout', 'w') as fout:  # the output file needs write mode
    for ln in fin:
        tmp = []
        spl = ln.split(' ')
        for idx in keep:
            tmp.append(spl[idx])
        fout.write('%s\n' % ' '.join(tmp))
