I have code that is basically doing this:
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
    for i in range(0, 4):
        row1.append(i + count)
    count = count + 1
    print(row1)
    writer.writerows(row1)
    row1[:] = []
I'm creating some lists and I want to map each value to a column, like this.
This error showed up: "iterable expected". How can I do that?
@roganjosh is right: what you need to write one row at a time is writerow:
import csv

myFile = open("aaa.csv", "w", newline="")
row1 = []
count = 0
writer = csv.writer(myFile)
row = []
for j in range(0, 2):
    for i in range(0, 4):
        row1.append(i + count)
    count = count + 1
    print(row1)
    writer.writerow(row1)
    row1[:] = []
myFile.close()  # Don't forget to close your file
You probably need to call the method .writerow() instead of the plural .writerows(), because you write a single row to the file on each call. The plural method expects an iterable of rows and writes multiple lines to the file at once.
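To see the difference side by side, here is a minimal sketch (my example, using an in-memory buffer in place of a real file):

```python
import csv
import io

buf = io.StringIO()  # stands in for an open file
writer = csv.writer(buf)

# writerow() takes ONE row: a flat list of cell values
writer.writerow([1, 2, 3])

# writerows() takes a LIST OF rows; passing a flat list of ints
# here is what triggers the "iterable expected" error
writer.writerows([[4, 5, 6], [7, 8, 9]])

print(buf.getvalue())  # three CSV lines: 1,2,3 / 4,5,6 / 7,8,9
```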
Or you could also restructure your code like this to write all the lines at the end:
import csv

row_list = []
for j in range(2):
    row = [j + i for i in range(4)]
    row_list.append(row)

# Equivalently, as a nested comprehension:
# row_list = [[j + i for i in range(4)]
#             for j in range(2)]

with open('filename.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(row_list)
It's much simpler and easier to manipulate tabular data in pandas -- is there a reason you don't want to use pandas?
import pandas as pd

df = pd.DataFrame()
for i in range(4):
    df[i] = range(i, i + 4)
# Any other data wrangling
df.to_csv("file.csv")
I am making a grid that is populated with numbers from a txt file. The first 2 numbers of the file represent the rows and columns, and the rest are the numbers that will populate my grid. I've attempted to solve this myself but have not been successful. Any help or suggestions are greatly appreciated.
the file would contain something like this:
2
2
15
20
36
78
with open('file.txt', 'r') as f:
    content = f.readlines()

grid = []
for num in content:
    grid.append(num.split())
print(grid)
With my code, I'm only getting [['2'], ['2'], ['15'], ['20'], ['36'], ['78']],
and what I'm looking for is a nested list such as [[15, 20], [36, 78]].
Thank you in advance for the help.
Try the following:
content = ["2 2 15 20 36 78"]
grid = content[0].split()
new_lst = []
for num in range(2, len(grid) - 1, 2):
    new_lst.append([grid[num], grid[num + 1]])
print(new_lst)
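Note that new_lst above holds strings ('15', '20', ...). A small variant (my sketch, not part of the original answer) converts the values to ints to match the [[15, 20], [36, 78]] the question asked for:

```python
content = ["2 2 15 20 36 78"]
nums = [int(v) for v in content[0].split()]

# Skip the first two values (the row/column counts) and pair up the rest
grid = [nums[i:i + 2] for i in range(2, len(nums), 2)]
print(grid)  # [[15, 20], [36, 78]]
```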
Try slight modifications in your code (this assumes all the values are on one line):
with open('file.txt', 'r') as f:
    content = f.readlines()

line = content[0].split()
nums = [int(num) for num in line]

grid = []
for i in range(0, len(nums), 2):
    grid.append(nums[i:i + 2])
print(grid)
If you have multiple lines in the file, try this:
grid = []
with open('file.txt', 'r') as f:
    for line in f:
        line = line.split()
        nums = [int(num) for num in line]
        for i in range(0, len(nums), 2):
            grid.append(nums[i:i + 2])
print(grid)
I am struggling with the csv module and the writerow method.
NOTE: I have simplified the code as much as I could; I am asking for understanding. I provided a Minimal, Complete, and Verifiable example as much as I could.
WHAT I'VE GOT:
Three tables in the database:
MODEL_test - contains the data on which the algorithm will learn
my_prediction - contains the unseen data to which the algorithm will be applied
OUT_predictions - contains the output from the algorithm's predict method
In the first step, I create a new CSV file and keep it open until the iterations for the current algorithm are finished. Before the training iterations start, I write the first 7 values of each row of unseen data to the CSV file, so the data won't be duplicated. Then, after each algorithm iteration, I want to append the already opened file with the OUT_predictions values.
CODE:
import csv
import datetime
import sqlite3  # needed for the connect() call below

def export_to_csv():
    ldb = sqlite3.connect('database.db')
    c = ldb.cursor()
    table_name = 'my_predictions'
    training_size = 3
    now = datetime.datetime.now()
    file_name = str.format('my_predictions {}', now.strftime("%Y-%m-%d %H %M %S"))
    export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + \
                     ['OUTPUT ' + str(n) for n in range(1, training_size + 1)]

    with open('archived/' + file_name + '.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(export_columns)
        output_writer = csv.DictWriter(csv_file, fieldnames=export_columns)

        for o in range(1, 500):  # <-- write all unseen data from database to csv
            c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
            fetch_one = c.fetchone()
            writer.writerow(fetch_one[1:7])

        for t in range(training_size):  # <-- for each iteration write output to csv
            # some machine learning training code
            prediction = [0, 0, 1, 1, 0, 1]  # <-- sample output from predictions
            combined_set = list(map(str, prediction))

            ids = 1
            for each in combined_set:
                c.execute(str.format('INSERT INTO OUTPUT_prediction VALUES ({})',
                          ",".join(["?" for _ in range(1, len([ids] + [int(each)]) + 1)])),
                          [ids] + [int(each)])
                ids += 1
            ldb.commit()

            for o in range(1, 500):  # <-- write output from last prediction iteration to specific column
                c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
                fetch_output = c.fetchone()
                output_writer.writeheader()
                output_writer.writerow({'OUTPUT ' + str(t + 1): fetch_output[-1]})  # <-- columns remain empty
WHAT IS THE PROBLEM
When the code finishes and I open the file, I can see that the OUTPUT columns remain empty.
CSV IMAGE
EDIT: I don't want to use pandas and to_csv because they are very slow. Sometimes my unseen data has 1 million lines and it takes half an hour for a single iteration using to_csv.
I know what I've done wrong and I have found a solution for this situation, but I'm not satisfied with it. When I try to add a new column in w mode, the new data is always written at the end of the file. When I set csv_file.seek(0), the old data is overwritten.
I have also tried to reopen the file in r+ mode and set csv_file.seek(0), but got the same outcome.
I will use xlwings for this task, because it gives me more control, though I still do not know how it will affect the input speed. My goal is to prepare a summary report with the unseen data, the output of each iteration, and statistical information.
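For context, a sketch of the usual workaround (my example, not part of the original post): the csv module cannot update a column in place, because a CSV file is just a flat stream of text. The common pattern is to read all rows, add the new column in memory, and rewrite the whole file. Here an in-memory buffer stands in for reopening the file in w mode, and the data is made up:

```python
import csv
import io

# Hypothetical existing data: two rows of seven columns each
rows = [["a%d" % i for i in range(7)] for _ in range(2)]
new_column = ["0", "1"]  # one OUTPUT value per row

# Append the new value to each row in memory
for row, value in zip(rows, new_column):
    row.append(value)

# Rewrite everything in one pass
buf = io.StringIO()  # stands in for open(..., 'w', newline='')
writer = csv.writer(buf)
writer.writerows(rows)
print(buf.getvalue())
```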
SOLUTION (with r+):
now = datetime.datetime.now()
file_name = str.format('my_predictions {}', now.strftime("%Y-%m-%d %H %M %S"))
export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + \
                 ['OUTPUT ' + str(n) for n in range(1, training_size + 1)]

with open('archived/' + file_name + '.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(export_columns)
    for o in range(1, 500):
        c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
        fetch_one = c.fetchone()
        writer.writerow(fetch_one[1:7])

for t in range(training_size):
    # some machine learning training code
    prediction = [0, 0, 1, 1, 0, 1]  # <-- sample output from predictions
    combined_set = list(map(str, prediction))

    # ids = 1
    #
    # for each in combined_set:
    #     c.execute(str.format('INSERT INTO OUTPUT_prediction VALUES ({})',
    #               ",".join(["?" for _ in range(1, len([ids] + [int(each)]) + 1)])), [ids] + [int(each)])
    #
    #     ids += 1
    #
    # ldb.commit()

    with open('archived/' + file_name + '.csv', 'r+', newline='') as csv_file:
        csv_input = csv.reader(csv_file)
        rows = list(csv_input)
        csv_file.seek(0)  # rewind, otherwise the rewritten rows are appended at the end
        writer = csv.writer(csv_file)
        writer.writerow(export_columns)
        for row, o in zip(rows[1:], combined_set):  # skip the old header row
            row += [o]
            writer.writerow(row)
I have the following code:
ws = wb.worksheets[1]
print(ws)

with open('out.txt', 'r+') as data:
    reader = csv.reader(data, delimiter='\t')
    for row in reader:
        print(row)
        ws.append(row)
wb.save('test.xlsx')
By default the data is written to the xlsx file starting from A1.
Is there a more convenient way to start appending the data, let's say from C2?
Or is xxx.cell(row=xx, column=yy).value = zz the only way?
i = 2
j = 3
with open('out.txt', 'r+') as data:
    reader = list(csv.reader(data, delimiter='\t'))
    for row in reader:
        for element in row:
            ws.cell(row=i, column=j).value = element
            j += 1
        j = 3
        i += 1
Just pad the rows with None:

ws.append([])  # move to row 2
for row in reader:
    row = [None] * 2 + row
    ws.append(row)
I have files with hundreds of thousands of rows of data, but they don't have any columns.
I am trying to go through every file, split them row by row, and store the rows in a list; after that I want to assign values by columns. But here I am confused what to do, because there are around 60 values in every row, plus some extra columns with assigned values that should be added to every row.
Code so far:
import re
import glob

filenames = glob.glob("/home/ashfaque/Desktop/filetocsvsample/inputfiles/*.txt")

columns = []
with open("/home/ashfaque/Downloads/coulmn names.txt", encoding="ISO-8859-1") as f:
    file_data = f.read()
lines = file_data.splitlines()
for l in lines:
    columns.append(l.rstrip())

total = {}
for name in filenames:
    modified_data = []
    with open(name, encoding="ISO-8859-1") as f:
        file_data = f.read()
    lines = file_data.splitlines()
    for l in lines:
        if len(l) >= 1:
            modified_data.append(re.split(': |,', l))

    rows = []
    i = len(modified_data)
    x = 0
    while i > 60:
        r = lines[x:x + 59]
        x = x + 60
        i = i - 60
        rows.append(r)

    z = len(modified_data)
    while z >= 60:
        z = z - 60

    if z > 1:
        last_columns = modified_data[-z:]
        x = []
        for l in last_columns:
            if len(l) > 1:
                del l[0]
                x.append(l)
            elif len(l) == 1:
                x.append(l)

    for row in rows:
        for vl in x:
            row.append(vl)

    for r in rows:
        for i in range(0, len(r)):
            if len(r) >= 60:
                total.setdefault(columns[i], []).append(r[i])
In another script I have separated both the rows with 60 values and the last 5 to 15 columns which should be added to each row, but again I am confused how to bind all the data together.
The data should look like this after binding:
outputdata.xlsx
Data Input file:
inputdata.txt
What am I missing here? Any tool?
I believe that your issue can be resolved by taking the input file and turning it into a CSV file which you can then import into whatever program you like.
I wrote a small generator that would read a file a line at a time and return a row after a certain number of lines, in this case 60. In that generator, you can make whatever modifications to the data as you need.
Then with each generated row, I write it directly to the csv. This should keep the memory requirements for this process pretty low.
I didn't understand what you were doing with the regex split, but it would be simple enough to add it to the generator.
import csv

OUTPUT_FILE = "/home/ashfaque/Desktop/File handling/outputfile.csv"
INPUT_FILE = "/home/ashfaque/Desktop/File handling/inputfile.txt"

# This is a generator that will pull only num items into
# memory at a time, before it yields the row.
def get_rows(path, num):
    row = []
    with open(path, "r", encoding="ISO-8859-1") as f:
        for n, l in enumerate(f):
            # apply whatever transformations you need to here.
            row.append(l.rstrip())
            if (n + 1) % num == 0:
                # if rows need padding, do it here.
                yield row
                row = []

with open(OUTPUT_FILE, "w", newline="") as output:
    csv_writer = csv.writer(output)
    for r in get_rows(INPUT_FILE, 60):
        csv_writer.writerow(r)
My csv data looks like this:
I use this code to print it:
import pandas as pd
import csv

rs = pd.read_csv(r'D:/Clustering_TOP.csv', encoding='utf-8')
with open('D:/Clustering_TOP.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = [row for row in reader]

csv_title = rows[0]
csv_title = csv_title[1:]
len_csv_title = len(csv_title)

for i in range(len_csv_title):
    for j in range(i, len_csv_title):
        print(str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
The result of printing is this; the format is not right:
But in fact, I want the printout shaped like a pyramid, for example:
How can I modify my code?
Hello, I am not 100% sure, but I think your second for loop is the problem. Try this:
for i in range(len_csv_title):
    for j in range(i + 1):
        print(str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
I think the issue lies in your second for loop. Change the line below:

for j in range(i, len_csv_title):

to:

for j in range(0, i + 1):
This is sample code to print stars in the pattern you requested:
def pyramidpattern(n):
    for i in range(0, n):
        for j in range(0, i + 1):
            print("* ", end="")
        print("\r")

n = 5
pyramidpattern(n)
Output:
*
* *
* * *
* * * *
* * * * *
You need to invert the inner loop's range in your code. Just change the for loop and it works:
import pandas as pd
import csv

rs = pd.read_csv(r'file', encoding='utf-8')
with open('file', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = [row for row in reader]

csv_title = rows[0]
csv_title = csv_title[1:]
len_csv_title = len(csv_title)

for i in range(len_csv_title):
    for j in range(0, i + 1):
        print(str(rs[csv_title[i]].corr(rs[csv_title[j]])), end='\t')
    print()
Try changing the inner for loop to:

for j in range(0, i + 1):