I am trying to write a program that reads one CSV file and then creates two different output files based on the input file.
file = open("file.csv", "r")

def createFile(data_set):
    output0 = open("Output" + str(data_set) + ".txt", "w")
    print >> output0, 'h1,h2,h3,h4'
    return output0

def runCalculations(data_set, output):
    for row in file:
        # equations and stuff
        if data_set == 1:
            print >> output, row[1]+','+row[2]+','+row[3]+','+x
        if data_set == 2:
            print >> output, row[4]+','+row[5]+','+row[6]+','+y
    output.close()

output1 = createFile(1)
runCalculations(1, output1)
output2 = createFile(2)
runCalculations(2, output2)
file.close()
Output1 is perfect; the formatting and everything is exactly how it should be. For Output2, the file is created, and the headers for the columns are visible (so createFile is working fine), but the runCalculations function never runs, including the equations (I checked by putting a few print statements here and there).
There are no error messages, and I've tried changing the variable name for the output file within each function and parameter (everything was just 'output' before). I've also tried closing each file (output1 and output2) individually outside of the runCalculations function. What am I missing that is preventing runCalculations from doing anything the second time?
Sorry if the solution is incredibly obvious, I've been working on this for a while so fresh eyes are a great help. Thank you very much for your time!
The function runCalculations exhausts the data file; that is the problem. It would be better to open and close the file inside runCalculations. It is also better to create the output file in runCalculations, see below:
def createFile(data_set):
    output0 = open("Output" + str(data_set) + ".txt", "w")
    print >> output0, 'h1,h2,h3,h4'
    return output0

def runCalculations(data_set):
    file = open("file.csv", "r")
    output = createFile(data_set)
    for row in file:
        # equations and stuff
        if data_set == 1:
            print >> output, row[1]+','+row[2]+','+row[3]+','+x
        if data_set == 2:
            print >> output, row[4]+','+row[5]+','+row[6]+','+y
    output.close()
    file.close()

runCalculations(1)
runCalculations(2)
When runCalculations is executed the first time, you iterate over the file object, and after the loop ends you are at the end of the file.
That's why nothing is written the second time you call runCalculations: you have to move back to the beginning of the file.
To do this, add file.seek(0) at the end of the runCalculations function.
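A minimal sketch of what is happening (a hypothetical two-line CSV written just for the demonstration):

```python
# Iterating a file object moves its cursor; a second loop sees nothing
# until you rewind with seek(0).
with open("file.csv", "w") as f:
    f.write("a,b\n1,2\n")

f = open("file.csv", "r")
first_pass = list(f)    # consumes the whole file
second_pass = list(f)   # empty: the cursor is already at EOF
f.seek(0)               # rewind to the beginning
third_pass = list(f)    # full contents again
f.close()

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```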
I am new to multiprocessing, and I am using said library in Python to parallelize the computation of a parameter for the rows of a dataframe.
The idea is the following:
I have two functions, g for the actual computation and f for filling the dataframe with the computed values. I call the function f with pool.apply_async. The problem is that at the end of the pool.apply_async calls the dataframe has not been updated, even though a print inside f clearly shows that it is saving the values correctly. So I thought to save the results to a file inside the f function, as shown in my pseudo code below. However, the file where I save the results stops being updated after 2 values, and the kernel keeps running even though the terminal shows that the script has computed all the values.
This is my pseudo code:
def g(path_to_image1, path_to_image2):
    # vectorize images
    # does computation
    return value  # value is a float

def f(row, index):
    value = g(row.image1, row.image2)
    df.at[index, 'value'] = value
    df.to_csv('dftest.csv')
    return df

def callbackf(result):
    global results
    results.append(result)

Inside the main:

results = []
pool = mp.Pool(N_CORES)
for index, row in df.iterrows():
    pool.apply_async(f,
                     args=(row, index),
                     callback=callbackf)
I tried to use with get_context("spawn").Pool() as pool inside the main, as suggested by https://pythonspeed.com/articles/python-multiprocessing/, but it didn't solve my problem. What am I doing wrong? Is it possible that vectorizing the images at each row causes problems for the multiprocessing?
At the end I saved the results in a txt file instead of a csv, and it worked. I don't know why it didn't work with csv, though.
Here's the code I put in place of the csv and pickle lines:
with open('results.txt', 'a') as f:
    f.write(image1 +
            '\t' + image2 +
            '\t' + str(value) +
            '\n')
My program searches a .txt file for the values just below and just above a given input value.
def find_closer():
    file = 'C:/.../CariCBABaru.txt'
    data = np.loadtxt(file)
    x, y = data[:,0], data[:,1]
    print(y)
    for k in range(len(spasi_baru)):
        a = y  # [0, 20.28000631, 49.43579604, 78.59158576, 107.7473755, 136.9031652, 166.0589549, 176.5645474, 195.2147447]
        b = spasi_baru[k]
        # diff_list = []
        diff_dict = OrderedDict()
        if b in a:
            b = input("Number already exists, please enter another number ")
        else:
            for x in a:
                diff = x - b
                if diff < 0:
                    # diff_list.append(diff*(-1))
                    diff_dict[x] = diff*(-1)
                else:
                    # diff_list.append(diff)
                    diff_dict[x] = diff
        # print("diff_dict", diff_dict)
        # print(diff_dict[9])
        sort_dict_keys = sorted(diff_dict.keys())
        # print(sort_dict_keys)
        closer_less = 0
        closer_more = 0
        # cl = []
        # cm = []
        for closer in sort_dict_keys:
            if closer < b:
                closer_less = closer
            else:
                closer_more = closer
                break
        # cl.append(closer_less == len(spasi_baru) - 1)
        # cm.append(closer_more == len(spasi_baru) - 1)
        print(spasi_baru[k], ": lower value=", closer_less, "and upper value =", closer_more)
        data = open('C:/.../Batas.txt','w')
        text = "Spasi baru:{spasi_baru}, File: {closer_less}, line:{closer_more}".format(spasi_baru=spasi_baru[k], closer_less=closer_less, closer_more=closer_more)
        data.write(text)
        data.close()
        print(spasi_baru[k], ": lower value=", closer_less, "and upper value =", closer_more)

find_closer()
The results in the terminal are shown in the linked image.
Then, I want to write these results to a file (txt or csv, either is fine) as a sequence of rows. But the problem I have is that the file contains just one row, the last value output in the terminal, like below:
Spasi baru:400, File: 399.3052727, line: 415.037138
Any suggestions to help fix my problem, please? I've been stuck for several hours trying different approaches. I'm using Python 3.7.
The problem is that you open the file with mode 'w' inside the loop, so every iteration truncates the file and overwrites the previous one; only the last iteration survives. Use append mode ('a' or 'a+') when you want to keep adding to the same file. (Note that 'w+' would not help here, since it also truncates the file.)
Instead of doing this:
data = open('C:/.../Batas.txt','w')
Do this:
data = open('C:/.../Batas.txt','a')
Better still, open the file once before the loop and close it once after, so a fresh run starts from an empty file but the loop itself never overwrites anything.
‘r’ – Read mode which is used when the file is only being read
‘w’ – Write mode which is used to edit and write new information to the file (any existing files with the same name will be erased when this mode is activated)
‘a’ – Appending mode, which is used to add new data to the end of the file; that is new information is automatically amended to the end
‘r+’ – Special read and write mode, which is used to handle both actions when working with a file
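A minimal sketch of the open-once approach (made-up values; a relative file name stands in for the truncated C:/.../Batas.txt path):

```python
# Illustrative (spasi_baru, lower, upper) results; the first triple echoes
# the sample line from the question.
results = [(400, 399.3052727, 415.037138),
           (500, 480.1, 510.9)]

# Open once with 'w' before the loop; each iteration then adds a new line
# instead of truncating the file.
with open("Batas.txt", "w") as data:
    for spasi, lower, upper in results:
        data.write("Spasi baru:{}, lower:{}, upper:{}\n".format(spasi, lower, upper))
```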
I am a beginner in Python (and in programming in general). I have a large file containing repeating groups of three lines with numbers, then one empty line, and so on...
if I print the file it looks like:
1.93202838
1.81608154
1.50676177
2.35787777
1.51866227
1.19643624
...
I want to take each group of three numbers as one vector, do some math operations on it, write the result to a new file, and then move on to the next three lines, the next vector. Here is my code (it doesn't work):
import math
inF = open("data.txt", "r+")
outF = open("blabla.txt", "w")
a = []
fin = []
b = []
for line in inF:
    a.append(line)
    if line.startswith(" \n"):
        fin.append(b)
        h1 = float(fin[0])
        k2 = float(fin[1])
        l3 = float(fin[2])
        h = h1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        k = k1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        l = l1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        vector = [str(h), str(k), str(l)]
        outF.write('\n'.join(vector)
        b = a
        a = []
inF.close()
outF.close()
print "done!"
I want to get a "vector" from each 3 lines in my file and put it into the blabla.txt output file. Thanks a lot!
My 'code comment' answer:
take care to close all parentheses, to match the opened ones! (the unclosed one is very likely to raise a SyntaxError ;-) )
fin is created as an empty list and is never filled with values. Trying to access any value via fin[n] is therefore very likely to break with an IndexError;
k2 and l3 are created but never used;
k1 and l1 are used but never created, which is very likely to break with a NameError;
b is created as a copy of a, so it is a list. But then you do fin.append(b): what do you expect in this case from appending (not extending) a list?
Hope this helps!
This is only in the answers section for length and formatting.
Input and output.
Control flow
I know nothing of vectors; you might want to look into the math module or NumPy.
Those links should hopefully give you all the information you need to at least get started with this problem. As yuvi said, the code won't be written for you, but you can come back when you have something that isn't working as you expected or that you don't fully understand.
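Once you've had a go yourself, a minimal sketch of the three-lines-per-vector idea (the function name is made up; the normalization h1/(sqrt(h1*h1+k1*k1+l1*l1)+1) is taken from the question's code):

```python
import math

def normalize_triplets(lines):
    # lines: strings read from the file; blank lines separate the groups
    nums = [float(s) for s in lines if s.strip()]
    out = []
    for i in range(0, len(nums) - 2, 3):
        h1, k1, l1 = nums[i:i + 3]
        norm = math.sqrt(h1 * h1 + k1 * k1 + l1 * l1) + 1
        out.append((h1 / norm, k1 / norm, l1 / norm))
    return out
```

For example, normalize_triplets(["1.0\n", "2.0\n", "2.0\n", " \n"]) returns [(0.25, 0.5, 0.5)], since sqrt(1+4+4)+1 = 4.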
I can't seem to find a way to return the number of columns in a worksheet in xlwt.Workbook(). The idea is to take a wad of .xls files in a directory and combine them into one. One problem I am having is changing the column position when writing the next file. This is what I'm working with thus far:
import xlwt, xlrd, os

def cbc(rd_sheet, wt_sheet, rlo=0, rhi=None,
        rshift=0, clo=0, chi=None, cshift=0):
    if rhi is None: rhi = rd_sheet.nrows
    if chi is None: chi = 2  # only first two cols needed
    for row_index in xrange(rlo, rhi):
        for col_index in xrange(clo, chi):
            cell = rd_sheet.cell(row_index, col_index)
            wt_sheet.write(row_index + rshift, col_index + cshift, cell.value)

Dir = '/home/gerg/Desktop/ex_files'
ext = '.xls'
list_xls = [file for file in os.listdir(Dir) if file.endswith(ext)]
files = [Dir + '/%s' % n for n in list_xls]
output = '/home/gerg/Desktop/ex_files/copy_test.xls'
wbook = xlwt.Workbook()
wsheet = wbook.add_sheet('Summary', cell_overwrite_ok=True)  # overwrite just for the repeated testing
for XLS in files:
    rbook = xlrd.open_workbook(XLS)
    rsheet = rbook.sheet_by_index(0)
    cbc(rsheet, wsheet, cshift=0)
wbook.save(output)
list_xls returns:
['file2.xls', 'file3.xls', 'file1.xls', 'copy_test.xls']
files returns:
['/home/gerg/Desktop/ex_files/file2.xls', '/home/gerg/Desktop/ex_files/file3.xls', '/home/gerg/Desktop/ex_files/file1.xls', '/home/gerg/Desktop/ex_files/copy_test.xls']
My question is how to shift each file written into the xlwt workbook over by two columns each time. This code gives me only the first file saved to .../copy_test.xls. Is there a problem with the file listing as well? I have a feeling there may be.
This is Python2.6 and I bounce between windows and linux.
Thank you for your help,
GM
You are using only the first two columns in each input spreadsheet. You don't need "the number of columns in a worksheet in xlwt.Workbook()". You already have the cshift mechanism in your code, but you are not using it. All you need to do is change the loop in your outer block, like this:
for file_index, file_name in enumerate(files):
    rbook = xlrd.open_workbook(file_name)
    rsheet = rbook.sheet_by_index(0)
    cbc(rsheet, wsheet, chi=2, cshift=file_index * 2)
For generality, change the line
if chi is None: chi = 2
in your function to
if chi is None: chi = rsheet.ncols
and pass chi=2 in as an arg as I have done in the above code.
I don't understand your rationale for overriding the overwrite check ... surely in your application, overwriting an existing cell value is incorrect?
You say "This code gives me the first file saved to .../copy_test.xls". First in input order is file2.xls. The code that you have shown overwrites previous input and will give you the LAST file (in input order), not the first, so perhaps you are mistaken. Note: the last input file, 'copy_test.xls', is quite likely to be a previous OUTPUT file; perhaps your output file should go in a separate folder.
I'm trying to execute an R script from Python, ideally displaying and saving the results. Using rpy2 has been a bit of a struggle, so I thought I'd just call R directly. I have a feeling that I'll need to use something like os.system or subprocess.call, but I am having difficulty deciphering the module guides.
Here's the R script "MantelScript", which uses a particular statistical test to compare two distance matrices at a time (distmatA1 and distmatB1). This works in R, though I haven't yet put in the iterating bits to read through and compare a bunch of files pairwise (I really need some assistance with this, too, btw!):
library(ade4)
M1<-read.table("C:\\pythonscripts\\distmatA1.csv", header = FALSE, sep = ",")
M2<-read.table("C:\\pythonscripts\\distmatB1.csv", header = FALSE, sep = ",")
mantel.rtest(dist(matrix(M1, 14, 14)), dist(matrix(M2, 14, 14)), nrepet = 999)
Here's the relevant bit of my Python script, which reads through some previously formulated lists and pulls out matrices in order to compare them via this Mantel Test (it should pull the first matrix from identityA and sequentially compare it to every matrix in identityB, then repeat with the second matrix from identityA, etc.). I want to save these files and then call on the R program to compare them:
# windownA and windownB are lists containing ascending sequences of integers
# identityA and identityB are lists where each field is a distance matrix.
import subprocess
import os
import csv

z = 0
v = 0
for i in windownA:
    M1 = identityA[i]
    z += 1
    filename = "C:/pythonscripts/distmatA" + str(z) + ".csv"
    file = csv.writer(open(filename, 'w'))
    file.writerow(M1)
    for j in windownB:
        M2 = identityB[j]
        v += 1
        filename2 = "C:/pythonscripts/distmatB" + str(v) + ".csv"
        file = csv.writer(open(filename2, 'w'))
        file.writerow(M2)
        ## result = os.system('R CMD BATCH C:/R/library/MantelScript.R') - maybe something like this??
        ## result = subprocess.call(['C:/R/library/MantelScript.txt']) - or maybe this??
        print result
        print ' '
If your R script only has side effects, that's fine, but if you want to process the results further with Python, you'll still be better off using rpy2.
import rpy2.robjects
f = file("C:/R/library/MantelScript.R")
code = ''.join(f.readlines())
result = rpy2.robjects.r(code)
# assume that MantelScript creates a variable "X" in the R GlobalEnv workspace
X = rpy2.robjects.globalenv['X']
Stick with this.
process = subprocess.Popen(['R', 'CMD', 'BATCH', 'C:/R/library/MantelScript.R'])
process.wait()
When the wait() function returns, the .R file is finished.
Note that you should write your .R script to produce a file that your Python program can read.
with open('the_output_from_mantelscript', 'r') as result:
    for line in result:
        print(line)
Don't waste a lot of time trying to hook up a pipeline.
Invest time in getting a basic "Python spawns R" process working.
You can add to this later.
In case you're interested in invoking an R subprocess from Python more generally:
#!/usr/bin/env python3

from io import StringIO
from subprocess import PIPE, Popen

def rnorm(n):
    rscript = Popen(["Rscript", "-"], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    with StringIO() as s:
        s.write("x <- rnorm({})\n".format(n))
        s.write("cat(x, \"\\n\")\n")
        return rscript.communicate(s.getvalue().encode())

if __name__ == '__main__':
    output, errmsg = rnorm(5)
    print("stdout:")
    print(output.decode('utf-8').strip())
    print("stderr:")
    print(errmsg.decode('utf-8').strip())
Better to do it through Rscript.
Given what you're trying to do, a pure R solution might be neater:
file.pairs <- combn(dir(pattern="*.csv"), 2) # get every pair of csv files in the current dir
The pairs are columns in a 2xN matrix:
file.pairs[,1]
[1] "distmatrix1.csv" "distmatrix2.csv"
You can run a function on these columns by using apply (with option '2', meaning 'act over columns'):
my.func <- function(v) paste(v[1], v[2], sep="::")
apply(file.pairs, 2, my.func)
In this example my.func just glues the two file names together; you could replace this with a function that does the Mantel Test, something like (untested):
my.func <- function(v){
    M1 <- read.table(v[1], header = FALSE, sep = ",")
    M2 <- read.table(v[2], header = FALSE, sep = ",")
    mantel.rtest(dist(matrix(M1, 14, 14)), dist(matrix(M2, 14, 14)), nrepet = 999)
}