Reading a file and printing a specific answer

I read a text file called CJ.txt that contains 2 columns (z and mub) and 31 rows. (I have included only the important part of my program.)
The question is how to define "r" so that I can print values such as r[25], where r[25]=mub[25]*z[25].
Every other r[i], for i from 0 to 31, should be obtained the same way.
from math import *
import numpy as np
from scipy.integrate import quad
from scipy.integrate import odeint
min=l=m=n=b=t=chi=r=None
f=0
z,mub=np.genfromtxt('CJ.txt',unpack=True) # opening the text file
for i in range(len(z)): # This means from 0 to 31
    r[i]=mub[i]*z[i] # need a function similar to this
print(r[5],r[31],r[2],r[12]) #and other r
or to create an array such as:
x=[[r[1],r[5],r[7]],
[r[31],r[26],r[20]],
[r[21],r[12],r[14]]]
I don't know whether this question is easy or hard, but it is very important to me.
I appreciate your time and attention.

Is this what you are looking for? Here r starts as an empty list and gets one product appended per row:
z,mub=np.genfromtxt('CJ.txt',unpack=True) # opening the text file
r = []
for i in range(len(z)): # i runs from 0 to len(z)-1
    r.append(mub[i]*z[i]) # r[i] = mub[i]*z[i]
print(r[5],r[31],r[2],r[12]) # r[31] exists only if the file has at least 32 rows
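Since genfromtxt already returns NumPy arrays, the loop can probably be dropped altogether. Here is a minimal sketch, assuming the file has two whitespace-separated numeric columns; the in-memory sample and its values are made up as a stand-in for CJ.txt:

```python
import io
import numpy as np

# Hypothetical stand-in for CJ.txt: two columns (z, mub), made-up values.
sample = io.StringIO("0.1 10.0\n0.2 20.0\n0.3 30.0\n")

# unpack=True returns one array per column.
z, mub = np.genfromtxt(sample, unpack=True)

# Element-wise product: r[i] == mub[i] * z[i] for every row at once.
r = mub * z

# Single elements and arbitrary selections are plain indexing.
print(r[1])
print(r[[0, 2]])

# A 2-D array built from chosen elements of r (indices picked for illustration).
x = r[[0, 1, 2, 2, 1, 0]].reshape(2, 3)
print(x)
```

With the real file you would pass 'CJ.txt' instead of the StringIO object. Valid indices run from 0 to len(r)-1.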

Related

Error evaluating a derivative on Python (with .subs, .evalf and .lambdify)

I am trying to separately compute the elements of a Taylor expansion and did not obtain the results I was supposed to. The function to approximate is x**321, and the first three elements of that Taylor expansion around x=1 should be:
1 + 321*(x-1) + 51360*(x-1)**2
For some reason, the code associated with the second term is not working.
See my code below.
import sympy as sy
import numpy as np
import math
import matplotlib.pyplot as plt
x = sy.Symbol('x')
f = x**321
x0 = 1
# factorial is taken from the math module imported above
func0 = f.diff(x,0).subs(x,x0)*((x-x0)**0/math.factorial(0))
print(func0)
func1 = f.diff(x,1).subs(x,x0)*((x-x0)**1/math.factorial(1))
print(func1)
func2 = f.diff(x,2).subs(x,x0)*((x-x0)**2/math.factorial(2))
print(func2)
The prints I obtain running this code are
1
321*x - 321
51360*(x - 1)**2
I also used .evalf and .lambdify but the results were the same. I can't understand where the error is coming from.
f = x**321
x = sy.Symbol('x')
def fprime(x):
    return sy.diff(f,x)
DerivativeOfF = sy.lambdify((x),fprime(x),"numpy")
print(DerivativeOfF(1)*((x-x0)**1/math.factorial(1)))
321*x - 321
I'm obviously just starting with the language, so thank you for your help.

I found a beginner's guide to Taylor expansions in Python. Check it out; it may answer all your questions:
http://firsttimeprogrammer.blogspot.com/2015/03/taylor-series-with-python-and-sympy.html
I tested your code and it works fine. As Bazingaa pointed out in the comments, it is just a matter of how SymPy stores expressions internally: 321*(x - 1)**1 is automatically rewritten as 321*x - 321, which is the same expression. One could argue that the expanded form takes less memory to store.
Likewise, your first output line gives 1 instead of (x - 1)**0.
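To see that the printed terms really are the expected ones, here is a small sketch that builds the first three terms in a loop and prints each one next to its factored form. It uses sy.factorial instead of the bare factorial from the question, and sy.factor only to change how the term is displayed:

```python
import sympy as sy

x = sy.Symbol('x')
f = x**321
x0 = 1

# n-th Taylor term around x0: f^(n)(x0) * (x - x0)**n / n!
terms = [f.diff(x, n).subs(x, x0) * (x - x0)**n / sy.factorial(n)
         for n in range(3)]

for term in terms:
    # Left: the form SymPy produces automatically. Right: the factored form.
    print(term, '|', sy.factor(term))
```

The automatic form and the factored form are the same polynomial, so nothing is wrong with the second term; only its printed layout differs.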

python sparse matrix creation parallelize to speed up

I am creating a sparse matrix file by extracting features from an input file. Each row of the input file contains one film ID, followed by feature IDs and the score of each feature.
6729792 4:0.15568 8:0.198796 9:0.279261 13:0.17829 24:0.379707
The first number is the ID of the film. In each pair, the value to the left of the colon is the feature ID and the value to the right is the score of that feature.
Each line represents one film, and the number of feature:score pairs varies from one film to another.
Here is how I construct my sparse matrix.
import sys
import os
import os.path
import time
import json  # needed for json.dump below
import numpy as np
import tables as tb  # PyTables, assumed from the tb.* calls below
from Film import Film
import scipy
from scipy.sparse import coo_matrix, csr_matrix, rand

def sparseCreate(self, Debug):
    a = rand(self.total_rows, self.total_columns, format='csr')
    l, m = a.shape[0], a.shape[1]
    f = tb.open_file("sparseFile.h5", 'w')
    filters = tb.Filters(complevel=5, complib='blosc')
    data_matrix = f.create_carray(f.root, 'data', tb.Float32Atom(), shape=(l, m), filters=filters)
    index_film = 0
    input_data = open('input_file.txt', 'r')
    for line in input_data:
        my_line = np.array(line.split())
        id_film = my_line[0]
        my_line = np.core.defchararray.split(my_line[1:], ":")
        self.data_matrix_search_normal[str(id_film)] = index_film
        self.data_matrix_search_reverse[index_film] = str(id_film)
        for element in my_line:
            if int(element[0]) in self.selected_features:
                column = self.index_selected_feature[str(element[0])]
                data_matrix[index_film, column] = float(element[1])
        index_film += 1
    self.selected_matrix = data_matrix
    json.dump(self.data_matrix_search_reverse,
              open(os.path.join(self.output_path, "data_matrix_search_reverse.json"), 'wb'),
              sort_keys=True, indent=4)
    my_films = Film(
        self.selected_matrix, self.data_matrix_search_reverse, self.path_doc, self.output_path)
    x_matrix_unique = self.selected_matrix[:, :]
    r_matrix_unique = np.asarray(x_matrix_unique)
    f.close()
    return my_films
Question:
I feel that this function is too slow on big datasets; it takes too long to run.
How can I improve and accelerate it? Maybe using MapReduce? What makes this function so slow?

The slowness comes from I/O, conversions (from str, to str, even converting the same variable to str twice), splits, and explicit loops. By the way, Python has a csv module that can parse your input file; you can experiment with it (I assume you use a space as the delimiter). I also see that you convert element[0] to both int and str, which is bad because it creates many temporary objects. If you call this function several times, you may want to reuse some internal objects (the array?). You could also try another style, with map or a list comprehension, but that needs experimenting.
The general idea of Python code optimization is to avoid explicit Python bytecode execution and to prefer native/C functions wherever possible. Definitely try to cut down the number of conversions. Also, if the input file format is under your control, you can give the fields a fixed length, which lets you avoid splitting and parsing entirely (only string indexing is needed).
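As one way of acting on this, here is a rough sketch that converts each value exactly once, collects the entries in plain lists, and builds the SciPy sparse matrix in a single call instead of assigning cell by cell. The function name build_sparse and the feature_to_column dict are hypothetical stand-ins for the question's selected_features and index_selected_feature, and I have not benchmarked it against the original:

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_sparse(lines, feature_to_column):
    """Parse 'film_id feat:score ...' lines into a CSR matrix.

    feature_to_column is a hypothetical dict mapping a selected
    feature ID (int) to its column index in the output matrix.
    """
    rows, cols, vals, film_ids = [], [], [], []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        row_index = len(film_ids)
        film_ids.append(parts[0])
        for pair in parts[1:]:
            feature_id, score = pair.split(":")
            # One int conversion per pair; unselected features are dropped.
            column = feature_to_column.get(int(feature_id))
            if column is not None:
                rows.append(row_index)
                cols.append(column)
                vals.append(float(score))
    shape = (len(film_ids), len(feature_to_column))
    # Build the matrix once from (value, (row, column)) triplets.
    matrix = csr_matrix((vals, (rows, cols)), shape=shape, dtype=np.float32)
    return matrix, film_ids

sample_lines = [
    "6729792 4:0.15568 8:0.198796 9:0.279261",
    "1 8:0.5",
]
matrix, film_ids = build_sparse(sample_lines, {4: 0, 8: 1})
print(matrix.toarray())
print(film_ids)
```

For a real file you would pass the open file object as lines. The row index of each film is its position in film_ids, which replaces the two lookup dicts if you only need the reverse mapping.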

__init__ got an unexpected keyword argument [closed]

I'm trying to run this script for chemistry research.
In case I didn't copy the script correctly here's the link to download the files:
http://pubs.acs.org/doi/suppl/10.1021/acs.analchem.5b02258
This instruction may be helpful:
"fsolve_withPT.py takes two command line arguments: input file name of “MamPol2_titration_data.txt” and
output file name of “Kaps_result.txt”"
When I run the script on ipython I get this error message:
__init__() got an unexpected keyword argument 'step_max'
#University of California San Francisco
#Supplemental for
#
#A model for specific and nonspecific binding of ligand to multi-protein
#complexes by native mass spectrometry
#
#Shenheng Guan, et al
#2015
#
import sys
import math
import numpy
import warnings
from scipy.optimize import fsolve,fmin
import matplotlib.pyplot as plt
warnings.filterwarnings('ignore')
input_fn='MamPol2_titration_data.txt'
output_fn='Kaps_result1.txt'
##input_fn=sys.argv[1]
##output_fn=sys.argv[1]
fid=open(input_fn,'r')
line=fid.readline()
line=line.strip('\n')
no_mam=[float(x) for x in line.split('\t')[1:]]
line=fid.readline()
data=[]
conc=[]
for line in fid:
    line=line.strip('\n')
    tmp0=line.split('\t')
    conc.append(float(tmp0[0]))
    tmp1=[float(x) for x in tmp0[1:]]
    data.append(tmp1)
fid.close()
class fsolve_withPT:
    def __init__(self,conc,data):
        self.conc = conc
        self.data = data
    def ff(self, x, Kas, LT, PT):#x[0]:[P];x[1]:[PL]...;x[n]:[PLn];x[-1]:[L] n+2 (10)
        fc=[]
        for j in range(0,len(x)-2):#setup equilibrium equations
            fc.append(Kas[j]*x[j]*x[-1]-x[j+1])
        #mass conservation for P
        tmpP=0.0#[P]
        for j in range(0,len(x)-1):#x[0] to x[8] or [P] to [PL8]
            tmpP=tmpP+x[j]
        fc.append(tmpP-PT)#PT equals to all P species combined
        #mass conservation for L
        tmpL=x[-1]#[L]
        for j in range(1,len(x)-1):
            tmpL=tmpL+j*x[j]
        fc.append(tmpL-LT)
        return fc
    def error(self,w):
        Kas=w[:-1]
        PT=w[-1]
        mySum=0.0
        for m in range(0,len(self.conc)):#over conc (LT)
            #print Kas,self.conc[m],PT
            F=fsolve(self.ff, [1.0]*10, args=(Kas, self.conc[m], PT))
            myPT=sum(F[:-1])
            for k in range(0, len(no_mam)):#over # of Mam
                mySum=mySum+(F[k]/myPT-self.data[m][k])**2
        return mySum
w0=[8,7,6,5,4,3,2,1,0]
w0=numpy.array(w0)
w0=w0*3.01e4
w0[-1]=5.e-6
myFclass=fsolve_withPT(conc,data)
w, fopt, iter, funcalls, warnflag = fmin(myFclass.error, w0, maxiter=2000,
maxfun=2000, full_output=True,disp=True)
# http://nullege.com/codes/show/src#n#u#Numdifftools-0.6.0#numdifftools#speed_comparison#run_benchmarks.py/73/numdifftools.Hessian
import numdifftools as nd
#my_step_nom=[1.0e3]*8+[1.0e-6]*1
my_step_nom=w#*1.0e-3
hessian = nd.Hessian(myFclass.error,step_max=1.0e-2,step_nom=my_step_nom)#, step_max=1.0, step_nom=numpy.abs(w))
H = hessian(w)
covH=numpy.linalg.inv(H)
conc0=conc#numpy.linspace(0.0,6.0E-05,num=101).tolist()
y0=[]
for tmp in conc0:
    F=fsolve(myFclass.ff, [1.0]*10, args=(w[:-1], tmp,w[-1]))
    y0.append(F)
#y0=myFunc(conc0,w)
fid=open(output_fn,'w')
fid.write('Calculated complex conc. (M)\t'+str(w[-1])+'\n')
fid.write('# of Mam in Complex\t')
for j in no_mam:
    fid.write(str(j)+'\t')
fid.write('\n')
fid.write('Associate constants (Kas)\t\t')
for j in no_mam[:-1]:
    fid.write(str(w[j])+'\t')
fid.write('\n')
fid.write('Mam Conc. (M)\tSimulated abundances\n')
for k in range(0,len(y0)):
    fid.write(str(conc0[k])+'\t')
    yc=y0[k]
    tmp=sum(yc[:-1])
    for j in range(0,len(yc)-2):
        fid.write(str(yc[j]/tmp)+'\t')
    fid.write(str(yc[-2])+'\n')
fid.close()
from scipy import stats
SS=fopt
DF=len(data)*len(data[0])-len(w)
t_factor=stats.t.ppf(0.95, DF)
SE=[]
dw=[]
for j in range(0,len(w)):
    SE.append(numpy.sqrt(SS/DF*numpy.abs(covH[j,j])))
for j in range(0,len(w)):
    dw.append(SE[j]*t_factor)

You are unable to run this code because the paper's code is incorrect. I even downloaded the code from the link you posted to make sure I had the correct code, and I was able to reproduce your error. I'll try to explain what is going on and what you might be able to do about it.
The error __init__() got an unexpected keyword argument 'step_max' essentially means that the code is telling Python to create an object with some initial parameters, but Python does not recognize the 'step_max' parameter.
The culprit line in the code is
hessian = nd.Hessian(myFclass.error,step_max=1.0e-2,step_nom=my_step_nom)
You can see that it is trying to tell Python to create an nd.Hessian object from three initial parameters: myFclass.error, step_max=1.0e-2, and step_nom=my_step_nom. The problem is that the nd.Hessian initializer does not take parameters called step_max and step_nom.
So then, what does the nd.Hessian initializer take? nd.Hessian is the Hessian object from the numdifftools package, so I took a look at the source code. Sure enough, this is the source code for initializing an nd.Hessian object:
class Hessian(_Derivative):
    def __init__(self, f, step=None, method='central', full_output=False):
Take a look at the __init__. You can see that it takes f, step, method, and full_output. If it accepted step_max and step_nom, those parameters would be listed in the __init__.
One option is to use the nd.Hessian object correctly: pass the step parameter and figure out which step you want to use.
For example, if you replace the
hessian = nd.Hessian(myFclass.error,step_max=1.0e-2,step_nom=my_step_nom)
with
hessian = nd.Hessian(myFclass.error,step=1.0e-2)
you will be able to run the code. It might not give the same results as the paper, though; you'll never really know exactly what code they ran to get their results.
If you want to continue using this code with the numdifftools package, I suggest taking a look at its source code, which has nice explanations, comments, and examples.
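If you would rather check which keywords a constructor accepts without reading the source, inspect.signature can list them. The sketch below uses a hypothetical stand-in class that only copies the __init__ signature quoted above, so it runs without numdifftools installed; with the real package you would pass nd.Hessian instead, and its signature may differ between versions:

```python
import inspect

class Hessian:
    """Hypothetical stand-in mirroring only the __init__ signature quoted above."""
    def __init__(self, f, step=None, method='central', full_output=False):
        self.f = f
        self.step = step
        self.method = method
        self.full_output = full_output

def accepted_keywords(cls):
    # Names of the constructor parameters, excluding self.
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != 'self']

print(accepted_keywords(Hessian))

# Passing a keyword that is not in the list reproduces the error.
try:
    Hessian(sum, step_max=1.0e-2)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

Anything that is not in the printed list will raise the same kind of TypeError as in the question.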

make a full matrix output in python

I have a matrix of dimension 236 x 97. When I print the matrix in Python, the output is incomplete: there is ....... in the middle of the matrix.
I tried writing the matrix to a test file, but the result is exactly the same.
I can't post a screenshot because I don't have enough reputation, and it won't display correctly if I choose another markup option.
Can anyone solve this?
def build(self):
    self.keys = [k for k in self.wdict.keys() if len(self.wdict[k]) > 1]
    self.keys.sort()
    self.A = zeros([len(self.keys), self.dcount])
    for i, k in enumerate(self.keys):
        for d in self.wdict[k]:
            self.A[i,d] += 1
def printA(self):
    outprint = open('outputprint.txt','w')
    print 'Here is the weighted matrix'
    print self.A
    outprint.write('%s' % self.A)
    outprint.close()
    print self.A.shape

Assuming your matrix is a NumPy array, you can use matrix.tofile(<options>) to write the array to a file, as described in the NumPy documentation:
#!/usr/bin/env python
# coding: utf-8
import numpy as np
# create a matrix of random numbers and desired dimension
a = np.random.rand(236, 97)
# write matrix to file
a.tofile('output.txt', sep = ' ')

The problem is that you're specifically saving the str representation to a file with this line:
outprint.write('%s' % self.A)
This explicitly casts it to a string (%s), which generates the abridged version you're seeing.
There are lots of ways to write the entire matrix to output. One easy option is numpy.savetxt, for example:
import numpy
numpy.savetxt('outputprint.txt', self.A)
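To make the difference visible, here is a small sketch on a made-up 236 x 97 integer array. It compares the default string form with np.array2string using a large threshold, then round-trips the array through savetxt; the temporary directory is only there to keep the example self-contained:

```python
import os
import sys
import tempfile
import numpy as np

# Made-up data with the same shape as in the question.
a = np.arange(236 * 97).reshape(236, 97)

# Default string form: NumPy summarizes large arrays with '...'.
abridged = str(a)

# Raising the threshold above the array size disables the summary.
full = np.array2string(a, threshold=sys.maxsize)

# savetxt writes one text line per matrix row, with nothing left out.
path = os.path.join(tempfile.mkdtemp(), 'outputprint.txt')
np.savetxt(path, a, fmt='%d')
restored = np.loadtxt(path)
print(restored.shape)
```

If you want print itself to show everything, np.set_printoptions(threshold=sys.maxsize) changes the default for the whole session.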

Plotting multiple functions using values in a list

This is the gist of what I need to do: in python, I have a parametric function f(x(t,omega),y(t,omega)) where omega has five specific values (at non-regular intervals). What I want to do is basically plot this function f on the same plot for each of the five values of omega.
Now, I have a working code for this but I think that it could be more concise (and I'm very interested in knowing HOW it could be more concise, because I want to learn as much as I can about python from this exercise), and also, I can't figure out how to fix the range of x(t,omega) here! This last point is the most problematic.
Here is my "working" code:
x=linspace(0,10,100)
H0=71
omega0=1.01
Rc=0.5*(omega0/(omega0-1))*(1-cos(x))
tc=(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x))
plot(tc,Rc)
omega0=1.1
Rc=0.5*(omega0/(omega0-1))*(1-cos(x))
tc=(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x))
plot(tc,Rc)
omega0=1.5
Rc=0.5*(omega0/(omega0-1))*(1-cos(x))
tc=(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x))
plot(tc,Rc)
omega0=2.0
Rc=0.5*(omega0/(omega0-1))*(1-cos(x))
tc=(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x))
plot(tc,Rc)
omega0=3.0
Rc=0.5*(omega0/(omega0-1))*(1-cos(x))
tc=(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x))
plot(tc,Rc)
show()
As you can see, tc and Rc serve as my x(t,omega) and y(t,omega), and I've used x as my parametric variable because, well, I already have a t in the form of tc. If you plot this, you'll see that it's difficult to get much information out of it, even though all the lines are technically there. Any help is much appreciated!
EDIT: I got what I needed. For anyone coming across this thread with a similar issue, here is my revised code, thanks in large part to the answer below and some further searching:
import numpy as np
import pylab as pl
from pylab import *
x=linspace(0,50,1000)
H0=71 #units km/s/Mpc
omegas = [1.01,1.1,1.5,2.0,3.0]
Rcs = [0.5*(omega0/(omega0-1))*(1-cos(x)) for omega0 in omegas]
tcs = [(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x)) for omega0 in omegas]
for pair in zip(tcs,Rcs):
    pl.plot(pair[0],pair[1])
pl.xlim(0,0.55)
pl.ylim(0,60)
pl.show()

The most apparent way to reduce redundancy in your code is to use a for loop or list comprehension:
x=linspace(0,10,100)
H0 = 71
omegas = [1.01,1.1,1.5,2.0,3.0]
rcs = [0.5*(omega0/(omega0-1))*(1-cos(x)) for omega0 in omegas]
tcs = [(0.5/H0)*(omega0/(omega0-1)**(3/2))*(x-sin(x)) for omega0 in omegas]
for pair in zip(tcs,rcs):
    plot(pair[0],pair[1])
show()
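Going one step further, the formulas can live in a function so they are written only once. This is a sketch with explicit imports rather than the pylab star import; the function name closed_model_curve, the axis labels, and the output file name are my own choices, and the x-limit is borrowed from the question's edit. It writes the exponent as 1.5 because under Python 2 (without from __future__ import division) the expression 3/2 evaluates to 1:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

H0 = 71.0  # units km/s/Mpc, as in the question
OMEGAS = [1.01, 1.1, 1.5, 2.0, 3.0]

def closed_model_curve(theta, omega0, h0=H0):
    """Return (tc, Rc) for one omega0 over the parameter values in theta."""
    rc = 0.5 * (omega0 / (omega0 - 1.0)) * (1.0 - np.cos(theta))
    tc = (0.5 / h0) * (omega0 / (omega0 - 1.0) ** 1.5) * (theta - np.sin(theta))
    return tc, rc

theta = np.linspace(0.0, 50.0, 1000)

fig, ax = plt.subplots()
for omega0 in OMEGAS:
    tc, rc = closed_model_curve(theta, omega0)
    ax.plot(tc, rc, label="omega0 = %g" % omega0)

ax.set_xlim(0.0, 0.55)  # fixes the visible range of the horizontal axis
ax.set_xlabel("tc")
ax.set_ylabel("Rc")
ax.legend()
fig.savefig("omega_curves.png")
```

The legend makes it possible to tell the five curves apart, and set_xlim answers the question about fixing the range of the horizontal axis.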
