I'm writing this question because I'm not sure about the use of "structured arrays". I built a matrix from keyboard input with different types (integer, float, etc.) using a "dtype". Then I want to find repeated elements in the columns "p" and "q"; once I have those elements, I want to sum the corresponding elements from column "z". Thanks. This is my Python code:
from numpy import *
from math import *
from cmath import *
from numpy.linalg import *
number_lines_=raw_input("Lines:")
numero_lines=int(number_lines_)
ceros=zeros((numero_lines,1))
dtype=([('p',int),('q',int),('r',float),('x',float),('b',complex),('z',complex),('y',complex)])
#print dtype
leng=len(dtype)
#print leng
yinfo=array(ceros,dtype)
#print shape(yinfo)
if numero_lines>0:
    for i in range(numero_lines):
        p_=raw_input("P: ")
        p=int(p_)
        if p>0:
            yinfo['p'][i]=p
            #print yinfo
        q_=raw_input("Q: ")
        q=int(q_)
        if q>0 and q!=p:
            yinfo['q'][i]=q
        r_=raw_input("R: ")
        r=float(r_)
        yinfo['r'][i]=r
        x_=raw_input("X: ")
        x=float(x_)
        yinfo['x'][i]=x
        b_=raw_input("b:")
        b=complex(b_)
        yinfo['b'][i]=complex(0,b)
        yinfo['z'][i]=complex(yinfo['r'][i],yinfo['x'][i])
        yinfo['y'][i]=1./(yinfo['z'][i])
        # print "\n"
print yinfo
print type(yinfo)
print yinfo.shape
Let me suggest some changes:
import numpy as np # not *
....
numero_lines=int(number_lines_)
...
# don't use numpy function names as variable names
# even the np import
dt=np.dtype([('p',int),('q',int),('r',float),('x',float),('b',complex),('z',complex),('y',complex)])
...
yinfo=np.zeros((numero_lines,),dtype=dt)
# make a zero filled array directly
# also make it 1d, I don't think 2nd dimension helps you
# if numero_lines>0: omit this test; range(0) is empty
# link each of the fields to a variable name
# changing a value of yinfo_p will change a value in yinfo
yinfo_p=yinfo['p']
yinfo_q=yinfo['q']
# etc
for i in range(numero_lines):
    p_=raw_input("P: ")
    p=int(p_)
    if p>0:
        yinfo_p[i]=p
        #print yinfo
    q_=raw_input("Q: ")
    q=int(q_)
    if q>0 and q!=p:
        yinfo_q[i]=q
    r_=raw_input("R: ")
    r=float(r_)
    yinfo_r[i]=r
    x_=raw_input("X: ")
    x=float(x_)
    yinfo_x[i]=x
    b_=raw_input("b:")
    b=complex(b_)
    yinfo_b[i]=complex(0,b)
# don't need to fill in these values row by row
# perform array operations after you are done with the input loop
yinfo_z[:] = yinfo_r + 1j*yinfo_x
yinfo_y[:] = 1./yinfo_z
Alternatively I could have defined
yinfo_p = np.zeros((numero_lines,), dtype=int)
yinfo_q = ...
After filling in all the yinfo_* values I could assemble them into a structured array - if that's what I need for other uses.
yinfo['p'] = yinfo_p
etc.
This isn't a very good use of structured arrays; as I tried to show with the yinfo_* variables, you are using each field as though it were a separate 1d array.
I haven't actually run these changes, so there might be some errors, but hopefully they will give you ideas that can improve your code.
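As for the original goal of summing z over repeated (p, q) pairs, one possible sketch uses np.unique to group the rows; the sample values below are made up for illustration:

```python
import numpy as np

# hypothetical sample: the first two rows share the pair (p, q) = (1, 2)
dt = np.dtype([('p', int), ('q', int), ('z', complex)])
yinfo = np.array([(1, 2, 1 + 1j), (1, 2, 2 + 0j), (3, 4, 5j)], dtype=dt)

# stack the (p, q) columns and find the unique pairs;
# `inverse` maps each row to the index of its unique pair
pq = np.stack([yinfo['p'], yinfo['q']], axis=1)
pairs, inverse = np.unique(pq, axis=0, return_inverse=True)
inverse = inverse.ravel()  # shape varies slightly across numpy versions

# sum z within each (p, q) group
z_sums = np.zeros(len(pairs), dtype=complex)
np.add.at(z_sums, inverse, yinfo['z'])
# pairs -> [[1 2], [3 4]], z_sums -> [3.+1.j, 0.+5.j]
```

`np.add.at` is the unbuffered form of `+=`, so repeated indices in `inverse` accumulate correctly.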
I need to convert all zero values of U2 (displacement in the Y direction) to very small but non-zero values, so that another output can later be divided by U2 without a division-by-zero issue.
Here's my attempt to do this:
from abaqusConstants import *
from odbAccess import *
# ***********************************************
odbPath="path_to_odb_file"
stepName="Step-1"
frameNumber=-1 #last frame in the stepName
sourceOutputFieldName='U' #displacement field
newOutputFieldName='U2_no_zeros'
# ************************************************
odb=session.openOdb(name=odbPath,readOnly=FALSE)
step=odb.steps[stepName]
frame=step.frames[frameNumber]
AllInstances=(odb.rootAssembly.instances.keys())
MyInstance=(AllInstances[-1])
instance1=odb.rootAssembly.instances[MyInstance]
sourceField=frame.fieldOutputs[sourceOutputFieldName]
subField=sourceField.getScalarField(componentLabel="U2")
Values=subField.bulkDataBlocks[0].data
NodeLabels=subField.bulkDataBlocks[0].nodeLabels
for value in Values:
    if value==0:
        value=1e-9
newField=frame.FieldOutput(name=newOutputFieldName, type=SCALAR, description="field")
newField.addData(position=NODAL, instance=instance1, labels=NodeLabels, data=Values)
odb.save()
odb.close()
The script runs without errors and the "U2_no_zeros" field is created, but it contains the same values as the original U2 field, so the loop has no effect. In fact, this loop is just a loose idea; I don't know exactly how it should be implemented. I was expecting some errors that would lead me to the right solution, but for some reason the script runs with no error messages.
You are not changing the data inside the Values variable: rebinding the loop variable `value` does not write back into the array. Also, use numpy's concatenate to shape the data correctly, and note that the addData method expects the data in tuple format.
import numpy
Values = numpy.concatenate(subField.bulkDataBlocks[0].data)
# use concatenate to --> [[..],[..],...] to [......]
NodeLabels=subField.bulkDataBlocks[0].nodeLabels
for i,value in enumerate(Values):
    if value==0:
        Values[i] = 1e-9
newField=frame.FieldOutput(name=newOutputFieldName, type=SCALAR, description="field")
newField.addData(position=NODAL, instance=instance1, labels=NodeLabels, data=tuple(Values))
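On the numpy side, the element-wise loop can also be replaced by a boolean mask; sketched here on a plain array standing in for the Values from bulkDataBlocks:

```python
import numpy as np

# stand-in for the flattened Values array from the ODB
values = np.array([0.0, 1.5, 0.0, -2.0])

# replace exact zeros in place with a tiny non-zero value
values[values == 0] = 1e-9
# values -> [1.e-09, 1.5, 1.e-09, -2.0]
```

Because the mask assignment mutates the array in place, the modified values are what get passed on to addData.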
I need to solve for lists (there are two values for each variable, zipped in the proper order) of exmax, eymax, exymax and Nxmax, given that each of these variables is some combination of the others.
The issue is that the results come back as a 'FiniteSet', and it won't let me iterate properly as a result.
import math
import numpy as np
from astropy.table import QTable, Table, Column
from collections import Counter
import operator
from sympy import *
exmax= symbols('exmax')
eymax= symbols('eymax')
exymax= symbols('exymax')
Nxmax=symbols('Nxmax')
Stiffnessofplies=[1,1] #This isn't the actual value, but it is important to have a len of two #here for later on
Nxmax=[78.4613527541947*exmax + 8.06201746514537e-15*exymax + 4.07395485454472*eymax,
69.4081197440953*exmax + 1.35798495151491*eymax]
exmax= [{(-1.0275144618526e-16*exymax - 0.0519230769230769*eymax,)},
{(-0.0195652173913043*eymax,)}]
eymax = [{(-0.0284210526315789*exmax + 8.11515424209734e-19*exymax,)},
{(-0.299999999999999*exmax,)}]
exymax = [{(-7.78938885521292e-17*exmax + 1.12391245013323e-18*eymax,)}, {(0,)}]
exmax2=[]
for i in exmax:
    for j in i:
        exmax2.append(j)
eymax2=[]
for i in eymax:
    for j in i:
        eymax2.append(j)
exymax2=[]
for i in exymax:
    for j in i:
        exymax2.append(j)
I did these last three loops to try to flatten everything out and make it iterable. Here are other things I have tried:
#Pleasework=[]
#for i in range(0,len(Stiffnessofplies)):
# linsolve([exmax2[i]], [eymax2[i]], [exymax2[i]], [Nxmax[i]], (exmax, eymax, exymax,Nxmax))
#System= exmax2[0],eymax2[0],exymax2[0]
#linsolve(System, exmax,eymax,exymax,Nxmax)
#Masterlist=list(zip(exmax,eymax,exymax,Nxmax))
I think one of my main issues is the 'FiniteSet' type I'm getting back: it really doesn't work well when trying to iterate over both values in each list.
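Not a full solution, but on the FiniteSet point specifically: linsolve returns a FiniteSet of solution tuples, and that set is directly iterable and unpackable, so there is no need to wrap results in lists of sets. A minimal sketch with a made-up 2x2 system:

```python
from sympy import symbols, linsolve

x, y = symbols('x y')

# linsolve returns a FiniteSet containing one solution tuple
sol = linsolve([x + y - 3, x - y - 1], (x, y))

# a FiniteSet is iterable, so the single solution tuple can be unpacked
(x_val, y_val), = sol
# x_val -> 2, y_val -> 1
```

The same pattern (`for tup in sol:` or unpacking) works whenever the system has a finite solution set.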
I want to create a numpy array by parsing a .txt file. The .txt file consists of features of iris flowers separated by commas. Every line has one flower example with 5 values separated by 4 commas: the first 4 values are features and the last one is the name. I parse the .txt in a loop and want to append (probably using numpy.append) every line's parsed data into a numpy array called feature_table.
Here's the code:
import numpy as np
iris_data = open("iris_data.txt", "r")
for line in iris_data:
    currentline = line.split(",")
    #iris_data_parsed = (currentline[0] + " , " + currentline[3] + " , " + currentline[4])
    #sepal_length = numpy.array(currentline[0])
    #petal_width = numpy.array(currentline[3])
    #iris_names = numpy.array(currentline[4])
    feature_table = np.array([currentline[0]],[currentline[3]],[currentline[4]])
    print (feature_table)
    print(feature_table.shape)
So I want to create a numpy array using only the first, fourth and fifth values in every line, but I can't make it work as I want. I tried reading the numpy docs but couldn't understand them.
While the people in the comments are right that you are not persisting your data anywhere, your problem, I assume, is the incorrect np.array construction. You should enclose all of the arguments in a single list, like this:
feature_table = np.array([currentline[0],currentline[3],currentline[4]])
And get rid of the redundant [ and ] around the individual arguments.
See the official documentation for more examples. Basically, all of the input data needs to be grouped into a single argument; Python will treat the extra arguments as different positional arguments.
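To actually keep every line (the question's loop overwrites feature_table on each iteration), a common pattern is to accumulate rows in a plain list and convert once after the loop. A sketch with inlined sample lines standing in for the file:

```python
import numpy as np

# stand-ins for lines read from iris_data.txt
lines = ["5.1,3.5,1.4,0.2,Iris-setosa",
         "4.9,3.0,1.4,0.2,Iris-versicolor"]

rows = []
for line in lines:
    currentline = line.strip().split(",")
    # keep only the first, fourth and fifth fields
    rows.append([currentline[0], currentline[3], currentline[4]])

feature_table = np.array(rows)  # one string array, shape (2, 3)
```

Growing a list and converting once is much cheaper than calling numpy.append per line, which copies the whole array every time.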
I am creating a sparse matrix file, by extracting the features from an input file. The input file contains in each row, one film id, and then followed by some feature IDs and that features score.
6729792 4:0.15568 8:0.198796 9:0.279261 13:0.17829 24:0.379707
The first number is the ID of the film; the value to the left of each colon is a feature ID and the value to the right is that feature's score.
Each line represents one film, and the number of feature:score pairs vary from one film to another.
Here is how I construct my sparse matrix:
import sys
import os
import os.path
import time
import json
import numpy as np
import tables as tb  # PyTables, used for the HDF5 carray below
from Film import Film
import scipy
from scipy.sparse import coo_matrix, csr_matrix, rand

def sparseCreate(self, Debug):
    a = rand(self.total_rows, self.total_columns, format='csr')
    l, m = a.shape[0], a.shape[1]
    f = tb.open_file("sparseFile.h5", 'w')
    filters = tb.Filters(complevel=5, complib='blosc')
    data_matrix = f.create_carray(f.root, 'data', tb.Float32Atom(), shape=(l, m), filters=filters)
    index_film = 0
    input_data = open('input_file.txt', 'r')
    for line in input_data:
        my_line = np.array(line.split())
        id_film = my_line[0]
        my_line = np.core.defchararray.split(my_line[1:], ":")
        self.data_matrix_search_normal[str(id_film)] = index_film
        self.data_matrix_search_reverse[index_film] = str(id_film)
        for element in my_line:
            if int(element[0]) in self.selected_features:
                column = self.index_selected_feature[str(element[0])]
                data_matrix[index_film, column] = float(element[1])
        index_film += 1
    self.selected_matrix = data_matrix
    json.dump(self.data_matrix_search_reverse,
              open(os.path.join(self.output_path, "data_matrix_search_reverse.json"), 'wb'),
              sort_keys=True, indent=4)
    my_films = Film(
        self.selected_matrix, self.data_matrix_search_reverse, self.path_doc, self.output_path)
    x_matrix_unique = self.selected_matrix[:, :]
    r_matrix_unique = np.asarray(x_matrix_unique)
    f.close()
    return my_films
Question:
I feel that this function is too slow on big datasets; it takes too long to run.
How can I improve and accelerate it? Maybe using MapReduce? What in this function makes it so slow?
IO + conversions (from str, to str, even twice to str of the same variable, etc.) + splits + explicit loops. By the way, there is the csv Python module, which can be used to parse your input file; you can experiment with it (I suppose you use a space as the delimiter). Also, I see you convert element[0] to int/str repeatedly, which is bad: you create many temporary objects. If you call this function several times, you may try to reuse some internal objects (arrays?). You can also try to implement it in another style, with map or a list comprehension, but experiments are needed.
The general idea of Python code optimization is to avoid explicit Python byte-code execution and to prefer native/C Python functions (for anything). And try to eliminate as many conversions as possible. Also, if the input file is yours, you can format it with fixed-length fields; that lets you avoid splitting/parsing entirely (only string indexing).
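For illustration, the split/parse step described above could collect COO triplets first and hand them to scipy in one call, instead of writing cell by cell; the film IDs and feature columns here are made up:

```python
from scipy.sparse import coo_matrix

# stand-ins for lines of the input file: "film_id feat:score ..."
lines = ["6729792 4:0.15568 8:0.198796",
         "6729793 2:0.5 4:0.25"]

rows, cols, vals, film_ids = [], [], [], []
for i, line in enumerate(lines):
    parts = line.split()
    film_ids.append(parts[0])          # film ID is the first token
    for token in parts[1:]:
        feat, score = token.split(":")
        rows.append(i)
        cols.append(int(feat))
        vals.append(float(score))

# build the whole sparse matrix in one call (10 feature columns assumed)
m = coo_matrix((vals, (rows, cols)), shape=(len(lines), 10))
```

Batch construction avoids the per-cell assignment into the HDF5 carray, which is typically the slowest part of the original loop.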
I am using Python 2.7. I tried to store 2D arrays in a file, but it stored only the most recent value. Suppose I enter values for 3 arrays of 4 rows and 2 columns; then it just stores the single most recent value that I entered for the last array. I used numpy to take the array input. I tried this code:
import numpy as np
from math import *
def main ():
    i_p = input("\n Enter number of input patterns:")
    out = raw_input("\n Enter number of output nodes:")
    hidden = raw_input("\n Enter number of hidden layers:")
    print 'Patterns:'
    for p in range(0,i_p):
        print "z[%d]"%p
        rows=input("Enter no of rows:")
        cols=input("Enter no of columns:")
        ff=open('array.txt','w')
        for r in range(0,rows):
            for c in range(0,cols):
                z=np.matrix(input())
                ff.write(z)
                np.savetxt('array.txt',z)

if __name__=="__main__":
    main()
Your
np.savetxt('array.txt',z)
opens the file for a fresh write; thus it destroys anything written to that file before.
Try:
ff=open('array.txt','w')
for i in range(3):
    z = np.ones((3,5))*i
    np.savetxt(ff,z)
This should write 9 lines, with 5 columns each.
I was going to adapt your:
for r in range(0,rows):
    for c in range(0,cols):
        z=np.matrix(input())
        np.savetxt...
But that doesn't make sense. You don't write by 'column' with savetxt.
Go to a Python interpreter, make a simple array (not np.matrix), and save it. Make several arrays and save those. Look at what you saved.
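The point about passing an open file handle can be checked without touching disk, e.g. with an in-memory buffer (Python 3 shown here):

```python
import io

import numpy as np

buf = io.StringIO()
for i in range(3):
    # each call appends to the same open handle instead of truncating
    np.savetxt(buf, np.ones((3, 5)) * i)

buf.seek(0)
data = np.loadtxt(buf)
# data.shape -> (9, 5); first 3 rows are 0s, next 3 are 1s, last 3 are 2s
```

Passing the filename string instead would reopen (and truncate) the file on every call, which is exactly the bug in the original code.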