VTK formatted output while each point lies at a new line - python

I want to generate a .vtk file containing N POINTS and M POLYGONS.
I currently write the file as follows, where polymesh is the vtk.vtkPolyData() object holding the points and polygons:
writer = vtk.vtkPolyDataWriter()
writer.SetFileTypeToASCII()
writer.SetInputData(polymesh)
writer.SetFileName(filename)
writer.Write()
My concern is the layout of the output, which looks like this:
...
POINTS N doubles
X0 Y0 Z0 X1 Y1 Z1 X2 Y2 Z2
X3 Y3 Z3 ...
...
POLYGONS
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Is there a way to put each point on its own line, and each polygon on its own line as well? For example, the expected output would be:
...
POINTS N doubles
X0 Y0 Z0
X1 Y1 Z1
X2 Y2 Z2
X3 Y3 Z3
...
POLYGONS
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16

There is no such option on the writer class.
I see only two solutions:
create your own writer
post process your file
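
As a rough illustration of the "create your own writer" route, here is a minimal sketch that writes the legacy ASCII PolyData layout by hand, one point and one polygon per line. It assumes the coordinates and the polygon connectivity have already been pulled out of polymesh into plain Python sequences (for the points, for example, with polymesh.GetNumberOfPoints() and polymesh.GetPoint(i)). The function name write_polydata_ascii and its parameters are hypothetical, not part of VTK.

```python
from pathlib import Path


def write_polydata_ascii(path, points, polygons, title="polydata"):
    """Write a legacy ASCII VTK PolyData file, one point and one polygon per line.

    Hypothetical helper: `points` is a sequence of (x, y, z) triples and
    `polygons` a sequence of vertex-index tuples.
    """
    lines = [
        "# vtk DataFile Version 3.0",
        title,
        "ASCII",
        "DATASET POLYDATA",
        f"POINTS {len(points)} double",
    ]
    # One "x y z" triple per line.
    lines += [" ".join(repr(float(c)) for c in p) for p in points]
    # The second number is the total count of integers in the section:
    # each polygon contributes its vertex count plus its vertex ids.
    size = sum(len(poly) + 1 for poly in polygons)
    lines.append(f"POLYGONS {len(polygons)} {size}")
    lines += [" ".join(str(v) for v in (len(poly), *poly)) for poly in polygons]
    Path(path).write_text("\n".join(lines) + "\n")
```

Calling write_polydata_ascii("quad.vtk", [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], [(0, 1, 2, 3)]) would produce a file with four point lines followed by a single polygon line. Point or cell attribute data is not handled by this sketch.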

How to iteratively add a column to a dataframe X based on the values of a separated dataframe y with Pandas?

I am struggling with this problem.
These are my initial matrices:
columnsx = {'X1':[6,11,17,3,12],'X2':[1,2,10,24,18],'X3':[8,14,9,15,7], 'X4':[22,4,20,16,5],'X5':[19,21,13,23,25]}
columnsy = {'y1':[0,1,1,2,0],'y2':[1,0,0,2,1]}
X = pd.DataFrame(columnsx)
y = pd.DataFrame(columnsy)
This is the result I am trying to obtain. It adds a column to X (called X_i) holding the name of the y column whose value is > 0. In other words, it keeps only the positive values of y (y > 0) and returns a single vector y with cardinality 2.
columnsx = {'X1':[11,17,3,6,3,12],'X2':[2,10,24,1,24,18],'X3':[14,9,15,8,15,7],
'X4':[4,20,16,22,16,5],'X5':[21,13,23,19,23,25], 'X_i':['y1','y1','y1','y2','y2','y2']}
columnsy = {'y':[1,1,2,1,2,1]}
X = pd.DataFrame(columnsx)
y = pd.DataFrame(columnsy)
Join X and y into a single frame, then use DataFrame.melt:
df = X.join(y)
new_df = (df.melt(df.columns[df.columns.str.contains('X')],
                  var_name='X_y', value_name='y')
            .loc[lambda df: df['y'].gt(0)])
print(new_df)
Output
   X1  X2  X3  X4  X5 X_y  y
1  11   2  14   4  21  y1  1
2  17  10   9  20  13  y1  1
3   3  24  15  16  23  y1  2
5   6   1   8  22  19  y2  1
8   3  24  15  16  23  y2  2
9  12  18   7   5  25  y2  1

How to code the string data in a column so that I could apply machine learning techniques for classification, for example k-means?

I have string variables (Range[VarName]) in a column, each with a respective ID (Range[kksId]). I need to create an algorithm that assigns new variables to an existing ID or, if that is not possible, puts them in a separate N/A class.
How to code the string data in a column so that I could apply machine learning techniques for classification, for example k-means?
Generally, since your variable "Range[kksId]" is your target class, you map each of these strings to a unique integer. Here's an example of how that could be achieved in Python:
import pandas as pd

def _categoricalToNumeric(dataset):
    # Maps each distinct value to the integer id it was first assigned.
    categoric_id_mapping = {}
    curr_id_to_assign = 0
    for row in dataset.index:
        categorical_value = dataset.loc[row]
        if categorical_value in categoric_id_mapping:
            # Value seen before: reuse its id.
            dataset.loc[row] = categoric_id_mapping[categorical_value]
        else:
            # New value: give it the next free id.
            categoric_id_mapping[categorical_value] = curr_id_to_assign
            dataset.loc[row] = curr_id_to_assign
            curr_id_to_assign += 1
    return dataset

df = pd.read_excel('DataModel.xlsx', index_col=0)
# Work on a copy so the original column is not modified while iterating.
df['Range[kksId]'] = _categoricalToNumeric(df['Range[kksId]'].copy())
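
If a hand-written loop is not required, pandas already offers pd.factorize, which performs the same kind of string-to-integer mapping. The sketch below uses made-up ID strings (the "KKS_*" values are placeholders, not taken from the real data) purely to show the shape of the result.

```python
import pandas as pd

# Hypothetical ID values standing in for the Range[kksId] column.
ids = pd.Series(["KKS_A", "KKS_B", "KKS_A", "KKS_C", "KKS_B"])

# codes holds one integer per row; uniques holds the original labels,
# so uniques[codes[i]] recovers the label of row i.
codes, uniques = pd.factorize(ids)

print(codes.tolist())  # [0, 1, 0, 2, 1]
print(list(uniques))   # ['KKS_A', 'KKS_B', 'KKS_C']
```

Keeping uniques around makes it possible to translate predicted class numbers back to the original IDs later.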
Then, as for the string feature, in a simple classifier each character is generally mapped to its own variable. Example:
R_r_DegPit1_In_St
R_r_DegPit1_In
becomes:
x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16
R  _  r  _  D  e  g  P  i  t  1   _   I   n   _   S   t
R  _  r  _  D  e  g  P  i  t  1   _   I   n   \0  \0  \0
Since you will have as many variables as the longest string in your dataset, strings that do not occupy all variables should have the remaining variables filled with a value indicating the empty character. You should also convert the character values to numbers; when doing so, use one shared numbering across all columns rather than restarting the count for each column. The result could be something like this:
x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16
3  1  4  1  5  10 11 6  12 13 2   1   7   14  1   8   9
3  1  4  1  5  10 11 6  12 13 2   1   7   14  0   0   0
Keep in mind that more advanced ML/DL techniques handle strings in different ways.
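
The character-level encoding described above could be sketched as follows. The helper name encode_strings is hypothetical, and the integers it assigns simply follow the order in which characters are first seen, so they will not match the illustrative numbers in the table; what matters is that the same character gets the same code in every column and that short strings are padded with 0.

```python
def encode_strings(strings, pad_code=0):
    """Encode strings as equal-length integer vectors using one shared vocabulary.

    Hypothetical helper: character codes start at 1 so that pad_code=0
    stays reserved for the "empty character".
    """
    vocab = {}
    width = max(len(s) for s in strings)
    rows = []
    for s in strings:
        # setdefault returns the existing code, or assigns the next free one.
        codes = [vocab.setdefault(ch, len(vocab) + 1) for ch in s]
        rows.append(codes + [pad_code] * (width - len(s)))
    return rows, vocab


rows, vocab = encode_strings(["R_r_DegPit1_In_St", "R_r_DegPit1_In"])
for row in rows:
    print(row)
```

The returned vocab should be kept and reused when encoding new strings, otherwise the same character could end up with a different code at prediction time.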

plot line between points pandas

I would like to plot lines between two points and my points are defined in different columns.
#coordinates of the points
#point1(A[0],B[0])
#point2(C[0],D[0])
#line between point1 and point 2
#next line would be
#point3(A[1],B[1])
#point4(C[1],D[1])
#line between point3 and point 4
plot_result:
A B C D E F
0 0 4 7 1 5 1
1 2 5 8 3 3 1
2 3 4 9 5 6 1
3 4 5 4 7 9 4
4 6 5 2 1 2 7
5 1 4 3 0 4 7
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
for i in range(0, len(plot_result.A), 1):
    plt.plot(plot_result.A[i]:plot_result.B[i], plot_result.C[i]:plot_result.D[i], 'ro-')
plt.show()
but it is invalid syntax. I have no idea how to implement this.
The first two parameters of the plot method are x and y, which can be single values or array-like objects. To plot a line from the point (x1, y1) to the point (x2, y2), pass the two x values together and the two y values together, like this:
for row in plot_result.values:  # if plot_result is a DataFrame
    x1 = row[0]  # A[i]
    y1 = row[1]  # B[i]
    x2 = row[2]  # C[i]
    y2 = row[3]  # D[i]
    plt.plot([x1, x2], [y1, y2])  # plot one line for every row in the DataFrame
plt.show()
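
The same idea can also be written without an explicit loop, because plot draws one line per column when x and y are 2-D arrays. The following sketch rebuilds the A to D columns from the question and selects the non-interactive Agg backend only so that it runs without a display; that backend choice is an assumption, not something the question requires.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Columns A-D copied from the question's table.
plot_result = pd.DataFrame({
    "A": [0, 2, 3, 4, 6, 1],
    "B": [4, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": [1, 3, 5, 7, 1, 0],
})

# Shape (2, n): row 0 holds the start points, row 1 the end points.
x = plot_result[["A", "C"]].to_numpy().T
y = plot_result[["B", "D"]].to_numpy().T

fig, ax = plt.subplots()
lines = ax.plot(x, y, "ro-")  # one Line2D per DataFrame row
```

Each column of x and y describes one segment, so six rows in the DataFrame give six separate lines.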

How to manipulate data from file and sort python

I have a .dat file with this information inside (the real file has thousands of lines):
n a (au) k0 k1 P1 k2
1 3.156653 2 3 5 -18
2 3.152517 2 5 5 -23
3 3.154422 3 -18 5 29
4 3.151668 3 -16 5 24
5 3.158629 5 -19 5 21
6 3.156970 5 -17 5 16
7 3.155314 5 -15 5 11
8 3.153660 5 -13 5 6
9 3.152007 5 -11 5 1
10 3.150357 5 -9 5 -4
I load the data by:
import numpy as np
import matplotlib.pyplot as plt
from pylab import *
n = array([])
a = array([])
k0 = array([])
k1 = array([])
p1 = array([])
k2 = array([])
p2 = array([])
l = np.loadtxt('pascal.dat', skiprows=1, usecols=(0,1,2,3,4,5)).T
n=append(n,l[0])
a=append(a,l[1])
k0=append(k0,l[2])
p1=append(p1,l[3])
k1=append(k1,l[4])
p2=append(p2,l[5])
I want to use the values of the column "a (au)" to compute the distance of each element of the "n" column from a given center:
center = 3.15204
for i in range(len(n)):
    distance = abs(center - a[i])
Now I want to rewrite the .dat file taking the distance into account. I want to add a new column called "distance" and then sort all the rows by this new parameter, with the smallest value (closest to the center) first and so on.
Any suggestion?
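I suggest using the pandas library. Read the .dat file in as a DataFrame; it's a very powerful tool through which you can manipulate data, add columns, etc. Because the header "a (au)" contains a space, skip the header row and pass the column names explicitly:
import pandas as pd
names = ['n', 'a (au)', 'k0', 'k1', 'P1', 'k2']
df = pd.read_csv('../pascal.dat', sep=r'\s+', skiprows=1, names=names)
center = 3.15204  # center value from the question
df['distance'] = (center - df['a (au)']).abs()
df = df.sort_values('distance')  # closest to the center first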
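
For a complete round trip (read, add the distance column, sort, write back), a sketch along these lines may help. It reads the first four rows of the question's data from an in-memory string and writes to an in-memory buffer so that it is self-contained; with a real file, both io.StringIO objects would be replaced by file paths. The tab separator for the output is an arbitrary choice made here so that the "a (au)" header does not need quoting.

```python
import io
import pandas as pd

# First rows of the question's file, inlined so the sketch runs on its own.
sample = """n a (au) k0 k1 P1 k2
1 3.156653 2 3 5 -18
2 3.152517 2 5 5 -23
3 3.154422 3 -18 5 29
4 3.151668 3 -16 5 24
"""

names = ["n", "a (au)", "k0", "k1", "P1", "k2"]
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=1, names=names)

center = 3.15204
df["distance"] = (df["a (au)"] - center).abs()

# Smallest distance (closest to the center) first.
df = df.sort_values("distance").reset_index(drop=True)

out = io.StringIO()  # replace with an output path for a real file
df.to_csv(out, sep="\t", index=False, float_format="%.6f")
print(out.getvalue())
```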

xlrd data extraction python

I am working on data extraction using xlrd and I have extracted 8 columns of inputs for my project. Each column of data has around 100 rows. My code is as follows:
wb = xlrd.open_workbook('/Users/Documents/Sample_data/AI_sample.xlsx')
sh = wb.sheet_by_name('Sample')
x1 = sh.col_values( + 0)[1:]
x2 = sh.col_values( + 1)[1:]
x3 = sh.col_values( + 2)[1:]
x4 = sh.col_values( + 3)[1:]
x5 = sh.col_values( + 4)[1:]
x6 = sh.col_values( + 5)[1:]
x7 = sh.col_values( + 6)[1:]
x8 = sh.col_values( + 7)[1:]
Now I want to create an array of inputs in which each entry holds one row across the 8 columns.
For example, if these are my 8 columns of data:
x1 x2 x3 x4 x5 x6 x7 x8
1 2 3 4 5 6 7 8
7 8 6 5 2 4 8 8
9 5 6 4 5 1 7 5
7 5 6 3 1 4 5 6
I want something like: x1, x2, x3, x4, x5, x6, x7, x8 ([1,2,3,4,5,6,7,8]) for all the 100+ rows.
I could have done a row-wise extraction, but doing that by hand for 100+ rows is impractical. So how do I do that? I also understand that it could be done using np.array, but I do not know how.
You can also try openpyxl, which is similar to xlrd:
from openpyxl import load_workbook
book = load_workbook(filename=file_name)  # file_name: path to your .xlsx file
sheet = book['sheet name']
for row in sheet.rows:
    # Each row is a tuple of cells; .value gives the cell content.
    col_0 = row[0].value
    col_1 = row[1].value
I usually prefer openpyxl over xlrd.
I found this piece of code very useful:
import numpy as np
# Stack the 8 column lists, then transpose so each row of X holds one spreadsheet row.
X = np.array([x1, x2, x3, x4, x5, x6, x7, x8]).T
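
To make the column-to-row step concrete, here is a small worked sketch using the four sample rows from the question in place of the values read with xlrd. np.column_stack is used here as an equivalent way of placing each list in its own column; the literal lists are stand-ins for the real col_values output.

```python
import numpy as np

# Sample columns from the question (stand-ins for sh.col_values(i)[1:]).
x1 = [1, 7, 9, 7]
x2 = [2, 8, 5, 5]
x3 = [3, 6, 6, 6]
x4 = [4, 5, 4, 3]
x5 = [5, 2, 5, 1]
x6 = [6, 4, 1, 4]
x7 = [7, 8, 7, 5]
x8 = [8, 8, 5, 6]

# Each input list becomes one column, so each row of X is one spreadsheet row.
X = np.column_stack([x1, x2, x3, x4, x5, x6, x7, x8])

print(X.shape)  # (4, 8)
print(X[0])     # [1 2 3 4 5 6 7 8]
```

With the real data, X would have one row per spreadsheet row (100+) and 8 columns.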
