I have a dataset where each row is a datapoint of 'column' dimensions, and I want to process it and save each datapoint to its own .pnb file. So I need a snippet that creates a new file and saves the datapoint in it.
For now I have this, but I am having trouble with the file path. Should it be relative to where my notebook is, or an absolute path starting from C:?
y = full_dataset[1,:]
z = np.save('./data/folder1/2.pnb',y)
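A relative path like './data/folder1/2.pnb' is resolved against the current working directory, which for a Jupyter notebook is normally the folder the notebook was started from, not the drive root. Below is a minimal sketch of one way to save every row, assuming full_dataset is a 2-D NumPy array as in the snippet above (the folder layout and file naming are just illustrative):
import os
import numpy as np

out_dir = os.path.join('data', 'folder1')
os.makedirs(out_dir, exist_ok=True)        # create the folder if it doesn't exist yet

for i, row in enumerate(full_dataset):     # one file per datapoint (row)
    # np.save appends '.npy' to names that don't already end in it,
    # so '2.pnb' ends up on disk as '2.pnb.npy'
    np.save(os.path.join(out_dir, '%d.pnb' % i), row)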
Goal
Read the data component of a hdf5 file in R.
Problem
I am using rhdf5 to read HDF5 files in R. Out of 75 files, it successfully read 61, but it throws a memory error for the rest, even though some of these files are smaller than ones it has already read.
I have tried running individual files in a fresh R session, but get the same error.
Following is an example:
# Exploring the contents of the file:
library(rhdf5)
h5ls("music_0_math_0_simple_12_2022_08_08.hdf5")
group name otype dclass dim
0 / data H5I_GROUP
1 /data ACC_State H5I_DATASET INTEGER 1 x 1
2 /data ACC_State_Frames H5I_DATASET INTEGER 1
3 /data ACC_Voltage H5I_DATASET FLOAT 24792 x 1
4 /data AUX_CACC_Adjust_Gap H5I_DATASET INTEGER 24792 x 1
... (output continues)
# Reading the file:
rhdf5::h5read("music_0_math_0_simple_12_2022_08_08.hdf5", name = "data")
Error in H5Dread(h5dataset = h5dataset, h5spaceFile = h5spaceFile, h5spaceMem = h5spaceMem, :
Not enough memory to read data! Try to read a subset of data by specifying the index or count parameter.
In addition: Warning message:
In h5checktypeOrOpenLoc(file, readonly = TRUE, fapl = NULL, native = native) :
An open HDF5 file handle exists. If the file has changed on disk meanwhile, the function may not work properly. Run 'h5closeAll()' to close all open HDF5 object handles.
Error: Error in h5checktype(). H5Identifier not valid.
I can read the file via Python:
import h5py
filename = "music_0_math_0_simple_12_2022_08_08.hdf5"
hf = h5py.File(filename, "r")
hf.keys()
data = hf.get('data')
data['SCC_Follow_Info']
#<HDF5 dataset "SCC_Follow_Info": shape (9, 24792), type "<f4">
How can I successfully read the file in R?
When you ask to read the data group, rhdf5 will read all the underlying datasets into R's memory. It's not clear from your example exactly how much data this is, but maybe for some of your files it really is more than the available memory on your computer. I don't know how Python works under the hood, but perhaps it doesn't do any reading of datasets until you run data['SCC_Follow_Info']?
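For what it's worth, h5py does behave lazily here: opening the file and looking up a dataset only creates handles, and values are pulled from disk when the dataset object is sliced, for example:
import h5py

hf = h5py.File("music_0_math_0_simple_12_2022_08_08.hdf5", "r")
dset = hf["data/SCC_Follow_Info"]   # just a handle, nothing is read yet
chunk = dset[:, :100]               # only this 9 x 100 slice is read into memory
hf.close()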
One option is to be more selective: rather than reading the entire data group, try reading only the specific dataset you're interested in at that moment. In the Python example that seems to be /data/SCC_Follow_Info.
You can do that with something like:
follow_info <- h5read(file = "music_0_math_0_simple_12_2022_08_08.hdf5",
                      name = "/data/SCC_Follow_Info")
Once you've finished working with that dataset, remove it from your R session, e.g. rm(follow_info), and read the next dataset or file you need.
TLDR: How can I make a notebook cell save its own python code to a file so that I can reference it later?
I'm doing tons of small experiments where I make adjustments to Python code to change its behaviour, and then run various algorithms to produce results for my research. I want to save the cell code (the actual python code, not the output) into a new uniquely named file every time I run it so that I can easily keep track of which experiments I have already conducted. I found lots of answers on saving the output of a cell, but this is not what I need. Any ideas how to make a notebook cell save its own code to a file in Google Colab?
For example, I'm looking to save a file that contains the entire below snippet in text:
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
All cell code is stored in a list variable named In.
For example, you can print the latest cell with:
print(In[-1]) # show itself
# print(In[-1]) # show itself
So you can easily save the content of In[-1] or In[-2] to wherever you want.
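For example, a small helper along these lines (the function name, folder, and timestamped file name are just illustrative) writes the source of the currently running cell to a uniquely named file:
import os
import time

def save_cell_source(out_dir="experiments"):
    # In[-1] is the source of the cell currently being executed (In is provided by IPython/Colab)
    os.makedirs(out_dir, exist_ok=True)
    fname = os.path.join(out_dir, "cell_%s.py" % time.strftime("%Y%m%d_%H%M%S"))
    with open(fname, "w") as f:
        f.write(In[-1])
    return fname
Calling save_cell_source() at the top of an experiment cell saves that whole cell, since In[-1] refers to the cell being executed at that moment.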
Posting one potential solution but still looking for a better and cleaner option.
By defining the entire cell as a string, I can execute it and save it to a file with a separate command:
cell_str = '''
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
'''
exec(cell_str)
with open('cell.txt', 'w') as f:
f.write(cell_str)
I have a stack of CT-scan images. After processing one image from that stack in Matlab, I saved the XY coordinates of each distinct boundary region to a different Excel sheet, as follows:
I = imread('myCTscan.jpeg');
BW = im2bw(I);
[coords, labeledImg] = bwboundaries(BW, 4, 'holes');
sheet = 1;
for n = 1:length(coords)
    xlswrite('fig.xlsx', coords{n,1}, sheet, 'A1');
    sheet = sheet + 1;
end
The next step is then to import this set of coordinates and plot it into Abaqus CAE Sketch for finite element analysis.
I figure that my workflow is something like this:
Import Excel workbook
For each sheet in workbook:
2.1. For each row: read both columns to get the xy coordinates (each row has two columns, the x and the y coordinate)
2.2. Put each xy coordinates inside a list
2.3. From list, sketch using spline method
Repeat step 2 for other sheets within the workbook
I searched for a while and found something like this:
from abaqus import *
lines= open('fig.xlsx', 'r').readlines()
pointList= []
for line in lines:
    pointList.append(eval('(%s)' % line.strip()))
s1= mdb.models['Model-1'].ConstrainedSketch(name='mySketch', sheetSize=500.0)
s1.Spline(points= pointList)
But this only reads XY coordinates from one sheet, and I'm stuck at step 3 above. So my problem is: how can I read these coordinates from the different sheets using an Abaqus/Python (Abaqus 6.14, Python 2.7) script?
I'm new to Python programming; I can read and understand the syntax but can't write very well (I'm still struggling with how to import Python modules in Abaqus). Manually typing each coordinate (as in Abaqus' modelAExample.py tutorial) is practically impossible, since each of my CT-scan images can have 100+ boundary regions and 10k+ points.
I'm using:
Windows 7 x64
Abaqus 6.14 (with built in Python 2.7)
Excel 2013
Matlab 2016a with Image Processing Toolbox
You are attempting to read an Excel file as if it were a plain-text, comma-separated file. A CSV file by definition cannot have more than one sheet, and reading the file line by line treats it as plain text, so there is no way to iterate over the sheets in your workbook (it also raises the question of how the file opens at all, since you save an .xlsx and then read it as text).
There are numerous Python libraries that will parse and process XLS/XLSX files.
Take a look at openpyxl and use it to read your file in.
You would likely use something like
from openpyxl import load_workbook

wb = load_workbook('fig.xlsx')   # open the workbook written by Matlab
for name in wb.sheetnames:       # iterate over every sheet in the file
    ws = wb[name]                # worksheet object for the current sheet
and then input your remaining commands.
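Putting it together, a rough (untested) sketch of the whole workflow might look like this, assuming each sheet holds the two coordinate columns written by the Matlab loop above and that openpyxl is available to Abaqus' Python; the sketch names are just illustrative:
from abaqus import *
from openpyxl import load_workbook

wb = load_workbook('fig.xlsx')
model = mdb.models['Model-1']

for i, sheet_name in enumerate(wb.sheetnames):
    ws = wb[sheet_name]
    pointList = []
    for row in ws.iter_rows():                  # each row holds (x, y) in columns A and B
        x, y = row[0].value, row[1].value
        if x is None or y is None:              # skip empty trailing rows
            continue
        pointList.append((float(x), float(y)))
    s = model.ConstrainedSketch(name='boundary-%d' % (i + 1), sheetSize=500.0)
    s.Spline(points=pointList)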
I have a file with 44,586 lines of data. It is read in using pylab:
data = pl.loadtxt("20100101.txt")
density = data[:,0]
I need to run something like...
densities = np.random.normal(density, 30, 1)
np.savetxt('1.txt', np.vstack((densities.ravel())).T)
...and create a new file named 1.txt which has all 44,586 lines of my data randomised within the parameters I desire. Will my above commands be sufficient to read through and perform what I want on every line of data?
The more complicated part on top of this is that I want to run this 1,000 times and produce 1,000 .txt files (1.txt, 2.txt, ..., 1000.txt), each generated by the exact same command.
I get stuck when trying to run loops in scripts, as I am still very inexperienced. I am having trouble even beginning to get this running the way I want, and I am also confused about how to save the files under their different names. I have used np.savetxt in the past, but don't know how to make it perform this task.
Thanks for any help!
There are two minor issues: the first is how to build the output file names (which Python's string concatenation handles), and the second relates to np.random.normal, which does not accept size=1 when loc is an array; with an array loc you can simply omit size and get one draw per element.
import numpy as np
import pylab as pl

data = pl.loadtxt("20100101.txt")
density = data[:, 0]
for i in range(1, 1001):
    densities = np.random.normal(loc=density, scale=30)  # one draw per input line
    np.savetxt(str(i) + '.txt', densities)
I am working on a really big script right now in which I have a csv file that I have removed rows and columns from and edited the headers of. I need to create one big shapefile for the entire csv file, and then create individual shapefiles for the units under one of the headers. I thought the best way to do this would be to use arcpy.MakeXYEventLayer(); I saw in an ArcGIS sample script that you then use arcpy.GetCount() on the output of the XY event layer, then arcpy.SaveToLayerFile_management() and arcpy.FeatureClassToShapefile_conversion(), but when I run the script only my csv file gets edited and there is no layer in the output file. Is there a step I am missing, or should this already be making my shapefile?
These are the few lines of code I have used, after all of the csv file editing, to do what is described above:
outLyr = sys.argv[3] # shapefile layer output name
XYLyr.newLyr(csvOut, lyrOutFile, spRef, sys.argv[4], sys.argv[5]) # x coordinate column; y coordinate column
print arcpy.GetCount_management(lyrOutFile)
csv2LYR.saveLYR(lyrOutFile, curDir)
arcpy.SaveToLayerFile_management does not save data to a shapefile or any other kind of feature class. It only creates a .lyr file, which points to a data source and renders it with saved symbology, etc. You can use arcpy.FeatureClassToShapefile_conversion to create the shapefile from the in-memory feature layer created with arcpy.MakeXYEventLayer; see the ArcGIS help for that tool.
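A minimal sketch of that sequence (the paths, field names, and spatial reference below are placeholders, not taken from your script):
import arcpy

csv_file = r"C:\temp\points.csv"        # placeholder path to the edited CSV
out_folder = r"C:\temp\shapefiles"      # placeholder output folder
sr = arcpy.SpatialReference(4326)       # placeholder spatial reference (WGS 84)

# Build the in-memory XY event layer from the CSV's coordinate columns ...
arcpy.MakeXYEventLayer_management(csv_file, "X", "Y", "points_lyr", sr)

# ... then write that layer out as an actual shapefile on disk.
arcpy.FeatureClassToShapefile_conversion("points_lyr", out_folder)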