Assign multiple datasets as one variable [closed] - python

I am extracting multiple datasets into one CSV file.
data = Dataset(r'C:/path/2011.daily_rain.nc', 'r')
I successfully assigned one dataset, but I still have ten more to work with in the same way. Are there any methods or functions that would allow me to assign or combine multiple datasets as one variable?

From what you've described, it sounds like you want to perform the same task on each set of data. If that is the case, consider storing your dataset paths in a list and then using a for ... in loop to iterate through each path.
Consider the following sample code:
from netCDF4 import Dataset  # same Dataset class used in the question

dataset_paths = [
    "C:/path/some_data_file-0.nc",
    "C:/path/some_data_file-1.nc",
    "C:/path/some_data_file-2.nc",
    "C:/path/some_data_file-3.nc",
    # ... and the rest of your dataset file paths
]

for path in dataset_paths:
    data = Dataset(path, 'r')
    # Code that uses the data here
Everything in the body of the for ... in loop will run once for each path in the dataset_paths list, letting you work with each dataset in the same way.
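If the goal is literally to combine all of the files into a single object rather than just looping over them, a library such as xarray can open several netCDF files at once. A minimal sketch, assuming the files share the same dimensions and that xarray is installed (the combine strategy and paths are placeholders for your data):

import xarray as xr

# Reusing the dataset_paths list defined above; open_mfdataset lazily
# concatenates the files into one Dataset keyed by their coordinates.
combined = xr.open_mfdataset(dataset_paths, combine="by_coords")
print(combined)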

Related

How to convert all images in one folder to numpy files? [closed]

I need to do semantic image segmentation based on U-Net.
I have to work with the Pascal VOC 2012 dataset, but I don't know how to proceed: do I manually select images for train & val and convert them into NumPy arrays before loading them into the model, or is there another way?
If it is the former, I would like to know how to convert all the images in a folder into .npy files.
If I understood correctly, you just need to go through all the files in the folder and add each one to a NumPy array?

from os import listdir
from os.path import isfile, join

numpyArrays = [yourfunc(file_name) for file_name in listdir(mypath) if isfile(join(mypath, file_name))]

Here yourfunc is the function you need to write to convert one file from the dataset format into a NumPy array.
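If the goal is a single .npy file containing every image in the folder, a minimal sketch using Pillow and NumPy might look like this (the folder path and the 256x256 target size are assumptions; all images are resized so they stack into one array):

import os
import numpy as np
from PIL import Image

folder = "path/to/images"  # hypothetical folder path
arrays = []
for file_name in sorted(os.listdir(folder)):
    full_path = os.path.join(folder, file_name)
    if os.path.isfile(full_path):
        img = Image.open(full_path).convert("RGB")
        img = img.resize((256, 256))  # resize so all arrays have the same shape
        arrays.append(np.array(img))

# Save one .npy file with shape (N, 256, 256, 3)
np.save("images.npy", np.stack(arrays))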

How do I load a 14,000 image data-set into a variable without running out of memory? [closed]

I'm trying to write a function to load a large image dataset of 14,000 images into a variable, but I'm running into memory (RAM) issues.
What I'm trying to make is something like the cifar100.load_data function, but it's not working out for me.
The function I defined looks like this:
import os
import cv2
import numpy as np

def load_data():
    trn_x_names = os.listdir('data/train_x')
    trn_y_names = os.listdir('data/train_y')
    trn_x_list = []
    trn_y_list = []
    for image in trn_x_names:
        img = cv2.imread('data/train_x/%s' % image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        trn_x_list.append(img)
    for image in trn_y_names:
        img = cv2.imread('data/train_y/%s' % image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        trn_y_list.append(img)
    x_train = np.array(trn_x_list)
    y_train = np.array(trn_y_list)
    return x_train, y_train
I first load all the images one by one, appending them to the corresponding lists, and at the end convert those lists to NumPy arrays, assign them to variables, and return them. But along the way I run into RAM issues, as it consumes 100% of my RAM.
You need to read in your images in batches rather than loading the entire dataset into memory. If you are using TensorFlow, use ImageDataGenerator.flow_from_directory. Documentation is here. If your data is not organized into sub-directories, then you will need to create a Python generator that reads the data in batches; you can see how to build such a generator here. Set the batch size to a value, say 30, that will not fill up your memory.
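A minimal sketch of the flow_from_directory approach mentioned above, assuming a TensorFlow/Keras setup and a data/train directory with one sub-folder per class (the target size and batch size are placeholder values):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)
train_batches = datagen.flow_from_directory(
    "data/train",            # hypothetical directory with one sub-folder per class
    target_size=(224, 224),
    batch_size=30,           # small enough to stay within memory
    class_mode="categorical",
)

# Each iteration yields one batch of (images, labels) instead of the whole set.
x_batch, y_batch = next(train_batches)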

Handling large binary files in Python [closed]

I have a binary file (>1 GB in size) which contains single-precision data, created in MATLAB.
I am new to Python and would like to read the same file structure in Python. Any help would be much appreciated.
From MATLAB, I can load the file as follows:
fid = fopen('file.dat','r');
my_data = fread(fid,[117276,1794],'single');
Using numpy is easiest, with fromfile (https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html):

import numpy as np
my_data = np.fromfile('file.dat', dtype=np.dtype('single')).reshape((117276, 1794))

where np.dtype('single') is the same as np.dtype('float32').
Note that the result may be transposed from what you want, since MATLAB reads in column order while numpy reshapes in row order by default (you can pass order='F' to reshape to match MATLAB).
Also, I'm assuming that using numpy is OK since you are coming from MATLAB and will probably end up using it if you want to keep MATLAB-like functions rather than dealing with pure Python as in these answers: Reading binary file and looping over each byte
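Since the file is larger than 1 GB, another option worth mentioning is memory-mapping it so slices are read from disk on demand instead of loading everything at once. A minimal sketch, assuming the same shape and dtype as the MATLAB fread call:

import numpy as np

# order='F' matches MATLAB's column-major layout, so indexing agrees with MATLAB.
my_data = np.memmap('file.dat', dtype=np.float32, mode='r',
                    shape=(117276, 1794), order='F')
print(my_data[0, :5])  # only this slice is actually read from disk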

How to pull variables from line of data file in Python [closed]

I have a large data file where each row looks as follows, and each pipe-delimited value represents a consistent variable (e.g. 1517892812 and 1517892086 are Unix timestamps, and the last pipe-delimited field will always be UnixTimestamp):
264|2|8|6|1.32235000|1.33070000|1.31400000|1257.89480966|1517892812
399|10|36|2|1.12329614|1.12659227|1.12000000|148194.47200218|1517892086
How can I pull out the values I need into variables in Python? For example, looking at a row and getting UnixTimestamp=1517892812 (and other variables) out of it.
I want to pull out each relevant variable per line, work with them, and then look at the next line and re-evaluate all of the variable values.
Is regex what I should be dealing with here?
No need for regex; you can use split():

int(a.strip().split('|')[-1])  # a is one line of the file

If all the variables are numbers and you want a matrix with all your values, you can simply do something like:

[[float(v) for v in line.strip().split('|')] for line in your_data.splitlines()]

You can also use regex and re.search():

int(re.search(r'[^|]+$', text).group())
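Putting it together, a minimal sketch that reads such a file line by line and pulls each field into a variable might look like this (the file name data.txt and the field names other than UnixTimestamp are hypothetical; adjust the indices to your actual layout):

with open('data.txt') as fh:
    for line in fh:
        fields = line.strip().split('|')
        unix_timestamp = int(fields[-1])   # last field is always UnixTimestamp
        first_value = int(fields[0])       # hypothetical name for the first field
        high_price = float(fields[5])      # hypothetical name; pick the indices you need
        # ... work with this row's values, then the loop moves on to the next line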

Easiest way to validate between two CSV files using python [closed]

I have two CSV files, and I would like to validate (find the differences and similarities between) the data in these two files.
I am retrieving this data from Vertica, and because the data is so large I would like to do the validation at the CSV level.
csvdiff allows you to compare the semantic contents of two CSV files, ignoring things like row and column ordering in order to get to what’s actually changed. This is useful if you’re comparing the output of an automatic system from one day to the next, so that you can look at just what’s changed.
I don't think you can directly compare sheets using openpyxl without manually looping over each row and writing your own validation code.
It depends on how much you care about performance: if speed is not a requirement it could work, but it will require some additional effort.
Instead I would use pandas DataFrames for any CSV validation needs; if you can add this dependency, it becomes much easier to compare files while keeping great performance.
Here is a link to a complete example:
http://pbpython.com/excel-diff-pandas.html
However, use read_csv() instead of read_excel() to read data from your files.
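If you go the pandas route, a minimal sketch for comparing two CSV files might look like the following (the file names are placeholders, and it assumes both files have the same columns); an outer merge with indicator=True flags which rows appear in only one of the files:

import pandas as pd

df1 = pd.read_csv('export_day1.csv')
df2 = pd.read_csv('export_day2.csv')

merged = df1.merge(df2, how='outer', indicator=True)
differences = merged[merged['_merge'] != 'both']    # rows unique to one file
similarities = merged[merged['_merge'] == 'both']   # rows present in both files
print(differences)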
