Use loadtxt to read files recursively - python

I have a large number of .asc files containing (x,y) coordinates for two given satellites. There are approximately 3,000 separate files for each satellite (e.g. Satellite1 = [file1,file2,..., file3000] and Satellite2= [file1,file2,..., file3000]).
I'm trying to write some code in Python (version 2.7.8 |Anaconda 2.0.) that finds the multiple points on the Earth's surface where the two satellite tracks cross over.
I've written some basic code that takes two files as input (i.e. one from Sat1 and one from Sat2) using loadtxt. In a nutshell, the code looks like this:
sat1_in = loadtxt("sat1_file1.asc", usecols = (1,2), comments = "#")
sat2_in = loadtxt("sat2_file1.asc", usecols = (1,2), comments = "#")
def main():
    xover_search() # Returns True or False depending on whether a crossover is found.
    xover_final() # Returns the (x,y) coordinates of the crossover.
    write_output() # Appends these coordinates to a txt file for later display.
if __name__ == "__main__":
    main()
I would like to apply this code to the whole dataset, using a function that outputs "sat1_in" and "sat2_in" for all possible combinations of files between satellite 1 and satellite 2. These are my ideas so far:
#Create two empty lists to store all the files to process for Sat1 and Sat2:
sat1_files = []
sat2_files = []
#Use os.walk to fill each list with the respective file paths:
for root, dirs, filenames in os.walk('.'):
    for filename in fnmatch.filter(filenames, 'sat1*.asc'):
        sat1_files.append(os.path.join(root, filename))
for root, dirs, filenames in os.walk('.'):
    for filename in fnmatch.filter(filenames, 'sat2*.asc'):
        sat2_files.append(os.path.join(root, filename))
#Calculate all possible combinations between both lists using itertools.product:
iter_file = list(itertools.product(sat1_files, sat2_files))
#Extract two lists of files for sat1 and sat2 to be compared each iteration:
sat1_ordered = [seq[0] for seq in iter_file]
sat2_ordered = [seq[1] for seq in iter_file]
And this is where I get stuck. How can I iterate through "sat1_ordered" and "sat2_ordered" using loadtxt to extract the lists of coordinates for every single file? The only thing I have tried is:
for file in sat1_ordered:
    sat1_in = np.loadtxt(file, usecols = (1,2),comments = "#")
But this will create a huge list containing all the measurements for satellite 1.
Could someone give me some ideas about how to tackle this problem?

Maybe you are looking for something like this:
for file1, file2 in iter_file:
    sat1_in = np.loadtxt(file1, usecols = (1,2),comments = "#")
    sat2_in = np.loadtxt(file2, usecols = (1,2),comments = "#")
    ....
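With roughly 3,000 files per satellite, looping over every pair means about nine million iterations, so it may be worth avoiding a reload of the same file on each one. The following is only a sketch of one way to do that, not a tested solution for this dataset: crossover_inputs and fake_load are hypothetical names, and the loader is passed in as a parameter so that np.loadtxt can be plugged in. Pairs come out in the same order as itertools.product(sat1_files, sat2_files).

```python
def crossover_inputs(sat1_files, sat2_files, load):
    """Yield (file1, file2, sat1_in, sat2_in) for every Sat1/Sat2 file pair.

    `load` is any callable that turns a path into coordinates, for example
    lambda p: np.loadtxt(p, usecols=(1, 2), comments="#").
    """
    cache = {}  # each Sat2 file is parsed once, then reused for every Sat1 file
    for file1 in sat1_files:
        sat1_in = load(file1)  # parsed once per Sat1 file, outside the inner loop
        for file2 in sat2_files:
            if file2 not in cache:
                cache[file2] = load(file2)
            yield file1, file2, sat1_in, cache[file2]


# Demo with a stand-in loader that records which paths were parsed.
loaded = []


def fake_load(path):
    loaded.append(path)
    return "data:" + path


pairs = list(crossover_inputs(["sat1_a.asc", "sat1_b.asc"],
                              ["sat2_a.asc", "sat2_b.asc", "sat2_c.asc"],
                              fake_load))
```

The cache keeps every Sat2 array in memory at once. If that is too much for the real data, dropping the cache and reloading inside the inner loop trades speed for memory.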

Related

Construct file path using for loop

I am trying to write a function that reads a set of parquet files as dataframes and concatenates them sequentially into one.
# Get the list of all files and directories
path = "C:**********\\main_data_1"
dir_list = os.listdir(path)
dir_list
Total_df = []
all_ais_msg_df = []
for j in np.arange(1,len(dir_list)+1):
    filename = 'ais_comb_'+ str(j)+ '.parquet'
    j +=1
    Total_df.append(filename)
for i in Total_df:
    df = pd.read_parquet('main_data_1/'+ str(i)) # having problem here
    i+=1
    all_ais_msg_df = pd.DataFrame(all_ais_msg_df.append(df))
I intend to construct a string like 'main_data_1/ais_comb_1.parquet', using the i in the second loop to step through ais_comb_1, ais_comb_2 and so on up to the last filename, so that I can read all the files and combine them into one dataframe.
The error I get is:
TypeError: can only concatenate str (not "int") to str
I crave a very clear explanation please, I am still a rookie.
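The error message matches what i += 1 produces in the second loop: there, i is a filename (a string), and Python cannot add the integer 1 to a string. A for loop moves to the next item on its own, so neither j += 1 nor i += 1 is needed. As a minimal sketch (build_paths is a hypothetical helper, and the file count of 3 is just an example), the paths could be built like this:

```python
def build_paths(folder, count, prefix="ais_comb_", suffix=".parquet"):
    """Return folder/prefix1suffix ... folder/prefix<count>suffix as strings."""
    # range() supplies the integers and str.format converts each one to text,
    # so no string is ever incremented with += 1.
    return ["{}/{}{}{}".format(folder, prefix, n, suffix)
            for n in range(1, count + 1)]


paths = build_paths("main_data_1", 3)
# With pandas installed, the frames could then be combined in one call:
#   combined = pd.concat([pd.read_parquet(p) for p in paths], ignore_index=True)
```

Collecting the frames in a list and calling pd.concat once avoids rebuilding the combined dataframe on every iteration.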

Reading in image files from a folder indexed with numeric name tags using python

I am trying to read in a series of images from a folder using python. The images represent small pieces of a larger image that has been split into a grid and are indexed using "imagename_row_column.jpg" (e.g. img_0_1.jpg). My current code (pasted below) is having trouble with the column index and is reading numbers 10 and above in the wrong order. For example, instead of reading in like (img_0, img_0_1, img_0_2,...img_0_9, img_0_10...) I am getting (img_0, img_0_10, img_0_11, img_0_1, img_0_2...). Any advice would be much appreciated. Thanks!
# Get images from folder
path1 = r'C:\Users\user_\Desktop\Test\IMG_Scan'
images = []
mylist = os.listdir(path1)
for img in mylist:
    curimg = cv2.imread(f'{path1}/{img}')
    images.append(curimg)
"img_0_10" < "img_0_2" by string comparison.
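Try custom sorting your files before iterating. The extension has to be stripped first, otherwise int() fails on a piece like "10.jpg":
mylist = sorted(os.listdir(path1), key=lambda x: list(map(int, os.path.splitext(x)[0].split("_")[1:])))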
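To see the difference between string order and numeric order, here is a small sketch with made-up file names. grid_key is a hypothetical helper; it assumes the only digits in the name are the row and column indexes, so a prefix that itself contains digits would need a different key.

```python
import os
import re


def grid_key(filename):
    """'img_0_10.jpg' -> [0, 10]: every run of digits in the stem, as integers."""
    stem = os.path.splitext(filename)[0]
    return [int(part) for part in re.findall(r"\d+", stem)]


names = ["img_0_10.jpg", "img_0_2.jpg", "img_1_0.jpg", "img_0_1.jpg"]
lexicographic = sorted(names)          # plain string comparison
ordered = sorted(names, key=grid_key)  # compares [row, column] as numbers
```

Here lexicographic puts img_0_10.jpg before img_0_2.jpg, while ordered sorts by row first and then by column.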

batch copy files, perform operation and copy more files

I want to copy files from a directory in batches: copy a batch, perform an operation on the copied files, and then copy the next batch. So far I have managed this code
import os
import sys
from shutil import copy2
_, _, filenames = next(os.walk("src/"))
print(filenames)
number_of_files = len(filenames)
batch_number = 2
i = 0
while i < number_of_files:
    i += 1
    j = i + batch_number
    print(filenames[i:j])
and its output is
['file_02', 'file_03']
['file_03', 'file_04']
['file_04', 'file_010']
['file_010', 'file_01']
['file_01', 'file_06']
['file_06', 'file_08']
['file_08', 'file_09']
['file_09', 'file_07']
['file_07']
[]
What I want is:
['file_01', 'file_02']
['file_03', 'file_04']
['file_05', 'file_06']
['file_07', 'file_08']
['file_09', 'file_10']
What would be the best way to go about doing this?
Be careful: os.walk does not return filenames in numerical order.
You can sort the list yourself with the sort() method, using a key that compares the numeric part of each name:
your_file_list.sort(key=lambda name: int(name.split("_")[1]))
A plain key=int only works on a list of bare numbers. Since your names carry the file_ prefix, the key has to strip it before converting to int.
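The overlapping pairs in the printed output come from i += 1: the slice start moves by one while each slice is two long. Stepping by the batch size removes the overlap. A minimal sketch, assuming every name looks like prefix_digits (numeric_suffix and batches are hypothetical helpers, and the name list is sample data):

```python
def numeric_suffix(name):
    """'file_010' -> 10; assumes names look like <prefix>_<digits>."""
    return int(name.rsplit("_", 1)[1])


def batches(names, size):
    """Sort names numerically, then cut them into consecutive groups of `size`."""
    ordered = sorted(names, key=numeric_suffix)
    # Step by `size` so consecutive slices never overlap.
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


filenames = ["file_05", "file_02", "file_03", "file_04", "file_010",
             "file_01", "file_06", "file_08", "file_09", "file_07"]
result = batches(filenames, 2)
```

Each inner list of result can then be copied and processed before moving on to the next one.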

Reading images while maintaining folder structure

I have to rewrite a MATLAB script in Python, as apparently what I want to achieve is done much more efficiently in Python.
So the first task is to read all images into Python using opencv while maintaining the folder structure. For example, if the parent folder has 50 subfolders and each subfolder has 10 images, then the images variable in Python should have that same shape, very much like a cell in MATLAB. I read that Python lists can provide this cell-like behaviour without importing anything, so that's good I guess.
For example, below is how I coded it in Matlab:
path = '/home/university/Matlab/att_faces';
subjects = dir(path);
subjects = subjects(~strncmpi('.', {subjects.name}, 1)); %remove the '.' and '..' subfolders
img = cell(numel(subjects),1); %initialize the cell equal to number of subjects
for i = 1: numel(subjects)
    path_now = fullfile(path, subjects(i).name);
    contents = dir([path_now, '/*.pgm']);
    for j = 1: numel(contents)
        img{i}{j} = imread(fullfile(path_now,contents(j).name));
        disp([i,j]);
    end
end
The above img will have 50 cells and each cell will have stored 10 images. img{1} will be all images belonging to subject 1 and so on.
I'm trying to replicate this in Python but am failing. This is what I have got so far:
import cv2
import os
import glob
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []
for n in sub_f:
    path_now = os.path.join(path, sub_f[n], '*.pgm')
    images[n] = [cv2.imread(file) for file in glob.glob(path_now)]
It's not exactly what I am looking for, so some help would be appreciated. Please ignore silly mistakes as it is my first day writing in Python.
Thanks
The first problem is that n isn't a number or index, it is a string containing the path name. To get the index, you can use enumerate, which gives index, value pairs.
Second, unlike in MATLAB you can't assign to indexes that don't exist. You need to pre-allocate your image array or, better yet, append to it.
Third, it is better not to use the variable file since in python 2 it is a built-in data type so it can confuse people.
So with preallocating, this should work:
images = [None]*len(sub_f)
for n, cursub in enumerate(sub_f):
    path_now = os.path.join(path, cursub, '*.pgm')
    images[n] = [cv2.imread(fname) for fname in glob.glob(path_now)]
Using append, this should work:
for cursub in sub_f:
    path_now = os.path.join(path, cursub, '*.pgm')
    images.append([cv2.imread(fname) for fname in glob.glob(path_now)])
That being said, there is an easier way to do this. You can use the pathlib module to simplify this.
So something like this should work:
from pathlib import Path
mypath = Path('/home/university/Matlab/att_faces')
images = []
for subdir in mypath.iterdir():
    images.append([cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')])
This loops over the subdirectories, then globs each one.
This can even be done in a nested list comprehension:
images = [[cv2.imread(str(curfile)) for curfile in subdir.glob('*.pgm')]
          for subdir in mypath.iterdir()]
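One thing to keep in mind with the pathlib version: iterdir() yields entries in arbitrary order and also returns plain files, so images[0] is not guaranteed to be the first subject. A sketch of a variant that keeps only directories and sorts both levels follows. read_nested is a hypothetical helper, and the reader is passed in so the demo can run without OpenCV.

```python
import tempfile
from pathlib import Path


def read_nested(root, read, pattern="*.pgm"):
    """Return one inner list per subdirectory of root, both levels sorted."""
    subdirs = sorted(d for d in Path(root).iterdir() if d.is_dir())
    return [[read(f) for f in sorted(d.glob(pattern))] for d in subdirs]


# Demo on a throwaway tree; returning the file name stands in for cv2.imread.
with tempfile.TemporaryDirectory() as tmp:
    for sub, names in {"s1": ["1.pgm", "2.pgm"], "s2": ["1.pgm"]}.items():
        (Path(tmp) / sub).mkdir()
        for name in names:
            (Path(tmp) / sub / name).write_bytes(b"")
    (Path(tmp) / "README").write_text("not a subject folder")
    demo = read_nested(tmp, read=lambda f: f.name)
```

With OpenCV, the call would look like images = read_nested(path, read=lambda f: cv2.imread(str(f))). The sort is alphabetical, so a folder named s10 still comes before s2 unless a numeric key is used.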
It should be the following:
import os
import cv2
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
print(sub_f) #--- this will print all the entries present in this directory ---
#--- this is a list to which you will append all the images ---
images = []
#--- iterate through every entry in the directory and read those files that end with .pgm ---
#--- after reading each one, append it to the list ---
for n in sub_f:
    if n.endswith('.pgm'):
        path_now = os.path.join(path, n)
        print(path_now)
        images.append(cv2.imread(path_now, 1))
import cv2
import os
import glob
path = '/home/university/Matlab/att_faces'
sub_f = os.listdir(path)
images = []
#read the images
for folder in sub_f:
    path_now = os.path.join(path, folder, '*.pgm')
    images.append([cv2.imread(file) for file in glob.glob(path_now)])
#display the images
for folder in images:
    for image in folder:
        cv2.imshow('image',image)
        cv2.waitKey(0)
cv2.destroyAllWindows()

Pulling random files out of a folder for sampling

I needed a way to pull 10% of the files in a folder, at random, for sampling after every "run." Luckily, my current files are numbered numerically, and sequentially. So my current method is to list file names, parse the numerical portion, pull max and min values, count the number of files and multiply by .1, then use random.sample to get a "random [10%] sample." I also write these names to a .txt then use shutil.copy to move the actual files.
Obviously, this does not work if I have an outlier, i.e. if I have a file 345.txt among other files from 513.txt - 678.txt. I was wondering if there was a direct way to simply pull a number of files from a folder, randomly? I have looked it up and cannot find a better method.
Thanks.
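Using numpy.random.choice(array, N, replace=False) you can select N distinct items at random from an array. Without replace=False, the default is to sample with replacement, so the same file could be picked more than once.
import numpy as np
import os
# list all files in dir
files = [f for f in os.listdir('.') if os.path.isfile(f)]
# select 0.1 of the files randomly, without repeats
random_files = np.random.choice(files, int(len(files)*.1), replace=False)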
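I was unable to get the other methods to work easily with my code, but I came up with this.
import os
import shutil
from random import choice
output_folder = 'C:/path/to/folder'
# files and subdir are assumed to be defined earlier (the file names and their folder)
for x in range(int(len(files) *.1)):
    to_copy = choice(files)  # note: choice() can pick the same file more than once
    shutil.copy(os.path.join(subdir, to_copy), output_folder)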
This will give you a random tenth of the names in the folder, with mypath being the path to the folder.
from os import listdir
from os.path import isfile, join
from random import shuffle
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
shuffle(onlyfiles)  # shuffles the list in place and returns None
small_list = onlyfiles[:len(onlyfiles) // 10]
This should work
You can use the following strategy:
1. Use file_list = os.listdir(path) to get all the names in the directory as a list.
2. Count the files with count = len(file_list).
3. Use count to get a random index: random_position = random.randrange(count), which covers indexes 0 to count - 1.
4. Repeat step 3 and save the values in a list until you have enough positions (count // 10 in your case), skipping any position you have already drawn.
5. After that you can get each file name with file_list[random_position].
6. Use a for loop to iterate.
Hope this helps!
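The standard library can also do the drawing in one call: random.sample picks k distinct items from a list, so no index bookkeeping is needed and gaps in the numbering do not matter. A minimal sketch with made-up file names (sample_fraction is a hypothetical helper; the seeded random.Random(0) is only there to make the demo repeatable):

```python
import random


def sample_fraction(names, fraction=0.1, rng=random):
    """Return int(fraction * len(names)) distinct names, chosen at random."""
    k = int(len(names) * fraction)
    return rng.sample(names, k)  # sample() never repeats an element


# 51 hypothetical names, including an outlier like 345.txt
files = ["{}.txt".format(n) for n in [345] + list(range(513, 563))]
picked = sample_fraction(files, 0.1, random.Random(0))
```

In real use, files would come from os.listdir, and the names in picked could be passed to shutil.copy one by one.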
Based on Karl's solution (which did not work for me under Win 10, Python 3.x), I came up with this:
import numpy as np
import os
# List all files in dir
files = os.listdir("C:/Users/.../Myfiles")
# Select 0.5 of the files randomly, without repeats
random_files = np.random.choice(files, int(len(files)*.5), replace=False)
# Get the remaining files
other_files = [x for x in files if x not in random_files]
# Do something with the files
for x in random_files:
    print(x)
