Reading files and printing only the filename in Python

I am new to Python. I need to develop a simple program that takes a directory as user input and then reads all the .txt files in it (each containing a number); based on those numbers I have to generate an output built from the txt files' names.
For example, I have two files, one named de and the other named co; co contains 1 and de contains 2. I need the program to read the number from each file and then arrange the file names according to those numbers, so the output here should be code, as co contains 1 and de contains 2.
This is my code so far; it takes the directory from the user as input:
import glob
import os

dirname = input("Please input directory path ")
path = os.path.join(dirname, "**")
for x in glob.glob(path, recursive=True):
    print(x)

You simply have to read each matching .txt file from your glob results, put everything in a dict, and sort it by value.
import glob
import os

dirname = input("Please input directory path ")
path = os.path.join(dirname, "**", "*.txt")

fileValues = {}
for x in glob.glob(path, recursive=True):
    with open(x, 'r') as f:
        # Use the filename as the key and the number in the file as its value
        fileValues[os.path.basename(x)] = int(f.read())

# Sort the dictionary by value and extract the keys in order
sortedFiles = dict(sorted(fileValues.items(), key=lambda x: x[1])).keys()

# sortedFiles now holds all the filenames in ascending order of their values
# Print it, or use it however you want
print(sortedFiles)
For your example, sortedFiles now contains ["co.txt", "de.txt"].
You can get "code" from this by stripping the .txt extensions and joining with an empty string:
''.join(x.replace('.txt', '') for x in sortedFiles)
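If the extension might not always be .txt, os.path.splitext is a slightly more general way to drop it; this is just a small variation on the line above, not something the answer depends on:
import os
''.join(os.path.splitext(name)[0] for name in sortedFiles)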

Related

Sort elements in a list based on a csv column (Python)

I have a list that contains different paths to files. I want to sort the elements in this list by matching the stem of these paths (i.e. the file name) with a column in a csv file that contains the file names. This is to make sure that the list displays its elements in the order of the file names contained in the csv. The csv is similar to the one shown below:
I have done the following:
file_list = ['C:\\Example\\SS\\e342-SFA.jpg', 'C:\\Example\\DF\\j541-DFS.jpg', 'C:\\Example\\SD\\p162-YSA.jpg']

for f in file_list:
    x = Path(f).stem  # grabs the file name from file_list without .jpg
    for line in csv_file:
        IL = line.replace(":", "").replace("\n", "").replace("(", "").replace(")", "")
        columns = IL.split(",")
        if columns[3] == x:  # columns[3] = file name in the csv
            [do the sorting]
I'm not sure how to proceed further from here.
I'll assume you already know how to open and parse a csv file, and hence you already have the list ['p162-YSA', 'e342-SFA', 'j541-DFS'].
from ntpath import basename, splitext

order_list = ['p162-YSA', 'e342-SFA', 'j541-DFS']
file_list = ['C:\\Example\\SS\\e342-SFA.jpg', 'C:\\Example\\DF\\j541-DFS.jpg', 'C:\\Example\\SD\\p162-YSA.jpg']

order_dict = {}
for i, w in enumerate(order_list):
    order_dict[w] = i
# {'p162-YSA': 0, 'e342-SFA': 1, 'j541-DFS': 2}

sorted_file_list = [None] * len(file_list)
for name in file_list:
    sorted_file_list[order_dict[splitext(basename(name))[0]]] = name

print(sorted_file_list)
# ['C:\\Example\\SD\\p162-YSA.jpg', 'C:\\Example\\SS\\e342-SFA.jpg', 'C:\\Example\\DF\\j541-DFS.jpg']
Note: I chose to import basename and splitext from ntpath rather than from os.path so that this code can run on my linux machine. See this related question: Get basename of a Windows path in Linux.
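If you still need to build order_list from the csv itself, a minimal sketch using the csv module might look like this (assuming, as in the question, that the file name sits in the fourth column; 'names.csv' is a placeholder for the real file):
import csv

order_list = []
with open('names.csv', newline='') as f:
    for columns in csv.reader(f):
        order_list.append(columns[3])  # column index 3 holds the file name, per the question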

Find from one directory to another

I am trying to read a huge list of filenames output from a csv, find each file in a directory, and, once found, place it in another folder using Python.
I have a huge data set and was able to split the csv into 3 different files based on their classes.
I need to take the list from the images column, find those files in a directory, and place them in a new folder.
To make everything clear:
class0.csv:
,Unnamed: 0,noise,rot_ratio,background,class,images
0,0,0.031495803,0.383730466,0.870530701,0,00859199-ad58-4334-8635-07a094e11f94.JPG
5,5,2.605760607,0.547664714,-0.59016648,0,03159229-f613-4bd2-be32-82cf65496865.JPG
13,13,0.79224368,0.742954625,1.136200214,0,083ba0e4-cf97-40b7-9de3-0cdb618006c5.JPG
18,18,-0.416518561,0.432365614,1.12786556,0,0a9bca0f-dcbd-458e-a2bf-557876e5b402.JPG
36,36,2.192400275,0.558622462,-1.038830864,0,0e96c5b0-2ea6-441c-a1b6-22b5f650347b.JPG
46,46,-0.575673656,0.429221735,1.348484522,0,152c3bd4-dc1b-4328-a303-d923c226c040.JPG
51,51,3.880669006,0.295885257,1.005818478,0,19424685-3776-472c-8b07-f4c01643424e.JPG
53,53,1.552991557,0.485258419,0.282584728,0,1a8be963-4696-4605-826a-b9c1999985ae.JPG
Todo:
I need to find the listed images in my file directory and place them in another folder, as the directory contains images from different classes.
Hope it is clear.
import csv
from collections import defaultdict

columns = defaultdict(list)  # each value in each column is appended to a list

with open('class0.csv') as f:
    reader = csv.DictReader(f)  # read rows into a dictionary format
    for row in reader:  # read a row as {column1: value1, column2: value2, ...}
        for (k, v) in row.items():  # go over each column name and value
            columns[k].append(v)  # append the value to the list for column name k

image = columns['images']
print(image)
Output list:
'00859199-ad58-4334-8635-07a094e11f94.JPG', '03159229-f613-4bd2-be32-82cf65496865.JPG', '083ba0e4-cf97-40b7-9de3-0cdb618006c5.JPG', '0a9bca0f-dcbd-458e-a2bf-557876e5b402.JPG', '0e96c5b0-2ea6-441c-a1b6-22b5f650347b.JPG', '152c3bd4-dc1b-4328-a303-d923c226c040.JPG', '19424685-3776-472c-8b07-f4c01643424e.JPG', '1a8be963-4696-4605-826a-b9c1999985ae.JPG', '1d5c3c21-77d1-42d8-a4fc-80e8db01e2f2.JPG', '1ec36552-00af-454a-af47-de600baf3f1b.JPG', '2acbdc0e-9ae9-47e3-9dc3-c6a124c19296.JPG', '2c9e886e-ac63-4c60-b959-b7bccdd20289.JPG', '2df13128-ff88-4813-973b-c83296f1cbf5.JPG', '2eb9f4b7-cabc-4cd8-98b9-2f3470d623d2.JPG', '3169cf83-c70c-48b6-8332-9cae259e2204.JPG', '339371b4-c470-4489-832a-acd1d9c68d9f.JPG', '3504dc2b-8516-4ffc-bf02-9972409cfb0b.JPG', '364b3c69-dc6a-4afb-b67c-eb0854b5eaaf.JPG', '38cae58c-c150-4e39-a319-a57db3d9ac5f.JPG', '3ee6554f-3d7b-4094-844f-9539cc97a286.JPG', '444ea9ce-cdb5-4f48-ae28-d18e247bc6e4.JPG', '4513be86-e1ee-46a1-8897-30101045b420.JPG', '4e587d28-9656-47cc-bcac-93de429c3847.JPG', '4f4d7096-a3d7-49fe-90c8-63faed85d66c.JPG'
Now, I have a folder with many images, including the ones in the output list. I want to find those specific images in the main folder and place them in a new one.
Done so far (I need to read the output list, find those files in the source_dir folder, and move only those specific files to target_dir):
import shutil
import os

source_dir = '/images'
target_dir = 'class0'

file_names = os.listdir(source_dir)

for file_name in file_names:
    shutil.move(os.path.join(source_dir, file_name), target_dir)
However, this moves everything; I need to move only the specific images from the output list.
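A minimal sketch of that last step, assuming image is the list read from class0.csv above and that target_dir already exists (otherwise create it first):
import os
import shutil

source_dir = '/images'
target_dir = 'class0'

for file_name in image:  # image is the 'images' column read from class0.csv
    src = os.path.join(source_dir, file_name)
    if os.path.exists(src):  # skip names that are not actually in the source folder
        shutil.move(src, os.path.join(target_dir, file_name))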

How to add files from multiple zip files into the single zip file

I want to merge files from multiple zip files that share a common prefix into a single zip file.
I have a folder "temp" containing some .zip files and some other files:
filename1_160645.zip
filename1_165056.zip
filename1_195326.zip
filename2_120528.zip
filename2_125518.zip
filename3_171518.zip
test.xlsx
filename19_161518.zip
I have the following dataframe df_filenames containing the filename prefixes:
filename_prefix
filename1
filename2
filename3
If there are multiple .zip files in the temp folder with the same prefix that exists in the dataframe df_filenames, I want to merge the contents of those files.
For example, filename1_160645.zip contains the following:
1a.csv
1b.csv
and filename1_165056.zip contains:
1d.csv
and filename1_195326.zip contains:
1f.csv
After merging the contents of the other two files into filename1_160645.zip, the contents of filename1_160645.zip will be:
1a.csv
1b.csv
1d.csv
1f.csv
At the end, only the following files will remain in the temp folder:
filename1_160645.zip
filename2_120528.zip
filename3_171518.zip
test.xlsx
filename19_161518.zip
I have written the following code, but it's not working:
import os
import zipfile as zf
import pandas as pd

df_filenames = pd.read_excel('filename_prefix.xlsx')

#Get the list of all the filenames in the temp folder
lst_fnames = os.listdir(r'C:\Users\XYZ\Downloads\temp')
#take only .zip files
lst_fnames = [fname for fname in lst_fnames if fname.endswith('.zip')]

#take distinct prefixes in the dataframe
df_prefixes = df_filenames['filename_prefix'].unique()

for prefix in df_prefixes:
    #this list will contain zip files with the same prefixes
    lst = []
    #total count of files in the lst
    count = 0
    for fname in lst_fnames:
        if prefix in fname:
            #print(prefix)
            lst.append(fname)
    #print(lst)
    #if the list has more than 1 zip files, merge them
    if len(lst) > 1:
        print(lst)
        with zf.ZipFile(lst[0], 'a') as f1:
            print(f1.filename)
            for f in lst[1:]:
                with zf.ZipFile(path + '\\' + f, 'r') as f:
                    print(f.filename)  # getting entire path of the file here, not just filename
                    [f1.writestr(t[0], t[1].read()) for t in ((n, f.open(n)) for n in f.namelist())]
            print(f1.namelist())
After merging the contents of the files whose names contain filename1 into filename1_160645.zip, the contents of filename1_160645.zip should be:
1a.csv
1b.csv
1d.csv
1f.csv
but nothing changes when I double-click filename1_160645.zip.
Basically, 1a.csv, 1b.csv, 1d.csv, and 1f.csv are not part of filename1_160645.zip.
I would use shutil for a higher-level way of dealing with archive files. Additionally, pathlib gives nice methods and attributes for a given file path. Combined with itertools.groupby, we can easily group the target files that are related to each other.
import itertools
import shutil
from pathlib import Path

import pandas as pd

filenames = pd.read_excel('filename_prefix.xlsx')
prefixes = filenames['filename_prefix'].unique()

path = Path.cwd()  # or change to Path('path/to/desired/dir/')

# Collect the .zip files whose stem starts with one of the prefixes
zip_files = (file for file in path.iterdir() if file.suffix == '.zip')
target_files = sorted(file for file in zip_files
                      if any(file.stem.startswith(pre) for pre in prefixes))

# Group the zips by prefix (the part of the stem before the underscore)
file_groups = itertools.groupby(target_files, key=lambda x: x.stem.split('_')[0])

for _, group in file_groups:
    first, *rest = group
    if not rest:  # only one zip with this prefix, nothing to merge
        continue
    # Unpack every archive in the group into a temporary directory,
    # deleting the extra zips as we go
    temp_dir = path / first.stem
    temp_dir.mkdir()
    shutil.unpack_archive(first, extract_dir=temp_dir)
    for item in rest:
        shutil.unpack_archive(item, extract_dir=temp_dir)
        item.unlink()
    # Re-zip the merged contents under the first file's name and clean up
    shutil.make_archive(temp_dir, 'zip', temp_dir)
    shutil.rmtree(temp_dir)
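If you would rather stay with zipfile and skip extracting to disk (closer to the question's original attempt), a minimal sketch could look like this, assuming lst holds the full paths of the zip files that share a prefix:
import zipfile as zf

# Append the members of every other archive into the first one,
# reading each member's bytes and writing them out under the same name
with zf.ZipFile(lst[0], 'a') as target:
    for other in lst[1:]:
        with zf.ZipFile(other, 'r') as src:
            for name in src.namelist():
                target.writestr(name, src.read(name))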

Saving Filenames with Condition

I'm trying to save the names of files that fulfill a certain condition.
I think the easiest way to do this would be to make a short Python program that imports and reads the files, checks whether the condition is met, and (assuming it is) saves the names of those files.
I have data files with just two columns and four rows, something like this:
a: 5
b: 5
c: 6
de: 7
I want to save the names of the files (or part of the names, if that's a simple fix; otherwise I can just sed the file afterwards) for the data files whose 4th number ([3:1]) is greater than 8. I tried importing the files with numpy, but it said it couldn't import the letters in the first column.
Another way I was considering was doing it from the command line, something along the lines of cat *.dat >> something.txt, but I couldn't figure out how to do that.
The code I've tried to write up to get this to work is:
import fileinput
import glob
import numpy as np

#Filter to find value > 8

#Globbing value datafiles
file_list = glob.glob("/path/to/*.dat")

#Creating output file containing
f = open('list.txt', 'w')

#Looping over files
for file in file_list:
    #For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    #Opening the files, checking if value is greater than 8
    a = np.loadtxt("file", delimiter=' ', usecols=1)
    if a[3:0] > 8:
        print >> f, filename
f.close()
When I do this, I get an error that says TypeError: 'int' object is not iterable, but I don't know what that's referring to.
I ended up using
import fileinput
import glob
import numpy as np

#Filter to find value > 8

#Globbing datafiles
file_list = glob.glob("/path/to/*.dat")

#Creating output file containing
f = open('list.txt', 'w')

#Looping over files
for file in file_list:
    #For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    #Opening the files, checking if value is greater than 8
    a = np.genfromtxt(file)
    if a[3, 1] > 8:
        f.write(filename + "\n")
f.close()
It is hard to tell exactly what you want, but maybe something like this:
from glob import glob
from re import findall

fpattern = "/path/to/*.dat"

def test(fname):
    # Return True if the fourth number in the file is greater than 8
    with open(fname) as f:
        try:
            return int(findall(r"\d+", f.read())[3]) > 8
        except IndexError:
            pass

matches = [fname for fname in glob(fpattern) if test(fname)]
print(matches)
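If you also want to write the matching names to list.txt as in the question (keeping just the filename, not the full path), a small follow-up might be:
import os

with open('list.txt', 'w') as out:
    for fname in matches:
        out.write(os.path.basename(fname) + "\n")  # write only the filename part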

How to read in multiple files separately from multiple directories in python

I have x directories named Star_{v}, with v = 0 to x.
I have 2 csv files in each directory, one with the word "epoch" in it and one without.
If one of the csv files has the word "epoch" in it, it needs to be sent through one set of code, else through another.
I think dictionaries are probably the way to go, but this section of the code is a bit of a mess:
directory_dict = {}
for var in range(0, len(subdirectory)):
    #var refers to the number by which the subdirectories are labelled: Star_0, Star_1 etc.
    directory_dict['Star_{v}'.format(v=var)] = directory\\Star_{var}
    #directory_dict['Star_0'], directory_dict['Star_1'] etc.
    read_csv(f) for f in os.listdir('directory_dict[Star_{var}') if f.endswith(".csv")
    #reads in all the files in the directories (Star_{v}) ending in csv
    if 'epoch' in open(read_csv[0]).read():
        #if the word epoch is in the csv file then it is
        directory_dict[Star_{var}][read] = csv.reader(read_csv[0])
        directory_dict[Star_{var}][read1] = csv.reader(read_csv[1])
    else:
        directory_dict[Star_{var}][read] = csv.reader(read_csv[1])
        directory_dict[Star_{var}][read1] = csv.reader(read_csv[0])
When dealing with csvs you should use the csv module, and for your particular case you can use a DictReader and parse the headers to check for the column you're looking for:
import csv
import os

directory = os.path.abspath(os.path.dirname(__file__))  # change this to your directory
csv_list = [os.path.join(directory, c) for c in os.listdir(directory)
            if os.path.splitext(c)[1] == '.csv']

def parse_csv_file():
    """Open each CSV and check the headers."""
    for c in csv_list:
        with open(c, mode='r') as open_csv:
            reader = csv.DictReader(open_csv)
            if 'epoch' in reader.fieldnames:
                pass  # do whatever you want here
            else:
                pass  # do whatever else
Then you can branch on the DictReader's field names and do whatever you want.
Also, your Python looks invalid.
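To apply that check across the Star_{v} subdirectories from the question, a rough sketch might look like this (assuming the subdirectories sit under directory and each contains the two csv files):
import csv
import glob
import os

for star_dir in sorted(glob.glob(os.path.join(directory, 'Star_*'))):
    for csv_path in glob.glob(os.path.join(star_dir, '*.csv')):
        with open(csv_path, newline='') as open_csv:
            reader = csv.DictReader(open_csv)
            if 'epoch' in reader.fieldnames:
                pass  # send this file through the "epoch" code path
            else:
                pass  # send this file through the other code path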
