How to parse multiple fastq files from a directory? [duplicate]

This question already has an answer here:
Can't Open files from a directory in python [duplicate]
(1 answer)
Closed 2 years ago.
I am trying to write a loop that parses, one by one, 5 FASTA files that are all in the same directory. To explain a little: each of the 5 files contains the genome of one microorganism. The idea is to get the IDs from each file and put them into a dictionary: {Mo_Id1:0, Mo_Id2:0, ..., Mo_Id5:0}
I think my loop reads the first file, but then it gives me the following error: No such file or directory: 'GCF_000006532.1_ASM696v3_genomic.fna' (this is the name of the second file in my folder).
Here is my code:
from Bio import SeqIO
import os
dicc_MO = {}
files = os.listdir("/home/alumno/Escritorio/Asig2Python/Semana4/Tarea/genomas/genomas")
for f in files:
    for record_seqMO in SeqIO.parse(f, "fasta"):
        if record_seqMO.id not in dicc_MO:
            dicc_MO[record_seqMO.id] = 0
print(dicc_MO)
I used dicc_MO to check whether the loop was working: if it is, I should get a dictionary whose keys are the microorganism names and whose values are 0.

os.listdir returns only the file names, without the directory path, so SeqIO.parse looks for each name relative to the current working directory. You need to join the directory path to each file name in your files list (the path goes before the name, not after it):
from Bio import SeqIO
import os
dicc_MO = {}
folder = "/home/alumno/Escritorio/Asig2Python/Semana4/Tarea/genomas/genomas"
files = os.listdir(folder)
for f in files:
    f = os.path.join(folder, f)  # full path: folder + "/" + file name
    for record_seqMO in SeqIO.parse(f, "fasta"):
        ...
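For reference, the whole task can be sketched with only the standard library. This is a minimal sketch, not the answer's code: instead of Bio.SeqIO it uses a hypothetical helper, fasta_ids, which takes the first whitespace-separated word after each ">" as the record ID (the same word SeqIO uses for record.id). The helper names and the extension filter (.fna, .fasta, .fa) are assumptions.

```python
import os


def fasta_ids(path):
    """Yield the ID (first word after '>') of every record in a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                # Take the first whitespace-separated token, as SeqIO does for record.id
                yield header.split(None, 1)[0] if header else ""


def collect_ids(folder, extensions=(".fna", ".fasta", ".fa")):
    """Build {record_id: 0} for every FASTA file in folder (hypothetical helper)."""
    dicc_mo = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(extensions):
            continue  # skip files that don't look like FASTA (assumed extensions)
        # os.listdir returns bare names, so join each one with the folder path
        full_path = os.path.join(folder, name)
        for record_id in fasta_ids(full_path):
            dicc_mo.setdefault(record_id, 0)
    return dicc_mo
```

Usage would be something like collect_ids("/home/alumno/Escritorio/Asig2Python/Semana4/Tarea/genomas/genomas"). Because every path is built with os.path.join, the result no longer depends on the directory the script is run from.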

Related

Read all .csv files within the folder without having a fixed name? [duplicate]

This question already has answers here:
How to open a file only using its extension?
(3 answers)
Closed 1 year ago.
I have a users.csv file in the same folder as my Python script.
I use this to read it:
users = []
with open(r"users.csv", encoding='UTF-8') as f:
    ...  # the rest of the reading code is not shown
Is there a way to automatically read any .csv file in the directory without hard-coding "users.csv"?
I'm not sure how to word it, so I'll give some context.
I have 10 folders; each one contains a single users.csv file and a script that reads it. Every time I switch to a different .csv file, I have to rename it in all 10 folders so the script can read it. Is there any way to automatically read the file as long as it is in CSV format?

You could do something like this:
import os
for file in os.listdir("."):
    if file.endswith(".csv"):
        with open(file, encoding="UTF-8") as f:
            ...  # read it
To look in the current directory and all its sub-directories, let os.walk do the recursion; it already lists the files of every directory it visits:
for root, _, files in os.walk("."):
    for name in files:
        if name.endswith(".csv"):
            path = os.path.join(root, name)
            ...  # read path
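An alternative sketch uses glob, which does the extension matching for you; with recursive=True, the pattern "**/*.csv" matches the starting folder and every sub-folder. The helper name read_all_csv and the use of the csv module to parse each file are assumptions, not part of the answer above.

```python
import csv
import glob
import os


def read_all_csv(folder=".", recursive=False):
    """Return the rows of every .csv file in folder (hypothetical helper)."""
    if recursive:
        # "**" with recursive=True matches zero or more directory levels
        pattern = os.path.join(folder, "**", "*.csv")
    else:
        pattern = os.path.join(folder, "*.csv")
    rows = []
    # glob returns paths in arbitrary order; sorting makes the result repeatable
    for path in sorted(glob.glob(pattern, recursive=recursive)):
        with open(path, newline="", encoding="UTF-8") as f:
            rows.extend(csv.reader(f))
    return rows
```

In the ten-folder setup from the question, each script could call read_all_csv() and pick up whatever .csv file sits next to it, so nothing needs to be renamed.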

How to pick latest CSV file in python? [duplicate]

This question already has answers here:
How to find newest file with .MP3 extension in directory?
(6 answers)
Closed last year.
I have many files in a folder, like:
tb_exec_ns_decile_20190129.csv
tb_exec_ns_decile_20190229.csv
tb_exec_ns_decile_20190329.csv
I just want to pick the latest file:
tb_exec_ns_decile_20190329.csv

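import glob
import os
# Replace with the path to your folder.
# On Unix, getctime is the last metadata change time; os.path.getmtime gives the last modification time.
latest_csv = max(glob.glob('/path/to/folder/*.csv'), key=os.path.getctime)
print(latest_csv)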

Since your csv files share a common prefix and the date is written as YYYYMMDD, alphabetical order is the same as chronological order, so you can simply use max on the list of files. Assuming you are in the directory with your files and tb_exec_ns_decile_20190329.csv has the latest date:
>>> import glob
>>> max(glob.glob('tb_exec_ns_decile_*.csv'))
'tb_exec_ns_decile_20190329.csv'
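If the folder might also contain files with other prefixes, a sketch that compares only the 8-digit stamp could look like the following. The regular expression assumes names end in _YYYYMMDD.csv, and latest_by_name_date is a hypothetical helper. It deliberately compares the digits as a string instead of calling datetime.strptime, because 20190229 from the example is not a real date (2019 is not a leap year) and strptime would reject it.

```python
import re

# Assumption: names end in _YYYYMMDD.csv, e.g. tb_exec_ns_decile_20190329.csv
DATE_RE = re.compile(r"_(\d{8})\.csv$")


def latest_by_name_date(names):
    """Return the name with the largest embedded YYYYMMDD stamp, or None."""
    dated = [n for n in names if DATE_RE.search(n)]
    if not dated:
        return None  # max() on an empty list would raise ValueError
    # Compare only the 8-digit stamp, so different prefixes don't affect the result
    return max(dated, key=lambda n: DATE_RE.search(n).group(1))
```

It can be fed directly from glob, e.g. latest_by_name_date(glob.glob('/path/to/folder/*.csv')), since the pattern is anchored at the end of the path.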

How can I move files with random names from one folder to another in Python? [duplicate]

This question already has answers here:
python copy files by wildcards
(3 answers)
Closed 4 years ago.
I have a large number of .txt files named "cb" + a number (like cb10, cb13), and I need to move a specific subset of them out of a source folder that contains all the "cb + number" files, including the target files.
The numbers in the target file names are random, so I have to list all the target file names explicitly.
import fnmatch
import os
import shutil
os.chdir('/Users/college_board_selection')
os.getcwd()
source = '/Users/college_board_selection'
dest = '/Users/seperated_files'
files = os.listdir(source)
for f in os.listdir('.'):
    names = ['cb10.txt','cb11.txt']
    if names in f:
        shutil.move(f,dest)

if names in f: isn't going to work, because f is a filename (a string), not a list; Python raises a TypeError. You probably want if f in names:
But you don't need to scan the whole directory for this. Just loop over the files you're targeting and move them if they exist:
for f in ['cb10.txt','cb11.txt']:
    if os.path.exists(f):
        shutil.move(f,dest)
If you have a lot of cbxxx.txt files, an alternative is to compute the intersection of your list with the result of os.listdir using a set (set lookups are faster than list lookups, which matters when there are many elements):
for f in {'cb10.txt','cb11.txt'}.intersection(os.listdir(".")):
    shutil.move(f,dest)
On Linux, with a lot of "cb" files, this would be faster because os.listdir reads the directory without calling stat on each entry, whereas os.path.exists makes one stat call per file it checks.
EDIT: if the files have the same prefix/suffix, you can build the lookup set with a set comprehension to avoid tedious copy/paste:
s = {'cb{}.txt'.format(i) for i in ('10','11')}
for f in s.intersection(os.listdir(".")):
    shutil.move(f,dest)
or for the first alternative:
for p in ['10','11']:
    f = "cb{}.txt".format(p)
    if os.path.exists(f):
        shutil.move(f,dest)
EDIT: if all cb*.txt files must be moved, then you can use glob.glob("cb*.txt"). I won't elaborate; the linked "duplicate target" answer explains it better.
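Putting the set-intersection idea together, a self-contained sketch might look like this. move_selected is a hypothetical helper; it passes full paths to shutil.move, so it doesn't rely on os.chdir, and it creates the destination folder if it is missing (an assumption about the desired behavior).

```python
import os
import shutil


def move_selected(source, dest, numbers):
    """Move cb<number>.txt files for the given numbers from source to dest.

    Hypothetical helper; returns the sorted list of names that were moved.
    """
    wanted = {"cb{}.txt".format(n) for n in numbers}
    # One directory listing plus a set intersection: no per-file existence check
    present = sorted(wanted.intersection(os.listdir(source)))
    os.makedirs(dest, exist_ok=True)
    for name in present:
        # Full paths, so the current working directory doesn't matter
        shutil.move(os.path.join(source, name), os.path.join(dest, name))
    return present
```

With the paths from the question this would be move_selected('/Users/college_board_selection', '/Users/seperated_files', ['10', '11']); numbers with no matching file are silently skipped.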

How to get the total number of specific files in a directory containing subdirectories? [duplicate]

This question already has answers here:
Find all files in a directory with extension .txt in Python
(25 answers)
Closed 6 years ago.
The following python code counts the number of total files I have in a directory which contains multiple subdirectories. The result prints the subdirectory name along with the number of files it contains.
How can I modify this so that:
It only looks for a specific file extension (i.e. "*.shp")
It provides both the number of ".shp" files in each subdirectory and a final count of all ".shp" files
Here is the code:
import os
path = 'path/to/directory'
folders = ([name for name in os.listdir(path)])
for folder in folders:
    contents = os.listdir(os.path.join(path,folder))
    print(folder,len(contents))

You can use the str.endswith() method, which is handy for recognizing extensions. Inside your loop, go through contents and collect the matching files:
extension = '.shp'
targets = []
for i in contents:
    if i.endswith(extension):
        targets.append(i)
print(folder, len(targets))

Thanks for the comments and the answer. This is the code I used (feel free to flag my question as a duplicate of the linked question if it is too close):
import os
path = 'path/to/directory'
folders = [name for name in os.listdir(path)]
targets = []
for folder in folders:
    contents = os.listdir(os.path.join(path, folder))
    shp_files = [i for i in contents if i.endswith('.shp')]
    targets.extend(shp_files)
    print(folder, len(shp_files))  # number of .shp files in this subdirectory
print("Total number of files = " + str(len(targets)))
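The loop above only looks one level deep. If the subdirectories can themselves contain folders, os.walk can do the recursion; the sketch below is an alternative, not the asker's code. count_extension is a hypothetical helper, and matching the extension case-insensitively (so ".SHP" also counts) is an assumption.

```python
import os


def count_extension(root, ext=".shp"):
    """Count files ending in ext in every directory under root (hypothetical helper).

    Returns (per_directory, total): per_directory maps a path relative to root
    to the number of matching files directly inside that directory.
    """
    ext = ext.lower()
    per_directory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        # Case-insensitive match (assumption): "x.SHP" counts as well
        n = sum(1 for name in filenames if name.lower().endswith(ext))
        if n:
            per_directory[os.path.relpath(dirpath, root)] = n
    return per_directory, sum(per_directory.values())
```

Files directly in root are reported under the key ".", and directories without any matches are left out of the dictionary.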

Read files sequentially in order [duplicate]

This question already has answers here:
Is there a built in function for string natural sort?
(23 answers)
Closed 9 years ago.
I have a number of files in a folder with names following the convention:
0.1.txt, 0.15.txt, 0.2.txt, 0.25.txt, 0.3.txt, ...
I need to read them one by one and manipulate the data inside them. Currently I open each file with the command:
import os
# This is the path where all the files are stored.
folder_path = '/home/user/some_folder/'
# Open one of the files,
for data_file in os.listdir(folder_path):
    ...
Unfortunately, this reads the files in no particular order (os.listdir returns names in arbitrary order), and I need to read them starting with the file whose name is the smallest number, then the next larger one, and so on until the last one.

A simple example using sorted(), which returns a new sorted list:
import os
# This is the path where all the files are stored.
folder_path = 'c:\\'
# Open one of the files,
for data_file in sorted(os.listdir(folder_path)):
    print(data_file)
You can read more about sorted() in the Python docs.
Edit for natural sorting:
If you are looking for natural sorting, see this great post by @unutbu.
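Plain sorted() compares names as strings, which happens to work for the names in the question (all of the form 0.x) but would put "10.txt" before "2.5.txt". Since every name here is a number plus .txt, a simpler option than general natural sorting is to sort by the numeric value of the name. This is a sketch under that assumption; numeric_key and sorted_numeric are hypothetical helpers, and a name that isn't a number raises ValueError.

```python
import os


def numeric_key(filename):
    """Sort key: the numeric value of the name without its extension."""
    stem, _ext = os.path.splitext(filename)  # "0.15.txt" -> ("0.15", ".txt")
    return float(stem)


def sorted_numeric(names):
    # Assumption: every name is <number>.txt; anything else raises ValueError
    return sorted(names, key=numeric_key)
```

In the loop from the question this becomes for data_file in sorted_numeric(os.listdir(folder_path)):. os.path.splitext only splits off the last dot, so names like 0.15.txt keep their decimal point.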
