Writing XML filenames from a folder to a CSV file in Python

I have a folder C:\test_xml that contains a number of XML files. I want to get all the XML file names and store them in a CSV file, xml_file.csv. I am trying the Python code below but don't know how to proceed, as I am quite new to Python.
import os
import glob
# Raw string so that '\t' in 'C:\temp' is not read as a tab character
files = glob.glob(os.path.join(r'C:\temp', '*.xml'))
print(files)

A way to get a list of only the filenames:
import pathlib
files = [file.name for file in pathlib.Path(r"C:\temp").glob("*.xml")]
The documentation for the csv module has some examples of how to write a .csv file.
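Putting the two together, here is a minimal sketch that writes one file name per row with the csv module. The function name `write_xml_names_to_csv` and the `filename` header row are assumptions for illustration, not part of the original answer.

```python
import csv
import pathlib


def write_xml_names_to_csv(folder, csv_path):
    """Write the name of every *.xml file in folder to csv_path, one name per row.

    Hypothetical helper; returns the list of names that were written.
    """
    # sorted() gives a stable alphabetical order; glob() yields files in arbitrary order
    names = sorted(p.name for p in pathlib.Path(folder).glob("*.xml"))
    # newline="" is what the csv module documentation recommends for output files
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["filename"])  # assumed header row; drop it if not wanted
        writer.writerows([name] for name in names)
    return names
```

On the asker's machine this would be called as `write_xml_names_to_csv(r"C:\test_xml", "xml_file.csv")`. Opening the output with `newline=""` avoids blank lines between rows on Windows.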

Related

Can't read a CSV file in Python because the file is not found

I am trying to read a CSV file in Python, but I keep getting a FileNotFoundError ("No such file or directory").
I have it written as:
file = open('case_study2.csv')
I used:
import os
os.getcwd()
to get the current directory for my Python file, which came back as:
runfile('/Users/natestewart/casestudy2/Task4_CaseStudy2', wdir='/Users/natestewart/casestudy2')

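One way to read CSV files is with the pandas library:
import os
import pandas as pd
path = '/Users/natestewart/casestudy2/Task4_CaseStudy2'
# Join the folder and the file name; the literal 'path/case_study2.csv' would not use the variable above
file = pd.read_csv(os.path.join(path, 'case_study2.csv'))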
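A bare name such as 'case_study2.csv' is resolved against the current working directory, which is not necessarily the folder containing the script. A minimal sketch, assuming the CSV sits next to the script, that builds an absolute path and fails with a message showing where it looked (the helper name `locate_data_file` is hypothetical):

```python
from pathlib import Path


def locate_data_file(filename, base_dir):
    """Return an absolute path to filename inside base_dir, or raise a descriptive error.

    Hypothetical helper for illustration.
    """
    candidate = (Path(base_dir) / filename).resolve()
    if not candidate.is_file():
        # Report both locations: relative names are looked up in the working
        # directory, which is often not the folder the script lives in
        raise FileNotFoundError(
            f"{filename!r} not found in {Path(base_dir).resolve()} "
            f"(current working directory: {Path.cwd()})"
        )
    return candidate
```

In a script you could call `locate_data_file('case_study2.csv', Path(__file__).resolve().parent)` and pass the result to `open()` or `pd.read_csv()`. Note that `__file__` is not defined in some interactive consoles.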

How to transfer multiple files from subdirectories to a single folder using Python?

I have a list of (unique) WAV file names:
2003211085_2003211078_ffc0d543799a2984c60c581d.wav
2003214817_2003214800_92720fb19bf9216c2f160733.wav
2003233142_2003233136_8c42d206701830dff6032d41.wav
2003256235_2003256218_4e71bf77b0ffb907990d2e30.wav
2003276239_2003276196_dad6aff70f37817fcd75ffb8.wav
2003352182_2003352170_b1f2990d5f867408cc39c445.wav
There is a directory called \019\Recordings, and all of these files are located in various subfolders under it.
I want to write a Python app that finds these WAV files by their unique names in all of these subfolders and moves them into a single target folder.
I'm new to Python and tried this:
import glob, os
import shutil
target_list_of_wav_names = ["2003211085_2003211078_ffc0d543799a2984c60c581d.wav",
                            "2003214817_2003214800_92720fb19bf9216c2f160733.wav",
                            "2003233142_2003233136_8c42d206701830dff6032d41.wav",
                            "2003352182_2003352170_b1f2990d5f867408cc39c445.wav"]
for file in glob.glob('//19/Recordings*.wav', recursive=True):
    print(file)
    if file in target_list_of_wav_names:
        shutil.move(file, "C:/Users/ivd/Desktop/autotranscribe"+file)
But the files do not show up in the target folder.
How can I fix this?

glob is just a utility for finding files with a wildcard pattern. It returns a list of path strings for the files that match your pattern.
So you still need to actually move each file with another function,
such as os.rename or shutil.move:
for file in glob.glob("*.wav"):
    # destinationfolder must already exist; shutil.move also works across drives, where os.rename can fail
    os.rename(file, f'destinationfolder/{file}')

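import glob, os
import shutil
target_list_of_wav_names = ['example_wav1.wav', 'example_wav2.wav']  # ... add the rest of your file names
target_folder = "/mydir"
# '**' together with recursive=True makes glob search every subfolder of /019/Recordings
for file in glob.glob('/019/Recordings/**/*.wav', recursive=True):
    print(file)
    # glob returns full paths, so compare only the file name part with the list
    if os.path.basename(file) in target_list_of_wav_names:
        shutil.move(file, os.path.join(target_folder, os.path.basename(file)))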
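The same idea with pathlib, as a sketch: `rglob` walks all subfolders, a set makes the name lookup cheap, and the function reports which names were never found. The function name `collect_files` is hypothetical, and it assumes the first match of each name is the one to keep.

```python
import shutil
from pathlib import Path


def collect_files(source_root, target_folder, wanted_names):
    """Move every .wav under source_root whose name is in wanted_names into target_folder.

    Hypothetical helper; returns the sorted names that were not found anywhere.
    """
    wanted = set(wanted_names)  # set membership test instead of scanning a list
    target = Path(target_folder)
    target.mkdir(parents=True, exist_ok=True)  # create the destination if missing
    found = set()
    # Materialize the list first so moving files does not affect the directory walk
    candidates = list(Path(source_root).rglob("*.wav"))
    for path in candidates:
        if path.name in wanted and path.name not in found:
            shutil.move(str(path), str(target / path.name))
            found.add(path.name)
    return sorted(wanted - found)
```

For the question above this would be something like `collect_files(r"\019\Recordings", "C:/Users/ivd/Desktop/autotranscribe", target_list_of_wav_names)`; any names it returns were not present under the source folder.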

I have multiple .txt files in one folder and need to insert the data into a MySQL table using Python

I have multiple .txt files in a folder, and I need to insert the data from them into a MySQL table named TAR.
I also need to sort the files by modification date before inserting the data into the table.
Below are the contents of one of the .txt files. I also need to remove the first character of every line.
SSerial1234
CCustomer
IDivision
Nat22
nAembly
rA0
PFVT
fchassis1-card-linec
RUnk
TP
Oeka
[06/22/2020 10:11:50
]06/22/2020 10:27:22
My code only reads all the files in the folder and prints their contents. I'm not sure how to sort the files before reading them one by one.
Is there also a way to read only specific files (JPE*.log)?
import os
for path, dirs, files in os.walk(r"C:\TAR\TARS_Source/"):
    for f in files:
        fileName = os.path.join(path, f)
        with open(fileName, "r") as myFile:
            print(myFile.read())

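Use the glob.glob function to get all matching files. Note that glob uses shell-style wildcards, not regular expressions:
import glob
files = glob.glob('./JPE*.log')
sorted(files) sorts the names alphabetically; to sort by modification date, as the question asks, use os.path.getmtime as the key:
import os
sorted_files = sorted(files, key=os.path.getmtime)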
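Combining the pattern match, the modification-date sort and the "drop the first character" requirement, a minimal sketch could look like this. The function name `read_logs_by_mtime` is hypothetical, and the MySQL insert is left out because the columns of the TAR table are not given in the question.

```python
import glob
import os


def read_logs_by_mtime(folder, pattern="JPE*.log"):
    """Yield (path, values) for each matching file, oldest modification time first.

    Hypothetical helper: values holds every non-empty line with its first character removed.
    """
    paths = glob.glob(os.path.join(folder, pattern))
    for path in sorted(paths, key=os.path.getmtime):
        with open(path, "r", encoding="utf-8") as f:
            # rstrip removes the line ending; [1:] drops the one-letter prefix (S, C, I, ...)
            values = [line.rstrip("\r\n")[1:] for line in f if line.strip()]
        yield path, values
```

Each `values` list could then be passed to a DB-API cursor, for example with `cursor.execute(sql, values)` using a MySQL driver such as mysql-connector-python, once the table's columns are known.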

How to validate the format of files using glob in Python?

I have a folder with files in several formats, such as PDF, TXT, DOCX and HTML. I want to validate the format of the files in Python.
Here is my attempt:
import os
import pdftables_api
import glob
path = r"myfolder\*"
files = glob.glob(path)
for i in files:
    if i.lower().endswith('.pdf'):
        conversion = pdftables_api.Client('my_api')
        # The second argument is the output file, so build one .xlsx name per PDF
        output = os.path.join("destination", os.path.splitext(os.path.basename(i))[0] + ".xlsx")
        conversion.xlsx(i, output)
The reason for this is that I want to iterate through each file, check whether it is a PDF and, if it is, convert it to Excel using the API from the pdftables_api package and save it in the destination folder. But I don't think this is an efficient way to do it.
Can anyone help me find a more efficient way to achieve this?
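Checking `endswith('.pdf')` only looks at the name. One option, sketched under the assumption that a name-plus-signature check is good enough, is to group the files by extension in a single pass and confirm that each "PDF" really starts with the `%PDF-` header that PDF files normally begin with. The helper names are hypothetical; the conversion itself is left to the `pdftables_api` client from the question.

```python
import glob
import os
from collections import defaultdict

PDF_SIGNATURE = b"%PDF-"  # PDF files normally begin with this header


def group_by_extension(folder):
    """Map lower-case extension (e.g. '.pdf') -> list of file paths in folder (hypothetical helper)."""
    groups = defaultdict(list)
    for path in glob.glob(os.path.join(folder, "*")):
        if os.path.isfile(path):  # skip subfolders, even ones named like 'x.pdf'
            groups[os.path.splitext(path)[1].lower()].append(path)
    return dict(groups)


def is_real_pdf(path):
    """True if the file content starts with the PDF header, regardless of its name."""
    with open(path, "rb") as f:
        return f.read(len(PDF_SIGNATURE)) == PDF_SIGNATURE
```

With these, the loop only touches `group_by_extension("myfolder").get(".pdf", [])`, skips files for which `is_real_pdf()` is false, and can reuse one API client for all conversions instead of creating a new one per file.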

Using Biopython SeqIO.convert over an entire directory

I have 51 files of metagenomic sequence data that I would like to convert from FASTQ to FASTA using a Biopython script on Windows. The SeqIO.convert function easily converts a single specified file, but I can't figure out how to convert the entire directory. It's not really too many files to do individually, but I'm trying to learn.
I'm brand new to Biopython, so please forgive my ignorance. This convo was helpful, but I'm still not able to convert the directory from fastq to fasta.
Here's the code I've been trying to run:
# modules
import os
from Bio import SeqIO
# define directory
Directory = "FastQ"
# convert files
def process(filename):
    # SeqIO.convert(in_file, in_format, out_file, out_format) takes four positional arguments;
    # the alphabet argument is no longer accepted (Bio.Alphabet was removed in Biopython 1.78)
    return SeqIO.convert(filename, "fastq", filename + ".fasta", "fasta")

You need to iterate over the files in the directory and convert each one. Assuming your directory is FastQ and that you run your script from the folder that contains it (since you are using a relative path), you would need something like:
import os
from Bio import SeqIO
def process(directory):
    filelist = os.listdir(directory)
    for f in filelist:
        if f.endswith(".fastq"):
            # listdir returns bare names, so join them with the directory
            path = os.path.join(directory, f)
            SeqIO.convert(path, "fastq", os.path.splitext(path)[0] + ".fasta", "fasta")
Then call the function from your main code:
my_directory = "FastQ"
process(my_directory)
I think that should work.
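A pathlib variant, sketched on the assumption that the input files end in `.fastq` and that the FASTA output should go into a separate folder. The optional `convert` parameter is a hypothetical hook added only so the loop can be exercised without Biopython installed; by default it uses `Bio.SeqIO.convert`, which returns the number of records converted.

```python
from pathlib import Path


def fastq_to_fasta_dir(directory, out_dir=None, convert=None):
    """Convert every *.fastq file in directory to FASTA.

    Returns {output_path: number_of_records}. `convert` is a hypothetical hook;
    when omitted, Bio.SeqIO.convert is used.
    """
    if convert is None:
        # Imported lazily so the file-discovery logic works without Biopython
        from Bio import SeqIO
        convert = SeqIO.convert
    src = Path(directory)
    dst = Path(out_dir) if out_dir is not None else src
    dst.mkdir(parents=True, exist_ok=True)
    results = {}
    for fastq in sorted(src.glob("*.fastq")):
        fasta = dst / (fastq.stem + ".fasta")
        # SeqIO.convert(in_file, in_format, out_file, out_format) returns the record count
        results[str(fasta)] = convert(str(fastq), "fastq", str(fasta), "fasta")
    return results
```

Called as `fastq_to_fasta_dir("FastQ", "FastA")`, it leaves the originals untouched and returns a per-file record count that can be checked against the input.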
