Renaming multiple file names with date - python

I would like to ask for your help renaming multiple files whose names contain dates. I have netCDF files "wrfoutput_d01_2016-08-01_00:00:00" through "wrfoutput_d01_2016-08-31_00:00:00", which Windows cannot handle because the output was generated on Linux (colons are not allowed in Windows file names). I want to change the file names to "wrfoutput_d01_2016-08-01_00" through "wrfoutput_d01_2016-08-31_00". How do I do that using Python?
Edit:
The containing folder has two sets of files: one for domain 1, denoted by d01 (e.g. wrfoutput_d01_2016-08-31_00:00:00), and one for domain 2, denoted by d02 (e.g. wrfoutput_d02_2016-08-31_00:00:00). There are 744 files for d01, since the output time step is hourly, and the same number for d02.
I want to rename every hourly file for each day. For example, wrfoutput_d01_2016-08-01_00:00:00, wrfoutput_d01_2016-08-01_01:00:00, ... should become wrfoutput_d01_2016-08-01_00, wrfoutput_d01_2016-08-01_01, ...
I found some code that lets me select the files for a specific domain, e.g. d01 or d02:
import os
from netCDF4 import Dataset
from wrf import getvar

filedir = "/home/gil/WRF/Output/August/"
wrfin = [Dataset(os.path.join(filedir, f)) for f in os.listdir(filedir)
         if f.startswith("wrfout_d02_")]
After this code I get stuck.

First get the file names by listing the folder (e.g. '/home/user/myfolder...'), then rename them, replacing the colons with hyphens:
import os
import re

folder_path = '/home/user/myfolder'  # adjust to your output directory
for fn in os.listdir(folder_path):
    # build full paths so this works regardless of the current working directory
    os.rename(os.path.join(folder_path, fn),
              os.path.join(folder_path, re.sub(':', '-', fn)))

The other answer converts the colons to hyphens. If you instead want to truncate the time from the file name, you can use the following.
This assumes the files are in the same directory as the Python script; if not, change '.' to 'path/to/dir/'. It also only renames files whose names contain 'wrfoutput'.
from os import listdir, rename
from os.path import isfile, join

only_files = [f for f in listdir('.') if isfile(join('.', f))]
for f in only_files:
    # Only touch the relevant files
    if 'wrfoutput' in f:
        # Drop ':MM:SS' from the end, keeping the hour (e.g. ..._00:00:00 -> ..._00)
        rename(f, f[:-6])

Open the terminal.
cd into your directory (cd /home/myfolder).
Start Python (python).
Now, a simple rename:
import os
AllFiles = os.listdir('.')
for eachfile in AllFiles:
    os.rename(eachfile, eachfile.replace(':', '_'))
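Putting the pieces together for the edit (hourly files, two domains): a minimal sketch that walks one folder, keeps only one domain's files, and drops the ':MM:SS' part while keeping the hour. The folder path is taken from the question's snippet and the prefix is an assumption; adjust it if your files start with wrfout_ rather than wrfoutput_.
import os

filedir = "/home/gil/WRF/Output/August/"  # path from the question; adjust as needed
prefix = "wrfoutput_d01_"                 # run again with "wrfoutput_d02_" for the second domain

for name in os.listdir(filedir):
    # only touch files from the chosen domain that still carry a ':MM:SS' suffix
    if name.startswith(prefix) and name.count(":") == 2:
        new_name = name[:-6]  # "..._2016-08-01_00:00:00" -> "..._2016-08-01_00"
        os.rename(os.path.join(filedir, name), os.path.join(filedir, new_name))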

Related

Access data from a particular directory with fewer lines of code

Assume I have a csv file data.csv located in the directory 'C:\\Users\\rp603\\OneDrive\\Documents\\Python Scripts\\Basics\\tutorials\\Revision\\datasets'. Using this code, I can access my csv files:
## read the csv file from a particular folder
import pandas as pd
import glob
files = glob.glob(r"C:\\Users\\rp603\\OneDrive\\Documents\\Python Scripts\\Basics\\tutorials\\Revision\\datasets*.csv")
df = pd.DataFrame()
for f in files:
    csv = pd.read_csv(f)
    df = df.append(csv)
But as you can see, the csv file path is long. Is there any way to do the same operation while shortening both the path to my data and the number of lines of code?
use the "dot" notation for a relative path (it does not depend on the programming language)
# example for a "shorter" version of the path
import os
my_current_position = '.' # where you launch the program
files = '' # from above
print(os.path.relpath(files, my_current_position)
Remark relpath is order sensitive
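For instance, with a shortened, hypothetical version of the question's path (run on Windows, where backslashes are path separators), swapping the two arguments changes the result:
import os
# relpath(path, start) expresses `path` relative to the directory `start`
print(os.path.relpath('C:\\Users\\rp603\\datasets\\data.csv', 'C:\\Users\\rp603'))
# -> datasets\data.csv
print(os.path.relpath('C:\\Users\\rp603', 'C:\\Users\\rp603\\datasets\\data.csv'))
# -> ..\..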
You can use a context manager to open the file, not shorter but more elegant:
with open(file, 'r') as fd:
    data_table = pd.read_csv(fd)
If you put your script in the same directory as the csv files, you can simply do:
import glob
files = glob.glob("*.csv")
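As a side note, building the DataFrame with df.append in a loop is deprecated (and removed in recent pandas versions); collecting the frames and concatenating once is both shorter and future-proof. A minimal sketch, assuming the script sits next to the csv files:
import glob
import pandas as pd

# read every csv in the current directory and stack them into one DataFrame
files = glob.glob("*.csv")
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)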

Import file of semi-known name to pandas and return filename

I'm writing a script to import a csv using pandas, then upload it to a SQL server. It all works when I have a test file with one name, but that won't be the case in production. I need to figure out how to import with only a semi-known filename (the filename will always follow the same convention, but there will be differences in the filenames such as dates/times). I should, however, note that there will only ever be one csv file in the folder it's checking. I have tried the wildcard in the import path, but that didn't work.
After it's imported, I also then need to return the filename.
Thanks!
Look into the os module:
import os
files = os.listdir("./")
csv_files = [filename for filename in files if filename.endswith(".csv")]
csv_files is a list of all the file names that end with .csv.
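Since there will only ever be one csv in that folder, a glob pattern works too. A minimal sketch of reading it and returning the file name (the folder argument and function name are placeholders, not part of the original answer):
import glob
import os
import pandas as pd

def import_single_csv(folder):
    # expect exactly one csv in the folder; fail loudly otherwise
    matches = glob.glob(os.path.join(folder, "*.csv"))
    if len(matches) != 1:
        raise FileNotFoundError(f"expected exactly one csv, found {len(matches)}")
    path = matches[0]
    df = pd.read_csv(path)
    return df, os.path.basename(path)  # the DataFrame and the file name

df, filename = import_single_csv("./")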

How to import all csv files in one folder and make the filename the variable name in pandas?

I would like to automatically import all csv files that are in one folder as dataframes and set the dataframe's variable name to the respective filename.
For example, in the folder are the following three files: data1.csv, data2.csv and data3.csv
How can I automatically import all three files having three dataframes (data1, data2 and data3) as the result?
This saves each DataFrame as a variable named after its file name. Note that it is not secure: building code from file names with exec could allow code injection.
import pandas
import os

path = "path_of_directory"
files = os.listdir(path)  # list of the files in the specified folder
for file in files:
    if file.endswith(".csv"):  # check whether the file ends with .csv
        # os.sep is the path separator of the operating system
        exec(f"{file[:-4]} = pandas.read_csv(r'{path}{os.sep}{file}')")
You can loop over the directory using pathlib and build a dictionary of name->DataFrame, eg:
import pathlib
import pandas as pd
dfs = {path.stem: pd.read_csv(path) for path in pathlib.Path('thepath/').glob('*.csv')}
Then access them as dfs['data1'], etc.
The answer that was given uses an exec command, and munir.aygun already warned what could go wrong with that approach. So here is the alternative that Justin Ezequiel and munir.aygun already suggested, spelled out:
import os
import glob
import pandas as pd
# Path to your data
path = r'D:\This\is\your\path'
# Get all .csv files at your path
allFiles = glob.glob(path + "/*.csv")
# Read in the data from the files and save it to a dictionary
dataStorage = {}
for filename in allFiles:
    name = os.path.basename(filename).split(".")[0]
    dataStorage[name] = pd.read_csv(filename)
# Can then be used like this (printing here)
if "data1" in dataStorage:
    print(dataStorage["data1"])
Hope this can still be helpful.

How can I read files with similar names in Python, rename them and then work with them?

I've already posted the same question here, but sadly I couldn't come up with a solution (even though some of you gave me great answers, most of them weren't what I was looking for), so I'll try again, this time giving more information about what I'm trying to do.
I'm using a program called GMAT to generate some outputs (.txt files with numerical values). These outputs have different names, and because I'm using them for more than one thing I end up with something like this:
GMATd_1.txt
GMATd_2.txt
GMATf_1.txt
GMATf_2.txt
Now, what I need to do is to use these outputs as inputs in my code. I need to work with them in other functions of my script, and since I will have a lot of these .txt files I want to rename them as I don't want to use them like './path/etc'.
So what I wanted was to write a loop that could get these files and rename them inside the script so I can use these files with the new name in other functions (outside the loop).
So instead of having to do this individually:
GMATds1= './path/GMATd_1.txt'
GMATds2= './path/GMATd_2.txt'
I wanted to write a loop that would do that for me.
I've already tried using a dictionary:
import os
import fnmatch

examples = {}
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, 'thing*.txt'):
        examples[filename[:6]] = filename
This does work but I can't use the dictionary key outside the loop.
If I understand correctly, you want to fetch files with similar names (or at least a recurring pattern) and rename them. This can be accomplished with the following code:
import glob
import os

all_files = glob.glob('path/to/directory/with/files/GMAT*.txt')
for file in all_files:
    new_path = create_new_path(file)  # e.g. split the file name, change directory and/or filename
    os.rename(file, new_path)
The glob library allows searching for files with * wildcards and hence makes it possible to match files with a specific pattern. It lists all the files in a certain directory (or in multiple directories, if you include a * wildcard as a directory). When you iterate over the files, you can either work directly with the contents of the files (as you apparently intend to do) or rename them as shown in this snippet. To rename them, you need to generate a new path, so you have to write the create_new_path function yourself: it takes the old path and creates the new one.
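For example, a minimal sketch of such a create_new_path, assuming the new name should simply drop the underscore (GMATd_1.txt -> GMATd1.txt); the renaming rule here is only a placeholder, replace it with whatever convention you actually want:
import os

def create_new_path(old_path):
    # placeholder rule: keep the directory, drop the underscore from the file name
    directory, name = os.path.split(old_path)
    return os.path.join(directory, name.replace("_", ""))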
Since Python 3.4 you should be using the built-in pathlib module instead of os or glob.
from pathlib import Path
import shutil

for file_src in Path("path/to/files").glob("GMAT*.txt"):
    file_dest = str(file_src.resolve()).replace("ds", "d_")
    shutil.move(file_src, file_dest)
You can use:
import os

path = '.....'   # path where these files are located
path1 = '.....'  # path where you want these files to be stored
i = 1
for file in os.listdir(path):
    if file.endswith('.txt'):
        os.rename(path + "/" + file, path1 + "/" + str(i) + ".txt")
        i += 1
This will rename (and move) all the .txt files in the source folder to 1.txt, 2.txt, ..., n.txt in the destination folder.

Using Biopython SeqIO.convert over an entire directory

I have 51 files with metagenomic sequence data that I would like to convert from fastq to fasta using a Biopython script on Windows. The SeqIO.convert function easily converts an individually specified file, but I can't figure out how to convert the entire directory. It's not really too many files to do individually, but I'm trying to learn.
I'm brand new to Biopython, so please forgive my ignorance. This conversation was helpful, but I'm still not able to convert the directory from fastq to fasta.
Here's the code I've been trying to run:
#modules
import sys
import re
import os
import fileinput
from Bio import SeqIO

#define directory
Directory = "FastQ"

#convert files
def process(filename):
    return SeqIO.convert(filename, "fastq", "files.fa", filename + ".fasta", "fasta", alphabet=IUPAC.ambiguous_dna)
You need to iterate over the files in the directory and convert them, so assuming your directory is FastQ and that you are calling your script from the proper folder (i.e. the one that your directory is in, since you are using a relative path), you would need to do something like:
def process(directory):
    filelist = os.listdir(directory)
    for f in filelist:
        if f.endswith(".fastq"):
            # build full paths so this works from outside the FastQ folder
            in_path = os.path.join(directory, f)
            out_path = os.path.join(directory, f.replace(".fastq", ".fasta"))
            SeqIO.convert(in_path, "fastq", out_path, "fasta")
then you would call your script in your main:
my_directory = "FastQ"
process(my_directory)
I think that should work.
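As a small check, SeqIO.convert returns the number of records it wrote, so you can print a per-file count to confirm that each of the 51 files converted; the file name below is just an illustration:
from Bio import SeqIO

# SeqIO.convert returns the number of records converted
count = SeqIO.convert("sample.fastq", "fastq", "sample.fasta", "fasta")
print("converted", count, "records from sample.fastq")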
