Rename multiple folder names based on a text file - python

I would like to rename folder names in a specific path based on a text list I will provide.
For example, I have a folder structure like the following:
/home/XXX-01/$file1
/home/XXX-12/$file2
/home/XXX-23/$file66
/home/XXX-32/$file44
/home/XXX-123/$file65
and a rand.txt file listing what the folder names need to be changed to, for example:
XXX-22
XXX-33
XXX-55
XXX-4321
XXX-24456
The final folder structure would be:
/home/XXX-22/$file1
/home/XXX-33/$file2
/home/XXX-55/$file66
/home/XXX-4321/$file44
/home/XXX-24456/$file65
thank you

Using GNU awk version 4.0.2
awk 'NR==FNR {     # Process the list of new directory names (direcs.txt)
    map[FNR]=$0    # Build an array indexed by the record number, with the directory name as the value
}
NR!=FNR {          # Process the output of the find command
    newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",1,$0)  # Create the new path using the entry in the map array
    newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],1,$0)        # Create the directory to create
    print "mkdir "newdir      # Print the mkdir command
    print "mv "$0" "newpath   # Print the command to execute
}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$")
One-liner:
awk 'NR==FNR {map[FNR]=$0} NR!=FNR {newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",1,$0); newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],1,$0); print "mkdir "newdir; print "mv "$0" "newpath}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$")
Once you are happy that the commands look as expected, execute them by piping through to bash/sh, like so:
awk 'NR==FNR {map[FNR]=$0} NR!=FNR {newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",1,$0); newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],1,$0); print "mkdir "newdir; print "mv "$0" "newpath}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$") | bash
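Since the question mentions Python, here is a minimal sketch of the same idea in pure Python, assuming the directories to rename live directly under /home and the new names come one per line from rand.txt (both taken from the question); the pairing is positional, just like in the awk version, so adjust the sort if a different order is needed.
import os

# Read the replacement names, one per line (rand.txt from the question).
with open("rand.txt") as fh:
    new_names = [line.strip() for line in fh if line.strip()]

base = "/home"
# Pick the directories whose names contain a digit, sorted for a stable positional pairing.
old_names = sorted(
    d for d in os.listdir(base)
    if os.path.isdir(os.path.join(base, d)) and any(c.isdigit() for c in d)
)

for old, new in zip(old_names, new_names):
    src = os.path.join(base, old)
    dst = os.path.join(base, new)
    print(f"renaming {src} -> {dst}")
    os.rename(src, dst)  # renaming the directory moves its contents with it, no mkdir needed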

Extract code from a Python file and retrieve values from it with bash

I have a setup.py file, which looks like this:
#!/usr/bin/env python
DIR = Path(__file__).parent
README = (DIR / "README.md").read_text()

install_reqs = parse_requirements(DIR / "requirements.txt")
try:
    dev_reqs = parse_requirements(DIR / "requirements-dev.txt")
except FileNotFoundError:
    dev_reqs = {}
    print("INFO: Could not find dev and/or prepro requirements txt file.")

if __name__ == "__main__":
    setup(
        keywords=[
            "demo"
        ],
        python_requires=">=3.7",
        install_requires=install_reqs,
        extras_require={"dev": dev_reqs},
        entry_points={
            "console_scripts": ["main=smamesdemo.run.main:main"]
        },
    )
I want to find this file recursively in a folder and extract the following part:
entry_points={
    "console_scripts": ["main=smamesdemo.run.main:main"]
},
which may also look like this:
entry_points={"console_scripts": ["main=smamesdemo.run.main:main"]}
DESIRED: I want to check whether that entry_points dictionary contains a console_scripts part with a script whose name starts with main, and return True or False (or throw an error).
I can find the file and the needed values like this:
grep -rnw "./" -e "entry_points"
This returns the following:
./setup.py:22: entry_points={
Does anyone know how to solve this?
Assuming that there is only one entry_points={...} block in the setup.py file, with the same syntax you have shown in your example, the following script will find the setup.py file in the current directory and produce the requested output.
#!/bin/bash

directory_to_find='.'                  # Directory path in which to find the "${py_script}" file.
py_script="setup.py"                   # Python script file name.
dictionary_to_match="console_scripts"  # Dictionary key to match.
dictionary_script_to_match="main"      # Script name to match, if the dictionary is found.
########################################################
py_script=$(find "${directory_to_find}" -name "${py_script}" -type f)

found_entrypoint=$(
    while IFS='' read -r line; do
        echo -n $line    # unquoted on purpose: collapses the leading indentation
    done <<< $(fgrep -A1 'entry_points={' ${py_script})
    echo -n '}' && echo ""
)

found_entrypoint_dictionary=$(echo ${found_entrypoint} | awk -F'{' '{print $2}' | awk -F':' '{print $1}')
found_entrypoint_dictionary=$(echo ${found_entrypoint_dictionary//\"/})
found_dictionary_script=$(echo ${found_entrypoint} | awk -F'[' '{print $2}' | awk -F'=' '{print $1}')
found_dictionary_script=$(echo ${found_dictionary_script//\"/})

if ! [[ "${found_entrypoint}" =~ ^entry_points\=\{.* ]]; then
    echo "entry_points not found."
    exit 1
fi

if [ "${found_entrypoint_dictionary}" == "${dictionary_to_match}" ] && [ "${found_dictionary_script}" == "${dictionary_script_to_match}" ]; then
    echo "${found_entrypoint}"
    echo "True"
elif [ "${found_entrypoint_dictionary}" != "${dictionary_to_match}" ] || [ "${found_dictionary_script}" != "${dictionary_script_to_match}" ]; then
    echo "${found_entrypoint}"
    echo "False"
else
    echo "Something went wrong!"
    exit 2
fi

exit 0
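If Python itself is an option, a more robust alternative to text matching is to parse the file. Below is a minimal sketch (an assumption-laden example, not part of the original answer) that walks the AST of every setup.py found under the current directory and prints True or False depending on whether entry_points contains a console_scripts entry whose script name is main; it assumes the layout shown in the question and Python 3.8+.
import ast
from pathlib import Path

def has_main_console_script(setup_path):
    """Return True if setup() is called with an entry_points dict that has a
    console_scripts entry whose script name is 'main'."""
    tree = ast.parse(Path(setup_path).read_text())
    for node in ast.walk(tree):
        # Match calls to setup(...) or setuptools.setup(...).
        if not isinstance(node, ast.Call):
            continue
        func_name = getattr(node.func, "id", getattr(node.func, "attr", None))
        if func_name != "setup":
            continue
        for kw in node.keywords:
            if kw.arg == "entry_points" and isinstance(kw.value, ast.Dict):
                for key, val in zip(kw.value.keys, kw.value.values):
                    if getattr(key, "value", None) == "console_scripts" and isinstance(val, ast.List):
                        return any(
                            isinstance(e, ast.Constant)
                            and str(e.value).split("=")[0].strip() == "main"
                            for e in val.elts
                        )
    return False

for path in Path(".").rglob("setup.py"):
    print(path, has_main_console_script(path))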

How to replace/rename the headers in a text file?

I have around 10,000 text files in a directory,
and I'd like to replace each text file's header with the same keyword (>zebra_cat).
Original file:
head -n 1 dat.txt
>bghjallshkal
Modified file
head -n 1 dat.txt
>zebra_cat
sed '/^>/ s/.*/>dat/' *.txt
The output generated by sed is a concatenated file;
by adding a loop, I redirected the output to separate output files.
Is it possible to rename the headers with their respective file names?
Original file:
head -n 1 dat.txt
>xxxxxxxx ; zxf
Modified file
head -n 1 dat.txt
>dat
Suggestions please!
This is pretty simple using sed:
#!/bin/bash
filename="dat.txt" # Or maybe use Command-line parameter ${1}.
if [ ! -f ${filename} ]; then
    echo "'${filename}' is not a file."
    exit 1
elif [ ! -w ${filename} ]; then
    echo "'${filename}' is not writable."
    exit 1
fi
sed -i "1s/.*/>${filename%.*}/" "${filename}"   # e.g. dat.txt gets the header ">dat"
The -i option tells sed to update the file in-place.
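Since there are around 10,000 files and each header should become the respective file name, a short Python sketch may be easier to adapt than looping the sed command; it assumes the files match *.txt in the current directory and that the new header is ">" plus the file name without its extension, as in the dat.txt example above.
from pathlib import Path

for path in Path(".").glob("*.txt"):
    lines = path.read_text().splitlines(keepends=True)
    if lines and lines[0].startswith(">"):   # only touch files that have a header line
        lines[0] = f">{path.stem}\n"         # e.g. dat.txt gets the header ">dat"
        path.write_text("".join(lines))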

How to set up a snakemake rule whose target files are determined by file content?

I would like to split a sam file into multiple sam files according to barcode info, and the query barcodes are listed in another file.
$ cat barcode.list
ATGCATGC
TTTTAAAA
GGGGCCCC
CGCGATGA
AAGGTTCC
....
The simple bash script below can achieve the goal.
barcode_list=./A_barcode.csv
input_bam=./A_input.bam
splited_dir=./splited_sam/A
filtered_dir="./filterd_sam/A"
mkdir -p ${splited_dir} ${filtered_dir}
header=$(samtools view -H ${input_bam})
samtools view ${input_bam} | LC_ALL=C fgrep -f <(cat ${barcode_list}) | awk -v header="${header}" -v outdir="${splited_dir}" '{barcode=substr($0, index($0, "\tCB:Z:")+6, 18); out=outdir"/"barcode".sam"; if (!header_printed[barcode]++) print header > out; print $0 >> out}'
for sam in ${splited_dir}/*.sam; do samtools view -q 30 -m 1 ${sam} -O bam -o ${filtered_dir}/$(basename ${sam} .sam).bam; done
Note: Only barcodes that are in both barcode_list file and input_bam file will be recorded into a new file.
But I want to rewrite the script as a snakemake pipeline for better scaling. The solution that I tried is shown below.
I don't know how to assign the input file names in the final step of all the rules (rule all in this example), because they are determined by both the input_bam and the input_barcode file. Meanwhile, without knowing the splited_sam file names, I can't go through the next step either.
SAMPLES = ["A", "B", "C", "D"]
# BARCODE = ???

rule all:
    input:
        splited_sam_dir = expand("splited_sam/{sample}", sample=SAMPLES)

rule split_sam:
    input:
        bar = "{sample}_barcode.csv",
        bam = "{sample}_input.bam"
    output:
        splited_sam_dir = "splited_sam/{sample}"
    shell:
        """
        header=$(samtools view -H {input.bam})
        samtools view {input.bam} | LC_ALL=C fgrep -f <(cat {input.bar}) | awk -v header="$header" -v outdir="{output.splited_sam_dir}" '{{barcode=substr($0, index($0, "\tCB:Z:")+6, 18); out=outdir"/"barcode".sam"; if (!header_printed[barcode]++) print header > out; print $0 >> out}}'
        """

rule filter_sam:
    # ??? don't know the input file names...
I think you need to define "split_sam" as a checkpoint rule; see the Snakemake documentation on checkpoints.
The DAG will be recalculated for all rules that depend on the output of this rule once the checkpoint rule is executed.
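For illustration, here is a sketch of what the aggregation side could look like, assuming split_sam is declared with checkpoint instead of rule and its output is marked as directory("splited_sam/{sample}"); the function name and the filterd_sam target pattern below are only taken over from the bash script for the example. checkpoints, glob_wildcards and expand are names Snakemake provides inside a Snakefile.
import os

def gather_split_sams(wildcards):
    # Only evaluated after the split_sam checkpoint has finished for this sample,
    # so the per-barcode .sam files already exist and can be globbed.
    ckpt_dir = checkpoints.split_sam.get(sample=wildcards.sample).output.splited_sam_dir
    barcodes = glob_wildcards(os.path.join(ckpt_dir, "{barcode}.sam")).barcode
    return expand("filterd_sam/{sample}/{barcode}.bam",
                  sample=wildcards.sample, barcode=barcodes)

# rule filter_sam would then take "splited_sam/{sample}/{barcode}.sam" as input and
# produce "filterd_sam/{sample}/{barcode}.bam", while rule all (or an aggregation
# rule) uses gather_split_sams as its input function.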

Search strings in multiple text files

I have thousands of text files on my disk.
I need to search them for selected words.
Currently, I use:
grep -Eri 'text1|text2|text3|textn' dir/ > results.txt
The result is saved to a file: results.txt
I would like the results to be saved to many files instead:
results_text1.txt, results_text2.txt, results_textn.txt
Maybe someone has encountered some kind of script, e.g. in Python?
One solution might be to use a bash for loop.
for word in text1 text2 text3 textn; do grep -Eri "$word" dir/ > "results_$word.txt"; done
You can run this directly from the command line.
By using a combination of sed and xargs:
echo "text1,text2,text3,textn" | sed "s/,/\n/g" | xargs -I{} sh -c "grep -ir {} * > result_{}"
One way, using Perl because it makes regexes and one-liners easier.
Sample data:
% mkdir dir dir/dir1 dir/dir2
% echo -e "text1\ntext2\nnope" > dir/file1.txt
% echo -e "nope\ntext3" > dir/dir1/file2.txt
% echo -e "nope\ntext2" > dir/dir1/file3.txt
Search:
% find dir -type f -exec perl -ne '/(text1|text2|text3|textn)/ or next;
$pat = $1; unless ($fh{$pat}) {
($fn = $1) =~ s/\W+/_/ag;
$fn = "results_$fn.txt";
open $fh{$pat}, ">>", $fn;
}
print { $fh{$pat} } "$ARGV:$_"' {} \;
Content of results_text1.txt:
dir/file1.txt:text1
Content of results_text2.txt:
dir/dir2/file3.txt:text2
dir/file1.txt:text2
Content of results_text3.txt:
dir/dir1/file2.txt:text3
Note:
you need to put the pattern inside parentheses to capture it; grep doesn't let you do this.
the captured pattern is then filtered (s/\W+/_/ag replaces runs of non-word characters with underscores) to make sure it is safe to use as part of a filename.
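And since the question explicitly asks about Python, here is a small sketch along the same lines; the word list and directory are taken from the question, matching is case-insensitive like grep -i, the words are treated as literal strings, and (as in the Perl version) the first matching word on a line decides which results file the line goes to.
import os
import re

words = ["text1", "text2", "text3", "textn"]
pattern = re.compile("|".join(map(re.escape, words)), re.IGNORECASE)

# One output file per search word: results_text1.txt, results_text2.txt, ...
outputs = {w.lower(): open(f"results_{w}.txt", "w") for w in words}

for root, _dirs, files in os.walk("dir"):
    for name in files:
        path = os.path.join(root, name)
        try:
            with open(path, errors="ignore") as fh:
                for line in fh:
                    match = pattern.search(line)
                    if match:
                        outputs[match.group(0).lower()].write(f"{path}:{line}")
        except OSError:
            pass  # skip files that cannot be read

for fh in outputs.values():
    fh.close()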

Removing non-alphanumeric characters with bash or python

I have many files like these:
30.230201521829.jpg
Mens-Sunglasses_L.180022111040.jpg
progressive-sunglasses.180041285287.jpg
Atmosphere.222314222509.jpg
Womens-Sunglasses-L.180023271958.jpg
DAILY ESSENTIALS.211919012115.jpg
aviator-l.Sunglasses.240202216759.jpg
aviator-l.Sunglasses.women.240202218530.jpg
I want to rename them to the following:
230201521829.jpg
180022111040.jpg
180041285287.jpg
222314222509.jpg
180023271958.jpg
211919012115.jpg
240202216759.jpg
240202218530.jpg
230201521829 is a timestamp, 180022111040 is a timestamp, 180041285287 is a timestamp, etc.
Ensure that the final file name looks like "timestamp.jpg".
But I am not able to get any further with the script.
Can a sed (Bash) command or Python be used to do it?
Could you give me an example? Thanks.
Using command substitution for renaming the files. The following code loops over the jpg files in the current directory (unless the path is modified).
Awk is used to pick out the penultimate and last fields of the file name, which are separated by ".":
for file in *.jpg
do
    mv "$file" "$(echo "$file" | awk -F'.' '{print $(NF-1)"." $NF}')"
done
I use Python, for example:
import os

pth = r"C:\Users\Test"   # raw string so the backslashes are not treated as escape sequences
dir_show = os.listdir(pth)

for list_file in dir_show:
    if list_file.lower().endswith(".jpg"):
        (shrname, exts) = os.path.splitext(list_file)
        path = os.path.join(pth, list_file)
        # keep only the part after the last "." of the base name, i.e. the timestamp
        newname = os.path.join(pth, shrname[shrname.rfind(".") + 1:] + exts)
        os.rename(path, newname)
Using perl rename one-liner:
$ touch 30.230201521829.jpg Mens-Sunglasses_L.180022111040.jpg progressive-sunglasses.180041285287.jpg Atmosphere.222314222509.jpg Womens-Sunglasses-L.180023271958.jpg Womens-Eyeglasses-R.172254027299.jpg
$ ls -1
30.230201521829.jpg
Atmosphere.222314222509.jpg
Mens-Sunglasses_L.180022111040.jpg
progressive-sunglasses.180041285287.jpg
Womens-Eyeglasses-R.172254027299.jpg
Womens-Sunglasses-L.180023271958.jpg
$ prename -v 's/^[^.]*\.//' *.*.jpg
30.230201521829.jpg renamed as 230201521829.jpg
Atmosphere.222314222509.jpg renamed as 222314222509.jpg
Mens-Sunglasses_L.180022111040.jpg renamed as 180022111040.jpg
progressive-sunglasses.180041285287.jpg renamed as 180041285287.jpg
Womens-Eyeglasses-R.172254027299.jpg renamed as 172254027299.jpg
Womens-Sunglasses-L.180023271958.jpg renamed as 180023271958.jpg
You can use parameter expansion to strip off the extension, then
remove all but the last .-delimited field from the remaining name. After that, you can reapply the extension.
for f in *; do
    ext=${f##*.}
    base=${f%.$ext}
    mv -- "$f" "${base##*.}.$ext"
done
The first line sets ext to the string following the last .. The second line sets base to the string that precedes the last . (by removing the last . and whatever $ext was set to). The third line constructs a new file name by first removing everything up to, and including, the final . in base, then reapplying the extension to the result.
#!/bin/bash
echo "test: "
echo "" > 30.230201521829.jpg
echo "" > Mens-Sunglasses_L.180022111040.jpg
echo "" > progressive-sunglasses.180041285287.jpg
echo "" > Atmosphere.222314222509.jpg
echo "" > Womens-Sunglasses-L.180023271958.jpg
echo "" > DAILY\ ESSENTIALS.211919012115.jpg
echo "" > aviator-l.Sunglasses.240202216759.jpg
echo "" > aviator-l.Sunglasses.women.240202218530.jpg
echo "before: "
ls -ltr
for f in *.jpg; do
    renamed=${f: -16}     # keep the last 16 characters: the 12-digit timestamp plus ".jpg"
    mv "${f}" "${renamed}"
done
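For completeness, a Python variant of the same idea; it is only a sketch and assumes every file name ends in a run of digits followed by .jpg (keeping just that part) and that it is run from the directory that holds the images.
import re
from pathlib import Path

for path in Path(".").glob("*.jpg"):
    match = re.search(r"(\d+)\.jpg$", path.name)
    if match:
        # e.g. "aviator-l.Sunglasses.240202216759.jpg" becomes "240202216759.jpg"
        path.rename(path.with_name(match.group(1) + ".jpg"))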
