How to replace/rename the headers in a text file? - python

I have around 10,000 text files in a directory,
and I'd like to replace each text file's header with the same keyword (>zebra_cat).
Original file
head -n 1 dat.txt
>bghjallshkal
Modified file
head -n 1 dat.txt
>zebra_cat
sed '/^>/ s/.*/>dat/' *.txt
The output generated by sed is one concatenated stream;
by adding a loop, I redirected the output to separate output files.
Is it also possible to rename the headers to their respective file names?
Original file
head -n 1 dat.txt
>xxxxxxxx ; zxf
Modified file
head -n 1 dat.txt
>dat
Suggestions please!

This is pretty simple using sed:
#!/bin/bash
filename="dat.txt"   # Or take it from a command-line parameter: ${1}.
if [ ! -f "${filename}" ]; then
    echo "'${filename}' is not a file."
    exit 1
elif [ ! -w "${filename}" ]; then
    echo "'${filename}' is not writable."
    exit 1
fi
sed -i "1s/^.*$/>${filename%.*}/" "${filename}"
The -i option tells sed to update the file in place, and ${filename%.*} strips the extension so that dat.txt gets the header >dat.
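To cover all 10,000 files, the same idea can be wrapped in a loop. A minimal Python sketch, assuming every *.txt file in the current directory should get its own name (minus the extension) as its header:
from pathlib import Path

# Rewrite the first line of every .txt file to ">" + the file's own stem,
# e.g. dat.txt gets the header ">dat".
for path in Path(".").glob("*.txt"):
    lines = path.read_text().splitlines(keepends=True)
    if lines:
        lines[0] = f">{path.stem}\n"
        path.write_text("".join(lines))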

Related

Store output of gunzip/tar -t in Python

I'm trying to store the output of the commands gunzip -t and tar -t in Python, but I don't know how. In a shell terminal I have no problem catching the result with echo $?, but in Python it doesn't work with os.popen() or os.system().
My current script is below:
os.system("gunzip -t Path_to_tar.gz")
gzip_corrupt = os.popen("echo $?").read().replace('\n','')
os.system("gunzip -c Path_to_tar.gz | tar -t > /dev/null")
tar_corrup = os.popen("echo $?").read().replace('\n','')
print(tar_corrup)
print(gzip_corrupt)
Do you have an idea how to store the output of gunzip -t in Python, please?
I'm no Python wiz, but I'd say you need to change your os.system call from:
os.system("gunzip -c Path_to_tar.gz | tar -t > /dev/null")
to something like:
os.system("gunzip -c Path_to_tar.gz | tar -t > /tmp/myfile.out")
Then, turn around, open up /tmp/myfile.out, and read it back in, etc. (I'd suggest generating a unique name to avoid collisions between multiple runs that would cause errors - also include a date/time stamp to keep separate runs separate.)
This line:
tar_corrup = os.popen("echo $?").read().replace('\n','')
is only going to give you the exit code of the gunzip command - NOT the output of gunzip itself (see "What does echo $? do?").
This is a "brute-force" method - but easy to read, and edit later, and should work.
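For what it's worth, Python's subprocess module can capture both the exit status and the output directly, with no temporary file. A minimal sketch, assuming Python 3.7+ for capture_output (the archive path is taken from the question):
import subprocess

# Check gzip integrity; returncode is 0 when the archive is intact.
gzip_check = subprocess.run(["gunzip", "-t", "Path_to_tar.gz"],
                            capture_output=True, text=True)
gzip_corrupt = gzip_check.returncode != 0

# Pipe the decompressed stream through tar -t to check the listing.
tar_check = subprocess.run("gunzip -c Path_to_tar.gz | tar -t",
                           shell=True, capture_output=True, text=True)
tar_corrupt = tar_check.returncode != 0

print(gzip_corrupt, tar_corrupt)
print(tar_check.stdout)   # the actual file listing from tar -t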
This is the solution to my question; I tested it on different tar files and it looks like it works:
# Check whether the backup is corrupted
gzip_corrupt = False
tar_corrup = False
if const.weekday < 6:
    incr_backup.incremental_backup()
else:
    full_backup.full_backup()
# Check the integrity of the gzip
if os.system("gunzip -t " + const.target_directory + const.backup_file_name):
    gzip_corrupt = True
# Check the integrity of the tar
if os.system("gunzip -c " + const.target_directory + const.backup_file_name + " | tar -t"):
    tar_corrup = True
# If the backup is corrupted --> send an email to the PIC mailbox
if gzip_corrupt or tar_corrup:
    mail_notice.send_email()
So apparently os.system() returns the exit status of tar -t or gunzip -t, and I test it with an if block.
If os.system("tar -t ...") returns something non-zero, it means the tar is corrupted or it's not a tar file, so my boolean becomes True.
When the tar is OK, os.system() returns 0, which Python reads as False.
I hope this will help others with this specific command in Python.
Thank you all for the help.
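One caveat worth adding: on POSIX systems os.system() returns the raw wait status rather than the bare exit code, so treating any non-zero value as failure works, but the value itself is not the exit code. A small sketch, assuming Python 3.9+ for os.waitstatus_to_exitcode (the archive name is illustrative):
import os

status = os.system("gunzip -t backup.tar.gz")   # raw wait status, not the exit code
exit_code = os.waitstatus_to_exitcode(status)   # 0 means the gzip is intact
print("corrupted:", exit_code != 0)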

Rename multiple folder names based on Text file

I would like to rename folder names under a specific path, based on a text list I will provide.
For example, I have a folder structure like the following:
/home/XXX-01/$file1
/home/XXX-12/$file2
/home/XXX-23/$file66
/home/XXX-32/$file44
/home/XXX-123/$file65
and a rand.txt file listing what each folder name needs to be changed to, for example:
XXX-22
XXX-33
XXX-55
XXX-4321
XXX-24456
The final folder structure would then look like this:
/home/XXX-22/$file1
/home/XXX-33/$file2
/home/XXX-55/$file66
/home/XXX-4321/$file44
/home/XXX-24456/$file65
Thank you.
Using GNU awk (version 4.0.2):
awk 'NR==FNR {             # Process the list of directories (direcs.txt)
    map[FNR]=$0            # Build an array indexed by record number, with the directory name as the value
}
NR!=FNR {                  # Process the output of the find command
    newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",$0)  # Create the new path using the entry in the map array
    newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],$0)        # Create the directory to create
    print "mkdir "newdir   # Print the mkdir command
    print "mv "$0" "newpath  # Print the mv command
}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$")
One-liner:
awk 'NR==FNR {map[FNR]=$0} NR!=FNR {newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",$0); newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],$0); print "mkdir "newdir; print "mv "$0" "newpath}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$")
Once you are happy that the commands look as expected, execute them by piping through to bash/sh, like so:
awk 'NR==FNR {map[FNR]=$0} NR!=FNR {newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",$0); newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],$0); print "mkdir "newdir; print "mv "$0" "newpath}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$") | bash
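For comparison, the same pairing idea as a Python sketch. Like the awk answer, it assumes the order in which the old folders are listed lines up with the order of names in rand.txt (here alphabetical, which is an assumption to verify first):
import os

base = "/home"   # adjust to the real parent path

# One new name per line, in the order the old folders should be matched.
with open("rand.txt") as fh:
    new_names = [line.strip() for line in fh if line.strip()]

# Pair the existing folders with the new names; the alphabetical sort here
# is an assumption - adjust the ordering to match how rand.txt was built.
old_dirs = sorted(d for d in os.listdir(base)
                  if os.path.isdir(os.path.join(base, d)))

for old, new in zip(old_dirs, new_names):
    print(f"renaming {old} -> {new}")
    os.rename(os.path.join(base, old), os.path.join(base, new))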

Bash While Loop Continue Despite non 0 return

I'm writing a bash script that automates the usage of other Python tests (imagedeploy, hydrationwiring). The bash script reads a .txt list of device names, then goes down the list and performs two steps (imagedeploy, hydrationwiring) on each name in the .txt.
What happens is that the hydrationwiring test returns a non-zero value at the end, which breaks the loop and ends the script.
I want the script to continue going down the list, regardless of non-zero returns, until each device in the list has been touched.
My question: how can I make my while loop continue regardless of non-zero returns?
#!/bin/bash
if [ -z "$1" ]
then
    echo "Usage: devices.txt"
    exit 1
fi
FILENAME=$1
RESULTFILE="/home/user/ssim-results/RduLabTestResults"
date >> $RESULTFILE
while read p; do
    echo "TESTING $p:"
    LOGFILE="/home/user/ssim-results/RduLabTestLog_${p}.log"
    SUMMARYFILE="/home/user/ssim-results/RduLabTestLog_${p}.summary"
    echo "STEP1: imagedeploy -d $p --latest-release4"
    #imagedeploy -d $p --latest-release4
    if [ $? -eq 0 ] # imagedeploy pass/failure condition
    then
        echo "STEP2: LLDP check"
        #runtests.sh -l INFO --vx-img -i /home/frogs/vmlocker/cloud/vx/latest-vx-rel $TESTS_HOME/tests/platform/HydrationWiring.py -d $p -T $LOGFILE -r $SUMMARYFILE
        runtests.sh -l INFO --vx-img -i $VXREL3 $TESTS_HOME/tests/platform/HydrationWiring.py -d $p -T $LOGFILE -r $SUMMARYFILE
        echo "STEP3: checking result"
        if grep --quiet success $SUMMARYFILE
        then
            echo "$p PASS" >> $RESULTFILE
        else
            echo "$p FAIL" >> $RESULTFILE
        fi
    else
        echo "imagedeploy failed"
    fi
done < "$FILENAME"
imagedeploy is commented out because it works as intended. The issue is in step 2. The runtests.sh hydrationwiring output:
FAILED (errors=1)
STEP3: checking result
It only tested the first device on my list because it failed, I would like the output to be something like this:
FAILED (errors=1)
STEP3: checking result
next device...
PASS
STEP3: checking result
next device...
FAILED (errors=1)
STEP3: checking result
next device...
Passed
STEP3: checking result
etc

How to set up a snakemake rule whose target files are determined by file content?

I would like to split a SAM file into multiple SAM files according to the barcode info, where the query barcodes are listed in another file.
$ cat barcode.list
ATGCATGC
TTTTAAAA
GGGGCCCC
CGCGATGA
AAGGTTCC
....
The simple bash script below can achieve the goal.
barcode_list=./A_barcode.csv
input_bam=./A_input.bam
splited_dir=./splited_sam/A
filtered_dir=./filtered_sam/A
mkdir -p ${splited_dir} ${filtered_dir}
header=$(samtools view -H ${input_bam})
samtools view ${input_bam} | LC_ALL=C fgrep -f ${barcode_list} | awk -v header="${header}" -v outdir="${splited_dir}" '{barcode=substr($0, index($0, "\tCB:Z:")+6, 18); file=outdir"/"barcode".sam"; if (!header_printed[barcode]++) {print header >> file} print $0 >> file}'
for sam in ${splited_dir}/*.sam; do samtools view -q 30 -m 1 ${sam} -O bam -o ${filtered_dir}/$(basename ${sam} .sam).bam; done
Note: only barcodes present in both the barcode list file and the input BAM will be written out to a new file.
But I want to rewrite the script as a Snakemake pipeline for better scaling. My attempt is shown below.
I don't know how to assign the input file names in the final step of all the rules (rule all in this example), because they are determined by both the input BAM and the barcode file. Meanwhile, without knowing the split SAM filenames, I can't go on to the next step either.
SAMPLES = ["A", "B", "C", "D"]
# BARCODE = ???

rule all:
    input:
        splited_sam_dir = expand("splited_sam/{sample}", sample=SAMPLES)

rule split_sam:
    input:
        bar = "{sample}_barcode.csv",
        bam = "{sample}_input.bam"
    output:
        splited_sam_dir = "splited_sam/{sample}"
    shell:
        """
        header=$(samtools view -H {input.bam})
        samtools view {input.bam} | LC_ALL=C fgrep -f {input.bar} | awk -v header="$header" -v outdir="{output.splited_sam_dir}" '{{barcode=substr($0, index($0, "\tCB:Z:")+6, 18); file=outdir"/"barcode".sam"; if (!header_printed[barcode]++) {{print header >> file}} print $0 >> file}}'
        """

rule filter_sam:
    # ??? don't know the input file names...
I think you need to define split_sam as a checkpoint rule; see the documentation on checkpoints.
The DAG will be recalculated for all rules that depend on the output of this rule once the checkpoint rule is executed.
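A minimal sketch of that pattern, following the documented checkpoint/aggregation idiom (the filtered_sam naming and the aggregate rule are illustrative assumptions, not tested against the real data):
checkpoint split_sam:
    input:
        bar="{sample}_barcode.csv",
        bam="{sample}_input.bam"
    output:
        directory("splited_sam/{sample}")
    shell:
        "..."  # the samtools/fgrep/awk pipeline from above

def gather_filtered_bams(wildcards):
    # Evaluated only after the checkpoint has run, so the barcode .sam
    # files it produced can be globbed to build the target list.
    outdir = checkpoints.split_sam.get(sample=wildcards.sample).output[0]
    barcodes = glob_wildcards(outdir + "/{barcode}.sam").barcode
    return expand("filtered_sam/{sample}/{barcode}.bam",
                  sample=wildcards.sample, barcode=barcodes)

rule filter_all:
    input:
        gather_filtered_bams
    output:
        touch("filtered_sam/{sample}.done")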

Search strings in multiple text files

I have thousands of text files on my disk.
I need to search them for selected words.
Currently, I use:
grep -Eri 'text1|text2|text3|textn' dir/ > results.txt
The result is saved to a single file: results.txt
I would like the results to be saved to many files:
results_text1.txt, results_text2.txt, results_textn.txt
Maybe someone has encountered some kind of script, e.g. in Python?
One solution might be to use a bash for loop:
for word in text1 text2 text3 textn; do grep -Eri "$word" dir/ > "results_${word}.txt"; done
You can run this directly from the command line. (Note that $word must be in double quotes, not single quotes, so the shell expands it.)
By using a combination of sed and xargs:
echo "text1,text2,text3,textn" | sed "s/,/\n/g" | xargs -I{} sh -c "grep -ir {} * > result_{}"
One way (using Perl, because it's easier for regexes and one-liners).
Sample data:
% mkdir dir dir/dir1 dir/dir2
% echo -e "text1\ntext2\nnope" > dir/file1.txt
% echo -e "nope\ntext3" > dir/dir1/file2.txt
% echo -e "nope\ntext2" > dir/dir2/file3.txt
Search:
% find dir -type f -exec perl -ne '/(text1|text2|text3|textn)/ or next;
$pat = $1; unless ($fh{$pat}) {
($fn = $1) =~ s/\W+/_/ag;
$fn = "results_$fn.txt";
open $fh{$pat}, ">>", $fn;
}
print { $fh{$pat} } "$ARGV:$_"' {} \;
Content of results_text1.txt:
dir/file1.txt:text1
Content of results_text2.txt:
dir/dir2/file3.txt:text2
dir/file1.txt:text2
Content of results_text3.txt:
dir/dir1/file2.txt:text3
Note:
you need to put the pattern inside parentheses to capture it; plain grep doesn't let you recover which alternative matched.
the captured pattern is then filtered (s/\W+/_/ag replaces runs of non-word characters with underscores) to ensure it's safe as part of a filename.
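Since the question mentions Python: a sketch of the same idea in Python, assuming dir/ should be searched recursively and the word list is known up front (both names are illustrative):
import os
import re

# Hypothetical word list; adjust to the real search terms.
patterns = ["text1", "text2", "text3", "textn"]
regex = re.compile("|".join(map(re.escape, patterns)), re.IGNORECASE)

handles = {}
for root, _dirs, files in os.walk("dir"):
    for name in files:
        path = os.path.join(root, name)
        with open(path, errors="ignore") as fh:
            for line in fh:
                match = regex.search(line)
                if match:
                    word = match.group(0).lower()
                    if word not in handles:
                        # One output file per matched word, e.g. results_text1.txt
                        handles[word] = open(f"results_{word}.txt", "w")
                    # Mirror grep's path:line output format.
                    handles[word].write(f"{path}:{line}")

for fh in handles.values():
    fh.close()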
