I have around 10,000 text files in a directory, and I'd like to replace each text file's header with the same keyword (>zebra_cat).
Original file
head -n 1 dat.txt
>bghjallshkal
Modified file
head -n 1 dat.txt
>zebra_cat
sed '/^>/ s/.*/>dat/' *.txt
The output generated by sed is a single concatenated stream; by adding a loop, I redirected the output to separate output files.
Is it possible to replace the headers with their respective file names instead?
Original file
head -n 1 dat.txt
>xxxxxxxx ; zxf
Modified file
head -n 1 dat.txt
>dat
Suggestions please!
This is pretty simple using sed:
#!/bin/bash
filename="dat.txt" # Or maybe use the command-line parameter ${1}.
if [ ! -f "${filename}" ]; then
    echo "'${filename}' is not a file."
    exit 1
elif [ ! -w "${filename}" ]; then
    echo "'${filename}' is not writable."
    exit 1
fi
sed -i '1s/^.*$/>'"${filename%.*}"'/' "${filename}"
The -i option tells sed to update the file in-place.
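To cover all 10,000 files, the same sed command can be wrapped in a loop. A minimal sketch, assuming every file ends in .txt and the header should become the file name without its extension:

```shell
for f in *.txt; do
    [ -e "$f" ] || continue              # skip if the glob matched nothing
    # Replace the first line with '>' plus the file name minus .txt
    sed -i '1s/.*/>'"${f%.txt}"'/' "$f"
done
```

Note this assumes GNU sed (for `-i` without a backup suffix) and file names that contain no `/` or `&` characters, which would confuse the sed replacement.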
Related
I'm trying to store the output of the commands gunzip -t / tar -t in Python, but I don't know how. In a shell terminal I have no problem catching the result with echo $?, but in Python it doesn't work with os.popen() or os.system().
My current script is below :
os.system("gunzip -t Path_to_tar.gz")
gzip_corrupt = os.popen("echo $?").read().replace('\n','')
os.system("gunzip -c Path_to_tar.gz | tar -t > /dev/null")
tar_corrup = os.popen("echo $?").read().replace('\n','')
print(tar_corrup)
print(gzip_corrupt)
Do you have an idea how to store output of gunzip -t in python please ?
I'm no Python wiz, but I'd say you need to change your os.system output from:
os.system("gunzip -c Path_to_tar.gz | tar -t > /dev/null")
to something like:
os.system("gunzip -c Path_to_tar.gz | tar -t > /tmp/myfile.out")
Then, turn around, open up /tmp/myfile.out, and read it back in, etc. (I'd suggest generating a unique name to avoid collisions between multiple runs, and including a date/time stamp to keep separate runs separate.)
This line:
tar_corrup = os.popen("echo $?").read().replace('\n','')
is only going to give you the exit code of the gunzip command - NOT the output of gunzip itself (see "What does echo $? do?").
This is a "brute-force" method - but it is easy to read and edit later, and it should work.
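If a temporary file feels clunky, the subprocess module (my suggestion, not part of the original answer) can capture both the exit status and the output in a single call:

```python
import subprocess

def check_gzip(path):
    """Run `gunzip -t path` and return (corrupt, stderr_text).

    gunzip exits non-zero when the file is corrupt or unreadable,
    so result.returncode replaces the "echo $?" trick entirely.
    """
    result = subprocess.run(
        ["gunzip", "-t", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode != 0, result.stderr.decode()
```

result.returncode is the integer exit status, and result.stdout / result.stderr hold the command's actual output, so no follow-up os.popen("echo $?") is needed.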
This is a solution to my question; I tested it on different tar files and it looks like it works:
# Check whether the backup is corrupted
gzip_corrupt = False
tar_corrup = False
if const.weekday < 6:
    incr_backup.incremental_backup()
else:
    full_backup.full_backup()
# Check the integrity of the gzip
if os.system("gunzip -t " + const.target_directory + const.backup_file_name):
    gzip_corrupt = True
# Check the integrity of the tar
if os.system("gunzip -c " + const.target_directory + const.backup_file_name + " | tar -t"):
    tar_corrup = True
# If the backup is corrupted --> send an email to the PIC mailbox
if gzip_corrupt or tar_corrup:
    mail_notice.send_email()
So apparently os.system() returns the exit status of tar -t or gunzip -t, which I test with an if block.
If os.system("tar -t ...") returns a non-zero value, it means the tar is corrupted or it's not a tar file, so my boolean is set to True.
When the tar is OK, os.system() returns 0, which Python reads as False.
I hope this helps others with this specific command in Python.
Thank you all for your help.
I would like to rename folders in a specific path based on a text list I will provide.
For example, I have a folder structure like the following:
/home/XXX-01/$file1
/home/XXX-12/$file2
/home/XXX-23/$file66
/home/XXX-32/$file44
/home/XXX-123/$file65
and a rand.txt file with the details of what each folder name needs to be changed to, for example:
XXX-22
XXX-33
XXX-55
XXX-4321
XXX-24456
The final folder structure would look like:
/home/XXX-22/$file1
/home/XXX-33/$file2
/home/XXX-55/$file66
/home/XXX-4321/$file44
/home/XXX-24456/$file65
thank you
Using GNU awk version 4.0.2
awk 'NR==FNR { # Process the list of new directory names (direcs.txt)
map[FNR]=$0 # Create an array indexed by the record number, with the directory name as the value
}
NR!=FNR { # Process the output of the find command
newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",$0) # Create the new path using the entry in the map array
newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],$0) # Create the directory to create
print "mkdir "newdir # Print the mkdir command
print "mv "$0" "newpath # Print the mv command
}' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$")
One-liner:
awk 'NR==FNR {map[FNR]=$0} NR!=FNR { newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",$0);newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],$0);print "mkdir "newdir;print "mv "$0" "newpath }' direcs.txt <(find /home -regextype posix-extended -regex "^.*[[:digit:]]+.*$")
Once you are happy that the commands look as expected, execute them by piping through to bash/sh, like so:
awk 'NR==FNR {map[FNR]=$0} NR!=FNR { newpath=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR]"\\3",$0);newdir=gensub("(/home/)(.*)(/.*$)","\\1"map[FNR],$0);print "mkdir "newdir;print "mv "$0" "newpath }' direcs.txt <(find /home/robbo -regextype posix-extended -regex "^.*[[:digit:]]+.*$") | bash
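As an alternative to generating commands with awk, the same line-by-line pairing can be done directly in bash. A sketch, assuming the existing directories sort in the same order as the lines of the list file (the function name rename_dirs is mine, not from the question):

```shell
# Pair each existing subdirectory of $1 with the next line of the list $2,
# then rename it. Relies on `find | sort` matching the list's order.
rename_dirs() {
    local base=$1 list=$2
    paste -d'\t' <(find "$base" -mindepth 1 -maxdepth 1 -type d | sort) "$list" |
    while IFS=$'\t' read -r old new; do
        [ -n "$old" ] && [ -n "$new" ] && mv "$old" "$base/$new"
    done
}
```

Note that lexical sorting puts XXX-123 before XXX-23, so check the pairing (e.g. by replacing mv with echo mv) before running it for real.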
I'm writing a bash script that automates the usage of other Python tests (imagedeploy, hydrationwiring). The script reads a .txt list of device names, then goes down the list and performs two steps (imagedeploy, hydrationwiring) on each name in the .txt.
What happens is that the hydrationwiring test returns a non-zero value at the end, which breaks the loop and ends the script.
I want the script to continue going down the list, regardless of non-zero returns, until every device in the list has been touched.
My question: how can I make my while loop continue regardless of non-zero returns?
#!/bin/bash
if [ -z "$1" ]
then
    echo "Usage: $0 devices.txt"
    exit 1
fi
FILENAME=$1
RESULTFILE="/home/user/ssim-results/RduLabTestResults"
date >> $RESULTFILE
while read p; do
    echo "TESTING $p:"
    LOGFILE="/home/user/ssim-results/RduLabTestLog_${p}.log"
    SUMMARYFILE="/home/user/ssim-results/RduLabTestLog_${p}.summary"
    echo "STEP1: imagedeploy -d $p --latest-release4"
    #imagedeploy -d $p --latest-release4
    if [ $? -eq 0 ] #imagedeploy pass/failure condition
    then
        echo "STEP2: LLDP check"
        #runtests.sh -l INFO --vx-img -i /home/frogs/vmlocker/cloud/vx/latest-vx-rel $TESTS_HOME/tests/platform/HydrationWiring.py -d $p -T $LOGFILE -r $SUMMARYFILE
        runtests.sh -l INFO --vx-img -i $VXREL3 $TESTS_HOME/tests/platform/HydrationWiring.py -d $p -T $LOGFILE -r $SUMMARYFILE
        echo "STEP3: checking result"
        if grep --quiet success $SUMMARYFILE
        then
            echo "$p PASS" >> $RESULTFILE
        else
            echo "$p FAIL" >> $RESULTFILE
        fi
    else
        echo "imagedeploy failed"
    fi
done <$FILENAME
imagedeploy is commented out because it works as intended. The issue is in step 2, the runtests.sh hydrationwiring
output:
FAILED (errors=1)
STEP3: checking result
It only tested the first device on my list because that device failed. I would like the output to be something like this:
FAILED (errors=1)
STEP3: checking result
next device...
PASS
STEP3: checking result
next device...
FAILED (error=1)
STEP3: checking result
next device...
Passed
STEP3: checking result
etc
I would like to split a SAM file into multiple SAM files according to the barcode info, where the query barcodes are listed in another file.
$ cat barcode.list
ATGCATGC
TTTTAAAA
GGGGCCCC
CGCGATGA
AAGGTTCC
....
A simple bash script like the one below can achieve the goal.
barcode_list=./A_barcode.csv
input_bam=./A_input.bam
splited_dir=./splited_sam/A
filtered_dir=./filterd_sam/A
mkdir -p ${splited_dir} ${filtered_dir}
header=$(samtools view -H ${input_bam})
samtools view ${input_bam} | LC_ALL=C fgrep -f ${barcode_list} | awk -v header="${header}" -v outdir="${splited_dir}" '{barcode=substr($0, index($0, "\tCB:Z:")+6, 18); if (!header_printed[barcode]++) {print header > outdir"/"barcode".sam"}; print $0 >> outdir"/"barcode".sam"}'
for sam in ${splited_dir}/*.sam; do samtools view -q 30 -m 1 ${sam} -O bam -o ${filtered_dir}/$(basename ${sam} .sam).bam; done
Note: only barcodes present in both the barcode_list file and the input_bam file will be written to a new file.
But I want to rewrite the script as a Snakemake pipeline for better scaling. The solution that I tried is shown below.
I don't know how to assign the input file names in the final step of all the rules (rule all in this example), because they are determined by both input_bam and the input barcode file. Meanwhile, without knowing the splited_sam file names, I can't go through to the next step either.
SAMPLES = ["A", "B", "C", "D"]
# BARCODE = ???
rule all:
    input:
        splited_sam_dir = expand("splited_sam/{sample}", sample=SAMPLES)
rule split_sam:
    input:
        bar = "{sample}_barcode.csv",
        bam = "{sample}_input.bam"
    output:
        splited_sam_dir = "splited_sam/{sample}"
    shell:
        """
        header=$(samtools view -H {input.bam})
        samtools view {input.bam} | LC_ALL=C fgrep -f {input.bar} | awk -v header="$header" -v outdir="{output.splited_sam_dir}" '{{barcode=substr($0, index($0, "\tCB:Z:")+6, 18); if (!header_printed[barcode]++) {{print header > outdir"/"barcode".sam"}}; print $0 >> outdir"/"barcode".sam"}}'
        """
rule filter_sam:
    # ??? don't know the input file name...
I think you need to define "split_sam" as a checkpoint rule, see the doc on checkpoints.
The DAG will be recalculated for all rules that depend on the output of this rule once the checkpoint rule is executed.
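A rough sketch of what that could look like, reusing the file names from the question; the directory() output and the gather rule are my assumptions, not a prescribed layout:

```python
# split_sam becomes a checkpoint so the produced barcodes can be discovered later
checkpoint split_sam:
    input:
        bar="{sample}_barcode.csv",
        bam="{sample}_input.bam"
    output:
        directory("splited_sam/{sample}")
    shell:
        "..."  # the split command from the question

def filtered_bams(wildcards):
    # Re-evaluated after the checkpoint runs, once the .sam names exist on disk
    outdir = checkpoints.split_sam.get(sample=wildcards.sample).output[0]
    barcodes = glob_wildcards(outdir + "/{barcode}.sam").barcode
    return expand("filterd_sam/{sample}/{barcode}.bam",
                  sample=wildcards.sample, barcode=barcodes)

rule gather:
    input:
        filtered_bams
    output:
        touch("gathered/{sample}.done")
```

rule all can then depend on expand("gathered/{sample}.done", sample=SAMPLES), and an ordinary filter_sam rule with output "filterd_sam/{sample}/{barcode}.bam" fills in the per-barcode step.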
I have thousands of text files on my disk.
I need to search them for selected words.
Currently, I use:
grep -Eri 'text1|text2|text3|textn' dir/ > results.txt
The result is saved to a file: results.txt
I would like the result to be saved to many files.
results_text1.txt, results_text2.txt, results_textn.txt
Maybe someone has encountered some kind of script for this, e.g. in Python?
One solution might be to use a bash for loop:
for word in text1 text2 text3 textn; do grep -Eri "$word" dir/ > "results_$word.txt"; done
You can run this directly from the command line. (Note the double quotes around $word; with single quotes the variable would not be expanded.)
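Since the question asks about Python, here is a sketch of the same idea as a function (grep_to_files is my name for it; the words, directory, and results_ prefix mirror the question):

```python
import os
import re

def grep_to_files(words, directory, out_prefix="results_"):
    """Recursively search `directory` for any of `words` (case-insensitive)
    and write matching lines to one <out_prefix><word>.txt file per word."""
    pattern = re.compile("|".join(map(re.escape, words)), re.IGNORECASE)
    handles = {}
    try:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                path = os.path.join(root, name)
                try:
                    fh = open(path, errors="ignore")
                except OSError:
                    continue  # unreadable file; skip it, like grep -s
                with fh:
                    for line in fh:
                        # One output line per distinct word matched on this line
                        for match in set(pattern.findall(line)):
                            key = match.lower()  # merge case variants
                            if key not in handles:
                                handles[key] = open(f"{out_prefix}{key}.txt", "w")
                            handles[key].write(f"{path}:{line}")
    finally:
        for out in handles.values():
            out.close()
```

For example, grep_to_files(["text1", "text2", "textn"], "dir/") writes results_text1.txt and so on, one file per word that was actually found.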
By using a combination of sed and xargs:
echo "text1,text2,text3,textn" | sed "s/,/\n/g" | xargs -I{} sh -c "grep -ir {} * > result_{}"
One way (using Perl, because it's easier for regexes and one-liners).
Sample data:
% mkdir dir dir/dir1 dir/dir2
% echo -e "text1\ntext2\nnope" > dir/file1.txt
% echo -e "nope\ntext3" > dir/dir1/file2.txt
% echo -e "nope\ntext2" > dir/dir1/file3.txt
Search:
% find dir -type f -exec perl -ne '/(text1|text2|text3|textn)/ or next;
$pat = $1; unless ($fh{$pat}) {
($fn = $1) =~ s/\W+/_/ag;
$fn = "results_$fn.txt";
open $fh{$pat}, ">>", $fn;
}
print { $fh{$pat} } "$ARGV:$_"' {} \;
Content of results_text1.txt:
dir/file1.txt:text1
Content of results_text2.txt:
dir/dir1/file3.txt:text2
dir/file1.txt:text2
Content of results_text3.txt:
dir/dir1/file2.txt:text3
Note:
you need to put the pattern inside parentheses to capture it; grep doesn't allow you to do this.
the captured pattern is then sanitized (s/\W+/_/ag replaces runs of non-word characters with underscores) to ensure it's safe as part of a filename.