Delete and modify text in a bunch of files with python

Delete and modify text in a bunch of files with python - python

I have a bunch of files called test211.csh, test212.csh and so on. The first lines of each file I have:
"#$ -N /gridware/wor(number without 2)"
"#$ -o /gridware/wor(number without 2).out"
"#$ -e /gridware/wor(number without 2).err"
"#$ -cwd"
(without "")
For example test238.csh:
"#$ -N /gridware/wor38"
"#$ -o /gridware/wor38.out"
"#$ -e /gridware/wor38.err"
"#$ -cwd"
I want only in the same file:
"#$ -N (name of the file without CSH)"
Also I want to replace "//" with "/"
How can I do that?

import os
from glob import glob
from itertools import islice
path = "full_path/"
import ntpath
for f in glob(path + "*.csv"): # find all .csh files
with open(os.path.join(path, f)) as fl:
data = fl.readlines()
with open(os.path.join(path, f), "w") as w:
# write filename - .csh
w.write("#$ -N {}\n".format(ntpath.basename(f).rstrip(".csh")))
for line in data[4:]:
# write remaining lines
w.write(line.replace("//", "/"))

Related

grep bash command into Python script

There is a bash command, I am trying to convert the logic into python. But I don't know what to do, I need some help with that.
bash command is this :
ls
ls *
TODAY=`date +%Y-%m-%d`
cd xx/xx/autotests/
grep -R "TEST_F(" | sed s/"TEST_F("/"#"/g | cut -f2 -d "#" | while read LINE
The logic is inside a directory, reads the filename one by one, includes all sub-folders, then lists the file matches. Any help here will be much appreciated
I tried something like follows, but it is not what I would like to have. And there are some subfolders inside, which the code is not reading the file inside them
import fnmatch
import os
from datetime import datetime
time = datetime.now()
dir_path = "/xx/xx/autotests"
dirs = os.listdir(dir_path)
TODAY = time.strftime("%Y-%m-%d")
filesOfDirectory = os.listdir(dir_path)
print(filesOfDirectory)
pattern = "TEST_F("
for file in filesOfDirectory:
if fnmatch.fnmatch(file, pattern):
print(file)

Use os.walk() to scan the directory recursively.
Open each file, loop through the lines of the file looking for lines with "TEST_F(". Then extract the part of the line after that (that's what sed and cut are doing).
for root, dirs, files in os.walk(dir_path):
for file in files:
with open(os.path.join(root, file)) as f:
for line in f:
if "TEST_F(" in line:
data = line.split("TEST_F(")[1]
print(data)

Copy files from one folder to another with matching names in .txt file

I want to copy files from one big folder to another folder based on matching file names in a .txt file.
My list.txt file contains file names:
S001
S002
S003
and another big folder contains many files for ex. S001, S002, S003, S004, S005.
I only want to copy the files from this big folder that matches the file names in my list.txt file.
I have tried Bash, Python - not working.
for /f %%f in list.txt do robocopy SourceFolder/ DestinationFolder/ %%f
is not working either.
My logic in Python is not working:
import os
import shutil
def main():
destination = "DestinationFolder/copy"
source = "SourceFolder/MyBigData"
with open(source, "r") as lines:
filenames_to_copy = set(line.rstrip() for line in lines)
for filenames in os.walk(destination):
for filename in filenames:
if filename in filenames_to_copy:
shutil.copy(source, destination)
Any answers in Bash, Python or R?
Thanks

I think the issue with your Python code is that with os.walk() your filename will be a list everytime, which will not be found in your filenames_to_copy.
I'd recommend trying with os.listdir() instead as this will return a list of the names of filenames/folders as strings - easier to compare against your filenames_to_copy.
Other note - perhaps you want to do os.listdir() (or os.walk()) on the source instead of the destination. Currently, you're only copying files from the source to the destination if the file already exists in the destination.

os.walk() will return a tuple of three elements: the name of the current directory inspected, the list of folders in it, and the list of files in it. You are only interested in the latter. So your should iterate with:
for _, _, filenames in os.walk(destination):
As pointed out by JezMonkey, os.listdir() is easier to use as it will list of the files and folders in the requested directory. However, you will lose the recursive search that os.walk() enables. If all your files are in the same folder and not hidden in some folders, you'd rather use os.listdir().
The second problem I see in you code is that you copy source when I think you want to copy os.path.join(source, filename).
Can you publish the exact error you have with the Python script so that we can better help you.
UPDATE
You actually don't need to list all the files in the source folder. With os.path.exists you can check that the file exists and copy it if it does.
import os
import shutil
def main():
destination = "DestinationFolder/copy"
source = "SourceFolder/MyBigData"
with open("list.txt", "r") as lines: # adapt the name of the file to open to your exact location.
filenames_to_copy = set(line.rstrip() for line in lines)
for filename in filenames_to_copy:
source_path = os.path.join(source, filename)
if os.path.exists(source_path):
print("copying {} to {}".format(source_path, destination))
shutil.copy(source_path, destination)

Thank you #PySaad and #Guillaume for your contributions, although my script is working now: I added:
if os.path.exists(copy_to):
shutil.rmtree(copy_to)
shutil.copytree(file_to_copy, copy_to)
to the script and its working like a charm :)
Thanks a lot for your help!

You can try with below code -
import glob
big_dir = "~\big_dir"
copy_to = "~\copy_to"
copy_ref = "~\copy_ref.txt"
big_dir_files = [os.path.basename(f) for f in glob.glob(os.path.join(big_dir, '*'))]
print 'big_dir', big_dir_files # Returns all filenames from big directory
with open(copy_ref, "r") as lines:
filenames_to_copy = set(line.rstrip() for line in lines)
print filenames_to_copy # prints filename which you have in .txt file
for file in filenames_to_copy:
if file in big_dir_files: # Matches filename from ref.txt with filename in big dir
file_to_copy = os.path.join(big_dir, file)
copy_(file_to_copy, copy_to)
def copy_(source_dir, dest_dir):
files = glob.iglob(os.path.join(source_dir, '*'))
for file in files:
dest = os.path.join(dest_dir, os.path.basename(os.path.dirname(file)))
if not os.path.exists(dir_name):
os.mkdir(dest)
shutil.copy2(file, dest)
Reference:
https://docs.python.org/3/library/glob.html

If you want an overkill bash script/tool. Check https://github.com/jordyjwilliams/copy_filenames_from_txt out.
This can be invoked by ./copy_filenames_from_txt.sh -f ./text_file_with_names -d search_dir-o output_dir
The script can be summarised (without error handling/args etc) to:
cat $SEARCH_FILE | while read i; do
find $SEARCH_DIR -name "*$i*" | while read file; do
cp -r $file $OUTPUT_DIR/
done
done
The 2nd while loop here is not even strictly necessary... One could just pass $i to the cp. (eg a list of files if multiple matches) I just wanted to have each file handled separately for the tool I was writing...
To make this a bit nicer and add error handling...
The active part of the my tool in the repo is (ignore the color markers):
cat $SEARCH_FILE | while read i; do
# Handle non-matching file
if [[ -z $(find $SEARCH_DIR -name "*$i*") && $VERBOSE ]]; then
echo -e "❌: ${RED}$i${RESET} ||${YELLOW} No matching file found${RESET}"
continue
fi
# Handle case of more than one matching file
find $SEARCH_DIR -name "*$i*" | while read file; do
if [[ -n "$file" ]]; then # Check file matching
FILENAME=$(basename ${file}); DIRNAME=$(dirname "${file}")
if [[ -f $OUTPUT_DIR/"$FILENAME" ]]; then
if [[ $DIRNAME/ != $OUTPUT_DIR && $VERBOSE ]]; then
echo -e "📁: ${CYAN}$FILENAME ${GREEN}already exists${RESET} in ${MAGENTA}$OUTPUT_DIR${RESET}"
fi
else
if [[ $VERBOSE ]]; then
echo -e "✅: ${GREEN}$i${RESET} >> ${CYAN}$file${RESET} >> ${MAGENTA}$OUTPUT_DIR${RESET}"
fi
cp -r $file $OUTPUT_DIR/
fi
fi
done
done

wireshark commands - python

enter image description hereThere are few wireshark .pcap files. I need to separate each .pcap to incoming and outgoing traffic (by giving source and destination mac addresses) and these separated files have to get written into two different folders namely Incoming and Outgoing. The output files (files that got separated as incoming and outgoing) have to get the same name as input files and need to get written to .csv files. I tried the below code, but not working . Any help is greatly appreciated. Thanks
import os
import csv
startdir= '/root/Desktop/Test'
suffix= '.pcap'
for root,dirs, files, in os.walk(startdir):
for name in files:
if name.endswith(suffix):
filename=os.path.join(root,name)
cmdOut = 'tshark -r "{}" -Y "wlan.sa==00:00:00:00:00:00 && wlan.da==11:11:11:11:11:11" -T fields -e frame.time_delta_displayed -e frame.len -E separator=, -E header=y > "{}"'.format(filename,filename)
cmdIn = 'tshark -r "{}" -Y "wlan.sa==11:11:11:11:11:11 && wlan.da==00:00:00:00:00:00" -T fields -e frame.time_delta_displayed -e frame.len -E separator=, -E header=y > "{}"'.format(filename,filename)
#os.system(cmd1)
#os.system(cmd2)
with open('/root/Desktop/Incoming/', 'w') as csvFile:
writer = csv.writer(csvFile)
writer.writerows(os.system(cmdIn))
with open('/root/Desktop/Outgoing/', 'w') as csvFile:
writer = csv.writer(csvFile)
writer.writerows(os.system(cmdOut))
csvFile.close()

A correct implementation might look more like:
import csv
import os
import subprocess
startdir = 'in.d' # obviously, people other than you won't have /root/Desktop/test
outdir = 'out.d'
suffix = '.pcap'
def decode_to_file(cmd, in_file, new_suffix):
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
fileName = outdir + '/' + in_file[len(startdir):-len(suffix)] + new_suffix
os.makedirs(os.path.dirname(fileName), exist_ok=True)
csv_writer = csv.writer(open(fileName, 'w'))
for line_bytes in proc.stdout:
line_str = line_bytes.decode('utf-8')
csv_writer.writerow(line_str.strip().split(','))
for root, dirs, files in os.walk(startdir):
for name in files:
if not name.endswith(suffix):
continue
in_file = os.path.join(root, name)
cmdCommon = [
'tshark', '-r', in_file,
'-T', 'fields',
'-e', 'frame.time_delta_displayed',
'-e', 'frame.len',
'-E', 'separator=,',
'-E', 'header=y',
]
decode_to_file(
cmd=cmdCommon + ['-Y', 'wlan.sa==00:00:00:00:00:00 && wlan.da==11:11:11:11:11:11'],
in_file=in_file,
new_suffix='.out.csv'
)
decode_to_file(
cmd=cmdCommon + ['-Y', 'wlan.sa==11:11:11:11:11:11 && wlan.da==00:00:00:00:00:00'],
in_file=in_file,
new_suffix='.in.csv'
)
Note:
We don't use os.system(). (This wouldn't have ever worked, since it returns a numeric exit status, not strings in a format you can write to a CSV file).
We're not needing to generate any temporary files; we can read directly into our Python code from the stdout of the tshark subprocess.
We construct our output file name by modifying the input file name (replacing its extension with .out.csv and .in.csv, respectively).
Because writerow() requires an iterable, we can generate one by splitting by line.
Note that I'm not completely clear why you wanted to use the Python CSV module at all, since the fields output appears to already be CSV, so one could also just redirect the output straight to a file with no other processing.

Removing six.b from multiple files

I have dozens of files in the project and I want to change all occurences of six.b("...") to b"...". Can I do that with some sort of regex bash script?

It's possible entirely in Python, But I would first make a backup of my project tree, and then:
import re
import os
indir = 'files'
for root, dirs, files in os.walk(indir):
for f in files:
fname = os.path.join(root, f)
with open(fname) as f:
txt = f.read()
txt = re.sub(r'six\.(b\("[^"]*"\))', r'\1', txt)
with open(fname, 'w') as f:
f.write(txt)
print(fname)

A relatively simple bash solution (change *.foo to *.py or whatever filename pattern suits your situation):
#!/bin/bash
export FILES=`find . -type f -name '*.foo' -exec egrep -l 'six\.b\("[^\"]*"\)' {} \; 2>/dev/null`
for file in $FILES
do
cp $file $file.bak
sed 's/six\.b(\(\"[^\"]*[^\\]\"\))/b\1/' $file.bak > $file
echo $file
done
Notes:
It will only consider/modify files that match the pattern
It will make a '.bak' copy of each file it modifies
It won't handle embedded \"), e.g. six.b("asdf\")"), but I don't know that there is a trivial solution to that problem, without knowing more about the files you're manipulating. Is the end of six.b("") guaranteed to be the last ") on the line? etc.

Renaming HTML files using <title> tags

I'm a relatively new to programming. I have a folder, with subfolders, which contain several thousand html files that are generically named, i.e. 1006.htm, 1007.htm, that I would like to rename using the tag from within the file.
For example, if file 1006.htm contains Page Title , I would like to rename it Page Title.htm. Ideally spaces are replaced with dashes.
I've been working in the shell with a bash script with no luck. How do I do this, with either bash or python?
this is what I have so far..
#!/usr/bin/env bashFILES=/Users/Ben/unzipped/*
for f in $FILES
do
if [ ${FILES: -4} == ".htm" ]
then
awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' $FILES
fi
done
I've also tried
#!/usr/bin/env bash
for f in *.html;
do
title=$( grep -oP '(?<=<title>).*(?=<\/title>)' "$f" )
mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
But I get an error from the terminal exlaing how to use grep...

use awk instead of grep in your bash script and it should work:
#!/bin/bash
for f in *.html;
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
don't forget to change your bash env on the first line ;)
EDIT full answer with all the modifications
#!/bin/bash
for f in `find . -type f | grep \.html`
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv -i "$f" "${title//[ ]/-}".html
done

Here is a python script I just wrote:
import os
import re
from lxml import etree
class MyClass(object):
def __init__(self, dirname=''):
self.dirname = dirname
self.exp_title = "<title>(.*)</title>"
self.re_title = re.compile(self.exp_title)
def rename(self):
for afile in os.listdir(self.dirname):
if os.path.isfile(afile):
originfile = os.path.join(self.dirname, afile)
with open(originfile, 'rb') as fp:
contents = fp.read()
try:
html = etree.HTML(contents)
title = html.xpath("//title")[0].text
except Exception as e:
try:
title = self.re_title.findall(contents)[0]
except Exception:
title = ''
if title:
newfile = os.path.join(self.dirname, title)
os.rename(originfile, newfile)
>>> test = MyClass('/path/to/your/dir')
>>> test.rename()

You want to use a HTML parser (likelxml.html) to parse your HTML files. Once you've got that, retrieving the title tag is one line (probably page.get_element_by_id("title").text_content()).
Translating that to a file name and renaming the document should be trivial.

A python3 recursive globbing version that does a bit of title sanitising before renaming.
import re
from pathlib import Path
import lxml.html
root = Path('.')
for path in root.rglob("*.html"):
soup = lxml.html.parse(path)
title_els = soup.xpath('/html/head/title')
if len(title_els):
title = title_els[0].text
if title:
print(f'Original title {title}')
name = re.sub(r'[^\w\s-]', '', title.lower())
name = re.sub(r'[\s]+', '-', name)
new_path = (path.parent/name).with_suffix(path.suffix)
if not Path(new_path).exists():
print(f'Renaming [{path.absolute()}] to [{new_path}]')
path.rename(new_path)
else:
print(f'{new_path.name} already exists!')

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Delete and modify text in a bunch of files with python - python

Related

grep bash command into Python script

Copy files from one folder to another with matching names in .txt file

wireshark commands - python

Removing six.b from multiple files

Renaming HTML files using <title> tags

Categories

Resources