Removing six.b from multiple files - python

I have dozens of files in the project and I want to change all occurences of six.b("...") to b"...". Can I do that with some sort of regex bash script?

It's possible entirely in Python, But I would first make a backup of my project tree, and then:
import re
import os
indir = 'files'
for root, dirs, files in os.walk(indir):
for f in files:
fname = os.path.join(root, f)
with open(fname) as f:
txt = f.read()
txt = re.sub(r'six\.(b\("[^"]*"\))', r'\1', txt)
with open(fname, 'w') as f:
f.write(txt)
print(fname)

A relatively simple bash solution (change *.foo to *.py or whatever filename pattern suits your situation):
#!/bin/bash
export FILES=`find . -type f -name '*.foo' -exec egrep -l 'six\.b\("[^\"]*"\)' {} \; 2>/dev/null`
for file in $FILES
do
cp $file $file.bak
sed 's/six\.b(\(\"[^\"]*[^\\]\"\))/b\1/' $file.bak > $file
echo $file
done
Notes:
It will only consider/modify files that match the pattern
It will make a '.bak' copy of each file it modifies
It won't handle embedded \"), e.g. six.b("asdf\")"), but I don't know that there is a trivial solution to that problem, without knowing more about the files you're manipulating. Is the end of six.b("") guaranteed to be the last ") on the line? etc.

Related

grep bash command into Python script

There is a bash command, I am trying to convert the logic into python. But I don't know what to do, I need some help with that.
bash command is this :
ls
ls *
TODAY=`date +%Y-%m-%d`
cd xx/xx/autotests/
grep -R "TEST_F(" | sed s/"TEST_F("/"#"/g | cut -f2 -d "#" | while read LINE
The logic is inside a directory, reads the filename one by one, includes all sub-folders, then lists the file matches. Any help here will be much appreciated
I tried something like follows, but it is not what I would like to have. And there are some subfolders inside, which the code is not reading the file inside them
import fnmatch
import os
from datetime import datetime
time = datetime.now()
dir_path = "/xx/xx/autotests"
dirs = os.listdir(dir_path)
TODAY = time.strftime("%Y-%m-%d")
filesOfDirectory = os.listdir(dir_path)
print(filesOfDirectory)
pattern = "TEST_F("
for file in filesOfDirectory:
if fnmatch.fnmatch(file, pattern):
print(file)
Use os.walk() to scan the directory recursively.
Open each file, loop through the lines of the file looking for lines with "TEST_F(". Then extract the part of the line after that (that's what sed and cut are doing).
for root, dirs, files in os.walk(dir_path):
for file in files:
with open(os.path.join(root, file)) as f:
for line in f:
if "TEST_F(" in line:
data = line.split("TEST_F(")[1]
print(data)

How to modify multiple json files

I have a folder with 100 json files and I have to add [ at the beginning and ] at the end of each file .
The file structure is:
{
item
}
However, I need to transform all of them like so:
[{
item
}]
How to do that?
While I would normally recommend using json.load and json.dump for anything related to JSON, due to your specific requirements the following scrip should suffice:
import os
os.chdir('path to your directory')
for fp in os.listdir('.'):
if fp.endswith('.json'):
f = open(fp, 'r+')
content = f.read()
f.seek(0)
f.truncate()
f.write('[' + content + ']')
f.close()
you can use glob module to parse through all the files. then you can read the contents and then modify that content and write back to the file
from glob import glob
for filename in glob('./json/*.json'):
f = open(filename, 'r')
contents = f.read()
f.close()
new_contents = f"[{contents}]"
f = open(filename, 'w')
f.write(new_contents)
f.close()
import glob
for fn in glob.glob('/path/*'):
with open(fn, 'r') as f:
data = f.read()
with open(fn, 'w') as f:
f.write('[' + data + ']')
TL;DR
# One file
$ jq '.=[.]' test.json | sponge test.json
# Many files
find . -type f -name '*.json' -exec sh -c "jq '.=[.]' "{}" | sponge "{}"" \;
Breakdown
Let's take a sample file
$ cat test.json
{
"hello":"world"
}
$ jq '.=[.]' test.json
Above, the dot (.) represents the root node. So I'm taking the root node and putting it inside brackets. So we get:
[
{
"hello": "world"
}
]
Now we need to take the output and put it back into the file
$ jq '.=[.]' test.json | sponge test.json
$ cat test.json
[
{
"hello": "world"
}
]
Next, let's find all the json files where we want to do this
$ find . -type f -name '*.json'
./test.json
We can iterate over each line of the find command and cat it as as follow:
$ find . -type f -name '*.json' -exec cat {}\;
{
"hello":"world"
}
or instead we can compose the jq command that we need. The filename is passed to the curly braces between -exec and \;
$ find . -type f -name '*.json' -exec sh -c "jq '.=[.]' "{}" | sponge "{}"" \;
jq
sponge
Thanks to zeppelin

Copy files from one folder to another with matching names in .txt file

I want to copy files from one big folder to another folder based on matching file names in a .txt file.
My list.txt file contains file names:
S001
S002
S003
and another big folder contains many files for ex. S001, S002, S003, S004, S005.
I only want to copy the files from this big folder that matches the file names in my list.txt file.
I have tried Bash, Python - not working.
for /f %%f in list.txt do robocopy SourceFolder/ DestinationFolder/ %%f
is not working either.
My logic in Python is not working:
import os
import shutil
def main():
destination = "DestinationFolder/copy"
source = "SourceFolder/MyBigData"
with open(source, "r") as lines:
filenames_to_copy = set(line.rstrip() for line in lines)
for filenames in os.walk(destination):
for filename in filenames:
if filename in filenames_to_copy:
shutil.copy(source, destination)
Any answers in Bash, Python or R?
Thanks
I think the issue with your Python code is that with os.walk() your filename will be a list everytime, which will not be found in your filenames_to_copy.
I'd recommend trying with os.listdir() instead as this will return a list of the names of filenames/folders as strings - easier to compare against your filenames_to_copy.
Other note - perhaps you want to do os.listdir() (or os.walk()) on the source instead of the destination. Currently, you're only copying files from the source to the destination if the file already exists in the destination.
os.walk() will return a tuple of three elements: the name of the current directory inspected, the list of folders in it, and the list of files in it. You are only interested in the latter. So your should iterate with:
for _, _, filenames in os.walk(destination):
As pointed out by JezMonkey, os.listdir() is easier to use as it will list of the files and folders in the requested directory. However, you will lose the recursive search that os.walk() enables. If all your files are in the same folder and not hidden in some folders, you'd rather use os.listdir().
The second problem I see in you code is that you copy source when I think you want to copy os.path.join(source, filename).
Can you publish the exact error you have with the Python script so that we can better help you.
UPDATE
You actually don't need to list all the files in the source folder. With os.path.exists you can check that the file exists and copy it if it does.
import os
import shutil
def main():
destination = "DestinationFolder/copy"
source = "SourceFolder/MyBigData"
with open("list.txt", "r") as lines: # adapt the name of the file to open to your exact location.
filenames_to_copy = set(line.rstrip() for line in lines)
for filename in filenames_to_copy:
source_path = os.path.join(source, filename)
if os.path.exists(source_path):
print("copying {} to {}".format(source_path, destination))
shutil.copy(source_path, destination)
Thank you #PySaad and #Guillaume for your contributions, although my script is working now: I added:
if os.path.exists(copy_to):
shutil.rmtree(copy_to)
shutil.copytree(file_to_copy, copy_to)
to the script and its working like a charm :)
Thanks a lot for your help!
You can try with below code -
import glob
big_dir = "~\big_dir"
copy_to = "~\copy_to"
copy_ref = "~\copy_ref.txt"
big_dir_files = [os.path.basename(f) for f in glob.glob(os.path.join(big_dir, '*'))]
print 'big_dir', big_dir_files # Returns all filenames from big directory
with open(copy_ref, "r") as lines:
filenames_to_copy = set(line.rstrip() for line in lines)
print filenames_to_copy # prints filename which you have in .txt file
for file in filenames_to_copy:
if file in big_dir_files: # Matches filename from ref.txt with filename in big dir
file_to_copy = os.path.join(big_dir, file)
copy_(file_to_copy, copy_to)
def copy_(source_dir, dest_dir):
files = glob.iglob(os.path.join(source_dir, '*'))
for file in files:
dest = os.path.join(dest_dir, os.path.basename(os.path.dirname(file)))
if not os.path.exists(dir_name):
os.mkdir(dest)
shutil.copy2(file, dest)
Reference:
https://docs.python.org/3/library/glob.html
If you want an overkill bash script/tool. Check https://github.com/jordyjwilliams/copy_filenames_from_txt out.
This can be invoked by ./copy_filenames_from_txt.sh -f ./text_file_with_names -d search_dir-o output_dir
The script can be summarised (without error handling/args etc) to:
cat $SEARCH_FILE | while read i; do
find $SEARCH_DIR -name "*$i*" | while read file; do
cp -r $file $OUTPUT_DIR/
done
done
The 2nd while loop here is not even strictly necessary... One could just pass $i to the cp. (eg a list of files if multiple matches) I just wanted to have each file handled separately for the tool I was writing...
To make this a bit nicer and add error handling...
The active part of the my tool in the repo is (ignore the color markers):
cat $SEARCH_FILE | while read i; do
# Handle non-matching file
if [[ -z $(find $SEARCH_DIR -name "*$i*") && $VERBOSE ]]; then
echo -e "❌: ${RED}$i${RESET} ||${YELLOW} No matching file found${RESET}"
continue
fi
# Handle case of more than one matching file
find $SEARCH_DIR -name "*$i*" | while read file; do
if [[ -n "$file" ]]; then # Check file matching
FILENAME=$(basename ${file}); DIRNAME=$(dirname "${file}")
if [[ -f $OUTPUT_DIR/"$FILENAME" ]]; then
if [[ $DIRNAME/ != $OUTPUT_DIR && $VERBOSE ]]; then
echo -e "📁: ${CYAN}$FILENAME ${GREEN}already exists${RESET} in ${MAGENTA}$OUTPUT_DIR${RESET}"
fi
else
if [[ $VERBOSE ]]; then
echo -e "✅: ${GREEN}$i${RESET} >> ${CYAN}$file${RESET} >> ${MAGENTA}$OUTPUT_DIR${RESET}"
fi
cp -r $file $OUTPUT_DIR/
fi
fi
done
done

Read all files in directory and subdirectories in Python

I'm trying to translate this bash line in python:
find /usr/share/applications/ -name "*.desktop" -exec grep -il "player" {} \; | sort | while IFS=$'\n' read APPLI ; do grep -ilqw "video" "$APPLI" && echo "$APPLI" ; done | while IFS=$'\n' read APPLI ; do grep -iql "nodisplay=true" "$APPLI" || echo "$(basename "${APPLI%.*}")" ; done
The result is to show all the videos apps installed in a Ubuntu system.
-> read all the .desktop files in /usr/share/applications/ directory
-> filter the strings "video" "player" to find the video applications
-> filter the string "nodisplay=true" and "audio" to not show audio players and no-gui apps
The result I would like to have is (for example):
kmplayer
smplayer
vlc
xbmc
So, I've tried this code:
import os
import fnmatch
apps = []
for root, dirnames, filenames in os.walk('/usr/share/applications/'):
for dirname in dirnames:
for filename in filenames:
with open('/usr/share/applications/' + dirname + "/" + filename, "r") as auto:
a = auto.read(50000)
if "Player" in a or "Video" in a or "video" in a or "player" in a:
if "NoDisplay=true" not in a or "audio" not in a:
print "OK: ", filename
filename = filename.replace(".desktop", "")
apps.append(filename)
print apps
But I've a problem with the recursive files...
How can I fix it?
Thanks
Looks like you are doing os.walk() loop incorrectly. There is no need for nested dir loop.
Please refer to Python manual for the correct example:
https://docs.python.org/2/library/os.html?highlight=walk#os.walk
for root, dirs, files in os.walk('python/Lib/email'):
for file in files:
with open(os.path.join(root, file), "r") as auto:

Renaming HTML files using <title> tags

I'm a relatively new to programming. I have a folder, with subfolders, which contain several thousand html files that are generically named, i.e. 1006.htm, 1007.htm, that I would like to rename using the tag from within the file.
For example, if file 1006.htm contains Page Title , I would like to rename it Page Title.htm. Ideally spaces are replaced with dashes.
I've been working in the shell with a bash script with no luck. How do I do this, with either bash or python?
this is what I have so far..
#!/usr/bin/env bashFILES=/Users/Ben/unzipped/*
for f in $FILES
do
if [ ${FILES: -4} == ".htm" ]
then
awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' $FILES
fi
done
I've also tried
#!/usr/bin/env bash
for f in *.html;
do
title=$( grep -oP '(?<=<title>).*(?=<\/title>)' "$f" )
mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
But I get an error from the terminal exlaing how to use grep...
use awk instead of grep in your bash script and it should work:
#!/bin/bash
for f in *.html;
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
don't forget to change your bash env on the first line ;)
EDIT full answer with all the modifications
#!/bin/bash
for f in `find . -type f | grep \.html`
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv -i "$f" "${title//[ ]/-}".html
done
Here is a python script I just wrote:
import os
import re
from lxml import etree
class MyClass(object):
def __init__(self, dirname=''):
self.dirname = dirname
self.exp_title = "<title>(.*)</title>"
self.re_title = re.compile(self.exp_title)
def rename(self):
for afile in os.listdir(self.dirname):
if os.path.isfile(afile):
originfile = os.path.join(self.dirname, afile)
with open(originfile, 'rb') as fp:
contents = fp.read()
try:
html = etree.HTML(contents)
title = html.xpath("//title")[0].text
except Exception as e:
try:
title = self.re_title.findall(contents)[0]
except Exception:
title = ''
if title:
newfile = os.path.join(self.dirname, title)
os.rename(originfile, newfile)
>>> test = MyClass('/path/to/your/dir')
>>> test.rename()
You want to use a HTML parser (likelxml.html) to parse your HTML files. Once you've got that, retrieving the title tag is one line (probably page.get_element_by_id("title").text_content()).
Translating that to a file name and renaming the document should be trivial.
A python3 recursive globbing version that does a bit of title sanitising before renaming.
import re
from pathlib import Path
import lxml.html
root = Path('.')
for path in root.rglob("*.html"):
soup = lxml.html.parse(path)
title_els = soup.xpath('/html/head/title')
if len(title_els):
title = title_els[0].text
if title:
print(f'Original title {title}')
name = re.sub(r'[^\w\s-]', '', title.lower())
name = re.sub(r'[\s]+', '-', name)
new_path = (path.parent/name).with_suffix(path.suffix)
if not Path(new_path).exists():
print(f'Renaming [{path.absolute()}] to [{new_path}]')
path.rename(new_path)
else:
print(f'{new_path.name} already exists!')

Categories

Resources