import os
from subprocess import call
call('(find $readDir -type f -print0 | sort -z | xargs -0 sha1sum; find $readDir \( -type f -o -type d \) -print0 | sort -z | xargs -0 stat -c '\%n \%a') | sha1sum')
SyntaxError: invalid token
Can someone tell me which characters need to be escaped here? I thought it was the percent signs, but that's not working either. I'm looking to get the SHA-1 of the contents of a folder.
I did not dig into your command itself, but to make it work you can avoid the escaping hassle by using a triple-quoted raw string:
call(r'''(find $readDir -type f -print0 | sort -z | xargs -0 sha1sum; find $readDir \( -type f -o -type d \) -print0 | sort -z | xargs -0 stat -c '%n %a') | sha1sum''', shell=True)
Also, you need to add shell=True so that the command is passed to your shell interpreter.
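Note that $readDir is expanded by the shell, so that only works if readDir exists in the subprocess's environment. If readDir is a Python variable instead, a minimal sketch (the path below is hypothetical) is to interpolate it with shlex.quote:
import shlex
from subprocess import call

readDir = '/some/folder'  # hypothetical path; substitute your own
d = shlex.quote(readDir)  # protects against spaces and shell metacharacters
call(r'''(find {d} -type f -print0 | sort -z | xargs -0 sha1sum; find {d} \( -type f -o -type d \) -print0 | sort -z | xargs -0 stat -c '%n %a') | sha1sum'''.format(d=d), shell=True)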
That being said, it'd be a better idea to walk through your directory and calculate the hashes in pure Python, without calling any shell. Here's one way to do it:
import os
import hashlib

def sha1OfFile(filepath):
    # Hash one file in blocks so large files don't have to fit in memory.
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2**10)  # Magic number: one-kilobyte blocks.
            if not block:
                break
            sha.update(block)
    return sha.hexdigest()

for (path, dirs, files) in os.walk('.'):
    for name in files:
        filepath = os.path.join(path, name)
        print('{}: {}'.format(filepath, sha1OfFile(filepath)))
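If you want a single hash for the whole folder rather than one per file, one approach (a sketch, not a standard convention) is to fold every file's path and digest into one running SHA-1, walking the tree in sorted order so the result is deterministic:
import os
import hashlib

def sha1OfDir(dirpath):
    # One SHA-1 over every file's path and content digest under dirpath.
    total = hashlib.sha1()
    for path, dirs, files in sorted(os.walk(dirpath)):
        for name in sorted(files):
            filepath = os.path.join(path, name)
            total.update(filepath.encode('utf-8'))              # account for names/layout
            total.update(sha1OfFile(filepath).encode('ascii'))  # account for contents
    return total.hexdigest()

print(sha1OfDir('.'))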
In a folder, I have some markdown files that have links to some png or jpg images. The images are saved in the attachments sub-folder.
Now if I delete an image's link (say ![](attachments/fig1.png)) from a markdown file, the corresponding image file, of course, doesn't get removed. I want to move that image to a .trash sub-folder. I put together a short shell script for that (with help from https://www.stevemar.net/remove-unused-images/), but it does nothing!
#!zsh
imagepaths=$(find . -name '*.jpg' -o -name '*.png')
for imagepath in $imagepaths; do
filename=$(basename -- $imagepath)
if ! grep -q --exclude-dir=".git" $filename .; then
mv $imagepath ./.trash
fi
done
1. Find all image files
find . -type f -name "*.jpg" -or -name "*.png"
2. Create the awk script script.awk
BEGIN {                          # before processing input file "markdown.txt"
    RS = "^$";                   # read the whole input file as a single string
    split(filesList, filesArr);  # convert filesList into array filesArr
}
{                                # process input file "markdown.txt" as a single string
    for (i in filesArr) {            # for each file name in filesArr
        if ($0 ~ filesArr[i]) {      # if the current file name is matched in the input
            delete filesArr[i];      # remove it from filesArr
        }
    }
}
END {                            # after processing input file "markdown.txt"
    for (i in filesArr) {        # for each unmatched file name left in filesArr
        printf("mv \"%s\" ./.trash\n", filesArr[i]);  # print an "mv" command
    }
}
3. Print the mv commands for all unmatched files
awk -f script.awk -v filesList="$(find . -type f -name "*.jpg" -or -name "*.png")" markdown.txt
4. Execute all mv commands at once
bash <<< $(awk -f script.awk -v filesList="$(find . -type f -name "*.jpg" -or -name "*.png")" markdown.txt)
This is the Python code that worked for me.
import shutil
from pathlib import Path

cwd = Path.cwd()
attachment_folder = cwd / '../attachments'
note_folder = cwd / '../'
trash_folder = Path('../Trash')
trash_folder.mkdir(exist_ok=True)  # shutil.move fails if the target folder is missing

all_note_paths = list(note_folder.glob('*.md'))
all_attachment_paths = list(attachment_folder.glob('*.*'))

# Concatenate all notes into one string, then look for each attachment's stem in it.
all_texts = ''
for note_path in all_note_paths:
    with open(note_path, 'r') as f:
        all_texts += f.read()

for attachment_path in all_attachment_paths:
    if attachment_path.stem not in all_texts:
        print(f'{attachment_path.name} moved to Trash')
        shutil.move(attachment_path, trash_folder / attachment_path.name)
I have a folder with 100 JSON files and I have to add [ at the beginning and ] at the end of each file.
The file structure is:
{
item
}
However, I need to transform all of them like so:
[{
item
}]
How to do that?
While I would normally recommend using json.load and json.dump for anything related to JSON, due to your specific requirements the following script should suffice:
import os

os.chdir('path to your directory')
for fp in os.listdir('.'):
    if fp.endswith('.json'):
        with open(fp, 'r+') as f:
            content = f.read()
            f.seek(0)
            f.truncate()
            f.write('[' + content + ']')
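For completeness, the json.load/json.dump route mentioned above would look roughly like this (a sketch; note it re-serialises the files, so whitespace and formatting may change):
import json
import os

os.chdir('path to your directory')
for fp in os.listdir('.'):
    if fp.endswith('.json'):
        with open(fp) as f:
            data = json.load(f)             # parse the existing top-level object
        with open(fp, 'w') as f:
            json.dump([data], f, indent=2)  # write it back wrapped in a list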
You can use the glob module to iterate over all the files, then read the contents, modify them, and write them back to the file:
from glob import glob

for filename in glob('./json/*.json'):
    with open(filename, 'r') as f:
        contents = f.read()
    new_contents = f"[{contents}]"
    with open(filename, 'w') as f:
        f.write(new_contents)
import glob

for fn in glob.glob('/path/*'):
    with open(fn, 'r') as f:
        data = f.read()
    with open(fn, 'w') as f:
        f.write('[' + data + ']')
TL;DR
# One file
$ jq '.=[.]' test.json | sponge test.json
# Many files
find . -type f -name '*.json' -exec sh -c 'jq ".=[.]" "$1" | sponge "$1"' sh {} \;
Breakdown
Let's take a sample file
$ cat test.json
{
"hello":"world"
}
$ jq '.=[.]' test.json
Above, the dot (.) represents the root node. So I'm taking the root node and putting it inside brackets. So we get:
[
{
"hello": "world"
}
]
Now we need to take the output and put it back into the file
$ jq '.=[.]' test.json | sponge test.json
$ cat test.json
[
{
"hello": "world"
}
]
Next, let's find all the json files where we want to do this
$ find . -type f -name '*.json'
./test.json
We can iterate over each file that find locates and cat it, as follows:
$ find . -type f -name '*.json' -exec cat {} \;
{
"hello":"world"
}
or instead we can compose the jq command that we need. find substitutes the filename for {}, which sh -c then receives as $1, keeping the quoting safe:
$ find . -type f -name '*.json' -exec sh -c 'jq ".=[.]" "$1" | sponge "$1"' sh {} \;
Tools used: jq and sponge (from moreutils).
Thanks to zeppelin
I would like to run these three bash commands in Python:
sed $'s/\r//' -i filename
sed -i 's/^ *//; s/ *$//; /^$/d' filename
awk -F, 'NF==10' filename > temp_file && mv temp_file filename
I wrote the following code:
cmd_1 = ["sed $'s/\r//' -i", file]
cmd_2 = ["sed -i 's/^ *//; s/ *$//; /^$/d'", file]
cmd_3 = ["awk -F, 'NF==10'", file, "> tmp_file && mv tmp_file", file]
subprocess.run(cmd_1)
subprocess.run(cmd_2)
subprocess.run(cmd_3)
But I'm getting this error here:
FileNotFoundError: [Errno 2] No such file or directory: "sed $'s/\r//' -i": "sed $'s/\r//' -i"
What am I getting wrong?
If you provide the command as a list, then each argument should be a separate list member. Therefore:
cmd_1 = ["sed" r"s/\r//", "-i", file]
cmd_2 = ["sed" "-i" "s/^ *//; s/ *$//; /^$/d", file]
subprocess.run(cmd_1)
subprocess.run(cmd_2)
The last command requires the operators > and && provided by the shell, so you will need to also specify shell=True, and make the command a string:
cmd_3 = f"awk -F, NF==10 '{file}' > tmp_file && mv temp_file '{file}'"
subprocess.run(cmd_3, shell=True)
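Alternatively, all three commands can be done in pure Python with no shell at all. A minimal sketch (assuming the file fits in memory; clean_file is a hypothetical helper name):
def clean_file(path, num_fields=10):
    # Strip CRs, trim whitespace, drop blank lines,
    # and keep only rows with num_fields comma-separated fields.
    with open(path) as f:
        lines = [line.replace('\r', '').strip() for line in f]
    kept = [line for line in lines if line and len(line.split(',')) == num_fields]
    with open(path, 'w') as f:
        f.write('\n'.join(kept) + '\n')

clean_file(file)  # 'file' as defined in the snippets above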
You have to use the shell=True parameter and pass each command as a single string:
subprocess.run(f"sed -i 's/\\r//' {file}", shell=True)
I have dozens of files in the project and I want to change all occurrences of six.b("...") to b"...". Can I do that with some sort of regex bash script?
It's possible entirely in Python, but I would first make a backup of my project tree, and then:
import re
import os

indir = 'files'
for root, dirs, files in os.walk(indir):
    for name in files:
        fname = os.path.join(root, name)
        with open(fname) as f:
            txt = f.read()
        # six.b("...")  ->  b"..."  (capture only the quoted string literal)
        txt = re.sub(r'six\.b\(("[^"]*")\)', r'b\1', txt)
        with open(fname, 'w') as f:
            f.write(txt)
        print(fname)
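A quick sanity check of that substitution on a sample line (hypothetical input, just to show the rewrite):
>>> import re
>>> re.sub(r'six\.b\(("[^"]*")\)', r'b\1', 'x = six.b("hello")')
'x = b"hello"'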
A relatively simple bash solution (change *.foo to *.py or whatever filename pattern suits your situation):
#!/bin/bash
export FILES=`find . -type f -name '*.foo' -exec egrep -l 'six\.b\("[^\"]*"\)' {} \; 2>/dev/null`
for file in $FILES
do
cp $file $file.bak
sed 's/six\.b(\(\"[^\"]*[^\\]\"\))/b\1/' $file.bak > $file
echo $file
done
Notes:
It will only consider/modify files that match the pattern
It will make a '.bak' copy of each file it modifies
It won't handle embedded \"), e.g. six.b("asdf\")"), but I don't know that there is a trivial solution to that problem, without knowing more about the files you're manipulating. Is the end of six.b("") guaranteed to be the last ") on the line? etc.
I'm a relatively new to programming. I have a folder, with subfolders, which contain several thousand html files that are generically named, i.e. 1006.htm, 1007.htm, that I would like to rename using the tag from within the file.
For example, if file 1006.htm contains <title>Page Title</title>, I would like to rename it Page Title.htm. Ideally spaces are replaced with dashes.
I've been working in the shell with a bash script with no luck. How do I do this, with either bash or python?
this is what I have so far..
#!/usr/bin/env bash
FILES=/Users/Ben/unzipped/*
for f in $FILES
do
if [ ${FILES: -4} == ".htm" ]
then
awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' $FILES
fi
done
I've also tried
#!/usr/bin/env bash
for f in *.html;
do
title=$( grep -oP '(?<=<title>).*(?=<\/title>)' "$f" )
mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
But I get an error from the terminal explaining how to use grep...
Use awk instead of grep in your bash script and it should work:
#!/bin/bash
for f in *.html;
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
don't forget to adjust the shebang on the first line ;)
EDIT: full answer with all the modifications
#!/bin/bash
for f in `find . -type f | grep '\.html'`
do
title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
mv -i "$f" "${title//[ ]/-}".html
done
Here is a Python script I just wrote:
import os
import re
from lxml import etree

class MyClass(object):
    def __init__(self, dirname=''):
        self.dirname = dirname
        self.exp_title = "<title>(.*)</title>"
        self.re_title = re.compile(self.exp_title)

    def rename(self):
        for afile in os.listdir(self.dirname):
            originfile = os.path.join(self.dirname, afile)
            if os.path.isfile(originfile):
                with open(originfile, 'rb') as fp:
                    contents = fp.read()
                try:
                    html = etree.HTML(contents)
                    title = html.xpath("//title")[0].text
                except Exception:
                    try:
                        # Fall back to the regex; decode first, since contents is bytes.
                        title = self.re_title.findall(contents.decode('utf-8', 'ignore'))[0]
                    except Exception:
                        title = ''
                if title:
                    newfile = os.path.join(self.dirname, title)
                    os.rename(originfile, newfile)
>>> test = MyClass('/path/to/your/dir')
>>> test.rename()
You want to use an HTML parser (likely lxml.html) to parse your HTML files. Once you've got that, retrieving the title tag is one line (probably something like doc.find(".//title").text_content()).
Translating that to a file name and renaming the document should be trivial.
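A minimal sketch of that idea, assuming lxml is installed and the file actually has a title:
import os
import lxml.html

doc = lxml.html.parse('1006.htm')                        # parse one file
title = doc.find('.//title').text_content()              # grab the <title> text
os.rename('1006.htm', title.replace(' ', '-') + '.htm')  # dashes for spaces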
A Python 3 recursive globbing version that does a bit of title sanitising before renaming.
import re
from pathlib import Path
import lxml.html

root = Path('.')
for path in root.rglob("*.html"):
    soup = lxml.html.parse(path)
    title_els = soup.xpath('/html/head/title')
    if len(title_els):
        title = title_els[0].text
        if title:
            print(f'Original title {title}')
            name = re.sub(r'[^\w\s-]', '', title.lower())
            name = re.sub(r'[\s]+', '-', name)
            new_path = (path.parent / name).with_suffix(path.suffix)
            if not Path(new_path).exists():
                print(f'Renaming [{path.absolute()}] to [{new_path}]')
                path.rename(new_path)
            else:
                print(f'{new_path.name} already exists!')