I have a folder with 100 JSON files and I need to add [ at the beginning and ] at the end of each file.
The file structure is:
{
item
}
However, I need to transform all of them like so:
[{
item
}]
How can I do that?
While I would normally recommend using json.load and json.dump for anything related to JSON, due to your specific requirements the following script should suffice:
import os

os.chdir('path to your directory')
for fp in os.listdir('.'):
    if fp.endswith('.json'):
        with open(fp, 'r+') as f:
            content = f.read()
            f.seek(0)
            f.truncate()
            f.write('[' + content + ']')
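For reference, a json.load/json.dump version of the same loop would look roughly like this (a sketch; it assumes every file holds a single valid JSON value):

import json
import os

os.chdir('path to your directory')
for fp in os.listdir('.'):
    if fp.endswith('.json'):
        with open(fp) as f:
            data = json.load(f)   # parse the existing object
        with open(fp, 'w') as f:
            json.dump([data], f)  # write it back wrapped in a list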
You can use the glob module to iterate over all the files; read each file's contents, modify the content, and write it back to the file:
from glob import glob

for filename in glob('./json/*.json'):
    with open(filename, 'r') as f:
        contents = f.read()
    new_contents = f"[{contents}]"
    with open(filename, 'w') as f:
        f.write(new_contents)
import glob

for fn in glob.glob('/path/*'):
    with open(fn, 'r') as f:
        data = f.read()
    with open(fn, 'w') as f:
        f.write('[' + data + ']')
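All of the snippets above rewrite each file in place. If you want to be safer against an interruption mid-write (my suggestion, not part of the answers above), write to a temporary file first and atomically swap it in:

import glob
import os
import tempfile

for fn in glob.glob('./json/*.json'):
    with open(fn, 'r') as f:
        data = f.read()
    # Write the new content to a temp file in the same directory, then
    # atomically replace the original, so a crash cannot leave a
    # half-written JSON file behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(fn) or '.')
    with os.fdopen(fd, 'w') as f:
        f.write('[' + data + ']')
    os.replace(tmp, fn)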
TL;DR
# One file
$ jq '.=[.]' test.json | sponge test.json
# Many files
find . -type f -name '*.json' -exec sh -c 'jq ".=[.]" "$1" | sponge "$1"' sh {} \;
Breakdown
Let's take a sample file
$ cat test.json
{
"hello":"world"
}
$ jq '.=[.]' test.json
Above, the dot (.) represents the root node. So I'm taking the root node and putting it inside brackets. So we get:
[
{
"hello": "world"
}
]
Now we need to take the output and put it back into the file. A plain > redirect would truncate test.json before jq could read it, which is why sponge is used to soak up the output first:
$ jq '.=[.]' test.json | sponge test.json
$ cat test.json
[
{
"hello": "world"
}
]
Next, let's find all the json files where we want to do this
$ find . -type f -name '*.json'
./test.json
We can iterate over each line of the find command and cat it as follows:
$ find . -type f -name '*.json' -exec cat {} \;
{
"hello":"world"
}
or instead we can compose the jq command that we need. find substitutes each filename for the {}, which is handed to sh as a positional parameter ($1) so that filenames with spaces or quotes are handled safely:
$ find . -type f -name '*.json' -exec sh -c 'jq ".=[.]" "$1" | sponge "$1"' sh {} \;
Tools used: jq and sponge (part of moreutils).
Thanks to zeppelin
Related
In a folder, I have some markdown files that have links to some png or jpg images. The images are saved in the attachments sub-folder.
Now if I delete an image's link (say ![](attachments/fig1.png)) from a markdown file, the corresponding image file, of course, doesn't get removed. I want to move that image to a .trash sub-folder. I put together a short shell script for that (with help from https://www.stevemar.net/remove-unused-images/), but it does nothing!
#!zsh
imagepaths=$(find . -name '*.jpg' -o -name '*.png')
for imagepath in $imagepaths; do
    filename=$(basename -- $imagepath)
    if ! grep -q --exclude-dir=".git" $filename .; then
        mv $imagepath ./.trash
    fi
done
1. find all image files
find . -type f -name "*.jpg" -or -name "*.png"
2. create the awk script script.awk
BEGIN {                 # before processing the input file "markdown.txt"
    RS = "^$";          # read the input file as a single string
    split(filesList, filesArr);  # convert filesList into the array filesArr
}
{                       # process the input file "markdown.txt" as a single string
    for (i in filesArr) {           # for each file-name in filesArr
        if ($0 ~ filesArr[i]) {     # if the current file-name matches in the input file
            delete filesArr[i];     # delete the current file-name from filesArr
        }
    }
}
END {                   # after processing the input file "markdown.txt"
    for (i in filesArr) {           # for each unmatched file-name left in filesArr
        printf("mv \"%s\" ./.trash\n", filesArr[i]);  # print an "mv" command
    }
}
3. print the mv commands for all unmatched files
awk -f script.awk -v filesList="$(find . -type f -name "*.jpg" -or -name "*.png")" markdown.txt
4. execute all the mv commands at once
bash <<< $(awk -f script.awk -v filesList="$(find . -type f -name "*.jpg" -or -name "*.png")" markdown.txt)
This is the Python code that worked for me.
import shutil
from pathlib import Path

cwd = Path.cwd()
attachment_folder = cwd / '../attachments'
note_folder = cwd / '../'
trash_folder = Path('../Trash')

all_note_paths = list(note_folder.glob('*.md'))
all_attachment_paths = list(attachment_folder.glob('*.*'))

all_texts = ''
for note_path in all_note_paths:
    with open(note_path, 'r') as f:
        all_texts += f.read()

for attachment_path in all_attachment_paths:
    if attachment_path.stem not in all_texts:
        print(f'{attachment_path.name} moved to Trash')
        shutil.move(attachment_path, trash_folder / f'{attachment_path.name}')
I have dozens of files in the project and I want to change all occurrences of six.b("...") to b"...". Can I do that with some sort of regex bash script?
It's possible entirely in Python, but I would first make a backup of my project tree, and then:
import re
import os

indir = 'files'

for root, dirs, files in os.walk(indir):
    for f in files:
        fname = os.path.join(root, f)
        with open(fname) as fh:  # named fh so the handle doesn't shadow the loop variable
            txt = fh.read()
        # rewrite six.b("...") as b("...")
        txt = re.sub(r'six\.(b\("[^"]*"\))', r'\1', txt)
        with open(fname, 'w') as fh:
            fh.write(txt)
        print(fname)
A relatively simple bash solution (change *.foo to *.py or whatever filename pattern suits your situation):
#!/bin/bash
export FILES=`find . -type f -name '*.foo' -exec egrep -l 'six\.b\("[^\"]*"\)' {} \; 2>/dev/null`
for file in $FILES
do
    cp $file $file.bak
    sed 's/six\.b(\(\"[^\"]*[^\\]\"\))/b\1/' $file.bak > $file
    echo $file
done
Notes:
It will only consider/modify files that match the pattern
It will make a '.bak' copy of each file it modifies
It won't handle embedded \"), e.g. six.b("asdf\")"), but I don't know that there is a trivial solution to that problem without knowing more about the files you're manipulating. (Is the end of six.b("") guaranteed to be the last ") on the line?) See the sketch below for one way to cope.
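If the embedded \") case does matter for your files, a Python regex that tolerates escaped quotes is sketched below (an assumption about your input on my part, not part of the answer above):

import re

# (?:[^"\\]|\\.)* consumes either a non-quote/non-backslash character or
# a backslash plus whatever follows it, so \" no longer ends the string.
pattern = re.compile(r'six\.(b\("(?:[^"\\]|\\.)*"\))')

print(pattern.sub(r'\1', r'x = six.b("asdf\")")'))  # -> x = b("asdf\")")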
import os
from subprocess import call
call('(find $readDir -type f -print0 | sort -z | xargs -0 sha1sum; find $readDir \( -type f -o -type d \) -print0 | sort -z | xargs -0 stat -c '\%n \%a') | sha1sum')
SyntaxError: invalid token
Can someone tell me which characters need to be escaped here? I thought it was the percent signs, but that's not working either. I'm looking to get the SHA of the contents of a folder.
I did not dig into your command itself, but to make it work, you can remove the escaping hassle by using triple quotes and a "raw" string:
call(r'''(find $readDir -type f -print0 | sort -z | xargs -0 sha1sum; find $readDir \( -type f -o -type d \) -print0 | sort -z | xargs -0 stat -c '%n %a') | sha1sum''', shell=True)
Also, you need to add shell=True for the command to be passed to your shell interpreter.
That being said, it'd be a better idea to walk through your directory and calculate the hashes in pure Python, without calling any shell. As a suggestion, here's a way to do it:
import os
import hashlib

def sha1OfFile(filepath):
    sha = hashlib.sha1()
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(2**10)  # read in one-kilobyte blocks
            if not block:
                break
            sha.update(block)
    return sha.hexdigest()

for (path, dirs, files) in os.walk('.'):
    for file in files:
        print('{}: {}'.format(os.path.join(path, file), sha1OfFile(os.path.join(path, file))))
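If what you actually want is one hash for the whole folder (as in your shell pipeline), you can fold the per-file digests into a single SHA-1. A minimal sketch, assuming the sha1OfFile helper above; it mixes in file names and contents in the spirit of your sha1sum/stat combination, but is not byte-for-byte the same hash:

def sha1OfFolder(dirpath):
    total = hashlib.sha1()
    for path, dirs, files in os.walk(dirpath):
        dirs.sort()                 # traverse subdirectories in a stable order
        for file in sorted(files):  # and files in a stable order
            fullpath = os.path.join(path, file)
            total.update(fullpath.encode('utf-8'))              # mix in the name
            total.update(sha1OfFile(fullpath).encode('utf-8'))  # mix in the contents digest
    return total.hexdigest()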
I'm trying to translate this bash line into Python:
find /usr/share/applications/ -name "*.desktop" -exec grep -il "player" {} \; | sort | while IFS=$'\n' read APPLI ; do grep -ilqw "video" "$APPLI" && echo "$APPLI" ; done | while IFS=$'\n' read APPLI ; do grep -iql "nodisplay=true" "$APPLI" || echo "$(basename "${APPLI%.*}")" ; done
The goal is to show all the video apps installed on an Ubuntu system.
-> read all the .desktop files in the /usr/share/applications/ directory
-> filter on the strings "video" and "player" to find the video applications
-> filter out the strings "nodisplay=true" and "audio" so audio players and no-GUI apps are not shown
The result I would like to have is (for example):
kmplayer
smplayer
vlc
xbmc
So, I've tried this code:
import os
import fnmatch

apps = []
for root, dirnames, filenames in os.walk('/usr/share/applications/'):
    for dirname in dirnames:
        for filename in filenames:
            with open('/usr/share/applications/' + dirname + "/" + filename, "r") as auto:
                a = auto.read(50000)
                if "Player" in a or "Video" in a or "video" in a or "player" in a:
                    if "NoDisplay=true" not in a or "audio" not in a:
                        print "OK: ", filename
                        filename = filename.replace(".desktop", "")
                        apps.append(filename)
print apps
But I have a problem with the recursive files...
How can I fix it?
Thanks
It looks like your os.walk() loop is wrong; there is no need for the nested directory loop.
Please refer to the Python manual for the correct example:
https://docs.python.org/2/library/os.html?highlight=walk#os.walk
for root, dirs, files in os.walk('python/Lib/email'):
    for file in files:
        with open(os.path.join(root, file), "r") as auto:
            a = auto.read(50000)  # then filter the contents as before
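Applied to your case, the whole loop would look something like this (a sketch that keeps your search terms but drops the nested dirname loop; the lower-casing is my own addition):

import os

apps = []
for root, dirs, files in os.walk('/usr/share/applications/'):
    for filename in files:
        if not filename.endswith('.desktop'):
            continue
        with open(os.path.join(root, filename), 'r') as auto:
            a = auto.read(50000).lower()
        # keep video/player entries, drop hidden ones
        if ('player' in a or 'video' in a) and 'nodisplay=true' not in a:
            apps.append(filename.replace('.desktop', ''))
print(apps)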
I'm relatively new to programming. I have a folder, with subfolders, which contain several thousand html files that are generically named, i.e. 1006.htm, 1007.htm, that I would like to rename using the <title> tag from within the file.
For example, if file 1006.htm contains <title>Page Title</title>, I would like to rename it Page Title.htm. Ideally spaces are replaced with dashes.
I've been working in the shell with a bash script with no luck. How do I do this, with either bash or python?
This is what I have so far:
#!/usr/bin/env bash
FILES=/Users/Ben/unzipped/*
for f in $FILES
do
    if [ ${FILES: -4} == ".htm" ]
    then
        awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' $FILES
    fi
done
I've also tried
#!/usr/bin/env bash
for f in *.html;
do
    title=$( grep -oP '(?<=<title>).*(?=<\/title>)' "$f" )
    mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
But I get an error from the terminal explaining how to use grep...
Use awk instead of grep in your bash script and it should work:
#!/bin/bash
for f in *.html;
do
    title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
    mv -i "$f" "${title//[^a-zA-Z0-9\._\- ]}".html
done
Don't forget to change the shebang on the first line ;)
EDIT: full answer with all the modifications
#!/bin/bash
for f in `find . -type f | grep \.html`
do
    title=$( awk 'BEGIN{IGNORECASE=1;FS="<title>|</title>";RS=EOF} {print $2}' "$f" )
    mv -i "$f" "${title//[ ]/-}".html
done
Here is a Python script I just wrote:
import os
import re
from lxml import etree

class MyClass(object):
    def __init__(self, dirname=''):
        self.dirname = dirname
        self.exp_title = "<title>(.*)</title>"
        self.re_title = re.compile(self.exp_title)

    def rename(self):
        for afile in os.listdir(self.dirname):
            originfile = os.path.join(self.dirname, afile)
            if os.path.isfile(originfile):  # check the joined path, not the bare name
                with open(originfile, 'rb') as fp:
                    contents = fp.read()
                try:
                    html = etree.HTML(contents)
                    title = html.xpath("//title")[0].text
                except Exception:
                    try:
                        # fall back to the regex; decode the bytes first
                        title = self.re_title.findall(contents.decode('utf-8', 'ignore'))[0]
                    except Exception:
                        title = ''
                if title:
                    newfile = os.path.join(self.dirname, title)
                    os.rename(originfile, newfile)
>>> test = MyClass('/path/to/your/dir')
>>> test.rename()
You want to use an HTML parser (like lxml.html) to parse your HTML files. Once you've got that, retrieving the title is one line (probably page.find(".//title").text).
Translating that to a file name and renaming the document should be trivial.
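A minimal sketch of that approach (the sanitising rule, spaces to dashes, is taken from the question; the helper name is mine):

import os
import lxml.html

def rename_by_title(filepath):
    page = lxml.html.parse(filepath)
    title = page.find('.//title')  # the <title> element, if present
    if title is not None and title.text:
        name = title.text.strip().replace(' ', '-') + '.htm'
        os.rename(filepath, os.path.join(os.path.dirname(filepath), name))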
A Python 3 recursive-globbing version that does a bit of title sanitising before renaming:
import re
from pathlib import Path

import lxml.html

root = Path('.')
for path in root.rglob("*.html"):
    soup = lxml.html.parse(str(path))  # str() keeps older lxml versions happy
    title_els = soup.xpath('/html/head/title')
    if len(title_els):
        title = title_els[0].text
        if title:
            print(f'Original title {title}')
            name = re.sub(r'[^\w\s-]', '', title.lower())
            name = re.sub(r'[\s]+', '-', name)
            new_path = (path.parent / name).with_suffix(path.suffix)
            if not new_path.exists():
                print(f'Renaming [{path.absolute()}] to [{new_path}]')
                path.rename(new_path)
            else:
                print(f'{new_path.name} already exists!')