NPP + Python: Move Text between search strings to Another Position (Footer)

I am working on some HTML documents that look like this:
<html>
<head>Something in here</head>
<body>
<MYTAG>This should be moved to the Footer</MYTAG>
<MYTAG>This should be moved to the Footer, too</MYTAG>
</body>
<footer></footer>
</html>
I am already using Notepad++ and Python to customize the rest of the document, mainly with regular expressions.
Now I want to move the parts tagged with <MYTAG></MYTAG> into the footer, so that the documents end up like this:
<html>
<head>Something in here</head>
<body>
</body>
<footer>
<MYTAG>This should be moved to the Footer</MYTAG>
<MYTAG>This should be moved to the Footer, too</MYTAG>
</footer>
</html>
First I tried to do the job with Regular Expressions alone:
Search for:
(<html.*?)(<MYTAG>.*?</MYTAG>)(.*?<footer>)(.*?)(</footer>.*?</html>)
and replace it with: $1$3$4$2$5
This works, but I have to run it over and over again when there are multiple <MYTAG> parts (and that is a pain with larger documents).
I know there is a better solution with Python, but I cannot get the code right; the documentation and syntax confuse me. I thought about using editor.setSelection followed by editor.cut and finally editor.paste somewhere in the footer, but I don't know how to set the right targets.
Any help on this is very much appreciated :)

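You can use the following script:
import re
with open("temp.html") as html_file:
    html = html_file.read()
# Collect every <MYTAG>...</MYTAG> element (non-greedy, so two tags on one line stay separate)
tags = re.findall(r"<MYTAG>.*?</MYTAG>\n*", html)
# Remove those elements from their original position in the body
html = re.sub(r"<MYTAG>.*?</MYTAG>\n*", "", html)
# Split the document into the part before and the part after the opening <footer> tag
footer = re.split(r"<footer>", html, maxsplit=1)
tags.insert(0, "<footer>\n")
tags.insert(0, footer[0])
tags.append(footer[1])
with open("temp.html", "w") as html_file:
    html_file.write("".join(tags))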
It works as follows:
1. Reads the file.
2. Finds all tags.
3. Removes the tags from the body.
4. Splits the file's content into two parts at <footer>.
5. Puts <footer> and the tags between the two parts.
6. Writes the result back to the file.
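The same steps can be wrapped in a reusable function that works on a string, which makes it easy to test before touching any files. This is a minimal sketch assuming each document fits in memory and contains exactly one opening <footer> tag; move_tags_to_footer is a hypothetical name, and re.DOTALL is added so an element may also span several lines.

```python
import re

def move_tags_to_footer(html: str, tag: str = "MYTAG") -> str:
    """Move every <tag>...</tag> element to just after the opening <footer> tag.

    Assumes the document contains exactly one <footer> tag (hypothetical helper).
    """
    t = re.escape(tag)
    # Non-greedy match; DOTALL lets an element span several lines.
    pattern = re.compile(rf"<{t}>.*?</{t}>\n*", re.DOTALL)
    moved = pattern.findall(html)   # collect the elements in document order
    html = pattern.sub("", html)    # remove them from the body
    before, sep, after = html.partition("<footer>")
    if not sep:
        raise ValueError("document has no <footer> tag")
    # Re-insert the collected elements right after <footer>.
    return before + sep + "\n" + "".join(moved) + after
```

To process a file, read it, pass the text through move_tags_to_footer, and write the result back, exactly as in the script above.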

Try this
(<html.*?)((?:\s*<MYTAG>[^<]+<\/MYTAG>\n*)+)(.*?( *)<footer>)(.*?)(<\/footer>.*?<\/html>)
Substitution:
\1\3\2\4\6
Regex Demo
Input
<html>
<head>Something in here</head>
<body>
<MYTAG>This should be moved to the Footer</MYTAG>
<MYTAG>This should be moved to the Footer, too</MYTAG>
</body>
<footer></footer>
</html>
Output
<html>
<head>Something in here</head>
<body> </body>
<footer>
<MYTAG>This should be moved to the Footer</MYTAG>
<MYTAG>This should be moved to the Footer, too</MYTAG>
</footer>
</html>
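The single-pass pattern above can also be applied from Python with re.sub. This is a sketch, not the answerer's own code: the pattern is copied unchanged except that \/ is written as / (Python needs no escaping there), and re.DOTALL is assumed to stand in for the editor's ". matches newline" option, because the .*? parts have to cross line breaks.

```python
import re

# The answer's pattern; re.DOTALL lets "." match newlines.
PATTERN = re.compile(
    r"(<html.*?)((?:\s*<MYTAG>[^<]+</MYTAG>\n*)+)(.*?( *)<footer>)(.*?)(</footer>.*?</html>)",
    re.DOTALL,
)

def move_in_one_pass(html: str) -> str:
    # \1\3\2\4\6 puts the MYTAG block (group 2) after "<footer>" (end of group 3);
    # group 5, the footer's previous content, is dropped by this substitution.
    return PATTERN.sub(r"\1\3\2\4\6", html)
```

Note that [^<]+ means this pattern only moves MYTAG elements that contain plain text without nested tags.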

Related

How do you input and output text with Pyscript?

I’m learning py-script, where you can use <py-script></py-script> in an HTML5 file to write Python code. As a Python coder, I would like to try web development while still using Python, so it would be helpful to know how to input and output information with py-script.
For example, could someone explain how to get this function to work:
<html>
<head>
<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
</head>
<body>
<div>Type a sample input here</div>
<input id="test_input">
<!-- How would you get this button to display the text typed into the input in the div with the id "test"? -->
<button id="submit-button" onClick="py-script-function"></button>
<div id="test"></div>
<py-script>
</py-script>
</body>
</html>
I would appreciate any help, and I hope this will also help other py-script users.
I checked the source code on GitHub and found the examples folder.
Using the files todo.html and todo.py, I created this index.html
(which I tested with a local server: python -m http.server).
I figured out some elements because I have some experience with JavaScript and CSS, so it can be useful to learn JavaScript and CSS to work with HTML elements.
index.html
<!DOCTYPE html>
<html>
<head>
<!--<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />-->
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>
</head>
<body>
<div>Type a sample input here</div>
<input type="text" id="test-input"/>
<button id="submit-button" type="submit" pys-onClick="my_function">OK</button>
<div id="test-output"></div>
<py-script>
from js import console
def my_function(*args, **kwargs):
    #print('args:', args)
    #print('kwargs:', kwargs)
    console.log(f'args: {args}')
    console.log(f'kwargs: {kwargs}')
    # Read the text typed into <input id="test-input">
    text = Element('test-input').element.value
    #print('text:', text)
    console.log(f'text: {text}')
    # Show it in <div id="test-output">
    Element('test-output').element.innerText = text
</py-script>
</body>
</html>
(Screenshot: the JavaScript console in Firefox DevTools.)
Loading all modules took a while (from "Create pyodide runtime" to "Collecting nodes...").
After that you can see the output from console.log().
You could also use print(), but it shows the text along with an extra error about writing to undefined ....
An alternative way to display the output would be to replace
Element('test-output').element.innerText = text
with
pyscript.write('test-output', text)

Handling text and a new line character in Python's dominate module

I am using the dominate module in Python 3.7, and I am not sure how to handle newline characters in my strings. As per my requirement, the newline character \n should be converted to the HTML line break <br>, but that is not happening: the dominate module ignores the newline character, which is not how I expect it to behave. Below is the code I have tried.
import dominate
from dominate.tags import *
text = "Hello\nworld!"
doc = dominate.document(title='Dominate your HTML')
with doc:
    h1(text)
with open("dominate22.html", 'w') as file:
    file.write(doc.render())
The output HTML is:
<!DOCTYPE html>
<html>
<head>
<title>Dominate your HTML</title>
</head>
<body>
<h1>Hello
world!</h1>
</body>
</html>
I have also tried replacing the newline character with a break tag, i.e. text.replace("\n", "<br>"),
but this just displayed the literal text Hello<br>world! instead of a line break, which is not what I expected. Here is the resulting HTML:
<!DOCTYPE html>
<html>
<head>
<title>Dominate your HTML</title>
</head>
<body>
<h1>Hello&lt;br&gt;world!</h1>
</body>
</html>
The dominate module is ignoring the newline character which is not how I am expecting it to behave.
For the dominate library, \n is just a character in a text string. It's not the same as the HTML line break element <br>, so you have to add that element programmatically.
This example shows two approaches, using a context manager and adding nodes to an instance:
import dominate
from dominate.tags import h1, h2, br
from dominate.util import text
the_string = "Hello\nworld!"
doc = dominate.document(title='Dominate your HTML')
with doc:
    parts = the_string.split('\n')
    with h1():  # using a context manager
        text(parts[0])
        br()
        text(parts[1])
    header = h2()  # same as above but adding nodes
    header.add(text(parts[0]))
    header.add(br())
    header.add(text(parts[1]))
with open("dominate22.html", 'w') as file:
    file.write(doc.render())
Gives this HTML:
<!DOCTYPE html>
<html>
<head>
<title>Dominate your HTML</title>
</head>
<body>
<h1>Hello<br>world!</h1>
<h2>Hello<br>world!</h2>
</body>
</html>

Is there a way of writing several lines at a time to a file in python?

I need to write a lot of information to a file, basically a whole webpage with certain values calculated using my script. I know I can do this using .write(), however I would like to know if you can write several lines at a time to a file, without having to put in all of the line breaks.
For example, I would like to write the following to a file:
<!DOCTYPE html>
<html>
<head>
</head>
<style>
some styling stuff ..
</style>
<body>
many more lines of code ...
</body>
</html>
Currently I have
file = open('filetowriteto.txt','w')
file.write('<html>\n')
file.write('<head>\n')
...
file.close()
But I would like to be able to do
file.write('
<html>
<head>
</head>
<style>
some styling stuff ..
</style>
<body>
many more lines of code ...
</body>
</html>')
Does anybody know of a way to do this? Thanks!
When you use triple quotes ('''), line breaks are read into the string:
file.write('''
<html>
<head>
</head>
<style>
some styling stuff ..
</style>
<body>
many more lines of code ...
</body>
</html>''')
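That's what file.writelines is for. Note that writelines does not add line breaks itself, so each string needs its own "\n", and the file must be opened for writing:
with open(filename, "w") as fp:
    fp.writelines([
        '<html>\n',
        '</html>\n'
    ])
You could also use multiline strings with triple quotes ''' or """, but they tend to mess with indentation.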
That being said, consider using Jinja for HTML output.
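If you stay with a triple-quoted string, textwrap.dedent from the standard library removes the common leading indentation, so the template can be indented along with your code. A minimal sketch, where render_page and its title/total parameters are hypothetical stand-ins for the values your script calculates:

```python
import html
import textwrap

def render_page(title: str, total: float) -> str:
    """Build a whole page from one indented triple-quoted template."""
    # The backslash after the opening quotes drops the leading newline;
    # dedent() then strips the indentation shared by every line.
    return textwrap.dedent(f"""\
        <html>
        <head>
        <title>{html.escape(title)}</title>
        </head>
        <body>
        <p>Total: {total:.2f}</p>
        </body>
        </html>
        """)

if __name__ == "__main__":
    with open("filetowriteto.html", "w") as fp:
        fp.write(render_page("Report", 3.14159))
```

Since the values are interpolated before dedent() runs, a value containing its own newlines would change the common indentation; keep such values single-line, or switch to a template engine like Jinja.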

How do I match a tag containing only the stated class, not any others, using BeautifulSoup?

Is there a way to use BeautifulSoup to match a tag with only the indicated class attribute, not the indicated class attribute and others? For example, in this simple HTML:
<html>
<head>
<title>
Title here
</title>
</head>
<body>
<div class="one two">
some content here
</div>
<div class="two">
more content here
</div>
</body>
</html>
is it possible to match only the div with class="two", but not match the div with class="one two"? Unless I'm missing something, that section of the documentation doesn't give me any ideas. This is the code I'm using currently:
from bs4 import BeautifulSoup
html = '''
<html>
<head>
<title>
Title here
</title>
</head>
<body>
<div class="one two">
should not be matched
</div>
<div class="two">
this should be matched
</div>
</body>
</html>
'''
soup = BeautifulSoup(html)
div_two = soup.find("div", "two")
print(div_two.contents[0].strip())
I'm trying to get this to print this should be matched instead of should not be matched.
EDIT: In this simple example, I know that the only options for classes are "one two" or "two", but in production code, I'll only know that what I want to match will have class "two"; other tags could have a large number of other classes in addition to "two", which may not be known.
On a related note, it's also helpful to read the documentation for version 4, not version 3 as I previously linked.
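Try the following (class is a reserved word in Python, so BeautifulSoup uses the keyword argument class_):
divs = soup.find_all('div', class_="two")
for div in divs:
    if div['class'] == ['two']:
        pass  # handle class="two"
    else:
        pass  # handle other cases, including but not limited to "one two"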
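The code below may help, though I haven't tried it:
soup.find("div", { "class" : "two" })
Note that, like soup.find("div", "two"), this matches any div whose classes include "two", so on the example above it still returns the "one two" div.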
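The exact-match check from the loop above can also be passed directly to find(), which accepts a function that is called for each tag. This is a sketch: only_class is a hypothetical helper name, and "html.parser" is chosen as the parser to avoid the missing-parser warning.

```python
from bs4 import BeautifulSoup

html = """
<div class="one two">should not be matched</div>
<div class="two">this should be matched</div>
"""

soup = BeautifulSoup(html, "html.parser")

def only_class(name, cls):
    """Return a filter matching <name> tags whose class list is exactly [cls]."""
    # BeautifulSoup stores class as a list, e.g. ["one", "two"].
    return lambda tag: tag.name == name and tag.get("class") == [cls]

div_two = soup.find(only_class("div", "two"))
print(div_two.get_text(strip=True))
```

Unlike the class_="two" filter, this skips any tag that has extra classes, even ones you don't know about in advance.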

Multiple <html><body> </html></body> in same file

I have multiple HTML documents in one file:
<html>
<body>
</body>
</html>
<html>
<body>
</body>
</html>
<html>
<body>
</body>
</html>
and the result is a messed-up HTML file. How can I correct this without removing the tags from the rest? I am using Python to generate the HTML file. If I call self.response.out.write(function(query)) once, I get a nice HTML page.
If I call it a second time, self.response.out.write(function(query2)), the page gets distorted.
Have one HTML document per file. Anything else is invalid and won’t be processed properly.
If you’re not sure whether your HTML files are valid, the W3C validator will tell you.
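One way to follow that advice is to generate only the inner markup for each query and wrap everything in a single <html>/<body> skeleton once. A minimal sketch assuming function(query) can be changed this way; render_fragment and render_page are hypothetical names, and the list of items stands in for whatever a query returns.

```python
import html

def render_fragment(title: str, items: list) -> str:
    """Hypothetical stand-in for function(query): returns body markup only."""
    lis = "\n".join(f"<li>{html.escape(str(item))}</li>" for item in items)
    return f"<h2>{html.escape(title)}</h2>\n<ul>\n{lis}\n</ul>"

def render_page(*fragments: str) -> str:
    """Wrap any number of fragments in one <html>/<body> document."""
    body = "\n".join(fragments)
    return f"<html>\n<body>\n{body}\n</body>\n</html>\n"

page = render_page(
    render_fragment("Query 1", ["a", "b"]),
    render_fragment("Query 2", ["c"]),
)
```

The handler would then call self.response.out.write(page) a single time, so the response contains exactly one <html> element no matter how many queries contribute to it.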
