Pasting text from clipboard to a txt file with python - python

I am working on a project with Selenium and I have to copy text from websites.
So far so good but I want to paste all text I copied to a *.txt file with the date and time of today.
Can somebody help me, please?

Don't think of it as 'pasting'. You already have the text you want to store as a string, so you can create a new file (see Python's file I/O documentation), write the date and time, and then write the string you pulled from the website with Selenium.
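As a minimal sketch of that approach, assuming the scraped text is already held in a Python string: the helper name `save_with_timestamp`, the default file name, and the timestamp format are illustrative choices, not something from the question.

```python
from datetime import datetime
from pathlib import Path


def save_with_timestamp(text, path="scraped.txt"):
    """Append a timestamp line and the given text to a text file.

    The helper name and the default file name are hypothetical.
    """
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    # Mode "a" appends, so earlier entries in the file are kept.
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{stamp}]\n{text}\n\n")
    return Path(path)
```

With Selenium this could be called as save_with_timestamp(element.text), where element is whatever WebElement you located on the page.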

import pyautogui
import os
import time
def sendingTextToNote():
    path = "Clipboard.txt"
    path = os.path.realpath(path)
    os.startfile(path)  # Windows only: opens the file in its default editor
    time.sleep(1)  # give the editor a moment to open
    pyautogui.hotkey('ctrl','v')  # paste copied text from clipboard
    pyautogui.hotkey('ctrl','s')  # save the file
    pyautogui.hotkey('alt','f4')  # close the file and return to the program
sendingTextToNote()

Related

Split image/pdf based on specific text with Python

I want to split a pdf (or image if needed) based on text in it. I want to split it to get each question with its options in the pdf/image, separately like a screenshot of just that question and its options.
Sample PDF link: https://drive.google.com/file/d/1UtMropzRdfJwQjaRf9kZa1UpAzrKlH-K/view?usp=sharing
Is it even possible? If yes what is the code needed to accomplish this. I am a newbie to python so some explanation might help. I've got almost 100 of these PDFs and just wanted to automate the process of getting individual question and its options.
Step 1: Install pdftotext and put the .exe in your working directory.
Step 2: Copy the code below into a .py file in the same directory.
Step 3: Make sure the PDF files are also in that directory.
Step 4: Run the .py file.
Complete code that worked for me:
import os
import glob
import subprocess
files = []
# remember to put pdftotext.exe in the folder with your PDF files
for filename in glob.glob(os.getcwd() + '\\*.pdf'):
    files.append(filename[0:-4] + ".txt")
    # convert each PDF to a .txt file with the same base name
    subprocess.call([os.getcwd() + '\\pdftotext', filename, filename[0:-4] + ".txt"])
all_files = []
for i in range(len(files)):
    with open(files[i], 'r') as f:
        text = f.read()
    # keep only the part between the instructions and the footer
    text = text.split('carry one mark each')[1].split('WWW.UNITOPERATION.COM')[0]
    text_ls = text.splitlines()
    ques = []
    counter = 1
    for j in range(len(text_ls)):
        # a question starts on a line beginning with "1.", "2.", ...
        if text_ls[j].startswith(str(counter) + '.'):
            # note: this keeps everything from this line to the end of the text
            ques.append(''.join(text_ls[j:]).split('\n'))
            counter += 1
    all_files.append(ques)
# all_files now holds one ques list per PDF.
# Take the elements out of all_files one by one and write each to a .txt file;
# that part is left to you.
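The loop above keeps everything from each question number to the end of the text. As a rough sketch of one way to cut the text into separate questions instead, assuming each question starts on a line beginning with "1.", "2.", and so on: the helper name `split_questions` is hypothetical and not part of the answer.

```python
def split_questions(text):
    """Group lines into questions that start with '1.', '2.', '3.', ...

    Lines before the first numbered line are ignored. This is an
    illustrative helper, not part of the original answer.
    """
    questions = []
    counter = 1
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(f"{counter}."):
            questions.append([stripped])  # start a new question
            counter += 1
        elif questions and stripped:
            questions[-1].append(stripped)  # option or continuation line
    return ["\n".join(q) for q in questions]
```

Each returned string holds one question with its options and could be written to its own .txt file. Real PDF text is often messier than this, so expect to adjust the matching rule.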

Is it possible to open PDF files one after other, their names are saved in a text file using Python?

I want to open PDF files one after another, with a delay of n seconds, so I can take screenshots. I have put the file names in a "1.txt" file and read them into a list. Is there a way to loop over this list and open the files with a delay?
I am stuck on getting the drawing names from the list and opening them in a loop with a delay.
linelist=[line.rstrip('\n') for line in open('1.txt')]
print(linelist)
pdf_file=open('1.pdf','rb')
read_pdf=PyPDF2.PdfFileReader(pdf_file)
This is where I am stuck: looping over the file names in the list to open them. I used the PyPDF2 and webbrowser modules.
wb.open_new(r'C:\test\1.pdf')
Any help is highly appreciated.
To iterate over a list in Python, write for element in some_list: and put the work in the loop body.
To generate images from the PDF files, you can use the pdf2image module.
The complete solution would look like:
import os
import tempfile
from pdf2image import convert_from_path
# read the PDF file names, one per line
with open('1.txt', 'r') as f:
    line_list = f.read().splitlines()
print(line_list)
for line in line_list:
    with tempfile.TemporaryDirectory() as path:
        # render only the first page; intermediate files go to the temp dir
        images_from_path = convert_from_path(line, output_folder=path,
                                             first_page=1, last_page=1)
        base_filename = os.path.splitext(os.path.basename(line))[0] + '.jpg'
        save_dir = './'
        for page in images_from_path:
            page.save(os.path.join(save_dir, base_filename), 'JPEG')
This uses the command line to open the file, which opens it in the default viewer. Tested on Windows 10.
import subprocess
subprocess.Popen([filename], shell=True)
To use your own code:
import subprocess
import time
sleepytime = 5
linelist=[line.rstrip('\n') for line in open('1.txt')]
print(linelist)
for filename in linelist:
    subprocess.Popen([filename], shell=True)  # open in the default viewer
    time.sleep(sleepytime)  # wait before opening the next file
Of course, I would advise you to look for a way to automate the screenshot part as well. It makes life so much more fun.
For example, using the pdf2image library:
from pdf2image import convert_from_path
images = convert_from_path('/home/belval/example.pdf')
for image in images:
    image.save('image.jpg', 'JPEG') # <- change this
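The "change this" comment matters because every page is otherwise saved under the same name and overwrites the previous one. A minimal sketch of one naming scheme, using only the standard library; the helper name `page_filename` and the naming pattern are assumptions for illustration.

```python
import os


def page_filename(pdf_path, page_number, out_dir="."):
    """Build a name such as 'example_page001.jpg' for one page of a PDF."""
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    return os.path.join(out_dir, f"{stem}_page{page_number:03d}.jpg")
```

In the loop above, this could be used with enumerate(images, start=1) so that each image is saved with image.save(page_filename('/home/belval/example.pdf', n), 'JPEG').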

How to open different html file in same window using Python

I have a folder named 'Doc'. It contains several subfolders, and each subfolder has an '.html' file. I have to open all of them at once in a web browser using Python. I can open them, but the problem is that the HTML files do not all open as new tabs in the same window; sometimes a file opens in a new window. I don't know what exactly the problem is. Here is the code I tried:
import os
import webbrowser
for root, dirs, files in os.walk("Doc"):
    for file in files:
        if file.endswith("index.html"):
            webbrowser.open_new_tab(os.path.join(root, file))
See the manual (https://docs.python.org/2/library/webbrowser.html):
webbrowser.open_new_tab(url)
    Open url in a new page ("tab") of the default browser, if possible, otherwise equivalent to open_new().
So it is best effort only, and the result presumably depends on the browser too.
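A small sketch of a variation on the question's loop, assuming the same 'Doc' folder layout: it turns each path into an absolute file:// URI with pathlib and passes new=2 (new tab) to webbrowser.open. The function names are hypothetical, and whether a tab or a window opens still depends on the browser and its settings.

```python
import os
import webbrowser
from pathlib import Path


def find_index_pages(top="Doc"):
    """Return sorted file:// URIs for every index.html found below `top`."""
    uris = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith("index.html"):
                # as_uri() needs an absolute path, hence resolve()
                uris.append((Path(root) / name).resolve().as_uri())
    return sorted(uris)


def open_in_tabs(uris):
    """Ask the default browser to open each URI in a new tab (best effort)."""
    for uri in uris:
        webbrowser.open(uri, new=2)
```

Calling open_in_tabs(find_index_pages("Doc")) would then request one tab per file.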

accessing source code from a local file python

I need to access the source code of a locally saved file, but I need to automate this because there are multiple files in one folder. I've looked at the inspect module and the selenium module, but I still don't understand what to do. After accessing the source code, I need to use bs4 to extract data from it.
I've read several posts on here and elsewhere with similar problems, but the thing is that my file does not open in the source code (it's written in xml and so far everything needs to be in source code before you can use these modules). If I open the file, it just uses my browser to open a regular page and then I have to click view page source.
How can I automate this so that it will open the page, go to the source code, and save it so I can stick it into a soup for later parsing?
path_g_jurt = r'C:\Users\g\Desktop\t\SDU\jurt htmls\jurt\meta jurt'
file = r'C:\Users\g\Desktop\t\SDU\jurt htmls\jurt\meta jurt' + "/" + file
for file in path_g_jurt:
    if file.endswith(".xhtml"):
        with open(file, encoding = "utf-8") as mdata_jurt:
            soup = BeautifulSoup(mdata_jurt)
            main = file.find("jcid").get_text()
            misc_links = []
            for item in file.find_all("regelgeving"):
                misc = item.find("misc:link")
                misc_links.append(misc.get("misc:jcid"))
Any help would be appreciated.
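One possible starting point, offered as a sketch rather than a tested fix for this data: a saved .xhtml file already is the page source, so reading it as text is enough and no browser is needed. The hypothetical helper `read_sources` below collects the text of every .xhtml file in a folder using only the standard library.

```python
from pathlib import Path


def read_sources(folder, suffix=".xhtml"):
    """Map each file name in `folder` ending with `suffix` to its text."""
    sources = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith(suffix):
            sources[path.name] = path.read_text(encoding="utf-8")
    return sources
```

Each string from read_sources(path_g_jurt) could then be passed to BeautifulSoup, with find and find_all called on the resulting soup object rather than on the file name.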

Python3:Save File to Specified Location

I have a rather simple program that writes HTML code ready for use.
It works fine, except that if one were to run the program from the Python command line, as is the default, the HTML file that is created is created where python.exe is, not where the program I wrote is. And that's a problem.
Do you know a way of getting the .write() function to write a file to a specific location on the disc (e.g. C:\Users\User\Desktop)?
Extra cool-points if you know how to open a file browser window.
The first problem is probably that you are not including the full path when you open the file for writing. For details on opening a web browser, see the documentation for the webbrowser module.
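import os
import webbrowser
target_dir = r"C:\full\path\to\where\you\want\it"
filename = "output.html"  # hypothetical file name; use your own
fullname = os.path.join(target_dir, filename)
with open(fullname, "w") as f:
    f.write("<html>....</html>")
# build a file URL from the Windows path (file:///C:/...)
url = "file:///" + fullname.replace("\\", "/")
webbrowser.open(url, True, True)  # new=True requests a new window, autoraise=True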
BTW: the code is the same in python 2.6.
I'll admit I don't know Python 3, so I may be wrong, but in Python 2, you can just check the __file__ variable in your module to get the name of the file it was loaded from. Just create your file in that same directory (preferably using os.path.dirname and os.path.join to remain platform-independent).
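A minimal sketch of that idea, which also works in Python 3 when the program is run from a file. The helper name `write_beside` and the file name in the usage line are hypothetical.

```python
import os


def write_beside(script_path, filename, content):
    """Write `content` to `filename` in the directory containing `script_path`."""
    target_dir = os.path.dirname(os.path.abspath(script_path))
    fullname = os.path.join(target_dir, filename)
    with open(fullname, "w", encoding="utf-8") as f:
        f.write(content)
    return fullname
```

Inside your program this would be called as write_beside(__file__, "page.html", "<html>....</html>"), so the output lands next to the script regardless of the current working directory.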
