I have a camera whose pictures can be viewed by entering the camera's IP address in a web browser.
I can scrape the image links from that page and then download the images to my local system.
After that, I have to delete each image by sending a message with msg_id, token, and param.
I pass the image link as the param of the delete message built with msg_id.
from time import sleep
import os
import sys
import requests
from bs4 import BeautifulSoup
import piexif
from fractions import Fraction
archive_url = "http://192.168.42.1/SD/AMBA/191123000/"

def get_img_links():
    # create response object
    r = requests.get(archive_url)
    # create beautiful-soup object
    soup = BeautifulSoup(r.content, 'html5lib')
    # find all links on the web page
    links = soup.findAll('a')
    # keep only the links ending with .JPG
    img_links = [archive_url + link['href'] for link in links if link['href'].endswith('JPG')]
    return img_links
def FileDelete():
    FilesToProcess = get_img_links()
    print(FilesToProcess)
    FilesToProcessStr = "\n".join(FilesToProcess)
    for FileTP in FilesToProcess:
        tosend = '{"msg_id":1281,"token":%s,"param":"%s"}' % (token, FileTP)
        print("Delete successfully")
I am getting this error:
NameError: name 'token' is not defined
runfile('D:/EdallSystem/socket_pro/pic/hy/support.py', wdir='D:/EdallSystem/socket_pro/pic/hy')
['http://192.168.42.1/SD/AMBA/191123000/13063800.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13064200.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13064600.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13065000.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13065400.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13065800.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13072700.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13073100.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13073500.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13073900.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13074300.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13074700.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13075100.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13075500.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13075900.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13080300.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13080700.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13081100.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13081500.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13081900.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13082300.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13082700.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13083100.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13083500.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13083900.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13084300.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13084700.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13085100.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13085500.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13085900.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13090300.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13090700.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13091100.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13091500.JPG', 'http://192.168.42.1/SD/AMBA/191123000/13091900.JPG']
And if I drop token from the format arguments, I get this instead:
Traceback (most recent call last):
File "D:\EdallSystem\socket_pro\pic\hy\support.py", line 82, in <module>
FileDelete()
File "D:\EdallSystem\socket_pro\pic\hy\support.py", line 74, in FileDelete
tosend = '{"msg_id":1281,"token":%s,"param":"%s"}' %( FileTP)
TypeError: not enough arguments for format string
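Two things are going on here: token is interpolated before it is ever defined, and tosend is built but never actually sent to the camera. Below is a minimal sketch of the missing piece, assuming the camera speaks the common Ambarella-style JSON-over-TCP protocol on port 7878 and hands out a session token via msg_id 257; the port, msg_id values, and param format are assumptions to verify against your camera's documentation.

import json
import socket

CAMERA_IP = "192.168.42.1"   # same host that serves the image listing
CAMERA_PORT = 7878           # assumed Ambarella JSON port; verify for your model

def send_msg(sock, payload):
    # serialize a message as JSON, send it, and return the decoded reply
    sock.sendall(json.dumps(payload).encode())
    return json.loads(sock.recv(4096).decode())

with socket.create_connection((CAMERA_IP, CAMERA_PORT), timeout=5) as s:
    # msg_id 257 is assumed to start a session and return the token
    token = send_msg(s, {"msg_id": 257, "token": 0, "param": 0}).get("param")
    for FileTP in get_img_links():
        # msg_id 1281 deletes the file named in param, as in the question;
        # the camera may expect an SD-card path rather than the full URL
        reply = send_msg(s, {"msg_id": 1281, "token": token, "param": FileTP})
        print("Delete reply:", reply)

Only print a success message after inspecting the reply, rather than unconditionally inside the loop.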
I am trying to download the most recent zip file from the ERCOT website (https://www.ercot.com/mp/data-products/compliance-and-disclosure/?id=NP3-965-ER). However, the link to the zip file contains a doclookup id that changes every time, and the id is populated dynamically. I have tried using BeautifulSoup to get the link, but since the page is loaded dynamically it does not return any links. Any feedback or solutions would be appreciated.
Using the exposed API:
import json

import pandas as pd
import pendulum
import requests

def get_document_id(type_id: int) -> int:
    url = (
        "https://www.ercot.com/misapp/servlets/IceDocListJsonWS?"
        f"reportTypeId={type_id}&"
        f"_={pendulum.now().format('X')}"
    )
    with requests.Session() as request:
        response = request.get(url, timeout=10)
        if response.status_code != 200:
            print(response.raise_for_status())
        data = json.loads(response.text)
        return (
            pd.json_normalize(data=data["ListDocsByRptTypeRes"], record_path="DocumentList")
            .head(1)["Document.DocID"]
            .squeeze()
        )
id_number = get_document_id(13052)
print(id_number)
869234127
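With the id in hand, the zip itself can be fetched. A short sketch, assuming ERCOT's usual MIS download endpoint (misdownload/servlets/mirDownload with a doclookupId query parameter) accepts the id returned above:

import requests

doc_id = get_document_id(13052)
# assumed download endpoint; doclookupId carries the id from the listing API
url = f"https://www.ercot.com/misdownload/servlets/mirDownload?doclookupId={doc_id}"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open(f"{doc_id}.zip", "wb") as f:
    f.write(response.content)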
So BS4 was working earlier today; however, it now has problems when trying to load a page.
import requests
from bs4 import BeautifulSoup
name = input("")
twitter = requests.get("https://twitter.com/" + name)
#instagram = requests.get("https//instagram.com/" + name)
#website = requests.get("https://" + name + ".com")
twitter_soup = BeautifulSoup(twitter, 'html.parser')
twitter_available = twitter_soup.body.findAll(text="This account doesn't exist")
if twitter_available == True:
    print("Available")
else:
    print("Not Available")
On the line where twitter_soup is declared, I get the following error:
Traceback (most recent call last):
File "D:\Programming\Python\name-checker.py", line 12, in
twitter_soup = BeautifulSoup(twitter, 'html.parser')
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\bs4_init_.py", line 310, in init
elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
I have also tried the other parsers the docs suggest, but none of them work.
I just figured it out.
I had to use the actual HTML, which in this situation is twitter.text, instead of passing the Response object itself.
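For reference, a corrected sketch. Note that findAll returns a list, so comparing it to True with == never matches; truth-testing the list is an assumption about the intended logic:

import requests
from bs4 import BeautifulSoup

name = input("")
twitter = requests.get("https://twitter.com/" + name)
# pass the HTML text, not the Response object
twitter_soup = BeautifulSoup(twitter.text, 'html.parser')
# findAll returns a list; non-empty means the marker text was found
twitter_available = twitter_soup.body.findAll(text="This account doesn't exist")
if twitter_available:
    print("Available")
else:
    print("Not Available")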
I'm creating a crawler in Python to list all links on a website, but I'm getting an error and I can't see what causes it.
The error is:
Traceback (most recent call last):
File "vul_scanner.py", line 8, in <module>
vuln_scanner.crawl(target_url)
File "C:\Users\Lenovo x240\Documents\website\website\spiders\scanner.py", line 18, in crawl
href_links= self.extract_links_from(url)
File "C:\Users\Lenovo x240\Documents\website\website\spiders\scanner.py", line 15, in extract_links_from
return re.findall('(?:href=")(.*?)"', response.content)
File "C:\Users\Lenovo x240\AppData\Local\Programs\Python\Python38\lib\re.py", line 241, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
My code, in the scanner.py file, is:
# To ignore numpy errors:
# pylint: disable=E1101
import urllib
import requests
import re
from urllib.parse import urljoin
class Scanner:
    def __init__(self, url):
        self.target_url = url
        self.target_links = []

    def extract_links_from(self, url):
        response = requests.get(url)
        return re.findall('(?:href=")(.*?)"', response.content)

    def crawl(self, url):
        href_links = self.extract_links_from(url)
        for link in href_links:
            link = urljoin(url, link)
            if "#" in link:
                link = link.split("#")[0]
            if self.target_url in link and link not in self.target_links:
                self.target_links.append(link)
                print(link)
                self.crawl(link)
In the vul_scanner.py file:
import scanner
# To ignore numpy errors:
# pylint: disable=E1101
target_url = "https://www.amazon.com"
vuln_scanner = scanner.Scanner(target_url)
vuln_scanner.crawl(target_url)
The command I run is: python vul_scanner.py
return re.findall('(?:href=")(.*?)"', response.content)
response.content in this case is of type bytes. So either use response.text, which gives you plain text you can process as you plan to now, or check out "Regular expression parsing a binary file?" in case you want to continue down the binary road.
Cheers
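Concretely, the text route changes only the offending line; a bytes pattern also works if you stay with response.content:

def extract_links_from(self, url):
    response = requests.get(url)
    # response.text is a str, so the string pattern now matches
    return re.findall('(?:href=")(.*?)"', response.text)
    # or, staying with bytes, use a bytes pattern instead:
    # return re.findall(b'(?:href=")(.*?)"', response.content)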
I'm trying to write a boolean value to a text file but get this error:
Traceback (most recent call last):
File "/Users/valentinwestermann/Documents/La dieta mediterranea_dhooks.py", line 32, in <module>
f.write(variant["available"])
TypeError: write() argument must be str, not bool
Does anyone have an idea how to fix this? : )
It is supposed to work as a restock monitor: it writes a text snapshot of product availability when the bot launches, then constantly compares against it and notifies the user about restocks.
import bs4 as bs
import urllib.request
import discord
from discord.ext import commands
from dhooks import Webhook
import requests
import json
r = requests.get("https://www.feature.com/products.json")
products = json.loads((r.text))["products"]
for product in products:
    print("============================================")
    print(product["title"])
    print(product["tags"])
    print(product["published_at"])
    print(product["created_at"])
    print(product["product_type"])
    for variant in product["variants"]:
        print(variant['title'])
        print(variant['available'], "\n")

data = ("available")
with open("stock_index.txt", "w") as f:
    for product in products:
        for variant in product["variants"]:
            if variant['available'] == True:
                f.write(product["title"])
                f.write(variant['title'])
                print(variant["available"])
                f.write("--------------------")
You can convert to a string first:
f.write(str(variant["available"]))
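An f-string does the same conversion and lets you add a separator in one call; the newline here is a suggestion, not part of the original code:

f.write(f"{variant['available']}\n")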
I am using the following code in an attempt to do web scraping.
import sys, os
import requests, webbrowser, bs4
from PIL import Image
import pyautogui

p = requests.get('http://www.goal.com/en-ie/news/ozil-agent-eviscerates-jealous-keown-over-stupid-comments/1javhtwzz72q113dnonn24mnr1')
n = open("exml.txt", 'wb')
for i in p.iter_content(1000):
    n.write(i)
n.close()
n = open("exml.txt", 'r')
soupy = bs4.BeautifulSoup(n, "html.parser")
elems = soupy.select('img[src]')
for u in elems:
    print(u)
What I intend to do is extract all the image links that are in the HTML response obtained from the page.
(Please correct me if I am wrong in thinking that requests.get returns the whole static HTML file of the webpage that opens on entering the URL.)
However, on the line:
soupy= bs4.BeautifulSoup(n,"html.parser")
I am getting the following error:
Traceback (most recent call last):
File "../../perl/webscratcher.txt", line 24, in <module>
soupy= bs4.BeautifulSoup(n,"html.parser")
File "C:\Users\Kanishc\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\__init__.py", line 191, in __init__
markup = markup.read()
File "C:\Users\Kanishc\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 24662: character maps to <undefined>
I am clueless about the error, and the "AppData" folder is empty.
How do I proceed further?
After trying the suggestions:
I changed the file extension to .py and that error went away. However, on the following line:
soupy = bs4.BeautifulSoup(p, "lxml")
I am getting the following error:
Traceback (most recent call last):
File "C:\perl\webscratcher.py", line 23, in
soupy= bs4.BeautifulSoup(p,"lxml")
File "C:\Users\PREMRAJ\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4_init_.py", line 192, in init
elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()
How do I tackle this?
You are over-complicating things. Pass the bytes content of a Response object directly into the constructor of the BeautifulSoup object, instead of writing it to a file.
import requests
from bs4 import BeautifulSoup
response = requests.get('http://www.goal.com/en-ie/news/ozil-agent-eviscerates-jealous-keown-over-stupid-comments/1javhtwzz72q113dnonn24mnr1')
soup = BeautifulSoup(response.content, 'lxml')
for element in soup.select('img[src]'):
    print(element)
Okay, so you might want to review how BeautifulSoup works. I referenced an old project of mine, and this is all you need to print them. Check the BS docs to find the exact syntax you want with the select method.
This will print all the img tags from the HTML:
import requests, bs4
site = 'http://www.goal.com/en-ie/news/ozil-agent-eviscerates-jealous-keown-over-stupid-comments/1javhtwzz72q113dnonn24mnr1'
p = requests.get(site).text
soupy = bs4.BeautifulSoup(p,"html.parser")
elems = soupy.select('img[src]')
for u in elems:
    print(u)
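If you want just the links rather than the whole tags, each Tag exposes its attributes like a dict, so the src can be read directly:

for u in elems:
    print(u['src'])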