How to use local files with this script from chat_downloader? (Python)

I want to use a local file (for example, a file exported by chat_downloader) rather than a URL. Is that possible? Here is the chat_downloader code I'm using:
from chat_downloader import ChatDownloader
import time

url = 'https://www.youtube.com/watch?v=Ih2WTyY62J4'  # Change URL here
prev_time = 0
chat = ChatDownloader().get_chat(url, start_time=prev_time)
for message in chat:  # iterate over messages
    if prev_time is not None and 'time_in_seconds' in message:
        time.sleep(max(message['time_in_seconds'] - prev_time, 0))
    chat.print_formatted(message)  # print the formatted message
    prev_time = message.get('time_in_seconds')
I don't really know what to do. I'd be happy if you could share your thoughts!
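For what it's worth, one way to replay a local export without going through a URL is to read the file directly with the standard library. A minimal sketch, assuming the export is a JSON array of message objects and that the file name and the 'time_in_seconds'/'message' fields match what was actually exported:

import json
import time

prev_time = 0
with open('exported_chat.json', 'r', encoding='utf-8') as f:  # hypothetical export path
    messages = json.load(f)

for message in messages:
    if 'time_in_seconds' in message:
        time.sleep(max(message['time_in_seconds'] - prev_time, 0))
        prev_time = message['time_in_seconds']
    print(message.get('message'))  # assumed field name; adjust to the export format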

Related

How to recall previous output of python script

I am working on a Telethon script in Python which runs when the channel/group receives a new message. I am looking at the message ID to decide whether to run my script. I am a beginner in Python, so with the knowledge I have I am using the following code:
prev_msgid = 0
latest_msgid = message.id
if latest_msgid > prev_msgid:
    print('latest message')
    prev_msgid = message.id
else:
    print('old message')
But when I run this code, prev_msgid resets to 0 every time. I need a way for prev_msgid to be updated automatically to the latest message ID across multiple runs of the script. Thank you.
Like @Quba said, you need a way to store the data persistently. Pickle is the fastest solution for you: it can save a Python object to a file:
import pickle
from os import path

prev_msgid = 0

# check if saved
if path.exists("prev_msgid"):
    # load
    with open("prev_msgid", 'rb') as f:
        prev_msgid = pickle.load(f)

prev_msgid += 1

# save
with open("prev_msgid", 'wb') as f:
    pickle.dump(prev_msgid, f)

print(prev_msgid)
Every time you run the script it will add one to prev_msgid. Note that it creates a file named "prev_msgid" next to the script.
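Applied to the original problem, the same pattern can persist the last seen message ID between runs. A minimal sketch, assuming message is the Telethon message object from the question (the state file name is just a placeholder):

import pickle
from os import path

STATE_FILE = "prev_msgid.pkl"  # placeholder file name

def load_prev_msgid():
    if path.exists(STATE_FILE):
        with open(STATE_FILE, 'rb') as f:
            return pickle.load(f)
    return 0

def save_prev_msgid(msgid):
    with open(STATE_FILE, 'wb') as f:
        pickle.dump(msgid, f)

prev_msgid = load_prev_msgid()
if message.id > prev_msgid:
    print('latest message')
    save_prev_msgid(message.id)
else:
    print('old message')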

Downloading Multiple torrent files with Libtorrent in Python

I'm trying to write a torrent application that can take in a list of magnet links and then download them all together. I've been trying to read and understand the libtorrent documentation, but I haven't been able to tell whether what I try works or not. I've managed to apply a SOCKS5 proxy to a libtorrent session and download a single magnet link using this code:
import libtorrent as lt
import time
import os

ses = lt.session()

r = lt.proxy_settings()
r.hostname = "proxy_info"
r.username = "proxy_info"
r.password = "proxy_info"
r.port = 1080
r.type = lt.proxy_type_t.socks5_pw
ses.set_peer_proxy(r)
ses.set_web_seed_proxy(r)
ses.set_proxy(r)

t = ses.settings()
t.force_proxy = True
t.proxy_peer_connections = True
t.anonymous_mode = True
ses.set_settings(t)
print(ses.get_settings())
ses.peer_proxy()
ses.web_seed_proxy()
ses.set_settings(t)

magnet_link = "magnet"
params = {
    "save_path": os.getcwd() + r"\torrents",
    "storage_mode": lt.storage_mode_t.storage_mode_sparse,
    "url": magnet_link
}
handle = lt.add_magnet_uri(ses, magnet_link, params)
ses.start_dht()

print('downloading metadata...')
while not handle.has_metadata():
    time.sleep(1)
print('got metadata, starting torrent download...')
while handle.status().state != lt.torrent_status.seeding:
    s = handle.status()
    state_str = ['queued', 'checking', 'downloading metadata', 'downloading',
                 'finished', 'seeding', 'allocating']
    print('%.2f%% complete (down: %.1f kB/s up: %.1f kB/s peers: %d) %s' % (
        s.progress * 100, s.download_rate / 1000, s.upload_rate / 1000,
        s.num_peers, state_str[s.state]))
    time.sleep(5)
This is great and all for running on its own with a single link. What I want to do is something like this:
def torrent_download(magnetic_link_list):
    for mag in range(len(magnetic_link_list)):
        handle = lt.add_magnet_uri(ses, magnetic_link_list[mag], params)
        # Then download all the files
        # Once all files complete, stop the torrents so they don't seed.
    return torrent_name_list
I'm not sure if this is even on the right track or not, but some pointers would be helpful.
UPDATE: This is what I now have and it works fine in my case
def magnet2torrent(magnet_link):
    global LIBTORRENT_SESSION, TORRENT_HANDLES
    if LIBTORRENT_SESSION is None and TORRENT_HANDLES is None:
        TORRENT_HANDLES = []
        settings = lt.default_settings()
        settings['proxy_hostname'] = CONFIG_DATA["PROXY"]["HOST"]
        settings['proxy_username'] = CONFIG_DATA["PROXY"]["USERNAME"]
        settings['proxy_password'] = CONFIG_DATA["PROXY"]["PASSWORD"]
        settings['proxy_port'] = CONFIG_DATA["PROXY"]["PORT"]
        settings['proxy_type'] = CONFIG_DATA["PROXY"]["TYPE"]
        settings['force_proxy'] = True
        settings['anonymous_mode'] = True
        LIBTORRENT_SESSION = lt.session(settings)
    params = {
        "save_path": os.getcwd() + r"/torrents",
        "storage_mode": lt.storage_mode_t.storage_mode_sparse,
        "url": magnet_link
    }
    TORRENT_HANDLES.append(LIBTORRENT_SESSION.add_torrent(params))


def check_torrents():
    global TORRENT_HANDLES
    for torrent in range(len(TORRENT_HANDLES)):
        print(TORRENT_HANDLES[torrent].status().is_seeding)
It's called "magnet links" (not magnetic).
In new versions of libtorrent, the way you add a magnet link is:
params = lt.parse_magnet_uri(uri)
handle = ses.add_torrent(params)
That also gives you an opportunity to tweak the add_torrent_params object, to set the save directory for instance.
If you're adding a lot of magnet links (or regular torrent files for that matter) and want to do it quickly, a faster way is to use:
ses.add_torrent_async(params)
That function will return immediately and the torrent_handle object can be picked up later in the add_torrent_alert.
As for downloading multiple magnet links in parallel, your pseudo code for adding them is correct. You just want to make sure you either save off all the torrent_handle objects you get back or query all torrent handles once you're done adding them (using ses.get_torrents()). In your pseudo code you seem to overwrite the last torrent handle every time you add a new one.
The condition you expressed for exiting was that all torrents were complete. The simplest way of doing that is simply to poll them all with handle.status().is_seeding. i.e. loop over your list of torrent handles and ask that. Keep in mind that the call to status() requires a round-trip to the libtorrent network thread, which isn't super fast.
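A minimal sketch of that approach, assuming reasonably recent libtorrent Python bindings; the magnet_links list, the save path and the poll interval are placeholders:

import time
import libtorrent as lt

ses = lt.session()
magnet_links = []  # fill with your magnet URIs

# keep every handle so none of them is overwritten
handles = []
for uri in magnet_links:
    params = lt.parse_magnet_uri(uri)
    params.save_path = './torrents'
    handles.append(ses.add_torrent(params))

# poll until every torrent has finished downloading (i.e. is seeding)
while handles and not all(h.status().is_seeding for h in handles):
    time.sleep(5)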
The faster way of doing this is to keep track of all torrents that aren't seeding yet, and "strike them off your list" as you get torrent_finished_alerts for torrents. (you get alerts by calling ses.pop_alerts()).
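A rough sketch of the alert-driven version, reusing the handles list from the sketch above; exact alert handling varies a bit between libtorrent versions:

finished = 0
while finished < len(handles):
    ses.wait_for_alert(1000)  # block up to one second waiting for new alerts
    for alert in ses.pop_alerts():
        if isinstance(alert, lt.torrent_finished_alert):
            finished += 1  # strike one torrent off the list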
Another suggestion I would make is to set up your settings_pack object first, then create the session. It's more efficient and a bit cleaner. Especially with regards to opening listen sockets and then immediately closing and re-opening them when you change settings.
i.e.
p = lt.settings_pack()
p['proxy_hostname'] = '...'
p['proxy_username'] = '...'
p['proxy_password'] = '...'
p['proxy_port'] = 1080
p['proxy_type'] = lt.proxy_type_t.socks5_pw
p['proxy_peer_connections'] = True
ses = lt.session(p)

How to save photos using instagram API and python

I'm using the Instagram API to obtain photos taken at a particular location using the python 3 code below:
import urllib.request

access_token = "ACCESS TOKEN"  # placeholder
wp = urllib.request.urlopen("https://api.instagram.com/v1/media/search?lat=48.858844&lng=2.294351&access_token=" + access_token)
pw = wp.read()
print(pw)
This allows me to retrieve all the photos. I wanted to know how I can save these on my computer.
An additional question I have is, is there any limit to the number of images returned by running the above? Thanks!
Eventually came up with this. In case anybody needs it, here you go:
# This Python script will download 10,000 images from a specified location.
# 10k images takes approx 15-20 minutes, approx 700 MB.
import urllib, json, requests
import time, csv

print "time.time(): %f " % time.time()  # current epoch time (Unix timestamp)
print time.asctime(time.localtime(time.time()))  # current time in human-readable format

# lat='48.858844'  # Latitude of the center search coordinate. If used, lng is required.
# lng='2.294351'   # Longitude of the center search coordinate. If used, lat is required.

# Brooklyn Brewery
lat = '40.721645'
lng = '-73.957258'
distance = '5000'  # Default is 1 km (distance=1000), max distance is 5 km.
access_token = '<YOUR TOKEN HERE>'  # Access token to use the API

# The default time span is set to 5 days. The time span must not exceed 7 days.
# min_timestamp  # A Unix timestamp. All media returned will be taken later than this timestamp.
# max_timestamp  # A Unix timestamp. All media returned will be taken earlier than this timestamp.

# Settings for verification dataset of images:
# lat, lng = 40.721645, -73.957258, dist = 5000, default timestamp (5 days)

images = {}  # to keep track of duplicates
total_count = 0
count = 0  # count for each loop
timestamp_last_image = 0
flag = 0

# File that collects every image URL so it can be passed directly to the Face++ API.
instaUrlFile = open("instaUrlFile.txt", "w")

# Images are returned in reverse order, i.e. most recent to least recent.
# A max of 100 images is returned during each request; to get the next set, we use the
# last (least recent) image's timestamp as max_timestamp and continue.
# To avoid duplicates we check if the image ID has already been recorded
# (Instagram tends to return images based on a %60 timestamp).
# flag checks for the first run of the loop.
# Use a JSON viewer such as http://www.jsoneditoronline.org/ with the commented API
# response link below to understand the JSON response.
while total_count < 10000:
    if flag == 0:
        response = urllib.urlopen('https://api.instagram.com/v1/media/search?lat=' + lat + '&lng=' + lng + '&distance=' + distance + '&access_token=' + access_token + '&count=100')
        # https://api.instagram.com/v1/media/search?lat=48.858844&lng=2.294351&distance=5000&access_token=2017228644.ab103e5.f6083159690e476b94dff6cbe8b53759
    else:
        response = urllib.urlopen('https://api.instagram.com/v1/media/search?lat=' + lat + '&lng=' + lng + '&distance=' + distance + '&max_timestamp=' + timestamp_last_image + '&access_token=' + access_token + '&count=100')
    data = json.load(response)
    for img in data["data"]:
        # print img["images"]["standard_resolution"]["url"]
        if img['id'] in images:  # skip duplicates
            continue
        images[img['id']] = 1
        total_count = total_count + 1
        count = count + 1
        # download the image by retrieving it from the URL
        urllib.urlretrieve(img["images"]["standard_resolution"]["url"], "C://Instagram/" + str(total_count) + ".jpg")
        # capture the image URL so it can be passed to the Face++ API from instaUrlFile.txt
        instaUrlFile.write(img["images"]["standard_resolution"]["url"] + "\n")
        print "IMAGE WITH name " + str(total_count) + ".jpg was just saved with created time " + data["data"][count - 1]["created_time"]
    # The for loop above downloads all the images from Instagram and saves them in the above path.
    timestamp_last_image = data["data"][count - 1]["created_time"]
    flag = 1
    count = 0
Here is code which saves all the images. I can't test it, because I don't have an Instagram token.
import urllib, json

access_token = "ACCESS TOKEN"  # Put your ACCESS TOKEN here
search_results = urllib.urlopen("https://api.instagram.com/v1/media/search?lat=48.858844&lng=2.294351&access_token=%s" % access_token)
instagram_answer = json.load(search_results)  # Load the Instagram media result
for row in instagram_answer['data']:
    if row['type'] == "image":  # Filter out non-image files
        filename = row['id']
        url = row['images']['standard_resolution']['url']
        file_obj, headers = urllib.urlretrieve(
            url=url,
            filename=filename
        )  # Save the image
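For completeness, the same idea in Python 3 (to match the urllib.request code in the question). This is an untested sketch; the endpoint and the token placeholder are taken from the question:

import json
import urllib.request

access_token = "ACCESS TOKEN"  # placeholder
url = ("https://api.instagram.com/v1/media/search"
       "?lat=48.858844&lng=2.294351&access_token=" + access_token)

with urllib.request.urlopen(url) as response:
    result = json.load(response)

for row in result['data']:
    if row['type'] == "image":  # skip non-image media
        image_url = row['images']['standard_resolution']['url']
        urllib.request.urlretrieve(image_url, row['id'] + ".jpg")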

Capturing STDOUT to then reconcile on the file names

I've been struggling with this problem for a bit. I am trying to create a program that builds a datetime object from the current date and time, builds a second one from our file data, and finds the difference between the two. If the difference is greater than 10 minutes, it should search for a "handshake file", which is a file we receive back when our file has successfully loaded; if we don't find that file, I want to kick out an error email.
My problem lies in capturing the result of my ls command in a meaningful way, so that I can parse through it and see whether the correct file exists. Here is my code:
"""
This module will check the handshake files sent by Pivot based on the following conventions:
- First handshake file (loaded to the CFL, *auditv2*): Check every half-hour
- Second handshake file (proofs are loaded and available, *handshake*): Check every 2 hours
"""
import smtplib
from email.mime.text import MIMEText
from datetime import datetime, timedelta
from csv import DictReader
from subprocess import *
from os import chdir
from glob import glob
def main():
audit_in = '/prod/bcs/lgnp/clientapp/csvbill/audit_process/lgnp.smr.csv0000.audit.qty'
with open(audit_in, 'rbU') as audit_qty:
my_audit_reader = DictReader(audit_qty, delimiter=';', restkey='ignored')
my_audit_reader.fieldnames = ("Property Code",
"Pivot ID",
"Inwork File",
"Billing Manager E-mail",
"Total Records",
"Number of E-Bills",
"Printed Records",
"File Date",
"Hour",
"Minute",
"Status")
# Get current time to reconcile against
now = datetime.now()
# Change internal directory to location of handshakes
chdir('/prod/bcs/lgnp/input')
for line in my_audit_reader:
piv_id = line['Pivot ID']
status = line['Status']
file_date = datetime(int(line['File Date'][:4]),
int(line['File Date'][4:6]),
int(line['File Date'][6:8]),
int(line['Hour']),
int(line['Minute']))
# print(file_date)
if status == 's':
diff = now - file_date
print diff
print piv_id
if 10 < (diff.seconds / 60) < 30:
proc = Popen('ls -lh *{0}*'.format(status),
shell=True) # figure out how to get output
print proc
def send_email(recipient_list):
msg = MIMEText('Insert message here')
msg['Subject'] = 'Alert!! Handshake files missing!'
msg['From'] = r'xxx#xxx.com'
msg['To'] = recipient_list
s = smtplib.SMTP(r'xxx.xxx.xxx')
s.sendmail(msg['From'], msg['To'], msg.as_string())
s.quit()
if __name__ == '__main__':
main()
Parsing ls output is not the best solution here. You could certainly do it by parsing the subprocess.check_output result or in some other way, but let me give you some advice.
It is a good sign that something is going wrong when you find yourself parsing someone else's output or logs to solve a standard problem; please consider other solutions, like those offered below.
If the only thing you want is to see the contents of the directory, use os.listdir like this:
my_home_files = os.listdir(os.path.expanduser('~/my_dir')) # surely it's cross-platform
now you have a list of files in your my_home_files variable.
You can filter them however you want, or use glob.glob with metacharacters like this:
glob.glob("/home/me/handshake-*.txt") # will output everything matching the expression
# (say you have ids in your filenames).
After that you may want to check some stats of the file (like the date of last access, etc.); consider using os.stat:
os.stat(my_home_files[0]) # outputs stats of the first
# posix.stat_result(st_mode=33104, st_ino=140378115, st_dev=3306L, st_nlink=1, st_uid=23449, st_gid=59216, st_size=1442, st_atime=1421834474, st_mtime=1441831745, st_ctime=1441234474)
# see os.stat linked above to understand how to parse it
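Putting those pieces together for the handshake check, here is a rough sketch; the helper name is mine, and it assumes the handshake filenames contain the ID being matched, mirroring the 'ls -lh *{0}*' pattern from the question:

import os
import glob
from datetime import datetime

def handshake_exists(match_key, directory='/prod/bcs/lgnp/input'):
    # find handshake files whose names contain the key
    matches = glob.glob(os.path.join(directory, '*{0}*'.format(match_key)))
    for filename in matches:
        modified = datetime.fromtimestamp(os.stat(filename).st_mtime)
        print('{0} last modified at {1}'.format(filename, modified))
    return bool(matches)

# in main(), instead of the Popen('ls ...') call:
# if not handshake_exists(piv_id):
#     send_email(recipient_list)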

Getting file input into Python script for praw script

So I have a simple reddit bot set up which I wrote using the praw framework. The code is as follows:
import praw
import time
import numpy
import pickle

r = praw.Reddit(user_agent="Gets the Daily General Thread from subreddit.")
print("Logging in...")
r.login()

words_to_match = ['sdfghm']
cache = []


def run_bot():
    print("Grabbing subreddit...")
    subreddit = r.get_subreddit("test")
    print("Grabbing thread titles...")
    threads = subreddit.get_hot(limit=10)
    for submission in threads:
        thread_title = submission.title.lower()
        isMatch = any(string in thread_title for string in words_to_match)
        if submission.id not in cache and isMatch:
            print("Match found! Thread ID is " + submission.id)
            r.send_message('FlameDraBot', 'DGT has been posted!', 'You are awesome!')
            print("Message sent!")
            cache.append(submission.id)
    print("Comment loop finished. Restarting...")


# Run the script
while True:
    run_bot()
    time.sleep(20)
I want to create a file (text file, XML, or something else) through which the user can change the values for the various pieces of information being queried. For example, I want a file with lines such as:
Words to Search for = sdfghm
Subreddit to Search in = text
Send message to = FlameDraBot
I want the info to be read from those fields, so that the script takes the value after "Words to Search for =" instead of the whole line. After the information has been entered into the file and saved, I want my script to pull it from the file, store it in a variable, and use that variable in the appropriate functions, such as:
words_to_match = ['sdfghm']
subreddit = r.get_subreddit("test")
r.send_message('FlameDraBot'....
So basically like a config file for the script. How do I go about making it so that my script can take input from a .txt or another appropriate file and implement it into my code?
Yes, that's just a plain old Python config, which you can implement in an ASCII file, or else YAML or JSON.
Create a subdirectory ./config, put your settings in ./config/__init__.py
Then import config.
Using PEP-8 compliant names, the file ./config/__init__.py would look like:
search_string = ['sdfghm']
subreddit_to_search = 'text'
notify = ['FlameDraBot']
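Pulling the values into the bot is then just attribute access. A minimal usage sketch, assuming the praw Reddit instance r from the question and the sample names above:

import config

words_to_match = config.search_string
subreddit = r.get_subreddit(config.subreddit_to_search)
for recipient in config.notify:
    r.send_message(recipient, 'DGT has been posted!', 'You are awesome!')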
If you want more complicated config, just read the many other posts on that.
