While following the installation process for Saleor headless e-commerce, the Python WeasyPrint package fails to load the gobject-2.0.0 dependency, which I have already installed on my machine using MacPorts.
Below is the source code showing where the error is raised after starting the Django server. The file holds the utility functions the plugin file uses to generate invoices.
utils.py
import os
import re
from datetime import datetime
from decimal import Decimal
import pytz
from django.conf import settings
from django.template.loader import get_template
from prices import Money
from weasyprint import HTML  # <----------- this import is what raises the error because
# the package can't find the needed dependency
from ...giftcard import GiftCardEvents
from ...giftcard.models import GiftCardEvent
from ...invoice.models import Invoice
MAX_PRODUCTS_WITH_TABLE = 3
MAX_PRODUCTS_WITHOUT_TABLE = 4
MAX_PRODUCTS_PER_PAGE = 13
def make_full_invoice_number(number=None, month=None, year=None):
    now = datetime.now()
    current_month = int(now.strftime("%m"))
    current_year = int(now.strftime("%Y"))
    month_and_year = now.strftime("%m/%Y")
    if month == current_month and year == current_year:
        new_number = (number or 0) + 1
        return f"{new_number}/{month_and_year}"
    return f"1/{month_and_year}"


def parse_invoice_dates(number: str):
    match = re.match(r"^(\d+)\/(\d+)\/(\d+)", number)
    if not match:
        raise ValueError("Unrecognized invoice number format")
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def generate_invoice_number():
    last_invoice = Invoice.objects.filter(number__isnull=False).last()
    if not last_invoice or not last_invoice.number:
        return make_full_invoice_number()
    try:
        number, month, year = parse_invoice_dates(last_invoice.number)
        return make_full_invoice_number(number, month, year)
    except (IndexError, ValueError, AttributeError):
        return make_full_invoice_number()


def chunk_products(products, product_limit):
    """Split products to list of chunks.

    Each chunk represents products per page, product_limit defines chunk size.
    """
    chunks = []
    for i in range(0, len(products), product_limit):
        limit = i + product_limit
        chunks.append(products[i:limit])
    return chunks


def get_product_limit_first_page(products):
    if len(products) < MAX_PRODUCTS_WITHOUT_TABLE:
        return MAX_PRODUCTS_WITH_TABLE
    return MAX_PRODUCTS_WITHOUT_TABLE


def get_gift_cards_payment_amount(order):
    events = GiftCardEvent.objects.filter(
        type=GiftCardEvents.USED_IN_ORDER, order_id=order.id
    )
    total_paid = Decimal(0)
    for event in events:
        balance = event.parameters["balance"]
        total_paid += Decimal(balance["old_current_balance"]) - Decimal(
            balance["current_balance"]
        )
    return Money(total_paid, order.currency)
def generate_invoice_pdf(invoice):  # <------- the function that calls the HTML class
    # imported from weasyprint
    font_path = os.path.join(
        settings.PROJECT_ROOT, "templates", "invoices", "inter.ttf"
    )
    all_products = invoice.order.lines.all()
    product_limit_first_page = get_product_limit_first_page(all_products)
    products_first_page = all_products[:product_limit_first_page]
    rest_of_products = chunk_products(
        all_products[product_limit_first_page:], MAX_PRODUCTS_PER_PAGE
    )
    order = invoice.order
    gift_cards_payment = get_gift_cards_payment_amount(order)
    creation_date = datetime.now(tz=pytz.utc)
    rendered_template = get_template("invoices/invoice.html").render(
        {
            "invoice": invoice,
            "creation_date": creation_date.strftime("%d %b %Y"),
            "order": order,
            "gift_cards_payment": gift_cards_payment,
            "font_path": f"file://{font_path}",
            "products_first_page": products_first_page,
            "rest_of_products": rest_of_products,
        }
    )
    return HTML(string=rendered_template).write_pdf(), creation_date
plugins.py
from typing import Any, Optional
from uuid import uuid4
from django.core.files.base import ContentFile
from django.utils.text import slugify
from ...core import JobStatus
from ...invoice.models import Invoice
from ...order.models import Order
from ..base_plugin import BasePlugin
from .utils import generate_invoice_number, generate_invoice_pdf
class InvoicingPlugin(BasePlugin):
    PLUGIN_ID = "mirumee.invoicing"
    PLUGIN_NAME = "Invoicing"
    DEFAULT_ACTIVE = True
    PLUGIN_DESCRIPTION = "Built-in saleor plugin that handles invoice creation."
    CONFIGURATION_PER_CHANNEL = False

    def invoice_request(
        self,
        order: "Order",
        invoice: "Invoice",
        number: Optional[str],
        previous_value: Any,
    ) -> Any:
        invoice_number = generate_invoice_number()
        invoice.update_invoice(number=invoice_number)
        file_content, creation_date = generate_invoice_pdf(invoice)
        invoice.created = creation_date
        slugified_invoice_number = slugify(invoice_number)
        invoice.invoice_file.save(
            f"invoice-{slugified_invoice_number}-order-{order.id}-{uuid4()}.pdf",
            ContentFile(file_content),  # type: ignore
        )
        invoice.status = JobStatus.SUCCESS
        invoice.save(
            update_fields=[
                "created_at",
                "number",
                "invoice_file",
                "status",
                "updated_at",
            ]
        )
        return invoice
To fix the issue, I followed this instruction to create a symlink, which I did, and I pointed to it in my machine's PATH environment variable, yet it didn't fix the issue. Does that mean Django isn't looking for the dependency via the PATH environment variable?
It's also worth noting that installing Python and WeasyPrint with Homebrew would fix the issue, but I don't use Homebrew because I'm on macOS Catalina 10.15, which isn't supported anymore, so the Homebrew build for it is unstable.
I know the dependency is on my machine, but it's been difficult to point to it. What am I doing wrong?
I've been on this for days!
After many days, it turned out this is a common issue when using the Python WeasyPrint package. It occurs when you install Python and the required system dependencies using different installers. In my case, I used the system installer for Python and MacPorts for the dependencies, and that's what caused the issue. Fortunately, answers already exist that solve it, and I found this one particularly useful.
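A quick way to confirm this kind of mismatch (a minimal diagnostic sketch, assuming the default MacPorts prefix /opt/local and the usual libgobject-2.0 dylib name; adjust both if your setup differs) is to ask the dynamic loader directly. On macOS, dlopen uses the DYLD_* search paths and library install names, not the PATH variable, which is why changing PATH has no effect:
import ctypes
import ctypes.util

# Emulates the loader's lookup by name; None means the loader cannot see the library.
print(ctypes.util.find_library("gobject-2.0"))

# Loading by full path is a sanity check that the MacPorts build itself is usable.
try:
    ctypes.CDLL("/opt/local/lib/libgobject-2.0.dylib")
    print("library loads when given its full MacPorts path")
except OSError as exc:
    print("even the full path fails:", exc)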
Related
I've written a script that connects to SharePoint and downloads the metadata of all newly added files into a dataframe.
I developed it in Spyder and it works just fine. But after I compile it into an .exe file with PyInstaller and run it, I keep getting the following error: maximum recursion depth exceeded.
I'm using Windows 10 and Python 3.6.10 (conda).
from office365.runtime.auth.clientCredential import ClientCredential
from office365.sharepoint.client_context import ClientContext
import pandas as pd
from datetime import datetime

client_id = XXX
client_secret = XXX
site_url = XXX

# df (with the matching columns), export_doc and tipo_documento come from
# earlier parts of the script that are not shown here.
ctx = ClientContext.connect_with_credentials(site_url, ClientCredential(client_id, client_secret))
lib = ctx.web.lists.get_by_title("REO")
doc = 1
cuenta_error = 0
while cuenta_error < 10:
    try:
        item = lib.get_item_by_id(doc)
        ctx.load(item)
        ctx.execute_query()
        id_documento = "{0}".format(item.properties["IdDocumentoSistemaOrigen"])
        id_sharepoint = "{0}".format(item.properties["Id"])
        ts_alta = datetime.strptime(str(datetime.now()), '%Y-%m-%d %H:%M:%S.%f').strftime('%Y-%m-%d %H:%M:%S')
        origen = "{0}".format(item.properties["AplicacionOrigen"])
        tipo_doc_SH = "{0}".format(item.properties["TipoDocumento"]['Label'])
        fecha_alta_origen = datetime.strptime("{0}".format(item.properties["Created"])[0:10], '%Y-%m-%d').strftime('%Y-%m-%d')
        id_inmueble = "{0}".format(item.properties["IdActivoPrinex"])
        dni = "{0}".format(item.properties["NumeroIdentificacion"])
        id_promocion = "{0}".format(item.properties["CodigoPromocion"])
        file = item.file
        ctx.load(file)
        ctx.execute_query()
        nombre_documento = "{0}".format(file.properties["Name"])
        fecha_envio = None
        IAObjeto = None
        #print(id_documento)
        doc += 1
        cuenta_error = 0
        fecha_envio = None
        ind_error = None
        to_append = [id_documento, id_sharepoint, tipo_documento, ts_alta, nombre_documento,
                     origen, tipo_doc_SH, fecha_alta_origen, id_inmueble, fecha_envio, ind_error, dni, id_promocion, IAObjeto]
        a_series = pd.Series(to_append, index=df.columns)
        df = df.append(a_series, ignore_index=True)
        export_doc += 1
    except Exception as e:
        print(e)
        cuenta_error += 1  # count consecutive failures so the loop eventually stops
I have tried sys.setrecursionlimit(1500) but I still get the same error, and with sys.setrecursionlimit(1000**6) the code ends up crashing.
Does anybody have any suggestion about how to fix this?
Good morning,
Finally I figured out the problem.
ClientContext works in a recursive way and crashes after a few calls.
I edited my code so that I open a new context each time I request a new item, and now it's working.
I created two functions that I call every time I want to get the metadata for a file.
With these functions I close the context and open it again after each call; it's probably not the best way, but I ran out of ideas and this works.
from office365.runtime.auth.client_credential import ClientCredential
from office365.sharepoint.client_context import ClientContext
import pandas as pd
from datetime import datetime, date


def call_lib(site_url, client_id, client_secret, serv):
    ctx = ClientContext.connect_with_credentials(site_url, ClientCredential(client_id, client_secret))
    if serv == 'PRN':
        lib = ctx.web.lists.get_by_title("REO")
    elif serv == 'GIA':
        lib = ctx.web.lists.get_by_title("DESINVERSIÓN")
    return ctx, lib


def get_metadata_from_sh(ctx, lib, docnum):
    item = lib.get_item_by_id(docnum)
    ctx.load(item)
    ctx.execute_query()
    id_documento = "{0}".format(item.properties["IdDocumentoSistemaOrigen"])
    id_sharepoint = "{0}".format(item.properties["Id"])
    ts_alta = datetime.strptime(str(datetime.now()), '%Y-%m-%d %H:%M:%S.%f').strftime('%Y-%m-%d %H:%M:%S')
    origen = "{0}".format(item.properties["AplicacionOrigen"])
    tipo_doc_SH = "{0}".format(item.properties["TipoDocumento"]['Label'])
    fecha_alta_origen = datetime.strptime("{0}".format(item.properties["Created"])[0:10], '%Y-%m-%d').strftime('%Y-%m-%d')
    id_inmueble = "{0}".format(item.properties["IdActivoPrinex"])
    dni = "{0}".format(item.properties["NumeroIdentificacion"])
    id_promocion = "{0}".format(item.properties["CodigoPromocion"])
    file = item.file
    ctx.load(file)
    ctx.execute_query()
    nombre_documento = "{0}".format(file.properties["Name"])
    fecha_envio = None
    IAObjeto = None
    return id_documento, id_sharepoint, ts_alta, origen, tipo_doc_SH, fecha_alta_origen, id_inmueble, nombre_documento, fecha_envio, dni, int(id_promocion), IAObjeto


ctx, lib = call_lib(site_url, client_id, client_secret, serv_origen[i])
id_documento, id_sharepoint, ts_alta, origen, tipo_doc_SH, fecha_alta_origen, id_inmueble, nombre_documento, fecha_envio, dni, id_promocion, IAObjeto = get_metadata_from_sh(ctx, lib, doc)
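For completeness, here is a rough sketch of how the original while loop can be rewritten around these two helpers, so a fresh context is opened for every document as described above (it assumes site_url, client_id, client_secret, serv_origen, df and the other variables from the question are already defined):
doc = 1
cuenta_error = 0
while cuenta_error < 10:
    try:
        # Re-create the context on every iteration so ClientContext's internal
        # recursion never builds up across calls.
        ctx, lib = call_lib(site_url, client_id, client_secret, serv_origen[i])
        metadata = get_metadata_from_sh(ctx, lib, doc)
        # ...append `metadata` to the dataframe exactly as in the question...
        doc += 1
        cuenta_error = 0
    except Exception as e:
        print(e)
        cuenta_error += 1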
I'm trying to create an ETL that extracts from Mongo, processes the data and loads it into Elastic. I will do a daily load, so I thought of naming my index with the current date. This will help me with a later processing step I need to do on this first index.
I used elasticsearch dsl guide: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html
The problem I have comes from my limited experience with classes. I don't know how to reset the Index name from the class.
Here is my code for the class (custom_indices.py):
from elasticsearch_dsl import Document, Date, Integer, Keyword, Text
from elasticsearch_dsl.connections import connections
from elasticsearch_dsl import Search
import datetime
class News(Document):
    title = Text(analyzer='standard', fields={'raw': Keyword()})
    manual_tagging = Keyword()

    class Index:
        name = 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")

    def save(self, **kwargs):
        return super(News, self).save(**kwargs)

    def is_published(self):
        return datetime.datetime.now() >= self.processed
And this is the part of the code where I create an instance of that class:
from custom_indices import News
import elasticsearch
import elasticsearch_dsl
from elasticsearch_dsl.connections import connections
import pandas as pd
import datetime
connections.create_connection(hosts=['localhost'])
News.init()
for index, doc in df.iterrows():
    new_insert = News(meta={'id': doc.url_hashed},
                      title=doc.title,
                      manual_tagging=doc.customTags,
                      )
    new_insert.save()
Every time I use the News class I would expect it to get a new index name. However, the name doesn't change even if I import the class again (from custom_indices import News). I know this is only a problem when testing, but I'd like to know how to force that "reset". Actually, I originally wanted to change the name outside the class with this line right before the loop:
News.Index.name = "NEW_NAME"
However, that didn't work. I was still seeing the name defined on the class.
Could anyone please assist?
Many thanks!
PS: this must be just an object oriented programming issue. Apologies for my ignorance on the subject.
Maybe you could take advantage of the fact that Document.init() accepts an index keyword argument. If you want the index name to get set automatically, you could implement init() in the News class and call super().init(...) in your implementation.
A simplified example (python 3.x):
from elasticsearch_dsl import Document
from elasticsearch_dsl.connections import connections
import datetime
class News(Document):
    @classmethod
    def init(cls, index=None, using=None):
        index_name = index or 'processed_news_' + datetime.datetime.now().strftime("%Y%m%d")
        return super().init(index=index_name, using=using)
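Hypothetical usage of the override above (assuming a local cluster as in the question): init() now targets today's index unless you pass an explicit name.
from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=['localhost'])
News.init()                                 # creates processed_news_<today>
News.init(index='processed_news_backfill')  # or force a specific index name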
You can override the index when you call save():
new_insert.save(index='processed_news_' + datetime.datetime.now().strftime("%Y%m%d"))
Example as following.
# coding: utf-8
import datetime

from elasticsearch_dsl import Keyword, Text, \
    Index, Document, Date
from elasticsearch_dsl.connections import connections

HOST = "localhost:9200"

index_names = [
    "foo-log-",
    "bar-log-",
]

default_settings = {"number_of_shards": 4, "number_of_replicas": 1}
index_settings = {
    "foo-log-": {
        "number_of_shards": 40,
        "number_of_replicas": 1
    }
}


class LogDoc(Document):
    level = Keyword(ignore_above=256)
    date = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")
    hostname = Text(fields={'fields': Keyword(ignore_above=256)})
    message = Text()
    createTime = Date(format="yyyy-MM-dd'T'HH:mm:ss.SSS")


def auto_create_index():
    '''Automatically create the ES indices.'''
    connections.create_connection(hosts=[HOST])
    for day in range(3):
        dt = datetime.datetime.now() + datetime.timedelta(days=day)
        for index in index_names:
            name = index + dt.strftime("%Y-%m-%d")
            settings = index_settings.get(index, default_settings)
            idx = Index(name=name)
            idx.document(LogDoc)
            idx.settings(**settings)
            try:
                idx.create()
            except Exception as e:
                print(e)
                continue
            print("create index %s" % name)


if __name__ == '__main__':
    auto_create_index()
I'd like to be able to list the files in shell:appsfolder from a Python script, but I need the full path to do that with os.listdir. Is there a way to get the full path (or does anyone know it)? Alternatively, is there a different way I can list these files? Can I "cd" to it?
The idea behind the script is to automate the shortcut creation for all the Windows Store apps (identified, I think, by the fact that they have a "long name" property) and extract those shortcuts to a folder where the program Launchy can detect them. I don't like having to manually go through the process of creating the shortcut (and renaming it to remove the " - Shortcut" suffix) every time I install or remove an app, so I thought I'd automate it.
Here's a function that hopefully does what you want in terms of creating shortcuts for the Windows Store apps that are listed in the "Applications" virtual folder (i.e. FOLDERID_AppsFolder). To classify Windows Store apps, it looks for an exclamation point in the Application User Model ID since the AUMID should be of the form "PackageFamily!ApplicationID" (see Automate Launching UWP Apps). For reliability it cross-checks each package family with the user's registered package families.
import os
import ctypes
import pywintypes
import pythoncom
import winerror

try:
    import winreg
except ImportError:
    # Python 2
    import _winreg as winreg
    bytes = lambda x: str(buffer(x))

from ctypes import wintypes
from win32com.shell import shell, shellcon
from win32com.propsys import propsys, pscon

# KNOWNFOLDERID
# https://msdn.microsoft.com/en-us/library/dd378457
# win32com defines most of these, except the ones added in Windows 8.
FOLDERID_AppsFolder = pywintypes.IID('{1e87508d-89c2-42f0-8a7e-645a0f50ca58}')

# win32com is missing SHGetKnownFolderIDList, so use ctypes.
_ole32 = ctypes.OleDLL('ole32')
_shell32 = ctypes.OleDLL('shell32')

_REFKNOWNFOLDERID = ctypes.c_char_p
_PPITEMIDLIST = ctypes.POINTER(ctypes.c_void_p)

_ole32.CoTaskMemFree.restype = None
_ole32.CoTaskMemFree.argtypes = (wintypes.LPVOID,)

_shell32.SHGetKnownFolderIDList.argtypes = (
    _REFKNOWNFOLDERID,  # rfid
    wintypes.DWORD,     # dwFlags
    wintypes.HANDLE,    # hToken
    _PPITEMIDLIST)      # ppidl

def get_known_folder_id_list(folder_id, htoken=None):
    if isinstance(folder_id, pywintypes.IIDType):
        folder_id = bytes(folder_id)
    pidl = ctypes.c_void_p()
    try:
        _shell32.SHGetKnownFolderIDList(folder_id, 0, htoken,
                                        ctypes.byref(pidl))
        return shell.AddressAsPIDL(pidl.value)
    except WindowsError as e:
        if e.winerror & 0x80070000 == 0x80070000:
            # It's a WinAPI error, so re-raise it, letting Python
            # raise a specific exception such as FileNotFoundError.
            raise ctypes.WinError(e.winerror & 0x0000FFFF)
        raise
    finally:
        if pidl:
            _ole32.CoTaskMemFree(pidl)

def enum_known_folder(folder_id, htoken=None):
    id_list = get_known_folder_id_list(folder_id, htoken)
    folder_shell_item = shell.SHCreateShellItem(None, None, id_list)
    items_enum = folder_shell_item.BindToHandler(None,
        shell.BHID_EnumItems, shell.IID_IEnumShellItems)
    for item in items_enum:
        yield item

def list_known_folder(folder_id, htoken=None):
    result = []
    for item in enum_known_folder(folder_id, htoken):
        result.append(item.GetDisplayName(shellcon.SIGDN_NORMALDISPLAY))
    result.sort(key=lambda x: x.upper())
    return result

def create_shortcut(shell_item, shortcut_path):
    id_list = shell.SHGetIDListFromObject(shell_item)
    shortcut = pythoncom.CoCreateInstance(shell.CLSID_ShellLink, None,
        pythoncom.CLSCTX_INPROC_SERVER, shell.IID_IShellLink)
    shortcut.SetIDList(id_list)
    persist = shortcut.QueryInterface(pythoncom.IID_IPersistFile)
    persist.Save(shortcut_path, 0)

def get_package_families():
    families = set()
    subkey = (r'Software\Classes\Local Settings\Software\Microsoft'
              r'\Windows\CurrentVersion\AppModel\Repository\Families')
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, subkey) as hkey:
        index = 0
        while True:
            try:
                families.add(winreg.EnumKey(hkey, index))
            except OSError as e:
                if e.winerror != winerror.ERROR_NO_MORE_ITEMS:
                    raise
                break
            index += 1
    return families

def update_app_shortcuts(target_dir):
    package_families = get_package_families()
    for item in enum_known_folder(FOLDERID_AppsFolder):
        try:
            property_store = item.BindToHandler(None,
                shell.BHID_PropertyStore, propsys.IID_IPropertyStore)
            app_user_model_id = property_store.GetValue(
                pscon.PKEY_AppUserModel_ID).ToString()
        except pywintypes.error:
            continue
        # AUMID template: PackageFamily!ApplicationID
        if '!' not in app_user_model_id:
            continue
        package_family, app_id = app_user_model_id.rsplit('!', 1)
        if package_family not in package_families:
            continue
        name = item.GetDisplayName(shellcon.SIGDN_NORMALDISPLAY)
        shortcut_path = os.path.join(target_dir, '%s.lnk' % name)
        create_shortcut(item, shortcut_path)
        print('{}: {}'.format(name, app_user_model_id))
Example:
if __name__ == '__main__':
    desktop = shell.SHGetFolderPath(0, shellcon.CSIDL_DESKTOP, 0, 0)
    target_dir = os.path.join(desktop, 'Windows Store Apps')
    if not os.path.exists(target_dir):
        os.mkdir(target_dir)
    update_app_shortcuts(target_dir)
I have been trying to register a number on Yowsup, but I got an old_version error.
Here is my env_s40.py file:
from .env import YowsupEnv
import hashlib


class S40YowsupEnv(YowsupEnv):
    _VERSION = "2.16.11"
    _OS_NAME = "S40"
    _OS_VERSION = "14.26"
    _DEVICE_NAME = "302"
    _MANUFACTURER = "Nokia"
    _TOKEN_STRING = "PdA2DJyKoUrwLw1Bg6EIhzh502dF9noR9uFCllGk1478194306452"
    _AXOLOTL = True

    def getVersion(self):
        return self.__class__._VERSION

    def getOSName(self):
        return self.__class__._OS_NAME

    def getOSVersion(self):
        return self.__class__._OS_VERSION

    def getDeviceName(self):
        return self.__class__._DEVICE_NAME

    def getManufacturer(self):
        return self.__class__._MANUFACTURER

    def isAxolotlEnabled(self):
        return self.__class__._AXOLOTL

    def getToken(self, phoneNumber):
        return hashlib.md5(self.__class__._TOKEN_STRING.format(phone=phoneNumber).encode()).hexdigest()

    def getUserAgent(self):
        return self.__class__._USERAGENT_STRING.format(
            WHATSAPP_VERSION=self.getVersion(),
            OS_NAME=self.getOSName() + "Version",
            OS_VERSION=self.getOSVersion(),
            DEVICE_NAME=self.getDeviceName(),
            MANUFACTURER=self.getManufacturer()
        )
And I got
DEBUG:yowsup.env.env:Current env changed to s40
yowsup-cli v2.0.15
yowsup v2.5.0
Copyright (c) 2012-2016 Tarek Galal
http://www.openwhatsapp.org
This software is provided free of charge. Copying and redistribution is
encouraged.
If you appreciate this software and you would like to support future
development please consider donating:
http://openwhatsapp.org/yowsup/donate
DEBUG:yowsup.common.http.warequest:{'Accept': 'text/json', 'User-Agent': 'WhatsApp/2.16.7 S40Version/14.26 Device/Nokia-302'}
DEBUG:yowsup.common.http.warequest:cc=255&in=716343889&lc=GB&lg=en&sim_mcc=640&sim_mnc=002&mcc=640&mnc=002&method=sms&mistyped=6&network_radio_type=1&simnum=1&s=&copiedrc=1&hasinrc=1&rcmatch=1&pid=6444&rchash=5b9d791a39befe77a165a669cac86d3fa12c7390536885aa43555059748747e9&anhash=Q%90%1D%2Am%BA%EF%1C%8E%E1%84%94%E4%93%C6%DD%AB%B55E&extexist=1&extstate=1&token=fe234b378d154c6dcca365d264ed96cf&id=%A0%3B%2B%C3%B6%A2%F9%04%CB%FA%AE%887%7FY%F7E%E6%D3%17
DEBUG:yowsup.common.http.warequest:Opening connection to v.whatsapp.net
DEBUG:yowsup.common.http.warequest:Sending GET request to /v2/code?cc=255&in=716343889&lc=GB&lg=en&sim_mcc=640&sim_mnc=002&mcc=640&mnc=002&method=sms&mistyped=6&network_radio_type=1&simnum=1&s=&copiedrc=1&hasinrc=1&rcmatch=1&pid=6444&rchash=5b9d791a39befe77a165a669cac86d3fa12c7390536885aa43555059748747e9&anhash=Q%90%1D%2Am%BA%EF%1C%8E%E1%84%94%E4%93%C6%DD%AB%B55E&extexist=1&extstate=1&token=fe234b378d154c6dcca365d264ed96cf&id=%A0%3B%2B%C3%B6%A2%F9%04%CB%FA%AE%887%7FY%F7E%E6%D3%17
INFO:yowsup.common.http.warequest:{"login":"255716MYNUMBER","status":"fail","reason":"old_version"}
status: fail
reason: old_version
login: 255716MYNUMBER
How can I fix that?
Probably a bad question for SO, but a short Google search gave:
https://github.com/tgalal/yowsup/issues/1861#issuecomment-264792842
There are also other discussions regarding a similar issue.
In the env_android file, change the version to the most current WhatsApp one.
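Roughly, the edit looks like this (a sketch only: "2.XX.XX" is a placeholder for whatever version the linked issue reports as currently accepted, and the file/class names follow the same pattern as the env_s40.py shown above; everything else in the file stays as shipped):
# yowsup/env/env_android.py
from .env import YowsupEnv


class AndroidYowsupEnv(YowsupEnv):
    _VERSION = "2.XX.XX"  # <-- bump only this constant to the current WhatsApp Android version
    # ...the remaining attributes and methods stay unchanged...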
I'm trying to grab the most recently uploaded videos. There's a standard feed for that - it's called most_recent. I don't have any problems grabbing the feed, but when I look at the entries inside, they're all half a year old, which is hardly recent.
Here's the code I'm using:
import requests
import os.path as P
import sys
from lxml import etree
import datetime
namespaces = {"a": "http://www.w3.org/2005/Atom", "yt": "http://gdata.youtube.com/schemas/2007"}
fmt = "%Y-%m-%dT%H:%M:%S.000Z"
class VideoEntry:
    """Data holder for the video."""

    def __init__(self, node):
        self.entry_id = node.find("./a:id", namespaces=namespaces).text
        published = node.find("./a:published", namespaces=namespaces).text
        self.published = datetime.datetime.strptime(published, fmt)

    def __str__(self):
        return "VideoEntry[id='%s']" % self.entry_id


def paginate(xml):
    root = etree.fromstring(xml)
    next_page = root.find("./a:link[@rel='next']", namespaces=namespaces)
    if next_page is None:
        next_link = None
    else:
        next_link = next_page.get("href")
    entries = [VideoEntry(e) for e in root.xpath("/a:feed/a:entry", namespaces=namespaces)]
    return entries, next_link


prefix = "https://gdata.youtube.com/feeds/api/standardfeeds/"
standard_feeds = set("top_rated top_favorites most_shared most_popular most_recent most_discussed most_responded recently_featured on_the_web most_viewed".split(" "))
feed_name = sys.argv[1]
assert feed_name in standard_feeds
feed_url = prefix + feed_name

all_video_ids = []
while feed_url is not None:
    r = requests.get(feed_url)
    if r.status_code != 200:
        break
    text = r.text.encode("utf-8")
    video_ids, feed_url = paginate(text)
    all_video_ids += video_ids

all_upload_times = [e.published for e in all_video_ids]
print min(all_upload_times), max(all_upload_times)
As you can see, it prints the min and max timestamps for the entire feed.
misha@misha-antec$ python get_standard_feed.py most_recent
2013-02-02 14:40:02 2013-02-02 14:54:00
misha@misha-antec$ python get_standard_feed.py top_rated
2006-04-06 21:30:53 2013-07-28 22:22:38
I've glanced through the downloaded XML and it appears to match the output. Am I doing something wrong?
Also, on an unrelated note, the feeds I'm getting are all about 100 entries (I'm paginating through them 25 at a time). Is this normal? I expected the feeds to be a bit bigger.
Regarding the "Most-Recent-Feed"-Topic: There is a ticket for this one here. Unfortunately, the YouTube-API-Teams doesn't respond or solved the problem so far.
Regarding the number of entries: That depends on the type of standardfeed, but for the most-recent-Feed it´s usually around 100.
Note: You could try using the "orderby=published" parameter to get recents videos, although I don´t know how "recent" they are.
https://gdata.youtube.com/feeds/api/videos?orderby=published&prettyprint=True
You can combine this query with the "category" parameter or other ones (region-specific queries, like for the standard feeds, are not possible, AFAIK).
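A rough sketch of that suggestion, reusing the paginate() helper and VideoEntry class from the question above (the v2 gdata endpoint is the one from the question and has since been deprecated, so treat this as illustrative rather than guaranteed to keep working):
import requests

feed_url = "https://gdata.youtube.com/feeds/api/videos?orderby=published&max-results=25"
entries = []
while feed_url is not None:
    r = requests.get(feed_url)
    if r.status_code != 200:
        break
    page_entries, feed_url = paginate(r.text.encode("utf-8"))
    entries += page_entries

if entries:
    print(max(e.published for e in entries))  # newest upload time seen in the feed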