Get filmography for a chosen company with IMDbPY - python

From the documentation I see that companies have only 'main' and don't have 'filmography' unlike persons, but is there a way to fetch movies for a chosen company? Maybe it's possible to see the lists of 'Films in Production' and 'Past Film & Video'?
I tried to populate the company's data with related movies somehow, but 'main' stays an empty list.
I don't want to go through all the movies in the database to check whether the company is present there, as that seems very inefficient. I use 'http' access, as I don't need a local copy of the database.
from imdb import IMDb

ia = IMDb()  # 'http' access is the default
my_company = ia.search_company('Walt Disney Pictures [US]')[0]
company_id = my_company.companyID
ia.get_company(company_id)  # the only info I can get!
ia.update(my_company)
ia.get_company_infoset()  # infosets defined for company objects
my_company.infoset2keys  # infosets actually fetched, and their keys

Unfortunately, the IMDb web site changed the information on a company page, and right now IMDbPY is no longer able to collect any information besides the company's name and country.
I have opened an issue to describe the problem: https://github.com/alberanid/imdbpy/issues/198

Related

LinkedIn Marketing API Creative names

I am trying to use the marketing developer platform api to pull reports for my campaigns.
I want to be able to break down my reports by campaign and then by creative name.
In the LinkedIn documentation (https://learn.microsoft.com/en-gb/linkedin/marketing/integrations/ads-reporting/ads-reporting#statistics-finder) they give examples of the statistics finder and say that it can pull up to 3 pivots.
This is the example they give:
GET https://api.linkedin.com/v2/adAnalyticsV2?q=statistics&pivots[0]=CAMPAIGN&dateRange.start.day=1&dateRange.start.month=1&dateRange.start.year=2017&timeGranularity=DAILY&campaigns[0]=urn:li:sponsoredCampaign:1234567
I can't seem to get it to work for more than 1 pivot.
Another issue that I am facing is that I am not sure how to pull creative names - I can only seem to get creative ids in my api calls.
I am using the examples from the following page to get campaign names:
https://learn.microsoft.com/en-gb/linkedin/shared/references/v2/ads/adcampaigns?context=linkedin/marketing/context
Looking at the creative name equivalent:
https://learn.microsoft.com/en-gb/linkedin/shared/references/v2/ads/adcreatives?context=linkedin/marketing/context
I cannot seem to find a name field for creatives here. Am I looking in the wrong place?
The magic sequence to get multiple pivots is:
...q=statistics&pivots[0]=ACCOUNT&pivots[1]=CAMPAIGN&pivots[2]=CREATIVE&...
As for creative names, they do not 'simply' exist. There are different fields (variables/data) for each type of creative, and what you would see in the UI depends on the type of campaign/creative displayed. For a simple Text Ad, it would be variables.data.title and variables.data.text. For the rest, you need to use projection to get specific fields from the referenced URNs.
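For illustration, a minimal Python sketch of that multi-pivot request (the token and campaign URN are placeholders; note that requests will percent-encode the brackets and colons, and if the API rejects that you'll need to build the query string by hand, as in the example above):
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder

params = {
    "q": "statistics",
    "pivots[0]": "ACCOUNT",
    "pivots[1]": "CAMPAIGN",
    "pivots[2]": "CREATIVE",
    "dateRange.start.day": 1,
    "dateRange.start.month": 1,
    "dateRange.start.year": 2017,
    "timeGranularity": "DAILY",
    "campaigns[0]": "urn:li:sponsoredCampaign:1234567",
}

response = requests.get(
    "https://api.linkedin.com/v2/adAnalyticsV2",
    params=params,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
print(response.json())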

Web Scraping with an Incrementing ID and Login

I’m trying to create a web scraper in Python for a website called Canvas LMS to find a course link. The course link is formatted like this: [schoolname].instructure.com/courses/[id] I need to figure out how to have a bot log in with my API key and check all IDs from 1 to 10,000 for courses that do not contain the phrase “Unauthorized” or “Page Not Found” in the title, so I can check them manually. However, I cannot figure out how to do any of this, as there is no guide (to my knowledge) that explains it. Tips would be appreciated.
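For reference, a bare-bones sketch of the loop being described (BASE_URL and API_TOKEN are placeholders; it assumes the standard Canvas REST API, where courses you cannot see typically come back as 401 or 404):
import requests

BASE_URL = "https://schoolname.instructure.com"  # placeholder
API_TOKEN = "your-api-token"  # placeholder
HEADERS = {"Authorization": f"Bearer {API_TOKEN}"}

found = []
for course_id in range(1, 10001):
    resp = requests.get(f"{BASE_URL}/api/v1/courses/{course_id}", headers=HEADERS)
    # Anything that returns 200 (rather than 401/404) is worth checking manually.
    if resp.status_code == 200:
        found.append(f"{BASE_URL}/courses/{course_id}")

print("\n".join(found))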
I was working with a similar problem a few weeks back and found a solution.
Firstly, there is extensive Canvas LMS documentation available here.
Secondly, find the relevant tags which can be identified and then copied to a download set. I forked a repository here and ran it as a Docker container on my local machine.
I would also recommend viewing the data in a Jupyter Notebook. As a starter (and I do not advocate nested 'for loops', but you get the idea), try these:
from canvasapi import Canvas

# "https://schoolname.instructure.com" and "api-token" are placeholders
# for your Canvas instance URL and API key.
canvas = Canvas("https://schoolname.instructure.com", "api-token")

# Get courses and print
courses = canvas.get_courses()
for course in courses:
    print(f"{course}")

    # Get modules and print
    modules = course.get_modules()
    for module in modules:
        print(f"{module}")

        # Get module items and print (items hang off the module,
        # not the course, in canvasapi)
        module_items = module.get_module_items()
        for item in module_items:
            print(f"{item}")
Good luck

Getting the creation date of a page on Wikidata Query Service

I'm looking for a way to get all the movies on English Wikipedia, with their creation date.
A movie for me is a page with IMDB ID attached to it.
So, this is my query so far:
SELECT DISTINCT ?item_label ?imdb_id (YEAR(?dateCreation) AS ?AnneeCreation) WHERE {
  ?item wdt:P345 ?imdb_id.
  FILTER STRSTARTS(?imdb_id, "tt")
  OPTIONAL {
    ?item wdt:P571 ?dateCreation.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> ;
           schema:name ?item_label.
}
The problem with this is that most of the pages don't have a P571 property, so I was wondering whether there is a better way to get the creation date.
Maybe by the revision history or something? I couldn't find such an option.
Any help will be appreciated!
So, as the comments have noted, Wikidata properties (with some rare exceptions like featured-article flags) describe the underlying concept, not the Wikipedia page metadata. There is some limited ability to talk to the Wikipedia API, as @AKSW points out, but my understanding is that this doesn't work very well for large numbers of articles (note that the example code has a LIMIT 50 in it).
However, all is not lost! I worked out a methodology to do this at scale for very large numbers of articles recently in Gender and Deletion on Wikipedia, using a bit of lateral thinking.
First step: figure out your Wikidata query. tt-prefixed IMDb IDs may apply to things other than films (e.g. TV episodes, sports broadcasts), so another approach might be to do a P31/P279 type/class search to find all things that are "films, or subclasses of films". You will also want to add a filter that explicitly says "and only has an article in English Wikipedia", which I see you've already done. Note that this gives you the name of the WP article, not the "label" of the Wikidata item, which is distinct, so you can drop the (time-consuming) label service clause. You'll end up with something like https://w.wiki/FH4 (this still uses the tt- prefix approach and gets 180k results) or https://w.wiki/FH8 (P31/P279 filter plus tt- prefix, 136k results).
Run this query, save the results TSV somewhere, and move on to step 2. The tool we will use here is PetScan, which is designed to link up data from Wikipedia categories, Wikipedia metadata, Wikidata queries, etc.
Feed the SPARQL query into tab 4 ("Other sources") and say "Use wiki: enwiki" at the bottom of this tab. This will force it to output data on the Wikipedia articles linked from this query.
Now hit "do it", wait a little while (it took ~100s when I tested it), and examine the results. You will see that we get the title (the WP article), page ID, namespace (hopefully always "(Article)"), size in bytes, and last-touched date. None of these are creation date...
...except one of them kind of is. PageIDs are assigned sequentially, so they are essentially time-of-creation timestamps. There are some nuances here about edge cases - eg if I created a redirect called "Example (film)" in 2010, and in 2015 manually edited the redirect to become a real article called "Example (film)", it would show up as created in 2010. There may also be odd results for pages deleted and recreated, or ones that have had complicated page-move histories (straightforward page moves should maintain IDs, though). But, in general, for 95% of items, the pageID will reflect the time at which it was first created onwiki. For example, 431900000 was created at 11.14am on 1 July 2014; 531900000 was created at 6.29pm on 14 February 2017; and so on.
Back to PetScan - let's pull down all these items. In PetScan, go to the last tab and select TSV. Re-run the search and save the resulting file.
Now, we have one TSV with Wikidata IDs, IMDB IDs, and WP page titles (plus anything else you want to recover from WD queries); we have another with WP page titles and page IDs. You can link them together using WP page titles, letting you go from "results in Wikidata" to "page ID". Clean these up and link them however you prefer - I did it in bash, you might want to use something more sensible like python.
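For example, a minimal pandas sketch of that join (the filenames and column names here are assumptions; check what your two exports actually contain):
import pandas as pd

# Hypothetical filenames/columns -- inspect your actual exports first.
wikidata = pd.read_csv("wikidata_films.tsv", sep="\t")  # item_label, imdb_id, ...
petscan = pd.read_csv("petscan_pages.tsv", sep="\t")    # title, pageid, ...

# PetScan titles use underscores where the query service uses spaces.
petscan["title"] = petscan["title"].str.replace("_", " ")

merged = wikidata.merge(petscan, left_on="item_label", right_on="title")
merged.to_csv("films_with_pageids.tsv", sep="\t", index=False)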
Now you can convert PageID to creation date. For the work I did I was only interested in six-month bins so I just worked out an arbitrary pageID created on 1 January and 1 July each year, and counted IDs between them. You could do the same thing, or use the API to look up individual pageIDs and get creation timestamps back - depends exactly what you're wanting to get.
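If you go the per-page route, a sketch of that lookup using the standard MediaWiki revisions API (asking for the oldest revision of a given page ID) might be:
import requests

def creation_date(page_id):
    """Return the timestamp of a page's first revision."""
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "rvlimit": 1,
            "rvdir": "newer",  # oldest revision first, i.e. the creation
            "rvprop": "timestamp",
            "pageids": page_id,
            "format": "json",
        },
    )
    page = resp.json()["query"]["pages"][str(page_id)]
    return page["revisions"][0]["timestamp"]  # ISO 8601 string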
This is all a bit more complicated than just using the query service, and it will probably give spurious results for one or two articles with complicated histories, but it will basically let you do what you originally asked for.

How to get list of categories in Google Books API?

I was searching for an already answered question about this but couldn't find one so please forgive me if I somehow missed it.
I'm using Google Books API and I know I can search a book by specific category.
My question is, how can I get all the available categories from the API?
I looked in the API documentation but couldn't find any mention of this.
The Google Books API does not have an endpoint for returning categories that are not associated with a book itself.
The Google Books API is only there to list books. You can:
search and browse through the list of books that match a given query;
view information about a book, including metadata, availability and price, and links to the preview page;
manage your own bookshelves.
You can see the categories of a given book, but you cannot get a list of all the available categories in the whole system.
You may be interested to know this has been on their to-do list since 2012: category list
We have numerous requests for this and we're investigating how we can properly provide the data. One issue is Google does not own all the category information. "New York Times Bestsellers" is one obvious example. We need to first identify what we can publish through the API.
Workaround
I worked around it by implementing my own category-list mechanism, so I can pull all the categories that exist in my app's database.
(Unfortunately, the newly announced ScriptDb deprecation means my whole system will go to waste in a couple of months anyway... but that's another story.)
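A minimal in-memory sketch of that kind of mechanism (the original used a database; collect_categories is just an illustrative name):
import requests

def collect_categories(query):
    """Harvest the 'categories' values from the volumes matching a query."""
    resp = requests.get(
        "https://www.googleapis.com/books/v1/volumes",
        params={"q": query},
    )
    categories = set()
    for volume in resp.json().get("items", []):
        categories.update(volume.get("volumeInfo", {}).get("categories", []))
    return categories

print(sorted(collect_categories("daft punk")))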
https://support.google.com/books/partner/answer/3237055?hl=en
Scroll down to subject/genres and you will see this link.
https://bisg.org/page/bisacedition
This list is apparently a list of subjects, AKA categories, for North American books. I am making various GET requests with an API testing tool and getting, for the most part, perfect matches (you may have to drop a word from the query string, e.g. "criticism" instead of "literary criticism") between whatever subject I choose from the BISG subjects list and what comes back in the JSON response under the "categories" key.
Ex: GET https://www.googleapis.com/books/v1/volumes?q=business+subject:juvenile+fiction
Long story short, the BISG link is where I'm pretty sure Google got all the options for their "categories" key from.
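For what it's worth, the same GET in Python (using requests; the query terms are the example above):
import requests

resp = requests.get(
    "https://www.googleapis.com/books/v1/volumes",
    params={"q": "business subject:juvenile fiction"},
)
for volume in resp.json().get("items", []):
    info = volume.get("volumeInfo", {})
    print(info.get("title"), "->", info.get("categories"))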

Gracenote/CDDB official genre list

I am working with MP3s and metadata. I have used Python to edit and add metadata to each file, but I cannot seem to get genres to work. I have used pygn, a Gracenote module that seems to be doing its job well. The problem is with the Gracenote data itself. When I request data for, say, Daft Punk:
pygn.search(clientID=clientid, userID=userid, artist="Daft Punk", album="Random Access Memories")
I am returned JSON as expected. The problem is that the genre key gives me a text answer and a number such as 45720. I was wondering if maybe there is some Dewey-decimal-like system in place here. Do you know what system this is? Do you have an official Gracenote genre list?
Gracenote does not provide a genre list as such, but you can get the top-level list of 25 genres through the Rhythm API. Check out the fieldvalues API call on the Gracenote Developer Program website.
