Extract and download chapters from youtube videos [duplicate] - python

Recently, YouTube added the ability to break videos up into sections called "chapters", shown in the progress bar.
https://support.google.com/youtube/answer/9884579?hl=en
Currently, I am able to get info about a video from the YouTube API. However, it doesn't seem to include any info about a video's chapters, and I haven't found anything in the API documentation about chapters. Am I missing something, or is there simply no way to get chapter data yet?

As far as I know, chapter data only exists as plain text in the description of the video, so you can extract it from there. Consider the following example:
Video used in this demonstration: Top 10 Monsters with 2500 Attack in YuGiOh
URL Request:
https://www.googleapis.com/youtube/v3/videos?part=snippet&id=NNgYId7b4j0&key=[YOUR_API_KEY]
Response:
{
  "kind": "youtube#videoListResponse",
  "etag": "YpVLmrSx1iP8hAJOnumaTBoKqqQ",
  "items": [
    {
      "kind": "youtube#video",
      "etag": "oIoJq5F3RHvBbtVohafaJ_1SThU",
      "id": "NNgYId7b4j0",
      "snippet": {
        "publishedAt": "2020-09-14T18:37:46Z",
        "channelId": "UC0roOaAn95Rtgoe078RkVXQ",
        "title": "Top 10 Monsters with 2500 Attack in YuGiOh",
        "description": "In this video we'll go over the best monsters that have 2500 attack, and attack threshold for a lot of boss monsters actually.\n\nCheck out my DnD channel #TheD&DLogs \n\n--The List--\nIntro: (0:00)\n10- Blue-Eyes Spirit Dragon: (0:00)\n9- Invoked Mechaba: (2:14)\n8- Number S39: Utopia the Lightning: (3:23)\n7- Earthbound Immortal Aslla Piscu: (4:35)\n6- Eldlich the golden Lord: (6:04)\n5- True King Lithosagym the Disaster: (7:34)\n4- Block Dragon: (8:54)\n3- Astrograph sorcerer: (10:25)\n2- Beatrice lady of the eternal: (12:36)\n1- Firewall Dragon: (14:37)\n- \n-----------------------------------------\n#yugioh #top10 \n\nDuels are all done on EDOpro, its completely free and updated all the time. If you want it, just look for the EDOpro discord and you'll find all you need to download it from there\n\nSome of the Video backgrounds in this video were made by \"Amitai Angor AA VFX\" https://www.youtube.com/dvdangor2011\n\n\nhttps://twitter.com/hirumared\nhttps://twitter.com/TheDuelLogs",
        "thumbnails": {
          "default": {
            "url": "https://i.ytimg.com/vi/NNgYId7b4j0/default.jpg",
            "width": 120,
            "height": 90
          },
          "medium": {
            "url": "https://i.ytimg.com/vi/NNgYId7b4j0/mqdefault.jpg",
            "width": 320,
            "height": 180
          },
          "high": {
            "url": "https://i.ytimg.com/vi/NNgYId7b4j0/hqdefault.jpg",
            "width": 480,
            "height": 360
          },
          "standard": {
            "url": "https://i.ytimg.com/vi/NNgYId7b4j0/sddefault.jpg",
            "width": 640,
            "height": 480
          },
          "maxres": {
            "url": "https://i.ytimg.com/vi/NNgYId7b4j0/maxresdefault.jpg",
            "width": 1280,
            "height": 720
          }
        },
        "channelTitle": "TheDuelLogs",
        "tags": [
          "yugioh",
          "ygo",
          "dev",
          "pro",
          "link",
          "duels",
          "auto-matic duels",
          "online",
          "current",
          "ban",
          "list",
          "dueling",
          "network",
          "theduellogs",
          "the",
          "duel",
          "logs",
          "loggs",
          "Yu",
          "Gi",
          "Oh!",
          "YGOpro",
          "gimmick",
          "links",
          "top ten",
          "2020",
          "edopro"
        ],
        "categoryId": "20",
        "liveBroadcastContent": "none",
        "localized": {
          "title": "Top 10 Monsters with 2500 Attack in YuGiOh",
          "description": "In this video we'll go over the best monsters that have 2500 attack, and attack threshold for a lot of boss monsters actually.\n\nCheck out my DnD channel #TheD&DLogs \n\n--The List--\nIntro: (0:00)\n10- Blue-Eyes Spirit Dragon: (0:00)\n9- Invoked Mechaba: (2:14)\n8- Number S39: Utopia the Lightning: (3:23)\n7- Earthbound Immortal Aslla Piscu: (4:35)\n6- Eldlich the golden Lord: (6:04)\n5- True King Lithosagym the Disaster: (7:34)\n4- Block Dragon: (8:54)\n3- Astrograph sorcerer: (10:25)\n2- Beatrice lady of the eternal: (12:36)\n1- Firewall Dragon: (14:37)\n- \n-----------------------------------------\n#yugioh #top10 \n\nDuels are all done on EDOpro, its completely free and updated all the time. If you want it, just look for the EDOpro discord and you'll find all you need to download it from there\n\nSome of the Video backgrounds in this video were made by \"Amitai Angor AA VFX\" https://www.youtube.com/dvdangor2011\n\n\nhttps://twitter.com/hirumared\nhttps://twitter.com/TheDuelLogs"
        },
        "defaultAudioLanguage": "en"
      }
    }
  ],
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 1
  }
}
Extract the description from the response (a runnable sketch follows the results below):
response.items[0].snippet.description
Results:
"In this video we'll go over the best monsters that have 2500 attack, and attack threshold for a lot of boss monsters actually.
Check out my DnD channel #TheD&DLogs
--The List--
Intro: (0:00)
10- Blue-Eyes Spirit Dragon: (0:00)
9- Invoked Mechaba: (2:14)
8- Number S39: Utopia the Lightning: (3:23)
7- Earthbound Immortal Aslla Piscu: (4:35)
6- Eldlich the golden Lord: (6:04)
5- True King Lithosagym the Disaster: (7:34)
4- Block Dragon: (8:54)
3- Astrograph sorcerer: (10:25)
2- Beatrice lady of the eternal: (12:36)
1- Firewall Dragon: (14:37)
-
-----------------------------------------
#yugioh #top10
Duels are all done on EDOpro, its completely free and updated all the time. If you want it, just look for the EDOpro discord and you'll find all you need to download it from there
Some of the Video backgrounds in this video were made by "Amitai Angor AA VFX" https://www.youtube.com/dvdangor2011
https://twitter.com/hirumared
https://twitter.com/TheDuelLogs"

Once again, the YouTube Data API v3 doesn't provide a basic feature.
I would suggest you use my open-source YouTube operational API: by requesting https://yt.lemnoslife.com/videos?part=chapters&id=VIDEO_ID you get a JSON document with the video chapters (titles and timestamps) you are looking for in item['chapters']['chapters']. A minimal Python request sketch follows the example result below.
Example of result with YouTube video id NNgYId7b4j0:
{
  "kind": "youtube#videoListResponse",
  "etag": "NotImplemented",
  "items": [
    {
      "kind": "youtube#video",
      "etag": "NotImplemented",
      "id": "NNgYId7b4j0",
      "chapters": {
        "areAutoGenerated": false,
        "chapters": [
          {
            "title": "10- Blue-Eyes Spirit Dragon",
            "time": 0,
            "thumbnails": [
              {
                "url": "https:\/\/i.ytimg.com\/vi\/NNgYId7b4j0\/hqdefault_4000.jpg?sqp=-oaymwEiCKgBEF5IWvKriqkDFQgBFQAAAAAYASUAAMhCPQCAokN4AQ==&rs=AOn4CLCoTrvu0Yu-iNxb7o4II-pxi5WVbQ",
                "width": 168,
                "height": 94
              },
              {
                "url": "https:\/\/i.ytimg.com\/vi\/NNgYId7b4j0\/hqdefault_4000.jpg?sqp=-oaymwEjCNACELwBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLCuupNwIgFIf9hXbjMsvpSGThFyhg",
                "width": 336,
                "height": 188
              }
            ]
          },
          {
            "title": "9- Invoked Mechaba",
            "time": 134,
            "thumbnails": [
              {
                "url": "https:\/\/i.ytimg.com\/vi\/NNgYId7b4j0\/hqdefault_135933.jpg?sqp=-oaymwEiCKgBEF5IWvKriqkDFQgBFQAAAAAYASUAAMhCPQCAokN4AQ==&rs=AOn4CLBe94BKNpQXvM2dUl75LtcgX0N03w",
                "width": 168,
                "height": 94
              },
              {
                "url": "https:\/\/i.ytimg.com\/vi\/NNgYId7b4j0\/hqdefault_135933.jpg?sqp=-oaymwEjCNACELwBSFryq4qpAxUIARUAAAAAGAElAADIQj0AgKJDeAE=&rs=AOn4CLBULUhlI1OOjJiW6mpFDUhPzh4Adw",
                "width": 336,
                "height": 188
              }
            ]
          },
          ...
        ]
      }
    }
  ]
}
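Calling the endpoint from Python could look like the following sketch (assuming the requests package; the endpoint and the item['chapters']['chapters'] shape are as shown above):

import requests

VIDEO_ID = "NNgYId7b4j0"

# Query the third-party YouTube operational API for the chapters part
resp = requests.get(
    "https://yt.lemnoslife.com/videos",
    params={"part": "chapters", "id": VIDEO_ID},
)
resp.raise_for_status()

for chapter in resp.json()["items"][0]["chapters"]["chapters"]:
    print(chapter["time"], chapter["title"])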

I am replying with this answer to help people such as myself who ended up on this question wanting a YouTube chapter parser/extractor for text, rather than wanting to know where to find the chapter data. To add some further information: currently there is no way to get the chapters from the official YouTube API, so the only way to get the chapters from a text-description response (like the one the YouTube API provides) is to parse it in some way.
My answer is in JavaScript, but it can easily be converted. The idea is to extract the MIN:SEC and HR:MIN:SEC timestamps, then generate the title by removing the word that contains them (this also typically removes whatever people aesthetically wrap the timestamps in, e.g. [00:00] or (00:00)). A hedged Python port follows the JavaScript below.
It's far from perfect, but in my experience it's better than the other solutions I had seen on GitHub/npm at the time of writing. You might also want to trim leading and trailing spaces and punctuation separators such as -, :, ~, and |.
const parseChapters = (description) => {
  // Extract timestamps (either 00:00:00, 0:00:00, 00:00 or 0:00)
  const lines = description.split("\n")
  const regex = /(\d{0,2}:?\d{1,2}:\d{2})/g
  const chapters = []

  for (const line of lines) {
    // Check whether the line contains a timestamp
    const matches = line.match(regex)
    if (matches) {
      const ts = matches[0]
      // The title is the line minus the word that contains the timestamp
      const title = line
        .split(" ")
        .filter((l) => !l.includes(ts))
        .join(" ")

      chapters.push({
        timestamp: ts,
        title: title,
      })
    }
  }

  return chapters
}
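Since the question is tagged python, here is a hedged Python port of the same idea (the regex and the word-filtering mirror the JavaScript above; treat it as a sketch rather than a definitive implementation):

import re

# Matches 00:00:00, 0:00:00, 00:00 or 0:00, like the JavaScript version
TIMESTAMP_RE = re.compile(r"(\d{0,2}:?\d{1,2}:\d{2})")

def parse_chapters(description):
    chapters = []
    for line in description.split("\n"):
        match = TIMESTAMP_RE.search(line)
        if match:
            ts = match.group(1)
            # The title is the line minus the word that contains the timestamp
            title = " ".join(w for w in line.split(" ") if ts not in w)
            chapters.append({"timestamp": ts, "title": title})
    return chapters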

A very late answer, but it solved my problem.
You could use the code below. It's written in C#, but it can easily be transcribed into another language. Since you can already get YouTube video data, I assume you also have the description of the video.
// Requires: using System.Collections.Generic; and using System.Text.RegularExpressions;
private static IEnumerable<string> GetChaptersFromDescription(string text)
{
    var lines = text.Split("\n");
    // Any line containing a timestamp such as 0:00 is treated as a chapter line
    var regex = new Regex(@"[0-9]:[0-9][0-9]");
    foreach (var line in lines)
    {
        if (regex.IsMatch(line))
        {
            yield return line;
        }
    }
}

Related

Using Atlassian's Insight API to bring over AWS resources

I'm very new to this API. I've been able to figure out everything in this link up until step 3 ([api docs][1]). I have a sample payload of what we want to import, but I have no idea what the schema/mapping should be. The example provided [here][2] for a hard drive does not make sense to me at all. I've even tried sending the exact payload/mapping from that example and got hit with a 409 error. Any help would be great.
Example of what I want to bring in:
{
  "ARN": "arn:aws:codedeploy:ca-central-1:030375219570:deploymentconfig:CodeDeployDefault.LambdaCanary10Percent10Minutes",
  "availabilityZone": "Not Applicable",
  "awsAccountId": "030375219570",
  "awsRegion": "ca-central-1",
  "configuration": {
    "computePlatform": "Lambda",
    "deploymentConfigId": "00000000-0000-0000-0000-000000000008",
    "deploymentConfigName": "CodeDeployDefault.LambdaCanary10Percent10Minutes",
    "trafficRoutingConfig": {
      "timeBasedCanary": {
        "canaryInterval": 10,
        "canaryPercentage": 10
      },
      "type": "TimeBasedCanary"
    }
  },
  "configurationItemCaptureTime": "2022-02-09T20:42:23.445Z",
  "configurationItemStatus": "ResourceDiscovered",
  "configurationItemVersion": "1.3",
  "configurationStateId": 1644439343445,
  "configurationStateMd5Hash": "",
  "relatedEvents": [],
  "relationships": [],
  "resourceId": "00000000-0000-0000-0000-000000000008",
  "resourceName": "CodeDeployDefault.LambdaCanary10Percent10Minutes",
  "resourceType": "AWS::CodeDeploy::DeploymentConfig",
  "supplementaryConfiguration": {},
  "tags": {}
}
If anyone knows how the mapping/schema would look for something like the above, I'm all ears.
Thanks.
[1]: https://developer.atlassian.com/cloud/assets/imports/workflow/
[2]: https://developer.atlassian.com/cloud/assets/imports/schema-and-mapping/?utm_source=%2Fcloud%2Finsight%2Fimports%2Fschema-and-mapping%2F&utm_medium=301#external-imports-schema-and-mapping

Write a list of dictionaries (with varying keys) to one .csv file?

Given this dictionary:
{
  "last_id": "9095247150673486907",
  "stories": [
    {
      "description": "The $68.7 billion deal would be Microsoft\u2019s biggest takeover ever and the biggest deal in video game history. The acquisition would make Microsoft the world\u2019s third-largest gaming company by revenue,\u2026 The post Following the takeover of Activision by Microsoft, Sony is already being shaken up appeared first on The Latest News.",
      "favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
      "id": "5310290716350155140",
      "site": "gettotext.com",
      "tags": [
        "msft"
      ],
      "time": 1642641278000,
      "title": "Following the takeover of Activision by Microsoft, Sony is already being shaken up",
      "url": "https://gettotext.com/following-the-takeover-of-activision-by-microsoft-sony-is-already-being-shaken-up/"
    },
    {
      "description": "Also Read | Acquisition of Activision Blizzard by Microsoft: an opportunity born out of chaos An announcement of such a nature could only inspire a good number of analysts, whose\u2026 The post Microsoft\u2019s takeover of Activision Blizzard ignites analysts appeared first on The Latest News.",
      "favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
      "id": "-14419799692027457",
      "site": "gettotext.com",
      "tags": [
        "msft"
      ],
      "time": 1642641042000,
      "title": "Microsoft\u2019s takeover of Activision Blizzard ignites analysts",
      "url": "https://gettotext.com/microsofts-takeover-of-activision-blizzard-ignites-analysts/"
    },
    {
      "description": "Practical in-ears, mini speakers with long battery life or powerful boom boxes \u2013 the manufacturer Anker offers a suitable product for almost every situation. On Ebay and Amazon you can\u2026 The post Anker on Ebay and Amazon on offer: Inexpensive Soundcore 3, Motion Boom & Co appeared first on The Latest News.",
      "favicon_url": "https://static.tickertick.com/website_icons/gettotext.com.ico",
      "id": "5221754710166764872",
      "site": "gettotext.com",
      "tags": [
        "amzn"
      ],
      "time": 1642640469000,
      "title": "Anker on Ebay and Amazon on offer: Inexpensive Soundcore 3, Motion Boom & Co",
      "url": "https://gettotext.com/anker-on-ebay-and-amazon-on-offer-inexpensive-soundcore-3-motion-boom-co/"
    },
    {
      "favicon_url": "https://static.tickertick.com/website_icons/trib.al.ico",
      "id": "-3472956334378244458",
      "site": "trib.al",
      "tags": [
        "goog"
      ],
      "time": 1642640285000,
      "title": "Google is forming a group dedicated to blockchain and related technologies under a newly appointed executive",
      "url": "https://trib.al/nZz3omw"
    },
    {
      "description": "Texas' attorney general on Wednesday sued Google, alleging the company asked local radio DJs to record personal endorsements for smartphones that they hadn't used or been provided.",
      "favicon_url": "https://static.tickertick.com/website_icons/yahoo.com.ico",
      "id": "9095247150673486907",
      "site": "yahoo.com",
      "tags": [
        "goog"
      ],
      "time": 1642639680000,
      "title": "Texas sues Google over local radio ads for its smartphones",
      "url": "https://finance.yahoo.com/m/b44151c6-7276-30d9-bc62-bfe18c6297be/texas-sues-google-over-local.html?.tsrc=rss"
    }
  ]
}
...how can I write the 'stories' list of dictionaries to one .csv file, such that the keys form the header row and the values form the remaining rows? Note that some keys don't appear in ALL of the records (for example, some story dictionaries have a 'description' key and some don't).
Pseudocode might include:
Get all keys in the 'stories' list and assign those as the df's header
Iterate through each story in the 'stories' list and append the appropriate rows, leaving a nan if there isn't a matching key for every column
Looking for a pythonic way of doing this relatively quickly.
UPDATE
Trying this:
# Save to csv file
with open("newsheadlines.csv", "wt") as fp:
    writer = csv.writer(fp, delimiter=",")
    # writer.writerow(["your", "header", "foo"])  # write header
    writer.writerows(response['stories'])
...gives this output
Does that help?
Simplest "pythonic" way to do so is by the pandas package.
import pandas as pd
pd.DataFrame(d["stories"]).to_csv('tmp.csv')
# To retrieve it
stories = pd.read_csv('tmp.csv', index_col=0)
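If you'd rather avoid the pandas dependency, here is a sketch with the standard library's csv.DictWriter, which also handles the varying keys (restval fills in columns missing from a record; d is assumed to be the dictionary shown above):

import csv

stories = d["stories"]

# Collect every key that appears in any story, preserving first-seen order
fieldnames = list(dict.fromkeys(key for story in stories for key in story))

with open("tmp.csv", "w", newline="", encoding="utf-8") as fp:
    writer = csv.DictWriter(fp, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(stories)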

loop through a list of dictionaries, perform function, and append result to csv

I am trying to loop over a list containing Twitter data in JSON format. The list is made up of several dictionaries, each containing data on a politician. The code works if the input json_response only holds data on one politician; however, when json_response is a list of dictionaries I get an error.
In short, I believe the issue can be isolated to three for-loops in the code: for tweet in json_response['data']:, for dics in json_response['includes']['users']:, and for element in json_response['includes']['media']:.
# Inputs for the request
bearer_token = auth()
headers = create_headers(bearer_token)
keyword = search_query
start_time = "2016-03-01T00:00:00.000Z"
end_time = "2021-03-31T00:00:00.000Z"
max_results = 3000

json_response = []  # empty list that will hold tweet objects

# Loop through the list of politicians in keyword (i.e. the search query) and extract tweets
for i in keyword:
    url = create_url(i, start_time, end_time, max_results)
    json_response.append(connect_to_endpoint(url[0], headers, url[1]))
I have only pasted the json_response object for 2 out of 30 politicians due to the cap on characters. However, the structure is the same for the remaining 28 politicians.
print(json.dumps(json_response, indent=4, sort_keys=True)) # look at json_response object.
[
  {
    "data": [
      {
        "author_id": "2877379617",
        "created_at": "2021-03-25T12:11:14.000Z",
        "id": "1375057688355336195",
        "text": "#prettynobodyco She blocked me in 2015 - for pointing out that Tim Kaine enables sexual assault in the military and the evidence was his killing of the MJIA and publicly stated that Military commanders should remain in charge of military rape cases. She's Tanden level awful. Congrats!"
      },
      {
        "author_id": "1265018154444562440",
        "created_at": "2021-03-22T19:48:59.000Z",
        "id": "1374085719472361474",
        "text": "#MehcatCat #AlasscanIsBack #PattyArquette #timkaine Funny, they blocked me. \ud83e\udd23\ud83e\udd23"
      },
      {
        "author_id": "2378324935",
        "created_at": "2021-03-07T21:32:13.000Z",
        "id": "1368675879312887810",
        "text": "#DrWinarick #KatieOGrady4 I apologize for any drama. Katie O Grady blocked me because we had a disagreement about Tim Kaine on one of your older posts. I guess I can't please everyone haha. :/"
      },
      {
        "author_id": "821870502943817729",
        "created_at": "2021-02-12T23:53:59.000Z",
        "id": "1360376637385244673",
        "text": "She blocked me a long ass time ago when I asked her why we shoulf care about Tim Kaine's personal view on abortion if it didn't impact legislation"
      },
      {
        "attachments": {
          "media_keys": [
            "16_1341045032732770306"
          ]
        },
        "author_id": "17232340",
        "created_at": "2020-12-21T15:37:07.000Z",
        "id": "1341045038420275205",
        "text": "#DSingh4Biden #moomintroll8 #timkaine #GovernorVA That's why I replied to you. She blocked me previously, for what silliness I can't remember. Tough being a troll AND a snowflake!"
      }
    ],
    "includes": {
      "media": [
        {
          "media_key": "16_1341045032732770306",
          "type": "animated_gif"
        }
      ],
      "users": [
        {
          "created_at": "2014-11-15T02:23:57.000Z",
          "description": "",
          "id": "2877379617",
          "name": "Laura Saylor",
          "username": "lauraleesaylor"
        },
        {
          "created_at": "2020-05-25T20:33:36.000Z",
          "description": "Weird Writer & Lunatic Linguist\nWicked Witch of the East\nshe/her",
          "id": "1265018154444562440",
          "name": "Zauberkind",
          "username": "Zauberkind2"
        },
        {
          "created_at": "2014-03-08T07:22:31.000Z",
          "description": "#Resist, #BLM, #Vaxxed, liberal, autistic, kidney transplant survivor, political nerd, mental health advocate, fighter for equality, truth, justice, etc.",
          "id": "2378324935",
          "name": "Trevor \"Trev\" McKee Achilles",
          "username": "MrTAchilles"
        },
        {
          "created_at": "2017-01-19T00:02:52.000Z",
          "description": "statist / Progressive Gun Nut/ Single and hating it\n\n / \n\nstraight????? /\n\npronouns / brain worm survivor\n\n \n",
          "id": "821870502943817729",
          "name": "Squirrel Dad",
          "username": "nihilisticpillo"
        },
        {
          "created_at": "2008-11-07T15:09:46.000Z",
          "description": "Liberal-Veteran-Dog Lover | Taste for irony, but in moderation | Humor is reason gone mad. ~Groucho Marx | I follow & unfollow back #VeteransResist #Resist",
          "id": "17232340",
          "name": "anti-Fascist Jim",
          "username": "JimnBL"
        }
      ]
    },
    "meta": {
      "newest_id": "1375057688355336195",
      "next_token": "b26v89c19zqg8o3foseug43lzoqdft4ghg78o9sn9ds3h",
      "oldest_id": "1341045038420275205",
      "result_count": 5
    }
  },
  {
    "data": [
      {
        "author_id": "1248251899884814336",
        "created_at": "2021-03-27T13:36:45.000Z",
        "id": "1375803982409576450",
        "text": "#gavinjeffries0 #steven86026859 #MSNBC #SenBooker Uh Oh our friend Steve blocked me, I guess not being able to answer your simple question and being asked to was too much for him."
      },
      {
        "author_id": "293104735",
        "created_at": "2021-02-07T21:45:47.000Z",
        "id": "1358532435122683904",
        "text": "#slwilliams1101 #annabella313 #CrossConnection #TiffanyDCross #Scaramucci #JoyAnnReid #CapehartJ #MSNBC #SenBooker #AliVelshi I stopped watching #TiffanyDCross as well and only watch #CapehartJ now (even though he blocked me in 2016 because I had a \"strong\" response to something mean he said about Hillary Clinton)."
      },
      {
        "author_id": "380970864",
        "created_at": "2021-02-07T20:58:01.000Z",
        "id": "1358520416273326081",
        "text": "#annabella313 #CrossConnection #TiffanyDCross #Scaramucci #JoyAnnReid #CapehartJ #MSNBC After I criticized #TiffanyDCross she blocked me. #JoyAnnReid called herself petty during and interview with #SenBooker. Why be petty? Be mature and thoughtful so people can learn. Hosts need to learn too. I only watch #AliVelshi #CapehartJ now."
      },
      {
        "attachments": {
          "media_keys": [
            "3_1358448920632909825"
          ]
        },
        "author_id": "793175035322171397",
        "created_at": "2021-02-07T16:17:44.000Z",
        "id": "1358449876565164034",
        "text": "#FinstaManhattan #SenSchumer #SenBooker #RonWyden Lmao he blocked me over that. His bio said he likes to 'debate & that sometimes he's wrong but he can admit that'.\n\nGuess not.\n\nI wasn't rude or mean at all. This is too funny \ud83e\udd23"
      },
      {
        "author_id": "752266160352010241",
        "created_at": "2021-02-06T20:34:06.000Z",
        "id": "1358152008948195328",
        "text": "#fattypinner #tkbone32221 #SenSchumer #SenBooker #RonWyden He blocked me \ud83e\udd23\ud83d\ude2d\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83d\ude2d"
      }
    ],
    "includes": {
      "media": [
        {
          "media_key": "3_1358448920632909825",
          "type": "photo",
          "url": ""
        }
      ],
      "users": [
        {
          "created_at": "2020-04-09T14:11:04.000Z",
          "description": "",
          "id": "1248251899884814336",
          "name": "Firstcomm",
          "username": "Firstcomm1"
        },
        {
          "created_at": "2011-05-04T19:26:22.000Z",
          "description": "Cinephile, balletomane, book lover, tennis fan, K-Drama fanatic, Jang Na-ra fangirl, USC School of Cinematic Arts alumna, Hillary Clinton and Nancy Pelosi Dem.",
          "id": "293104735",
          "name": "Joyce Tyler",
          "username": "joyce_tyler"
        },
        {
          "created_at": "2011-09-27T14:50:37.000Z",
          "description": "Spelman College, BA, George Washington University MA, University of South Florida Ph.D. in Political Science, proud Ted Kennedy, Obama, Biden/Harris Democrat!",
          "id": "380970864",
          "name": "Stephanie L. Williams, Ph.D.",
          "username": "slwilliams1101"
        },
        {
          "created_at": "2016-10-31T19:37:19.000Z",
          "description": "Loves: life, fam, cats, cars, tattoos, reality TV; collector of t-shirts & Volkswagen\u2019s. Hates: Oxford commas. #CombatVet #Medic #BidenHarris2020 #Resist",
          "id": "793175035322171397",
          "name": "Que Sarah Sarah \ud83d\udda4",
          "username": "sarahalli13"
        },
        {
          "created_at": "2016-07-10T22:20:03.000Z",
          "description": "3x Hollywood Video Street Fighter 2 Champion",
          "id": "752266160352010241",
          "name": "Sugarcoder",
          "username": "TheSugarCoder"
        }
      ]
    },
    "meta": {
      "newest_id": "1375803982409576450",
      "next_token": "b26v89c19zqg8o3fosktkdplqiw2q9kzx2ibm4r4y27wd",
      "oldest_id": "1358152008948195328",
      "result_count": 5
    }
  }
...28 other politicians
# Create file
csvFile = open("tweet_sample.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)

# Create headers for the data I want to save. I only want to save these columns in my dataset
csvWriter.writerow(['author id', 'created_at', 'id', 'tweet', 'bio', 'image_url'])
csvFile.close()
def append_to_csv(json_response, fileName):
    global created_at, tweet_id, bio, text, author_id
    # A counter variable
    counter = 0

    # Open OR create the target CSV file
    csvFile = open(fileName, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)

    # Loop through each tweet
    # NOTE adding a 0 gives access to the data for the first politician while adding 1 gives access to data for the second politician and so on...
    for tweet in json_response[0]['data']:
        # 1. Author ID
        author_id = tweet['author_id']
        # 2. Time created
        created_at = dateutil.parser.parse(tweet['created_at'])
        # 3. Tweet ID
        tweet_id = tweet['id']
        # 4. Tweet text
        text = tweet['text']
        for dics in json_response[0]['includes']['users']:  # NOTE 0 added
            # 5. description. Contained in includes data object
            if 'description' in dics:
                bio = dics['description']
            else:
                bio = " "
        for element in json_response[0]['includes']['media']:  # NOTE 0 added
            # 6. image url. Contained in includes data object
            if 'url' in element:
                image_url = element['url']
            else:
                image_url = " "
        # Assemble all data in a list
        res = [author_id, created_at, tweet_id, text, bio, image_url]
        # Append the result to the CSV file
        csvWriter.writerow(res)
        counter += 1

    # When done, close the CSV file
    csvFile.close()

    # Print the number of tweets for this iteration
    print("# of Tweets added from this response: ", counter)
append_to_csv(json_response, "tweet_sample.csv") # Save tweet data in a csv file
Error message:
TypeError: list indices must be integers or slices, not str
By adding the [0] in the loop I avoid the TypeError above. However, the output from the function append_to_csv is not ideal, as it only includes the last tweet for the first politician. I guess my loop overwrites data.
The desired output would be a data frame with columns author_id, created_at, id, tweet, bio, image_url. Not all users have a bio on their profile or an image_url in their tweet, hence the if-else statements in the function above and the bio / no_bio and image_url / no_image_url values in the desired data frame below.
pol_df = pd.read_csv("path_to_tweet_sample.csv" )
pol_df.head()
author_id created_at id tweet bio image_url
0 737885223858384896 2021-03-26T21:56:02.000Z 1375567243082338314 tweet_text no_bio no_image_url
1 847612931487416323 2021-03-26T21:55:24.000Z 1375567083791073283 tweet_text no_bio no_image_url
2 18634205 2021-03-08T12:29:00.000Z 1368901564363051010 tweet_text bio image_url
3 27327319 2021-03-02T11:53:16.000Z 1366718245521211393 tweet_text bio no_image_url
4 917634626247647232 2021-02-28T18:16:45.000Z 1366089974907432961 tweet_text bio image_url
I think you are confusing lists with dicts. When you try to access a list like a dict (e.g. data["author_id"]), the TypeError you're getting is raised. You have to iterate over the list and then access each dict inside it, e.g. [x['author_id'] for x in data]. If you want to extract values from the dicts and write them to a csv file, you might want to do something like this:
import pandas as pd

# One dataframe per part of the response, then merge on shared keys.
# resp is the list of per-politician responses (json_response above).
author_data = []
for data in resp:
    for author in data['data']:
        author_id = author['author_id']
        created_at = author['created_at']
        another_id = author['id']
        tweet_text = author['text']
        author_data.append([author_id, created_at, another_id, tweet_text])
author_df = pd.DataFrame(author_data, columns=['author_id', 'created_at', 'id', 'text'])

media_data = []
for data in resp:
    for media in data['includes']['media']:
        url = media.get('url', 'no_url')  # not every media entry has a url
        media_data.append([url])
media_df = pd.DataFrame(media_data, columns=['url'])

bio_data = []
for data in resp:
    for user in data['includes']['users']:
        bio = user['description']
        author_id = user['id']
        bio_data.append([bio, author_id])
bio_df = pd.DataFrame(bio_data, columns=['bio', 'author_id'])

final_df = author_df.merge(bio_df, on="author_id")
print(final_df)
You have to save the different parts of the data in different dataframes and then merge them. The catch is that media does not contain the author_id or any other key shared between the ['includes']['media'] part and the ['data'] part, so you cannot merge it the same way.
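That said, judging from the JSON above, tweets that carry an attachment do reference their media through attachments.media_keys, which matches media_key in includes.media. A hedged sketch of joining on that key (resp is the same list of responses as above):

# Build a media_key -> url lookup, then attach a url to each tweet that has one
media_urls = {}
for data in resp:
    for media in data['includes'].get('media', []):
        media_urls[media['media_key']] = media.get('url', 'no_url')

rows = []
for data in resp:
    for tweet in data['data']:
        keys = tweet.get('attachments', {}).get('media_keys', [])
        url = media_urls.get(keys[0], 'no_image_url') if keys else 'no_image_url'
        rows.append([tweet['author_id'], tweet['id'], url])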

Is it possible to retrieve object_story_spec for an ad creative which was not created with that object_story_spec? [Python Facebook API]

I have a set of ad creatives that I retrieve through the Facebook Business Python SDK. I need these specifically to retrieve the outbound URL when someone clicks on the ad: AdCreative['object_story_spec']['video_data']['call_to_action']['value']['link'].
I use the following call:
adcreatives = set.get_ad_creatives(fields=[
    AdCreative.Field.id,
    AdCreative.Field.name,
    AdCreative.Field.object_story_spec,
    AdCreative.Field.effective_object_story_id,
])
Where set is an ad set.
For some cases, the result looks like this (with actual data removed), which is expected:
<AdCreative> {
  "body": "[<BODY>]",
  "effective_object_story_id": "[<EFFECTIVE_OBJECT_STORY_ID>]",
  "id": "[<ID>]",
  "name": "[<NAME>]",
  "object_story_spec": {
    "instagram_actor_id": "[<INSTAGRAM_ACTOR_ID>]",
    "page_id": "[<PAGE_ID>]",
    "video_data": {
      "call_to_action": {
        "type": "[<TYPE>]",
        "value": {
          "link": "[<LINK>]", <== This is what I need
          "link_format": "[<LINK_FORMAT>]"
        }
      },
      "image_hash": "[<IMAGE_HASH>]",
      "image_url": "[<IMAGE_URL>]",
      "message": "[<MESSAGE>]",
      "video_id": "[<VIDEO_ID>]"
    }
  }
}
While sometimes results look like this:
<AdCreative> {
  "effective_object_story_id": "[<EFFECTIVE_OBJECT_STORY_ID>]",
  "id": "[<ID>]",
  "name": "[<NAME>]",
  "object_story_spec": {
    "instagram_actor_id": "[<INSTAGRAM_ACTOR_ID>]",
    "page_id": "[<PAGE_ID>]"
  }
}
According to this earlier question: Can't get AdCreative ObjectStorySpec, this is because the object_story_spec is not populated when it is linked to the creative rather than created along with it.
However, the video_data (and, as such, the link) should be saved somewhere. Is there a way to retrieve this? Maybe through effective_object_story_id?
The documentation page for object_story_spec (https://developers.facebook.com/docs/marketing-api/reference/ad-creative-object-story-spec/v12.0) does not have the information I am looking for.

Flask If Statement - Range for list index

customer_data.json (loaded as customer_data)
{
  "customers": [
    {
      "username": "anonymous",
      "id": "1234",
      "password": "12341234",
      "email": "1234#gmail.com",
      "status": false,
      "books": [
        "Things Fall Apart",
        "Fairy Tales",
        "Divine Comedy"
      ]
    }
  ]
}
Example post in new_catalog.json. (loaded as posts)
{
  "books": [
    {
      "author": "Chinua Achebe",
      "country": "Nigeria",
      "language": "English",
      "link": "https://en.wikipedia.org/wiki/Things_Fall_Apart\n",
      "pages": 209,
      "title": "Things Fall Apart",
      "year": 1958,
      "hold": false
    }
  ]
}
Necessary code in Flask_practice.py
for customer in customer_data['customers']:
    if len(customer['books']) > 0:
        for book in customer['books']:
            holds.append(book)

for post in posts['books']:
    if post['title'] == holds[range(len(holds))]:
        matching_posts['books'].append(post)
holds[range(len(holds))] doesn't work.
I am trying to go through each of the books in holds using holds[0], holds[1], etc., and test whether the title is equal to a book title in new_catalog.json.
I'm still new to Flask, Stack Overflow and coding in general so there may be a really simple solution to this problem.
I am trying to go through each of the books in holds using holds[0], holds[1] etc and test to see if the title is equal to a book title
Translated almost literally to Python:
# For each post...
for post in posts['books']:
# ...go through each of the books in `holds`...
for hold in holds:
# ...and see if the title is equal to a book title
if post['title'] == hold:
matching_posts['books'].append(post)
Or, if you don't want to append(post) for each item in holds:
for post in posts['books']:
if post['title'] in holds:
matching_posts['books'].append(post)
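As a side note (an optimization the question doesn't strictly need), if holds grows large you can build a set once so each membership test is O(1) instead of O(n):

# Build the set once, then test membership cheaply
holds_set = set(holds)
for post in posts['books']:
    if post['title'] in holds_set:
        matching_posts['books'].append(post)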
