We have an application that is basically just a form submission for requesting a team drive to be created. It's hosted on Google App Engine.
This timeout error is coming from a single field in the form that simply does typeahead for an email address. All of the names on the domain are indexed in the datastore, about 300k entities - nothing is being pulled directly from the directory api. After 10 seconds of searching (via the Python Google Search API), it will time out. This is currently intermittent, but errors have been increasing in frequency.
Error:
    line 280, in get_result
        raise _ToSearchError(e)
Timeout: Failed to complete request in 9975ms
Essentially, speeding up the searches would resolve the issue. I looked at the code and I don't believe there is any room for improvement there. I am not sure whether increasing the instance class would help (it is currently an F2), or whether there is some way to improve the index's efficiency; I'm not entirely sure how one would do that, however. Any thoughts would be appreciated.
Search Code:
class LookupUsersorGrpService(object):
    '''
    lookupUsersOrGrps accepts various params and performs search
    '''
    def lookupUsersOrGrps(self, params):
        search_results_json = {}
        search_results = []
        directory_users_grps = GoogleDirectoryUsers()
        error_msg = 'Technical error'
        query = ''
        try:
            # Default a few values if not present
            if ('offset' not in params) or (params['offset'] is None):
                params['offset'] = 0
            else:
                params['offset'] = int(params['offset'])
            if ('limit' not in params) or (params['limit'] is None):
                params['limit'] = 20
            else:
                params['limit'] = int(params['limit'])
            # Search related to the name field
            query = self.appendQueryParam(q=query, p=params, qname='search_name', criteria=':', pname='query', isExactMatch=True, splitString=True)
            # Search related to the email field
            query = self.appendQueryParam(q=query, p=params, qname='search_email', criteria=':', pname='query', isExactMatch=True, splitString=True)
            # Perform the search
            log.info('Search initialized: "{}"'.format(query))
            # Sort results by name, ascending
            expr_list = [search.SortExpression(expression='name', default_value='', direction=search.SortExpression.ASCENDING)]
            # Construct the sort options
            sort_opts = search.SortOptions(expressions=expr_list)
            # Prepare the search index
            index = search.Index(name="GoogleDirectoryUsers", namespace="1")
            search_query = search.Query(
                query_string=query.strip(),
                options=search.QueryOptions(
                    limit=params['limit'],
                    offset=params['offset'],
                    sort_options=sort_opts,
                    returned_fields=directory_users_grps.get_search_doc_return_fields()
                ))
            # Execute the search query
            search_result = index.search(search_query)
            # Start collecting the values
            total_cnt = search_result.number_found
            params['limit'] = len(search_result.results)
            # Prepare the response object
            for teamdriveDoc in search_result.results:
                teamdriveRecord = GoogleDirectoryUsers.query(GoogleDirectoryUsers.email == teamdriveDoc.doc_id).get()
                if teamdriveRecord:
                    if teamdriveRecord.suspended == False:
                        search_results.append(teamdriveRecord.to_dict())
            search_results_json.update({"users": search_results})
            search_results_json.update({"limit": params['limit'] if len(search_results) > 0 else '0'})
            search_results_json.update({"total_count": total_cnt if len(search_results) > 0 else '0'})
            search_results_json.update({"status": "success"})
        except Exception as e:
            log.exception("Error in performing search")
            search_results_json.update({"status": "failed"})
            search_results_json.update({"description": error_msg})
        return search_results_json
    ''' Retrieves the given param from the dict and adds it to the query if it exists
    '''
    def appendQueryParam(self, q='', p={}, qname=None, criteria='=', pname=None,
                         isExactMatch=False, splitString=False, defaultValue=None):
        if (pname in p) or (defaultValue is not None):
            if len(q) > 0:
                q += ' OR '
            q += qname
            if criteria:
                q += criteria
            if defaultValue is None:
                val = p[pname]
            else:
                val = defaultValue
            if splitString:
                val = val.replace("", "~")[1:-1]
            # Helps to retain the passed argument as-is, e.g. an email address
            if isExactMatch:
                q += "\"" + val + "\""
            else:
                q += val
        return q
An Index instance's search method accepts a deadline parameter, so you could use that to increase the time that you are willing to wait for the search to respond:
search_result = index.search(search_query, deadline=30)
The documentation doesn't specify acceptable values for deadline, but other App Engine services tend to accept values up to 60 seconds.
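If you'd also like the typeahead to degrade gracefully rather than surface the error when even the longer deadline is exceeded, you could wrap the call; a minimal sketch against the question's code (search.Error is the Search API's base exception class):

from google.appengine.api import search

try:
    # Allow up to 30 seconds instead of the default before giving up.
    search_result = index.search(search_query, deadline=30)
except search.Error:
    # Timed out or otherwise failed; hand the typeahead an empty result set.
    log.exception('Search failed')
    search_result = None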
I was wondering if I could get some input from some seasoned Python experts; I have a couple of questions.
I am extracting data from an API request and calculating the total vulnerabilities.
What is the best way to return this data so that I can call it in another function?
And how can I add up all the vulnerabilities? (Right now it's just adding them 500 at a time; I'd like the sum of every vulnerability.)
def _request():
    third_party_patching_filer = {
        "asset": "asset.agentKey IS NOT NULL",
        "vulnerability": "vulnerability.categories NOT IN ['microsoft patch']"}
    headers = _headers()
    print(headers)
    url1 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets"
    resp = requests.post(url=url1, headers=headers, json=third_party_patching_filer, verify=False).json()
    jsonData = resp
    #print(jsonData)
    has_next_cursor = False
    nextKey = ""
    if "cursor" in jsonData["metadata"]:
        has_next_cursor = True
        nextKey = jsonData["metadata"]["cursor"]
    while has_next_cursor:
        url2 = f"https://us.api.insight.rapid7.com/vm/v4/integration/assets?&size=500&cursor={nextKey}"
        resp2 = requests.post(url=url2, headers=headers, json=third_party_patching_filer, verify=False).json()
        cursor = resp2["metadata"]
        print(cursor)
        if "cursor" in cursor:
            nextKey = cursor["cursor"]
            print(f"next key {nextKey}")
            #print(desktop_support)
            for data in resp2["data"]:
                for tags in data['tags']:
                    total_critical_vul_osswin = []
                    total_severe_vul_osswin = []
                    total_modoer_vuln_osswin = []
                    if tags["name"] == 'OSSWIN':
                        print("OSSWIN")
                        critical_vuln_osswin = data['critical_vulnerabilities']
                        severe_vuln_osswin = data['severe_vulnerabilities']
                        modoer_vuln_osswin = data['moderate_vulnerabilities']
                        total_critical_vul_osswin.append(critical_vuln_osswin)
                        total_severe_vul_osswin.append(severe_vuln_osswin)
                        total_modoer_vuln_osswin.append(modoer_vuln_osswin)
                        print(sum(total_critical_vul_osswin))
                        print(sum(total_severe_vul_osswin))
                        print(sum(total_modoer_vuln_osswin))
                    if tags["name"] == 'DESKTOP_SUPPORT':
                        print("Desktop")
                        total_critical_vul_desktop = []
                        total_severe_vul_desktop = []
                        total_modorate_vuln_desktop = []
                        critical_vuln_desktop = data['critical_vulnerabilities']
                        severe_vuln_desktop = data['severe_vulnerabilities']
                        moderate_vuln_desktop = data['moderate_vulnerabilities']
                        total_critical_vul_desktop.append(critical_vuln_desktop)
                        total_severe_vul_desktop.append(severe_vuln_desktop)
                        total_modorate_vuln_desktop.append(moderate_vuln_desktop)
                        print(sum(total_critical_vul_desktop))
                        print(sum(total_severe_vul_desktop))
                        print(sum(total_modorate_vuln_desktop))
                    else:
                        pass
        else:
            has_next_cursor = False
If you have a lot of values to pass around, consider using a dict to combine them. Then you can just return the dict and pass it along to the next function that needs the data. Another approach would be to create a class and either access the variables directly or add helper functions that do so. The latter is a cleaner solution than a dict: with a dict you have to quote every variable name, while with a class you can easily add functionality beyond just being a container for a bunch of instance variables.
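For illustration, a minimal sketch of both approaches (the names here are hypothetical, not from your code):

# Option 1: return a plain dict of totals
def get_totals_as_dict():
    return {"critical": 0, "severe": 0, "moderate": 0}

# Option 2: a small class that carries the totals and can grow helper methods
class VulnTotals:
    def __init__(self):
        self.critical = 0
        self.severe = 0
        self.moderate = 0

    def add_asset(self, data):
        # Accumulate the counts from one asset record.
        self.critical += data['critical_vulnerabilities']
        self.severe += data['severe_vulnerabilities']
        self.moderate += data['moderate_vulnerabilities']

totals = VulnTotals()
# ... call totals.add_asset(data) for each asset, then read totals.critical, etc.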
If you want the total across all the data, you should put these initializations:

total_critical_vul_osswin = []
total_severe_vul_osswin = []
total_modoer_vuln_osswin = []

before the while has_next_cursor loop (and similarly for the desktop totals). The way your code is currently written, they are re-initialized for each cursor (i.e., for each batch of 500 records fetched from the URL).
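A sketch of that restructuring, using plain integer counters instead of lists so there is no append/sum step (variable names follow your code; the fetching is elided):

total_critical_vul_osswin = 0
total_severe_vul_osswin = 0
total_modoer_vuln_osswin = 0

while has_next_cursor:
    # ... fetch resp2 as before ...
    for data in resp2["data"]:
        for tags in data['tags']:
            if tags["name"] == 'OSSWIN':
                total_critical_vul_osswin += data['critical_vulnerabilities']
                total_severe_vul_osswin += data['severe_vulnerabilities']
                total_modoer_vuln_osswin += data['moderate_vulnerabilities']

# After the loop these hold the totals across every page of results.
print(total_critical_vul_osswin, total_severe_vul_osswin, total_modoer_vuln_osswin)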
Is it possible to use multiprocessing in Django on a request?
# So if I send a request to http://127.0.0.1:8000/wallet_verify
def wallet_verify(request):
    walelts = botactive.objects.all()
    # Here I check if the user wants to be included in the process; if they set it
    # to True then I'll include them, else ignore.
    for active in walelts:
        check_active = active.active
        if check_active == True:
            user_is_active = active.user
            # For the ones that want to be included I then go get their key data.
            # I need both the API key and secret, so I loop through to get the data
            # for active users.
            database = Bybitapidatas.objects.filter(user=user_is_active)
            for apikey in database:
                apikey = apikey.apikey
            for apisecret in database:
                apisecret = apisecret.apisecret
            # Since I am making a request to an exchange endpoint I can only include
            # one API key and secret at a time, i.e. one person at a time. This is
            # why I want to run in parallel.
            for a, b in zip(list(Bybitapidatas.objects.filter(user=user_is_active).values("apikey")), list(Bybitapidatas.objects.filter(user=user_is_active).values("apisecret"))):
                session = spot.HTTP(endpoint='https://api-testnet.bybit.com/', api_key=a['apikey'], api_secret=b['apisecret'])
                # Here I check whether they have balance to open trades, if they
                # have selected to be included.
                GET_USDT_BALANCE = session.get_wallet_balance()['result']['balances']
                for i in GET_USDT_BALANCE:
                    if 'USDT' in i.values():
                        GET_USDT_BALANCE = session.get_wallet_balance()['result']['balances']
                        idx_USDT = GET_USDT_BALANCE.index(i)
                        GET_USDTBALANCE = session.get_wallet_balance()['result']['balances'][idx_USDT]['free']
                        print(round(float(GET_USDTBALANCE), 2))
                # If they don't have enough balance I skip the user.
                if round(float(GET_USDTBALANCE), 2) < 11:
                    pass
                else:
                    session.place_active_order(
                        symbol="BTCUSDT",
                        side="Buy",
                        type="MARKET",
                        qty=10,
                        timeInForce="GTC"
                    )
How can I run this process in parallel while looping through the database to get the data for each individual user?
I am still new to coding, so I hope I explained it in a way that makes sense.
I have tried multiprocessing and pools, but I get an error that the app has not started yet and that I have to run it outside of wallet_verify. Is there a way to do it inside wallet_verify, when I send the POST request?
Any help appreciated.
Filtering the database to get the users who have set it to True.
listi = [1, 3] (these are the user IDs returned):
processess = botactive.objects.filter(active=True).values_list('user')
listi = [row[0] for row in processess]
Get the users from listi and perform the action:
def wallet_verify(listi):
    # print(listi)
    database = Bybitapidatas.objects.filter(user=listi)
    print("---------------------------------------------------- START")
    for apikey in database:
        apikey = apikey.apikey
        print(apikey)
    for apisecret in database:
        apisecret = apisecret.apisecret
        print(apisecret)
    start_time = time.time()
    session = spot.HTTP(endpoint='https://api-testnet.bybit.com/', api_key=apikey, api_secret=apisecret)
    GET_USDT_BALANCE = session.get_wallet_balance()['result']['balances']
    for i in GET_USDT_BALANCE:
        if 'USDT' in i.values():
            GET_USDT_BALANCE = session.get_wallet_balance()['result']['balances']
            idx_USDT = GET_USDT_BALANCE.index(i)
            GET_USDTBALANCE = session.get_wallet_balance()['result']['balances'][idx_USDT]['free']
            print(round(float(GET_USDTBALANCE), 2))
    if round(float(GET_USDTBALANCE), 2) < 11:
        pass
    else:
        session.place_active_order(
            symbol="BTCUSDT",
            side="Buy",
            type="MARKET",
            qty=10,
            timeInForce="GTC"
        )
    print("My program took", time.time() - start_time, "to run")
    print("---------------------------------------------------- END")
    return HttpResponse("Wallets verified")
verifyt is what I use for the multiprocessing, since I don't want wallet_verify to run without being requested. The initializer also runs django.setup in each worker process, so the app is started for each loop.
from concurrent.futures import ProcessPoolExecutor
import django

def verifyt(request):
    with ProcessPoolExecutor(max_workers=4, initializer=django.setup) as executor:
        results = executor.map(wallet_verify, listi)
    return HttpResponse("done")
SSM — Boto 3 Docs 1.9.64 documentation
get_parameters doesn't list all parameters?
For those who want to just copy-paste the code:
import boto3
ssm = boto3.client('ssm')
parameters = ssm.describe_parameters()['Parameters']
Beware of the limit of max 50 parameters!
This code will get all parameters, by recursively fetching until there are no more (50 max is returned per call):
import boto3

def get_resources_from(ssm_details):
    results = ssm_details['Parameters']
    resources = [result for result in results]
    next_token = ssm_details.get('NextToken', None)
    return resources, next_token

def main():
    config = boto3.client('ssm', region_name='us-east-1')
    next_token = ' '
    resources = []
    while next_token is not None:
        ssm_details = config.describe_parameters(MaxResults=50, NextToken=next_token)
        current_batch, next_token = get_resources_from(ssm_details)
        resources += current_batch
    print(resources)
    print('done')
You can use the get_paginator API; find an example below. In my use case I had to get all the values from the SSM Parameter Store and compare them against a string.
import boto3
import sys

LBURL = sys.argv[1].strip()
client = boto3.client('ssm')
p = client.get_paginator('describe_parameters')
paginator = p.paginate().build_full_result()
for page in paginator['Parameters']:
    response = client.get_parameter(Name=page['Name'])
    value = response['Parameter']['Value']
    if LBURL in value:
        print("Name is: " + page['Name'] + " and Value is: " + value)
One of the other responses here (by Val Lapidas) inspired me to expand it to this, as his solution doesn't get the SSM parameter value (plus some other, additional details).
The downside here is that the AWS function client.get_parameters() only allows 10 names per call.
There's one referenced function call in this code (to_pdatetime(...)) that I have omitted - it just takes the datetime value and makes sure it is a "naive" datetime. This is because I am ultimately dumping this data to an Excel file using pandas, which doesn't deal well with timezones.
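As an aside, if you page with a MaxResults larger than 10 you would need to batch the names yourself before calling get_parameters(); a small hypothetical helper:

def chunked(names, size=10):
    # get_parameters() accepts at most 10 names per call, so batch accordingly.
    for i in range(0, len(names), size):
        yield names[i:i + size]

# values = []
# for batch in chunked(param_names):
#     values += ssm_client.get_parameters(Names=batch, WithDecryption=True)['Parameters']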
from typing import List, Tuple
from boto3 import session
from mypy_boto3_ssm import SSMClient

def ssm_params(aws_session: session.Session = None) -> List[dict]:
    """
    Return a detailed list of all the SSM parameters.
    """
    # -------------------------------------------------------------
    #
    #
    # -------------------------------------------------------------
    def get_parameter_values(ssm_client: SSMClient, ssm_details: dict) -> Tuple[list, str]:
        """
        Retrieve additional attributes for the SSM parameters contained in the
        'ssm_details' dictionary passed in.
        """
        # Get the details
        ssm_param_details = ssm_details['Parameters']
        # Just the names, ma'am
        param_names = [result['Name'] for result in ssm_param_details]
        # Get the params, including the values
        ssm_params_with_values = ssm_client.get_parameters(Names=param_names,
                                                           WithDecryption=True)
        resources = []
        result: dict
        for result in ssm_params_with_values['Parameters']:
            # Get the matching parameter from the `ssm_details` dict, since this has
            # some of the fields that aren't in the `ssm_params_with_values`
            # returned from "get_parameters".
            param_details = next((zz for zz in ssm_param_details if zz.get('Name', None) == result['Name']), {})
            param_policy = param_details.get('Policies', None)
            if len(param_policy) == 0:
                param_policy = None
            resources.append({
                'Name': result['Name'],
                'LastModifiedDate': to_pdatetime(result['LastModifiedDate']),
                'LastModifiedUser': param_details.get('LastModifiedUser', None),
                'Version': result['Version'],
                'Tier': param_details.get('Tier', None),
                'Policies': param_policy,
                'ARN': result['ARN'],
                'DataType': result.get('DataType', None),
                'Type': result.get('Type', None),
                'Value': result.get('Value', None)
            })
        next_token = ssm_details.get('NextToken', None)
        return resources, next_token
    # -------------------------------------------------------------
    #
    #
    # -------------------------------------------------------------
    if aws_session is None:
        raise ValueError('No session.')
    # Create SSM client
    aws_ssm_client = aws_session.client('ssm')
    next_token = ' '
    ssm_resources = []
    while next_token is not None:
        # The "describe_parameters" call gets a whole lot of info on the defined
        # SSM params, except their actual values. Due to this limitation, let's
        # call the nested function to get the values, and a few other details.
        ssm_descriptions = aws_ssm_client.describe_parameters(MaxResults=10,
                                                              NextToken=next_token)
        # This will get additional details for the params, including values.
        current_batch, next_token = get_parameter_values(ssm_client=aws_ssm_client,
                                                         ssm_details=ssm_descriptions)
        ssm_resources += current_batch
    print(f'SSM Parameters: {len(ssm_resources)}')
    return ssm_resources
There's no ListParameters, only DescribeParameters, which lists all the parameters; or you can set filters.
Boto3 Docs Link:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ssm.html#SSM.Client.describe_parameters
AWS API Documentation Link:
https://docs.aws.amazon.com/systems-manager/latest/APIReference/API_DescribeParameters.html
You can use get_parameters() and get_parameters_by_path().
Use paginators.
paginator = client.get_paginator('describe_parameters')
More information here.
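For completeness, a minimal sketch of walking that paginator (assuming default credentials and region):

import boto3

client = boto3.client('ssm')
paginator = client.get_paginator('describe_parameters')

# Each page is one describe_parameters response; together they cover all parameters.
for page in paginator.paginate():
    for param in page['Parameters']:
        print(param['Name'])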
I have been trying to do some basic search queries, but I am unable to connect to an open LDAP server, regardless of what I try. I tried a couple of servers, and none of them worked. I used Apache Directory Studio to make sure that the keyword was there, but it did not work either way. I tried a variety of code from different sources.
This was the first one I used:
https://www.linuxjournal.com/article/6988
import ldap

keyword = "boyle"

def main():
    server = "ldap.forumsys.com"
    username = "cn=read-only-admin,dc=example,dc=com"
    password = "password"
    try:
        l = ldap.open(server)
        l.simple_bind_s(username, password)
        print "Bound to server . . . "
        l.protocol_version = ldap.VERSION3
        print "Searching . . ."
        mysearch(l, keyword)
    except ldap.LDAPError:
        print "Couldnt connect"

def mysearch(l, keyword):
    base = ""
    scope = ldap.SCOPE_SUBTREE
    filter = "cn=" + "*" + keyword + "*"
    retrieve_attributes = None
    count = 0
    result_set = []
    timeout = 0
    try:
        result_id = l.search(base, scope, filter, retrieve_attributes)
        while l != 1:
            result_id = l.search(base, scope, filter, retrieve_attributes)
            result_type, result_data = l.result(result_id, timeout)
            if result_data == []:
                break
            else:
                if result_type == ldap.RES_SEARCH_ENTRY:
                    result_set.append(result_data)
        if len(result_set) == 0:
            print "No Results"
        for i in range(len(result_set)):
            for entry in result_set[i]:
                try:
                    name = entry[1]['cn'][0]
                    mail = entry[1]['mail'][0]
                    #phone = entry[1]['telephonenumber'][0]
                    #desc = entry[1]['description'][0]
                    count = count + 1
                    print name + mail
                except:
                    pass
    except ldap.LDAPError, error_message:
        print error_message

main()
Every time I ran this program, I received an error
{'desc': u"No such object"}
I also tried this
import ldap

try:
    l = ldap.open("ldap.example.com")
except ldap.LDAPError, e:
    print e

base_dn = "cn=read-only-admin,dc=example,dc=com"
search_scope = ldap.SCOPE_SUBTREE
retrieve_attributes = None
search_filter = "uid=myuid"

try:
    l_search = l.search(base_dn, search_scope, search_filter, retrieve_attributes)
    result_status, result_data = l.result(l_search, 0)
    print result_data
except ldap.LDAPError, e:
    print e
The error on this one was
{'desc': u"Can't contact LDAP server"}
I spent about 5 hours trying to figure this out. I would really appreciate it if you guys could give me some advice. Thanks.
There are several bogus things in there.
I will only comment your first code sample because it can be used by anyone with that public LDAP server.
l = ldap.open(server)
Function ldap.open() has been deprecated for many years. You should use function ldap.initialize() with an LDAP URI as its argument instead, like this:
l = ldap.initialize("ldap://ldap.forumsys.com")
l_search = l.search(..)
This is the asynchronous method which just returns a message ID (int) of the underlying OpenLDAP C API (libldap). It's needed if you want to retrieve extended controls returned by the LDAP server along with search results. Is that what you want?
As a beginner you probably want to use the simpler method LDAPObject.search_s() which immediately returns a list of (DN, entry) 2-tuples.
See also: python-ldap -- Sending LDAP requests
while l != 1
This does not make sense at all, because l is your LDAPObject instance (the LDAP connection object). Note that LDAPObject.search() would raise an exception if it received an integer error code from OpenLDAP's libldap, so there is no need for C-style error checks at this level.
filter = "cn=" + "" + keyword + ""
If keyword can be arbitrary input, this is prone to LDAP injection attacks. Don't do that.
For inserting arbitrary input into an LDAP filter, use function ldap.filter.escape_filter_chars() to properly escape special characters. Also avoid the variable name filter, because it's the name of a built-in Python function, and properly enclose the filter in parentheses.
Better example:
ldap_filter = "(cn=*%s*)" % (ldap.filter.escape_filter_chars(keyword))
base = ""
The correct search base you have to use is:
base = "dc=example,dc=com"
Otherwise ldap.NO_SUCH_OBJECT is raised.
So here's a complete example:
import pprint
import ldap
from ldap.filter import escape_filter_chars

BINDDN = "cn=read-only-admin,dc=example,dc=com"
BINDPW = "password"
KEYWORD = "boyle"

ldap_conn = ldap.initialize("ldap://ldap.forumsys.com")
ldap_conn.simple_bind_s(BINDDN, BINDPW)
ldap_filter = "(cn=*%s*)" % (escape_filter_chars(KEYWORD))
ldap_results = ldap_conn.search_s(
    "dc=example,dc=com",
    ldap.SCOPE_SUBTREE,
    ldap_filter,
)
pprint.pprint(ldap_results)
I'm trying to pull all tracks in a certain playlist using the Spotipy library for python.
The user_playlist_tracks function is limited to 100 tracks, regardless of the parameter limit. The Spotipy documentation describes it as:
user_playlist_tracks(user, playlist_id=None, fields=None, limit=100, offset=0, market=None)

Get full details of the tracks of a playlist owned by a user.

Parameters:
- user - the id of the user
- playlist_id - the id of the playlist
- fields - which fields to return
- limit - the maximum number of tracks to return
- offset - the index of the first track to return
- market - an ISO 3166-1 alpha-2 country code.
After authenticating with Spotify, I'm currently using something like this:
username = xxxx
playlist = #fromspotipy
sp_playlist = sp.user_playlist_tracks(username, playlist_id=playlist)
tracks = sp_playlist['items']
print tracks
Is there a way to return more than 100 tracks? I've tried setting the limit=None in the function parameters, but it returns an error.
Many of the spotipy methods return paginated results, so you will have to page through them to view more than just the max limit. I've encountered this most often when collecting a playlist's full track listing, and consequently created a custom method to handle it:
def get_playlist_tracks(username, playlist_id):
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    return tracks
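Usage, with the username and playlist ID from the question:

tracks = get_playlist_tracks(username, playlist)
print(len(tracks))  # the full track count, no longer capped at 100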
I wrote a function that outputs a pandas DataFrame, pulling most of the metadata (not all of it, because I didn't want it all, but you can make room for more) for playlists over 100 songs. It works by iterating over every song, finding the metadata for each, saving it to a dictionary, and then concatenating the dictionary onto the DataFrame. It takes your username and the playlist ID as input.
# Function to extract metadata from a playlist that's longer than 100 songs
def get_playlist_tracks_more_than_100_songs(username, playlist_id):
    # Page through the playlist to collect every track item
    results = sp.user_playlist_tracks(username, playlist_id)
    tracks = results['items']
    while results['next']:
        results = sp.next(results)
        tracks.extend(results['items'])
    results = tracks

    playlist_tracks_id = []
    playlist_tracks_titles = []
    playlist_tracks_artists = []
    playlist_tracks_first_artists = []
    playlist_tracks_first_release_date = []
    playlist_tracks_popularity = []

    for i in range(len(results)):
        print(i)  # Counter
        if i == 0:
            # First track: build the initial DataFrame
            playlist_tracks_id = results[i]['track']['id']
            playlist_tracks_titles = results[i]['track']['name']
            playlist_tracks_first_release_date = results[i]['track']['album']['release_date']
            playlist_tracks_popularity = results[i]['track']['popularity']
            artist_list = []
            for artist in results[i]['track']['artists']:
                artist_list = artist['name']
            playlist_tracks_artists = artist_list
            features = sp.audio_features(playlist_tracks_id)
            features_df = pd.DataFrame(data=features, columns=features[0].keys())
            features_df['title'] = playlist_tracks_titles
            features_df['all_artists'] = playlist_tracks_artists
            features_df['popularity'] = playlist_tracks_popularity
            features_df['release_date'] = playlist_tracks_first_release_date
            features_df = features_df[['id', 'title', 'all_artists', 'popularity', 'release_date',
                                       'danceability', 'energy', 'key', 'loudness',
                                       'mode', 'acousticness', 'instrumentalness',
                                       'liveness', 'valence', 'tempo',
                                       'duration_ms', 'time_signature']]
            continue
        else:
            # Subsequent tracks: build a one-row dict and concatenate it on
            try:
                playlist_tracks_id = results[i]['track']['id']
                playlist_tracks_titles = results[i]['track']['name']
                playlist_tracks_first_release_date = results[i]['track']['album']['release_date']
                playlist_tracks_popularity = results[i]['track']['popularity']
                artist_list = []
                for artist in results[i]['track']['artists']:
                    artist_list = artist['name']
                playlist_tracks_artists = artist_list
                features = sp.audio_features(playlist_tracks_id)
                new_row = {'id': [playlist_tracks_id],
                           'title': [playlist_tracks_titles],
                           'all_artists': [playlist_tracks_artists],
                           'popularity': [playlist_tracks_popularity],
                           'release_date': [playlist_tracks_first_release_date],
                           'danceability': [features[0]['danceability']],
                           'energy': [features[0]['energy']],
                           'key': [features[0]['key']],
                           'loudness': [features[0]['loudness']],
                           'mode': [features[0]['mode']],
                           'acousticness': [features[0]['acousticness']],
                           'instrumentalness': [features[0]['instrumentalness']],
                           'liveness': [features[0]['liveness']],
                           'valence': [features[0]['valence']],
                           'tempo': [features[0]['tempo']],
                           'duration_ms': [features[0]['duration_ms']],
                           'time_signature': [features[0]['time_signature']]}
                dfs = [features_df, pd.DataFrame(new_row)]
                features_df = pd.concat(dfs, ignore_index=True)
            except:
                continue
    return features_df
Another way around it would be to write a for loop and do:
offset += 100
then you could concatenate the tracks at the end, or put them in a DataFrame.
Function Ref:
playlist_tracks(playlist_id, fields=None, limit=100, offset=0, market=None)
Reference: https://spotipy.readthedocs.io/en/2.7.0/#spotipy.client.Spotify.playlist_tracks
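A minimal sketch of that offset-based loop, assuming an authenticated sp client and using the playlist_tracks method referenced above:

def get_all_playlist_tracks(playlist_id):
    # Fetch the playlist 100 tracks at a time, advancing the offset each pass.
    tracks = []
    offset = 0
    while True:
        page = sp.playlist_tracks(playlist_id, limit=100, offset=offset)
        tracks.extend(page['items'])
        if page['next'] is None:
            break
        offset += 100
    return tracks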
Below is the user_playlist_tracks method used in spotipy (notice that it defaults to a limit of 100). Try setting the limit to 200.
def user_playlist_tracks(self, user, playlist_id=None, fields=None,
                         limit=100, offset=0):
    ''' Get full details of the tracks of a playlist owned by a user.
        Parameters:
            - user - the id of the user
            - playlist_id - the id of the playlist
            - fields - which fields to return
            - limit - the maximum number of tracks to return
            - offset - the index of the first track to return
    '''
    plid = self._get_id('playlist', playlist_id)
    return self._get("users/%s/playlists/%s/tracks" % (user, plid),
                     limit=limit, offset=offset, fields=fields)
When trying the above solutions I got key error messages. I eventually figured it out. Here is my solution. This is only for displaying tracks/artists on the next pages.
id = "5lrkIjzukk65X4ksulpA0H?si=9db60a70278a4fd6"
results = sp.playlist_items(id)
tracks = results['tracks']
next_pages = 14
track_list = []
for i in range(next_pages):
tracks = sp.next(tracks)
for y in range(0,100):
try:
track = tracks['items'][y]['track']['name']
artist = tracks['items'][y]['track']['artists'][0]['name']
track_list.append(artist)
except:
continue
print(track_list)
It's unfortunate that spotipy makes its API access so complicated. Try using spotifyr in R, and you can accomplish this in a few lines of code: no loops, lists, extra variables, or appending required. Then just pop the result back into Python if you'd like.
library(spotifyr)
df <- get_playlist_audio_features('playlist_owner_username', 'playlist_uri')
And boom, you're done. I'm not sure what the max is, but I know it's over 300 songs, because I have pulled that many.