I am new to Instaloader and I am running into a problem when trying to pull in bio information. We have scraped Google for a list of Instagram handles for our accounts; unfortunately the data isn't perfect, and some of the handles we pulled in are no longer active (the user has changed their handle or deleted the account). This raises a ProfileNotExistsException and stops the script from pulling in the information for all subsequent accounts.
Is there a way to ignore this and continue pulling in the rest of the bios, leaving this one blank?
Here is the code that is throwing the error; handles is the list of handles we have.
bios = []
for element in handles:
    if element == '':
        bios.append('NULL')
    else:
        bios.append(instaloader.Profile.from_username(L.context, element).biography)
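What I am essentially hoping for is something along these lines, assuming the exception can simply be caught per handle (I haven't confirmed this is the right approach):
import instaloader

bios = []
for element in handles:
    if element == '':
        bios.append('NULL')
        continue
    try:
        bios.append(instaloader.Profile.from_username(L.context, element).biography)
    except instaloader.exceptions.ProfileNotExistsException:
        # Handle no longer exists; leave this bio blank and move on
        bios.append('')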
I have tried using the workaround found in this forum (I can't find the post anymore), but it is not working for me. No errors, it just doesn't solve the issue. The code they suggested was:
def _obtain_metadata(self):
    try:
        if self._rhx_gis is None:
            metadata = self._context.get_json('{}/'.format(self.username), params={})
            self._node = metadata['entry_data']['ProfilePage'][0]['graphql']['user']
            self._rhx_gis = metadata['rhx_gis']
        metadata = self._context.get_json('{}/'.format(self.username), params={})
        self._node = metadata['entry_data']['ProfilePage'][0]['graphql']['user']
    except (QueryReturnedNotFoundException, KeyError) as err:
        raise ProfileNotExistsException('Profile {} does not exist.'.format(self.username)) from err
Thanks in advance!
I keep getting these errors in my Flask app:
malloc(): unsorted double linked list corrupted
double free or corruption (!prev)
Basically, I'm calling the pos() method and trying to get the year attribute from my JS page in order to use it in my query, but it doesn't work and instead returns a malloc error.
Can someone explain the problem and how to solve it? I've been researching for hours and I still don't understand how my code is generating these errors. If someone could just point me in the right direction and give me a hint, I would really appreciate it. I looked into similar malloc errors in many different posts, but the solutions they proposed (adding command: ['--wait_timeout=28800', '--max_allowed_packet=67108864'] to my docker-compose file) did NOT work.
yr = 2022
pos_list = []

def pos_request():
    global yr
    global pos_list
    cursor1 = conn.cursor(cursorclass=MySQLdb.cursors.DictCursor)  # conn is already set up
    query = "private database structure info that I can't share" % (yr, yr, yr)
    try:
        cursor1.execute(query)
        pos_list = []
        for r in cursor1.fetchall():
            pos_list.append(r)
        cursor1.close()
        return pos_list
    except (MySQLdb.Error, MySQLdb.Warning) as e:
        return e

@app.route('/pos', methods=['POST'])
def pos():
    global yr
    try:
        yr = int(request.json['year'])
        return json.dumps(pos_request(), default=str)
    except:
        return yr

@app.route('/pos', methods=['GET'])
def pos_get():
    return json.dumps(pos_list, default=str)

@app.route('/pos_post', methods=['POST'])
def pos_post():
So I figured out what I was doing wrong. MySQLdb simply isn't built to support multiple cursors sharing one MySQL connection across threads. I got this from their site: "Two threads simply cannot share a connection while a transaction is in progress, in addition to not being able to share it during query execution." So the only way is to use another driver, like the pymysql driver snakecharmerb suggested, or, if you can, to open a new MySQL connection for each method/route.
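For example, a minimal sketch of that second option, opening a fresh connection inside each request handler (the host, credentials and query below are placeholders, not my real setup):
import json
import MySQLdb
import MySQLdb.cursors
from flask import Flask, request

app = Flask(__name__)

@app.route('/pos', methods=['POST'])
def pos():
    yr = int(request.json['year'])
    # Dedicated connection for this request instead of a shared global one
    conn = MySQLdb.connect(host='localhost', user='user', passwd='secret', db='mydb',
                           cursorclass=MySQLdb.cursors.DictCursor)
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")  # placeholder for the real query
        rows = cursor.fetchall()
        cursor.close()
        return json.dumps(rows, default=str)
    finally:
        conn.close()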
Sources:
https://mysqlclient.readthedocs.io/user_guide.html (scroll down to threadsafety)
I'm using the Python ibm-cloud-sdk in an attempt to iterate over all resources in a particular IBM Cloud account. My trouble is that pagination doesn't appear to work for me: when I pass in the next_url, I still get the same list coming back from the call.
Here is my test code. I successfully print many of my COS instances, but I only seem to be able to print the first page. Maybe I've been looking at this too long and just missed something obvious. Does anyone have any clue why I can't retrieve the next page?
try:
    # authenticate and set the service url
    auth = IAMAuthenticator(RESOURCE_CONTROLLER_APIKEY)
    service = ResourceControllerV2(authenticator=auth)
    service.set_service_url(RESOURCE_CONTROLLER_URL)

    # Retrieve the resource instance listing
    r = service.list_resource_instances().get_result()

    # get the row count and resources list
    rows_count = r['rows_count']
    resources = r['resources']

    while rows_count > 0:
        print('Number of rows_count {}'.format(rows_count))
        next_url = r['next_url']
        for i, resource in enumerate(resources):
            type = resource['id'].split(':')[4]
            if type == 'cloud-object-storage':
                instance_name = resource['name']
                instance_id = resource['guid']
                crn = resource['crn']
                print('Found instance id : name - {} : {}'.format(instance_id, instance_name))

        # this is SUPPOSED to get the next page
        r = service.list_resource_instances(start=next_url).get_result()
        rows_count = r['rows_count']
        resources = r['resources']
except Exception as e:
    Error = 'Error : {}'.format(e)
    print(Error)
    exit(1)
From looking at the API documentation for listing resource instances, the value of next_url includes the URL path and the start parameter with its token.
To retrieve the next page, you only need to pass in the start parameter with that token as its value. IMHO this is not ideal.
I typically do not use the SDK but a simple Python request; then I can use the endpoint (base) URI plus next_url as the full URI.
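Untested, but a rough sketch of that approach using the requests library (the base URL and the IAM bearer token here are assumptions you would substitute yourself):
import requests

base_url = 'https://resource-controller.cloud.ibm.com'  # assumed resource controller endpoint
headers = {'Authorization': 'Bearer ' + iam_token}      # iam_token obtained elsewhere

url = base_url + '/v2/resource_instances'
while url:
    r = requests.get(url, headers=headers).json()
    for resource in r['resources']:
        print(resource['name'])
    next_url = r.get('next_url')
    # next_url is a path plus query string, so prepend the base URL again
    url = base_url + next_url if next_url else None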
If you stick with the SDK, use urllib.parse to extract the query parameter. Not tested, but something like:
from urllib.parse import urlparse, parse_qs

o = urlparse(next_url)
q = parse_qs(o.query)
r = service.list_resource_instances(start=q['start'][0]).get_result()
Could you use the Search API for listing the resources in your account rather than the resource controller? The search index is set up for exactly that operation, whereas paginating results from the resource controller seems much more brute force.
https://cloud.ibm.com/apidocs/search#search
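I haven't verified this, but with the ibm-platform-services Python SDK a search over the account's resources would look roughly like the following (treat the query string and parameter names as assumptions and check the linked docs):
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_platform_services import GlobalSearchV2

authenticator = IAMAuthenticator(RESOURCE_CONTROLLER_APIKEY)
search = GlobalSearchV2(authenticator=authenticator)

# '*' is assumed here to match everything in the account; narrow the query as needed
result = search.search(query='*', fields=['name', 'crn'], limit=100).get_result()
for item in result['items']:
    print(item['crn'])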
I am making a little Django app to serve translations for my React frontend. The way it works is as follows:
The frontend tries to find a translation using a key.
If the translation for that key is not found, it sends a request to the backend with the missing key.
On the backend, the missing key is appended to a JSON file.
Everything works just fine when the requests are sent one at a time (when one finishes, the next is sent). But when multiple requests are sent at the same time, everything breaks: the JSON file gets corrupted. It's as if all the requests are changing the file at the same time. I am not sure that's the case, because I thought the file could not be edited by two processes at the same time (correct me if I am wrong), but I don't get such an error, which suggests the requests are handled one at a time according to this and this.
Also, I tried something which, to my surprise, worked: adding time.sleep(1) at the top of my API view. When I did that, everything worked as expected.
What is going on?
Here is the code, just in case it matters:
@api_view(['POST'])
def save_missing_translation_keys(request, lng, ns):
    time.sleep(1)
    missing_trans_path = MISSING_TRANS_DIR / f'{lng}.json'
    # Read lng file and get current missing keys for given ns
    try:
        with open(missing_trans_path, 'r', encoding='utf-8') as missing_trans_file:
            if is_file_empty(missing_trans_path):
                missing_keys_dict = {}
            else:
                missing_keys_dict = json.load(missing_trans_file)
    except FileNotFoundError:
        missing_keys_dict = {}
    except Exception as e:
        # Even if file is not empty, we might not be able to parse it for some reason, so we log any errors in log file
        with open(MISSING_LOG_FILE, 'a', encoding='utf-8') as logFile:
            logFile.write(
                f'could not save missing keys {str(list(request.data.keys()))}\nnamespace {lng}/{ns} file can not be parsed because\n{str(e)}\n\n\n')
        raise e

    # Add new missing keys to the list above.
    ns_missing_keys = missing_keys_dict.get(ns, [])
    for missing_key in request.data.keys():
        if missing_key and isinstance(missing_key, str):
            ns_missing_keys.append(missing_key)
        else:
            raise ValueError('Missing key not allowed')
    missing_keys_dict.update({ns: list(set(ns_missing_keys))})

    # Write new missing keys to the file
    with open(missing_trans_path, 'w', encoding='utf-8') as missing_trans_file:
        json.dump(missing_keys_dict, missing_trans_file, ensure_ascii=False)
    return Response()
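For what it's worth, the workaround I'm considering instead of time.sleep is to serialize the whole read-modify-write with a lock; a minimal sketch, assuming a single-process dev server where a module-level threading.Lock is enough:
import threading

# Hypothetical module-level lock; it only guards against races within one process
missing_trans_lock = threading.Lock()

@api_view(['POST'])
def save_missing_translation_keys(request, lng, ns):
    missing_trans_path = MISSING_TRANS_DIR / f'{lng}.json'
    with missing_trans_lock:
        # read, update and rewrite the file exactly as before,
        # but only one request can be inside this block at a time
        ...
    return Response()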
I was just getting started with Instaloader, but when I tried to download a specific post, my code wouldn't continue.
from instaloader import Instaloader, Profile, Post
# Get instance
L = Instaloader()
L.login(username, password)
print("login complete")
post = Post.from_shortcode(L.context, "CEPH-B0M8B9")
L.download_post(post, target='test')
print("test")
It never prints "test".
I was also having some difficulty changing the filename under which the post gets saved. The documentation says:
target (Union[str, Path]) – Target name, i.e. profile name, #hashtag,
:feed; for filename.
but that wasn't helping me at all :/
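From reading the docs, I suspect the relevant knob might actually be filename_pattern on the Instaloader instance rather than target, but I'm not sure; something like:
L = Instaloader(filename_pattern='{shortcode}')  # guessing at the pattern token
post = Post.from_shortcode(L.context, "CEPH-B0M8B9")
L.download_post(post, target='test')  # files would then be named after the shortcode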
I appreciate every answer :D
After looking at the source code I found the problem.
The download_post function does a lot of extra work, and you can disable it with these lines:
L = Instaloader()
L.post_metadata_txt_pattern = ""
L.download_geotags = False
L.save_metadata = False
L.save_metadata_json = False
L.download_comments = False
The reason the code wouldn't continue was that the function was taking forever to download all the comments.
Hope this will help and save some time for someone in the future :)
This should be an easy question, but I just can't find the answer.
I am using a dynamodb paginator:
paginator = dynamoDbClient.get_paginator('query')
response_iterator = paginator.paginate(...)
I am looping through the results:
for response in response_iterator:
I expected the loop not to be executed when no results were found. Unfortunately, it is executed, and I can't figure out what to check to determine that no results were found.
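What I am essentially hoping to do is something like the following, assuming each query page exposes its Count and Items:
for response in response_iterator:
    # Each page of a DynamoDB query response carries 'Count' and 'Items'
    if response['Count'] == 0:
        continue  # nothing on this page
    for item in response['Items']:
        print(item)  # handle the item however is appropriate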
Please help.
Thanks.
Usually I write my own pagination logic, something like the below:
while True:
    responseListPolicies = iam_client.list_policies(Marker=marker) if marker else iam_client.list_policies()
    truncatedListPolicies = responseListPolicies.get('IsTruncated')
    allPolicies = responseListPolicies.get('Policies')
    allPolicyArns += [policy.get('Arn') for policy in allPolicies]
    allPolicyNames += [policy.get('PolicyName') for policy in allPolicies]
    if not truncatedListPolicies:
        break
    marker = responseListPolicies.get('Marker')
    print("Found truncated at marker " + marker)
However, I recently came across the pagination made available via boto3 and decided to play with it. That's where I stumbled upon a similar issue trying to fetch group policies.
Here's a snippet of what I tried:
paginator = iam_client.get_paginator('list_group_policies')
res = paginator.paginate(GroupName=name, PaginationConfig={'MaxItems': 100000})
listOfAllPolicyNames += res.build_full_result().get('PolicyNames')
And this threw an error like:
An error occurred (NoSuchEntity) when calling the ListGroupPolicies operation: The group with name groupName cannot be found.
I started exploring it. What I did was try to see what methods are made available on the res object. A dir(res) gave something like this (partial results shown below):
...
'build_full_result',
'non_aggregate_part',
'result_key_iters',
'result_keys',
'resume_token',
'search']
I ended up looking at result_keys and it was something like this:
[{'type': 'field', 'children': [], 'value': 'PolicyNames'}]
I noticed that the children element was an empty list. So I tried the above pagination with another group name that I knew would certainly have some inline policies, and this time too the results were the same. So eventually I realized that there may not be a directly available key to check for.
So I finally had to fall back to exception handling, as below:
try:
    paginator = iam_client.get_paginator('list_group_policies')
    res = paginator.paginate(GroupName=name, PaginationConfig={'MaxItems': 100000})
    res.build_full_result()
except Exception as e:
    if e.response['Error']['Code'] == 'NoSuchEntity':
        print("Entity does not exist")
I don't know how much this will help, but this is the approach I took. Cheers!
Each page has a KeyCount key, which tells you how many S3 objects are contained in each page. If the first page from the paginator has a KeyCount of 0, then you know it's empty.
Like this:
paginator = client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='my-prefix'):
    if page['KeyCount'] == 0:
        # The paginator is empty (and this should be the first page)
        ...
If you actually need to use the KeyCount for other purposes (e.g., 0 might mean you've processed a bunch of pages and then the last one is empty), you could do this:
for i, page in enumerate(paginator.paginate(Bucket='my-bucket', Prefix='my-prefix')):
    if i == 0 and page['KeyCount'] == 0:
        # The paginator is empty
        ...
    else:
        # Paginator is not empty
        ...