Python 2.7 Mailchimp3 401 status "API Key is missing" - python

I am currently trying to make a call using mailchimp3:
client.reports.email_activity.all(campaign_id='#######', getall=True, fields='######')
When I request a large amount of email activity data I get a 401 error; smaller requests work fine.
I tried increasing the request timeout:
request.get(timeout=10000)

Pulling a large amount of data in a single call can cause this timeout error. Instead, it is best to use the offset parameter.
client.reports.email_activity.all(campaign_id='#####', offset=#, count=#, fields='######')
With this you can loop through and make many smaller calls instead of one large call that triggers the timeout, as in the sketch below.
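A minimal sketch of that paging loop, assuming a mailchimp3 client; the campaign id and field list stay as placeholders, and the response is assumed to carry its results in an 'emails' list, as the email-activity report normally does:

# Hypothetical paging loop: pull the email activity report in chunks of
# `page_size` until a page returns fewer items than requested.
page_size = 500   # assumed chunk size; tune as needed
offset = 0
all_activity = []

while True:
    page = client.reports.email_activity.all(
        campaign_id='#####',   # placeholder campaign id
        offset=offset,
        count=page_size,
        fields='######'        # placeholder field list
    )
    emails = page.get('emails', [])
    all_activity.extend(emails)
    if len(emails) < page_size:   # last page reached
        break
    offset += page_size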
Thanks to the MailChimp support staff who helped me troubleshoot this issue.

Related

How to query Twitter API public metrics for multiple `tweet_id`s using Tweepy

GOAL
Use Tweepy to call the Twitter API and return the public_metrics (likes, retweets, quotes, replies) for each of multiple tweet_ids.
SOLUTION
client = tweepy.Client(bearer_token, wait_on_rate_limit=True)

gathered_tweets = []
for response in tweepy.Paginator(client.get_tweets,
                                 query = '1537245238095323136, 1536973965394157569',
                                 tweet_fields = ['public_metrics'],
                                 max_results=100):
    time.sleep(1)
    gathered_tweets.append(response)
result = []
user_dict = {}

# Loop through each response object
for response in gathered_tweets:
    # Take all of the users, and put them into a dictionary of dictionaries with the info we want to keep
    for user in response.includes['users']:
        user_dict[user.id] = {'username': user.username,
                              'created_at': user.created_at
                              }
    for tweet in response.data:
        # For each tweet, find the author's information
        author_info = user_dict[tweet.author_id]
        # Put all of the information we want to keep in a single dictionary for each tweet
        result.append({'author_id': tweet.author_id,
                       'tweet_id': tweet.id,
                       'retweets': tweet.public_metrics['retweet_count'],
                       'replies': tweet.public_metrics['reply_count'],
                       'likes': tweet.public_metrics['like_count'],
                       'quotes': tweet.public_metrics['quote_count']
                       })
ERROR MESSAGE
TypeError Traceback (most recent call last)
<ipython-input-31-a0842c496bea> in <module>()
9 query = '1537245238095323136,1536973965394157569',
10 tweet_fields = ['public_metrics'],
---> 11 max_results=100):
12 time.sleep(1)
13 gathered_tweets.append(response)
/usr/local/lib/python3.7/dist-packages/tweepy/pagination.py in __next__(self)
96 self.kwargs["pagination_token"] = pagination_token
97
---> 98 response = self.method(*self.args, **self.kwargs)
99
100 self.previous_token = response.meta.get("previous_token")
TypeError: get_tweets() missing 1 required positional argument: 'ids'
ASSESSMENT
I get that I should be stating the IDs differently, but it's not clear how.
I'm reading the get_tweets documentation here:
https://docs.tweepy.org/en/stable/client.html#tweet-lookup
But it's not clear on the argument syntax.
The documentation states: " A comma separated list of Tweet IDs. Up to 100 are allowed in a single request. Make sure to not include a space between commas and fields."
I believe my syntax is in line with that, so I'm not getting what the error message means.
Any guidance on this is greatly appreciated.
Thx
Doug
As for your original error, the message TypeError: get_tweets() missing 1 required positional argument: 'ids' says exactly what happened: ids was never passed. As we worked out in the comments, this was a minor typo: the query argument (in the line query = '1537245238095323136, 1536973965394157569',) should have been named ids.
Now for the subsequent error: max_results not working with get_tweets but working with search_recent_tweets can be traced to the same kind of issue. The documentation for get_tweets (source code) clearly shows that it does not accept a max_results argument, while search_recent_tweets (source code) does. You may have assumed that tweepy.Paginator (source code) provides support for a max_results argument, but reading both the documentation and the code shows that is not the case. Reading further, you will find a flatten method (source code) that does provide a way to limit the number of results returned.
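For illustration, a hedged sketch of how flatten can cap the number of results when you use a genuinely paginated endpoint such as search_recent_tweets (the query string and limit are placeholders, and the bearer_token is reused from your snippet):

import tweepy

client = tweepy.Client(bearer_token, wait_on_rate_limit=True)

# flatten() yields individual Tweet objects across pages and stops after `limit` tweets.
for tweet in tweepy.Paginator(client.search_recent_tweets,
                              query='python',                  # placeholder query
                              tweet_fields=['public_metrics'],
                              max_results=100).flatten(limit=500):
    print(tweet.id, tweet.public_metrics['retweet_count'])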
Both issues can be resolved by carefully reading the documentation, and sometimes the code, of the libraries you are using; that will typically save you far more time than waiting for an answer. Don't make assumptions about what a library does: passing arbitrary arguments to a function without knowing what it accepts is usually counterproductive.
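And for your actual goal, a minimal corrected sketch, assuming you only need those two tweet IDs (so no pagination is required); the expansions and user_fields arguments are additions here so that response.includes['users'] is actually populated:

import tweepy

bearer_token = 'YOUR_BEARER_TOKEN'   # placeholder
client = tweepy.Client(bearer_token, wait_on_rate_limit=True)

# get_tweets is a plain lookup call: it takes up to 100 IDs and is not paginated,
# so tweepy.Paginator and max_results are not needed.
response = client.get_tweets(
    ids=['1537245238095323136', '1536973965394157569'],
    tweet_fields=['public_metrics'],
    expansions=['author_id'],                 # so response.includes has the authors
    user_fields=['username', 'created_at'],
)

user_dict = {user.id: {'username': user.username, 'created_at': user.created_at}
             for user in response.includes['users']}

result = [{'author_id': tweet.author_id,
           'tweet_id': tweet.id,
           'retweets': tweet.public_metrics['retweet_count'],
           'replies': tweet.public_metrics['reply_count'],
           'likes': tweet.public_metrics['like_count'],
           'quotes': tweet.public_metrics['quote_count']}
          for tweet in response.data]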

Why does ArangoDB (using Python-Arango) return ERR 1600 ERROR_CURSOR_NOT_FOUND?

The problem
I iterate over an entire vertex collection, e.g. journals, and use it to create edges, author, from a person to the given journal.
I use python-arango and the code is something like:
for journal in journals.all():
    create_author_edge(journal)
I have a relatively small dataset, and the journals-collection has only ca. 1300 documents. However: this is more than 1000, which is the batch size in the Web Interface - but I don't know if this is of relevance.
The problem is that it raises a CursorNextError, and returns HTTP 404 and ERR 1600 from the database, which is the ERROR_CURSOR_NOT_FOUND error:
Will be raised when a cursor is requested via its id but a cursor with that id cannot be found.
Insights to the cause
From ArangoDB Cursor Timeout, and from this issue, I suspect that it's because the cursor's TTL has expired in the database, and in the python stacktrace something like this is seen:
# Part of the stacktrace in the error:
(...)
    if not cursor.has_more():
        raise StopIteration
    cursor.fetch()  # <---- error raised here
(...)
If I iterate over the entire collection quickly, i.e. if I just do print(len(journals.all())), it outputs "1361" with no errors.
When I replace the journals.all() with AQL, and increase the TTL parameter, it works without errors:
for journal in db.aql.execute("FOR j IN journals RETURN j", ttl=3600):
    create_author_edge(journal)
However, without the ttl parameter, the AQL approach gives the same error as using journals.all().
More information
A last piece of information: the error is raised when I run this on my personal laptop. On my work computer, the same code was used to create the graph and populate it with the same data, and no errors were raised there. Because I'm on holiday I don't have access to my work computer to compare versions, but both systems were installed during the summer, so there's a good chance the versions are the same.
The question
I don't know if this is an issue with python-arango or with ArangoDB. Because the problem disappears when the TTL is increased, I believe it could indicate an issue with ArangoDB rather than the Python driver, but I cannot be sure.
(I've added a feature request to add ttl-param to the .all()-method here.)
Any insights into why this is happening?
I don't have the rep to create the tag "python-arango", so it would be great if someone would create it and tag my question.
Inside the server, .all() is translated to a simple query. As discussed on the referenced GitHub issue, simple queries don't support the TTL parameter and won't get it.
The preferred solution here is to use an AQL query on the client, so that you can specify the TTL parameter.
In general you should refrain from pulling all documents from the database at once, since this may introduce other scaling issues. You should use proper AQL with FILTER statements backed by indices (use explain() to revalidate) to fetch the documents you require.
If you need to iterate over all documents in the database, use paging. This is usually best implemented by combining a range FILTER with a LIMIT clause, as in the AQL below (a Python sketch of the same pattern follows):
FOR x IN docs
    FILTER x.offsetteableAttribute > @lastDocumentWithThisID
    LIMIT 200
    RETURN x
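A minimal python-arango sketch of that paging pattern, under the assumption that you page on _key with the db handle and reuse the create_author_edge helper from the question:

# Hypothetical paging loop: fetch documents in batches of 200, keyed on _key,
# so no single cursor has to stay alive for the whole iteration.
last_key = ''
while True:
    batch = list(db.aql.execute(
        "FOR j IN journals FILTER j._key > @last_key SORT j._key LIMIT 200 RETURN j",
        bind_vars={'last_key': last_key},
        ttl=300,   # generous cursor lifetime per batch
    ))
    if not batch:
        break
    for journal in batch:
        create_author_edge(journal)
    last_key = batch[-1]['_key']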
Here is how I did it (using pyArango): the **moreArgs parameter makes this easy to do.
Looking at the source, you can see the docstring tells you what to do:
def AQLQuery(self, query, batchSize = 100, rawResults = False, bindVars = None, options = None, count = False, fullCount = False,
             json_encoder = None, **moreArgs):
    """Set rawResults = True if you want the query to return dictionnaries instead of Document objects.
    You can use **moreArgs to pass more arguments supported by the api, such as ttl=60 (time to live)"""
from pyArango.connection import *

conn = Connection(username=usr, password=pwd, arangoURL=url)  # set this how ya need
db = conn['databaseName']  # set this to the name of your database
aql = "FOR journal IN journals RETURN journal"  # the query itself is plain AQL
doc = db.AQLQuery(aql, ttl=300)
That's all ya need to do!

Forex-python "Currency Rates Source Not Ready"

I want to use the forex-python module to convert amounts in various currencies to a specific currency ("DKK") according to a specific date (the last day of the previous month relative to a date in the dataframe).
This is the structure of my code:
df = pd.DataFrame(data={'Date': ['2017-4-15', '2017-6-12', '2017-2-25'], 'Amount': [5, 10, 15], 'Currency': ['USD', 'SEK', 'EUR']})
def convert_rates(amount, currency, PstngDate):
    PstngDate = datetime.strptime(PstngDate, '%Y-%m-%d')
    if currency != 'DKK':
        return c.convert(base_cur=currency, dest_cur='DKK', amount=amount,
                         date_obj=PstngDate - timedelta(PstngDate.day))
    else:
        return amount
and then the new column with the converted amount:
df['Amount, DKK'] = np.vectorize(convert_rates)(
    amount=df['Amount'],
    currency=df['Currency'],
    PstngDate=df['Date']
)
I get the RatesNotAvailableError "Currency Rates Source Not Ready"
Any idea what can cause this? It has previously worked with small amounts of data, but I have many rows in my real df...
I inserted a small print statement into convert.py (part of forex-python) to debug this.
print(response.status_code)
Currently I receive:
502
Read these threads about the HTTP 502 error:
In HTTP 502, what is meant by an invalid response?
https://www.lifewire.com/502-bad-gateway-error-explained-2622939
These errors are completely independent of your particular setup,
meaning that you could see one in any browser, on any operating
system, and on any device.
502 indicates that currently there is a problem with the infrastructure this API uses to provide us with the required data. As I am in need of the data myself I will continue to monitor this issue and keep my post on this site updated.
There is already an issue on GitHub about this:
https://github.com/MicroPyramid/forex-python/issues/100
From the source: https://github.com/MicroPyramid/forex-python/blob/80290a2b9150515e15139e1a069f74d220c6b67e/forex_python/converter.py#L73
Your error means the library received a non-200 response code to your request. This could mean the site is down, or that it has blocked you momentarily because you're hammering it with requests.
Try replacing the call to c.convert with something like:
from time import sleep

def try_convert(amount, currency, PstngDate):
    success = False
    while not success:
        try:
            res = c.convert(base_cur=currency, dest_cur='DKK', amount=amount,
                            date_obj=PstngDate - timedelta(PstngDate.day))
            success = True
        except:
            # wait a while before retrying
            sleep(10)
    return res
Or even better, use a library like backoff, to do the retrying for you:
https://pypi.python.org/pypi/backoff/1.3.1
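A minimal sketch of that approach, assuming forex_python's RatesNotAvailableError is the exception you want to retry on (the decorator parameters and function name are illustrative):

import backoff
from forex_python.converter import CurrencyRates, RatesNotAvailableError

c = CurrencyRates()

# Retry with exponential backoff, giving up after 8 attempts.
@backoff.on_exception(backoff.expo, RatesNotAvailableError, max_tries=8)
def convert_with_retry(amount, currency, date_obj):
    if currency == 'DKK':
        return amount
    return c.convert(base_cur=currency, dest_cur='DKK',
                     amount=amount, date_obj=date_obj)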

Python-Twitter: Retrieve most recent mentions

I'm trying to use the Python-Twitter library (https://github.com/bear/python-twitter) to extract mentions of a Twitter account using the GetMentions() function. The script populates a database and runs periodically on a cron job, so I don't want to extract every mention, only those since the last time the script was run.
The code below extracts the mentions fine, but for some reason the since_id argument doesn't seem to do anything - the function returns all the mentions every time I run it, rather than filtering for only the most recent ones. For reference, the documentation is here: https://python-twitter.googlecode.com/hg/doc/twitter.html#Api-GetMentions
What is the correct way to implement the GetMentions() function? (I've looked but I can't find any examples online.) Alternatively, is there a different or more elegant way of extracting Twitter mentions that I'm overlooking?
def scan_timeline():
    ''' Scans the timeline and populates the database with the results '''
    FN_NAME = "scan_timeline"
    # Establish the api connection
    api = twitter.Api(
        consumer_key = "consumerkey",
        consumer_secret = "consumersecret",
        access_token_key = "accesskey",
        access_token_secret = "accesssecret"
    )
    # Tweet ID of most recent mention from the last time the function was run
    # (In actual code this is dynamic and extracted from a database)
    since_id = 498404931028938752
    # Retrieve all mentions created since the last scan of the timeline
    length_of_response = 20
    page_number = 0
    while length_of_response == 20:
        # Retrieve most recent mentions
        results = api.GetMentions(since_id, None, page_number)
        ### Additional code inserts the tweets into a database ###
Your syntax seems to be consistent with what the Python-Twitter library describes. What I think is happening is the following:
If the limit of Tweets has occurred since the since_id, the since_id will be forced to the oldest ID available.
That would lead to all the tweets from the oldest available ID onward being returned. Try working with a more recent since_id value, and also check whether the since_id you're passing is actually appropriate.
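As an illustration only, here is a sketch of the call made with explicit keyword arguments, assuming a python-twitter version whose Api.GetMentions accepts since_id and max_id keywords; the helper name and paging logic are mine, not part of the library:

# Fetch all mentions newer than `since_id`, paging backwards with max_id
# instead of a page number.
def fetch_new_mentions(api, since_id):
    new_mentions = []
    max_id = None
    while True:
        batch = api.GetMentions(since_id=since_id, max_id=max_id)
        if not batch:
            break
        new_mentions.extend(batch)
        # Next page: everything strictly older than the oldest tweet seen so far
        max_id = min(status.id for status in batch) - 1
    return new_mentions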

elasticsearch-py scan and scroll to return all documents

I am using elasticsearch-py to connect to my ES database, which contains over 3 million documents. I want to return all the documents so I can extract data and write it to a CSV. I was able to accomplish this easily for 10 documents (the default return size) using the following code.
es = Elasticsearch("glycerin")
query = {"query": {"match_all": {}}}
response = es.search(index="_all", doc_type="patent", body=query)

for hit in response["hits"]["hits"]:
    print hit
Unfortunately, when I attempted to implement the scan & scroll so I could get all the documents I ran into issues. I tried it two different ways with no success.
Method 1:
scanResp= es.search(index="_all", doc_type="patent", body=query, search_type="scan", scroll="10m")
scrollId= scanResp['_scroll_id']
response= es.scroll(scroll_id=scrollId, scroll= "10m")
print response
After scroll/ it gives the scroll id and then ends with ?scroll=10m (Caused by <class 'httplib.BadStatusLine'>: ''))
Method 2:
query={"query" : {"match_all" : {}}}
scanResp= helpers.scan(client= es, query=query, scroll= "10m", index="", doc_type="patent", timeout="10m")
for resp in scanResp:
print "Hiya"
If I print out scanResp before the for loop I get <generator object scan at 0x108723dc0>. Because of this I'm relatively certain that I'm messing up my scroll somehow, but I'm not sure where or how to fix it.
Results:
Again, after scroll/ it gives the scroll id and then ends with ?scroll=10m (Caused by <class 'httplib.BadStatusLine'>: ''))
I tried increasing the max retries for the transport class, but that didn't make a difference. I would very much appreciate any insight into how to fix this.
Note: My ES is located on a remote desktop on the same network.
The Python scan method generates a GET call to the REST API and tries to send your scroll_id over HTTP as part of the request. The most likely cause here is that your scroll_id is too large to be sent that way, so you see this error because no response comes back.
Because the scroll_id grows with the number of shards you have, it is better to use a POST request and send the scroll_id as JSON in the request body. That way you get around the limit on how large an HTTP GET can be.
Did your issue get resolved?
I have one simple solution: you must update the scroll_id every time after you call the scroll method, like below:
response_tmp = es.scroll(scroll_id=scrollId, scroll= "1m")
scrollId = response_tmp['_scroll_id']
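Putting the two answers together, a minimal sketch of a full loop that keeps the scroll id fresh on every iteration (the scan search_type matches the old ES 1.x-style API used in the question; exact client behaviour may vary by version):

from elasticsearch import Elasticsearch

es = Elasticsearch("glycerin")
query = {"query": {"match_all": {}}}

# Open the scroll context.
scan_resp = es.search(index="_all", doc_type="patent", body=query,
                      search_type="scan", scroll="10m")
scroll_id = scan_resp['_scroll_id']

while True:
    page = es.scroll(scroll_id=scroll_id, scroll="10m")
    hits = page['hits']['hits']
    if not hits:
        break   # no more documents
    for hit in hits:
        print(hit)   # or write the fields you need to CSV here
    # Always use the most recent scroll id for the next request.
    scroll_id = page['_scroll_id']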
