Kaggle datasets into jupyter notebook - python

I am trying to import some data from kaggle into notebook. The error I am receiving is a 401 unauthorized, however I have accepted the competition rules and I am able to download the data.
This is the code I am running:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
files = api.competition_download_files("twosigmanews")
api.competitions_submit("submission.csv", "my submission message", "twosigmanews")
EDIT: Added more of the error below. No matter which Kaggle dataset I try to import, I get the same error.
ApiException Traceback (most recent call last)
<ipython-input-7-65a92f19da82> in <module>()
2
3 api = KaggleApi()
----> 4 files = api.competition_download_files("twosigmanews")
5 api.competitions_submit("submission.csv", "my submission message", "twosigmanews")
~\Anaconda3\lib\site-packages\kaggle\api\kaggle_api_extended.py in competition_download_files(self, competition, path, force, quiet)
637 quiet: suppress verbose output (default is False)
638 """
--> 639 files = self.competition_list_files(competition)
640 if not files:
641 print('This competition does not have any available data files')
~\Anaconda3\lib\site-packages\kaggle\api\kaggle_api_extended.py in competition_list_files(self, competition)
554 """
555 competition_list_files_result = self.process_response(
--> 556 self.competitions_data_list_files_with_http_info(id=competition))
557 return [File(f) for f in competition_list_files_result]
558
~\Anaconda3\lib\site-packages\kaggle\api\kaggle_api.py in competitions_data_list_files_with_http_info(self, id, **kwargs)
416 _preload_content=params.get('_preload_content', True),
417 _request_timeout=params.get('_request_timeout'),
--> 418 collection_formats=collection_formats)
419
420 def competitions_list(self, **kwargs): # noqa: E501
~\Anaconda3\lib\site-packages\kaggle\api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
332 response_type, auth_settings,
333 _return_http_data_only, collection_formats,
--> 334 _preload_content, _request_timeout)
335 else:
336 thread = self.pool.apply_async(self.__call_api, (resource_path,
~\Anaconda3\lib\site-packages\kaggle\api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
163 post_params=post_params, body=body,
164 _preload_content=_preload_content,
--> 165 _request_timeout=_request_timeout)
166
167 self.last_response = response_data
~\Anaconda3\lib\site-packages\kaggle\api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
353 _preload_content=_preload_content,
354 _request_timeout=_request_timeout,
--> 355 headers=headers)
356 elif method == "HEAD":
357 return self.rest_client.HEAD(url,
~\Anaconda3\lib\site-packages\kaggle\rest.py in GET(self, url, headers, query_params, _preload_content, _request_timeout)
249 _preload_content=_preload_content,
250 _request_timeout=_request_timeout,
--> 251 query_params=query_params)
252
253 def HEAD(self, url, headers=None, query_params=None, _preload_content=True,
~\Anaconda3\lib\site-packages\kaggle\rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
239
240 if not 200 <= r.status <= 299:
--> 241 raise ApiException(http_resp=r)
242
243 return r
ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'private', 'Content-Length': '37', 'Content-Type': 'application/json; charset=utf-8', 'X-MiniProfiler-Ids': '["b1df1310-4d5b-4000-8f43-e5b6f4958a48","b9dcdaa4-64ef-4be1-bbbe-90fe664a81bd","db1868eb-0a12-4217-a89a-5cbb3946b0e7","b8166dda-a74f-4e64-8bd4-fe529e95bf04","205f9250-b5eb-4cfd-b94c-976778be8f17","229360b9-37d4-456f-b030-9e56879d7c84"]', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'Set-Cookie': 'ARRAffinity=87506ffb959c51b2ba135ec75a7dffc3bc28e2948e5cb4ee012d8d916b147438;Path=/;HttpOnly;Domain=www.kaggle.com', 'Date': 'Sat, 06 Oct 2018 16:23:01 GMT'})
HTTP response body: {"code":401,"message":"Unauthorized"}

I think the name of the competition is wrong; the slug in the competition URL is two-sigma-financial-news, not twosigmanews. Try:
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()  # credentials come from ~/.kaggle/kaggle.json, not a constructor argument
api.authenticate()
files = api.competition_download_files("two-sigma-financial-news")

While looking through the source code, I found the KaggleApi class. The notebook doesn't authenticate automatically when you call KaggleApi(), so you need to call the authenticate function on the API object to connect to the Kaggle API.
Try:
api = KaggleApi()
api.authenticate()
I was able to connect and download the samples after this call.
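For reference, a minimal end-to-end sketch of that flow (it assumes kaggle.json is already in place under ~/.kaggle and that the competition rules have been accepted):
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads credentials from ~/.kaggle/kaggle.json

# download all files for the competition into ./data; the slug comes from the competition URL
api.competition_download_files('two-sigma-financial-news', path='data')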

Here is another Pythonic way to import your data via the Kaggle API. I assume you are working on a cloud instance running Linux.
Here is how I do it:
Get your kaggle.json file from your Kaggle account page: https://www.kaggle.com/<username>/account
Then run this code to put kaggle.json in the directory the API expects:
import json
import os

# ~ must be expanded explicitly; os.chdir("~/.kaggle") would fail
kaggle_dir = os.path.expanduser("~/.kaggle")
os.makedirs(kaggle_dir, exist_ok=True)

data = {"username": "username", "key": "token_value"}  # copy these values from kaggle.json
with open(os.path.join(kaggle_dir, "kaggle.json"), "w") as outfile:
    json.dump(data, outfile)
In a terminal, cd to the directory where you want to put your data, then run:
kaggle competitions download -c two-sigma-financial-news
This setup is available every time you want to import data through the Kaggle API.

You have not provided any authorization in your code, i.e. your user ID, password, and, most importantly, your authentication key. The authentication key is issued once your Kaggle user ID and password have been provided.
Kaggle authentication is performed by the api.authenticate() function, after assigning the Kaggle API to a variable named "api".

Your username and key are either not provided or invalid.
Go to https://www.kaggle.com/username/account and create a new API token; a kaggle.json file will be downloaded. Place it at ~/.kaggle/kaggle.json (Linux/macOS) or C:\Users\<username>\.kaggle\kaggle.json (Windows).
Also, you have to click "I understand and accept" in the Rules Acceptance section for the data you're going to download.

Colab is also a good way to import Kaggle datasets. The steps are:
! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
! kaggle datasets download -d rohanrao/air-quality-data-in-india
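The download arrives as a zip archive; assuming it is named after the dataset slug (the usual convention), one more cell unpacks it:
! unzip -q air-quality-data-in-india.zip -d data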

IBM text to speech Python DecodeError

I just tried the basic example of IBM's text to speech with Python:
!pip install ibm_watson
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
apikey = 'my API KEY'
url = 'my SERVICE URL'
authenticator = IAMAuthenticator(apikey)
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url(url)
with open('./speech.mp3', 'wb') as audio_file:
    res = tts.synthesize('Hello World!', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
    audio_file.write(res.content)
But I get an error message:
DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().
DecodeError Traceback (most recent call last)
<ipython-input-5-53b9d398591b> in <module>
1 with open('./speech.mp3', 'wb') as audio_file:
----> 2 res = tts.synthesize('Hello World!', accept='audio/mp3', voice='en-US_AllisonV3Voice').get_result()
3 audio_file.write(res.content)
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_watson\text_to_speech_v1.py in synthesize(self, text, accept, voice, customization_id, **kwargs)
275
276 url = '/v1/synthesize'
--> 277 request = self.prepare_request(method='POST',
278 url=url,
279 headers=headers,
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\base_service.py in prepare_request(self, method, url, headers, params, data, files, **kwargs)
295 request['data'] = data
296
--> 297 self.authenticator.authenticate(request)
298
299 # Next, we need to process the 'files' argument to try to fill in
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\authenticators\iam_authenticator.py in authenticate(self, req)
104 """
105 headers = req.get('headers')
--> 106 bearer_token = self.token_manager.get_token()
107 headers['Authorization'] = 'Bearer {0}'.format(bearer_token)
108
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\jwt_token_manager.py in get_token(self)
77 """
78 if self._is_token_expired():
---> 79 self.paced_request_token()
80
81 if self._token_needs_refresh():
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\jwt_token_manager.py in paced_request_token(self)
122 if not request_active:
123 token_response = self.request_token()
--> 124 self._save_token_info(token_response)
125 self.request_time = 0
126 return
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\ibm_cloud_sdk_core\jwt_token_manager.py in _save_token_info(self, token_response)
189
190 # The time of expiration is found by decoding the JWT access token
--> 191 decoded_response = jwt.decode(access_token, verify=False)
192 # exp is the time of expire and iat is the time of token retrieval
193 exp = decoded_response.get('exp')
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\jwt\api_jwt.py in decode(self, jwt, key, algorithms, options, **kwargs)
111 **kwargs,
112 ) -> Dict[str, Any]:
--> 113 decoded = self.decode_complete(jwt, key, algorithms, options, **kwargs)
114 return decoded["payload"]
115
c:\users\chris\appdata\local\programs\python\python39\lib\site-packages\jwt\api_jwt.py in decode_complete(self, jwt, key, algorithms, options, **kwargs)
77
78 if options["verify_signature"] and not algorithms:
---> 79 raise DecodeError(
80 'It is required that you pass in a value for the "algorithms" argument when calling decode().'
81 )
DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().
The issue may be with the newest version of the PyJWT package (2.0.0).
Use
pip install PyJWT==1.7.1
to downgrade to the previous version; your project may work then. (This worked for me.)
Yeah, there is definitely an issue with PyJWT > 2.0; as mentioned earlier, you need to uninstall the current version and install a more stable one like 1.7.1.
It worked for me.
I was having this problem with PyJWT==2.1.0. My existing code:
import jwt

data = {'key': 'your_data'}  # jwt.encode expects a dict payload, not a bare string
key = 'your_secret_key'
encoded_data = jwt.encode(data, key)          # worked
decoded_data = jwt.decode(encoded_data, key)  # did not work
I passed the algorithm explicitly and afterwards it worked fine.
Solution:
import jwt

data = {'key': 'your_data'}
key = 'your_secret_key'
encoded_data = jwt.encode(data, key, algorithm="HS256")
decoded_data = jwt.decode(encoded_data, key, algorithms=["HS256"])
More information is available in the PyJWT docs.
I faced the same error when I was working with speech-to-text using ibm_watson. I solved it by installing PyJWT version 1.7.1:
pip install PyJWT==1.7.1
or, equivalently:
python -m pip install PyJWT==1.7.1
Good luck.
For those who want to use the latest version of PyJWT (v2.1.0 as of this writing):
If you don't want to stay on the older version (i.e. PyJWT==1.7.1) and would rather upgrade, you need to use the verify_signature parameter and set it to False (it is True by default). In older versions (< 2.0.0) the parameter was called verify and could be passed directly, but in the newer versions it has to go inside the options parameter, which is a dict:
jwt.decode(...., options={"verify_signature": False})
If you don't want to disable verification, you can simply pass the algorithms parameter instead:
jwt.decode(.... algorithms=['HS256'])
This is from the official changelog:
Dropped deprecated verify param in jwt.decode(...). Use jwt.decode(encoded, key, options={"verify_signature": False}) instead.
Require explicit algorithms in jwt.decode(...) by default. Example: jwt.decode(encoded, key, algorithms=["HS256"]).
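Putting the two changes together, a small self-contained sketch (throwaway HS256 key and an illustrative payload, not tied to the Watson example above):
import jwt

key = 'secret'  # throwaway key, for illustration only
token = jwt.encode({'user': 'alice'}, key, algorithm='HS256')

# verified decode: PyJWT >= 2.0 requires the algorithms list explicitly
payload = jwt.decode(token, key, algorithms=['HS256'])

# unverified decode: the old verify=False is now an options dict
claims = jwt.decode(token, options={'verify_signature': False})

print(payload, claims)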
I had this problem with version 2.3.0 of PyJWT.
I solved it like this:
jwt.decode(token, MY_SECRET, algorithms=['HS256'])
You must use algorithms (plural), not algorithm.

Docker CLI allows tag, but Docker Python API raises APIError

I am trying to push a local Docker image to ECR using the Docker Python API. As part of that process I need to tag the image a certain way. When I do so on the CLI, it works:
docker tag foo/bar '{user_id}.dkr.ecr.us-east-1.amazonaws.com/foo/bar'
However, when I try to do the same thing using the docker.images.Image.tag function in the Docker Python SDK, it fails:
import docker

(docker.client.from_env().images.get('foo/bar')
    .tag('foo/bar',
         '{user-id}.dkr.ecr.us-east-1.amazonaws.com/foo/bar'))
(replace user_id in the code samples above with an AWS user id value, e.g. 717171717171; I've obfuscated it here for the purposes of this question)
With the following error:
In [10]: docker.client.from_env().images.get('foo/bar').ta
...: g('foo/bar', '{user_id}.dkr.ecr.us-east-1.amaz
...: onaws.com/foo/bar')
---------------------------------------------------------------------------
HTTPError Traceback (most recent call last)
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/docker/api/client.py in _raise_for_status(self, response)
255 try:
--> 256 response.raise_for_status()
257 except requests.exceptions.HTTPError as e:
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/requests/models.py in raise_for_status(self)
939 if http_error_msg:
--> 940 raise HTTPError(http_error_msg, response=self)
941
HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.35/images/sha256:afe07035bce72b6c496878a7e3960bedffd46c1bedc79f1bd2b89619e8457194/tag?tag={user_id}.dkr.ecr.us-east-1.amazonaws.com%2Ffoo%2Fbar&repo=foo%2Fbar&force=0
During handling of the above exception, another exception occurred:
APIError Traceback (most recent call last)
<ipython-input-10-5bb015d17409> in <module>
----> 1 docker.client.from_env().images.get('alekseylearn-example/build').tag('foo/bar', '{user_id}.dkr.ecr.us-east-1.amazonaws.com/foo/bar')
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/docker/models/images.py in tag(self, repository, tag, **kwargs)
120 (bool): ``True`` if successful
121 """
--> 122 return self.client.api.tag(self.id, repository, tag=tag, **kwargs)
123
124
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/docker/utils/decorators.py in wrapped(self, resource_id, *args, **kwargs)
17 'Resource ID was not provided'
18 )
---> 19 return f(self, resource_id, *args, **kwargs)
20 return wrapped
21 return decorator
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/docker/api/image.py in tag(self, image, repository, tag, force)
531 url = self._url("/images/{0}/tag", image)
532 res = self._post(url, params=params)
--> 533 self._raise_for_status(res)
534 return res.status_code == 201
535
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/docker/api/client.py in _raise_for_status(self, response)
256 response.raise_for_status()
257 except requests.exceptions.HTTPError as e:
--> 258 raise create_api_error_from_http_exception(e)
259
260 def _result(self, response, json=False, binary=False):
~/miniconda3/envs/alekseylearn-dev/lib/python3.6/site-packages/docker/errors.py in create_api_error_from_http_exception(e)
29 else:
30 cls = NotFound
---> 31 raise cls(e, response=response, explanation=explanation)
32
33
APIError: 500 Server Error: Internal Server Error ("invalid tag format")
Why does the CLI command succeed and the Python API command fail?
In detailed Docker API lingo, an image name like 123456789012.dkr.ecr.us-east-1.amazonaws.com/foo/bar:baz is split into a repository (before the colon) and a tag (after the colon). The host-name part of the repository name is a registry. The default tag value, if none is specified, is the literal latest.
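For example, purely illustrative string handling (not a Docker API call) showing how such a name breaks down:
name = '123456789012.dkr.ecr.us-east-1.amazonaws.com/foo/bar:baz'
repository, tag = name.rsplit(':', 1)   # everything before the colon, and 'baz'
registry = repository.split('/', 1)[0]  # the host-name part is the registry
print(registry, repository, tag)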
In your case, you already have an Image object, so you need to pass the two "halves" of your second argument as separate repository and tag arguments:
(docker.client.from_env().images.get('foo/bar')
    .tag('{user-id}.dkr.ecr.us-east-1.amazonaws.com/foo/bar',
         'latest'))
(In many practical cases using the latest tag isn't a great idea; something like a timestamp or source control commit ID better identifies the image and helps indicate to services like ECS or EKS or plain Kubernetes that they need to do an update. Also, while the ECR image IDs are kind of impractically long, in a scripting context nothing stops you from using them directly; you can, for example, docker build -t 12345...amazonaws.com/foo/bar:abcdef0 and skip the intermediate docker tag step if you want.)
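For completeness, a hedged sketch of the full tag-and-push flow with the Python SDK (the account ID is a placeholder, and it assumes you have already authenticated to ECR, e.g. via docker login or an ECR token):
import docker

client = docker.from_env()
image = client.images.get('foo/bar')

# repository and tag are separate arguments to Image.tag
repo = '123456789012.dkr.ecr.us-east-1.amazonaws.com/foo/bar'
image.tag(repo, tag='latest')

# stream the push output line by line as dicts
for line in client.images.push(repo, tag='latest', stream=True, decode=True):
    print(line)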

How to get Rally Stories in closed Projects using pyral

Is there a way to get Rally Stories for closed Projects using the pyral Python library?
When using rally.get(...) Stories in closed Projects aren't being returned. Here is the code being used:
from pyral import Rally

rally = Rally(...)
rally_id = 'S123456'
response = rally.get('UserStory',
                     query='FormattedID = %s AND Project.State = "Closed"' % rally_id,
                     fetch=True, instance=True)
The response contains no results, yet the example ID S123456 is a valid Rally Story ID; it just belongs to a closed Project.
Also, adding what I believe is the correct query-parameter syntax for finding this Story still didn't work, and it didn't raise any Python errors either.
Versions of code being used:
Python 2.7.15
Pyral 1.4.1 (also back-tested with Pyral 1.2.3, which didn't work either)
Here is the error I get when calling the above code. It basically means the rally.get(...) request failed to find the instance.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/Users/alelevier/Documents/rally_jira_sync/venv/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
697 type_pprinters=self.type_printers,
698 deferred_pprinters=self.deferred_printers)
--> 699 printer.pretty(obj)
700 printer.flush()
701 return stream.getvalue()
/Users/alelevier/Documents/rally_jira_sync/venv/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
401 if cls is not object \
402 and callable(cls.__dict__.get('__repr__')):
--> 403 return _repr_pprint(obj, self, cycle)
404
405 return _default_pprint(obj, self, cycle)
/Users/alelevier/Documents/rally_jira_sync/venv/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
701 """A pprint that just redirects to the normal repr function."""
702 # Find newlines and replace them with p.break_()
--> 703 output = repr(obj)
704 for idx,output_line in enumerate(output.splitlines()):
705 if idx:
/Users/alelevier/Documents/rally_jira_sync/venv/lib/python2.7/site-packages/pyral/rallyresp.pyc in __repr__(self)
408 else:
409 blurb = "%sResult TotalResultCount: %d Results: %s" % \
--> 410 (self.request_type, self.resultCount, self.content['Results'])
411 return "%s %s" % (self.status_code, blurb)
412
KeyError: 'Results'

TypeError when calling requests_oauthlib multiple times (error actually raised within urllib3)

I'm writing a script to pull some data from the Twitter API. Its use of OAuth 1.1 means I'm using the requests_oauthlib helper library on top of requests to authenticate the session.
The first call to the API works, but then subsequent calls give a TypeError as follows:
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/packages/urllib3/connection.pyc in __init__(self, *args, **kw)
124
125 # Superclass also sets self.source_address in Python 2.7+.
--> 126 _HTTPConnection.__init__(self, *args, **kw)
127
128 def _new_conn(self):
TypeError: unbound method __init__() must be called with HTTPConnection instance as first argument (got VerifiedHTTPSConnection instance instead)
It looks like something is persisting in the session, since the error always appears on repeated use. I've tried a clean virtualenv with the latest versions installed via pip, with no difference.
I'm using the context manager approach so thought that the session would be destroyed after each call, preventing this from happening:
with ro.OAuth1Session(**self._auth) as s:
    response = s.get(url)
Any fix or pointers to understand what's causing the problem would be appreciated.
Edit: I've tried a different approach, using the alternative way of building a session as described on the requests docs (http://docs.python-requests.org/en/master/user/authentication/) but same error is raised.
Edit: Full stack in case it's useful:
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/sessions.pyc in get(self, url, **kwargs)
485
486 kwargs.setdefault('allow_redirects', True)
--> 487 return self.request('GET', url, **kwargs)
488
489 def options(self, url, **kwargs):
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/sessions.pyc in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
473 }
474 send_kwargs.update(settings)
--> 475 resp = self.send(prep, **send_kwargs)
476
477 return resp
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/sessions.pyc in send(self, request, **kwargs)
583
584 # Send the request
--> 585 r = adapter.send(request, **kwargs)
586
587 # Total elapsed time of the request (approximately)
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/adapters.pyc in send(self, request, stream, timeout, verify, cert, proxies)
401 decode_content=False,
402 retries=self.max_retries,
--> 403 timeout=timeout
404 )
405
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.pyc in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, **response_kw)
564 # Request a connection from the queue.
565 timeout_obj = self._get_timeout(timeout)
--> 566 conn = self._get_conn(timeout=pool_timeout)
567
568 conn.timeout = timeout_obj.connect_timeout
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.pyc in _get_conn(self, timeout)
254 conn = None
255
--> 256 return conn or self._new_conn()
257
258 def _put_conn(self, conn):
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.pyc in _new_conn(self)
800 conn = self.ConnectionCls(host=actual_host, port=actual_port,
801 timeout=self.timeout.connect_timeout,
--> 802 strict=self.strict, **self.conn_kw)
803
804 return self._prepare_conn(conn)
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/packages/urllib3/connection.pyc in __init__(self, host, port, key_file, cert_file, strict, timeout, **kw)
208
209 HTTPConnection.__init__(self, host, port, strict=strict,
--> 210 timeout=timeout, **kw)
211
212 self.key_file = key_file
/Users/phil/code/Virtualenv/req_test/lib/python2.7/site-packages/requests/packages/urllib3/connection.pyc in __init__(self, *args, **kw)
124
125 # Superclass also sets self.source_address in Python 2.7+.
--> 126 _HTTPConnection.__init__(self, *args, **kw)
127
128 def _new_conn(self):
TypeError: unbound method __init__() must be called with HTTPConnection instance as first argument (got VerifiedHTTPSConnection instance instead)
The format for OAuth1Session is:
oauth = OAuth1Session(client_key,
                      client_secret=client_secret,
                      resource_owner_key=resource_owner_key,
                      resource_owner_secret=resource_owner_secret,
                      verifier=verifier)
** is used with keyword arguments, and OAuth1Session has a different signature: its first parameter, client_key, is positional.
Edit: Also added a reference below to reloading the helper library within IPython / Jupyter.
After a fair bit of reading, it appears the problem may arise when you call get on a request but don't access the body of the response (which is what I did whilst building / debugging the flow):
http://docs.python-requests.org/en/master/user/advanced/#keep-alive
"Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set stream to False or read the content property of the Response object."
So the answer is to make sure the first thing you do after making a request is to consume the response body, by calling Response.content, Response.json(), or a similar method.
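A minimal sketch of that fix (credentials and endpoint are placeholders; OAuth1Session is imported directly rather than via the ro alias used above):
from requests_oauthlib import OAuth1Session

auth = dict(client_key='...',
            client_secret='...',
            resource_owner_key='...',
            resource_owner_secret='...')

with OAuth1Session(**auth) as s:
    response = s.get('https://api.twitter.com/1.1/account/verify_credentials.json')
    data = response.json()  # consuming the body releases the connection back to the pool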
The problem has only arisen when using the requests_oauthlib library because the session it uses is less common. I've done similar with the Facebook and LinkedIn APIs without issue, by using query parameters that don't affect the session object itself.
It also arose most often when reloading the helper library within IPython / Jupyter. Quitting the notebook or command line session then restarting would remove the issue.

404 when getting private YouTube video even when logged in with the owner's account using gdata-python-client

If a YouTube video is set as private and I try to fetch it using the gdata Python API a 404 RequestError is raised, even though I have done a programmatic login with the account that owns that video:
from gdata.youtube import service

yt_service = service.YouTubeService(email=my_email,
                                    password=my_password,
                                    client_id=my_client_id,
                                    source=my_source,
                                    developer_key=my_developer_key)
yt_service.ProgrammaticLogin()
yt_service.GetYouTubeVideoEntry(video_id='IcVqemzfyYs')
---------------------------------------------------------------------------
RequestError Traceback (most recent call last)
<ipython console>
/usr/lib/python2.4/site-packages/gdata/youtube/service.pyc in GetYouTubeVideoEntry(self, uri, video_id)
203 elif video_id and not uri:
204 uri = '%s/%s' % (YOUTUBE_VIDEO_URI, video_id)
--> 205 return self.Get(uri, converter=gdata.youtube.YouTubeVideoEntryFromString)
206
207 def GetYouTubeContactFeed(self, uri=None, username='default'):
/usr/lib/python2.4/site-packages/gdata/service.pyc in Get(self, uri, extra_headers, redirects_remaining, encoding, converter)
1100 'body': result_body}
1101 else:
-> 1102 raise RequestError, {'status': server_response.status,
1103 'reason': server_response.reason, 'body': result_body}
1104
RequestError: {'status': 404, 'body': 'Video not found', 'reason': 'Not Found'}
This happens every time, unless I go into my YouTube account (through the YouTube website) and set the video to public; after that I can switch it to private and back to public using the Python API.
Am I missing a step or is there another (or any) way to fetch a YouTube video set as private from the API?
Thanks in advance.
Apparently the YouTube Data API doesn't allow this (yet), so to work around it I use the GetYouTubeUserFeed method of a YouTubeService instance to obtain a list of all the video entries I need (whether they are private or public):
from gdata.youtube import service

VIDEO_ID = 'IcVqemzfyYs'

yt_service = service.YouTubeService(email=my_email,
                                    password=my_password,
                                    client_id=my_client_id,
                                    source=my_source,
                                    developer_key=my_developer_key)
yt_service.ProgrammaticLogin()

# the YouTube username is the local part of the email address (before the '@')
userfeed = yt_service.GetYouTubeUserFeed(username=my_email[:my_email.index('@')])

# scan the user's feed for the entry whose id ends with the video ID
video_entry = None
for entry in userfeed.entry:
    if entry.id.text.endswith(VIDEO_ID):
        video_entry = entry
        break
Hope this helps anyone having the same problem :)
