sequence item 0: expected str instance, tuple found(2) - python

I analyzed precedent data and tried to apply topic modeling. From the error, I
think str.join expects strings but found a tuple instead. I don't know how to
fix this part. Here is the code I am using:
class FacebookAccessException(Exception): pass
def get_profile(request, token=None):
    ...
    response = json.loads(urllib_response)
    if 'error' in response:
        raise FacebookAccessException(response['error']['message'])
    access_token = response['access_token'][-1]
    return access_token
#Join the review
word_list = ",".join([",".join(i) for i in sexualhomicide['tokens']])
word_list = word_list.split(",")
This is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Users\Public\Documents\ESTsoft\CreatorTemp\ipykernel_13792\3474859476.py in <module>
1 #Join the review
----> 2 word_list = ",".join([",".join(i) for i in sexualhomicide['tokens']])
3 word_list = word_list.split(",")
C:\Users\Public\Documents\ESTsoft\CreatorTemp\ipykernel_13792\3474859476.py in <listcomp>(.0)
1 #Join the review
----> 2 word_list = ",".join([",".join(i) for i in sexualhomicide['tokens']])
3 word_list = word_list.split(",")
TypeError: sequence item 0: expected str instance, tuple found
This is how I print the contents of sexualhomicide:
print(sexualhomicide['cleaned_text'])
print("="*30)
print(twitter.pos(sexualhomicide['cleaned_text'][0],Counter('word')))
I can't upload the output of this code, because it gets classified as spam during the upload process.
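The error message says that the items passed to ",".join are tuples rather than strings. A minimal sketch of one way around it, assuming each entry of sexualhomicide['tokens'] is a list of (word, tag) pairs such as a part-of-speech tagger would produce (the sample data below is hypothetical):

```python
# Hypothetical stand-in for sexualhomicide['tokens']:
# one list of (word, tag) pairs per document.
tokens = [
    [("dark", "Adjective"), ("street", "Noun")],
    [("witness", "Noun")],
]

# Keep only the word from each (word, tag) pair.
# Building the list directly avoids the join/split round trip.
word_list = [word for doc in tokens for word, _tag in doc]
print(word_list)
```

If the tags are needed later, the pairs can be kept as they are and only converted to strings at the point where they are joined.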

Related

Getting KeyError: 'viewCount' for using Youtube API in Python

I'm trying to get the view count for a list of videos from a channel. I've written a function, and when I run it with just 'video_id', 'title' and 'published date' I get the expected output. However, when I ask for the view count or anything else from the statistics part of the API, it raises a KeyError.
Here's the code:
def get_video_details(youtube, video_ids):
    all_video_stats = []
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part='snippet,statistics',
            id=','.join(video_ids[i:i+50]))
        response = request.execute()
        for video in response['items']:
            video_stats = dict(
                Video_id=video['id'],
                Title=video['snippet']['title'],
                Published_date=video['snippet']['publishedAt'],
                Views=video['statistics']['viewCount'])
            all_video_stats.append(video_stats)
    return all_video_stats
get_video_details(youtube, video_ids)
And this is the error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18748/3337790216.py in <module>
----> 1 get_video_details(youtube, video_ids)
~\AppData\Local\Temp/ipykernel_18748/1715852978.py in get_video_details(youtube, video_ids)
14 Title = video['snippet']['title'],
15 Published_date = video['snippet']['publishedAt'],
---> 16 Views = video['statistics']['viewCount'])
17
18 all_video_stats.append(video_stats)
KeyError: 'viewCount'
I was referencing this Youtube video to write my code.
Thanks in advance.
I got it.
I had to use .get() to avoid the KeyError. It returns None when the key is missing instead of raising.
I replaced the line with this:
Views = video['statistics'].get('viewCount')
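A small self-contained sketch of the same idea, using hypothetical items shaped like the entries of response['items'] (the second one has no 'viewCount', as can happen when statistics are hidden):

```python
# Hypothetical items; only the keys used below are included.
items = [
    {"id": "a1", "statistics": {"viewCount": "120"}},
    {"id": "b2", "statistics": {}},
]

# .get() returns None for a missing key instead of raising KeyError.
# The {} default also covers an item with no 'statistics' entry at all.
views = [item.get("statistics", {}).get("viewCount") for item in items]
print(views)
```

A second argument to .get(), for example .get('viewCount', 0), can be used if a default other than None is more convenient downstream.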

IMDbPY handling None object

I'm trying to pull data about cinematographers from IMDbPY and I'm encountering a null object. I'm not sure how to deal with that None object in the code. Could someone help me out, please?
Here's where I have reached.
from imdb import IMDb, IMDbError
ia = IMDb()
itemdop = ''
doplist = []
items = ["0050083", "6273736", "2582802"]
def imdblistdop(myList=[], *args):
    for x in myList:
        movie = ia.get_movie(x)
        cinematographer = movie.get('cinematographers')[0]
        cinematographer2 = movie.get('cinematographers')
        print(cinematographer)
        print(doplist)
        try:
            itemdop = cinematographer['name']
            doplist.append(itemdop)
        except KeyError as ke:
            print('Nope!')
imdblistdop(items)
The code is not working at all, and all I get is this:
Boris Kaufman
[]
TypeError Traceback (most recent call last)
in ()
21
22
---> 23 imdblistdop(items)
24
25
in imdblistdop(myList, *args)
10 for x in myList:
11 movie = ia.get_movie(x)
---> 12 cinematographer = movie.get('cinematographers')[0]
13 cinematographer2 = movie.get('cinematographers')
14 print(cinematographer)
TypeError: 'NoneType' object is not subscriptable
cinematographer is a list, which means you can refer to an entry in the list by its index, for example cinematographer[2]. You cannot use a string to refer to an entry in a list.
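The traceback also shows that movie.get('cinematographers') returned None for one of the movies, so indexing that result with [0] fails before the try block is reached. A minimal sketch of guarding against this, using plain dicts as hypothetical stand-ins for the objects returned by ia.get_movie() so that it runs without network access:

```python
# Hypothetical stand-ins for movie objects; the second has no cinematographer data.
movies = [
    {"cinematographers": [{"name": "Boris Kaufman"}]},
    {},
]

doplist = []
for movie in movies:
    people = movie.get("cinematographers")  # None when the key is absent
    if not people:                          # covers both None and an empty list
        continue
    doplist.append(people[0]["name"])

print(doplist)
```

The same check can be placed directly after movie = ia.get_movie(x) in the original loop, before anything is indexed.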

TypeError: list indices must be integers or slices, not str error, multiple fails after trying debug in a different cell

I have two dataframes.
As follows:
And I have the following function:
def get_user_movies(user_id):
    movie_id = user_movie_df[user_movie_df['UserID'] == user_id]['MovieID'].tolist()
    movie_title = []
    for i in range(len(movie_id)):
        a = movie_title[movie_title['MovieID'] == movie_id[i]]['Title'].values[0]
        movie_title.append(a)
    if movie_id == [] and movie_title == []:
        raise Exception
    return movie_id,movie_title
get_user_movies(30878)
And I have the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-55-9c58c22528ff> in <module>
8 raise Exception
9 return movie_id,movie_title
---> 10 get_user_movies(30878)
<ipython-input-55-9c58c22528ff> in get_user_movies(user_id)
3 movie_title = []
4 for i in range(len(movie_id)):
----> 5 a = movie_title[movie_title['MovieID'] == movie_id[i]]['Title'].values[0]
6 movie_title.append(a)
7 if movie_id == [] and movie_title == []:
TypeError: list indices must be integers or slices, not str
I debugged it a couple of times. The line that raises the error runs without problems when I try it with a single movie_id, or with a few random movie_ids in another loop. I just don't understand why this error keeps popping up.
Please take a look! Thanks!
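The name movie_title is used for both the list and the dataframe. Inside the function, movie_title = [] makes the name refer to the list, so movie_title['MovieID'] indexes a list with a string, which raises this error. Rename one of the two.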
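One way to avoid the clash is to give the result list its own name. A minimal sketch, assuming the column names from the question; the two dataframes are replaced by hypothetical lists of dicts (user_movie_rows, movie_rows) so that it runs without third-party packages:

```python
# Hypothetical stand-ins for the two dataframes.
user_movie_rows = [
    {"UserID": 30878, "MovieID": 1},
    {"UserID": 30878, "MovieID": 3},
    {"UserID": 12345, "MovieID": 2},
]
movie_rows = [
    {"MovieID": 1, "Title": "Movie A"},
    {"MovieID": 2, "Title": "Movie B"},
    {"MovieID": 3, "Title": "Movie C"},
]

def get_user_movies(user_id):
    movie_ids = [r["MovieID"] for r in user_movie_rows if r["UserID"] == user_id]
    if not movie_ids:
        raise ValueError(f"no movies found for user {user_id}")
    # Lookup table from movie id to title.
    title_by_id = {r["MovieID"]: r["Title"] for r in movie_rows}
    # The result list has its own name, so it never hides the lookup table.
    user_titles = [title_by_id[m] for m in movie_ids]
    return movie_ids, user_titles

print(get_user_movies(30878))
```

With real dataframes the same renaming applies: keep the dataframe under one name and collect the titles in a differently named list.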

TypeError sequence item 0: expected str instance, bytes found

For some reason I keep getting a TypeError at this line:
TypeError Traceback (most recent call last)
<ipython-input-19-3490eb36442d> in <module>
2 result, numbers = mail.uid('search', None, "ALL")
3 uids = numbers[0].split()
----> 4 result, messages = mail.uid('fetch', ','.join(uids), '(BODY[])')
mail.select("INBOX")
result, numbers = mail.uid('search', None, "ALL")
uids = numbers[0].split()
result, messages = mail.uid('fetch', ','.join(uids), '(BODY[])')
date_list = []
from_list = []
message_text = []
for _, message in messages[::2]:
    msg = email.message_from_string(message)
    if msg.is_multipart():
        t = []
        for p in msg.get_payload():
            t.append(p.get_payload(decode=True))
        message_text.append(t[0])
    else:
        message_text.append(msg.get_payload(decode=True))
    date_list.append(msg.get('date'))
    from_list.append(msg.get('from'))
date_list = pd.to_datetime(date_list)
print(len(message_text))
print(len(from_list))
df = pd.DataFrame(data={'Date':date_list,'Sender':from_list,'Message':message_text})
print(df.head())
df.to_csv('~inbox_email.csv',index=False)
This line
result, messages = mail.uid('fetch', ','.join(uids), '(BODY[])')
is raising the exception
TypeError sequence item 0: expected str instance, bytes found
Inspecting the line, 'fetch' and '(BODY[])' are already strings, so they are unlikely to be the problem.
That leaves ','.join(uids). uids is actually a list of bytes instances, so str.join is raising the exception because it expects an iterable of str instances.
To fix the problem, decode numbers[0] to str before manipulating it.
result, numbers = mail.uid('search', None, "ALL")
uids = numbers[0].decode().split() # <- decode before splitting
result, messages = mail.uid('fetch', ','.join(uids), '(BODY[])')
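The difference can be reproduced without a mail server. In this sketch, raw is a hypothetical stand-in for numbers[0], which imaplib returns as bytes in Python 3:

```python
# Hypothetical stand-in for numbers[0] from mail.uid('search', ...).
raw = b"1 2 3"

# Option 1: decode to str first, then join with a str separator.
uid_str = ",".join(raw.decode().split())

# Option 2: stay in bytes and join with a bytes separator.
uid_bytes = b",".join(raw.split())

print(uid_str, uid_bytes)
```

Either form works as long as the separator and the items have the same type; mixing str and bytes is what raises the TypeError.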

rdd.first() does not give an error but rdd.collect() does

I am working in PySpark and have the following code, where I am processing tweets and building an RDD of user_id and text. Below is the code:
"""
# Construct an RDD of (user_id, text) here.
"""
import json
def safe_parse(raw_json):
    try:
        json_object = json.loads(raw_json)
        if 'created_at' in json_object:
            return json_object
        else:
            return;
    except ValueError as error:
        return;
def get_usr_txt (line):
    tmp = safe_parse (line)
    return ((tmp.get('user').get('id_str'),tmp.get('text')));
usr_txt = text_file.map(lambda line: get_usr_txt(line))
print (usr_txt.take(5))
and the output looks okay (as shown below)
[('470520068', "I'm voting 4 #BernieSanders bc he doesn't ride a CAPITALIST PIG adorned w/ #GoldmanSachs $. SYSTEM RIGGED CLASS WAR "), ('2176120173', "RT #TrumpNewMedia: .#realDonaldTrump #America get out & #VoteTrump if you don't #VoteTrump NOTHING will change it's that simple!\n#Trump htt…"), ('145087572', 'RT #Libertea2012: RT TODAY: #Colorado’s leading progressive voices to endorse #BernieSanders! #Denver 11AM - 1PM in MST CO State Capitol…'), ('23047147', '[VID] Liberal Tears Pour After Bernie Supporter Had To Deal With Trump Fans '), ('526506000', 'RT #justinamash: .#tedcruz is the only remaining candidate I trust to take on what he correctly calls the Washington Cartel. ')]
However, as soon as I do
print (usr_txt.count())
I get an error like below
Py4JJavaError Traceback (most recent call last)
<ipython-input-60-9dacaf2d41b5> in <module>()
8 usr_txt = text_file.map(lambda line: get_usr_txt(line))
9 #print (usr_txt.take(5))
---> 10 print (usr_txt.count())
11
/usr/local/spark/python/pyspark/rdd.py in count(self)
1054 3
1055 """
-> 1056 return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
1057
1058 def stats(self):
What am I missing? Is the RDD not created properly, or is it something else? How do I fix it?
safe_parse returns None when the parsed JSON line has no created_at element, or when parsing fails. Calling tmp.get('user') on that None in (tmp.get('user').get('id_str'),tmp.get('text')) is what raises the error.
The solution is to check for None in get_usr_txt:
def get_usr_txt (line):
    tmp = safe_parse(line)
    if tmp is not None:
        return ((tmp.get('user').get('id_str'),tmp.get('text')));
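Now the question is why print (usr_txt.take(5)) showed a result while print (usr_txt.count()) caused the error.
That's because usr_txt.take(5) only evaluates as many records as it needs to return the first five elements, and none of those hit the None case. count() has to process every record in the RDD, including the lines for which safe_parse returned None.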
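Note that a function that returns nothing for bad lines still leaves None entries in the result, so it can help to filter them out as well. A minimal sketch in plain Python, with a hypothetical list of lines standing in for text_file so that it runs without Spark:

```python
import json

def safe_parse(raw_json):
    """Return the parsed tweet, or None if it is malformed or lacks 'created_at'."""
    try:
        obj = json.loads(raw_json)
    except ValueError:
        return None
    return obj if isinstance(obj, dict) and "created_at" in obj else None

def get_usr_txt(line):
    tweet = safe_parse(line)
    if tweet is None:
        return None
    user = tweet.get("user") or {}  # tolerate a missing 'user' field
    return (user.get("id_str"), tweet.get("text"))

# Hypothetical input: one valid tweet, one malformed line, one non-tweet record.
lines = [
    '{"created_at": "x", "user": {"id_str": "1"}, "text": "hello"}',
    'not json',
    '{"delete": {}}',
]

# Drop the None results so later steps only see (user_id, text) pairs.
usr_txt = [pair for pair in map(get_usr_txt, lines) if pair is not None]
print(usr_txt)
```

On an RDD the same shape would be text_file.map(get_usr_txt).filter(lambda pair: pair is not None), after which count() only sees valid pairs.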
