I have a JSON file from which I've extracted a list.
I've tried to use list indexing to get the other items from the list, but I'm getting this error:
TypeError: list indices must be integers or slices, not str.
for name in data['athletes'][0:]['athlete']['displayName']:
    print(name)
If I don't use the colon in the indexing, it extracts the first name.
You're specifying:
for name in data['athletes'][0:]['athlete']['displayName']:
    print(name)
Presumably data['athletes'][0] is a dictionary. But by writing data['athletes'][0:], you are taking a slice of the data['athletes'] list, in essence copying the entire list. Trying to look up a string key such as 'athlete' in a list makes no sense, hence your error (lists can only be indexed by integers or slices). Of course, that was not your intention. If you just want the zeroth element of the list, remove the colon.
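A minimal sketch of the difference between indexing and slicing, assuming a hypothetical `data` dict shaped like the one implied by the question (the real JSON wasn't shown):

```python
# Hypothetical stand-in for the question's JSON (the real file wasn't shown).
data = {
    "athletes": [
        {"athlete": {"displayName": "Alice"}},
        {"athlete": {"displayName": "Bob"}},
    ]
}

first = data["athletes"][0]        # indexing returns one element: a dict
everything = data["athletes"][0:]  # slicing returns a new list (a shallow copy)

print(type(first).__name__)             # dict
print(type(everything).__name__)        # list
print(first["athlete"]["displayName"])  # Alice

try:
    everything["athlete"]  # a list can't be indexed by a string
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not str
```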
If each element of the list data['athletes'] is a dictionary with an 'athlete' key that you are trying to display, then see the answer posted by Daniel Roseman.
You probably want:
for athlete in data['athletes']:
    print(athlete['athlete']['displayName'])
but it's impossible to tell for sure without seeing your JSON.
This is the type of data, correct?
data: Dict[str, List[Dict[str, Dict[str, str]]]]
Hence:
data['athletes'][0]: Dict[str, Dict[str, str]]
data['athletes'][0:]: List[Dict[str, Dict[str, str]]]
When you slice data['athletes'] (via [0:]), you get the List, which you can't index via a string. When you get a particular element (via [0]), you get the first Dict inside the List.
Other answers have already suggested how to print the names. If you wanted to turn them into a list, here's how you could do it with a list comprehension (I think this is roughly what you're trying to do with the slice):
names = [entry['athlete']['displayName'] for entry in data['athletes']]
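Putting the loop and the comprehension side by side, here is a small runnable sketch; the sample data is hypothetical and only assumes the `athletes` → `athlete` → `displayName` nesting discussed above:

```python
# Hypothetical sample data; the real JSON structure is assumed, not known.
data = {
    "athletes": [
        {"athlete": {"displayName": "Alice"}},
        {"athlete": {"displayName": "Bob"}},
        {"athlete": {"displayName": "Carol"}},
    ]
}

# Loop version: print each name in turn.
for entry in data["athletes"]:
    print(entry["athlete"]["displayName"])

# Comprehension version: collect the names into a list.
names = [entry["athlete"]["displayName"] for entry in data["athletes"]]
print(names)  # ['Alice', 'Bob', 'Carol']
```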
Related
I am trying to filter some data using Spark's core (RDD) API from Python, but I keep running into this error and can't work out how to fix it for my data:
TypeError: tuple indices must be integers or slices, not str
Now, this is a sample of my data structure:
This is the code I am using to filter my data, but it keeps giving me that error. I am simply trying to return the business_id, city and stars from my dataset.
(my_rdd
.filter(lambda x: x['city']=='Toronto')
.map(lambda x: (x['business_id'], x['city'], x['stars']))
).take(5)
Any guidance on how to filter my data would be helpful.
Thanks.
Since your data is nested in tuples, you need to specify the tuple indices in your filter and map:
result = (my_rdd
.filter(lambda x: x[1][1]['city']=='Toronto')
.map(lambda x: (x[1][1]['business_id'], x[1][1]['city'], x[1][1]['stars']))
)
print(result.collect())
[('7v91woy8IpLrqXsRvxj_vw', 'Toronto', 3.0)]
I think you are misunderstanding how filter and map work here. Both of them take a collection and return a new one.
Both of them take a function as a parameter (that's the case for the method version; there is also a functional version, which takes the input list as its second parameter) and apply it to each item of the input to build the output. What differs is how they use the function:
filter uses it to, well, filter the input list. The function should return a boolean which indicates whether or not to include the item in the output list.
map uses it to build a new list of the same length as the old one, but with values updated using the provided function.
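A minimal sketch using Python's built-in filter and map on a plain list, so it runs without Spark (RDD.filter and RDD.map apply their functions per element in the same way). The record layout (id, (user_info, business_info)) is an assumption inferred from the x[1][1] indexing in the earlier answer, and the field values are made up:

```python
# Hypothetical records mimicking the nesting implied by x[1][1]:
# (id, (user_info, business_info)). Field values are made up.
records = [
    ("u1", ({"average_stars": 3.41},
            {"business_id": "b1", "city": "Toronto", "stars": 3.0})),
    ("u2", ({"average_stars": 4.10},
            {"business_id": "b2", "city": "Montreal", "stars": 4.5})),
]

# filter keeps an element only when the function returns True.
in_toronto = filter(lambda x: x[1][1]["city"] == "Toronto", records)

# map applies the function to every element it receives.
result = list(map(
    lambda x: (x[1][1]["business_id"], x[1][1]["city"], x[1][1]["stars"]),
    in_toronto,
))
print(result)  # [('b1', 'Toronto', 3.0)]

# Indexing the outer tuple with a string reproduces the question's error.
try:
    records[0]["city"]
except TypeError as exc:
    print(exc)  # tuple indices must be integers or slices, not str
```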
That being said, I believe you get the error TypeError: tuple indices must be integers or slices, not str when you try to filter the list.
On the first iteration, filter runs the function against the first element of the list. This first element is the tuple ('7v91woy8IpLrqXsRvxj_vw', (({'average_stars': 3.41, 'compliment_cool': 9, ...}))). The problem is that you are trying to access a value of this tuple using a string, as if it were a dictionary, which is not permitted in Python (and doesn't make much sense).
To extract the data you need, I would go with something much simpler:
item = my_rdd.first()  # an RDD can't be indexed with [0]; first() returns its first element
(item[1][1]['business_id'], item[1][1]['city'], item[1][1]['stars'])
I start by downloading some tweets from Twitter.
tweet_text = DonaldTrump["Tweets"]
tweet_text = tweet_text.str.lower()
In the next step, I tokenize them with TweetTokenizer:
from nltk.tokenize import TweetTokenizer
Tweet_tkn = TweetTokenizer()
tokens = [Tweet_tkn.tokenize(t) for t in tweet_text]
tokens[0:3]
Can someone explain this error to me and help me solve it?
I have been through similar questions that face similar errors but they provide different solutions.
Lists are mutable and therefore cannot be used as dict keys (they are unhashable). Otherwise, the program could add a list to a dictionary as a key, then mutate the list, and it would be unclear whether the value in the dictionary should be available under the new list value, the old one, or neither.
If you want to use structured data as keys, you need to convert it to an immutable type first, such as tuple or frozenset. For non-nested objects, you can simply use tuple(obj). For a simple list of lists, you can use this:
tuple(tuple(elem) for elem in obj)
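A minimal sketch of both the failure and the tuple-based fix (the dictionary and key here are hypothetical):

```python
counts = {}

key = [1, 2]
try:
    counts[key] = "value"  # lists are unhashable, so this raises TypeError
except TypeError as exc:
    print(exc)  # unhashable type: 'list'

counts[tuple(key)] = "value"  # a tuple of hashable items works as a key
print(counts)  # {(1, 2): 'value'}
```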
But for an arbitrary structure, you will have to use recursion.
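One way the recursion could look, as a sketch: the helper name `freeze` is hypothetical, and it only handles dicts, lists, tuples and sets, assuming every other value is already hashable. Note that a frozen dict and a frozen set of pairs can compare equal, which may or may not matter for your use case.

```python
def freeze(obj):
    """Recursively convert dicts, lists, tuples and sets into hashable equivalents."""
    if isinstance(obj, dict):
        # A frozenset of (key, frozen value) pairs ignores insertion order,
        # so equal dicts produce equal keys.
        return frozenset((k, freeze(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    return obj  # assumption: anything else (str, int, ...) is already hashable


tokens = [["i", "like", "tea"], ["tea", "is", "good"]]
seen = {freeze(tokens): "tweets"}
print(seen[freeze([["i", "like", "tea"], ["tea", "is", "good"]])])  # tweets
```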
I've inherited the following code, which works great apart from when only a single data item is returned from the original XML. When that occurs, the following error is thrown: 'TypeError: string indices must be integers'
import json
import xmltodict

result = xmltodict.parse(get_xml())
latest_result = result['Response']['Items']['Item']
myJsonData = json.dumps(latest_result)
j = json.loads(myJsonData)
print(type(j))
for item in j:
    print(item['Id'])
    print(item['OrderId'])
I have narrowed the change in behaviour down to a difference in datatype here:
print(type(j))
When only a single ['Item'] is returned from the source XML, the datatype of j is a 'dict', whilst the rest of the time (more than one ['Item']) it's a 'list'.
Hope someone can help.
Encoding to JSON then decoding again has nothing to do with your question. It is a red herring, you can use latest_result and still get the same error.
The result['Response']['Items']['Item'] can evidently be either a list of dictionaries, or a single dictionary. When iterating over a list, you'll get contained elements, while iteration over a dictionary gives you the keys. So your item elements are strings (each a key in the dictionary) and you can't address elements in that string with 'Id' or 'OrderId'.
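A small sketch of that behaviour, using hypothetical dictionaries in place of the parsed XML:

```python
# Hypothetical parse results: several items vs. a single item.
many = [{"Id": "1", "OrderId": "A"}, {"Id": "2", "OrderId": "B"}]
single = {"Id": "3", "OrderId": "C"}

for item in many:
    print(item["Id"])  # each item is a dict: prints 1, then 2

for item in single:
    print(repr(item))  # iterating a dict yields its keys: 'Id', then 'OrderId'

try:
    for item in single:
        item["Id"]  # item is the string 'Id' here, so this raises TypeError
except TypeError as exc:
    print(type(exc).__name__, exc)
```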
Test for the type or use exception handling:
if isinstance(latest_result, dict):
    # just one result, wrap it in a single-element list
    latest_result = [latest_result]
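The same check can be wrapped in a small helper so the loop always sees a list. This is a sketch: the name `as_list` is hypothetical, and the `None` branch assumes an empty `<Items>` element may come back as `None`.

```python
def as_list(value):
    """Normalize a parsed XML child to a list of items."""
    if value is None:
        return []  # assumption: an empty element arrives as None
    if isinstance(value, dict):
        return [value]  # a single child: wrap it
    return value  # already a list of children


# Hypothetical results: one item, then several items.
for latest_result in ({"Id": "3", "OrderId": "C"},
                      [{"Id": "1", "OrderId": "A"}, {"Id": "2", "OrderId": "B"}]):
    for item in as_list(latest_result):
        print(item["Id"], item["OrderId"])
```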
Alternatively, fix up the xmltodict code (which you didn't share or otherwise identify) to always return lists for elements, even when there is just a single one.
This is a common xmltodict usage problem: when there is a single child, by default it makes a dict out of it rather than a list with a single item. Relevant GitHub issue:
xml containing 1 child
To workaround it, one option would be to set the dict_constructor argument:
from collections import defaultdict
xmltodict.parse(xml, dict_constructor=lambda *args, **kwargs: defaultdict(list, *args, **kwargs))
My data looks like this:
>>> print nattach[:10]
[PPAttachment(sent=u'1', verb=u'is', noun1=u'chairman', prep=u'of', noun2=u'N.V.', attachment=u'N'), PPAttachment(sent=u'2', verb=u'named', noun1=u'director', prep=u'of', noun2=u'conglomerate', attachment=u'N'), PPAttachment(sent=u'3', verb=u'caused', noun1=u'percentage', prep=u'of', noun2=u'deaths', attachment=u'N')...]
I want a list of the third element of each tuple. How do I do this?
I tried to do a list comprehension (I think), but I got this error:
TypeError: 'PPAttachment' object does not support indexing
I hope you will help a newbie to Python.
Obviously PPAttachment is not a tuple, nor is it apparently a namedtuple. To get the third element, you'll probably want to access the value by name:
[attach.noun1 for attach in nattach]
These are not tuples, but PPAttachment objects. I take it what you want is noun1, so something like
[pp_attachment.noun1 for pp_attachment in nattach[:10]]
will do the job.
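A runnable sketch of attribute access versus indexing. The `PPAttachment` class below is a hypothetical stand-in that only copies the field names from the printed output (it is not the real NLTK class), and `operator.attrgetter` is shown as an equivalent alternative to the comprehension:

```python
from operator import attrgetter


class PPAttachment:
    """Hypothetical stand-in: a plain object with named fields, not a tuple."""

    def __init__(self, sent, verb, noun1, prep, noun2, attachment):
        self.sent = sent
        self.verb = verb
        self.noun1 = noun1
        self.prep = prep
        self.noun2 = noun2
        self.attachment = attachment


nattach = [
    PPAttachment("1", "is", "chairman", "of", "N.V.", "N"),
    PPAttachment("2", "named", "director", "of", "conglomerate", "N"),
]

try:
    nattach[0][2]  # positional indexing isn't defined for this class
except TypeError as exc:
    print(exc)  # 'PPAttachment' object is not subscriptable

print([a.noun1 for a in nattach])              # ['chairman', 'director']
print(list(map(attrgetter("noun1"), nattach)))  # same result
```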
Okay, I concede that I didn't ask the question very well. I will update my question to be more precise.
I am writing a function that takes a list as an argument. I want to check the length of the list so I can loop through the list.
The problem that I have is when the list has only one entry, len(myList) returns the length of that entry (the length of the string) and not the length of the list which should be == 1.
I can fix this if I force the argument to be passed as a single-value list ['val']. But I would prefer my API to allow the user to pass either a value or a list of values.
Example:
def myMethod(self, dataHandle, data, **kwargs):
    comment = kwargs.get('comment', '')
    _dataHandle = list()
    _data = list()
    _dataHandle.append(dataHandle)
    _data.append(data)
    for i in range(_dataHandle):
        # do stuff.
        pass
I would like to be able to call my method either by
myMethod('ed', ed.spectra,comment='down welling irradiance')
or by
myMethod(['ed', 'lu'], [ed.spectra, lu.spectra], comment=['downwelling', 'upwelling radiance'])
Any help would be greatly appreciated. It might not seem like a big deal to pass ['ed'], but it breaks the consistency of my API so far.
The proper python syntax for a list consisting of a single item is [ 'ed' ].
What you're doing with list('ed') is asking python to convert 'ed' to a list. This is a consistent metaphor in python: when you want to convert something to a string, you say str(some_thing). Any hack you'd use to make list('ed') return a list with just the string 'ed' would break python's internal metaphors.
When python sees list(x), it will try to convert x to a list. If x is iterable, it does something more or less equivalent to this:
def make_list(x):
    ret_val = []
    for item in x:
        ret_val.append(item)
    return ret_val
Because your string 'ed' is iterable, python will convert it to a list of length two: [ 'e', 'd' ].
The cleanest idiomatic python in this case might be to have your function accept a variable number of arguments, so instead of this
def my_func(itemList):
    ...
you'd do this
def my_func(*items):
    ...
And instead of calling it like this
my_func(['ed','lu','lsky'])
You'd call it like this:
my_func('ed', 'lu', 'lsky')
In this way you can accept any number of arguments, and your API will be nice and clean.
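A minimal sketch of the variadic style; the body (upper-casing each handle) is a hypothetical placeholder for whatever per-item work the real method does:

```python
def my_func(*items):
    """Accept any number of positional arguments; items is always a tuple."""
    return [item.upper() for item in items]  # placeholder per-item work


print(my_func("ed"))                # ['ED']
print(my_func("ed", "lu", "lsky"))  # ['ED', 'LU', 'LSKY']

# A list you already have can still be passed by unpacking it with *.
handles = ["ed", "lu"]
print(my_func(*handles))            # ['ED', 'LU']
```

Because `items` is always a tuple, a single string is never split into characters: `my_func("ed")` sees exactly one item.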
You can ask if your variable is a list:
def my_method(my_var):
    if isinstance(my_var, list):
        for my_elem in my_var:
            pass  # do stuff with my_elem
    else:  # my_var is not a list
        pass  # do stuff with my_var
EDIT: Another option is to try iterating over it, and if that fails (raises an exception), assume it is a single element:
def my_method(my_var):
    try:
        for my_elem in my_var:
            pass  # do stuff with my_elem
    except TypeError:  # my_var is not iterable
        pass  # do stuff with my_var
The good thing about this second option is that it works not only for lists, like the first one, but with anything that is iterable (strings, sets, dicts, etc.). Note that this includes strings, so a single string such as 'ed' would be processed character by character.
You do actually need to put your string in a list if you want it to be treated like a list.
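A minimal sketch of the try/except approach that returns what the loop would process, which makes the string caveat visible:

```python
def my_method(my_var):
    """Collect the elements the loop would visit, or wrap a non-iterable."""
    try:
        return [elem for elem in my_var]
    except TypeError:  # my_var is not iterable
        return [my_var]


print(my_method(["ed", "lu"]))  # ['ed', 'lu']
print(my_method(3))             # [3]
print(my_method("ed"))          # ['e', 'd']: a single string is split into characters
```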
EDIT
I see that at some point there was a list in front of the string. list, contrary to what you may think, doesn't create a list of one item. It calls __iter__ on the string object and iterates over each item. Thus it makes a list of characters.
Hopefully this makes it clearer:
>>> print(list('abc'))
['a', 'b', 'c']
>>> print(list(('abc',)))
['abc']
list('ed') does not create a list containing a single element, 'ed'. list(x) in general does not create a list containing a single element, x. In fact, if you had been using numbers rather than strings (or anything else non-iterable), this would have been blindingly obvious to you:
>>> list('ed')
['e', 'd']
>>> list(3)
Traceback (most recent call last):
File "<pyshell#0>", line 1, in <module>
list(3)
TypeError: 'int' object is not iterable
So you are in fact passing a list with multiple elements to your method, which is why len is returning greater than 1. It's not returning the length of the first element of the list.
For your method to allow passing either a single item or a list, you'd have to do some checking to see if it's a single item first, and if it is create a list containing it with myVar = [myVar], then run your loop.
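A sketch of such a check that special-cases strings (and bytes) as single items; the helper name `to_list` is hypothetical. Dicts are still iterated over their keys, and classes that are iterable only through `__getitem__` won't be recognized by the `Iterable` check, which are exactly the kinds of ambiguities this answer warns about:

```python
from collections.abc import Iterable


def to_list(value):
    """Wrap single items in a list; strings and bytes count as single items."""
    if isinstance(value, (str, bytes)) or not isinstance(value, Iterable):
        return [value]
    return list(value)


print(to_list("ed"))          # ['ed']
print(to_list(["ed", "lu"]))  # ['ed', 'lu']
print(to_list(3))             # [3]
```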
However, this sort of API is tricky to implement and use, and I would not recommend it. The most natural way to check whether you've been given a collection or an item is to see if myVar is iterable. However, this fails for strings, which are iterable. Strings unfortunately straddle the boundary between a collection and an individual data item; we very often use them as data items containing a "chunk of text", but they are also collections of characters, and Python allows them to be used as such.
Such an API also is likely to cause you to one day accidentally pass a list that you're thinking of as a single thing and expecting the method to treat it as a single thing. But it's a list, so suddenly the code will behave differently.
It also raises questions about what you do with other data types. A dictionary is not a list, but it can be iterated. If you pass a dictionary as myVar, will it be treated as a list containing a single dictionary, or will it iterate over the keys of the dictionary? How about a tuple? What about a custom class implementing __iter__? What if the custom class implementing __iter__ is trying to be "string-like" rather than "list-like"?
All these questions lead to surprises if the caller guesses/remembers wrongly. Surprises when programming lead to bugs. IMHO, it's better to just live with the extra two characters of typing ([ and ]), and have your API be clean and simple.
I run into this same problem frequently. Building a list from an empty list, as you are doing with the "_dataHandle= list()" line, is common in Python because we don't reserve memory in advance. Therefore, the list often transitions from empty, to one element, to multiple elements. As you found, your indexing behaves differently for one element vs. multiple elements. If you iterate over the list directly instead of over indices, the solution can be simple. Instead of:
for i in range(_dataHandle):
use:
for myvar in _dataHandle:
In this case, if there is only one element, the loop only iterates once as you would expect.
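A short sketch contrasting the two loops, with a hypothetical one-element `_dataHandle`; `enumerate()` is shown for the case where the index is still needed:

```python
_dataHandle = ["ed"]

try:
    for i in range(_dataHandle):  # range() needs an int, not a list
        pass
except TypeError as exc:
    print(exc)  # 'list' object cannot be interpreted as an integer

visited = []
for myvar in _dataHandle:  # iterate over the elements themselves
    visited.append(myvar)
print(visited)  # ['ed']

# If an index is also needed, enumerate() yields (index, element) pairs.
for i, myvar in enumerate(["ed", "lu"]):
    print(i, myvar)
```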