Extract elements from a particular list in Python

Here is the block to analyse:
('images\\Principales\\Screenshot_1.png', '{"categories":[{"name":"abstract_","score":0.00390625},{"name":"outdoor_","score":0.01171875},{"name":"outdoor_road","score":0.41796875}],"description":{"tags":["road","building","outdoor","scene","street","city","sitting","empty","light","view","driving","red","sign","intersection","green","large","riding","traffic","white","tall","blue","fire"],"captions":[{"text":"a view of a city street","confidence":0.83864323826716347}]},"requestId":"73fc14d5-653f-4a0a-a45a-e7a425580361","metadata":{"width":150,"height":153,"format":"Png"},"color":{"dominantColorForeground":"Grey","dominantColorBackground":"Grey","dominantColors":["Grey"],"accentColor":"274A68","isBWImg":false}}')
I need to extract all the elements after "description", but I don't know how to do that. More precisely, I need these elements:
"road", "building","outdoor","scene","street","city","sitting","empty","light","view","driving","red","sign","intersection","green","large","riding","traffic","white","tall","blue","fire"
I've been searching for a while, but I don't understand how to do it. I'm a beginner with lists and still have a hard time understanding them.
A for loop returns only 'images\\Principales\\Screenshot_1.png' and then the rest as one big block.
Do you have a solution?
Thanks in advance!
EDIT:
Indeed, it is actually JSON! Thanks to the people who helped me :)
To extract the desired elements contained in the second block, I simply proceeded thus:
import json
# The second element of the tuple is a JSON string (the root object needs its closing brace)
ElementSeparate = '{"categories":[{"name":"abstract_","score":0.00390625},{"name":"outdoor_","score":0.01171875},{"name":"outdoor_road","score":0.41796875}],"description":{"tags":["road","building","outdoor","scene","street","city","sitting","empty","light","view","driving","red","sign","intersection","green","large","riding","traffic","white","tall","blue","fire"],"captions":[{"text":"a view of a city street","confidence":0.83864323826716347}]},"requestId":"73fc14d5-653f-4a0a-a45a-e7a425580361","metadata":{"width":150,"height":153,"format":"Png"},"color":{"dominantColorForeground":"Grey","dominantColorBackground":"Grey","dominantColors":["Grey"],"accentColor":"274A68","isBWImg":false}}'
ElementSeparate = json.loads(ElementSeparate)
# "tags" is a list nested inside the "description" object
for a in ElementSeparate['description']['tags']:
    print(a)

To me it looks like you're trying to parse JSON. You should run the JSON parser on the second element of the tuple. You'll get back a dictionary (or a list, depending on the JSON), from which you can extract the data stored under the "description" key.
https://docs.python.org/3/library/json.html
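As a rough sketch of that advice, assuming the block really is a two-element tuple of (file path, JSON string): the JSON below is a shortened stand-in for the one in the question, and the names record, path, payload and tags are only illustrative.

```python
import json

# Shortened stand-in for the (path, json_string) tuple from the question.
record = (
    'images\\Principales\\Screenshot_1.png',
    '{"description":{"tags":["road","building","outdoor"],'
    '"captions":[{"text":"a view of a city street"}]}}',
)

path, payload = record          # unpack the two elements of the tuple
parsed = json.loads(payload)    # JSON object -> Python dict
tags = parsed["description"]["tags"]
print(path)
print(tags)
```

Looping over the tuple directly only yields the path and the unparsed string, which is probably why the for loop in the question seemed to return two big blocks.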

Related

Meaning of: urlTags = [t[11:] for t in tags if t.startswith('site_url')]

I'm a beginner with Python, and I'm trying to understand a piece of code that I can't really figure out. Can someone explain how the following works?
urlTags = [t[11:] for t in tags if t.startswith('site_url')]
I know that this is slice notation, but I'm struggling to understand what the '[t[' and 't in tags' parts are doing. Any explanation would be helpful!
[t[11:] for t in tags if t.startswith('site_url')]
It means: for each item in the list tags that starts with site_url, take the string from index 11 onward.
So let's say the list tags is
tags = ['site_url://www.facebook.com', 'blabla', 'site_url://www.amazon.com']
The result will be ['www.facebook.com', 'www.amazon.com']
This is called a list comprehension.
It returns a new list built from the elements of tags that meet the condition of the if clause (in this case, the element starts with 'site_url').
Each element added to the new list is sliced from index 11 (its 12th character) to its end. See the Python documentation on slicing for more details.
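To make the explanation concrete, here is the same comprehension next to an explicit loop that should behave identically. The names url_tags and url_tags_loop are only illustrative; the tags list is the one from the example above.

```python
tags = ['site_url://www.facebook.com', 'blabla', 'site_url://www.amazon.com']

# Comprehension form, as in the question.
url_tags = [t[11:] for t in tags if t.startswith('site_url')]

# Equivalent explicit loop: filter first, then slice and append.
url_tags_loop = []
for t in tags:
    if t.startswith('site_url'):
        url_tags_loop.append(t[11:])

print(url_tags)
print(url_tags_loop)
```

The slice starts at 11 because the prefix 'site_url://' is 11 characters long.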

How can I split integers from string line?

How can I split out the confirmed, death, and recovered values? I want to add them to separate lists. I tried the isdigit method to find the values in each line. I also tried split('":'), thinking I could pick up the value after '":'. Neither approach works.
https://api.covid19api.com/total/dayone/country/us
I added every line from this page to textlist.
I just edited the question for other users. My problem is solved, thank you.
The list actually contains a string. You need to parse it and then iterate over it to access the required values from it.
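import json
main_list = ['.....']  # placeholder for the list holding the JSON string
data_points = json.loads(main_list[0])  # the json module has loads(), not parse()
confirmed = []
for single_data_point in data_points:
    # parsed JSON objects are dicts, so use key access rather than attribute access
    confirmed.append(single_data_point["Confirmed"])
print(confirmed)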
A similar approach can be taken for any other values needed.
Edit:
On a better look at your source, it looks like the initial data is not in the right JSON format to begin with. Some issues I noticed:
Each object which has a Country value does not have its closing }. This is a bigger issue and needs to be resolved first.
Every country object from the 2nd one onward has a ' before its opening brace. This should not be the case either.
I suggest you look at how you are initially parsing/creating the list.
Since you gave the valid source of your data it becomes pretty simple:
import urllib.request
import json
data = json.load(urllib.request.urlopen("https://api.covid19api.com/total/dayone/country/turkey"))
confirmed = []
deaths = []
recovered = []
for dataline in data:
    confirmed.append(dataline["Confirmed"])
    deaths.append(dataline["Deaths"])
    recovered.append(dataline["Recovered"])
print("Confirmed:", confirmed)
print("Deaths:", deaths)
print("Recovered:", recovered)
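The same idea can be tried without a network call. The sketch below assumes the response is a JSON array of objects with "Confirmed", "Deaths" and "Recovered" keys, as the code above implies; the numbers in sample are made up purely for illustration.

```python
import json

# Made-up sample in the shape the code above expects: a JSON array of objects.
sample = (
    '[{"Confirmed": 1, "Deaths": 0, "Recovered": 0},'
    ' {"Confirmed": 5, "Deaths": 1, "Recovered": 2}]'
)
data = json.loads(sample)

# One list comprehension per field replaces the explicit append loop.
confirmed = [row["Confirmed"] for row in data]
deaths = [row["Deaths"] for row in data]
recovered = [row["Recovered"] for row in data]

print("Confirmed:", confirmed)
print("Deaths:", deaths)
print("Recovered:", recovered)
```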

How does unicodecsv.DictReader represent a csv file

I'm currently going through the Udacity course on data analysis in python, and we've been using the unicodecsv library.
More specifically we've written the following code which reads a csv file and converts it into a list. Here is the code:
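import unicodecsv
def read_csv(filename):
    # unicodecsv reads bytes, so the file is opened in binary mode
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        # each row becomes one dictionary; the result is a list of them
        return list(reader)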
In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can someone please explain it to me?
For example, one thing I don't understand is why the following throws an error:
enrollments['cancel_date']
while the following works fine:
for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
Hopefully this question makes sense. I'm just having trouble visualizing how all of this is represented.
Any help would be appreciated.
Thanks.
I too landed here because of trouble with the course and found this unanswered. You have probably solved it by now, but I'm answering so that someone else might find it helpful.
As we all know, a dictionary is accessed like
dictionary_name['key']
so you might expect that
enrollments['cancel_date']
would also work. But if you do something like
print enrollments
you will see the structure
[{u'status': u'canceled', u'is_udacity': u'True', ...}, {}, ... {}]
If you look at the brackets, it is a list of dictionaries. You might think it is a list of lists. Try it:
print enrollments[0][0]
You'll get an error: a KeyError, because enrollments[0] is a dictionary and has no key 0.
So it is a collection of dictionaries. How do you access them? Pick any dictionary (that is, any row of the csv) with enrollments[n].
Now you have a dictionary, and you can use its keys freely.
print enrollments[0]['cancel_date']
Now, coming to your loop,
for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
Here enrollment is the loop variable that takes each element of enrollments in turn: enrollments[0], enrollments[1], and so on up to the last row.
So on every iteration enrollment holds one dictionary from enrollments, which is why enrollment['cancel_date'] works where enrollments['cancel_date'] does not.
Lastly, I want to add one more thing, which is why I came to this thread.
What is the meaning of the "u" in u'..'? Ex: u'cancel_date': u'11-02-19'.
It means the value is a Unicode string. The u is not part of the string; it is Python 2 notation for a Unicode string literal. Unicode is a standard that assigns a code to the characters and symbols of the world's writing systems.
This mainly happens because the unicodecsv package does not try to detect and convert the type of each item in the csv file. It reads every value as a Unicode string to preserve all characters. That's why Caroline and you defined and used parse_date() and other functions to convert the Unicode strings to the desired data types. This is all part of the data wrangling process.
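The list-of-dictionaries shape can be reproduced in a few lines. This is only a sketch: it swaps in the standard library's csv.DictReader (Python 3) for unicodecsv.DictReader, on the assumption that both yield one dictionary per row, and the two-row CSV text is made up for illustration.

```python
import csv
import io

# Made-up two-row CSV; the column names echo the ones shown above.
text = "status,cancel_date\ncanceled,2015-01-14\ncurrent,\n"

enrollments = list(csv.DictReader(io.StringIO(text)))

print(type(enrollments).__name__)        # the outer container
print(enrollments[0]['cancel_date'])     # index a row first, then use the key

try:
    enrollments['cancel_date']           # a list cannot be indexed by a string
except TypeError as exc:
    print("TypeError:", exc)
```

Every value comes back as a string, including the empty cancel_date of the second row, which is why a helper such as parse_date() is still needed afterwards.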

How to delete a list parameters but not the contents?

I have a problem where the info I get from an API is returned like:
[{"market_id":"16","coin":"Dogecoin","code":"DOGE","exchange":"BTC","last_price":"0.00000025","yesterday_price":"0.00000025","change":"0.00","24hhigh":"0.00000026","24hlow":"0.00000025","24hvol":"6.732","top_bid":"0.00000025","top_ask":"0.00000026"}]
That makes it really hard to read the price, for example. So I was wondering if there is a way to get rid of the list [ ] and still keep the data? Thanks a bunch in advance!
Sure -- I'm assuming you've already parsed the JSON -- and since you only want to keep the first element (because there apparently is only one element), you can do:
data = data[0]
import json
data = '[{"market_id":"16","coin":"Dogecoin","code":"DOGE","exchange":"BTC","last_price":"0.00000025","yesterday_price":"0.00000025","change":"0.00","24hhigh":"0.00000026","24hlow":"0.00000025","24hvol":"6.732","top_bid":"0.00000025","top_ask":"0.00000026"}]'
# since the list only contains one element (the dict), we'll access it with [0]
dictionary = json.loads(data)[0]
print(dictionary["last_price"])
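If the API ever returns more than one market in the list, taking [0] would silently drop the rest. A possible alternative, sketched here on a trimmed copy of the response above, is to key the entries by their "code" field. The names raw, markets and last_price are illustrative, and using Decimal for the price is an assumption about what the caller wants.

```python
import json
from decimal import Decimal

# Trimmed copy of the response shown in the question.
raw = '[{"code":"DOGE","exchange":"BTC","last_price":"0.00000025"}]'

# Build a lookup table: one dictionary per market, keyed by its code.
markets = {entry["code"]: entry for entry in json.loads(raw)}

# Prices arrive as strings; Decimal keeps the exact digits.
last_price = Decimal(markets["DOGE"]["last_price"])
print(last_price)
```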

Python: Scraping ID's from JSON

This question is a bit of an ask, but it's been giving me a headache all day (as I am fairly new to programming).
Basically I have a huge list of IDs (named pk) and I need to get all of them, even though they are surrounded by other text.
How would I go about retrieving all of the ID's? By the way each ID looks like this:
"pk":12345678
"pk":123456789
The ID is either an 8- or 9-digit number.
Thanks a lot guys, any help would be appreciated!
Editor's note: Asker did post his full json data in a comment to this answer.
ids = [var["pk"]]
where var is the variable holding your parsed JSON
If you clarify your JSON a little more I might be able to make this more precise.
I'd just use JSONPath. A simple, but extremely general way to extract all the ids would be this:
>>> from jsonpath import jsonpath
>>> from json import loads
>>> instagram_pop = open("instagram_popular_list.json", "r").read()
>>> instagram_data = loads(instagram_pop)
>>> jsonpath(instagram_data, '$..id')[:3]
[u'234148392791340801_11305924', u'234098919041318605_2364270', u'234153616185741448_1907035']
Of course, since your data is flat, you can get away with a direct loop, such as:
[item['id'] for item in instagram_data['items']]
but I have a feeling you have more struct parsing to do, so I think jsonpath is a more flexible answer.
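If installing jsonpath is not an option, a small recursive walk with only the standard library can collect every "pk" value regardless of nesting. This is a sketch: collect_pks is a hypothetical helper, and the sample structure is invented, since the asker's full JSON is not shown here; only the "pk" field name comes from the question.

```python
import json

def collect_pks(node):
    """Return every value stored under a "pk" key, at any nesting depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "pk":
                found.append(value)
            # keep descending: nested objects may carry their own "pk"
            found.extend(collect_pks(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_pks(item))
    return found

# Hypothetical structure; only the "pk" fields mirror the question.
sample = json.loads(
    '{"items":[{"pk":12345678,"user":{"pk":123456789}},{"pk":87654321}]}'
)
print(collect_pks(sample))
```

Parsing the JSON first avoids regular expressions on the raw text, which would also match "pk" inside unrelated string values.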
