I have this JSON and I'm trying to extract the first element of the list of lists in Python. How do you extract the first item in ad_list?
Are there any built-in functions I can use to extract the first item from a JSON file? I need something other than simply iterating through this like an array.
P.S. I shortened the JSON data.
Here is the list.
{
u'data':{
u'ad_list':[
{
u'data':{
u'require_feedback_score':0,
u'hidden_by_opening_hours':False,
u'trusted_required':False,
u'currency':u'EGP',
u'require_identification':False,
u'is_local_office':False,
u'first_time_limit_btc':None,
u'city':u'',
u'location_string':u'Egypt',
u'countrycode':u'EG',
u'max_amount':u'20000',
u'lon':0.0,
u'sms_verification_required':False,
u'require_trade_volume':0.0,
u'online_provider':u'SPECIFIC_BANK',
u'max_amount_available':u'20000',
u'msg': u" \u2605\u2605\u2605\u2605\u2605 \u0645\u0631\u062d\u0628\u0627 \u2605\u2605\u2605\u2605\u2605\r\n\r\n\u0625\u0630\u0627 \u0643\u0646\u062a \u062a\u0631\u063a\u0628 \u0641\u064a \u0628\u064a\u0639 \u0627\u0648 \u0634\u0631\u0627\u0621 \u0627\u0644\u0628\u062a\u0643\u0648\u064a\u0646 \u062a\u0648\u0627\u0635\u0644 \u0645\u0639\u064a \u0648\u0633\u0623\u0642\u0648\u0645 \u0628\u062e\u062f\u0645\u062a\u0643\r\n\u0644\u0644\u062a\u0648\u0627\u0635\u0644: https: //tawk.to/hanyibrahim\r\n \u0627\u0644\u062e\u064a\u0627\u0631 \u0644\u0644\u062a\u062d\u0648\u064a\u0644: \u0627\u0644\u0628\u0646\u0643 \u0627\u0644\u062a\u062c\u0627\u0631\u064a \u0627\u0644\u062f\u0648\u0644\u064a \u0627\u0648\u0641\u0648\u062f\u0627\u0641\u0648\u0646 \u0643\u0627\u0634 \u0627\u0648 \u0627\u062a\u0635\u0627\u0644\u0627\u062a \u0641\u0644\u0648\u0633 \u0627\u0648 \u0627\u0648\u0631\u0627\u0646\u062c \u0645\u0648\u0646\u064a\r\n\r\n'' \u0634\u0643\u0631\u0627 ''\r\n\r\n\r\n \u2605\u2605\u2605\u2605\u2605 Hello \u2605\u2605\u2605\u2605\u2605\r\n\r\nIf you would like to trade Bitcoins please let me know and I will help you\r\nconnect: https: //tawk.to/hanyibrahim\r\nOption transfer:Bank CIB Or Vodafone Cash Or Etisalat Flous Or Orange Money\r\n'' Thank ''",
u'volume_coefficient_btc':u'1.50',
u'profile':{
u'username':u'hanyibrahim11',
u'feedback_score':100,
u'trade_count':u'3000+',
u'name':u'hanyibrahim11 (3000+; 100%)',
u'last_online':u'2019-01-14T17:54:52+00:00'},
u'bank_name':u'CIB_Vodafone Cash_Etisalat Flous_Orange Money',
u'trade_type':u'ONLINE_BUY',
u'ad_id':803036,
u'temp_price':u'67079.44',
u'payment_window_minutes':90,
u'min_amount':u'50',
u'limit_to_fiat_amounts':u'',
u'require_trusted_by_advertiser':False,
u'temp_price_usd':u'3738.54',
u'lat':0.0,
u'visible':True,
u'created_at':u'2018-07-25T08:12:21+00:00',
u'atm_model':None,
u'is_low_risk':True
},
u'actions':{
u'public_view': u'https://localbitcoins.com/ad/803036'
}
},
{
u'data':{
u'require_feedback_score':0,
u'hidden_by_opening_hours':False,
u'trusted_required':False,
u'currency':u'EGP',
u'require_identification':False,
u'is_local_office':False,
u'first_time_limit_btc':None,
u'city':u'',
u'location_string':u'Egypt',
u'countrycode':u'EG',
u'max_amount':u'20000',
u'lon':0.0,
u'sms_verification_required':False,
u'require_trade_volume':0.0,
u'online_provider':u'CASH_DEPOSIT',
u'max_amount_available':u'20000',
u'msg':u'QNB, CIB deposite- Vodafone Cash - Etisalat Felous - Orange Money - Western Union - Money Gram \r\n- Please do not entiate a new trade request if you are not serious to finalize it.',
u'volume_coefficient_btc':u'1.50',
u'profile':{
u'username':u'Haboush',
u'feedback_score':99,
u'trade_count':u'500+',
u'name':u'Haboush (500+; 99%)',
u'last_online':u'2019-01-14T16:48:52+00:00'},
u'bank_name':u'QNB\u2714CIB\u2714Vodafone\u2714Orange\u2714Etisalat\u2714WU',
u'trade_type':u'ONLINE_BUY',
u'ad_id':719807,
u'temp_price':u'66860.18',
u'payment_window_minutes':270,
u'min_amount':u'100',
u'limit_to_fiat_amounts':u'',
u'require_trusted_by_advertiser':False,
u'temp_price_usd':u'3726.32',
u'lat':0.0,
u'visible':True,
u'created_at':u'2018-03-24T19:29:08+00:00',
u'atm_model':None,
u'is_low_risk':True
},
u'actions':{
u'public_view': u'https://localbitcoins.com/ad/719807'
}
}
],
u'ad_count':17
}
}
Assuming your data structure is stored in the variable j, you can use j['data']['ad_list'][0] to extract the first item from the ad_list key. Wrap the lookup in a try/except block to catch the IndexError that is raised if ad_list can ever be empty.
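For example, a minimal sketch, assuming the parsed response from above is already stored in j:
try:
    first_ad = j['data']['ad_list'][0]
except IndexError:
    first_ad = None  # ad_list was empty
if first_ad is not None:
    print(first_ad['data']['ad_id'])                # 803036
    print(first_ad['data']['profile']['username'])  # hanyibrahim11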
Good day. I'm using the JIRA API to get data about stories and put it in a DataFrame/Excel file. One particular field, issue.fields.aggregatetimeoriginalestimate, can be either None or a float value in seconds. Is there a way to dynamically check for this and populate the appropriate value in the pandas DataFrame while the population is going on in a for loop?
Here's what I'm trying to achieve:
jira_issues = jira.search_issues(jql, maxResults=0)
# JSON to pandas DataFrame
issues = pd.DataFrame()
for issue in jira_issues:
    d = {
        'Self': issue.self,
        'Project': str(issue.fields.project),
        'JIRA ID': issue.key,
        'Summary': str(issue.fields.summary),
        'Original Story Points': str(issue.fields.customfield_15972),
        'Story Points': str(issue.fields.customfield_10010),
        'Aggregate Orig Estimate (Hrs)': {
            if type(issue.fields.aggregatetimeoriginalestimate) != None):
                issue.fields.aggregatetimeoriginalestimate/(60.0*60.0)
            else:
                str(issue.fields.aggregatetimeoriginalestimate)
        },
        'Original Estimate': str(issue.fields.timeoriginalestimate),
        'Remaining Estimate': str(issue.fields.timeestimate),
        'Priority': str(issue.fields.priority.name),
        # 'Severity': str(issue.fields.customfield_10120),
        'Resolution': str(issue.fields.resolution),
        'Status': str(issue.fields.status.name),
        'Assignee': str(issue.fields.assignee),
        'Creator': str(issue.fields.creator),
        'Reporter': str(issue.fields.reporter),
        'Created': str(issue.fields.created),
        # 'Found by': str(issue.fields.customfield_11272),
        # 'Root cause': str(issue.fields.customfield_10031),
        # 'Earliest place to find': str(issue.fields.customfield_11380),
        # 'Test Escape Classification': str(issue.fields.customfield_11387),
        'Labels': str(issue.fields.labels),
        'Components': str(issue.fields.components),
        # 'Description': str(issue.fields.description),
        # 'FixVersions': str(issue.fields.fixVersions),
        'Issuetype': str(issue.fields.issuetype.name),
        # 'Resolution_date': str(issue.fields.resolutiondate),
        'Updated': str(issue.fields.updated),
        # 'Versions': str(issue.fields.versions),
        # 'Status_name': str(issue.fields.status.name),
        # 'Watchcount': str(issue.fields.watches.watchCount),
    }
    issues = issues.append(d, ignore_index=True)
Please let me know how this can be achieved inside the for loop, such that:
if the value of the field is not None, I want to do a calculation (value/(60.0*60.0)) and populate the field "Aggregate Orig Estimate (Hrs)"; if it is None, then just put the value None as-is in the DataFrame (I guess we could also put 0.0 if None is found).
I'm a novice in Python, so I will appreciate any assistance.
When I tried to run this, I got:
d = {
^
SyntaxError: '{' was never closed
The curly brackets { and } are used in Python to define dictionaries, and a dictionary literal may only contain expressions, not statements such as if/else. So this part of your code is not valid:
'Aggregate Orig Estimate (Hrs)': {
    if type(issue.fields.aggregatetimeoriginalestimate) != None):
        issue.fields.aggregatetimeoriginalestimate/(60.0*60.0)
    else:
        str(issue.fields.aggregatetimeoriginalestimate)
},
You can write it on one line with a conditional expression, though:
'Aggregate Orig Estimate (Hrs)': issue.fields.aggregatetimeoriginalestimate/(60.0*60.0) if issue.fields.aggregatetimeoriginalestimate else 0.0
From Python 3.8 you can shorten it further with an assignment expression := (the "walrus" operator):
'Aggregate Orig Estimate (Hrs)': agg/(60.0*60.0) if (agg:=issue.fields.aggregatetimeoriginalestimate) else 0.0
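For instance, a minimal sketch of the loop body with an explicit None check (est is a hypothetical helper name to avoid repeating the long attribute):
est = issue.fields.aggregatetimeoriginalestimate  # None, or a duration in seconds
d = {
    'Summary': str(issue.fields.summary),
    'Aggregate Orig Estimate (Hrs)': est / (60.0 * 60.0) if est is not None else 0.0,
}
issues = issues.append(d, ignore_index=True)  # mirrors your existing append-based loop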
Can someone help me please? I am doing a fetch request and trying to get data using Python, but I am getting an error.
import requests
import json
response_API = requests.get('https://newsapi.org/v2/top-headlines?q=sports&country=ru&pageSize=10&apiKey=befce9fd53c04bb695e30568399296c0')
print(response_API.status_code)
data=response_API.text
parse_json=json.loads(response_API)
active_case=parse_json['name']
print('Total results',active_case)
I'm trying to get the name field (inside source) from the following response:
{"status":"ok","totalResults":2,"articles":[{"source":{"id":null,"**name**":"Sports.ru"},"author":"Валерий Левкин","title":"Леброн Джеймс получил «Золотую малину» за худшую актерскую работу - Sports.ru","description":"В США названы обладатели антинаграды «Золотая малина» по итогам 2021 года.","url":"https://www.sports.ru/basketball/1107870293-lebron-dzhejms-poluchil-zolotuyu-malinu-za-xudshuyu-akterskuyu-rabotu.html%22,%22urlToImage%22:%22https://www.sports.ru/dynamic_images/news/110/787/029/3/share/bd571e.jpg%22,%22publishedAt%22:%222022-03-26T13:03:00Z%22,%22content":null}]}
I get an error, and the value is not returned.
The newsapi URL returns JSON content with a list of articles, where each article has this structure:
{
"source": {
"id": null,
"name": "Sports.ru"
},
"author": "...",
"title": "... - Sports.ru",
"description": "...",
"url": "https://www.sports.ru/basketball/1107870293-lebron-dzhejms-poluchil-zolotuyu-malinu-za-xudshuyu-akterskuyu-rabotu.html",
"urlToImage": "https://www.sports.ru/dynamic_images/news/110/787/029/3/share/bd571e.jpg",
"publishedAt": "2022-03-26T13:03:00Z",
"content": null
}
To extract a particular element such as description from each article, try this:
import requests
response = requests.get('https://newsapi.org/v2/top-headlines?q=sports&country=ru&pageSize=10&apiKey=befce9fd53c04bb695e30568399296c0')
print(response.status_code)
response.encoding = "utf-8"
data = response.json()
# to get the name from source of each article
print([article["source"].get("name") for article in data["articles"]])
# to get the descriptions from each article
# where desc will be a list of descriptions
desc = [article["description"] for article in data["articles"]]
print(desc)
Output:
200
['Sports.ru', 'Sports.ru']
['description1', 'description2']
You need to follow the nesting of objects:
First get the key 'articles'
Then get the first element of the list
Then get the key 'source'
Finally get the key 'name'.
You can do this all in a single line with indexes.
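For instance, a sketch of that chain of lookups, assuming the parsed dict is in parse_json as in your code:
name = parse_json['articles'][0]['source']['name']  # 'Sports.ru'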
A slightly different method, but same result, using your original technique as the basis: you get a JSON string, then convert that string to a Python dict, then search for the bit that you want.
import requests
import json
response_API = requests.get('https://newsapi.org/v2/top-headlines?q=sports&country=ru&pageSize=10&apiKey=befce9fd53c04bb695e30568399296c0')
print(response_API.status_code)
# this is a json string
data=response_API.text
# convert string to json
parse_json=json.loads(data)
print('here is the json....')
print(parse_json)
# get an element from the parsed JSON
active_case=parse_json['articles'][0]
# print the result
print('here is the active case...')
print(active_case)
This is the result, from which you can extract whatever you like:
{'source': {'id': None, 'name': 'Sports.ru'}, 'author': 'Валерий Левкин', 'title': 'Леброн Джеймс получил «Золотую малину» за худшую актерскую работу - Sports.ru', 'description': 'В США названы обладатели антинаграды «Золотая малина» по итогам 2021 года.', 'url': 'https://www.sports.ru/basketball/1107870293-lebron-dzhejms-poluchil-zolotuyu-malinu-za-xudshuyu-akterskuyu-rabotu.html', 'urlToImage': 'https://www.sports.ru/dynamic_images/news/110/787/029/3/share/bd571e.jpg', 'publishedAt': '2022-03-26T13:03:00Z', 'content': None}, {'source': {'id': None, 'name': 'Sports.ru'}, 'author': 'Андрей Карнаухов', 'title': 'Овечкин забил 771-й гол в НХЛ. До Хоу – 30 шайб - Sports.ru', 'description': 'Капитан\xa0«Вашингтона»\xa0Александр Овечкин\xa0забросил\xa0шайбу, а также забил победный буллит в серии в матче с «Баффало» (4:3 Б) и был признан третьей звездой.', 'url': 'https://www.sports.ru/hockey/1107860736-ovechkin-zabil-771-j-gol-v-nxl-do-xou-30-shajb.html', 'urlToImage': 'https://www.sports.ru/dynamic_images/news/110/786/073/6/share/c9cb18.jpg', 'publishedAt': '2022-03-26T01:56:15Z', 'content': None}
Here the result is a simple dict.
I am carrying out an API search on scaleserp and, for each search I do, I want to put the output in a column in my DataFrame.
#Matches the GET request
api_result = requests.get('https://api.scaleserp.com/search', params, verify=False)
print(type(api_result))
#stores the result in JSON
result = api_result.json()
print(type(result))
#Extracts the list of 'organic_results' from the JSON output.
Results_df = result['organic_results']
#FOR loop to look at each result and select which output from the JSON is wanted.
for res in Results_df:
    StartingDataFrame['JSONDump'] = res
api_result is a requests.models.Response.
result is a dict.
res is a dict.
I want res to be put into the column Dump. Is this possible?
Updated Code
#Matches the GET request
api_result = requests.get('https://api.scaleserp.com/search', params, verify=False)
#stores the result in JSON
result = api_result.json()
#Extracts the list of 'organic_results' from the JSON output.
Results_df = result['organic_results']
#FOR loop to look at each result and select which output from the JSON is wanted.
for res in Results_df:
    Extracted_data = {key: res[key] for key in res.keys()
                      & {'title', 'link', 'snippet_matched', 'date', 'snippet'}}
Extracted_data is a dict and contains the info I need.
{'title': '25 Jun 1914 - Advertising - Trove', 'link': 'https://trove.nla.gov.au/newspaper/article/7280119', 'snippet_matched': ['()', 'charge', 'Dan Whit'], 'snippet': 'I Iron roof, riltibcd II (),. Line 0.139.5. wai at r ar ... Propertb-« entired free of charge. Line 2.130.0 ... AT Dan Whit",\'»\', 6il 02 sturt »L, Prlnce\'»~Brti\'»e,. Line 3.12.0.'}
{'snippet': "Mary Bardwell is in charge of ... I() •. Al'companit'd by: Choppf'd Chitkf'n Li\\f>r Palt·. 1h!iiSC'o Gret'n Salad g iii ... of the overtime as Dan Whit-.", 'title': 'October 16,1980 - Bethlehem Public Library',
'link': 'http://www.bethlehempubliclibrary.org/webapps/spotlight/years/1980/1980-10-16.pdf', 'snippet_matched': ['charge', '()', 'Dan Whit'], 'date': '16 Oct 1980'}
{'snippet': 'CONGRATULATIONS TO DAN WHIT-. TLE ON THE ... jailed and beaten dozens of times. In one of ... ern p()rts ceased. The MIF is not only\xa0...', 'title': 'extensions of remarks - US Government Publishing Office', 'link': 'https://www.gpo.gov/fdsys/pkg/GPO-CRECB-1996-pt5/pdf/GPO-CRECB-1996-pt5-7-3.pdf', 'snippet_matched': ['DAN WHIT', 'jailed', '()'], 'date': '26 Apr 1986'}
{'snippet': 'ILLUSTRATION BY DAN WHIT! By Matt Manning ... ()n the one hand, there are doctors on both ... self-serving will go to jail at the beginning of\xa0...', 'title': 'The BG News May 23, 2007 - ScholarWorks#BGSU - Bowling ...', 'link': 'https://scholarworks.bgsu.edu/cgi/viewcontent.cgi?article=8766&context=bg-news', 'snippet_matched': ['DAN WHIT', '()', 'jail'], 'date': '23 May 2007'}
{'snippet': '$19.95 Charge card number SERVICE HOURS: ... Explorer Advisor Dan Whit- ... lhrr %(OnrwflC or ()utuflrueonlinelfmarketing (arnpaigfl%? 0I - .',
'title': '<%BANNER%> TABLE OF CONTENTS HIDE Section A: Main ...', 'link': 'https://ufdc.ufl.edu/UF00028295/00194', 'snippet_matched': ['Charge', 'Dan Whit', '()'], 'date': 'Listings 1 - 800'}
{'title': 'Lledo Promotional,Bull Nose Morris,Dandy,Desperate Dan ...', 'link': 'https://www.ebay.co.uk/itm/Lledo-Promotional-Bull-Nose-Morris-Dandy-Desperate-Dan-White-Van-/233817683840', 'snippet_matched': ['charges'], 'snippet': 'No additional import charges on delivery. This item will be sent through the Global Shipping Programme and includes international tracking. Learn more- opens\xa0...'}
The problem looks to be that the length of your organic_results is not the same as the length of your DataFrame.
StartingDataFrame['Dump'] = (result['organic_results'])
Here you're setting the whole column Dump equal to organic_results, which is smaller or larger than your already-defined DataFrame. I'm not sure what your DataFrame already holds, but if you want to stash the results row by row you could iterate like this:
StartingDataFrame['Dump'] = [None] * len(StartingDataFrame)
for i, row in StartingDataFrame.iterrows():
    StartingDataFrame.at[i, 'Dump'] = result['organic_results']
Depending on what your data looks like, you could maybe just append it to the DataFrame (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat is the modern replacement):
StartingDataFrame = StartingDataFrame.append(result['organic_results'],ignore_index=True)
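As a rough sketch of the row-wise idea under the same assumptions (one row per organic result, keeping only the fields you extracted; dump_df is a hypothetical name):
import pandas as pd

wanted = {'title', 'link', 'snippet_matched', 'date', 'snippet'}
rows = [{key: res.get(key) for key in wanted} for res in result['organic_results']]
dump_df = pd.DataFrame(rows)  # one row per organic result
print(dump_df[['title', 'link']])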
Could you show us a sample of what both data sources look like?
I have a collection of about 1.4 million tweets in a MongoDB collection. I want to find all that are NOT retweets, and am using Python. The structure of a document is as follows:
{
'_id': ObjectId('59388c046b0c1901172555b9'),
'coordinates': None,
'created_at': datetime.datetime(2016, 8, 18, 17, 17, 12),
'geo': None,
'is_quote': False,
'lang': 'en',
'text': b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s',
'tw_id': 766323071976247296,
'user_id': 2231233110,
'user_lang': 'en',
'user_loc': 'main; #Kan1shk3',
'user_name': 'sheezy0',
'user_timezone': 'Chennai'
}
I can write a query that works to find the particular tweet from above:
twitter_mongo_collection.find_one({
'text': b'Adam Cole Praises Kevin Owens + A Preview For Next Week\xe2\x80\x99s'
})
But when I try to find retweets, my code doesn't work. For example, I try to find any tweets that start like this:
'text': b'RT some tweet'
Using this query:
find_one( {'text': {'$regex': "/^RT/" } } )
It doesn't return an error, but it doesn't find anything. I suspect it has something to do with that b (bytes marker) at the beginning, before the text starts. I know I also need to put '$not' in there somewhere but am not sure where.
Thanks!
It looks like your pattern is wrapped in JavaScript-style delimiters. PyMongo takes the bare pattern string, so the slashes in
"/^RT/"
are treated as literal characters to match, and strings like
b'RT some text afterwards'
never match. Try using this regex instead:
find_one( {'text': {'$regex': "^RT" } } )
I had to decode the 'text' field, which was stored as binary. Then I was able to use
import re
twitter_mongo_collection.find( {'text': {'$not': re.compile("^RT.*")} } )
to find all the documents that did not start with "RT".
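Putting it together, a minimal sketch, assuming a local MongoDB with hypothetical database/collection names and the text field already decoded to str:
import re
from pymongo import MongoClient

client = MongoClient()                    # hypothetical local connection
tweets = client['twitter_db']['tweets']   # hypothetical names

# iterate over every tweet whose text does not start with 'RT'
for doc in tweets.find({'text': {'$not': re.compile('^RT')}}):
    print(doc['text'])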
I am using the python-linkedin library to access the LinkedIn api, with the primary purpose of retrieving data from specific company pages. I can successfully identify the company ID and retrieve some information, but the problem is that the response does not contain State or Country information.
The official docs show that the response should contain
- locations:(address:(state))
- locations:(address:(country-code))
...but this is not the case. Even in the official examples of the XML response, no state or country data is shown:
<location>
<address>
<street1>30 S. Wacker Drive</street1>
<city>Chicago</city>
<postal-code>60606</postal-code>
</address>
<contact-info>
</contact-info>
</location>
I have gone through a bunch of test cases, and every time the company page has included a state and country value, but the response does not include this data.
My test case, on LinkedIn, and via python-linkedin:
>>> company = auth.get_companies(company_ids=['834495'], selectors=['id','name','locations'])
>>> company
{u'_total': 1, u'values': [
{
u'_key': u'834495',
u'id': 834495,
u'name': u'RingLead, Inc.',
u'locations': {
u'_total': 2, u'values': [
{
u'contactInfo':{
u'fax': u'',
u'phone1': u'888-240-8088'
},
u'address': {
u'postalCode': u'11743',
u'city': u'Huntington',
u'street1': u'205 East Main Street'
}
},
{
u'contactInfo': {
u'fax': u'',
u'phone1': u''
},
u'address': {
u'postalCode': u'89117',
u'city': u'Las Vegas',
u'street1': u'3080 South Durango, Ste.102'
}
}
]
}
}
]
}
Is this a design choice by LinkedIn, or is it possible to update the API to provide this information in the response?
Enter the following field in the locations selector:
{'locations' : {'address' : 'country-code'}}
In total, a selector field would look like this:
selectors=[{'companies': [{'locations' : {'address' : 'country-code'}}, 'name', 'universal-name', 'website-url']}]
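For example, a hypothetical, untested call combining this with the test case from the question, also requesting the state field the docs mention (the list form of the address sub-selector is an assumption):
company = auth.get_companies(
    company_ids=['834495'],
    selectors=['id', 'name', {'locations': {'address': ['state', 'country-code']}}]
)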
It's amazing how bad LinkedIn's documentation is, as if they're trying to make it hard on the developers... smh