Listing JSON fields in Python

This is my first go at using JSON in Python.
For example, say that I had a JSON file that lists employees' first and last names.
How would I go about listing the first names of all the employees?
I can get it to display the first name for one person:
import json
json_data = open('app.json')
data = json.load(json_data)
print data['employees'][0]['firstname']
So I then tried two ways to list all the first names, but both raise errors:
print data['employees']['firstname']
print data['employees'][0:]['firstname']

You can use a list comprehension to extract all the first names:
print [emp['firstname'] for emp in data['employees']]
or use an explicit loop, printing each name separately:
for emp in data['employees']:
    print emp['firstname']
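To put it together, here is a minimal sketch assuming app.json has the shape implied by the snippet above (the example names are made up):

# app.json is assumed to look something like:
# {"employees": [{"firstname": "Anna", "lastname": "Smith"},
#                {"firstname": "Bob",  "lastname": "Jones"}]}
import json

with open('app.json') as json_data:
    data = json.load(json_data)

# Collect every first name into a list
first_names = [emp['firstname'] for emp in data['employees']]
print(first_names)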

Related

Read the first JSON element in a Python script

I need the first JSON object's name (in this example it's "object") as a string. The JSON looks like this:
{'object':{'a': ['123', '234', '345'], 'b' : '1234'}}
but the object's name changes randomly with the user input, so I need to read the first element of the JSON file the way list[0] works for lists.
Assuming you have a JSON string data:
import json
jsondict = json.loads(data)
first = list(jsondict)[0]
This will give you the first object's name.
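As a quick sketch using the question's data (rewritten with double quotes so it is valid JSON):

import json

data = '{"object": {"a": ["123", "234", "345"], "b": "1234"}}'
jsondict = json.loads(data)
first = list(jsondict)[0]
print(first)  # object

# Note: a plain dict does not guarantee key order on older Pythons,
# so this is only reliable when there is a single top-level object,
# as in this example.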

Declaring a new variable with a string name

I have this code that iterates over a text file, where each line contains a company's name. I then want the loop to create a list named after that specific line.
So let's say I have a file with: Volvo, Audi, BMW
Then the program will do something like the following:
for line in textfile:
    line = []
So now I should have three empty lists: Volvo, Audi and BMW.
I'm sorry if this was confusing; what I'm really trying to ask is:
is it possible to initialize a variable with a string as its name?
Let's say I have a string car = "volvo". Can I make a new variable, for example a list, with the name held by the car object?
What you need is a dictionary, so you can do:
my_companies = {}
for line in textfile:
    my_companies[line] = []
Now you can access your lists by company name through a single dictionary.
For example:
print my_companies["Volvo"]
print my_companies["Audi"]
print my_companies["BMW"]
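If the file actually holds the names on one comma-separated line, as in the Volvo, Audi, BMW example, a small sketch (companies.txt is a made-up file name) would be:

my_companies = {}
with open('companies.txt') as textfile:  # hypothetical file name
    for line in textfile:
        for name in line.split(','):
            my_companies[name.strip()] = []  # strip spaces and newlines

print(my_companies)  # e.g. {'Volvo': [], 'Audi': [], 'BMW': []}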

Using BeautifulSoup to find a tag and evaluate whether it fits some criteria

I am writing a program to extract text from a website and write it into a text file. Each entry in the text file should have 3 values separated by a tab. The first value is hard-coded to XXXX, the 2nd value should initialize to the first item on the website with class "style4", and the third value is the next item on the website with a class "style5". The logic I'm trying to introduce is: look for the first "style4" and write the associated string into the text file, then find the next "style5" and write the associated string into the text file. Then, look for the next p class. If it's "style4", start a new line; if it's another "style5", write it into the text file with the first style5 entry but separated by a comma (alternatively, the program could just skip the next style5).
I'm stuck on that last part: getting the program to look for the next p class and evaluate it against style4 and style5. Since I was having problems with finding and evaluating the p class tag, I chose to pull my code out of the loop and just try to accomplish the first iteration of the task for starters. Here's my code so far:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.kcda.org/KCDA_Awarded_Contracts.htm').read())
next_vendor = soup.find('p', {'class': 'style4'})
print next_vendor
next_commodity = next_vendor.find_next('p', {'class': 'style5'})
print next_commodity
next = next_commodity.find_next('p')
print next
I'd appreciate any help anybody can provide! Thanks in advance!
I am not entirely sure what you are expecting your output to be. I am assuming that you are trying to get the data in the webpage in the format:
Alphabet \t Vendor \t Category
You can do this:
# The basic things
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.kcda.org/KCDA_Awarded_Contracts.htm').read())
Get the td elements of interest:
table = soup.find('table')
data = table.find_all('tr')[-1]
data = data.find_all('td')[1:]
Now, we will create a nested output dictionary with the alphabet letters as the keys and an inner dict as the value. The inner dict has the vendor name as its key and the category information as its value:
output_dict = {}
current_alphabet = ""
current_vendor = ""
for td in data:
    for p in td.find_all('p'):
        print p.text.strip()
        if p.get('class')[0] == 'style6':
            current_alphabet = p.text.strip()
            vendors = {}
            output_dict[current_alphabet] = vendors
            continue
        if p.get('class')[0] == 'style4':
            print "Here"
            current_vendor = p.text.strip()
            category = []
            output_dict[current_alphabet][current_vendor] = category
            continue
        output_dict[current_alphabet][current_vendor].append(p.text.strip())
This gets the output_dict in the format:
{ ...
u'W': { u'WTI - Weatherproofing Technologies': [u'Roofing'],
u'Wenger Corporation': [u'Musical Instruments and Equipment'],
u'Williams Scotsman, Inc': [u'Modular/Portable Buildings'],
u'Witt Company': [u'Interactive Technology']
},
u'X': { u'Xerox': [u"Copiers & MFD's", u'Printers']
}
}
(Skipping the earlier entries for brevity.) Now it is just a matter of walking this dictionary and writing it out to a tab-separated file.
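For example, that last step could look roughly like this (the output file name is made up, and the layout follows the Alphabet \t Vendor \t Category assumption above):

with open('output.txt', 'w') as out:  # hypothetical file name
    for alphabet, vendors in output_dict.items():
        for vendor, categories in vendors.items():
            out.write('%s\t%s\t%s\n' % (alphabet, vendor, ', '.join(categories)))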
Hope this helps.
Agree with @shaktimaan. Using a dictionary or list is a good approach here. My attempt is slightly different.
import requests as rq
from bs4 import BeautifulSoup as bsoup
import csv
url = "http://www.kcda.org/KCDA_Awarded_Contracts.htm"
r = rq.get(url)
soup = bsoup(r.content)
primary_line = soup.find_all("p", {"class":["style4","style5"]})
final_list = {}
for line in primary_line:
    txt = line.get_text().strip().encode("utf-8")
    if txt != "\xc2\xa0":
        if line["class"][0] == "style4":
            key = txt
            final_list[key] = []
        else:
            final_list[key].append(txt)

with open("products.csv", "wb") as ofile:
    f = csv.writer(ofile)
    for item in final_list:
        f.writerow([item, ", ".join(final_list[item])])
For the scrape, we isolate the style4 and style5 tags right away. I did not bother going for the style6 tags or the alphabet headers. We then get the text inside each tag. If the text is not whitespace of sorts (this is all over the tables, probably obfuscation or bad mark-up), we then check whether it's style4 or style5. If it's the former, we assign it as a key to a blank list. If it's the latter, we append it to the blank list of the most recent key. Obviously the key changes every time we hit a new style4, so it's a relatively safe approach.
The last part is easy: we just use ", ".join on the value part of the key-value pair to concatenate the list as one string. We then write it to a CSV file.
Because the dictionary is unordered, the resulting CSV file will not be sorted alphabetically.
Changing it to a tab-delimited file is up to you. That's simple enough. Hope this helps!
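For what it's worth, a tab-delimited version should only need a different delimiter passed to csv.writer, e.g.:

with open("products.tsv", "wb") as ofile:  # "products.tsv" is a made-up name
    f = csv.writer(ofile, delimiter="\t")
    for item in final_list:
        f.writerow([item, ", ".join(final_list[item])])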

How to convert from string to tuple in Python?

When I retrieve a record from the DB, I get the record as below:
('("2014-02-21 07:10:40",ManualNo,184,vsp,AP10123456,aaaaa,Coconut-Na,5,10)',)
and I need to get the data as a tuple like:
("2014-02-21 07:10:40",ManualNo,184,vsp,AP10123456,aaaaa,Coconut-Na,5,10)
without using the split function, and then I want to get the individual values from it, like:
record[0] = 2014-02-21 07:10:40
record[1] = ManualNo
and so on...
You can simply split the string over the comma:
data = ('("2014-02-21 07:10:40",ManualNo,184,vsp,AP10123456,aaaaa,Coconut-Na,5,10)',)
record = data[0].lstrip('(').rstrip(')').split(',')
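As a quick check of what that produces (note that the timestamp keeps its embedded quotes after the split, which you can strip off as well):

print(record[0])  # "2014-02-21 07:10:40"  (still quoted)
print(record[1])  # ManualNo
print(record[2])  # 184

# Strip the leftover quotes if they are unwanted:
record = [field.strip('"') for field in record]
print(record[0])  # 2014-02-21 07:10:40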

In Python, how to break a list of strings into a list of lists of strings?

I am running a server with CherryPy and a Python script. Currently, there is a web page containing data for a list of people, which I need to get. The format of the web page is as follows:
www.url1.com, firstName_1, lastName_1
www.url2.com, firstName_2, lastName_2
www.url3.com, firstName_3, lastName_3
I wish to display the list of names on my own webpage, with each name hyperlinked to their corresponding website.
I have read the webpage into a list with the following method:
@cherrypy.expose
def receiveData(self):
    """ Get a list, one per line, of currently known online addresses,
    separated by commas.
    """
    method = "whoonline"
    fptr = urllib2.urlopen("%s/%s" % (masterServer, method))
    data = fptr.readlines()
    fptr.close()
    return data
But I don't know how to break the list into a list of lists at the commas. The result should give each smaller list three elements: URL, first name, and last name. So I was wondering if anyone could help.
Thank you in advance!
You can iterate over fptr directly; there is no need to call readlines():
data = [line.split(', ') for line in fptr]
You need the split(',') method on each string:
data = [ line.split(',') for line in fptr.readlines() ]
lists = []
for line in data:
    lists.append([x.strip() for x in line.split(',')])
If your data is a big ol' string (potentially with leading or trailing spaces), do it this way:
lines=""" www.url1.com, firstName_1, lastName_1
www.url2.com, firstName_2 , lastName_2
www.url3.com, firstName_3, lastName_3 """
data = []
for line in lines.split('\n'):
    t = [e.strip() for e in line.split(',')]
    data.append(t)
print data
Out:
[['www.url1.com', 'firstName_1', 'lastName_1'], ['www.url2.com', 'firstName_2',
'lastName_2'], ['www.url3.com', 'firstName_3', 'lastName_3']]
Notice the leading and trailing spaces are removed.
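Since the stated goal was to show each name hyperlinked to its website, one rough sketch of that last step (the HTML here is only an illustration) could be:

# data is the list of [url, first_name, last_name] lists from above
links = ['<a href="http://%s">%s %s</a>' % (url, first, last)
         for url, first, last in data]
print('\n'.join(links))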
