I am working on an example but am stuck on a few points. I am a beginner with Python and trying to improve. I want to use the OpenWeatherMap API, get some data from it, and then write that data to a CSV file. My code should take a txt file containing city names as input, fetch the city name, country code, lat/long, temperature, wind speed, and wind direction for each city, and write them to a CSV file. I can read the txt file or take a city name from the command line, but not both. I can print the data to the console, but I cannot convert my JSON object to CSV and write it to a file. Could you please help me?
My input.txt
Los Angeles
San Francisco
...
My code:
import requests
from pprint import pprint
import csv
import pandas as pd
file = input("Input the filepath: ")
with open(file) as f:
    for line in f:
        line = line.strip()
        API_key = "MYAPIKEY"
        base_url = "http://api.openweathermap.org/data/2.5/weather?"
        headers = {'content-type': 'application/json'}
        city_name = line
        Final_url = base_url + "appid=" + API_key + "&q=" + city_name
        weather_data = requests.get(Final_url).json()
        print("\nCurrent Weather" + city_name + ":\n")
        weather_data = requests.get(Final_url, headers=headers)
        f = open('weather_data_file.csv', "w")
        f.write(weather_data.text)
        f.close()
        print(f)
The problem after the edit:
The CSV file contains only the last city's data, and the data is not in a proper form when I open it with Excel.
The data it outputs:
{"coord":{"lon":-122.42,"lat":37.77},"weather":[{"id":802,"main":"Clouds","description":"scattered clouds","icon":"03d"}],"base":"stations","main":{"temp":284.74,"feels_like":280.59,"temp_min":283.15,"temp_max":286.48,"pressure":1024,"humidity":76},"visibility":10000,"wind":{"speed":5.1,"deg":260},"clouds":{"all":40},"dt":1609003065,"sys":{"type":1,"id":5817,"country":"US","sunrise":1608996226,"sunset":1609030632},"timezone":-28800,"id":5391959,"name":"San Francisco","cod":200}
To write your JSON file to a CSV file:
import pandas as pd
if __name__ == "__main__":
    df = pd.read_json(r'ajsonfile.json')
    df.to_csv(r'filename.csv', index=None)
To write your JSON data from an API:
# Get JSON Data
weather_data = requests.get(yourURL, headers=yourHeaders)
# Write to .CSV
f = open('weather_data_file.csv', "w")
f.write(weather_data.text)
f.close()
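Note that writing weather_data.text like this just dumps the raw JSON string into a file with a .csv extension, which is exactly the output shown in the question. If you want pandas to produce real CSV from the nested payload, one option is pandas.json_normalize; a minimal sketch, assuming weather_data holds the dict returned by .json() (json_normalize is top-level in pandas 1.0+):

import pandas as pd

# weather_data is assumed to be the dict from requests.get(Final_url).json()
df = pd.json_normalize(weather_data)  # nested keys become columns like coord.lat, main.temp
df.to_csv('weather_data_file.csv', index=False)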
JavaScript Object Notation (JSON) is a serialized representation of data. pandas.read_json tries to decode the JSON and build a dataframe from it. After you've read the data with requests and deserialized it into Python with the .json() call, it's not JSON anymore and pandas.read_json won't work.
Sometimes you can build a dataframe directly from the Python object, but in this case you've got an additional problem. You are only asking for one row of data at a time (one city), and its information is nested in multiple dictionaries. You can use Python to flatten the received city data into the subset of data you want. And since you are only working row by row anyway, use the csv module to write the rows as you go.
A solution is:
import requests
from pprint import pprint
import csv
openweathermap_city_csv_header = ["City Name", "Country Code", "Lat", "Long", "Temperature",
                                  "Wind Speed", "Wind Direction"]

def openweathermap_city_flatten_record(record):
    coord = record["coord"]
    wind = record["wind"]
    return {"City Name": record["name"],
            "Country Code": record["sys"]["country"],
            "Lat": coord["lat"],
            "Long": coord["lon"],
            "Temperature": record["main"]["temp"],
            "Wind Speed": wind["speed"],
            "Wind Direction": wind["deg"]}

file = input("Input the filepath: ")
with open(file) as cities, open('weather_data_file.csv', "w", newline='') as outfile:
    writer = csv.DictWriter(outfile, openweathermap_city_csv_header)
    writer.writeheader()
    for line in cities:
        line = line.strip()
        API_key = "MYAPIKEY"
        base_url = "http://api.openweathermap.org/data/2.5/weather?"
        headers = {'content-type': 'application/json'}
        city_name = line
        Final_url = base_url + "appid=" + API_key + "&q=" + city_name
        weather_data = requests.get(Final_url, headers=headers).json()
        print("\nCurrent Weather " + city_name + ":\n")
        writer.writerow(openweathermap_city_flatten_record(weather_data))
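With the header written by writeheader(), weather_data_file.csv should come out like this (row values taken from the sample response above):

City Name,Country Code,Lat,Long,Temperature,Wind Speed,Wind Direction
San Francisco,US,37.77,-122.42,284.74,5.1,260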
Related
How do I make the working code below iterate over data["#odata.nextLink"] and append data["value"] to the sample.json file?
import requests
import json
import datetime

def get_data():
    bearerAccessToken = '*************'
    now = datetime.datetime.now() - datetime.timedelta(days=10)
    dt_string = now.strftime("%Y-%m-%dT%H:%M:%S-04:00")
    print(dt_string)
    resourceUrl = "https://retsapi.raprets.com/CIN/RESO/OData/Property?Class=Residential&$count=true"
    query_params = {"$filter": "ModificationTimestamp ge " + dt_string}
    print(query_params)
    r = requests.get(resourceUrl, params=query_params, headers={'Authorization': 'Bearer ' + bearerAccessToken})
    data = r.json()
    with open("sample.json", "w") as outfile:
        json.dump(data["value"], outfile)
    print(data["#odata.nextLink"])

get_data()
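One way to do this is to loop until the next-link key disappears, collecting every page's value list and dumping it once at the end. A minimal sketch under the question's assumptions (the key is spelled "#odata.nextLink" as in the code above, and the link it returns already embeds the query string):

import requests
import json

def get_all_data(resourceUrl, query_params, bearerAccessToken):
    headers = {'Authorization': 'Bearer ' + bearerAccessToken}
    all_values = []
    url, params = resourceUrl, query_params
    while url:
        r = requests.get(url, params=params, headers=headers)
        data = r.json()
        all_values.extend(data["value"])    # accumulate this page's records
        url = data.get("#odata.nextLink")   # None once there are no more pages
        params = None                       # the next-link URL carries its own query string
    with open("sample.json", "w") as outfile:
        json.dump(all_values, outfile)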
I'm collecting tweets from Twitter's API. My code is returning a string, which I have transformed into a dictionary. I am looking to create a CSV where I store this data in columns. I have attached an image of what my CSV currently looks like.
current CSV image: (screenshot not reproduced here)
What do you suggest for creating something like the following?
desired outcome:
with open('dict.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in y.items():
        writer.writerow([key, value])

#with open('my_file.csv', 'w') as f:
#    [f.write('{0},{1}\n'.format(key, value)) for key, value in y.items()]
Full code:
import requests
import os
import json
import pandas as pd
import csv
import sys
import time

bearer_token = "insert here"
search_url = "https://api.twitter.com/2/tweets/search/all"
query_params = {'query': '(Johnson&Johnson) -is:retweet -is:verified -baby -lotion -shampoo','tweet.fields': 'text', 'tweet.fields':'created_at', 'start_time':'2021-01-20T00:00:01.000Z', 'end_time':'2021-02-17T23:30:00.000Z'}
#query_params={'query':'(vaccine OR vaccinated) -is:retweet -is:verified -RT -baby -lotion -shampoo&start_time=2021-01-20T00:00:01.000Z&end_time=2021-02-20T23:30:00.000Z&max_results=10&tweet.fields=author_id,conversation_id,created_at,geo,id,lang,source,text&expansions=author_id&place.fields=full_name&user.fields=created_at,description,entities,id,location,name,url,username'}

def create_headers(bearer_token):
    headers = {"Authorization": "Bearer {}".format(bearer_token)}
    return headers

def connect_to_endpoint(url, headers, params):
    response = requests.request("GET", search_url, headers=headers, params=params)
    print('first:', response.status_code)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

def main():
    headers = create_headers(bearer_token)
    json_response = connect_to_endpoint(search_url, headers, query_params)
    x = json.dumps(json_response, sort_keys=True)
    y = json.loads(x)

if __name__ == "__main__":
    main()
Try using DictWriter:
import csv

with open(csv_file, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
    writer.writeheader()
    for data in dict_data:
        writer.writerow(data)
For more info, refer to the link below:
How to save a Python Dictionary to a CSV File?
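Applied to the Twitter response above, that might look like the sketch below. The column names are assumptions based on the tweet.fields requested in the question; adjust csv_columns to whatever your query actually returns:

import csv

csv_columns = ['id', 'created_at', 'text']  # hypothetical columns; match your tweet.fields

# json_response is the dict returned by connect_to_endpoint() in the question
with open('tweets.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=csv_columns, extrasaction='ignore')
    writer.writeheader()
    for tweet in json_response.get('data', []):  # v2 search returns tweets under 'data'
        writer.writerow(tweet)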
I have a basic bs4 web scraper. There are no issues getting my scraped data, but when I try to write it to a .csv file I run into problems: I am unable to write my data to more than one column. In the tutorial I loosely follow, the author can separate rows with "," easily, but when I open my CSV with Excel there is no separation in either the header or the data. What am I missing?
import requests
from bs4 import BeautifulSoup

url = "myurl"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
items = soup.find_all('a', class_='listing-card')

filename = 'data.csv'
f = open(filename, "w")
header = "name, price\n"
f.write(header)
for item in items:
    title = item.find('span', class_='title').text
    price = item.find('span', class_='price').text
    f.write(title.replace(",", "|") + ',' + price + "\n")
f.close()
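As a side note, the csv module handles delimiters and quoting for you, so commas inside a title no longer need to be replaced. A minimal sketch of the same loop using csv.writer:

import csv
import requests
from bs4 import BeautifulSoup

url = "myurl"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
items = soup.find_all('a', class_='listing-card')

with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['name', 'price'])  # header row
    for item in items:
        title = item.find('span', class_='title').text
        price = item.find('span', class_='price').text
        writer.writerow([title, price])  # commas inside fields are quoted automatically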
Another method.
from simplified_scrapy import SimplifiedDoc, utils, req

url = "myurl"
html = req.get(url)
rows = []
rows.append(['name', 'price'])  # Add header
doc = SimplifiedDoc(html)
items = doc.getElements('a', attr='class', value='listing-card')  # Get all 'a' nodes with the given class
for item in items:
    title = item.getElement('span', value='title').text
    price = item.getElement('span', value='price').text
    rows.append([title, price])
utils.save2csv('data.csv', rows)  # Save to CSV file
Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
I have found that the easiest way to get your data into a CSV file is to put the data into a pandas DataFrame then use the to_csv method to write the file.
Using your example the code would be as follows:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "myurl"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
items = soup.find_all('a', class_='listing-card')

# Create an empty list to store entries
mylist = []
for item in items:
    title = item.find('span', class_='title').text
    price = item.find('span', class_='price').text
    # Create the dictionary item to be appended to the list
    entry = {'name': title, 'price': price}
    mylist.append(entry)

myDataframe = pd.DataFrame(mylist)
myDataframe.to_csv('CSV_file.csv')
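If you don't want pandas to write the row index as an unnamed first column, pass index=False to to_csv.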
I'm attempting to scrape multiple websites for specific products, and I'm sure there is a way to optimize my code. As of right now the code does its job, but this is really not the Pythonic way to go about it (I am a Python novice, so please excuse my lack of knowledge).
The goal of this program is to get the prices of the products from the URLs provided and write them to a .csv file. Each website has a different structure, but I am always using the same 3 websites. This is an example of my current code:
import requests
import csv
import io
import os
from datetime import datetime
from bs4 import BeautifulSoup

timeanddate = datetime.now().strftime("%Y%m%d-%H%M%S")
folder_path = 'my_folder_path'
file_name = 'product_prices_' + timeanddate + '.csv'
full_name = os.path.join(folder_path, file_name)

with io.open(full_name, 'w', newline='', encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ProductTitle", "Website1", "Website2", "Website3"])

    #---Product 1---
    #Website1 price
    website1product1 = requests.get('website1product1URL')
    website1product1Data = BeautifulSoup(website1product1.text, 'html.parser')
    website1product1Price = website1product1Data.find('div', attrs={'class': 'price-final'}).text.strip()
    print(website1product1Price)

    #Website2 price
    website2product1 = requests.get('website2product1URL')
    website2product1Data = BeautifulSoup(website2product1.text, 'html.parser')
    website2product1Price = website2product1Data.find('div', attrs={'class': 'price_card'}).text.strip()
    print(website2product1Price)

    #Website3 price
    website3product1 = requests.get('website3product1URL')
    website3product1Data = BeautifulSoup(website3product1.text, 'html.parser')
    website3product1Price = website3product1Data.find('strong', attrs={'itemprop': 'price'}).text.strip()
    print(website3product1Price)

    writer.writerow(["ProductTitle", website1product1Price, website2product1Price, website3product1Price])

file.close()
It saves the ProductTitles and Prices to a .csv in this format and I'd like to keep this format:
#Header
ProductTitle Website1 Website2 Website3
#Scraped data
Product1 $23 $24 $52
This is manageable for a few products, but I'd like to have hundreds, and copying the same lines of code while changing variable names is confusing, tedious, and bound to be riddled with human error.
Can I create a function that takes 3 URLs as arguments, outputs website1product1Price, website2product1Price, and website3product1Price, and call that function once per product? Can it then be wrapped in a loop to go through a list of URLs and still keep the original formatting?
Any help is appreciated.
Could this be a solution for you?
Assuming you have an array of dicts for your products:
products = [
    {
        'name': 'product1',
        'url1': 'https://url1',
        'url2': 'https://url2',
        'url3': 'https://url3'
    }
]
Your code could be something like this:
import requests
import csv
import io
import os
from datetime import datetime
from bs4 import BeautifulSoup

def get_product_prices(product):
    #Website1 price
    website1product1 = requests.get(product['url1'])
    website1product1Data = BeautifulSoup(website1product1.text, 'html.parser')
    website1product1Price = website1product1Data.find('div', attrs={'class': 'price-final'}).text.strip()
    #Website2 price
    website2product1 = requests.get(product['url2'])
    website2product1Data = BeautifulSoup(website2product1.text, 'html.parser')
    website2product1Price = website2product1Data.find('div', attrs={'class': 'price_card'}).text.strip()
    #Website3 price
    website3product1 = requests.get(product['url3'])
    website3product1Data = BeautifulSoup(website3product1.text, 'html.parser')
    website3product1Price = website3product1Data.find('strong', attrs={'itemprop': 'price'}).text.strip()
    return website1product1Price, website2product1Price, website3product1Price

timeanddate = datetime.now().strftime("%Y%m%d-%H%M%S")
folder_path = 'my_folder_path'
file_name = 'product_prices_' + timeanddate + '.csv'
full_name = os.path.join(folder_path, file_name)

with io.open(full_name, 'w', newline='', encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ProductTitle", "Website1", "Website2", "Website3"])
    for product in products:
        price1, price2, price3 = get_product_prices(product)
        writer.writerow([product['name'], price1, price2, price3])
You can create a function and pass everything as parameters: url, tag_name, attribute_name, and attribute_value. See if this helps.
import requests
from bs4 import BeautifulSoup

def price_text(url_text, ele_tag, ele_attr, attrval):
    response = requests.get(url_text)
    page_data = BeautifulSoup(response.text, 'html.parser')
    # pass the tag name and the attribute dict directly, without extra quoting
    price = page_data.find(ele_tag, attrs={ele_attr: attrval}).text.strip()
    return price

website1product1Price = price_text("url", "div", "class", "price-final")
website2product1Price = price_text("url", "div", "class", "price_card")
website3product1Price = price_text("url", "strong", "itemprop", "price")
I scraped a site for data and was able to print the desired output in JSON format containing only values, but what I actually need is to get the data with both keys and values and save it in output.json format, so I can insert it into my Django database. Here is what I have done so far:
import requests
import json
URL = 'http://tfda.go.tz/portal/en/trader_module/trader_module/getRegisteredDrugs_products'
payload = "draw=1&columns%5B0%5D%5Bdata%5D=no&columns%5B0%5D%5Bname%5D=&columns%5B0%5D%5Bsearchable%5D=True&columns%5B0%5D%5Borderable%5D=True&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B1%5D%5Bdata%5D=certificate_no&columns%5B1%5D%5Bname%5D=&columns%5B1%5D%5Bsearchable%5D=True&columns%5B1%5D%5Borderable%5D=True&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B2%5D%5Bdata%5D=brand_name&columns%5B2%5D%5Bname%5D=&columns%5B2%5D%5Bsearchable%5D=True&columns%5B2%5D%5Borderable%5D=True&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B3%5D%5Bdata%5D=classification_name&columns%5B3%5D%5Bname%5D=&columns%5B3%5D%5Bsearchable%5D=True&columns%5B3%5D%5Borderable%5D=True&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B4%5D%5Bdata%5D=common_name&columns%5B4%5D%5Bname%5D=&columns%5B4%5D%5Bsearchable%5D=True&columns%5B4%5D%5Borderable%5D=True&columns%5B4%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B4%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B5%5D%5Bdata%5D=dosage_form&columns%5B5%5D%5Bname%5D=&columns%5B5%5D%5Bsearchable%5D=True&columns%5B5%5D%5Borderable%5D=True&columns%5B5%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B5%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B6%5D%5Bdata%5D=product_strength&columns%5B6%5D%5Bname%5D=&columns%5B6%5D%5Bsearchable%5D=True&columns%5B6%5D%5Borderable%5D=True&columns%5B6%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B6%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B7%5D%5Bdata%5D=registrant&columns%5B7%5D%5Bname%5D=&columns%5B7%5D%5Bsearchable%5D=True&columns%5B7%5D%5Borderable%5D=True&columns%5B7%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B7%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B8%5D%5Bdata%5D=registrant_country&columns%5B8%5D%5Bname%5D=&columns%5B8%5D%5Bsearchable%5D=True&columns%5B8%5D%5Borderable%5D=True&columns%5B8%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B8%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B9%5D%5Bdata%5D=manufacturer&columns%5B9%5D%5Bname%5D=&columns%5B9%5D%5Bsearchable%5D=True&columns%5B9%5D%5Borderable%5D=True&columns%5B9%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B9%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B10%5D%5Bdata%5D=manufacturer_country&columns%5B10%5D%5Bname%5D=&columns%5B10%5D%5Bsearchable%5D=True&columns%5B10%5D%5Borderable%5D=True&columns%5B10%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B10%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B11%5D%5Bdata%5D=expiry_date&columns%5B11%5D%5Bname%5D=&columns%5B11%5D%5Bsearchable%5D=True&columns%5B11%5D%5Borderable%5D=True&columns%5B11%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B11%5D%5Bsearch%5D%5Bregex%5D=False&columns%5B12%5D%5Bdata%5D=id&columns%5B12%5D%5Bname%5D=&columns%5B12%5D%5Bsearchable%5D=True&columns%5B12%5D%5Borderable%5D=True&columns%5B12%5D%5Bsearch%5D%5Bvalue%5D=&columns%5B12%5D%5Bsearch%5D%5Bregex%5D=False&order%5B0%5D%5Bcolumn%5D=0&order%5B0%5D%5Bdir%5D=asc&start=0&length=3911&search%5Bvalue%5D=&search%5Bregex%5D=False"

with requests.Session() as s:
    s.headers = {"User-Agent": "Mozilla/5.0"}
    s.headers.update({'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
    res = s.post(URL, data=payload)
    for data in res.json()['data']:
        serial = data['no']
        certno = data['certificate_no']
        brndname = data['brand_name']
        clssification = data['classification_name']
        common_name = data['common_name']
        dosage_form = data['dosage_form']
        expiry_date = data['expiry_date']
        manufacturer = data['manufacturer']
        manufacturer_country = data['manufacturer_country']
        product_strength = data['product_strength']
        registrant = data['registrant']
        registrant_country = data['registrant_country']
        output = (serial, certno, brndname, clssification, common_name, dosage_form, expiry_date, manufacturer, manufacturer_country, product_strength, registrant, registrant_country)
        my_list = output
        json_str = json.dumps(my_list)
        print(json_str)
And here is my attached output screenshot
So how do I approach this?
Use json.dump
with open(path, 'w') as file:
    [...]
    json.dump(myPythonList, file)
    file.write('\n')
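To keep the keys in the output, you could build a dict per record instead of a tuple, collect the dicts in a list, and dump that list once at the end. A minimal sketch reusing the field names from the question (res is the response object from the session above; the output keys are my own naming, adjust them to match your Django model):

import json

records = []
for data in res.json()['data']:
    # a dict preserves the key/value pairs that a tuple throws away
    records.append({
        'no': data['no'],
        'certificate_no': data['certificate_no'],
        'brand_name': data['brand_name'],
        'classification_name': data['classification_name'],
        'common_name': data['common_name'],
        'dosage_form': data['dosage_form'],
        'expiry_date': data['expiry_date'],
        'manufacturer': data['manufacturer'],
        'manufacturer_country': data['manufacturer_country'],
        'product_strength': data['product_strength'],
        'registrant': data['registrant'],
        'registrant_country': data['registrant_country'],
    })

with open('output.json', 'w') as outfile:
    json.dump(records, outfile, indent=2)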