Python requests not retrieving data from an API when the URL has Japanese characters

I am using Python 3.7 and running my code in a Unix environment.
In my code I have to hit an API and retrieve the data in JSON format. However, when the source data has Japanese or other non-ASCII characters, the request does not get the data back. The same API call made through Postman returns the data.
Do I need to make any encoding changes if I have non-ASCII characters in the API request?
bash-4.2$ more sourcefile.csv
"ひとみ","Abràmoff","70141558"
import requests
import csv

with open('sourcefile.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for lines in csv_reader:
        FRST_NM = lines[0]
        LST_NM = lines[1]
        ID = lines[2]
        URL2 = '<base url>?filter=(equals(FirstName,"' + FRST_NM + '"))and(equals(LastName,"' + LST_NM + '"))and(equals(ID,"' + ID + '"))'
        full_data = requests.get(URL2)
        print(full_data.json())
This returns [], but it should return the data.

You can use BeautifulSoup (https://www.dataquest.io/blog/web-scraping-beautifulsoup/) and extract the information from the response with it.
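Separately, on the encoding question itself: non-ASCII characters are not valid in a raw URL and must be percent-encoded. requests does this automatically if the filter is passed via the params argument instead of being concatenated into the URL. A minimal sketch, with a placeholder base URL and the filter syntax assumed from the question (whether the server accepts the encoded filter is up to the API):

import requests

# sample row from sourcefile.csv
FRST_NM, LST_NM, ID = 'ひとみ', 'Abràmoff', '70141558'

flt = '(equals(FirstName,"{}"))and(equals(LastName,"{}"))and(equals(ID,"{}"))'.format(
    FRST_NM, LST_NM, ID)

# requests percent-encodes the params values (including the Japanese
# characters) as UTF-8, so the URL it actually sends is pure ASCII
full_data = requests.get('https://example.com/endpoint', params={'filter': flt})
print(full_data.json())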

Related

How to insert data into a CSV file?

I have a problem with my Python script.
Basically what my script does is first make a GET request to our API, extract all IDs from the endpoint, and save them in a CSV file.
At the moment I'm having problems inserting data into the CSV file. What I want is for my CSV file to look like this after inserting the data:
id
1
2
3
...
Basically I want every id in its own row.
But what ends up being inserted is this:
id
1,2,3,...
I have tried for-looping and a few other things and nothing seemed to work. I would love it if anyone could help me with this problem. It's probably something really simple that I missed.
My script code:
import requests
import json
import csv
from jsonpath_ng import jsonpath, parse

url = 'url'
headers = {
    "Authorization": "Bearer token"
}

response = requests.get(url, headers=headers)  # was url_v1, which is undefined
JsonResponse = response.json()
converted = json.dumps(JsonResponse)
Data = json.loads(converted)
ParseData = parse('$..id')
Id = ParseData.find(Data)

open_file = open('C:/File/test.csv', 'w', newline='')
writer = csv.writer(open_file)
list_id = []
fields = ['id']
for i in range(0, len(Id)):
    result = Id[i].value
    list_id.append(result)
writer.writerow(fields)
writer.writerow(list_id)
open_file.close()
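For reference, the one-id-per-row layout comes from writing each id as its own single-element row; writer.writerow(list_id) writes the whole list as a single row. A minimal sketch of that change (the file path and header are taken from the script above):

import csv

list_id = [1, 2, 3]  # the ids collected by the script above

open_file = open('C:/File/test.csv', 'w', newline='')
writer = csv.writer(open_file)
writer.writerow(['id'])                   # header row
writer.writerows([[i] for i in list_id])  # one single-element row per id
open_file.close()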

How to download CSV data from a website using Python

I'm trying to automatically download data from the following website; however, I just get the HTML and no data:
http://tcplus.com/GTN/OperationalCapacity#filter.GasDay=02/02/19&filter.CycleType=1&page=1&sort=LocationName&sort_direction=ascending
import csv
import urllib2  # Python 2

downloaded_data = urllib2.urlopen('http://tcplus.com/GTN/OperationalCapacity#filter.GasDay=02/02/19&filter.CycleType=1&page=1&sort=LocationName&sort_direction=ascending')
csv_data = csv.reader(downloaded_data)
for row in csv_data:
    print row
The code below will only fetch the data for the provided URL, but if you tweak the parameters you can get other reports as well.
import requests

parameters = {
    'serviceTypeName': 'Ganesha.InfoPost.Service.OperationalCapacity.OperationalCapacityService, Ganesha.InfoPost.Service',
    'filterTypeName': 'Ganesha.InfoPost.ViewModels.GasDayAndCycleTypeFilterViewModel, Ganesha.InfoPost',
    'templateType': 6,
    'exportType': 1,
    'filter.GasDay': '02/02/19',
    'filter.CycleType': 1
}
response = requests.post('http://tcplus.com/GTN/Export/Generate', data=parameters)

with open('result.csv', 'w') as f:
    f.write(response.text)
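For example, to pull the same report for a different gas day, only filter.GasDay needs to change; a small sketch with a hypothetical fetch_report helper (the endpoint and parameter names come from the answer above, and the MM/DD/YY date format is assumed from its example):

import requests

def fetch_report(gas_day, cycle_type=1):
    # hypothetical helper wrapping the POST from the answer above
    parameters = {
        'serviceTypeName': 'Ganesha.InfoPost.Service.OperationalCapacity.OperationalCapacityService, Ganesha.InfoPost.Service',
        'filterTypeName': 'Ganesha.InfoPost.ViewModels.GasDayAndCycleTypeFilterViewModel, Ganesha.InfoPost',
        'templateType': 6,
        'exportType': 1,
        'filter.GasDay': gas_day,
        'filter.CycleType': cycle_type,
    }
    return requests.post('http://tcplus.com/GTN/Export/Generate', data=parameters)

with open('result_20190203.csv', 'w') as f:
    f.write(fetch_report('02/03/19').text)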

Set up a crawler and downloaded tweets. Unable to parse JSON file

I have been trying to parse a JSON file and it keeps giving me "additional data" errors. Since I am new to Python, I have no idea how to resolve this. It seems there are multiple objects within the file. How do I parse it without getting any errors?
Edit: (Not my code but I am trying to work on it)
import json
import csv
import io

'''
creates a .csv file using a Twitter .json file
the fields have to be set manually
'''

data_json = io.open('filename', mode='r', encoding='utf-8').read()  # reads in the JSON file
data_python = json.loads(data_json)

csv_out = io.open('filename', mode='w', encoding='utf-8')  # opens csv file
fields = u'created_at,text,screen_name,followers,friends,rt,fav'  # field names
csv_out.write(fields)
csv_out.write(u'\n')

for line in data_python:
    # writes a row and gets the fields from the json object
    # screen_name and followers/friends are found on the second level, hence two get methods
    row = [line.get('created_at'),
           '"' + line.get('text').replace('"', '""') + '"',  # creates double quotes
           line.get('user').get('screen_name'),
           unicode(line.get('user').get('followers_count')),
           unicode(line.get('user').get('friends_count')),
           unicode(line.get('retweet_count')),
           unicode(line.get('favorite_count'))]
    row_joined = u','.join(row)
    csv_out.write(row_joined)
    csv_out.write(u'\n')

csv_out.close()
Edit 2: I found another recipe to parse it, but there is no way for me to save the output. Any recommendations?
import json
import re

json_as_string = open('filename.json', 'r')

# Call this as a recursive function if your json is highly nested
lines = [re.sub("[\[\{\]]*", "", one_object.rstrip()) for one_object in json_as_string.readlines()]
json_as_list = "".join(lines).split('}')

for elem in json_as_list:
    if len(elem) > 0:
        print(json.loads(json.dumps("{" + elem[::1] + "}")))
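If the file is in JSON Lines format (one tweet object per line), which is the usual reason json.loads complains about extra data, a simpler sketch is to parse line by line and save the result; the file names here are placeholders:

import json

tweets = []
with open('filename.json', 'r', encoding='utf-8') as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines
            tweets.append(json.loads(line))

# write the parsed tweets back out as one valid JSON array
with open('parsed.json', 'w', encoding='utf-8') as out:
    json.dump(tweets, out, ensure_ascii=False, indent=2)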

Save body text on csv file | Python 3

I am trying to create a database of several articles for text-mining purposes.
I am extracting the body of each article via web scraping and then saving it to a CSV file. However, I couldn't manage to save all the body texts.
The code that I came up with saves only the text of the last URL (article), while if I print what I am scraping (and what I am supposed to save) I get the body of all the articles.
I included only some of the URLs from the list (which contains a larger number of them), just to give you an idea:
import requests
from bs4 import BeautifulSoup
import csv

r = ["http://www.nytimes.com/2016/10/12/world/europe/germany-arrest-syrian-refugee.html",
     "http://www.nytimes.com/2013/06/16/magazine/the-effort-to-stop-the- attack.html",
     "http://www.nytimes.com/2016/10/06/world/europe/police-brussels-knife-terrorism.html",
     "http://www.nytimes.com/2016/08/23/world/europe/france-terrorist-attacks.html",
     "http://www.nytimes.com/interactive/2016/09/09/us/document-Review-of-the-San-Bernardino-Terrorist-Shooting.html",
     ]

for url in r:
    t = requests.get(url)
    t.encoding = "ISO-8859-1"
    soup = BeautifulSoup(t.content, 'lxml')
    text = soup.find_all(("p", {"class": "story-body-text story-content"}))
    print(text)
    with open('newdb30.csv', 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow(text)
Try declaring a variable such as all_text = "" before the for loop and appending the text to it with all_text += text + "\n" at the end of each iteration (the \n starts a new line).
Then, in the last row, write all_text instead of text.
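A sketch of that suggestion; since find_all returns a list of tags rather than a string, the paragraph texts have to be joined first, and the tag name and attributes are passed to find_all as separate arguments (not as one tuple, as in the question):

import requests
from bs4 import BeautifulSoup
import csv

urls = ["http://www.nytimes.com/2016/10/12/world/europe/germany-arrest-syrian-refugee.html"]  # shortened list from the question

all_text = ""
for url in urls:
    t = requests.get(url)
    soup = BeautifulSoup(t.content, 'lxml')
    paragraphs = soup.find_all("p", {"class": "story-body-text story-content"})
    all_text += " ".join(p.get_text() for p in paragraphs) + "\n"

# open the file once, after the loop, so earlier articles are not overwritten
with open('newdb30.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow([all_text])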

python unicode csv export using pyramid

I'm trying to export a MongoDB collection that has non-ASCII characters into CSV format.
Right now, I'm dabbling with Pyramid and using pyramid.response.
import csv
import os
import time

from pyramid.response import Response
from mycart.Member import Member

# @view_config(context="mycart:resources.Member", name='', request_method="POST", permission='admin')
def member_export(context, request):
    filename = 'member-' + time.strftime("%Y%m%d%H%M%S") + ".csv"
    download_path = os.getcwd() + '/MyCart/mycart/static/downloads/' + filename
    member = Members(request)
    my_list = [['First Name,Last Name']]
    record = member.get_all_member()
    for r in record:
        mystr = [r['fname'], r['lname']]
        my_list.append(mystr)
    with open(download_path, 'wb') as f:
        fileWriter = csv.writer(f, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for l in my_list:
            print(l)
            fileWriter.writerow(l)
    size = os.path.getsize(download_path)
    response = Response(content_type='application/force-download',
                        content_disposition='attachment; filename=' + filename)
    response.app_iter = open(download_path, 'rb')
    response.content_length = size
    return response
In MongoDB the first name shows as 王, and when I print it, it also shows 王. However, when I open the file in Excel, it shows garbled text instead: ç¾…
However, when I view it in the shell:
$ more member-20130227141550.csv
it displays the non-ASCII characters correctly.
How should I rectify this problem?
I'm not a Windows guy, so I am not sure whether the problem is with your code or with Excel just not handling non-ASCII characters nicely. But I have noticed that you are writing your file with the Python csv module, which is notorious for unicode headaches.
Other users have reported success using unicodecsv as a replacement for the csv module. Perhaps you could try dropping this module in as your csv writer and see if your problem magically goes away.
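A minimal sketch of that swap, assuming unicodecsv is installed (pip install unicodecsv); writing a UTF-8 BOM first is a common way to get Excel to detect the encoding, and the path and rows below are placeholders for the question's download_path and my_list:

import codecs
import unicodecsv

my_list = [[u'First Name', u'Last Name'], [u'王', u'Abràmoff']]  # placeholder rows
download_path = 'members.csv'  # placeholder path

with open(download_path, 'wb') as f:
    f.write(codecs.BOM_UTF8)  # BOM so Excel recognizes the file as UTF-8
    writer = unicodecsv.writer(f, encoding='utf-8')
    for row in my_list:
        writer.writerow(row)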
