Python: How to send a dataframe to an external API (push)

I am wondering how to send this dataframe to an external API.

You can convert your dataframe to JSON with pandas' to_json() (which returns a string, not a file), turn that back into Python objects with json.loads(), and pass the result in the post call of requests. Use the json= parameter rather than data= so that requests serialises the payload and sets the Content-Type: application/json header for you:
df_json_dict = json.loads(df.to_json(orient='records'))
requests.post(url, json=df_json_dict)
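The conversion step can be checked on its own without any network call; a minimal sketch (the URL and server behaviour are assumptions and are left out):

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# to_json() returns a JSON string; orient='records' gives one object per row
records = json.loads(df.to_json(orient="records"))
print(records)  # [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]

# requests.post(url, json=records) would then send it as an application/json body
```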

In case the answer is still not clear, here is a summary:
create your pandas dataframe
import pandas as pd
dataframe = pd.read_csv(file_path, sep=',')
create the stream
import io, requests
stream = io.StringIO()
convert the dataframe to a CSV stream
dataframe.to_csv(stream, sep=';', encoding='utf-8', index=False)
rewind the stream to the beginning
stream.seek(0)
PUT the request using the stream
file_upload_resp = requests.put(url, data=stream)
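The in-memory part of those steps can be verified without a server; a sketch with a made-up dataframe (the upload URL is an assumption and is omitted):

```python
import io

import pandas as pd

dataframe = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# serialise the dataframe into an in-memory text stream
stream = io.StringIO()
dataframe.to_csv(stream, sep=";", index=False)
stream.seek(0)  # rewind so requests.put(url, data=stream) reads from the start

print(stream.read())
```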

Related

Problem in reading Excel file from url into a dataframe

How can I read excel file from a url into a dataframe?
import requests
request_url = 'https://pishtazfund.com/Download/DownloadNavChartList?exportType=Excel&fromDate=5/9/2008&toDate=2/22/2022&basketId=0'
response = requests.get(request_url, headers={'Accept': 'text/html'})
I cannot convert the response into a dataframe; any idea or solution is appreciated.
You can use pandas' read_csv() directly on the URL:
import pandas as pd
df = pd.read_csv('https://pishtazfund.com/Download/DownloadNavChartList?exportType=Excel&fromDate=5/9/2008&toDate=2/22/2022&basketId=0')
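Note that read_csv only works here because the endpoint happens to serve text despite the exportType=Excel parameter. If an endpoint like this ever returns a real .xlsx file, read_csv will fail; sniffing the first bytes tells the two formats apart. A sketch with made-up byte strings (the live URL may change its behaviour):

```python
def pick_reader(content: bytes) -> str:
    # .xlsx files are zip archives and start with the b'PK\x03\x04' signature;
    # use pd.read_excel(io.BytesIO(content)) for those,
    # and pd.read_csv(io.BytesIO(content)) otherwise
    if content[:4] == b"PK\x03\x04":
        return "excel"
    return "csv"

print(pick_reader(b"PK\x03\x04rest-of-zip-archive"))       # excel
print(pick_reader(b"date,price\n2022-01-01,1.13\n"))       # csv
```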

How can I save my scraped JSON dataset (text format) to my local machine, and how can I read the file into a pandas DataFrame?

This is what the code looks like; I want to use the data for analysis:
response = requests.request("GET", url, headers=headers, params=querystring)
print(response.text)
{"#type":"imdb.api.title.ratings","id":"/title/tt0944947/","title":"Game of Thrones","titleType":"tvSeries","year":2011,"canRate":true,"otherRanks":[{"id":"/chart/ratings/toptv","label":"Top 250 TV","rank":12,"rankType":"topTv"}],"rating":9.2,"ratingCount":1885115,"ratingsHistograms":{"Males Aged 18-29":{"aggregateRating":9.3,"demographic":"Males Aged 18-29","histogram":{"1":11186,"2":693,"3":801,"4":962,"5":2103,"6":3583,"7":9377,"8":22859,"9":52030,"10":174464},"totalRatings":278058},"IMDb Staff":{"aggregateRating":8.7,"demographic":"IMDb Staff","histogram":{"1":0,"2":0,"3":0,"4":0,"5":1,"6":3,"7":6,"8":19,"9":27,"10":17},"totalRatings":73}
Frankly, you should find this in any Python tutorial or in many requests examples. Open the file in write mode ("w") and write the response text:
fh = open("output.json", "w")
fh.write(response.text)
fh.close()
or
with open("output.json", "w") as fh:
    fh.write(response.text)
As for pandas, you can try to read the file back:
import pandas as pd
df = pd.read_json("output.json")
or you can use the io module to read it without saving to disk:
import io
fh = io.StringIO(response.text)
df = pd.read_json(fh)
But pandas keeps data as a table with rows and columns, while your response contains nested lists/dicts, so it may take some work to fit it into a DataFrame.
If you only need some of the data from the JSON, you can use response.json() and pick out the fields you want.
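For nested structures like this ratings payload, pd.json_normalize flattens inner dicts into dotted column names. A sketch using a trimmed-down stand-in for response.json():

```python
import pandas as pd

# trimmed-down stand-in for the nested response shown above
data = {
    "title": "Game of Thrones",
    "year": 2011,
    "rating": 9.2,
    "ratingsHistograms": {"IMDb Staff": {"aggregateRating": 8.7}},
}

# nested dicts become columns like 'ratingsHistograms.IMDb Staff.aggregateRating'
df = pd.json_normalize(data)
print(list(df.columns))
```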

how to store bytes like b'PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00 to dataframe or csv in python

I am requesting a URL and getting a return in bytes. I want to store this in a data frame and then to CSV.
# Download the file contents
import requests

url = "someURL"
req = requests.get(url)
url_content = req.content
csv_file = open('test.txt', 'wb')
print(type(url_content))
print(url_content)
csv_file.write(url_content)
csv_file.close()
I tried many approaches but couldn't find a solution. The above code writes the output to a file, but the content looks like the bytes below. My end objective is to store this in a CSV, send it to Google Cloud, and create a Google BigQuery table from it.
Output:
<class 'bytes'>
b'PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00[Content_Types].xml\xb5S\xcbn\xc20\x10\xfc\x95\xc8\xd76\xf4PU\x15\x81C\x1f\xc7\x16\xa9\xf4\x03\{\x93X\xf8%\xaf\xa1\xf0\xf7]\x078\x94R\x89\nq\xf2cfgfW\xf6d\xb6q\xb6ZCB\x13|\xc3\xc6|\xc4\xf0h\xe3\xbb\x86},^\xea{Va\x96^K\x1b<4\xcc\x076\x9bN\x16\xdb\x08XQ\xa9\xc7\x86\xf59\xc7\x07!P\xf5\xe0$\xf2\x10\xc1\x13\xd2\x86\xe4d\xa6c\xeaD\x94j);\x10\xb7\xa3\xd1\x9dP\xc1g\xf0\xb9\xceE\x83M'O\xd0\xca\x95\xcd\xd5\xe3\xee\xbeH7L\xc6h\x8d\x92\x99R\x89\xb5\xd7G\xa2\xf5^\x90'\xb0\x03\x07{\x13\xf1\x86\x08\xacz\xde\x90\xca\xae\x1bB\x91\x893\x1c\x8e\x0b\xcb\x99\xea\xdeh.\xc9h\xf8W\xb4\xd0\xb6F\x81\x0ej\xe5\xa8\x84CQ\xd5\xa0\xeb\x98\x88\x98\xb2\x81}\xce\xb9L\xf9U:\x12\x14D\x9e\x13\x8a\x82\xa4\xf9%\xde\x87\xb1\xa8\x90\xe0,\xc3B\xbc\xc8\xf1\xa8[\x8c\t\xa4\xc6\x1e ;\xcb\xb1\x97\t\xf4{N\xf4\x98~\x87\xd8X\xf1\x83p\xc5\x1cykOL\xa1\x04\x18\x90kN\x80V\xee\xa4\xf1\xa7\xdc\xbfBZ~\x86\xb0\xbc\x9e\x7fq\x18\xf6\x7f\xd9\x0f 
\x8aa\x19\x1fr\x88\xe1{O\xbf\x01PK\x07\x08z\x94\xcaq;\x01\x00\x00\x1c\x04\x00\x00PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0b\x00\x00\x00_rels/.rels\xad\x92\xc1j\xc30\x0c\x86_\xc5\xe8\xde8\xed`\x8cQ\xb7\x972\xe8m\x8c\xee\x014[ILb\xcb\xd8\xda\x96\xbd\xfd\xcc.[K\n\x1b\xec($}\xff\x07\xd2v?\x87I\xbdQ.\x9e\xa3\x81u\xd3\x82\xa2h\xd9\xf9\xd8\x1bx>=\xac\xee#\x15\xc1\xe8p\xe2H\x06"\xc3~\xb7}\xa2\t\xa5n\x94\xc1\xa7\xa2"\x16\x03\x83H\xba\xd7\xba\xd8\x81\x02\x96\x86\x13\xc5\xda\xe98\x07\x94Z\xe6^'\xb4#\xf6\xa47m{\xab\xf3O\x06\x9c3\xd5\xd1\x19\xc8G\xb7\x06u\xc2\xdc\x93\x18\x98'\xfd\xcey|a\x1e\x9b\x8a\xad\x8d\x8fD\xbf\t\xe5\xae\xf3\x96\x0el_\x03EY\xc8\xbe\x98\x00\xbd\xec\xb2\xf9vql\x1f3\xd7ML\xe9\xbfeh\x16\x8a\x8e\xdc*\xd5\x04\xca\xe2\xa9\3\xbaY0\xb2\x9c\xe9oJ\xd7\x8f\xa2\x03\t:\x14\xfc\xa2^\x08\xe9\xb3\x1f\xd8}\x02PK\x07\x08\xa7\x8cz\xbd\xe3\x00\x00\x00I\x02\x00\x00PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00docProps/app.xmlM\x8e\xc1\n\xc20\x10D\xef~E\xc8\xbd\xdd\xeaAD\xd2\x94\x82\x08\x9e\xecA? \xa4\xdb6\xd0lB\xb2J?
The original URL (now edited out of the question) suggests that the downloaded file is in .xlsx format. The .xlsx format is essentially one or more xml files in a zip archive (iBug's answer is correct in this respect).
Therefore if you want to get the file's data in a dataframe, tell Pandas to read it as an excel file.
import io
import pandas as pd
import requests

url = "someURL"
req = requests.get(url)
url_content = req.content
# Load into a dataframe (wrap the bytes in a buffer; passing raw bytes
# directly to read_excel is deprecated in recent pandas versions)
df = pd.read_excel(io.BytesIO(url_content))
# Write to csv
df.to_csv('data.csv')
The initial bytes PK\x03\x04 suggest that it's PK Zip format. Try unzipping it first, either with the unzip command-line tool or with Python's built-in zipfile module.
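Inspecting such bytes with zipfile can be sketched using an in-memory archive built on the spot (no real download involved; the member name is made up):

```python
import io
import zipfile

# build a small zip in memory to stand in for the downloaded bytes
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sheet1.xml", "<data/>")
payload = buf.getvalue()

print(payload[:4])  # b'PK\x03\x04' - the zip signature from the question

# list the archive members without touching the disk
with zipfile.ZipFile(io.BytesIO(payload)) as zf:
    print(zf.namelist())  # ['sheet1.xml']
```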

How to convert a large json response of a web service call into CSV in Python?

I have a web service which returns a very large JSON response. I want to parse it and convert it into a CSV format using Python. I have written a code to load json and convert it to CSV. However, for a large response it raises MemoryError. How can I load and convert response data using streaming?
Here is my code:
import json
import requests
from pandas import json_normalize

response = requests.get(url)
data = json.loads(response.text)
df = json_normalize(data)
df.to_csv(fileName, index=False, encoding='utf-8')
Here is a sample of my JSON response:
[{"F1":"V1_1","F2":false,"F3":120,"F4":"URL1","F5":{"F5_1":4,"F5_2":"A"}},
{"F1":"V2_1","F2":true,"F3":450,"F4":"URL2","F5":{"F5_1":13,"F5_2":"B"}},
{"F1":"V3_1","F2":false,"F3":312,"F4":"URL3","F5":{"F5_1":6,"F5_2":"C"}},
...
]
The MemoryError occurs in the json.loads() call (or in response.json() if I use that instead). Any idea how I can load, parse, and convert such a big JSON response to a CSV file?
First, you are not building the dataframe from just the relevant part of the response; you are normalising the whole payload. Try extracting only the part you need and see if pandas can handle it then:
import pandas as pd
# this is a dummy URL for demonstration
url = "https://www.qnt.io/api/results?pID=gifgif&mID=54a309ae1c61be23aba0da62&key=54a309ac1c61be23aba0da3f"
response = requests.get(url)
# extract the relevant results from response
data = response.json()["results"]
df = pd.json_normalize(data)
df.to_csv("filename.csv", index=False, encoding="utf-8")
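If memory is still tight and the payload can be fetched as (or converted to) JSON Lines, one object per line, pandas can process it in chunks instead of loading everything at once. A sketch under that assumption, with a small in-memory stand-in for the response (the array-of-objects format shown in the question would first need its brackets and trailing commas stripped to become JSON Lines):

```python
import io

import pandas as pd

# stand-in for a large JSON Lines payload: one JSON object per line
ndjson = '{"F1":"V1_1","F3":120}\n{"F1":"V2_1","F3":450}\n'

# lines=True + chunksize returns an iterator of small DataFrames
chunks = pd.read_json(io.StringIO(ndjson), lines=True, chunksize=1)

first = True
with open("out.csv", "w", encoding="utf-8") as fh:
    for chunk in chunks:
        # write the header only for the first chunk, then append rows
        chunk.to_csv(fh, index=False, header=first)
        first = False
```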

Python: read a csv file generated dynamically by an API?

I want to read into pandas the csv generated by this URL:
https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv
How should this be done?
I believe you can just read it with pd.read_csv
import pandas as pd
URL = 'https://www.alphavantage.co/query?function=FX_DAILY&from_symbol=EUR&to_symbol=USD&apikey=demo&datatype=csv'
df = pd.read_csv(URL)
Results: a dataframe of the daily EUR/USD rates (screenshot omitted).
