Python: POST multiple multipart-encoded files

As described here, it is possible to send multiple files with one request:
Uploading multiple files in a single request using python requests module
However, I have trouble generating these multiple file handles from a list.
So let's say I want to make a request like this:
sendfiles = {'file1': open('file1.txt', 'rb'), 'file2': open('file2.txt', 'rb')}
r = requests.post('http://httpbin.org/post', files=sendfiles)
How can I generate sendfiles from the list myfiles?
myfiles = ["file1.txt", "file20.txt", "file50.txt", "file100.txt", ...]

Use a dictionary comprehension, with os.path.splitext() to strip the extensions from the filenames:
import os.path
sendfiles = {os.path.splitext(fname)[0]: open(fname, 'rb') for fname in myfiles}
Note that a list of 2-item tuples will do too:
sendfiles = [(os.path.splitext(fname)[0], open(fname, 'rb')) for fname in myfiles]
Beware: using the files parameter to send a multipart-encoded POST reads all of those files into memory first. Use the requests-toolbelt project to build a streaming POST body instead:
from requests_toolbelt import MultipartEncoder
import requests
import os.path
m = MultipartEncoder(fields={
    os.path.splitext(fname)[0]: open(fname, 'rb') for fname in myfiles})
r = requests.post('http://httpbin.org/post', data=m,
                  headers={'Content-Type': m.content_type})
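Note that none of these snippets close the file handles they open. A minimal sketch using contextlib.ExitStack (reusing myfiles from the question) makes sure every handle is closed once the request completes:
import contextlib
import os.path
import requests

with contextlib.ExitStack() as stack:
    # enter each file into the stack so all handles close when the block exits
    sendfiles = {os.path.splitext(fname)[0]: stack.enter_context(open(fname, 'rb'))
                 for fname in myfiles}
    r = requests.post('http://httpbin.org/post', files=sendfiles)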

Related

python download folder of text files

The goal is to download GTFS data through python web scraping, starting with https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download
Currently, I'm using requests like so:
def download(url):
    fpath = "prov/city/GTFS"
    r = requests.get(url)
    if r.ok:
        print("Saving file.")
        open(fpath, "wb").write(r.content)
    else:
        print("Download failed.")
Unfortunately, r.content for the above URL renders as mostly unreadable binary output. You can see the files of interest within it (e.g. stops.txt), but how might I access them to read/write?
I fear you're trying to read a zip file as text; try the zipfile module instead.
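As a quick check, you can treat the downloaded bytes as a zip archive directly in memory, without writing anything to disk; a sketch using the URL from the question:
import io
import zipfile
import requests

url = "https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download"
r = requests.get(url)
archive = zipfile.ZipFile(io.BytesIO(r.content))
print(archive.namelist())  # should list stops.txt among the GTFS files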
The following worked:
def download(url):
    fpath = "path/to/output/"
    f = requests.get(url, stream=True)
    if f.ok:
        print("Saving to {}".format(fpath))
        with open(fpath + 'output.zip', 'wb') as g:
            g.write(f.content)
    else:
        print("Download failed with error code: ", f.status_code)
You need to write the response content to a .zip file first.
import requests
url = "https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download"
fname = "gtfs.zip"
r = requests.get(url)
open(fname, "wb").write(r.content)
Now fname exists and has several text files inside. If you want to programmatically extract this zip and then read the content of a file, for example stops.txt, then you need first to extract a single file, or simply extractall.
import zipfile
# this will extract only a single file, and
# raise a KeyError if the file is missing from the archive
zipfile.ZipFile(fname).extract("stops.txt")
# this will extract all the files found from the archive,
# overwriting files in the process
zipfile.ZipFile(fname).extractall()
Now you just need to work with your file(s).
thefile = "stops.txt"
# just plain text
text = open(thefile).read()
# csv file
import csv
reader = csv.reader(open(thefile))
for row in reader:
    ...
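Since GTFS text files are plain CSV, csv.DictReader is often more convenient; a sketch assuming the standard stop_id and stop_name columns from the GTFS spec:
import csv

with open("stops.txt", newline="") as fh:
    for row in csv.DictReader(fh):
        print(row["stop_id"], row["stop_name"])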

How do I fix my code so that it is automated?

I have the below code that takes my standardized .txt file and converts it into a JSON file perfectly. The only problem is that sometimes I have over 300 files, and doing this manually (i.e. changing the number at the end of the file name and re-running the script) is too much work and takes too long. I want to automate this. The files reside in one folder/directory and I am placing the JSON files in a different folder/directory, keeping the naming convention standardized: the names are the same, except the input ends with .txt and the output with .json. An example would be: CRAZY_CAT_FINAL1.TXT, CRAZY_CAT_FINAL2.TXT, and so on, all the way to file 300. How can I automate this, keep the file naming convention in place, and read and write the files in different folders/directories? I have tried, but can't seem to get this to iterate. Any help would be greatly appreciated.
import csv
import json

csvfile = open(r'C:\Users\...\...\...\Dog\CRAZY_CAT_FINAL1.txt', 'r')
jsonfile = open(r'C:\Users\...\...\...\Rat\CRAZY_CAT_FINAL1.json', 'w')
reader = csv.DictReader(csvfile)
out = json.dumps([row for row in reader])
jsonfile.write(out)
****************************************************************************
I also have this code, using the Python library requests. How do I change it so that it uploads multiple JSON files with the same standard naming convention? The files end with a number...
import requests

# function to post to api
def postData(xactData):
    url = 'http link'
    headers = {
        'Content-Type': 'application/json',
        'Content-Length': str(len(xactData)),
        'Request-Timeout': '60000'
    }
    return requests.post(url, headers=headers, data=xactData)

# read data
f = open(r'filepath/file/file.json', 'r')
data = f.read()
print(data)

# post data
result = postData(data)
print(result)
Use f-strings?
# note the rf'' prefix: the paths need to stay raw strings
for i in range(1, 301):
    csvfile = open(rf'C:\Users\...\...\...\Dog\CRAZY_CAT_FINAL{i}.txt', 'r')
    jsonfile = open(rf'C:\Users\...\...\...\Rat\CRAZY_CAT_FINAL{i}.json', 'w')
import os
from glob import glob
import csv
import json

INPATH = r'C:\Users\...\...\...\Dog'
OUTPATH = r'C:\Users\...\...\...\Rat'

for csvname in glob(INPATH + r'\*.txt'):
    jsonname = OUTPATH + '\\' + os.path.basename(csvname)[:-3] + 'json'
    reader = csv.DictReader(open(csvname, 'r'))
    json.dump(list(reader), open(jsonname, 'w'))
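For the second half of the question (uploading the generated JSON files), the same glob pattern works; a sketch reusing the postData function from the question and OUTPATH from above:
for jsonname in glob(OUTPATH + r'\*.json'):
    with open(jsonname, 'r') as f:
        result = postData(f.read())
    print(jsonname, result.status_code)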

Parse multiple json files, flatten data and save as json

I have a number of JSON files saved in a folder. I would like to parse each JSON file, flatten the data with the flatten library, and save the result as a separate JSON file.
I have managed to do this with one JSON file, but I am struggling to parse several JSON files at once without merging the data, and then save each one.
I think I need a loop that loads a JSON file, flattens it, and saves it until there are no more JSON files in the folder; is this possible?
This still seems to only parse one json file.
path_to_json = 'json_test/'
for file in [file for file in os.listdir(path_to_json) if file.endswith('.json')]:
    with open(path_to_json + file) as json_file:
        data1 = json.load(json_file)
Any help would be much appreciated thanks!
On every loop iteration, data1 is reassigned to the next JSON file's contents, so only the last result survives.
Instead, append to a new list.
import os
import json

# Flatten not supported on 3.8.3
path = 'X:test folder/'
file_list = [p for p in os.listdir(path) if p.endswith('.json')]
flattened = []
for file in file_list:
    with open(path + file) as json_file:
        # flatten json here, can't install from pip.
        flattened.append(json.load(json_file))
for file, flat_json in zip(file_list, flattened):
    json.dump(flat_json, open(path + file + '_flattened.json', "w"), indent=2)
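If the flatten package cannot be installed, a minimal recursive flattener is easy to sketch (the separator and naming here are arbitrary choices):
def flatten(obj, prefix='', sep='_'):
    # flatten nested dicts/lists into a single-level dict
    out = {}
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for key, value in items:
        name = f'{prefix}{sep}{key}' if prefix else str(key)
        if isinstance(value, (dict, list)):
            out.update(flatten(value, name, sep))
        else:
            out[name] = value
    return out
With that in place, the loop above becomes flattened.append(flatten(json.load(json_file))).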
# Can you try this out?
# https://stackoverflow.com/questions/23520542/issue-with-merging-multiple-json-files-in-python
import glob

read_files = glob.glob("*.json")
with open("merged_file.json", "w") as outfile:
    outfile.write('[{}]'.format(
        ','.join([open(f, "r").read() for f in read_files])))

Using requests for multipart/form data for API

I am trying to use requests for a multipart form-data submission to the SpeechMatics API.
The API is declared like this in curl:
curl -F data_file=@my_audio_file.mp3 -F model=en-US "https://api.speechmatics.com/v1.0/user/17518/jobs/?auth_token=<some token>" # transcription
where data_file is the local file path and model is the language, as per the documentation here: https://app.speechmatics.com/api-details#getJobs
Using the requests library, my code is as below, but seems to fail to upload the file:
import requests

path = 'https://api.speechmatics.com/v1.0/user/userID/jobs'
token = {'auth_token': '<some token>'}
data_file = open('F:\\user\\Bern\\Data Files\\audio.flac', 'rb')
model = 'en-US'
r = requests.post(path, params=token, files={'data_file': data_file, 'model': model})
I get Response 200, but the file seems to fail to upload.
I think this is what you're looking for. Note that curl's -F flag sends a POST request, not a GET:
import requests

files = {
    'data_file': open('my_audio_file.mp3', 'rb'),
    'model': 'en-US'
}
requests.post('https://api.speechmatics.com/v1.0/user/17518/jobs/?auth_token=<some token>', files=files)
I was guided by another post to use both the data and files parameters for the API.
Below is the code that I used:
import os
import requests

post_api = 'https://api.speechmatics.com/v1.0/user/17518/jobs/?auth_token=<some token>'
path = input()
mp3_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.mp3')]
l = []
for file in mp3_files:
    # the with statement closes each file handle automatically
    with open(file, 'rb') as f:
        files = {'data_file': f}
        data = {'model': 'en-AU'}
        r = requests.post(post_api, data=data, files=files)
        l.append(r)
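Since the original symptom was a 200 response with no visible upload, it is worth printing each response body afterwards; the exact fields depend on the SpeechMatics API, but a successful job creation should include a job id:
for r in l:
    print(r.status_code, r.text)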

downloading a file, not the contents

I am trying to automate downloading a .Z file from a website, but the file I get is 2 kB when it should be around 700 kB, and it contains a list of the contents of the page (i.e. all the files available for download). I am able to download it manually without a problem. I have tried urllib and urllib2 and different configurations of each, but each does the same thing. I should add that the urlVar and fileName variables are generated in a different part of the code, but I have given an example of each here to demonstrate.
import urllib2

urlVar = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z"
fileName = "txga1000.14d.Z"
downFile = urllib2.urlopen(urlVar)
with open(fileName, "wb") as f:
    f.write(downFile.read())
At least the urllib2 documentation suggests you should use the Request object. This works for me:
import urllib2
req = urllib2.Request("ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z")
response = urllib2.urlopen(req)
data = response.read()
Data length seems to be 740725.
I was able to download what seems like the correct size for your file with the following Python 2 code:
import urllib2
filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)
reply = urllib2.urlopen(url)
buf = reply.read()
with open(filename, "wb") as fh:
    fh.write(buf)
Edit: the answer above was posted faster and is much better; I thought I'd post mine anyway, since I had already tested and written it out.
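Both answers use the Python 2-only urllib2; a sketch of the equivalent using Python 3's urllib.request:
from urllib.request import urlopen

filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)
with urlopen(url) as reply, open(filename, "wb") as fh:
    fh.write(reply.read())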
