How to save many CSV files from URL - python

I have many CSV files that I need to get from a URL. I found this reference: How to read a CSV file from a URL with Python?
It does almost the thing I want, but I don't want to go through Python to read the CSV and then have to save it. I just want to directly save the CSV file from the URL to my hard drive.
I have no problem with for loops and cycling through my URLs. It is simply a matter of saving the CSV file.

If all you want to do is save a CSV, then I wouldn't suggest using Python at all; this is really more of a Unix question. Assuming you're working on some kind of *nix system, I would suggest just using wget. For instance:
wget http://someurl/path/to/file.csv
You can run this command directly from python like so:
import subprocess

# Map each URL to the filename it should be saved as.
bashCommand = lambda url, filename: "wget -O %s %s" % (filename, url)
save_locations = {'http://someurl/path/to/file.csv': 'test.csv'}

for url, filename in save_locations.items():
    # Run wget as a subprocess and save the response under the chosen filename.
    process = subprocess.Popen(bashCommand(url, filename).split(), stdout=subprocess.PIPE)
    output = process.communicate()[0]
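If you would rather stay entirely in Python, a minimal sketch using only the standard library (same hypothetical URL and filename as above) could look like this:

import urllib.request

save_locations = {'http://someurl/path/to/file.csv': 'test.csv'}
for url, filename in save_locations.items():
    # Download the CSV straight to disk without parsing it.
    urllib.request.urlretrieve(url, filename)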

Related

Run python zip file from memory at runtime?

I am trying to run a python zip file which is retrieved using requests.get. The zip file has several directories of python files in addition to the __main__.py, so in the interest of easily sending it as a single file, I am zipping it.
I know the file is being sent correctly, as I can save it to a file and then run it; however, I want to execute it without writing it to storage.
The working part is more or less as follows:
import requests
response = requests.get("http://myurl.com/get_zip")
I can write the zip to file using
f = open("myapp.zip","wb")
f.write(response.content)
f.close()
and manually run it from command line. However, I want something more like
exec(response.content)
This doesn't work since it's still compressed, but you get the idea.
I am also open to ideas that replace the zip with some other format of sending the code over the internet, if that makes it easier to execute from memory.
A possible solution is this:
import io
import requests
from zipfile import ZipFile

response = requests.get("http://myurl.com/get_zip")
# Wrap the downloaded bytes in an in-memory file object.
binary_zip = io.BytesIO(response.content)
# Open that in-memory object as a ZipFile.
zip_file = ZipFile(binary_zip, "r")
# Iterate over all entries in the zip (folders should also be OK).
for script_file in zip_file.namelist():
    exec(zip_file.read(script_file))
But it is a bit convoluted and probably can be improved.
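One way to make this less fragile (so the files inside the zip can import each other, instead of being exec'd blindly in namelist order) is to expose the in-memory archive through the import machinery and then run only __main__.py. This is just a sketch, under the assumption that __main__.py and the package directories sit at the archive root, the same layout Python expects when running a zip directly:

import importlib.abc
import importlib.util
import io
import sys
from zipfile import ZipFile

import requests


class InMemoryZipImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import modules straight from a ZipFile that only exists in memory."""

    def __init__(self, zf):
        self.zf = zf
        self.names = set(zf.namelist())

    def find_spec(self, fullname, path=None, target=None):
        base = fullname.replace(".", "/")
        for candidate, is_pkg in ((base + ".py", False), (base + "/__init__.py", True)):
            if candidate in self.names:
                spec = importlib.util.spec_from_loader(fullname, self, is_package=is_pkg)
                spec.loader_state = candidate  # remember which archive entry to load
                return spec
        return None

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        entry = module.__spec__.loader_state
        exec(compile(self.zf.read(entry), entry, "exec"), module.__dict__)


response = requests.get("http://myurl.com/get_zip")
archive = ZipFile(io.BytesIO(response.content))

# Make every .py file inside the archive importable, then run __main__.py
# with __name__ set to "__main__", much like `python myapp.zip` would.
sys.meta_path.insert(0, InMemoryZipImporter(archive))
exec(compile(archive.read("__main__.py"), "__main__.py", "exec"), {"__name__": "__main__"})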

Open and read latest json file one time only

SO members... how can I read the latest JSON file in a directory one time only (and print something if there is no new file)? So far I can only read the latest file. The sample script below (run every 45 minutes) opens and reads the latest JSON file in a directory; in this case the latest file is file3.json (a new JSON file is created every 30 minutes). But if file4 is not created for some reason (for example, the server fails to create a new JSON file) and the script runs again, it will still read the same file3.
files in directory
file1.json
file2.json
file3.json
The script below is able to open and read the latest JSON file created in the directory.
import glob
import json
import os.path

listFiles = glob.iglob('logFile/*.json')
latestFile = max(listFiles, key=os.path.getctime)
with open(latestFile, 'r') as f:
    mydata = json.load(f)
    print(mydata)
To ensure the script will only read the newest file, and read it one time only, I expect something like the below:
listFiles = glob.iglob('logFile/*.json')
latestFile = max(listFiles, key=os.path.getctime)
if latestFile newer than previous open/read file:  # pseudocode: not sure how to compare with the previously read file
    with open(latestFile, 'r') as f:
        mydata = json.load(f)
        print(mydata)
else:
    print("no new file created")
Thank you for your help. An example solution would be good to share.
I can't figure out the solution... it seems simple, but a few days of trial and error have brought no luck.
(1) Make sure to read the latest file in the directory.
(2) Make sure to read any file/s that may have been missed (because the script failed to run).
(3) Only read all the files once, and if there is no new file give a warning.
Thank you.
After the SO discussion and suggestions, I found a few methods to resolve, or at least accommodate, some of the requirements. I simply move files that have been processed. If no file is created, the script does nothing, and if the script fails, then once things are back to normal it will run and read all the related files available. I think it's good for now. Thank you guys...
Below is not so much an answer as an approach I would like to propose:
The idea is as follows:
Every log file that is written to the directory can contain a key-value pair called "creation_time": timestamp (the fileX.json that gets stored on the server). Your script runs every 45 minutes to pick up the file that was dumped into the directory. In the normal case you can simply read the file, and when the script exits you store the last read filename and the creation_time taken from that fileX.json into a logger.json.
An example for a logger.json is as follows:
{
    "creation_time": "03520201330",
    "file_name": "file3.json"
}
Whenever the server fails or a delay occurs, fileX.json could be rewritten or new fileX.json files could have been created in the directory. In these situations, you would first open logger.json and obtain both the timestamp and the last filename, as shown in the example above. Using the last filename, you can compare the old timestamp stored in the logger with the new timestamp in fileX.json. If they match, nothing has changed, so you only read the files ahead of it and rewrite the logger.
If they do not match, you re-read the last fileX.json again and then proceed to read the other files ahead of it.
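A minimal sketch of that bookkeeping could look like the following. It assumes the log files live in logFile/ and the state is kept in a logger.json, as in the example above, but it records the filesystem creation time (as in the original script) rather than a key inside each file:

import glob
import json
import os

STATE_FILE = "logger.json"   # bookkeeping file described above (assumed name)
LOG_DIR = "logFile"          # directory the fileX.json files land in

def load_state():
    # Return the previously recorded state, or an empty one on the first run.
    try:
        with open(STATE_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"creation_time": 0, "file_name": None}

def main():
    state = load_state()
    files = sorted(glob.glob(os.path.join(LOG_DIR, "*.json")), key=os.path.getctime)
    # Keep only files created after the last one that was processed.
    new_files = [f for f in files if os.path.getctime(f) > state["creation_time"]]
    if not new_files:
        print("no new file created")
        return
    for path in new_files:
        with open(path) as f:
            mydata = json.load(f)
        print(mydata)  # process the file here
        state = {"creation_time": os.path.getctime(path), "file_name": path}
    # Remember where we stopped so the next run skips already-read files.
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

if __name__ == "__main__":
    main()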

Python 3.3 Code to Download a file to a location and save as a given file name

For example, I would like to save the .pdf file at http://arxiv.org/pdf/1506.07825 with the filename 'Data Assimilation- A Mathematical Introduction' at the location 'D://arXiv'.
But I have many such files. So, my input is of the form of a .csv file with rows given by (semi-colon is the delimiter):
url; file name; location.
I found some code here: https://github.com/ravisvi/IDM
But that is a bit advanced for me to parse. I want to start with something simpler. The above seems to have more functionality than I need right now - threading, pausing etc.
So can you please write me a very minimal code to do the above:
save the file 'Data Assimilation- A Mathematical Introduction'
from 'http://arxiv.org/pdf/1506.07825'
at 'D://arXiv'?
I think I will be able to generalize it to deal with a .csv file.
Or, point me to a place to get started. (The GitHub repository already has a solution, and it is too perfect! I want something simpler.) My guess is that, with Python, a task like the above should be possible in no more than 10 lines of code. So tell me the important ingredients of the code, and perhaps I can figure it out.
Thanks!
I would use the requests module; you can just pip install requests.
Then, the code is simple:
import requests

response = requests.get(url)
if response.ok:
    file = open(file_path, "wb+")  # write, binary, allow creation
    file.write(response.content)
    file.close()
else:
    print("Failed to get the file")
Using Python 3.6.5
Here is a method that creates a folder (if needed), downloads the archive into it, and extracts it.
data_url - the complete URL of the file.
data_path - the directory where the file needs to be saved.
tgz_path - the name of the data file, including the extension.
import os
import tarfile
import urllib.request

def fetch_data_from_tar(data_url, data_path, tgz_path):
    if not os.path.isdir(data_path):
        os.mkdir(data_path)
        print("Data folder created at path", data_path)
    else:
        print("Folder path already exists")
    tgz_path = os.path.join(data_path, tgz_path)
    # Download the archive, then extract it into the data folder.
    urllib.request.urlretrieve(data_url, filename=tgz_path)
    data_tgz = tarfile.open(tgz_path)
    data_tgz.extractall(path=data_path)
    data_tgz.close()
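A hypothetical call (the URL and paths below are made up for illustration) might look like:

# Hypothetical values; substitute your own archive URL and target folder.
fetch_data_from_tar(
    data_url="http://example.com/datasets/housing.tgz",
    data_path="datasets/housing",
    tgz_path="housing.tgz",
)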

Compare archiwum.rar content and extracted data from .rar in the folder on Windows 7

Does anyone know how to compare the number of files and the size of the files in archiwum.rar with its extracted content in a folder?
The reason I want to do this is that the server I'm working on has been restarted a couple of times during extraction, and I am not sure whether all the files have been extracted correctly.
The .rar files are more than 100 GB each and the server is not that fast.
Any ideas?
PS: if the solution could be some code instead of a standalone program, my preference is Python.
Thanks
In Python you can use the rarfile module. Its usage is similar to the built-in zipfile module.
import rarfile
import os.path

extracted_dir_name = "samples/sample"  # directory with the extracted files
archive = rarfile.RarFile("samples/sample.rar", "r")

# List file information and compare each entry with its extracted counterpart.
for info in archive.infolist():
    print(info.filename, info.date_time, info.file_size)
    extracted_file = os.path.join(extracted_dir_name, info.filename)
    if info.file_size != os.path.getsize(extracted_file):
        print("Different size!")

Apple Automator process csv files and create new files

Is it possible to loop through a set of selected files, process each, and save the output as new files using Apple Automator?
I have a collection of .xls files, and I've gotten Automator to
- Ask for Finder Items
- Open Finder Items
- Convert Format of Excel Files #save each .xls file to a .csv
I've written a python script that accepts a filename as an argument, processes it, and saves it as p_filename in the directory the script's being run from. I'm trying to use Run Shell Script with the /usr/bin/python shell and my python script pasted in.
Some things don't translate too well, though, especially since I'm not sure how it deals with python's open('filename','w') command. It probably doesn't have permissions to create new files, or I'm entering the command incorrectly. I had the idea to instead output the processed file as text, capture it with Automator, and then save it to a new file.
To do so, I tried to use New Text File, but I can't get it to create a new text file for each file selected back in the beginning. Is it possible to loop through all the selected Finder Items?
Why do you want this done in the folder of the script? Or do you mean the folder of the files you are getting from the Finder items? In that case just get the path for each file passed into Python.
When you run open('filename','w') you should thus pass in a full pathname. Probably what's happening is you are actually writing to the root directory rather than where you think you are.
Assuming you are passing your files to the shell command in Automator as arguments then you might have the following:
import sys, os

args = sys.argv[1:]
for a in args:
    p = os.path.dirname(a)
    # `name` is the processed output's filename, e.g. "p_" + the original name.
    name = "p_" + os.path.basename(a)
    mypath = os.path.join(p, name)
    f = open(mypath, "w")  # write the processed output here, then close f
