I have a script that downloads a PDF from the internet and saves it to a specific directory. How can I append the date and time to the file name?
# Import all needed modules and tools
import os
import os.path
from datetime import datetime
import urllib.request
import requests
# Disable SSL and HTTPS Certificate Warnings
import urllib3
urllib3.disable_warnings()
resp = requests.get('url.org', verify=False)
# Get current date and time
current_datetime = datetime.now()
print("Current date & time : ", current_datetime)
# Convert datetime obj to string
str_current_datetime = str(current_datetime)
# Download and name the PDF file from the URL
response = urllib.request.urlretrieve('url.pdf',
                                      filename=r'my directory\civil.pdf')
# Save to the preferred directory
with open(r"my directory\civil.pdf", 'wb') as f:
    f.write(resp.content)
Use f-strings:
open(f"file - {datetime.now().strftime('%Y-%m-%d')}.txt", "w")
# will create a new file with the title: "file - Year-Month-Day.txt"
# then you can do whatever you want with it
f-string docs
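Applied to the PDF from the question, the same idea might look like the sketch below. The helper name `timestamped_name` and the stamp layout are assumptions made for illustration, not part of the original script. The time is written with hyphens rather than colons because colons are not allowed in Windows file names.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(path, now=None):
    """Return `path` with a date-time stamp inserted before the extension."""
    now = now or datetime.now()
    p = Path(path)
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return p.with_name(f"{p.stem}_{stamp}{p.suffix}")


# Fixed, made-up date so the result is predictable
print(timestamped_name("civil.pdf", datetime(2022, 3, 14, 9, 26, 53)))
# prints: civil_2022-03-14_09-26-53.pdf
```

The returned path can then be passed to `open(..., 'wb')` in place of the fixed file name.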
I'm trying to scrape data from a website into HDFS. At first the scraping worked well, but after I added the lines that store the data in HDFS, it stopped working:
import requests
from pathlib import Path
import os
from datetime import date
from hdfs import InsecureClient
date= date.today()
date
def downloadFile(link, destfolder):
    r = requests.get(link,stream=True)
    filename="datanew1"+ str(date)+".xls"
    downloaded_file = open(os.path.join(destfolder, filename), 'wb')
    client= InsecureClient('http://hdfs-namenode.default.svc.cluster.local:50070', user='hdfs')
    with client.download('/data/test.csv')
    for chunk in r.iter_content(chunk_size=256):
        if chunk:
            downloaded_file.write(chunk)
link="https://api.worldbank.org/v2/fr/indicator/FP.CPI.TOTL.ZG?downloadformat=excel"
Path('http://hdfs-namenode.default.svc.cluster.local:50070/data').mkdir(parents=True, exist_ok=True)
downloadFile(link, 'http://hdfs-namenode.default.svc.cluster.local:50070/data')
The code raises no error, but I can't find the scraped data!
I need to use a Python program to download a file from an HTTP server while preserving the original timestamp of the file creation.
Accordingly, two questions:
How to get file date from HTTP server using Python 3.7?
How to set this date for the downloaded file?
You could have a look at requests to download the file and get the modification date from the headers.
To set the dates you can use os.utime and email.utils.parsedate for parsing the date from the headers (see this answer by tzot).
Here is an example:
import calendar
import os
import time
import requests
import email.utils as eut
url = 'http://www.hamsterdance.org/hamsterdance/index-Dateien/hamster.gif'
r = requests.get(url)
# Save the response body to disk
with open('output', 'wb') as f:
    f.write(r.content)
# HTTP dates are given in GMT, so convert the parsed tuple with calendar.timegm
last_modified = r.headers['last-modified']
modified = calendar.timegm(eut.parsedate(last_modified))
now = time.time()
# os.utime takes (access time, modification time)
os.utime('output', (now, modified))
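The date handling can also be tried without a network connection. The following is a minimal sketch, assuming a made-up `Last-Modified` value and a hypothetical helper named `apply_last_modified`. It uses `email.utils.parsedate_to_datetime`, a close relative of `parsedate` that returns a timezone-aware `datetime` when the header carries a zone such as `GMT`.

```python
import os
import tempfile
from email.utils import parsedate_to_datetime


def apply_last_modified(path, last_modified_header):
    """Set the file's modification time from an HTTP Last-Modified value."""
    modified = parsedate_to_datetime(last_modified_header).timestamp()
    # Keep the current access time and change only the modification time
    os.utime(path, (os.stat(path).st_atime, modified))
    return modified


# Offline demonstration on a temporary file with a made-up header value
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo")
mtime = apply_last_modified(tmp.name, "Wed, 21 Oct 2015 07:28:00 GMT")
observed = os.path.getmtime(tmp.name)
os.remove(tmp.name)
print(mtime)  # prints: 1445412480.0
```

In a real download, the header value would come from `r.headers['last-modified']`, which may be missing if the server does not send it.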
I am scraping info to a text file and am trying to write the date at the top. I have the method to grab the date but have no clue how I can use the write function to place at top. Below is a stripped down version of what I am working on.
import re
import urllib2
import json
from datetime import datetime
import time
now = datetime.now()
InputDate = now.strftime("%Y-%m-%d")
Today = now.strftime("%B %d")
header = ("Today").split()
newfile = open("File.txt", "w")
### Irrelevant Info Here ###
string = title+"\n"+info+"\n"
#newfile.write(header)
newfile.write(string)
print title+" written to file"
newfile.close()
You can't insert something at the beginning of a file. You need to write a new file, starting with the line you want to insert, then finish with the contents of the old file. Unlike appending to the end, writing to the start of a file is really, really inefficient.
The key to this problem is to use a NamedTemporaryFile. After you finish constructing it, you then rename it on top of the old file.
Code:
def insert_timestamp_in_file(filename):
    with open(filename) as src, tempfile.NamedTemporaryFile(
            'w', dir=os.path.dirname(filename), delete=False) as dst:
        # Save the new first line
        dst.write(dt.datetime.now().strftime("%Y-%m-%d\n"))
        # Copy the rest of the file
        shutil.copyfileobj(src, dst)
    # remove old version
    os.unlink(filename)
    # rename new version
    os.rename(dst.name, filename)
Test Code:
import datetime as dt
import os
import tempfile
import shutil
insert_timestamp_in_file("file1")
file1
I am scraping info to a text file and am trying to write the date at
the top. I have the method to grab the date but have no clue how I can
use the write function to place at top. Been trying for 2 days and all.
Results:
2018-02-15
I am scraping info to a text file and am trying to write the date at
the top. I have the method to grab the date but have no clue how I can
use the write function to place at top. Been trying for 2 days and all.
To write the date at the top of the file, put:
newfile.write(InputDate + "\n")
newfile.write(Today + "\n")
right after you open the file and before any other write.
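Because the file is opened in "w" mode and is therefore empty at that point, whatever is written first ends up on the first line. A minimal sketch of that ordering, assuming a hypothetical helper `write_with_date_header` and placeholder body text:

```python
import os
import tempfile
from datetime import datetime


def write_with_date_header(path, body, now=None):
    """Create `path` with the date on the first line, followed by `body`."""
    now = now or datetime.now()
    with open(path, "w") as newfile:
        newfile.write(now.strftime("%Y-%m-%d") + "\n")  # header goes first
        newfile.write(body)


# Demonstration in a temporary directory with a fixed, made-up date
demo_path = os.path.join(tempfile.mkdtemp(), "File.txt")
write_with_date_header(demo_path, "title\ninfo\n", datetime(2018, 2, 15))
with open(demo_path) as f:
    first_line = f.readline().rstrip("\n")
print(first_line)  # prints: 2018-02-15
```

This only works when the file is being created from scratch. Adding a line to the top of a file that already has content still requires rewriting the file.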
Just to give you an idea, try this:
import re
import urllib2
import json
from datetime import datetime
import time
now = datetime.now()
InputDate = now.strftime("%Y-%m-%d")
Today = now.strftime("%B %d")
#start writing from here
newfile = open("File.txt", "a")
newfile.write(InputDate+"\n")
newfile.write("hello Buddy")
newfile.close()
Put simply, write() only accepts a string. If you pass it the list returned by split() without converting it to a str, it throws TypeError: write() argument must be str, not list
I have updated the code to be more precise:
import re
from datetime import datetime
import time
now = datetime.now()
InputDate = now.strftime("%B"+" "+"%Y-%m-%d")
newfile = open("File.txt", "a")
string = "Hi trying to add a datetime at the top of the file"+"\n"
newfile.write(str(InputDate+"\n"))
newfile.write(string)
newfile.close()
Result will be:
February 2018-02-15
Hi trying to add a datetime at the top of the file
My current python script:
import ftplib
import hashlib
import httplib
import pytz
from datetime import datetime
import urllib
from pytz import timezone
import os.path, time
import glob
def ftphttp(cam_name):
    for image in glob.glob(os.path.join('/tmp/image/*.png')):
        ts = os.path.getmtime(image)
        dt = datetime.fromtimestamp(ts, pytz.utc)
        timeZone= timezone('Asia/Singapore')
        localtime = dt.astimezone(timeZone).isoformat()
        camid = cam_name(cam_name)
        tscam = camid + localtime
        ftp = ftplib.FTP('10.217.137.121','kevin403','S$ip1234')
        ftp.cwd('/var/www/html/image')
        m=hashlib.md5()
        m.update(tscam)
        dd=m.hexdigest()
        x = httplib.HTTPConnection('10.217.137.121', 8086)
        x.connect()
        f = {'ts' : localtime}
        x.request('GET','/camera/store?fn='+dd+'&'+urllib.urlencode(f)+'&cam='+cam_name(cam_name))
        y = x.getresponse()
        z=y.read()
        x.close()
        with open(image, 'rb') as file:
            ftp.storbinary('STOR '+dd+ '.png', file)
        ftp.quit()
Right now I'm able to send multiple files into another folder, but the data stored in the database is duplicated. For example, when I store 3 files in the folder, my database ends up with 6 records via httplib. Does anybody have any idea why the data is duplicated? Help needed!
import re
import string
import shutil
import os
import os.path
import time
import datetime
import math
import urllib
from array import array
import random
filehandle = urllib.urlopen('http://www.google.com/') #open webpage
s = filehandle.read() #read
print s #display
#what i plan to do with it once i get the first part working
#results = re.findall('[<td style="font-weight:bold;" nowrap>$][0-9][0-9][0-9][.][0-9][0-9][</td></tr></tfoot></table>]',s)
#earnings = '$ '
#for money in results:
#earnings = earnings + money[1]+money[2]+money[3]+'.'+money[5]+money[6]
#print earnings
#raw_input()
This is the code that I have so far. I have looked at all the other forums that give solutions such as checking the name of the script, which is parse_Money.py. I have also tried doing it with urllib.request.urlopen, and I have tried running it on Python 2.5, 2.6, and 2.7. If anybody has any suggestions, they would be really welcome. Thanks, everyone!
--Matt
---EDIT---
I also tried this code and it worked, so I'm thinking it's some kind of syntax error. If anybody with a sharp eye can point it out, I would be very appreciative.
import shutil
import os
import os.path
import time
import datetime
import math
import urllib
from array import array
import random
b = 3
#find URL
URL = raw_input('Type the URL you would like to read from[Example: http://www.google.com/] :')
while b == 3:
    #get file name
    file1 = raw_input('Enter a file name for the downloaded code:')
    filepath = file1 + '.txt'
    if os.path.isfile(filepath):
        print 'File already exists'
        b = 3
    else:
        print 'Filename accepted'
        b = 4
file_path = filepath
#open file
FileWrite = open(file_path, 'a')
#access URL
filehandle = urllib.urlopen(URL)
#display source code
for lines in filehandle.readlines():
    FileWrite.write(lines)
    print lines
print 'The above has been saved in both a text and html file'
#close files
filehandle.close()
FileWrite.close()
It appears that urlopen is available in the urllib.request module, not in the urllib module as you expect.
Rule of thumb: if you get an AttributeError, that attribute or function is not present in the module you are calling it on.
EDIT: Thanks to AndiDog for pointing out that this solution is valid for Python 3.x and does not apply to Python 2.x!
The urlopen function is actually in the urllib2 module. Try import urllib2 and use urllib2.urlopen
I see that you are using Python2 or at least intend to use Python2.
The urlopen helper function is available in both urllib and urllib2 in Python 2.
What you need to do is run the script with the correct version of Python:
C:\Python26\python.exe yourscript.py
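If the script has to run under either interpreter, the module locations mentioned in these answers can be combined in a fallback import. This is a minimal sketch of that idea, not something the answers themselves propose:

```python
import sys

try:
    # Python 3: urlopen lives in urllib.request
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlopen lives in urllib2 (and also in urllib)
    from urllib2 import urlopen

# Report which interpreter is running and where urlopen was found
print(sys.version_info[0], urlopen.__module__)
```

After this, `urlopen(URL)` can be called the same way in both versions. A script named after a standard module (for example `urllib.py`) would still shadow the real module and break the import.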