The code below downloads the pictures to my computer as .png files. However, when I try to open the images, the image viewer says: "It looks like we don't support this file format". I used another app to open the images and the problem persisted. It seems that the files are downloaded as a bytes object instead of as images.
import csv
import requests

with open('rlth.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    csvrows = csv.reader(file, delimiter=',', quotechar='"')
    for row in csvrows:
        filename = row[0]
        url = row[2]
        print(url)
        result = requests.get(url, stream=True)
        if result.status_code == 200:
            image = result.raw.read()
            open("{}.png".format(filename), "wb").write(image)
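To check what was actually saved, one can compare a file's first bytes with the PNG signature; a minimal sketch (the sample byte strings here are made up):

```python
# PNG files begin with a fixed 8-byte signature. If a downloaded file
# does not start with it, the server returned something else (for
# example an HTML error page), which image viewers cannot open.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data):
    """Return True if the byte string starts with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)

print(looks_like_png(PNG_SIGNATURE + b"rest of file"))         # True
print(looks_like_png(b"<html><body>Not Found</body></html>"))  # False
```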
I am currently reading the csv file in "rb" mode and uploading the file to an s3 bucket.
with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
All of this is working fine but now I have to validate the headers in the csv file before making the put call.
When I try to run the code below, I get an error.
with open(csv_file, 'rb') as DATA:
    csvreader = csv.reader(DATA)
    columns = next(csvreader)
    # run-some-validations
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
This throws
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
As a workaround, I have created a new function which opens the file in "r" mode and validates the csv headers, and this works OK.
def check_csv_headers():
    with open(csv_file, 'r') as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
I do not want to read the same file twice, once for header validation and once for uploading to S3. The upload part also doesn't work if I do it in "r" mode.
Is there a way I can achieve this while reading the file only once, in "rb" mode? I have to make this work using the csv module and not the pandas library.
Doing what you want is possible, but not very efficient. Simply opening a file isn't that expensive, and the CSV reader only reads one line at a time, not the entire file.
To do what you want you have to:
Read the first line as bytes
Decode it into a string (using the correct encoding)
Wrap it in a list of strings
Parse it with csv.reader and finally
Seek back to the start of the stream.
Otherwise you'll end up uploading only the data without the headers:
with open(csv_file, 'rb') as DATA:
    header = DATA.readline()
    lines = [header.decode()]
    csvreader = csv.reader(lines)
    columns = next(csvreader)
    # run-some-validations
    DATA.seek(0)
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Opening the file as text is not only simpler, it allows you to separate the validation logic from the upload code.
To ensure only one line is buffered at a time you can open the file with buffering=1 (line buffering):
def check_csv_headers():
    with open(csv_file, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations

with open(csv_file, 'rb') as DATA:
    s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Or
def check_csv_headers(filePath):
    with open(filePath, 'r', buffering=1) as file:
        csvreader = csv.reader(file)
        columns = next(csvreader)
        # run-some-validations
        # If successful
        return True

def upload_csv(filePath):
    if check_csv_headers(filePath):
        with open(filePath, 'rb') as DATA:
            s3_put_response = requests.put(s3_presigned_url, data=DATA, headers=headers)
Hi, I have created a piece of code that downloads data from an API endpoint and also loads in the API keys.
I am trying to download the API data into CSV files, each in its own folder, based on the input.csv. I tried to achieve this by adding the following section at the end, but it does not save the file it receives from the API endpoint.
Please assist?
with open('filepath/newfile.csv', 'w+') as f:
    f.write(r.text)
import csv
import sys
import requests

def query_api(business_id, api_key):
    headers = {
        "Authorization": api_key
    }
    r = requests.get('https://api.link.com', headers=headers)
    print(r.text)

# get filename from command line arguments
if len(sys.argv) < 2:
    print("input.csv")
    sys.exit(1)

csv_filename = sys.argv[1]
with open(csv_filename) as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        business_id = row['BusinessId']
        api_key = row['ApiKey']
        query_api(business_id, api_key)

with open('filepath/newfile.csv', 'w+') as f:
    f.write(r.text)
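One likely cause, for reference: r is local to query_api, so the f.write(r.text) at the bottom never sees the response. A sketch of one way to restructure, returning the text from query_api and writing one file per business id (the filename pattern is an assumption), demonstrated here with dummy data instead of a live API call:

```python
import os
import tempfile

def save_business_csv(folder, business_id, text):
    """Write the API response text to <folder>/<business_id>.csv and return the path."""
    path = os.path.join(folder, "{}.csv".format(business_id))
    with open(path, "w") as f:
        f.write(text)
    return path

# query_api would `return r.text` instead of printing it, and the loop would call:
#     save_business_csv("filepath", business_id, query_api(business_id, api_key))
folder = tempfile.mkdtemp()
path = save_business_csv(folder, "12345", "col1,col2\na,b\n")
print(path)
```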
I'm having trouble organizing my CSV file full of URLs and downloading each image per URL.
https://i.imgur.com/w1slgf6.png
It's quite a mess, but the goal is to:
Write the src of these images into a csv file, splitting each url per line.
And download each image.
from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep
import urllib.request
import pandas as pd
import requests
import urllib
import csv

# BeautifulSoup4 findAll src from img

print('Downloading URLs to file')
sleep(1)
with open('output.csv', 'w', newline='\n', encoding='utf-8') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(srcs)

print('Downloading images to folder')
sleep(1)
filename = "output"
with open("{0}.csv".format(filename), 'r') as csvfile:
    # iterate on all lines
    i = 0
    for line in csvfile:
        splitted_line = line.split(',')
        # check if we have an image URL
        if splitted_line[1] != '' and splitted_line[1] != "\n":
            urllib.request.urlretrieve(splitted_line[1], "img_" + str(i) + ".png")
            print("Image saved for {0}".format(splitted_line[0]))
            i += 1
        else:
            print("No result for {0}".format(splitted_line[0]))
Based on the limited information that you provided, I think this is the code that you need:
import requests

with open('output.csv', 'r') as file:
    oldfile = file.read()

linkslist = oldfile.replace("\n", "")  # Your file is wrongly split across lines, so remove the newlines
links = linkslist.split(",")

with open('new.csv', 'w') as file:  # Write all your links to a new file, one per line
    for link in links:
        file.write(link + "\n")

for i, link in enumerate(links):  # Save each image under a unique, numbered name
    response = requests.get(link)
    with open("image_{}.png".format(i), "wb") as file:  # Replace with whatever naming scheme you want
        file.write(response.content)
Please find explanatory comments inside the code. If you have any problem, just ask. I haven't tested it because I don't have your exact CSV, but it should work.
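A slightly more defensive way to collect the links is to let the csv module do the splitting, which copes with both comma- and newline-separated fields (a sketch; the URLs below are placeholders):

```python
import csv
import io

def extract_links(csv_text):
    """Collect every non-empty field from CSV text, whether the URLs
    are separated by commas, newlines, or both."""
    reader = csv.reader(io.StringIO(csv_text))
    return [field.strip() for row in reader for field in row if field.strip()]

links = extract_links("http://a.example/1.png,http://a.example/2.png\nhttp://a.example/3.png\n")
print(links)

# Each link would then be fetched and saved under a unique name, e.g.:
# for i, link in enumerate(links):
#     with open("img_{}.png".format(i), "wb") as f:
#         f.write(requests.get(link).content)
```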
I have 200 files with dates in the file name. I would like to add the date from the file name into a new column in each file.
I created macro in Python:
import pandas as pd
import os
import openpyxl
import csv

os.chdir(r'\\\\\\\')
for file_name in os.listdir(r'\\\\\\'):
    with open(file_name, 'r') as csvinput:
        reader = csv.reader(csvinput)
        all = []
        row = next(reader)
        row.append('FileName')
        all.append(row)
        for row in reader:
            row.append(file_name)
            all.append(row)

with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)

if file_name.endswith('.csv'):
    workbook = openpyxl.load_workbook(file_name)
    workbook.save(file_name)

csv_filename = pd.read_csv(r'\\\\\\')
csv_data = pd.read_csv(csv_filename, header=0)
csv_data['filename'] = csv_filename
Right now I get "InvalidFileException: File is not a zip file" and only the first file has the added column with the file name.
Can you please advise what I am doing wrong? BTW, I'm using Python 3.4.
Many thanks,
Lukasz
First problem, this section:
with open(file_name, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    writer.writerows(all)
should be indented so that it is included in the for loop. As written, it is only executed once, after the loop, which is why you only get one output file.
Second problem, the exception is probably caused by openpyxl.load_workbook(file_name). openpyxl can only open actual Excel files (which are .zip files with a different extension), not CSV files. Why do you want to open and save them at all? I think you can just remove those three lines.
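Putting both fixes together, the loop might look roughly like this (a sketch: the write now happens once per file inside the loop, the openpyxl lines are gone, and it is demonstrated on a temporary directory rather than the original network share):

```python
import csv
import os
import tempfile

def append_filename_column(directory):
    """Add a 'FileName' column to every CSV in `directory`, rewriting each file in place."""
    for file_name in os.listdir(directory):
        if not file_name.endswith('.csv'):
            continue
        path = os.path.join(directory, file_name)
        with open(path, 'r', newline='') as csvinput:
            rows = list(csv.reader(csvinput))
        rows[0].append('FileName')          # header row
        for row in rows[1:]:
            row.append(file_name)           # data rows get the file name
        # This write is inside the loop, so every file is rewritten.
        with open(path, 'w', newline='') as csvoutput:
            csv.writer(csvoutput, lineterminator='\n').writerows(rows)

# Demonstrated on a temporary directory with one small CSV:
d = tempfile.mkdtemp()
with open(os.path.join(d, 'report_2020-01-01.csv'), 'w', newline='') as f:
    f.write('a,b\n1,2\n')
append_filename_column(d)
print(open(os.path.join(d, 'report_2020-01-01.csv')).read())
```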
import csv
in_txt = csv.reader(open(post.text, "rb"), delimiter = '\t')
out_csv = csv.writer("C:\Users\sptechsoft\Documents\source3.csv", 'wb')
out_csv.writerows(in_txt)
When executing the above code I am getting an IO error, and I need to save the CSV in a separate folder.
csv.reader() accepts any iterable of lines, but passing a bare filename string does not work: the string would be iterated character by character. csv.writer() likewise needs an open file object, not a path string. Open both files first:
import csv
in_txt = csv.reader(open(post.text, "rb"), delimiter='\t')
out_csv = csv.writer(open(r"C:\Users\sptechsoft\Documents\source3.csv", "wb"))
out_csv.writerows(in_txt)
Try the following:
import csv
import csv

with open(post.text, "rb") as f_input, open(r"C:\Users\sptechsoft\Documents\source3.csv", "wb") as f_output:
    in_csv = csv.reader(f_input, delimiter='\t')
    out_csv = csv.writer(f_output)
    out_csv.writerows(in_csv)
csv.reader() and csv.writer() need either a list or a file object; they cannot open the file for you. Using with ensures the files are correctly closed automatically afterwards.
Also, do not forget to prefix your path string with r to disable string escaping due to the backslashes.
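Since csv.reader() accepts any iterable of lines, you can sanity-check the parsing with a plain list of strings before touching real files:

```python
import csv

# A list of tab-separated lines stands in for the file object.
lines = ["name\tage", "alice\t30", "bob\t25"]
rows = list(csv.reader(lines, delimiter='\t'))
print(rows)  # [['name', 'age'], ['alice', '30'], ['bob', '25']]
```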