I have written this code that takes a URL and downloads the image. I was wondering how I can save the images to a specific folder in the same directory.
For example, I have a folder named images in the same directory, and I want to save all the images to that folder.
Below is my code:
import requests
import csv

with open('try.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for rows in csv_reader:
        image_resp = requests.get(rows[0])
        with open(rows[1], 'wb') as image_downloader:
            image_downloader.write(image_resp.content)
Looking for this?
import os

with open(os.path.join("images", rows[1]), 'wb') as fd:
    for chunk in image_resp.iter_content(chunk_size=128):
        fd.write(chunk)
See the official docs here: os.path.join
The Requests-specific parts are covered at https://2.python-requests.org/en/master/user/quickstart/#raw-response-content
PS: You might want to use a connection pool and/or multiprocessing instead of the plain for rows in csv_reader loop, in order to run many requests concurrently.
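As a rough sketch of that idea (a thread pool with a shared Session; the worker count and the images-folder handling are my own assumptions, not part of the original code):

import concurrent.futures
import csv
import os
import requests

session = requests.Session()  # a shared Session reuses TCP connections across requests

def download(row):
    # row[0] is the URL and row[1] the target file name, as in the question
    resp = session.get(row[0])
    with open(os.path.join("images", row[1]), 'wb') as fd:
        fd.write(resp.content)

os.makedirs("images", exist_ok=True)
with open('try.csv', 'r') as csv_file:
    rows = list(csv.reader(csv_file))

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(download, rows)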
Saving an image can be achieved by using the .save() method of the Pillow library's Image module.
Ex.
from PIL import Image
file = "C://Users/ABC/MyPic.jpg" # Selects a jpg file 'MyPic'
img = Image.open(file)
img.save('Desktop/MyPics/new_img.jpg') # Saves it to MyPics folder as 'new_img'
Image.save(fp, format=None, **params):
Saves this image under the given filename. If no format is specified, the format to use is determined from the filename extension, if possible.
Keyword options can be used to provide additional instructions to the writer. If a writer doesn’t recognise an option, it is silently ignored. The available options are described in the image format documentation for each writer.
You can use a file object instead of a filename. In this case, you must always specify the format. The file object must implement the seek, tell, and write methods, and be opened in binary mode.
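For instance, a minimal sketch of the file-object variant (the file names here are made up for illustration):

from PIL import Image

img = Image.open("MyPic.jpg")
# There is no filename extension to infer from, so the format must be given explicitly.
with open("new_img.png", "wb") as fp:
    img.save(fp, format="PNG")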
I am new here, trying to solve one of my interesting questions in World of Tanks. I heard that every battle's data is stored on the client's disk in the Wargaming.net folder, and I want to do a batch of data analysis on our clan's battle performances.
It is said that these .dat files are a kind of JSON file, so I tried to read one with a couple of lines of Python, but failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails. Well, could anyone tell me what these files actually contain?
Added on Feb. 9th, 2022
After I tried another snippet via a Jupyter Notebook, it seems like something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of signed byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib

file = '4402905758116487.dat'
cache_file = open(file, 'rb')  # This can be improved to not keep the file opened.

# These are Python 2 pickles, so in Python 3 you need the 'bytes' or 'latin1' encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw

# The data stored inside the pickled file is a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')

# Lastly you can print all of these and see a lot of data inside.
The result contains a mixture of more binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First, you can look at the replay file itself in a text editor, though it won't show the binary data at the beginning of the file, which has to be cleaned out. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. THEN comes the part that holds the actual replay. You don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.
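For instance, a hypothetical sketch of looking up vehicle data through the public Wargaming API (you would need your own application_id from their developer portal; the field list here is illustrative):

import requests

APP_ID = "your_application_id"  # hypothetical; issued by the Wargaming developer portal
resp = requests.get(
    "https://api.worldoftanks.com/wot/encyclopedia/vehicles/",
    params={"application_id": APP_ID, "fields": "name,tier"},
)
vehicles = resp.json()["data"]  # a dict keyed by tank_id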
After loading the pickle files as gabzo mentioned, you will see that it is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (e.g. with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount in gabzo's code) always has a checksum as its first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled Python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I have a folder with 452 images (.png) that I'm trying to merge into a single PDF file using Python. Each image is labelled with its intended page number, e.g. "1.png", "2.png", ....., "452.png".
This code was technically successful, but inserted the pages out of the intended order.
import os
import img2pdf
from PIL import Image

with open("output.pdf", 'wb') as f:
    f.write(img2pdf.convert([i for i in os.listdir('.') if i.endswith(".png")]))
I also tried reading the data as binary data, then converting it and writing it to the PDF, but this yields a 94 MB one-page PDF.
import img2pdf
from PIL import Image

with open("output.pdf", 'wb') as f:
    for i in range(1, 453):
        img = Image.open(f"{i}.png")
        pdf_bytes = img2pdf.convert(img)
        f.write(pdf_bytes)
Any help would be appreciated, I've done quite a bit of research, but have come up short. Thanks in advance.
but inserted the pages out of the intended order
I suspect that the intended order is "in numerical order of file name", i.e. 1.png, 2.png, 3.png, and so forth.
This can be solved with:
with open("output.pdf", 'wb') as f:
f.write(img2pdf.convert(sorted([i for i in os.listdir('.') if i.endswith(".png")], key=lambda fname: int(fname.rsplit('.',1)[0]))))
This is a slightly modified version of your first attempt that just sorts the file names numerically (in the way your second attempt tries to do) before batch-writing them to the PDF.
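To see why the key function matters, compare plain lexicographic sorting with the numeric key on a small sample:

names = ['10.png', '2.png', '1.png']
print(sorted(names))
# ['1.png', '10.png', '2.png'] -- lexicographic: '10' sorts before '2'
print(sorted(names, key=lambda fname: int(fname.rsplit('.', 1)[0])))
# ['1.png', '2.png', '10.png'] -- numeric page order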
I have old code below that gzips a file and stores it as JSON in S3, using the io library (so a file does not get saved locally). I am having trouble converting this same approach (i.e. using the io library for a buffer) to create a .txt file, push it into S3, and later retrieve it. I know how to create .txt files and push them into S3 as well, but not how to use io in the process.
The value I want stored in the text file is just a variable with the string value 'test'.
Goal: Use the io library to save a string variable as a text file in S3 and be able to pull it down again.
import io
import gzip
import json

x = 'test'

inmemory = io.BytesIO()
with gzip.GzipFile(fileobj=inmemory, mode='wb') as fh:
    with io.TextIOWrapper(fh, encoding='utf-8', errors='replace') as wrapper:
        wrapper.write(json.dumps(x, ensure_ascii=False, indent=2))
inmemory.seek(0)
s3_resource.Object(s3bucket, s3path + '.json.gz').upload_fileobj(inmemory)
inmemory.close()
Also, I'd appreciate any documentation anyone likes with specific respect to the io library and writing to files, because the official documentation (e.g. f = io.StringIO("some initial text data"), https://docs.python.org/3/library/io.html) just did not give me enough at my current level.
This is essentially a duplicate.
For the sake of brevity: it turns out there's a way to override the putObject call so that it takes a string of text instead of a file.
The original post is answered in Java, but this additional thread should be sufficient for a Python-specific answer.
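In boto3 terms, a minimal sketch of the same idea (the bucket and key names here are made up, and credentials are assumed to be configured): write the string into an in-memory buffer, upload it as a .txt object, then pull it back down into another buffer.

import io
import boto3

s3_resource = boto3.resource('s3')
x = 'test'

# Upload: encode the string and hand S3 the in-memory buffer.
inmemory = io.BytesIO(x.encode('utf-8'))
s3_resource.Object('my-bucket', 'myfile.txt').upload_fileobj(inmemory)

# Download: fill a fresh buffer and decode it back to a string.
buffer = io.BytesIO()
s3_resource.Object('my-bucket', 'myfile.txt').download_fileobj(buffer)
print(buffer.getvalue().decode('utf-8'))  # 'test'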
I am asking this question for Moritz:
I managed to run the programme (ubuntu-linux-tested branch) and upload a .uff file. When I perform the analysis it also works decently, but how do I extract the modal parameters (eigenfrequencies, damping, and mode shapes)? When I click on export and select the analysis results, it saves an almost empty .uff file (see the .txt file).
Thank you in advance!!
Currently, modal vectors are not yet exported to uff; you would have to open the saved *.mdd file with Python (an mdd file is actually pickled data). Within the data you will find geometry data, modal vectors, etc.
Here is the Python code to do this (see also: https://github.com/openmodal/OpenModal/issues/31):
import OpenModal as OM
import pickle
import sys
import pandas as pd

# In order to reload modules to their new location when unpickling files, see:
# https://stackoverflow.com/questions/13398462/unpickling-python-objects-with-a-changed-module-path
sys.modules['openModal'] = OM

file_name = r'beam_accel.mdd'
f = open(file_name, 'rb')
data = pickle.load(f)

writer = pd.ExcelWriter('output.xlsx')
data[0].tables['analysis_index'].astype(str).to_excel(writer, 'index')
data[0].tables['analysis_values'].astype(str).to_excel(writer, 'values')
writer.save()
I have the following code:
import os
from pyPdf import PdfFileReader, PdfFileWriter

path = "C:/Real Python/Course materials/Chapter 12/Practice files"
input_file_name = os.path.join(path, "Pride and Prejudice.pdf")
input_file = PdfFileReader(file(input_file_name, "rb"))

output_PDF = PdfFileWriter()
for page_num in range(1, 4):
    output_PDF.addPage(input_file.getPage(page_num))

output_file_name = os.path.join(path, "Output/portion.pdf")
output_file = file(output_file_name, "wb")
output_PDF.write(output_file)
output_file.close()
Until now I was just reading from PDFs, and later I learned to write from PDF to txt... But now this...
Why does PdfFileReader differ so much from PdfFileWriter?
Can someone explain this? I would expect something like:
import os
from pyPdf import PdfFileReader, PdfFileWriter

path = "C:/Real Python/Course materials/Chapter 12/Practice files"
input_file_name = os.path.join(path, "Pride and Prejudice.pdf")
input_file = PdfFileReader(file(input_file_name, "rb"))
output_file_name = os.path.join(path, "out Pride and Prejudice.pdf")
output_file = PdfFileWriter(file(output_file_name, "wb"))

for page_num in range(1, 4):
    page = input_file.getPage(page_num)
    output_file.addPage(page)
    output_file.write(page)
Any help???
Thanks
EDIT 0: What does .addPage() do?
for page_num in range(1, 4):
    output_PDF.addPage(input_file.getPage(page_num))
Does it just create 3 BLANK pages?
EDIT 1: Can someone explain what happens when:
1) output_PDF = PdfFileWriter()
2) output_PDF.addPage(input_file.getPage(page_num))
3) output_PDF.write(output_file)
The 3rd one passes a JUST CREATED(!) object to output_PDF. Why?
The issue is basically the PDF cross-reference table.
It's a somewhat tangled spaghetti monster of references to pages, fonts, objects, and elements, and these all need to link together to allow for random access.
Each time a file is updated, it needs to rebuild this table. The file is created in memory first so this only has to happen once, further decreasing the chances of torching your file.
output_PDF = PdfFileWriter()
This creates the space in memory for the PDF to go into (to be filled from your old PDF).
output_PDF.addPage(input_file.getPage(page_num))
This adds the page you want from your input PDF to the PDF file created in memory.
output_PDF.write(output_file)
Finally, this writes the object stored in memory to a file, building the header and cross-reference table and linking everything together all hunky-dory.
Edit: Presumably, passing the just-created file object to write() signals pyPdf to start building the appropriate tables and link things together.
--
In response to the "why", versus .txt and CSV:
When you're copying from a text or CSV file, there are no existing data structures to comprehend and move around to make sure things like formatting, image placement, and form data (input sections, etc.) are preserved and created properly.
Most likely, it's done because PDFs aren't exactly linear: the "header" is actually at the end of the file.
If the file were written to disk every time a change was made, your computer would need to keep pushing that data around on the disk. Instead, the module (probably) stores the information about the document in an object (PdfFileWriter), and then converts that data into your actual PDF file when you request it.
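As an illustration of the same read-accumulate-write pattern, here is a minimal sketch using the modern pypdf package (the successor to pyPdf; method names differ from the question's code):

from pypdf import PdfReader, PdfWriter

reader = PdfReader("Pride and Prejudice.pdf")
writer = PdfWriter()

# Pages accumulate in the writer object, in memory...
for page_num in range(1, 4):
    writer.add_page(reader.pages[page_num])

# ...and the file, including its cross-reference table, is built once at the end.
with open("portion.pdf", "wb") as output_file:
    writer.write(output_file)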