Using files I downloaded with python - python

So I want to download a bunch of clinical trial information from clinicaltrials.gov. They have a system that lets you download search results using a custom URL. The URL format is https://clinicaltrials.gov/ct2/results/download_fields?cond=&term=genentech&locn=pennsylvania&down_count=1000&down_fmt=xml
First of all, how do I download that file using Python? I'm assuming it's something like
file = requests.get('https://clinicaltrials.gov/ct2/results/download_fields?cond=&term=genentech&locn=pennsylvania&down_count=1000&down_fmt=xml')
Then can I also rename the file and put it in my working directory?
In the end I would like to process about three to four hundred downloads and parse the files for certain information. I think I can handle that part, but getting all the files into my working directory is what I'm having trouble with right now.
Any help would be greatly appreciated.
Thanks!
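
A minimal sketch of the download-and-save step with requests, assuming the URL above still returns an XML payload; genentech_pa.xml is just an example name for the output file:

import requests

url = ('https://clinicaltrials.gov/ct2/results/download_fields'
       '?cond=&term=genentech&locn=pennsylvania&down_count=1000&down_fmt=xml')

response = requests.get(url)
response.raise_for_status()  # fail loudly if the request did not succeed

# Write the response body to a file of your choosing in the current working directory
with open('genentech_pa.xml', 'wb') as f:
    f.write(response.content)

The same pattern, with a different value filled into the term/locn parameters and a different output name each time, covers the three to four hundred downloads.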

Related

Pulling files from real devices in appium iOS

I'm having a difficult time trying to pull files and folders in one of my automated tests using Appium. We use real devices for testing, and I would like to use driver.pull_file() to accomplish this task. The files I want exist in the On My iPad folder, and I cannot figure out how to get the file path of the actual file in that location on the device.
Does anyone know where exactly I can find the right path, or what it would look like?
How to get the file path of a file on iOS.
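
A hedged sketch of what the pull could look like with the XCUITest driver, assuming the app that owns the files exposes its Documents folder to the Files app (that is what shows up under On My iPad); com.example.myapp and report.pdf are placeholder names, and driver is an already-created Appium session:

import base64

# With the XCUITest driver, files inside an app sandbox are addressed as
# @<bundle_id>:<container>/<relative_path>. The app's Documents container is
# what appears under "On My iPad/<App Name>" when the app enables file sharing.
remote_path = '@com.example.myapp:documents/report.pdf'

data_b64 = driver.pull_file(remote_path)   # returns the file contents base64-encoded
with open('report.pdf', 'wb') as f:
    f.write(base64.b64decode(data_b64))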

How to extract all files from a p7m file

I have a bunch of p7m files (used to digitally sign other files, usually PDFs) and I would like some help finding a way to extract the content. I know how to iterate a process over the files in a folder using Python; I need help just with the extraction part.
I tried PyPDF2.PdfFileReader.decrypt(), but I get an "EOF marker not found" error, because apparently PyPDF2 cannot manage encrypted files.
I saw somebody used the mime library, but that is honestly way above my level.
Thank you
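
One common approach (a sketch, not the mime-library route mentioned above) is to shell out to OpenSSL, which can strip the CMS/PKCS#7 envelope and recover the embedded file. This assumes the openssl binary is installed and that the .p7m files are DER-encoded; signed_files is a placeholder folder name:

import subprocess
from pathlib import Path

def extract_p7m(p7m_path: Path) -> Path:
    """Strip the PKCS#7 envelope and write the embedded file next to the .p7m."""
    out_path = p7m_path.with_suffix('')  # e.g. contract.pdf.p7m -> contract.pdf
    subprocess.run(
        ['openssl', 'smime', '-verify', '-noverify',
         '-in', str(p7m_path), '-inform', 'DER',
         '-out', str(out_path)],
        check=True,
    )
    return out_path

for p7m_file in Path('signed_files').glob('*.p7m'):
    extract_p7m(p7m_file)

The -noverify flag skips checking the signer's certificate, which keeps the sketch simple but means the signature itself is not validated.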

Python - How to download a file with given name from online repository (given its URL)

I currently have a main Python script that works by analyzing a csv file present in its local working folder. To automate the analysis of more than one csv file, I'm trying to build another script that performs the following tasks:
1. Download into the local working folder a csv file, identified by its name among the many in an online repository (a OneDrive folder) for which I have the URL (the URL points to the OneDrive folder, not directly to the file).
2. Run the main script and analyze it.
3. Remove the analyzed csv file from the local folder and repeat the process.
I'm having some issues with the identification and download of the csv files.
I've seen some approaches using the 'requests' module, but they were more about downloading a file directly from a given URL, not looking it up and fetching it from an online repository. For this reason I'm not even sure how to start here.
What I'm looking for is something like:
url = 'https://1drv.ms/xxxxxxxxx'
file_name = 'title.csv'
# -> Download(link = url, file = file_name)
Thanks in advance to anyone who'll take some time to read this! :)
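
A minimal sketch of the download step only, assuming you can obtain a direct download URL for each file; resolving a OneDrive folder link into per-file links generally goes through the Microsoft Graph API and is not shown here. The URL and file name below are placeholders:

import requests

def download(link: str, file: str) -> None:
    """Download `link` and save it locally under the name `file`."""
    response = requests.get(link)
    response.raise_for_status()
    with open(file, 'wb') as f:
        f.write(response.content)

# Hypothetical direct-download URL for one csv in the OneDrive folder
download('https://example.com/direct/link/title.csv', 'title.csv')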

How to download a continuously updated zip file from a website with Selenium and then unzip it for specific file names

I have used Selenium with Python to download a zip file daily, but I am currently facing a few issues after downloading it to my local Downloads folder.
Is it possible to use Python to read those files dynamically? Let's say the date in the file name is always different. Can we simply add a wildcard (*)? I am trying to move the file from the Downloads folder to another folder, but it always requires me to name the file in full.
How do I unzip a file and look for specific files inside? Let's say those files will always have names like "ABC202103xx.csv".
Much appreciated for your help! Any sample code would be truly appreciated!
Not knowing the exact name of a file in a local folder should usually not be a problem. You could just list all filenames in the local folder and then use a for loop to find the filename you need. For example, let's assume that you have downloaded a zip file into a Downloads folder and you know it is named "file-X.zip", with X being some date.
import os

for filename in os.listdir("Downloads"):
    if filename.startswith("file-") and filename.endswith(".zip"):
        filename_you_are_looking_for = filename
        break
To unzip files, I will refer you to this Stack Overflow thread. Again, to look for specific files once they are extracted, you can use os.listdir in the same way (see the sketch below).
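
A short sketch tying both parts together; instead of extracting everything and using os.listdir, it checks member names with ZipFile.namelist() before extracting. It assumes the archive lands in a Downloads folder, that "ABC202103*.csv" is the pattern you care about, and that a processed folder already exists (all placeholders):

import fnmatch
import shutil
import zipfile
from pathlib import Path

# Pick the lexicographically latest "file-*.zip" in Downloads, which works
# for date-stamped names without knowing the exact date
zip_path = max(Path("Downloads").glob("file-*.zip"), key=lambda p: p.name)

# Extract only the members whose names match the pattern you care about
with zipfile.ZipFile(zip_path) as archive:
    for member in archive.namelist():
        if fnmatch.fnmatch(member, "ABC202103*.csv"):
            archive.extract(member, "extracted")

# Move the processed archive out of the Downloads folder
shutil.move(str(zip_path), "processed")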

How to detect and separate Corrupt/Unreadable PDFs and password protected PDFs from a directory using python?

I have a directory containing about 100,000 multipage PDFs.
I want to separate corrupt/unreadable and password-protected PDFs from this directory using Python.
I need a good and fast solution, as I might need to do this for a larger number of files in the future.
Thanks in advance.
You can try PyPDF2. Loop over all files in the directory using os.listdir(), try opening each one, and store the name of each one that gives you an error. You can also move the files into two different directories, depending on whether opening a file raises an error, using a simple try/except (see the sketch below).
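
A rough sketch of that idea, assuming a recent PyPDF2 where the reader class is PdfReader and encryption is exposed as is_encrypted, and that the pdfs, protected and corrupt folders already exist (all placeholder names):

import os
import shutil
from PyPDF2 import PdfReader

src_dir = "pdfs"  # directory holding the ~100,000 PDFs

for name in os.listdir(src_dir):
    if not name.lower().endswith(".pdf"):
        continue
    path = os.path.join(src_dir, name)
    try:
        reader = PdfReader(path)
        if reader.is_encrypted:
            shutil.move(path, os.path.join("protected", name))
    except Exception:
        # Anything PyPDF2 cannot parse is treated here as corrupt/unreadable
        shutil.move(path, os.path.join("corrupt", name))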
