Using GDAL in Python to produce TIFF files from CSV files

I have many csv files with this format:
Latitude,Longitude,Concentration
53.833399,-122.825257,0.021957
53.837893,-122.825238,0.022642
....
My goal is to produce GeoTIFF files from the information in these files (one TIFF file per CSV file), preferably using Python. This was done several years ago on the project I am working on, but how it was done has since been lost. All I know is that they most likely used GDAL.
I have attempted to do this by researching how to use GDAL, but it has not got me anywhere; the resources are limited and I have no experience with the library.
Can someone help me with this?

Here is a little code I adapted for your case. You need to have the GDAL directory containing all the *.exe utilities added to your PATH for it to work (in most cases it is C:\Program Files (x86)\GDAL).
It uses the gdal_grid.exe utility (see the documentation here: http://www.gdal.org/gdal_grid.html).
You can modify the gdal_cmd variable as you wish to suit your needs.
import subprocess
import os

# your directory with all your csv files in it
dir_with_csvs = r"C:\my_csv_files"

# make it the active directory
os.chdir(dir_with_csvs)

# function to get the csv filenames in the directory
def find_csv_filenames(path_to_dir, suffix=".csv"):
    filenames = os.listdir(path_to_dir)
    return [filename for filename in filenames if filename.endswith(suffix)]

# get the filenames
csvfiles = find_csv_filenames(dir_with_csvs)

# loop through each CSV file
# for each CSV file, make an associated VRT file to be used with the gdal_grid command
# and then run the gdal_grid utility in a subprocess instance
for fn in csvfiles:
    vrt_fn = fn.replace(".csv", ".vrt")
    lyr_name = fn.replace('.csv', '')
    out_tif = fn.replace('.csv', '.tiff')
    with open(vrt_fn, 'w') as fn_vrt:
        fn_vrt.write('<OGRVRTDataSource>\n')
        fn_vrt.write('\t<OGRVRTLayer name="%s">\n' % lyr_name)
        fn_vrt.write('\t\t<SrcDataSource>%s</SrcDataSource>\n' % fn)
        fn_vrt.write('\t\t<GeometryType>wkbPoint</GeometryType>\n')
        fn_vrt.write('\t\t<GeometryField encoding="PointFromColumns" x="Longitude" y="Latitude" z="Concentration"/>\n')
        fn_vrt.write('\t</OGRVRTLayer>\n')
        fn_vrt.write('</OGRVRTDataSource>\n')
    gdal_cmd = 'gdal_grid -a invdist:power=2.0:smoothing=1.0 -zfield "Concentration" -of GTiff -ot Float64 -l %s %s %s' % (lyr_name, vrt_fn, out_tif)
    subprocess.call(gdal_cmd, shell=True)
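If you would rather stay inside Python instead of shelling out, recent GDAL versions also expose the same functionality through the osgeo bindings as gdal.Grid(). Here is a minimal sketch of the equivalent in-process call, assuming the osgeo package is installed (keyword names may differ slightly between GDAL versions, so check gdal.GridOptions for your build):

from osgeo import gdal

# rough in-process equivalent of the gdal_grid command above (sketch, not the original workflow)
gdal.Grid(out_tif, vrt_fn,
          algorithm='invdist:power=2.0:smoothing=1.0',
          zfield='Concentration',
          format='GTiff',
          outputType=gdal.GDT_Float64)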

Limit on bz2 file decompression using python?

I have numerous files compressed in the bz2 format and I am trying to decompress them into a temporary directory using Python so that I can then analyze them. There are hundreds of thousands of files, so manually decompressing them isn't feasible, and I wrote the following script.
My issue is that whenever I try this, the decompressed files are capped at 900 KB, even though manual decompression gives each file a size of around 6 MB. I am not sure whether this is a flaw in my code and in how I save the data as a string before copying it to the file, or a problem with something else. I have tried this with different files and I know it works for files smaller than 900 KB. Has anyone else had a similar problem and knows of a solution?
My code is below:
import numpy as np
import bz2
import os
import glob

def unzip_f(filepath):
    '''
    Input a filepath specifying a group of Himawari .bz2 files with common names
    Outputs the paths of all the temporary files that have been uncompressed
    '''
    cpath = os.getcwd()  # get current path
    filenames_ = []  # list to add filenames to for future use
    for zipped_file in glob.glob(filepath):  # loop over the files that meet the name criteria
        with bz2.BZ2File(zipped_file, 'rb') as zipfile:  # read in the bz2 file
            newfilepath = cpath + '/temp/' + zipped_file[-47:-4]  # path of the temporary file
            with open(newfilepath, "wb") as tmpfile:  # open the temporary file
                for i, line in enumerate(zipfile.readlines()):
                    tmpfile.write(line)  # write the data from the compressed file to the temporary file
            filenames_.append(newfilepath)
    return filenames_

path_ = 'test/HS_H08_20180930_0710_B13_FLDK_R20_S*bz2'
unzip_f(path_)
It returns the correct file paths, but the file sizes are wrong, capped at 900 KB.
It turns out this issue is due to the files being multi-stream bz2 archives, which Python 2.7's bz2 module does not support. There is more info here, as mentioned by jasonharper, and here. Below is a workaround that simply uses the Unix bzip2 command to decompress the files and then moves them into the temporary directory I want. It is not as pretty, but it works.
import numpy as np
import os
import glob
import shutil

def unzip_f(filepath):
    '''
    Input a filepath specifying a group of Himawari .bz2 files with common names
    Outputs the paths of all the temporary files that have been uncompressed
    '''
    cpath = os.getcwd()  # get current path
    filenames_ = []  # list to add filenames to for future use
    for zipped_file in glob.glob(filepath):  # loop over the files that meet the name criteria
        newfilepath = cpath + '/temp/'  # temporary directory to move the decompressed files into
        newfilename = newfilepath + zipped_file[-47:-4]
        os.popen('bzip2 -kd ' + zipped_file)
        shutil.move(zipped_file[-47:-4], newfilepath)
        filenames_.append(newfilename)
    return filenames_

path_ = 'test/HS_H08_20180930_0710_B13_FLDK_R20_S0*bz2'
unzip_f(path_)
This is a known limitation in Python 2, where the BZ2File class doesn't support multiple streams.
It can be easily resolved by using bz2file (https://pypi.org/project/bz2file/), which is a backport of the Python 3 implementation and can be used as a drop-in replacement.
After running pip install bz2file you can just replace bz2 with it:
import bz2file as bz2, and everything should just work :)
The original Python bug report: https://bugs.python.org/issue1625
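As a rough sketch of the drop-in replacement (the file names below are placeholders, not your actual Himawari files), reading through bz2file picks up every stream of a multi-stream archive:

import bz2file as bz2  # drop-in replacement for the stdlib bz2 module
import shutil

# decompress one multi-stream .bz2 file; copyfileobj streams it in chunks
with bz2.BZ2File('example.dat.bz2', 'rb') as src:
    with open('example.dat', 'wb') as dst:
        shutil.copyfileobj(src, dst)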

Converting scad file format to stl in Python

Is there a way of converting SCAD files to STL format efficiently in Python? I have around 3000 files to be converted to STL. Plus, there are some different formats.
I tried searching on the internet for some libraries but was not able to find a suitable one (I am using Windows). Does anyone have any idea?
You can run OpenSCAD from the command line (see the documentation) and prepare every command in Python (example in Python 3):
from os import listdir
from subprocess import call

files = listdir('.')
for f in files:
    if f.endswith(".scad"):  # get all .scad files in the directory
        of = f.replace('.scad', '.stl')  # name of the output .stl file
        call(["openscad", "-o", of, f])  # run the OpenSCAD command
In Python 3.5 and higher, subprocess.call() can be replaced by subprocess.run().
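For example, a minimal sketch of the same loop body with run(), using the same variable names as above (check=True makes a failed OpenSCAD call raise an error):

from subprocess import run

run(["openscad", "-o", of, f], check=True)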

Move certain files from specific subdirectories into a new directory

How do I move only certain files (not all files), from specific subdirectories (not all subdirectories), into a new directory?
The files that need to be moved are listed in a CSV file with their absolute paths and number about 85,000. All the files have the same extension, i.e. .java. The number of specific subdirectories is about 13,000.
Is there a Python script (preferred) or a Shell script to do this?
N.B: The forums that I searched on returned solutions on how to move all files from within a single subdirectory into a new directory. They are mentioned below:
https://www.daniweb.com/programming/software-development/threads/473187/copymove-all-sub-directories-from-a-folder-to-another-folder-using-python
http://pythoncentral.io/how-to-copy-a-file-in-python-with-shutil/
Filter directory when using shutil.copytree?
https://unix.stackexchange.com/questions/207375/copy-certain-files-from-specified-subdirectories-into-a-separate-subdirectory
Assuming that your CSV file looks like this:
George,/home/geo/mtvernon.java,Betsy
Abe,/home/honest/gettys.java,Mary
You could move the files using this shell command:
$ cut -d, -f2 < x.csv | xargs -I '{}' mv '{}' /home/me/new-directory
Something like this might work for you:
import os
import csv

def move_file(old_file_path, new_directory):
    if not os.path.isdir(new_directory):
        os.mkdir(new_directory)
    base_name = os.path.basename(old_file_path)
    new_file_path = os.path.join(new_directory, base_name)
    os.rename(old_file_path, new_file_path)

def parse_csv_file(csv_path):
    csv_file = open(csv_path)
    csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"')
    # flatten the rows so we end up with a plain list of paths
    paths = [path for row in csv_reader for path in row]
    csv_file.close()
    return paths

if __name__ == '__main__':
    old_file_paths = parse_csv_file('your_csv_path')
    for old_file_path in old_file_paths:
        move_file(old_file_path, 'your_new_directory')
This assumes that your CSV file only contains paths, delimited by commas, and all of those files exist.
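If your CSV actually has three columns as in the example above, with the path in the second field, a variant of parse_csv_file along these lines (a sketch, assuming exactly that layout) would pull out just the paths:

def parse_csv_file(csv_path):
    # variant for rows like "George,/home/geo/mtvernon.java,Betsy"
    with open(csv_path) as csv_file:
        return [row[1] for row in csv.reader(csv_file)]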

How can I run a python script on many files to get many output files?

I am new to programming and I have written a script to extract text from a vcf file. I am using a Linux virtual machine running Ubuntu. I have run this script from the command line by changing to the directory containing the vcf file and then entering python script.py.
My script knows which file to process because the beginning of my script is:
my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list and then writes it to the output file. However, I have many input files (all .vcf) and want to write each one to a different output file with a name similar to the input (such as input_processed.txt).
Do I need to run a shell script to iterate over the files in the folder? If so, how would I change the Python script to accommodate this, i.e. writing the list to an output file?
I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.
import glob
import os

# Find all files ending in 'vcf'
for vcf_filename in glob.glob('*.vcf'):
    vcf_file = open(vcf_filename, 'r+')
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    outputfile = open(output_filename, 'w')
    # Process the data
    ...
To output the resulting files in a separate directory I would:
import glob
import os

output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)

# Find all files ending in 'vcf'
for vcf_filename in glob.glob('*.vcf'):
    vcf_file = open(vcf_filename, 'r+')
    # Similar name with a different extension
    output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
    outputfile = open(os.path.join(output_dir, output_filename), 'w')
    # Process the data
    ...
You don't need to write a shell script; maybe this question will help you: How to list all files of a directory?
It depends on how you implement the iteration logic.
If you want to implement it in Python, just do it.
If you want to implement it in a shell script, change your Python script to accept parameters, and then use the shell script to call the Python script with suitable parameters.
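For example, a minimal sketch of that second approach (the script name and the '_processed' suffix are just placeholders for your own choices):

# process_one.py -- hypothetical wrapper that takes the input file as a command-line argument
import os
import sys

in_path = sys.argv[1]
out_path = os.path.splitext(in_path)[0] + '_processed.txt'

with open(in_path) as infile, open(out_path, 'w') as outfile:
    # your existing processing logic goes here
    outfile.write(infile.read())

You could then drive it from the shell with something like for f in *.vcf; do python process_one.py "$f"; done.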
I have a script I frequently use which includes using PyQt5 to pop up a window that prompts the user to select a file... then it walks the directory to find all of the files in the directory:
pathname = first_fname[:(first_fname.rfind('/') + 1)] #figures out the pathname by finding the last '/'
new_pathname = pathname + 'for release/' #makes a new pathname to be added to the names of new files so that they're put in another directory...but their names will be altered
file_list = [f for f in os.listdir(pathname) if f.lower().endswith('.xls') and not 'map' in f.lower() and not 'check' in f.lower()] #makes a list of the files in the directory that end in .xls and don't have key words in the names that would indicate they're not the kind of file I want
You need to import os to use the os.listdir command.
You can use listdir (you need to write a condition to filter for the particular extension) or glob. I generally prefer glob. For example:
import os
import glob

for file in glob.glob('*.py'):
    data = open(file, 'r+')
    output_name = os.path.splitext(file)[0]
    output = open(output_name + '.txt', 'w')
    output.write(data.read())
This code reads the content from each input file and stores it in the corresponding output file.

Using Biopython SeqIO.convert over an entire directory

I have 51 files with metagenomic sequence data that I would like to convert from fastq to fasta using a Biopython script on Windows. The SeqIO.convert function easily converts an individually specified file, but I can't figure out how to convert the entire directory. It's not really too many files to do individually, but I'm trying to learn.
I'm brand new to Biopython, so please forgive my ignorance. This conversation was helpful, but I'm still not able to convert the directory from fastq to fasta.
Here's the code I've been trying to run:
#modules
import sys
import re
import os
import fileinput
from Bio import SeqIO

#define directory
Directory = "FastQ"

#convert files
def process(filename):
    return SeqIO.convert(filename, "fastq", "files.fa", filename + ".fasta", "fasta", alphabet=IUPAC.ambiguous_dna)
You need to iterate over the files in the directory and convert them, so assuming your directory is FastQ and that you are calling your script from the proper folder (i.e. the one that your directory is in, since you are using a relative path), you would need to do something like:
def process(directory):
    # note: alphabet=IUPAC.ambiguous_dna also requires "from Bio.Alphabet import IUPAC"
    for f in os.listdir(directory):
        if f.endswith(".fastq"):
            # os.listdir returns bare file names, so join them with the directory
            in_path = os.path.join(directory, f)
            SeqIO.convert(in_path, "fastq", in_path.replace(".fastq", ".fasta"), "fasta", alphabet=IUPAC.ambiguous_dna)
then you would call your script in your main:
my_directory = "FastQ"
process(my_directory)
I think that should work.
