Check if GZIP file exists in Python

I would like to check for the existence of a .gz file on my Linux machine while running Python. If I do this for a text file, the following code works:
import os.path
os.path.isfile('bob.asc')
However, if bob.asc is in gzip format (bob.asc.gz), Python does not recognize it. Ideally, I would like to use os.path.isfile or something very concise (without writing new functions). Is it possible to make the file recognizable either in Python or by changing something in my system configuration?
Unfortunately I can't change the data format or the file names as they are being given to me in batch by a corporation.

After fooling around for a bit, the most concise way I could get the job done was
subprocess.call(['ls','bob.asc.gz']) == 0
which returns True if the file exists in the directory. This is the behavior I would expect from
os.path.isfile('bob.asc.gz')
but for some reason Python won't accept files with extension .gz as files when passed to os.path.isfile.
I don't feel like my solution is very elegant, but it is concise. If someone has a more elegant solution, I'd love to see it. Thanks.

Of course it doesn't; they are completely different files. You need to test it separately:
os.path.isfile('bob.asc.gz')
This would return True if that exact file was present in the current working directory.
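Alternatively, a workaround that accepts any final extension could be:
from os import listdir, getcwd
from os.path import splitext, basename
# True if some file in the current directory is 'bob.asc' plus one more extension, e.g. bob.asc.gz
any(splitext(basename(f))[0] == 'bob.asc' for f in listdir(getcwd()))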
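A small self-contained check of both points in this answer, as a sketch: it creates a throwaway bob.asc.gz in a temporary directory (the directory and the isfile_demo name exist only for the demo), shows that os.path.isfile sees it like any other file, and prints what splitext actually returns for the names involved.

```python
import gzip
import os
import tempfile
from os.path import splitext


def isfile_demo():
    """Create bob.asc.gz in a throwaway directory and probe it with os.path.isfile.

    Returns (gz_exists, plain_exists). The temporary directory is only for the demo.
    """
    with tempfile.TemporaryDirectory() as workdir:
        gz_path = os.path.join(workdir, "bob.asc.gz")
        # Write a genuinely gzip-compressed file; the content is arbitrary.
        with gzip.open(gz_path, "wt") as fh:
            fh.write("demo\n")
        # isfile only asks the OS whether a regular file with this exact name exists;
        # neither the .gz extension nor the compressed content matters.
        gz_exists = os.path.isfile(gz_path)
        plain_exists = os.path.isfile(os.path.join(workdir, "bob.asc"))
    return gz_exists, plain_exists


# splitext removes only the last extension, which is what the workaround relies on.
print(splitext("bob.asc.gz"))  # ('bob.asc', '.gz')
print(splitext("bob.asc"))     # ('bob', '.asc')
print(isfile_demo())           # (True, False)
```

Note the second splitext line: because splitext('bob.asc') yields 'bob', the listdir workaround matches bob.asc.gz (or bob.asc with any other extra extension) but not a plain bob.asc, so check that name separately if it may also appear.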

You need to test each file. For example:
import os
if any(map(os.path.isfile, ['bob.asc', 'bob.asc.gz'])):
    print('yay')
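If the goal is not only to test for either name but also to read whichever one exists, a sketch along these lines may help. find_existing and open_text are hypothetical helpers, the suffix list is an assumption, and gzip.open in text mode ('rt') decompresses transparently.

```python
import gzip
import os


def find_existing(base, suffixes=("", ".gz")):
    """Return the first path base+suffix that is an existing regular file, or None.

    The suffix tuple is an assumption; add e.g. ".bz2" if other variants can appear.
    """
    for suffix in suffixes:
        candidate = base + suffix
        if os.path.isfile(candidate):
            return candidate
    return None


def open_text(base):
    """Hypothetical helper: open base or base.gz as text, decompressing if needed."""
    path = find_existing(base)
    if path is None:
        raise FileNotFoundError(base)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # text mode, decompressed on the fly
    return open(path)
```

With these, `with open_text('bob.asc') as fh:` reads bob.asc if it exists and otherwise falls back to bob.asc.gz.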

Related

Python - How to write a windows path to a json file?

I am working with a colleague who uses Ubuntu while I use Windows. We have a dataset of JSON files, each of which contains a "path". His paths look like this:
'C:/Users/krock/Desktop/FIIT/BP/Ubuntu/luadb/etc/luarocks_test/modules/30log/share/lua/5.3/30log.lua'
But this doesn't work on Windows. I tried:
some_string.replace('/', '\\')
But this results in strings written in the JSON that look like this:
'C:\\Users\\krock\\Desktop\\FIIT\\BP\\Ubuntu\\luadb\\etc\\luarocks_test\\data_all'
On my Windows machine, the program can't read these paths; it gives the error:
No such file or directory
Is there a solution to this?
EDIT: I tried using Path from pathlib, but I got another error:
TypeError: Object of type WindowsPath is not JSON serializable
I found that the fix for this is str(Path(path_string)), but the result is again the path in double quotes.

Yes, the solution is to use Python's built in pathlib. Also, using string literals might help the clarity of your program.
https://docs.python.org/3/library/pathlib.html
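The doubled backslashes in the question are most likely JSON escaping rather than extra characters. A sketch with the standard json module and pathlib.PureWindowsPath (which parses Windows paths on any OS) round-trips the question's path to show this, and also produces a forward-slash form, which Windows accepts as well.

```python
import json
from pathlib import PureWindowsPath

# Parse the question's path with Windows rules; PureWindowsPath accepts "/" too.
win_path = PureWindowsPath(
    "C:/Users/krock/Desktop/FIIT/BP/Ubuntu/luadb/etc/luarocks_test/data_all"
)

native = str(win_path)                  # Windows-style, single backslash separators
encoded = json.dumps({"path": native})  # JSON text: each backslash is escaped as two
decoded = json.loads(encoded)["path"]   # decoding turns each escaped pair back into one

print(encoded)
print(decoded == native)                # True: nothing was actually doubled
print(win_path.as_posix())              # forward-slash form
```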

This question is missing code samples, so I can't be more specific, but generally speaking, doing this manually is error-prone. Consider using a library such as pathlib. For example:
>>> from pathlib import Path
>>> Path('luarocks_test/modules/30log/share/lua/5.3/30log.lua')
PosixPath('luarocks_test/modules/30log/share/lua/5.3/30log.lua')
On Windows, instantiating a Path would give you a WindowsPath. You'll also want to use relative, rather than absolute references, as the paths will be different on your workstations.
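A sketch of the relative-path suggestion, assuming each person knows the root of their own copy of the dataset (the base_dir values and the to_record/resolve helper names are hypothetical): store paths relative to that root with forward slashes, and rebuild a native path on whichever machine reads the JSON.

```python
import json
from pathlib import Path, PurePath, PurePosixPath


def to_record(abs_path, base_dir):
    """Store abs_path relative to base_dir, always with forward slashes."""
    rel = PurePath(abs_path).relative_to(base_dir)  # purely lexical, no disk access
    return {"path": rel.as_posix()}


def resolve(record, base_dir):
    """Rebuild a native path on this machine from a stored relative record."""
    parts = PurePosixPath(record["path"]).parts
    return Path(base_dir).joinpath(*parts)


# Hypothetical roots: one person's checkout when writing, another's when reading.
record = to_record(
    "/home/krock/luadb/etc/luarocks_test/modules/30log/share/lua/5.3/30log.lua",
    "/home/krock/luadb/etc",
)
text = json.dumps(record)
print(text)
print(resolve(json.loads(text), "/data/luadb/etc"))
```

Because only the relative part is stored, the JSON no longer contains anything specific to either workstation or operating system.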

Evaluating File Paths in Excel

I have an ever-increasing list of file paths in Excel (around 5000 records now). More specifically, I have a unique identifier in column A and, in column B, a file path that leads to a picture for that identifier.
The process of adding the file paths is very manual and mistakes sometimes occur. So I wanted to write code that goes through each of these file paths and, if a path doesn't open or returns an error, stores it in a list so that I can go directly to those entries and fix them.
I was thinking of writing a Python script that checks each file path by opening it in Google Chrome (I have found this to work better than clicking the hyperlink in Excel directly), but it's been a while since I have used Python and I don't know where to start.
Any recommendation/ideas of how to achieve this?
Thank you,
Ricardo G.

To read Excel files, I prefer the pandas library, specifically the read_excel function. You can check whether a file path points to a valid, existing file using the os.path module: os.path.isfile returns True if the given path points to an actual file, so you can use a list comprehension with a condition that keeps only the paths for which it returns False.
import os
import pandas as pd
df = pd.read_excel('path/to/excel')
# 'filepath_column' is a placeholder for the header of your file path column
bad_files = [fp for fp in df['filepath_column'] if not os.path.isfile(fp)]
I'm not sure what you mean by check with google chrome, but if you're talking about local files, this should work well for you.
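Building on this answer, a slightly more defensive sketch: it keeps the identifier from column A next to each bad path, and it reports empty cells (which pandas reads as NaN, a float) as missing rather than passing them to os.path.isfile. find_missing is a hypothetical name, and the column names in the usage comment are placeholders.

```python
import os


def find_missing(rows):
    """Return the (identifier, path) pairs whose path is not an existing file.

    rows is any iterable of (identifier, path) pairs, e.g. zip() over two
    DataFrame columns. Non-string cells, such as the NaN pandas uses for
    empty Excel cells, are reported as missing instead of being checked.
    """
    missing = []
    for identifier, path in rows:
        # strip() guards against stray spaces from manual entry (an assumption).
        if not isinstance(path, str) or not os.path.isfile(path.strip()):
            missing.append((identifier, path))
    return missing


# Hypothetical usage with the answer's DataFrame (column names are placeholders):
# bad = find_missing(zip(df["id_column"], df["filepath_column"]))
```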

How to access file in parent directory using python?

I am trying to access a text file in the parent directory.
E.g., the Python script is in codeSrc and the text file is in mainFolder.
script path:
G:\mainFolder\codeSrc\fun.py
desired file path:
G:\mainFolder\foo.txt
I am currently using this syntax with Python 2.7.x:
import os
filename = os.path.dirname(os.getcwd())+"\\foo.txt"
Although this works fine, is there a better (prettier :P) way to do this?

While your example works, it is maybe not the nicest, and neither is mine, probably. Anyhow, os.path.dirname() is probably meant for strings whose final part is a file name. It uses os.path.split(), which returns an empty final component if the path ends with a slash, so this can potentially go wrong. Moreover, as you are already using os.path, I'd also use it to join paths, which also makes the code platform-independent. I'd write
os.path.join(os.getcwd(), '..', 'foo.txt')
As for readability, here (as in the answer using the environ module) it is immediately evident that you go one level up.
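Both points in this answer can be checked with the standard library's ntpath and posixpath modules, which apply Windows and POSIX path rules regardless of the OS you run on. A short sketch using the question's G: paths: join() keeps '..' literally until normpath() collapses it, and dirname() on a path with a trailing slash returns the directory itself.

```python
import ntpath
import posixpath

# ntpath applies Windows rules on any OS, so the results are reproducible anywhere.
win = ntpath.join("G:\\mainFolder\\codeSrc", "..", "foo.txt")
print(win)                   # G:\mainFolder\codeSrc\..\foo.txt
print(ntpath.normpath(win))  # G:\mainFolder\foo.txt

# The trailing-slash pitfall: split() yields an empty tail,
# so dirname() returns the directory itself rather than its parent.
print(posixpath.dirname("/mainFolder/codeSrc/"))  # /mainFolder/codeSrc
print(posixpath.dirname("/mainFolder/codeSrc"))   # /mainFolder
```

normpath() only manipulates the string; if codeSrc were a symbolic link, the collapsed path could differ from where '..' leads on disk, in which case os.path.realpath() is the safer choice.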

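To get a path to a file in the parent directory of the current script, you can do:
import os
# abspath() first, because __file__ can be relative (e.g. when run as "python fun.py")
file_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'foo.txt')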
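A pathlib-based equivalent for Python 3.6+, shown as a sketch with a hypothetical helper name; resolve() plays the role of abspath() here, so the result does not depend on the working directory the script was started from.

```python
from pathlib import Path


def file_in_parent_dir(script_file, name):
    """Return <parent of the script's directory>/name.

    Pass __file__ as script_file. resolve() first makes the path absolute
    (and follows symlinks), so the result does not depend on the current
    working directory or on how the script was launched.
    """
    return Path(script_file).resolve().parent.parent / name


# Inside G:\mainFolder\codeSrc\fun.py this would point at G:\mainFolder\foo.txt:
# foo = file_in_parent_dir(__file__, "foo.txt")
```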

You can try this with the environ module from the third-party django-environ package:
import environ
environ.Path() - 1 + 'foo.txt'

To get the parent directory, you can use:
import os
os.path.abspath(os.path.join(os.getcwd(), '..'))
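Argument order matters with os.path.join: once an argument is absolute, everything before it is discarded. A small sketch with posixpath (so the output is the same on every OS) and a hypothetical working directory:

```python
import posixpath

cwd = "/home/user/project"  # hypothetical working directory

# join() drops everything before an absolute component, so ("..", cwd) is just cwd:
print(posixpath.join("..", cwd))                      # /home/user/project
# With cwd first, ".." is appended and normpath() collapses it:
print(posixpath.normpath(posixpath.join(cwd, "..")))  # /home/user
```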

Python - extract and modify a file path in all files in a directory in linux

I have .sh and .json files that contain file paths pointing to a specific directory, but I have to keep changing those paths depending on where my Python script is run.
E.g., the content of one of my .sh files is
"cd /home/aswany/BotStudioInstallation/databricks/platform/databricksastro"
and I need to change the path via Python code, since the prefix
"/home/aswany/BotStudioInstallation/" keeps changing depending on where databricks is located.
I tried the following code:
replaceAll(str(self.currentdirectory) + "/databricks/platform/devsettings.json",
           "/home/holmes/BotStudioInstallation", self.currentdirectory)
and the function replaceAll is:
import fileinput
import sys
def replaceAll(self, file, searchExp, replaceExp):
    # with inplace=1, whatever is written to stdout replaces the current line in the file
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        sys.stdout.write(line)
But the code above only replaces "/home/holmes/BotStudioInstallation" with the directory I am currently in, and I can't be sure that "/home/holmes/BotStudioInstallation" is the only possibility: it keeps changing, e.g. "/home/aswany/BotStudioInstallation", "/home/dev3/BotStudioInstallation", etc. I thought of using a regular expression for this.
Please help.
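Since the question already considers a regular expression, here is a minimal sketch of that approach, assuming the changing part is always of the form /home/<user>/BotStudioInstallation (adjust BASE_RE if installations can live elsewhere). rebase_text and rebase_dir are hypothetical names, and rebase_dir rewrites every matching .sh and .json file under a directory in place, so try it on a copy first.

```python
import re
from pathlib import Path

# Assumption: the varying prefix is always /home/<user>/BotStudioInstallation.
# The character class matches exactly one path component (the user name).
BASE_RE = re.compile(r"/home/[^/\s\"']+/BotStudioInstallation")


def rebase_text(text, new_base):
    """Replace every /home/<user>/BotStudioInstallation prefix in text with new_base."""
    # A function as the replacement keeps re.sub from interpreting
    # backslashes or group references in new_base.
    return BASE_RE.sub(lambda _match: new_base, text)


def rebase_dir(directory, new_base, patterns=("*.sh", "*.json")):
    """Rewrite matching files under directory (recursively) in place.

    Returns the list of files that actually changed.
    """
    changed = []
    for pattern in patterns:
        for path in Path(directory).rglob(pattern):
            old = path.read_text()
            new = rebase_text(old, new_base)
            if new != old:
                path.write_text(new)
                changed.append(path)
    return changed
```

In the question's setting, new_base would be str(self.currentdirectory), i.e. the current installation directory.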

Not sure I 100% understood your issue, but maybe I can help nonetheless.
As pointed out by J.F. Sebastian, you can use relative paths and remove the base part of the path. Using ./databricks/platform/devsettings.json might be enough. This is by far the most elegant solution.
If for any reason it is not, you can store only the directory you need to access, then append it to the base directory whenever you need it. That should let you deal with changes in the base directory. However, if the files will be used by applications other than your own, that might not be an option.
dir = get_dir_from_json()
dir_with_base = self.currentdirectory + dir
Alternatively, though it is not an elegant solution, you can avoid regular expressions by using a fixed "pattern" that you always replace:
{
"directory": "<<_replace_me_>>/databricks/platform"
}
Then you know you can always replace "<<_replace_me_>>" with the base directory.
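A sketch of the placeholder idea applied to parsed JSON rather than raw text, assuming the settings file is valid JSON; fill_placeholder is a hypothetical name. Substituting inside the parsed values lets json.dumps escape any backslashes or quotes in the base directory, which a plain text replace would not.

```python
import json

PLACEHOLDER = "<<_replace_me_>>"  # the marker used in the answer's example


def fill_placeholder(obj, base_dir):
    """Recursively replace PLACEHOLDER in every string of a parsed JSON value."""
    if isinstance(obj, str):
        return obj.replace(PLACEHOLDER, base_dir)
    if isinstance(obj, list):
        return [fill_placeholder(item, base_dir) for item in obj]
    if isinstance(obj, dict):
        return {key: fill_placeholder(value, base_dir) for key, value in obj.items()}
    return obj  # numbers, booleans and None are left unchanged


template = '{"directory": "<<_replace_me_>>/databricks/platform"}'
settings = fill_placeholder(json.loads(template), "/home/dev3/BotStudioInstallation")
print(json.dumps(settings))
```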

Why is tempfile using DOS 8.3 directory names on my XP box?

>>> import tempfile
>>> tempfile.mkstemp()
(3, 'c:\\docume~1\\k0811260\\locals~1\\temp\\tmpk6tpd3')
It works, but it looks a bit strange, and the actual temporary file name is longer than 8 characters.
Why doesn't it use long file names instead?

mkstemp uses the environment variables TMPDIR, TEMP or TMP (the first one that is set) to determine where to put your temporary file. One of these is probably set to c:\docume~1\k0811260\locals~1\temp on your system. Run
echo %tmp%
(and likewise for the other variables) in a command window ("DOS box") to find out for sure.
This is, in fact, a good thing, because some naïve modules/programs (e.g., those that call external OS commands) may get confused when a directory name contains a space, due to quoting issues.
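To see which variable won on a given machine, and to get a long-name path if you prefer one, here is a minimal sketch. temp_settings and long_path are hypothetical helper names; the second relies on the Windows API function GetLongPathNameW through ctypes and simply returns its input on other systems.

```python
import os
import tempfile


def temp_settings():
    """Return the environment variables tempfile checks (in its documented
    search order) together with the directory it actually chose."""
    names = ("TMPDIR", "TEMP", "TMP")
    return {name: os.environ.get(name) for name in names}, tempfile.gettempdir()


def long_path(path):
    """Expand DOS 8.3 components via the Win32 GetLongPathNameW call.

    Windows-only; on other systems the path is returned unchanged.
    Falls back to the input if the call fails (it returns 0 on error).
    """
    if os.name != "nt":
        return path
    import ctypes
    buf = ctypes.create_unicode_buffer(32768)  # large enough for any Windows path
    length = ctypes.windll.kernel32.GetLongPathNameW(path, buf, len(buf))
    return buf.value if length else path


env, chosen = temp_settings()
print(env)
print(chosen)
# e.g. tempfile.mkstemp(dir=long_path(chosen)) to get a long-name path instead
```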
