How to import "dat" file including only certain strings in python

How to import "dat" file including only certain strings in python - python

I am trying to import several "dat" files to python spyder.
dat_file_list_images
Here are dat files listed, and there are two types on the list ended with _1 and _2. I wanna import dat files ended with "_1" only.
Is there any way to import them at once with one single loop?
After I import them, I would like to aggregate all to one single matrix.

import os
files_to_import = [f for f in os.listdir(folder_path)
if f.endswith("1")]
Make sure that you know whether the files have a .dat-extension or not - in Windows Explorer, the default setting is to hide file endings, and this will make your code fail if the files have a different ending.
What this code does is called list comprehension - os.listdir() provides all the files in the folder, and you create a list with only the ones that end with "1".

Uses str.endswith() it will return true if the entered string is ended with checking string
According to this website
Syntax: str.endswith(suffix[, start[, end]])
For your case:
You will need a loop to get filenames as String and check it while looping if it ends with "_1"
yourFilename = "yourfilename_1"
if yourFilename.endswith("_1"):
# do your job here

Related

Modifying the order of reading CSV Files in Python according to the name

I have 1000 CSV files with names Radius_x where x stands for 0,1,2...11,12,13...999. But when I read the files and try to analyse the results, I wish to read them in the same order of whole numbers as listed above. But the code reads as follows (for example): ....145,146,147,148,149,15,150,150...159,16,160,161,...... and so on.
I know that if we rename the CSV files as Radius_xyz where xyz = 000,001,002,003,....010,011,012.....999, the problem could be resolved. Kindly help me as to how I can proceed.

To sort a list of paths numerically in python, first find all the files you are looking to open, then sort that iterable with a key which extracts the number.
With pathlib:
from pathlib import Path
files = list(Path("/tmp/so/").glob("Radius_*.csv")) # Path.glob returns a generator which needs to be put in a list
files.sort(key=lambda p: int(p.stem[7:])) # `Radius_` length is 7
files contains
[PosixPath('/tmp/so/Radius_1.csv'),
PosixPath('/tmp/so/Radius_2.csv'),
PosixPath('/tmp/so/Radius_3.csv'),
PosixPath('/tmp/so/Radius_4.csv'),
PosixPath('/tmp/so/Radius_5.csv'),
PosixPath('/tmp/so/Radius_6.csv'),
PosixPath('/tmp/so/Radius_7.csv'),
PosixPath('/tmp/so/Radius_8.csv'),
PosixPath('/tmp/so/Radius_9.csv'),
PosixPath('/tmp/so/Radius_10.csv'),
PosixPath('/tmp/so/Radius_11.csv'),
PosixPath('/tmp/so/Radius_12.csv'),
PosixPath('/tmp/so/Radius_13.csv'),
PosixPath('/tmp/so/Radius_14.csv'),
PosixPath('/tmp/so/Radius_15.csv'),
PosixPath('/tmp/so/Radius_16.csv'),
PosixPath('/tmp/so/Radius_17.csv'),
PosixPath('/tmp/so/Radius_18.csv'),
PosixPath('/tmp/so/Radius_19.csv'),
PosixPath('/tmp/so/Radius_20.csv')]
NB. files is a list of paths not strings, but most functions which deal with files accept both types.
A similar approach could be done with glob, which would give a list of strings not paths.

Rename directory with constantly changing name

I created a script that is supposed to download some data, then run a few processes. The data source (being ArcGIS Online) always downloads the data as a zip file and when extracted the folder name will be a series of letters and numbers. I noticed that these occasionally change (not entirely sure why). My thought is to run an os.listdir to get the folder name then rename it. Where I run into issues is that the list returns the folder name with brackets and quotes. It returns as ['f29a52b8908242f5b1f32c58b74c063b.gdb'] as the folder name while folder in the file explorer does not have the brackets and quotes. Below is my code and the error I receive.
from zipfile import ZipFile
file_name = "THDNuclearFacilitiesBaseSandboxData.zip"
with ZipFile(file_name) as zip:
# unzipping all the files
print("Unzipping "+ file_name)
zip.extractall("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
print('Unzip Complete')
#removes old zip file
os.remove(file_name)
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(str(x), "Test.gdb")
Output:
FileNotFoundError: [WinError 2] The system cannot find the file specified: "['f29a52b8908242f5b1f32c58b74c063b.gdb']" -> 'Test.gdb'
I'm relatively new to python scripting, so if there is an easier alternative, that would be great as well. Thanks!

os.listdir() returns a list files/objects that are in a folder.
lists are represented, when printed to the screen, using a set of brackets.
The name of each file is a string of characters and strings are represented, when printed to the screen, using quotes.
So we are seeing a list with a single filename:
['f29a52b8908242f5b1f32c58b74c063b.gdb']
To access an item within a list using Python, you can using index notation (which happens to also use brackets to tell Python which item in the list to use by referencing the index or number of the item.
Python list indexes starting at zero, so to get the first (and in this case only item in the list), you can use x[0].
x = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(x[0], "Test.gdb")
Having said that, I would generally not use x as a variable name in this case... I might write the code a bit differently:
files = os.listdir("C:/NAPSG/PROJECTS/DHS/THD_Nuclear_Facilities/SCRIPT/CountyDownload/Data")
os.renames(files[0], "Test.gdb")

Square brackets indicate a list. Try x[0] that should get rid of the brackets and be just the data.
The return from listdir may be a list with only one value or a whole bunch

Make glob directory variable

I'm trying to write a Python script that searches a folder for all files with the .txt extension. In the manuals, I have only seen it hardcoded into glob.glob("hardcoded path").
How do I make the directory that glob searches for patterns a variable? Specifically: A user input.
This is what I tried:
import glob
input_directory = input("Please specify input folder: ")
txt_files = glob.glob(input_directory+"*.txt")
print(txt_files)
Despite giving the right directory with the .txt files, the script prints an empty list [ ].

If you are not sure whether a path contains a separator symbol at the end (usually '/' or '\'), you can concatenate using os.path.join. This is a much more portable method than appending your local OS's path separator manually, and much shorter than writing a conditional to determine if you need to every time:
import glob
import os
input_directory = input('Please specify input folder: ')
txt_files = glob.glob(os.path.join(input_directory, '*.txt'))
print(txt_files)

For Python 3.4+, you can use pathlib.Path.glob() for this:
import pathlib
input_directory = pathlib.Path(input('Please specify input folder: '))
if not input_directory.is_dir():
# Input is invalid. Bail or ask for a new input.
for file in input_directory.glob('*.txt'):
# Do something with file.
There is a time of check to time of use race between the is_dir() and the glob, which unfortunately cannot be easily avoided because glob() just returns an empty iterator in that case. On Windows, it may not even be possible to avoid because you cannot open directories to get a file descriptor. This is probably fine in most cases, but could be a problem if your application has a different set of privileges from the end user or from other applications with write access to the parent directory. This problem also applies to any solution using glob.glob(), which has the same behavior.
Finally, Path.glob() returns an iterator, and not a list. So you need to loop over it as shown, or pass it to list() to materialize it.

Attempting to read data from multiple files to multiple arrays

I would like to be able to read data from multiple files in one folder to multiple arrays and then perform analysis on these arrays such as plot graphs etc. I am currently having trouble reading the data from these files into multiple arrays.
My solution process so far is as follows;
import numpy as np
import os
#Create an empty list to read filenames to
filenames = []
for file in os.listdir('C\\folderwherefileslive'):
filenames.append(file)
This works so far, what I'd like to do next is to iterate over the filenames in the list using numpy.genfromtxt.
I'm trying to use os.path join to put the individual list entry at the end of the path specified in listdir earlier. This is some example code:
for i in filenames:
file_name = os.path.join('C:\\entryfromabove','i')
'data_'+[i] = np.genfromtxt('file_name',skiprows=2,delimiter=',')
This piece of code returns "Invalid syntax".
To sum up the solution process I'm trying to use so far:
1. Use os.listdir to get all the filenames in the folder I'm looking at.
2. Use os.path.join to direct np.genfromtxt to open and read data from each file to a numpy array named after that file.
I'm not experienced with python by any means - any tips or questions on what I'm trying to achieve are welcome.

For this kind of task you'd want to use a dictionary.
data = {}
for file in os.listdir('C\\folderwherefileslive'):
filenames.append(file)
path = os.path.join('C:\\folderwherefileslive', i)
data[file] = np.genfromtxt(path, skiprows=2, delimiter=',')
# now you could for example access
data['foo.txt']
Notice, that everything you put within single or double quotes ends up being a character string, so 'file_name' will just be some characters, whereas using file_name would use the value stored in variable by that name.

Glob search files in date order?

I have this line of code in my python script. It searches all the files in in a particular directory for * cycle *.log.
for searchedfile in glob.glob("*cycle*.log"):
This works perfectly, however when I run my script to a network location it does not search them in order and instead searches randomly.
Is there a way to force the code to search by date order?
This question has been asked for php but I am not sure of the differences.
Thanks

To sort files by date:
import glob
import os
files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))
See also Sorting HOW TO.

Essentially the same as #jfs but in one line using sorted
import os,glob
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)

Well. The answer is nope. glob uses os.listdir which is described by:
"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."
So you are actually lucky that you got it sorted. You need to sort it yourself.
This works for me:
import glob
import os
import time
searchedfile = glob.glob("*.cpp")
files = sorted( searchedfile, key = lambda file: os.path.getctime(file))
for file in files:
print("{} - {}".format(file, time.ctime(os.path.getctime(file))) )
Also note that this uses creation time, if you want to use modification time, the function used must be getmtime.

If your paths are in sortable order then you can always sort them as strings (as others have already mentioned in their answers).
However, if your paths use a datetime format like %d.%m.%Y, it becomes a bit more involving. Since strptime does not support wildcards, we developed a module datetime-glob to parse the date/times from paths including wildcards.
Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).
From the module's test cases:
import pathlib
import tempfile
import datetime_glob
def test_sort_listdir(self):
with tempfile.TemporaryDirectory() as tempdir:
pth = pathlib.Path(tempdir)
(pth / 'some-description-20.3.2016.txt').write_text('tested')
(pth / 'other-description-7.4.2016.txt').write_text('tested')
(pth / 'yet-another-description-1.1.2016.txt').write_text('tested')
matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]
subpths = [subpth for _, subpth in sorted(dtimes_subpths)]
# yapf: disable
expected = [
pth / 'yet-another-description-1.1.2016.txt',
pth / 'some-description-20.3.2016.txt',
pth / 'other-description-7.4.2016.txt'
]
# yapf: enable
self.assertListEqual(subpths, expected)

Using glob no. Right now as you're using it, glob is storing all the files simultaneously in code and has no methods for organizing those files. If only the final result is important, you could use a second loop that checks the file's date and resorts based on that. If the parse order matters, glob is probably not the best way to do this.

You can sort the list of files that come back using os.path.getmtime or os.path.getctime. See this other SO answer and note the comments as well.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to import "dat" file including only certain strings in python - python

Related

Modifying the order of reading CSV Files in Python according to the name

Rename directory with constantly changing name

Make glob directory variable

Attempting to read data from multiple files to multiple arrays

Glob search files in date order?

Categories

Resources