How to move Header and Trailer from files to another file? - python

I have around 100 text files, each with close to a thousand records, in a folder. I want to copy the header (first line) and trailer (last line) of each file into a new file, along with the name of the respective file.
So the output I want is:
File_Name,Header,Trailer
Is this possible using Unix or Python?

One way to do it is with the bash shell, run in the folder containing the files (the file names are quoted so names with spaces work):
for file in *; do echo "$file,$(head -1 "$file"),$(tail -1 "$file")"; done

A PowerShell Core one-liner with aliases:
gci *.txt |%{"{0},{1},{2}" -f $_.FullName,(gc $_ -Head 1),(gc $_ -Tail 1)}|set-content .\newfile.txt
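Since the question also mentions Python, here is a sketch using pathlib. The folder path and the output name headers_trailers.csv are assumptions to adjust:

```python
from pathlib import Path

def header_trailer_line(path):
    """Return 'name,header,trailer' built from a file's first and last lines."""
    lines = Path(path).read_text().splitlines()
    return f"{Path(path).name},{lines[0]},{lines[-1]}"

# Assumed driver; the folder and output name are placeholders:
# with open("headers_trailers.csv", "w") as out:
#     for p in sorted(Path(".").glob("*.txt")):
#         print(header_trailer_line(p), file=out)
```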

Related

Conditional extraction of files from an Archive file

I have a large tar.gz archive containing .nxml files; its total size is around 5 GB.
My aim is to extract files from it, but I do not have to extract all of them: only those whose numeric file name is greater than a threshold value.
For example:
Let us say 1000 is our threshold value. Then
path/to/file/900.nxml will not be extracted, but
path/to/file/1100.nxml will be extracted.
So my requirement is a conditional extraction of files from the archive.
Thanks
Use tar -tf <archive> to get a list of files in the archive.
Process the list of files to determine those you need to extract. Write the file list to a temporary file <filelist>, one line per file.
Looking at the tags you chose, you can use either Python or bash for this string filtering, whichever you prefer.
Use tar -xf <archive> -T <filelist> to extract the files you need.
The option -T or --files-from reads the filenames to process from the given file.
See also the manpage for tar.
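Since the question is tagged python as well, the filtering step above can be sketched with the tarfile module, assuming the file stems are plain numbers as in the example; the archive name is a placeholder:

```python
import os
import tarfile

def should_extract(name, threshold=1000):
    """True when the file's numeric stem is greater than the threshold."""
    stem = os.path.splitext(os.path.basename(name))[0]
    return stem.isdigit() and int(stem) > threshold

# Assumed driver; replace archive.tar.gz with your archive:
# with tarfile.open("archive.tar.gz", "r:gz") as tar:
#     wanted = [m for m in tar.getmembers() if m.isfile() and should_extract(m.name)]
#     tar.extractall(path="extracted", members=wanted)
```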
You can also use the --wildcards option of tar.
For example, when your threshold is 1000 you can use tar -xf <archive> --wildcards 'path/to/files/????*.nxml'. A ? matches exactly one character and * matches any number of characters, so this pattern matches any file name with 4 or more characters.
Hope this helps.

Delete lines of text file if they reference a nonexistent file

I have a text file (images1.txt) with a list of .jpg names, and I have a folder (Bones) with .jpg images. All image names are exactly 42 characters long (including the file extension), and each is on a separate line containing the name and some information about the image. For example:
OO75768249870G_2018051_4A284DQ0-011628.jpg,1A4502432KJL459265,emergency
OO75768249870G_2018051_4A284DQ0-011629.jpg,1A451743245122,appointment
where everything after .jpg is my own personal notes about the photos. Bones contains many of the 4,000+ images named in images1.txt, but not all of them. Using either the command prompt or Python, how would I remove from images1.txt the lines corresponding to images not present in my Bones folder?
Thanks!
In Python:
import os

LEN_OF_FILENAME = 42

with open('images1.txt', 'r') as image_file:
    with open('filtered_images1.txt', 'w') as filtered_image_file:
        for line in image_file:
            image_name = line[:LEN_OF_FILENAME]
            path_to_image = os.path.join('Bones', image_name)
            if os.path.exists(path_to_image):
                filtered_image_file.write(line)
Assuming images1.txt and Bones are in the same folder, running the above Python script in that folder produces filtered_images1.txt, which contains only the lines that have a corresponding image in Bones.
This code will read the lines from image1.txt and create an image2.txt containing the lines whose file exists in the Bones directory.
@ECHO OFF
IF EXIST image2.txt (DEL image2.txt)
FOR /F "tokens=1,* delims=," %%f IN ('TYPE "image1.txt"') DO (
    IF EXIST "bones\%%~f" (ECHO %%f,%%g >>"image2.txt")
)
EXIT /B
I think the easiest way is to use the findstr command:
rem /* Search for lines in file `images1.txt` in a case-insensitive manner that literally begin
rem with a file name found in the directory `Bones` which in turn matches the naming pattern;
rem then write all matching lines into a temporary file: */
dir /B /A:-D "Bones\??????????????_???????_????????-??????.jpg" | findstr /LIBG:/ "images1.txt" > "images1.tmp"
rem // Overwrite original `images1.txt` file by the temporary file:
move /Y "images1.tmp" "images1.txt" > nul

Batch modify file name and add ascending numbers

I have a bunch of TIFF image files ordered by date. I need to rename them using either python, or terminal commands. The file names are structured like this:
basename_unnecessary_x.tif
where:
basename = is part of the original filename I need to keep (16 characters long)
unnecessary = part of the original filename I want to discard (14 characters long)
x = ascending numbers I need to add. Starting at 0 and going up in steps of 250 for every subsequent file.
I know there are plenty of questions on batch renaming and adding ascending numbers to file names but I haven't found anything that keeps part of the original filename and deletes another portion and adds ascending numbers. Any help would be appreciated.
Thanks!
First, get all the file names you want to rename into a text file.
If you want to rename all files in the directory, simply run the command below and redirect it to a text file. [Changed code: it will now list .tif files only.]
dir /a:-D /b *.tif >cp1.txt
Now use the code below, which will rename files from basename_unnecessary_x.tif to basename_0.tif and so on.
@echo off
CD %CD%\<Folderpath in which .tif files should be renamed>
setlocal enabledelayedexpansion
set /a count=0
echo --------Script started -------------------------
echo.
for /f "tokens=*" %%a in (cp1.txt) do (
    echo original file name %%a
    echo ------------------------------------------
    for /f "tokens=1 delims=_" %%b in ("%%a") do (
        echo file will be renamed to %%b_!count!.tif
        echo ------------------------------------------
        rename "%%a" %%b_!count!.tif
        set /a count+=250
    )
)
echo.
echo --------Script Completed -------------------------
echo.
echo --------Script Completed -------------------------
Changes to the script:
The dir command will now only list .tif files into cp1.txt.
You can execute the script from any location, provided you update the path in the CD line of the code.
The code is updated so the counter now follows the sequence 0, 250, 500, and so on.
FYI, the reason it previously gave 250 to the first file even though the counter was initialized to zero is that the counter was incremented by 250 before being used in the rename command.
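For completeness, the same rename can be sketched in Python. The 16-character basename and the step of 250 come from the question; the folder and the file list are assumptions to adjust:

```python
import os

def plan_renames(filenames, step=250):
    """Map each old name to basename_<counter>.tif, keeping the first 16
    characters of the name and counting 0, 250, 500, ... over the sorted list."""
    return {
        name: f"{name[:16]}_{i * step}.tif"
        for i, name in enumerate(sorted(filenames))
    }

# Assumed driver; run inside the folder holding the .tif files:
# for old, new in plan_renames(f for f in os.listdir(".") if f.endswith(".tif")).items():
#     os.rename(old, new)
```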

How to Edit File via incrond/python

How would one go about making a script that first edits a newly found file within a specific directory and then uploads it, via incron and Python? I'm a bit confused about how to specify the filename as a string in the Python script.
incrontab -e :
/var/test IN_CREATE /var/pythoncode/code.py
python code:
s = open("confused.txt").read()
s = s.replace("string1", "string2")
f = open("confused.txt", 'w')
f.write(s)
f.close()
Essentially, I am trying to have the incron service detect any new file in the /var/test folder and then execute Python code to find and replace a string within that new file. However, I am uncertain how to approach the "confused.txt" filename string, since each file created in /var/test will have a dynamic name.
Here's a workaround, assuming you are well versed with scripting:
Step 1 - Process the files in the directory (change all of them, getting the names from ls).
Step 2 - Store the last file name in a variable $LAST_FILE.
Step 3 - Run a loop to find files newer than $LAST_FILE:
find /var/test/ -newer $LAST_FILE
OR
find /var/test/ -type f -newer $LAST_FILE
OR
find /var/test/ -type f -iname "*.txt" -newer $LAST_FILE
This will list the files that are newer than last file you edited. So, parse this list into your "change file" script.
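An alternative to the workaround above: incron can pass the triggering file's name to the command via table wildcards ($@ expands to the watched path, $# to the event's file name), so an entry like /var/test IN_CREATE /var/pythoncode/code.py $@/$# hands the new file's path to the script as an argument. A sketch of code.py under that assumption, reusing string1/string2 from the question:

```python
import sys

def replace_in_file(path, old, new):
    """Read the file, replace every occurrence of `old`, and write it back."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(text.replace(old, new))

# Called by incron with the new file's full path as the first argument:
# if __name__ == "__main__":
#     replace_in_file(sys.argv[1], "string1", "string2")
```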

How to walk a tar.gz file that contains zip files without extraction

I have a large tar.gz file to analyze using a Python script. The tar.gz file contains a number of zip files, which might embed other .gz files in them. Before extracting anything, I would like to walk through the directory structure within the compressed files to see whether certain files or directories are present. Looking at the tarfile and zipfile modules, I don't see any existing function that allows me to get a table of contents of a zip file within a tar.gz file.
I appreciate your help.
You can't get at it without extracting the file. However, you don't need to extract it to disk if you don't want to. You can use the tarfile.TarFile.extractfile method to get a file-like object that you can then pass to tarfile.open as the fileobj argument. For example, given these nested tarfiles:
$ cat bar/baz.txt
This is bar/baz.txt.
$ tar cvfz bar.tgz bar
bar/
bar/baz.txt
$ tar cvfz baz.tgz bar.tgz
bar.tgz
You can access files from the inner one like so:
>>> import tarfile
>>> baz = tarfile.open('baz.tgz')
>>> bar = tarfile.open(fileobj=baz.extractfile('bar.tgz'))
>>> bar.extractfile('bar/baz.txt').read()
'This is bar/baz.txt.\n'
and they're only ever extracted to memory.
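The same fileobj trick extends to the zip files the question mentions, with one wrinkle: zipfile requires a seekable stream, so the tar member is buffered into memory first. A sketch (the archive and member names in the usage comment are placeholders):

```python
import io
import tarfile
import zipfile

def zip_names_inside_tar(tar_source, member_name):
    """List a zip member's contents without extracting anything to disk.

    `tar_source` may be a path or a file object; the zip member is buffered
    into memory because zipfile needs a seekable stream.
    """
    open_kwargs = (
        {"fileobj": tar_source} if hasattr(tar_source, "read") else {"name": tar_source}
    )
    with tarfile.open(**open_kwargs) as tar:
        data = io.BytesIO(tar.extractfile(member_name).read())
        with zipfile.ZipFile(data) as zf:
            return zf.namelist()

# Usage (placeholder names):
# zip_names_inside_tar("outer.tar.gz", "inner.zip")
```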
I suspect that this is not possible and that you'll have to program it manually.
.tar.gz files are first tar'd, then gzipped, with what are essentially two different applications run in succession. To access the tar file, you're probably going to have to un-gzip it first.
Also, once you do have access to the tar file after un-gzipping it, note that it does not do random access well: there is no central index in a tar file that lists its contents.
