How would one go about making a script to first edit a newly found file within a specific directory, and then upload it through incron/python? I'm a bit confused as how to specify the filename as a string in the python script.
incrontab -e :
/var/test IN_CREATE /var/pythoncode/code.py
python code:
s = open("confused.txt").read()
s = s.replace("string1", "string2")
f = open("confused.txt", 'w')
f.write(s)
f.close()
Essentially, I am trying to have the incron service find any new file that is found within /var/test folder , and then execute python code to look for and replace a string within the new file found in /var/test. However, I am uncertain how to approach the "confused.txt" filename string, since each file found with the /var/test will have a dynamic name.
Here's a workaround assuming you are well verse with scripting -
Step 1 - Change the files in a directory (change all and get names with $FILE in ls).
Step 2 - Store the last file name in a variable $LAST_FILE
Step 3 - Run a WHILE loop to find files newer than $LAST_FILE -
find /var/test/ -newer $LAST_FILE
OR
find /var/test/ -type f -newer $LAST_FILE
OR
find /var/test/ -type f -iname "*.txt" -newer $LAST_FILE
This will list the files that are newer than last file you edited. So, parse this list into your "change file" script.
Related
My code below, Python can't read my file:
f = open('./resource/review.txt','r')
The error is as in photo: Errno 2: No such file in directory.
You can try \ instead of /
f = open('.\\resource\\review.txt','r')
content = f.read()
print(content)
Because the file is not located at the same folder as your script you need to go back one folder.
In order to do that tou need '..' (double dot) and not one, this will indicate to go back a folder relative to the current path.
The message Errno 2: No such file in directory simply means that your relative pathname to your file cannot be found in the directory in which you're executing (which might not be the directory the Python script is in or that you think you're in, if you're using an IDE).
So to see what directory your Python is actually running in, do:
print(f'current directory is: {os.getcwd()}')
Here is the below code we have developed for single directory of files
from os import listdir
with open("/user/results.txt", "w") as f:
for filename in listdir("/user/stream"):
with open('/user/stream/' + filename) as currentFile:
text = currentFile.read()
if 'checksum' in text:
f.write('current word in ' + filename[:-4] + '\n')
else:
f.write('NOT ' + filename[:-4] + '\n')
I want loop for all directories
Thanks in advance
If you're using UNIX you can use grep:
grep "checksum" -R /user/stream
The -R flag allows for a recursive search inside the directory, following the symbolic links if there are any.
My suggestion is to use glob.
The glob module allows you to work with files. In the Unix universe, a directory is / should be a file so it should be able to help you with your task.
More over, you don't have to install anything, glob comes with python.
Note: For the following code, you will need python3.5 or greater
This should help you out.
import os
import glob
for path in glob.glob('/ai2/data/prod/admin/inf/**', recursive=True):
# At some point, `path` will be `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error`
if not os.path.isdir(path):
# Check the `id` of the file
# Do things with the file
# If there are files inside `/ai2/data/prod/admin/inf/inf_<$APP>_pvt/error` you will be able to access them here
What glob.glob does is, it Return a possibly-empty list of path names that match pathname. In this case, it will match every file (including directories) in /user/stream/. If these files are not directories, you can do whatever you want with them.
I hope this will help you!
Clarification
Regarding your 3 point comment attempting to clarify the question, especially this part we need to put appi dynamically in that path then we need to read all files inside that directory
No, you do not need to do this. Please read my answer carefully and please read glob documentation.
In this case, it will match every file (including directories) in /user/stream/
If you replace /user/stream/ with /ai2/data/prod/admin/inf/, you will have access to every file in /ai2/data/prod/admin/inf/. Assuming your app ids are 1, 2, 3, this means, you will have access to the following files.
/ai2/data/prod/admin/inf/inf_1_pvt/error
/ai2/data/prod/admin/inf/inf_2_pvt/error
/ai2/data/prod/admin/inf/inf_3_pvt/error
You do not have to specify the id, because you will be iterating over all files. If you do need the id, you can just extract it from the path.
If everything looks like this, /ai2/data/prod/admin/inf/inf_<$APP>_pvt/error, you can get the id by removing /ai2/data/prod/admin/inf/ and taking everything until you encounter _.
this is my first time posting to StackOverflow and I'm hoping someone can assist. I'm fairly new at pig scripts and have encountered a problem I can't solve.
Below is a pig script that fails when I attempt to write results to a file:
register 'myudf.py' using jython as myfuncs;
A = LOAD '$file_nm' USING PigStorage('$delimiter') AS ($fields);
B = FILTER A by ($field_nm) IS NOT NULL;
C = FOREACH B GENERATE ($field_nm) as fld;
D = GROUP C ALL;
E = FOREACH D GENERATE myfuncs.theResult(C.fld);
--DUMP E;
STORE E INTO 'myoutput/theResult';
EXEC;
I see the results of E when I Dump to the screen. However, I need to store the results temporarily in a file. After the Store command, the error I receive is: Output Location Validation Failed.
I've tried numerous workarounds, like removing the theResult folder and removing the earlier contents of theResult, but none of the commands I use work. These have been along the lines of:
hdfs dfs -rm myoutput/theResult
and
hadoop fs -rm myoutput/theResult
...using both the shell (hs) and file system (fs) commands. I've tried to call another function (shell script, python function, etc.) to clear the earlier results stored in the myoutput/theResult folder. I've read every website I can find and nothing is working. Any ideas??
the output location of a mapreduce is a directory. So, you must have tried it this way
hadoop fs -rmr myoutput/theResult
and then run the pig script. It will work.
"rmr" - remove recursive, which deletes both folder/file
"rm" - is just remove which removes only file
Everytime, you need to either change output path or delete and use the same, since HDFS is worm(write once read many) model storage.
Couple of things you can try-
making sure output director is a valid path.
Remove the entire directory and not just content within it. Remove directory with 'rmr and check that path doesn't exist before running pig script.
Thanks for both of your replies. I now have a solution that is working:
fs -mkdir -p myoutput/theResult
fs -rm -r myoutput/theResult
The first line attempts to create a directory, but the "-p" prevents an error if it already exists. Then the second line removes it. Either way, there will be a directory to remove, so no error!
The output of store is confusing when we are using Pig for the first time.
store grp into '/output1';
This will create the folder named 'output1' in root. The folder should not be already present
You can give your own hdfs path here like /user/thewhitetulip.
hdfs dfs -ls /output1
output:
/output1/_SUCCESS
/output1/part-r-00000
The part-r-00000 file is the output of the store program.
I have multiple .txt files in different levels of subdirectory. All txt files are in the final iteration, that is, there is no level with both .txt files and further directories. I want to concatenate them all into one new text file, but can't find a simple way to go through all the subdirectories.
A command I've found, that is to be entered into the python command line terminal thus:
$ cat source/file/*.txt > source/output/output.txt
But I am not sure how I could make this iterate over multiple subdirectories.
(I am a real beginner with python, there seems to be some confusion. Is this not a python command? The source I found it from claimed it was...)
You could build them using a python script using something like:
file_list = ['c:\data.txt', 'c:\folder1\data.txt']
for everything in file_list:
f = open(everything)
for line in f.readlines():
print line
Call the script from cmd e.g. \python.exe 'buildtxtfiles.py'
I need to do a batch rename given the following scenario:
I have a bunch of files in Folder A
A bunch of files in Folder B.
The files in Folder A are all ".doc",
the files in Folder B are all ".jpg".
The files in Folder A are named "A0001.doc"
The files in Folder B are named "A0001johnsmith.jpg"
I want to merge the folders, and rename the files in Folder A so that they append the name portion of the matching file in Folder B.
Example:
Before:
FOLDER A: Folder B:
A0001.doc A0001johnsmith.jpg
After:
Folder C:
A0001johnsmith.doc
A0001johnsmith.jpg
I have seen some batch renaming scripts, but the only difference is that i need to assign a variable to contain the name portion so I can append it to the end of the corresponding file in Folder A.
I figure that the best way to do it would be to do a simple python script that would do a recursive loop, working on each item in the folder as follows:
Parse filename of A0001.doc
Match string to filenames in Folder B
Take the portion following the string that matched but before the "." and assign variable
Take the original string A0001 and append the variable containing the name element and rename it
Copy both files to Folder C (non-destructive, in case of errors etc)
I was thinking of using python for this, but I could use some help with syntax and such. I only know a little bit using the base python library, and I am guessing I would be importing libraries such as "OS", and maybe "SYS". I have never used them before, any help would be appreciated. I am also open to using a windows batch script or even powershell. Any input is helpful.
This is Powershell since you said you would use that.
Please note that I HAVE NOT TESTED THIS. I don't have access to a Windows machine right now so I can't test it. I'm basing this off of memory, but I think it's mostly correct.
foreach($aFile in ls "/path/to/FolderA")
{
$matchString = $aFile.Name.Split("."}[0]
$bFile = $(ls "/path/to/FolderB" |? { $_.Name -Match $matchString })[0]
$addString = $bFile.Name.Split(".")[0].Replace($matchString, "")
cp $aFile ("/path/to/FolderC/" + $matchString + $addString + ".doc")
cp $bFile "/path/to/FolderC"
}
This makes a lot of assumptions about the name structure. For example, I assumed the string to add doesn't appear in the common filename strings.
It is very simple with a plain batch script.
#echo off
for %%A in ("folderA\*.doc") do (
for %%B in ("folderB\%%~nA*.jpg") do (
copy "%%A" "folderC\%%~nB.doc"
copy "%%B" "folderC"
)
)
I haven't added any error checking.
You could have problems if you have a file like "A1.doc" matching multiple files like "A1file1.jpg" and "A10file2.jpg".
As long as the .doc files have fixed width names, and there exists a .jpg for every .doc, then I think the code should work.
Obviously more code could be added to handle various scenarios and error conditions.