findstr function doesn't work on original file

I have code that uses PowerShell to export a domain controller's policy with Get-GPOReport. However, findstr never works on the exported HTML file. The only way I can get it to work is to change the file's extension to .txt and then copy all of its content into another newly created .txt file (e.g. test.txt).
Only then does findstr work. Does anyone know why it doesn't work on the original file?
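import os, subprocess
power_shell = os.path.join(os.environ["SYSTEMROOT"], "System32", "WindowsPowerShell", "v1.0", "powershell.exe")
# Export the report and wait for PowerShell to finish writing it (raw strings keep the backslashes literal)
subprocess.run([power_shell, "-Command", r"Get-GPOReport -Name 'Default Domain Controllers Policy' -ReportType HTML -Path 'D:\Downloads\Project\GPOReport.html'"], check=True)
policyCheck = subprocess.check_output([power_shell, "-Command", 'findstr /c:"Minimum password age"', r"D:\Downloads\Project\GPOReport.html"]).decode('utf-8')
print(policyCheck)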
# However if I copy all the content in D:\Downloads\Project\GPOReport.html to a newly created test.txt file (MANUALLY - I've tried to do it programmatically, findstr wouldn't work too) under the same directory and use:
power_shell = os.path.join(os.environ["SYSTEMROOT"], "System32","WindowsPowerShell", "v1.0", "powershell.exe")
# Raw string, so that \t in the path is not read as a tab character
policyCheck = subprocess.check_output([power_shell,"-Command", 'findstr /c:"Minimum password age"', r"D:\Downloads\Project\test.txt"]).decode('utf-8')
print(policyCheck)
# Correct Output Will Show
What I got:
subprocess.CalledProcessError: Command '['C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe', '-Command', 'findstr /c:"Minimum password age"', 'D:\Downloads\Project\GPOReport.html']' returned non-zero exit status 1.
Expected Output:
<tr><td>Minimum password age</td><td>1 days</td></tr>

I'm not a Python guy, but I think this may be an encoding issue, since findstr is not Unicode compatible. As @iRon suggested, Select-String should do the trick, though you may have to reference the .Line property to get the expected output you mentioned. Otherwise it will return match objects.
I'll leave it to you to transpose this into the Python code, but Select-String command should look something like:
(Select-String -Path "D:\Downloads\Project\GPOReport.html" -Pattern "Minimum password age" -SimpleMatch).Line
If there are multiple matches this will return an array of strings; the lines where the matches were made. Let me know if that's helpful.
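If you would rather stay in Python, the same search can be sketched without calling findstr or PowerShell at all. This is a minimal sketch for Python 3 that assumes the exported report is UTF-16 encoded; that encoding is an assumption consistent with the suspicion above, not something confirmed in this thread. The function name find_lines is made up for illustration.

```python
def find_lines(path, needle, encoding="utf-16"):
    """Return every line of `path` that contains `needle`.

    The default encoding is an assumption about how the report was written;
    pass a different one if your file turns out to be encoded differently.
    """
    with open(path, encoding=encoding) as report:
        # Strip only the line terminator so the matched markup stays intact.
        return [line.rstrip("\r\n") for line in report if needle in line]
```

Calling find_lines(r"D:\Downloads\Project\GPOReport.html", "Minimum password age") would return a list of matching lines, much like the .Line property above. If decoding fails, the file is probably not UTF-16 and the encoding argument needs adjusting.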

Related

Searching string among 5Gb of text files

I have several CSV files (~25k in total) with a total size of ~5Gb. These files are on a network path, and I need to search all of them for several strings and save the names of the files in which these strings are found (in an output file, for example).
I've already tried two things:
With Windows I've used findstr : findstr /s "MYSTRING" *.csv > Output.txt
With Windows PowerShell: gci -r "." -filter "*.csv" | Select-String "MYSTRING" -list > .\Output.txt
I can also use Python, but I don't really think it'll be faster.
Is there any other way to speed up this search?
More precisely: the structure of the files differs from one file to the next. They are CSV files, but they could just as well be plain TXT files.

You can use pandas to go through large csv files. You will use the read_csv() method to read the contents of the csv files, then use the query() method to filter out the columns and then use to_csv() to export those results in a separate csv file.
import pandas as pd
df = pd.read_csv('csv_file.csv')
result = df.query('column_name == "filtered_strings"')
result.to_csv('filtered_result.csv', index=False)
Hopefully this helps you.

One of the fastest ways to search in text files using PowerShell is switch with parameters -File FILENAME -Regex.
This won't make a big difference, though, unless you avoid the I/O bottleneck of the network by running the search code on the server, e.g. using Invoke-Command. Of course, you need permission to run scripts on the remote server.
Invoke-Command -ComputerName TheRemoteMachine {
    Get-ChildItem C:\Location\Of\Files -Recurse -Filter *.csv -PV file | ForEach-Object {
        switch -File $_.Fullname -Regex {
            'MYSTRING|ANOTHERSTRING' { $file.FullName; break }
        }
    }
} | Set-Content output.txt
This outputs the full paths of files that contain the sub string "MYSTRING" or "ANOTHERSTRING", which gets received on the local machine and stored in a local file.
switch -File $_.Fullname -Regex reads the current file line by line, applying the regular expression to each line. We use break to stop searching when the first match has been found.
Parameter -PV file (alias of -PipeLineVariable) for Get-ChildItem is used so we have access to the current file path in the switch statement. In the switch statement $_ denotes the current RegEx match, so it hides $_ from the ForEach-Object command. Using -PV we provide another name for the $_ variable of ForEach-Object.
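Since the question mentions Python as an option, the same "read line by line and stop at the first match" idea might be sketched as below. This is an illustration only, written for Python 3; the function name files_containing and its parameters are made up for the example, and no claim is made that it is faster than the PowerShell version. As above, it helps most when run on the machine that holds the files.

```python
import os
import re


def files_containing(root, pattern, suffix=".csv"):
    """Return paths under `root` whose content matches the regex `pattern`."""
    regex = re.compile(pattern)
    matches = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(folder, name)
            # Undecodable bytes are ignored so that odd files do not abort the scan.
            with open(path, errors="ignore") as handle:
                # any() stops reading the file as soon as one line matches.
                if any(regex.search(line) for line in handle):
                    matches.append(path)
    return matches
```

The returned list can then be written to an output file, one path per line.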

Python script doesn't delete file from archive -- printed command via terminal works fine

I'm creating an archive in Python using this code:
#Creates archive using string like [proxy_16-08-15_08.57.07.tar]
proxyArchiveLabel = 'proxy_%s' % EXECUTION_START_TIME + '.tar'
log.info('Packaging %s ...' % proxyArchiveLabel)
#Removes .tar from label during creation
shutil.make_archive(proxyArchiveLabel.rsplit('.',1)[0], 'tar', verbose=True)
So this creates an archive fine in the local directory. The problem is that there's a specific directory in my archive I want to remove, due to its size and lack of necessity for this task.
ExecWithLogging('tar -vf %s --delete ./roles/jobs/*' % proxyArchiveLabel)
# ------------
def ExecWithLogging(cmd):
    print cmd
    p = subprocess.Popen(cmd.split(' '), env=os.environ, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while(True):
        log.info(p.stdout.readline().strip())
        if(p.poll() is not None):
            break
However, this seems to do basically nothing. The size remains the same. If I print cmd inside ExecWithLogging and copy/paste that command into a terminal in the working directory of the script, it works fine. Just to be sure, I also tried hard-coding the full path to where the archive is created as part of the tar -vf %s --delete command, but still nothing seemed to happen.
I do get this output in my INFO log: tar: Pattern matching characters used in file names, so I'm kind of thinking Popen is interpreting my command incorrectly somehow... (or rather, I'm more likely passing in something incorrectly).
Am I doing something wrong? What else can I try?

You may have to use the --wildcards option in the tar command, which enables pattern matching. This may well be what the message in your log is telling you, albeit somewhat cryptically.
Edit:
In answer to your question "Why?": I suspect that the shell performs the wildcard expansion, whereas a command passed through Popen gets no such expansion. The --wildcards option makes tar perform the pattern matching itself.
For a more detailed explanation see here:
Tar and wildcards
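The difference can be seen with a small experiment. The sketch below uses two helper names invented for this example, received_argument and tar_delete_command. The first shows that an argument list handed to subprocess reaches the child process unchanged, with no shell in between to expand the `*`. The second only builds the argument list suggested above and does not run tar; it assumes GNU tar, which is what the quoted log message suggests.

```python
import subprocess
import sys


def received_argument(pattern):
    """Return the argument a child process actually receives for `pattern`."""
    child_code = "import sys; print(sys.argv[1])"
    # No shell is involved here, so nothing expands the pattern.
    output = subprocess.check_output([sys.executable, "-c", child_code, pattern])
    return output.decode().strip()


def tar_delete_command(archive, pattern):
    """Build a tar argument list that asks tar itself to match the pattern."""
    return ["tar", "--wildcards", "-vf", archive, "--delete", pattern]
```

received_argument("./roles/jobs/*") returns the pattern with the asterisk still in it, which is what tar sees in the original script.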

Python: How to get the URL to a file when the file is received from a pipe?

I created, in Python, an executable whose input is the URL to a file and whose output is the file, e.g.,
file:///C:/example/folder/test.txt --> url2file --> the file
Actually, the URL is stored in a file (url.txt) and I run it from a DOS command line using a pipe:
type url.txt | url2file
That works great.
I want to create, in Python, an executable whose input is a file and whose output is the URL to the file, e.g.,
a file --> file2url --> URL
Again, I am using DOS and connecting executables via pipes:
type url.txt | url2file | file2url
Question: file2url is receiving a file. How do I get the file's URL (or path)?

In general, you probably can't.
If the URL is not stored in the file, it seems very difficult to get the URL. Imagine someone reads a text to you. Without further information, you have no way to know which book it comes from.
However, there are certain use cases where you can do it.
Pipe the url together with the file.
If you need the url and you can do that, try to keep the url together with the file. Make url2file pipe your url first and then the file.
Restructure your pipeline
Maybe you don't need to find the url for the file, if you restructure your pipeline.
Index your files
If only certain files could potentially be piped into file2url, you could precalculate a hash for each of those files and store it in your program together with the URL. In Python you would do this using a dict where the key is the hash of the file's content (as a string) and the value is the URL. You could use pickle to write the dict object to a file and load it at the start of your program.
Then you could simply lookup the url from this dict.
You might want to research how databases or search functions in explorers handle indexing or alternative solutions.
Searching for the file
You could take one significant line of the file and use something like grep or head on Linux to search all files on your computer for this line. Note that grep and head are programs, not Python functions. For DOS, you might need to look up the equivalent programs.
FYI: grep searches for a line of text inside a file.
head outputs the first few lines of a file. I suggest comparing only the first few lines of each file to avoid reading through huge files.
Searching all files on the computer might take a very long time.
You could narrow the search to files with the same size as your piped input.
Use url.txt
If file2url knows the location of the file url.txt, then you could look up all files in url.txt until you find a file identical to the file that was piped into your program. You could combine this with the hashing/ indexing solution.
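The "Index your files" idea could look roughly like the sketch below. It is only an illustration for Python 3: the helper names digest, build_index and lookup_url are invented here, SHA-256 is an arbitrary choice of hash, and it assumes the candidate files are known in advance and unchanged since they were indexed.

```python
import hashlib
from pathlib import Path


def digest(data):
    """Hash raw file content so it can be used as a dictionary key."""
    return hashlib.sha256(data).hexdigest()


def build_index(paths):
    """Map the content hash of each candidate file to its file:// URL."""
    index = {}
    for path in paths:
        path = Path(path).resolve()
        index[digest(path.read_bytes())] = path.as_uri()
    return index


def lookup_url(index, piped_bytes):
    """Return the URL of the indexed file with this content, or None."""
    return index.get(digest(piped_bytes))
```

Inside file2url, the piped content would be read with sys.stdin.buffer.read() and passed to lookup_url. Two files with identical content share one hash, so the index can only report one of them.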

'file2url' receives the data via standard input (like keyboard).
The data is transferred by the kernel and it doesn't necessarily have to have any file-system representation. So if there's no file there's no URL or path to that for you to get.

Let's try it the obvious way:
$ cat test.py | python test.py
import sys
print ''.join(sys.stdin.readlines())
print sys.stdin.name
<stdin>
So the file name is "<stdin>", because for Python there is no file name, only input.
Another way is system-dependent: for example, finding the command line that was used. But there is no guarantee that it will work.

Using cat command in Python for printing

On Linux, I can send a file to the printer using the following command
cat file.txt > /dev/usb/lp0
From what I understand, this redirects the contents in file.txt into the printing location. I tried using the following command
>>> os.system('cat file.txt > /dev/usb/lp0')
I thought this command would achieve the same thing, but it gave me a "Permission Denied" error. In the command line, I would run the following command prior to concatenating.
sudo chown root:lpadmin /dev/usb/lp0
Is there a better way to do this?

While there's no reason your code shouldn't work, this probably isn't the way you want to do this. If you just want to run shell commands, bash is much better than python. On the other hand, if you want to use Python, there are better ways to copy files than shell redirection.
The simplest way to copy one file to another is to use shutil:
shutil.copyfile('file.txt', '/dev/usb/lp0')
(Of course if you have permissions problems that prevent redirect from working, you'll have the same permissions problems with copying.)
You want a program that reads input from the keyboard, and when it gets a certain input, it prints a certain file. That's easy:
import shutil
while True:
    line = raw_input() # or just input() if you're on Python 3.x
    if line == 'certain input':
        shutil.copyfile('file.txt', '/dev/usb/lp0')
Obviously a real program will be a bit more complex—it'll do different things with different commands, and maybe take arguments that tell it which file to print, and so on. If you want to go that way, the cmd module is a great help.

Remember, in UNIX - everything is a file. Even devices.
So you can just use the basic file methods (or anything else, e.g. shutil.copyfile): http://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files
In your case, the code might look like this (just one way to do it):
# Read file.txt
with open('file.txt', 'r') as content_file:
    content = content_file.read()
# Write its content to the printer device
with open('/dev/usb/lp0', 'w') as target_device:
    target_device.write(content)
P. S. Please, don't use system() call (or similar) to solve your issue.
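For larger files, or when you want to handle the "Permission Denied" case from the question explicitly, a variation on the same idea might look like this. It is a sketch for Python 3 only; the function name send_to_device is made up for illustration, and it does not change the device's permissions, it merely reports the failure.

```python
import shutil


def send_to_device(source_path, device_path="/dev/usb/lp0"):
    """Stream `source_path` to `device_path`; return False if access is denied."""
    try:
        with open(source_path, "rb") as source, open(device_path, "wb") as device:
            # copyfileobj copies in chunks instead of loading the whole file.
            shutil.copyfileobj(source, device)
    except PermissionError:
        return False
    return True
```

If this returns False, the user running the script still needs write access to the device, exactly as with the shell redirection.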

Under Windows there is no cat command; you should use type instead of cat.
(If you want to run the cat command under Windows, please look at: https://stackoverflow.com/a/71998867/2723298)
import os
os.system('type a.txt > copy.txt')
...or, if your OS is Linux and the cat command doesn't work for some reason, here are other ways to copy a file.
With grep:
import os
os.system('grep "" a.txt > b.txt')
The empty pattern "" is important!
Copy a file with sed:
os.system('sed "" a.txt > sed.txt')
Copy a file with awk (single quotes around the awk program, so the shell does not expand $0):
os.system("awk '{print $0}' a.txt > awk.txt")

os.system: saving shell variables with multiple commands in one method

I am having a problem using my command/commands with one instance of os.system.
Unfortunately I have to use os.system as I have no control over this, as I send the string to the os.system method. I know I should really use subprocess module for my case, but that ain't an option.
So here is what I am trying to do.
I have a string like below:
cmd = "export BASE_PATH=`pwd`; export fileList=`python OutputString.py`; ./myscript --files ${fileList}; cp outputfile $BASE_PATH/.;"
This command then gets sent to the os.system module like so
os.system(cmd)
unfortunately when I consult my log file I get something that looks like this
os.system(r"""export BASE_PATH=/tmp/bla/bla; export fileList=; ./myscript --files ; cp outputfile /.;""")
As you can see, BASE_PATH seems to be set, but when I use it in cp outputfile $BASE_PATH/. I get an empty string.
fileList is also empty, although fileList=`python OutputString.py` should store the printed file list in this variable.
My thoughts:
Are these bugs due to a new process being started for each command? That would explain why I lose the value of BASE_PATH in the next command.
I am also not sure why fileList is empty.
Is there a solution to my above problem using os.system and my command string?
Please Note I have to use os.system module. This is out of my control.
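One way to test the "new process for each command" hypothesis is a small experiment like the sketch below. It assumes a POSIX shell (os.system hands the whole string to the system shell) and uses an invented helper, run_in_one_shell, plus a throwaway variable name, GREETING_DEMO. It only checks whether a variable exported earlier in the string is still visible later in the same string; it does not explain the log output above.

```python
import os


def run_in_one_shell(command):
    """Pass the whole command string to a single os.system call."""
    return os.system(command)


# The variable is exported and then tested within the same command string.
same_call = run_in_one_shell(
    'export GREETING_DEMO=`echo hello`; test "$GREETING_DEMO" = hello'
)

# Here the export and the test happen in two separate os.system calls.
run_in_one_shell('export GREETING_DEMO=hello')
separate_call = run_in_one_shell('test "$GREETING_DEMO" = hello')
```

A status of 0 for same_call means the variable survived between the commands of one string, while a non-zero separate_call shows that it does not survive from one os.system call to the next.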
