Trouble using the requests library in Python

I am attempting to check for active web site folders against a list that was created using robots.txt (this is for learning security; I'm doing this on a server that I own and control). I am using Python 2.7 on Kali Linux.
My code works if I just do one web address at a time: I get a proper 200 or 404 response for folders that are active and inactive, respectively.
When I attempt this against the entire list, I get a string of 404 errors. When I print out the actual addresses that the script is creating, everything looks correct.
Here is the code I am using:
import requests

attempt = open('info.txt', 'r')
folders = attempt.readlines()

for line in folders:
    host = 'http://10.0.1.66/mutillidae' + line
    attempt = requests.get(host)
    print attempt
This results in a string of 404 errors. If I take the loop out and try each address individually, I get a 200 response back showing that it is up and running.
I have also printed out the addresses using the same loop against the text document that contains the correct folders, and they look fine, which I verified by copying and pasting. I have tried this with a file listing multiple folders and with a file listing a single folder, and I always get a 404 when reading from the file.
The info.txt file contains the following:
/passwords/
/classes/
/javascript/
/config
/owasp-esapi-php/
/documentation/
Any advice is appreciated.

Lines returned by file.readlines() contain trailing newlines, which you must remove before passing them to requests.get. Replace the statement:
host = 'http://10.0.1.66/mutillidae'+line
with:
host = 'http://10.0.1.66/mutillidae' + line.rstrip()
and the problem will go away.
Note that your code would be easier to read if you refrained from using the same generic variable name, such as attempt, for different purposes in the same scope. Also, try to use variable names that reflect their usage: for example, host would be better named url, since it doesn't hold the host name but the entire URL.
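Putting both fixes together, a minimal sketch of the corrected script (Python 2, as in the question) might look like this:

import requests

# Probe each folder from the wordlist; rstrip() removes the trailing
# newline that each line read from the file carries.
with open('info.txt', 'r') as wordlist:
    for line in wordlist:
        url = 'http://10.0.1.66/mutillidae' + line.rstrip()
        response = requests.get(url)
        print url, response.status_code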

Related

Opening a .docx file in S3 bucket in Python (Boto3)

In one of our S3 buckets, we have a .docx file with Mail Merge fields in it.
What I'm trying to do is read it directly from the bucket without first downloading it locally!
Typically, I can open a file and see the mail merge fields within it using this code:
from mailmerge import MailMerge
document = MailMerge(r'C:\Users\User\Desktop\MailMergeFile.docx') # Trying to get a variable to pass in here
print(document.get_merge_fields())
As seen above, what I'm trying to do is get the object in a way where I can just pass it to the MailMerge method, as though I were passing a path on my local machine.
The approaches I've looked up so far haven't worked. For example:
fileobj = s3.get_object(
    Bucket='bucketname',
    Key='folder/mailmergefile.docx'
)
word_file = fileobj['Body'].read()
contents = word_file.decode('ISO-8859-1')  # can't use utf-8, as that gives an encoding error
contents
But then when I try to pass the contents variable to the MailMerge function, I get another error:
document = MailMerge(contents)
print(document.get_merge_fields())
The error I get is:
ValueError: embedded null character
I presume you are using docx-mailmerge · PyPI.
The documentation is quite sparse, but it shows MailMerge('input.docx'), which suggests that it is expecting the name of a file, not the contents of a file.
Looking at the code, it seems to be calling a library to open a zip file.
Bottom line: as written, it wants the name of a file, not the contents of the file.
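A minimal sketch of a workaround under that constraint (bucket and key names are placeholders): write the object's bytes to a temporary file and hand MailMerge the file's path. The bytes are a zip archive, so no text decoding is needed.

import tempfile

import boto3
from mailmerge import MailMerge

s3 = boto3.client('s3')
fileobj = s3.get_object(Bucket='bucketname', Key='folder/mailmergefile.docx')

# Write the raw bytes to a temp file; delete=False so the file can be
# reopened by path on all platforms.
with tempfile.NamedTemporaryFile(suffix='.docx', delete=False) as tmp:
    tmp.write(fileobj['Body'].read())
    tmp_path = tmp.name

document = MailMerge(tmp_path)
print(document.get_merge_fields())

Since the library appears to hand its argument to Python's zipfile module, which also accepts file-like objects, wrapping the bytes in io.BytesIO may work as well, but that depends on the library version.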

Find out differences between directory listings on time A and time B on FTP

I want to build a script which finds out which files on an FTP server are new and which are already processed.
For each file on the FTP server we read out the information, parse it, and write what we need from it to our database. The files are XML files, but they have to be translated.
At the moment I'm using mlsd() to get a list, but this takes up to 4 minutes because there are already 15,000 files in this directory, and there will be more every day.
Instead of comparing this list with an older list saved in a text file, I would like to know if there are better possibilities.
Because this task has to run "live", it would end up in a cron job running every 1 or 2 minutes. If the listing takes this long, that won't work.
The solution should be either in PHP or Python.
def handle(self, *args, **options):
    ftp = FTP_TLS(host=host)
    ftp.login(user, passwd)
    ftp.prot_p()
    list = ftp.mlsd("...")
    for item in list:
        print(item[0] + " => " + item[1]['modify'])
This code example alone already takes 4 minutes to run.
I have always tried to avoid scanning a folder to find out what might have changed, preferring a dedicated workflow instead. When files can only be added (or new versions of existing files), I use a workflow where files arrive in one directory and then move on to other directories where they are archived. Processing can occur in a directory from which files are deleted after being used, or as they are copied/moved from one folder to another.
As a slight goody, I also use a copy/rename pattern: the files are first copied under a temporary name (for example a .t prefix or suffix) and renamed once the copy has finished, as sketched below. This prevents trying to process a file which is not fully copied. Granted, this used to matter more when we had slow lines, but race conditions should be avoided as much as possible, and it allows a daemon to poll a folder every 10 seconds or less.
I'm unsure whether this is really relevant here, because it could require some refactoring, but it gives a bulletproof solution.
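A minimal sketch of that copy/rename pattern over FTP (host, credentials, and paths are illustrative): the file only appears under its final name once the upload has completed, so a polling consumer never sees a half-written file.

from ftplib import FTP_TLS

ftp = FTP_TLS(host='ftp.example.com')
ftp.login('user', 'password')
ftp.prot_p()

# Upload under a temporary name first...
with open('report.xml', 'rb') as f:
    ftp.storbinary('STOR incoming/report.xml.t', f)

# ...then rename; the file becomes visible only when it is complete.
ftp.rename('incoming/report.xml.t', 'incoming/report.xml')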
If FTP is your only interface to the server, there's no better way than what you are already doing.
Except maybe, if your server supports the non-standard -t switch to the LIST/NLST commands, which returns the list sorted by timestamp.
See How to get files in FTP folder sorted by modification time.
That helps if what takes long is the download of the file list itself (not the initiation of the download). In that case you can request the sorted list, but download only the leading new files, aborting the listing once you find the first already-processed file.
For an example, how to abort download of a file list, see:
Download the first N rows of text file in ftp with ftplib.retrlines
Something like this:
class AbortedListing(Exception):
    pass

def collectNewFiles(s):
    if isProcessedFile(s):  # your code to detect if the file was processed already
        print("We know this file already: " + s + " - aborting")
        raise AbortedListing()
    print("New file: " + s)

try:
    ftp.retrlines("NLST -t /path", collectNewFiles)
except AbortedListing:
    # read/skip the rest of the response
    ftp.getmultiline()

Read MSI with Python msilib

I need to read an MSI file and make some queries against it. But despite being a standard library module for Python, msilib has poor documentation.
To make queries I have to know the database schema, and I can't find any examples or methods to get it from the file.
Here is my code I'm trying to make work:
import msilib
path = "C:\\Users\\Paul\\Desktop\\my.msi" #I cannot share msi
dbobject = msilib.OpenDatabase(path, msilib.MSIDBOPEN_READONLY)
view = dbobject.OpenView("SELECT FileName FROM File")
rec = view.Execute(None)
r = v.Fetch()
And the rec variable is None. But I can open the MSI file with InstEd tool and see that File is present in the tables list and there are a lot of records there.
What am I doing wrong?
Your code is suspect, as the last line will throw a NameError in your sample (v is never defined). So let's ignore that line.
The real problem is that view.Execute returns nothing of use. Under the hood, the MsiViewExecute function only returns success or failure. After you call it, you then need to call view.Fetch, which may be what your last line intended to do.
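A minimal corrected sketch under that reading (the path is the question's; how Fetch signals the end of the records varies by Python version, so both cases are handled):

import msilib

path = "C:\\Users\\Paul\\Desktop\\my.msi"
dbobject = msilib.OpenDatabase(path, msilib.MSIDBOPEN_READONLY)
view = dbobject.OpenView("SELECT FileName FROM File")
view.Execute(None)  # only runs the view; rows come from Fetch

while True:
    try:
        rec = view.Fetch()  # one record per call
    except msilib.MSIError:
        break  # older Pythons raise when no records remain
    if rec is None:
        break  # newer Pythons return None instead
    print(rec.GetString(1))  # record fields are 1-based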

Python FTP: parseable directory listing

I'm using the Python FTP lib for the first time. My goal is simply to connect to an FTP site, get a directory listing, and then download all files which are newer than a certain date (e.g. all files created or modified within the last 5 days).
This turned out to be a bit more complicated than I expected for a few reasons. Firstly, I've discovered that there is no real "standard" FTP file list format. Most FTP sites conventionally use the UNIX ls format, but this isn't guaranteed.
So, my initial thought was to simply parse the UNIX ls format: it's not so bad after all, and it seems most mainstream FTP servers will use it in response to the LIST command.
This was easy enough to code with Python's ftplib:
import ftplib

def callback(line):
    print(line)

ftp = ftplib.FTP("ftp.example.com")
result = ftp.login(user="myusername", passwd="XXXXXXXX")
dirlist = ftp.retrlines("LIST", callback)
This works, except the problem is that the date given in the UNIX list format returned by the FTP server I'm dealing with doesn't have a year. A typical entry is:
-rw-rw-r-- 1 user user 1505581 Dec 9 21:53 somefile.txt
So the problem here is that I'd have to code in extra logic to sort of "guess" if the date refers to the current year or not. Except really, I'd much rather not code some complex logic like that when it seems so unnecessary - there's no reason the FTP server shouldn't be able to give me the year.
Okay, so after Googling around for some alternative ways to get LIST information, I've found that many FTP servers support the MLST and MLSD commands, which apparently provide a directory listing in a "machine-readable" format, i.e. a list format which is much more amenable to automatic processing. Great. So, I try the following:
dirlist = ftp.sendcmd("MLST")
print(dirlist)
This produces a single line response, giving me data about the current working directory, but NOT a list of files.
250-Start of list for /
modify=20151210094445;perm=flcdmpe;type=cdir;unique=808U6EC0051;UNIX.group=1003;UNIX.mode=0775;UNIX.owner=1229; /
250 End of list
So this looks great, and easy to parse, and it also has a modify date with the year. Except it seems the MLST command is showing information about the directory itself, rather than a listing of files.
So, I've Googled around and read the relevant RFCs, but can't seem to figure out how to get a listing of files in "MLST" format. It seems the MLSD command is what I want, but I get a 425 error when I try that:
File "temp8.py", line 8, in <module>
dirlist = ftp.sendcmd("MLSD")
File "/usr/lib/python3.2/ftplib.py", line 255, in sendcmd
return self.getresp()
File "/usr/lib/python3.2/ftplib.py", line 227, in getresp
raise error_temp(resp)
ftplib.error_temp: 425 Unable to build data connection: Invalid argument
So how can I get a full directory listing in MLST/MLSD format here?
There is another module, ftputil, which is built on top of ftplib and has many features emulating os, os.path, and shutil. I found it pretty easy to use and robust in related operations. Maybe you could give it a try.
As for your purpose, the introductory code in its documentation solves it exactly.
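For instance, a minimal sketch with ftputil (host and credentials are placeholders, and it assumes the server reports usable modification times): download every file modified within the last 5 days.

import time

import ftputil

with ftputil.FTPHost('ftp.example.com', 'myusername', 'XXXXXXXX') as host:
    cutoff = time.time() - 5 * 24 * 3600  # 5 days ago
    for name in host.listdir('.'):
        # stat() returns an os.stat-like result with st_mtime
        if host.path.isfile(name) and host.stat(name).st_mtime >= cutoff:
            host.download(name, name)  # remote name -> same local name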
You could also try ftplib's own mlsd() method (available since Python 3.3) and see if you can get what you need:
for name, facts in ftp.mlsd('directory'):
    print(name, facts)
I am working on something similar, where I need to parse the contents of a directory and all subdirectories within it. However, the server that I am working with did not allow the MLST command, so I accomplished what I needed by:
parsing the main directory contents,
looping through the main directory contents, and
appending each loop's output to a pandas DataFrame, as shown below.
import pandas as pd

test = pd.Series(ftp.nlst('/target directory/'))
df_server_content = pd.DataFrame()

for i in test:
    data_dir = '/target directory/' + i
    server_series = pd.Series(ftp.nlst(data_dir))
    # ignore_index is needed when appending an unnamed Series
    df_server_content = df_server_content.append(server_series, ignore_index=True)

Cannot access file on PythonAnywhere

I have a Django project that worked perfectly on my local server, returning a response. I am now trying to run it on PythonAnywhere, but it keeps saying there is no such directory or file. I initially used os.path.dirname("__file__"), but then I changed it to the absolute address, i.e. "/home/username/projectname/filename", to no avail. The latter method is the only one others on the web are suggesting, but it still isn't working. Is there a special syntax for accessing files on PythonAnywhere, or do you have any suggestions? Thanks.
The following is the line that throws the error:
with open("home/<username>/<project>/layer.pem", "r") as rsa_priv_file:
Directory structure: (screenshot not reproduced here)
If this
with open("home/<username>/<project>/layer.pem", "r") as rsa_priv_file:
is the actual code you're using, then you're missing a / at the beginning. What you're actually asking for with that code is not the absolute path to layer.pem, but a relative path rooted in the current directory.
Also, os.path.dirname("__file__") is not working because you quoted __file__. What you're asking for is the dirname of a file called "__file__" (which will be an empty string), not the dirname of the current file.
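A minimal sketch of both fixes (the username/project placeholders are the question's):

import os

# Fix 1: an absolute path needs the leading slash.
with open("/home/<username>/<project>/layer.pem", "r") as rsa_priv_file:
    key = rsa_priv_file.read()

# Fix 2: build the path from this source file's directory;
# __file__ is a variable and must not be quoted.
pem_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "layer.pem")
with open(pem_path, "r") as rsa_priv_file:
    key = rsa_priv_file.read()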
