Facing errors when reading huge text in python - python

Using Python 3, my requirement is to read email files from a directory and filter out the HTML tags in them.
I have managed to do it to a large extent. When I try to read the content of my output, it gives an error:
for line in output.splitlines():
AttributeError: 'int' object has no attribute 'splitlines'
for file in glob.glob('spam/*.*'):
    output = os.system("python html2txt.py " + file)
    for line in output.splitlines():
        print(line)
When I print output, it shows the filtered text. Any help is appreciated.

Try this as a replacement for the code you've provided:
import glob
files = glob.glob('spam/*.*')
for f in files:
    with open(f) as spam_file:
        for line in spam_file:
            print(line)
If the files are indeed html files, I would recommend looking into BeautifulSoup.
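As a rough illustration of tag filtering, here is a minimal sketch that uses the standard library's html.parser rather than BeautifulSoup (a swapped-in technique, chosen only so the example has no third-party dependency). The names TextExtractor and strip_tags are hypothetical and not part of the asker's html2txt.py.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are skipped.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of html with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


print(strip_tags("<p>Hello <b>world</b></p>"))
```

This keeps the text inside script and style elements too, so real email bodies would probably need extra handling.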

The return value of os.system(command) is system-dependent. It is supposed to return the (encoded) process exit value, which is represented by an int. From the os.system documentation:
On Unix, the return value is the exit status of the process encoded in
the format specified for wait(). Note that POSIX does not specify the
meaning of the return value of the C system() function, so the return
value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell
after running command, given by the Windows environment variable
COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always
0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit
status of the command run; on systems using a non-native shell,
consult your shell documentation.
But on no system does it return a str, and splitlines() is a str method.
You are calling a str method on an int, which is why you get the error:
AttributeError: 'int' object has no attribute 'splitlines'

On Unix, the return value is the exit status of the process encoded in
the format specified for wait(). Note that POSIX does not specify the
meaning of the return value of the C system() function, so the return
value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell
after running command. The shell is given by the Windows environment
variable COMSPEC: it is usually cmd.exe, which returns the exit status
of the command run; on systems using a non-native shell, consult your
shell documentation.
python docs
So your output variable is an integer, not the result of the file being parsed by the
html2txt.py script.
And why do you run another Python script outside of your current process? Can't you just import whatever class or function is doing the job from that module?
Also, there is an email module that can help you.
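If running a separate process really is needed, the output can be captured with subprocess instead of os.system. The following is a minimal sketch, assuming Python 3.7+ (for capture_output and text). The helper name run_and_capture is hypothetical, and the child command is a stand-in for "python html2txt.py <file>", which is not available here.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return its stdout as a list of lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Stand-in for the html2txt.py call: a child process that prints two lines.
demo_cmd = [sys.executable, "-c", "print('first line'); print('second line')"]
for line in run_and_capture(demo_cmd):
    print(line)
```

Here result.stdout is a str, so splitlines() works, whereas os.system only ever hands back the integer exit status.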

Related

batch file to convert .mp4 to .mp3 crashes half the times

I am using a batch file to access my portable VLC executable to convert an mp4 to an mp3:
set arg1=%1 REM -> arg1={my_mp4_full_path}
set arg2=%2 REM -> arg2={my_mp3_full_path}
echo %arg1%
echo %arg2%
REM batch file is in the same directory as "VLCPlayer" folder
"%~dp0\VLCPlayer\VLCPortable.exe" -I dummy %arg1% --sout=#transcode{acodec=mp3,ab=128,vcodec=dummy}:std{access="file",mux="raw",dst=%arg2%} vlc://quit
When I run this script the first time, VLC crashes and I get an unplayable mp3 file; however, when I run the script again, it works and I get a playable mp3. Is there a way to remedy this, or make it consistent? I don't see why running it twice would yield different outcomes.
No, I don't have ffmpeg on my computer; it is not recognized as an internal or external command.
Note that I face the same problem when using PowerShell to perform the same task, when I import my function from a .psm1 script:
function ConvertToMp3(
    [switch] $inputObject,
    [string] $vlc = '{PAth_TO_PORTABLE_VLC}\VLCPortable.exe')
{
    PROCESS {
        $codec = 'mp3';
        $oldFile = $_;
        $newFile = $oldFile.FullName.Replace($oldFile.Extension, ".$codec").Replace("'","");
        &"$vlc" -I dummy "$oldFile" ":sout=#transcode{acodec=$codec,
vcodec=dummy}:standard{access=file,mux=raw,dst=`'$newFile`'}" vlc://quit | out-null;
        # delete the original file
        Remove-Item $oldFile;
    }
}
I get the same random output that sometimes works, sometimes crashes.
Update:
I feel like I should add more info of how I use the batch file:
I have a python script Convert.py and I call my batch file inside using os.system():
mp4_to_convert = arguments.file
full_path_mp4 = os.path.join(outdir,mp4_to_convert)
mp3_to_convert_to = mp4_to_convert.replace(".mp4",".mp3")
full_path_mp3 = os.path.join(outdir,mp3_to_convert_to)
command_string = """Convert_Script.bat \"{}\" \"{}\"""".format(full_path_mp4, full_path_mp3)
os.system(command_string)
This is the documentation of os.system():
os.system(command)
Execute the command (a string) in a subshell. This
is implemented by calling the Standard C function system(), and has
the same limitations. Changes to sys.stdin, etc. are not reflected in
the environment of the executed command. If command generates any
output, it will be sent to the interpreter standard output stream.
On Unix, the return value is the exit status of the process encoded in
the format specified for wait(). Note that POSIX does not specify the
meaning of the return value of the C system() function, so the return
value of the Python function is system-dependent.
On Windows, the return value is that returned by the system shell
after running command. The shell is given by the Windows environment
variable COMSPEC: it is usually cmd.exe, which returns the exit status
of the command run; on systems using a non-native shell, consult your
shell documentation.
Any pointers or suggestions would be helpful, thank you in advance for your help.
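This does not address the VLC crash itself, but a sketch of the calling side may help with diagnosis: passing the command as a list avoids hand-built quoting, and the exit status can be inspected after each run. The helper names mp3_path_for and run_and_report are hypothetical, and the command below is a stand-in for Convert_Script.bat. Whether a batch file can be launched directly this way, or needs to go through cmd /c, is an assumption to verify on the target machine.

```python
import os
import subprocess
import sys


def mp3_path_for(outdir, mp4_name):
    """Return (mp4 path, mp3 path); only the final extension is swapped."""
    stem, _ext = os.path.splitext(mp4_name)
    return os.path.join(outdir, mp4_name), os.path.join(outdir, stem + ".mp3")


def run_and_report(args):
    """Run a command given as a list of arguments and return its exit status."""
    completed = subprocess.run(args)
    return completed.returncode


# Stand-in command that exits with status 3, in place of the batch file.
status = run_and_report([sys.executable, "-c", "import sys; sys.exit(3)"])
print("exit status:", status)
```

Using os.path.splitext instead of str.replace means that a ".mp4" appearing earlier in the file name is left alone.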

Running python script from perl, with argument to stdin and saving stdout output

My perl script is at path:
a/perl/perlScript.pl
my python script is at path:
a/python/pythonScript.py
pythonScript.py gets an argument from stdin, and returns its result to stdout. From perlScript.pl, I want to run pythonScript.py with the argument hi on stdin, and save the result in a variable. This is what I tried:
my $ret = `../python/pythonScript.py < hi`;
but I got the following error:
The system cannot find the path specified.
Can you explain why the path can't be found?
The qx operator (backticks) starts a shell (sh), in which prog < input syntax expects a file named input from which it will read lines and feed them to the program prog. But you want the python script to receive on its STDIN the string hi instead, not lines of a file named hi.
One way is to directly do that, my $ret = qx(echo "hi" | python_script).
But I'd suggest to consider using modules for this. Here is a simple example with IPC::Run3
use warnings;
use strict;
use feature 'say';
use IPC::Run3;
my @cmd = ('program', 'arg1', 'arg2');
my $in = "hi";
run3 \@cmd, \$in, \my $out;
say "script's stdout: $out";
The program is the path to your script if it is executable, or perhaps python script.py. This will be run by system, so the output is obtained once that completes, which is consistent with the attempt in the question. See the documentation for the module's operation.
This module is intended to be simple while "satisfy 99% of the need for using system, qx, and open3 [...]". For far more power and control see IPC::Run.
You're getting this error because you're using shell redirection instead of just passing an argument
../python/pythonScript.py < hi
tells your shell to read input from a file called hi in the current directory, rather than using it as an argument. What you mean to do is
my $ret = `../python/pythonScript.py hi`;
Which correctly executes your python script with the hi argument, and returns the result to the variable $ret.
Some of the other answers assume that hi must be passed as a command-line parameter to the Python script, but the asker says it comes from stdin.
Thus:
my $ret = `echo "hi" | ../python/pythonScript.py`;
To launch your external script you can do
system "python ../python/pythonScript.py hi";
and then in your python script
import sys
def yourFct(a):
    ...
if __name__ == "__main__":
    yourFct(sys.argv[1])
You can find more information on the Python part here.
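For comparison, the same stdin-in, stdout-out pattern that run3 provides can be sketched from a Python caller with subprocess.run and its input parameter (Python 3.7+ assumed). The helper name run_with_stdin is hypothetical, and the child command is a stand-in for pythonScript.py that simply upper-cases whatever it reads from stdin.

```python
import subprocess
import sys


def run_with_stdin(cmd, text):
    """Run cmd, feed text to its stdin, and return what it wrote to stdout."""
    result = subprocess.run(cmd, input=text, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in child: reads all of stdin and prints it upper-cased.
child = [sys.executable, "-c", "import sys; print(sys.stdin.read().strip().upper())"]
print(run_with_stdin(child, "hi"))
```

No shell is involved here, so there is no "<" redirection to misread as a file name.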

Can't pass file handle to subprocess

I created a file in the current directory with echo "foo" > foo. I then tried to pass that file to subprocess.run, but I seem to misunderstand how file paths are handled in Python, since I'm getting an error. What's wrong?
My test code
with open('foo') as file:
    import subprocess
    subprocess.run(['cat',file])
yields
TypeError: expected str, bytes or os.PathLike object, not _io.TextIOWrapper
What is a PathLike object? How do I get it from open('foo')? Where can I find more information about how files are handled in Python?
There's no need to open the file in the first place. You can simply run
import subprocess
subprocess.run(['cat', 'foo'])
The cat command is run as a separate program by your machine, and 'foo' is handed to it as an argument, so you can simply pass the file name as a string.
Python does not open or read the file at all; cat does. The point of subprocess is to start another program on the underlying system (in this case, apparently a UNIX-based OS) and pass it its arguments.
I won't, however, discourage you from reading about file handling. Look at this documentation.
PathLike object: docs
How to get it from the open call's return value:
Use the name field
subprocess.run(['cat',file.name])
Learn about python files: Reading and writing files
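The two options can be sketched side by side: pass a path (a str or an os.PathLike object such as pathlib.Path) as an argument, or pass the open file object as the child's stdin. This is an illustrative sketch assuming Python 3.7+. CAT is a portable stand-in for the cat command, and cat_by_path and cat_by_handle are hypothetical helper names.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Portable stand-in for `cat`: prints the file named in argv[1], or stdin if none is given.
CAT = [sys.executable, "-c",
       "import sys; sys.stdout.write(open(sys.argv[1]).read() "
       "if len(sys.argv) > 1 else sys.stdin.read())"]


def cat_by_path(path):
    """Pass the file's path as an argument; os.fspath accepts str or PathLike."""
    return subprocess.run(CAT + [os.fspath(path)],
                          capture_output=True, text=True, check=True).stdout


def cat_by_handle(handle):
    """Pass an already open file object as the child's standard input."""
    return subprocess.run(CAT, stdin=handle,
                          capture_output=True, text=True, check=True).stdout


with tempfile.TemporaryDirectory() as tmp:
    foo = Path(tmp) / "foo"
    foo.write_text("foo\n")
    by_path = cat_by_path(foo)
    with open(foo) as handle:
        by_handle = cat_by_handle(handle)

print(repr(by_path), repr(by_handle))
```

An open file object is valid as stdin= but not as an element of the argument list, which is what the TypeError in the question is reporting.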

string variable of cwd

Input
import os
my_cwd = str(os.system("cd"))
Output
C:\ProgramData\Anaconda2
Input
my_cwd
Output
'0'
I would expect my_cwd to be 'C:\ProgramData\Anaconda2'. What am I missing?
os.system returns the return code of the command as an integer (that's why you tried to convert to str), not the output of the command as a string.
To get the output, you could use subprocess.check_output (subprocess.run in python 3.5+) with shell=True since cd is built-in:
import subprocess
value = subprocess.check_output(["cd"],shell=True)
(check_output raises an exception if the command fails, though)
You also have to clean up the output by using value.rstrip() and decode the result into a string, since subprocess.check_output returns a bytes object. Also, your code is not portable to Linux, where the required command would be pwd.
Well, that's very complex just to get the current directory (leave that kind of approach for commands like cls or clear). The most Pythonic way to get it is:
os.getcwd()
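To make the comparison concrete, here is a small sketch of both routes, assuming Python 3.7+ (for text=True). The helper name cwd_via_shell is hypothetical; it picks cd on Windows and pwd elsewhere.

```python
import os
import subprocess


def cwd_via_shell():
    """Ask the shell for the current directory and return it as a str."""
    command = "cd" if os.name == "nt" else "pwd"
    # text=True decodes the bytes output; the trailing newline still has to go.
    output = subprocess.check_output(command, shell=True, text=True)
    return output.rstrip("\n")


print(cwd_via_shell())  # the roundabout way
print(os.getcwd())      # the direct way, already a str
```

The two strings may differ in spelling when symbolic links are involved, even though they name the same directory.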

In R: use system() to pass python command with white spaces

I'm trying to pass a Python command from R (on Windows x64, RStudio) to a Python script via the command prompt. It works if I type it directly into cmd, but not if I do it from R using the function system(). The format is (this is EXACTLY how I would write it in the Windows cmd shell/prompt):
python C:/some/path/script <C:/some/input.file> C:/some/output.file
This works in the cmd prompt: it runs the script with the input file (in <>) and produces the output file. I thought that in R I could do:
system('python C:/some/path/script <C:/some/input.file> C:/some/output.file')
But this gives an error from Python:
error: unparsable arguments: ['<C:/some/input.file>', 'C:/some/output.file']
It seems as if R or Windows interprets the white space differently than when I simply write (or copy-paste) the line into the cmd prompt. How can I do this?
From ?system
This interface has become rather complicated over the years: see
system2 for a more portable and flexible interface which is
recommended for new code.
system2 accepts a parameter args for the arguments of your command.
So you can try:
system2('python', c('C:\\some\\path\\script', 'C:\\some\\input.file', 'C:\\some\\output.file'))
On Windows:
The R documentation is not really clear on this point (or maybe it's just me); anyway, it seems that on Windows the suggested approach is to use shell(), which is less raw than system and system2, and it seems to work better with redirection operators (like < or >).
shell ('python C:\\some\\path\\script < C:\\some\\input.file > C:\\some\\output.file')
So what this command does is:
Call python.
Tell python to execute the script C:\some\path\script. Here we need to escape each '\' with another '\'.
Pass the input to the script using the '<' operator and the input.file.
Redirect the output (using '>') to the output file.
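The error message in the question is what one would expect when no shell interprets the redirection operators, so that they reach the script as ordinary arguments. The sketch below reproduces that effect in Python (3.7+ assumed) rather than R. The helper name args_seen_by_child and the file names in.txt and out.txt are hypothetical.

```python
import subprocess
import sys

# Child program that only reports the arguments it received.
SHOW_ARGS = "import sys; print(sys.argv[1:])"


def args_seen_by_child(*args):
    """Start the child without a shell and return its view of its arguments."""
    result = subprocess.run([sys.executable, "-c", SHOW_ARGS, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# With no shell in between, "<" and ">" are just characters in the arguments.
print(args_seen_by_child("<in.txt>", "out.txt"))
```

Going through a shell, as shell() does in R, is what turns < and > into redirections.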
