Can't run a script that uses pyshark with pypy3 - python

I tried to run my script with pypy3 c.py, but the error below occurred.
I installed pyshark with pypy3 -m pip install pyshark, but the error remains:
pypy3 c.py
ModuleNotFoundError: No module named 'lxml.objectify'
import pyshark
import pandas as pd
import numpy as np
from multiprocessing import Pool
import re
import sys
temp_array = []
cap = pyshark.FileCapture("ddos_attack.pcap")
#print(cap._extract_packet_json_from_data(cap[0]))
def parse(capture):
    print(capture)
    # Split the printed packet into [field, value] pairs, one per line
    packet_raw = [i.strip('\r').strip('\t').split(':') for i in str(capture).split('\n')]
    packet_raw = map(lambda num:[num[0].replace('(',''),num[1].strip(')').replace('(','')] if len(num)== 2 else [num[0],':'.join(num[1:])] ,[i for i in packet_raw])
    raw = list(packet_raw)[:-1]
    cols = [i[0] for i in raw]
    vals = [i[1] for i in raw]
    temp_array.append(dict(zip(cols,vals)))
    return dict(zip(cols,vals))
def preprocess_dataset(x):
    count = 0
    temp = []
    #print(list(cap))
    #p = Pool(5)
    #r = p.map(parse,cap)
    #p.close()
    #p.join()
    #print(r)
    try:
        for i in list(cap):
            temp.append(parse(i))
            count += 1
    except Exception:
        print("somethin")
    data = pd.DataFrame(temp)
    print(data)
    data = data[['Packet Length','.... 0101 = Header Length','Protocol','Time to Live','Source Port','Length','Time since previous frame in this TCP stream','Window']]
    # rename() returns a new DataFrame, so the result has to be assigned
    data = data.rename(columns={".... 0101 = Header Length": 'Header Length'})
    filtr = ["".join(re.findall(r'\d.',str(i))) for i in data['Time since previous frame in this TCP stream']]
    data['Time since previous frame in this TCP stream'] = filtr
    # to_csv() returns None when given a path, so there is nothing to print
    data.to_csv('data.csv')

Check your terminal settings, or try running the script from an IDE such as PyCharm.

It seems lxml is not installed correctly. It is hard to figure out what is going on, since you only show the last line of the traceback and do not state what platform you are on or which version of PyPy you are using. The lxml package is listed as a requirement for pyshark, so it should have been installed. What happens when you try import lxml under pypy3?
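To answer that question concretely, a small check like the one below can be run with the same interpreter that fails (for example pypy3 check_lxml.py). It is only a diagnostic sketch: the file name check_lxml.py and the helper check_import are made up for illustration, and nothing beyond the standard library is assumed.

```python
import importlib
import sys


def check_import(module_name):
    """Try to import module_name; return (ok, error_message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, str(exc)
    return True, ""


# Show which interpreter is running, then test the package and the failing submodule.
print(sys.version)
for name in ("lxml", "lxml.objectify"):
    ok, detail = check_import(name)
    print(name, "OK" if ok else "FAILED: " + detail)
```

If lxml imports but lxml.objectify does not, that would point at an incomplete lxml build for this interpreter rather than at pyshark itself.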

Related

(Raspberry Pi) How can a script import packages (like pandas and json) without pip-installing all of them?

System: Raspberry Pi 4 Model B, 32-bit, Linux, running Python.
This may be a dumb question. I was planning to read data from MongoDB into Excel and also from Excel into MongoDB. Overall the .py script is fine and working (the code is below).
I do know that if the code does "import pandas as pd", then pandas needs to be pip-installed from the Raspberry Pi command line.
My main question:
A Raspberry Pi has less memory than a typical laptop. Is there a way to use these packages without pip-installing all of them?
Besides, pip-installing pandas alone took about 15 minutes on the Raspberry Pi, compared with about 30 seconds on a laptop, and a factory might have more than a hundred Raspberry Pis recording things such as temperature and product data on the production line.
There should be an efficient way to use pandas, pymongo and the rest without manually pip-installing them on every Raspberry Pi.
The memory left:
joy@raspberrypi:/ $ free
3834332/total , 223876/used , 2844436/free
The working code.py script (MongoDB to Excel):
import pandas as pd
from pymongo import MongoClient
import pymongo
from json2excel import Json2Excel
import json
from bson.objectid import ObjectId
from bson import json_util
client = pymongo.MongoClient("mongodb://localhost:27017/")
# Database Name
db = client["(practice_10_14)-0002"]
# Collection Name
col = db["(practice_10_24)read_MongoDB_to_Excel"]
# Find All: It works like Select * query of SQL.
x = col.find()
list_01 = []
for data in x:
    list_01.append(data)
    print(data)
    print("= = = = = ")
    df = pd.DataFrame(data,index=[0])
    # select two columns
    for y in df:
        print(y)
    print("= = = = = ")
print(type(list_01))
print(list_01)
df = pd.DataFrame(list_01)
writer = pd.ExcelWriter('test10.24.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='welcome', index=False)
# writer.save() was removed in pandas 2.0; close() also writes the file
writer.close()

How to run a function in python only first time the code is running?

I have written a function that installs the modules required to run a script. My problem is that the function runs every time the script runs. I need it to run only the first time, so that once the modules are installed it is not executed again on every later run.
My code is:
import importlib
import subprocess
import pkg_resources
import os, time, json, datetime, sys
def import_and_install(package):
    try:
        importlib.import_module(package)
    except (ModuleNotFoundError, pkg_resources.DistributionNotFound) as e:
        print("{0} module is not installed.\n Don't worry. will take care\n".format(package))
        package = [package]
        subprocess.check_call([sys.executable, '-m', 'pip', 'install'] + package)
packages = ['pandas', 'numpy', 'threading', 'xlwings']
for package in packages:
    import_and_install(package)
import pandas as pd
import threading
import xlwings as xw
import numpy as np
I am not sure what you mean by running the function only the first time the script runs. Does that mean you want it to run only once per computer/VM?
By the looks of it, the function handles dependency and package management. If that is the case, I would recommend using a package manager such as Poetry (https://python-poetry.org/) instead.
If you still insist on doing everything on your own, the simplest solution I can think of is creating a file that stores a flag telling you whether the script has already run.
There might be better solutions, though.
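The flag-file idea could look roughly like this. It is a minimal sketch, not a tested installer: run_once and the flag name .deps_installed are made up for this example, and the lambda stands in for the real installation step. A temporary directory is used here only so the demo leaves nothing behind; a real script would keep the flag next to the script or in a per-user directory.

```python
import tempfile
from pathlib import Path


def run_once(flag_path, setup):
    """Call setup() only if flag_path does not exist yet; return True if it ran."""
    flag = Path(flag_path)
    if flag.exists():
        return False
    setup()
    # Create the flag only after setup() finished, so a failed install is retried.
    flag.touch()
    return True


calls = []
with tempfile.TemporaryDirectory() as tmp:
    flag_file = Path(tmp) / ".deps_installed"  # hypothetical flag location
    first = run_once(flag_file, lambda: calls.append("installed"))
    second = run_once(flag_file, lambda: calls.append("installed"))

print(first, second, calls)  # True False ['installed']
```

Note that the flag only records that the setup ran once. It does not notice if a package is uninstalled later, which is one reason checking the installed packages directly can be more robust.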
We can compare the list of packages required with the list of packages installed.
To get the list of packages installed we can use:
import pkg_resources
pkg_resources.working_set
This gives an iterable from which we can read the name (key) and version (version) of each installed package.
import pkg_resources
modules_installed = {pkg.key:pkg.version for pkg in pkg_resources.working_set}
# this will return a dict like this one, for example:
# {'setuptools': '65.3.0',
# 'pip': '22.2.2',
# 'xlwings': '0.28.5'
# ...}
Now we can compare them with a function that checks whether a package is installed and, when the requirement contains '==', also checks the version:
# package: for example 'pandas==1.5.0'
# modules_installed: {pkg.key:pkg.version for pkg in pkg_resources.working_set}
def is_pkg_installed(package, modules_installed) -> bool:
    pkg_name = package.split('==')[0] if '==' in package else package
    pkg_ver = package.split('==')[1] if '==' in package else None
    if pkg_name not in modules_installed:
        return False
    elif pkg_ver:
        installed_version = modules_installed[pkg_name]
        if pkg_ver != installed_version:
            return False
    return True
Using a list comprehension we can get the list of packages_to_install:
packages_to_install = [package for package in packages_list
                       if not is_pkg_installed(package, modules_installed)]
# values example: packages_to_install = ['pandas==1.5.0', 'numpy']
To install packages_to_install, we can use the following function, which installs every package in the list:
import subprocess
import sys
def pip_install(packages: list) -> None:
    if not packages:
        return
    # this will do a call like this one:
    # ../python.exe -m pip install pandas==1.5.0 numpy
    subprocess.check_call([sys.executable, '-m', 'pip', 'install'] + packages)
Finally, the complete example, where a specific version of pandas is selected with pandas==1.5.0:
import subprocess
import pkg_resources
import sys
def pip_install(packages: list) -> None:
    if not packages:
        return
    subprocess.check_call([sys.executable, '-m', 'pip', 'install'] + packages)
def is_pkg_installed(package, modules_installed) -> bool:
    pkg_name = package.split('==')[0] if '==' in package else package
    pkg_ver = package.split('==')[1] if '==' in package else None
    if pkg_name not in modules_installed:
        return False
    elif pkg_ver:
        installed_version = modules_installed[pkg_name]
        if pkg_ver != installed_version:
            return False
    return True
def install_packages(packages_list: list) -> None:
    modules_installed = {pkg.key:pkg.version for pkg in pkg_resources.working_set}
    packages_to_install = [package for package in packages_list
                           if not is_pkg_installed(package, modules_installed)]
    if packages_to_install:
        print(f"{packages_to_install} modules are not installed.\nDon't worry. will take care\n")
        pip_install(packages_to_install)
packages = ['pandas==1.5.0', 'numpy', 'threading', 'xlwings']
install_packages(packages)
import pandas as pd
import threading
import xlwings as xw
import numpy as np
install_packages() builds the list of packages that are not installed or have a different version and, if that list is not empty, runs pip_install().
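As a side note, pkg_resources is deprecated in recent setuptools releases. Assuming Python 3.8 or newer, the same check can be sketched with the standard library's importlib.metadata instead. The helper names is_installed and missing are made up for this example, and only exact '==' pins are handled, as in the answer above.

```python
from importlib import metadata


def is_installed(requirement):
    """Return True if 'name' or 'name==version' is satisfied by an installed distribution."""
    name, _, wanted = requirement.partition("==")
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    # With no pinned version, being installed at all is enough.
    return not wanted or installed == wanted


def missing(requirements):
    """Return the requirements that still need to be installed."""
    return [req for req in requirements if not is_installed(req)]


# The package name below is a deliberately non-existent placeholder.
print(missing(["surely-not-an-installed-package-xyz==1.0"]))
```

The list returned by missing() can be handed to a function like pip_install() unchanged. Standard-library modules such as threading are not pip-installable distributions, so they are best left out of the requirements list.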

AttributeError: module 'whois' has no attribute 'whois'

I am running my ML code and getting this error-
Enter website name=> www.google.com
Traceback (most recent call last):
File "Dphishing.py", line 12, in <module>
p2.category2(website)
File "C:\xampp\htdocs\Detect_Phishing_Website\p2.py", line 8, in category2
page = whois.whois(website)
AttributeError: module 'whois' has no attribute 'whois'
My code is:
# -*- coding: utf-8 -*-
import p1
import p2
import p3
import p4
import pandas as pd
#import numpy as np
website = str(input("Enter website name=> "))
p1.category1(website)
p2.category2(website)
p3.category3(website)
p4.category4(website)
read = pd.read_csv(r'C:\Users\Anushree\Desktop\college\4th year project\Detect_Phishing_Website\phishing5.txt',header = None,sep = ',')
read = read.iloc[:,:-1].values
dataset = pd.read_csv(r'C:\Users\Anushree\Desktop\college\4th year project\Detect_Phishing_Website\Training Dataset1.csv')
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2,random_state = 1001)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10,criterion = "mse",random_state = 2)
regressor.fit(X_train,y_train)
y_pred = regressor.predict(X_test)
from sklearn.model_selection import cross_val_score
accuracy = cross_val_score(estimator = regressor,X=X_train,y=y_train,cv = 5)
accuracy.mean()
accuracy.std()
Detect_phishing_website = regressor.predict(read)
if Detect_phishing_website == 1:
    print("legitimate website")
elif Detect_phishing_website == 0:
    print ('suspicious website')
else:
    print('phishing website')
The code of file p2.py is:
import re
import whois
def category2(website):
    file_obj = open(r'C:\Users\Anushree\Desktop\college\4th year project\Detect_Phishing_Website\phishing5.txt','a')
    #8 Domain Registration Length
    page = whois.whois(website)
    if type(page.expiration_date) == list:
        domain_reg_len = (page.expiration_date[0] - page.creation_date[0]).days
    else:
        domain_reg_len = (page.expiration_date - page.creation_date).days
    #print domain_reg_len
    if domain_reg_len <= 365:
        file_obj.write('-1,')
    else:
        file_obj.write('1,')
    #9 Using Non-Standard Port
    match_port = re.search(':[//]+[a-z]+.[a-z0-9A-Z]+.[a-zA-Z]+:([0-9#]*)',website)
    if match_port:
        print (match_port.group())
        if match_port.group(1) == '#':#represent multiple ports are active on url
            file_obj.write('-1,')
        else:
            file_obj.write('1,')
    else:
        file_obj.write('1,')
    file_obj.close()
I have already tried uninstalling whois and then reinstalling python-whois using the command pip install python-whois. But that hasn't helped with the error.
How can I understand what is going wrong, and how I can correct it?
Reason for your error:
You have not installed the whois command on your system.
Ubuntu: Use sudo apt install whois
Windows: Download and install from here
First uninstall any whois module with pip uninstall whois and pip uninstall python-whois
Solution 1: Use python-whois
Install python-whois with pip install python-whois
Then make sure you already installed the whois command on your machine.
Then your code should work.
Solution 2: Use whois
Install the whois command on your machine. If you are on Ubuntu, sudo apt install whois will do.
Install the whois module with pip install whois.
Then use whois.query() instead of whois.whois() in your code.
Source
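Since both packages mentioned above are imported as whois, it can help to check which one Python actually picks up, and whether the whois command is on the PATH. The snippet below is a diagnostic sketch using only the standard library; describe_module is a made-up helper name. A local file called whois.py in the project folder would also shadow the installed package, and the printed file path would reveal that.

```python
import importlib
import shutil


def describe_module(name):
    """Return where module `name` was loaded from and which entry points it has, or None."""
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "file": getattr(mod, "__file__", None),
        "has_whois": hasattr(mod, "whois"),  # used as whois.whois(...) in the question
        "has_query": hasattr(mod, "query"),  # used as whois.query(...) in Solution 2
    }


print("whois module:", describe_module("whois"))
# shutil.which returns None when the command is not found on the PATH.
print("whois command:", shutil.which("whois"))
```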

importing module (nltk) causes multiprocessing to hang

I tracked a python multiprocessing headache down to the import of a module (nltk). Reproducible (hopefully) code is pasted below. This doesn't make any sense to me, does anybody have any ideas?
from multiprocessing import Pool
import time, requests
#from nltk.corpus import stopwords # uncomment this and it hangs
def gethtml(key, url):
    r = requests.get(url)
    return r.text
def getnothing(key, url):
    return "nothing"
if __name__ == '__main__':
    pool = Pool(processes=4)
    result = list()
    nruns = 4
    url = 'http://davidchao.typepad.com/webconferencingexpert/2013/08/gartners-magic-quadrant-for-cloud-infrastructure-as-a-service.html'
    for i in range(0,nruns):
        # print gethtml(i,url)
        result.append(pool.apply_async(gethtml, [i,url]))
        # result.append(pool.apply_async(getnothing, [i,url]))
    pool.close()
    # monitor jobs until they complete
    running = nruns
    while running > 0:
        time.sleep(1)
        running = 0
        for run in result:
            if not run.ready(): running += 1
        print "processes still running:",running
    # print results
    for i,run in enumerate(result):
        print i,run.get()[0:40]
Note that the 'getnothing' function works. It's a combination of the nltk module import and the requests call. Sigh
> python --version
Python 2.7.6
> python -c 'import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)'
('7fffffffffffffff', True)
> pip freeze | grep requests
requests==2.2.1
> pip freeze | grep nltk
nltk==2.0.4
I would redirect others with similar problems to solutions which do not use the multiprocessing module:
1) Apache Spark for scalability/flexibility. However, this doesn't seem to be a solution for Python multiprocessing. Looks like pyspark is also limited by the Global Interpreter Lock?
2) 'gevent' or 'twisted' for general python asynchronous processing
http://sdiehl.github.io/gevent-tutorial/
3) grequests for asynchronous requests
Asynchronous Requests with Python requests

rpy2: check if package is installed

Using rpy2, I want to check if a given package is installed. If it is, I import it. If not, I install it first.
How do I check if it's installed?
from rpy2 import *
if not *my package is installed*:
    import rpy2.interactive as r
    r.importr("utils")
    package_name = "my_package"
    r.packages.utils.install_packages(package_name)
myPackage = importr("my_package")
Here is a function that does it on the Python side. Note that contriburl should be set to a CRAN mirror, and that the case where installing the library fails is not handled.
from rpy2.rinterface import RRuntimeError
from rpy2.robjects.packages import importr
utils = importr('utils')
def importr_tryhard(packname, contriburl):
    try:
        rpack = importr(packname)
    except RRuntimeError:
        # Import failed: install the package, then try importing again
        utils.install_packages(packname, contriburl = contriburl)
        rpack = importr(packname)
    return rpack
You can use the following function, which I got from @SaschaEpskamp's answer to another SO post:
pkgTest <- function(x)
{
    if (!require(x,character.only = TRUE))
    {
        install.packages(x,dep=TRUE)
        if(!require(x,character.only = TRUE)) stop("Package not found")
    }
}
And use this instead to load your packages:
r.source("file_with_pkgTest.r")
r.pkgTest("utils")
In general, I would recommend not writing much R code inside Python. Just create a few high-level R functions which do what you need, and use those as a minimal interface between R and Python.
import subprocess
import sys
your_package = 'nltk'
# pip freeze prints one "name==version" line per installed Python package
output = subprocess.check_output([sys.executable, '-m', 'pip', 'freeze']).decode()
package_names = [line.split('==')[0].lower() for line in output.splitlines()]
if your_package.lower() in package_names:
    print('true')
