Listing all USB mass storage disks using Python - python

I have a few USB disks inserted in my system. I would like to list all of them, like:
/dev/sdb
/dev/sdc
... and so on.
Please remember that I don't want to list the partitions on them, like /dev/sdb1. I am looking for a solution under Linux. I tried cat /proc/partitions.
major minor #blocks name
8 0 488386584 sda
8 1 52428800 sda1
8 2 52428711 sda2
8 3 1 sda3
8 5 52428800 sda5
8 6 15728516 sda6
8 7 157683712 sda7
8 8 157682688 sda8
11 0 1074400 sr0
11 1 47602 sr1
8 32 3778852 sdc
8 33 1 sdc1
8 37 3773440 sdc5
But it lists all the disks, and I am unable to figure out which ones are USB storage disks. I am looking for a solution that does not require installing an additional package.

You can convert Klaus D.'s suggestion into Python code like this:
#!/usr/bin/env python
import os

basedir = '/dev/disk/by-path/'
print 'All USB disks'
for d in os.listdir(basedir):
    # Only show USB disks, not partitions
    if 'usb' in d and 'part' not in d:
        path = os.path.join(basedir, d)
        link = os.readlink(path)
        print '/dev/' + os.path.basename(link)
path contains info in this format:
/dev/disk/by-path/pci-0000:00:1d.7-usb-0:5:1.0-scsi-0:0:0:0
which is a symbolic link, so we can get the pseudo-scsi device name using os.readlink().
But that returns info in this format:
../../sdc
so we use os.path.basename() to clean it up.
Instead of using
'/dev/' + os.path.basename(link)
you can produce a string in the '/dev/sdc' format by using
os.path.normpath(os.path.join(os.path.dirname(path), link))
but I think you'll agree that the former technique is simpler. :)

List the right path in /dev:
ls -l /dev/disk/by-path/*-usb-* | fgrep -v part
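If you prefer to do the same thing from Python 3, here is a minimal sketch of that idea (glob the by-path entries, skip the partition links, resolve each symlink); it assumes the /dev/disk/by-path layout shown above and is not taken from the original answers:
#!/usr/bin/env python3
import glob
import os

# Mirror the shell one-liner above: USB entries in /dev/disk/by-path,
# excluding the per-partition links.
for path in glob.glob('/dev/disk/by-path/*-usb-*'):
    if 'part' in os.path.basename(path):
        continue  # skip entries such as ...-part1
    target = os.readlink(path)  # e.g. '../../sdc'
    device = os.path.normpath(os.path.join(os.path.dirname(path), target))
    print(device)  # e.g. '/dev/sdc'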

Related

iterate over a list of chemical names using ChemSpiPy to get canonical smiles

I have a list of chemical names called Phenolics:
Phenolics
0 Dihydroquercetin 7,30-dimethyl ether
1 Artelin
2 Esculin 7- methylether (methylesculin)
3 Esculin
4 Scopoletin (7- hydroxy-6- methoxycoumarin)
5 Axillarin
6 Esculetin
7 Isoscopoletin
8 6-Beta-D-glucosyl-7- methoxycoumarin
9 5,40Dihydroxy- 3,6,7,30- tetramethoxyflavone
10 Apigenin
11 Luteolin-7-O- glucoside
12 Magnoloside
13 Penduletin
14 Quercetagetin
15 Quercetagetin-3,6,7- trimethyl ether
16 Quercetin
17 Quercetin 7,30- dimethyl ether (Rhamnazine)
18 Scoparone
19 Skimmin
20 Umbelliferone
21 Apigenin 40-methyl ether
and I would like to run a search with ChemSpiPy to obtain the canonical SMILES for these chemical names.
I tried
for result in cs.search(Phenolics):
    print(result.smiles)
and it doesn't work; I get no results.
I cannot test it because I have no API key, but this should search for each name and give you a result for it. How to then get a canonical SMILES from that result is another question I can't answer:
from chemspipy import ChemSpider

cs = ChemSpider('<YOUR-API-KEY>')
Phenolics = ['Artelin', 'Esculin', 'Axillarin']
for name in Phenolics:
    result = cs.search(name)
    print(name, result)
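If the search does return compounds, my guess (also untested, since I don't have an API key) is that you can pull the SMILES from the first hit; the .smiles attribute is what I'd expect on a ChemSpiPy Compound, so please check it against the documentation of your installed version:
from chemspipy import ChemSpider

cs = ChemSpider('<YOUR-API-KEY>')
Phenolics = ['Artelin', 'Esculin', 'Axillarin']

for name in Phenolics:
    results = cs.search(name)  # list-like collection of Compound objects
    if results:
        # .smiles is assumed here; verify against the ChemSpiPy docs.
        print(name, results[0].smiles)
    else:
        print(name, 'no match found')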

Pass R object to Python after running R Script

I have a Python script Test.py that runs an R script Test.R, shown below:
import subprocess
import pandas
import pyper

# Run simple R code and print output
proc = subprocess.Popen(['Path/To/Rscript.exe', 'Path/To/Test.R'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()
print stdout
print stderr
The R script is below:
library("methods")
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(2,4,6,8,10,12,14,16,18,20)
data <- data.frame(x,y)
How can I pass the R data frame (or any R object, for that matter) to Python? I've had great difficulty getting Rpy2 to work on Windows, and I've seen this link suggesting PypeR, but it uses a lot of inline R code within the Python code, and I'd really like to keep the code in separate files (or is that practice considered acceptable?). Thanks.
I've experienced issues getting Rpy2 to work on a Mac too, and I use the same workaround of calling R directly from Python via subprocess; I also agree that keeping the files separate helps manage complexity.
First, export your data as a .csv from R (again, in a script called through subprocess):
write.table(data, file = 'test.csv')
After that, you can import it as a pandas data frame:
import pandas as pd
dat = pd.read_csv('test.csv')
dat
row x y
1 1 2
2 2 4
3 3 6
4 4 8
5 5 10
6 6 12
7 7 14
8 8 16
9 9 18
10 10 20
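A minimal end-to-end sketch of this approach (the paths and file names are placeholders, not from the original scripts): run the R script, which is assumed to end with write.table(data, file = 'test.csv'), and then load the file in pandas. Note that write.table uses whitespace as its default separator, so the Python side reads it with delim_whitespace=True rather than pandas' comma default.
import subprocess
import pandas as pd

# Run the R script; it is assumed to write its data frame to 'test.csv'.
proc = subprocess.Popen(['Path/To/Rscript.exe', 'Path/To/Test.R'],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = proc.communicate()

# write.table() is whitespace-separated by default, hence delim_whitespace=True.
dat = pd.read_csv('test.csv', delim_whitespace=True)
print(dat)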

python getting upload/download speeds

I want to monitor the upload and download speeds on my computer. A program called conky already does it with the following in its conky conf:
Connection quality: $alignr ${wireless_link_qual_perc wlan0}%
${downspeedgraph wlan0}
DLS:${downspeed wlan0} kb/s $alignr total: ${totaldown wlan0}
and it shows me the speeds in almost real time while I browse. I want to be able to access the same information using Python.
You can calculate the speed yourself based on the rx_bytes and tx_bytes counters for the device, polling those values over an interval.
Here is a very simplistic solution I hacked together using Python 3:
#!/usr/bin/python3
import time

def get_bytes(t, iface='wlan0'):
    with open('/sys/class/net/' + iface + '/statistics/' + t + '_bytes', 'r') as f:
        data = f.read()
    return int(data)

if __name__ == '__main__':
    (tx_prev, rx_prev) = (0, 0)
    while True:
        tx = get_bytes('tx')
        rx = get_bytes('rx')
        if tx_prev > 0:
            tx_speed = tx - tx_prev
            print('TX: ', tx_speed, 'bytes/s')
        if rx_prev > 0:
            rx_speed = rx - rx_prev
            print('RX: ', rx_speed, 'bytes/s')
        time.sleep(1)
        tx_prev = tx
        rx_prev = rx
I would look into the psutil module for Python.
Here is a short snippet which prints out the number of bytes sent since you booted your machine:
import psutil
iostat = psutil.net_io_counters(pernic=False)
print iostat[0] #upload only
You could easily expand this to grab the value at a constant interval and diff the two values to determine the number of bytes sent and/or received over that period of time.
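A quick sketch of that expansion, assuming psutil is available (the interval length is arbitrary; bytes_sent and bytes_recv are the counter fields psutil exposes):
import time
import psutil

interval = 1.0
before = psutil.net_io_counters()
time.sleep(interval)
after = psutil.net_io_counters()

# Difference of the cumulative counters over the interval gives the rate.
up_speed = (after.bytes_sent - before.bytes_sent) / interval
down_speed = (after.bytes_recv - before.bytes_recv) / interval
print('upload: %.0f bytes/s, download: %.0f bytes/s' % (up_speed, down_speed))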
In order to get interface-specific statistics, the methods already proposed work just fine.
I'll try instead to suggest a solution for your second request:
It would also be very helpful to know which program was using that
bandwidth, but so far I haven't seen anything that can do that.
As already suggested, nethogs prints process-specific statistics. To my knowledge, there's no easy way to access these values under /proc, so I will instead explain how nethogs achieves this.
Considering one process with pid PID, nethogs first retrieves a list of all the sockets opened by the process by listing the contents of /proc/PID/fd:
➜ ~ [1] at 23:59:31 [Sat 15] $ ls -la /proc/21841/fd
total 0
dr-x------ 2 marco marco 0 Nov 15 23:41 .
dr-xr-xr-x 8 marco marco 0 Nov 15 23:41 ..
lrwx------ 1 marco marco 64 Nov 15 23:42 0 -> /dev/pts/15
l-wx------ 1 marco marco 64 Nov 15 23:42 1 -> /dev/null
lrwx------ 1 marco marco 64 Nov 15 23:41 2 -> /dev/pts/15
lrwx------ 1 marco marco 64 Nov 15 23:42 4 -> socket:[177472]
Here we have just one socket, and 177472 is its inode number. We will find all kinds of sockets here: TCPv4, TCPv6, UDP, netlink. In this case I will consider only TCPv4.
Once all the inode numbers are collected, each socket is assigned a unique identifier, namely (IP_SRC, PORT_SRC, IP_DEST, PORT_DEST). Of course the pairing with the PID is stored as well. The tuple (IP_SRC, PORT_SRC, IP_DEST, PORT_DEST) can be retrieved by reading /proc/net/tcp (for TCPv4). In this case:
➜ ~ [1] at 0:06:05 [Sun 16] $ cat /proc/net/tcp | grep 177472
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode
38: 1D00A8C0:1F90 0400A8C0:A093 01 00000000:00000000 00:00000000 00000000 1000 0 177472 1 f6fae080 21 4 0 10 5
Addresses are expressed as IP:PORT, with the IP represented as a 4-byte little-endian number. You can then build a key->value structure, where the key is (IP_SRC, PORT_SRC, IP_DEST, PORT_DEST) and the value is the PID.
At this point, nethogs captures all the network traffic with libpcap. When it detects a TCP packet, it tries to match the tuple (IP_SRC_PACKET, PORT_SRC_PACKET, IP_DEST_PACKET, PORT_DEST_PACKET) against all the connections in the table. Of course it must also try the tuple with SRC and DEST swapped, since a packet could be incoming (DL) or outgoing (UL). If it matches a connection, it retrieves the PID of the process the connection belongs to and adds the size of the TCP payload to that process's TX or RX counter. With the byte counts updated at every captured packet, the transfer speed for each process can be calculated easily.
This, in theory, can be implemented in Python with pypcap, even though it needs a bit of work. I have tried to implement something, but it's painfully slow and it requires much more work to be usable. I was monitoring just one PID, with one connection and without updating the connections table, and beyond 3 MB/s my script could not keep up with the network traffic.
As you can see, it's not exactly trivial. Parsing the output of a tool that is already available might lead to a better solution and might save you a lot of work.
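To illustrate just the /proc bookkeeping described above (collecting socket inodes per process and decoding /proc/net/tcp), here is a minimal sketch; it covers only TCPv4, does no packet capture, and the PID is a placeholder taken from the example listing:
#!/usr/bin/env python3
import os
import socket
import struct

def socket_inodes(pid):
    # Socket fds in /proc/<pid>/fd are symlinks of the form 'socket:[inode]'.
    inodes = []
    fd_dir = '/proc/%d/fd' % pid
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue
        if target.startswith('socket:['):
            inodes.append(target[len('socket:['):-1])
    return inodes

def decode_addr(hex_addr):
    # /proc/net/tcp stores ADDR:PORT with the IP as a little-endian hex number.
    ip_hex, port_hex = hex_addr.split(':')
    ip = socket.inet_ntoa(struct.pack('<I', int(ip_hex, 16)))
    return ip, int(port_hex, 16)

def tcp4_connections():
    # Map inode -> ((local_ip, local_port), (remote_ip, remote_port)).
    conns = {}
    with open('/proc/net/tcp') as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            conns[fields[9]] = (decode_addr(fields[1]), decode_addr(fields[2]))
    return conns

if __name__ == '__main__':
    pid = 21841  # placeholder PID from the listing above
    table = tcp4_connections()
    for inode in socket_inodes(pid):
        if inode in table:
            print(pid, table[inode])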
You could do something dodgy like call conky -i 1 and parse the output:
import subprocess

conky = subprocess.check_output("conky -i 1", shell=True)
lines = conky.splitlines()
print lines[11].split()[1::3]
resulting in:
['1234B', '5678B']
my config looks like:
${scroll 16 $nodename - $sysname $kernel on $machine | }
Uptime: $uptime
Frequency (in MHz): $freq
Frequency (in GHz): $freq_g
RAM Usage: $mem/$memmax - $memperc% ${membar 4}
Swap Usage: $swap/$swapmax - $swapperc% ${swapbar 4}
CPU Usage: $cpu% ${cpubar 4}
Processes: $processes Running: $running_processes
File systems:
/ ${fs_used /}/${fs_size /} ${fs_bar 6 /}
Networking:
Up: ${upspeed eth0} - Down: ${downspeed eth0}
Name PID CPU% MEM%
${top name 1} ${top pid 1} ${top cpu 1} ${top mem 1}
${top name 2} ${top pid 2} ${top cpu 2} ${top mem 2}
${top name 3} ${top pid 3} ${top cpu 3} ${top mem 3}
${top name 4} ${top pid 4} ${top cpu 4} ${top mem 4}

Python, os.walk(), pass information back up?

I'm currently attempting to write a simple Python program that loops through a bunch of subdirectories, finding Java files and printing some information regarding the number of times certain keywords are used. I've managed to get this working for the most part. The problem I'm having is printing overall information for the higher directories; for example, my current output is as follows:
testcases/part1/testcase2/root_dir:
0 bytes 0 public 0 private 0 try 0 catch
testcases/part1/testcase2/root_dir/folder1:
12586 bytes 19 public 7 private 8 try 22 catch
testcases/part1/testcase2/root_dir/folder1/folder5:
7609 bytes 9 public 2 private 7 try 11 catch
testcases/part1/testcase2/root_dir/folder4:
0 bytes 0 public 0 private 0 try 0 catch
testcases/part1/testcase2/root_dir/folder4/folder2:
7211 bytes 9 public 2 private 4 try 9 catch
testcases/part1/testcase2/root_dir/folder4/folder3:
0 bytes 0 public 0 private 0 try 0 catch
and I want the output to be:
testcases/part1/testcase2/root_dir :
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir/folder1 :
20195 bytes 28 public 9 private 15 try 33 catch
testcases/part1/testcase2/root_dir/folder1/folder5 :
7609 bytes 9 public 2 private 7 try 11 catch
testcases/part1/testcase2/root_dir/folder4 :
7211 bytes 9 public 2 private 4 try 9 catch
testcases/part1/testcase2/root_dir/folder4/folder2 :
7211 bytes 9 public 2 private 4 try 9 catch
testcases/part1/testcase2/root_dir/folder4/folder3 :
0 bytes 0 public 0 private 0 try 0 catch
As you can see, the lower subdirectories should contribute their counts directly to the higher directories. This is the problem I'm running into: how to implement this efficiently. I have considered storing each printed line as a string in a list and then printing everything at the very end, but I don't think that would work for multiple subdirectories such as in the example provided. This is my code so far:
def lsJava(path):
    print()
    for dirname, dirnames, filenames in os.walk(path):
        size = 0
        public = 0
        private = 0
        tryCount = 0
        catch = 0
        # Get stats for the current directory.
        tempStats = os.stat(dirname)
        # Print current directory information.
        print(dirname + ":")
        # Process the files of the directory.
        for filename in filenames:
            if filename.endswith(".java"):
                fileTempStats = os.stat(dirname + "/" + filename)
                size += fileTempStats[6]
                tempFile = open(dirname + "/" + filename)
                tempString = tempFile.read()
                tempString = removeComments(tempString)
                public += tempString.count("public", 0, len(tempString))
                private += tempString.count("private", 0, len(tempString))
                tryCount += tempString.count("try", 0, len(tempString))
                catch += tempString.count("catch", 0, len(tempString))
        print("   ", size, " bytes ", public, " public ",
              private, " private ", tryCount, " try ", catch,
              " catch")
The removeComments function simply removes all comments from the java files using a regular expression pattern. Thank you for any help in advance.
EDIT:
The following code was added at the beginning of the for loop:
current_dirpath = dirname
if dirname != current_dirpath:
    size = 0
    public = 0
    private = 0
    tryCount = 0
    catch = 0
The output is now as follows:
testcases/part1/testcase2/root_dir/folder1/folder5:
7609 bytes 9 public 2 private 7 try 11 catch
testcases/part1/testcase2/root_dir/folder1:
20195 bytes 28 public 9 private 15 try 33 catch
testcases/part1/testcase2/root_dir/folder4/folder2:
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir/folder4/folder3:
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir/folder4:
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir:
27406 bytes 37 public 11 private 19 try 42 catch
os.walk() takes an optional topdown argument. If you use os.walk(path, topdown=False) it will instead traverse directories bottom-up.
When you first start the loop, save off the first element of the tuple (dirpath) as a variable like current_dirpath. As you continue through the loop, you can keep a running total of the file sizes in that directory. Then just add a check like if dirpath != current_dirpath, at which point you know you've gone up a directory level and can reset the totals.
I don't believe you can do this with a single counter, even bottom-up: if a directory A has subdirectories B and C, then when you're done with B you need to zero the counter before you descend into C; but when it's time to do A, you need to add in the sizes of B and C (and B's count is long gone).
Instead of maintaining a single counter, build up a dictionary mapping each directory (key) to the associated counts (in a tuple or whatever). As you iterate (bottom-up), whenever you are ready to print output for a directory, you can look up all its subdirectories (from the dirname argument returned by os.walk()) and add their counts together.
Since you don't discard the data, this approach can be extended to maintain separate deep and shallow counts, so that at the end of the scan you can sort your directories by shallow count, report the 10 largest counts, etc.
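A small sketch of that dictionary-based approach, reduced to byte counts only to keep it short (the function name and the 'testcases' root path are placeholders, not from the question):
import os

def walk_totals(path):
    # Walk bottom-up and map each directory to its cumulative .java byte count.
    totals = {}
    for dirname, dirnames, filenames in os.walk(path, topdown=False):
        size = sum(os.path.getsize(os.path.join(dirname, f))
                   for f in filenames if f.endswith('.java'))
        # Children were already visited (bottom-up), so their totals are known.
        size += sum(totals.get(os.path.join(dirname, d), 0) for d in dirnames)
        totals[dirname] = size
    return totals

for directory, size in sorted(walk_totals('testcases').items()):
    print(directory + ':')
    print('   ', size, 'bytes')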

heapy reports memory usage << top

NB: This is my first foray into memory profiling with Python, so perhaps I'm asking the wrong question here. Advice on improving the question is appreciated.
I'm working on some code where I need to store a few million small strings in a set. According to top, this is using ~3x the amount of memory reported by heapy. I'm not clear what all this extra memory is used for, or how I can go about figuring out whether I can reduce the footprint and, if so, how.
memtest.py:
from guppy import hpy
import gc
hp = hpy()
# do setup here - open files & init the class that holds the data
print 'gc', gc.collect()
hp.setrelheap()
raw_input('relheap set - enter to continue') # top shows 14MB resident for python
# load data from files into the class
print 'gc', gc.collect()
h = hp.heap()
print h
raw_input('enter to quit') # top shows 743MB resident for python
The output is:
$ python memtest.py
gc 5
relheap set - enter to continue
gc 2
Partition of a set of 3197065 objects. Total size = 263570944 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 3197061 100 263570168 100 263570168 100 str
1 1 0 448 0 263570616 100 types.FrameType
2 1 0 280 0 263570896 100 dict (no owner)
3 1 0 24 0 263570920 100 float
4 1 0 24 0 263570944 100 int
So in summary, heapy shows 264MB while top shows 743MB. What's using the extra 500MB?
Update:
I'm running 64 bit python on Ubuntu 12.04 in VirtualBox in Windows 7.
I installed guppy as per the answer here:
sudo pip install https://guppy-pe.svn.sourceforge.net/svnroot/guppy-pe/trunk/guppy
