Using a list to search in a text file - Python

I'm new to Python, but I'm convinced that you should learn by doing. So here goes:
I'm trying to create a small CLI application that takes two text files as input.
It's supposed to build a list of common SSIDs from the file SSID.txt, and then go through Kismet.nettxt to see how many of the access points have common names.
Am I on the right track here at all? This is what I have so far, which reads SSID.txt into a variable called "ssids":
f = open('SSID.txt', "r")
s = open('Kismet.nettxt', "r")
for line in f:
ssids = line.strip()
s.close()
f.close()
Any tips on how I should proceed from here and what to look for?
This is how the files are formatted:
SSID.txt:
linksys
<no ssid>
default
NETGEAR
Wireless
WLAN
Belkin54g
MSHOME
home
hpsetup
smc
tsunami
ACTIONTEC
orange
USR8054
101
tmobile
<hidden ssid>
SpeedStream
linksys-g
3Com
This is how Kismet.nettxt is formatted:
Network 3: BSSID REMOVED
Manuf : Siemens
First : Sun Dec 29 20:59:46 2013
Last : Sun Dec 29 20:59:46 2013
Type : infrastructure
BSSID : REMOVED
SSID 1
Type : Beacon
SSID : "Internet"
First : Sun Dec 29 20:59:46 2013
Last : Sun Dec 29 20:59:46 2013
Max Rate : 54.0
Beacon : 10
Packets : 2
Encryption : WPA+PSK
Encryption : WPA+TKIP
Channel : 5
Frequency : 2432 - 2 packets, 100.00%
Max Seen : 1000
LLC : 2
Data : 0
Crypt : 0
Fragments : 0
Retries : 0
Total : 2
Datasize : 0
Last BSSTS :
Seen By : wlan0 (wlan0mon)

Here are a couple of tips on how I would go about it.
Read SSID.txt and create a dict of names, so you have a fast way to look up each name and store a count. This also removes any duplicates in the SSID.txt file.
Read Kismet.nettxt; if a line starts with "SSID :", take the name and look it up in the dict; if found, add to the count.
At this point you will have an ssids dictionary with the name and count.
The code would look something like this:
f = open('SSID.txt', "r")
s = open('Kismet.nettxt', "r")
ssids = {} # Create dictionary
for line in f:
# Add each to the dictionary,
# if there are duplicates this effectively removes them
ssids[line.strip()] = 0
for line in s:
# Check the lines in the kismet file that start with the SSID we are after
if line.strip().startswith('SSID :'):
# Break the entry at : and take the second part which is the name
kismet = line.split(':')[1].strip()
# Remove the " marks from front and back and lookup in the ssids
# add to the count if you find it.
if kismet[1:-1] in ssids:
ssids[kismet[1:-1]] += 1
s.close()
f.close()
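To inspect the result, you could then print just the names that actually matched, for example:

# Print only the SSIDs that were actually seen in the Kismet file
for name, count in ssids.items():
    if count > 0:
        print(name, count)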

This code should do everything you asked for in the OP:
try:
    with open('SSID.txt', 'r') as s:
        ssid_dict = {}
        for each_line in s:
            ssid_dict[each_line.strip()] = 0  # key: SSID, value: count
except FileNotFoundError:
    pass
try:
    with open('Kismet.nettxt', 'r') as f:
        try:
            for each_line in f:
                each_line = each_line.strip()
                if each_line.startswith("SSID") and ':' in each_line:  # checks for a line that starts with 'SSID' and contains a ':'
                    val = each_line.split(':')[1].replace('"', '').strip()  # splits the line to get the SSID, removes the quotes
                    if val in ssid_dict:
                        ssid_dict[val] += 1  # adds one to the count in the dictionary
                    else:
                        pass  # I don't know what you want to do here
        except KeyError as err:
            print("Key error " + str(err))
except FileNotFoundError:
    pass
for key in ssid_dict:
    print(str(key) + " " + str(ssid_dict[key]))
It outputs:
Wireless 0
101 0
Belkin54g 0
tsunami 0
tmobile 0
<hidden ssid> 0
linksys-g 0
smc 0
hpsetup 0
ACTIONTEC 0
SpeedStream 0
Internet 1
3Com 0
home 0
USR8054 0
<no ssid> 0
WLAN 0
NETGEAR 0
default 0
MSHOME 0
linksys 0
orange 0
I added 'Internet' to the list of SSIDs for testing purposes.
EDIT: I have updated the section that adds to the count to deal with keys that aren't in the dictionary. I don't know what you want to do with the ones that aren't, so for now I left a pass in there.
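If the goal is simply to skip SSIDs that are not on the list, a set plus a collections.Counter is a compact alternative. A minimal sketch under that assumption, using the file names from the question:

from collections import Counter

with open('SSID.txt') as s:
    wanted = {line.strip() for line in s}  # the common-SSID list as a set

counts = Counter()
with open('Kismet.nettxt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('SSID') and ':' in line:
            name = line.split(':')[1].replace('"', '').strip()
            if name in wanted:
                counts[name] += 1  # names not on the list are simply skipped

for name in sorted(wanted):
    print(name, counts[name])  # Counter returns 0 for unseen names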

Related

How to parse a text file into a dictionary in python with key on one line followed by two lines of values

I have a file with lines in this format:
CALSPHERE 1
1 00900U 64063C 20161.15561498 .00000210 00000-0 21550-3 0 9996
2 00900 90.1544 28.2623 0029666 80.8701 43.4270 13.73380512769319
CALSPHERE 2
1 00902U 64063E 20161.16836122 .00000025 00000-0 23933-4 0 9990
2 00902 90.1649 30.9038 0019837 126.9344 3.6737 13.52683749559421
..etc.
I would like to parse this into a dictionary of the format:
{CALSPHERE 1:(1 00900U 64063C 20161.15561498 .00000210 00000-0 21550-3 0 9996, 2 00900 90.1544 28.2623 0029666 80.8701 43.4270 13.73380512769319),
CALSPHERE 2:(1 00902U 64063E 20161.16836122 .00000025 00000-0 23933-4 0 9990, 2 00902 90.1649 30.9038 0019837 126.9344 3.6737 13.52683749559421),...}
I'm puzzled as to how to parse this so that every third line becomes the key, with the following two lines forming a tuple for the value. What would be the best way to do this in Python?
I've attempted to add some logic for "every third line" though it seems kind of convoluted; something like
with open(r"file") as f:
i = 3
for line in f:
if i%3=0:
key = line
else:
#not sure what to do with the next lines here
If your file always has the same structure (i.e. the 'CALSPHERE' word, or whatever else you want as your dictionary key, followed by two lines), you can achieve what you want as follows:
with open(filename) as file:
    lines = file.read().splitlines()

d = dict()
for i in range(0, len(lines), 3):
    d[lines[i].strip()] = (lines[i + 1], lines[i + 2])
Output:
{
'CALSPHERE 1': ('1 00900U 64063C 20161.15561498 .00000210 00000-0 21550-3 0 9996', '2 00900 90.1544 28.2623 0029666 80.8701 43.4270 13.73380512769319'),
'CALSPHERE 2': ('1 00902U 64063E 20161.16836122 .00000025 00000-0 23933-4 0 9990', '2 00902 90.1649 30.9038 0019837 126.9344 3.6737 13.52683749559421')
}
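An alternative trick for grouping a file into threes is to zip one iterator with itself. A sketch assuming Python 3 and the same fixed three-line structure:

with open(filename) as file:
    lines = (line.strip() for line in file)
    # zipping the same iterator three times consumes three lines per tuple
    d = {key: (a, b) for key, a, b in zip(lines, lines, lines)}
print(d)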
Assuming that your content is in file.txt, you can use the following.
It will work for any number of CALSPHERE keyword occurrences, and for a varying number of entries between them.
with open('file.txt') as inp:
    buffer = []
    for line in inp:
        # remove newline
        copy = line.replace('\n', '')
        # check if next entry
        if 'CALSPHERE' in copy:
            buffer.append([])
        # add line
        buffer[-1].append(copy)

# put the output into a dictionary
res = {}
for chunk in buffer:
    # safety check
    if len(chunk) > 1:
        res[chunk[0]] = tuple(chunk[1:])
print(res)

Appending the length of sentences to file

I found the length and index, and I want to save all of them to a new file.
Example format: index sentence length
My code:
file = open("testing_for_tools.txt", "r")
lines_ = file.readlines()
for line in lines_:
    lenght = len(line) - 1
    print(lenght)
for item in lines_:
    print(lines_.index(item) + 1, item)
output:
64
18
31
31
23
36
21
9
1
1 i went to city center, and i bought xbox5 , and some other stuff
2 i will go to gym !
3 tomorrow i, sill start my diet!
4 i achive some and i need more ?
5 i lost lots of weights؟
6 i have to , g,o home,, then sleep ؟
7 i have things to do )
8 i hope so
9 o
Desired output, to be saved to a new file:
1 i went to city center, and i bought xbox5 , and some other stuff 64
2 i will go to gym ! 18
This can be achieved using the following code. Note the use of with ... as f, which means we don't have to worry about closing the file after using it. In addition, I've used f-strings (requires Python 3.6+) and enumerate to get the line number, concatenating everything into one string which is written to the output file.
with open("test.txt", "r") as f:
lines_ = f.readlines()
with open("out.txt", "w") as f:
for i, line in enumerate(lines_, start=1):
line = line.strip()
f.write(f"{i} {line} {len(line)}\n")
Output:
1 i went to city center, and i bought xbox5 , and some other stuff 64
2 i will go to gym ! 18
If you wanted to sort the lines based on length, you could just put the following line after the first with block:
lines_.sort(key=len)
This would then give output:
1 i will go to gym ! 18
2 i went to city center, and i bought xbox5 , and some other stuff 64
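Note that sorting lines_ directly renumbers the lines. If you would rather keep each sentence's original index after sorting, you could enumerate first and then sort the pairs; a sketch along those lines:

# Pair each stripped line with its original 1-based index, then sort by length
numbered = list(enumerate((line.strip() for line in lines_), start=1))
numbered.sort(key=lambda pair: len(pair[1]))

with open("out_sorted.txt", "w") as f:  # hypothetical output file name
    for i, line in numbered:
        f.write(f"{i} {line} {len(line)}\n")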

Modifying a data file: issue with savetxt header option

I have a bunch of files from a LAMMPS simulation made of a 9-line header and an Nx5 data array (N is large, of order 10000). A file looks like this:
ITEM: TIMESTEP
1700000
ITEM: NUMBER OF ATOMS
40900
ITEM: BOX BOUNDS pp pp pp
0 59.39
0 59.39
0 59.39
ITEM: ATOMS id type xu yu zu
1 1 -68.737755560980844 1.190046376093027 122.754819323806714
2 1 -68.334493269859621 0.365731265115530 122.943111038981527
3 1 -68.413018326512173 -0.456802254452782 123.436843456292138
4 1 -68.821350328206080 -1.360098170077123 123.314784135612115
5 1 -67.876948635447775 -1.533699833382506 123.072964235308660
6 1 -67.062910322675322 -2.006415676993953 123.431518511867381
7 1 -67.069984116148134 -2.899068427170739 123.057125785834685
8 1 -66.207325578729183 -3.292545155979909 123.377770523297343
...
I would like to open every file, perform a certain operation on the numerical data and save the file with a different name leaving the header unchanged. My script is:
for f in files:
    filename = path + "/" + f
    with open(filename) as myfile:
        header = ' '.join([next(myfile) for x in xrange(9)])
    data = np.loadtxt(filename, skiprows=9)
    data[:, 2:5] %= L  # Put everything inside the box...
    savetxt(filename.replace("lammpstrj", "fold.lammpstrj"), data, header=header, comments="", fmt="%d %d %.15f %.15f %.15f")
The output, though, looks like this:
ITEM: TIMESTEP
1700000
ITEM: NUMBER OF ATOMS
40900
ITEM: BOX BOUNDS pp pp pp
0 59.39
0 59.39
0 59.39
ITEM: ATOMS id type xu yu zu
1 1 50.042244439019157 1.190046376093027 3.974819323806713
2 1 50.445506730140380 0.365731265115530 4.163111038981526
3 1 50.366981673487828 58.933197745547218 4.656843456292137
4 1 49.958649671793921 58.029901829922878 4.534784135612114
5 1 50.903051364552226 57.856300166617494 4.292964235308659
6 1 51.717089677324680 57.383584323006048 4.651518511867380
7 1 51.710015883851867 56.490931572829261 4.277125785834684
8 1 52.572674421270818 56.097454844020092 4.597770523297342
...
The header is not exactly the same: there are spaces at the beginning of every line except the first, and a newline after the last line of the header. I need to get rid of those, but I don't know how.
What am I doing wrong?
The issue is in the ' '.join(a):
>>> a = ['sadf\n', 'sdfg\n']
>>> ' '.join(a)
'sadf\n sdfg\n'  # Note the space at the start of the second line.
Instead:
>>> ''.join(a)
'sadf\nsdfg\n'
You will also need to trim the last '\n' in your header to prevent the empty line:
>>> ''.join(a).rstrip()
'sadf\nsdfg'
The header parameter will add a newline after it automatically, so you can eliminate the original last '\n' as a redundant newline.
header = header.rstrip('\n')
The leading spaces occur because you join the lines with an extra space character. You can solve it with the command below.
header = ''.join([next(myfile) for x in xrange(9)])
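Putting the two fixes together, the loop from the question might look like this (a sketch; it assumes Python 3, NumPy imported as np, and that files, path and L are defined as in the original script):

import numpy as np

for f in files:
    filename = path + "/" + f
    with open(filename) as myfile:
        # join without a separator and drop the trailing newline;
        # savetxt's header parameter appends its own newline
        header = ''.join(next(myfile) for _ in range(9)).rstrip('\n')
    data = np.loadtxt(filename, skiprows=9)
    data[:, 2:5] %= L  # put everything inside the box
    np.savetxt(filename.replace("lammpstrj", "fold.lammpstrj"),
               data, header=header, comments="",
               fmt="%d %d %.15f %.15f %.15f")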

Python - Formatting a print to align a specific column

I am trying to format a print statement to align a specific column.
Currently my output is:
0 - Rusty Bucket (40L bucket - quite rusty) = $0.00
1 - Golf Cart (Tesla powered 250 turbo) = $195.00*
2 - Thermomix (TM-31) = $25.50*
3 - AeroPress (Great coffee maker) = $5.00
4 - Guitar (JTV-59) = $12.95
The output I am looking for is:
0 - Rusty Bucket (40L bucket - quite rusty) = $0.00
1 - Golf Cart (Tesla powered 250 turbo)     = $195.00*
2 - Thermomix (TM-31)                       = $25.50*
3 - AeroPress (Great coffee maker)          = $5.00
4 - Guitar (JTV-59)                         = $12.95
Here is the code I am currently using for the print:
def list_items():
    count = 0
    print("All items on file (* indicates item is currently out):")
    for splitline in all_lines:
        in_out = splitline[3]
        dollar_sign = "= $"
        daily_price = "{0:.2f}".format(float(splitline[2]))
        if in_out == "out":
            in_out = str("*")
        else:
            in_out = str("")
        print(count, "- {} ({}) {}{}{}".format(splitline[0], splitline[1], dollar_sign, daily_price, in_out))
        count += 1
I have tried using formatting such as:
print(count, "- {:>5} ({:>5}) {:>5}{}{}".format(splitline[0], splitline[1], dollar_sign, daily_price, in_out))
but have never been able to get just the one column to align. Any help or suggestions would be greatly appreciated! I am also using Python 3.x.
Note that I am using tuples to hold the information, with all_lines being the master list. The information is read from a CSV originally. Apologies for the horrible naming conventions; I'm trying to get the functionality working first.
Sorry if this has been answered elsewhere; I have tried looking.
EDIT: Here is the code I'm using to build all_lines:
import csv

open_file = open('items.csv', 'r+')
all_lines = []
for line in open_file:
    splitline = line.strip().split(',')
    all_lines.append((splitline[0], splitline[1], splitline[2], splitline[3]))
And here is the csv file information:
Rusty Bucket,40L bucket - quite rusty,0.0,in
Golf Cart,Tesla powered 250 turbo,195.0,out
Thermomix,TM-31,25.5,out
AeroPress,Great coffee maker,5.0,in
Guitar,JTV-59,12.95,in
You should look at str.ljust(width[, fillchar]):
>>> '(TM-31)'.ljust(15)
'(TM-31)        '  # padded to width 15
Then extract the variable-length {} ({}) part and pad it to the necessary width.
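Applied to the question's data, that could look like the following sketch (it assumes the all_lines tuples of (name, description, price, in/out) built in the question):

# Build the variable-length "name (description)" prefixes first,
# then pad them all to the width of the longest one
prefixes = ["{} ({})".format(name, desc) for name, desc, price, state in all_lines]
width = max(len(p) for p in prefixes)

for count, (prefix, row) in enumerate(zip(prefixes, all_lines)):
    star = "*" if row[3] == "out" else ""
    print("{} - {} = ${:.2f}{}".format(count, prefix.ljust(width), float(row[2]), star))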
What you are looking for may be:
[EDIT] I have now also included a tabulate version since you may gain flexibility with this.
import csv
from tabulate import tabulate

open_file = open('items.csv', 'r+')
all_lines = []
for line in open_file:
    splitline = line.strip().split(',')
    all_lines.append((splitline[0], splitline[1], splitline[2], splitline[3]))
# print all_lines

count = 0
new_lines = []
for splitline in all_lines:
    in_out = splitline[3]
    dollar_sign = "= $"
    daily_price = "{0:.2f}".format(float(splitline[2]))
    if in_out == "out":
        in_out = str("*")
    else:
        in_out = str("")
    str2 = '(' + splitline[1] + ')'
    print count, "- {:<30} {:<30} {}{:<30} {:<10}".format(splitline[0], str2, dollar_sign, daily_price, in_out)
    new_lines.append([splitline[0], str2, dollar_sign, daily_price, in_out])
    count += 1

print tabulate(new_lines, tablefmt="plain")
print
print tabulate(new_lines, tablefmt="plain", numalign="left")
I do not like the idea of controlling the printing format myself.
In this case, I would leverage a tabulating library such as Tabulate.
The two key points are:
keep data in a table (e.g. list in a list)
select a proper printing format with the tablefmt param (see the sketch below).
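For reference, a minimal self-contained Python 3 version of the tabulate call (the rows here are hypothetical, shaped like new_lines above):

from tabulate import tabulate

rows = [
    ["Rusty Bucket", "(40L bucket - quite rusty)", "= $", "0.00", ""],
    ["Golf Cart", "(Tesla powered 250 turbo)", "= $", "195.00", "*"],
    ["Thermomix", "(TM-31)", "= $", "25.50", "*"],
]
print(tabulate(rows, tablefmt="plain", numalign="left"))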

compare text lines(multiple lists) using python

I have a text file containing the lines below:
Cycle 0 DUT 2 Bad Block : 2,4,6,7,8,10,12,14,16,18,20,22,24,26,28
Cycle 0 DUT 3 Bad Block : 4,6,8,10,12,14,16,18,20,22,24,26
Cycle 0 DUT 4 Bad Block : 4,6,8,10,12,14,16,18,20,22,24,26
Cycle 1 DUT 2 Bad Block : 2,4,6,7,8,10,12,14,16,18,20,22,24,26,28
Cycle 1 DUT 3 Bad Block : 4,6,8,10,12,14,16,18,20,22,24,26,28,30,32
I want to compare the Cycle 0 DUT 2 line (the numbers after the colon, separated by commas) with the Cycle 1 DUT 2 line and get the differences, then compare the Cycle 0 DUT 3 line to the Cycle 1 DUT 3 line and get the differences or the unique values.
I guess you want to key things to the DUT digit:
import re

dut_data = {}
cycle_dut = re.compile(r'^Cycle\s+(\d)\s+DUT\s+(\d)\s+Bad Block\s*:\s*(.*)$')

with open(inputfile, 'r') as infile:
    for line in infile:
        match = cycle_dut.search(line)
        if match:
            cycle, dut, data = match.groups()
            data = [int(v) for v in data.split(',')]
            if cycle == '0':
                # Store cycle 0 DUT values keyed on the DUT number
                dut_data[dut] = data
            else:
                # Compare against cycle 0 data, if the same DUT number was present
                cycle_0_data = dut_data.get(dut)
                if cycle_0_data is not None:
                    # compare cycle_0_data and data here
                    print 'DUT {} differences: {}'.format(dut, ','.join([str(v) for v in sorted(set(cycle_0_data).symmetric_difference(data))]))
I used a quick set difference to print the differences, this may require refining.
For your sample data, this prints:
DUT 2 differences:
DUT 3 differences: 28,30,32
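As a quick illustration of the set arithmetic: symmetric_difference returns the values present in exactly one of the two sets, which is why the identical DUT 2 lists produce an empty result. Using the DUT 3 sample data:

>>> cycle0 = {4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26}
>>> cycle1 = cycle0 | {28, 30, 32}
>>> sorted(cycle0.symmetric_difference(cycle1))
[28, 30, 32]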
