Pickle persistence - Python

I recently started using Python and I'm trying to write a program that manipulates data using pickle. I would like my file to look something like this:
CODE | PIECE | PRICE
  1  |   1   | 1,00
  2  |   2   | 2,00
with 1 under CODE, 1 under PIECE and 1,00 under PRICE, continuing row by row until it reaches 50.
Here's the question: is there any way to do this using pickle? Something like:
columns = int(input('Number of columns : '))  # which would be 3 (code, piece and price)
data = []
for i in range(columns):
    raw = input('Enter data ' + str(i) + ' : ')
    data.append(raw)

file = open('file.dat', 'wb')
pickle.dump(data, file)
file.close()
Obviously this cannot be done with input alone, so is there another way to do it?
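If the goal is just to persist a small table, one common approach (a minimal sketch, not the only way; the file name file.dat is taken from the snippet above) is to build the rows in memory first and pickle the whole list in a single call:

```python
import pickle

# Build the 50 rows up front: [code, piece, price], counting 1 to 50.
rows = [[i, i, float(i)] for i in range(1, 51)]

# Pickle the whole table with one dump() call.
with open('file.dat', 'wb') as f:
    pickle.dump(rows, f)

# Load it back later; the nested list structure is restored intact.
with open('file.dat', 'rb') as f:
    loaded = pickle.load(f)
```

input() could still feed individual cells into rows before dumping; pickle doesn't care how the list was built.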

Related

Replacing `eval` in Python for dynamic input

I am trying to replace eval in some Python code. I read a configuration file and build a string of Python code from it, which is later executed using eval.
There are two functions.
The first reads the configuration file and creates a string that can be executed with eval, for example:
raw_bytes[26:31].hex()+","+codecs.decode(raw_bytes[41:42],"cp500")+","+raw_bytes[48:49].hex()+","+raw_bytes[102:106].hex()
def extractor_command(config_file):
    START = 0
    CMD = ""
    with open(config_file, 'r') as f:
        next(f)  # skipping the comments in the first line
        for line in f:
            col = line.split()
            UPTO = START + int(col[2])
            if col[1] == "1":  # field is active
                if col[3] == "0":
                    CMD = CMD + 'raw_bytes[{}:{}].hex()'.format(START, UPTO)
                    CMD = CMD + '+","+'
                if col[3] == "1":
                    CMD = CMD + 'codecs.decode(raw_bytes[{}:{}],"cp500")'.format(START, UPTO)
                    CMD = CMD + '+","+'
            elif col[1] == "0":  # inactive field: skip it
                pass
            START = UPTO
    CMD = CMD.rstrip('+","+')
    return CMD
The configuration file looks like this:
Nr Active Length(bytes) String
Field1 1 8 1
Field2 0 2 0
Field3 1 4 1
...
Field250 1 1 0
Field251 0 1 1
Field252 0 2 1
The second function reads a binary file and uses the command created by the first function to extract fields from it. The extracted lines are written to a text file.
def extract(in_file, out_file, cmd):
    READBLOCKS = 2052
    compiled = compile(cmd, '<string>', 'eval')
    with open(out_file, 'w') as extracted_file:
        f = open(in_file, 'rb')
        while True:
            raw_bytes = f.read(READBLOCKS)
            row = eval(compiled)
            extracted_file.write(row + '\n')
            b = f.read(1)
            if not b:
                break
        f.close()
Although this works fine, I am looking for another solution that makes the code more readable and avoids eval for security reasons. I also don't want to rebuild the extraction command every time a portion of the binary file is read, because that hurts performance (the binary file is huge).
The code doesn't look pretty, but it's just for demonstration.
Any suggestions?
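One eval-free alternative (a sketch, assuming the config columns are name, active-flag, length, string-flag as in the sample above) is to parse the config once into a list of (start, end, decoder) tuples, then apply plain functions to each block:

```python
import codecs

def build_extractors(config_file):
    """Parse the config once into (start, end, decode_function) tuples."""
    extractors = []
    start = 0
    with open(config_file) as f:
        next(f)  # skip the header line
        for line in f:
            name, active, length, is_string = line.split()
            upto = start + int(length)
            if active == "1":
                if is_string == "1":
                    # EBCDIC text field
                    extractors.append((start, upto,
                                       lambda b: codecs.decode(b, "cp500")))
                else:
                    # raw bytes rendered as hex
                    extractors.append((start, upto, lambda b: b.hex()))
            start = upto  # inactive fields still advance the offset
    return extractors

def extract_row(raw_bytes, extractors):
    """Apply the prepared extractors to one block of bytes."""
    return ",".join(fn(raw_bytes[s:e]) for s, e, fn in extractors)
```

compile/eval disappears entirely, the config is parsed only once, and the per-block work is reduced to slicing and function calls.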

Count the occurrences of 3-gram terms within multiple files

I have a list of around 10000 3-gram terms in a .txt file. I want to match these terms against multiple .GHC files under a directory and count the occurrences of each term.
One of these files looks like this:
ntdll.dll+0x1e8bd ntdll.dll+0x11a7 ntdll.dll+0x1e6f4 kernel32.dll+0xaa7f kernel32.dll+0xb50b ntdll.dll+0x1e8bd ntdll.dll+0x11a7 ntdll.dll+0x1e6f4 kernel32.dll+0xaa7f kernel32.dll+0xb50b ntdll.dll+0x1e8bd ntdll.dll+0x11a7 ntdll.dll+0x1e6f4 kernel32.dll+0xaa7f kernel32.dll+0xb50b ntdll.dll+0x1e8bd ntdll.dll+0x11a7 ntdll.dll+0x1e6f4 kernel32.dll+0xaa7f kernel32.dll+0xb50b kernel32.dll+0xb511 kernel32.dll+0x16d4f
I want the resulting output to be like this in a dataframe:
N_gram_term_1  N_gram_term_2  ............  N_gram_term_n
2              1                            0
3              2                            4
3              0                            3
The 2nd line here indicates that N_gram_term_1 appeared 2 times in the first file, N_gram_term_2 once, and so on.
The 3rd line indicates that N_gram_term_1 appeared 3 times in the second file, N_gram_term_2 twice, and so on.
If I need to be more clear about something, please let me know.
I am sure there are ready-made implementations for this purpose, perhaps in sklearn. A simple implementation from scratch, though, would be:
import sys
import pandas

d = {}  # dictionary with 1st key = file and 2nd key = 3-gram
for file in sys.argv[1:]:  # these are all the files to be analyzed
    d[file] = {}  # the value here is a nested dictionary
    with open(file) as f:  # opening each file in turn
        for line in f:  # going through every row of the file
            g = line.strip()
            if g in d[file]:
                d[file][g] += 1
            else:
                d[file][g] = 1

print(pandas.DataFrame(d).T)
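The snippet above counts whole lines, which works when each line holds exactly one 3-gram. If the .GHC files instead hold a long stream of tokens like the sample, a sliding-window counter could be used (a from-scratch sketch; sklearn's CountVectorizer with a fixed vocabulary is another option):

```python
from collections import Counter

def count_ngrams(text, terms, n=3):
    """Count how often each whitespace-token n-gram from `terms` occurs in `text`."""
    tokens = text.split()
    # Every window of n consecutive tokens, joined back into one string.
    grams = Counter(" ".join(tokens[i:i + n])
                    for i in range(len(tokens) - n + 1))
    # Report a count for every requested term, including zeros.
    return {t: grams.get(t, 0) for t in terms}
```

Calling this once per file and feeding the resulting dicts to pandas.DataFrame gives one row per file, as in the table above.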

How to import a text file that contains both values and text in Python?

I am aware that a lot of questions have already been asked on this topic, but none of them worked for my specific case.
I want to import a text file into Python and be able to access each value separately. For example, the data '1086: CampNou' is written in one cell, and I am mainly interested in accessing the values. Does anybody have a clue how to do this? My text file looks like this (it's separated by tabs):
1086: CampNou 2084: Hospi 2090: Sants 2094: BCN-S 2096: BCN-N 2101: UNI 2105: B23 Total
1086: CampNou 0 15,6508 12,5812 30,3729 50,2963 0 56,0408 164,942
2084: Hospi 15,7804 0 19,3732 37,1791 54,1852 27,4028 59,9297 213,85
2090: Sants 12,8067 22,1304 0 30,6268 56,7759 29,9935 62,5204 214,854
2096: BCN-N 51,135 54,8545 57,3742 46,0102 0 45,6746 56,8001 311,849
2101: UNI 0 28,9589 31,4786 37,5029 31,6773 0 50,2681 179,886
2105: B23 51,1242 38,5838 57,3634 75,1552 56,7478 40,2728 0 319,247
Total 130,846 160,178 178,171 256,847 249,683 143,344 285,559 1404,63
You can use pandas to open and manipulate your data. Since the file is tab-separated, pass sep="\t" so the columns are split correctly:

import pandas as pd
df = pd.read_csv("mytext.txt", sep="\t")
This should also read your file properly:

def read_file(filename):
    """Returns the content of a file"""
    file = open(filename, 'r')
    content = file.read()
    file.close()
    return content

content = read_file("the_file.txt")  # or whatever your text file is called
items = content.split('\t')  # the file is tab-separated
Then your values will be in the list items: ['', '1086: CampNou', '2084: Hospi', '2090: Sants', ...]
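Since the numbers use decimal commas ('15,6508'), pandas can also convert them to floats directly; a small self-contained sketch (the sample string here is a shortened, hypothetical version of the file):

```python
from io import StringIO
import pandas as pd

# A shortened, tab-separated sample in the same shape as the question's file.
raw = ("\t1086: CampNou\t2084: Hospi\tTotal\n"
       "1086: CampNou\t0\t15,6508\t164,942\n"
       "2084: Hospi\t15,7804\t0\t213,85\n")

# sep="\t" splits on tabs, index_col=0 uses the station names as row labels,
# and decimal="," turns '15,6508' into the float 15.6508.
df = pd.read_csv(StringIO(raw), sep="\t", index_col=0, decimal=",")

cell = df.loc["1086: CampNou", "2084: Hospi"]  # one cell, as a float
```

With the real file, replace StringIO(raw) with the file path.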

Ordering a file

I have this code:
archivo = open("archivo.csv", "r")
for i in range(10):
    for reg in archivo:
        if archivo[reg] < archivo[reg + 1]:
            x = archivo[reg]
            archivo[reg] = archivo[reg + 1]
            archivo[reg + 1] = x
archivo.close()
archivo = open("archivo.csv", "w")
archivo.write(reg)
What I want is to sort the file alphabetically and save it sorted, but I get several errors. The main one says that the file object has no attribute __getitem__, and I didn't find anything like it on the web. Can someone help me?
Input looks like
Matt | 7 | 8
John | 9 | 6
Jim | 6 | 7
I have modified the source CSV file to be comma-separated, so archivo.csv looks like:
Matt,7,8
John,9,6
Jim,6,7
Now, to read this file, Python already has a standard module called csv. Using it we can read and write CSV files reliably:
from csv import reader, writer

archivo = reader(open("archivo.csv", "r"))
a = sorted(archivo)
archivo1 = writer(open("archivo1.csv", "w"))
for row in a:
    archivo1.writerow(row)
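If changing the file to comma-separated is not an option, the original "Name | 7 | 8" lines can also be sorted as plain strings (a sketch; sort_file and the output path are placeholder names):

```python
def sort_file(in_path, out_path):
    """Read all lines, sort them alphabetically, and write them back out."""
    with open(in_path) as f:
        lines = f.readlines()
    lines.sort()  # string sort => alphabetical by the first field (the name)
    with open(out_path, "w") as f:
        f.writelines(lines)
```

For example, sort_file("archivo.csv", "archivo_ordenado.csv") would leave the original file untouched and write the sorted copy next to it.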

Parsing and adding content of a file in Python

I have a file which contains some data as follows:
ben | 2 | 40
germany | 6 | 60
What I need as output is:
ben | 2 | 40
germany | 6 | 60
field 1 = 8
field 2 = 100
Please suggest a solution so I can move ahead in Python.
This has the aroma of a homework assignment to me, so I'm going to try to stick to providing some pointers rather than an outright solution.
You can use Python's open() function to open a file. The resulting object can be iterated over in a loop, like for line in myfile:. When you're done with the file you should call myfile.close(), and you could re-open it in "append" mode to write the answer at the end.
Each line will be a string, and you can call line.split('|') to get the line into chunks. I like to use multiple assignment: name, col1, col2 = line.split('|'). You will probably need to use int() to coerce the numbers from string format to integer format so that you can add them up.
I think that's probably a pretty reliable start, right?
There's probably a more elegant way to do this...

results = [0, 0]
with open("/path/to/file.txt") as f:  # use forward slashes or a raw string for Windows paths
    for line in f:
        values = line.split("|")
        results[0] += int(values[1])
        results[1] += int(values[2])
print("field 1 = " + str(results[0]), "field 2 = " + str(results[1]))
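To produce the exact output shown in the question (the original lines followed by the totals), the ideas above can be combined into one pass (a sketch; sum_fields is a placeholder name):

```python
def sum_fields(in_path):
    """Echo each 'name | a | b' line, then append the two column totals."""
    total1 = total2 = 0
    out = []
    with open(in_path) as f:
        for line in f:
            line = line.rstrip("\n")
            out.append(line)            # keep the original line in the output
            name, a, b = line.split("|")
            total1 += int(a)            # int() tolerates the surrounding spaces
            total2 += int(b)
    out.append("field 1 = {}".format(total1))
    out.append("field 2 = {}".format(total2))
    return "\n".join(out)
```

Printing the returned string for the sample file would give the two data lines followed by "field 1 = 8" and "field 2 = 100".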
