Use, in another project, a variable saved in a project (python) - python

I have a project named project1, where I take a big .txt file (around 1 GB). I make a list that has each line of the text as elements, with the following code:
txt = open('<path>', 'r', encoding="utf8")
lista = list(txt)
And then I edit the items in the list, which is not important for my question.
I need to use the variable lista in another project (project2), but I don't want to import it in the following way:
from project1 import lista
because by doing that I have to run all the code in project1 to get the text in the .txt file and to edit the list.
So my goal is to use lista without having to run code that takes time, since lista will always be the same.
IMPORTANT NOTES
I can't just print it in project1, copy the output and paste it in project2 to use it as a variable, because the list is way too long.
One way I thought about that was to save lista as a string in a .txt file (let's call it lista.txt), open the .txt file in project2 and, in some way, tell python that the string in lista.txt is actually a list. Example to understand better:
In project 1
file_text = open('<path>\\lista.txt', 'w', encoding="utf8")
lista = ['<string_1>', '<string_2>', ..., '<string_n>']
file_text.write(f'{lista}')
file_text.close()
In project 2
file_text = open('<path>\\lista.txt', 'r', encoding="utf8")
list_as_string = file_text
def string_to_list(input_string):
#way to transform the list_as_string into the original "lista" variable, which is a list
#return list
string_to_list(list_as_string)
IMPORTANT: The way that I described looks too complex to me, so it was just an idea, but I'm sure there are better ways (maybe there is a way to save a Python variable as a file that keeps information like its type, and then directly import it in a project as a variable of that type, in this case a list).

Note that list(txt) already gives you the lines of the file, the same result as txt.readlines(), so either works. In the json and pickle modules, dump writes an object to an open file (so you could save the list to a file) and dumps returns the serialized string or bytes that would have been written; load and loads restore the object from the corresponding dump. Personally, I would either rebuild the list from the file's path in the second project, or wrap the slow code in project1 in a function so that it does not run on import.
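The dump/load round trip described above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the sample list, the file name lista.pkl and the temporary directory are assumptions made so the snippet runs on its own.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the large, already-edited list built in project1.
lista = ["first line\n", "second line\n", "third line\n"]

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical location; in practice, any path both projects can reach.
    cache_path = os.path.join(folder, "lista.pkl")

    # project1: run once, after the slow processing, to save the list.
    with open(cache_path, "wb") as f:  # pickle requires binary mode
        pickle.dump(lista, f)

    # project2: load the saved list without re-running project1's code.
    with open(cache_path, "rb") as f:
        restored = pickle.load(f)
```

The restored object is a regular list of str, so project2 can use it directly. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code; json.dump and json.load follow the same pattern with text-mode files if a human-readable format is preferred.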

Related

Archive files directly from memory in Python

I'm writing a program where I get a number of files and zip each one with encryption using pyzipper, writing them to io.BytesIO() objects so they stay in memory. Now, after some other additions, I want to take all of these in-memory files and zip them together into a single encrypted zip file, again with pyzipper.
The code looks something like this:
# Create the in-memory file object
in_memory = BytesIO()
# Create the zip file and open in write mode
with pyzipper.AESZipFile(in_memory, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zip_file:
    # Set password
    zip_file.setpassword(b"password")
    # Save "data" with file_name
    zip_file.writestr(file_name, data)
# Go to the beginning
in_memory.seek(0)
# Read the zip file data
data = in_memory.read()
# Add the data to a list
files.append(data)
So, as you may guess the "files" list is an attribute from a class and the whole thing above is a function that does this a number of times and then you get the full files list. For simplicity's sake, I removed most of the irrelevant parts.
I get no errors for now, but when I try to write all files to a new zip file I get an error. Here's the code:
with pyzipper.AESZipFile(test_name, "w", compression=pyzipper.ZIP_LZMA, encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for file in files:
        zfile.write(file)
I get a ValueError because of os.stat:
File "C:\Users\vulka\AppData\Local\Programs\Python\Python310\lib\site-packages\pyzipper\zipfile.py", line 820, in from_file
st = os.stat(filename)
ValueError: stat: embedded null character in path
[WHAT I TRIED]
So, I tried using mmap for this purpose but I don't think this can help me and if it can - then I have no idea how to make it work.
I also tried using fs.memoryfs.MemoryFS to temporarily create a virtual filesystem in memory to store all the files, then get them back, zip everything together and save the result to disk. Again, it failed. I got tons of different errors in my tests and, TBH, there's very little information out there on this fs method; even if what I'm trying to do is possible, I couldn't figure it out.
P.S: I don't know if pyzipper (almost 1:1 zipfile with the addition of encryption) supports nested zip files at all. This could be the problem I'm facing, but if it doesn't, I'm open to any suggestions for a new approach. Also, I don't want to rely on third-party software, even if it is open source! (I'm talking about using 7zip to do all the archiving and encryption, even though it shouldn't even be possible to use it without saving the files to disk first, which is the main thing I'm trying to avoid.)
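The traceback shows write() passing its argument to os.stat, which means it expects a path on disk rather than the archive bytes held in the files list. The first snippet already uses writestr, which takes a member name plus the data itself. A hedged sketch of that idea follows, using the standard-library zipfile module instead of pyzipper, so there is no encryption here. It assumes that pyzipper's writestr accepts bytes in the same way, as the "almost 1:1 zipfile" description suggests. The member names and sample contents are made up for illustration.

```python
import io
import zipfile

# Hypothetical in-memory archives standing in for the "files" list of bytes.
files = []
for i in range(3):
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as inner:
        inner.writestr(f"doc{i}.txt", f"content {i}")
    files.append(buf.getvalue())

# Build the outer archive in memory as well.
outer_buf = io.BytesIO()
with zipfile.ZipFile(outer_buf, "w") as outer:
    for index, blob in enumerate(files):
        # writestr takes an archive member name and the bytes to store,
        # so nothing has to exist on disk.
        outer.writestr(f"inner_{index}.zip", blob)

# Read the result back to confirm the nested archives survived intact.
with zipfile.ZipFile(io.BytesIO(outer_buf.getvalue())) as check:
    member_names = check.namelist()
    first_inner = check.read("inner_0.zip")
```

Each inner archive is stored as an ordinary member of the outer one, so nesting needs no special support from the library.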

Duplicate in list created from filenames (python)

I'm trying to create a list of excel files that are saved to a specific directory, but I'm having an issue where when the list is generated it creates a duplicate entry for one of the file names (I am absolutely certain there is not actually a duplicate of the file).
import glob
# get data file names
path =r'D:\larvalSchooling\data'
filenames = glob.glob(path + "/*.xlsx")
output:
>>> filenames
['D:\\larvalSchooling\\data\\copy.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_70dpf_GroupA_n5_20200808_1015-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\Raw data-SF_Sat_84dpf_GroupABCD_n5_20200822_1440-Trial 1.xlsx', 'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx']
You'll note 'D:\larvalSchooling\data\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx' is listed twice.
Rather than going through after the fact and removing duplicates, I was hoping to figure out why it's happening to begin with.
I'm using Python 3.7 on Windows 10 Pro.
If you wrote the code to remove duplicates (which can be as simple as filenames = set(filenames)) you'd see that you still have two filenames. Print them out one on top of the other to make a visual comparison easier:
'D:\\larvalSchooling\\data\\Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx',
'D:\\larvalSchooling\\data\\~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx'
The second one has a leading ~$ (probably an auto-backup).
Whenever you open an Excel file, Excel creates a hidden temporary companion file for it, named after the original with a ~$ prefix. In this case:
Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
~$Raw data-SF_Fri_70dpf_GroupABC_n5_20200828_1140-Trial 1.xlsx
This means that the file is currently open in some program, and glob is picking up the temporary file (which is usually hidden in Explorer as well).
Just find the program that has the file open and close it. If you want to avoid this in general, you should also add validation so that files matching "~$*.xlsx" are ignored.
You can use os.path.splitext to get the file extension and loop through the directory using os.listdir. The open Excel files can be skipped using the following code:
import os

filenames = []
for file in os.listdir(r'D:\larvalSchooling\data'):
    filename, file_extension = os.path.splitext(file)
    # keep only .xlsx files, skipping Excel's "~$" temporary files
    if file_extension == '.xlsx' and not file.startswith('~$'):
        filenames.append(file)
Note: this might not be the best solution, but it'll get the job done :)
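If you would rather keep the glob call from the question, the same check can be applied to its results. The sketch below is only an illustration: it creates a throwaway directory with made-up file names so that it runs anywhere, and it returns base names rather than full paths.

```python
import glob
import os
import tempfile

# Hypothetical directory and file names, created only for this demonstration.
with tempfile.TemporaryDirectory() as folder:
    for name in ("copy.xlsx", "Trial 1.xlsx", "~$Trial 1.xlsx", "notes.txt"):
        open(os.path.join(folder, name), "w").close()

    # Keep every .xlsx match except Excel's "~$" temporary files.
    filenames = sorted(
        os.path.basename(path)
        for path in glob.glob(os.path.join(folder, "*.xlsx"))
        if not os.path.basename(path).startswith("~$")
    )
```

Checking the base name rather than the full path matters here, because the ~$ prefix sits after the directory part of each path that glob returns.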

Naming a list in python using a String name

Does anyone know of a way to name a list in Python using a string? I am writing a script that iterates through a directory, parses each file and generates lists with the contents of the file. I would like to use the filename to name each list. I was wondering if there was a way to do it similar to the exec() method, but using lists instead of just a normal variable.
If you really want to do it this way, then for instance:
import os
directory = os.getcwd()  # current directory or any other you would like to specify
for name in os.listdir(directory):
    globals()[name] = []
Each of the lists can be now referenced with the name of the file. Of course, this is a suboptimal approach, normally you should use other data structures, such as dictionaries, to perform your task.
You would be better off using a dictionary. Store the file name as the key value of the dictionary and place the contents inside the corresponding value for the key.
It's like
my_dict = {'file1.txt':'This is the contents of file1','file2.txt':'This is the content of file2'}
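A small sketch of the dictionary approach, with one list of lines per file. The sample files and their contents are invented for the example and written to a temporary directory so that the snippet is self-contained.

```python
import os
import tempfile

contents = {}  # maps each file name to the list of lines in that file

with tempfile.TemporaryDirectory() as directory:
    # Hypothetical sample files, created only for this demonstration.
    samples = {"file1.txt": "a\nb\n", "file2.txt": "c\n"}
    for name, text in samples.items():
        with open(os.path.join(directory, name), "w", encoding="utf8") as f:
            f.write(text)

    # The actual technique: use the file name as the dictionary key.
    for name in os.listdir(directory):
        with open(os.path.join(directory, name), encoding="utf8") as f:
            contents[name] = f.read().splitlines()
```

The list for a given file is then available as contents['file1.txt'], with no need for exec() or globals().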

Writing an element to a list inside another file

I have 2 .py files. Let's name them foo.py and list.py
list.py is not having any code but just a list which looks like this: allowed = ['a', 'b', 'c']
This is all the list.py contains...
Now, the foo.py has a code which basically uses the list inside the list.py to only allow certain inputs (which should be in the list, else: pass)
I added code to be able to add elements to the list from inside the program, but I'm unable to do so. I have tried to use the append() function. This brings no change to the list...
Please help me edit and write changes to the list inside list.py by providing the right code to do so.
Thanks.
What is likely happening is you are sending a copy of the list, when your program is running - and then appending to that list - without impacting the initial list in list.py
I would make sure that list.py does not define the list in a way that recreates it every time that code block or function is called.
If you are trying to use a program in foo.py to explicitly edit the list.py file, you are probably as well off to simply use python's pickle module, which saves the state of python objects to file, and you would then be able to load them as normal later, for example:
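import pickle

list_path = 'allowed.pkl'  # hypothetical file name for the saved list
try:
    with open(list_path, 'rb') as file:  # pickle files must be opened in binary mode
        yourlist = pickle.load(file)
except FileNotFoundError:
    yourlist = ['someDefaultValue', 'anotherDefaultValue']
# Your code block, doing whatever you're doing
with open(list_path, 'wb') as file:
    pickle.dump(yourlist, file)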
Without knowing more about what you're looking at or exactly what you're trying to do - it's hard to give a better answer!
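One point worth spelling out: append() only changes the list object in the running program; the text of list.py is never rewritten, so the change is gone on the next run. A possible alternative to pickle is to keep the allowed values in a JSON data file instead of a .py file. The sketch below assumes a hypothetical file name (allowed.json) and hypothetical helper functions load_allowed and save_allowed, and uses a temporary directory so that it runs on its own.

```python
import json
import os
import tempfile

def load_allowed(path):
    """Return the saved list, or the defaults if nothing was saved yet."""
    try:
        with open(path, "r", encoding="utf8") as f:
            return json.load(f)
    except FileNotFoundError:
        return ["a", "b", "c"]

def save_allowed(path, allowed):
    """Write the list back to disk so the next run sees the change."""
    with open(path, "w", encoding="utf8") as f:
        json.dump(allowed, f)

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical data file replacing the hard-coded list in list.py.
    data_path = os.path.join(folder, "allowed.json")

    allowed = load_allowed(data_path)
    allowed.append("d")                  # changes the in-memory list only
    save_allowed(data_path, allowed)     # persists the change
    reloaded = load_allowed(data_path)   # what the next run would see
```

A JSON file can also be edited by hand, which a pickle file cannot.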

Create for loop for naming output file Python

So I'm importing a list of names
e.g.
Textfile would include:
Eleen
Josh
Robert
Nastaran
Miles
my_list = ['Eleen','Josh','Robert','Nastaran','Miles']
Then I'm assigning each name to a list and I want to write a new Excel file for each name in that list.
#1. Is there any way I can create a for loop on the line:
temp = os.path.join(dir,'...'.xls')
_________________________
def high_throughput(names):
    import os
    import re
    # Reading file
    in_file=open(names,'r')
    dir,file=os.path.split(names)
    temp = os.path.join(dir,'***this is where i want to put a for loop
    for each name in the input list of names***.xls')
    out_file=open(temp,'w')
    data = []
    for line in in_file:
        data.append(line)
    in_file.close()
I'm still not sure what you're trying to do (and by "not sure", I mean "completely baffled"), but I think I can explain some of what you're doing wrong, and how to do it right:
in_file=open(names,'r')
dir,file=os.path.split(names)
temp = os.path.join(dir,'***this is where i want to put a for loop
for each name in the input list of names***.xls')
At this point, you don't have the input list of names. That's what you're reading from in_file, and you haven't read it yet. Later on, you read those names into data, after which you can use them. So:
in_file=open(names,'r')
dir,file=os.path.split(names)
data = []
for line in in_file:
    data.append(line.strip())  # strip the trailing newline so it doesn't end up in the file name
in_file.close()
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    out_file=open(temp,'w')
Note that I put the for loop outside the function call, because you have to do that. And that's a good thing, because you presumably want to open each path (and do stuff to each file) inside that loop, not open a single path made out of a loop of files.
But if you don't insist on using a for loop, there is something that may be closer to what you were looking for: a list comprehension. You have a list of names. You can use that to build a list of paths. And then you can use that to build a list of open files. Like this:
paths = [os.path.join(dir, '{}.xls'.format(name)) for name in data]
out_files = [open(path, 'w') for path in paths]
Then, later, after you've built up the string you want to write to all the files, you can do this:
for out_file in out_files:
    out_file.write(stuff)
However, this is kind of an odd design. Mainly because you have to close each file. They may get closed automatically by the garbage collection, and even if they don't, they may get flushed… but unless you get lucky, all that data you wrote is just sitting around in buffers in memory and never gets written to disk. Normally you don't want to write programs that depend on getting lucky. So, you want to close your files. With this design, you'd have to do something like:
for out_file in out_files:
    out_file.close()
It's probably a lot simpler to go back to the one big loop I suggested in the first place, so you can do this:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    out_file=open(temp,'w')
    out_file.write(stuff)
    out_file.close()
Or, even better:
for name in data:
    temp = os.path.join(dir, '{}.xls'.format(name))
    with open(temp,'w') as out_file:
        out_file.write(stuff)
A few more comments, while we're here…
First, you really shouldn't be trying to generate .xls files manually out of strings. You can use a library like openpyxl. Or you can create .csv files instead—they're easy to create with the csv library that comes built in with Python, and Excel can handle them just as easily as .xls files. Or you can use win32com or pywinauto to take control of Excel and make it create your files. Really, anything is better than trying to generate them by hand.
Second, the fact that you can write for line in in_file: means that in_file is an iterable of lines. So, if all you want to do is convert it to a list of lines, you can do that in one step:
data = list(in_file)
But really, the only reason you want this list in the first place is so you can loop around it later, creating the output files, right? So why not just hold off, and loop over the lines in the file in the first place?
Whatever you do to generate the output stuff, do that first. Then loop over the file with the list of filenames and write stuff. Like this:
stuff = ...  # whatever you were doing later, in the code you haven't shown
dir = os.path.dirname(names)
with open(names, 'r') as in_file:
    for line in in_file:
        # strip the trailing newline so it doesn't end up in the file name
        temp = os.path.join(dir, '{}.xls'.format(line.strip()))
        with open(temp, 'w') as out_file:
            out_file.write(stuff)
That replaces all of the code in your sample (except for that function named high_throughput that imports some modules locally and then does nothing).
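As a sketch of the .csv route mentioned above, here is one way to write one file per name with the built-in csv module. The rows, the names file and the temporary directory are placeholders invented for the example; the real data would come from the code not shown in the question.

```python
import csv
import os
import tempfile

# Hypothetical table to write into every output file.
rows = [["col_a", "col_b"], [1, 2]]

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical input file with one name per line, as in the question.
    names_path = os.path.join(folder, "names.txt")
    with open(names_path, "w") as f:
        f.write("Eleen\nJosh\nRobert\nNastaran\nMiles\n")

    created = []
    with open(names_path, "r") as in_file:
        for line in in_file:
            name = line.strip()  # drop the trailing newline
            if not name:
                continue  # skip blank lines
            out_path = os.path.join(folder, "{}.csv".format(name))
            # newline="" is what the csv module expects for output files
            with open(out_path, "w", newline="") as out_file:
                csv.writer(out_file).writerows(rows)
            created.append(os.path.basename(out_path))
```

Excel opens the resulting .csv files directly, and each file is closed as soon as its with block ends.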
Take a look at openpyxl, especially if you need to create .xlsx files. Below example assumes the Excel workbooks are created as blank.
from openpyxl import Workbook
names = ['Eleen','Josh','Robert','Nastaran','Miles']
for name in names:
    wb = Workbook()
    wb.save('{0}.xlsx'.format(name))
Try this:
in_file=open(names,'r')
dir,file=os.path.split(names)
for name in in_file:
    # strip the trailing newline before building the file name
    temp = os.path.join(dir, name.strip() + '.xls')
    with open(temp,'w') as out_file:
        pass  # write data to out_file
in_file.close()
