File Reading Options Enquiry (Python)

I am a programming student this semester. In class we have been learning about opening, reading and writing files.
We have used a_reader to open files for these tasks. While reading our associated texts, I noticed that there is also a CSV reader option, which I have been using.
I wanted to know if there are any more ways to open/read a file, as I am trying to grow my knowledge of Python and its associated libraries.
EDIT:
I was referring to CSV more specifically, as that is the type of file we use at the moment. We have learnt about CSV Reader and a_reader, and an example from one of our lectures is shown below.
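def main():
    # 'rU' (universal newlines) was removed in Python 3.11; in Python 3,
    # text mode 'r' already translates \r\n and \r line endings to \n.
    a_reader = open('IDCJAC0016_009225_1800_Data.csv', 'r')
    file_data = a_reader.read()
    a_reader.close()
    print(file_data)  # print is a function in Python 3
main()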
It may seem overly broad, but I have no background in this, which is why I am asking whether there are more than just the two ways above. If there are, could someone who knows list them so I can read up on and research them?

If you're asking about places to store things, the first interfaces you'll meet are files and sockets (pretend a network connection is like a file, see http://docs.python.org/2/library/socket.html).
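As an illustration of "a network connection is like a file", the sketch below uses socket.socketpair() to get two already-connected sockets inside one process (standing in for a real client and server), then wraps the receiving end with socket.makefile() so it can be read line by line, just like an open file. read_lines_over_socket is a hypothetical helper name.

```python
import socket


def read_lines_over_socket(payload):
    # socketpair() returns two connected sockets; here they stand in for
    # the two ends of a real network connection.
    sender, receiver = socket.socketpair()
    with sender:
        sender.sendall(payload)
    # Once the sender is closed, the receiver sees end-of-file after the data.
    with receiver, receiver.makefile("r", encoding="utf-8") as stream:
        # makefile() gives a file-like object, so iterating yields lines.
        return [line.rstrip("\n") for line in stream]


print(read_lines_over_socket(b"first line\nsecond line\n"))
```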
If you mean file formats (like CSV), there are many! You can probably think of several yourself: besides CSV there are HTML files, pictures (PNG, JPG, GIF), archive formats (tar, zip), plain text files (.txt!) and Python files (.py). The list goes on.

There are many ways to read files.
Plain open takes a filename and returns a file object that you can iterate over as a sequence of lines. Or you can call read() on it, and it will read the whole file at once into one giant string.
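A small sketch of both styles; the example.txt file is created on the fly in a temporary directory purely so the snippet runs on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example.txt")
    with open(path, "w") as f:
        f.write("alpha\nbeta\ngamma\n")

    # 1) Iterate: the file object yields one line at a time (newline included).
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f]

    # 2) read(): the whole file comes back as one string.
    with open(path) as f:
        whole = f.read()

print(lines)        # ['alpha', 'beta', 'gamma']
print(repr(whole))  # 'alpha\nbeta\ngamma\n'
```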
codecs.open will take a filename and a character set, and decode each line to Unicode automatically. Or, again, you can just call read() on it, and it will read and decode the whole file at once into one giant Unicode string.
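In Python 3 the built-in open accepts an encoding argument directly, so codecs.open is rarely needed there; the sketch shows both on a small UTF-8 file whose name and contents are made up for the example.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cafe.txt")
    # Write raw UTF-8 bytes so the decoding step is visible.
    with open(path, "wb") as f:
        f.write("café\nnaïve\n".encode("utf-8"))

    # codecs.open decodes each line using the given character set.
    with codecs.open(path, "r", encoding="utf-8") as f:
        via_codecs = [line.rstrip("\n") for line in f]

    # Python 3's built-in open does the same when given encoding=.
    with open(path, "r", encoding="utf-8") as f:
        via_open = f.read()

print(via_codecs)  # ['café', 'naïve']
print(via_open)
```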
csv.reader will take a file or file-like object, and read it as a sequence of CSV rows. There's no direct equivalent of read()—but you can turn any sequence into a list by just calling list on it, so list(my_reader) will give you a list of rows (each of which is, itself, a list).
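A quick sketch of csv.reader plus list(); io.StringIO stands in for an opened CSV file, and the data is invented for the example. Note that the quoted field containing a comma is kept as one value.

```python
import csv
import io

# io.StringIO stands in for an open CSV file.
data = io.StringIO('name,price\nhonda,20\n"civic, type r",30\n')

reader = csv.reader(data)
rows = list(reader)  # consume every row at once; each row is a list

print(rows)
# [['name', 'price'], ['honda', '20'], ['civic, type r', '30']]
```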
zipfile.ZipFile will take a filename, or a file or file-like object, and read it as a ZIP archive. This doesn't go line by line, of course, but you can go archived file by archived file. Or you can do fancier things, like search for archived files by name.
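A sketch of going "archived file by archived file" and searching by name. To keep it self-contained, the archive is built in memory with io.BytesIO (a file-like object); the member names are illustrative.

```python
import io
import zipfile

# Build a small ZIP archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data/prices.csv", "honda,20\n")
    archive.writestr("notes.txt", "hello\n")

buffer.seek(0)  # rewind so the archive can be read back
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()  # every archived file, in order
    # Search for archived files by name, then read one of them.
    csv_names = [n for n in names if n.endswith(".csv")]
    contents = archive.read(csv_names[0]).decode("utf-8")

print(names)     # ['data/prices.csv', 'notes.txt']
print(contents)  # honda,20
```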
There are modules for reading JSON and XML documents, different ways of handling binary files, and so on. Some of them work differently—for example, you can search an XML document as a tree with one module, or go element by element with a different one.
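In the standard library these are json and xml.etree.ElementTree. The sketch below reads a JSON document with json.load, searches an XML document as a tree with ElementTree.parse, and walks the same XML element by element with ElementTree.iterparse; the documents are in-memory stand-ins for real files.

```python
import io
import json
import xml.etree.ElementTree as ET

# JSON: json.load reads a whole document from a file-like object.
record = json.load(io.StringIO('{"CarID": "c01", "Price": 20}'))

xml_bytes = b"<cars><car id='c01'/><car id='c02'/></cars>"

# XML as a tree: parse everything, then search it.
tree = ET.parse(io.BytesIO(xml_bytes))
ids_from_tree = [car.get("id") for car in tree.getroot().findall("car")]

# XML element by element: iterparse yields each element once it is complete.
ids_streamed = [
    elem.get("id")
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes))
    if elem.tag == "car"
]

print(record, ids_from_tree, ids_streamed)
```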
Python has a pretty extensive standard library, and you can find the documentation online. If a module seems like it should be able to work on files, it probably can.
And, beyond what comes in the standard library, PyPI, the Python Package Index has thousands of additional modules. Looking for a way to read YAML documents? Search PyPI for yaml and you'll find it.
Finally, Python makes it very easy to add things like this on your own. The skeleton of a function like csv.reader is as simple as this:
def reader(fileobj):
    for line in fileobj:
        # parse_one_csv_line is a placeholder for whatever per-line parsing you need
        yield parse_one_csv_line(line)
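As a sketch of how the skeleton could be filled in, here is a deliberately naive, hypothetical parse_one_csv_line that just splits on commas. It ignores quoting, which is exactly what the real csv module handles for you.

```python
import io


def parse_one_csv_line(line):
    # Hypothetical, naive parser: splits on commas and does NOT handle
    # quoted fields such as "civic, type r".
    return line.rstrip("\r\n").split(",")


def reader(fileobj):
    # Same skeleton as above: one parsed row per input line, produced lazily.
    for line in fileobj:
        yield parse_one_csv_line(line)


rows = list(reader(io.StringIO("honda,20\ntoyota,30\n")))
print(rows)  # [['honda', '20'], ['toyota', '30']]
```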
You can replace that parse_one_csv_line with anything you want, and you've got a custom reader. For example, here's an uppercase_reader:
def uppercase_reader(fileobj):
    for line in fileobj:
        yield line.upper()
In fact, you can even write the whole thing in one line:
shouts = (line.upper() for line in fileobj)
And the best thing is that csv.reader accepts any iterable that yields one string (line) at a time, not just real file objects, so you can pass uppercase_reader(fileobj) to csv.reader and it works just fine.
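A small check of that, with io.StringIO standing in for a real file and made-up data:

```python
import csv
import io


def uppercase_reader(fileobj):
    for line in fileobj:
        yield line.upper()


source = io.StringIO("honda,city\ntoyota,x\n")
# csv.reader only needs an iterable of strings, so the generator works
# as its input even though it is not a real file object.
rows = list(csv.reader(uppercase_reader(source)))
print(rows)  # [['HONDA', 'CITY'], ['TOYOTA', 'X']]
```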

Related

How to extract the full path from a file while using the "with" statement?

I'm trying, just for fun, to see whether I can extract the full path of my file while using the with statement (Python 3.8).
I have this simple code:
import os

with open('tmp.txt', 'r') as file:
    print(os.path.basename(file))
But I keep getting an error saying that the argument is not of a suitable type.
I've also tried relpath, abspath, and so on.
The error says that the input should be a string, but even after converting the file object to a string, I get something I can't use.
Perhaps there isn't a way to extract the full path name, but I think there is. I just haven't found it yet.
You could try:
import os
with open("tmp.txt", "r") as file_handle:
    print(os.path.abspath(file_handle.name))
The functions in os.path accept strings or path-like objects. You are passing in a file object instead. There are lots of reasons the types aren't interchangeable.
Since you opened the file for text reading, file is an instance of io.TextIOWrapper. This class is just an interface that provides text encoding and decoding for some underlying data. It is not associated with a path in general: the underlying stream can be a file on disk, but also a pipe, a network socket, or an in-memory buffer (like io.StringIO). None of the latter are associated with a path or filename in the way that you are thinking, even though you would interface with them as though they were normal file objects.
If your file object is ultimately backed by an io.FileIO, which is the case for anything returned by open() on a path, it will have a name attribute that keeps track of this information for you (the text wrapper exposes it too). Other sources of data will not. Since the example in your question opens a file on disk, you can do
with open('tmp.txt', 'r') as file:
    print(os.path.abspath(file.name))
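os.path.abspath then turns that name into a full path. It resolves a relative name against the current working directory, so the result is only reliable if you haven't changed directory since opening the file.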
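A small sketch of the difference, using a throwaway temporary file in place of tmp.txt: a file opened from disk carries the name it was opened with, while an in-memory io.StringIO has no name attribute at all.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tmp.txt")
    with open(path, "w") as file:
        file.write("hello\n")

    with open(path, "r") as file:
        # .name is whatever was passed to open(); abspath makes it absolute.
        same_name = file.name == path
        absolute = os.path.abspath(file.name)

# An in-memory stream is file-like too, but has no path and no .name.
buffer = io.StringIO("hello\n")
has_name = hasattr(buffer, "name")

print(same_name, absolute, has_name)
```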
That being said, since file objects don't generally care about file names, it is probably better to keep track of that information yourself, in case you one day decide to use something else as input. Python 3.8+ lets you do this without adding a line, using the walrus operator:
with open((filename := 'tmp.txt'), 'r') as file:
    print(os.path.abspath(filename))

Search for a word, and modify the whole line in Python text processing

This is my carDatabase.txt
CarID:c01 ModelName:honda VehicleType:city Price:20
CarID:c02 ModelName:honda VehicleType:x Price:30
I want to search for a CarID and modify only that whole line, without disturbing the others.
My current code is:
# Converting txt data into a string and modify
carsDatabaseFile = open('carsDatabase.txt', 'r')
allDataFromDatabase = [line.split(',') for line in carsDatabaseFile.readlines()]
Note:
Your question has a couple of issues: your sample from carDatabase.txt looks tab-delimited, but your current code splits each line on the ',' character. This also looks like a place where a list comprehension might be hurting you more than it is helping you. Break it up into a for loop if you're trying to add some logic to manipulate a single line.
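As a sketch of the for-loop approach using only built-ins: read all lines, replace the one whose CarID matches, and write everything back. update_car is a hypothetical helper, and it assumes whitespace-separated fields with CarID first, as in the sample data.

```python
import os
import tempfile


def update_car(path, car_id, new_line):
    """Replace the line whose first field is CarID:<car_id> (hypothetical helper)."""
    with open(path) as f:
        lines = f.readlines()

    found = False
    for index, line in enumerate(lines):  # a plain loop leaves room for per-line logic
        fields = line.split()  # assumes whitespace-separated fields
        if fields and fields[0] == "CarID:" + car_id:
            lines[index] = new_line.rstrip("\n") + "\n"
            found = True

    # Write every line back; only the matching one has changed.
    with open(path, "w") as f:
        f.writelines(lines)
    return found


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "carsDatabase.txt")
    with open(path, "w") as f:
        f.write("CarID:c01 ModelName:honda VehicleType:city Price:20\n"
                "CarID:c02 ModelName:honda VehicleType:x Price:30\n")
    changed = update_car(path, "c01", "CarID:c01 ModelName:honda VehicleType:suv Price:25")
    with open(path) as f:
        result = f.read()

print(result)
```

Because the whole file is rewritten, the new line may be longer or shorter than the old one.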
For working with CSV files, I would highly recommend using pandas (a third-party package) for general manipulation of comma-separated data, as well as a number of other formats.
That said, if you are truly restricted to built-in packages, or you are treating this as a learning exercise, and your goal is to directly modify just one line of that file, what you are looking for is the seek method. You can combine it with the tell method to find out where you are in the file.
Loop over the file to identify the line you are looking for. Use readline() in a while loop rather than for line in file: in Python 3 text mode, calling tell() while iterating with a for loop raises an error. Opening the file in binary mode ('r+b') also keeps the positions plain byte offsets.
Before reading each line, record the output of tell(); that is the position of the line you may want to change.
Using that position, move the file pointer back with the seek() method (positions are byte offsets, because a file is really stored as a one-dimensional sequence of bytes).
You can now use the write() method to overwrite the file at that location. This only works cleanly if the new line is exactly as long (in bytes) as the old one; a shorter replacement leaves leftover characters and a longer one overwrites the start of the next line, in which case you need to rewrite the rest of the file.
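The steps above can be sketched as follows. replace_line_in_place is a hypothetical helper; it assumes whitespace-separated fields with CarID first, and it refuses replacements whose byte length differs from the original line, since those would corrupt the neighbouring lines.

```python
import os
import tempfile


def replace_line_in_place(path, car_id, new_line):
    """Overwrite the line whose first field is CarID:<car_id> (hypothetical helper)."""
    target = ("CarID:" + car_id).encode("utf-8")
    replacement = new_line.encode("utf-8")
    # Binary mode: tell()/seek() work with plain byte offsets.
    with open(path, "r+b") as f:
        while True:
            offset = f.tell()  # where the next line starts
            raw = f.readline()
            if not raw:
                return False  # end of file, no matching CarID
            old = raw.rstrip(b"\r\n")
            fields = old.split()
            if fields and fields[0] == target:
                if len(replacement) != len(old):
                    raise ValueError("replacement must have the same byte length as the old line")
                f.seek(offset)        # jump back to the start of that line
                f.write(replacement)  # overwrite it in place
                return True


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "carsDatabase.txt")
    with open(path, "w") as f:
        f.write("CarID:c01 ModelName:honda VehicleType:city Price:20\n")
        f.write("CarID:c02 ModelName:honda VehicleType:x Price:30\n")
    replace_line_in_place(path, "c02", "CarID:c02 ModelName:honda VehicleType:y Price:35")
    with open(path) as f:
        updated = f.read()

print(updated)
```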

Multiple Scripts/Spiders writing to different CSV files. Will this code cause any problems?

I'm building some spiders to do some web scraping and am trying to figure out if my code is ok as written before I start building them out. The spiders will run via crontab at the same time, though they each write to a separate file.
with open(item['store_name']+'price_list2.csv', mode='a', newline='') as price_list2:
    savepriceurl2 = csv.writer(price_list2, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    savepriceurl2.writerow([item['url']]+item['price'])
I'm not sure how the 'open ... as price_list2' or 'savepriceurl2 = csv.writer' parts of the code work. Also, will the spiders get mixed up if they all use the same variable names, even for different CSV files, if they are all running at the same time?
From the minimal code posted it is difficult to say whether there will be an issue with two spiders running at once. Assuming the code you posted runs in each spider instance, each one will write to the file for whatever store it is scraping (determined by item['store_name']).
Regarding your questions about the code: open(...) as price_list2 returns an io.TextIOWrapper object, which is stored in the variable price_list2. You could achieve the same by writing price_list2 = open(...), but then you must close the file yourself, or you risk leaking the file handle and losing buffered data. Writing it as with open(...) as file: means you do not have to call file.close(); the file is closed automatically when the block ends, even if an exception is raised.
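Roughly speaking, the with statement is shorthand for a try/finally that always closes the file. The sketch writes to a temporary file named after the one in the question; the URLs and prices are placeholders.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "price_list2.csv")

# Manual version: you are responsible for closing the file,
# even if something goes wrong while writing.
price_list2 = open(path, mode="a", newline="")
try:
    price_list2.write("https://example.com,20\n")
finally:
    price_list2.close()

# with-statement version: the file is closed automatically when the
# block ends, whether it ends normally or with an exception.
with open(path, mode="a", newline="") as price_list2:
    price_list2.write("https://example.com,30\n")

print(price_list2.closed)  # True
```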
The other line you asked about, savepriceurl2 = csv.writer(...), creates an object that simplifies writing to the actual file. You can then call its writerow() method to write a row to the file. More information can be found in the csv module documentation.
So basically what your code is doing is this:
Open an object that represents a file. You have also specified that you will append to the file if it exists (because the mode is 'a'); otherwise it is created.
Create a csv writer instance that writes to the file object price_list2 with the delimiter ',' (plus some other options; check the csv documentation for details)
Tell the csv writer to write one row to the file: item['url'] followed by the elements of item['price'] (which must therefore be a list)
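To see exactly what ends up in the file, the sketch writes one row to an in-memory buffer instead of price_list2. The item dictionary is a made-up example; item['price'] has to be a list for the + concatenation to work.

```python
import csv
import io

# Hypothetical scraped item.
item = {"url": "https://example.com/p/1", "price": ["19.99", "USD"]}

buffer = io.StringIO()  # stands in for the opened price_list2 file
savepriceurl2 = csv.writer(buffer, delimiter=",", quotechar='"',
                           quoting=csv.QUOTE_MINIMAL)
savepriceurl2.writerow([item["url"]] + item["price"])

print(repr(buffer.getvalue()))  # 'https://example.com/p/1,19.99,USD\r\n'
```

The \r\n is csv.writer's default line terminator, which is why the real file is opened with newline=''.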
For your last question: given there is no information on your actual design and setup, I am assuming that each spider is an instance of the class that holds this code. As long as each spider scrapes a different site (so no two spiders ever have the same item['store_name']), they will be writing to different files, and that should be fine (I'm not aware of any issues with writing to two different files at the same time in Python). If this is not the case, you will run into issues if your spiders try to write to the same file at the same time.
As a tip, googling a function will often get you its description faster than posting here, and with a lot more information.
I hope this helps and clarifies things for you.

Python's csv.writerow() is acting a tad funky

From what I've researched, the writerow() method of a csv writer should take in a list and write it as a row to the given CSV file. Here's what I tried:
from csv import writer
# Python 3: open CSV files in text mode with newline='' ('wb' was the Python 2 idiom)
with open('Test.csv', 'w', newline='') as file:
    csvFile, count = writer(file), 0
    titles = ["Hello", "World", "My", "Name", "Is", "Simon"]
    csvFile.writerow(titles)
I'm just trying to write it so that each word is in a different column.
When I open the file that it creates, however, I get the following message:
After choosing to continue anyway, I get a message saying that the file is either corrupted or is a SYLK file. I can then open the file, but only after clicking through two error messages every time I open it.
Why is this?
Thanks!
It's a documented issue that Excel will assume a CSV file is a SYLK file if its first two characters are 'ID'.
Venturing into the realm of opinion: it shouldn't, but Excel thinks it knows better than the extension. To be fair, people expect it to figure out cases where the extension really is wrong, but in a case like this, assuming the extension is wrong, and then further assuming the file is corrupt when it doesn't look corrupt if interpreted according to the extension, is just mind-boggling.
John Y points out:
One thing to watch out for: the "workaround" given by the Microsoft issue linked to by Peter DeGlopper is to (manually) prepend an apostrophe to the file. (This is also advice commonly found on the Web, including Stack Overflow, for forcing CSV digits to be treated as strings rather than numbers.) This is not what I'd call good advice, as it injects a literal apostrophe into your data.
DSM suggests using quoting=csv.QUOTE_NONNUMERIC on the writer. Excel is not confused by a file beginning with "ID" rather than ID, so if the other tools that will work with the CSV accept that quoting level, this is probably the best solution short of simply ignoring Excel's confusion.
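To see what that suggestion produces, the sketch writes one row into an in-memory buffer (io.StringIO standing in for the real file, which you would open with newline=''). The header names are illustrative. One caveat worth checking for your other tools: a csv.reader configured with the same QUOTE_NONNUMERIC setting converts every unquoted field to float when reading.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC)
# Strings are quoted, numbers are not, so the output starts with "ID" not ID.
writer.writerow(["ID", "Name", 42])

print(repr(buffer.getvalue()))  # '"ID","Name",42\r\n'
```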

python: pass string instead of file as function parameter

I am a beginner in Python, and I need to use a third-party function that has only one input: the name of a file on the hard drive. The function parses the file and then processes it.
I am generating the file contents in my code (it's a CSV file that I generate from a list) and want to skip creating an actual file. Is there any way I can achieve this and "hack" the third-party function into accepting my string without creating a file?
After some googling I found StringIO and created a file object with it, but now I am stuck on passing this object to the function (again, it accepts a file name, not a file object).
It looks like you'll need to write your data to a file then pass the name of that file to the 3rd party library. You might want to consider using the tempfile module to create the file in a safe and easy way.
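A minimal sketch of the tempfile approach, assuming the library function only needs a path it can open; third_party_parse below is a hypothetical stand-in for the real library call, and the rows are made-up data.

```python
import csv
import os
import tempfile


def third_party_parse(filename):
    # Hypothetical stand-in for the library function that only accepts
    # a file name; here it just reads the file back as CSV rows.
    with open(filename, newline="") as f:
        return list(csv.reader(f))


rows = [["name", "price"], ["honda", "20"]]

# delete=False lets the library reopen the file by name after we close it
# (on Windows, a NamedTemporaryFile created with delete=True cannot be
# reopened by name while it is still open).
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    csv.writer(tmp).writerows(rows)
    temp_name = tmp.name

try:
    result = third_party_parse(temp_name)
finally:
    os.remove(temp_name)  # clean up the temporary file ourselves

print(result)  # [['name', 'price'], ['honda', '20']]
```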
If it requires a filename, then you're going to have to create a file. (And that's poor design on the part of the library creators.)
You should look into the Python docs for I/O, here:
http://docs.python.org/tutorial/inputoutput.html
Python processes files by opening them; no extra file is "created". The open file object has a few methods you can use to produce the output you want, although I'm not entirely sure I understand your wording. As I understand it, you want to open a file, do some things with its contents, and then produce a string of some kind, right? If that's correct, you're in luck, as it's pretty easy to do.
Comma-separated values read from a file are extremely easy to parse into Python-friendly formats such as lists, tuples and dictionaries.
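For the lists/tuples/dictionaries point, the standard csv module already covers all three shapes. The sample data and the io.StringIO buffer (standing in for an opened file) are made up for illustration.

```python
import csv
import io

data = io.StringIO("CarID,ModelName,Price\nc01,honda,20\nc02,honda,30\n")

as_lists = list(csv.reader(data))  # each row is a list
data.seek(0)
as_tuples = [tuple(row) for row in csv.reader(data)]  # each row as a tuple
data.seek(0)
as_dicts = list(csv.DictReader(data))  # one dict per row, keyed by the header

prices = {row["CarID"]: row["Price"] for row in as_dicts}
print(prices)  # {'c01': '20', 'c02': '30'}
```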
As you've said, you want a function that takes the name of a file, looks the file up, reads it and does some processing without creating extra files. To do that, your code would look like this:
def file_open(filename):
    new_dictionary = {}
    with open(filename, 'r') as f:  ## 'r' = read mode; the file is closed when the block ends
        for line in f:  ## iterate through each line of comma-separated values
            key, value = line.rstrip('\n').split(',', 1)  ## first field is the key, the rest is the value
            new_dictionary[key] = value  ## store the value under its key
    return new_dictionary  ## spit that newly assembled dictionary back to us
As you can see, no other file is created in this process. We just open the file on the hard drive, do some parsing to build our dictionary, and then return the dictionary for use. To keep the dictionary it outputs, just assign the function's return value to a variable. Make sure you pass the correct path to the file (an absolute path, or one relative to the current working directory).
CSV_dictionary = file_open(my_file) ##This sets CSV with all the info.
I hope this was helpful; if I've misunderstood your problem, just reply and I'll try to help you.
-Joseph
