How do I convert a .csv file to .db file using python? - python

I want to convert a csv file to a db (database) file using python. How should I do it ?

First, parse the CSV file: either use a library that does it for you, or read the file line by line and parse it with standard Python, which can be as simple as splitting each line on commas.
Then insert the rows into the SQLite database. The Python documentation covers SQLite. You could also use SQLAlchemy or another ORM.
Another way is to use the sqlite shell itself.

I don't think this can be done in full generality without out-of-band information or just treating everything as strings/text. That is, the information contained in the CSV file won't, in general, be sufficient to create a semantically “satisfying” solution. It might be good enough to infer what the types probably are for some cases, but it'll be far from bulletproof.
I would use Python's csv and sqlite3 modules, and try to:
convert the cells in the first CSV line into names for the SQL columns (strip “oddball” characters)
infer the types of the columns by going through the cells in the second CSV file line (first line of data), attempting to convert each one first to an int, if that fails, try a float, and if that fails too, fall back to strings
this would give you a list of names and a list of corresponding probable types, from which you can roll a CREATE TABLE statement and execute it
try to INSERT the first and subsequent data lines from the CSV file
There are many things to criticize in such an approach (e.g. no keys or indexes, fails if first line contains a field that is a string in general but just so happens to contain a value that's Python-convertible to an int or float in the first data line), but it'll probably work passably for the majority of CSV files.
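The steps above can be sketched roughly as follows. This is a minimal illustration, not a robust importer: the function names (infer_type, clean_name, csv_to_sqlite) and the default table name "data" are made up for this example, types are guessed from the first data row only, and duplicate column names or rows with the wrong number of cells will make it fail.

```python
import csv
import re
import sqlite3


def infer_type(cell):
    """Guess an SQLite column type from a single sample cell."""
    for caster, sql_type in ((int, "INTEGER"), (float, "REAL")):
        try:
            caster(cell)
            return sql_type
        except ValueError:
            pass
    return "TEXT"


def clean_name(raw, index):
    """Replace "oddball" characters; fall back to col_<n> if nothing usable is left."""
    name = re.sub(r"\W+", "_", raw.strip()).strip("_")
    if not name or name[0].isdigit():
        return f"col_{index}"
    return name


def csv_to_sqlite(csv_path, db_path, table="data"):
    """Create `table` in the database at db_path and load the CSV into it."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        names = [clean_name(cell, i) for i, cell in enumerate(next(reader))]
        rows = list(reader)
    if not rows:
        raise ValueError("CSV file has a header but no data rows")

    # Types are inferred from the first data row only (an assumption that can be wrong).
    types = [infer_type(cell) for cell in rows[0]]
    columns = ", ".join(f'"{n}" {t}' for n, t in zip(names, types))
    placeholders = ", ".join("?" for _ in names)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', rows
            )
    finally:
        conn.close()
```

The cells are inserted as strings; SQLite's column type affinity converts them to numbers where the declared column type is INTEGER or REAL and the text looks numeric.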

Related

How to import variables from another file AFTER the program has started in python?

I want to create a program that makes the user select a file, and that file will contain a series of variables (like dictionaries and lists) that the program will use. Now, I know there's the import file function but I can't use it because that's supposed to be at the beginning of the program (the user has to select which program wants to load first). The only step that's left in my program is the one that loads the variables, I know how to do the rest. Could anyone help me? Thank you in advance!
You could just use
with open(file, "r") as f:
    data = f.read().splitlines()
to read the data, assuming every variable is on its own line. This reads the whole file into a list called data, with each line of the file as one entry.
Note that this will read all variables in as strings, so you may have to cast to different types before actually using the values.
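One possible way to do that conversion, sketched under a strong assumption about the file layout: each line looks like `name = <Python literal>`. That layout and the function name load_variables are inventions for this example, not something the question specifies. ast.literal_eval from the standard library parses literals such as lists, dictionaries, numbers and strings without executing arbitrary code.

```python
import ast


def load_variables(path):
    """Read lines of the form 'name = <literal>' into a dict of real Python values."""
    variables = {}
    with open(path, "r") as f:
        for line in f.read().splitlines():
            if "=" not in line:
                continue  # skip blank or malformed lines
            name, _, raw_value = line.partition("=")
            # literal_eval only accepts literals, so it will not run code from the file.
            variables[name.strip()] = ast.literal_eval(raw_value.strip())
    return variables
```

The program can then look up each loaded value by name, for example load_variables(chosen_file)["numbers"], where chosen_file is whatever path the user selected.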

Search for a word, and modify the whole line in Python text processing

This is my carDatabase.txt
CarID:c01 ModelName:honda VehicleType:city Price:20
CarID:c02 ModelName:honda VehicleType:x Price:30
I want to search for the CarID and modify only that whole line, without disturbing the others.
my current code is here:
# Converting txt data into a string and modify
carsDatabaseFile = open('carsDatabase.txt', 'r')
allDataFromDatabase = [line.split(',') for line in carsDatabaseFile.readlines()]
Note:
Your question has a couple of issues: your sample from carDatabase.txt looks like it is tab-delimited, but your current code looks like it is splitting the line around the ',' character. This also looks like a place where a list comprehension might be hurting you more than it is helping you. Break that up into a for-loop if you're trying to add some logic to manipulate a single line.
For looking at CSV files, I would highly recommend using pandas for general manipulation of data in comma-separated as well as a number of other formats.
That said, if you are truly restricted to only using built-in packages, or you are looking at this as a learning exercise, and your goal is to directly manipulate just one line of that file, what you are looking for is the seek method. You can use this in combination with the tell method (documented just below seek in the above link) to find where you are in the file.
Write a for loop to identify which line in the file you are looking for
From there, you can get the output of tell() to find the specific place in the file you are trying to manipulate
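Using the output from the above two steps, you can set the file pointer to a specific location using the seek() method (the position is a byte offset: a file is really a one-dimensional sequence of bytes).
You can now use the write() method to directly update the file at the location you determined above. Note that write() overwrites the bytes already there rather than inserting new ones, so this works cleanly only when the new line is the same length as the old one.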
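The four steps above might look something like this. It is a sketch under stated assumptions: the function name overwrite_line is made up, fields are taken to be separated by whitespace with CarID first, and the replacement must be exactly as long as the line it replaces, because writing in place cannot grow or shrink the file. The file is opened in binary mode so that tell() and seek() work with plain byte offsets.

```python
def overwrite_line(path, car_id, new_line):
    """Overwrite, in place, the line whose first field is 'CarID:<car_id>'.

    Returns True if a line was replaced, False if no line matched.
    new_line must have the same byte length as the old line (without its newline).
    """
    key = f"CarID:{car_id}".encode()
    new_bytes = new_line.encode()
    with open(path, "r+b") as f:
        while True:
            start = f.tell()          # byte offset where this line begins
            line = f.readline()
            if not line:              # end of file, nothing matched
                return False
            if line.split()[:1] == [key]:
                old_body = line.rstrip(b"\r\n")
                if len(new_bytes) != len(old_body):
                    raise ValueError("replacement must be the same length")
                f.seek(start)         # jump back to the start of the line
                f.write(new_bytes)    # overwrite it, leaving the newline intact
                return True
```

If the new line can be longer or shorter, the usual alternative is to read all lines, change the one you want, and write the whole file back out.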

Removing weird line breaks exported from Oracle TOAD

So I'm trying to remove weird line breaks in order to read a LONG datatype field into a single field on Excel. Length of field does not matter as long as we get all the info into a single field.
After exporting the dataset from TOAD into a .txt flat file, if I open the file in Notepad, the rows are generated perfectly. However, when I open the file in Excel, weird line breaks are inserted and generate bad rows. These line breaks originate from the LONG datatype's line breaks, but I can't figure out how to remove them so that I can view the good format in Excel.
I considered loading the .txt file in Python and do a "for line in file.readline" then a "line.replace("\n","")" for all the lines, but I'm not sure if the actual character is a "\n", and whether Python would read the bad line breaks like Excel as well.
Anyways, it's not a huge issue, but wanted to see if there was a quick or interesting fix out there. I could always do my analysis on the .txt file.
If those line breaks are CHR(10) and/or CHR(13), you could replace them with an empty string in SELECT, e.g.
select replace(replace(col, chr(10), ''), chr(13), '') as result
from some_table

File Reading Options Enquiry (Python)

I am a programming student for the semester. In class we have been learning about file opening, reading and writing.
We have used a_reader to achieve such tasks for file opening. I have been reading our associated text/s and I have noticed that there is a CSV reader option which I have been using.
I wanted to know if there are any more possible ways to open/read a file, as I am trying to grow my knowledge base in Python and its associated contents.
EDIT:
I was referring to CSV more specifically as that is the type of files we use at the moment. We have learnt about CSV Reader and a_reader and an example from one of our lectures is shown below.
def main():
    a_reader = open('IDCJAC0016_009225_1800_Data.csv', 'rU')
    file_data = a_reader.read()
    a_reader.close()
    print file_data
main()
It may seem overly broad but I have no knowledge which is why I am asking is there more than just the 2 ways above. If there is can someone who knows provide the types so I can read up on and research on them.
If you're asking about places to store things, the first interfaces you'll meet are files and sockets (pretend a network connection is like a file, see http://docs.python.org/2/library/socket.html).
If you mean file formats (like csv), there are many! Probably you can think of many yourself, but besides csv there are html files, pictures (png, jpg, gif), archive formats (tar, zip), text files (.txt!), python files (.py). The list goes on.
There are many ways to read files, depending on what kind of data they hold.
Just plain open will take a filename and open it as a sequence of lines. Or, you can just call read() on it, and it will read the whole file at once into one giant string.
codecs.open will take a filename and a character set, and decode each line to Unicode automatically. Or, again, you can just call read() on it, and it will read and decode the whole file at once into one giant Unicode string.
csv.reader will take a file or file-like object, and read it as a sequence of CSV rows. There's no direct equivalent of read()—but you can turn any sequence into a list by just calling list on it, so list(my_reader) will give you a list of rows (each of which is, itself, a list).
zipfile.ZipFile will take a filename, or a file or file-like object, and read it as a ZIP archive. This doesn't go line by line, of course, but you can go archived file by archived file. Or you can do fancier things, like search for archived files by name.
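A small sketch of three of these options side by side. The function name demo_readers and the file names sample.csv and sample.zip are made up for the example; it writes its own tiny sample file into the directory you give it, so it does not depend on any existing data. (In Python 3, plain open with an encoding argument covers what codecs.open is described as doing above.)

```python
import csv
import os
import zipfile


def demo_readers(directory):
    """Write a tiny CSV into `directory`, then read it back three different ways."""
    csv_path = os.path.join(directory, "sample.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("a,b\n1,2\n")

    # Plain open + read(): the whole file as one string.
    with open(csv_path, encoding="utf-8") as f:
        whole_text = f.read()

    # csv.reader: a sequence of rows, turned into a list of lists.
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))

    # zipfile.ZipFile: pack the CSV into an archive, then read it back by name.
    zip_path = os.path.join(directory, "sample.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(csv_path, arcname="sample.csv")
    with zipfile.ZipFile(zip_path) as zf:
        archived_names = zf.namelist()
        archived_text = zf.read("sample.csv").decode("utf-8")

    return whole_text, rows, archived_names, archived_text
```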
There are modules for reading JSON and XML documents, different ways of handling binary files, and so on. Some of them work differently—for example, you can search an XML document as a tree with one module, or go element by element with a different one.
Python has a pretty extensive standard library, and you can find the documentation online. Every module that seems like it should be able to work on files, probably can.
And, beyond what comes in the standard library, PyPI, the Python Package Index has thousands of additional modules. Looking for a way to read YAML documents? Search PyPI for yaml and you'll find it.
Finally, Python makes it very easy to add things like this on your own. The skeleton of a function like csv.reader is as simple as this:
def reader(fileobj):
    for line in fileobj:
        yield parse_one_csv_line(line)
You can replace that parse_one_csv_line with anything you want, and you've got a custom reader. For example, here's an uppercase_reader:
def uppercase_reader(fileobj):
    for line in fileobj:
        yield line.upper()
In fact, you can even write the whole thing in one line:
shouts = (line.upper() for line in fileobj)
And the best thing is that, as long as your reader only yields one line at a time, it is itself an iterable of lines, which is all csv.reader needs, so you can pass uppercase_reader(fileobj) to csv.reader and it works just fine.
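To see that chaining in action, here is a short self-contained sketch. It uses io.StringIO as a stand-in for a real open file, and the sample data is invented for the example.

```python
import csv
import io


def uppercase_reader(fileobj):
    """Yield each line of fileobj converted to upper case."""
    for line in fileobj:
        yield line.upper()


# io.StringIO behaves like an open text file; a real file object works the same way.
source = io.StringIO("name,city\nalice,paris\n")

# csv.reader accepts any iterable that yields lines, including our generator.
rows = list(csv.reader(uppercase_reader(source)))
print(rows)
```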

python: pass string instead of file as function parameter

I am a beginner in Python, and I need to use a third-party function which basically has one input: the name of a file on the hard drive. This function parses the file and then processes it.
I am generating file contents in my code (it's CSV file which I generate from a list) and want to skip actual file creation. Is there any way I can achieve this and "hack" the thirdparty function to accept my string without creating a file?
After some googling I found StringIO, and created a file object in it, now I am stuck on passing this object to a function (again, it accepts not a file object but a file name).
It looks like you'll need to write your data to a file then pass the name of that file to the 3rd party library. You might want to consider using the tempfile module to create the file in a safe and easy way.
If it requires a filename, then you're going to have to create a file. (And that's poor design on the part of the library creators.)
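A sketch of the tempfile approach. Here third_party_parse is a hypothetical stand-in for the real library function, which the question does not name, and call_with_temp_csv is a helper invented for this example. The file is created inside a tempfile.TemporaryDirectory, which is removed automatically (along with the file) when the with block ends.

```python
import csv
import os
import tempfile


def third_party_parse(filename):
    """Hypothetical stand-in for the library function that only accepts a file name."""
    with open(filename, newline="") as f:
        return list(csv.reader(f))


def call_with_temp_csv(rows, func):
    """Write rows to a temporary CSV file, call func(path), then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "data.csv")
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        # The file exists only while func runs; it is deleted on exit.
        return func(path)


result = call_with_temp_csv([["a", "1"], ["b", "2"]], third_party_parse)
```

Writing into a temporary directory rather than reopening a NamedTemporaryFile by name avoids the platform differences around opening the same temporary file twice.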
You should look into the python docs for I/O, seen here:
http://docs.python.org/tutorial/inputoutput.html
Python processes files by opening them; no extra file is "created". The open file object then has a few methods you can use to produce the output you want, although I'm not entirely sure I understand your wording. As I understand it, you want to open a file, do some stuff with its contents and then create a string of some kind, right? If that's correct, you're in luck, as it's pretty easy to do that.
Comma-separated values read into Python from a file are extremely easy to parse into Python-friendly formats such as lists, tuples and dictionaries.
As you've said, you want a function that you input the name of a file, the file is looked up, read and some stuff is done without the creation of extra files. Alright, so to do that, your code would look like this:
def file_open(filename):
    new_dictionary = {}
    with open(filename, 'r') as f: ##The second param is the mode, here read-only; "with" closes the file for us
        for line in f: ##iterating through each line of comma-separated values
            key, value = line.rstrip('\n').split(',', 1) ##text before the first comma is the key, the rest is the value
            new_dictionary[key] = value ##set the new_dictionary key to value
    return new_dictionary ##spit that newly assembled dictionary back to us
As you can see, there is no other file being created in this process. We just open the file on the hard drive, do some parsing to create our dictionary, and then return the dictionary for use. To capture the dictionary the function returns, assign the function call to a variable. Just make sure you give the path to the file correctly, from the root of the hard drive.
CSV_dictionary = file_open(my_file) ##This sets CSV with all the info.
I hope this was helpful, if I'm not getting your problem, just answer and I'll try to help you.
-Joseph
