I would like to store CSV files in SQL Server. I've created a table with a column "myDoc" of type varbinary(max). I generate the CSVs on a server using Python/Django. I would like to insert the actual CSV (not the path) as a BLOB so that I can later retrieve the actual CSV file.
How do I do this? I haven't been able to make much headway with this documentation, as it mostly refers to .jpg files:
https://msdn.microsoft.com/en-us/library/a1904w6t(VS.80).aspx
Edit:
I wanted to add that I'm trying to avoid FILESTREAM. The CSVs are too small (about 5 KB) to justify it, and I don't need full-text search over them.
I'm not sure why you want varbinary over varchar, but it will work either way:
INSERT INTO YourTable (myDoc)
SELECT doc = BulkColumn
FROM OPENROWSET(BULK 'C:\Working\SomeXMLFile.csv', SINGLE_BLOB) AS x;
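If you'd rather push the bytes straight from the Python/Django side instead of having SQL Server read a file path, a minimal sketch with pyodbc could look like the following (the connection string, table and column names here are assumptions):

import pyodbc

# read the generated CSV as raw bytes (or use csv_text.encode('utf-8') if it's built in memory)
csv_bytes = open('report.csv', 'rb').read()

conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret'
)
cursor = conn.cursor()
cursor.execute("INSERT INTO YourTable (myDoc) VALUES (?)", csv_bytes)  # bytes map to varbinary(max)
conn.commit()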
I have a table to which I wrote 1.6 million records; it has two columns: an ID and a JSON string.
I want to select all of those records and write the JSON in each row to its own file. However, the query result is too large, and I get the 403 error associated with that:
"403 Response too large to return. Consider specifying a destination table in your job configuration."
I've been looking at the documentation below and understand that it recommends specifying a destination table for the results and viewing them there, BUT all I want to do is SELECT * from the table, so that would effectively just copy it over, and I feel like I would run into the same issue when querying that result table.
https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction
https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationQuery.FIELDS.allow_large_results
What is the best practice here? Pagination? Table sampling? list_rows?
I'm using the Python client library, as stated in the question title. My current code is just this:
query = f'SELECT * FROM `{project}.{dataset}.{table}`'
return client.query(query)
I should also mention that the IDs are not sequential; they're just alphanumeric strings.
The best practice and most efficient approach is to export your data and then download it, instead of querying the whole table (SELECT *).
From there, you can extract the data you need from the exported files (e.g. CSV, JSON, etc.) using Python code, without having to wait for a SELECT * query to finish.
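For example, with the same Python client library you can run an extract (export) job to Cloud Storage and then download the shards. This is only a sketch; the bucket name and destination path are placeholders:

from google.cloud import bigquery

client = bigquery.Client()
table_id = f'{project}.{dataset}.{table}'
# the wildcard lets BigQuery shard the export into multiple files
destination_uri = 'gs://your-bucket/export/rows-*.json'

job_config = bigquery.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON

extract_job = client.extract_table(table_id, destination_uri, job_config=job_config)
extract_job.result()  # wait for the export to complete

You can then pull the exported files down with gsutil or the google-cloud-storage client and split each line (one JSON record per line) into its own file.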
I am currently using SQLAlchemy to create a database, and I want to enable users to update the database through an API endpoint by uploading a CSV file if ever necessary.
The column names are the same as in my database. My API endpoints are working, and I can receive the CSV as a JSON-ified payload. I was wondering how I would update my database if, say, there's a change in the construction programme for a given date.
Thank you in advance (it's my first time posting; forgive me for the lack of detail and explanation).
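Purely as an illustration of one way this could work, an upsert-style endpoint with Flask-SQLAlchemy might look like the sketch below; the model name (Programme), its columns (date, activity) and the application imports are all hypothetical, not taken from the question:

import csv
import io

from flask import request
from myapp import app, db, Programme  # hypothetical application module

@app.route('/upload-csv', methods=['POST'])
def upload_csv():
    stream = io.StringIO(request.files['file'].stream.read().decode('utf-8'))
    for row in csv.DictReader(stream):
        # update the record for that date if it exists, otherwise insert it
        record = Programme.query.filter_by(date=row['date']).first()
        if record:
            record.activity = row['activity']
        else:
            db.session.add(Programme(**row))
    db.session.commit()
    return {'status': 'database updated'}, 200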
I have a text file (CSV) which acts as a database for my application formatted as follows:
ID(INT),NAME(STRING),AGE(INT)
1,John,23
2,Paul,34
3,Jack,12
Before you ask: I cannot get away from a CSV text file (it's imposed), but I can remove the first row (header), change it into another format, or move it into another file altogether (I added it to keep track of the schema).
When I start my application I want to read all the data into memory so I can then query it, change it, and so on. I need to extract:
- Schema (column names and types)
- Data
I am trying to figure out what would be the best way to store this in memory using Python (very new to the language and its constructs) - any suggestions/recommendations?
Thanks,
If you use a Pandas DataFrame you can query it as if it were an SQL table, and you can read it directly from CSV and write it back out as well. I think this is the best option for you. It's fast and performant, and it builds on solid, proven technology.
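A minimal sketch of that approach, using the sample file from the question (the file name and dtypes are assumptions):

import pandas as pd

# the first line of the file is the schema row, so skip it and supply the names/types ourselves
df = pd.read_csv('people.csv', skiprows=1, names=['ID', 'NAME', 'AGE'],
                 dtype={'ID': int, 'NAME': str, 'AGE': int})

adults = df.query('AGE >= 18')            # SQL-like filtering
df.loc[df['NAME'] == 'Paul', 'AGE'] = 35  # in-place update
df.to_csv('people.csv', index=False)      # write it back out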
I'm looking for a way to access a CSV file's cells in a random fashion. If I use Python's csv module, I can only iterate through all the lines, which is rather slow. I should also add that the file is pretty large (>100 MB) and that I'm aiming for short response times.
I could preprocess the file into a different data format for faster row/column access. Perhaps someone has done this before and can share some experiences.
Background:
I'd like to show an extract of the CSV on screen, served by a web server (depending on scroll position). Keeping the whole file in memory is not an option.
I have found SQLite good for this sort of thing. It is easy to set up and you can store the data locally, but you also get easier control over what you select than with CSV files, plus the facility to add indexes etc.
There is also a built in facility for loading csv files into a table: http://www.sqlite.org/cvstrac/wiki?p=ImportingFiles.
Let me know if you want any further details on the SQLite route i.e. how to create the table, load the data in or query it from Python.
SQLite instructions to load a .csv file into a table
To create a database file, just pass the required filename as an argument when opening SQLite. Navigate to the directory containing the CSV file from the command line (I am assuming here that you want the SQLite .db file to live in the same directory). If you are using Windows, add SQLite to your PATH environment variable if you haven't already (instructions here if you need them), then open SQLite with the name you want to give your database file as an argument, e.g.:
sqlite3 example.db
Check the database file has been created by entering:
.databases
Create a table to hold the data. I am using an example for a simple customer table here. If data types are inconsistent for any columns use text:
create table customers (ID integer, Title text, Forename text, Surname text, Postcode text, Addr_Line1 text, Addr_Line2 text, Town text, County text, Home_Phone text, Mobile text, Comments text);
Specify the separator to be used:
.separator ","
Issue the command to import the data; the syntax takes the form .import filename.ext table_name, e.g.:
.import cust.csv customers
Check that the data has loaded in:
select count(*) from customers;
Add an index for columns that you are likely to filter on (full syntax described here) e.g.:
create index cust_surname on customers(surname);
You should now have fast access to the data when filtering on any of the indexed columns. To leave SQLite use .exit, to get a list of other helpful non-SQL commands use .help.
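If you then want to hit the table from Python rather than the SQLite shell, a small sketch with the built-in sqlite3 module (reusing the example.db / customers names from above) would be:

import sqlite3

conn = sqlite3.connect('example.db')
cursor = conn.cursor()

# the cust_surname index created above makes this filter fast
cursor.execute("SELECT ID, Forename, Surname FROM customers WHERE Surname = ?", ('Smith',))
for row in cursor.fetchall():
    print(row)

conn.close()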
Python Alternative
Alternatively, if you want to stick with pure Python and pre-process the file, you could load the data into a dictionary. That would allow much faster access to the data, because the dictionary keys behave like an index, meaning you can get to the values associated with a key quickly without going through the records one by one. I would need further details of your input data, and of which fields the lookups would be based on, to say more about how to implement this.
However, unless you know in advance when the data will be required (so you can pre-process the file before the request for data arrives), you would still have the overhead of loading the file from disk into memory every time you run this. Depending on your exact usage, this may make the database solution more appropriate.
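Purely for illustration, and assuming (as noted above, this would need confirming against your data) that the first column of each row is a unique key, the dictionary approach might look like:

import csv

index = {}
with open('data.csv', newline='') as f:
    for row in csv.reader(f):
        index[row[0]] = row  # key -> full record

# later: constant-time lookup instead of scanning the whole file
record = index.get('12345')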
I want to write a python script that populates a database with some information. One of the columns in my table is a BLOB that I would like to save a file to for each entry.
How can I read the file (binary) and insert it into the DB using python? Likewise, how can I retrieve it and write that file back to some arbitrary location on the hard drive?
# assumes an open DB-API connection/cursor; the %s placeholder style depends on your driver
thedata = open('thefile', 'rb').read()
sql = "INSERT INTO sometable (theblobcolumn) VALUES (%s)"
cursor.execute(sql, (thedata,))
That code works as written only if your table has just the BLOB column and an INSERT is what you want to do, but you can easily tweak it to add more columns, use UPDATE instead of INSERT, or whatever else you need. I'm also assuming your file is binary rather than text; if my guesses are incorrect, it's easy to adjust the code accordingly.
To retrieve the BLOB data, you do some kind of SELECT via cursor.execute and then some kind of fetch from the cursor, exactly like you retrieve any other kind of data.
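A minimal sketch of that retrieval side, assuming the same open cursor and the single-BLOB-column table from the insert example above:

cursor.execute("SELECT theblobcolumn FROM sometable")
blobdata = cursor.fetchone()[0]

# write the bytes back out to an arbitrary location on disk
with open('thefile_copy', 'wb') as out:
    out.write(blobdata)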
You can insert and read BLOBs from a DB like every other column type. From the database API's view there is nothing special about BLOBs.