bsddb and reprepro (berkeley) database - python

I'm trying to read the database files created by reprepro. I don't have that much experience with bdb, so I might be confused here, but it looks like the database is layered in some way.
If I simply do btopen('path/to/packages.db', 'r'), I get the database object with contents like:
In [4]: packages.items()
Out[4]:
[('local-lenny|main|amd64', '\x00\x00\x00\x04'),
('local-lenny|main|i386', '\x00\x00\x00\x02'),
('local-lenny|main|powerpc', '\x00\x00\x00\x14'),
('local-lenny|main|source', '\x00\x00\x00\x06'),
('local-lenny|main|sparc', '\x00\x00\x00\x12')]
However, db4.6_dump shows:
VERSION=3
format=bytevalue
database=local-lenny|main|sparc
type=btree
db_pagesize=4096
HEADER=END
<loads of data>
The file itself is identified by file as: /var/packages/db/packages.db: Berkeley DB (Btree, version 9, native byte-order).
How do I get at those contents? If I understand it correctly, keys() gave me only the names of the actual databases. How do I get to the contents of those databases now?

And the answer seems to be that the "nice" version of the bsddb interface doesn't support multiple btree tables inside one file. You can open such a table explicitly via bsddb.db:
from bsddb import db  # "bsddb3" on newer Pythons

env = db.DBEnv()
env.open(None, db.DB_CREATE | db.DB_INIT_MPOOL)
internal_db = db.DB(env)
internal_db.open("the filename", "the internal db name", db.DB_BTREE, db.DB_RDONLY)
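For example, a minimal sketch that reads one of the internal databases (the sub-database name is taken from the earlier keys() listing; adjust the path and name to your setup):

from bsddb import db  # "bsddb3" on newer Pythons

env = db.DBEnv()
env.open(None, db.DB_CREATE | db.DB_INIT_MPOOL)

# Open one named sub-database inside the file, read-only
packages = db.DB(env)
packages.open('path/to/packages.db', 'local-lenny|main|i386', db.DB_BTREE, db.DB_RDONLY)

# The DB object supports dict-like access, so items() iterates the records
for name, data in packages.items():
    print('%s: %r' % (name, data))

packages.close()
env.close()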

Related

How can I upload a .csv file with python to Cassandra?

I want to import a CSV file into Cassandra using a Python script, but I don't know how.
If you're looking for a simple solution, you could always use cqlsh's COPY utility.
> COPY myTable (col1, col2, col3, col4) FROM 'temp.csv' WITH HEADER=true;
I'd go with either COPY or DSBulk before building something new in Python. In fact, cqlsh uses the Python driver and is already built to handle things like paging, batch sizes, timeouts, etc.
Documentation: COPY FROM
Edit 20210903
If you're set on querying w/ CQL and processing a result set in Python, you'll want to do something like this...
The import section will look something like this:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import SimpleStatement
First establish your connection:
auth_provider = PlainTextAuthProvider(username=username, password=password)
cluster = Cluster(nodes, auth_provider=auth_provider)
session = cluster.connect()
Then build your query as a SimpleStatement.
strCQL = f"SELECT * FROM {keyspace}.{table}"
print(strCQL)
statement = SimpleStatement(strCQL,fetch_size=100)
rows = session.execute(statement)
for row in rows:
print(row)
Note that you can also print individual column values by their ordinal index on row (row[0], row[1], etc.).
In the above example, I'm setting the fetch size to 100. It defaults to 5000, but if the result set is large, you'll want that to be smaller to avoid timeouts.
Link to my Git repo for reference.
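If the goal is to go the other direction and load the CSV yourself with the driver, a minimal sketch using a prepared statement might look like this (the column names are placeholders, and real code would convert each value to the right type before inserting):

import csv

insert_cql = f"INSERT INTO {keyspace}.{table} (col1, col2, col3, col4) VALUES (?, ?, ?, ?)"
prepared = session.prepare(insert_cql)

with open('temp.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    for line in reader:
        session.execute(prepared, line)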
You can use the DataStax Bulk Loader tool (DSBulk) to bulk load data in CSV format to a Cassandra table.
Here are some references with examples to help you get started quickly:
Blog - DSBulk Intro + Loading data
Blog - More DSBulk Loading examples
Blog - Counting records with DSBulk
Docs - Loading data examples
Answered questions - DS Community
DSBulk is open-source so it's free to use. Cheers!

Reading sas7bdat files with Python's adodbapi

I'm trying to read a sas7bdat file from SAS (product of the SAS Institute) into Python.
Yes, I'm aware that we could export to *.csv files, but I'm trying to avoid that, as it would double the number of files we need to create.
There's good documentation for doing this in Visual Basic. Still, I want it in Python. For example, in VB you could write...
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset
cn.Provider = "sas.LocalProvider"
cn.Properties("Data Source") = "c:\MySasData"
cn.Open
rs.Open "work.a", cn, adOpenStatic, adLockReadOnly, adCmdTableDirect
to open your dataset.
But I can't crack the nut to make this work in python.
I can type...
import adodbapi
cnstr = 'Provider=sas.LocalProvider;c:\\MySasData'
cn = adodbapi.connect(cnstr)
And I can get a cursor...
cur = cn.cursor()
But beyond that, I'm stumped. I did find a cur.rs, which sounds like a recordset, but it is an object with a type of None.
Also, to preempt some alternative methods...
I do not want to create *.csv files in SAS.
The computer with Python does not have SAS installed, but does have the Providers for OLE DB installed. I know for a fact that the VB code I provided works without SAS in read-only mode. You can download these drivers here: http://support.sas.com/downloads/browse.htm?cat=64
I am not an expert in SAS. Honestly, I find their tool cumbersome, confusingly documented, and slow. I noticed that there are some other products listed called "IOMProvider" and "SAS/SHARE". If there's an easier way of doing this using those ADO providers, feel free to document it. However, what I'm really looking for is a way of doing this entirely within Python with a relatively simple bit of code.
Oh, and I'm aware of Python's sas7bdat package, but we're using Python 3.3.5 and it doesn't seem to be compatible. Also, I couldn't figure out how to use it on 2.7 anyway, as there's not a lot of documentation, and even a question on how to use the tool remains unanswered to this day: Python sas7bdat module usage
Thanks!
I didn't test it with SAS as I don't have the provider installed currently, but it should go like this:
cn = adodbapi.connect(cnstr)

# print table names in current db
for table in cn.get_table_names():
    print(table)

with cn.cursor() as c:
    # run an SQL statement on the cursor
    sql = 'select * from your_table'
    c.execute(sql)

    # get the results
    db = c.fetchmany(5)

    # print them
    for rec in db:
        print(rec)

cn.close()
EDIT:
Just found this: http://support.sas.com/kb/30/795.html, so you might need to use another provider for this method; have a look at the IOM provider (https://www.connectionstrings.com/sas-iom-provider/ , http://support.sas.com/documentation/tools/oledb/gs_iom_tasks.htm).

Database not consistent when accessed from different source files

I have a program in Python composed of 4 source files. One of them is the main file, which imports the other 3. I work with a small SQLite database, and I create and populate tables in one of the "secondary" source files, but when I access the database again from the main source file, the tables I just populated are empty.
Can I save the tables' content in a more consistent way? I am quite surprised by what is happening.
So in the main file I typed:
conn = sqlite3.connect("bayes.db")
cur = conn.cursor()
cur.execute("select count(*) from TableA")
print cur.fetchone()
The result is 0 (rows).
Just before that, in another source file, I do the same thing and get a count of 8 for TableA.
You must call the commit function in order to save your changes in the database. You can see the full documentation here: http://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.commit
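For example, in the secondary source file the write would need to look something like this (a minimal sketch; the table layout here is just a placeholder):

import sqlite3

conn = sqlite3.connect("bayes.db")
cur = conn.cursor()
cur.execute("create table if not exists TableA (word text, occurrences integer)")
cur.execute("insert into TableA values (?, ?)", ("spam", 1))
conn.commit()  # without this, the changes are rolled back when the connection closes
conn.close()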

How to convert, sort and save to CSV MS Access database .mdb file in Python

I tried researching the answer but was not able to find a good solution. I have files with a strange extension, .res. I was told that they are MS Access files. I'm not sure if they are the same as .mdb, but I was able to open them in MS Access. How can I open those files, extract the necessary data, sort that data and produce a .csv file? I tried using this script: http://mazamascience.com/WorkingWithData/?p=168 and mdb tools on Linux. I got some output with errors in the terminal, but all the files produced were blank. It could be due to encoding; I am not sure. The file is in ASCII encoding, I think.
Error: Table fo_Table
Smart_Battery_Data_Table
MCell_Aci_Data_Table
Aux_Global_Data_Table
Smart_Battery_Clock_Stretch_Table
does not exist in this database.
On Windows I have no idea how to do it. My first step for now is just to dump the necessary table from that database file into a .csv file. But ideally I need the script to take the file, sort it, extract the necessary data, do some calculations (like data in one column divided by data in another column) and save all that stuff into a nice .csv.
Thanks a lot. I am not an experienced programmer so please have mercy.
Using the generic pyodbc library should do it. It looks like it already has an embedded MS Access driver. This question can probably help you out.
I don't have any MS Access database files with me (it has been ages since I've had to work with them), but following the examples, your code should be something like this:
import pyodbc

db_file = r'''/path/to/the/file.res'''
user = 'admin'
password = 'password'
odbc_conn_str = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=%s;UID=%s;PWD=%s' % (db_file, user, password)

conn = pyodbc.connect(odbc_conn_str)
cursor = conn.cursor()
cursor.execute("select * from table order by some_column")

for row in cursor.fetchall():
    print ", ".join((row.column1, row.column2, row.columnN))

Reading DBF files with pyodbc

In a project, I need to extract data from a Visual FoxPro database, which is stored in dbf files. I have a data directory with 539 files I need to take into account; each file represents a database table. I've been doing some testing and my code goes like this:
import pyodbc

connection = pyodbc.connect("Driver={Microsoft Visual FoxPro Driver};SourceType=DBF;SourceDB=P:\\Data;Exclusive=No;Collate=Machine;NULL=No;DELETED=Yes")
tables = connection.cursor().tables()

for _ in tables:
    print _
This prints only 15 tables, with no obvious pattern, and always the same 15 tables. I thought this was because the rest of the tables were empty, but I checked and some of the tables (dbf files) on the list are empty too. Then I thought it was a permission issue, but all the files have the same permission structure, so I don't know what's happening here.
Any light??
EDIT:
It is not truncating the output; the tables it lists are not the first 15 or anything like that.
I DID IT!!!!
There were several problems with what I was doing, so here is what I did to solve it (after implementing it the first time with Ethan Furman's solution).
The first thing was a driver problem: it turns out that the Windows DBF drivers are 32-bit programs running on a 64-bit operating system. I had installed Python-amd64, and that was the first problem, so I installed a 32-bit Python.
The second issue was a library/file issue: according to this, dbf files in VFP > 7 are different, so my pyodbc library won't read them correctly. I tried some OLE-DB libraries with no success and decided to do it from scratch.
Googling for a while took me to this post, which finally shed some light on this.
Basically, what I did was the following:
import win32com.client

conn = win32com.client.Dispatch('ADODB.Connection')
db = 'C:\\Profit\\profit_a\\ARMM'
dsn = 'Provider=VFPOLEDB.1;Data Source=%s' % db
conn.Open(dsn)

cmd = win32com.client.Dispatch('ADODB.Command')
cmd.ActiveConnection = conn
cmd.CommandText = "Select * from factura, reng_fac where factura.fact_num = reng_fac.fact_num AND factura.fact_num = 6099;"

rs, total = cmd.Execute()  # This returns a tuple: (<RecordSet>, number_of_records)

while total:
    for x in xrange(rs.Fields.Count):
        print '%s --> %s' % (rs.Fields.item(x).Name, rs.Fields.item(x).Value)
    rs.MoveNext()  # advance to the next record once all fields have been printed
    total = total - 1
And it gave me 20 records, which I checked with DBFCommander, and they were OK.
First, you need to install the pywin32 extensions (32-bit) and the Visual FoxPro OLE-DB Provider (only available for 32-bit), in my case for VFP 9.0.
Also, it's good to read the ADO documentation at the w3c website.
This worked for me. Thank you very much to those who replied.
I would use my own dbf package and the code would go something like this:
import dbf
from glob import glob

for dbf_file in glob(r'p:\data\*.dbf'):
    with dbf.Table(dbf_file) as table:
        for record in table:
            do_something_with(record)
A table is list-like, and iterating through it returns records. A record is list-, dict-, and obj-like, and iterating through it returns the values; besides iteration, individual fields can be accessed either by offset (record[0] for the first field), by field name using dict-like access (record['some_field']), or by field name using obj.attr-like access (record.some_field).
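For instance (a short sketch; some_field is a placeholder field name):

for record in table:
    print(record[0])             # first field, by offset
    print(record['some_field'])  # dict-like access by field name
    print(record.some_field)     # attribute-like access by field name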
If you just wanted to dump the contents of each dbf file into a csv file you could do:
for dbf_file in glob(r'p:\data\*.dbf'):
    with dbf.Table(dbf_file) as table:
        dbf.export(table, dbf_file)
I know this doesn't directly answer your question, but it might still help. I've had lots of issues using ODBC with VFP databases, and I've found it's often much easier to treat the VFP tables as free tables when possible.
Using Yusdi Santoso's dbf.py and glob, here's some code to open each table in a directory and run through each record.
import glob
import os
import dbf

os.chdir("P:\\data")

for file in glob.glob("*.dbf"):
    table = dbf.readDbf(file)
    for row in table:
        pass  # do stuff with each row
