How to insert PDF table data into a database - Python

I have extracted PDF table data using Camelot, but how do I put the table data into my database? Do I need to convert it to CSV first, or is there another way? Also, is there a way to choose specific tables other than hard-coding a table number? In the code below I have to specify the index of the table to extract.
import camelot

def tables_extract(file_name):
    filename_with_path = 'upload/media/pos/pdfs/{}'.format(file_name)
    tables = camelot.read_pdf(filename_with_path, pages="1-end")
    table = tables[2].df
Below is the table data in the PDF whose values I want to insert into my DB.
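Camelot already returns each table as a pandas DataFrame, so there is no need for a CSV step: a DataFrame can be written straight to a database. Below is a minimal sketch, assuming a SQLAlchemy engine and hypothetical table names (pos_table_0, pos_table_1, ...); it loops over every extracted table instead of hard-coding an index.
import camelot
from sqlalchemy import create_engine

def tables_to_db(file_name):
    filename_with_path = 'upload/media/pos/pdfs/{}'.format(file_name)
    tables = camelot.read_pdf(filename_with_path, pages="1-end")
    engine = create_engine("sqlite:///pos.db")  # assumed connection string
    for i, table in enumerate(tables):
        df = table.df
        # Camelot often puts the header in the first row; promote it.
        df.columns = df.iloc[0]
        df = df.iloc[1:]
        # Hypothetical table name; if_exists='replace' recreates it on rerun.
        df.to_sql('pos_table_{}'.format(i), engine,
                  if_exists='replace', index=False)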

Related

Finding and inserting into a table based on a string with SQLAlchemy

I have a function in my code that generates a bunch of tables on an API call. It looks somewhat like this:
def create_tables():
    rows = connection.execute(sqlcmd)
    for i, row in enumerate(rows):
        # Do some work here
        t = Table(f"data_{i}", metadata, *columns)
    metadata.create_all()
I need another function where I iterate over the tables created in the function above and dump records into each table from another API. Since I'm not using declarative mapping or models in SQLAlchemy, how do I identify these tables in my database and write data to a specific table?
You can use the reflection system:
meta.reflect(bind=someengine)
# Now all located tables are present within the MetaData object's
# dictionary of tables.
table1 = meta.tables['data_1']
table1.insert().values(...)
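Building on that, here is a minimal, self-contained sketch (the engine URL and the column name are assumptions): it reflects the database, picks out the data_* tables created by create_tables(), and executes an insert against each one.
from sqlalchemy import MetaData, create_engine

engine = create_engine("sqlite:///example.db")  # assumed connection string
meta = MetaData()
meta.reflect(bind=engine)

# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as conn:
    for name, table in meta.tables.items():
        if name.startswith("data_"):
            # The column name "value" is hypothetical; use your real columns.
            conn.execute(table.insert().values(value="example"))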

pyodbc - write a new column of data to an existing table in MS Access

I have an MS Access db I've connected to with (ignore the ... in the driver name; it's working):
driver = 'DRIVER={...'
con = pyodbc.connect(driver)
cursor = con.cursor()
I have a pandas dataframe that is exactly the same as a table in the db except for one additional column. Basically, I pulled the table with pyodbc, merged it with external Excel data to add the additional column, and now I want to push the data back to the MS Access table with the new column included. The column holding the new information is merged_df['Item'].
Trying things like the line below does not work; I've had a variety of errors.
cursor.execute("insert into ToolingData(Item) values (?)", merged_df['Item'])
con.commit()
How can I push the new column to the original table? Or could I just overwrite the entire table instead? Would that be easier, since merged_df is literally the same thing plus one new column?
If the target MS Access table does not already contain a field to house the data held within the additional column, you'll first need to execute an alter table statement to add the new field.
For example, the following will add a 255-character text field called item to the table ToolingData:
alter table ToolingData add column item text(255)
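Once the field exists, the values still have to be written into it row by row. Below is a minimal sketch, assuming ToolingData has a key column (hypothetically named ID here) and that merged_df carries matching values in an 'ID' column:
import pyodbc

con = pyodbc.connect(driver)  # same connection string as above
cursor = con.cursor()

# Add the new field (run once; this errors if the field already exists).
cursor.execute("alter table ToolingData add column item text(255)")

# Update each row, matching on the assumed key column.
# .values.tolist() converts numpy scalars to native Python types for pyodbc.
params = merged_df[['Item', 'ID']].values.tolist()
cursor.executemany("update ToolingData set item = ? where ID = ?", params)
con.commit()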

Python - link a database query to a pandas dataframe

I have a spreadsheet with seven thousand rows of user ids. I need to query a database table and return results matching the ids in the spreadsheet.
My current approach is to read the entire database table into a pandas data frame and then merge it with another data frame created from the spreadsheet. I'd prefer not to read the entire table into memory due to its size. Is there any way to do this without reading in the whole table? In Access or SAS, I could write a query that joins the locally created table (i.e. the one created from the spreadsheet) with the database table.
Current code that reads the entire table into memory:
# read spreadsheet
external_file = pd.read_excel("userlist.xlsx")
# query
qry = "select id,term_code,group_code from employee_table"
# read table from Oracle database
oracle_data = pd.read_sql(qry,connection)
# merge spreadsheet with oracle data
df = pd.merge(external_file,oracle_data,on=['id','term_code'])
I realize the following isn't possible, but I would like to be able to query the database like this, where "external_file" is a data frame created from my spreadsheet (or at least find an equivalent solution):
query = """
select a.id,
a.term_code,
a.group_code
from employee_table a
inner join external_file b on a.id = b.id and a.term_code=b.term_code
"""
I think you could use xlwings (https://www.xlwings.org) to create a function that reads the id column and builds the query you want.
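Alternatively, here is a sketch without xlwings, assuming the connection accepts named bind variables (as cx_Oracle does): read only the ids from the spreadsheet, then query the table in chunks of 1,000 ids (Oracle's limit on IN-list length), so the full table never has to be loaded.
import pandas as pd

external_file = pd.read_excel("userlist.xlsx")
ids = external_file['id'].dropna().unique().tolist()

frames = []
for start in range(0, len(ids), 1000):
    chunk = ids[start:start + 1000]
    # Build one named bind variable per id in the chunk.
    binds = ",".join(":id{}".format(i) for i in range(len(chunk)))
    qry = ("select id, term_code, group_code "
           "from employee_table where id in ({})".format(binds))
    params = {"id{}".format(i): v for i, v in enumerate(chunk)}
    frames.append(pd.read_sql(qry, connection, params=params))

oracle_data = pd.concat(frames, ignore_index=True)
df = pd.merge(external_file, oracle_data, on=['id', 'term_code'])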

Python script for comparing data from two tables in MySQL and storing the results in Excel

table 1
Table 1 holds the larger data set; the data is fetched with the condition below, and developers pass this data into table 2:
select * from INS where PROJ_NO >= '7000-00'
table 2
Table 2 then needs to be checked: it should not contain any data that falls outside the condition above. If it does, an error such as "extra row found" should be shown, and finally the comparison results should be stored in Excel:
select LOG_PROJ_NO from INNOTAS_PROJ_DTA_LOG(NOLOCK)
The check needs to be keyed on proj_no.
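A minimal sketch of that check, assuming pandas and working connections to both databases (conn1 and conn2 are hypothetical names):
import pandas as pd

table1 = pd.read_sql("select * from INS where PROJ_NO >= '7000-00'", conn1)
table2 = pd.read_sql("select LOG_PROJ_NO from INNOTAS_PROJ_DTA_LOG(NOLOCK)",
                     conn2)

# Rows in table 2 whose project number does not appear in table 1 are extra.
extra = table2[~table2['LOG_PROJ_NO'].isin(table1['PROJ_NO'])]
if not extra.empty:
    print('Error: {} extra rows found'.format(len(extra)))

# Store the comparison results in Excel.
extra.to_excel('comparison_results.xlsx', index=False)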

BigQuery insert job instead of streaming

I am currently using BigQuery's streaming option to load data into tables. However, tables that have a date partition set up do not show any partitions... I am aware that this is an effect of the streaming.
The Python code I use:
from google.cloud import bigquery

def stream_data(dataset_name, table_name, data):
    bigquery_client = bigquery.Client()
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(table_name)
    # Reload the table to get the schema.
    table.reload()
    rows = data
    errors = table.insert_data(rows)
    if not errors:
        print('Loaded 1 row into {}:{}'.format(dataset_name, table_name))
    else:
        print('Errors:')
        print(errors)
Will the date-partitioned tables eventually show their partitions, and if not, how can I create an insert job to achieve this?
Not sure what you mean by "partitions not being shown", but when you create a partitioned table you will only see a single table.
The only difference here is that you can query in this table for date partitions, like so:
SELECT *
FROM mydataset.partitioned_table
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-12-25')
  AND TIMESTAMP('2016-12-31');
As you can see in this example, partitioned tables have the meta column _PARTITIONTIME and that's what you use to select the partitions you are interested in.
For more info, see the BigQuery documentation on querying data in partitioned tables.
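As for the insert-job part of the question: a load job can write directly into a single date partition by appending a partition decorator ($YYYYMMDD) to the table name. Below is a minimal sketch using the current google-cloud-bigquery client (its API differs from the older client shown in the question; the bucket, file, and table names are assumptions):
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON)

# "$20161225" targets the 2016-12-25 partition of the table.
table_id = "mydataset.partitioned_table$20161225"
load_job = client.load_table_from_uri(
    "gs://mybucket/data.json", table_id, job_config=job_config)
load_job.result()  # Waits for the load job to finish.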
