Create a table with a variable number of columns in SQL - Python

I'm scraping a site.
There are several tables that represent the attributes of a single observation.
I'm not sure whether it's useful to put images in this post, since the text is in Korean.
I've inserted an explanatory image.
There are many tables, and I want to reshape them into a single table: one record with many fields.
But I've run into a problem.
A few of the tables have a variable number of columns.
I'd like to store this data in SQL.
As far as I know, an SQL table has a fixed number of fields.
Do you have a solution, or can you tell me what I should search for?
Here is the link. http://goodauction.land.naver.com/auction/ca_view.php?product_id=1698750&class1=5&ju_price1=&ju_price2=&bi_price1=&bi_price2=&num1=&num2=&lawsup=0&lesson=0&next_biddate1=&next_biddate2=&state=91&b_count1=0&b_count2=0&b_area1=&b_area2=&special=0&e_area1=&e_area2=&si=11&gu=0&dong=0&apt_no=0&order=&start=0&total_record_val=&detail_search=&detail_class=1&recieveCode=
The variables in the tables at that link indicate the winning bid, the number of floors in the apartment, the size of the area, the use of each floor, and so on.
Also, can you recommend any sites where I can learn to scrape tables whose cells span multiple rows and columns using Python?

If you have a table apartment, you need a table floor related to apartment: the varying attributes become rows in the related table rather than extra columns.
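A minimal sketch of that schema in Python with sqlite3 (the table and column names here are made up for illustration; adapt them to the attributes you actually scrape):

import sqlite3

conn = sqlite3.connect("auction.db")
cur = conn.cursor()

# One row per observation, holding the attributes that always appear.
cur.execute("""
CREATE TABLE IF NOT EXISTS apartment (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    winning_bid INTEGER,
    area REAL
)""")

# Attributes that vary in number become rows in a child table, linked
# back by a foreign key, instead of a variable number of columns.
cur.execute("""
CREATE TABLE IF NOT EXISTS floor (
    id INTEGER PRIMARY KEY,
    apartment_id INTEGER REFERENCES apartment(id),
    floor_number INTEGER,
    use TEXT
)""")

conn.commit()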

Related

How to select columns that build on each other in SQL?

How can I read entries in SQL where the column entries build on each other?
The first column contains the material number. Each material in turn consists of other materials, which have their own material numbers in the first column.
How can I find out, e.g. for material 123, the quantities of all of its sub-materials?
I have loaded the data from an Excel table into a pandas DataFrame and then loaded it into an SQL database.
Should I use loops in Python to iterate over the table, or is there a smarter way?
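If the data is already in SQLite (or any database that supports recursive CTEs), you don't need a Python loop; a recursive query can walk the hierarchy. A rough sketch, assuming a hypothetical table bom(material_number, sub_material_number, quantity):

import sqlite3

conn = sqlite3.connect("materials.db")

# WITH RECURSIVE starts from the direct components of material 123 and
# keeps joining down the hierarchy, multiplying quantities along the way.
query = """
WITH RECURSIVE tree(material, qty) AS (
    SELECT sub_material_number, quantity
    FROM bom
    WHERE material_number = ?
    UNION ALL
    SELECT b.sub_material_number, t.qty * b.quantity
    FROM bom b
    JOIN tree t ON b.material_number = t.material
)
SELECT material, SUM(qty) AS total_quantity
FROM tree
GROUP BY material
"""

for material, total in conn.execute(query, ("123",)):
    print(material, total)

With pandas alone you would end up writing the same traversal as a loop or repeated merges, so pushing it into SQL is usually the smarter way.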

Strategy for creating collapsible pivot tables from large data sets

I'm new to the community and I only recently started to use Python and more specifically Pandas.
For my data set, I would like the columns to be the dates. For each date I would like a customer list that then breaks down into more specific row elements. Everything would be rolled up by order number, i.e. a distinct count on the order number, because sometimes a client purchases more than one item. In Excel I create a pivot table and process it by distinct order. Then I sort each row element by the distinct count of the order number and collapse each row down until I just have the client name. If I click to expand a cell, I see each row element.
So my question: if I'm pulling these huge data sets in as a DataFrame, can I pull the xlsx in as an array? I know it will strip the values, so I would have to set the date as a datetime64 element. I've been trying to reshape the array so that the date becomes the columns with the rows I want, but so far I haven't had luck. I have tried pivot_table and groupby with some success, but I wasn't able to move the date to the columns.
Summary: overall, what I'm looking to know is whether I'm going down the wrong rabbit hole altogether. I'm looking to create a collapsible pivot table, with specific color parameters for the table as well, so that the output spreadsheet looks identical to the one I'm automating.
I really appreciate any help; as I said, I'm brand new to pandas, so direction is key. I also want to know whether I'm onto the "best" way of dealing with the export to Excel after I've imported and modified the spreadsheet. I get a single sheet of raw data kicked out in .xlsx form. Thanks again!
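A minimal pandas sketch of that reshape, assuming hypothetical column names Date, Client, Item and OrderNumber in the raw sheet:

import pandas as pd

df = pd.read_excel("raw_data.xlsx")        # the single sheet of raw data
df["Date"] = pd.to_datetime(df["Date"])    # make sure dates are datetime64

# Dates become the columns, clients/items become nested rows, and each
# cell is a distinct count of order numbers (Excel's "Distinct Count").
pivot = pd.pivot_table(
    df,
    index=["Client", "Item"],
    columns="Date",
    values="OrderNumber",
    aggfunc="nunique",
    fill_value=0,
)

pivot.to_excel("pivot_output.xlsx")

This produces the numbers; the collapsible outline and the color formatting are Excel-side features, which you would add with openpyxl or xlsxwriter after the export rather than in pandas itself.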

SQLite indexing groups of rows?

I have a large set of city, state pairs I'm loading into a SQLite table. I will be querying by city and will know the state. Suppose I want to look for a particular city that I know is in Texas. The following query is roughly O(n), notwithstanding the LIMIT, right?
SELECT * FROM cities WHERE state_abbr=? LIMIT 1
Is there some way of grouping the rows by state, or creating a secondary index or something, so that SQLite knows where to find the 'TX' rows and only searches within them? I've considered creating separate tables for each state -- and that's an option -- but I'm hoping I can just do something within this single table to make the queries more efficient.
In the tutorials I've read, the query doesn't change after creating a composite index. Is SQLite just using the index under the hood for the same query?
Why not just have a composite index?
create index cities_state_abbr_city on cities(state_abbr, city);
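The query itself doesn't change; SQLite's planner picks the index up automatically once it exists, and you can confirm that with EXPLAIN QUERY PLAN. A small sketch, assuming the columns are named state_abbr and city:

import sqlite3

conn = sqlite3.connect("cities.db")
conn.execute(
    "CREATE INDEX IF NOT EXISTS cities_state_abbr_city ON cities(state_abbr, city)"
)

# Same SELECT as before; because the leading index column (state_abbr) is
# constrained, SQLite only scans the 'TX' portion of the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM cities WHERE state_abbr=? AND city=? LIMIT 1",
    ("TX", "Austin"),
).fetchall()
print(plan)   # expect a SEARCH ... USING INDEX cities_state_abbr_city row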

Getting the Raw Data Out of an Excel Pivot Table in Python

I have a pivot table in Excel, and I want to read the raw data behind that table into Python. Is it possible to do this? I don't see anything about it in the documentation or on Stack Overflow.
Some examples of how to read the raw data that drives pivot tables would greatly assist with routine analytical tasks.
EDIT:
In this scenario there are no raw-data tabs. I want to know how to query the pivot table itself, get the raw data, and read it into Python.
First, recreate the raw data from the pivot table. The pivot table contains all the information needed to rebuild the raw data.
Make sure that none of the items in the pivot table fields are hidden -- clear all the filters and Slicers that have been applied.
The pivot table does not need to contain all the fields -- just make sure that there is at least one field in the Values area.
Show the grand totals for rows and columns. If the totals aren't visible, select a cell in the pivot table, and on the Ribbon, under PivotTable Tools, click the Analyze tab. In the Layout group, click Grand totals, then click On for Rows and Columns.
Double-click the grand total cell at the bottom right of the pivot table. This should create a new sheet with the related records from the original source data.
Then, you could read the raw data from the source.
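Once that drill-down sheet exists, it is an ordinary worksheet, so reading it into Python is straightforward; a sketch with hypothetical file and sheet names:

import pandas as pd

# The sheet created by double-clicking the grand total is a plain table.
raw = pd.read_excel("report.xlsx", sheet_name="Sheet1")
print(raw.head())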

Inserting a DataFrame into normalized tables

Given a database with normalized tables, in which most fields are foreign keys referring to other tables, how does one go about inserting a plain-and-simple DataFrame? The DataFrame, of course, holds the actual representations of the values, e.g. names, cities, item names, etc., while the tables themselves mostly expect a serial ID.
Does pandas have some functions that make such inserts easy, or should one perhaps turn to SQLAlchemy?
Thanks!
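Pandas' to_sql writes flat tables but won't resolve foreign keys for you, so a common pattern is to look up the serial IDs first and swap them in for the plain values before inserting. A rough sketch with SQLAlchemy, assuming a hypothetical city lookup table and a person target table:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost/mydb")

df = pd.DataFrame({"name": ["Alice", "Bob"], "city": ["Berlin", "Paris"]})

# Replace the plain city names with their serial IDs from the city table.
cities = pd.read_sql("SELECT id AS city_id, name AS city FROM city", engine)
df = df.merge(cities, on="city", how="left").drop(columns=["city"])

# The frame now matches the target schema and can simply be appended.
df.to_sql("person", engine, if_exists="append", index=False)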
