I want to insert data from a dictionary into a SQLite table using SQLAlchemy. The dictionary keys match the column names, and I want each value inserted into the column with the same name. This is my code:
# This is the class where I create a table from with SQLAlchemy, and I want to
# insert my data into.
# I didn't write the __init__ for simplicity.
class Sizecurve(Base):
    __tablename__ = 'sizecurve'

    XS = Column(String(5))
    S = Column(String(5))
    M = Column(String(5))
    L = Column(String(5))
    XL = Column(String(5))
    XXL = Column(String(5))

o = Mapping()  # This creates an object which is actually a dictionary
for eachitem in myitems:
    # Here I populate the dictionary with keys from another list.
    # This gives me a dictionary looking like this: o = {'S': None, 'M': None, 'L': None}
    o[eachitem] = None

for eachsize in mysizes:
    # Here I assign values to each key of the dictionary, if a value exists, otherwise None.
    # product_row is a class and size and stock are its attributes.
    if product_row.size in o:
        o[product_row.size] = product_row.stock

# I put the final object into a list.
simplelist.append(o)
Now I want to put the values from the dictionaries saved in simplelist into the right columns of the sizecurve table, but I am stuck and don't know how to do that. So, for example, I have an object like this:
o = {'S': 4, 'M': 2, 'L': 1}
and in that row I want to see value 4 in column S, value 2 in column M, and so on.
Yes, it's possible (though aren't you missing primary keys/foreign keys on this table?).
session.add(Sizecurve(**o))
session.commit()
That should insert the row.
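If you want to insert every dictionary collected in simplelist, a minimal sketch could look like this (assuming the session and the model above are already set up, and that each dictionary only uses keys that exist as columns):

# Sketch: build one Sizecurve row per size dictionary and insert them all.
session.add_all([Sizecurve(**sizes) for sizes in simplelist])
session.commit()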
http://docs.sqlalchemy.org/en/latest/core/tutorial.html#executing-multiple-statements
EDIT: On second read it seems like you are trying to insert all those values into one column? If so, I would make use of pickle.
https://docs.python.org/3.5/library/pickle.html
If performance is an issue (pickle is pretty fast, but if you're doing 10,000 reads per second it'll be the bottleneck), you should either redesign the table or use a database like PostgreSQL that supports JSON objects.
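If you do go the pickle route, here is a rough sketch; the SizecurveBlob model and its sizes column are hypothetical, not part of the model shown above:

import pickle

from sqlalchemy import Column, Integer, LargeBinary

# Hypothetical table with a single binary column holding the whole dictionary.
class SizecurveBlob(Base):
    __tablename__ = 'sizecurve_blob'
    id = Column(Integer, primary_key=True)
    sizes = Column(LargeBinary)

session.add(SizecurveBlob(sizes=pickle.dumps(o)))
session.commit()

# Reading it back:
row = session.query(SizecurveBlob).first()
restored = pickle.loads(row.sizes)  # e.g. {'S': 4, 'M': 2, 'L': 1}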
I have found this answer to a similar question, though it is about reading the data from a JSON file, so now I am working on understanding that code and converting my data to JSON so that I can insert the values in the right place.
Convert JSON to SQLite in Python - How to map json keys to database columns properly?
Hi, I'm trying to create an abstract method that takes an object of the database model I want to use and then assigns data to some specific columns inside it.
I get the names of the columns as a list of strings from another table.
models is my dictionary filled with my models as values and their table names as keys.
db = session.query(Model_tables).filter(Model_tables.ID == 4).all()

for table in db:
    x = str(table.table_name)
    if x in models:
        table = models.get(x)
        new = table(column_strings[0]=1, column_strings[1]=2)
        session.add(new)
        session.commit()
Is there a way of doing it like this?
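A keyword argument name can't be an expression like column_strings[0], so table(column_strings[0]=1, ...) is a SyntaxError. As a rough sketch (assuming the strings in column_strings match attribute names on the model classes), you can build a dictionary and unpack it instead:

# Sketch: map the dynamic column names to their values, then unpack with **.
values = {column_strings[0]: 1, column_strings[1]: 2}
new = table(**values)
session.add(new)
session.commit()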
Man, was I having trouble with how to word the title.
Summary: For a database project at uni, we have to import 1 million rows of data into a database, where each row represents an article scraped from the internet. One of the columns in this data is the author of the article. As many articles were written by the same author, I wanted to create a table separate from the articles that links each unique author to a primary key, and then replace the author string in the article table with the key for that author in the other table. How is this done most efficiently, and is it possible to do it in a way that ensures a deterministic output, so that a specific author string would ALWAYS map to a certain pkey, no matter the order the article rows "come in" when this method creates that table?
What I've done: The way I did it was to go through all 1 million article rows (in Python, using Pandas) and build a unique list of all the authors I found. Then I created a dictionary based on this list (sorted). Then I used this dictionary to replace the author string in the articles table with a key corresponding to a specific author, and used the dict to create my authors table. However, as I see it, if a row with an author not seen the first time around were inserted into my data, it could mess with the alphabetical order my method adds the authors to the dict in, thus making it not-so-deterministic. So what do people normally do in these instances? Can SQL directly make a new authors table with unique authors and keys from the 1 million articles, and replace the author string in the articles table? Could it be an idea to use hashing with a specific hash key to ensure a certain string always maps to a certain key, or?
Show some code:
def get_authors_dict():
    authors_lists = []
    df = pd.read_csv("1mio-raw.csv", usecols=['authors'], low_memory=True)
    unique_authors_list = df['authors'].unique()
    num_of_authors = len(unique_authors_list)
    authors_dict = {}
    i = 0
    prog = 0
    for author in unique_authors_list:
        try:
            authors_dict[author]
            i += 1
        except KeyError:
            authors_dict[author] = i
            i += 1
        print(prog / num_of_authors * 100, "%")
        prog += 1
    return authors_dict

authors_dict = get_authors_dict()

col1_author_id = list(authors_dict.values())
col2_author_name = list(authors_dict.keys())

data_dict = {'col1': col1_author_id,
             'col2': col2_author_name}
df = pd.DataFrame(data=data_dict, columns=['col1', 'col2'])
df.to_csv('author.csv', index=False, header=False, sep="~")

f = open('author.csv', encoding="utf8")
conn = psycopg2.connect(--------)
cur = conn.cursor()
cur.copy_from(f, 'author', sep='~')
conn.commit()
cur.close()

# Processing all the 1mio rows again in a separate file
# and making changes to the dataframe using the dict:
sample_data['authors'] = sample_data['authors'].map(authors_dict)
So if I understand you correctly, you want to create a SQL table which connects authors to articles. Your problem is that you do not know which primary key to use in such a table, since an author might have written more than one article.
In this case, instead of trying to do something clever, I would just use a composite primary key for your table. This means you define the author column together with the title/publishing date/identifier of the article as the primary key for the table. Thus, each row of your table has a unique identifier (as long as no author has written two identical articles). This is independent of your Python code, as it needs to be defined in the database.
This question might help you to define a composite primary key.
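As a rough illustration only (the table and column names here are assumptions, not taken from the question), the DDL for such a composite key could look like this with psycopg2:

# Hypothetical author/article link table keyed by author plus article title.
cur.execute('''CREATE TABLE article_authors (
                   author TEXT NOT NULL,
                   title  TEXT NOT NULL,
                   published DATE,
                   PRIMARY KEY (author, title));''')
conn.commit()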
I have a data model where I store a list of values separated by commas (1,2,3,4,5...).
In my code, in order to work with arrays instead of strings, I have defined the model like this:
class MyModel(db.Model):
    pk = db.Column(db.Integer, primary_key=True)
    __fake_array = db.Column(db.String(500), name="fake_array")

    @property
    def fake_array(self):
        if not self.__fake_array:
            return
        return self.__fake_array.split(',')

    @fake_array.setter
    def fake_array(self, value):
        if value:
            self.__fake_array = ",".join(value)
        else:
            self.__fake_array = None
This works perfectly, and from the point of view of my source code "fake_array" is an array; it's only transformed into a string when it's stored in the database.
The problem appears when I try to filter by that field. Expressions like this don't work:
MyModel.query.filter_by(fake_array="1").all()
It seems that I can't filter using the SQLAlchemy query model.
What can I do here? Is there any way to filter this kind of field? Is there a better pattern for the "fake_array" problem?
Thanks!
What you're trying to do should really be replaced with a pair of tables and a relationship between them.
The first table (which I'll call A) contains everything BUT the array column, and it should have a primary key of some sort. You should have another table (which I'll call B) that contains a primary key, a foreign key column to A (which I'll call a_id), and an integer field.
Using this layout, each row in the A table has its associated array in table B where B's a_id == A.id via a join. You can add or remove values from the array by manipulating the rows in table B. You can filter by using a join.
If the order of the values is needed, then create an order column in table B.
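A minimal sketch of that layout in Flask-SQLAlchemy terms (the class and column names are illustrative, not taken from the question):

# Sketch: A holds the row, B holds one array element per row of A.
class A(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    values = db.relationship('B', order_by='B.position', backref='a')

class B(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey('a.id'), nullable=False)
    value = db.Column(db.Integer, nullable=False)
    position = db.Column(db.Integer)  # only needed if the order matters

# Filtering becomes a join, e.g. all rows of A whose "array" contains 1:
A.query.join(B).filter(B.value == 1).all()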
I have created a database with multiple columns and want to use the data stored in two of the columns (named 'cost' and 'Mwe') to create a new column 'Dollar_per_KWh'. I have created two lists: one contains the rowid and the other contains the new value that I want to populate the new Dollar_per_KWh column with. As the code iterates through all the rows, the two lists are zipped together into a dictionary containing tuples. I then try to populate the new sqlite column. The code runs and I do not receive any errors. When I print out the dictionary it looks correct.
Issue: the new column in my database is not being updated with the new data and I am not sure why; the values in the new column are showing 'NULL'.
Thank you for your help. Here is my code:
conn = sqlite3.connect('nuclear_builds.sqlite')
cur = conn.cursor()

cur.execute('''ALTER TABLE Construction
               ADD COLUMN Dollar_per_KWh INTEGER''')

cur.execute('SELECT _rowid_, cost, Mwe FROM Construction')
data = cur.fetchall()

dol_pr_kW = dict()
key = list()
value = list()
for row in data:
    id = row[0]
    cost = row[1]
    MWe = row[2]
    value.append(int((cost*10**6)/(MWe*10**3)))
    key.append(id)
    dol_pr_kW = list(zip(key, value))

cur.executemany('''UPDATE Construction SET Dollar_per_KWh = ? WHERE _rowid_ = ?''', (dol_pr_kW[1], dol_pr_kW[0]))
conn.commit()
Not sure why it isn't working. Have you tried just doing it all in SQL?
conn = sqlite3.connect('nuclear_builds.sqlite')
cur = conn.cursor()

cur.execute('''ALTER TABLE Construction
               ADD COLUMN Dollar_per_KWh INTEGER;''')
cur.execute('''UPDATE Construction SET Dollar_per_KWh = cast((cost/MWe)*1000 as integer);''')
It's a lot simpler just doing the calculation in SQL than pulling data to Python, manipulating it, and pushing it back to the database.
If you need to do this in Python for some reason, testing whether this works will at least give you some hints as to what is going wrong with your current code.
Update: I see a few more problems now.
First I see you are creating an empty dictionary dol_pr_kW before the for loop. This isn't necessary as you are re-defining it as a list later anyway.
Then you are trying to create the list dol_pr_kW inside the for loop. This has the effect of over-writing it for each row in data.
I'll give a few different ways to solve it. It looks like you were trying a few different things at once (using a dict and a list, building two lists and zipping them into a third list, etc.), which is adding to your trouble, so I am simplifying the code to make it easier to understand. In each solution I will create a list called data_to_insert; that is what you will pass at the end to the executemany function.
The first option is to create your list before the for loop, then append to it for each row.
dol_pr_kW = list()
for row in data:
    id = row[0]
    cost = row[1]
    MWe = row[2]
    val = int((cost*10**6)/(MWe*10**3))
    dol_pr_kW.append((id, val))

# you can do this or instead change the step above to dol_pr_kW.append((val, id)).
data_to_insert = [(r[1], r[0]) for r in dol_pr_kW]
The second way would be to zip the key and value lists AFTER the for loop.
key = list()
value = list()
for row in data:
    id = row[0]
    cost = row[1]
    MWe = row[2]
    value.append(int((cost*10**6)/(MWe*10**3)))
    key.append(id)

dol_pr_kW = list(zip(key, value))

# you can do this or instead change the step above to dol_pr_kW = list(zip(value, key))
data_to_insert = [(r[1], r[0]) for r in dol_pr_kW]
Third, if you would rather keep it as an actual dict you can do this.
dol_pr_kW = dict()
for row in data:
    id = row[0]
    cost = row[1]
    MWe = row[2]
    val = int((cost*10**6)/(MWe*10**3))
    dol_pr_kW[id] = val

# convert to a list of (value, id) tuples
data_to_insert = [(dol_pr_kW[id], id) for id in dol_pr_kW]
Then to execute, call
cur.executemany('''UPDATE Construction SET Dollar_per_KWh = ? WHERE _rowid_ = ?''', data_to_insert)
conn.commit()
I prefer the first option since it's easiest for me to understand what's happening at a glance: each iteration of the for loop just adds an (id, val) tuple to the end of the list. It's a little more cumbersome to build two lists independently and zip them together to get a third list.
Also note that even if the dol_pr_kW list had been created correctly, passing (dol_pr_kW[1], dol_pr_kW[0]) to executemany would pass only the first two rows of the list instead of swapping each (key, value) pair to (value, key). You need a list comprehension to accomplish the swap in one line of code; I just did this as a separate line and assigned it to the variable data_to_insert for readability.
I have a database table with multiple fields which I am querying and pulling out all data which meets certain parameters. I am using psycopg2 for python with the following syntax:
cur.execute("SELECT * FROM failed_inserts where insertid='%s' AND site_failure=True"%import_id)
failed_sites= cur.fetchall()
This returns the correct values as a list with the data's integrity and order maintained. However, I want to query the returned list somewhere else in my application and I only have this list of values, i.e. it is not a dictionary with the fields as the keys for these values. Rather than having to do
desiredValue = failed_sites[13]  # where 13 is an arbitrary index for desiredValue
I want to be able to query by the field name, like:
desiredValue = failed_sites[fieldName]  # where fieldName is the name of the field I am looking for
Is there a simple and efficient way to do this?
Thank you!
cursor.description will give you the column information (http://www.python.org/dev/peps/pep-0249/#cursor-objects). You can get the column names from it and use them to build a dictionary for each row.
cursor.execute('SELECT ...')

columns = []
for column in cursor.description:
    columns.append(column[0].lower())

# Build one dictionary per returned row, keyed by column name.
failed_sites = []
for row in cursor:
    site = {}
    for i in range(len(row)):
        site[columns[i]] = row[i]
        if isinstance(row[i], basestring):
            site[columns[i]] = row[i].strip()
    failed_sites.append(site)
The "Dictionary-like cursor", part of psycopg2.extras, seems what you're looking for.