Best way to store and manipulate data with Python - python

I want to find a way to write my data into a file, read it back from the file and sort it, read the sorted version of it.
Basically what I have is:
name: string
average: float
sum: float
coordinates: list of lists, contains floats. can be variable length for each name
I will sort the entries with respect to average or sum field. Then I will read the name and coordinates in the sorted order.
I tried writing a dictionary of dictionaries to a json; however, I couldn't really sort it after reading it back and couldn't manipulate it as I wanted to. My dictionary was like
big_dictionary = {"name1":{"avg":0.1, "sum":0.2, "coordinates":[[0,1,2,3],[4,5,6,7]]}, "name2":{....}}
I also tried csv; however, I couldn't read the data back with its original data types (for instance, I couldn't read the list of lists back as a list of lists).
big_list = [[name1, avg1, sum1, [coordinates1, coordinates2,...]], [name2, ...]]
I know that one option is to use pandas. I haven't tried it yet because I am not familiar with it, and I am afraid of losing even more time struggling with its methods. If you recommend this route, I will also need some more information.
What should I do in this case?
UPDATE: Also, what about OrderedDict?

You could use a list of dictionaries, which is easy to sort:
data = [{"name": "name1", "avg":0.1, "sum":0.2, "coordinates":[[0,1,2,3],[4,5,6,7]]}, ..]
data.sort(key=lambda x: x["avg"]) # or x["sum"]

Sticking with JSON you could sort your data then write it as a list of dicts rather than a dict of dicts:
big_ordered_list_of_dicts = [
{"name":"name1", "avg":0.1, "sum":0.2, "coordinates":[[0,1,2,3],[4,5,6,7]]},
{"name":"name2", ... },
...,
{"name":"zzzzz", ... },
]
which will still be in the same order after writing to JSON and reading back in. It's also quite easy to re-order this list, for example
list_in_sum_order = sorted( big_ordered_list_of_dicts, key=lambda x: x['sum'] )
This is relatively efficient, since it only builds another list; it does not copy or move the actual data in the dicts.
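As a rough sketch of the whole round trip (write, read back, sort, then read name and coordinates in sorted order): the sample records and the temporary file name below are made up for illustration, and only the standard json module is assumed.

```python
import json
import os
import tempfile

# Hypothetical sample records in the list-of-dicts layout suggested above.
records = [
    {"name": "name1", "avg": 0.5, "sum": 1.0, "coordinates": [[0, 1, 2, 3], [4, 5, 6, 7]]},
    {"name": "name2", "avg": 0.1, "sum": 3.0, "coordinates": [[8, 9]]},
]

# Write the list to a JSON file (a temporary one here), then read it back.
path = os.path.join(tempfile.mkdtemp(), "records.json")
with open(path, "w") as fh:
    json.dump(records, fh)
with open(path) as fh:
    loaded = json.load(fh)

# Floats and nested lists come back with their types intact, so sorting works directly.
by_avg = sorted(loaded, key=lambda rec: rec["avg"])
names_by_avg = [rec["name"] for rec in by_avg]
for rec in by_avg:
    print(rec["name"], rec["coordinates"])
```

Swapping rec["avg"] for rec["sum"] in the key gives the other ordering.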

Related

Dictionary created from zip() only contains two records instead of 1000+

I am currently doing the US Medical Insurance Cost Portfolio Project through Codecademy and am having trouble combining two lists into a single dictionary. I created two new lists (smoking_status and insurance_costs) in the hope of investigating how insurance costs differ between smokers and non-smokers. When I try to zip these two lists together, however, the new dictionary only has two entries. It should have well over a thousand. Where did I go wrong? Below is my code and output. It is worth noting that the output shown is the last two data points in my original csv file.
import csv
insurance_db = []
with open('insurance.csv', newline='') as insurance_csv:
    insurance_reader = csv.DictReader(insurance_csv)
    for row in insurance_reader:
        insurance_db.append(row)
smoking_status = []
for person in insurance_db:
    smoking_status.append(person.get('smoker'))
insurance_costs = []
for person in insurance_db:
    insurance_costs.append(person.get('charges'))
smoker_dictionary = {key: value for key, value in zip(smoking_status, insurance_costs)}
print(smoker_dictionary)
Output:
{'yes': '29141.3603', 'no': '2007.945'}
A key can't appear twice in a dictionary; each later value for the same key overwrites the earlier one.
If I understand correctly, there are 2 statuses, “yes” and “no”.
If so, then the “correct” dictionary structure would be:
{'yes':[...], 'no': [...]}
You can create an empty dictionary as:
{status: [] for status in set(smoking_status)}
Then iterate over your zipped lists and append each value to the list under the matching key.
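A minimal sketch of that grouping, assuming the two lists from the question; the sample values here are invented stand-ins, and collections.defaultdict is used instead of pre-building the empty lists by hand.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two lists built from insurance.csv.
smoking_status = ["yes", "no", "no", "yes"]
insurance_costs = ["16884.924", "1725.5523", "4449.462", "27808.7251"]

# Each status maps to the list of all charges seen for it.
charges_by_status = defaultdict(list)
for status, cost in zip(smoking_status, insurance_costs):
    # csv gives strings, so convert to float before storing.
    charges_by_status[status].append(float(cost))

print(dict(charges_by_status))
```

With the values grouped like this, something like sum(v) / len(v) per key gives the average cost for each group.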
Why are you looping three times over basically the input anyway?
entries = []
with open('insurance.csv', newline='') as insurance_csv:
    for row in csv.DictReader(insurance_csv):
        entries.append([row["smoker"], row["charges"]])
We can't guess what your expected output should be; zipping two lists should create one list with the rows in each list joined up, so that's what this code does. If you want a dictionary instead, or a list of dictionaries, you'll probably want to combine the data from each input row differently, but the code should otherwise be more or less this.
To spell this out, if your input is
id,smoker,charges
1,no,123
2,no,567
3,yes,987
then the output will be (csv returns every field as a string)
[["no", "123"],
["no", "567"],
["yes", "987"]]

Using locals() to create a list of dictionaries

This might be simple, but I'm stuck. I have globals() that creates dictionaries based on zipping lists (that will differ in sizes, thus differ in the number of the dictionaries that get created). The new dictionaries that get created look like the below:
dict0 = {foo:bar}
dict1 = {more_foo:more_bar}
How do I call these new dictionaries in a for loop?
I want my script to do the below:
for i in (dict0, dict1):
The only issue is that the number of dictx (dictionaries) will differ based on the inputs from the script.
As nicely put in comments, in your case, you should append the dictionaries to a list:
list_iterator = list()
# create dict 1.
list_iterator.append(dict1)
# create dict 2.
list_iterator.append(dict2)
# and so on. If your dict-creation algorithm is repetitive, you can add the append call to the end of it.
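For instance, if the dictionaries come from zipping two lists, a sketch could look like this (the keys and values lists are hypothetical placeholders for whatever the script actually zips):

```python
# Hypothetical input lists; their length decides how many dicts get created.
keys = ["foo", "more_foo", "even_more_foo"]
values = ["bar", "more_bar", "even_more_bar"]

# Collect each new dictionary in a list instead of creating dict0, dict1, ...
dicts = []
for key, value in zip(keys, values):
    dicts.append({key: value})

# The loop now works no matter how many dictionaries were created.
for d in dicts:
    print(d)
```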
I figured it out...
for i in range(len(someList)):
    dicts = locals()['dict' + str(i)]

Display the top 2 highest difference car records

How to display the top 2 rows of highest difference from a text file in python
For example here is a text file:
Mazda 64333 53333
Merce 74321 54322
BMW 52211 31432
The expected output would be
Merce 74321 54322
BMW 52211 31432
I tried multiple codes but only managed to display the actual difference and not the whole row.
Would this work for you?
from operator import itemgetter
with open("x.txt", "r+") as data:
    data = [i.split() for i in data.readlines()]
top = sorted([[row[0], int(row[1]) - int(row[2])] for row in data], key=itemgetter(1), reverse=True)
print(top)
print(top[:2])
[['BMW', 20779], ['Merce', 19999], ['Mazda', 11000]]
[['BMW', 20779], ['Merce', 19999]]
So, at a glance, this might seem slightly complicated, but it's really not!
Let's break down each step of the following program:
from operator import itemgetter
with open("x.txt", "r+") as data:
    data = [i.split() for i in data.readlines()]
top = sorted([[row[0], int(row[1]) - int(row[2])] for row in data], key=itemgetter(1), reverse=True)
First, note that operator is a standard-library module, not an external package like requests, and itemgetter is a pretty straightforward function.
with open("x.txt", "r+") as data should be pretty straightforward as well: all it does is open the text file and store the file object in data. (The "r+" mode allows both reading and writing; plain "r" would be enough here, since we only read.)
We then use our first list comprehension, which might look new to you:
data = [i.split() for i in data.readlines()]
all this is doing is going through each line for example car 123 122 and splitting it by spaces into a list like so ["car", "123", "122"].
Now if you look closely at the result, there's something wrong. The last 2 elements (which need to be integers to find the difference) are strings! That is why we use the next list comprehension to convert them.
top = sorted([[row[0], int(row[1])-int(row[2])]for row in data],key=itemgetter(1), reverse=True)
This is a bit more complicated... but all it's really doing is sorting a simple list comprehension.
It goes through each value in data and gets the differences! Let's see how it does that.
As you know, our data looks something like [["car", "123", "122"], ["car1", "1234", "1223"]] right now. So we can access the numeric values of ["car", "123", "122"] with [1] and [2]. With this knowledge we can loop through the data and get the difference of those values once they are cast to integers. E.g. int(row[1])-int(row[2]) for ["car", "123", "122"] would return 1 (the difference).
With this knowledge, we can create a new list with the comprehension that contains the car's name row[0] and the difference int(row[1])-int(row[2]), represented by [row[0], int(row[1])-int(row[2])] in the list comprehension. Using row as each item in data, we can easily form this! Here's that list comprehension by itself:
[[row[0], int(row[1])-int(row[2])] for row in data]
Finally, we have arrived at the last piece of this little program... the sorted function! sorted() will return a sorted list based on the key you give it (and you can use reverse=True to have the greatest values first). it's really not that hard to understand when it's abbreviated as follows:
sorted([the list comprehension],key=itemgetter(1), reverse=True)
So while you might know that it's sorting the list comprehension we made and listing the biggest values first, you might not know how it's sorting it! To find out, we need to look at the key.
itemgetter() is a function from the operator module. All you really need to know is that itemgetter(1) fetches the element at index 1 (the second element) of each list, and sorted() orders by that. If you recall, each element of our data looks like ["car", difference] (difference is just a placeholder for the actual integer difference). Since we want the greatest differences, it makes sense to sort by them, right?
Using itemgetter(1), it sorts by index 1: the difference! And that pretty much sums it up :)
We store all of that in the variable top and then print the first two elements with print(top[:2]).
I hope this helped!
Create a dict that contains the difference for each row, with the car brand as key.
Then you can sort the dict.items() by value and return the top 2.
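A small sketch of that idea, using the three sample lines from the question in place of the file and assuming each brand appears only once. The second dict, rows, is an addition of this sketch so the whole original row can be printed, which is what the question asked for.

```python
# Sample lines from the question, standing in for the text file contents.
lines = ["Mazda 64333 53333", "Merce 74321 54322", "BMW 52211 31432"]

differences = {}  # brand -> difference between the two numbers
rows = {}         # brand -> original line, kept so the full row can be shown
for line in lines:
    brand, first, second = line.split()
    differences[brand] = int(first) - int(second)
    rows[brand] = line

# Sort (brand, difference) pairs by difference, largest first, and keep two.
top_two = sorted(differences.items(), key=lambda item: item[1], reverse=True)[:2]
top_rows = [rows[brand] for brand, _ in top_two]

for row in top_rows:
    print(row)
```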

Python - Collections

Am new to Python, and would like to know, how to store list of different DataTypes inside a dictionary with a Key
for Example -
{[Key1,int1,int1,String1] , [Key2,int2,int2,String2], [Key3,int3,int3,String3] }
how to create Dictionary and add these elements?
Assuming you meant that your data is:
lst = [[Key1,int1,int1,String1] , [Key2,int2,int2,String2], [Key3,int3,int3,String3]]
Then you could do something like:
{x[0]:x[1:] for x in lst}
What you actually have up there is an attempt to create a set out of a bunch of lists -- and that won't work because list objects aren't hashable.
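To make that concrete, here is a sketch with invented values standing in for Key1, int1, String1 and so on; it also shows adding one more entry afterwards.

```python
# Hypothetical concrete values in place of Key1, int1, String1, ...
lst = [["key1", 1, 10, "a"], ["key2", 2, 20, "b"], ["key3", 3, 30, "c"]]

# The first item of each inner list becomes the key, the rest the value.
d = {row[0]: row[1:] for row in lst}

# New entries can be added by plain assignment.
d["key4"] = [4, 40, "d"]

print(d["key1"])
```

The values are ordinary lists, so mixed types (ints and strings here) are fine.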

python dictionary float search within range

Suppose I have some kind of dictionary structure like this (or another data structure representing the same thing).
d = {
42.123231:'X',
42.1432423:'Y',
45.3213213:'Z',
..etc
}
I want to create a function like this:
def f(n, d, e):
    '''Return a list with the values in dictionary d corresponding to the float n
    within (+/-) the float error term e'''
So if I called the function like this with the above dictionary:
f(42,d,2)
It would return
['X','Y']
However, while it is straightforward to write this function with a loop, I don't want something that goes through every key in the dictionary and checks it exhaustively. I want it to take advantage of the indexed structure somehow (or even a sorted list could be used) to make the search much faster.
A dictionary is the wrong data structure for this. Write a search tree.
A Python dictionary is a hash map implementation. Its keys can't be compared and traversed in order as in a search tree, so you simply can't do this with a Python dictionary without checking all the keys.
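Short of writing a full search tree, one option the question itself mentions is a sorted list. A sketch using the standard bisect module follows; the function name values_near and its argument order are made up here, not taken from the question's f(n, d, e).

```python
from bisect import bisect_left, bisect_right

d = {42.123231: "X", 42.1432423: "Y", 45.3213213: "Z"}

# Sort the keys once; every later lookup reuses this list.
sorted_keys = sorted(d)

def values_near(n, e, keys, mapping):
    """Return the values whose keys lie in [n - e, n + e].

    keys must be the sorted list of mapping's keys.
    """
    lo = bisect_left(keys, n - e)   # first index with key >= n - e
    hi = bisect_right(keys, n + e)  # first index with key > n + e
    return [mapping[k] for k in keys[lo:hi]]

print(values_near(42, 2, sorted_keys, d))
```

Each lookup does two binary searches plus a slice over the matches, rather than a scan of every key. The sorted list has to be rebuilt (or maintained with bisect.insort) whenever the dictionary changes.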
A plain dictionary does not keep its keys sorted by value (since Python 3.7 it preserves insertion order, nothing more), so rearrange it once into an OrderedDict sorted by key:
from collections import OrderedDict
d_ordered = OrderedDict(sorted(d.items(), key=lambda i: i[0]))
Then filtering values is rather simple, and it will stop at the upper border. Here lower and upper are the bounds n - e and n + e:
import itertools
values = [val for k, val in
          itertools.takewhile(lambda kv: kv[0] < upper, d_ordered.items())
          if k > lower]
The sort step is what makes the early stop valid: takewhile assumes the keys arrive in ascending order. Note that the scan still starts from the smallest key, so only the part above the upper border is skipped.
