Is it possible to get data by value in Redis? - python

I already know how to find a row by key in Redis, but I wonder whether it's possible to find a row by one of its values. For example, if my row's data is {"1", "A", "B"}, I want to find the row by "A" or "B" rather than by "1" (the first column is the key in this case), using Python.

Redis has nothing out of the box for this. You can create a secondary index in Redis on the value. It comes at a cost though - you need more memory to store the index.

You can build your own 'value index'.
For example, you can add a second key of type sorted set whose key is A and whose members are the primary keys 1, 2, 3, 4 with scores 1, 2, 3, 4; you can use your own score, such as a timestamp.
Then, when you want the first ten primary keys indexed under A, use this:
zrangebyscore A -inf +inf limit 0 10
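If it helps, here is a minimal sketch of that secondary-index pattern using the redis-py client; the key names (row:1, idx:A, idx:B) and the timestamp scores are illustrative assumptions, not anything Redis requires.

import time
import redis

r = redis.Redis()

# Store the row itself under its primary key.
r.hset("row:1", mapping={"col1": "A", "col2": "B"})

# Index the primary key under each of its values, scored by insertion time.
now = time.time()
r.zadd("idx:A", {"1": now})
r.zadd("idx:B", {"1": now})

# Look up the first ten primary keys indexed under "A" ...
primary_keys = r.zrangebyscore("idx:A", "-inf", "+inf", start=0, num=10)

# ... and fetch the matching rows by primary key.
rows = [r.hgetall(f"row:{pk.decode()}") for pk in primary_keys]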

Related

What is the best way to create a dictionary using unique column values, and the corresponding range?

I am trying to create a dictionary whose keys are the alphabetized, unique values of a column and whose values are their positions in that range.
So, for example, if I have a column called "States" that we expect to have 50 unique values, there would be 50 keys, each containing a state. I would want the corresponding value to be 1 for the first key and 50 for the last key.
The dictionary would look like this: {'AL':1, 'AK':2, .... 'WV':49, 'WY':50}
I've tried something as follows -
mapper = {df.Statee.unique().tolist()[i]:i for i in range(1, len(df.State.unique().tolist()+1))}
but that doesn't work.
Something like this strikes me as most readable:
uniqs = df.State.unique()
mapper = {k:v+1 for v, k in enumerate(uniqs)}
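Since the question asks for alphabetized keys, a slight variation is to sort the unique values before enumerating. This is only a sketch with made-up sample data:

import pandas as pd

df = pd.DataFrame({'State': ['WY', 'AL', 'AK', 'WV']})

# Sort the unique values so the mapping follows alphabetical order,
# then number them starting from 1.
uniqs = sorted(df.State.unique())
mapper = {k: v for v, k in enumerate(uniqs, start=1)}
# {'AK': 1, 'AL': 2, 'WV': 3, 'WY': 4}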

Python Gsheets insert_row is skipping data

I have a double-nested dictionary, where the value returned is a list with characteristics about a person. I want to write each value in the list to google sheets, so have used gspread. Here's my code:
for person in list_id:
    index = 2
    for key, value in enrich_dict.items():
        for keytwo, valuetwo in value.items():
            row = [valuetwo[0], valuetwo[1], valuetwo[2], valuetwo[3], person]
            sheet.insert_row(row, index)
            index += 1
For some reason, valuetwo[3] is never inserted into the sheet; I just get 4 columns of data. No matter what data I test with (I have tried simple strings), this is always the case: the 4th value is skipped.
Can you post an example of your input and expected output?

Fastest pythonic way to loop over dictionary to create new Pandas column

I have a dictionary "c" with 30,000 keys and around 600,000 unique values (around 20 unique values per key).
I want to create a new pandas column 'DOC_PORTL_ID' by taking the value in column 'image_keys' for each row, looking up which dictionary key contains that value, and returning the key. So I wrote a function like this:
def find_match(row, c):
    for key, val in c.items():
        for item in val:
            if item == row['image_keys']:
                return key
and then I use .apply to create my new column like:
df_image_keys['DOC_PORTL_ID'] = df_image_keys.apply(lambda x: find_match(x, c), axis =1)
This takes a long time. I am wondering if I can improve my snippet code to make it faster.
I googled a lot and was not able to find the best way of doing this. Any help would be appreciated.
You're using your dictionary as a reverse lookup. And frankly, you haven't given us enough information about the dictionary. Are the 600,000 values unique? If not, you're only returning the first one you find. Is that expected?
Assuming they are unique:
reverse_dict = {val: key for key, values in c.items() for val in values}
df_image_keys['DOC_PORTL_ID'] = df_image_keys['image_keys'].map(reverse_dict)
Assuming uniqueness, this does the same thing your function does, only much faster. If those values are not unique, you'll have to provide a better explanation of what you expect to happen.
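For illustration, here is a minimal end-to-end sketch of that reverse-lookup approach; the dictionary contents and column values are made up:

import pandas as pd

# Toy version of the lookup dictionary: document id -> list of image keys.
c = {'doc_1': ['img_a', 'img_b'], 'doc_2': ['img_c']}

df_image_keys = pd.DataFrame({'image_keys': ['img_c', 'img_a', 'img_z']})

# Invert the dictionary once (values assumed unique across keys),
# then map each row's image key to its document id in a single vectorized pass.
reverse_dict = {val: key for key, values in c.items() for val in values}
df_image_keys['DOC_PORTL_ID'] = df_image_keys['image_keys'].map(reverse_dict)
# img_z has no match, so its DOC_PORTL_ID is NaN.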

Efficiently selecting rows from pandas dataframe using sorted column

I have a large-ish pandas dataframe with multiple columns (c1 ... c8) and ~32 mil rows. The dataframe is already sorted by c1. I want to grab other column values from rows that share a particular value of c1.
something like
keys = big_df['c1'].unique()
red = np.zeros(len(keys))
for i, key in enumerate(keys):
    inds = (big_df['c1'] == key)
    v1 = np.array(big_df.loc[inds]['c2'])
    v2 = np.array(big_df.loc[inds]['c6'])
    red[i] = reduce_fun(v1, v2)
However, this turns out to be very slow, I think because it checks the entire column for the matching criterion (even though there might be only 10 relevant rows out of 32 million). Since big_df is sorted by c1 and keys is just the list of all unique values of c1, is there a fast way to build the red[] array? (I.e., I know the first row with the next key is the row after the last row of the previous key, and I know that the last row for a key is the last row that matches the key, since all subsequent rows are guaranteed not to match.)
Thanks,
Ilya
Edit: I am not sure what order the unique() method produces, but I basically want to have, for every key in keys, a value of reduce_fun(). I don't particularly care what order they come in (presumably the easiest order is the order c1 is already sorted in).
Edit 2: I slightly restructured the code. Basically, is there an efficient way of constructing inds? big_df['c1'] == key takes 75.8% of the total time on my data, while creating v1 and v2 takes 21.6%, according to the line profiler.
Rather than a list, I chose a dictionary to hold the reduced values keyed on each item in c1.
red = {key: reduce_fun(frame['c2'].values, frame['c6'].values)
       for key, frame in big_df.groupby('c1')}
How about a groupby statement in a list comprehension? This should be especially efficient given the DataFrame is already sorted by c1:
Edit: Forgot that groupby returns a tuple. Oops!
red = [reduce_fun(g['c2'].values, g['c6'].values) for i, g in big_df.groupby('c1', sort=False)]
Seems to chug through pretty quickly for me (~2 seconds for 30 million random rows and a trivial reduce_fun).
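As a self-contained illustration of the groupby approach, here is a sketch with made-up data and a trivial reduce_fun:

import numpy as np
import pandas as pd

def reduce_fun(v1, v2):
    # Trivial placeholder reduction for the example.
    return (v1 * v2).sum()

# Hypothetical frame standing in for big_df, already sorted by c1.
big_df = pd.DataFrame({
    'c1': np.sort(np.random.randint(0, 1000, size=100_000)),
    'c2': np.random.rand(100_000),
    'c6': np.random.rand(100_000),
})

# One pass over the groups; sort=False keeps the existing c1 order.
red = [reduce_fun(g['c2'].values, g['c6'].values)
       for _, g in big_df.groupby('c1', sort=False)]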

pandas value to find in a dictionary and return key - python

In my pandas DataFrame column, I need to check whether the column value contains any of the words in the dictionary values; if it does, I should return the corresponding key.
my_dict = {'woodhill': ["woodhill"],
           'woodcocks': ["woodcocks"],
           'whangateau': ["whangateau", "whangate"],
           'whangaripo': ["whangaripo", "whangari", "whangar"],
           'westmere': ["westmere"],
           'western springs': ["western springs", "western springs", "western spring", "western sprin",
                               "western spri", "western spr", "western sp", "western s"]}
I could write a for loop for this, but I have nearly 1.5 million records in my DataFrame, and the dictionary has more than 100 items, each with up to 20 values in some cases. How do I do this efficiently? Can I reverse the dictionary, making the values keys and the keys values, to speed it up? Thanks.
You can reverse your dictionary:
reversed_dict = {val: key for key in my_dict for val in my_dict[key]}
and then map it onto your dataframe:
df = pd.DataFrame({'col1':['western springs','westerns','whangateau','whangate']})
df['col1'] = df['col1'].map(reversed_dict)
Try this approach; it may help you (a sketch follows below):
1st, reverse the dictionary items. # there are only a limited number of them, so this is fast
2nd, create a DataFrame from the reversed dictionary. # instead of searching all keys for each comparison against the DataFrame, it is better to do a join, so build a DataFrame for that
3rd, left join the large DataFrame to the small DataFrame built from the dictionary.
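Here is a minimal sketch of that merge-based approach; the column names and sample data are assumptions for illustration:

import pandas as pd

my_dict = {'westmere': ['westmere'],
           'whangateau': ['whangateau', 'whangate']}

# 1st: reverse the dictionary (value -> key).
reversed_dict = {val: key for key, vals in my_dict.items() for val in vals}

# 2nd: turn the reversed dictionary into a small lookup DataFrame.
lookup_df = pd.DataFrame(list(reversed_dict.items()), columns=['col1', 'key'])

# 3rd: left join the big DataFrame to the small lookup DataFrame.
big_df = pd.DataFrame({'col1': ['whangate', 'westmere', 'unknown place']})
result = big_df.merge(lookup_df, on='col1', how='left')
# Rows with no match get NaN in the 'key' column.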
