I am a complete beginner with Python.
I have two tables, Table A and Table B. Table A has 1M records and Table B has 14M records, and each record is a very long sentence (paragraph) containing special characters, numbers, etc.
I want to split each record into words and compare every row of Table A column 1 against every row of Table B column 1, to find the top 5 matches (most relevant) from Table B.
Comparing 1M * 14M rows directly takes far too long. Could anyone suggest the right way to do this in Python with MongoDB?
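A brute-force 1M * 14M scan is roughly 14 trillion comparisons and will never finish. A common way to avoid it is to tokenize each record, build an inverted index from words to Table B rows, and score only the rows that share at least one word, keeping the top 5 by Jaccard similarity. Below is a minimal standard-library sketch; the sample sentences and the `top_matches` helper are made up for illustration, and for real data you would keep the index in MongoDB (or use its text index) rather than in memory:

```python
import heapq
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase, split on anything that is not a letter or digit, drop empties.
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}

def build_index(table_b):
    # Inverted index: word -> set of Table B row ids containing that word.
    tokens_b = {i: tokenize(row) for i, row in enumerate(table_b)}
    index = defaultdict(set)
    for i, toks in tokens_b.items():
        for t in toks:
            index[t].add(i)
    return index, tokens_b

def top_matches(record, index, tokens_b, k=5):
    toks = tokenize(record)
    # Only rows sharing at least one word are scored; the rest of Table B is skipped.
    candidates = set().union(*(index.get(t, set()) for t in toks)) if toks else set()
    def jaccard(i):
        return len(toks & tokens_b[i]) / len(toks | tokens_b[i])
    return heapq.nlargest(k, candidates, key=jaccard)

table_b = ["The quick brown fox!", "Lazy dogs sleep.", "Quick foxes jump", "Unrelated text 42"]
index, tokens_b = build_index(table_b)
print(top_matches("a quick fox", index, tokens_b, k=2))  # [0, 2]
```

The index is built once over Table B; each Table A row then touches only the candidate rows that overlap it, which is usually a tiny fraction of 14M.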
I have two data sets in Excel, like below (Table 1 and Table 2). I am trying to get a result in Table 1 as Yes/No if the date matches for the corresponding ID in Table 2; see the Result column in Table 1. Can you please let me know how this can be achieved using Excel formulas? Thanks
Table 1
Table 2
You could try this:
The formula I've used is:
=IF(COUNTIFS($G$3:$G$6;A3;$H$3:$H$6;B3)=0;"No";"Yes")
(The semicolon argument separators reflect a European locale; use commas if your Excel locale expects them.)
Here is a snip of my data for reference
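If the data ever outgrows Excel, the same two-criteria existence check is straightforward in pandas. The frames and column names below are made up for illustration:

```python
import pandas as pd

# Table 1: the rows to flag; Table 2: the (id, date) pairs to look up against.
t1 = pd.DataFrame({"id": [1, 2, 3], "date": ["2020-01-01", "2020-01-02", "2020-01-03"]})
t2 = pd.DataFrame({"id": [1, 3], "date": ["2020-01-01", "2020-09-09"]})

# Equivalent of COUNTIFS(id range; id; date range; date) = 0 -> "No", else "Yes".
pairs = set(zip(t2["id"], t2["date"]))
t1["result"] = ["Yes" if p in pairs else "No" for p in zip(t1["id"], t1["date"])]
print(t1["result"].tolist())  # ['Yes', 'No', 'No']
```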
Column A contains 9,134 study IDs and column B contains 9,467 study IDs. I previously applied exclusions to column A (it was once identical to column B before I deleted certain people due to exclusions). Columns C-G hold the new data, which corresponds to column B (there are 9,467 rows of data).
What I am looking to do is match the data to column A. For example, I would like the data for Study ID 2 #1195 to line up with Study ID 1 #1195 in column A.
Also, because I intentionally deleted some people from column A, there are Study ID 2 numbers that have been deleted from Study ID 1, so I am not interested in those people.
The ultimate goal is to line up the data with column A in order to copy and paste seamlessly into an existing SPSS database.
I am not sure how to go about this. Any help would be greatly appreciated!!
Here is a link to my dataset:
https://drive.google.com/open?id=1lhbuthqNPLLi8KRVmCOEqmBkh5dy2C41
This seems like more of an Excel question than a coding question. I think all you want to do is keep the records whose study ID exists in column A.
This can be achieved in Excel by doing a VLOOKUP on column B that checks whether each record exists in column A.
You can then filter out any records that don't exist and copy the remaining information to a new sheet.
Example:
1. Add a new column containing the VLOOKUP.
2. Apply a filter on that column.
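If the copy-and-paste step ever becomes painful, the same keep-and-align operation is a single merge in pandas; a left join on the kept IDs drops the excluded people and returns the data in column A's order. The frame and column names here are hypothetical:

```python
import pandas as pd

# 'kept' plays the role of column A (IDs that survived the exclusions);
# 'new_data' plays the role of columns B-G.
kept = pd.DataFrame({"study_id": [1195, 2201, 3300]})
new_data = pd.DataFrame({"study_id": [1195, 2201, 4400], "value": [10, 20, 30]})

# Left join: every kept ID appears once, in A's order; excluded IDs (4400) vanish.
aligned = kept.merge(new_data, on="study_id", how="left")
print(aligned["study_id"].tolist())  # [1195, 2201, 3300]
```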
I have a table with 27 million IDs.
I plan to update it with an average and a count computed from another table, which is taking very long to complete.
Below is the update query (Database - MySQL, I am using Python to connect to the Database)
UPDATE dna_statistics
SET chorus_count =
(SELECT count(*)
FROM dna B
WHERE B.music_id = <music_id>
AND B.label = 'Chorus')
WHERE music_id = 916094
As scaisEdge already said, you need to check if there are indices on the two tables.
I would like to add to scaisEdge's answer that the order of the columns in the composite index should match the order in which you compare them.
You used
WHERE B.music_id = <music_id>
AND B.label = 'Chorus')
So your index should consist of the columns in order (music_id, label) and not (label, music_id).
I would have added this as comment, but I'm still 1 reputation point away from commenting.
An UPDATE like this isn't a good solution for 27 million IDs.
Use EXCHANGE PARTITION instead:
https://dev.mysql.com/doc/refman/5.7/en/partitioning-management-exchange.html
Be sure you have a composite index on table dna, columns (label, music_id), and an index on table dna_statistics, column (music_id).
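Whichever index order you pick, the biggest win is replacing the per-id round trips with one set-based UPDATE over the whole table. A minimal sketch using Python's built-in sqlite3 in place of MySQL (the table contents are invented) shows the pattern:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dna (music_id INTEGER, label TEXT);
    CREATE TABLE dna_statistics (music_id INTEGER PRIMARY KEY, chorus_count INTEGER);
    INSERT INTO dna VALUES (1, 'Chorus'), (1, 'Chorus'), (1, 'Verse'), (2, 'Chorus');
    INSERT INTO dna_statistics (music_id) VALUES (1), (2);
    -- Composite index in the same column order as the WHERE clause below.
    CREATE INDEX idx_dna_music_label ON dna (music_id, label);
""")
# One statement updates every row, instead of one UPDATE per music_id.
con.execute("""
    UPDATE dna_statistics
    SET chorus_count = (SELECT COUNT(*) FROM dna B
                        WHERE B.music_id = dna_statistics.music_id
                          AND B.label = 'Chorus')
""")
rows = con.execute(
    "SELECT music_id, chorus_count FROM dna_statistics ORDER BY music_id"
).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

On MySQL the same correlated-subquery UPDATE works, and with the composite index each per-row COUNT(*) becomes an index range scan rather than a table scan.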
I am new to python/pandas and have a fairly basic question.
I have 2 tables with numerous columns and "ID" as the primary key.
I want to create another table with conditions based on the 2 tables.
For example: Table A, Table B --> Table C
In SQL I would write something like this:
create table TableC as select
a.ID,
case when b.Field1=1000 and a.Field1=50 then 20 else 0 end as FieldA,
case when b.Field2=15 and a.Field2=100 then 100 else 0 end as FieldB
from TableA a, TableB b
where a.ID=b.ID
order by 1
I am struggling to put together a similar Table C using Python.
I have tried to write a function, but I can't seem to include more than one table in the function, nor create a new table based on multiple tables.
Any help will be much appreciated.
IIUC (if I understand correctly), merge the two tables on ID and apply the conditions with np.where. Note that after the merge the _x suffix comes from TableA and _y from TableB, so b.Field1 in the SQL maps to Field1_y:
import numpy as np

TableC = TableA.merge(TableB, on='ID')
TableC['FieldA'] = np.where((TableC.Field1_y == 1000) & (TableC.Field1_x == 50), 20, 0)
TableC['FieldB'] = np.where((TableC.Field2_y == 15) & (TableC.Field2_x == 100), 100, 0)
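A quick check with toy data (values invented) confirms the suffix mapping: after TableA.merge(TableB), the _x columns come from TableA and _y from TableB, so SQL's b.Field1 = 1000 AND a.Field1 = 50 becomes Field1_y == 1000 & Field1_x == 50:

```python
import numpy as np
import pandas as pd

TableA = pd.DataFrame({"ID": [1, 2], "Field1": [50, 50], "Field2": [100, 7]})
TableB = pd.DataFrame({"ID": [1, 2], "Field1": [1000, 1000], "Field2": [15, 15]})

TableC = TableA.merge(TableB, on="ID")  # Field1_x/Field2_x from TableA, *_y from TableB
TableC["FieldA"] = np.where((TableC.Field1_y == 1000) & (TableC.Field1_x == 50), 20, 0)
TableC["FieldB"] = np.where((TableC.Field2_y == 15) & (TableC.Field2_x == 100), 100, 0)
print(TableC["FieldA"].tolist(), TableC["FieldB"].tolist())  # [20, 20] [100, 0]
```

Row ID=2 gets FieldB = 0 because its Field2 in TableA is 7, not 100, mirroring the SQL CASE's ELSE branch.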
I have a big dataset with 4.5 million rows and 150 columns. I want to create a table for it in my database, and I want to create an index for it.
There isn't an ID column, and I would like to know if there is an easy way to find a column, or combination of columns, whose values are unique, so I can base my index on them.
I am using Python and pandas.
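One way to hunt for a candidate key is to test single columns first, then pairs, using DataFrame.duplicated; the helper and the sample frame below are made up for illustration. With 4.5M rows you would also want df.nunique() as a cheap first pass to rank columns before trying combinations:

```python
from itertools import combinations

import pandas as pd

def find_candidate_key(df, max_cols=2):
    # Try single columns, then pairs (up to max_cols), returning the first
    # combination whose values are unique across all rows, else None.
    for k in range(1, max_cols + 1):
        for cols in combinations(df.columns, k):
            if not df.duplicated(subset=list(cols)).any():
                return list(cols)
    return None

df = pd.DataFrame({
    "city":  ["NY", "NY", "LA", "LA"],
    "year":  [2020, 2021, 2020, 2021],
    "sales": [1, 1, 2, 2],
})
print(find_candidate_key(df))  # ['city', 'year']
```

If no combination up to max_cols is unique, a surrogate key (an auto-increment column) is usually the simpler index to add.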