Data Science Project - CS 839
Team Members
- Sri Harshal Parimi (sparimi@wisc.edu)
- Shebin Roy Yesudhas (royyesudhas@wisc.edu)
- Sankarshan Umesh Bhat (sbhat6@wisc.edu)
Stage-1: Information extraction from natural text
Stage-2: Crawling and extracting structured data from web pages
Stage-3: Entity matching
Matching Fodors and Zagats
- UserId: Avengers
- ProjectId: endgame
- Screenshot
Blocking Results
- UserId: Avengers
- ProjectId: MoviesMatcher
- Screenshot
Matching Results
- UserId: Avengers
- ProjectId: MoviesMatcher
- Screenshot
Estimating accuracy
- Candidate set - 72165 tuple pairs
- Prediction list
- Table A
- Candidate set size is 72165, which is greater than 500, so blocking rules were applied to reduce it
- Report for blocking rules
- Code for blocking rules
- Reduced candidate set
- Labeled Tuple pairs
- Recall range = [0.9371096866388409, 0.9910340259360095]
- Precision range = [0.9186143717366582, 0.9782780006778256]
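
The recall and precision values above are reported as ranges rather than single numbers. This README does not say how the ranges were computed, so the following is only a minimal sketch of one common approach: a normal-approximation confidence interval around a proportion estimated from a labeled sample. The function names, the 95% z-value, and the counts passed in are all hypothetical and are not taken from the project.

```python
import math


def proportion_interval(successes, trials, z=1.96):
    """Normal-approximation interval for a proportion (assumed method).

    z=1.96 corresponds to roughly 95% confidence; the project may have
    used a different method or confidence level.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


def precision_recall_intervals(tp, fp, fn, z=1.96):
    """Intervals for precision and recall from labeled-sample counts.

    tp, fp, fn are counts of true positives, false positives and false
    negatives among the labeled tuple pairs (hypothetical inputs).
    """
    precision = proportion_interval(tp, tp + fp, z)
    recall = proportion_interval(tp, tp + fn, z)
    return precision, recall


if __name__ == "__main__":
    # Illustrative counts only, not the project's labeled data.
    precision, recall = precision_recall_intervals(tp=90, fp=5, fn=4)
    print("Precision range:", precision)
    print("Recall range:", recall)
```

With a larger labeled sample the half-width shrinks, which is why labeling more tuple pairs tightens both ranges.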