In a previous post on March 23, 2012, we talked a little about our
efforts to create a more usable report for unmatched headings. We have
since added functionality to the report that we hope clarifies the
results. We also plan to keep refining the algorithm we use for
the near matches, as well as the confidence level we assign to
each near match.
Here are a few examples of the report (from its current build):
ocm05472887
100 1_ $a Allen, Junius Mordecai.
  99.5%  no 95045186  400 1_  $a Allen, Junius Mordecai, $d 1875-1906.
  56.5%  no 00103969  100 1_  $a Allen, Junius, $d 1898-1962

ocm77567496
650 _0 $a Adventure and adventurer $v Fiction.
  97.1%  sh 85001072  450 __  $a Adventure and adventurers $v Fiction
  70.6%  sh2009113774  150 __  $a Adventure and adventurers $z Europe $v Biography

ocm02224738, ocm02464058, ocm02735261, ocm03462153, ocm04493529
490 0_ $a Old West
  99.5%  no 96034673  130 _0  $a Old West (Alexandria, Va.)
  99.5%  n 99000801  151 __  $a Old West Lawrence Historic District (Lawrence, Kan.)

Not all near matches will be ranked so high on our "confidence level
percentage", but these three should give you a better idea of the
report's results.
We match as much of the original heading to the near match as possible.
Whatever part of the near match also appears in the unmatched heading is
highlighted in BLUE. Parts of the near match that are potential typos or
additions not contained in the unmatched heading are set off in RED. The
second near match is highlighted in the same way as the first, but in
GREEN, to help distinguish between the two near matches.
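To make the highlighting idea concrete, here is a minimal sketch in
Python. It is not our production algorithm: it assumes a plain
character-level comparison using the standard library's
difflib.SequenceMatcher, and the function name mark_differences and the
"match"/"extra" labels are made up for illustration.

```python
from difflib import SequenceMatcher


def mark_differences(unmatched, near_match):
    """Split near_match into ("match", text) and ("extra", text) segments.

    "match" text also appears in the unmatched heading (BLUE in the report);
    "extra" text appears only in the near match (RED in the report).
    This is an illustrative stand-in, not the report's actual logic.
    """
    matcher = SequenceMatcher(None, unmatched, near_match, autojunk=False)
    segments = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if j1 == j2:
            continue  # text found only in the unmatched heading; nothing to mark
        label = "match" if tag == "equal" else "extra"
        segments.append((label, near_match[j1:j2]))
    return segments


segments = mark_differences(
    "$a Adventure and adventurer $v Fiction.",
    "$a Adventure and adventurers $v Fiction",
)
for label, text in segments:
    print(f"{label:5} {text!r}")
```

In this sketch the added "s" in "adventurers" comes back as an "extra"
segment, which is the kind of difference the report sets off in RED.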
As a next step, we are looking into the possibility of sorting this
report by confidence level, so that near matches with a confidence level
in the 90s are listed first (and sorted A-Z within that group). This
might take some extra finagling from our programming team to implement,
but we will keep you updated on our progress.
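One way such a sort could work is sketched below in Python. The details
are assumptions, since the sort has not been built yet: confidence is
grouped into ten-point bands, the function name sort_for_report is
invented, and the last sample entry is a hypothetical heading added only
to show a lower band.

```python
def sort_for_report(entries):
    """Order entries by confidence band (highest first), then heading A-Z.

    Each entry is a (heading, confidence) pair, with confidence given as a
    percentage from 0 to 100. A band is a ten-point range, so the 90s sort
    ahead of the 80s, and so on. (Assumed grouping, for illustration only.)
    """
    def sort_key(entry):
        heading, confidence = entry
        band = min(int(confidence // 10), 9)  # fold 100% into the 90s band
        return (-band, heading.casefold())

    return sorted(entries, key=sort_key)


entries = [
    ("Old West", 99.5),
    ("Allen, Junius Mordecai.", 99.5),
    ("Adventure and adventurer", 97.1),
    ("Zebra (hypothetical heading)", 56.5),  # made-up entry for a lower band
]
ordered = sort_for_report(entries)
for heading, confidence in ordered:
    print(f"{confidence:5.1f}%  {heading}")
```

Sorting on the band rather than the exact percentage is what keeps the
A-Z order intact inside each group.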
While the higher-confidence near matches are useful for letting you know
what may actually be a valid match, we also want to point out that the
lower-confidence near matches are useful in identifying (or dismissing)
headings for which no near match exists. Every unmatched heading will
have two near matches listed underneath it, even if those near matches
have a very low confidence level (less than 5%). This is due to how our
algorithm is set up to generate these near matches for the report.
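The "always two near matches" behavior can be pictured with the short
Python sketch below. It is only an illustration under stated
assumptions: the score comes from the standard library's
difflib.SequenceMatcher.ratio() rather than our own confidence
calculation, and the names similarity and two_nearest are hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score (a stand-in for the confidence level)."""
    matcher = SequenceMatcher(None, a.casefold(), b.casefold(), autojunk=False)
    return 100 * matcher.ratio()


def two_nearest(heading, authority_headings):
    """Return the two best-scoring candidates as (score, candidate) pairs.

    There is no minimum score, so two candidates always come back,
    however poorly they match.
    """
    scored = ((similarity(heading, candidate), candidate)
              for candidate in authority_headings)
    return heapq.nlargest(2, scored)


authority_headings = [
    "Old West (Alexandria, Va.)",
    "Old West Lawrence Historic District (Lawrence, Kan.)",
    "Adventure and adventurers",
]
good = two_nearest("Old West", authority_headings)
poor = two_nearest("Qzxjk", authority_headings)  # nonsense heading, no real match
```

Because nothing filters on score, the nonsense heading still gets two
candidates back, just with very low scores. Those are the entries you
can safely dismiss.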
This report is called:
R00 - Near Match Report.htm
Please feel free to contact your project managers in order to request
that we start delivering this report with your Current Cataloging
results (at no extra cost):
Judy Archer (email <mailto:jarcher@bslw.com?subject=R00%20-%20Near%20Match%20Report>)
Stephanie Hansen (email <mailto:shansen@bslw.com?subject=R00%20-%20Near%20Match%20Report>)
We will still be delivering R07 (Unmatched Headings) and R10 (Multiple
Authority Matches), so the R00 - Near Match Report won't yet replace
those. Because every unmatched heading will have two near matches
listed underneath it, the R00 report can be quite large, depending on
the size of your Current Cataloging file (and its matching results).
We welcome your feedback!
Nate Cothran - nate(a)bslw.com
<mailto:nate@bslw.com?subject=Automation%20Services%20-%20Query>
Product Manager, Automation
Backstage Library Works
533 E 1860 S, Provo UT 84606
(p) 801.342.5697 - (f) 801.356.8220
www.ac.bslw.com/community/blog <http://ac.bslw.com/community/blog/>