Backstage - Near Match Report in XLS Format
We created a Near Match report in HTM format last year. It looks similar to this:

1. a9704426 100 1 $a Bernhardt, Rolf.
   99.5% n 81070694 100 10 $a Bernhardt, Rolf, $d 1934-
   93.8% n 00114719 100 1 $a Bernhardt, Rudolf, $d b. 1867?
2. a9733670 100 1 $a Carroll, Stephen J.
   99.5% n 79041728 100 1 $a Carroll, Stephen J., $d 1930-
   99.5% n 79041110 100 1 $a Carroll, Stephen J., $d 1940-

In response to client requests that our reports also be available in XLS format, we worked out how to convert the existing HTM reports to XLS. We also wanted to take advantage of what working in XLS makes possible. For instance, the Unrecognized $z report looks like this in HTM format:

2688172 600 10 $a Fradkin, Philip L. $x Travel $z Alaska $z Lituya Bay Region.
2678673 600 10 $a Harrison, Henrietta $x Travel $z China $z Chiqiao Village.

When converted to XLS format, it looks like this instead:

No  Ctrl No  Tag  Inds  Initial Heading  Unrecognized $z
1   2688172  600  10    $a Fradkin, Philip L. $x Travel $z Alaska $z Lituya Bay Region.
2   2678673  600  10    $a Harrison, Henrietta $x Travel $z China $z Chiqiao Village.

This allows our clients to sort the reports in different ways and to edit, delete, change, or distribute them to other staff members as needed.

But converting the Near Match report to XLS proved elusive, mainly because it is structured so differently from the other types of reports we generate.

Our purpose with the reports is to reduce the time you spend reviewing issues or problems with the data, and we hope the conversions to XLS help reduce that time further. The Near Match report is intended to help rule out certain types of headings based on the (low) confidence value we assign to the near matches we find. However, the HTM version of the report is onerous to scan through as it stands now, and it doesn't allow clients to do any of the editing mentioned above.
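To illustrate why the grouped layout is harder to convert, here is a minimal Python sketch of flattening a report shaped like the Near Match sample above into one spreadsheet-style row per unmatched heading. It is only an illustration under stated assumptions, not the conversion Backstage actually runs: the regular expressions are guessed from the sample lines, the function names, column names, and the `min_confidence` and `max_matches` parameters are hypothetical, and CSV stands in for XLS.

```python
import csv
import io
import re

# Line patterns guessed from the sample report; the real files may differ.
HEADING_RE = re.compile(r"^\d+\.\s+(\S+)\s+(\d{3})\s+(\S*)\s*(\$.*)$")
MATCH_RE = re.compile(
    r"^(\d+(?:\.\d+)?)%\s+([a-z]+\s*\d+)\s+(\d{3})\s+(\S*)\s*(\$.*)$"
)

SAMPLE = """\
1. a9704426 100 1 $a Bernhardt, Rolf.
   99.5% n 81070694 100 10 $a Bernhardt, Rolf, $d 1934-
   93.8% n 00114719 100 1 $a Bernhardt, Rudolf, $d b. 1867?
2. a9733670 100 1 $a Carroll, Stephen J.
   99.5% n 79041728 100 1 $a Carroll, Stephen J., $d 1930-
   99.5% n 79041110 100 1 $a Carroll, Stephen J., $d 1940-
"""


def parse_report(text):
    """Group each unmatched heading with the near-match lines that follow it."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        heading = HEADING_RE.match(line)
        if heading:
            records.append({
                "bib_id": heading.group(1),
                "tag": heading.group(2),
                "heading": heading.group(4),
                "matches": [],
            })
            continue
        match = MATCH_RE.match(line)
        if match and records:
            records[-1]["matches"].append({
                "confidence": float(match.group(1)),
                "authority": match.group(2),
                "tag": match.group(3),
                "heading": match.group(5),
            })
    return records


def flatten(records, min_confidence=0.0, max_matches=2):
    """Return one row per heading, with up to max_matches near matches side by side."""
    rows = []
    for rec in records:
        kept = [m for m in rec["matches"] if m["confidence"] >= min_confidence]
        kept.sort(key=lambda m: m["confidence"], reverse=True)
        kept = kept[:max_matches]
        if not kept:
            continue  # headings with no qualifying near match are left out
        row = [len(rows) + 1, rec["bib_id"], rec["tag"], rec["heading"]]
        for m in kept:
            row += [f"{m['confidence']:.1f}%", m["tag"], m["heading"], m["authority"]]
        row += [""] * (4 * (max_matches - len(kept)))  # pad unused match columns
        rows.append(row)
    return rows


def to_csv(rows, max_matches=2):
    """Write the rows as CSV text (a stand-in for a real XLS writer)."""
    header = ["No", "Bib ID", "Tag", "Unmatched Heading"]
    for i in range(1, max_matches + 1):
        header += [f"Percent {i}", f"Tag {i}", f"Near Match {i}", f"Authority {i}"]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(flatten(parse_report(SAMPLE)))` yields text that a spreadsheet program can open and sort by any column. The point of the sketch is the shape change: a variable number of indented lines per heading becomes a fixed number of columns per row.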
We think we've cracked this egg, finally. I do want to say that several clients have given us very good feedback on how to improve the report itself and make it more useful. We're still working on implementing these changes, but this is how the report stands now in XLS format:

No  Bib ID    UNM  Unmatched Heading       %-1    Tag-1  Near Match-1                      Authority-1
1   a9704426  100  $a Bernhardt, Rolf.     84.8%  100    $a Bernhardt, Rolf, $d 1934-      n 81070694
2   a9733670  100  $a Carroll, Stephen J.  87.2%  100    $a Carroll, Stephen J., $d 1930-  n 79041728

To the right there is another set of four columns listing the second near-matched heading (if any), with the same kind of information as the first near match.

One of the benefits of going to XLS for this report is that we are able to cut down its size to list only those unmatched headings with near matches at 75.0% confidence or greater. As an example, we reduced one R00 report from 1,000 entries to 413. And if the secondary near match isn't at least 75.0%, we don't list it at all.

Clients can also sort the XLS file by whatever column they like (e.g., sort on %-1 from highest to lowest, so you can concentrate on only the percentage levels you prefer).

We're also employing a slightly different, less forgiving algorithm for determining confidence values, which is why you may notice a difference between the HTM version and the XLS version.

This is very nearly ready to be put in place for our clients. We just wanted to keep you posted on our progress, and we welcome your additional feedback or impressions.

Safe travels to ALA, everyone! Backstage is at booth #1944.

Nate Cothran
Vice President, Automation Services
533 East 1860 South
Provo, Utah 84606
Phone: +1.800.288.1265, ext. 697
Direct: +1.801.342.5697
nate@bslw.com
www.bslw.com