Navigation

5/04/2015

ISBN field from OPAC marc records

Trying to get ISBN numbers from Innovative OPAC via SQL (from Sierra Listserv)

Hi Liza:

Ray Voelker had some good suggestions for you. I have a very small database compared to many of you so I rarely run into the timeout issues you are experiencing. Ray's suggestions for use of LIMIT and OFFSET along with ORDER may be what you need to do.

I have a few other suggestions that may also help.

1) Get rid of unnecessary JOINs

In your original query, you join bib_view to subfield_view unnecessarily, I think.  subfield_view contains the column "record_type_code" with which you can limit your output to bib records without the need for the join.

On my small database, your original query takes approximately 14 seconds to run whereas the following alternative takes approximately 5.5 seconds. That's quite a saving.
>
-- ============================================================================
-- Option 1 - no joins to other views
-- ============================================================================
SELECT
  isbn.content                     AS "ISBN"
FROM
  sierra_view.subfield_view        AS isbn
WHERE
  isbn.record_type_code = 'b'
  AND
  isbn.marc_tag IN ('020','024')
  AND
  isbn.tag = 'a'
; 


Note that I have also used the IN ('n1','n2','n3',etc) format in place of the ORs in your original query. I don't think it saves anything on processing time but it sure saves typing time.


2) Removing extraneous data from ISBNs

You have already limited your output to only subfield 'a' of 020 or 024 so you shouldn't be seeing any subfield delimiters. But, you may have extraneous information within subfield 'a' following the number. Assuming that there is always a space between the number and extraneous data (which may be an incorrect assumption), you could extract the portion of the string that appears before the space like this:
-- ============================================================================
-- Option 2 - with CASE statement to remove data that follows ISBN number
--   Assumes that there is a space following the number
-- ============================================================================
SELECT
  CASE
    WHEN POSITION(' ' IN isbn.content) != 0
    THEN SUBSTR(isbn.content,1,POSITION(' ' IN isbn.content)-1)
    ELSE
    isbn.content
  END                                              AS "ISBN (Edited)"
FROM
  sierra_view.subfield_view                        AS isbn
WHERE
  isbn.record_type_code = 'b'
  AND
  isbn.marc_tag IN ('020','024')
  AND
  isbn.tag = 'a'

On my small database, I didn't see a noticeable increase in processing time with the addition of the CASE statement but you might see one with a large database. You'll have to weigh the pros and cons.


3) Limiting to first ISBN per record

Each bib record can have multiple ISBNs. Depending on your project, you may not need to export all of them. If your project could get by with only one ISBN per record, you can trim off some more time by limiting to the first ISBN per record. The following query run over my small database takes a bit less than 4 seconds:
-- ============================================================================
-- Option 3 - Exclude all but the first ISBN found for each record
-- ============================================================================
SELECT
  CASE
    WHEN POSITION(' ' IN isbn.content) != 0
    THEN SUBSTR(isbn.content,1,POSITION(' ' IN isbn.content)-1)
    ELSE
    isbn.content
  END                                              AS "ISBN (Edited)"
FROM
  sierra_view.subfield_view                        AS isbn
WHERE
  isbn.record_type_code = 'b'
  AND
  isbn.marc_tag IN ('020','024')
  AND
  isbn.tag = 'a'
  AND
  isbn.occ_num = 0
;
 
--
Brent Searle
Library Systems Manager


On 2015-05-01 12:00 PM, Liza wrote:
Hi all,

I am trying to pull a list of ISBNs to give to vendors. The SQL query times out with this error:
----------------------------------------------------
ERROR: canceling statement due to statement timeout
SQL state: 57014
----------------------------------------------------

I have tried adding a LIMIT and it will run if the limit is under 100000. If I run the query with a limit, how do I run it again where the first query left off? 

I would also like to get just the ISBNs without the delimiters. I can put the results into Excel and get rid of the delimiters there, but is there a way to get just the numbers in the SQL results? 

Here's the query:

SELECT subfield_view.content as ISBN
FROM sierra_view.bib_view, sierra_view.subfield_view
  WHERE (subfield_view.record_id = bib_view.id) AND ((subfield_view.marc_tag = '020') OR (subfield_view.marc_tag = '024')) AND (subfield_view.tag = 'a');

Query with the limit:

SELECT subfield_view.content as ISBN

FROM sierra_view.bib_view, sierra_view.subfield_view

WHERE (subfield_view.record_id = bib_view.id) AND ((subfield_view.marc_tag = '020') OR (subfield_view.marc_tag = '024')) AND (subfield_view.tag = 'a')

LIMIT 100000;
Thanks for your help,
Liza
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Liza Miller Arend
Systems Administrator
Minuteman Library Network

No comments:

Post a Comment