Trying to get ISBN numbers from Innovative OPAC via SQL (from Sierra Listserv)
Hi Liza:
Ray Voelker had some good suggestions for you. I have a very small database compared to many of you so I rarely run into the timeout issues you are experiencing. Ray's suggestions for use of LIMIT and OFFSET along with ORDER may be what you need to do.
I have a few other suggestions that may also help.
1) Get rid of unnecessary JOINs
In your original query, you join bib_view to subfield_view unnecessarily, I think. subfield_view contains the column "record_type_code" with which you can limit your output to bib records without the need for the join.
On my small database, your original query takes approximately 14 seconds to run whereas the following alternative takes approximately 5.5 seconds. That's quite a saving.>
-- ============================================================================
-- Option 1 - no joins to other views
-- ============================================================================
SELECT
isbn.content AS "ISBN"
FROM
sierra_view.subfield_view AS isbn
WHERE
isbn.record_type_code = 'b'
AND
isbn.marc_tag IN ('020','024')
AND
isbn.tag = 'a'
;
Note that I have also used the IN ('n1','n2','n3',etc) format in place of the ORs in your original query. I don't think it saves anything on processing time but it sure saves typing time.
2) Removing extraneous data from ISBNs
You have already limited your output to only subfield 'a' of 020 or 024 so you shouldn't be seeing any subfield delimiters. But, you may have extraneous information within subfield 'a' following the number. Assuming that there is always a space between the number and extraneous data (which may be an incorrect assumption), you could extract the portion of the string that appears before the space like this:
-- ============================================================================
-- Option 2 - with CASE statement to remove data that follows ISBN number
-- Assumes that there is a space following the number
-- ============================================================================
SELECT
CASE
WHEN POSITION(' ' IN isbn.content) != 0
THEN SUBSTR(isbn.content,1,POSITION(' ' IN isbn.content)-1)
ELSE
isbn.content
END AS "ISBN (Edited)"
FROM
sierra_view.subfield_view AS isbn
WHERE
isbn.record_type_code = 'b'
AND
isbn.marc_tag IN ('020','024')
AND
isbn.tag = 'a'
On my small database, I didn't see a noticeable increase in processing time with the addition of the CASE statement but you might see one with a large database. You'll have to weigh the pros and cons.
3) Limiting to first ISBN per record
Each bib record can have multiple ISBNs. Depending on your project, you may not need to export all of them. If your project could get by with only one ISBN per record, you can trim off some more time by limiting to the first ISBN per record. The following query run over my small database takes a bit less than 4 seconds:
-- ============================================================================
-- Option 3 - Exclude all but the first ISBN found for each record
-- ============================================================================
SELECT
CASE
WHEN POSITION(' ' IN isbn.content) != 0
THEN SUBSTR(isbn.content,1,POSITION(' ' IN isbn.content)-1)
ELSE
isbn.content
END AS "ISBN (Edited)"
FROM
sierra_view.subfield_view AS isbn
WHERE
isbn.record_type_code = 'b'
AND
isbn.marc_tag IN ('020','024')
AND
isbn.tag = 'a'
AND
isbn.occ_num = 0
;
--
Brent Searle
Library Systems Manager
On 2015-05-01 12:00 PM, Liza wrote:
Hi all,
I am trying to pull a list of ISBNs to give to vendors. The SQL query times out with this error:
----------------------------------------------------
ERROR: canceling statement due to statement timeout
SQL state: 57014
----------------------------------------------------
I have tried adding a LIMIT and it will run if the limit is under 100000. If I run the query with a limit, how do I run it again where the first query left off?
I would also like to get just the ISBNs without the delimiters. I can put the results into Excel and get rid of the delimiters there, but is there a way to get just the numbers in the SQL results?
Here's the query:
SELECT subfield_view.content as ISBN
FROM sierra_view.bib_view, sierra_view.subfield_view
WHERE (subfield_view.record_id = bib_view.id) AND ((subfield_view.marc_tag = '020') OR (subfield_view.marc_tag = '024')) AND (subfield_view.tag = 'a');
Query with the limit:
SELECT subfield_view.content as ISBN
FROM sierra_view.bib_view, sierra_view.subfield_view
WHERE (subfield_view.record_id = bib_view.id) AND ((subfield_view.marc_tag = '020') OR (subfield_view.marc_tag = '024')) AND (subfield_view.tag = 'a')
LIMIT 100000;
Thanks for your help,
Liza
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Liza Miller Arend
Systems Administrator
Minuteman Library Network