Navigation

2/26/2015

OPAC SQL: speed of queries

Here is a way to speed up SQL queries in Innovative Interfaces:




Here’s a version of the query that completes in about 1500 ms on our database, or about 0.017 ms per record, which, if I’m right, should shave an additional couple of seconds off for you (7650 ms):

SELECT
  rm.record_type_code || rm.record_num AS record_num,
  email.field_content AS email,
  pname.last_name || ', ' || pname.first_name || ' ' || pname.middle_name AS name,
  pr.home_library_code
FROM
  sierra_view.record_metadata rm
    JOIN sierra_view.patron_record pr
      ON rm.record_type_code = 'p' AND rm.id = pr.record_id
    JOIN sierra_view.varfield email
      ON email.varfield_type_code = 'z' AND email.record_id = pr.record_id
    JOIN sierra_view.patron_record_fullname pname
      ON pname.patron_record_id = pr.record_id
WHERE
  email.field_content !~ '[^@]+@[^@]+[.][^@]+';

The main difference here between this one and yours and Joe’s is that it doesn’t use any of the *_view tables. Those are constructed views that III created for convenience, but in my experience they’re almost always slower than just picking whichever tables you need and joining manually. Of course, 7-8 seconds is still a very long response time for a UI, so doing something like what Joshua suggested may still be necessary.

As an aside, if you’re not familiar with using EXPLAIN and EXPLAIN ANALYZE, these are very helpful tools for figuring out what’s going on in a particular query. Their output is a bit cryptic, but there are several helpful tutorials/explanations available. I always have to refer back to the Postgresql documentation, here:


If you’re using pg-admin, it can give you a nice visual representation (in the Query window, go to Query / Explain and Query / Explain Analyze) that’s a bit easier to parse than the raw output.

Good luck!

Jason Thomale
User Interfaces Unit, UNT Libraries


From: Boggs, John Sent: Wednesday, February 25, 2015 12:07 PM
To: Sierra Users List
Subject: [sierra] Speed up an SQL query?

Hi all,

I'm trying to put together a canned report for my circ folks of badly-formed email addresses. The report takes about 15 seconds to run, which isn't really all that long but it does lead folks waiting for a web page to click on the link again thinking maybe their click didn't work -- and that just causes the query to start all over again. I have a much more complex query, over much more data, for items in transit too long that runs in about 3 seconds and I'm puzzled as to why the patron one would run so much slower.  We have 2.5 million items and only 450,000 patrons.

I'm attaching the query in case anybody sees anything I'm missing. I have tried selecting from each of the three tables that I'm using and joining in a different orders, but it all takes at least 15 seconds. 

Thanks for any suggestions!

SELECT
  'p' || pat_view.record_num || 'a',
  email.field_content,
  name.last_name || ',' || name.first_name || ' ' || name.middle_name,
  pat_view.home_library_code
FROM
  sierra_view.patron_view AS pat_view
JOIN sierra_view.varfield_view AS email
  ON ( pat_view.record_num = email.record_num AND
       email.record_type_code = 'p' AND
       email.varfield_type_code = 'z' )

JOIN sierra_view.patron_record_fullname AS name
  ON (pat_view.id = name.patron_record_id)

WHERE
  email.field_content !~ '[^@]+@[^@]+[.][^@]+';

No comments:

Post a Comment