class: center, middle # Solr and Plone - Search and Beyond Manuel Reinhardt - Syslab.com - reinhardt@syslab.com ??? * Clone presentation, move to workspace 2, move to external screen * Activate presentation mode * Handy lautlos * Stoppuhr auf dem Handy * Timer in der Präsentation --- exclude: true # Outline --- # What is Solr? * Lucene indexing and search engine * Plus HTTP interface, caching, replication etc. * Open Source ??? * Lucene Core * Java-based indexing and search engine * Solr * XML/JSON over HTTP * caching, replication, web admin interface, etc. * high performance search server * Apache - Open Source --- exclude: true # Using Solr Demo --- # Connecting Solr to Plone * Why? * Speed * Features ??? * Why? * Speed * portal_catalog is faster than objects * but has its limits * Features * Faceting * Highlighting * Complex queries * Cross-site search * etc. --- # Connecting Solr to Plone * How? * collective.solr * alm.solrindex * scorched & ploneintranet.search --- # collective.solr * Indexes both in catalog and solr (`collective.indexing`) * If SearchableText is in the query, divert to Solr --- exclude: true # collective.solr * Generalisation: Required query parameters * May be empty - always use solr ![Required query parameters](required-query-parameters-01.png) --- # alm.solrindex * Implements single index with a Solr connector --- # scorched * low-level Solr API * more powerful * less convenient ??? ```python In [1]: si.query(SearchableText='test') ``` --- # ploneintranet.search * Based on scorched * Tuned towards Plone/Ploneintranet --- # Features * Boosting * Highlighting * Cross-site search --- # Boosting *Take Control over your Search Results* * A good search shows the most relevant results first -- * Solr results are sorted by score -- * Influence score via boosting -- * Example: Date boost - recently added/modified content scores higher ``` recip(ms(NOW/HOUR, modified), 3.16e-11, 0.08, 0.05) ``` --- # Highlighting ![UniBW Highlighting](unibw-highlighting-01.png) --- # Highlighting *Context for your Search Terms* * In what connection does this document mention my search term? * Solr can deliver snippets containing the search term ??? * What Google does -- * Caveat: SearchableText must be stored (metadata) * Explicitly exclude from response --- exclude: true # Faceting ![Recensio browsing](recensio-browsing-01.png) --- exclude: true # Faceting * Drill down to improve initial results * Numbers indicate new number of results --- ## Cross-Site Search *Search all your Content in one Place* * There's more than just your Plone site * What if the user doesn't know where the information is? -- * Index multiple sources into same Solr * Other Plone sites * Arbitrary sites with Nutch -- * Tweak query to search for all paths (not only /Plone) * Tweak results page to get external links right --- # Cross-Site Search ![UniBW Cross-Site Search](unibw-cross-site-01.png) --- exclude: true # Complex Queries *Formulate precisely what you want to find* * Give me all items that * are inside Folder A *OR* are tagged with T *AND* * have been modified within the last week *OR* created within the last month -- exclude: true * Catalog has hard rules (*AND* for keys, *OR* for values) * Solr has flexible query language (AND/OR/NOT, parentheses, ...) ??? Real-world example? --- # Potential Pitfalls * Network overhead * transaction integrity * Commits are slow --- exclude: true # Batching * Solr does proper batching * May take some getting used to --- # Network overhead * Index processing time vs. network time * With a local Solr you should be fine ??? * Exclude SearchableText --- # Transaction integrity * Basic transaction support in c.indexing * Sometimes Solr gets out of sync --- # Commits are slow * Optimised for lookup -- * Asynchronous commits * Delay in index update ??? * Calendar view * Create event takes you back to calendar * Generated from Solr * Not up to date after creating event --- # Summary * Several methods of connecting * Awesome features * Some (manageable) costs and risks -- *Should I use Solr?* * Fast query, slow indexing * Large site * Features ??? * If you subscribe to "query should be fast, indexing can take time" * If your site is large * If any of the features are of value to you --- # Thank you! Solr Training - https://training.plone.org/5/solr/index.html Syslab.com - http://www.syslab.com/ * Web Solutions * Intranets * Risk Assessment .center[![Syslab.com Logo](logo_syslabcom_smaller.jpg)]