PINQ: querying datasets. Faceted search. Finding the right path: how faceted navigation affects SEO (translation)

This module implements so-called faceted search (faceted navigation) on a site. The idea is that search results can be refined by various attributes of the content: author, type, taxonomy term, creation date, etc.

For example, suppose you run an online store selling electronics and a user searches for "audio player". On the results page, in addition to the results themselves, there will be facets:

- Section: audio equipment (54), computers (85)
- Brand: Apple (25), Samsung (68), iRiver (78)
- In stock: yes (456), no (12)
- Price: $100-1000 (45), $1000-10000 (12)

and so on. The number in parentheses is how many products (nodes) have that value. Clicking a link narrows the search results.
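The counts in parentheses are simple to compute. The Python sketch below (an illustration, not the module's implementation) counts how many products in a result set carry each attribute value and narrows the set when a value is clicked; the product records and field names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical search results for "audio player".
results = [
    {"section": "audio equipment", "brand": "Apple", "in_stock": "yes"},
    {"section": "audio equipment", "brand": "iRiver", "in_stock": "yes"},
    {"section": "computers", "brand": "Samsung", "in_stock": "no"},
    {"section": "audio equipment", "brand": "Samsung", "in_stock": "yes"},
]


def facet_counts(items, fields):
    """Return {field: Counter(value -> number of items having that value)}."""
    counts = defaultdict(Counter)
    for item in items:
        for field in fields:
            counts[field][item[field]] += 1
    return dict(counts)


def narrow(items, field, value):
    """What clicking a facet link does: keep only items with that value."""
    return [item for item in items if item[field] == value]


for field, counter in facet_counts(results, ["section", "brand"]).items():
    print(f"{field}: " + ", ".join(f"{value} ({n})" for value, n in counter.items()))
# section: audio equipment (3), computers (1)
# brand: Apple (1), iRiver (1), Samsung (2)
```

After narrowing, the counts are recomputed on the smaller set, which is how each facet reflects the current selection.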


On the one hand, this is an alternative to exposed filters in Views; on the other, an alternative to the standard advanced search.

Installation

Facets section

In this section, you specify which facets are used when searching. For example, you can let users select content by taxonomy, date added, or author. The available facets depend on which modules are enabled.

Results page section

Display style: how search results are displayed. Extracts displays them as in a normal search (highlighted text, author, date); Teasers displays content teasers using the corresponding node.tpl.php.

Use the Extracts display style selectively: if this option is checked, the Extracts style is always applied when a keyword has been entered. If you leave it unchecked, you can use the module as a replacement for taxonomy term navigation.

Current search section

Enables the Current search block, which displays the current search criteria.

In today's article I'll tell you about the Sphinx feature called multi-queries: its built-in optimizations, how to use it to implement so-called faceted search, and how it can sometimes make search up to three times faster.

But first, 15 seconds of self-promotion (if you don't praise yourself, nobody will). This year, Sphinx qualified in the SysAdmins and Enterprise categories (reportedly, it only just missed out in the Developers category). Voting continues for another week (until the 20th). Nothing is needed except a working email address. Thanks in advance to everyone who supports us!

And back to development. What are multi-queries, and where does the promised threefold speedup come from?

Multi-queries are a mechanism that lets you send several search queries in a single batch.

The API methods that implement multi-queries are called AddQuery() and RunQueries(). (Incidentally, the "regular" Query() method uses them internally: it calls AddQuery() once and then immediately RunQueries().) AddQuery() saves the current state of all query settings set by previous API calls and stores the query. Once a query is stored, its settings no longer change: later API calls do not affect them, so subsequent queries can use entirely different settings (another sorting mode, other filters, etc.). RunQueries() actually sends all stored queries in one batch and returns multiple result sets. No restrictions are imposed on the queries in a batch. The number of queries is limited, just in case, by the max_batch_queries directive (added in 0.9.10; previously fixed at 32), but this is essentially only a safeguard against broken packets.
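The batching semantics described above can be sketched in Python with a hypothetical `BatchClient` class; it is not the real sphinxapi and sends nothing over the network. What it illustrates is that `add_query()` snapshots the current settings, so later setter calls do not affect queries already queued, and that a single query is just `add_query()` followed by `run_queries()`. The `execute` callback stands in for searchd, and the default limit of 32 mirrors the old fixed batch size.

```python
import copy


class BatchClient:
    """Hypothetical Sphinx-style client; NOT the real sphinxapi."""

    def __init__(self, max_batch_queries=32):
        self.settings = {"sort": "relevance", "filters": {}}
        self.queue = []
        self.max_batch_queries = max_batch_queries  # guard against runaway batches

    def set_sort_mode(self, mode):
        self.settings["sort"] = mode

    def add_query(self, text, index):
        if len(self.queue) >= self.max_batch_queries:
            raise ValueError("too many queries in one batch")
        # Snapshot the settings: later setter calls must not touch this query.
        self.queue.append((text, index, copy.deepcopy(self.settings)))
        return len(self.queue) - 1

    def run_queries(self, execute):
        # Send every queued query in one batch; `execute` plays the role of searchd.
        batch, self.queue = self.queue, []
        return execute(batch)

    def query(self, text, index, execute):
        # A "regular" single query is just add_query() followed by run_queries().
        self.add_query(text, index)
        return self.run_queries(execute)[0]


def fake_searchd(batch):
    # Echo back the sort mode each query was queued with.
    return [settings["sort"] for _text, _index, settings in batch]


cl = BatchClient()
cl.set_sort_mode("relevance")
cl.add_query("the", "lj")
cl.set_sort_mode("published desc")
cl.add_query("the", "lj")
results = cl.run_queries(fake_searchd)
print(results)  # ['relevance', 'published desc']
```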

Why use multi-queries? Generally speaking, it all comes down to performance. First, by sending queries to searchd in one batch, we always save a little in resources and time, because fewer network packets travel back and forth. Second, and much more importantly, searchd gets the opportunity to optimize the batch as a whole. New batch optimizations are added over time, so it makes sense to send queries in batches whenever possible: when you upgrade Sphinx, new batch optimizations then take effect automatically. When no batch optimization applies, the queries are simply processed one at a time, with no visible difference for the application.

Why (or rather, when) should you NOT use multi-queries? All queries in a batch must be independent, and sometimes they are not: query B may depend on the results of query A. For example, we may want to show results from an additional index only when nothing was found in the main index, or choose a different offset for the second result set depending on the number of matches in the first. In such cases you have to use separate queries (or separate batches).

There are two important batch optimizations worth knowing about: common query optimization (available since version 0.9.8) and common subtree optimization (available since version 0.9.10, which is currently in development).

Common query optimization works like this. searchd picks out all queries in the batch that differ only in their sorting and grouping settings, while the full-text part, filters, etc. are identical, and performs the search only once. For example, suppose a batch contains 3 queries whose text part is "ipod nano": the 1st selects the 10 cheapest results, the 2nd groups results by store ID and sorts the stores by rating, and the 3rd simply selects the maximum price. The search for "ipod nano" runs only once, and three differently sorted and grouped result sets are built from its results.
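A toy Python model of the idea (not searchd's actual implementation): queries sharing the same full-text part are searched once, and each query's own sorting or grouping step is then applied to the shared match list. The documents and the `run_batch()` helper are hypothetical, and filters are omitted for brevity, although in searchd they must match too.

```python
from collections import defaultdict

# Hypothetical documents standing in for an index.
DOCS = [
    {"id": 1, "title": "ipod nano 8gb", "price": 149, "store": 7, "rating": 4.1},
    {"id": 2, "title": "ipod nano 16gb", "price": 199, "store": 3, "rating": 4.8},
    {"id": 3, "title": "ipod classic", "price": 249, "store": 3, "rating": 4.8},
    {"id": 4, "title": "ipod nano case", "price": 19, "store": 7, "rating": 4.1},
]

search_calls = 0


def full_text_search(text):
    """The expensive part: find documents containing every query word."""
    global search_calls
    search_calls += 1
    words = text.split()
    return [d for d in DOCS if all(w in d["title"].split() for w in words)]


def run_batch(queries):
    """queries: list of (text, post_process). Same text => one shared search."""
    positions = defaultdict(list)
    for i, (text, _post) in enumerate(queries):
        positions[text].append(i)
    results = [None] * len(queries)
    for text, idxs in positions.items():
        matches = full_text_search(text)  # runs once per distinct text
        for i in idxs:
            results[i] = queries[i][1](matches)
    return results


def stores_by_rating(matches):
    # Group matches by store, then order stores by rating (best first).
    ratings = {d["store"]: d["rating"] for d in matches}
    return sorted(ratings, key=lambda store: -ratings[store])


cheapest, stores, max_price = run_batch([
    ("ipod nano", lambda m: sorted(m, key=lambda d: d["price"])[:10]),
    ("ipod nano", stores_by_rating),
    ("ipod nano", lambda m: max(d["price"] for d in m)),
])
print(search_calls)  # 1
```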

So-called faceted search is a special case where this optimization applies. Faceted search can be implemented by running several queries with different settings: one for the main search results, plus several more with the same search query but different grouping settings (top 3 authors, top 5 stores, etc.). When everything except sorting and grouping is the same, the optimization kicks in and speed improves considerably (see the example below).

Common subtree optimization is even more interesting. It lets searchd exploit similarities between different full-text queries within a batch. Across all the separate (and different!) full-text queries in the batch, it identifies common parts; if there are any, the intermediate results are computed once, cached, and shared between queries. For example, consider this batch of 3 queries:

barack obama president
barack obama john mccain
barack obama speech

These share a two-word part ("barack obama") that can be computed exactly once for all three queries and cached. This is exactly what common subtree optimization does. The maximum cache size per batch is strictly limited by the subtree_docs_cache and subtree_hits_cache directives, so if a common part such as "i am" occurs in a hundred million documents, the server will not suddenly run out of memory.
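The caching idea can be illustrated with a toy inverted index in Python. This is a deliberate simplification: queries are plain AND lists of words, and only a shared leading run of words is detected, whereas searchd looks for common subtrees in the parsed query. The index contents and helper names are hypothetical.

```python
# Hypothetical inverted index: word -> IDs of documents containing it.
INDEX = {
    "barack": {1, 2, 3, 4, 5},
    "obama": {1, 2, 3, 4, 6},
    "president": {1, 2, 7},
    "john": {3, 8},
    "mccain": {3, 8},
    "speech": {4, 5},
}


def intersect(words):
    """Documents containing every word (an AND query)."""
    docs = set(INDEX.get(words[0], set()))
    for w in words[1:]:
        docs &= INDEX.get(w, set())
    return docs


def common_prefix(queries):
    """Longest run of leading words shared by all queries."""
    prefix = []
    for column in zip(*queries):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


def run_batch(texts):
    queries = [t.split() for t in texts]
    shared = common_prefix(queries) if len(queries) > 1 else []
    cached = intersect(shared) if shared else None  # computed once per batch
    results = []
    for words in queries:
        if cached is not None:
            docs = set(cached)  # reuse the shared intermediate result
            for w in words[len(shared):]:
                docs &= INDEX.get(w, set())
        else:
            docs = intersect(words)
        results.append(sorted(docs))
    return shared, results


texts = ["barack obama president", "barack obama john mccain", "barack obama speech"]
shared, results = run_batch(texts)
print(shared, results)  # ['barack', 'obama'] [[1, 2], [3], [4]]
```

The cached set is the analogue of what subtree_docs_cache bounds: a real server would also cap its size rather than keep it unconditionally.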

Let's go back to common query optimization. Here is a code example that runs the same query with three different sorting modes:

```php
require("sphinxapi.php");

$cl = new SphinxClient();
$cl->SetMatchMode(SPH_MATCH_EXTENDED2);

$cl->SetSortMode(SPH_SORT_RELEVANCE);
$cl->AddQuery("the", "lj");

$cl->SetSortMode(SPH_SORT_EXTENDED, "published desc");
$cl->AddQuery("the", "lj");

$cl->SetSortMode(SPH_SORT_EXTENDED, "published asc");
$cl->AddQuery("the", "lj");

// send all three queries in one batch
$res = $cl->RunQueries();
```

How can you tell whether the optimization worked? If it did, the corresponding lines of the query log will contain a "multiplier" field showing how many queries were processed together:

```
0.040 sec x3 the
0.040 sec x3 the
0.040 sec x3 the
```

Note the "x3": it means the query was optimized and processed as part of a batch of 3 queries (including this one). For comparison, here is the log when the same queries were sent one at a time:

```
0.059 sec the
0.091 sec the
0.092 sec the
```

With the multi-query, the search time for each query improved by a factor of 1.5 to 2.3, depending on the sorting mode. And this is not the limit: for both optimizations, there are known cases where speed improved 3 times or more, not in synthetic tests but in production. Common query optimization fits vertical search for products and online stores quite well, and the common subtree cache fits data mining queries; but, of course, applicability is not strictly limited to these areas. For example, you can run a search with no full-text part at all and compute several different reports (with different sorting, grouping, etc.) over the same data in one request.

What other optimizations can we expect in the future? That depends on you. So far, the long-term plan includes an obvious optimization for identical queries with different sets of filters. Do you know another common pattern that could be cleverly optimized? Send it in!

In the first part, we took a quick look at the installation and basic syntax of PINQ, a port of LINQ to PHP. In this article, we'll look at how to use PINQ to simulate faceted search on top of MySQL.

This article does not cover every aspect of faceted search; interested readers can find plenty of information online.

A typical faceted search works like this:

  • The user enters a keyword, or several keywords, to search. For example, “router” to search for products in which the word “router” appears in the description, keywords, category name, tags, etc.
  • The site returns a list of products that match these criteria.
  • The site offers several links for refining the search. For example, it may let you choose specific router manufacturers, set a price range, or filter by other features.
  • The user can keep adding search criteria to arrive at the data set they are interested in.

Faceted search is very popular and powerful; you can find it on almost any e-commerce website.

Unfortunately, faceted search is not built into MySQL. So what should we do if we still use MySQL, but want to give the user this opportunity?

With PINQ, whose approach is both powerful and simple, we can achieve the same behavior as with database engines that support faceted search natively.

Expanding the demo from the first part

Note: all code from this part and from the first part can be found in the repository.

In this article, we'll extend the demo from Part 1 with a significant improvement: faceted search.

Let's start with index.php by adding the following lines:

```php
$app->get("demo2", function () use ($app) {
    global $demo;
    $test2 = new pinqDemo\Demo($app);
    return $test2->test2($app, $demo->test1($app));
});

$app->get("demo2/facet/{key}/{value}", function ($key, $value) use ($app) {
    global $demo;
    $test3 = new pinqDemo\Demo($app);
    return $test3->test3($app, $demo->test1($app), $key, $value);
});
```

The first route leads to a page that shows all records matching the keyword search. To keep the example simple, we select all books from the book_book table. The page also displays the resulting data set and a set of links for refining the search criteria.

In real applications, after the user clicks such a link, all facet filters adjust to the boundary values of the resulting data set. The user can thus add new search conditions one after another, for example first selecting a manufacturer, then specifying a price range, etc.

In this example, however, we will not implement this behavior: all filters reflect the boundary values of the original data set. This is the first limitation of our demo and the first candidate for improvement.

As you can see in the code above, the actual functions are located in another file called pinqDemo.php. Let's take a look at the corresponding code that provides the faceted search feature.

Facet class

The first step is to create a class that represents a facet. In general, a facet should have several properties:

  • The data it operates on ( $data)
  • The key by which the grouping is performed ( $key)
  • The key type ($type), one of the following:
    • the full string, for an exact match
    • part of the string (usually the beginning), for pattern matching
    • a range of values, for grouping by range
  • The range ($range): if the key type is a range of values, the step that determines the lower and upper bounds of each range; if the type is part of a string, how many leading characters are used for grouping

Grouping is the most critical part of a facet: all the aggregated information a facet can return depends on the grouping criteria. The most commonly used criteria are "full string", "part of string", and "range of values".

```php
namespace classFacet {

    use Pinq\ITraversable,
        Pinq\Traversable;

    class Facet
    {
        public $data;  // original data set
        public $key;   // field to group by
        public $type;  // F: entire string; S: start of string; R: range
        public $range; // only relevant if $type != F

        ...

        public function getFacet()
        {
            $filter = "";

            if ($this->type == "F") {       // entire string
                ...
            } elseif ($this->type == "S") { // start of string
                ...
            } elseif ($this->type == "R") { // range of values
                $filter = $this->data
                    ->groupBy(function ($row) {
                        return floor($row[$this->key] / $this->range) * $this->range;
                    })
                    ->select(function (ITraversable $data) {
                        return ["key" => $data->last()[$this->key], "count" => $data->count()];
                    });
            }

            return $filter;
        }
    }
}
```

The main job of this class is to return a grouped data set based on the original data set and the facet properties. As the code shows, each key type uses a different way of grouping the data. The code above shows what grouping by a range of values, in steps of $range, might look like.
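For readers who want to experiment outside PHP, here is a rough Python analogue of the Facet class, plus a helper mirroring the article's getFacet() method, using plain lists and collections.Counter instead of PINQ. One deliberate difference from the PHP code: for range facets it reports the lower bound of each bucket as the key, rather than a value taken from the last row of the group. The sample books are hypothetical.

```python
import math
from collections import Counter


class Facet:
    """Rough Python analogue of the article's PHP Facet class (no PINQ)."""

    def __init__(self, data, key, type_, range_=None):
        self.data = data      # original data set: a list of dicts
        self.key = key        # field to group by
        self.type = type_     # "F": entire string, "S": start of string, "R": range
        self.range = range_   # prefix length for "S", bucket width for "R"

    def group_key(self, row):
        value = row[self.key]
        if self.type == "F":
            return value
        if self.type == "S":
            return value[: self.range]
        if self.type == "R":
            # Lower bound of the bucket, e.g. 7.99 -> 0 and 12.5 -> 10 for a width of 10.
            return math.floor(value / self.range) * self.range
        raise ValueError(f"unknown facet type: {self.type!r}")

    def get_facet(self):
        counts = Counter(self.group_key(row) for row in self.data)
        return [{"key": k, "count": n} for k, n in counts.items()]


def get_facets(data):
    # Same three facets as the article: author, title prefix, price range.
    facets = [
        Facet(data, "author", "F"),
        Facet(data, "title", "S", 6),
        Facet(data, "price", "R", 10),
    ]
    return {f.key: f.get_facet() for f in facets}


books = [  # hypothetical sample rows
    {"author": "Tolkien", "title": "The Hobbit", "price": 7.99},
    {"author": "Tolkien", "title": "The Silmarillion", "price": 12.50},
    {"author": "Herbert", "title": "Dune", "price": 9.99},
]
facets = get_facets(books)
print(facets["price"])  # [{'key': 0, 'count': 2}, {'key': 10, 'count': 1}]
```

The result has the same nested shape the template iterates over: facet name at the first level, a list of key/count pairs at the second.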

Setting up facets and displaying the source data

```php
public function test2($app, $data)
{
    $facet = $this->getFacet($data);
    return $app["twig"]->render("demo2.html.twig", array("facet" => $facet, "data" => $data));
}

private function getFacet($originalData)
{
    $facet = array();

    $data = \Pinq\Traversable::from($originalData);

    // create three different Facet objects and return the facets
    $filter1 = new \classFacet\Facet($data, "author", "F");
    $filter2 = new \classFacet\Facet($data, "title", "S", 6);
    $filter3 = new \classFacet\Facet($data, "price", "R", 10);

    $facet[$filter1->key] = $filter1->getFacet();
    $facet[$filter2->key] = $filter2->getFacet();
    $facet[$filter3->key] = $filter3->getFacet();

    return $facet;
}
```

In the getFacet() method we do the following:

Convert the original data into a Pinq\Traversable object for further processing

We create three facets. The 'author' facet groups by the author field, using the entire string; the 'title' facet groups by the title field, using part of the string (the first 6 characters); the 'price' facet groups by the price field, using ranges (in steps of 10).

Finally, we collect the facets and return them to the test2 function so they can be passed to the template for display.

We then display the raw data (along with the filters) in the template. This route uses the same template as "demo2".

Search Bar

```twig
{% for k, v in facet %}
<li>{{ k|capitalize }}
  <ul>
  {% for vv in v %}
    <li>{{ vv.count }} {{ vv.key }}</li>
  {% endfor %}
  </ul>
</li>
{% endfor %}
```

Keep in mind that the facets generated by our application are nested arrays. The first level is an array of all facets; in our case there are three of them (for author, title, and price, respectively).

Each facet is an array of key-value pairs, so we can iterate over it in the usual way.

Notice how we build the URLs for our links: we use both the outer loop key (k) and the inner loop key (vv.key) as parameters for the route ("demo2/facet/{key}/{value}"). The group size (vv.count) is displayed in the template.

The first image shows the original data set; the second is filtered by the $0 to $10 price range and sorted by author.
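The code of the filtering route (test3) is not shown in the article. As a hypothetical Python sketch of what such a filter could do: keep the rows whose field matches the clicked facet value according to the facet type, then sort, here by author as in the screenshot. URL parameters arrive as strings, so the range bound is converted to a number; the function name and sample rows are assumptions.

```python
def apply_facet(data, key, value, type_, range_=None, sort_by="author"):
    """Hypothetical filter behind a facet link; `value` comes from the URL as a string."""
    if type_ == "F":
        kept = [row for row in data if str(row[key]) == value]
    elif type_ == "S":
        kept = [row for row in data if str(row[key]).startswith(value)]
    elif type_ == "R":
        low = float(value)  # lower bound of the clicked bucket
        kept = [row for row in data if low <= row[key] < low + range_]
    else:
        raise ValueError(f"unknown facet type: {type_!r}")
    return sorted(kept, key=lambda row: row[sort_by])


books = [  # hypothetical sample rows
    {"author": "Tolkien", "title": "The Hobbit", "price": 7.99},
    {"author": "Tolkien", "title": "The Silmarillion", "price": 12.50},
    {"author": "Herbert", "title": "Dune", "price": 9.99},
]
cheap = apply_facet(books, "price", "0", "R", 10)
print([b["title"] for b in cheap])  # ['Dune', 'The Hobbit']
```

Because it always starts from `data`, this sketch shares the demo's limitation: filters are not stacked on an already filtered result.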

Great, we were able to simulate faceted search in our application!

Before finishing this article, we need to take a final look at our example and determine what can be improved and what limitations we have.

Possible improvements

Overall, this is a very basic example. We just walked through the basic syntax and concepts and implemented them as a working example. As mentioned earlier, several areas could be improved for greater flexibility.

We need to implement stacked search criteria: the current example only lets us filter the original data set, so we cannot apply faceted search to an already filtered result. This is the biggest improvement I can think of.

Restrictions

The facet search implemented in this article has serious limitations (which may also apply to other facet search implementations):

We fetch data from MySQL every time

This application uses the Silex framework. As with any framework that has a single entry point (Silex, Symfony, Laravel), its index.php (or app.php) file runs on every request: the route is parsed and the controller functions are executed.

If you look at our index.php, you will notice the following line of code:

$demo = new pinqDemo\Demo($app);

It runs every time a page of the application is displayed, which means the following code is executed every time:

```php
class Demo
{
    private $books = "";

    public function __construct($app)
    {
        $sql = "select * from book_book order by id";
        $this->books = $app["db"]->fetchAll($sql);
    }
}
```

Would it be better without a framework? Developing applications without a framework is generally not a good idea, but in any case we would run into the same problem: data (and state) are not preserved between HTTP requests. This is a fundamental characteristic of HTTP, and it can be worked around with caching.
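One way to avoid re-reading book_book on every request is a time-limited cache. The Python sketch below shows the idea with a hypothetical `TTLCache` class. Note that an in-process dictionary like this does not survive between PHP requests, so in practice the same pattern would sit on top of a shared store such as Memcached or Redis.

```python
import time


class TTLCache:
    """Minimal time-limited cache (a sketch; not tied to any PHP library)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so the expiry can be tested
        self.store = {}       # key -> (time stored, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]   # fresh: skip the database
        value = loader()      # missing or stale: reload and remember
        self.store[key] = (now, value)
        return value


# Usage with a fake clock and a loader that counts "database" hits.
fake_now = [0.0]
db_hits = [0]


def load_books():
    db_hits[0] += 1
    return ["The Hobbit", "Dune"]


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_load("books", load_books)
cache.get_or_load("books", load_books)   # served from the cache
fake_now[0] = 61.0
cache.get_or_load("books", load_books)   # expired, so the loader runs again
print(db_hits[0])  # 2
```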

Using facets did save us several SQL queries. Instead of one SELECT query to retrieve the data plus three GROUP BY queries with the corresponding WHERE clauses, we ran just one query and used PINQ to compute the aggregated information.
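The trade-off can be checked with Python's built-in sqlite3 module: one SELECT followed by aggregation in application code gives the same counts as a separate GROUP BY query per facet. The table mirrors the article's book_book, but the columns and rows here are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book_book (id INTEGER PRIMARY KEY, author TEXT, title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO book_book (author, title, price) VALUES (?, ?, ?)",
    [("Tolkien", "The Hobbit", 7.99), ("Tolkien", "The Silmarillion", 12.50), ("Herbert", "Dune", 9.99)],
)

# Approach 1: a single SELECT, then aggregate in application code (what PINQ does in PHP).
rows = conn.execute("SELECT author, price FROM book_book ORDER BY id").fetchall()
by_author_app = dict(Counter(author for author, _price in rows))
by_price_app = dict(Counter(int(price // 10) * 10 for _author, price in rows))

# Approach 2: one GROUP BY round trip per facet.
by_author_sql = dict(conn.execute("SELECT author, COUNT(*) FROM book_book GROUP BY author"))
by_price_sql = dict(conn.execute(
    "SELECT CAST(price / 10 AS INTEGER) * 10, COUNT(*) FROM book_book "
    "GROUP BY CAST(price / 10 AS INTEGER)"
))

print(by_author_app == by_author_sql, by_price_app == by_price_sql)  # True True
```

Approach 1 costs one round trip regardless of the number of facets, but transfers every row; approach 2 transfers only counts. Which is cheaper depends on table size.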

Conclusion

In this part, we implemented faceted search over a collection of books. As I said, this is just a small example with room for improvement and a number of limitations.

Faceted navigation is a way of structuring a site so that users can specify different facets (desired parameters) to find the product or service they are looking for.

It lets online store visitors easily navigate the range of products or services on offer and quickly arrive at what they are looking for, with each user following their own path.

The best way to demonstrate the principle of faceted navigation is with a specific example.

For example, suppose you want to buy a mobile phone in an online store, and you are looking for a certain model, color, price, and brand. It is easier and faster to find what you need by narrowing the search using several or all of these parameters (facets).

Such flexibility of the site structure allows you to easily create landing pages for individual keywords.

This may seem simple enough on paper. In practice, everything is much more complicated.

Let's consider the main difficult questions.

1. How many facets are needed for your site to be indexed well?

Ideally, the "depth" of a facet should not exceed 100 items, so that search robots can index every page of the site. Most SEO specialists believe that search robots do not process more than 100 links on a single page. Since most sites have navigation links on every page anyway, the number of product links on a single page should not exceed 100.

2. Facets and search filters

Your site may have options that are useful for visitors but not important from an SEO perspective. For example, selecting products by size is very convenient for visitors, but you may not want those pages indexed. In this case, implement such filters with JavaScript and block the corresponding internal pages from indexing.

3. Sorting

You may want to offer extra sorting options (for example, by price or popularity). This is very convenient for customers, but it risks creating duplicate content. If you don't want the same page indexed multiple times via different navigation paths, use JavaScript or Ajax.

4. The problem of duplicate content

With a faceted site structure, the problem of duplicate content arises due to the presence of different navigation paths to the same page. And if you are not careful about this issue, you will end up with the same content on several pages.

The navigation path a visitor uses to find a specific product does not matter; what matters is that only one of the paths is indexed. Configure your CMS accordingly; otherwise the same page will be indexed more than once.
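One common technique, sketched here in Python, is to derive a single canonical URL from any facet combination (keep only indexable parameters, in a fixed order) and reference it from each variant page via `<link rel="canonical">`. Which parameters count as indexable (`INDEXABLE`) is a hypothetical site policy, and the URLs are examples.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site policy: only these facet parameters produce indexable pages.
INDEXABLE = {"brand", "color"}


def canonical_url(url):
    """Map every navigation path to one URL: drop non-indexable params, fix the order."""
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k in INDEXABLE)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


a = canonical_url("https://shop.example/phones?color=black&brand=acme&sort=price")
b = canonical_url("https://shop.example/phones?brand=acme&color=black")
print(a)  # https://shop.example/phones?brand=acme&color=black
```

Dropping the sort parameter also covers the sorting-induced duplicates mentioned earlier: every sort order of a listing maps to the same canonical page.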

And once again about the uniqueness of the content...

Suppose you've built sensible faceted navigation with relevant pages for each keyword or phrase, yet your site still contains many similar pages whose content is just product listings. Each page must therefore have its own unique content, and the more important the page, the more unique its content should be.

So here's what to remember:

  1. create as many facets as necessary to place no more than 100 products on one page;
  2. make sure that for each key phrase, for which you want to rank in search engines, there is its own landing page;
  3. improper sorting can lead to duplicate content; to avoid it, use Ajax and JavaScript and close some internal pages from indexing;
  4. no matter which navigation path the user uses to find a particular page, only one page should be indexed;
  5. do not forget: the content should be interesting and engaging.

NNGroup (Nielsen Norman Group) are world-renowned experts in usability and UX. Every few years they study how well search works on e-commerce websites and share the results on their blog. The latest study was conducted in 2017. We read the article describing it, translated it, and drew practical conclusions that will help you improve search on your own website.

Search algorithms

Support the "quotes" advanced search operator

NNGroup writes that most online store visitors do not know how to use advanced search operators. If they want to find a cat toy, they won't search for "cat AND toy" to see all products whose description contains both keywords. So there is no need to support such complex queries.

Quotation marks are the only exception: when a phrase is enclosed in quotes, the search looks for an exact match of the whole phrase. Google supports this operator, and it is widely known among advanced Internet users.
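A minimal Python sketch of quote support, assuming simple whitespace tokenization: quoted phrases must appear as consecutive words, while unquoted words are returned separately for ordinary matching and ranking. The function names and sample products are hypothetical.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases (as word lists) and loose words."""
    phrases = [p.lower().split() for p in re.findall(r'"([^"]+)"', query)]
    loose = re.sub(r'"[^"]*"', " ", query).lower().split()
    return phrases, loose


def contains_phrase(tokens, phrase):
    """True if `phrase` occurs in `tokens` as a run of consecutive words."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def phrase_filter(products, query):
    """Keep products that contain every quoted phrase as consecutive words."""
    phrases, _loose = parse_query(query)
    return [p for p in products
            if all(contains_phrase(p.lower().split(), ph) for ph in phrases)]


products = ["Red cat toy", "Toy mouse for a cat", "Cat scratching post"]
print(phrase_filter(products, '"cat toy"'))  # ['Red cat toy']
```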

Automatically sort search results by degree of match with your query

In search results, products that match all or most of the query keywords should appear first.

Example. In previous studies, users of The Container Store website complained about inaccurate search results on the site. One user wanted to purchase a set of stainless steel storage containers with a clear lid. Upon requesting “steel glass container,” he received toilet brushes and glass jars. The user had to reformulate the search query several times, but without success.

The problem with the site's search engine was that the results included every product matching at least one query word ("steel", "glass", or "container"), without sorting by how closely they matched the original query. A product matching all three keywords could be anywhere in the list, not necessarily at the top. The site later updated its search algorithm: products matching all or most of the query keywords now appear at the top of the results.
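The fix described above amounts to ranking by the number of query words a product matches. A Python sketch with a hypothetical catalog; real search engines also weigh word positions, fields, and synonyms.

```python
def rank(products, query):
    """Order products by how many distinct query words they contain; drop non-matches."""
    words = set(query.lower().split())
    scored = [(len(words & set(p.lower().split())), p) for p in products]
    # sorted() is stable, so ties keep their catalog order.
    return [p for hits, p in sorted(scored, key=lambda s: -s[0]) if hits > 0]


catalog = [  # hypothetical catalog
    "stainless steel toilet brush",
    "glass jar",
    "steel glass storage container",
    "plastic container",
    "wooden spoon",
]
print(rank(catalog, "steel glass container")[0])  # steel glass storage container
```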

Improved search results on containerstore.com: the first result for steel and glass canister matches the user's needs

When sorting by product rating, use a weighted rating rather than the simple average

When sorting by average customer rating, users don't want to see products with just one rating at the top, even if it's 5 stars. People are wary of paid reviews, and an average based on a couple of reviews makes them suspicious. When sorting by weighted rating, a product with an average of 4.9 out of 5 from 342 reviews ranks higher than a product with an average of 5 out of 5 from 3 reviews. This gives users an objective picture of the product's popularity and quality.
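The study does not say which weighting formula to use. One common choice, shown here as an assumption, is a Bayesian average that pulls ratings with few reviews toward a prior mean: (C·m + n·r) / (C + n), where r is the product's average rating, n its number of reviews, m a prior mean, and C the prior weight. With m = 3.5 and C = 10 (hypothetical values), the 4.9-star product with 342 reviews scores about 4.86, while the 5-star product with 3 reviews scores about 3.85.

```python
def weighted_rating(avg, n, prior_mean=3.5, prior_weight=10):
    """Bayesian average: few reviews => pulled toward prior_mean (assumed values)."""
    return (prior_weight * prior_mean + n * avg) / (prior_weight + n)


products = [("A", 5.0, 3), ("B", 4.9, 342)]  # (name, average rating, review count)
by_weighted = sorted(products, key=lambda p: weighted_rating(p[1], p[2]), reverse=True)
print([name for name, _, _ in by_weighted])  # ['B', 'A']
```

A larger prior weight demands more reviews before a product can climb; in practice, m is often set to the catalog-wide average rating.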

Design and position of the search bar

Display the search bar in one block with the navigation menu in the site header

This placement is common to many sites, so users already know where to look. In addition, putting the search bar in the same block as the navigation menu solves several problems, such as the bar missing from some pages and the need to repeat it separately on the search results page.

On the Wildberries website, a large, clearly visible search bar sits right in the site header

Show the search bar and a magnifying glass icon on screen

When visitors to online stores want to search, they look for a wide empty field or a magnifying glass icon. It is no longer necessary to label the field "Search", although it won't hurt.

Many mobile sites successfully show only the magnifying glass icon without the field itself, which saves screen space. But if your site's sales depend on search, it's better to show the search bar right away, even on small screens. This is especially true for desktop versions of websites, where there is more than enough screen space. Use an empty field with a "Find" button or a magnifying glass icon, and make it visible on every page.

Narrow your search results

Don't use advanced search or category search unless you're Amazon

In the past, many online stores used advanced search and category search features to help users narrow down the number of items in their search results. However, people don't really use advanced search and often get confused when searching by category, so these features have gradually fallen out of fashion.

Such advanced search methods are now only available on those sites where they are truly useful. These are either sites with special search scenarios, like eBay, or online stores with a huge number of products, such as Amazon and Wal-Mart.

In other cases, faceted search is the better choice. Its key difference from category search is that users narrow down the products AFTER they get results for a query, not BEFORE.

Use faceted search

Faceted search allows users to narrow down their search results using filters based on the attributes of the products users are viewing. If previously faceted search was a nice addition to an online store, now users are so accustomed to it that they look for it on the site and express dissatisfaction if it is not there. Nowadays, e-commerce sites without faceted search are the exception rather than the rule.

Faceted search on the website of the Utkonos online store: filters on the left allow you to narrow down the results

Autocomplete in the search bar

Support autocomplete feature

With autocomplete, as the user types in the search bar, suggested queries appear in a drop-down list. If one of them fits, this saves the user time and helps avoid typos and other errors.
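A minimal prefix-matching sketch in Python, assuming a list of past queries as the suggestion source. Production autocomplete usually ranks suggestions by popularity rather than alphabetically, as this sketch does; the class name and sample queries are hypothetical.

```python
import bisect


class Autocomplete:
    """Prefix suggestions from a sorted list of past queries (hypothetical data source)."""

    def __init__(self, queries):
        self.queries = sorted({q.lower() for q in queries})

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        start = bisect.bisect_left(self.queries, prefix)  # first entry >= prefix
        out = []
        for q in self.queries[start:]:
            if len(out) == limit or not q.startswith(prefix):
                break  # sorted order: once the prefix stops matching, nothing later matches
            out.append(q)
        return out


ac = Autocomplete(["iPhone case", "iphone 8", "iPad", "iPod nano", "headphones"])
print(ac.suggest("iph"))  # ['iphone 8', 'iphone case']
```

Binary search keeps each lookup cheap even for a large query log, since only the matching slice is scanned.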

The autocomplete feature was present on most of the sites NNGroup studied. However, the study showed that users selected a suggestion relatively rarely, in only 23% of cases; typically, they just kept typing their query.

However, autocompletion is useful. Even if users don't select an option from the list, they can see and understand what products are available on the site and what other shoppers are looking for.

Support advanced autocompletion

Advanced autocomplete, which shows recommended products, photos, and other content in addition to query suggestions, is a trend gaining popularity on some e-commerce sites. It appeared about five years ago, quickly disappeared, and has now returned in a new form reminiscent of a mega menu: the drop-down with suggestions takes up a significant amount of screen space.

Search with advanced autocompletion in the Labyrinth online store

NNGroup's research has shown that this feature works best on sites with a variety of product categories or products that are visually very different from each other.

Basic search problems

Key problems that make searching on the site difficult:

  • an inconspicuous search function: for example, hidden behind a small magnifying glass icon on a large screen, or inside the hamburger menu of the mobile site;
  • an insufficiently “smart” search string that is unable to handle typos, errors or synonyms for query keywords;
  • non-standard display of results (switching between pages, sorting, filtering);
  • poorly thought out filters (irrelevant attributes, poor functionality, empty results).

How USABILITYLAB can help improve search on your online store website

This concludes our analysis of the NNGroup article. We hope it was useful to you.

If you would like to evaluate or improve search on your site, please contact us. For usability testing, we recruit representatives of your target audience, who work with your website under the supervision of our expert. Our lab is equipped with a one-way mirror, so you can also attend the testing and see everything the participants do. Based on the results, we assess how well your site's search meets users' needs and formulate recommendations for improvement that you can pass on to your developers.

To learn more about our services, leave a request on our website or write to Dmitry Silaev: