Query selection and clustering. Automatic clustering (grouping) of keywords

Query clustering splits a list of semantic core (SC) queries into groups based on similarity, which makes it possible to go on and optimize site pages for them.

How are queries clustered?

The tool analyzes the Yandex search results for each query and compares them with the results for the other queries in the list. If the same relevant pages appear in the TOP 10 for different queries, those queries are considered similar and placed in one group. This means a single page can be optimized for all of them.

The query clustering threshold is the number of matching relevant pages in the search results for different queries. Simply put, if you enter two queries into Yandex and the TOP 10 results contain two identical pages (two out of ten), then with a clustering threshold of 2 these two queries will be placed in one group.
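The threshold rule is easy to express in code. A minimal sketch (the function name and the sample URLs below are invented for illustration, not part of any real tool):

```python
def in_same_cluster(serp_a, serp_b, threshold=2):
    """True if the two TOP-10 SERPs share at least `threshold` URLs."""
    shared = set(serp_a) & set(serp_b)
    return len(shared) >= threshold

# Hypothetical TOP-10 fragments for two queries:
serp_q1 = ["site-a.ru/page1", "site-b.ru/x", "site-c.ru/y"]
serp_q2 = ["site-a.ru/page1", "site-d.ru/z", "site-b.ru/x"]

print(in_same_cluster(serp_q1, serp_q2, threshold=2))  # → True (2 shared URLs)
```

Raising `threshold` makes the grouping stricter: with a threshold of 3 these two queries would land in different groups.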

Disadvantages of manual query grouping

Grouping of key queries, also known as splitting the core, is performed by SEO specialists right after keyword collection.

  1. With a large number of queries it is difficult to judge their similarity manually: you either have to enter each query into the search or rely on intuition and experience, which can play a cruel joke during promotion and fail to deliver the desired results.
  2. High cost, driven by how long the process takes. A high-quality breakdown of a 500-query semantic core takes 4-16 hours on average. You have to read each query, determine its group (keeping the full list of groups in your head), and double-check against search results or services when necessary... brrrr.

Advantages of automatic query grouping

  1. The speed of the breakdown is roughly the speed of sound. The system checks the search results for each query, compares them, and lets you fix minor exceptions manually, after which the result can be exported to a CSV file (Excel).
  2. Accuracy is achieved by eliminating the human factor. A person can get distracted, lose a train of thought, forget, misunderstand, or simply fail to do the breakdown correctly; a program has no such difficulties.
  3. The tool is completely free; it does not require a monthly salary, vacations, or sick leave, and it has no work schedule: it works 24/7.

Breakdown is a very important process during promotion; it sets goals for optimizing each page of the project and the entire site as a whole.

Grouping takes keywords that are simply a list and divides them into clusters (groups). This is what turns a thousand queries into a coherent structure broken down into categories, pages, articles, etc. Without a correct breakdown you will waste a lot of money and time "idling", since some queries cannot be "landed" on one page, while others, on the contrary, must share the same URL.

When collecting a semantic core (SC), I usually do the clustering by hand; here are some links on the topic:

But all this is easy and simple when we have clear groups of queries with different logical meanings. We know very well that for the query “Stroller for twins” and “Stroller for boy” there must be different landing pages.

But there are queries that are not clearly separated from each other, and it is difficult to determine by "feel" which queries should go on one page and which should be spread across different landing URLs.

One of the participants in my SEO marathon asked me a question: “Petya, what to do with these keys: put everything on one page, create several, if so, how many?” And here is an excerpt from the list of keywords:

The word "java" alone is searched in three spelling variants, and on top of that people look for it for different games, devices, etc. There are a lot of queries there, and it is genuinely hard to see what is best to do.

What do you think is correct? Right. The best approach is to analyze competitors who are already in the TOP for these keywords. Today I will tell you how you can cluster the semantic core based on data from competitors.

If you already have a ready-made list of keywords for clustering, you can immediately move on to point 4.

1. Query matrix

Let me take another example: I have one client with an online store of electrical and lighting equipment. The store has a very large number of products (several tens of thousands).

Of course, any store has products that are the highest priority to sell. These products may have high margins, or the stock simply needs to be cleared from the warehouse. So I received a letter along the lines of: "Petya, here is a list of products that interest us." The list included:

  • switches;
  • lamps;
  • light fixtures;
  • spotlights;
  • extension cords;
  • and a few more points.

I asked to create a so-called “query matrix”. Since the store owners know their product range better than me, I needed to collect all the products and the main characteristics/differences of each product.

It turned out something like this:

When compiling the matrix, do not forget that some English-language brands are also requested in Russian; this must be taken into account and added.

Of course, if the product had other characteristics, a column was added. This could be “Color”, “Material”, etc.

And such work was done for the highest priority goods.

2. Multiplying queries

There are many services and programs for multiplying queries. I used this key phrase generator http://key-cleaner.ru/KeyGenerator, we enter all our queries there in columns:

The service multiplied every possible option with the word "extension cord". Important: many generators multiply only consecutive columns, that is, column 1 with column 2, then the first two with the third, and so on. This one multiplies the first column with all the others: the first with the second, then the first with the third and fourth; then first*second*third, first*second*fourth, etc. That way we get the maximum number of phrases containing the main word from the first column (the so-called marker).

A marker is the main phrase from which a key must be generated. Without a marker it is impossible to create a meaningful key query; we have no use for phrases like "IEC wholesale" or "buy on reel".

When multiplying, it is important that every key phrase contains this marker. In our example it is the phrase "extension cord". As a result, 1439 (!) unique key phrases were generated in this example:
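The marker-anchored multiplication described above can be sketched in a few lines of Python (the column contents here are invented for illustration; the real generator at key-cleaner.ru may combine columns differently):

```python
from itertools import combinations, product

def multiply_queries(marker_col, *other_cols):
    """Cross the marker column with every combination of the other
    columns, so each generated phrase contains the marker word."""
    phrases = set()
    for r in range(len(other_cols) + 1):          # pick 0..n extra columns
        for cols in combinations(other_cols, r):
            for combo in product(marker_col, *cols):
                phrases.add(" ".join(combo))
    return sorted(phrases)

keys = multiply_queries(["extension cord"],
                        ["buy", "price"],
                        ["IEC", "Universal"])
# 1 + 4 + 4 = 9 unique phrases, each containing "extension cord"
```

Note how every phrase keeps the marker: a generator that multiplied the other columns with each other would also emit junk like "buy IEC".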

3. Clearing requests from "garbage"

Now there are 2 ways events can unfold. You can cluster all these queries and generate a huge number of pages, one per cluster, if your site engine allows it. Of course, each page should have its own unique meta tags, h1, and so on. And sometimes getting such pages into the index is a problem in itself.

We didn’t have such a possibility technically, so we didn’t even consider this option. It was necessary to create only the most necessary new landing pages in a “semi-manual” mode.

Which frequency type should I work with? Since our list of products and their intersections were not very popular (narrowly targeted), I focused on the quoted frequency (without exclamation marks), that is, the phrase in its various word forms: different cases, number, gender, declension. This is the indicator that lets us roughly estimate the traffic we could get from Yandex by reaching the TOP.

In Key Collector we collect the quoted frequencies for these phrases (naturally, if your product is seasonal, collect the frequencies in season):

Then we delete everything with a frequency of zero. If your topic is more popular and many words have non-zero frequency, you can raise the lower threshold to 5 or even higher. Out of 1439 phrases, I was left with only 43 non-zero queries for Moscow and the region.

I transfer these 43 phrases with frequency data into Excel:

4. Query clustering

I do all this in Rush Analytics, here is the clustering algorithm in this service:

For each query, the TOP-10 URLs for the given region are "pulled" from the search results. Clustering then proceeds by shared URLs. You can set the clustering accuracy yourself (from 3 to 8 shared URLs).

Let's say we set the accuracy to 3. The system remembers the URLs of the pages in the TOP 10 for the first query. If the TOP 10 for the second query in the list contains the same 3 URLs that the first one had, the two queries fall into one cluster. The number of shared URLs required depends on the accuracy we specify. This processing is done for every query, and as a result the keywords are divided into clusters.
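A toy version of that algorithm, assuming we already have the TOP-10 URLs for each query (the data and the function name are invented; Rush Analytics' actual implementation is certainly more involved):

```python
def cluster_queries(serps, accuracy=3):
    """Greedy soft clustering: a query joins a cluster when its TOP-10
    shares at least `accuracy` URLs with the cluster's first query."""
    clusters = []                      # each item: (seed_urls, [queries])
    for query, urls in serps.items():
        urls = set(urls)
        for seed_urls, queries in clusters:
            if len(seed_urls & urls) >= accuracy:
                queries.append(query)
                break
        else:
            clusters.append((urls, [query]))
    return [queries for _, queries in clusters]

serps = {  # shortened, made-up TOP results
    "finish ceiling apartment": ["a.ru", "b.ru", "c.ru", "d.ru"],
    "ceiling finishing price":  ["a.ru", "b.ru", "c.ru", "e.ru"],
    "finish ceiling bathroom":  ["x.ru", "y.ru", "z.ru", "w.ru"],
}
groups = cluster_queries(serps, accuracy=3)
# → [['finish ceiling apartment', 'ceiling finishing price'],
#    ['finish ceiling bathroom']]
```

The first two queries share 3 URLs and merge; the bathroom query shares none and gets its own cluster, exactly the behavior described above.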

  1. Go to RushAnalytics -> Clustering, create a new project (upon registration, everyone receives 200 rubles in their account for testing, convenient):
  2. Choose your priority search engine and region:

  3. Select the clustering type. In this case I choose "Wordstat". The "manual markers" method does not suit me here, since the queries contain only one marker ("extension cord"). If you are loading several different product types at once (for example, extension cords, light bulbs, etc.), it is better to select the "Wordstat + manual markers" type and flag the markers (markers get a 1 in the second column, non-markers a 0, and the frequency goes in the third column). Markers are the most basic queries, not logically connected to one another (the queries "extension cord" and "light bulb" cannot fit on one page). In my case I work through each product step by step and created separate campaigns for convenience. You also select the clustering accuracy here. If you don't yet know which method to choose, you can tick them all (this does not affect the price) and then, once you get the results, pick the option that clustered your queries best. From experience, accuracy = 5 is the most suitable across topics. If you are clustering for an existing site, I recommend entering your site's URL (if your site is in the TOP 10 for a query, your URL will be highlighted in green in the resulting file):

  4. In the next step, upload the file to the system. You can also set up stop words, but my file had none, so this function was not needed in this example. Clustering costs from 50 down to 30 kopecks per query (depending on volume):
  5. You will need to wait a little while the Rush Analytics service does its job. Enter the completed project. Already there you can view the clusters based on the clustering accuracy (the beginning of a new cluster and its name are highlighted in bold):
  6. Again, accuracy 5 is best for clustering; it fits most cases.
  7. Also in the next tab you can see a list of non-clustered words:

    Why didn't they cluster, you ask? Most likely the results for these queries are not of very high quality, and the system could not automatically assign them to any cluster. What to do with them? You can cluster them manually and create separate landing pages where it makes logical sense. You can even make a separate cluster out of a single query and "land" it on its own page. Or you can expand the word list and re-run clustering in Rush Analytics.
  8. In the "Subject Leaders" tab you can see the TOP domains for these queries:

  9. By the way, next to some queries you can see marks like this, highlighted in green:
    This means that according to these requests, you already have a landing page for this cluster in the TOP 10 and you need to work on it.
  10. You can download this whole thing to your computer in Excel and work in this document. I work with precision 5, so I download this file:

  11. The Excel document contains the same information. The beginning of each cluster and its name are highlighted in gray (click the image to enlarge):

  12. In addition to the cluster names, you will see their sizes, frequencies, total frequencies, top URLs, relevant URLs, and highlighted words, which are very useful when working on a landing page. Here they are:

    Please note that the "Universal" brand spelled with a "U" is also highlighted; I didn't even suspect the brand could be written that way. Among the highlighted words you will also see synonyms and thematic phrases that are highly desirable to use on landing pages to reach the TOP.

Conclusion

What's next? What does this clustering give us? Now each cluster should have its own separate and, most importantly, relevant URL on our site. Promoting those pages is entirely in our hands, and we push them further as best we can (content optimization, internal linking, external optimization, social signals, etc.).

If we had clustered incorrectly, many queries would be hard to promote. That would be an "anchor" holding us back, even as we spent a ton of money promoting those pages.

Correct clustering will help you save a lot and make it much easier to get into the coveted TOP.

What do you think about it? How do you cluster semantic core queries?

Now let's talk about clustering key search queries. Grouping errors will cost you valuable time, money, and other problems. In this article I want to explain the main principles and rules of grouping, and show examples of services and programs.

Keyword Clustering

I highlight 2 main points when grouping:

  1. queries must fit together logically
  2. queries must show the same results in Yandex

From a logical point of view, everything is clear - you cannot put the keys “buy a phone” and “car painting in Omsk” on one page. One way or another, the requests must fit each other in meaning. If we have a page about finishing ceilings in an apartment, then all requests should be about finishing ceilings.

With the SERP check, things are less obvious. The essence is this: we enter the queries into Yandex in incognito mode, select the promotion region, and see how much the search results overlap.

Let’s say there are 2 requests “finishing ceilings in an apartment” and “finishing ceilings in a bathroom,” you need to understand whether these keys will fit on one page or not. Open 2 windows in Yandex and enter these queries.

It is immediately clear that in the first case it is clearly stated about finishing the ceilings in the apartment, and in the second case - in the bathroom. This means that the requests lead to different pages and cannot be combined.

Here’s another example: the phrases “buy heating batteries” and “buy heating radiators.” It seems that the requests are different, but let's see the results.

As you can see, the output is the same - both batteries and radiators are present. Therefore, these 2 requests can be safely placed on one page.

Programs and services for keyword clustering

Clustering a semantic core in Excel is quite simple: put all the queries into a spreadsheet and group them by hand, using the principle I described above. That is, first group by meaning, then check the Yandex results.

It does happen, though, that the results for two or more queries are "murky" and it is unclear whether to place them together or separately. That means competition is low and the SERP has not clearly formed, so it will not be a mistake to put the queries on one page or on different ones, whichever is more convenient for you.

Here is an example of semantic core clustering in Excel.

I often use this method myself when the topic is not complex and there are not many keywords; 100-200 keywords are quite manageable.

Watch the video on how to cluster a kernel in Excel.

As an alternative to Excel, you can also use the free online service for manual clustering, kg.ppc-panel.ru.

Automatic clustering

If the semantic core is very large, then I use the service for automatic clustering of search queries seopult.ru. This is a VERY cheap service compared to analogues.

Its only drawback is that the grouping is not entirely accurate: you will still need to review the clustering and fix shortcomings manually.

Then again, I don't think there is a single service that does 100% correct grouping. Even companies that specialize in collecting and clustering semantics still check and edit the final result manually.

Here is a short overview of setting up a project.

The service will calculate how much kernel clustering costs and offer to launch the project. This is the paid grouping option that I use, and it suits me quite well.

And here is a detailed video on how to use the tool:

Clustering queries in Key Collector

This method is also quite widely used, but as elsewhere, it still needs to be modified manually.

Load the semantic core into the program and select the promotion region.

Today's episode of "On the Board" is about semantics and structuring keywords for a site.

It covers what clustering of the semantic core is, why you need to cluster, and how to do it.

Oleg Shestakov, founder of Rush Analytics, talks about it.

The video turned out to be quite voluminous. It contains the main nuances associated with clustering.

Let's move on to watching the video:

Photo from the board:

Important: If you have questions, feel free to ask them in the comments. Oleg will be happy to answer them.

Video transcript

1. What is clustering?

Clustering using the top similarity method is a grouping of keywords based on an analysis of search engine results. How does this happen?

  • We take two queries, for example, “lip gloss” and “buy lip gloss.”
  • We collect the search results for each query, save 10 URLs from each SERP, and check whether the two SERPs have any URLs in common.
  • If there are at least 3-5 (depending on the clustering accuracy that we specify), then these requests are grouped.

2. Why do clustering?

Why has the clustering trend been on the market for about a year and a half now? Why is this important and how will it help?

  • Save time. Clustering is a wonderful technology that cuts down the routine of grouping a semantic core. If an ordinary semantics specialist takes about 2-3 weeks (or even more if the semantics are complex) to split 100,000 keywords into groups, a clusterer can do a first pass in about an hour.
  • Avoid the mistake of promoting mismatched queries on one page. Yandex has classifiers that evaluate whether a query is commercial. The results for informational and commercial queries are completely different. The queries "lip gloss" and "buy lip gloss" can never be promoted to the same page.

1) For the first request (“lip gloss”) there are information sites (irecommend, Wikipedia). An information page is needed for this request.

2) For the second request (“buy lip gloss”) - commercial resources, well-known online stores. This request requires a commercial page.

That is, different queries require different types of pages. A common optimizer mistake is promoting everything together on one page: half the semantic core makes it into the TOP 10 and the other half never can. The clusterer helps you avoid such errors.

To prevent this from happening, you must initially correctly group requests by type of page in the search results.

3. How does clustering help in promotion?

  • data processing speed,
  • classification of pages for which promotion is done.

If the site structure is properly grouped and internal optimization is done correctly, that is already half the battle on the Russian market. Naturally, Western markets will also require links. In our experience, with proper clustering and proper text optimization, around 50-60% of queries simply reach the TOP without any external intervention. For online stores or classifieds (aggregators and portals), texts are often not even needed.

Clustering is the key to correct ranking. There is no point in fighting the search engines' ranking; it is easier to adapt to it, build the required types of pages, and promote successfully. Changing the ranking paradigm of a given topic is more unrealistic than realistic.

4. What are the clustering methods? (Hard/Soft)

Soft clustering is what was described earlier. A marker query for some online store category is taken, other queries are linked to it, and their SERPs are compared: "buy lip gloss", "buy lip gloss in Moscow", "buy lip gloss prices" each share 4-5 URLs with the main query.

These queries get linked to it. The check ends there: we get a cluster of keywords and can promote it.

But there are more competitive topics, for example plastic windows. Here you need to check that all the queries tied to the main one can also be promoted together with each other.

We need to compare whether the SERPs for these queries contain the same URLs: we compare the results not only with the main query but also with one another, and group only those queries that can all be linked to each other.

For most cases, Soft clustering is sufficient. These are online stores (not very competitive categories), information resources.
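The difference between Soft and Hard comes down to one check: in the Hard variant a candidate query must share enough URLs with every query already in the cluster, not just the marker. A sketch (function names and data are illustrative only):

```python
def can_join_soft(marker_urls, candidate_urls, threshold=3):
    """Soft: compare the candidate against the marker query only."""
    return len(set(marker_urls) & set(candidate_urls)) >= threshold

def can_join_hard(cluster_serps, candidate_urls, threshold=3):
    """Hard: the candidate must match EVERY query already in the cluster."""
    return all(len(set(urls) & set(candidate_urls)) >= threshold
               for urls in cluster_serps)

cluster = [{"a", "b", "c", "d"}, {"a", "b", "c", "x"}]
print(can_join_hard(cluster, {"a", "b", "c", "q"}))  # → True
print(can_join_hard(cluster, {"a", "b", "y", "z"}))  # → False (only 2 shared with the first)
```

Hard clustering produces smaller, tighter clusters, which is why it is reserved for highly competitive topics.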

5. Clustering in Rush Analytics

We have a clustering module and 3 types of clustering:

  • According to Wordstat. The simplest and least time-consuming method from an optimizer's point of view. Ideal for situations when we know almost nothing about the structure of the site.

1) In Excel, load the keywords into one column and their Wordstat frequency into another, then send the file for clustering.

2) We sort the entire list in descending order: the most frequent words (usually the shortest) are at the top.

3) The algorithm works like this: take the first word, try to link all the other words to it, and form a group. Cut out everything that attached, re-sort the rest, and repeat the iteration.

4) From the list of keywords we get a set of clusters.
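Steps 1-4 can be sketched roughly like this (the frequencies and SERPs are made up; the real Rush Analytics algorithm is of course more elaborate):

```python
def wordstat_clustering(freqs, serps, accuracy=3):
    """Take the most frequent remaining phrase as a seed, attach every
    phrase sharing >= `accuracy` URLs with it, cut the group out, repeat."""
    remaining = sorted(freqs, key=lambda p: -freqs[p])  # frequency, descending
    clusters = []
    while remaining:
        seed_urls = set(serps[remaining[0]])
        group = [remaining[0]] + [q for q in remaining[1:]
                                  if len(seed_urls & set(serps[q])) >= accuracy]
        clusters.append(group)
        remaining = [q for q in remaining if q not in group]
    return clusters

freqs = {"lip gloss": 1000, "buy lip gloss": 500, "car paint": 300}
serps = {"lip gloss":     ["a", "b", "c", "d"],
         "buy lip gloss": ["a", "b", "c", "e"],
         "car paint":     ["x", "y", "z", "w"]}
print(wordstat_clustering(freqs, serps))
# → [['lip gloss', 'buy lip gloss'], ['car paint']]
```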

By markers

Suitable for sites where the structure is defined. Works very well in e-commerce (for example, online stores).

1) We know the marker request (the main request of the page or several requests under which it is promoted).

2) We take the list of keywords and, in the column on the right, mark the marker queries with ones and all other queries with zeros.

3) We take a marker keyword, try to link the other keywords to it, and group them into clusters. Importantly, in this algorithm the marker words flagged with ones are never linked to each other; we don't even try to combine them.
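A sketch of the marker algorithm, with the key property that flag-1 markers are never merged with one another (names and data are illustrative, not the service's real format):

```python
def marker_clustering(keywords, serps, accuracy=3):
    """keywords: list of (phrase, flag) where flag 1 marks a page marker.
    Flag-0 phrases attach to the first marker with enough shared URLs;
    markers themselves are never merged with one another."""
    markers = [k for k, flag in keywords if flag == 1]
    clusters = {m: [m] for m in markers}         # one cluster per marker
    unclustered = []
    for k, flag in keywords:
        if flag == 1:
            continue
        for m in markers:
            if len(set(serps[m]) & set(serps[k])) >= accuracy:
                clusters[m].append(k)
                break
        else:
            unclustered.append(k)
    return clusters, unclustered

keywords = [("extension cord", 1), ("light bulb", 1),
            ("buy extension cord", 0), ("cheap reel", 0)]
serps = {"extension cord":     ["a", "b", "c", "d"],
         "light bulb":         ["p", "q", "r", "s"],
         "buy extension cord": ["a", "b", "c", "e"],
         "cheap reel":         ["m", "n", "o", "t"]}
clusters, rest = marker_clustering(keywords, serps)
# clusters → {'extension cord': ['extension cord', 'buy extension cord'],
#             'light bulb': ['light bulb']};  rest → ['cheap reel']
```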

Combined clustering

This algorithm combines the previous two:

1) We load the keywords, the marker/non-marker flag, and the frequency.

2) We bind all the words that we can bind to marker queries.

3) We take keywords that remain unlinked and group them together using Wordstat.

4) Everything else will be classified as “non-clustered”.

5) The result is the structure we already know, plus automatic clustering of all the remaining keywords, which helps expand that structure. All these clustering types are available in Rush Analytics.
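Putting the two halves together, the combined flow can be sketched as below (the tuple layout and sample data are my own assumptions, not Rush Analytics' internal format):

```python
def combined_clustering(keywords, serps, accuracy=3):
    """keywords: list of (phrase, marker_flag, frequency).
    Step 1: attach flag-0 phrases to markers.
    Step 2: cluster the leftovers among themselves by frequency."""
    markers = [k for k, flag, _ in keywords if flag == 1]
    clusters = {m: [m] for m in markers}
    leftovers = []
    for k, flag, freq in keywords:
        if flag == 1:
            continue
        for m in markers:
            if len(set(serps[m]) & set(serps[k])) >= accuracy:
                clusters[m].append(k)
                break
        else:
            leftovers.append((k, freq))
    # Wordstat step on whatever did not attach to a marker
    remaining = [k for k, _ in sorted(leftovers, key=lambda kf: -kf[1])]
    extra = []
    while remaining:
        seed_urls = set(serps[remaining[0]])
        group = [remaining[0]] + [q for q in remaining[1:]
                                  if len(seed_urls & set(serps[q])) >= accuracy]
        extra.append(group)
        remaining = [q for q in remaining if q not in group]
    return clusters, extra

keywords = [("extension cord", 1, 900),
            ("buy extension cord", 0, 300),
            ("how to whitewash a wall", 0, 120),
            ("whitewash a wall yourself", 0, 80)]
serps = {"extension cord":            ["a", "b", "c", "d"],
         "buy extension cord":        ["a", "b", "c", "e"],
         "how to whitewash a wall":   ["w1", "w2", "w3", "w4"],
         "whitewash a wall yourself": ["w1", "w2", "w3", "w5"]}
clusters, extra = combined_clustering(keywords, serps)
# extra → [['how to whitewash a wall', 'whitewash a wall yourself']]
```

The informational "whitewash" queries do not attach to the commercial marker, so the Wordstat step groups them into a new cluster that can extend the site structure.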

What other tools are there on the market?

Among the worthy ones, besides Rush Analytics, we can highlight the JustMagic service, where there is both Hard and Soft clustering. The service was developed by Alexey Chekushin.

That's all you need to know about clustering to get started with keyword grouping.

Use clustering and save your time. Besides, people make mistakes: an optimizer's error rate is about 15%. Entrust the routine to robots; there is no need to sort it all out by hand.

I keep adding to this topic little by little, but I have written practically nothing about what clustering of key (search) queries is and how to do it.

So, in order to get started, we need:

  • Semantic core (1 piece),
  • Tools for clustering (2-3 pcs),
  • Stock of patience (2 kg).

To understand how search queries are clustered, we need that very list of words. I have written more than once about how to collect it, so I won't repeat myself. Let's assume the semantics are collected, the tea is brewed, and a small cart of patience is waiting at the desk.

What is clustering?

We have several terms that are critical to our work. So, we'll start with them:

Cluster analysis - a multivariate statistical procedure that collects data containing information about a sample of objects and then arranges the objects into relatively homogeneous groups

(c)Wikipedia

Clustering of the semantic core: organizing the keyword list, creating promotion clusters, and assigning keys to their relevant pages.

How is keyword clustering achieved?

Clustering, or grouping, of keywords can follow several principles. There are plenty of proprietary technologies floating around the Internet, but I would highlight 2 main approaches:

Manual clustering of search queries (suits new sites that are still at the design stage, when you can set the semantics before launch): you collect keywords and assign them to groups manually, either immediately or later.

Example. You can collect keywords for a small business card website that you want to show to users in organic results. For example, the site sells services in the field of apartment renovation...

The principle of collecting a semantic core for a small website

The services themselves are divided into several categories, for example, finishing work and interior finishing work. Each of the directions is divided into a group, i.e. you will already have 2 groups. Next, you analyze search queries and form a separate core for each group. As a result, you get a clustered semantic core, for example, in the form of a table with fields:

  • Keywords
  • Frequency
  • Page URL
  • Group

And then, using a filter in the table, you sort by groups of keywords. As a result, you have lists of words for each of the pages (sections) of the site, which together constitute a clustered semantic core.
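The table-and-filter workflow is trivial to reproduce in code; a small sketch with invented rows (the URLs and frequencies are placeholders):

```python
# (keyword, frequency, page URL, group) rows of a clustered core
core = [
    ("apartment renovation kyiv", 1900, "/",           "main"),
    ("interior finishing works",   720, "/interior/",  "interior"),
    ("finishing works price",      480, "/finishing/", "finishing"),
    ("interior finishing cost",    260, "/interior/",  "interior"),
]

def keywords_for(group):
    """All keywords of one cluster, most frequent first."""
    rows = [r for r in core if r[3] == group]
    return sorted(rows, key=lambda r: -r[1])

for kw, freq, url, _ in keywords_for("interior"):
    print(kw, freq, url)
```

Filtering by the group column, like the spreadsheet filter above, gives you the word list for each page of the site.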

How to collect semantics for a project and cluster it most effectively?

Let's take as an example what is described above and look at the expected structure of the site.

We can also refine our keyword clustering with a few additions.

Keywords for the main page: this cluster should contain the most important keywords for your site, the ones the page itself is relevant to (if you offer apartment renovation services, the query "apartment renovation in Kyiv" fits well). These will be the most general queries in your niche.

Services and product pages: clustering for these pages starts with a logical division by importance. What matters more to you, kitchen remodeling services or bedroom remodeling services, or do they all have equal priority? This cluster should contain words matching service-related user queries, for example "construction crew services".

Articles and blog: this part of the semantic core will contain informational queries, for example "how to whitewash a wall yourself" or "wall paint manufacturers". Do not neglect these sections: even if yours is a commercial site and only the service pages bring profit, regular useful content will generate stable traffic and help convert readers into clients.

Automatic clustering of the semantic core on an existing website

If you have decided to do SEO for an existing site and don't know where to start, check which keywords it already ranks for.

For example, this can be done with Serpstat. Just enter the address of the page you are checking and see which keywords you already have positions for.


In the example I entered the home page address and received a list of key phrases with their positions; in the URL table I found the pages that appear in search results. Clicking a link gave me the list of phrases relevant to that specific page.

This way you can not only see which positions your site holds, but also cluster search queries using Serpstat.

To be continued…

Let's look at it soon:

  • Tools for manual clustering of search queries,
  • Tools for automatic clustering of search queries.

P.S. If you want to cluster your search queries but don't have the time, post a link to your project in the comments and I will write a piece showing, on a concrete example, how to implement semantic core clustering in practice.
