The U.S. District Court for the District of Columbia has released a Memorandum Opinion, the written decision in the antitrust lawsuit against Google.
Document No. 1436 mainly covers three points: a prohibition on exclusive distribution agreements for Search, Chrome, and so on; a determination that a Chrome divestiture is not necessary; and an obligation to share the search index and user interaction data to a certain extent.
It is a primary source.
Source: United States v. Google LLC Memorandum Opinion (Document 1436)
Here, I will pick up only the parts relevant to SEO.
That is not all the opinion covers, but keeping in mind the terms it introduces as a shared vocabulary should make the rest easier to understand.
Much of this may already be well known in the fields of SEO and web management, but now that it has been laid out again officially, I think it is information that can be put to use in many areas.
Points to take away from the text of the ruling
In a nutshell, what I learned is specifically what kinds of data Google uses, for what purposes, and how it is incorporated into the search engine and generative AI. I did not come away with the impression that there were any great discoveries.
- The importance of user behavior data is clear
The ruling text makes clear that Google continuously uses user behavior data across many of its search processes: crawl prioritization, index freshness management, ranking adjustments, and product evaluation.
This is a key component of improving search quality. In other words, it is not only about click-through rate and dwell time, but also about designing for a sense of task completion (shortening the path to the information, internal navigation, FAQs, and clear steps). Without post-landing satisfaction, shortcut tactics that merely raise click-through rates will do more harm than good.
- Many of Google's internal code names have been disclosed, which improves understanding.
Navboost, GLUE, RankEmbed, FastSearch, MAGIT, and others have floated around the industry without clear definitions until now. With the release of the ruling text, these internal designations and the role each system plays have been clarified.
This makes the relationship between search and generative AI easier to understand. It may simply be that knowing the names makes things clearer, but I think it is a good thing that a common vocabulary has been created.
Details of Google's internal code names and their roles
I have summarized each code name below as a memo, based on the information in the ruling.
The code names that have emerged this time can be divided into two categories: those related to AI and those related to the ranking systems Google has used all along. As a rough overall picture, I think of it as follows. (*This is just my guess.)
Related to the search stack (part of the overall ranking)
Responsible for understanding the query, retrieving candidates from the index, scoring and ranking them, and (when necessary) deciding whether to trigger an AI Overview. Used here are RankBrain / DeepRank / RankEmbed(BERT) and Navboost / Glue / Tangram.
Related to generative AI (AI Mode / AI Overviews)
MAGIT (a group of models) plus a custom Gemini model, grounding and generating output based on search signals and the index. Signals from search-side models such as RankBrain are also applied around the AI features.
The overall project name is Project MAGI
Originally the name of an internal project to integrate generative AI into Google Search. The project covers the underlying technology behind the new search experience (roughly, AI Mode).
SGE (Search Generative Experience), which was publicly announced in 2023 and never hidden, is thought to have been the first deliverable of the Magi project; the various generative AI ideas it incorporated were eventually realized as the AI Overviews feature.
MAGIT, discussed below, is the core model (or group of models) that produces the generated text for AI Overviews.
Systems involved in AI output
MAGIT
A custom model that generates natural-language text for AI Overviews based on search data and summarization instructions (prompts).
It is described as a Gemini foundation model optimized for AI summarization in search.
Google repeatedly fine-tunes it with search data, and those adjustments are likely to be closely tied to the ranking algorithm.
The opinion states that click and query data are not used to pre-train the base model, but since that is only the base, the model is customized from time to time based on user behavior. The document also states that GLUE, described below, is used for that tuning.
No information is given about what the acronym stands for or about the details of its internal structure.
Not limited to AI (search ranker group)
The following systems are used throughout Google Search, not just for AI. When an AI Overview is shown, these are the systems handling the non-AIO part of the results.
GLUE (super query log)
As the name suggests, it is a "super search behavior log" used inside Google Search: a log that aggregates query content, device information, and overall search behavior on the results page, such as clicks, hovers, scrolling, and spelling corrections. It also includes the Navboost data described below.
The 13-month retention span is probably intended to capture seasonal factors in the ranking signal.
Google collects data at this level of detail because it treats user behavior as a very important axis of evaluation.
Glue-derived signals are treated as an important input not only in AI Mode / AI Overviews but also in the conventional search tab. They are probably the main axis of judgment in Google's mind, with or without AI. The same can be said of the rater criteria in the Search Quality Rater Guidelines.
*It is unrelated to the NLP benchmarks GLUE / SuperGLUE.
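The opinion describes what GLUE aggregates but does not publish an actual schema. Purely to make the idea concrete, here is a hypothetical Python sketch of what a single aggregated interaction record of this kind might look like; every field name is my own invention, not something from the ruling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch only: the ruling says GLUE aggregates queries, device
# context, and interactions such as clicks, hovers, scrolling, and spelling
# corrections, but it does not publish a schema. All field names are invented.
@dataclass
class InteractionRecord:
    query: str                     # the search query as issued
    device: str                    # e.g. "mobile" or "desktop"
    locale: str                    # language / region context
    result_url: str                # the result this interaction refers to
    clicked: bool                  # whether the result was clicked
    hover_ms: int = 0              # time the cursor hovered over the result
    scroll_depth: float = 0.0      # how far down the SERP the user scrolled
    spelling_corrected: bool = False
    dwell_time_s: Optional[float] = None  # time on the landing page, if known

# Records like this would then be aggregated over a rolling window (the
# ruling mentions 13 months of data) before feeding ranking signals.
```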
Navboost (NavBoost)
This data is collected as "queries x click behavior on results". It covers the last 13 months, which overlaps with GLUE, but I think it is reasonable to assume that NavBoost came first and that GLUE later began collecting data more comprehensively and broadly, with NavBoost's data folded in.
Google stated in the trial documents that NavBoost is a "very important ranking factor," which probably means that, of all the data Glue collects, the click behavior captured by NavBoost carries the most weight.
NavBoost itself, name included, has been talked about for quite some time.
Specifically, it is said to have existed since around 2005 as "a fundamental system built into Google Search that records user behavior such as clicking, browsing, and leaving in response to search queries, and feeds that information back to improve rankings for similar searches."
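To make "queries x click behavior" a little more concrete, here is a toy sketch of the kind of per-(query, result) click aggregation NavBoost is described as performing. The 10-second threshold separating "good" from "bad" clicks and all the names are assumptions of mine for illustration, not figures from the ruling.

```python
from collections import defaultdict

# Toy sketch: aggregate click behavior per (query, url) pair, in the spirit of
# what the ruling describes NavBoost as doing. The 10-second cut-off below is
# an illustrative assumption, not a number from the decision.
click_stats = defaultdict(lambda: {"good": 0, "bad": 0, "last_longest": 0})

def record_click(query: str, url: str, dwell_time_s: float, was_last_click: bool) -> None:
    stats = click_stats[(query, url)]
    if dwell_time_s >= 10:          # assumed threshold for a "good" click
        stats["good"] += 1
    else:                           # quick return to the SERP -> "bad" click
        stats["bad"] += 1
    if was_last_click:              # the user stopped searching after this result
        stats["last_longest"] += 1

def click_signal(query: str, url: str) -> float:
    """Crude satisfaction ratio that could act as a re-ranking boost."""
    s = click_stats[(query, url)]
    total = s["good"] + s["bad"]
    return (s["good"] + s["last_longest"]) / total if total else 0.0
```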
RankEmbed/RankEmbedBERT
It is a deep learning ranking model for Google Search. It is described as a "top-level signal," so it is presumably of high importance.
Its role can be described as learning from roughly 70 days of log data plus quality rater scores, and tuning itself for a deeper understanding of intent. It is said to be particularly effective at improving quality for long-tail queries.
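The ruling does not disclose RankEmbed's architecture, so as a stand-in, here is a minimal sketch of how an embedding-based relevance signal works in principle. It uses the off-the-shelf sentence-transformers library rather than anything Google-specific; how the score would be blended with other signals is only hinted at in a comment.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative only: an off-the-shelf embedding model standing in for a
# learned ranking model. The ruling does not describe RankEmbed's internals.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "how long does navboost keep click data"
documents = [
    "Court filings say NavBoost aggregates 13 months of click data.",
    "Chocolate chip cookie recipe with brown butter.",
]

q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(documents, convert_to_tensor=True)

# Cosine similarity acts as the "deep learning signal"; in a real system it
# would be blended with many other signals (links, freshness, click data...).
scores = util.cos_sim(q_emb, d_emb)[0].tolist()
for doc, score in sorted(zip(documents, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```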
FastSearch
It specializes in quickly returning a short, abbreviated set of ranked web results using RankEmbed-derived signals.
It is also said that, because speed is the priority, it is more susceptible to spam. Another theory is that it gives little weight to backlinks and the like, which may be one reason the AI Overview differs from the traditional search results section.
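As an illustration of why a speed-first path can be coarser, here is a small sketch that ranks candidates using precomputed embeddings only and skips heavier signals entirely. It is a conceptual analogy for the trade-off described above, not Google's implementation.

```python
import numpy as np

# Sketch of a "fast path": score candidates with precomputed embeddings only,
# skipping slower signals (link analysis, fine-grained click modelling, ...).
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(10_000, 384))            # computed offline
doc_embeddings /= np.linalg.norm(doc_embeddings, axis=1, keepdims=True)

def fast_search(query_embedding: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the top-k documents by cosine similarity alone."""
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = doc_embeddings @ q        # one matrix-vector product, no reranking
    return np.argpartition(-scores, k)[:k]

top_docs = fast_search(rng.normal(size=384))
```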
GCC (Google Common Corpus)/Docjoins
A large web corpus (Google Common Corpus), said to be larger than Common Crawl, and its management system (Docjoins), used to pre-train Gemini models. It is said to contain search metadata and signals (including aggregated derivatives of user behavior). No further details are given in the document.
Others
Gemini Nano/AICore
A small LLM (Gemini Nano) that runs on-device, together with its execution environment (AICore). It features fast inference. It is not directly related to SEO, but I am adding it for completeness.
It is quicker to take a look at the developer documentation here:
https://developers.google.com/solutions/pages/android-with-ai?hl=ja
Gemini Nano is a model in the Gemini family optimized to run on devices, integrated directly into the Android OS via AICore. This allows it to deliver a generative AI experience without the need for network connectivity or sending data to the cloud.
Vertex AI (Search grounding)
A cloud feature that lets existing in-house search systems and external search services connect to Gemini and generate answers based on their own data, and that lets third parties ground generation in Google Search and other data sources.
Where Google Search is concerned, it provides the function of searching the public web for up-to-date information and answering with citations.
Vertex AI Platform | Google Cloud
https://cloud.google.com/vertex-ai?hl=ja
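For reference, here is a minimal sketch of grounding a Gemini answer in Google Search results, assuming the google-genai Python SDK; the model ID, project settings, and exact field names are placeholders and may differ by SDK version.

```python
from google import genai
from google.genai import types

# Assumes the google-genai SDK and a Vertex AI project; the project ID,
# location, and model name below are placeholders.
client = genai.Client(vertexai=True, project="your-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What did the 2025 remedies ruling in United States v. Google decide?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]  # ground with Google Search
    ),
)

print(response.text)
# Citations / supporting sources, when returned, are attached to the candidate
# as grounding metadata.
print(response.candidates[0].grounding_metadata)
```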
Key Points for Utilizing User Behavior Data
As a factor in earning exposure on Google, user behavior is more important than you might think.
User behavior is positioned as an important component of search quality.
- Google retains behavioral logs such as clicks, dwell time, and returns to the results page for a long period (in principle 13 months for NavBoost), and reflects them in rankings, index operations, and freshness maintenance.
- Satisfaction is estimated from "good clicks," "bad clicks," the "last longest click," and so on, and fed into subsequent ranking adjustments.
- Glue is positioned to handle real-time movements (sudden news spikes, etc.) and Slices to handle contextual segments.
Click quality, the last long-viewed result, dwell time and return rates, hovering and scrolling on the SERP, and other micro-behavioral data are used by Google to adjust rankings and assess search quality.
User behavior is widely used.
It is also used to improve accuracy for local searches, especially on mobile, and for long-tail queries and new topics.
Practical measures point toward content design that reduces immediate returns and short-term abandonment, encourages on-site circulation and longer stays, implements structured data, and presents information concisely with the main points summarized up front.
It is therefore also important to have a content structure and FAQs that the AI summary features can easily pick up. Advice in this area tends to be suggested reflexively and can sound formulaic, but it does seem to have real value.
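As one concrete example of "structured data that a summary feature can easily pick up," here is a sketch that emits schema.org FAQPage markup as JSON-LD. The ruling itself says nothing about schema.org; this is simply standard SEO practice, with placeholder question text.

```python
import json

# Illustrative FAQPage structured data (schema.org) emitted as JSON-LD.
faq_jsonld = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "What is NavBoost?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "A Google system described in the ruling that aggregates "
                        "query-and-click behavior and feeds it back into ranking.",
            },
        }
    ],
}

# Embed the output in the page inside:
# <script type="application/ld+json"> ... </script>
print(json.dumps(faq_jsonld, indent=2, ensure_ascii=False))
```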
The well-known PageRank is treated as just "a single signal of distance from a known quality source," and my impression is that it carries less weight than user behavior data.
Summary
The text of the ruling makes it clear that Google is designing its search engine with user behavior at the center.
In the field of SEO and web operations, this is something we have known for some time, but now that it has been restated officially, I think it is information that can be put to use in many areas. And even if it only amounts to knowing the names, having a common vocabulary makes things easier to understand, which I think is a good thing.