
Your website needs a full health check! A Preliminary Study of the Google URL Inspection API

by admin
On January 31, 2022, Google Search Central released a new URL Inspection API, which lets developers check the URLs of their own websites in batches, confirm their index status promptly, and debug any problems.

What kind of website is this suitable for?

  • Sites that generate thousands of new URLs every day
  • Sites that want to run health checks on a large number of URLs
  • Sites that place great emphasis on how quickly pages appear in search
  • Sites that want to optimize their SEO

What is the URL Inspection Tool?

When you publish a URL and want to make sure that Google has noticed your content, you can paste the URL into the tool and check it. If it shows a check mark, Google has read the page, and it can appear in search results when someone searches for relevant keywords.

What other information can this API find out?

  • What has Google crawled?
  • Crawl time, crawl status, and page fetch status
  • Which canonical URL has Google chosen?
  • Is the page mobile friendly?
  • Do the AMP pages work?
  • What does the structured data Google crawled look like?
  • Does robots.txt block this page?
(Screenshot: the information returned by a Google Search Console URL inspection.)
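For reference, here is a trimmed-down sketch (not from our experiment) of what a single inspection result can look like, mapped to the items above; the field names follow the public API reference, while the values are purely illustrative:

example_inspection_result = {
    'indexStatusResult': {
        'coverageState': 'Submitted and indexed',    # index status
        'lastCrawlTime': '2022-03-17T01:30:00Z',     # last crawl time
        'pageFetchState': 'SUCCESSFUL',              # page fetch status
        'robotsTxtState': 'ALLOWED',                 # blocked by robots.txt or not
        'googleCanonical': 'https://example.com/a',  # canonical URL chosen by Google
        'crawledAs': 'MOBILE',                       # which crawler fetched the page
    },
    'mobileUsabilityResult': {'verdict': 'PASS'},    # mobile friendly?
    'ampResult': {'verdict': 'PASS'},                # does the AMP page work?
    'richResultsResult': {'verdict': 'PASS'},        # structured data Google sees
}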

However, entering URLs one by one is tedious, and we publish new stories all the time that need to be checked immediately. This is where the API comes in: it reports Google's crawl status in bulk and in near real time.

Integration steps

This time we connect to the API with Python!

1. Credential setup

The most troublesome part is applying for a service account and an OAuth 2.0 client ID, granting the service account "Full" or "Owner" permission on the Search Console property, and creating a new key. You will then get a JSON file containing the following information (note: it must not be leaked):
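If you want to confirm what that key file contains, a quick way is to load it in Python; the field names below are what a standard service-account key file typically carries (values redacted), assuming you saved it as key.json as in the code further down:

import json

with open('key.json') as f:   # the key file downloaded from the service account
    key = json.load(f)
print(sorted(key.keys()))
# Typically: ['auth_provider_x509_cert_url', 'auth_uri', 'client_email',
#             'client_id', 'client_x509_cert_url', 'private_key',
#             'private_key_id', 'project_id', 'token_uri', 'type']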

2. Install the Python packages

Packages required to call the API:

  • pip3 install --upgrade google-auth
  • pip3 install google-api-python-client==1.12.10
  • pip3 install requests

Once that is done, you can start writing the code!

1. Put the JSON key file from above into your working folder, create a .py file, and import the packages

from google.oauth2 import service_account    # service-account credentials helper
from googleapiclient.discovery import build  # builds the Search Console API client

2. Set up the credentials

creds = "key.json"  # fill in your JSON key file name here
scopes = ['https://www.googleapis.com/auth/webmasters',
          'https://www.googleapis.com/auth/webmasters.readonly']
credentials = service_account.Credentials.from_service_account_file(creds, scopes=scopes)
service = build('searchconsole', 'v1', credentials=credentials)
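As an optional sanity check (not part of the original steps), you can ask the API which Search Console properties the service account can see; if the site you plan to inspect is missing from this list, the permission setup in step 1 is incomplete:

# Optional: list the properties visible to this service account.
# The siteUrl you plan to inspect should appear in the output.
print(service.sites().list().execute())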

3. Tell the API which URL you want to inspect, and it will return the corresponding inspection data

# inspectionUrl = the URL you want to inspect; siteUrl = the main site (property)
request = {
  'inspectionUrl': 'https://www.cna.com.tw/news/afe/202203170150.aspx',
  'siteUrl': 'https://www.cna.com.tw/'
}
response = service.urlInspection().index().inspect(body=request).execute()
inspectionResult = response['inspectionResult']
print(inspectionResult)

Once the test runs without problems, you can write a loop and grab the information you need in bulk (a sketch follows)!
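Here is a minimal sketch of such a loop, assuming a hypothetical list of URLs and the same siteUrl as above; the short sleep is our own addition to stay comfortably under the per-minute quota mentioned below:

import time

urls = [
    'https://www.cna.com.tw/news/afe/202203170150.aspx',
    # ...add the URLs you want to check here
]

results = {}
for url in urls:
    body = {'inspectionUrl': url, 'siteUrl': 'https://www.cna.com.tw/'}
    response = service.urlInspection().index().inspect(body=body).execute()
    # keep only the index-related section for each URL
    results[url] = response['inspectionResult']['indexStatusResult']
    time.sleep(0.2)  # about 300 requests per minute, well under the 600/minute quota

for url, status in results.items():
    print(url, status.get('coverageState'), status.get('lastCrawlTime'))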

Integration notes

  • Quota limits: 2,000 queries per day and 600 queries per minute
  • HTTP Error 500: when calling the API in batches, the crawler occasionally stops working with this message:
googleapiclient.errors.HttpError

This means that, because of an internal Google error, the API could not return the data we requested. We reached out to the Search Console Community for help, and the expert's response was:

It is difficult to avoid this problem entirely. There are about 1,000 servers behind the API gateway handling requests. If one of them has a problem, Error 500 is returned, but the next request is likely to reach another server and return valid data.

In other words, such errors are random, and the same request can simply be resent a few seconds later. We used a try/except block to catch the error so the crawler keeps running and re-queries the failed URL on its next pass.
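A minimal sketch of that retry logic, assuming the service object from the setup above (the helper function and retry count are our own choices, not part of the official API):

import time
from googleapiclient.errors import HttpError

def inspect_with_retry(service, url, site_url, max_retries=3):
    # Retry on the occasional HTTP 500, which Google described as transient.
    body = {'inspectionUrl': url, 'siteUrl': site_url}
    for attempt in range(max_retries):
        try:
            response = service.urlInspection().index().inspect(body=body).execute()
            return response['inspectionResult']
        except HttpError as err:
            if err.resp.status == 500 and attempt < max_retries - 1:
                time.sleep(5)  # wait a few seconds; the retry usually reaches a healthy server
            else:
                raise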

Experiment findings

The crawl- and index-related states reported by the URL Inspection tool change more slowly than the actual indexing results

At the beginning of the year, the index status changes we collected in our first experiment were as follows. By definition, we assumed a page counts as indexed only once it shows "Indexed, not submitted in sitemap" or "Submitted and indexed"; in practice, a site: search could already find the page while the reported status was still "Discovered". It took about 2 minutes from the time the sitemap published the URL to the time Google indexed it.

(Figure: results of our first exploration of the Inspection feature.)

BUT! We recently found that even when a page shows "Indexed, not submitted in sitemap" or "Submitted and indexed", there is still no guarantee that Google has actually indexed it, so it is safer to confirm with a site: search which state your page has really reached.

What does not change is that the time at which Inspection reports a page as indexed is generally later than the time it was actually indexed.

Update time has no strict relationship with crawl frequency, and the reported last-crawl time is sometimes questionable

The number of times an article is updated may affect how often it is crawled, but some articles that were updated several times were not crawled any more often.

However, this experiment did not control the number of queries per record, and the process was modified midway (queries were paused, the 5-minute rest period was removed, and so on); these variables may also affect the integrity of the data.

In addition, the last-crawl time returned by Google was sometimes duplicated across records, so the reliability of this field is still open to question.

Canonical tags cannot completely prevent duplicate content

Because of how the site is structured, the same content is sometimes published under different URLs. Our response is to change the canonical of the original URL to point to the new URL, and at the same time remove the original URL from the sitemap and add the new one.

In practice, however, Google has already indexed the original URL. During crawling, it judges the new URL to be duplicate content ("Duplicate, submitted URL not selected as canonical") because the canonical change has not yet taken effect. Eventually, once the crawler re-fetches the original URL and picks up the canonical hint, the new URL gets indexed as well.

This made us wonder whether this intermediate duplicate-content state affects the site's ranking.

Why can't it be completely avoided with canonical settings? Google's John Mueller said on the #AskGoogleWebmasters show that the criteria for canonical selection include:

  • The URL declared in the canonical tag
  • Redirect target URLs
  • Internal links
  • URLs included in the sitemap
  • HTTPS URLs
  • URLs with a cleaner structure
In other words, the canonical declaration is only a hint for Google Search; in the end, Google still makes its own judgment.

Besides organizing the captured data in a spreadsheet, some people have published a Data Studio template to visualize it, which looks nicer; but that template is designed around the data exported by Screaming Frog, so if you pull data directly from the API you may need to adjust the format.

That is roughly the process and the findings of our experiment. It started when a PM wanted to test the SEO effectiveness of a certain feature; combined with the chance to improve the editor's coding skills, we ran this set of experiments and, in the end, successfully caught some old bugs on the website.

Based on the results from the Inspection API, you can develop SEO optimization strategies that suit your own website. If you make any new discoveries after using it, please leave a message and share them with us!

