
Your website needs a full health check! A Preliminary Study of the Google URL Inspection API

by admin
On January 31, 2022, Google Search Central released a new URL Inspection API, which lets developers check the URLs of their own websites in batches, confirm their index status promptly, and debug any problems.

What kind of website is this suitable for?

  • Sites that generate thousands of new URLs every day
  • Sites that want to run health checks on a large number of URLs
  • Sites that place great emphasis on how quickly pages appear in search
  • Sites that want to optimize their SEO

What is the URL Inspection Tool?

When you publish a URL and want to make sure that Google has noticed your content, you can paste the URL into the tool and check it. If it shows a check mark, Google has read the page, and it can appear in search results when someone searches for relevant keywords.

What other information can this API find out?

  • What has Google crawled?
  • Crawl time, crawl status, and page fetch status
  • Which canonical URL has Google chosen?
  • Is the page mobile friendly?
  • Do the AMP pages work?
  • What does the structured data Google crawled look like?
  • Does robots.txt block this page?
(Screenshot: the information returned by a Google Search Console URL inspection.)
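For reference, here is a trimmed-down sketch (not from our experiment) of what a single inspection result can look like, mapped to the items above; the field names follow the public API reference, while the values are purely illustrative:

example_inspection_result = {
    'indexStatusResult': {
        'coverageState': 'Submitted and indexed',    # index status
        'lastCrawlTime': '2022-03-17T01:30:00Z',     # last crawl time
        'pageFetchState': 'SUCCESSFUL',              # page fetch status
        'robotsTxtState': 'ALLOWED',                 # blocked by robots.txt or not
        'googleCanonical': 'https://example.com/a',  # canonical URL chosen by Google
        'crawledAs': 'MOBILE',                       # which crawler fetched the page
    },
    'mobileUsabilityResult': {'verdict': 'PASS'},    # mobile friendly?
    'ampResult': {'verdict': 'PASS'},                # does the AMP page work?
    'richResultsResult': {'verdict': 'PASS'},        # structured data Google sees
}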

However, entering URLs one by one is tedious, and we publish new stories all the time that need to be checked immediately. This is where the API comes in: it reports Google's crawl status in bulk and in near real time.

Integration steps

This time we connect to the API with Python!

1. Credential setup

The most troublesome part is applying for a service account and an OAuth 2.0 client ID, granting the service account "Full" or "Owner" permission on the Search Console property, and creating a new key. You will then get a JSON file containing the following information (note: it must not be leaked):
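If you want to confirm what that key file contains, a quick way is to load it in Python; the field names below are what a standard service-account key file typically carries (values redacted), assuming you saved it as key.json as in the code further down:

import json

with open('key.json') as f:   # the key file downloaded from the service account
    key = json.load(f)
print(sorted(key.keys()))
# Typically: ['auth_provider_x509_cert_url', 'auth_uri', 'client_email',
#             'client_id', 'client_x509_cert_url', 'private_key',
#             'private_key_id', 'project_id', 'token_uri', 'type']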

2. Install the Python packages

Packages required to call the API:

  • pip3 install --upgrade google-auth
  • pip3 install google-api-python-client==1.12.10
  • pip3 install requests

Once that is done, you can start writing the code!

1. Put the JSON key file from above into your working folder, create a .py file, and import the packages

from google.oauth2 import service_account    # service-account credentials helper
from googleapiclient.discovery import build  # builds the Search Console API client

2. Set up the credentials

creds = "key.json"  # fill in your JSON key file name here
scopes = ['https://www.googleapis.com/auth/webmasters',
          'https://www.googleapis.com/auth/webmasters.readonly']
credentials = service_account.Credentials.from_service_account_file(creds, scopes=scopes)
service = build('searchconsole', 'v1', credentials=credentials)
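As an optional sanity check (not part of the original steps), you can ask the API which Search Console properties the service account can see; if the site you plan to inspect is missing from this list, the permission setup in step 1 is incomplete:

# Optional: list the properties visible to this service account.
# The siteUrl you plan to inspect should appear in the output.
print(service.sites().list().execute())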

3. Tell the API which URL you want to inspect, and it will return the corresponding inspection data

# inspectionUrl = the URL you want to inspect; siteUrl = the main site (property)
request = {
  'inspectionUrl': 'https://www.cna.com.tw/news/afe/202203170150.aspx',
  'siteUrl': 'https://www.cna.com.tw/'
}
response = service.urlInspection().index().inspect(body=request).execute()
inspectionResult = response['inspectionResult']
print(inspectionResult)

Once the test runs without problems, you can write a loop and grab the information you need in bulk (a sketch follows)!
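Here is a minimal sketch of such a loop, assuming a hypothetical list of URLs and the same siteUrl as above; the short sleep is our own addition to stay comfortably under the per-minute quota mentioned below:

import time

urls = [
    'https://www.cna.com.tw/news/afe/202203170150.aspx',
    # ...add the URLs you want to check here
]

results = {}
for url in urls:
    body = {'inspectionUrl': url, 'siteUrl': 'https://www.cna.com.tw/'}
    response = service.urlInspection().index().inspect(body=body).execute()
    # keep only the index-related section for each URL
    results[url] = response['inspectionResult']['indexStatusResult']
    time.sleep(0.2)  # about 300 requests per minute, well under the 600/minute quota

for url, status in results.items():
    print(url, status.get('coverageState'), status.get('lastCrawlTime'))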

Integration notes

  • Quota limits: 2,000 queries per day and 600 queries per minute
  • HTTP Error 500: when calling the API in batches, the crawler occasionally stops working with this message:
googleapiclient.errors.HttpError

This means that, because of an internal Google error, the API could not return the data we requested. We reached out to the Search Console Community for help, and the expert's response was:

It is difficult to avoid this problem entirely. There are about 1,000 servers behind the API gateway handling requests. If one of them has a problem, Error 500 is returned, but the next request is likely to reach another server and return valid data.

In other words, such errors are random, and the same request can simply be resent a few seconds later. We used a try/except block to catch the error so the crawler keeps running and re-queries the failed URL on its next pass.
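A minimal sketch of that retry logic, assuming the service object from the setup above (the helper function and retry count are our own choices, not part of the official API):

import time
from googleapiclient.errors import HttpError

def inspect_with_retry(service, url, site_url, max_retries=3):
    # Retry on the occasional HTTP 500, which Google described as transient.
    body = {'inspectionUrl': url, 'siteUrl': site_url}
    for attempt in range(max_retries):
        try:
            response = service.urlInspection().index().inspect(body=body).execute()
            return response['inspectionResult']
        except HttpError as err:
            if err.resp.status == 500 and attempt < max_retries - 1:
                time.sleep(5)  # wait a few seconds; the retry usually reaches a healthy server
            else:
                raise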

Experiment findings

The crawl- and index-related states reported by the URL Inspection tool change more slowly than the actual indexing results

At the beginning of the year, the index status changes we collected in our first experiment were as follows. By definition, we assumed a page counts as indexed only once it shows "Indexed, not submitted in sitemap" or "Submitted and indexed"; in practice, a site: search could already find the page while the reported status was still "Discovered". It took about 2 minutes from the time the sitemap published the URL to the time Google indexed it.

(Figure: results of our first exploration of the Inspection feature.)

BUT! We recently found that even when a page shows "Indexed, not submitted in sitemap" or "Submitted and indexed", there is still no guarantee that Google has actually indexed it, so it is safer to confirm with a site: search which state your page has really reached.

What does not change is that the time at which Inspection reports a page as indexed is generally later than the time it was actually indexed.

Update time has no strict relationship with crawl frequency, and the reported last-crawl time is sometimes questionable

The number of times an article is updated may affect how often it is crawled, but some articles that were updated several times were not crawled any more often.

However, this experiment did not control the number of queries per record, and the process was modified midway (queries were paused, the 5-minute rest period was removed, and so on); these variables may also affect the integrity of the data.

In addition, the last-crawl time returned by Google was sometimes duplicated across records, so the reliability of this field is still open to question.

Canonical tags cannot completely prevent duplicate content

Because of how the site is structured, the same content is sometimes published under different URLs. Our response is to change the canonical of the original URL to point to the new URL, and at the same time remove the original URL from the sitemap and add the new one.

In practice, however, Google has already indexed the original URL. During crawling, it judges the new URL to be duplicate content ("Duplicate, submitted URL not selected as canonical") because the canonical change has not yet taken effect. Eventually, once the crawler re-fetches the original URL and picks up the canonical hint, the new URL gets indexed as well.

This made us wonder whether this intermediate duplicate-content state affects the site's ranking.

Why can't it be completely avoided with canonical settings? Google's John Mueller said on the #AskGoogleWebmasters show that the criteria for canonical selection include:

  • The URL declared in the canonical tag
  • Redirect target URLs
  • Internal links
  • URLs included in the sitemap
  • HTTPS URLs
  • URLs with a cleaner structure
In other words, the canonical declaration is only a hint for Google Search; in the end, Google still makes its own judgment.

Besides organizing the captured data in a spreadsheet, some people have published a Data Studio template to visualize it, which looks nicer; but that template is designed around the data exported by Screaming Frog, so if you pull data directly from the API you may need to adjust the format.

That is roughly the process and the findings of our experiment. It started when a PM wanted to test the SEO effectiveness of a certain feature; combined with the chance to improve the editor's coding skills, we ran this set of experiments and, in the end, successfully caught some old bugs on the website.

Based on the results from the Inspection API, you can develop SEO optimization strategies that suit your own website. If you make any new discoveries after using it, please leave a message and share them with us!

