SEO Essentials for Beginners: Crawling and Fetching

Date: 2020-04-02 14:37:17  Source: SEO Techniques  Author: seo实验室 editor
 

  Crawling and fetching are essential knowledge for anyone starting out in SEO. What exactly are they? Let's take a look.

  Crawling and fetching are the first step in a search engine's work: they carry out the data collection task.

  1. Spiders:

  The programs that search engines use to crawl and visit pages are called spiders, also known as robots (bots).

  2. Following links:

  To fetch as many pages on the web as possible, a search engine spider follows the links on each page, crawling from one page to the next much as a spider moves across its web. This is where the name "spider" comes from.
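
  The link-following behaviour described above can be sketched roughly as follows. This is only an illustration, not how any particular search engine works: the names `LinkExtractor`, `extract_links`, `crawl` and the in-memory `FAKE_WEB` site are all assumptions made up for the example, and a breadth-first order is just one possible choice.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return the absolute URLs linked from one page."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, max_pages=100):
    """Follow links breadth-first; fetch(url) returns HTML text or None."""
    queue = deque([start_url])
    seen = {start_url}
    order = []  # pages in the order they were fetched
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched, move on
        order.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical three-page site held in memory so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/">home</a>',
}

visited = crawl("http://example.com/", FAKE_WEB.get)
```

  Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; a real spider would replace `FAKE_WEB.get` with a function that downloads the page.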

  3. Attracting spiders:

  In theory, spiders can crawl and fetch every page; in practice they cannot, and they do not try to.

  SEO practitioners who want more of their pages indexed therefore need to find ways to attract spiders to fetch them.

  4. Address library:

  To avoid crawling and fetching the same URL more than once, a search engine maintains an address library that records the pages that have been discovered but not yet fetched, as well as the pages that have already been fetched.
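
  As a minimal sketch of such an address library, assuming it only needs to tell pending URLs apart from fetched ones, two collections are enough. The class name `AddressLibrary` and its methods are hypothetical and chosen for this example only.

```python
from collections import deque


class AddressLibrary:
    """Tracks URLs that are discovered-but-pending and URLs already fetched."""

    def __init__(self):
        self._pending = deque()    # discovered, not yet fetched (FIFO order)
        self._pending_set = set()  # same URLs, for fast membership checks
        self._fetched = set()      # already fetched

    def add(self, url):
        """Record a newly discovered URL; return False if it is already known."""
        if url in self._pending_set or url in self._fetched:
            return False
        self._pending.append(url)
        self._pending_set.add(url)
        return True

    def next_pending(self):
        """Hand out the next URL to fetch and mark it fetched, or None if empty."""
        if not self._pending:
            return None
        url = self._pending.popleft()
        self._pending_set.discard(url)
        self._fetched.add(url)
        return url

    def counts(self):
        """Return (number pending, number fetched)."""
        return len(self._pending), len(self._fetched)


library = AddressLibrary()
library.add("http://example.com/")
library.add("http://example.com/a")
duplicate_accepted = library.add("http://example.com/")  # already pending
first = library.next_pending()
readded = library.add(first)                             # already fetched
```

  Both `duplicate_accepted` and `readded` come back `False`, which is the point of the structure: a URL enters the pending queue at most once, whether or not it has been fetched yet.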

  5. File storage:

  The data fetched by the search engine spider is stored in the raw page database.

  The page data stored there is exactly the same as the HTML that a user's browser receives.

  Each URL has its own unique file number.
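
  One way to picture this storage step is a single table that keeps the HTML unchanged and ties each URL to a file number. The sketch below uses Python's built-in `sqlite3` module; the table name `raw_pages`, the column names, and the choice of an auto-incrementing integer as the file number are all assumptions for illustration, since the article does not say how the numbers are assigned.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the raw page store; defaults to an in-memory database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_pages ("
        " file_no INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT UNIQUE NOT NULL,"
        " html TEXT NOT NULL)"
    )
    return conn


def save_page(conn, url, html):
    """Store the HTML as fetched; a known URL keeps its existing file number."""
    row = conn.execute(
        "SELECT file_no FROM raw_pages WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO raw_pages (url, html) VALUES (?, ?)", (url, html)
        )
        file_no = cur.lastrowid
    else:
        file_no = row[0]
        conn.execute(
            "UPDATE raw_pages SET html = ? WHERE file_no = ?", (html, file_no)
        )
    conn.commit()
    return file_no


def load_page(conn, file_no):
    """Return (url, html) for a file number, or None if it does not exist."""
    return conn.execute(
        "SELECT url, html FROM raw_pages WHERE file_no = ?", (file_no,)
    ).fetchone()


conn = open_store()
n1 = save_page(conn, "http://example.com/", "<html>home</html>")
n2 = save_page(conn, "http://example.com/a", "<html>a</html>")
n1_again = save_page(conn, "http://example.com/", "<html>home v2</html>")
```

  Saving the same URL a second time returns the same file number and overwrites the stored HTML, so the number stays a stable identifier for that URL.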
