A search engine is one of the most complex computing systems in existence.
Today, the major search engine providers are all large companies with substantial financial and human resources.
Even with that technical, staffing, and financial backing, search engines still face many technical challenges.
What are the main challenges facing search engines, and how can they be solved?
1. Page crawling must be fast and comprehensive:
The Internet is a dynamic network of content: every day countless pages are created and updated, and countless users post content and communicate with one another on websites.
To return the most useful content, a search engine has to crawl the latest pages.
2. Massive data storage:
Some large websites have millions or even tens of millions of pages on their own, so it is easy to imagine how much data the pages of every site on the web add up to.
3. Index processing must be fast, efficient, and scalable:
After crawling and storing page data, a search engine still has to index it, which includes computing link relationships and building the forward index, the inverted index, and so on.
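The difference between a forward index and an inverted index can be shown with a minimal sketch. It assumes pages are plain strings split on whitespace; the names `tokenize`, `build_forward_index`, and `build_inverted_index` and the sample pages are hypothetical, and a production index would be far more elaborate.

```python
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def build_forward_index(pages):
    # Forward index: page id -> list of terms that appear on that page.
    return {page_id: tokenize(text) for page_id, text in pages.items()}


def build_inverted_index(forward_index):
    # Inverted index: term -> sorted list of page ids containing the term.
    inverted = defaultdict(set)
    for page_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(page_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


# Hypothetical sample pages, for illustration only.
pages = {
    "p1": "search engines crawl pages",
    "p2": "engines index pages",
}
forward = build_forward_index(pages)
inverted = build_inverted_index(forward)
```

The forward index answers "which terms are on this page?", while the inverted index answers "which pages contain this term?", which is the lookup a query needs.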
4. Query processing must be fast and accurate:
Querying is the only step of a search engine's work that ordinary users ever see.
A user types keywords into the search box, clicks the "Search" button, and usually sees the results in less than a second.
What looks like the simplest step actually involves very complex processing behind the scenes.
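As a rough illustration of the work hidden behind a single query, the sketch below matches every query term against each document and orders the matches by a naive term-count score. The `search` function, its scoring rule, and the sample documents are assumptions made for this example; real engines use precomputed indexes and far richer ranking signals.

```python
from collections import Counter


def search(query, documents, limit=10):
    # Split the query into lowercase terms (whitespace tokenization is assumed).
    terms = query.lower().split()
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        # Keep only documents that contain every query term.
        if terms and all(counts[t] > 0 for t in terms):
            score = sum(counts[t] for t in terms)
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


# Hypothetical documents, for illustration only.
documents = {
    "a": "search engine search results",
    "b": "search results page",
    "c": "image gallery",
}
results = search("search results", documents)
```

Scanning every document per query, as this sketch does, would be far too slow at web scale, which is why the lookup is normally served from an inverted index built ahead of time.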