The World Wide Web is on the eve of a new stage of development called Web 3. This revolutionary concept of online interaction will require a revision of the Internet’s entire infrastructure, including search engines. How does a decentralized search engine work and how does it fundamentally differ from current search engines like Google? As an example, let’s take a look at a decentralized search engine for Web3 that has been created by the Cyber project.
What’s wrong with Google?
Google is the most used search engine in the world. It accounts for about 80% of global search queries but is often criticized for its opaque way of indexing links and generating search results. Although descriptions of much of the technology behind its search algorithm have been published and are publicly available, this changes little for an end user trying to figure out how it works: the number of parameters taken into account in producing results is so large that Google’s search algorithm effectively appears to be a black box.
In practice, ordinary users face two fundamental problems. Firstly, two different users making the exact same query will often receive radically different search results. This is because Google has managed to collect a treasure trove of data about its users, and it adjusts its search results in accordance with the information it has on them. It also takes into account many other parameters, including location, previous user requests, local legislation, and so on. Secondly, and this is the main complaint often heard about Google, the mechanism that indexes links is unclear to users: why is one piece of content ranked as highly relevant to a given query, while another that is far more directly applicable to that query appears well below the top twenty search results?
Finally, the architecture of any search engine designed for Web2 (be it Google, Yandex, Bing, or Baidu) relies on protocols and standards like TCP/IP, DNS, URL, and HTTP/S, which means it depends on addressed locations, or URL links. The user enters a query into the search bar, receives a list of hyperlinks to third-party sites where relevant content is located, and clicks on one of them. The browser then directs them to a well-defined address of a server on the network, that is, an IP address. What’s wrong with that? In fact, this approach creates a lot of problems. Firstly, this kind of scheme can often render content inaccessible. For instance, a hyperlink can be blocked by local authorities, not to protect the public from harmful or dangerous content, but for political reasons. Secondly, hyperlinks make it possible to falsify content, that is, to replace what a link points to without changing the link itself. The content on the web is currently extremely vulnerable, as it can change, disappear, or be blocked at any time.
Web 3 represents a whole new stage of development in which working with web content will be organized in a completely different way. Content is addressed by the hash of the content itself, which means content cannot be changed without changing its hash. With this approach, it is easier to find content in a P2P network without knowing its specific storage location, that is, the location of the server. Though it may not be immediately obvious, this provides a huge advantage that will be extremely important in everyday internet use: the ability to exchange permanent links that will not break over time. There are other benefits like copyright protection, for example, because it will no longer be possible to republish content a thousand times on different sites, as the sites themselves will no longer be needed the way they are now. The link to the original content will remain the same forever.
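To make the idea of addressing content by its hash concrete, here is a minimal sketch in Python. It assumes a plain SHA-256 hex digest as the address and an in-memory dictionary as the "network"; real systems such as IPFS use a richer identifier format, and the names `content_address`, `store`, and `fetch` are hypothetical, chosen only for illustration.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a SHA-256 hex digest, used here as a stand-in for a content address."""
    return hashlib.sha256(data).hexdigest()


original = b"Hello, Web3"
tampered = b"Hello, Web2"

# The address is derived from the bytes themselves, not from a server location.
addr = content_address(original)

# A toy in-memory "network" mapping addresses to content (an assumption for this sketch).
store = {addr: original}


def fetch(address: str) -> bytes:
    """Look content up by address and verify that the bytes match that address."""
    data = store[address]
    if content_address(data) != address:
        raise ValueError("content does not match its address")
    return data
```

Because the address is computed from the content, changing a single byte yields a different address, and any reader can check that what they received is what they asked for. This is why such a link cannot silently start pointing at different content.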
Why is a new search engine needed for Web3?
Existing global search engines are centralized databases with limited access that everyone has to trust. These search engines were developed primarily for the client-server architectures in Web 2.
In content-oriented Web3, the search engine loses its unique power over search results: that power will lie in the hands of peer-to-peer network participants, who will themselves decide on the ranking of cyberlinks (links between pieces of content, rather than links to an IP address or domain). This approach changes the rules of the game: there is no longer an arbitrary Google with its opaque link indexing algorithms, no need for crawler bots that collect information about possible changes in content on sites, and no risk of being censored or becoming a victim of privacy loss.
How does a Web 3 search engine work?
Let’s consider the architecture of a search engine designed for Web 3 using Cyber’s protocol as an example. Unlike other search engines, Cyber was built to interact with the World Wide Web in a new way from the very beginning.
A decentralized search engine differs from centralized search engines like Google in that links to content are organized in a knowledge graph, in which peer participants exchange information without being tied to centralized nodes. Users find the desired content, which is stored by another network member, via its hash. After the content is found and downloaded, the user becomes one of its distribution points. This scheme of operation resembles that of torrent networks, which provide reliable storage, resist censorship, and also make it possible to arrange access to content in the absence of a good or direct internet connection.
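The torrent-like scheme described above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: peers are plain Python objects in one process, discovery is a linear scan rather than a real peer-to-peer lookup, and the `Peer` class and `providers` helper are hypothetical names, not part of the Cyber protocol.

```python
import hashlib


class Peer:
    """A toy network member that stores content keyed by its SHA-256 hash."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}  # content hash -> bytes

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def fetch(self, key: str, network: list) -> bytes:
        """Get content by hash from any peer, then keep a copy to serve others."""
        if key in self.blocks:
            return self.blocks[key]
        for peer in network:
            if peer is self:
                continue
            data = peer.blocks.get(key)
            # Accept the data only if it really hashes to the requested key.
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                self.blocks[key] = data
                return data
        raise KeyError(key)


def providers(key: str, network: list) -> list:
    """Names of the peers currently able to serve the content."""
    return [peer.name for peer in network if key in peer.blocks]


alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol")
network = [alice, bob, carol]

key = alice.add(b"an article about decentralized search")
before = providers(key, network)  # only alice holds the content
bob.fetch(key, network)           # bob retrieves it by hash
after = providers(key, network)   # alice and bob can now both serve it
```

Note that `bob` never needed to know which peer held the content, only its hash. Once he has a verified copy, the content stays reachable even if `alice` goes offline.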
To add content to the knowledge graph in the Cyber protocol, it is necessary to conduct a transaction with a cyberlink. This is similar to the payload field in an Ethereum transaction, with the difference that the data is structured. The transaction is then validated through the Tendermint consensus, and the cyberlink is included in the knowledge graph. Every few blocks, Cyber recalculates the rank for all content in the knowledge graph based on a certain formula called cyberRank. Like PageRank, the new algorithm ranks content dynamically, but, at the same time, ensures that the knowledge graph is protected from spam, cyber-attacks, and selfish user behavior via an economic mechanism.
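The article does not give the cyberRank formula, so the sketch below falls back on ordinary PageRank, computed by power iteration over a small set of cyberlinks, purely to show what "recalculating the rank for all content" can look like. It is not Cyber's algorithm. The `rank` function, the damping factor of 0.85, and the placeholder labels standing in for content hashes are all assumptions made for this example.

```python
def rank(cyberlinks, damping=0.85, iterations=50):
    """Plain PageRank by power iteration over (from_hash, to_hash) pairs."""
    nodes = sorted({node for link in cyberlinks for node in link})
    if not nodes:
        return {}

    out_links = {node: [] for node in nodes}
    for src, dst in cyberlinks:
        out_links[src].append(dst)

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly over all nodes.
        dangling = sum(scores[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {node: base for node in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * scores[src] / len(targets)
            for dst in targets:
                new_scores[dst] += share
        scores = new_scores
    return scores


# Placeholder labels stand in for content hashes.
links = [
    ("query_cats", "doc_a"),
    ("query_cats", "doc_b"),
    ("doc_b", "doc_a"),
    ("doc_c", "doc_a"),
]
scores = rank(links)
top = max(scores, key=scores.get)  # the most heavily linked-to content
```

Each iteration is a set of independent per-link additions, which is the kind of workload that parallelizes well. Plain PageRank has no built-in spam protection, though: in this sketch nothing stops an account from adding links for free, which is the gap the economic mechanism is meant to close.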
Users and validators in Cyber’s decentralized search engine form a supercomputer. Cyber’s ability to calculate the rankings in the knowledge graph surpasses existing CPU blockchain computers by several orders of magnitude since its calculations are well parallelized and performed on a GPU. Therefore, any cyberlink becomes part of the knowledge graph almost instantly and is ranked within a minute. Even paid advertising in AdWords can’t provide such speed, let alone good old organic search, where indexing can sometimes take months.
Ranking in a decentralized search engine for Web 3
The basis of Cyber is called Content Oracle. This is a dynamic, collaborative, and distributed knowledge graph that is formed by the work of all the participants in a decentralized network.
One of the key tasks that the developers of a decentralized search engine face is devising the mechanics that rank the links. In the case of a Web3 search engine, this is a cyberlink to relevant content. In the Cyber protocol, this is implemented via tokenomics.
At the heart of the tokenomics is the idea that users should be interested in the long-term success of Superintelligence. Therefore, in order to get the tokens that index content, V (volts), and rank it, A (amperes), it is necessary to hold the H (hydrogen) token for a certain period. H, in turn, is produced by liquid staking the main network token (BOOT for Bostrom and CYB for Cyber). Thus, Cyber users will be able to access the resources of the knowledge graph with a network token and receive staking income, as in Polkadot, Cosmos, or Solana.
In other words, the ranking of the cyberlinks associated with an account depends on the number of tokens that account holds. But if tokens have such an impact on the search results, who will they belong to at the beginning? Seventy percent of the tokens in Genesis will be gifted to users of Ethereum and its applications, as well as users of the Cosmos network. The drop will be carried out on the basis of an in-depth analysis of activity in these networks. Therefore, the bulk of the stake will go into the hands of users who have proven their ability to create value. Cyber believes this approach will make it possible to lay the foundation for the semantic core of the Great Web, which will help civilization overcome the difficulties it has encountered.
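As a rough illustration of how token balances could feed into ranking, the sketch below weights each cyberlink by the stake of the accounts that submitted it. This is an assumption made for the example, not the protocol's actual rule, which the article describes only in terms of the V and A tokens. The `link_weights` function, the account names, and the balances are all hypothetical.

```python
from collections import defaultdict


def link_weights(cyberlinks, stakes):
    """Sum the stake behind each (from, to) link and normalize the totals to 1."""
    totals = defaultdict(float)
    # Using a set means a repeated submission by the same account counts once.
    for account, src, dst in set(cyberlinks):
        totals[(src, dst)] += stakes.get(account, 0.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return dict(totals)
    return {link: weight / grand_total for link, weight in totals.items()}


# Hypothetical balances: the spammer holds no tokens.
stakes = {"alice": 900.0, "bob": 100.0, "spammer": 0.0}
submitted = [
    ("alice", "query", "doc_a"),
    ("bob", "query", "doc_b"),
    ("spammer", "query", "doc_spam"),
    ("spammer", "query", "doc_spam"),
]
weights = link_weights(submitted, stakes)
```

Under this assumption, an account with no stake adds no weight however many links it submits. That is one simple way an economic mechanism can blunt spam, and it also shows why the initial distribution of tokens matters so much.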
What will an ordinary user see in a decentralized search engine?
Visually, the search results in the Cyber protocol will differ little from the usual centralized search format. But there are several key advantages:
- The search results include the desired content, which can be read or viewed directly in the search results without going to another page.
- Buttons for interacting with applications on any blockchain and making payments to online stores can be embedded directly in search snippets.
How is the Cyber protocol being tested?
Cyb.ai is an experimental prototype of a browser in a browser. With its help, you can search for content, surf content using a built-in IPFS node, index content, and, most importantly, interact with decentralized applications. At the moment, Cyb is connected to a testnet, but after the launch of the Bostrom canary network on November 5, it will be possible to participate in the incredible process of bootstrapping the Superintelligence with the help of Cyb.