Hyper-Federation
Federation: One System, All Necessary Information
Some vendors have described the federated search function as a portal delivering a roll-up of “federated content.” The typical federating method is to take a query and transform it into the syntax of other retrieval systems. Once the query has been translated, a federating system sends it to two or more other search systems. The results come back, are processed to remove duplicate items, and are sent to the user's browser for review.
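The translate, dispatch, and de-duplicate steps described above can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the two backends, their query syntaxes, and the `Hit` record are hypothetical stand-ins, and a real backend would be a network call rather than a local function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    url: str
    title: str
    source: str


# Hypothetical translators: each rewrites the user's terms into the
# syntax one particular backend is assumed to expect.
def to_boolean_syntax(terms):
    return " AND ".join(terms)


def to_plus_syntax(terms):
    return " ".join(f"+{t}" for t in terms)


# Stand-ins for remote search systems (assumed; real ones are remote calls).
def search_backend_a(query):
    return [
        Hit("http://example.com/doc1", "Doc 1", "A"),
        Hit("http://example.com/doc2", "Doc 2", "A"),
    ]


def search_backend_b(query):
    return [
        Hit("http://example.com/doc2", "Doc 2", "B"),
        Hit("http://example.com/doc3", "Doc 3", "B"),
    ]


BACKENDS = [
    (to_boolean_syntax, search_backend_a),
    (to_plus_syntax, search_backend_b),
]


def federated_search(terms, backends=BACKENDS):
    """Translate the query for each backend, collect hits, drop duplicates."""
    seen_urls = set()
    merged = []
    for translate, search in backends:
        for hit in search(translate(terms)):
            # Treat two hits with the same URL as the same document.
            if hit.url not in seen_urls:
                seen_urls.add(hit.url)
                merged.append(hit)
    return merged
```

Calling `federated_search(["hyper", "federation"])` returns one merged list in which the document reported by both backends appears only once. De-duplicating by URL is a simplifying assumption; real systems often need fuzzier matching.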
Most federating systems do not build a local index of the processed content, nor do the systems support content repositories. Some systems can index certain content and then perform federation across the local index and other remote indexes accessible to the system.
The key point is that federating systems send queries to other systems. The approach saves the user the time and bother of sending a query to multiple systems one at a time. However, each system responds differently, and traditional federating systems may exhibit some sluggish response times.
Slow response times may be addressed by adding hardware and implementing more aggressive result caching. Vendors of traditional federated systems address delays using a variety of methods, including the use of parallel systems.
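Result caching of the kind mentioned above can be illustrated with a small time-to-live cache. This is a sketch under stated assumptions: the `TTLCache` class and its `get_or_compute` method are hypothetical names, and a production cache would also need size limits and thread safety.

```python
import time


class TTLCache:
    """Keep each query's results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested
        self._store = {}    # query -> (time stored, results)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            # Fresh enough: skip the slow round trip to remote systems.
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value
```

A repeated query inside the time-to-live window is answered from memory without contacting any remote system. The trade-off is freshness: a longer window hides more latency but can serve results that the remote systems have since changed.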
The idea of searching many different sources, databases, and collections appears to be a good one on the surface, but search time delays are a problem. The problem is exacerbated by disappearing documents, sometimes called “broken connectors.” An index from a remote system may point to a document that has been deleted or is no longer available, and a link with no source content behind it is of no use to the user.
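The “broken connector” problem can be made concrete with a small filter that separates hits whose source document still exists from hits that point nowhere. The function name and the `document_exists` callback are assumptions for illustration; in practice the check would be a request to the remote system, which adds further delay.

```python
def split_live_and_broken(hit_urls, document_exists):
    """Partition hits by whether their source document is still available."""
    live, broken = [], []
    for url in hit_urls:
        if document_exists(url):
            live.append(url)
        else:
            broken.append(url)
    return live, broken
```

A remote index that still lists a deleted document would place that document's link in the `broken` list.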
The principal hurdle for traditional federated search systems is cost. Using multiple servers and then scaling them when performance degrades is too expensive, too complicated, and too unpredictable for most organizations. As a result, search becomes a compromise.
Perfect Search and Hyper-Federation
Hyper-federation deals with the problem of information retrieval in a fundamentally new way. The idea behind hyper-federation is to tackle the problems of the amount of information, the different types of data, and the cost of infrastructure head on.
Hyper-federation processes content from multiple sources and creates a single, optimized index. Source data can be retained in a repository or in the search system itself. The low cost of storage and advanced content processing eliminate the need for time-consuming round trips between the user and the source.
Hyper-federation:
Processing content from multiple sources and creating a single, optimized index that is searchable with a single query.
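The general idea of one index over many sources can be sketched with a plain inverted index. This is not the Perfect Search index format, which the text does not specify; the `UnifiedIndex` class, its methods, and the sample sources are hypothetical, and the structure shown is a textbook technique used only to illustrate answering a single query from a single local index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnifiedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> keys of matching documents
        self.documents = {}               # key -> (source, retained text)

    def add(self, source, doc_id, text):
        key = f"{source}:{doc_id}"
        self.documents[key] = (source, text)  # keep a local copy of the source
        for term in tokenize(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return keys of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            matches &= self.postings.get(term, set())
        return sorted(matches)


index = UnifiedIndex()
index.add("email", "1", "Quarterly report attached")
index.add("crm", "7", "Customer asked about the quarterly report")
index.add("files", "3", "Annual budget")
```

Here `index.search("quarterly report")` is answered entirely from the local index and returns matches from both the email and CRM sources, with no query sent to either system at search time.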
| Attribute | Federated Search | Hyper-Federation |
| --- | --- | --- |
| Query Method | Query transformation and dependency on other systems’ indices. | Query passed to a single hyper-index of structured, unstructured, and real time content. |
| Relevance | Approximate due to latency in remote system responses | Statistical and semantic methods with “just in time” refinement |
| Syntax | Free text, some Boolean operations | Fuzzy, Boolean, and free text queries supported |
| Response time | Dependent on remote system response and resources available to the federating system | Sub-second response time on low end servers |
| Index refresh | Dependent on remote systems | Indexes refresh on a licensee-stipulated schedule; for example, every 15 minutes, hourly, etc. |
| Content throughput | Dependent on remote systems | Gigabytes of text per hour on a basic server; more with additional server resources |
| Interface | Results list. Some systems include clustering and facets | Customizable to licensee requirements |
Vortex: The Perfect Search Method
Perfect Search has developed a method that permits indexing multiple, sometimes geographically separate databases with a single search engine. The innovations at Perfect Search permit high-speed retrieval, drastically reduced hardware footprints, and extensive customization.
Perfect Search is one of the few organizations able to handle billions of documents and other instances of digital information using two or three servers instead of racks of cutting-edge devices.
The system processes huge volumes of source information and reduces it to a hierarchy of indexes and accelerators without losing the sources' meaning. It converts the information into a representation of the content, ideas, and meaning of the source material, stored as a series of small bundles of “molecules” arranged in “vortexes.” Each of these representations can “spin” a query to the precise information matching the user's need. The system then “unwraps” the bundles of information molecules and delivers a relevant, on-target answer. This efficiency allows the system to maintain performance in a disk-based implementation. As a result, one or two servers can do the work of the six or eight servers required by other vendors' approaches.
The Perfect Search system ingests content in a wide range of file types and formats. The indexing process takes “chunks” of content in the form of documents, emails, structured data sets, and Adobe Portable Document Format (PDF) files, among others, and “distills” them. Key features of each content object are extracted, tagged, and written to the patented Perfect Search “vortex pattern indexes.” These indexes are optimized and exposed in a way that makes low-latency retrieval possible. The indexes have two other important features as well.
First, when specialized data-mined molecules are used for searching, the Perfect Search index is much smaller than the indexes of traditional search and retrieval systems. Larger indexes are often used to fine-tune retrieval performance or to improve relevance rankings. However, even these larger indexes are created many times faster, and accessed for retrieval many times faster, than those of competing systems. Second, the structure of the index makes it possible to “slice and dice” information across the content types processed by the index. Lower cost and higher performance are two benefits of the Perfect Search approach. Perhaps more important, the system makes it possible to have a single view of the content processed into vortex pattern indexes.
Copyright © 2009 - 2010 Perfect Search Corporation. All rights reserved.











