Friday, November 14, 2008

What is Solr?

Solr is an HTTP interface for Lucene. It makes Lucene's features platform-agnostic.
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform.


How does Apache Solr work?
Content is passed to the Solr server as documents in XML format. Based on its schema file, the Solr server indexes the documents that contain the data.
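
As a rough illustration of the XML that gets sent, here is a minimal Python sketch that builds an add message. The field names "id" and "title" and the helper build_add_message are made up for this example; real field names have to match whatever the schema file defines.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field dictionaries."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # Each field becomes <field name="...">value</field>.
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; the field names must exist in your schema file.
message = build_add_message([{"id": "1", "title": "What is Solr?"}])
print(message)
```

A client would POST this message to the update handler of its Solr server (the host, port, and path depend on the installation) and then send a commit message so the new documents become searchable.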


Searching in Solr is done by an IndexSearcher. We can send a query through the admin interface. The SolrIndexSearcher and the query handlers then process the query. Finally, the results (docs) relevant to the query are returned in XML format.
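
The same query can also be sent by any HTTP client. The sketch below only builds the query URL and parses a response, without contacting a server. The base URL is an assumption about a local install, the helper names are invented for this example, and the sample response is hand-written in the general shape of a Solr XML response rather than captured from a real server.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed address of the select handler; adjust host, port, and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10):
    """Return the URL for a query that asks for an XML response."""
    params = {"q": query, "rows": rows, "wt": "xml"}
    return SOLR_SELECT_URL + "?" + urlencode(params)


def parse_docs(response_xml):
    """Collect each <doc> as a dict of field name to text value."""
    root = ET.fromstring(response_xml)
    docs = []
    for doc in root.iter("doc"):
        docs.append({field.get("name"): field.text for field in doc})
    return docs


# Hand-written sample in the shape of a Solr XML response (not real output).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">1</str>
      <str name="title">What is Solr?</str>
    </doc>
  </result>
</response>"""

url = build_query_url("title:solr")
docs = parse_docs(SAMPLE_RESPONSE)
```

The parser is deliberately simple: it only handles single-valued fields, which is enough to show how the docs come back.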


Why use Apache Solr?

When we have a large database, searching and indexing take a long time. If we use Apache Solr, we can run the indexing and search on a separate server. The client that sends a document can be written in any language, because the request and response are both in XML format. It is an open source, HTTP-based search server that can be easily incorporated into a wide variety of Web applications.

Why is Apache Solr fast?

Apache Solr is a Java servlet application. Each request is handled in a separate thread, so many requests can be served at the same time. Solr can be around 5 times faster than other search options (Sphinx, MySQL full-text search).

Analyzers: Tokenizers and Tokens

If you want different columns in your database to use different tokenizers, they must be associated with different data types in Solr. Over and above the tokenizers, the text can be further processed using token filters. We can also write predefined analyzer classes in Java and simply include them.
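
To make the idea concrete, here is a conceptual Python sketch of a tokenizer followed by token filters, with a different chain per data type. This is not Solr's API: in Solr the chain is declared in the schema file, and every name below (the functions, the type names "string_exact" and "text_general", the stop word list) is invented for illustration.

```python
def whitespace_tokenizer(text):
    """Split the raw text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens, stop_words=frozenset({"a", "an", "the", "is"})):
    """Token filter: drop common words that add little to a search."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical data types, each with its own tokenizer and filter chain.
FIELD_TYPES = {
    "string_exact": (lambda text: [text], []),
    "text_general": (whitespace_tokenizer, [lowercase_filter, stop_filter]),
}


def analyze_field(field_type, text):
    """Run the tokenizer, then each token filter in order."""
    tokenizer, filters = FIELD_TYPES[field_type]
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze_field("text_general", "Solr is a Search Server")
```

Two columns that need different treatment would simply be mapped to different entries in the table, which is the role the data types play in Solr.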

Caching:

Solr caches are associated with an Index Searcher, a particular view of the index that doesn't change. So as long as that Index Searcher is being used, any items in the cache will be valid and available for reuse. Caching in Solr is unlike ordinary caches in that Solr cached objects will not expire after a certain period of time; they remain valid for as long as the Index Searcher is valid.
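
The link between a cache and its searcher can be pictured with a toy model in Python. This is only a sketch of the idea, not how Solr is implemented: the classes IndexSearcher and Index and their methods are invented for this example, and the "search" is a plain substring match.

```python
class IndexSearcher:
    """Toy searcher: a fixed snapshot of the index plus its own cache."""

    def __init__(self, snapshot):
        self._snapshot = tuple(snapshot)  # this view never changes
        self.cache = {}
        self.cache_hits = 0

    def search(self, term):
        if term in self.cache:
            # The snapshot is immutable, so a cached result is always valid.
            self.cache_hits += 1
            return self.cache[term]
        result = [doc for doc in self._snapshot if term in doc.lower()]
        self.cache[term] = result
        return result


class Index:
    """Toy index that opens a new searcher on every commit."""

    def __init__(self):
        self._docs = []
        self.searcher = IndexSearcher(self._docs)

    def add(self, doc):
        self._docs.append(doc)

    def commit(self):
        # A new view of the index means a new searcher with an empty cache.
        self.searcher = IndexSearcher(self._docs)


index = Index()
index.add("Solr caching")
index.commit()
old_searcher = index.searcher
first = old_searcher.search("solr")
second = old_searcher.search("solr")  # served from the cache
```

Nothing in the cache is ever thrown away because of a timer; the entries simply go away together with the searcher they belong to.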