I'm trying to choose between MS SQL Server full-text search and an open-source full-text search engine for our core product. Does anyone know of any previous benchmarks or comparisons between these two approaches? I'd appreciate any reply.
************************************************** ********************
Sent via Fuzzy Software @. http://www.fuzzysoftware.com/
Comprehensive, categorised, searchable collection of links to ASP & ASP.NET resources...
Anantha,
Unfortunately, no such benchmarks or comparisons exist today, and Microsoft
has never released any such benchmarks either.
However, I have long been kicking around the idea of building a "SQL FTS
Benchmarking Toolkit" along the lines of a TPC Benchmark suite, and I
submitted an abstract on it for the 2003 PASS conference. I'm assuming
you're comparing SQL Server 2000 FTS vs. MySQL FTS vs. PostgreSQL with its
TSearch2 or OpenFTS, as I have researched all of these products. Or are you
considering other open-source full-text search products?
In the meantime, what you can do is build a sample database with publicly
available text data from the Moby lexicon project built by Grady Ward at
http://www.dcs.shef.ac.uk/research/ilash/Moby/ and then set up a standard
benchmarking test. Note that this data is freely available and is in the
public domain, per Grady Ward. Additionally, Microsoft and other RDBMS
vendors, such as Oracle and IBM, compete in standard TPC benchmarking tests
to determine which database is the fastest, using a standard suite of tools,
database schema and data, as in TPC Benchmark C
(http://www.tpc.org/tpcc/detail.asp).
The TPC benchmark that is closest to a "Full Text Search" benchmark is
TPC-W (http://www.tpc.org/tpcw/default.asp), but this too is mostly a
transactional web e-commerce benchmark and not strictly for FTS queries.
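
As a rough illustration of the "build a sample database from word lists" step, here is a minimal Python sketch that turns a word list into synthetic text rows. It assumes the word list has one entry per line, which may not match every Moby file, and the names load_word_list and make_documents are made up for this example. It only produces the text; loading the rows into each engine is left out.

```python
import random
from pathlib import Path


def load_word_list(path):
    """Read a word list, ASSUMING one entry per line (check the actual file layout)."""
    text = Path(path).read_text(encoding="latin-1")
    return [line.strip() for line in text.splitlines() if line.strip()]


def make_documents(words, row_count, words_per_row, seed=0):
    """Build row_count synthetic text rows, each holding words_per_row random words."""
    rng = random.Random(seed)  # fixed seed so every engine is loaded with identical data
    return [" ".join(rng.choices(words, k=words_per_row)) for _ in range(row_count)]


if __name__ == "__main__":
    # Stand-in vocabulary; replace with load_word_list("<path to a word list>").
    sample_words = ["alpha", "bravo", "charlie", "delta", "echo"]
    for doc in make_documents(sample_words, row_count=3, words_per_row=6):
        print(doc)
```

Using a fixed seed matters here: each product under test should index exactly the same rows, otherwise the timings are not comparable.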
Full Text Indexing (FTI) and Full Text Search (FTS) performance go hand in
hand, and both depend on the language of the text (Moby has word lists in
five languages) and the size of the data (both row count and the amount of
text per row). Together these form a matrix of tests that will measure not
only FTI performance but also FTS queries from multiple clients issuing
random FTS queries.
Additional factors include both hardware and software configurations, for
example the number and speed of the CPUs as well as the size and type of
L-cache per CPU. Other hardware factors include the amount of RAM, the
number of disk controllers, the type of RAID disk drives, and where the
database files and FT Catalog files are placed. As you can see, this is a
non-trivial effort and one I plan on documenting for my book on this
subject.
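
To make the "matrix of tests" idea concrete, a small Python sketch can enumerate the combinations to run. The language labels, row counts, text sizes and client counts below are placeholder values chosen for illustration, not recommended settings, and build_test_matrix is a hypothetical helper name.

```python
from itertools import product

# Placeholder dimensions; substitute the languages and sizes you actually test.
LANGUAGES = ["language_a", "language_b"]
ROW_COUNTS = [10_000, 100_000, 1_000_000]
WORDS_PER_ROW = [10, 100, 1_000]
CLIENT_COUNTS = [1, 8, 32]


def build_test_matrix(languages, row_counts, words_per_row, client_counts):
    """Return one dict per test case, covering every combination of the inputs."""
    return [
        {"language": lang, "rows": rows, "words_per_row": words, "clients": clients}
        for lang, rows, words, clients in product(
            languages, row_counts, words_per_row, client_counts
        )
    ]


if __name__ == "__main__":
    matrix = build_test_matrix(LANGUAGES, ROW_COUNTS, WORDS_PER_ROW, CLIENT_COUNTS)
    print(len(matrix), "test cases")
    print(matrix[0])
```

Hardware and file placement are not part of this matrix; it is probably simpler to record them once per test machine and keep them fixed across all runs.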
I continue to work on completing the "SQL FTS Benchmarking Toolkit". Until
it is completed, I'd recommend that you download some of the Moby test
files, develop a test database and tables, and load this data into them.
Then use the Microsoft-provided OSTRESS client utility (download at:
http://support.microsoft.com/default...b;en-us;887057) to
measure the performance of multiple FTS queries from multiple clients
against your test database, for comparison against the open-source
full-text search products you are considering for your core product.
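
For the engines that OSTRESS cannot drive, the same "multiple clients issuing random queries" measurement can be approximated with a small script. The following is a simplified Python stand-in, not OSTRESS itself: run_query is a hypothetical callable you would replace with a real search call for each product, and fake_query only sleeps so the sketch runs on its own.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def client_session(run_query, terms, queries_per_client, seed):
    """One simulated client: issue random queries and record each latency in seconds."""
    rng = random.Random(seed)
    latencies = []
    for _ in range(queries_per_client):
        term = rng.choice(terms)
        start = time.perf_counter()
        run_query(term)
        latencies.append(time.perf_counter() - start)
    return latencies


def run_load_test(run_query, terms, clients=4, queries_per_client=25):
    """Run several client sessions in parallel threads and summarise the latencies."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [
            pool.submit(client_session, run_query, terms, queries_per_client, seed)
            for seed in range(clients)
        ]
        latencies = [lat for future in futures for lat in future.result()]
    return {
        "queries": len(latencies),
        "mean_s": statistics.mean(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def fake_query(term):
    """Placeholder for a real search call (hypothetical); it only sleeps briefly."""
    time.sleep(0.001)


if __name__ == "__main__":
    print(run_load_test(fake_query, ["alpha", "bravo", "charlie"]))
```

Running the same term list and client counts against each product keeps the comparison fair; the absolute numbers from a script like this matter less than the differences between engines on identical data.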
Please feel free to contact me if you need additional details.
Regards,
John
John,
Thanks for your help. I'm trying to compare MS SQL Server 2000 FTS with the open-source product/framework Lucene.
You're welcome, Anantha,
If you're comparing the open-source product/framework Lucene with SQL
Server 2000 FTS (two very different implementations of Full Text Search),
you may be interested in DotLucene - The Open Source Search Engine for .NET
at: http://openlucene.net/. Also, keep in mind that if your goal is to do
full-text search of documents (MS Word, HTML, etc.) stored outside of SQL
Server tables, you can also use the Windows Indexing Service and set up a
Linked Server (via the MSIDXS OLE DB provider) so that it can be queried
alongside other data stored in SQL Server.
Regards,
John