1. What is Text Mine?
Text Mine is a collection of Perl modules for mining text. The term 'text mining' is relatively new. Its data counterpart, 'data mining', is better known: it finds patterns and trends in databases or data warehouses that are not obvious. Likewise, text mining locates patterns and hidden information in natural language text that are not obvious or that would otherwise require tedious search and analysis. Some of the tools included in Text Mine are 'text mining' tools, while others are Information Retrieval (IR) or Natural Language Processing (NLP) tools.
2. What tools does Text Mine provide?
a. Information Extraction: Find entities (people, places, organizations, and other terms) in text.
b. POS Tagger: Find the parts of speech in text.
c. Spider: Search the web using an intelligent spider.
d. E-Mail: Track, categorize, and manage your e-mail in an archive.
e. Clustering: Group a collection of documents by content into clusters of like documents.
f. Summarization: Build a single or multiple line summary of web pages, articles, and other documents.
g. Dictionary: Search and find relationships between words using WordNet.
h. Q & A: Find answers to questions in text.
i. Search: Search your local machine and locate images, formatted documents, and other files using an automatic or manual index.
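
To give a feel for what a tool such as (e) Clustering does, here is a toy sketch. It is written in Python purely for illustration and is not Text Mine code; the function names, the greedy single-pass strategy, and the 0.3 similarity threshold are all assumptions chosen for this example, not the algorithm Text Mine uses.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Turn a document into a bag of lowercase words with their counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy single pass: a document joins the first cluster whose first
    member is at least `threshold` similar; otherwise it starts a new cluster.
    Returns clusters as lists of document indices."""
    vectors = [word_counts(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "perl modules for text mining",
    "text mining with perl modules",
    "apache web server configuration",
    "configuration of the apache web server",
]
print(cluster(docs))  # [[0, 1], [2, 3]]
```

The two Perl documents share four of their five words and land in one cluster; the two Apache documents form the other. Real clustering tools typically weight words and compare against whole clusters rather than a single member, so treat this only as an outline of the idea.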
3. How do I install Text Mine?
Text Mine runs on Linux and Windows. It is written in Perl. An install script is included for each platform. Before installation, check the requirements (MySQL, Perl, and Apache). The MySQL server should be running before installation, and the userid and password should be set in setup.pl.
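
As a rough way to carry out the requirements check before running the install script, the sketch below looks for the three programs on the PATH. It is an illustration in Python and not part of Text Mine; the executable names (especially for Apache, which varies by platform) are assumptions, and finding an executable does not prove that the MySQL server is actually running.

```python
import shutil

# Candidate executable names for each requirement.
# These names are assumptions and differ between platforms and distributions.
REQUIREMENTS = {
    "Perl": ["perl"],
    "MySQL": ["mysql", "mysqld"],
    "Apache": ["httpd", "apache2", "apachectl"],
}


def missing_requirements(requirements=REQUIREMENTS, which=shutil.which):
    """Return the requirement names for which no candidate executable
    was found on the PATH."""
    return [
        name
        for name, candidates in requirements.items()
        if not any(which(c) for c in candidates)
    ]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Perl, MySQL, and Apache executables were all found.")
```

The `which` parameter exists only so the lookup can be replaced in a test; in normal use the default `shutil.which` is enough.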
4. How do I use these tools?
First locate the Text Mine module for the function you want to use. Then check the documentation for the module. A web-based interface may be available; if not, the command line interface can be used to test the module.
5. Are these tools any good?
Try them and see for yourself. If you know Perl, you can improve these tools. If not, you can suggest improvements to the algorithms or dictionaries to get better results. Many problems in text mining do not have a single solution that is clearly the best. Instead, there are acceptable solutions of varying quality, and the aim of a text mining algorithm is to get as close as possible to the 'best solution'.