About Text Mine


Each e-mail is automatically sorted into the most likely category or, failing that, into a miscellaneous category. A category consists of a set of keywords and associated weights that describe it. You can build a hierarchy of categories. Each e-mail is converted to a set of weighted keywords and compared with the keywords for every category. The closest category (the one with the highest similarity) is selected if its similarity exceeds a threshold; otherwise, the e-mail is assigned to the miscellaneous category.
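The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the Text Mine implementation: the function names (email_to_vector, cosine, categorize), the use of raw term counts as weights, cosine similarity as the measure, and the default threshold of 0.2 are all assumptions made for the example.

```python
import math
import re
from collections import Counter

MISC = "miscellaneous"


def email_to_vector(text):
    """Hypothetical helper: map text to {keyword: weight} using raw term counts."""
    words = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(words))


def cosine(a, b):
    """Cosine similarity between two {term: weight} dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(text, categories, threshold=0.2):
    """Return (category name, similarity) for the closest category.

    Falls back to the miscellaneous category when no similarity
    exceeds the (assumed) threshold.
    """
    vector = email_to_vector(text)
    best_name, best_sim = MISC, 0.0
    for name, keywords in categories.items():
        sim = cosine(vector, keywords)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim > threshold:
        return best_name, best_sim
    return MISC, best_sim
```

With two toy categories such as {"travel": {"flight": 2.0, "hotel": 1.0}, "billing": {"invoice": 2.0, "payment": 1.0}}, an e-mail mentioning a flight and a hotel lands in "travel", while an e-mail sharing no keywords with either category falls through to the miscellaneous category.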

All e-mails in a category can be viewed separately. E-mails can be manually routed to categories to build an accurate representation of each category. The weights of the terms describing a category can also be adjusted. Finally, once a category has been trained with a set of e-mails, tuned manually, and appears to reflect its subject accurately, it can be made static by stopping training.
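One way to picture this training cycle is a category object that averages the vectors of the e-mails routed to it, accepts manual weight overrides, and ignores further training once it is made static. The Category class below and its method names are hypothetical, and the running-mean update is an assumption; the text does not say how Text Mine updates category weights.

```python
class Category:
    """Hypothetical category: term weights are the mean of routed e-mail vectors."""

    def __init__(self, name):
        self.name = name
        self.weights = {}    # term -> weight
        self.count = 0       # number of e-mails used for training
        self.static = False  # True once training has been stopped

    def train(self, email_vector):
        """Fold one routed e-mail into the weights; return False if static."""
        if self.static:
            return False
        self.count += 1
        for term in set(self.weights) | set(email_vector):
            old = self.weights.get(term, 0.0)
            new = email_vector.get(term, 0.0)
            # Incremental mean: move the old weight toward the new value.
            self.weights[term] = old + (new - old) / self.count
        return True

    def set_weight(self, term, weight):
        """Manual tuning: override the weight of a single term."""
        self.weights[term] = weight

    def stop_training(self):
        """Make the category static so later e-mails no longer change it."""
        self.static = True
```

In this sketch, manual weight changes remain possible after training stops; whether a static category should also reject manual edits is a design choice the text leaves open.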

An E-flow function is included. This function tracks the flow of e-mail traffic. You can set reminders to send or receive e-mails. If you are expecting an e-mail, an E-flow entry can be used to check whether it has arrived. Likewise, if you have to send an e-mail, you can generate a task to remind you. The reminder shows whether the task is complete.

E-flow can be used for reminders to send greetings or to reply to an e-mail later. The status of a task is either complete or incomplete and is set automatically by comparing the task description with the To address and the contents of the e-mail. If the similarity exceeds a threshold and the date of the e-mail falls within the given date range, the task is complete.
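The completion rule can be illustrated with a small Python sketch. The Email and Task records, the token-overlap similarity, and the threshold of 0.5 are assumptions chosen for the example; the text does not specify which similarity measure or threshold E-flow uses.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:
    to_address: str
    body: str
    sent_on: date


@dataclass
class Task:
    description: str
    start: date  # first day of the date range
    end: date    # last day of the date range (inclusive)


def tokens(text):
    """Lower-cased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(task, email):
    """Assumed measure: share of task tokens found in the To address or body."""
    wanted = tokens(task.description)
    if not wanted:
        return 0.0
    found = tokens(email.to_address + " " + email.body)
    return len(wanted & found) / len(wanted)


def task_status(task, emails, threshold=0.5):
    """Return "complete" if any in-range e-mail is similar enough to the task."""
    for email in emails:
        in_range = task.start <= email.sent_on <= task.end
        if in_range and similarity(task, email) > threshold:
            return "complete"
    return "incomplete"
```

A task described as "birthday greetings alice" with a one-week range would be marked complete by an e-mail to alice@example.com that mentions a birthday and greetings within that week, and would stay incomplete if the same e-mail were dated outside the range.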


MailUtil - This module contains functions to display the e-mail header HTML for CGI scripts, save an e-mail in the database, categorize an e-mail, compute a centroid for an e-mail category, and other functions. The CGI scripts starting with the em_ prefix contain examples of E-Mail functions.

Function Calls for save_email

The save_email function accepts a file consisting of many concatenated e-mails separated by e-mail headers. The body and headers of each e-mail are saved in a hash. An e-mail vector is generated using the email_vector function and compared with the centroids of the e-mail categories. Categories whose similarity exceeds a threshold are saved, and the e-mail is assigned to the one with the highest similarity. If the similarity with every category is below the threshold, the e-mail is assigned to the miscellaneous category. Finally, the e-mail network is updated to reflect the new e-mail's From and To addresses.
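The first and last steps of this pipeline (splitting the file, storing headers and body, and updating the network) might look roughly like the Python sketch below. It is not the save_email code itself: the function names are hypothetical, the input is assumed to use mbox-style "From " separator lines with plain-text bodies, and the network is modelled as a simple counter of (from, to) pairs.

```python
from collections import Counter
from email import message_from_string


def split_emails(raw):
    """Split concatenated e-mails on mbox-style 'From ' separator lines.

    Assumes body lines never start with 'From ' (mbox files normally
    escape them as '>From ').
    """
    chunks, current = [], []
    for line in raw.splitlines(keepends=True):
        if line.startswith("From "):
            if current:
                chunks.append("".join(current))
                current = []
            continue  # drop the separator line itself
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def parse_email(chunk):
    """Store selected headers and the body of one e-mail in a dictionary."""
    msg = message_from_string(chunk)
    payload = msg.get_payload()
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        # Multipart messages return a list here; this sketch keeps text only.
        "body": payload.strip() if isinstance(payload, str) else "",
    }


def update_network(network, record):
    """Count one more message on the sender -> recipient edge."""
    network[(record["from"], record["to"])] += 1


def load_emails(raw):
    """Parse every e-mail in the text and build the sender/recipient network."""
    network = Counter()
    records = [parse_email(chunk) for chunk in split_emails(raw)]
    for record in records:
        update_network(network, record)
    return records, network
```

Each record returned by load_emails would then be turned into a vector and compared with the category centroids before being stored under its assigned category.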

Processing a received email


1. Modules to import e-mail data from external sources into the e-mail archive, for example, copying e-mails from a mail server, mail client, or web-based e-mail to the archive.

2. Improve the accuracy of categorization. Can manually setting the term weights for a category lead to a better representation of the category?

3. Check E-flow. Does it track the flow (in and out) of e-mails?