|
|
E-Mail
|
|
|
Each e-mail is automatically sorted into the most likely category
or, if none fits, into a miscellaneous category. A category
consists of a set of keywords and associated weights that describe
it, and categories can be arranged in a hierarchy. Each e-mail is
converted to a set of weighted keywords and compared with the
keywords for every category. The closest (highest similarity)
category is selected if its similarity exceeds a threshold;
otherwise, the e-mail is assigned to the miscellaneous category.
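
The selection rule above can be sketched as follows. This is a minimal illustration in Python, not the toolkit's own code: the text does not name the similarity measure, so cosine similarity is assumed here, and the names `cosine`, `categorize`, and the default threshold of 0.2 are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword -> weight dicts.

    Cosine is an assumption; the source only says "similarity".
    """
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def categorize(email_vec, categories, threshold=0.2, fallback="miscellaneous"):
    """Return (category name, similarity) for one e-mail vector.

    The best-scoring category wins only if its similarity exceeds the
    threshold (a hypothetical default); otherwise the fallback is used.
    """
    best_name, best_sim = fallback, 0.0
    for name, cat_vec in categories.items():
        sim = cosine(email_vec, cat_vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim <= threshold:
        return fallback, best_sim
    return best_name, best_sim
```

An e-mail that shares no keywords with any category scores 0 everywhere and therefore falls through to the miscellaneous category.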
|
|
|
All e-mails in a category can be viewed separately. E-mails
can be manually routed to a category to build an accurate
representation of it, and the weights of the terms describing
a category can also be adjusted by hand. Once a category has
been trained with enough e-mails and manual tuning that it
accurately reflects its subject, it can be made static by
stopping training.
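
One way to picture this training cycle is a category object that averages the vectors of the e-mails routed to it, accepts manual weight changes, and can be frozen. The sketch below is an assumption-laden illustration in Python: the class `Category`, its methods, and the use of a running mean as the category representation are all hypothetical and are not taken from the toolkit.

```python
class Category:
    """Hypothetical category: term weights are the mean of trained e-mail vectors."""

    def __init__(self, name):
        self.name = name
        self.weights = {}      # term -> weight
        self.count = 0         # number of e-mails folded in so far
        self.trainable = True  # False once the category is made static

    def train(self, email_vec):
        """Fold one manually routed e-mail into the running mean.

        Returns False without changing anything if training has stopped.
        """
        if not self.trainable:
            return False
        self.count += 1
        for term in set(self.weights) | set(email_vec):
            old = self.weights.get(term, 0.0)
            new = email_vec.get(term, 0.0)
            self.weights[term] = old + (new - old) / self.count
        return True

    def set_weight(self, term, weight):
        """Manually override the weight of one term."""
        self.weights[term] = weight

    def freeze(self):
        """Stop training so the category stays as it is."""
        self.trainable = False
```

In this sketch, later training pulls a manually set weight back toward the mean, so manual tuning is best done just before calling `freeze`.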
|
|
|
An E-flow function tracks the flow of e-mail traffic. You can
set reminders to send or receive e-mails. If you are expecting
an e-mail, an E-flow entry can be used to verify whether it has
arrived. Likewise, if you have to send an e-mail, you
can generate a task to remind you. The reminder shows
whether the task is complete.
|
|
|
E-flow can be used for reminders to send greetings or to
reply to an e-mail later. The status of a task is either
complete or incomplete and is set automatically by comparing
the task description with the To address and the contents
of the e-mail. If the similarity exceeds a threshold and
the date of the e-mail falls within a date range, the task
is complete.
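
The completion test can be sketched as below. This is an illustrative Python sketch under stated assumptions: the similarity measure is not specified in the text, so a simple token-overlap ratio stands in for it, and the names `Task`, `overlap`, `task_status`, and the 0.5 threshold are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    description: str
    start: date  # first day on which a matching e-mail counts
    end: date    # last day on which a matching e-mail counts


def tokens(text):
    """Lower-cased alphanumeric tokens; addresses split at '@' and '.'."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(description, to_addr, body):
    """Fraction of task-description tokens found in the To address or body.

    A stand-in for the unspecified similarity measure.
    """
    task_terms = tokens(description)
    if not task_terms:
        return 0.0
    mail_terms = tokens(to_addr + " " + body)
    return len(task_terms & mail_terms) / len(task_terms)


def task_status(task, to_addr, body, mail_date, threshold=0.5):
    """Return 'complete' only if both the date and similarity tests pass."""
    in_range = task.start <= mail_date <= task.end
    if in_range and overlap(task.description, to_addr, body) > threshold:
        return "complete"
    return "incomplete"
```

Both conditions must hold: a matching e-mail outside the date range leaves the task incomplete, as does an unrelated e-mail inside it.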
|
|
|
Modules
|
|
|
MailUtil -
This module contains functions to display the e-mail header
HTML for CGI scripts, save an e-mail in the database,
categorize an e-mail, compute a centroid for an e-mail
category, and other functions. The CGI scripts starting
with the em_ prefix contain examples of E-Mail functions.
|
|
|
Function Calls for save_email
|
|
|
The save_email function
accepts a file consisting of many concatenated e-mails separated
by their e-mail headers. The body and
headers of each e-mail are saved in a hash. An e-mail vector
is generated using the email_vector
function and compared with the centroid
of every e-mail category. Categories with
a similarity greater than a threshold are saved, and the
e-mail is assigned to the category with the highest similarity.
If the similarity with all categories is below the threshold,
the e-mail is assigned to the miscellaneous category.
Finally, the e-mail network is updated to reflect the new
e-mail's From and To addresses.
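
The first and last steps of this pipeline (splitting the file, storing headers and body, and updating the network) can be sketched with Python's standard `email` package. This is not the save_email implementation: the mbox-style "From " separator line, the single-part plain-text bodies, and the names `split_messages`, `parse_message`, and `update_network` are assumptions made for illustration. Vector generation and categorization are left out.

```python
import re
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def split_messages(raw):
    """Split concatenated e-mails at lines starting with 'From '.

    Assumes mbox-style separators; a body line that starts with
    'From ' would be split wrongly unless it is escaped.
    """
    parts = re.split(r"(?m)^From .*\n", raw)
    return [p for p in parts if p.strip()]


def parse_message(text):
    """Return the headers (lower-cased names) and body of one e-mail."""
    msg = message_from_string(text)
    headers = {name.lower(): value for name, value in msg.items()}
    # Assumption: single-part text; multipart bodies are skipped here.
    body = "" if msg.is_multipart() else msg.get_payload()
    return {"headers": headers, "body": body}


def update_network(network, headers):
    """Count one edge per (sender, recipient) address pair."""
    senders = getaddresses([headers.get("from", "")])
    recipients = getaddresses([headers.get("to", "")])
    for _, src in senders:
        for _, dst in recipients:
            if src and dst:
                network[(src.lower(), dst.lower())] += 1


def save_emails(raw, network):
    """Parse every e-mail in raw and record its From/To pairs."""
    records = [parse_message(part) for part in split_messages(raw)]
    for record in records:
        update_network(network, record["headers"])
    return records
```

Here the network is a `Counter` keyed by address pairs, so repeated correspondence between the same two addresses raises the edge count.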
|
|
|
|
Figure: Processing a received e-mail
|
|
Projects
|
|
|
1. Write modules to import e-mail data from external sources into
the e-mail archive, for example by copying e-mails from a mail
server, mail client, or web-based e-mail service.
2. Improve the accuracy of categorization. Can manually setting
the term weights for a category lead to a better representation
of that category?
3. Check E-flow. Does it track the flow (in and out) of e-mails?
|
|