Text Manipulation
Text manipulation is an integral part of data processing
in the IT department of any
enterprise. Traditionally, UNIX/Linux tools with support for
regular expressions, such as grep, sed, and
awk, have been used extensively for this purpose,
and they perform very well. However, such tools are
command-line based and require considerable knowledge of the UNIX
environment and expertise in the tools themselves. Ferrite has been
designed from the ground up with a philosophy similar to that of these
industrial-strength tools. Unlike them, however, Ferrite aims
to be easy to use for novice and expert alike by
providing an easy-to-use GUI. Another point in Ferrite's favor is
that it is a desktop tool supported on Windows(TM) platforms.
Ferrite is a must-have tool if you are looking to automate text
manipulation tasks. It includes powerful text search and manipulation
filters which use rule-based instructions to operate on streaming
text. A typical processing workflow is: open a file, then stream its
data through one or more filters, each of which applies rules to
process the data. In addition, you can save the workflow definition
for later use and apply the same transformations again with a single
click.
In the following sections, we examine a few core features of
Ferrite that make it indispensable as a text processing tool,
including regular expressions, JavaScript scripting, multi-threaded
execution, and extensibility.
The Power of Regular Expressions
Regular Expression Search refers to a technique commonly used in text
search and manipulation tools for specifying search patterns. Some
simple examples of the patterns which can be specified include
- Anchor search to the beginning of the line using the character
^ i.e. search for specified text only if it occurs at the
beginning of the line. For example, the regular expression pattern
^From: can be used to select only those lines beginning
with From:
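- Anchor search to the end of the line using $. With the pattern
\.com$ a match occurs only for those records ending with
.com. The dot is escaped because an unescaped dot matches any
character, so .com$ would also match a record ending in telecom.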
- Search for one or more consecutive digits in the text using the
pattern \d+. Search for exactly three consecutive digits
using \d{3}, or at least three using \d{3,}. A phone number in the
United States could be matched using the regular expression
\d{3}-\d{3}-\d{4}.
- Search for text only at the beginning or end of a word using the
special construct \b. For instance, the pattern begin\b would match
begin, but not beginning.
- Search for lines containing multiple consecutive spaces, blank
lines, lines with only white space, etc.
- And much more.
By supporting Regular Expressions extensively, Ferrite brings a new
vista to the world of text processing and manipulation.
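The patterns listed above can be tried out in any JavaScript engine. The following is a minimal sketch in plain JavaScript, not Ferrite's own filter syntax; the sample lines and variable names are made up for illustration.

```javascript
// Sample input lines (hypothetical data for illustration).
const lines = [
  "From: alice@example.com",
  "Reply-To: bob@example.org",
  "Call 555-867-5309 to begin",
  "In the beginning there were 42 items",
  "   ",
];

// ^ anchors the match to the beginning of the line.
const fromLines = lines.filter((line) => /^From:/.test(line));

// $ anchors the match to the end of the line; the dot is escaped
// so that it matches a literal "." rather than any character.
const dotComLines = lines.filter((line) => /\.com$/.test(line));

// \d{3}-\d{3}-\d{4} matches a US-style phone number.
const phoneMatch = lines[2].match(/\d{3}-\d{3}-\d{4}/);
const phone = phoneMatch ? phoneMatch[0] : null;

// \b marks a word boundary, so "begin" matches but "beginning" does not.
const beginLines = lines.filter((line) => /begin\b/.test(line));

// ^\s*$ matches lines that are empty or contain only white space.
const blankLines = lines.filter((line) => /^\s*$/.test(line));

console.log(fromLines, dotComLines, phone, beginLines, blankLines);
```

Each pattern selects exactly one of the sample lines, which makes it easy to check what an anchor or a word boundary contributes by removing it and running the sketch again.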
JavaScript for Text Manipulation
JavaScript is an industry-standard scripting language commonly used to
script websites. Because of its ease of use, we decided to embed a
JavaScript engine within Ferrite for scripting purposes. In addition
to the excellent features already present in JavaScript, we have
extended the version of JavaScript in Ferrite with features useful
for text and data manipulation.
- Strong string manipulation features in JavaScript, coupled with
close binding to the Ferrite core, put the power in the hands of the
user.
- JavaScript array processing features map well onto the field
processing features in Ferrite.
- Expressions of arbitrary complexity can be used in the selection
and manipulation of rows and columns of data.
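As an illustration of how string and array features combine for field processing, here is a short sketch in standard JavaScript. It does not use Ferrite's extensions, which are not described here; the record layout (name, department, salary) and all variable names are assumptions made for the example.

```javascript
// Hypothetical comma-separated records: name, department, salary.
const rows = [
  "Ann Lee,Engineering,105000",
  "Raj Patel,Sales,72000",
  "Mia Chen,Engineering,98000",
];

// Split each row into an array of fields.
const records = rows.map((row) => row.split(","));

// Select rows with an arbitrary expression over the fields.
const selected = records.filter(
  ([, dept, salary]) => dept === "Engineering" && Number(salary) > 100000
);

// Manipulate columns: upper-case the name and drop the salary column.
const output = selected.map(([name, dept]) =>
  [name.toUpperCase(), dept].join(",")
);

console.log(output);
```

Treating each row as an array of fields keeps selection (filter) and column manipulation (map) as separate, composable steps.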
Multi-Threaded Execution
Ferrite uses an architecture in which each filter executes in its own
thread, with the data streamed from one filter to the next. This
arrangement gives the user great flexibility, since the filters
can be combined according to need. Because each filter executes in a
separate thread, independently of the others, filters can work on
different parts of the stream at the same time, which improves speed
and throughput. The architecture is similar to the pipelines used in
traditional UNIX operating systems, where each process executes
independently and passes its output to the next, getting the job done
faster.
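The data flow described above, with records streamed from one filter to the next and filters combined according to need, can be modelled with JavaScript generators. This is only a sketch of the streaming composition: it runs in a single thread and does not reproduce Ferrite's per-filter threads, and the filter names (grep, replace, upperCase) and the pipeline helper are hypothetical.

```javascript
// Each filter is a generator function that does one job: it consumes
// lines from an upstream iterable and yields results downstream.
function* grep(lines, pattern) {
  for (const line of lines) {
    if (pattern.test(line)) yield line;
  }
}

function* replace(lines, pattern, replacement) {
  for (const line of lines) yield line.replace(pattern, replacement);
}

function* upperCase(lines) {
  for (const line of lines) yield line.toUpperCase();
}

// Chain filters by feeding the source through each stage in order.
// Every stage takes an iterable and returns a new iterable.
function pipeline(source, ...stages) {
  return stages.reduce((stream, stage) => stage(stream), source);
}

const input = ["error: disk full", "info: started", "error: timeout"];

// Lines are pulled through the chain one at a time, on demand.
const result = [
  ...pipeline(
    input,
    (s) => grep(s, /^error:/),
    (s) => replace(s, /^error:\s*/, ""),
    upperCase
  ),
];

console.log(result);
```

Because every stage shares the same shape (iterable in, iterable out), stages can be reordered, removed, or added without changing the others.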
Extensibility: Small is Beautiful
As explained above, Ferrite uses an architecture based on chaining
filters together to manipulate text. Central to this design is the
fact that each filter accomplishes a single task and attempts to do it
well. This reduces the complexity of the entire platform, keeps the
design of each filter simple, and eases extensibility and addition of
new filters. New filters are constantly being added, so check back for
more filters in the following categories soon:
- Sorting
- Split and merge
- XML Stylesheet processing
- IMAP and POP3 Email processing
- Reading and writing Excel Spreadsheets
- Reading webpages directly
- Database import and export