Why Is Unstructured Data Complicated?
Structured systems such as ATM transactions, airline reservations,
manufacturing inventory control and point-of-sale systems are part of
daily operations, and the data they produce grows quickly. This data follows
defined rules and is supported by analytical systems.
Unstructured data, however, such as emails, reports, telephone conversations
and media, has no defined rules and no predetermined structure.
Analytical systems support only structured data, and this is where the
complexity arises: unstructured data differs greatly from structured data
technologically, organizationally and structurally.
To include unstructured data in these analytical systems, the data must first
be read and then integrated with the structured data. Reading unstructured
data is not easy either, because of the following complications:
- Text on paper must be converted to electronic format through OCR (Optical Character Recognition), but the text is often damaged or the paper too old for OCR to work, in which case the data has to be read and entered by hand.
- Voice recordings must be converted to text using VCR (Voice Character Recognition), which may affect quality.
- Reading text from formats such as .pdf, .txt and other text-compatible files, .doc, .ppt, .xls, Lotus and Outlook may require third-party vendor tools, as these are more efficient and reliable.
- Taped telephone conversations must likewise be transcribed through VCR.
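Because each kind of source needs a different reading path, a reading step often starts by routing every file to the right reader. The sketch below does this by file extension. It is a minimal illustration under stated assumptions: the labels such as "pdf-converter" are hypothetical placeholders for whatever third-party tool is actually used, and the extension mapping is illustrative rather than exhaustive.

```python
from pathlib import Path

# Assumption: plain text can be read directly without extra tools.
PLAIN_TEXT = {".txt"}

# Hypothetical labels standing in for third-party converters.
NEEDS_THIRD_PARTY = {
    ".pdf": "pdf-converter",
    ".doc": "word-converter",
    ".ppt": "presentation-converter",
    ".xls": "spreadsheet-converter",
    ".nsf": "lotus-notes-converter",  # Lotus Notes database
    ".msg": "outlook-converter",      # Outlook message
}

# Scanned paper goes to OCR; recorded speech goes to voice recognition (VCR).
SCANNED_IMAGES = {".png", ".jpg", ".tif", ".tiff"}
AUDIO = {".wav", ".mp3"}


def route_document(filename: str) -> str:
    """Return the (hypothetical) reading path a file would need."""
    suffix = Path(filename).suffix.lower()
    if suffix in PLAIN_TEXT:
        return "read-directly"
    if suffix in NEEDS_THIRD_PARTY:
        return NEEDS_THIRD_PARTY[suffix]
    if suffix in SCANNED_IMAGES:
        return "ocr"
    if suffix in AUDIO:
        return "voice-recognition"
    # Anything unrecognized, e.g. damaged scans, is read and entered by hand.
    return "manual-review"


if __name__ == "__main__":
    for name in ["memo.txt", "contract.PDF", "scan_1998.tif", "call_0412.wav", "notes.xyz"]:
        print(name, "->", route_document(name))
```

Keeping the mapping in plain dictionaries makes it easy to swap in a different vendor tool for one format without touching the routing logic.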
Once all the data has been read from the different sources, it must be
integrated to prepare it for analysis.
Why Do We Need Data Integration?
- It enables simple search
- Alternate names can be searched indirectly
- Related terms can be searched indirectly
- Permutations of words can be searched indirectly
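The indirect searches listed above can be sketched as query expansion: before matching, the query is widened to include its alternate names, its related terms and, for multi-word queries, every ordering of its words. This is a minimal sketch under assumptions, not a description of any particular product. ALTERNATE_NAME_GROUPS, RELATED_TERMS, expand_query and search are hypothetical names, the thesaurus entries are illustrative only, and a real system would use an index instead of scanning every document.

```python
import re
from itertools import permutations

# Hypothetical thesaurus produced during integration (illustrative entries).
# Each group holds names that refer to the same thing.
ALTERNATE_NAME_GROUPS = [
    {"ibm", "international business machines"},
]
# Hypothetical one-way mapping from a term to related terms.
RELATED_TERMS = {
    "invoice": {"bill", "payment"},
}


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric words, separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def expand_query(query: str) -> set:
    """Return every phrase the query should match."""
    q = normalize(query)
    phrases = {q}
    # Alternate names: any group containing the query contributes all its names.
    for group in ALTERNATE_NAME_GROUPS:
        if q in group:
            phrases |= group
    # Related terms.
    phrases |= RELATED_TERMS.get(q, set())
    # Word permutations, capped at 4 words (4! = 24 orderings) to avoid blow-up.
    words = q.split()
    if 1 < len(words) <= 4:
        phrases |= {" ".join(p) for p in permutations(words)}
    return phrases


def search(documents: dict, query: str) -> list:
    """Return ids of documents containing any expanded phrase as whole words."""
    phrases = expand_query(query)
    hits = []
    for doc_id, text in documents.items():
        padded = f" {normalize(text)} "
        if any(f" {p} " in padded for p in phrases):
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    docs = {
        "email-1": "Meeting with International Business Machines next week.",
        "report-7": "The bill for March is overdue.",
        "memo-3": "Check the status order before shipping.",
        "call-2": "Customer asked about IBM pricing.",
    }
    print(search(docs, "IBM"))
    print(search(docs, "invoice"))
    print(search(docs, "order status"))
```

Padding with spaces makes the match respect word boundaries, so "bill" matches "the bill" but not "billing".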