Georgia Tech inventors have developed a method that combines sophisticated natural language processing techniques with an unsupervised learning algorithm to extract critical and latent features (embeddings) from raw text. These features are well separated from one another and typically have lower dimensionality and more compact representations than those produced by conventional language processors. In addition, the learning process does not depend on any structural or label information, and the algorithm can update itself as new data arrive. The technique was originally tailored to analyze police reports, which consist of a time, a location, and a text description, but it could be applied in a variety of other domains. The inventors have also developed a software interface for the text analysis algorithm.
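The listing does not disclose the inventors' algorithm. Purely to illustrate the general idea of label-free, incrementally updatable, low-dimensional text embeddings, the sketch below swaps in two standard techniques: feature hashing of tokens, followed by online PCA trained with Sanger's generalized Hebbian rule. All names (`tokenize`, `hash_features`, `OnlineTextEmbedder`) and parameter values are hypothetical; this is not the Georgia Tech method.

```python
import math
import random
import re
import zlib
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_features(text: str, dim: int) -> List[float]:
    """Map raw text to a fixed-size, L2-normalized bag-of-words vector.

    Feature hashing needs no vocabulary, so documents containing
    previously unseen words can still be processed.
    """
    vec = [0.0] * dim
    for tok in tokenize(text):
        # crc32 is stable across runs, unlike Python's salted hash().
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


class OnlineTextEmbedder:
    """Hypothetical online-PCA embedder trained with Sanger's rule (GHA).

    No labels are used; calling partial_fit with new documents
    updates the projection incrementally, one document at a time.
    """

    def __init__(self, in_dim: int = 256, out_dim: int = 8,
                 lr: float = 0.05, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.lr = lr
        self.n_seen = 0
        self.mean = [0.0] * in_dim
        # Small random initial weights: one row per output component.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def _center(self, x: List[float]) -> List[float]:
        return [xi - mi for xi, mi in zip(x, self.mean)]

    def _project(self, xc: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, xc)) for row in self.W]

    def partial_fit(self, texts: List[str]) -> "OnlineTextEmbedder":
        for text in texts:
            x = hash_features(text, self.in_dim)
            # Update the running mean, then center the input.
            self.n_seen += 1
            self.mean = [m + (xi - m) / self.n_seen
                         for m, xi in zip(self.mean, x)]
            xc = self._center(x)
            y = self._project(xc)
            # Sanger's rule: dW_i = lr * y_i * (x - sum_{j<=i} y_j W_j),
            # using the weights from before this update.
            residual = list(xc)
            for i in range(self.out_dim):
                residual = [r - y[i] * w for r, w in zip(residual, self.W[i])]
                self.W[i] = [w + self.lr * y[i] * r
                             for w, r in zip(self.W[i], residual)]
        return self

    def embed(self, text: str) -> List[float]:
        """Return the out_dim-dimensional embedding of one document."""
        return self._project(self._center(hash_features(text, self.in_dim)))


if __name__ == "__main__":
    reports = [
        "vehicle stolen from parking lot overnight",
        "car broken into in parking garage",
        "shoplifting reported at grocery store",
        "theft of merchandise from retail store",
    ]
    model = OnlineTextEmbedder(in_dim=64, out_dim=2).partial_fit(reports)
    print(model.embed("truck stolen from parking lot"))
    # Fold in new reports later without retraining from scratch.
    model.partial_fit(["bicycle stolen outside apartment"])
    print(model.n_seen)  # prints 5
```

Feature hashing avoids maintaining a vocabulary, and Sanger's rule converges toward the leading principal components one document at a time, so both steps tolerate a growing corpus. A production system would more likely rely on an established library, for example scikit-learn's `HashingVectorizer` and `IncrementalPCA`.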
- Better performance than conventional natural language processing methods
- Text analysis applications, including but not limited to analysis of:
  - Insurance records
  - Medical reports
  - Crime reports
  - Social media content
Sifting through unstructured text for meaning can be labor-intensive. Existing algorithms can classify text and identify patterns within it, but the representations they produce can be ambiguous.