Evaluation of text analysis in social media: a linguistic approach

The analysis of text data is becoming increasingly important. Companies need to know what people think and want in order to decide where to invest and what to offer their customers. Early approaches to text analysis were purely statistical, but adding linguistic information has been shown to improve the results.

One of the problems you need to address when analyzing social media is time. People are constantly exchanging information: users write comments every day about what they think of a product, what they do, or the places they visit, and it is difficult to keep track of everything that happens. Moreover, information is often expressed in short sentences, keywords, or isolated ideas, as in tweets. As a result, the language is usually unstructured and lacks context.

I will talk about the problem of opinion in opinion mining: how do we define opinion? I will also briefly explain how Naïve Bayes classifiers can be useful in sentiment analysis, and I will use an example to show how linguistic information can help improve training. Finally, I will evaluate the results and compare them with the results of training without any linguistic information.
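To give a rough idea of the kind of comparison meant here, the following is a minimal sketch of a Naïve Bayes sentiment classifier trained twice on the same labeled data: once on plain tokens and once with a simple linguistic feature, negation marking. Everything in it is an assumption made for illustration and is not the example used in the talk: the toy sentences, the `NEGATIONS` list, the `NOT_` prefix convention, and the function names `tokenize`, `train`, and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumed, illustrative list of negation words (not from the talk).
NEGATIONS = {"not", "no", "never"}

def tokenize(text, mark_negation=False):
    """Lowercase and split on whitespace; optionally prefix every token
    that follows a negation word with NOT_ (scope: rest of the text)."""
    tokens = text.lower().split()
    if not mark_negation:
        return tokens
    marked, negated = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negated = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negated else tok)
    return marked

def train(docs, mark_negation=False):
    """Count tokens per class for a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text, mark_negation):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return word_counts, class_counts, vocab

def predict(model, text, mark_negation=False):
    """Return the class with the highest log posterior, using
    add-one (Laplace) smoothing and ignoring unseen tokens."""
    word_counts, class_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / n_docs)
        for tok in tokenize(text, mark_negation):
            if tok in vocab:
                likelihood = (word_counts[label][tok] + 1) / (total + len(vocab))
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)

# Toy labeled data, invented for this sketch.
TOY_DOCS = [
    ("good movie", "pos"),
    ("really good plot", "pos"),
    ("not bad at all", "pos"),
    ("bad movie", "neg"),
    ("not good at all", "neg"),
    ("really bad plot", "neg"),
]

plain_model = train(TOY_DOCS)
marked_model = train(TOY_DOCS, mark_negation=True)

print(predict(plain_model, "not good"))
print(predict(marked_model, "not good", mark_negation=True))
```

On this toy data the plain model treats "good" as evidence for the positive class regardless of the preceding "not", whereas the negation-marked model sees the distinct token `NOT_good`. Six sentences prove nothing about real performance; the sketch only shows where linguistic information enters the pipeline.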

I will also cover opinion lexicons, both dictionary-based and corpus-based, and how lexicons can be used for semi-supervised and supervised learning. If time allows, I will discuss other use cases of text analysis, such as spam detection.
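As a small illustration of the lexicon idea, here is a minimal sketch of how a dictionary-based opinion lexicon could score texts and produce weak labels that seed a semi-supervised classifier. The six-word `LEXICON`, the +1/-1 polarity values, the sample texts, and the function names `lexicon_score` and `weak_label` are assumptions made for this sketch, not an actual published lexicon or the method presented in the talk.

```python
# Hypothetical miniature opinion lexicon: word -> polarity.
LEXICON = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "awful": -1, "hate": -1,
}

def lexicon_score(text):
    """Sum the polarities of all lexicon words found in the text."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

def weak_label(text):
    """Map the score to a label; return None when the lexicon is silent."""
    score = lexicon_score(text)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None

# Invented unlabeled texts for illustration.
unlabeled = [
    "I love this phone, great camera!",
    "Awful battery, I hate it.",
    "Arrived on Tuesday.",
]

# Keep only the texts the lexicon can label; these weakly labeled
# examples could then serve as seed training data for a classifier.
seed_data = [(text, weak_label(text)) for text in unlabeled
             if weak_label(text) is not None]
print(seed_data)
```

Texts that the lexicon cannot label are left out of the seed set, so the classifier trained on it would be the component expected to handle them.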