Methodology

The starting point of the system is the product “BookOnCloud”, which will offer its customers, the owners of tourist accommodation, various packages that enable them to monitor the competition and their own position within it at any time. The resulting information will be displayed on the customers' management screens.

Each “BookOnCloud” customer receives the requested information for the period covered by the package he or she has purchased. This period is translated into cron expressions and stored in a table of a PostgreSQL database together with the unique id of the client and the id of the target. Each such row constitutes a complete customer request.
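
As an illustration of how a purchased package could become a stored request, the following minimal sketch maps a package to a cron expression and builds one row. The package names, the schedules and the field names (clientId, targetId, cron) are assumptions made for this example; the text does not specify them.

```javascript
// Hypothetical package names and schedules; the source does not list them.
const PACKAGE_SCHEDULES = {
  basic: "0 6 * * 1",     // every Monday at 06:00
  standard: "0 6 * * *",  // every day at 06:00
  premium: "0 */6 * * *", // every six hours, on the hour
};

// Build one row for the requests table: client id, target id, cron expression.
function buildRequest(clientId, targetId, packageName) {
  const cron = PACKAGE_SCHEDULES[packageName];
  if (cron === undefined) {
    throw new Error(`Unknown package: ${packageName}`);
  }
  return { clientId, targetId, cron };
}

const request = buildRequest(42, 7, "standard");
console.log(request);
```

In a real deployment the returned object would be inserted into the PostgreSQL table rather than printed.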

The next subsystem is implemented with Apache Kafka. A Node.js script reads the table from the database and schedules, for each client request, a job that runs at the times specified by its cron expression. Each run activates an Apache Kafka producer, which places the request in a queue. These requests form the first Apache Kafka topic.
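
The scheduling step can be sketched as follows. This is not the project's script. It is a simplified stand-in that checks, for one point in time, which stored requests are due and passes them to a producer callback. It assumes five-field cron expressions, supports only "*", single numbers, comma lists and "*/n" steps, requires all five fields to match, and evaluates times in UTC. The array requestsTopic is an in-memory placeholder for the Kafka topic.

```javascript
// Does one cron field ("*", "5", "*/15", "1,15") match the given value?
function fieldMatches(field, value) {
  return field.split(",").some((part) => {
    if (part === "*") return true;
    if (part.startsWith("*/")) return value % Number(part.slice(2)) === 0;
    return Number(part) === value;
  });
}

// Is a five-field cron expression due at this minute (UTC)?
// Simplification: all five fields must match; ranges and names are not handled.
function isDue(cron, date) {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = cron.trim().split(/\s+/);
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate()) &&
    fieldMatches(month, date.getUTCMonth() + 1) &&
    fieldMatches(dayOfWeek, date.getUTCDay())
  );
}

// One scheduler tick: hand every due request to the producer callback.
function tick(rows, now, produce) {
  const due = rows.filter((row) => isDue(row.cron, now));
  due.forEach((row) => produce({ clientId: row.clientId, targetId: row.targetId }));
  return due.length;
}

const rows = [
  { clientId: 1, targetId: 10, cron: "0 6 * * *" },
  { clientId: 2, targetId: 11, cron: "*/15 * * * *" },
  { clientId: 3, targetId: 12, cron: "30 6 * * *" },
];
const requestsTopic = []; // in-memory stand-in for the first Kafka topic
const sent = tick(rows, new Date(Date.UTC(2024, 0, 1, 6, 0)), (m) => requestsTopic.push(m));
console.log(sent, requestsTopic);
```

Calling tick once per minute would approximate a cron-driven scheduler; a production system would more likely rely on an existing scheduling library and a real Kafka client.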

This topic is consumed by a group of consumers. A consumer group allows more consumers to share the reading of the topic's messages as the queue grows. This makes the system faster, since messages are processed in parallel: the topic's partitions are spread across the Apache Kafka servers (brokers) and divided among the consumers of the group. Apache Kafka was preferred for this reason, as it manages queues efficiently and transfers data from sender to recipient in real time.
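
To make the effect of a consumer group concrete, the sketch below divides the partitions of a topic among the members of a group in round-robin order. It is a simplified model written for this example, not Kafka's own assignment code, and the partition count and consumer names are assumptions.

```javascript
// Distribute partition numbers 0..partitionCount-1 over the group members.
function assignPartitions(partitionCount, consumers) {
  const assignment = new Map(consumers.map((consumer) => [consumer, []]));
  for (let partition = 0; partition < partitionCount; partition++) {
    const owner = consumers[partition % consumers.length];
    assignment.get(owner).push(partition);
  }
  return assignment;
}

// With two consumers each one reads three partitions; adding a third
// consumer reduces the share of each to two.
const twoConsumers = assignPartitions(6, ["c1", "c2"]);
const threeConsumers = assignPartitions(6, ["c1", "c2", "c3"]);
console.log(twoConsumers, threeConsumers);
```

Because each partition is read by only one member of a group, adding consumers beyond the number of partitions brings no further parallelism.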

Each message serves as input to the subsystem that collects data from the target. The connection to the target is made through proxies, which are swapped when one of them fails. For example, one alternative to a proxy is the Tor Browser, which can connect the subsystem to the target online.
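
The change of proxy on failure can be sketched as a loop over a list of routes. The route names and the attempt callback are hypothetical, and the sketch is kept synchronous for brevity, whereas a real network call would be asynchronous.

```javascript
// Try each route in order and return the first successful response.
function fetchWithFailover(routes, attempt) {
  const failures = [];
  for (const route of routes) {
    try {
      return { route, body: attempt(route), failures };
    } catch (err) {
      failures.push({ route, reason: err.message });
    }
  }
  throw new Error(`All routes failed: ${failures.map((f) => f.route).join(", ")}`);
}

// Hypothetical routes; "tor" stands for the Tor fallback mentioned in the text.
const routes = ["proxy-a", "proxy-b", "tor"];

// Stand-in for the network call: both proxies fail, the Tor route succeeds.
const fakeAttempt = (route) => {
  if (route !== "tor") throw new Error("connection refused");
  return "<html>target page</html>";
};

const result = fetchWithFailover(routes, fakeAttempt);
console.log(result.route, result.failures.length);
```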

The data collection subsystem outputs the responses to the customer requests, which contain the information requested by each customer. As with the requests, the responses are treated as messages of an Apache Kafka topic and are therefore stored in a queue, to be consumed by a second group of consumers. If a message contains a review, the API developed by the collaborating university team, which is based on neural networks, is called automatically. The response of this API is the final classification of the review as “positive” or “negative”, based on the content of the text and various other parameters.
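
The step that attaches a classification to a response can be sketched as below. The interface of the university team's API is not described in the text, so the classifier is passed in as an ordinary function and replaced here by a stub with a fixed answer. The field names review and price are likewise assumptions.

```javascript
// Attach a sentiment label to a collected response when it contains a review.
// `classify` stands in for the external API: any function from text to a label.
function enrich(response, classify) {
  if (typeof response.review !== "string" || response.review.trim() === "") {
    return { ...response, sentiment: null }; // nothing to classify
  }
  const label = classify(response.review);
  if (label !== "positive" && label !== "negative") {
    throw new Error(`Unexpected label: ${label}`);
  }
  return { ...response, sentiment: label };
}

// Stub with a fixed answer, for illustration only.
const stubClassifier = () => "positive";

const withReview = enrich({ clientId: 1, targetId: 10, review: "Lovely stay" }, stubClassifier);
const withoutReview = enrich({ clientId: 1, targetId: 10, price: 80 }, stubClassifier);
console.log(withReview, withoutReview);
```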

After reading a message, the consumer that read it saves it in a table of a PostgreSQL database, together with the API's response for any reviews. This happens only if no error has occurred earlier.

Errors can occur in a number of cases. The most common is that the target has changed the format in which it presents its information, so that it no longer matches the format expected by the model that the data collection system follows.
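
A format mismatch of this kind could be detected by comparing each collected record with the expected model. The following sketch assumes a model of three fields (name, price, rating); the real model is not given in the text.

```javascript
// Hypothetical model of a collected record: field name -> expected type.
const EXPECTED_FIELDS = { name: "string", price: "number", rating: "number" };

// Return the list of mismatches between a collected record and the model.
function findFormatErrors(record) {
  const errors = [];
  for (const [field, type] of Object.entries(EXPECTED_FIELDS)) {
    if (!(field in record)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof record[field] !== type) {
      errors.push(`wrong type for ${field}: expected ${type}`);
    }
  }
  return errors;
}

// A record in the expected format, and one collected after the target changed.
const unchanged = findFormatErrors({ name: "Hotel A", price: 80, rating: 4.5 });
const changed = findFormatErrors({ name: "Hotel A", price: "80 EUR" });
console.log(unchanged, changed);
```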

If an error occurs, the response is therefore not stored in the above queue but in another queue, which corresponds to a separate Apache Kafka topic that holds only the errors detected. So that the data collection system can react immediately and the errors can be resolved, the errors must be analyzed and displayed in aggregate on a console. This eases the work of the data collection system administrator, who has to make the necessary modifications. Error monitoring will be implemented with the Elastic tools Elasticsearch, Logstash and Kibana, because they can handle very large volumes of data in real time and visualize them in dashboards with supporting graphs.
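
The routing between the response queue and the error queue can be sketched as follows. The two arrays are in-memory stand-ins for the Kafka topics, and the topic names, the collect callback and the error message are assumptions for this example.

```javascript
// In-memory stand-ins for the two Kafka topics (hypothetical names).
const topics = { responses: [], errors: [] };

// Run the collector for one request and publish to the matching topic.
function collectAndRoute(request, collect) {
  try {
    const data = collect(request);
    topics.responses.push({ ...request, data });
  } catch (err) {
    // Failed requests go to the error topic, with the reason and a timestamp.
    topics.errors.push({ ...request, error: err.message, at: new Date().toISOString() });
  }
}

// Stand-in collector: target 11 has changed its format, target 10 has not.
const collect = (request) => {
  if (request.targetId === 11) throw new Error("page format changed");
  return { price: 80 };
};

[
  { clientId: 1, targetId: 10 },
  { clientId: 2, targetId: 11 },
].forEach((request) => collectAndRoute(request, collect));
console.log(topics);
```

Keeping errors in their own topic lets a separate pipeline, such as the Elastic tools named in the text, read them without touching the successful responses.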

Finally, independently of the previous subsystems, a RESTful API will be implemented in Node.js (Express). It will connect to the database that contains all the useful information, analyze it, and return only the results that are relevant to the user's request. The corresponding data will then be visualized directly on the management screen of the “BookOnCloud” customer, the owner of the tourist accommodation.
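
The filtering that such an API performs can be sketched as a plain function, independent of any web framework. The stored rows, the query parameters (clientId, sentiment) and the returned fields are assumptions. In an Express application this logic would sit inside a route handler, with the rows read from the database.

```javascript
// Hypothetical rows as they might be stored by the consumers.
const stored = [
  { clientId: 1, targetId: 10, price: 80, sentiment: "positive" },
  { clientId: 1, targetId: 11, price: 95, sentiment: "negative" },
  { clientId: 2, targetId: 10, price: 80, sentiment: "positive" },
];

// Answer a query such as { clientId: "1", sentiment: "negative" }.
function handleQuery(query, rows) {
  const clientId = Number(query.clientId);
  if (!Number.isInteger(clientId)) {
    return { status: 400, body: { error: "clientId is required" } };
  }
  let results = rows.filter((row) => row.clientId === clientId);
  if (query.sentiment !== undefined) {
    results = results.filter((row) => row.sentiment === query.sentiment);
  }
  // Return only what the management screen needs, not the whole row.
  const body = results.map(({ targetId, price, sentiment }) => ({ targetId, price, sentiment }));
  return { status: 200, body };
}

const all = handleQuery({ clientId: "1" }, stored);
const negative = handleQuery({ clientId: "1", sentiment: "negative" }, stored);
const invalid = handleQuery({}, stored);
console.log(all, negative, invalid);
```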