How the Data Ecosystem Really Functions in Analytics
More clarity on the data ecosystem...
As established in our prior article, the data ecosystem is the environment in which data is gathered and analyzed. What that article did not cover, however, is how the ecosystem functions within the data analytics process itself.
So how does the data ecosystem really function when it comes to analytics?
Well, it starts with a process called ‘Sensing’, which is concerned with identifying the data sources for a project. For instance, suppose a business owner wants to study how consumers behave and interact with his products: how does he evaluate his sources, and the quality of the data he can derive from them?
This is exactly the kind of question asked during the Sensing stage of the data ecosystem. At this stage, the accuracy, timeliness and validity of the data are put to the test.
It should be noted that data at this stage can come from internal sources, such as spreadsheets, databases and CRMs, or from external sources, such as websites and third-party providers.
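As a small illustration of pulling from an internal source, here is a minimal sketch that reads a hypothetical CSV export (the column names and values are invented for the example) using only Python's standard library:

```python
import csv
import io

# Hypothetical CSV export from an internal source (e.g. a spreadsheet or CRM dump).
raw = io.StringIO(
    "customer_id,product,purchases\n"
    "101,Widget,3\n"
    "102,Gadget,5\n"
)

# csv.DictReader turns each row into a dict keyed by the header line.
rows = list(csv.DictReader(raw))
print(rows[0]["product"])  # the first customer's product
```

In a real project the `io.StringIO` stand-in would be replaced by an open file or a database query, but the shape of the step is the same: internal data arrives in a structured export that the analyst reads programmatically.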
Interestingly, these internal and external sources, together with the algorithms that act on them, are what make up the data ecosystem.
Once the data has been identified, or sensed, the process moves to the next stage: ‘Collection’. You have identified your data and it is available; now you only have to collect it.
Collection can be done either manually or automatically, depending on the intended scale. Manual collection is impractical for large-scale work, hence the need for programming languages or dedicated applications.
Automatic collection allows the analyst to scrape only the relevant information from within the data ecosystem, leaving the irrelevant behind, because the applications or software involved are designed to extract specific information from a source.
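To make the idea of "extracting only the relevant information" concrete, here is a minimal sketch using Python's built-in `html.parser`. The HTML snippet and the `price` class are invented for the example; a real scraper would fetch the page from a website first:

```python
from html.parser import HTMLParser

# Hypothetical product-page snippet; in practice this HTML would be fetched from a website.
html = '<html><body><span class="price">19.99</span><span class="noise">ad</span></body></html>'

class PriceExtractor(HTMLParser):
    """Collects only the text inside <span class="price"> tags, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data)

parser = PriceExtractor()
parser.feed(html)
print(parser.prices)  # only the relevant field survives; the "noise" span is skipped
```

The point of the sketch is the selectivity: the extractor is written for one specific piece of information, which is exactly why automated collection avoids dragging in irrelevant data.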
Once the data is collected, it still cannot be used in its raw form; it first needs to be converted into a usable format. This stage is often referred to as ‘Data Wrangling’.
Depending on the quality of the data at hand, this stage can involve merging multiple datasets, identifying and filling gaps in the data, and removing inadequate or incorrect records. The datasets established as reliable are then structured for further analysis.
As with the collection stage, wrangling can be performed either manually or automatically.
Data wrangling tools such as DataWrangler, CSVKit and OpenRefine also sit within the data ecosystem.
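The wrangling steps described above, removing incomplete records and merging datasets, can be sketched in a few lines of plain Python. The field names and values here are invented for the example:

```python
# Two hypothetical datasets from different sources, keyed by customer_id.
sales = [
    {"customer_id": 101, "purchases": 3},
    {"customer_id": 102, "purchases": None},  # incomplete record to be removed
    {"customer_id": 103, "purchases": 7},
]
profiles = {101: "US", 102: "UK", 103: "DE"}

# Remove records with missing values...
cleaned = [row for row in sales if row["purchases"] is not None]

# ...then merge in the region field from the second dataset.
merged = [{**row, "region": profiles[row["customer_id"]]} for row in cleaned]
print(merged)
```

Tools like OpenRefine automate the same kind of cleaning at scale, but the logic is the one shown: drop what is incorrect or incomplete, then join the surviving records into one structured dataset.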
Once the data has been sensed, collected and wrangled, the ‘Analysis’ begins.
At this stage, the nature of the project determines whether the analysis will be descriptive, diagnostic, predictive or prescriptive. Each form of analysis is distinct, even when the same processes or tools are applied.
Algorithms and statistical models are used at this stage to investigate and interpret the data and its results.
Then there are data visualization tools, such as Microsoft Power BI and Google Charts, for graphical representation during analysis.
Finally, ‘Storing’ the data is paramount for future reference, and the whole process is best served by keeping it in a secure and accessible place, which brings the data ecosystem full circle.
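As a closing sketch, storing results somewhere queryable can be as simple as Python's built-in `sqlite3`. The in-memory database and the `results` table here are stand-ins; a real project would point at a file or a managed database:

```python
import sqlite3

# ":memory:" stands in for a secure, accessible store;
# a real project would use a file path or a managed database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (customer_id INTEGER, purchases INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(101, 3), (103, 7)],
)
conn.commit()

# Later analyses can simply query the stored results for future reference.
stored = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(stored)
```

Once the results live in a store like this, the next project can begin its own Sensing stage against them, which is what makes the ecosystem a cycle rather than a one-way pipeline.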