What is Data in Machine Learning
DATA: Data is any sequence of one or more symbols given meaning by a specific act of interpretation.
It can be a value, text, sound, picture, or anything else that has not yet been interpreted or analyzed. Data is the most important ingredient for Data Analytics, Machine Learning, and Artificial Intelligence. Without data, no model can be trained, and the modern research and automation built on such models would be of little use. Large companies spend heavily just to gather as much of certain kinds of data as possible.
Example: The videos, audio, and text that we send to other users on Facebook, WhatsApp, and Snapchat can all be considered data.
INFORMATION: Data that has been interpreted and processed so that it now has some meaningful value for its users.
KNOWLEDGE: The combination of inferred information, learning, and insights. It results in awareness or idea building for an individual or an organization.
How do we split data in Machine Learning?
Training Data: The portion of the data that the model learns from during the training phase. It consists of the input data together with the corresponding expected output.
Validation Data: The portion of the data used for frequent assessment of the model during training. It shows how well the model fitted on the training data performs, and it is used to tune the hyperparameters involved. This data plays an important role during the training of the model.
Testing Data: Once the model has been completely trained using the training and validation data, the testing data provides an unbiased evaluation. When we feed in the testing data, the model predicts values without seeing the actual outputs. We then evaluate the model by comparing its predictions with the actual outputs present in the testing data. Everything the model has learned comes only from the data fed in as training data at training time, so the testing data measures how well that learning carries over to unseen examples.
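The three-way split described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `split_dataset`, the 70/15/15 ratios, the seed, and the toy dataset are all assumptions made for the example, not values prescribed by this article.

```python
import random

def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle samples and split them into training, validation, and test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains is held out for testing
    return train, val, test

# Hypothetical dataset: 100 (input, output) pairs where output = 2 * input.
data = [(x, 2 * x) for x in range(100)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

Shuffling before splitting is a common choice so that each subset is a random sample of the whole; for time-ordered data a different strategy would usually be needed.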
Consider an example:
Consider the owner of a big shopping mall. To collect customers' reviews and requirements, he hands out questionnaires covering different sets of questions. The responses are collected in the form of DATA. When evaluating the data, the owner doesn’t turn each page of paper by hand; instead, he accesses the data he wants through software in which the required trained model is embedded. This saves time and makes the work easier. The data is presented through software, calculations, graphs, etc., as convenient, and the inference drawn from this presented data is called Information. So, Data is fundamental for Information.
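The step from data to information in the mall example can be illustrated with a small Python sketch. Note the assumptions: the responses below are made-up sample values, the field names `rating` and `request` are hypothetical, and a plain summary (average and most frequent request) stands in for the trained model mentioned above.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw questionnaire responses (the "data"): one dict per customer.
responses = [
    {"rating": 4, "request": "more parking"},
    {"rating": 2, "request": "longer opening hours"},
    {"rating": 5, "request": "more parking"},
    {"rating": 3, "request": "more parking"},
]

def summarize(responses):
    """Turn raw responses into a small summary (the "information")."""
    ratings = [r["rating"] for r in responses]
    requests = Counter(r["request"] for r in responses)
    top_request, count = requests.most_common(1)[0]
    return {
        "average_rating": mean(ratings),
        "top_request": top_request,
        "top_request_count": count,
    }

summary = summarize(responses)
print(summary)
```

The individual responses say little on their own; the summary is what the owner can actually act on.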
Read : Application of Machine Learning for further info.
Properties of Data:
Volume: The scale of the input and output data processed during the training of the model. With a growing world population and ever wider exposure to technology, a huge amount of data is being generated every millisecond.
Variety: The different forms that data takes, such as healthcare records, images, videos, and audio clips.
Velocity: The rate at which data is being generated and streamed.
Value: The meaningfulness of the data. Data can be categorized as important or unimportant.
Veracity: The certainty and correctness of the data we are working with.
8 interesting facts about big data:
Data in machine learning is big. The following are 8 interesting facts about big data.
- The amount of data will increase 300-fold, to 40 zettabytes (1 ZB = 10^21 bytes), by 2020.
- The healthcare sector alone will hold 200 billion gigabytes of data by 2020.
- Around 400 million tweets are sent worldwide each day by about 200 million active users.
- 4 billion hours of video are streamed by users each month.
- 30 billion pieces of content of different kinds are shared by users every month.
- By 2020 we will have created 35 zettabytes of data, and one third of that data will be stored in or passed through the cloud.
- Annual spending on big data will be worth $48.6 billion by 2020.
- There will be a shortage of talent for managing and controlling data in the near future.
As you can see from the facts above, big data really is big to deal with. So don’t collect data just for the sake of collecting it; collect the data that is useful to you. For this you need a clear plan for how the data will be collected and managed. If you want to collect, store, and report data, you need to put that data to work to get adequate value out of it.
The facts above are just a glimpse of the huge body of data statistics that actually exists. In the real world, the amount of data that already exists and is being generated every moment is beyond our expectations.
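The list of facts mixes gigabytes and zettabytes, so a short conversion helps compare the figures. The sketch below uses 1 ZB = 10^21 bytes as stated in the list; treating a gigabyte as 10^9 bytes (the decimal definition) is an assumption, and the helper name `to_zettabytes` is made up for this example.

```python
ZB = 10**21  # bytes in one zettabyte, as stated in the list of facts
GB = 10**9   # bytes in one gigabyte (decimal definition, assumed here)

def to_zettabytes(num_bytes):
    """Convert a byte count to zettabytes."""
    return num_bytes / ZB

# 200 billion gigabytes (the healthcare figure) expressed in zettabytes.
healthcare_bytes = 200 * 10**9 * GB
healthcare_zb = to_zettabytes(healthcare_bytes)

# 40 zettabytes (the overall figure) expressed in gigabytes.
total_gb = 40 * ZB // GB

print(healthcare_zb, total_gb)
```

Under these assumptions, the healthcare figure comes to 0.2 ZB, while 40 ZB corresponds to 40 trillion gigabytes.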