Tuesday 29 June 2021

Machine learning datasets and repositories

 It is always a good idea to practice and learn with real datasets. Many websites publish open datasets for training different models, and a few websites, called meta portals, list open data repositories. Below are links to such repositories. Make sure to check the data size before downloading: some datasets are very large and will take a considerable amount of time and bandwidth to download.

Data repositories:

  • https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
  • http://archive.ics.uci.edu/ml/index.php
  • https://registry.opendata.aws/
  • https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
  • https://www.kaggle.com/datasets
  • http://dataportals.org/
  • https://opendatamonitor.eu/frontend/web/index.php?r=dashboard%2Findex
  • https://www.quandl.com/
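
The advice above about checking the data size before downloading can be sketched in Python. This is only a rough illustration: the helper names `human_size` and `remote_size` are made up for this post, and it assumes the server answers a HEAD request with a Content-Length header, which not every repository does.

```python
import urllib.request


def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


def remote_size(url, timeout=10.0):
    """Ask the server for the file size without downloading the file.

    Returns the Content-Length in bytes, or None if the server does not report it.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


# Hypothetical usage (the URL is a placeholder, not a real dataset):
#   size = remote_size("https://example.com/some-dataset.csv")
#   print("unknown size" if size is None else human_size(size))
```

If the size comes back as None, the repository's own download page is usually the next place to look.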

Sunday 27 June 2021

Types of Machine Learning Systems

There are different categories of Machine Learning systems. They are mostly classified by the following criteria:

  • Whether they are trained with human supervision, for example supervised, unsupervised, semi-supervised and reinforcement learning.



  • Whether they can learn incrementally on the fly, for example online versus batch learning.


  • Whether they are instance-based or model-based.
An instance-based system learns the examples and compares new inputs with the learned examples. For example, a spam filter learns the emails marked as spam and then flags incoming emails by comparing them with the learned spam emails.
A model-based system builds a model from a set of examples and then uses the model to make predictions. For example, a housing price prediction system can build a model from a set of features and then use that model to predict prices for new houses.
  • Hybrid systems that combine the approaches above. For example, a spam filter can learn on the fly using a neural network model, making it an online, model-based, supervised learning system.
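
The difference between instance-based and model-based learning can be sketched with a toy version of the housing example. The numbers are made up for illustration, and nearest-neighbour lookup and a least-squares line are just one simple choice for each approach.

```python
# Toy data (made up): house size in square metres -> price in thousands.
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150.0, 210.0, 270.0, 330.0]


def predict_instance_based(x, k=1):
    """Instance-based: keep all examples and answer from the k most similar ones."""
    nearest = sorted(zip(sizes, prices), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(price for _, price in nearest) / len(nearest)


def fit_line(xs, ys):
    """Model-based: compress the examples into a slope and an intercept (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(sizes, prices)


def predict_model_based(x):
    """Predict from the fitted line; the training examples are no longer needed."""
    return slope * x + intercept


print(predict_instance_based(95.0))  # price of the closest known house (90 m2): 270.0
print(predict_model_based(95.0))     # price read off the fitted line: 285.0
```

The instance-based version has to keep every example around, while the model-based version only needs the two fitted numbers once training is done.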




Machine Learning landscape.

 What is Machine Learning?

Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.  (Arthur Samuel, 1959)


A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.   (Tom Mitchell, 1997) 

 

   The second definition, by Tom Mitchell, is more technical and helps to understand what Machine Learning does. Consider the example of a spam email filter. A spam filter analyzes emails and classifies each one as either spam or not spam. The examples that the spam filter uses to train the model are called the "training set". Each training example is called a "training instance" or "sample". In this example the task T is to flag new emails as spam, the experience E is the training data, and the performance measure P could be the ratio of correctly classified emails.
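
The performance measure P from this example, the ratio of correctly classified emails, takes only a few lines of Python. The labels below are made up, and the function name `accuracy` is simply the usual name for this ratio.

```python
def accuracy(predicted, actual):
    """Performance measure P: fraction of emails whose predicted label matches the true one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one email to measure performance")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# True means "spam". Made-up labels for five emails.
actual = [True, False, True, False, False]
predicted = [True, False, False, False, True]

# 3 of the 5 predictions match the true labels, so this prints 0.6.
print(accuracy(predicted, actual))
```

In Mitchell's terms, the system is learning if this number goes up as more training data (experience E) is added.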

In a traditional application we would have a long list of if-else-if conditions, or some kind of rule engine, running on a list of phrases like "free", "amazing", "award", "4U" etc. Our program scans the content of the email and matches it against these phrases. If any of them is present in the email, the program flags the email as spam. 
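
A minimal sketch of such a hand-written rule, assuming the phrase list above and simple word matching. The function name `is_spam_by_rules` is made up for this post; a real rule engine would be far more elaborate.

```python
import re

# Phrases taken from the list above, lower-cased for matching.
SPAM_PHRASES = ("free", "amazing", "award", "4u")


def is_spam_by_rules(email_text):
    """Flag the email if any hand-picked phrase appears as a word in it."""
    words = set(re.findall(r"[a-z0-9]+", email_text.lower()))
    return any(phrase in words for phrase in SPAM_PHRASES)


print(is_spam_by_rules("Special deal 4U"))       # True: "4u" is on the list
print(is_spam_by_rules("Meeting moved to 3pm"))  # False: no listed phrase
print(is_spam_by_rules("Special deal for U"))    # False: the reworded phrase is not on the list
```

Every phrase the filter knows about has to be typed into `SPAM_PHRASES` by hand.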

Consider a scenario where a spammer notices that emails containing the phrase "4U" are being marked as spam and changes the phrase to "for U". To fix this we need to revisit our application and update the rules to include the phrase "for U" as well. This is a very simple scenario; more complicated ones could lead to frequent releases of new rules or frequent updates of the application code.

If our spam filter is built using Machine Learning, it automatically learns which words and phrases indicate spam by detecting patterns of words. If a spammer changes the phrase "4U" to "for U", the spam filter will pick up the new pattern automatically by analyzing user feedback. 
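
To make this concrete, here is a deliberately tiny learner that scores words by how often they appear in spam versus normal emails. It is a toy sketch, not how production spam filters work: the class name `WordCountSpamFilter`, the scoring formula and all the training emails are made up for illustration.

```python
import re
from collections import Counter


class WordCountSpamFilter:
    """Toy learner: a word looks spammy if it shows up more often in spam than in normal mail."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    @staticmethod
    def _tokens(text):
        # Each distinct word counts once per email.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def learn(self, text, is_spam):
        """Update the word counts from one labelled email (for example, user feedback)."""
        counts = self.spam_counts if is_spam else self.ham_counts
        counts.update(self._tokens(text))

    def word_score(self, word):
        """Smoothed share of the emails containing this word that were spam."""
        spam = self.spam_counts[word]
        ham = self.ham_counts[word]
        return (spam + 1) / (spam + ham + 2)

    def is_spam(self, text):
        """Flag the email if the average score of its known words is above 0.5."""
        known = [w for w in self._tokens(text)
                 if w in self.spam_counts or w in self.ham_counts]
        if not known:
            return False
        return sum(self.word_score(w) for w in known) / len(known) > 0.5

    def top_spam_words(self, n=3):
        """Inspect what was learned: the n words with the highest spam score."""
        vocabulary = set(self.spam_counts) | set(self.ham_counts)
        return sorted(vocabulary, key=lambda w: (-self.word_score(w), w))[:n]


spam_filter = WordCountSpamFilter()
for text in ("free award 4u", "amazing prize 4u"):
    spam_filter.learn(text, is_spam=True)
for text in ("meeting notes for monday", "lunch on monday", "agenda for friday"):
    spam_filter.learn(text, is_spam=False)

before = spam_filter.is_spam("big award for u")  # False: "for u" has not been seen in spam yet

# Users flag two emails that use the new wording; no rule is edited by hand.
spam_filter.learn("claim your prize for u", is_spam=True)
spam_filter.learn("free gift for u", is_spam=True)

after = spam_filter.is_spam("big award for u")   # True: "u" now carries a high spam score
print(before, after)
print(spam_filter.top_spam_words(4))
```

The last line lists the words the filter currently considers most spam-like, which is one simple way to look inside what the system has learned.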

Owners of Machine Learning systems can inspect them to see what they have learned so far. Our spam email filter, for example, can be inspected to view the list of words or combinations of words that it believes to be spam. This helps in understanding the problem better. Using Machine Learning techniques to dig into huge amounts of data and identify patterns is called Data Mining. 



Applications of Machine learning

  • Auto classification of products on a production line. (Convolutional Neural Network)
  • Summarizing long documents. (Natural Language Processing)
  • Revenue forecasting based on performance metrics. (Linear Regression, SVM, Artificial Neural Network)
  • Flagging comments in real time. (Natural Language Processing)
  • Classifying pages, emails or news. (Natural Language Processing)
  • Detecting tumors. (Convolutional Neural Network)
  • Making an application react to voice commands.
  • Detecting fraud in financial institutions, such as credit card fraud. (Anomaly Detection)
  • Product recommendation based on past purchases. (Artificial Neural Network)


Summary

Machine Learning is used for:

  1. Rule-based applications. For applications that would otherwise require a long list of rules, a Machine Learning algorithm is often simpler and performs better.
  2. Applications that continuously evaluate new data (fluctuating data).  
  3. Getting insights into complex problems. 

Saturday 26 June 2021

AI and ML: a 10,000-foot overview.

What is Artificial Intelligence? 

AI is the simulation of human intelligence in machines; in other words, machines can be programmed to think like humans and to mimic human actions. A machine showing traits of AI makes rational decisions based on logic or on learning from past experience. Machine Learning (ML) is a subset of Artificial Intelligence: a technique that enables a computer to improve at a task by using the results of previous executions. Deep Learning is a Machine Learning technique that enables automatic learning from huge amounts of data. This data is mostly unstructured, such as text, images or video.

AI can be divided into two categories: weak and strong. Weak AI applications carry out one specific job, for example Amazon's Alexa, Apple's Siri or other voice assistants. Strong AI, on the other hand, performs more complex tasks. Self-driving cars are sometimes given as an example of strong AI. Strong AI is sometimes called full AI or Artificial General Intelligence (AGI). A machine with strong AI would demonstrate self-awareness and consciousness. Achieving AGI is a long-term goal for AI. 

Artificial Intelligence timelines.

In 1956, AI became an academic discipline. It has seen several waves of optimism and disappointment since then. A period when AI research and development almost stopped is called an AI winter. There were two major winters, in 1974-1980 and 1987-1993.

A more detailed timeline of AI can be found here.

Artificial Intelligence Landscape.

Purpose of this blog

Recently I became interested in AI and Machine Learning. The first step I took was to go through the Machine Learning course on Coursera. As I move ahead and learn more about AI and ML, I will continue to share what I learn in this blog. I think this will be helpful for anyone who wants to learn and experiment with AI and ML.


Statistical Learnings

Statistics is the study and manipulation of data, including ways to gather, analyze and draw conclusions from data. Statistical learning, ak...