Blockchain

Blockchain is a distributed ledger used to record transactions of a digital asset such as Bitcoin. It has been one of the most talked-about technologies of the 2010s and remains popular for its tamper-resistant design, even after Bitcoin became notorious for its price swings.

Blockchain became a mainstream term around 2014, with applications being explored across several industries – storing data about property exchanges, stops in a supply chain, and even votes for a candidate. I have been following blockchain for many years and finally got some hands-on experience with Ethereum while completing a Pluralsight course.

In this blog, I have listed key blockchain terms and concepts with brief descriptions, and also summarised the development environment I used:

  • Blockchain is a chain of digital data blocks, with each one containing:
    • Information about transactions such as date, time, amount, price, etc.
    • A unique code called a “hash” that distinguishes one block from another, generated by a hashing process.
  • Hashing: an algorithm performed on data to produce an output that can be used to verify that the data has not been modified, tampered with or corrupted (a toy Python sketch follows this list):
    • Length of output always the same.
    • One-way: the hash cannot be converted back into the original key.
    • Digital fingerprint that allows verifying consistency.
  • Obfuscation and Encryption provide data security to blockchain.
  • Key features of Blockchain: Immutable, decentralized, verifiable, increased capacity, better security, faster settlement.
  • Blockchain can be public (like Ethereum) or private (like R3 Corda).
  • Top implementations: Ethereum, Hyperledger Fabric, Ripple, Quorum, R3 Corda.
  • Decentralized Applications (DApps) architecture using Ethereum: the blockchain encapsulates shared data and the logic related to transactions, while the client is responsible for the interface, user credentials and private data.
  • Payment for transactions is made through “gas”.
  • Transactions contain information on recipient, signature, value, gasprice, startgas and message.
  • A consensus mechanism is required to confirm transactions that take place on a blockchain without the need for a third party. Proof of Work and Proof of Stake are the two models currently available to achieve this.
  • Proof of Work is based on cryptography, which is why digital coins like Bitcoin and Ethereum are called cryptocurrencies. Miners compete to solve cryptographic puzzles so difficult that only powerful computers can solve them; each puzzle is unique, so once it is solved the network knows the transactions are authentic. Although Proof of Work is an ingenious invention, it consumes significant amounts of electricity and is very limited in the number of transactions it can process at a time.
  • With Proof of Stake, validators can mine or validate block transactions based on the amount of the currency they hold (their stake). Instead of spending energy on Proof of Work puzzles, a validator is limited to validating a percentage of transactions proportional to their ownership stake.
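To make the hashing and chaining ideas concrete, here is a toy Python sketch. It is not how Bitcoin or Ethereum actually serialize blocks; the block fields, difficulty value and transactions are made up for illustration:

```python
import hashlib
import json
import time

def sha256_hash(data: dict) -> str:
    """Hash a dictionary deterministically; the same input always gives the same 64-character digest."""
    encoded = json.dumps(data, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def make_block(transactions, previous_hash, difficulty=4):
    """Build a block and brute-force a nonce until its hash starts with `difficulty` zeros (toy proof of work)."""
    block = {
        "timestamp": time.time(),
        "transactions": transactions,
        "previous_hash": previous_hash,
        "nonce": 0,
    }
    while True:
        candidate = sha256_hash(block)
        if candidate.startswith("0" * difficulty):
            return block, candidate
        block["nonce"] += 1

# Chain two blocks: each block stores the hash of the previous one,
# so tampering with block 1 invalidates block 2's stored previous_hash.
genesis, genesis_hash = make_block([{"from": "alice", "to": "bob", "amount": 5}], previous_hash="0" * 64)
block2, block2_hash = make_block([{"from": "bob", "to": "carol", "amount": 2}], previous_hash=genesis_hash)
print(genesis_hash, block2_hash)
```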

Development environment:

Ethereum uses a programming language called Solidity, which is an object-oriented language for writing smart contracts. Solidity plugins are available for popular IDEs. I used the following setup:

  • IDE – Eclipse with YAKINDU-Solidity Tools / Visual Studio Code
  • Node.js Package Manager – npm
  • Windows Package Manager – Chocolatey (installed from PowerShell as administrator)
  • Node.js Windows Build Tool
  • Local Ethereum test server and emulator – Ganache / TestRPC (see the connection sketch after this list)
  • Dev & Testing framework – Truffle Suite
  • Crypto wallet & gateway – Metamask
  • My practice code – https://github.com/gsanth/experiment/tree/master/ethereum_experiment
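The course exercises themselves were written in Solidity and deployed with Truffle, but as a rough illustration of how a client talks to a local Ganache node, here is a Python sketch using the web3.py library (assuming web3.py v6; the port and account indices are assumptions, and the Ganache default port differs between the CLI and the GUI):

```python
# A rough sketch of talking to a local Ganache test chain from Python with web3.py v6.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # ganache-cli default; the GUI often uses 7545
print("Connected:", w3.is_connected())

# Ganache pre-funds ten test accounts; move 1 ether between the first two.
sender, receiver = w3.eth.accounts[0], w3.eth.accounts[1]
tx_hash = w3.eth.send_transaction({
    "from": sender,
    "to": receiver,
    "value": w3.to_wei(1, "ether"),   # transaction value is denominated in wei
})
receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
print("Gas used:", receipt.gasUsed)
```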

To summarize, Blockchain’s potential as a decentralized form of record-keeping is enormous. From greater user privacy and heightened security to lower processing fees and fewer errors, blockchain technology will continue to see applications across several industries. However, it is easy to get carried away by blockchain’s apparent potential and get into technology overkill. So, it is important to understand the pros and cons of blockchain, and ensure we leverage it for the right use cases.

| Pros | Cons |
| --- | --- |
| Decentralized (difficult to tamper) | Technology cost for mining |
| Improved accuracy (no manual effort) | Low transaction throughput |
| Reduced cost (no third-party charges) | Poor reputation (illicit activities) |
| Transparent technology | |

AWS Certified Solutions Architect – Associate

In today’s rapidly changing technology landscape, staying relevant as a software engineer hinges on one’s ability to continuously learn and master emerging technologies. To this end, structured technology courses and certifications help lay a solid foundation that can lead to eventual mastery through real-life, hands-on experience. I target one technology certification every year; I finished the Stanford Machine Learning certification last year and decided to pursue a cloud certification in 2020.

My goal was to complete a comprehensive cloud learning path, and I had to choose among Google Cloud Platform, Microsoft Azure and Amazon Web Services. All of them offer similar services, so learning one automatically builds understanding of the others. I decided to pursue AWS certification as it is the clear industry leader, with almost one third of the global cloud market. There are about a dozen AWS certifications available, and my choice was Solutions Architect – Associate, the most popular one as it covers the breadth of the AWS offering.

Magic Quadrant for Cloud Infrastructure as a Service, Worldwide (2020)

Given my existing familiarity and understanding of cloud computing, I had a bit of a head start and was able to successfully complete my certification in about a month. This blog summarizes my experience and learnings through this journey.

AWS certifications require serious preparation, which usually starts with identifying a MOOC platform for access to learning material. I had used Coursera for the ML certification last year as it came bundled with a Stanford certificate. Since this certification is provided directly by AWS, I was free to choose any learning platform, and I found Pluralsight more extensive and flexible for my needs.

This was my first time using Pluralsight and I was thoroughly impressed with the learning experience. I typically used YouTube for quick reference on new technology topics, but the video lectures on Pluralsight take the learning experience to a completely different level. The one minor drawback is hands-on practice: while Coursera offered hands-on exercises, I had to independently create an AWS Free Tier account and practice through the AWS Management Console and CLI alongside the Pluralsight courses. Being a techie, this was perfectly fine with me and I enjoyed this experience as well.
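Small scripted checks against a Free Tier account are another way to practice. Here is a minimal boto3 sketch, assuming AWS credentials are already configured locally (for example via `aws configure`); the results depend entirely on what exists in the account:

```python
# Minimal boto3 sketch for Free Tier hands-on practice; assumes credentials
# are already configured locally (e.g. via `aws configure`).
import boto3

s3 = boto3.client("s3")
response = s3.list_buckets()                     # list the S3 buckets in the account
print([b["Name"] for b in response["Buckets"]])

ec2 = boto3.client("ec2", region_name="ap-south-1")
regions = ec2.describe_regions()["Regions"]      # enumerate the available AWS regions
print(sorted(r["RegionName"] for r in regions))
```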

After completing the first Pluralsight learning path and some hands-on practice, I attempted the sample questions on the AWS site and felt my preparation was insufficient. So, I finished a few more relevant Pluralsight courses and the digital training on the AWS site, and read through whitepapers and FAQs before attempting the practice exams at Kaplan (offered free with Pluralsight).

To summarize, I leveraged the following resources:

AWS is vast and a solutions architect is expected to understand all its offerings. Developing deep understanding across the foundational service domains of compute, networking, storage and databases, along with other key topics like security, analytics, application integration, containerization and cloud-native solutions, made it an awesome learning experience.

AWS Services: Knowledge Areas for Solution Architect – Associate Certification are in bold

After about 3 weeks of preparation, I scheduled my exam at https://www.aws.training/Certification. AWS certification exams used to be taken at an exam center in a controlled environment, but with the COVID situation, a proctored online exam option was also available. I chose Pearson VUE and scheduled a Sunday morning slot. All the caveats called out for the proctored online exam are a bit scary, particularly the possibility of losing internet or power in the middle of the 130-minute exam (quite common in India during monsoons). Fortunately, my internet connection remained stable despite heavy rain during the exam, and I passed. So, here I am – AWS Certified Solutions Architect – Associate!

Zero To One

Information and Communication Technology has been the primary driver of innovation and engineering advances during the last four decades – to such an extent that the word technology today often refers to these fields, even though several other engineering disciplines continue to thrive. I am fortunate to have started my professional career in information technology and have enjoyed being part of it for more than twenty years. Unlike other domains that emerged as hotspots since the industrial revolution 200 years ago, information technology has an extremely low entry barrier, which allowed passionate technologists to launch their enterprises from garages. Combine this with the success of the venture capital industry from the 1970s, and start-ups have been the primary source of innovation in the technology industry since the advent of personal computing with the Intel 8080 processor.

While I have not worked for a start-up so far, I strongly believe that start-up lessons can help enterprises improve their ability to succeed when creating new products. In this blog, I will share my notes from “Zero to One”, one of the best books on start-up philosophy, written by Peter Thiel, a successful entrepreneur himself.

“Zero to One” has an explanation for why most of the new technology trends of the last twenty years have come from start-ups. From the Founding Fathers in politics to the Royal Society in science to Fairchild Semiconductor’s “traitorous eight” in business, small groups of people bound together by a sense of mission have changed the world for the better. The easiest explanation for this is negative: it’s hard to develop new things in big organizations, and it’s even harder to do it by yourself. Bureaucratic hierarchies move slowly, and entrenched interests shy away from risk. In the most dysfunctional organizations, signaling that work is being done becomes a better strategy for career advancement than actually doing work. At the other extreme, a lone genius might create a classic work of art or literature, but could never create an entire industry. Startups operate on the principle that you need to work with other people to get stuff done, but you also need to stay small enough so that you actually can. Clayton Christensen provides a similar explanation in “The Innovator’s Dilemma” through the concept of “disruptive innovation” and how most companies miss out on new waves of innovation. Does this mean big companies cannot develop new things? They can, as long as they let the teams building new things operate like a start-up, without burdening them with bureaucracy and creativity-sapping processes.

Peter Thiel suggests that we must abandon the following four dogmas created after dot-com crash that still guide start-up business thinking today:

  1. Make incremental advances: Small increments using agile methods have far better chances of success today than the waterfall world.
  2. Stay lean and flexible: Avoid a massive plan-and-execute model. Instead, iterative development helps you stay nimble and deliver through meaningful experimentation.
  3. Improve on competition: New things are invariably improvements on recognizable products already offered by successful competitors.
  4. Focus on product, not sales: Technology is primarily about product development, not distribution.

But having seen a number of projects in large enterprises, I would say it is better to stick to these principles by default and make exceptions only for compelling reasons.

When it comes to creating new software products for a market, don’t build an undifferentiated commodity business but one that is so good at what it does that no other product can offer a close substitute. Google is a good example of a company that went from zero to one: it distanced itself from Microsoft and Yahoo almost 20 years ago and became a monopoly. While monopolies sound draconian, the companies that get to the top create a monopoly based on a unique value proposition they offer in their markets. So, don’t build new things unless there is a desire and a plan to capture significant market share, if not a monopoly. Every monopoly is unique, but they usually share some combination of the following characteristics:

  • Proprietary technology
  • Network effects
  • Economies of scale
  • Branding

Another interesting observation is around secrets: most people act as if there were no secrets left to find. With advances in maths, science and technology, we know a lot more about the universe than previous generations, but there are still numerous unknowns waiting to be conquered. It helps to be conscious of the four social trends that have conspired to root out belief in secrets:

  1. Incrementalism: From an early age, we are taught that the right way to do things is to proceed one very small step at a time, day by day, grade by grade. However, unlocking secrets requires us to be brutally focused on the ultimate goal rather than staying satisfied with interim milestones.
  2. Risk Aversion: People are scared of secrets because they are scared of being wrong. If your goal is to never make a mistake in life, you shouldn’t look for secrets. And remember, you can’t create something new and impactful without making mistakes.
  3. Complacency: Getting into a top institute or corporation is viewed as an achievement in itself – nothing more to worry about, you are set for life. This leads to complacency, and the fire to unlock secrets goes out.
  4. Flatness: As globalization advances, people perceive the world as one homogeneous, highly competitive marketplace and assume that someone else would already have found any remaining secrets.

To summarise, when a start-up or an enterprise decides to create a new product, it should resist the temptation to build a commodity one. It should be a product with clear differentiation that will help create a monopoly, or at a minimum significant market share. This can happen only through hard work and the dedication to unlock some secrets.

There are many more learnings in the book, but I have mentioned only the key ones that can help us introspect and stay focused on our goals to create new products.

AI / ML in enterprises: Technology Platform

As an organization embarks on leveraging AI / ML at enterprise scale, it is important to establish a flexible technology platform that caters to the different needs of data scientists and the engineering teams supporting them. Technology platform here includes the hardware architecture and software frameworks that allow ML algorithms to run at scale.

Before getting into the software stack used directly by data scientists, let’s understand the hardware and software components required to enable machine learning.

  • Hardware layer: x86-based servers (typically Intel) with acceleration using GPUs (typically NVIDIA)
  • Operating system: Linux (typically Red Hat)
  • Enterprise Data Lake (EDL): a Hadoop-based repository like Cloudera or MapR, along with supporting stacks for data processing:
    • Batch ingestion & processing: example – Apache Spark (see the sketch after this list)
    • Stream ingestion & processing: example – Apache Spark Streaming
    • Serving: example – Apache Drill
    • Search & browsing: example – Splunk
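As an illustration of the batch layer, here is a minimal PySpark sketch; the file path and column names are made up, and a real pipeline would read from the data lake's landing zone:

```python
# A minimal batch-processing sketch with PySpark (file path and column names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("edl-batch-example").getOrCreate()

# Ingest a raw CSV landed in the data lake and compute a simple daily aggregate.
raw = spark.read.csv("/datalake/raw/transactions.csv", header=True, inferSchema=True)
daily = (raw.groupBy("trade_date")
            .agg(F.sum("amount").alias("total_amount"),
                 F.count("*").alias("txn_count")))
daily.write.mode("overwrite").parquet("/datalake/curated/daily_transactions")
```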

Once the necessary hardware and data platforms are set up, the focus shifts to providing an effective end-user computing experience to data scientists:

  • Notebook frameworks for data manipulation and visualization: Jupyter Notebooks or Apache Zeppelin, which support the most commonly used programming languages for ML, like Python and R (a notebook-style sketch follows this list).
  • Data collection & visualization: like Elastic Stack and Tableau.
  • An integrated, application- and data-optimized platform like IBM Spectrum makes it simple for enterprises by addressing all the needs listed above (components include an enterprise grid orchestrator along with a notebook framework and the Elastic Stack).
  • Machine learning platforms: specialized platforms like DataRobot, H2O, etc. simplify the ML development lifecycle and let data scientists and engineers focus on creating business value.
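The kind of exploration a data scientist might run in such a notebook looks roughly like this; the file name and columns are hypothetical:

```python
# Notebook-style exploration of a prepared feature set (file name and columns are made up).
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_parquet("daily_transactions.parquet")   # feature set prepared by data engineering
print(df.describe())                                 # quick statistical profile of the data

df.plot(x="trade_date", y="total_amount", kind="line", title="Daily transaction value")
plt.show()
```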

There are numerous other popular platforms like TensorFlow, Anaconda and RStudio, and evergreen ones like IBM SPSS and MATLAB. Given the number of options available, particularly open source ones, attempting a comprehensive list would be difficult. My objective is to capture the high-level components required as part of the technology platform for an enterprise to get started with AI / ML development.

AI / ML in enterprises: Lifecycle & Departments

Many start-ups are built on AI / ML competence and require this expertise across the organization. In established enterprises, AI / ML is fast becoming pervasive given the disruption from start-ups and rising customer expectations. Depending on the size and the level of regulation in their respective industries, machine learning activities might be embedded within existing technology teams, or dedicated “horizontal” teams might be responsible for them.

ML activities that people readily recognize are the ones performed by data scientists, data engineers and the like. However, other business and technology teams are also essential to enable ML development. Given the potential bias and ethics implications of business decisions made by AI / ML, governance to ensure risk and regulatory compliance will be required too. In this blog, I will cover the AI / ML lifecycle along with the functions and departments in an enterprise that are critical for successful ML adoption.

AI Model inventory: There is an increasing regulatory expectation that organizations should be aware of all AI / ML models used across the enterprise to effectively manage risks. This McKinsey article provides an overview of the risk management expected in the banking industry. As an organization sets up its AI / ML development process, a good starting point is to define what constitutes an AI model, ensure a common understanding across the organization, and create a comprehensive inventory.

Intake and prioritization: To avoid indiscriminate and inappropriate AI / ML development and use, it is important that any such development goes through an intake process that evaluates risk, regulatory considerations and return on investment. It is good practice to define certain organization-wide expectations and preferable to federate the responsibility for agility.

Data Management: Once an AI model is approved for development, business and technology teams work together to identify the required data, secure it from different data sources across the organization and convert it into a feature set for model development.

  • Data Administrators manage the various data sources, which are typically data lakes (Apache Hadoop implementations), warehouses (like Teradata) or RDBMSs (like SQL Server / Oracle).
  • Data Engineers help with data preparation, wrangling, munging and feature engineering using a variety of tools (like Talend) and make the feature set available for model development.

Model Development: Data Scientists use AI / ML platforms (like Anaconda, H2O or Jupyter) to develop AI models. While model development is federated in most enterprises, AI / ML governance requires them to adhere to defined risk and regulatory guidelines.

Model Validation: An Enterprise Risk team usually validates models before production use, particularly for ones that are external facing and deemed high risk.

Deployment & Monitoring: Technology team packages approved models with necessary controls and integrated into appropriate business systems and monitors for stability and resilience.

Enterprises strive to automate the entire lifecycle so that the focus can be on adding business value effectively and efficiently. Open source platforms like Airflow, MLflow and Kubeflow help automate orchestration and provide seamless end-to-end integration for all teams across the AI / ML lifecycle. A skeletal example of such orchestration follows.
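A skeletal Airflow 2.x DAG for the lifecycle above might look like this; the Python callables are placeholders for the real data-preparation, training, validation and deployment steps:

```python
# Skeletal Airflow 2.x DAG mirroring the lifecycle stages; task bodies are placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def prepare_features(): ...   # data management / feature engineering
def train_model(): ...        # model development
def validate_model(): ...     # model validation
def deploy_model(): ...       # deployment & monitoring hooks

with DAG(dag_id="ml_lifecycle",
         start_date=datetime(2021, 1, 1),
         schedule_interval="@weekly",
         catchup=False) as dag:

    prep = PythonOperator(task_id="prepare_features", python_callable=prepare_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    validate = PythonOperator(task_id="validate_model", python_callable=validate_model)
    deploy = PythonOperator(task_id="deploy_model", python_callable=deploy_model)

    prep >> train >> validate >> deploy   # orchestration order mirrors the lifecycle stages
```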

AI / ML in enterprises: Challenges

Organizations need to keep up with the times for long-term sustenance, and with AI / ML becoming pervasive across business domains, every firm nowadays has teams trying to leverage machine learning algorithms to stay competitive.

In this blog, I will cover the top five challenges that they encounter after initial euphoria with proof of concepts (POC) and pilots.

  1. Lack of understanding: AI / ML has the potential to transform technology and business processes across the organization and to create new revenue streams, mitigate risks or save costs. However, AI / ML is not a substitute for subject matter expertise. A discussion among novices will throw up a million possibilities, and ML can appear to be an appropriate solution to all the world’s problems. While machine learning is based on the ability of machines to learn by themselves, training the algorithms with appropriate data is an important aspect that can be done only by experts. To generate meaningful results, data scientists need to work in unison with business and technology professionals: data scientists bring deep understanding of ML algorithms, business professionals identify meaningful features, and data engineers help secure the data from different sources that will eventually become the feature set. Good understanding of ML across the organization is therefore essential to identify the right problems to solve, and the lack of it is the most important challenge preventing enterprises from deriving benefit despite investment.
  2. Lack of IT infrastructure: As I mentioned in my original ML post, machine learning came to prominence due to significant advances in processing power and data storage. Enterprises can acquire the required compute power through cloud providers, and many organizations also choose to build their own parallel processing infrastructure. The decision to leverage cloud vs. building internal infrastructure depends on a number of factors like regulations, scale and, most importantly, cost. Either way, without this investment, ML programs will not go far. Some organizations invest in the requisite hardware but fail to provide the software and database platforms that data scientists and technologists need to leverage this infrastructure. Most machine and deep learning platforms and tools used for development are open source; however, this cost advantage is offset by the bewildering number of options available for ML development, and there is no one-size-fits-all solution. To summarize, the second challenge is to create the powerful IT infrastructure required for ML development and deployment.
  3. Lack of Data: With good understanding and infrastructure, this challenge should be addressable, but data is so foundational for ML that I have listed it separately to call out the nuances. Data should be available in sufficient quantity and of good quality for meaningful results. Data preparation is an important step – wrangling, munging, feature scaling, mean normalization, labeling and creating an appropriate feature set are essential disciplines. It is a challenge to identify problems that have the requisite data at scale and to prepare this data for machine learning algorithms to work on.
  4. Lack of talent: Going by the number of machine learning projects that fail to meet their purpose, the ability of existing teams across enterprises is questionable. Any technology is only as good as the people working on it. A few technologies have managed to simplify the work expected from programmers (drag-and-drop or configuration driven); machine learning, however, still requires deep math skills and a thorough understanding of algorithms, so finding suitable talent is particularly difficult.
  5. Regulations & policies: In a diverse world with myriad regional nuances, decisions made by machines tend to undergo a lot more scrutiny than ones made by humans. Our societies are still in paranoia of machines taking over humans and governments all over the world have regulations that require proof of decisions made by machines to be fair and without bias. This challenge is made more complex by interpreters of regulations inside an enterprise who place unnecessary controls that might not address the regulation but impede ML development. So, it is important for policies to address regulatory concerns without derailing ML development.

Finally, enterprises are riddled with politics, and all of the above challenges can be addressed only when business, technology and other supporting functions work together seamlessly. Start-ups and relatively new technology organizations keep things simple and are more adept at solving these challenges. Large enterprises that have added layers of internal complexity over the years naturally find it harder to overcome differences and solve the same challenges.

AI / ML in enterprises: Relevance

Let’s explore two aspects that provide insight into AI / ML relevance within technology and across enterprises: first, the roles in technology that need to work on machine learning algorithms, and second, the areas within an enterprise that will benefit the most from AI / ML.

It is an incorrect assumption that all software engineers will work only on ML algorithms in the future and that demand for other skills will plummet. In fact, the majority of current software engineering roles that do not require machine learning expertise will continue to exist in the future.

Software engineering functions that DO NOT require machine learning expertise: UI / UX development, interface / API development, rule-based programming and several other client- and server-side components that require structured or object-oriented programming. In addition, there are others like database development and SDLC functions that are required for the AI / ML technology lifecycle but don’t require deep machine learning knowledge. That leaves only the data / feature engineering, data science and model deployment teams that absolutely require machine learning expertise. However, these are rapidly growing areas and demand for experts will continue to outpace many other areas.

Where can we leverage ML? Any use case where historical data can be used for making decisions, but the data is so extensive that it is practically impossible for a human to analyze it comprehensively and generate holistic insights, is a candidate for ML. The ML approach is to leverage human subject matter expertise to source relevant data, determine the right data elements (features), select an appropriate ML model and train the model to make predictions and propose decisions. A few examples:

  • Sales & Marketing: Use data around customer behavior and make recommendations. We see this all the time from Amazon, You Tube, Netflix and other technology platforms.
  • IT Operations: Use a variety of features to predict potential failures or outages and alert users / ops.
  • Customer Service: Chatbots that use natural language processing to answer user queries.
  • Intelligent Process Automation: Eliminate manual operations thereby optimizing labor costs and reducing operational risk.
  • Cyber Security: Detect malicious activity and stop attacks.
  • Anomaly detection: Every business domain needs to beware of anomalies, and detecting them reduces losses or accidents – spotting defaults, money laundering or fraud for banks, a leak in a chemical plant, a traffic violator, and so on (a minimal sketch follows this list).
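As a flavour of the anomaly-detection case, here is a minimal scikit-learn sketch on synthetic data; in an enterprise setting the feature matrix would come from the prepared feature set:

```python
# Anomaly-detection sketch with scikit-learn's IsolationForest on synthetic data.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))   # typical transactions
outliers = rng.uniform(low=6, high=8, size=(10, 2))       # a few unusual ones
X = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.01, random_state=42).fit(X)
labels = model.predict(X)            # -1 = anomaly, 1 = normal
print("Flagged as anomalous:", int((labels == -1).sum()))
```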

Every enterprise, large or small, is likely to have AI / ML opportunities that will result in bottom line benefits. In the next part, I will cover the typical challenges an enterprise faces during adoption.

AI / ML in enterprises: Hype vs. Reality

Having completed my Machine Learning certification in August 2019, I was fortunate to get an opportunity soon after to build and lead a technology team that worked on AI / ML problems across the enterprise.

During the team build-out phase, I realized that many software engineers have completed a formal certification in machine learning to qualify themselves for a role in this emerging technology area where demand is expected to increase. There is also an unfounded assumption that all software engineers will work only on ML algorithms in the future and that demand for other skills will plummet. The reality is that not all software applications will be suitable machine learning candidates. Moreover, developing machine learning algorithms is only part of the AI / ML technology lifecycle; there will be massive software engineering needs outside of machine learning, particularly around data and SDLC automation to enable AI / ML technology. Having said that, familiarity with machine learning concepts will increase the effectiveness of software engineers, as most applications in the near future will interface with ML modules for certain functions.

Now, let’s address another question – is AI / ML just hype? To understand this, let’s look at it through the lens of the Gartner Hype Cycle. Since the mid-1990s, a number of technologies have fallen by the wayside after inflated expectations at the beginning. However, a few, like cloud computing, APIs / web services and social software, went through the hype cycle and the reality after mainstream adoption was quite close to the initial expectations. Looking at the hype cycles since 2013, several technologies related to AI / ML have been at the top every year. Starting with big data and content analytics, we have seen natural language processing, autonomous vehicles, virtual assistants, deep learning and deep neural networks emerge at the top during the last seven years. And results from machine learning algorithms have already become part of our day-to-day life – like the recommendations made by Amazon, YouTube or Netflix and the chatbots available through a number of channels.

So, I believe AI / ML is real and will continue to disrupt mainstream industries. However, it will be different from other familiar technology disruptions in many ways:

  • AI / ML technology will continue to evolve rapidly, driven by Silicon Valley innovation.
  • New specialized areas of expertise will emerge every year that will require deep math understanding.
  • Technology workforce will be under pressure as past work experience will be of limited value due to this fast evolution.
  • Traditional enterprises will struggle to keep pace.
  • The possibility of learning from data will undermine established business theories.

Finally, the overwhelmingly open source nature of this domain lowers the entry barrier and encourages start-ups to challenge established players. It also gives established organizations an opportunity to adopt and manage this disruption. The choices made will determine whether an organization disappears like Blackberry, comes back with a bang like Microsoft or continues to hang on like IBM. While this is primarily about embracing a relatively new technology domain, an appropriate strategy around people and process will also be required to succeed. To summarize, organizations will have to create the right ecosystem and provide clarity of approach that encourages people to innovate.

In this blog series, I will articulate my thoughts around people, process and technology considerations while adopting AI / ML in a large enterprise:

  • Technology functions that will require machine learning expertise.
  • Business domains that will benefit from AI / ML.
  • Challenges that enterprises should be prepared to encounter.
  • Structure and governance to scale up adoption.
  • ML technology platform.

Machine Learning

Artificial Intelligence (AI) and Machine Learning (ML) are among the top trending technology keywords today. While some people use these terms interchangeably, ML is really a subset of AI, and the definitions here give a good idea:

  • AI is the wider concept of machines being able to execute tasks in a way that we would consider “smart”.
  • ML is an application of AI based on the idea that we should be able to give machines access to data and let them learn for themselves.

Machine Learning is the field in focus nowadays, with technology companies leading the way over the last decade and organizations in other domains following suit to provide differentiated offerings to their customers using ML. All this was made possible by advances in computer and information technology during the last decade:

  • Wide availability of GPUs that has made parallel computing cheaper and faster.
  • Data storage becoming cheaper, enabling near-unlimited storage at a fraction of the cost of a few years ago.
  • Ubiquitous mobile phones with internet access creating a flood of data of all stripes – images, text, mapping data, etc.

In fact, the above three advances have transformed IT strategy across industries beyond just ML. Industry leaders are aggressively pursuing cloud-first strategies, re-engineering their applications into microservices-based architectures and generating insights for businesses and customers using data science that leverages machine learning algorithms.

It is important to get a good understanding of these technologies to transform existing platforms. When I was looking for a way to get a hands-on understanding of Machine Learning, one of my colleagues suggested the online course offered by Stanford University and taught by the renowned ML researcher Andrew Ng. This course has programming assignments that require one to write algorithms and solve real-world problems in either Octave or Matlab. You can find my completed assignments on my github page. This blog summarizes my learning from this course – one of the most interesting learning experiences I have had. Kudos to the course!

Let’s start with the machine learning definition quoted in the course – a well-posed learning problem by Tom Mitchell (1998): a computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Supervised Learning: we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

  • Regression: predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.
  • Classification: predict results in a discrete output. In other words, map input variables into discrete categories.

Unsupervised Learning: allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.

Some of the key components of machine learning algorithms along with how they are denoted in this course:

  • Input or Feature: denoted by x or X (uppercase denotes vector)
  • Output or Label: denoted by y or Y
  • Training example: a pair of input and corresponding output (ith pair) – (x(i), y(i))
  • Training set: a list of m training examples
  • Number of features: denoted by n
  • Hypothesis: the function h is called the hypothesis. Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y
  • Parameter estimates or Theta: denoted by Θ. Every feature (xj) will have a parameter estimate (Θj) and adding a bias parameter (Θ0) helps shift the output on either side of the axis.
  • Cost function: denoted by J(Θ0, Θ1); the “squared error function” or “mean squared error” is used to measure the accuracy of the hypothesis function:
    J(θ0, θ1) = (1 / 2m) · ∑ i=1..m (hθ(x(i)) − y(i))²
  • Gradient descent: we have our hypothesis function and a way of measuring how well it fits the data; gradient descent helps us arrive at the optimum parameters of the hypothesis function (a minimal NumPy sketch follows this list):
    θj := θj − α · (∂ / ∂θj) J(θ0, θ1)
  • Learning rate / step: denoted by α (size of each gradient descent step)
  • Feature scaling: dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1.
  • Mean normalization: subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero.
  • Polynomial regression: creating a better fit for the curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form)
  • Underfitting or high bias: when the form of our hypothesis function maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features.
  • Overfitting or high variance: caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
  • Regularization parameter or Lambda: denoted by λ. It determines how much the costs of our theta parameters are inflated and can smooth the output of our hypothesis function to reduce overfitting.
  • Learning curves: Plot of training error and test (cross validation) error curves with training set size on x-axis and error on y-axis. Lack of convergence of these curves with increasing training set size indicates high-bias (underfit) whereas high-variance (overfit) scenarios will converge as more training examples are made available.
  • Metrics for skewed classes:
    • Precision (P): “true positives” / “number of predicted positive”
    • Recall (R): “true positives” / “number of actual positive”
    • F1 score: (2*P*R) / (P+R)
  • Decision boundary: Specific to classification algorithms – is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.
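Here is a minimal NumPy sketch of the cost function and gradient-descent update defined above, applied to univariate linear regression on made-up data (the course assignments implement the same ideas in Octave/Matlab):

```python
# Minimal NumPy sketch of the cost function and gradient-descent update for
# univariate linear regression on made-up data.
import numpy as np

# Training set (m examples, one feature), with mean normalization and feature scaling applied.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
x_scaled = (x - x.mean()) / (x.max() - x.min())

m = len(y)
theta0, theta1 = 0.0, 0.0
alpha = 0.1                                   # learning rate

def cost(theta0, theta1):
    h = theta0 + theta1 * x_scaled            # hypothesis h_theta(x)
    return (1 / (2 * m)) * np.sum((h - y) ** 2)

for _ in range(1000):                         # simultaneous gradient-descent updates
    h = theta0 + theta1 * x_scaled
    grad0 = (1 / m) * np.sum(h - y)
    grad1 = (1 / m) * np.sum((h - y) * x_scaled)
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(round(cost(theta0, theta1), 4), round(theta0, 2), round(theta1, 2))
```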

Machine Learning algorithms (a few scikit-learn illustrations follow this list):

  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVM): An alternative to logistic regression for classification problems:
    • Use a kernel like Gaussian kernel to come up with hypothesis.
    • Appropriate when n is small and m is large.
    • As the number of features increase, computation of Gaussian kernel slows down.
  • K-Means Algorithm: Unsupervised learning algorithm for identifying clusters in a dataset.
  • Dimensionality reduction / Principal Component Analysis: Compression reduces memory needed to store data and also speeds up learning algorithm.
  • Anomaly Detection: Used when a dataset comprises a small number of positive examples. Typical use cases:
    • Fraud detection
    • Quality testing in manufacturing
    • Monitoring computers in a data center
  • Recommender Systems
  • Stochastic & mini-batch Gradient Descent: for ML with large datasets

Neural networks: This forms the basis for deep learning, and the course provides an introduction to this complex area. The neural network model is loosely based on how the brain works with its billions of neurons – neurons are basically computational units that take inputs (dendrites) as electrical signals (called “spikes”) and channel them to outputs (axons).

In our model, our dendrites are like the input features x1…..xn, and the output is the result of our hypothesis function. In this model our x0 input node is called the “bias unit.” It is always equal to 1. In neural networks, we use the same logistic function as in classification and call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”. Our input nodes (layer 1), also known as the “input layer”, go into another node (layer 2), which finally outputs the hypothesis function, known as the “output layer”. We can have intermediate layers of nodes between the input and output layers called the “hidden layers.”

“Backpropagation” is neural-network terminology for minimizing our cost function, just like what we were doing with gradient descent in logistic and linear regression.

Steps to set up and train a neural network (a minimal forward-propagation sketch follows this list):

  • First, pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers in total you want to have.
    • Number of input units = dimension of features x(i)
    • Number of output units = number of classes
    • Number of hidden units per layer: usually, the more the better (balanced against the cost of computation, which grows with more hidden units)
    • Defaults: 1 hidden layer. If you have more than 1 hidden layer, then it is recommended that you have the same number of units in every hidden layer
  • Randomly initialize the weights
  • Implement forward propagation to get hΘ(x(i)) for any x(i)
  • Implement the cost function
  • Implement backpropagation to compute partial derivatives
  • Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
  • Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.
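A tiny NumPy sketch of the forward-propagation step for one hidden layer, with random weight initialization and sigmoid activations (the layer sizes are arbitrary):

```python
# Forward propagation for one hidden layer with sigmoid activations (sizes are arbitrary).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_hidden, n_classes = 4, 5, 3
rng = np.random.RandomState(0)

# Randomly initialize the weights (Theta), including the bias column.
Theta1 = rng.uniform(-0.12, 0.12, size=(n_hidden, n_features + 1))
Theta2 = rng.uniform(-0.12, 0.12, size=(n_classes, n_hidden + 1))

x = rng.rand(n_features)                            # one training example
a1 = np.concatenate(([1.0], x))                     # input layer plus bias unit
a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))  # hidden layer plus bias unit
h = sigmoid(Theta2 @ a2)                            # output layer = h_Theta(x)
print(h)                                            # one value per output class
```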

Overall, an excellent course to get started with Machine Learning and get insights into the most commonly used ML algorithms.