Technologist

AI / ML in enterprises: Lifecycle & Departments

Many start-ups are based on AI / ML competence and require this expertise across the organization. In established enterprises, AI / ML is fast becoming pervasive across the organization given the disruption from start-ups and customer expectations. Depending on the size and level of regulation in their respective industries, machine learning activities might be embedded within existing technology teams or dedicated “horizontal” teams might be responsible for them.

ML activities that people readily recognize are the ones performed by data scientists, data engineers and the like. However, other business and technology teams are also essential to enable ML development. Given the potential bias and ethics implications of business decisions made by AI / ML, governance to ensure risk and regulatory compliance will also be required. In this blog, I will cover the AI / ML lifecycle along with the functions and departments in an enterprise that are critical for successful ML adoption.

AI Model inventory: There is an increasing regulatory expectation that organizations should be aware of all AI / ML models used across the enterprise in order to manage risks effectively. This McKinsey article provides an overview of the risk management expected in the banking industry. As an organization embarks on creating an AI / ML development process, a good starting point is to define what constitutes an AI model, to ensure common understanding across the organization, and to create a comprehensive inventory.

Intake and prioritization: To avoid indiscriminate and inappropriate AI / ML development and use, it is important that any such development go through an intake process that evaluates risk, regulatory considerations and return on investment. It is good practice to define certain org-wide expectations, and preferable to federate the responsibility for agility.

Data Management: Once an AI model is approved for development, business and technology teams work together to identify the required data, secure it from different data sources across the organization and convert it into a feature set for model development.

  • Data Administrators manage various data sources, which are typically data lakes (e.g. Apache Hadoop implementations), data warehouses (like Teradata) or RDBMSs (like SQL Server / Oracle).
  • Data Engineers help with data preparation, wrangling, munging and feature engineering using a variety of tools (like Talend) and make the feature set available for model development (a minimal sketch follows below).
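As an illustration of what this feature engineering step might look like in code, here is a minimal sketch using pandas and scikit-learn; the source file, column names and engineered feature are hypothetical, and real pipelines would typically read from a data lake or warehouse rather than a local CSV.

```python
# A minimal feature-preparation sketch; the CSV path, column names and the
# engineered feature are hypothetical examples.
import pandas as pd
from sklearn.preprocessing import StandardScaler

raw = pd.read_csv("customers.csv")                      # extract from a source system
raw = raw.dropna(subset=["age", "income"])              # basic cleansing / wrangling
raw["income_per_year_of_age"] = raw["income"] / raw["age"]   # engineered feature
features = pd.get_dummies(
    raw[["age", "income", "income_per_year_of_age", "segment"]],
    columns=["segment"],                                # one-hot encode a categorical
)
features[["age", "income"]] = StandardScaler().fit_transform(features[["age", "income"]])
features.to_parquet("feature_set.parquet")              # publish for model development
```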

Model Development: Data Scientists use AI / ML platforms (like Anaconda, H2O, Jupyter) to develop AI models. While model development is federated in most enterprises, AI / ML governance requires them to adhere to defined risk and regulatory guidelines.
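To make this concrete, below is a minimal model-development sketch using scikit-learn on synthetic data; the algorithm, dataset and metric are illustrative assumptions rather than a prescription.

```python
# A minimal model-development sketch; the synthetic dataset, the choice of
# logistic regression and the AUC metric are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```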

Model Validation: An Enterprise Risk team usually validates models before production use, particularly for ones that are external facing and deemed high risk.

Deployment & Monitoring: The technology team packages approved models with the necessary controls, integrates them into the appropriate business systems and monitors them for stability and resilience.
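One common (though by no means the only) way to package a model for integration is behind a small REST service; the sketch below assumes Flask and joblib, and the model file name and payload format are hypothetical.

```python
# A minimal deployment sketch: serve an approved model behind a REST endpoint.
# Flask, joblib, the model file name and the payload format are assumptions.
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("approved_model.joblib")    # model packaged for production

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()                # e.g. {"features": [[0.1, 0.5, 1.2]]}
    prediction = model.predict(payload["features"]).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)          # monitoring and controls not shown
```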

Enterprises strive to automate the entire lifecycle so that the focus can be on adding business value effectively and efficiently. Open-source platforms like Airflow, MLflow and Kubeflow help automate orchestration and provide seamless end-to-end integration for all teams across the AI / ML lifecycle.
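As a small taste of what this automation looks like, here is a minimal MLflow tracking sketch; the experiment name, parameter and metric are illustrative, and orchestration with Airflow or Kubeflow is not shown.

```python
# A minimal MLflow tracking sketch; experiment name, parameter and metric
# values are illustrative. Airflow / Kubeflow orchestration is not shown.
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

mlflow.set_experiment("credit-risk-poc")        # hypothetical experiment name
with mlflow.start_run():
    C = 0.5
    score = cross_val_score(LogisticRegression(C=C, max_iter=1000), X, y, cv=5).mean()
    mlflow.log_param("C", C)                    # record the hyperparameter
    mlflow.log_metric("cv_accuracy", score)     # record the evaluation metric
```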

AI / ML in enterprises: Challenges

Organizations need to keep up with the times for long-term sustenance, and with AI / ML becoming pervasive across business domains, every firm nowadays has teams trying to leverage machine learning algorithms to stay competitive.

In this blog, I will cover the top five challenges that they encounter after the initial euphoria of proofs of concept (POCs) and pilots.

  1. Lack of understanding: AI / ML has the potential to transform technology and business processes across the organization and to create new revenue streams, mitigate risks or save costs. However, AI / ML is not a substitute for subject matter expertise. A discussion among novices will throw up a million possibilities, and ML can appear to be an appropriate solution to all the world's problems. While machine learning is based on the ability of machines to learn by themselves, training the algorithms with appropriate data is an important aspect that can be done only by experts. To generate meaningful results, data scientists need to work in unison with business and technology professionals: data scientists bring deep understanding of ML algorithms, business professionals identify meaningful features, and data engineers help secure the data from different sources that will eventually become the feature set. As one can see, good understanding of ML across the organization is important to identify the right problems to solve, and the lack of it is the most important challenge that prevents enterprises from deriving benefit despite investment.
  2. Lack of IT infrastructure: As I mentioned in my original ML post, machine learning came to prominence due to significant information technology advances in processing power and data storage. Enterprises can acquire the required compute power through cloud providers, and many organizations also choose to build their own parallel processing infrastructure. The decision to leverage the cloud vs. building internal infrastructure is based on a number of factors like regulations, scale and, most importantly, cost considerations. Either way, without this investment, ML programs will not go far. Some organizations invest in the requisite hardware but fail to provide the software and database platforms required for data scientists and technologists to leverage this infrastructure. Most machine and deep learning platforms and tools used for development are open source. However, this open-source cost advantage is offset by the numerous options available for ML development, and there is no one-size-fits-all solution. To summarize, the second challenge is to create the powerful IT infrastructure required for ML development and deployment.
  3. Lack of Data: With good understanding and infrastructure, this challenge should be addressable, but data is foundational for ML and I have listed it separately to call out the nuances. Data should be available in sufficient quantity and of good quality for meaningful results. Data preparation is an important step – wrangling, munging, feature scaling, mean normalization, labeling and creating an appropriate feature set are essential disciplines. It is a challenge to identify problems that have the requisite data at scale and to prepare this data for machine learning algorithms to work on.
  4. Lack of talent: Going by the number of machine learning projects that fail to meet their purpose, the ability of existing teams across enterprises is questionable. Any technology is only as good as the people working on it. A few technologies have managed to simplify the work expected from programmers (just drag-and-drop or configuration driven). However, machine learning still requires deep math skills and a thorough understanding of algorithms. So, finding suitable talent is particularly difficult.
  5. Regulations & policies: In a diverse world with myriad regional nuances, decisions made by machines tend to undergo a lot more scrutiny than ones made by humans. Our societies are still paranoid about machines taking over from humans, and governments all over the world have regulations that require proof that decisions made by machines are fair and without bias. This challenge is made more complex by interpreters of regulations inside an enterprise who put in place unnecessary controls that might not address the regulation but do impede ML development. So, it is important for policies to address regulatory concerns without derailing ML development.

Finally, enterprises are riddled with politics, and it is possible to address all of the above challenges only when business, technology and other supporting functions work together seamlessly. Start-ups and technology organizations that are relatively new keep it simple and are more adept at solving these challenges. Large enterprises that have added layers of internal complexity over the years naturally find it more difficult to overcome differences and solve the same challenges.

AI / ML in enterprises: Relevance

Let’s explore two aspects that will provide insights into AI / ML relevance within technology and across enterprises. First, roles in technology that need to work on machine learning algorithms and second, areas within an enterprise that will benefit the most from AI / ML.

It is an incorrect assumption that all software engineers will work only on ML algorithms in the future and that demand for other skills will plummet. In fact, the majority of current software engineering roles that do not require machine learning expertise will continue to exist in the future.

Software engineering functions that DO NOT require machine learning expertise: UI / UX development, interface / API development, rule-based programming and several other client- and server-side components that require structured or object-oriented programming. In addition, there are others like database development and SDLC functions that are required for the AI / ML technology lifecycle but don't require deep machine learning knowledge. So, this leaves only data / feature engineering, data science and model deployment teams that absolutely require machine learning expertise. However, these are rapidly growing areas and demand for experts will continue to outpace many other areas.

Where can we leverage ML? Any use case where historical data can be used for making decisions, but the data is so extensive that it is practically impossible for a human to analyze it comprehensively and generate holistic insights, is a candidate for ML. The ML approach is to leverage human subject matter expertise to source relevant data, determine the right data elements (features), select an appropriate ML model and train the model to make predictions and propose decisions. A few examples:

  • Sales & Marketing: Use data around customer behavior to make recommendations. We see this all the time from Amazon, YouTube, Netflix and other technology platforms.
  • IT Operations: Use a variety of features to predict potential failures or outages and alert users / ops.
  • Customer Service: Chatbots that use natural language processing to answer user queries.
  • Intelligent Process Automation: Eliminate manual operations thereby optimizing labor costs and reducing operational risk.
  • Cyber Security: Detect malicious activity and stop attacks.
  • Anomaly detection: Every business domain needs to beware of anomalies, and detecting them will reduce losses or accidents. It could be detecting defaults, money laundering or fraud for banks, detecting a leak in a chemical plant, detecting a traffic violator, etc.

Every enterprise, large or small, is likely to have AI / ML opportunities that will result in bottom line benefits. In the next part, I will cover the typical challenges an enterprise faces during adoption.

AI / ML in enterprises: Hype vs. Reality

Having completed my Machine Learning certification in August 2019, I was fortunate to get an opportunity soon after to build and lead a technology team that worked on AI / ML problems across the enterprise.

During the team build-out phase, I realized that many software engineers have completed a formal certification in machine learning to qualify themselves for a role in this emerging technology area where demand is expected to increase. There is also an unfounded assumption that all software engineers will work only on ML algorithms in the future and that demand for other skills will plummet. The reality is that not all software applications will be suitable machine learning candidates. Moreover, developing machine learning algorithms is only part of the AI / ML technology lifecycle. There will be massive software engineering needs outside of machine learning, particularly around data and SDLC automation, to enable AI / ML technology. Having said that, familiarity with machine learning concepts will increase the effectiveness of software engineers, as applications in the near future will increasingly interface with ML modules for certain functions.

Now, let’s address another question – is AI / ML just hype? To understand this, let’s look at it through the lens of the Gartner Hype Cycle. Since the mid-1990s, a number of technologies have fallen by the wayside after inflated expectations in the beginning. However, a few like cloud computing, APIs / web services and social software went through the hype cycle, and the reality after mainstream adoption was quite close to initial expectations. Looking at the hype cycles since 2013, several technologies related to AI / ML have been at the top every year. Starting with big data and content analytics, we have seen natural language processing, autonomous vehicles, virtual assistants, deep learning and deep neural networks emerge at the top during the last seven years. And results from machine learning algorithms have already become part of our day-to-day life – like recommendations made by Amazon, YouTube or Netflix and chatbots available through a number of channels.

So, I believe AI / ML is real and will continue to disrupt mainstream industries. However, it will be different from other familiar technology disruptions in many ways:

  • AI / ML technology will continue to evolve rapidly, driven by Silicon Valley innovation.
  • New specialized areas of expertise will emerge every year that will require deep math understanding.
  • Technology workforce will be under pressure as past work experience will be of limited value due to this fast evolution.
  • Traditional enterprises will struggle to keep pace.
  • The possibility of learning from data will undermine established business theories.

Finally, the overwhelmingly open-source nature of this domain will lower the entry barrier and encourage start-ups to challenge established players. It will also give established organizations an opportunity to adopt and manage this disruption. The choices made will determine whether an organization disappears like Blackberry, comes back with a bang like Microsoft or continues to hang on like IBM. While this is primarily about embracing a relatively new technology domain, an appropriate strategy around people and process will also be required to succeed. To summarize, organizations will have to create the right ecosystem and provide clarity on an approach that encourages people to innovate.

In this blog series, I will articulate my thoughts around people, process and technology considerations while adopting AI / ML in a large enterprise:

  • Technology functions that will require machine learning expertise.
  • Business domains that will benefit from AI / ML.
  • Challenges that enterprises should be prepared to encounter.
  • Structure and governance to scale up adoption.
  • ML technology platform.

Machine Learning

Artificial Intelligence (AI) and Machine Learning (ML) are among the top trending technology keywords currently. While some people use these terms interchangeably, ML is really a subset of AI and the definition here gives a good idea:

  • AI is the wider concept of machines being able to execute tasks in a way that we would consider “smart”.
  • ML is a current application of AI based on the idea that we should be able to give machines access to data and let them learn by themselves.

Machine Learning is the field in focus nowadays with technology companies leading the way over the last decade and organizations across other domains following suit to provide differentiated offerings to their customers leveraging ML. All this was made possible by computer and information technology advances during the last decade:

  • Wide availability of GPUs that have made parallel computing cheaper and faster.
  • Data storage becoming cheaper, enabling virtually unlimited storage at a fraction of the cost compared to a few years ago.
  • Ubiquitous mobile phones with internet access creating a flood of data of all stripes – images, text, mapping data, etc.

In fact, the above three advances have transformed IT strategy across industries beyond just ML. Industry leaders are aggressively pursuing cloud-first strategies, re-engineering their applications into microservices-based architectures and generating insights for businesses and customers using data science that leverages machine learning algorithms.

It is important to get a good understanding of these technologies to transform existing platforms. When I was looking for a way to get a hands-on understanding of Machine Learning, one of my colleagues suggested the online course offered by Stanford University and taught by the renowned ML researcher Andrew Ng. This course has programming assignments that require one to write algorithms and solve real-world problems in either Octave or Matlab. You can find my completed assignments on my GitHub page. This blog summarizes my learning from this course – one that offered one of the most interesting learning experiences for me! Kudos to the course!!!

Let's start with a machine learning definition quoted in the course: Well-posed learning problem by Tom Mitchell (1998) – A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Supervised Learning: we are given a data set and already know what our correct output should look like, having the idea that there is a relationship between the input and the output.

  • Regression: predict results within a continuous output, meaning that we are trying to map input variables to some continuous function.
  • Classification: predict results in a discrete output. In other words, map input variables into discrete categories.

Unsupervised Learning: allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don’t necessarily know the effect of the variables.
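To make the distinction concrete, here is a tiny sketch of each problem type using scikit-learn on synthetic data (the datasets and model choices are illustrative; the course itself uses Octave / Matlab).

```python
# Tiny illustrations of regression, classification and unsupervised clustering.
# Synthetic datasets and model choices are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: predict a continuous output
Xr, yr = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
print(LinearRegression().fit(Xr, yr).predict(Xr[:2]))

# Classification: predict a discrete output
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
print(LogisticRegression(max_iter=500).fit(Xc, yc).predict(Xc[:2]))

# Unsupervised: no labels, derive structure (clusters) from the data
Xu, _ = make_blobs(n_samples=200, centers=3, random_state=0)
print(KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xu)[:10])
```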

Some of the key components of machine learning algorithms along with how they are denoted in this course:

  • Input or Feature: denoted by x or X (uppercase denotes the matrix of all training examples)
  • Output or Label: denoted by y or Y
  • Training example: a pair of input and corresponding output (ith pair) – (x(i), y(i))
  • Training set: a list of m training examples
  • Number of features: denoted by n
  • Hypothesis: the function h is called the hypothesis. Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y
  • Parameter estimates or Theta: denoted by Θ. Every feature (xj) will have a parameter estimate (Θj) and adding a bias parameter (Θ0) helps shift the output on either side of the axis.
  • Cost function: denoted by J(Θ0, Θ1), the “Squared error function” or “Mean squared error” used to measure the accuracy of the hypothesis function.
    J(θ0, θ1) = (1 / 2m) · Σ(i=1 to m) (hθ(x(i)) − y(i))²
  • Gradient descent: we have our hypothesis function and we have a way of measuring how well it fits the data. Gradient descent helps us arrive at the optimum parameters for the hypothesis function (a NumPy sketch of the cost function and gradient descent follows this list).
    θj := θj − α · (∂ / ∂θj) J(θ0, θ1)
  • Learning rate / step: denoted by α (size of each gradient descent step)
  • Feature scaling: dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1.
  • Mean normalization: subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero.
  • Polynomial regression: creating a better fit for the curve of our hypothesis function by making it a quadratic, cubic or square root function (or any other form)
  • Underfitting or high bias: when the form of our hypothesis function maps poorly to the trend of the data. It is usually caused by a function that is too simple or uses too few features.
  • Overfitting or high variance: caused by a hypothesis function that fits the available data but does not generalize well to predict new data. It is usually caused by a complicated function that creates a lot of unnecessary curves and angles unrelated to the data.
  • Regularization parameter or Lambda: denoted by λ. It determines how much the costs of our theta parameters are inflated and can smooth the output of our hypothesis function to reduce overfitting.
  • Learning curves: Plot of the training error and test (cross-validation) error curves with training set size on the x-axis and error on the y-axis. Curves that quickly converge to a high error regardless of training set size indicate high bias (underfit), whereas a persistent gap between the two curves indicates high variance (overfit); in the high-variance case, adding more training examples is likely to help.
  • Metrics for skewed classes:
    • Precision (P): “true positives” / “number of predicted positive”
    • Recall (R): “true positives” / “number of actual positive”
    • F1 score: (2*P*R) / (P+R)
  • Decision boundary: Specific to classification algorithms – the line that separates the region where y = 0 from the region where y = 1. It is created by our hypothesis function.
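The following is a minimal NumPy sketch of the cost function and batch gradient descent defined above, including feature scaling, mean normalization and a bias column; the toy data, learning rate and iteration count are illustrative choices (the course assignments use Octave / Matlab).

```python
# A minimal NumPy sketch of J(theta) and batch gradient descent for linear
# regression; data, alpha and iteration count are illustrative.
import numpy as np

X_raw = np.array([[1.0], [2.0], [3.0], [4.0]])          # m = 4 examples, n = 1 feature
y = np.array([2.0, 4.1, 6.2, 7.9])
m = len(y)

# Feature scaling + mean normalization: (x - mean) / range
X_scaled = (X_raw - X_raw.mean(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))
X = np.hstack([np.ones((m, 1)), X_scaled])              # add bias unit x0 = 1

def cost(theta):
    """J(theta) = (1 / 2m) * sum((h_theta(x_i) - y_i)^2)"""
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

theta = np.zeros(2)                                     # theta_0 (bias) and theta_1
alpha = 0.1                                             # learning rate
for _ in range(1000):                                   # simultaneous update of all theta_j
    gradient = (X.T @ (X @ theta - y)) / m              # partial derivatives of J
    theta -= alpha * gradient

print("theta:", theta, "final cost:", cost(theta))
```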

Machine Learning algorithms:

  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVM): An alternative to logistic regression for classification problems:
    • Use a kernel like the Gaussian kernel to come up with the hypothesis.
    • Appropriate when n is small and m is intermediate.
    • As the number of training examples increases, computation with the Gaussian kernel slows down.
  • K-Means Algorithm: Unsupervised learning algorithm for identifying clusters in a dataset.
  • Dimensionality reduction / Principal Component Analysis: Compression reduces the memory needed to store data and also speeds up the learning algorithm.
  • Anomaly Detection: Used when a dataset contains only a small number of positive examples (a minimal sketch follows this list). Typical use cases:
    • Fraud detection
    • Quality testing in manufacturing
    • Monitoring computers in a data center
  • Recommender Systems
  • Stochastic & mini-batch Gradient Descent: for ML with large datasets
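Below is a minimal sketch of the Gaussian-density style of anomaly detection covered in the course: fit a mean and variance per feature on mostly normal data, then flag examples whose density falls below a threshold epsilon. The synthetic data and the threshold are illustrative.

```python
# A minimal Gaussian-density anomaly detection sketch: fit mu and sigma^2 per
# feature on (mostly normal) data, flag examples with density below epsilon.
# The synthetic data and the epsilon threshold are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(500, 2))
X_new = np.array([[0.1, 5.3],    # looks normal
                  [6.0, -4.0]])  # looks anomalous

mu = X_train.mean(axis=0)
sigma2 = X_train.var(axis=0)

def density(X):
    """Product of per-feature Gaussian densities p(x; mu, sigma^2)."""
    p = np.exp(-((X - mu) ** 2) / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return p.prod(axis=1)

epsilon = 1e-4                   # in practice chosen via a cross-validation set
print(density(X_new) < epsilon)  # True flags an anomaly
```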

Neural networks: This forms the basis for deep learning, and this course provides an introduction to this complex area. The neural network model is loosely based on how our brain works with billions of neurons – neurons are basically computational units that take inputs (dendrites) as electrical signals (called “spikes”) that are channeled to outputs (axons).

In our model, our dendrites are like the input features x1…..xn, and the output is the result of our hypothesis function. In this model our x0 input node is called the “bias unit.” It is always equal to 1. In neural networks, we use the same logistic function as in classification and call it a sigmoid (logistic) activation function. In this situation, our “theta” parameters are sometimes called “weights”. Our input nodes (layer 1), also known as the “input layer”, go into another node (layer 2), which finally outputs the hypothesis function, known as the “output layer”. We can have intermediate layers of nodes between the input and output layers called the “hidden layers.”

“Backpropagation” is neural-network terminology for minimizing our cost function, just like what we were doing with gradient descent in logistic and linear regression.

Steps to set up and train a neural network (a minimal NumPy sketch follows the list):

  • First, pick a network architecture; choose the layout of your neural network, including how many hidden units in each layer and how many layers in total you want to have.
    • Number of input units = dimension of features x(i)
    • Number of output units = number of classes
    • Number of hidden units per layer = usually the more the better (must be balanced against the cost of computation, which increases with more hidden units)
    • Defaults: 1 hidden layer. If you have more than 1 hidden layer, then it is recommended that you have the same number of units in every hidden layer
  • Randomly initialize the weights
  • Implement forward propagation to get hΘ(x(i)) for any x(i)
  • Implement the cost function
  • Implement backpropagation to compute partial derivatives
  • Use gradient checking to confirm that your backpropagation works. Then disable gradient checking.
  • Use gradient descent or a built-in optimization function to minimize the cost function with the weights in theta.
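As a minimal sketch of these steps, the NumPy snippet below trains a tiny one-hidden-layer network with sigmoid activations on the XOR problem; the architecture, learning rate and iteration count are illustrative, and gradient checking is omitted for brevity.

```python
# A tiny NumPy sketch of the steps above on XOR: pick an architecture
# (2 inputs, 8 hidden units, 1 output), randomly initialize the weights,
# forward propagate with sigmoid activations, backpropagate and run gradient
# descent. Gradient checking is omitted; all settings are illustrative.
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input layer (2 units)
y = np.array([[0], [1], [1], [0]], dtype=float)               # output layer (1 unit)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(scale=1.0, size=(2, 8)), np.zeros(8)      # random initialization
W2, b2 = rng.normal(scale=1.0, size=(8, 1)), np.zeros(1)
alpha = 2.0

for _ in range(10000):
    a1 = sigmoid(X @ W1 + b1)                  # forward propagation: hidden layer
    a2 = sigmoid(a1 @ W2 + b2)                 # forward propagation: hypothesis h(x)

    delta2 = a2 - y                            # backpropagation (logistic cost)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

    W2 -= alpha * a1.T @ delta2 / len(X)       # gradient descent updates
    b2 -= alpha * delta2.mean(axis=0)
    W1 -= alpha * X.T @ delta1 / len(X)
    b1 -= alpha * delta1.mean(axis=0)

print(np.round(a2.ravel(), 2))                 # typically close to [0, 1, 1, 0]
```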

Overall, an excellent course to get started with Machine Learning and get insights into the most commonly used ML algorithms.

Agile Engineering Practices

Agile software development helps reduce “time to market” by placing value on “responding to change” over “following a plan”. Given the number of projects across the industry that have failed despite solid plans and several person-years of effort, a “project plan” often provides only an illusion of progress towards the product goal. Instead, Agile seeks to “fail fast” and “pivot” to more valuable goals. This is possible only when the team operates with strong discipline and solid engineering practices.

Using the word “engineering” is anathema for some practitioners who consider software development to be a “craft” rather than an “engineering” discipline. While creativity is essential for software development, engineering discipline enables creativity. Remember Nikola Tesla, who proved Thomas Edison wrong on alternating current – how many can claim to be more creative than him? Nikola Tesla was an electrical and mechanical engineer who combined his engineering discipline with creativity to become a genius! Engineering discipline helps address the variability and unpredictability of software development. The engineering practices I cover below act as the scaffolding that provides safety to the Agile team as they embark on building a tall tower!

Test Driven Development (TDD): This invariably appears on any list of engineering practices and has variants in Behaviour Driven Development (BDD) and Acceptance Test Driven Development (ATDD). TDD is best described by Robert C. Martin (“Uncle Bob”) with three rules:

  1. Write no production code except to pass a failing test
  2. Write only enough of a test to demonstrate failure
  3. Write only enough production code to pass a failing test

These three rules are logical and sound simple. But I can vouch that this will be painful. In this competitive world, there is no way to create something outstanding and unique without going through pain. Automated unit tests form the basis for the other engineering practices that come later in the SDLC.

There are numerous tools for TDD; some of the popular ones I have used are JUnit, Robot Framework, FitNesse and Lettuce (BDD). A minimal sketch of the red-green cycle (in Python, for illustration) is shown below.
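```python
# A minimal red-green illustration of the three rules using Python's unittest.
# The function under test (leap_year) is a hypothetical example.
import unittest

# Rule 3: only enough production code to make the failing tests pass.
def leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

class TestLeapYear(unittest.TestCase):
    # Rules 1-2: each test was written first, watched fail, then made to pass.
    def test_divisible_by_four_is_leap(self):
        self.assertTrue(leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(leap_year(1900))

    def test_four_hundred_is_leap(self):
        self.assertTrue(leap_year(2000))

if __name__ == "__main__":
    unittest.main()
```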

Continuous Integration (CI) is the practice of merging all developer working copies to a shared mainline several times a day. This helps avoid the “integration hell” that developers encounter when they try to merge their changes just before release packaging. The reason for “integration hell” is obvious – a developer continues to accumulate technical debt by hanging on to changes in the local environment without checking them into the mainline. It is prudent to keep repaying debt in small increments rather than letting it accumulate into a monster! CI has the following prerequisites:

  1. Code Repository – Git, SVN, TFS, etc.
  2. Automated build – Gradle, Maven, Ant, Make, etc.
  3. Build self-test – refer to TDD

Jenkins is the most popular CI server, with thousands of plugins to set up a robust CI environment. Once you have CI set up, the next level of engineering is Continuous Deployment (CD), which enables software to be deployed directly into production.

Refactoring: Martin Fowler’s book is the authority on this topic. His preamble is insightful – Refactoring is a controlled technique for improving the design of an existing code base. Its essence is applying a series of small behavior-preserving transformations, each of which “too small to be worth doing”. However the cumulative effect of each of these transformations is quite significant. By doing them in small steps you reduce the risk of introducing errors. You also avoid having the system broken while you are carrying out the restructuring – which allows you to gradually refactor a system over an extended period of time.

Technologists often talk about challenges with legacy code. Refactoring regularly will ensure software does not become “legacy”!
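To show how small a behavior-preserving transformation can be, here is a tiny “extract function” refactoring sketch in Python; the invoice domain and numbers are hypothetical.

```python
# A tiny behavior-preserving refactoring (extract function); the invoice
# domain and numbers are hypothetical.

# Before: one function mixes calculation and formatting.
def invoice_line_before(qty, unit_price, tax_rate):
    return f"Total: {qty * unit_price * (1 + tax_rate):.2f}"

# After: the calculation is extracted into its own, separately testable
# function; the externally visible behavior (the returned string) is unchanged.
def line_total(qty, unit_price, tax_rate):
    return qty * unit_price * (1 + tax_rate)

def invoice_line(qty, unit_price, tax_rate):
    return f"Total: {line_total(qty, unit_price, tax_rate):.2f}"

assert invoice_line_before(3, 10.0, 0.2) == invoice_line(3, 10.0, 0.2)
```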

Other major engineering practices are:

  • Pair Programming
  • Collective Ownership
  • Emergent Design

To summarize, engineering practices help a team become agile and stay that way. It is important to understand that adopting engineering practices is a cultural aspect and not just a matter of mandating a bunch of popular tools for the team to use. Agile teams will immensely benefit by embracing engineering discipline with conviction.

Leading Agile Teams

Welcome to the third part of my Agile series. Having covered the foundational elements of Agile and basics about the most widely used Agile framework, I will share my knowledge on how to motivate an Agile team towards the product goal. An Agile team is self-organizing and cross-functional. The term “self-organizing” is key, indicating that the traditional management approach of direction and control will not work.

Let me start with the origins of the traditional rationale for the need for direction and control. “The Human Side of Enterprise”, a management classic written by Douglas McGregor almost 60 years back, insightfully covers the assumptions on which the traditional view is based:

  • The average human being has an inherent dislike of work and will avoid it if possible
  • Hence most people must be coerced, controlled, directed and threatened with punishment to get them to put forth adequate efforts towards achievement of organizational goals
  • The average human being prefers to be directed, wishes to avoid responsibility, has relatively limited motivation and wants security above all

I can bet that no one reading this blog will associate themselves with this average human being! This characterization is demeaning, and Douglas McGregor concludes by saying “under the conditions of modern industrial life, the intellectual potentialities of the average human being are only partially utilized”. He made this case for factory workers sixty years ago. Software development in a modern technology environment requires even more intellectual stimulation than routine work in factories.

I will now switch to a classic Harvard Business Review article from the 1980s by Frederick Herzberg titled “One more time: How do you motivate employees”. It starts with an interesting preamble: “Forget praise. Forget punishment. Forget cash. You need to make their jobs more interesting”. In short, we can enrich jobs by applying the following principles:

  • Increase individuals’ accountability for their work by removing some controls
  • Give people responsibility for a complete process or unit of work
  • Make information available directly to employees rather than sending it through their managers first
  • Enable people to take new, more difficult tasks they have not handled before
  • Assign individuals specialized tasks that allow them to become experts

A relatively modern book, “Drive: The surprising truth about what motivates us” by Daniel Pink, provides the most powerful insights that are applicable to software development. He says the predominant motivating factors have changed as humans evolved over the last 50,000 years. While the motivation 50,000 years back was just trying to survive, the labor workforce during the early stages of the industrial revolution was motivated to seek rewards and avoid punishments. He delves deep into what motivates the modern technology workforce required for software development.

He makes a compelling case on why rewards don’t work. The deadly flaws with rewards are that they can extinguish intrinsic motivation, diminish high performance, crush creativity, crowd out good behavior, encourage unethical behavior, become addictive and foster short-term thinking. Rewards are often equated with compensation; does this mean compensation does not matter? Compensation does matter and is vital to attract good talent. Instead of a carrot-and-stick approach to compensation, pay the team well in line with their market value and take it out of the equation so that the team is driven by intrinsic motivation.

The question then is how to achieve intrinsic motivation. Daniel Pink has an answer that I have seen work effectively – create a Results Only Work Environment and provide autonomy over the 4 “T”s:

  • Task: People are hired for specific business needs and they need to perform activities required to satisfy them. At the same time, several companies have benefited immensely by encouraging their people to spend about 20% of their time on tasks that they want to do on their own.
  • Time: Stop tracking time! Several studies have shown that creative work like software development cannot be measured by time – there are situations when an outcome that an expert programmer can produce in 2 hours cannot be achieved even after hundreds of hours spent by several mediocre programmers.
  • Technique: Business priorities determine what needs to be done but avoid telling the team how to do it. The suggestion is simple – hire people you can trust, tell them what needs to be done and trust them to figure out how to do it.
  • Team: Let the Team interview and select new members for their own team.

I will conclude by referring to Mihaly Csikszentmihalyi’s theory that people are happiest when they are in a state of flow – a state of concentration or complete absorption with the activity at hand and the situation. It is a state in which people are so involved in an activity that nothing else seems to matter. Some people call it being in the zone or getting in the groove. This is the state that people in an Agile team aspire to reach. So, create an environment where the team is fueled by intrinsic motivation and let the results flow in!

Scrum: What is it all about?

After articulating my views on agile in my previous blog, the next step is to cover the most famous agile framework in practice across the industry – Scrum. If you want to get a quick insight into Scrum, you should read The Scrum Guide authored by the creators themselves. There are numerous books and online material available to cater to your specific interests. This blog is only my mental model of Scrum.

Where did the term scrum come from? Rugby – a scrum (short for scrummage) is a method of restarting play in rugby that involves players packing closely together with their heads down and attempting to gain possession of the ball. It was first used in a software development context by Hirotaka Takeuchi and Ikujiro Nonaka in their 1986 HBR paper “The New New Product Development Game”. Rugby is a team sport and success can be achieved only when all the players perform in unison. Teamwork is essential for software development to succeed too.

Who developed Scrum for software development? Ken Schwaber and Jeff Sutherland. They were among the 17 original signatories of the Agile Manifesto in Feb 2001.

Definition of Scrum: A framework within which people can address complex adaptive problems, while productively and creatively delivering products of the highest possible value. Scrum is lightweight and simple to understand but difficult to master.

Scrum is founded on empirical process control theory, or empiricism. Empiricism asserts that knowledge comes from experience and making decisions based on what is known. Scrum employs an iterative, incremental approach to optimize predictability and control risk. Three pillars uphold every implementation of empirical process control: transparency, inspection and adaptation.

One needs to go through a 2-day Certified Scrum Master (CSM) training to get a good understanding of Scrum. Having gone through the training twice and practiced it for several years, I would say Scrum is all about understanding the roles, events and artifacts, and bringing them together to succeed in developing complex software.

Roles in a Scrum Team: The Scrum Guide has captured this foundational element insightfully. To retain the impact, I have just pasted the excerpt below:

The Scrum Team consists of a Product Owner, the Development Team, and a Scrum Master. Scrum Teams are self-organizing and cross-functional. Self-organizing teams choose how best to accomplish their work, rather than being directed by others outside the team. Cross-functional teams have all competencies needed to accomplish the work without depending on others not part of the team. The team model in Scrum is designed to optimize flexibility, creativity, and productivity. The Scrum Team has proven itself to be increasingly effective for all the earlier stated uses, and any complex work.

Scrum Teams deliver products iteratively and incrementally, maximizing opportunities for feedback. Incremental deliveries of “Done” product ensure a potentially useful version of working product is always available.

Every word stated above is important and really leaves no scope for misinterpretation. However, many practitioners and so-called experts continue to alter the roles for their convenience. I have seen instances where a Manager from the legacy process becomes Scrum Master in the new environment and attempts to continue managing the team. As per my Scrum Coach, any violation of these definitions is fake scrum!

A quick summary of the only three roles recognized in Scrum:

  • The Product Owner is the only person responsible for managing and prioritizing the book of work (Product Backlog).
  • The Development Team in Scrum typically includes seven plus or minus two members. They are self-organizing and cross-functional, and the accountability for delivering committed items belongs to the development team as a whole.
  • The Scrum Master is a servant-leader for the Scrum team, responsible for promoting and supporting Scrum by helping everyone understand Scrum theory, practices, rules and values.

Scrum Events: Some people call them ceremonies or routines; I feel the former unnecessarily glorifies them while the latter sounds mundane. I like to stick to “events” as it reflects simplicity and necessity. All events are time-boxed with an agreed maximum duration. The super event is The Sprint, which is a container for all other events, and these events are designed to facilitate the three pillars of Scrum – transparency, inspection and adaptation.

An overview of the events:

  • The Sprint is the heart of Scrum, a timebox of one month or less during which a Potentially Shippable Product Increment (PSPI) is created. Sprints have consistent durations throughout a development effort, and a series of Sprints would typically result in a Minimum Viable Product (MVP). While the sprint duration should be less than a month, the most preferred duration is a fortnight. As a rule of thumb, the higher the ambiguity in requirements, the shorter the sprint. This might be counter-intuitive for some, but is easy to understand when considered from an inspection and adaptation perspective: shorter sprints allow for failing faster and pivoting quickly without being carried away by an illusion of control.
  • Sprint Planning is the first event during a Sprint. The primary input for this event is the prioritized Product Backlog that the Product Owner maintains. Sprint Planning covers what can be done in this sprint and how it will be done. The outcome is a Sprint Backlog and a Sprint Goal that the entire team commits to. It is time-boxed to not more than 5% of a Sprint.
  • Daily Scrum is a 15 minute event for the development team where every team member answers the following three questions:
    • What did I do yesterday to meet the Sprint Goal?
    • What do I plan to do today?
    • What are the impediments that need to be addressed?
  • Sprint Review is held at the end of the Sprint for the development team to demo the PSPI to the Product Owner. It can occupy up to 5% of the Sprint depending on the level of detail that needs to be covered. At the end of the review, the Product Owner updates the Product Backlog based on learnings from the Sprint, in the spirit of inspection and adaptation.
  • Sprint Retrospective is an opportunity for the team to introspect. All team members articulate what went well during the sprint, what could have been done better and collectively come up with a plan for improvements. The Scrum Master plays a key role during this event, helping the team to stay positive and productive.

Scrum Artifacts: Scrum keeps this part simple and focuses on enabling the three pillars of Scrum. The artifacts are:

  • Product Backlog is a list of everything that is known to be needed in the product, ordered by value as determined by the Product Owner. The Product Backlog is always evolving, and the highest-ordered items are more detailed than lower-ordered ones. The details include estimates, and the Product Owner collaborates with the development team to flesh out the details. This process is called Product Backlog Refinement.
  • Sprint Backlog is the list of all items to be completed to achieve a Sprint Goal.

This is Scrum basics in a thousand words. It is quite simple and sometimes simple things are the most difficult ones to follow. A team will realize this as they encounter issues during the initial sprints after agile transformation. However, the good news is that Scrum Framework provides the means to deal with all the challenges that will inevitably come up. Just stick to the basics and persevere using the framework, success will follow! Happy scrumming!!!

Agile Software Development: Revisited

It is six years since I was formally initiated into Agile Software Development, and I find myself at a logical juncture to reminisce about the experience. I started my Agile journey in Jan 2013 as a skeptic, having seen another team decimated during the previous year after a global Agile transformation. There was no choice as my team was next in line, and the transformation was scheduled to officially start with a week-long Certified ScrumMaster course in Chicago. The course started on a cold winter morning with senior leaders from all locations in attendance, and it soon became clear that it had to be an all-in transformation, with any half-measures doomed to fail. Over the next three months, having understood the merits of succeeding with Agile and the risks of not doing so, I became a believer and an earnest adopter. I was a proud practitioner during the next two years, coaching seven scrum teams across more than fifty sprints. I am not going to tell the story here, but will share some of the learnings from the experience.

What is Agile Software Development and how is it different from the other methods used? There are many Agile frameworks / methodologies – Scrum, Extreme Programming (XP), Lean, Adaptive Software Development and many more. The common elements across all these methods are captured insightfully in the Agile Manifesto signed in Feb 2001. If a team truly embraces all the values listed below from the manifesto even without religiously following a specific methodology, it is still an agile team. As a corollary, if any of these values is not followed in letter and spirit, then it is fake agile!

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on
the right, we value the items on the left more.

These values are achieved by following the 12 principles that complete the Agile Manifesto.

  1. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.
  2. Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.
  3. Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  4. Business people and developers must work together daily throughout the project.
  5. Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  6. The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  7. Working software is the primary measure of progress.
  8. Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  9. Continuous attention to technical excellence and good design enhances agility.
  10. Simplicity–the art of maximizing the amount of work not done–is essential.
  11. The best architectures, requirements, and designs emerge from self-organizing teams.
  12. At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Just stick to these values and principles without violating any and one will be Agile! It is that simple!

The challenge is not about learning to be agile, the difficult part is to unlearn old ways that people have grown to be comfortable with. Some of the elements to follow will be against what is perceived as common sense. So, we need to believe in Albert Einstein’s quote “Common sense is the collection of prejudices acquired by age eighteen”.

There is an ongoing debate about purist / theoretical agile vs. being agile in spirit. As one can see from the manifesto, a team is either agile or not. So, where does a purist angle come into play? It does when a specific framework or methodology is used. I experienced it when the teams had to adopt the Scrum framework as part of an “all-in” transformation. An all-in transformation is one where an entire group decides to make fundamental changes to its ways of working by following a framework. It is hard but effective, as it reduces ambiguity and resistance, avoids problems created by having scrum and traditional teams work together, and is over more quickly. More importantly, when a team is forced to go all-in by abandoning comfortable traditional practices and mandating hard new practices, it becomes difficult to pretend to adopt the change. It essentially leaves the team with only two options – embrace and survive OR pretend and perish. And Scrum has a number of routines that are difficult to follow religiously. It takes teams to the brink, but once the transformation is complete, they will find their sweet spot and settle down while retaining their new-found effectiveness.

As Mike Cohn says in “Succeeding with Agile”, becoming Agile is hard but worth it. It is hard because successful change is neither top-down nor bottom-up, the end state is unpredictable, and it is pervasive and dramatically different. But it is worth the effort, as successful change will result in higher productivity, faster time to market, higher quality, and improved employee engagement and job satisfaction, among other benefits. However, not everyone will willingly and whole-heartedly support the change. One of the significant reasons for resistance from certain groups is explained by Larman’s Laws of Organizational Behavior. It might not be possible to eliminate all the complexity of org structures in a large organization. But it is important to sponsor and empower the agile team. Free them from traditional monitor-and-control processes. Trust them to get the job done.

There is a lot more for me to share – on Scrum, Kanban, tools, techniques, books, etc. In the spirit of keeping my blog posts to a thousand words, here is a summary of my journey during the last six years:

  • During the first couple of years, I converted from being a skeptic to a proud agile practitioner coaching co-located, cross-functional, long-lived feature teams to success. It was a great experience to see agile engineering practices like test driven development, peer reviews, continuous integration and continuous deployment in action.
  • Took up a different role during the next four years where most teams were made up of 6 to 9 members and expected to release software every month. So, they had to follow most of the agile principles.
  • During this time, I have seen attempts to centrally administer Agile and plan / project manage agile transformation with fancy launch ceremonies. Such approaches that go against agile values and principles have consistently failed to produce desired results.

I will pause here and will continue this as a series soon.