In this and upcoming blog posts I’m going to discuss how AI technologies will apply to networking.
Before I do, please keep in mind that AI is a large and growing field, with several branches. In networking, there are three subfields of AI that are most relevant:
- Natural language processing (NLP), which includes speech recognition and natural language understanding.
- Machine learning (ML), in which we use data to learn patterns, so we can form inferences on new data for tasks like classification or prediction.
- Machine reasoning (MR), which includes using domain-specific knowledge bases (facts, relationships, and rules) and manipulations of the knowledge to answer questions.
I’ll use “AI” when I mean several of the above or related techniques, and I’ll name a single technique when I want to be more specific.
A Standard Scenario
To make our discussion more concrete, let’s examine a task many network managers have to handle, and explore how AI in an intent-based network (IBN) architecture can improve the experience — for both end users and IT staff. Here’s our scenario: We work at a company with an international footprint, and we need to set up a company-wide video all-hands meeting, for all our locations around the world. It’s important that everyone can view this call with high-quality, low-latency video, and that they can send high-quality video into the call too, when it’s time for the Q&A.
We will need to prepare the network and application services to make sure we meet our goal of giving everyone a high-quality experience, no matter where they are and no matter what happens to the networks inside or outside the company.
This would generally require a large amount of human-driven preparatory work. Often there are subtle problems that are difficult to detect or predict prior to the event, even in a testing scenario. During the event itself, if any issues arise, it will likely be impossible to identify and fix the problem in time. In fact, during an event it generally is not possible to know how the event is going for all users, without them submitting real-time feedback.
The Modern Solution
A modern, intent-based networking architecture gives us a fighting chance to overcome these challenges. With IBN, we express networking as four conceptual functions: Translation, Activation, Assurance, and Infrastructure. These functions take our Intent and turn it into reality.
In the diagram below, we see the physical and virtual infrastructure — wireless access points, switches, routers, compute, storage — at the bottom. To make the infrastructure do what we want, we use the Translation function near the top to convert the intent (what we are trying to accomplish) from a person or computer into the correct network and security policies. These policies then must be activated on the network. A deeper dive on IBN is given in this white paper I co-authored, Intent-Based Networking: Building the bridge between business and IT.
Of course, we not only want to activate the policies, but we also want to assure that the network is providing the service as intended. Assurance is a powerful and relatively new capability for networks.
Activation, by contrast, may be familiar to those who know software-defined networking (SDN) architectures. But IBN improves on it with the Translation and Assurance functions, which form a valuable feedback loop. In addition, the IBN architecture provides the capability to gather telemetry from across the network. As we’ll discuss, this data gathering is critical to feeding the various AI engines, thereby improving network performance, reliability, and security.
The Role of AI in Intent-Based Networking
So how does AI help? It starts at the top, with codifying the core of IBN – the intent of the network operator. The Intent the operator expresses, in human language or through a more traditional interface, must be translated into network and security policies. This step can use natural-language processing (NLP), as well as forms of machine learning (ML) and machine reasoning (MR). It is often especially important to use machine reasoning, to leverage domain-specific knowledge about networking to determine how to realize the desired intent in the given network context.
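To make the Translation step more tangible, here is a minimal sketch of the machine-reasoning idea: a small knowledge base of facts that maps a declared intent to a concrete policy. The `Intent` and `Policy` types, the `KNOWLEDGE_BASE` table, and the bandwidth figures are assumptions made up for illustration; they are not part of any IBN product. The only real-world fact used is that DSCP 34 corresponds to the AF41 marking commonly applied to conferencing video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    application: str  # e.g. "video-conference"
    quality: str      # e.g. "high" or "best-effort"
    scope: str        # e.g. "all-sites"


@dataclass(frozen=True)
class Policy:
    match_application: str
    dscp: int                # DiffServ code point used to mark the traffic
    min_bandwidth_kbps: int  # per-flow bandwidth to reserve
    scope: str


# Hypothetical knowledge base: (application, quality) -> (DSCP, per-flow kbps).
# The bandwidth numbers are illustrative, not recommendations.
KNOWLEDGE_BASE = {
    ("video-conference", "high"): (34, 4000),        # 34 = AF41
    ("video-conference", "best-effort"): (0, 1000),  # 0 = default forwarding
}


def translate(intent: Intent) -> Policy:
    """Look up the rule for this intent and emit the matching policy."""
    key = (intent.application, intent.quality)
    if key not in KNOWLEDGE_BASE:
        raise ValueError(f"no rule in the knowledge base for {key}")
    dscp, kbps = KNOWLEDGE_BASE[key]
    return Policy(intent.application, dscp, kbps, intent.scope)
```

Calling `translate(Intent("video-conference", "high", "all-sites"))` yields a policy that marks the traffic AF41 and reserves the assumed per-flow bandwidth. A real Translation function would reason over far richer facts and relationships, but the shape is the same: declared intent in, device-independent policy out.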
Then the Activation step kicks in. It takes the network and security policies codified by the previous step, and couples them with a deep understanding of the network infrastructure that includes both real-time and historic data about its current behavior. It then activates or automates the policies across all of the network infrastructure elements, ideally optimizing for performance, reliability, and security.
In our example, it’s the Activation step that determines how to provision quality of service (QoS) at each infrastructure element across the global network to provide the desired high-quality video, while ensuring other important network tasks also operate as intended. Activation could also apply ML to predict where employees will be throughout the world at the time of the video call, so it can provision adequate bandwidth and processing based on their locations. Accurately identifying, ahead of time, which regions will have attendees in an office, and which will have more employees at home or on mobile devices, can significantly improve the user’s experience as well as cost-efficiency of the network itself.
In some cases, it may even be possible to predict that a user may not have sufficient bandwidth in their location. They could be notified in advance that if they want video they should go to their office, or else they will likely only receive audio.
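The provisioning check described above can be sketched in a few lines. This assumes a forecast of attendees per location already exists (the ML prediction itself is out of scope here), and it makes the simplifying assumption that every attendee pulls one unicast stream. The function names, the 20% headroom, and the 4000 kbps stream rate are all hypothetical.

```python
def required_kbps(attendees: int, per_stream_kbps: int, headroom: float = 0.2) -> float:
    """Bandwidth needed at a location, with a safety margin on top."""
    return attendees * per_stream_kbps * (1 + headroom)


def plan_locations(forecast: dict, capacity_kbps: dict, per_stream_kbps: int = 4000) -> dict:
    """Decide, per location, whether video is feasible or a warning is needed.

    forecast:      location -> predicted number of attendees
    capacity_kbps: location -> available bandwidth in kbps
    """
    plan = {}
    for location, attendees in forecast.items():
        need = required_kbps(attendees, per_stream_kbps)
        if need <= capacity_kbps[location]:
            plan[location] = "video"
        else:
            # Candidate for the advance notice: go to the office or expect audio only.
            plan[location] = "audio-only-warning"
    return plan
```

For example, a forecast of 100 attendees at an office with 1 Gbps of capacity comes out as `"video"`, while a single home user on a 2 Mbps link comes out as `"audio-only-warning"`.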
But how good will the AI-driven Activation of our network equipment be? How will it adapt to real-time network changes? The Assurance component checks that the network is actually providing the service that the intent calls for and that the Activation step implements.
First, the Assurance step processes an immense amount of real-time data, using AI to surface only the factors that could apply to the issue at hand. For example, Assurance will watch the onboarding time (time to attach to a Wi-Fi access point) of all devices on the network. Assurance will tell us if onboarding times in a particular region are outside the bounds of normal fluctuation, possibly the result of a service issue, security incursion or other factor.
During our global, all-hands video meeting there is likely to be a massive spike in people connecting at the start of the meeting. With ML in the Assurance system, we can determine when an unusual onboarding time is a real problem and when it simply reflects the load of the global all-hands video meeting.
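As a rough illustration of "outside the bounds of normal fluctuation," the sketch below uses a plain z-score test as a stand-in for the ML models a real Assurance system would use. The context-aware part is simply choosing which baseline to compare against. The function name, the threshold of 3 standard deviations, and the sample onboarding times are assumptions for illustration only.

```python
from statistics import mean, stdev


def is_anomalous(baseline: list, observed: float, z_threshold: float = 3.0) -> bool:
    """Flag an observation that sits too many standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline samples
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


# Hypothetical onboarding times in seconds, grouped by context.
BASELINES = {
    "normal-day": [2.0, 2.2, 1.9, 2.1, 2.0, 2.3, 1.8],
    "all-hands-start": [3.2, 3.6, 3.4, 3.8, 3.1],
}

observed = 3.5
flag_vs_normal = is_anomalous(BASELINES["normal-day"], observed)
flag_vs_event = is_anomalous(BASELINES["all-hands-start"], observed)
```

The same 3.5-second onboarding time is flagged against the normal-day baseline but not against the baseline from earlier all-hands meetings, which is the distinction the paragraph above describes.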
By using ML and MR, Assurance can also sift through the massive amount of data related to a global event to correctly identify whether any problems are arising. We can then get solutions to these issues, and even apply them automatically, more quickly and more reliably than before. For example, Assurance could identify that the WAN bandwidth to certain sites is increasing at a rate that will saturate the network paths, and it could proactively reroute some of the flows through alternative paths to prevent the problem from occurring. In prior systems, this problem would typically be recognized only after the bandwidth bottleneck occurred and users experienced a drop in call quality or even lost their connection to the meeting. It would be challenging or impossible to identify the issue in real time, much less to fix it before it detracted from the experience of the meeting. Accurate and fast identification through ML and MR, coupled with intelligent automation through the feedback loop, is key to a successful outcome.
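The WAN example comes down to trend extrapolation. The sketch below fits an ordinary least-squares line to recent utilization samples and estimates how long until that line reaches link capacity. It is a deliberately simple stand-in for whatever forecasting a production system uses; the function name and the sample values are hypothetical.

```python
def seconds_until_saturation(samples: list, capacity_mbps: float):
    """Estimate seconds from the last sample until the fitted trend hits capacity.

    samples: list of (time_seconds, utilization_mbps) pairs, oldest first.
    Returns None when utilization is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    # Least-squares slope (Mbps per second) and intercept.
    slope = sum((t - mean_t) * (y - mean_y) for t, y in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_t
    t_saturation = (capacity_mbps - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, t_saturation - last_t)
```

With samples rising 60 Mbps per minute from 100 Mbps and a 400 Mbps link, the estimate is about two minutes of warning, which is the window in which flows could be rerouted before users notice anything.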
We are able to successfully perform Assurance for several reasons. First, we have very deep expertise in designing, running, and debugging networks. Second, we have designed our networking gear at the ASIC, OS, and software levels to gather key data through our IBN architecture, which provides unified data collection and performs algorithmic analysis across the entire network (wired, wireless, LAN, WAN, datacenter). Third, because we have been the #1 enterprise network vendor for the past 20+ years, we have a massive collection of network data, including a database of problems and associated root causes. And fourth, we have been investing for many years to create innovative network data analysis and ML, MR, and other AI techniques to identify and solve key problems.
This combination of capabilities enables our products to quickly identify whether a problem exists, determine its root cause, and propose fixes to solve it. The network operator can accept the proposed fixes, and they are then applied. The feedback loop continues, and we gather more data to determine whether the network is operating as intended. If not, we identify why and continue to improve the network.
Closing the Loop
AI amplifies the powerful capabilities of intent-based networking: It can accelerate the path from Intent into Translation and Activation, and then examine network and behavior data in the Assurance step to make sure everything is working correctly. Activation uses the insights to drive more intelligent actions for improved performance, reliability, and security, creating a virtuous cycle of network optimization. Prior architectures, such as SDN, only had the feedforward path of automation.
I also want to stress that the feedback the IT user gets from the IBN system with AI is not a flood of arcane telemetry data; instead it is valuable and actionable insights at scale, derived from the immense data and behavioral analytics using AI. The feedback loop illustrates how IBN and AI amplify each other in ways not possible before.
We envision many more exciting AI capabilities in IBN networks in the near future.