PREVAIL 2022: AVAILABILITY, PERFORMANCE, SECURITY AND TESTING
PROGRAM FOR MONDAY, OCTOBER 17TH

08:00-08:59 Session 1
Location: Virtual Room A
08:00
The importance of Security in Operational Resilience

ABSTRACT. The importance of Security in Operational Resilience Keynote.

09:00-10:00 Session 2A
Location: Virtual Room A
09:00
Key insights on highly secure yet efficient real-world usage of Hardware Security Modules (HSMs)

ABSTRACT. HSMs - Hardware Security Modules - provide a well-established means of high operational security in the context of applied cryptography.

As the letter 'H' reveals, these machines are physical hardware - thus, their overall performance capability is bounded by the number of tangible devices actually in place.

How does this fit modern architectural approaches that center on the cloud and benefit from its flexibility in resource usage? Are HSMs doomed to become your performance bottleneck? Or can these literally cryptic devices even support you in achieving your system's goals?

Nicolai's experience in this matter arose in highly relevant real-world scenarios of the German public health care system: he was responsible for the solution design and led the development of the electronic patient record's key derivation component as well as the electronic prescription service, both of which were legally required to use HSMs. Of course, protection of personal health data is crucial, and HSM usage is therefore a given in the patients' best interest. But imagine any situation where a patient's data is urgently needed; just as important is the system's consistent reliability and availability.

In this talk, Nicolai will share his - attention, pun ahead - key insights on such efficient HSM usage.

After all - who said HSM could not also be an abbreviation for Highly Scalable Machines?

The target audience is certainly not only cryptography enthusiasts, but everyone for whom best-in-class data protection is a priority in current or future system design - while applying a modern, state-of-the-art overall architecture.

09:00-10:00 Session 2B
Location: Virtual Room B
09:00
IaC and GitOps: the winning couple to automate your OpenShift deployments

ABSTRACT. This session will describe how to automate the build of a container platform and how to deploy applications on it. Infrastructure as Code (IaC) is used to automate the build of the platform. A GitOps approach is used to automate the first deployment of applications and to ensure that applications always remain in the desired state. The session will give concrete examples of how to use IaC and GitOps in the context of an OpenShift and CP4I architecture.
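
For readers new to GitOps, the core idea the session builds on can be pictured as a reconciliation loop: the desired state lives in Git, the live state lives in the cluster, and a controller continuously converges the latter towards the former. The toy Python below is only an illustration of that loop; the fetch functions are hypothetical placeholders, and in practice tooling such as OpenShift GitOps/Argo CD does this work rather than hand-written code.

# Conceptual sketch of a GitOps reconciliation loop (illustration only).
# fetch_desired_state() and fetch_live_state() are hypothetical placeholders:
# the desired state would come from manifests in a Git repository and the
# live state from the cluster API.

def fetch_desired_state():
    # Placeholder: parse Kubernetes/OpenShift manifests checked into Git.
    return {"orders-service": {"replicas": 3, "image": "orders:1.4.2"}}

def fetch_live_state():
    # Placeholder: query the cluster for the currently deployed applications.
    return {"orders-service": {"replicas": 2, "image": "orders:1.4.1"}}

def reconcile():
    desired = fetch_desired_state()
    live = fetch_live_state()
    for app, spec in desired.items():
        if live.get(app) != spec:
            # Here a GitOps controller would (re)apply the manifest from Git.
            print(f"drift detected for {app}: {live.get(app)} -> {spec}")
        else:
            print(f"{app} matches the desired state")

if __name__ == "__main__":
    reconcile()  # a real controller runs this continuously, not once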

10:00-10:59 Session 3
Location: Virtual Room A
10:00
The future of cybersecurity with Quantum Computing
PRESENTER: Charles Palmer

ABSTRACT. Quantum computing (QC) is no longer a dead end for cryptography, but a new beginning for cybersecurity.

(full abstract to follow)

Michael Osborne will be the presenter.

11:00-12:00 Session 4A
Location: Virtual Room A
11:00
Enable end-to-end Observability to improve performance and efficiency of your App

ABSTRACT. Observability provides deep visibility into modern distributed applications for faster, automated problem identification and resolution. It enables better application performance monitoring (APM) and helps organisations meet Agile/DevOps/SRE goals of delivering higher quality software faster. It may seem to benefit operations teams most; however, in modern DevOps practices the boundary between “Dev” and “Ops” is blurring, so observability brings as much value to developers as it does to operations teams.

In this talk, I will cover what “end-to-end observability” really means, how it can be enabled for applications, the common challenges, the benefits and client impact it creates, and how observability will influence future customers.

11:00-12:00 Session 4B
Location: Virtual Room B
11:00
Introduction to IBM LinuxONE. A Linux hardware platform to deploy your open-source platforms with highest availability, security, and savings

ABSTRACT. The term mainframe makes many engineers think of legacy, or even obsolete, IT systems. Yet most don’t realize that the mainframe, now known as IBM Z, has continually evolved since its inception in the 1950s into the most resilient platform used by enterprises worldwide for their business-critical workloads, including Linux and container workloads in a hybrid cloud environment. LinuxONE is a Linux appliance, running Red Hat as well as other traditional open-source software and built on the mainframe's advanced technology, that provides high performance, resilience to workload peaks, seamless scalability, security, and 99.999% availability for 24x7 business demands, ready for hybrid cloud deployment. In this one-hour session you will learn key technical and financial essentials of IBM Z and LinuxONE and come away with a new perspective for your Linux workloads in the cloud. Not only will this session arm you with significant platform insights; it will also show you how to compare and estimate costs for your current workloads in a LinuxONE environment using a total cost of ownership model. Learn how even a small number of workloads can impact your IT budget, compare your costs with examples from actual client environments, and improve your sustainability standing.

13:00-14:00 Session 6A
Location: Virtual Room A
13:00
Best Practices for Resilience Testing on Cloud Environments
PRESENTER: Ana Biazetti

ABSTRACT. When moving workloads to public cloud, clients need to understand the resilience capabilities of the cloud provider as well as of their applications. Cloud providers must comply with different regulations and prove their resilience through testing of resilience scenarios. The clients/solution teams can then build their own testing strategy based on the resilience capabilities provided to them. This presentation will share the best practices of the IBM Cloud team, which currently provides hundreds of resilient cloud services, the types of resilience they provide, and the different failure scenarios that are tested regularly on IBM Cloud. It will also share recommendations for client workloads and applications built on top of IBM Cloud services and how their resilience should be tested.

Learning Objectives and Expected Outcomes for attendees:
- Understand resilience requirements for cloud providers and cloud workloads/applications
- Understand the different types of resilience that a cloud service may provide
- Understand the different test scenarios that need to be considered to guarantee the resilience of cloud services
- Understand testing techniques for resilience of cloud providers and cloud workloads/applications
- Apply leading practices for resilience testing design and planning for cloud environments

Session Type: Experience sharing. Delivery Method: Lecture.

Biographies:

Ana Biazetti is a Distinguished Engineer in the Financial Services Cloud organization, where, as Chief Architect for Payments Solutions, she is responsible for driving technology innovation in payments and for developing the Financial Services solution architectures that bring the best of IBM and our ISV payments ecosystem to win in the market. Ana is passionate about leading teams in developing cutting-edge technology solutions that address real-world problems. With extensive experience in the complete dev/sec/ops lifecycle, including High Availability, Disaster Recovery and Reliability of solutions, she brings technical and business knowledge in leading global teams to create strategic, innovative architectures that support clients’ digital transformation. Ana is an IBM Master Inventor and member of the IBM Academy of Technology Leadership Team.

Fabio Benedetti is an IBM Distinguished Engineer in the IBM Cloud Platform unit and lead architect of the CTO Architecture Guild, which drives architecture decisions across IBM Cloud to establish best practices and accelerate the delivery of cloud capabilities to market. He is a member of the IBM Academy of Technology and a Master Inventor, with more than 25 years of experience in the IT industry with IBM, occupying various roles in development, design, and architecture across all major platforms. Prior to this role he was lead architect of the Service Management organization responsible for the IBM Automation portfolio, lead architect of the IBM Cloud Orchestrator product, and a Cloud Computing architect in the IBM Software Services organization.

13:00-14:00 Session 6B
Location: Virtual Room B
13:00
Hybrid Clouds and Lean IT: Interrogating Lean IT and Enterprise Architecture
PRESENTER: Ope Jegede

ABSTRACT. The rapid emergence of cloud computing has drastically simplified enterprise architecture while improving resilience, allowing businesses to focus on their core competencies while taking advantage of information technology infrastructure capabilities to improve their performance. The primary goal of this research work is to expound the fundamental principles and concepts of lean thinking, to find out how hybrid cloud has led to the emergence of lean IT, and particularly to offer a foundation for lean transformation within any organization. A review of the previous literature indicates that enterprises can balance lean and resilience as a way to address customers' needs while meeting ongoing enterprise financial obligations.

This research study noted that "Hybrid Cloud" and "Lean" are contemporary paradigms that are often intertwined. Hybrid cloud is a system that combines a private cloud with one or more public cloud services and proprietary applications, allowing services to communicate across regions and data centers. Applying lean principles allows high-performing enterprises to benefit from cloud services that eliminate waste, build in quality, create knowledge, delegate risk, deliver fast, and optimize IT infrastructures.

Enterprises want to have sovereignty over their data and maintain tight control over highly sensitive data. They want guaranteed data security with audit trails and controls. The hybrid cloud model will have to address customer concerns clearly in order to help them make informed decisions on data sovereignty and jurisdiction control, data security and compliance, data independence, data access and integrity, and data residency.

Addressing these concerns will improve the adoption trajectory, set the stage for leaner IT, and increase the adoption of hybrid cloud. Organizations are increasingly turning to the cloud to reduce infrastructure ownership costs, access remote capabilities, and innovate.

Hybrid cloud has been growing extensively due to the emergence of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), and hyper-converged infrastructures. Cloud infrastructure is often compared to a utility, but the basic difference between a utility and a cloud is the risk involved with data protection, provisioning, and control. But how does hybrid cloud interact with lean IT? How can an organization benefit from this approach while running their enterprise securely in the cloud? Hybrid cloud allows organizations to set their authentication mechanisms and other security parameters to guard their platform from attack, with an encryption key mechanism that allows customers to bring their own key and keep it in their own custody.

When modeling hybrid cloud services into lean IT, certain factors must be addressed to ensure a robust architecture that is resilient. Some organizations adopt both Infrastructure as a Service and Platform as a Service. Cloud infrastructures have working models with service-oriented delivery mechanisms as well as deployment-oriented infrastructure models. "A solution that will increase customers' confidence in the cloud while guaranteeing resilience, security and scalability will help increase the adoption of hybrid cloud or a full transition into the cloud".

15:00-16:00 Session 8
Location: Virtual Room A
Panel: Cybersecurity in a changing world
PRESENTER: Sridhar Muppidi

ABSTRACT. Huge data leaks, ransomware/extortionware, open source everywhere, data breaches, nation-state sponsored IP theft, cyberwarfare, and nasty new malware... the challenges keep coming. This panel will look at how cybersecurity challenges are evolving along with the accelerating social changes in the world, and how IBM is stepping up to meet them.

16:00-16:59 Session 9
Location: Virtual Room A
16:00
The Mayflower Autonomous Ship

ABSTRACT. The Mayflower Autonomous Ship Keynote.

17:00-18:00 Session 10A
Location: Virtual Room A
17:00
Using the Scientific Method to Build Resilience into Systems

ABSTRACT. The basis of chaos engineering is to use a hypothesis-based approach to understanding potential issues and resolving them before data integrity, resiliency, or disaster recovery becomes a major concern. Using this type of approach has benefits across the CIA (Confidentiality, Integrity, and Availability) triad. Let's explore how the scientific method can be used at a larger scale to improve the identification and, ultimately, the resolution of potential concerns across the software supply chain and infrastructure.
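
As a concrete illustration of that hypothesis-based approach, the skeleton below (not taken from the talk) expresses a chaos experiment in Python: state a steady-state hypothesis, inject a failure, and check whether the hypothesis still holds. The metric checks and fault injection are stubbed placeholders; a real experiment would call a chaos tool and your monitoring system.

# Minimal hypothesis-driven chaos experiment skeleton (illustration only).
# The health checks and fault injection below are placeholders, not a real tool.

def check_error_rate():
    return 0.002          # placeholder: query your APM/monitoring system here

def check_p99_latency_ms():
    return 180            # placeholder: query your APM/monitoring system here

def inject_failure(description):
    print(f"injecting failure: {description}")   # placeholder for a chaos tool

def rollback_failure():
    print("rolling back injected failure")

def steady_state_ok():
    # Hypothesis: error rate stays below 1% and p99 latency stays under 300 ms.
    return check_error_rate() < 0.01 and check_p99_latency_ms() < 300

def run_experiment():
    assert steady_state_ok(), "system not in steady state; aborting experiment"
    inject_failure("terminate one replica of a stateless service")
    try:
        held = steady_state_ok()
    finally:
        rollback_failure()
    print("hypothesis held" if held else "hypothesis falsified: open a resilience fix")

if __name__ == "__main__":
    run_experiment()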

17:00-18:00 Session 10B
Location: Virtual Room B
17:00
Spectrum Scale on IBM Cloud
PRESENTER: Greg Mewhinney

ABSTRACT. The IBM Cloud Workload Engineering Services team partnered with the Spectrum Scale development team earlier this year to bring a high-performance, resilient, distributed storage offering to IBM Cloud that leverages Spectrum Scale.

From a tile in the IBM Cloud catalog, our customers can fill out a configuration page specifying the parameters of their desired storage and compute clusters. Clicking start sets into motion a large, concerted effort that uses IBM Cloud Schematics, Terraform and Ansible to deploy and configure a multitude of IBM Cloud VPC resources, culminating in less than an hour in cloud-based, high-performance parallel filesystem storage and compute clusters that are ready to run.

In this talk we will lift the curtain and describe what is going on behind the scenes to prepare the cluster and how performance measurements and optimizations have shaped this offering. We will explain and give examples of how we process, organize and analyze Terraform logs to understand a provisioning process that can include up to 100 simultaneous tasks. We will use this data to show how we can separate the consequential processes that require synchronization delays from those whose runtime is masked by longer running processes.

After explaining the analysis and optimizations applied to cluster creation, we will shift gears and cover the important details of the storage cluster’s runtime performance. We will tour the hardware, showing how we build a performance picture piece by piece: measuring IOPS and throughput on raw disks, checking that the network links meet their advertised throughput, and making sure the numbers add up at every step, before concluding with a detailed discussion of the stunning performance numbers that can be achieved by employing IBM Cloud bare metal hosts with NVMe storage in our Spectrum Scale filesystem.
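
To make the log-analysis step more tangible, here is a minimal sketch of one possible approach (an assumption for illustration, not the team's actual tooling): it pulls per-resource creation times out of `terraform apply` output so that the longest-running resources, whose runtime masks everything else, stand out. The resource names in the sample are illustrative.

# Simplified sketch: extract per-resource creation times from `terraform apply` output.
# The regex covers the common "Creation complete after XmYs" form only.
import re

COMPLETE = re.compile(r"^(?P<addr>\S+): Creation complete after (?:(?P<m>\d+)m)?(?P<s>\d+)s")

def creation_times(log_lines):
    times = {}
    for line in log_lines:
        match = COMPLETE.match(line.strip())
        if match:
            minutes = int(match.group("m") or 0)
            seconds = int(match.group("s"))
            times[match.group("addr")] = minutes * 60 + seconds
    return times

if __name__ == "__main__":
    sample = [
        "ibm_is_vpc.cluster_vpc: Creation complete after 12s",
        "ibm_is_instance.storage_node[0]: Creation complete after 2m41s",
        "ibm_is_instance.compute_node[0]: Creation complete after 2m35s",
    ]
    # Print the slowest resources first; these dominate the provisioning time.
    for addr, secs in sorted(creation_times(sample).items(), key=lambda kv: -kv[1]):
        print(f"{secs:>5}s  {addr}")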

18:00-19:00 Session 11A
Location: Virtual Room A
18:00
Controlling access to distributed data across an enterprise using dynamic policies
PRESENTER: Grant Miller

ABSTRACT. This presentation will discuss topics around accessing data in a controlled data lake as well as controlling access to distributed data across an enterprise. The discussion will include the use of an Attribute-Based Access Control methodology integrated with IBM Verify to apply consistent policies and dynamic roles everywhere. It will also touch on how to apply policies consistently across a data fabric.

18:00-19:00 Session 11B
Location: Virtual Room B
18:00
Comprehensive API Testing Ecosystem
PRESENTER: Lisa Waugh

ABSTRACT. Maintaining API test cases across multiple environments, for different load levels and types of tests (e.g. smoke, functional, regression), traditionally required separate tests to be written and maintained. Test maintenance is expensive. If you have three lower environments that tests are run against, then you usually have at least three different tests you are trying to keep in sync. If a new API is added for a microservice, then you have to modify and validate across all three tests at a minimum. If you are running different load levels for build tests versus load tests, for example, that can add additional tests that need to be maintained.

We have developed a method of using one API test and modifying it at run time to support multiple environments, types of tests, and load levels, and we can run it in an IBM Cloud container as a microservice.
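
A minimal sketch of that idea might look like the following: a single test reads its target environment and test type at run time and adjusts the base URL and load accordingly. The variable names, URLs, and endpoint are illustrative assumptions, not the team's actual harness, and the `requests` library is assumed to be available in the test image.

# One parameterized API test, configured at run time via environment variables.
import os
import time
import requests

BASE_URLS = {
    "dev": "https://dev.example.com",
    "staging": "https://staging.example.com",
    "prod": "https://prod.example.com",
}
ITERATIONS = {"smoke": 1, "functional": 10, "load": 1000}   # load level per test type

def run():
    env = os.environ.get("TEST_ENV", "dev")        # which environment to target
    kind = os.environ.get("TEST_TYPE", "smoke")    # smoke / functional / load
    base = BASE_URLS[env]
    failures = 0
    for _ in range(ITERATIONS[kind]):
        try:
            resp = requests.get(f"{base}/api/v1/health", timeout=5)
            if resp.status_code != 200:
                failures += 1
        except requests.RequestException:
            failures += 1
        time.sleep(0.01)
    print(f"{kind} test against {env}: {failures} failures")
    return failures == 0

if __name__ == "__main__":
    raise SystemExit(0 if run() else 1)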

19:00-20:00 Session 12
Location: Virtual Room A
POSTER: DRaaS for Enterprise Container workloads with NovaLink
PRESENTER: Rekha Bomkanti

ABSTRACT. This document explains a technical solution that can be used during a disaster for enterprise container workloads using NovaLink. It is a service-based disaster recovery solution that can be implemented using the VM Recovery Manager DR capability to provide Disaster Recovery as a Service (DRaaS). The solution covers the technical aspects of handling Power infrastructure to serve enterprise workloads.

POSTER: Mature Observability in Next Gen Platform by SRE approach

ABSTRACT. Problem:

We migrated to a next-generation platform running on Red Hat OpenShift and IBM Cloud Paks on standard IBM Cloud VSIs. We had challenges scaling up observability, and gaps in middleware and platform coverage led to issues with the availability, resiliency, and reliability of the system.

Purpose (Experience Sharing):

As part of the improvement, we started our journey in a phased approach to perform the activities below:

1. Discover
- Validated the existing observability/monitoring solution (e.g. APM tools not supporting middleware components in the next-gen platform)
- Reviewed with the development and monitoring teams to identify the gaps in the solution
- Analyzed the production noise caused by the observability/monitoring tools
- Identified security gaps impacting system availability
- Identified DevOps failures affecting application availability

2. Describe
- Documented the solutions to address the gaps
- Presented the analytics on the patterns causing false-positive alerts
- Proposed self-support/auto-heal solutions to improve MTTR

Methods (Lecture on Real-Time Scenarios)
- Fine-tuned observability/monitoring thresholds on APM, API monitoring, and the platform (infrastructure)
- Correlation between events to reduce duplicate tickets
- ChatOps features for repetitive tickets where an SME is not required
- Scripts run proactively to recover the system based on monitoring thresholds (see the sketch below)
- SRE operational dashboards to provide a single-pane view
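
As a rough illustration of the threshold-driven recovery scripts mentioned above, the Python sketch below checks a metric against a threshold and triggers a recovery action. The metric source and recovery action are hypothetical placeholders; a real implementation would query the monitoring/APM tool and call the platform API to restart or scale the affected workload.

# Illustrative threshold-based auto-heal check (placeholders only).
HEAP_USAGE_THRESHOLD = 0.90

def current_heap_usage():
    return 0.93            # placeholder: fetch the metric from the monitoring tool

def recover(component):
    print(f"recovery triggered for {component}")   # placeholder: restart/scale action

def check_and_heal(component="integration-runtime"):
    usage = current_heap_usage()
    if usage > HEAP_USAGE_THRESHOLD:
        recover(component)
    else:
        print(f"{component} healthy (heap usage {usage:.0%})")

if __name__ == "__main__":
    check_and_heal()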

POSTER: System and method to deploy schematically enriched document content as REST APIs

ABSTRACT. Product documentation and other technical documentation are shipped as Infocenter application pages, PDFs, or other formats. Re-using the published data requires parsing the published content type and extracting information from a huge document; there is no simple way to get a particular section of data from published media such as Infocenter or PDF documents. The idea here is to deploy the same technical document as REST service APIs so the data is easy to access and manage by any type of client. Technical documentation is written in different schematics, and REST has its own schematic for resource modelling. Nowadays, an enormous amount of information and documentation pertaining to a particular product is spread across widespread media such as blogs, articles, tutorials, Infocenter documents, videos, and websites.
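
As a toy illustration of the idea, documentation sections could be exposed as REST resources roughly like the sketch below. Flask is used purely for illustration, and the routes and section names are hypothetical; a real system would extract the sections from the published documentation at build time.

# Minimal sketch: serve documentation sections as REST resources.
from flask import Flask, jsonify, abort

app = Flask(__name__)

# Placeholder content; in practice this would be extracted from Infocenter/PDF sources.
SECTIONS = {
    "installation": "Step-by-step installation instructions ...",
    "configuration": "Supported configuration parameters ...",
}

@app.route("/api/v1/doc/sections")
def list_sections():
    return jsonify({"sections": sorted(SECTIONS)})

@app.route("/api/v1/doc/sections/<name>")
def get_section(name):
    if name not in SECTIONS:
        abort(404)
    return jsonify({"name": name, "content": SECTIONS[name]})

if __name__ == "__main__":
    app.run(port=8080)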

POSTER: Secure Release Culture

ABSTRACT. Security and Privacy by Design (SPbD) is an effort around securing IBM products for customers.

The session will share some key details around secure release, using examples from IBM Power products.

SPbD, the secure release process, has been followed by IBM Power products since 2018 across operating systems, firmware, and systems management.

Several Subject Matter Experts (SMEs) are involved in various stages of the SPbD review. An SPbD SME performs the final review of the submissions made by each product, aligned to the respective release cycle, before it is reviewed by the Business Information Security Officer (BISO). The process ensures that key foundational security requirements of the product have been met. It is a mandatory requirement for all products to obtain SPbD approval at least once a year in the Blueline tool.

POSTER: A secure and hassle-free way to deploy to cloud
PRESENTER: Amrita Maitra

ABSTRACT. Each project has several environments, such as development, system integration, production, tooling, etc. Code needs to be deployed and integrated continuously in all these environments, automatically and securely. For this purpose, deployment pipelines are created. But the environment or cloud technology used may vary with the application, such as AWS, OpenShift, etc. Based on the technology and requirements, knowing when to use which type of pipeline and how to securely deploy code is essential. We will walk you through how pipelines are designed and created, the issues you may face, and how to deploy securely.

POSTER: Cloud Pak for Integration Reference Architecture

ABSTRACT. This session will introduce the different architecture patterns you can use when deploying a solution on IBM Cloud Pak for Integration. It will cover patterns ranging from simple to more complex, including ones with high availability requirements.

21:00-22:00 Session 14A
Location: Virtual Room A
21:00
Re-Engineering BAW applications to deliver an ‘order-of-magnitude’ business challenge

ABSTRACT. This session describes how a complex Business Automation Workflow (BAW) application running on AWS was re-engineered to meet the business challenge of an ‘order-of-magnitude’ increase in throughput requirements. Distilled from a client engagement, we will describe the approach and the range of optimisations we followed to scale a series of mountains, starting with the original challenge and finishing at the final goal of a 10x ‘near-linear’ throughput increase, using Performance Engineering to re-engineer a traditional application to deliver performance and availability objectives on AWS without compromising other operational characteristics.

Many of the patterns and approaches described in this session are applicable to any traditional application that needs to be scaled up to handle significantly higher performance requirements, particularly where other options such as containerisation and auto-scaling aren’t available.

21:00-22:00 Session 14B
Location: Virtual Room B
21:00
Secure Your SLA & SLO While Keeping Customers Happy?

ABSTRACT. Track: Learning module, Lecture. Topics: Site Reliability Engineering

Learning Objectives: We live in a world of services. We have vendors who sell us Software as a Service, Platform as a Service, or Infrastructure as a Service. We pay for Monitoring as a Service, Database as a Service, or Storage as a Service. Moreover, we want to make sure the services we purchase are reliable and performant. To do this, we have started thinking about things differently than in the past. There is no perfect predictor, and nobody can predict the future; but if there is enough data, there are many formulas that can be used to make predictions. For most people, past performance predicts future performance. As we start to think about the reliability and performance of our service, we must begin with its history. It doesn't matter whether customers have been satisfied or unhappy with our service in the past; the important thing is to understand where we have been and where we are now. Customers expect things from us, and today we have probably already made implicit SLOs or SLAs with them. This presentation will start with SRE principles such as SLAs, SLOs, and the maturity model, then explain why historical data is so important for keeping the agreements safe and the customer happy, and how we can do it.
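
To illustrate why that history matters, the short sketch below (not part of the talk; the figures are invented) derives measured availability and consumed error budget from past request counts, which is exactly the kind of data an implicit or explicit SLO has to be checked against.

# Illustrative error-budget calculation from historical request data.
def error_budget(successful, total, slo_target=0.999):
    """Return (measured availability, fraction of the error budget consumed)."""
    availability = successful / total
    allowed_failures = (1 - slo_target) * total
    actual_failures = total - successful
    return availability, actual_failures / allowed_failures

if __name__ == "__main__":
    # Example: last 30 days, 10,000,000 requests of which 7,200 failed.
    availability, consumed = error_budget(successful=9_992_800, total=10_000_000)
    print(f"measured availability: {availability:.4%}")   # 99.9280%
    print(f"error budget consumed: {consumed:.0%}")       # 72% of the 0.1% allowance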

Expected Outcomes: Attendees should gain new perspectives on the maturity model, valid performance tests, and the importance of historical data. They should also gain a new viewpoint on how they can apply these to their environments. Delivery Method: The presentation will be delivered as a lecture, including best practices for understanding how the approach can be implemented.

Biographies of the presenters: Cynthia Unwin is an Executive Architect with the HCS Canada CTO team and leader of the Site Reliability Architecture practice. Murat Simsek is a senior SRE on the Canadian Site Reliability team with years of hands-on experience solving customer problems.

22:00-23:00 Session 15A
Location: Virtual Room A
22:00
The 'A' in API stands for 'Application' not 'Achilles Heel'
PRESENTER: Troy Fisher

ABSTRACT. In modern applications and cloud products, APIs are said to be the glue that holds all of the components together.

Because APIs often link back-end components or external devices/services, it is sometimes presumed that whatever is using them will be well-behaved or trustworthy, and that security checks need not be as rigorous as on other types of interfaces.

That presumption can be dangerous. API security weaknesses can be the Achilles Heel in an otherwise well-secured system.

Troy Fisher and Ben Goodspeed from the IBM Security Ethical Hacking Team will guide attendees through common API vulnerability types, with advice on how to avoid making mistakes.

22:00-23:00 Session 15B
Location: Virtual Room B
22:00
Continuous Availability considerations and challenges

ABSTRACT. There is an increasing demand to provide continuous service and to avoid unplanned outages. Also, the tolerance for visible planned outages is decreasing to the point where many businesses no longer have any planned maintenance windows that interrupt normal service. This is true for business-to-business, business-to-consumer, and even business-to-employee services. This shrinking or elimination of the maintenance window puts pressure on people, processes, and technologies to support concurrent upgrades and concurrent maintenance.

The reality is that continuous availability is much more than just hardware and software. The facility costs and resource requirements greatly exceed the initial implementation costs. As a result, continuous availability needs to be viewed in terms of total cost of ownership (TCO), and it must be cost-justified based on the inherent risks of not providing continuous availability.

In this session we will shed light on the differences between high availability (HA) and continuous availability (CA) and list the main considerations and solutions to ensure continuous operations for business-critical applications.