00:00 | Data Transfer in Multi-Cloud Environments ABSTRACT. Keynote. |
01:00 | AI-driven optimisations for Chaos Testing PRESENTER: Mudit Verma ABSTRACT. The problem: Chaos testing is a popular method to gauge the resiliency of a system under adverse conditions by injecting faults into its different components. However, existing chaos-testing practices often rely on faults that are injected randomly or selected intuitively by the tester or SRE. Furthermore, the overall chaos-test space is so large that it is practically impossible to cover all scenarios in a time-bound, cost-effective manner, and many of these faults may not even be valid or suitable for a given system under test. Purpose: This session will discuss how AI for chaos testing can provide a guided approach to chaos engineering in which faults are selected or omitted intelligently, leading to fewer, yet more effective, test cases. Specifically, the session will cover several novel chaos-testing optimisation techniques, such as historical incident and outage analysis, application behavior analysis, and reinforcement learning-based fault injection. Expected learning outcomes: - What chaos testing is, and the current landscape and processes involved in the microservices era - Why it is important to navigate the huge test space intelligently - How AI-driven offline analysis, such as studies of historical incidents and past outages, can reduce the number of faults a tester/SRE has to work with - How studying the characteristics of different components in an application and infrastructure topology, such as identifying critical services and analyzing network, compute, and memory utilization, can help prioritize realistic faults and scenarios - How reinforcement learning-based fault injection, driven by a closed-loop feedback mechanism with rewards and penalties, can help generate effective test cases and optimize the selected faults. 
- How these techniques can be integrated into existing chaos practices and pipelines. The presentation will also include a couple of real-life demonstrations of our implementation of these techniques with a microservices-based application and the Litmus chaos engineering tool. |
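The closed-loop idea in the abstract (fault selection driven by rewards and penalties) can be illustrated with a minimal epsilon-greedy bandit loop. Everything here is a toy stand-in: the fault names, the simulated expose probabilities, and the reward values are assumptions for illustration, not the presenters' implementation or the Litmus API.

```python
import random

# Hypothetical fault catalogue; real catalogues come from tools such as Litmus.
FAULTS = ["pod-kill", "network-latency", "cpu-hog", "disk-fill"]

# Simulated environment: probability that injecting a fault exposes a weakness.
# In a real system this signal comes from observing the system under test.
EXPOSE_PROB = {"pod-kill": 0.1, "network-latency": 0.6,
               "cpu-hog": 0.2, "disk-fill": 0.05}

def select_faults(rounds=500, epsilon=0.1, seed=7):
    """Epsilon-greedy bandit: reward +1 if the fault exposed a failure, else -0.1."""
    rng = random.Random(seed)
    q = {f: 0.0 for f in FAULTS}   # estimated value of injecting each fault
    n = {f: 0 for f in FAULTS}     # times each fault has been injected
    for _ in range(rounds):
        if rng.random() < epsilon:           # explore a random fault
            fault = rng.choice(FAULTS)
        else:                                # exploit the current best estimate
            fault = max(q, key=q.get)
        exposed = rng.random() < EXPOSE_PROB[fault]
        reward = 1.0 if exposed else -0.1
        n[fault] += 1
        q[fault] += (reward - q[fault]) / n[fault]   # incremental mean update
    return q

q = select_faults()
print(sorted(q.items(), key=lambda kv: -kv[1]))
```

Over many rounds the loop concentrates injections on the faults that most reliably expose failures, which is the "reduced, yet effective test cases" effect the session describes.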
02:00 | Cognitive translation quality evaluation for continuous delivery in DevOps PRESENTER: Vincent Chung ABSTRACT. The problem: Translation in the UI has a high impact on user experience. A bad translation in the UI context leads users to misunderstand a feature or conveys incorrect information, so evaluating translation quality in the UI context is critical to software quality. Purpose: This story captures our journey in leveraging the “Intelligent Localization Test System” we developed to evaluate translation quality on a localized UI. Software localization testing is an essential process before a product release: translations are generally verified and improved by human translators directly on the UI to make sure they convey the correct message to users, and this verification is costly. The talk will cover how we automate software localization testing from a translation-quality perspective. We will demonstrate the system architecture, how to leverage a Natural Language Processing service to evaluate translation quality across multiple languages, and how to train a custom model for the NLP service to evaluate translation quality in context. The solution aims to remove the human verification of translation quality from the DevOps process so that we can achieve continuous delivery and integration without human intervention. Methods: We use an automated UI testing tool to inspect the translation and language information on the localized UI and to collect context information (e.g., UI element type) at the same time; this is the client-end tool in our system. In the back end, we leverage an NLP service based on sBERT to evaluate translation quality across languages. We picked the service's pre-trained model from multiple candidates through a careful assessment at the beginning. 
To enable string-context evaluation, we retrain the model with UI context information, the business type of the product, and the translation update history from human verification. The model's performance is assessed by checking whether its scores correlate positively with the translation update history, demonstrating that the model captures the translators' insights. |
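At its core, this kind of evaluation scores a translation by the semantic similarity between the source string and the translated string in a shared multilingual embedding space. A minimal sketch follows, with hand-made toy vectors standing in for the embeddings a model such as sBERT would produce; the threshold value is an arbitrary assumption for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def translation_quality(src_vec, tgt_vec, threshold=0.8):
    """Flag a translation for human review when similarity falls below threshold."""
    score = cosine_similarity(src_vec, tgt_vec)
    return score, score >= threshold

# Toy embeddings: in practice both strings are encoded by the same
# multilingual model so that similar meanings land close together.
good_score, good_ok = translation_quality([0.9, 0.1, 0.2], [0.88, 0.12, 0.21])
bad_score, bad_ok = translation_quality([0.9, 0.1, 0.2], [0.1, 0.9, 0.3])
print(good_ok, bad_ok)
```

Retraining with UI context, as the abstract describes, amounts to moving these embeddings so that context-inappropriate translations score lower even when they are literally correct.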
02:00 | How did IBM’s Critical Treasury Application Break the Latency Barrier using Spectrum Scale? PRESENTER: Walter Dietrich ABSTRACT. While moving a mission-critical application from legacy data centers into IBM Cloud data centers, our team faced a dilemma: how could we provide the performance our users demanded while also meeting IBM’s disaster recovery requirements? In the legacy data centers, our team used synchronous mirroring for disaster recovery. Synchronous mirroring had worked well for ten years, but when we tried to use it in the cloud data centers, it made some batch jobs intolerably slow. The team found that certain kinds of jobs ran slower because the cloud data centers were four times farther apart than the legacy data centers, which were only 300 miles (480 km) from each other. How were we going to mirror millions of flat files and terabytes of databases using data centers more than 1200 miles (1900 km) apart? We could have cobbled together a mixture of point solutions, but we wanted something robust with built-in automation, security, and monitoring. In this presentation, we will describe the IBM Spectrum Scale software that was key to solving our dilemma and explain why this software was the right fit for the application given the performance requirements, the HA and DR requirements, and the architecture of the application. We will demo some of the features that were key to the successful conclusion of our project, including Active File Management Disaster Recovery. We will describe the team that came together to solve the challenges we encountered, and conclude with lessons learned and future plans. |
Panel: Performance and Resiliency with cloud computing PRESENTER: David Jonas ABSTRACT. Hyperscalers advertise that moving your solution to the cloud makes it easier to scale and more resilient, and each offers frameworks and approaches to achieve this goal. During this panel we explore concepts and views to consider in this evolving landscape. |
08:00 | How Kubernetes can be used to build Robust and Scalable Orchestration systems ABSTRACT. Keynote. |
09:00 | DevOps for the software-defined vehicle PRESENTER: Gregor Resing ABSTRACT. In this session we introduce the Software-Defined Vehicle (SDV) hardware and software concepts and the related architectural changes. With the SDV, automotive OEMs will become software companies, and development and operational processes will shift to more agile processes based on DevOps principles. We describe how DevOps principles can be applied to the whole ecosystem of vehicles and cloud-based services to develop and operate future vehicles. DevOps for automotive includes security, backend vehicle services and over-the-air updates, a continuous integration and deployment process, the use of containerized, hybrid integration testing, and software and configuration management. We show reference architectures for containerized virtual and hybrid testing of automotive software. The transition to the SDV requires fundamental changes in the OEM organization, from siloed, domain-oriented departments to more cross-functional and agile organizations. These changes, and the changes in the collaboration between OEMs and suppliers, are briefly summarized. Note to reviewers: this lecture is from an IBM Academy of Technology initiative |
13:00 | Addressing Concentration Risk for Improved Resilience of Hybrid Cloud PRESENTER: Ana Biazetti ABSTRACT. In the current hybrid cloud environment, where clients use a mix of on-premises data centers and public cloud providers to fulfill the computing needs of their solutions, there is an increased focus on cybersecurity risk management, which includes cloud concentration risk and its effect on the resilience of solutions. Traditionally, concentration risk was considered in terms of vendor assessment and supply chain risk, but for hybrid cloud solutions the context of assessing risk needs to evolve to include CSP (Cloud Service Provider) lock-in, workload placement strategy, and data portability. This presentation proposes an operational resilience model across the digital supply chain, based on optimal workload placement and the automation of regulatory compliance across multiple clouds and estates, which addresses concentration risk and increases resilience. We do so by ensuring service availability and business continuity, balancing requirements with cost, expecting failure and designing for it, understanding shared responsibility, hardening integrated service management, driving resilience through automation and continuous testing, designing for resilient and secure data, and systematically balancing concentration risk. Learning objectives and expected outcomes for attendees: - Understand concentration risk and its effect on resilience in hybrid cloud environments. - Learn a taxonomy and associated model that harmonizes the approach to managing concentration risk. 
- Apply leading practices for concentration risk management to improve resilience. Session Type: Innovative Point of View. Delivery Method: Lecture. Biographies: Ana Biazetti is a Distinguished Engineer in the Financial Services Cloud organization, where, as Chief Architect for Payments Solutions, she is responsible for driving technology innovation in payments and for developing the Financial Services solution architectures that bring the best of IBM and its ISV payments ecosystem to win in the market. Ana is passionate about leading teams in developing cutting-edge technology solutions that address real-world problems. With extensive experience in the complete dev/sec/ops lifecycle, including high availability, disaster recovery, and reliability of solutions, she brings technical and business knowledge to leading global teams that create strategic, innovative architectures supporting clients’ digital transformation. Ana is an IBM Master Inventor and a member of the IBM Academy of Technology Leadership Team. Boas Betzler is a Distinguished Engineer for Solution Architecture at IBM Cloud. Clients trust him because he has built and operated cloud solutions since 2009. Innovators credit him with breaking all the rules when he ported Linux to the mainframe as a kid. |
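One way to make "systematically balancing concentration risk" concrete is to treat workload placement as a constrained assignment problem. The sketch below is a hypothetical greedy heuristic, not the model proposed in the session: it caps any single provider's share of total demand, so no one CSP becomes a single point of failure.

```python
def place_workloads(sizes, providers, max_share=0.6):
    """Greedy anti-concentration placement: each workload goes to the
    least-loaded provider whose resulting share of total demand stays
    within max_share (the concentration limit)."""
    total = sum(sizes)
    load = {p: 0.0 for p in providers}
    placement = []
    for size in sorted(sizes, reverse=True):      # place largest workloads first
        feasible = [p for p in providers
                    if (load[p] + size) / total <= max_share]
        if not feasible:
            raise ValueError("no placement satisfies the concentration limit")
        target = min(feasible, key=load.get)      # least-loaded feasible provider
        load[target] += size
        placement.append((size, target))
    return placement, load

placement, load = place_workloads([5, 4, 3, 2, 1], ["cloud-A", "cloud-B"])
print(load)   # demand ends up balanced under the 60% cap
```

A real model would also weigh cost, latency, data residency, and regulatory constraints; the point here is only that the concentration limit is an explicit, checkable input rather than an afterthought.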
13:00 | Prepare to Prevent business disruption PRESENTER: John Hendley ABSTRACT. In the age of ransomware, a client's business is likely to be disrupted, with potentially catastrophic consequences, up to and including going out of business. This session will discuss a protection-first approach to preventing business disruption, covering three approaches: exposure management, improving the effectiveness of security technologies, and faster detection and response. The audience will learn about modular ways to approach security implementation and protect clients' businesses. |
14:00 | Go Multi-Cloud with Canopy - Cross-Cloud Distributed Databases on Kubernetes-based Platforms PRESENTER: Rakesh Jain ABSTRACT. As enterprises move their applications from on-premises environments to the cloud, deploying critical applications to only one cloud creates cloud concentration risk. Regulations such as the European Union's Digital Operational Resilience Act (DORA) require that financial enterprises not rely on a single cloud provider for their critical business applications, and similar regulations are emerging in other countries. In this talk, we present Canopy, a novel solution from IBM Research and IBM Consulting that allows enterprises to set up their NoSQL or SQL distributed databases across two or more clouds on Kubernetes-based platforms, so that they can be used in active-active or active-passive configurations and allow failover from one cloud to the other. The key takeaways from this talk are the technical challenges of a multi-cloud setup, and the new technologies available, when designing your multi-cloud and cloud-native strategy, that help address business continuity needs while also meeting regulatory requirements. We will also demonstrate a real-world scenario by setting up a distributed NoSQL database across two clouds, in two Kubernetes clusters, and show: • reading and writing data across clouds, • simulating a cloud-1 outage and reading the database's data in cloud-2, • writing data during the outage in cloud-2 and reading it in cloud-1 once it is back up, • all data processed within milliseconds, • monitoring of data replication and connectivity between the clouds. This technology has been applied at a leading financial institution in the United Kingdom. The delivery method for this session is: Lecture. |
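The failover behaviour the demo walks through (write during an outage, read everywhere after recovery) can be illustrated with a toy replicated-log model. This is a deliberate simplification under the assumption that only one side accepts writes at a time; Canopy itself relies on the database's own cross-cluster replication, not code like this.

```python
class CloudReplica:
    """Toy key-value replica; real systems use the database's replication protocol."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.log = []          # ordered write log used for catch-up replication
        self.up = True

    def write(self, key, value):
        if not self.up:
            raise RuntimeError(f"{self.name} is down")
        self.store[key] = value
        self.log.append((key, value))

    def catch_up(self, peer):
        """Replay the peer's log entries that this replica has not applied yet.
        Assumes the two logs are prefixes of each other (single-writer toy case)."""
        for key, value in peer.log[len(self.log):]:
            self.store[key] = value
            self.log.append((key, value))

cloud1, cloud2 = CloudReplica("cloud-1"), CloudReplica("cloud-2")
cloud1.write("acct", 100)
cloud2.catch_up(cloud1)        # normal cross-cloud replication
cloud1.up = False              # simulate a cloud-1 outage
cloud2.write("acct", 250)      # writes continue in cloud-2 during the outage
cloud1.up = True               # cloud-1 recovers...
cloud1.catch_up(cloud2)        # ...and syncs the writes it missed
print(cloud1.store["acct"])
```

Real deployments must additionally handle concurrent writers, conflict resolution, and replication lag, which is exactly where the technical challenges mentioned in the talk arise.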
17:00 | AnFiSA: An open-source computational platform for the analysis of sequencing data on IBM Cloud PRESENTER: Yuri Gankin ABSTRACT. Despite genomic sequencing rapidly transforming from a bench-side tool into a routine hospital procedure, there is a noticeable lack of genomic analysis software that supports both clinical and research workflows as well as crowdsourcing. Furthermore, most existing software packages are not forward-compatible with regard to supporting the ever-changing diagnostic rules adopted by the genetics community. Regular updates of genomics databases pose challenges for reproducible and traceable automated genetic diagnostics tools. Lastly, most of the software tools score low on explainability amongst clinicians. Researchers from Harvard Medical School, clinicians from Brigham and Women’s Hospital, software developers from Quantori, and engineers from IBM have created a fully open-source variant curation tool, AnFiSA, with the intention to invite and accept contributions from clinicians, researchers, and professional software developers. The design of AnFiSA addresses the aforementioned issues via the following architectural principles: using a multidimensional database management system (DBMS) for genomic data to address reproducibility, curated decision trees adaptable to changing clinical rules, and a crowdsourcing-friendly interface to address difficult-to-diagnose cases. Originally developed for on-premises deployment and also deployed on Amazon Web Services (AWS), AnFiSA was later migrated to an IBM OpenShift cluster. We will discuss how we chose our technology stack, describe the design and implementation of the software and our experience migrating it to the IBM hybrid cloud, and finally show in detail how selected workflows can be implemented by a medical geneticist using the current version of AnFiSA. |
17:00 | Building resilient 5G edge computing environments in a hybrid cloud environment PRESENTER: Mathews Thomas ABSTRACT. 5G edge computing is maturing and being adopted by multiple industries, but several challenges remain. One key challenge is operating a resilient 5G network across a true hybrid cloud environment spanning from the public cloud to core networks to multi-edge compute nodes to far-edge devices. This session will discuss these challenges and how intelligent, optimized day-0 to day-2 operations of a 5G network across a hybrid cloud environment ensure that network deployment and operational requirements are met to create a resilient 5G edge environment. The architecture is built on emerging and maturing technologies using IBM Cloud Satellite with various Cloud Paks, including Cloud Pak for Network Automation, AIOps, Data, and Security, integrated with 5G networking components to enable key players, including Communication Service Providers, Network Equipment Providers, edge application providers, and system integrators, to monetize the investments they are making. Examples of client engagements with underlying architectures, technical research innovation, impact on emerging standards, and lessons learned will be discussed. A brief demo will also be presented so that the challenges and solutions for 5G computing in a hybrid cloud environment are clear to the audience. |
18:00 | Architecting Security for Regulated Workloads in Hybrid Cloud PRESENTER: Mark Buckwell ABSTRACT. Clients have been slow to migrate regulated workloads to hybrid cloud due to the additional risk to sensitive data and the need to adopt stronger security controls. Organizations need a systematic approach to architecting security that integrates zero trust architectural principles and ensures that effective security controls, appropriate for regulated workloads hosted in hybrid cloud, are embedded into the solution. The main objectives of the session are to i) summarize the architectural thinking practices required to integrate security and compliance into regulated workloads on hybrid cloud and ii) provide the next steps to develop the skills that enable confident architectural thinking for security. The session will summarize how security can be integrated into an architectural thinking process by describing the techniques and concepts to use with standard architecture artefacts, together with additional artefacts specific to security. The practices discussed are from the Architectural Thinking for Security class, which has been updated, based on recent client engagements, to take a cloud-first perspective with zero trust practices. Over 600 students have now completed the class, either as an internal IBM class or as an MSc degree module. The artefacts have also been updated to use the new IBM Design language for technical diagrams and the cloud deployment model recently integrated into the Cognitive Architect tool. |
18:00 | IBM Z development transformation PRESENTER: Edward McCain ABSTRACT. This article discusses how the product development cycle is being transformed with Artificial Intelligence (AI) for the first time in zSeries history. This new era of AI, under the project name IBM Z Development Transformation (zDT), has allowed the team to grow and learn new skills in data science. The transformation forces structural change in how data is prepared and stored. In z14, there were incremental productivity gains from enhancements to automation with the eServer Automation Test Solution and a technology data analysis engine called zDataAssist; in z15, however, AI will significantly accelerate our efficiency. This article explains how Design Thinking and Agile principles were used to identify areas that are both high-impact and feasible to implement: 1) what data is collected, and how, via the System Test Event Logging and Analysis engine, the problem ticket management system (Jupitr), and the processor data analysis engine (Xrings); 2) problem identification, analysis, and management (AutoJup), along with the Intelligent Recovery Verification Assistant; 3) a product design documentation search engine (AskTheMachine); and 4) a prototype microprocessor allocation process, the Intelligent Commodity Fulfillment System, using machine learning. The article details the approach in these areas for z15, the implementation of these solutions under the zDT project, and the results and future work. |
21:00 | Fybrik: A cloud-native platform for addressing non-functional aspects of data usage PRESENTER: Sima Nadler ABSTRACT. Making the most of enterprise data is a huge challenge, especially in multi-cloud and hybrid cloud environments, and in a world that highly regulates the use of sensitive data. Fybrik enables easier access to data while orchestrating optimal data flows according to business needs and enforcing data governance policies. Fybrik (https://fybrik.io/v1.0.0/) is an open-source cloud-native infrastructure that enables enterprise-wide data governance enforcement based on pre-defined rules. It is a key component of IBM Data and AI's Data Fabric. Fybrik decreases the manual processes currently in place, providing access to data in seconds to minutes rather than months. In addition, Fybrik negates the need for sharing credentials with data users, increasing security. It also ensures that new data and copies are only written to allowed locations, based on IT admin preferences as well as data governance restrictions, and automatically registers them in the enterprise data catalog. Fybrik supports hybrid and multi-cloud environments. To this end, it also provides capabilities for determining the optimal data path between a declared workload and the data sets it uses. Which capabilities should be included (read, write, copy, transform, etc.), which storage accounts should be used when writing, and in which cluster/region each capability should run are all decisions made by Fybrik as it optimizes the data plane for a given workload. It takes into account the context of the workload, the metadata of the data, data governance rules, infrastructure capabilities, and the enterprise's priorities for how to leverage its infrastructure. In this talk, we will introduce Fybrik and its architecture, share an example use case from a pilot done with a major European bank, and invite participants to download and try out Fybrik. |
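The governance-enforcement idea described above (policies decide, per workload context, what the data plane must do with a data set before a workload may use it) can be sketched as a tiny rule evaluator. The rule shapes, field names, and actions below are hypothetical illustrations, not Fybrik's actual policy language (Fybrik delegates policy decisions to an external policy manager).

```python
# Hypothetical governance rules: redact PII columns, deny cross-region use.
POLICIES = [
    {"if_tags": {"PII"}, "then": {"action": "redact", "columns_with": "PII"}},
    {"if_geo_mismatch": True, "then": {"action": "deny"}},
]

def evaluate(dataset, workload):
    """Return the data-plane actions required before the workload may read the data."""
    actions = []
    for rule in POLICIES:
        # A geography mismatch is an outright denial that overrides everything.
        if rule.get("if_geo_mismatch") and dataset["geo"] != workload["geo"]:
            return [{"action": "deny"}]
        # Tag-based rules accumulate transformations (e.g., redaction).
        if rule.get("if_tags") and rule["if_tags"] & dataset["tags"]:
            actions.append(rule["then"])
    return actions or [{"action": "allow"}]

bank_data = {"geo": "EU", "tags": {"PII", "finance"}}
eu_workload = {"geo": "EU"}
us_workload = {"geo": "US"}
print(evaluate(bank_data, eu_workload))   # PII columns must be redacted
print(evaluate(bank_data, us_workload))   # cross-region use is denied
```

In Fybrik the analogous decision additionally drives where each capability (read, write, copy, transform) runs and which storage is used, as the abstract describes.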
22:00 | Scaling Cloud Pak for Data for Data Fabric use cases PRESENTER: Sourav Mazumder ABSTRACT. Enterprises can build many kinds of Data Fabric use cases using IBM Cloud Pak for Data, ranging from metadata ingestion, metadata discovery, and data virtualization to the consumption of data from Watson Knowledge Catalog or a Project. It is extremely important for enterprises to test and benchmark these use cases for performance at scale, in terms of both data volumes and concurrent users. In this presentation we cover best practices for isolating the various Data Fabric use cases into distinct workloads, based on flows covering key user actions, data preparation, and a list of APIs for performance and scalability testing of several key Data Fabric use cases. We also cover the results and lessons learned from a case study, and some performance monitoring best practices to help proactively identify resource constraints and guide custom tuning. |
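Benchmarking for concurrent users, as described above, is typically driven by a load generator that fans requests out over a worker pool and reports latency percentiles. The sketch below is a generic stand-in: the sleep simulates a service call, and the percentile bookkeeping is deliberately simple; it does not model the Cloud Pak for Data APIs themselves.

```python
import concurrent.futures
import statistics
import time

def call_api(i):
    """Hypothetical stand-in for a real API call (e.g., a metadata lookup)."""
    start = time.perf_counter()
    time.sleep(0.01)                      # simulate service latency
    return time.perf_counter() - start

def run_load(users=20, requests=100):
    """Drive `requests` calls with `users` concurrent workers; report latency stats."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(call_api, range(requests)))
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(0.95 * len(latencies)) - 1],
        "max": latencies[-1],
    }

print(run_load())
```

Repeating such a run while stepping up `users` and watching where p95 diverges from p50 is one simple way to locate the resource constraints the session discusses.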
PANEL: Responsible Computing: The Blueprint to Make Sustainability Part of Your Organization’s DNA PRESENTER: Marc Peters ABSTRACT. The goals of Responsible Computing are to ensure that the IT organization is contributing to – and recognized for – the planet’s sustainable development goals. A focus on responsible computing reduces costs and enhances operational efficiency while addressing the most pressing challenges of our day: environmental sustainability, efficient infrastructure, secure coding, and ethical and transparent systems that reflect our diversity. The Responsible Computing blueprint ties IT decisions to environmental, social, and corporate governance (ESG) KPIs to meet sustainability goals that also make your organization more operationally efficient. Learn how to integrate your digital transformation efforts into an overall environmental sustainability strategy that transforms business processes into green, intelligent workflows. |