PREVAIL 2019: IBM CONFERENCE ON PERFORMANCE | AVAILABILITY | SECURITY
PROGRAM FOR TUESDAY, OCTOBER 15TH

09:00-09:15 Session 13: Keynote
09:00
Welcome from the IBM Watson IoT center leader
09:15-10:15 Session 14: Keynote
09:15
The Maersk AlwaysOn Journey - How Maersk migrated to the cloud whilst still achieving 5x9's availability

ABSTRACT. In this session Simon Whelband (Chief Software Architect for AlwaysOn) will describe the journey Maersk has embarked on since 2016. Simon will describe why AlwaysOn was required, and how, over a period of 3 years, the platform has evolved, as technology has evolved.

From SoftLayer and Docker to Kubernetes, first on Dedicated cloud and then on full Public cloud, AlwaysOn is now a very different beast from what it was when it started out.
How has Maersk continued to deliver 100% platform uptime whilst evolving as the cloud has evolved? And where does it go next? 

The above likely describes the way I'll structure the keynote. It's a fairly broad topic, to be honest, and it will easily fill 45-50 mins. The intent will be to describe where AO came from (and why), and then the overall evolutionary approach we've taken to the platform (constantly reinventing it based on technology change).

In addition, we've also gone from a more conventional ops-based model to a full-on SRE model. That itself will form a part of this.

10:15-10:30 Coffee Break
10:30-11:30 Session 15A: Breakout Performance
10:30
Automated Performance Regression Patrol Framework for IBM Cloud

ABSTRACT. As we adopt agile methodologies in our daily development process, continuous automated code quality checks should be incorporated to catch the regressions right when we drop the code. This includes adding performance checks as an essential phase in every CI/CD pipeline. In this session, we will share our projects ‘Vulcan’ and ‘Snow Leopard’ focusing on performance automation and integrated monitoring across IBM Cloud and Istio open source workgroup.
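As an illustration of such a pipeline-stage check (a minimal sketch only; the baseline format and 10% tolerance are assumptions for illustration, not details of Vulcan or Snow Leopard), a regression gate can be as simple as comparing the current run against a stored baseline:

```python
# Minimal sketch of a CI performance gate: fail the build when the
# current benchmark regresses beyond a tolerance vs. a stored baseline.

def check_regression(baseline_ms, current_ms, tolerance=0.10):
    """Return (passed, pct_change). A positive pct_change is a slowdown."""
    pct_change = (current_ms - baseline_ms) / baseline_ms
    return pct_change <= tolerance, pct_change

# Example: a 25% slowdown against a 200 ms baseline fails a 10% gate.
passed, change = check_regression(baseline_ms=200.0, current_ms=250.0)
print(passed, round(change, 2))  # False 0.25
```

In a real pipeline, the `check_regression` result would gate promotion of the build, and the baseline would be refreshed whenever an intentional performance change lands.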

10:30-11:30 Session 15B: Breakout Security
10:30
Organizational aspects - Engineering Roles & Departments in a modern agile enterprise!

ABSTRACT. Cloud, DevOps and Agile are here to stay. Availability, Performance and Security are more important than ever.

However, who will take care of them? Specialists or generalists? Or both? What roles will there be? How do those roles fit into a modern digital enterprise? Do we still think in terms of Level 1, Level 2, Level 3 and a service desk? How do new concepts like Site Reliability Engineering fit in? The same goes for processes. Do we get new processes? Is ITIL still applicable? What about automation?

To illustrate the changes, the author will use a model called FRM-IT (Functional Reference Model for the Business of IT). This consulting asset illustrates both traditional roles in IT and the roles in a modern digital enterprise where DevOps, Agile and Cloud have been embraced. He will use a recent client case, where he worked with the asset, as an example. Some attention may also be given to how IBM is handling this transformation in GBS and GTS.

The author will not pretend that he has the final word on all of this. There will be room for discussion and exchange of ideas.

Biography: Rik Lammers is a Dutch certified senior IT architect with a long and extensive background in IT Service Management and Architecture Governance. In recent years, he has worked almost exclusively on the impact of Cloud and DevOps on established IT Service Management, Architecture and IT Governance practices. Earlier, he did similar work on the impact of SOA. He has also fulfilled the lead architect role for the development of one of IBM's first extended, integrated, large-scale SO multi-tenant ITIL-based Service Management solutions.

10:30-11:30 Session 15C: Breakout Availability
10:30
Hybrid Cloud needs Hybrid Resilient Networks

ABSTRACT. Don’t assume you can have a one-size-fits-all multi-site network architecture and meet your resiliency goals!

This session examines the impact of early network architectural decisions on the resiliency capability delivered in large transformation programmes. The two case studies are from retail banking “design, build and run” programmes, each with a TCV over $1B. In each case the objective was to migrate workloads to IBM from existing client data centres, where networks were stretched between the two sites for a combined HA/DR solution. I posit the principle that 'Layer 2 stretching is always a bad idea: the question is “how bad an idea is it this time?”' and that hybrid workloads will need some, but not too much.

The first case study is a traditional dual-site datacentre solution with a decision not to stretch networks. In consequence, every migrated application needed a completely different DR solution for the target environment. The second is a private cloud solution with software-defined networking, where it was decided to stretch the software-defined “overlay” networks between the sites. This supported a wide range of DR patterns, which was a great boon to application migration, but the cost was significant additional complexity and issues in meeting the resiliency goals for the network.

Expected outcomes: understand how, in large private cloud programmes, resiliency architects must be fully engaged at the outset to shape the key architectural decisions that will determine the future HA/DR capabilities.

Session type: Experience sharing, innovative point of view

Delivery method: Lecture

Biography: Steve Hayes is an IBM Executive Architect and Fellow of the British Computer Society with 33 years’ experience in IBM Services. He has performed as Chief Architect on several large transformation programmes in the banking and other sectors.
He was previously Public Sector Chief Technology Officer for IBM Global Technology Services in the UK and Ireland, for which he is also the Architecture Profession leader. He has presented papers to five Academy of Technology conferences on High Availability and is an advocate of structured architectural methods and systems engineering techniques. Steve lives in Glasgow, Scotland and is married with five children who absorb most of his non-professional time.

11:30-12:30 Session 16A: Breakout Performance
11:30
Preserving and Enhancing Availability and Performance while Migrating to Cloud

ABSTRACT. Whether consolidating to a new Data Centre or migrating to cloud, you need to manage Performance and Availability migration risks, but you also have a great opportunity to take up new capabilities to enhance resilience, especially in the cloud.

An experienced Migration Architect talks about how you can manage migration risks to Performance and Availability, ending up with a more robust IT Landscape than you started with.

11:30-12:30 Session 16B: Breakout Security
11:30
Secure like no one can hack

ABSTRACT. Learning objectives: Security has been on everyone’s mind, thanks to recently reported vulnerabilities in the most trusted environments. While hardware cannot do much to resolve vulnerabilities like S&M immediately, the responsibility falls on software and operating systems. This also opens up an opportunity for the software development process to include security in the design, or at least in an automation framework like DevOps.

Expected outcomes: Sanitized code is the need of the hour, which means released code should be secured at least against known vulnerabilities. The question is how to do that. This requires security scans on the code before it is released. Manual security scanning requires effort both to run the scans and to analyze the reports; both are repetitive tasks. Several security scan tools come with a heavy price tag, and in such cases there are open source tools which can be explored. The proposed session will share details on the integration of such scan tools into an automation environment like Continuous Test in DevOps. We often run education sessions in the security domain to bring security into the development process, but these only raise awareness. The integration of such tools helps to enforce security in the development process.
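A minimal sketch of what enforcing such a scan in the pipeline could look like (the JSON report shape and severity names here are hypothetical, not the output of any specific scanner):

```python
# Hypothetical sketch: gate a CI stage on a scanner's JSON report.
# The report format and severity labels are illustrative assumptions.
import json

def gate_on_findings(report_json, fail_on=frozenset({"HIGH", "CRITICAL"})):
    """Return the list of blocking findings; an empty list means the gate passes."""
    findings = json.loads(report_json).get("findings", [])
    return [f for f in findings if f.get("severity") in fail_on]

# Example report with one blocking and one non-blocking finding.
report = json.dumps({"findings": [
    {"id": "CWE-89", "severity": "HIGH"},
    {"id": "CWE-563", "severity": "LOW"},
]})
blocking = gate_on_findings(report)
print(len(blocking))  # 1
```

A CI step would fail the build when `blocking` is non-empty, so the enforcement happens automatically rather than relying on someone reading the report.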

Session type: Learning module

Delivery Method: Participative lecture.

Bio of the presenter: Samvedna Jha is leading a team of security analysts. She collaborates with Systems Management, Operating System and Firmware architects and developers to bring a culture of Security by Design and to ensure that product deliverables meet the security standards of IBM and its clients. Some of her job responsibilities are:
- Identify and define Systems Management secure engineering requirements
- Create the Systems Management security roadmap through close collaboration with IBM offering management
- Perform vulnerability assessments using several cognitive tools
- Develop technical solutions and identify new security tools to help mitigate security vulnerabilities and automate critical tasks
- Work with the firmware team to ensure compliance with necessary US and worldwide government and cryptographic standards
- Respond to customer concerns about vulnerabilities, audit and compliance requirements
- Develop security domain skills within the security analysts and extended team
- Ensure Systems Management security capabilities are auditable and meet industry, corporate and customer requirements

12:00
A Legacy Guide to the Cloud

ABSTRACT. Cloud-based applications, microservices and fast paths from development into test and production are on everyone's mind. They enable the team to work in a more agile way and the customer to experience a working product much faster. But today, not every project or product is built on such an architecture yet. What should those projects do?

Within the Watson Health DACH team for Social Program Management we were facing such problems. Especially when it came to testing, we had big challenges. For example, requesting test machines took a very long time. Even though we were already using virtual machines, commissioning them took time, and configuring a machine to its own identity was still a manual, error-prone task.

In the last two years we have been containerizing our legacy application and created a deployment architecture that reduces the time from development to test from around 5 hours (without commissioning) to 1 hour. The approach is also not limited to the machines that are available, but is fully scalable. The deployment is done on IBM Kubernetes Service and Cloud DB2. Having developed an additional command line tool that also configures the K8s Ingress and Services, every developer, business analyst or tester is enabled to have their deployment of a nightly-built software drop done within 45 minutes.

We would like to present our approach to creating a process that supports agile development methods and DevOps pipelines when modernizing a legacy monolithic application. Our real project experience, presented during the talk, shows that this approach can substantially improve the quality, availability and performance of traditional applications.

11:30-12:30 Session 16C: Breakout Availability
11:30
Availability in a Cloud Native World: Guidelines for Mere Mortals

ABSTRACT. I present architectural methods, patterns and practices that are to be followed by developers, SREs and software architects when building and maintaining cloud-native applications and services that need to provide the highest levels of availability.

The methods describe how to provide *true* five nines (99.999%) for end to end business services by incorporating Site Reliability Engineering (SRE), DevOps, Microservices, Chaos Engineering, Cloud-native Architectures, Application Modernization, Multi-Availability Regions, Geo-dispersity, Data Consistency, Performance and Scalability, Content Delivery Networks (CDN), and Software-defined Environments (SDE).
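For context on what five nines actually demands, the downtime budget implied by an availability target can be computed directly:

```python
def downtime_per_year_minutes(availability):
    """Minutes of allowed downtime per year for a given availability fraction."""
    return (1 - availability) * 365 * 24 * 60

# 99.999% ("five nines") leaves barely five minutes of downtime per year,
# while 99.9% leaves almost nine hours.
print(round(downtime_per_year_minutes(0.99999), 2))  # 5.26
print(round(downtime_per_year_minutes(0.999), 1))    # 525.6
```

That roughly five-minute annual budget is why end-to-end five nines requires the full combination of techniques listed above rather than any single one.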

12:30-13:15 Lunch Break
13:45-14:30 Session 18: Panel Discussion
13:45
Panel Performance

ABSTRACT. Moderator: Mario Gaon

Panel: Chris Winter, Lydia M Duijvestijn, Surya Duggirala, Andrew McDonald, Martin Jowet

14:30-15:30 Session 19A: Breakout Performance
14:30
Teaching old (and good) dogs new tricks with SRE and ChatOps

ABSTRACT. In this session I will describe the work I've done as a ChatOps "evangelist" at large companies that are taking their first steps towards SRE-like operations.

The session is aimed at people, like my clients, who are starting along their path towards SRE adoption and see many possible pitfalls along the way. This session will show how to avoid two common pitfalls: 1. thinking that a change of technology is more important than culture change, and 2. thinking that only "new technology" can be used for SRE purposes. Many organizations want to start adopting DevOps and SRE concepts and processes, yet do not want to "rip and replace" their traditional ITSM toolchains. Specialized monitoring tools remain the siloed domain of the NOC and Operations staff while developers, incident managers and others use their own tools. This leads to a low level of collaboration and inhibits processes such as Incident Management.

Because many of the traditional monitoring & service management tools were created without SRE or DevOps in mind and are oriented to a specific domain they require significant expertise to use. Most customers do not want to lose the effort (and money) they've invested in their existing products and solutions, yet feel constrained because these products are supposedly "not designed for SRE/DevOps". I will describe how I use ChatOps at my clients to break down enforced silos and empower my clients to gain the benefits of ChatOps and SRE-type collaboration while still using their "traditional" monitoring/operations technological stack. The solution I use is the addition of a custom integration layer between the existing tools and the collaboration platform that will expose the functionality of the tools and correlate between them.

I will present case examples where the adoption of ChatOps as an intermediary layer between the legacy monitoring & event management suite and DevOps/SRE/Developers/LoB owners empowered and accelerated the adoption of SRE practices. This allows the entire organization to get the full benefit of the existing monitoring/ITSM stack, instead of the dedicated silos each component traditionally belongs to.
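The intermediary-layer idea can be sketched as a thin adapter that normalizes events from each tool into a common chat message; the source names and event field names below are illustrative assumptions, not any product's API:

```python
# Sketch of a ChatOps integration layer: normalize events from two
# hypothetical monitoring sources into one chat-friendly message format.

def normalize(source, event):
    """Map tool-specific event fields onto a common message structure."""
    if source == "legacy_monitor":
        return {"severity": event["sev"].upper(), "text": event["msg"]}
    if source == "apm":
        return {"severity": event["level"].upper(), "text": event["description"]}
    raise ValueError(f"unknown source: {source}")

def to_chat_line(message):
    """Render the common structure as a single chat line."""
    return f"[{message['severity']}] {message['text']}"

line = to_chat_line(normalize("legacy_monitor",
                              {"sev": "critical", "msg": "DB latency high"}))
print(line)  # [CRITICAL] DB latency high
```

Because every tool is reduced to the same message shape before reaching the collaboration platform, SREs and developers see one stream of events regardless of which silo produced them.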

The session can take approximately an hour, with ~10 minutes of Q&A

The audience takeaways will be:
1. Confidence that legacy monitoring/event management suites can be used successfully with SREs
2. Knowledge about the benefits of ChatOps in general
3. Ideas for entry/starting points in their own organization
4. Access to an MVP lab (GitHub) with code that can be used in their own environments

Biography: Robert Barron is a Senior Managing Consultant and member of the IBM Garage Solution Engineering group. Within the worldwide Garage SE team, he is part of the Cloud Service Management and Operations Experts team, working in all fields of CSMO and specializing in Site Reliability Engineering and Chat Operations.

Robert joined IBM in 2007 and has held various positions in IBM throughout his career, all in the field of Service Management. In total, he has over twenty years of experience in enterprise systems in multiple domains spanning development, technical leadership, project management and offering management. In previous roles, Robert was the squad leader for CSMO in the Solution Engineering group within CASE and the technical lead for Service Management in Israel.

Robert speaks at global conferences for IBM and creates assets that range from internal documentation to published books and public code.

14:30-15:30 Session 19B: Breakout Security
14:30
Stand-In System Design for Always On, Continuous Availability in the Digital Era

ABSTRACT. "Always On" 24/7 non-stop service needs Continuous Availability (CA), which includes both High Availability (HA) to remove unplanned outages and Continuous Operations (CO) to minimize planned outages. CA is a mandatory non-functional requirement for public and private Cloud in the digital era. HA, including a Disaster Recovery (DR) system, is already implemented at many customers. But some maintenance, like the change or reorganization of a database, needs a planned outage. So CO, to reduce planned outage time, is one of the key issues in implementing CA.

A "Stand-In" system, which has IT resources separate from the production system, is the solution to minimize planned outage time. When production is switched to the Stand-In system, the original production system can be shut down for maintenance while the Stand-In system runs as production. After the maintenance, production can be switched back from the Stand-In system to the original production system. And if these planned switches can be done within one minute, CO can be maintained during maintenance.

I will show the design and operation of a Stand-In system I built at a banking customer using the DR system. This was the first implementation of the combination of asynchronous disk copy and software replication between production and Stand-In over a long distance. This design can be applied to any other customers who want "Always On" 24/7 non-stop service.

- Customers' requirements for CA
- Comparison of techniques to minimize planned outage for CO
- Design for a remote Stand-In system over a long distance
- Operations for a planned switch between production and Stand-In system within a minute

Lecture to share my experiences

VIO: https://ibm.box.com/s/rwec2p0ar2n2c1nimcqmwy8r7loeekbn

Kazumasa Kawaguchi, a DE for Client Technical Sales in IBM Systems
- Speaker at TLE in 2006, AP STG University in 2011, Technical University in 2013
- Leader of an AoT initiative in 2017; its report is "Consulting a Polish company" (AEB-1114) in 2018

14:30-15:30 Session 19C: Breakout Availability
14:30
Resiliency Regulatory Compliance and the IBM Systems Z Ecosystem

ABSTRACT. Resiliency compliance regulations are emerging in different countries around the world. Solution providers must become aware of those regulations and address the requirements in their products. As there are numerous resiliency compliance requirements, many of them will have a major impact on your product development, test and production environments. Complying with these requirements is often challenging and resource intensive. Companies would be better positioned to address them, in terms of skills and resources, if they had a good forecast of what is to come.

As studies show, most companies are not even aware of these regulations, and they are often caught by surprise when they get audited by government agencies or when developing their business continuity plans. We believe that companies must be better positioned to assess the regulations and identify key requirements affecting their businesses. They must also ensure their systems and solutions meet the requirements to enable successful compliance audits.

The IBM Z System provides solutions to meet compliance requirements while delivering Continuous Availability in their environments.

15:30-15:45 Coffee Break
15:45-16:45 Session 20A: Unconference Performance
15:45
Ensuring Your SOR Can Soar!

ABSTRACT. As front-end and middle tiers continue to scale out, most back-end systems of record (SOR) are hosted on very large single-OS-instance servers that need to scale up. Scaling the back end up continues to be a challenge as partition sizes grow to over 100 multi-threaded processor cores accessing dozens of terabytes of memory on non-uniform memory access (NUMA) systems. Hot cache lines and other node-to-node interactions continue to cause problems in the field. These problems range from occasional slowdowns to applications becoming unresponsive to the point they are no longer available.

The speaker proactively works with the clients that have the largest partitions in the world to ensure they can continue to scale up their systems of record to handle their business growth. In this session he will explain some of the hardware, firmware, and operating system challenges in allowing massively multi-core partitions to scale up. Methods to identify scalability problems will be shown, as well as example solutions at both an OS and application level.

Good performance of back-end servers also depends on well-behaved middle-tier servers. The session will include tips on how to prevent the middle tier from turning a small problem into a big one when back-end servers are struggling to keep up with incoming requests.
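One shape such a tip might take in practice (a sketch under assumed names, not the speaker's specific recommendation) is a middle tier that caps in-flight back-end requests and rejects the excess rather than queueing it, so a struggling back end sheds load instead of accumulating an ever-growing backlog:

```python
# Sketch: cap concurrent back-end calls so a slow back end causes
# fast rejections instead of an unbounded queue in the middle tier.
# The limit and rejection behavior are illustrative assumptions.
import threading

class BackendGate:
    def __init__(self, max_in_flight):
        self._sem = threading.BoundedSemaphore(max_in_flight)

    def call(self, fn):
        # Reject immediately rather than queueing when the back end is saturated.
        if not self._sem.acquire(blocking=False):
            return "rejected"
        try:
            return fn()
        finally:
            self._sem.release()

gate = BackendGate(max_in_flight=2)
print(gate.call(lambda: "ok"))  # ok
```

Rejected requests can be retried later or answered with a degraded response; the key design choice is that the back end never sees more concurrency than it can handle.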

After this session, you will better appreciate the challenges and importance of improving the scalability of large back-end servers. You will also be able to better configure your middle tier systems for improved performance and availability of your infrastructure from front-end to back-end.

Eric Barsness has been optimizing IBM products and customers' applications and partitions for over 25 years. He is the team leader of the IBM i performance team in IBM Systems Lab Services. He is an Executive Consultant and an IBM Master Inventor.

15:45-16:45 Session 20B: Breakout Security
15:45
How refactoring and migration to Cloud will affect the performance, security and availability of an existing health care application

ABSTRACT. IBM Watson Health Payer, together with Watson Foundation for Health, is working on a PoV (Proof of Value) to evaluate the migration of an existing Health Care application to the Cloud. The existing application serves several hundred customers, on-boarding, enhancing and aggregating administrative medical claims and enrollment data using a highly scalable parallel framework. Customer sizes range from tens of millions of records to 2-3 billion records stored and available for reporting within Oracle Exadata databases. The goal of the migration to the Cloud is to reduce the time to insights, increase the frequency of the updates, and bring the data to a common platform. The goal of the PoV, besides selecting the appropriate Cloud tools and services to support near-real-time processing and decomposing and externalizing the application into common/sharable services, is to determine how this migration will affect performance, security and availability. The goal of this paper is to share the findings of the PoV.

16:15
Improving device communication in Industry 4.0 wireless networks*

ABSTRACT. Internet of Things (IoT) and cyberphysical system (CPS) technologies play huge roles in the context of Industry 4.0. These technologies introduce cognitive automation to implement the concept of intelligent production, leading to smart products and services. One of the technological challenges related to Industry 4.0 is to provide support to big data cloud based applications which demand QoS-enabled Internet connectivity for information gathering, exchange, and processing. In order to deal with this challenge, in this article, a QoS-aware cloud based solution is proposed by adapting a recently proposed seamless resources sharing architecture to the IoT scenario. The resulting solution aims at improving device to cloud communications considering the coexistence of different wireless networks technologies, particularly in the domain of Industry 4.0. Results are obtained via simulations of three QoS demanding industrial applications. The outcomes of the simulations show that both delay and jitter QoS metrics are kept below their specific thresholds in the context of VoIP applications used for distributed manipulators fine tuning control. In the case of video-based production control, the jitter was controlled to meet the application demands, and even the throughput for best-effort supervisory systems HTTP access is guaranteed.

*This proposal has the goal to share within IBM internally the following paper to be published: https://www.sciencedirect.com/science/article/pii/S0952197619300995?dgcid=author

15:45-16:45 Session 20C: Breakout Availability
15:45
#FourNinesAndBeyond: Demystifying Resiliency in a Cloud Enabled world

ABSTRACT. “Fear of death is what keeps us alive” – Bones, Star Trek Beyond 2016. Simply put, this is the gist of all High Availability architectures: removing Single Points of Failure to keep systems available, and it is just as applicable in the Cloud. This session covers High Availability (HA) techniques to improve the resiliency of Cloud enabled applications. Cloud enabled applications are applications that have been moved to the cloud but were not originally designed with Cloud as a deployment target or its availability characteristics in mind. Using examples drawn from the global Cloud Solutioning Center (CSC) engagements, this talk walks through:

• HA techniques described by common service classes that are representative of the requirements:
  o High Availability (HA)
  o Fault tolerance
  o High Availability with Disaster Recovery (HA/DR)
  o Backup and Recovery
• Application of HA techniques under a set of Cloud data center constraints:
  o Single Data Center
  o Dual Data Center (at distance – Disaster Recovery)
  o Multi-Zone Region
  o Multi-Region

It may not boldly go where no “High Availability” talk has gone before, but it gets there at warp speed, and with a full tank of fresh ideas and guidance relevant to Cloud solutions.

45 mins talk, 15 mins Q&A

16:15
#KeepItReal – How a Retailer survived Black Friday

ABSTRACT. When a customer’s modernization journey, Cloud adoption journey and IBM’s maturing Cloud capabilities intersect on the one business-critical application with the most demanding availability requirements during the most business-critical time of the year, Black Friday, it has all the ingredients of a perfect storm. This session will focus on real, on-the-ground experiences and learnings from the Retailer’s account. The Retailer is a large U.S. big box retailer that runs its e-commerce websites, which bring in double-digit annual revenue growth, on IBM Cloud. However, Black Friday has historically been plagued with technical issues on their e-commerce sites, resulting in a direct impact on their business. This session will cover lessons learned from their journey through cloud adoption, continuous availability and performance engineering, and dive into the implementation hurdles, solutions and outcomes. Please note this is for IBM Internal learning only; the Client is strictly non-referenceable.

16:45-17:30 Session 21A: Unconference Performance
16:45
Resilience Engineering and Management for Complex System Integration Projects

ABSTRACT. Mosaic is IBM's new Complex System Integration method. Since a Mosaic project integrates several workstreams of different kinds of development, IBM has built Performance Engineering into the method. In this lecture, Executive Architect and Mosaic method exponent Mario Gaon and Performance Consultant Andrew McDonald discuss the Mosaic method and how to leverage it to meet challenging Performance and Availability demands.

16:45-17:30 Session 21B: Breakout Multicloud Management
16:45
Develop a multi-cloud architectural framework/tool?

ABSTRACT. The core idea is to develop a multi-cloud architectural framework/tool that provides guidance to solution architecture teams on how to build secure, high-performing, resilient, and efficient cloud infrastructure on the IBM, AWS, Azure, GCP and Alibaba clouds. The requirement is not to regurgitate CSP architecture frameworks, but to incorporate our core hybrid cloud principles (open, secure and resilient) and proven patterns. This tool should also provide blueprints/templates (Terraform/Ansible) that will be used to set up and build the infrastructure quickly. This will be a differentiator for us.

16:45-17:30 Session 21C: Unconference Availability
16:45
Operations Risk Insights

ABSTRACT. Operations Risk Insights (ORI) is a Business Resiliency cloud application used by the following IBM organizations: Systems Supply Chain, Procurement, GTS Business Resiliency, Cloud Services, Real Estate and Site Operations, and many others. It is open to all IBMers to identify, assess and mitigate global risk events which may impact IBM sites, data centers, suppliers or other points of interest globally. Here is the link to Risk Insights: https://risk-insights.w3ibm.mybluemix.net/welcome.jsp

The application has received multiple awards including: A Procurement Excellence Award for Resiliency, IBM Call for Code Semifinalist, and Supply Chain Innovation Awards. It is used by IBM GTS Data Center teams to identify and mitigate risks as recently featured in this GTS press release: https://solutionsreview.com/backup-disaster-recovery/ibm-ai-and-hybrid-cloud-services-prepare-businesses-for-extreme-weather/

ORI uses many TWC APIs, Watson APIs, and hundreds of trusted news sources and Twitter feeds globally to provide up-to-the-minute news on natural and man-made disasters which may impact IBM operations. In addition to the use cases from IBM GTS / client data centers and IBM Suppliers / Supply Chain, Risk Insights is also used by non-profit groups. As one of four IBM internal semi-finalists for the 2018 Call for Code, Risk Insights has been made available to several non-profit groups for the identification of and recovery from natural disasters. In partnership with the IBM Corporate Citizenship team, ORI is available to disaster relief agencies as a free cloud service from June 2019 through 2020. ORI has delivered IBM over $10M in business value through the rapid identification of and recovery from dozens of disasters over the past 3 years.

2019 enhancements to ORI include Internet of Things (IoT) monitoring of IBM Systems and Storage device shipments. ORI enables IBMers to identify, assess and mitigate risks to high-value assets in motion. With IoT sensors monitoring GPS location, shake and vibration, plus climate conditions on IBM HW shipments globally, risk analysts monitor real-time status and resolve any issues in transit prior to client delivery.