EGXTECH2018: EG XTECH CONFERENCE 2018
PROGRAM FOR TUESDAY, NOVEMBER 13TH

10:00-11:30 Session 9A: Day 2: Rome

Location: Rome
10:00
Shared Services Marketplace: Hotels.com vision of an API Developers Portal and Documentation Hub

ABSTRACT. At Hotels.com, Shared Services Marketplace is a governance process supporting the shared-services lifecycle.

This project started as a process to facilitate collaboration between teams owning different microservices, in order to reduce friction and simplify communication between shared-service owners and consumers.

This project is now evolving into a collaboration platform providing a Documentation Hub and an API Developers Portal to the whole HCOM community.

10:30
Smarter testing with Spock

ABSTRACT. Spock is a testing and specification framework for Java and Groovy applications. We will see how Spock, with its powerful features, can help us write shorter, more elegant tests for our Java code compared to our usual JUnit/TestNG setup.

We will cover these topics using Spock: 

  • Specification (test) structure: fields, fixtures, features 
  • Labeled blocks 
  • Mocks and Stubs 
  • Parameterized and Data-driven testing 
  • Interoperation with Java frameworks

In this talk, you will learn how to integrate Spock and Groovy for testing your existing Java projects and the benefits of adopting it, even progressively, side by side with Java tests.

Level: intermediate
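Spock's data-driven features let one test method run against many data rows. Spock itself is Groovy, but purely as an illustration of the data-driven idea (not Spock's actual syntax), a rough Python stdlib analogy using `unittest` subtests might look like this:

```python
import unittest

class MaximumSpec(unittest.TestCase):
    """One test method, many data rows -- the idea behind Spock's where: blocks."""

    def test_max_of_two_numbers(self):
        rows = [  # (a, b, expected), analogous to a Spock data table
            (1, 3, 3),
            (7, 4, 7),
            (0, 0, 0),
        ]
        for a, b, expected in rows:
            with self.subTest(a=a, b=b):
                self.assertEqual(max(a, b), expected)
```

In Spock, the equivalent would be a single feature method with a `where:` block, and each row is reported as a separate test case.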

11:00
Integration Test Platform For Hcom Mobile Pillar

ABSTRACT. We have a dream: a platform for integration tests on the mobile side, from backend services to apps... a platform where tech people (across teams) can collaborate with each other and with the business (TPMs, PMs, and so on) for a common goal: "making our mobile user happy". Utopia? Not at all! Let's try to think together about a possible architecture for that platform, to ensure coherent and consistent navigation, not only in UX terms but also following the data. Data come from different sources and are processed several times before being shown to the final user. For example, what could happen if prices changed during navigation? Would our user still be confident in using our app? Maybe not... So let's prevent these kinds of issues with integration tests focused on the contextualization of the data shown.

10:00-11:00 Session 9B: Day 2: Montreal

Location: Montreal
10:00
Smart cache API Gateway to shield a service from poor performance from downstream dependencies

ABSTRACT. In early 2018, we implemented an API gateway that is used throughout the Flex ecosystem to communicate with downstream services (on which we depend for data) that are too slow.

In order to serve our customers and not lose SEO ranking, we need to serve pages as fast as possible. However, with AWS cost cuts and similar pressures, many of our downstream services treated performance as a secondary requirement. For that reason, we implemented an API gateway with smart caching capabilities using the following tech: Kotlin, DynamoDB and Spring Zuul, as well as an extremely reliable monitoring system leveraging both platform metrics libraries and DropWizard's metrics.

Since we implemented this service, many teams from across the org have used it to improve their performance. Landing pages went from a TP95 of 3 seconds to 1.5 seconds. Furthermore, Warpgate has become a critical service that sometimes handles more requests than CGP (200,000 a minute), and it has never failed us.

We'd like to present the architecture, since it can definitely serve as a solution for teams across the company; our reliable monitoring system; and how we measure performance gains when new services are onboarded onto Warpgate.

10:30
Cheap, simple and fast: serverless cache on S3

ABSTRACT. Ever needed a key-value store that supports high TPS in read and write, but weren't willing to invest time and/or $$$ in infrastructure? Then maybe a simple solution using AWS S3 as a key-value cache is for you. This talk will briefly review the Geo Insights ETL process's need for a caching technology, and the use of S3 as a replicated, highly available, iterable and cost-effective cache.
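The core idea is to store one object per cache key. As a minimal sketch (not the team's actual code), a boto3-style S3 client can be wrapped as a key-value cache; here an in-memory stub stands in for the real client so the example is self-contained, and the names are hypothetical:

```python
import json

class S3KeyValueCache:
    """Key-value cache on an S3-like object store: one JSON object per key."""

    def __init__(self, client, bucket):
        self.client = client      # boto3-like S3 client (assumed interface)
        self.bucket = bucket

    def put(self, key, value):
        self.client.put_object(Bucket=self.bucket, Key=key,
                               Body=json.dumps(value).encode())

    def get(self, key, default=None):
        try:
            obj = self.client.get_object(Bucket=self.bucket, Key=key)
        except KeyError:  # a real boto3 client raises NoSuchKey instead
            return default
        return json.loads(obj["Body"])

class FakeS3Client:
    """In-memory stand-in for an S3 client, for illustration only."""
    def __init__(self):
        self._store = {}
    def put_object(self, Bucket, Key, Body):
        self._store[(Bucket, Key)] = Body
    def get_object(self, Bucket, Key):
        return {"Body": self._store[(Bucket, Key)]}  # KeyError if absent

cache = S3KeyValueCache(FakeS3Client(), "geo-insights-cache")
cache.put("poi:123", {"name": "Colosseum"})
```

With a real boto3 client the same class works unchanged apart from the exception type; S3's replication and availability then come for free.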

10:00-12:00 Session 9C: Day 2: Chicago

Location: Chicago
10:00
Stockyard Lessons Learned, and How We're Applying Them To EKS

ABSTRACT. In its infancy, Stockyard began as a simple containerization solution for microservice applications within Orbitz. It found a new home within Expedia's data centers in Chandler and Phoenix to allow brands to quickly stand up Docker-based services on-prem. Since then, Stockyard has evolved from a niche curiosity to powering Egencia's production global platform in the public cloud.

In the four years since it was first launched, we've learned a lot - not just about Docker, but about how to orchestrate a containerized production environment to provide a robust, scalable platform that can be maintained by a small group of people at scale. In this talk, we'll discuss the lessons we've learned along the way, and how we're applying them to the next iteration of Egencia's containerization platform, being built on top of Amazon's Elastic Container Service for Kubernetes.

10:30
@FlexTrace - a better way to do Haystack Trace

ABSTRACT. This talk is about how to do Haystack tracing for 100+ apps with minimally invasive code. We will explore the problems we faced while implementing Haystack for complex Flex applications: maintaining complex parent-child relationships within a service, providing a consistent way to capture metadata and, most importantly, doing all of this within a single Java annotation, so that developers of any skill level can pick it up and implement Haystack tracing on their app. @FlexTrace was developed keeping in mind that it can be consumed by any application in Expedia and is not restricted to Flex applications. "Flex" in @FlexTrace stands for "being truly flexible", not the Flex platform.

During the presentation you will learn:
- Problems using Haystack out of the box
- How to use @FlexTrace to do Haystack tracing for your application

Duration: 30-45 mins. Audience: all. Level: beginner.
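@FlexTrace itself is a Java annotation, but the "one annotation, minimal code changes" idea has a close analogy in a Python decorator that records a span around any function. The sketch below is purely illustrative (the names and structure are hypothetical, not the real @FlexTrace or Haystack API):

```python
import functools
import time

SPANS = []  # collected spans; a real tracer would ship these to a backend


def trace(operation=None):
    """Record a span (operation name, duration) around the decorated function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:  # record the span even if the call raises
                SPANS.append({
                    "operation": operation or fn.__name__,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@trace(operation="fetch-hotel-offers")  # the whole integration is one line
def fetch_offers(hotel_id):
    return ["offer-a", "offer-b"]
```

The point of the pattern is that the traced function body stays untouched; all tracing concerns live in the decorator (annotation), which is what makes it adoptable by developers of any skill level.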

11:00
LShop OSI - Together Is Better!

ABSTRACT. Enable all teams working in LShop codebases to be self-sufficient by making it a truly open source platform.

What does it mean?

- LShop code bases are open source
- All teams/contributors should be able to self-service requests: design, code and merge
- The OSI core team coaches contributors, reviews requests, creates documentation and improves the ecosystem
- Anyone contributing to LShop code bases is responsible for the health of the systems

Why are we doing this?

- To create a culture where collaboration is easy and scalable
- To provide autonomy for all contributors (minimal/no gate-keeping)
- To share the responsibility of LShop-related code bases with all contributors
- To enable a self-organizing LShop community
- To provide streamlined communication channels to report/escalate issues and for general support

11:30
UI Platform: Bernie and You

ABSTRACT. This talk is for all levels of engineering, product, UX, and others. UI engineers will be the ones mostly working with Bernie, but it's important for everyone to understand more about it and what they can expect. Bernie is our common platform for front-end applications at Expedia, with a focus on performance and PWA (Progressive Web App) best practices. We are working hard to standardize and streamline UI development for our sites so we can deliver a high-performance, high-quality experience to our users. We will give an overview of Bernie, discuss site performance on the web in general, and explore Bernie's constantly evolving UI ecosystem: UITK, UDS, localization, analytics, etc.

The Bernie overview will be the bulk of the talk and broken down as such:

• What the heck is Bernie
• Why we built Bernie
• The benefits of teams using Bernie
• Bernie's role in the ecosystem with UDS, UITK, localization, analytics
• The goals and roadmap of Bernie

The goal of this talk is to educate the broader group about what we’ve been working on and how/why they can get started using Bernie. We are growing a strong UI community within Expedia and we want to share and spread the passion that we have.

10:00-11:00 Session 9D: Day 2: San Francisco

Location: San Francisco
10:00
AWS Cost Optimization – Saving at Scale and Realizing it Quickly

ABSTRACT. On Expedia’s cloud adoption journey, the technology & business teams are learning lessons about where to find optimization opportunities and how to execute at enterprise scale. Increased visibility, daily management, collaboration, personalization, learning between teams, and a healthy quest for better engineering & automated solutions are key elements in the approach to effectively reduce costs as part of day-to-day operational processes. I will share the tools, techniques, structure, and areas of focus which led to cost reductions of 41% in 4 months for BEX AWS expenses.

10:30
Leveraging Airflow for managing workflows

ABSTRACT. Airflow is an open-source workflow engine developed at Airbnb that allows a developer to create, schedule and monitor data pipelines. It can be thought of as being in the same space as Oozie or Azkaban, and is not a data-streaming solution like Spark or Storm. Airflow provides a set of command-line utilities as well as a rich user interface to manage workflows, which are modeled as directed acyclic graphs.

The 4 principles that Airflow pipelines are built around are:
1. Dynamic - as pipelines are configuration as code written in Python, they can be built dynamically
2. Extensible - it is possible to define your own operators and executors to extend the library and customize it to a particular use case
3. Elegant - parameterizing pipelines is built into the core of Airflow, as it leverages the Jinja templating engine
4. Scalable - it uses a message queue to orchestrate an arbitrary number of workers, allowing it to scale to support high workloads

These properties allow Airflow to be chosen by both developers and non-developers, such as data analysts, to build pipelines for:
1. data warehousing
2. growth analytics
3. search
4. data maintenance

Apart from supporting popular data systems like Hive, MySQL, S3 and Presto, it provides extensible base modules that can be built on. It has already been adopted by companies such as Pandora and Robinhood.

This talk will give a general overview of the features offered, with simple POCs to demonstrate the ways Expedia Group can leverage Airflow. This will be a beginner-level talk and will mostly interest teams that work with large data.
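The essence of the model above is a DAG of tasks executed in dependency order. As a toy stdlib sketch of that idea (deliberately not the Airflow API, which uses DAG and operator classes), Python's `graphlib` can topologically order a simple extract-transform-load pipeline:

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, deps):
    """Run callables in an order that respects DAG dependencies.

    tasks: {name: callable}; deps: {name: set of upstream task names}.
    """
    executed = []
    for name in TopologicalSorter(deps).static_order():
        tasks[name]()  # each task runs only after all its upstreams
        executed.append(name)
    return executed


log = []
tasks = {
    "extract":   lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load":      lambda: log.append("load"),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
order = run_pipeline(tasks, deps)
```

Airflow adds scheduling, retries, distributed workers and a UI on top of exactly this ordering idea.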

11:00-13:30 Session 10: Day 2: Gurgaon

Location: Gurgaon
11:00
New Decaf Domain Build Automation in AWS

ABSTRACT. Q: What is your presentation or workshop about? A: We have recently automated new Microsoft domain builds for decaf domains across any AZ in AWS. We would like to talk about it and will share a demo of how it works.

Q: What main points/topics will you cover? A: We will cover the following:
1. CloudFormation templates, and how to create a CFT to automate any new stack build on AWS
2. The various components of the new domain build and the required resources
3. Integration of CFT and AWS with the on-prem FLEXDC environment (StackStorm and Runway)
4. A live demo of the entire automation from start to finish

Q: What will the audience learn by attending your session? A: They will learn how to use CFT to create any new stack in AWS and how to integrate it with Kumo or FlexDC.

Q: What level of experience does the audience need (beginner, intermediate, advanced)? A: Intermediate to advanced knowledge of AWS and StackStorm is required.

Q: If it's a workshop, can people join remotely? What will attendees need to participate (laptop, software, etc.)? A: Anyone can join remotely. They need a laptop with BlueJeans running to view the demo.

Q: How long will your session last? A: The session, along with Q&A, will last 20 to 30 minutes.

11:30
Economics of Serverless

ABSTRACT. We all think that moving to a serverless architecture like AWS Lambda reduces cost. But your innocent-looking Lambda might end up burning a hole in your pocket. In Haystack, we encountered a few such cases where Lambda functions were hooked to a Kinesis stream and we had to move away from platform-as-a-service to save on cost: 3,000 USD/month on Lambda versus 120 USD/month once we moved the workload inside our k8s cluster. We will also talk about the intricacies of how AWS runs Lambda functions, how executions are charged, and in which scenarios vanilla services can drastically cut cost.
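The cost dynamics behind such a comparison are easy to sketch. Using AWS's long-published Lambda price points (roughly $0.20 per million requests plus $0.0000166667 per GB-second; illustrative only, ignoring the free tier and per-100 ms billing rounding, and not a statement of the Haystack team's actual bill):

```python
def lambda_monthly_cost(requests_per_month, avg_duration_s, memory_gb,
                        price_per_million_req=0.20,
                        price_per_gb_second=0.0000166667):
    """Back-of-the-envelope Lambda bill: request charge + GB-second charge."""
    request_cost = requests_per_month / 1_000_000 * price_per_million_req
    compute_cost = (requests_per_month * avg_duration_s * memory_gb
                    * price_per_gb_second)
    return request_cost + compute_cost


# e.g. a hypothetical Kinesis-hooked function: 100M invocations/month,
# averaging 1 s at 1 GB of memory
cost = lambda_monthly_cost(100_000_000, 1.0, 1.0)
```

At this sustained volume the estimate lands near $1,700/month, which is the regime where a fixed-size container in an existing cluster becomes dramatically cheaper, in line with the numbers quoted above.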

11:45
Tomcat Thread based Autoscaling for Apps deployed on Amazon ECS

ABSTRACT. Problem statement: current primer-based apps deployed in ECS containers don't support publishing Tomcat metrics to Amazon CloudWatch, and there is no way to autoscale based on custom metrics published to CloudWatch. In the past there have been quite a few outages in lodging shopping apps because autoscaling based on Tomcat threads was not supported.

This problem is now resolved by publishing Tomcat thread metrics to Amazon CloudWatch using the Netflix Servo library. Using Servo we can publish different JMX metrics and enable autoscaling based on them.

We will cover how to use the Netflix Servo library to publish metrics to Amazon CloudWatch, and the benefits the LShop team got by enabling thread-based autoscaling for our apps. Since our primer apps don't support this out of the box, other teams can use our learnings to enable Tomcat thread-based autoscaling in their own apps.

12:00
Intelligent Filters for Easy Shopping on Mobile App

ABSTRACT. The fastest and most convenient way to shortlist an item on an e-commerce portal is by applying filters. There is a need to personalise the shopping experience efficiently, so as to decrease the time users invest in finding their desired product. To help users make their booking decisions faster, we can surface the most relevant filters earlier in the funnel. On the basis of the behaviour of users searching for flights on a route, we will improve the discoverability of the filters on that route that the user might apply. This will help users discover their flight faster, based on how other similar customers have made their choices. The solution is currently focused on the mobile app domain; as 54% of people say their phones reduce stress and/or anxiety in their lives, it's important to consider how your brand creates mobile experiences that cater to people's desire for efficiency. The solution entails understanding the behaviour of users searching for flights with the help of previous filter-usage data that is ingested into a machine learning system. The output of the system is an incremental model that can be used for further learning as well as for predicting the filters that a particular user might apply to discover the most suitable flight for herself/himself.

12:30
Pay with Points 'SmartConnect Adaptors': A case study in building Integration platforms

ABSTRACT. The Stored Value Platform allows customers to pay for travel orders with loyalty points. Expedia Group integrates with multiple loyalty point programs to give customers on varied points of sale the option to pay with their preferred loyalty currency. Expedia systems then connect to tens of different internal and external APIs that allow us to debit these loyalty points.

This naturally leads to architectures where we need to integrate with multiple APIs. Who does this? If a single team is responsible and tasked with this integration, then the business cannot scale and will be bottlenecked. Hence, we need a platform on which any business (brands such as beX and HCom) can drive their own integration effort without being limited by any central team.

In the Stored Value Platform, we built the SmartConnect Open Adaptor framework to solve this problem. With this solution, any development team can build the integration with an internal or external API. The dev team does not need functional domain expertise or know-how about the complexities of the booking platform to facilitate this integration. They implement a standard interface, with the responsibility of transforming the models from this standard interface to the model and interface of the target API they are integrating with. They get an SDK, a testing suite and a host of documentation to build this integration on their own.

The talk would consist of two major parts. In the first part, I would very briefly introduce this framework, its components, and the architecture of the framework and platform. In the second part, I would cover our experience, design tenets, challenges and architecture considerations when building such an integration platform. Some key points from the second part:
1. The importance of knowing the goal of the platform, and how you formulate design tenets that help you achieve that goal
2. The considerations when you build a programming framework, and how you protect yourself against implementations of the framework that you wouldn't have fathomed when designing the SDK
3. The challenges and considerations when you build and provide an SDK
4. The considerations when you internally open-source a component in a very complex functional domain
... and more such topics.

The detailed documentation of the framework is available here: https://confluence.expedia.biz/display/STV/Smart+Connect+Open+Adaptor+-+1.+Introduction

Some pre-existing presentations and video demos are available here https://confluence.expedia.biz/display/STV/SVG+Open+Adaptor+-+Presentations

13:00
Egencia Auth Service - Migrate TIER1 service to Cloud with Multi Region Active-Active Architecture

ABSTRACT. Who are we?
1. Auth-service is a TIER1 service at Egencia, handling close to 50% of traffic.
2. We support standards-based authentication/authorisation (SAML single sign-on and OAuth 2) for internal microservices, customer login (web/mobile/virtual assistants), and external partners.

Problem statement:
1. Split the current NA-region customer accounts DB (PII) into NA- and EU-region clusters (to comply with EU laws). Support further PII splits to more global regions as more legal constraints emerge.
2. Move auth applications from Chandler to the cloud with multi-region presence (currently AWS Oregon for NA and AWS Dublin for EU). Support expansion to more cloud regions.

Earlier, the Egencia auth service, including its single database, was hosted in the Chandler datacenter in the NA region. Under EU data-protection laws we were obligated to start hosting EU customer data within the EU region. Other Egencia applications were also gradually moving to the EU region and wanted the auth service locally available, to avoid cross-region calls to auth in NA, to control latency, and to be resilient in line with the Vegas Rule.

Therefore auth data needed to be split between EU and NA, and the auth service also had to be hosted in the EU region. Along with these changes, we had to migrate from Chandler to the AWS Oregon/Dublin regions, and find a way to route traffic (both application and multi-database call traffic) in the most optimal way during each of the major transition steps we broke the migration into.

The challenge was to ensure cross-region token replication with very low latency, and to stay available during most possible failure scenarios, including regional network partitions. We had to optimise initial token creation and subsequent validation to very low latency so that all Egencia traffic stays unaffected.

We broke the problem into multiple detailed phases spanning 6 months, targeting both the data split and the cloud migration in stages. We designed a persistence framework to work below the DAO layer, supporting the data split (dimensions NA-EU and PII-NonPII) and smart lookup while minimising application code changes and controlling regression impact. The framework was designed to be highly configurable, with various levers to control read/write performance and data replication across clusters, with fallback options. Some examples of what the configuration supported: database cluster selection, database node selection (primary/secondary/nearest), and routing queries on the basis of entity, operation and shard keys.

Towards various NFR goals (resiliency, performance, extensibility, maintainability, availability, fault tolerance), we implemented optimisations for data refactoring, performance, resiliency, and handling phase transitions. Some examples: PII cleanup, nearest-node data lookup, async writes, secondary reads, shard resolution, degraded login, bulkheading, de-duplication, fallback to the primary node, enhanced access tokens, fallback to caches, and circuit breakers.

In the process, the auth service moved from a single region and a single database to multi-region with 3 databases, one of the database clusters being distributed across geographies.

Learnings

1. Break a large problem into smaller problems. For complex changes like this one, a detailed multi-step plan with smaller changes at each step should be designed to manage risk.
2. Detailed rollout/rollback plans. Meticulous rollout/rollback plans should be created for major changes, outlining steps in extreme detail to avoid surprises. For big changes, the rollback plan should also be rehearsed beforehand in a lab environment.
3. Configurable framework. When you design a framework, make it as configurable and extensible as possible, expecting future requirements.
4. Plan for the worst. Availability and resiliency matter most, especially for a TIER 1 service. When rolling out features, expect all possible failures and build levers to be able to control outcomes. As the number of moving pieces increases, expect failures of each piece and plan to stay available.

11:30-13:00 Session 11: Day 2: Paris

Location: Paris
11:30
Helping travel agents with Data Science

ABSTRACT. This presentation will go through the Big Data techniques and software that we have developed to predict drops in flight ticket prices using Egencia data.

The following topics will be covered:

- Use of agile techniques to create a webapp that improved the productivity of travel agents
- Use of Spark and Amazon EMR to create a data science model
- Use of MLeap to execute the model in real time in production
- Use of Kinesis, Lambda and S3 to collect feedback from agents and continuously improve the model

The audience will learn from our first experience in creating a Data Driven product, including some basics on the software and the data science techniques used.

This presentation is oriented mostly to developers, data scientists and data engineers.

12:00
Domain Specific Language expression using Kotlin

ABSTRACT. Have you heard of infix functions? Read-only properties? Does erasing syntax appeal to you? Have you thought about writing code that you can present to your product team?

I have quite an opinionated point of view: specs should be code, and code should be specs.

It is quite comfortable to write tests as acceptance tests, but it is even more transparent to make the code look like the living manifest of your application.

In this talk we will cover all the facilities that we can leverage to get rid of the JVM syntax that we used to accept.

You are either going to use it or refuse it, but you will certainly keep that in mind.

12:30
Measuring availability in Gaia

ABSTRACT. This talk will describe how we measure availability for Gaia. Gaia is a tier 1 application in EG used by most brands. The application is deployed in both Chandler and AWS (us-west-2 and us-east-1). Globally, Gaia serves more than 10k queries per second. Our SLAs are really aggressive, from a tp99 of 10ms for simple services to 500ms for complex APIs.

Currently our objective is to make Gaia-RT 99.9% available (43.8 minutes of downtime per month) with a success rate of 99.9%.

But how can we verify that we meet this objective?

This is the goal of this presentation:
* I will first present the formula behind availability: the SLA, what is considered downtime, and how we can track it using Splunk.
* Second, I will present our availability reporting in AWS based on serverless technologies: AWS Lambda, AWS Step Functions, AWS S3 and Splunk servers.
* Finally, I will present the visualization tool geo-watch-tower, a simple E3 prime full-JS webapp deployed on ECS, used to check this result on a regular basis.

With this presentation the audience will learn a way to measure availability for an application. The presentation of serverless technologies like AWS Step Functions could give ideas to some developers, as they are not widely used or known in EG for the moment.

The audience targeted is developers, but I think the first part, on how and why we compute availability this way, is also interesting for TPMs or product owners.
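The downtime budget quoted above follows directly from the availability target. A quick sketch of the arithmetic (assuming an average month of 365.25/12 days; not the talk's actual Splunk formula):

```python
def downtime_budget_minutes(availability, month_days=365.25 / 12):
    """Allowed downtime per month for a given availability target.

    availability: fraction in [0, 1], e.g. 0.999 for 99.9%.
    """
    month_minutes = month_days * 24 * 60
    return (1 - availability) * month_minutes


budget = downtime_budget_minutes(0.999)  # the 99.9% target for Gaia-RT
```

With a 99.9% target this comes out to about 43.8 minutes per month, matching the figure in the abstract.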

12:00-15:30 Session 12A: Day 2: Brisbane

Location: Brisbane
12:00
From the Depths of the Mine

ABSTRACT. Though it took multiple tries spanning more than half a decade, EPS has moved its huge API implementation (well, most of it) out of the datacenter. At first, it looked like the application would stay in the datacenter forever and be functionally replaced over time with green-field applications. Once the scope of the work required for feature parity was discovered and the complete bootstrapping of infrastructure was understood, the new projects were abandoned or retooled for the datacenter. With an eye toward the cloud and with the learnings of the first attempt well in mind, the reasons for abandoning the legacy app were slowly eroded while the infrastructure in the cloud was built up into a suitable ecosystem. When the time was right a new attempt was made, this time succeeding in moving the high-volume traffic from the datacenter to the cloud, though not without difficulty. Through this adversity, new methodologies were introduced that have directly led to greater velocity, reliability and deployability, all of which have in turn had positive impacts on the organization's ability to maintain and run the application into the future.

12:30
Cost savings: ElasticSearch hot/warm architecture with EBS and chef

ABSTRACT. What your presentation is about: How our team (Lodging Partner Integration and Experience) uses a hot/warm architecture for our ElasticSearch cluster to trade cost for performance while providing longer data retention periods.

What main points/topics you will cover:
- ElasticSearch hot/warm architecture
- How we implement this with EBS volume types and Chef
- Alternatives

What level of experience the audience needs (beginner, intermediate, advanced): - Intermediate

13:00
Principles of OOP and OO design (Code Smells, Design Patterns and Refactoring)

ABSTRACT. Motivation: It took almost 3 years for me to finally be introduced to the concept of Design Patterns. It felt like this new knowledge opened my mind and changed the way I view good code. I was surprised how long it took me, and I kind of regret all the code I had written until then.

Concepts:
- OO principles
- OOP design principles
- OO design patterns
- OO code smells
- code refactoring
- TDD (superficially)
- continuous refactoring

Description: This talk will introduce, and refresh, the basic concepts of OOP and the principles of good design in a real code scenario, using examples from the systems that I maintain on a day-to-day basis. It will also show real examples of refactoring using Design Patterns and following TDD to solve some of the most common Code Smells.

Usually, concepts like design patterns and the basic principles of a paradigm are introduced using examples that do not necessarily translate to the real world. Exercises like fizz buzz or the game of life, although great didactic tools, don't showcase all the nuances that can be encountered when developing a real-life system. As part of this talk, we'll emphasize that these concepts are tools that need to be used with care and thoughtful consideration. Nonetheless, it can be shown how methods of continuous refactoring can help curb tech debt.

Although this talk is Java-centric, these concepts are orthogonal to all OO programming languages; while the examples will be shown with Java in mind, the concepts themselves are transferable to any OO language with slight modifications in their implementation.

What will you learn: During this talk, you will learn, or refresh, the basic concepts of OOP and OOP design principles. These concepts help produce better and more maintainable code.

It will also introduce some of the most common Code Smells and strategies to solve them using Design Patterns and TDD.

Audience: This talk is specifically geared towards beginner developers who have a minimum grasp of OOP, and it can serve as a refresher for more experienced devs.

14:00
Contraction hierarchies based journey planning algorithms

ABSTRACT. The state of the art in shortest path algorithms has advanced a great deal since Dijkstra. This talk presents one modern algorithm, PHAST. It combines several useful ideas: contraction hierarchies, exploiting locality, enabling parallelism. The resulting algorithm is extremely fast, quite elegant and relatively simple.

This talk is based on work of Daniel Delling and others, not any original research by us. Although this exact algorithm is not used by SilverRail, the ideas mentioned above have proven useful.

The purpose of the talk is to present interesting ideas to an audience that knows basic graph and algorithm theory.
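For orientation, the classical Dijkstra baseline that contraction hierarchies and PHAST improve on can be written in a few lines. This is a textbook sketch (not SilverRail's code, and not PHAST itself):

```python
import heapq


def dijkstra(graph, source):
    """Classical single-source shortest paths.

    graph: {u: [(v, edge_weight), ...]} with non-negative weights.
    Returns {node: shortest distance from source}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
dist = dijkstra(graph, "a")
```

Contraction hierarchies preprocess the graph by adding shortcut edges so queries can skip unimportant nodes; PHAST then exploits that hierarchy, memory locality and parallelism to answer one-to-all queries far faster than this baseline.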

14:30
From the past to the future of SilverRail journey planning algorithms

ABSTRACT. We'll talk through some of the different algorithms explored by SilverRail in the past to solve journey planning problems, and will also provide a brief overview of some of the future directions in journey planning, using crowd sourced data, and extending the definition of "best journeys" to cover more than fast journeys.

15:00
Multi modal tourist site-seeing itinerary planning

ABSTRACT. The Tourist Trip Design Problem (TTDP) is the problem of optimally planning a day of landmark sightseeing for tourists using the public transport network. We did a proof of concept for this problem using the SlackRoutes algorithm (Gavalas et al.) and implemented it for London. Landmarks are prioritised by user interest (e.g. castles) and by social media popularity. I will discuss the approach we used for this project and will also demo the algorithm using a web app.

15:15
Scenic Journey Planning for Rail: A case study in the UK

ABSTRACT. Scenic Journey Planning offers passengers information about how picturesque or beautiful the route they intend to travel on is. It also offers them more scenic alternatives. Earlier this year we conducted a proof of concept project to demonstrate scenic routing on the UK railway network. I will present the approach we took and a demo of the finished product.

12:00-15:00 Session 12B: Day 2: London

Location: London
12:00
Mariano Albera Talk

ABSTRACT. TBC

12:30
Next Generation CD Pipelines

ABSTRACT. Learn how EPS are using a layered continuous delivery pipeline abstraction to manage the complexity of deploying many components into multiple AWS regions. We are using AWS serverless technologies to implement low cost fault tolerant pipelines. We will share what we’ve learned while implementing these pipelines using AWS Step Function State Machines. Finally we will look at the software delivery UX and what we can do to maximise performance.

The presentation content will be suitable for all levels.

https://www.linkedin.com/in/jakedanielcollins/

https://mosaic.exp-tools.net/#/profile/jakcollins

13:00
Continuous Deployment pipeline with AWS Step Functions and an in-house CLI tool

ABSTRACT.

Continuous deployment aims to reduce the time elapsed between committing new code and that code being executed in production. What distinguishes Continuous Deployment from Continuous Delivery is that in the former no manual steps are required (zero-touch).

EPS (formerly the EAN side), with the goal of providing a zero-touch pipeline for development teams, has built a Cloud Deployment Platform which leverages AWS Step Functions state machines and an internal CLI tool.

Further details: With AWS Step Functions state machines it is easy to check each stage involved in the deployment process, and at the same time to build something tailor-made to your organization’s needs.

The CLI tool allows CI agents to interact with the Deployment Platform: in particular, to “prepare” a deployment, generating all the metadata needed to orchestrate the operations, and to upload everything that is necessary, from deployment/project metadata files to application artifacts.

The CLI tool that the Cloud Platform team at EPS has developed is written in Ruby and provides utilities for deployment in the AWS Cloud. The main features of this tool are:
• Multi-branch builds and deployments of the same project to the test environment at the same time;
• Automated Cloud resource tagging;
• Dynamic CloudFormation parameter resolution and environment overrides;
• Zero-touch deployments.

The main components that interact with each other to complete a deployment are:
• Platform Orchestration Service: orchestrates the multi-region artifact synchronization and deployments, and logs the results in a release management tool;
• Deployment Service: used by the Platform Orchestration Service to perform the deployment in each region and environment;
• Continuous Deployment Bot (part of the release management tool): approves deployments automatically and tracks the process.

How do all the components work together? A merge to the master branch triggers a build in the CI server, where the artifacts for deployment are created. Once the build succeeds, the artifact is pushed to S3 or, if it is a Docker image, to an ECR repository. At this stage, before starting the deployment process, the platform also generates a file which tells the Deployment Service where to find the artifacts for the deployment.

After that, the deployment process is triggered via a CLI command which starts the Platform Orchestration Service. This first checks whether a deployment to the target environment(s) requires approval from the release management tool. Once approval has been granted, or if it is not necessary, the deployment is performed by the Deployment Service, called by the Platform Orchestration Service. The Platform Orchestration Service checks the state of the Deployment Service periodically; once the deployment has executed, it notifies the release management tool if necessary, and the deployment is complete.

Under the hood of the Orchestration Service and Deployment Service are Step Functions state machines. State machines allow deployment states to be tracked easily, not only from the AWS API but also from a visual workflow available in the AWS web console. You can see a graph which lets you understand the state of the execution and, if there has been a failure, at what stage it occurred, making troubleshooting much easier.

In case of failure, our implementation catches it and notifies the release management tool of which environment failed, and the state machine then terminates its execution in a failed state.

Step Functions state machines also make it possible to implement retries with exponential backoff when a component is temporarily unavailable, or in a polling scenario while waiting for the completion of an operation, such as an AWS CloudFormation stack update.
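As a sketch of what such a retry declaration looks like, here is an Amazon States Language task state with exponential backoff, written as a Python dict ready to be serialised to JSON; the state, resource, and transition names are hypothetical, not the actual EPS pipeline definition:

```python
# Hypothetical ASL task state: retry a flaky deployment Lambda with
# exponential backoff, and route any terminal failure to a notifier.
deploy_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:eu-west-1:123456789012:function:deploy",
    "Retry": [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": 5,    # first retry after 5 seconds
        "MaxAttempts": 3,
        "BackoffRate": 2.0       # waits of 5s, 10s, 20s
    }],
    "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "NotifyReleaseTool"  # report failure; machine ends failed
    }],
    "Next": "CheckDeploymentStatus"
}
```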

This implementation allows us to deploy very easily to any desired environment and region, and to track all the steps involved in the process, without any overhead for development teams.

13:30
A cloud-native egress solution for Hotels.com

ABSTRACT. This presentation shows the journey Hotels.com is on to build a secure egress solution for containers running in Kubernetes on AWS. The solution explores the use of a service mesh (Istio) to protect our workloads from exposing data to malicious external services. It also summarizes the different maturity stages that we have gone through. It is an advanced topic, not only for the technology and risks involved but also as a success story of collaboration between an Expedia brand and Expedia central services (ERS).

The expected audience is mainly technical, interested in cloud and containers, with intermediate/advanced knowledge of AWS, Kubernetes, and Istio.

The session is planned to last 45-50 minutes, with 10 minutes for questions. It will be held in London and people can join remotely.

14:00
Tools and approaches for migrating big datasets to the cloud

ABSTRACT. This presentation describes the journey taken by the Hotels.com big data platform team when tasked with migrating big data sets and pipelines from our shared Expedia Chandler Hadoop cluster to AWS. We present open source tools that we built to overcome the unexpected challenges we faced around sharing data between various EG units storing their data in different AWS accounts.

The first of these is Circus Train — a dataset replication tool that copies Hive tables between clusters and clouds. We will also discuss various other options for dataset replication and the unique features Circus Train provides. The second tool is Waggle Dance — a federated Hive query service that enables querying of data stored across multiple Hive metastores. We will demonstrate the differences between Waggle Dance and existing federated SQL query engine tools and what use cases it enables. Giving real world examples, we will describe how we've used these tools to successfully build a petabyte-scale platform that is now also being used by other brands within the Expedia organisation. We focus on actual problems and solutions that have arisen in a huge, organically grown corporation, rather than idealised architectures.

This talk is aimed at a technical audience but should be understandable by anyone with a basic understanding of Hadoop, Hive and AWS.

14:30
Apiary Combining data lakes to enable data sharing

ABSTRACT. Brands within the Expedia Group have been migrating their data platforms to AWS with the goals of increased agility, resource elasticity, and the potential to reduce costs. However, while we were able to realize many benefits from this, the shift from an organization-wide monolithic cluster to multiple brand-owned platforms presented some interesting technical challenges. Our cloud-based data lakes became data silos that impeded our ability to explore, discover, and share data across our organization.

Apiary is an open source project created to tackle these issues by providing a standard pattern for data lakes and enabling dataset sharing. In addition to enabling cross-brand data collaboration in the cloud, Apiary is also a good example of a cross-brand software development project.

In this talk, we'll describe the problems that Apiary solves and its architecture. We'll also describe the processes that we've used to enable developers from HCom, DSP, and BEX, working in different locations, to deliver a critical component of EG data infrastructure. Finally, we'll describe how you can get involved with the project.

13:00-16:00 Session 13: Day 2: Austin

Day 2: Austin

Location: Austin
13:00
Building a Serverless Architecture in AWS

ABSTRACT. EPS’s Content teams have spent the last year creating a Serverless architecture in AWS to intake Expedia’s lodging content and expose that data to our partners. In this presentation we’ll talk about why we chose Serverless and how we have used it to efficiently and quickly build cloud native applications. We’ll go over AWS technologies such as Lambda, SNS, and SQS and how these pieces fit together to build services that are fast, scalable, and resilient. We will also briefly discuss some challenges we faced when transitioning to Serverless and how we handled them.
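As a minimal sketch of how those pieces fit together in the SNS-to-SQS-to-Lambda pattern (the queue wiring, payload fields, and the double JSON envelope below are assumptions for illustration, not the team's actual schema):

```python
import json

def handler(event, context):
    """Process lodging-content messages delivered to a Lambda from SQS.

    SNS fans each content event out to per-consumer SQS queues; the
    Lambda then receives a batch of records from its queue.
    """
    processed = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        # A message that transited SNS wraps the payload in a "Message" field.
        payload = json.loads(body["Message"]) if "Message" in body else body
        processed.append(payload["propertyId"])
    return {"processed": processed}

# A synthetic SQS batch event, shaped like what Lambda would receive:
event = {"Records": [
    {"body": json.dumps({"Message": json.dumps({"propertyId": "h123"})})}
]}
print(handler(event, None))  # -> {'processed': ['h123']}
```

Because each stage only reads from a queue and emits a result, every component can fail, retry, and scale independently, which is where the speed and resilience claims come from.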

13:30
JK Talk

ABSTRACT. TBC

14:30
Expedia Group Inner Source Panel

ABSTRACT. In the past couple of years, we've had some highly visible and successful inner source and open source projects: Cloud Gate Proxy / Styx, Abacus, Haystack, and more.

This panel focuses on bringing together core contributors and maintainers of these projects to learn about their inner source practice and culture, what their success stories are, and lessons learned.

The panel Q&A will also be open to the audience where EG Inner Source TSC chair(s) Trevor Livingston and Ronny Katzenberger can help answer questions about what the TSC is doing to make Inner Source and Open Source more successful.

Panel participants would include (proposed): Magesh Chandramouli, Seth Hodgson, and others (TBD based on projects the EG Inner Source TSC picks).

15:00
Event Sourcing To The Cloud

ABSTRACT. Intermediate

More progressive data architecture patterns, like event sourcing, help businesses address new challenges that occur when migrating to the cloud - challenges like eventual consistency, legacy service dependencies, elastic scale, and unreliable networking, to name a few. Event Sourcing is a data architecture pattern that changes the paradigm of data storage by storing data as a sequence of domain state changes. In essence, data is stored like a log that services can consume.
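The "data stored like a log that services can consume" idea can be shown in a few lines; this is a toy in-memory log with invented event names, not HomeAway's implementation:

```python
# Event sourcing in miniature: state is never stored directly, it is
# rebuilt by replaying an append-only log of domain state changes.

events = []  # the event store: an ordered, append-only log

def append(event_type, **data):
    events.append({"type": event_type, **data})

def current_balance(account):
    """Derive current state by folding over the whole event log."""
    balance = 0
    for e in events:
        if e.get("account") != account:
            continue
        if e["type"] == "Deposited":
            balance += e["amount"]
        elif e["type"] == "Withdrawn":
            balance -= e["amount"]
    return balance

append("Deposited", account="a1", amount=100)
append("Withdrawn", account="a1", amount=30)
print(current_balance("a1"))  # -> 70
```

Any number of services can consume the same log independently, which is what enables the near real-time data loops and reactive streams mentioned below.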

At HomeAway, we have been able to quickly migrate to the cloud services that had been blocked for months by dependent services. These experiences can now take full advantage of data locality and elastic scale. Additionally, the data has been democratized in the cloud, so new experiences can leverage that data for near real-time data loops and reactive streams.

This presentation will cover how leveraging the event sourcing pattern has accelerated HomeAway initiatives, our learnings, and recommended practices.

- Break free of legacy dependencies.
- Democratize data.
- Use event sourcing to feed new cloud experiences.
- Position your company for strangling legacy services.

Adam Haines is a Data Architect for the Data Engineering Services team at HomeAway. Adam provides leadership, mentoring, and technical direction for delivering highly elastic, secure, and scalable big data platform solutions, both in AWS and in the data center. Adam’s primary responsibilities encompass evaluating and rolling out scalable data platforms, evaluating new technology, mentoring and training, establishing data architecture patterns that help accelerate HomeAway’s initiatives, and architecting innovative tools that help accelerate cloud adoption.

15:30
Fully-Managed E-Invoicing B2B with AWS Lambda

ABSTRACT. Main points: AWS resiliency, secure data sharing, pay-as-you-go costs, support challenges
Audience learning: How to develop and productionalize a system built exclusively on AWS managed services
Audience experience: Moderate to advanced AWS developers and novice B2B specialists

HomeAway and Expedia leadership made it a priority to migrate HomeAway's product platform out of an Austin-based data center and into Amazon Web Services (AWS). More recently, Expedia has been recommending we use managed services–cloud functionality whose uptime and scaling are managed by Amazon–to the extent possible, to lower our operational costs and risks. Examples of managed services provided by AWS include Lambda, DynamoDB, Simple Queue Service (SQS), and Simple Storage Service (S3).

Our distributed ecommerce team implemented a new e-invoicing solution that uses all of those managed technologies. This new system is the first step in a broader strategy to migrate the entire invoicing platform from an aging, internally-hosted legacy system to a fully-managed AWS ecosystem. It was particularly challenging because it must protect personally-identifiable information (PII) while sharing it in a controlled fashion with an external vendor via an S3-based B2B integration.

The team delivered the e-invoicing product with a minimum of development effort, and it runs well in production. The development team was new to all of the technologies involved, however, and can share valuable lessons learned about how to design, implement, and manage such a system.

Our presentation will discuss:

* The existing invoice business process and the structure of the legacy system we aim to replace
* How e-invoicing works and why we were required to implement a new solution for it
* The tremendous advantages we realized through the use of AWS for B2B, including:
  - Out-of-the-box cross-account data access and control using S3
  - Our vendor's use of AWS Storage Gateway to cloud-enable their internal solution
* The design of our solution, focusing on:
  - Event-driven architecture
  - "Vegas rules": What happens in one data center stays there; communication across data centers is held to a bare minimum
  - Parallel development through modular design
  - Its role in our broader cloud migration strategy
* Overview of the AWS technologies we used and some lessons learned for each:
  - Terraform for infrastructure-as-code and CI/CD pipeline -- This toolchain is immature for continuous delivery and we paid some pioneer penalties
  - Lambdas written in NodeJS and Java to implement triggers and business logic -- Triggers have some surprising semantics and design constraints
  - S3 for data storage and integration both with the legacy system and our B2G vendor -- This is very powerful and far advanced over legacy B2B, but configuration is difficult to get correct
  - DynamoDB for tracking invoices through their lifecycle -- DynamoDB change streams have in-order semantics that can stop the event flow cold
  - SQS for resilient task queueing
  - CloudWatch, integrated with our Splunk indexers, for operational monitoring
* General lessons learned from this solution and the experience of developing it:
  - Teamwork across continents can be more productive when the technologies used are widely documented and easily accessed globally; we also found these particular technologies fun to work with, helping boost morale
  - Experts in AWS technologies were readily available internally and from Amazon and were invaluable in delivering a quality system
  - Not all use cases are suited well to the architecture that emerged for e-invoicing; every design must factor in its particular load characteristics and cost tolerances

At least the Austin-based members of the team will be present to provide color and answer questions about the technologies and their experiences using them.

13:30-14:30 Session 14: Day 2: Bangalore

Day 2: Bangalore

Location: Bangalore
13:30
Best Practices to migrate a service to cross brand AWS infrastructure

ABSTRACT. What is your presentation or workshop about? >Best practices to migrate a service to BEX AWS infrastructure.

What main points/topics will you cover?
>Common hurdles for any service migration
>Know about BEX AWS infrastructure & components
>Leverage the components of the BEX infrastructure
>IP whitelisting on the consuming stack & external connectivity
>Stress testing on AWS infra
>Shadow traffic / traffic migration

What will the audience learn by attending your session? >The hurdles and key requirements for AWS migration.

What level of experience does the audience need (beginner, intermediate, advanced)? >Intermediate to advanced

If it’s a workshop, can people join remotely? What will attendees need to participate (laptop, software, etc.)? >Not a workshop. Can join remotely.

How long will your session last? >30 mins

14:00
AWS Resource estimation for Spark Jobs

ABSTRACT. *Problem* In the world of batch analytics running on cloud infrastructure like AWS, a user has the following concerns:
◦ What’s the cheapest EMR cluster that runs my job in under an hour?
◦ Given a budget, what’s the minimum runtime I can achieve for my job?
◦ What kind of EC2 instance types should I use in the EMR cluster?
◦ How do I choose the right cluster size on a given day?

*Idea* Cloud infrastructure providers like AWS and Google Cloud offer elastic and scalable compute resources, which long-running analytics jobs leverage to execute within a defined SLA. At Hotels.com we aim to predict the AWS EMR cluster size and the compute resources that need to be allocated in order to execute a Spark job successfully within its SLA, thereby optimizing AWS cost. We capture the resource usage of the different Spark executors, and the time taken by different stages during shuffling, serialization, and disk or memory spill. We use these metrics for feature extraction when building the model. We run the job with different datasets and different EC2 instance types in the EMR cluster and, based on the stats captured, tune a linear regression model.
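A toy version of the idea, using a single invented feature (the reciprocal of cluster size) and made-up historical runtimes, might look like the sketch below; the production model uses many more features extracted from real EMR and Spark metrics:

```python
# Toy sketch: fit a linear model mapping cluster size to job runtime
# from past runs, then pick the smallest cluster predicted to meet the
# SLA. All numbers are invented for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Past runs: (number of core nodes, observed runtime in minutes)
nodes = [4, 8, 16, 32]
minutes = [120, 64, 34, 20]

# Runtime scales roughly with 1/nodes, so regress on that feature.
a, b = fit_line([1 / n for n in nodes], minutes)

def predicted_minutes(n):
    return a * (1 / n) + b

# Smallest candidate cluster predicted to finish within a 45-minute SLA:
size = next(n for n in [4, 8, 16, 32, 64] if predicted_minutes(n) <= 45)
print(size)  # -> 16
```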

*Key Objectives*
◦ Find the optimal number of machines in the EMR cluster
◦ Improve prediction accuracy
◦ Reduce the model training overhead
◦ Choose the right set of EC2 instances

*Learnings & Outcome*

The above approach to optimal AWS resource usage could be leveraged by other teams in Expedia to reduce AWS costs significantly. It would be a great example of the data platform teams of different Expedia brands using ML to provide a generic framework for spawning an EMR cluster of the right size. Different teams could come together to improve the model, increasing the accuracy of the prediction.

14:00-18:00 Session 15: Day 2: Bellevue

Day 2: Bellevue

Location: Bellevue
14:00
MICHAEL NIXON CTO TALK

ABSTRACT. TBC

14:30
Realtime monitoring of Cloud Security Controls

ABSTRACT. Canopy is Expedia's cloud governance tool, used to audit our cloud environments for compliance with our cloud security controls. Currently these controls are monitored on a scheduled basis. Realtime monitoring will ensure that all automated cloud security controls are monitored, remediated, or flagged in near real time for any possible violations.

15:00
Tokenization of Sensitive Information

ABSTRACT. This presentation covers the approach of tokenizing sensitive data to limit its exposure. We will talk about what it means to tokenize information, how tokenization impacts compliance with PCI and GDPR, and techniques for integrating tokenization into a system that was originally built to pass around the full information. The audience will learn about upcoming approaches for minimizing security scopes and how they can benefit their teams. This session is designed for beginners.
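As a beginner-level sketch of the core idea (a toy in-memory vault; a real deployment uses an external, access-controlled token vault service, not a dict):

```python
# Vault-based tokenization in miniature: the sensitive value lives only
# inside the vault; every other system passes around a random token
# that reveals nothing about the original data.
import secrets

_vault = {}  # token -> sensitive value (stand-in for the secure store)

def tokenize(value):
    token = "tok_" + secrets.token_hex(8)
    _vault[token] = value
    return token

def detokenize(token):
    """Only components inside the reduced compliance scope may call this."""
    return _vault[token]

card = "4111111111111111"
token = tokenize(card)
print(token != card, detokenize(token) == card)  # -> True True
```

Because only the vault (and the few services allowed to detokenize) ever see the real value, the PCI/GDPR scope shrinks to that boundary.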

15:15
xTech Rising Program

ABSTRACT. I'd like to present the xTech Rising Program, including the who/what/why. I'd like to present what's happened so far in the program, what's going on now and what is on our future roadmap. I'll be soliciting for feedback and poll/focus group participation to shape the program to be what engineers, TPMs, Product, UX and technologists want it to be.

I can give a Talk and have a Working Session to get a pulse as to what the attendees are looking for in terms of connecting with each other across EG Organizations.

In terms of presenters, I may have some xTech Champions (https://go/xtechchamps) join me, as we are all working in a Virtual Group (v-group).

(I'm not certain whether I need to submit here, but just want to be sure it's included in the schedule.)

15:30
Dr. Shadow

ABSTRACT. Using Dr. Shadow, you can take a slice of production traffic, test it against your new piece of software in a pre-production environment, and compare it with the performance of your live production code. This helps you test new code with live traffic even before it's used by users.

The Dr. Shadow library can simply be added as a dependency in your service code, which then allows you to send a percentage of production traffic to the pre-production/Gamma stage. Some key points about the Dr. Shadow library:
1. Real traffic: Live production traffic is shadow-tested against new code in a different environment, with no impact on production traffic or end users.
2. Easy to adopt: Dr. Shadow is a Maven library which we can include in our pom.xml.
3. Configurable: We can turn testing on or off per API and environment.
4. Compare error rates, latencies, analytics: Once the shadow traffic is turned on, the results of the computation from both Gamma and Prod are sent to DataDog and Kibana (or whatever monitoring applications your services report to), which then allows us to compare the various metrics.
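Dr. Shadow itself is a Java/Maven library, but the core mechanism can be sketched language-agnostically; the handler and reporter signatures below are invented for illustration:

```python
# Sketch of shadow testing: mirror a configurable percentage of live
# requests to a pre-production (Gamma) handler and report the pair of
# responses out of band, without ever affecting the live response.
import random

def handle(request, prod_handler, gamma_handler, report, shadow_percent=10):
    """Serve the request from prod; shadow a sample of traffic to Gamma."""
    prod_response = prod_handler(request)   # the user always gets prod
    if random.random() * 100 < shadow_percent:
        try:
            gamma_response = gamma_handler(request)
            report(request, prod_response, gamma=gamma_response)
        except Exception as exc:            # shadowing must never hurt prod
            report(request, prod_response, error=exc)
    return prod_response
```

In the real library the reporter would emit metrics to DataDog/Kibana rather than collect them in-process, and the percentage would come from the service's configuration system.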

How do we use Dr. Shadow in Egencia?
1. Regression testing: The shadow testing capability, coupled with other automated testing techniques like unit, component, and integration testing, greatly powers the CI/CD process, as you can test the new code against real production traffic in terms of both scale and variety of use cases, some of which would be almost impossible to write regression tests for. We have designed Dr. Shadow to be a plug-and-play library: service owners just declare it as a dependency and then set up configuration (in whatever configuration management system your service uses) to define what percentage of production traffic is sent to which test endpoint. This setup allows you to do much more than regular functional testing.
2. Cloud migration and scalability testing: Other Tier-1 services in Egencia are using Dr. Shadow as well, and it has greatly simplified the rollout of big projects like the AWS migration.
3. Performance/load testing: We can configure the shadow traffic settings to introduce more load than a single instance receives in Prod.
4. Chaos testing: We use Dr. Shadow in the pre-prod environment along with another of our libraries, Dr. Squid, to mock errors from downstream services, so service resiliency is tested.

We would love other teams within Expedia Group to adopt and benefit from this. There are other enhancements we can make to this library as well, so we would welcome feedback as well as contributions.

16:00
Getting started with bex-api

ABSTRACT. For almost two decades, REST has tried to describe how to work with resources given a set of constraints, each resource being described with its own model and loosely linked to others by convention.

As REST doesn't enforce any particular standard, each web service usually ends up reinventing part of the wheel:
- which URL convention to adopt
- how to pass certain information (query param, path param, payload, header)
- the structure of the targeted resources
- how to link resources together

By nature, the client-server relationship is loosely coupled, and over time developers have felt the need to reinstate a light contract between the two. Different forms of contract are possible, driven by different technologies such as OpenAPI, JSON Schema, or GraphQL, each with its own approach and opinions on how to do things.

In the Expedia world, a lot of teams are answering those questions, but each with a slightly different approach, resulting in a more complex development model for front-end API consumers like Mobile, PWA, or conversation platforms.

The bex-api project simplifies that model by creating a consistent set of easy-to-discover, data-oriented GraphQL APIs.

During this presentation you will learn about:
- GraphQL and its query language
- how bex-api fits into the Expedia ecosystem
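To give a taste of the query language in advance, here is a small example; the schema (propertySearch and its fields) is invented for illustration and is not the actual bex-api schema. The snippet only shows how a client packages a GraphQL query with its variables:

```python
import json

# Hypothetical query: field names are NOT the real bex-api schema.
query = """
query HotelsNearby($lat: Float!, $lon: Float!) {
  propertySearch(lat: $lat, lon: $lon, first: 2) {
    properties {
      name
      price { amount currency }
    }
  }
}
"""

# A GraphQL client POSTs the query and its variables as one JSON body;
# the server responds with exactly the fields requested, nothing more.
body = json.dumps({"query": query,
                   "variables": {"lat": 51.5, "lon": -0.12}})
print(json.loads(body)["variables"])  # -> {'lat': 51.5, 'lon': -0.12}
```

The "exactly the fields requested" property is what removes the URL-convention and payload-shape questions that REST leaves open.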

Audience: all levels (beginner-friendly). Locations: Bellevue, Chicago, San Francisco, and Brisbane

16:30
Apiary: Federating Data Lakes via Open Source Cross Brand Collaboration

ABSTRACT. Apiary: Combining data lakes to enable data sharing

Brands within Expedia Group have been migrating their data platforms to AWS with the goals of increased agility, resource elasticity, and the potential to reduce costs. However, while we were able to realize many benefits, the shift from an organisation-wide monolithic cluster to multiple brand-owned platforms presented some interesting technical challenges. Our cloud-based data lakes were also data silos that impeded our ability to explore, discover, and share data across our organization.

Apiary is an open source project created to tackle these issues by providing a standard pattern for data lakes and enabling dataset sharing. In addition to enabling cross-brand data collaboration in the cloud, Apiary is also a good example of a cross-brand software development project.

In this talk, we'll describe the problems that Apiary solves and its architecture. We'll also describe the processes that we've used to enable developers from HCom, DSP, and BEX, working in different locations, to deliver a critical component of EG data infrastructure. Finally, we'll describe how you can get involved with the project.

17:00
Data Modelstorming: Collaborative Requirements Gathering and Data Modeling

ABSTRACT. Introduction ------------ Gathering requirements is hard. Customers often don’t know what they want or provide requirements in terms of solutions. BEAM (Business Event Analysis & Modeling) helps close the gap between the business users’ data requirements and database design. Whether on premise or cloud-based, BEAM offers platform independent techniques that are business-friendly and free of technical jargon. It enables groups of business and IT professionals to modelstorm (model and brainstorm) data collaboratively to gather and prioritize data requirements, create a shared understanding of data analytics opportunities and design more flexible BI solutions.

Background ---------- Modelstorming techniques are the brain-child of consultant, author and educator Lawrence Corr. The topic is covered in his book "Agile Data Warehouse Design". A number of Expedians attended Lawrence’s three-day course in Seattle a couple of years ago. It was so well received that we brought Lawrence in-house to provide training at Bellevue HQ in 2017 and purchased the rights to use Lawrence’s material for internal training. Modelstorming techniques are applicable beyond data modeling. Modelstorming is useful any time there is a need to understand a business process and the data involved.

Proposed Presentation --------------------- This session introduces the innovative practice of Modelstorming. Topics covered include:

• Using narrative, visual thinking, 7Ws and lots of Post-it® notes to get everyone thinking more clearly about data requirements and creating better data models

• Modeling data requirements collaboratively using visual thinking and storytelling

• Helping business users develop powerful mental maps for exploring their data

• Rapidly translating data requirements into efficient, flexible dimensional database designs

• Interactive in-session example of modelstorming in action

Level of Experience: Any

https://confluence.expedia.biz/display/~mkebbe/Data+Modelstorming

17:30
BHALA DALVI CTO TALK

ABSTRACT. TBC