The 2018 F8 Developer Conference

Every year, Facebook hosts its F8 Developer Conference, where the company shows off its latest features, highlights development plans for the coming year, and connects with the thousands of businesses that interact with the platform every day. Koddi sent two representatives to San Jose, California last week in order to […]

Developing with an Immutable Infrastructure

What is Immutable Infrastructure? Simply put, your hardware stack is created and maintained using the programming concept of immutability: once something is instantiated, it does not change. If an update is needed (whether for a scheduled upgrade or a bug fix), a new instance is created to replace the […]

Building an Event-Driven, Fault-Tolerant Data Pipeline with AWS Lambda, Alluxio, and Spark

In our platform we often have to fetch data from various locations (e.g. S3, SFTP, API) and in various formats (CSV, TSV, JSON, XML), because we have an incredibly diverse client and publisher catalog and each one provides its data in its own unique way. As we have grown, we’ve amassed a large list of microservices, processes, and configurations that handle these different data sources and files. The biggest issue we’ve run into with these services is that the various portions of the data pipeline do not interact as well as we would like, so when an error occurs anywhere in the process, it can be difficult to track down where it happened. We have begun to feel some strain from this, so we’re abstracting and centralizing as much as we can.
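As a rough illustration of what centralizing format handling can look like, the sketch below funnels each of the formats mentioned above through a single entry point that always returns the same row shape. It is only a sketch under stated assumptions, not the pipeline described in the post: `parse_records`, the format labels, and the expected XML layout (a root element whose children are records) are all hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def parse_records(payload: str, fmt: str) -> List[Dict[str, Any]]:
    """Normalize a raw text payload into a list of dict rows.

    Hypothetical helper: the name and the format labels are illustrative.
    """
    fmt = fmt.lower()
    if fmt in ("csv", "tsv"):
        # CSV and TSV differ only in the delimiter.
        delimiter = "," if fmt == "csv" else "\t"
        reader = csv.DictReader(io.StringIO(payload), delimiter=delimiter)
        return [dict(row) for row in reader]
    if fmt == "json":
        data = json.loads(payload)
        # Accept either a list of objects or a single object.
        return data if isinstance(data, list) else [data]
    if fmt == "xml":
        # Assumes <root><record><field>value</field>...</record>...</root>.
        root = ET.fromstring(payload)
        return [{field.tag: field.text for field in record} for record in root]
    # Failing loudly in one place makes unsupported inputs easy to trace.
    raise ValueError(f"unsupported format: {fmt}")
```

Because every source passes through one function, an unsupported or malformed input fails in a single, predictable place rather than somewhere inside a chain of separate services.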

How To Add Basic Hotel Booking to Chat

In the past few years, we’ve seen an explosion of chat bots across multiple industries. We are often asked what a chat bot can do and how it would benefit our product. In our experience, chat bots need to be tailored specifically to what a client wants; otherwise, they feel very generic (much like calling into an automated call center). So how can we make a bot succeed in an area crowded with thousands of existing bots?

Running Alluxio with Docker and S3 on DCOS/Mesos/Marathon

At Koddi we’re always looking for ways to increase the speed and stability of our platform. One of our latest projects is speeding up our daily ingestion of data.

All of our data is initially stored in flat files on S3 before being loaded into our database. We’re currently in the process of integrating Apache Spark into our load process to drastically increase the speed of our loads. One problem we ran into is that S3 doesn’t behave like a normal file system in terms of read and write speeds. This is where Alluxio comes in. Alluxio is a “memory speed virtual distributed storage system” which lies between frameworks (such as Spark, MapReduce, Flink, etc.) and a storage system (Amazon S3, Google Cloud Storage, HDFS, Ceph, etc.). This allows for dramatically faster data access, with some users seeing a 30x increase in data throughput. For a more in-depth overview of Alluxio, see their documentation.

How Engineers Can Help Drive Innovation

Every engineer out there is looking to build something amazing, just like every visionary likes to see their ideas come to life. Unfortunately, innovation can get lost in the day-to-day work, technicalities, competing priorities, and requirements documents. All of these have created pitfalls for many promising projects, but that doesn’t have to be the case if you are aware of where those pitfalls may pop up and give your team a little autonomy to bridge those gaps.

Here are a few things that we do to keep our engineering team connected to and at the forefront of innovation.

Schema.org 3.1: Hotels and Hospitality Just Got A Lot More Structured

Hospitality brands gained some new ways to share information with search engines in this week’s Schema.org release, allowing hoteliers to specify everything from what kinds of rooms are on offer to whether they’re pet friendly. These additions to the markup standards, which Google uses to enrich a site’s search results, set the stage for travel shoppers to more fully assess what chains, single-location B&Bs, OTAs, and even peer-to-peer networks like Airbnb have to offer directly on search results pages, and maybe someday even book from there as well.

Reactive Applications with AWS Lambda

Sometimes you may find yourself requiring a CRON script to clean a file, or maybe you need to watch a directory of images to create preview thumbnails when they arrive on the server. Processes like these suffer from the same limitation: they require you to poll a script until you get a “successful” result.

This is problematic because it forces the developer to write redundancy checks in the code instead of just focusing on the core problem. Moreover, file-watching utilities generally notify you once the file is created, not when it has finished writing. All of these problems must be accounted for, and they result in more complexity, overhead, and development time.

This is where event-driven programming can greatly reduce your development overhead. Part of that is maintaining a centralized data lake for all of your raw files. Data lakes generally expose an event API for easy management of and access to the files within the lake. In our case, Amazon S3 is the data lake of choice, and thanks to AWS Lambda we can hook into the S3 event API with minimal effort for simple use cases like cleaning files.
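To make the S3-to-Lambda hookup described above more concrete, here is a minimal sketch of a Python Lambda handler that reads the bucket and key out of an S3 event notification. The function names `extract_s3_objects` and `handler` are illustrative choices, and the actual file cleaning is left as a placeholder, since the approach taken in the post is not shown in this excerpt.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def handler(event, context):
    """Lambda entry point, invoked by S3 once an object has been written."""
    objects = extract_s3_objects(event)
    for bucket, key in objects:
        # Placeholder: a real function would fetch and clean the file here.
        print(f"received s3://{bucket}/{key}")
    return {"processed": len(objects)}
```

Because S3 emits the event only after the object is fully written, the handler needs no polling loop or "is the file finished?" check.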

Optimizing your Docker workflow

We create a lot of single-responsibility services, including fetching mail, downloading groups of files, cleaning data, importing data, and many others. This requires us to create new servers that need to be monitored and maintained, so we use Docker containers to normalize our process and work efficiently. From testing to staging to production, Docker containers provide a simple way to create disposable server images.

The primary drawback is that most Docker images lack proper setup or are not designed for your network or architecture. Below is a list of recommendations that will make creating Docker containers a less time-consuming process.

Moving One Billion Rows in MySQL (Amazon RDS)

You may remember from our November 2014 article about our switch to Redshift that Koddi uses Amazon Web Services (AWS) to power our platform. While we have moved some of our data to Redshift, we still have quite a bit in MySQL (RDS), and at the beginning of this year we needed to move our main database from one AWS account to another. The normal process for copying a database in RDS is to take a snapshot and spin up a new database from it. However, Amazon doesn’t allow you to share snapshots between accounts. This posed the question: how do we efficiently migrate over a billion rows of data?
