Developing with an Immutable Infrastructure

What is Immutable Infrastructure? Simply put, your hardware stack is created and maintained using the programming concept of immutability: once something is instantiated, it does not change. If an update is needed (whether for a scheduled upgrade or a bug fix), a new instance is created to replace the […]
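
The programming concept this excerpt borrows can be sketched in a few lines of Python. This is only an illustration of immutability under our own assumptions: `ServerImage`, its fields, and `upgrade` are hypothetical names, not part of any tooling described in the post.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class ServerImage:
    """Hypothetical description of a deployed server (fields are illustrative)."""
    name: str
    os_version: str
    app_version: str


def upgrade(current: ServerImage, app_version: str) -> ServerImage:
    # Build a brand-new instance instead of patching the existing one.
    return replace(current, app_version=app_version)


v1 = ServerImage(name="web", os_version="22.04", app_version="1.0.0")
v2 = upgrade(v1, "1.1.0")

try:
    v1.app_version = "1.1.0"  # an in-place change is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Here `v1` is left untouched and `v2` takes its place, which mirrors the "replace, don't modify" approach the excerpt describes.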

Koddi Releases Scala Geocoding Library

Today Koddi is excited to announce the initial release of our own Scala Geocoding library. Here at Koddi, we value open source projects that allow small organizations to grow quickly and hope this project can return the favor for other developers out there!

Some may ask why we chose to write our own library for something as simple as geocoding. When we were researching libraries to use, we noticed a lack of high-quality Scala geocoding libraries. There are some available, but most of them never really caught our eye, so we set out to write a clean, lightweight library that any Scala programmer can use.

Before building the library, we set some clear-cut objectives: no third-party dependencies, a fully tested codebase, compliance with the Google Geocoding API including tertiary parameters, and an API easy enough to use that developers would want to adopt it. The Koddi Geocoder accomplishes all of these goals and includes some additional features we were able to roll in afterward. Let’s dive in and take a look at some usage examples.

Building an Event-Driven, Fault-Tolerant Data Pipeline with AWS Lambda, Alluxio, and Spark

In our platform we often have to fetch data from various locations (e.g. S3, SFTP, API) and in various formats (CSV, TSV, JSON, XML), because we have an incredibly diverse client and publisher catalog and each one provides its data in its own unique way. As we have grown, we’ve amassed a large list of microservices, processes, and configuration that handle these different data sources and files. The biggest issue we’ve run into with these services is that the various portions of the data pipeline do not interact as well as we would like, so when an error occurs anywhere in the process, it can be difficult to track down where it happened. We have begun to feel some strain from this, so we’re abstracting and centralizing as much as we can.

How To Add Basic Hotel Booking to Chat

In the past few years, we’ve seen an explosion of chat bots across multiple industries. We are often asked what a chat bot can do and how it would benefit our product. In our experience, chat bots need to be tailored specifically to what a client would want; otherwise, they feel very generic (much like calling into an automated call center). So how can we make a bot succeed in an area crowded with thousands of existing bots?

Running Alluxio with Docker and S3 on DCOS/Mesos/Marathon

At Koddi we’re always looking for ways to increase the speed and stability of our platform. One of our latest projects is speeding up our daily ingestion of data.

All of our data is initially stored in flat files on S3 before being loaded into our database. We’re currently in the process of integrating Apache Spark into our load process to drastically increase the speed of our loads. One problem we ran into is that S3 doesn’t behave like a normal file system in terms of read and write speeds. This is where Alluxio comes in. Alluxio is a “memory speed virtual distributed storage system” which lies between frameworks (such as Spark, MapReduce, Flink, etc.) and a storage system (Amazon S3, Google Cloud Storage, HDFS, Ceph, etc.). This allows for dramatically faster data access, with some users seeing a 30x increase in data throughput. For a more in-depth overview of Alluxio, see their documentation.

How Engineers Can Help Drive Innovation

Every engineer out there is looking to build something amazing, just like every visionary likes to see their ideas come to life. Unfortunately, innovation can get lost in the day-to-day: technicalities, other priorities, and requirements documents. All of these have been pitfalls for many promising projects, but that doesn’t have to be the case if you are aware of where those pitfalls may pop up and exercise a little autonomy in bridging the gaps.

Here are a few things that we do to keep our engineering team connected to and at the forefront of innovation.

3.1: Hotels and Hospitality Just Got A Lot More Structured

Hospitality brands gained some new ways to share information with search engines in this week’s release, allowing hoteliers to specify everything from what kinds of rooms are on offer to whether they’re pet friendly. These enhancements to the markup standards, which Google uses to enhance a site’s search results, set the stage for travel shoppers to more fully assess what chains, single-location B&Bs, OTAs, and even peer-to-peer networks like Airbnb have to offer directly on search results pages – and maybe even someday book from there as well.

Designing event based applications with incron

Recently we wrote an article on leveraging AWS Lambda to create event-based applications using S3. However, what happens when you don’t have access to S3? What if you are using FTP or shared drives? Luckily, there are still solutions! One way to accomplish this on Linux is to use incron. […]

Reactive Applications with AWS Lambda

Sometimes you may find yourself requiring a cron script to clean a file, or maybe you need to watch a directory of images to create preview thumbnails when they arrive on the server. Processes like these suffer from the same limitation: they require you to poll a script until you get a “successful” result.

This is problematic because it forces the developer to write redundancy checks in the code instead of focusing on the core problem. Moreover, file-watching utilities generally notify when a file is created, not when it has finished being written. All of these problems must be accounted for, and they result in more complexity, overhead, and development time.

This is where event-driven programming can greatly reduce your development overhead. Part of that is maintaining a centralized data lake for all of your raw files. Data lakes generally provide an event API for easy management of and access to files within the lake. In our case, Amazon S3 is the data lake of choice, and thanks to AWS Lambda we can hook into the S3 event API with minimal effort for simple use cases like cleaning files.
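
As a rough illustration of that hook, here is a minimal Python sketch of a Lambda handler reacting to an S3 event notification. It assumes the standard notification payload, in which each entry under `Records` carries the bucket name and a URL-encoded object key. The `clean_file` function is a hypothetical placeholder for whatever cleaning step you need; it is not taken from the original post.

```python
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def clean_file(bucket, key):
    # Hypothetical placeholder: a real implementation would download the
    # object, clean it, and write the result back.
    return f"cleaned s3://{bucket}/{key}"


def handler(event, context):
    # Lambda entry point: runs once per notification, with no polling loop.
    return [clean_file(bucket, key) for bucket, key in extract_objects(event)]
```

Because the function is invoked only when S3 reports the event, the polling and redundancy checks described above are not needed in the function itself.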

Optimizing your Docker workflow

We create a lot of single-responsibility services for tasks such as fetching mail, downloading groups of files, cleaning data, importing data, and many others. This requires us to create new servers that need to be monitored and maintained, so we use Docker containers to normalize our process and work efficiently. From testing to staging to production, Docker containers provide a simple way to create disposable server images.

The primary drawback is that most Docker images lack proper setup or are not designed for your network or architecture. Below is a list of recommendations that will make creating Docker containers a less time-consuming process.
