Challenges in data sharing and transfer

The various parts of an ML system. From "Hidden Technical Debt in Machine Learning Systems."

Introduction

Avoid data transfer

Public or synthetic data

Bring-your-own-cloud

But data transfer is inevitable

Data transfer is not that simple

SFTP is not quite right for the cloud

The status quo in cloud-to-cloud data transfer involves humans at multiple steps in the process.

Complex pipelines leave the scope of the data team

Managing compliance is difficult

Testing is difficult

Versioning is difficult

Properties of a new solution

Interoperability

Auditability and consistency

Security and compliance

Data validation and testing

Conclusion

The public and private clouds aren’t connected. Let’s build some pipelines.
