Dynamic data processing and data science in the cloud

Data engineering isn’t a static activity, so why should your data infrastructure remain fixed? Cloudera Enterprise in the cloud brings elasticity and flexibility for large-scale data processing and data science activities.

Flexibility and self-service, without the high cost.

Reduce the cost of short-lived workloads like ETL and data modeling while giving data engineers access to more data and the flexibility to use the processing and analytics tools they’re most comfortable with. Data scientists can experiment and provision the compute and storage resources they need without involving IT. Cloudera Enterprise on cloud infrastructure lets data engineers and scientists work against cloud-native object stores, spin up clusters, and elastically grow and shrink them to reduce total cost of ownership (TCO).

Reimagine data engineering in the cloud

Cloudera Enterprise

Pay-as-you-go pricing

Reduce cost by using transient clusters, and pay only for the time your workload is running.
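To make the pay-as-you-go idea concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it (node count, hourly rate, job window) is a hypothetical assumption chosen for illustration, not Cloudera or cloud-provider pricing, and `cluster_cost` is an illustrative helper, not part of any product API.

```python
def cluster_cost(nodes, hourly_rate_per_node, hours):
    """Compute cost of running `nodes` machines for `hours` at a flat hourly rate."""
    return nodes * hourly_rate_per_node * hours


# Hypothetical figures, assumed only for this example.
NODES = 10
RATE = 0.50            # currency units per node-hour (assumed)
JOB_HOURS_PER_DAY = 3  # length of a nightly ETL window (assumed)
DAYS = 30

# A permanent cluster is billed around the clock.
always_on = cluster_cost(NODES, RATE, 24 * DAYS)

# A transient cluster is billed only while the workload runs.
transient = cluster_cost(NODES, RATE, JOB_HOURS_PER_DAY * DAYS)

savings_fraction = 1 - transient / always_on

print(f"always-on: {always_on:.2f}")
print(f"transient: {transient:.2f}")
print(f"savings:   {savings_fraction:.1%}")
```

The sketch covers compute only. It ignores object storage charges, cluster start-up time, and data transfer, all of which would narrow the gap in practice.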

Work against cloud-native storage

Run fast batch jobs with Hive and Spark against data in common object storage.

Grow and shrink on demand

Elastically scale your cluster, then shrink or terminate as business demands and SLAs change.

Technology benefits

Cloudera Enterprise is the comprehensive platform for data science and engineering in the public cloud. Whether users are launching multiple workloads in a multi-tenant environment or designing jobs that use cloud infrastructure for specific tasks like ETL and batch processing, they can expect reliable performance alongside the flexibility of the public cloud. Cloudera Enterprise separates storage from compute to help you achieve your business objectives without investing in permanent infrastructure. Users can achieve even deeper savings by running on lower-cost Spot Instances on Amazon EC2.

Cloudera Altus Director gives administrators full management across multiple cloud providers, with easy instantiation and package deployment. Apache Spark™ workloads can run natively on data in object storage, reducing data movement over the network. Data science can be done directly on cloud-native data, and machine learning models can be developed, trained, and saved for additional flexibility.

Misa Amane