Managed Hadoop, Spark, HBase, and Storm made easy. • A Data Lake service • Scale to petabytes on demand • Process unstructured and semi-structured data • Develop in Java, .NET, and more • No hardware to buy or maintain • Deploy in Windows or Linux • Spin up a Hadoop cluster in minutes • Visualize your Hadoop data in Excel • Easily integrate on-premises Hadoop clusters
Azure HDInsight is a Hadoop distribution powered by the cloud. This means that HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time. We charge only for the compute and storage that you actually use.
Because it’s 100% Apache Hadoop, HDInsight can process unstructured or semi-structured data from web clickstreams, social media, server logs, devices and sensors, and more. This allows you to analyze new sets of data and uncover new business possibilities to drive your organization forward.
HDInsight has powerful programming extensions for languages including, C#, Java, .NET, and more. You can use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs.
With HDInsight, you can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or set up. Azure does it for you. You can launch your first cluster in minutes.
Because it’s integrated with Excel, HDInsight lets you visualize and analyze your Hadoop data in compelling new ways using a tool that’s familiar to your business users. From Excel, users can select HDInsight as a data source.
HDInsight is also integrated with Hortonworks Data Platform, so you can move Hadoop data from an on-site datacenter to the Azure cloud for backup, Dev/Test, and cloud-bursting scenarios. Using the Microsoft Analytics Platform System, you can even query your on-premises and cloud-based Hadoop clusters at the same time.
This Hadoop ecosystem is a portfolio of fast-moving open source projects that are evolving quickly. To give customers flexibility, HDInsight gives you the option to deploy arbitrary Hadoop projects through custom scripts. This includes popular projects like Spark, R, Giraph, and Solr.
HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of the Hadoop Distributed File System (HDFS). This allows you to do large transactional processing (OLTP) of non-relational data, enabling use cases like having interactive websites or sensor data write to Azure Blob storage.
HDInsight includes Apache Storm, an open-source stream analytics platform that can process real-time events at large scale. This allows you to do processing on millions of events as they are generated enabling use cases like Internet of Things (IoT) and gaining insights from your connected devices or web-triggered events. We make deploying and implementing Storm easier.
Select Linux or Windows clusters when deploying big data workloads into Azure. With Windows, leverage existing Windows-based code, including .NET, to scale over all of your data in Azure. With Linux, you can more easily move existing Hadoop workloads into the cloud and incorporate additional big data components that can run in the service. By offering both Windows and Linux clusters, Microsoft enhances flexibility for customers to gain insights from the massive amounts of data being created in the cloud with the operating system of your choice.