BRK3562: Microsoft Azure DocumentDB and Azure HDInsight: Better Together

[Speaker: Andrew Hoh] Unstructured, schema-less data is growing at an exponential rate, and running analytics to turn this data into meaningful business information is becoming time-consuming and difficult. Come learn how the DocumentDB team at Microsoft built a solution that marries the ad hoc query capabilities of Azure DocumentDB with the power of HDInsight to efficiently analyze trends in the ever-growing world of Big Data.

Business Value, Breakout, Ignite 2015
Duration 1:06:05
Slide Content
  1. Microsoft Azure DocumentDB and Azure HDInsight: Better Together

    • Andrew Hoh
    • Azure DocumentDB
    • BRK3562
  2. In this session:

    • Introduction to Azure DocumentDB
    • Introduction to Azure HDInsight
    • Using Hadoop and DocumentDB
    • Using Storm and DocumentDB
  3. Introducing DocumentDB

    • DocumentDB – Azure’s NoSQL document database-as-a-service
    • What is NoSQL?
    • What is a Document Database?
  4. Introducing DocumentDB

    • No
    • Yes!
    • { "name": "John", "country": "Canada", "age": 43, "lastUse": "March 4, 2014" }
  5. DocumentDB Overview

    • Application -> query -> DocumentDB
    • Collection: Document 1 to Document 4, stored as JSON
    • { "name": "John", "country": "Canada", "age": 43, "lastUse": "March 4, 2014" }
    • { "name": "Andrew", "country": "America", "age": 22, "firstUse": "June 17, 2014" }
    • { "docCount": 3, "last": "May 1, 2014" }
    • { "name": "Eva", "country": "Germany", "age": 25 }
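
The documents above share a collection without sharing a schema. As a rough local illustration of what querying such a collection involves, the Python sketch below filters an in-memory copy of those four documents, approximately what a DocumentDB SQL query like `SELECT * FROM c WHERE c.age > 24` expresses. This is a simulation, not the DocumentDB SDK; `query_collection` is a hypothetical helper, and in this sketch documents that lack the queried field simply don't match.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]

# The four documents shown on the slide; note they do not share one schema.
COLLECTION: List[Document] = [
    {"name": "John", "country": "Canada", "age": 43, "lastUse": "March 4, 2014"},
    {"name": "Andrew", "country": "America", "age": 22, "firstUse": "June 17, 2014"},
    {"docCount": 3, "last": "May 1, 2014"},
    {"name": "Eva", "country": "Germany", "age": 25},
]


def query_collection(docs: List[Document], field: str,
                     predicate: Callable[[Any], bool]) -> List[Document]:
    """Hypothetical helper: return documents whose `field` exists and satisfies `predicate`.

    Documents that lack the field are skipped rather than raising an error,
    which is what lets documents of different shapes share one collection.
    """
    return [d for d in docs if field in d and predicate(d[field])]


if __name__ == "__main__":
    # Roughly: SELECT * FROM c WHERE c.age > 24
    older = query_collection(COLLECTION, "age", lambda age: age > 24)
    print([d["name"] for d in older])  # ['John', 'Eva']
```

The document with `docCount` has no `age` field and is skipped, rather than causing an error.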
  6. DocumentDB Overview

    • Flexible schema ({ })
    • Queryable (SQL)
    • Multi-document transactions (JS)
    • Tunable and fast
    • Scalable and fully managed
  7. Demo Overview

    • Customer -> interacts with -> Azure Website
    • Azure Website, backed by Azure DocumentDB: Chicago Food Inspections; User Profiles & Comments
    • This sample dataset has been modified for use from its original source, www.cityofchicago.org, the official website of the City of Chicago.
    • The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time.
    • It is understood that the data provided at this site is being used at one’s own risk.
  8. Simple web app

  9. About Hadoop

    • Data Volumes
    • Data Variety
    • Data Velocity
    • Apache Open Source Project
    • Hadoop Distributed File System (HDFS)
    • Distributed processing on data nodes
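
To make "distributed processing on data nodes" concrete, here is a single-process Python sketch of the MapReduce pattern Hadoop runs: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The inspection records and their `results` field are made-up stand-ins for the Chicago Food Inspections data, and the function names are hypothetical; a real Hadoop job would run many map and reduce tasks in parallel across nodes, reading its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample records; field names are illustrative, not the real dataset schema.
INSPECTIONS = [
    {"facility": "Cafe A", "results": "Pass"},
    {"facility": "Diner B", "results": "Fail"},
    {"facility": "Grill C", "results": "Pass"},
    {"facility": "Cafe A", "results": "Pass w/ Conditions"},
]


def map_phase(record: dict) -> Iterator[Tuple[str, int]]:
    # Emit one (key, 1) pair per record, keyed by inspection outcome.
    yield record["results"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as Hadoop does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Aggregate each key's values; here a simple count.
    return key, sum(values)


def run_job(records: Iterable[dict]) -> Dict[str, int]:
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(INSPECTIONS))  # {'Pass': 2, 'Fail': 1, 'Pass w/ Conditions': 1}
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be split across machines, which is the property Hadoop exploits.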
  10. Introducing HDInsight

    • Scalable and fully managed
    • Flexible: structured, semi-structured, and unstructured data
    • SQL
    • Customizable
  11. Introducing HDInsight

    • Microsoft’s cloud Hadoop offering
    • 100% open source Apache Hadoop
    • Built on the latest releases across Hadoop
    • Up and running in minutes with no hardware to deploy
    • Harness existing .NET and Java skills to write MapReduce
    • Utilize familiar BI tools for analysis including Microsoft Excel
  12. DocumentDB + HDInsight

    • Asurion
    • Use case 1: batch processing of data across multiple DocumentDB collections
    • Use case 2: storing Hadoop job results in DocumentDB for easy integration with their web application
    • DocumentDB’s simpler learning curve compared with the alternative, Apache HBase
  13. DocumentDB + HDInsight

    • Open sourced on GitHub
    • https://github.com/Azure/azure-documentdb-hadoop
    • Support for Hive, Pig and MapReduce
    • Easy deployment through script action
  14. Script action

  15. DocumentDB -> HDInsight

    • Boost Hadoop job performance by pushing predicates down to DocumentDB
    • Run complex aggregations across DocumentDB collections, databases and even database accounts
    • Combine DocumentDB data with other data sources
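
Predicate pushdown is why the first point matters: when the filter runs inside DocumentDB, only matching documents cross the network into the Hadoop job. The Python sketch below simulates both strategies against an in-memory "remote" store and counts the documents transferred. `FakeRemoteStore`, its `fetch` method, and the `results` field are hypothetical stand-ins for illustration, not the connector's API.

```python
from typing import Callable, List, Optional

Doc = dict
Predicate = Callable[[Doc], bool]


class FakeRemoteStore:
    """Hypothetical stand-in for a DocumentDB collection that counts documents sent."""

    def __init__(self, docs: List[Doc]) -> None:
        self._docs = docs
        self.transferred = 0

    def fetch(self, predicate: Optional[Predicate] = None) -> List[Doc]:
        # With a predicate, filtering happens "server side" before transfer.
        result = [d for d in self._docs if predicate is None or predicate(d)]
        self.transferred += len(result)
        return result


def is_failed(doc: Doc) -> bool:
    return doc.get("results") == "Fail"


def without_pushdown(store: FakeRemoteStore) -> List[Doc]:
    # Pull everything, then filter inside the Hadoop job.
    return [d for d in store.fetch() if is_failed(d)]


def with_pushdown(store: FakeRemoteStore) -> List[Doc]:
    # Send the predicate to the store; only matches are transferred.
    return store.fetch(is_failed)


if __name__ == "__main__":
    docs = [{"id": i, "results": "Fail" if i % 10 == 0 else "Pass"} for i in range(1000)]
    a, b = FakeRemoteStore(docs), FakeRemoteStore(docs)
    assert without_pushdown(a) == with_pushdown(b)
    print(a.transferred, b.transferred)  # 1000 100
```

Both strategies return the same 100 documents, but only the pushed-down version avoids moving the other 900.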
  16. Demo overview

    • DocumentDB: Chicago Food Inspections
    • HDInsight Hadoop: Hive queries against Food Inspections
  17. Online Hive editor

  18. HDInsight -> DocumentDB

    • Automatically index analytic results with flexible schemas
    • Query against resulting analytics in real time
    • Leverage multiple SDKs (.NET, Java, Python, Node.js, and JavaScript) for rapid application development
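
Writing job output back to DocumentDB largely means turning result rows into JSON documents. This Python sketch converts hypothetical Hive-style result rows into documents, omits null columns instead of forcing a fixed schema, and derives a deterministic `id` from the grouping keys so that a re-run could replace earlier results rather than duplicate them (assuming an upsert-style write). The column names, the `id` convention, and the helper functions are all assumptions for illustration; the actual write would go through the Hadoop connector or one of the SDKs listed above.

```python
import json
from typing import Any, Dict, List, Optional, Sequence

# Hypothetical output columns of a Hive aggregation over the inspections data.
COLUMNS = ("facility_type", "zip", "fail_count", "top_violation")


def row_to_document(row: Sequence[Optional[Any]]) -> Dict[str, Any]:
    # Keep only non-null columns, so documents may differ in shape.
    doc = {col: val for col, val in zip(COLUMNS, row) if val is not None}
    # Deterministic id built from the grouping keys (an assumed convention).
    doc["id"] = f"{doc.get('facility_type', 'unknown')}-{doc.get('zip', 'unknown')}"
    return doc


def to_documents(rows: List[Sequence[Optional[Any]]]) -> List[Dict[str, Any]]:
    return [row_to_document(r) for r in rows]


if __name__ == "__main__":
    rows = [
        ("Restaurant", "60614", 12, "Pest control"),
        ("Grocery Store", "60622", 3, None),  # no violation text: field omitted
    ]
    for doc in to_documents(rows):
        print(json.dumps(doc))
```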
  19. Demo overview

    • Customer -> interacts with -> Azure Website
    • Azure Website, backed by Azure DocumentDB: Chicago Food Inspections; User Profiles & Comments; Data Analytics
    • HDInsight Hadoop: Hive queries against Food Inspections
  20. Web app with analysis

  21. About Storm

    • Apache Open Source Project
    • Streaming data analysis
    • Distributed, fault-tolerant, real-time processing on data nodes
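
Storm processes an unbounded stream as events arrive rather than as a finished batch. The single-process Python sketch below mimics that shape: a generator plays the role of a spout emitting timestamped events, and a bolt-like function keeps per-key counts over fixed (tumbling) time windows, emitting each window's totals when it closes. The event format, the window size, and the function names are hypothetical; real Storm topologies distribute spouts and bolts across worker nodes and add the fault tolerance mentioned above.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key), e.g. a page a customer viewed


def spout(events: Iterable[Event]) -> Iterator[Event]:
    # Stand-in for a Storm spout: yields events one at a time, in arrival order.
    yield from events


def tumbling_count_bolt(stream: Iterable[Event],
                        window_seconds: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Count events per key in fixed windows; emit (window_start, counts) as windows close.

    Assumes events arrive in timestamp order; late events are not handled.
    """
    current_window = None
    counts: Counter = Counter()
    for ts, key in stream:
        window = ts - ts % window_seconds
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window


if __name__ == "__main__":
    events = [(0, "restaurant/1"), (3, "restaurant/2"), (7, "restaurant/1"),
              (12, "restaurant/1"), (15, "restaurant/3")]
    for start, window_counts in tumbling_count_bolt(spout(events), window_seconds=10):
        print(start, window_counts)
```

Each emitted `(window_start, counts)` pair is the kind of small, self-describing result that could then be stored as a document.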
  22. HDInsight -> DocumentDB

    • Automatically index and store stream analytic results
    • Query against analytics in real time
    • Leverage multiple SDKs (.NET, Java, Python, Node.js, and JavaScript) for rapid application development
  23. Demo overview

    • Customer -> interacts with -> Azure Website
    • Azure Website, backed by Azure DocumentDB: Chicago Food Inspections; User Profiles & Comments; Data Analytics
    • HDInsight Hadoop: Hive queries against Food Inspections
    • HDInsight Storm
  24. Final web app

  25. Additional Resources

    • Tutorial for DocumentDB + HDInsight
    • http://aka.ms/documentdb-hdinsight
    • DocumentDB + Storm GitHub Repo
    • http://aka.ms/documentdb-storm
    • Get Started with DocumentDB
    • http://aka.ms/docdbstart
    • Get Started with HDInsight
    • http://aka.ms/hdinsightstart
  30. Thank you!