2-748: Vision APIs: Understanding Images in Your App

Vision APIs enable you to build the most compelling app experience by providing ways to present media in its best form:
• Use Thumbnail and Color Detection to help you present images in their best form
• Use Categorization to organize your content
• Restrict or filter suggestive content as appropriate for your target audience
• Use OCR to convert text from images into a machine-usable character stream

Created 2 years ago

Duration 0:47:29
Slide Content
  1. Challenges of Modern App

    Slide 3 - Challenges of Modern App

    • Demo and Details of API
    • Analyze Image
    • OCR (Optical Character Recognition)
    • Generate Thumbnail
    • Onboarding
    • Wrap up + Q&A
    • Agenda
  2. Image is the “Hero”

    Slide 4 - Image is the “Hero”

    • E.g., WhatsApp, Pinterest, Instagram, Houzz, Snapchat, etc.
    • Crowd-sourced content
    • Is the image clean and safe?
    • What is the image about?
    • How do I present it beautifully?
    • Real-time and High Volume
    • Need to instantly process an uploaded image or a photograph taken.
    • Time to Market
    • Image processing is expensive.
    • Dev can focus on core value-add.
    • Challenges of Modern App
  3. Best of Bing and MSR as an Azure service

    Slide 5 - Best of Bing and MSR as an Azure service

    • New cross-platform service for image processing, OCR and thumbnail generation. Retrieval Challenge 2013
    • Big Data
    • Computer Vision
    • Deep Learning
  4. PicHit.me: Startup that connects billions of people in need of photos with the billions of people with a camera.

    Slide 6 - PicHit.me: Startup that connects billions of people in need of photos with the billions of people with a camera.

    • Analyze Image : Customer Scenarios
  5. PicHit.Me needs a way for

    Slide 7 - PicHit.Me needs a way for

    • building an enhanced search experience.
    • categorizing images for easy browsing.
    • choosing an accent color to beautify the site with contrasting colors.
    • Flagging inappropriate submissions for review.
    • Other Customer Scenarios
    • A sports team allows fans to upload images during the game. Are they safe and clean to air?
    • A wildlife app wants to separate animal and human pictures.
    • Analyze Image: Customer Scenarios
  8. Analyze Image – Category detection

    Slide 10 - Analyze Image – Category detection

  9. Analyze Image – Category detection

    Slide 11 - Analyze Image – Category detection

  10. Analyze Image – Category Detection

    Slide 12 - Analyze Image – Category Detection

    • Model based on 86 Categories trained using Bing data.
  11. Slide 13

    • A variant of this model is used by OneDrive and Bing Image Search to enhance search relevance.
    • Precision/recall of 95.3/73.8 on a 5K-image dataset (MIT-Adobe FiveK).
    • 5K photographs taken with SLR cameras by a set of different photographers.
    • Photographs cover a broad range of scenes, subjects, and lighting conditions.
    • Very good at categorizing outdoors, people, animals.
    • Analyze Image – Category detection
  12. Slide 14

    • Analyze Image – Clipart, Line Drawing & B&W
  13. Slide 15

    • Analyze Image – Clipart, Line Drawing & B&W
  14. Slide 16

    • Analyze Image – Clipart, Line Drawing & B&W
  15. Analyze Image – Clipart, Line Drawing, B&W

    Slide 17 - Analyze Image – Clipart, Line Drawing, B&W

  16. Clipart Model – used to identify graphics art.

    Slide 18 - Clipart Model – used to identify graphics art.

    • 0 – non-clipart; 1 – ambiguous; 2 – identified as clipart; 3 – identified as high-quality clipart
    • P/R = 65.2/98.4
    • Line Drawing – used to identify line art drawings.
    • 0 – non line drawing; 1 – line drawing
    • P/R = 82.3/98.75
    • B&W – used to identify black & white images.
    • 0 – color images; 1 – black & white images.
    • P/R = 83.6/98.2
    • Analyze Image – Clipart, Line Drawing and B&W
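
A minimal sketch of how an app might turn the numeric codes on this slide into readable labels. The code tables are copied from the slide; the function and parameter names are hypothetical and are not taken from the service's SDK.

```python
# Code tables copied from the slide. All names below are illustrative
# assumptions, not identifiers from the actual service.
CLIPART_LABELS = {
    0: "non-clipart",
    1: "ambiguous",
    2: "clipart",
    3: "high-quality clipart",
}
LINE_DRAWING_LABELS = {0: "non line drawing", 1: "line drawing"}
BW_LABELS = {0: "color", 1: "black & white"}


def describe_image_type(clipart: int, line_drawing: int, black_and_white: int) -> dict:
    """Translate the three numeric codes into readable labels."""
    return {
        "clipart": CLIPART_LABELS[clipart],
        "line_drawing": LINE_DRAWING_LABELS[line_drawing],
        "black_and_white": BW_LABELS[black_and_white],
    }


def is_clipart(clipart: int, accept_ambiguous: bool = False) -> bool:
    """Treat codes 2 and 3 as clipart; optionally also accept the ambiguous code 1."""
    return clipart >= (1 if accept_ambiguous else 2)
```

Whether to accept the ambiguous code is an application decision: a strict clipart gallery would likely reject it, while a filter that wants to exclude anything that might be clipart would accept it.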
  17. Accent Color – Which Border Color is best?

    Slide 19 - Accent Color – Which Border Color is best?

  18. Accent Color in Windows 8.1

    Slide 20 - Accent Color in Windows 8.1

  19. Slide 21

    • Accent Color in Windows 8.1
  20. Slide 22

    • Accent Color in Windows 8.1
  21. Dominant Color is the prominent color in the Image

    Slide 23 - Dominant Color is the prominent color in the Image

    • The service identifies the overall dominant colors in the image, as well as the dominant foreground color and the dominant background color.
    • Colors are grouped into 12 color names: black, blue, brown, grey, green, orange, pink, purple, red, white, yellow and teal.
    • These colors can be used to build filtering or ranking solutions, e.g., validating the color of a shopping product against its text.
    • Accent Color
    • It is the most saturated, contrasting, or popping color in the image, e.g., the color of a dress, eyes, lips, or background.
    • Best effort to avoid skin tone colors.
    • A default gray color is used if no accent color could be found.
    • This color can be used to complement the image, e.g., as a background color while the actual image is loading.
    • Analyze Image – Dominant & Accent Color
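
The product-validation scenario above can be sketched as a simple check of the detected dominant colors against the color words in a product description. This is an illustration under assumptions: the function names are hypothetical, and the dominant colors are assumed to arrive as a list of the 12 color names from the slide.

```python
import re

# The 12 color names listed on the slide.
COLOR_NAMES = {
    "black", "blue", "brown", "grey", "green", "orange",
    "pink", "purple", "red", "white", "yellow", "teal",
}


def colors_in_text(text: str) -> set:
    """Return the known color names that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    if "gray" in words:  # treat the US spelling as the slide's "grey"
        words.add("grey")
    return words & COLOR_NAMES


def color_matches_text(dominant_colors, product_text: str) -> bool:
    """True if the text names no color, or names at least one detected color."""
    mentioned = colors_in_text(product_text)
    if not mentioned:
        return True  # nothing to validate against
    detected = {color.lower() for color in dominant_colors}
    return bool(mentioned & detected)
```

A listing titled "Red dress" whose image comes back with only blue as a dominant color would fail this check and could be routed to review or ranked lower.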
  22. Analyze Image – Accent Color

    Slide 24 - Analyze Image – Accent Color

  23. Slide 25

    • Adult & Racy Detection
    • Adult – Rated X or Rated R.
    • Racy – Highly suggestive images (Rated PG), e.g., bikini, thick body paint, etc.
    • Used for Bing Safe Search.
    • Analyze Image – Adult & Racy
  24. Models built with years of Bing training data & DNN.

    Slide 26 - Models built with years of Bing training data & DNN.

    • Flexible thresholds for developers.
    • Analyze Image – Adult & Racy
    • Adult P/R curve on 90K public set, with three operating points:
    • 1. Default operating point.
    • 2. Higher recall: can use human judgments to remove false positives.
    • 3. Higher precision: if you can afford some leakage.
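
The three operating points can be thought of as three thresholds applied to a confidence score. The sketch below assumes the service returns an adult confidence between 0 and 1 (called `adult_score` here); the threshold values are hypothetical placeholders, since the slide gives no numbers.

```python
# Hypothetical thresholds for illustration only; the slide does not
# state the numeric values behind each operating point.
OPERATING_POINTS = {
    "default": 0.5,
    "higher_recall": 0.3,     # flags more images; expect more false positives
    "higher_precision": 0.8,  # flags fewer images; expect some leakage
}


def flag_adult(adult_score: float, operating_point: str = "default") -> bool:
    """Flag an image when its score reaches the chosen operating point's threshold."""
    return adult_score >= OPERATING_POINTS[operating_point]
```

Lowering the threshold trades precision for recall, which suits workflows where human judges can remove the extra false positives.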
  25. Case Study: Private Image Application with no text signals

    Slide 27 - Case Study: Private Image Application with no text signals

    • 10K images are processed daily.
    • 2.6% are adult = 260.
    • Business Requirement: Catch most adult images within the judgment budget.
    • Budget: cannot judge all 10K, but can afford to judge up to 400 images per day.
    • Choose an operating point with higher recall.
    • 355 images will be identified as adult by the Vision Service.
    • Judge 355 images manually daily and catch 208 out of 260 images.
    • To reduce the judging budget further, it is highly recommended to use other signals such as text, user history, and context to improve the overall precision and recall.
    • Analyze Image – Adult & Racy
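
The numbers in this case study imply a precision of about 59% (208 of 355 flagged images are adult) and a recall of 80% (208 of 260 adult images are caught). A short worked check, using only the figures from the slide; the function name is hypothetical:

```python
def case_study_summary(images_per_day=10_000, adult_rate=0.026,
                       flagged=355, caught=208, budget=400) -> dict:
    """Derive precision, recall and budget fit from the case-study figures."""
    actual_adult = round(images_per_day * adult_rate)  # 260 adult images per day
    return {
        "actual_adult": actual_adult,
        "precision": caught / flagged,       # share of flagged images that are adult
        "recall": caught / actual_adult,     # share of adult images that get caught
        "missed": actual_adult - caught,     # adult images that slip through
        "within_budget": flagged <= budget,  # can judges review everything flagged?
    }


summary = case_study_summary()
print(f"precision={summary['precision']:.1%}, recall={summary['recall']:.1%}, "
      f"missed={summary['missed']}, within_budget={summary['within_budget']}")
```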
  26. How does it come together in Bing?

    Slide 28 - How does it come together in Bing?

    • Analyze Image – Adult & Racy
    • Vision API for Adult/Racy
    • Page Classifier
    • Bing Meta Model for Filtering Adult/Racy Documents
    • Adult Query Classifier
    • Filtering for Strict /Moderate Mode
    • Online Model
    • Human Judgments
    • Bing Clicks
    • Bing Clicks
  27. Analyze Image – Face Detection

    Slide 29 - Analyze Image – Face Detection

  28. High Precision face location – up to 64 human faces in an image.

    Slide 30 - High Precision face location – up to 64 human faces in an image.

    • The following info is extracted for each face:
    • Rectangle indicating the location of the face in pixels.
    • Gender and age of the face.
    • Scenarios
    • Face verification and face search algorithms (separate Face module available).
    • Bing uses face detection, age and gender to improve relevance of queries like "young Madonna".
    • Face detection is used for appropriate thumbnail generation on many Microsoft properties.
    • Analyze Image – Face Detection
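
A minimal sketch of consuming the per-face output described above (a pixel rectangle plus gender and age). The JSON field names (`faces`, `faceRectangle`, `left`, `top`, `width`, `height`, `gender`, `age`) are assumptions about the response shape and should be checked against the service documentation.

```python
from dataclasses import dataclass


@dataclass
class Face:
    left: int
    top: int
    width: int
    height: int
    gender: str
    age: int

    @property
    def area(self) -> int:
        """Area of the face rectangle in square pixels."""
        return self.width * self.height


def parse_faces(response: dict) -> list:
    """Build Face objects from an assumed response shape (see lead-in)."""
    faces = []
    for item in response.get("faces", []):
        rect = item["faceRectangle"]
        faces.append(Face(rect["left"], rect["top"], rect["width"],
                          rect["height"], item["gender"], item["age"]))
    return faces


def largest_face(faces):
    """Pick the most prominent face, e.g. to center a thumbnail; None if no faces."""
    return max(faces, key=lambda face: face.area, default=None)
```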
  33. Deep analysis of an image that returns multiple visual features:

    Slide 35 - Deep analysis of an image that returns multiple visual features:

    • adult – determines if the image has nudity or is very suggestive.
    • category – categories an image belongs to, e.g., cat, baby, church.
    • color – dominant color, accent color, B&W.
    • imagetype – clipart, line drawing.
    • face – Face rectangle with co-ordinates, gender and age.
    • All – return all features (default).
    • Analyze Image - Summary
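
A sketch of how a client might select visual features when building an Analyze Image request. The endpoint URL, the `visualFeatures` query parameter name, and the `X-Subscription-Key` header are placeholders assumed for illustration; only the feature names come from the slide. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint and header name: assumptions, not taken from the slides.
ANALYZE_URL = "https://example.invalid/vision/analyze"
KEY_HEADER = "X-Subscription-Key"

# Feature names listed on the slide (compared case-insensitively).
FEATURES = {"adult", "category", "color", "imagetype", "face", "all"}


def build_analyze_request(image_bytes: bytes, features=("All",),
                          key: str = "YOUR_KEY") -> Request:
    """Build (but do not send) a POST request asking for the given features."""
    unknown = {f.lower() for f in features} - FEATURES
    if unknown:
        raise ValueError(f"unknown visual features: {sorted(unknown)}")
    query = urlencode({"visualFeatures": ",".join(features)})
    headers = {KEY_HEADER: key, "Content-Type": "application/octet-stream"}
    return Request(f"{ANALYZE_URL}?{query}", data=image_bytes,
                   headers=headers, method="POST")
```

Requesting only the features an app needs, rather than the default of all features, keeps the response small and easier to process.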
  34. OCR: Customer Scenario – Allen Institute for Artificial Intelligence (AI2)

    Slide 36 - OCR: Customer Scenario – Allen Institute for Artificial Intelligence (AI2)

    • AI2 launched: Jan 2014
  35. Slide 37

    • Project Aristo at AI2 contains large amounts of knowledge in machine-computable form that can answer questions, explain those answers, and discuss those answers with users.
    • OCR: Customer Scenario
  36. Aristo: A machine that can answer questions for science exams.

    Slide 38 - Aristo: A machine that can answer questions for science exams.

    • Many of these questions contain diagrams. They need to detect objects, text and their interactions in the image.
    • Project Aristo is integrating with Vision Service.
    • OCR: Customer Scenario
  37. OCR - demo

    Slide 39 - OCR - demo

  38. OCR - demo

    Slide 40 - OCR - demo

  39. OCR - demo

    Slide 41 - OCR - demo

  40. Optical Character Recognition (OCR) Service reads text from images into a machine-usable character stream.

    Slide 42 - Optical Character Recognition (OCR) Service reads text from images into a machine-usable character stream.

    • Mature tech used in OneDrive, Office Lens, OneNote, Bing & Microsoft Translator.
    • Automatically detects the language.
    • 21 languages supported:
    • Chinese Simplified, Chinese Traditional, Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Turkish
    • Gets the rotation of the recognized text, in degrees, around the image center.
    • Bounding Box coordinates of each word.
    • OCR Service - Features
  41. Image dimensions must be between 40x40 and 2600x2600 pixels.

    Slide 43 - Image dimensions must be between 40x40 and 2600x2600 pixels.

    • All text in the image must have normal orientation with all lines written in the same direction. However, the OCR engine can correct rotation up to ±40 degrees.
    • The accuracy of text recognition depends on the quality of the image. An inaccurate reading may be caused by the following:
    • Blurry images.
    • Handwritten or cursive text.
    • Artistic font styles; small text size.
    • Complex backgrounds; shadows or glare over text; perspective distortion; oversized or dropped capital letters at the beginnings of words; subscript, superscript, or strikethrough text.
    • On photos where text is dominant, false positives come from partially recognized words.
    • On random photos (especially photos without any text), precision can vary a lot depending on the type of images.
    • OCR Service – Best Practices
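
The size and rotation limits above lend themselves to a client-side pre-check before an image is uploaded. A minimal sketch: the limits are from the slide, while the function name and the idea of passing a known rotation estimate are assumptions.

```python
MIN_SIDE, MAX_SIDE = 40, 2600  # pixel limits stated on the slide
MAX_ROTATION_DEG = 40          # the engine can correct up to +/-40 degrees


def ocr_input_problems(width: int, height: int, rotation_deg: float = 0.0) -> list:
    """Return human-readable reasons the image may be unsuitable for OCR."""
    problems = []
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append(f"image smaller than {MIN_SIDE}x{MIN_SIDE} pixels")
    if width > MAX_SIDE or height > MAX_SIDE:
        problems.append(f"image larger than {MAX_SIDE}x{MAX_SIDE} pixels")
    if abs(rotation_deg) > MAX_ROTATION_DEG:
        problems.append(f"text rotated more than {MAX_ROTATION_DEG} degrees")
    return problems
```

An empty list means the image passes these checks; it says nothing about blur, glare, or the other quality factors listed above.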
  44. Input:

    Slide 46 - Input:

    • Image – the image to extract text from.
    • Language – auto-detect or a particular language.
    • Orientation – if true, detects the orientation before processing.
    • Output:
    • Language, Text Angle and Orientation.
    • Bounding Box for each line and word.
    • Text for each word.
    • OCR Service - Summary
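
A sketch of turning the OCR output described above into plain lines of text. The nesting (`regions` containing `lines` containing `words`) and the comma-separated `boundingBox` string are assumptions about the response shape; the slide only states that lines and words carry bounding boxes and that each word carries text.

```python
def parse_box(box: str) -> tuple:
    """Parse an assumed 'x,y,width,height' bounding-box string into integers."""
    x, y, width, height = (int(value) for value in box.split(","))
    return x, y, width, height


def lines_of_text(result: dict) -> list:
    """Join the words of every line, in the order the response lists them."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            lines.append(" ".join(word["text"] for word in line.get("words", [])))
    return lines
```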
  45. Thumbnail is a small representation of the original image.

    Slide 47 - Thumbnail is a small representation of the original image.

    • Varied devices (phones, tablets, PCs) create a need for different UX layouts and thumbnail sizes.
    • The thumbnail endpoint takes any image and:
    • Removes distracting elements from the image and recognizes the main object.
    • Crops the image based on identified “Region of Interest”.
    • Changes the aspect ratio to fit the target thumbnail dimensions.
    • Thumbnail Service
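
The crop and aspect-ratio steps above can be illustrated with a little geometry. This sketch is not the service's algorithm: it assumes the region of interest is already known and simply finds the smallest crop with the target aspect ratio that contains it, shifted or shrunk as needed to stay inside the image. All names are hypothetical.

```python
def crop_for_thumbnail(image_w, image_h, roi, target_w, target_h):
    """Return (left, top, width, height) of a crop around roi with the target aspect.

    roi is (left, top, width, height) in pixels.
    """
    roi_x, roi_y, roi_w, roi_h = roi
    aspect = target_w / target_h

    # Grow the region of interest until it has the target aspect ratio.
    crop_w = max(roi_w, roi_h * aspect)
    crop_h = crop_w / aspect

    # Shrink uniformly if the crop would be larger than the image.
    scale = min(1.0, image_w / crop_w, image_h / crop_h)
    crop_w *= scale
    crop_h *= scale

    # Center on the region of interest, then clamp to the image bounds.
    center_x = roi_x + roi_w / 2
    center_y = roi_y + roi_h / 2
    left = min(max(center_x - crop_w / 2, 0), image_w - crop_w)
    top = min(max(center_y - crop_h / 2, 0), image_h - crop_h)
    return round(left), round(top), round(crop_w), round(crop_h)
```

For a 1000x800 image with a 200x200 region of interest at (400, 100) and a 100x50 target, the crop is 400x200 at (300, 100), which is then scaled down to the thumbnail size.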
  49. Input:

    Slide 51 - Input:

    • Image (minimum 50x50)
    • Dimensions of Target Container
    • Width of Image (between 1 and 1024)
    • Height of Image (between 1 and 1024)
    • Enable Smart Cropping (Boolean flag)
    • Output:
    • Thumbnail Image
    • Thumbnail API
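
A sketch of validating the thumbnail inputs listed above before building a request. The limits are from the slide; the query parameter names (`width`, `height`, `smartCropping`) and the function name are assumptions.

```python
from urllib.parse import urlencode

MIN_SOURCE_SIDE = 50   # minimum source image size from the slide (50x50)
MAX_TARGET_SIDE = 1024  # width and height must be between 1 and 1024


def thumbnail_query(width: int, height: int, smart_cropping: bool = True,
                    source_size=None) -> str:
    """Validate the inputs and return a query string with assumed parameter names."""
    for name, value in (("width", width), ("height", height)):
        if not 1 <= value <= MAX_TARGET_SIDE:
            raise ValueError(f"{name} must be between 1 and {MAX_TARGET_SIDE}")
    if source_size is not None and min(source_size) < MIN_SOURCE_SIDE:
        raise ValueError(f"source image must be at least "
                         f"{MIN_SOURCE_SIDE}x{MIN_SOURCE_SIDE}")
    return urlencode({
        "width": width,
        "height": height,
        "smartCropping": str(smart_cropping).lower(),
    })
```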
  50. Onboarding using .NET SDK

    Slide 52 - Onboarding using .NET SDK

    • // “analysisResult” Sample
  51. Oxford provides industry-leading image processing:

    Slide 53 - Oxford provides industry-leading image processing:

    • Adult & Racy Image detection.
    • Face, Gender & Age of People.
    • Dominant & Accent Color.
    • Category & Image Type.
    • OCR with Auto Detection.
    • Thumbnail generation with smart cropping.
    • Best of Microsoft Research & Bing
    • Always evolving & improving.
    • Hosted on Azure as a cloud service:
    • Cross-platform & easy integration.
    • Conclusion
  52. Slide 54

    • is the industry’s intelligence engine
    • Across Microsoft
    • Across Devices
    • With the industry
    • #BingSolutions
  53. Visit http://www.projectoxford.ai to learn more.

    Slide 55 - Visit http://www.projectoxford.ai to learn more.

    • Check out the Session: “Microsoft Project Oxford: Adding Smart to your applications”.
    • Stop by the Project Oxford Booth on 3rd floor.
    • Give us feedback in our forum.
    • Conclusion: Call to Action
  54. Q&A

    Slide 56 - Q&A