• Announcing Microsoft Research Open Data – Datasets by Microsoft Research now available in the cloud


    The Microsoft Research Outreach team has worked extensively with the external research community to enable adoption of cloud-based research infrastructure over the past few years. Through this process, we experienced the ubiquity of Jim Gray’s fourth paradigm of discovery based on data-intensive science – that is, almost all research projects have a data component to them. This data deluge also demonstrated a clear need for curated and meaningful datasets in the research community, not only in computer science but also in interdisciplinary and domain sciences.

    Today we are excited to launch Microsoft Research Open Data – a new data repository in the cloud dedicated to facilitating collaboration across the global research community. Microsoft Research Open Data, in a single, convenient, cloud-hosted location, offers datasets representing many years of data curation and research efforts by Microsoft that were used in published research studies.

    Why we are investing in this

    The goal is to provide a simple platform for Microsoft researchers and collaborators to share datasets and related research technologies and tools. Microsoft Research Open Data is designed to simplify access to these datasets, facilitate collaboration between researchers using cloud-based resources, and enable reproducibility of research. We will continue to shape and grow this repository and add features based on feedback from the community.
    We recognize that there are dozens of data repositories already in use by researchers and expect that the capabilities of this repository will augment existing efforts.

    Figure 1 – Dataset in Microsoft Research Open Data

    “This is a game changer for the big data community. Initiatives like Microsoft Research Open Data reduce barriers to data sharing and encourage reproducibility by leveraging the power of cloud computing”
    -Sam Madden, Professor, Massachusetts Institute of Technology

    With data growing at an exponential rate and projected to exceed 150 ZB by 2025, it is now recognized that we need to prioritize bringing processing to the data rather than relying on moving data over Internet bandwidth that is growing at a much slower pace. We believe that there is real utility in offering that option. Therefore, in addition to downloading the data assets, users can also copy datasets directly to an Azure-based Data Science virtual machine, as shown in Figure 2.

    Figure 2 – Data copied from microsoftopendata.com to an Azure-based Linux virtual machine
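
    To make the bandwidth argument concrete, here is a minimal back-of-the-envelope sketch of how long it takes to move a dataset over a network link. The function name, the 1 TB dataset size, and the 100 Mbit/s link speed are illustrative assumptions, not figures from Microsoft Research Open Data, and the estimate ignores protocol overhead and congestion.

```python
def transfer_time_hours(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Estimate the hours needed to move `size_bytes` over a link.

    Hypothetical helper for illustration only: it assumes the link is
    fully and continuously utilized, which real transfers rarely achieve.
    """
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    seconds = size_bytes * 8 / bandwidth_bits_per_s  # bytes -> bits, then divide by rate
    return seconds / 3600


if __name__ == "__main__":
    # Assumed example values: a 1 TB dataset over a 100 Mbit/s connection.
    dataset_bytes = 1e12
    link_bits_per_s = 100e6
    hours = transfer_time_hours(dataset_bytes, link_bits_per_s)
    print(f"Estimated transfer time: {hours:.1f} hours")
```

    Under these assumed numbers the download alone takes most of a day, which is the kind of cost that copying a dataset to a virtual machine hosted in the same cloud is meant to avoid.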

    The Data Science virtual machine comes preloaded with a variety of development tools popular with researchers and practitioners, as can be seen in Figure 3.

    Figure 3 – Linux Data Science virtual machine

    “I am often asked to share my research data and the public sharing I have done in the past has been popular. Coordinating and cataloging these datasets in one place with Azure will be helpful for both internal and external researchers, giving them easy access, encouraging collaboration, and providing convenient cloud-based access to the wealth of Microsoft Research shared data.” 
    -John Krumm, Principal Researcher, Microsoft Research AI

    Datasets in Microsoft Research Open Data are categorized by their primary research area, as shown in Figure 4. Each dataset links to the related research projects or publications. You can browse available datasets and download them or copy them directly to an Azure subscription through an automated workflow. To the extent possible, the repository meets the highest standards for data sharing to ensure that datasets are findable, accessible, interoperable and reusable; the corpus contains no personally identifiable information. The site will continue to evolve as we get feedback from users.

    Figure 4 – Dataset Categories

    Microsoft Research Open Data is an outcome of the Microsoft Research Outreach Data Science program and was made possible by a collaboration between many teams at Microsoft, Microsoft researchers, our industry partners, and our academic advisors.

    We would love to hear your comments and feedback! Please send us a note via the Feedback feature on the site (http://microsoftopendata.com) and tell us what you think.

  • Original article: https://www.cnblogs.com/Javi/p/9227618.html