drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] luocooong commented on a change in pull request #2286: [DOC UPDATE] Translate Tutorial to Chinese 7/14
Date Fri, 06 Aug 2021 02:15:32 GMT

luocooong commented on a change in pull request #2286:
URL: https://github.com/apache/drill/pull/2286#discussion_r683889479



##########
File path: _docs/zh/tutorials/050-analyzing-highly-dynamic-datasets.md
##########
@@ -5,9 +5,9 @@ parent: "教程"
 lang: "zh"
 ---
 
-Today’s data is dynamic and application-driven. The growth of a new era of business applications driven by industry trends such as web, social, mobile, and Internet of Things are generating datasets with new data types and new data models. These applications are iterative, and the associated data models typically are semi-structured, schema-less and constantly evolving. Semi-structured data models can be complex/nested, schema-less, and capable of having varying fields in every single row and of constantly evolving as fields get added and removed frequently to meet business requirements.

Review comment:
       "并应用驱动" > "并由应用驱动"
   "互联网时代商业软件" > "互联网时代的商业软件"
   "持续变化" > "持续变化的"
   "含有不同" > "包含不同"
   "动态原生数据集" > "动态的原生数据集"

##########
File path: _docs/zh/tutorials/050-analyzing-highly-dynamic-datasets.md
##########
@@ -179,4 +180,4 @@ On the output of flattened data, you use standard SQL functionality such as filt
 ----------
 
 ## Summary

Review comment:
       "Summary" > "小结" or "总结"

##########
File path: _docs/zh/tutorials/060-analyzing-social-media.md
##########
@@ -5,205 +5,200 @@ parent: "教程"
 lang: "zh"
 ---
 
-This tutorial covers how to analyze Twitter data in native JSON format using Apache Drill. First, you configure an environment to stream the Twitter data filtered on keywords and languages using Apache Flume, and then you analyze the data using Drill. Finally, you run interactive reports and analysis using MicroStrategy.
+本教程介绍了如何使用 Apache Drill 分析原生 JSON 格式的 Twitter 数据。首先,使用 Apache Flume 处理 Twitter 数据流并过滤关键字和语言类型,然后使用 Drill 分析数据。最后,运行 MicroStrategy 以获得交互式报告和分析。
 
-## Social Media Analysis Prerequisites
+## 社交媒体分析所需准备
 
-* Twitter developer account
-* AWS account
-* A MapR node on AWS
-* A MicroStrategy AWS instance
+* Twitter 开发者账户
+* 亚马逊云服务账户
+* 亚马逊云服务中加载一个 MapR 节点
+* 亚马逊云服务中加载一个 MicroStrategy 实例
 
-## Configuring the AWS environment
+## 配置亚马逊云服务环境
 
-Configuring the environment on Amazon Web Services (AWS) consists of these tasks:
+在亚马逊云服务 (AWS) 上配置环境包括以下任务:
 
-* Create a Twitter Dev account and register a Twitter application  
-* Provision a preconfigured AWS MapR node with Flume and Drill  
-* Provision a MicroStrategy AWS instance  
-* Configure MicroStrategy to run reports and analyses using Drill  
-* Create a Twitter Dev account and register an application
+* 创建一个 Twitter 开发者账户并注册一个 Twitter 应用程序  
+* 在开启的 AWS MapR 节点中配置 Flume 和 Drill
+* 在 AWS 虚拟机中配置 MicroStrategy
+* 配置 MicroStrategy 来使用 Drill 运行报告和分析  
 
-This tutorial assumes you are familiar with MicroStrategy. For information about using MicroStrategy, see the [MicroStrategy documentation](http://www.microstrategy.com/Strategy/media/downloads/products/cloud/cloud_aws-user-guide.pdf).
+本教程假设你已熟悉 MicroStrategy。有关使用 MicroStrategy 的信息,请参考 [MicroStrategy documentation](http://www.microstrategy.com/Strategy/media/downloads/products/cloud/cloud_aws-user-guide.pdf)。
 
 ----------
 
-## Establishing a Twitter Feed and Flume Credentials
+## 订阅 Twitter 新消息并建立 Flume 证书

Review comment:
       "证书" > "凭证"

##########
File path: _docs/zh/tutorials/050-analyzing-highly-dynamic-datasets.md
##########
@@ -58,13 +58,13 @@ First, let’s take a look at the dataset:
     | {"6-6":2,"6-5":1,"7-6":1,"7-5":1,"8-5":2,"10-5":1,"9-3":1,"12-5":1,"15-3":1,"15-5":1,"15-6":1,"16-3":1,"10-0":1,"15-4":1,"10-4":1,"8-2":1} | checkin    | uGykseHzyS5xAMWoN6YUqA |
     |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|------------------------|
 
-{% include startnote.html %}This document aligns Drill output for example purposes. Drill output is not aligned in this case.{% include endnote.html %}
+{% include startnote.html %}本文档为了展示方便对齐了 Drill 的输出。实际上 Drill 的输出不会这样对齐。{% include endnote.html %}
 
-You query the data in JSON files directly. Schema definitions in Hive store are not necessary. The names of the elements within the `checkin_info` column are different between the first and second row.
+你可以直接查询 JSON 文件中的数据。 不必给 Hive 中的数据定义 schema。`checkin_info` 列中,第一行和第二行之间的元素名称是不同的。
 
-Drill provides a function called KVGEN (Key Value Generator) which is useful when working with complex data that contains arbitrary maps consisting of dynamic and unknown element names such as checkin_info. KVGEN turns the dynamic map into an array of key-value pairs where keys represent the dynamic element names.
+Drill 提供了一个名为 KVGEN(键值生成器)的函数,该函数在处理包含由动态和未知元素名称(例如 checkin_info)组成的任意映射的复杂数据时非常有用。 KVGEN 将动态映射转换为键值对数组,其中键表示动态元素名称。

Review comment:
       "在处理包含由动态和" > "在处理由动态和"
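
As a rough illustration of the transformation the KVGEN paragraph in this hunk describes, here is a minimal Python sketch. It is not Drill's implementation: the helper name `kvgen` is hypothetical, the sample map is a shortened copy of the `checkin_info` value shown in the hunk context, and the `key`/`value` field names are an assumption about the shape of the resulting pairs.

```python
def kvgen(dynamic_map):
    """Turn a map with unknown element names into a list of key-value records.

    Hypothetical helper for illustration only; it mimics the idea of
    converting a dynamic map into an array of key-value pairs.
    """
    return [{"key": name, "value": count} for name, count in dynamic_map.items()]


# Shortened sample of the checkin_info map from the hunk context.
checkin_info = {"6-6": 2, "6-5": 1, "7-6": 1}

pairs = kvgen(checkin_info)
for pair in pairs:
    # Each dynamic element name is now ordinary data under "key".
    print(pair["key"], pair["value"])
```

Once the element names are values rather than field names, they can be filtered or aggregated like any other column, which is the point the translated paragraph needs to convey.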

##########
File path: _docs/zh/tutorials/050-analyzing-highly-dynamic-datasets.md
##########
@@ -5,9 +5,9 @@ parent: "教程"
 lang: "zh"
 ---
 
-Today’s data is dynamic and application-driven. The growth of a new era of business applications driven by industry trends such as web, social, mobile, and Internet of Things are generating datasets with new data types and new data models. These applications are iterative, and the associated data models typically are semi-structured, schema-less and constantly evolving. Semi-structured data models can be complex/nested, schema-less, and capable of having varying fields in every single row and of constantly evolving as fields get added and removed frequently to meet business requirements.
+大数据是动态的并应用驱动的。互联网时代商业软件的发展由不同的产业端所驱动,如网页端,媒体端,移动端,物联网。他们所生成的数据集,包含了新的数据类型和模型。这些应用都是交互式的,他们所关联的数据模型一般都是半结构化,schema-less,以及持续变化。半结构化数据模型可以是复杂/嵌套或 schema-less,并且能够在每一行中含有不同的字段,为满足业务需求,字段会频繁修改。
 
-This tutorial shows you how to natively query dynamic datasets, such as JSON, and derive insights from any type of data in minutes. The dataset used in the example is from the Yelp check-ins dataset, which has the following structure:
+本教程将向你展示如何查询动态原生数据集,例如 JSON,并在几分钟内从任意类型的数据中获得有效信息。示例中使用的数据集来自 Yelp 签到数据集,其结构如下:

Review comment:
       "动态原生数据集" > "动态的原生数据集"

##########
File path: _docs/zh/tutorials/050-analyzing-highly-dynamic-datasets.md
##########
@@ -23,32 +23,32 @@ This tutorial shows you how to natively query dynamic datasets, such as JSON, an
         }, # if there was no checkin for a hour-day block it will not be in the dataset
     }
 
-It is worth repeating the comment at the bottom of this snippet:
+请特别注意此段代码底部的注释:
 
        If there was no checkin for a hour-day block it will not be in the dataset. 
 
-The element names that you see in the `checkin_info` are unknown upfront and can vary for every row. The data, although simple, is highly dynamic data. To analyze the data there is no need to first represent this dataset in a flattened relational structure, as you would using any other SQL on Hadoop technology.
+你在 `checkin_info` 中看到的元素名称预先未知,并且每行都可能不同。数据虽然简单,但却是高度动态的数据。想要分析数据,无需在 Hadoop 平台上那样,需要先以扁平结构表示数据集,然后才能使用 SQL 类工具。

Review comment:
       "预先未知" > "事先是未知的"

##########
File path: _docs/zh/tutorials/050-analyzing-highly-dynamic-datasets.md
##########
@@ -5,9 +5,9 @@ parent: "教程"
 lang: "zh"
 ---
 
-Today’s data is dynamic and application-driven. The growth of a new era of business applications driven by industry trends such as web, social, mobile, and Internet of Things are generating datasets with new data types and new data models. These applications are iterative, and the associated data models typically are semi-structured, schema-less and constantly evolving. Semi-structured data models can be complex/nested, schema-less, and capable of having varying fields in every single row and of constantly evolving as fields get added and removed frequently to meet business requirements.
+大数据是动态的并应用驱动的。互联网时代商业软件的发展由不同的产业端所驱动,如网页端,媒体端,移动端,物联网。他们所生成的数据集,包含了新的数据类型和模型。这些应用都是交互式的,他们所关联的数据模型一般都是半结构化,schema-less,以及持续变化。半结构化数据模型可以是复杂/嵌套或 schema-less,并且能够在每一行中含有不同的字段,为满足业务需求,字段会频繁修改。

Review comment:
       "并应用驱动" > "并由应用驱动"
   "互联网时代商业软件" > "互联网时代的商业软件"
   "持续变化" > "持续变化的"
   "含有不同" > "包含不同"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


