晋江文学城

3. The Data Ecosystem and Languages for Data Professionals Summa ...

  •   (1) A data analyst ecosystem includes the infrastructure, software, tools, frameworks, and processes used to gather, clean, analyze, and visualize data.
      (2) Based on how well defined its structure is, data can be categorized as:
      1. Structured data: data that is well organized in formats that can be stored in databases.
      2. Semi-structured data: data that is partially organized and partially free-form, such as JSON or XML documents, which use keys or tags but do not require a fixed schema.
      3. Unstructured data: data that cannot be conventionally organized into rows and columns, such as free text, images, and video.
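A minimal Python sketch of the three categories, using made-up sample data (all field names and values are hypothetical). Structured data parses into rows that all share the same columns. Semi-structured JSON records carry their own keys, which can differ from record to record. Unstructured text has no fields at all, so any structure has to be extracted from it.

```python
import csv
import io
import json
import re

# Structured: every row has the same, fixed set of columns (hypothetical sample).
structured_text = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
columns = {tuple(row.keys()) for row in rows}  # one schema shared by all rows

# Semi-structured: records are key/value pairs, but the keys (and nesting)
# can differ between records.
semi_text = (
    '[{"id": 1, "name": "Ana"},'
    ' {"id": 2, "name": "Ben", "tags": ["vip"], "address": {"city": "Oslo"}}]'
)
records = json.loads(semi_text)
key_sets = [sorted(rec.keys()) for rec in records]

# Unstructured: free text has no rows, columns, or keys; any structure
# (here, e-mail addresses) must be pulled out of it, e.g. with a regex.
unstructured_text = "Ana wrote to ben@example.com about the Oslo trip."
emails = re.findall(r"[\w.]+@[\w.]+\.\w+", unstructured_text)

print(columns)   # a single column tuple shared by every row
print(key_sets)  # a different key set per record
print(emails)
```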
      (3) Data comes in a wide variety of file formats, such as delimited text files, spreadsheets, XML, PDF, and JSON, each with its own benefits and limitations.
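To illustrate how a format shapes the parsing step, this Python sketch reads the same two hypothetical records from tab-delimited text, JSON, and XML using only the standard library. Values come back from the delimited text and the XML as strings, while JSON keeps numbers as numbers: one small example of the trade-offs between formats. Spreadsheets and PDFs usually need third-party libraries and are not shown.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same two records in three text-based formats (hypothetical sample data).
tsv_text = "product\tqty\nwidget\t3\ngadget\t5\n"
json_text = '[{"product": "widget", "qty": 3}, {"product": "gadget", "qty": 5}]'
xml_text = """<orders>
  <order><product>widget</product><qty>3</qty></order>
  <order><product>gadget</product><qty>5</qty></order>
</orders>"""

# Delimited text: the delimiter must be known; every value is read as a string.
from_tsv = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# JSON: types such as integers survive parsing.
from_json = json.loads(json_text)

# XML: values live in element text, so they are strings as well.
root = ET.fromstring(xml_text)
from_xml = [{child.tag: child.text for child in order} for order in root.findall("order")]

print(from_tsv)   # qty values are strings
print(from_json)  # qty values are integers
print(from_xml)   # qty values are strings
```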
      (4) Data is extracted from multiple sources, ranging from relational and non-relational databases to APIs, web services, data streams, social platforms, and sensor devices.
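A minimal sketch, in Python, of pulling records from several kinds of sources into one common shape. The sources are stand-ins assumed for illustration. An in-memory SQLite database plays the relational source. A hard-coded JSON string plays the response body of a hypothetical web API. A generator plays a sensor data stream. A real pipeline would use a database driver, an HTTP client, and a streaming consumer instead.

```python
import itertools
import json
import sqlite3
from typing import Dict, Iterator, List


def from_relational() -> List[Dict]:
    """Relational source: rows are queried with SQL (in-memory SQLite as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    conn.close()
    return [{"source": "db", "id": row_id, "value": name} for row_id, name in rows]


def from_api() -> List[Dict]:
    """API source: a JSON response body (hard-coded, hypothetical payload)."""
    payload = '{"results": [{"user_id": 3, "username": "Cy"}]}'
    return [
        {"source": "api", "id": item["user_id"], "value": item["username"]}
        for item in json.loads(payload)["results"]
    ]


def sensor_stream() -> Iterator[float]:
    """Stream source: readings arrive one at a time (simulated with a generator)."""
    yield from (21.5, 21.7, 22.0)


def from_stream(limit: int) -> List[Dict]:
    """Take at most `limit` readings from the (possibly unbounded) stream."""
    readings = itertools.islice(sensor_stream(), limit)
    return [{"source": "sensor", "id": n, "value": r} for n, r in enumerate(readings)]


# Normalize everything into one list of records that share the same keys.
gathered = from_relational() + from_api() + from_stream(limit=2)
for record in gathered:
    print(record)
```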
      (5) Once data has been identified and gathered from different sources, it needs to be staged in a data repository so that it can be prepared for analysis. The type, format, and sources of the data influence which type of data repository is suitable.
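As a minimal sketch of staging, assume an in-memory SQLite database stands in for the data repository. A real repository might be a relational database, a data warehouse, or a data lake, depending on the data. Raw records are first landed unchanged in a staging table. The preparation step then cleans that staged copy rather than touching the sources. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Raw records as they might arrive from two sources (hypothetical sample):
# inconsistent casing, stray whitespace, and duplicates across sources.
raw_records = [
    ("crm", "  Ana ", "lisbon"),
    ("crm", "Ben", "OSLO"),
    ("web", "ana", "Lisbon"),
    ("web", "Ben", "Oslo"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the data repository

# 1. Stage: land the raw data unchanged, recording where each row came from.
conn.execute("CREATE TABLE staging_customers (source TEXT, name TEXT, city TEXT)")
conn.executemany("INSERT INTO staging_customers VALUES (?, ?, ?)", raw_records)

# 2. Prepare: clean the staged copy (trim, normalize case, deduplicate)
#    into a table that is ready for analysis.
conn.execute(
    """
    CREATE TABLE customers AS
    SELECT DISTINCT lower(trim(name)) AS name, lower(trim(city)) AS city
    FROM staging_customers
    """
)

prepared = conn.execute("SELECT name, city FROM customers ORDER BY name").fetchall()
print(prepared)
conn.close()
```

Keeping the raw staging table separate means the cleaning rules can be changed and re-run without going back to the original sources.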
      (6) Data professionals use a range of languages to extract, prepare, and analyze data. These can be classified as:
      1. Query languages, such as SQL, for accessing and manipulating data in databases.
      2. Programming languages, such as Python, R, and Java, for developing applications and controlling application behavior.
      3. Shell and scripting languages, such as the Unix/Linux shell and PowerShell, for automating repetitive operational tasks.
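The difference between a query language and a programming language can be illustrated by answering one question twice. The question is total sales per region over a hypothetical sample table. The first answer uses SQL, run through Python's built-in sqlite3 module. The second uses plain Python. The SQL version states the result it wants and leaves the execution strategy to the database engine, while the Python version spells out the loop itself. Shell scripting, the third category, is not shown in this sketch.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) pairs.
sales = [("north", 120.0), ("south", 80.0), ("north", 30.0), ("east", 50.0)]

# Query language: declare the result you want; the database works out how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
conn.close()

# Programming language: spell out each step of the aggregation explicitly.
py_totals = defaultdict(float)
for region, amount in sales:
    py_totals[region] += amount

print(sql_totals)
print(dict(py_totals))
```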
