Annual Archive: 2019 (Page 2)
Fuchsia · June 29, 2019

Fuchsia is not Linux

A modular, capability-based operating system

This document is a collection of articles describing the Fuchsia operating system, organized around particular subsystems. Sections will be populated over time.

Zircon Kernel

Zircon is the microkernel underlying the rest of Fuchsia. Zircon also provides core drivers and Fuchsia’s libc implementation.

  • Concepts
  • System Calls
  • vDSO (libzircon)

Zircon Core

  • Device Manager & Device Hosts
  • Device Driver Model (DDK)
  • C Library (libc)
  • POSIX I/O (libfdio)
  • Process Creation

Framework

  • Core Libraries
  • Application model
    • Interface definition language (FIDL)
    • Services
    • Environments
  • Boot sequence
  • Device, user, and story runners
  • Components
  • Namespaces
  • Sandboxing
  • Story
  • Module
  • Agent

Storage

  • Block devices
  • File systems
  • Directory hierarchy
  • Ledger
  • Document store
  • Application cache

Networking

  • Ethernet
  • Wireless
  • Bluetooth
  • Telephony
  • Sockets
  • HTTP

Graphics

  • Magma (vulkan driver)
  • Escher (physically-based renderer)
  • Scenic (compositor)
  • Input manager
  • Flutter (UI toolkit)

Components

  • Component framework

Media

  • Audio
  • Video
  • DRM

Intelligence

  • Context
  • Agent Framework
  • Suggestions

User interface

  • Device, user, and story shells
  • Stories and modules

Backwards compatibility

  • POSIX lite (what subset of POSIX we support and why)
  • Web runtime

Update and recovery

  • Verified boot
  • Updater

More references

Author: east
Fuchsia · June 29, 2019

Development

This document is a top-level entry point to all of the Fuchsia documentation related to developing Fuchsia and software running on Fuchsia.

Developer workflow

This section describes the workflows and tools for building, running, testing, and debugging Fuchsia and programs running on Fuchsia.

  • Getting started – start here. This document covers getting the source, building and running Fuchsia.
  • Source code
  • fx workflows
  • Multiple device setup
  • Pushing a package
  • Changes that span layers
  • Debugging
  • LibFuzzer-based fuzzing
  • Build system
  • Workflow tips and FAQ
  • Testing FAQ

Languages

  • README – Language usage in Fuchsia
  • C/C++
  • Dart
  • FIDL
  • Go
  • Rust
  • Python
  • Flutter modules – how to write a graphical module using Flutter
  • New language – how to bring a new language to Fuchsia

API

  • README – Developing APIs for Fuchsia
  • Council – Definition of the API council
  • System – Rubric for designing the Zircon System Interface
  • FIDL API – Rubric for designing FIDL protocols
  • FIDL style – FIDL style rubric
  • C – Rubric for designing C library interfaces
  • Tools – Rubrics for designing developer tools
  • Devices – Rubric for designing device interfaces

ABI

  • System – Describes scope of the binary-stable Fuchsia System Interface

SDK

  • SDK – information about developing the Fuchsia SDK

Hardware

This section covers Fuchsia development hardware targets.

  • Acer Switch Alpha 12
  • Intel NUC (also this)
  • Pixelbook

Testing

  • Test components
  • Test environments
  • Testability rubrics
  • Test flake policy
  • Testing Isolated Cache Storage

Conventions

This section covers Fuchsia-wide conventions and best practices.

  • Documentation standards
  • Endian Issues and recommendations

Tracing

  • Tracing homepage
  • Tracing Quick-Start Guide
  • Tracing tutorial
  • Tracing usage guide
  • Trace based benchmarking
  • Tracing booting Fuchsia
  • CPU Performance Monitor

Miscellaneous

  • CTU analysis in Zircon
  • Component Inspection
Author: east
Fuchsia · June 29, 2019

Fuchsia

Pink + Purple == Fuchsia (a new Operating System)

Welcome to Fuchsia! This document has everything you need to get started with Fuchsia.

Note: The Fuchsia source includes Zircon, the core platform that underpins Fuchsia. The Fuchsia build process will build Zircon as a side-effect; to work on Zircon only, read and follow Zircon’s Getting Started doc.

Prerequisites

Prepare your build environment (once per build environment)

Debian

sudo apt-get install build-essential curl git python unzip

macOS

  1. Install Command Line Tools: xcode-select --install
  2. In addition to Command Line Tools, you also need to install a recent version of Xcode.

Get the Source

Follow the instructions to get the Fuchsia source and then return to this document.

Build Fuchsia

Note: A quick overview of the basic build-and-pave workflow can be found here.

Build

If you added .jiri_root/bin to your path as part of getting the source code, the fx command should already be in your path. If not, the command is also available as scripts/fx.

fx set core.x64 --with //bundles:kitchen_sink
fx build

The first command selects the build configuration you wish to build and generates the build system itself in an output directory (e.g., out/x64). Fuchsia can ephemerally download packages over the network; here we use the --with flag to make the necessary packages covered in this guide available for download.

The second command actually executes the build, transforming the source code into build products. If you modify the source tree, you can do an incremental build by re-running the fx build command alone. fx -i build starts a watcher and automatically rebuilds whenever a file changes.

Alternatively, you can use the underlying build system directly.

[optional] Customize Build Environment

By default you will get an x64 debug build. You can skip this section unless you want something else.

Run fx set to see a list of build options. Some examples:

fx set workstation.x64     # x64 debug build
fx set core.arm64          # arm64 debug build
fx set core.x64 --release  # x64 release build

[optional] Accelerate builds with ccache and goma

ccache accelerates builds by caching artifacts from previous builds. ccache is enabled automatically if the CCACHE_DIR environment variable is set and refers to a directory that exists.

[Googlers only: goma accelerates builds by distributing compilation across many machines. If you have goma installed in ~/goma, it is used by default. It is also used by default in preference to ccache.]

To override the default behaviors, pass flags to fx set:

--ccache     # force use of ccache even if goma is available
--no-ccache  # disable use of ccache
--no-goma    # disable use of goma

Boot Fuchsia

Installing and booting from hardware

To get Fuchsia running on hardware requires using the paver, which these instructions will help you get up and running with. Note: A quick overview of the basic build-and-pave workflow can be found here.

Boot from QEMU

If you don’t have the supported hardware, you can run Fuchsia under emulation using QEMU. Fuchsia includes prebuilt binaries for QEMU under buildtools/qemu.

The fx run command will launch Zircon within QEMU, using the locally built disk image:

fx run

There are various flags for fx run to control QEMU’s configuration:

  • -m sets QEMU’s memory size in MB.
  • -g enables graphics (see below).
  • -N enables networking (see below).
  • -k enables KVM acceleration on Linux.

Use fx run -h to see all available options.

QEMU tips

  • ctrl+a x will exit QEMU in text mode.
  • ctrl+a ? or ctrl+a h prints all supported commands.

Enabling Graphics

Note: Graphics under QEMU are extremely limited due to a lack of Vulkan support. Only the Zircon UI renders.

To enable graphics under QEMU, add the -g flag to fx run:

fx run -g

Enabling Network

First, configure a virtual interface for QEMU’s use.

Once this is done you can add the -N and -u flags to fx run:

fx run -N -u scripts/start-dhcp-server.sh

The -u flag runs a script that sets up a local DHCP server and NAT to configure the IPv4 interface and routing.

Explore Fuchsia

In a separate shell, start the development update server, if it isn’t already running:

fx serve -v

Boot Fuchsia with networking. This can be done either in QEMU via the -N flag, or on paved hardware, both described above. When Fuchsia has booted and displays the “$” shell prompt, you can run programs!

For example, to receive deep wisdom, run:

fortune

To shutdown or reboot Fuchsia, use the dm command:

dm help
dm shutdown

Change some source

Almost everything that exists on a Fuchsia system is stored in a Fuchsia package. A typical development workflow involves re-building and pushing Fuchsia packages to a development device or QEMU virtual device.

Make a change to the rolldice binary in garnet/bin/rolldice/src/main.rs.

Re-build and push the rolldice package to a running Fuchsia device with:

fx build-push rolldice

From a shell prompt on the Fuchsia device, run the updated rolldice component with:

rolldice

Select a tab

Fuchsia shows multiple tabs after booting with graphics enabled. The currently selected tab is highlighted in yellow at the top of the screen. You can switch to the next tab using Alt-Tab on the keyboard.

  • Tab zero is the console and displays the boot and component log.
  • Tabs 1, 2 and 3 contain shells.
  • Tabs 4 and higher contain components you’ve launched.

Note: to select tabs, you may need to enter “console mode”. See the next section for details.

Launch a graphical component

QEMU does not support Vulkan and therefore cannot run our graphics stack.

Most graphical components in Fuchsia use the Scenic system compositor. You can launch such components, commonly found in /system/apps, like this:

launch spinning_square_view

Source code for Scenic example apps is here.

When you launch something that uses Scenic, uses hardware-accelerated graphics, or if you build the default package (which will boot into the Fuchsia System UI), Fuchsia will enter “graphics mode”, which will not display any of the text shells. In order to use the text shell, you will need to enter “console mode” by pressing Alt-Escape. In console mode, Alt-Tab will have the behavior described in the previous section, and pressing Alt-Escape again will take you back to the graphical shell.

If you would like to use a text shell inside a terminal emulator from within the graphical shell, you can launch the term by selecting the “Ask Anything” box and typing moterm.

Running tests

Compiled test binaries are installed in /pkgfs/packages/, and are referenced by a URI. You can run a test by invoking it in the terminal. E.g.

run fuchsia-pkg://fuchsia.com/ledger_tests#meta/ledger_unittests.cmx

If you want to leave Fuchsia running and recompile and re-run a test, run Fuchsia with networking enabled in one terminal, then in another terminal, run:

fx run-test <test name> [<test args>]

You may wish to peruse the testing FAQ.

Contribute changes

  • See CONTRIBUTING.md.

Additional helpful documents

  • Fuchsia documentation hub
  • Working with Zircon – copying files, network booting, log viewing, and more
  • Documentation Standards – best practices for documentation
  • Information on the system bootstrap component.
  • Workflow tips and FAQ that help increase productivity.
Author: east
Fuchsia · June 29, 2019

Glossary

Overview

The purpose of this document is to provide short definitions for a collection of technical terms used in Fuchsia.

Adding new definitions

  • A definition should provide a high-level description of a term and in most cases should not be longer than two or three sentences.
  • When another non-trivial technical term is needed as part of a description, consider adding a definition for that term and linking to it from the original definition.
  • A definition should be complemented by a list of links to more detailed documentation and related topics.

Terms

Agent

An agent is a role a component can play to execute in the background in the context of a session. An agent’s life cycle is not tied to any story, is a singleton per session, and provides services to other components. An agent can be invoked by other components or by the system in response to triggers like push notifications. An agent can provide services to components, send and receive messages, and make proposals to give suggestions to the user.

AppMgr

The Application Manager (AppMgr) is responsible for launching components and managing the namespaces in which those components run. It is the first process started in the fuchsia job by the DevMgr.

Banjo

Banjo is a language for defining protocols that are used to communicate between drivers. It is different from FIDL in that it specifies an ABI for drivers to use to call into each other, rather than an IPC protocol.

Base shell

The platform-guaranteed set of software functionality which provides a basic user-facing interface for boot, first-use, authentication, escape from and selection of session shells, and device recovery.

bootfs

The bootfs RAM disk contains the files needed early in the boot process when no other filesystems are available. It is part of the ZBI, and is decompressed and served by bootsvc. After the early boot process is complete, the bootfs is mounted at /boot.

  • Documentation

bootsvc

bootsvc is the second process started in Fuchsia. It provides a filesystem service for the bootfs and a loader service that loads programs from the same bootfs. After starting these services, it loads the third program, which defaults to devmgr.

  • Documentation

Bus Driver

A driver for a device that has multiple children. For example, hardware interfaces like PCI specify a topology in which a single controller is used to interface with multiple devices connected to it. In that situation, the driver for the controller would be a bus driver.

Cache directory

Similar to a data directory, except that the contents of a cache directory may be cleared by the system at any time, such as when the device is under storage pressure. Canonically mapped to /cache in the component instance’s namespace.

  • Testing isolated cache storage.

Capability

A capability is a value which combines an object reference and a set of rights. When a program has a capability it is conferred the privilege to perform certain actions using that capability. A handle is a common example of a capability.

Capability routing

A way for one component to give capabilities to another instance over the component instance tree. Component manifests define how routing takes place, with syntax for service capabilities, directory capabilities, and storage capabilities.

Capability routing is a components v2 concept.

expose

A component instance may use the expose manifest keyword to indicate that it is making a capability available to its parent to route. Parents may offer a capability exposed by any of their children to their other children or to their parent, but they cannot use it themselves in order to avoid dependency cycles.

offer

A component instance may use the offer manifest keyword to route a capability that was exposed to it to one of its children (other than the child that exposed it).

use

A component instance may use the use manifest keyword to consume a capability that was offered to it by its parent.

Channel

A channel is an IPC primitive provided by Zircon. It is a bidirectional, datagram-like transport that can transfer small messages including Handles. FIDL protocols typically use channels as their underlying transport.

  • Channel Overview

Component

A component is a unit of executable software on Fuchsia. Components support capability routing, software composition, isolation boundaries, continuity between executions, and introspection.

Component collection

A node in the component instance tree whose children are dynamically instantiated rather than statically defined in a component manifest.

Component collection is a components v2 concept.

Component declaration

A component declaration is a FIDL table (fuchsia.sys2.ComponentDecl) that includes information about a component’s runtime configuration, capabilities it exposes, offers, and uses, and facets.

Component declaration is a components v2 concept.

Component Framework

An application framework for declaring and managing components, consisting of build tools, APIs, conventions, and system services.

  • Components v1, Components v2

Component instance

One of possibly many instances of a particular component at runtime. A component instance has its own environment and lifecycle independent of other instances.

Component instance tree

A tree structure that represents the runtime state of parent-child relationships between component instances. If instance A launches instance B then in the tree A will be the parent of B. The component instance tree is used in static capability routing such that parents can offer capabilities to their children to use, and children can expose capabilities for their parents to expose to their parents or offer to other children.

Component instance tree is a components v2 concept.

Component Manager

A system service which lets component instances manage their children and routes capabilities between them, thus implementing the component instance tree. Component Manager is the system service that implements the components v2 runtime.

Component Manifest

In Components v1, a component manifest is a JSON file with a .cmx extension that contains information about a component’s runtime configuration, services and directories it receives in its namespace, and facets.

In Components v2, a component manifest is a file with a .cm extension that encodes a component declaration.

  • Component manifests v2

Component Manifest Facet

Additional metadata that is carried in a component manifest. This is an extension point to the component framework.

Components v1

A shorthand for the Component Architecture as first implemented on Fuchsia. Includes a runtime as implemented by appmgr and sysmgr, protocols and types as defined in fuchsia.sys, build-time tools such as cmc, and SDK libraries such as libsys and libsvc.

  • Components v2

Components v2

A shorthand for the Component Architecture in its modern implementation. Includes a runtime as implemented by component_manager, protocols and types as defined in fuchsia.sys2, and build-time tools such as cmc.

  • Components v1

Concurrent Device Driver

A concurrent device driver is a hardware driver that supports multiple concurrent operations. This may be, for example, through a hardware command queue or multiple device channels. From the perspective of the core driver, the device has multiple pending operations, each of which completes or fails independently. If the driven device can internally parallelize an operation, but can only have one operation outstanding at a time, it may be better implemented with a sequential device driver.

Core Driver

A core driver is a driver that implements the application-facing RPC interface for a class of drivers (e.g. block drivers, ethernet drivers). It is hardware-agnostic. It communicates with a hardware driver through banjo to service its requests.

Data directory

A private directory within which a component instance may store data local to the device, canonically mapped to /data in the component instance’s namespace.

DevHost

A Device Host (DevHost) is a process containing one or more device drivers. They are created by the Device Manager, as needed, to provide isolation between drivers for stability and security.

DevMgr

The Device Manager (DevMgr) is responsible for enumerating, loading, and managing the life cycle of device drivers, as well as low level system tasks (providing filesystem servers for the boot filesystem, launching AppMgr, and so on).

DDK

The Driver Development Kit is the documentation, APIs, and ABIs necessary to build Zircon Device Drivers. Device drivers are implemented as ELF shared libraries loaded by Zircon’s Device Manager.

  • DDK Overview
  • DDK includes

Directory capability

A capability that permits access to a filesystem directory by adding it to the namespace of the component instance that uses it. If multiple component instances are offered the same directory capability then they will have access to the same underlying filesystem directory.

Directory capability is a components v2 concept.

  • Capability routing

Driver

A driver is a dynamic shared library which DevMgr can load into a DevHost and that enables and controls one or more devices.

  • Reference
  • Driver Sources

Environment

A container for a set of components, which provides a way to manage their lifecycle and provision services for them. All components in an environment receive access to (a subset of) the environment’s services.

Escher

Graphics library for compositing user interface content. Its design is inspired by modern real-time and physically based rendering techniques though we anticipate most of the content it renders to have non-realistic or stylized qualities suitable for user interfaces.

FAR

The Fuchsia Archive Format is a container for files to be used by Zircon and Fuchsia.

  • FAR Spec

FBL

FBL is the Fuchsia Base Library, which is shared between kernel and userspace.

  • Zircon C++

fdio

fdio is the Zircon IO Library. It provides the implementation of posix-style open(), close(), read(), write(), select(), poll(), etc., against the RemoteIO RPC protocol. These APIs are return-not-supported stubs in libc, and linking against libfdio overrides these stubs with functional implementations.

  • Source

FIDL

The Fuchsia Interface Definition Language (FIDL) is a language for defining protocols that are typically used over channels. FIDL is programming language agnostic and has bindings for many popular languages, including C, C++, Dart, Go, and Rust. This approach lets system components written in a variety of languages interact seamlessly.

Flutter

Flutter is a functional-reactive user interface framework optimized for Fuchsia and is used by many system components. Flutter also runs on a variety of other platforms, including Android and iOS. Fuchsia itself does not require you to use any particular language or user interface framework.

Fuchsia API Surface

The Fuchsia API Surface is the combination of the Fuchsia System Interface and the client libraries included in the Fuchsia SDK.

Fuchsia Package

A Fuchsia Package is a unit of software distribution. It is a collection of files, such as manifests, metadata, zero or more executables (e.g. Components), and assets. Individual Fuchsia Packages can be identified using fuchsia-pkg URLs.

fuchsia-pkg URL

The fuchsia-pkg URL scheme is a means for referring to a repository, a package, or a package resource. The syntax is fuchsia-pkg://<repo-hostname>[/<pkg-name>][#<path>]. E.g., for the component echo_client_dart.cmx published under the package echo_dart‘s meta directory, from the fuchsia.com repository, its URL is fuchsia-pkg://fuchsia.com/echo_dart#meta/echo_client_dart.cmx.

Fuchsia SDK

The Fuchsia SDK is a collection of libraries and tools that the Fuchsia project provides to Fuchsia developers. Among other things, the Fuchsia SDK contains a definition of the Fuchsia System Interface as well as a number of client libraries.

Fuchsia System Interface

The Fuchsia System Interface is the binary interface that the Fuchsia operating system presents to software it runs. For example, the entry points into the vDSO as well as all the FIDL protocols are part of the Fuchsia System Interface.

Fuchsia Volume Manager

Fuchsia Volume Manager (FVM) is a partition manager providing dynamically allocated groups of blocks known as slices into a virtual block address space. The FVM partitions provide a block interface enabling filesystems to interact with it in a manner largely consistent with a regular block device.

  • Filesystems

GN

GN is a meta-build system which generates build files so that Fuchsia can be built with Ninja. GN is fast and comes with solid tools to manage and explore dependencies. GN files, named BUILD.gn, are located all over the repository.

  • Language and operation
  • Reference
  • Fuchsia build overview

Handle

A Handle is how a userspace process refers to a kernel object. They can be passed to other processes over Channels.

  • Reference

Hardware Driver

A hardware driver is a driver that controls a device. It receives requests from its core driver and translates them into hardware-specific operations. Hardware drivers strive to be as thin as possible. They do not support RPC interfaces, ideally have no local worker threads (though that is not a strict requirement), and some will have interrupt handling threads. They may be further classified into sequential device drivers and concurrent device drivers.

Hub

The hub is a portal for introspection. It enables tools to access detailed structural information about realms and component instances at runtime, such as their names, job and process ids, and published services.

  • Hub

Jiri

Jiri is a tool for multi-repo development. It is used to check out the Fuchsia codebase. It supports various subcommands which make it easy for developers to manage their local checkouts.

  • Reference
  • Sub commands
  • Behaviour
  • Tips and tricks

Job

A Job is a kernel object that groups a set of related processes, their child processes and their jobs (if any). Every process in the system belongs to a job and all jobs form a single rooted tree.

  • Job Overview

Kernel Object

A kernel object is a kernel data structure which is used to regulate access to system resources such as memory, i/o, processor time and access to other processes. Userspace can only reference kernel objects via Handles.

  • Reference

KOID

A Kernel Object Identifier.

  • Kernel Object

Ledger

Ledger is a distributed storage system for Fuchsia. Applications use Ledger either directly or through state synchronization primitives exposed by the Modular framework that are based on Ledger under-the-hood.

LK

Little Kernel (LK) is the embedded kernel that formed the core of the Zircon Kernel. LK is more microcontroller-centric and lacks support for MMUs, userspace, system calls — features that Zircon added.

  • LK on Github

Module

A module is a role a component can play to participate in a story. Every component can be used as a module, but typically a module is asked to show UI. Additionally, a module can have a module metadata file which describes the Module’s data compatibility and semantic role.

  • Module metadata format

Musl

Fuchsia’s standard C library (libc) is based on Musl Libc.

  • Source
  • Musl Homepage

Namespace

A namespace is the composite hierarchy of files, directories, sockets, services, and other named objects which are offered to components by their environment.

  • Fuchsia Namespace Spec

Netstack

An implementation of TCP, UDP, IP, and related networking protocols for Fuchsia.

Ninja

Ninja is the build system executing Fuchsia builds. It is a small build system with a strong emphasis on speed. Unlike other systems, Ninja files are not supposed to be manually written but should be generated by other systems, such as GN in Fuchsia.

  • Manual
  • Ninja rules in GN
  • Fuchsia build overview

Outgoing directory

A file system directory where a component may expose capabilities for others to use.

Package

Package is an overloaded term. Package may refer to a Fuchsia Package or a GN build package.

Paver

A tool in Zircon that installs partition images to internal storage of a device.

  • Guide for installing Fuchsia with paver.

Platform Source Tree

The Platform Source Tree is the open source code hosted on fuchsia.googlesource.com, which comprises the source code for Fuchsia. A given Fuchsia system can include additional software from outside the Platform Source Tree by adding the appropriate Fuchsia Package.

Realm

In components v1, realm is synonymous with environment.

In components v2, a realm is a subtree of component instances in the component instance tree. It acts as a container for component instances and capabilities in the subtree.

Scenic

The system compositor. Includes views, input, compositor, and GPU services.

Sequential Device Driver

A sequential device driver is a hardware driver that will only service a single request at a time. The core driver synchronizes and serializes all requests.

Service

A service is an implementation of a FIDL interface. Components can offer their creator a set of services, which the creator can either use directly or offer to other components.

Services can also be obtained by interface name from a Namespace, which lets the component that created the namespace pick the implementation of the interface. Long-running services, such as Scenic, are typically obtained through a Namespace, which lets many clients connect to a common implementation.

Service capability

A capability that permits communicating with a service over a channel using a specified FIDL protocol. The server end of the channel is held by the component instance that provides the capability. The client end of the channel is given to the component instance that uses the capability.

  • Capability routing

Service capability is a components v2 concept.

Session

An interactive session with one or more users. Has a session shell, which manages the UI for the session, and zero or more stories. A device might have multiple sessions, for example if users can interact with the device remotely or if the device has multiple terminals.

Session Shell

The replaceable set of software functionality that works in conjunction with devices to create an environment in which people can interact with mods, agents and suggestions.

Storage capability

A storage capability is a capability that allocates per-component isolated storage for a designated purpose within a filesystem directory. Multiple component instances may be given the same storage capability, but underlying directories that are isolated from each other will be allocated for each individual use. This is different from directory capabilities, where a specific filesystem directory is routed to a specific component instance.

Isolation is achieved because Fuchsia does not support dotdot.

There are three types of storage capabilities:

  • data: a directory is added to the namespace of the component instance that uses the capability. Acts as a data directory.
  • cache: same as data, but acts as a cache directory.
  • meta: a directory is allocated to be used by component manager, where it will store metadata to enable features like persistent component collections.

Storage capability is a components v2 concept.

  • Capability routing

Story

A user-facing logical container encapsulating human activity, satisfied by one or more related modules. Stories allow users to organize activities in ways they find natural, without developers having to imagine all those ways ahead of time.

Story Shell

The system responsible for the visual presentation of a story. Includes the presenter component, plus structure and state information read from each story.

userboot

userboot is the first process started by the Zircon kernel. It is loaded from the kernel image in the same way as the vDSO, instead of being loaded from a filesystem. Its primary purpose is to load the second process, bootsvc, from the bootfs.

  • Documentation

Virtual Dynamic Shared Object

The Virtual Dynamic Shared Object (vDSO) is a Virtual Shared Library — it is provided by the Zircon kernel and does not appear in the filesystem or a package. It provides the Zircon System Call API/ABI to userspace processes in the form of an ELF library that’s “always there.” In the Fuchsia SDK and Zircon DDK it exists as libzircon.so for the purpose of having something to pass to the linker representing the vDSO.

Virtual Memory Address Range

A Virtual Memory Address Range (VMAR) is a Zircon kernel object that controls where and how Virtual Memory Objects may be mapped into the address space of a process.

  • VMAR Overview

Virtual Memory Object

A Virtual Memory Object (VMO) is a Zircon kernel object that represents a collection of pages (or the potential for pages) which may be read, written, mapped into the address space of a process, or shared with another process by passing a Handle over a Channel.

  • VMO Overview

Zircon Boot Image

A Zircon Boot Image (ZBI) contains everything needed during the boot process before any drivers are working. This includes the kernel image and a RAM disk for the boot filesystem.

  • ZBI header file

Zedboot

Zedboot is a recovery image that is used to install and boot a full Fuchsia system. Zedboot is actually an instance of the Zircon kernel with a minimal set of drivers and services running used to bootstrap a complete Fuchsia system on a target device. Upon startup, Zedboot listens on the network for instructions from a bootserver which may instruct Zedboot to install a new OS. Upon completing the installation Zedboot will reboot into the newly installed system.

Zircon

Zircon is the microkernel and lowest level userspace components (driver runtime environment, core drivers, libc, etc) at the core of Fuchsia. In a traditional monolithic kernel, many of the userspace components of Zircon would be part of the kernel itself.

  • Zircon Documentation
  • Zircon Concepts
  • Source

ZX

ZX is an abbreviation of “Zircon” used in Zircon C APIs/ABIs (zx_channel_create(), zx_handle_t, ZX_EVENT_SIGNALED, etc) and libraries (libzx in particular).

ZXDB

The native low-level system debugger.

  • Reference
Author: east
Fuchsia · June 29, 2019

Fuchsia Code of Conduct

Google and the Fuchsia team are committed to preserving and fostering a diverse, welcoming community. Below is our community code of conduct, which applies to our repos and organizations, mailing lists, blog content, IRC channels, and any other Fuchsia-supported communication group, as well as any private communication initiated in the context of these spaces. In short, community discussions should be:

  • respectful and kind;
  • about Fuchsia;
  • about features and code, not the individuals involved.

Be respectful and constructive.

Treat everyone with respect. Build on each other's ideas. Each of us has the right to enjoy our experience and participation without fear of harassment, discrimination, or condescension, whether blatant or subtle. Remember that Fuchsia is a geographically distributed team and that you may not be communicating with someone in their primary language. We all get frustrated when working on hard problems, but we cannot allow that frustration to turn into personal attacks.

If you see or hear something, speak up.

You are empowered to politely engage when you feel that you or others are disrespected. The person making you feel uncomfortable may not be aware of what they are doing; politely bringing their behavior to their attention is encouraged.

If you are uncomfortable or believe your concerns are not being duly considered, you can email fuchsia-community-managers@google.com to request involvement from a community manager. All concerns shared with community managers will be kept confidential, but you may also submit an anonymous report. Note that without a way to contact you, an anonymous report may be difficult to act on. You may also create a throwaway account to report. In cases where a public response is deemed necessary, the identities of victims and reporters will remain confidential unless those individuals instruct us otherwise.

While all reports will be taken seriously, the Fuchsia community managers may not act on complaints that they determine do not violate this code of conduct.

We will not tolerate harassment of any kind, including but not limited to:

  • harassing comments
  • intimidation
  • encouraging a person to engage in self-harm
  • sustained disruption or derailing of threads, channels, lists, etc.
  • offensive or violent comments, jokes, or otherwise
  • inappropriate sexual content
  • unwelcome sexual or otherwise aggressive attention
  • continued one-on-one communication after a request to cease
  • distribution or threat of distribution of people's personally identifying information, a.k.a. "doxing"

Consequences for failing to comply with this policy

Consequences for failing to comply with this policy may include, at the sole discretion of the Fuchsia community managers:

  • a request for an apology;
  • a private or public warning or reprimand;
  • a temporary ban from the mailing list, blog, Fuchsia repository or organization, or other Fuchsia-supported communication group, including loss of committer status;
  • a permanent ban from any of the above, or from all current and future Fuchsia-supported or Google-supported communities, including loss of committer status.

Participants warned to stop any harassing behavior are expected to comply immediately; failure to do so will result in an escalation of consequences. Decisions by the Fuchsia community managers may be appealed via fuchsia-community-appeals@google.com.

Acknowledgements

This code of conduct is adapted from the Chromium code of conduct, which is based on the Geek Feminism Code of Conduct, the Django Code of Conduct, and the Geek Feminism Wiki's "Effective codes of conduct" guide.

Author: east
Fuchsia · June 29, 2019

Fuchsia Documentation

This README.md document is the top-level entry point to the Fuchsia documentation.

  • Code of conduct
  • Glossary – definitions of commonly used terms
  • Getting started – everything you need to get started with Fuchsia
  • Development – instructions for building, running, and testing Fuchsia and software that runs on Fuchsia
  • System – documentation for how Fuchsia works
  • Run an example – instructions for running examples on a device

Contributing changes

Other files in this repository are system-wide documentation articles for Fuchsia. Individual subprojects have their own documentation within each project repository. The articles above link to individual documents in both the system-wide repository and the individual project repositories.

Author: east
Big Data Development · May 21, 2019

Big data development interview questions and answers: databases

Default ports for MySQL, MongoDB, and Redis.

However well you describe a database in an interview, not knowing its default port suggests you lack hands-on experience. MySQL: 3306, MongoDB: 27017, Redis: 6379.

Author: east
Big Data Development · May 21, 2019

Big data development interview questions and answers: Kafka

How does Kafka avoid losing data and avoid duplicating data, and where is Kafka's data actually stored?

In an interview yesterday I was asked how Kafka avoids losing and duplicating data.

First, how do you avoid consuming the same data twice?

When consuming from Kafka we generally use ZooKeeper to track the consumer offsets while something like Spark Streaming does the actual consuming. So how is duplicate consumption avoided? Suppose the job has been consuming for a while and then goes down, and Spark Streaming has to be brought back up to continue consuming. Does it start again from the beginning? No, because Kafka keeps an offset, the consumer's position. As Spark Streaming consumes data from Kafka, it periodically records that offset in ZooKeeper, i.e. how far consumption has progressed. When the job is restarted after a crash, it looks up the last recorded offset in ZooKeeper and resumes consuming from there.

Second, how do you avoid losing data?

Messages in Kafka have a retention period, 7 days by default, and it can be configured. Any data still within that window can be consumed again.

Suppose Spark Streaming has been consuming and processing data from Kafka, and only after a full day of data has been processed do you discover that the processing logic is wrong. What then? The badly processed batch must be discarded and re-consumed. How? Because the data is still retained in Kafka within the 7-day window, you can specify from-beginning (i.e. reset the offsets) and consume it again.
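The article describes the older ZooKeeper-based offset tracking. With the current Kafka Java client, the same at-least-once idea (process first, commit the offset afterwards) looks roughly like the sketch below; the broker address, group id, topic name, and process() method are made up for illustration.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker address
        props.put("group.id", "etl-group");               // hypothetical consumer group
        props.put("enable.auto.commit", "false");         // commit offsets only after processing
        props.put("auto.offset.reset", "earliest");       // a new group starts from the beginning
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));  // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());   // your processing logic
                }
                // Committing only after successful processing gives at-least-once semantics:
                // a crash before the commit means the batch is re-read, never silently lost.
                consumer.commitSync();
            }
        }
    }

    private static void process(String value) {
        System.out.println(value);
    }
}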

Another question that comes up in Kafka interviews:

Where is data stored in Kafka?

The answer: data in Kafka is stored in segments within each partition.

The hierarchy in Kafka is topic → partition → segment.

The number of partitions for a topic is specified when the topic is created. Each segment stores a data (log) file and an index file.

How big is the Kafka cluster, and what is the consumption throughput?

Answer: a typical small or medium-sized company runs about 10 nodes, at roughly 20 MB per second.

Author: east
Big Data Development · May 21, 2019

Big data development interview questions and answers: Java

There are four ways to create threads in Java (a sketch of all four follows below):

    1. Extend the Thread class and override the run method.
    2. Implement the Runnable interface, override the run method, and pass an instance of the implementing class as the target of a Thread constructor.
    3. Create a thread with Callable and FutureTask.
    4. Create threads through a thread pool.
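A minimal runnable sketch of the four approaches; the class names and printed messages are only illustrative.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

public class ThreadCreationDemo {

    // 1. Subclass Thread and override run()
    static class MyThread extends Thread {
        @Override
        public void run() {
            System.out.println("extends Thread");
        }
    }

    // 2. Implement Runnable and pass it as the Thread target
    static class MyRunnable implements Runnable {
        @Override
        public void run() {
            System.out.println("implements Runnable");
        }
    }

    // 3. Implement Callable (returns a value) and wrap it in a FutureTask
    static class MyCallable implements Callable<Integer> {
        @Override
        public Integer call() {
            return 42;
        }
    }

    public static void main(String[] args) throws Exception {
        new MyThread().start();                           // way 1

        new Thread(new MyRunnable()).start();             // way 2

        FutureTask<Integer> task = new FutureTask<>(new MyCallable());
        new Thread(task).start();                         // way 3
        System.out.println("callable result: " + task.get());

        // 4. Use a thread pool
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(new MyRunnable());
        pool.shutdown();
    }
}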

Author: east
Big Data Development · May 21, 2019

Big data development interview questions and answers: data warehousing

Explain your understanding of data cleaning: what it does, why it is needed, what counts as dirty data, and how dirty data is handled.

Data cleaning is the process of re-examining and validating data in order to remove duplicate information, correct existing errors, and provide data consistency.

As the name suggests, data cleaning means "washing off the dirt": it is the final step for finding and correcting recognizable errors in data files, including checking data consistency and handling invalid and missing values. The data in a data warehouse is a subject-oriented collection extracted from multiple business systems and includes historical data, so it is inevitable that some records are wrong and some conflict with each other. These erroneous or conflicting records are not what we want and are called "dirty data". We "wash off" the dirty data according to defined rules; that is data cleaning. The task of data cleaning is to filter out the data that does not meet requirements and hand the filtered results to the responsible business department, which decides whether the data should be discarded or corrected by the business unit and then re-extracted. Data that does not meet requirements falls mainly into three categories: incomplete data, erroneous data, and duplicate data. Unlike questionnaire auditing, post-entry data cleaning is generally done by computers rather than by hand.

Main categories

Incomplete data
This is mainly data with missing information that should be present, such as a missing supplier name, branch name, or customer region, or main and detail tables in a business system that cannot be matched. Such data is filtered out, written into separate Excel files according to what is missing, and submitted to the customer to be completed within a set time. It is written into the data warehouse only after it has been completed.

Erroneous data
These errors arise because the business system is not robust enough and writes input directly to the backend database without validation: numeric values entered as full-width digit characters, string data followed by a carriage return, incorrect date formats, out-of-range dates, and so on. This data also needs to be classified. Problems such as full-width characters or invisible characters around the data can only be found by writing SQL, after which the customer is asked to correct them in the business system before re-extraction. Incorrect or out-of-range dates will cause the ETL run to fail; these need to be picked out of the business system database with SQL and handed to the responsible business department with a deadline for correction, and then re-extracted.

Duplicate data
For this category, which appears especially in dimension tables, export all fields of the duplicated records and have the customer confirm and reconcile them.

Data cleaning is an iterative process that cannot be finished in a few days; problems are found and resolved continuously. Whether to filter or to correct generally requires customer confirmation. Filtered data can be written to Excel files or to a dedicated table; in the early stages of ETL development you can email the filtered data to the business units every day, which pushes them to correct errors quickly and also serves as a basis for validating the data later. Take care not to filter out useful data: verify every filtering rule carefully and have the users confirm it.

The above is the Baidu Baike answer; what follows is my own understanding.

In short:

Data cleaning (ETL) means turning incoming data into clean data.

The main steps are:

First, the data has to be received.

Format conversion may then be involved; Logstash, for example, is one way to convert structured data into JSON.

For offline processing, the incoming dirty data also has to be stored, usually on HDFS, and the cleaning is generally done with MapReduce.

In a streaming framework the data is received and then processed. We generally use Kafka to receive the data and Spark Streaming as the consumer to process it; the cleaning is done inside that Spark Streaming job.

After cleaning, the clean data needs to be stored; because the volume is large, HDFS is usually chosen.

At that point the data is considered loaded and can be analyzed, visualized, or used to train AI models.

That is the rough flow of the data processing pipeline.

The work of the cleaning stage is to turn the dirty data into clean data. How exactly? And what counts as dirty data?

Dirty data includes:

1. Duplicate data.

2. Incomplete data, i.e. records in which a field, or part of a field, is missing.

3. Erroneous data, i.e. fields whose values are obviously wrong.

How should these values be handled?

Different data and different uses call for different handling. The usual approach is to delete or filter the records, i.e. not store them the next time data is written to the database. Another is to fill in the incomplete data, which requires rules; a common method is to average the value with its neighbors.

For duplicate data, use distinct to deduplicate.

If the data is incomplete and will later be used to train machine-learning models, a very large amount of data is needed; when the volume is not large enough, the incomplete values have to be repaired, commonly by averaging the neighboring values, or by copying the preceding or following value.

Erroneous data needs to be corrected, for example by fixing the wrong values. (A small Spark sketch of these rules follows below.)
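As a rough illustration of the rules above (deduplicate, fill or drop incomplete fields, filter obviously wrong values), here is a minimal Spark batch job using the Java API; the column names, thresholds, and HDFS paths are invented for the example.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class CleaningJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("etl-cleaning")
                .getOrCreate();

        // Hypothetical raw ("dirty") data landed on HDFS as JSON.
        Dataset<Row> raw = spark.read().json("hdfs:///raw/events/");

        Dataset<Row> cleaned = raw
                .dropDuplicates()                         // duplicate records
                .na().drop(new String[]{"user_id"})       // incomplete: required field missing
                .na().fill(0.0, new String[]{"amount"})   // incomplete: fill with a default value
                .filter(col("amount").geq(0));            // erroneous: obviously invalid values

        cleaned.write().mode("overwrite").parquet("hdfs:///clean/events/");
        spark.stop();
    }
}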

If the data is only used for visualization, it is enough to keep the bad records from being displayed on the front end, and there are a few ways to handle erroneous data:

One is to add a check when the API queries the database and skip fields that do not meet requirements; that is the last-resort, clumsiest option. The smarter option is to run ETL cleaning before the data is loaded a second time, so the dirty data is handled up front. That is exactly the job of ETL.

Author: east
Big Data Development · May 20, 2019

Big data development interview questions and answers: algorithms

There are 10 files of 1 GB each. Every line of every file holds a user query, and queries may repeat across files. Sort the queries by frequency. This is the classic top-K problem. Solutions:
    1) Scheme 1:
    Read the 10 files sequentially and write each query into one of 10 new files according to hash(query) % 10. Each new file is then also about 1 GB (assuming the hash is uniform). On a machine with about 2 GB of memory, use a hash_map(query, query_count) to count how many times each query appears in each file, then sort by count with quicksort/heapsort/merge sort and write the sorted queries and their counts out, giving 10 sorted files. Finally merge-sort those 10 files (combining internal and external sorting).
    2) Scheme 2:
    The total number of distinct queries is usually limited; they just repeat many times, so they may all fit in memory at once. In that case use a trie or hash_map to count each query directly, then sort by count with quicksort/heapsort/merge sort.
    3) Scheme 3:
    Like scheme 1, but after hashing into multiple files, hand the files to multiple machines and process them with a distributed framework (e.g. MapReduce), then merge the results at the end.

Find the integers that appear only once among 2.5 billion integers; there is not enough memory to hold all of them.
  1) Scheme 1: use a 2-bit bitmap (2 bits per value: 00 = absent, 01 = seen once, 10 = seen more than once, 11 = unused), which needs 2^32 * 2 bits = 1 GB of memory, an acceptable amount. Scan the 2.5 billion integers and update the corresponding bits (00 becomes 01, 01 becomes 10, 10 stays 10). After the scan, output the integers whose entry is 01.
  2) Scheme 2: as in the previous question, partition the data into small files, find the non-repeated integers in each small file and sort them, then merge while removing duplicates.

Tencent interview question: given 4 billion distinct, unsorted unsigned ints and one more number, how do you quickly determine whether that number is among the 4 billion?
  1) Scheme 1: allocate 512 MB of memory and let one bit represent each unsigned int value. Read the 4 billion numbers and set the corresponding bits; then read the query number and check its bit: 1 means present, 0 means absent.
  2) Scheme 2: this problem is described well in Programming Pearls. Since 2^32 is a little over 4 billion, a given number may or may not be present. Represent each of the 4 billion numbers in 32-bit binary, with all of them starting in one file. Split them into two classes:
1. highest bit is 0
2. highest bit is 1
    and write each class to its own file; one file then holds at most 2 billion numbers and the other at least 2 billion (the search is halved). Compare the highest bit of the number you are looking for and continue into the corresponding file, then split that file again by the next-highest bit:
1. next-highest bit is 0
2. next-highest bit is 1
    again writing the two classes to separate files of at most/at least 1 billion numbers (halving again), and compare the next-highest bit to choose which file to search.
…..
    Continue in this way until the number is found; the time complexity is O(log n).
  3) A brief note on the bitmap method: using a bitmap to check whether an integer array contains duplicates is a common programming task. When the data set is large we want to avoid repeated scans, so a double loop is out.
    The bitmap approach fits this situation: create a new array of length max + 1, where max is the largest element of the set, then scan the original array and set position i when you encounter the value i (e.g. seeing 5 sets the sixth element to 1). If you encounter 5 again and find the sixth element already set, the data contains a duplicate. Initializing the new array to zeros and then setting ones is analogous to how bitmaps work, hence the name. The worst case is 2N operations; if the maximum value is known in advance and the new array can be sized up front, efficiency roughly doubles.
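A small Java sketch of that bitmap idea, using java.util.BitSet so each possible value costs one bit rather than one array slot; the sample arrays are made up.

import java.util.BitSet;

public class BitmapDuplicateCheck {
    // Returns true if the non-negative int array contains a duplicate,
    // using one bit per possible value instead of a double loop.
    static boolean hasDuplicate(int[] data, int maxValue) {
        BitSet seen = new BitSet(maxValue + 1);
        for (int x : data) {
            if (seen.get(x)) {
                return true;      // bit already set: x appeared before
            }
            seen.set(x);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDuplicate(new int[]{3, 7, 5, 7}, 10)); // true
        System.out.println(hasDuplicate(new int[]{1, 2, 4, 8}, 10)); // false
    }
}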

How do you find the single most frequently repeated item in a massive data set?
  1) Scheme 1: hash first, then map the items into small files by taking the hash modulo the number of files; find the most frequent item and its count in each small file; the most frequent among those per-file winners is the answer (see the earlier questions for details).

Tens of millions or hundreds of millions of records (with duplicates): find the N most frequent ones.
  1) Scheme 1: at this scale the data should fit in the memory of a modern machine, so use a hash_map / binary search tree / red-black tree to count occurrences, then take the top N using the heap technique mentioned in question 2.

A text file of about ten thousand lines, one word per line: find the 10 most frequent words; give the approach and a time-complexity analysis.
  1) Scheme 1: this question is about time efficiency. Count each word's occurrences with a trie, which is O(n*le) where le is the average word length; then find the 10 most frequent words with a heap, as in the earlier questions, which is O(n*lg10). The total time complexity is the larger of O(n*le) and O(n*lg10).

Find the largest 100 numbers among 1,000,000 numbers.
  1) Scheme 1: as mentioned earlier, use a min-heap of 100 elements; the complexity is O(1,000,000 * lg100). (A heap-based sketch follows below.)
  2) Scheme 2: use the quicksort partitioning idea: after each partition keep only the side larger than the pivot; once that side shrinks to just over 100 elements, sort it with a conventional algorithm and take the first 100. Complexity O(1,000,000 * 100).
  3) Scheme 3: partial elimination. Take the first 100 elements and sort them into a sequence L; then scan the remaining elements one at a time, comparing each x with the smallest of the 100. If x is larger, delete that smallest element and insert x into L with insertion sort. Repeat until all elements have been scanned. Complexity O(1,000,000 * 100).
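A sketch of scheme 1 in Java, using a PriorityQueue as the 100-element min-heap; the random test data is only for illustration.

import java.util.PriorityQueue;
import java.util.Random;

public class Top100 {
    // Keep the 100 largest values seen so far in a min-heap of size 100.
    static int[] top100(int[] data) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(100);
        for (int x : data) {
            if (heap.size() < 100) {
                heap.offer(x);
            } else if (x > heap.peek()) {   // larger than the current 100th largest
                heap.poll();
                heap.offer(x);
            }
        }
        int[] result = new int[heap.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = heap.poll();        // drained in ascending order
        }
        return result;
    }

    public static void main(String[] args) {
        int[] data = new Random(42).ints(1_000_000).toArray();
        int[] top = top100(data);
        System.out.println("smallest of the top 100: " + top[0]);
    }
}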

Ten million text messages, with duplicates, saved in a text file, one per line. In five minutes, find the 10 messages that repeat most often.
  1) Analysis: the routine method is to sort first and then traverse once to find the top 10, but sorting costs at least n*lgn.

  2) Instead, build a hash table, hash_map<string, int>: read the ten million messages one by one, load them into the table while counting repetitions, and at the same time maintain a table of the current top 10. A single traversal then finds the top 10, for a complexity of O(n).

Author: east
Big Data Development · May 20, 2019

Big data development interview questions and answers: Hadoop

What is the difference between fsimage and edits?
  Everyone knows the relationship between the NameNode and the Secondary NameNode: when they synchronize their data (a checkpoint), fsimage and edits come into play. fsimage holds the latest snapshot of the metadata; once it reaches a certain size, a new file is created to record subsequent metadata changes, and that file is the edit log, which captures the most recent updates.

Name some configuration-file optimizations (open-ended):
  1) Core-site.xml optimizations
    a. fs.trash.interval, default 0: enables moving deleted HDFS files to a trash directory instead of removing them immediately; the value is the trash retention time in minutes. Enabling it is generally a good idea to guard against accidentally deleting important files.
    b. dfs.namenode.handler.count, default 10: the number of handler threads the Hadoop system starts; change it to 40, and experiment with the value to find the most suitable setting for your workload.
    c. mapreduce.tasktracker.http.threads, default 40: map and reduce exchange data over HTTP; this sets the number of parallel transfer threads.

When a datanode first joins the cluster, if the log reports an incompatible file version, the namenode supposedly needs to be formatted. Why is it handled this way?
  1) Handling it that way is unreasonable, because formatting the namenode formats the file system: it clears all files under the two directories beneath dfs/name and then creates new files under dfs.name.dir.
  2) The incompatibility is likely because the namespaceID/clusterID in the namenode's and datanode's data do not match; find the two IDs and make them identical to resolve the problem.

In which phases of MapReduce does sorting happen? Can it be avoided? Why?
  1) A MapReduce job consists of a Map phase and a Reduce phase, and both sort the data; in that sense the MapReduce framework is essentially a distributed sort.
  2) In the Map phase, each Map Task writes a file to local disk sorted by key (using quicksort); multiple intermediate files may be produced, but they are eventually merged into one. In the Reduce phase, each Reduce Task sorts the data it receives, so the data ends up grouped by key and is handed to reduce() group by group.
  3) Many people mistakenly believe the Map phase does not sort unless a Combiner is used. That is wrong: whether or not you use a Combiner, the Map Task sorts its output (unless there are no Reduce Tasks, in which case it does not sort; the Map-side sort exists mainly to reduce the sorting load on the Reduce side).
  4) Because this sorting is done automatically by MapReduce and cannot be controlled by the user, it cannot be avoided or disabled in Hadoop 1.x, but it can be disabled in Hadoop 2.x.

Hadoop optimizations?
  1) Optimization can come from the configuration files, the system, and the design of the code.
  2) Configuration-file optimization: tune the relevant parameters, and test while tuning.
  3) Code optimization: keep the number of combiners close to the number of reducers, and keep data types consistent to reduce packing and unpacking overhead.
  4) System optimization: raise the maximum number of open files on Linux and plan the network bandwidth and MTU configuration.
  5) Add a Combiner to the job; this greatly reduces the amount of data the shuffle phase copies from the map tasks to the remote reduce tasks. The combiner is generally the same class as the reducer (a driver-level sketch follows this list).
  6) In code, prefer StringBuffer over String: String is read-only, so modifying one produces temporary objects, whereas StringBuffer is mutable and does not.
  7) Configuration changes (the following modify mapred-site.xml):
    a. Raise the maximum slot counts; they are set per tasktracker in mapred-site.xml and default to 2:
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
</property>
    b. Adjust the heartbeat interval: for clusters smaller than 300 nodes, set the heartbeat to 300 ms
mapreduce.jobtracker.heartbeat.interval.min: the heartbeat interval
mapred.heartbeats.in.second: how many nodes must be added to the cluster before the interval grows by the value below
mapreduce.jobtracker.heartbeat.scaling.factor: how much the heartbeat grows for each such increment of nodes
    c. Enable out-of-band heartbeats
mapreduce.tasktracker.outofband.heartbeat, default false
    d. Configure multiple disks
mapreduce.local.dir
    e. Configure the number of RPC handlers
mapred.job.tracker.handler.count, default 10; can be raised to 50 depending on the machine
    f. Configure the number of HTTP threads
tasktracker.http.threads, default 40; can be raised to 100 depending on the machine
    g. Choose a suitable compression codec, taking Snappy as an example:
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
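A rough driver-level sketch of point 5 plus the Snappy setting from 7g, using a standard word count as a stand-in job and the newer mapreduce.* property names rather than the deprecated mapred.* names shown above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress intermediate map output with Snappy (same intent as the XML above).
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "wordcount-with-combiner");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);  // combiner == reducer: pre-aggregates map output
        job.setReducerClass(SumReducer.class);   // and cuts the data copied during shuffle
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path from the command line
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}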

Advantages of Hadoop

Hadoop is a distributed computing platform that is easy for users to set up and use. Users can easily develop and run applications on Hadoop that process massive amounts of data. Its main advantages are:
High reliability: Hadoop's bit-by-bit storage and processing of data can be trusted.
High scalability: Hadoop distributes data and computation across clusters of machines that can easily grow to thousands of nodes.
High efficiency: Hadoop can move data dynamically between nodes and keeps them balanced, so processing is very fast.
High fault tolerance: Hadoop automatically keeps multiple replicas of data and automatically reassigns failed tasks.
Low cost: compared with all-in-one machines, commercial data warehouses, and data marts such as QlikView or Yonghong Z-Suite, Hadoop is open source, which greatly reduces a project's software cost.

The three run modes of Hadoop

  1. Standalone (local) mode: no daemons are needed; everything runs in a single JVM. Standalone mode is very convenient for debugging MR programs, so it is mainly used for learning or for debugging during development.
  2. Pseudo-distributed mode: the Hadoop daemons run on the local machine, simulating a small cluster; in other words, a single machine is configured as a Hadoop cluster. Pseudo-distributed mode is a special case of fully distributed mode.
  3. Fully distributed mode: the Hadoop daemons run on a cluster. Note: "distributed" implies running the daemons, i.e. when using distributed Hadoop you must first start the daemon processes (for example start-dfs.sh and start-yarn.sh) before you can use it, whereas local mode does not need these daemons.

Author: east
