dask

Parallel/distributed computing. Scale pandas/NumPy beyond memory, parallel DataFrames/Arrays, multi-file processing, task graphs, for larger-than-RAM datasets and parallel workflows.

Updated: 2026/06/17

Author: Admin

Visibility: Public

Install command

npx skills add dask

---
name: dask
description: "Parallel/distributed computing. Scale pandas/NumPy beyond memory, parallel DataFrames/Arrays, multi-file processing, task graphs, for larger-than-RAM datasets and parallel workflows."
---

# Dask

## Overview

Dask is a Python library for parallel and distributed computing that enables three critical capabilities:
- **Larger-than-memory execution** on single machines for data exceeding available RAM
- **Parallel processing** for improved computational speed across multiple cores
- **Distributed computation** supporting terabyte-scale datasets across multiple machines

Dask scales from laptops (processing ~100 GiB) to clusters (processing ~100 TiB) while maintaining familiar Python APIs.

## When to Use This Skill

Use this skill to:
- Process datasets that exceed available RAM
- Scale pandas or NumPy operations to larger datasets
- Parallelize computations for performance improvements
- Process multiple files efficiently (CSVs, Parquet, JSON, text logs)
- Build custom parallel workflows with task dependencies
- Distribute workloads across multiple cores or machines
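
A workflow with task dependencies might look like the minimal sketch below. It assumes `dask.delayed` is the mechanism used; the `load`, `total`, and `combine` functions are hypothetical stand-ins for real work.

```python
from dask import delayed

# Hypothetical steps; each call is recorded in a task graph instead of running.
@delayed
def load(n):
    return list(range(n))

@delayed
def total(values):
    return sum(values)

@delayed
def combine(a, b):
    return a + b

# The two branches are independent of each other, so a scheduler may run
# them in parallel; combine() depends on both.
left = total(load(4))
right = total(load(3))
answer = combine(left, right)

# Work only happens here.
result = answer.compute()
```

Because each branch depends only on its own `load` call, the graph makes the available parallelism explicit without any manual thread or process management.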

## Core Capabilities

Dask provides five main components, each suited to different use cases:

### 1. DataFrames - Parallel Pandas Operations

**Purpose**: Scale pandas operations to larger datasets through parallel processing.

**When to Use**:
- Tabular data exceeds available RAM
- Need to process multiple CSV/Parquet files together
- Pandas operations are slow and need parallelization
- Scaling from pandas prototype to production

**Reference Documentation**: For comprehensive guidance on Dask DataFrames, refer to `references/dataframes.md` which includes:
- Reading data (single files, multiple files, glob patterns)
- Common operations (filtering, groupby, joins, aggregations)
- Custom operations with `map_partitions`
- Performance optimization tips
- Common patterns (ETL, time series, multi-file processing)

**Quick Example**:
```python
import dask.dataframe as dd

# Read multiple files as single DataFrame
ddf = dd.read_csv('data/2024-*.csv')

# Operations are lazy until compute()
filtered = ddf[ddf['value'] > 100]
# Select the numeric column so the mean does not fail on non-numeric columns
result = filtered.groupby('category')['value'].mean().compute()
```

**Key Points**:
- Operations are lazy (they build a task graph) until `.compute()` is called
- Use `map_partitions` for efficient custom operations
- Convert to a DataFrame early when working with structured data from other sources

### 2. Arrays - Parallel NumPy Operations

**Purpose**: Extend NumPy capabilities to datasets larger than memory using blocked algorithms.
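
The idea behind a blocked algorithm can be sketched in plain NumPy. The example below is not Dask's implementation; `blocked_mean` and its `chunk_size` parameter are hypothetical names used to show how a result is assembled from per-chunk partial results.

```python
import numpy as np

def blocked_mean(data: np.ndarray, chunk_size: int) -> float:
    """Compute a mean one chunk at a time, keeping only running totals."""
    total = 0.0
    count = 0
    for start in range(0, data.shape[0], chunk_size):
        block = data[start:start + chunk_size]  # one chunk at a time
        total += float(block.sum())
        count += block.size
    return total / count

values = np.arange(10, dtype=np.float64)
chunked = blocked_mean(values, chunk_size=3)
```

Each chunk contributes only a sum and a count, so the full array never has to be processed in one step, and independent chunks could in principle be handled in parallel.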

**When to Use**:
-

Preview truncated. Download the full skill package to view the complete file contents.
