From 9502ab5f90de468e89d6edfffa77c11b1301dcd0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 1 Apr 2021 18:46:18 +0200
Subject: [PATCH] [7.x] [DOCS] Adds helpers section to Python book

---
 docs/guide/helpers.asciidoc | 94 +++++++++++++++++++++++++++++++++++++
 docs/guide/index.asciidoc   |  4 +-
 2 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 docs/guide/helpers.asciidoc

diff --git a/docs/guide/helpers.asciidoc b/docs/guide/helpers.asciidoc
new file mode 100644
index 00000000..4c7b5f40
--- /dev/null
+++ b/docs/guide/helpers.asciidoc
@@ -0,0 +1,94 @@
+[[client-helpers]]
+== Client helpers
+
+Here you can find a collection of simple helper functions that abstract some
+specifics of the raw API. For detailed examples, refer to
+https://elasticsearch-py.readthedocs.io/en/stable/helpers.html[this page].
+
+
+[discrete]
+[[bulk-helpers]]
+=== Bulk helpers
+
+There are several helpers for the bulk API since its requirement for specific
+formatting and other considerations can make it cumbersome if used directly.
+
+All bulk helpers accept an instance of the `{es}` class and an iterable
+`action` (any iterable, including a generator, which is ideal in most cases
+since it lets you index large datasets without loading them into
+memory).
+
+The items in the iterable `action` are the documents to index. They can take
+several formats. The most common one is the same as returned by `search()`, for
+example:
+
+[source,yml]
+----------------------------
+{
+  '_index': 'index-name',
+  '_type': 'document',
+  '_id': 42,
+  '_routing': 5,
+  'pipeline': 'my-ingest-pipeline',
+  '_source': {
+    "title": "Hello World!",
+    "body": "..."
+  }
+}
+----------------------------
+
+Alternatively, if `_source` is not present, the helper pops all metadata fields
+from the doc and uses the rest as the document data:
+
+[source,yml]
+----------------------------
+{
+  "_id": 42,
+  "_routing": 5,
+  "title": "Hello World!",
+  "body": "..."
+}
+----------------------------
+
+The `bulk()` API accepts `index`, `create`, `delete`, and `update` actions. Use
+the `_op_type` field to specify an action (`_op_type` defaults to `index`):
+
+[source,yml]
+----------------------------
+{
+  '_op_type': 'delete',
+  '_index': 'index-name',
+  '_type': 'document',
+  '_id': 42,
+}
+{
+  '_op_type': 'update',
+  '_index': 'index-name',
+  '_type': 'document',
+  '_id': 42,
+  'doc': {'question': 'The life, universe and everything.'}
+}
+----------------------------
+
+
+[discrete]
+[[scan]]
+=== Scan
+
+A simple abstraction on top of the `scroll()` API: an iterator that yields
+all hits as returned by the underlying scroll requests.
+
+By default, scan does not return results in any predetermined order. To have a
+standard order in the returned documents (either by score or explicit sort
+definition) when scrolling, use `preserve_order=True`. This may be an expensive
+operation and will negate the performance benefits of using `scan`.
+
+
+[source,py]
+----------------------------
+scan(es,
+    query={"query": {"match": {"title": "python"}}},
+    index="orders-*",
+    doc_type="books"
+)
+----------------------------
\ No newline at end of file
diff --git a/docs/guide/index.asciidoc b/docs/guide/index.asciidoc
index ac154d2f..efb052b9 100644
--- a/docs/guide/index.asciidoc
+++ b/docs/guide/index.asciidoc
@@ -14,4 +14,6 @@ include::configuration.asciidoc[]
 
 include::integrations.asciidoc[]
 
-include::examples.asciidoc[]
\ No newline at end of file
+include::examples.asciidoc[]
+
+include::helpers.asciidoc[]
\ No newline at end of file
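
Note on the bulk action formats described in the new helpers section: the rules for `_source`, metadata fields, and `_op_type` can be sketched in plain Python. This is only an illustration under stated assumptions, not the client's implementation. The names `generate_actions`, `split_action`, and `METADATA_FIELDS` are hypothetical, and the metadata list is limited to the fields that appear in the patch's examples.

```python
# Hypothetical illustration of the action formats described in the patch.
# This is NOT the library's code; the names below are made up for this sketch.

# Assumption: only the metadata fields shown in the patch's examples.
METADATA_FIELDS = ("_index", "_type", "_id", "_routing", "pipeline")


def generate_actions(titles, index_name="index-name"):
    """Lazily yield one action per title, so the dataset never sits in memory."""
    for doc_id, title in enumerate(titles, start=1):
        # No `_source` key: everything that is not metadata is document data.
        yield {"_index": index_name, "_id": doc_id, "title": title}


def split_action(action):
    """Split an action into (op_type, metadata, document data)."""
    action = dict(action)  # work on a copy, leave the caller's dict untouched
    op_type = action.pop("_op_type", "index")  # `_op_type` defaults to `index`
    metadata = {key: action.pop(key) for key in METADATA_FIELDS if key in action}
    # Use `_source` when present, otherwise whatever is left after popping metadata.
    source = action.pop("_source", action)
    return op_type, metadata, source
```

A generator such as `generate_actions` could be passed directly to `elasticsearch.helpers.bulk(es, generate_actions(titles))`, where `es` is a connected client instance. `split_action` exists here only to make the metadata rule concrete.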