Removed Elastic Doc Reference

Signed-off-by: Shephali Mittal <shephalm@amazon.com>
Shephali Mittal
2021-08-06 17:16:53 +05:30
committed by shephali mittal
parent 07c21725e8
commit 064213a9ed
48 changed files with 1 additions and 2608 deletions
-12
@@ -1,12 +0,0 @@
[[config]]
== Configuration
This page contains information about the most important configuration options of
the Python {es} client.
* <<connection-pool>>
* <<connection-selector>>
include::connection-pool.asciidoc[]
include::connection-selector.asciidoc[]
-68
@@ -1,68 +0,0 @@
[[connecting]]
== Connecting
This page contains the information you need to connect the Client with {es}.
[discrete]
[[authentication]]
=== Authentication
This section contains code snippets to show you how to connect to various {es}
providers.
[discrete]
[[auth-ec]]
==== Elastic Cloud
Cloud ID is an easy way to configure your client to work with your Elastic Cloud
deployment. Combine the `cloud_id` with either `http_auth` or `api_key` to
authenticate with your Elastic Cloud deployment.
Using `cloud_id` enables TLS verification and HTTP compression by default and
sets the port to 443 unless overridden via the `port` parameter or the
port value encoded within `cloud_id`. Using Cloud ID also disables sniffing.
[source,py]
----------------------------
from elasticsearch import Elasticsearch
es = Elasticsearch(
    cloud_id="cluster-1:dXMa5Fx..."
)
----------------------------
[discrete]
[[auth-http]]
==== HTTP Authentication
For HTTP authentication, pass a username and password as a tuple to the
`http_auth` parameter:
[source,py]
----------------------------
from elasticsearch import Elasticsearch
es = Elasticsearch(
    http_auth=("username", "password")
)
----------------------------
[discrete]
[[auth-apikey]]
==== ApiKey authentication
You can configure the client to use {es}'s API Key for connecting to your
cluster.
[source,py]
----------------------------
from elasticsearch import Elasticsearch
es = Elasticsearch(
    api_key=("api_key_id", "api_key_secret")
)
----------------------------
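As a rough illustration of what the `api_key` tuple turns into on the wire, the sketch below builds an `Authorization` header by base64-encoding `id:secret`. The helper name `api_key_header` is hypothetical and is not part of the client, which performs this step for you; the encoding shown is assumed from the `ApiKey` scheme documented for {es}.

```python
import base64


def api_key_header(api_key):
    """Build an Authorization header from an (id, secret) tuple."""
    key_id, secret = api_key
    # Join the two parts with a colon, then base64-encode the result.
    raw = f"{key_id}:{secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


headers = api_key_header(("api_key_id", "api_key_secret"))
print(headers)
```

Decoding the token again should give back the original `id:secret` pair, which is a quick way to check a key that the cluster rejects.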
-18
@@ -1,18 +0,0 @@
[[connection-pool]]
=== Connection pool
The connection pool is a container that holds the `Connection` instances and
manages both the selection process (via a `ConnectionSelector`) and dead
connections. Initially, connections are stored in the class as a list and are
passed, along with the connection options, to the `ConnectionSelector` instance
for future reference.
Upon each request, the `Transport` asks for a `Connection` via the
`get_connection` method. If the connection fails, it is marked as dead (via
`mark_dead`) and put on a timeout. When the timeout is over the connection is
resurrected and returned to the live pool. A connection that has been previously
marked as dead and then succeeds is marked as live (its fail count is deleted).
For reference information, refer to the
https://elasticsearch-py.readthedocs.io/en/latest/connection.html#connection-pool[full {es} Python documentation].
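The dead/live bookkeeping described above can be sketched with a toy class. This is a simplified model and not the library's `ConnectionPool`: the class name `TinyPool`, the fixed `dead_timeout`, and the explicit `now` arguments (used to keep the example deterministic) are all assumptions made for illustration.

```python
import time


class TinyPool:
    """Toy model of a pool that puts failed connections on a timeout."""

    def __init__(self, connections, dead_timeout=60.0):
        self.live = list(connections)
        self.dead = {}        # connection -> time at which it may be retried
        self.dead_count = {}  # connection -> number of consecutive failures
        self.dead_timeout = dead_timeout

    def mark_dead(self, conn, now=None):
        now = time.time() if now is None else now
        if conn in self.live:
            self.live.remove(conn)
        self.dead_count[conn] = self.dead_count.get(conn, 0) + 1
        self.dead[conn] = now + self.dead_timeout

    def mark_live(self, conn):
        # A success after earlier failures clears the fail count.
        self.dead_count.pop(conn, None)

    def resurrect(self, now=None):
        now = time.time() if now is None else now
        for conn, retry_at in list(self.dead.items()):
            if retry_at <= now:
                del self.dead[conn]
                self.live.append(conn)

    def get_connection(self, now=None):
        self.resurrect(now)
        if not self.live:
            raise RuntimeError("no live connections")
        return self.live[0]


pool = TinyPool(["node-a", "node-b"], dead_timeout=60.0)
pool.mark_dead("node-a", now=0.0)
first = pool.get_connection(now=10.0)  # node-a is still on its timeout
pool.resurrect(now=61.0)               # the timeout is over, node-a is live again
pool.mark_live("node-a")
```

The sketch always returns the first live connection; in the client that choice is delegated to the selector.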
-20
@@ -1,20 +0,0 @@
[[connection-selector]]
=== Connection selector
The connection selector is a simple class used to select a connection from a
list of currently live connection instances. Initially, it is passed a
dictionary containing all the connection options, which it can then use during
the selection process. When the _select_ method is called, it is given a list of
currently live connections to choose from.
The options dictionary is passed to `Transport` as the hosts parameter and the
same is used to construct the connection object itself. When the connection was
created based on information retrieved from the cluster via the sniffing
process, it is the dictionary returned by the `host_info_callback`.
An example of where this might be useful is a zone-aware selector that only
selects connections from its own zones and falls back to other connections
only when there are none in its own zones.
For reference information, refer to the
https://elasticsearch-py.readthedocs.io/en/latest/connection.html#connection-selector[full {es} Python documentation].
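A zone-aware selector of the kind mentioned above might look like the following sketch. It is a standalone class that only mimics the interface described here (an options dictionary at construction time and a `select` method); the `local_zone` argument and the `"zone"` option key are hypothetical, and plain strings stand in for connection objects.

```python
import random


class ZoneAwareSelector:
    """Prefer connections whose options place them in the local zone."""

    def __init__(self, opts, local_zone):
        # opts maps each connection to the options it was created with.
        self.opts = opts
        self.local_zone = local_zone

    def select(self, connections):
        local = [
            conn for conn in connections
            if self.opts.get(conn, {}).get("zone") == self.local_zone
        ]
        # Fall back to every live connection when no local one is left.
        return random.choice(local or connections)


opts = {
    "conn-1": {"host": "a", "zone": "eu"},
    "conn-2": {"host": "b", "zone": "us"},
}
selector = ZoneAwareSelector(opts, local_zone="eu")
chosen = selector.select(["conn-1", "conn-2"])  # only conn-1 is in "eu"
fallback = selector.select(["conn-2"])          # no local connection is live
```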
-110
@@ -1,110 +0,0 @@
[[examples]]
== Examples
Below you can find examples of how to use the most frequently called APIs with
the Python client.
* <<ex-index>>
* <<ex-get>>
* <<ex-refresh>>
* <<ex-search>>
* <<ex-update>>
* <<ex-delete>>
[discrete]
[[ex-index]]
=== Indexing a document
To index a document, you need to specify three pieces of information: `index`,
`id`, and a `body`:
[source,py]
----------------------------
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()
doc = {
    'author': 'author_name',
    'text': 'Interesting content...',
    'timestamp': datetime.now(),
}
res = es.index(index="test-index", id=1, body=doc)
print(res['result'])
----------------------------
[discrete]
[[ex-get]]
=== Getting a document
To get a document, you need to specify its `index` and `id`:
[source,py]
----------------------------
res = es.get(index="test-index", id=1)
print(res['_source'])
----------------------------
[discrete]
[[ex-refresh]]
=== Refreshing an index
You can perform the refresh operation on an index:
[source,py]
----------------------------
es.indices.refresh(index="test-index")
----------------------------
[discrete]
[[ex-search]]
=== Searching for a document
The `search()` method returns the results that match a query:
[source,py]
----------------------------
res = es.search(index="test-index", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res['hits']['total']['value'])
for hit in res['hits']['hits']:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])
----------------------------
[discrete]
[[ex-update]]
=== Updating a document
To update a document, you need to specify three pieces of information: `index`,
`id`, and a `body`:
[source,py]
----------------------------
from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()
doc = {
    'author': 'author_name',
    'text': 'Interesting modified content...',
    'timestamp': datetime.now(),
}
# The update API expects partial changes under a "doc" key in the request body.
res = es.update(index="test-index", id=1, body={"doc": doc})
print(res['result'])
----------------------------
[discrete]
[[ex-delete]]
=== Deleting a document
You can delete a document by specifying its `index` and `id` in the `delete()`
method:
[source,py]
----------------------------
es.delete(index="test-index", id=1)
----------------------------
-94
@@ -1,94 +0,0 @@
[[client-helpers]]
== Client helpers
Here you can find a collection of simple helper functions that abstract some
specifics of the raw API. For detailed examples, refer to
https://elasticsearch-py.readthedocs.io/en/stable/helpers.html[this page].
[discrete]
[[bulk-helpers]]
=== Bulk helpers
There are several helpers for the bulk API, since its requirement for specific
formatting and other considerations can make it cumbersome to use directly.
All bulk helpers accept an instance of the `{es}` class and an iterable `actions`
(any iterable; it can also be a generator, which is ideal in most cases since it
allows you to index large datasets without loading them into memory).
The items in the `actions` iterable should be the documents you wish to index,
in one of several formats. The most common one is the same as returned by
`search()`, for example:
[source,yml]
----------------------------
{
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    '_routing': 5,
    'pipeline': 'my-ingest-pipeline',
    '_source': {
        "title": "Hello World!",
        "body": "..."
    }
}
----------------------------
Alternatively, if `_source` is not present, the helper pops all metadata fields
from the doc and uses the rest as the document data:
[source,yml]
----------------------------
{
    "_id": 42,
    "_routing": 5,
    "title": "Hello World!",
    "body": "..."
}
----------------------------
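The splitting step described above can be sketched as follows. The function name `split_action` and the `METADATA_KEYS` tuple are hypothetical, and the key list is only a partial, assumed subset of the metadata fields the helpers recognise.

```python
# Assumed, partial list of metadata fields (illustration only).
METADATA_KEYS = ("_index", "_id", "_routing", "_type", "pipeline")


def split_action(action):
    """Return (metadata, document) for one action dictionary."""
    action = dict(action)  # copy, so the caller's dictionary is untouched
    metadata = {key: action.pop(key) for key in METADATA_KEYS if key in action}
    # Use `_source` when present, otherwise whatever is left over.
    document = action.pop("_source", action)
    return metadata, document


meta, doc = split_action(
    {"_id": 42, "_routing": 5, "title": "Hello World!", "body": "..."}
)
print(meta)
print(doc)
```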
The `bulk()` API accepts `index`, `create`, `delete`, and `update` actions. Use
the `_op_type` field to specify an action (`_op_type` defaults to `index`):
[source,yml]
----------------------------
{
    '_op_type': 'delete',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
}
{
    '_op_type': 'update',
    '_index': 'index-name',
    '_type': 'document',
    '_id': 42,
    'doc': {'question': 'The life, universe and everything.'}
}
----------------------------
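Because the helpers accept any iterable, actions are usually produced lazily by a generator. The sketch below shows one way to do that; the function name `generate_actions`, the sample titles, and the sequential ids are assumptions made for illustration, and no cluster is contacted.

```python
def generate_actions(titles, index="index-name"):
    """Lazily yield one bulk action per title."""
    for doc_id, title in enumerate(titles, start=1):
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": doc_id,
            "_source": {"title": title},
        }


actions = generate_actions(["Hello World!", "Goodbye"])
first = next(actions)  # actions are only built when they are requested
print(first)
```

With a running cluster, a generator like this could be handed to the bulk helper directly, for example `bulk(es, generate_actions(titles))`, so the full dataset never has to sit in memory.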
[discrete]
[[scan]]
=== Scan
A simple abstraction on top of the `scroll()` API: a simple iterator that yields
all hits as returned by the underlying scroll requests.
By default scan does not return results in any pre-determined order. To have a
standard order in the returned documents (either by score or explicit sort
definition) when scrolling, use `preserve_order=True`. This may be an expensive
operation and will negate the performance benefits of using `scan`.
[source,py]
----------------------------
scan(es,
    query={"query": {"match": {"title": "python"}}},
    index="orders-*",
    doc_type="books"
)
----------------------------
-21
@@ -1,21 +0,0 @@
= elasticsearch-py
:doctype: book
include::{asciidoc-dir}/../../shared/attributes.asciidoc[]
include::overview.asciidoc[]
include::installation.asciidoc[]
include::connecting.asciidoc[]
include::configuration.asciidoc[]
include::integrations.asciidoc[]
include::examples.asciidoc[]
include::helpers.asciidoc[]
include::release-notes.asciidoc[]
-20
@@ -1,20 +0,0 @@
[[installation]]
== Installation
The Python client for {es} can be installed with pip:
[source,sh]
-------------------------------------
$ python -m pip install elasticsearch
-------------------------------------
If your application uses async/await in Python you can install with the `async`
extra:
[source,sh]
--------------------------------------------
$ python -m pip install elasticsearch[async]
--------------------------------------------
Read more about
https://elasticsearch-py.readthedocs.io/en/master/async.html[how to use Asyncio with this project].
-27
@@ -1,27 +0,0 @@
[[integrations]]
== Integrations
You can find integration options and information on this page.
[discrete]
[[transport]]
=== Transport
The `Transport` class is part of the
https://elasticsearch-py.readthedocs.io/en/latest/connection.html[Connection Layer API],
which contains all the classes that are responsible for handling the connection
to the {es} cluster.
The `Transport` class is an encapsulation of the transport-related logic of the
Python client. For the exhaustive list of parameters, refer to the
https://elasticsearch-py.readthedocs.io/en/latest/connection.html#transport[documentation].
[discrete]
[[transport-classes]]
==== Transport classes
The `Transport` classes can be used to maintain connection with an {es} cluster.
For the reference information of these classes, refer to the
https://elasticsearch-py.readthedocs.io/en/latest/transports.html[documentation].
-121
@@ -1,121 +0,0 @@
[[overview]]
== Overview
This is the official low-level Python client for {es}. Its goal is to provide
common ground for all {es}-related code in Python. For this reason, the client
is designed to be unopinionated and extendable. Full documentation is available
on https://elasticsearch-py.readthedocs.io[Read the Docs].
[discrete]
=== Compatibility
Current development happens in the master branch.
The library is compatible with all Elasticsearch versions since `0.90.x` but you
**have to use a matching major version**:
For **Elasticsearch 7.0** and later, use the major version 7 (`7.x.y`) of the
library.
For **Elasticsearch 6.0** and later, use the major version 6 (`6.x.y`) of the
library.
For **Elasticsearch 5.0** and later, use the major version 5 (`5.x.y`) of the
library.
For **Elasticsearch 2.0** and later, use the major version 2 (`2.x.y`) of the
library, and so on.
The recommended way to set your requirements in your `setup.py` or
`requirements.txt` is:
# Elasticsearch 7.x
elasticsearch>=7,<8
# Elasticsearch 6.x
elasticsearch>=6,<7
# Elasticsearch 5.x
elasticsearch>=5,<6
# Elasticsearch 2.x
elasticsearch>=2,<3
If you need to have multiple versions installed at the same time, older
versions are also released as `elasticsearch2` and `elasticsearch5`.
[discrete]
=== Example use
Simple use-case:
[source,python]
------------------------------------
>>> from datetime import datetime
>>> from elasticsearch import Elasticsearch
# By default we connect to localhost:9200
>>> es = Elasticsearch()
# Datetimes will be serialized...
>>> es.index(index="my-index-000001", doc_type="test-type", id=42, body={"any": "data", "timestamp": datetime.now()})
{'_id': '42', '_index': 'my-index-000001', '_type': 'test-type', '_version': 1, 'ok': True}
# ...but not deserialized
>>> es.get(index="my-index-000001", doc_type="test-type", id=42)['_source']
{'any': 'data', 'timestamp': '2013-05-12T19:45:31.804229'}
------------------------------------
[NOTE]
All the API calls map the raw REST API as closely as possible, including
the distinction between required and optional arguments to the calls. This
means that the code makes a distinction between positional and keyword
arguments; we, however, recommend that people use keyword arguments for all
calls for consistency and safety.
[discrete]
=== Features
The client's features include:
* Translating basic Python data types to and from JSON
* Configurable automatic discovery of cluster nodes
* Persistent connections
* Load balancing (with pluggable selection strategy) across all available nodes
* Failed connection penalization (time based - failed connections won't be
retried until a timeout is reached)
* Thread safety
* Pluggable architecture
The client also contains a convenient set of
https://elasticsearch-py.readthedocs.org/en/master/helpers.html[helpers] for
some of the more engaging tasks like bulk indexing and reindexing.
[discrete]
=== Elasticsearch DSL
For a higher-level client library with a more limited scope, have a look at
https://elasticsearch-dsl.readthedocs.org/[elasticsearch-dsl], a more Pythonic library
sitting on top of `elasticsearch-py`.
It provides a more convenient and idiomatic way to write and manipulate
https://elasticsearch-dsl.readthedocs.org/en/latest/search_dsl.html[queries]. It
stays close to the Elasticsearch JSON DSL, mirroring its terminology and
structure while exposing the whole range of the DSL from Python, either directly
using defined classes or through queryset-like expressions.
It also provides an optional
https://elasticsearch-dsl.readthedocs.org/en/latest/persistence.html#doctype[persistence
layer] for working with documents as Python objects in an ORM-like fashion:
defining mappings, retrieving and saving documents, wrapping the document data
in user-defined classes.
-177
@@ -1,177 +0,0 @@
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
help:
@echo "Please use \`make <target>' where <target> is one of"
@echo " html to make standalone HTML files"
@echo " dirhtml to make HTML files named index.html in directories"
@echo " singlehtml to make a single large HTML file"
@echo " pickle to make pickle files"
@echo " json to make JSON files"
@echo " htmlhelp to make HTML files and a HTML help project"
@echo " qthelp to make HTML files and a qthelp project"
@echo " devhelp to make HTML files and a Devhelp project"
@echo " epub to make an epub"
@echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
@echo " latexpdf to make LaTeX files and run them through pdflatex"
@echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
@echo " text to make text files"
@echo " man to make manual pages"
@echo " texinfo to make Texinfo files"
@echo " info to make Texinfo files and run them through makeinfo"
@echo " gettext to make PO message catalogs"
@echo " changes to make an overview of all changed/added/deprecated items"
@echo " xml to make Docutils-native XML files"
@echo " pseudoxml to make pseudoxml-XML files for display purposes"
@echo " linkcheck to check all external links for integrity"
@echo " doctest to run all doctests embedded in the documentation (if enabled)"
clean:
rm -rf $(BUILDDIR)/*
html:
$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
dirhtml:
$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
@echo
@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
singlehtml:
$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
@echo
@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
pickle:
$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
@echo
@echo "Build finished; now you can process the pickle files."
json:
$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
@echo
@echo "Build finished; now you can process the JSON files."
htmlhelp:
$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
@echo
@echo "Build finished; now you can run HTML Help Workshop with the" \
".hhp project file in $(BUILDDIR)/htmlhelp."
qthelp:
$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Elasticsearch.qhcp"
@echo "To view the help file:"
@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Elasticsearch.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
@echo "# mkdir -p $$HOME/.local/share/devhelp/Elasticsearch"
@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Elasticsearch"
@echo "# devhelp"
epub:
$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
@echo
@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
latex:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo
@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
@echo "Run \`make' in that directory to run these through (pdf)latex" \
"(use \`make latexpdf' here to do that automatically)."
latexpdf:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through pdflatex..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
latexpdfja:
$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
@echo "Running LaTeX files through platex and dvipdfmx..."
$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
text:
$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
@echo
@echo "Build finished. The text files are in $(BUILDDIR)/text."
man:
$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
@echo
@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
texinfo:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo
@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
@echo "Run \`make' in that directory to run these through makeinfo" \
"(use \`make info' here to do that automatically)."
info:
$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
@echo "Running Texinfo files through makeinfo..."
make -C $(BUILDDIR)/texinfo info
@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
gettext:
$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
@echo
@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
changes:
$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
@echo
@echo "The overview file is in $(BUILDDIR)/changes."
linkcheck:
$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
@echo
@echo "Link check complete; look for any errors in the above output " \
"or in $(BUILDDIR)/linkcheck/output.txt."
doctest:
$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
@echo "Testing of doctests in the sources finished, look at the " \
"results in $(BUILDDIR)/doctest/output.txt."
xml:
$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
@echo
@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
pseudoxml:
$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
@echo
@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
-302
@@ -1,302 +0,0 @@
.. _api:
API Documentation
=================
All the API calls map the raw REST API as closely as possible, including the
distinction between required and optional arguments to the calls. This means
that the code makes a distinction between positional and keyword arguments; we,
however, recommend that people **use keyword arguments for all calls for
consistency and safety**.
.. note::
For compatibility with the Python ecosystem we use ``from_`` instead of
``from`` and ``doc_type`` instead of ``type`` as parameter names.
Global Options
--------------
Some parameters are added by the client itself and can be used in all API
calls.
Ignore
~~~~~~
An API call is considered successful (and will return a response) if
Elasticsearch returns a 2XX response. Otherwise an instance of
:class:`~elasticsearch.TransportError` (or a more specific subclass) will be
raised. You can see other exception and error states in :ref:`exceptions`. If
you do not wish an exception to be raised you can always pass in an ``ignore``
parameter with either a single status code that should be ignored or a list of
them:
.. code-block:: python
from elasticsearch import Elasticsearch
es = Elasticsearch()
# ignore 400 caused by IndexAlreadyExistsException when creating an index
es.indices.create(index='test-index', ignore=400)
# ignore 404 and 400
es.indices.delete(index='test-index', ignore=[400, 404])
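The rule described above can be modelled in a few lines. This is only a sketch of the decision logic, not the client's code: ``check_status`` and ``TransportErrorSketch`` are hypothetical names, and the real client returns the response body rather than ``True``.

```python
class TransportErrorSketch(Exception):
    """Stand-in for the client's TransportError (illustration only)."""


def check_status(status, ignore=()):
    """Return True for a 2XX or ignored status, raise otherwise."""
    # ``ignore`` may be a single status code or a list of them.
    if isinstance(ignore, int):
        ignore = (ignore,)
    if 200 <= status < 300 or status in ignore:
        return True
    raise TransportErrorSketch(status)


single = check_status(400, ignore=400)
several = check_status(404, ignore=[400, 404])

try:
    check_status(404)
    raised = False
except TransportErrorSketch:
    raised = True
```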
Timeout
~~~~~~~
A global timeout can be set when constructing the client (see
:class:`~elasticsearch.Connection`'s ``timeout`` parameter) or on a per-request
basis using ``request_timeout`` (float value in seconds) as part of any API
call. This value is passed to the ``perform_request`` method of the
connection class:
.. code-block:: python
# only wait for 1 second, regardless of the client's default
es.cluster.health(wait_for_status='yellow', request_timeout=1)
.. note::
Some API calls also accept a ``timeout`` parameter that is passed to
the Elasticsearch server. This timeout is internal and doesn't guarantee that
the request will end in the specified time.
Tracking Requests with Opaque ID
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can enrich your requests against Elasticsearch with an identifier string. The identifier then shows up
in `deprecation logs <https://www.elastic.co/guide/en/elasticsearch/reference/7.4/logging.html#deprecation-logging>`_, helps with
`identifying search slow log origin <https://www.elastic.co/guide/en/elasticsearch/reference/7.4/index-modules-slowlog.html#_identifying_search_slow_log_origin>`_,
and helps with `identifying running tasks <https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html#_identifying_running_tasks>`_.
.. code-block:: python
from elasticsearch import Elasticsearch
client = Elasticsearch()
# You can apply X-Opaque-Id in any API request via 'opaque_id':
resp = client.get(index="test", id="1", opaque_id="request-1")
.. py:module:: elasticsearch
Response Filtering
~~~~~~~~~~~~~~~~~~
The ``filter_path`` parameter is used to reduce the response returned by
Elasticsearch. For example, to only return ``_id`` and ``_type``, do:
.. code-block:: python
es.search(index='test-index', filter_path=['hits.hits._id', 'hits.hits._type'])
It also supports the ``*`` wildcard character to match any field or part of a
field's name:
.. code-block:: python
es.search(index='test-index', filter_path=['hits.hits._*'])
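To make the effect of a filter path concrete, the sketch below emulates it on a plain dictionary. The filtering really happens on the server; ``filter_response`` is a hypothetical helper that handles a single dotted path with ``*`` wildcards, and the sample response is invented for illustration.

```python
from fnmatch import fnmatchcase


def filter_response(node, path):
    """Keep only the parts of ``node`` that match a dotted path."""
    return _filter(node, path.split("."))


def _filter(node, parts):
    if not parts:
        return node  # the whole path matched, keep everything below
    if isinstance(node, list):
        kept = [_filter(item, parts) for item in node]
        return [item for item in kept if item not in ({}, [], None)]
    if not isinstance(node, dict):
        return None  # the path goes deeper than the data does
    head, rest = parts[0], parts[1:]
    out = {}
    for key, value in node.items():
        if fnmatchcase(key, head):
            sub = _filter(value, rest)
            if sub not in ({}, [], None):
                out[key] = sub
    return out


response = {
    "took": 3,
    "hits": {
        "total": {"value": 1},
        "hits": [{"_id": "1", "_type": "_doc", "_source": {"a": 1}}],
    },
}
only_ids = filter_response(response, "hits.hits._id")
underscored = filter_response(response, "hits.hits._*")
```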
Elasticsearch
-------------
.. autoclass:: Elasticsearch
:members:
.. py:module:: elasticsearch.client
Async Search
------------
.. autoclass:: AsyncSearchClient
:members:
Autoscaling
-----------
.. autoclass:: AutoscalingClient
:members:
Cat
---
.. autoclass:: CatClient
:members:
Cross-Cluster Replication (CCR)
-------------------------------
.. autoclass:: CcrClient
:members:
Cluster
-------
.. autoclass:: ClusterClient
:members:
Dangling Indices
----------------
.. autoclass:: DanglingIndicesClient
:members:
Enrich Policies
---------------
.. autoclass:: EnrichClient
:members:
Event Query Language (EQL)
--------------------------
.. autoclass:: EqlClient
:members:
Snapshottable Features
----------------------
.. autoclass:: FeaturesClient
:members:
Fleet
-----
.. autoclass:: FleetClient
:members:
Graph Explore
-------------
.. autoclass:: GraphClient
:members:
Index Lifecycle Management (ILM)
--------------------------------
.. autoclass:: IlmClient
:members:
Indices
-------
.. autoclass:: IndicesClient
:members:
Ingest Pipelines
----------------
.. autoclass:: IngestClient
:members:
License
-------
.. autoclass:: LicenseClient
:members:
Logstash
--------
.. autoclass:: LogstashClient
:members:
Migration
---------
.. autoclass:: MigrationClient
:members:
Machine Learning (ML)
---------------------
.. autoclass:: MlClient
:members:
Monitoring
----------
.. autoclass:: MonitoringClient
:members:
Nodes
-----
.. autoclass:: NodesClient
:members:
Rollup Indices
--------------
.. autoclass:: RollupClient
:members:
Searchable Snapshots
--------------------
.. autoclass:: SearchableSnapshotsClient
:members:
Security
--------
.. autoclass:: SecurityClient
:members:
Shutdown
--------
.. autoclass:: ShutdownClient
:members:
Snapshot Lifecycle Management (SLM)
-----------------------------------
.. autoclass:: SlmClient
:members:
Snapshots
---------
.. autoclass:: SnapshotClient
:members:
SQL
---
.. autoclass:: SqlClient
:members:
TLS/SSL
-------
.. autoclass:: SslClient
:members:
Tasks
-----
.. autoclass:: TasksClient
:members:
Text Structure
--------------
.. autoclass:: TextStructureClient
:members:
Transforms
----------
.. autoclass:: TransformClient
:members:
Watcher
-------
.. autoclass:: WatcherClient
:members:
-243
@@ -1,243 +0,0 @@
Using Asyncio with Elasticsearch
================================
.. py:module:: elasticsearch
Starting in ``elasticsearch-py`` v7.8.0 for Python 3.6+ the ``elasticsearch`` package supports async/await with
`Asyncio <https://docs.python.org/3/library/asyncio.html>`_ and `Aiohttp <https://docs.aiohttp.org>`_.
You can either install ``aiohttp`` directly or use the ``[async]`` extra:
.. code-block:: bash
$ python -m pip install "elasticsearch>=7.8.0" aiohttp
# - OR -
$ python -m pip install "elasticsearch[async]>=7.8.0"
.. note::
Async functionality is a new feature of this library in v7.8.0+ so
`please open an issue <https://github.com/elastic/elasticsearch-py/issues>`_
if you find an issue or have a question about async support.
Getting Started with Async
--------------------------
After installation all async API endpoints are available via :class:`~elasticsearch.AsyncElasticsearch`
and are used in the same way as other APIs, just with an extra ``await``:
.. code-block:: python
import asyncio
from elasticsearch import AsyncElasticsearch
es = AsyncElasticsearch()
async def main():
    resp = await es.search(
        index="documents",
        body={"query": {"match_all": {}}},
        size=20,
    )
    print(resp)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
All APIs that are available under the sync client are also available under the async client.
ASGI Applications and Elastic APM
---------------------------------
`ASGI <https://asgi.readthedocs.io>`_ (Asynchronous Server Gateway Interface) is a new way to
serve Python web applications making use of async I/O to achieve better performance.
Some examples of ASGI frameworks include FastAPI, Django 3.0+, and Starlette.
If you're using one of these frameworks along with Elasticsearch then you
should be using :py:class:`~elasticsearch.AsyncElasticsearch` to avoid blocking
the event loop with synchronous network calls for optimal performance.
`Elastic APM <https://www.elastic.co/guide/en/apm/agent/python/current/index.html>`_
also supports tracing of async Elasticsearch queries just the same as
synchronous queries. For an example on how to configure ``AsyncElasticsearch`` with
a popular ASGI framework `FastAPI <https://fastapi.tiangolo.com/>`_ and APM tracing
there is a `pre-built example <https://github.com/elastic/elasticsearch-py/tree/master/examples/fastapi-apm>`_
in the ``examples/fastapi-apm`` directory.
Frequently Asked Questions
--------------------------
NameError / ImportError when importing ``AsyncElasticsearch``?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you receive a ``NameError`` or ``ImportError`` when trying to use ``AsyncElasticsearch``,
ensure that you're running Python 3.6+ (check with ``$ python --version``) and
that you have ``aiohttp`` installed in your environment (check with ``$ python -m pip freeze | grep aiohttp``).
If either of the above conditions is not met then async support won't be available.
What about the ``elasticsearch-async`` package?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously asyncio was supported separately via the `elasticsearch-async <https://github.com/elastic/elasticsearch-py-async>`_
package. The ``elasticsearch-async`` package has been deprecated in favor of
``AsyncElasticsearch`` provided by the ``elasticsearch`` package
in v7.8 and onwards.
Receiving 'Unclosed client session / connector' warning?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This warning is created by ``aiohttp`` when an open HTTP connection is
garbage collected. You'll typically run into this when closing your application.
To resolve the issue ensure that :meth:`~elasticsearch.AsyncElasticsearch.close`
is called before the :py:class:`~elasticsearch.AsyncElasticsearch` instance is garbage collected.
For example, with FastAPI that might look like this:
.. code-block:: python
from fastapi import FastAPI
from elasticsearch import AsyncElasticsearch
app = FastAPI()
es = AsyncElasticsearch()
# This gets called once the app is shutting down.
@app.on_event("shutdown")
async def app_shutdown():
    await es.close()
Async Helpers
-------------
Async variants of all helpers are available in ``elasticsearch.helpers``
and are all prefixed with ``async_*``. You'll notice that these APIs
are identical to the ones in the sync :ref:`helpers` documentation.
All async helpers that accept an iterator or generator also accept async iterators
and async generators.
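Accepting both kinds of input can be pictured as a small adapter that turns either a regular iterable or an async iterable into an async iterator. The sketch below is an assumption about how such an adapter could look, not the helpers' actual implementation; ``as_async_iter``, ``async_actions`` and ``collect`` are hypothetical names, and nothing is sent to a cluster.

```python
import asyncio


async def as_async_iter(items):
    """Yield from either a regular iterable or an async iterable."""
    if hasattr(items, "__aiter__"):
        async for item in items:
            yield item
    else:
        for item in items:
            yield item


async def async_actions():
    for word in ("foo", "bar"):
        yield {"_index": "mywords", "word": word}


async def collect(items):
    return [item async for item in as_async_iter(items)]


from_sync = asyncio.run(collect([{"_index": "mywords", "word": "baz"}]))
from_async = asyncio.run(collect(async_actions()))
print(from_sync, from_async)
```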
.. py:module:: elasticsearch.helpers
Bulk and Streaming Bulk
~~~~~~~~~~~~~~~~~~~~~~~
.. autofunction:: async_bulk
.. code-block:: python
import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_bulk
es = AsyncElasticsearch()
async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "doc": {"word": word},
        }
async def main():
    await async_bulk(es, gendata())
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
.. autofunction:: async_streaming_bulk
.. code-block:: python
import asyncio
from elasticsearch import AsyncElasticsearch
from elasticsearch.helpers import async_streaming_bulk
es = AsyncElasticsearch()
async def gendata():
    mywords = ['foo', 'bar', 'baz']
    for word in mywords:
        yield {
            "_index": "mywords",
            "word": word,
        }
async def main():
    async for ok, result in async_streaming_bulk(es, gendata()):
        action, result = result.popitem()
        if not ok:
            # report which action failed and what the server returned for it
            print("failed to %s document %s" % (action, result))
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Scan
~~~~
.. autofunction:: async_scan
.. code-block:: python

    import asyncio
    from elasticsearch import AsyncElasticsearch
    from elasticsearch.helpers import async_scan

    es = AsyncElasticsearch()

    async def main():
        async for doc in async_scan(
            client=es,
            query={"query": {"match": {"title": "python"}}},
            index="orders-*"
        ):
            print(doc)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Reindex
~~~~~~~
.. autofunction:: async_reindex
API Reference
-------------
.. py:module:: elasticsearch
The API of :class:`~elasticsearch.AsyncElasticsearch` is nearly identical
to the API of :class:`~elasticsearch.Elasticsearch` with the exception that
every API call like :py:func:`~elasticsearch.AsyncElasticsearch.search` is
an ``async`` function and requires an ``await`` to properly return the response
body.
AsyncElasticsearch
~~~~~~~~~~~~~~~~~~
.. note::
To reference Elasticsearch APIs that are namespaced like ``.indices.create()``
refer to the sync API reference. These APIs are identical between sync and async.
.. autoclass:: AsyncElasticsearch
:members:
AsyncTransport
~~~~~~~~~~~~~~
.. autoclass:: AsyncTransport
:members:
AsyncConnection
~~~~~~~~~~~~~~~~~
.. autoclass:: AsyncConnection
:members:
AIOHttpConnection
~~~~~~~~~~~~~~~~~
.. autoclass:: AIOHttpConnection
:members:
-272
View File
@@ -1,272 +0,0 @@
# -*- coding: utf-8 -*-
# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.
#
# Modifications Copyright OpenSearch Contributors. See
# GitHub history for details.
#
# Licensed to Elasticsearch B.V under one or more agreements.
# Elasticsearch B.V licenses this file to you under the Apache 2.0 License.
# See the LICENSE file in the project root for more information
import os
import datetime
import elasticsearch
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
# sys.path.insert(0, os.path.abspath('.'))
# -- General configuration -----------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = ["sphinx.ext.autodoc", "sphinx.ext.doctest"]
autoclass_content = "both"
# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
# The suffix of source filenames.
source_suffix = ".rst"
# The encoding of source files.
# source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = "index"
# General information about the project.
project = u"Elasticsearch"
copyright = u"%d, Elasticsearch B.V" % datetime.date.today().year
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
# The short X.Y version.
version = elasticsearch.__versionstr__
# The full version, including alpha/beta/rc tags.
release = version
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
# language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
# today = ''
# Else, today_fmt is used as the format for a strftime call.
# today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ["_build"]
# The reST default role (used for this markup: `text`) to use for all documents.
# default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
# add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
# add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
# show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"
# A list of ignored prefixes for module index sorting.
# modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
# keep_warnings = False
# -- Options for HTML output ---------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
on_rtd = os.environ.get("READTHEDOCS", None) == "True"
if not on_rtd:  # only import and set the theme if we're building docs locally
    import sphinx_rtd_theme

    html_theme = "sphinx_rtd_theme"
    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
# html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
# html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
# html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
# html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
# html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
# html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
# html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
# html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
# html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
# html_additional_pages = {}
# If false, no module index is generated.
# html_domain_indices = True
# If false, no index is generated.
# html_use_index = True
# If true, the index is split into individual pages for each letter.
# html_split_index = False
# If true, links to the reST sources are added to the pages.
# html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
# html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
# html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
# html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
# html_file_suffix = None
# Output file base name for HTML help builder.
htmlhelp_basename = "Elasticsearchdoc"
# -- Options for LaTeX output --------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
# 'preamble': '',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title, author, documentclass [howto/manual]).
latex_documents = [
(
"index",
"Elasticsearch.tex",
u"Elasticsearch Documentation",
u"Honza Král",
"manual",
)
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
# latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
# latex_use_parts = False
# If true, show page references after internal links.
# latex_show_pagerefs = False
# If true, show URL addresses after external links.
# latex_show_urls = False
# Documents to append as an appendix to all manuals.
# latex_appendices = []
# If false, no module index is generated.
# latex_domain_indices = True
# -- Options for manual page output --------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
("index", "elasticsearch-py", u"Elasticsearch Documentation", [u"Honza Král"], 1)
]
# If true, show URL addresses after external links.
# man_show_urls = False
# -- Options for Texinfo output ------------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(
"index",
"Elasticsearch",
u"Elasticsearch Documentation",
u"Honza Král",
"Elasticsearch",
"One line description of project.",
"Miscellaneous",
)
]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
# If false, no module index is generated.
# texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
# texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
# texinfo_no_detailmenu = False
-88
View File
@@ -1,88 +0,0 @@
.. _connection_api:
Connection Layer API
====================
This page documents the classes responsible for handling the connection to the Elasticsearch
cluster. The default subclasses can be overridden by passing parameters to the
:class:`~elasticsearch.Elasticsearch` class. All of the arguments to the client
are passed on to :class:`~elasticsearch.Transport`,
:class:`~elasticsearch.ConnectionPool` and :class:`~elasticsearch.Connection`.

For example, to use your own implementation of the
:class:`~elasticsearch.ConnectionSelector` class, pass it in as the
``selector_class`` parameter.
.. note::
    :class:`~elasticsearch.ConnectionPool` and related options (like
    ``selector_class``) are only used if more than one connection is defined,
    either directly or via the :ref:`sniffing` mechanism.
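As a rough sketch of what a custom selector could look like, the class below always prefers the first connection it is offered. ``PreferFirstSelector`` is a hypothetical name, and the ``__init__(opts)`` / ``select(connections)`` shape is an assumption based on the ``ConnectionSelector(opts)`` signature documented below. In real use the class would subclass :class:`~elasticsearch.ConnectionSelector`; it is written without that import here so the snippet runs on its own.

```python
class PreferFirstSelector:
    """Hypothetical selector that always picks the first live connection."""

    def __init__(self, opts):
        # opts: the per-connection options handed over by the pool (assumed).
        self.connection_opts = opts

    def select(self, connections):
        # connections: the currently live connections offered by the pool.
        return connections[0]


selector = PreferFirstSelector(opts=[])
chosen = selector.select(["node-a", "node-b", "node-c"])
```

Such a class would then be passed to the client as ``selector_class=PreferFirstSelector``.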
.. py:module:: elasticsearch
Transport
---------
.. autoclass:: Transport(hosts, connection_class=Urllib3HttpConnection, connection_pool_class=ConnectionPool, host_info_callback=construct_hosts_list, sniff_on_start=False, sniffer_timeout=None, sniff_on_connection_fail=False, serializer=JSONSerializer(), max_retries=3, ** kwargs)
:members:
Connection Pool
---------------
.. autoclass:: ConnectionPool(connections, dead_timeout=60, selector_class=RoundRobinSelector, randomize_hosts=True, ** kwargs)
:members:
Connection Selector
-------------------
.. autoclass:: ConnectionSelector(opts)
:members:
Urllib3HttpConnection (default connection_class)
------------------------------------------------
If you have complex SSL logic for connecting to Elasticsearch, using an ``SSLContext`` object
might be more helpful. You can create one with the ``ssl`` module from the Python standard library, using the
`create_default_context <https://docs.python.org/3/library/ssl.html#ssl.create_default_context>`_ function.
To create an ``SSLContext`` object you only need to pass one of ``cafile``, ``capath`` or ``cadata``:
.. code-block:: python
>>> from ssl import create_default_context
>>> context = create_default_context(cafile=None, capath=None, cadata=None)
* ``cafile`` is the path to your CA file
* ``capath`` is the path to a directory containing CA certificates
* ``cadata`` is either an ASCII string of one or more PEM-encoded certificates or a bytes-like object of DER-encoded certificates.
Please note that the use of SSLContext is only available for urllib3.
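A slightly fuller sketch of building a context with the standard library is shown below. It uses the system's default CA bundle rather than a specific ``cafile``, and the TLS 1.2 floor is an illustrative choice, not a requirement of the client.

```python
import ssl

# Build a context from the system's default CA bundle. Passing cafile,
# capath, or cadata instead would load those certificates.
context = ssl.create_default_context()

# Both settings are already the defaults for create_default_context();
# they are spelled out to make the verification policy explicit.
context.check_hostname = True
context.verify_mode = ssl.CERT_REQUIRED

# Illustrative hardening: refuse protocol versions older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2
```

The resulting object is what gets passed to the client as ``ssl_context=context``.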
.. autoclass:: Urllib3HttpConnection
:members:
API Compatibility HTTP Header
-----------------------------
The Python client can be configured to emit the HTTP header
``Accept: application/vnd.elasticsearch+json; compatible-with=7``,
which signals to Elasticsearch that the client is requesting the
``7.x`` version of request and response bodies. This allows
upgrading from Elasticsearch 7.x to 8.x without upgrading
everything at once: configure the compatibility header first,
then upgrade Elasticsearch, and upgrade the clients
last.
.. code-block:: python
from elasticsearch import Elasticsearch
client = Elasticsearch("http://...", headers={"accept": "application/vnd.elasticsearch+json; compatible-with=7"})
If you'd like to have the client emit the header without configuring ``headers`` you
can use the environment variable ``ELASTIC_CLIENT_APIVERSIONING=1``.
-25
View File
@@ -1,25 +0,0 @@
.. _exceptions:
Exceptions
==========
.. py:module:: elasticsearch
.. autoclass:: ImproperlyConfigured
.. autoclass:: ElasticsearchException
.. autoclass:: SerializationError(ElasticsearchException)
.. autoclass:: TransportError(ElasticsearchException)
:members:
.. autoclass:: ConnectionError(TransportError)
.. autoclass:: ConnectionTimeout(ConnectionError)
.. autoclass:: SSLError(ConnectionError)
.. autoclass:: NotFoundError(TransportError)
.. autoclass:: ConflictError(TransportError)
.. autoclass:: RequestError(TransportError)
.. autoclass:: AuthenticationException(TransportError)
.. autoclass:: AuthorizationException(TransportError)
-137
View File
@@ -1,137 +0,0 @@
.. _helpers:
Helpers
=======
Collection of simple helper functions that abstract some specifics of the raw API.
Bulk helpers
------------
There are several helpers for the ``bulk`` API since its requirement for
specific formatting and other considerations can make it cumbersome if used directly.
All bulk helpers accept an instance of ``Elasticsearch`` class and an iterable
``actions`` (any iterable, can also be a generator, which is ideal in most
cases since it will allow you to index large datasets without the need of
loading them into memory).
The items in the ``actions`` iterable are the documents we wish to index.
They can be given in several formats; the most common one is the same as returned by
:meth:`~elasticsearch.Elasticsearch.search`, for example:
.. code:: python
{
'_index': 'index-name',
'_type': 'document',
'_id': 42,
'_routing': 5,
'pipeline': 'my-ingest-pipeline',
'_source': {
"title": "Hello World!",
"body": "..."
}
}
Alternatively, if `_source` is not present, it will pop all metadata fields
from the doc and use the rest as the document data:
.. code:: python
{
"_id": 42,
"_routing": 5,
"title": "Hello World!",
"body": "..."
}
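The splitting rule described above can be illustrated with a small sketch. This is not the library's implementation: ``split_action`` and ``METADATA_KEYS`` are hypothetical, and the key set lists only the metadata fields that appear in the examples on this page.

```python
# Hypothetical, deliberately incomplete set of metadata keys.
METADATA_KEYS = {"_index", "_type", "_id", "_routing", "pipeline", "_op_type"}


def split_action(action):
    """Separate metadata from document data in a bulk action dict."""
    action = dict(action)  # work on a copy so the caller's dict is untouched
    metadata = {k: action.pop(k) for k in list(action) if k in METADATA_KEYS}
    # An explicit _source wins; otherwise the remaining keys are the document.
    document = action.pop("_source", action)
    return metadata, document


metadata, document = split_action(
    {"_id": 42, "_routing": 5, "title": "Hello World!", "body": "..."}
)
```

Here ``metadata`` ends up holding ``_id`` and ``_routing``, while ``document`` holds ``title`` and ``body``.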
The :meth:`~elasticsearch.Elasticsearch.bulk` API accepts ``index``, ``create``,
``delete``, and ``update`` actions. Use the ``_op_type`` field to specify an
action (``_op_type`` defaults to ``index``):
.. code:: python
{
'_op_type': 'delete',
'_index': 'index-name',
'_type': 'document',
'_id': 42,
}
{
'_op_type': 'update',
'_index': 'index-name',
'_type': 'document',
'_id': 42,
'doc': {'question': 'The life, universe and everything.'}
}
Example:
~~~~~~~~
Let's say we have an iterable of data, for example a list of words called ``mywords``,
and we want to index those words into individual documents where the structure of the
document is like ``{"word": "<myword>"}``.

.. code:: python

    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk

    es = Elasticsearch()

    def gendata():
        mywords = ['foo', 'bar', 'baz']
        for word in mywords:
            yield {
                "_index": "mywords",
                "word": word,
            }

    bulk(es, gendata())
For a more complete and complex example please take a look at
https://github.com/elastic/elasticsearch-py/blob/master/examples/bulk-ingest
The :func:`~elasticsearch.helpers.parallel_bulk` helper is a wrapper around the :meth:`~elasticsearch.Elasticsearch.bulk` API that sends requests from multiple threads. :func:`~elasticsearch.helpers.parallel_bulk` returns a generator which must be consumed to produce results.
To see the results use:

.. code:: python

    for success, info in parallel_bulk(...):
        if not success:
            print('A document failed:', info)
If you don't care about the results, you can use ``deque`` from ``collections`` to consume the generator:
.. code:: python
from collections import deque
deque(parallel_bulk(...), maxlen=0)
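Why this works can be seen without a cluster. In the sketch below, ``fake_parallel_bulk`` is a hypothetical stand-in that yields ``(ok, info)`` tuples lazily, like the real helper. A ``deque`` with ``maxlen=0`` pulls every item from the generator and discards it immediately, so the work is done but nothing is kept in memory.

```python
from collections import deque

processed = []


def fake_parallel_bulk(n):
    """Hypothetical stand-in for parallel_bulk: lazily yields (ok, info)."""
    for i in range(n):
        processed.append(i)  # side effect shows that the generator body ran
        yield True, {"index": {"_id": i}}


gen = fake_parallel_bulk(3)
# Nothing has run yet, because generators are lazy.
before = list(processed)
# maxlen=0 drains the generator while storing none of its items.
leftover = deque(gen, maxlen=0)
```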
.. note::
    When reading raw JSON strings from a file, you can also pass them in
    directly (without decoding to dicts first). In that case, however, you lose
    the ability to specify anything (index, type, even id) on a per-record
    basis; all documents will just be sent to Elasticsearch to be indexed
    as-is.
.. py:module:: elasticsearch.helpers
.. autofunction:: streaming_bulk
.. autofunction:: parallel_bulk
.. autofunction:: bulk
Scan
----
.. autofunction:: scan
Reindex
-------
.. autofunction:: reindex
-451
View File
@@ -1,451 +0,0 @@
Python Elasticsearch Client
===========================
Official low-level client for Elasticsearch. Its goal is to provide common
ground for all Elasticsearch-related code in Python; because of this it tries
to be opinion-free and very extendable.
Installation
------------
Install the ``elasticsearch`` package with `pip
<https://pypi.org/project/elasticsearch>`_:
.. code-block:: console
$ python -m pip install elasticsearch
If your application uses async/await in Python you can install with
the ``async`` extra:
.. code-block:: console
$ python -m pip install elasticsearch[async]
Read more about `how to use asyncio with this project <async.html>`_.
Compatibility
-------------
The library is compatible with all Elasticsearch versions since ``0.90.x`` but you
**have to use a matching major version**:
For **Elasticsearch 7.0** and later, use the major version 7 (``7.x.y``) of the
library.
For **Elasticsearch 6.0** and later, use the major version 6 (``6.x.y``) of the
library.
For **Elasticsearch 5.0** and later, use the major version 5 (``5.x.y``) of the
library.
For **Elasticsearch 2.0** and later, use the major version 2 (``2.x.y``) of the
library, and so on.
The recommended way to set your requirements in your `setup.py` or
`requirements.txt` is:
.. code-block:: python
# Elasticsearch 7.x
elasticsearch>=7.0.0,<8.0.0
# Elasticsearch 6.x
elasticsearch>=6.0.0,<7.0.0
# Elasticsearch 5.x
elasticsearch>=5.0.0,<6.0.0
# Elasticsearch 2.x
elasticsearch>=2.0.0,<3.0.0
If you need to have multiple versions installed at the same time, older
versions are also released as ``elasticsearch2``, ``elasticsearch5`` and ``elasticsearch6``.
Example Usage
-------------
.. code-block:: python

    from datetime import datetime
    from elasticsearch import Elasticsearch

    es = Elasticsearch()

    doc = {
        'author': 'kimchy',
        'text': 'Elasticsearch: cool. bonsai cool.',
        'timestamp': datetime.now(),
    }
    res = es.index(index="test-index", id=1, body=doc)
    print(res['result'])

    res = es.get(index="test-index", id=1)
    print(res['_source'])

    es.indices.refresh(index="test-index")

    res = es.search(index="test-index", body={"query": {"match_all": {}}})
    print("Got %d Hits:" % res['hits']['total']['value'])
    for hit in res['hits']['hits']:
        print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])
Features
--------
This client was designed as a very thin wrapper around Elasticsearch's REST API to
allow for maximum flexibility. This means that there are no opinions in this
client; it also means that some of the APIs are a little cumbersome to use from
Python. We have created some :ref:`helpers` to help with this issue as well as
a more high level library (`elasticsearch-dsl`_) on top of this one to provide
a more convenient way of working with Elasticsearch.
.. _elasticsearch-dsl: https://elasticsearch-dsl.readthedocs.io/
Persistent Connections
~~~~~~~~~~~~~~~~~~~~~~
``elasticsearch-py`` uses persistent connections inside of individual connection
pools (one per configured or sniffed node). Out of the box you can choose
between two ``http`` protocol implementations. See :ref:`transports` for more
information.
The transport layer creates an instance of the selected connection class
per node and keeps track of the health of individual nodes. If a node becomes
unresponsive (throwing exceptions while connecting to it), it is put on a timeout
by the :class:`~elasticsearch.ConnectionPool` class and only returned to
circulation after the timeout is over (or when no live nodes are left). By
default, nodes are randomized before being passed into the pool and a round-robin
strategy is used for load balancing.
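As a rough illustration of the default behaviour described above (shuffle once, then rotate), and not the library's actual selector code, a round-robin rotation over node names might be sketched like this. ``RoundRobinSketch`` and its arguments are hypothetical.

```python
import random


class RoundRobinSketch:
    """Hypothetical illustration: optional one-time shuffle, then rotation."""

    def __init__(self, nodes, randomize=True, seed=None):
        self.nodes = list(nodes)
        if randomize:
            # Shuffle once up front so that clients started at the same
            # time do not all begin with the same node.
            random.Random(seed).shuffle(self.nodes)
        self._next = 0

    def select(self):
        # Hand out nodes in order, wrapping around at the end.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node


rr = RoundRobinSketch(["a", "b", "c"], randomize=False)
picks = [rr.select() for _ in range(4)]
```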
You can customize this behavior by passing parameters to the
:ref:`connection_api` (all keyword arguments to the
:class:`~elasticsearch.Elasticsearch` class will be passed through). If what
you want to accomplish is not supported you should be able to create a subclass
of the relevant component and pass it in as a parameter to be used instead of
the default implementation.
Automatic Retries
~~~~~~~~~~~~~~~~~
If a connection to a node fails due to connection issues (raises
:class:`~elasticsearch.ConnectionError`), it is considered to be in a faulty state. It
is placed on hold for ``dead_timeout`` seconds and the request is
retried on another node. If a connection fails multiple times in a row, the
timeout gets progressively larger to avoid hitting a node that is, by all
indications, down. If no live connection is available, the connection that has
the smallest timeout is used.
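To make "progressively larger" concrete, here is a minimal sketch that assumes the hold time doubles with every consecutive failure up to a cap. The doubling rule, the ``cutoff`` parameter, and its default of 5 are assumptions made for illustration; consult :class:`~elasticsearch.ConnectionPool` for the exact behaviour.

```python
def dead_timeout_for(failures, dead_timeout=60, cutoff=5):
    """Seconds a node stays on hold after `failures` consecutive failures.

    Assumed rule: double the base timeout per extra failure, capped after
    `cutoff` doublings.
    """
    return dead_timeout * 2 ** min(failures - 1, cutoff)


timeouts = [dead_timeout_for(n) for n in range(1, 8)]
```

Under these assumptions the first failure holds the node for 60 seconds, the second for 120, and the value stops growing once the cap is reached.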
By default, retries are not triggered by a timeout
(:class:`~elasticsearch.ConnectionTimeout`); set ``retry_on_timeout`` to
``True`` to also retry on timeouts.
.. _sniffing:
Sniffing
~~~~~~~~
The client can be configured to inspect the cluster state to get a list of
nodes upon startup, periodically and/or on failure. See
:class:`~elasticsearch.Transport` parameters for details.
Some example configurations:
.. code-block:: python
from elasticsearch import Elasticsearch
# by default we don't sniff, ever
es = Elasticsearch()
# you can specify to sniff on startup to inspect the cluster and load
# balance across all nodes
es = Elasticsearch(["seed1", "seed2"], sniff_on_start=True)
# you can also sniff periodically and/or after failure:
es = Elasticsearch(["seed1", "seed2"],
sniff_on_start=True,
sniff_on_connection_fail=True,
sniffer_timeout=60)
Thread safety
~~~~~~~~~~~~~
The client is thread safe and can be used in a multithreaded environment. Best
practice is to create a single global instance of the client and use it
throughout your application. If your application is long-running, consider
turning on :ref:`sniffing` to make sure the client is up to date on the cluster
location.

By default we allow ``urllib3`` to open up to 10 connections to each node. If
your application calls for more parallelism, use the ``maxsize`` parameter to
raise the limit:
.. code-block:: python
# allow up to 25 connections to each node
es = Elasticsearch(["host1", "host2"], maxsize=25)
.. note::
    Since we use persistent connections throughout the client, the
    client doesn't tolerate ``fork`` very well. If your application calls for
    multiple processes, make sure you create a fresh client after the call to
    ``fork``. Note that Python's ``multiprocessing`` module can use ``fork`` to
    create new processes on POSIX systems.
TLS/SSL and Authentication
~~~~~~~~~~~~~~~~~~~~~~~~~~
You can configure the client to use ``SSL`` for connecting to your
Elasticsearch cluster, including certificate verification and HTTP auth:
.. code-block:: python
from elasticsearch import Elasticsearch
# you can use RFC-1738 to specify the url
es = Elasticsearch(['https://user:secret@localhost:443'])
# ... or specify common parameters as kwargs
es = Elasticsearch(
['localhost', 'otherhost'],
http_auth=('user', 'secret'),
scheme="https",
port=443,
)
# verify the server certificate with a custom SSLContext built from a CA file
from ssl import create_default_context
context = create_default_context(cafile="path/to/cert.pem")
es = Elasticsearch(
['localhost', 'otherhost'],
http_auth=('user', 'secret'),
scheme="https",
port=443,
ssl_context=context,
)
.. warning::
    ``elasticsearch-py`` doesn't ship with a default set of root certificates. To
    have working SSL certificate validation you need to either specify your own
    as ``cafile`` or ``capath`` or ``cadata`` or install `certifi`_ which will
    be picked up automatically.
See class :class:`~elasticsearch.Urllib3HttpConnection` for detailed
description of the options.
.. _certifi: http://certifiio.readthedocs.io/en/latest/
Connecting via Cloud ID
~~~~~~~~~~~~~~~~~~~~~~~
Cloud ID is an easy way to configure your client to work
with your Elastic Cloud deployment. Combine the ``cloud_id``
with either ``http_auth`` or ``api_key`` to authenticate
with your Elastic Cloud deployment.
Using ``cloud_id`` enables TLS verification and HTTP compression by default
and sets the port to ``443`` unless otherwise overwritten via the ``port`` parameter
or the port value encoded within ``cloud_id``. Using Cloud ID also disables sniffing.
.. code-block:: python
from elasticsearch import Elasticsearch
es = Elasticsearch(
cloud_id="cluster-1:dXMa5Fx...",
http_auth=("elastic", "<password>"),
)
API Key Authentication
~~~~~~~~~~~~~~~~~~~~~~
You can configure the client to use Elasticsearch's `API Key`_ for connecting to your cluster.
Please note that this authentication method was introduced in Elasticsearch ``6.7.0``.

.. code-block:: python

    from elasticsearch import Elasticsearch

    # you can use the api key tuple
    es = Elasticsearch(
        ['node-1', 'node-2', 'node-3'],
        api_key=('id', 'api_key'),
    )

    # or you pass the base 64 encoded token
    es = Elasticsearch(
        ['node-1', 'node-2', 'node-3'],
        api_key='base64encoded tuple',
    )
.. _API Key: https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html
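The "base 64 encoded token" form can be derived from the tuple form. The sketch below assumes the usual Elasticsearch convention that the token is the base64 encoding of ``id:api_key``; ``encode_api_key`` is a hypothetical helper, not a function provided by the client.

```python
import base64


def encode_api_key(key_id, api_key):
    """Return the base64 form of an (id, api_key) pair (hypothetical helper)."""
    token = f"{key_id}:{api_key}".encode("utf-8")
    return base64.b64encode(token).decode("ascii")


encoded = encode_api_key("id", "api_key")
```

The resulting string is the kind of value that would be passed as ``api_key=encoded``.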
Logging
~~~~~~~
``elasticsearch-py`` uses the standard `logging library`_ from Python to define
two loggers: ``elasticsearch`` and ``elasticsearch.trace``. ``elasticsearch``
is used by the client to log standard activity, depending on the log level.
``elasticsearch.trace`` can be used to log requests to the server in the form
of ``curl`` commands using pretty-printed JSON that can then be executed from
the command line. Because it is designed to be shared (for example to demonstrate
an issue), it always uses ``localhost:9200`` as the address instead of the
actual address of the host. If the trace logger has not been configured
already, it is set to ``propagate=False``, so it needs to be activated separately.
.. _logging library: http://docs.python.org/3/library/logging.html
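A minimal sketch of activating both loggers with the standard library follows. The logger names come from the text above; the levels, handlers, and format string are illustrative choices only.

```python
import logging
import sys

# Route the client's general activity log to stderr at INFO level.
es_logger = logging.getLogger("elasticsearch")
es_logger.setLevel(logging.INFO)
es_logger.addHandler(logging.StreamHandler(sys.stderr))

# The trace logger does not propagate, so it needs a handler of its own.
tracer = logging.getLogger("elasticsearch.trace")
tracer.setLevel(logging.DEBUG)
trace_handler = logging.StreamHandler(sys.stdout)
trace_handler.setFormatter(logging.Formatter("%(message)s"))
tracer.addHandler(trace_handler)
```

Swapping ``trace_handler`` for a ``logging.FileHandler`` would collect the ``curl`` commands in a file that can be shared.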
Type Hints
~~~~~~~~~~
Starting in ``elasticsearch-py`` v7.10.0 the library ships with `type hints`_
and supports basic static type analysis with tools like `Mypy`_ and `Pyright`_.
If we write a script that has a type error, like using ``request_timeout`` with
a ``str`` argument instead of ``float``, and then run Mypy on the script, the error is reported:
.. code-block:: python
# script.py
from elasticsearch import Elasticsearch
es = Elasticsearch(...)
es.search(
index="test-index",
request_timeout="5" # type error!
)
# $ mypy script.py
# script.py:5: error: Argument "request_timeout" to "search" of "Elasticsearch" has
# incompatible type "str"; expected "Union[int, float, None]"
# Found 1 error in 1 file (checked 1 source file)
For now many parameter types for API methods aren't specific to
a type (i.e. they are of type ``typing.Any``), but in the future
they will be tightened for even better static type checking.
Type hints also allow tools like your IDE to check types and provide better
auto-complete functionality.
.. warning::
The type hints for API methods like ``search`` don't match the function signature
that can be found in the source code. Type hints represent optimal usage of the
API methods. Using keyword arguments is highly recommended so all optional parameters
and ``body`` are keyword-only in type hints.
JetBrains PyCharm will use the warning ``Unexpected argument`` to denote that the
parameter may be keyword-only.
.. _type hints: https://www.python.org/dev/peps/pep-0484
.. _mypy: http://mypy-lang.org
.. _pyright: https://github.com/microsoft/pyright
Environment considerations
--------------------------
When using the client there are several limitations of your environment that
could come into play.
When using an HTTP load balancer you cannot use the :ref:`sniffing`
functionality: the cluster would supply the client with IP addresses to
connect to directly, circumventing the load balancer. Depending on
your configuration this might be something you don't want, or it might break completely.
Compression
~~~~~~~~~~~
When using capacity-constrained networks (low throughput), it may be handy to enable
compression. This is especially useful when doing bulk loads or inserting large
documents. Pass ``http_compress=True`` to enable compression:
.. code-block:: python
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts, http_compress=True)
Compression is enabled by default when connecting to Elastic Cloud via ``cloud_id``.
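To get a feel for the potential savings, the sketch below compresses a bulk-style payload with the standard library. It assumes the request body is gzip-encoded and uses made-up documents, so the ratio is only indicative and real payloads will vary.

```python
import gzip
import json

# A bulk-style payload: many small, similar JSON lines (made-up data).
lines = []
for i in range(1000):
    lines.append(json.dumps({"index": {"_index": "mywords", "_id": i}}))
    lines.append(json.dumps({"word": "word-%d" % i}))
payload = ("\n".join(lines) + "\n").encode("utf-8")

compressed = gzip.compress(payload)
# Fraction of the original size that would travel over the network.
ratio = len(compressed) / len(payload)
```

Repetitive bulk bodies like this one tend to compress well, which is why the option helps most for bulk loads.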
Customization
-------------
Custom serializers
~~~~~~~~~~~~~~~~~~
By default, `JSONSerializer`_ is used to encode all outgoing requests.
However, you can implement your own custom serializer:

.. code-block:: python

    from elasticsearch import Elasticsearch
    from elasticsearch.serializer import JSONSerializer

    class SetEncoder(JSONSerializer):
        def default(self, obj):
            if isinstance(obj, set):
                return list(obj)
            # ``Something`` is a placeholder for one of your own classes
            if isinstance(obj, Something):
                return 'CustomSomethingRepresentation'
            return JSONSerializer.default(self, obj)

    es = Elasticsearch(serializer=SetEncoder())
.. _JSONSerializer: https://github.com/elastic/elasticsearch-py/blob/master/elasticsearch/serializer.py#L24
Elasticsearch-DSL
-----------------
For a more high level client library with more limited scope, have a look at
`elasticsearch-dsl`_ - a more pythonic library sitting on top of
``elasticsearch-py``.
`elasticsearch-dsl`_ provides a more convenient and idiomatic way to write and manipulate
`queries`_ by mirroring the terminology and structure of the Elasticsearch JSON DSL
while exposing the whole range of the DSL from Python,
either directly using defined classes or through queryset-like expressions.
It also provides an optional `persistence layer`_ for working with documents as
Python objects in an ORM-like fashion: defining mappings, retrieving and saving
documents, wrapping the document data in user-defined classes.
.. _elasticsearch-dsl: https://elasticsearch-dsl.readthedocs.io/
.. _queries: https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html
.. _persistence layer: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#doctype
Contents
--------
.. toctree::
:maxdepth: 2
api
exceptions
async
connection
transports
helpers
Release Notes <https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/release-notes.html>
License
-------
Copyright 2021 Elasticsearch B.V. Licensed under the Apache License, Version 2.0.
Indices and tables
------------------
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
-42
View File
@@ -1,42 +0,0 @@
.. _transports:
Transport classes
=================
The following transport classes are available. Import your choice and pass
it to the constructor of :class:`~elasticsearch.Elasticsearch` as
``connection_class``. Note that the
:class:`~elasticsearch.connection.RequestsHttpConnection` requires ``requests``
to be installed.

For example, to use the ``requests``-based connection just import it and use it:
.. code-block:: python
from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch(connection_class=RequestsHttpConnection)
The default connection class is based on ``urllib3``, which is more performant
and lightweight than the optional ``requests``-based class. Only use
``RequestsHttpConnection`` if you need any of the advanced features of ``requests``,
such as custom auth plugins.
.. py:module:: elasticsearch.connection
Connection
----------
.. autoclass:: Connection
Urllib3HttpConnection
---------------------
.. autoclass:: Urllib3HttpConnection
RequestsHttpConnection
----------------------
.. autoclass:: RequestsHttpConnection