Commit adb45d4

Aaron Gonzales committed
documentation update
1 parent 07887ba commit adb45d4

6 files changed, +343 -184 lines changed

README.rst

Lines changed: 130 additions & 79 deletions
Original file line number | Diff line number | Diff line change
@@ -1,31 +1,29 @@
11
Python Twitter Search API
22
=========================
33

4-
This library serves as a Python interface to the various `Twitter
5-
premium and enterprise search
6-
APIs <https://developer.twitter.com/en/products/tweets/search>`__. It
7-
provides a command-line utility and a library usable from within a
8-
Python program. It comes with tools for assisting in dynamic generation
9-
of search rules and for parsing tweets.
10-
11-
Pretty docs can be seen
12-
`here <https://twitterdev.github.io/search-tweets-python/>`__.
4+
This project serves as a wrapper for the `Twitter premium and enterprise
5+
search
6+
APIs <https://developer.twitter.com/en/products/tweets/search>`__,
7+
providing a command-line utility and a Python library. Pretty docs can
8+
be seen `here <https://twitterdev.github.io/search-tweets-python/>`__.
139

1410
Features
1511
========
1612

1713
- Supports 30-day Search and Full Archive Search (not the standard
18-
Search API).
14+
Search API at this time).
1915
- Command-line utility is pipeable to other tools (e.g., ``jq``).
20-
- Automatically handles pagination of results with specifiable limits
16+
- Automatically handles pagination of search results with specifiable
17+
limits
2118
- Delivers a stream of data to the user for low in-memory requirements
22-
- Handles Enterprise and Premium authentication methods
19+
- Handles enterprise and premium authentication methods
2320
- Flexible usage within a Python program
24-
- Compatible with our group's Tweet Parser for rapid extraction of
25-
relevant data fields from each tweet payload
21+
- Compatible with our group's `Tweet
22+
Parser <https://github.com/twitterdev/tweet_parser>`__ for rapid
23+
extraction of relevant data fields from each tweet payload
2624
- Supports the Search Counts endpoint, which can reduce API call usage
27-
and provide rapid insights if you only need volumes and not tweet
28-
payloads
25+
and provide rapid insights if you only need Tweet volumes and not
26+
Tweet payloads
2927

3028
Installation
3129
============
@@ -51,46 +49,81 @@ Credential Handling
5149

5250
The premium and enterprise Search APIs use different authentication
5351
methods and we attempt to provide a seamless way to handle
54-
authentication for all customers. We support both YAML-file based
55-
methods and environment variables for access.
52+
authentication for all customers.
53+
54+
Premium clients will require the ``bearer_token`` and ``endpoint``
55+
fields; Enterprise clients require ``username``, ``password``, and
56+
``endpoint``. If you do not specify the ``account_type``, we attempt to
57+
discern the account type and declare a warning about this behavior.
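
A minimal sketch of how the account type could be discerned from the fields present, assuming the credentials have already been read into a plain dictionary. The function name ``infer_account_type`` is hypothetical and is not part of this library; the library's own logic may differ.

```python
import warnings


def infer_account_type(creds):
    """Guess 'premium' or 'enterprise' from which credential fields are present.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    declared = creds.get("account_type")
    if declared:
        # An explicit account_type always wins; no guessing, no warning.
        return declared
    if "bearer_token" in creds:
        guess = "premium"
    elif "username" in creds and "password" in creds:
        guess = "enterprise"
    else:
        raise KeyError("cannot infer account type from the given fields")
    # Mirror the documented behavior: warn whenever the type was guessed.
    warnings.warn("account_type not specified; inferred '%s'" % guess)
    return guess
```

Setting ``account_type`` explicitly in the credential file avoids both the guess and the warning.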
58+
59+
We support both YAML-file based methods and environment variables for
60+
access, and provide flexible handling with sensible defaults.
5661

57-
A YAML credential file should look like this:
62+
YAML method
63+
-----------
5864

59-
.. code:: .yaml
65+
For premium customers, the simplest credential file should look like
66+
this:
6067

61-
<key>:
62-
account_type: <OPTIONAL PREMIUM_OR_ENTERPRISE>
68+
.. code:: yaml
69+
70+
search_tweets_api:
71+
account_type: premium
6372
endpoint: <FULL_URL_OF_ENDPOINT>
64-
username: <USERNAME>
65-
password: <PW>
6673
bearer_token: <TOKEN>
6774
68-
Premium clients will require the ``bearer_token`` and ``endpoint``
69-
fields; Enterprise clients require ``username``, ``password``, and
70-
``endpoint``. If you do not specify the ``account_type``, we attempt to
71-
discern the account type and declare a warning about this behavior. The
72-
``load_credentials`` function also allows ``account_type`` to be set.
75+
For enterprise customers, the simplest credential file should look like
76+
this:
77+
78+
.. code:: yaml
7379
74-
Our credential reader will look for this file at
80+
search_tweets_api:
81+
account_type: enterprise
82+
endpoint: <FULL_URL_OF_ENDPOINT>
83+
username: <USERNAME>
84+
password: <PW>
85+
86+
By default, this library expects this file at
7587
``"~/.twitter_keys.yaml"``, but you can pass the relevant location as
76-
needed. You can also specify a different key in the yaml file, which can
77-
be useful if you have different endpoints, e.g., ``dev``, ``test``,
78-
``prod``, etc. The file might look like this:
88+
needed, either with the ``--credential-file`` flag for the command-line
89+
app or as demonstrated below in a Python program.
90+
91+
Both examples above require no special command-line arguments or
92+
in-program arguments. The credential parsing methods, unless otherwise
93+
specified, will look for a YAML key called ``search_tweets_api``.
94+
95+
If you have multiple endpoints and/or search products, you
96+
can keep all credentials in the same file and select which key to
97+
use. The ``--credential-file-key`` flag selects the key in the command-line
98+
app. An example:
99+
100+
.. code:: yaml
101+
102+
search_tweets_30_day_dev:
103+
account_type: premium
104+
endpoint: <FULL_URL_OF_ENDPOINT>
105+
bearer_token: <TOKEN>
79106
80-
.. code:: .yaml
107+
search_tweets_30_day_prod:
108+
account_type: premium
109+
endpoint: <FULL_URL_OF_ENDPOINT>
110+
bearer_token: <TOKEN>
81111
82-
search_tweets_dev:
112+
search_tweets_fullarchive_dev:
83113
account_type: premium
84114
endpoint: <FULL_URL_OF_ENDPOINT>
85115
bearer_token: <TOKEN>
86116
87-
search_tweets_prod:
117+
search_tweets_fullarchive_prod:
88118
account_type: premium
89119
endpoint: <FULL_URL_OF_ENDPOINT>
90120
bearer_token: <TOKEN>
91121
122+
Environment Variables
123+
---------------------
124+
92125
If you want or need to pass credentials via environment variables, you
93-
can set the appropriate variables of the following:
126+
can set whichever of the following variables apply to your product:
94127

95128
::
96129

@@ -101,13 +134,14 @@ can set the appropriate variables of the following:
101134
export SEARCHTWEETS_ACCOUNT_TYPE=
102135

103136
The ``load_credentials`` function will attempt to find these variables
104-
if it cannot load fields from the yaml file, and it will **overwrite any
105-
found credentials from the YAML file** if they have been parsed. This
106-
behavior can be changed by setting the ``load_credentials`` parameter
107-
``env_overwrite`` to ``False``.
137+
if it cannot load fields from the YAML file, and it will **overwrite any
138+
credentials from the YAML file that are present as environment
139+
variables** if they have been parsed. This behavior can be changed by
140+
setting the ``load_credentials`` parameter ``env_overwrite`` to
141+
``False``.
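
The precedence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the body of ``load_credentials``: the helper ``merge_with_env`` and its ``environ`` parameter are hypothetical, and the mapping lists only the variable names visible in this README.

```python
import os

# Environment variable names that appear in this README, mapped to the
# credential fields they would populate (assumed mapping, for illustration).
ENV_TO_FIELD = {
    "SEARCHTWEETS_USERNAME": "username",
    "SEARCHTWEETS_PASSWORD": "password",
    "SEARCHTWEETS_ENDPOINT": "endpoint",
    "SEARCHTWEETS_ACCOUNT_TYPE": "account_type",
}


def merge_with_env(yaml_creds, env_overwrite=True, environ=None):
    """Combine credentials parsed from YAML with any set environment variables."""
    environ = os.environ if environ is None else environ
    merged = dict(yaml_creds)
    for var, field in ENV_TO_FIELD.items():
        value = environ.get(var)
        if not value:
            continue  # variable unset or empty: keep whatever the YAML gave us
        # Fill missing fields always; replace existing ones only when allowed.
        if env_overwrite or field not in merged:
            merged[field] = value
    return merged
```

With ``env_overwrite=False``, environment variables only fill in fields that the YAML file did not supply.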
108142

109-
The following cells demonstrates credential handling, both in the
110-
command line app and Python library.
143+
The following cells demonstrate credential handling in the Python
144+
library.
111145

112146
.. code:: python
113147
@@ -145,22 +179,33 @@ regardless of a YAML file's validity or existence.
145179
.. code:: python
146180
147181
import os
148-
os.environ["SEARCHTWEETS_USERNAME"] = "ENV_USERNAME"
149-
os.environ["SEARCHTWEETS_PASSWORD"] = "ENV_PW"
150-
os.environ["SEARCHTWEETS_ENDPOINT"] = "https://endpoint"
182+
os.environ["SEARCHTWEETS_USERNAME"] = "<ENV_USERNAME>"
183+
os.environ["SEARCHTWEETS_PASSWORD"] = "<ENV_PW>"
184+
os.environ["SEARCHTWEETS_ENDPOINT"] = "<https://endpoint>"
151185
152-
load_credentials(filename="nothing", yaml_key="no_key_here")
186+
load_credentials(filename="nothing_here.yaml", yaml_key="no_key_here")
153187
154188
::
155189

156-
cannot read file nothing
190+
cannot read file nothing_here.yaml
157191
Error parsing YAML file; searching for valid environment variables
158192

159193
::
160194

161-
{'endpoint': 'https://endpoint',
162-
'password': 'ENV_PW',
163-
'username': 'ENV_USERNAME'}
195+
{'endpoint': '<https://endpoint>',
196+
'password': '<ENV_PW>',
197+
'username': '<ENV_USERNAME>'}
198+
199+
Command-line app
200+
----------------
201+
202+
The flags:
203+
204+
- ``--credential-file <FILENAME>``
205+
- ``--credential-file-key <KEY>``
206+
- ``--env-overwrite``
207+
208+
are used to control credential behavior from the command-line app.
164209

165210
--------------
166211

@@ -171,11 +216,11 @@ The library includes an application, ``search_tweets.py``, in the
171216
``tools`` directory that provides rapid access to Tweets.
172217

173218
Note that the ``--results-per-call`` flag specifies an argument to the
174-
API call ( ``maxResults``, results returned per CALL), not as a hard max
175-
to number of results returned from this program. The argument
219+
API (``maxResults``, results returned per call), not a hard maximum on the
220+
number of results returned by this program. The argument
176221
``--max-results`` defines the maximum number of results to return from a
177222
single run of the program. All examples assume that your credentials are set up
178-
correctly in a default location - ``.twitter_keys.yaml`` or in
223+
correctly in the default location - ``.twitter_keys.yaml`` or in
179224
environment variables.
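
To make the difference between the two limits concrete, here is a small sketch of a paginating loop. It assumes a per-call page size and a session-wide cap; ``collect_results`` and ``fake_fetch_page`` are hypothetical stand-ins rather than functions from this library, and whether the real program truncates mid-page is also an assumption.

```python
def collect_results(fetch_page, results_per_call=100, max_results=500):
    """Yield results page by page until max_results is reached or pages run out."""
    yielded = 0
    next_token = None
    while yielded < max_results:
        # results_per_call only controls the size of each individual request.
        page, next_token = fetch_page(results_per_call, next_token)
        for item in page:
            if yielded >= max_results:
                return  # session-wide cap reached mid-page
            yield item
            yielded += 1
        if next_token is None:
            return  # the API has no further pages


def fake_fetch_page(size, token):
    """Stand-in for one API call: serves the integers 0..1199 in pages of `size`."""
    start = token or 0
    end = min(start + size, 1200)
    return list(range(start, end)), (end if end < 1200 else None)


results = list(collect_results(fake_fetch_page, results_per_call=500, max_results=700))
print(len(results))
```

Here two calls are made (500 results each), but only 700 results are handed back to the caller.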
180225

181226
**Stream json results to stdout without saving**
@@ -210,8 +255,8 @@ environment variables.
210255
--filename-prefix beyonce_geo \
211256
--no-print-stream
212257
213-
Options can be passed via a configuration file (either ini or YAML). An
214-
example file can be found in the ``tools/api_config_example.config`` or
258+
Options can be passed via a configuration file (either ini or YAML).
259+
Example files can be found in the ``tools/api_config_example.config`` or
215260
``./tools/api_yaml_example.yaml`` files, which might look like this:
216261

217262
.. code:: bash
@@ -270,18 +315,18 @@ Full options are listed below:
270315

271316
$ search_tweets.py -h
272317
usage: search_tweets.py [-h] [--credential-file CREDENTIAL_FILE]
273-
[--credential-file-key CREDENTIAL_YAML_KEY]
274-
[--env-overwrite ENV_OVERWRITE]
275-
[--config-file CONFIG_FILENAME]
276-
[--account-type {premium,enterprise}]
277-
[--count-bucket COUNT_BUCKET]
278-
[--start-datetime FROM_DATE] [--end-datetime TO_DATE]
279-
[--filter-rule PT_RULE]
280-
[--results-per-call RESULTS_PER_CALL]
281-
[--max-results MAX_RESULTS] [--max-pages MAX_PAGES]
282-
[--results-per-file RESULTS_PER_FILE]
283-
[--filename-prefix FILENAME_PREFIX]
284-
[--no-print-stream] [--print-stream] [--debug]
318+
[--credential-file-key CREDENTIAL_YAML_KEY]
319+
[--env-overwrite ENV_OVERWRITE]
320+
[--config-file CONFIG_FILENAME]
321+
[--account-type {premium,enterprise}]
322+
[--count-bucket COUNT_BUCKET]
323+
[--start-datetime FROM_DATE] [--end-datetime TO_DATE]
324+
[--filter-rule PT_RULE]
325+
[--results-per-call RESULTS_PER_CALL]
326+
[--max-results MAX_RESULTS] [--max-pages MAX_PAGES]
327+
[--results-per-file RESULTS_PER_FILE]
328+
[--filename-prefix FILENAME_PREFIX]
329+
[--no-print-stream] [--print-stream] [--debug]
285330

286331
optional arguments:
287332
-h, --help show this help message and exit
@@ -319,10 +364,10 @@ Full options are listed below:
319364
Number of results to return per call (default 100; max
320365
500) - corresponds to 'maxResults' in the API
321366
--max-results MAX_RESULTS
322-
Maximum results to return for this session (defaults
323-
to 500; see -a option
367+
Maximum number of Tweets or Counts to return for this
368+
session (defaults to 500)
324369
--max-pages MAX_PAGES
325-
Maximum number of pages/api calls to use for this
370+
Maximum number of pages/API calls to use for this
326371
session.
327372
--results-per-file RESULTS_PER_FILE
328373
Maximum tweets to save per file.
@@ -336,7 +381,7 @@ Full options are listed below:
336381
--------------
337382

338383
Using the Twitter Search APIs' Python Wrapper
339-
============================================
384+
=============================================
340385

341386
Working with the API within a Python program is straightforward for both
342387
Premium and Enterprise clients.
@@ -621,21 +666,27 @@ Our results are pretty straightforward and can be rapidly used.
621666
Dated searches / Full Archive Search
622667
------------------------------------
623668

624-
Let's make a new rule and pass it dates this time.
625-
626-
``gen_rule_payload`` takes dates of the forms ``YYYY-mm-DD`` and
627-
``YYYYmmDD``.
628-
629669
**Note that this will only work with the full archive search option**,
630670
which is available to my account only via the enterprise options. Full
631671
archive search will likely require a different endpoint or access
632672
method; please see your developer console for details.
633673

674+
Let's make a new rule and pass it dates this time.
675+
676+
``gen_rule_payload`` takes timestamps of the following forms:
677+
678+
- ``YYYYmmDDHHMM``
679+
- ``YYYY-mm-DD`` (which will convert to midnight UTC, 00:00)
680+
- ``YYYY-mm-DD HH:MM``
681+
- ``YYYY-mm-DDTHH:MM``
682+
683+
Note - all Tweets are stored in UTC time.
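
As a rough illustration of accepting the four forms listed above, the following sketch tries each format in turn with the standard library. The helper ``normalize_timestamp`` is hypothetical, and returning the compact ``YYYYmmDDHHMM`` form is a choice made for this example, not a statement about what ``gen_rule_payload`` does internally.

```python
from datetime import datetime

# strptime patterns for the four accepted forms listed above.
ACCEPTED_FORMATS = (
    "%Y%m%d%H%M",      # YYYYmmDDHHMM
    "%Y-%m-%d",        # YYYY-mm-DD, interpreted as midnight (00:00) UTC
    "%Y-%m-%d %H:%M",  # YYYY-mm-DD HH:MM
    "%Y-%m-%dT%H:%M",  # YYYY-mm-DDTHH:MM
)


def normalize_timestamp(text):
    """Return `text` rewritten as YYYYmmDDHHMM, trying each accepted form."""
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this form; try the next one
        return parsed.strftime("%Y%m%d%H%M")
    raise ValueError("unrecognized timestamp: %r" % text)


print(normalize_timestamp("2017-09-01"))
```

A date-only input gains an explicit ``0000`` time component, which matches the midnight-UTC behavior noted in the list.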
684+
634685
.. code:: python
635686
636687
rule = gen_rule_payload("from:jack",
637-
from_date="2017-09-01",
638-
to_date="2017-10-30",
688+
from_date="2017-09-01",  # UTC 2017-09-01 00:00
689+
to_date="2017-10-30",  # UTC 2017-10-30 00:00
639690
results_per_call=500)
640691
print(rule)
641692

examples/api_example.ipynb

Lines changed: 11 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -425,18 +425,24 @@
425425
"source": [
426426
"## Dated searches / Full Archive Search\n",
427427
"\n",
428+
"**Note that this will only work with the full archive search option**, which is available to my account only via the enterprise options. Full archive search will likely require a different endpoint or access method; please see your developer console for details.\n",
428429
"\n",
429430
"Let's make a new rule and pass it dates this time.\n",
430431
"\n",
431-
"`gen_rule_payload` takes dates of the forms `YYYY-mm-DD` and `YYYYmmDD`.\n",
432+
"`gen_rule_payload` takes timestamps of the following forms:\n",
432433
"\n",
433434
"\n",
434-
"**Note that this will only work with the full archive search option**, which is available to my account only via the enterprise options. Full archive search will likely require a different endpoint or access method; please see your developer console for details."
435+
"- `YYYYmmDDHHMM`\n",
436+
"- `YYYY-mm-DD` (which will convert to midnight UTC, 00:00)\n",
437+
"- `YYYY-mm-DD HH:MM`\n",
438+
"- `YYYY-mm-DDTHH:MM`\n",
439+
"\n",
440+
"Note - all Tweets are stored in UTC time."
435441
]
436442
},
437443
{
438444
"cell_type": "code",
439-
"execution_count": 18,
445+
"execution_count": 4,
440446
"metadata": {},
441447
"outputs": [
442448
{
@@ -449,8 +455,8 @@
449455
],
450456
"source": [
451457
"rule = gen_rule_payload(\"from:jack\",\n",
452-
" from_date=\"2017-09-01\",\n",
453-
" to_date=\"2017-10-30\",\n",
458+
" from_date=\"2017-09-01\",  # UTC 2017-09-01 00:00\n",
459+
" to_date=\"2017-10-30\",  # UTC 2017-10-30 00:00\n",
454460
" results_per_call=500)\n",
455461
"print(rule)"
456462
]
