
Commit d69ce9a

Merge pull request #22 from Vatis-Tech/release/v2.0
Release/v2.0
2 parents 14a97ab + 563d838 commit d69ce9a

19 files changed: +864 −105 lines changed

README.md

Lines changed: 267 additions & 2 deletions
@@ -109,6 +109,48 @@ const vtc = new VatisTechClient.default({

## Props

### `config`

This is an **Object** with the following structure:

```
{
  "spokenCommandsList": [
    {
      "command": "COMMAND_NAME",
      "regex": [ "regex1", "regex2", "regex3", ... ]
    },
    ...
  ]
}
```

The value of `spokenCommandsList` is an array of objects, each with two properties: `command` and `regex`.

The value of `command`, i.e. `COMMAND_NAME`, is a **String**.

The value of `regex`, i.e. `[ "regex1", "regex2", "regex3", ... ]`, is an **Array of Strings**.

The idea behind `spokenCommandsList` is that each time one of the values from the `regex` array is matched in the transcript, the [onCommandData callback](#oncommanddata) fires with a special `header` on the data, named `SpokenCommand`.
The value of the `SpokenCommand` header will be exactly the value of `command`, i.e. `COMMAND_NAME`.

For example, you can use `spokenCommandsList` to define the rules for when you want a new paragraph:

```
{
  "spokenCommandsList": [
    {
      "command": "NEW_PARAGRAPH",
      "regex": [ "new line", "new paragraph", "from the start", "start new line" ]
    }
  ]
}
```

So each time the back-end algorithm finds one of the phrases `"new line"`, `"new paragraph"`, `"from the start"`, `"start new line"` in the transcript, the VTC client will fire the [onCommandData callback](#oncommanddata). This way, your application will know when to start a new paragraph.

When sending a `config` to the client, the first callback to be fired will be the [onConfig callback](#onconfig).
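The matching behaviour described above can be sketched in plain JavaScript. Note that `matchSpokenCommands` is a hypothetical helper for illustration, not part of the VatisTech client API — the real matching happens on the back-end.

```javascript
// Illustrative sketch only: matchSpokenCommands is NOT part of the
// VatisTech client API; the real matching happens server-side.
const config = {
  spokenCommandsList: [
    {
      command: "NEW_PARAGRAPH",
      regex: ["new line", "new paragraph", "from the start", "start new line"],
    },
  ],
};

// Return the first command whose regex list matches the transcript, or null.
function matchSpokenCommands(transcript, spokenCommandsList) {
  for (const { command, regex } of spokenCommandsList) {
    if (regex.some((pattern) => new RegExp(pattern, "i").test(transcript))) {
      return command;
    }
  }
  return null;
}

console.log(matchSpokenCommands("please start new line here", config.spokenCommandsList));
// prints NEW_PARAGRAPH
```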
### `service`

This is a **String** that refers to the service that you would like to use.
@@ -148,7 +190,7 @@ To get one, please follow these instructions:

### `onData`

This is a **Function** on which you will receive the transcript chunks from the back-end. **It is a callback that is always fired.**

It has the following signature:

@@ -166,7 +208,230 @@ function onData(data) {
}
```

The `data` object that is received has the following structure:

#### General structure

```json
{
  "type": "<str>",
  "headers": {
    "key1": "value1",
    "key2": "value2"
  }
}
```

#### Timestamped transcription packet

```json
{
  "type": "TIMESTAMPED_TRANSCRIPTION",
  "headers": {},
  "transcript": "hello world",
  "words": [
    {
      "word": "hello",
      "start_time": 1350.39,
      "end_time": 4600.5,
      "speaker": "Speaker 1",
      "confidence": 0.96,
      "entity": null,
      "entity_group_id": null
    },
    {
      "word": "world",
      "start_time": 6200.3,
      "end_time": 8020.0,
      "speaker": "Speaker 1",
      "confidence": 0.98,
      "entity": null,
      "entity_group_id": null
    }
  ]
}
```

#### Processed timestamped transcription packet

```json
{
  "type": "PROCESSED_TIMESTAMPED_TRANSCRIPTION",
  "headers": {},
  "transcript": "Hello, world!",
  "words": [
    {
      "word": "hello",
      "start_time": 1350.39,
      "end_time": 4600.5,
      "speaker": "Speaker 1",
      "confidence": 0.96,
      "entity": null,
      "entity_group_id": null
    },
    {
      "word": "world",
      "start_time": 6200.3,
      "end_time": 8020.0,
      "speaker": "Speaker 1",
      "confidence": 0.98,
      "entity": null,
      "entity_group_id": null
    }
  ],
  "processed_words": [
    {
      "word": "Hello,",
      "start_time": 1350.39,
      "end_time": 4600.5,
      "speaker": "Speaker 1",
      "confidence": 0.96,
      "entity": null,
      "entity_group_id": null
    },
    {
      "word": "world!",
      "start_time": 6200.3,
      "end_time": 8020.0,
      "speaker": "Speaker 1",
      "confidence": 0.98,
      "entity": null,
      "entity_group_id": null
    }
  ]
}
```

#### Headers

| Name                  | Type    | Description                                                                                              |
| --------------------- | ------- | -------------------------------------------------------------------------------------------------------- |
| PacketNumber          | int     | Incremental packet number                                                                                 |
| Sid                   | string  | Session id                                                                                                |
| FrameStartTime        | double  | Frame start time in milliseconds                                                                          |
| FrameEndTime          | double  | Frame end time in milliseconds                                                                            |
| FinalFrame            | boolean | Flag marking that a segment of speech has ended and won't be updated                                      |
| SilenceDetected       | boolean | Flag indicating silence was detected on the audio frame                                                   |
| ProcessingTimeSeconds | double  | Inference time in seconds                                                                                 |
| SplitPacket           | boolean | Flag indicating the response packet was split and this is one of the pieces                               |
| FinalSplitPacket      | boolean | Flag indicating this is the final piece of the split response                                             |
| SplitId               | string  | Full packet id in the format `<packet_number>.<split_id>.<sub-split-id>.<sub-sub-split-id>`               |
| RequestBytes          | int     | Additional bytes requested to produce a frame. This is just an estimate; any number of bytes can be sent  |
| SpokenCommand         | string  | Command detected in the frame                                                                             |

#### NOTE

The `data` can be a final frame, i.e. the back-end has fully finalized the transcript for those words and their time intervals (start and end time).
Or it can be a partial frame, i.e. the back-end has not yet finalized the transcript for those words and their time intervals, and it will most likely change until it is overlapped by a final frame.

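A handler can route packets by finality. This is a sketch under the assumption that the `FinalFrame` header from the table above arrives as a boolean on `data.headers`; check the value your back-end actually sends before relying on it.

```javascript
// Sketch: route packets by finality, assuming data.headers.FinalFrame
// is the boolean described in the headers table above.
function isFinalFrame(data) {
  return Boolean(data.headers && data.headers.FinalFrame === true);
}

const onData = (data) => {
  if (isFinalFrame(data)) {
    // Safe to persist: transcript and timings won't change anymore.
  } else {
    // Display only: this frame will be overlapped by a later final frame.
  }
};

console.log(isFinalFrame({ headers: { FinalFrame: true }, transcript: "hello" })); // prints true
console.log(isFinalFrame({ headers: {}, transcript: "hel" }));                     // prints false
```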
### `onPartialData`

This is a **Function** on which you will receive the partial transcript chunks from the back-end.

It is identical to the [onData callback](#ondata), except that `data` will always represent partial frames.

It has the following signature:

```
const onPartialData = (data) => {
  /* do something with data */
}
```

Or with a named function:

```
function onPartialData(data) {
  /* do something with data */
}
```

#### NOTE

The `data` object that arrives on the current `onPartialData` callback overrides the `data` object that arrived on the previous `onPartialData` callback.

### `onFinalData`

This is a **Function** on which you will receive the final transcript chunks from the back-end.

It is identical to the [onData callback](#ondata), except that `data` will always represent final frames.

It has the following signature:

```
const onFinalData = (data) => {
  /* do something with data */
}
```

Or with a named function:

```
function onFinalData(data) {
  /* do something with data */
}
```

#### NOTE

The `data` object that arrives on the `onFinalData` callback overrides the `data` object that arrived on the previous `onPartialData` callback.

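The override semantics in the two NOTEs above can be sketched as a small piece of application state: each partial replaces the previous partial, and a final frame supersedes the pending partial and is committed. This is illustrative state management for your application, not part of the client API.

```javascript
// Sketch of the override behaviour: partials replace each other,
// and a final frame commits the segment and clears the pending partial.
function createTranscriptState() {
  return {
    committed: [],   // transcripts from final frames, in order
    pending: "",     // transcript of the latest partial frame
    onPartialData(data) {
      this.pending = data.transcript;        // overrides the previous partial
    },
    onFinalData(data) {
      this.committed.push(data.transcript);  // the segment is now final
      this.pending = "";                     // the pending partial is superseded
    },
    text() {
      return [...this.committed, this.pending].filter(Boolean).join(" ");
    },
  };
}

const state = createTranscriptState();
state.onPartialData({ transcript: "hel" });
state.onPartialData({ transcript: "hello wor" }); // overrides "hel"
state.onFinalData({ transcript: "hello world" }); // supersedes the partial
console.log(state.text()); // prints: hello world
```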
### `onConfig`

This is a **Function** on which you will receive a message from the back-end saying whether the config was successfully applied or not.

It has the following signature:

```
const onConfig = (data) => {
  /* do something with data */
}
```

The `data` object has the following structure:

#### Config applied packet

```json
{
  "type": "CONFIG_APPLIED",
  "headers": {},
  "config_packet": {
    "type": "CONFIG",
    "headers": {},
    "spokenCommandsList": [
      {
        "command": "NEW_PARAGRAPH",
        "regex": ["new line"]
      }
    ]
  }
}
```

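A minimal `onConfig` handler might just check the packet `type` and read back the commands that were applied. The `configCommands` helper is hypothetical, shown only to make the packet shape above concrete.

```javascript
// Hypothetical helper: extract the applied command names from a
// CONFIG_APPLIED packet (shape as shown above), or null otherwise.
function configCommands(data) {
  if (data.type !== "CONFIG_APPLIED") return null;
  return data.config_packet.spokenCommandsList.map((c) => c.command);
}

const onConfig = (data) => {
  const commands = configCommands(data);
  if (commands) {
    console.log("Config applied:", commands.join(", "));
  } else {
    console.warn("Config was not applied:", data.type);
  }
};

onConfig({
  type: "CONFIG_APPLIED",
  headers: {},
  config_packet: {
    type: "CONFIG",
    headers: {},
    spokenCommandsList: [{ command: "NEW_PARAGRAPH", regex: ["new line"] }],
  },
});
// prints: Config applied: NEW_PARAGRAPH
```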
### `onCommandData`

This is a **Function** on which you will receive the transcript chunks for specific commands from the back-end.

For example, if you initialize the plugin with a set of commands (e.g. `{spokenCommandsList: [ { "command": "NEW_PARAGRAPH", "regex": ["start new paragraph", "new phrase", "new sentence"] } ] }`), each time the back-end algorithm matches one of these commands, it will send the data to this function.

It has the following signature:

```
const onCommandData = (data) => {
  /* do something with data */
}
```

Or with a named function:

```
function onCommandData(data) {
  /* do something with data */
}
```

The `data` object from this callback is the same as the one from the [onData callback](#ondata), but it also has a new property, named `spokenCommand`, containing the actual command that triggered the callback.
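An `onCommandData` handler can dispatch on the `spokenCommand` property described above. The `NEW_PARAGRAPH` name matches the config examples earlier; adapt it to your own command names.

```javascript
// Sketch: react to spokenCommand by starting a new paragraph in the
// application. Only data.spokenCommand is used here; the rest of data
// has the same shape as in onData.
let paragraphCount = 1;

function handleCommand(spokenCommand) {
  if (spokenCommand === "NEW_PARAGRAPH") {
    paragraphCount += 1; // the application starts a new paragraph here
  }
  return paragraphCount;
}

const onCommandData = (data) => {
  handleCommand(data.spokenCommand);
};

onCommandData({ spokenCommand: "NEW_PARAGRAPH", transcript: "new paragraph" });
console.log(paragraphCount); // prints 2
```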

### `log`
