This repository was archived by the owner on Aug 17, 2024. It is now read-only.

Commit e1a131a

committed
v0.2.0
2 parents bafd192 + eb8f5e0 commit e1a131a

32 files changed

+6259
-2
lines changed

.babelrc

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
{
2+
"presets": ["es2015", "stage-0"],
3+
"plugins": ["transform-runtime"]
4+
}

.eslintrc

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
{
2+
"extends": "airbnb",
3+
"parser": "babel-eslint",
4+
"ecmaFeatures": {
5+
"classes": true
6+
},
7+
"rules": {
8+
"indent": [2, 4],
9+
"no-console": 0,
10+
"object-curly-spacing": 0,
11+
"no-spaced-func": 0
12+
}
13+
}

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
npm-debug.log*
2+
logs
3+
node_modules

AUTHORS

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Guillaume MOUSNIER <mousnier.guillaume@gmail.com>

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
# changelog
2+
3+
---
4+
5+
## v0.2.0
6+
7+
**Author**: Guillaume Mousnier.
8+
9+
**Type**: Feature
10+
11+
**Changes**:
12+
- First functional version
13+
14+
---
15+
16+
## v0.1.0
17+
18+
**Author**: Guillaume Mousnier.
19+
20+
**Type**: Feature
21+
22+
**Changes**:
23+
- Init the repo
24+
25+
---

CONTRIBUTING.md

Whitespace-only changes.

README.md

Lines changed: 274 additions & 2 deletions
@@ -1,3 +1,275 @@
1-
dataframe-js
2-
============
1+
# dataframe-js
2+
**v0.2.0**
33

4+
## Presentation
5+
6+
dataframe-js provides another way to work with data through the DataFrame, a powerful data structure already used in other ecosystems (Spark, Python, R, ...).
7+
8+
A DataFrame is simply built on two concepts:
9+
- **Columns** providing ways to select your data and reorganize them.
10+
- **Rows** providing ways to modify or filter your data.
11+
12+
````javascript
13+
const df = new DataFrame(rawData, columns)
14+
df.show()
15+
// DataFrame example
16+
| column1 | column2 | column3 | <--- Columns
17+
------------------------------------
18+
| 3 | 3 | undefined | <--- Row
19+
| 6 | 4 | undefined |
20+
| 8 | 5 | undefined |
21+
| undefined | 6 | undefined |
22+
````
23+
24+
**DataFrame is immutable** (lazy, for performance purposes). Each modification therefore returns a new DataFrame, which reduces side effects and makes your data safer.
25+
26+
**DataFrame is easy to use** with a simple API (close to Spark or SQL) designed to make data manipulation faster and easier.
27+
28+
**DataFrame is flexible** because you can create DataFrames from multiple data formats (array, object) and export them to several others (array, object, csv, json...).
29+
30+
**DataFrame is modular** because you can use additional modules (Stat and Matrix by default) or create your own.
31+
32+
## Installation
33+
34+
`npm install git+http://93.15.96.71:10080/odin/dataframe-js.git#feature/begin`
35+
36+
## Manual
37+
38+
dataframe-js contains a **principal core (DataFrame and Row)** and **two default modules (Stat and Matrix)**. Refer to this manual to use them. You can also directly read unit tests in `./tests/` or documented code in `./src/`.
39+
40+
### Core
41+
42+
#### DataFrame and Row API documentation: [Core API](./doc/CORE_API.md)
43+
44+
#### Usage:
45+
46+
To use dataframe-js, simply import the library. Then you can use DataFrame, Row or other Core components.
47+
48+
```javascript
49+
import { DataFrame, Row } from 'dataframe-js';
50+
```
51+
52+
To create a DataFrame, pass your data and your column names. You can use different data structures, as shown below:
53+
54+
```javascript
55+
const df = new DataFrame(myData, myColumns);
56+
57+
const dfFromObjectOfArrays = new DataFrame({
58+
column1: [3, 6, 8], //<------ A column
59+
column2: [3, 4, 5, 6],
60+
}, ['column1', 'column2']);
61+
62+
const dfFromArrayOfArrays = new DataFrame([
63+
[1, 6, 9, 10, 12], // <------- A row
64+
[1, 2],
65+
[6, 6, 9, 8, 9, 12],
66+
], ['c1', 'c2', 'c3', 'c4', 'c5', 'c6']);
67+
68+
const dfFromArrayOfObjects = new DataFrame([
69+
{c1: 1, c2: 6}, // <--- A row
70+
{c4: 1, c3: 2}
71+
], ['c1', 'c2', 'c3', 'c4']);
72+
```
73+
74+
If you don't pass column names, they will be inferred from your data, but **it's slower**:
75+
76+
```javascript
77+
// here you don't pass column names
78+
const dfFromObjectOfArrays = new DataFrame({
79+
column1: [3, 6, 8], //<------ A column
80+
column2: [3, 4, 5, 6],
81+
});
82+
83+
console.log(dfFromObjectOfArrays.listColumns())
84+
// ['column1', 'column2']
85+
86+
const dfFromArrayOfArrays = new DataFrame([
87+
[1, 6, 9, 10, 12], // <------- A row
88+
[1, 2],
89+
[6, 6, 9, 8, 9, 12],
90+
]);
91+
92+
console.log(dfFromArrayOfArrays.listColumns())
93+
// ['0', '1', '2', '3', '4', '5']
94+
95+
96+
const dfFromArrayOfObjects = new DataFrame([
97+
{c1: 1, c2: 6}, // <--- A row
98+
{c4: 1, c3: 2}
99+
]);
100+
101+
console.log(dfFromArrayOfObjects.listColumns())
102+
// ['c1', 'c2', 'c3', 'c4']
103+
```
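To illustrate why inference costs more than passing names explicitly, here is a minimal sketch of how column names could be collected from raw data. `inferColumns` is a hypothetical helper written for this example, not the library's implementation; it has to visit every row, and it returns names in order of first appearance, which may differ from the order dataframe-js produces.

```javascript
// Hypothetical helper, not part of dataframe-js: infer column names from raw data.
function inferColumns(data) {
    if (Array.isArray(data)) {
        const names = new Set();
        for (const row of data) {
            // Array rows contribute positional names, object rows contribute their keys.
            const keys = Array.isArray(row) ? row.map((_, i) => String(i)) : Object.keys(row);
            keys.forEach(key => names.add(key));
        }
        return [...names];
    }
    // Object of arrays: each key is a column.
    return Object.keys(data);
}

console.log(inferColumns([[1, 6, 9, 10, 12], [1, 2], [6, 6, 9, 8, 9, 12]]));
// ['0', '1', '2', '3', '4', '5']
console.log(inferColumns({column1: [3, 6, 8], column2: [3, 4, 5, 6]}));
// ['column1', 'column2']
```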
104+
105+
You can also do the reverse and export your DataFrame to another format by using:
106+
* [.toDict()](./doc/CORE_API.md#DataFrame+toDict) ⇒ <code>Object</code>
107+
* [.toArray()](./doc/CORE_API.md#DataFrame+toArray) ⇒ <code>Array</code>
108+
* [.toText([sep], [header], [path])](./doc/CORE_API.md#DataFrame+toText) ⇒ <code>String</code>
109+
* [.toCSV([header], [path])](./doc/CORE_API.md#DataFrame+toCSV) ⇒ <code>String</code>
110+
* [.toJSON([path])](./doc/CORE_API.md#DataFrame+toJSON) ⇒ <code>String</code>
111+
112+
or you can debug by using:
113+
* [.show([rows], [quiet])](./doc/CORE_API.md#DataFrame+show) ⇒ <code>String</code>
114+
115+
When you perform operations on a DataFrame (or on a Row), it is never mutated. Modifying a DataFrame (even if nothing changes) creates a new DataFrame instance. It's a bit slower, but you avoid side effects.
116+
117+
Examples:
118+
```javascript
119+
// When you change the DataFrame structure, the original DataFrame doesn't change.
120+
df.drop('column1'); // <--- Here you drop a column.
121+
console.log(df.listColumns());
122+
// But nothing changes in df.
123+
// You didn't mutate it. You just created a new instance of DataFrame.
124+
// ['column1', 'column2', 'column3']
125+
126+
// Here you declare a new variable (const) to save the modified df.
127+
const df2 = df.drop('column1');
128+
console.log(df2.listColumns());
129+
// ['column2', 'column3']
130+
131+
console.log(Object.is(df2.dim(), df.dim()));
132+
// false: df2.dim() and df.dim() are distinct arrays (and the dimensions differ too). df2 is a separate instance from df.
133+
console.log(
134+
Object.is(
135+
df2.map(row => row),
136+
df2
137+
)
138+
);
139+
// false: modifying df2 returns another DataFrame instance, even if nothing changes.
140+
141+
// if we create a new column
142+
df2.withColumn('anewcolumn', row => row.get('column2') + 8);
143+
console.log(
144+
df2.select('anewcolumn')
145+
);
146+
// NoSuchColumnError
147+
// df2 wasn't mutated
148+
149+
```
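The copy-on-write behaviour shown above can be sketched without the library. `TinyFrame` below is a hypothetical miniature class written only for this illustration; it assumes nothing about how dataframe-js stores its rows internally.

```javascript
// Hypothetical miniature table, only to illustrate the copy-on-write pattern.
class TinyFrame {
    constructor(rows, columns) {
        // Copy and freeze the inputs so later changes by the caller cannot leak in.
        this.rows = Object.freeze(rows.map(row => Object.freeze({...row})));
        this.columns = Object.freeze([...columns]);
        Object.freeze(this);
    }

    listColumns() {
        return [...this.columns];
    }

    // Returns a new TinyFrame without the given column; `this` is untouched.
    drop(columnName) {
        const columns = this.columns.filter(name => name !== columnName);
        const rows = this.rows.map(row => {
            const {[columnName]: _dropped, ...rest} = row;
            return rest;
        });
        return new TinyFrame(rows, columns);
    }
}

const tiny = new TinyFrame([{a: 1, b: 2}], ['a', 'b']);
const tinyWithoutA = tiny.drop('a');
console.log(tiny.listColumns());         // ['a', 'b']
console.log(tinyWithoutA.listColumns()); // ['b']
```

Freezing the instance makes accidental mutation fail loudly in strict mode, at the cost of one copy per operation.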
150+
151+
For more information about all DataFrame manipulations, see the API below.
152+
153+
#### List of available methods and their examples:
154+
155+
* [DataFrame](./doc/CORE_API.md#DataFrame)
156+
* [new DataFrame(data, columns, [...modules])](./doc/CORE_API.md#new_DataFrame_new)
157+
* [.toDict()](./doc/CORE_API.md#DataFrame+toDict) ⇒ <code>Object</code>
158+
* [.toArray()](./doc/CORE_API.md#DataFrame+toArray) ⇒ <code>Array</code>
159+
* [.toText([sep], [header], [path])](./doc/CORE_API.md#DataFrame+toText) ⇒ <code>String</code>
160+
* [.toCSV([header], [path])](./doc/CORE_API.md#DataFrame+toCSV) ⇒ <code>String</code>
161+
* [.toJSON([path])](./doc/CORE_API.md#DataFrame+toJSON) ⇒ <code>String</code>
162+
* [.push(...rows)](./doc/CORE_API.md#DataFrame+push) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
163+
* [.dim()](./doc/CORE_API.md#DataFrame+dim) ⇒ <code>Array</code>
164+
* [.transpose()](./doc/CORE_API.md#DataFrame+transpose) ⇒ <code>DataFrame</code>
165+
* [.count()](./doc/CORE_API.md#DataFrame+count) ⇒ <code>Int</code>
166+
* [.countValue(valueToCount, [columnName])](./doc/CORE_API.md#DataFrame+countValue) ⇒ <code>Int</code>
167+
* [.show([rows], [quiet])](./doc/CORE_API.md#DataFrame+show) ⇒ <code>String</code>
168+
* [.replace(value, replacement, [...columnNames])](./doc/CORE_API.md#DataFrame+replace) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
169+
* [.distinct(columnName)](./doc/CORE_API.md#DataFrame+distinct) ⇒ <code>Array</code>
170+
* [.unique(columnName)](./doc/CORE_API.md#DataFrame+unique) ⇒ <code>Array</code>
171+
* [.listColumns()](./doc/CORE_API.md#DataFrame+listColumns) ⇒ <code>Array</code>
172+
* [.select(...columnNames)](./doc/CORE_API.md#DataFrame+select) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
173+
* [.withColumn(columnName, [func])](./doc/CORE_API.md#DataFrame+withColumn) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
174+
* [.restructure(newColumnNames)](./doc/CORE_API.md#DataFrame+restructure) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
175+
* [.rename(newColumnNames)](./doc/CORE_API.md#DataFrame+rename) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
176+
* [.drop(columnName)](./doc/CORE_API.md#DataFrame+drop) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
177+
* [.chain(...funcs)](./doc/CORE_API.md#DataFrame+chain) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
178+
* [.filter(func)](./doc/CORE_API.md#DataFrame+filter) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
179+
* [.where(func)](./doc/CORE_API.md#DataFrame+where) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
180+
* [.find(condition)](./doc/CORE_API.md#DataFrame+find) ⇒ <code>[Row](./doc/CORE_API.md#Row)</code>
181+
* [.map(func)](./doc/CORE_API.md#DataFrame+map) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
182+
* [.reduce(func, [init])](./doc/CORE_API.md#DataFrame+reduce)
183+
* [.reduceRight(func, [init])](./doc/CORE_API.md#DataFrame+reduceRight)
184+
* [.shuffle()](./doc/CORE_API.md#DataFrame+shuffle) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
185+
* [.sample(percentage)](./doc/CORE_API.md#DataFrame+sample) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
186+
* [.randomSplit(percentage)](./doc/CORE_API.md#DataFrame+randomSplit) ⇒ <code>Array</code>
187+
* [.groupBy(columnName)](./doc/CORE_API.md#DataFrame+groupBy) ⇒ <code>Array</code>
188+
* [.sortBy(columnName, [reverse])](./doc/CORE_API.md#DataFrame+sortBy) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
189+
* [.union(dfToUnion)](./doc/CORE_API.md#DataFrame+union) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
190+
* [.join(dfToJoin, on, [how])](./doc/CORE_API.md#DataFrame+join) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
191+
* [.innerJoin(dfToJoin, on)](./doc/CORE_API.md#DataFrame+innerJoin) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
192+
* [.fullJoin(dfToJoin, on)](./doc/CORE_API.md#DataFrame+fullJoin) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
193+
* [.outerJoin(dfToJoin, on)](./doc/CORE_API.md#DataFrame+outerJoin) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
194+
* [.leftJoin(dfToJoin, on)](./doc/CORE_API.md#DataFrame+leftJoin) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
195+
* [.rightJoin(dfToJoin, on)](./doc/CORE_API.md#DataFrame+rightJoin) ⇒ <code>[DataFrame](./doc/CORE_API.md#DataFrame)</code>
196+
197+
198+
* [Row](./doc/CORE_API.md#Row)
199+
* [new Row(data, columns)](./doc/CORE_API.md#new_Row_new)
200+
* [.toDict()](./doc/CORE_API.md#Row+toDict) ⇒ <code>Object</code>
201+
* [.toArray()](./doc/CORE_API.md#Row+toArray) ⇒ <code>Array</code>
202+
* [.size()](./doc/CORE_API.md#Row+size) ⇒ <code>Int</code>
203+
* [.has(columnName)](./doc/CORE_API.md#Row+has) ⇒ <code>Boolean</code>
204+
* [.select(...columnNames)](./doc/CORE_API.md#Row+select) ⇒ <code>[Row](./doc/CORE_API.md#Row)</code>
205+
* [.get(columnToGet)](./doc/CORE_API.md#Row+get)
206+
* [.set(columnToSet)](./doc/CORE_API.md#Row+set) ⇒ <code>[Row](./doc/CORE_API.md#Row)</code>
207+
* [.delete(columnToDel)](./doc/CORE_API.md#Row+delete) ⇒ <code>[Row](./doc/CORE_API.md#Row)</code>
208+
209+
210+
211+
### Modules
212+
213+
#### Stat and Matrix modules API documentation: [Modules API](./doc/MODULES_API.md)
214+
215+
#### Usage:
216+
217+
dataframe-js is designed so that you can easily create and add modules to extend the DataFrame tools.
218+
219+
When you create an instance of DataFrame, you can also pass modules, which will then be available by calling their names.
220+
221+
```javascript
222+
// Here you add two modules on your DataFrame instance.
223+
const df = new DataFrame(obj, ['column1', 'column2', 'column3'], fakeModule, anotherModule)
224+
// You can call modules by their names
225+
df.fakemodule.test(4)
226+
```
227+
228+
Modules will also be available on each DataFrame created from your first instance, so you don't have to redeclare them each time you create a DataFrame.
229+
230+
```javascript
231+
// You create a second DataFrame from the last one.
232+
const df2 = df.withColumn('column4', (row) => row.get('column2') * 2)
233+
// This second DataFrame will have access to the same modules.
234+
df2.fakemodule.test(8)
235+
```
236+
237+
If you want to create your own module, take a look at the Stat module (integrated by default) in `./src/modules/stat.js` as an example.
238+
239+
A simple example of a module structure:
240+
241+
```javascript
242+
class fakeModule {
243+
constructor(dataframe) {
244+
this.df = dataframe;
245+
this.name = 'fakemodule';
246+
}
247+
248+
test(x) {
249+
return this.df.withColumn('test', row => row.set('test', x * 2));
250+
}
251+
}
252+
```
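One way a host class could expose such modules under their declared `name`, and hand them on to derived instances, is sketched below. `Host`, `derive` and `Doubler` are hypothetical names for this illustration; the actual DataFrame constructor may attach modules differently.

```javascript
// Hypothetical host class showing one way modules could be attached by name.
class Host {
    constructor(data, ...modules) {
        this.data = data;
        this.modules = modules;
        for (const Module of modules) {
            const instance = new Module(this);
            // Each module exposes itself under the name it declares.
            this[instance.name] = instance;
        }
    }

    // Derived instances reuse the same module classes.
    derive(newData) {
        return new Host(newData, ...this.modules);
    }
}

class Doubler {
    constructor(host) {
        this.host = host;
        this.name = 'doubler';
    }

    run() {
        return this.host.data.map(x => x * 2);
    }
}

const host = new Host([1, 2, 3], Doubler);
console.log(host.doubler.run());              // [2, 4, 6]
console.log(host.derive([10]).doubler.run()); // [20]
```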
253+
254+
#### List of available modules
255+
256+
* [Matrix](./doc/MODULES_API.md#Matrix)
257+
* [new Matrix(dataframe)](./doc/MODULES_API.md#new_Matrix_new)
258+
* [.hasSameStruct(df)](./doc/MODULES_API.md#Matrix+hasSameStruct) ⇒ <code>Boolean</code>
259+
* [.hasSameTransposedStruct(df)](./doc/MODULES_API.md#Matrix+hasSameTransposedStruct) ⇒ <code>Boolean</code>
260+
* [.add(df)](./doc/MODULES_API.md#Matrix+add) ⇒ <code>DataFrame</code>
261+
* [.product(number)](./doc/MODULES_API.md#Matrix+product) ⇒ <code>DataFrame</code>
262+
* [.dot(df)](./doc/MODULES_API.md#Matrix+dot) ⇒ <code>DataFrame</code>
263+
264+
* [Stat](./doc/MODULES_API.md#Stat)
265+
* [new Stat(dataframe)](./doc/MODULES_API.md#new_Stat_new)
266+
* [.max(columnName)](./doc/MODULES_API.md#Stat+max) ⇒ <code>Number</code>
267+
* [.min(columnName)](./doc/MODULES_API.md#Stat+min) ⇒ <code>Number</code>
268+
* [.mean(columnName)](./doc/MODULES_API.md#Stat+mean) ⇒ <code>Number</code>
269+
* [.var(columnName, [population])](./doc/MODULES_API.md#Stat+var) ⇒ <code>Number</code>
270+
* [.sd(columnName, [population])](./doc/MODULES_API.md#Stat+sd) ⇒ <code>Number</code>
271+
* [.stats(columnName)](./doc/MODULES_API.md#Stat+stats) ⇒ <code>Object</code>
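
The optional `population` argument of `.var` and `.sd` presumably switches between the sample and the population estimator. The standalone functions below sketch that difference on a plain array; they are hypothetical helpers, and the choice of the sample estimator as the default is an assumption that the README does not confirm.

```javascript
// Hypothetical standalone helpers; the `population` flag mirrors the optional argument listed above.
function variance(values, population = false) {
    const n = values.length;
    const mean = values.reduce((sum, x) => sum + x, 0) / n;
    const squared = values.reduce((sum, x) => sum + (x - mean) ** 2, 0);
    // Population variance divides by n, sample variance by n - 1.
    return squared / (population ? n : n - 1);
}

function sd(values, population = false) {
    return Math.sqrt(variance(values, population));
}

const sampleValues = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(variance(sampleValues, true)); // 4
console.log(sd(sampleValues, true));       // 2
console.log(variance(sampleValues));       // 32 / 7, the sample variance
```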
272+
273+
## Contribution
274+
275+
[How to contribute?](./CONTRIBUTING.md)

dataframe-benchmark.js

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
import { Benchmark, DataFrame } from './src/index.js';
2+
import { chain } from './src/reusables.js';
3+
4+
const bench = new Benchmark();
5+
6+
const data = [...Array(100000).keys()].map(r => ({c1: r}));
7+
const data2 = [...Array(100000).keys()].map(r => [r]);
8+
const df = new DataFrame(data, ['c1']);
9+
10+
bench.compare(
11+
() => df.chain(
12+
row => row.set('c1', row.get('c1') * 4),
13+
row => row.get('c1') > 30000,
14+
row => row.set('c1', Math.sqrt(row.get('c1')))
15+
),
16+
() => data.map(row => (Object.assign({}, row, {c1: row.c1 * 4}))).filter(row => row.c1 > 30000).map(row => (Object.assign({}, row, {c1: Math.sqrt(row.c1)}))),
17+
20);
18+
19+
bench.compare(
20+
() => df.chain(
21+
row => row.set('c1', row.get('c1') * 4),
22+
row => row.get('c1') > 30000,
23+
row => row.set('c1', Math.sqrt(row.get('c1')))
24+
),
25+
() => data.map(row => (Object.assign({}, row, {c1: row.c1 * 4}))).filter(row => row.c1 > 30000).map(row => (Object.assign({}, row, {c1: Math.sqrt(row.c1)}))),
26+
20);
27+
28+
bench.compare(
29+
() => df.chain(
30+
row => row.set('c1', row.get('c1') * 4),
31+
row => row.get('c1') > 30000,
32+
row => row.set('c1', Math.sqrt(row.get('c1')))
33+
),
34+
() => df.map(row => row.set('c1', row.get('c1') * 4)).filter(row => row.get('c1') > 30000).map(row => row.set('c1', Math.sqrt(row.get('c1')))),
35+
20);
36+
37+
bench.compare(
38+
() => [...chain(data2,
39+
row => row * 4,
40+
row => row > 30000,
41+
row => Math.sqrt(row),
42+
row => row ** 2,
43+
)],
44+
() => data2.map(row => row * 4).filter(row => row > 30000).map(row => Math.sqrt(row)).map(row => row ** 2),
45+
20);
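
The benchmark above compares a fused `chain` against staged `map`/`filter` calls. A minimal sketch of how such a lazy chain could work is given below, assuming that a function returning a boolean acts as a filter and any other return value replaces the item. `lazyChain` is a hypothetical re-creation, not the code in `./src/reusables.js`.

```javascript
// Hypothetical re-creation of a lazy chain; not the code in ./src/reusables.js.
function* lazyChain(iterable, ...funcs) {
    outer: for (let item of iterable) {
        for (const func of funcs) {
            const result = func(item);
            if (result === false) continue outer; // predicate rejected the item
            if (result !== true) item = result;   // transformation: carry the new value
        }
        yield item;
    }
}

const numbers = [...Array(10).keys()];
const fused = [...lazyChain(numbers, x => x * 4, x => x > 20, x => Math.sqrt(x))];
const staged = numbers.map(x => x * 4).filter(x => x > 20).map(x => Math.sqrt(x));
console.log(fused.length === staged.length); // true
```

Each item passes through every function in a single traversal, so no intermediate arrays are allocated between steps.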
