Commit d8be3f6

Merged version 0.9.6
Former-commit-id: 818daf0
1 parent 49ac183 commit d8be3f6

32 files changed, +1367 -631 lines changed
README.rst

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
metaknowledge
2+
=============
3+
4+
metaknowledge is a Python3 library that simplifies bibliometric research
5+
using Web of Science data. It reads a directory of plain text files
6+
containing meta-data on publications and citations, and writes to a
7+
variety of data structures that are suitable for quantitative, network,
8+
and text analyses. It handles large datasets (e.g. several million
9+
records) efficiently.
10+
11+
The website can be found at
12+
`networkslab.org/metaknowledge <http://networkslab.org/metaknowledge/>`__.
13+
14+
Currently in Beta
15+
-----------------
16+
17+
metaknowledge is in the final stages of testing before a 1.0 release;
18+
different versions may behave differently.
19+
20+
Installing
21+
----------
22+
23+
To install, run ``python3 setup.py install``.
24+
25+
For information on alternate installs read the documentation at the
26+
`website <http://networkslab.org/metaknowledge/installation/>`__.

metaknowledge/__init__.py

Lines changed: 14 additions & 9 deletions
@@ -1,3 +1,4 @@
1+
#Written by Reid McIlroy-Young for Dr. John McLevey, University of Waterloo 2015
12
"""metaknowledge is a Python3 package that simplifies bibliometric and computational analysis of Web of Science data.
23
34
# Example
@@ -12,25 +13,29 @@
1213
Done making a co-citation network of files-from-records 1.1s
1314
>>> print(len(G.nodes()))
1415
223
15-
>>> mk.write_graph(G, "Cocitation-Network-of-Journals")
16+
>>> mk.writeGraph(G, "Cocitation-Network-of-Journals")
17+
18+
There is also a simple command line program called `metaknowledge` that comes with the package. It allows for creating networks without any need to know Python. More information about it can be found at [networkslab.org/metaknowledge/cli]({{ site.baseurl }}/cli)
1619
1720
# Overview
1821
19-
This package can read the files downloaded from the [Thomson Reuters Web of Science](https://webofknowledge.com) (WOS) as plain text. These files contain metadata about scientific records, such as the authors, language, and citations. The records are saved in groups of up-to 500 individual records in a file.
22+
This package can read the files downloaded from the [Thomson Reuters Web of Science](https://webofknowledge.com) (WOS) as plain text. These files contain metadata about scientific records, such as the authors, title, and citations. The records are exported in groups of up to 500 individual records per file.
2023
2124
The [metaknowledge.RecordCollection](#RecordCollection.RecordCollection) class can take a path to one or more of these files, then load and parse them. The object is the main way for work to be done on multiple records. For each individual record it creates an instance of the [metaknowledge.Record](#Record.Record) class that contains the results of the parsing of the record.
2225
23-
The files given by WOS are a flat database containing a series of 2 character tags, e.g. 'TI' is the title. Each WOS tag has one or more values and metaknowledge makes use of them to extract useful information. The approximate meanings of the tags are listed in the [tagProcessing](#tagProcessing.tagProcessing) package, if you simply want the mapping [`tagToFull()`](#metaknowledge.tagToFull) is a function that maps tags to their full names it as well as a few other similar functions are provided by metaknowledge. There are no full official public listings of tag the meanings available. metaknowledge is not attempting to provide the definitive or authoritative meanings. Some
26+
The files given by WOS are a flat database containing a series of 2 character tags, e.g. 'TI' is the title. Each WOS tag has one or more values and metaknowledge can read them to extract useful information. The approximate meanings of the tags are listed in the [tagProcessing](#tagProcessing.tagProcessing) package, along with the parsing functions for each tag. If you simply want the mapping, [`tagToFull()`](#metaknowledge.tagToFull) is a function that maps tags to their full names; it, as well as a few other similar functions, is provided by the base metaknowledge import. Note that the long names can be used in place of the short 2 character codes within metaknowledge. There are no full official public listings of the tag meanings available. metaknowledge is not attempting to provide the definitive or authoritative meanings.
2427
25-
As citations are of great importance to sociology their handling is done with the [Citation](#Citation.Citation) class. This class can parse the citations given by WOS as well as extra details about the full name of their journal and allow simple comparisons.
28+
Citations are handled by a special [Citation](#Citation.Citation) class. This class can parse the citations given by WOS, as well as extra details about the full name of their journal, and allows simple comparisons.
2629
27-
Note for those reading the docstring metaknowledge's docs are written in markdown and are processed to produce the documentation found at [networkslab.org/metaknowledge/documentation](http://networkslab.org/metaknowledge/documentation/).
30+
Note for those reading the docstrings: metaknowledge's docs are written in markdown and are processed to produce the documentation found at [networkslab.org/metaknowledge/documentation]({{ site.baseurl }}/documentation/), but you should have no problem reading them from the help function.
2831
"""
29-
from .record import Record, recordParser, BadISIRecord
32+
from .record import Record, recordParser, BadWOSRecord
3033
from .citation import Citation, BadCitation, filterNonJournals
31-
from .recordCollection import RecordCollection, isiParser
32-
from .graphHelpers import write_edgeList, write_nodeAttributeFile, write_graph, read_graph, _ProgressBar, drop_edges, drop_nodesByDegree, drop_nodesByCount, mergeGraphs, graphStats
34+
from .recordCollection import RecordCollection, wosParser
35+
36+
from .graphHelpers import writeEdgeList, writeNodeAttributeFile, writeGraph, readGraph, _ProgressBar, dropEdges, dropNodesByDegree, dropNodesByCount, mergeGraphs, graphStats
3337
from .constants import VERBOSE_MODE
34-
#from .blondel import blondel, modularity
3538
from .diffusion import diffusionGraph, diffusionCount
3639
from .tagProcessing.funcDicts import tagToFull, isTagOrName, normalizeToTag, normalizeToName
40+
41+
#from .blondel import blondel, modularity #Better implementations can be found on PyPI so this has been discontinued
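
Reviewer note: the import lines above rename several public names from snake_case to camelCase (and `isi` to `wos`), which breaks any caller still using the old spellings. A minimal sketch of a backwards-compatibility shim follows, assuming one wanted to keep the old names working for a release. The helpers `add_deprecated_aliases` and `_warn_and_call` and the stand-in object `fake_mk` are hypothetical and are not part of metaknowledge; only the old/new name pairs come from this diff.

```python
import types
import warnings

# Old name -> new name, copied from the import lines in this diff.
RENAMES = {
    "write_edgeList": "writeEdgeList",
    "write_nodeAttributeFile": "writeNodeAttributeFile",
    "write_graph": "writeGraph",
    "read_graph": "readGraph",
    "drop_edges": "dropEdges",
    "drop_nodesByDegree": "dropNodesByDegree",
    "drop_nodesByCount": "dropNodesByCount",
    "isiParser": "wosParser",
    "BadISIRecord": "BadWOSRecord",
}


def _warn_and_call(old, new, func):
    """Wrap func so that calling it under the old name emits a DeprecationWarning."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            "{} is deprecated, use {}".format(old, new),
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    wrapper.__name__ = old
    return wrapper


def add_deprecated_aliases(namespace, renames=RENAMES):
    """Attach the old names to namespace (hypothetical helper, not in metaknowledge).

    Returns the list of old names that were actually added.
    """
    added = []
    for old, new in renames.items():
        target = getattr(namespace, new, None)
        if target is None or hasattr(namespace, old):
            continue  # nothing to alias, or the old name still exists
        if isinstance(target, type):
            # Classes (e.g. exceptions) are aliased directly so `except` still matches.
            setattr(namespace, old, target)
        else:
            setattr(namespace, old, _warn_and_call(old, new, target))
        added.append(old)
    return added


# Stand-in for the real package so the sketch runs without metaknowledge installed.
fake_mk = types.SimpleNamespace(
    writeGraph=lambda grph, name: (grph, name),
    BadWOSRecord=type("BadWOSRecord", (Exception,), {}),
)
aliases = add_deprecated_aliases(fake_mk)
```

Aliasing exception classes directly, instead of wrapping them, keeps `except BadISIRecord:` blocks in older scripts catching the renamed exception.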

metaknowledge/bin/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
1+
#Written by Reid McIlroy-Young for Dr. John McLevey, University of Waterloo 2015
12
from .metaknowledgeCLI import mkCLI
23
from .metaknowledgeMdToNb import mkMdToNb
34
from .metaknowledgeDocsGen import mkDocs

metaknowledge/bin/metaknowledgeCLI.py

Lines changed: 29 additions & 22 deletions
@@ -1,3 +1,4 @@
1+
#Written by Reid McIlroy-Young for Dr. John McLevey, University of Waterloo 2015
12
import metaknowledge
23
import metaknowledge.journalAbbreviations
34
import networkx as nx
@@ -11,6 +12,7 @@
1112

1213
#TODO: Figure output name stuff
1314

15+
#These are used when the user goes to the interactive terminal, so they must be global
1416
RC = None
1517
G = None
1618

@@ -21,7 +23,7 @@ def argumentParser():
2123
parser.add_argument("--name", "-n", default = False, help = "The name used for the recordCollection and resulting files.")
2224
parser.add_argument("--debug", "-d", action = 'store_true', default = False, help = "Enables debug messages.")
2325
parser.add_argument("--progress", "-p", action = 'store_true' ,default = False, help = "Progress bar mode, shows progress bars where appropriate")
24-
parser.add_argument("--suffix", "-s", default = '', help = "Progress bar mode, shows progress bars where appropriate")
26+
parser.add_argument("--suffix", "-s", default = '', help = "The suffix of the WOS files you wish to extract Records from. By default all files are used and those that do not have Records are skipped.")
2527
return parser.parse_args()
2628

2729
def yesorNo(prompt):
@@ -81,7 +83,7 @@ def getOutputName(clargs, suffix, prompt = "What do you wish to call the output
8183
else:
8284
return s
8385

84-
def getTag(prompt, nMode = False):
86+
def Tag(prompt, nMode = False):
8587
retTag = input(prompt).upper()
8688
if retTag in metaknowledge.tagsAndNames:
8789
return retTag
@@ -90,7 +92,7 @@ def getTag(prompt, nMode = False):
9092
return False
9193
else:
9294
print("{} is not a valid tag, please try again".format(retTag))
93-
return getTag(prompt, nMode = nMode)
95+
return Tag(prompt, nMode = nMode)
9496

9597
def getNum(prompt):
9698
retNum = input(prompt)
@@ -218,21 +220,21 @@ def getNetwork(clargs, inRC):
218220
])
219221
netID = int(inputMenu(netsDict, header = "What type of network do you wish to create?", promptMsg = "Input the number corresponding to the type of network you wish to generate? "))
220222
if netID == 1:
221-
otg = getTag("What is the tag to use for the network? ")
223+
otg = Tag("What is the tag to use for the network? ")
222224
print("Generating a network using the {0} tag.".format(otg))
223225
return inRC.oneModeNetwork(otg)
224226
elif netID == 2:
225-
tg1 = getTag("What is the first tag to use for the network? ")
226-
tg2 = getTag("And the second tag? ")
227+
tg1 = Tag("What is the first tag to use for the network? ")
228+
tg2 = Tag("And the second tag? ")
227229
print("Generating a network using the {0} and {1} tags.".format(tg1, tg2))
228230
return inRC.twoModeNetwork(tg1, tg2)
229231
elif netID == 3:
230232
tgs = []
231-
tgs.append(getTag("What is the first tag to use for the network? "))
232-
innertag = getTag("And the next tag (leave blank to continue)? ", nMode = True)
233+
tgs.append(Tag("What is the first tag to use for the network? "))
234+
innertag = Tag("And the next tag (leave blank to continue)? ", nMode = True)
233235
while innertag:
234236
tgs.append(innertag)
235-
innertag = getTag("And the next tag (leave blank to continue)? ", nMode = True)
237+
innertag = Tag("And the next tag (leave blank to continue)? ", nMode = True)
236238
print("Generating a network using the {0} and {1} tags".format(', '.join(tgs[:-1]), tgs[-1]))
237239
return inRC.nModeNetwork(tgs)
238240
elif netID == 4:
@@ -260,18 +262,23 @@ def getThresholds(clargs, grph):
260262
if thresID == 0:
261263
return grph
262264
elif thresID == 1:
263-
return getThresholds(clargs, metaknowledge.drop_nodesByDegree(grph, minDegree = 1))
265+
metaknowledge.dropNodesByDegree(grph, minDegree = 1)
266+
return getThresholds(clargs, grph)
264267
elif thresID == 2:
265-
return getThresholds(clargs, metaknowledge.drop_edges(grph, dropSelfLoops = True))
268+
metaknowledge.dropEdges(grph, dropSelfLoops = True)
269+
return getThresholds(clargs, grph)
266270
elif thresID == 3:
267-
return getThresholds(clargs, metaknowledge.drop_edges(grph, minWeight = getNum("What is the minumum weight for an edge to be included? ")))
271+
metaknowledge.dropEdges(grph, minWeight = getNum("What is the minimum weight for an edge to be included? "))
272+
return getThresholds(clargs, grph)
268273
elif thresID == 4:
269-
return getThresholds(clargs, metaknowledge.drop_edges(grph, minWeight = getNum("What is the maximum weight for an edge to be included? ")))
274+
metaknowledge.dropEdges(grph, maxWeight = getNum("What is the maximum weight for an edge to be included? "))
275+
return getThresholds(clargs, grph)
270276
elif thresID == 5:
271-
return getThresholds(clargs, metaknowledge.drop_nodesByDegree(grph, minDegree = getNum("What is the minumum degree for an edge to be included? ")))
277+
metaknowledge.dropNodesByDegree(grph, minDegree = getNum("What is the minimum degree for a node to be included? "))
278+
return getThresholds(clargs, grph)
272279
else:
273-
return getThresholds(clargs, metaknowledge.drop_nodesByDegree(grph, minDegree = getNum("What is the maximum degree for an edge to be included? ")))
274-
280+
metaknowledge.dropNodesByDegree(grph, maxDegree = getNum("What is the maximum degree for a node to be included? "))
281+
return getThresholds(clargs, grph)
275282

276283
def outputNetwork(clargs, grph):
277284
outDict = collections.OrderedDict([
@@ -296,15 +303,15 @@ def outputNetwork(clargs, grph):
296303
while True:
297304
try:
298305
outName = getOutputName(clargs, '', checking = False)
299-
metaknowledge.write_graph(grph, outName)
306+
metaknowledge.writeGraph(grph, outName)
300307
except OSError:
301308
if clargs.name:
302-
metaknowledge.write_graph(grph, outName, overwrite = True)
309+
metaknowledge.writeGraph(grph, outName, overwrite = True)
303310
break
304311
else:
305312
overWrite = yesorNo("{}, overwrite (y/n)? ")
306313
if overWrite:
307-
metaknowledge.write_graph(grph, outName, overwrite = True)
314+
metaknowledge.writeGraph(grph, outName, overwrite = True)
308315
break
309316
else:
310317
pass
@@ -313,13 +320,13 @@ def outputNetwork(clargs, grph):
313320

314321
elif outID == 2:
315322
outName = getOutputName(clargs, '.csv')
316-
metaknowledge.write_edgeList(grph, outName)
323+
metaknowledge.writeEdgeList(grph, outName)
317324
elif outID == 3:
318325
outName = getOutputName(clargs, '.csv')
319-
metaknowledge.write_nodeAttributeFile(grph, outName)
326+
metaknowledge.writeNodeAttributeFile(grph, outName)
320327
else:
321328
outName = getOutputName(clargs, '.graphml')
322-
nx.write_graphml(grph, outName)
329+
nx.write_graphml(grph, outName)
323330

324331
def mkCLI():
325332
try:
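
Reviewer note: the `getThresholds` hunk in this file switches from passing the return value of the drop functions back into the recursion to calling them for their side effect and then recursing on the same `grph`. That only works if the helpers mutate the graph in place. The sketch below illustrates that calling convention on a plain dictionary of weighted edges, with separate lower and upper bounds. It is a stand-in written for this note: `drop_edges_in_place` and its parameters are hypothetical and do not claim to match the signature of `metaknowledge.dropEdges`.

```python
def drop_edges_in_place(edges, min_weight=float("-inf"),
                        max_weight=float("inf"), drop_self_loops=False):
    """Remove edges from `edges` in place and return None.

    `edges` maps (source, target) tuples to a numeric weight.
    An edge is dropped when its weight falls outside
    [min_weight, max_weight], or when it is a self loop and
    drop_self_loops is set.
    """
    # Collect first, delete second: a dict cannot change size while iterated.
    doomed = [
        (u, v)
        for (u, v), weight in edges.items()
        if weight < min_weight
        or weight > max_weight
        or (drop_self_loops and u == v)
    ]
    for edge in doomed:
        del edges[edge]


edges = {("a", "b"): 1, ("b", "c"): 5, ("c", "c"): 3}

# The caller keeps using the same object after each call, mirroring the new CLI code.
result = drop_edges_in_place(edges, min_weight=2)        # drops ("a", "b")
drop_edges_in_place(edges, drop_self_loops=True)         # drops ("c", "c")
```

Because the function returns `None`, writing `edges = drop_edges_in_place(edges, ...)` would discard the data, which is the mistake the old call style would make against an in-place helper.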

metaknowledge/bin/metaknowledgeDocsGen.py

Lines changed: 27 additions & 22 deletions
@@ -1,21 +1,21 @@
1-
#!/usr/bin/env python3
2-
1+
#Written by Reid McIlroy-Young for Dr. John McLevey, University of Waterloo 2015
32
"""This is intended for metaknowledge only and may not work with anything else"""
43

54
import inspect
65
import argparse
76
import os
87
import time
98
import metaknowledge
9+
import metaknowledge.tagProcessing
1010
import importlib
1111
import re
1212

13-
documentedModules = ['visual', 'journalAbbreviations', 'tagProcessing']
13+
documentedModules = ['contour', 'journalAbbreviations', 'tagProcessing']
1414

1515
docsPrefix = time.strftime("%Y-%m-%d-")
1616

1717
blurbDict = {
18-
'visual' : "A nicer matplotlib graph visualizer and contour plot",
18+
'contour' : "A nicer matplotlib graph visualizer and contour plot",
1919
'tagProcessing' : "All the tags and how they are handled",
2020
'journalAbbreviations' : "Look here to get your J9 database",
2121
'Citation' : "Citations are special, here is how they are handled",
@@ -25,7 +25,7 @@
2525

2626
singleFileYAML = """---
2727
layout: page
28-
title: ""
28+
title: Full Documentation
2929
author:
3030
- name: Reid McIlroy-Young
3131
department:
@@ -90,7 +90,7 @@ def cleanargs(obj, basic = False):
9090
return '()'
9191

9292
def makeUrls(s):
93-
return "[{0}]({{{{ site.baseurl }}}}{{% post_url /docs/{1}{2} %}}#{3})".format(s.group(1), docsPrefix, s.group(2), s.group(3))
93+
return "[{0}]({{{{ site.baseurl }}}}{{{{ page.url }}}}#{3})".format(s.group(1), docsPrefix, s.group(2), s.group(3))
9494

9595
def makeSingleFileUrls(s):
9696
return "[{0}](#{1})".format(s.group(1), s.group(3))
@@ -137,7 +137,7 @@ def makeLine():
137137
"border-top-style: solid;",
138138
"border-bottom-style: solid;",
139139
]
140-
return('<hr style="{}">'.format(''.join(style)))
140+
return '<hr style="{}">'.format(''.join(style))
141141

142142
def makeTable(entries, header = '', prefix = '', withBlurbs = False, bigTable = False):
143143
ents = []
@@ -165,26 +165,31 @@ def writeFunc(fn, f, prefix = '', level = 5, singleFile = False):
165165
f.write("# Needs to be written\n\n")
166166
print("\033[93m{0}{1} had no docs\033[0m".format(prefix, fn[0]))
167167

168-
def writeClass(cl, f, prefix = '', level = 4, singleFile = False):
169-
f.write(makeTitle(prefix, cl[0], cleanargs(cl[1].__init__), singleFile = singleFile))
168+
def writeClass(cl, f, prefix = '', level = 4, singleFile = False, exceptMode = False):
169+
f.write(makeTitle(prefix, cl[0], "(_{}_)".format(cl[1].__bases__[0].__name__), singleFile = singleFile))
170+
if not exceptMode:
171+
f.write(makeTitle(prefix, "{}.__init__".format(cl[0]), cleanargs(cl[1].__init__), singleFile = singleFile))
170172
try:
171173
f.write(cleanedDoc(cl[1], lvl = level, singleFile = singleFile))
172174
except AttributeError:
173175
f.write("# Needs to be written\n\n")
174176
print("\033[93m{0}{1} had no docs\033[0m".format(prefix, cl[0]))
175177

176-
def proccessClass(cl, f, singleFile = False):
177-
writeClass(cl, f, singleFile = singleFile)
178+
def proccessClass(cl, f, singleFile = False, exceptMode = False):
179+
writeClass(cl, f, singleFile = singleFile, exceptMode = exceptMode)
178180
baseMems = inspect.getmembers(cl[1].__bases__[0])
179181
funcs = []
182+
if singleFile:
183+
f.write(makeLine())
180184
for m in sorted(inspect.getmembers(cl[1]), key = getLineNumber):
181-
if m[0][0] == '_' or m in baseMems:
185+
if m[0][0] == '_' or m in baseMems or m[0] == 'with_traceback':
182186
pass
183187
elif inspect.isfunction(m[1]):
184188
funcs.append(m)
185-
f.write(makeTable(funcs, prefix = cl[0], header = "The {} class has the following methods:".format(cl[0])))
186-
for m in funcs:
187-
writeFunc(m, f, prefix = '{}.'.format(cl[0], singleFile = singleFile))
189+
if len(funcs) > 0 and not exceptMode:
190+
f.write(makeTable(funcs, prefix = cl[0], header = "\nThe {} class has the following methods:".format(cl[0])))
191+
for m in funcs:
192+
writeFunc(m, f, prefix = '{}.'.format(cl[0]), singleFile = singleFile)
188193

189194
def writeClassFile(name, typ, targetFile = None, singleFile = False):
190195
fname = docsPrefix + "{}.md".format(name)
@@ -216,8 +221,11 @@ def writeModuleFile(mod, targetFile = None, singleFile = False):
216221
funcs.append(m)
217222
if mod != "tagProcessing":
218223
f.write(makeTable(funcs, prefix = mod, header = "The {} module provides the following functions:".format(mod)))
219-
for fn in funcs:
220-
writeFunc(fn, f, prefix = "{}.".format(mod))
224+
for fn in funcs:
225+
writeFunc(fn, f, prefix = "{}.".format(mod))
226+
else:
227+
for fn in metaknowledge.tagProcessing.tagToFunc.items():
228+
writeFunc((metaknowledge.tagToFull(fn[0]), fn[1]), f, prefix = "{}.".format(mod))
221229
if targetFile is None:
222230
f.write("\n{% include docsFooter.md %}")
223231
f.close()
@@ -239,11 +247,8 @@ def writeMainBody(funcs, vrs, exceptions, targetFile = None, singleFile = False)
239247
writeFunc(fnc, f)
240248
first = True
241249
for excpt in exceptions:
242-
if first:
243-
first = False
244-
else:
245-
f.write(makeLine() + "\n\n")
246-
proccessClass(excpt, f)
250+
f.write(makeLine() + "\n\n")
251+
proccessClass(excpt, f, exceptMode = True)
247252
if targetFile is None:
248253
f.write("\n{% include docsFooter.md %}")
249254
f.close()
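
Reviewer note: the new `makeUrls` body relies on `str.format` brace escaping to emit Jekyll/Liquid placeholders. Each literal brace must be doubled, so `{{{{ site.baseurl }}}}` in the template becomes `{{ site.baseurl }}` in the output. A small sketch of that behaviour follows. The regular expression `DOC_LINK` and the function name `make_url` are assumptions made for illustration; the pattern the generator actually uses is not shown in this diff.

```python
import re

# Hypothetical pattern for links of the form [label](#Module.name);
# group 1 is the label, group 3 the anchor.
DOC_LINK = re.compile(r"\[([^\]]+)\]\(#(\w+)\.(\w+)\)")


def make_url(match):
    """Rewrite an in-page link so it is resolved by Liquid at site build time."""
    # Four braces in the template yield two in the result; {0} and {1} are filled in.
    template = "[{0}]({{{{ site.baseurl }}}}{{{{ page.url }}}}#{1})"
    return template.format(match.group(1), match.group(3))


text = "See [metaknowledge.Record](#Record.Record) for details."
rendered = DOC_LINK.sub(make_url, text)
print(rendered)
```

Under these assumptions the printed line is `See [metaknowledge.Record]({{ site.baseurl }}{{ page.url }}#Record) for details.`, leaving the two Liquid tags intact for Jekyll to expand.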

metaknowledge/bin/metaknowledgeMdToNb.py

Lines changed: 1 addition & 2 deletions
@@ -1,5 +1,4 @@
1-
#!/usr/bin/env python3
2-
1+
#Written by Reid McIlroy-Young for Dr. John McLevey, University of Waterloo 2015
32
import argparse
43
import re
54
import os.path

metaknowledge/blondel.py

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
1+
#Written by Reid McIlroy-Young for Dr. John McLevey, University of Waterloo 2015
2+
#Better implementations can be found on PyPI so this has been discontinued
3+
#None of these functions are tested
14
import metaknowledge
25
from .graphHelpers import _ProgressBar
36
