Commit de817e7

Merge pull request #424 from scholarly-python-package/develop
Releasing v1.7.0
2 parents f76a131 + ef32ca2 commit de817e7

9 files changed: +196 -68 lines changed

.github/workflows/pythonpackage.yml

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ name: Python package
 
 on:
   schedule:
-    - cron: "0 0 * * *"
+    - cron: "0 0 1,10,20 * *"
   push:
     branches: [main, develop]
   pull_request:
@@ -22,9 +22,9 @@ jobs:
         os: [ubuntu-latest, macos-latest, windows-latest]
 
     steps:
-    - uses: actions/checkout@v2
+    - uses: actions/checkout@v3
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v1
+      uses: actions/setup-python@v3
       with:
         python-version: ${{ matrix.python-version }}
     - name: Lint with flake8
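
The schedule change above moves the workflow from a daily run to the 1st, 10th and 20th of each month. A minimal sketch of what that means in run counts follows; `matches_day_of_month` and `count_runs` are hypothetical helpers written for this illustration, and they only understand `*` and comma-separated lists in the day-of-month field (the month and day-of-week fields are assumed to be `*`, as in both schedules here).

```python
from datetime import date, timedelta


def matches_day_of_month(field: str, day: int) -> bool:
    """Match a day against a cron day-of-month field ('*' or a comma list)."""
    if field == "*":
        return True
    return day in {int(part) for part in field.split(",")}


def count_runs(cron: str, year: int) -> int:
    """Count scheduled runs in a year for a once-per-matching-day schedule."""
    # Both schedules in the diff fire at minute 0, hour 0, so only the
    # day-of-month field decides whether a given date triggers a run.
    _minute, _hour, day_of_month, _month, _day_of_week = cron.split()
    current = date(year, 1, 1)
    runs = 0
    while current.year == year:
        if matches_day_of_month(day_of_month, current.day):
            runs += 1
        current += timedelta(days=1)
    return runs


old_runs = count_runs("0 0 * * *", 2023)
new_runs = count_runs("0 0 1,10,20 * *", 2023)
```

Under these assumptions the new schedule fires three times a month, or 36 times a year, instead of every day.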

CHANGELOG.md

Lines changed: 65 additions & 10 deletions
@@ -1,44 +1,99 @@
 # CHANGELOG
 
+## Changes in v1.7.0
+
+### Features
+- Add a new `citation` entry to `pub` fetched from an author profile with formatted citation entry #423.
+
+### Bugfixes
+- Fix pprint failures on Windows #413.
+- Thoroughly handle 1000 or more publications that are available (or not) according to public access mandates #414.
+- Fix errors in `download_mandates_csv` that may occasionally occur for agencies without a policy link #413.
+
+## Changes in v1.6.3
+
+### Bugfix
+- search_pubs method did not respect include_last_year, which is now fixed #420, #421.
+
+### Enhancement
+- Unit tests involving funding agency mandates are a bit more robust.
+
+## Changes in v1.6.2
+
+### Bugfix
+- Fix an error in the workflow publishing to PyPI.
+
+## Changes in v1.6.1
+
+### Bugfix
+- Handle 1000 or more publications that are available (or not) according to public access mandates #414.
+
+### Enhancement
+- Fetch 20+ coauthors without requiring geckodriver/chrome-driver to be installed #411.
+
+## Changes in v1.6.0
+
+### Features
+- Download table of funding agencies as a CSV file with URL to the funding mandates included
+- Download top-ranking journals in general, under sub-categories and in different languages as a CSV file
+
+### Bugfixes
+- #392
+- #394
+
+## Changes in v1.5.1
+
+### Feature
+- Support chromium (chrome-driver) as an alternative to geckodriver #387
+
+### Improvements
+- Firefox/Geckodriver operates in headless mode
+- Increase test coverage to include all public APIs
+- Clean up legacy code and improve coding styles
+- Remove the use of deprecated functions in dependency packages
+
+### Bugfix
+- Stop attempting to reuse a closed webdriver
+
 ## Changes in v1.5.0
-## Features
+### Features
 - Fetch the public access mandates information from a Scholar profile and mark the publications whether or not they satisfy the open-access mandate.
 - Fetch an author's organization identifier from their Scholar profile
 - Search for all authors affiliated with an organization
 - Fetch homepage URL from a Scholar profile
-## Enhancements
+### Enhancements
 - Make `FreeProxies` more robust
 - Stop the misleading traceback error message #313
-## Bugfixes
+### Bugfix
 - Fix bug in exception handling #366
 ---
 ## Changes in v1.4.4
-## Bugfix
+### Bugfix
 - Fix a bug that would have prevented setting up ScraperAPI with exactly 1000 successful requests during the first week of the trial #356
-## Enhancement
+### Enhancement
 - Use FreeProxy instead of premium proxy servers when possible
 ---
 ## Changes in v1.4.3
-## Bugfix
+### Bugfixes
 - Fill the complete title of publications even if it appears truncated
 - Robustly handle exceptions when more than 20 coauthors of a scholar cannot be fetched
 ---
 ## Changes in v1.4.2
-## Bugfix
+### Bugfix
 - ScraperAPI proxy works reliably
 ---
 ## Changes in v1.4.0
-## Features
+### Features
 - Fetch the complete list of coauthors #322
 - Fetch all citeids for a given publication #324
 - Make scholarly objects inherently serializable #325
 - Expose scholarly specific exceptions #327
-## Bugfixes
+### Bugfixes
 - Test Tor on macOS and skip the test if tor is not installed #323
 - Get cites_id and citedby_url without having to fill the publication #328
 ---
 ## Changes in v1.3.0
-## Features
+### Features
 - Make the Author and Publication objects serializable
 - Make `cites_id` a list to allow for multiple values
 - Fetch all (more than 20) coauthors from a Scholar profile

requirements-dev.txt

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 coverage
 flake8
+pandas
 sphinx_rtd_theme

scholarly/_scholarly.py

Lines changed: 3 additions & 3 deletions
@@ -18,7 +18,7 @@
 _PUBSEARCH = '/scholar?hl=en&q={0}'
 _CITEDBYSEARCH = '/scholar?hl=en&cites={0}'
 _ORGSEARCH = "/citations?view_op=view_org&hl=en&org={0}"
-_MANDATES_URL = "https://scholar.google.com/citations?view_op=mandates_leaderboard_csv"
+_MANDATES_URL = "https://scholar.google.com/citations?view_op=mandates_leaderboard_csv&hl=en"
 
 
 class _Scholarly:
@@ -434,7 +434,7 @@ def pprint(self, object: Author or Publication)->None:
                     del publication['container_type']
 
         del to_print['container_type']
-        print(pprint.pformat(to_print))
+        print(pprint.pformat(to_print).encode("utf-8"))
 
     def search_org(self, name: str, fromauthor: bool = False) -> list:
         """
@@ -485,7 +485,7 @@ def download_mandates_csv(self, filename: str, overwrite: bool = False,
                              "setting overwrite=True")
         text = self.__nav._get_page(_MANDATES_URL, premium=False)
         if include_links:
-            soup = self.__nav._get_soup("/citations?view_op=mandates_leaderboard")
+            soup = self.__nav._get_soup("/citations?hl=en&view_op=mandates_leaderboard")
             text = text.replace("Funder,", "Funder,Policy,Cached,", 1)
             for agency in soup.find_all("td", class_="gsc_mlt_t"):
                 cached = agency.find("span", class_="gs_a").a["href"]
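
The `pprint` change above encodes the formatted text to UTF-8 before printing it. A minimal sketch of what that does in Python 3 follows; the `record` dictionary is a made-up stand-in for an author or publication container, not data from the library.

```python
import pprint

# Hypothetical record with non-ASCII characters, standing in for a
# scholarly container object.
record = {"name": "José", "affiliation": "Université de Montréal"}

formatted = pprint.pformat(record)   # a str; non-ASCII characters are kept
encoded = formatted.encode("utf-8")  # a bytes object

# Printing bytes shows their repr: a b'...' literal in which non-ASCII
# characters appear as \x escapes, so no console encoding is involved.
print(encoded)
```

The trade-off is that the output is no longer the plain pretty-printed text: it is wrapped in a bytes literal, and accented characters are shown as escape sequences. Decoding with `encoded.decode("utf-8")` recovers the original string.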

scholarly/author_parser.py

Lines changed: 2 additions & 2 deletions
@@ -132,9 +132,9 @@ def _fill_public_access(self, soup, author):
         not_available = soup.find('div', class_='gsc_rsb_m_na')
         n_available, n_not_available = 0, 0
         if available:
-            n_available = int(available.text.split(" ")[0].replace(",", ""))
+            n_available = int(re.sub("[.,]", "", available.text.split(" ")[0]))
         if not_available:
-            n_not_available = int(not_available.text.split(" ")[0].replace(",", ""))
+            n_not_available = int(re.sub("[.,]", "", not_available.text.split(" ")[0]))
 
         author["public_access"] = PublicAccess(available=n_available,
                                                not_available=n_not_available)
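
The change above strips both `,` and `.` from the leading number, so counts of 1000 or more parse whether the page uses a comma or a period as the thousands separator. A minimal sketch of the same parsing step follows; `parse_count` is a hypothetical helper and the sample strings are illustrative, not text captured from Google Scholar.

```python
import re


def parse_count(text: str) -> int:
    """Parse the leading integer from text such as '1,234 articles'."""
    first_token = text.split(" ")[0]
    # Drop thousands separators in either convention before converting.
    return int(re.sub("[.,]", "", first_token))


english_style = parse_count("1,234 articles")  # comma separator
german_style = parse_count("1.234 Artikel")    # period separator
small_count = parse_count("57 articles")       # no separator at all
```

With the earlier `.replace(",", "")` alone, a period-separated token such as `1.234` would reach `int()` unchanged and raise `ValueError`.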

scholarly/data_types.py

Lines changed: 2 additions & 0 deletions
@@ -119,6 +119,7 @@ class BibEntry(TypedDict, total=False):
     :param number: NA number of a publication
     :param pages: range of pages
     :param publisher: The publisher's name
+    :param citation: Formatted citation string, usually containing journal name, volume and page numbers (source: AUTHOR_PUBLICATION_ENTRY)
     :param pub_url: url of the website providing the publication
     """
     pub_type: str
@@ -133,6 +134,7 @@ class BibEntry(TypedDict, total=False):
     number: str
     pages: str
     publisher: str
+    citation: str
 
 
 class Mandate(TypedDict, total=False):
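
Because `BibEntry` is declared with `total=False`, every key is optional, so adding `citation` does not force existing entries to carry it. A minimal sketch follows, assuming Python 3.8+ for `typing.TypedDict`; `BibEntrySketch` is a cut-down, hypothetical stand-in for the real class and the values are invented.

```python
from typing import TypedDict


class BibEntrySketch(TypedDict, total=False):
    # Only three of the keys shown in the diff; the real BibEntry has more.
    pages: str
    publisher: str
    citation: str


# An entry may omit any key, including the new `citation`.
entry: BibEntrySketch = {"publisher": "Example Press"}

# The key can be filled in later, as the parser does for author-profile entries.
entry["citation"] = "Journal of Examples 12 (3), 45-67"

# Consumers should use .get() for keys that may be absent.
pages = entry.get("pages", "")
```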

scholarly/publication_parser.py

Lines changed: 9 additions & 2 deletions
@@ -9,9 +9,9 @@
 _CITATIONPUB = '/citations?hl=en&view_op=view_citation&citation_for_view={0}'
 _SCHOLARPUB = '/scholar?hl=en&oi=bibs&cites={0}'
 _CITATIONPUBRE = r'citation_for_view=([\w-]*:[\w-]*)'
-_BIBCITE = '/scholar?q=info:{0}:scholar.google.com/\
+_BIBCITE = '/scholar?hl=en&q=info:{0}:scholar.google.com/\
 &output=cite&scirp={1}&hl=en'
-_CITEDBYLINK = '/scholar?cites={0}'
+_CITEDBYLINK = '/scholar?hl=en&cites={0}'
 _MANDATES_URL = '/citations?view_op=view_mandate&hl=en&citation_for_view={0}'
 
 _BIB_MAPPING = {
@@ -127,6 +127,13 @@ def _citation_pub(self, __data, publication: Publication):
                 and len(year.text) > 0):
             publication['bib']['pub_year'] = year.text.strip()
 
+        author_citation = __data.find_all('div', class_='gs_gray')
+        try:
+            citation = author_citation[1].text
+        except IndexError:
+            citation = ""
+        publication['bib']['citation'] = citation
+
         return publication
 
     def get_publication(self, __data, pubtype: PublicationSource)->Publication:
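
The new block in `_citation_pub` takes the second `gs_gray` element as the citation and falls back to an empty string when there is none. A minimal sketch of that fallback pattern follows; to stay dependency-free it works on a plain list of strings rather than the BeautifulSoup result used in the diff, and `second_text_or_empty` and the sample rows are hypothetical.

```python
from typing import List


def second_text_or_empty(texts: List[str]) -> str:
    """Return the second item, or '' when fewer than two items exist."""
    try:
        return texts[1]
    except IndexError:
        return ""


# Illustrative rows: the first item holds the authors, the second the
# formatted citation (journal, volume, pages).
full_row = ["A Author, B Author", "Journal of Examples 12 (3), 45-67"]
short_row = ["A Author, B Author"]

citation_full = second_text_or_empty(full_row)
citation_short = second_text_or_empty(short_row)
```

Catching `IndexError` keeps the `citation` key always present in the `bib` dictionary, so callers get an empty string instead of a missing key when the profile row has no citation line.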

setup.py

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 
 setuptools.setup(
     name='scholarly',
-    version='1.6.3',
+    version='1.7.0',
     author='Steven A. Cholewiak, Panos Ipeirotis, Victor Silva, Arun Kannawadi',
     author_email='steven@cholewiak.com, panos@stern.nyu.edu, vsilva@ualberta.ca, arunkannawadi@astro.princeton.edu',
     description='Simple access to Google Scholar authors and citations',
