[MediaWiki] Generators
The Ix.Async package introduces IAsyncEnumerable<T> and IAsyncEnumerator<T> as the asynchronous counterparts of IEnumerable<T> and IEnumerator<T>. With Ix.Async, you can consume these asynchronous sequences in much the same way as you work with ordinary enumerators.
- You can use all the LINQ extension methods on IAsyncEnumerable<T>.
- You can use the Rx.NET package to convert IAsyncEnumerable<T> to IObservable<T>, if necessary.
- For now, you can consume the items sequentially through an IAsyncEnumerator<T> using the expanded for-each pattern (see the ShowAllTemplatesAsync method below for an example); later, when async for is introduced into C# 8 (hopefully), you might be able to use async for each on IAsyncEnumerable<T>.
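For readers on a newer toolchain: C# 8 did ship this feature, spelled await foreach, for the IAsyncEnumerable<T> defined in System.Collections.Generic. Whether the sequences returned by your WCL version can be consumed this way depends on the Ix.Async version it targets, so the following is only a sketch. FakeTitlesAsync and TakeTitlesAsync are hypothetical names used here in place of a real generator.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class AwaitForEachSketch
{
    // Hypothetical stand-in for a generator's result sequence (not a WCL API).
    public static async IAsyncEnumerable<string> FakeTitlesAsync(int count)
    {
        for (var i = 0; i < count; i++)
        {
            // Simulates waiting for the next item to arrive.
            await Task.Yield();
            yield return "Page " + i;
        }
    }

    // Collects at most `max` titles, then stops enumerating.
    public static async Task<List<string>> TakeTitlesAsync(int max)
    {
        var result = new List<string>();
        await foreach (var title in FakeTitlesAsync(100))
        {
            if (result.Count >= max) break;
            result.Add(title);
        }
        return result;
    }
}
```

With a compatible WCL version, the same loop shape would presumably apply to the sequence returned by EnumItemsAsync(); that is an assumption, not something this page confirms.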
Some caveats when consuming the IAsyncEnumerable<T> obtained from generator classes in Wiki Client Library (WCL):
- Query continuations are carried out automatically by WCL. You just need to keep enumerating; WCL will request more results from the server when necessary.
- Choose a proper PaginationSize. It decides how many items (at most) are fetched from the server in one MediaWiki API request. For example, if you are working with the top 50 items from RecentChangesGenerator, you might choose 50 rather than 10 (the default) as the PaginationSize value, so they are all fetched in one request.
- The maximum allowed PaginationSize is usually 500 for normal users, and 5000 for users with the api-highlimits right (typically bot and sysop).
  - If you are using the PageQueryOptions.FetchContent flag with EnumPagesAsync, this limit is lowered to 1/10, i.e. 50 for normal users, and 500 for users with the api-highlimits right.
  - If you are using the PageQueryOptions.FetchExtract flag with EnumPagesAsync, this limit is lowered to 10 for normal users, and 20 for users with the api-highlimits right.
  - Considering the stability of network traffic, it is advised that you use 50 for typical in-batch WikiPage processing. PyWikiBot also uses this value for pagination in its site.preload method.
- A common idiom for fetching a small number of results from the generator is as follows.
  - If you are working with a large number of pages, it's recommended that you convert the returned IAsyncEnumerable to something like IObservable or ISourceBlock, or use the expanded for-each pattern.
static async Task ShowRecentChangesAsync()
{
    var generator = new RecentChangesGenerator(myWikiSite)
    {
        // Choose wisely.
        PaginationSize = 50,
        // Configure the generator, e.g. setting filter/sorting criteria
        NamespaceIds = new[] {BuiltInNamespaces.Main, BuiltInNamespaces.File},
        AnonymousFilter = PropertyFilterOption.WithProperty
    };
    // Gets the latest 50 changes made to article and File: namespace,
    // by anonymous users.
    var items = await generator.EnumItemsAsync().Take(50).ToList();
    foreach (var i in items)
    {
        Console.WriteLine(i.Title);
        // Show revision comments.
        Console.Write("\t");
        Console.WriteLine(i.Comment);
    }
    // When you want to fetch extracts for the pages, it's safe to fetch for no more than
    // 10 pages at one time.
    generator.PaginationSize = 10;
    // Gets the latest 50 pages in article and File: namespace that were changed
    // by anonymous users.
    var pages = await generator.EnumPagesAsync(PageQueryOptions.FetchExtract).Take(50).ToList();
    foreach (var i in pages)
    {
        Console.WriteLine(i.Title);
        // Show abstract for each revised page.
        Console.Write("\t");
        Console.WriteLine(i.Extract);
    }
}

static async Task SearchAsync()
{
    Console.Write("Enter your search keyword: ");
    var generator = new SearchGenerator(myWikiSite, Console.ReadLine())
    {
        PaginationSize = 22
    };
    // We are only interested in the top 20 items.
    foreach (var item in await generator.EnumItemsAsync().Take(20).ToList())
    {
        Console.WriteLine(item);
        Console.WriteLine("\t{0}", item.Snippet);
    }
}

Most of the WikiPageGenerator-derived classes (including AllPagesGenerator) implement IWikiList<WikiPageStub>, i.e., .EnumItemsAsync() will return a sequence of WikiPageStub. If you are only interested in the titles of the pages, consider using .EnumItemsAsync() instead of .EnumPagesAsync().
Still, there are some classes implementing IWikiList<T> where T is something other than WikiPageStub, including
- class RecentChangesGenerator : WikiPageGenerator<RecentChangeItem, WikiPage>, IWikiList<RecentChangeItem>, IWikiPageGenerator<WikiPage>
- class SearchGenerator : WikiPageGenerator<SearchResultItem, WikiPage>, IWikiList<SearchResultItem>, IWikiPageGenerator<WikiPage>
- class GeoSearchGenerator : WikiPageGenerator<GeoSearchResultItem, WikiPage>, IWikiList<GeoSearchResultItem>, IWikiPageGenerator<WikiPage>
- class RevisionsGenerator : WikiPagePropertyGenerator<Revision, WikiPage>, IWikiList<Revision>, IWikiPageGenerator<WikiPage>

static async Task ShowAllTemplatesAsync()
{
    var generator = new AllPagesGenerator(myWikiSite)
    {
        StartTitle = "A",
        NamespaceId = BuiltInNamespaces.Template,
        PaginationSize = 50
    };
    // You can specify EnumPagesAsync(PageQueryOptions.FetchContent),
    // if you are interested in the content of each page
    using (var enumerator = generator.EnumPagesAsync().GetEnumerator())
    {
        int index = 0;
        // Before the advent of "async for" (might be introduced in C# 8),
        // to handle the items in sequence one by one, we need to use
        // the expanded for-each pattern.
        while (await enumerator.MoveNext(CancellationToken.None))
        {
            var page = enumerator.Current;
            Console.WriteLine("{0}: {1}", index, page);
            index++;
            // Prompt user to continue listing, every 50 pages.
            if (index % 50 == 0)
            {
                Console.WriteLine("Esc to exit, any other key for next page.");
                if (Console.ReadKey().Key == ConsoleKey.Escape)
                    break;
            }
        }
    }
}
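
The PaginationSize limits listed in the caveats earlier on this page can be condensed into a small helper. This is a minimal sketch that only encodes the numbers stated here; PaginationSketch, MaxPaginationSize and ChoosePaginationSize are hypothetical names, not part of WCL, and the actual limits are enforced by the server.

```csharp
using System;

static class PaginationSketch
{
    // Upper bound for PaginationSize, using the figures quoted on this page.
    public static int MaxPaginationSize(bool hasHighLimits, bool fetchContent, bool fetchExtract)
    {
        // Plain list queries: 500 for normal users, 5000 with api-highlimits.
        var limit = hasHighLimits ? 5000 : 500;
        // Fetching page content lowers the limit to one tenth.
        if (fetchContent) limit = Math.Min(limit, hasHighLimits ? 500 : 50);
        // Fetching extracts lowers it further, to 10 or 20.
        if (fetchExtract) limit = Math.Min(limit, hasHighLimits ? 20 : 10);
        return limit;
    }

    // Picks a page size that covers the wanted items in as few requests
    // as the limits allow.
    public static int ChoosePaginationSize(int itemsWanted, bool hasHighLimits,
        bool fetchContent, bool fetchExtract)
    {
        if (itemsWanted < 1)
            throw new ArgumentOutOfRangeException(nameof(itemsWanted));
        return Math.Min(itemsWanted, MaxPaginationSize(hasHighLimits, fetchContent, fetchExtract));
    }
}
```

For instance, asking for 50 items as a normal user while fetching extracts yields 10, which matches the value assigned to generator.PaginationSize in ShowRecentChangesAsync.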