~ Link Sorter ~
Link Sorter is a Windows (.NET Framework 4) utility that sorts links by their semantic proximity to a given text sample. It supports 14 languages:
  1. Czech
  2. Danish
  3. Dutch
  4. English
  5. Finnish
  6. French
  7. German
  8. Hungarian
  9. Italian
  10. Norwegian
  11. Portuguese
  12. Romanian
  13. Russian
  14. Spanish
It also extracts text from PDF resources.

How it works

The user enters a list of links, provides a sample of the wanted text, selects a language, and sets two optional parameters: stemming and stop-word filtering. Stemming reduces multiple word forms to a single base form, for example "went", "goes" and "going" to "go". Stop words are frequently used words that carry little meaning, such as "any", "next" and "another". Both stemming and stop-word filtering are written for a particular language, so selecting the correct language is critical.
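
The two options can be illustrated with a minimal Python sketch. This is not the utility's code: the stop-word set and the stem table below are tiny hypothetical stand-ins for real per-language resources, and the function name normalize is an assumption made for the example.

```python
import re

# Hypothetical, deliberately tiny English resources (illustration only).
STOP_WORDS = {"any", "next", "another", "the", "a", "to", "is"}
STEM_TABLE = {"went": "go", "goes": "go", "going": "go"}

def normalize(text, use_stemming=True, filter_stop_words=True):
    """Lowercase the text, split it into words, then apply the two options."""
    words = re.findall(r"[a-z]+", text.lower())
    if filter_stop_words:
        # Drop frequent words that carry little meaning.
        words = [w for w in words if w not in STOP_WORDS]
    if use_stemming:
        # Map known word forms to a single base form; keep unknown words as-is.
        words = [STEM_TABLE.get(w, w) for w in words]
    return words

print(normalize("She goes to another town, going next to the river"))
# ['she', 'go', 'town', 'go', 'river']
```

Because both tables are English-only, running the same function on French or Russian text would change nothing useful, which is why the language selection matters.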

For a quick test I added a self-test button. It adds links and sets the options, so the user only needs to start the processing.

The result will be saved into an HTML file and shown in the browser.
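
How proximity is computed is not described here, so the following Python sketch substitutes a plain bag-of-words cosine similarity as a stand-in. It assumes the page texts have already been downloaded; the names rank_links and to_html and the example.com URLs are hypothetical.

```python
import html
import math
import re
from collections import Counter

def vectorize(text):
    """Count word occurrences in lowercased text."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_links(sample, page_texts):
    """Return (score, url) pairs, most similar to the sample first."""
    target = vectorize(sample)
    scored = [(cosine(target, vectorize(text)), url) for url, text in page_texts.items()]
    return sorted(scored, reverse=True)

def to_html(ranked):
    """Render the ranked links as a simple ordered HTML list."""
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(url)}</a> ({score:.2f})</li>'
        for score, url in ranked
    )
    return f"<html><body><ol>\n{items}\n</ol></body></html>"

# Assumed input: URL -> already extracted page text.
pages = {
    "http://example.com/a": "Paris is the capital of France",
    "http://example.com/b": "Recipes for apple pie",
}
ranked = rank_links("capital of France", pages)
print(to_html(ranked))
```

Writing the returned string to a file and passing its path to Python's webbrowser.open would imitate the save-and-show step described above.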

To make collecting links easier, I also provide a Firefox extension, LinkReader, so the user can collect links with a search engine such as Google or Bing. Yahoo does not show the links, so not every search engine can be used with my extension.

Download

LinkSorter

Formats and structures

The unzipped LinkSorter package contains only two files: a DLL (LinkSorter.DLL) and an executable (SearchUtility.exe). Processed links are saved to the file "result.html" in the same folder as the executable. Link lists can also be saved to and read from a text file. The format is shown in the example below:

--valid links--
https://www.blogger.com/?tab=wj
http://www.linternaute.com/ville/paris/ville-75056
http://greater-paris-investment-agency.com/
http://www.maxicours.com/se/fiche/8/1/228481.html
http://tatoeba.org/eng/sentences/show/331233
http://wikitravel.org/fr/Paris
--invalid links--
http://www.larousse.fr/encyclopedie/ville/Paris/137068
https://fr.vikidia.org/wiki/Paris
https://fr.vikidia.org/wiki/Paris#Paris_capitale_du_royaume_cap.C3.A9tien
http://www.pourquois.com/histoire_geo/pourquoi-paris-est-capitale-france.html
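
A file in this format can be read and written with a few lines of Python. The sketch below is only an illustration; it assumes the two marker lines appear exactly as in the example, one link per line, and that blank lines can be ignored. The function names are hypothetical.

```python
VALID_MARKER = "--valid links--"
INVALID_MARKER = "--invalid links--"

def parse_link_file(text):
    """Split the file content into valid and invalid link lists."""
    sections = {"valid": [], "invalid": []}
    current = None  # no section until the first marker is seen
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == VALID_MARKER:
            current = "valid"
        elif line == INVALID_MARKER:
            current = "invalid"
        elif current is not None:
            sections[current].append(line)
    return sections

def format_link_file(sections):
    """Produce file content in the same marker-based layout."""
    lines = [VALID_MARKER, *sections["valid"], INVALID_MARKER, *sections["invalid"]]
    return "\n".join(lines) + "\n"

sample = """--valid links--
http://wikitravel.org/fr/Paris
--invalid links--
https://fr.vikidia.org/wiki/Paris
"""
links = parse_link_file(sample)
print(links["valid"])    # ['http://wikitravel.org/fr/Paris']
print(links["invalid"])  # ['https://fr.vikidia.org/wiki/Paris']
```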

The two buttons marked "=>" and "<=" move links between the two list boxes that hold valid and invalid links. When links are pasted from the clipboard using the Firefox extension, they are presorted into valid and invalid. Links considered invalid are internal Google or Bing links with long generated IDs. They look as follows:

http://webcache.googleusercontent.com/search?q=cache:Az75KX8_OYAJ:www.huffingtonpost.com
My program puts them into the invalid links category. The user is free to move them back, but I would not recommend using them, because Google may mistakenly identify the utility as a programmatic wrapper around the Google search engine and block the IP address.
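
The presorting rule itself is not spelled out here. One possible approximation, shown as a Python sketch, is to treat any link whose host belongs to a search engine's own domains as invalid. The suffix list and the function names below are assumptions for illustration, not the extension's actual logic.

```python
from urllib.parse import urlsplit

# Assumed list of search-engine-owned domains (hypothetical, not exhaustive).
INTERNAL_HOST_SUFFIXES = ("googleusercontent.com", "google.com", "bing.com")

def is_internal(url):
    """True if the URL's host is one of the assumed domains or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == s or host.endswith("." + s) for s in INTERNAL_HOST_SUFFIXES)

def presort(urls):
    """Split URLs into (valid, invalid) lists, keeping the input order."""
    valid, invalid = [], []
    for url in urls:
        (invalid if is_internal(url) else valid).append(url)
    return valid, invalid

valid, invalid = presort([
    "http://wikitravel.org/fr/Paris",
    "http://webcache.googleusercontent.com/search?q=cache:Az75KX8_OYAJ:www.huffingtonpost.com",
])
print(valid)    # ['http://wikitravel.org/fr/Paris']
print(invalid)  # the webcache link
```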

Customization

This utility is free, but customization is available for a fee. Those who are interested can contact the developer. The contact information is on the main site, "SemanticQuery.com".