Ucto-Webservice

Ucto is a rule-based tokeniser for multiple languages. This is the webservice for it, for both humans and machines.

Provided tools & services

Ucto Webservice

Ucto is a Unicode-compliant tokeniser. It takes one or more untokenised texts as input and tokenises them. Several languages are supported, and the software can be extended to others.
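To illustrate what tokenisation means here, the following is a minimal sketch in Python. It uses a single regular expression and is not Ucto's actual rule set, which is language-specific; the function name simple_tokenise and the pattern are assumptions made for illustration only.

```python
import re

# One illustrative rule (an assumption, not one of Ucto's rules):
# a token is either a run of word characters (Unicode-aware in Python 3)
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenise(text: str) -> list[str]:
    """Split untokenised text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    # Prints: Hello , world ! Dit is een test .
    print(" ".join(simple_tokenise("Hello, world! Dit is een test.")))
```

A real rule-based tokeniser also has to handle abbreviations, URLs, numbers and sentence boundaries, which this sketch deliberately ignores.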
Type
  • Web Application
Version
2.5.2
Service Provider
      Centre for Language and Speech Technology, Radboud University and KNAW Humanities Cluster
Input data
  Name   Description    Type             Encoding Format
  *.txt  Text document  DigitalDocument  text/plain
Output data
  Name       Description                            Type                 Encoding Format
  *.tok      Tokenised Text Document                DigitalDocument      text/plain
  error.log  Log file with (standard) error output  DigitalDocument      text/plain
  *.vtok     Verbosely Tokenised Text Document      DigitalDocument      text/plain
  *.xml      Tokenised Text Document (FoLiA XML)    TextDigitalDocument  text/xml
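
A client script that downloads the results of a run may want to tell the output files apart. The sketch below matches file names against the output patterns listed above; the function name describe_output and the example file names are hypothetical, and nothing here calls the webservice itself.

```python
from fnmatch import fnmatchcase

# (pattern, description, encoding format), copied from the output data above.
OUTPUT_TYPES = [
    ("*.tok", "Tokenised Text Document", "text/plain"),
    ("error.log", "Log file with (standard) error output", "text/plain"),
    ("*.vtok", "Verbosely Tokenised Text Document", "text/plain"),
    ("*.xml", "Tokenised Text Document (FoLiA XML)", "text/xml"),
]


def describe_output(filename):
    """Return (description, encoding format) for a known output file, else None."""
    for pattern, description, mime in OUTPUT_TYPES:
        if fnmatchcase(filename, pattern):
            return description, mime
    return None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["letter.tok", "letter.vtok", "letter.xml", "error.log"]:
        print(name, "->", describe_output(name))
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every operating system.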

Citation

You can cite this software using the following citation generated from its metadata:

(2024) Ucto-Webservice 2.5.2. KNAW Humanities Cluster & CLST, Radboud University.

Logs & Reviews

Name
Automatic software metadata validation report for Ucto-Webservice 2.5.2
Author
  • codemetapy validator using software.ttl
Date
2024-04-20 04:03:23
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of Ucto-Webservice 2.5.2 was successful (score=3/5), but there are some warnings which should be addressed:

1. Warning: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)
2. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
3. Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
Rating
★ ★ ★ ☆ ☆
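
The three findings in this review each correspond to a metadata property that is not set. As a rough sketch, and not the validator's actual logic (which validates against software.ttl), the check below looks for those properties in a metadata dictionary. The property names continuousIntegration, softwareHelp and referencePublication are assumed here from the CodeMeta and schema.org vocabularies; the requirements document linked in the review is authoritative. The function name and the sample dictionary are hypothetical.

```python
# Assumed property names and the severity reported for each in the review.
RECOMMENDED = {
    "continuousIntegration": "Warning",
    "softwareHelp": "Warning",
    "referencePublication": "Info",
}


def missing_recommended(metadata: dict) -> list[str]:
    """List recommended properties that are absent or empty in the metadata."""
    return [
        f"{level}: {key} is missing in the metadata"
        for key, level in RECOMMENDED.items()
        if not metadata.get(key)
    ]


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated metadata record.
    sample = {"name": "Ucto-Webservice", "version": "2.5.2"}
    for finding in missing_recommended(sample):
        print(finding)
```

Filling in these properties, for example in the repository's codemeta-harvest.json, would be one way to address the findings, assuming the harvester picks the values up as it does for the other properties in that file.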
(log file starts at Sat Apr 20 04:03:09 UTC 2024)

[harvester info] --> Processing ucto-service (https://github.com/proycon/ucto_webservice) [Sat Apr 20 04:03:09 UTC 2024]

[harvester info] Git updating cached clone of https://github.com/proycon/ucto_webservice...

[harvester info] Found release v2.5.2

[harvester info] Using 'v2.5.2'

[harvester info] Git reference: v2.5.2

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/ucto-service for harvestable resources...

[harvester info] found codemeta-harvest.json for ucto-service (md5sum 79f6bad556b55a273416daeae473f350); values in here take precendence over (override) those in later detection stages

[harvester info] found python setup for ucto-service, converting to codemeta

-- begin log --

No input files specified, but found python project (setup.py) in current dir, using that...

Generating egg_info

running egg_info

writing Ucto_Webservice.egg-info/PKG-INFO

writing dependency_links to Ucto_Webservice.egg-info/dependency_links.txt

writing requirements to Ucto_Webservice.egg-info/requires.txt

writing top-level names to Ucto_Webservice.egg-info/top_level.txt

reading manifest file 'Ucto_Webservice.egg-info/SOURCES.txt'

writing manifest file 'Ucto_Webservice.egg-info/SOURCES.txt'

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://webservices.cls.ru.nl/portal/ucto-webservice

Processing source #1 of 1

Obtaining python package metadata for: Ucto_Webservice

Loading metadata from Ucto_Webservice via importlib.metadata

WARNING: No translation for distutils or pyproject.toml key Metadata-Version

WARNING: No translation for distutils or pyproject.toml key Description

Found dependency CLAM >= 3.2.10

Found dependency FoLiA-tools 

[CODEMETA COMPOSITION (ucto-webservice)] processed 43 new triples, total is now 44

Remapping URI to (possibly) new identifier and version component: https://webservices.cls.ru.nl/portal/ucto-webservice -> https://webservices.cls.ru.nl/portal/ucto-webservice/2.5.2

[CODEMETA VALIDATION (ucto-webservice)] codeRepository not set

[CODEMETA VALIDATION (ucto-webservice)] done

-- end log --

[harvester info] Looking for license....

[harvester info] No license file found

-- begin log --

Trying README.md ...

No license detected

-- end log --

[harvester info] Getting contributors from git...

-- begin log --

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://webservices.cls.ru.nl/portal/ucto-service-contributors

Processing source #1 of 1

Extracting contributors from /tmp/codemeta-harvester.cache//tmp/ucto-service.CONTRIBUTORS

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service-contributors)] processed 7 new triples, total is now 8

Remapping URI to (possibly) new identifier and version component: https://webservices.cls.ru.nl/portal/ucto-service-contributors -> https://webservices.cls.ru.nl/portal/ucto-service.contributors/snapshot

[CODEMETA VALIDATION (https://webservices.cls.ru.nl/portal/ucto-service.contributors/snapshot)] codeRepository not set

[CODEMETA VALIDATION (https://webservices.cls.ru.nl/portal/ucto-service.contributors/snapshot)] author not set

[CODEMETA VALIDATION (https://webservices.cls.ru.nl/portal/ucto-service.contributors/snapshot)] license not set

[CODEMETA VALIDATION (https://webservices.cls.ru.nl/portal/ucto-service.contributors/snapshot)] done

-- end log --

[harvester info] Extracting last and first commit date from git log....

[harvester info] Date created: 2022-04-08T14:07:37Z+0200, date modified: 2024-03-14T21:54:52Z+0100

[harvester info] Querying Github/GitLab API (https://github.com/proycon/ucto_webservice)

-- begin log --

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://webservices.cls.ru.nl/portal/ucto-webservice

Processing source #1 of 1

Querying GitAPI parser for https://github.com/proycon/ucto_webservice

    Parsing Github API response

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-webservice)] processed 16 new triples, total is now 17

Remapping URI to (possibly) new identifier and version component: https://webservices.cls.ru.nl/portal/ucto-webservice -> https://webservices.cls.ru.nl/portal/ucto_webservice/snapshot

[CODEMETA VALIDATION (https://webservices.cls.ru.nl/portal/ucto_webservice/snapshot)] license not set

[CODEMETA VALIDATION (https://webservices.cls.ru.nl/portal/ucto_webservice/snapshot)] done

Querying https://api.github.com/repos/proycon/ucto_webservice

Remaining github API requests: 4987 ### Next rate limit reset at: 2024-04-20 05:00:36 (has_token=True)

Querying https://api.github.com/users/proycon

Remaining github API requests: 4986 ### Next rate limit reset at: 2024-04-20 05:00:36 (has_token=True)

-- end log --

[harvester info] Found releaseNotes

[harvester info] Querying Zenodo API for DOI (access token provided)...

[harvester info] Looking for TRL information in README.md...

-- begin log --

-- end log --

[harvester info] Looking for repostatus information in README.md...

[harvester info] Found repostatus https://www.repostatus.org/#active

-- begin log --

-- end log --

[harvester info] Looking for continuous integration information in README.md...

-- begin log --

-- end log --

[harvester info] Looking for documentation links in README.md...

-- begin log --

-- end log --

[harvester info] Falling back to git tag (v2.5.2) if no version number is specified...

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#active

[harvester info] Looking for repostatus information in README.md in master branch...

[harvester info] Found repostatus (master branch) https://www.repostatus.org/#active

-- begin log --

-- end log --

[harvester info] Found README.md

[harvester info] Reconciliating: codemetapy  --baseuri https://webservices.cls.ru.nl/portal --baseuri https://webservices.cls.ru.nl/portal --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "ucto-service" --codeRepository "https://github.com/proycon/ucto_webservice" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/32-contributors.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/20-python.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/11-repostatus.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-harvest.ucto-service.codemeta.json /tmp/codemeta-harvester.cache//tmp/05-repostatus.ucto-service.codemeta.json 

-- begin log --

Passed 11 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/32-contributors.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/20-python.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/11-repostatus.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-harvest.ucto-service.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/05-repostatus.ucto-service.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://webservices.cls.ru.nl/portal/ucto-service

Processing source #1 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 1 new triples, total is now 2

Processing source #2 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 1 new triples, total is now 3

Processing source #3 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 2 new triples, total is now 5

Processing source #4 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 1 new triples, total is now 6

Processing source #5 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.ucto-service.codemeta.json

    Found main resource with URI https://webservices.cls.ru.nl/portal/ucto_webservice/snapshot

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 19 new triples, total is now 24

Processing source #6 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] overriding old http://schema.org/dateCreated (2022-04-08T11:59:05Z -> 2022-04-08T14:07:37Z+0200)

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] overriding old http://schema.org/dateModified (2024-03-14T20:59:44Z -> 2024-03-14T21:54:52Z+0100)

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 2 new triples, total is now 24

Processing source #7 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/32-contributors.ucto-service.codemeta.json

    Found main resource with URI https://webservices.cls.ru.nl/portal/ucto-service.contributors/snapshot

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (https://webservices.cls.ru.nl/portal/ucto-service)] processed 8 new triples, total is now 27

Processing source #8 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/20-python.ucto-service.codemeta.json

    Found main resource with URI https://webservices.cls.ru.nl/portal/ucto-webservice/2.5.2

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/author (https://webservices.cls.ru.nl/portal/stub/H1978d489d3ffd63d -> https://webservices.cls.ru.nl/portal/stub/H-69f8fefb1df6f8d6)

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/description (Webservice for the ucto, a rule-based tokeniser for multiple languages -> Ucto is a rule-based tokeniser for multiple languages. This is the webservice for it, for both humans and machines.)

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/name (ucto_webservice -> Ucto-Webservice)

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/version (v2.5.2 -> 2.5.2)

[CODEMETA COMPOSITION (ucto-webservice)] processed 59 new triples, total is now 77

Processing source #9 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/11-repostatus.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (ucto-webservice)] processed 1 new triples, total is now 77

Processing source #10 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-harvest.ucto-service.codemeta.json

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/softwareRequirements (https://webservices.cls.ru.nl/portal/dependency/folia-tools -> https://webservices.cls.ru.nl/portal/stub/H-30a6bf3b06b1faa5)

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/softwareRequirements (https://webservices.cls.ru.nl/portal/dependency/clam-ge-3-2-10 -> https://webservices.cls.ru.nl/portal/stub/H-30a6bf3b06b1faa5)

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/applicationCategory (Internet > WWW/HTTP > WSGI > Application -> https://vocabs.dariah.eu/tadirah/annotating)

[CODEMETA COMPOSITION (ucto-webservice)] overriding old http://schema.org/applicationCategory (Text Processing > Linguistic -> https://vocabs.dariah.eu/tadirah/annotating)

[CODEMETA COMPOSITION (ucto-webservice)] processed 28 new triples, total is now 99

Processing source #11 of 11

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/05-repostatus.ucto-service.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (ucto-webservice)] processed 1 new triples, total is now 99

Remapping URI to (possibly) new identifier and version component: https://webservices.cls.ru.nl/portal/ucto-service -> https://webservices.cls.ru.nl/portal/ucto-service/2.5.2

[CODEMETA VALIDATION (ucto-service)] done

[CODEMETA ENRICHMENT (ucto-service)] Guessing interface type http://schema.org/WebAPI based on clues

[CODEMETA ENRICHMENT (ucto-service)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (ucto-service)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (ucto-service)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (ucto-service)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (ucto-service)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (ucto-service)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (ucto-service)] adding affiliation(s) of first author as producer

VALIDATION https://webservices.cls.ru.nl/portal/ucto-service/2.5.2 #1: Warning: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)

VALIDATION https://webservices.cls.ru.nl/portal/ucto-service/2.5.2 #2: Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)

VALIDATION https://webservices.cls.ru.nl/portal/ucto-service/2.5.2 #3: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/ucto-service.codemeta.json

[harvester info] Harvesting remote service URL https://webservices.cls.ru.nl/ucto/ for ucto-service: codemetapy  --baseuri https://webservices.cls.ru.nl/portal --baseuri https://webservices.cls.ru.nl/portal --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/ucto-service.codemeta.json" "/tmp/out/ucto-service.codemeta.json" "https://webservices.cls.ru.nl/ucto/"

-- begin log --

Passed 2 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/out/ucto-service.codemeta.json', 'json'), ('https://webservices.cls.ru.nl/ucto/', 'web')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://webservices.cls.ru.nl/portal/ucto-service

Processing source #1 of 2

Parsing json-ld file from /tmp/out/ucto-service.codemeta.json

    Found main resource with URI https://webservices.cls.ru.nl/portal/ucto-service/2.5.2

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/ucto-service

[CODEMETA COMPOSITION (ucto-service)] processed 135 new triples, total is now 135

Processing source #2 of 2

Fallback: Obtaining metadata from remote URL https://webservices.cls.ru.nl/ucto/

    Service replied with content-type application/ld+json

    Parsing json...

    Found main resource with URI https://webservices.cls.ru.nl/ucto

    Injected (possibly temporary) URI https://webservices.cls.ru.nl/portal/webapplication/Nbc3c8ca1e37dbf2b918ef0d5143f4416

Adding service (targetProduct) https://webservices.cls.ru.nl/ucto/

[CODEMETA COMPOSITION (ucto-service)] processed 45 new triples, total is now 181

Remapping URI to (possibly) new identifier and version component: https://webservices.cls.ru.nl/portal/ucto-service -> https://webservices.cls.ru.nl/portal/ucto-service/2.5.2

[CODEMETA VALIDATION (ucto-service)] removing stub targetProduct (WebApplication or WebAPI) without a URL, as we already have one (or more) with URL

[CODEMETA VALIDATION (ucto-service)] done

-- end log --

[harvester info] <-- Finished processing ucto-service (https://github.com/proycon/ucto_webservice) [Sat Apr 20 04:03:28 UTC 2024]

        

Metadata Properties

Version
2.5.2 (release notes)
Interface types
  • Web Application
Software website
Source code repository
https://github.com/proycon/ucto_webservice
Category
  • Annotating
  • Linguistics
  • Tagging
  • Textual and content analysis
Keywords
  • clam
  • webservice
  • rest
  • nlp
  • computational_linguistics
Development Status
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.
Issue Tracker (Support)
https://github.com/proycon/ucto_webservice/issues
Documentation
License
Author(s)
Maintainer(s)
Contributor(s)
Producer
  •   KNAW Humanities Cluster & CLST, Radboud University
Programming Language
  • Python
Runtime Platform
  • Python 3
  • Python 3.10
  • Python 3.6
  • Python 3.7
  • Python 3.8
  • Python 3.9
Operating System
  • BSD
  • Linux
  • macOS
Software dependencies
  • ucto
Metadata validation
★ ★ ★ ☆ ☆
Created
2022-04-08 14:07:37 +0200
Last modified
2024-03-14 21:54:52 +0100