
Analyse Your 汉字 (?)

The purpose of this tool is to analyse a Chinese vocabulary list or block of Chinese text, to give information about the words and characters it contains.

See the (?) info buttons below for more information about how to use the various features of this tool.

Vocabulary Actions (?)

Select the actions to be performed on the inputted text or vocabulary list.

The 'analyse' functions give you statistics such as word counts and, for the HSK options, how many words/characters you know at each HSK level.

The 'suggest' options show the highest-frequency words or characters that you don't know.

Analyse words/characters
Analyse HSK words
Analyse HSK characters
Suggest HSK words
Suggest HSK characters
Suggest words
Suggest words using input characters
Suggest characters
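
As a rough illustration of what a 'suggest' action does, the sketch below picks the most frequent entries that are missing from a known list. The function name `suggest_unknown`, the `limit` parameter and the frequency counts are all hypothetical; this is not the tool's actual code or data.

```python
def suggest_unknown(known, frequency, limit=10):
    """Return up to `limit` entries from `frequency` that are not in `known`,
    most frequent first."""
    known_set = set(known)
    candidates = [entry for entry in frequency if entry not in known_set]
    # Higher count means more frequent, so sort in descending order.
    candidates.sort(key=lambda entry: frequency[entry], reverse=True)
    return candidates[:limit]


# Made-up counts for illustration only (not real SUBTLEX-CH values).
sample_frequency = {"的": 1000, "我": 900, "你": 800, "是": 700, "了": 600}

suggestions = suggest_unknown(["我", "是"], sample_frequency, limit=2)
print(suggestions)  # ['的', '你']
```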

Annotated Version (?)

These options will output a version of your source text with popup information giving HSK levels, frequency, stroke and radical information, etc.

The words/characters are also clickable, with the links taking you to a full dictionary entry. In addition, words/characters are coloured by HSK level.

Annotate words
Annotate characters

Input Options (?)

Choose 'One word/character per line' when the input is a vocabulary list from Skritter or a flashcard text file. Anything after the first whitespace on each line is ignored.

'Comma/whitespace separated' will use the characters , ; | or any whitespace to separate the words in your input.

If pasting text from a web page or document, use the 'Big block of text' option. This option is less precise, as word breaks have to be determined by this tool.

One word/character per line
Comma/whitespace separated
Big block of text
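
The first two input modes can be sketched roughly as follows. The function names are hypothetical and the tool's real parsing may differ in details. For example, this sketch skips leading whitespace on a line before taking the first token, which is an assumption.

```python
import re


def parse_one_per_line(text):
    """Keep the first whitespace-delimited token of each non-empty line."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 1)  # split once, on the first run of whitespace
        if parts:
            entries.append(parts[0])
    return entries


def parse_separated(text):
    """Split on commas, semicolons, vertical bars or any whitespace."""
    return [token for token in re.split(r"[,;|\s]+", text) if token]


flashcards = "你好 nǐ hǎo hello\n谢谢\tthanks\n\n再见"
print(parse_one_per_line(flashcards))  # ['你好', '谢谢', '再见']

mixed = "你好,谢谢;再见|朋友 学习"
print(parse_separated(mixed))  # ['你好', '谢谢', '再见', '朋友', '学习']
```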

Output List Options (?)

The Comma Separated option will output comma separated words/characters, with no frequency information (the other two options will be ignored).

The other two options add frequency information to the lists of hanzi that are output.

With the frequency index, 1 is the highest frequency word/character, and higher values are less frequent.

The raw word/character frequency is the actual frequency reported by SUBTLEX-CH, with higher values being more frequent, which helps to understand the relative frequency of each character.

Comma Separated
Add frequency index
Add raw frequency
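
To make the relationship between the two frequency options concrete, here is a minimal sketch that derives a frequency index (rank) from raw counts. The counts are made up for illustration and are not SUBTLEX-CH values. The function name and the tie-breaking behaviour (ties keep their original order) are assumptions.

```python
def frequency_index(raw_frequency):
    """Map each entry to its rank: 1 is the most frequent entry."""
    ranked = sorted(raw_frequency, key=lambda entry: raw_frequency[entry], reverse=True)
    return {entry: rank for rank, entry in enumerate(ranked, start=1)}


# Made-up raw counts, for illustration only.
sample_raw = {"学": 50, "的": 1000, "我": 900}
index = frequency_index(sample_raw)

# Print entries from most to least frequent: entry, index, raw count.
for entry in sorted(index, key=index.get):
    print(entry, index[entry], sample_raw[entry])
```

A low index and a high raw count both mean "more frequent". The index tells you an entry's position in the list, while the raw count shows how large the gap between neighbouring entries is.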

Input your simplified Chinese here (?)

This edit box is for the vocabulary list or block of text that you wish to analyse. Choose the format of your list by selecting the appropriate value from the 'Input Options' section above.

To help to resolve ambiguous words when analysing a block of text, place a | character (vertical bar) between words.
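
The page does not say how word breaks are found in a block of text. The sketch below uses greedy longest-match segmentation purely as an assumed stand-in, to show why a vertical bar can change the result. The `segment` function, the `max_len` parameter and the tiny dictionary are hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Greedy longest-match segmentation that never matches across a '|'."""
    words = []
    for chunk in text.split("|"):
        i = 0
        while i < len(chunk):
            # Try the longest candidate first; fall back to a single character.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                candidate = chunk[i:i + size]
                if size == 1 or candidate in dictionary:
                    words.append(candidate)
                    i += size
                    break
    return words


# Tiny hypothetical dictionary, for illustration only.
sample_dictionary = {"研究", "研究生", "生命"}

print(segment("研究生命", sample_dictionary))   # ['研究生', '命']
print(segment("研究|生命", sample_dictionary))  # ['研究', '生命']
```

Without the bar, the greedy match takes 研究生 and leaves 命 stranded. With the bar, the intended reading 研究 + 生命 is forced.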


Results (?)

All word/character lists are in descending order of frequency, with the most frequently used words/characters at the top of each list.

Analysis of Words/Characters in Input

Input contained:
Unique Entries
Unique Characters
SUBTLEX Words
CC-CEDICT Words

Analysis of HSK Words in Input

Input contained: Of the 8 unique words in the input:

Each word appeared only once in the input.

HSK 1
HSK 2
HSK 3
HSK 4
HSK 5
HSK 6
Non-HSK

Analysis of HSK Characters in Input

Input contained: Of the 8 unique characters in the input:

Each character appeared only once in the input.

HSK 1
HSK 2
HSK 3
HSK 4
HSK 5
HSK 6
Non-HSK

Annotated Characters

