HSK东西 Scripts Analyse Your 汉字

Vocabulary Set Operations

Use this tool to compare two vocabulary lists.

You can analyse the results using the Analyse Your 汉字 page.

Although intended for use with Chinese characters, the tool can compare any sets of text strings. See this example for a demonstration.

Output Options

These are largely self-explanatory; try each one to see how it changes the output.

One per line
Comma separated
Tab separated
Space separated
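
As a rough illustration of what the four output options amount to, the sketch below joins a result list with a different separator for each option. The names `SEPARATORS` and `format_output` are hypothetical, and the exact separator strings the tool emits (for example, whether a space follows each comma) are an assumption.

```python
# Hypothetical mapping from output option to separator string.
# Assumption: the comma option emits a bare "," with no trailing space.
SEPARATORS = {
    "one_per_line": "\n",
    "comma": ",",
    "tab": "\t",
    "space": " ",
}


def format_output(words, option="one_per_line"):
    """Join the words of a result set using the separator for the chosen option."""
    return SEPARATORS[option].join(words)


if __name__ == "__main__":
    words = ["你好", "谢谢", "再见"]
    for option in SEPARATORS:
        # repr() makes the tab and newline separators visible.
        print(option, repr(format_output(words, option)))
```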

Word List A

One word/character per line
Comma/whitespace separated
Big block of text

Word List B

One word/character per line
Comma/whitespace separated
Big block of text

These edit boxes are for the vocabulary lists or blocks of text that you wish to compare. Choose the format of your lists by selecting the appropriate values from the 'Input Options' section above.

To help to resolve ambiguous words when analysing a block of text, place a | character (vertical bar) between words.

Choose one word/character per line when the input is a vocabulary list from Skritter or a flashcard text file. Anything after the first whitespace on each line is ignored.

Comma/whitespace separated will use the characters ,;| or any whitespace to separate the words in your input.
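
The two list-style input options can be sketched roughly as follows. This is only an illustration of the rules described above, not the tool's actual code; the function names are hypothetical, and stripping leading whitespace from a line before taking its first token is an assumption.

```python
import re


def parse_one_per_line(text):
    """Keep the first whitespace-delimited token of each non-empty line."""
    words = []
    for line in text.splitlines():
        tokens = line.split()  # no argument: split on any run of whitespace
        if tokens:
            words.append(tokens[0])  # anything after the first whitespace is ignored
    return words


def parse_separated(text):
    """Split on ',', ';', '|' or any whitespace, dropping empty pieces."""
    return [w for w in re.split(r"[,;|\s]+", text) if w]


if __name__ == "__main__":
    flashcards = "你好\tnǐ hǎo\thello\n谢谢\txièxie\tthanks\n"
    print(parse_one_per_line(flashcards))
    print(parse_separated("你好,谢谢;再见|朋友 学习"))
```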

If pasting text from a web page or document, use the 'Big block of text' option. This option is less precise, as word breaks have to be determined by this tool.
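
How the tool determines word breaks is not stated here. One common approach is greedy forward maximum matching against a dictionary, sketched below to show why a block of text can be ambiguous and how a | character resolves it. The function `segment`, its `max_len` parameter, and the tiny dictionary are all hypothetical and chosen only for illustration.

```python
def segment(text, dictionary, max_len=4):
    """Greedy forward maximum matching; a '|' in the text forces a word break.

    Assumption: single characters that are not in the dictionary
    (punctuation, for example) are skipped.
    """
    words = []
    for chunk in text.split("|"):
        i = 0
        while i < len(chunk):
            # Try the longest candidate first, shrinking until a dictionary
            # word matches or only a single character is left.
            for size in range(min(max_len, len(chunk) - i), 0, -1):
                candidate = chunk[i:i + size]
                if size == 1 or candidate in dictionary:
                    break
            if candidate in dictionary:
                words.append(candidate)
            i += size
    return words


if __name__ == "__main__":
    # Hypothetical dictionary, just large enough for the example.
    dictionary = {"研究", "研究生", "生命", "命", "起源"}
    # Greedy matching picks 研究生 first, leaving 命 on its own.
    print(segment("研究生命起源", dictionary))
    # A vertical bar forces the intended reading.
    print(segment("研究|生命起源", dictionary))
```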



Set Operations on Words

Aw and Bw are the sets of all unique words derived from Word Lists A and B above.

Aw ∩ Bw Intersection, words that appear in both sets.
Aw ∪ Bw Union, both sets of words combined together as a single set.
Aw ∖ Bw Difference, words that are in Aw but not in Bw.
Bw ∖ Aw Difference, words that are in Bw but not in Aw.
Aw ∆ Bw Symmetric Difference, words that are in only one of the two sets.

All sets are sorted with the most frequently used words first.

Aw (3 words)
Bw (3 words)
Aw ∩ Bw (2 words)
Aw ∪ Bw (4 words)
Aw ∖ Bw (1 word)
Bw ∖ Aw (1 word)
Aw ∆ Bw (2 words)
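
The five word-set operations map directly onto built-in set operators in most languages. The sketch below uses made-up word lists, chosen so that the set sizes happen to match the counts listed above. The frequency ranks in `FREQ_RANK` are hypothetical, since the frequency data this tool uses for sorting is not specified here.

```python
# Hypothetical frequency ranks (lower rank = more frequently used).
FREQ_RANK = {"的": 1, "我": 2, "你好": 3, "谢谢": 4}


def by_frequency(words):
    """Sort most frequent first; unranked words go last, ties broken by the word itself."""
    return sorted(words, key=lambda w: (FREQ_RANK.get(w, float("inf")), w))


# Made-up example sets standing in for Aw and Bw.
a_w = {"我", "你好", "谢谢"}
b_w = {"的", "你好", "谢谢"}

results = {
    "Aw ∩ Bw": a_w & b_w,  # intersection
    "Aw ∪ Bw": a_w | b_w,  # union
    "Aw ∖ Bw": a_w - b_w,  # in Aw but not in Bw
    "Bw ∖ Aw": b_w - a_w,  # in Bw but not in Aw
    "Aw ∆ Bw": a_w ^ b_w,  # symmetric difference
}

if __name__ == "__main__":
    for label, words in results.items():
        print(f"{label} ({len(words)}): {' '.join(by_frequency(words))}")
```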

Set Operations on Characters

Ac and Bc are the sets of all unique characters derived from Word Lists A and B above.

Ac ∩ Bc Intersection, characters that appear in both sets.
Ac ∪ Bc Union, both sets of characters combined together as a single set.
Ac ∖ Bc Difference, characters that are in Ac but not in Bc.
Bc ∖ Ac Difference, characters that are in Bc but not in Ac.
Ac ∆ Bc Symmetric Difference, characters that are in only one of the two sets.

All sets are sorted with the most frequently used characters first.

Ac (11 chars)
Bc (11 chars)
Ac ∩ Bc (8 chars)
Ac ∪ Bc (14 chars)
Ac ∖ Bc (3 chars)
Bc ∖ Ac (3 chars)
Ac ∆ Bc (6 chars)
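
The character sets can be thought of as the word lists broken down into their individual characters, with duplicates removed. A minimal sketch of that step follows, using made-up word lists (so the counts differ from those listed above) and a hypothetical helper named `unique_chars`. Frequency sorting is left out, and `sorted()` is used only to make the printed order deterministic.

```python
def unique_chars(words):
    """Collect every distinct character that appears in any of the words."""
    return {ch for word in words for ch in word}


# Made-up word lists standing in for Word Lists A and B.
word_list_a = ["你好", "谢谢", "学习"]
word_list_b = ["你们", "学生", "谢谢"]

a_c = unique_chars(word_list_a)  # 你 好 谢 学 习
b_c = unique_chars(word_list_b)  # 你 们 学 生 谢

if __name__ == "__main__":
    print("Ac ∩ Bc:", sorted(a_c & b_c))
    print("Ac ∪ Bc:", sorted(a_c | b_c))
    print("Ac ∖ Bc:", sorted(a_c - b_c))
    print("Bc ∖ Ac:", sorted(b_c - a_c))
    print("Ac ∆ Bc:", sorted(a_c ^ b_c))
```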
