public class NameML extends Object
```
NameML.init();
NameML.isName("Mouse");
--> true
NameML.isAbbreviation("PMM");
--> true
NameML.isPersonName("Mickey Mouse", Language.ENGLISH);
--> false
NameML.couldBePersonName("Mickey Mouse", Language.ENGLISH);
--> true
NameML.isPersonName("Pope Mickey Mouse", Language.ENGLISH);
--> true
NameML.isPersonName("Pope Mickey Mouse", Language.SPANISH);
--> false
NameML.isPersonName("Pope Mickey Mouse", Language.GERMAN);
--> false
NameML.isPersonName("Papst Mickey Mouse", Language.GERMAN);
--> true
NameML.of("Prof. Dr. Fabian the Great III of Saarbruecken").describe();
// equivalent to new PersonName(...) in this case
-->
PersonName
Original: Prof. Dr. Fabian the Great III of Saarbruecken
Titles: Prof. Dr.
Given Name: Fabian
Given Names: Fabian
Family Name Prefix: null
Attribute Prefix: the
Family Name: null
Attribute: Great
Family Name Suffix: null
Roman: III
City: Saarbruecken
Normalized: Fabian_Great
```
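The outputs above suggest that isPersonName relies on language-specific titles: "Pope" turns "Mickey Mouse" into a likely person name in English but not in German or Spanish, while "Papst" does so in German. The toy model below reproduces those four results under loudly stated assumptions. The `Language` enum, the tiny title sets, and the rule "a title of the given language followed by at least two capitalized words" are all hypothetical; NameML uses its own title patterns (such as titlePatternEn), which are not shown here.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;

// Toy model only, NOT NameML's implementation: title lists and the
// "title + capitalized words" rule are assumptions for illustration.
final class TitleCheckSketch {

    enum Language { ENGLISH, GERMAN, SPANISH }

    // Hypothetical per-language title sets.
    private static final Map<Language, Set<String>> TITLES = new EnumMap<>(Language.class);
    static {
        TITLES.put(Language.ENGLISH, Set.of("Pope", "Mr.", "Prof."));
        TITLES.put(Language.GERMAN, Set.of("Papst", "Herr", "Prof."));
        TITLES.put(Language.SPANISH, Set.of("Papa", "Sr.", "Prof."));
    }

    // A capitalized word such as "Mickey": one upper-case letter, then lower-case letters.
    private static boolean isCapitalizedWord(String w) {
        return w.matches("\\p{Lu}\\p{Ll}+");
    }

    // "Very likely a person name" in this toy: a title of the given language
    // followed by at least two capitalized words.
    static boolean isPersonName(String s, Language lang) {
        String[] tokens = s.trim().split("\\s+");
        if (tokens.length < 3 || !TITLES.get(lang).contains(tokens[0])) {
            return false;
        }
        for (int i = 1; i < tokens.length; i++) {
            if (!isCapitalizedWord(tokens[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPersonName("Mickey Mouse", Language.ENGLISH));      // false
        System.out.println(isPersonName("Pope Mickey Mouse", Language.ENGLISH)); // true
        System.out.println(isPersonName("Pope Mickey Mouse", Language.GERMAN));  // false
        System.out.println(isPersonName("Papst Mickey Mouse", Language.GERMAN)); // true
    }
}
```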
IMPORTANT: Some recognition methods fall back to English as the target language, because not all methods have been adapted for multi-language support yet. Be aware that the interface of the methods that are not yet language-dependent might change!

Also note that you currently need to initialize the class by calling one of the init functions before you can use most of its methods. Otherwise they may throw a NullPointerException!
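To make the English fallback concrete, here is a hypothetical sketch of how a method whose resources exist only for some languages might silently answer from the English resources. The enum, the map, and the word lists are invented for illustration; they do not indicate which NameML methods actually fall back.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical fallback sketch; the word lists are made up.
final class LanguageFallbackSketch {

    enum Language { ENGLISH, GERMAN, FRENCH }

    // Only English and German have been "adapted" in this sketch.
    private static final Map<Language, Set<String>> WORD_LISTS = new EnumMap<>(Language.class);
    static {
        WORD_LISTS.put(Language.ENGLISH, Set.of("the", "of", "and"));
        WORD_LISTS.put(Language.GERMAN, Set.of("der", "von", "und"));
    }

    // Returns the list for lang, or the English list if lang has none.
    static Set<String> wordListFor(Language lang) {
        return WORD_LISTS.getOrDefault(lang, WORD_LISTS.get(Language.ENGLISH));
    }

    // Case-insensitive lookup in the (possibly fallen-back) list.
    static boolean isListed(String word, Language lang) {
        return wordListFor(lang).contains(word.toLowerCase(Locale.ROOT));
    }
}
```

With this shape, a FRENCH query is answered from the English list, which is the kind of behavior the note above warns about.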
TODO: Turn this completely into an instantiable object? Then we would neither need to call init nor load all languages when only one is used.

| Modifier and Type | Class and Description |
|---|---|
| static class | NameML.AbbreviationML |
| static class | NameML.CompanyNameML |
| static class | NameML.PersonNameML |

| Modifier and Type | Field and Description |
|---|---|
| static String | A: Contains characters |
| static String | ANYNAME: Holds the general default name |
| static String | attributePrefix: Contains attribute prefixes (like "the" in "Alexander the Great") |
| static Pattern | attributePrefixPattern |
| static String | B: Contains a blank |
| static String | BC: Contains a blank with an optional comma |
| static String | BD: Contains a word boundary |
| static String | companyNameSuffix: Contains common company name suffixes (like "Inc") |
| static Pattern | companyNameSuffixPattern |
| static String | DG: Contains digits |
| static String | directFamilyNamePrefix: A direct family name prefix (such as "Mc") |
| static String | familyName: Name component with an optional familyNamePrefix and familyNameSuffix |
| static String | familyNamePrefix: Contains common family name prefixes (like "von") |
| static Pattern | familyNamePrefixPattern |
| static String | familyNameSuffix: Contains common name suffixes (like "Junior") |
| static Pattern | familyNameSuffixPattern |
| static String | givenName: The pattern "Name[-Name]" |
| static String | givenNameComponent: The pattern "Name." |
| static String | givenNames: The pattern (personNameComponent+B)+ |
| static String | H: Contains hyphens |
| protected static boolean | hasBeenInitialized |
| static String | L: Contains lower-case characters |
| static Map<String,String> | languageCodes: Language codes |
| static Pattern | laxAbbreviationPattern: Contains the lax pattern for abbreviations |
| static Pattern | laxCompanyPattern: Contains the lax pattern for companies |
| static String | laxName: Contains the pattern for names |
| static Pattern | laxNamePattern: Contains the pattern for names |
| static Pattern | laxPersonNamePatternDe |
| static Pattern | laxPersonNamePatternEn |
| static Pattern | laxPersonNamePatternEs |
| static Pattern | laxPersonNamePatternFr |
| static Pattern | laxPersonNamePatternIt |
| static Map<String,String> | nationality2country |
| static String | nickName: Nickname '...' |
| protected String | normalized: Holds the normalized name |
| static String | of: Contains the English "of" |
| static String | or: Contains "\|" |
| protected String | original: Holds the original name |
| static String | personNameComponent: The pattern "Name" |
| static String | prep: Contains prepositions |
| static String | roman: Contains Roman numerals |
| static Pattern | safeAbbreviationPattern: Contains the safe pattern for abbreviations |
| static Pattern | safeCompanyPattern: Contains the safe pattern for companies |
| static String | safeName: Contains a pattern that indicates strings that are very likely to be names |
| static Pattern | safeNamePattern: Contains a pattern that indicates strings that are very likely to be names |
| static Pattern | safeNamesPattern: Contains a pattern that indicates strings that are very likely to be names |
| static Pattern | safeNamesPatternNoPrep: Contains a pattern that indicates strings that are very likely to be names |
| static Pattern | safePersonNamePatternDe |
| static Pattern | safePersonNamePatternEn |
| static Pattern | safePersonNamePatternEs |
| static Pattern | safePersonNamePatternFr |
| static Pattern | safePersonNamePatternIt |
| protected static Set<String> | stopWordDE |
| protected static Set<String> | stopWordEN |
| protected static Set<String> | stopWordES |
| protected static Set<String> | stopWordFR |
| protected static Set<String> | stopWordIT |
| static String | teamName: Team name |
| static Pattern | teamNamePattern |
| static Pattern | titlePatternDe |
| static Pattern | titlePatternEn: Matches common titles (like "Mr.") |
| static Pattern | titlePatternEs |
| static Pattern | titlePatternFr |
| static Pattern | titlePatternIt |
| protected static Set<String> | titlesForGivenNamesDe |
| protected static Set<String> | titlesForGivenNamesEn: Contains those titles that go with the given name (e.g. |
| protected static Set<String> | titlesForGivenNamesEs |
| protected static Set<String> | titlesForGivenNamesFr |
| protected static Set<String> | titlesForGivenNamesIt |
| static String | U: Contains upper-case characters |
| static Map<String,String> | usStates |

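The field list pairs lax and safe patterns (laxNamePattern / safeNamePattern, laxAbbreviationPattern / safeAbbreviationPattern, laxCompanyPattern / safeCompanyPattern), which line up with the couldBeX / isX methods in the method summary. The sketch below assumes that relationship: a strict pattern backs isName and a looser one backs couldBeName. Both regexes are made up for illustration and are not the values of NameML's fields.

```java
import java.util.regex.Pattern;

// Illustrative regexes only; NOT the values of NameML's laxNamePattern / safeNamePattern.
final class LaxSafePatternSketch {

    // Safe (strict): a capitalized word of letters, optionally hyphenated ("Mouse", "Jean-Paul").
    static final Pattern SAFE_NAME = Pattern.compile("\\p{Lu}\\p{Ll}+(?:-\\p{Lu}\\p{Ll}+)*");

    // Lax (loose): an upper-case letter followed by any letters, digits, dots or hyphens.
    static final Pattern LAX_NAME = Pattern.compile("\\p{Lu}[\\p{L}\\p{Nd}.-]*");

    // "Is a name with high probability": must satisfy the strict pattern.
    static boolean isName(String s) {
        return SAFE_NAME.matcher(s).matches();
    }

    // "Could possibly be a name": the looser pattern is enough.
    static boolean couldBeName(String s) {
        return LAX_NAME.matcher(s).matches();
    }
}
```

Every string the strict pattern accepts is also accepted by the lax one, so in this sketch isName implies couldBeName, while something like "R2-D2" only passes the lax check.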
| Modifier | Constructor and Description |
|---|---|
| protected | NameML(String s): Constructor (for subclasses only; use NameML.of instead!) |

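A protected constructor combined with a static of factory lets the factory pick the most specific subclass, which is what the overview's comment "equivalent to new PersonName(...) in this case" hints at. The sketch below shows that shape with hypothetical classes NameSketch, AbbreviationSketch, and PersonNameSketch. The detection rules are placeholders, and unlike NameML.of(String s, Language lang) the sketch takes no language argument.

```java
// Hypothetical classes; only the shape (protected constructor + static factory) mirrors NameML.
class NameSketch {
    protected final String original;

    // Protected: client code goes through of(...); subclasses call super(s).
    protected NameSketch(String s) {
        this.original = s;
    }

    String describe() {
        return getClass().getSimpleName() + "\nOriginal: " + original;
    }

    // Factory: inspects the string and returns the most specific subclass.
    static NameSketch of(String s) {
        if (s.matches("\\p{Lu}{2,}")) {                      // placeholder rule, e.g. "PMM"
            return new AbbreviationSketch(s);
        }
        if (s.matches("(?:Prof\\.|Dr\\.|Mr\\.)\\s.*")) {     // placeholder rule: leading title
            return new PersonNameSketch(s);
        }
        return new NameSketch(s);
    }
}

final class AbbreviationSketch extends NameSketch {
    AbbreviationSketch(String s) { super(s); }
}

final class PersonNameSketch extends NameSketch {
    PersonNameSketch(String s) { super(s); }
}
```

Keeping the constructor protected means callers cannot bypass the classification step, while subclasses can still reuse the shared state.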
| Modifier and Type | Method and Description |
|---|---|
| static String | c(String s): Capturing group |
| static boolean | couldBeAbbreviation(String word): Tells whether a string could be an abbreviation |
| static boolean | couldBeCompanyName(String s): Tells whether the string could be a company name |
| static boolean | couldBeName(String s): Tells whether a String could possibly be a name |
| static boolean | couldBePersonName(String s, Language lang): Returns true if it is possible that the string is a person name |
| String | describe(): Returns a description |
| static InputStream | getConfigFileStream(String configfile) |
| static void | init(): Simply call this function to initialize NameML with the default values |
| static void | init(NonsharedParameters params) |
| static void | init(String configPath): If you would like to use your own stopword lists etc. |
| static boolean | isAbbreviation(String word): Tells whether a string is an abbreviation with high probability |
| static boolean | isAttributePrefix(String s): Says whether this String is an attribute prefix (like "the" in "Alexander the Great") |
| static boolean | isCompanyName(String s): Tells whether the string is a company name with high probability |
| static boolean | isCompanyNameSuffix(String s): Says whether this String is a company name suffix |
| static boolean | isFamilyNamePrefix(String s): Says whether this String is a family name prefix |
| static boolean | isLanguage(String s): Returns TRUE for languages |
| static boolean | isLanguageCode(String s): Returns TRUE for language codes |
| static boolean | isName(String s): Tells whether a String is a name with high probability |
| static boolean | isNames(String s): Tells whether a String is a sequence of names with high probability |
| static boolean | isNation(String s): Returns TRUE for nations |
| static boolean | isNationality(String s): Returns TRUE for nationalities |
| static boolean | isPersonName(String m, Language lang): Returns true if it is highly probable that the string is a person name |
| static boolean | isPersonNameSuffix(String s): Says whether this String is a person name suffix |
| static boolean | isStopWord(String w, Language l): Returns TRUE for stop words |
| static boolean | isTitle(String s, Language lang): Says whether this String is a title |
| static boolean | isUSState(String s): Returns TRUE for US states |
| static boolean | isUSStateAbbreviation(String s): Returns TRUE for US state abbreviations |
| static String | languageForCode(String s): Returns the language for a code (or NULL) |
| static void | main(String[] argv): Test routine |
| static String | mul(String s): Repeats the token with blanks one or more times |
| static String | mulHyp(String s): Repeats the token with hyphens one or more times |
| static String | nationForNationality(String s): Returns the nation for a nationality (or NULL) |
| String | normalize(): Returns the letters and digits of the original name (eliminates punctuation) |
| static NameML | of(String s, Language lang): Factory method |
| static String | opt(String s): Optional component |
| static String | optMul(String s): Optional multiple component |
| static String | or(String s1, String s2): Alternative |
| String | original(): Returns the original name |
| static List<String> | readTextFileLines(String configFile, String encoding): Reads an entire text file as a list of strings |
| static Set<String> | readTextFileLinesSet(String configFile): Reads a set from a UTF-8 text file, ignoring lines starting with "##" |
| String | toString(): Returns the original name |
| static String | unabbreviateUSState(String s): Returns the US state for an abbreviation (or NULL) |

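The small static helpers c, mul, mulHyp, opt, optMul, and or suggest that the name patterns are assembled from regex fragments. The sketch below is one possible implementation of those helpers, assuming B is a single blank and H a single hyphen; the exact strings NameML produces may differ. It composes a given name ("Name[-Name]"), an optional family name prefix (a two-word stand-in for familyNamePrefix), and a family name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One possible implementation of the helpers; B and H values are assumptions.
final class RegexCombinatorSketch {
    static final String B = " ";  // assumption: a single blank
    static final String H = "-";  // assumption: a single hyphen

    // Capturing group
    static String c(String s) { return "(" + s + ")"; }

    // One or more occurrences of s, separated by blanks
    static String mul(String s) { return "(?:" + s + "(?:" + B + s + ")*)"; }

    // One or more occurrences of s, separated by hyphens
    static String mulHyp(String s) { return "(?:" + s + "(?:" + H + s + ")*)"; }

    // Optional component
    static String opt(String s) { return "(?:" + s + ")?"; }

    // Optional multiple component: zero or more blank-separated occurrences
    static String optMul(String s) { return opt(mul(s)); }

    // Alternative
    static String or(String s1, String s2) { return "(?:" + s1 + "|" + s2 + ")"; }

    // Group 1: given name, group 2: optional prefix, group 3: family name.
    static Pattern demoPattern() {
        String name = "\\p{Lu}\\p{Ll}+";     // one capitalized word
        String prefix = or("von", "van");    // tiny stand-in for a prefix list
        return Pattern.compile(c(mulHyp(name)) + B + opt(c(prefix) + B) + c(name));
    }

    public static void main(String[] args) {
        Matcher m = demoPattern().matcher("Jean-Paul von Neumann");
        if (m.matches()) {
            // prints "Jean-Paul | von | Neumann"
            System.out.println(m.group(1) + " | " + m.group(2) + " | " + m.group(3));
        }
    }
}
```

Wrapping every fragment in a non-capturing group keeps the helpers safe to nest, so only c introduces numbered groups.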
public static final String ANYNAME
protected static boolean hasBeenInitialized
public static String roman
public static String of
public static final String U
public static final String L
public static final String A
public static final String B
public static final String BD
public static final String BC
public static final String DG
public static final String H
public static final String or
public static final String familyNamePrefix
public static final Pattern familyNamePrefixPattern
public static String attributePrefix
public static Pattern attributePrefixPattern
public static final String familyNameSuffix
public static final Pattern familyNameSuffixPattern
public static Pattern titlePatternEn
public static Pattern titlePatternDe
public static Pattern titlePatternFr
public static Pattern titlePatternEs
public static Pattern titlePatternIt
protected static Set<String> titlesForGivenNamesEn
public static final String companyNameSuffix
public static final Pattern companyNameSuffixPattern
public static final String teamName
public static final Pattern teamNamePattern
public static final String prep
public static final String laxName
public static final Pattern laxNamePattern
public static final String safeName
public static final Pattern safeNamePattern
public static final Pattern safeNamesPattern
public static final Pattern safeNamesPatternNoPrep
protected String original
protected String normalized
public static final Pattern laxAbbreviationPattern
public static final Pattern safeAbbreviationPattern
public static final Pattern laxCompanyPattern
public static final Pattern safeCompanyPattern
public static final String directFamilyNamePrefix
public static final String personNameComponent
public static final String givenNameComponent
public static final String givenName
public static final String givenNames
public static final String familyName
public static final String nickName
public static Pattern laxPersonNamePatternEn
public static Pattern laxPersonNamePatternDe
public static Pattern laxPersonNamePatternEs
public static Pattern laxPersonNamePatternFr
public static Pattern laxPersonNamePatternIt
public static Pattern safePersonNamePatternEn
public static Pattern safePersonNamePatternDe
public static Pattern safePersonNamePatternEs
public static Pattern safePersonNamePatternFr
public static Pattern safePersonNamePatternIt
protected NameML(String s)
public static final void init(NonsharedParameters params)
public static final void init(String configPath)
Parameters: configPath - The path that contains all word lists.
public static final void init()
public static boolean isFamilyNamePrefix(String s)
public static boolean isAttributePrefix(String s)
public static boolean isPersonNameSuffix(String s)
public static boolean isCompanyNameSuffix(String s)
public static boolean isName(String s)
public static boolean isNames(String s)
public static boolean couldBeName(String s)
public String normalize()
public String describe()
public String original()
public static List<String> readTextFileLines(String configFile, String encoding) throws IOException
Parameters: configFile - text file to open; encoding - character encoding name (as used by Java), or null for UTF-8
Throws: IOException
public static InputStream getConfigFileStream(String configfile) throws FileNotFoundException
Throws: FileNotFoundException
public static Set<String> readTextFileLinesSet(String configFile)
Parameters: configFile
public static boolean isAbbreviation(String word)
public static boolean couldBeAbbreviation(String word)
public static boolean isCompanyName(String s)
public static boolean couldBeCompanyName(String s)
public static boolean couldBePersonName(String s, Language lang)
public static boolean isPersonName(String m, Language lang)
public static boolean isUSState(String s)
public static boolean isUSStateAbbreviation(String s)
public static String unabbreviateUSState(String s)
public static boolean isLanguage(String s)
public static boolean isLanguageCode(String s)
public static String languageForCode(String s)
public static boolean isNation(String s)
public static boolean isNationality(String s)
public static String nationForNationality(String s)
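readTextFileLines and readTextFileLinesSet are documented only briefly: a null encoding means UTF-8, and the set variant skips lines starting with "##". The sketch below implements just that documented behavior with java.nio.file. It takes a Path instead of a config file name (NameML resolves names through getConfigFileStream, whose lookup rules are not described here), and it neither trims lines nor drops blank lines, since the documentation does not say whether the original does. Unlike NameML's readTextFileLinesSet, whose signature declares no checked exception, the file variant here propagates IOException.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch of the documented behavior only; not NameML's source.
final class WordListSketch {

    // Reads all lines; a null encoding means UTF-8, as documented for readTextFileLines.
    static List<String> readTextFileLines(Path file, String encoding) throws IOException {
        Charset charset = (encoding == null) ? StandardCharsets.UTF_8 : Charset.forName(encoding);
        return Files.readAllLines(file, charset);
    }

    // Collects lines into a set, skipping comment lines that start with "##".
    // Read errors are rethrown as UncheckedIOException.
    static Set<String> linesToSet(Reader source) {
        Set<String> result = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("##")) {
                    result.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // UTF-8 file variant; opening errors propagate as IOException.
    static Set<String> readTextFileLinesSet(Path file) throws IOException {
        return linesToSet(Files.newBufferedReader(file, StandardCharsets.UTF_8));
    }
}
```

Separating the Reader-based parsing from file opening keeps the "##" rule testable without touching the file system.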
Copyright © 2018. All rights reserved.