Problem Description
People often have a preference among synonyms of the same word. For example, some may prefer “the police”, while others may prefer “the cops”. Analyzing such patterns can help to narrow down a speaker’s identity, which is useful when validating, for example, whether it’s still the same person behind an online avatar.
Now given a paragraph of text sampled from someone’s speech, can you find the person’s most commonly used word?
Input Specification:
Each input file contains one test case. For each case, there is one line of text no more than 1048576 characters in length, terminated by a newline character \n. The input contains at least one alphanumerical character, i.e., one character from the set [0-9 A-Z a-z].
Output Specification:
For each test case, print in one line the most commonly occurring word in the input text, followed by a space and the number of times it has occurred in the input. If there are more than one such words, print the lexicographically smallest one. The word should be printed in all lower case. Here a “word” is defined as a continuous sequence of alphanumerical characters separated by non-alphanumerical characters or the line beginning/end.
Note that words are case insensitive.
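Different people may prefer different synonyms for the same thing.
For example, when talking about the police, some people like to say "the police" while others like to say "the cops".
Analyzing how someone speaks helps to determine the speaker's identity, which is useful when verifying, for instance, whether you are still chatting with the same person online.
Now, given a piece of text taken from someone's speech, can you determine their most commonly used word?
Input format
The input is a single line containing one string, terminated by a newline character \n.
Output format
Output one line with the most commonly used word and its number of occurrences.
If several words are tied for most common, output the lexicographically smallest one.
Note that the word must be printed entirely in lower case.
A word is a continuous sequence of letters and digits, delimited by non-alphanumeric characters or by the beginning/end of the line.
Words are case insensitive.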
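The word definition above can be sketched as a small tokenizer. This is only an illustration under stated assumptions: the helper name split_words is hypothetical (it is not part of the problem or of the solution further down), and it relies on std::isalnum in the default "C" locale, where only ASCII letters and digits count as alphanumeric.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the original solution): splits a line into
// lowercase words, where a word is a maximal run of letters and digits.
std::vector<std::string> split_words(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            // Still inside a word: append the lowercased character.
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            // A separator ends the word collected so far.
            words.push_back(current);
            current.clear();
        }
    }
    // The end of the line also ends a word.
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Under these assumptions, "Can1" becomes the single word can1, distinct from can, because digits are part of a word.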
Sample Input:
Can1: "Can a can can a can? It can!"
Sample Output:
can 5
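// Vocabulary: "followed by" = coming right after; "continuous sequence" = an unbroken run; "lower case" = small letters
// "case insensitive" = upper and lower case are treated as the same; "alphanumerical" = consisting of letters and digits
// "lexicographically" = in dictionary order; "terminated by" = ended by; "avatar" = a person's online persona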
#include <cctype>
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

// Returns true if c is a digit or an ASCII letter.
bool check(char c)
{
    if (c >= '0' && c <= '9') return true;
    if (c >= 'A' && c <= 'Z') return true;
    if (c >= 'a' && c <= 'z') return true;
    return false;
}

int main()
{
    string str;
    getline(cin, str);

    unordered_map<string, int> hash; // number of occurrences of each word
    for (size_t i = 0; i < str.size(); i++)
        if (check(str[i]))
        {
            // Collect one maximal run of alphanumeric characters, lowercased.
            string word;
            size_t j = i;
            while (j < str.size() && check(str[j]))
                word += (char)tolower((unsigned char)str[j++]);
            hash[word]++;
            i = j; // str[j] is a separator (or the end); the loop's i++ skips it
        }

    // Pick the highest count; on ties, the lexicographically smaller word.
    string word;
    int cnt = -1;
    for (const auto& item : hash)
        if (item.second > cnt || (item.second == cnt && item.first < word))
        {
            word = item.first;
            cnt = item.second;
        }

    cout << word << ' ' << cnt << endl;
    return 0;
}
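
As a possible variation on the program above (a swapped-in technique, not the approach the solution itself takes), the counts could be kept in a std::map instead of an unordered_map. Because std::map iterates its keys in ascending order, a strict "greater than" comparison is enough to keep the lexicographically smallest word among ties. The function name most_common_word is hypothetical, and the sketch assumes the default "C" locale for std::isalnum.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <utility>

// Hypothetical helper: returns {word, count} for the most frequent word,
// or {"", 0} if the text contains no alphanumeric character.
std::pair<std::string, int> most_common_word(const std::string& text) {
    std::map<std::string, int> counts;  // keys iterate in ascending order
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            ++counts[current];
            current.clear();
        }
    }
    if (!current.empty()) ++counts[current];  // word ending at end of line

    std::pair<std::string, int> best{"", 0};
    for (const auto& entry : counts) {
        // Strict '>' keeps the first (smallest) word among equal counts.
        if (entry.second > best.second) best = {entry.first, entry.second};
    }
    return best;
}
```

For the sample line this returns the pair ("can", 5). The trade-off is O(log n) per insertion in place of the hash map's expected O(1), in exchange for a simpler tie-breaking step.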