# Similarity in data analysis and the principle behind it (cosine similarity)

February 27, 2021 · java, big data, algorithms

## Why I wrote this article

$$\frac{\sum_{i=1}^{n} (x_i \times y_i)}{\sqrt{\sum_{i=1}^{n}{(x_i)^{2}}}\times \sqrt{\sum_{i=1}^{n}{(y_i)^2}}}$$

## Deriving the cosine similarity formula

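Consider a triangle with sides $a$, $b$, and $c$, where $\beta$ is the angle between $a$ and $c$. Dropping a perpendicular onto $c$ splits it into a segment $x$ adjacent to $\beta$ and a remainder $c-x$, so

$$\cos\beta= \frac{x}{a}$$

Both right triangles share the same height, so by the Pythagorean theorem:

$$a^{2}-x^{2}=b^{2}-(c-x)^{2}$$

$$\Rightarrow a^{2}-x^{2}=b^{2}-(c^2-2xc+x^2)$$

$$\Rightarrow a^{2}-x^{2}=b^{2}-c^2+2xc-x^2$$

$$\Rightarrow 2xc= a^2+c^2-b^2$$

$$\Rightarrow x= \frac {a^2+c^2-b^2}{2c}$$

Substituting $x$ back into $\cos\beta = x/a$ gives the law of cosines:

$$\cos\beta= \frac{a^2+c^2-b^2}{2ac}$$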

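Now place the vertex of the angle $\beta$ at the origin and let $A=(x_1,y_1)$ and $B=(x_2,y_2)$ be the other two vertices. The three side lengths are:

$$\text{distance from } A \text{ to the origin}=\sqrt{x_1^2+y_1^2}$$

$$\text{distance from } B \text{ to the origin}=\sqrt{x_2^2+y_2^2}$$

$$\text{distance from } A \text{ to } B=\sqrt{(y_1-y_2)^2+(x_2-x_1)^2}$$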

$$\cos\beta=\frac{x_1^2+y_1^2+x_2^2+y_2^2-((y_1-y_2)^2+(x_2-x_1)^2)}{2\times\sqrt{x_1^2+y_1^2}\times\sqrt{x_2^2+y_2^2}}$$

$$= \frac{x_1^2+y_1^2+x_2^2+y_2^2-(y_1^2-2y_1y_2+y_2^2+x_2^2-2x_1x_2+x_1^2)}{2\times\sqrt{x_1^2+y_1^2}\times\sqrt{x_2^2+y_2^2}}$$

$$= \frac{2y_1y_2+2x_1x_2}{2\times\sqrt{x_1^2+y_1^2}\times\sqrt{x_2^2+y_2^2}}$$

$$= \frac{y_1y_2+x_1x_2}{\sqrt{x_1^2+y_1^2}\times\sqrt{x_2^2+y_2^2}}$$
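As a quick numeric sanity check of the simplification above, the following sketch evaluates both the first expression (built from the three side lengths) and the final simplified expression for two sample points. The class name `CosineCheck2D`, its method names, and the points A = (3, 4) and B = (4, 3) are illustrative assumptions, not part of the original article.

```java
// Illustrative sketch: class name, method names, and sample points are assumptions.
final class CosineCheck2D {

    // Cosine of the angle at the origin, computed from the three side lengths.
    static double fromSideLengths(double x1, double y1, double x2, double y2) {
        double a = Math.hypot(x1, y1);           // distance from A to the origin
        double c = Math.hypot(x2, y2);           // distance from B to the origin
        double b = Math.hypot(x2 - x1, y1 - y2); // distance from A to B
        return (a * a + c * c - b * b) / (2 * a * c);
    }

    // Simplified form: (x1*x2 + y1*y2) / (|OA| * |OB|).
    static double fromDotProduct(double x1, double y1, double x2, double y2) {
        return (x1 * x2 + y1 * y2) / (Math.hypot(x1, y1) * Math.hypot(x2, y2));
    }

    public static void main(String[] args) {
        System.out.println(fromSideLengths(3, 4, 4, 3));
        System.out.println(fromDotProduct(3, 4, 4, 3));
    }
}
```

Both calls should print a value of approximately 0.96, up to floating-point rounding.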

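Relabeling the two points as vectors $x=(x_1,x_2)$ and $y=(y_1,y_2)$, the same result reads:

$$\cos\beta=\frac{x_1y_1+x_2y_2}{\sqrt{x_1^2+x_2^2}\times\sqrt{y_1^2+y_2^2}}$$

which generalizes to $n$ dimensions as:

$$\cos\beta=\frac{\sum_{i=1}^{n} (x_i \times y_i)}{\sqrt{\sum_{i=1}^{n}{(x_i)^{2}}}\times \sqrt{\sum_{i=1}^{n}{(y_i)^2}}}$$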
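The n-dimensional formula translates almost directly into code. The sketch below is a minimal illustration, not the alanpoi implementation; the class name `CosineSimilaritySketch`, the method `cosine`, and the choice to return 0.0 for a zero vector are assumptions made for this example.

```java
// Illustrative sketch: names and the zero-vector convention are assumptions.
final class CosineSimilaritySketch {

    // Returns sum(x_i * y_i) / (sqrt(sum(x_i^2)) * sqrt(sum(y_i^2))).
    static double cosine(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double sumSquaresX = 0.0;
        double sumSquaresY = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            sumSquaresX += x[i] * x[i];
            sumSquaresY += y[i] * y[i];
        }
        // Assumption: treat a zero vector as "not similar" instead of dividing by zero.
        if (sumSquaresX == 0.0 || sumSquaresY == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(sumSquaresX) * Math.sqrt(sumSquaresY));
    }

    public static void main(String[] args) {
        // Same direction: the result is close to 1.
        System.out.println(cosine(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
        // Orthogonal vectors: the result is 0.
        System.out.println(cosine(new double[] {1, 0}, new double[] {0, 1}));
    }
}
```

Because only the direction of the vectors matters, scaling either vector by a positive constant leaves the result unchanged.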

## How cosine similarity works

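Take two sentences as an example:

A: 拥有技术很容易，拥有技术解决方案才是财富 ("Having technology is easy; having a technical solution is the real wealth")
B: 技术很容易拥有，但拥有技术解决方案很难 ("Technology is easy to have, but having a technical solution is hard")

1. First, segment each sentence into words:
   A: [拥有, 技术, 很, 容易, 拥有, 技术, 解决, 方案, 才是, 财富]
   B: [技术, 很, 容易, 拥有, 但, 拥有, 技术, 解决, 方案, 很, 难]
2. Take the union of the tokens:
   [拥有, 技术, 很, 容易, 解决, 方案, 才是, 财富, 但, 难]
3. Count how often each token of the union appears in sentence A and in sentence B:
   A: [2,2,1,1,1,1,1,1,0,0]
   B: [2,2,2,1,1,1,0,0,1,1]
4. If two sentences segment into the same tokens with the same frequencies, they are essentially identical. The two frequency arrays can also be read as two points in a multi-dimensional space: the more similar the two texts, the closer the directions of the two vectors, up to pointing exactly the same way. This is where the data meets the algorithm: plugging the two vectors into the formula above yields their similarity.
5. An interface is available if you want to test this yourself: https://alanpoi.com/compare/t/p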

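The 74.6% reported for this example was computed without filtering stop words after segmentation. The interface now filters stop words, so its results should match real-world scenarios more closely; applying the formula above to the same token vectors gives the same value as the interface.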
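The steps above can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the code behind the interface: the class name `TermFrequencyCosine` and its methods are hypothetical, the input is assumed to be already segmented, and no stop-word filtering is applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: names are assumptions; tokens must already be segmented.
final class TermFrequencyCosine {

    // Step 3: count how often each vocabulary term occurs in the token list.
    static int[] frequencies(List<String> vocabulary, List<String> tokens) {
        int[] counts = new int[vocabulary.size()];
        for (String token : tokens) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                counts[index]++;
            }
        }
        return counts;
    }

    // Step 2: union of both token lists, kept in first-seen order.
    static List<String> union(List<String> tokensA, List<String> tokensB) {
        Set<String> merged = new LinkedHashSet<>(tokensA);
        merged.addAll(tokensB);
        return new ArrayList<>(merged);
    }

    // Step 4: cosine of the angle between the two frequency vectors.
    static double similarity(List<String> tokensA, List<String> tokensB) {
        List<String> vocabulary = union(tokensA, tokensB);
        int[] a = frequencies(vocabulary, tokensA);
        int[] b = frequencies(vocabulary, tokensB);
        double dot = 0.0, sumSquaresA = 0.0, sumSquaresB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSquaresA += a[i] * a[i];
            sumSquaresB += b[i] * b[i];
        }
        if (sumSquaresA == 0.0 || sumSquaresB == 0.0) {
            return 0.0; // assumption: an empty token list has similarity 0
        }
        return dot / (Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("拥有", "技术", "很", "容易", "拥有", "技术", "解决", "方案", "才是", "财富");
        List<String> b = Arrays.asList("技术", "很", "容易", "拥有", "但", "拥有", "技术", "解决", "方案", "很", "难");
        System.out.println(similarity(a, b));
    }
}
```

For the token lists from step 1, this reproduces the two frequency vectors from step 3 and yields a similarity of roughly 0.843. A different segmentation or stop-word filtering changes the vectors and therefore the score.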

## Calling it from code

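Add the Maven dependency:

```xml
<dependency>
    <groupId>com.alanpoi</groupId>
    <artifactId>alanpoi-common</artifactId>
    <version>1.3.3</version>
</dependency>
```

1. If you have already segmented the text yourself, call:

```java
SimilarUtil.calculate(List<String> val1, List<String> val2)
```

2. If you have two plain text strings:

```java
// Not mandatory, but initializing at application startup is recommended.
// Without it, the first segmentation call is slow because it loads the
// initialization data; subsequent calls are fast.
WordSegInitConfig.init();
SimilarUtil.calculate(String text1, String text2)
```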

#### Author: Alan
