Implementing n-grams

Java 2020. 11. 26. 10:02

I needed 2-character and 3-character n-gram terms for indexing into Elasticsearch,

so I put together a simple implementation using StringTokenizer.

ex)

input : 힐스테이트2차 

 

// 2-character result

힐스 

스테 

테이 

이트 

트2 

2차 

 

// 3-character result

힐스테 

스테이 

테이트 

이트2 

트2차

 

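    /**
     * Cuts text into n-grams (n = 2 or 3) for Elasticsearch indexing.
     * Needs java.util.ArrayList, java.util.List and java.util.StringTokenizer.
     * @param str input text; it is split on spaces before the n-grams are cut
     * @param n number of characters per n-gram
     * @return the n-grams joined by single spaces
     */
    public static String esNgram(String str, int n){
        StringTokenizer stringTokenizer = new StringTokenizer(str.trim()," ");
        List<String> strArr = new ArrayList<>();
        while(stringTokenizer.hasMoreTokens()){
            String token = stringTokenizer.nextToken();
            // Slide a window of n characters; tokens shorter than n yield nothing
            for(int i=0; i+n <= token.length(); i++){
                strArr.add(token.substring(i,i+n));
            }
        }
        return String.join(" ", strArr);
    }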
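
To check the method against the example above, here is a minimal runnable sketch. The class name NgramDemo and the main method are assumptions added for illustration; the method body follows the same logic as esNgram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical wrapper class, only here so the method can be run on its own.
class NgramDemo {

    // Split on spaces, then slide a window of n characters over each token.
    static String esNgram(String str, int n) {
        StringTokenizer tokenizer = new StringTokenizer(str.trim(), " ");
        List<String> grams = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            for (int i = 0; i + n <= token.length(); i++) {
                grams.add(token.substring(i, i + n));
            }
        }
        return String.join(" ", grams);
    }

    public static void main(String[] args) {
        String input = "힐스테이트2차";
        System.out.println(esNgram(input, 2)); // 힐스 스테 테이 이트 트2 2차
        System.out.println(esNgram(input, 3)); // 힐스테 스테이 테이트 이트2 트2차
    }
}
```

Because each space-separated token is handled on its own, no n-gram ever spans a space, and a token shorter than n contributes nothing to the result.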

 


visualp

c#, java


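While working on an Elasticsearch integration with JPA-style repositories, I found that calling findAll without specifying a Pageable

sets the page size to the default of 10:

from 0, default page size 10.

I needed to fetch all the data in one request, so I set the size to about 1000.

[Reference]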

www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-from-size.html

 

GET /_search
{
    "from" : 0, "size" : 10,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

Pageable pageable = PageRequest.of(0,1000);

esAddressRepo.findAllByOaddrMatches(addr,pageable);

This is how I use it.
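
As a rough illustration of how the two PageRequest.of arguments relate to the from/size pair in the request above, here is a small sketch in plain Java. It assumes a zero-based page index, so from = page * size. The class and method names are hypothetical, and the 10,000 limit is Elasticsearch's documented default for index.max_result_window, not something set in this project.

```java
// Hypothetical helper; it only does the arithmetic, no Spring or Elasticsearch calls.
class PageWindow {

    // Documented Elasticsearch default for index.max_result_window.
    static final long DEFAULT_MAX_RESULT_WINDOW = 10_000L;

    // "from" value for a zero-based page index and a page size.
    static long from(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size >= 1");
        }
        return (long) page * size;
    }

    // True when from + size stays within the default result window.
    static boolean fitsDefaultWindow(int page, int size) {
        return from(page, size) + size <= DEFAULT_MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        System.out.println(from(0, 1000));              // 0
        System.out.println(from(3, 1000));              // 3000
        System.out.println(fitsDefaultWindow(9, 1000)); // true
        System.out.println(fitsDefaultWindow(10, 1000)); // false
    }
}
```

With PageRequest.of(0, 1000) the request stays well inside that window, but a single page only works as "fetch everything" while the index holds no more matching documents than the page size.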


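@ Notes after a lot of trial and error

 - Settings and mappings are applied when the index (table) is first created.

 - They cannot be slipped in later; they have to be defined at creation time.

 - In Spring Data, they can be attached to the @Document class simply by using the @Setting and @Mapping annotations.

 - The settings and mapping JSON files must not contain a top-level "setting" or "mapping" node. <-- If they do, the configuration is not applied.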

@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
@Document(indexName = "idx_es_address", createIndex = true)
@Setting(settingPath = "/elasticsearch/settings/settings.json")
@Mapping(mappingPath = "/elasticsearch/mappings/mappings.json")
public class EsAddressVO {

    // Customer code
    @Id
    private String cuscode;

    private String areacode;

    // New address
    private String naddr;

    // Old address
    private String oaddr;
    
}

@Setting /resources/elasticsearch/settings/settings.json

 - Configures the tokenizers and the analyzer

 

{
  "analysis": {
    "tokenizer": {
      "nori_none": {
        "type": "nori_tokenizer",
        "decompound_mode": "none"
      },
      "nori_discard": {
        "type": "nori_tokenizer",
        "decompound_mode": "discard"
      },
      "nori_mixed": {
        "type": "nori_tokenizer",
        "decompound_mode": "mixed"
      }
    },
    "analyzer": {
      "korean": {
        "type": "nori",
        "stopwords": "_korean_"
      }
    }
  }
}

For details on the tokenizers above, refer to the official documentation.

 

 

@Mapping /resources/elasticsearch/mappings/mappings.json

Spring creates a default mapping even when none is specified, but

to use nori Korean search, the oaddr field is assigned the korean analyzer registered in settings.json.

{
    "properties": {
      "oaddr": {
        "type": "text",
        "analyzer": "korean"
      }
    }
}

 

How to check in the Kibana console

 

 

GET idx_es_address/_settings

{
  "idx_es_address" : {
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "_tier_preference" : "data_content"
            }
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "idx_es_address",
        "creation_date" : "1606194598534",
        "analysis" : {
          "analyzer" : {
            "korean" : {
              "type" : "nori",
              "stopwords" : "_korean_"
            }
          },
          "tokenizer" : {
            "nori_discard" : {
              "type" : "nori_tokenizer",
              "decompound_mode" : "discard"
            },
            "nori_mixed" : {
              "type" : "nori_tokenizer",
              "decompound_mode" : "mixed"
            },
            "nori_none" : {
              "type" : "nori_tokenizer",
              "decompound_mode" : "none"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "mxGr5zJOTrW5Gp9-n4HjHQ",
        "version" : {
          "created" : "7100099"
        }
      }
    }
  }
}

 

GET idx_es_address/_mapping

{
  "idx_es_address" : {
    "mappings" : {
      "properties" : {
        "_class" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "areacode" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "cuscode" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "naddr" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "oaddr" : {
          "type" : "text",
          "analyzer" : "korean"
        }
      }
    }
  }
}

 



<Install command>

/usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-nori

 

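To search Korean text with the nori analyzer in Elasticsearch, the analysis-nori plugin must be installed.

It was reportedly added in Elasticsearch 6.4.

In the console output below, the first attempt fails because the plugin name was mistyped as analysis-roi; the second attempt, with the correct name, succeeds.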

/usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-roi
-> Installing analysis-roi
-> Failed installing analysis-roi
-> Rolling back analysis-roi
-> Rolled back analysis-roi
A tool for managing installed elasticsearch plugins

Non-option arguments:
command

Option             Description
------             -----------
-E <KeyValuePair>  Configure a setting
-h, --help         Show help
-s, --silent       Show minimal output
-v, --verbose      Show verbose output
ERROR: Unknown plugin analysis-roi, did you mean any of [analysis-nori, analysis-icu, analysis-kuromoji]?
[root@fourfree elasticsearch]# /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-nori
-> Installing analysis-nori
-> Downloading analysis-nori from elastic
[=================================================] 100%  
-> Installed analysis-nori

 

  

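Elasticsearch must be restarted after the plugin is installed.

Some YouTube installation videos mention that the plugin occasionally does not take effect even after a restart,

and that rebooting the server fixes it.

On CentOS, restarting Elasticsearch alone was enough for the plugin to take effect.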

 


Add the following three lines to application.properties.

 

logging.level.org.springframework.data.elasticsearch.core=DEBUG
logging.level.org.elasticsearch.client=TRACE

logging.level.org.apache.http=TRACE

 

You can then confirm that the logs are being printed.

[main] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] start execution 
[main] DEBUG o.a.h.c.p.RequestAddCookies - CookieSpec selected: default 
[main] DEBUG o.a.h.c.p.RequestAuthCache - Re-using cached 'basic' auth scheme for http://192.168.0.50:9200 
[main] DEBUG o.a.h.c.p.RequestAuthCache - No credentials for preemptive authentication 
[main] DEBUG o.a.h.i.n.c.InternalHttpAsyncClient - [exchange: 3] Request connection for {}->http://192.168.0.50:9200 
[main] DEBUG o.a.h.i.n.c.PoolingNHttpClientConnectionManager - Connection request: [route: {}->http://192.168.0.50:9200][total kept alive: 1; route allocated: 1 of 10; total allocated: 1 of 30] 
[main] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][r:r]: Set timeout 0 
[main] DEBUG o.a.h.i.n.c.PoolingNHttpClientConnectionManager - Connection leased: [id: http-outgoing-0][route: {}->http://192.168.0.50:9200][total kept alive: 0; route allocated: 1 of 10; total allocated: 1 of 30] 
[main] DEBUG o.a.h.i.n.c.InternalHttpAsyncClient - [exchange: 3] Connection allocated: CPoolProxy{http-outgoing-0 [ACTIVE]} 
[main] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][r:r]: Set attribute http.nio.exchange-handler 
[main] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][rw:r]: Event set [w] 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalIODispatch - http-outgoing-0 [ACTIVE] Request ready 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] Attempt 1 to execute request 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] Proxy auth state: UNCHALLENGED 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][rw:w]: Set timeout 5000 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 >> GET /idx_es_address/_doc/12345678 HTTP/1.1 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 >> Authorization: Basic ZWxhc3RpYzp0b2Q3cnQyZnJ6ZXFlOA== 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 >> Content-Length: 0 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 >> Host: 192.168.0.50:9200 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 >> Connection: Keep-Alive 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 >> User-Agent: Apache-HttpAsyncClient/4.1.4 (Java/11.0.7) 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][rw:w]: Event set [w] 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] Request completed 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][rw:w]: 225 bytes written 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "GET /idx_es_address/_doc/12345678 HTTP/1.1[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "Authorization: Basic ZWxhc3RpYzp0b2Q3cnQyZnJ6ZXFlOA==[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "Content-Length: 0[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "Host: 192.168.0.50:9200[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "User-Agent: Apache-HttpAsyncClient/4.1.4 (Java/11.0.7)[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 >> "[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalIODispatch - http-outgoing-0 [ACTIVE] Request ready 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][r:w]: Event cleared [w] 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][r:r]: 336 bytes read 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 << "HTTP/1.1 200 OK[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 << "content-type: application/json; charset=UTF-8[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 << "content-length: 249[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 << "[\r][\n]" 
[I/O dispatcher 1] DEBUG o.a.http.wire - http-outgoing-0 << "{"_index":"idx_es_address","_type":"_doc","_id":"12345678","_version":1,"_seq_no":0,"_primary_term":1,"found":true,"_source":{"_class":"com.fourfree.elasticsearch.address.vo.EsAddressVO","cuscode":"12345678","naddr":"[0xffffffec][0xffffff8b][0xffffffa0][0xffffffec][0xffffffa3][0xffffffbc][0xffffffec][0xffffff86][0xffffff8c]","oaddr":"[0xffffffea][0xffffffb5][0xffffffac][0xffffffec][0xffffffa3][0xffffffbc][0xffffffec][0xffffff86][0xffffff8c]"}}" 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 << HTTP/1.1 200 OK 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 << content-type: application/json; charset=UTF-8 
[I/O dispatcher 1] DEBUG o.a.h.headers - http-outgoing-0 << content-length: 249 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalIODispatch - http-outgoing-0 [ACTIVE(249)] Response received 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] Response received HTTP/1.1 200 OK 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalIODispatch - http-outgoing-0 [ACTIVE(249)] Input ready 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] Consume content 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalHttpAsyncClient - [exchange: 3] Connection can be kept alive indefinitely 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.MainClientExec - [exchange: 3] Response processed 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalHttpAsyncClient - [exchange: 3] releasing connection 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][r:r]: Remove attribute http.nio.exchange-handler 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.PoolingNHttpClientConnectionManager - Releasing connection: [id: http-outgoing-0][route: {}->http://192.168.0.50:9200][total kept alive: 0; route allocated: 1 of 10; total allocated: 1 of 30] 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.PoolingNHttpClientConnectionManager - Connection [id: http-outgoing-0][route: {}->http://192.168.0.50:9200] can be kept alive indefinitely 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.ManagedNHttpClientConnectionImpl - http-outgoing-0 192.168.0.45:8081<->192.168.0.50:9200[ACTIVE][r:r]: Set timeout 0 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.PoolingNHttpClientConnectionManager - Connection released: [id: http-outgoing-0][route: {}->http://192.168.0.50:9200][total kept alive: 1; route allocated: 1 of 10; total allocated: 1 of 30] 
[I/O dispatcher 1] DEBUG o.a.h.i.n.c.InternalIODispatch - http-outgoing-0 [ACTIVE] [content length: 249; pos: 249; completed: true] 
[main] DEBUG o.e.c.RestClient - request [GET http://192.168.0.50:9200/idx_es_address/_doc/12345678] returned [HTTP/1.1 200 OK] 
reS:EsAddressVO(cuscode=12345678, naddr=신주소, oaddr=구주소)
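
In the wire log above, the non-ASCII bytes of the response body appear as sign-extended hex such as [0xffffffec]. The sketch below is one way to turn such a run back into text, assuming the payload is UTF-8 (as the content-type header states). The class and method names are made up for this example, and only the bracketed bytes are collected; any plain ASCII between them is ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading escaped bytes out of an HttpClient wire log line.
class WireLogDecoder {

    // Matches one escaped byte such as [0xffffffec].
    private static final Pattern HEX_BYTE = Pattern.compile("\\[0x([0-9a-fA-F]{1,8})\\]");

    static String decode(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Matcher matcher = HEX_BYTE.matcher(escaped);
        while (matcher.find()) {
            // Parse as long because ffffffec does not fit a signed int,
            // then keep only the low 8 bits, which are the original byte.
            long value = Long.parseLong(matcher.group(1), 16);
            bytes.write((int) (value & 0xFF));
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String naddr = "[0xffffffec][0xffffff8b][0xffffffa0][0xffffffec][0xffffffa3]"
                + "[0xffffffbc][0xffffffec][0xffffff86][0xffffff8c]";
        System.out.println(decode(naddr)); // 신주소
    }
}
```

The decoded value matches the naddr field printed on the last line of the log.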


Add the following to application.properties (the "<--" notes are annotations, not part of the values):

elasticsearch.host=192.168.0.50
elasticsearch.cluster_name=master <-- node name
elasticsearch.port=9200 <-- port (default 9200)
elasticsearch.user_name=elastic  <-- username
elasticsearch.user_password=<the password you configured>

 

The connection can be configured by extending AbstractElasticsearchConfiguration.

 - Using RestHighLevelClient is recommended from version 6.4 onward.

 

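import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;
import org.springframework.data.elasticsearch.core.ElasticsearchOperations;
import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.repository.config.EnableElasticsearchRepositories;

@Configuration
@EnableElasticsearchRepositories(basePackages = {"com.fourfree.elasticsearch"})
public class ElasticSearchConfig extends AbstractElasticsearchConfiguration {
    @Value("${elasticsearch.host}")
    private String host;

    @Value("${elasticsearch.port}")
    private String port;
    
    // Read from the properties file but not used by the REST client below
    @Value("${elasticsearch.cluster_name}")
    private String clusterName;

    @Value("${elasticsearch.user_name}")
    private String userName;

    @Value("${elasticsearch.user_password}")
    private String userPassword;


    // Builds the REST client from host:port with basic authentication
    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        final ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo(host+":" + port)
                .withBasicAuth(userName,userPassword)
                .build();
        return RestClients.create(clientConfiguration).rest();
    }
    
    @Bean
    public ElasticsearchOperations elasticsearchOperations(){
        return new ElasticsearchRestTemplate(elasticsearchClient());
    }

}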
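
The trace log earlier in this series shows an "Authorization: Basic ..." header on each request, which is presumably what withBasicAuth produces. As a sketch of how a standard HTTP Basic header value is formed (Base64 of "user:password"), here is a plain-Java example. The class name and the credentials are placeholders, and this is not Spring Data's own implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper showing the standard HTTP Basic scheme.
class BasicAuthHeader {

    // Returns the header value: "Basic " + Base64("user:password").
    static String headerValue(String user, String password) {
        String pair = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials, not the ones from application.properties.
        System.out.println(headerValue("Aladdin", "open sesame"));
        // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

Base64 is an encoding, not encryption, so anyone who can read the TRACE-level HTTP logs can recover the password from this header.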

[Reference] docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#core.extensions

       

Spring Data Release Train | Spring Data Elasticsearch | Elasticsearch | Spring Boot
2020.0.0[1]               | 4.1.x[1]                  | 7.9.3         | 2.3.x[1]
Neumann                   | 4.0.x                     | 7.6.2         | 2.3.x
Moore                     | 3.2.x                     | 6.8.12        | 2.2.x
Lovelace                  | 3.1.x                     | 6.2.2         | 2.1.x
Kay[2]                    | 3.0.x[2]                  | 5.5.0         | 2.0.x[2]
Ingalls[2]                | 2.1.x[2]                  | 2.4.0         | 1.5.x[2]

Since I am setting this up on Spring Boot, the 2.3.x version has to be installed.

Add to pom.xml:

<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
	<version>2.3.2.RELEASE</version>
</dependency>

 

 
