Segmentation of tumour

using K mean clustering algorithm

S Venkateswar Viswanath, Mrs. Sobha T S

Abstract-In radiotherapy using 18-fluorodeoxyglucose

positron emission tomography (18F-FDG-PET), the accurate delineation of the

biological tumour volume (BTV) is a crucial step. In this study, the authors

suggest a new approach to segment the BTV in F-FDG-PET images. The technique is

based on the k-means clustering algorithm incorporating automatic optimal

cluster number estimation, using intrinsic positron emission tomography image

information.

Partitioning data into a finite number of k homogenous and

separate clusters (groups) without use of prior knowledge is carried out by

some unsupervised partitioning algorithm like the k-means clustering algorithm.

To evaluate these resultant clusters for finding optimal number of clusters,

properties such as cluster density, size, shape and separability are typically

examined by some cluster validation methods. Mainly the aim of clustering

analysis is to find the overall compactness of the clustering solution, for

example variance within cluster should be a minimum and separation between the

clusters should be a maximum.

I.

INTRODUCTION

There

are several approaches to validate the segmentation techniques such as phantom

studies and the macroscopic surgical specimen obtained from histology. The use

of macroscopic samples for validation of segmentation techniques in positron

emission tomography (PET) images is one of the most promising approaches

reported so far in clinical studies, the procedure consists of the comparison

of the tumour volumes defined on the PET data with actual tumour volumes

measured on the macroscopic samples recorded from histology (where PET was

performed prior to surgery). Segmentation using the cluster –based algorithms

is very popular, but the main problem in this case is the determination of the

optimal and desired number of clusters. In this, we have implemented an

approach based on k-means algorithm with an automatic estimation of the optimal

number of clusters, based on the maximum intensity ratio in a given volume of

interest (VOI).

II.

METHEDOLOGY

Calculate

the VE for a range of k (2–50

clusters), and the

optimal

number which corresponds to the minimum of SVEs. This method gives good results

but

consumes a significant computation time by performing the clustering for a

large range

of

cluster values before selecting the optimal number of clusters. Several

approaches have

been

proposed in the literature to identify the optimal cluster number to better fit

the data,

three

of them are used.

Unfortunately,

the results are not promising because they are not adapted to PET image

segmentation.

So our goal in this study is to improve the k-means clustering method, by

incorporating

an automatic determination of the optimal number of clusters using a new

criterion

based on PET image features.

After

analysing the variation of the maximum activity (intensity) of the uptake

() in the VOI by scanning all slices , we

conclude for all patients that the maximum

intensity

value decreases from middle to frontier slices, and the maximum intensity is

often

situated

almost at the centre of the BTV. The optimal cluster number has a minimum value

at

the centre of the BTV, and increases from central to frontier slices. This

correlation

between

the optimal number of clusters and the maximum intensity motivates our choice

of

the

following slice image feature:

where

is

the maximum activity (intensity) of the uptake in the

corresponding

slice, is

the maximum activity in all slices that encompasses tumour

volume

BTV inside the , and is the difference between the

maximum

and the minimum values of (Imax (slice)/Imax (VOI)) in the .

Similarly

to , the new criterion , has a maximum value

for the middle slices and

decreases

for the frontiers of the BTV. Note that the r values range from ‘0’ to

‘1’ for all

patients.

3.3.2

Modelling: This section is dedicated to finding a

relationship between the optimal

number

of clusters k, and the new

criterion r. This relationship

could be used to determine

the

optimal cluster number for the segmentation of new PET images using only the

new

slice

image feature r. After analysing the variation of k in function

of r criterion (for all

patients

included in this study), we use two fitting models: an exponential and a power

function

given by below, respectively,

k

= ? ? e?. r + 1

k

= a ? + 1, (4.2)

Where

?, ?, a, b are coefficients of fitting models and r

is the proposed criterion. The fitting

accuracy

evaluation is based on the R-square criterion. Note that we added ‘1’ to

the

original

fitting equation to avoid clustering the image with one cluster for the high

values of

r.

3.3.3

Generalisation: The aim of this step is to automate the

choice of the optimal cluster

number

for all patients using one corresponding relationship function by defining a

generalised

model for all patients. For this reason, we have divided the database randomly

into

two parts of 50% each. The first part (validation set), contains three patients

is used for

optimising

the model coefficients and fixing the optimal power and exponential generalised

model.

The second part (test set), contains four patients, is used for testing the

accuracy of

the

fixed optimal model.

According

to the R-square criterion, the optimal exponential and power generalised

function

can

be rewritten as follows:

k

= 46.52e?5.918 × r

+ 1

k

= 1.683r?1.264 + 1

k

mean alogorithm 3 :

III.

CONCLUSION

A new unsupervised cluster-based

approach for segmenting the BTV in 18F-FDG-PET images is introduced. The system

is more reliable and has very less error. It can be improved by MMMindex technique used

in determining an optimal value of K in K-means

clustering1 , for which

k-means clustering it uses a method to find an optimal value of k number of

clusters, using the features and variables inherited from datasets. The new

proposed method is based on comparison of movement of objects forward/back from

k to k+1 and k+1 to k set of clusters to find the joint probability, which is

different from the other methods and indexes that are based on the distance.