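First, K objects are selected at random as the initial cluster centers. The distance between each object and every seed cluster center is then computed, and each object is assigned to the cluster center nearest to it. A cluster center together with the objects assigned to it represents one cluster. Once all objects have been assigned, the center of each cluster is recomputed from the objects currently in that cluster. This process repeats until a termination condition is satisfied; typical choices are that no object changes cluster or that the cluster centers stop moving.

K-means tends to produce mutually separated, roughly spherical clusters in which the mean points converge toward the cluster centers. One generally also hopes that the resulting clusters are of similar size, so that assigning each observation to its nearest cluster center (that is, the mean point) is a reasonably correct assignment.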
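The assign-then-update loop described above can be sketched in base MATLAB as follows. This is only a minimal illustration, not the `MultiSampleKmeans` function used later in the script: the toy data `X`, the cluster count `K`, the iteration cap `maxIter`, and the choice of "no object changes cluster" as the termination condition are all assumptions made for the example.

```matlab
% Minimal k-means sketch (illustrative only; toy data and names are assumptions)
rng(0);                                  % reproducible random initialization
X = [randn(50,2); randn(50,2) + 8];      % toy data: two separated blobs
K = 2;                                   % number of clusters (assumed)
maxIter = 100;                           % safety cap on iterations (assumed)

n = size(X,1);
centers = X(randperm(n, K), :);          % pick K objects as the initial centers
labels = zeros(n,1);

for iter = 1:maxIter
    % squared Euclidean distance from every object to every center
    D = zeros(n, K);
    for c = 1:K
        diffs = X - repmat(centers(c,:), n, 1);
        D(:,c) = sum(diffs.^2, 2);
    end
    [~, newLabels] = min(D, [], 2);      % assign each object to its nearest center

    % termination condition: no object changed cluster
    if isequal(newLabels, labels)
        break;
    end
    labels = newLabels;

    % recompute each center as the mean of the objects assigned to it
    for c = 1:K
        members = (labels == c);
        if any(members)                  % an empty cluster keeps its old center
            centers(c,:) = mean(X(members,:), 1);
        end
    end
end
```

After the loop, `labels` holds the cluster index of each row of `X` and `centers` holds the final mean points. Because the starting centers are random, different seeds can give different local optima, which is one reason to save or fix the seed when comparing runs.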
%% SEM Master
%% New text data -- Bingbing: you may have to change this depending on what
%% elements you ran with EDX. Make sure the column order matches all of
%% your .csv files
clc
clear all
warning off;
SEMStr={'Part#','Field#','Phase#','X_stage','Y_stage','X_cent','Y_cent', ...
    'X_left','Y_low','X_width','Y_height','Xferet','Yferet','AvgDiam', ...
    'LProj','Area','Perim','Shape','Aspe','Orient', ...
    'C','N','O','Na','Mg','Al','Si','P','S','Cl','K','Ca','Mn','Fe','Zn', ...
    'CPS','AvgVideo','StgX ','StgY ','MinCnts'};
elstr={'C','N','O','Na','Mg','Al','Si','P','S','Cl','K','Ca','Mn','Fe','Zn'};
%% Import data:
% right click on a single .csv file and import as matrix.
% You need to import the files one by one. A single file is the data from
% a single sample. In the example data, I have 3 samples.
% HighOCwithIN1;
% MedOCwithIN1;
% LowOCwithIN1;
%% Put data into cell array
% Load the files in the same order as the labels in SampString
% (High, Med, Low) so that Sample{i} matches SampString{i}.
file1=importdata('HighOC_withIN_1.csv');
file2=importdata('MedOC_withIN_1.csv');
file3=importdata('LowOC_withIN_1.csv');
Sample{1}=file1.data;
Sample{2}=file2.data;
Sample{3}=file3.data;
% Assign sample labels to a cell array
SampString={'HiOC','MedOC','LowOC'};
% make sure to save this so you don't have to import again.
save HiMedLow.mat
% Get rid of values less than 0.5%
thresh=0.5;
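% Columns 21:35 hold the element values listed in elstr.
% size(...,1) is used for the row count: length() returns the largest
% dimension, which is the column count for samples with few particles.
for i=1:length(Sample)
    for j=1:size(Sample{i},1)
        for k=21:35
            if Sample{i}(j,k)<thresh
                Sample{i}(j,k)=0;
            end
        end
    end
end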
%% run k-means on cell arrays defined above
% How many clusters?
clustnum=6;
% Any gain you want to add? Typically this may change things a little bit,
% but not much. A value less than 1 makes the differences in atomic ratio
% between particles smaller and greater than 1 makes the difference in
% atomic ratio between particles larger. See the MultiSampleKmeans.m
% function.
gain=0.5;
samplesize=length(Sample);
[idx,C,alldat,SampBegIdx,SampEndIdx]=MultiSampleKmeans(Sample,SampString,clustnum,gain);
SampleFraction(SampBegIdx,SampEndIdx,idx,SampString,1:clustnum,1:samplesize)
KmeansSemClassSizDistCares(alldat,idx,'highoc');
% KmeansClusterPlot(C,idx,alldat(:,21:38),elstr)
% You may have to change the column numbers depending on what atoms you
% choose
KmeansClusterPlot(C,idx,alldat(:,21:35),elstr)
% Maybe you want to output what particle is what cluster. Then you can go
% back and look at the SEM image of that one particle. Use the code below
% to get a list of particle number and cluster number.
% This assumes the rows of the first sample come first in idx.
HighOCPartID=Sample{1}(:,1);
B=[HighOCPartID,idx(1:size(Sample{1},1))];
dlmwrite('HighOC.dat',B,'delimiter','\t','precision','%12.5e')
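
The thresholding step earlier in the script (setting element values below `thresh` to zero) can also be written with logical indexing instead of nested loops. The sketch below is only an illustration on a small made-up matrix `S`; the 40-column layout with element values in columns 21:35 is taken from the script, while `S`, `elemCols`, and the specific test values are assumptions introduced for the example.

```matlab
% Vectorized thresholding sketch (toy matrix; names are illustrative)
thresh = 0.5;
elemCols = 21:35;            % element columns, as used in the script
S = ones(4,40);              % stand-in for one Sample{i}: 4 particles, 40 columns
S(2,21) = 0.2;               % below threshold, inside the element columns
S(3,35) = 0.49;              % below threshold, inside the element columns
S(1,5)  = 0.1;               % below threshold, but NOT an element column

block = S(:,elemCols);       % pull out only the element columns
block(block < thresh) = 0;   % zero every value under the threshold at once
S(:,elemCols) = block;       % write the cleaned block back
```

Only the two small values inside columns 21:35 are zeroed; the value in column 5 is left untouched, which matches what the loop version does. In the script, the same three lines could be applied to each `Sample{i}` inside a single loop over samples.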