Can you trust your data?
你相信你的數據嗎?
That's the very first question we need to ask when we perform a statistical analysis. If the data's no good, it doesn't matter what statistical methods we employ, nor how much expertise we have in analyzing data. If we start with bad data, we'll end up with unreliable results. Garbage in, garbage out, as they say.
我們在進行統計分析時,問的第一個問題便是-你相信你的數據嗎?如果數據不好,無論我們運用什麼統計方法,在分析數據上擁有多麼充足的專業知識,一切都會變得不重要!一旦我們從不良數據開始,那麼,我們最終會得到不可靠的結果。俗話說,胡亂輸入,胡亂輸出。
So, can you trust your data? Are you positive? Because, let's admit it, many of us forget to ask that question altogether, or respond too quickly and confidently.
你相信你的數據嗎?你能確信嗎?讓我們姑且承認,我們中的很多人都忘記了去問這個問題,或者對這個問題反應太快,而且信心十足。
You can’t just assume you have good data; you need to know you do. That may require a little more work up front, but the energy you spend getting good data will pay off in the form of better decisions and bigger improvements.
我們不能僅僅假設我們擁有可靠的數據-你需要明白你是這樣做的。這可能需要提前一點工作,但是你在獲得可靠數據上的精力,將會被以更好的決策與更大的改進的形式得到回饋。
Here are 3 critical actions you can take to maximize your chance of getting data that will lead to correct conclusions.
以下是你可以採取的3個關鍵行動,去將獲得可以帶來正確結論的數據的機會最大化。
Plan How, When, and What to Measure—and Who Will Do It
計劃好「怎麼樣」、「何時」、「測量什麼」,以及「誰去測量」
Failing to plan is a great way to get unreliable data. That’s because a solid plan is the key to successful data collection. Asking why you’re gathering data at the very start of a project will help you pinpoint the data you really need. A data collection plan should clarify:
計劃失敗是獲得不可靠數據的一大罪魁禍首。因為,堅實的計劃是成功收集數據的關鍵。在項目一開始,就要詢問收集數據的原因,將會幫助你確定真正需要的數據。數據收集計劃,應該清晰易懂:
-What data will be collected
收集什麼數據
-Who will collect it
誰去收集數據
-When it will be collected
什麼時間收集到數據
-Where it will be collected
將在什麼地點收集數據
-How it will be collected
怎麼樣去收集數據
Answering these questions in advance will put you well on your way to getting meaningful data.
提前詢問這些問題,會讓你較好地去獲得有價值的數據。
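One way to make sure none of these questions is skipped is to write the plan down as a small record and check it for blanks before collection starts. The following is a minimal sketch only: the class name `DataCollectionPlan`, its field names, and the example answers are all hypothetical and are not part of any particular tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DataCollectionPlan:
    """Hypothetical record of the five questions a plan should answer."""
    what: str = ""   # what data will be collected
    who: str = ""    # who will collect it
    when: str = ""   # when it will be collected
    where: str = ""  # where it will be collected
    how: str = ""    # how it will be collected

    def unanswered(self):
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative answers only; "when" and "where" are deliberately left out.
plan = DataCollectionPlan(
    what="Outer diameter of each machined shaft, in mm",
    who="Line inspectors on both shifts",
    how="Digital caliper, three readings per part",
)
print(plan.unanswered())  # ['when', 'where']
```

A plan with an empty `unanswered()` list is not automatically a good plan, but a non-empty list is a quick signal that the planning step is unfinished.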
Test Your Measurement System
測試測量系統
Many quality improvement projects require measurement data for factors like weight, diameter, or length and width. Not verifying the accuracy of your measurements practically guarantees that your data—and thus your results—are not reliable.
許多質量改進項目都需要測量重量、直徑、長度和寬度等方面的數據。不去核對保證數據準確性的測量方法,所得到的結果往往是不可靠的。
A branch of statistics called Measurement System Analysis lets you quickly assess and improve your measurement system so you can be sure you’re collecting data that is accurate and precise.
統計的一大分支是MSA(測量系統分析)可以讓你快速評估與改善測量系統,而你可以保證收集的數據的精確度與準確度。
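The difference between "accurate" and "precise" can be made concrete by measuring a part of known size several times. The sketch below assumes a hypothetical 25.00 mm reference block and made-up readings; the function name `bias_and_spread` is introduced here purely for illustration.

```python
from statistics import mean, stdev


def bias_and_spread(readings, reference):
    """Return (bias, spread) for repeated readings of one reference part.

    bias   = average reading minus the known reference value (accuracy)
    spread = sample standard deviation of the readings (precision)
    """
    return mean(readings) - reference, stdev(readings)


# Made-up readings of a hypothetical 25.00 mm reference block.
readings = [25.03, 25.02, 25.04, 25.03, 25.03]
bias, spread = bias_and_spread(readings, reference=25.00)
print(f"bias = {bias:+.3f} mm, spread = {spread:.3f} mm")
```

In this invented data set the readings cluster tightly but sit above the reference value, which is the pattern of a gauge that is precise without being accurate.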
When gathering quantitative data, Gage Repeatability and Reproducibility (R&R) analysis confirms that instruments and operators are measuring parts consistently.
收集量化數據時,量具的重複性和再現性分析,會確認儀器與操作人員測量的不間斷。
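To give a feel for what a Gage R&R study computes, here is a rough sketch of the ANOVA method for a balanced crossed study, in which every operator measures every part the same number of times. It is an illustration under stated assumptions, not a substitute for a statistics package: the function name `gage_rr`, the dictionary layout, and the measurements are all hypothetical.

```python
from statistics import mean


def gage_rr(measurements):
    """Variance components for a balanced crossed Gage R&R study.

    measurements maps (part, operator) -> list of repeated readings.
    Every part/operator pair must have the same number of readings.
    """
    parts = sorted({p for p, _ in measurements})
    opers = sorted({o for _, o in measurements})
    n_p, n_o = len(parts), len(opers)
    n_r = len(measurements[(parts[0], opers[0])])
    if n_p < 2 or n_o < 2 or n_r < 2:
        raise ValueError("need at least 2 parts, 2 operators, 2 readings each")
    if any(len(measurements[(p, o)]) != n_r for p in parts for o in opers):
        raise ValueError("study must be balanced")

    all_vals = [y for ys in measurements.values() for y in ys]
    grand = mean(all_vals)
    part_mean = {p: mean([y for o in opers for y in measurements[(p, o)]]) for p in parts}
    oper_mean = {o: mean([y for p in parts for y in measurements[(p, o)]]) for o in opers}
    cell_mean = {k: mean(ys) for k, ys in measurements.items()}

    # Sums of squares for the two-way ANOVA with interaction.
    ss_part = n_o * n_r * sum((m - grand) ** 2 for m in part_mean.values())
    ss_oper = n_p * n_r * sum((m - grand) ** 2 for m in oper_mean.values())
    ss_err = sum((y - cell_mean[k]) ** 2 for k, ys in measurements.items() for y in ys)
    ss_total = sum((y - grand) ** 2 for y in all_vals)
    ss_int = ss_total - ss_part - ss_oper - ss_err

    ms_part = ss_part / (n_p - 1)
    ms_oper = ss_oper / (n_o - 1)
    ms_int = ss_int / ((n_p - 1) * (n_o - 1))
    ms_err = ss_err / (n_p * n_o * (n_r - 1))

    # Variance components; negative estimates are clipped to zero.
    repeatability = ms_err
    interaction = max(0.0, (ms_int - ms_err) / n_r)
    operator = max(0.0, (ms_oper - ms_int) / (n_p * n_r))
    part_to_part = max(0.0, (ms_part - ms_int) / (n_o * n_r))
    reproducibility = operator + interaction
    grr = repeatability + reproducibility
    total = grr + part_to_part
    return {
        "repeatability": repeatability,
        "reproducibility": reproducibility,
        "gage_rr": grr,
        "part_to_part": part_to_part,
        "pct_gage_rr": 100.0 * grr / total if total > 0 else 0.0,
    }


# Made-up study: 3 parts, 2 operators, 2 readings each (values in mm).
study = {
    ("P1", "Ann"): [10.1, 10.0], ("P1", "Bob"): [10.2, 10.1],
    ("P2", "Ann"): [12.0, 12.1], ("P2", "Bob"): [12.2, 12.1],
    ("P3", "Ann"): [13.9, 14.0], ("P3", "Bob"): [14.1, 14.0],
}
for name, value in gage_rr(study).items():
    print(f"{name}: {value:.4f}")
```

Here `pct_gage_rr` is the share of total variance attributed to the measurement system rather than to real differences between parts. A small share suggests that instruments and operators are measuring consistently; a large one suggests the measurement system itself needs attention before the data are analyzed.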
If you’re grading parts or identifying defects, an Attribute Agreement Analysis verifies that different evaluators are making judgments consistent with each other and with established standards.
如果你正在對系統進行評分或者識別缺陷,屬性一致性分析,將驗證不同的評估者是否會使用既定的標準,做出一致的判斷。
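The core of an attribute agreement check is simple counting: how often do evaluators match each other, and how often do they match the known standard? The sketch below covers only these percent-agreement figures, using made-up pass/fail ratings and a hypothetical function name, `attribute_agreement`. A full analysis typically also has each evaluator rate every item more than once and reports chance-corrected statistics such as kappa, which are omitted here.

```python
def attribute_agreement(ratings, standard):
    """Percent-agreement summary for categorical ratings.

    ratings  maps appraiser name -> list of ratings, one per item
    standard is the list of known correct ratings, in the same item order
    """
    n = len(standard)
    if n == 0 or not ratings or any(len(r) != n for r in ratings.values()):
        raise ValueError("every appraiser must rate every item")

    # Share of items on which each appraiser matches the standard.
    vs_standard = {
        name: sum(r == s for r, s in zip(given, standard)) / n
        for name, given in ratings.items()
    }
    # Share of items on which all appraisers give the same rating.
    per_item = list(zip(*ratings.values()))
    between = sum(len(set(item)) == 1 for item in per_item) / n
    # Share of items on which all appraisers match the standard.
    all_vs_standard = sum(
        all(r == s for r in item) for item, s in zip(per_item, standard)
    ) / n
    return {
        "each_vs_standard": vs_standard,
        "between_appraisers": between,
        "all_vs_standard": all_vs_standard,
    }


# Made-up ratings of five items by three hypothetical appraisers.
standard = ["pass", "fail", "pass", "pass", "fail"]
ratings = {
    "A": ["pass", "fail", "pass", "pass", "fail"],
    "B": ["pass", "fail", "fail", "pass", "fail"],
    "C": ["pass", "pass", "pass", "pass", "fail"],
}
print(attribute_agreement(ratings, standard))
```

In this invented example each appraiser looks reasonable individually, yet all three agree on only three of the five items, which is the kind of inconsistency the analysis is meant to surface.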
If you do not examine your measurement system, you’re much more likely to add variation and inconsistency to your data, which can wind up clouding your analysis.
如果你不去檢查測量系統,這很可能會增加變量,數據的不一致,可能會干擾你的分析。
Beware of Confounding or Lurking Variables
警惕混雜/潛在的變量
As you collect data, be careful to avoid introducing unintended and unaccounted-for variables. These “lurking” variables can make even the most carefully collected data unreliable, and such hidden factors are often insidiously difficult to detect.
收集數據時,一定要仔細,避免去引入非預期的變量和不明的變量。這些「潛伏」的變量,甚至能夠讓非常仔細收集到的數據變得不可靠,而這些隱藏的因素往往是隱蔽的,因而很難被探測到。
A well-known example involves World War II-era bombing runs. Analysis showed that accuracy increased when bombers encountered enemy fighters, confounding all expectations. But a key variable hadn’t been factored in: weather conditions. On cloudy days, accuracy was terrible because the bombers couldn’t spot landmarks, and the enemy didn’t bother scrambling fighters.
其中,一個著名的例子是第二次世界大戰時期的轟炸。分析顯示,當轟炸機遭遇敵方戰鬥機時,準確性會提高,會挫敗敵方所有的企圖。但是卻沒有考慮到一個關鍵得出變量:天氣狀況。在陰天時,轟炸的準確度就很差。因為轟炸機找不到目標,而敵方也懶得緊急起飛戰鬥機。
Suppose that data for your company’s key product shows a much larger defect rate for items made by the second shift than items made by the first.
好了,我們現在假設你所在公司的拳頭產品的數據顯示,第二個班組所生產的產品的缺陷比第一個班組要大得多。
Given only this information, your boss might suggest a training program for the second shift, or perhaps even more drastic action.
如果僅僅給出這樣的信息,你的老闆很可能會建議去培訓第二個班組,或者會給出更加嚴厲的措施。
But could something else be going on? Your raw materials come from three different suppliers.
還有別的嗎?你們的原材料來自於三個不同的供應商。
What does the defect rate data look like if you include the supplier along with the shift?
如果你將供應商與班組的產品缺陷掛鈎,那麼,缺陷比例數據是什麼樣子的呢?
Broken out by supplier, the data show that defect rates for both shifts are higher when using supplier 2’s materials. Not accounting for this confounding factor almost led to an expensive “solution” that probably would do little to reduce the overall defect rate.
現在,你可以看到無論是第一班組還是第二班組,在使用第二家供應商的提供的原材料時間,缺陷比例都比較高。如果不考慮這一混雜的因素,很可能會引發昂貴的「解決方案」,而那樣卻對減少整體上的缺陷率幾乎起不到什麼作用。
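The shift-versus-supplier comparison can be reproduced with a few lines of grouping code. The counts below are invented solely to illustrate the effect and are not taken from any real process; the function name `defect_rates` and the record layout are likewise assumptions made for this sketch.

```python
from collections import defaultdict

# Made-up counts: (shift, supplier, units produced, defective units).
records = [
    ("shift 1", "supplier 1", 400, 8),
    ("shift 1", "supplier 2", 100, 10),
    ("shift 1", "supplier 3", 400, 8),
    ("shift 2", "supplier 1", 100, 2),
    ("shift 2", "supplier 2", 700, 70),
    ("shift 2", "supplier 3", 100, 2),
]


def defect_rates(rows, key):
    """Pool units and defects by key(row) and return the defect rate per group."""
    units = defaultdict(int)
    defects = defaultdict(int)
    for row in rows:
        group = key(row)
        units[group] += row[2]
        defects[group] += row[3]
    return {g: defects[g] / units[g] for g in units}


# First by shift alone, then by shift and supplier together.
by_shift = defect_rates(records, key=lambda r: r[0])
by_shift_and_supplier = defect_rates(records, key=lambda r: (r[0], r[1]))

for group, rate in by_shift.items():
    print(group, f"{rate:.1%}")
for group, rate in by_shift_and_supplier.items():
    print(group, f"{rate:.1%}")
```

With these invented numbers, the second shift looks roughly three times worse when shifts are compared alone (74 defects in 900 units against 26 in 900). Within each supplier, however, the two shifts have identical rates. The whole gap comes from the second shift running mostly supplier 2’s material, so retraining that shift would leave the defect rate unchanged.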
Take the Time to Get Data You Can Trust…
花點時間去獲得你可以相信的數據
Nobody sets out to waste time or sabotage their efforts by not collecting good data. But it’s all too easy to get problem data even when you’re being careful! When you collect data, be sure to spend the little bit of time it takes to make sure your data is truly trustworthy.
沒有人會因為沒有收集好的數據而浪費時間,或者破壞他們自己的努力。不過,即使非常小心,誰都會很容易得到問題數據!所以,在收集數據時,請一定要花一點時間去確保收集的數據確實可靠。