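Since June 15, 2017, Data Application Lab has shared and discussed one common interview question from data science (DS) and business analytics (BA) every day.
Since October 4, 2017, we have also shared one LeetCode algorithm problem each day.
If you are actively looking for work in these fields, we hope you will follow our daily questions and think them through with us; we post the answers the following day.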
What is cross-validation? How to do it right?
Combine two tables
Table: Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| PersonId    | int     |
| FirstName   | varchar |
| LastName    | varchar |
+-------------+---------+
PersonId is the primary key column for this table.
Table: Address
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| AddressId   | int     |
| PersonId    | int     |
| City        | varchar |
| State       | varchar |
+-------------+---------+
AddressId is the primary key column for this table.
Write a SQL query for a report that provides the following information for each person in the Person table, regardless of whether there is an address for that person:
FirstName, LastName, City, State
Two Sum
Description:
Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input has exactly one solution, and you may not use the same element twice.
Input: [2, 7, 11, 15], target = 9
Output: [0, 1]
Assumptions:
1. each input has exactly one solution
2. you may not use the same element twice
3. the input array is sorted in ascending order
DS Interview Question & Answer
During analysis, how do you treat missing values?
Whether to treat missing values at all is the first point to consider. If 80% of the values for a variable are missing, you may drop the variable instead of treating the missing values.
Deleting the observations: when you have sufficient data points and the deletion will not introduce bias
Imputation with the mean / median / mode, or with a default value
Imputation with models: KNN, MICE, etc.
Use other features to build a model to predict the missing part
...
Reference:
https://www.r-bloggers.com/missing-value-treatment/
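The simpler options listed above (dropping a mostly-empty variable, deleting incomplete observations, and mean / median / mode or default-value imputation) can be sketched in plain Python. This is a minimal illustration rather than a recommended pipeline: the function names, the use of None as the missing-value marker, and the sample ages list are assumptions made for the example; only the 80% threshold comes from the text.

```python
from statistics import mean, median, mode


def missing_ratio(values):
    """Fraction of entries that are missing (None is the assumed marker)."""
    return sum(v is None for v in values) / len(values)


def should_drop_variable(values, threshold=0.8):
    """True when at least `threshold` of the values are missing."""
    return missing_ratio(values) >= threshold


def drop_missing(values):
    """Delete the observations that have no value."""
    return [v for v in values if v is not None]


def impute(values, strategy="mean", default=0):
    """Fill missing entries with a statistic of the observed values."""
    observed = drop_missing(values)
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    elif strategy == "default":
        fill = default
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical sample column with two missing entries.
ages = [25, None, 31, 25, None, 40]

print(should_drop_variable(ages))   # False: only 2 of 6 values are missing
print(drop_missing(ages))           # [25, 31, 25, 40]
print(impute(ages, "median"))       # [25, 28.0, 31, 25, 28.0, 40]
```

Model-based imputation (KNN, MICE, or a predictive model built on the other features) follows the same shape, with the fill value computed per row instead of once per column.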
BA Interview Question & Answer
Write a SQL query to obtain the names of all patients whose primary care physician is not the head of any department, along with the name of that physician.
Table: patient (pt)
ssn       | name              | address            | phone    | insuranceid | pcp
----------+-------------------+--------------------+----------+-------------+----
100000001 | John Smith        | 42 Foobar Lane     | 555-0256 | 68476213    | 1
100000002 | Grace Ritchie     | 37 Snafu Drive     | 555-0512 | 36546321    | 2
100000003 | Random J. Patient | 101 Omgbbq Street  | 555-1204 | 65465421    | 2
100000004 | Dennis Doe        | 1100 Foobaz Avenue | 555-2048 | 68421879    | 3
Table: physician (p)
employeeid | name              | position                     | ssn
-----------+-------------------+------------------------------+----------
1          | John Dorian       | Staff Internist              | 111111111
2          | Elliot Reid       | Attending Physician          | 222222222
3          | Christopher Turk  | Surgical Attending Physician | 333333333
4          | Percival Cox      | Senior Attending Physician   | 444444444
5          | Bob Kelso         | Head Chief of Medicine       | 555555555
6          | Todd Quinlan      | Surgical Attending Physician | 666666666
7          | John Wen          | Surgical Attending Physician | 777777777
8          | Keith Dudemeister | MD Resident                  | 888888888
9          | Molly Clock       | Attending Psychiatrist       | 999999999
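Answer (the query also uses a department table, not shown above, whose head column holds the employeeid of each department head):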
SELECT pt.name AS "Patient",
       p.name AS "Primary care Physician"
FROM patient pt
JOIN physician p ON pt.pcp = p.employeeid
WHERE pt.pcp NOT IN
    (SELECT head
     FROM department);
Reference:
https://www.w3resource.com/sql-exercises/hospital-database-exercise/sql-exercise-hospital-database-39.php
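To try the answer query, the sketch below loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. The department table is not shown in the question, so its schema and its three rows here are hypothetical, chosen only so that one primary care physician (employeeid 3) is a department head. The column types and the ORDER BY clause are also additions; the latter just makes the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE physician (employeeid INTEGER PRIMARY KEY, name TEXT,
                            position TEXT, ssn INTEGER);
    CREATE TABLE patient (ssn INTEGER PRIMARY KEY, name TEXT, address TEXT,
                          phone TEXT, insuranceid INTEGER, pcp INTEGER);
    CREATE TABLE department (departmentid INTEGER PRIMARY KEY, name TEXT,
                             head INTEGER);
""")

# Sample rows from the physician and patient tables in the question.
conn.executemany("INSERT INTO physician VALUES (?, ?, ?, ?)", [
    (1, "John Dorian", "Staff Internist", 111111111),
    (2, "Elliot Reid", "Attending Physician", 222222222),
    (3, "Christopher Turk", "Surgical Attending Physician", 333333333),
    (4, "Percival Cox", "Senior Attending Physician", 444444444),
    (9, "Molly Clock", "Attending Psychiatrist", 999999999),
])
conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?, ?, ?)", [
    (100000001, "John Smith", "42 Foobar Lane", "555-0256", 68476213, 1),
    (100000002, "Grace Ritchie", "37 Snafu Drive", "555-0512", 36546321, 2),
    (100000003, "Random J. Patient", "101 Omgbbq Street", "555-1204", 65465421, 2),
    (100000004, "Dennis Doe", "1100 Foobaz Avenue", "555-2048", 68421879, 3),
])

# HYPOTHETICAL department rows: not part of the original question.
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    (1, "General Medicine", 4),
    (2, "Surgery", 3),
    (3, "Psychiatry", 9),
])

QUERY = """
    SELECT pt.name AS "Patient",
           p.name AS "Primary care Physician"
    FROM patient pt
    JOIN physician p ON pt.pcp = p.employeeid
    WHERE pt.pcp NOT IN (SELECT head FROM department)
    ORDER BY pt.ssn;
"""

rows = conn.execute(QUERY).fetchall()
for patient_name, physician_name in rows:
    print(patient_name, "|", physician_name)
```

With these hypothetical department rows, Dennis Doe is filtered out because his primary care physician heads a department. One caveat worth knowing: if any head value were NULL, NOT IN would return no rows at all, so NOT EXISTS is often the safer form.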
LeetCode Question & Answer
Pascal’s Triangle II
Description:
Given an index k, return the kth row of Pascal’s triangle (rows are counted from 0).
Input: 3
Output: [1,3,3,1]
Follow-up:
Could you optimize your algorithm to use only O(k) extra space?
Solution:
This is the follow-up to Pascal’s Triangle; the key point is the O(k) space complexity.
A rolling array achieves O(k) space complexity.
Code:
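A minimal Python sketch of the rolling-array idea, assuming 0-indexed rows as in the example (input 3 gives [1,3,3,1]). The function name get_row is chosen for illustration and this is one possible implementation, not necessarily the original one. Each pass updates the single array in place from right to left, so every entry still reads the previous row's values before they are overwritten.

```python
def get_row(k):
    """Return row k (0-indexed) of Pascal's triangle using O(k) extra space."""
    row = [1] * (k + 1)
    # Build rows 2..k in place; rows 0 and 1 are already all ones.
    for i in range(2, k + 1):
        # Go right to left so row[j - 1] still holds the previous row's value.
        for j in range(i - 1, 0, -1):
            row[j] += row[j - 1]
    return row


print(get_row(3))  # [1, 3, 3, 1]
```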
Time Complexity: O(k^2)
Space Complexity: O(k)