PubChem: An Introduction

2021-02-15 FindKey
web:https://pubchem.ncbi.nlm.nih.gov/

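PubChem is the open chemistry database of the US National Institutes of Health (NIH). "Open" means that you can deposit scientific data in PubChem and that others can use it. Since its launch in 2004, PubChem has become a key chemical information resource for scientists, students, and the general public. Each month, its website and programmatic services deliver data to millions of users worldwide.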

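PubChem mostly contains small molecules, but it also includes larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically modified macromolecules. PubChem collects information on chemical structures, identifiers, chemical and physical properties, biological activities, patents, health, safety, toxicity data, and much more.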

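Where does the data in PubChem come from? PubChem records are contributed by hundreds of data sources, including government agencies, chemical vendors, journal publishers, and more. The amount of data in PubChem keeps growing; see the PubChem Statistics page for current data counts.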

Key concepts: web:https://pubchemdocs.ncbi.nlm.nih.gov/data-organization

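Substances: Data sources use PubChem Upload to submit records that contain annotations and, optionally, a chemical structure. Each record from each data source is assigned a unique Substance Identifier (SID). For example, if ten organizations submit a record about aspirin, ten distinct substance (SID) records are created. Substance records are archival, so previously submitted versions can be inspected.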

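Compounds: A Compound record (CID) is generated automatically if and only if one or more substance records contain a structure that can be standardized to the same chemical structure. For example, the many chemical vendor substance records that contain the same aspirin structure are aggregated into a single Compound record (CID). A Compound record therefore serves as a useful summary of all PubChem information available for a given chemical structure.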
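To make the SID/CID relationship concrete, here is a toy Python model. It is not PubChem's algorithm: the `structure` string is a hypothetical stand-in for PubChem's structure standardization, and the SIDs and sequential compound numbers are invented for illustration (real CIDs are assigned by PubChem).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Substance:
    sid: int                  # unique per deposited record (hypothetical values)
    source: str               # depositor name
    structure: Optional[str]  # stand-in for a standardized structure; None if absent


def group_into_compounds(substances):
    """Toy model: one compound id per distinct standardized structure."""
    cid_by_structure = {}
    sids_by_cid = defaultdict(list)
    for s in substances:
        if s.structure is None:
            continue  # a record without a structure yields no compound
        # Reuse the id of a known structure, otherwise assign the next number.
        cid = cid_by_structure.setdefault(s.structure, len(cid_by_structure) + 1)
        sids_by_cid[cid].append(s.sid)
    return dict(sids_by_cid)


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
records = [
    Substance(101, "Vendor A", aspirin),
    Substance(102, "Vendor B", aspirin),
    Substance(103, "Vendor C", None),
    Substance(104, "Vendor A", "CCO"),
]
print(group_into_compounds(records))  # {1: [101, 102], 2: [104]}
```

Four substance records collapse into two compounds: both aspirin submissions share one compound, and the record without a structure produces none.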

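BioAssays: Data sources submit biological activity test results, together with annotations describing the bioassay experiment that tested the substances (SIDs). Each experiment from each data source is assigned a unique BioAssay Identifier (AID). A BioAssay record includes the researcher-defined active/inactive determination of biological activity, accompanied by an interpretation. BioAssay records are archival, so previously submitted versions can be inspected.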
Downloading PubChem Data:

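PubChem is an open-access database, and most of its data can be downloaded. Exceptions may exist where license agreements prevent a data provider from allowing bulk download of certain data sets. Data can also be retrieved programmatically through the PUG-REST interface, whose request URLs follow this template: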


https://pubchem.ncbi.nlm.nih.gov/rest/pug/<input specification>/<operation specification>/[<output specification>][?<operation_options>]
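The template above can be assembled mechanically. Below is a minimal Python sketch; the helper name `pug_rest_url` is hypothetical, and it assumes the input, operation, and output parts are already valid strings according to the grammar that follows.

```python
from urllib.parse import urlencode

BASE_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(input_spec, operation_spec, output_spec=None, options=None):
    """Join the template parts: <input>/<operation>/[<output>][?<options>]."""
    parts = [BASE_URL, input_spec.strip("/"), operation_spec.strip("/")]
    if output_spec:
        parts.append(output_spec)        # the output part is optional
    url = "/".join(parts)
    if options:
        url += "?" + urlencode(options)  # e.g. {"callback": "my_callback"}
    return url


print(pug_rest_url("compound/cid/2244", "property/MolecularFormula", "JSON"))
```

Keeping each segment as a separate argument mirrors the template and makes it easy to validate the segments independently before joining them.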

<input specification> = <domain>/<namespace>/<identifiers>
<domain> = substance | compound | assay | <other inputs>
compound domain <namespace> = cid | name | smiles | inchi | sdf | inchikey | formula | <structure search> | <xref> | listkey | <fast search>
<structure search> = {substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid}
<fast search> = {fastidentity | fastsimilarity_2d | fastsimilarity_3d | fastsubstructure | fastsuperstructure}/{smiles | smarts | inchi | sdf | cid} | fastformula
<xref> = xref / {RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID}
substance domain <namespace> = sid | sourceid/<source id> | sourceall/<source name> | name | <xref> | listkey
<source name> = any valid PubChem depositor name
assay domain <namespace> = aid | listkey | type/<assay type> | sourceall/<source name> | target/<assay target> | activity/<activity column name>
<assay type> = all | confirmatory | doseresponse | onhold | panel | rnai | screening | summary | cellbased | biochemical | invivo | invitro | activeconcentrationspecified
<assay target> = gi | proteinname | geneid | genesymbol | accession
<identifiers> = comma-separated list of positive integers (e.g. cid, sid, aid) or identifier strings (source, inchikey, formula); in some cases only a single identifier string (name, smiles, xref; inchi, sdf by POST only)
<other inputs> = sources / [substance, assay] | sourcetable | conformers | annotations/[sourcename/<source name> | heading/<heading>]
Example (CID 2244 is aspirin): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/<operation specification>/[<output specification>]
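A minimal sketch of building the <input specification> segment in Python, following the rules stated in the grammar above: identifiers are comma-joined, string identifiers are percent-encoded, name/smiles/xref take a single identifier, and inchi/sdf are rejected because they go by POST. The function name `input_spec` and its error messages are hypothetical; this is not an official PubChem client.

```python
from urllib.parse import quote

POST_ONLY = {"inchi", "sdf"}      # per the grammar: sent by POST, not in the URL
SINGLE_ONLY = {"name", "smiles"}  # per the grammar: one identifier string only


def input_spec(domain, namespace, identifiers):
    """Build <domain>/<namespace>/<identifiers> for a GET request."""
    ids = [str(i) for i in identifiers]
    if not ids:
        raise ValueError("at least one identifier is required")
    if namespace in POST_ONLY:
        raise ValueError(f"'{namespace}' input must be sent by POST")
    single = namespace in SINGLE_ONLY or namespace.startswith("xref/")
    if single and len(ids) > 1:
        raise ValueError(f"'{namespace}' accepts a single identifier")
    # Percent-encode each identifier so characters such as ( ) = in SMILES
    # or spaces in names do not break the URL path; commas separate ids.
    return f"{domain}/{namespace}/" + ",".join(quote(i, safe="") for i in ids)


print(input_spec("compound", "cid", [2244, 3672]))  # compound/cid/2244,3672
print(input_spec("compound", "name", ["aspirin"]))  # compound/name/aspirin
```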


compound domain <operation specification> = record | <compound property> | synonyms | sids | cids | aids | assaysummary | classification | <xrefs> | description | conformers
<compound property> = property / [comma-separated list of property tags]
substance domain <operation specification> = record | synonyms | sids | cids | aids | assaysummary | classification | <xrefs> | description
<xrefs> = xrefs / [comma-separated list of xrefs tags]
assay domain <operation specification> = record | concise | aids | sids | cids | description | targets/<target type> | <doseresponse> | summary | classification
<target type> = {ProteinGI, ProteinName, GeneID, GeneSymbol}
<doseresponse> = doseresponse/sid
For example, to access the molecular formula and InChI key for CID 2244, one would use a URL like:
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula,InChIKey/[<output specification>]
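The operation grammar can be checked per domain before a request is sent. This hypothetical validator transcribes the productions above into Python; it is a sketch, not PubChem's own validation, and it does not check whether a given property or xrefs tag name actually exists.

```python
# Argument-free operations per domain, transcribed from the grammar above.
SIMPLE_OPERATIONS = {
    "compound": {"record", "synonyms", "sids", "cids", "aids", "assaysummary",
                 "classification", "description", "conformers"},
    "substance": {"record", "synonyms", "sids", "cids", "aids", "assaysummary",
                  "classification", "description"},
    "assay": {"record", "concise", "aids", "sids", "cids", "description",
              "summary", "classification"},
}
TARGET_TYPES = {"ProteinGI", "ProteinName", "GeneID", "GeneSymbol"}


def operation_spec(domain, operation, tags=()):
    """Return the <operation specification> string, or raise ValueError."""
    tags = list(tags)
    if operation == "property" and domain == "compound" and tags:
        return "property/" + ",".join(tags)  # comma-separated property tags
    if operation == "xrefs" and domain in ("compound", "substance") and tags:
        return "xrefs/" + ",".join(tags)     # comma-separated xrefs tags
    if (operation == "targets" and domain == "assay"
            and len(tags) == 1 and tags[0] in TARGET_TYPES):
        return "targets/" + tags[0]
    if operation == "doseresponse" and domain == "assay":
        return "doseresponse/sid"
    if not tags and operation in SIMPLE_OPERATIONS.get(domain, set()):
        return operation
    raise ValueError(f"{operation!r} with tags {tags} is not valid for the {domain!r} domain")


print(operation_spec("compound", "property", ["MolecularFormula", "InChIKey"]))
# property/MolecularFormula,InChIKey
```

Rejecting, for example, a property request on the substance domain locally avoids a round trip that the grammar says cannot succeed.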


<output specification> = XML | ASNT | ASNB | JSON | JSONP [ ?callback=<callback name> ] | SDF | CSV | PNG | TXT
For example, to access the molecular formula for CID 2244 in JSON format, one would use the (now complete) URL:
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSON
JSONP takes an optional callback function name (which defaults to "callback" if not specified). For example:
https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/property/MolecularFormula/JSONP?callback=my_callback
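Once a JSON or JSONP response comes back, it still has to be unpacked. The sketch below assumes that a property response has the shape {"PropertyTable": {"Properties": [{"CID": ..., ...}]}} and that JSONP wraps the same JSON in callback(...), optionally followed by a semicolon; verify both against a live response before relying on them. The helper names and the sample payload are hypothetical.

```python
import json
import re


def unwrap_jsonp(text, callback="callback"):
    """Strip a JSONP wrapper such as my_callback({...}); and parse the JSON inside."""
    pattern = r"\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*"
    match = re.fullmatch(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError(f"response is not wrapped in {callback}(...)")
    return json.loads(match.group(1))


def properties_by_cid(payload):
    """Map each CID to its requested properties (assumed PropertyTable shape)."""
    rows = payload["PropertyTable"]["Properties"]
    return {row["CID"]: {k: v for k, v in row.items() if k != "CID"} for row in rows}


# Illustrative payload in the assumed shape (not a captured live response).
SAMPLE = '{"PropertyTable": {"Properties": [{"CID": 2244, "MolecularFormula": "C9H8O4"}]}}'

print(properties_by_cid(json.loads(SAMPLE)))  # {2244: {'MolecularFormula': 'C9H8O4'}}
print(properties_by_cid(unwrap_jsonp(f"my_callback({SAMPLE});", "my_callback")))
```

To try it on live data, pass one of the URLs above to `urllib.request.urlopen`, decode the body, and hand the text to `json.loads` (for JSON) or `unwrap_jsonp` (for JSONP).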


WeChat Official Account: FindKey

Bilibili: ZeroDesigner
