library(httr)
baseUrl="https://eutils.ncbi.nlm.nih.gov/"
pubmedAction=list(
base="entrez/eutils/index.fcgi",
search="entrez/eutils/esearch.fcgi", # search endpoint
fetch="entrez/eutils/efetch.fcgi", # full-record retrieval endpoint
summary="entrez/eutils/esummary.fcgi" # document-summary endpoint (efetch can return more data formats)
)
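# Parameters for the article search
searchArticleParam=list(
retstart=0, # index of the first record to return
retmax=20, # number of records per request
usehistory='Y', # keep the result set on the NCBI history server
querykey='',
webenv='',
term='(cell[TA]) AND 2017[DP]', # query submitted to PubMed
total_num=0, # total number of records
total_page=1, # total number of pages
page_size=20, # records per page
current_page=1 # current page number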
)
postSearchUrl=paste(baseUrl,pubmedAction$search,sep="") # build the search URL
r <- POST(postSearchUrl,
body = list(
db='pubmed',
term=searchArticleParam$term,
retmode='json',
retstart=searchArticleParam$retstart,
retmax=searchArticleParam$retmax,
usehistory=searchArticleParam$usehistory,
rettype='uilist'
)
)
stop_for_status(r) # stop with an error if the HTTP request failed
data=content(r, "parsed", "application/json")
# data now holds the whole parsed response
esearchresult=data$esearchresult
# e.g. $count=562,$retmax=20, $retstart=0,$querykey=1, $webenv=NCID_1_30290513_130.14.18.34_9001_1515165012_617859421_0MetA0_S_MegaStore_F_1
count = as.integer(esearchresult$count) # as.integer() in case the JSON carries count as text
print(count)
searchArticleParam$total_num=count
searchArticleParam$querykey=esearchresult$querykey
searchArticleParam$webenv=esearchresult$webenv
pubmedidStr="28431241" # join multiple PMIDs with ","
postFetchUrl=paste(baseUrl,pubmedAction$fetch,sep="")
r2 <- POST(postFetchUrl,
body = list(
db='pubmed',
id=pubmedidStr,
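retmode='xml' # return XML; this endpoint does not support JSON here
# id is given explicitly above, so the history-server parameters are not needed.
# efetch names them query_key and WebEnv (not querykey/webenv); to fetch from the
# history server instead, drop id and pass
# query_key=searchArticleParam$querykey and WebEnv=searchArticleParam$webenv.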
)
)
stop_for_status(r2)
library(xml2)
data2=content(r2, "parsed", "application/xml")
article=xml_children(data2)
# length(article) is the number of articles returned
count=length(article)
cnt=1
while(cnt<=count){ # write each title and abstract to the output file
title=xml_find_first(article[cnt],".//ArticleTitle") # first ArticleTitle node under this article
abstract=xml_find_first(article[cnt],".//AbstractText") # first AbstractText node only; structured abstracts have several
write.table(xml_text(title), file = "F:/R/a.txt", append = TRUE, quote = FALSE, row.names = FALSE, col.names = FALSE)
write.table(xml_text(abstract), file = "F:/R/a.txt", append = TRUE, quote = FALSE, row.names = FALSE, col.names = FALSE)
cnt = cnt + 1
}
Results:
part1
[1] 563
part2
AKT/PKB Signaling: Navigating the Network.
The Ser and Thr kinase AKT, also known as protein kinase B (PKB), was discovered 25 years ago and has been the focus of tens of thousands of studies in diverse fields of biology and medicine. There have been many advances in our knowledge of the upstream regulatory inputs into AKT, key multifunctional downstream signaling nodes (GSK3, FoxO, mTORC1), which greatly expand the functional repertoire of AKT, and the complex circuitry of this dynamically branching and looping signaling network that is ubiquitous to nearly every cell in our body. Mouse and human genetic studies have also revealed physiological roles for the AKT network in nearly every organ system. Our comprehension of AKT regulation and functions is particularly important given the consequences of AKT dysfunction in diverse pathological settings, including developmental and overgrowth syndromes, cancer, cardiovascular disease, insulin resistance and type 2 diabetes, inflammatory and autoimmune disorders, and neurological disorders. There has also been much progress in developing AKT-selective small molecule inhibitors. Improved understanding of the molecular wiring of the AKT signaling network continues to make an impact that cuts across most disciplines of the biomedical sciences.
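The script defines total_page, page_size and current_page but never uses them. Below is a minimal sketch of how the count from part1 could be turned into the retstart offsets for paging through the whole result set. The helper name page_offsets is hypothetical (it is not part of httr or E-utilities), and the sketch only does the arithmetic; each offset would still have to be sent as retstart in a separate esearch request.

```r
# Hypothetical helper: list the retstart offsets needed to page through a result set.
page_offsets <- function(total_num, page_size = 20) {
  total_num <- as.integer(total_num)  # the count may arrive as text in the JSON reply
  if (is.na(total_num) || total_num <= 0) return(integer(0))
  total_page <- ceiling(total_num / page_size)   # last page may be partial
  as.integer((seq_len(total_page) - 1) * page_size)
}

offsets <- page_offsets("563", page_size = 20)
length(offsets)    # 29 pages
head(offsets, 3)   # 0 20 40
tail(offsets, 1)   # 560
```

With 563 records and 20 per page the last page holds only 3 records, which is why the page count is rounded up.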
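I understood the search part, but the XML parsing that follows is still unclear to me. I worked by imitating examples and got the result in the end. Hopefully it still works the next time I need it.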
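For the XML step, a small offline sketch may help. It parses a hand-written string shaped roughly like an efetch reply (the sample content is made up for illustration, not real PubMed data) and uses only xml2 functions: read_xml, xml_children, xml_find_first and xml_text. The XPath ".//ArticleTitle" means "an ArticleTitle element anywhere below the current node", which is why the nesting inside each article does not have to be spelled out.

```r
library(xml2)

# Hand-written sample shaped like an efetch reply (illustrative, not real PubMed data).
sample_xml <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><Article>
    <ArticleTitle>First title</ArticleTitle>
    <Abstract><AbstractText>First abstract.</AbstractText></Abstract>
  </Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><Article>
    <ArticleTitle>Second title</ArticleTitle>
  </Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

doc <- read_xml(sample_xml)
articles <- xml_children(doc)   # one node per <PubmedArticle>

# xml_find_first() returns one result per article; an article without a
# match yields a missing node, and xml_text() turns that into NA.
titles <- xml_text(xml_find_first(articles, ".//ArticleTitle"))
abstracts <- xml_text(xml_find_first(articles, ".//AbstractText"))

result <- data.frame(title = titles, abstract = abstracts, stringsAsFactors = FALSE)
print(result)
```

Because xml_find_first works on the whole node set at once, no explicit loop is needed, and the second article (which has no abstract here) shows up with NA instead of shifting the rows.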