3 The Computational Principles of Neural Networks: Chapter 6 References

[1] Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position[J]. Biological Cybernetics, 1980, 36(4): 193-202.

[2] LeCun Y, Boser B, Denker J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.

[3] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25.

[4] Rumelhart D E, Hinton G E, Williams R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088): 533-536.

[5] Kingma D P, Welling M. Auto-encoding variational Bayes[J]. arXiv preprint arXiv:1312.6114, 2013.

[6] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27.

[7] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.

[8] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.

[9] Radford A, Narasimhan K, Salimans T, et al. Improving language understanding by generative pre-training[J]. 2018.

[10] Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[J]. arXiv preprint arXiv:1301.3781, 2013.

[11] Pennington J, Socher R, Manning C D. GloVe: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014: 1532-1543.

[12] Chowdhury G G. Introduction to modern information retrieval[M]. Facet Publishing, 2010.

[13] Liu Y, Ott M, Goyal N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv preprint arXiv:1907.11692, 2019.

[14] Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain[J]. Psychological Review, 1958, 65(6): 386.

[15] Hastie T, Tibshirani R, Friedman J H. The elements of statistical learning: data mining, inference, and prediction[M]. 2nd ed. New York: Springer, 2009.

[16] LeCun Y, Boser B, Denker J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.

[17] LeCun Y. Generalization and network design strategies[J]. Connectionism in Perspective, 1989, 19: 143-155.

[18] LeCun Y, Boser B, Denker J, et al. Handwritten digit recognition with a back-propagation network[J]. Advances in Neural Information Processing Systems, 1989, 2.

[19] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

[20] Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

[21] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.

[22] Kingma D P, Ba J. Adam: a method for stochastic optimization[J]. arXiv preprint arXiv:1412.6980, 2014.

[23] Zeiler M D. ADADELTA: an adaptive learning rate method[J]. arXiv preprint arXiv:1212.5701, 2012.

[24] Reddi S J, Kale S, Kumar S. On the convergence of Adam and beyond[C]. ICLR, 2018.

[25] Jordan M I. Serial order: a parallel distributed processing approach[M]//Advances in Psychology: Vol. 121. North-Holland, 1997: 471-495.

[26] Elman J L. Finding structure in time[J]. Cognitive Science, 1990, 14(2): 179-211.

[27] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.

[28] Cho K, Van Merriënboer B, Bahdanau D, et al. On the properties of neural machine translation: encoder-decoder approaches[J]. arXiv preprint arXiv:1409.1259, 2014.

[29] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30.

[30] Devlin J, Chang M W, Lee K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2019.

[31] Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners[J]. OpenAI Blog, 2019, 1(8): 9.

[32] Liu Y, Ott M, Goyal N, et al. RoBERTa: a robustly optimized BERT pretraining approach[J]. arXiv preprint arXiv:1907.11692, 2019.

[33] Raffel C, Shazeer N, Roberts A, et al. Exploring the limits of transfer learning with a unified text-to-text transformer[J]. Journal of Machine Learning Research, 2020, 21(140): 1-67.

[34] Yan S, Xiong Y, Lin D. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(1): 607-615.

[35] Dai Z, Yang Z, Yang Y, et al. Transformer-XL: attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.

[36] Lan Z, Chen M, Goodman S, et al. ALBERT: a lite BERT for self-supervised learning of language representations[C]. ICLR, 2020.

[37] Lewis M, Liu Y, Goyal N, et al. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension[J]. arXiv preprint arXiv:1910.13461, 2020.

[38] Brown T B, Mann B, Ryder N, et al. Language models are few-shot learners[J]. arXiv preprint arXiv:2005.14165, 2020.

[39] Kang D, Khot T, Sabharwal A, et al. Data augmentation for deep learning: a survey[J]. arXiv preprint arXiv:2108.10329, 2020.

[40] Yan Y, Bi W, Li X, et al. VATT: transformers for multimodal self-supervised learning from raw video, audio and text[C]. NeurIPS, 2021.

[41] Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks[C]//IEEE International Conference on Acoustics, Speech and Signal Processing. 2013.

[42] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks[C]. NIPS, 2014.

[43] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation[C]. EMNLP, 2015.

[44] Britz D, Goldie A, Luong M T, et al. Massive exploration of neural machine translation architectures[C]. EMNLP, 2017.

[45] Edunov S, Ott M, Auli M, et al. Classical structured prediction losses for sequence to sequence learning[C]. NAACL, 2018.

[46] Gu J, Wang Y, Chen Y, et al. Meta-learning for low-resource neural machine translation[C]. EMNLP-IJCNLP, 2019.

[47] Bai S, Kolter J Z, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[J]. arXiv preprint arXiv:1803.01271, 2018.

[48] Gehring J, Auli M, Grangier D, et al. Convolutional sequence to sequence learning[C]. ICML, 2017.

[49] Bello I, Zoph B, Vaswani A, et al. Attention augmented convolutional networks[C]. ICCV, 2019.

[50] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014: 701-710.


[51] Grover A, Leskovec J. node2vec: scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 855-864.

[52] Hou Z, Liu X, Cen Y, et al. GraphMAE: self-supervised masked graph autoencoders[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022: 594-604.

[53] Kipf T N, Welling M. Variational graph auto-encoders[J]. arXiv preprint arXiv:1611.07308, 2016.

[54] Bo D, Wang X, Shi C, et al. Beyond low-frequency information in graph convolutional networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(5): 3950-3957.

[55] Yang L, Li M, Liu L, et al. Diverse message passing for attribute with heterophily[J]. Advances in Neural Information Processing Systems, 2021, 34: 4751-4763.

[56] He D, Liang C, Liu H, et al. Block modeling-guided graph convolutional neural networks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2022, 36(4): 4022-4029.

[57] Zhu J, Yan Y, Zhao L, et al. Beyond homophily in graph neural networks: current limitations and effective designs[J]. Advances in Neural Information Processing Systems, 2020, 33: 7793-7804.

[58] Chien E, Peng J, Li P, et al. Adaptive universal generalized PageRank graph neural network[J]. arXiv preprint arXiv:2006.07988, 2020.

[59] Wang X, Zhu M, Bo D, et al. AM-GCN: adaptive multi-channel graph convolutional networks[C]//Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020: 1243-1253.

[60] Pei H, Wei B, Chang K C C, et al. Geom-GCN: geometric graph convolutional networks[J]. arXiv preprint arXiv:2002.05287, 2020.

[61] Lim D, Hohne F, Li X, et al. Large scale learning on non-homophilous graphs: new benchmarks and strong simple methods[J]. Advances in Neural Information Processing Systems, 2021, 34: 20887-20902.

[62] Yu Z, Jin D, Wei J, et al. TeKo: text-rich graph neural networks with external knowledge[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023.

[63] Jin D, Song X, Yu Z, et al. BiTe-GCN: a new GCN architecture via bidirectional convolution of topology and features on text-rich networks[C]//Proceedings of the 14th ACM International Conference on Web Search and Data Mining. 2021: 157-165.

[64] Yu Z, Jin D, Liu Z, et al. AS-GCN: adaptive semantic architecture of graph convolutional networks for text-rich networks[C]//2021 IEEE International Conference on Data Mining (ICDM). IEEE, 2021: 837-846.

[65] Wang X, Ji H, Shi C, et al. Heterogeneous graph attention network[C]//The World Wide Web Conference. 2019: 2022-2032.

[66] Fu X, Zhang J, Meng Z, et al. MAGNN: metapath aggregated graph neural network for heterogeneous graph embedding[C]//Proceedings of The Web Conference 2020. 2020: 2331-2341.

[67] Yun S, Jeong M, Kim R, et al. Graph transformer networks[J]. Advances in Neural Information Processing Systems, 2019, 32.

[68] Jin D, Huo C, Liang C, et al. Heterogeneous graph neural network via attribute completion[C]//Proceedings of the Web Conference 2021. 2021: 391-400.

[69] Jin G, Liang Y, Fang Y, et al. Spatio-temporal graph neural networks for predictive learning in urban computing: a survey[J]. arXiv preprint arXiv:2303.14483, 2023.

[70] Chen C, Cheng Z, Li Z, et al. Hypergraph attention networks[C]//2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). IEEE, 2020: 1560-1565.

[71] Kajino H. Molecular hypergraph grammar with its application to molecular optimization[C]//International Conference on Machine Learning. PMLR, 2019: 3183-3191.

[72] Wang C, Pan S, Long G, et al. MGAE: marginalized graph autoencoder for graph clustering[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. 2017: 889-898.

[73] Hou Z, Liu X, Cen Y, et al. GraphMAE: self-supervised masked graph autoencoders[C]//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022: 594-604.

[74] Kipf T N, Welling M. Variational graph auto-encoders[J]. arXiv preprint arXiv:1611.07308, 2016.

[75] Rong Y, Bian Y, Xu T, et al. Self-supervised graph transformer on large-scale molecular data[J]. Advances in Neural Information Processing Systems, 2020, 33: 12559-12571.

[76] Sun K, Lin Z, Zhu Z. Multi-stage self-supervised learning for graph convolutional networks on graphs with few labeled nodes[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2020, 34(04): 5892-5899.

[77] Peng Z, Dong Y, Luo M, et al. Self-supervised graph representation learning via global context prediction[J]. arXiv preprint arXiv:2003.01604, 2020.

[78] Hu Z, Kou G, Zhang H, et al. Rectifying pseudo labels: iterative feature clustering for graph representation learning[C]//Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021: 720-729.

[79] Perozzi B, Al-Rfou R, Skiena S. DeepWalk: online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014: 701-710.

[80] Grover A, Leskovec J. node2vec: scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 855-864.

[81] Tang J, Qu M, Wang M, et al. LINE: large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web. 2015: 1067-1077.

[82] Zbontar J, Jing L, Misra I, et al. Barlow Twins: self-supervised learning via redundancy reduction[C]//International Conference on Machine Learning. PMLR, 2021: 12310-12320.

[83] Bardes A, Ponce J, LeCun Y. VICReg: variance-invariance-covariance regularization for self-supervised learning[J]. arXiv preprint arXiv:2105.04906, 2021.

[84] Zhang H, Wu Q, Yan J, et al. From canonical correlation analysis to self-supervised graph neural networks[J]. Advances in Neural Information Processing Systems, 2021, 34: 76-89.

[85] Garrido Q, Chen Y, Bardes A, et al. On the duality between contrastive and non-contrastive self-supervised learning[J]. arXiv preprint arXiv:2206.02574, 2022.

[86] HaoChen J Z, Wei C, Gaidon A, et al. Provable guarantees for self-supervised deep learning with spectral contrastive loss[J]. Advances in Neural Information Processing Systems, 2021, 34: 5000-5011.

[87] Balestriero R, LeCun Y. Contrastive and non-contrastive self-supervised learning recover global and local spectral embedding methods[J]. Advances in Neural Information Processing Systems, 2022, 35: 26671-26685.

[88] Kipf T N, Welling M. Variational graph auto-encoders[C]. NIPS, 2016.

[89] Simonovsky M, Komodakis N. GraphVAE: towards generation of small graphs using variational autoencoders[C]. Artificial Neural Networks and Machine Learning, 2018.

[90] Wang H, Wang J, Wang J, et al. GraphGAN: graph representation learning with generative adversarial nets[C]. AAAI, 2018.

[91] You J, Ying R, Ren X, et al. GraphRNN: generating realistic graphs with deep auto-regressive models[C]. ICML, 2018.

[92] Luo Y, Yan K, Ji S. GraphDF: a discrete flow model for molecular graph generation[C]. ICML, 2021.


[93] Fan W, Liu C, Liu Y, et al. Generative diffusion models on graphs: methods and applications[C]. IJCAI, 2023.