Image to Music: Cross-Modal Melody Generation Through Image Captioning

Recent advances in machine learning have also reached computationally creative systems, and interest in machine-generated artifacts has paved the way for creative models to evolve. Earlier methods, however, mostly worked within a single domain, so cross-modal learning, and in particular the direct mapping between modalities for creative models, remains relatively unexplored. This work proposes a novel methodology for generating symbolic music from images by directly mapping their features. Because the method follows the image captioning approach to map the two domains' features, its base models are a CNN encoder and a deep stacked LSTM decoder. The generated music is evaluated quantitatively with a custom genre classification model and BLEU score calculations, and qualitatively with a melody listening test with human evaluators. The results show that the proposed method works well for music generation.
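
The abstract names BLEU score calculations as one of the quantitative measures. As an illustration only, the sketch below computes a single-reference, unsmoothed BLEU score between two melodies. It assumes melodies are tokenised as note-name strings such as "C4"; the record does not state the encoding actually used, and the function names `ngrams`, `modified_precision`, and `bleu` are hypothetical, not taken from the work.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n-grams."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if c > r else math.exp(1.0 - r / c)
    return bp * math.exp(log_mean)


# Assumed toy melodies (note names are illustrative, not from the work).
reference_melody = ["C4", "E4", "G4", "E4", "C4", "D4", "E4", "C4"]
generated_melody = ["C4", "E4", "G4", "E4", "C4", "D4", "E4", "D4"]
print(bleu(generated_melody, reference_melody))
```

A score of 1.0 means the generated token sequence matches the reference exactly; a melody that shares no notes with the reference scores 0.0. A corpus-level or smoothed variant may be more appropriate for short sequences.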

Title
(dc.title)
Image to Music: Cross-Modal Melody Generation Through Image Captioning
Author [Primary]
(dc.creator.author)
Kaplan, Alper
Author Department
(dc.creator.department)
Yeditepe University Graduate School of Social Sciences
Author Department
(dc.creator.department)
Yeditepe University Graduate School of Social Sciences Cognitive Science Department
Publication Date
(dc.date.issued)
2023
Publication Type [Academic]
(dc.type)
preprint
Publication Type [Media]
(dc.format)
application/pdf
Subject Headings [General]
(dc.subject)
Music Generation
Subject Headings [General]
(dc.subject)
Melody Generation
Subject Headings [General]
(dc.subject)
Cross-Domain Learning
Subject Headings [General]
(dc.subject)
Image Captioning
Subject Headings [General]
(dc.subject)
Machine Learning
Subject Headings [General]
(dc.subject)
Deep Learning
Subject Headings [General]
(dc.subject)
Müzik Üretimi
Subject Headings [General]
(dc.subject)
Melodi Üretimi
Subject Headings [General]
(dc.subject)
Alanlar Arası Öğrenim
Subject Headings [General]
(dc.subject)
Resim Altyazısı
Subject Headings [General]
(dc.subject)
Makine öğrenimi
Subject Headings [General]
(dc.subject)
Derin Öğrenme
Publisher
(dc.publisher)
Yeditepe University Academic and Open Access Information System
Language
(dc.language.iso)
eng
Record Add Date
(dc.date.accessioned)
2023-12-28
Open Access Date
(dc.date.available)
2023-12-28
Rights
(dc.rights)
Yeditepe University Academic and Open Access Information System
Access Right
(dc.rights.access)
Open Access
Copyright
(dc.rights.holder)
Unless otherwise stated, copyright belongs to Yeditepe University. Usage permissions are those specified in the Open Access System, as stated under "InC-NC/1.0" and "by-nc-nd/4.0".
Copyright Url
(dc.rights.uri)
http://creativecommons.org/licenses/by-nc-nd/4.0
Copyright Url
(dc.rights.uri)
https://rightsstatements.org/page/InC-NC/1.0/?language=en
Description
(dc.description)
Final published version
Description [Note]
(dc.description.note)
Note: This preprint reports new research that has not been certified by peer review and should not be used as established information without consulting multiple experts in the field.
Description Collection Information
(dc.description.collectioninformation)
This item is part of the preprint collection made available through Yeditepe University library. For your questions, our contact address is openaccess@yeditepe.edu.tr
Author [Contributor]
(dc.contributor.author)
Goularas, Dionysis
Author [Contributor] Institution
(dc.contributor.institution)
Yeditepe University Graduate School of Natural and Applied Sciences
Author [Contributor] Institution
(dc.contributor.institution)
Yeditepe University Graduate School of Natural and Applied Sciences Computer Engineering Department
Author Contributor OrcID
(dc.contributor.authorOrcid)
0000-0002-4802-2802