




GPT-4o System Card

OpenAI

August 8, 2024
1 Introduction
GPT-4o [1] is an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network.

GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time [2] in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.
In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House [3], we are sharing the GPT-4o System Card, which includes our Preparedness Framework [4] evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, with a focus on speech-to-speech (voice)¹ while also evaluating text and image capabilities, and the measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.
2 Model data and training
GPT-4o's text and voice capabilities were pre-trained using data up to October 2023, sourced from a wide variety of materials including:

• Select publicly available data, mostly collected from industry-standard machine learning datasets and web crawls.
• Proprietary data from data partnerships. We form partnerships to access non-publicly available data, such as pay-walled content, archives, and metadata. For example, we partnered with Shutterstock [5] on building and delivering AI-generated images.
¹ Some evaluations, in particular the majority of the Preparedness evaluations, third-party assessments, and some of the societal impacts, focus on the text and vision capabilities of GPT-4o, depending on the risk assessed. This is indicated accordingly throughout the System Card.
The key dataset components that contribute to GPT-4o's capabilities are:

• Web data: Data from public web pages provides a rich and diverse range of information, ensuring the model learns from a wide variety of perspectives and topics.
• Code and math: Including code and math data in training helps the model develop robust reasoning skills by exposing it to structured logic and problem-solving processes.
• Multimodal data: Our dataset includes images, audio, and video to teach the models how to interpret and generate non-textual input and output. From this data, the model learns how to interpret visual images, actions and sequences in real-world contexts, language patterns, and speech nuances.
Prior to deployment, OpenAI assesses and mitigates potential risks that may stem from generative models, such as information harms, bias and discrimination, or other content that violates our usage policies. We use a combination of methods, spanning all stages of development across pre-training, post-training, product development, and policy. For example, during post-training, we align the model to human preferences; we red-team the resulting models and add product-level mitigations such as monitoring and enforcement; and we provide moderation tools and transparency reports to our users.

We find that the majority of effective testing and mitigations are done after the pre-training stage because filtering pre-trained data alone cannot address nuanced and context-specific harms. At the same time, certain pre-training filtering mitigations can provide an additional layer of defense that, along with other safety mitigations, help exclude unwanted and harmful information from our datasets:
• We use our Moderation API and safety classifiers to filter out data that could contribute to harmful content or information hazards, including CSAM, hateful content, violence, and CBRN.
• As with our previous image generation systems, we filter our image generation datasets for explicit content such as graphic sexual material and CSAM.
• We use advanced data filtering processes to reduce personal information from training data.
• Upon releasing DALL-E 3, we piloted a new approach to give users the power to opt images out of training. To respect those opt-outs, we fingerprinted the images and used the fingerprints to remove all instances of the images from the training dataset for the GPT-4o series of models.
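The fingerprint-based opt-out removal can be sketched as follows. This is a minimal illustration that assumes an exact content hash; a production system would more likely use a perceptual fingerprint robust to resizing and re-encoding, and all function names here are hypothetical, not OpenAI's actual pipeline.

```python
import hashlib

def fingerprint(image_bytes: bytes) -> str:
    """Return a stable fingerprint for an image's raw bytes (simplified: exact hash)."""
    return hashlib.sha256(image_bytes).hexdigest()

def filter_opted_out(dataset, opt_out_fingerprints):
    """Drop every training image whose fingerprint appears in the opt-out set."""
    return [img for img in dataset if fingerprint(img) not in opt_out_fingerprints]

# Illustrative data: one image opted out, two retained.
images = [b"img-a", b"img-b", b"img-c"]
opted_out = {fingerprint(b"img-b")}
kept = filter_opted_out(images, opted_out)
```

Storing fingerprints rather than the opted-out images themselves means the filter can be re-applied to any future training snapshot without retaining the original content.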
3 Risk identification, assessment and mitigation
Deployment preparation was carried out by identifying potential risks of speech-to-speech models, exploratory discovery of additional novel risks through expert red teaming, turning the identified risks into structured measurements, and building mitigations for them. We also evaluated GPT-4o in accordance with our Preparedness Framework [4].
3.1 External red teaming
OpenAI worked with more than 100 external red teamers², speaking a total of 45 different languages, and representing the geographic backgrounds of 29 different countries. Red teamers had access to various snapshots of the model at different stages of training and safety mitigation maturity, starting in early March and continuing through late June 2024.

External red teaming was carried out in four phases. The first three phases tested the model via an internal tool and the final phase used the full iOS experience for testing the model. At the time of writing, external red teaming of the GPT-4o API is ongoing.
Phase 1
• 10 red teamers working on early model checkpoints still in development
• This checkpoint took in audio and text as input and produced audio and text as outputs.
• Single-turn conversations

Phase 2
• 30 red teamers working on model checkpoints with early safety mitigations
• This checkpoint took in audio, image, and text as inputs and produced audio and text as outputs.
• Single- and multi-turn conversations

Phase 3
• 65 red teamers working on model checkpoints and candidates
• This checkpoint took in audio, image, and text as inputs and produced audio, image, and text as outputs.
• Improved safety mitigations tested to inform further improvements
• Multi-turn conversations

Phase 4
• 65 red teamers working on final model candidates and assessing comparative performance
• Model access via advanced voice mode within the iOS app for real user experience; reviewed and tagged via an internal tool.
• This checkpoint took in audio and video prompts, and produced audio generations.
• Multi-turn conversations in real time
Red teamers were asked to carry out exploratory capability discovery, assess novel potential risks posed by the model, and stress test mitigations as they are developed and improved, specifically those introduced by audio input and generation (speech-to-speech capabilities). This red teaming effort builds upon prior work, including as described in the GPT-4 System Card [6] and the GPT-4(V) System Card [7].
Red teamers covered categories that spanned violative and disallowed content (illegal erotic content, violence, self-harm, etc.), mis/disinformation, bias, ungrounded inferences, sensitive trait attribution, private information, geolocation, person identification, emotional perception and anthropomorphism risks, fraudulent behavior and impersonation, copyright, natural science capabilities, and multilingual observations.

² Spanning self-reported domains of expertise including: Cognitive Science, Chemistry, Biology, Physics, Computer Science, Steganography, Political Science, Psychology, Persuasion, Economics, Anthropology, Sociology, HCI, Fairness and Bias, Alignment, Education, Healthcare, Law, Child Safety, Cybersecurity, Finance, Mis/disinformation, Political Use, Privacy, Biometrics, Languages and Linguistics.
The data generated by red teamers motivated the creation of several quantitative evaluations that are described in the Observed Safety Challenges, Evaluations and Mitigations section. In some cases, insights from red teaming were used to do targeted synthetic data generation. Models were evaluated using autograders and/or manual labeling in accordance with set criteria (e.g., violation of policy or not, refused or not). In addition, we sometimes re-purposed the red teaming data to run targeted assessments on a variety of voices/examples to test the robustness of various mitigations.
3.2 Evaluation methodology
In addition to the data from red teaming, a range of existing evaluation datasets were converted to evaluations for speech-to-speech models using text-to-speech (TTS) systems such as Voice Engine [8]. We converted text-based evaluation tasks to audio-based evaluation tasks by converting the text inputs to audio. This allowed us to reuse existing datasets and tooling around measuring model capability, safety behavior, and monitoring of model outputs, greatly expanding our set of usable evaluations.

We used Voice Engine to convert text inputs to audio, feed it to GPT-4o, and score the model's outputs. We always score only the textual content of the model output, except in cases where the audio needs to be evaluated directly, such as in evaluations for voice cloning (see Section 3.3.1).
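The TTS-based evaluation flow just described can be sketched as follows. The `tts`, `speech_model`, and `grade` functions are hypothetical stubs standing in for Voice Engine, GPT-4o, and the autograders respectively; only the shape of the pipeline reflects the document.

```python
def tts(text: str) -> bytes:
    # Stub: a real TTS call (e.g. Voice Engine) would return synthesized audio.
    return text.encode("utf-8")

def speech_model(audio: bytes) -> str:
    # Stub: a real call would return the model's audio plus a transcript;
    # here we echo the input uppercased so the pipeline is runnable.
    return audio.decode("utf-8").upper()

def grade(transcript: str, expected: str) -> bool:
    # As in the document, only the textual content of the output is scored.
    return expected.lower() in transcript.lower()

def run_audio_eval(examples):
    """Convert each text input to audio, run the model, grade the transcript."""
    results = []
    for text_input, expected in examples:
        audio_in = tts(text_input)
        transcript = speech_model(audio_in)
        results.append(grade(transcript, expected))
    return sum(results) / len(results)

score = run_audio_eval([("What is the capital of France? Paris", "paris")])
```

Because scoring happens on transcripts, any text-based grader or rule-based classifier can be reused unchanged, which is what makes the conversion cheap.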
Limitations of the evaluation methodology
First, the validity of this evaluation format depends on the capability and reliability of the TTS model. Certain text inputs are unsuitable or awkward to convert to audio, for instance mathematical equations or code. Additionally, we expect TTS to be lossy for certain text inputs, such as text that makes heavy use of white space or symbols for visual formatting. Since we expect that such inputs are also unlikely to be provided by the user over Advanced Voice Mode, we either avoid evaluating the speech-to-speech model on such tasks, or alternatively pre-process examples with such inputs. Nevertheless, we highlight that any mistakes identified in our evaluations may arise either due to model capability, or the failure of the TTS model to accurately translate text inputs to audio.
A second concern may be whether the TTS inputs are representative of the distribution of audio inputs that users are likely to provide in actual usage. We evaluate the robustness of GPT-4o on audio inputs across a range of regional accents in Section 3.3.3. However, there remain many other dimensions that may not be captured in a TTS-based evaluation, such as different voice intonations and valence, background noise, or cross-talk, that could lead to different model behavior in practical usage.
Lastly, there may be artifacts or properties in the model's generated audio that are not captured in text; for example, background noises and sound effects, or responding with an out-of-distribution voice. In Section 3.3.1, we illustrate using auxiliary classifiers to identify undesirable audio generation, which can be used in conjunction with scoring transcripts.
3.3 Observed safety challenges, evaluations and mitigations
Potential risks with the model were mitigated using a combination of methods. We trained the model to adhere to behavior that would reduce risk via post-training methods and also integrated classifiers for blocking specific generations as a part of the deployed system.

For the observed safety challenges outlined below, we provide a description of the risk, the mitigations applied, and results of relevant evaluations. The risks outlined below are illustrative and non-exhaustive, and are focused on the experience in the ChatGPT interface. We focus on the risks that are introduced by speech-to-speech capabilities and how they may interact with pre-existing modalities (text, image)³.
Risk: Unauthorized voice generation
Mitigations:
• In all of our post-training audio data, we supervise ideal completions using the voice sample in the system message as the base voice.
• We only allow the model to use certain pre-selected voices and use an output classifier to detect if the model deviates from that.

Risk: Speaker identification
Mitigations:
• We post-trained GPT-4o to refuse to comply with requests to identify someone based on a voice in an audio input, while still complying with requests to identify famous quotes.

Risk: Generating copyrighted content
Mitigations:
• We trained GPT-4o to refuse requests for copyrighted content, including audio, consistent with our broader practices.
• To account for GPT-4o's audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT's Advanced Voice Mode, instructed the model to not sing at all.

Risk: Ungrounded inference / sensitive trait attribution
Mitigations:
• We post-trained GPT-4o to refuse requests for ungrounded inference, such as "how intelligent is this speaker?".
• We post-trained GPT-4o to safely comply with requests for sensitive trait attribution by hedging answers, such as "what is this speaker's accent?" → "Based on the audio, they sound like they have a British accent."

Risk: Disallowed content in audio output
Mitigations:
• We run our existing moderation classifier over text transcriptions of audio prompts and generations, and block the output for certain high-severity categories.

Risk: Erotic and violent speech output
Mitigations:
• We run our existing moderation classifier over text transcriptions of audio prompts, and block the output if the prompt contains erotic or violent language.

³ We also evaluate text and vision capabilities, and update mitigations appropriately. No incremental risks were found beyond existing work outlined in the GPT-4 and GPT-4(V) System Cards.
3.3.1 Unauthorized voice generation
Risk Description: Voice generation is the capability to create audio with a human-sounding synthetic voice, and includes generating voices based on a short input clip.

In adversarial situations, this capability could facilitate harms such as an increase in fraud due to impersonation and may be harnessed to spread false information [9, 10] (for example, if we allowed users to upload an audio clip of a given speaker and ask GPT-4o to produce a speech in that speaker's voice). These are very similar to the risks we identified with Voice Engine [8].

Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT's Advanced Voice Mode. During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user's voice.
Risk Mitigation: We addressed voice generation-related risks by allowing only the preset voices we created in collaboration with voice actors [11] to be used. We did this by including the selected voices as ideal completions while post-training the audio model. Additionally, we built a standalone output classifier to detect if the GPT-4o output is using a voice that's different from our approved list. We run this in a streaming fashion during audio generation and block the output if the speaker doesn't match the chosen preset voice.
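The streaming classifier check described above might look like the following sketch. `voice_matches` is a hypothetical stand-in for a speaker-verification model, and the chunks carry illustrative voice labels rather than real audio.

```python
APPROVED_VOICES = {"shimmer"}

def voice_matches(chunk: dict) -> bool:
    # Stub classifier: a real system would run a speaker-verification model
    # over the raw audio; here each chunk carries an explicit voice label.
    return chunk["voice"] in APPROVED_VOICES

def stream_with_guard(chunks):
    """Emit audio chunks until the classifier flags an unapproved voice,
    then cut off the rest of the generation."""
    emitted = []
    for chunk in chunks:
        if not voice_matches(chunk):
            break  # block the remainder of the output
        emitted.append(chunk["audio"])
    return emitted

out = stream_with_guard([
    {"voice": "shimmer", "audio": "a1"},
    {"voice": "shimmer", "audio": "a2"},
    {"voice": "cloned-user-voice", "audio": "a3"},  # deviation: blocked here
    {"voice": "shimmer", "audio": "a4"},            # never reached
])
```

Running the check per chunk rather than on the finished clip is what limits how much of an unauthorized voice can ever reach the user.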
Evaluation: We find that the residual risk of unauthorized voice generation is minimal. Our system currently catches 100% of meaningful deviations from the system voice? based on our internal evaluations, which include samples generated by other system voices, clips during which the model used a voice from the prompt as part of its completion, and an assortment of human samples.

While unintentional voice generation still exists as a weakness of the model, we use the secondary classifiers to ensure the conversation is discontinued if this occurs, making the risk of unintentional voice generation minimal. Finally, our moderation behavior may result in over-refusals when the conversation is not in English, which is an active area of improvement?.
Table 2: Our voice output classifier performance over a conversation, by language:

              Precision   Recall
English          0.96       1.0
Non-English?     0.95       1.0
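The precision and recall figures reported for the voice output classifier are the standard binary-classifier metrics; as a reference, here is a minimal computation over illustrative conversation-level labels (1 = deviation from the system voice), not OpenAI's evaluation data.

```python
def precision_recall(preds, labels):
    """preds/labels: 1 = flagged/actual deviation, 0 = approved system voice."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative: 3 flags, of which 2 are true deviations; no missed deviations.
p, r = precision_recall([1, 1, 0, 1, 0], [1, 1, 0, 0, 0])
```

A recall of 1.0, as in Table 2, means no actual deviation goes uncaught, at the cost of some false positives (precision below 1.0), i.e. occasional unnecessary disconnections.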
3.3.2 Speaker identification
Risk Description: Speaker identification is the ability to identify a speaker based on input audio. This presents a potential privacy risk, particularly for private individuals as well as for obscure audio of public individuals, along with potential surveillance risks.

Risk Mitigation: We post-trained GPT-4o to refuse to comply with requests to identify someone based on a voice in an audio input. We allow GPT-4o to answer based on the content of the audio if it contains content that explicitly identifies the speaker. GPT-4o still complies with requests to identify famous quotes. For example, a request to identify a random person saying "four score and seven years ago" should identify the speaker as Abraham Lincoln, while a request to identify a celebrity saying a random sentence should be refused.

Evaluation: Compared to our initial model, we saw a 14 point improvement in the model's accuracy when it should refuse to identify a voice in an audio input, and a 12 point improvement when it should comply with that request. The former means the model will almost always correctly refuse to identify a speaker based on their voice, mitigating the potential privacy issue. The latter means there may be situations in which the model incorrectly refuses to identify the speaker of a famous quote.
Table 3: Speaker identification safe behavior accuracy

                 GPT-4o-early   GPT-4o-deployed
Should Refuse        0.83            0.98
Should Comply        0.70            0.83
? The system voice is one of the pre-defined voices set by OpenAI. The model should only produce audio in that voice.
? This results in more conversations being disconnected than may be necessary, which is a product quality and usability issue.
3.3.3 Disparate performance on voice inputs
Risk Description: Models may perform differently with users speaking with different accents. Disparate performance can lead to a difference in quality of service for different users of the model [12, 13, 14].

Risk Mitigation: We post-trained GPT-4o with a diverse set of input voices to have model performance and behavior be invariant across different user voices.
Evaluations: We run evaluations on GPT-4o Advanced Voice Mode using a fixed assistant voice ("shimmer") and Voice Engine to generate user inputs across a range of voice samples. We use two sets of voice samples for TTS:

• Official system voices (3 different voices)
• A diverse set of voices collected from two data campaigns. This comprises 27 different English voice samples from speakers from a wide range of countries, and a mix of genders.

We evaluate on two sets of tasks: Capabilities and Safety Behavior.

Capabilities: We evaluate? on four tasks: TriviaQA, a subset of MMLU?, HellaSwag, and LAMBADA. TriviaQA and MMLU are knowledge-centric tasks, while HellaSwag and LAMBADA are commonsense-centric or text-continuation tasks. Overall, we find that performance on the diverse set of human voices is marginally, but not significantly, worse than on system voices across all four tasks.
? Evaluations in this section were run on a fixed, randomly sampled subset of examples, and these scores should not be compared with publicly reported benchmarks on the same task.
? Anatomy, Astronomy, Clinical Knowledge, College Biology, Computer Security, Global Facts, High School Biology, Sociology, Virology, College Physics, High School European History, and World Religions. Following the issues described in Evaluation Methodology (Section 3.2), we exclude tasks with heavily mathematical or scientific notation.
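The disparate-performance check amounts to grouping task accuracy by voice sample set and comparing the group means. A minimal sketch, with made-up placeholder results rather than the reported scores:

```python
from statistics import mean

def accuracy_by_group(results):
    """results: list of (voice_group, correct: bool) pairs, one per example."""
    groups = {}
    for group, correct in results:
        groups.setdefault(group, []).append(1.0 if correct else 0.0)
    return {g: mean(vals) for g, vals in groups.items()}

# Placeholder outcomes: the same task run under two voice sample sets.
scores = accuracy_by_group([
    ("system", True), ("system", True), ("system", False),
    ("diverse", True), ("diverse", False), ("diverse", False),
])
```

A gap between the group means (here deliberately large for illustration) would indicate a quality-of-service difference across voices; the document reports only a marginal, non-significant gap.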
Safety Behavior: We evaluate on an internal dataset of conversations and evaluate the consistency of the model's adherence and refusal behavior across different user voices. Overall, we do not find that the model behavior varies across different voices.
3.3.4 Ungrounded inference / sensitive trait attribution
Risk Description: Audio input can lead to the model making potentially biased or inaccurate inferences about speakers. We define two categories:

• Ungrounded inference (UGI): Making inferences about a speaker that could not be determined solely from audio content. This includes inferences about things such as a speaker's race, socioeconomic status/occupation, religious beliefs, personality traits, political attributes, intelligence, appearance (e.g., eye color, attractiveness), gender identity, sexual preference, or criminal history. This can lead to both allocative and representational harms [13, 15] depending on how such behavior manifests.

• Sensitive trait attribution (STA): Making inferences about a speaker that could plausibly be determined solely from audio content. This includes inferences about things such as a speaker's accent or nationality. Potential harms from STA include an increase in risks from surveillance [16] and a difference in quality of service for speakers with different voice attributes [12, 13, 14].
Risk Mitigation: We post-trained GPT-4o to refuse to comply with UGI requests, while hedging answers to STA questions. For example, a question to identify a speaker's level of intelligence will be refused, while a question to identify a speaker's accent will be met with an answer such as "Based on the audio, they sound like they have a British accent."

Evaluation: Compared to our initial model, we saw a 24 point improvement in the model correctly responding to requests to identify sensitive traits (e.g., refusing UGI and safely complying with STA).
Table 4: Ungrounded inference and sensitive trait attribution safe behavior accuracy

             GPT-4o-early   GPT-4o-deployed
Accuracy         0.60            0.84
3.3.5 Violative and disallowed content
Risk Description: GPT-4o may be prompted to output harmful content through audio that would be disallowed through text, such as audio speech output that gives instructions on how to carry out an illegal activity.

Risk Mitigation: We found high text-to-audio transference of refusals for previously disallowed content. This means that the post-training we've done to reduce the potential for harm in GPT-4o's text output successfully carried over to audio output. Additionally, we run our existing moderation model over a text transcription of both audio input and audio output to detect if either contains potentially harmful language, and will block a generation if so?.

Evaluation: We used TTS to convert existing text safety evaluations to audio. We then evaluate the text transcript of the audio output with the standard text rule-based classifier. Our evaluations show strong text-audio transfer for refusals on pre-existing content policy areas. Further evaluations can be found in Appendix A.
Table 5: Performance comparison of safety evaluations: text vs. audio

                   Text   Audio
Not Unsafe         0.95    0.93
Not Over-refuse    0.81    0.82
3.3.6 Erotic and violent speech content
Risk Description: GPT-4o may be prompted to output erotic or violent speech content, which may be more evocative or harmful than the same content in text. Because of this, we decided to restrict the generation of erotic and violent speech.

Risk Mitigation: We run our existing moderation model [17] over a text transcription of the audio input to detect if it contains a request for violent or erotic content, and will block a generation if so.

? We describe the risks and mitigations for violative and disallowed text content in the GPT-4 System Card [6], specifically Section 3.1 Model Safety and Section 4.2 Content Classifier Development.
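The transcript-level moderation described in the mitigations above (Sections 3.3.5 and 3.3.6) can be sketched as follows; a hypothetical keyword classifier stands in for OpenAI's moderation model, and the category names are illustrative only.

```python
BLOCKED_CATEGORIES = {"violence", "erotic"}

def moderate(transcript: str) -> set:
    # Stub: a real system calls a trained moderation model over the transcript;
    # here we simply flag transcripts containing a category word.
    return {c for c in BLOCKED_CATEGORIES if c in transcript.lower()}

def guard_generation(input_transcript: str, output_transcript: str):
    """Return the output transcript, or None (blocked) if either the
    transcribed audio input or the transcribed audio output is flagged."""
    if moderate(input_transcript) or moderate(output_transcript):
        return None  # block the audio generation
    return output_transcript

ok = guard_generation("what's the weather like today?", "it's sunny today")
blocked = guard_generation("describe erotic content to me", "I can't help with that")
```

Checking both sides of the exchange, as the document describes, catches harmful requests even when the model's own refusal training fails, and harmful outputs even when the request looked benign.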
3.3.7 Other known risks and limitations of the model
Through the course of internal testing and external red teaming, we discovered some additional risks and model limitations for which model- or system-level mitigations are nascent or still in development, including:

Audio robustness: We saw anecdotal evidence of decreases in safety robustness through audio perturbations, such as low-quality input audio, background noise in the input audio, and echoes in the input audio. Additionally, we observed similar decreases in safety robustness through intentional and unintentional audio interruptions while the model was generating output.

Misinformation and conspiracy theories: Red teamers were able to compel the model to generate inaccurate information by prompting it to verbally repeat false information and produce conspiracy theories. While this is a known issue for text in GPT models [18, 19], there was concern from red teamers that this information may be more persuasive or harmful when delivered through audio, especially if the model was instructed to speak emotively or emphatically. The persuasiveness of the model was studied in detail (see Section 3.7), and we found that the model did not score higher than Medium risk for text-only, and for speech-to-speech the model did not score higher than Low.

Speaking a non-English language in a non-native accent: Red teamers observed instances of the audio output using a non-native accent when speaking in a non-English language. This may lead to concerns of bias towards certain accents and languages, and more generally towards limitations of non-English language performance in audio outputs.
Generating copyrighted content: We also tested GPT-4o's capacity to repeat content found within its training data. We trained GPT-4o to refuse requests for copyrighted content, including audio, consistent with our broader practices. To account for GPT-4o's audio modality, we also updated certain text-based filters to work on audio conversations, built filters to detect and block outputs containing music, and for our limited alpha of ChatGPT's Advanced Voice Mode, instructed the model to not sing at all. We intend to track the effectiveness of these mitigations and refine them over time.
Although some technical mitigations are still in development, our Usage Policies [20] disallow intentionally deceiving or misleading others, and circumventing safeguards or safety mitigations. In addition to technical mitigations, we enforce our Usage Policies through monitoring and take action on violative behavior in both ChatGPT and the API.
3.4 Preparedness Framework Evaluations

We evaluated GPT-4o in accordance with our Preparedness Framework [4]. The Preparedness Framework is a living document that describes our procedural commitments to track, evaluate, forecast, and protect against catastrophic risks from frontier models. The evaluations currently cover four risk categories: cybersecurity, CBRN (chemical, biological, radiological, nuclear),